Data Collection
Overview
Systematic procedure for gathering research data through surveys, interviews, observation, and secondary sources
Steps
Step 1: Define data requirements
Determine what data you need to answer the research question:
-
IDENTIFY CONSTRUCTS
- What concepts need to be measured?
- What is the conceptual definition of each?
- How will you know if you’ve measured it?
-
SPECIFY VARIABLES
For each construct:
- Name and type (continuous, categorical, etc.)
- Unit of measurement
- Expected range of values
- Required precision
-
DETERMINE DATA TYPE NEEDED
Quantitative data:
- Numerical measurements
- Counts and frequencies
- Ratings and scales
- Best for: Testing hypotheses, measuring magnitude
Qualitative data:
- Words, narratives, descriptions
- Themes and meanings
- Context and interpretation
- Best for: Exploration, understanding experience
Mixed methods:
- Combine both types
- Triangulation for validation
- Complementary insights
-
ASSESS EXISTING MEASURES
- Are validated instruments available?
- What is their reliability and validity?
- Are they appropriate for your population?
- Do they need adaptation?
-
GAP ANALYSIS
- What measures exist?
- What needs to be created?
- What validation is needed?
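The variable specification above can be captured as a small record per variable. The following is a minimal sketch; the class name VariableSpec, its field names, and the example variable sleep_hours are illustrative assumptions, not part of this procedure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class VariableSpec:
    """One planned variable, tied to the construct it measures (hypothetical layout)."""
    construct: str
    name: str
    var_type: str                                   # e.g. "continuous", "categorical"
    unit: Optional[str] = None                      # unit of measurement
    expected_range: Optional[Tuple[float, float]] = None
    precision: Optional[float] = None               # smallest meaningful increment

    def in_range(self, value: float) -> bool:
        """Return True when value falls inside the expected range (or no range is set)."""
        if self.expected_range is None:
            return True
        low, high = self.expected_range
        return low <= value <= high


# Illustrative example only: a made-up variable for a made-up construct.
sleep_hours = VariableSpec(
    construct="sleep duration",
    name="sleep_hours",
    var_type="continuous",
    unit="hours per night",
    expected_range=(0.0, 24.0),
    precision=0.5,
)
```

Keeping the expected range next to the definition lets the same record drive range checks later during collection and cleaning.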
Step 2: Design sampling strategy
Develop a plan for selecting study participants or units:
PROBABILITY SAMPLING (for generalization):
-
Simple Random Sampling
- Every unit has equal chance of selection
- Use: Homogeneous populations, sampling frame available
- Method: Random number generator from complete list
- Pro: Unbiased; Con: Requires complete list
-
Stratified Random Sampling
- Divide population into strata, random sample within each
- Use: Ensure representation of subgroups
- Method: Sample proportionally or equally from each stratum
- Pro: Guaranteed representation; Con: Need stratum info
-
Cluster Sampling
- Randomly select clusters, then sample within clusters
- Use: Geographically dispersed populations
- Method: Random selection of schools, clinics, regions
- Pro: Practical; Con: Less statistically efficient (design effect inflates variance)
-
Systematic Sampling
- Select every kth unit from list
- Use: When random selection impractical
- Method: Random start, then every kth
- Pro: Simple; Con: Risk if list has periodicity
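The four probability designs above can be sketched with the Python standard library. This is an illustration under assumptions: the sampling frame is an in-memory list, and the function names and parameters (fraction, n_per_cluster, seed) are hypothetical.

```python
import random
from collections import defaultdict


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has an equal chance."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def stratified_sample(frame, stratum_of, fraction, seed=None):
    """Proportional allocation: sample the same fraction within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for units in strata.values():
        k = max(1, round(len(units) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(units, k))
    return sample


def systematic_sample(frame, k, seed=None):
    """Random start in the first k positions, then every kth unit."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return frame[start::k]


def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Two-stage design: pick clusters at random, then units within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        units = clusters[name]
        sample.extend(rng.sample(units, min(n_per_cluster, len(units))))
    return sample
```

Passing a fixed seed makes a draw reproducible, which helps when documenting the sampling method.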
NON-PROBABILITY SAMPLING (for depth, not generalization):
-
Purposive/Judgment Sampling
- Researcher selects based on criteria
- Use: Qualitative research, expert selection
- Types: Maximum variation, typical case, extreme case
-
Convenience Sampling
- Whoever is available
- Use: Pilot studies, exploratory research
- Limitation: Cannot generalize
-
Snowball Sampling
- Participants recruit others
- Use: Hidden or hard-to-reach populations
- Limitation: Network effects, not random
-
Quota Sampling
- Fill quotas for subgroups
- Use: Ensure demographic representation
- Limitation: Selection within quota not random
SAMPLE SIZE DETERMINATION:
For quantitative:
- Power analysis (see statistical_analysis procedure)
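- Rule of thumb: 30+ per group so that sample means are approximately normal (central limit theorem); a rough floor, not a substitute for power analysis
- Survey research: Inflate the number of invitations to offset expected non-response (invitations = target completes / expected response rate)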
For qualitative:
- Saturation: Until no new themes emerge
- Typical range: 12-30 interviews for homogeneous group
- More for heterogeneous populations
Document:
- Sampling method and rationale
- Target sample size with justification
- Inclusion/exclusion criteria
- Recruitment procedures
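Some of the sample size arithmetic can be sketched as follows. This is not a power analysis. It uses the standard margin-of-error formula for a proportion, the usual design effect approximation for cluster samples (1 + (m - 1) * ICC), and the response-rate inflation described above. All function names and the example numbers are assumptions for illustration.

```python
import math


def sample_size_for_proportion(margin, p=0.5, z=1.96):
    """Completes needed to estimate a proportion within +/- margin.

    p=0.5 is the most conservative choice; z=1.96 corresponds to 95% confidence.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


def design_effect(cluster_size, icc):
    """Variance inflation for cluster samples: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def invitations_needed(target_completes, response_rate):
    """Inflate the target so expected completes still reach it."""
    return math.ceil(target_completes / response_rate)


# Hypothetical planning numbers.
completes = sample_size_for_proportion(margin=0.05)
clustered = math.ceil(completes * design_effect(cluster_size=20, icc=0.05))
invites = invitations_needed(completes, response_rate=0.40)
```

Recording the inputs (margin, assumed ICC, assumed response rate) alongside the result gives the justification the documentation list asks for.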
Step 3: Design survey instruments
Create surveys and questionnaires (if applicable):
QUESTION TYPES:
-
Closed-ended questions
- Multiple choice: Select one from options
- Multiple response: Select all that apply
- Rating scales: Likert, semantic differential
- Ranking: Order options by preference
- Pro: Easy to analyze; Con: May miss nuance
-
Open-ended questions
- Free text response
- Pro: Rich data; Con: Harder to analyze
- Use sparingly; place near the end of the survey
SCALE DESIGN:
Likert scales:
- Typically 5 or 7 points
- Balanced (equal positive/negative options)
- Include neutral midpoint (usually)
- Label all points or just endpoints
Examples:
- 5-point agreement: Strongly disagree to Strongly agree
- 5-point frequency: Never to Always
- 7-point satisfaction: Extremely dissatisfied to Extremely satisfied
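A response scale can be stored as an ordered list of labels so that scoring stays consistent across items. In this minimal sketch the endpoints come from the examples above, while the three intermediate labels and the helper names are assumptions.

```python
# Endpoints follow the 5-point agreement example; middle labels are assumed.
AGREEMENT_5 = [
    "Strongly disagree",
    "Disagree",
    "Neither agree nor disagree",
    "Agree",
    "Strongly agree",
]


def has_neutral_midpoint(scale):
    """An odd number of points leaves a single middle option."""
    return len(scale) % 2 == 1


def score(label, scale):
    """Map a label to its 1-based position on the scale."""
    return scale.index(label) + 1
```

For example, score("Agree", AGREEMENT_5) gives the fourth position on the scale.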
QUESTION WRITING RULES:
DO:
- Use simple, clear language
- Ask one thing per question (no double-barreled)
- Define terms that might be ambiguous
- Ensure response options are exhaustive and mutually exclusive
- Include “Not applicable” or “Don’t know” when appropriate
DON’T:
- Use leading or loaded questions
- Use double negatives
- Assume knowledge respondent may not have
- Use jargon or technical terms without definition
- Make questions too long
SURVEY STRUCTURE:
-
Introduction
- Purpose of study
- Time estimate
- Confidentiality statement
- Consent information
-
Screening questions
- Verify eligibility
- Route to appropriate sections
-
Main questions
- Start with easy, engaging questions
- Group by topic
- Place sensitive questions later
- Logical flow within sections
-
Demographics
- Usually at end (unless needed for screening)
- Only collect what’s needed
-
Closing
- Thank participant
- Provide contact for questions
PILOT TESTING:
- Test with 5-10 similar respondents
- Check: Comprehension, timing, technical issues
- Cognitive interviews: Ask respondents to think aloud
- Revise based on feedback
Step 4: Design interview and focus group protocols
Create protocols for qualitative data collection (if applicable):
INTERVIEW TYPES:
-
Structured interviews
- Fixed questions in fixed order
- All participants get same questions
- Use: Systematic data for comparison
- Easier to analyze; less depth
-
Semi-structured interviews
- Guide with key topics/questions
- Flexibility to probe and follow-up
- Use: Balance structure and depth
- Most common in qualitative research
-
Unstructured interviews
- Minimal predetermined questions
- Conversation-like, participant-led
- Use: Exploratory, understanding lived experience
- Hardest to analyze; richest data
INTERVIEW GUIDE DESIGN:
Structure:
-
Introduction (5 min)
- Introduce yourself and study
- Explain process, recording, confidentiality
- Obtain consent
- Warm-up question (easy, rapport-building)
-
Main questions (majority of time)
- Start broad, then narrow
- Use open-ended questions
- Group by theme
- Include probes for each question
-
Closing (5 min)
- Wrap-up question (“Anything else?”)
- Thank participant
- Explain next steps
QUESTION TYPES FOR INTERVIEWS:
- Grand tour: “Tell me about your experience with…”
- Specific: “What happened when…?”
- Contrast: “How does X compare to Y?”
- Hypothetical: “What would happen if…?”
- Devil’s advocate: “Some people say… What do you think?”
PROBES (follow-up techniques):
- Elaboration: “Tell me more about that”
- Clarification: “What do you mean by…?”
- Example: “Can you give me an example?”
- Silence: Pause and wait for more
- Echo: Repeat last phrase as question
FOCUS GROUP CONSIDERATIONS:
Group composition:
- Typically 6-10 participants
- Homogeneous enough for comfort
- Heterogeneous enough for diverse views
Moderator guide:
- Opening activity/icebreaker
- 5-6 main discussion questions
- Activities if appropriate (card sorts, etc.)
- Plan for dominant/quiet participants
Logistics:
- Room setup (circle or U-shape)
- Recording equipment (audio, video)
- Note-taker role
- Refreshments (builds rapport)
Step 5: Plan observational methods
Design systematic observation protocols (if applicable):
OBSERVATION TYPES:
-
Participant observation
- Researcher participates in setting
- Insider perspective
- Use: Ethnography, understanding culture
- Challenge: Maintain objectivity
-
Non-participant observation
- Researcher observes without participating
- Outsider perspective
- Use: Behavior coding, unobtrusive measurement
- Challenge: May miss context
-
Structured observation
- Predetermined behaviors to observe
- Coding scheme with categories
- Quantifiable data
- Use: Testing specific hypotheses
-
Unstructured observation
- Open-ended field notes
- Emergent categories
- Qualitative data
- Use: Exploration, discovery
DEVELOPING OBSERVATION PROTOCOL:
-
Define what to observe
- Specific behaviors, events, interactions
- Physical environment
- Time and duration
-
Create coding scheme (for structured)
- Mutually exclusive categories
- Exhaustive (covers all possibilities)
- Operational definitions for each code
- Examples and non-examples
-
Determine recording method
- Real-time coding: Code as events occur
- Interval recording: Code at fixed intervals
- Event recording: Code each occurrence
- Narrative: Write field notes
-
Plan observation schedule
- When and where to observe
- Duration of sessions
- Number of sessions needed
- Sampling of times/locations
FIELD NOTES:
Include:
- Date, time, location, duration
- Who was present
- What happened (descriptive)
- Your interpretations (separate from description)
- Questions and reflections
Write up:
- During observation: Brief notes
- Immediately after: Expand to full notes
- Regular memos: Emerging themes and insights
INTER-RATER RELIABILITY:
For structured observation:
- Train multiple observers
- Have them code same sessions
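- Calculate agreement (Cohen’s kappa for two coders; Fleiss’ kappa for three or more)
- Target: kappa > 0.80 (0.61-0.80 is commonly read as substantial agreement)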
- Resolve disagreements through discussion
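For two observers coding the same sessions, Cohen's kappa compares observed agreement with the agreement expected by chance. The following is a minimal sketch; the function name and the example codes ("on" and "off" task) are assumptions.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who coded the same units in the same order."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must code the same non-empty set of units")
    n = len(codes_a)
    # Observed agreement: share of units where the two codes match.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: product of each coder's marginal proportions, summed over codes.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    if expected == 1:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes from two observers across ten intervals.
coder_1 = ["on", "on", "off", "on", "off", "off", "on", "off", "on", "on"]
coder_2 = ["on", "on", "off", "off", "off", "off", "on", "off", "on", "on"]
kappa = cohens_kappa(coder_1, coder_2)
```

Here the coders agree on 9 of 10 intervals and chance agreement is 0.5, so kappa comes out at about 0.8.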
Step 6: Identify and evaluate secondary data
Assess and acquire existing data sources (if applicable):
TYPES OF SECONDARY DATA:
-
Official statistics
- Government data (census, BLS, CDC)
- International organizations (World Bank, UN)
- Pro: Large scale, authoritative
- Con: May not match your exact needs
-
Administrative data
- Records created for operational purposes
- Health records, school records, transaction logs
- Pro: Covers entire population, objective
- Con: Not designed for research, access issues
-
Survey data archives
- Data from previous surveys
- ICPSR, Pew, GSS, etc.
- Pro: Often high quality, documented
- Con: Variables may not match your needs
-
Research datasets
- Data from previous studies
- Shared by researchers for replication
- Pro: Relevant to your field
- Con: May have restrictions
-
Commercial data
- Purchased from data vendors
- Market research, consumer behavior
- Pro: Detailed, proprietary
- Con: Expensive, unclear methodology
EVALUATION CRITERIA:
-
Relevance
- Does it measure what you need?
- Does it cover your population?
- Is the time period appropriate?
-
Quality
- How was data collected?
- What is the sample size and response rate?
- Are there known quality issues?
- Is methodology documented?
-
Availability
- Is data publicly available?
- What are access restrictions?
- Cost to acquire?
- Data use agreements required?
-
Documentation
- Is there a codebook?
- Are variables well-defined?
- Is methodology described?
- Are limitations documented?
COMBINING DATA SOURCES:
- Ensure compatible definitions
- Harmonize coding schemes
- Document any transformations
- Be aware of different error structures
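Harmonizing coding schemes usually means mapping each source's codes onto one common scheme and keeping a record of anything that did not map. The codebooks below are invented for illustration, as is the helper name harmonize.

```python
# Hypothetical codebooks for an employment variable in two sources.
SOURCE_A_TO_COMMON = {1: "employed", 2: "unemployed", 3: "not_in_labor_force"}
SOURCE_B_TO_COMMON = {
    "FT": "employed",
    "PT": "employed",
    "UN": "unemployed",
    "NLF": "not_in_labor_force",
}


def harmonize(values, mapping):
    """Recode values to the common scheme.

    Returns the recoded list (None for unmapped codes) and the set of
    unmapped codes, so the transformation can be documented.
    """
    harmonized = []
    unmapped = set()
    for value in values:
        if value in mapping:
            harmonized.append(mapping[value])
        else:
            harmonized.append(None)
            unmapped.add(value)
    return harmonized, unmapped
```

Returning the unmapped codes rather than silently dropping them keeps the transformation auditable.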
Step 7: Develop quality assurance procedures
Plan for ensuring data quality throughout collection:
PREVENTION (before/during collection):
-
Training
- Train all data collectors
- Standardize procedures
- Practice with mock sessions
- Certification/testing
-
Standardization
- Written protocols for all procedures
- Scripts for surveys/interviews
- Decision rules for ambiguous situations
-
Built-in checks
- Validation rules in electronic surveys
- Range checks (flag out-of-range values)
- Consistency checks (flag contradictions)
- Attention checks (catch inattentive respondents)
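The built-in checks above can be expressed as rules that flag a response rather than reject it. This is a sketch under assumptions: the field names (age, employed, hours_worked, attention_item), the 18-99 age range, and the expected attention answer are all hypothetical.

```python
def validate_response(resp):
    """Return a list of flags for one survey response (empty list = no issues)."""
    flags = []

    # Range check: flag out-of-range or missing values.
    age = resp.get("age")
    if age is None or not 18 <= age <= 99:
        flags.append("age_out_of_range")

    # Consistency check: flag answers that contradict each other.
    hours = resp.get("hours_worked") or 0
    if resp.get("employed") == "no" and hours > 0:
        flags.append("employment_contradiction")

    # Attention check: an item that instructs the respondent to pick "agree".
    if resp.get("attention_item") != "agree":
        flags.append("failed_attention_check")

    return flags
```

Flagging instead of blocking leaves the decision about each case to the cleaning stage.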
MONITORING (during collection):
-
Progress tracking
- Monitor response rates
- Track completion times
- Identify problematic items/sections
-
Ongoing review
- Review first batch of data closely
- Check for interviewer effects
- Spot-check recordings/transcripts
-
Regular calibration
- Periodic retraining
- Inter-rater reliability checks
- Address drift in coding
CLEANING (after collection):
-
Data validation
- Check for duplicates
- Verify range constraints
- Check skip logic executed correctly
- Validate against external data if possible
-
Missing data assessment
- Extent of missingness
- Patterns (random vs. systematic, e.g. MCAR, MAR, MNAR)
- Reasons for missingness
-
Outlier identification
- Statistical detection methods
- Investigate flagged values
- Decide: Correct, exclude, or retain
-
Documentation
- Data dictionary with all variables
- Record all cleaning decisions
- Version control for datasets
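Three of the cleaning checks above can be sketched with the standard library: duplicate detection, extent of missingness, and outlier flagging. The 1.5 x IQR rule used here is one common statistical detection method among several. Function names and record layout are assumptions.

```python
import statistics
from collections import Counter


def duplicate_ids(records, key="id"):
    """Return identifiers that appear more than once."""
    counts = Counter(record[key] for record in records)
    return sorted(k for k, c in counts.items() if c > 1)


def missing_share(records, fields):
    """Share of records with a missing (None or absent) value, per field."""
    n = len(records)
    return {f: sum(r.get(f) is None for r in records) / n for f in fields}


def iqr_outliers(values, k=1.5):
    """Flag values more than k * IQR beyond the quartiles (to investigate, not delete)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

Flagged values still need the investigate-then-decide step listed above, and each decision belongs in the cleaning log.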
RESPONSE RATE AND NON-RESPONSE:
- Track response rate: Completed / Eligible contacted
- Analyze non-response: Do responders differ from non-responders?
- Non-response bias analysis if possible
- Document efforts to maximize response
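The response rate defined above, and a simple responder versus non-responder comparison on a characteristic known from the sampling frame, can be sketched as follows. The field names (region, responded) and the data are invented for illustration.

```python
from collections import Counter


def response_rate(completed, eligible_contacted):
    """Completed responses divided by eligible units contacted."""
    if eligible_contacted <= 0:
        raise ValueError("eligible_contacted must be positive")
    return completed / eligible_contacted


def group_shares(units, key):
    """Distribution of a frame characteristic within a set of units."""
    counts = Counter(unit[key] for unit in units)
    total = len(units)
    return {group: count / total for group, count in counts.items()}


# Hypothetical frame with one known characteristic per contacted unit.
frame = [
    {"region": "urban", "responded": True},
    {"region": "urban", "responded": True},
    {"region": "urban", "responded": False},
    {"region": "rural", "responded": True},
    {"region": "rural", "responded": False},
    {"region": "rural", "responded": False},
]
responders = [u for u in frame if u["responded"]]
non_responders = [u for u in frame if not u["responded"]]
rate = response_rate(len(responders), len(frame))
responder_mix = group_shares(responders, "region")
non_responder_mix = group_shares(non_responders, "region")
```

A large gap between the two distributions suggests responders differ from non-responders on that characteristic, which is worth reporting.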
Step 8: Address ethical considerations
Ensure ethical data collection:
CORE ETHICAL PRINCIPLES:
-
Respect for persons (autonomy)
- Informed consent
- Voluntary participation
- Right to withdraw
- Protect those with diminished autonomy
-
Beneficence
- Maximize benefits
- Minimize harm
- Balance risks and benefits
-
Justice
- Fair selection of participants
- Equitable distribution of burdens and benefits
- No exploitation of vulnerable groups
INFORMED CONSENT:
Must include:
- Purpose of research
- Procedures involved
- Duration of participation
- Risks and benefits
- Confidentiality protections
- Voluntary nature, right to withdraw
- Contact information for questions
Documentation:
- Written consent for most research
- Verbal consent with a waiver of written documentation, if justified
- Online: Click-through consent
- Keep consent forms separate from data
Special populations:
- Children: Parental consent + child assent
- Cognitively impaired: Legally authorized representative
- Prisoners: Special protections required
CONFIDENTIALITY AND PRIVACY:
Data protection:
- De-identify data (remove direct identifiers such as names and ID numbers)
- Secure storage (encryption, access controls)
- Limited access (need-to-know basis)
- Secure disposal when no longer needed
Anonymity vs. confidentiality:
- Anonymous: You don’t know who provided data
- Confidential: You know but protect identity
Reporting:
- Never report individual-level identifying data
- Suppress small cells in tables
- Use pseudonyms in qualitative reports
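De-identification and small-cell suppression can be sketched as below. This is an illustration, not a compliance recipe: the keyed-hash pseudonym, the field names, and the suppression threshold of 5 are assumptions, and the applicable threshold and method depend on your institution's rules.

```python
import hashlib
import hmac


def pseudonymize(participant_id, secret_key):
    """Stable pseudonym from a keyed hash; the key must be stored separately from the data."""
    digest = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def deidentify(record, direct_identifiers, secret_key):
    """Drop direct identifiers and attach a pseudonymous ID in their place."""
    cleaned = {k: v for k, v in record.items() if k not in direct_identifiers}
    cleaned["pid"] = pseudonymize(str(record["id"]), secret_key)
    return cleaned


def suppress_small_cells(table, threshold=5):
    """Replace counts below the threshold with None before reporting."""
    return {cell: (count if count >= threshold else None) for cell, count in table.items()}
```

Because the pseudonym is derived from a secret key, the same participant maps to the same pid across files, while anyone without the key cannot recompute it from the ID.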
IRB/ETHICS REVIEW:
Categories:
- Exempt: Minimal risk, standard procedures
- Expedited: Minimal risk, more complex
- Full board: Greater than minimal risk
Prepare:
- Protocol describing all procedures
- Consent documents
- Data collection instruments
- Data security plan
Ongoing:
- Report adverse events
- Get approval for protocol changes
- Continuing review (often annual) if the study is ongoing and the IRB requires it
When to Use
- Planning primary data collection for research
- Designing surveys or questionnaires
- Preparing for interviews or focus groups
- Setting up observational studies
- Acquiring and evaluating secondary data
- Determining sampling strategy for a study
- Validating existing measures or creating new ones
Verification
- Data requirements clearly linked to research questions
- Sampling strategy appropriate and justified
- Instruments measure intended constructs
- Quality assurance procedures in place
- Ethical requirements addressed
- All procedures documented for reproducibility
Input: $ARGUMENTS
Apply this procedure to the input provided.