Tier 4

data_collection

Systematic procedure for gathering research data through surveys, interviews, observation, and secondary sources

Usage in Claude Code: /data_collection your question here

Data Collection

Overview

Systematic procedure for gathering research data through surveys, interviews, observation, and secondary sources

Steps

Step 1: Define data requirements

Determine what data you need to answer the research question:

  1. IDENTIFY CONSTRUCTS

    • What concepts need to be measured?
    • What is the conceptual definition of each?
    • How will you know if you’ve measured it?
  2. SPECIFY VARIABLES

    For each construct:

    • Name and type (continuous, categorical, etc.)
    • Unit of measurement
    • Expected range of values
    • Required precision
  3. DETERMINE DATA TYPE NEEDED

    Quantitative data:

    • Numerical measurements
    • Counts and frequencies
    • Ratings and scales
    • Best for: Testing hypotheses, measuring magnitude

    Qualitative data:

    • Words, narratives, descriptions
    • Themes and meanings
    • Context and interpretation
    • Best for: Exploration, understanding experience

    Mixed methods:

    • Combine both types
    • Triangulation for validation
    • Complementary insights
  4. ASSESS EXISTING MEASURES

    • Are validated instruments available?
    • What is their reliability and validity?
    • Are they appropriate for your population?
    • Do they need adaptation?
  5. GAP ANALYSIS

    • What measures exist?
    • What needs to be created?
    • What validation is needed?
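
One way to record the output of this step is a small machine-readable variable specification. The sketch below is illustrative only: the `VariableSpec` class, its field names, and the example variable are hypothetical and not part of this procedure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VariableSpec:
    """Hypothetical record for one variable in the data requirements."""
    construct: str              # concept the variable is meant to measure
    name: str                   # variable name used in the dataset
    var_type: str               # e.g. "continuous", "categorical", "ordinal"
    unit: Optional[str]         # unit of measurement, None if unitless
    minimum: Optional[float]    # lower bound of the expected range
    maximum: Optional[float]    # upper bound of the expected range
    precision: Optional[float]  # smallest increment that matters

    def in_expected_range(self, value: float) -> bool:
        """Return True if value lies within the expected range (bounds inclusive)."""
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


# Example entry; the values are made up for illustration.
sleep_hours = VariableSpec(
    construct="sleep duration",
    name="sleep_hours",
    var_type="continuous",
    unit="hours per night",
    minimum=0.0,
    maximum=24.0,
    precision=0.5,
)
```

Keeping the expected range next to the definition lets the same record drive range checks later in collection and cleaning.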

Step 2: Design sampling strategy

Develop a plan for selecting study participants or units:

PROBABILITY SAMPLING (for generalization):

  1. Simple Random Sampling

    • Every unit has an equal chance of selection
    • Use: Homogeneous populations, sampling frame available
    • Method: Random number generator applied to a complete list
    • Pro: Unbiased; Con: Requires a complete list
  2. Stratified Random Sampling

    • Divide population into strata, random sample within each
    • Use: Ensure representation of subgroups
    • Method: Sample proportionally or equally from each stratum
    • Pro: Guaranteed representation; Con: Need stratum info
  3. Cluster Sampling

    • Randomly select clusters, then sample within clusters
    • Use: Geographically dispersed populations
    • Method: Random selection of schools, clinics, regions
    • Pro: Practical; Con: Less statistically efficient (the design effect inflates variance)
  4. Systematic Sampling

    • Select every kth unit from list
    • Use: When random selection impractical
    • Method: Random start, then every kth
    • Pro: Simple; Con: Risk if list has periodicity
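
The four probability designs above can be sketched with Python's standard `random` module. This is a minimal illustration, not a production sampler: the function names, the `stratum_of` callback, and the dictionary-of-lists cluster layout are assumptions made for the example.

```python
import random
from collections import defaultdict


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has an equal chance."""
    rng = random.Random(seed)
    return rng.sample(list(frame), n)


def stratified_sample(frame, stratum_of, fraction, seed=None):
    """Proportional allocation: sample the same fraction within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for units in strata.values():
        k = max(1, round(len(units) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(units, k))
    return sample


def systematic_sample(frame, k, seed=None):
    """Random start within the first k units, then every kth unit."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return list(frame)[start::k]


def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Two-stage design: pick clusters at random, then sample units within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        c: rng.sample(clusters[c], min(n_per_cluster, len(clusters[c])))
        for c in chosen
    }
```

Passing a fixed `seed` makes a draw reproducible, which helps when documenting the sampling method and rationale.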

NON-PROBABILITY SAMPLING (for depth, not generalization):

  1. Purposive/Judgment Sampling

    • Researcher selects based on criteria
    • Use: Qualitative research, expert selection
    • Types: Maximum variation, typical case, extreme case
  2. Convenience Sampling

    • Whoever is available
    • Use: Pilot studies, exploratory research
    • Limitation: Cannot generalize
  3. Snowball Sampling

    • Participants recruit others
    • Use: Hidden or hard-to-reach populations
    • Limitation: Network effects, not random
  4. Quota Sampling

    • Fill quotas for subgroups
    • Use: Ensure demographic representation
    • Limitation: Selection within quota not random

SAMPLE SIZE DETERMINATION:

For quantitative:

  • Power analysis (see statistical_analysis procedure)
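  • Rule of thumb: 30+ per group, so that the central limit theorem approximation for sample means is reasonable (a heuristic, not a substitute for power analysis)
  • Survey research: Inflate the number of invitations to allow for expected non-response (invitations = target completes / expected response rate)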
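
As a rough sketch of the survey arithmetic, the code below computes a precision-based sample size for a proportion and then inflates it for non-response. The margin-of-error formula is the standard normal-approximation formula for simple random sampling from a large population; it is not taken from this procedure and does not replace a power analysis. The function names and the 25% response rate are assumptions.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin_of_error, confidence=0.95, p=0.5):
    """Completed responses needed to estimate a proportion within +/- margin_of_error.

    Uses n = z^2 * p * (1 - p) / e^2; p = 0.5 is the most conservative choice.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)


def invitations_needed(target_completes, expected_response_rate):
    """Inflate the target so enough completes remain after non-response."""
    if not 0 < expected_response_rate <= 1:
        raise ValueError("expected_response_rate must be in (0, 1]")
    return math.ceil(target_completes / expected_response_rate)


completes = sample_size_for_proportion(margin_of_error=0.05)       # 385
invites = invitations_needed(completes, expected_response_rate=0.25)  # 1540
```

With cluster sampling, the required number of completes would also need to be multiplied by the design effect, which this sketch does not estimate.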

For qualitative:

  • Saturation: Continue until no new themes emerge
  • Typical range: 12-30 interviews for a homogeneous group
  • More for heterogeneous populations

Document:

  • Sampling method and rationale
  • Target sample size with justification
  • Inclusion/exclusion criteria
  • Recruitment procedures

Step 3: Design survey instruments

Create surveys and questionnaires (if applicable):

QUESTION TYPES:

  1. Closed-ended questions

    • Multiple choice: Select one from options
    • Multiple response: Select all that apply
    • Rating scales: Likert, semantic differential
    • Ranking: Order options by preference
    • Pro: Easy to analyze; Con: May miss nuance
  2. Open-ended questions

    • Free text response
    • Pro: Rich data; Con: Harder to analyze
    • Use sparingly, typically near the end of the survey

SCALE DESIGN:

Likert scales:

  • Typically 5 or 7 points
  • Balanced (equal positive/negative options)
  • Include neutral midpoint (usually)
  • Label all points or just endpoints

Examples:

  • 5-point agreement: Strongly disagree to Strongly agree
  • 5-point frequency: Never to Always
  • 7-point satisfaction: Extremely dissatisfied to Extremely satisfied

QUESTION WRITING RULES:

DO:

  • Use simple, clear language
  • Ask one thing per question (no double-barreled)
  • Define terms that might be ambiguous
  • Ensure response options are exhaustive and mutually exclusive
  • Include “Not applicable” or “Don’t know” when appropriate

DON’T:

  • Use leading or loaded questions
  • Use double negatives
  • Assume knowledge the respondent may not have
  • Use jargon or technical terms without definition
  • Make questions too long

SURVEY STRUCTURE:

  1. Introduction

    • Purpose of study
    • Time estimate
    • Confidentiality statement
    • Consent information
  2. Screening questions

    • Verify eligibility
    • Route to appropriate sections
  3. Main questions

    • Start with easy, engaging questions
    • Group by topic
    • Place sensitive questions later
    • Logical flow within sections
  4. Demographics

    • Usually at end (unless needed for screening)
    • Only collect what’s needed
  5. Closing

    • Thank participant
    • Provide contact for questions

PILOT TESTING:

  • Test with 5-10 similar respondents
  • Check: Comprehension, timing, technical issues
  • Cognitive interviews: Ask respondents to think aloud
  • Revise based on feedback

Step 4: Design interview and focus group protocols

Create protocols for qualitative data collection (if applicable):

INTERVIEW TYPES:

  1. Structured interviews

    • Fixed questions in fixed order
    • All participants get same questions
    • Use: Systematic data for comparison
    • Easier to analyze; less depth
  2. Semi-structured interviews

    • Guide with key topics/questions
    • Flexibility to probe and follow up
    • Use: Balance structure and depth
    • Most common in qualitative research
  3. Unstructured interviews

    • Minimal predetermined questions
    • Conversation-like, participant-led
    • Use: Exploratory, understanding lived experience
    • Hardest to analyze; richest data

INTERVIEW GUIDE DESIGN:

Structure:

  1. Introduction (5 min)

    • Introduce yourself and study
    • Explain process, recording, confidentiality
    • Obtain consent
    • Warm-up question (easy, rapport-building)
  2. Main questions (majority of time)

    • Start broad, then narrow
    • Use open-ended questions
    • Group by theme
    • Include probes for each question
  3. Closing (5 min)

    • Wrap-up question (“Anything else?”)
    • Thank participant
    • Explain next steps

QUESTION TYPES FOR INTERVIEWS:

  • Grand tour: “Tell me about your experience with…”
  • Specific: “What happened when…?”
  • Contrast: “How does X compare to Y?”
  • Hypothetical: “What would happen if…?”
  • Devil’s advocate: “Some people say… What do you think?”

PROBES (follow-up techniques):

  • Elaboration: “Tell me more about that”
  • Clarification: “What do you mean by…?”
  • Example: “Can you give me an example?”
  • Silence: Pause and wait for more
  • Echo: Repeat last phrase as question

FOCUS GROUP CONSIDERATIONS:

Group composition:

  • Typically 6-10 participants
  • Homogeneous enough for comfort
  • Heterogeneous enough for diverse views

Moderator guide:

  • Opening activity/icebreaker
  • 5-6 main discussion questions
  • Activities if appropriate (card sorts, etc.)
  • Plan for dominant/quiet participants

Logistics:

  • Room setup (circle or U-shape)
  • Recording equipment (audio, video)
  • Note-taker role
  • Refreshments (builds rapport)

Step 5: Plan observational methods

Design systematic observation protocols (if applicable):

OBSERVATION TYPES:

  1. Participant observation

    • Researcher participates in setting
    • Insider perspective
    • Use: Ethnography, understanding culture
    • Challenge: Maintain objectivity
  2. Non-participant observation

    • Researcher observes without participating
    • Outsider perspective
    • Use: Behavior coding, unobtrusive measurement
    • Challenge: May miss context
  3. Structured observation

    • Predetermined behaviors to observe
    • Coding scheme with categories
    • Quantifiable data
    • Use: Testing specific hypotheses
  4. Unstructured observation

    • Open-ended field notes
    • Emergent categories
    • Qualitative data
    • Use: Exploration, discovery

DEVELOPING OBSERVATION PROTOCOL:

  1. Define what to observe

    • Specific behaviors, events, interactions
    • Physical environment
    • Time and duration
  2. Create coding scheme (for structured)

    • Mutually exclusive categories
    • Exhaustive (covers all possibilities)
    • Operational definitions for each code
    • Examples and non-examples
  3. Determine recording method

    • Real-time coding: Code as events occur
    • Interval recording: Code at fixed intervals
    • Event recording: Code each occurrence
    • Narrative: Write field notes
  4. Plan observation schedule

    • When and where to observe
    • Duration of sessions
    • Number of sessions needed
    • Sampling of times/locations

FIELD NOTES:

Include:

  • Date, time, location, duration
  • Who was present
  • What happened (descriptive)
  • Your interpretations (separate from description)
  • Questions and reflections

Write up:

  • During observation: Brief notes
  • Immediately after: Expand to full notes
  • Regular memos: Emerging themes and insights

INTER-RATER RELIABILITY:

For structured observation:

  • Train multiple observers
  • Have them code same sessions
  • Calculate agreement (Cohen’s kappa)
  • Target: kappa > 0.80 (a strict threshold; 0.61-0.80 is conventionally read as substantial agreement)
  • Resolve disagreements through discussion
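
For two observers and nominal codes, Cohen's kappa can be computed directly from the paired codes. The sketch below assumes exactly two raters who coded the same items in the same order; the function name and the example codes are made up for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same items with nominal codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same code by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: from each rater's marginal code frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1:
        return 1.0  # both raters used one identical code throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Made-up codes from two observers for ten observation intervals.
observer_1 = ["on", "on", "off", "on", "off", "off", "on", "on", "off", "on"]
observer_2 = ["on", "on", "off", "on", "off", "on", "on", "on", "off", "on"]
kappa = cohens_kappa(observer_1, observer_2)  # about 0.78
```

Here the observers agree on 9 of 10 intervals, yet kappa is about 0.78 because agreement expected by chance is 0.54. Raw percent agreement overstates reliability, which is the reason to report kappa.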

Step 6: Identify and evaluate secondary data

Assess and acquire existing data sources (if applicable):

TYPES OF SECONDARY DATA:

  1. Official statistics

    • Government data (census, BLS, CDC)
    • International organizations (World Bank, UN)
    • Pro: Large scale, authoritative
    • Con: May not match your exact needs
  2. Administrative data

    • Records created for operational purposes
    • Health records, school records, transaction logs
    • Pro: Covers the entire served population; does not depend on self-report
    • Con: Not designed for research, access issues
  3. Survey data archives

    • Data from previous surveys
    • ICPSR, Pew, GSS, etc.
    • Pro: Often high quality, documented
    • Con: Variables may not match your needs
  4. Research datasets

    • Data from previous studies
    • Shared by researchers for replication
    • Pro: Relevant to your field
    • Con: May have restrictions
  5. Commercial data

    • Purchased from data vendors
    • Market research, consumer behavior
    • Pro: Detailed, proprietary
    • Con: Expensive, unclear methodology

EVALUATION CRITERIA:

  1. Relevance

    • Does it measure what you need?
    • Does it cover your population?
    • Is the time period appropriate?
  2. Quality

    • How was data collected?
    • What is the sample size and response rate?
    • Are there known quality issues?
    • Is methodology documented?
  3. Availability

    • Is data publicly available?
    • What are access restrictions?
    • Cost to acquire?
    • Data use agreements required?
  4. Documentation

    • Is there a codebook?
    • Are variables well-defined?
    • Is methodology described?
    • Are limitations documented?

COMBINING DATA SOURCES:

  • Ensure compatible definitions
  • Harmonize coding schemes
  • Document any transformations
  • Be aware of different error structures
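
Harmonizing coding schemes usually comes down to an explicit mapping from each source's codes to one shared scheme, with every transformation logged. The codebooks, category labels, and function name below are hypothetical and only illustrate the pattern.

```python
# Hypothetical codebooks: source A stores education as integers, source B as text.
SOURCE_A_EDUCATION = {1: "less_than_secondary", 2: "secondary", 3: "tertiary"}
SOURCE_B_EDUCATION = {
    "no diploma": "less_than_secondary",
    "high school": "secondary",
    "college": "tertiary",
    "graduate": "tertiary",  # collapsed: source A has no separate graduate code
}


def harmonize(value, mapping, source, log):
    """Map a source-specific code to the shared scheme and log the transformation."""
    harmonized = mapping.get(value)  # unmapped codes stay missing, never guessed
    log.append((source, value, harmonized))
    return harmonized


transformations = []
a = harmonize(2, SOURCE_A_EDUCATION, "A", transformations)
b = harmonize("graduate", SOURCE_B_EDUCATION, "B", transformations)
c = harmonize("unknown", SOURCE_B_EDUCATION, "B", transformations)
```

The log doubles as the record of transformations, and any entry with a `None` target shows where the definitions of the two sources are not yet compatible.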

Step 7: Develop quality assurance procedures

Plan for ensuring data quality throughout collection:

PREVENTION (before/during collection):

  1. Training

    • Train all data collectors
    • Standardize procedures
    • Practice with mock sessions
    • Certification/testing
  2. Standardization

    • Written protocols for all procedures
    • Scripts for surveys/interviews
    • Decision rules for ambiguous situations
  3. Built-in checks

    • Validation rules in electronic surveys
    • Range checks (flag out-of-range values)
    • Consistency checks (flag contradictions)
    • Attention checks (catch inattentive respondents)
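
The built-in checks can be expressed as simple rules applied to each response as it arrives. In this sketch the field names (`age`, `employed`, `hours_worked`, `attention_item`) and the thresholds are hypothetical; a real survey platform would implement the same rules in its own validation settings.

```python
def check_response(response):
    """Return a list of flags for one survey response (a dict of answers)."""
    flags = []

    # Range check: flag out-of-range values.
    age = response.get("age")
    if age is not None and not 18 <= age <= 99:
        flags.append("range: age outside 18-99")

    # Consistency check: flag contradictory answers.
    hours = response.get("hours_worked") or 0
    if response.get("employed") == "no" and hours > 0:
        flags.append("consistency: not employed but reports hours worked")

    # Attention check: the item instructs respondents to select "agree".
    if response.get("attention_item") != "agree":
        flags.append("attention: failed instructed-response item")

    return flags
```

Returning flags instead of rejecting the response keeps the decision to correct, exclude, or retain with the researcher.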

MONITORING (during collection):

  1. Progress tracking

    • Monitor response rates
    • Track completion times
    • Identify problematic items/sections
  2. Ongoing review

    • Review first batch of data closely
    • Check for interviewer effects
    • Spot-check recordings/transcripts
  3. Regular calibration

    • Periodic retraining
    • Inter-rater reliability checks
    • Address drift in coding

CLEANING (after collection):

  1. Data validation

    • Check for duplicates
    • Verify range constraints
    • Check skip logic executed correctly
    • Validate against external data if possible
  2. Missing data assessment

    • Extent of missingness
    • Patterns (random vs. systematic, e.g. concentrated in particular items or subgroups)
    • Reasons for missingness
  3. Outlier identification

    • Statistical detection methods
    • Investigate flagged values
    • Decide: Correct, exclude, or retain
  4. Documentation

    • Data dictionary with all variables
    • Record all cleaning decisions
    • Version control for datasets
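
A minimal cleaning pass over a list of records might look like the sketch below. It assumes records are dictionaries with a `respondent_id` field (a hypothetical name) and uses the 1.5 x IQR rule as one possible statistical detection method, not the only one.

```python
from collections import Counter
from statistics import quantiles


def duplicate_ids(records, id_field="respondent_id"):
    """Return IDs that appear more than once."""
    counts = Counter(r[id_field] for r in records)
    return sorted(i for i, c in counts.items() if c > 1)


def missing_share(records, fields):
    """Share of records with a missing (absent or None) value, per field."""
    n = len(records)
    return {f: sum(r.get(f) is None for r in records) / n for f in fields}


def iqr_outliers(values, k=1.5):
    """Values more than k * IQR below the first or above the third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < low or v > high]


records = [
    {"respondent_id": 1, "age": 30},
    {"respondent_id": 2, "age": None},
    {"respondent_id": 2, "age": 41},
    {"respondent_id": 3},
]
dupes = duplicate_ids(records)               # [2]
missing = missing_share(records, ["age"])    # {"age": 0.5}
flagged = iqr_outliers([10, 12, 11, 13, 12, 11, 95])  # [95]
```

These functions only flag records. Flagged values still need to be investigated, and the resulting decision recorded, before anything is corrected or excluded.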

RESPONSE RATE AND NON-RESPONSE:

  • Track response rate: completed responses / eligible units contacted
  • Analyze non-response: Do responders differ from non-responders?
  • Non-response bias analysis if possible
  • Document efforts to maximize response

Step 8: Address ethical considerations

Ensure ethical data collection:

CORE ETHICAL PRINCIPLES:

  1. Respect for persons (autonomy)

    • Informed consent
    • Voluntary participation
    • Right to withdraw
    • Protect those with diminished autonomy
  2. Beneficence

    • Maximize benefits
    • Minimize harm
    • Balance risks and benefits
  3. Justice

    • Fair selection of participants
    • Equitable distribution of burdens and benefits
    • No exploitation of vulnerable groups

INFORMED CONSENT:

Must include:

  • Purpose of research
  • Procedures involved
  • Duration of participation
  • Risks and benefits
  • Confidentiality protections
  • Voluntary nature, right to withdraw
  • Contact information for questions

Documentation:

  • Written consent for most research
  • Verbal consent with a waiver of written documentation, if justified
  • Online: Click-through consent
  • Keep consent forms separate from data

Special populations:

  • Children: Parental consent + child assent
  • Cognitively impaired: Legally authorized representative
  • Prisoners: Special protections required

CONFIDENTIALITY AND PRIVACY:

Data protection:

  • De-identify data (remove names, ID numbers, and other direct identifiers)
  • Secure storage (encryption, access controls)
  • Limited access (need-to-know basis)
  • Secure disposal when no longer needed
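
One common way to de-identify records while still being able to link a participant's data is to drop direct identifiers and replace the ID with a keyed hash. The sketch below uses the standard `hmac` and `hashlib` modules; the field names, the 16-character truncation, and the example key are assumptions. Removing direct identifiers does not by itself guarantee anonymity, because combinations of remaining fields can still identify people.

```python
import hashlib
import hmac

DIRECT_IDENTIFIERS = {"name", "email", "phone"}  # hypothetical field names


def pseudonym(participant_id, secret_key):
    """Keyed SHA-256 hash of the ID, truncated to 16 hex characters."""
    digest = hmac.new(secret_key, str(participant_id).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record, secret_key, id_field="participant_id"):
    """Drop direct identifiers and replace the ID with a pseudonym."""
    cleaned = {
        k: v for k, v in record.items()
        if k not in DIRECT_IDENTIFIERS and k != id_field
    }
    cleaned["pseudo_id"] = pseudonym(record[id_field], secret_key)
    return cleaned


key = b"example-key-store-separately"  # placeholder; keep the real key out of the dataset
raw = {"participant_id": 17, "name": "A. Person", "email": "a@example.org", "score": 4}
safe = deidentify(raw, key)
```

The key should be stored separately from the data under the same access controls as the consent forms, since anyone holding it can recompute the pseudonyms from known IDs.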

Anonymity vs. confidentiality:

  • Anonymous: You don’t know who provided data
  • Confidential: You know but protect identity

Reporting:

  • Never report individual-level identifying data
  • Suppress small cells in tables
  • Use pseudonyms in qualitative reports
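
Small-cell suppression can be applied mechanically before a table is published. In this sketch the threshold of 5 is an assumption; the required value depends on the data provider or ethics board. The function name and example counts are also made up.

```python
def suppress_small_cells(table, threshold=5):
    """Replace counts below threshold with a marker such as "<5".

    table maps a cell label to a count.
    """
    marker = f"<{threshold}"
    return {
        cell: (count if count >= threshold else marker)
        for cell, count in table.items()
    }


counts = {"urban": 120, "suburban": 47, "rural": 3}
published = suppress_small_cells(counts)
# {"urban": 120, "suburban": 47, "rural": "<5"}
```

If row or column totals are published alongside the table, a suppressed cell can sometimes be recovered by subtraction, so additional cells may need suppressing. This sketch does not handle that case.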

IRB/ETHICS REVIEW:

Categories:

  • Exempt: Minimal risk, standard procedures
  • Expedited: Minimal risk, more complex
  • Full board: Greater than minimal risk

Prepare:

  • Protocol describing all procedures
  • Consent documents
  • Data collection instruments
  • Data security plan

Ongoing:

  • Report adverse events
  • Get approval for protocol changes
  • Continuing review (often annual) if the study is ongoing and the IRB requires it

When to Use

  • Planning primary data collection for research
  • Designing surveys or questionnaires
  • Preparing for interviews or focus groups
  • Setting up observational studies
  • Acquiring and evaluating secondary data
  • Determining sampling strategy for a study
  • Validating existing measures or creating new ones

Verification

  • Data requirements clearly linked to research questions
  • Sampling strategy appropriate and justified
  • Instruments measure intended constructs
  • Quality assurance procedures in place
  • Ethical requirements addressed
  • All procedures documented for reproducibility

Input: $ARGUMENTS

Apply this procedure to the input provided.