Research Methodology

Routing Summary

This folder covers statistical methodology, the replication crisis, causal inference challenges, experimental design, and longitudinal methods. It contains 14 notes plus an Experimental Design subfolder (3 notes) and a Pre-registration and Open Science subfolder (5 notes).

Concept Map

| Concept | Note | Type | Depends On | Key Result |
| --- | --- | --- | --- | --- |
| Multiple comparisons without explicit p-hacking | Garden of Forking Paths | concept | The Experimental Ideal, Research Questions in Econometrics | Data-contingent analysis invalidates p-values even without p-hacking |
| Sources of analytic flexibility inflating false positives | Researcher Degrees of Freedom | concept | Garden of Forking Paths, The Experimental Ideal, Omitted Variables Bias | Every analytic choice is a hidden comparison |
| Bayesian/hierarchical solutions to multiplicity | Forking Paths and Bayesian Approaches | concept | Garden of Forking Paths, Researcher Degrees of Freedom, Multiple Testing Corrections | Hierarchical models naturally regularize multiple comparisons |
| Multilevel models as structural multiple comparisons solution | Multiple Comparisons - Bayesian Perspective | overview | Multiple Testing Corrections, Hierarchical Models, Garden of Forking Paths | Partial pooling replaces classical corrections; adapts to variance ratio |
| Observational methods overestimate ad effects | Activity Bias in Advertising | concept | Conditional Independence Assumption, The Selection Problem, The Experimental Ideal, Omitted Variables Bias | Activity bias causes 10-1000x overestimation |
| Why regression and matching fail for ads | Observational vs Experimental Methods in Advertising | concept | Activity Bias in Advertising, The Selection Problem, Conditional Independence Assumption, Regression and the CEF, Instrumental Variables | No observational method recovers true ad effect |
| Within/between-persons and causal inference | Within-Between Persons Distinction - Overview | overview | Potential Outcomes Framework, The Selection Problem | Within/between distinction informative but not decisive; start from estimands |
| When within-persons data helps for causal inference | Within-Between Persons Causal Inference | concept | Potential Outcomes Framework, The Selection Problem | Between-persons (RCT) recovers ATE; within-persons eliminates time-invariant confounders but not time-varying |
| Fixed-effects model | Fixed-Effects Model | concept | Directed Acyclic Graphs, Within-Between Persons Causal Inference | Controls time-invariant confounders; assumes no lagged dynamics, no time-varying confounders |
| Cross-lagged and dynamic panel models | Cross-Lagged and Dynamic Panel Models | concept | Fixed-Effects Model, Directed Acyclic Graphs | CLPM targets lagged reciprocal effects; DPM adds time-invariant confounding control; both assume no contemporaneous effects |
| Estimands in longitudinal research | Estimands in Longitudinal Research | concept | Potential Outcomes Framework, Causal Estimands | Define estimand before model; psychological constructs are “fat-handed”; consistency violations common |
| Table 2 Fallacy | Table 2 Fallacy | concept | DAGs and Causal Identification, Potential Outcomes Framework | Adjustment set identifies one causal path only; confounder coefficients are almost always biased |
| Logic of regression adjustment | Logic of Regression Adjustment | concept | Potential Outcomes Framework, DAGs and Causal Identification | Regression adjustment recovers PATE for treatment only; adjustment set is a sacrifice, not a co-effect estimator |
| Nuisance parameter bias (simulation) | Nuisance Parameter Bias Simulation | example | Table 2 Fallacy, Logic of Regression Adjustment | 90% CI coverage 0–29% for confounders; bias does not shrink as |
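
The fixed-effects entry in the concept map says that within-person analysis controls time-invariant confounders. A minimal sketch of that claim, under assumptions that are not taken from the notes: a hypothetical data-generating process in which one stable trait per person drives both the exposure and the outcome, a true effect of 0.5, and unit-variance Gaussian noise. The function names (`slope`, `simulate_panel`) and all parameter values are illustrative only.

```python
import random


def slope(xs, ys):
    """OLS slope of ys on xs (one regressor plus an intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate_panel(n_persons=3000, n_waves=5, beta=0.5, seed=1):
    """Return (pooled slope, within-person slope) for a toy panel."""
    rng = random.Random(seed)
    x_raw, y_raw, x_within, y_within = [], [], [], []
    for _ in range(n_persons):
        # Assumed DGP: a stable, unobserved trait confounds x and y.
        trait = rng.gauss(0, 1)
        xs = [trait + rng.gauss(0, 1) for _ in range(n_waves)]
        ys = [beta * x + 2.0 * trait + rng.gauss(0, 1) for x in xs]
        # Within transformation: subtract each person's own mean.
        mx, my = sum(xs) / n_waves, sum(ys) / n_waves
        x_raw += xs
        y_raw += ys
        x_within += [x - mx for x in xs]
        y_within += [y - my for y in ys]
    return slope(x_raw, y_raw), slope(x_within, y_within)


pooled, within = simulate_panel()
print(f"pooled slope: {pooled:.2f}")
print(f"within slope: {within:.2f}")
```

Under this assumed process the pooled slope should land near 1.5 (the true 0.5 plus a confounding bias of 1), while the within-person slope should land near 0.5. The sketch contains no time-varying confounder or lagged dynamics, so it illustrates only what the fixed-effects approach does control, not the limitations listed for it.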

Sub-topics

  • Experimental Design — Power analysis, multiple testing corrections, survival analysis (3 notes)
  • Pre-registration and Open Science — Prediction vs postdiction, pre-registration vs registered reports, pre-analysis plans & the OSF/AsPredicted/AEA ecosystem, limits & objections (5 notes; Nosek et al. 2018)

Notes

  • Garden of Forking Paths — CONTAINS: Multiple comparisons as implicit forking, data-contingent analysis, why p-values are invalid when analysis is flexible
  • Researcher Degrees of Freedom — CONTAINS: Sources of analytic flexibility, exclusion criteria, variable transformations, model specification choices
  • Forking Paths and Bayesian Approaches — CONTAINS: Bayesian solutions to multiplicity, hierarchical regularization, partial pooling as natural correction
  • Activity Bias in Advertising — CONTAINS: Three experiments showing observational methods fail, 10-1000x overestimation, selection bias in ad measurement
  • Observational vs Experimental Methods in Advertising — CONTAINS: Regression controls and matching failing, case studies from Lewis/Rao/Reiley experiments
  • Multiple Comparisons - Bayesian Perspective — CONTAINS: Argument against classical corrections, IHDP multi-site example, state test scores, 8 schools simulation, fishing for significance, subgroup effects, multiple outcomes
  • Within-Between Persons Distinction - Overview — CONTAINS: 3 main claims (between-persons can inform ATE; within-persons not sufficient; within-persons can be helpful), 3 longitudinal models table, central recommendation to start from estimands
  • Within-Between Persons Causal Inference — CONTAINS: potential outcomes proof that between-persons RCT recovers ATE; time-varying confounders problem in FE; 3 reasons within-persons is still helpful; confounding at each level table
  • Fixed-Effects Model — CONTAINS: FE causal DAG (Box 1), what FE controls/doesn’t control table, 3 causal assumptions for identification, 5 limitations (lagged dynamics, time-varying confounders, heterogeneous slopes, reciprocal dynamics, consistency)
  • Cross-Lagged and Dynamic Panel Models — CONTAINS: CLPM definition + Granger causality, CLPM bias from stable traits (Box 2), DPM/RI-CLPM definition (Box 3), comparison table (FE vs CLPM vs DPM), shared assumption of no contemporaneous effects, time lag misspecification
  • Estimands in Longitudinal Research — CONTAINS: theoretical estimand definition, recommended 5-step workflow (estimand → assumptions → plausibility → model → interpretation), consistency challenge for psychological constructs (Box 4), fat-handed treatments, causal web problem
  • Table 2 Fallacy — CONTAINS: definition (Westreich & Greenland 2013), core argument for non-joint identification, theorem on single-path identification, downstream scientific consequences, conditions under which multiple paths can be jointly identified
  • Logic of Regression Adjustment — CONTAINS: potential outcomes setup, PATE definition, what adjustment identifies vs. doesn’t, single-path theorem, “adjustment set as sacrifice” framing, strategies for identifying multiple paths
  • Nuisance Parameter Bias Simulation — CONTAINS: DAG DGP with 4 measured confounders + 2 unobserved confounders, R/Python data simulation code, full Stan model, 90% CI coverage table (5 parameters × 2 conditions × 3 sample sizes), error-of-magnitude results, “big data is not a substitute” finding
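
The Table 2 Fallacy and Nuisance Parameter Bias Simulation entries above can be illustrated with a much smaller sketch than the one the note describes. The following is not the note's data-generating process (4 measured and 2 unobserved confounders, fitted in Stan); it assumes a hypothetical setup with one measured confounder `z`, one unobserved variable `u`, plain OLS, and made-up coefficients chosen for easy arithmetic.

```python
import random


def ols_two_regressors(x, z, y):
    """Return (b_x, b_z) from OLS of y on x and z with an intercept."""
    n = len(y)
    mx, mz, my = sum(x) / n, sum(z) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    szy = sum((b - mz) * (c - my) for b, c in zip(z, y))
    # Solve the 2x2 normal equations in closed form.
    det = sxx * szz - sxz ** 2
    b_x = (szz * sxy - sxz * szy) / det
    b_z = (sxx * szy - sxz * sxy) / det
    return b_x, b_z


def simulate(n=20000, seed=7):
    """Assumed DGP: u -> z, u -> y, z -> x, z -> y, x -> y."""
    rng = random.Random(seed)
    x, z, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)               # unobserved cause of z and y
        zi = 1.0 * u + rng.gauss(0, 1)    # measured confounder
        xi = 0.8 * zi + rng.gauss(0, 1)   # treatment
        yi = 0.5 * xi + 0.3 * zi + 0.6 * u + rng.gauss(0, 1)
        x.append(xi)
        z.append(zi)
        y.append(yi)
    return ols_two_regressors(x, z, y)


b_x, b_z = simulate()
print(f"treatment coefficient:  {b_x:.2f}")
print(f"confounder coefficient: {b_z:.2f}")
```

In this assumed setup, adjusting for `z` blocks every backdoor path from `x` to `y`, so the treatment coefficient should land near the true 0.5. The coefficient on `z` should land near 0.6, which matches neither its direct effect (0.3) nor its total effect (0.3 + 0.8 × 0.5 = 0.7), because `u` confounds it. Collecting more rows tightens both estimates around those same values; it does not move the confounder coefficient toward either causal quantity.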

Sources

  • p_hacking.pdf — “The Garden of Forking Paths” (Gelman & Loken, 2013)
  • ssrn-2080235.pdf — “Here, There, and Everywhere” (Lewis, Rao, & Reiley, 2011)
  • multiple2f.pdf — “Why we (usually) don’t have to worry about multiple comparisons” (Gelman, Hill & Yajima, 2009)
  • rohrer-murayama-2023.pdf — “These Are Not the Effects You Are Looking For” (Rohrer & Murayama, 2023, AMPPS 6(1))
  • These Are Not the Effects You Are Looking For — A. Jordan Nafa (2022), blog post: Table 2 Fallacy, logic of statistical control, mutual adjustment, simulation study with R/Python/Stan demonstrating nuisance parameter bias (2026-06-26)

See Also