What Is Specification Shopping?
Specification shopping is the practice of running many model variations and reporting only the ones that produce results stakeholders expect to see. It is the single biggest threat to the integrity of marketing measurement.
What specification shopping looks like:
- Analyst runs 30 model variations
- Only 3 show TV ROI above 1.0
- Those 3 are presented as “the model”
- Results look precise but are selection artifacts
- Budget decisions made on false confidence
What rigorous measurement looks like:
- Model specification defined before seeing results
- Uncertainty honestly reported as probability ranges
- All reasonable models considered, not just “good” ones
- Sensitivity to modeling choices tested and reported
- Budget decisions account for genuine uncertainty
Why This Matters
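When you test many specifications and select based on results, standard statistical inference no longer holds. A confidence interval is calibrated for a single analysis planned in advance; when the reported model is the winner of a search, the interval ignores the selection step and understates the real uncertainty. The process also systematically selects for confirming rather than disconfirming evidence.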
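A small simulation illustrates the selection effect. It is a sketch under stated assumptions, not a model of any real dataset: the true ROI (0.8), the noise level, and the idea that each specification behaves like an independent noisy estimate are all illustrative choices. In practice, specifications fit to the same data are correlated, which changes the size of the bias but not its direction.

```python
import random
import statistics

def simulate_shopping(true_roi=0.8, noise_sd=0.3, n_specs=30, n_trials=2000, seed=0):
    """Compare the average over all specifications with the average over
    only those specifications whose estimated ROI clears 1.0."""
    rng = random.Random(seed)
    honest, shopped = [], []
    for _ in range(n_trials):
        # Assumption: each specification yields the true ROI plus Gaussian noise.
        estimates = [rng.gauss(true_roi, noise_sd) for _ in range(n_specs)]
        honest.append(statistics.mean(estimates))
        winners = [e for e in estimates if e > 1.0]
        if winners:  # skip the rare trial in which no specification clears 1.0
            shopped.append(statistics.mean(winners))
    return statistics.mean(honest), statistics.mean(shopped)

honest_roi, shopped_roi = simulate_shopping()
print(f"average over all specifications: {honest_roi:.2f}")
print(f"average over reported 'winners': {shopped_roi:.2f}")
```

The average over all specifications stays close to the assumed true value, while the average over the selected ones sits above 1.0 by construction, even though the channel does not pay back in this simulated world.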
Who Is Affected
Specification shopping is pervasive across the marketing measurement ecosystem. Each role faces distinct pressures that encourage the practice and distinct consequences when it fails.
Ad Agencies
Under pressure to demonstrate ROI for the media they placed. Models that show poor performance threaten client relationships and revenue.
Model Shops
Hired to build models that “make sense.” When results contradict expectations, there is pressure to adjust until they align with priors.
Brands & Advertisers
Rely on model outputs to allocate millions in media spend. False precision from specification shopping leads to misallocated budgets.
For Ad Agencies
Agencies face a structural conflict of interest: they are often asked to measure the effectiveness of media they themselves placed. This creates subtle but powerful incentives to find positive results.
The Incentive Problem
When your revenue depends on clients continuing to spend on channels you manage, models that show low ROI for those channels threaten your business. This creates unconscious bias in modeling decisions even among well-intentioned analysts.
How It Manifests
- Adjusting adstock decay rates until a channel “looks right”
- Removing control variables that reduce media coefficients
- Zeroing out negative effects and calling them “non-significant”
- Choosing between model variants based on which gives the “most reasonable” ROIs
The Opportunity for Agencies
Agencies that adopt transparent, pre-specified modeling methodologies differentiate themselves from competitors. When your models are validated against holdout experiments, you can demonstrate genuine value rather than asserted value. This builds deeper, more durable client relationships.
For Model Shops
Dedicated analytics firms face the “client satisfaction versus scientific rigor” tension daily. The commercial pressure to deliver results that “make sense” is real, but it comes at a cost.
The Credibility Trap
If clients hire you expecting certain results and you deliver models that confirm expectations, no one complains. But when two different model shops produce contradictory results for the same brand using the same data, the industry’s credibility erodes.
The Validation Gap
Most model shops cannot point to systematic validation of their predictions. How often are model-implied ROIs tested against controlled experiments? Without this feedback loop, there is no mechanism to distinguish good models from specification-shopped ones.
Common Specification Shopping Practices
| Practice | Why It’s Done | Why It’s Harmful |
|---|---|---|
| Zeroing out negative media effects | “Media can’t have negative ROI” | Systematically biases all estimates upward; makes uncertainty invisible |
| Tuning adstock until results look right | “Domain knowledge about decay rates” | When done after seeing results, it’s a form of p-hacking; invalidates inference |
| Dropping control variables that reduce media effects | “Multicollinearity issues” | Omitting confounders inflates causal estimates; leads to incorrect attribution |
| Selecting “best” model from many candidates | “Model selection is standard practice” | Winner’s curse: the selected model overestimates effect sizes; reported uncertainty is too narrow |
| Adjusting priors to match expected ROIs | “Incorporating domain knowledge” | When done iteratively after seeing posteriors, this is Bayesian specification shopping; the posterior no longer reflects an honest belief update |
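The row on dropping control variables can be illustrated with simulated data. This is a minimal sketch: the variable names, the true media effect of 0.5, and the strength of the seasonal confounder are assumptions chosen for the example. The controlled estimate uses partial regression (residualizing both media and sales on the control), which gives the same media coefficient as a regression that includes the control.

```python
import random

def mean(values):
    return sum(values) / len(values)

def slope(x, y):
    """Ordinary least squares slope of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(target, control):
    """Remove the part of `target` that is linearly explained by `control`."""
    b = slope(control, target)
    mc, mt = mean(control), mean(target)
    return [(t - mt) - b * (c - mc) for t, c in zip(target, control)]

rng = random.Random(1)
n = 5000
TRUE_MEDIA_EFFECT = 0.5  # assumed ground truth for the simulation

season = [rng.gauss(0, 1) for _ in range(n)]
# Media spend rises with seasonal demand, so season confounds the media effect.
media = [0.8 * s + rng.gauss(0, 0.6) for s in season]
sales = [TRUE_MEDIA_EFFECT * m + 2.0 * s + rng.gauss(0, 1)
         for m, s in zip(media, season)]

# Specification 1: the seasonal control is dropped.
naive_effect = slope(media, sales)
# Specification 2: the seasonal control is kept (partial regression).
controlled_effect = slope(residualize(media, season), residualize(sales, season))

print(f"control dropped: {naive_effect:.2f}")
print(f"control kept:    {controlled_effect:.2f}")
```

With the control dropped, the media coefficient absorbs the seasonal lift and lands far above the assumed true effect; with the control kept, it recovers a value close to it.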
For Brands & Advertisers
As the end consumers of marketing measurement, brands are both the primary victims and the primary beneficiaries of improved methodology. Understanding what questions to ask is the first step toward better outcomes.
The Budget Impact
If your model overstates TV ROI by 40% due to specification shopping, you are systematically over-investing in television at the expense of other channels. Over a year, this can mean millions of dollars in misallocated spend.
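The arithmetic behind that claim can be sketched in a few lines. The spend level and true ROI below are hypothetical inputs chosen for illustration; only the 40% overstatement comes from the text above.

```python
def misallocation(spend, true_roi, overstatement):
    """Return (reported_return, actual_return, shortfall) for one channel.

    `overstatement` is the relative bias in the reported ROI,
    e.g. 0.40 when the model overstates ROI by 40%.
    """
    reported_roi = true_roi * (1 + overstatement)
    reported_return = spend * reported_roi
    actual_return = spend * true_roi
    return reported_return, actual_return, reported_return - actual_return

# Hypothetical inputs: $5M annual TV spend and a true ROI of 1.0.
reported, actual, shortfall = misallocation(
    spend=5_000_000, true_roi=1.0, overstatement=0.40
)
print(f"return the model promises: ${reported:,.0f}")
print(f"return actually delivered: ${actual:,.0f}")
print(f"shortfall:                 ${shortfall:,.0f}")
```

Under these assumptions the plan counts on $2M of return that never materializes, before considering what the same money might have earned in a better-measured channel.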
The Year-over-Year Problem
Have you noticed that model results change dramatically year over year even when strategy is stable? This instability is often a sign of specification shopping: it reflects different analysts making different ad hoc choices, not a real change in market dynamics.
Financial Consequences
The Compounding Effect
Specification shopping doesn’t just produce a single bad estimate. It produces a systematically biased view of your entire media portfolio. The channels that appear most effective are often the ones where the model had the most room to be optimistic—typically those with the least experimental validation.
Credibility Risk
The marketing measurement industry faces a growing credibility crisis. As data science matures in other domains and clients become more sophisticated, the gap between standard practice and scientific rigor becomes harder to ignore.
How to Detect Specification Shopping
Whether you are commissioning a model or reviewing one, these indicators suggest specification shopping may have occurred.
No negative media effects anywhere
If every channel shows positive ROI, ask: was positivity imposed as a constraint, or did the data show it? In reality, some channels in some time periods may show negligible or negative incremental effects, especially when over-saturated.
Extremely narrow confidence intervals
If the model says TV ROI is 1.42 (1.38–1.46), ask how this precision was achieved. With typical MMM data, usually a few years of weekly observations with correlated channels, genuine uncertainty is much wider. Artificially narrow intervals are a hallmark of selecting among specifications.
Results perfectly match prior expectations
If every result aligns with what the client expected, ask what would have been reported if results contradicted expectations. A model that always confirms priors is a mirror, not a measurement tool.
Dramatic year-over-year changes with no clear driver
If last year’s model showed TV was strongest and this year shows digital is strongest—with no change in strategy—the modeling process itself is likely the source of variation.
No holdout validation or experimental calibration
If model predictions have never been tested against controlled experiments, there is no empirical basis for trusting the results. In-sample fit measures like R-squared do not validate causal claims.
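The last indicator can be turned into a concrete check: does an experimental estimate fall inside the interval the model implies? The sketch below assumes you already have posterior draws of a channel's ROI and a point estimate from a geo lift test. The draws here are random stand-ins, the function names are hypothetical, and the check ignores the experiment's own uncertainty, which a real comparison should include.

```python
import random
import statistics

def interval_90(draws):
    """Equal-tailed 90% interval from posterior draws."""
    cuts = statistics.quantiles(draws, n=20)  # cut points at 5%, 10%, ..., 95%
    return cuts[0], cuts[-1]

def experiment_check(model_draws, experiment_roi):
    """Report whether the experimental ROI lies inside the model's 90% interval."""
    low, high = interval_90(model_draws)
    return {"low": low, "high": high, "consistent": low <= experiment_roi <= high}

# Hypothetical inputs: stand-in posterior draws and a geo lift estimate.
rng = random.Random(7)
tv_roi_draws = [rng.gauss(1.4, 0.25) for _ in range(4000)]
result = experiment_check(tv_roi_draws, experiment_roi=0.7)

print(f"model 90% interval: {result['low']:.2f} to {result['high']:.2f}")
print(f"consistent with experiment: {result['consistent']}")
```

A mismatch like the one in this example does not prove specification shopping, but it is exactly the kind of evidence that in-sample fit statistics cannot provide.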
A Better Approach
The MMM Framework is built from the ground up to eliminate specification shopping while producing genuinely useful business insights.
Common practice:
- Run many models, report “the best one”
- Point estimates with false precision
- Post hoc adjustments to “fix” results
- No experimental validation
- Different analysts, different results
- Confidence comes from presentation, not evidence
The MMM Framework approach:
- Pre-specify model before seeing results
- Full posterior distributions with honest uncertainty
- Bayesian priors encode domain knowledge transparently
- Built-in experimental calibration support
- Reproducible: same data, same results
- Confidence comes from validated predictions
The Business Case for Rigor
Organizations that adopt rigorous measurement practices don’t just get better models—they get better decisions. When you know which estimates are confident and which are uncertain, you can invest in experiments where they matter most, allocate budget based on validated effects, and build a compounding knowledge advantage over competitors who rely on specification-shopped results.
Questions to Ask Your Modeling Partner
Whether you are evaluating a new vendor or auditing existing work, these questions help distinguish rigorous measurement from specification shopping.
The gold standard is a pre-registered analysis plan that specifies model structure, priors, and decision criteria before the model is fit to data. Ask for the analysis plan and compare it to the final delivered model.
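One lightweight way to make that comparison mechanical is to fingerprint the plan when it is registered and again at delivery. This is a sketch of the idea, not a feature of the MMM Framework: the plan fields and the `plan_fingerprint` helper are hypothetical, and a real plan would list whatever the team commits to in advance.

```python
import hashlib
import json

def plan_fingerprint(plan):
    """SHA-256 of the plan, serialized with sorted keys so key order cannot change it."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical plan, recorded before the model is fit to data.
registered_plan = {
    "response": "weekly_sales",
    "media_channels": ["tv", "search", "social"],
    "controls": ["price", "seasonality", "distribution"],
    "adstock": {"type": "geometric", "decay_prior": "Beta(2, 2)"},
    "decision_rule": "shift budget only if the 90% interval excludes current ROI",
}
registered_hash = plan_fingerprint(registered_plan)

# Later: the specification actually used for the delivered model.
# Here a control variable has been quietly dropped.
delivered_plan = dict(registered_plan, controls=["price", "seasonality"])
matches = plan_fingerprint(delivered_plan) == registered_hash

print("delivered model matches registered plan:", matches)
```

A mismatch is not automatically a problem, since plans sometimes need to change, but it forces the deviation to be disclosed and justified rather than hidden.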
There is nothing wrong with testing multiple specifications, but all should be reported. If 30 models were run and only 1 is presented, the uncertainty is vastly understated. Ask for a sensitivity analysis showing how results change across reasonable specifications.
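A sensitivity analysis of this kind can be sketched as a loop over a grid of specifications declared in advance, with every result kept. The example below varies only the geometric adstock decay rate and uses simulated data with an assumed ground truth (decay 0.5, coefficient 0.2); a real analysis would vary more choices and use a full model rather than a single-variable regression.

```python
import random

def adstock(spend, decay):
    """Geometric adstock: each week carries over `decay` of last week's stock."""
    stock, out = 0.0, []
    for x in spend:
        stock = x + decay * stock
        out.append(stock)
    return out

def ols_slope(x, y):
    """Ordinary least squares slope of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Simulated data with assumed ground truth: decay 0.5, coefficient 0.2.
rng = random.Random(3)
weeks = 520
spend = [rng.uniform(0, 100) for _ in range(weeks)]
sales = [50 + 0.2 * a + rng.gauss(0, 3) for a in adstock(spend, 0.5)]

# Fit every specification in the pre-declared grid and keep all of them.
decay_grid = [0.1, 0.3, 0.5, 0.7, 0.9]
implied_roi = {}
for decay in decay_grid:
    beta = ols_slope(adstock(spend, decay), sales)
    # Long-run return per unit of spend: beta * (1 + decay + decay^2 + ...).
    implied_roi[decay] = beta / (1 - decay)

for decay, roi in implied_roi.items():
    print(f"decay={decay:.1f}  implied ROI={roi:.2f}")
print(f"range: {min(implied_roi.values()):.2f} to {max(implied_roi.values()):.2f}")
```

Reporting the whole table, rather than the single row that looks best, lets the reader see how much of the headline number depends on a modeling choice.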
The most powerful validation is comparing model-implied predictions against holdout experiments (geo lift tests, randomized controlled trials). If the modeling partner cannot point to any experimental validation, the model’s causal claims are untested.
If the answer is “we constrain it to be positive” or “we adjust the model,” this is specification shopping. Negative effects are valid findings that indicate over-saturation, poor creative, or confounding. Honest measurement sometimes delivers unwelcome news.
If the answer is “we report point estimates,” push for credible intervals. If intervals seem implausibly narrow, ask what assumptions produce that precision. Genuine Bayesian credible intervals for MMM are typically wide enough to affect optimization decisions.
Reproducibility is the minimum bar for scientific claims. If the modeling partner cannot provide code that reproduces their results, the work cannot be independently verified. The MMM Framework is fully open source and reproducible by design.
Getting Started with Rigorous Measurement
Whether you are an agency, model shop, or brand, the transition to rigorous measurement follows a common path.
Assess your current practices
Review your existing modeling workflow against the detection criteria above. Identify where post hoc adjustments are made and where specification choices are data-driven rather than pre-specified.
Start with one project
Pick a single client or brand and run the full rigorous workflow alongside your existing approach. Compare the results and understand the differences.
Design validation experiments
Use model predictions to design geo lift tests or holdout experiments. This creates the feedback loop needed to distinguish working models from non-working ones.
Communicate uncertainty as a feature
Train stakeholders to see honest uncertainty ranges as more valuable than false precision. Saying “we are confident TV ROI is between 1.1 and 1.8” enables better decisions than saying “TV ROI is 1.42.”
Build organizational capability
Invest in training your team on Bayesian methods, causal inference, and the MMM Framework. This is a long-term competitive advantage, not just a tool change.
Key Takeaway
The marketing measurement industry is moving toward greater rigor. Organizations that lead this transition will build differentiated capabilities and client relationships grounded in demonstrated rather than asserted credibility. The MMM Framework provides the tools—the rest is organizational commitment to honesty.
Ready to Learn More?
Explore the step-by-step modeling guide for implementing statistically sound models, or read the guide to interpreting results for help communicating findings to media planners and CMOs.