What Is Specification Shopping?

Specification shopping is the practice of running many model variations and reporting only those that produce the results stakeholders expect to see. It is the single biggest threat to the integrity of marketing measurement.

⚠️ How It Happens
  • Analyst runs 30 model variations
  • Only 3 show TV ROI above 1.0
  • Those 3 are presented as “the model”
  • Results look precise but are selection artifacts
  • Budget decisions made on false confidence

What Should Happen
  • Model specification defined before seeing results
  • Uncertainty honestly reported as probability ranges
  • All reasonable models considered, not just “good” ones
  • Sensitivity to modeling choices tested and reported
  • Budget decisions account for genuine uncertainty

Why This Matters

When you test many specifications and select among them based on the results, standard statistical inference is invalidated. Confidence intervals computed for the selected model ignore the selection step, so a nominal 95% interval covers the true value less often than 95% of the time. The process systematically favors confirming over disconfirming evidence.
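
A small simulation can make the selection effect concrete. The sketch below is illustrative only: the true ROI, the standard error, and the number of specifications are hypothetical values, and it treats each specification as an independent noisy estimate, which is a simplification (real specifications fit to the same data are correlated).

```python
import numpy as np

rng = np.random.default_rng(0)

TRUE_ROI = 0.8        # hypothetical true value, chosen for illustration
STD_ERROR = 0.3       # hypothetical sampling noise of each specification's estimate
N_SPECS = 30          # model variations tried per analysis
N_ANALYSES = 10_000   # number of simulated analyses

# Each row is one analysis; each column is one specification's ROI estimate.
# Simplifying assumption: specifications are independent noisy estimates.
estimates = rng.normal(TRUE_ROI, STD_ERROR, size=(N_ANALYSES, N_SPECS))

honest = estimates[:, 0]          # report the single pre-specified model
shopped = estimates.max(axis=1)   # report whichever specification looks best


def coverage(reported):
    """Share of nominal 95% intervals (estimate +/- 1.96 SE) containing TRUE_ROI."""
    return np.mean(np.abs(reported - TRUE_ROI) <= 1.96 * STD_ERROR)


print(f"pre-specified: mean ROI {honest.mean():.2f}, coverage {coverage(honest):.2f}")
print(f"shopped:       mean ROI {shopped.mean():.2f}, coverage {coverage(shopped):.2f}")
```

Under these assumptions the pre-specified estimate is centered on the true value with close to nominal coverage, while the shopped estimate sits well above the true value and its nominal 95% interval contains the truth far less often than 95% of the time.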

Who Is Affected

Specification shopping is pervasive across the marketing measurement ecosystem. Each role faces distinct pressures that encourage the practice and distinct consequences when it fails.

🏢

Ad Agencies

Proving campaign value

Under pressure to demonstrate ROI for the media they placed. Models that show poor performance threaten client relationships and revenue.

📊

Model Shops

Delivering expected results

Hired to build models that “make sense.” When results contradict expectations, there is pressure to adjust until they align with priors.

🏭

Brands & Advertisers

Making budget decisions

Rely on model outputs to allocate millions in media spend. False precision from specification shopping leads to misallocated budgets.

For Ad Agencies

Agencies face a structural conflict of interest: they are often asked to measure the effectiveness of media they themselves placed. This creates subtle but powerful incentives to find positive results.

💰

The Incentive Problem

When your revenue depends on clients continuing to spend on channels you manage, models that show low ROI for those channels threaten your business. This creates unconscious bias in modeling decisions even among well-intentioned analysts.

🔍

How It Manifests

Adjusting adstock decay rates until a channel “looks right.” Removing control variables that reduce media coefficients. Zeroing out negative effects and calling them “non-significant.” Choosing between model variants based on which gives the “most reasonable” ROIs.

The Opportunity for Agencies

Agencies that adopt transparent, pre-specified modeling methodologies differentiate themselves from competitors. When your models are validated against holdout experiments, you can demonstrate genuine value rather than asserted value. This builds deeper, more durable client relationships.

For Model Shops

Dedicated analytics firms face the “client satisfaction versus scientific rigor” tension daily. The commercial pressure to deliver results that “make sense” is real, but it comes at a cost.

⚖️

The Credibility Trap

If clients hire you expecting certain results and you deliver models that confirm expectations, no one complains. But when two different model shops produce contradictory results for the same brand using the same data, the industry’s credibility erodes.

📈

The Validation Gap

Most model shops cannot point to systematic validation of their predictions. How often are model-implied ROIs tested against controlled experiments? Without this feedback loop, there is no mechanism to distinguish good models from specification-shopped ones.

When everyone uses the same biased methods, an entire industry can be confidently wrong. The first firms to break this cycle will have a significant competitive advantage.
— The case for rigorous measurement

Common Specification Shopping Practices

Practice | Why It’s Done | Why It’s Harmful
Zeroing out negative media effects | “Media can’t have negative ROI” | Systematically biases all estimates upward; makes uncertainty invisible
Tuning adstock until results look right | “Domain knowledge about decay rates” | When done after seeing results, it’s a form of p-hacking; invalidates inference
Dropping control variables that reduce media effects | “Multicollinearity issues” | Omitting confounders inflates causal estimates; leads to incorrect attribution
Selecting “best” model from many candidates | “Model selection is standard practice” | Winner’s curse: the selected model overestimates effect sizes; reported uncertainty is too narrow
Adjusting priors to match expected ROIs | “Incorporating domain knowledge” | When done iteratively after seeing posteriors, this is Bayesian specification shopping; the posterior no longer reflects an honest belief update
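
The first practice in the table, zeroing out negative effects, can be illustrated with a short simulation. This is a sketch under stated assumptions: the true effect and the estimation noise are hypothetical numbers, and the estimates are drawn from a normal distribution rather than produced by a fitted model.

```python
import numpy as np

rng = np.random.default_rng(1)

TRUE_EFFECT = 0.05   # hypothetical small positive media effect
STD_ERROR = 0.20     # hypothetical estimation noise
N_ESTIMATES = 100_000

# Unconstrained estimates scatter around the true effect, including below zero.
raw = rng.normal(TRUE_EFFECT, STD_ERROR, size=N_ESTIMATES)

# The "media can't have negative ROI" rule: replace every negative estimate with 0.
zeroed = np.clip(raw, 0.0, None)

print(f"true effect:           {TRUE_EFFECT:.3f}")
print(f"mean of raw estimates: {raw.mean():.3f}")
print(f"mean after zeroing:    {zeroed.mean():.3f}")
```

Because only the low side of the distribution is altered, the average of the zeroed estimates is pushed above the true effect. The size of the shift depends on how noisy the estimates are relative to the effect, so the figures here should not be read as typical of any real model.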

For Brands & Advertisers

As the end consumers of marketing measurement, brands are both the primary victims and the primary beneficiaries of improved methodology. Understanding what questions to ask is the first step toward better outcomes.

💸

The Budget Impact

If your model overstates TV ROI by 40% due to specification shopping, you are systematically over-investing in television at the expense of other channels. Over a year, this can mean millions of dollars in misallocated spend.

📋

The Year-over-Year Problem

Have you noticed that model results change dramatically year over year even when strategy is stable? This instability is often a sign of specification shopping—different analysts making different ad hoc choices rather than a systematic change in market dynamics.

Financial Consequences

  • 30-50% ROI Overestimation: typical upward bias from zeroing out negative effects
  • 5-15% Budget Misallocation: share of budget directed to wrong channels
  • 100% Reported Accuracy: from models selected to show high fit
  • <50% Actual Reliability: when validated against holdout experiments

The Compounding Effect

Specification shopping doesn’t just produce a single bad estimate. It produces a systematically biased view of your entire media portfolio. The channels that appear most effective are often the ones where the model had the most room to be optimistic—typically those with the least experimental validation.

Credibility Risk

The marketing measurement industry faces a growing credibility crisis. As data science matures in other domains and clients become more sophisticated, the gap between standard practice and scientific rigor becomes harder to ignore.

  • Client trust erosion: High
  • Regulatory scrutiny: Growing
  • Competitive displacement: High

How to Detect Specification Shopping

Whether you are commissioning a model or reviewing one, these indicators suggest specification shopping may have occurred.

No negative media effects anywhere

If every channel shows positive ROI, ask: was this constrained or did the data show it? In reality, some channels in some time periods may show negligible or negative incremental effects, especially when over-saturated.

Extremely narrow confidence intervals

If the model says TV ROI is 1.42 (1.38–1.46), ask how this precision was achieved. With typical MMM data, genuine uncertainty is much wider. Artificially narrow intervals are a hallmark of selecting among specifications.

Results perfectly match prior expectations

If every result aligns with what the client expected, ask what would have been reported if results contradicted expectations. A model that always confirms priors is a mirror, not a measurement tool.

Dramatic year-over-year changes with no clear driver

If last year’s model showed TV was strongest and this year shows digital is strongest—with no change in strategy—the modeling process itself is likely the source of variation.

No holdout validation or experimental calibration

If model predictions have never been tested against controlled experiments, there is no empirical basis for trusting the results. In-sample fit measures like R-squared do not validate causal claims.
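
The gap between in-sample fit and causal validity can be sketched with simulated data. In the example below, a seasonal demand variable drives both media spend and sales; every coefficient and noise level is a hypothetical choice made for illustration, and plain least squares stands in for whatever model is actually used.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 208  # four years of weekly observations (hypothetical)

TRUE_EFFECT = 0.5
season = rng.normal(size=n)                           # confounder: seasonal demand
spend = 0.9 * season + rng.normal(scale=0.3, size=n)  # media spend follows demand
sales = TRUE_EFFECT * spend + 2.0 * season + rng.normal(scale=0.5, size=n)


def ols(y, *columns):
    """Least-squares fit with an intercept; returns (coefficients, R-squared)."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, 1 - resid.var() / y.var()


beta_short, r2_short = ols(sales, spend)          # confounder dropped
beta_full, r2_full = ols(sales, spend, season)    # confounder kept

print(f"true media effect:    {TRUE_EFFECT:.2f}")
print(f"without control:      {beta_short[1]:.2f}  (R-squared {r2_short:.2f})")
print(f"with control:         {beta_full[1]:.2f}  (R-squared {r2_full:.2f})")
```

In this setup the model without the control still fits the data well, yet its media coefficient is several times the true effect. A high R-squared therefore says nothing about whether the coefficient is causal, which is why an external check such as a holdout experiment is needed.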

A Better Approach

The MMM Framework is built from the ground up to eliminate specification shopping while producing genuinely useful business insights.

Traditional Approach
  • Run many models, report “the best one”
  • Point estimates with false precision
  • Post hoc adjustments to “fix” results
  • No experimental validation
  • Different analysts, different results
  • Confidence comes from presentation, not evidence

MMM Framework Approach
  • Pre-specify model before seeing results
  • Full posterior distributions with honest uncertainty
  • Bayesian priors encode domain knowledge transparently
  • Built-in experimental calibration support
  • Reproducible: same data, same results
  • Confidence comes from validated predictions

The Business Case for Rigor

Organizations that adopt rigorous measurement practices don’t just get better models—they get better decisions. When you know which estimates are confident and which are uncertain, you can invest in experiments where they matter most, allocate budget based on validated effects, and build a compounding knowledge advantage over competitors who rely on specification-shopped results.

Questions to Ask Your Modeling Partner

Whether you are evaluating a new vendor or auditing existing work, these questions help distinguish rigorous measurement from specification shopping.

The gold standard is a pre-registered analysis plan that specifies model structure, priors, and decision criteria before the model is fit to data. Ask for the analysis plan and compare it to the final delivered model.
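
One lightweight way to make that comparison checkable is to fingerprint the analysis plan before fitting and verify the delivered specification against it afterwards. The sketch below uses only the Python standard library; the field names in `analysis_plan` are hypothetical and are not taken from the MMM Framework or any other tool.

```python
import copy
import hashlib
import json


def fingerprint(spec: dict) -> str:
    """SHA-256 of a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical plan, written down and hashed before any model is fit.
analysis_plan = {
    "response": "weekly_sales",
    "media_channels": ["tv", "search", "social"],
    "controls": ["price", "seasonality", "distribution"],
    "adstock": {"type": "geometric", "decay_prior": "Beta(2, 2)"},
    "decision_rule": "report full posterior; no post hoc sign constraints",
}
registered_hash = fingerprint(analysis_plan)

# Later: the specification that was actually delivered (here, one control removed).
delivered_spec = copy.deepcopy(analysis_plan)
delivered_spec["controls"].remove("seasonality")

matches_plan = fingerprint(delivered_spec) == registered_hash
print("delivered model matches registered plan:", matches_plan)
```

A mismatch is not proof of specification shopping, since plans can change for good reasons, but it does show that a deviation occurred and should be documented and justified.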

There is nothing wrong with testing multiple specifications, but all should be reported. If 30 models were run and only 1 is presented, the uncertainty is vastly understated. Ask for a sensitivity analysis showing how results change across reasonable specifications.
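
A sensitivity analysis of this kind can be sketched as a loop over a grid of reasonable choices that reports every result. The example below varies only the adstock decay rate on simulated data; the geometric adstock form, the grid, and all data-generating values are assumptions made for illustration.

```python
import numpy as np


def geometric_adstock(spend, decay):
    """Carry over a fraction `decay` of the previous period's adstocked spend."""
    out = np.zeros(len(spend), dtype=float)
    carry = 0.0
    for t, x in enumerate(spend):
        carry = x + decay * carry
        out[t] = carry
    return out


rng = np.random.default_rng(3)
n = 156  # three years of weekly data (hypothetical)
TRUE_DECAY, TRUE_COEF = 0.5, 0.3
spend = rng.gamma(shape=2.0, scale=50.0, size=n)
sales = (100 + TRUE_COEF * geometric_adstock(spend, TRUE_DECAY)
         + rng.normal(scale=20.0, size=n))


def media_coefficient(decay):
    """Least-squares media coefficient for a given assumed decay rate."""
    X = np.column_stack([np.ones(n), geometric_adstock(spend, decay)])
    beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
    return beta[1]


decays = [0.1, 0.3, 0.5, 0.7, 0.9]
results = {d: media_coefficient(d) for d in decays}

# Report every specification, not only the one that looks best.
for decay, coef in results.items():
    print(f"decay={decay:.1f}  media coefficient={coef:.3f}")
print(f"range: {min(results.values()):.3f} to {max(results.values()):.3f}")
```

The point of the printed range is that the reader sees how much the headline number moves with a single modeling choice. A full sensitivity analysis would vary other choices as well, such as the control set and the saturation form.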

The most powerful validation is comparing model-implied predictions against holdout experiments (geo lift tests, randomized controlled trials). If the modeling partner cannot point to any experimental validation, the model’s causal claims are untested.
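
As a rough sketch of what such a comparison could look like, the function below checks an experimental lift estimate against the model's predictive draws for the same test. The function name, its inputs, and the example numbers are all hypothetical, and the simple normal approximation for the gap is an assumption rather than a recommended test.

```python
import numpy as np


def validate_against_experiment(model_draws, lift_estimate, lift_se, level=0.90):
    """Compare model-implied incremental sales with an experimental lift estimate."""
    draws = np.asarray(model_draws, dtype=float)
    tail = (1 - level) / 2 * 100
    lo, hi = np.percentile(draws, [tail, 100 - tail])
    # Express the gap in standard errors, combining model and experiment uncertainty.
    z_gap = (draws.mean() - lift_estimate) / np.sqrt(draws.var() + lift_se**2)
    return {
        "model_interval": (float(lo), float(hi)),
        "experiment_inside": bool(lo <= lift_estimate <= hi),
        "z_gap": float(z_gap),
    }


rng = np.random.default_rng(5)
# Stand-in for the model's predicted incremental sales in the test regions.
predicted_lift = rng.normal(loc=100.0, scale=10.0, size=5_000)

agrees = validate_against_experiment(predicted_lift, lift_estimate=105.0, lift_se=8.0)
disagrees = validate_against_experiment(predicted_lift, lift_estimate=40.0, lift_se=8.0)
print(agrees)
print(disagrees)
```

Tracking these comparisons across many experiments, rather than a single one, is what turns them into the feedback loop described above.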

Ask what happens when a channel’s estimated effect comes out negative. If the answer is “we constrain it to be positive” or “we adjust the model,” this is specification shopping. Negative effects are valid findings that indicate over-saturation, poor creative, or confounding. Honest measurement sometimes delivers unwelcome news.

Ask how uncertainty is reported. If the answer is “we report point estimates,” push for credible intervals. If intervals seem implausibly narrow, ask what assumptions produce that precision. Genuine Bayesian credible intervals for MMM are typically wide enough to affect optimization decisions.

Reproducibility is the minimum bar for scientific claims. If the modeling partner cannot provide code that reproduces their results, the work cannot be independently verified. The MMM Framework is fully open source and reproducible by design.

Getting Started with Rigorous Measurement

Whether you are an agency, model shop, or brand, the transition to rigorous measurement follows a common path.

Assess your current practices

Review your existing modeling workflow against the detection criteria above. Identify where post hoc adjustments are made and where specification choices are data-driven rather than pre-specified.

Start with one project

Pick a single client or brand and run the full rigorous workflow alongside your existing approach. Compare the results and understand the differences.

Design validation experiments

Use model predictions to design geo lift tests or holdout experiments. This creates the feedback loop needed to distinguish working models from non-working ones.

Communicate uncertainty as a feature

Train stakeholders to see honest uncertainty ranges as more valuable than false precision. Saying “we are confident TV ROI is between 1.1 and 1.8” enables better decisions than saying “TV ROI is 1.42.”
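
A statement of that kind can be produced directly from posterior draws. In the sketch below the draws are simulated from a lognormal distribution as a stand-in, since no fitted model is available here; in a real workflow they would come from the model's posterior for TV ROI.

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in for posterior draws of TV ROI (hypothetical distribution and parameters).
roi_draws = rng.lognormal(mean=np.log(1.4), sigma=0.15, size=20_000)

# 90% credible interval and the probability that the channel at least breaks even.
lower, upper = np.percentile(roi_draws, [5, 95])
prob_above_breakeven = np.mean(roi_draws > 1.0)

print(f"TV ROI: 90% credible interval {lower:.1f} to {upper:.1f}")
print(f"Probability ROI exceeds 1.0: {prob_above_breakeven:.0%}")
```

Reporting the interval alongside a decision-relevant probability, such as the chance of exceeding break-even, gives stakeholders something they can act on without hiding the uncertainty.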

Build organizational capability

Invest in training your team on Bayesian methods, causal inference, and the MMM Framework. This is a long-term competitive advantage, not just a tool change.

Key Takeaway

The marketing measurement industry is moving toward greater rigor. Organizations that lead this transition will build differentiated capabilities and client relationships grounded in demonstrated rather than asserted credibility. The MMM Framework provides the tools—the rest is organizational commitment to honesty.

Ready to Learn More?

Explore the step-by-step modeling guide for implementing statistically sound models, or read interpreting results for guidance on communicating findings to media planners and CMOs.