Frequently Asked Questions
Guidance for analysts defending rigorous methodology and for clients understanding what honest measurement looks like.
For Analysts
Bayesian methods provide genuine uncertainty quantification. Traditional regression gives you confidence intervals that answer: "If I repeated this sampling process infinitely, 95% of intervals would contain the true value." That's a statement about a hypothetical procedure, not about your actual estimate.
Bayesian credible intervals answer a more useful question: "Given our data and prior knowledge, there's a 94% probability the true value lies in this range." This is a direct probability statement about the parameter—exactly what decision-makers need.
📊 Interactive: How Data Updates Beliefs
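As a rough illustration of how a credible interval narrows as data accumulates, here is a minimal grid-approximation sketch. It assumes a flat prior and binomial data (for example, a conversion rate) rather than a full media mix model, and the function names `grid_posterior` and `credible_interval` are illustrative, not part of the framework.

```python
import math

def grid_posterior(successes, trials, grid_size=1000):
    """Posterior over a rate p on a grid: flat prior times binomial likelihood."""
    # Midpoints avoid log(0) at p = 0 and p = 1.
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    log_w = [successes * math.log(p) + (trials - successes) * math.log(1 - p)
             for p in grid]
    peak = max(log_w)  # subtract the peak so exp() does not underflow
    weights = [math.exp(lw - peak) for lw in log_w]
    total = sum(weights)
    return grid, [w / total for w in weights]

def credible_interval(grid, probs, mass=0.94):
    """Equal-tailed interval holding `mass` of the posterior probability."""
    tail = (1 - mass) / 2
    cumulative, lower, upper = 0.0, None, None
    for p, w in zip(grid, probs):
        cumulative += w
        if lower is None and cumulative >= tail:
            lower = p
        if upper is None and cumulative >= 1 - tail:
            upper = p
    return lower, upper

# Same observed rate (30%), very different amounts of data.
small = credible_interval(*grid_posterior(successes=3, trials=10))
large = credible_interval(*grid_posterior(successes=300, trials=1000))
print(f"10 trials:   94% interval = {small}")
print(f"1000 trials: 94% interval = {large}")
```

Both intervals are direct probability statements about the rate; the second is much narrower because the data carry more information.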
Additional advantages:
- Prior incorporation: External evidence (meta-analyses, experiments, domain expertise) can be formally included
- Natural regularization: Priors prevent overfitting without ad-hoc penalty terms
- Hierarchical modeling: Partial pooling across geographies improves sparse estimates
- Full posterior: You get distributions, not just point estimates—enabling decision analysis under uncertainty
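The partial-pooling idea in the list above can be sketched with a simple shrinkage formula. This is a simplification under stated assumptions: the noise and between-geography standard deviations are treated as known, whereas a full hierarchical model would estimate them, and `partial_pool` is a hypothetical helper, not a framework function.

```python
def partial_pool(geo_means, geo_counts, noise_sd, between_geo_sd):
    """Shrink each geography's raw mean toward the overall mean.

    Geographies with few observations are pulled strongly toward the
    overall mean; geographies with many observations barely move.
    """
    # Weight on a geography's own data: tau^2 / (tau^2 + sigma^2 / n)
    weights = [between_geo_sd**2 / (between_geo_sd**2 + noise_sd**2 / n)
               for n in geo_counts]
    # Observation-weighted overall mean (a simple choice for this sketch)
    overall = sum(m * n for m, n in zip(geo_means, geo_counts)) / sum(geo_counts)
    pooled = [w * m + (1 - w) * overall for w, m in zip(weights, geo_means)]
    return pooled, overall

raw_means = [0.30, 0.10, 0.90]   # the third geography looks extreme...
counts = [200, 150, 4]           # ...but has only 4 observations
pooled, overall = partial_pool(raw_means, counts, noise_sd=1.0, between_geo_sd=0.2)
for raw, est, n in zip(raw_means, pooled, counts):
    print(f"n={n:>3}  raw={raw:.2f}  pooled={est:.2f}")
```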
Specification shopping (also called specification searching, data dredging, or p-hacking) occurs when analysts test multiple model specifications and selectively report the one that produces desired results—typically significant coefficients, expected signs, or "reasonable" ROIs.
📊 Interactive: How Selecting "Positive" Results Creates Bias
When the true effect is zero or small, coefficient estimates scatter around the truth. If you only report specifications with positive results, you systematically overestimate.
Common Forms in MMM
- Testing dozens of lag lengths until coefficients become significant
- Trying different saturation curves until ROI looks "right"
- Adding control variables based on residual patterns
- Dropping "outlier" observations until fit improves
- Adjusting adstock until negative coefficients become positive
The fundamental problem is that this invalidates the reported statistical inference. When you test 20 specifications and report the best one, the reported confidence interval ignores the selection step and dramatically understates true uncertainty. The "95% confidence" claim no longer holds.
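A small simulation makes the size of the problem concrete. This sketch assumes the true effect is zero and, for simplicity, that the 20 specifications give independent estimates; in practice specifications fitted to the same data are correlated, so the inflation would typically be smaller than shown here but still present.

```python
import random
import statistics

random.seed(0)
TRUE_EFFECT = 0.0      # the channel has no effect at all
STANDARD_ERROR = 1.0   # sampling noise in each specification's estimate
N_SPECS = 20
N_SIMULATIONS = 2000

honest, shopped = [], []
for _ in range(N_SIMULATIONS):
    estimates = [random.gauss(TRUE_EFFECT, STANDARD_ERROR) for _ in range(N_SPECS)]
    honest.append(estimates[0])      # pre-specified: one specification, reported as-is
    shopped.append(max(estimates))   # shopped: the most favorable of the 20

honest_mean = statistics.mean(honest)
shopped_mean = statistics.mean(shopped)
# Share of shopped results that clear a nominal one-sided 2.5% significance bar
false_positive_rate = sum(e > 1.96 * STANDARD_ERROR for e in shopped) / N_SIMULATIONS

print(f"Mean reported effect, pre-specified: {honest_mean:.2f}")
print(f"Mean reported effect, best of {N_SPECS}:   {shopped_mean:.2f}")
print(f"'Significant' results after shopping: {false_positive_rate:.0%}")
```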
📊 Interactive: Many Analysts, One Dataset
This simulates the Silberzahn et al. (2018) finding: give the same data to multiple analysts with flexibility in specification choices, and results vary dramatically.
The objection that clients won't accept uncertainty conflates two different things: not having an answer, and being honest about confidence in the answer. Clients won't accept "we don't know" as a final answer, and that is reasonable. But they should accept (and increasingly expect) honest uncertainty bounds. For example:
"Our best estimate is that TV ROI is 1.4, with an 80% probability it falls between 1.1 and 1.8. This estimate is based on pre-specified methodology and robust to reasonable alternative specifications. We recommend a geo-holdout test to narrow this range before major budget changes."
This is not "we don't know"—it's a precise, actionable statement that acknowledges uncertainty while providing a clear recommendation. Compare this to the alternative:
"TV ROI is 1.4." (Based on the third specification we tried, after the first two gave negative coefficients. We have no idea what happens if the client acts on this and it's wrong.)
The honest version is more valuable because it enables appropriate decision-making. It also protects the relationship: if the recommendation fails, you acknowledged the uncertainty upfront.
No, a negative coefficient does not need to be "fixed". It is informative, not a mistake. It might indicate:
- The channel genuinely has no effect (or negative effect) in this context
- The effect is too small to detect with available data
- Measurement error is attenuating estimates toward zero
- Confounding is biasing estimates (e.g., media increases during low-demand periods)
- The model is misspecified in ways that obscure the true effect
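To illustrate one item from the list above, the following sketch simulates how measurement error in a spend variable attenuates a regression slope toward zero. The data are synthetic and the numbers (a true slope of 0.5, equal variance for signal and measurement error) are assumptions chosen for the example.

```python
import random

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var

random.seed(1)
TRUE_SLOPE = 0.5
N = 5000

true_spend = [random.gauss(0, 1) for _ in range(N)]
sales = [TRUE_SLOPE * s + random.gauss(0, 1) for s in true_spend]
# Observed spend = true spend + measurement error of the same variance
observed_spend = [s + random.gauss(0, 1) for s in true_spend]

slope_clean = ols_slope(true_spend, sales)
slope_noisy = ols_slope(observed_spend, sales)
print(f"Slope using true spend:     {slope_clean:.2f}")
print(f"Slope using observed spend: {slope_noisy:.2f}")
```

With equal signal and error variance, classical attenuation predicts the noisy slope shrinks to about half the true value, which is why a weak or near-zero estimate is a prompt for investigation rather than for re-specification.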
📊 Interactive: Interpreting Coefficient Posteriors
The correct response is to investigate why the estimate is unexpected, not to iterate until it matches expectations. If you require the model to show advertising works, and adjust until it does, you cannot then cite the model as evidence that advertising works. The reasoning is circular.
The Bayesian Approach
This framework uses informative priors that encode the belief that media effects are positive. If you have strong prior knowledge that a media effect cannot be negative, incorporate it into the model through priors. Specification shopping to avoid negative coefficients hides this assumption rather than addressing it directly.
Year-over-year instability usually reflects one of two things:
- Genuine changes in media effectiveness (new creative, changed competitive environment, audience saturation)
- Sensitivity to specification choices that differ between modeling cycles
📊 Interactive: When is Year-over-Year Change Meaningful?
Changes that fall within credible intervals are consistent with random variation and should not be read as real changes.
The second is more common and more problematic. If different analysts, or the same analyst with slightly different judgment calls, produce substantially different results, then the "result" is an artifact of the process, not a property of reality.
This framework addresses instability by:
- Pre-specifying models before seeing results, reducing researcher degrees of freedom
- Reporting sensitivity analyses that show how results vary across reasonable specifications
- Using hierarchical priors that can pool information across time periods
- Quantifying uncertainty so that changes within credible intervals aren't over-interpreted
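One way to avoid over-interpreting a year-over-year shift is to compute the probability that the parameter actually increased. The sketch below assumes each year's posterior is approximately normal and that the two posteriors are independent; `prob_increase` is an illustrative helper, and the numbers are made up.

```python
from statistics import NormalDist

def prob_increase(mean_prev, sd_prev, mean_curr, sd_curr):
    """P(current ROI > previous ROI) for independent normal posteriors."""
    # The difference of two independent normals is normal; variances add.
    diff = NormalDist(mean_curr - mean_prev, (sd_prev**2 + sd_curr**2) ** 0.5)
    return 1 - diff.cdf(0.0)

# ROI moved from 1.4 to 1.6, but both estimates are imprecise
small_shift = prob_increase(1.4, 0.3, 1.6, 0.3)
# ROI moved from 1.4 to 2.0 with tight posteriors
large_shift = prob_increase(1.4, 0.1, 2.0, 0.1)

print(f"P(increase), imprecise estimates: {small_shift:.0%}")
print(f"P(increase), precise estimates:   {large_shift:.0%}")
```

A probability not far above 50% says the apparent change is weak evidence of a real shift; a probability near 100% says the change is unlikely to be noise.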
For Clients
Perhaps the most important misconception to address is that statistics removes uncertainty. It does not: it quantifies uncertainty.
📊 Interactive: Before vs. After Analysis
Before analysis, you don't know what TV's ROI is. After analysis, you still don't know for certain—but you have a principled range of plausible values, given your data and assumptions. That range might be narrow (high confidence) or wide (low confidence), but it's always a range, never a single "true" value.
The Role of Statistical Analysis
Statistical analysis tells you: "Given the data you have and the assumptions you've made, here is the range of values that are consistent with that evidence, and here is how much probability mass falls in different parts of that range."
It does not tell you: "The true value is exactly X."
This matters for decisions. If TV ROI is "1.4 ± 0.2" (narrow range), you can confidently increase TV spend. If it's "1.4 ± 1.2" (wide range), the same point estimate suggests a very different action—probably running an experiment before making major changes.
A credible interval gives you a direct probability statement about the parameter. When we report "TV ROI: 1.4 (1.1–1.8, 80% CI)", we're saying:
"Given our data and analysis, there is an 80% probability that TV's true ROI falls between 1.1 and 1.8."
📊 Interactive: ROI Estimates and Decision Regions
This is useful because:
- If the interval is entirely above 1.0, we're confident the channel is profitable
- If the interval spans 1.0, we can't confidently say whether it's profitable
- The width of the interval tells you how precise the estimate is
- You can make probability statements: with an equal-tailed 80% interval of 1.1–1.8, "There's a 90% chance ROI exceeds 1.1"
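These statements are simple summaries of posterior draws. The sketch below uses synthetic lognormal samples as a stand-in for a real posterior (the distribution and its parameters are assumptions chosen to give an interval near 1.1–1.8) and shows how the interval and the exceedance probabilities are computed.

```python
import random
import statistics

random.seed(42)
# Stand-in for posterior draws of TV ROI; a real analysis would use model output
roi_samples = [random.lognormvariate(0.33, 0.19) for _ in range(20000)]

# Deciles: the first and last cut points are the 10th and 90th percentiles,
# which bound an equal-tailed 80% credible interval.
deciles = statistics.quantiles(roi_samples, n=10)
lower, upper = deciles[0], deciles[-1]

# Any probability statement is just a share of draws
prob_above_lower = sum(s > lower for s in roi_samples) / len(roi_samples)
prob_profitable = sum(s > 1.0 for s in roi_samples) / len(roi_samples)

print(f"80% credible interval: {lower:.2f} to {upper:.2f}")
print(f"P(ROI > lower bound):  {prob_above_lower:.0%}")
print(f"P(ROI > 1.0):          {prob_profitable:.0%}")
```

Because the interval is equal-tailed, 10% of the probability lies below the lower bound and 10% above the upper bound, so the chance of exceeding the lower bound is 90%.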
| Scenario | Example Interval | Interpretation |
|---|---|---|
| High confidence, positive effect | 1.4 (1.2–1.6) | Almost certainly profitable; act with confidence |
| Moderate confidence | 1.4 (0.9–2.0) | Probably profitable; monitor closely or validate |
| High uncertainty | 1.4 (0.5–2.5) | Too uncertain to act; run experiment first |
| High confidence, unprofitable | 0.2 (0.1–0.4) | Almost certainly unprofitable; consider cutting |
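The positional part of this table (where the interval sits relative to break-even) can be expressed as a simple rule. The sketch below is a hypothetical heuristic, not the framework's decision logic; it deliberately ignores interval width, which the table also uses to separate "moderate confidence" from "high uncertainty".

```python
def classify(lower, upper, threshold=1.0):
    """Place a credible interval relative to a break-even threshold."""
    if lower > threshold:
        return "confidently above threshold"
    if upper < threshold:
        return "confidently below threshold"
    return "spans threshold: validate before acting"

# Intervals taken from the table rows above
table_intervals = {
    "High confidence, positive effect": (1.2, 1.6),
    "Moderate confidence": (0.9, 2.0),
    "High uncertainty": (0.5, 2.5),
    "High confidence, low ROI": (0.1, 0.4),
}
decisions = {name: classify(lo, hi) for name, (lo, hi) in table_intervals.items()}
for name, decision in decisions.items():
    print(f"{name}: {decision}")
```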
Uncertainty is valuable because it enables appropriate decision-making. Without it, you can't distinguish between situations that require different actions.
📊 Interactive: Same Point Estimate, Different Decisions (two channels share an ROI point estimate of 1.4; Channel B, with wide uncertainty of σ=0.60, has a 25% chance of being unprofitable and should be validated before major changes)
Consider two scenarios, both with the same point estimate:
Scenario A
TV ROI: 1.4 (1.2–1.6, 80% CI)
Action: Increase TV spend with confidence
Scenario B
TV ROI: 1.4 (0.5–2.8, 80% CI)
Action: Run experiment before major changes
If you only see "TV ROI: 1.4" without the interval, you'd take the same action in both cases—but that would be a mistake. Scenario B has a 25% probability that TV is actually unprofitable. Increasing spend based on that estimate is gambling, not optimization.
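The probabilities behind this comparison can be approximated in a few lines. The sketch assumes normal posteriors centered at 1.4: Scenario A's standard deviation is backed out from its 80% interval, and the wide case uses the σ=0.60 quoted for Channel B above. Scenario B's printed interval is asymmetric, so the normal curve is only a rough stand-in for it.

```python
from statistics import NormalDist

# z-score bounding the central 80% of a normal distribution (about 1.28)
Z_80 = NormalDist().inv_cdf(0.90)

def prob_unprofitable(mean, sd, break_even=1.0):
    """P(ROI < break_even) under a normal posterior."""
    return NormalDist(mean, sd).cdf(break_even)

# Scenario A: 80% interval of 1.2 to 1.6 means a half-width of 0.2
sd_narrow = 0.2 / Z_80
p_narrow = prob_unprofitable(1.4, sd_narrow)

# Wide case: sigma = 0.60, as quoted for Channel B
p_wide = prob_unprofitable(1.4, 0.60)

print(f"Narrow posterior: P(unprofitable) = {p_narrow:.1%}")
print(f"Wide posterior:   P(unprofitable) = {p_wide:.1%}")
```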
Uncertainty also creates opportunities:
- High-uncertainty parameters are opportunities for experiments—resolving the uncertainty through a geo-test may have high ROI itself
- Low-uncertainty parameters need less validation—you can act more quickly on robust estimates
- Portfolio effects matter—you can diversify across channels where you're uncertain
Experiments (geo-holdouts, incrementality tests) should be considered when:
- Uncertainty is high for a channel that matters. If credible intervals are wide and the channel represents significant spend, an experiment may have high ROI.
- You're considering a major budget change. Moving 20% of budget based on an estimate with wide uncertainty is risky; an experiment de-risks the decision.
- The model result is surprising. If the model says a historically strong channel is underperforming, validation provides confidence before acting.
- Stakeholders need conviction. Sometimes the issue isn't statistical uncertainty but organizational buy-in; experiments create shared evidence.
📊 Interactive: When to Experiment
Experiments are typically not needed when:
- Credible intervals are narrow and entirely above/below decision thresholds
- The decision is easily reversible
- The channel is small relative to total spend
- An experiment is logistically infeasible or too expensive relative to the decision value
Experiments and Models Are Complementary
Experiments don't replace models—they validate and calibrate them. A model can answer questions about channels that weren't experimented on, predict effects at spend levels not tested, and generate hypotheses for future experiments. But periodic experimental validation is what keeps models grounded in reality.
Different approaches make different tradeoffs. Here's how to think about the major distinctions:
| Approach | Uncertainty | Prior Knowledge | Specification |
|---|---|---|---|
| Traditional OLS | Understated | Not incorporated | Often shopped |
| Ridge/LASSO | Bootstrap only | Ad-hoc penalties | Often shopped |
| Bayesian (this framework) | Full posterior | Principled priors | Pre-specified |
Key questions to ask about any MMM methodology:
- Do they report credible/confidence intervals? Point estimates alone are incomplete.
- How are specifications chosen? If it's "we tried several and picked the best," that's specification shopping.
- Is there validation? Do model predictions ever get tested against experiments?
- How is domain knowledge incorporated? Formal priors vs. ad-hoc constraints make a big difference.
- What happens with negative coefficients? If they're always "fixed," ask how and why.
Framework Features
This framework is designed from the ground up to enable honest measurement:
- Full Bayesian inference via PyMC provides genuine posterior distributions, not just point estimates. Every parameter has a complete uncertainty distribution.
- Structured prior specification through configuration objects means priors are declared before fitting, not adjusted after seeing results.
- Comprehensive diagnostics (trace plots, R-hat, ESS, posterior predictive checks) ensure the inference is reliable before results are interpreted.
- Contribution uncertainty propagates coefficient uncertainty through to business metrics (ROI, contributions) rather than reporting only point estimates.
- Variable selection methods (horseshoe, spike-and-slab) handle uncertainty about which controls to include, rather than requiring manual specification shopping.
```python
# The framework computes full posterior distributions for business metrics
contributions = model.compute_contributions()

# Share of posterior draws in which TV ROI exceeds the break-even threshold
# (assumes contributions["TV"]["roi"] is a NumPy array of posterior samples)
prob_profitable = (contributions["TV"]["roi"] > 1.0).mean()
print(f"Probability TV is profitable: {prob_profitable:.1%}")
```
Priors encode what we believe before seeing the data. In marketing, we typically have genuine prior knowledge:
- Media elasticities are usually small (0.01–0.3 based on meta-analyses)
- Effects are generally positive (advertising shouldn't decrease sales)
- Adstock decay rates are bounded (effects don't last forever)
This information is valuable and should be used. The alternative—pretending we know nothing—is just as much a choice, and often a worse one.
How Priors and Data Interact
With enough data, the posterior is dominated by the data and priors have minimal impact. With limited data, priors provide regularization that prevents overfitting. This is exactly the behavior you want: trust data when you have it, fall back on prior knowledge when you don't.
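This behavior can be seen in the simplest conjugate case. The sketch below assumes a normal prior on an effect and normally distributed observations with known noise; real media mix models are more complex, but the precision-weighting logic is the same. The function name and all numbers are illustrative.

```python
def posterior_normal(prior_mean, prior_sd, data_mean, data_sd, n):
    """Conjugate update for a normal mean with known observation noise."""
    prior_precision = 1 / prior_sd**2
    data_precision = n / data_sd**2          # grows with the number of observations
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * data_mean)
    return post_mean, post_var ** 0.5

# Prior: effect near 0.1. Data: sample mean of 0.5 with noisy observations.
few = posterior_normal(prior_mean=0.1, prior_sd=0.1, data_mean=0.5, data_sd=1.0, n=5)
many = posterior_normal(prior_mean=0.1, prior_sd=0.1, data_mean=0.5, data_sd=1.0, n=5000)

print(f"n=5:    posterior mean {few[0]:.2f}, sd {few[1]:.3f}")
print(f"n=5000: posterior mean {many[0]:.2f}, sd {many[1]:.3f}")
```

With 5 observations the posterior stays close to the prior mean; with 5000 it sits close to the data mean and is far tighter.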
The framework includes prior sensitivity analysis to verify that conclusions are robust to reasonable prior alternatives.
The key difference from specification shopping is that priors are declared before seeing results. You can't "prior shop" in the same way because you commit to priors before fitting. The framework's configuration-based approach enforces this discipline.
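One way to make "declared before seeing results" auditable is to freeze the prior specification and record a fingerprint of it before fitting. This is a hypothetical sketch: `PriorSpec`, its fields, and `fingerprint` are invented for illustration and are not the framework's actual configuration objects.

```python
import hashlib
import json
from dataclasses import FrozenInstanceError, asdict, dataclass

@dataclass(frozen=True)
class PriorSpec:
    """Hypothetical prior declaration; frozen so it cannot be edited in place."""
    channel: str
    distribution: str
    sigma: float
    adstock_max_lag: int

def fingerprint(spec):
    """Stable hash of a specification, suitable for recording before fitting."""
    payload = json.dumps(asdict(spec), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

tv_prior = PriorSpec(channel="TV", distribution="half_normal", sigma=0.3, adstock_max_lag=8)
registered = fingerprint(tv_prior)  # record this before looking at results

# Attempting to tweak the prior after the fact fails loudly
try:
    tv_prior.sigma = 1.0
    mutated = True
except FrozenInstanceError:
    mutated = False

# A different prior produces a different fingerprint, so changes are visible
changed = fingerprint(PriorSpec("TV", "half_normal", 1.0, 8))
print(f"Mutated in place: {mutated}; fingerprint changed: {changed != registered}")
```

Nothing here prevents someone from creating a new specification, but any such change shows up as a mismatch against the registered fingerprint.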
The framework supports multiple forms of validation:
- Prior predictive checks: Before fitting, simulate data from your priors to verify they imply reasonable data distributions.
- Posterior predictive checks: After fitting, simulate data from the posterior and compare to actual observations. Systematic discrepancies indicate model misspecification.
- Out-of-sample prediction: Hold out recent periods and evaluate predictive accuracy on unseen data.
- Experimental calibration: Compare model-predicted effects to geo-experiment results when available.
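```python
import arviz as az

# Generate posterior predictive samples
ppc = model.sample_posterior_predictive(trace)

# Compare simulated data to the observations
# (assumes ppc is an InferenceData with posterior_predictive and observed_data groups)
az.plot_ppc(ppc)
```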
Validation doesn't prevent specification shopping, but it creates accountability: if a specification-shopped model makes predictions that fail validation, you'll know.
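For the out-of-sample check listed above, two simple summaries are point accuracy and interval coverage on held-out periods. The sketch below uses made-up numbers for five held-out weeks; the helper names are illustrative and the predictive intervals are assumed to be 80% intervals from some fitted model.

```python
def interval_coverage(actuals, lowers, uppers):
    """Share of held-out observations that fall inside their predictive interval."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(actuals, lowers, uppers))
    return hits / len(actuals)

def mape(actuals, predictions):
    """Mean absolute percentage error of point predictions."""
    return sum(abs(y - p) / abs(y) for y, p in zip(actuals, predictions)) / len(actuals)

# Hypothetical held-out weeks (not real data)
actual_sales = [100.0, 120.0, 90.0, 112.0, 130.0]
predicted_mean = [105.0, 115.0, 95.0, 100.0, 128.0]
predicted_low = [95.0, 105.0, 85.0, 90.0, 118.0]
predicted_high = [115.0, 125.0, 105.0, 110.0, 138.0]

coverage = interval_coverage(actual_sales, predicted_low, predicted_high)
error = mape(actual_sales, predicted_mean)
print(f"Interval coverage on held-out weeks: {coverage:.0%}")
print(f"MAPE on held-out weeks: {error:.1%}")
```

If nominal 80% intervals cover far fewer than 80% of held-out observations over many periods, the model's stated uncertainty is too narrow, which is one of the symptoms a specification-shopped model tends to show.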
Effective uncertainty visualization is critical for stakeholder communication. The framework provides:
- Response curves with uncertainty bands: Show saturation curves with 80% and 94% credible intervals, so stakeholders see where we're confident vs. uncertain.
- Contribution waterfall charts: Decompose sales into components with error bars showing contribution uncertainty.
- ROI forest plots: Display channel ROIs with credible intervals, making it easy to see which channels are confidently above/below profitability thresholds.
- Prior vs. posterior comparisons: Show how much the data moved beliefs from the prior, indicating data informativeness.
📊 Example: ROI Forest Plot with Decision Regions
Design Principle
Every visualization in the framework shows uncertainty by default. Hiding uncertainty requires explicit opt-out, not opt-in. This ensures that honest measurement is the path of least resistance.
Summary: Key Principles
For Analysts
- Pre-specify models before seeing results
- Report credible intervals, not just point estimates
- Investigate unexpected results; don't iterate until they disappear
- Recommend experiments when uncertainty is high
- Use sensitivity analysis to show robustness (or lack thereof)
For Clients
- Statistics quantifies uncertainty; it doesn't eliminate it
- Credible intervals tell you how confident to be in estimates
- Wide intervals mean "run an experiment before acting"
- Narrow intervals mean "act with confidence"
- Ask for uncertainty bounds on any recommendation
Red Flags in Any Methodology
- Point estimates without uncertainty bounds
- "We adjusted the model until the results made sense"
- Negative coefficients that were "fixed"
- No discussion of validation or out-of-sample testing
- Year-over-year changes presented as real when they're within uncertainty