Frequently Asked Questions

Guidance for analysts defending rigorous methodology and for clients understanding what honest measurement looks like.

For Analysts

Why should we use Bayesian methods instead of traditional regression?

Bayesian methods provide genuine uncertainty quantification. Traditional regression gives you confidence intervals that answer: "If I repeated this sampling process infinitely, 95% of intervals would contain the true value." That's a statement about a hypothetical procedure, not about your actual estimate.

Bayesian credible intervals answer a more useful question: "Given our data and prior knowledge, there's a 94% probability the true value lies in this range." This is a direct probability statement about the parameter—exactly what decision-makers need.

📊 Interactive: How Data Updates Beliefs

How to read this: The prior (dashed) shows beliefs before seeing data. As data accumulates, the posterior (solid) concentrates around the true value. With little data, the prior dominates. With lots of data, the data dominates.
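The behaviour described above can be sketched with a conjugate normal update, where the posterior mean is a precision-weighted average of the prior mean and the data mean. The prior (mean 1.0, sd 0.5), the true value of 1.4, and the noise level are assumptions chosen for illustration; they are not the settings of the interactive chart.

```python
import random
import statistics


def normal_posterior(prior_mean, prior_sd, data, noise_sd):
    """Conjugate normal update for an unknown mean with known noise_sd."""
    prior_precision = 1.0 / prior_sd**2
    data_precision = len(data) / noise_sd**2
    post_precision = prior_precision + data_precision
    data_mean = statistics.fmean(data) if data else 0.0
    post_mean = (prior_precision * prior_mean + data_precision * data_mean) / post_precision
    return post_mean, post_precision**-0.5


# Hypothetical setup: prior centred at 1.0, true value 1.4, noisy observations.
rng = random.Random(0)
true_value, noise_sd = 1.4, 1.0
observations = [rng.gauss(true_value, noise_sd) for _ in range(300)]

results = {}
for n in (0, 3, 30, 300):
    results[n] = normal_posterior(1.0, 0.5, observations[:n], noise_sd)
    mean, sd = results[n]
    print(f"n={n:>3}: posterior mean {mean:.2f}, sd {sd:.2f}")
```

With no observations the posterior equals the prior. As n grows, the posterior sd shrinks and the posterior mean moves toward the data mean.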

Additional advantages:

  • Prior incorporation: External evidence (meta-analyses, experiments, domain expertise) can be formally included
  • Natural regularization: Priors prevent overfitting without ad-hoc penalty terms
  • Hierarchical modeling: Partial pooling across geographies improves sparse estimates
  • Full posterior: You get distributions, not just point estimates, which enables decision analysis under uncertainty

What exactly is specification shopping, and why is it a problem?

Specification shopping (also called specification searching, data dredging, or p-hacking) occurs when analysts test multiple model specifications and selectively report the one that produces desired results—typically significant coefficients, expected signs, or "reasonable" ROIs.

📊 Interactive: How Selecting "Positive" Results Creates Bias

When the true effect is zero or small, coefficient estimates scatter around the truth. If you only report specifications with positive results, you systematically overestimate.

True effect: 0.05. Testing 20 specifications and reporting only the most positive: expected reported estimate is 0.35 — a 7x overestimate.
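A small simulation can reproduce the flavour of this result. It is a sketch under two assumptions that do not come from the source: each specification's estimate is an independent draw around the true effect, and the sampling noise has a standard deviation of 0.16 (chosen so the output lands near the figure quoted above). Real specifications fitted to the same data are correlated, which would reduce the bias somewhat.

```python
import random
import statistics

rng = random.Random(42)
true_effect = 0.05
noise_sd = 0.16          # assumed sampling noise per specification (not from the source)
n_specs = 20
n_simulations = 5000

honest, shopped = [], []
for _ in range(n_simulations):
    estimates = [rng.gauss(true_effect, noise_sd) for _ in range(n_specs)]
    honest.append(estimates[0])      # pre-specified: report the first specification
    shopped.append(max(estimates))   # shopped: report the most positive one

honest_mean = statistics.fmean(honest)
shopped_mean = statistics.fmean(shopped)
print(f"pre-specified mean estimate: {honest_mean:.2f}")
print(f"shopped mean estimate:       {shopped_mean:.2f}")
```

The pre-specified estimate averages out close to the true effect, while the shopped estimate is several times larger even though no individual fit is biased.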

Common Forms in MMM

  • Testing dozens of lag lengths until coefficients become significant
  • Trying different saturation curves until ROI looks "right"
  • Adding control variables based on residual patterns
  • Dropping "outlier" observations until fit improves
  • Adjusting adstock until negative coefficients become positive

The fundamental problem is that this invalidates all statistical inference. When you test 20 specifications and report the best one, your reported confidence interval dramatically understates true uncertainty. The "95% confidence" claim becomes meaningless.
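The loss of coverage can be worked out directly in a simplified setting. The sketch below assumes n independent, unbiased, normally distributed estimates with a known standard error, and that the analyst reports the largest one with its nominal 95% interval. The function name is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()
z = std_normal.inv_cdf(0.975)   # about 1.96: half-width of a nominal 95% interval in sd units


def coverage_when_reporting_max(n_specs):
    """P(true value lies inside estimate +/- z*sd) when the largest of
    n_specs independent, unbiased estimates is the one reported."""
    p_max_below_upper = std_normal.cdf(z) ** n_specs    # largest error is below +z
    p_max_below_lower = std_normal.cdf(-z) ** n_specs   # largest error is below -z
    return p_max_below_upper - p_max_below_lower


coverage = {n: coverage_when_reporting_max(n) for n in (1, 5, 20)}
for n, c in coverage.items():
    print(f"{n:>2} specifications tried: actual coverage {c:.1%}")
```

With a single pre-specified model the interval covers the truth 95% of the time. After picking the best of 20 under these assumptions, the same "95%" interval covers it only about 60% of the time.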

📊 Interactive: Many Analysts, One Dataset

This simulates the Silberzahn et al. (2018) finding: give the same data to multiple analysts with flexibility in specification choices, and results vary dramatically.

Each bar represents one analyst's estimate. Same data, same question, different (defensible) analytical choices. This variation comes from specification flexibility, not from bad actors or incompetent analysts.

Clients won't accept "we don't know." How do I present uncertainty?

This conflates two different things: not having an answer, and being honest about confidence in the answer. Clients won't accept "we don't know" as a final answer—agreed. But they should accept (and increasingly expect) honest uncertainty bounds.

Talking Point

"Our best estimate is that TV ROI is 1.4, with an 80% probability it falls between 1.1 and 1.8. This estimate is based on pre-specified methodology and robust to reasonable alternative specifications. We recommend a geo-holdout test to narrow this range before major budget changes."

This is not "we don't know"—it's a precise, actionable statement that acknowledges uncertainty while providing a clear recommendation. Compare this to the alternative:

The Alternative

"TV ROI is 1.4." (Based on the third specification we tried, after the first two gave negative coefficients. We have no idea what happens if the client acts on this and it's wrong.)

The honest version is more valuable because it enables appropriate decision-making. It also protects the relationship: if the recommendation fails, you acknowledged the uncertainty upfront.

We got a negative coefficient for a media channel. Should we adjust the model?

No. A negative coefficient is informative, not a mistake. It might indicate:

  • The channel genuinely has no effect (or negative effect) in this context
  • The effect is too small to detect with available data
  • Measurement error is attenuating estimates toward zero
  • Confounding is biasing estimates (e.g., media increases during low-demand periods)
  • The model is misspecified in ways that obscure the true effect

📊 Interactive: Interpreting Coefficient Posteriors

Strong positive effect: 95% probability effect is positive. Act with confidence.

The correct response is to investigate why the estimate is unexpected, not to iterate until it matches expectations. If you require the model to show advertising works, and adjust until it does, you cannot then cite the model as evidence that advertising works. The reasoning is circular.

The Bayesian Approach

This framework uses informative priors to encode the belief that media effects are positive. If you have strong prior knowledge that a media effect cannot be negative, incorporate it into the model through the prior. Specification shopping to avoid negative coefficients hides this assumption rather than stating it openly.
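To make the idea concrete, here is a minimal grid-approximation sketch of what a positivity prior does to a noisy negative estimate. All numbers are hypothetical, and the half-normal prior with scale 0.30 is an assumption for illustration, not the framework's default.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


# Hypothetical numbers: an unconstrained fit gives -0.10 with standard error 0.20.
estimate, std_error = -0.10, 0.20
prior_scale = 0.30                      # half-normal prior scale (an assumption)

# Grid over non-negative effects only: the half-normal prior has no mass below zero.
step = 0.001
grid = [i * step for i in range(0, 2001)]           # 0.000 .. 2.000
unnormalised = [
    normal_pdf(b, 0.0, prior_scale) * normal_pdf(estimate, b, std_error) for b in grid
]
total = sum(unnormalised)
posterior = [w / total for w in unnormalised]

posterior_mean = sum(b * p for b, p in zip(grid, posterior))
prob_below_0_1 = sum(p for b, p in zip(grid, posterior) if b < 0.1)
print(f"posterior mean effect: {posterior_mean:.3f}")
print(f"P(effect < 0.1):       {prob_below_0_1:.1%}")
```

The posterior respects the stated assumption (no negative effects) while still reporting that the data point toward a small effect. The assumption is visible in the prior rather than buried in a sequence of discarded specifications.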

Why do our estimates change so much year-over-year?

Year-over-year instability usually reflects one of two things:

  1. Genuine changes in media effectiveness (new creative, changed competitive environment, audience saturation)
  2. Sensitivity to specification choices that differ between modeling cycles

📊 Interactive: When is Year-over-Year Change Meaningful?

Changes within credible intervals are expected random variation, not real changes.

Key insight: The 2024 vs 2023 change looks dramatic when viewing only point estimates, but both estimates are within each other's uncertainty bands. The 2022→2023 change, however, shows intervals that barely overlap—suggesting a potentially real shift.
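One way to check whether a year-over-year change is meaningful is to compute the probability that the later value exceeds the earlier one. The sketch below treats each year's posterior as an independent normal distribution, and the yearly means and standard deviations are hypothetical values, not figures from the chart.

```python
from statistics import NormalDist

# Hypothetical posterior summaries (mean, sd) for one channel's ROI by year.
roi_by_year = {2022: (2.1, 0.25), 2023: (1.2, 0.30), 2024: (1.6, 0.35)}


def prob_increase(earlier, later):
    """P(later ROI > earlier ROI), treating the two posteriors as independent normals."""
    diff = NormalDist(later[0] - earlier[0], (earlier[1] ** 2 + later[1] ** 2) ** 0.5)
    return 1.0 - diff.cdf(0.0)


p_22_23 = prob_increase(roi_by_year[2022], roi_by_year[2023])
p_23_24 = prob_increase(roi_by_year[2023], roi_by_year[2024])
print(f"P(ROI rose 2022 -> 2023): {p_22_23:.1%}")
print(f"P(ROI rose 2023 -> 2024): {p_23_24:.1%}")
```

A probability close to 0 or 1 points to a real shift. A probability in the middle of the range means the apparent change is compatible with noise. With real posterior draws, the same quantity is the share of draws in which the later year exceeds the earlier one.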

The second cause is more common and more problematic. If different analysts, or the same analyst making slightly different judgment calls, produce substantially different results, then the "result" is an artifact of the process, not a property of reality.

This framework addresses instability by:

  • Pre-specifying models before seeing results, reducing researcher degrees of freedom
  • Reporting sensitivity analyses that show how results vary across reasonable specifications
  • Using hierarchical priors that can pool information across time periods
  • Quantifying uncertainty so that changes within credible intervals aren't over-interpreted

For Clients

What does statistics actually do? I thought it gives us the answer.

This is perhaps the most important misconception to address. Statistics does not remove uncertainty—it quantifies uncertainty.

📊 Interactive: Before vs. After Analysis

Before analysis: We know almost nothing—ROI could plausibly be anywhere from 0 to 3+.

Before analysis, you don't know what TV's ROI is. After analysis, you still don't know for certain—but you have a principled range of plausible values, given your data and assumptions. That range might be narrow (high confidence) or wide (low confidence), but it's always a range, never a single "true" value.

The Role of Statistical Analysis

Statistical analysis tells you: "Given the data you have and the assumptions you've made, here is the range of values that are consistent with that evidence, and here is how much probability mass falls in different parts of that range."

It does not tell you: "The true value is exactly X."

This matters for decisions. If TV ROI is "1.4 ± 0.2" (narrow range), you can confidently increase TV spend. If it's "1.4 ± 1.2" (wide range), the same point estimate suggests a very different action—probably running an experiment before making major changes.

How do I read the credible intervals in these reports?

A credible interval gives you a direct probability statement about the parameter. When we report "TV ROI: 1.4 (1.1–1.8, 80% CI)", we're saying:

"Given our data and analysis, there is an 80% probability that TV's true ROI falls between 1.1 and 1.8."

📊 Interactive: ROI Estimates and Decision Regions

Recommendation: Act with confidence. 95% probability ROI > 1.0. The data strongly supports this channel being profitable.
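The interval and the probability statements in a report like this come straight from posterior draws. The sketch below uses simulated lognormal draws as a stand-in for a fitted model's output; the distribution and its parameters are assumptions chosen to resemble the "1.4 (1.1–1.8)" example.

```python
import random
import statistics

# Stand-in posterior draws for TV ROI; a fitted model would supply real ones.
rng = random.Random(1)
roi_draws = [rng.lognormvariate(0.33, 0.19) for _ in range(20000)]

# Equal-tailed 80% interval: the 10th and 90th percentiles of the draws.
deciles = statistics.quantiles(roi_draws, n=10)
lower, upper = deciles[0], deciles[-1]

point_estimate = statistics.median(roi_draws)
prob_above_breakeven = sum(d > 1.0 for d in roi_draws) / len(roi_draws)
prob_above_1_2 = sum(d > 1.2 for d in roi_draws) / len(roi_draws)

print(f"TV ROI: {point_estimate:.1f} ({lower:.1f}-{upper:.1f}, 80% interval)")
print(f"P(ROI > 1.0) = {prob_above_breakeven:.0%}, P(ROI > 1.2) = {prob_above_1_2:.0%}")
```

Any threshold can be substituted for 1.0 or 1.2, because the probability is just the share of draws above it.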

This is useful because:

  • If the interval is entirely above 1.0, we're confident the channel is profitable
  • If the interval spans 1.0, we can't confidently say whether it's profitable
  • The width of the interval tells you how precise the estimate is
  • You can make probability statements: "There's a 90% chance ROI exceeds 1.2"

Scenario                         | Example Interval | Interpretation
High confidence, positive effect | 1.4 (1.2–1.6)    | Almost certainly profitable; act with confidence
Moderate confidence              | 1.4 (0.9–2.0)    | Probably profitable; monitor closely or validate
High uncertainty                 | 1.4 (0.5–2.5)    | Too uncertain to act; run experiment first
Confident null                   | 0.2 (0.1–0.4)    | Almost certainly unprofitable; consider cutting

Why is uncertainty valuable? It sounds like you're just telling me you don't know.

Uncertainty is valuable because it enables appropriate decision-making. Without it, you can't distinguish between situations that require different actions.

📊 Interactive: Same Point Estimate, Different Decisions

Both channels show ROI = 1.4. But should you take the same action for both?

Channel A: Tight uncertainty (σ=0.15) → Only 0.4% chance unprofitable → Act with confidence
Channel B: Wide uncertainty (σ=0.60) → 25% chance unprofitable → Validate before major changes
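These two probabilities can be checked in a few lines, assuming each channel's ROI posterior is approximately normal with the stated mean and σ. The normal approximation is an assumption here; with real posterior draws you would count the share below break-even instead.

```python
from statistics import NormalDist

breakeven = 1.0
channels = {
    "A": NormalDist(mu=1.4, sigma=0.15),
    "B": NormalDist(mu=1.4, sigma=0.60),
}

# Probability mass below break-even for each channel.
prob_unprofitable = {name: posterior.cdf(breakeven) for name, posterior in channels.items()}
for name, p in prob_unprofitable.items():
    print(f"Channel {name}: P(ROI < {breakeven}) = {p:.1%}")
```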

Consider two scenarios, both with the same point estimate:

Scenario A

TV ROI: 1.4 (1.2–1.6, 80% CI)

Action: Increase TV spend with confidence

Scenario B

TV ROI: 1.4 (0.5–2.8, 80% CI)

Action: Run experiment before major changes

If you only see "TV ROI: 1.4" without the interval, you'd take the same action in both cases—but that would be a mistake. Scenario B has a 25% probability that TV is actually unprofitable. Increasing spend based on that estimate is gambling, not optimization.

Uncertainty also creates opportunities:

  • High-uncertainty parameters are opportunities for experiments—resolving the uncertainty through a geo-test may have high ROI itself
  • Low-uncertainty parameters need less validation—you can act more quickly on robust estimates
  • Portfolio effects matter: you can diversify across channels where you're uncertain

When should we run an experiment instead of relying on the model?

Experiments (geo-holdouts, incrementality tests) should be considered when:

  • Uncertainty is high for a channel that matters. If credible intervals are wide and the channel represents significant spend, an experiment may have high ROI.
  • You're considering a major budget change. Moving 20% of budget based on an estimate with wide uncertainty is risky; an experiment de-risks the decision.
  • The model result is surprising. If the model says a historically strong channel is underperforming, validation provides confidence before acting.
  • Stakeholders need conviction. Sometimes the issue isn't statistical uncertainty but organizational buy-in; experiments create shared evidence.

📊 Interactive: When to Experiment

The decision to experiment depends on both uncertainty and stakes. High uncertainty + high stakes = experiment. Low uncertainty + any stakes = act on model. Low stakes + any uncertainty = act on model (reversible decisions don't need validation).
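The rule of thumb above can be written as a small decision function. This is only a sketch: the function name, the 90% confidence cut-off, and the 10% spend-share threshold are hypothetical choices, not values defined by the framework.

```python
def recommend(prob_profitable, spend_share, reversible,
              confident=0.90, high_stakes_share=0.10):
    """Return 'act on model' or 'experiment first'.

    All thresholds are illustrative defaults, not values from the framework.
    """
    # Low uncertainty: the posterior is decisive in either direction.
    low_uncertainty = prob_profitable >= confident or prob_profitable <= 1 - confident
    # Low stakes: the decision is reversible or the channel is a small share of spend.
    low_stakes = reversible or spend_share < high_stakes_share
    if low_uncertainty or low_stakes:
        return "act on model"
    return "experiment first"


print(recommend(prob_profitable=0.75, spend_share=0.30, reversible=False))
print(recommend(prob_profitable=0.97, spend_share=0.30, reversible=False))
print(recommend(prob_profitable=0.60, spend_share=0.02, reversible=False))
```

In practice the thresholds should reflect the cost of an experiment relative to the cost of a wrong decision.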

Experiments are typically not needed when:

  • Credible intervals are narrow and entirely above/below decision thresholds
  • The decision is easily reversible
  • The channel is small relative to total spend
  • An experiment is logistically infeasible or too expensive relative to the decision value

Experiments and Models are Complementary

Experiments don't replace models—they validate and calibrate them. A model can answer questions about channels that weren't experimented on, predict effects at spend levels not tested, and generate hypotheses for future experiments. But periodic experimental validation is what keeps models grounded in reality.

How does this compare to other MMM approaches?

Different approaches make different tradeoffs. Here's how to think about the major distinctions:

Approach                  | Uncertainty    | Prior Knowledge   | Specification
Traditional OLS           | Understated    | Not incorporated  | Often shopped
Ridge/LASSO               | Bootstrap only | Ad-hoc penalties  | Often shopped
Bayesian (this framework) | Full posterior | Principled priors | Pre-specified

Key questions to ask about any MMM methodology:

  1. Do they report credible/confidence intervals? Point estimates alone are incomplete.
  2. How are specifications chosen? If it's "we tried several and picked the best," that's specification shopping.
  3. Is there validation? Do model predictions ever get tested against experiments?
  4. How is domain knowledge incorporated? Formal priors vs. ad-hoc constraints make a big difference.
  5. What happens with negative coefficients? If they're always "fixed," ask how and why.

Framework Features

How does this framework specifically help with these issues?

This framework is designed from the ground up to enable honest measurement:

  • Full Bayesian inference via PyMC provides genuine posterior distributions, not just point estimates. Every parameter has a complete uncertainty distribution.
  • Structured prior specification through configuration objects means priors are declared before fitting, not adjusted after seeing results.
  • Comprehensive diagnostics (trace plots, R-hat, ESS, posterior predictive checks) ensure the inference is reliable before results are interpreted.
  • Contribution uncertainty propagates coefficient uncertainty through to business metrics (ROI, contributions) rather than reporting only point estimates.
  • Variable selection methods (horseshoe, spike-and-slab) handle uncertainty about which controls to include, rather than requiring manual specification shopping.

# The framework computes full posterior distributions for business metrics
contributions = model.compute_contributions()

# Get probability that TV ROI exceeds threshold
prob_profitable = (contributions["TV"]["roi"] > 1.0).mean()
print(f"Probability TV is profitable: {prob_profitable:.1%}")

How do priors work, and don't they bias the results?

Priors encode what we believe before seeing the data. In marketing, we typically have genuine prior knowledge:

  • Media elasticities are usually small (0.01–0.3 based on meta-analyses)
  • Effects are generally positive (advertising shouldn't decrease sales)
  • Adstock decay rates are bounded (effects don't last forever)

This information is valuable and should be used. The alternative—pretending we know nothing—is just as much a choice, and often a worse one.

How Priors and Data Interact

With enough data, the posterior is dominated by the data and priors have minimal impact. With limited data, priors provide regularization that prevents overfitting. This is exactly the behavior you want: trust data when you have it, fall back on prior knowledge when you don't.

The framework includes prior sensitivity analysis to verify that conclusions are robust to reasonable prior alternatives.
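A prior sensitivity check can be illustrated with a closed-form normal update: refit under several reasonable priors and see how far the conclusion moves. The priors, the data summary, and the function name below are hypothetical; the framework's own sensitivity tooling is not shown here.

```python
def posterior_mean(prior_mean, prior_sd, data_mean, data_se):
    """Normal-normal update using the data's mean and standard error."""
    w_prior, w_data = 1.0 / prior_sd**2, 1.0 / data_se**2
    return (w_prior * prior_mean + w_data * data_mean) / (w_prior + w_data)


# Hypothetical alternative priors for a media elasticity: (mean, sd).
priors = {"sceptical": (0.05, 0.05), "baseline": (0.10, 0.10), "diffuse": (0.10, 0.50)}

data_mean = 0.20
spread = {}
for label, data_se in (("weak data", 0.15), ("strong data", 0.01)):
    means = {name: posterior_mean(m, s, data_mean, data_se) for name, (m, s) in priors.items()}
    spread[label] = max(means.values()) - min(means.values())
    summary = ", ".join(f"{name} {value:.3f}" for name, value in means.items())
    print(f"{label}: {summary} (spread {spread[label]:.3f})")
```

A large spread across priors signals that the data alone cannot settle the question, so the prior choice should be reported alongside the result.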

The key difference from specification shopping is that priors are declared before seeing results. You can't "prior shop" in the same way because you commit to priors before fitting. The framework's configuration-based approach enforces this discipline.

What validation capabilities does the framework provide?

The framework supports multiple forms of validation:

  • Prior predictive checks: Before fitting, simulate data from your priors to verify they imply reasonable data distributions.
  • Posterior predictive checks: After fitting, simulate data from the posterior and compare to actual observations. Systematic discrepancies indicate model misspecification.
  • Out-of-sample prediction: Hold out recent periods and evaluate predictive accuracy on unseen data.
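  • Experimental calibration: Compare model-predicted effects to geo-experiment results when available.

import arviz as az

# Generate posterior predictive samples (assumed here to come back as an
# ArviZ InferenceData with posterior_predictive and observed_data groups)
ppc = model.sample_posterior_predictive(trace)

# Overlay the simulated datasets on the observed data
az.plot_ppc(ppc)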

Validation doesn't prevent specification shopping, but it creates accountability: if a specification-shopped model makes predictions that fail validation, you'll know.
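One simple out-of-sample check is interval coverage: on held-out periods, roughly 80% of actual values should fall inside the model's 80% predictive intervals. The sketch below uses simulated data as a stand-in for real predictive draws, and the function name is hypothetical.

```python
import random
import statistics


def interval_coverage(predictive_draws, actuals, level=0.80):
    """Share of held-out actuals inside the central `level` predictive interval.

    predictive_draws[t] holds the simulated values for held-out period t.
    """
    tail = round((1.0 - level) / 2 * 100)          # e.g. 10 for an 80% interval
    hits = 0
    for draws, actual in zip(predictive_draws, actuals):
        percentiles = statistics.quantiles(draws, n=100)
        lower, upper = percentiles[tail - 1], percentiles[100 - tail - 1]
        hits += lower <= actual <= upper
    return hits / len(actuals)


# Simulated stand-ins: actual sales vary with sd 10 around 100.
rng = random.Random(7)
actuals = [rng.gauss(100, 10) for _ in range(200)]
calibrated = [[rng.gauss(100, 10) for _ in range(500)] for _ in actuals]
overconfident = [[rng.gauss(100, 3) for _ in range(500)] for _ in actuals]

coverage_calibrated = interval_coverage(calibrated, actuals)
coverage_overconfident = interval_coverage(overconfident, actuals)
print(f"calibrated model:    {coverage_calibrated:.0%} of actuals inside 80% intervals")
print(f"overconfident model: {coverage_overconfident:.0%} of actuals inside 80% intervals")
```

Coverage far below the nominal level is the typical signature of a model whose reported uncertainty is too narrow, which is what specification shopping tends to produce.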

How does the framework communicate uncertainty visually?

Effective uncertainty visualization is critical for stakeholder communication. The framework provides:

  • Response curves with uncertainty bands: Show saturation curves with 80% and 94% credible intervals, so stakeholders see where we're confident vs. uncertain.
  • Contribution waterfall charts: Decompose sales into components with error bars showing contribution uncertainty.
  • ROI forest plots: Display channel ROIs with credible intervals, making it easy to see which channels are confidently above/below profitability thresholds.
  • Prior vs. posterior comparisons: Show how much the data moved beliefs from the prior, indicating data informativeness.

📊 Example: ROI Forest Plot with Decision Regions

This visualization makes decisions clear at a glance: Search and TV are confidently profitable (act), Display is uncertain (validate), and Print is confidently unprofitable (cut or reallocate).

Design Principle

Every visualization in the framework shows uncertainty by default. Hiding uncertainty requires explicit opt-out, not opt-in. This ensures that honest measurement is the path of least resistance.

Summary: Key Principles

For Analysts

  • Pre-specify models before seeing results
  • Report credible intervals, not just point estimates
  • Investigate unexpected results; don't iterate until they disappear
  • Recommend experiments when uncertainty is high
  • Use sensitivity analysis to show robustness (or lack thereof)

For Clients

  • Statistics quantifies uncertainty; it doesn't eliminate it
  • Credible intervals tell you how confident to be in estimates
  • Wide intervals mean "run an experiment before acting"
  • Narrow intervals mean "act with confidence"
  • Ask for uncertainty bounds on any recommendation

Red Flags in Any Methodology

  • Point estimates without uncertainty bounds
  • "We adjusted the model until the results made sense"
  • Negative coefficients that were "fixed"
  • No discussion of validation or out-of-sample testing
  • Year-over-year changes presented as real when they're within uncertainty