Bayesian variable selection can help identify which control variables matter—but it requires careful application to avoid undermining your causal estimates.
Variable selection is not a general-purpose model improvement technique. When applied to the wrong variables, it can systematically bias your media effect estimates—often in ways that make results look better while making them less accurate.
The key to using variable selection correctly is understanding the causal relationships in your data. Some variables only affect your outcome (precision controls), while others affect both your treatment and outcome (confounders).
Precision control: weather affects sales but doesn't influence media spend decisions. Safe to shrink.
Confounder: distribution affects both media spend and sales. Shrinking it biases media effects.
When you shrink a confounder toward zero, you don't remove its influence—you just reassign it to your media variables. This inflates media effects and leads to overconfident ROI estimates.
Before applying variable selection, classify every potential control variable. This classification should be done before looking at model results.
| Variable Type | Examples | Selection? | Reason |
|---|---|---|---|
| Precision Controls | Weather, gas prices, minor holidays, local events | ✓ OK | Affect outcome only, not media allocation |
| Confounders | Distribution/ACV, price, competitor media | ✗ Never | Affect both media AND outcome—shrinking biases estimates |
| Core Structure | Trend, seasonality, intercept | ✗ Never | Fundamental model components, always required |
| Mediators | Brand awareness, consideration | ✗ Never | On causal path—if included, use standard priors (never shrink) |
| Media Variables | TV spend, digital spend, etc. | ✗ Never | These are your treatments—never shrink toward zero |
Ask yourself: "Does this variable influence how we allocate our media budget?" If yes, it's likely a confounder and must be excluded from selection.
Traditional MMM practice often involves running many model specifications and selecting the one with "sensible" results. This specification shopping destroys statistical validity. Bayesian variable selection provides a principled alternative—but only when variables are correctly classified before analysis.
The problem with specification shopping: you're painting targets around arrows. The "good" results are an artifact of selection, not evidence of true effects.
The benefit of principled, pre-classified selection: valid inference. Uncertainty is quantified honestly, and results can be trusted for decision-making.
A particularly insidious form of specification shopping occurs when analysts iterate through adstock decay rates, lag structures, and saturation parameters until media coefficients become positive with high t-statistics. This practice is common but deeply problematic.
When you cycle through dozens of transformation specifications until one produces a positive, "significant" coefficient, you haven't discovered the true effect—you've built a result-producing machine. The process guarantees you'll find something that looks good, regardless of what's actually in the data.
The fundamental problem: you're learning nothing about your data. The "significant" result is an artifact of the search process, not evidence of a true effect. You could run this same procedure on pure noise and eventually find a specification that passes your threshold.
Reporting the t-statistic and p-value from your final "winning" specification creates a false appearance of rigor. The statistics are mathematically correct for that particular model—but they're meaningless for inference because they don't account for the selection process.
What it looks like: "TV coefficient = 0.15, t = 2.34, p = 0.019"
What actually happened: 47 specifications were tested. This was the first one where TV was positive and significant. The other 46 were quietly discarded.
This isn't transparency—it's the opposite of transparency. You're presenting the statistics from one specification while hiding the search that led to it. A truly transparent report would acknowledge: "We tested 47 transformation specifications. Here's what we found across all of them." (Of course, that report would reveal the fragility of the result.)
Selecting on t-statistics creates two distinct problems, not one:
Testing many specifications and keeping the "winner" inflates your false positive rate far beyond the nominal α = 0.05. With 50 specs tested, you have a 92% chance of finding something "significant" even when there's no true effect.
Since \(t = \frac{\hat{\beta}}{SE}\), selecting for high t-statistics means selecting for coefficients that were inflated by favorable noise. The "winning" estimate systematically overstates the true effect—often dramatically.
The winner's curse is especially pernicious because even when a true effect exists, you'll overestimate it. If the true TV effect is β = 0.05, random variation across specifications might produce estimates ranging from -0.02 to +0.12. By selecting the specification with the highest t-stat, you're systematically picking estimates from the upper tail of the noise distribution.
The cruel irony: The more specifications you test, the more "significant" and the more wrong your selected result becomes. Your reported coefficient is simultaneously more confident and more biased.
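A toy simulation makes the winner's curse concrete. The true effect, standard error, number of specifications, and the approximation of specifications as independent estimates are all illustrative assumptions, not values from a real model search.

```python
import numpy as np

rng = np.random.default_rng(0)

true_beta = 0.05   # assumed true media effect
se = 0.05          # assumed standard error of each specification's estimate
n_specs = 50       # specifications searched
n_sims = 10_000    # Monte Carlo repetitions

# Approximate each specification as an independent noisy estimate of the same effect.
estimates = rng.normal(true_beta, se, size=(n_sims, n_specs))
t_stats = estimates / se

# "Specification shopping": keep the estimate with the largest t-statistic.
winners = estimates[np.arange(n_sims), t_stats.argmax(axis=1)]

print(f"true effect:                 {true_beta:.3f}")
print(f"mean across all estimates:   {estimates.mean():.3f}")  # unbiased, ~0.05
print(f"mean of 'winning' estimates: {winners.mean():.3f}")    # systematically inflated
```

The selected estimate lands far above the true effect on average, precisely because it was chosen from the upper tail of the noise.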
If you test k specifications and report the one with p < 0.05, your actual false positive rate is approximately \(1 - (1 - 0.05)^k\).
Testing more specifications steadily inflates your actual error rate, and common MMM specification searches easily reach 50+ combinations, pushing the false positive rate above 90%.
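The inflation is easy to compute directly from the formula above, assuming independent tests at α = 0.05:

```python
alpha = 0.05
for k in (1, 5, 10, 20, 50, 100):
    fpr = 1 - (1 - alpha) ** k  # chance that at least one specification looks "significant"
    print(f"{k:>3} specifications tested -> {fpr:.0%} chance of a false positive")
```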
Choose adstock and saturation parameters based on theory or industry benchmarks before seeing how they affect coefficients. Document this in a pre-registration.
Put priors on adstock/saturation parameters and let the model estimate them jointly with coefficients, so that uncertainty in the transformations flows through to uncertainty in effects (see the sketch below).
If you must explore specifications, report results across all of them. Show the distribution of estimates, not just the "best" one.
If the effect is only positive under specific transformations, that's important information—it means the effect is uncertain. Report that uncertainty.
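A minimal PyMC sketch of the second remedy above: priors on the adstock decay and saturation parameters, estimated jointly with the coefficients. The synthetic data, maximum lag, and all prior choices are illustrative assumptions, not a recommended specification.

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt

# Hypothetical weekly data; in practice these come from your dataset.
rng = np.random.default_rng(0)
n, L = 104, 8                                   # L = maximum adstock lag (assumption)
media = rng.gamma(2.0, 1.0, size=n)             # spend, assumed scaled to O(1)
distribution = rng.normal(size=n)               # confounder
sales = 1 + 0.3 * media + 0.5 * distribution + rng.normal(scale=0.5, size=n)

# Pre-build a lag matrix so the decay rate can remain a free parameter in the model.
lag_mat = np.zeros((n, L))
for lag in range(L):
    lag_mat[lag:, lag] = media[: n - lag]

with pm.Model() as mmm:
    decay = pm.Beta("decay", 2, 2)                   # adstock retention rate in (0, 1)
    weights = decay ** pt.arange(L)
    adstocked = pt.dot(lag_mat, weights) / weights.sum()

    half_sat = pm.Gamma("half_sat", 2, 1)            # spend level giving half the max response
    saturated = adstocked / (adstocked + half_sat)   # simple Hill-type saturation

    intercept = pm.Normal("intercept", 0, 2)
    beta_media = pm.HalfNormal("beta_media", 1)      # media effect (the treatment of interest)
    beta_dist = pm.Normal("beta_dist", 0, 1)         # confounder: standard prior, never shrunk
    sigma = pm.HalfNormal("sigma", 1)

    mu = intercept + beta_media * saturated + beta_dist * distribution
    pm.Normal("sales", mu, sigma, observed=sales)

    idata = pm.sample()   # uncertainty in decay/half_sat flows into beta_media
```

Because decay and half_sat are sampled rather than fixed, the posterior for beta_media reflects how uncertain the transformation itself is.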
A classic specification shopping pattern is dropping confounder variables (like distribution) because they "reduce" media effects or have unexpected signs. But confounders affect both your media spending and your outcomes—dropping them doesn't remove their influence, it just hides it while biasing your estimates.
Variables included to control for confounding should not have their coefficients interpreted as causal effects. These coefficients may themselves be biased by other omitted variables.
In this DAG, the dashed paths represent unobserved confounding that biases the distribution coefficient but does not bias the media coefficient (our target). Even with its biased coefficient, distribution still blocks the backdoor path between media and sales, giving us unbiased media effect estimates.
Avoid making strategic recommendations about distribution based on its model coefficient—that coefficient is not a valid causal estimate.
When analysts drop distribution because "the sign doesn't make sense" or "it reduces media ROI," they're not finding the true media effect—they're creating a biased estimate that includes spurious correlation from the confounder. The inflated coefficient looks better but is wrong.
With the confounder included, we isolate the true causal effect of media (0.25). Distribution's effect on sales is properly attributed to distribution, not media.
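A toy simulation illustrates the mechanism. The data-generating numbers are assumptions, chosen only so that the true media effect matches the 0.25 quoted above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Distribution drives BOTH media spend and sales, making it a confounder by construction.
distribution = rng.normal(size=n)
media = 0.8 * distribution + rng.normal(scale=0.5, size=n)
sales = 0.25 * media + 0.6 * distribution + rng.normal(scale=0.5, size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

X_full = np.column_stack([np.ones(n), media, distribution])
X_drop = np.column_stack([np.ones(n), media])

print(f"media coef, confounder included: {ols(X_full, sales)[1]:.2f}")  # ~0.25
print(f"media coef, confounder dropped:  {ols(X_drop, sales)[1]:.2f}")  # inflated well above 0.25
```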
Mediators (like brand awareness) lie on the causal path between media and sales. When you're interested in the total effect of media, you should not control for mediators—doing so would block the indirect pathway and underestimate media's true impact.
Want total effect? Don't include mediators in the model. The coefficient on media captures both direct and indirect effects.
Want to decompose effects? Include mediators to separate direct from indirect effects—but understand you're now estimating different quantities.
The key insight: controlling for a mediator isn't "wrong"—it just answers a different question. Specification shopping happens when analysts switch between these approaches based on which gives "better" numbers.
To estimate the total effect of media, don't control for mediators like awareness. Your media coefficient (0.40) captures the full causal impact through all pathways.
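The same kind of toy simulation (assumed numbers, with the total effect set to the 0.40 quoted above) shows how including a mediator changes the estimand rather than improving the model.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

media = rng.normal(size=n)
awareness = 0.5 * media + rng.normal(scale=0.5, size=n)                  # mediator
sales = 0.25 * media + 0.30 * awareness + rng.normal(scale=0.5, size=n)  # total effect = 0.25 + 0.5*0.30 = 0.40

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

total = ols(np.column_stack([np.ones(n), media]), sales)[1]
direct = ols(np.column_stack([np.ones(n), media, awareness]), sales)[1]

print(f"media coef, mediator excluded (total effect):  {total:.2f}")   # ~0.40
print(f"media coef, mediator included (direct effect): {direct:.2f}")  # ~0.25
```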
The fundamental difference between specification shopping and principled analysis is when decisions are made. Pre-classification commits you to a causal structure before seeing results, preventing the temptation to adjust based on what "looks right."
| Variable Type | Affects | Action | Notes |
|---|---|---|---|
| Confounders | Both media spending AND sales | Always include with standard priors | If dropped: media effects biased upward |
| Mediators | Lie on the path from media to sales | Exclude for total effect; include to decompose | Decide BEFORE analysis which estimand you want |
| Precision controls | Sales only, not media allocation | Can apply Bayesian selection | If dropped: increased noise, wider CIs |
Document your variable classifications in a pre-analysis plan before fitting any models. Include:
- the classification of every control variable (confounder, mediator, or precision control) and the reasoning behind it
- which variables receive selection priors and which receive standard priors
- the assumed prior sparsity (expected number of relevant controls)
This commitment device prevents the unconscious drift toward "finding" results that confirm expectations.
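One lightweight way to make the commitment concrete is to write the plan down as data, version-controlled before any model is fit. The variable names and counts below are hypothetical.

```python
# Hypothetical pre-analysis plan, committed before model fitting.
PRE_ANALYSIS_PLAN = {
    "confounders": ["distribution_acv", "price", "competitor_media"],  # standard priors, never selected
    "mediators": ["brand_awareness"],                                  # excluded: target is the total effect
    "precision_controls": ["weather", "gas_prices", "minor_holidays"], # eligible for Bayesian selection
    "core_structure": ["trend", "seasonality", "intercept"],           # always included
    "expected_relevant_controls": 3,                                   # prior sparsity, from domain knowledge
}
```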
Different Bayesian priors create different shrinkage profiles. Understanding these helps you choose the right method for your problem.
| Prior | Best for | Shrinkage profile |
|---|---|---|
| Horseshoe | Sparse signals | Strong shrinkage of small effects, minimal shrinkage of large effects. Good when few controls matter. |
| Spike-and-slab | Clear selection | Explicit inclusion probabilities. Variables are "in" or "out" with posterior probabilities. |
| Ridge (Gaussian) | Many small effects | Uniform shrinkage across coefficients. Good when many controls have small effects. |
| Regularized horseshoe | Bounded effects | Horseshoe with a slab component. Prevents unrealistically large coefficients. |
Bayesian variable selection doesn't make hard yes/no decisions. Instead, it produces posterior inclusion probabilities (PIPs)—the probability that each variable has a non-zero effect.
Higher inclusion probabilities indicate stronger evidence that the variable has a real effect.
Variables with larger estimated effects tend to have higher inclusion probabilities, but uncertainty matters too.
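When the model has an explicit spike-and-slab formulation, the PIP is simply the posterior mean of the inclusion indicator. A minimal sketch using a stand-in posterior; in practice `idata` would come from your fitted model, and the indicator name ("gamma") is hypothetical.

```python
import numpy as np
import arviz as az

# Stand-in posterior: 4 chains x 1000 draws of a Bernoulli inclusion indicator for 5 controls.
rng = np.random.default_rng(0)
idata = az.from_dict(posterior={"gamma": rng.integers(0, 2, size=(4, 1000, 5))})

# Posterior inclusion probability = posterior mean of each indicator.
pips = idata.posterior["gamma"].mean(dim=("chain", "draw"))
print(pips.values)
```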
Your choice of "expected number of relevant variables" affects results. Always check sensitivity to this assumption.
If your conclusions are highly sensitive to the prior sparsity assumption, this indicates weak identification. Report this uncertainty rather than picking the "best looking" result.
Document which variables are confounders vs. precision controls before seeing any results.
The "expected number of relevant controls" should reflect domain knowledge, not be tuned for fit.
Explicitly exclude all confounders from variable selection. They need standard priors.
Show full posterior inclusion probabilities, not just binary "selected" decisions.
Vary prior sparsity and see how results change. Report this uncertainty.
Low PIP ≠ proof of no effect. It means the data doesn't provide strong evidence that the effect is non-zero.