Variable Selection in MMM

Bayesian variable selection can help identify which control variables matter—but it requires careful application to avoid undermining your causal estimates.

⚠️ Critical Warning: Misuse Can Bias Your Results

Variable selection is not a general-purpose model improvement technique. When applied to the wrong variables, it can systematically bias your media effect estimates—often in ways that make results look better while making them less accurate.

Understanding the Causal Structure

The key to using variable selection correctly is understanding the causal relationships in your data. Some variables only affect your outcome (precision controls), while others affect both your treatment and outcome (confounders).

✅ Precision Control (Safe to Select)

[Diagram: nodes Weather, Media, Sales; edge Weather → Sales only]

Weather affects sales but doesn't influence media spend decisions. Safe to shrink.

❌ Confounder (Never Select)

[Diagram: nodes Distribution, Media, Sales; edges Distribution → Media AND Distribution → Sales]

Distribution affects both media spend and sales. Shrinking biases media effects.

The Confounder Problem Visualized

When you shrink a confounder toward zero, you don't remove its influence—you just reassign it to your media variables. This inflates media effects and leads to overconfident ROI estimates.
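
A minimal numeric sketch of that reassignment (all values below are illustrative, not from any real model): we penalize only the confounder's coefficient, ridge-style, and watch the media coefficient absorb its effect as the penalty grows.

```python
# Illustrative only: shrinking ONLY the confounder's coefficient toward zero
# reassigns its explanatory power to the correlated media variable.
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
distribution = rng.normal(size=n)                              # confounder
media = 0.5 * distribution + rng.normal(scale=0.85, size=n)    # spend tracks distribution
sales = 0.25 * media + 0.40 * distribution + rng.normal(scale=0.5, size=n)

X = np.column_stack([media, distribution])
for penalty in (0.0, 5e3, 1e6):
    P = np.diag([0.0, penalty])             # penalize ONLY the confounder coefficient
    beta = np.linalg.solve(X.T @ X + P, X.T @ sales)
    print(f"penalty={penalty:>8g}  media={beta[0]:.2f}  distribution={beta[1]:.2f}")
# As the penalty grows, the distribution coefficient is shrunk to ~0 and the
# media coefficient inflates from ~0.25 (true) toward ~0.45 (biased).
```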

Interactive: See What Happens When You Shrink a Confounder


Variable Classification Guide

Before applying variable selection, classify every potential control variable. This classification should be done before looking at model results.

| Variable Type | Examples | Selection? | Reason |
|---|---|---|---|
| Precision Controls | Weather, gas prices, minor holidays, local events | ✓ OK | Affect outcome only, not media allocation |
| Confounders | Distribution/ACV, price, competitor media | ✗ Never | Affect both media AND outcome—shrinking biases estimates |
| Core Structure | Trend, seasonality, intercept | ✗ Never | Fundamental model components, always required |
| Mediators | Brand awareness, consideration | ✗ Never | On causal path—if included, use standard priors (never shrink) |
| Media Variables | TV spend, digital spend, etc. | ✗ Never | These are your treatments—never shrink toward zero |

💡 The Key Question

Ask yourself: "Does this variable influence how we allocate our media budget?" If yes, it's likely a confounder and must be excluded from selection.

Why This Beats Specification Shopping

Traditional MMM practice often involves running many model specifications and selecting the one with "sensible" results. This specification shopping destroys statistical validity. Bayesian variable selection provides a principled alternative—but only when variables are correctly classified before analysis.

❌ Specification Shopping

  1. Run model with all variables
  2. See negative media coefficient 😟
  3. Remove variables until coefficient is positive
  4. Justify removals post-hoc as "not significant"
  5. Report the "good" model only

Problem: You're painting targets around arrows. The "good" results are an artifact of selection, not evidence of true effects.

✅ Principled Bayesian Selection

  1. Classify all variables before seeing data
  2. Protect confounders & mediators from selection
  3. Apply selection only to precision controls
  4. Run model once with pre-specified structure
  5. Report full posterior with uncertainty

Benefit: Valid inference. Uncertainty is quantified honestly. Results can be trusted for decision-making.

⚠️ The Transformation Shopping Trap: "Optimizing" Adstock, Lag, and Saturation

A particularly insidious form of specification shopping occurs when analysts iterate through adstock decay rates, lag structures, and saturation parameters until media coefficients become positive with high t-statistics. This practice is common but deeply problematic.

❌ The "Optimization" Loop

  1. Fit model with initial transformation parameters
  2. Media coefficient is negative or insignificant 😟
  3. Try different adstock decay (1.0 → 0.9 → 0.8 → 0.7...)
  4. Try different lag structure (0 weeks → 2 weeks → 4 weeks...)
  5. Try different saturation curve (adjust λ, K, S...)
  6. Stop when coefficient is positive with t > 1.0 🎯
  7. Report the t-stat and p-value as if this was the only model

This Is a Machine for Producing Results—Not Learning

When you cycle through dozens of transformation specifications until one produces a positive, "significant" coefficient, you haven't discovered the true effect—you've built a result-producing machine. The process guarantees you'll find something that looks good, regardless of what's actually in the data.

The fundamental problem: you're learning nothing about your data. The "significant" result is an artifact of the search process, not evidence of a true effect. You could run this same procedure on pure noise and eventually find a specification that passes your threshold.

The False Transparency of Reported Statistics

Reporting the t-statistic and p-value from your final "winning" specification creates a false appearance of rigor. The statistics are mathematically correct for that particular model—but they're meaningless for inference because they don't account for the selection process.

What it looks like: "TV coefficient = 0.15, t = 2.34, p = 0.019"
What actually happened: 47 specifications were tested. This was the first one where TV was positive and significant. The other 46 were quietly discarded.

This isn't transparency—it's the opposite of transparency. You're presenting the statistics from one specification while hiding the search that led to it. A truly transparent report would acknowledge: "We tested 47 transformation specifications. Here's what we found across all of them." (Of course, that report would reveal the fragility of the result.)

Two Problems: Inflated False Positives AND Biased Coefficients

Selecting on t-statistics creates two distinct problems, not one:

1. False Positive Inflation

Testing many specifications and keeping the "winner" inflates your false positive rate far beyond the nominal α = 0.05. With 50 specs tested, you have a 92% chance of finding something "significant" even when there's no true effect.

2. Upward Coefficient Bias (Winner's Curse)

Since \(t = \frac{\hat{\beta}}{SE}\), selecting for high t-statistics means selecting for coefficients that were inflated by favorable noise. The "winning" estimate systematically overstates the true effect—often dramatically.

The winner's curse is especially pernicious because even when a true effect exists, you'll overestimate it. If the true TV effect is β = 0.05, random variation across specifications might produce estimates ranging from -0.02 to +0.12. By selecting the specification with the highest t-stat, you're systematically picking estimates from the upper tail of the noise distribution.

The cruel irony: The more specifications you test, the more "significant" and the more wrong your selected result becomes. Your reported coefficient is simultaneously more confident and more biased.
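
A small simulation makes the winner's curse concrete. It treats each specification's estimate as an independent noisy draw around the true effect, which is a simplification (real specifications are correlated), but the direction of the bias is the same:

```python
# Illustrative winner's-curse simulation: selecting the specification with the
# largest t-statistic systematically overstates the true effect.
import numpy as np

rng = np.random.default_rng(7)
true_beta, se = 0.05, 0.04          # true effect and standard error (illustrative)
n_specs, n_sims = 50, 10_000        # specs searched per analysis, analyses simulated

estimates = rng.normal(true_beta, se, size=(n_sims, n_specs))
t_stats = estimates / se
winners = estimates[np.arange(n_sims), t_stats.argmax(axis=1)]

print(f"true effect:              {true_beta:.3f}")
print(f"mean over all specs:      {estimates.mean():.3f}")   # ~unbiased
print(f"mean of selected winners: {winners.mean():.3f}")     # inflated well above 0.05
```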

The Math: How Bad Is It?

If you test k specifications and report the one with p < 0.05, your actual false positive rate is approximately:

$$P(\text{at least one false positive}) \approx 1 - (1 - 0.05)^k$$

Interactive: False Positive Rate vs. Specifications Tested

Drag the slider to see how testing more specifications inflates your actual error rate. Common MMM specification searches easily reach 50+ combinations.

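The same calculation takes only a few lines (α = 0.05, matching the formula above):

```python
# Probability of at least one "significant" result when k independent
# specifications are tested at alpha = 0.05 and there is no true effect.
alpha = 0.05
for k in (1, 5, 10, 20, 50, 100):
    rate = 1 - (1 - alpha) ** k
    print(f"{k:>3} specs tested -> {rate:.0%} chance of a false positive")
# 50 specs -> ~92%, matching the figure quoted earlier in this section.
```
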
What To Do Instead
Pre-specify transformations

Choose adstock and saturation parameters based on theory or industry benchmarks before seeing how they affect coefficients. Document this in a pre-registration.

Use Bayesian priors on transformations

Put priors on adstock/saturation parameters and let the model estimate them jointly with coefficients. Uncertainty in transformations flows through to uncertainty in effects.
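
A minimal sketch of this approach, assuming PyMC; the data, variable names, and prior choices below are illustrative placeholders rather than recommendations:

```python
# Sketch: put priors on adstock and saturation parameters and estimate them
# jointly with the media coefficient, instead of grid-searching them by hand.
import numpy as np
import pymc as pm
import pytensor.tensor as pt

rng = np.random.default_rng(0)
n_weeks, max_lag = 104, 8
tv_spend = rng.gamma(2.0, 1.0, size=n_weeks)                   # fake spend series
sales = 2.0 + 0.3 * tv_spend + rng.normal(0, 0.5, n_weeks)     # fake outcome

with pm.Model() as mmm:
    decay = pm.Beta("adstock_decay", alpha=3, beta=3)           # geometric carryover
    half_sat = pm.Gamma("half_saturation", alpha=2, beta=1)     # saturation midpoint

    # Geometric adstock as a finite convolution over max_lag weeks
    weights = decay ** pt.arange(max_lag - 1, -1, -1)           # oldest lag first
    weights = weights / weights.sum()
    x_pad = pt.concatenate([pt.zeros(max_lag - 1), pt.as_tensor(tv_spend)])
    adstocked = pt.stack([pt.dot(x_pad[t:t + max_lag], weights) for t in range(n_weeks)])

    saturated = adstocked / (adstocked + half_sat)              # simple Hill-type curve

    intercept = pm.Normal("intercept", 0, 5)
    beta_tv = pm.HalfNormal("beta_tv", 1.0)                     # media effect
    sigma = pm.HalfNormal("sigma", 1.0)
    pm.Normal("obs", intercept + beta_tv * saturated, sigma, observed=sales)

    idata = pm.sample(1000, tune=1000, target_accept=0.9)
# Posterior uncertainty in decay and half_sat now propagates into beta_tv,
# rather than being hidden by a hand-picked "winning" specification.
```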

Report sensitivity

If you must explore specifications, report results across all of them. Show the distribution of estimates, not just the "best" one.

Embrace uncertainty honestly

If the effect is only positive under specific transformations, that's important information—it means the effect is uncertain. Report that uncertainty.

The Confounder Problem: Why Dropping Variables Biases Effects

A classic specification shopping pattern is dropping confounder variables (like distribution) because they "reduce" media effects or have unexpected signs. But confounders affect both your media spending and your outcomes—dropping them doesn't remove their influence, it just hides it while biasing your estimates.

✓ Confounder Included (Correct)

[Diagram: Distribution included as a control; estimated media effect = 0.25 ✓, an unbiased estimate of the media effect]

✗ Confounder Dropped (Biased)

[Diagram: Distribution dropped; estimated media "effect" = 0.45 ⚠️, 80% inflated because the confounding is absorbed as spurious correlation]

📋 Important: Don't Over-Interpret Confounder Coefficients

Variables included to control for confounding should not have their coefficients interpreted as causal effects. These coefficients may themselves be biased by other omitted variables.

[Diagram: nodes Economic Factor (U), Distribution, Media, Sales; dashed edges U → Distribution and U → Sales; Distribution → Media; Media → Sales]

In this DAG:

  • Distribution confounds media → we must include it
  • An unobserved economic factor (U) affects both distribution and sales
  • Distribution's coefficient captures both its true effect and the confounding from U
  • This may produce a negative or unexpected sign on distribution

The dashed paths show unobserved confounding that biases the distribution coefficient—but does not bias the media coefficient (our target).

Concrete Example: Suppose distribution shows a coefficient of −0.15 in your model. This negative sign doesn't mean "more distribution causes fewer sales." More likely:
  • An economic downturn (U) forced retailers to expand distribution to maintain volume
  • The same downturn independently reduced consumer spending
  • The model attributes this coincident decline to distribution

✓ Still Include Distribution

Even with its biased coefficient, distribution blocks the backdoor path to media, giving us unbiased media effect estimates.

✗ Don't Over-Interpret

Avoid making strategic recommendations about distribution based on its model coefficient—that coefficient is not a valid causal estimate.
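
A small simulation of this DAG (effect sizes chosen only to mimic the story above) shows both points at once:

```python
# Illustrative: an unobserved factor U biases the distribution coefficient
# (it can even flip its sign) while the media coefficient stays approximately
# unbiased, because conditioning on distribution blocks the backdoor path to media.
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
U = rng.normal(size=n)                                         # unobserved economic factor
distribution = 0.7 * U + rng.normal(scale=0.7, size=n)         # downturn expands distribution
media = 0.5 * distribution + rng.normal(scale=0.9, size=n)     # distribution drives spend
sales = 0.25 * media + 0.10 * distribution - 0.60 * U + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), media, distribution])         # U is NOT observed
beta = np.linalg.lstsq(X, sales, rcond=None)[0]
print(f"media coefficient:        {beta[1]:+.2f}   (true causal effect: +0.25)")
print(f"distribution coefficient: {beta[2]:+.2f}   (true causal effect: +0.10, biased negative by U)")
```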

⚠️ This Is What Specification Shopping Looks Like

When analysts drop distribution because "the sign doesn't make sense" or "it reduces media ROI," they're not finding the true media effect—they're creating a biased estimate that includes spurious correlation from the confounder. The inflated coefficient looks better but is wrong.

Interactive: What Happens When You Drop a Confounder

📊 Interpretation

With the confounder included, we isolate the true causal effect of media (0.25). Distribution's effect on sales is properly attributed to distribution, not media.

Mediators and Total Effects

Mediators (like brand awareness) lie on the causal path between media and sales. When you're interested in the total effect of media, you should not control for mediators—doing so would block the indirect pathway and underestimate media's true impact.

✓ Total Effect (Don't Control for Mediator)

[Diagram: Media → Awareness → Sales plus Media → Sales; Total Effect = 0.40 (Direct + Indirect, both pathways open)]

⚠️ Direct Effect Only (Control for Mediator)

[Diagram: Media → Awareness (CONTROLLED) → Sales blocked; Media → Sales remains; Direct Effect Only = 0.20 (indirect pathway blocked by controlling)]

💡 When to Control for Mediators

Want total effect? Don't include mediators in the model. The coefficient on media captures both direct and indirect effects.
Want to decompose effects? Include mediators to separate direct from indirect effects—but understand you're now estimating different quantities.

The key insight: controlling for a mediator isn't "wrong"—it just answers a different question. Specification shopping happens when analysts switch between these approaches based on which gives "better" numbers.
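
A compact simulation of the two estimands, with illustrative effect sizes chosen to echo the 0.40 and 0.20 figures above:

```python
# Illustrative: the same data gives a total effect of ~0.40 when the mediator is
# excluded and a direct effect of ~0.20 when it is controlled for.
import numpy as np

rng = np.random.default_rng(11)
n = 20_000
media = rng.normal(size=n)
awareness = 0.5 * media + rng.normal(scale=0.8, size=n)        # mediator
sales = 0.20 * media + 0.40 * awareness + rng.normal(scale=0.5, size=n)

def media_coef(*cols):
    """Return the media coefficient from an OLS regression of sales on the given columns."""
    X = np.column_stack([np.ones(n), *cols])
    return np.linalg.lstsq(X, sales, rcond=None)[0][1]

print(f"total effect  (mediator excluded):   {media_coef(media):.2f}")             # ~0.40
print(f"direct effect (mediator controlled): {media_coef(media, awareness):.2f}")  # ~0.20
```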

Interactive: Total Effect vs. Direct Effect

✓ Correct Approach

To estimate the total effect of media, don't control for mediators like awareness. Your media coefficient (0.40) captures the full causal impact through all pathways.

Why Pre-Classification Matters

The fundamental difference between specification shopping and principled analysis is when decisions are made. Pre-classification commits you to a causal structure before seeing results, preventing the temptation to adjust based on what "looks right."

🎯 Confounders

Affect: Both media spending AND sales

Action: Always include with standard priors

If dropped: Media effects biased upward

🔗 Mediators

Role: Lie on the causal path from media to sales

Action: Exclude for total effect; include to decompose

Key: Decide BEFORE analysis which you want

🎚️ Precision Controls

Affect: Sales only, not media allocation

Action: Can apply Bayesian selection

If dropped: Increased noise, wider CIs

✓ The Pre-Registration Principle

Document your variable classifications in a pre-analysis plan before fitting any models. Include:

  • Which variables are confounders and why (cite business process)
  • Which variables are mediators and the causal pathway
  • Which variables are precision controls eligible for selection
  • Your prior beliefs about sparsity (expected number of relevant controls)

This commitment device prevents the unconscious drift toward "finding" results that confirm expectations.

How Different Priors Shrink Coefficients

Different Bayesian priors create different shrinkage profiles. Understanding these helps you choose the right method for your problem.

🐴

Horseshoe

Best for: Sparse signals

Strong shrinkage of small effects, minimal shrinkage of large effects. Good when few controls matter.

📊

Spike-and-Slab

Best for: Clear selection

Explicit inclusion probabilities. Variables are "in" or "out" with posterior probabilities.

📐

Bayesian LASSO

Best for: Many small effects

Uniform shrinkage across coefficients. Good when many controls have small effects.

🇫🇮

Regularized (Finnish) Horseshoe

Best for: Bounded effects

The horseshoe with an added slab component. Prevents unrealistically large coefficients.
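
For concreteness, here is a hedged sketch of a regularized horseshoe applied only to precision controls, assuming PyMC; the data and hyperparameters are placeholders, and the confounder and media coefficients deliberately keep standard priors:

```python
# Sketch: regularized-horseshoe shrinkage on candidate precision controls only.
# Confounder and media coefficients use ordinary (non-sparsifying) priors.
import numpy as np
import pymc as pm

rng = np.random.default_rng(1)
n, p = 104, 10
X_controls = rng.normal(size=(n, p))                 # candidate precision controls
distribution = rng.normal(size=n)                    # confounder (protected)
media = 0.5 * distribution + rng.normal(size=n)
y = 0.25 * media + 0.4 * distribution + 0.3 * X_controls[:, 0] + rng.normal(size=n)

with pm.Model() as model:
    # Regularized horseshoe (global-local shrinkage with a slab) on controls
    tau = pm.HalfStudentT("tau", nu=3, sigma=0.1)                 # global scale
    lam = pm.HalfStudentT("lam", nu=3, sigma=1.0, shape=p)        # local scales
    c2 = pm.InverseGamma("c2", alpha=2, beta=2)                   # slab variance
    lam_reg = pm.math.sqrt(c2) * lam / pm.math.sqrt(c2 + tau**2 * lam**2)
    beta_ctrl = pm.Normal("beta_controls", 0, tau * lam_reg, shape=p)

    # Protected variables: never shrunk toward zero
    beta_dist = pm.Normal("beta_distribution", 0, 1)
    beta_media = pm.Normal("beta_media", 0, 1)

    mu = (pm.math.dot(X_controls, beta_ctrl)
          + beta_dist * distribution + beta_media * media)
    sigma = pm.HalfNormal("sigma", 1)
    pm.Normal("obs", mu, sigma, observed=y)

    idata = pm.sample(target_accept=0.95)
```

Only `beta_controls` receives the sparsifying prior; everything the causal analysis depends on stays protected.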

Understanding Posterior Inclusion Probabilities

Bayesian variable selection doesn't make hard yes/no decisions. Instead, it produces posterior inclusion probabilities (PIPs)—the probability that each variable has a non-zero effect.

Example: Control Variable Selection Results

Higher inclusion probabilities indicate stronger evidence that the variable has a real effect.

How to Interpret These Results

  • Weather (Rain Days) — PIP: 0.92 (strong evidence)
  • Holiday Indicator — PIP: 0.87 (strong evidence)
  • Local Sports Events — PIP: 0.64 (moderate evidence)
  • Gas Price — PIP: 0.31 (weak evidence)
  • Unemployment Rate — PIP: 0.18 (little evidence)
  • Stock Market Index — PIP: 0.08 (effectively excluded)
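
As a sketch of where numbers like these come from: for spike-and-slab models the PIP is the posterior mean of each variable's inclusion indicator, and for continuous shrinkage priors a common proxy is the share of posterior draws where the coefficient is practically non-zero (the draws below are made up for illustration):

```python
# A PIP is the posterior frequency with which a variable is "in" the model.
import numpy as np

rng = np.random.default_rng(5)
names = ["weather", "holiday", "gas_price"]

# Stand-in for spike-and-slab output: (n_draws, n_vars) inclusion indicators
inclusion_draws = rng.binomial(1, p=[0.92, 0.87, 0.31], size=(4000, 3))
pips = inclusion_draws.mean(axis=0)

# Proxy for shrinkage priors: fraction of coefficient draws beyond a small threshold
beta_draws = rng.normal(loc=[0.3, 0.25, 0.02], scale=0.1, size=(4000, 3))
proxy_pips = (np.abs(beta_draws) > 0.05).mean(axis=0)

for name, pip, proxy in zip(names, pips, proxy_pips):
    print(f"{name:<10} PIP={pip:.2f}   shrinkage-proxy={proxy:.2f}")
```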

Relationship: Effect Size vs. Inclusion Probability

Variables with larger estimated effects tend to have higher inclusion probabilities, but uncertainty matters too: a modest effect estimated precisely can earn a higher PIP than a large effect estimated noisily.

Prior Sensitivity Analysis

Your choice of "expected number of relevant variables" affects results. Always check sensitivity to this assumption.

Interactive: How Prior Sparsity Affects PIPs


⚠️ If Results Change Dramatically

If your conclusions are highly sensitive to the prior sparsity assumption, this indicates weak identification. Report this uncertainty rather than picking the "best looking" result.

Best Practices

Before Fitting the Model

📋 Classify All Variables

Document which variables are confounders vs. precision controls before seeing any results.

🎯 Set Priors Thoughtfully

The "expected number of relevant controls" should reflect domain knowledge, not be tuned for fit.

🚫 Exclude Confounders

Explicitly exclude all confounders from variable selection. They need standard priors.

After Fitting the Model

📊 Report PIPs

Show full posterior inclusion probabilities, not just binary "selected" decisions.

🔄 Check Sensitivity

Vary prior sparsity and see how results change. Report this uncertainty.

⚖️ Don't Overinterpret

Low PIP ≠ proof of no effect. It means the data doesn't provide strong evidence.

⚠️ What NOT to Do

  • Don't tune hyperparameters to improve model fit or coefficient signs
  • Don't apply selection to confounders, mediators, or media variables
  • Don't use selection as a substitute for thoughtful model specification
  • Don't run selection, see "excluded" variables, remove them and refit without selection