Bayesian variable selection can help identify which control variables matter—but it requires careful application to avoid undermining your causal estimates.
Variable selection is not a general-purpose model improvement technique. When applied to the wrong variables, it can systematically bias your media effect estimates—often in ways that make results look better while making them less accurate.
The key to using variable selection correctly is understanding the causal relationships in your data. Some variables only affect your outcome (precision controls), while others affect both your treatment and outcome (confounders).
Precision control: weather affects sales but doesn't influence media spend decisions. Safe to shrink.
Confounder: distribution affects both media spend and sales. Shrinking it biases media effects.
When you shrink a confounder toward zero, you don't remove its influence—you just reassign it to your media variables. This inflates media effects and leads to overconfident ROI estimates.
Before applying variable selection, classify every potential control variable. This classification should be done before looking at model results.
| Variable Type | Examples | Selection? | Reason |
|---|---|---|---|
| Precision Controls | Weather, gas prices, minor holidays, local events | ✓ OK | Affect outcome only, not media allocation |
| Confounders | Distribution/ACV, price, competitor media | ✗ Never | Affect both media AND outcome—shrinking biases estimates |
| Core Structure | Trend, seasonality, intercept | ✗ Never | Fundamental model components, always required |
| Mediators | Brand awareness, consideration | ✗ Never | On causal path—if included, use standard priors (never shrink) |
| Media Variables | TV spend, digital spend, etc. | ✗ Never | These are your treatments—never shrink toward zero |
Ask yourself: "Does this variable influence how we allocate our media budget?" If yes, it's likely a confounder and must be excluded from selection.
Traditional MMM practice often involves running many model specifications and selecting the one with "sensible" results. This specification shopping destroys statistical validity. Bayesian variable selection provides a principled alternative—but only when variables are correctly classified before analysis.
The problem with specification shopping: you're painting targets around arrows. The "good" results are an artifact of selection, not evidence of true effects.
The benefit of principled, pre-classified selection: valid inference. Uncertainty is quantified honestly, and results can be trusted for decision-making.
A particularly insidious form of specification shopping occurs when analysts iterate through adstock decay rates, lag structures, and saturation parameters until media coefficients become positive with high t-statistics. This practice is common but deeply problematic.
When you cycle through dozens of transformation specifications until one produces a positive, "significant" coefficient, you haven't discovered the true effect—you've built a result-producing machine. The process guarantees you'll find something that looks good, regardless of what's actually in the data.
The fundamental problem: you're learning nothing about your data. The "significant" result is an artifact of the search process, not evidence of a true effect. You could run this same procedure on pure noise and eventually find a specification that passes your threshold.
Reporting the t-statistic and p-value from your final "winning" specification creates a false appearance of rigor. The statistics are mathematically correct for that particular model—but they're meaningless for inference because they don't account for the selection process.
What it looks like: "TV coefficient = 0.15, t = 2.34, p = 0.019"
What actually happened: 47 specifications were tested. This was the first one where TV was positive and significant. The other 46 were quietly discarded.
This isn't transparency—it's the opposite of transparency. You're presenting the statistics from one specification while hiding the search that led to it. A truly transparent report would acknowledge: "We tested 47 transformation specifications. Here's what we found across all of them." (Of course, that report would reveal the fragility of the result.)
Selecting on t-statistics creates two distinct problems, not one:
Testing many specifications and keeping the "winner" inflates your false positive rate far beyond the nominal α = 0.05. With 50 specs tested, you have a 92% chance of finding something "significant" even when there's no true effect.
Since \(t = \frac{\hat{\beta}}{SE}\), selecting for high t-statistics means selecting for coefficients that were inflated by favorable noise. The "winning" estimate systematically overstates the true effect—often dramatically.
The winner's curse is especially pernicious because even when a true effect exists, you'll overestimate it. If the true TV effect is β = 0.05, random variation across specifications might produce estimates ranging from -0.02 to +0.12. By selecting the specification with the highest t-stat, you're systematically picking estimates from the upper tail of the noise distribution.
The cruel irony: The more specifications you test, the more "significant" and the more wrong your selected result becomes. Your reported coefficient is simultaneously more confident and more biased.
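A toy simulation makes the winner's curse concrete. The true effect, standard error, number of specifications, and the approximation of specifications as independent estimates are all illustrative assumptions, not values from a real model search.

```python
import numpy as np

rng = np.random.default_rng(0)

true_beta = 0.05   # assumed true media effect
se = 0.05          # assumed standard error of each specification's estimate
n_specs = 50       # specifications searched
n_sims = 10_000    # Monte Carlo repetitions

# Approximate each specification as an independent noisy estimate of the same effect.
estimates = rng.normal(true_beta, se, size=(n_sims, n_specs))
t_stats = estimates / se

# "Specification shopping": keep the estimate with the largest t-statistic.
winners = estimates[np.arange(n_sims), t_stats.argmax(axis=1)]

print(f"true effect:                 {true_beta:.3f}")
print(f"mean across all estimates:   {estimates.mean():.3f}")  # unbiased, ~0.05
print(f"mean of 'winning' estimates: {winners.mean():.3f}")    # systematically inflated
```

The selected estimate lands far above the true effect on average, precisely because it was chosen from the upper tail of the noise.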
If you test k specifications and report the one with p < 0.05, your actual false positive rate is approximately \(1 - (1 - 0.05)^k\).
Testing more specifications steadily inflates your actual error rate, and common MMM specification searches easily reach 50+ combinations, pushing the false positive rate above 90%.
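The inflation is easy to compute directly from the formula above, assuming independent tests at α = 0.05:

```python
alpha = 0.05
for k in (1, 5, 10, 20, 50, 100):
    fpr = 1 - (1 - alpha) ** k  # chance that at least one specification looks "significant"
    print(f"{k:>3} specifications tested -> {fpr:.0%} chance of a false positive")
```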
Choose adstock and saturation parameters based on theory or industry benchmarks before seeing how they affect coefficients. Document this in a pre-registration.
Put priors on adstock/saturation parameters and let the model estimate them jointly with coefficients, so that uncertainty in the transformations flows through to uncertainty in effects (see the sketch below).
If you must explore specifications, report results across all of them. Show the distribution of estimates, not just the "best" one.
If the effect is only positive under specific transformations, that's important information—it means the effect is uncertain. Report that uncertainty.
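A minimal PyMC sketch of the second remedy above: priors on the adstock decay and saturation parameters, estimated jointly with the coefficients. The synthetic data, maximum lag, and all prior choices are illustrative assumptions, not a recommended specification.

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt

# Hypothetical weekly data; in practice these come from your dataset.
rng = np.random.default_rng(0)
n, L = 104, 8                                   # L = maximum adstock lag (assumption)
media = rng.gamma(2.0, 1.0, size=n)             # spend, assumed scaled to O(1)
distribution = rng.normal(size=n)               # confounder
sales = 1 + 0.3 * media + 0.5 * distribution + rng.normal(scale=0.5, size=n)

# Pre-build a lag matrix so the decay rate can remain a free parameter in the model.
lag_mat = np.zeros((n, L))
for lag in range(L):
    lag_mat[lag:, lag] = media[: n - lag]

with pm.Model() as mmm:
    decay = pm.Beta("decay", 2, 2)                   # adstock retention rate in (0, 1)
    weights = decay ** pt.arange(L)
    adstocked = pt.dot(lag_mat, weights) / weights.sum()

    half_sat = pm.Gamma("half_sat", 2, 1)            # spend level giving half the max response
    saturated = adstocked / (adstocked + half_sat)   # simple Hill-type saturation

    intercept = pm.Normal("intercept", 0, 2)
    beta_media = pm.HalfNormal("beta_media", 1)      # media effect (the treatment of interest)
    beta_dist = pm.Normal("beta_dist", 0, 1)         # confounder: standard prior, never shrunk
    sigma = pm.HalfNormal("sigma", 1)

    mu = intercept + beta_media * saturated + beta_dist * distribution
    pm.Normal("sales", mu, sigma, observed=sales)

    idata = pm.sample()   # uncertainty in decay/half_sat flows into beta_media
```

Because decay and half_sat are sampled rather than fixed, the posterior for beta_media reflects how uncertain the transformation itself is.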
A classic specification shopping pattern is dropping confounder variables (like distribution) because they "reduce" media effects or have unexpected signs. But confounders affect both your media spending and your outcomes—dropping them doesn't remove their influence, it just hides it while biasing your estimates.
Variables included to control for confounding should not have their coefficients interpreted as causal effects. These coefficients may themselves be biased by other omitted variables.
In this DAG, the dashed paths represent unobserved confounding that biases the distribution coefficient but does not bias the media coefficient (our target). Even with its biased coefficient, distribution still blocks the backdoor path between media and sales, giving us unbiased media effect estimates.
Avoid making strategic recommendations about distribution based on its model coefficient—that coefficient is not a valid causal estimate.
When analysts drop distribution because "the sign doesn't make sense" or "it reduces media ROI," they're not finding the true media effect—they're creating a biased estimate that includes spurious correlation from the confounder. The inflated coefficient looks better but is wrong.
With the confounder included, we isolate the true causal effect of media (0.25). Distribution's effect on sales is properly attributed to distribution, not media.
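A toy simulation illustrates the mechanism. The data-generating numbers are assumptions, chosen only so that the true media effect matches the 0.25 quoted above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Distribution drives BOTH media spend and sales, making it a confounder by construction.
distribution = rng.normal(size=n)
media = 0.8 * distribution + rng.normal(scale=0.5, size=n)
sales = 0.25 * media + 0.6 * distribution + rng.normal(scale=0.5, size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

X_full = np.column_stack([np.ones(n), media, distribution])
X_drop = np.column_stack([np.ones(n), media])

print(f"media coef, confounder included: {ols(X_full, sales)[1]:.2f}")  # ~0.25
print(f"media coef, confounder dropped:  {ols(X_drop, sales)[1]:.2f}")  # inflated well above 0.25
```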
Mediators (like brand awareness) lie on the causal path between media and sales. When you're interested in the total effect of media, you should not control for mediators—doing so would block the indirect pathway and underestimate media's true impact.
Want total effect? Don't include mediators in the model. The coefficient on media captures both direct and indirect effects.
Want to decompose effects? Include mediators to separate direct from indirect effects—but understand you're now estimating different quantities.
The key insight: controlling for a mediator isn't "wrong"—it just answers a different question. Specification shopping happens when analysts switch between these approaches based on which gives "better" numbers.
To estimate the total effect of media, don't control for mediators like awareness. Your media coefficient (0.40) captures the full causal impact through all pathways.
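The same kind of toy simulation (assumed numbers, with the total effect set to the 0.40 quoted above) shows how including a mediator changes the estimand rather than improving the model.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

media = rng.normal(size=n)
awareness = 0.5 * media + rng.normal(scale=0.5, size=n)                  # mediator
sales = 0.25 * media + 0.30 * awareness + rng.normal(scale=0.5, size=n)  # total effect = 0.25 + 0.5*0.30 = 0.40

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

total = ols(np.column_stack([np.ones(n), media]), sales)[1]
direct = ols(np.column_stack([np.ones(n), media, awareness]), sales)[1]

print(f"media coef, mediator excluded (total effect):  {total:.2f}")   # ~0.40
print(f"media coef, mediator included (direct effect): {direct:.2f}")  # ~0.25
```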
The fundamental difference between specification shopping and principled analysis is when decisions are made. Pre-classification commits you to a causal structure before seeing results, preventing the temptation to adjust based on what "looks right."
| Variable Type | Affects | Action | Notes |
|---|---|---|---|
| Confounders | Both media spending AND sales | Always include with standard priors | If dropped: media effects biased upward |
| Mediators | Lie on the path from media to sales | Exclude for total effect; include to decompose | Decide BEFORE analysis which estimand you want |
| Precision controls | Sales only, not media allocation | Can apply Bayesian selection | If dropped: increased noise, wider CIs |
Document your variable classifications in a pre-analysis plan before fitting any models. Include:
- the classification of every control variable (confounder, mediator, or precision control) and the reasoning behind it
- which variables receive selection priors and which receive standard priors
- the assumed prior sparsity (expected number of relevant controls)
This commitment device prevents the unconscious drift toward "finding" results that confirm expectations.
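One lightweight way to make the commitment concrete is to write the plan down as data, version-controlled before any model is fit. The variable names and counts below are hypothetical.

```python
# Hypothetical pre-analysis plan, committed before model fitting.
PRE_ANALYSIS_PLAN = {
    "confounders": ["distribution_acv", "price", "competitor_media"],  # standard priors, never selected
    "mediators": ["brand_awareness"],                                  # excluded: target is the total effect
    "precision_controls": ["weather", "gas_prices", "minor_holidays"], # eligible for Bayesian selection
    "core_structure": ["trend", "seasonality", "intercept"],           # always included
    "expected_relevant_controls": 3,                                   # prior sparsity, from domain knowledge
}
```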
Different Bayesian priors create different shrinkage profiles. Understanding these helps you choose the right method for your problem.
| Prior | Best for | Shrinkage profile |
|---|---|---|
| Horseshoe | Sparse signals | Strong shrinkage of small effects, minimal shrinkage of large effects. Good when few controls matter. |
| Spike-and-slab | Clear selection | Explicit inclusion probabilities. Variables are "in" or "out" with posterior probabilities. |
| Ridge (Gaussian) | Many small effects | Uniform shrinkage across coefficients. Good when many controls have small effects. |
| Regularized horseshoe | Bounded effects | Horseshoe with a slab component. Prevents unrealistically large coefficients. |
Bayesian variable selection doesn't make hard yes/no decisions. Instead, it produces posterior inclusion probabilities (PIPs)—the probability that each variable has a non-zero effect.
Higher inclusion probabilities indicate stronger evidence that the variable has a real effect.
Variables with larger estimated effects tend to have higher inclusion probabilities, but uncertainty matters too.
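When the model has an explicit spike-and-slab formulation, the PIP is simply the posterior mean of the inclusion indicator. A minimal sketch using a stand-in posterior; in practice `idata` would come from your fitted model, and the indicator name ("gamma") is hypothetical.

```python
import numpy as np
import arviz as az

# Stand-in posterior: 4 chains x 1000 draws of a Bernoulli inclusion indicator for 5 controls.
rng = np.random.default_rng(0)
idata = az.from_dict(posterior={"gamma": rng.integers(0, 2, size=(4, 1000, 5))})

# Posterior inclusion probability = posterior mean of each indicator.
pips = idata.posterior["gamma"].mean(dim=("chain", "draw"))
print(pips.values)
```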
Your choice of "expected number of relevant variables" affects results. Always check sensitivity to this assumption.
If your conclusions are highly sensitive to the prior sparsity assumption, this indicates weak identification. Report this uncertainty rather than picking the "best looking" result.
Document which variables are confounders vs. precision controls before seeing any results.
The "expected number of relevant controls" should reflect domain knowledge, not be tuned for fit.
Explicitly exclude all confounders from variable selection. They need standard priors.
Show full posterior inclusion probabilities, not just binary "selected" decisions.
Vary prior sparsity and see how results change. Report this uncertainty.
Low PIP ≠ proof of no effect. It means the data doesn't provide strong evidence that the effect is non-zero.