Modeling Guide | MMM Framework

Modeling Philosophy

A good model is not one that produces the results you want. A good model is one whose predictions you can trust—including when those predictions are uncomfortable.

This guide walks you through a complete, pre-specified modeling workflow using the MMM Framework. Each step includes both the conceptual reasoning and the code to implement it. The goal is not just to produce a model, but to produce a model you can defend.

The workflow is causal at its core. Every variable enters with a declared role—a confounder the model must adjust for (holiday demand lifts both spend and sales), a precision control that sharpens estimates without biasing them, or a mediator that carries the very effect you are measuring—and the specification is locked in before results are seen. That combination is what lets the model answer “what did media cause” rather than “what correlates best.”

The Core Principle

Every modeling decision should be made before seeing results, or if made after, it should be clearly documented as post hoc and its effect on inference acknowledged. This is not a limitation—it is what separates measurement from storytelling.

📜 The identification contract

The workflow below rests on seven identification assumptions, most of them untestable. They are stated formally in one place, the identification contract, where each is priced against the pressure-testing scorecard.

Workflow Overview

The modeling process has four phases. Each phase has specific outputs and checkpoints.

Plan

Define questions, variables, and specification before seeing data

Build

Prepare data, configure model, check priors

Validate

Fit, diagnose, test predictions, assess sensitivity

Report

Extract insights and communicate with honest uncertainty

🗂️ Printable engagement artifacts

This workflow ships as two one-pagers: the MMM Diagnostic Checklist (EDA pre-flight, symptom→action table, fix ladder) and the 10–12 week Engagement Timeline that maps these phases onto a client engagement.

Phase 1: Plan
Define the Business Question

Before touching data or code, clearly articulate what you are trying to learn. Different questions require different models and different levels of rigor.

🎯

Attribution Questions

“How much of our sales are driven by each media channel?” This requires the standard BayesianMMM and careful handling of confounders.

💰

Optimization Questions

“How should we reallocate our $10M budget across channels?” This requires attribution plus saturation curve estimation and honest uncertainty propagation.

🔁

Mechanism Questions

“Does TV drive sales through awareness or directly?” This requires the NestedMMM with mediation pathways.

📚

Portfolio Questions

“Does promoting our multipack cannibalize single-pack sales?” This requires the MultivariateMMM with cross-effects.

Write It Down

Document your business question in a brief that specifies: (1) what outcome you are measuring, (2) what decisions the model will inform, (3) what level of uncertainty is acceptable for those decisions, and (4) what validation data is available.

Identify Variables Using Causal Reasoning

Variable selection in MMM is a causal reasoning task, not a statistical one. The question is not “which variables improve model fit” but “which variables must be included for the media effect estimates to be unbiased.”

List your treatment variables (media channels)

These are the variables whose effects you want to estimate. They are never optional—every channel you want to measure must be in the model.

Identify confounders

Variables that affect both media spending and your outcome. Common examples: seasonality, promotions, economic indicators. Omitting confounders biases media estimates. These are never candidates for variable selection. One distinction on holidays: structural holidays (e.g., Christmas) belong in the model’s seasonality/calendar structure, while minor or movable promotional holidays may instead enter as selection-eligible controls.

Identify precision controls

Variables that affect only your outcome, not media spending. Examples include weather, supply disruptions, and sometimes competitor actions. Including these reduces noise, but omitting them does not bias media estimates. These can be candidates for Bayesian variable selection. Competitor media is the canonical conditional case: if it influences both your spend decisions (defensive budgeting) and your sales, it is a confounder and must always be included; if it only moves sales, it is a precision control and selection-eligible.

Identify potential mediators

Variables on the causal pathway between media and your outcome (awareness, consideration, search volume). Never control for mediators in a standard model unless you are explicitly modeling mediation with a NestedMMM.

Identify potential colliders

Variables caused by both media and your outcome (e.g., brand tracking scores that reflect both ad exposure and purchase behavior). Never include colliders—they introduce bias.
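To make the cost of omitting a confounder concrete, here is a minimal simulation sketch. It is not framework code: it assumes a made-up world in which a holiday flag raises both spend and sales, with a true spend effect of 0.5, and it uses plain least squares instead of the Bayesian model. All names and numbers are illustrative.

```python
import random

def cross_sum(x, y):
    """Sum of cross-products of x and y around their means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))

def simulate(n=5000, true_effect=0.5, seed=0):
    """Hypothetical world: holidays raise both spend and sales."""
    rng = random.Random(seed)
    holiday, spend, sales = [], [], []
    for _ in range(n):
        h = 1.0 if rng.random() < 0.2 else 0.0                   # holiday week?
        s = 1.0 + 2.0 * h + rng.gauss(0, 1)                      # planners spend more on holidays
        y = 10.0 + true_effect * s + 3.0 * h + rng.gauss(0, 1)   # holidays also lift sales
        holiday.append(h)
        spend.append(s)
        sales.append(y)
    return holiday, spend, sales

def slope_omitting_holiday(spend, sales):
    """Least-squares slope of sales on spend alone."""
    return cross_sum(spend, sales) / cross_sum(spend, spend)

def slope_adjusting_for_holiday(spend, holiday, sales):
    """Spend coefficient with holiday also in the regression (2x2 normal equations)."""
    sxx = cross_sum(spend, spend)
    shh = cross_sum(holiday, holiday)
    sxh = cross_sum(spend, holiday)
    sxy = cross_sum(spend, sales)
    shy = cross_sum(holiday, sales)
    return (sxy * shh - shy * sxh) / (sxx * shh - sxh ** 2)

holiday, spend, sales = simulate()
naive = slope_omitting_holiday(spend, sales)
adjusted = slope_adjusting_for_holiday(spend, holiday, sales)
print("true spend effect: 0.50")
print(f"holiday omitted:   {naive:.2f}")
print(f"holiday included:  {adjusted:.2f}")
```

In this setup the estimate that omits the holiday flag lands well above the true 0.5, because spend takes credit for the holiday lift, while the adjusted estimate recovers the true effect to within sampling noise.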

The Critical Distinction

Variable selection should only be applied to precision controls. Confounders must always be included. Mediators and colliders must never be included (in a standard model). Getting this wrong is a form of specification shopping.

See the Variable Selection and Causal Inference guides for detailed treatment.

Pre-Register Your Specification

Before fitting the model, document the following decisions. This is your analysis plan—and the lock against specification shopping. Once variable roles, priors, and structure are written down, there is no quiet path to a more flattering model later: any change must be documented, justified, and reported alongside the original.

# analysis_plan.py — Document this BEFORE fitting
"""
Analysis Plan: Q4 2025 Media Effectiveness
==========================================
Business Question: What is the incremental ROI of each media channel?
Decision: Inform 2026 budget allocation

Outcome Variable: Weekly national sales ($)
Media Channels:
  - TV (national, geometric adstock, L_max=8)
  - Digital Display (national, geometric adstock, L_max=4)
  - Paid Search (national, geometric adstock, L_max=2)

Confounders (always included):
  - Price index
  - Distribution (% ACV)
  - Holiday indicator
  - Seasonality (Fourier, order 2)

Precision Controls (Bayesian variable selection):
  - Weather (heating degree days)
  - Competitor A spend

Trend: Linear
Sampler: NumPyro, 4 chains, 2000 draws, 1000 tune

Validation: Compare TV ROI posterior to Q3 geo lift test
            (TV ROI from geo test: 1.1-1.5, 90% CI)

Decision Criteria:
  - If TV 80% CI includes geo-test estimate -> validated
  - If TV 80% CI does not overlap -> investigate discrepancy
"""

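One way to make the lock tangible is to fingerprint the plan before fitting and store the hash alongside the results. The sketch below is an assumption about how a team might do this with the standard library; the `plan` dictionary mirrors the example plan above, and `plan_fingerprint` is a hypothetical helper, not part of the MMM Framework.

```python
import copy
import hashlib
import json

def plan_fingerprint(plan):
    """Stable SHA-256 over a canonical JSON encoding of the plan."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Illustrative encoding of the analysis plan (field names are hypothetical)
plan = {
    "outcome": "weekly_national_sales_usd",
    "channels": {
        "TV": {"adstock": "geometric", "l_max": 8},
        "Digital_Display": {"adstock": "geometric", "l_max": 4},
        "Paid_Search": {"adstock": "geometric", "l_max": 2},
    },
    "confounders": ["price_index", "distribution_acv", "holiday", "fourier_order_2"],
    "precision_controls": ["heating_degree_days", "competitor_a_spend"],
    "trend": "linear",
    "sampler": {"backend": "numpyro", "chains": 4, "draws": 2000, "tune": 1000},
}

locked = plan_fingerprint(plan)  # record this before the first fit
print(f"locked plan fingerprint: {locked[:12]}")

# Any later edit to the specification changes the fingerprint
revised = copy.deepcopy(plan)
revised["channels"]["TV"]["l_max"] = 12
changed = plan_fingerprint(revised) != locked
print(f"plan changed since lock: {changed}")
```

A changed fingerprint is not a failure in itself. It is a prompt to document the change, justify it, and report it alongside the original.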
Phase 2: Build
Prepare Your Data

Good data preparation is the foundation of a trustworthy model. The MMM Framework uses the Master Flat File (MFF) format for consistency and flexibility.

import pandas as pd
from mmm_framework import (
    ControlVariableConfigBuilder,
    MFFConfigBuilder,
    load_mff,
)

# Configure data structure per analysis plan
mff_config = (
    MFFConfigBuilder()
    .with_kpi_name("Sales")
    .add_national_media("TV", adstock_lmax=8)
    .add_national_media("Digital_Display", adstock_lmax=4)
    .add_national_media("Paid_Search", adstock_lmax=2)
    .add_price_control("Price_Index")          # convenience: allows a negative sign
    .add_distribution_control("Distribution")  # convenience: positive-only
    .add_control(ControlVariableConfigBuilder("Holiday").national().build())
    .build()
)

# Load and validate
panel = load_mff(pd.read_csv("data/weekly_mff.csv"), mff_config)

# Data quality checks (plain pandas on the panel's aligned frames)
print(f"Date range: {panel.coords.periods.min()} to {panel.coords.periods.max()}")
print(f"Observations: {panel.n_obs}")
print(f"Missing media values:\n{panel.X_media.isna().sum()}")
print(f"Missing control values:\n{panel.X_controls.isna().sum()}")
print("\nZero-spend share by channel:")
print((panel.X_media == 0).mean())  # flag channels with little variation
print(panel.summary())              # overall panel summary

Data Quality Checklist

At least 2 years of weekly data (104+ observations) for stable estimation
No media channels with >50% zero-spend weeks (insufficient variation)
Control variables present for the entire time series
No obvious data errors (negative spend, impossible values)
Time series is at consistent frequency (weekly/daily)

For a fuller pre-fit screen—automated validation rules, missingness maps, and outlier detection—use the dedicated data-quality module: mmm_framework.eda (load_eda_panel, validate_dataset, missingness_matrix, detect_outliers).
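The checklist can also be turned into a small automated gate. The following is a standard-library sketch under stated assumptions: `check_panel` is a hypothetical helper (not the `mmm_framework.eda` API), dates arrive as a sorted list, and spend arrives as plain lists with `None` marking missing values.

```python
from datetime import date, timedelta

def check_panel(dates, media, min_obs=104, max_zero_share=0.5):
    """Return a list of problems found; an empty list means every check passed.

    dates: sorted list of datetime.date, one per observation.
    media: dict mapping channel name to a list of spend values (None = missing).
    """
    problems = []
    if len(dates) < min_obs:
        problems.append(f"only {len(dates)} observations, need {min_obs}+")
    # A consistent frequency means every gap between consecutive dates is equal
    gaps = {(later - earlier).days for earlier, later in zip(dates, dates[1:])}
    if len(gaps) > 1:
        problems.append(f"inconsistent frequency, gaps in days: {sorted(gaps)}")
    for channel, spend in media.items():
        present = [v for v in spend if v is not None]
        if len(present) < len(dates):
            problems.append(f"{channel}: {len(dates) - len(present)} missing values")
        if any(v < 0 for v in present):
            problems.append(f"{channel}: negative spend")
        zero_share = sum(v == 0 for v in present) / max(len(present), 1)
        if zero_share > max_zero_share:
            problems.append(f"{channel}: {zero_share:.0%} zero-spend weeks")
    return problems

# Illustrative data: two years of weekly dates, one healthy and one sparse channel
start = date(2024, 1, 1)
dates = [start + timedelta(weeks=i) for i in range(104)]
media = {
    "TV": [100.0 + i for i in range(104)],
    "Radio": [0.0] * 70 + [50.0] * 34,
}
problems = check_panel(dates, media)
for problem in problems:
    print(problem)
```

The thresholds default to the checklist values (104 observations, 50% zero-spend weeks) and can be tightened per engagement.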

Configure the Model

Model configuration follows directly from your analysis plan. Every choice here should trace back to a pre-specified decision.

from mmm_framework import (
    ModelConfigBuilder,
    TrendConfig,
    TrendType,
    BayesianMMM,
)

# Model inference configuration
model_config = (
    ModelConfigBuilder()
    .bayesian_numpyro()       # JAX-based sampler for speed
    .with_chains(4)           # 4 chains for convergence diagnostics
    .with_draws(2000)         # 2000 posterior draws per chain
    .with_tune(1000)          # 1000 warmup iterations
    .with_target_accept(0.9)  # Target acceptance rate
    .build()
)

# Trend configuration (from analysis plan: linear)
trend_config = TrendConfig(
    type=TrendType.LINEAR,
    growth_prior_sigma=0.1
)

# Build the model
mmm = BayesianMMM(panel, model_config, trend_config)

# Verify model structure matches analysis plan
print("Model parameters:")
for var in mmm.model.free_RVs:
    print(f"  {var.name}")

Functional form: additive or multiplicative

By default the model is additive: sales = baseline + Σ channel contributions. The alternative is a multiplicative (semi-log) form, ModelConfigBuilder().multiplicative(), which models log(sales) with each channel entering as the same saturating curve (see saturation) but acting as a percent lift on sales rather than a dollar addition:

log(sales) = baseline + Σ_c β_c · sat_c(spend_c)

Because saturation is bounded in [0, 1] with sat(0) = 0, a channel's lift runs from 1× at zero spend (a finite baseline) up to exp(β_c) at full saturation. So β_c is the channel's maximum log-lift, and the model reports max_pct_lift_<channel> = exp(β_c) − 1 directly; for example, a max lift of 0.30 means "at saturation this channel lifts sales 30%."
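A short numeric sketch of that relationship may help. It assumes a saturating curve of the form 1 − exp(−λ·spend), which is only one curve that satisfies sat(0) = 0 and a bound of 1; the framework's own parameterization may differ, and the values of β and λ are illustrative.

```python
import math

def sat(spend, lam):
    """Assumed saturating curve: 0 at zero spend, approaching 1 as spend grows."""
    return 1.0 - math.exp(-lam * spend)

def lift_multiplier(spend, beta, lam):
    """Multiplicative lift on sales from one channel: exp(beta * sat(spend))."""
    return math.exp(beta * sat(spend, lam))

beta = math.log(1.30)  # illustrative: chosen so the maximum lift is 30%
lam = 0.5              # illustrative saturation rate

for spend in [0, 1, 2, 5, 20]:
    print(f"spend={spend:>2}: sales multiplier {lift_multiplier(spend, beta, lam):.3f}")

max_pct_lift = math.exp(beta) - 1
print(f"max_pct_lift = {max_pct_lift:.2f}")
```

The multiplier starts at exactly 1 with no spend and flattens toward exp(β) as spend grows, which is why β can be read as the maximum log-lift.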

Use additive (the default) for most MMMs — it gives per-channel dollar contributions, marginal ROAS, and experiment calibration.
Use multiplicative when effects are better thought of as percent lifts that compound (common in demand-driven categories) — it keeps the saturation curves, needs a strictly positive KPI (it models logs), and handles zero-spend / flighted weeks fine.

The decomposition still works: because the multiplicative form has a finite media-off baseline, compute_component_decomposition() returns an exact waterfall via the log-mean (LMDI) index — each channel's original-scale contribution is its share of the log-lift, and the components sum to the fitted sales.
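The log-mean idea can be illustrated for a single period. This is a sketch of the general LMDI identity, not a reproduction of compute_component_decomposition(): it assumes one period, a known log-baseline, and per-channel log-lifts β_c · sat_c(spend_c) supplied as plain numbers.

```python
import math

def log_mean(a, b):
    """Logarithmic mean L(a, b) = (a - b) / (ln a - ln b), with L(a, a) = a."""
    if math.isclose(a, b, rel_tol=1e-12):
        return a
    return (a - b) / (math.log(a) - math.log(b))

def lmdi_waterfall(log_baseline, log_lifts):
    """Split fitted sales into a baseline plus one additive piece per channel.

    log_lifts: dict mapping channel to its log-lift beta_c * sat_c(spend_c).
    """
    baseline = math.exp(log_baseline)
    fitted = math.exp(log_baseline + sum(log_lifts.values()))
    weight = log_mean(fitted, baseline)
    parts = {"baseline": baseline}
    for channel, log_lift in log_lifts.items():
        # Each channel receives its share of the total log-lift, in sales units
        parts[channel] = weight * log_lift
    return fitted, parts

# Illustrative values: baseline of 1000 units, two channels
fitted, parts = lmdi_waterfall(math.log(1000.0), {"TV": 0.15, "Search": 0.05})
for name, value in parts.items():
    print(f"{name:>8}: {value:8.2f}")
print(f"{'sum':>8}: {sum(parts.values()):8.2f}  (fitted: {fitted:.2f})")
```

The pieces add up to fitted sales because the total log-lift equals ln(fitted) − ln(baseline), so multiplying it by the logarithmic mean returns exactly fitted − baseline.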

Two surfaces stay additive-only

The in-graph marginal-ROAS table and experiment calibration are computed on the additive scale and raise a clear error under the multiplicative form rather than return a wrong number; use compute_counterfactual_contributions() / compute_channel_roi() (which diff back-transformed predictions) for original-scale channel effects and ROI.

Set Priors Thoughtfully

Priors are the mechanism for encoding domain knowledge transparently. Unlike post hoc adjustments, priors are explicit, documented, and their effect on results can be measured.

✕ Bad Prior Practice

Setting priors after seeing posteriors
Using priors to force results in a desired direction
Extremely informative priors without empirical justification
No documentation of prior choices

✓ Good Prior Practice

Setting priors before fitting based on domain knowledge
Using weakly informative priors that regularize without dominating
Documenting the source and reasoning for each prior
Running sensitivity analysis across reasonable prior ranges

from mmm_framework import (
    AdstockConfigBuilder,
    MediaChannelConfigBuilder,
    PriorConfigBuilder,
)

# Adstock decay priors: encode belief about carryover duration.
# These apply on the parametric-adstock path (see the note below).
# TV: expect longer carryover (allow data to inform)
tv_adstock = (
    AdstockConfigBuilder()
    .geometric()
    .with_max_lag(8)
    .with_alpha_prior(
        PriorConfigBuilder().beta(alpha=3, beta=1.5).build()
        # Beta(3, 1.5) -> mode at 0.8, most mass above 0.5
    )
    .build()
)

# Digital: expect shorter carryover
digital_adstock = (
    AdstockConfigBuilder()
    .geometric()
    .with_max_lag(4)
    .with_alpha_prior(
        PriorConfigBuilder().beta(alpha=2, beta=3).build()
        # Beta(2, 3) -> mode at ~0.33, most mass below 0.6
    )
    .build()
)

# Attach to the channel definitions used in the MFF config
tv_channel = (
    MediaChannelConfigBuilder("TV")
    .national()
    .with_adstock(tv_adstock)
    .with_logistic_saturation()
    .build()
)

# Document: "Multi-week TV carryover is a common industry rule of
# thumb -- treat it as a prior starting point, not evidence.
# Digital prior based on platform-reported attribution windows
# of 1-3 weeks."
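Claims written in prior comments, such as "mode at 0.8", are worth verifying before they go into the analysis plan. The sketch below checks the two Beta priors above with the standard library only; the helper names are hypothetical and the integration is a simple midpoint rule.

```python
import math

def beta_mode(a, b):
    """Mode of Beta(a, b); this closed form needs a > 1 and b > 1."""
    return (a - 1) / (a + b - 2)

def beta_pdf(x, a, b):
    """Beta(a, b) density at 0 < x < 1, via log-gamma for numerical stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def beta_mass(a, b, lo, hi, steps=20000):
    """Probability that lo < alpha < hi, by midpoint-rule integration."""
    width = (hi - lo) / steps
    return width * sum(beta_pdf(lo + (i + 0.5) * width, a, b) for i in range(steps))

tv_mode = beta_mode(3, 1.5)
tv_mass_above_half = beta_mass(3, 1.5, 0.5, 1.0)
digital_mode = beta_mode(2, 3)
digital_mass_below_06 = beta_mass(2, 3, 0.0, 0.6)

print(f"TV Beta(3, 1.5):    mode {tv_mode:.2f}, P(alpha > 0.5) = {tv_mass_above_half:.2f}")
print(f"Digital Beta(2, 3): mode {digital_mode:.2f}, P(alpha < 0.6) = {digital_mass_below_06:.2f}")
```

Both comments hold up: roughly 78% of the TV prior sits above 0.5, and roughly 82% of the digital prior sits below 0.6.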

Saturation shape is per-channel: .with_logistic_saturation() (the concave default), .with_hill_saturation() (S-shaped, for threshold/tipping-point response), .with_michaelis_menten_saturation() / .with_tanh_saturation() (hyperbolic elbows), and .with_root_saturation() (the classic power curve $x^{k}$, concave for $0 < k < 1$). See the saturation math for the equations and when each fits (including why ADBUDG is the Hill form and exponential-CDF is the logistic form).

Deep dive: shipped prior defaults (parametric adstock and the legacy blend opt-out)

Shipped Defaults: Parametric Adstock (with a Legacy Blend Opt-Out)

Out of the box (use_parametric_adstock=True, the default since June 2026) the model estimates a continuous decay rate in-graph per channel: geometric adstock defaults to adstock_alpha_<ch> ~ Beta(1, 3), honoring each channel's alpha_prior and supporting delayed/Weibull kernels via MediaChannelConfig.adstock. The change was made because, on carryover-sensitive synthetic worlds, the pressure-testing series measured ~28% median contribution error for the previous default versus ~7% for the parametric kernel. That previous default — a blend weight adstock_<ch> ~ Beta(2, 2) over a fixed bank of geometric decays [0.0, 0.3, 0.5, 0.7, 0.9], with no continuous decay parameter — remains available via use_parametric_adstock=False for reproducing older fits. Other shipped defaults: saturation sat_lam_<ch> ~ Exponential(lam=0.5), media coefficients beta_<ch> ~ Gamma(mu=1.5, sigma=1.0) unless an roi_prior is set, and controls Normal(0, 0.5) with declared confounders given a wider Normal(0, 2).
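For readers new to adstock, here is a minimal sketch of what a geometric kernel with decay rate alpha does to a spend series. It is an illustration, not the framework's implementation: the lag convention (lags 0 through l_max − 1) and the optional normalization are assumptions, and `geometric_adstock` is a hypothetical name.

```python
def geometric_adstock(spend, alpha, l_max, normalize=True):
    """Carry each week's spend forward with weights alpha**lag for lag = 0..l_max-1."""
    weights = [alpha ** lag for lag in range(l_max)]
    if normalize:
        total = sum(weights)
        weights = [w / total for w in weights]  # weights sum to 1, so total spend is preserved
    carried = []
    for t in range(len(spend)):
        # Sum current and past spend, skipping lags that reach before the series starts
        carried.append(sum(w * spend[t - lag] for lag, w in enumerate(weights) if t - lag >= 0))
    return carried

pulse = [100.0, 0.0, 0.0, 0.0, 0.0]  # a single week of spend

raw = geometric_adstock(pulse, alpha=0.5, l_max=3, normalize=False)
print(raw)  # [100.0, 50.0, 25.0, 0.0, 0.0]

normalized = geometric_adstock(pulse, alpha=0.5, l_max=3)
print([round(v, 2) for v in normalized])
```

A higher alpha spreads the same pulse over more weeks, which is the behavior the decay prior controls.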

Collinear, exchangeable channels: grouped priors (opt-in)

When several channels are near-collinear and genuinely exchangeable (e.g. five social platforms), you can partial-pool their coefficients toward a shared mean with ModelConfigBuilder().with_grouped_media_priors() — set a shared parent_channel on the members. It is off by default: only pool channels you believe are alike, because collinearity means the split is unidentified, not that the effects are similar. A channel with a calibrated ROI prior (from an experiment) is excluded from the pool, and pooled channels are disclosed in the report — their combined group effect is the reliable number, not the per-channel split. Variable Selection covers the different problem of collinear controls (shrink to zero, not pool).

Reach or frequency? Reach × frequency-saturation channels (opt-in)

For channels with reach/frequency data (TV, YouTube, programmatic), the lever is how many distinct people were reached and how often — not raw impressions. ModelConfigBuilder().with_reach_frequency(ReachFrequencyConfig( channel="TV", frequency_column="Frequency")) treats the channel's column as reach and modulates it by a frequency-saturation curve g(f) (diminishing returns to added exposures — the 3+ frequency wearout): effect = beta · sat(adstock(reach · g(frequency))). The report surfaces an effective-frequency insight ("effectiveness plateaus around N exposures — buy reach, not frequency, beyond this"). Pick exponential (diminishing from the first exposure) or hill (an S-shaped minimum-effective-frequency threshold). Off by default; identified by frequency variation that is not collinear with reach.
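The difference between the two frequency-curve shapes is easiest to see numerically. The functions below are common textbook forms chosen for illustration; the framework's exact parameterization of g(f) is not shown in this guide, so the function names, `k`, `half`, and `shape` are all assumptions.

```python
import math

def g_exponential(freq, k):
    """Diminishing returns from the first exposure: 1 - exp(-k * freq)."""
    return 1.0 - math.exp(-k * freq)

def g_hill(freq, half, shape):
    """S-shaped response: slow start, then a threshold near `half` exposures."""
    return freq ** shape / (half ** shape + freq ** shape)

print("freq  exponential  hill")
for freq in range(0, 9):
    print(f"{freq:>4}  {g_exponential(freq, k=0.6):>11.3f}  {g_hill(freq, half=3, shape=3):.3f}")

# Marginal value of one more exposure under each curve
exp_gains = [g_exponential(f + 1, 0.6) - g_exponential(f, 0.6) for f in range(4)]
hill_gains = [g_hill(f + 1, 3, 3) - g_hill(f, 3, 3) for f in range(4)]
print("exponential gains:", [round(g, 3) for g in exp_gains])
print("hill gains:       ", [round(g, 3) for g in hill_gains])
```

Under the exponential form every additional exposure is worth less than the one before. Under the Hill form the first exposure does little and the gains peak near the threshold, which is the minimum-effective-frequency story.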

Price & promotion as first-class levers (opt-in)

For CPG / retail / DTC, price and promotion are decisions the client controls, and planners want an elasticity and a promo ROI — not a nuisance coefficient. ModelConfigBuilder().with_price(PriceConfig(variable="Price", reference="median")).with_promotions(PromoConfig(variable="Promo")) promotes those control columns to levers: a sign-guarded log-price elasticity (responding to discount depth vs a reference price) and a promo lift with its own carryover, reported as a separate "Price & Promotion" contribution. Caveat: price is usually endogenous (cut because demand is soft) — treat the elasticity as conditional, cross-check the identification assumptions, and confirm with an experiment.

Do channels work together? Synergy / interaction terms (opt-in)

The base model is strictly additive, so it can't say "TV makes Search work harder." Add a pair term with ModelConfigBuilder().with_channel_interactions(ChannelInteraction(channel_a="TV", channel_b="Search", expected_sign="positive")) — it fits β_ij·sat_i·sat_j with a sign-aware, shrink-to-zero prior (synergy positive, cannibalization negative), reported as a separate "Synergy" contribution. Off by default. Caveat: interactions are weakly identified without designed variation — trust the sign, not the magnitude, and run an experiment to actually pin a synergy.

Sharp spikes: holiday & event effects (opt-in)

Black Friday, Prime Day, a product launch — these are not smooth-seasonal, so the yearly Fourier basis can't represent them and they bleed into media around your highest-revenue weeks. Declare them instead of hand-building dummies: ModelConfigBuilder().with_events(EventsConfig(country="US", holidays=[...], custom_events=[...])). Each holiday/event becomes a windowed, optionally-decaying regressor with its own coefficient, fit as a separate event contribution in the decomposition — distinct from Fourier seasonality, so they don't double-count. Off by default.
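As a rough picture of what a "windowed, optionally-decaying regressor" looks like, consider the sketch below. It is a hand-built illustration under assumptions (a symmetric geometric decay around the event week, weeks indexed from 0) and `event_regressor` is a hypothetical helper, not what EventsConfig generates internally.

```python
def event_regressor(n_weeks, event_week, window_before=0, window_after=0, decay=1.0):
    """1.0 on the event week, scaled by decay**distance inside the window, 0 elsewhere."""
    x = [0.0] * n_weeks
    first = max(0, event_week - window_before)
    last = min(n_weeks - 1, event_week + window_after)
    for t in range(first, last + 1):
        x[t] = decay ** abs(t - event_week)
    return x

# Event in week 3, one lead-up week, two trailing weeks, halving per week of distance
spike = event_regressor(8, event_week=3, window_before=1, window_after=2, decay=0.5)
print(spike)  # [0.0, 0.0, 0.5, 1.0, 0.5, 0.25, 0.0, 0.0]

# decay=1.0 gives a flat window, equivalent to a hand-built dummy
flat = event_regressor(8, event_week=3, window_before=1, window_after=1)
print(flat)
```

Because the column is zero outside the window, its coefficient can only explain sales around the event, which keeps it separate from the smooth Fourier seasonality.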

Effectiveness that drifts: time-varying coefficients (opt-in)

A single scalar coefficient assumes a channel is equally effective across the whole window. When that's wrong — creative fatigue, a channel maturing, a mid-window break — set MediaChannelConfigBuilder().with_time_varying() to model log(β_t) as a smooth random walk. It's off by default and collapses to the constant-β model as its innovation scale → 0; beta_<channel> is still reported as the time-average, and the report surfaces the trajectory. Read the drift's shape, not its exact level — TVP is weakly identified against trend and seasonality, so use it only for genuinely unexplained drift.

Prior Predictive Check

Before fitting the model to data, verify that your priors produce plausible predictions. This is the most underrated step in Bayesian modeling.

# Sample from priors only (no likelihood evaluation yet)
prior = mmm.get_prior(samples=500, random_seed=42)

# y_obs is modeled on a standardized scale; y_obs_scaled is the
# deterministic back-transform to the original KPI scale
y_prior = prior.prior["y_obs_scaled"].values.flatten()

print(f"Prior predictive y range: [{y_prior.min():.0f}, {y_prior.max():.0f}]")
print(f"Actual y range: [{panel.y.min():.0f}, {panel.y.max():.0f}]")
print(f"Prior predictive y mean: {y_prior.mean():.0f}")
print(f"Actual y mean: {panel.y.mean():.0f}")

# Check: do priors produce data in the right ballpark?
# They should cover the observed range with room to spare,
# but not predict absurdities (negative sales, 10x actual)

What to Look For

Prior predictions should be plausible but vague. If the prior predictive range is [-1000, 1000] for sales that are always between 800-1200, your priors are too diffuse. If it is [950, 1050], your priors may be too informative. The sweet spot is a range like [200, 2000]—covering the data comfortably without allowing absurdities.
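The same judgment can be written as a small check so it is applied consistently across projects. This is a sketch with assumed rules taken from the comments in the code above (cover the observed range, no negative sales, nothing beyond 10x actual); `grade_prior_range` and its default threshold are hypothetical.

```python
def grade_prior_range(prior_lo, prior_hi, obs_lo, obs_hi, absurd_multiple=10.0):
    """Return a list of concerns about a prior predictive range; empty means plausible."""
    flags = []
    if prior_lo > obs_lo or prior_hi < obs_hi:
        flags.append("does not cover the observed range")
    if prior_lo < 0:
        flags.append("allows negative sales")
    if prior_hi > absurd_multiple * obs_hi:
        flags.append(f"allows more than {absurd_multiple:g}x the observed maximum")
    return flags

observed = (800, 1200)  # illustrative observed sales range
for prior in [(200, 2000), (950, 1050), (-5000, 20000)]:
    flags = grade_prior_range(*prior, *observed)
    print(prior, "->", flags or "plausible")
```

In practice it is usually better to pass central quantiles of the prior draws (for example the 1st and 99th percentiles) than the raw minimum and maximum, which are dominated by a few extreme draws.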

Phase 3: Validate
Fit the Model

# Fit with a fixed random seed for reproducibility
results = mmm.fit(random_seed=42)

n_chains = results.trace.posterior.sizes["chain"]
n_draws = results.trace.posterior.sizes["draw"]

print("Sampling completed.")
print(f"  Chains: {n_chains}")
print(f"  Draws per chain: {n_draws}")
print(f"  Total posterior samples: {n_chains * n_draws}")

Check Diagnostics (Non-Negotiable)

MCMC diagnostics are not optional. They tell you whether the sampler explored the posterior adequately. Interpreting results from a poorly converged model is worse than having no model at all.

# Convergence diagnostics (results.diagnostics is filled at fit time
# with divergences, rhat_max, and ess_bulk_min)
print("=== MCMC Diagnostics ===")
diag = results.diagnostics

# 1. Divergences: should be 0
print(f"Divergences: {diag['divergences']}")
if diag['divergences'] > 0:
    print("  ACTION: Reparameterize or increase target_accept")

# 2. R-hat: should be < 1.01 for all parameters
print(f"R-hat max: {diag['rhat_max']:.4f}")
if diag['rhat_max'] > 1.01:
    print("  ACTION: Run longer chains or investigate multimodality")

# 3. ESS: should be > 400 for all parameters
# (tail ESS is not pre-computed; pull it from the trace with ArviZ)
import arviz as az
# arviz 1.x returns a DataTree (no .to_array()); dataset_extremum reduces it to a float
from mmm_framework.utils.arviz_compat import dataset_extremum
ess_tail_min = dataset_extremum(az.ess(results.trace, method="tail"), "min")
print(f"ESS bulk min: {diag['ess_bulk_min']:.0f}")
print(f"ESS tail min: {ess_tail_min:.0f}")
# Gate on both bulk and tail ESS, since both must clear the threshold
if min(diag['ess_bulk_min'], ess_tail_min) < 400:
    print("  ACTION: Run more draws")

# 4. Summary of key parameters
summary = results.summary()
print("\n=== Parameter Summary ===")
# arviz 1.x exposes an 89% equal-tailed interval as eti89_lb / eti89_ub
print(summary[["mean", "sd", "eti89_lb", "eti89_ub", "r_hat"]].to_string())

Diagnostic	Acceptable Range	If Out of Range
Divergences	0	Increase `target_accept` to 0.95+, or reparameterize
R-hat	< 1.01	Run longer chains, check for multimodality
ESS Bulk	> 400	Increase number of draws
ESS Tail	> 400	Increase number of draws (tail ESS is harder to achieve)
Tree Depth	Rarely hits max	Increase `max_treedepth`

Do Not Proceed with Bad Diagnostics

If diagnostics are poor, do not interpret the results. Fix the computational issues first. Common fixes: increase target acceptance rate, use a non-centered parameterization for hierarchical models, increase warmup iterations, or simplify the model.
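In a scripted pipeline, the rule can be enforced by failing loudly instead of printing a warning. The sketch below assumes a diagnostics dictionary with the keys used above (`divergences`, `rhat_max`, `ess_bulk_min`) plus an optional `ess_tail_min` key that you add yourself; `require_converged` is a hypothetical helper.

```python
def require_converged(diag, rhat_max=1.01, ess_min=400):
    """Raise RuntimeError listing every failed check; return None if all pass."""
    failures = []
    if diag["divergences"] > 0:
        failures.append(f"{diag['divergences']} divergences (raise target_accept or reparameterize)")
    if diag["rhat_max"] >= rhat_max:
        failures.append(f"R-hat max {diag['rhat_max']:.3f} (run longer chains, check multimodality)")
    for key in ("ess_bulk_min", "ess_tail_min"):
        if key in diag and diag[key] <= ess_min:
            failures.append(f"{key} = {diag[key]:.0f} (increase draws)")
    if failures:
        raise RuntimeError("do not interpret this fit: " + "; ".join(failures))

# Illustrative diagnostics, not output from a real fit
good = {"divergences": 0, "rhat_max": 1.002, "ess_bulk_min": 1850, "ess_tail_min": 1400}
bad = {"divergences": 7, "rhat_max": 1.04, "ess_bulk_min": 210}

require_converged(good)  # passes silently
try:
    require_converged(bad)
except RuntimeError as err:
    print(err)
```

Placing this call directly after the fit means no downstream contribution or ROI code runs on an unconverged trace.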

Posterior Predictive Check

After fitting, compare the model’s predictions to observed data. This tests whether the model can reproduce the patterns in your data.

# Posterior predictive check via the model's predict() method
import numpy as np

pred = mmm.predict(hdi_prob=0.90, random_seed=42)

# Compare predicted vs actual (original KPI scale)
y_pred_mean = pred.y_pred_mean
y_actual = panel.y.values

# Calibration: what fraction of observations fall within the 90% HDI?
coverage = np.mean(
    (y_actual >= pred.y_pred_hdi_low) & (y_actual <= pred.y_pred_hdi_high)
)

print(f"90% interval coverage: {coverage:.1%}")
print("  (Target: ~90%. If much lower, model is overconfident)")
print("  (If much higher, model is underconfident)")

# MAPE
mape = np.mean(np.abs(y_pred_mean - y_actual) / y_actual)
print(f"MAPE: {mape:.1%}")

Out-of-Sample Backtest

The posterior predictive check above is in-sample: the model is graded on data it was fitted to, so it cannot catch an overconfident forecaster. Before trusting the model for planning, run a rolling-origin backtest: refit on an expanding training window, forecast past each cutoff, and grade against held-out actuals and naive baselines. The framework ships this as mmm_framework.validation.backtest.

# Out-of-sample backtest: genuine out-of-time forecasts
from mmm_framework.validation import BacktestConfig, run_backtest

config = BacktestConfig(
    min_train_size=104,  # two seasonal cycles before the first cutoff
    horizon=13,          # forecast a quarter past each cutoff
    step=13,             # non-overlapping windows
)
result = run_backtest(mmm, config)

print(result.fits)          # per-refit convergence -- read this first
print(result.summary())     # MAPE/MASE vs naive + seasonal-naive baselines
print(result.coverage())    # nominal vs empirical interval coverage

# Gates worth pre-registering:
#   - MASE < 1 (beats "copy last year" out of time)
#   - empirical coverage near nominal (intervals honest on unseen data)

Measured reference points

From the baked notebook nbs/validation/backtest_validation.ipynb (realistic synthetic world, 4 refits, 52 graded weekly forecasts): out-of-time MAPE 3.0% vs 13.8% for seasonal-naive (MASE 0.24), 80% interval covering 90% of held-out weeks. On the trend-break stress world the same protocol degrades to 11.4% MAPE with 58% coverage — the backtest detects misspecification a green in-sample check misses. And on real data (nbs/validation/pinkham_real_data.ipynb) the honest grade was no skill over persistence — which is exactly the kind of finding this step exists to surface before a client hears a forecast.

A passing backtest validates the predictive model only; a model can forecast well while attributing wrongly (see pressure testing). Keep both gates.
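To make the rolling-origin mechanics concrete, here is a standard-library sketch of how cutoffs can be generated and how a MASE-style score can be computed. It is not the `run_backtest` implementation: the split logic, the 156-week series length, and scaling by an in-sample seasonal-naive error are assumptions for illustration.

```python
def rolling_origin_splits(n_obs, min_train_size, horizon, step):
    """Yield (train_indices, test_indices) with an expanding training window."""
    cutoff = min_train_size
    while cutoff + horizon <= n_obs:
        yield range(0, cutoff), range(cutoff, cutoff + horizon)
        cutoff += step

def mase(actual, forecast, train, season=52):
    """Forecast MAE divided by the in-sample MAE of a seasonal-naive forecast."""
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    naive_errors = [abs(train[t] - train[t - season]) for t in range(season, len(train))]
    return mae / (sum(naive_errors) / len(naive_errors))

# Three years of weekly data with the settings used above
splits = list(rolling_origin_splits(n_obs=156, min_train_size=104, horizon=13, step=13))
for train_idx, test_idx in splits:
    print(f"train weeks 0-{train_idx[-1]}, forecast weeks {test_idx[0]}-{test_idx[-1]}")

# Toy series: a steady climb, so "copy last year" is always off by 52
train = [float(t) for t in range(104)]
score = mase(actual=[104.0, 105.0], forecast=[100.0, 101.0], train=train)
print(f"MASE: {score:.3f}")
```

Each fold trains only on data before its cutoff, so every graded forecast is genuinely out of time. A score below 1 means the forecast beat the seasonal-naive benchmark on this scale.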

Sensitivity Analysis

Test how much your conclusions change when you vary modeling assumptions. Robust findings persist across reasonable choices. Fragile findings suggest more data or experiments are needed.

# Sensitivity analysis: vary key assumptions
sensitivity_results = {}

def tv_roas(model, panel):
    """Posterior TV ROAS: counterfactual contribution / spend."""
    contrib = model.compute_counterfactual_contributions(
        channels=["TV"], hdi_prob=0.9, random_seed=42
    )
    spend = panel.X_media["TV"].sum()
    return (
        contrib.total_contributions["TV"] / spend,
        contrib.contribution_hdi_low["TV"] / spend,
        contrib.contribution_hdi_high["TV"] / spend,
    )

# 1. Vary TV adstock L_max
for lmax in [4, 6, 8, 12]:
    config = make_config(tv_lmax=lmax)  # your helper: rebuild configs with this l_max
    model = BayesianMMM(panel, config, trend_config)
    model.fit(random_seed=42)
    sensitivity_results[f"tv_lmax_{lmax}"] = tv_roas(model, panel)

# 2. Vary prior strength
for sigma in [0.5, 1.0, 2.0]:
    config = make_config(beta_prior_sigma=sigma)
    model = BayesianMMM(panel, config, trend_config)
    model.fit(random_seed=42)
    sensitivity_results[f"prior_sigma_{sigma}"] = tv_roas(model, panel)

# Report: how much do ROAS estimates change?
print("\n=== Sensitivity Analysis ===")
for name, (roas, low, high) in sensitivity_results.items():
    print(f"  {name}: TV ROAS = {roas:.2f} ({low:.2f}-{high:.2f})")

Interpreting Sensitivity

If TV ROI ranges from 1.1 to 1.6 across all reasonable specifications, you can confidently report it is positive and above 1.0. If it ranges from 0.5 to 2.5, the estimate is sensitive to modeling choices and you should recommend validation experiments before acting on it.

Fragile estimates are not a dead end—they are the entry point to the measurement loop: the channels where conclusions are least stable are exactly where a lift test buys the most learning. See the Measurement & Calibration guide for how experiment results feed back into the model.
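The interpretation rule can be applied mechanically to the sensitivity results. The sketch below assumes you have collected one point estimate per specification; `summarize_sensitivity`, the break-even value of 1.0, and the example numbers are illustrative, and the verdict only checks which side of break-even the estimates fall on.

```python
def summarize_sensitivity(estimates, break_even=1.0):
    """Return (lowest, highest, verdict) across specifications."""
    low, high = min(estimates.values()), max(estimates.values())
    if low > break_even:
        verdict = "robust: above break-even in every specification"
    elif high < break_even:
        verdict = "robust: below break-even in every specification"
    else:
        verdict = "fragile: recommend a validation experiment before acting"
    return low, high, verdict

# Illustrative point estimates, not results from a real fit
stable = {"tv_lmax_4": 1.1, "tv_lmax_8": 1.3, "tv_lmax_12": 1.6, "prior_sigma_2.0": 1.4}
unstable = {"tv_lmax_4": 0.5, "tv_lmax_8": 1.4, "tv_lmax_12": 2.5, "prior_sigma_2.0": 1.1}

for label, estimates in [("stable", stable), ("unstable", unstable)]:
    low, high, verdict = summarize_sensitivity(estimates)
    print(f"{label}: {low:.1f} to {high:.1f} -> {verdict}")
```

A fuller version would also compare interval widths, since a range that stays above break-even can still be too wide to support a specific budget shift.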

Phase 4: Report
Extract Insights

# Channel contributions and ROAS from counterfactual analysis:
# contribution = prediction(actual spend) - prediction(channel zeroed out)
contrib = mmm.compute_counterfactual_contributions(hdi_prob=0.9, random_seed=42)

print("=== Channel Results ===")
for channel in panel.coords.channels:
    spend = panel.X_media[channel].sum()
    roas = contrib.total_contributions[channel] / spend
    roas_low = contrib.contribution_hdi_low[channel] / spend
    roas_high = contrib.contribution_hdi_high[channel] / spend
    print(f"\n{channel}:")
    print(f"  ROAS: {roas:.2f} (90% HDI: {roas_low:.2f}-{roas_high:.2f})")
    print(f"  Share of media effect: {contrib.contribution_pct[channel]:.1f}%")

# Generate the full report
from mmm_framework.reporting import MMMReportGenerator, ReportConfig

report = MMMReportGenerator(
    model=mmm,
    panel=panel,
    results=results,
    config=ReportConfig(
        title="Q4 2025 Media Effectiveness Analysis",
        client="Brand Name",
        analysis_period="Jan 2024 - Dec 2025",
    ),
)

report.to_html("q4_2025_mmm_report.html")

Communicate Results Honestly

How you communicate results matters as much as the analysis itself. Stakeholders need to understand both what the model says and how confident the model is.

✕ Poor Communication

“TV ROI is 1.42”
“Digital drives 28% of sales”
“We should shift $2M from TV to digital”
No mention of uncertainty
No mention of assumptions

✓ Honest Communication

“TV ROI is estimated at 1.4 (90% CI: 1.1-1.8)”
“Digital contributes 22-34% of sales (90% CI)”
“A $2M shift would likely improve returns, but the magnitude is uncertain”
Sensitivity to key assumptions documented
Comparison to experimental results where available

For Detailed Guidance on Presenting Results

See the Interpreting Results for Media Planners and CMOs guide for specific recommendations on presenting uncertainty, creating executive summaries, and translating model outputs into actionable planning guidance.

When a report recommends validation, the platform’s experiment design studio can turn that recommendation directly into a pre-registered geo lift or matched-market test—see the Platform Overview for a tour.

When (and How) to Iterate

Iteration is a normal part of modeling. The key distinction is between legitimate iteration and specification shopping.

Legitimate Iteration	Specification Shopping
Fixing computational issues (divergences, non-convergence)	Adjusting until results “look right”
Adding a confounder you forgot to include	Removing a variable because it reduced media effects
Expanding priors that are clearly too narrow (based on prior predictive check)	Tightening priors to force results toward desired values
Documenting changes and reporting both versions	Only reporting the version with preferred results
Iteration driven by failed diagnostic checks	Iteration driven by stakeholder feedback on ROI values

The Rule of Thumb

If you would make the same change regardless of which direction the results moved, it is legitimate. If you would only make the change because the results went in the “wrong” direction, it is specification shopping.

Next Steps

Ready to implement? Start with the Getting Started guide for installation and a complete code walkthrough. For understanding the business context, see For Business Stakeholders. For presenting results, see Interpreting Results. To wire experiments into the model so each cycle compounds, see the Closed-Loop Measurement & Calibration guide.
