Technical Guide

A comprehensive reference for data scientists implementing Bayesian Marketing Mix Models. This guide covers mathematical foundations, model specifications, and practical considerations for each model type in the framework.

Overview

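Marketing Mix Modeling (MMM) estimates the incremental impact of marketing activities on business outcomes such as sales or conversions. This framework implements MMM using Bayesian inference, which offers several advantages over traditional frequentist approaches: direct probability statements about parameters (credible intervals), a principled way to encode domain constraints through priors, and partial pooling across related groups such as geographies.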

Bayesian Basics

At its core, Bayesian inference combines prior beliefs with observed data to form posterior beliefs. For a parameter θ given data y:

$$p(\theta | y) = \frac{p(y | \theta) \cdot p(\theta)}{p(y)} \propto p(y | \theta) \cdot p(\theta)$$

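Where:

  • $p(\theta | y)$ is the posterior: updated beliefs about θ after seeing the data
  • $p(y | \theta)$ is the likelihood: how probable the observed data are under a given θ
  • $p(\theta)$ is the prior: beliefs about θ before seeing the data
  • $p(y)$ is the marginal likelihood (evidence), a normalizing constant that does not depend on θ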

Credible Intervals vs. Confidence Intervals

A 94% Bayesian credible interval means "there is a 94% probability the true parameter lies in this range, given our prior and data." This is a direct probability statement about the parameter. A 94% frequentist confidence interval, by contrast, is a statement about the procedure: if the data collection were repeated many times, 94% of the intervals constructed this way would contain the true value.
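In practice a credible interval is read directly off posterior draws. A minimal sketch, assuming you already have a flat list of MCMC draws for one parameter: the equal-tailed 94% interval cuts 3% of the draws from each tail. Some tools default to the highest-density interval instead, which can differ for skewed posteriors. The helper names below are illustrative and not part of the framework.

```python
import math
import random


def quantile(sorted_draws, q):
    """Linearly interpolated quantile of an already-sorted list, q in [0, 1]."""
    pos = q * (len(sorted_draws) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_draws) - 1)
    frac = pos - lower
    return sorted_draws[lower] + frac * (sorted_draws[upper] - sorted_draws[lower])


def equal_tailed_interval(draws, prob=0.94):
    """Equal-tailed credible interval: drop (1 - prob) / 2 of the mass from each tail."""
    ordered = sorted(draws)
    tail = (1.0 - prob) / 2.0
    return quantile(ordered, tail), quantile(ordered, 1.0 - tail)


if __name__ == "__main__":
    # Stand-in for posterior draws of a media coefficient (illustrative only).
    rng = random.Random(0)
    draws = [rng.gauss(0.3, 0.1) for _ in range(4000)]
    low, high = equal_tailed_interval(draws)
    print(f"94% credible interval: [{low:.3f}, {high:.3f}]")
```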

Positivity Constraints via Priors

Media coefficients should generally be positive (advertising shouldn't decrease sales). Rather than post-hoc filtering (which invalidates inference), we encode this through priors that concentrate probability mass on positive values:

$$\beta \sim \text{HalfNormal}(\sigma = 0.5)$$
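To see what this prior actually asserts, it helps to evaluate it directly. A HalfNormal(σ) variable is σ·|Z| with Z standard normal, so its mean is σ·sqrt(2/π) and its CDF is erf(x / (σ·sqrt(2))). The plain-Python sketch below (function names are illustrative) shows that σ = 0.5 assigns zero mass to negative coefficients and places roughly 95% of its mass below 1.0 on the standardized scale.

```python
import math


def halfnormal_mean(sigma):
    """Mean of HalfNormal(sigma): sigma * sqrt(2 / pi)."""
    return sigma * math.sqrt(2.0 / math.pi)


def halfnormal_cdf(x, sigma):
    """P(beta <= x) for beta ~ HalfNormal(sigma); zero for x <= 0."""
    if x <= 0:
        return 0.0
    return math.erf(x / (sigma * math.sqrt(2.0)))


if __name__ == "__main__":
    sigma = 0.5
    print(f"prior mean of beta: {halfnormal_mean(sigma):.3f}")       # about 0.399
    print(f"P(beta <= 0):       {halfnormal_cdf(0.0, sigma):.3f}")   # 0.000
    print(f"P(beta < 1):        {halfnormal_cdf(1.0, sigma):.4f}")   # about 0.9545
```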

⚠️ Specification Shopping

Running multiple models and selecting one with "reasonable" coefficients is specification shopping. This practice destroys the statistical properties of your estimates. Instead, encode constraints through priors before seeing the data, then report the single resulting posterior.

Choosing a Model

| Model | Use When | Complexity |
|---|---|---|
| Standard MMM | Single outcome, no intermediate metrics, simple attribution | Low |
| Nested MMM | Upper-funnel metrics available (awareness, consideration); want direct/indirect decomposition | Medium |
| Multivariate MMM | Multiple outcomes that may interact (product portfolio, cannibalization) | Medium |
| Combined MMM | Both mediators AND multiple interacting outcomes | High |

Data Structure Considerations

| Data Structure | Recommended Model |
|---|---|
| National aggregate only | Single time-series model; acknowledge wide uncertainty |
| Multiple geographies, national media | Hierarchical model with random intercepts by geo |
| Multiple geographies, regional media variation | Hierarchical model with random slopes on media |
| Many geographies, rich regional variation | Full hierarchical model with geo-level covariates |

Standard MMM

The standard model relates marketing inputs to a single outcome through transformed media variables. Channel contributions flow through three sequential components: adstock (carryover), saturation (diminishing returns), and coefficient scaling.

Media spend → Adstock g(·) → Saturation f(·) → × β → Sales

Full Model Specification

The complete generative model for the standard MMM is:

$$y_t = \alpha + \underbrace{\sum_{m=1}^{M} \beta_m \cdot f_{\text{sat}}\left(g_{\text{adstock}}(x_{m,t})\right)}_{\text{Media Effects}} + \underbrace{\tau(t)}_{\text{Trend}} + \underbrace{s(t)}_{\text{Seasonality}} + \underbrace{\gamma' \mathbf{z}_t}_{\text{Controls}} + \epsilon_t \tag{1}$$

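Where:

  • $y_t$ is the outcome (e.g., sales) in period t, and $\alpha$ is the intercept (baseline level; not to be confused with the adstock decay α introduced below)
  • $x_{m,t}$ is the media input (spend or exposure) for channel m in period t; $g_{\text{adstock}}$ is the adstock transform, $f_{\text{sat}}$ the saturation function, and $\beta_m$ the channel coefficient
  • $\tau(t)$ and $s(t)$ are the trend and seasonality components
  • $\mathbf{z}_t$ is the vector of control variables, with coefficients $\gamma$
  • $\epsilon_t$ is observation noise, typically $\mathcal{N}(0, \sigma^2)$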

Multiplicative (Log-Log) Specification

For elasticity interpretation, use the multiplicative form:

$$\log(y_t) = \log(\beta_0) + \sum_{m=1}^{M} \beta_m \log\left(f_m(x_{m,t})\right) + \gamma' \mathbf{z}_t + \epsilon_t$$

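Each media coefficient $\beta_m$ is now an elasticity with respect to the transformed input: a 1% increase in $f_m(x_{m,t})$ changes sales by approximately $\beta_m$%. When $f_m$ is the identity, this is the elasticity of sales with respect to media itself. Control coefficients $\gamma$ enter in levels, so each is a semi-elasticity: the approximate proportional change in sales per unit change in the control.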
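A quick worked example makes the elasticity reading concrete. Under the log-log form, scaling the transformed media input by a factor (1 + p), with everything else held fixed, multiplies predicted sales by (1 + p)^β. The sketch assumes an illustrative β = 0.1; it is not a fitted value, and the function name is hypothetical.

```python
def sales_multiplier(beta, pct_change):
    """Multiplicative change in y when the transformed media input rises by pct_change.

    Under log(y) = ... + beta * log(f), scaling f by (1 + pct_change)
    scales y by (1 + pct_change) ** beta, all else equal.
    """
    return (1.0 + pct_change) ** beta


if __name__ == "__main__":
    beta = 0.1  # illustrative elasticity, not an estimate
    for pct in (0.01, 0.10, 0.50):
        lift = sales_multiplier(beta, pct) - 1.0
        print(f"+{pct:.0%} media -> {lift:+.2%} sales")
```

With β = 0.1, a 10% increase gives a lift of about 0.96%, close to the linear approximation β × 10% = 1%; the approximation degrades for large changes.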

Framework Usage: Complete Model Setup

from mmm_framework import (
    BayesianMMM,
    MFFConfigBuilder,
    ModelConfigBuilder,
    MediaChannelConfigBuilder,
    ControlVariableConfigBuilder,
    KPIConfigBuilder,
    load_mff,
)

# 1. Configure the data specification (MFF = Master Flat File)
mff_config = (
    MFFConfigBuilder()
    .with_kpi_builder(KPIConfigBuilder("Sales").national().additive())
    .add_media_builder(
        MediaChannelConfigBuilder("TV")
        .national()
        .with_geometric_adstock(l_max=8)
        .with_hill_saturation()
    )
    .add_media_builder(
        MediaChannelConfigBuilder("Digital")
        .national()
        .with_geometric_adstock(l_max=4)
        .with_hill_saturation()
    )
    .add_control_builder(
        ControlVariableConfigBuilder("Price")
        .national()
        .allow_negative()
    )
    .weekly()
    .build()
)

# 2. Load data
panel = load_mff(dataframe, mff_config)

# 3. Configure and fit model
model_config = (
    ModelConfigBuilder()
    .additive()
    .bayesian_numpyro()  # Fast JAX backend
    .with_chains(4)
    .with_draws(2000)
    .build()
)

model = BayesianMMM(panel, model_config)
results = model.fit()

Adstock Transformation

Adstock captures carryover effects—the idea that today's advertising continues to influence sales in future periods.

Geometric Adstock (Default)

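The simplest form uses a single decay parameter $\alpha \in [0, 1)$:

$$A_t = x_t + \alpha \cdot A_{t-1} = \sum_{l=0}^{\infty} \alpha^l \cdot x_{t-l} \approx \sum_{l=0}^{L} \alpha^l \cdot x_{t-l} \tag{2}$$

In practice the sum is truncated at a maximum lag $L$ (the l_max argument in the framework examples).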

The half-life (time for effect to decay by 50%) is:

$$t_{1/2} = \frac{\log(0.5)}{\log(\alpha)} = \frac{-0.693}{\log(\alpha)}$$

For $\alpha = 0.7$: half-life ≈ 1.9 periods. For $\alpha = 0.9$: half-life ≈ 6.6 periods.
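Equation (2) can be implemented directly as a truncated convolution. A minimal plain-Python sketch, assuming raw (unnormalized) weights α^l and a maximum lag l_max; some implementations normalize the weights to sum to one, which changes the scale of β but not the shape of the carryover. The function names are illustrative, not the framework's internals. The usage feeds a single pulse of spend through the transform.

```python
import math


def geometric_adstock(spend, alpha, l_max):
    """A_t = sum_{l=0}^{l_max} alpha**l * x_{t-l}, truncated at the start of the series."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    out = []
    for t in range(len(spend)):
        total = 0.0
        for lag in range(min(l_max, t) + 1):
            total += alpha ** lag * spend[t - lag]
        out.append(total)
    return out


def half_life(alpha):
    """Periods for the carryover to fall to 50%: log(0.5) / log(alpha), alpha in (0, 1)."""
    return math.log(0.5) / math.log(alpha)


if __name__ == "__main__":
    pulse = [100.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # one burst of spend, then nothing
    print([round(a, 2) for a in geometric_adstock(pulse, alpha=0.7, l_max=8)])
    print(f"half-life at alpha=0.7: {half_life(0.7):.1f} periods")  # 1.9
    print(f"half-life at alpha=0.9: {half_life(0.9):.1f} periods")  # 6.6
```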

Weibull/Delayed Adstock

For channels where the peak effect is delayed (e.g., brand campaigns, content marketing):

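$$w_l = \exp\left(-\left(\frac{l}{\theta}\right)^k\right) - \exp\left(-\left(\frac{l+1}{\theta}\right)^k\right) \tag{3}$$

Here θ (scale) sets how far out in time the weight is spread, and k (shape) sets the profile: k = 1 recovers geometric decay, k > 1 moves the peak to a later lag (with a sharper peak as k grows), and k < 1 front-loads the effect with a longer tail.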
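The weights in equation (3) are the probability mass that a Weibull distribution with shape k and scale θ assigns to each interval [l, l+1). A minimal sketch under that reading (plain Python, names illustrative), assuming the weights are then applied over lags 0..l_max like the geometric kernel. With k = 1 successive weights shrink by the constant ratio exp(-1/θ); with k > 1 the largest weight moves away from lag 0.

```python
import math


def weibull_weights(theta, k, l_max):
    """w_l = exp(-(l/theta)**k) - exp(-((l+1)/theta)**k) for l = 0..l_max."""
    return [
        math.exp(-((lag / theta) ** k)) - math.exp(-(((lag + 1) / theta) ** k))
        for lag in range(l_max + 1)
    ]


def peak_lag(weights):
    """Lag index with the largest weight (0 means the effect peaks immediately)."""
    return max(range(len(weights)), key=weights.__getitem__)


if __name__ == "__main__":
    delayed = weibull_weights(theta=3.0, k=2.0, l_max=8)
    immediate = weibull_weights(theta=3.0, k=1.0, l_max=8)
    print("k=2 peak lag:", peak_lag(delayed))    # 2
    print("k=1 peak lag:", peak_lag(immediate))  # 0
```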

Geometric vs. Weibull Adstock

  • Geometric (k = 1): the peak effect is immediate, then decays exponentially. Best suited to performance media (search, social).
  • Weibull (k > 1): the peak effect is delayed. Best suited to brand campaigns, TV, and content, where awareness builds over time.

Framework Usage: Adstock Configuration

from mmm_framework import AdstockConfigBuilder, MediaChannelConfigBuilder, PriorConfigBuilder

# Geometric adstock with 8-week max lag (recommended for most media)
adstock_geometric = (
    AdstockConfigBuilder()
    .geometric()
    .with_max_lag(8)
    .with_alpha_prior(PriorConfigBuilder().beta(alpha=1, beta=3).build())
    .build()
)

# Fast decay variant for performance media (search, social)
adstock_fast = (
    AdstockConfigBuilder()
    .geometric()
    .with_fast_decay()  # Uses Beta(1, 3) - skews toward lower alpha
    .with_max_lag(4)
    .build()
)

# Channel-specific adstock via MediaChannelConfigBuilder
tv_channel = (
    MediaChannelConfigBuilder("TV")
    .national()
    .with_geometric_adstock(l_max=8)  # Long carryover for TV
    .with_hill_saturation()
    .build()
)

digital_channel = (
    MediaChannelConfigBuilder("Digital")
    .national()
    .with_geometric_adstock(l_max=4)  # Shorter carryover for digital
    .with_hill_saturation()
    .build()
)

Saturation Functions

Saturation captures diminishing returns—each additional dollar produces less incremental effect.

Logistic Saturation (Recommended)

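$$f_{\text{logistic}}(x) = \frac{1 - \exp(-\lambda x)}{1 + \exp(-\lambda x)} \tag{4}$$

For $x \ge 0$ the curve starts at 0, is concave, and approaches 1, so each additional unit of spend adds less than the previous one. The closely related exponential form $1 - \exp(-\lambda x)$ has the same qualitative shape.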

Hill Function

$$f_{\text{Hill}}(x) = \frac{x^S}{K^S + x^S} \tag{5}$$

Where K is the half-saturation point (EC50) and S is the slope (steepness).
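Equation (5) is straightforward to evaluate. A small plain-Python sketch (function name illustrative) confirms that the curve passes through 0.5 at x = K for any slope, and that S controls how quickly the curve moves from near 0 to near 1 around that point.

```python
def hill(x, K, S):
    """Hill saturation x**S / (K**S + x**S) for x >= 0, half-saturation K > 0, slope S > 0."""
    if x <= 0:
        return 0.0
    return x ** S / (K ** S + x ** S)


if __name__ == "__main__":
    K = 0.5
    for S in (0.5, 1.0, 2.0):
        curve = [round(hill(x, K, S), 3) for x in (0.1, 0.25, 0.5, 1.0, 2.0)]
        print(f"S={S}: {curve}")
```

All three rows pass through 0.5 at x = K; the S = 2 row is lowest below K and highest above it.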

Key insight: The logistic function has one parameter (λ) controlling saturation speed, while Hill has two (K, S) offering more flexibility but risking identifiability issues.

Hill Function: Effect of Parameters

K (half-saturation point) and S (slope) affect the curve shape independently:

  • K (half-saturation point): the spend level at which the channel reaches 50% of its maximum effect. A lower K means saturation sets in earlier.
  • S (slope/steepness): controls how sharply the curve rises. S < 1 gives a concave curve, S = 1 is the Michaelis-Menten form (also concave), and S > 1 gives an S-shaped curve.

⚠️ Identification Warning

Hill function parameters can be weakly identified: over the observed spend range, quite different (K, S) pairs can produce nearly the same curve. Constrain K's prior to the observed data range.

Framework Usage: Saturation Configuration

from mmm_framework import MediaChannelConfigBuilder, PriorConfigBuilder, SaturationConfigBuilder

# Hill saturation with data-driven bounds (recommended)
saturation_hill = (
    SaturationConfigBuilder()
    .hill()
    .with_kappa_prior(PriorConfigBuilder().beta(2, 2).build())
    .with_slope_prior(PriorConfigBuilder().half_normal(1.5).build())
    .with_beta_prior(PriorConfigBuilder().half_normal(2.0).build())
    .with_kappa_bounds(0.1, 0.9)  # Constrain K to percentile bounds
    .build()
)

# Strong saturation variant (expect heavy diminishing returns)
saturation_strong = (
    SaturationConfigBuilder()
    .hill()
    .with_strong_saturation()  # Higher slope prior
    .build()
)

# Channel-specific saturation via MediaChannelConfigBuilder
tv_with_saturation = (
    MediaChannelConfigBuilder("TV")
    .national()
    .with_geometric_adstock(l_max=8)
    .with_hill_saturation()  # Default Hill saturation
    .with_positive_prior(sigma=2.0)  # Prior for media coefficient
    .build()
)

Trend & Seasonality

| Type | Description | Use When |
|---|---|---|
| Linear | $\tau(t) = \delta \cdot t$ | Stable growth/decline |
| Piecewise | Linear segments with changepoints | Known structural breaks |
| B-Spline | Smooth nonlinear trend | Flexible trend |
| Gaussian Process | $\tau(t) \sim \mathcal{GP}(0, k(t, t'))$ | Maximum flexibility |

Choosing a Trend Type

  • Linear & Piecewise: simple and interpretable. Use piecewise when you know structural breaks occurred (e.g., COVID, a new product launch).
  • B-Spline & GP: flexible and data-driven, but they risk overfitting short series. A GP provides uncertainty bands but is computationally expensive.

Fourier Seasonality

$$s(t) = \sum_{n=1}^{N} \left[ a_n \sin\left(\frac{2\pi n t}{P}\right) + b_n \cos\left(\frac{2\pi n t}{P}\right) \right] \tag{6}$$
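Equation (6) turns seasonality into a linear regression on 2N deterministic features, with $a_n$ and $b_n$ as the coefficients to estimate. A minimal plain-Python sketch (names illustrative) that builds those features for a time index and evaluates s(t) for given coefficients:

```python
import math


def fourier_features(t, period, order):
    """[sin(2*pi*n*t/P), cos(2*pi*n*t/P)] for n = 1..order, interleaved."""
    feats = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t / period
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats


def seasonality(t, period, a, b):
    """s(t) = sum_n a_n sin(2*pi*n*t/P) + b_n cos(2*pi*n*t/P); len(a) == len(b) == N."""
    feats = fourier_features(t, period, len(a))
    coefs = [c for pair in zip(a, b) for c in pair]  # interleave a_n, b_n to match feats
    return sum(f * c for f, c in zip(feats, coefs))


if __name__ == "__main__":
    # Weekly data with annual seasonality: P = 52, N = 2 -> 4 features per week.
    rows = [fourier_features(t, 52, 2) for t in range(52)]
    print(len(rows), len(rows[0]))  # 52 4
```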

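Choosing the Fourier Order

Higher order N captures finer seasonal patterns, with harmonic n having period P/n. P is the base period (e.g., 52 for weekly data with annual seasonality).

| Order | Harmonics Included | Typical Use |
|---|---|---|
| N = 1 | Single annual wave (period P) | Simple annual cycle |
| N = 2 | Adds the semi-annual harmonic (P/2) | Summer/winter peaks |
| N = 3 | Adds the P/3 harmonic (about 4 months for an annual cycle) | Sharper peaks such as a Q4 holiday spike |
| N ≥ 4 | Adds quarterly (P/4) and finer harmonics | Complex seasonality |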

Trend and seasonality combine to form the baseline: the expected outcome before media effects are added.

💡 Why Explicit Trend & Seasonality Matter

You might wonder: "If my control variables (e.g., temperature, holidays) already capture seasonal patterns, why include explicit seasonality components?" The answer lies in how media spending correlates with time.

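The problem: Advertisers typically increase spend during high-demand periods (Q4 holidays, summer travel season), so media spend is correlated with seasonality. If you rely on control variables to absorb seasonal patterns and they capture those patterns only partially, the leftover seasonal variation ends up in the residual. Because media spend is correlated with that omitted variation, the model credits part of it to media, biasing the media effect estimates upward (omitted-variable bias).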

The solution: Explicit trend and seasonality components isolate the structural time patterns that exist regardless of marketing activity. This allows the model to:

  • Separate "sales are high because it's December" from "sales are high because we spent on TV"
  • Prevent media from getting credit for seasonal uplift it didn't cause
  • Allow controls to capture their specific effects without also serving as proxies for time
  • Provide cleaner counterfactual estimation (what would sales be with zero media?)

Rule of thumb: Always include trend and seasonality components. Let controls capture their incremental effects beyond structural time patterns, not instead of them.

Framework Usage: Trend & Seasonality Configuration

from mmm_framework import (
    SeasonalityConfigBuilder,
    TrendConfig,
    TrendType,
    ModelConfigBuilder,
)

# Seasonality configuration: annual + monthly patterns
seasonality = (
    SeasonalityConfigBuilder()
    .with_yearly(order=2)   # N=2 Fourier terms for annual pattern
    .with_monthly(order=1)  # N=1 for monthly variation
    .build()
)

# Trend configuration (TrendConfig takes a TrendType enum member)
trend = TrendConfig(
    trend_type=TrendType.LINEAR,  # Options: LINEAR, PIECEWISE, SPLINE, GP
    # For PIECEWISE: specify changepoint dates
    # For SPLINE: specify n_knots
)

# Include in model configuration
model_config = (
    ModelConfigBuilder()
    .additive()
    .bayesian_numpyro()
    .with_seasonality(seasonality)
    # Trend is included by default; customize via TrendConfig if needed
    .with_chains(4)
    .with_draws(2000)
    .build()
)

Prior Specification

# Recommended default priors for standardized data
beta_media ~ HalfNormal(sigma=0.5)      # Media coefficients
alpha ~ Beta(alpha=1, beta=3)            # Adstock decay
kappa ~ Beta(alpha=2, beta=2)            # Hill half-saturation
slope ~ HalfNormal(sigma=1.5)            # Hill slope
lambda_sat ~ Gamma(alpha=2, beta=1)      # Logistic saturation rate
intercept ~ Normal(mu=0, sigma=1)        # Intercept
gamma ~ Normal(mu=0, sigma=0.5)          # Control coefficients
sigma ~ HalfNormal(sigma=0.5)            # Noise
delta ~ Normal(mu=0, sigma=0.1)          # Trend growth
season_coef ~ Normal(mu=0, sigma=0.3)    # Seasonality
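These priors are easier to judge on a scale you care about. As a prior predictive check, the standard-library sketch below (names illustrative) draws the adstock decay from Beta(a, b) and converts each draw to a half-life with log(0.5)/log(α). Under Beta(1, 3) the median decay is 1 - 0.5^(1/3) ≈ 0.21, which implies a median half-life of only about 0.44 periods, so this prior encodes a belief in fast decay; channels with long carryover may warrant a prior with more mass near 1.

```python
import math
import random
import statistics


def sample_half_lives(a, b, n_draws, seed=0):
    """Prior predictive half-lives implied by alpha ~ Beta(a, b)."""
    rng = random.Random(seed)
    half_lives = []
    for _ in range(n_draws):
        alpha = rng.betavariate(a, b)
        if 0.0 < alpha < 1.0:  # guard the log against boundary draws
            half_lives.append(math.log(0.5) / math.log(alpha))
    return half_lives


if __name__ == "__main__":
    for a, b in [(1, 3), (2, 2)]:
        hl = sample_half_lives(a, b, n_draws=20_000)
        q = statistics.quantiles(hl, n=20)  # cut points at 5%, 10%, ..., 95%
        print(f"Beta({a},{b}): median {statistics.median(hl):.2f}, "
              f"90% range [{q[0]:.2f}, {q[-1]:.2f}] periods")
```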

Framework Usage: Prior Configuration

from mmm_framework import (
    ControlVariableConfigBuilder,
    MediaChannelConfigBuilder,
    PriorConfigBuilder,
)

# Build custom priors using PriorConfigBuilder
media_prior = PriorConfigBuilder().half_normal(sigma=0.5).build()
adstock_prior = PriorConfigBuilder().beta(alpha=1, beta=3).build()
control_prior = PriorConfigBuilder().normal(mu=0, sigma=0.5).build()

# Apply priors to media channels
tv_channel = (
    MediaChannelConfigBuilder("TV")
    .national()
    .with_geometric_adstock(l_max=8)
    .with_hill_saturation()
    .with_positive_prior(sigma=2.0)  # HalfNormal prior for coefficient
    .build()
)

# Control variables with different prior specifications
price_control = (
    ControlVariableConfigBuilder("Price")
    .national()
    .allow_negative()
    .with_normal_prior(mu=-0.5, sigma=0.5)  # Expect negative price effect
    .build()
)

distribution_control = (
    ControlVariableConfigBuilder("Distribution")
    .national()
    .positive_only()  # Use HalfNormal by default
    .build()
)

Nested Model

Nested models estimate causal pathways where media affects intermediate outcomes (mediators) which in turn affect the final outcome.

Core Structure

Stage 1 — Media → Mediator

$$M_t = \alpha_M + \sum_{c=1}^{C} \beta^{(M)}_c \cdot f_c(x_{c,t}) + \epsilon^{(M)}_t \tag{7}$$

Stage 2 — Mediator → Outcome

$$y_t = \alpha_y + \gamma \cdot M_t + \sum_{c=1}^{C} \beta^{(D)}_c \cdot f_c(x_{c,t}) + \epsilon^{(y)}_t \tag{8}$$

Effect Decomposition

$$\text{Total Effect}_c = \underbrace{\beta^{(D)}_c}_{\text{Direct}} + \underbrace{\beta^{(M)}_c \cdot \gamma}_{\text{Indirect}} \tag{9}$$
$$\text{Proportion Mediated}_c = \frac{\beta^{(M)}_c \cdot \gamma}{\beta^{(D)}_c + \beta^{(M)}_c \cdot \gamma}$$
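Because the indirect effect is a product of two parameters, it is best computed draw by draw and then summarized; multiplying posterior means ignores any posterior correlation between $\beta^{(M)}_c$ and γ. A minimal sketch, assuming you have aligned posterior draws of $\beta^{(D)}_c$, $\beta^{(M)}_c$ and γ for one channel (function and argument names are illustrative):

```python
import statistics


def decompose_effects(beta_direct, beta_mediator, gamma):
    """Per-draw direct, indirect, total effects and proportion mediated (equation 9)."""
    rows = []
    for b_d, b_m, g in zip(beta_direct, beta_mediator, gamma):
        indirect = b_m * g
        total = b_d + indirect
        # Proportion mediated is undefined when the total effect is exactly zero.
        share = indirect / total if total != 0 else float("nan")
        rows.append({"direct": b_d, "indirect": indirect, "total": total, "share": share})
    return rows


if __name__ == "__main__":
    draws = decompose_effects([0.20, 0.25, 0.15], [0.50, 0.40, 0.60], [0.40, 0.50, 0.30])
    for key in ("direct", "indirect", "total", "share"):
        print(key, round(statistics.mean(r[key] for r in draws), 3))
```

Note that the proportion mediated can fall outside [0, 1] when the direct and indirect effects have opposite signs, so it is worth inspecting its full posterior rather than only its mean.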

Mediator Types

| Type | Data Requirement | Observation Model |
|---|---|---|
| Fully Observed | Complete time series | $M^{obs}_t = M_t + \nu_t$ |
| Partially Observed | Sparse survey data | $M^{obs}_{t_i} = M_{t_i} + \nu_{t_i}$ for observed periods |
| Fully Latent | No direct observations | Inferred from outcome; requires strong priors |

Identification Considerations

| Model | Key Requirements |
|---|---|
| Nested (fully observed) | Variation in media, complete mediator data |
| Nested (partially observed) | Sufficient survey observations, informative priors |
| Nested (fully latent) | Strong priors, non-zero mediator effect assumed |

Framework Usage: Nested Model Configuration

from mmm_framework.mmm_extensions import (
    MediatorConfigBuilder,
    NestedModelConfigBuilder,
    awareness_mediator,  # Factory function for common mediators
    NestedMMM,
)

# Option 1: Use factory function for common mediator types
awareness = awareness_mediator(
    name="brand_awareness",
    observation_noise=0.15,  # For partially observed mediators
)

# Option 2: Full control with MediatorConfigBuilder
awareness_custom = (
    MediatorConfigBuilder("brand_awareness")
    .partially_observed(observation_noise=0.15)
    .with_positive_media_effect(sigma=1.0)
    .with_slow_adstock(l_max=12)  # Brand awareness has longer carryover
    .with_direct_effect(sigma=0.3)
    .build()
)

# Build nested configuration
nested_config = (
    NestedModelConfigBuilder()
    .add_mediator(awareness_custom)
    .map_channels_to_mediator(
        "brand_awareness",
        ["TV", "Digital", "Social"],  # These channels build awareness
    )
    .share_adstock(True)  # Share adstock parameters across paths
    .build()
)

# Create and fit the nested model
nested_model = NestedMMM(
    panel=panel,
    mff_config=mff_config,
    model_config=model_config,
    nested_config=nested_config,
)
results = nested_model.fit()

Multivariate Model

The multivariate model jointly estimates effects on multiple outcomes, capturing correlations and cross-effects between products.

Model Structure

$$y_{k,t} = \alpha_k + \sum_{c=1}^{C} \beta_{kc} \cdot f_c(x_{c,t}) + \sum_{j \neq k} \psi_{jk} \cdot y_{j,t} + \epsilon_{k,t} \tag{10}$$

Cross-Effects

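Cannibalization (ψ < 0): product j steals sales from product k.

$$\psi_{jk} \sim \mathcal{N}^-(0, 0.3)$$

Halo effect (ψ > 0): product j lifts sales of product k.

$$\psi_{jk} \sim \mathcal{N}^+(0, 0.3)$$

$\mathcal{N}^-$ and $\mathcal{N}^+$ denote normal distributions truncated to negative and positive values respectively, so the sign of each cross-effect is fixed by the prior.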

Promotion-Modulated Cross-Effects

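$$y_{k,t} = \ldots + \psi_{jk} \cdot P_{j,t} \cdot y_{j,t} + \ldots \tag{11}$$

Here $P_{j,t}$ is a promotion indicator (or intensity) for product j, so the cross-effect is active only when, or in proportion to how strongly, product j is promoted.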

Correlated Errors

$$\boldsymbol{\epsilon}_t \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}), \quad \boldsymbol{\Sigma} = \text{diag}(\boldsymbol{\sigma}) \cdot \mathbf{R} \cdot \text{diag}(\boldsymbol{\sigma}), \quad \mathbf{R} \sim \text{LKJ}(\eta) \tag{12}$$
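The decomposition in equation (12) separates the scales σ from the correlation structure R, which is what lets the LKJ prior act on correlations alone. A minimal plain-Python sketch (name illustrative) of assembling Σ from per-outcome standard deviations and a correlation matrix:

```python
def covariance_from_correlation(sigmas, corr):
    """Sigma = diag(sigmas) @ R @ diag(sigmas), i.e. Sigma_ij = sigma_i * R_ij * sigma_j."""
    k = len(sigmas)
    if len(corr) != k or any(len(row) != k for row in corr):
        raise ValueError("corr must be a k x k matrix matching len(sigmas)")
    return [[sigmas[i] * corr[i][j] * sigmas[j] for j in range(k)] for i in range(k)]


if __name__ == "__main__":
    # Two products whose residuals move together (rho = 0.5); values are illustrative.
    sigma = covariance_from_correlation([1.0, 2.0], [[1.0, 0.5], [0.5, 1.0]])
    print(sigma)  # [[1.0, 1.0], [1.0, 4.0]]
```

Under LKJ(η), η = 1 is uniform over correlation matrices and larger η concentrates mass near the identity, so a value such as η = 2 mildly favors weak residual correlations.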

Framework Usage: Multivariate Model Configuration

from mmm_framework.mmm_extensions import (
    OutcomeConfigBuilder,
    CrossEffectConfigBuilder,
    MultivariateModelConfigBuilder,
    MultivariateMMM,
    halo_effect,           # Factory for positive cross-effects
    cannibalization_effect,  # Factory for negative cross-effects
)

# Configure multiple outcomes
product_a = OutcomeConfigBuilder("ProductA_Sales").build()
product_b = OutcomeConfigBuilder("ProductB_Sales").build()

# Define cross-effects between products
# Halo effect: Product A lifts Product B
halo = halo_effect(
    from_outcome="ProductA_Sales",
    to_outcome="ProductB_Sales",
    lag=2,  # 2-week delayed effect
)

# Cannibalization: Product B steals from Product A
cannibalization = cannibalization_effect(
    from_outcome="ProductB_Sales",
    to_outcome="ProductA_Sales",
    lag=0,  # Immediate effect
)

# Build multivariate configuration
multivariate_config = (
    MultivariateModelConfigBuilder()
    .add_outcome(product_a)
    .add_outcome(product_b)
    .add_cross_effect(halo)
    .add_cross_effect(cannibalization)
    .with_correlated_errors(eta=2.0)  # LKJ prior for error correlation
    .build()
)

# Create and fit
multivariate_model = MultivariateMMM(
    panel=panel,
    mff_config=mff_config,
    model_config=model_config,
    multivariate_config=multivariate_config,
)
results = multivariate_model.fit()

Combined Model

The combined model integrates nested pathways and multivariate outcomes.

Full Specification

Mediator equations:

$$M_{m,t} = \alpha_m + \sum_{c \in \mathcal{C}_m} \beta^{(M)}_{mc} \cdot f_c(x_{c,t}) + \epsilon^{(M)}_{m,t} \tag{13}$$

Outcome equations:

$$y_{k,t} = \alpha_k + \underbrace{\sum_{c=1}^{C} \beta^{(D)}_{kc} \cdot f_c(x_{c,t})}_{\text{Direct}} + \underbrace{\sum_{m \in \mathcal{M}_k} \gamma_{km} \cdot M_{m,t}}_{\text{Mediator}} + \underbrace{\sum_{j \neq k} \psi_{jk} \cdot y_{j,t}}_{\text{Cross-effects}} + \epsilon_{k,t} \tag{14}$$

DAG Representation

flowchart TB
    Media["🎯 Media Channels"]

    M1["Mediator 1<br/>Awareness"]
    M2["Mediator 2<br/>Consideration"]
    Direct["Direct Effects"]
    Y1["Y₁ Product A"]
    Y2["Y₂ Product B"]
    Sigma["Correlated Errors Σ"]

    Media --> M1
    Media --> M2
    Media --> Direct
    M1 --> Y1
    M1 --> Y2
    M2 --> Y1
    M2 --> Y2
    Direct --> Y1
    Direct --> Y2
    Y1 <-.->|"Cross-effects ψ"| Y2
    Y1 -.-> Sigma
    Y2 -.-> Sigma

    style Media fill:#8fa86a,stroke:#6d8a4a,color:#fff
    style M1 fill:#6a8fa8,stroke:#4a6d8a,color:#fff
    style M2 fill:#6a8fa8,stroke:#4a6d8a,color:#fff
    style Direct fill:#e5e8e0,stroke:#c4cdc4,color:#5a6b5a
    style Y1 fill:#d4a86a,stroke:#b8860b,color:#fff
    style Y2 fill:#d4a86a,stroke:#b8860b,color:#fff
    style Sigma fill:#f5f7f3,stroke:#d4ddd4,color:#5a6b5a

Framework Usage: Combined Model Configuration

from mmm_framework.mmm_extensions import (
    CombinedMMM,
    CombinedModelConfigBuilder,
)

# Combine nested and multivariate configurations
combined_config = (
    CombinedModelConfigBuilder()
    .with_nested_config(nested_config)       # From NestedModelConfigBuilder
    .with_multivariate_config(multivariate_config)  # From MultivariateModelConfigBuilder
    .build()
)

# Create and fit the combined model
combined_model = CombinedMMM(
    panel=panel,
    mff_config=mff_config,
    model_config=model_config,
    combined_config=combined_config,
)
results = combined_model.fit()

# Access decomposed effects
print(results.direct_effects)     # Direct media → outcome
print(results.indirect_effects)   # Media → mediator → outcome
print(results.cross_effects)      # Outcome → outcome

Hierarchical Structure

Hierarchical models provide partial pooling—sharing information across groups while allowing heterogeneity.

Geographic Hierarchy

$$\beta_g \sim \mathcal{N}(\bar{\beta}, \tau^2) \tag{15}$$
$$y_{gt} = \alpha_g + \beta_g x_{gt} + \gamma' \mathbf{z}_{gt} + \epsilon_{gt} \tag{16}$$
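The pooling in equation (15) can be seen in the simplest conjugate case. A minimal sketch, assuming the population mean β̄, the between-group scale τ and the residual scale σ are known, and that each geo's data reduce to a raw estimate from n observations. This is a textbook normal-normal simplification for intuition, not how the full model is fitted: the posterior mean is a precision-weighted average that pulls data-poor groups toward β̄. The function name is illustrative.

```python
def pooled_estimate(group_estimate, n_obs, pop_mean, tau, sigma):
    """Posterior mean of beta_g under beta_g ~ N(pop_mean, tau^2) and a
    group estimate with sampling variance sigma^2 / n_obs."""
    data_precision = n_obs / sigma ** 2
    prior_precision = 1.0 / tau ** 2
    weight = data_precision / (data_precision + prior_precision)
    return weight * group_estimate + (1.0 - weight) * pop_mean


if __name__ == "__main__":
    # Same raw estimate (1.0), different amounts of data; population mean 0.0.
    for n in (1, 9, 99):
        print(n, round(pooled_estimate(1.0, n, pop_mean=0.0, tau=1.0, sigma=1.0), 3))
```

With τ = σ = 1, a group with 1 observation keeps half of its raw estimate, while groups with 9 and 99 observations keep 90% and 99%.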

⚠️ National Media Limitation

Geo-level random effects on national media coefficients cannot be interpreted as differential causal response—national media provides no geo-level exposure variation.

Parameterization

Centered (Default)

beta_g ~ Normal(mu_beta, tau)

Works well with sufficient data per group.

Non-Centered

offset_g ~ Normal(0, 1)
beta_g = mu_beta + tau * offset_g

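Prefer this form when groups have few observations (e.g., fewer than 20) or the group-level scale τ is small. In those cases the centered form produces a funnel-shaped posterior that causes divergent transitions; sampling standardized offsets removes the dependence between β_g and τ that creates the funnel.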
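The two parameterizations describe the same distribution for β_g; they differ only in the geometry the sampler explores. A minimal standard-library sketch (names illustrative) that draws β_g through the non-centered transform and checks that the draws have mean close to μ_β and standard deviation close to τ:

```python
import random
import statistics


def non_centered_draws(mu_beta, tau, n_draws, seed=0):
    """beta_g = mu_beta + tau * offset_g with offset_g ~ Normal(0, 1)."""
    rng = random.Random(seed)
    return [mu_beta + tau * rng.gauss(0.0, 1.0) for _ in range(n_draws)]


if __name__ == "__main__":
    draws = non_centered_draws(mu_beta=0.3, tau=0.1, n_draws=50_000)
    print(f"mean {statistics.fmean(draws):.3f}, sd {statistics.stdev(draws):.3f}")
```

In a sampler the offsets are the quantities being explored; because they are standard normal whatever the value of τ, their geometry does not tighten into a funnel as τ shrinks.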

Framework Usage: Hierarchical Configuration

from mmm_framework import (
    HierarchicalConfigBuilder,
    KPIConfigBuilder,
    MediaChannelConfigBuilder,
    ModelConfigBuilder,
    PriorConfigBuilder,
)

# Configure hierarchical pooling
hierarchical = (
    HierarchicalConfigBuilder()
    .enabled()
    .pool_across_geo()      # Share information across geographies
    .pool_across_product()  # Share information across products
    .use_non_centered()     # Better for sparse data
    .with_non_centered_threshold(20)  # Switch at 20 obs per group
    .with_mu_prior(PriorConfigBuilder().normal(0, 1).build())
    .with_sigma_prior(PriorConfigBuilder().half_normal(0.5).build())
    .build()
)

# Geo-level KPI configuration
geo_kpi = (
    KPIConfigBuilder("Sales")
    .by_geo()  # Period + Geography dimensions
    .additive()
    .build()
)

# Geo-level media (has regional variation)
local_radio = (
    MediaChannelConfigBuilder("LocalRadio")
    .by_geo()  # Media varies by geography
    .with_geometric_adstock(6)
    .with_hill_saturation()
    .build()
)

# Include hierarchical config in model
model_config = (
    ModelConfigBuilder()
    .additive()
    .bayesian_numpyro()
    .with_hierarchical(hierarchical)
    .with_chains(4)
    .with_draws(2000)
    .build()
)

Variable Selection Methods

⚠️ Critical: Variable Classification Required

Variable selection should only be applied to precision control variables. Confounders must be EXCLUDED from selection. See the Variable Selection page for details.

Regularized Horseshoe Prior

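$$\beta_j = z_j \cdot \tau \cdot \tilde{\lambda}_j, \quad \tau_0 = \frac{D_0}{D - D_0} \cdot \frac{\sigma}{\sqrt{N}} \tag{17}$$

In the standard regularized horseshoe, $z_j \sim \mathcal{N}(0, 1)$; the regularized local scales are $\tilde{\lambda}_j^2 = c^2 \lambda_j^2 / (c^2 + \tau^2 \lambda_j^2)$ with $\lambda_j \sim \text{Half-Cauchy}(0, 1)$; the global scale τ has a half-Cauchy (or half-Student-t) prior with scale $\tau_0$; and c is the slab scale that limits how large non-zero coefficients can become. In $\tau_0$, $D_0$ is the expected number of non-zero coefficients, D the number of candidate variables, σ the residual scale, and N the number of observations.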
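The global-scale formula in equation (17) is simple arithmetic but easy to misapply. A minimal sketch (function name illustrative), where d0 is the expected number of relevant controls, d the number of candidate controls, sigma the residual scale and n_obs the number of observations:

```python
import math


def horseshoe_tau0(d0, d, sigma, n_obs):
    """Global scale tau_0 = D0 / (D - D0) * sigma / sqrt(N) for the regularized horseshoe."""
    if not 0 < d0 < d:
        raise ValueError("need 0 < D0 < D")
    return d0 / (d - d0) * sigma / math.sqrt(n_obs)


if __name__ == "__main__":
    # e.g. expecting ~5 relevant controls out of 20, with about 3 years of weekly data
    print(round(horseshoe_tau0(d0=5, d=20, sigma=1.0, n_obs=156), 4))
```

A small τ_0 shrinks most coefficients strongly toward zero, which is the intended behavior when only a handful of controls are expected to matter.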

Spike-and-Slab Prior

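$$\beta_j = \gamma_j \cdot \beta_{\text{slab},j} + (1 - \gamma_j) \cdot \beta_{\text{spike},j} \tag{18}$$

Here $\gamma_j \in \{0, 1\}$ is an inclusion indicator (not the control coefficient γ from equation (1)), typically given a Bernoulli prior; the slab is a wide distribution for included effects, and the spike is concentrated at or near zero.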

Bayesian LASSO

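$$\beta_j \mid \sigma^2, \tau_j^2 \sim \mathcal{N}(0, \sigma^2 \tau_j^2), \quad \tau_j^2 \mid \lambda \sim \text{Exp}(\lambda^2/2) \tag{19}$$

Here Exp is parameterized by its rate. Integrating out $\tau_j^2$ gives a Laplace (double-exponential) prior on $\beta_j$ with scale $\sigma/\lambda$, the Bayesian analogue of the LASSO penalty.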

Interpreting Results

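The posterior inclusion probability (PIP) of a variable is the posterior probability that its effect is non-zero. With a spike-and-slab prior it is the posterior mean of $\gamma_j$, i.e. the share of posterior draws in which the variable is included.

| PIP Range | Interpretation |
|---|---|
| > 0.75 | Strong evidence of effect |
| 0.50–0.75 | Moderate evidence |
| 0.25–0.50 | Weak evidence |
| < 0.25 | Little evidence of effect |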
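For a spike-and-slab prior, the PIP of control j is the share of posterior draws in which its inclusion indicator γ_j equals 1. A minimal sketch (names illustrative) that computes PIPs from indicator draws and maps them onto the bands in the table above; the cut-offs are this guide's conventions, not universal thresholds.

```python
def posterior_inclusion_probability(indicator_draws):
    """Fraction of posterior draws in which the inclusion indicator is 1."""
    return sum(indicator_draws) / len(indicator_draws)


def interpret_pip(pip):
    """Map a PIP onto the evidence bands used in this guide."""
    if pip > 0.75:
        return "Strong evidence of effect"
    if pip >= 0.50:
        return "Moderate evidence"
    if pip >= 0.25:
        return "Weak evidence"
    return "Little evidence of effect"


if __name__ == "__main__":
    # Illustrative indicator draws for two controls (1 = included in that draw).
    draws = {"Temperature": [1, 1, 1, 0, 1, 1, 1, 1], "Promotion": [0, 1, 0, 0, 0, 0, 1, 0]}
    for name, ind in draws.items():
        pip = posterior_inclusion_probability(ind)
        print(f"{name}: PIP={pip:.2f} -> {interpret_pip(pip)}")
```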

Framework Usage: Variable Selection Configuration

from mmm_framework import (
    ControlSelectionConfigBuilder,
    ControlVariableConfigBuilder,
    MFFConfigBuilder,
)

# Configure variable selection for precision controls
control_selection = (
    ControlSelectionConfigBuilder()
    .horseshoe()  # Regularized horseshoe prior
    .with_expected_nonzero(5)  # Expect ~5 important controls
    .build()
)

# Add controls with shrinkage enabled
weather_control = (
    ControlVariableConfigBuilder("Temperature")
    .national()
    .allow_negative()
    .with_shrinkage()  # Enable variable selection
    .build()
)

promo_control = (
    ControlVariableConfigBuilder("Promotion")
    .national()
    .positive_only()
    .with_shrinkage()  # Enable variable selection
    .build()
)

# Build MFF config with control selection
mff_config = (
    MFFConfigBuilder()
    .with_kpi_builder(...)   # placeholder: KPI builder as in the Standard MMM setup
    .add_media_builder(...)  # placeholder: one call per media channel
    .add_control_builder(weather_control)
    .add_control_builder(promo_control)
    .with_control_selection(control_selection)
    .build()
)

Model Diagnostics

Convergence Diagnostics

| Diagnostic | Target | Interpretation |
|---|---|---|
| $\hat{R}$ | < 1.01 | Chains converged |
| ESS | > 400 | Sufficient independent samples |
| Divergences | 0 | No pathological behavior |
| Tree Depth | < max | Not hitting limits |
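R̂ compares between-chain and within-chain variance. A minimal plain-Python sketch of the classic split-R̂ (each chain is halved so that drift within a chain is also detected). It omits the rank normalization and folding used by current tools such as ArviZ, so treat it as an illustration of the idea rather than a replacement for your library's diagnostics. The function name is illustrative.

```python
import math
import random
import statistics


def split_rhat(chains):
    """Classic split-R-hat for one scalar parameter; chains are equal-length lists (len >= 4)."""
    halves = []
    for chain in chains:
        mid = len(chain) // 2
        halves.extend([chain[:mid], chain[mid:2 * mid]])  # drops the last draw if length is odd
    n = len(halves[0])
    means = [statistics.fmean(h) for h in halves]
    within = statistics.fmean([statistics.variance(h) for h in halves])  # W
    between = n * statistics.variance(means)  # B
    var_hat = (n - 1) / n * within + between / n
    return math.sqrt(var_hat / within)


if __name__ == "__main__":
    rng = random.Random(0)
    mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
    stuck = [[rng.gauss(m, 1.0) for _ in range(1000)] for m in (0.0, 0.0, 0.0, 3.0)]
    print(f"well-mixed chains: R-hat = {split_rhat(mixed):.3f}")
    print(f"one stuck chain:   R-hat = {split_rhat(stuck):.3f}")
```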

Identifiability Checks

Model Comparison

| Metric | Description |
|---|---|
| WAIC | Widely Applicable Information Criterion |
| LOO-CV | Leave-One-Out Cross-Validation |
| Out-of-sample RMSE | Prediction error on held-out data |

Computational Scaling

| Component | Cost Multiplier |
|---|---|
| Additional mediator | ~1.3× |
| Additional outcome | ~1.5× |
| Cross-effects | ~1.1× |
| Partial observation | ~1.2× |
| Hierarchical pooling | ~1.5× |

💡 Performance Tip

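For complex models, use the NumPyro (JAX) sampling backend: .bayesian_numpyro() in ModelConfigBuilder, or nuts_sampler="numpyro" when calling pm.sample in PyMC directly. Speedups of roughly 4-10× are common, though the gain depends on the model and hardware.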

Prior Explorer

Understanding how prior distributions behave is crucial for encoding domain knowledge in your MMM. Use this interactive explorer to visualize common prior choices and see how their parameters affect the distribution shape.

Common Prior Choices in MMM

| Parameter | Distribution | Typical Values | Rationale |
|---|---|---|---|
| Media coefficients (β) | HalfNormal(σ) | σ = 0.3–0.5 | Enforces positivity; regularizes toward zero |
| Adstock decay (α) | Beta(a, b) | a=1, b=3 or a=2, b=2 | Bounded [0,1]; controls half-life distribution |
| Saturation λ | Gamma(α, β) | α=2, β=1 | Positive; mildly regularizes saturation speed |
| Hill K (half-sat) | Beta(a, b) | a=2, b=2 | Bounded to data range; centered prior |
| Control coefficients (γ) | Normal(μ, σ) | μ=0, σ=0.5 | Can be positive or negative; regularized |
| Intercept (α) | Normal(μ, σ) | μ=0, σ=1 | Weakly informative for standardized data |
| Noise (σ) | HalfNormal(σ) | σ = 0.5 | Positive; expect residual ~0.3–0.5 for standardized y |
| Hierarchical σ | HalfNormal(σ) | σ = 0.3 | Controls pooling strength; smaller = more pooling |

💡 Prior Selection Principles

  • Weakly informative: Regularize extreme values without dominating the likelihood
  • Scientifically motivated: Encode known constraints (e.g., media effects should be positive)
  • Scale-appropriate: Match prior scale to standardized data (coefficients typically < 1)
  • Sensitivity check: If results change dramatically under different reasonable priors, the data are only weakly informative about those parameters, and results should be reported with that caveat