Technical Guide

A comprehensive reference for data scientists implementing Bayesian Marketing Mix Models. This guide covers mathematical foundations, model specifications, and practical considerations for each model type in the framework.

Overview

Marketing Mix Modeling (MMM) estimates the incremental impact of marketing activities on business outcomes like sales or conversions. This framework implements MMM using Bayesian inference, which provides several advantages over traditional frequentist approaches:

Uncertainty quantification: Full posterior distributions, not just point estimates
Prior incorporation: Domain knowledge can be encoded in priors
Regularization: Priors act as natural regularizers, reducing overfitting
Hierarchical modeling: Partial pooling across geographies and products
Decision-ready outputs: Credible intervals directly answer "how confident are we?"

Bayesian Basics

At its core, Bayesian inference combines prior beliefs with observed data to form posterior beliefs. For a parameter θ given data y:

$$p(\theta | y) = \frac{p(y | \theta) \cdot p(\theta)}{p(y)} \propto p(y | \theta) \cdot p(\theta)$$

Where:

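p(θ): Prior, what we believe before seeing data
p(y|θ): Likelihood, the probability of the data given the parameters
p(θ|y): Posterior, updated beliefs after seeing data
p(y): Evidence (marginal likelihood), a normalizing constant that does not depend on θ, which is why the posterior is proportional to likelihood × prior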

Credible Intervals vs. Confidence Intervals

A 94% Bayesian credible interval means "there is a 94% probability the true parameter lies in this range, given our model, prior, and data." This is a direct probability statement about the parameter. A frequentist confidence interval, by contrast, is a statement about the procedure: across repeated samples, 94% of intervals constructed this way would contain the true value.
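To make the update concrete, here is a minimal NumPy sketch, independent of the framework. It approximates the posterior of a single media coefficient on a grid by multiplying a HalfNormal(0.5) prior by a Gaussian likelihood and normalizing numerically. It then reads a 94% credible interval from the posterior CDF. The data are simulated and the noise scale is assumed known, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: y = beta * x + noise; the noise scale is assumed known
true_beta, noise_sd = 0.4, 1.0
x = rng.normal(size=50)
y = true_beta * x + rng.normal(scale=noise_sd, size=50)

# Grid over beta >= 0 (the support of the HalfNormal prior)
grid = np.linspace(0.0, 2.0, 2001)
dx = grid[1] - grid[0]

# Log prior: HalfNormal(sigma=0.5), up to an additive constant
log_prior = -0.5 * (grid / 0.5) ** 2

# Log likelihood: sum of Gaussian log densities, up to an additive constant
resid = y[None, :] - grid[:, None] * x[None, :]
log_lik = -0.5 * np.sum((resid / noise_sd) ** 2, axis=1)

# Posterior ∝ likelihood × prior; normalizing on the grid plays the role of p(y)
log_post = log_lik + log_prior
post = np.exp(log_post - log_post.max())
post /= post.sum() * dx

# 94% equal-tailed credible interval from the posterior CDF
cdf = np.cumsum(post) * dx
lower = grid[np.searchsorted(cdf, 0.03)]
upper = grid[np.searchsorted(cdf, 0.97)]
post_mean = np.sum(grid * post) * dx
print(f"posterior mean {post_mean:.3f}, 94% CI [{lower:.3f}, {upper:.3f}]")
```

Grid approximation only works for one or two parameters. Full MMMs rely on MCMC, but the logic of prior × likelihood, then normalize, is the same.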

Positivity Constraints via Priors

Media coefficients should generally be positive (advertising shouldn't decrease sales). Rather than post-hoc filtering (which invalidates inference), we encode this through a prior whose support is restricted to non-negative values:

$$\beta \sim \text{HalfNormal}(\sigma = 0.5)$$

⚠️ Specification Shopping

Running multiple models and selecting one with "reasonable" coefficients is specification shopping. This practice destroys the statistical properties of your estimates. Instead, encode constraints through priors before seeing the data, then report the single resulting posterior.

Choosing a Model

Model	Use When	Complexity
Standard MMM	Single outcome, no intermediate metrics, simple attribution	Low
Nested MMM	Upper-funnel metrics available (awareness, consideration), want direct/indirect decomposition	Medium
Multivariate MMM	Multiple outcomes that may interact (product portfolio, cannibalization)	Medium
Combined MMM	Both mediators AND multiple interacting outcomes	High

Data Structure Considerations

Data Structure	Recommended Model
National aggregate only	Single time-series model; acknowledge wide uncertainty
Multiple geographies, national media	Hierarchical model with random intercepts by geo
Multiple geographies, regional media variation	Hierarchical model with random slopes on media
Many geographies, rich regional variation	Full hierarchical model with geo-level covariates

Standard MMM

The standard model relates marketing inputs to a single outcome through transformed media variables. Channel contributions flow through three sequential components: adstock (carryover), saturation (diminishing returns), and coefficient scaling.

Full Model Specification

The complete generative model for the standard MMM is:

(1) $$y_t = \alpha + \underbrace{\sum_{m=1}^{M} \beta_m \cdot f_{\text{sat}}\left(g_{\text{adstock}}(x_{m,t})\right)}_{\text{Media Effects}} + \underbrace{\tau(t)}_{\text{Trend}} + \underbrace{s(t)}_{\text{Seasonality}} + \underbrace{\gamma' \mathbf{z}_t}_{\text{Controls}} + \epsilon_t$$

Where:

$y_t$ is the outcome at time $t$ (optionally log-transformed)
$x_{m,t}$ is spend for media channel $m$ at time $t$
$g_{\text{adstock}}(\cdot)$ is the adstock transformation (carryover)
$f_{\text{sat}}(\cdot)$ is the saturation function (diminishing returns)
$\tau(t)$ is the trend component
$s(t)$ is the seasonality component
$\mathbf{z}_t$ is a vector of control variables
$\epsilon_t \sim \mathcal{N}(0, \sigma^2)$ is the error term

Multiplicative (Log-Log) Specification

For elasticity interpretation, use the multiplicative form:

$$\log(y_t) = \log(\beta_0) + \sum_{m=1}^{M} \beta_m \log\left(f_m(x_{m,t})\right) + \gamma' \mathbf{z}_t + \epsilon_t$$

Coefficients now represent elasticities: $\beta_m$ is the percent change in sales for a 1% change in the transformed media input $f_m(x_{m,t})$, holding the other terms fixed.
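Here is a quick numeric check of the elasticity reading. It assumes a single channel, no noise, and an illustrative β = 0.2, which is not an estimate. Under the log-log form, sales scale as $f_m(x)^{\beta}$, so a 1% increase in the transformed input multiplies sales by $1.01^{\beta}$, roughly a β% increase. The helper name is hypothetical.

```python
beta = 0.2          # illustrative elasticity, not an estimate
base_input = 100.0  # transformed media input f(x) at baseline
scale = 50.0        # exp(log(beta_0) + controls), held fixed

def predicted_sales(media_input: float) -> float:
    # Multiplicative model: y = scale * f(x)^beta (noise omitted)
    return scale * media_input ** beta

y0 = predicted_sales(base_input)
y1 = predicted_sales(base_input * 1.01)  # +1% media input

pct_change = 100 * (y1 / y0 - 1)
print(f"{pct_change:.4f}% change in sales")  # 0.1992, close to beta = 0.2
```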

Framework Usage: Complete Model Setup

from mmm_framework import (
    BayesianMMM,
    MFFConfigBuilder,
    ModelConfigBuilder,
    MediaChannelConfigBuilder,
    ControlVariableConfigBuilder,
    KPIConfigBuilder,
    load_mff,
)

# 1. Configure the data specification (MFF = Master Flat File)
mff_config = (
    MFFConfigBuilder()
    .with_kpi_builder(KPIConfigBuilder("Sales").national().additive())
    .add_media_builder(
        MediaChannelConfigBuilder("TV")
        .national()
        .with_geometric_adstock(l_max=8)
        .with_hill_saturation()
    )
    .add_media_builder(
        MediaChannelConfigBuilder("Digital")
        .national()
        .with_geometric_adstock(l_max=4)
        .with_hill_saturation()
    )
    .add_control_builder(
        ControlVariableConfigBuilder("Price")
        .national()
        .allow_negative()
    )
    .weekly()
    .build()
)

# 2. Load data; `dataframe` is assumed to be a pandas DataFrame in MFF layout, loaded beforehand
panel = load_mff(dataframe, mff_config)

# 3. Configure and fit model
model_config = (
    ModelConfigBuilder()
    .additive()
    .bayesian_numpyro()  # Fast JAX backend
    .with_chains(4)
    .with_draws(2000)
    .build()
)

model = BayesianMMM(panel, model_config)
results = model.fit()

Adstock Transformation

Adstock captures carryover effects—the idea that today's advertising continues to influence sales in future periods.

Geometric Adstock (Default)

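The simplest form uses a single decay parameter $\alpha \in [0, 1)$, distinct from the intercept $\alpha$ in Eq. (1):

(2) $$A_t = x_t + \alpha \cdot A_{t-1} = \sum_{l=0}^{\infty} \alpha^l \cdot x_{t-l} \approx \sum_{l=0}^{L} \alpha^l \cdot x_{t-l}$$

In practice the sum is truncated at a maximum lag $L$ (the `l_max` argument in the framework configuration).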

The half-life (time for effect to decay by 50%) is:

$$t_{1/2} = \frac{\log(0.5)}{\log(\alpha)} = \frac{-0.693}{\log(\alpha)}$$

For $\alpha = 0.7$: half-life ≈ 1.9 periods. For $\alpha = 0.9$: half-life ≈ 6.6 periods.
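Below is a minimal NumPy sketch of Eq. (2) and the half-life formula, with the sum truncated at `l_max` lags. The function names are hypothetical helpers, not the framework's API. No weight normalization is applied; some implementations rescale the weights to sum to 1.

```python
import numpy as np

def geometric_adstock(x: np.ndarray, alpha: float, l_max: int) -> np.ndarray:
    """Truncated geometric adstock: A_t = sum_{l=0}^{l_max} alpha**l * x_{t-l}."""
    weights = alpha ** np.arange(l_max + 1)  # [1, alpha, alpha^2, ...]
    # Full convolution; keep the first len(x) values so A_t only uses x_t, x_{t-1}, ...
    return np.convolve(x, weights)[: len(x)]

def half_life(alpha: float) -> float:
    """Number of periods until the carryover weight alpha**l falls to 0.5."""
    return np.log(0.5) / np.log(alpha)

pulse = np.array([1.0, 0, 0, 0, 0, 0])
print(geometric_adstock(pulse, alpha=0.5, l_max=8))  # weights 1, 0.5, 0.25, ... applied to the pulse
print(round(half_life(0.7), 2), round(half_life(0.9), 2))  # 1.94 6.58
```

Once `l_max` covers the whole history, the convolution agrees exactly with the recursion $A_t = x_t + \alpha A_{t-1}$. The convolution form vectorizes easily, and the half-life is a more intuitive summary of α when reviewing priors.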

Weibull/Delayed Adstock

For channels where peak effect is delayed (e.g., brand campaigns, content marketing):

(3) $$w_l = \exp\left(-\left(\frac{l}{\theta}\right)^k\right) - \exp\left(-\left(\frac{l+1}{\theta}\right)^k\right)$$

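Here $l = 0, 1, \ldots, L$ indexes the lag. θ (scale) stretches the weights across more lags, pushing the peak later when k > 1. k (shape) controls the profile: k ≤ 1 gives weights that decline from lag 0 (k = 1 recovers geometric decay with $\alpha = \exp(-1/\theta)$), while k > 1 produces a delayed, humped peak that sharpens as k grows.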
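The next sketch computes the Weibull lag weights in Eq. (3) and applies them with the same convolution approach as geometric adstock. The helper names are hypothetical. The framework's own Weibull implementation may normalize or parameterize the weights differently.

```python
import numpy as np

def weibull_adstock_weights(theta: float, k: float, l_max: int) -> np.ndarray:
    """Lag weights w_l = exp(-(l/theta)^k) - exp(-((l+1)/theta)^k), l = 0..l_max."""
    def survival(lag):
        # Weibull survival function S(l) = exp(-(l/theta)^k)
        return np.exp(-((lag / theta) ** k))

    lags = np.arange(l_max + 1)
    return survival(lags) - survival(lags + 1)

def weibull_adstock(x: np.ndarray, theta: float, k: float, l_max: int) -> np.ndarray:
    """Apply the Weibull lag weights to a spend series by causal convolution."""
    w = weibull_adstock_weights(theta, k, l_max)
    return np.convolve(x, w)[: len(x)]

w = weibull_adstock_weights(theta=3.0, k=2.0, l_max=8)
print(np.round(w, 3))
print("peak lag:", int(np.argmax(w)))  # peak lag: 2 (delayed, not immediate)
```

The weights telescope, so they sum to $1 - \exp(-((L+1)/\theta)^k)$. With k = 1 they form a geometric sequence with ratio $\exp(-1/\theta)$.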

Higher α means longer memory: more of each period's spend carries over into later periods.

Comparing Adstock Types

Geometric and Weibull adstock differ in when the peak effect occurs:

Geometric (k=1)
Peak effect is immediate, then decays exponentially. Best for performance media (search, social).

Weibull (k>1)
Peak effect is delayed. Best for brand campaigns, TV, content where awareness builds over time.

Framework Usage: Adstock Configuration

from mmm_framework import AdstockConfigBuilder, PriorConfigBuilder

# Geometric adstock with 8-week max lag (recommended for most media)
adstock_geometric = (
    AdstockConfigBuilder()
    .geometric()
    .with_max_lag(8)
    .with_alpha_prior(PriorConfigBuilder().beta(alpha=1, beta=3).build())
    .build()
)

# Fast decay variant for performance media (search, social)
adstock_fast = (
    AdstockConfigBuilder()
    .geometric()
    .with_fast_decay()  # Uses Beta(1, 3) - skews toward lower alpha
    .with_max_lag(4)
    .build()
)

# Channel-specific adstock via MediaChannelConfigBuilder
tv_channel = (
    MediaChannelConfigBuilder("TV")
    .national()
    .with_geometric_adstock(l_max=8)  # Long carryover for TV
    .with_hill_saturation()
    .build()
)

digital_channel = (
    MediaChannelConfigBuilder("Digital")
    .national()
    .with_geometric_adstock(l_max=4)  # Shorter carryover for digital
    .with_hill_saturation()
    .build()
)

Saturation Functions

Saturation captures diminishing returns—each additional dollar produces less incremental effect.

Logistic Saturation (Recommended)

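(4) $$f_{\text{logistic}}(x) = \frac{1 - \exp(-\lambda x)}{1 + \exp(-\lambda x)}$$

A closely related one-parameter curve is exponential saturation, $1 - \exp(-\lambda x)$. Both start at 0, rise with diminishing returns, and approach 1 as x grows; λ controls how quickly saturation sets in.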

Hill Function

(5) $$f_{\text{Hill}}(x) = \frac{x^S}{K^S + x^S}$$

Where K is the half-saturation point (EC50) and S is slope (steepness).
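The following NumPy sketch implements three curves: the logistic form $(1 - e^{-\lambda x})/(1 + e^{-\lambda x})$, the exponential form $1 - e^{-\lambda x}$, and the Hill function. The function names and parameter values (λ = 2, K = 0.5, S = 2) are illustrative assumptions. Inputs are assumed to be adstocked spend scaled to roughly [0, 1].

```python
import numpy as np

def logistic_saturation(x, lam):
    """Logistic saturation (1 - e^{-lam*x}) / (1 + e^{-lam*x}); 0 at x = 0, approaches 1."""
    e = np.exp(-lam * np.asarray(x, dtype=float))
    return (1 - e) / (1 + e)

def exponential_saturation(x, lam):
    """Exponential saturation 1 - e^{-lam*x}; 0 at x = 0, approaches 1."""
    return 1 - np.exp(-lam * np.asarray(x, dtype=float))

def hill_saturation(x, K, S):
    """Hill function x^S / (K^S + x^S); equals 0.5 at x = K."""
    x = np.asarray(x, dtype=float)
    return x**S / (K**S + x**S)

x = np.linspace(0, 1, 5)  # adstocked spend scaled to [0, 1]
print(np.round(logistic_saturation(x, lam=2.0), 3))
print(np.round(exponential_saturation(x, lam=2.0), 3))
print(np.round(hill_saturation(x, K=0.5, S=2.0), 3))
```

The logistic form equals $\tanh(\lambda x / 2)$, which is a convenient identity for checking an implementation.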

Key insight: The logistic function has one parameter (λ) controlling saturation speed, while Hill has two (K, S) offering more flexibility but risking identifiability issues.

Hill Function: Effect of Parameters

K (half-saturation) and S (slope) affect the curve shape independently:

K (Half-saturation point)
The spend level where you achieve 50% of maximum effect. Lower K = saturation kicks in earlier.

S (Slope/Steepness)
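Controls how sharply the curve rises. S ≤ 1 gives a concave curve with no inflection point (S = 1 is the Michaelis-Menten form); S > 1 gives an S-shaped curve with a threshold region of slow initial response.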

⚠️ Identification Warning

Hill function parameters can be weakly identified. Constrain K's prior to the observed data range.

Framework Usage: Saturation Configuration

from mmm_framework import SaturationConfigBuilder, PriorConfigBuilder

# Hill saturation with data-driven bounds (recommended)
saturation_hill = (
    SaturationConfigBuilder()
    .hill()
    .with_kappa_prior(PriorConfigBuilder().beta(2, 2).build())
    .with_slope_prior(PriorConfigBuilder().half_normal(1.5).build())
    .with_beta_prior(PriorConfigBuilder().half_normal(2.0).build())
    .with_kappa_bounds(0.1, 0.9)  # Constrain K to percentile bounds
    .build()
)

# Strong saturation variant (expect heavy diminishing returns)
saturation_strong = (
    SaturationConfigBuilder()
    .hill()
    .with_strong_saturation()  # Higher slope prior
    .build()
)

# Channel-specific saturation via MediaChannelConfigBuilder
tv_with_saturation = (
    MediaChannelConfigBuilder("TV")
    .national()
    .with_geometric_adstock(l_max=8)
    .with_hill_saturation()  # Default Hill saturation
    .with_positive_prior(sigma=2.0)  # Prior for media coefficient
    .build()
)

Trend & Seasonality

Type	Description	Use When
Linear	$\tau(t) = \delta \cdot t$	Stable growth/decline
Piecewise	Linear segments with changepoints	Known structural breaks
B-Spline	Smooth nonlinear trend	Flexible trend
Gaussian Process	$\tau(t) \sim \mathcal{GP}(0, k(t, t'))$	Maximum flexibility
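This sketch covers the linear and piecewise options. It assumes the common hinge formulation, in which each changepoint c adds a slope change through max(0, t − c) so the trend stays continuous. That is one standard construction; the framework's PIECEWISE option may be parameterized differently, and the helper names are hypothetical.

```python
import numpy as np

def linear_trend(t, delta):
    """tau(t) = delta * t."""
    return delta * np.asarray(t, dtype=float)

def piecewise_linear_trend(t, delta, changepoints, slope_changes):
    """Linear trend whose slope changes by slope_changes[i] after changepoints[i].

    Each change adds a hinge term max(0, t - c), so the trend is continuous.
    """
    t = np.asarray(t, dtype=float)
    trend = delta * t
    for c, d_delta in zip(changepoints, slope_changes):
        trend = trend + d_delta * np.maximum(0.0, t - c)
    return trend

t = np.arange(104)  # two years of weekly data
# Growth of 0.02 per week, turning into a decline of 0.01 per week after week 52
tau = piecewise_linear_trend(t, delta=0.02, changepoints=[52], slope_changes=[-0.03])
```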

Linear & Piecewise
Simple, interpretable. Use piecewise when you know structural breaks occurred (e.g., COVID, new product launch).

B-Spline & GP
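Flexible and data-driven, but they risk overfitting short series. A GP models uncertainty in the trend shape directly but is computationally expensive: exact GP inference scales cubically with the number of time points.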

Fourier Seasonality

(6) $$s(t) = \sum_{n=1}^{N} \left[ a_n \sin\left(\frac{2\pi n t}{P}\right) + b_n \cos\left(\frac{2\pi n t}{P}\right) \right]$$

Higher order N captures finer seasonal patterns, at the cost of two extra coefficients ($a_n$, $b_n$) per order. P is the period (e.g., 52 for weekly data with annual seasonality).

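Order	Pattern Captured	Typical Use
N = 1	Single annual wave (period P)	Simple annual cycle
N = 2	+ Semi-annual harmonic (period P/2)	Summer/winter peaks
N = 3	+ Harmonic with period P/3 (about 4 months)	Sharper or asymmetric annual shapes
N ≥ 4	+ Quarterly (P/4) and finer harmonics	Complex seasonality, e.g. a pronounced Q4 holiday spike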
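In code, Eq. (6) amounts to building a design matrix of sine and cosine columns and taking a linear combination. Below is a minimal sketch; the helper name is hypothetical and the coefficients are illustrative, not fitted.

```python
import numpy as np

def fourier_features(t, period: float, order: int) -> np.ndarray:
    """Columns [sin(2*pi*n*t/P), cos(2*pi*n*t/P)] for n = 1..order, as in Eq. (6)."""
    t = np.asarray(t, dtype=float)
    cols = []
    for n in range(1, order + 1):
        angle = 2 * np.pi * n * t / period
        cols.extend([np.sin(angle), np.cos(angle)])
    return np.column_stack(cols)

t = np.arange(104)                             # two years of weekly data
X = fourier_features(t, period=52, order=2)    # shape (104, 4)
coefs = np.array([0.5, 0.2, 0.1, -0.1])        # illustrative (a_1, b_1, a_2, b_2)
season = X @ coefs                             # s(t), repeats every 52 weeks
```

Over one full period the columns are orthogonal, which keeps the individual Fourier coefficients well separated from one another.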

Combined Trend + Seasonality

Trend and seasonality together form the time-varying baseline before media effects are added.

💡 Why Explicit Trend & Seasonality Matter

You might wonder: "If my control variables (e.g., temperature, holidays) already capture seasonal patterns, why include explicit seasonality components?" The answer lies in how media spending correlates with time.

The problem: Advertisers typically increase spend during high-demand periods (Q4 holidays, summer travel season), so media spend is correlated with seasonality. If control variables capture only part of the seasonal pattern, the unexplained remainder is absorbed by whichever regressors move with it. Because media spend moves with the season, media picks up seasonal uplift it did not cause, biasing its effect estimates (typically upward).

The solution: Explicit trend and seasonality components isolate the structural time patterns that exist regardless of marketing activity. This allows the model to:

Separate "sales are high because it's December" from "sales are high because we spent on TV"
Prevent media from getting credit for seasonal uplift it didn't cause
Allow controls to capture their specific effects without also serving as proxies for time
Provide cleaner counterfactual estimation (what would sales be with zero media?)

Rule of thumb: Always include trend and seasonality components. Let controls capture their incremental effects beyond structural time patterns, not instead of them.
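A small simulation makes this concrete. It assumes a toy data-generating process: weekly data and one channel whose spend rises with the seasonal peak. Ordinary least squares stands in for the Bayesian fit, and the coefficients are illustrative, not framework output.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 156                                  # three years of weekly data
t = np.arange(T)
season = np.sin(2 * np.pi * t / 52)      # structural seasonal pattern

# Media spend is higher in high-demand weeks, i.e. correlated with season
media = 1.0 + 0.8 * season + 0.3 * rng.normal(size=T)

# True process: media effect 0.5, seasonal effect 2.0
y = 1.0 + 0.5 * media + 2.0 * season + 0.2 * rng.normal(size=T)

def ols(X, y):
    """Least-squares coefficients (X includes an intercept column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(T)
beta_no_season = ols(np.column_stack([ones, media]), y)[1]
beta_with_season = ols(np.column_stack([ones, media, season]), y)[1]
print(f"media effect without seasonality: {beta_no_season:.2f}")   # inflated by seasonal uplift
print(f"media effect with seasonality:    {beta_with_season:.2f}")  # close to the true 0.5
```

Without the seasonal term, the media coefficient absorbs most of the seasonal uplift. Once seasonality is modeled explicitly, the estimate returns close to the true value.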

Framework Usage: Trend & Seasonality Configuration

from mmm_framework import (
    SeasonalityConfigBuilder,
    TrendConfig,
    TrendType,
    ModelConfigBuilder,
)

# Seasonality configuration: annual + monthly patterns
seasonality = (
    SeasonalityConfigBuilder()
    .with_yearly(order=2)   # N=2 Fourier terms for annual pattern
    .with_monthly(order=1)  # N=1 for monthly variation
    .build()
)

# Trend configuration (TrendConfig holds the settings; TrendType enumerates the options)
trend = TrendConfig(
    trend_type=TrendType.LINEAR,  # Options: LINEAR, PIECEWISE, SPLINE, GP
    # For PIECEWISE: specify changepoint dates
    # For SPLINE: specify n_knots
)

# Include in model configuration
model_config = (
    ModelConfigBuilder()
    .additive()
    .bayesian_numpyro()
    .with_seasonality(seasonality)
    # Trend is included by default; customize via TrendConfig if needed
    .with_chains(4)
    .with_draws(2000)
    .build()
)

Prior Specification

# Recommended default priors for standardized data
beta_media ~ HalfNormal(sigma=0.5)      # Media coefficients
alpha ~ Beta(alpha=1, beta=3)            # Adstock decay
kappa ~ Beta(alpha=2, beta=2)            # Hill half-saturation
slope ~ HalfNormal(sigma=1.5)            # Hill slope
lambda_sat ~ Gamma(alpha=2, beta=1)      # Logistic saturation rate
intercept ~ Normal(mu=0, sigma=1)        # Intercept
gamma ~ Normal(mu=0, sigma=0.5)          # Control coefficients
sigma ~ HalfNormal(sigma=0.5)            # Noise
delta ~ Normal(mu=0, sigma=0.1)          # Trend growth
season_coef ~ Normal(mu=0, sigma=0.3)    # Seasonality
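Sampling from these priors before fitting (a prior predictive check) shows what they imply in interpretable units. The minimal NumPy sketch below assumes the Gamma prior is parameterized by shape and rate; NumPy's `gamma` takes shape and scale, with scale = 1/rate. It also translates the adstock prior into half-lives.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Draws from the default priors listed above (standardized-data scale)
beta_media = np.abs(rng.normal(0.0, 0.5, n))  # HalfNormal(sigma=0.5)
alpha = rng.beta(1.0, 3.0, n)                 # Beta(alpha=1, beta=3) adstock decay
lambda_sat = rng.gamma(2.0, 1.0, n)           # Gamma(shape=2, rate=1) -> scale = 1/rate = 1

# Implied adstock half-life: t_1/2 = log(0.5) / log(alpha)
half_life = np.log(0.5) / np.log(alpha)

print("median media coefficient:", np.median(beta_media).round(3))
print("mean logistic saturation rate:", lambda_sat.mean().round(2))
print("median adstock half-life (periods):", np.median(half_life).round(2))
print("P(half-life > 4 periods):", (half_life > 4).mean().round(3))
```

Under Beta(1, 3), the median half-life is under half a period, and half-lives above 4 periods have prior probability of roughly 0.4%. For channels where long carryover is expected (e.g., TV), a prior with more mass at higher α may be more appropriate.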

Framework Usage: Prior Configuration

from mmm_framework import (
    PriorConfigBuilder,
    MediaChannelConfigBuilder,
    ControlVariableConfigBuilder,
)

# Build custom priors using PriorConfigBuilder
media_prior = PriorConfigBuilder().half_normal(sigma=0.5).build()
adstock_prior = PriorConfigBuilder().beta(alpha=1, beta=3).build()
control_prior = PriorConfigBuilder().normal(mu=0, sigma=0.5).build()

# Apply priors to media channels
tv_channel = (
    MediaChannelConfigBuilder("TV")
    .national()
    .with_geometric_adstock(l_max=8)
    .with_hill_saturation()
    .with_positive_prior(sigma=2.0)  # HalfNormal prior for coefficient
    .build()
)

# Control variables with different prior specifications
price_control = (
    ControlVariableConfigBuilder("Price")
    .national()
    .allow_negative()
    .with_normal_prior(mu=-0.5, sigma=0.5)  # Expect negative price effect
    .build()
)

distribution_control = (
    ControlVariableConfigBuilder("Distribution")
    .national()
    .positive_only()  # Uses a HalfNormal prior by default
    .build()
)

Nested Model

Nested models estimate causal pathways where media affects intermediate outcomes (mediators) which in turn affect the final outcome.

Core Structure

Stage 1 — Media → Mediator

(7) $$M_t = \alpha_M + \sum_{c=1}^{C} \beta^{(M)}_c \cdot f_c(x_{c,t}) + \epsilon^{(M)}_t$$

Stage 2 — Mediator → Outcome

(8) $$y_t = \alpha_y + \gamma \cdot M_t + \sum_{c=1}^{C} \beta^{(D)}_c \cdot f_c(x_{c,t}) + \epsilon^{(y)}_t$$

Effect Decomposition

(9) $$\text{Total Effect}_c = \underbrace{\beta^{(D)}_c}_{\text{Direct}} + \underbrace{\beta^{(M)}_c \cdot \gamma}_{\text{Indirect}}$$

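$$\text{Proportion Mediated}_c = \frac{\beta^{(M)}_c \cdot \gamma}{\beta^{(D)}_c + \beta^{(M)}_c \cdot \gamma}$$

The proportion mediated is only interpretable when the direct and indirect effects share the same sign. With opposite signs, the denominator can approach zero and the ratio can fall outside [0, 1].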
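Because the indirect effect is a product of two parameters, it is best computed draw by draw from the posterior rather than from posterior means. The sketch below uses simulated placeholder draws that stand in for samples extracted from a fitted model; they are not results.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 4000

# Hypothetical posterior draws for one channel c (simulated, not real results)
beta_M = np.abs(rng.normal(0.30, 0.05, n_draws))  # media -> mediator, beta^(M)_c
gamma = rng.normal(0.80, 0.10, n_draws)           # mediator -> outcome
beta_D = np.abs(rng.normal(0.10, 0.04, n_draws))  # direct effect, beta^(D)_c

# Eq. (9), evaluated per draw so that uncertainty propagates correctly
indirect = beta_M * gamma
total = beta_D + indirect
prop_mediated = indirect / total

def summarize(draws, label):
    lo, hi = np.quantile(draws, [0.03, 0.97])  # 94% credible interval
    print(f"{label}: mean {draws.mean():.3f}, 94% CI [{lo:.3f}, {hi:.3f}]")

for draws, label in [(indirect, "indirect"), (total, "total"), (prop_mediated, "proportion mediated")]:
    summarize(draws, label)
```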

Mediator Types

Type	Data Requirement	Observation Model
Fully Observed	Complete time series	$M^{obs}_t = M_t + \nu_t$
Partially Observed	Sparse survey data	$M^{obs}_{t_i} = M_{t_i} + \nu_{t_i}$ for observed periods
Fully Latent	No direct observations	Inferred from outcome; requires strong priors

Identification Considerations

Model	Key Requirements
Nested (fully observed)	Variation in media, complete mediator data
Nested (partially observed)	Sufficient survey observations, informative priors
Nested (fully latent)	Strong priors, non-zero mediator effect assumed

Framework Usage: Nested Model Configuration

from mmm_framework.mmm_extensions import (
    MediatorConfigBuilder,
    NestedModelConfigBuilder,
    awareness_mediator,  # Factory function for common mediators
    NestedMMM,
)

# Option 1: Use factory function for common mediator types
awareness = awareness_mediator(
    name="brand_awareness",
    observation_noise=0.15,  # For partially observed mediators
)

# Option 2: Full control with MediatorConfigBuilder
awareness_custom = (
    MediatorConfigBuilder("brand_awareness")
    .partially_observed(observation_noise=0.15)
    .with_positive_media_effect(sigma=1.0)
    .with_slow_adstock(l_max=12)  # Brand awareness has longer carryover
    .with_direct_effect(sigma=0.3)
    .build()
)

# Build nested configuration
nested_config = (
    NestedModelConfigBuilder()
    .add_mediator(awareness_custom)
    .map_channels_to_mediator(
        "brand_awareness",
        ["TV", "Digital", "Social"],  # These channels build awareness
    )
    .share_adstock(True)  # Share adstock parameters across paths
    .build()
)

# Create and fit the nested model
nested_model = NestedMMM(
    panel=panel,
    mff_config=mff_config,
    model_config=model_config,
    nested_config=nested_config,
)
results = nested_model.fit()

Multivariate Model

The multivariate model jointly estimates effects on multiple outcomes, capturing correlations and cross-effects between products.

Model Structure

(10) $$y_{k,t} = \alpha_k + \sum_{c=1}^{C} \beta_{kc} \cdot f_c(x_{c,t}) + \sum_{j \neq k} \psi_{jk} \cdot y_{j,t} + \epsilon_{k,t}$$

Cross-Effects

Cannibalization (ψ < 0)

Product j steals from product k

$$\psi_{jk} \sim \mathcal{N}^-(0, 0.3)$$

Halo Effect (ψ > 0)

Product j lifts product k

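$$\psi_{jk} \sim \mathcal{N}^+(0, 0.3)$$

Here $\mathcal{N}^-(0, 0.3)$ and $\mathcal{N}^+(0, 0.3)$ denote a normal distribution with scale 0.3 truncated to negative and positive values, respectively.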

Promotion-Modulated Cross-Effects

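(11) $$y_{k,t} = \ldots + \psi_{jk} \cdot P_{j,t} \cdot y_{j,t} + \ldots$$

Here $P_{j,t}$ is a promotion indicator (or intensity) for product j at time t, so the cross-effect of product j on product k switches on, or scales, with j's promotional activity.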

Correlated Errors

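(12) $$\boldsymbol{\epsilon}_t \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}), \quad \boldsymbol{\Sigma} = \text{diag}(\boldsymbol{\sigma}) \cdot \mathbf{R} \cdot \text{diag}(\boldsymbol{\sigma}), \quad \mathbf{R} \sim \text{LKJ}(\eta)$$

η = 1 gives a uniform prior over correlation matrices; η > 1 concentrates prior mass near the identity matrix, favoring weaker error correlations.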
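Here is a sketch of how Eq. (12) builds the error covariance from per-outcome scales and a correlation matrix. NumPy has no LKJ sampler, so R is fixed here for illustration. In the model it would be a random draw from LKJ(η), and the scales and correlation below are illustrative values.

```python
import numpy as np

sigma = np.array([0.3, 0.5])   # per-outcome noise scales (illustrative)
R = np.array([[1.0, -0.4],     # correlation matrix; in the model R ~ LKJ(eta)
              [-0.4, 1.0]])

# Sigma = diag(sigma) @ R @ diag(sigma), as in Eq. (12)
Sigma = np.diag(sigma) @ R @ np.diag(sigma)

# Draw correlated errors for 10,000 periods and 2 outcomes
rng = np.random.default_rng(0)
eps = rng.multivariate_normal(mean=np.zeros(2), cov=Sigma, size=10_000)

print(Sigma)
print(np.corrcoef(eps.T)[0, 1].round(2))  # sample correlation, near -0.4
```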

Framework Usage: Multivariate Model Configuration

from mmm_framework.mmm_extensions import (
    OutcomeConfigBuilder,
    CrossEffectConfigBuilder,
    MultivariateModelConfigBuilder,
    MultivariateMMM,
    halo_effect,           # Factory for positive cross-effects
    cannibalization_effect,  # Factory for negative cross-effects
)

# Configure multiple outcomes
product_a = OutcomeConfigBuilder("ProductA_Sales").build()
product_b = OutcomeConfigBuilder("ProductB_Sales").build()

# Define cross-effects between products
# Halo effect: Product A lifts Product B
halo = halo_effect(
    from_outcome="ProductA_Sales",
    to_outcome="ProductB_Sales",
    lag=2,  # 2-week delayed effect
)

# Cannibalization: Product B steals from Product A
cannibalization = cannibalization_effect(
    from_outcome="ProductB_Sales",
    to_outcome="ProductA_Sales",
    lag=0,  # Immediate effect
)

# Build multivariate configuration
multivariate_config = (
    MultivariateModelConfigBuilder()
    .add_outcome(product_a)
    .add_outcome(product_b)
    .add_cross_effect(halo)
    .add_cross_effect(cannibalization)
    .with_correlated_errors(eta=2.0)  # LKJ prior for error correlation
    .build()
)

# Create and fit
multivariate_model = MultivariateMMM(
    panel=panel,
    mff_config=mff_config,
    model_config=model_config,
    multivariate_config=multivariate_config,
)
results = multivariate_model.fit()

Combined Model

The combined model integrates nested pathways and multivariate outcomes.

Full Specification

Mediator equations:

(13) $$M_{m,t} = \alpha_m + \sum_{c \in \mathcal{C}_m} \beta^{(M)}_{mc} \cdot f_c(x_{c,t}) + \epsilon^{(M)}_{m,t}$$

Outcome equations:

(14) $$y_{k,t} = \alpha_k + \underbrace{\sum_{c=1}^{C} \beta^{(D)}_{kc} \cdot f_c(x_{c,t})}_{\text{Direct}} + \underbrace{\sum_{m \in \mathcal{M}_k} \gamma_{km} \cdot M_{m,t}}_{\text{Mediator}} + \underbrace{\sum_{j \neq k} \psi_{jk} \cdot y_{j,t}}_{\text{Cross-effects}} + \epsilon_{k,t}$$

DAG Representation

flowchart TB
    Media["🎯 Media Channels"]
    
    M1["Mediator 1<br/>Awareness"]
    M2["Mediator 2<br/>Consideration"]
    Direct["Direct Effects"]
    
    Y1["Y₁ Product A"]
    Y2["Y₂ Product B"]
    
    Sigma["Correlated Errors Σ"]
    
    Media --> M1
    Media --> M2
    Media --> Direct
    
    M1 --> Y1
    M1 --> Y2
    M2 --> Y1
    M2 --> Y2
    Direct --> Y1
    Direct --> Y2
    
    Y1 <-.->|"Cross-effects ψ"| Y2
    
    Y1 -.-> Sigma
    Y2 -.-> Sigma
    
    style Media fill:#8fa86a,stroke:#6d8a4a,color:#fff
    style M1 fill:#6a8fa8,stroke:#4a6d8a,color:#fff
    style M2 fill:#6a8fa8,stroke:#4a6d8a,color:#fff
    style Direct fill:#e5e8e0,stroke:#c4cdc4,color:#5a6b5a
    style Y1 fill:#d4a86a,stroke:#b8860b,color:#fff
    style Y2 fill:#d4a86a,stroke:#b8860b,color:#fff
    style Sigma fill:#f5f7f3,stroke:#d4ddd4,color:#5a6b5a

Framework Usage: Combined Model Configuration

from mmm_framework.mmm_extensions import (
    CombinedMMM,
    CombinedModelConfigBuilder,
)

# Combine nested and multivariate configurations
combined_config = (
    CombinedModelConfigBuilder()
    .with_nested_config(nested_config)       # From NestedModelConfigBuilder
    .with_multivariate_config(multivariate_config)  # From MultivariateModelConfigBuilder
    .build()
)

# Create and fit the combined model
combined_model = CombinedMMM(
    panel=panel,
    mff_config=mff_config,
    model_config=model_config,
    combined_config=combined_config,
)
results = combined_model.fit()

# Access decomposed effects
print(results.direct_effects)     # Direct media → outcome
print(results.indirect_effects)   # Media → mediator → outcome
print(results.cross_effects)      # Outcome → outcome

Hierarchical Structure

Hierarchical models provide partial pooling—sharing information across groups while allowing heterogeneity.

Geographic Hierarchy

(15) $$\beta_g \sim \mathcal{N}(\bar{\beta}, \tau^2)$$

(16) $$y_{gt} = \alpha_g + \beta_g x_{gt} + \gamma' \mathbf{z}_{gt} + \epsilon_{gt}$$

⚠️ National Media Limitation

Geo-level random effects on national media coefficients cannot be interpreted as differential causal response—national media provides no geo-level exposure variation.

Parameterization

Centered (Default)

beta_g ~ Normal(mu_beta, tau)

Works well with sufficient data per group.

Non-Centered

offset_g ~ Normal(0, 1)
beta_g = mu_beta + tau * offset_g

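Use when groups have few observations (e.g., fewer than 20) or the between-group scale τ is small. The non-centered form removes the funnel-shaped dependence between $\beta_g$ and τ that causes divergences in the centered form.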
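The two parameterizations define the same prior for $\beta_g$; they differ only in which variables the sampler explores. The quick NumPy check below uses illustrative values for `mu_beta` and `tau`.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_beta, tau, n_groups, n_draws = 0.4, 0.2, 8, 50_000  # illustrative values

# Centered: draw beta_g directly from Normal(mu_beta, tau)
beta_centered = rng.normal(mu_beta, tau, size=(n_draws, n_groups))

# Non-centered: draw standardized offsets, then shift and scale
offset = rng.normal(0.0, 1.0, size=(n_draws, n_groups))
beta_noncentered = mu_beta + tau * offset

# Same distribution for beta_g; only the sampler's geometry differs
print(beta_centered.mean().round(3), beta_noncentered.mean().round(3))
print(beta_centered.std().round(3), beta_noncentered.std().round(3))
```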

Framework Usage: Hierarchical Configuration

from mmm_framework import (
    HierarchicalConfigBuilder,
    KPIConfigBuilder,
    MediaChannelConfigBuilder,
    ModelConfigBuilder,
    PriorConfigBuilder,
)

# Configure hierarchical pooling
hierarchical = (
    HierarchicalConfigBuilder()
    .enabled()
    .pool_across_geo()      # Share information across geographies
    .pool_across_product()  # Share information across products
    .use_non_centered()     # Better for sparse data
    .with_non_centered_threshold(20)  # Switch at 20 obs per group
    .with_mu_prior(PriorConfigBuilder().normal(0, 1).build())
    .with_sigma_prior(PriorConfigBuilder().half_normal(0.5).build())
    .build()
)

# Geo-level KPI configuration
geo_kpi = (
    KPIConfigBuilder("Sales")
    .by_geo()  # Period + Geography dimensions
    .additive()
    .build()
)

# Geo-level media (has regional variation)
local_radio = (
    MediaChannelConfigBuilder("LocalRadio")
    .by_geo()  # Media varies by geography
    .with_geometric_adstock(6)
    .with_hill_saturation()
    .build()
)

# Include hierarchical config in model
model_config = (
    ModelConfigBuilder()
    .additive()
    .bayesian_numpyro()
    .with_hierarchical(hierarchical)
    .with_chains(4)
    .with_draws(2000)
    .build()
)

Variable Selection Methods

⚠️ Critical: Variable Classification Required

Variable selection should only be applied to precision control variables. Confounders must be EXCLUDED from selection. See Variable Selection page for details.

Regularized Horseshoe Prior

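(17) $$\beta_j = z_j \cdot \tau \cdot \tilde{\lambda}_j, \quad \tilde{\lambda}_j^2 = \frac{c^2 \lambda_j^2}{c^2 + \tau^2 \lambda_j^2}, \quad \tau_0 = \frac{D_0}{D - D_0} \cdot \frac{\sigma}{\sqrt{N}}$$

Here:

$z_j \sim \mathcal{N}(0, 1)$ is a standard-normal variable.
$\lambda_j \sim \text{Half-Cauchy}(0, 1)$ are the local scales.
$\tau \sim \text{Half-Cauchy}(0, \tau_0)$ is the global scale.
$c$ is the slab width, which caps the size of large coefficients.
$D$ is the number of candidate variables and $D_0$ the expected number of non-zero coefficients.
$\sigma$ is the noise scale and $N$ the number of observations.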
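Once D, D_0, σ, and N are fixed, the global scale $\tau_0$ is simple arithmetic. The sketch below uses illustrative values: 20 candidate controls, about 5 expected to matter, 104 weekly observations, and a standardized outcome. The mapping from the framework's `with_expected_nonzero(5)` to $D_0$ is an assumption, and the helper names are hypothetical.

```python
import numpy as np

def horseshoe_tau0(d0: int, d: int, sigma: float, n: int) -> float:
    """Global scale tau_0 = D0 / (D - D0) * sigma / sqrt(N)."""
    return d0 / (d - d0) * sigma / np.sqrt(n)

def regularized_lambda(lam, tau, c):
    """lambda_tilde_j = sqrt(c^2 lam^2 / (c^2 + tau^2 lam^2))."""
    lam = np.asarray(lam, dtype=float)
    return np.sqrt(c**2 * lam**2 / (c**2 + tau**2 * lam**2))

# 20 candidate controls, ~5 expected to matter, 104 weeks, standardized y
tau0 = horseshoe_tau0(d0=5, d=20, sigma=1.0, n=104)
print(round(tau0, 4))  # 0.0327
```

For large $\lambda_j$, the regularized scale $\tau \tilde{\lambda}_j$ approaches c, so the slab width bounds how large an unshrunk coefficient can be. For small $\lambda_j$, it behaves like the ordinary horseshoe.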

Spike-and-Slab Prior

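(18) $$\beta_j = \gamma_j \cdot \beta_{\text{slab},j} + (1 - \gamma_j) \cdot \beta_{\text{spike},j}$$

Here $\gamma_j \in \{0, 1\}$ is an inclusion indicator with $\gamma_j \sim \text{Bernoulli}(\pi)$, $\beta_{\text{spike},j}$ is tightly concentrated at zero, and $\beta_{\text{slab},j}$ has a diffuse prior.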

Bayesian LASSO

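(19) $$\beta_j | \sigma^2, \tau_j^2 \sim \mathcal{N}(0, \sigma^2 \tau_j^2), \quad \tau_j^2 \sim \text{Exp}(\lambda^2/2)$$

Here Exp is parameterized by its rate. Integrating out $\tau_j^2$ gives a Laplace (double-exponential) prior on $\beta_j$ with scale $\sigma/\lambda$, the Bayesian analogue of the LASSO penalty.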

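Interpreting Results

The posterior inclusion probability (PIP) of a variable is the posterior probability that its coefficient is non-zero. Under spike-and-slab, it is $P(\gamma_j = 1 \mid y)$.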

PIP Range	Interpretation
> 0.75	Strong evidence of effect
0.50 - 0.75	Moderate evidence
0.25 - 0.50	Weak evidence
< 0.25	Little evidence of effect
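Given posterior draws of the inclusion indicators, the PIP is the fraction of draws in which a variable is included. The sketch below uses simulated indicator draws and hypothetical variable names, and maps each PIP onto the bands in the table.

```python
import numpy as np

# Hypothetical posterior draws of inclusion indicators gamma_j (draws x variables)
rng = np.random.default_rng(3)
inclusion_rates = np.array([0.9, 0.6, 0.3, 0.05])        # used only to simulate draws
gamma_draws = rng.random((4000, 4)) < inclusion_rates    # boolean indicator draws

# PIP_j = P(gamma_j = 1 | y), estimated by the frequency of inclusion across draws
pip = gamma_draws.mean(axis=0)

def interpret(p: float) -> str:
    """Map a PIP to the evidence bands in the table above."""
    if p > 0.75:
        return "strong"
    if p >= 0.50:
        return "moderate"
    if p >= 0.25:
        return "weak"
    return "little"

names = ["control_1", "control_2", "control_3", "control_4"]  # hypothetical names
for name, p in zip(names, pip):
    print(f"{name}: PIP = {p:.2f} ({interpret(p)} evidence)")
```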

Framework Usage: Variable Selection Configuration

from mmm_framework import (
    ControlSelectionConfigBuilder,
    ControlVariableConfigBuilder,
    MFFConfigBuilder,
)

# Configure variable selection for precision controls
control_selection = (
    ControlSelectionConfigBuilder()
    .horseshoe()  # Regularized horseshoe prior
    .with_expected_nonzero(5)  # Expect ~5 important controls
    .build()
)

# Add controls with shrinkage enabled
weather_control = (
    ControlVariableConfigBuilder("Temperature")
    .national()
    .allow_negative()
    .with_shrinkage()  # Enable variable selection
    .build()
)

promo_control = (
    ControlVariableConfigBuilder("Promotion")
    .national()
    .positive_only()
    .with_shrinkage()  # Enable variable selection
    .build()
)

# Build MFF config with control selection
mff_config = (
    MFFConfigBuilder()
    .with_kpi_builder(...)
    .add_media_builder(...)
    .add_control_builder(weather_control)
    .add_control_builder(promo_control)
    .with_control_selection(control_selection)
    .build()
)

Model Diagnostics

Convergence Diagnostics

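Diagnostic	Target	Interpretation
$\hat{R}$	< 1.01	Chains have converged to the same distribution
ESS (bulk and tail)	> 400	Enough effectively independent draws for stable estimates
Divergences	0	No pathological posterior geometry detected
Tree depth	Below the maximum	Sampler is not hitting its trajectory-length limit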
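Below is a minimal NumPy sketch of the classic split-$\hat{R}$ computation for one parameter, showing what the statistic compares. In practice, prefer a maintained implementation such as ArviZ's `az.rhat` or `az.summary`, which use the rank-normalized version. The helper name is hypothetical.

```python
import numpy as np

def split_rhat(draws: np.ndarray) -> float:
    """Classic split-R-hat for one parameter; draws has shape (n_chains, n_draws).

    Each chain is split in half, then within- and between-chain variances are compared.
    """
    n_chains, n_draws = draws.shape
    half = n_draws // 2
    # Split each chain into two halves -> 2 * n_chains chains of length `half`
    chains = np.concatenate([draws[:, :half], draws[:, half:2 * half]], axis=0)
    n = chains.shape[1]
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()  # mean within-chain variance
    B = n * chain_means.var(ddof=1)        # between-chain variance
    var_hat = (n - 1) / n * W + B / n      # pooled estimate of the posterior variance
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(0)
good = rng.normal(size=(4, 1000))                    # 4 chains sampling the same target
bad = good + np.array([[0.0], [0.0], [0.0], [1.0]])  # one chain stuck in a shifted region
print(round(split_rhat(good), 3), round(split_rhat(bad), 3))
```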

Identifiability Checks

Prior-posterior overlap: Wide overlap suggests weak identification
Posterior correlations: High correlation indicates potential redundancy
Posterior predictive checks: Does the model reproduce observed patterns?

Model Comparison

Metric	Description
WAIC	Widely Applicable Information Criterion
LOO-CV	Leave-One-Out Cross-Validation
Out-of-sample RMSE	Prediction error on held-out data (for time series, hold out the most recent periods)

Computational Scaling

Component	Cost Multiplier
Additional mediator	~1.3×
Additional outcome	~1.5×
Cross-effects	~1.1×
Partial observation	~1.2×
Hierarchical pooling	~1.5×

💡 Performance Tip

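For complex models, sampling with the NumPyro (JAX) NUTS backend can give a 4-10× speedup. Use `.bayesian_numpyro()` in `ModelConfigBuilder`, or `nuts_sampler="numpyro"` when calling PyMC's `pm.sample` directly.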

Prior Explorer

Understanding how prior distributions behave is crucial for encoding domain knowledge in your MMM. The table below summarizes common prior choices and how their parameters shape each distribution.

Common Prior Choices in MMM

Parameter	Distribution	Typical Values	Rationale
Media coefficients (β)	`HalfNormal(σ)`	σ = 0.3 – 0.5	Enforces positivity; regularizes toward zero
Adstock decay (α)	`Beta(a, b)`	a=1, b=3 or a=2, b=2	Bounded [0,1]; controls half-life distribution
Saturation λ	`Gamma(α, β)`	α=2, β=1	Positive; mildly regularizes saturation speed
Hill K (half-sat)	`Beta(a, b)`	a=2, b=2	Bounded to data range; centered prior
Control coefficients (γ)	`Normal(μ, σ)`	μ=0, σ=0.5	Can be positive or negative; regularized
Intercept (α)	`Normal(μ, σ)`	μ=0, σ=1	Weakly informative for standardized data
Noise (σ)	`HalfNormal(σ)`	σ = 0.5	Positive; expect residual ~0.3-0.5 for standardized y
Hierarchical σ	`HalfNormal(σ)`	σ = 0.3	Controls pooling strength; smaller = more pooling

💡 Prior Selection Principles

Weakly informative: Regularize extreme values without dominating the likelihood
Scientifically motivated: Encode known constraints (e.g., media effects should be positive)
Scale-appropriate: Match prior scale to standardized data (coefficients typically < 1)
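Sensitivity check: If results change dramatically across reasonable priors, the data are weakly informative about those parameters. Report that sensitivity rather than picking the prior that gives the preferred answer.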