Getting Started

Build your first Bayesian Marketing Mix Model in minutes. This guide walks you through installation, core concepts, and a complete working example using the mmm-framework.

What You'll Build

By the end of this guide, you'll have a working MMM that estimates media effects with honest uncertainty quantification—the foundation for decisions you can trust.

What you're building is a causal model

A marketing mix model answers a causal question—"what would sales have been if we hadn't run this media?"—not just "what moved together?" Lots of things move together without one causing the other, the way holiday demand lifts both ad spend and sales at the same time. This framework separates coincidence from contribution: it accounts for confounders like underlying demand, locks the model design in before results are seen, and checks its answers against real-world experiments such as regional holdout tests. Read more in Causal Inference and Measurement & Calibration.

📊 Prepare Data → ⚙️ Configure Model → 🔬 Fit & Diagnose → 📈 Analyze Results

Prerequisites

To install the modeling library and fit a model in Python, you need only:

✓ Python 3.12+

✓ uv (recommended) or pip

✓ Git (source install only)

Running the web application?

The library needs nothing beyond the above — no server, no database. Extra services only matter if you also run the web app:

React + agent API (modern, recommended): no Redis. The agent API runs fits in-kernel, so there is no separate job queue to stand up.
Streamlit (legacy): needs Redis + ARQ for the asynchronous job queue that keeps the UI responsive while MCMC sampling runs in the background. A local Redis instance is sufficient.

Installation

The modeling library is published on PyPI. For anything production-facing, pin an exact version — the project is pre-1.0 and minor releases may include breaking changes (see the changelog).

Install from PyPI

# Install the modeling library
$ pip install mmm-framework

# Production: pin the exact release
$ pip install mmm-framework==0.2.0

Or Install from Source (development)

# Clone the repository
$ git clone https://github.com/redam94/mmm-framework.git
$ cd mmm-framework

# Install with uv (recommended)
$ uv sync

# Or with pip
$ pip install -e .

Install App Dependencies

If you want to use the web application (API backend plus the React or Streamlit frontend):

# Install Python app dependencies (API backend + Streamlit frontend)
$ uv sync --group app

# For development (includes testing tools)
$ uv sync --group dev --group app

# For the React frontend, install its packages once
$ cd frontend && npm install

Verify Installation

import mmm_framework
print(f"mmm-framework version: {mmm_framework.__version__}")

# Check available components
from mmm_framework import (
    MFFConfigBuilder,
    ModelConfigBuilder,
    BayesianMMM,
    load_mff,
)
print("✓ All core components imported successfully")

Install or first fit not cooperating?

JAX on Apple Silicon, a PyTensor/clang link error, a slow sampler, a rejected dataset — the Troubleshooting runbook has a symptom-and-fix entry for each.

Fastest Path: fit a model in one script

The library ships two ready-to-model example datasets, so you can fit a real model without writing any data-loading code. Copy the block below into a Python session — it loads an example, fits a Bayesian MMM, prints each channel's ROI, and grades the estimate against the example's sealed answer key.

Run it in your browser — no local install.

from mmm_framework import (
    load_example,
    load_example_answer_key,
    BayesianMMM,
    ModelConfigBuilder,
    TrendConfig,
    TrendType,
)

# 1. Load a bundled example — 104 weeks of national weekly data, ready to model.
panel = load_example("national")
print(panel.summary())

# 2. Configure inference (fast JAX/NumPyro sampler) and a linear trend.
model_config = (
    ModelConfigBuilder()
    .bayesian_numpyro()      # ~3x faster than PyMC at equal draws
    .with_chains(4)
    .with_draws(500)         # small + fast for a first run
    .with_tune(500)
    .build()
)
trend_config = TrendConfig(type=TrendType.LINEAR)

# 3. Fit. On a laptop this national model takes roughly 15-25 seconds.
mmm = BayesianMMM(panel, model_config, trend_config)
results = mmm.fit(random_seed=42)
print("max R-hat:", round(results.diagnostics["rhat_max"], 3))   # ~1.0 = converged

# 4. The headline: each channel's return on ad spend (contribution / spend).
decomp = mmm.compute_component_decomposition()
roi = (decomp.media_by_channel.sum() / panel.X_media.sum()).sort_values(ascending=False)
print("\nEstimated ROI by channel:")
print(roi.round(2))

# 5. This example ships a SEALED answer key — grade the estimate against truth.
truth = load_example_answer_key("national")["true_roas"]
print("\nchannel   estimated   true")
for ch in roi.index:
    print(f"  {ch:<8} {roi[ch]:>8.2f}   {truth[ch]:>5.2f}")

What just happened

The model recovers the causal ranking — the brand channels (TV, Social, Video) out-earn the performance channels (Search, Display) — even though a single observational fit attenuates the magnitudes. Tightening those magnitudes is exactly what the experiment-calibration loop is for. load_example("geo") gives you an 8-market geo panel instead.

Your First Model, step by step

Prefer to understand each piece? The rest of this section builds the same kind of model from scratch — preparing data in MFF format, configuring priors and inference, fitting, and interpreting the results with honest uncertainty.

Step 1: Prepare Your Data

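The framework uses the Master Flat File (MFF) format: a long-format structure in which each row is a single observation of a single variable. Because every variable is just another set of rows, series with different dimensions (for example national media alongside geo-level sales) fit in one table.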

import numpy as np
import pandas as pd

# Example: create synthetic MFF data (one row per period x variable)
rng = np.random.default_rng(42)
n_weeks = 104  # 2 years of weekly data

# Generate dates (weekly, weeks ending Sunday)
dates = pd.date_range("2023-01-01", periods=n_weeks, freq="W")
week = np.arange(n_weeks)

# Media: TV spend (weekly, with roughly 20% of weeks at zero)
tv = np.where(rng.random(n_weeks) > 0.2, rng.exponential(500, n_weeks), 0.0)

# Media: Digital spend (weekly)
digital = rng.exponential(300, n_weeks)

# Control: Price Index
price = 100 + rng.normal(0, 5, n_weeks)

# KPI: Sales = seasonal base + media effect + noise.
# The 0.2 / 0.3 per-unit media effects are illustrative only; they give the
# model real media signal to recover (otherwise media would be pure noise).
sales = (
    1000 + 50 * np.sin(2 * np.pi * week / 52)  # Seasonality
    + 0.2 * tv + 0.3 * digital
    + rng.normal(0, 50, n_weeks)
)

# Build MFF records: stack each series as (Period, VariableName, VariableValue) rows
series = {"Sales": sales, "TV": tv, "Digital": digital, "Price": price}
mff_data = pd.concat(
    [
        pd.DataFrame({"Period": dates, "VariableName": name, "VariableValue": values})
        for name, values in series.items()
    ],
    ignore_index=True,
)
print(f"MFF shape: {mff_data.shape}")
print(mff_data.head(10))

MFF Format Benefits

The Master Flat File format handles complex scenarios like different variables at different granularities (e.g., national media + geo-level sales), hierarchical structures, and missing data—all in a single, consistent structure.
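To see how one long table can hold series at different granularities, here is a minimal pandas sketch. It is not the framework's loader (load_mff does this for you): it builds a tiny MFF with national TV spend and geo-level sales, then pivots each back to a wide series. The Geography column mirrors the structure example later on this page; all values are made up.

```python
import pandas as pd

# Tiny MFF: national TV spend plus geo-level Sales in one long table.
# Values are illustrative only.
mff = pd.DataFrame({
    "Period": pd.to_datetime(["2024-01-07"] * 3 + ["2024-01-14"] * 3),
    "Geography": ["National", "East", "West"] * 2,
    "VariableName": ["TV", "Sales", "Sales"] * 2,
    "VariableValue": [50000.0, 8000.0, 7000.0, 42000.0, 8500.0, 7300.0],
})

# Geo-level series: one column per geography (periods x geos).
sales_wide = (
    mff[mff["VariableName"] == "Sales"]
    .pivot(index="Period", columns="Geography", values="VariableValue")
)

# National series: a single column indexed by period.
tv_national = (
    mff[(mff["VariableName"] == "TV") & (mff["Geography"] == "National")]
    .set_index("Period")["VariableValue"]
    .rename("TV")
)

print(sales_wide)
print(tv_national)
```

The two outputs have different shapes (periods × geos versus periods only), which is exactly the mismatch the long format lets you store without padding or duplicating rows.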

Step 2: Configure the Model

The framework uses a fluent builder pattern for configuration. This provides a readable, chainable API while ensuring type safety and validation.

from mmm_framework import (
    MFFConfigBuilder,
    ModelConfigBuilder,
    TrendConfig,
    TrendType,
    BayesianMMM,
    load_mff,
)

# Step 2a: Configure the data structure
mff_config = (
    MFFConfigBuilder()
    .with_kpi_name("Sales")                    # Target variable
    .add_national_media("TV", adstock_lmax=8)  # TV with 8-week carryover
    .add_national_media("Digital", adstock_lmax=4)  # Digital with 4-week carryover
    .add_price_control()                       # Price as control variable
    .build()
)

print(f"Media channels: {mff_config.media_names}")
print(f"Control variables: {mff_config.control_names}")

# Step 2b: Load and validate data
panel = load_mff(mff_data, mff_config)

print("\nPanel dataset:")
print(f"  Observations: {panel.n_obs}")
print(f"  Channels: {panel.n_channels}")
print(f"  Controls: {panel.n_controls}")

# Step 2c: Configure model inference
model_config = (
    ModelConfigBuilder()
    .bayesian_numpyro()       # JAX sampler (measured ~3x faster than PyMC at equal draws on PyMC 6)
    .with_chains(4)           # 4 parallel chains for convergence diagnostics
    .with_draws(2000)         # 2000 posterior draws
    .with_tune(1000)          # 1000 warmup iterations
    .with_target_accept(0.9)  # Target acceptance rate
    .build()
)

# Step 2d: Configure trend component
trend_config = TrendConfig(
    type=TrendType.LINEAR,
    growth_prior_sigma=0.1
)

Adstock (Carryover)

Media effects persist over time. The adstock_lmax parameter sets the maximum lag window. TV typically has longer carryover (6-12 weeks) than digital (2-6 weeks).
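The carryover idea fits in a few lines of NumPy. This is a minimal sketch assuming the common geometric form: lag k gets weight alpha**k, the window is truncated at l_max lags, and the weights are normalized to sum to one. The function name and the normalization are assumptions of this sketch, not the framework's transforms/adstock.py implementation.

```python
import numpy as np


def geometric_adstock(spend: np.ndarray, alpha: float, l_max: int) -> np.ndarray:
    """Spread each week's spend over the next l_max weeks with weights alpha**lag.

    Weights are normalized to sum to 1 (an assumption of this sketch; the
    framework's own normalization may differ).
    """
    weights = alpha ** np.arange(l_max)
    weights = weights / weights.sum()
    # out[t] = sum over lag = 0..l_max-1 of weights[lag] * spend[t - lag]
    out = np.zeros(len(spend), dtype=float)
    for lag, w in enumerate(weights):
        if lag >= len(spend):
            break
        out[lag:] += w * spend[: len(spend) - lag]
    return out


# A single burst of TV spend in week 0 decays across an 8-week window.
burst = np.zeros(10)
burst[0] = 100.0
carried = geometric_adstock(burst, alpha=0.5, l_max=8)
print(carried.round(2))
```

With adstock_lmax=8 the burst's effect is spread over weeks 0 to 7 and is exactly zero afterwards; a larger alpha flattens the decay so more of the effect lands in later weeks.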

Saturation

Returns diminish at higher spend levels. The framework uses Hill or logistic saturation functions by default, with priors that adapt to your data scale.
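Both curve families are easy to inspect directly. The sketch below uses one common parameterization of each: a Hill curve with a half-saturation point and a slope, and a logistic curve (1 - exp(-lam*x)) / (1 + exp(-lam*x)). The parameter names and values are illustrative assumptions; the framework's own parameterization and priors may differ.

```python
import numpy as np


def hill(x: np.ndarray, half_sat: float, slope: float) -> np.ndarray:
    """Hill curve: 0 at x=0, 0.5 at x=half_sat, approaching 1 as x grows."""
    x = np.asarray(x, dtype=float)
    return x**slope / (x**slope + half_sat**slope)


def logistic(x: np.ndarray, lam: float) -> np.ndarray:
    """Logistic saturation: 0 at x=0, rising toward 1 with diminishing increments."""
    x = np.asarray(x, dtype=float)
    return (1 - np.exp(-lam * x)) / (1 + np.exp(-lam * x))


spend = np.array([0.0, 250.0, 500.0, 1000.0, 2000.0])
print(hill(spend, half_sat=500.0, slope=2.0).round(3))
print(logistic(spend, lam=0.002).round(3))
```

Doubling spend from 1000 to 2000 adds far less response than the first 1000 did on either curve, which is the diminishing-returns behavior the model needs to capture before it can say anything about budget reallocation.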

Step 3: Fit and Analyze

Now we build the model, check our priors, fit it to data, and analyze the results with proper uncertainty quantification.

How long should this take? Measured on an Apple M3 laptop (16 GB RAM, NumPyro sampler; timings recorded in nbs/validation/runtime_benchmark.ipynb), a 104-week, 2–3-channel quickstart model like this one fits in roughly 9–13 seconds at 4 × 500 draws, and a production-size 156-week × 7-channel model fits in about 15 s. Doubling the draws costs about 1.3× in runtime; extrapolating from that measured rate, the 2000-draw configuration above should land around half a minute. If a national fit takes minutes instead, check that the NumPyro sampler is actually selected: the PyMC sampler measured about 3× slower at equal draws.
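As a back-of-envelope check of that extrapolation, assume the ~1.3× factor compounds for each doubling of draws, so 4× the draws costs about 1.3² ≈ 1.69×. The helper below is hypothetical, and it does not separately account for the longer warmup (1000 instead of 500 tune steps), so the real figure may be somewhat higher.

```python
import math


def extrapolate_runtime(base_seconds: float, base_draws: int, draws: int,
                        factor_per_doubling: float = 1.3) -> float:
    """Scale a measured runtime by factor_per_doubling for each doubling of draws."""
    doublings = math.log2(draws / base_draws)
    return base_seconds * factor_per_doubling ** doublings


# Quickstart benchmark range (9-13 s at 500 draws per chain), scaled to 2000 draws
for base in (9.0, 13.0):
    estimate = extrapolate_runtime(base, 500, 2000)
    print(f"{base:.0f} s at 500 draws -> ~{estimate:.0f} s at 2000 draws")
```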

# Build the model
mmm = BayesianMMM(panel, model_config, trend_config)

# Inspect model structure
print("Model parameters:")
for var in mmm.model.free_RVs:
    print(f"  {var.name}")

# Prior predictive check (ALWAYS do this before fitting)
print("\n=== Prior Predictive Check ===")
prior = mmm.sample_prior_predictive(samples=200)
y_prior = prior.prior_predictive["y_obs"].values.flatten()

print(f"Prior predictive y range: [{y_prior.min():.1f}, {y_prior.max():.1f}]")
print(f"Actual y range: [{panel.y.min():.1f}, {panel.y.max():.1f}]")

# Fit the model
print("\n=== Fitting Model ===")
results = mmm.fit(random_seed=42)

# Convergence diagnostics (CRITICAL - check these!)
print("\n=== Diagnostics ===")
print(f"Divergences: {results.diagnostics['divergences']}")
print(f"R-hat max: {results.diagnostics['rhat_max']:.4f}")
print(f"ESS bulk min: {results.diagnostics['ess_bulk_min']:.0f}")

# Check for issues
if results.diagnostics['divergences'] > 0:
    print("⚠️  Divergences detected - consider reparameterization")
if results.diagnostics['rhat_max'] > 1.01:
    print("⚠️  R-hat > 1.01 - chains may not have converged")
if results.diagnostics['ess_bulk_min'] < 400:
    print("⚠️  Low ESS - consider more draws")

# Posterior summary with uncertainty
print("\n=== Posterior Summary ===")
summary = results.summary(["beta_TV", "beta_Digital", "sigma"])
# ArviZ 1.x reports an 89% equal-tailed interval as eti89_lb / eti89_ub
# (the older hdi_3% / hdi_97% column names were removed).
print(summary[["mean", "sd", "eti89_lb", "eti89_ub", "r_hat"]])

Always Check Diagnostics

MCMC diagnostics are not optional. Divergences, high R-hat, or low ESS indicate that your posterior samples may be unreliable. The framework provides these automatically—always review them before interpreting results.
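To build intuition for what R-hat measures, here is the classic split-R-hat from Gelman et al.'s Bayesian Data Analysis, written in NumPy. It compares the variance between chain halves with the variance within them; if every chain explores the same distribution the ratio is close to 1. ArviZ reports the newer rank-normalized variant, so its numbers will differ slightly; this sketch is for understanding, not a replacement for results.diagnostics.

```python
import numpy as np


def split_rhat(draws: np.ndarray) -> float:
    """Classic split-R-hat for one scalar parameter; draws has shape (chains, draws)."""
    m, n = draws.shape
    half = n // 2
    # Split every chain into its first and second half -> shape (2m, half)
    split = np.concatenate([draws[:, :half], draws[:, half:2 * half]], axis=0)
    chain_means = split.mean(axis=1)
    within = split.var(axis=1, ddof=1).mean()      # W: mean within-chain variance
    between = half * chain_means.var(ddof=1)       # B: between-chain variance
    var_hat = (half - 1) / half * within + between / half
    return float(np.sqrt(var_hat / within))


rng = np.random.default_rng(0)
mixed = rng.normal(0.0, 1.0, size=(4, 1000))               # 4 chains, same target
stuck = mixed + np.array([[0.0], [0.0], [0.0], [3.0]])     # one chain stuck elsewhere
print(round(split_rhat(mixed), 3))
print(round(split_rhat(stuck), 3))
```

The well-mixed chains give a value just above 1, while shifting a single chain pushes R-hat far past the 1.01 threshold used in the diagnostics check above.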

Understanding Your Results

The MMMResults object returned by mmm.fit() holds the channel contributions, the full posterior trace, and the convergence diagnostics used above:

# Total contribution per channel (summed over the modeling period)
print("\n=== Channel Contributions ===")
if results.channel_contributions is not None:
    contrib = results.channel_contributions.sum()
    print(f"Total TV contribution: {contrib['TV']:.0f}")
    print(f"Total Digital contribution: {contrib['Digital']:.0f}")

# Access the full posterior trace (an ArviZ DataTree under arviz 1.x) for plots
import arviz as az
from mmm_framework.utils.arviz_compat import plot_posterior

# Posterior distributions — az.plot_posterior was removed in arviz 1.x, so route
# through the framework's version-robust shim (it calls arviz_plots.plot_dist)
plot_posterior(results.trace, var_names=["beta_TV", "beta_Digital"])

# Trace plots for convergence
az.plot_trace(results.trace, var_names=["beta_TV", "beta_Digital"])

# Forest plot comparing channels
az.plot_forest(results.trace, var_names=["beta_TV", "beta_Digital"])

Generating Reports

The framework includes a reporting module that generates portable, single-file HTML reports with embedded Plotly charts and honest uncertainty quantification throughout.

from mmm_framework.reporting import MMMReportGenerator, ReportConfig, ReportBuilder

# Option 1: Quick report from fitted model
report = MMMReportGenerator(
    model=mmm,
    panel=panel,
    results=results,
    config=ReportConfig(
        title="Marketing Mix Model Analysis",
        client="Acme Consumer Products",
        analysis_period="Jan 2023 - Dec 2024",
    ),
)

# Save to HTML
report.to_html("mmm_report_q4_2024.html")

# Option 2: Fluent builder pattern for customization
report = (
    ReportBuilder()
    .with_model(mmm, panel=panel, results=results)
    .with_title("Q4 Marketing Analysis")
    .with_client("Acme Corp")
    .with_credible_interval(0.9)  # 90% credible intervals
    .enable_all_sections()
    .disable_section("diagnostics")  # Hide technical details
    .build()
)

report.to_html("executive_summary.html")

📊 Report Contents

Executive summary, model fit visualization, channel ROI forest plots with uncertainty, revenue decomposition (waterfall + time series), saturation curves, and methodology documentation.

🎨 Customizable

Multiple color palettes (Sage, Corporate, Warm), configurable sections, adjustable credible intervals, and support for extended models (nested, multivariate, geographic).

📄 See an Example Report

View a complete example report generated by the framework to see all available visualizations and sections:

Open Example Report →

Core Concept: MFF Data Format

The Master Flat File format is a long-format data structure designed for marketing measurement. It handles the dimensionality challenges common in MMM, such as series recorded at different levels (national versus geo) or with gaps, in a single table:

from mmm_framework import MFFConfigBuilder, MediaChannelConfigBuilder

# MFF structure example
"""
Period      | Geography | VariableName | VariableValue
2024-01-01  | National  | Sales        | 15000
2024-01-01  | National  | TV_Spend     | 50000
2024-01-01  | East      | Sales        | 8000
2024-01-01  | West      | Sales        | 7000
2024-01-08  | National  | Sales        | 16500
...
"""

# The framework handles dimension alignment automatically
mff_config = (
    MFFConfigBuilder()
    .with_kpi_name("Sales")
    # National media disaggregated to geo by population share
    .add_national_media("TV", adstock_lmax=8)
    # Geo-level media stays at its native granularity
    .add_media_builder(
        MediaChannelConfigBuilder("Local_Radio").by_geo().with_geometric_adstock(6)
    )
    .build()
)
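The comment about population shares describes a simple allocation rule: each geo receives national spend in proportion to its share of the population. The pandas sketch below shows that rule in isolation. The shares and spend values are hypothetical, and the framework's internal disaggregation may weight geos differently.

```python
import pandas as pd

# Hypothetical population shares for two geos (must sum to 1).
pop_share = pd.Series({"East": 0.55, "West": 0.45})

# National weekly TV spend (illustrative values).
tv_national = pd.Series(
    [50000.0, 42000.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-08"]),
    name="TV",
)

# Outer product: each week's national spend split across geos by population share.
tv_by_geo = pd.DataFrame(
    tv_national.to_numpy()[:, None] * pop_share.to_numpy()[None, :],
    index=tv_national.index,
    columns=pop_share.index,
)
print(tv_by_geo)
```

Because the shares sum to one, the geo columns always add back up to the national series, so no spend is created or lost by the disaggregation.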

Core Concept: Builder Pattern

The fluent builder pattern provides a readable, type-safe API for configuration:

from mmm_framework import (
    PriorConfigBuilder,
    AdstockConfigBuilder,
    SaturationConfigBuilder,
    MediaChannelConfigBuilder,
    HierarchicalConfigBuilder,
    SeasonalityConfigBuilder,
)

# Build custom priors
decay_prior = (
    PriorConfigBuilder()
    .beta(alpha=2, beta=2)  # Centered at 0.5
    .build()
)

# Build adstock configuration
adstock = (
    AdstockConfigBuilder()
    .geometric()
    .with_max_lag(8)
    .with_alpha_prior(decay_prior)
    .build()
)

# Build complete media channel config
tv_channel = (
    MediaChannelConfigBuilder("TV")  # name is a required constructor argument
    .national()
    .with_adstock(adstock)
    .with_hill_saturation()
    .with_positive_prior()  # Constrain the coefficient to be positive (HalfNormal)
    .build()
)

# Build hierarchical structure for geo models
hierarchical = (
    HierarchicalConfigBuilder()
    .enabled()
    .pool_across_geo()
    .use_non_centered()  # Better for sparse geos
    .with_non_centered_threshold(20)
    .build()
)
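The builder pattern itself is simple. Here is a toy sketch with hypothetical class names (AdstockSpecBuilder and AdstockSpec are not the framework's builders) showing the two properties described above: chaining works because every setter returns self, and validation is concentrated in build(), which returns an immutable object. The framework's real builders produce Pydantic config classes, so their validation is richer than this.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AdstockSpec:
    """Immutable config object produced by the toy builder (hypothetical)."""
    kind: str
    max_lag: int


class AdstockSpecBuilder:
    """Toy fluent builder: each method records a setting and returns self."""

    def __init__(self) -> None:
        self._kind = "geometric"
        self._max_lag = 8

    def geometric(self) -> AdstockSpecBuilder:
        self._kind = "geometric"
        return self

    def with_max_lag(self, max_lag: int) -> AdstockSpecBuilder:
        self._max_lag = max_lag
        return self

    def build(self) -> AdstockSpec:
        # Validate once, at build time, before any model sees the config.
        if self._max_lag < 1:
            raise ValueError(f"max_lag must be >= 1, got {self._max_lag}")
        return AdstockSpec(kind=self._kind, max_lag=self._max_lag)


spec = AdstockSpecBuilder().geometric().with_max_lag(6).build()
print(spec)
```

Keeping the built object frozen means a configuration cannot drift after validation, which matters when the same config is reused across refits.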

Core Concept: Bayesian Workflow

The framework implements the complete Bayesian workflow as described by Gelman et al. (2020):

1. Prior Predictive Check

Sample from priors to ensure they produce plausible data. Use mmm.sample_prior_predictive() before fitting.

2. Fit & Diagnose

Run MCMC and check convergence: no divergences, R-hat below 1.01, and bulk ESS above 400 for every parameter.

3. Posterior Predictive Check

Compare data simulated from the fitted model with the observed data, checking that the predictive intervals are well calibrated. Use results.posterior_predictive for this check.

4. Sensitivity Analysis

Test how results change with different priors. Robust findings persist across reasonable prior choices.
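The posterior predictive step can be made concrete with an interval-coverage check: count how many observations fall inside the central 89% predictive interval (the same width as the eti89 columns above). If the model is well calibrated, coverage should be near 89%; much lower means the model is overconfident. This NumPy sketch uses synthetic stand-ins for the simulated data y_rep. In practice those draws would come from the fitted model's posterior predictive group, whose exact layout in this framework is not shown here.

```python
import numpy as np


def interval_coverage(y_obs: np.ndarray, y_rep: np.ndarray, prob: float = 0.89) -> float:
    """Fraction of observations inside the central `prob` interval of y_rep.

    y_rep has shape (n_draws, n_obs): one simulated dataset per posterior draw.
    """
    tail = (1 - prob) / 2
    lower = np.quantile(y_rep, tail, axis=0)
    upper = np.quantile(y_rep, 1 - tail, axis=0)
    return float(np.mean((y_obs >= lower) & (y_obs <= upper)))


rng = np.random.default_rng(1)
mu = 1000 + 50 * np.sin(2 * np.pi * np.arange(104) / 52)  # fitted mean per week
y_obs = mu + rng.normal(0, 50, size=104)                  # stand-in observed sales
y_rep = mu + rng.normal(0, 50, size=(4000, 104))          # stand-in predictive draws
print(f"89% interval coverage: {interval_coverage(y_obs, y_rep):.2f}")
```

Rerunning the same check with a deliberately too-narrow y_rep (say a noise scale of 10 instead of 50) drops coverage well below 89%, which is the overconfidence signature to look for.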

Project Structure

Understanding the repository layout helps you navigate the codebase:

mmm-framework/
src/mmm_framework/ — Core Python package
__init__.py — Module exports and version
config/ — Pydantic config classes and enums
data_loader.py — MFF parsing and validation
analysis.py — Analysis utilities
serialization.py — Save/load functionality
jobs.py — Async job management with ARQ
builders/ — Modular builder classes
base.py — Shared mixins and protocols
prior.py — Prior, adstock, saturation builders
variable.py — Media, control, KPI builders
model.py — Model config builders
mff.py — MFF config builders
model/ — Core model module
base.py — BayesianMMM class
results.py — Result containers
components/ — Model components
trend.py — Trend configurations
transforms/ — Transformation functions
adstock.py — Geometric adstock transforms
saturation.py — Logistic/Hill saturation
seasonality.py — Fourier features
trend.py — B-spline, piecewise trends
utils/ — Utility functions
standardization.py — Data standardization
statistics.py — Statistical helpers
reporting/ — HTML report generation
config.py — ReportConfig, ColorScheme
generator.py — MMMReportGenerator
sections.py — Report section implementations
design_tokens.py — Unified design tokens
charts/ — Modular chart functions
decomposition.py, diagnostic.py, fit.py, geo.py, roi.py
extractors/ — Data extraction from models
helpers/ — ROI, decomposition helpers
mmm_extensions/ — Extended model capabilities
config.py — Mediator, Outcome, CrossEffect configs
builders.py — Extension builders + factory functions
results.py — Extended model results
models/ — Model class implementations
nested.py — NestedMMM (mediation)
multivariate.py — MultivariateMMM
combined.py — CombinedMMM
components/ — PyMC/PyTensor blocks
cross_effects.py, variable_selection.py, transforms.py
api/ — FastAPI backend
main.py — Application factory and health endpoints
routes/ — API route handlers
schemas.py — Pydantic request/response models
redis_service.py — Redis connection management
worker.py — ARQ worker settings
app/ — Streamlit frontend
Home.py — Main entry point and dashboard
api_client.py — HTTP client for backend API
pages/ — Multipage app modules
1_Data_Management.py — Upload and manage datasets
2_Configuration.py — Build model configurations
3_Model_Fitting.py — Submit and monitor jobs
4_Results.py — View diagnostics and contributions
5_Scenarios.py — What-if analysis and optimization
components/ — Reusable UI components
__init__.py — Component exports
common.py — Formatters, session state, CSS
charts.py — Plotly visualization functions
examples/ — Usage examples
ex_builder.py — Builder pattern demonstrations
ex_config.py — Configuration examples
ex_models.py — Model fitting workflows
ex_extensions.py — Extended model examples
ex_reporter.py — Report generation examples
tests/ — Test suite
conftest.py — Pytest fixtures
mmm_extensions/ — Extension module tests
docs/ — GitHub Pages documentation
index.html — Documentation homepage
getting-started.html — This page
technical-guide.html — Model specifications
shared/ — Shared CSS and components
pyproject.toml — Project configuration (uv/pip)
README.md — Project documentation

Running the Web Application

Two frontends are available, and they target different backends. The modern React app talks to the agent API, which runs fits in-kernel — so it needs no Redis and no separate worker. The legacy Streamlit app targets the older REST API plus its Redis + ARQ job queue. Pick one path; you do not need both backends.

Option 1: React Web App + Agent API (modern, recommended)

The React frontend is where the full measurement loop lives—Program, Experiments, Performance, and the AI agent workspace. Take the platform tour for a guided look around. No Redis, no worker — just two terminals:

# Terminal 1: Agent API (runs model fits in-kernel)
$ uv run uvicorn mmm_framework.api.main:app --port 8000 --reload

# Terminal 2: React UI (Vite dev server, proxies /api → :8000)
$ cd frontend
$ npm run dev

Option 2: Streamlit (legacy, deprecated)

The original Streamlit interface remains available as a secondary surface. It targets the separate legacy REST API and requires Redis + an ARQ worker for its asynchronous job queue. Start the three backend services first, then the UI:

1 Start Redis

The legacy job queue requires Redis running locally.

$ redis-server

2 Start legacy REST API

The Streamlit backend (port 8000).

$ cd api
$ uvicorn main:app --reload

3 Start ARQ worker

Processes model-fitting jobs off the queue.

$ cd api
$ arq worker.WorkerSettings

4 Start the Streamlit UI

The frontend (port 8501).

$ cd app
$ streamlit run Home.py

Either frontend gives you an interactive interface for the complete MMM workflow:

📤 Data Upload

Upload MFF-format CSV files with automatic validation and preview.

⚙️ Config Builder

Visual interface for building model configurations without code.

🔬 Model Fitting

Submit jobs, monitor progress, and view real-time diagnostics.

📊 Results Analysis

Interactive visualizations for posteriors, contributions, and response curves.

Next Steps

You've built your first model. Here's where to go next:

Learn by series

The docs are organized into five guided, self-contained series. Each opens with a "Before you start" box (prerequisites, time, what you'll learn) and every synthetic world ships a sealed answer key. Not sure which to pick? The Recommended Paths matrix maps a route for evaluators, sponsors, analysts, and methodologists by how much time you have.

Workshop · beginner · 6 parts

New to Bayes? Build intuition from distributions to your first MMM.

Aurora Tour · 6 parts

The whole framework on one synthetic brand: causality → base → extended → reporting.

Causal Inference · 11 parts

Why correlation misleads and how experiments fix it — the ladder of evidence.

Pressure Testing · 7 parts

Where MMMs silently fail, each failure graded against ground truth.

Mathematics · technical · 7 parts

The generative model in full: adstock, saturation, the Bayesian core.

Explore the platform & methodology

🔬 Interactive Workflow Demo

Walk through the complete scientific modeling workflow step-by-step—from question formulation through prior predictive checks, MCMC fitting, diagnostics, sensitivity analysis, and report generation.

Launch Interactive Demo → View Example Report

Data-Prep Cookbook

Turn raw Google Ads / Meta / GA4 exports into a modeling-ready MFF

Reading the Report

What each report section shows and the budget decision it supports

Migration Guide

Coming from Robyn, Meridian, or PyMC-Marketing? Map your model over

Platform Tour

Program, Experiments, Performance—the web app, room by room