Bayesian Estimation and Priors for MMM

Summary

The nonlinear MMM is estimated by MCMC (a customized C++ Gibbs sampler with slice-sampling updates, and a STAN/HMC implementation), with priors that respect each parameter's support: beta or uniform for the retention rate $\alpha_m$ and the half-saturation $K_m$, uniform for the delay $\theta_m$, gamma for the slope $S_m$, and half-normal for the nonnegative media coefficients $\beta_m$. The central empirical finding: when the sample is small and the signal weak, the posterior is dominated by the prior and the data cannot correct prior-induced bias. Prior choice therefore has a large, sometimes decisive, impact on the estimates and on downstream attribution.

Overview

Because the adstock and Hill transforms make the model nonlinear in its parameters, maximizing the likelihood (frequentist MLE) is nontrivial. More importantly, a single MMM dataset carries little information relative to the number of parameters, so the Bayesian framework is adopted specifically to incorporate prior knowledge from industry experience or from related media mix models. The model can also be extended to a hierarchical Bayesian form that pools across related brands (Wang et al. 2017) or geos (Sun et al. 2017) to obtain more informative priors.

Main Content

Likelihood, posterior, and MCMC

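Let $\Phi$ denote all model parameters, $X$ the media variables, $Z$ the control variables, and $y$ the response. The frequentist MLE (Eq. 8) is

$$\hat{\Phi} = \arg\max_{\Phi} L(y \mid X, Z, \Phi).$$

The Bayesian posterior is

$$p(\Phi \mid y, X, Z) \propto L(y \mid X, Z, \Phi)\, \pi(\Phi),$$

where $\pi(\Phi)$ is the prior.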

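To make the likelihood-times-prior structure concrete, here is a minimal Python sketch of the unnormalized log posterior for a single media channel. It assumes forms that this section does not restate: a delayed adstock with lag weights $\alpha^{(l-\theta)^2}$ over a hypothetical maximum lag `L = 13`, the Hill function $1/(1+(x/K)^{-S})$, Gaussian noise, and a prior supplied as a function. All function names, dictionary keys, and the toy data are illustrative, not the authors' code.

```python
import math

def adstock(x, alpha, theta, L=13):
    """Delayed adstock (assumed form): normalized weighted sum of the last L spends,
    with lag-l weight alpha ** ((l - theta) ** 2). Spend before t = 0 counts as zero."""
    w = [alpha ** ((l - theta) ** 2) for l in range(L)]
    total = sum(w)
    return [sum(w[l] * x[t - l] for l in range(L) if t - l >= 0) / total
            for t in range(len(x))]

def hill(x, K, S):
    """Hill saturation 1 / (1 + (x / K) ** -S), defined as 0 at x = 0."""
    return 0.0 if x <= 0 else 1.0 / (1.0 + (x / K) ** (-S))

def log_likelihood(p, x, z, y):
    """Gaussian log L(y | X, Z, Phi) for one media series x and control rows z[t]."""
    ad = adstock(x, p["alpha"], p["theta"])
    ll = 0.0
    for t in range(len(y)):
        mu = p["tau"] + p["beta"] * hill(ad[t], p["K"], p["S"])
        mu += sum(g * zc for g, zc in zip(p["gamma"], z[t]))
        r = y[t] - mu
        ll += -0.5 * math.log(2 * math.pi * p["sigma2"]) - r * r / (2 * p["sigma2"])
    return ll

def log_posterior(p, x, z, y, log_prior):
    """Unnormalized log posterior: log likelihood + log prior."""
    lp = log_prior(p)
    if lp == -math.inf:  # outside the prior's support: posterior density is zero
        return -math.inf
    return lp + log_likelihood(p, x, z, y)

def support_only_prior(p):
    """Hypothetical flat prior that only enforces each parameter's support."""
    ok = (0 < p["alpha"] < 1 and p["theta"] >= 0 and p["K"] > 0
          and p["S"] > 0 and p["beta"] >= 0 and p["sigma2"] > 0)
    return 0.0 if ok else -math.inf

# Toy noiseless data generated from known (hypothetical) parameters.
true_p = {"alpha": 0.6, "theta": 1.0, "K": 0.5, "S": 2.0, "beta": 1.5,
          "tau": 0.2, "gamma": [0.3], "sigma2": 0.01}
x = [0.0, 0.8, 0.3, 0.0, 1.0, 0.5, 0.2, 0.9, 0.0, 0.4]
z = [[0.1 * t] for t in range(len(x))]
ad = adstock(x, true_p["alpha"], true_p["theta"])
y = [true_p["tau"] + true_p["beta"] * hill(ad[t], true_p["K"], true_p["S"]) + 0.3 * z[t][0]
     for t in range(len(x))]

print(log_posterior(true_p, x, z, y, support_only_prior)
      > log_posterior({**true_p, "beta": 2.5}, x, z, y, support_only_prior))  # True
```

Returning `-inf` for out-of-support values is what lets a generic MCMC step reject them automatically, which is one practical reason the priors must respect each parameter's support.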
Conjugate priors would give an analytic posterior, but with the nonlinear transforms the posterior is instead sampled by MCMC. Two samplers are used: a customized Gibbs sampler whose updates use a slice sampler (Neal 2003), implemented in C++ with BOOM (Scott 2016), and a STAN implementation using Hamiltonian Monte Carlo (HMC). High posterior correlation among the transformation parameters is challenging for STAN, which can take hours or even days on a dataset of a few thousand points, so the custom Gibbs sampler is much more efficient. The posterior is summarized by its mean, median, or mode, plus quantile-based credible intervals. See MCMC Basics.
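The slice-sampling update and the quantile-based summaries can be sketched in a few lines of Python. This is a generic univariate slice sampler with stepping out and shrinkage in the style of Neal (2003); in a Gibbs scheme each parameter would be updated in turn by such a step with the others held fixed. It is an illustrative sketch, not the BOOM implementation, and the Beta(3, 3) target standing in for a retention-rate posterior is hypothetical.

```python
import math
import random
import statistics

def slice_sample(log_f, x0, n, w=1.0, m=50, rng=None):
    """Univariate slice sampler with stepping out and shrinkage (after Neal 2003).
    log_f: unnormalized log density (-inf outside the support); x0 must lie in the support."""
    rng = rng or random.Random(0)
    x, draws = x0, []
    for _ in range(n):
        # Slice height: log u = log f(x) - Exp(1), equivalent to u ~ Uniform(0, f(x)).
        log_u = log_f(x) - rng.expovariate(1.0)
        # Step out: randomly place a width-w interval around x, extend at most m steps in total.
        left = x - w * rng.random()
        right = left + w
        j = rng.randrange(m)
        k = m - 1 - j
        while j > 0 and log_f(left) > log_u:
            left -= w
            j -= 1
        while k > 0 and log_f(right) > log_u:
            right += w
            k -= 1
        # Shrinkage: propose uniformly in [left, right]; on rejection, shrink toward x.
        while True:
            x_new = rng.uniform(left, right)
            if log_f(x_new) > log_u:
                x = x_new
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        draws.append(x)
    return draws

def summarize(draws, level=0.95):
    """Posterior mean, median, and an equal-tailed quantile-based credible interval."""
    s = sorted(draws)
    n = len(s)
    tail = (1 - level) / 2
    lo = s[int(tail * (n - 1))]
    hi = s[int((1 - tail) * (n - 1))]
    return statistics.mean(s), statistics.median(s), (lo, hi)

def log_beta33(a):
    """Unnormalized log density of Beta(3, 3), a hypothetical stand-in posterior."""
    return 2 * math.log(a) + 2 * math.log(1 - a) if 0 < a < 1 else -math.inf

# Drop the first 1000 draws as burn-in.
draws = slice_sample(log_beta33, x0=0.5, n=6000, w=0.5, rng=random.Random(1))[1000:]
mean, median, (lo, hi) = summarize(draws)
print(f"mean={mean:.3f} median={median:.3f} 95% CI=({lo:.3f}, {hi:.3f})")
```

A posterior mode would additionally need a density estimate from the draws, so it is omitted here. The slice sampler needs no step-size tuning beyond the initial width `w`, which is one reason it is a convenient building block for one-parameter-at-a-time Gibbs updates.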

Prior specifications (with rationale)

Priors must respect each parameter’s support (Gelman 2006):

  • Retention rate $\alpha_m$: beta or uniform on $[0, 1)$ (simulation: ); use a narrower support if there is strong prior knowledge.
  • Delay $\theta_m$: uniform or scaled beta (simulation: ).
  • Slope $S_m$: gamma with a positive mode (simulation: ).
  • Half-saturation $K_m$: beta constrained over the observed spend range (simulation: ), because a $K_m$ outside the observed range is not identifiable (see Shape (Saturation) Effects).
  • Coefficients $\beta_m$: half-normal (a normal constrained to be nonnegative), since media effects are believed to be nonnegative (simulation: ).
  • Baseline $\tau$; control coefficients $\gamma_c$; noise variance $\sigma^2$.
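The prior families listed above can be written as support-respecting log densities using only the Python standard library. The hyperparameters in `HYPER` are hypothetical, chosen for illustration rather than taken from the simulation, and the beta prior on $K_m$ assumes spend has been normalized so that the observed range maps onto $(0, 1)$.

```python
import math

def log_beta_pdf(x, a, b):
    """Beta(a, b) log density; -inf outside (0, 1)."""
    if not 0.0 < x < 1.0:
        return -math.inf
    return ((a - 1) * math.log(x) + (b - 1) * math.log(1 - x)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

def log_uniform_pdf(x, lo, hi):
    """Uniform(lo, hi) log density; -inf outside [lo, hi]."""
    return -math.log(hi - lo) if lo <= x <= hi else -math.inf

def log_gamma_pdf(x, shape, rate):
    """Gamma(shape, rate) log density; its mode (shape - 1) / rate is positive iff shape > 1."""
    if x <= 0:
        return -math.inf
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1) * math.log(x) - rate * x)

def log_half_normal_pdf(x, scale):
    """Half-normal log density: a Normal(0, scale^2) restricted to x >= 0."""
    if x < 0:
        return -math.inf
    return 0.5 * math.log(2 / math.pi) - math.log(scale) - x * x / (2 * scale ** 2)

# Hypothetical hyperparameters, for illustration only.
HYPER = {
    "alpha": (2.0, 2.0),  # Beta on the retention rate in (0, 1)
    "theta": (0.0, 6.0),  # Uniform on the delay
    "S": (3.0, 1.0),      # Gamma(shape=3, rate=1): mode at 2 > 0
    "K": (2.0, 2.0),      # Beta, assuming spend normalized to (0, 1)
    "beta": 1.0,          # half-normal scale for the media coefficient
}

def log_prior_media(p, h=HYPER):
    """Joint log prior for one channel's media parameters (independent priors)."""
    return (log_beta_pdf(p["alpha"], *h["alpha"])
            + log_uniform_pdf(p["theta"], *h["theta"])
            + log_gamma_pdf(p["S"], *h["S"])
            + log_beta_pdf(p["K"], *h["K"])
            + log_half_normal_pdf(p["beta"], h["beta"]))

p = {"alpha": 0.5, "theta": 2.0, "S": 2.0, "K": 0.4, "beta": 0.8}
print(log_prior_media(p), log_prior_media({**p, "beta": -0.1}))
```

Because every density returns `-inf` off its support, an out-of-range value for any single parameter (a negative coefficient, a zero slope) makes the whole joint log prior `-inf`.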

Prior dominance in small samples

If the data have strong information content, priors with the same support yield similar posteriors. If not, the prior has a large influence, and the posterior may look almost the same as the prior. Empirically (Sec. 6), the adstock parameters are recovered fairly well even in small samples, but the shape parameters suffer high variance and large bias in small samples, and the Hill curves are systematically underestimated. The bias is attributable to the priors: when the sample size is small and the signal weak, the data are not strong enough to correct prior-induced bias.
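The same mechanism appears in the simplest conjugate setting. This sketch is not the MMM: it is a normal mean with known noise variance, a hypothetical true effect of 1.0, a prior wrongly centred at 0, and a weak signal (noise variance 4). With 5 observations the posterior mean stays close to the prior; with 5000 the data override it.

```python
def normal_posterior(prior_mean, prior_var, data, noise_var):
    """Conjugate update for a normal mean with known noise variance.
    Posterior precision = prior precision + n / noise_var."""
    precision = 1.0 / prior_var + len(data) / noise_var
    mean = (prior_mean / prior_var + sum(data) / noise_var) / precision
    return mean, 1.0 / precision

# Hypothetical setup: every observation equals the true effect 1.0.
small_mean, _ = normal_posterior(0.0, 0.1, [1.0] * 5, 4.0)
large_mean, _ = normal_posterior(0.0, 0.1, [1.0] * 5000, 4.0)
print(round(small_mean, 3), round(large_mean, 3))  # 0.111 0.992
```

The posterior mean is a precision-weighted average of the prior mean and the data. When $n/\sigma^2$ is small relative to the prior precision, the prior term dominates and its bias passes straight through to the estimate.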

Examples

Sensitivity to the prior on $\beta_m$ (Sec. 7.1)

Three priors on $\beta_m$ are compared over 500 simulated datasets: two normal priors and a third prior that puts more mass on large $\beta_m$. The two normal priors give nearly identical, underestimated Hill curves, while the third produces smaller bias (e.g. for Media 2: −18.0% / −18.0% / −1.5% under the three priors). This does not generalize, however: in scenarios where the curves are overestimated, a prior favoring large $\beta_m$ would worsen the bias. There is no universally “correct” prior.
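A toy grid computation shows why a prior with more mass on large coefficients shrinks less when the data are weak, and why the choice stops mattering once the data are strong. The model $y_i = \beta h_i + \varepsilon_i$ with fixed Hill values $h_i$, the data (generated with $\beta = 3$ and no noise), and the two priors (half-normal with scale 1 versus Uniform(0, 5)) are all hypothetical. They are not the priors or datasets of Sec. 7.1.

```python
import math

def grid_posterior_mean(log_prior, h, y, noise_sd, grid):
    """Posterior mean of beta in the toy model y_i = beta * h_i + N(0, noise_sd^2), on a grid."""
    shh = sum(hi * hi for hi in h)
    shy = sum(hi * yi for hi, yi in zip(h, y))
    logs = []
    for b in grid:
        # Log-likelihood up to a constant: expand -sum((y - b*h)^2) / (2 sd^2), dropping sum(y^2).
        logs.append(log_prior(b) + (2 * b * shy - b * b * shh) / (2 * noise_sd ** 2))
    top = max(logs)
    w = [math.exp(v - top) for v in logs]  # exp(-inf) == 0.0 outside the prior's support
    return sum(b * wi for b, wi in zip(grid, w)) / sum(w)

def half_normal_1(b):
    """Half-normal prior with scale 1 (unnormalized log density)."""
    return -0.5 * b * b if b >= 0 else -math.inf

def uniform_0_5(b):
    """Uniform(0, 5) prior: more mass on large coefficients."""
    return 0.0 if 0 <= b <= 5 else -math.inf

grid = [i / 1000 for i in range(10001)]  # beta in [0, 10]
h = [0.2, 0.4, 0.6]
y = [0.6, 1.2, 1.8]  # generated with beta = 3, no noise

weak_hn = grid_posterior_mean(half_normal_1, h, y, 1.0, grid)
weak_un = grid_posterior_mean(uniform_0_5, h, y, 1.0, grid)
strong_hn = grid_posterior_mean(half_normal_1, h * 200, y * 200, 1.0, grid)
strong_un = grid_posterior_mean(uniform_0_5, h * 200, y * 200, 1.0, grid)
print(f"weak data:   {weak_hn:.2f} vs {weak_un:.2f}")
print(f"strong data: {strong_hn:.2f} vs {strong_un:.2f}")
```

With three weak observations the half-normal prior pulls the estimate well below the true value of 3, while the flat prior on (0, 5) does not. With 600 observations both land near 3. The flat prior helps here only because the truth happens to be large, which is the source's point about non-generalization.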

Sensitivity to the prior on $K_m$ (Sec. 7.2)

Three priors on $K_m$, including one wide prior, are compared across two scenarios: true $K_m$ inside the observed spend range, and true $K_m$ outside it. The Hill curves are similar under all three priors, but the estimates of $K_m$ differ markedly under the wide prior. Because the media effect depends on the Hill curve rather than on $K_m$ by itself, the model is not very sensitive to this prior, although a tighter, knowledge-backed prior speeds sampler convergence. When $K_m$ lies outside the data range it cannot be estimated well even with a well-placed prior, yet the curve within the observed range is still estimated well.
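A quick numerical check illustrates why $K_m$ is poorly identified outside the data range while the curve itself is not. The sketch assumes slope $S = 1$, spend normalized to $[0, 1]$, and hypothetical parameter pairs. Doubling $K$ far outside the range, with $\beta$ scaled along, barely moves the response curve, whereas doubling $K$ inside the range changes it visibly.

```python
def hill_curve(x, beta, K, S):
    """Media response beta * Hill(x; K, S) = beta / (1 + (x / K) ** -S), with value 0 at x = 0."""
    return beta / (1.0 + (x / K) ** (-S)) if x > 0 else 0.0

def max_gap(p1, p2, xs):
    """Largest absolute difference between two response curves over the spend grid xs."""
    return max(abs(hill_curve(x, *p1) - hill_curve(x, *p2)) for x in xs)

xs = [i / 100 for i in range(101)]  # observed spend, assumed normalized to [0, 1]

# K far outside the observed range: (beta, K, S) = (50, 50, 1) vs (100, 100, 1).
outside = max_gap((50.0, 50.0, 1.0), (100.0, 100.0, 1.0), xs)
# K inside the observed range: (beta, K, S) = (1, 0.3, 1) vs (1, 0.6, 1).
inside = max_gap((1.0, 0.3, 1.0), (1.0, 0.6, 1.0), xs)
print(f"K outside range: max gap {outside:.4f}; K inside range: max gap {inside:.4f}")
```

With $S = 1$ and $K \gg x$, the response $\beta x / (x + K)$ is approximately $(\beta / K)\,x$. The data then pin down only the ratio $\beta / K$, so very different $K$ values fit equally well while the in-range curve, and hence the media effect, stays stable.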

Connections

See Also