Model Selection and Exploratory Analysis

Summary

Model selection for market response models involves choosing among candidate functional forms, lag structures, and variable sets. This note covers information criteria (AIC, BIC), cross-validation, prior knowledge integration, and the role of exploratory data analysis (EDA) before formal model estimation.

The Model Selection Problem in Marketing

Market response modeling faces three nested selection problems:

Variable selection: which marketing instruments to include
Functional form selection: linear, log-log, ADBUDG, etc.
Lag structure selection: Koyck, PDL, or unrestricted ADL

These choices cannot be tested jointly within a single classical framework without inflating the Type I error rate, because each choice is conditioned on the others and all are made on the same data. See Garden of Forking Paths and Researcher Degrees of Freedom.

Information Criteria

AIC and BIC

For a model with $k$ free parameters and maximized log-likelihood $\ln \hat{L}$, estimated on $T$ observations:
$AIC = -2 \ln \hat{L} + 2k$
$BIC = -2 \ln \hat{L} + k \ln T$
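Select the model with the lowest AIC or BIC. BIC penalizes complexity more heavily than AIC whenever $\ln T > 2$ (that is, $T \geq 8$) and therefore tends toward parsimony. For large $T$, BIC is consistent (it selects the true model if that model is in the candidate set); AIC instead selects the best approximating model, in the sense of expected predictive fit, even when no candidate is true.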
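As a minimal sketch of the comparison, the snippet below computes AIC and BIC for three candidates from their residual sums of squares, assuming Gaussian errors so that the maximized log-likelihood has a closed form. The candidate names, the RSS values, the parameter counts, and $T = 104$ are made-up illustrations, not results from any data set.

```python
import math

def gaussian_log_likelihood(rss: float, T: int) -> float:
    """Maximized Gaussian log-likelihood given the residual sum of squares."""
    sigma2_hat = rss / T  # maximum-likelihood estimate of the error variance
    return -0.5 * T * (math.log(2 * math.pi) + math.log(sigma2_hat) + 1)

def aic(log_lik: float, k: int) -> float:
    return -2 * log_lik + 2 * k

def bic(log_lik: float, k: int, T: int) -> float:
    return -2 * log_lik + k * math.log(T)

T = 104  # assumed: two years of weekly observations

# Hypothetical candidates: (name, residual sum of squares, free parameters k)
candidates = [
    ("linear", 52.0, 3),
    ("log-log", 47.5, 3),
    ("log-log + 4 lags", 43.0, 7),
]

scores = {}
for name, rss, k in candidates:
    ll = gaussian_log_likelihood(rss, T)
    scores[name] = (aic(ll, k), bic(ll, k, T))

best_aic = min(scores, key=lambda m: scores[m][0])
best_bic = min(scores, key=lambda m: scores[m][1])

for name, (a, b) in scores.items():
    print(f"{name:18s} AIC={a:8.2f} BIC={b:8.2f}")
print("lowest AIC:", best_aic, "| lowest BIC:", best_bic)
```

With these invented numbers the two criteria disagree: the four extra lag parameters lower the fit term enough to pay the AIC penalty of 2 per parameter, but not the BIC penalty of $\ln 104 \approx 4.64$ per parameter.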

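Both criteria have Bayesian readings, though neither is exactly equivalent to a full Bayesian model comparison: BIC is a large-sample approximation to $-2$ times the log marginal likelihood, so BIC differences approximate log Bayes factors under a weakly informative prior, while AIC is better read as an estimate of out-of-sample predictive loss than as a posterior model probability. See Model Comparison and Overfitting and Information Criteria.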

Cross-Validation for Predictive Model Selection

Hold-Out Cross-Validation

Split the time series into an estimation period and a validation period:

Estimate the model on the first $T_{1}$ observations
Forecast observations $T_{1} + 1, \dots, T$
Compare the forecasts to the actual values (e.g. MAPE, RMSE)

This directly targets predictive accuracy rather than in-sample fit. Cross-validation favors models that generalize rather than overfit.

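For time series, use rolling-window or recursive (expanding-window) forecasting rather than random k-fold splits. Random folds let the model train on observations that come after the ones it is asked to predict, which introduces look-ahead bias.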
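The recursive scheme can be sketched as follows. This is an illustrative, dependency-free example: the two forecasters (last value and historical mean) are deliberately trivial stand-ins for real response models, and the `sales` series, function names, and `min_train` setting are all made up for the example.

```python
import math

def expanding_window_splits(T, min_train, horizon=1):
    """Yield (train_end, test_index): fit on [0, train_end), forecast test_index."""
    for train_end in range(min_train, T - horizon + 1):
        yield train_end, train_end + horizon - 1

def naive_forecast(history):
    return history[-1]  # last observed value

def mean_forecast(history):
    return sum(history) / len(history)  # historical average

def evaluate(series, forecaster, min_train, horizon=1):
    """Return (RMSE, MAPE in percent) over all forecast origins."""
    errors, pct_errors = [], []
    for train_end, test_idx in expanding_window_splits(len(series), min_train, horizon):
        forecast = forecaster(series[:train_end])  # only past data is visible
        actual = series[test_idx]
        errors.append(actual - forecast)
        pct_errors.append(abs(actual - forecast) / abs(actual))  # assumes actual != 0
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mape = 100 * sum(pct_errors) / len(pct_errors)
    return rmse, mape

sales = [100, 104, 109, 113, 118, 124, 129, 135, 140, 147]  # made-up trending series

rmse_naive, mape_naive = evaluate(sales, naive_forecast, min_train=5)
rmse_mean, mape_mean = evaluate(sales, mean_forecast, min_train=5)
print(f"naive: RMSE={rmse_naive:.2f} MAPE={mape_naive:.2f}%")
print(f"mean:  RMSE={rmse_mean:.2f} MAPE={mape_mean:.2f}%")
```

Every forecast uses only observations before the forecast origin, so the comparison between candidates is free of look-ahead. A rolling (fixed-length) window would slice `series[train_end - w:train_end]` for some window length `w` instead of `series[:train_end]`.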

Role of Prior Knowledge in Model Selection

Manager Elicitation

Before estimation, managers can provide:

Response at current spending: point estimate of current sales

Response at zero spending: baseline sales estimate

Saturation level: maximum achievable sales

Shape: concave vs. S-shaped based on industry experience

These constraints can be incorporated as Bayesian priors or as parameter bounds in nonlinear estimation, reducing the effective model selection problem to a constrained search. This approach aligns with ADBUDG calibration (Little 1970) and managerial judgment methods.
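To illustrate how elicited answers pin down a response curve, here is a minimal sketch using the ADBUDG form as it is commonly written, $Q(X) = b + (a - b) X^{c} / (d + X^{c})$, with $b$ the response at zero spending and $a$ the saturation level. The symbols, function names, and numbers are assumptions for illustration. In particular, the shape parameter $c$ is simply fixed by judgment here ($c > 1$ gives an S-shape, $0 < c \leq 1$ a concave curve), and $d$ is then solved from the response at current spending. In an estimation setting the same elicited values could instead serve as prior means or parameter bounds.

```python
def calibrate_adbudg(q_zero, q_sat, q_current, x_current, c):
    """Return (a, b, c, d) for Q(X) = b + (a - b) * X**c / (d + X**c).

    q_zero, q_sat, q_current are elicited sales levels (hypothetical inputs);
    c is fixed from the manager's view of the shape, not estimated.
    """
    if not (q_zero < q_current < q_sat):
        raise ValueError("elicited values must satisfy q_zero < q_current < q_sat")
    if x_current <= 0 or c <= 0:
        raise ValueError("x_current and c must be positive")
    a, b = q_sat, q_zero
    # Solve Q(x_current) = q_current for d
    d = x_current ** c * (a - q_current) / (q_current - b)
    return a, b, c, d

def adbudg(x, a, b, c, d):
    """Response at spending level x."""
    return b + (a - b) * x ** c / (d + x ** c)

# Made-up elicitation: baseline 60, saturation 150, current sales 100 at spend 10
params = calibrate_adbudg(q_zero=60.0, q_sat=150.0, q_current=100.0,
                          x_current=10.0, c=2.0)

for spend in (0.0, 5.0, 10.0, 20.0, 1000.0):
    print(f"spend={spend:7.1f} -> sales={adbudg(spend, *params):7.2f}")
```

The calibrated curve reproduces the three elicited points by construction, so any candidate model that cannot pass near them can be excluded before estimation.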

Exploratory Data Analysis for Marketing Data

Before formal estimation:

Plot raw series: identify trends, seasonality, outliers
Scatter plots of the response $Q$ vs. each instrument $X_{j}$: visual indication of linearity vs. concavity
Correlation matrix: flag multicollinearity among instruments
ACF/PACF of $Q$: determine whether a dynamic model is needed (see Single Marketing Time Series)
CCF between $Q$ and $X$: identify the transfer function structure (see Transfer Function Model)
Box plots by promotion status: quantify lift from feature/display
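The ACF and CCF checks in the list above can be sketched without any plotting library. The example below is a rough illustration on a made-up series in which sales respond to an advertising pulse one period later; the function names and data are hypothetical, and in practice a statistics package would also supply confidence bands.

```python
import math

def sample_acf(y, max_lag):
    """Sample autocorrelations of y at lags 0..max_lag."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

def sample_ccf(q, x, max_lag):
    """Sample correlation between q[t] and x[t - k] for k = 0..max_lag (x leads q)."""
    n = len(q)
    mq, mx = sum(q) / n, sum(x) / n
    dq = [v - mq for v in q]
    dx = [v - mx for v in x]
    denom = math.sqrt(sum(d * d for d in dq) * sum(d * d for d in dx))
    return [sum(dq[t] * dx[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

# Made-up data: advertising pulses, with sales responding one period later
x = [0, 0, 5, 0, 0, 0, 5, 0, 0, 0, 5, 0]
q = [10] + [10 + 2 * v for v in x[:-1]]

acf = sample_acf(q, 3)
ccf = sample_ccf(q, x, 3)
peak_lag = max(range(len(ccf)), key=lambda k: ccf[k])

print("ACF of Q:   ", [round(r, 3) for r in acf])
print("CCF(Q, X):  ", [round(r, 3) for r in ccf])
print("CCF peaks at lag", peak_lag)
```

A CCF peak at a positive lag, as in this constructed example, suggests a delayed response and hence a dynamic specification rather than a purely contemporaneous one.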

Multiple Testing and the Garden of Forking Paths

Running many specification tests on the same data inflates the overall Type I error rate. Key connections:

Pre-registration of model specification before seeing data
Bonferroni correction for multiple elasticity comparisons: see Multiple Testing Corrections
Bayesian model averaging: assign posterior probability to each model and average predictions
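For the model-averaging item, one common shortcut is to approximate posterior model probabilities from BIC differences, $w_i \propto \exp(-\Delta BIC_i / 2)$, assuming equal prior probability for each candidate. The sketch below uses that approximation; the BIC values and forecasts are invented for illustration, and a full Bayesian treatment would compute marginal likelihoods directly.

```python
import math

def bic_weights(bics):
    """Approximate posterior model probabilities from BIC values.

    Assumes equal prior model probabilities; weights are proportional
    to exp(-0.5 * (BIC_i - BIC_min)).
    """
    best = min(bics.values())
    raw = {m: math.exp(-0.5 * (b - best)) for m, b in bics.items()}
    total = sum(raw.values())
    return {m: r / total for m, r in raw.items()}

# Hypothetical BIC values and next-period sales forecasts per model
bics = {"linear": 312.4, "log-log": 305.1, "ADBUDG": 306.3}
forecasts = {"linear": 118.0, "log-log": 124.0, "ADBUDG": 121.5}

weights = bic_weights(bics)
bma_forecast = sum(weights[m] * forecasts[m] for m in bics)

for m in bics:
    print(f"{m:8s} weight={weights[m]:.3f}")
print(f"model-averaged forecast: {bma_forecast:.2f}")
```

Averaging keeps the uncertainty about the specification in the forecast instead of conditioning on a single selected model.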

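The more specifications researchers test, the more likely it is that a spuriously good-fitting model will be found. Reporting AIC alongside p-values helps, because AIC penalizes added parameters, but it does not by itself correct for the number of specifications tried; that number should be reported as well.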
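The size of the problem is easy to quantify under a simplifying assumption. If $m$ tests are independent and each is run at level $\alpha$, the probability of at least one false positive is $1 - (1 - \alpha)^m$. The sketch below computes that rate and applies a Bonferroni threshold of $\alpha / m$; the p-values are made up for illustration.

```python
def familywise_error_rate(alpha, m):
    """P(at least one false positive) across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

def bonferroni_reject(p_values, alpha=0.05):
    """Reject only where p <= alpha / m, with m the number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

for m in (1, 5, 20):
    print(f"m={m:2d} tests -> family-wise error rate {familywise_error_rate(0.05, m):.3f}")

# Hypothetical p-values from five elasticity comparisons
p_values = [0.004, 0.012, 0.030, 0.047, 0.20]
unadjusted = [p <= 0.05 for p in p_values]
adjusted = bonferroni_reject(p_values)

print("significant at 0.05, unadjusted:", sum(unadjusted))
print("significant after Bonferroni:   ", sum(adjusted))
```

Bonferroni is conservative when the tests are positively correlated, as elasticity estimates from the same data usually are, so it is best treated as a bound on the error rate.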

Cross-Links

Specification testing: Model Testing and Specification
Flexible forms for comparison: Flexible Functional Forms
Bayesian model comparison: Model Comparison, Overfitting and Information Criteria
Multiple testing: Multiple Testing Corrections, Garden of Forking Paths
ARIMA identification (EDA for time series): Single Marketing Time Series
