Discrete Choice Models

Summary

Discrete choice models operationalise the theory that agents choose the option that maximises their subjective utility. Utility is modelled as a latent linear function of observable alternative attributes plus a random error, and assuming Gumbel-distributed errors yields the softmax (multinomial logit) likelihood. The approach was pioneered by Daniel McFadden in the 1970s for transport demand forecasting and is widely used in marketing, economics, and policy analysis.

Core Idea: Random Utility Theory

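Each alternative $j$ is assigned a latent utility $U_{ij}$ for decision-maker $i$, who chooses the alternative with the highest utility. Utility is decomposed into a deterministic component $V_{ij}$ (a linear function of observable attributes $X_{ij}$) plus an unobserved stochastic component $\varepsilon_{ij}$:

$$U_{ij} = V_{ij} + \varepsilon_{ij} = X_{ij}^{\top}\beta + \varepsilon_{ij}$$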

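Assuming $\varepsilon_{ij} \overset{\text{iid}}{\sim} \mathrm{Gumbel}(0, 1)$, the probability that decision-maker $i$ chooses alternative $j$ is given by the softmax (multinomial logit) transform:

$$P(y_i = j) = \frac{\exp(V_{ij})}{\sum_{k} \exp(V_{ik})}$$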

This arises because differences of Gumbel random variables follow a logistic distribution.
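The link between the random-utility story and the softmax can be checked numerically. The sketch below assumes three alternatives with made-up deterministic utilities $V_j$, adds iid standard Gumbel noise, records which alternative has the highest total utility, and compares the simulated choice shares with the softmax of $V$.

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax over the last axis."""
    z = v - v.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def simulate_choice_shares(v, n_draws=200_000, seed=0):
    """Random-utility simulation: add iid Gumbel(0, 1) noise and pick the argmax."""
    rng = np.random.default_rng(seed)
    eps = rng.gumbel(loc=0.0, scale=1.0, size=(n_draws, v.size))
    choices = np.argmax(v + eps, axis=1)          # utility-maximising alternative per draw
    return np.bincount(choices, minlength=v.size) / n_draws

# Hypothetical deterministic utilities V_j for three alternatives
v = np.array([1.0, 0.5, -0.3])
print(softmax(v))                  # closed-form logit probabilities
print(simulate_choice_shares(v))   # Monte Carlo shares; should agree to about 2 decimals
```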

Identification

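Only utility differences are identified: adding the same constant to every alternative's utility leaves the choice probabilities unchanged. The utility of one alternative (the "outside good" or pivot alternative) is therefore fixed at 0 to achieve parameter identification, which matters especially when alternative-specific intercepts are included.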
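A quick numerical illustration of why a pivot is needed, using hypothetical intercept values: shifting all utilities by a constant does not change the softmax, and subtracting the pivot's value pins that constant down without changing any probability.

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax over the last axis."""
    z = v - v.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical alternative-specific intercepts for four alternatives
alpha = np.array([0.8, 0.2, -0.5, 1.1])

# Adding the same constant to every utility leaves probabilities unchanged,
# so the overall level of alpha cannot be learned from choices alone.
shifted = alpha + 3.7
print(np.allclose(softmax(alpha), softmax(shifted)))

# Normalising against a pivot (here alternative 0) removes the free constant.
alpha_pivot = alpha - alpha[0]   # the pivot intercept is exactly 0
print(np.allclose(softmax(alpha), softmax(alpha_pivot)))
```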

Model Variants

1. Basic Conditional Logit (CL)

Utility is driven entirely by alternative-specific attributes (e.g. installation cost, operating cost) with globally shared coefficients:

$$U_{ij} = \beta_{ic}\,\mathrm{ic}_{ij} + \beta_{oc}\,\mathrm{oc}_{ij} + \varepsilon_{ij}$$

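import pymc as pm

# Assumed inputs: ic, oc are (n_obs, n_alts) cost arrays; observed holds integer choices
# 0..n_alts-1 in the same column order; coords defines the "obs" and "alts_probs" dims.
with pm.Model(coords=coords) as model_1:
    beta_ic = pm.Normal("beta_ic", 0, 1)
    beta_oc = pm.Normal("beta_oc", 0, 1)
    # One utility vector per alternative, stacked into an (n_obs, n_alts) matrix
    s = pm.math.stack([beta_ic * ic[:, j] + beta_oc * oc[:, j] for j in range(n_alts)]).T
    p_ = pm.Deterministic("p", pm.math.softmax(s, axis=1), dims=("obs", "alts_probs"))
    choice_obs = pm.Categorical("y_cat", p=p_, observed=observed, dims="obs")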
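For intuition about what the Categorical-plus-softmax likelihood evaluates, here is a plain NumPy sketch of the conditional logit log-likelihood for fixed coefficient values. The cost arrays and choices are synthetic placeholders, not the heating data.

```python
import numpy as np

def conditional_logit_loglik(beta_ic, beta_oc, ic, oc, y):
    """Log-likelihood of a conditional logit with shared cost coefficients.

    ic, oc: (n_obs, n_alts) cost arrays; y: integer choices indexing the columns.
    """
    s = beta_ic * ic + beta_oc * oc                          # utility matrix
    s = s - s.max(axis=1, keepdims=True)                     # stabilise exp()
    log_p = s - np.log(np.exp(s).sum(axis=1, keepdims=True))  # row-wise log-softmax
    return log_p[np.arange(len(y)), y].sum()                 # log-prob of chosen column

# Synthetic data: 50 decision-makers, 5 alternatives
rng = np.random.default_rng(1)
ic = rng.uniform(500, 1000, size=(50, 5))
oc = rng.uniform(100, 300, size=(50, 5))
y = rng.integers(0, 5, size=50)
print(conditional_logit_loglik(-0.002, -0.005, ic, oc, y))
```

With both coefficients at zero every alternative is equally likely, so the log-likelihood reduces to $n \log(1/J)$, which is a handy sanity check.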

Limitation: With only cost coefficients and no alternative-specific intercepts, the model cannot capture average preference differences across alternatives, and posterior predictive checks show predicted market shares that fit the observed shares poorly.

2. With Alternative-Specific Intercepts

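Adding an intercept per alternative (except the pivot) absorbs systematic average preferences for each alternative that the cost attributes do not explain:

$$U_{ij} = \alpha_j + \beta_{ic}\,\mathrm{ic}_{ij} + \beta_{oc}\,\mathrm{oc}_{ij} + \varepsilon_{ij}, \qquad \alpha_{\text{pivot}} = 0$$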

This is the standard multinomial logit specification, and it substantially improves the posterior predictive fit to observed market shares.
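A minimal NumPy sketch of the intercept specification, assuming alternative 0 is the pivot: only $J-1$ free intercepts are passed in and the pivot's intercept is pinned to zero before the softmax. Function and argument names here are illustrative, not from any library.

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax over the last axis."""
    z = v - v.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def choice_probs_with_intercepts(alpha_free, beta_ic, beta_oc, ic, oc):
    """Choice probabilities with alternative-specific intercepts.

    alpha_free holds intercepts for alternatives 1..J-1; alternative 0 is the
    assumed pivot whose intercept is fixed at 0.
    """
    alpha = np.concatenate([[0.0], alpha_free])   # pivot intercept pinned to zero
    s = alpha + beta_ic * ic + beta_oc * oc        # alpha broadcasts across rows
    return softmax(s)

# Hypothetical inputs: identical costs, so only the intercepts separate alternatives
ic = np.full((3, 4), 900.0)
oc = np.full((3, 4), 200.0)
alpha_free = np.array([0.5, -0.2, 1.0])
p = choice_probs_with_intercepts(alpha_free, -0.002, -0.005, ic, oc)
print(p[0])
```

Because the costs are identical across alternatives in this toy input, the cost terms cancel in the softmax and the probabilities depend on the intercepts alone.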

3. Correlated Intercepts (MvNormal Prior)

Placing a multivariate normal prior on the intercepts, with an LKJ prior on the Cholesky factor of their covariance, captures the correlation structure among alternatives. This helps in understanding substitution patterns, though it does not always improve predictive metrics:

# Inside a pm.Model context; n and the sd_dist shape must equal the length of the "alts_probs" dim
chol, corr, stds = pm.LKJCholeskyCov("chol", n=5, eta=2.0, sd_dist=pm.Exponential.dist(1.0, shape=5))
alphas = pm.MvNormal("alpha", mu=0, chol=chol, dims="alts_probs")
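To make the three outputs concrete, the NumPy sketch below takes a hypothetical, fixed lower-triangular Cholesky factor and derives the covariance, the standard deviations, and the correlation matrix from it. It also generates zero-mean multivariate normal draws as `chol @ z`, which is the construction an MvNormal parameterised by `chol` represents. In the PyMC model the factor is itself a random variable; here it is a constant for illustration only.

```python
import numpy as np

# Hypothetical lower-triangular Cholesky factor for three correlated intercepts
chol = np.array([
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [-0.3, 0.4, 0.5],
])

cov = chol @ chol.T                  # covariance implied by the factor
stds = np.sqrt(np.diag(cov))         # marginal standard deviations
corr = cov / np.outer(stds, stds)    # correlation matrix (unit diagonal)

# Zero-mean MvNormal draws: alpha = chol @ z with z ~ N(0, I)
rng = np.random.default_rng(2)
z = rng.standard_normal(size=(3, 100_000))
alphas = (chol @ z).T
print(np.round(corr, 3))
print(np.round(np.cov(alphas, rowvar=False), 2))   # should approximate cov
```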

Marginal Rate of Substitution

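The ratio of coefficients in the utility function is economically interpretable even though utility itself is latent. For a utility of the form $V_{ij} = \beta_{ic}\,\mathrm{ic}_{ij} + \beta_{oc}\,\mathrm{oc}_{ij}$:

$$\mathrm{MRS} = \frac{\partial V / \partial\,\mathrm{oc}}{\partial V / \partial\,\mathrm{ic}} = \frac{\beta_{oc}}{\beta_{ic}}$$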

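This gives the rate at which one-time installation costs substitute for recurring operating costs at constant utility: the additional installation cost a decision-maker would accept in exchange for a one-unit reduction in operating cost. In Bayesian analysis, we obtain a full posterior over this quantity by computing the ratio for every posterior draw.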
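A sketch of the draw-by-draw MRS computation. The coefficient draws below are synthetic stand-ins generated with NumPy, not results from any fitted model; with a real PyMC fit they would come from the posterior samples of `beta_ic` and `beta_oc`.

```python
import numpy as np

def mrs_summary(beta_ic_draws, beta_oc_draws, prob=0.95):
    """Posterior mean and central interval of beta_oc / beta_ic, computed per draw."""
    mrs = beta_oc_draws / beta_ic_draws       # ratio for each posterior draw
    tail = (1 - prob) / 2
    lo, hi = np.quantile(mrs, [tail, 1 - tail])
    return mrs.mean(), lo, hi

# Synthetic posterior draws (illustrative only)
rng = np.random.default_rng(3)
beta_ic_draws = rng.normal(-0.0015, 0.0002, size=4000)
beta_oc_draws = rng.normal(-0.007, 0.0008, size=4000)
print(mrs_summary(beta_ic_draws, beta_oc_draws))
```

Taking the ratio per draw, rather than dividing posterior means, propagates the joint uncertainty in both coefficients into the MRS posterior.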

Data Format

Discrete choice data can be structured in either wide format (one row per decision-maker, cost columns per alternative) or long format (one row per decision-maker × alternative, with a binary choice flag). Long format makes matrix operations more natural, especially in Stan or pylogit.
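The wide-to-long conversion can be sketched with plain NumPy, assuming a single cost attribute stored as an `(n_obs, n_alts)` array and the choice stored as an integer column index. With more attributes, the same `obs_id`/`alt_id` pattern applies to each cost column.

```python
import numpy as np

def wide_to_long(costs_wide, choice):
    """Convert wide choice data to long format.

    costs_wide: (n_obs, n_alts) array, one cost column per alternative.
    choice: (n_obs,) integer index of the chosen alternative.
    Returns obs_id, alt_id, cost, chosen, each of length n_obs * n_alts.
    """
    n_obs, n_alts = costs_wide.shape
    obs_id = np.repeat(np.arange(n_obs), n_alts)     # 0,0,...,1,1,...
    alt_id = np.tile(np.arange(n_alts), n_obs)       # 0,1,...,0,1,...
    cost = costs_wide.reshape(-1)                    # row-major flatten matches ids
    chosen = (alt_id == choice[obs_id]).astype(int)  # binary choice flag
    return obs_id, alt_id, cost, chosen

# Two decision-makers, three alternatives (toy values)
costs = np.array([[10.0, 20.0, 30.0], [15.0, 25.0, 35.0]])
choice = np.array([2, 0])
obs_id, alt_id, cost, chosen = wide_to_long(costs, choice)
print(chosen)   # [0 0 1 1 0 0]
```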

Model Assessment

  • Posterior predictive checks (PPC): Compare predicted market shares (posterior mean and 95% credible interval) against observed shares.
  • WAIC / LOO: Estimate out-of-sample predictive accuracy, penalising the added complexity of extra parameters (e.g. correlation structure).
  • Marginal rate of substitution: Posterior distribution over economically meaningful derived quantities.
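The market-share PPC can be sketched in NumPy, assuming posterior predictive choices are available as an `(n_draws, n_obs)` integer array (for example, extracted from the output of `pm.sample_posterior_predictive`). The draws and "observed" choices below are synthetic.

```python
import numpy as np

def predicted_share_interval(y_rep, n_alts, prob=0.95):
    """Mean and central interval of predicted market shares.

    y_rep: (n_draws, n_obs) posterior predictive choices coded 0..n_alts-1.
    """
    shares = np.stack([np.bincount(row, minlength=n_alts) / row.size for row in y_rep])
    tail = (1 - prob) / 2
    return shares.mean(axis=0), np.quantile(shares, [tail, 1 - tail], axis=0)

def observed_shares(y, n_alts):
    """Empirical share of each alternative in the observed choices."""
    return np.bincount(y, minlength=n_alts) / y.size

# Synthetic example: 1000 predictive draws of 900 choices among 4 alternatives
rng = np.random.default_rng(4)
p_true = np.array([0.1, 0.2, 0.3, 0.4])
y_rep = rng.choice(4, size=(1000, 900), p=p_true)
y_obs = rng.choice(4, size=900, p=p_true)
mean_share, (lo, hi) = predicted_share_interval(y_rep, n_alts=4)
print(mean_share, lo, hi, observed_shares(y_obs, 4))
```

An alternative whose observed share falls outside its predicted interval is a sign of the kind of misfit described for the cost-only model.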

Applications

  • Transport demand: McFadden’s original BART study — predicting market share for new transit options.
  • Marketing: Consumer brand choice from product attributes (e.g. cracker brands).
  • Energy economics: Household choice among heating systems.

Implementation Notes

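  • Uses a pm.Categorical likelihood whose probabilities come from pm.math.softmax.
  • The integer codes of the observed outcome must follow the same column order as the stacked utility matrix.
  • pm.MutableData containers (superseded by pm.Data in recent PyMC releases) allow post-fit counterfactual inference (e.g. policy-price simulations): swap in new attribute values with pm.set_data and rerun posterior predictive sampling.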
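The counterfactual logic can also be sketched outside PyMC: given posterior draws of the cost coefficients, recompute choice probabilities under modified cost data and compare average shares. Everything below is synthetic, and the 20% installation-cost increase for alternative 0 is a hypothetical policy chosen for illustration.

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax over the last axis."""
    z = v - v.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def counterfactual_shares(beta_ic_draws, beta_oc_draws, ic, oc):
    """Average choice probabilities per posterior draw: (n_draws, n_alts)."""
    # Broadcast draws against the data to get (n_draws, n_obs, n_alts) utilities
    s = beta_ic_draws[:, None, None] * ic + beta_oc_draws[:, None, None] * oc
    return softmax(s).mean(axis=1)

# Synthetic data and coefficient draws
rng = np.random.default_rng(5)
n_obs, n_alts = 100, 5
ic = rng.uniform(500, 1000, size=(n_obs, n_alts))
oc = rng.uniform(100, 300, size=(n_obs, n_alts))
beta_ic_draws = rng.normal(-0.002, 0.0002, size=500)
beta_oc_draws = rng.normal(-0.005, 0.0005, size=500)

base = counterfactual_shares(beta_ic_draws, beta_oc_draws, ic, oc)
ic_cf = ic.copy()
ic_cf[:, 0] *= 1.2   # hypothetical policy: 20% higher installation cost for alternative 0
cf = counterfactual_shares(beta_ic_draws, beta_oc_draws, ic_cf, oc)
print(base[:, 0].mean(), cf[:, 0].mean())
```

Because every synthetic `beta_ic` draw is negative, the predicted share of alternative 0 falls under the price increase in every draw; with real posteriors, the spread across draws gives the uncertainty of the policy effect.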

See Also