Discrete Choice Models

Summary

Discrete choice models operationalize the theory that agents choose the option that maximises their subjective utility. Utility is modelled as a latent linear function of observable alternative attributes plus a random error; assuming the error is i.i.d. Gumbel-distributed yields the softmax (multinomial logit) likelihood. The approach was pioneered by Daniel McFadden in the 1970s for transport demand forecasting and is now widely used in marketing, economics, and policy analysis.

Core Idea: Random Utility Theory

Each alternative $j \in Alt$ is assigned a latent utility $U_{j}$. A decision-maker chooses the alternative with the highest utility. Utility is decomposed into a deterministic component (a linear function of observable attributes) plus an unobserved stochastic component:

$$U_{j} = x_{j}^{\top} β + ε_{j}$$

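Writing $u_{j} = x_{j}^{\top} β$ for the deterministic component and assuming the errors are i.i.d. with $ε_{j} \sim \text{Gumbel}(0, 1)$, the probability of choosing alternative $j$ is given by the softmax (multinomial logit) transform: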

$$P(\text{choose } j) = \operatorname{softmax}(u)_{j} = \frac{\exp(u_{j})}{\sum_{q = 1}^{J} \exp(u_{q})}$$

This arises because the difference of two independent Gumbel random variables with a common scale follows a logistic distribution.
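The link between Gumbel errors and the softmax can be checked numerically. The sketch below is a minimal simulation using only the standard library: it adds standard Gumbel noise to a vector of deterministic utilities, records which alternative wins, and compares the resulting choice frequencies with the analytic softmax. The utility values, the function names, and the number of draws are illustrative assumptions, not part of the original model.

```python
import math
import random


def softmax(u):
    """Numerically stable softmax over a list of utilities."""
    m = max(u)
    exps = [math.exp(v - m) for v in u]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_choice_shares(u, n_draws=100_000, seed=0):
    """Fraction of draws in which each alternative maximises u_j + Gumbel noise."""
    rng = random.Random(seed)
    counts = [0] * len(u)
    for _ in range(n_draws):
        # Standard Gumbel draw via the inverse CDF: -log(-log(U)), U ~ Uniform(0, 1)
        noisy = [v - math.log(-math.log(rng.random())) for v in u]
        counts[noisy.index(max(noisy))] += 1
    return [c / n_draws for c in counts]


u = [1.0, 0.5, -0.2]  # illustrative deterministic utilities (assumed values)
analytic = softmax(u)
simulated = simulate_choice_shares(u)

for j, (a, s) in enumerate(zip(analytic, simulated)):
    print(f"alternative {j}: softmax={a:.3f}, simulated={s:.3f}")
```

With enough draws the two columns should agree to within Monte Carlo error, which is what the random utility derivation predicts.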

Identification

Only utility differences are identified: adding the same constant to every alternative's utility leaves the choice probabilities unchanged. To pin down the parameters, the utility of one alternative (the “outside good” or pivot alternative) is fixed at 0; this matters especially when alternative-specific intercepts are included.
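The invariance behind this normalisation is easy to verify directly. The following sketch, with made-up utility values and the last alternative arbitrarily treated as the pivot, subtracts the pivot's utility from every alternative and confirms that the softmax probabilities do not change.

```python
import math


def softmax(u):
    """Numerically stable softmax over a list of utilities."""
    m = max(u)
    exps = [math.exp(v - m) for v in u]
    total = sum(exps)
    return [e / total for e in exps]


u = [0.8, -0.3, 1.4]              # illustrative utilities; last entry is the pivot
shifted = [v - u[-1] for v in u]  # pivot utility becomes exactly 0

p_original = softmax(u)
p_shifted = softmax(shifted)

print(p_original)
print(p_shifted)  # same probabilities up to floating-point error
```

Because any common shift cancels in the ratio of exponentials, the data cannot distinguish between the two utility vectors, so one of them has to be chosen by convention.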

Model Variants

1. Basic Conditional Logit (CL)

Utility is driven entirely by alternative-specific attributes (e.g. installation cost, operating cost) with coefficients shared across all alternatives:

$$u_{j} = β_{ic} \cdot ic_{j} + β_{oc} \cdot oc_{j}$$

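import pymc as pm
# Assumed inputs: `alts` lists the alternatives in the same order as coords["alts_probs"];
# `ic[j]` and `oc[j]` are length-N arrays of installation / operating cost for alternative j;
# `observed` holds integer choice codes using that same ordering.
with pm.Model(coords=coords) as model_1:
    beta_ic = pm.Normal("beta_ic", 0, 1)
    beta_oc = pm.Normal("beta_oc", 0, 1)
    # Utility matrix of shape (N, J): one column per alternative
    s = pm.math.stack([beta_ic * ic[j] + beta_oc * oc[j] for j in alts]).T
    p_ = pm.Deterministic("p", pm.math.softmax(s, axis=1), dims=("obs", "alts_probs"))
    choice_obs = pm.Categorical("y_cat", p=p_, observed=observed, dims="obs")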

Limitation: With only cost coefficients and no alternative-specific intercepts, the model fails to capture average preference differences across alternatives — posterior predictive checks reveal poor fit.

2. With Alternative-Specific Intercepts

Adding an intercept per alternative (except the pivot) absorbs systematic preference heterogeneity:

$$u_{j} = α_{j} + β_{ic} \cdot ic_{j} + β_{oc} \cdot oc_{j} \quad (j \neq hp), \qquad u_{hp} = β_{ic} \cdot ic_{hp} + β_{oc} \cdot oc_{hp}$$

This is the standard multinomial logit and substantially improves PPC performance.

3. Correlated Intercepts (MvNormal Prior)

Placing a multivariate normal prior on the intercepts, with an LKJ prior on the Cholesky factor of their covariance matrix, captures correlation structure among alternatives. This is useful for understanding substitution patterns, though it does not always improve predictive metrics:

# Inside the model context: LKJ prior on the Cholesky factor of the 5 x 5 intercept covariance
chol, corr, stds = pm.LKJCholeskyCov("chol", n=5, eta=2.0, sd_dist=pm.Exponential.dist(1.0, shape=5))
alphas = pm.MvNormal("alpha", mu=0, chol=chol, dims="alts_probs")

Marginal Rate of Substitution

The ratio of coefficients in the utility function is economically interpretable even though utility itself is latent. For a utility of the form $U = β_{oc} \cdot oc + β_{ic} \cdot ic$:

$$-\left.\frac{d\,ic}{d\,oc}\right|_{dU = 0} = \frac{β_{oc}}{β_{ic}}$$

This gives the rate at which one-time installation costs substitute for recurring operating costs at constant utility. In Bayesian analysis, we obtain a full posterior over this quantity.
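As a rough illustration of how the posterior over this ratio might be summarised, the sketch below divides the two coefficients draw by draw and reports the median and a 90% interval. The draws here are synthetic placeholders generated from arbitrary normal distributions, not results from any fitted model; in practice they would be replaced by the sampled values of `beta_oc` and `beta_ic`. The helper name `mrs_summary` is hypothetical.

```python
import random
import statistics


def mrs_summary(beta_oc_draws, beta_ic_draws):
    """Summarise the per-draw ratio beta_oc / beta_ic with median and 5%/95% quantiles."""
    ratios = [oc / ic for oc, ic in zip(beta_oc_draws, beta_ic_draws)]
    cuts = statistics.quantiles(ratios, n=20)  # 19 cut points at 5%, 10%, ..., 95%
    return {"median": statistics.median(ratios), "q05": cuts[0], "q95": cuts[-1]}


# Synthetic placeholder draws (assumed values, NOT real posterior samples).
rng = random.Random(42)
beta_ic_draws = [rng.gauss(-1.0, 0.1) for _ in range(4000)]
beta_oc_draws = [rng.gauss(-0.5, 0.1) for _ in range(4000)]

summary = mrs_summary(beta_oc_draws, beta_ic_draws)
print(summary)
```

The median and quantiles are used instead of the mean because a ratio of two uncertain quantities can have heavy tails when the denominator's draws come close to zero.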

Data Format

Discrete choice data can be structured in either wide format (one row per decision-maker, cost columns per alternative) or long format (one row per decision-maker × alternative, with a binary choice flag). Long format makes matrix operations more natural, especially in Stan or pylogit.
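The difference between the two layouts can be sketched with plain Python dictionaries. The column names (`ic_gc`, `oc_hp`, and so on), the alternative label `gc`, and the cost values are hypothetical and chosen only for illustration; a real dataset would typically be reshaped with a dataframe library instead.

```python
def wide_to_long(wide_rows, alts):
    """Return one row per (decision-maker, alternative) with a 0/1 choice flag."""
    long_rows = []
    for row in wide_rows:
        for alt in alts:
            long_rows.append(
                {
                    "id": row["id"],
                    "alt": alt,
                    "ic": row[f"ic_{alt}"],  # installation cost of this alternative
                    "oc": row[f"oc_{alt}"],  # operating cost of this alternative
                    "chosen": int(row["choice"] == alt),
                }
            )
    return long_rows


# Wide format: one row per decision-maker, one cost column per alternative.
wide_rows = [
    {"id": 1, "choice": "gc", "ic_gc": 800.0, "oc_gc": 200.0, "ic_hp": 1000.0, "oc_hp": 150.0},
    {"id": 2, "choice": "hp", "ic_gc": 750.0, "oc_gc": 220.0, "ic_hp": 900.0, "oc_hp": 140.0},
]

long_rows = wide_to_long(wide_rows, alts=["gc", "hp"])
for r in long_rows:
    print(r)
```

Each decision-maker contributes exactly one row with `chosen` equal to 1, which is a convenient sanity check after reshaping.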

Model Assessment

Posterior predictive checks (PPC): Compare predicted market shares (mean and 95% CI) against observed shares.
WAIC / LOO: Penalise added complexity from additional parameters (e.g. correlation structure).
Marginal rate of substitution: Posterior distribution over economically meaningful derived quantities.
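The market-share comparison in the first check above can be sketched as follows. The code assumes the posterior choice probabilities are available as a nested list `p_draws[d][i][j]` (draw, observation, alternative) and that observed choices are integer codes; both the layout and the tiny toy numbers are assumptions made for illustration, not output from a fitted model.

```python
import statistics


def observed_shares(choices, n_alts):
    """Observed market share of each alternative from integer choice codes."""
    counts = [0] * n_alts
    for c in choices:
        counts[c] += 1
    return [c / len(choices) for c in counts]


def predicted_share_summary(p_draws):
    """Per alternative: (mean, 2.5% quantile, 97.5% quantile) of the predicted share."""
    n_alts = len(p_draws[0][0])
    summary = []
    for j in range(n_alts):
        # Predicted share in one draw = average choice probability over observations
        shares = [sum(row[j] for row in draw) / len(draw) for draw in p_draws]
        cuts = statistics.quantiles(shares, n=40, method="inclusive")
        summary.append((statistics.fmean(shares), cuts[0], cuts[-1]))
    return summary


# Toy inputs: 3 draws, 2 observations, 2 alternatives (assumed values).
p_draws = [
    [[0.6, 0.4], [0.4, 0.6]],
    [[0.7, 0.3], [0.5, 0.5]],
    [[0.8, 0.2], [0.6, 0.4]],
]
choices = [0, 1, 0, 0]

obs = observed_shares(choices, n_alts=2)
pred = predicted_share_summary(p_draws)
for j, (mean, lo, hi) in enumerate(pred):
    print(f"alternative {j}: predicted {mean:.3f} [{lo:.3f}, {hi:.3f}]")
print("observed shares:", obs)
```

A model fits well on this criterion when each observed share falls inside the corresponding predicted interval.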

Applications

Transport demand: McFadden’s original BART study — predicting market share for new transit options.
Marketing: Consumer brand choice from product attributes (e.g. cracker brands).
Energy economics: Household choice among heating systems.

Implementation Notes

Uses pm.Categorical likelihood with pm.math.softmax.
Categorical encoding of the outcome must match the ordering of stacked utility vectors.
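pm.MutableData containers (pm.Data in recent PyMC releases, where pm.MutableData is deprecated) allow post-fit counterfactual inference (e.g. policy-price simulations) by swapping in new inputs with pm.set_data.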
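The encoding requirement in the notes above is a common source of silent errors, so it can help to build the integer codes from the same list that orders the utility columns. The sketch below uses hypothetical alternative labels and a hypothetical helper name, `encode_choices`; it is one possible way to keep the two orderings in sync, not the only one.

```python
def encode_choices(choices, alts):
    """Map choice labels to integer codes using the column order of the utility matrix."""
    index = {alt: k for k, alt in enumerate(alts)}
    # A KeyError here signals a label that has no utility column.
    return [index[c] for c in choices]


alts = ["gc", "hp"]            # hypothetical labels, in the order the utilities are stacked
choices = ["hp", "gc", "hp"]   # hypothetical observed choices

observed = encode_choices(choices, alts)
print(observed)
```

If the codes are instead produced with `pd.Categorical(...).codes`, note that pandas sorts the inferred categories, so the utility columns would need to be stacked in that sorted order.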
