Linear Models in Statistical Rethinking

Summary

Chapter 4 of Statistical Rethinking builds Bayesian linear regression from scratch. It covers why normal distributions arise from addition (the Central Limit Theorem), how to write models in mathematical notation and translate them to R code, and how to generate posterior predictions with uncertainty intervals.

Why Normal Distributions Are Normal

The Gaussian distribution arises naturally from the addition of many small effects (Central Limit Theorem). McElreath demonstrates this with a soccer field simulation: each person takes a series of random steps left or right, and the distribution of final positions converges to a bell curve for essentially any step size distribution with finite variance.
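The simulation can be sketched in a few lines. This is a Python illustration using only the standard library, not the book's R code; the number of people, the number of steps, and the uniform step distribution are assumptions chosen for the example.

```python
import random
import statistics

random.seed(4)  # fixed seed so the sketch is reproducible

N_PEOPLE = 1000   # people starting on the midline (assumed)
N_STEPS = 16      # random steps taken by each person (assumed)

def final_position(n_steps):
    """Sum n_steps random steps, each drawn uniformly from [-1, 1]."""
    return sum(random.uniform(-1.0, 1.0) for _ in range(n_steps))

positions = [final_position(N_STEPS) for _ in range(N_PEOPLE)]

mean_pos = statistics.mean(positions)
sd_pos = statistics.stdev(positions)

# For a Gaussian, about 68% of values lie within one standard deviation of the mean.
within_1sd = sum(abs(p - mean_pos) <= sd_pos for p in positions) / N_PEOPLE

print(f"mean = {mean_pos:.2f}, sd = {sd_pos:.2f}, within 1 sd = {within_1sd:.2f}")
```

Each individual step is uniform, not bell-shaped, yet the share of final positions within one standard deviation should land close to the Gaussian value of roughly 0.68.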

Two justifications for using Gaussian likelihoods:

Ontological: many natural measurements are approximately Gaussian because they arise from additive processes
Epistemological: the Gaussian is the maximum entropy distribution for a given mean and variance — it assumes the least about the data

The Model Language

A complete Bayesian model specifies a likelihood, a linear model for the mean, and a prior for every parameter:

$h_{i} \sim \text{Normal}(μ_{i}, σ)$

$μ_{i} = α + β x_{i}$

$α \sim \text{Normal}(178, 100)$

$β \sim \text{Normal}(0, 10)$

$σ \sim \text{Uniform}(0, 50)$

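The map function from the rethinking R package (renamed quap in later versions of the package) fits this model by finding the maximum a posteriori (MAP) estimate and approximating the posterior around that point as a multivariate Gaussian.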
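To make the translation from notation to code concrete, here is a minimal Python sketch of the unnormalized log posterior that the five model lines define. It is not how the rethinking package is implemented: it only evaluates the function that a MAP routine would maximize. The helper names and the toy data are hypothetical.

```python
import math

def log_normal_pdf(x, mean, sd):
    """Log density of Normal(mean, sd) evaluated at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)

def log_posterior(alpha, beta, sigma, xs, hs):
    """Unnormalized log posterior: log priors plus log likelihood."""
    # sigma ~ Uniform(0, 50): zero prior density outside the interval
    if not 0.0 < sigma < 50.0:
        return -math.inf
    lp = log_normal_pdf(alpha, 178.0, 100.0)   # alpha ~ Normal(178, 100)
    lp += log_normal_pdf(beta, 0.0, 10.0)      # beta ~ Normal(0, 10)
    lp += -math.log(50.0)                      # sigma ~ Uniform(0, 50)
    for x, h in zip(xs, hs):
        mu = alpha + beta * x                  # mu_i = alpha + beta * x_i
        lp += log_normal_pdf(h, mu, sigma)     # h_i ~ Normal(mu_i, sigma)
    return lp

# Hypothetical toy data (predictor x, outcome h), for illustration only
xs = [35.0, 40.0, 45.0, 50.0, 55.0]
hs = [145.0, 150.0, 154.0, 159.0, 163.0]

close_fit = log_posterior(114.0, 0.9, 1.0, xs, hs)
poor_fit = log_posterior(178.0, 0.0, 5.0, xs, hs)
print(close_fit > poor_fit)
```

Parameter values that track the data score a higher log posterior than a flat line at the prior mean, and any $σ$ outside the uniform prior's support is ruled out entirely.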

Prior Predictive Simulation

Always Simulate from Priors First

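Before fitting, simulate predictions from the priors alone to check that they produce sensible outcomes. If the simulated outcomes are routinely impossible, such as negative heights, the priors need tightening. This is a key step in the Bayesian workflow.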
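A prior predictive check for the model above might look like the following Python sketch. It assumes the predictor is a weight in kilograms over a hypothetical 30 to 60 kg range and treats average heights below 0 cm or above 272 cm (roughly the tallest person on record) as implausible; both choices are assumptions for illustration.

```python
import random

random.seed(2971)  # fixed seed so the sketch is reproducible

N_LINES = 100
X_MIN, X_MAX = 30.0, 60.0   # hypothetical predictor range (assumption)

# Draw intercept and slope pairs from the priors stated in the model
alphas = [random.gauss(178.0, 100.0) for _ in range(N_LINES)]
betas = [random.gauss(0.0, 10.0) for _ in range(N_LINES)]

def prior_mu(alpha, beta, x):
    """Expected outcome implied by one prior draw at predictor value x."""
    return alpha + beta * x

def is_implausible(alpha, beta):
    """True if the prior line leaves the 0-272 cm band at either end of the range."""
    ends = (prior_mu(alpha, beta, X_MIN), prior_mu(alpha, beta, X_MAX))
    return min(ends) < 0.0 or max(ends) > 272.0

n_bad = sum(is_implausible(a, b) for a, b in zip(alphas, betas))
print(f"{n_bad} of {N_LINES} prior lines leave the 0-272 cm range")
```

With priors this wide and an uncentered predictor, most simulated lines imply impossible heights, which is exactly the kind of problem the check is meant to surface before any data are used.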

Generating Predictions

Three-step recipe for any fitted model:

1. Use link to generate posterior distributions of $μ$ at each predictor value
2. Use mean/HPDI/PI to summarize those distributions
3. Use sim to generate full posterior predictions (incorporating $σ$)

The two kinds of uncertainty:

Narrow interval (around $μ$ ): uncertainty about the average outcome at each predictor value
Wide interval (from sim): uncertainty about individual observations, including residual variation $σ$
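The difference between the two intervals can be illustrated with a Python sketch that imitates what link and sim do, without reproducing the rethinking functions themselves. The posterior samples here are stand-ins drawn from made-up independent normals, not the output of a fitted model, and percentile_interval is a hypothetical helper playing the role of PI.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

N_SAMPLES = 2000

# Stand-in posterior samples of (alpha, beta, sigma). Hypothetical values:
# a real posterior comes from the fitted model and keeps parameter correlations.
post = [
    (random.gauss(114.0, 1.9), random.gauss(0.9, 0.04), abs(random.gauss(5.1, 0.2)))
    for _ in range(N_SAMPLES)
]

def percentile_interval(values, prob=0.89):
    """Approximate central interval containing `prob` of the values."""
    ordered = sorted(values)
    tail = (1.0 - prob) / 2.0
    lo = ordered[int(tail * (len(ordered) - 1))]
    hi = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return lo, hi

x_new = 50.0  # predictor value at which to predict

# Like link: one value of mu per posterior sample
mu_draws = [a + b * x_new for a, b, _ in post]

# Like sim: add observation noise sigma to get simulated individual outcomes
h_draws = [random.gauss(a + b * x_new, s) for a, b, s in post]

# Summarize both distributions
mu_mean = statistics.mean(mu_draws)
mu_lo, mu_hi = percentile_interval(mu_draws)
h_lo, h_hi = percentile_interval(h_draws)

print(f"mu: mean {mu_mean:.1f}, 89% interval [{mu_lo:.1f}, {mu_hi:.1f}]")
print(f"h:  89% interval [{h_lo:.1f}, {h_hi:.1f}]")
```

The interval for simulated outcomes is wider than the interval for $μ$ because each simulated value carries the residual variation $σ$ on top of the parameter uncertainty.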

Polynomial Regression

Polynomial models $μ_{i} = α + β_{1} x_{i} + β_{2} x_{i}^{2}$ can capture curvature but:

Hard to interpret coefficients
Better to use a mechanistic model when possible
Always standardize predictors first for numerical stability
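Standardizing before squaring can be sketched as follows. This is a Python illustration with made-up predictor values; mu_quadratic is a hypothetical helper that evaluates the quadratic mean function from the formula above.

```python
import statistics

# Hypothetical predictor values, for illustration only
weights = [12.0, 20.0, 31.0, 38.0, 45.0, 52.0, 60.0]

mean_w = statistics.mean(weights)
sd_w = statistics.stdev(weights)

# Standardize: subtract the mean, then divide by the standard deviation
weights_s = [(w - mean_w) / sd_w for w in weights]

# Squared term for mu_i = alpha + beta_1 * x_i + beta_2 * x_i^2
weights_s2 = [w * w for w in weights_s]

def mu_quadratic(alpha, beta1, beta2, x_s):
    """Quadratic mean function evaluated at a standardized predictor value."""
    return alpha + beta1 * x_s + beta2 * x_s ** 2

# Compare the scale of the squared term before and after standardizing
print(max(w * w for w in weights), max(weights_s2))
```

On the raw scale the squared term reaches into the thousands, while the standardized version stays in single digits, which keeps the two coefficients on comparable scales for the optimizer.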
