Spatial Models — Besag-York-Mollie (BYM)

Summary

The BYM model is the standard Bayesian approach for areal spatial data (counties, census tracts, etc.). It decomposes spatial variation into a structured component (ICAR prior, capturing spatial autocorrelation) and an unstructured component (i.i.d. random effects). The BYM2 reparameterisation adds a mixing parameter $ρ$ that controls the balance between the two.

When to Use BYM

Data type: Areal (discrete spatial units with neighbourhood structure), not point-referenced data
Goal: Partition variance into spatially-structured vs. unstructured components; estimate predictor effects after accounting for spatial autocorrelation
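Scale: Computationally efficient. The ICAR density is a sum over pairs of neighbouring areas, so its cost grows with the number of shared borders. That is roughly linear in the number of areas, because each area has only a handful of neighbours. A full Gaussian Process, by contrast, needs an $O(n^3)$ covariance factorisation.
For continuous, point-referenced spatial data, use a Gaussian Process instead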

The ICAR Prior

The Intrinsic Conditional Autoregressive (ICAR) prior is the engine of BYM. Given an adjacency matrix $W$ ($W_{ij} = 1$ if areas $i$ and $j$ share a border, 0 otherwise), ICAR encodes:

$$f(ϕ \mid W) \propto \exp\left(-\frac{1}{2} \sum_{i \sim j} (ϕ_{i} - ϕ_{j})^{2}\right)$$
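The sum over $i \sim j$ visits each pair of neighbours once. It equals the quadratic form $ϕ^{T}(D - W)ϕ$, where $D$ is the diagonal matrix of neighbour counts. The minimal NumPy sketch below checks that the two forms agree on a hypothetical 4-area ring map; the function names are illustrative. The edge-list version touches each neighbour pair exactly once, which is what keeps ICAR cheap to evaluate.

```python
import numpy as np

def icar_energy_edges(phi, edges):
    """Sum over neighbour pairs i ~ j of (phi_i - phi_j)^2.
    Each unordered pair is listed once, so the cost is O(number of edges)."""
    i, j = edges[:, 0], edges[:, 1]
    return np.sum((phi[i] - phi[j]) ** 2)

def icar_energy_dense(phi, W):
    """Same quantity as the quadratic form phi^T (D - W) phi, D = diag(row sums of W)."""
    Q = np.diag(W.sum(axis=1)) - W
    return phi @ Q @ phi

def icar_log_density(phi, edges):
    """Unnormalised ICAR log-density: -1/2 * sum_{i~j} (phi_i - phi_j)^2."""
    return -0.5 * icar_energy_edges(phi, edges)

# Hypothetical toy map: 4 areas in a ring, 0-1-2-3-0
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 0]])
W = np.zeros((4, 4))
W[edges[:, 0], edges[:, 1]] = 1
W[edges[:, 1], edges[:, 0]] = 1

phi = np.array([0.5, -0.2, 0.1, -0.4])
print(icar_energy_edges(phi, edges), icar_energy_dense(phi, W))
print(icar_log_density(phi, edges))
```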

Key properties:

Penalises differences between neighbouring areas (Tobler’s first law of geography: nearby things are more similar)
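Adding the same constant to every $ϕ_{i}$ leaves the density unchanged. For that reason $ϕ$ is constrained to sum to zero (zero mean), and PyMC's pm.ICAR applies this as a soft constraint.
Improper prior: the precision matrix $Q = D - W$ ($D$ = diagonal matrix of neighbour counts) is singular, so the density does not integrate to one. ICAR must therefore be embedded in a larger model with identifiable parameters.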
In PyMC: pm.ICAR("phi", W=W)
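The last two properties come from the same fact: every row of $Q = D - W$ sums to zero. Because of this, the constant vector lies in the null space of $Q$. The NumPy sketch below checks this on a hypothetical 4-area ring map. It shows that shifting $ϕ$ by a constant leaves the ICAR energy unchanged, and that centring $ϕ$ removes that free direction.

```python
import numpy as np

# Hypothetical toy map: 4 areas in a ring, 0-1-2-3-0
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
Q = np.diag(W.sum(axis=1)) - W          # graph Laplacian D - W

ones = np.ones(4)
print(Q @ ones)                         # zero vector: rows of Q sum to zero
min_eig = np.linalg.eigvalsh(Q).min()
print(min_eig)                          # ~0: Q is singular, so the prior is improper

phi = np.array([0.5, -0.2, 0.1, -0.4])
shifted = phi + 3.0
# Identical energy: the density cannot distinguish phi from phi + c
print(phi @ Q @ phi, shifted @ Q @ shifted)

# Centring removes the free constant (the sum-to-zero constraint)
phi_centred = phi - phi.mean()
print(phi_centred.sum())
```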

Graph Theory Vocabulary

Term	Spatial Meaning
Node	An area (census tract, county)
Edge	Shared border between two areas
Adjacency matrix $W$	$W_{ij} = 1$ if areas $i, j$ share a border
Edge list	Equivalent compact representation of $W$
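To make the "equivalent representation" row concrete, here is a small NumPy sketch that converts an undirected edge list to $W$ and back. The map and the function names are hypothetical. Each border is stored once in the edge list but appears twice in $W$, because $W$ is symmetric.

```python
import numpy as np

def edges_to_adjacency(edges, n_areas):
    """Build the symmetric 0/1 adjacency matrix W from an undirected edge list."""
    W = np.zeros((n_areas, n_areas), dtype=int)
    for i, j in edges:
        W[i, j] = 1
        W[j, i] = 1   # a shared border is symmetric
    return W

def adjacency_to_edges(W):
    """Recover each undirected edge once (i < j) from the upper triangle of W."""
    i, j = np.nonzero(np.triu(W, k=1))
    return list(zip(i.tolist(), j.tolist()))

# Hypothetical 4-area map: area 0 borders 1 and 2, area 2 borders 3
edges = [(0, 1), (0, 2), (2, 3)]
W = edges_to_adjacency(edges, 4)
print(W)
print(adjacency_to_edges(W))   # [(0, 1), (0, 2), (2, 3)]
```

The row sums of W give each area's neighbour count, which is the diagonal of $D$.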

The BYM2 Parameterisation

The modern BYM formulation (Riebler et al., 2016) uses a scaled, identified mixture:

$$\text{BYM component} = β + σ\left(\sqrt{1 - ρ}\,θ + \sqrt{ρ / s}\,ϕ\right)$$

Parameter	Meaning
$β$	Intercept (re-centres the mixture)
$σ$	Joint scale of random effects
$ρ \in [0, 1]$	Fraction of variance that is spatially structured
$θ \sim N (0, 1)$	Unstructured random effects
$ϕ$	ICAR spatial effects (standardised by the scaling factor $s$)
$s$	Scaling factor (computed from $W$)
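A forward simulation shows how $ρ$ moves the component between pure noise and pure spatial structure. The sketch below assumes a hypothetical 5-area chain map. It draws $ϕ$ from the sum-to-zero ICAR through the eigen-decomposition of $Q$ (a choice made for illustration, not necessarily how PyMC samples). It then evaluates the formula above for three values of $ρ$.

```python
import numpy as np

def sample_icar(W, rng):
    """One sum-to-zero ICAR draw (unit scale) for a connected graph.
    Covariance is pinv(Q): combine non-null eigenvectors with weights 1/sqrt(eigenvalue)."""
    Q = np.diag(W.sum(axis=1)) - W
    eigvals, eigvecs = np.linalg.eigh(Q)
    keep = eigvals > 1e-9                     # drop the constant (zero-eigenvalue) direction
    z = rng.standard_normal(keep.sum())
    return eigvecs[:, keep] @ (z / np.sqrt(eigvals[keep]))

def bym2_component(beta, sigma, rho, theta, phi, s):
    """beta + sigma * (sqrt(1 - rho) * theta + sqrt(rho / s) * phi)."""
    return beta + sigma * (np.sqrt(1 - rho) * theta + np.sqrt(rho / s) * phi)

rng = np.random.default_rng(0)
# Hypothetical 5-area chain map: 0-1-2-3-4
W = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
Q = np.diag(W.sum(axis=1)) - W
s = np.exp(np.mean(np.log(np.diag(np.linalg.pinv(Q)))))   # scaling factor

theta = rng.standard_normal(5)     # unstructured effects
phi = sample_icar(W, rng)          # structured effects
for rho in (0.0, 0.5, 1.0):
    print(rho, bym2_component(0.0, 1.0, rho, theta, phi, s).round(2))
```

With $ρ = 0$ the component reduces to $β + σθ$. With $ρ = 1$ it is the scaled ICAR field $β + σϕ/\sqrt{s}$.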

The scaling factor $s$

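$s$ is the geometric mean of the diagonal of the pseudo-inverse of the graph Laplacian $Q = D - W$, where $D$ is the diagonal matrix of neighbour counts. Those diagonal entries are the marginal variances of the unscaled, sum-to-zero ICAR field. Dividing $ϕ$ by $\sqrt{s}$ therefore gives marginal variances whose geometric mean is 1 ($Var(ϕ) \approx 1$). This makes $ϕ$ and $θ$ directly comparable and $σ$ interpretable as a joint scale.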

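Intuition: in a densely connected graph (every area adjacent to every other), each $ϕ_{i}$ is pinned down by many neighbours, so the unscaled ICAR field has small marginal variances. In a sparse graph such as a chain, the marginal variances are much larger. The scaling factor corrects for this, so $ρ$ can be interpreted consistently across different graph structures.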
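The NumPy sketch below computes $s$ directly from the definition and compares two hypothetical 5-area graphs: a complete graph and a chain. It uses a dense pseudo-inverse, which is adequate for small maps only. The PyMC example computes the same quantity with sparse matrices, and this version is a simplified stand-in rather than that code.

```python
import numpy as np

def scaling_factor(W):
    """Geometric mean of diag(pinv(Q)), Q = D - W, for a connected graph.
    Dense pseudo-inverse: O(n^3), so only suitable for small maps."""
    Q = np.diag(W.sum(axis=1)) - W
    marginal_var = np.diag(np.linalg.pinv(Q))
    return np.exp(np.mean(np.log(marginal_var)))

n = 5
complete = np.ones((n, n)) - np.eye(n)                            # every area borders every other
chain = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)  # 0-1-2-3-4

s_complete = scaling_factor(complete)
s_chain = scaling_factor(chain)
print(s_complete, s_chain)   # the dense graph has the much smaller s

# After dividing by s, the geometric-mean marginal variance is 1
Q_chain = np.diag(chain.sum(axis=1)) - chain
scaled_var = np.diag(np.linalg.pinv(Q_chain)) / s_chain
geo_mean_var = np.exp(np.mean(np.log(scaled_var)))
print(geo_mean_var)
```

For the complete graph every marginal variance is $(n-1)/n^{2}$, so $s = 0.16$. The chain gives $s \approx 0.73$. Without scaling, the same $σ$ and $ρ$ would therefore imply very different amounts of spatial variance on the two maps.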

PyMC Model (NYC Traffic Accidents)

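import pymc as pm
import pytensor.tensor as pt

# Assumes coords, W_nyc (adjacency matrix), scaling_factor, log_E,
# fragment_index and y have already been prepared from the NYC data
with pm.Model(coords=coords) as BYM_model:
    beta0   = pm.Normal("beta0", 0, 1)            # intercept
    beta1   = pm.Normal("beta1", 0, 1)            # fragmentation effect
    theta   = pm.Normal("theta", 0, 1, dims="area_idx")  # unstructured RE
    phi     = pm.ICAR("phi", W=W_nyc, dims="area_idx")  # structured RE
    sigma   = pm.HalfNormal("sigma", 1)           # joint scale of random effects
    rho     = pm.Beta("rho", 0.5, 0.5)            # share of variance that is spatial

    # BYM2 mixture: unstructured effects plus scaled ICAR effects
    mixture = pt.sqrt(1 - rho) * theta + pt.sqrt(rho / scaling_factor) * phi
    # log link with population offset keeps mu positive
    mu = pt.exp(log_E + beta0 + beta1 * fragment_index + sigma * mixture)
    y_i = pm.Poisson("y_i", mu, observed=y)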

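Offset log_E: log-population. It enters the linear predictor with a coefficient fixed at 1, so mu is still an expected count. The remaining term, exp(beta0 + beta1 * fragment_index + sigma * mixture), is the accident rate per resident (a relative risk) rather than a raw count.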
Poisson likelihood: appropriate for count outcomes (traffic accidents, disease cases)
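The offset arithmetic is simply $\exp(\log E + η) = E \cdot \exp(η)$. The NumPy sketch below shows it with hypothetical tract populations and a shared linear predictor η. The same per-resident rate produces a larger expected count in a larger tract.

```python
import numpy as np

# Hypothetical tract populations and a shared linear predictor eta
population = np.array([1200.0, 5400.0, 300.0])
log_E = np.log(population)
eta = np.array([-6.0, -6.0, -6.0])   # stands in for beta0 + beta1*x + sigma*mixture

mu = np.exp(log_E + eta)             # expected count per tract (Poisson mean)
rate = np.exp(eta)                   # expected accidents per resident

print(mu)
print(np.allclose(mu, population * rate))   # mu = E * exp(eta)
```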

Variance Decomposition

After fitting, visualise each component separately:

Visualisation	Parameters fixed	Interpretation
Spatial smoothing	$ρ = 1$ , $β_{1} = 0$	What spatial structure alone explains
Predictor effect	$σ = 0$	What fragmentation index alone explains
Unstructured residuals	$ρ = 0$ , $β_{1} = 0$	Remaining unstructured noise
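Each row of the table can be computed per posterior draw, exponentiated and then averaged over draws. The NumPy sketch below assumes the draws are already available as arrays; the function name, the array shapes and the choice to exclude the population offset are assumptions. The numbers used are random placeholders, not results from the NYC model.

```python
import numpy as np

def decompose(beta0, beta1, sigma, theta, phi, x, s):
    """Posterior-mean rates (offset excluded) for each row of the table.
    beta0, beta1, sigma: shape (draws,); theta, phi: shape (draws, n_areas);
    x: fragmentation index, shape (n_areas,); s: scaling factor."""
    b0, b1, sig = beta0[:, None], beta1[:, None], sigma[:, None]
    spatial = np.exp(b0 + sig * phi / np.sqrt(s))   # rho = 1, beta1 = 0
    predictor = np.exp(b0 + b1 * x[None, :])        # sigma = 0
    unstructured = np.exp(b0 + sig * theta)         # rho = 0, beta1 = 0
    # Average over posterior draws to get one value per area
    return spatial.mean(axis=0), predictor.mean(axis=0), unstructured.mean(axis=0)

# Placeholder "posterior draws": random numbers, NOT output from a fitted model
rng = np.random.default_rng(1)
draws, n = 200, 5
beta0 = rng.normal(-6.0, 0.1, draws)
beta1 = rng.normal(0.3, 0.05, draws)
sigma = np.abs(rng.normal(0.5, 0.05, draws))
theta = rng.standard_normal((draws, n))
phi = rng.standard_normal((draws, n))
x = np.linspace(-1.0, 1.0, n)   # hypothetical fragmentation index

spatial, predictor, unstructured = decompose(beta0, beta1, sigma, theta, phi, x, s=0.73)
print(predictor)
```

Each returned array has one value per area and can be drawn on the map like any other per-area quantity.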

Spatial smoothing is useful for forecasting: low-accident tracts surrounded by high-accident neighbourhoods will likely regress toward their neighbours in the future (partial pooling in space).
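The pull toward neighbours is built into the ICAR prior itself. Given all other areas, $ϕ_{i}$ is normal with mean equal to the average of its neighbours' values and standard deviation $σ/\sqrt{d_{i}}$, where $d_{i}$ is the number of neighbours. The NumPy sketch below uses a hypothetical 5-tract chain. In the fitted model the data and $θ$ also shape each estimate, so this shows only the prior's contribution.

```python
import numpy as np

def icar_conditional(phi, W, i, sigma=1.0):
    """Mean and sd of phi_i given all other areas under ICAR:
    mean = average of neighbours, sd = sigma / sqrt(number of neighbours)."""
    neighbours = W[i] == 1
    d_i = neighbours.sum()
    return phi[neighbours].mean(), sigma / np.sqrt(d_i)

# Hypothetical chain of 5 tracts: tract 2 is low, its neighbours are high
W = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
phi = np.array([0.8, 0.9, -0.5, 1.1, 0.7])
mean, sd = icar_conditional(phi, W, i=2)
print(mean, sd)   # neighbours are 0.9 and 1.1; sd is 1/sqrt(2)
```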

Connections

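ICAR relates to the Hilbert-space Gaussian process approximation (HSGP): the eigenvectors of the graph Laplacian play the role of the Laplace operator eigenfunctions that HSGP uses as its basis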
Random effects structure: compare Hierarchical Linear Models (pooling across groups)
Poisson GLM: see Generalized Linear Models

Source

The Besag-York-Mollie Model for Spatial Data — PyMC example by Daniel Saunders (2023); NYC pedestrian accident data
Riebler et al. (2016): “An intuitive Bayesian spatial model for disease mapping that accounts for scaling”
Stan case study: ICAR and BYM2
