Factor Analysis and Probabilistic PCA (PPCA)

Summary

Factor analysis (FA) is a probabilistic model for identifying low-rank structure in multivariate data via latent variables. It is a linear Gaussian model: observed data $X$ are modelled as a noisy linear transformation of latent factors $F$. Probabilistic PCA (PPCA) is a special case with isotropic noise. Naive implementations suffer from non-identifiability and scale poorly with the number of observations; a constrained parametrisation (lower-triangular $W$) addresses the former, and amortized inference (marginalizing $F$) addresses the latter.

Model Formulation

$$X_{(d, n)} \mid W_{(d, k)}, F_{(k, n)} \sim N (W F, Ψ)$$

where:

$d$ = number of observed dimensions, $n$ = number of observations, $k$ = number of latent factors ($k ≪ d$)
$W$ = loading matrix (relates latent factors to observations)
$F$ = factor scores (latent representation)
$Ψ$ = diagonal noise covariance (FA); $Ψ = σ^{2} I$ for PPCA

The fundamental assumption is that $W W^{⊤}$ is low rank: most variance in $X$ is captured by a small number of latent directions.
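To make the generative story concrete, here is a minimal NumPy sketch that simulates data from the model above. The sizes, the seed, and the range used for the noise variances are arbitrary choices for illustration, not values from this note.

```python
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed, for reproducibility only
d, n, k = 8, 500, 2                  # hypothetical sizes with k << d

W = rng.normal(size=(d, k))          # loading matrix
F = rng.normal(size=(k, n))          # factor scores, F ~ N(0, I)
psi = rng.uniform(0.1, 0.5, size=d)  # diagonal of Psi (FA); a constant vector gives PPCA

# Each column x_i is W f_i plus independent noise with variances psi
noise = rng.normal(size=(d, n)) * np.sqrt(psi)[:, None]
X = W @ F + noise                    # observed data, shape (d, n)

# The structured part of the covariance, W W^T, has rank k
low_rank = W @ W.T
rank = np.linalg.matrix_rank(low_rank)
```

Here `rank` equals `k` even though `low_rank` is a $d \times d$ matrix, which is the low-rank assumption in numerical form.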

Relation to PCA

Model	Prior on $F$	Noise $Ψ$
PCA (standard)	Deterministic	—
PPCA	$N (0, I)$	$σ^{2} I$ (isotropic)
Factor Analysis	$N (0, I)$	$diag (ψ_{1}, \dots, ψ_{d})$

The Identifiability Problem

The naive model is non-identified: only the product $W F$ enters the likelihood, so $P (X ∣ W, F) = P (X ∣ W Ω, Ω^{-1} F)$ for any invertible $Ω$. The $N (0, I)$ prior on $F$ is itself invariant under orthogonal $Ω$, so factors and loadings can still rotate, reflect, and permute freely between MCMC chains, producing:

Inconsistent chain means (multi-modal posterior)
Heavy autocorrelation in traceplots
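The invariance behind these symptoms can be checked numerically. The sketch below uses random matrices of arbitrary, illustrative size and confirms that transforming $W$ and $F$ by a matching $Ω$ leaves the product $W F$ unchanged, and that an orthogonal transformation also leaves $W W^{⊤}$ unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)       # arbitrary seed
d, n, k = 6, 20, 3                   # hypothetical sizes
W = rng.normal(size=(d, k))
F = rng.normal(size=(k, n))

# A random k x k matrix is invertible with probability one; the diagonal
# shift is only there to help its conditioning.
Omega = rng.normal(size=(k, k)) + 3.0 * np.eye(k)
W2 = W @ Omega
F2 = np.linalg.solve(Omega, F)       # Omega^{-1} F without forming the inverse
same_product = np.allclose(W @ F, W2 @ F2)

# An orthogonal Q (from a QR decomposition) also preserves W W^T, so the
# marginal covariance cannot distinguish W from W Q.
Q, _ = np.linalg.qr(rng.normal(size=(k, k)))
same_cov = np.allclose(W @ W.T, (W @ Q) @ (W @ Q).T)
```

Both flags come out `True` although `W2` differs from `W`, which is why independent chains can settle on different but equally valid loadings.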

Symptom

If $\hat{R} > 1.01$ and ESS is very low for the entries of $W$, while the chains look well behaved individually, the most likely cause is that the model is not identified.

Constrained Parametrisation (Fix)

Restrict $W$ to be:

Lower triangular — eliminates rotational freedom
Positive, increasing diagonal entries — eliminates sign and permutation ambiguity (use cumulative sum of half-normal draws)

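import pymc as pm
import pytensor.tensor as pt

def makeW(d, k, dim_names):
    # Number of free entries strictly below the diagonal of a (d, k) matrix
    n_od = k * d - k * (k + 1) // 2
    # z > 0, so cumsum(z) is positive and increasing (used as the diagonal scale)
    z = pm.HalfNormal("W_z", 1.0, dims="latent_columns")
    b = pm.Normal("W_b", 0.0, 1.0, shape=(n_od,), dims="packed_dim")  # off-diagonal
    # expand_packed_block_triangular is a helper defined elsewhere (not shown in this note)
    L = expand_packed_block_triangular(d, k, b, pt.ones(k))
    W = pm.Deterministic("W", L @ pt.diag(pt.extra_ops.cumsum(z)), dims=dim_names)
    return W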

With this parametrisation, chains agree on posterior means and $\hat{R}$ improves substantially.
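For readers who want to see the resulting structure without PyMC, the following NumPy sketch builds a matrix with the same shape constraints. The function name `make_w_numpy` and the way the off-diagonal entries are packed are assumptions made for this illustration; they are not necessarily how `expand_packed_block_triangular` orders its input.

```python
import numpy as np

def make_w_numpy(d, k, z, b):
    """Build a (d, k) loading matrix: a unit-diagonal lower-triangular block
    whose columns are scaled by the cumulative sums of the positive vector z."""
    n_od = d * k - k * (k + 1) // 2           # entries strictly below the diagonal
    assert d >= k and z.shape == (k,) and b.shape == (n_od,)
    assert np.all(z > 0)
    L = np.zeros((d, k))
    L[np.tril_indices(d, k=-1, m=k)] = b      # fill the strictly-lower entries
    L[np.arange(k), np.arange(k)] = 1.0       # unit diagonal
    return L @ np.diag(np.cumsum(z))          # column j scaled by z_1 + ... + z_j

rng = np.random.default_rng(2)                # arbitrary seed
d, k = 5, 3                                   # hypothetical sizes
z = np.abs(rng.normal(size=k))                # half-normal draws
b = rng.normal(size=d * k - k * (k + 1) // 2)
W = make_w_numpy(d, k, z, b)
diag_W = np.diag(W)                           # first k diagonal entries
```

The entries above the diagonal are zero, and `diag_W` is positive and increasing, which removes the rotation, sign, and permutation freedom respectively.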

Amortized Inference (Marginalizing Out $F$)

Explicitly sampling the $k \times n$ matrix $F$ is expensive for large $n$ and prevents minibatch streaming. Instead, integrate $F$ out analytically:

$$X \mid W \sim N (0, W W^{⊤} + σ^{2} I)$$

This reduces the parameter space to $W$ and $σ$ only, enabling:

A much smaller parameter space (no $k \times n$ latent matrix to sample)
Minibatch ADVI via pm.Minibatch + pm.fit(method="fullrank_advi")

Tradeoff: Evaluating the log-probability requires factorizing (in effect inverting) the $d \times d$ covariance matrix, which costs $O(d^{3})$ and is slow for large $d$. Each MCMC step on the amortized model is therefore typically slower, but removing the $k \times n$ latent parameters offsets this for large $n$.

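# Assumes Y has shape (d, n) and coords defines "rows", "observed_columns",
# "latent_columns", and "packed_dim"
with pm.Model(coords=coords) as PPCA_amortized:
    W = makeW(d, k, ("observed_columns", "latent_columns"))
    sigma = pm.HalfNormal("sigma", 1.0)
    cov = W @ W.T + sigma**2 * pt.eye(d)  # marginal covariance with F integrated out
    # Each row of Y.T is one observation drawn from N(0, cov)
    X = pm.MvNormal("X", 0.0, cov=cov, observed=Y.T, dims=("rows", "observed_columns"))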
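The quantity this marginal model evaluates can be written out directly. The sketch below is a plain NumPy version of the marginal log-likelihood that uses a Cholesky factorization, which is one common way to handle the $d \times d$ covariance; the function name `ppca_marginal_loglik` and the simulated data are assumptions for illustration and say nothing about how PyMC computes it internally.

```python
import numpy as np

def ppca_marginal_loglik(X, W, sigma):
    """Sum over columns x_i of log N(x_i | 0, W W^T + sigma^2 I)."""
    d, n = X.shape
    C = W @ W.T + sigma**2 * np.eye(d)
    chol = np.linalg.cholesky(C)                  # the O(d^3) step
    A = np.linalg.solve(chol, X)                  # whitened data, chol^{-1} X
    log_det = 2.0 * np.sum(np.log(np.diag(chol))) # log det C from the Cholesky factor
    quad = np.sum(A**2)                           # sum_i x_i^T C^{-1} x_i
    return -0.5 * (n * d * np.log(2.0 * np.pi) + n * log_det + quad)

rng = np.random.default_rng(3)                    # arbitrary seed
d, n, k, sigma = 6, 40, 2, 0.5                    # hypothetical sizes and noise scale
W = rng.normal(size=(d, k))
X = W @ rng.normal(size=(k, n)) + sigma * rng.normal(size=(d, n))
ll = ppca_marginal_loglik(X, W, sigma)
```

Only $W$ and $σ$ appear as arguments, and the sum over columns is what makes the objective amenable to minibatching over observations.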

Post-hoc Recovery of Factor Scores $F$

After fitting the amortized model (where $F$ was marginalized away), recover individual factor scores using the conjugate posterior:

$$F \mid X, W \sim N (μ_{F}, Σ_{F})$$

$$μ_{F} = (I + σ^{-2} W^{⊤} W)^{-1} σ^{-2} W^{⊤} X$$

$$Σ_{F} = (I + σ^{-2} W^{⊤} W)^{-1}$$

This is amortized inference: we postpone computing individual $F_{i}$ until after model fitting, then recover them analytically from the posterior samples of $W$ and $σ$.
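The recovery step is a few lines of linear algebra. The sketch below applies the formulas for $μ_{F}$ and $Σ_{F}$ to a single $(W, σ)$ pair; in practice one would repeat it for each posterior draw. The function name `factor_posterior` and the simulated inputs are assumptions for illustration.

```python
import numpy as np

def factor_posterior(X, W, sigma):
    """Posterior mean (k, n) and covariance (k, k) of F given X, W, sigma."""
    k = W.shape[1]
    M = np.eye(k) + (W.T @ W) / sigma**2          # k x k posterior precision
    Sigma_F = np.linalg.inv(M)
    mu_F = np.linalg.solve(M, W.T @ X) / sigma**2 # M^{-1} sigma^{-2} W^T X
    return mu_F, Sigma_F

rng = np.random.default_rng(4)                    # arbitrary seed
d, n, k, sigma = 30, 200, 2, 0.1                  # hypothetical sizes and noise scale
W = rng.normal(size=(d, k))
F_true = rng.normal(size=(k, n))
X = W @ F_true + sigma * rng.normal(size=(d, n))

mu_F, Sigma_F = factor_posterior(X, W, sigma)
max_err = np.max(np.abs(mu_F - F_true))           # small when noise is low
```

Only a $k \times k$ system is solved here, so this step is cheap compared with fitting the model, and `Sigma_F` is shared by all observations.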

Reconstruction Quality

Model quality can be assessed by comparing the reconstruction $\hat{X} = W F$, computed from the recovered factor scores, against the original data. A scatter plot of observed vs. reconstructed values, with $R^{2}$ from a linear regression of one on the other, provides a quick diagnostic.
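A minimal version of this diagnostic, without the plot, might look as follows. It uses simulated data and the true $W$ and $σ$ in place of fitted values, which is an assumption made to keep the sketch self-contained; with a fitted model one would plug in posterior draws or posterior means instead.

```python
import numpy as np

rng = np.random.default_rng(5)                    # arbitrary seed
d, n, k, sigma = 10, 300, 2, 0.3                  # hypothetical sizes and noise scale
W = rng.normal(size=(d, k))
X = W @ rng.normal(size=(k, n)) + sigma * rng.normal(size=(d, n))

# Posterior mean of the factor scores, then the reconstruction X_hat = W mu_F
M = np.eye(k) + (W.T @ W) / sigma**2
mu_F = np.linalg.solve(M, W.T @ X) / sigma**2
X_hat = W @ mu_F

# Regress reconstructed on observed values, entry by entry
x, y = X.ravel(), X_hat.ravel()
slope, intercept = np.polyfit(x, y, 1)
r2 = np.corrcoef(x, y)[0, 1] ** 2                 # R^2 of a simple linear regression
```

For a simple one-predictor regression, $R^{2}$ equals the squared correlation, so no regression library is needed; a value well below 1 on real data suggests that $k$ is too small or that the noise model is a poor fit.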

Scalability Summary

Approach	Fits large $n$?	Minibatch?	Notes
Explicit $F$ (MCMC)	No ($k \times n$ parameters)	No	Simplest but slow
Amortized (MCMC)	Moderate	No	Covariance inversion bottleneck
Amortized (ADVI)	Yes	Yes	FullRankADVI + Minibatch
