Confirmatory Factor Analysis and Structural Equation Models
Summary
CFA and SEM are latent variable models developed for psychometrics. CFA posits that observed survey indicators are caused by unobserved (latent) constructs; SEM adds structural (regression) paths between the latent constructs. Both are implemented as constrained Bayesian models in PyMC.
Motivation: Measuring Latent Constructs
Psychological constructs (intelligence, anxiety, political attitudes) are not directly observable. Instead, researchers design survey items that indicate the latent construct. CFA formalises this:
- Indicators are observed
- Latent factor(s) are unobserved
- Factor loadings connect indicators to factors
Pearl (motivation)
“The notions of relevance and dependence are far more basic to human reasoning than the numerical values attached to probability judgments — the language used for representing probabilistic information should allow assertions about dependency relationships to be expressed qualitatively, directly, and explicitly.”
CFA: The Measurement Model
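For a single factor, the response of respondent $i$ to item $j$ is modelled as:
$$y_{ij} = \mu_j + \lambda_j \eta_i + \epsilon_{ij}, \qquad \epsilon_{ij} \sim \mathcal{N}(0, \psi_j^2)$$
- $\mu_j$: item intercept (mean when $\eta_i = 0$)
- $\lambda_j$: factor loading (sensitivity of item $j$ to the latent factor)
- $\eta_i$: standardised latent factor, $\eta_i \sim \mathcal{N}(0, 1)$
- $\psi_j$: unique (item-specific) standard deviation, so $\psi_j^2$ is the unique variance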
Identifiability constraints are essential:
- Fix one loading to 1.0 (marker variable), or standardise the factor
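- Without constraints, the model is not identified: the scale and sign of each factor are arbitrary, and with several factors so is their rotation (the rotation problem, as in Factor Analysis and PPCA)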
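The need for these constraints can be checked numerically. The sketch below is a minimal illustration, not part of the original case study: it assumes a one-factor model whose implied covariance is `factor_var * lam lam^T + diag(psi**2)`, with `psi` treated as a standard deviation (matching how it is passed as `sigma` in the PyMC sketch in this note). The function name `implied_cov` and all numeric values are hypothetical.

```python
import numpy as np

def implied_cov(lam, factor_var, psi):
    """Model-implied covariance of the items under a one-factor model."""
    lam = np.asarray(lam, dtype=float)
    psi = np.asarray(psi, dtype=float)
    return factor_var * np.outer(lam, lam) + np.diag(psi ** 2)

lam = np.array([0.9, 0.7, 0.5])  # illustrative loadings
psi = np.array([0.4, 0.6, 0.8])  # illustrative unique standard deviations

# Baseline: standardised factor (variance 1).
sigma_a = implied_cov(lam, 1.0, psi)

# Rescale the loadings by c and shrink the factor variance by c**2.
c = 2.0
sigma_b = implied_cov(c * lam, 1.0 / c**2, psi)

# Flip the sign of every loading.
sigma_c = implied_cov(-lam, 1.0, psi)

same_under_rescaling = bool(np.allclose(sigma_a, sigma_b))
same_under_sign_flip = bool(np.allclose(sigma_a, sigma_c))

# Marker-variable constraint: first loading fixed to 1,
# with the factor variance absorbing the scale instead.
marker_lam = lam / lam[0]
marker_var = lam[0] ** 2
sigma_marker = implied_cov(marker_lam, marker_var, psi)
same_under_marker = bool(np.allclose(sigma_a, sigma_marker))
```

All three parameterisations imply the same covariance matrix, so the data alone cannot distinguish them. Fixing one loading or the factor variance picks a single representative.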
PyMC implementation sketch
```python
import pymc as pm
import pytensor.tensor as pt

# Y: observed (n, p) array, n respondents by p items
n, p = Y.shape

with pm.Model() as cfa_model:
    eta = pm.Normal("eta", 0, 1, shape=n)  # latent factor scores
    # Loadings (first one fixed to 1 for identification)
    lam = pm.Normal("lambda", 0, 1, shape=p - 1)
    loadings = pt.concatenate([[1.0], lam])
    mu_items = pm.Normal("mu", 0, 5, shape=p)  # item intercepts
    psi = pm.HalfNormal("psi", 1, shape=p)  # unique (item-specific) standard deviations
    y_hat = mu_items + loadings * eta[:, None]  # shape (n, p)
    pm.Normal("y", mu=y_hat, sigma=psi, observed=Y)
```
SEM: Adding Structural Paths
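SEM extends CFA by allowing regressions among the latent variables. In the simplest two-factor case:
$$\eta_{2i} = \beta\,\eta_{1i} + \zeta_i, \qquad \zeta_i \sim \mathcal{N}(0, \sigma_\zeta^2)$$
where $\beta$ is the structural path from the first latent factor to the second and $\zeta_i$ is the structural disturbance.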
This enables testing hypotheses about causal relationships between latent constructs (e.g., “latent anxiety predicts latent avoidance”), not just their measurement.
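To make the structural path concrete, the following sketch simulates a two-factor SEM in NumPy. It is an illustration under stated assumptions, not the case study's model: the path coefficient `beta`, the loadings `lam1` and `lam2`, and the noise scales are all hypothetical. It checks one consequence of the model: the covariance between an indicator of the first factor and an indicator of the second is `lam1[j] * beta * lam2[k]` when the first factor has unit variance.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000
beta = 0.6  # hypothetical structural path: eta1 -> eta2

# Structural part: the second latent factor is regressed on the first.
eta1 = rng.normal(0.0, 1.0, size=n)
zeta = rng.normal(0.0, 0.8, size=n)  # structural disturbance
eta2 = beta * eta1 + zeta

# Measurement part: three indicators per factor, first loading fixed to 1.
lam1 = np.array([1.0, 0.8, 0.7])
lam2 = np.array([1.0, 0.9, 0.6])
Y1 = lam1 * eta1[:, None] + rng.normal(0.0, 0.5, size=(n, 3))
Y2 = lam2 * eta2[:, None] + rng.normal(0.0, 0.5, size=(n, 3))

# Implied cross-covariance between the two indicator blocks (Var(eta1) = 1).
implied_cross = beta * np.outer(lam1, lam2)

# Empirical cross-covariance: top-right 3x3 block of the 6x6 covariance matrix.
empirical_cross = np.cov(np.hstack([Y1, Y2]), rowvar=False)[:3, 3:]
max_gap = float(np.abs(empirical_cross - implied_cross).max())
```

In a PyMC version the same structure would typically appear as a prior on `eta2` whose mean is `beta * eta1`, with `beta` given its own prior. The indicators of one construct are associated with those of the other only through the latent path.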
CFA vs. EFA
| | Exploratory FA (EFA) | Confirmatory FA (CFA) |
|---|---|---|
| Loadings | Unconstrained | Theory-specified (many fixed to 0) |
| Goal | Discover factor structure | Test a hypothesised structure |
| Identifiability | Rotation ambiguity | Resolved by constraints |
| See also | Factor Analysis and PPCA | This note |
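The contrast in the table can be illustrated with a small NumPy sketch. It assumes two uncorrelated, unit-variance factors and a hypothetical loading pattern in which each of six items loads on exactly one factor; the loading values are made up. Rotating the loadings leaves the implied common covariance unchanged, which is the EFA ambiguity, but breaks the theory-specified zeros, which is why the CFA constraints resolve it.

```python
import numpy as np

# Theory-specified pattern: items 1-3 load on factor 1, items 4-6 on factor 2.
pattern = np.array(
    [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=bool
)
values = np.array(
    [[0.9, 0.0], [0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.8], [0.0, 0.9]]
)
Lambda = np.where(pattern, values, 0.0)  # CFA loadings with fixed zeros

# An arbitrary orthogonal rotation of the factor space (30 degrees).
theta = np.pi / 6
R = np.array(
    [[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]
)
Lambda_rot = Lambda @ R

# Common-factor part of the covariance is identical before and after rotation.
same_common_cov = bool(np.allclose(Lambda @ Lambda.T, Lambda_rot @ Lambda_rot.T))

# The rotated loadings no longer respect the fixed zeros.
keeps_zero_pattern = bool(np.allclose(Lambda_rot[~pattern], 0.0))
```

Here `same_common_cov` is true while `keeps_zero_pattern` is false: of the rotated solutions that fit the covariance equally well, this one is ruled out by the hypothesised structure.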
Model Fit in Psychometrics
Bayesian CFA/SEM uses posterior predictive checks (PPCs) and model comparison (WAIC/LOO) rather than classical fit indices (CFI, RMSEA). However, the spirit is the same: does the restricted factor model reproduce the observed correlation matrix?
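A correlation-based posterior predictive check might look like the sketch below. It is a minimal NumPy illustration with hypothetical names (`correlation_check`, `simulate`): the replicated datasets `Y_rep` are simulated here from fixed, made-up parameters as a stand-in, whereas in practice they would come from the fitted model, for example via `pm.sample_posterior_predictive`.

```python
import numpy as np

def correlation_check(Y, Y_rep):
    """Compare the observed correlation matrix with replicated ones.

    Y: (n, p) observed data; Y_rep: (draws, n, p) replicated datasets.
    Returns the observed matrix and the stack of replicated matrices.
    """
    obs = np.corrcoef(Y, rowvar=False)
    rep = np.stack([np.corrcoef(y, rowvar=False) for y in Y_rep])
    return obs, rep

# Stand-in data: one-factor model with illustrative parameters.
rng = np.random.default_rng(0)
n, p, draws = 500, 4, 200
lam = np.array([1.0, 0.8, 0.6, 0.7])
psi = np.array([0.5, 0.6, 0.7, 0.6])  # unique standard deviations

def simulate(rng):
    eta = rng.normal(size=n)
    return lam * eta[:, None] + rng.normal(size=(n, p)) * psi

Y = simulate(rng)
Y_rep = np.stack([simulate(rng) for _ in range(draws)])

obs, rep = correlation_check(Y, Y_rep)

# Residual correlations: observed minus the average replicated matrix.
resid = obs - rep.mean(axis=0)
max_abs_resid = float(np.abs(resid).max())

# Posterior predictive p-value for one summary: mean off-diagonal correlation.
off = ~np.eye(p, dtype=bool)
stat_obs = obs[off].mean()
stat_rep = rep[:, off].mean(axis=1)
ppp = float((stat_rep >= stat_obs).mean())
```

Large residual correlations, or a p-value close to 0 or 1, would suggest that the restricted factor structure fails to reproduce some of the observed associations.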
Connections
- Factor Analysis and PPCA — exploratory factor analysis and PPCA; identifiability via constrained loadings
- Hierarchical Models — latent factors as hierarchical random effects
- Spurious Association and Confounds — causal thinking about latent ↔ indicator relationships
- Nonparametric Models Overview — for factor models with non-Gaussian priors
Source
- Confirmatory Factor Analysis and Structural Equation Models in Psychometrics — PyMC case study; draws on Levy & Mislevy, Bayesian Psychometric Modeling