Confirmatory Factor Analysis and Structural Equation Models

Summary

CFA and SEM are latent variable models developed for psychometrics. CFA posits that observed survey indicators are caused by unobserved (latent) constructs; SEM adds structural (regression) paths between the latent constructs. Both can be implemented as constrained Bayesian models in PyMC.

Motivation: Measuring Latent Constructs

Psychological constructs (intelligence, anxiety, political attitudes) are not directly observable. Instead, researchers design survey items that indicate the latent construct. CFA formalises this:

  • Indicators are observed
  • Latent factor(s) are unobserved
  • Factor loadings connect indicators to factors

Pearl (motivation)

“The notions of relevance and dependence are far more basic to human reasoning than the numerical values attached to probability judgments — the language used for representing probabilistic information should allow assertions about dependency relationships to be expressed qualitatively, directly, and explicitly.”

CFA: The Measurement Model

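For a single factor, the response of person $i$ to item $j$ is

$$y_{ij} = \mu_j + \lambda_j \eta_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \psi_j^2)$$

  • $\mu_j$: item intercept (mean when $\eta_i = 0$)
  • $\lambda_j$: factor loading (sensitivity of item $j$ to the latent factor)
  • $\eta_i$: standardised latent factor, $\eta_i \sim \mathcal{N}(0, 1)$ (unless a marker loading sets the scale instead)
  • $\psi_j^2$: unique (item-specific) variance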
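The measurement equation can be made concrete by simulating from it. The following is a minimal NumPy sketch, not part of the original note: the sizes and parameter values are made up for illustration, and `psi` is treated as a unique standard deviation (so the unique variance is `psi**2`). It also compares the sample covariance of the simulated items with the covariance the one-factor model implies.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 5000, 4                           # respondents, items (illustrative sizes)
mu = np.array([3.0, 2.5, 4.0, 3.5])      # item intercepts (made-up values)
lam = np.array([0.9, 0.8, 1.2, 0.6])     # factor loadings (made-up values)
psi = np.array([0.5, 0.7, 0.4, 0.9])     # unique standard deviations (made-up values)

eta = rng.normal(0.0, 1.0, size=n)       # standardised latent factor scores
eps = rng.normal(0.0, psi, size=(n, p))  # item-specific noise, one column per item
Y = mu + eta[:, None] * lam + eps        # (n, p) matrix of observed indicators

# Covariance implied by the model: lam lam^T + diag(psi^2)
implied_cov = np.outer(lam, lam) + np.diag(psi**2)
sample_cov = np.cov(Y, rowvar=False)
max_abs_diff = np.abs(sample_cov - implied_cov).max()
```

With a few thousand rows, `max_abs_diff` should be small, which is the sense in which a factor model "explains" the covariance among items.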

Identifiability constraints are essential:

  • Fix one loading to 1.0 (marker variable), or standardise the factor
  • Without constraints, the model is not identified: the factor's scale and sign are arbitrary and, with several factors, so is its rotation (as in Factor Analysis and PPCA)
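To see why a constraint is needed, it helps to look at the covariance a one-factor model implies. In the sketch below (the helper `implied_cov` and all numbers are illustrative assumptions, not from the original note), multiplying every loading by a constant `c` while dividing the factor variance by `c**2` leaves the implied covariance unchanged, so the data alone cannot tell the two parameter sets apart. Fixing the first loading to 1 picks one member of that family.

```python
import numpy as np

def implied_cov(lam, phi, psi):
    """Covariance of the indicators under a one-factor model.

    lam: loadings, phi: factor variance, psi: unique standard deviations.
    """
    lam = np.asarray(lam, dtype=float)
    psi = np.asarray(psi, dtype=float)
    return phi * np.outer(lam, lam) + np.diag(psi**2)

lam = np.array([0.9, 0.7, 1.1])   # made-up loadings
psi = np.array([0.5, 0.6, 0.4])   # made-up unique standard deviations
phi = 1.0                         # factor variance (standardised factor)

c = 2.5                           # any non-zero rescaling (negative c flips the sign)
same_cov = np.allclose(
    implied_cov(lam, phi, psi),
    implied_cov(c * lam, phi / c**2, psi),
)

# Marker-variable constraint: rescale so the first loading equals 1
lam_marker = lam / lam[0]
phi_marker = phi * lam[0] ** 2
marker_equiv = np.allclose(
    implied_cov(lam, phi, psi),
    implied_cov(lam_marker, phi_marker, psi),
)
```

Both `same_cov` and `marker_equiv` come out true: the standardised-factor and marker-variable parameterisations describe the same covariance, which is why either constraint (but only one of them) is enough to fix the scale.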

PyMC implementation sketch

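import pymc as pm
import pytensor.tensor as pt

n, p = Y.shape  # Y: (n respondents, p items) array of observed indicators

with pm.Model() as cfa_model:
    # Marker-variable identification: the first loading is fixed to 1,
    # so the factor standard deviation is left free rather than fixed at 1
    sigma_eta = pm.HalfNormal("sigma_eta", 1)
    eta = pm.Normal("eta", 0, sigma_eta, shape=n)   # latent factor scores

    lam = pm.Normal("lambda", 0, 1, shape=p - 1)    # free loadings
    loadings = pt.concatenate([pt.ones(1), lam])    # prepend the fixed marker loading

    mu_items = pm.Normal("mu", 0, 5, shape=p)       # item intercepts
    psi = pm.HalfNormal("psi", 1, shape=p)          # unique standard deviations

    y_hat = mu_items + loadings * eta[:, None]      # (n, p) expected item responses
    pm.Normal("y", mu=y_hat, sigma=psi, observed=Y)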

SEM: Adding Structural Paths

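SEM extends CFA by allowing regression among latent variables. In the usual matrix form, with $\eta$ now stacking all latent variables:

$$\eta = B\eta + \zeta$$

where $B$ holds the path coefficients between latent variables and $\zeta$ the structural disturbances.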

This enables testing hypotheses about causal relationships between latent constructs (e.g., “latent anxiety predicts latent avoidance”), not just their measurement.
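As a small worked example, assume the structural part is written in the common form η = Bη + ζ, which solves to η = (I − B)⁻¹ζ. The sketch below uses a hypothetical two-construct model (anxiety predicting avoidance) with made-up values for the path coefficient and disturbance variances, computes the implied covariance between the latent variables, and checks it against a direct simulation.

```python
import numpy as np

beta = 0.6                          # hypothetical path: anxiety -> avoidance
B = np.array([[0.0, 0.0],
              [beta, 0.0]])         # row = outcome, column = predictor
zeta_var = np.diag([1.0, 0.5])      # disturbance variances (made-up values)

# Reduced form: eta = (I - B)^{-1} zeta
A = np.linalg.inv(np.eye(2) - B)
latent_cov = A @ zeta_var @ A.T     # implied covariance of (anxiety, avoidance)

# Simulation check of the same structure
rng = np.random.default_rng(1)
n = 20000
anxiety = rng.normal(0.0, 1.0, size=n)
avoidance = beta * anxiety + rng.normal(0.0, np.sqrt(0.5), size=n)
sim_cov = np.cov(np.stack([anxiety, avoidance]))
```

Here the implied covariance between the two constructs equals `beta` times the variance of anxiety, so a hypothesis about the path translates into a testable restriction on the covariance of the indicators once the measurement model is attached.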

CFA vs. EFA

                   Exploratory FA (EFA)        Confirmatory FA (CFA)
Loadings           Unconstrained               Theory-specified (many fixed to 0)
Goal               Discover factor structure   Test a hypothesised structure
Identifiability    Rotation ambiguity          Resolved by constraints
See also           Factor Analysis and PPCA    This note

Model Fit in Psychometrics

Bayesian CFA/SEM typically relies on posterior predictive checks (PPCs) and model comparison (WAIC/LOO) rather than classical fit indices (CFI, RMSEA). The spirit is the same, however: does the restricted factor model reproduce the observed correlation matrix?
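One way such a check might look is sketched below. The helper names `offdiag_corr` and `corr_rmse_per_draw` are hypothetical, and the replicated datasets are simulated stand-ins: in practice `Y_rep` would be posterior predictive draws of the observed variable from the fitted model (for example from PyMC's `pm.sample_posterior_predictive`). The discrepancy used here, a root-mean-square gap between observed and replicated correlations, is one reasonable choice among several.

```python
import numpy as np

def offdiag_corr(Y):
    """Lower-triangle (off-diagonal) correlations of an (n, p) data matrix."""
    R = np.corrcoef(Y, rowvar=False)
    return R[np.tril_indices_from(R, k=-1)]

def corr_rmse_per_draw(Y_obs, Y_rep):
    """RMS gap between observed and replicated correlations, one value per draw.

    Y_rep has shape (draws, n, p).
    """
    r_obs = offdiag_corr(Y_obs)
    return np.array(
        [np.sqrt(np.mean((offdiag_corr(Yr) - r_obs) ** 2)) for Yr in Y_rep]
    )

# Stand-in data (assumption): both "observed" and replicated sets are drawn
# from the same one-factor model with made-up parameters
rng = np.random.default_rng(2)
n, p, draws = 300, 4, 50
lam = np.array([0.9, 0.8, 1.2, 0.6])
psi = np.array([0.5, 0.7, 0.4, 0.9])

def simulate(n_rows):
    eta = rng.normal(size=n_rows)
    return eta[:, None] * lam + rng.normal(0.0, psi, size=(n_rows, p))

Y_obs = simulate(n)
Y_rep = np.stack([simulate(n) for _ in range(draws)])
rmse = corr_rmse_per_draw(Y_obs, Y_rep)
```

When the model is adequate, the values in `rmse` stay at the level expected from sampling noise alone; systematically large gaps for particular item pairs would point to a misspecified loading pattern.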
