Evans (2024) Ch. 19 demonstrates SMM estimation in Python using scipy.optimize.minimize with the L-BFGS-B method. The worked example fits a truncated normal to Econ 381 test scores (N=161) with S=100 simulations. A critical practical lesson: L-BFGS-B defaults to eps=1e-8 for its internal finite-difference Jacobian, which is too small when moment values are in the hundreds — the optimizer sees zero gradient and returns the initial guess after 3 function evaluations. Fix: options={'eps': 1.0}.
Overview
SMM estimation requires five components in code:
Fixed random draws — simulate the N×S uniform matrix once before optimization; never re-draw inside the criterion function
Simulation function — convert uniform draws to model draws via inverse CDF (common random numbers technique)
Moment function — compute moments from (N,) data or (N,S) simulated matrix
Error vector function — percent deviations of simulated moments from data moments
Criterion function — eTWe, the scalar to minimize
The weighting matrix W_hat is passed as an argument to the criterion function so the same code handles both identity-W and two-step-W estimation.
General Workflow
import numpy as npimport numpy.linalg as linimport scipy.stats as stsimport scipy.optimize as opt# Step 1: Fix random draws before optimizationnp.random.seed(25)N, S = len(data), 100unif_vals = sts.uniform.rvs(0, 1, size=(N, S)) # shape (N, S), never re-drawn# Step 2: Initial estimate with identity WW_hat_1 = np.eye(R)params_init = np.array([...])results_1 = opt.minimize( criterion, params_init, args=(data, unif_vals, ..., W_hat_1), method='L-BFGS-B', bounds=[...], options={'eps': 1.0} # CRITICAL when moments are in the 100s)theta_hat_1 = results_1.x# Step 3: Build two-step W from Step 1 estimatesErr_mat = get_Err_mat(data, unif_vals, *theta_hat_1, ...) # shape (R, S)VCV = (1 / S) * (Err_mat @ Err_mat.T) # shape (R, R)W_hat_2 = lin.inv(VCV) # or lin.pinv() if ill-conditioned# Step 4: Re-estimate with optimal Wresults_2 = opt.minimize( criterion, theta_hat_1, args=(data, unif_vals, ..., W_hat_2), method='L-BFGS-B', bounds=[...], options={'eps': 1.0})theta_hat_2 = results_2.x# Step 5: Compute Sigma_hat via numerical Jacobiand = Jac_err(data, unif_vals, *theta_hat_2, ...) # shape (R, K)Sigma_hat = (1 / S) * lin.inv(d.T @ W_hat_2 @ d) # shape (K, K)std_errs = np.sqrt(np.diag(Sigma_hat))
Common Random Numbers
The unif_vals matrix is drawn once before the optimizer is called. Inside criterion(), these same draws are used for every evaluation of θ. This ensures the criterion function is smooth in θ — without fixed draws, the stochastic component changes at each optimizer step and the criterion surface becomes non-differentiable. See Practical Issues in Simulation Estimation for the common random numbers principle.
Worked Example: Truncated Normal
Problem: Fit (μ,σ) of a truncated normal on [0,450] to 161 macroeconomics test scores using SMM.
Simulation via Inverse CDF
def trunc_norm_draws(unif_vals, mu, sigma, cut_lb, cut_ub): """ Draw (N x S) matrix from truncated normal N(mu, sigma) on [cut_lb, cut_ub]. Uses inverse CDF (percent-point function) applied to rescaled uniform draws. unif_vals : (N, S) matrix of U(0,1) draws — fixed before optimization Returns : (N, S) matrix of truncated normal draws """ cut_ub_cdf = sts.norm.cdf(cut_ub, loc=mu, scale=sigma) cut_lb_cdf = sts.norm.cdf(cut_lb, loc=mu, scale=sigma) unif2_vals = unif_vals * (cut_ub_cdf - cut_lb_cdf) + cut_lb_cdf return sts.norm.ppf(unif2_vals, loc=mu, scale=sigma)
Moment Functions
Two moments (mean, variance):
def data_moments2(xvals): """ xvals : (N,) or (N, S) — 1D for data, 2D for S simulations Returns: mean_data, var_data (scalars if 1D, (S,) vectors if 2D) """ if xvals.ndim == 1: return xvals.mean(), xvals.var() elif xvals.ndim == 2: return xvals.mean(axis=0), xvals.var(axis=0)
Four moments (bin percentages — better identification):
Symptom: The optimizer returns the initial guess after exactly 3 function evaluations, with jac: [0., 0.] and nit: 0.
Cause: L-BFGS-B estimates its internal gradient via finite differences with step size eps=1e-8 (default). It evaluates:
Guess 1: (μ0,σ0)
Guess 2: (μ0+10−8,σ0)
Guess 3: (μ0,σ0+10−8)
When moments are in the hundreds (mean ≈ 342, variance ≈ 7828), a perturbation of 10−8 in μ or σ produces a negligible change in the simulated moments — the criterion function looks flat at machine precision. The optimizer concludes the gradient is zero and halts.
Fix: Set options={'eps': 1.0} (or another scale-appropriate value):
The step size h = 1e-4 * |theta| (relative step) is much larger than the default eps=1e-8 used by L-BFGS-B, which is why this manual Jacobian works even when the optimizer’s internal Jacobian estimate failed. See Practical Issues in Simulation Estimation for step-size guidelines.
Results Comparison
Setting
μ^
σ^
SE(μ^)
SE(σ^)
2 moments, W=I
612.3
197.3
776.2
211.9
2 moments, 2-step W
619.4
199.1
49.0
15.2
4 moments, W=I
362.6
46.6
5.8
4.3
4 moments, 2-step W
362.6
46.6
1.9
1.3
Key observations:
The 2-moment setup produces estimates near the MLE solution (μ^≈612, σ^≈197) — but with enormous standard errors because mean and variance are highly correlated in the truncated normal
The 4 bin-percentage moments produce estimates closer to the data histogram shape (μ^≈362, σ^≈47) with much tighter standard errors
The two-step W substantially reduces standard errors in both setups, especially the 2-moment case (776 → 49 for μ^), without changing point estimates much
Moment selection matters more than weighting matrix choice for point estimate quality
Two-Step W: Computing the Error Matrix
def get_Err_mat(data, unif_vals, mu, sigma, cut_lb, cut_ub, simple=False): """ Returns (R, S) matrix where column s is the moment error vector for simulation s. Used to build the variance-covariance matrix of moments. """ R, S = 4, unif_vals.shape[1] Err_mat = np.zeros((R, S)) bpct_data = data_moments4(data) # (R,) data moments sim_vals = trunc_norm_draws(unif_vals, mu, sigma, cut_lb, cut_ub) bpct_sim = data_moments4(sim_vals) # (R, S) — each column is one sim for r in range(R): if simple: Err_mat[r, :] = bpct_sim[r] - bpct_data[r] else: Err_mat[r, :] = (bpct_sim[r] - bpct_data[r]) / bpct_data[r] return Err_mat# Build optimal WErr_mat = get_Err_mat(data, unif_vals, *theta_hat_1, ...)VCV = (1 / S) * (Err_mat @ Err_mat.T) # (R, R)W_hat_2 = lin.inv(VCV) # use lin.pinv() if VCV is near-singular
Pseudo-Inverse for Ill-Conditioned VCV
With 4 moments and short simulations, VCV can be poorly conditioned. The tutorial uses lin.pinv(VCV) (pseudo-inverse via SVD) instead of lin.inv(VCV) when the matrix is near-singular. This avoids numerical errors in the weighting matrix.
Indirect Inference
Indirect inference is SMM where the moments are parameters of an auxiliary model — a reduced-form statistical model estimated on both the data and simulated data.
Setup:
Let H(xt,zt∣ϕ)=0 be the auxiliary model with parameter vector ϕ (length R≥K)
Data moments: ϕ^(xt,zt) — auxiliary model estimated on real data
Model moments: ϕ^(x~t,z~t∣θ)=S1∑s=1Sϕ^s(x~s,t,z~s,t∣θ) — averaged auxiliary parameters from simulations
Estimator:
θ^II=θ:θmin∥ϕ^(x~t∣θ)−ϕ^(xt)∥
Common auxiliary models: OLS regression coefficients, VAR parameters, probit/logit coefficients, IV regression estimates.
Advantage over standard SMM: When the structural model’s moments are intractable, auxiliary model parameters provide tractable, information-rich moments. The auxiliary model need not be the “true” model — it just needs to summarize the data in a way that responds to changes in θ.
See Indirect Inference for the theoretical framework and SMM Estimator for Copulas for an application where copula dependence measures (Spearman’s ρ, tail dependence) serve as the auxiliary moments.