Numerical Integration and Optimization in PyBLP
Summary
Estimating BLP requires (1) approximating the share integrals over consumer heterogeneity and (2) optimizing a non-convex GMM objective. PyBLP’s best practices: use Gaussian quadrature product rules for few random coefficients (sparse grids or scrambled Halton draws in high dimensions) rather than crude pseudo-Monte Carlo; use the nested fixed-point (NFXP) algorithm with analytic gradients, gradient-based optimizers (Knitro Interior/Direct or SciPy L-BFGS-B), box constraints, and tight tolerances; and guard numerical stability with the log-sum-exp trick. These choices largely eliminate the multiple-local-optima difficulties reported in earlier literature.
Overview
The share integral has no closed form and is approximated at a finite set of nodes with weights (an “integration rule”):
The logit kernel is bounded on and infinitely differentiable, so it is well-behaved for quadrature.
Main Content
Integration rules: pMC, qMC, and quadrature ^integration-rules
- Pseudo-Monte Carlo (pMC). Equally weighted random draws (). Simulation error is and (by CLT) . Advantage: no curse of dimensionality. Disadvantage: error declines slowly.
- Quasi-Monte Carlo (qMC). Deterministic low-discrepancy sequences (e.g. Halton) that cover the hypercube more evenly; error , beating pMC as grows. Best practice: scramble the sequence (Owen 2017) and discard the first ~1,000 points to avoid cross-dimension correlation.
- Gaussian quadrature. Approximates the integrand by a polynomial integrated exactly; a weighted sum over a chosen . Gauss-Hermite rules suit normal mixing densities. Most accurate for few dimensions, but product rules need points in dimension — the curse of dimensionality. Sparse grids (Heiss & Winschel 2008) and monomial cubature (Judd & Skrainka 2011) prune nodes (sometimes with negative weights, which can be problematic for counterfactuals).
- Variance reduction. Modified Latin Hypercube Sampling (MLHS), antithetic sampling (), and importance sampling (oversample where is large, reweight by ).
Recommendation: product rules to high polynomial accuracy for few random coefficients; sparse grids or scrambled Halton for . After obtaining , verify the chosen rule against finer alternatives by comparing .
The Nested Fixed Point (NFXP) algorithm ^nfxp
For each guess of (the outer loop): (a) Inner loop — for each market solve the share system for via the contraction (SQUAREM/LM). (b) Build the intra-firm derivative matrix and recover markups . (c) Run linear IV-GMM to concentrate out the linear from and . (d) Form residuals , stack moments , and evaluate . Only the nonlinear parameters are searched over (Hessian is ); linear parameters and fixed effects are “essentially free.” Conlon-Gortmaker place on the LHS so the endogenous markup depends only on , enabling simultaneous supply/demand with fixed effects. (MPEC of Dubé et al. 2012 is an alternative; this paper focuses on the more popular NFXP.)
Optimization of the GMM objective ^optimization
The objective is non-convex (the Hessian need not be PSD), so no routine guarantees a global minimum. Best practices:
- Analytic gradients (PyBLP computes them for any model, including supply+demand and fixed effects) — major speedup and better convergence than finite differences or derivative-free methods.
- Gradient-based optimizers: try Knitro Interior/Direct first if available, then a BFGS-based routine, ideally SciPy L-BFGS-B with bounds. Avoid Nelder-Mead.
- Box constraints (e.g. nonnegative, bounded variances; ; with a supply side) — prevent unreasonable values and overflow.
- Tight termination tolerances (loose defaults cause early termination, worse when is large) and verifying first-order (gradient ) and second-order (PSD Hessian / positive eigenvalues) conditions, reported by default.
- Multiple starting values and optimizers to check agreement.
With these, >99% of simulation runs converge to a valid local minimum, contradicting Knittel & Metaxoglou (2014)‘s many-local-optima finding.
Numerical stability tricks ^numerical-tricks
The exponentiated utilities cause loss of precision / overflow (e.g. ). Fixes:
- Protected log-sum-exp: with — implemented by default, near-zero cost, recommended.
- Work market-by-market; box-constrain random coefficients; robust error handling (replace near-singular or with pseudo-inverses and warn).
- Tricks like using in place of , or “hot starts,” give little benefit once SQUAREM is used.
Examples
Conlon-Gortmaker Monte Carlo config: Gauss-Hermite product rule exact to degree 17 (9 nodes in 1-D, 81 in 2-D); SQUAREM inner loop with 1E-14 tolerance; L-BFGS-B with analytic gradients, projected-gradient tolerance 1E-5, three starting values drawn around truth, box constraints , , . In the Nevo (2000b) and BLP (1995) replications, switching from a few pMC draws to quadrature plus tight tolerances eliminates the dispersion in elasticities across starting values that Knittel & Metaxoglou (2014) reported.
Connections
- The BLP Contraction Mapping — the inner loop nested inside the optimization.
- Random Coefficients Logit Model — defines the integral being approximated.
- GMM Estimation and Instruments for Price Endogeneity — the objective being optimized.
- Method of Simulated Moments — numerical integration of moments is the simulation step shared with MSM.
- Supply Side and Markups — counterfactual pricing equilibria reuse these numerical methods.