Numerical Integration and Optimization in PyBLP

Summary

Estimating BLP requires (1) approximating the share integrals over consumer heterogeneity and (2) optimizing a non-convex GMM objective. PyBLP’s best practices: use Gaussian quadrature product rules for few random coefficients (sparse grids or scrambled Halton draws in high dimensions) rather than crude pseudo-Monte Carlo; use the nested fixed-point (NFXP) algorithm with analytic gradients, gradient-based optimizers (Knitro Interior/Direct or SciPy L-BFGS-B), box constraints, and tight tolerances; and guard numerical stability with the log-sum-exp trick. These choices largely eliminate the multiple-local-optima difficulties reported in earlier literature.

Overview

The share integral $s_{jt}(δ_{t}, θ_{2}) = \int \frac{\exp(δ_{jt} + μ_{ijt})}{\sum_{k} \exp(δ_{kt} + μ_{ikt})} f(μ_{it} \mid θ_{2}) \, dμ_{it}$ has no closed form and is approximated at a finite set of $I_{t}$ nodes $ν_{it}$ with weights $w_{it}$ (an “integration rule”):

$$s_{jt}(δ_{t}, θ_{2}) \approx \sum_{i \in I_{t}} w_{it} \cdot s_{ijt}(δ_{t}, μ_{it}(ν_{it}, θ_{2})).$$

The logit kernel is bounded on $(0, 1)$ and infinitely differentiable, so it is well-behaved for quadrature.
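As a rough illustration of the weighted-sum approximation, the sketch below evaluates one market's shares from a set of nodes and weights. The function name `approximate_shares`, the diagonal scaling by `sigma`, and the normalization of the outside good's utility to zero are assumptions made for the example, not PyBLP's internals. It also uses plain `exp` without overflow protection to keep the code short.

```python
import numpy as np

def approximate_shares(delta, x2, sigma, nodes, weights):
    """Approximate one market's shares with an integration rule.

    delta:   (J,) mean utilities
    x2:      (J, K2) characteristics with random coefficients
    sigma:   (K2,) scale of each random coefficient (hypothetical diagonal case)
    nodes:   (I, K2) integration nodes
    weights: (I,) integration weights that sum to one
    """
    mu = x2 @ (sigma[:, None] * nodes.T)  # (J, I) individual-specific deviations
    exp_u = np.exp(delta[:, None] + mu)
    # Assumption: the outside good has utility zero, hence the "1 +" below.
    s_ij = exp_u / (1.0 + exp_u.sum(axis=0))
    return s_ij @ weights  # weighted sum over nodes

# Illustrative inputs: 200 equally weighted pseudo-random nodes in two dimensions.
rng = np.random.default_rng(0)
nodes = rng.standard_normal((200, 2))
weights = np.full(200, 1.0 / 200)
delta = np.array([0.5, -0.3, 0.1])
x2 = np.array([[1.0, 2.0], [1.0, 0.5], [1.0, 1.5]])
shares = approximate_shares(delta, x2, np.array([0.5, 0.2]), nodes, weights)
print(shares, 1.0 - shares.sum())  # inside shares and the implied outside share
```

Any rule discussed in this note (pMC, qMC, quadrature) fits this interface. Only the `nodes` and `weights` arrays change.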

Main Content

Integration rules: pMC, qMC, and quadrature ^integration-rules

Pseudo-Monte Carlo (pMC). Equally weighted random draws ( $w_{it} = I_{t}^{-1}$ ). Simulation error is $O(I_{t}^{-1/2})$ and, by the CLT, $ϵ_{I_{t}}^{pMC} \xrightarrow{d} N(0, V(s_{jt}) / I_{t})$ . Advantage: no curse of dimensionality. Disadvantage: error declines slowly.

Quasi-Monte Carlo (qMC). Deterministic low-discrepancy sequences (e.g. Halton) that cover the hypercube $[0, 1]^{K_{2}}$ more evenly; error $O(I_{t}^{-1} \cdot (\log I_{t})^{K_{2}})$ , beating pMC as $I_{t}$ grows. Best practice: scramble the sequence (Owen 2017) and discard the first ~1,000 points to avoid cross-dimension correlation.
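One way to generate such nodes is sketched below with SciPy's `scipy.stats.qmc.Halton`. This is an assumption-laden stand-in: SciPy's scrambling scheme need not match the one in Owen (2017), the helper name `halton_normal_nodes` is hypothetical, and mapping the uniforms to normals with the inverse CDF presumes a standard normal mixing density.

```python
import numpy as np
from scipy.stats import norm, qmc

def halton_normal_nodes(n_nodes, n_dims, n_discard=1000):
    """Scrambled Halton points mapped to standard normal nodes with equal weights."""
    sampler = qmc.Halton(d=n_dims, scramble=True)
    sampler.fast_forward(n_discard)          # drop the first points of the sequence
    uniform = sampler.random(n_nodes)        # points in the unit hypercube
    # Guard against an exact 0, which the inverse CDF would map to -inf.
    uniform = np.clip(uniform, 1e-12, 1.0 - 1e-12)
    nodes = norm.ppf(uniform)
    weights = np.full(n_nodes, 1.0 / n_nodes)
    return nodes, weights

nodes, weights = halton_normal_nodes(500, 3)
print(nodes.shape, nodes.mean(axis=0), nodes.std(axis=0))
```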

Gaussian quadrature. Chooses nodes and weights $(ν_{it}, w_{it})$ so that the weighted sum integrates polynomials up to a given degree exactly, in effect approximating the integrand by a polynomial. Gauss-Hermite rules suit normal mixing densities. Most accurate for few dimensions, but a product rule with $n$ nodes per dimension needs $n^{d}$ points in dimension $d$ : the curse of dimensionality. Sparse grids (Heiss & Winschel 2008) and monomial cubature (Judd & Skrainka 2011) prune nodes (sometimes with negative weights, which can be problematic for counterfactuals).
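A product rule can be built by hand from NumPy's one-dimensional Gauss-Hermite nodes, as in the sketch below. It uses `numpy.polynomial.hermite_e.hermegauss`, whose weight function is $e^{-x^{2}/2}$, so dividing the weights by $\sqrt{2π}$ gives a rule for the standard normal density. The helper name `gauss_hermite_product_rule` is hypothetical.

```python
import itertools
import numpy as np

def gauss_hermite_product_rule(n_per_dim, n_dims):
    """Tensor-product Gauss-Hermite rule for independent standard normals."""
    x, w = np.polynomial.hermite_e.hermegauss(n_per_dim)
    w = w / np.sqrt(2.0 * np.pi)  # normalize so the 1-D weights sum to one
    nodes = np.array(list(itertools.product(x, repeat=n_dims)))
    weights = np.array([np.prod(c) for c in itertools.product(w, repeat=n_dims)])
    return nodes, weights

# 9 nodes per dimension: exact for polynomials up to degree 2 * 9 - 1 = 17.
nodes, weights = gauss_hermite_product_rule(9, 2)
print(nodes.shape)                       # node count grows as 9 ** n_dims
print(weights @ nodes[:, 0] ** 4)        # fourth moment of a standard normal
```

The exponential growth in the number of rows of `nodes` is the curse of dimensionality in concrete form: the same rule in five dimensions would need 59,049 nodes.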

Variance reduction. Modified Latin Hypercube Sampling (MLHS), antithetic sampling ( $ϕ (ν) = ϕ (- ν)$ ), and importance sampling (oversample where $s_{ij t}$ is large, reweight by $w (ν) = ϕ (ν) / q (ν)$ ).
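Antithetic sampling is the simplest of these to sketch: each draw $ν$ is paired with its mirror image $-ν$, which is valid because the normal density is symmetric. The helper name and the seed argument below are illustrative assumptions.

```python
import numpy as np

def antithetic_normal_nodes(n_pairs, n_dims, seed=0):
    """Equally weighted standard normal draws paired with their negatives."""
    rng = np.random.default_rng(seed)
    half = rng.standard_normal((n_pairs, n_dims))
    nodes = np.vstack([half, -half])  # each draw appears with its mirror image
    weights = np.full(2 * n_pairs, 1.0 / (2 * n_pairs))
    return nodes, weights

nodes, weights = antithetic_normal_nodes(100, 2)
print(weights @ nodes)  # odd moments cancel by construction
```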

Recommendation: product rules to high polynomial accuracy for few random coefficients; sparse grids or scrambled Halton for $K_{2} > 5$ . After obtaining $\hat{θ}_{2}$ , verify the chosen rule against finer alternatives by comparing $∥ S_{t} - s_{t} (δ_{t}, \hat{θ}_{2}; I_{t}) ∥_{2}$ .
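The check can be sketched for a single market with one random coefficient: compute shares under the rule used in estimation, recompute them under a much finer rule at the same $\hat{δ}_{t}$ and $\hat{θ}_{2}$ , and inspect the norm of the difference. All numbers below are made up for illustration, and the helper names are hypothetical.

```python
import numpy as np

def gh_rule(n):
    """1-D Gauss-Hermite nodes and weights for the standard normal density."""
    x, w = np.polynomial.hermite_e.hermegauss(n)
    return x, w / np.sqrt(2.0 * np.pi)

def shares(delta, x2, sigma, nodes, weights):
    u = delta[:, None] + sigma * np.outer(x2, nodes)  # (J, I) utilities
    e = np.exp(u)
    return (e / (1.0 + e.sum(axis=0))) @ weights       # outside utility set to 0

delta_hat = np.array([0.5, -0.2, 0.1])   # illustrative mean utilities
x2 = np.array([1.0, 2.0, 0.5])           # illustrative characteristic
sigma_hat = 0.8                          # illustrative estimate

coarse = shares(delta_hat, x2, sigma_hat, *gh_rule(5))    # rule used in estimation
fine = shares(delta_hat, x2, sigma_hat, *gh_rule(25))     # finer alternative
gap = np.linalg.norm(coarse - fine)
print(gap)
```

A gap that is large relative to the shares themselves suggests the estimation rule is too coarse.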

The Nested Fixed Point (NFXP) algorithm ^nfxp

For each guess of $θ_{2}$ (the outer loop):

(a) Inner loop: for each market, solve the share system for $\hat{δ}_{t}(θ_{2})$ via the contraction (SQUAREM/LM).

(b) Build the intra-firm derivative matrix $Δ_{t}$ and recover markups $\hat{η}_{t}(θ_{2}) = Δ_{t}^{-1} s_{t}$ .

(c) Run linear IV-GMM to concentrate out the linear parameters $[θ_{1}, θ_{3}]$ from $\hat{δ}_{jt} + α p_{jt} = [x_{jt}, v_{jt}] β + ξ_{jt}$ and $f_{MC}(p_{jt} - \hat{η}_{jt}) = [x_{jt}, w_{jt}] γ + ω_{jt}$ .

(d) Form residuals $\hat{ξ}_{jt}(θ_{2}), \hat{ω}_{jt}(θ_{2})$ , stack moments $g(θ_{2})$ , and evaluate $q(θ_{2}) = g(θ_{2})' W g(θ_{2})$ .

Only the $K_{2}$ nonlinear parameters are searched over (the Hessian is $K_{2} \times K_{2}$ ); linear parameters and fixed effects are “essentially free.” Conlon-Gortmaker place $α p_{jt}$ on the LHS so the endogenous markup depends only on $θ_{2}$ , enabling simultaneous supply/demand with fixed effects. (MPEC of Dubé et al. 2012 is an alternative; this paper focuses on the more popular NFXP.)
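The inner loop in step (a) can be sketched for one market as below. Note the substitution: this uses the plain fixed-point iteration $δ \leftarrow δ + \log S - \log s(δ)$ rather than SQUAREM acceleration, and the function names, tolerance, and test data are assumptions chosen for illustration.

```python
import numpy as np

def market_shares(delta, mu, weights):
    """Shares given mean utilities (J,), deviations mu (J, I), and weights (I,)."""
    e = np.exp(delta[:, None] + mu)
    return (e / (1.0 + e.sum(axis=0))) @ weights  # outside utility set to 0

def solve_delta(observed, mu, weights, tol=1e-13, max_iter=10_000):
    """Plain (unaccelerated) contraction for the mean utilities of one market."""
    delta = np.log(observed) - np.log(1.0 - observed.sum())  # plain-logit start
    for _ in range(max_iter):
        step = np.log(observed) - np.log(market_shares(delta, mu, weights))
        delta = delta + step
        if np.max(np.abs(step)) < tol:
            break
    return delta

# Illustrative market: one random coefficient, 9-node Gauss-Hermite rule.
x, w = np.polynomial.hermite_e.hermegauss(9)
weights = w / np.sqrt(2.0 * np.pi)
mu = 0.8 * np.outer(np.array([1.0, 2.0, 0.5]), x)
true_delta = np.array([0.5, -0.2, 0.1])
observed = market_shares(true_delta, mu, weights)

delta_hat = solve_delta(observed, mu, weights)
print(np.max(np.abs(delta_hat - true_delta)))
```

In the outer loop, `mu` would be rebuilt from each new guess of $θ_{2}$ and the solve repeated market by market.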

Optimization of the GMM objective ^optimization

The objective is non-convex (the Hessian need not be PSD), so no routine guarantees a global minimum. Best practices:

Analytic gradients (PyBLP computes them for any model, including supply+demand and fixed effects) — major speedup and better convergence than finite differences or derivative-free methods.

Gradient-based optimizers: try Knitro Interior/Direct first if available, then a BFGS-based routine, ideally SciPy L-BFGS-B with bounds. Avoid Nelder-Mead.

Box constraints $θ_{2}^{(ℓ)} \in [\underline{θ}_{2}^{(ℓ)}, \overline{θ}_{2}^{(ℓ)}]$ (e.g. nonnegative, bounded variances; $ρ \in [0, 0.95]$ ; $α \leq - 0.001$ with a supply side) — prevent unreasonable values and overflow.

Tight termination tolerances (loose defaults cause early termination, a problem that worsens when $N$ is large), plus verification of the first-order (gradient $\approx 0$ ) and second-order (PSD Hessian / positive eigenvalues) conditions, which PyBLP reports by default.

Multiple starting values and optimizers to check agreement.

With these practices, >99% of simulation runs converge to a valid local minimum, contradicting the many-local-optima finding of Knittel & Metaxoglou (2014).
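The practices listed above can be combined in a small sketch using `scipy.optimize.minimize` with `method="L-BFGS-B"`. The moment function here is a made-up two-parameter toy with a non-convex objective, not a BLP model. The weighting matrix, bounds, starting values, and tolerances are likewise illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

W = np.eye(3)  # hypothetical weighting matrix

def moments(theta):
    t1, t2 = theta
    return np.array([t1 ** 2 - 1.0, t1 * t2 - 0.5, t2 - 0.5])  # toy moments

def jacobian(theta):
    t1, t2 = theta
    return np.array([[2.0 * t1, 0.0], [t2, t1], [0.0, 1.0]])

def objective_and_gradient(theta):
    g = moments(theta)
    # q = g' W g with analytic gradient 2 J' W g (W symmetric).
    return g @ W @ g, 2.0 * jacobian(theta).T @ W @ g

def numerical_hessian(theta, h=1e-5):
    """Central differences of the analytic gradient, symmetrized."""
    k = len(theta)
    hessian = np.empty((k, k))
    for j in range(k):
        e = np.zeros(k)
        e[j] = h
        upper = objective_and_gradient(theta + e)[1]
        lower = objective_and_gradient(theta - e)[1]
        hessian[:, j] = (upper - lower) / (2.0 * h)
    return 0.5 * (hessian + hessian.T)

bounds = [(0.0, 5.0), (0.0, 0.95)]  # box constraints
starts = [np.array([0.5, 0.25]), np.array([1.5, 0.75]), np.array([1.0, 0.9])]
options = {"gtol": 1e-8, "ftol": 0.0, "maxiter": 1000}  # tight tolerances
results = [
    minimize(objective_and_gradient, start, jac=True, method="L-BFGS-B",
             bounds=bounds, options=options)
    for start in starts
]

for result in results:
    gradient = objective_and_gradient(result.x)[1]
    eigenvalues = np.linalg.eigvalsh(numerical_hessian(result.x))
    # First-order check: gradient near zero. Second-order: positive eigenvalues.
    print(result.x, np.max(np.abs(gradient)), eigenvalues.min() > 0)
```

Agreement of `result.x` across starting values, a near-zero gradient, and positive Hessian eigenvalues are the three checks to look for before trusting an estimate.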

Numerical stability tricks ^numerical-tricks

Summing exponentiated utilities $\sum_{j} \exp(δ_{jt} + μ_{ijt})$ can lose precision or overflow (e.g. $\exp(800)$ exceeds the largest double-precision number). Fixes:

Protected log-sum-exp: $\mathrm{LSE}(x) = \log \sum_{k} \exp x_{k} = a + \log \sum_{k} \exp(x_{k} - a)$ with $a = \max\{0, \max_{k} x_{k}\}$ ; implemented by default, near-zero cost, recommended.
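A minimal sketch of the formula follows. The function name `protected_lse` is hypothetical, and the example assumes the outside good's utility of zero is included among the $x_{k}$ , which is one reading of why $a$ is floored at zero.

```python
import numpy as np

def protected_lse(x):
    """log(sum(exp(x))) computed after shifting by a = max(0, max(x))."""
    x = np.asarray(x, dtype=float)
    a = max(0.0, float(x.max()))
    return a + np.log(np.sum(np.exp(x - a)))

# Utilities for the outside good (0) and two inside goods; np.exp(800) overflows.
utilities = np.array([0.0, 800.0, 799.0])
log_denominator = protected_lse(utilities)
choice_probabilities = np.exp(utilities - log_denominator)
print(log_denominator, choice_probabilities)
```

Computing probabilities as `exp(utilities - log_denominator)` keeps every exponent at or below zero, so nothing overflows.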

Work market-by-market; box-constrain random coefficients; robust error handling (replace near-singular $W$ or $Δ_{t}$ with pseudo-inverses and warn).
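The pseudo-inverse fallback might look like the sketch below. The condition-number threshold and the function name `robust_inverse` are assumptions for illustration. The note does not say which singularity test PyBLP applies.

```python
import warnings
import numpy as np

def robust_inverse(matrix, max_condition=1e12):
    """Invert a matrix, falling back to a pseudo-inverse when near-singular."""
    if np.linalg.cond(matrix) > max_condition:  # hypothetical threshold
        warnings.warn("Matrix is nearly singular; using a pseudo-inverse instead.")
        return np.linalg.pinv(matrix)
    return np.linalg.inv(matrix)

well_conditioned = np.array([[2.0, 0.0], [0.0, 4.0]])
singular = np.array([[1.0, 1.0], [1.0, 1.0]])
print(robust_inverse(well_conditioned))
print(robust_inverse(singular))  # emits a warning and returns the pseudo-inverse
```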

Tricks like using $exp (δ_{j t})$ in place of $δ_{j t}$ , or “hot starts,” give little benefit once SQUAREM is used.

Examples

Conlon-Gortmaker Monte Carlo config: Gauss-Hermite product rule exact to degree 17 (9 nodes in 1-D, 81 in 2-D); SQUAREM inner loop with 1E-14 tolerance; L-BFGS-B with analytic gradients, projected-gradient tolerance 1E-5, three starting values drawn $\pm 50\%$ around truth, box constraints $σ_{x}, σ_{p} \geq 0$ , $ρ \in [0, 0.95]$ , $α \leq - 0.001$ . In the Nevo (2000b) and BLP (1995) replications, switching from a few pMC draws to quadrature plus tight tolerances eliminates the dispersion in elasticities across starting values that Knittel & Metaxoglou (2014) reported.

Connections

The BLP Contraction Mapping — the inner loop nested inside the optimization.
Random Coefficients Logit Model — defines the integral being approximated.
GMM Estimation and Instruments for Price Endogeneity — the objective being optimized.
Method of Simulated Moments — numerical integration of moments is the simulation step shared with MSM.
Supply Side and Markups — counterfactual pricing equilibria reuse these numerical methods.
