Sampling and Estimation for Sobol Indices

Summary

Sobol indices are integrals that must be estimated from model runs. The Saltelli scheme builds two independent sample matrices $A$ and $B$ plus $p$ hybrid matrices $A_B^{(i)}$, giving first- and total-order indices at a cost of $N(p+2)$ model evaluations (or $N(2p+2)$ if second-order indices are also wanted), where $N$ is the base sample size and $p$ the number of factors. FAST (Fourier Amplitude Sensitivity Test) instead encodes each factor on a distinct integer frequency and recovers variance shares from the Fourier spectrum of the output, often converging faster than Monte-Carlo Sobol; RBD uses a single frequency with random permutations, and the hybrid FAST_RBD uses one frequency per group of factors, to cut cost in high dimensions. All are implemented in the Python SALib library used by the paper.

Overview

The variance terms behind $S_i$ and $S_{Ti}$ (see Variance-Based Sensitivity and Sobol Indices) are conditional-variance integrals with no closed form for a general simulator, so they are estimated numerically. The paper’s case study used the Sensitivity Analysis Library in Python (SALib) for all methods. Two estimation routes dominate: Monte-Carlo with the Saltelli design (Sobol) and spectral analysis (FAST/RBD).

Main Content

Saltelli sampling scheme & cost ^saltelli-scheme

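Draw two independent quasi-random (Sobol-sequence) matrices $A$ and $B$, each $N \times p$ (one row per sample point, one column per factor). For each factor $i$, form $A_B^{(i)}$: matrix $A$ with only column $i$ replaced by column $i$ of $B$. Running the model on $A$, $B$, and all $p$ matrices $A_B^{(i)}$ yields Monte-Carlo estimators of the first-order and total-effect partial variances; dividing these by the estimated output variance gives $S_i$ and $S_{Ti}$.

Total model evaluations = $N(p+2)$ for first- + total-order; $N(2p+2)$ if second-order indices are also estimated. $N$ is the base sample size (often a power of two, $2^k$, e.g. 1024).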
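As a hedged illustration of the estimators mentioned above, the sketch below uses one widely used pair: the Saltelli et al. (2010) form for the first-order term, $\hat V_i = \frac{1}{N}\sum_j f(B)_j\,[f(A_B^{(i)})_j - f(A)_j]$, and the Jansen form for the total effect, $\hat E_i = \frac{1}{2N}\sum_j [f(A)_j - f(A_B^{(i)})_j]^2$, each divided by the sample variance of the outputs. Whether the paper used exactly this pair is an assumption. The function name `saltelli_indices`, the additive toy model, and the use of plain pseudo-random draws in place of a Sobol sequence are illustrative choices, not part of the source.

```python
import numpy as np

def saltelli_indices(model, p, N, rng):
    """Estimate first-order (S1) and total-effect (ST) Sobol indices."""
    A = rng.random((N, p))          # base sample, uniform on [0, 1)^p
    B = rng.random((N, p))          # independent second sample
    fA, fB = model(A), model(B)
    # Centre the outputs to reduce the variance of the first-order estimator.
    mean_Y = np.mean(np.concatenate([fA, fB]))
    fA, fB = fA - mean_Y, fB - mean_Y
    var_Y = np.var(np.concatenate([fA, fB]))
    S1, ST = np.empty(p), np.empty(p)
    for i in range(p):
        AB = A.copy()
        AB[:, i] = B[:, i]          # hybrid matrix A_B^(i): column i taken from B
        fAB = model(AB) - mean_Y
        S1[i] = np.mean(fB * (fAB - fA)) / var_Y        # first-order index
        ST[i] = 0.5 * np.mean((fA - fAB) ** 2) / var_Y  # total-effect index
    return S1, ST

def toy_model(X):
    # Hypothetical additive test function; the third factor is inert.
    return X[:, 0] + 2.0 * X[:, 1]

# Uses N * (p + 2) = 16384 * 5 model evaluations in total.
S1, ST = saltelli_indices(toy_model, p=3, N=2**14, rng=np.random.default_rng(0))
```

For this additive toy model the analytic indices are $S = S_T = (0.2, 0.8, 0)$, so the estimates can be checked directly; for a model with interactions, $S_{Ti}$ would exceed $S_i$.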

FAST — Fourier Amplitude Sensitivity Test ^fast

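FAST “is based on periodic search sampling using a period search function and applies a decomposition of variance based on Fourier Transform.” Each factor $x_i$ is driven along a search curve at a distinct integer frequency $\omega_i$:

$$x_i(s) = G_i\big(\sin(\omega_i s)\big), \qquad s \in (-\pi, \pi), \quad i = 1, \dots, p,$$

where $G_i$ is the search function that maps the sinusoid onto the range of $x_i$. By Parseval’s theorem the output variance is recovered from the Fourier coefficients $A_k$, $B_k$ of $f(s) = f(x_1(s), \dots, x_p(s))$:

$$V \approx 2 \sum_{k=1}^{\infty} \left(A_k^2 + B_k^2\right), \qquad A_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(s)\cos(ks)\,ds, \quad B_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(s)\sin(ks)\,ds,$$

and the first-order index reads off the spectral power at $x_i$’s frequency and its harmonics:

$$S_i = \frac{V_i}{V}, \qquad V_i = 2 \sum_{m=1}^{M} \left(A_{m\omega_i}^2 + B_{m\omega_i}^2\right),$$

with $M$ the number of harmonics retained.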

FAST “achieves a better estimate in terms of robustness and speed of convergence than Sobol” and handles nonlinear, non-monotonic models.
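The search-curve and spectrum steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's or SALib's implementation: it assumes factors uniform on [0, 1], the arcsine (triangle-wave) search function, hand-picked frequencies (11, 21, 29), $N = 1025$ points and $M = 4$ harmonics. The name `fast_first_order` and the toy model are hypothetical.

```python
import numpy as np

def fast_first_order(model, omegas, N=1025, M=4):
    """First-order indices from the Fourier spectrum of the model output."""
    omegas = np.asarray(omegas, dtype=int)
    # N equally spaced points on (-pi, pi); N is odd and N > 2 * M * max(omegas).
    s = np.pi * (2 * np.arange(1, N + 1) - N - 1) / N
    # Search curve: each column oscillates over [0, 1] at its own frequency.
    X = 0.5 + np.arcsin(np.sin(np.outer(s, omegas))) / np.pi
    Y = model(X)
    # power[k] = A_k^2 + B_k^2 for k >= 1 (discrete Fourier coefficients).
    power = np.abs(np.fft.rfft(Y)) ** 2 / N**2
    V = 2.0 * power[1:].sum()                           # total variance (Parseval)
    harmonics = np.outer(np.arange(1, M + 1), omegas)   # m * omega_i, shape (M, p)
    V_i = 2.0 * power[harmonics].sum(axis=0)            # power at each factor's harmonics
    return V_i / V

def toy_model(X):
    # Hypothetical additive test function; the third factor is inert.
    return X[:, 0] + 2.0 * X[:, 1]

S_fast = fast_first_order(toy_model, omegas=[11, 21, 29])
```

With these settings a single curve of 1025 model runs serves all three factors; the analytic first-order indices of the toy model are (0.2, 0.8, 0). Truncating at $M$ harmonics slightly underestimates each index, since a small part of each factor's power sits in higher harmonics.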

RBD and FAST_RBD (HFR) ^rbd

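As the number of factors $p$ grows, classic FAST suffers error and cost from resolving all higher-order harmonics. RBD uses a single frequency $\omega$ for all parameters (set to 1 for simplicity) with random permutation of the sample-point coordinates to restore stochasticity:

$$x_i(s_j) = G_i\big(\sin(\omega\, s_{\pi_i(j)})\big), \qquad j = 1, \dots, N,$$

where $\pi_i$ is an independent random permutation of $\{1, \dots, N\}$ drawn for factor $i$. To estimate $S_i$, the outputs are re-ordered so that the points $s_{\pi_i(j)}$ are increasing, and the spectral power at $\omega$ and its harmonics is read off as in FAST.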

Hybrid FAST_RBD (HFR) groups the parameters into equal partitions, assigning one frequency per partition — “a balance between the accuracy of FAST and the computational efficiency of RBD.”
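The plain RBD procedure (single frequency, per-factor permutation, re-ordering before the spectrum is taken) can be sketched as follows. This is an illustrative NumPy version under assumptions: factors uniform on [0, 1], the arcsine search function, $\omega = 1$, $N = 2001$ and $M = 6$. The function name `rbd_first_order` and the toy model are hypothetical, and no bias correction is applied.

```python
import numpy as np

def rbd_first_order(model, p, N=2001, M=6, rng=None):
    """First-order indices via a random balance design with omega = 1."""
    rng = np.random.default_rng() if rng is None else rng
    s0 = np.pi * (2 * np.arange(1, N + 1) - N - 1) / N   # N points on (-pi, pi)
    # One independent permutation of the curve points per factor.
    perms = np.array([rng.permutation(N) for _ in range(p)]).T   # shape (N, p)
    S = s0[perms]
    X = 0.5 + np.arcsin(np.sin(S)) / np.pi               # all factors share omega = 1
    Y = model(X)                                         # N model runs in total
    S1 = np.empty(p)
    for i in range(p):
        order = np.argsort(S[:, i])                      # undo factor i's permutation
        power = np.abs(np.fft.rfft(Y[order])) ** 2
        # Factor i now shows up at harmonics 1..M; the other factors act as noise.
        S1[i] = power[1:M + 1].sum() / power[1:].sum()
    return S1

def toy_model(X):
    # Hypothetical additive test function; the third factor is inert.
    return X[:, 0] + 2.0 * X[:, 1]

S_rbd = rbd_first_order(toy_model, p=3, rng=np.random.default_rng(0))
```

The same $N$ model runs are reused for every factor, which is why the cost does not grow with $p$. The price is a small upward bias: the noise from the other factors leaks into the $M$ retained harmonics, so even the inert factor receives an estimate slightly above zero.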

Choosing a budget (case-study sampling sizes) ^budget-table

Table 1 of the paper lists the sample sizes used (MNIST, 784 factors):

| Method | Samples |
|---|---|
| Morris | 50 (in 4 levels) |
| Sobol | 300 |
| FAST | 100 |
| RBD | 400 |
| Delta | 1000 |
| DGSM | 1000 |

These were grid-searched “to strike a balance between optimizing performance and minimizing the number of samples required.” General rule: Morris screening is the cheapest; FAST/RBD are economical for first-order indices; full Sobol total-effect estimation is the most expensive but the most informative.

SALib ^salib

The Sensitivity Analysis Library (SALib) in Python (https://salib.readthedocs.io) implements Morris, Sobol (Saltelli sampling), FAST, RBD-FAST, Delta (DMIM), and DGSM. Typical pattern: define a problem dict (num_vars, names, bounds), call the method’s sample() to generate the design, run the model on every row, then call analyze() to obtain $S_i$, $S_{Ti}$ (Sobol) or $\mu^*$, $\sigma$ (Morris).

Examples

Quantifying $p = 5$ surviving ABM parameters (after Morris screening, see Morris Elementary Effects Screening) with Sobol at $N = 1024$:

  • Cost = $N(p+2) = 1024 \times 7 = 7168$ model runs for first- + total-order.
  • Adding second-order indices would cost $N(2p+2) = 1024 \times 12 = 12288$ runs.
  • If each ABM run takes 30 s, the 7168 runs take ~60 h serial, motivating either a smaller $N$, a FAST/RBD first-order screen, or a cheap emulator. SALib code:
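    from SALib.sample import saltelli
    from SALib.analyze import sobol
    # problem: SALib dict ('num_vars', 'names', 'bounds') for the 5 parameters, defined beforehand
    # calc_second_order=False gives N(p+2) rows; the default (True) gives N(2p+2)
    param_values = saltelli.sample(problem, 1024, calc_second_order=False)
    Y = run_abm(param_values)      # user-supplied wrapper: 1-D array, one output per row
    Si = sobol.analyze(problem, Y, calc_second_order=False)   # Si['S1'], Si['ST']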
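The budget arithmetic above can be reproduced with a small helper, which is convenient when comparing candidate values of $N$ before committing compute. The function name `saltelli_runs` and the 30 s per-run figure are taken from this example only; they are not part of SALib.

```python
def saltelli_runs(N, p, second_order=False):
    """Model evaluations required by the Saltelli design."""
    return N * (2 * p + 2) if second_order else N * (p + 2)

runs_first_total = saltelli_runs(1024, 5)                  # first- + total-order
runs_with_second = saltelli_runs(1024, 5, second_order=True)
serial_hours = runs_first_total * 30 / 3600                # at 30 s per ABM run

print(runs_first_total, runs_with_second, round(serial_hours, 1))
# prints: 7168 12288 59.7
```

Halving $N$ to 512 halves every figure, which is the trade-off referred to in the last bullet.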

Connections

See Also

  • History Matching for ABMs — emulator-based designs amortize the run budget GSA needs.
  • Saltelli (2002), “Making best use of model evaluations to compute sensitivity indices.”
  • Tarantola, Gatelli & Mara (2006), “Random balance designs for the estimation of first order global sensitivity indices.”