Sampling and Estimation for Sobol Indices

Summary

Sobol indices are integrals that must be estimated from model runs. The Saltelli scheme builds two independent sample matrices $A$ and $B$ plus $p$ hybrid matrices $A_{B}^{(i)}$, giving first- and total-order indices at a cost of $N (p + 2)$ model evaluations (or $N (2 p + 2)$ if second-order indices are also wanted). FAST (Fourier Amplitude Sensitivity Test) instead assigns each factor a distinct integer frequency and recovers variance shares from the Fourier spectrum of the output, often converging faster than Monte-Carlo Sobol. RBD cuts cost in high dimensions by using a single frequency with random permutations, and the hybrid FAST_RBD assigns one frequency per group of factors. All are implemented in the Python SALib library used by the paper.

Overview

The variance terms behind $S_{i}$ and $S_{T i}$ (see Variance-Based Sensitivity and Sobol Indices) are conditional-variance integrals with no closed form for a general simulator, so they are estimated numerically. The paper’s case study used the Sensitivity Analysis Library in Python (SALib) for all methods. Two estimation routes dominate: Monte-Carlo with the Saltelli design (Sobol) and spectral analysis (FAST/RBD).

Main Content

Saltelli sampling scheme & cost ^saltelli-scheme

Draw two independent $(N \times p)$ quasi-random (Sobol-sequence) matrices $A$ and $B$ . For each factor $i$ , form $A_{B}^{(i)}$ = matrix $A$ with only column $i$ replaced by column $i$ of $B$ . Running the model on $A$ , $B$ , and all $p$ matrices $A_{B}^{(i)}$ yields estimators
$V_{i} \approx \frac{1}{N} \sum_{j = 1}^{N} f (B)_{j} \left( f (A_{B}^{(i)})_{j} - f (A)_{j} \right), \qquad E [V (Y \mid X_{\sim i})] \approx \frac{1}{2 N} \sum_{j = 1}^{N} \left( f (A)_{j} - f (A_{B}^{(i)})_{j} \right)^{2}$
giving $S_{i} = V_{i} / V (Y)$ and $S_{T i} = \frac{E [V (Y \mid X_{\sim i})]}{V (Y)} = 1 - \frac{V (E [Y \mid X_{\sim i}])}{V (Y)}$ (the two forms agree by the law of total variance). Total model evaluations = $N (p + 2)$ for first- + total-order; $N (2 p + 2)$ if second-order $S_{ij}$ are also estimated. $N$ is the base sample size (often $N = 2^{m}$, e.g. 1024).
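
The estimators above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's or SALib's implementation: it assumes factors uniform on $[0, 1]$, uses plain pseudo-random draws instead of a Sobol sequence to stay dependency-free, and the function name `saltelli_indices` and the additive test model are hypothetical.

```python
import numpy as np

def saltelli_indices(f, p, N, seed=0):
    """Estimate first- and total-order Sobol indices with the A / B / A_B^(i) design.

    Assumptions for this sketch: inputs are uniform on [0, 1] and drawn
    pseudo-randomly (a quasi-random Sobol sequence would normally be used).
    """
    rng = np.random.default_rng(seed)
    A = rng.random((N, p))
    B = rng.random((N, p))
    fA, fB = f(A), f(B)
    var_Y = np.var(np.concatenate([fA, fB]))   # estimate of V(Y)
    S1 = np.empty(p)
    ST = np.empty(p)
    for i in range(p):
        AB = A.copy()
        AB[:, i] = B[:, i]                      # A with column i taken from B
        fAB = f(AB)
        S1[i] = np.mean(fB * (fAB - fA)) / var_Y        # V_i / V(Y)
        ST[i] = 0.5 * np.mean((fA - fAB) ** 2) / var_Y  # E[V(Y | X_~i)] / V(Y)
    n_evals = N * (p + 2)                       # A, B, and p hybrid matrices
    return S1, ST, n_evals

# Hypothetical test model: Y = X1 + 2*X2, with an inert third factor.
def additive(X):
    return X[:, 0] + 2.0 * X[:, 1]

S1, ST, n_evals = saltelli_indices(additive, p=3, N=2**16)
```

For this additive model the analytic values are $S_{1} = 0.2$, $S_{2} = 0.8$, $S_{3} = 0$, with $S_{T i} = S_{i}$ because there are no interactions, so the output gives a quick sanity check on the estimators.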

FAST — Fourier Amplitude Sensitivity Test ^fast

FAST “is based on periodic search sampling using a period search function and applies a decomposition of variance based on Fourier Transform.” Each of the $k$ factors ($k$ here is the factor count written $p$ in the Saltelli section) is driven along a search curve at a distinct integer frequency $ω_{i}$ :
$X_{i} (s_{j}) = G_{i} (\sin (ω_{i} s_{j})), \qquad Y = f (X_{1} (s), \dots, X_{k} (s))$
By Parseval’s theorem the output variance is recovered from the Fourier coefficients $A_{p}, B_{p}$ (here $p$ indexes harmonics, not factors):
$V (Y) = \frac{1}{2 π} \int_{- π}^{π} f^{2} (s) \, d s - [E (Y)]^{2} \approx 2 \sum_{p = 1}^{\infty} (A_{p}^{2} + B_{p}^{2})$
and the first-order index reads off the spectral power at $X_{i}$'s frequency and its harmonics:
$S_{i} = \frac{V _{i}}{V ( Y )} \approx \frac{\sum _{q = 1}^{M} ( A _{q ω_{i}}^{2} + B _{q ω_{i}}^{2} )}{\sum _{i = 1}^{n} \sum _{q = 1}^{M} ( A _{q ω_{i}}^{2} + B _{q ω_{i}}^{2} )}, S_{T i} = S_{i} + S_{i, \sim i} .$
FAST “achieves a better estimate in terms of robustness and speed of convergence than Sobol” and handles nonlinear, non-monotonic models.
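
The spectral read-off can be illustrated with a small NumPy sketch of classic FAST. Everything specific here is an assumption for illustration: the transform $G (u) = \frac{1}{2} + \arcsin (u) / π$ (one common choice that yields uniform $[0, 1]$ inputs), the frequencies 11 and 35, the harmonic cutoff $M = 4$, the sample size, and the function name `fast_first_order`. The total variance is taken from the full spectrum via Parseval.

```python
import numpy as np

def fast_first_order(f, omegas, N=1025, M=4):
    """First-order indices from the Fourier spectrum along a FAST search curve.

    Assumes G(u) = 1/2 + arcsin(u)/pi (uniform [0, 1] inputs) and an odd N,
    so the discrete spectrum sums exactly to the sample variance.
    """
    omegas = np.asarray(omegas)
    s = -np.pi + 2.0 * np.pi * np.arange(N) / N             # grid on [-pi, pi)
    X = 0.5 + np.arcsin(np.sin(np.outer(s, omegas))) / np.pi  # shape (N, k)
    Y = f(X)
    freqs = np.arange(1, (N - 1) // 2 + 1)                  # resolvable frequencies
    A = (Y @ np.cos(np.outer(s, freqs))) / N                # cosine coefficients
    B = (Y @ np.sin(np.outer(s, freqs))) / N                # sine coefficients
    power = 2.0 * (A ** 2 + B ** 2)                         # power[j-1] = variance at frequency j
    V = power.sum()                                         # Parseval: total variance
    harmonics = np.arange(1, M + 1)
    S = np.array([power[w * harmonics - 1].sum() for w in omegas]) / V
    return S

# Hypothetical test model: Y = X1 + 2*X2 (analytic S = 0.2, 0.8).
S_fast = fast_first_order(lambda X: X[:, 0] + 2.0 * X[:, 1], omegas=[11, 35])
```

The frequencies are chosen so that no harmonic of one factor up to order $M$ coincides with a harmonic of the other; with overlapping harmonics the variance shares would be mixed.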

RBD and FAST_RBD (HFR) ^rbd

As $p$ grows, classic FAST suffers error and cost from resolving all higher-order harmonics. RBD uses a single frequency $ω$ for all parameters (set to 1 for simplicity) with random permutation of the sample-point coordinates to restore stochasticity:
$X_{i} (s_{j}) = G_{i} (\sin (ω s_{i_{j}})), \qquad i = 1, \dots, k .$
Hybrid FAST_RBD (HFR) groups the $k$ parameters into equal partitions, assigning one frequency per partition — “a balance between the accuracy of FAST and the computational efficiency of RBD.”
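
A rough NumPy sketch of the RBD idea follows. It assumes $ω = 1$, the transform $G (u) = \frac{1}{2} + \arcsin (u) / π$, and the usual RBD analysis step (not spelled out above) of re-ordering the outputs along each factor's own unpermuted curve before reading the first $M$ harmonics. The name `rbd_first_order`, the cutoff $M = 6$, and the sample size are hypothetical choices.

```python
import numpy as np

def rbd_first_order(f, k, N=2000, M=6, seed=0):
    """First-order indices by random balance design with a single frequency (omega = 1)."""
    rng = np.random.default_rng(seed)
    s0 = -np.pi + 2.0 * np.pi * np.arange(N) / N            # common grid on [-pi, pi)
    perms = np.array([rng.permutation(N) for _ in range(k)])  # one permutation per factor
    X = 0.5 + np.arcsin(np.sin(s0[perms].T)) / np.pi        # shape (N, k), uniform [0, 1]
    Y = f(X)
    V = np.var(Y)
    q = np.arange(1, M + 1)
    C = np.cos(np.outer(s0, q))                             # shape (N, M)
    Sn = np.sin(np.outer(s0, q))
    S = np.empty(k)
    for i in range(k):
        Yr = Y[np.argsort(perms[i])]   # undo factor i's permutation: Yr[m] pairs with s0[m]
        A = Yr @ C / N
        B = Yr @ Sn / N
        S[i] = 2.0 * np.sum(A ** 2 + B ** 2) / V            # power in harmonics 1..M
    return S

# Hypothetical test model: Y = X1 + 2*X2 (analytic S = 0.2, 0.8).
S_rbd = rbd_first_order(lambda X: X[:, 0] + 2.0 * X[:, 1], k=2)
```

After re-ordering, factor $i$ varies periodically while the other factors look like noise, so a single design of $N$ runs serves every factor. The estimate carries a small positive bias that grows with $M / N$.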

Choosing a budget (case-study sampling sizes) ^budget-table

Table 1 of the paper lists the sample sizes used (MNIST, 784 factors):

Method Samples
Morris 50 (in 4 levels)
Sobol 300
FAST 100
RBD 400
Delta 1000
DGSM 1000

These were grid-searched “to strike a balance between optimizing performance and minimizing the number of samples required.” General rule: Morris screens cheapest; FAST/RBD are economical for first-order; full Sobol total-effect is the most expensive but most informative ( $N (p + 2)$ ).

SALib ^salib

The Sensitivity Analysis Library (SALib) in Python (https://salib.readthedocs.io) implements Morris, Sobol (Saltelli sampling), FAST, RBD-FAST, Delta (DMIM), and DGSM. Typical pattern: define a problem dict (names, bounds), call the method’s sample() to generate the design, run the model on every row, then analyze() to obtain $S_{i}$ , $S_{T i}$ (Sobol) or $μ^{*}$ , $σ$ (Morris).

Examples

Quantifying 5 surviving ABM parameters (after Morris screening, see Morris Elementary Effects Screening) with Sobol at $N = 1024$ :

Cost = $N (p + 2) = 1024 \times 7 = 7168$ model runs for first- + total-order.
Adding second-order indices would cost $N (2 p + 2) = 1024 \times 12 = 12288$ runs.

If each ABM run takes 30 s, that is ~60 h serial, motivating either a coarser $N$, a FAST/RBD first-order screen, or a cheap emulator. SALib code:

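import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol
# problem: SALib dict with 'num_vars', 'names', 'bounds' for the p = 5 factors
# calc_second_order=False gives N(p+2) rows; the default (True) gives N(2p+2)
param_values = saltelli.sample(problem, 1024, calc_second_order=False)
Y = np.asarray(run_abm(param_values))              # run_abm: user-supplied, one scalar output per row
Si = sobol.analyze(problem, Y, calc_second_order=False)   # Si['S1'], Si['ST']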
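
The run counts and the serial wall-clock figure in this example can be reproduced with a small budget helper. The function names `saltelli_runs` and `serial_hours` are hypothetical; the formulas are the $N (p + 2)$ and $N (2 p + 2)$ costs stated above.

```python
def saltelli_runs(N, p, second_order=False):
    """Model evaluations required by the Saltelli design."""
    return N * (2 * p + 2) if second_order else N * (p + 2)

def serial_hours(runs, seconds_per_run):
    """Wall-clock hours if the runs execute one after another."""
    return runs * seconds_per_run / 3600.0

first_total = saltelli_runs(1024, 5)                     # first- + total-order
with_second = saltelli_runs(1024, 5, second_order=True)  # adds second-order indices
hours = serial_hours(first_total, 30.0)                  # 30 s per ABM run
```

Because the cost is linear in both $N$ and $p$, halving $N$ or screening out a factor with Morris reduces the budget proportionally.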

Connections

Variance-Based Sensitivity and Sobol Indices — the indices these schemes estimate.
Global Sensitivity Analysis - Overview — cost positions Sobol/FAST in the screen-then-quantify pipeline.
Morris Elementary Effects Screening — cheap pre-screen that shrinks $p$ and thus Saltelli cost.
Uncertainty Quantification for ABM Calibration — emulators/surrogates make large Saltelli budgets feasible for ABMs.
Approximate Bayesian Computation for ABMs — shared reliance on many simulator runs / quasi-random designs.
