Estimation and Structure Selection for Vines
Summary
Inference on R-vines splits into three tasks: (i) selecting the tree structure, (ii) choosing a copula family for each pair-copula, and (iii) estimating the parameters. Ideally (i)-(iii) would be done jointly, but in practice they are done stepwise. The dominant approach is Dißmann's algorithm, a greedy, bottom-up maximum-spanning-tree heuristic that captures the strongest dependence in the lowest trees, combined with AIC/BIC family selection and sequential (level-by-level) parameter estimation. Truncation and pruning control the parameter explosion in high dimensions.
Overview
The flexibility of vines comes at the cost of a combinatorial structure space and a parameter count that grows quadratically with dimension. The number of distinct R-vines on $d$ variables is
$$\frac{d!}{2}\cdot 2^{\binom{d-2}{2}},$$
so globally optimal structure search is infeasible beyond small $d$. The practical strategy exploits the fact that lower trees are estimated more precisely, so the structure is built bottom-up to capture the strongest dependencies first.
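To make the growth of the structure space concrete, the short Python sketch below tabulates the count $\frac{d!}{2}\cdot 2^{\binom{d-2}{2}}$ of R-vines on $d$ variables for small $d$. The function name `n_rvines` is a hypothetical helper introduced only for illustration.

```python
from math import comb, factorial

def n_rvines(d: int) -> int:
    """Number of distinct regular vines on d labelled variables: d!/2 * 2**C(d-2, 2)."""
    if d < 2:
        raise ValueError("an R-vine needs at least two variables")
    return factorial(d) // 2 * 2 ** comb(d - 2, 2)

# Tabulate the count for small dimensions.
for d in range(3, 9):
    print(d, n_rvines(d))
```

Already at $d = 8$ the count exceeds $6 \times 10^8$, which is why exhaustive search is replaced by the greedy construction described below.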
Main Content
Structure selection — Dißmann's algorithm
Originally proposed for C-/D-vines by Aas et al. (2009) and extended to general R-vines by Dißmann et al. (2013). Procedure:
- Compute a pairwise dependence measure (e.g. the absolute empirical Kendall's $\tau$) for all variable pairs and use these as edge weights.
- Find the maximum spanning tree over the $d$ nodes (Prim's algorithm), i.e. the spanning tree maximizing the sum of the edge weights; this is the first tree $T_1$.
- Estimate each pair-copula of $T_1$ (family + parameters) and compute the implied conditional ("pseudo-")observations via the h-functions.
- Build $T_2$ as a maximum spanning tree over the edges of $T_1$, with weights computed from the pseudo-observations, subject to the proximity condition (two edges of $T_1$ may be joined in $T_2$ only if they share a node); repeat for $T_3, \dots, T_{d-1}$.
This greedy bottom-up scheme is by far the most used in practice and requires simultaneous selection of pair-copula types and parameter estimation at each level. An alternative (Kurowicka 2011) starts by assigning the weakest conditional dependencies to the highest trees. Bayesian posterior-over-structure methods (Gruber & Czado 2015) exist but are little used in finance.
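The first-tree step and the proximity condition can be sketched in a few lines of plain Python. This is a minimal illustration, not Dißmann's reference implementation: it assumes the data are already pseudo-observations stored as one list per variable, uses a simple O(n²) Kendall's $\tau$ without tie handling, and runs a dense O(d³) variant of Prim's algorithm. Fitting the $T_1$ copulas and computing the h-function pseudo-observations that supply the $T_2$ weights are omitted. All function names are hypothetical.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a, O(n^2); assumes no ties."""
    n, s = len(x), 0
    for i in range(n):
        for j in range(i + 1, n):
            p = (x[i] - x[j]) * (y[i] - y[j])
            s += (p > 0) - (p < 0)
    return 2.0 * s / (n * (n - 1))

def max_spanning_tree(w):
    """Prim's algorithm on a dense symmetric weight matrix w; returns tree edges (i, j)."""
    d = len(w)
    in_tree, edges = {0}, []
    while len(in_tree) < d:
        # Heaviest edge connecting the current tree to a node outside it.
        i, j = max(((a, b) for a in in_tree for b in range(d) if b not in in_tree),
                   key=lambda e: w[e[0]][e[1]])
        edges.append((i, j))
        in_tree.add(j)
    return edges

def first_tree(columns):
    """T1 of Dißmann's algorithm: maximum spanning tree with edge weights |tau|."""
    d = len(columns)
    w = [[0.0] * d for _ in range(d)]
    for a, b in combinations(range(d), 2):
        w[a][b] = w[b][a] = abs(kendall_tau(columns[a], columns[b]))
    return max_spanning_tree(w)

def proximity_candidates(tree_edges):
    """Edges allowed in the next tree: pairs of current-tree edges sharing exactly one node."""
    out = []
    for e, f in combinations(tree_edges, 2):
        if len(set(e) & set(f)) == 1:
            out.append((e, f))
    return out
```

Because `proximity_candidates` only compares node labels, the same function applies at every level: the nodes of $T_{i+1}$ are the edges of $T_i$, so passing the $T_2$ edges (pairs of $T_1$ edges) yields the admissible $T_3$ edges.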
Choosing copula families
Families (Gaussian, Student-$t$, Gumbel, Clayton, …) are typically selected one pair at a time using a model-selection criterion: AIC, BIC, the copula information criterion (CIC), or a copula goodness-of-fit test. In a comparison of four selection strategies (Manner), AIC was the most reliable criterion. Because estimation is sequential, the family chosen at a given level depends on the choices made at preceding levels: the observations at one level are h-function transforms (conditional distribution functions, i.e. partial derivatives) of the preceding-level copulae. Selection uncertainty therefore accumulates up the trees, and the final model must be carefully validated.
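Per-pair AIC selection can be sketched for a small candidate set. This Python sketch makes several simplifying assumptions: only independence, Gaussian, Clayton and Gumbel (no Student-$t$, no rotations) are compared; parameters come from inverting Kendall's $\tau$ rather than from a per-family MLE; and the data are rank-based pseudo-observations. The helpers `loglik`, `select_family` and `simulate_clayton` are hypothetical names. Independence enters with log-likelihood 0 and no parameters, which links family selection to pruning.

```python
import math
import random
from statistics import NormalDist

_N = NormalDist()

def kendall_tau(x, y):
    """Kendall's tau-a, O(n^2); assumes no ties."""
    n, s = len(x), 0
    for i in range(n):
        for j in range(i + 1, n):
            p = (x[i] - x[j]) * (y[i] - y[j])
            s += (p > 0) - (p < 0)
    return 2.0 * s / (n * (n - 1))

def pseudo_obs(x):
    """Rank-based probability integral transform: rank / (n + 1)."""
    n = len(x)
    u = [0.0] * n
    for rank, i in enumerate(sorted(range(n), key=x.__getitem__), start=1):
        u[i] = rank / (n + 1)
    return u

def loglik(family, theta, u, v):
    """Copula log-likelihood sum_i log c(u_i, v_i; theta)."""
    ll = 0.0
    for a, b in zip(u, v):
        if family == "gaussian":
            x, y, r2 = _N.inv_cdf(a), _N.inv_cdf(b), 1 - theta * theta
            ll += -0.5 * math.log(r2) - (theta * theta * (x * x + y * y) - 2 * theta * x * y) / (2 * r2)
        elif family == "clayton":
            ll += (math.log1p(theta) - (1 + theta) * (math.log(a) + math.log(b))
                   - (2 + 1 / theta) * math.log(a ** -theta + b ** -theta - 1))
        elif family == "gumbel":
            x, y = -math.log(a), -math.log(b)
            s = x ** theta + y ** theta
            A = s ** (1 / theta)
            ll += (-A + x + y + (theta - 1) * math.log(x * y)
                   + (1 / theta - 2) * math.log(s) + math.log(A + theta - 1))
    return ll

def select_family(u, v):
    """Pick the family with the smallest AIC = -2 loglik + 2 * (#parameters)."""
    tau = kendall_tau(u, v)
    params = {"gaussian": math.sin(math.pi * tau / 2)}
    if 0 < tau < 1:  # unrotated Clayton/Gumbel only model positive dependence
        params["clayton"] = 2 * tau / (1 - tau)
        params["gumbel"] = 1 / (1 - tau)
    aic = {"independence": 0.0}
    for fam, theta in params.items():
        aic[fam] = -2 * loglik(fam, theta, u, v) + 2
    return min(aic, key=aic.get), aic

def simulate_clayton(theta, n, seed=0):
    """Clayton sample by conditional inversion, returned as pseudo-observations."""
    rng = random.Random(seed)
    u, v = [], []
    for _ in range(n):
        a, w = 1 - rng.random(), 1 - rng.random()  # both in (0, 1]
        u.append(a)
        v.append((a ** -theta * (w ** (-theta / (1 + theta)) - 1) + 1) ** (-1 / theta))
    return pseudo_obs(u), pseudo_obs(v)
```

For example, `select_family(*simulate_clayton(3.0, 600))` compares the four candidates on a lower-tail-dependent sample; in a full vine fit the same routine would be run on every edge of every tree.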
Parameter estimation — sequential vs joint
A PCC is a multivariate copula, so in principle its parameters can be estimated with any multivariate-copula estimator, such as the inference-functions-for-margins (IFM) method or the maximum pseudo-likelihood (MPL) estimator. But a $d$-dimensional vine has $d(d-1)/2$ pair-copulae, each with one or more parameters, so full joint MLE becomes computationally demanding in medium and high dimensions. Aas et al. (2009) therefore proposed a sequential method: estimate the parameters level by level, with each tree conditioning on the parameters already estimated for the preceding trees; the sequential estimates can then serve as starting values for a final joint MLE. Asymptotic properties of the sequential estimator are studied by Hobæk Haff (2013). With temporal dependence, vines are fitted on standardized residuals from ARIMA-GARCH filtering of the original series.
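Sequential estimation can be illustrated on the smallest non-trivial case, a three-dimensional D-vine with order 1-2-3. The sketch assumes that every pair-copula is Gaussian and, as a simplification, estimates each correlation by inverting Kendall's $\tau$ ($\rho = \sin(\pi\tau/2)$) instead of by per-edge maximum likelihood. The $T_1$ estimates are held fixed while the pseudo-observations for $T_2$ are computed through the Gaussian h-function. The joint refit and the function names are assumptions made for illustration, not part of Aas et al.'s description.

```python
import math
import random
from statistics import NormalDist

N = NormalDist()

def kendall_tau(x, y):
    """Kendall's tau-a, O(n^2); assumes no ties."""
    n, s = len(x), 0
    for i in range(n):
        for j in range(i + 1, n):
            p = (x[i] - x[j]) * (y[i] - y[j])
            s += (p > 0) - (p < 0)
    return 2.0 * s / (n * (n - 1))

def tau_to_rho(tau):
    """Gaussian copula: rho = sin(pi * tau / 2)."""
    return math.sin(math.pi * tau / 2)

def h_gauss(u, v, rho):
    """h(u | v; rho): conditional CDF of U given V = v under a Gaussian copula."""
    return N.cdf((N.inv_cdf(u) - rho * N.inv_cdf(v)) / math.sqrt(1 - rho * rho))

def sequential_dvine3(u1, u2, u3):
    """Tree-by-tree estimation for the D-vine 1-2-3 with Gaussian pair-copulae."""
    # Tree T1: unconditional edges 1-2 and 2-3, estimated directly from the data.
    r12 = tau_to_rho(kendall_tau(u1, u2))
    r23 = tau_to_rho(kendall_tau(u2, u3))
    # Pseudo-observations for T2, computed with the T1 estimates held fixed.
    a = [h_gauss(x, y, r12) for x, y in zip(u1, u2)]  # F(u1 | u2)
    b = [h_gauss(z, y, r23) for z, y in zip(u3, u2)]  # F(u3 | u2)
    # Tree T2: conditional edge (1, 3 | 2).
    r13_2 = tau_to_rho(kendall_tau(a, b))
    return r12, r23, r13_2

def simulate_dvine3(n, r12, r23, r13_2, seed=1):
    """Draw from the Gaussian D-vine 1-2-3 via a latent-normal construction."""
    rng = random.Random(seed)
    u1, u2, u3 = [], [], []
    for _ in range(n):
        z2, e1, e3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        x1 = r12 * z2 + math.sqrt(1 - r12 ** 2) * e1
        e3c = r13_2 * e1 + math.sqrt(1 - r13_2 ** 2) * e3  # corr(e1, e3c) = r13_2
        x3 = r23 * z2 + math.sqrt(1 - r23 ** 2) * e3c
        u1.append(N.cdf(x1))
        u2.append(N.cdf(z2))
        u3.append(N.cdf(x3))
    return u1, u2, u3
```

For a Gaussian vine the $T_2$ parameter equals the partial correlation $\rho_{13;2}$, so `sequential_dvine3(*simulate_dvine3(800, 0.7, 0.5, 0.4))` should return values close to `(0.7, 0.5, 0.4)`. In practice these would be the starting values for the joint MLE.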
Truncation and pruning (Eq. 7)
To curb the parameter explosion, replace as many pair-copulae as possible by the independence copula (a set of conditional independencies):
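- Pruning: test individual pair-copulae for independence and replace those for which independence is not rejected by the independence copula (e.g. a Kendall's-$\tau$ test, valid as an independence test only for Gaussian copulae, or the Cramér-von Mises test of Hobæk Haff & Segers).
- Truncation at level $k$: replace all pair-copulae in trees $T_{k+1}, \dots, T_{d-1}$ by independence copulae. The density of an R-vine truncated at level $k$ is
$$c(\mathbf{u})=\prod_{i=1}^{k}\prod_{e\in E_i} c_{a_e,b_e;D_e}\!\left(C_{a_e\mid D_e}(u_{a_e}\mid \mathbf{u}_{D_e}),\; C_{b_e\mid D_e}(u_{b_e}\mid \mathbf{u}_{D_e})\right),$$
where $E_i$ is the edge set of tree $T_i$, and edge $e$ has conditioned variables $a_e, b_e$ and conditioning set $D_e$ (empty for $i = 1$).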
At $k = 1$ the truncated R-vine is a Markov tree that models only unconditional relationships. Truncation is justified because Dißmann's bottom-up construction puts the strongest dependence in the first trees; upper-level estimates are uncertain (they rest on repeated h-function transformations) and barely affect lower-order dependencies. One way to choose the level is the procedure of Brechmann et al. (2012): start at $k = 1$, increase $k$ by one, and stop at the first level $k^*$ where Vuong's likelihood-ratio test finds the gain from an extra tree negligible.
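The pruning test, the parameter savings from truncation, and the stopping rule can be sketched as follows. Several assumptions apply. The Kendall's-$\tau$ test uses the standard asymptotic null distribution $\hat\tau \sim N\!\left(0, \tfrac{2(2n+5)}{9n(n-1)}\right)$. The Vuong statistic is the uncorrected version (penalized variants exist). The caller is assumed to supply pointwise log-likelihoods of the vine truncated at each level. All function names are hypothetical.

```python
import math
from statistics import NormalDist

N = NormalDist()

def tau_independence_pvalue(tau_hat, n):
    """Two-sided p-value of the asymptotic test of tau = 0 (pruning one pair-copula)."""
    z = abs(tau_hat) * math.sqrt(9 * n * (n - 1) / (2 * (2 * n + 5)))
    return 2 * (1 - N.cdf(z))

def n_pair_copulas(d, k):
    """Pair-copulae left after truncation at level k: sum_{i=1..k} (d - i)."""
    return k * d - k * (k + 1) // 2

def vuong_z(ll_big, ll_small):
    """Uncorrected Vuong statistic from pointwise log-likelihood differences."""
    m = [a - b for a, b in zip(ll_big, ll_small)]
    n = len(m)
    mean = sum(m) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in m) / (n - 1))
    return math.sqrt(n) * mean / sd

def choose_truncation(pointwise_ll, alpha=0.05):
    """pointwise_ll[k-1] holds per-observation log-likelihoods of the vine truncated at k.
    Increase k while adding tree k+1 is a significant (one-sided) improvement."""
    crit = N.inv_cdf(1 - alpha)
    k = 1
    while k < len(pointwise_ll):
        if vuong_z(pointwise_ll[k], pointwise_ll[k - 1]) <= crit:
            break  # the extra tree adds no significant fit: stop at level k
        k += 1
    return k
```

As a quick check on the savings, a 10-dimensional vine has 45 pair-copulae, while truncation at $k = 2$ keeps only `n_pair_copulas(10, 2)` = 17 of them.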
Model validation (goodness-of-fit)
GOF tests assess whether the fitted vine fits the data. Early proposals use the probability integral transform (PIT) (Rosenblatt 1952; Aas et al. 2009) and the Breymann et al. transformation, plus tests based on the empirical copula and Kendall’s process. Newer high-power tests come from the information-matrix equality and a specification test (Schepsmeier 2015, 2016), shown to have excellent size and power in high dimensions.
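The PIT idea behind the early tests can be sketched for the same hypothetical Gaussian D-vine 1-2-3 used for illustration elsewhere in this note. The Rosenblatt transform maps each observation to $(e_1, e_2, e_3)$ with $e_1 = u_1$, $e_2 = F(u_2 \mid u_1)$, $e_3 = F(u_3 \mid u_1, u_2)$. The last term is computed as an h-function of $F(u_3 \mid u_2)$ given $F(u_1 \mid u_2)$. Under a correctly specified model the $e_j$ are i.i.d. uniform, and Breymann et al. aggregate them as $S = \sum_j \Phi^{-1}(e_j)^2 \sim \chi^2_d$. A formal test of the distribution of $S$ (e.g. Anderson-Darling) is left out of this sketch. The Gaussian families and all function names are assumptions.

```python
import math
import random
from statistics import NormalDist

N = NormalDist()

def h_gauss(u, v, rho):
    """h(u | v; rho): conditional CDF of U given V = v under a Gaussian copula."""
    return N.cdf((N.inv_cdf(u) - rho * N.inv_cdf(v)) / math.sqrt(1 - rho * rho))

def rosenblatt_dvine3(u1, u2, u3, r12, r23, r13_2):
    """Rosenblatt transform of one observation under the Gaussian D-vine 1-2-3."""
    e1 = u1
    e2 = h_gauss(u2, u1, r12)                  # F(u2 | u1)
    e3 = h_gauss(h_gauss(u3, u2, r23),         # F(u3 | u2) ...
                 h_gauss(u1, u2, r12), r13_2)  # ... given F(u1 | u2)
    return e1, e2, e3

def breymann_stats(U, params):
    """S_i = sum_j Phi^{-1}(e_ij)^2, chi^2_3-distributed if the model is correct."""
    return [sum(N.inv_cdf(e) ** 2 for e in rosenblatt_dvine3(*u, *params)) for u in U]

def simulate_dvine3(n, r12, r23, r13_2, seed=1):
    """Draw from the Gaussian D-vine 1-2-3 via a latent-normal construction."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z2, e1, e3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        x1 = r12 * z2 + math.sqrt(1 - r12 ** 2) * e1
        e3c = r13_2 * e1 + math.sqrt(1 - r13_2 ** 2) * e3
        x3 = r23 * z2 + math.sqrt(1 - r23 ** 2) * e3c
        rows.append((N.cdf(x1), N.cdf(z2), N.cdf(x3)))
    return rows
```

With data drawn from the fitted model, the sample mean of `breymann_stats` should be near 3, the mean of $\chi^2_3$. Systematic departures point to a misspecified family or structure.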
Examples
Fitting a 4-stock portfolio vine (workflow)
- Filter each return series with AR-GARCH; take standardized residuals; probability-integral-transform to uniforms.
- Build $T_1$ as a maximum spanning tree on the absolute empirical Kendall's $\tau$ (Dißmann); fit each edge's family by AIC.
- Propagate pseudo-observations via h-functions; build $T_2$ (and the higher trees) under the proximity condition.
- Optionally truncate after the level where Vuong’s test shows no significant gain; refit jointly by MLE using sequential estimates as starting values.
- Validate with a PIT/information-matrix GOF test before using the model for VaR/cVaR.
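The first workflow step can be sketched as follows, under explicit assumptions. The AR(1)-GARCH(1,1) parameters are taken as already estimated (estimation itself is not shown), the variance recursion starts at the unconditional variance, and the probability integral transform is done with ranks (empirical CDF) rather than a fitted innovation distribution. The function names and the example parameter values are hypothetical.

```python
import math

def ar1_garch11_residuals(r, mu, phi, omega, alpha, beta):
    """Standardized residuals z_t = eps_t / sigma_t of an AR(1)-GARCH(1,1) filter
    with given (pre-estimated) parameters; the first return is lost to the AR lag."""
    eps = [r[t] - mu - phi * r[t - 1] for t in range(1, len(r))]
    sigma2 = omega / (1 - alpha - beta)  # start at the unconditional variance
    z = []
    for t, e in enumerate(eps):
        if t > 0:
            sigma2 = omega + alpha * eps[t - 1] ** 2 + beta * sigma2
        z.append(e / math.sqrt(sigma2))
    return z

def pseudo_obs(x):
    """Rank-based PIT: rank / (n + 1), strictly inside (0, 1)."""
    n = len(x)
    u = [0.0] * n
    for rank, i in enumerate(sorted(range(n), key=x.__getitem__), start=1):
        u[i] = rank / (n + 1)
    return u
```

Applied column by column, `pseudo_obs(ar1_garch11_residuals(r, 0.0, 0.1, 1e-5, 0.05, 0.9))` turns one return series into the uniform margins on which the vine is fitted. With four stocks, this yields the four input columns for the structure search.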
Connections
- Pair-Copula Constructions — the density whose families/parameters are being selected and estimated.
- C-vines, D-vines, and Regular Vines: the structure space ($\frac{d!}{2}\cdot 2^{\binom{d-2}{2}}$ R-vines on $d$ variables) that Dißmann's algorithm searches.
- The Simplifying Assumption — what makes per-edge family selection and sequential estimation coherent.
- Copula Estimation — estimation of the bivariate building blocks (IFM/MPL at the pair level).
- Dependence Measures for Copulas: Kendall's $\tau$ and other rank measures used as spanning-tree edge weights.
See Also
- SMM Estimation of Factor Copulas — contrast: simulated method of moments for the factor-copula architecture (no closed-form likelihood) vs. sequential MLE for vines.
- Factor Analysis and PPCA — latent-structure alternative to greedy tree selection.
- Dependence Modeling