Variation partitioning

The main idea...

Variation partitioning (VP) was introduced by Borcard et al. (1992) and is growing in popularity in ecological analysis and modelling (e.g. Bienhold et al., 2012; Dray et al., 2012; Gobet et al., 2012). The method attempts to "partition", or resolve, the explanatory power of several explanatory matrices in relation to the same response matrix (Figure 1).

VP may be used with redundancy analysis (RDA) or canonical correspondence analysis (CCA). If RDA is used, multiple partial RDAs are run to determine the partial, linear effect of each explanatory matrix on the response data. If CCA is used (e.g. Økland and Eilertsen, 1994; see Warnings), the total inertia of the response matrix is partitioned rather than its total variance.
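
As a minimal sketch of an RDA-based VP in R (using the oribatid mite data sets bundled with the vegan package; the choice of explanatory variables is purely illustrative):

    # Load vegan and its bundled oribatid mite data sets
    library(vegan)
    data(mite)       # response matrix Y: species abundances
    data(mite.env)   # explanatory matrix X: environmental variables
    data(mite.pcnm)  # explanatory matrix W: spatial (PCNM) variables

    # Hellinger-transform the abundances so that a linear method (RDA) is appropriate
    Y <- decostand(mite, method = "hellinger")

    # Partition the variation in Y between the two explanatory matrices
    vp <- varpart(Y, ~ SubsDens + WatrCont, mite.pcnm, data = mite.env)
    vp  # prints the individual and shared fractions as adjusted R2 values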
 

[Figure 1: panels a and b]
Figure 1: Conceptual illustration of partitioning the variation in a response matrix (Y) between two explanatory matrices (X and W). Panels a and b illustrate the same process: matrices X and W (both explanatory) each explain a portion of the variability in matrix Y (a response matrix). Each explanatory matrix uniquely explains a portion of the variation in Y (X explains partition "a" and W explains partition "c"). Additionally, both matrices are able to explain another portion of the variation in Y (partition "b"). The residual variation in matrix Y (partition "d") is not explained by X or W. More matrices may be included in the analysis; however, this quickly increases the complexity of interpretation and, if replication is low, reduces statistical power. A clear experimental and sampling design with a reasonable scope will greatly benefit this analysis (and many others).
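
In practice, the individual fractions are not estimated directly: they are obtained by subtraction from the (adjusted) R2 values of three constrained models. A sketch of this arithmetic, continuing the example above (X and W are illustrative names for the two explanatory tables):

    # Explanatory tables from the sketch above
    X <- mite.env[, c("SubsDens", "WatrCont")]
    W <- mite.pcnm

    # (Adjusted) R2 values of the three possible constrained models
    r2_ab  <- RsquareAdj(rda(Y, X))$adj.r.squared            # fractions a + b
    r2_bc  <- RsquareAdj(rda(Y, W))$adj.r.squared            # fractions b + c
    r2_abc <- RsquareAdj(rda(Y, cbind(X, W)))$adj.r.squared  # fractions a + b + c

    # Fractions of Figure 1, by subtraction
    frac_a <- r2_abc - r2_bc          # unique to X
    frac_c <- r2_abc - r2_ab          # unique to W
    frac_b <- r2_ab + r2_bc - r2_abc  # shared by X and W
    frac_d <- 1 - r2_abc              # residual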

Results and interpretation

Many implementations will deliver the following results:

Total variation in "Y": The total variation in the response data is reported. For RDA-based VP this is typically the total sum of squared deviations from the mean of each variable; for CCA it is the total inertia.

R2: The coefficients of determination (R2 values) estimate how much variation has been 'explained' by a given partition. When using RDA, these are simply the proportion of the total variance accounted for by a given set of constraints. Coefficients of both partial and semi-partial determination (see partial and semi-partial correlation below) may be available.

adjusted R2: Adjusted coefficients of determination (Adj. R2 values) take the number of explanatory variables used in an analysis into account. This prevents R2 coefficients from taking misleadingly high values simply because more variables are included in the model; such 'inflation' occurs even if the added variables have no 'true' explanatory power or are redundant relative to other explanatory variables. These adjusted estimates should be preferred over the unadjusted ones. Note that no adjusted estimates are available when VP is based on CCA.
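
For reference, the adjustment typically applied in RDA-based VP is Ezekiel's formula; a sketch (with made-up numbers for illustration):

    # Ezekiel's adjustment: penalises R2 for the number of explanatory
    # variables (m) relative to the number of sites (n)
    adjusted_r2 <- function(r2, n, m) {
      1 - (1 - r2) * (n - 1) / (n - m - 1)
    }

    # e.g. an unadjusted R2 of 0.30 obtained from 30 sites and 5 variables
    adjusted_r2(0.30, n = 30, m = 5)  # approximately 0.154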

partial correlation: When used with RDA, the partial correlation coefficient may be extracted from a VP analysis and tested for significance. This coefficient expresses the proportion of variation exclusively explained by one explanatory matrix (e.g. partition "a" in Figure 1) relative to the sum of the unexplained variation and the variation of interest (i.e. the sum of partitions "a" and "d" in Figure 1). It controls for ('ignores') the influence of any other explanatory matrices as well as their overlap with the explanatory matrix of interest (partition "b" in Figure 1).

semi-partial correlation: When used with RDA, the semi-partial correlation coefficient may be extracted from a VP analysis and tested for significance. This coefficient expresses the proportion of variation exclusively explained by one explanatory matrix (e.g. partition "a" in Figure 1) relative to all the explained and unexplained variation.
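
Expressed in terms of the fractions of Figure 1 (using the illustrative frac_* objects computed in the sketch above):

    # Semi-partial R2 of X: unique fraction relative to all variation in Y
    # (the four fractions sum to 1, so this equals frac_a itself)
    semipartial_r2_X <- frac_a / (frac_a + frac_b + frac_c + frac_d)

    # Partial R2 of X: unique fraction relative to the variation that remains
    # once the other matrix and the shared fraction are set aside
    partial_r2_X <- frac_a / (frac_a + frac_d)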

significance: P-values associated with the R2 values are calculated by permutation. The overall model and the individual partitions (where possible) are tested for significance. Only if the overall model is significant should the individual partitions be examined. Note that not all partitions can be tested for significance (see Warnings).
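
A sketch of the corresponding permutation tests in vegan, continuing the example above (999 permutations is an arbitrary but common choice):

    # Overall model: fractions a + b + c
    anova(rda(Y, cbind(X, W)), permutations = 999)

    # Unique fraction of X (fraction a): partial RDA conditioning on W
    anova(rda(Y, X, W), permutations = 999)

    # Unique fraction of W (fraction c): partial RDA conditioning on X
    anova(rda(Y, W, X), permutations = 999)

    # The shared fraction (b) and the residual (d) cannot be tested (see Warnings)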

  

The results of a VP analysis are often displayed as a Venn diagram (Figure 2).

Figure 2: Fictional Venn diagram displaying the results of a variation partitioning analysis. Three explanatory matrices were used here, containing variables pertaining to space, time, and environmental factors. The bounding rectangle represents the total variation in the response data while each circle represents the portion of variation accounted for by an explanatory matrix or a combination of explanatory matrices. The elements of such figures are rarely drawn to scale.
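
With vegan, such a diagram can be drawn directly from a fitted varpart object (vp from the sketch above); showvarparts() draws an empty template of the layout:

    plot(vp, digits = 2)  # Venn-style display of the adjusted fractions
    showvarparts(2)       # unlabelled template for a two-matrix design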


Warnings
  • A shared partition (e.g. partition "b" in Figure 1) is not an interaction term (as in ANOVA-like analyses), cannot be tested, and hence cannot be assigned a significance value. It is simply the variation in the response data that could be explained by either explanatory matrix; that is, the explanatory matrices are redundant over this partition. The larger this fraction, the more multicollinearity is present in the model.
  • Using VP with canonical correspondence analysis (CCA) is not universally accepted: the inertias of partitions derived from CCA are not truly comparable. However, it is possible to "partial out" (i.e. control for, remove the effect of) an additional explanatory matrix via partial CCA (see the sketch after this list).
  • If two explanatory matrices (or variables) are highly redundant (i.e. they are collinear and thus show high covariation or correlation among their variables, as determined by e.g. Mantel testing), their shared explanatory power (Figure 1, partition "b") is likely to be large relative to the partitions attributable to each matrix uniquely (Figure 1, partitions "a" and "c"). Including such redundant variables in further, related investigations is therefore questionable, as they contribute little unique explanatory power.
  • When examining the output of a VP, you may notice that the variance explained by the individual partitions plus the residual variance exceeds 1 (i.e. 100%). Negative explained variances are also possible. These are artifacts of the VP procedure which arise when certain relationships are present in the data; no solution is available to remove them meaningfully.
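
As a sketch of partialling out a matrix with partial CCA (again using the mite data; the choice of constrained and conditioned variables is illustrative only):

    # CCA of the raw abundances constrained by two environmental variables,
    # with the spatial (PCNM) matrix partialled out via Condition()
    pcca <- cca(mite ~ SubsDens + WatrCont + Condition(as.matrix(mite.pcnm)),
                data = mite.env)
    anova(pcca, permutations = 999)  # tests the conditional (partial) effect
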
Walkthroughs featuring variation partitioning
Implementations
  • R
    • varpart() in the vegan package can perform VP with up to four explanatory matrices (or model formulas); see the sketch below.
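
A brief sketch of a call with more than two explanatory tables, reusing the objects from the examples above (the grouping of variables is arbitrary and only meant to show the syntax):

    # Three explanatory tables: two environmental subsets and the spatial matrix
    vp3 <- varpart(Y, ~ SubsDens + WatrCont, ~ Substrate + Shrub, mite.pcnm,
                   data = mite.env)
    plot(vp3, digits = 2)
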
References