Canonical Correspondence Analysis

The main idea...

Canonical correspondence analysis (CCA) is the canonical form of correspondence analysis (CA). As a form of direct gradient analysis, wherein a matrix of explanatory variables intervenes in the calculation of the CA solution, only correspondence that can be 'explained' by the matrix of explanatory variables is represented in the final results. 

As with CA, this technique is suitable for response variables showing unimodal distributions and preserves χ2 (chi-squared) distances between objects. In fact, the solution can be computed by passing a χ2-transformed matrix of the response data to a form of redundancy analysis (RDA) which uses object marginal sums (row totals) as weights. The result of this weighted RDA is that only the variation in the response variables that is maximally related to linear combinations of the explanatory variables is ordinated in a Euclidean space. The resulting axes are the canonical variables. The correlation of the explanatory variables with the final ordination determines their 'importance'.
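As a rough illustration of the χ2 distance mentioned above, the following sketch computes it between two rows of a small objects × variables table. The table values and the function name `chi2_distance` are made up for this example; the formula compares row profiles (relative abundances) and weights each variable by the inverse of its relative column total.

```python
from math import sqrt

def chi2_distance(table, i, h):
    """Chi-squared distance between rows i and h of an objects x variables table."""
    grand_total = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    row_i_total = sum(table[i])
    row_h_total = sum(table[h])
    acc = 0.0
    for j, col_total in enumerate(col_totals):
        # Difference between the two row profiles for variable j
        diff = table[i][j] / row_i_total - table[h][j] / row_h_total
        # Weight by the inverse of the variable's relative frequency
        acc += diff * diff / (col_total / grand_total)
    return sqrt(acc)

# Hypothetical sites x species counts (illustration only)
counts = [
    [10, 0, 5],
    [20, 0, 10],
    [0, 8, 2],
]

d01 = chi2_distance(counts, 0, 1)  # identical profiles, different totals
d02 = chi2_distance(counts, 0, 2)  # different profiles
print(d01, d02)
```

Because the distance is computed on profiles rather than raw counts, the first two sites (same proportions, different totals) are at distance zero from one another.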

Legendre and Legendre (1998) note that CCA can be used to relate a qualitative explanatory variable to unimodal response data. The qualitative variable is recoded as a dummy variable and CCA is run. The fitted site scores provide a quantitative rescaling of the qualitative explanatory variable.
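The recoding step described above can be sketched as follows. The helper name `dummy_code` and the sediment labels are assumptions for illustration; most statistical software performs this recoding automatically and usually drops one column to avoid collinearity.

```python
def dummy_code(levels):
    """Recode a qualitative variable as 0/1 indicator columns, one per state."""
    categories = sorted(set(levels))
    rows = [[1 if level == category else 0 for category in categories]
            for level in levels]
    return categories, rows

# Hypothetical sediment type recorded at four sites
sediment = ["sand", "silt", "clay", "sand"]
categories, dummies = dummy_code(sediment)

print(categories)  # ['clay', 'sand', 'silt']
for row in dummies:
    print(row)
```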


Figure 1: An illustrative schematic of a CCA triplot. Filled circles represent objects (e.g. sampling sites). Hollow circles represent response variables (e.g. OTU abundances). Arrows represent quantitative explanatory variables (here, nutrient concentrations) with arrowheads indicating their direction of increase. Filled triangles represent the states of a categorical explanatory variable (e.g. sand, silt, or clay sediment type). See Figure 2 for guidance on reading a CCA triplot.

Results and interpretation

Many implementations of CCA will report the total inertia of the solution alongside the inertia that was successfully constrained by the explanatory variables. The quotient of the constrained inertia over the total inertia indicates how good the overall 'fit' was. Further, each CCA axis is associated with an eigenvalue. For constrained axes (i.e. those that are linear combinations of the explanatory variables), the eigenvalues sum to the total constrained inertia. Dividing the eigenvalue of a constrained axis by the total constrained inertia therefore gives the share of the constrained inertia represented by that axis.
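The quotient of constrained over total inertia can be illustrated with a minimal sketch. It assumes a single quantitative explanatory variable, in which case there is only one constrained axis and its eigenvalue equals the constrained inertia; a full CCA with several explanatory variables additionally needs an eigen-decomposition, which is not shown. The data and all function names are hypothetical, and real implementations may differ in detail.

```python
from math import sqrt

def chi2_contributions(table):
    """Return (q, row_weights); q holds signed contributions to the chi-squared statistic."""
    total = sum(sum(row) for row in table)
    p = [[v / total for v in row] for row in table]
    row_w = [sum(row) for row in p]
    col_w = [sum(col) for col in zip(*p)]
    q = [[(p[i][j] - row_w[i] * col_w[j]) / sqrt(row_w[i] * col_w[j])
          for j in range(len(col_w))]
         for i in range(len(row_w))]
    return q, row_w

def total_inertia(q):
    """Total inertia: sum of squared chi-squared contributions."""
    return sum(v * v for row in q for v in row)

def constrained_inertia(q, row_w, x):
    """Inertia explained by one explanatory variable x (weighted regression)."""
    # Centre x on its row-weighted mean, then apply the square-root row weights
    mean = sum(w * v for w, v in zip(row_w, x))
    xw = [sqrt(w) * (v - mean) for w, v in zip(row_w, x)]
    # Squared length of the projection of every column of q onto xw
    proj = [sum(xw[i] * q[i][j] for i in range(len(xw)))
            for j in range(len(q[0]))]
    return sum(v * v for v in proj) / sum(v * v for v in xw)

# Hypothetical sites x species counts and one explanatory variable
counts = [
    [12, 5, 0, 1],
    [8, 9, 2, 0],
    [3, 10, 6, 2],
    [1, 4, 9, 7],
    [0, 1, 5, 12],
]
nitrate = [0.5, 1.1, 2.0, 3.2, 4.1]

q, row_w = chi2_contributions(counts)
total = total_inertia(q)
constrained = constrained_inertia(q, row_w, nitrate)
print(f"total inertia:       {total:.4f}")
print(f"constrained inertia: {constrained:.4f}")
print(f"proportion:          {constrained / total:.3f}")
```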

The correlation of the canonical axes with the explanatory matrix is reported, as well as the significance of each correlation as determined by permutation. Significance can be tested for the overall solution or for individual ordination axes (and their eigenvalues) derived from the response data. Note that individual axes should only be examined if the overall solution was significant. The hypothesised relationship between the matrix of response variables and that of explanatory variables is tested by permuting the rows of one matrix a sufficient number of times to establish a null distribution of the test statistic.
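The permutation logic can be sketched for the simplest case of one explanatory variable. This is an assumption-laden illustration, not the procedure of any particular package: the data are hypothetical, the test statistic is a pseudo-F ratio of constrained to residual inertia, and the explanatory values are simply shuffled across objects.

```python
import random
from math import sqrt

def chi2_contributions(table):
    """Return (q, row_weights); q holds signed contributions to the chi-squared statistic."""
    total = sum(sum(row) for row in table)
    p = [[v / total for v in row] for row in table]
    row_w = [sum(row) for row in p]
    col_w = [sum(col) for col in zip(*p)]
    q = [[(p[i][j] - row_w[i] * col_w[j]) / sqrt(row_w[i] * col_w[j])
          for j in range(len(col_w))]
         for i in range(len(row_w))]
    return q, row_w

def pseudo_f(q, row_w, x):
    """Pseudo-F for a single explanatory variable (1 and n - 2 degrees of freedom)."""
    n = len(x)
    total = sum(v * v for row in q for v in row)
    mean = sum(w * v for w, v in zip(row_w, x))
    xw = [sqrt(w) * (v - mean) for w, v in zip(row_w, x)]
    proj = [sum(xw[i] * q[i][j] for i in range(n)) for j in range(len(q[0]))]
    constrained = sum(v * v for v in proj) / sum(v * v for v in xw)
    residual = total - constrained
    return constrained / (residual / (n - 2))

def permutation_p_value(table, x, n_perm=999, seed=0):
    """Share of permutations with a pseudo-F at least as large as the observed one."""
    q, row_w = chi2_contributions(table)
    observed = pseudo_f(q, row_w, x)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = list(x)
        rng.shuffle(shuffled)  # break the link between objects and x
        if pseudo_f(q, row_w, shuffled) >= observed:
            hits += 1
    # The observed arrangement counts as one member of the null distribution
    return (hits + 1) / (n_perm + 1)

# Hypothetical sites x species counts and one explanatory variable
counts = [
    [12, 5, 0, 1],
    [8, 9, 2, 0],
    [3, 10, 6, 2],
    [1, 4, 9, 7],
    [0, 1, 5, 12],
]
nitrate = [0.5, 1.1, 2.0, 3.2, 4.1]

print("permutation p-value:", permutation_p_value(counts, nitrate))
```

With only five objects there are just 120 distinct permutations, so the attainable p-values are coarse; real data sets with more objects give a finer null distribution.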

Reading CCA triplots

As in CA, the distances between points representing objects and response variables in a CCA plot are χ2 distances and must be interpreted as such. The type of scaling used (see below and Figure 2) will determine whether object-to-object or (response) variable-to-variable distances are meaningful. In general, object-to-variable distances are not readily interpretable; however, a smaller object-to-variable distance indicates an increased probability of a given variable being 'present at', 'abundant at', or otherwise influential for a given object.

Figure 2: Illustrative example of CCA triplot interpretation using a) type 1 scaling and b) type 2 scaling. a) This example focuses on two objects ("o1", "o2"), three quantitative explanatory variables ("Nitrate", "Phosphate", "Silicate") represented by vectors (arrows) pointing in the direction of increase and extended for clarity (dashed lines), and three states of a nominal (qualitative) variable, sediment type ("Sand", "Silt", "Clay"). Orthogonal projections are shown as dotted red lines. Object "o1" is very likely to be found in clay sediments, while object "o2" is more likely to be found in sand sediments. Perpendicular projections of object "o1" onto the quantitative explanatory variables suggest it realises high values of nitrate concentration, mid-to-low values of phosphate concentration, and low values of silicate concentration. Object "o2" realises high values of phosphate concentration, mid-range values of silicate concentration, and low values of nitrate concentration. b) This example is similar to that in (a); however, points representing response variables ("v1", "v2") are now the focus of interpretation. Variable "v1" is likely to reach its maximum (e.g. highest abundance) in silty sediments at high concentrations of nitrate (projection not shown), mid-to-low concentrations of silicate, and low concentrations of phosphate. Variable "v2" is likely to reach its maximum in sandy sediments, at mid-to-high phosphate concentrations, low nitrate concentrations, and high silicate concentrations.


Scaling in CCA

 Type 1 scaling emphasises the relationships among objects. Thus:
  • Objects act as the centroids of the response variables and the distances between object points indicate their χ2 distances.
  • A right-angled projection of an object point onto a vector representing a quantitative explanatory variable approximates the value of the variable realised for that object.
  • Objects near centroids representing states of categorical or qualitative variables are more likely to realise that state.
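The right-angled projection used when reading type 1 triplots can be sketched with a little vector arithmetic. The coordinates below are invented for illustration and do not come from a fitted ordination; the function name `projection_along_arrow` is likewise hypothetical. A larger projected value suggests a higher value of the explanatory variable for that object, relative to the other objects.

```python
from math import hypot

def projection_along_arrow(point, arrow):
    """Signed length of the orthogonal projection of a point onto an arrow from the origin."""
    dot = point[0] * arrow[0] + point[1] * arrow[1]
    return dot / hypot(arrow[0], arrow[1])

# Hypothetical plot coordinates (illustration only)
objects = {"o1": (1.0, 0.5), "o2": (-0.5, 1.0)}
nitrate_arrow = (2.0, 0.0)

# Rank objects from highest to lowest approximate nitrate value
ranking = sorted(
    objects,
    key=lambda name: projection_along_arrow(objects[name], nitrate_arrow),
    reverse=True,
)
print(ranking)  # ['o1', 'o2']
```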

 Type 2 scaling emphasises the relationships among response variables. Thus:
  • Response variables act as the centroids of the objects and the distances between response variable points indicate their χ2 distances.
  • A right-angled projection of a point representing a response variable onto an arrow representing an explanatory variable indicates the position of the maximum value (the optimum) of the response variable along that explanatory variable.
  • The closer a point representing a response variable is to the centroid representing a state of a categorical explanatory variable, the more likely that response variable is to have higher values at that state.
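The centroid relationship in type 2 scaling can be sketched as an abundance-weighted average of object coordinates. The object scores and abundances below are invented for illustration, and the exact set of scores an implementation uses for this calculation may differ.

```python
def weighted_centroid(points, weights):
    """Weighted average position of 2-D points."""
    total = sum(weights)
    x = sum(w * p[0] for p, w in zip(points, weights)) / total
    y = sum(w * p[1] for p, w in zip(points, weights)) / total
    return (x, y)

# Hypothetical object (site) scores on the first two axes
site_scores = [(-1.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
# Hypothetical abundances of response variable "v1" at those sites
v1_abundance = [0, 2, 6]

v1_position = weighted_centroid(site_scores, v1_abundance)
print(v1_position)  # (0.75, 0.25)
```

The response variable is pulled towards the objects where it is most abundant, which is why proximity to an object or to a class centroid suggests higher values there.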

Assumptions

  • Response variables show unimodal distributions across objects. If dealing with a sites × species (or OTUs) matrix, this suggests that a sampling gradient must be long enough to allow the increase and decrease of a given species or OTU across the sites sampled. Gradients that are too short may manifest linear responses and may be better handled by redundancy analysis (RDA), although CCA may also handle linear relationships.
  • Explanatory variables show linear, causal relationships to the response data. If one is unsure whether there is a causal relationship between an explanatory variable and the response data, interpretation should be performed with care.
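The first assumption above, on gradient length, can be illustrated with a simple Gaussian response curve. The optimum, tolerance, and gradient values are assumptions chosen for this sketch: over the long gradient the response rises and falls, whereas over the short gradient, which never reaches the optimum, it only increases and looks roughly monotonic.

```python
from math import exp

def gaussian_response(x, optimum, tolerance, maximum):
    """Abundance of a variable with a unimodal (Gaussian) response to x."""
    return maximum * exp(-((x - optimum) ** 2) / (2 * tolerance ** 2))

def peaks_inside(values):
    """True if the largest value is neither the first nor the last element."""
    peak = values.index(max(values))
    return 0 < peak < len(values) - 1

# Hypothetical gradients: a long one (0 to 10) and a short one (0 to 3)
long_gradient = [i * 1.0 for i in range(11)]
short_gradient = [i * 0.3 for i in range(11)]

long_response = [gaussian_response(x, 5.0, 1.5, 100.0) for x in long_gradient]
short_response = [gaussian_response(x, 5.0, 1.5, 100.0) for x in short_gradient]

print(peaks_inside(long_response))   # True: rise and fall are both sampled
print(peaks_inside(short_response))  # False: only the rising limb is sampled
```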

Warnings

  • The variables in the explanatory matrix should be chosen with care, i.e. there should be a good rationale behind their inclusion. If explanatory variables are included too liberally, there is an increased risk of distorting the CCA results.
  • Only examine the significance and effects of individual axes if the overall CCA solution is found to be significant.
  • The algorithm used to compute a CCA solution and the exact meaning of the scaling modes may vary across implementations. Carefully review how results should be interpreted in each implementation used.

Implementations

  • R
    • The cca() function in the vegan package. Significance may be tested by permutation with the anova.cca() function.
  • MASAME CCA app

References

  • Legendre P, Legendre L. Numerical Ecology. 2nd ed. Amsterdam: Elsevier, 1998. ISBN 978-0444892508.