The main idea...
Redundancy analysis (RDA) is a method to extract and summarise the variation in a set of response variables that can be explained by a set of explanatory variables. More accurately, RDA is a direct gradient analysis technique which summarises linear relationships between components of response variables that are "redundant" with (i.e. "explained" by) a set of explanatory variables. To do this, RDA extends multiple linear regression (MLR) by allowing regression of multiple response variables on multiple explanatory variables (Figure 1). A matrix of the fitted values of all response variables generated through MLR is then subjected to principal components analysis (PCA). RDA can also be considered a constrained version of PCA, wherein canonical axes (built from linear combinations of response variables) must also be linear combinations of the explanatory variables (i.e. fitted by MLR). The RDA approach generates one ordination in the space defined by the matrix of response variables and another in the space defined by the matrix of explanatory variables. Residuals generated by the MLR step, which yield non-canonical axes, may also be ordinated. Detailed discussion is available in Legendre and Legendre (1998).
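The two steps described above (MLR, then PCA of the fitted values) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions: both matrices are centred, the regression is ordinary least squares, and eigenvalues are expressed as variances (divided by n - 1). The function name `rda` is hypothetical; this is not the algorithm of any particular RDA package, and packages may scale eigenvalues differently.

```python
import numpy as np

def rda(Y, X):
    """Minimal RDA sketch (hypothetical helper): regress every response on X,
    then run a PCA on the fitted values.

    Y: (n objects x p response variables); X: (n objects x m explanatory variables).
    Returns constrained and unconstrained eigenvalues, the eigenvectors of the
    fitted values (one column per axis), the fitted values and the residuals.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    n = Y.shape[0]
    # Assumption: both matrices are centred on their column means.
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    # Step 1 (MLR): one least-squares fit for all response variables at once.
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    Y_fit = Xc @ B      # part of Y that is redundant with X
    Y_res = Yc - Y_fit  # part of Y that is not
    # Step 2 (PCA of fitted values): squared singular values / (n - 1) are the
    # eigenvalues of the covariance matrix of the fitted values (canonical axes).
    _, s, Vt = np.linalg.svd(Y_fit, full_matrices=False)
    eig_constrained = s ** 2 / (n - 1)
    # PCA of the residuals gives the unconstrained (non-canonical) eigenvalues.
    eig_unconstrained = np.linalg.svd(Y_res, compute_uv=False) ** 2 / (n - 1)
    return eig_constrained, eig_unconstrained, Vt.T, Y_fit, Y_res

# Usage with simulated data: 4 responses driven by 2 explanatory variables plus noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
Y = X @ rng.normal(size=(2, 4)) + 0.5 * rng.normal(size=(30, 4))
eig_c, eig_u, U, Y_fit, Y_res = rda(Y, X)
total = Y.var(axis=0, ddof=1).sum()
print(eig_c.round(3))                                 # at most 2 non-zero canonical eigenvalues
print(np.isclose(eig_c.sum() + eig_u.sum(), total))   # True
```

Because the fitted values and residuals are orthogonal, the constrained and unconstrained eigenvalues together add up to the total variance of the centred response matrix, and with m explanatory variables there can be at most m non-zero canonical eigenvalues.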
Pre-analysis
- If your response variables are not dimensionally homogeneous (i.e. if they have different base units of measurement), you may centre them on their means or standardise them using, for example, z-scoring. However, it is not advisable to standardise raw count data.
- Ensure the number of explanatory variables is less than the number of objects (sites, samples, observations, etc.) in your data matrices. If not, the regression is underdetermined (it has more parameters than objects) and the response data will be fitted perfectly.
- If your explanatory variables are not dimensionally homogeneous (e.g. have different physical units), centre them on their means and standardise them. Standardisation allows direct comparison of regression coefficients, which would otherwise be expressed on different scales. Further, Legendre and Legendre (1998) note that RDA can be used to relate a qualitative explanatory variable to linear response data: the qualitative variable is recoded as dummy (binary) variables and RDA is run. The fitted site scores then provide a quantitative rescaling of the qualitative explanatory variable.
- Examine the distribution of each variable in your explanatory and response matrices, as well as plots of each variable against the other variables in its own matrix and in the other matrix. If the relationships are markedly non-linear, apply transformations to linearise them and reduce the effect of outliers.
- If you wish to represent non-Euclidean relationships between objects (e.g. Hellinger distances) in an RDA ordination, apply an appropriate ecologically motivated transformation (e.g. the Hellinger transformation) to the response data before analysis.
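The pre-analysis transformations mentioned above can be sketched with NumPy. This assumes objects in rows and variables in columns; `zscore`, `hellinger` and `dummy_code` are hypothetical helpers, not functions of any RDA package. Coding k levels as k - 1 dummy columns is one common choice, assumed here. Euclidean distances computed on Hellinger-transformed data are Hellinger distances, which is why the transformation lets a (Euclidean) RDA ordination represent them.

```python
import numpy as np

def zscore(A):
    """Standardise each column: centre on its mean, divide by its standard deviation."""
    A = np.asarray(A, dtype=float)
    return (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)

def hellinger(counts):
    """Hellinger transformation: square root of each count divided by its row total.
    Assumes every object (row) has a non-zero total."""
    counts = np.asarray(counts, dtype=float)
    return np.sqrt(counts / counts.sum(axis=1, keepdims=True))

def dummy_code(labels):
    """Recode a qualitative variable with k levels as k - 1 binary (0/1) columns.
    Assumption: the first level in sorted order is the reference and gets no column."""
    levels = sorted(set(labels))
    return np.array([[1.0 if lab == lev else 0.0 for lev in levels[1:]]
                     for lab in labels])

# Usage (illustrative values only)
X = np.array([[1.0, 200.0], [2.0, 180.0], [3.0, 260.0]])  # two different units
Z = zscore(X)                                   # columns now have mean 0 and SD 1
H = hellinger([[4, 0, 1], [0, 9, 0]])           # second row becomes [0, 1, 0]
D = dummy_code(["grazed", "fenced", "grazed"])  # reference level is "fenced"
print(D.ravel())                                # [1. 0. 1.]
```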
Results
- RDA produces an ordination that summarises the main patterns of variation in the response matrix which can be explained by a matrix of explanatory variables. Choosing an appropriate scaling and interpreting this ordination is discussed in the next section.
- The **total variance** of the data set, partitioned into constrained and unconstrained variances, is a standard result. This result shows how much variation in your response variables was redundant with the variation in your explanatory variables. If the **constrained variance** is much higher than the **unconstrained variance**, the analysis suggests that much of the variation in the response data may be accounted for by your explanatory variables. If, however, there is a large proportion of unconstrained variation (i.e. variation in your response matrix that is non-redundant with the variation in the explanatory matrix), the results should be interpreted with caution, as only a small amount of the variation in your response matrix is displayed.
- Information on a number of constrained axes (RDA axes) and unconstrained axes (PCA axes) is often presented in the results of an RDA.
- Each RDA axis has an **eigenvalue** associated with it. As the total variance of the solution is equal to the sum of all eigenvalues (constrained and unconstrained), the proportion of variance explained by each axis is simply its eigenvalue divided by the total variance of the solution.
- Occasionally, the ordination of, and/or correlations between, residuals may be more ecologically interesting than those of well-characterised factors. Examining the non-canonical (unconstrained) vectors of an RDA solution by ordination and correlation gives insight into the behaviour of these residuals. Alternatively, one may perform a PCA on the matrix of residuals after performing MLR on a collection of response variables. Some implementations of RDA present **PCA axes** alongside RDA axes; PCA axes summarise unconstrained (residual) variance.
- Sets of "**scores**" are also a typical feature of RDA output and will change depending on the scaling used (see next section for details):
  - **Object and response variable scores** are often reported as "site" and "species" scores, respectively. These scores are the coordinates used to ordinate points and vectors. The coordinates of a variable should be understood as the "tip" of its vector, with the origin as its "tail". The direction of the vector is the direction of increase for that variable.
  - **Explanatory variable scores**, also referred to as constraining variable scores, are interpreted in the same way as response variable scores when the explanatory variable in question is quantitative. Scores for each *state* of a nominal or factorial variable are the coordinates of that state's centroid and show the average position of the sites that have that state.
- A **significance** value for a) the overall RDA solution and b) individual RDA axes may be determined by permutation. These significance values should be treated similarly to those of ANOVA or other omnibus tests: only if the overall solution is significant should the significance of individual axes or explanatory variables be examined. Permuting the row labels of either the response or the explanatory matrix generates the null distribution. The number of permutations determines the minimum significance value possible: with 999 permutations, for example, the smallest attainable p-value is 1/(999 + 1) = 0.001.
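The variance partition, the per-axis proportions and a permutation test of the overall solution can be sketched as follows. Assumptions: centred data, eigenvalues expressed as variances, and the constrained fraction of variance used as the test statistic. When whole rows of the response matrix are permuted and there are no covariables, this fraction orders permutations in the same way as the pseudo-F statistic reported by many implementations, so the p-value is the same; it is a simplification, not a reproduction of any package's test. The helper names are hypothetical.

```python
import numpy as np

def rda_eigenvalues(Y, X):
    """Constrained and unconstrained eigenvalues of an RDA of Y on X (hypothetical helper)."""
    n = Y.shape[0]
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    Y_fit = Xc @ B
    eig_c = np.linalg.svd(Y_fit, compute_uv=False) ** 2 / (n - 1)
    eig_u = np.linalg.svd(Yc - Y_fit, compute_uv=False) ** 2 / (n - 1)
    return eig_c, eig_u

def constrained_fraction(Y, X):
    """Constrained variance divided by total variance."""
    eig_c, eig_u = rda_eigenvalues(Y, X)
    return eig_c.sum() / (eig_c.sum() + eig_u.sum())

def permutation_test(Y, X, n_perm=999, seed=0):
    """p-value for the overall RDA: permuting the rows of Y breaks its link with X."""
    rng = np.random.default_rng(seed)
    observed = constrained_fraction(Y, X)
    hits = 1  # the observed value counts as one member of the null distribution
    for _ in range(n_perm):
        if constrained_fraction(Y[rng.permutation(len(Y))], X) >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)

# Usage with simulated data
rng = np.random.default_rng(42)
X = rng.normal(size=(25, 2))
Y = X @ rng.normal(size=(2, 5)) + rng.normal(size=(25, 5))
eig_c, eig_u = rda_eigenvalues(Y, X)
total = eig_c.sum() + eig_u.sum()          # equals the total variance of Y
print("constrained:", eig_c.sum() / total, "unconstrained:", eig_u.sum() / total)
print("RDA1, RDA2:", eig_c[:2] / total)    # proportion explained by each axis
frac, p = permutation_test(Y, X, n_perm=199)
print("p =", p)                            # never below 1 / (199 + 1) = 0.005
```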
RDA ordinations may be presented as a biplot or triplot. How they should be interpreted depends on the scaling chosen.
Scaling 1 (distance triplot)
- Distances between object points approximate Euclidean distances. Thus, objects ordinated closer together can be expected to have similar variable values. This will not *always* hold true, as RDA only recovers part of the variation in the data set.
- Right-angled projections of object points onto vectors representing response variables approximate variable values for a given object.
- The angles between vectors representing response variables are meaningless.
- The angles between vectors representing response variables and those representing explanatory variables reflect their (linear) correlation.
- Note that binary explanatory variables may be represented as points. These points are the centroids of objects which have a state "1" for a given binary variable. Projecting centroid points onto a vector representing a response variable reflects the relationship between these variables.
- Distances between centroids, and between centroids and object points, approximate Euclidean distances.
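A small numerical check of why plotted distances in scaling 1 are approximations. This sketch assumes one convention for scaling 1: response variable scores are the unit-length eigenvectors U of the fitted values, and object scores are F = Yc U (following the presentation in Legendre and Legendre (1998) as assumed here; software packages may multiply scores by further constants). Because U is an orthonormal rotation, distances among object scores on all p axes reproduce the Euclidean distances among the centred objects exactly, whereas distances on the first two axes can only be smaller.

```python
import numpy as np

def pairwise_distances(A):
    """Euclidean distances between all pairs of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(7)
n, p = 12, 4
X = rng.normal(size=(n, 2))
Y = X @ rng.normal(size=(2, p)) + 0.3 * rng.normal(size=(n, p))

# RDA: centre, regress Y on X, eigen-decompose the fitted values.
Yc = Y - Y.mean(axis=0)
Xc = X - X.mean(axis=0)
B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
_, _, Vt = np.linalg.svd(Xc @ B, full_matrices=False)
U = Vt.T                                 # assumed scaling 1 response scores (unit length)

F = Yc @ U                               # assumed scaling 1 object scores
d_true = pairwise_distances(Yc)
d_all = pairwise_distances(F)            # all p eigenvectors: an exact rotation
d_plot = pairwise_distances(F[:, :2])    # first two RDA axes only, as plotted

print(np.allclose(d_all, d_true))        # True
print(np.all(d_plot <= d_true + 1e-9))   # True: the 2-axis plot shrinks distances
```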
Scaling 2 (correlation triplot)
- Distances between object points should not be considered to approximate Euclidean distances.
- Right-angled projections of object points onto vectors representing response variables approximate variable values for a given object.
- The angles between all vectors reflect their (linear) correlation. The correlation is approximately equal to the cosine of the angle between vectors (e.g. two vectors at an angle of 90° are uncorrelated, as cos(90°) = 0, whereas two vectors at an angle of 20° have a strong, positive correlation, as cos(20°) ≈ 0.94).
- Note that binary or nominal explanatory variables may be represented as points. These points are the centroids of objects which have a state "1" for a given binary variable or realise a particular level of a nominal explanatory variable. Projecting centroid points onto a vector representing a response variable reflects the relationship between these variables.
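The link between angles and correlations in scaling 2 can be checked numerically. This sketch assumes one scaling 2 convention: response variable scores equal the eigenvectors U multiplied by the square roots of the eigenvalues (as presented in Legendre and Legendre (1998), assumed here; packages may rescale). With all constrained axes retained, the cosine of the angle between two response vectors equals the correlation between the corresponding fitted values; a plot of fewer axes shows an approximation whenever further axes carry variance.

```python
import numpy as np

def cos_angle(a, b):
    """Cosine of the angle between vectors a and b."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(3)
n, p = 40, 4
X = rng.normal(size=(n, 2))
Y = X @ rng.normal(size=(2, p)) + 0.5 * rng.normal(size=(n, p))

# RDA fitted values (centred data, least squares)
Yc = Y - Y.mean(axis=0)
Xc = X - X.mean(axis=0)
B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
Y_fit = Xc @ B

# Assumed scaling 2 response scores: eigenvectors stretched by sqrt(eigenvalues)
_, s, Vt = np.linalg.svd(Y_fit, full_matrices=False)
eigenvalues = s ** 2 / (n - 1)
scores2 = Vt.T * np.sqrt(eigenvalues)          # row j is the vector of response j

r01 = np.corrcoef(Y_fit, rowvar=False)[0, 1]   # correlation of fitted responses 1 and 2
c01 = cos_angle(scores2[0], scores2[1])        # cosine of the angle between their vectors
print(np.isclose(r01, c01))                    # True

# The example angles from the text
print(round(float(np.cos(np.radians(20))), 2))  # 0.94
```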
Warnings
- Remember that not all the variance in the original response matrix is presented. Thus, distances between objects, the relationships between objects and variables, and those among variables should be interpreted carefully. If you are interested in only a few of the variables being analysed, methods such as multiple linear regression may be more suitable. This is especially true if there is a large proportion of unconstrained variation (i.e. variation in your response matrix that is non-redundant with the variation in the explanatory matrix), as only a small amount of the variation in your response matrix is then displayed.
- If the number of explanatory variables is equal to or greater than the number of objects in your data set, the analysis is not constrained. That is, the matrix of response variables will be completely 'explained' by the matrix of explanatory variables.
- If your response data is in the form of a distance or (dis)similarity matrix, consider distance-based RDA.
- If your experimental design includes nestedness or similar structural features, ensure that permutations are restricted accordingly. Ignoring this will invalidate any significance values reported.
- If you wish to remove the influence of a set of explanatory variables (e.g. experimental block) prior to RDA, consider partial RDA.
- Different implementations of RDA may report different forms of eigenvalues. Ensure interpretation of these values is appropriate when determining 'variance explained'.
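The saturation problem described in the second warning can be illustrated with random data. In this sketch (the helper `constrained_fraction` is hypothetical), both the responses and the explanatory variables are pure noise, yet with enough explanatory variables the constrained fraction reaches 1. Because the data are centred, n - 1 explanatory variables are already enough to fit n objects perfectly.

```python
import numpy as np

def constrained_fraction(Y, X):
    """Share of the total variance of Y fitted by a least-squares regression on X
    (hypothetical helper; both matrices are centred first)."""
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    Y_fit = Xc @ B
    return float((Y_fit ** 2).sum() / (Yc ** 2).sum())

rng = np.random.default_rng(0)
n = 10
Y = rng.normal(size=(n, 3))             # responses: pure noise
X_few = rng.normal(size=(n, 2))         # 2 unrelated explanatory variables
X_many = rng.normal(size=(n, n - 1))    # n - 1 unrelated explanatory variables

print(constrained_fraction(Y, X_few))   # below 1
print(constrained_fraction(Y, X_many))  # 1.0 up to rounding: Y is "completely explained"
```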