
Why multivariate analysis?

Ecological phenomena are inherently complex. As such, it is rare that a single response variable is sufficient to describe an ecological system, entity, or interaction. Rather, multiple response variables - such as the abundances of multiple species - are often measured to gain ecological insight. In addition, it is common to add multiple explanatory variables to an analysis in an attempt to explain the variation in the response data.

Multivariate analyses contend with the complexity of simultaneously analysing multiple response variables. Their use in ecology (e.g. James, 1990) and microbial ecology (Ramette, 2007) has been growing. The techniques that fall under this category are diverse and difficult to classify neatly. While some are extensions of standard univariate techniques such as ANOVA, others are more algorithmic in nature, taking advantage of increasing computational power to process complex data sets. Generally, multivariate approaches are favoured over multiple executions of univariate methods, as they save time and conserve statistical power, which is quickly lost through multiple testing. In some cases, taking multiple variables into account simultaneously may reveal patterns that would not be detectable by univariate methods.


Figure 1: a) Univariate data sets feature a single response variable (R1). This may be analysed against one or more explanatory variables (E1-E3). b) Multivariate data sets include multiple response variables (R1...Rn). Multivariate analysis methods allow the evaluation of all response variables simultaneously, rather than requiring multiple executions of univariate methods. In the latter case, multiple testing occurs, which decreases the statistical power of the analysis.
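The cost of multiple testing can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes the univariate tests are independent and uses a significance level of 0.05, neither of which is stated in the text above. It computes the chance of at least one false positive across a set of tests, together with the stricter per-test threshold that a Bonferroni correction (one common remedy, chosen here purely as an example) would impose. The function names are hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n_tests tests.

    Assumes the tests are independent, which is a simplification.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Example: one univariate test per response variable (e.g. per species).
for n_tests in (1, 5, 20, 100):
    fwer = familywise_error_rate(n_tests)
    per_test = bonferroni_alpha(n_tests)
    print(f"{n_tests:>3} tests: familywise error rate = {fwer:.3f}, "
          f"Bonferroni per-test alpha = {per_test:.5f}")
```

As the number of tests grows, the uncorrected error rate climbs quickly, while the corrected per-test threshold becomes so strict that real effects are harder to detect. That trade-off is the loss of power referred to above.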

Johnson and Wichern (2002) suggest five types of scientific inquiry most suited to the application of multivariate methods.
1. Data reduction or structural simplification. Several multivariate methods, such as principal components analysis, allow the summary of multiple variables through a comparatively smaller set of 'synthetic' variables generated by the analyses themselves. Thus, high-dimensional patterns are presented in a lower-dimensional space, aiding interpretation.
2. Sorting and grouping. Many ecological questions are concerned with the similarity or dissimilarity of a collection of entities and their assignment to groups. Several multivariate methods, such as cluster analysis and non-metric multidimensional scaling, allow detection of potential groups in the data. Active classification based on multivariate data may also be performed by methods such as linear discriminant analysis.
3. Investigation of the dependence among variables. Dependence among response variables, among response and explanatory variables, or among explanatory variables is of key interest. Methods that detect dependence, such as redundancy analysis, are valuable in detecting influence or covariation.
4. Prediction. Once the dependence among variables has been detected and characterised, multivariate models may be constructed to allow prediction.
5. Hypothesis construction and testing. Exploratory techniques can reveal patterns in data from which hypotheses may be constructed (however, see this warning). Several methods, such as MANOVA or the Mantel test, allow the testing of statistical hypotheses on multivariate data. Appropriately constructed assertions may thus be tested.
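To illustrate the first type of inquiry in the list above, data reduction, the following minimal sketch derives a single 'synthetic' variable (the first principal component) from a small table of species abundances. Everything in it is an assumption made for illustration: the abundance values are invented, the function names are hypothetical, and the leading axis is found by simple power iteration using only the Python standard library. In practice, a dedicated statistics package would normally be used.

```python
import math

# Hypothetical abundance table: rows are sites, columns are species.
abundances = [
    [12.0, 30.0, 5.0],
    [15.0, 34.0, 4.0],
    [30.0, 61.0, 9.0],
    [33.0, 70.0, 8.0],
    [50.0, 98.0, 12.0],
]


def centre_columns(table):
    """Subtract each column's mean so every variable is centred on zero."""
    n_rows = len(table)
    means = [sum(column) / n_rows for column in zip(*table)]
    return [[value - mean for value, mean in zip(row, means)] for row in table]


def covariance_matrix(centred):
    """Sample covariance matrix of the (already centred) variables."""
    n_rows, n_cols = len(centred), len(centred[0])
    return [
        [sum(row[i] * row[j] for row in centred) / (n_rows - 1)
         for j in range(n_cols)]
        for i in range(n_cols)
    ]


def multiply(matrix, vector):
    """Matrix-vector product for plain nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def leading_eigenpair(matrix, iterations=500):
    """Approximate the largest eigenvalue and its eigenvector by power iteration."""
    size = len(matrix)
    vector = [1.0 / math.sqrt(size)] * size
    for _ in range(iterations):
        product = multiply(matrix, vector)
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    # Rayleigh quotient of the unit-length vector gives the eigenvalue.
    eigenvalue = sum(v * p for v, p in zip(vector, multiply(matrix, vector)))
    return eigenvalue, vector


centred = centre_columns(abundances)
cov = covariance_matrix(centred)
eigenvalue, axis = leading_eigenpair(cov)

# Site scores: the position of each site along the single synthetic variable.
scores = [sum(x * a for x, a in zip(row, axis)) for row in centred]

# Share of the total variance captured by that one variable.
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))

print("Loadings of each species on the first axis:", [round(a, 3) for a in axis])
print("Site scores on the first axis:", [round(s, 2) for s in scores])
print("Proportion of variance explained:", round(explained, 3))
```

Because the three invented species rise and fall together across sites, a single axis captures most of the variation, so three columns can be summarised by one score per site. Real abundance data are rarely this tidy, and several axes are usually examined.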
This guide aims to provide non-technical explanations of multivariate methods and to link you to relevant literature to deepen your knowledge. In this sense, the guide is a continuously updated "living review". Some basic statistical fluency, such as familiarity with ANOVA methods and bivariate regression, is useful. Users who require an introduction to or a refresher in statistics may find texts like Freedman et al. (2008) and Caldwell (2009) appropriate. A collection of free statistics courses from various universities is also available online.