(Dis)similarity-based methods

The main idea...

Many ecological questions are concerned with finding groups in data sets or determining how (dis)similar sites, samples, or other entities are relative to one another. This (dis)similarity is established by comparing objects, based on the variable values associated with them, using a (dis)similarity measure (Figure 1). The validity of (dis)similarity-based methods hinges on choosing an appropriate (dis)similarity coefficient, because the coefficient defines what (dis)similarity means in a particular analysis (i.e. for a particular system, entity, phenomenon, etc.). If you're unsure which coefficient to use, the (dis)similarity wizard can help you select an appropriate one.

Be aware that (dis)similarity-based methods have been criticised for mis-specifying the mean-variance relationship present in count data (such as abundance data). See Warton et al. (2012).

Below, several distance-based methods are briefly described and links to their summary pages are provided. This collection is by no means exhaustive; however, it covers a range of tasks useful in analysing ecological data.




Figure 1: A matrix of raw data (a), with samples as rows and variables as columns, may be converted into a symmetrical (dis)similarity matrix (b) using a (dis)similarity measure to assess the 'closeness' of each sample relative to every other sample. Here, the Bray-Curtis dissimilarity was calculated. Samples S1 and S2 are most similar, while samples S2 and S4, as well as S3 and S4, are most dissimilar.
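
The conversion from a raw data matrix to a symmetrical dissimilarity matrix can be sketched in a few lines of Python. This is a minimal illustration only: the abundance values are hypothetical (they are not the values shown in Figure 1), and the helper names `bray_curtis` and `dissimilarity_matrix` are invented for this example.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    # Assumption: two completely empty samples are treated as identical.
    return numerator / denominator if denominator else 0.0


def dissimilarity_matrix(rows, measure):
    """Build a symmetrical object x object matrix from a raw data matrix."""
    n = len(rows)
    D = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        D[i][j] = D[j][i] = measure(rows[i], rows[j])
    return D


# Hypothetical abundance data: 4 samples (rows) x 3 variables (columns).
samples = [
    [10, 0, 5],
    [8, 1, 6],
    [0, 9, 2],
    [1, 0, 20],
]

D = dissimilarity_matrix(samples, bray_curtis)
# D[0][1] is (2 + 1 + 1) / (18 + 1 + 11) = 4/30
for row in D:
    print(["%.3f" % value for value in row])
```

Swapping `bray_curtis` for another function with the same two-argument form changes what "dissimilar" means in the resulting matrix, which is why the choice of coefficient matters.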

Ordination methods

 Principal coordinates analysis
PCoA attempts to summarise and represent inter-object (dis)similarity in a low-dimensional, Euclidean space.
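
As a rough sketch of the idea, the classical PCoA computation (double-centre the squared distances, then take the leading eigenvectors) can be written with NumPy. The function name `pcoa` and the example points are assumptions made for illustration, and negative eigenvalues, which can arise with non-Euclidean measures, are simply clipped to zero here rather than corrected.

```python
import numpy as np


def pcoa(D, k=2):
    """Classical principal coordinates analysis of a distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-centre the squared distances (Gower centring).
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # B is symmetric, so eigh applies; sort eigenvalues in descending order.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale the first k eigenvectors; negative eigenvalues are clipped to 0.
    scale = np.sqrt(np.clip(eigvals[:k], 0.0, None))
    return eigvecs[:, :k] * scale, eigvals


# Hypothetical example: four points on a 3 x 4 rectangle.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
D = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))

coords, eigvals = pcoa(D, k=2)
print(coords)
```

Because the input distances in this example are Euclidean, the two returned axes reproduce them exactly (up to rotation and reflection); with other measures the low-dimensional picture is only an approximation.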

 Non-metric multidimensional scaling
NMDS is a robust ordination approach which attempts to represent, as closely as possible, the pairwise (dis)similarity between objects in a low-dimensional space. NMDS is a rank-based approach: the original distances are replaced by their ranks.
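
The rank substitution can be illustrated on its own. The sketch below is not a full NMDS (there is no iterative fitting of a configuration); it only shows how pairwise dissimilarities might be converted to ranks. The helper name `rank_dissimilarities`, the example matrix, and the use of average ranks for ties are assumptions made for this example.

```python
def rank_dissimilarities(D):
    """Map each object pair (i, j), i < j, to the rank of its dissimilarity.

    Tied dissimilarities receive the average of the ranks they span
    (one simple convention, assumed here).
    """
    n = len(D)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    pairs.sort(key=lambda p: D[p[0]][p[1]])
    ranks = {}
    start = 0
    while start < len(pairs):
        value = D[pairs[start][0]][pairs[start][1]]
        end = start
        # Extend the run while the next pair has the same dissimilarity.
        while end + 1 < len(pairs) and D[pairs[end + 1][0]][pairs[end + 1][1]] == value:
            end += 1
        average = (start + end) / 2 + 1
        for pair in pairs[start:end + 1]:
            ranks[pair] = average
        start = end + 1
    return ranks


# Hypothetical dissimilarity matrix for four objects.
D = [
    [0.0, 0.2, 0.9, 0.5],
    [0.2, 0.0, 0.9, 0.7],
    [0.9, 0.9, 0.0, 0.4],
    [0.5, 0.7, 0.4, 0.0],
]

ranks = rank_dissimilarities(D)
print(ranks)  # the two tied values of 0.9 share rank 5.5
```

Only the ordering of the dissimilarities survives this step, which is why rank-based methods are insensitive to monotonic rescaling of the original values.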

 Distance-based redundancy analysis
Distance-based redundancy analysis (db-RDA) is a means of conducting RDA, a method intended to detect linear relationships, on a (dis)similarity matrix generated by measures which may be non-linear. This is a constrained analysis.

Clustering methods 

Hierarchical cluster analysis
Hierarchical cluster analysis may be performed using an "object x object" matrix of (dis)similarities or distances. It attempts to find a good, although perhaps not the best, grouping of objects in a hierarchical manner, first grouping the objects with the lowest dissimilarities and then progressively merging less similar objects and groups.
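
A minimal sketch of the agglomerative procedure is given below. It assumes single linkage (the distance between two groups is the smallest dissimilarity between their members), which is only one of several possible linkage rules; the function name `single_linkage` and the example matrix are hypothetical.

```python
def single_linkage(D):
    """Agglomerative clustering with single linkage.

    Returns the merge history as (group_a, group_b, dissimilarity) tuples,
    in the order in which the merges happen.
    """
    clusters = [[i] for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of groups separated by the lowest dissimilarity.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Hypothetical dissimilarity matrix for four objects.
D = [
    [0.0, 0.2, 0.9, 0.5],
    [0.2, 0.0, 0.9, 0.7],
    [0.9, 0.9, 0.0, 0.4],
    [0.5, 0.7, 0.4, 0.0],
]

merges = single_linkage(D)
for group_a, group_b, d in merges:
    print(group_a, "+", group_b, "at", d)
```

Each merge is final once made, which is why the result is a good grouping rather than a guaranteed best one.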

Non-hierarchical cluster analysis
This method attempts to find a grouping of objects that optimises some evaluation criterion (which may be based on a (dis)similarity measure) by iteratively reassigning objects to groups in such a way as to improve the criterion value.
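
One way to picture the iterative reassignment is a k-medoids-style scheme, sketched below. It is chosen here only because it works directly on a dissimilarity matrix; it is an assumption of this example, not necessarily the algorithm described on the summary page. The function names, starting medoids, and example matrix are hypothetical, and the criterion is the sum of dissimilarities between each object and its group's medoid.

```python
def assign(D, medoids):
    """Assign every object to the group of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: D[i][medoids[c]])
            for i in range(len(D))]


def k_medoids(D, medoids, max_iter=100):
    """Alternate between reassigning objects and updating medoids."""
    medoids = list(medoids)
    for _ in range(max_iter):
        labels = assign(D, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, label in enumerate(labels) if label == c]
            if not members:
                # Keep the old medoid if a group happens to be empty.
                new_medoids.append(old)
                continue
            # New medoid: the member with the lowest total dissimilarity
            # to the other members of its group.
            new_medoids.append(
                min(members, key=lambda m: sum(D[m][i] for i in members)))
        if new_medoids == medoids:
            break  # no change, so the criterion can no longer improve
        medoids = new_medoids
    labels = assign(D, medoids)
    cost = sum(D[i][medoids[labels[i]]] for i in range(len(D)))
    return labels, medoids, cost


# Hypothetical dissimilarity matrix for four objects.
D = [
    [0.0, 0.2, 0.9, 0.5],
    [0.2, 0.0, 0.9, 0.7],
    [0.9, 0.9, 0.0, 0.4],
    [0.5, 0.7, 0.4, 0.0],
]

labels, medoids, cost = k_medoids(D, medoids=[0, 2])
print(labels, medoids, cost)
```

The result depends on the number of groups and on the starting medoids, so in practice such procedures are usually run from several starting points.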

Clustering methods are not to be confused with classification methods, in which objects are sorted into a number of predefined groups based on a set of rules.

Hypothesis testing

Analysis of Similarity
Given a dissimilarity matrix with objects placed into groups, ANOSIM can test whether dissimilarities between groups are greater than those within groups.
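
The ANOSIM statistic compares the mean rank of between-group dissimilarities with the mean rank of within-group dissimilarities, and its significance is assessed by permuting group labels. The sketch below is a minimal illustration under stated assumptions: the function names, the example matrix and groups, the default of 999 permutations, and the fixed seed are all choices made for this example, and at least one within-group and one between-group pair must exist.

```python
import random


def average_ranks(values):
    """Rank values from 1 upwards; ties receive their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in order[start:end + 1]:
            ranks[k] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def anosim_r(D, groups):
    """R = (mean between-group rank - mean within-group rank) / (n(n-1)/4)."""
    n = len(D)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    ranks = average_ranks([D[i][j] for i, j in pairs])
    within = [r for (i, j), r in zip(pairs, ranks) if groups[i] == groups[j]]
    between = [r for (i, j), r in zip(pairs, ranks) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (n * (n - 1) / 4)


def anosim(D, groups, permutations=999, seed=0):
    """Return the observed R and a permutation p-value."""
    rng = random.Random(seed)
    observed = anosim_r(D, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # reassign group labels at random
        if anosim_r(D, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical dissimilarity matrix and grouping for four objects.
D = [
    [0.0, 0.2, 0.9, 0.5],
    [0.2, 0.0, 0.9, 0.7],
    [0.9, 0.9, 0.0, 0.4],
    [0.5, 0.7, 0.4, 0.0],
]
groups = ["A", "A", "B", "B"]

R, p = anosim(D, groups)
print(R, p)
```

With only four objects there are very few distinct label arrangements, so the p-value here cannot be small however strong the separation; real analyses need more objects per group.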

Mantel test
The Mantel test calculates the correlation between corresponding positions of two (dis)similarity or distance matrices and can test whether the distances among objects in one matrix are linearly correlated with those in another matrix.
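
A simple permutation version of the test can be sketched as follows. The code correlates the upper triangles of the two matrices (Pearson's r) and builds a null distribution by permuting the rows and columns of one matrix together. The function names, example matrices, the one-sided comparison, the default of 999 permutations, and the fixed seed are assumptions made for this illustration.

```python
import random
from math import sqrt


def upper_triangle(D, order=None):
    """Values above the diagonal, optionally after permuting the objects."""
    n = len(D)
    idx = order if order is not None else list(range(n))
    return [D[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(D1, D2, permutations=999, seed=0):
    """Return the Mantel correlation and a one-sided permutation p-value."""
    rng = random.Random(seed)
    y = upper_triangle(D2)
    observed = pearson(upper_triangle(D1), y)
    order = list(range(len(D1)))
    hits = 0
    for _ in range(permutations):
        # Permute objects, i.e. rows and columns of D1 simultaneously.
        rng.shuffle(order)
        if pearson(upper_triangle(D1, order), y) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical matrices: the second is an exact linear rescaling of the first.
D_a = [
    [0.0, 0.2, 0.9, 0.5],
    [0.2, 0.0, 0.9, 0.7],
    [0.9, 0.9, 0.0, 0.4],
    [0.5, 0.7, 0.4, 0.0],
]
D_b = [[2 * d for d in row] for row in D_a]

r, p = mantel(D_a, D_b)
print(r, p)
```

Objects are permuted rather than individual matrix entries because the entries of a distance matrix are not independent of one another.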