Cluster analysis & ordination

The main idea...

Combining the results of a hierarchical cluster analysis with an ordination, such as that produced by non-metric multidimensional scaling (NMDS), can help validate potential clusters by providing alternative perspectives on the data at hand. In Figure 1, clustering and NMDS results are superimposed in the bottom-left panel. In this example, the results largely agree: objects that were clustered together were also ordinated close to one another.

Disagreements may manifest as objects ordinated far away from one another (relative to other objects) while being clustered at a high level of similarity. When disagreement is detected, look more critically at the data and methods used to determine its source. If no explanation can be found, confidence in the results produced by either method should be tempered. Be aware, however, that seemingly overlapping clusters may be distinct when viewed in a higher-dimensional space, especially in NMDS ordinations with higher stress values. If you suspect this to be the case, you may introduce more dimensions into your solution and inspect the results in two- or three-dimensional plots of different sets of dimensions. Having to do so, however, suggests that the data set is not well-suited to the chosen ordination.
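As a minimal sketch of this check in R, the example below fits two- and three-dimensional NMDS solutions to the same dissimilarities and plots the three-dimensional solution one pair of axes at a time. It assumes the vegan package is installed. The dune data set bundled with vegan, the Bray-Curtis measure, and the random seed are illustrative choices only and do not come from the text above.

```r
library(vegan)

data(dune)                            # example community data bundled with vegan (assumed input)
d <- vegdist(dune, method = "bray")   # Bray-Curtis dissimilarities (assumed choice)

set.seed(42)                          # metaMDS uses random starts
nmds2 <- metaMDS(d, k = 2, trace = FALSE)
nmds3 <- metaMDS(d, k = 3, trace = FALSE)

# Compare the stress of the two solutions
stress <- c(k2 = nmds2$stress, k3 = nmds3$stress)
print(stress)

# Inspect the three-dimensional solution one pair of axes at a time
op <- par(mfrow = c(1, 3))
for (ax in list(c(1, 2), c(1, 3), c(2, 3))) {
  plot(scores(nmds3, display = "sites", choices = ax), asp = 1, pch = 19)
}
par(op)
```

Groups that overlap on one pair of axes but separate on another may be distinct in the higher-dimensional solution.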

Figure 1: A distance matrix (a) provides input for both hierarchical cluster analysis (b) and non-metric multidimensional scaling (c). The results of the cluster analysis may be superimposed on the ordination (d) to check whether each solution corroborates the other. Adapted from Ramette 2007, originally adapted from Legendre & Legendre 1998.

  • Consider how the (dis)similarity measures or distances used in cluster analysis correspond to those used in the ordination. If these are very different, confusing results may emerge in the visualisation.
    • R
      • ordicluster() from the vegan package allows the overlay of an hclust object on any ordination.
      • Colour coding points in an NMDS ordination is possible by combining hclust(), cutree(), metaMDS() (from the vegan package), and R's plotting functions. After generating a cluster analysis object with hclust(), groups may be defined from the clustering using cutree(). A colour vector, with as many colours as groups, is then indexed by the group memberships returned by cutree() when plotting the points of a metaMDS() solution.
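The two approaches listed above can be sketched together as follows. This is an illustrative example rather than a prescribed workflow: the dune data set bundled with vegan, the Bray-Curtis measure, average linkage, the choice of four groups, and the colours are all assumptions made for demonstration. The same dissimilarity matrix is passed to both hclust() and metaMDS() so that the clustering and the ordination are based on the same measure.

```r
library(vegan)

data(dune)                             # example community data bundled with vegan (assumed input)
d <- vegdist(dune, method = "bray")    # one dissimilarity matrix feeds both methods

hc  <- hclust(d, method = "average")   # average linkage is an assumed choice
grp <- cutree(hc, k = 4)               # k = 4 groups is arbitrary, for illustration

set.seed(42)                           # metaMDS uses random starts
nmds <- metaMDS(d, k = 2, trace = FALSE)

# One colour per group; indexing by grp assigns each object its group's colour
cols <- c("firebrick", "steelblue", "darkgreen", "orange")

plot(nmds, display = "sites", type = "n")                   # empty ordination frame
points(nmds, display = "sites", col = cols[grp], pch = 19)  # colour-coded objects
ordicluster(nmds, hc, col = "grey60")                       # overlay the dendrogram
legend("topright", legend = paste("Group", sort(unique(grp))),
       col = cols, pch = 19, bty = "n")
```

If objects sharing a colour are scattered across the plot, or the overlaid dendrogram segments cross large distances at low levels of the hierarchy, the two solutions disagree and the data and methods deserve a closer look.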