Hierarchical cluster analysis is an algorithmic approach to find discrete groups with varying degrees of (dis)similarity in a data set represented by a (dis)similarity matrix. These groups are hierarchically organised as the algorithms proceed and may be presented as a dendrogram (Figure 1). Many of these algorithms are greedy (i.e. the optimal local solution is always taken in the hope of finding an optimal global solution) and heuristic, requiring the results of cluster analysis to be evaluated for stability by, for example, bootstrapping procedures.
In ecological analyses, hierarchical clustering may be useful in discretising largely continuous ecological phenomena to aid structure detection and hypothesis generation. If data were collected along a gradient, for example, cluster analysis may help to identify (relatively) distinct regions therein which may correspond to an ecologically meaningful grouping. As always, one must consider carefully whether clustering is suitable and meaningful for the task at hand. If a very smooth response gradient (i.e. very even changes in (dis)similarity between objects) is being scrutinised, ordination procedures may be more appropriate. For further discussion, see Legendre and Legendre (1998).

bChaining in singlelinkage cluster analysis 
Figure 1: Schematics of dendrograms representing the results of hierarchical cluster analyses. The position of branch points along a similarity axis indicate the level of shared similarity between clusters and/or individual entitites. These points indicate at what stage clusters were formed during the course of the chosen algorithm. In panel (b), there is evidence of "chaining", or sequential joining of individual entities. 
Agglomerative clustering is a widespread approach to cluster analysis. Agglomerative algorithms successively merge individual entities and clusters that have the highest similarity values. These typically use a linkage criterion (examples below and in Figure 2) to determine the (dis)similarity values for clusters formed during the algorithm's execution. These are valid both for the clustering of objects and of variables provided some measure of (dis)similarity; however, note other clustering techniques may not share this flexibility. Agglomerative algorithms end when all the individual entities and clusters have been merged into a single cluster. Below some common linkage criteria are summarised.
Singlelinkage 
When a new cluster is formed, the (dis)similarities between it and the other clusters and/or individual entities resent are computed based on the (dis)similarity between the nearest two members of each group (i.e. those with the smallest dissimilarity or largest similarity; Figure 2, a).
Singlelinkage clustering is susceptible to an effect known as "chaining" whereby poorly separated, but distinct clusters are merged at an early stage. Chaining is clearly identifiable on dendrograms (Figure 1, b). This may result in clusters which contain members that are considerably dissimilar and should be treated carefully on interpretation. As a "trade off", the singlelinkage criterion allows the detection of irregular clusters, i.e.those which have nebulous, poorly defined shapes with narrow 'bridging regions' in multidimensional space.

Completelinkage 
When a new cluster is formed, the (dis)similarities between it and the other clusters and/or individual entities present are computed based on the (dis)similarity between the farthest two members of each group (i.e. those with the largest dissimilarity or smallest similarity; Figure 2, b). This ensures that all members in a given cluster fall within a region of maximum similarity relative to one another.

Averagelinkage 
This method is also known as the unweighted pairgroup method using arithmetic averages (UPGMA). When a new cluster is formed, the (dis)similarities between it and the other clusters and/or individual entities present are computed based on the average (dis)similarity between all members in each group (Figure 2, c).

Ward's method 
Ward's method (Ward, 1963) determines which clusters and/or individual entities to merge by evaluating the 'cost' of such a merge against an objective function. Merges with the minimum cost are performed at each stage of the algorithm. Typically, this is implemented by evaluating the sum of squared deviations from cluster centroids. Every possible merge is evaluated at each stage of the algorithm and that which yields the smallest increase in the sum of squared deviations is selected. Implementations of Ward's method generally assume that individual entities form regular, hyperellipsoids in multidimensional space. It may not perform well where irregular clusters occur.

aSinglelinkage clustering

bCompletelinkage clustering

cAveragelinkage clustering (UPGMA)

Figure 2: Examples of rules employed by hierarchical clustering algorithms to create new clusters. a) Singlelinkage or nearestneighbour clustering merges clusters based on the (dis)similarities between their nearest members. b) Complete or farthestneighbour clustering takes into account the dissimilarities between the most distant members of a cluster. c) Averagelinkage clustering, also called the unweighted pairgroup method using arithmetic averages (UPGMA), defines the (dis)similarity between clusters as the average pairwise distance between all their members. Other methods of clustering are available including those which cluster based on minimising information loss on merging clusters (e.g. Ward's method) or probabilistic measures evaluating cluster membership with reference to statistical distributions (e.g. Vtype clustering) as well as fuzzy clustering approaches which allow for more relaxed group membership. 
Warnings
 Hierarchical clustering methods can become computationally intensive for large data sets and bootstrapping results may be prohibitively expensive. Implementations of clustering algorithms optimised for efficiency are available for large data sets (See Olson, 1995; Day, 1984; Defrays, 1977; Sibson, 1973)
 If more than one approach seems appropriate to analysing the data at hand, each should be used and the results compared. Evaluating the differences in these results with an understanding of each method's rationale can often reveal interesting relationships between objects and variables. Clusters that are repeatedly found by the methods chosen may be considered robust.
 Many hierarchical clustering approaches are sensitive to outliers.
 Being greedy, heuristic algorithms, misgroupings that occur at early stages are not corrected at later stages. Many implementations support bootstrapping or other resampling techniques to assess the stability of a clustering solution and suggest a consensus grouping.
 Be aware of caveats and known issues associated with the wide range of clustering methods available. Some may have profound consequences on interpretability.
Walkthroughs featuring canonical correspondence analysis
Implementations
 The hclust() function in R's stats package
 The pvclust() in the pvclust package; further, this package allows bootstrapping and indication of highly supported clusters with pvrect().
MASAME hierarchical clustering app
