Choosing the right measure

Your choice of (dis)similarity measure is likely to have a major impact on your results. Understanding how each measure affects your data and which one is suitable is an essential part of many analyses. The page below discusses some of these measures. If you are unable to decide on a measure, consider using our (dis)similarity wizard to help you decide what sort of measure may be most appropriate.
(Dis)similarity, distance, and dependence measures are powerful tools for determining ecological association and resemblance. Choosing an appropriate measure is essential, as it will strongly affect how your data are treated during analysis and what kinds of interpretation are meaningful. Nonmetric multidimensional scaling, principal coordinates analysis, and cluster analysis are examples of analyses that are strongly influenced by the choice of (dis)similarity measure used. Note that, while these measures may draw out certain types of relationships in your raw data, they may do so at the expense of other information present therein. Below, several key measures for asserting ecological resemblance are introduced. For a more complete overview, see chapter seven of Legendre & Legendre's Numerical Ecology (1998). For a critical view on the use of dissimilarity and distance measures, see Warton et al. (2012).
When choosing a distance measure, ensure that the measure reflects the ecological relationships you are concerned with. Further, some measures have mathematical properties that make them unsuitable for certain analyses. Similarly, certain analyses will only produce meaningful results when certain measures are used. If a measure listed below sounds suited to your data, use more detailed resources to learn about its properties and limitations before drawing any conclusions from analyses based upon it. The list below is not exhaustive, but aims to familiarise you with a set of commonly used measures and their uses.
Q mode similarity measures
As noted above, similarity measures (S) are never metric, thus objects cannot be ordinated in a metric or Euclidean space based on their similarities. Converting similarities to distances can allow such ordination. This can be done simply by taking their one-complement (1 − S) or its square root. A few common measures are described below. For an extensive overview, see Legendre and Legendre (1998). 
Simple matching coefficient - This coefficient gives equal weight to both forms of match (double zeros and double ones) and is thus a symmetrical coefficient. 
Jaccard coefficient - This coefficient excludes double zeros, giving equal weight to non-zero agreements ("1", "1") and disagreements ("1", "0" and "0", "1") when comparing two objects. Given a "sites x species" matrix, the Jaccard coefficient can be used to express species/OTU turnover. 
Sørensen / Dice coefficient 
This coefficient is similar to the Jaccard coefficient but gives double weight to non-zero agreements. This asserts that the co-occurrence or coincidence of variable states among objects is more informative or important than disagreements. This weighting is based on the logic of the harmonic mean and is thus suitable for data sets with large-valued outliers. It may, however, increase the influence of small-valued outliers. 
Other binary measures are available which treat double-zero agreements, double-one agreements, and disagreements differently for a variety of reasons. Consider carefully whether any special meaning is indicated by the different matching states of the binary variables in your data set and ensure that the measure chosen adequately reflects these.
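These binary coefficients can all be computed directly from the four matching counts between two objects. Below is a minimal illustrative sketch in Python with NumPy (the presence/absence data are hypothetical; the R packages listed under Implementations provide ready-made routines):

```python
import numpy as np

# Presence/absence of 8 species at two sites (hypothetical data)
x = np.array([1, 1, 0, 0, 1, 0, 0, 0])
y = np.array([1, 0, 0, 0, 1, 1, 0, 0])

a = np.sum((x == 1) & (y == 1))  # double ones (non-zero agreements)
b = np.sum((x == 1) & (y == 0))  # "1","0" disagreements
c = np.sum((x == 0) & (y == 1))  # "0","1" disagreements
d = np.sum((x == 0) & (y == 0))  # double zeros

simple_matching = (a + d) / (a + b + c + d)  # symmetrical: double zeros count
jaccard = a / (a + b + c)                    # asymmetrical: double zeros excluded
sorensen = 2 * a / (2 * a + b + c)           # double weight on non-zero agreements
```

For these vectors the three coefficients differ noticeably (0.75, 0.5, and about 0.67, respectively), illustrating how much the treatment of double zeros and the weighting of agreements matter.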
Quantitative measures
Quantitative coefficients take into account values other than "0" and "1". Some quantitative measures lessen the effect of relatively large or small variable values in a data set to preserve overall interpretability. However, other measures are sensitive to large quantitative differences and perform better on transformed data.
Gower coefficient - This coefficient may be used for heterogeneous data sets (i.e. data sets including numerous variable types). It calculates a partial similarity value of two objects for each variable describing them. The final similarity score is the average of all partial similarities. Binary, qualitative and semi-quantitative, and quantitative variables are treated differently.

Gower, 1971 
Steinhaus coefficient 
This asymmetric coefficient is widely used for raw count data. It compares the sum of the minimum, per-variable values between two objects to the average value of all variables describing these objects. If applied to binary data, this is equivalent to the Sørensen coefficient. The one-complement of this coefficient is the popular Bray-Curtis dissimilarity measure. 
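The link between the Steinhaus coefficient and the Bray-Curtis dissimilarity can be verified numerically. A sketch in Python with NumPy and SciPy, using hypothetical count data:

```python
import numpy as np
from scipy.spatial.distance import braycurtis

x = np.array([5.0, 0.0, 3.0, 2.0])
y = np.array([2.0, 1.0, 3.0, 0.0])

W = np.minimum(x, y).sum()                 # sum of the per-variable minima
steinhaus = W / ((x.sum() + y.sum()) / 2)  # compare W to the mean object total

# The one-complement of the Steinhaus similarity is the Bray-Curtis dissimilarity
assert np.isclose(1 - steinhaus, braycurtis(x, y))
```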


There are three groups of dissimilarity measures: metric, semimetric, and nonmetric. See the "Key terminology" section of this page for definitions.
Metric distances
Euclidean distance - A simple, symmetrical metric using the Pythagorean formula. The more variables present in a data set, the larger one may expect Euclidean distances to be. Further, double zeros result in decreased distances. This property makes the Euclidean distance unsuitable for many ecological data sets, and ecologically motivated transformations should be considered. Principal components analysis and redundancy analysis ordinate objects using Euclidean distances. 
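The double-zero problem can be demonstrated with a small example. In the hypothetical abundance data below (Python with SciPy), sites 2 and 3 share no species at all, yet their Euclidean distance is far smaller than the distance between sites 1 and 3, which do share a species:

```python
import numpy as np
from scipy.spatial.distance import euclidean

s1 = np.array([0.0, 4.0, 8.0, 0.0, 0.0])  # species-rich site
s2 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # species-poor site
s3 = np.array([0.0, 1.0, 0.0, 0.0, 0.0])  # species-poor; shares a species with s1 only

d13 = euclidean(s1, s3)  # ~8.54, despite a shared species
d23 = euclidean(s2, s3)  # ~1.41, despite no shared species
```

The shared absences make the two species-poor sites appear close, which is rarely a meaningful ecological statement.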

Chord distance - This asymmetric distance measure is simply the Euclidean distance calculated for a matrix whose rows have been scaled to unit length (see the chord transformation). Rather than comparing absolute values, the chord distance compares objects based on the relative proportions of their variable values. Thus, even if two objects have different raw values for two or more variables, as long as these values are proportionately equivalent, the sites will be considered similar. The chord distance is insensitive to double zeros. 
Orlóci, 1967 
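A sketch of the chord distance in Python (the helper name is illustrative): following Legendre and Legendre (1998), each row is scaled to unit length before the Euclidean distance is taken, so proportionally equivalent abundance profiles are treated as identical.

```python
import numpy as np
from scipy.spatial.distance import euclidean

def chord_distance(x, y):
    # Euclidean distance between the row vectors after scaling each to unit length
    return euclidean(x / np.linalg.norm(x), y / np.linalg.norm(y))

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])   # same proportions, ten times the abundance
assert np.isclose(chord_distance(a, b), 0.0)
```

Sites with no species in common reach the maximum chord distance of √2.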
Mahalanobis distance - Appropriate for comparing groups of objects described by the same variables, this coefficient eliminates the effect of correlations between variables and is arrived at through the calculation of a covariance matrix from the input matrix. It also eliminates differences in scale between variables. Alternative forms of this measure may be used to calculate the distance between a group and a single object. 
Mahalanobis, 1936 
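A minimal sketch in Python with SciPy (synthetic data): SciPy's mahalanobis takes the inverse of the covariance matrix, which is what removes the effects of correlation and scale between variables.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))      # 50 objects, 3 variables (synthetic)
X[:, 1] += 0.8 * X[:, 0]          # induce correlation between variables 0 and 1

VI = np.linalg.inv(np.cov(X, rowvar=False))  # inverse covariance matrix
d = mahalanobis(X[0], X[1], VI)              # distance between two objects
```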
Coefficient of racial likeness - Appropriate for comparing groups of objects described by the same variables, this coefficient does not eliminate the effect of correlations between variables. This may be desirable when samples are too small to effectively remove correlative effects (see e.g. Penrose, 1952). 
Pearson, 1926 
χ2 metric - The calculation of this asymmetric metric transforms a matrix of quantitative values into a matrix of conditional probabilities (i.e. the quotient of a given value in a cell and either the row or column totals). A weighted Euclidean distance measure is then computed based on the values in the rows (or columns in R mode analysis) of the conditional probability matrix. Weights, which are the reciprocal of the variable (column) totals from the raw data matrix, serve to reduce the influence of the highest values measured. 

χ2 distance - This asymmetric distance is similar to the χ2 metric, however, the weighted Euclidean distances are multiplied by the total of all values in the raw data matrix. This converts the weights in the Euclidean distances to probabilities rather than column totals. This is the measure used in correspondence analysis and related analyses. 
Lebart & Fénelon, 1971 
Hellinger distance - This asymmetric distance is similar to the χ2 metric. While no weights are applied, the square roots of conditional probabilities are used as a variance-stabilising data transformation. This distance measure performs well in linear ordination. Variables with few non-zero counts (such as rare species) are given lower weights. 
Hellinger, 1909; Rao, 1995 
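The Hellinger distance can be computed as the Euclidean distance between square-rooted relative abundances (the Hellinger transformation). A sketch in Python (the helper name is illustrative and the data hypothetical):

```python
import numpy as np
from scipy.spatial.distance import euclidean

def hellinger_distance(x, y):
    # Euclidean distance after the Hellinger transformation:
    # square roots of the row-relative abundances
    return euclidean(np.sqrt(x / x.sum()), np.sqrt(y / y.sum()))

x = np.array([10.0, 5.0, 0.0, 1.0])
y = np.array([8.0, 2.0, 1.0, 0.0])
d = hellinger_distance(x, y)   # bounded between 0 and sqrt(2)
```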
Manhattan metric - Similar to the Euclidean distance; however, rather than using the Pythagorean formula, the Manhattan distance simply sums the absolute differences across pairs of variable values for a given object. Just like the Euclidean distance, this metric suffers from the double-zero problem and distances reported will increase with the number of variables assessed. 

Canberra metric - This metric excludes double zeros and increases the effect of differences between variables with low values or many zeroes. 
Lance & Williams, 1966 
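The contrast between the Manhattan and Canberra metrics is easy to see numerically. In the hypothetical example below (Python with SciPy), the Manhattan distance is dominated by the large-valued variable, while the Canberra metric scales each term by the pair's magnitude, so the low-valued variable contributes the most:

```python
import numpy as np
from scipy.spatial.distance import cityblock, canberra

x = np.array([1.0, 100.0, 0.0])
y = np.array([2.0, 110.0, 0.0])

manhattan = cityblock(x, y)   # |1-2| + |100-110| = 11; the large variable dominates
canb = canberra(x, y)         # 1/3 + 10/210; the low-valued variable dominates
```

SciPy treats the double-zero term (0/0) as zero, consistent with the Canberra metric's exclusion of double zeros.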
Jaccard distance - The one-complement of the Jaccard similarity (described above) is a metric distance. 
Semimetric measures
As described above, semimetric measures do not always satisfy the triangle inequality and hence cannot be fully relied upon to represent dissimilarities in a Euclidean space without appropriate transformation. That being said, they often do behave metrically and can be used in principal coordinates analysis (following an adjustment for negative eigenvalues if necessary) and nonmetric multidimensional scaling.
Bray-Curtis dissimilarity - This is an asymmetrical measure often used for raw count data. It is the one-complement of the Steinhaus similarity coefficient and a popular measure of dissimilarity in ecology. This measure treats differences between high and low variable values equally. 
Bray & Curtis, 1957 
Sørensen dissimilarity - The one-complement of the Sørensen similarity coefficient (described above) is a semimetric dissimilarity measure. 

Nonmetric measures
As noted by Legendre and Legendre (1998), nonmetric dissimilarity measures, such as a binary coefficient proposed by Kulczynski (1928) which is the quotient of double presences and disagreements, may assume negative values. As negative dissimilarities are intuitively nonsensical, they are problematic for interpretation. In general, these should be avoided unless there is a very clear reason to use them.
R mode measures of dependence
R mode measures express the relationships between variables. With some exceptions, Q mode measures are generally not useful or meaningful in R mode analysis. See Legendre and Legendre (1998) and Ludwig and Reynolds (1988) for an explanation of what constitutes a permissible R mode measure. Often, R mode measures are referred to as dependence coefficients, as they express how much the values of one variable can be said to depend on the states of another variable. Well-known correlation measures are examples of R mode measures.
Pearson's r - This familiar measure of linear correlation is suitable only for detecting linear relationships between two variables. It is the covariance between two variables divided by the product of their standard deviations. If your variables have many zeros, this correlation coefficient will not be reliable, as double zeros will be understood as an "agreement" when, in fact, they simply reflect the absence of an observation. This will inflate the correlation coefficient. 

Spearman's rho - This is a nonparametric measure of correlation which uses ranks rather than the original variable values. Variables should have monotonic relationships: that is, their ranks should consistently increase or decrease together across objects, but not necessarily in a linear fashion. Like Pearson's r, Spearman's rho is based on the principle of least squares, but is concerned with how strongly the rankings of two variables disagree. The larger the disagreement, the lower the rho value. This statistic is sensitive to large disagreements. That is, if one variable ranks an object as "1" and another variable ranks the same object as "100", the correlation reported by Spearman's rho will be strongly affected (relative to Kendall's tau, for example), even if these variables agree on all other ranks. This measure is suitable for raw or standardised abundance data and any monotonically related variables. 

Kendall's τ - Like Spearman's rho, Kendall's tau uses ranked values to calculate correlation. This measure, however, is not based on the principle of least squares and instead expresses the degree of concordance between two rankings. The tau statistic is the quotient of 1) the difference between the numbers of concordant and discordant pairs (i.e. ranks that agree and ranks that differ) and 2) the total number of pairs compared. This statistic is not sensitive to the scale of the disagreement. As above, variables should have monotonic relationships. This measure is suitable for raw or standardised abundance data and any monotonically related variables. 
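The differing sensitivity of the two rank statistics to a single large disagreement can be checked directly. In the sketch below (Python with SciPy, hypothetical ranks), the two variables agree on every object except that the object ranked first by one is ranked last by the other:

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau

x = np.arange(1, 11)             # ranks 1..10 under variable 1
y = x.copy()
y[0], y[9] = y[9], y[0]          # one extreme swap: rank 1 becomes rank 10

rho, _ = spearmanr(x, y)
tau, _ = kendalltau(x, y)
# rho is penalised far more heavily than tau by the single large disagreement
```

Here rho drops to roughly 0.02 while tau remains around 0.24, reflecting that tau only counts how many pairs disagree, not by how much.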

χ2 similarity, metric, and distance 
The χ2 similarity, metric, and distance measures (see above for descriptions) may also be used for R mode analysis. These are useful when monotonic relationships are not present and are appropriate for raw abundance data as well as qualitative and ordinal data. 

Hellinger distance 
Described above, the Hellinger distance is useful for variables populated with abundance data. 

Symmetric uncertainty coefficient - This coefficient is based on the logic of information theory. It expresses the amount of information shared between two variables using contingency tables and Shannon's information formula. Resorting to contingency tables is useful when dealing with qualitative variables with no monotonic relationships. Probabilities of association can be calculated and then translated into measures of dependence. Legendre and Legendre (1998) offer a developed discussion of information theory in numerical ecology. 

Jaccard coefficient - Described above for Q mode analysis, this coefficient excludes double zeros and gives equal weight to non-zero agreements and disagreements. It may also be applied in R mode to compare binary variables. 
Dice coefficient - Also described above, this coefficient gives double weight to non-zero agreements and may likewise be applied in R mode to compare binary variables. 
Ochiai index - The Ochiai index is the quotient of the number of non-zero agreements ("1", "1") between two variables and the square root of the product of the sums of non-zero agreements plus each form of disagreement (i.e. "0", "1" and "1", "0"). This measure is thus based on the logic of the geometric mean, and values with different ranges will be normalised before a central value is proposed. It is particularly suitable when the ranges and variances of agreements and disagreements are very different from one another. 
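A sketch of the Ochiai index in Python with NumPy (hypothetical binary data; the variable names are illustrative):

```python
import numpy as np

# Binary states of two variables across 8 objects (hypothetical data)
x = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y = np.array([1, 0, 0, 1, 1, 0, 1, 0])

a = np.sum((x == 1) & (y == 1))   # non-zero agreements
b = np.sum((x == 1) & (y == 0))   # "1","0" disagreements
c = np.sum((x == 0) & (y == 1))   # "0","1" disagreements

# a divided by the geometric mean of (a + b) and (a + c)
ochiai = a / np.sqrt((a + b) * (a + c))
```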
Implementations
 R
 vegdist() in the vegan package
 dist() in the stats package
 distance() or bcdist() in the ecodist package
 daisy() in the cluster package can compute a Gower index for both quantitative and categorical variables
References
 Bray JR, Curtis JT (1957) An ordination of the upland forest communities of southern Wisconsin. Ecol Monogr. 27:325–349.
 Gower JC (1971) A general coefficient of similarity and some of its properties. Biometrics. 27(4):857–871.
 Gower JC, Legendre P (1986) Metric and Euclidean properties of dissimilarity coefficients. J Classif. 3(1):5–48.
 Hellinger E (1909) Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen. J Reine Angew Math. 136:210–271.
 Kulczynski S (1928) Die Pflanzenassoziationen der Pieninen. Bull Int Acad Pol Sci Lett Cl Sci Math Nat Ser B. Suppl. II (1927):57–203.
 Lance GN, Williams WT (1966) Computer programs for hierarchical polythetic classification ("similarity analysis"). Comput J. 9:60–64.
 Legendre P, Legendre L (1998) Numerical Ecology. 2nd ed. Amsterdam: Elsevier. ISBN 9780444892508.
 Ludwig JA, Reynolds JF (1988) Statistical ecology: a primer on methods and computing. New York: Wiley.
 Mahalanobis PC (1936) On the generalised distance in statistics. Proc Natl Inst Sci India. 2(1):49–55.
 Orlóci L (1967) An agglomerative method for classification of plant communities. J Ecol. 55:193–205.
 Pearson K (1926) On the coefficient of racial likeness. Biometrika. 18:105–117.
 Penrose LS (1952) Distance, size and shape. Ann Eugen. 17(1):337–343.
 Rao CR (1995) The use of Hellinger distance in graphical displays of contingency table data. In: Tiit EM, Kollo T, Niemi H (eds) Multivariate Statistics and Matrices in Statistics: Proceedings of the 5th Tartu Conference, Tartu, Pühajärve, Estonia, 23–25 May 1994. Zeist: VSP BV. ISBN 9067641952.
 Warton DI, Wright ST, Wang Y (2012) Distance-based multivariate analyses confound location and dispersion effects. Methods Ecol Evol. 3(1):89–101.