The main idea
Linear discriminant analysis (LDA) is a method to evaluate how well a group of variables supports an a priori grouping of objects. It is based on work by Fisher (1936) and is closely related to other linear methods such as MANOVA, multiple linear regression, principal components analysis (PCA), and factor analysis (FA). In LDA, a grouping variable is treated as the response variable and is expected to be categorical. Groupings should reflect ecologically relevant knowledge, such as sampling environment or method, or the results of an exploratory method such as cluster analysis or nonmetric multidimensional scaling. In the latter case, it is vital that the exploratory method was performed on an independent data set to avoid data dredging.
To evaluate groupings, a typical LDA performs an ANOVA or MANOVA on the explanatory variable(s) (i.e. any variable other than a grouping variable). If a significant difference among the groups is found, LDA attempts to find linear combinations of the explanatory variables that best discriminate between the defined groups. It then constructs discriminant functions based on these combinations. The resulting functions can be used to classify new objects described by the same explanatory variables used in the LDA; however, LDA itself is not a classification method. Further, the contribution of each of the explanatory variables to the discriminant functions may be examined, allowing, for example, variables with high discriminatory power to be identified.
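The core computation can be sketched numerically. The following is a minimal illustration, using synthetic data with two hypothetical groups: the discriminant direction is found as the leading eigenvector of the product of the inverted within-group scatter matrix and the between-group scatter matrix. All data and parameter values here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two hypothetical groups described by two explanatory variables
X1 = rng.normal([0.0, 0.0], 0.5, size=(30, 2))
X2 = rng.normal([2.0, 1.0], 0.5, size=(30, 2))
X = np.vstack([X1, X2])
y = np.array([0] * 30 + [1] * 30)

# Within-group scatter: summed squared deviations from each group centroid
Sw = sum(np.cov(X[y == g].T, bias=False) * (np.sum(y == g) - 1)
         for g in np.unique(y))

# Between-group scatter: deviations of group centroids from the grand centroid
grand = X.mean(axis=0)
Sb = sum(np.sum(y == g) * np.outer(X[y == g].mean(axis=0) - grand,
                                   X[y == g].mean(axis=0) - grand)
         for g in np.unique(y))

# Discriminant directions: eigenvectors of Sw^-1 Sb, ordered by eigenvalue
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
order = np.argsort(eigvals.real)[::-1]
w = eigvecs.real[:, order[0]]   # first (and here, only) discriminant axis
print("discriminant coefficients:", w)
```

Projecting the objects onto `w` (i.e. `X @ w`) yields the discriminant scores used throughout the results described below.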


Figure 1: Schematic illustrating the discovery of a linear function that maximally discriminates between two groups described by two variables. Examining the original variables (a), there is some overlap between group distribution. A linear combination of these variables (b) shows better performance. A decision boundary can be built orthogonally from the linear combination (b, red dashed line). Group centroids are indicated by points and dispersion by coloured circles. 
Results and evaluation
Implementations of LDA will often deliver the following results:
Discriminant functions 
These are the functions created to discriminate between the groups. The number of functions needed depends on the number of groups (g) and the number of explanatory variables (p): at most min(g − 1, p) functions are constructed. The actual form of each function is usually not shown by default, but information about its performance is delivered (see below).
The functions themselves are analogous to those from multiple linear regression. Hence, coefficient values associated with explanatory variables (here, any variable that is not a grouping variable) indicate how much a given explanatory variable contributed to a discriminant function. Coefficients represent the partial contribution of each variable to the discriminant function (i.e. the contribution of the variable after removing the contributions of all other variables, which may overlap). Standardised and unstandardised coefficients are available, the former being suited to comparing the relative 'importance' of variables in each function.
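The distinction matters because unstandardised coefficients depend on each variable's measurement scale. A common standardisation multiplies each raw coefficient by the pooled within-group standard deviation of its variable. The sketch below uses invented coefficient and scatter values to show how the ranking of variables can change; nothing here comes from a real fit.

```python
import numpy as np

# Hypothetical unstandardised coefficients and within-group sums of squares
# (assumed to come from an earlier LDA fit on 90 objects in 3 groups)
b = np.array([0.8, -0.3, 1.5])          # raw (unstandardised) coefficients
sw_diag = np.array([4.0, 25.0, 0.25])   # within-group sums of squares per variable
n, g = 90, 3

# Standardised coefficients: scale by the pooled within-group standard
# deviation, making variables on different scales comparable
b_std = b * np.sqrt(sw_diag / (n - g))
print(b_std)
```

Note that the variable with the largest raw coefficient (the third) has the smallest standardised one here, because its within-group spread is tiny; ranking 'importance' from raw coefficients alone can therefore mislead.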

Discriminant function eigenvalues 
An eigenvalue is associated with each discriminant function. Its magnitude indicates how well the function discriminates between groups by expressing how much variance the function accounts for.

Variance explained 
Usually shown as a percentage, the variance explained is the proportion of the total variance accounted for by a given discriminant function.
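The relationship between eigenvalues and variance explained is a simple ratio. The eigenvalues below are invented for illustration:

```python
import numpy as np

# Hypothetical eigenvalues from an LDA with three discriminant functions
eigvals = np.array([3.2, 0.6, 0.2])

# Variance explained: each eigenvalue as a share of the eigenvalue total
var_explained = 100 * eigvals / eigvals.sum()
print(var_explained)  # first function dominates
```

Here the first function would account for 80% of the discriminatory variance, and later functions are often ignored in interpretation when their share is this small.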

Canonical correlation
This is the canonical correlation between the explanatory variables (or the linear combination thereof used to build a discriminant function) and the grouping(s), represented as a matrix of dummy variables. The strength of the correlation corresponds to the discriminatory power of a set of variables, or their linear combination.

Tests of significance 
Tests of significance may be performed on the matrix of canonical correlations. A standard null hypothesis is that the canonical correlation is equal to zero, i.e. the explanatory variables have no discriminatory power. Wilks' lambda (λ) and χ² (chi-squared) tests are typically used.
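Wilks' lambda can be computed directly from the discriminant-function eigenvalues, and Bartlett's chi-squared approximation gives a test statistic for the null hypothesis of no discrimination. The quantities below (eigenvalues, sample sizes) are assumed values for illustration only:

```python
import math

# Hypothetical quantities from an LDA fit (assumed values)
eigvals = [3.2, 0.6, 0.2]   # discriminant-function eigenvalues
n, p, g = 90, 3, 4          # objects, explanatory variables, groups

# Wilks' lambda: product of 1 / (1 + eigenvalue). Small values mean
# little within-group variance remains relative to the total.
wilks = 1.0
for lam in eigvals:
    wilks *= 1.0 / (1.0 + lam)

# Bartlett's chi-squared approximation for testing H0: no discrimination
chi_sq = -(n - 1 - (p + g) / 2) * math.log(wilks)
df = p * (g - 1)
print(f"Wilks' lambda = {wilks:.4f}, chi-squared = {chi_sq:.2f}, df = {df}")
```

The statistic is then compared against a chi-squared distribution with p(g − 1) degrees of freedom to obtain a p-value.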

Structure matrix 
This matrix shows the correlations between the values of the explanatory variables and those of the discriminant function(s). Variables with stronger correlations can be considered more 'important' for the performance of a given discriminant function. These values are akin to loadings in principal components analysis and factor analysis. A discriminant function's interpretation is generally tied to its most 'important' variables.
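The structure matrix can be sketched as the correlation of each explanatory variable with the discriminant scores. The data and coefficients below are synthetic and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 40 objects described by 3 explanatory variables
X = rng.normal(size=(40, 3))
# Assumed discriminant coefficients from an earlier fit
w = np.array([0.9, 0.1, -0.4])
scores = X @ w   # discriminant-function scores for each object

# Structure matrix (here a single column): correlation of each variable
# with the discriminant scores -- analogous to PCA/FA loadings
structure = np.array([np.corrcoef(X[:, j], scores)[0, 1]
                      for j in range(X.shape[1])])
print(structure)
```

With more than one discriminant function, the same calculation is repeated per function, giving one column of correlations per function.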

Evaluating the results of an LDA is best done empirically, through classification approaches. This can be done by using the discriminant functions to classify a new set of objects, described by the same variables, into the same groups as the original, or 'training', data. Of course, these objects must have known group membership. The misclassification (or error) rates indicate how trustworthy the discriminant functions are when faced with new data.
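This evaluation step can be sketched as follows. For simplicity, the discriminant direction here is the line joining the two training centroids (which coincides with the LDA direction when groups share an equal, spherical covariance); new objects are classified by which projected centroid their score falls nearest to. All data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(n):
    """Two hypothetical groups described by two variables."""
    a = rng.normal([0.0, 0.0], 0.6, size=(n, 2))
    b = rng.normal([2.0, 2.0], 0.6, size=(n, 2))
    return np.vstack([a, b]), np.array([0] * n + [1] * n)

X_train, y_train = simulate(50)
X_test, y_test = simulate(25)     # independent objects with known groups

# Assumed discriminant direction: line joining the training centroids
c0 = X_train[y_train == 0].mean(axis=0)
c1 = X_train[y_train == 1].mean(axis=0)
w = c1 - c0

# Classify test objects by the nearest projected centroid
scores = X_test @ w
cut = (c0 @ w + c1 @ w) / 2
pred = (scores > cut).astype(int)
error_rate = np.mean(pred != y_test)
print(f"misclassification rate: {error_rate:.2f}")
```

A low misclassification rate on the independent test objects suggests the discriminant functions generalise; a rate near chance suggests they do not.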
When more than two groups are described, multiple discriminant analysis (MDA) should be used. Note, however, that many LDA routines will automatically perform MDA when three or more groups are detected.
Key assumptions
 The groups can be discriminated between by linear combinations of the explanatory variables.
 Explanatory variables are continuous. Categorical explanatory variables should be evaluated by, e.g., discriminant correspondence analysis.
 Each set of explanatory variables should show (close to) multivariate normal distributions within each group defined.
 The groups should have (near) equal covariances.
 There should be at least two groups.
 There should be at least two objects per group.
 Variables should be homoscedastic. If the mean of a variable is correlated with its variance, significance tests may be invalid.
 There should be no linear dependency between explanatory variables.
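The last assumption, absence of linear dependency, can be screened before running an LDA, for example by inspecting pairwise correlations among the explanatory variables. The data and the 0.95 threshold below are illustrative choices, not a fixed rule:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical explanatory variables; the third is nearly a copy of the first
x0 = rng.normal(size=50)
x1 = rng.normal(size=50)
x2 = x0 + rng.normal(scale=0.01, size=50)
X = np.column_stack([x0, x1, x2])

# Pairwise correlations flag near-linear dependency between variables
R = np.corrcoef(X.T)
high = [(i, j) for i in range(3) for j in range(i + 1, 3)
        if abs(R[i, j]) > 0.95]
print("near-dependent pairs:", high)
```

Flagged pairs are candidates for removal or combination before the analysis, which also addresses the redundancy warning below.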
Warnings
 LDA is sensitive to outliers. These should be identified and treated accordingly.
 LDA only evaluates the variables' ability to linearly discriminate between groups; nonlinear separation between groups will not be detected.
 Highly correlated variables will contribute very similarly to an LDA solution and may be redundant. Thus, variables that are uncorrelated are preferable.
 While unequal group sizes can be tolerated, very large differences in group sizes can distort results, particularly if there are very few (< 20) objects per group.
 If ANOVA/MANOVA tests on a given set of explanatory variables are not significant, LDA is unlikely to be useful.
 When interpreting the coefficients of a discriminant function, carefully distinguish between standardised and unstandardised coefficients.
 Heteroscedasticity is likely to lead to invalid significance tests.
Implementations
 R
 The lda() function from the MASS package in R.
 The DiscriMiner package hosts a range of functions for discriminant analyses, including LDA.
 The generic predict() function (from the stats package) can be used to classify unknown objects into the classes of an LDA R object.
