
Factor analysis

The main idea...

Factor analysis (FA) is an exploratory technique closely related to principal components analysis (PCA); however, FA is designed to detect latent (hidden) variables that are represented by highly correlated response variables. More formally, FA involves steps to assess whether data are consistent with a factor model (Equation 1), which summarises the covariance relationships between many variables using a small number of factors. Factors are understood to underlie the behaviour of the observed variables while being unobserved (or unobservable) themselves. Thus, if groups of highly correlated variables exist and these groups show only weak correlations with each other, orthogonal FA can be used to assess whether each group can be explained by a unifying factor. Oblique FA methods exist to assess groups of variables which are distinct, but may have some correlation with one another. FA expects a table of response variables as input with a sufficiently high object-to-variable ratio (see Assumptions).

While conceptually approachable, FA requires more user input relative to PCA, and several decisions are non-trivial. This page summarises some key ideas required to approach FA implementations.

Figure 1: A schematic illustration of a simple factor analysis plot. Two groups, each comprising three well-correlated variables (red arrows), were used to build two factors (axes) that summarise the variables' covariance structure. The first factor summarised 54% of the variance and the second 21%; the factor model is thus able to account for 75% of the variance in the data.

Equation 1: An orthogonal factor model in matrix notation, X − μ = LF + ε. X is a matrix of random variables with means stored in a matrix, μ. The factor model holds that X depends linearly on a matrix of a few common factors (F) and a matrix of error terms for each variable (ε). Each variable has a loading on each common factor, stored in a matrix of factor loadings (L). In addition to common sources of error, error terms may include specific factors which may explain the behaviour of single variables, but which are not of interest in a standard FA.
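The factor model in Equation 1 can be sketched by simulating from it. The following Python/NumPy snippet (an illustration only; the loadings matrix and noise level are hypothetical) generates data for six variables driven by two orthogonal common factors:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 500, 6, 2          # objects, variables, common factors

L = np.array([[0.8, 0.0],    # hypothetical loadings: variables 1-3 load
              [0.7, 0.0],    # on factor 1, variables 4-6 on factor 2
              [0.9, 0.0],
              [0.0, 0.8],
              [0.0, 0.7],
              [0.0, 0.9]])

F = rng.standard_normal((n, m))          # uncorrelated common factor scores
eps = rng.standard_normal((n, p)) * 0.5  # error (unique) terms, epsilon
mu = np.zeros(p)                         # variable means

X = mu + F @ L.T + eps                   # the orthogonal factor model
print(X.shape)
```

Variables within each group of three will be strongly intercorrelated, while correlations between the groups will be near zero; FA attempts to recover L from such data.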

Selecting a method

Multiple methods to 'extract' common factors are available. Osborne and Costello (2005) recommended maximum likelihood (ML) estimation for (approximately) normally distributed data or principal axis factoring (PAF) for strongly non-normal data. Other methods may be suitable for more specific scenarios.
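As a rough sketch of what PAF does (not a reference implementation; the toy correlation matrix below is hypothetical), the diagonal of the correlation matrix is replaced by communality estimates, the reduced matrix is eigendecomposed, and the communalities are updated from the resulting loadings until they stabilise:

```python
import numpy as np

def paf_loadings(R, m, n_iter=50):
    """Very simplified principal axis factoring sketch: iterate between
    communality estimates and eigendecomposition of the reduced matrix."""
    # initial communalities: squared multiple correlations
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)                  # reduced correlation matrix
        vals, vecs = np.linalg.eigh(Rr)
        idx = np.argsort(vals)[::-1][:m]          # m largest eigenvalues
        L = vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0, None))
        h2 = (L ** 2).sum(axis=1)                 # update communalities
    return L

# hypothetical correlation matrix for three well-correlated variables
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
L = paf_loadings(R, m=1)
print(np.round(L, 2))
```

In practice, established implementations (e.g. those listed at the end of this page) should be preferred; they handle convergence and degenerate (Heywood) cases properly.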

Results, decisions, and interpretation

FA will generate a list of common factors along with information on how much variance in the original data they account for. Typical reports include:
  • A percentage of the total variance explained by the FA. While higher percentages indicate a better solution, a figure between 50% and 60% may also be acceptable if the results are interpretable and the factor model simple (see below).
  • Eigenvalues reported for each factor, which reflect the amount of variance in the original variables accounted for by that factor.
  • The amount of a given variable's variance that can be 'explained' by the factors in the model is referred to as that variable's communality. 
  • The correlation of a common factor with a variable is known as the factor loading of that variable. 
  • The proportion of variance explained by that variable's error term is the uniqueness of the variable. 
An initial number of factors should be specified and FA executed without rotation. Examining the quantities above and judging the interpretability of the results will provide motivation and guidance for the retention and rotation steps described below. These steps are repeated until a satisfactory solution is reached or FA is abandoned.
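The quantities above are simple functions of the loadings matrix in an orthogonal solution. A minimal sketch, using a hypothetical loadings matrix for four variables and two factors:

```python
import numpy as np

# hypothetical loadings of 4 variables on 2 orthogonal common factors
L = np.array([[0.85, 0.10],
              [0.75, 0.05],
              [0.10, 0.80],
              [0.15, 0.70]])

communality = (L ** 2).sum(axis=1)  # per-variable variance explained by the factors
uniqueness = 1.0 - communality      # variance left to each variable's error term
eigenvalue = (L ** 2).sum(axis=0)   # variance accounted for by each factor
pct_var = eigenvalue / L.shape[0] * 100  # as % of total (standardised) variance

print(np.round(communality, 3))
print(np.round(pct_var, 1))
```

For standardised variables each contributes one unit of variance, so communality and uniqueness sum to 1 for every variable.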

Factor retention

Following the first execution of FA, the factor model can be simplified by removing factors that do not explain sufficient variance to be included. Removing factors with less explanatory power may improve the model and aid interpretability. The choices made at this stage will strongly influence how successful the rotation procedure will be and the overall results of the FA.

It is often desirable to retain factors that explain a larger amount of variation relative to the other factors, as this simplifies the factor model. The number of factors to retain in the final model may follow existing knowledge (e.g. a consensus that there are two main drivers of a given change in an ecosystem, measured by an appropriate set of variables) or be determined by methods such as the Guttman-Kaiser criterion. This popular but widely criticised criterion retains only common factors that account for more variation than any of the original variables, i.e. factors with eigenvalues > 1 (Yeomans and Golder, 1982). Other methods such as the Velicer partial correlation procedure (Velicer and Jackson, 1990), Bartlett's test, the broken-stick model, or visual inspection of a scree plot (Cattell, 1966) may also be employed. Further, Hayton et al. (2004) argue that the less popular method of parallel analysis may offer better guidance on factor retention.
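The Guttman-Kaiser criterion and parallel analysis can both be sketched from the eigenvalues of a correlation matrix. Below, data with a hypothetical two-factor structure are simulated, and retained factor counts are compared (parallel analysis here uses the common 95th-percentile variant against random data of the same shape):

```python
import numpy as np

rng = np.random.default_rng(1)

def eigenvalues(X):
    """Eigenvalues of the correlation matrix of data matrix X, descending."""
    return np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

# hypothetical data: 200 objects, 6 variables, 2 latent factors
F = rng.standard_normal((200, 2))
load = np.array([[.8, 0], [.7, 0], [.9, 0], [0, .8], [0, .7], [0, .9]])
X = F @ load.T + 0.5 * rng.standard_normal((200, 6))

obs = eigenvalues(X)
kaiser = int((obs > 1).sum())            # Guttman-Kaiser: eigenvalues > 1

# parallel analysis: keep factors whose eigenvalues exceed the 95th
# percentile of eigenvalues from random data of the same dimensions
rand = np.array([eigenvalues(rng.standard_normal(X.shape))
                 for _ in range(100)])
parallel = int((obs > np.percentile(rand, 95, axis=0)).sum())

print(kaiser, parallel)
```

With a clear structure like this the two criteria typically agree; on messier data they often diverge, which is one reason to try several criteria.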

Performing several of these approaches can be informative and help determine the final number of factors to retain. Osborne and Costello (2005) suggest an evaluation where FA (including the rotation step, below) is run multiple times using the factor numbers suggested by different techniques, as well as factor numbers immediately above and below these. The model selected as the best fit to the data is the one where variable loadings are moderate to high (generally above 0.3), cross-loadings (i.e. loadings on more than one factor) are absent or scarce, and multiple variables are associated with each retained factor.

The effects of overextraction and underextraction of common factors have been investigated (Velicer and Jackson, 1990; Fava and Velicer, 1992; Fava and Velicer, 1996; Wood et al., 1996), with Wood et al. (1996) recommending running the risk of overextraction (retaining more factors than needed) rather than underextraction.

Variable retention

If the ultimate goal of FA is guidance in model building rather than exploration, it may be justifiable to remove variables which complicate the factor model produced. Post hoc variable removal should be performed with extreme caution, as it may not be justifiable with regard to the study design. Further, it is easy to place too much weight on a few variables that support the results without considering the overall contributions. Variables which show weak correlation with the retained common factors (i.e. have low communalities) and which have large cross-loadings are candidates for removal.

Factor rotation

Factor rotation is a procedure which attempts to clarify which variables show the highest association with a given factor. It does not dramatically change the results of FA, but rather attempts to simplify the relationships between factors and variables for greater interpretability.

If the factors are assumed to be orthogonal, the popular and widely implemented "varimax" procedure (Kaiser, 1958) is typically used. Varimax maximises the variance of the squared factor loadings among the original variables while keeping the factors uncorrelated. It then becomes easier to attribute a given variable to a single factor, and the overall FA solution may be simpler to interpret.
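A common SVD-based formulation of the varimax criterion can be sketched as follows; the input loadings matrix is hypothetical, and serious use should rely on a tested implementation:

```python
import numpy as np

def varimax(L, n_iter=100, tol=1e-8):
    """SVD-based varimax rotation sketch (after Kaiser, 1958).
    L is an unrotated loadings matrix; returns the rotated loadings."""
    p, m = L.shape
    R = np.eye(m)              # accumulated orthogonal rotation matrix
    d = 0.0
    for _ in range(n_iter):
        Lr = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p))
        R = u @ vt
        d_new = s.sum()
        if d_new < d * (1 + tol):  # criterion no longer improving
            break
        d = d_new
    return L @ R

# hypothetical unrotated loadings where both factors mix two variable groups
L = np.array([[0.6, 0.6], [0.6, 0.5], [0.6, -0.6], [0.5, -0.6]])
rotated = varimax(L)
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, communalities (and the total variance explained) are unchanged; only the distribution of loadings across factors changes.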

In ecological scenarios, factors are more likely to be at least mildly correlated; hence, oblique rotations may be preferable. However, the stronger the intercorrelation between factors, the more difficult the interpretation is likely to be. The popular "oblimin" procedure (Jennrich and Sampson, 1966) allows factor correlation to an extent controlled by a parameter, delta (δ). The more negative δ is, the more constrained to orthogonality the factors will be. A δ of zero is generally recommended to allow oblique rotation; more positive values may lead to instability in the minimisation procedure used in oblimin. For large data sets, the "promax" rotation (Hendrickson and White, 1964) may be more appropriate. This rotation performs a varimax rotation and subsequently 'relaxes' the orthogonality constraint to reach a better fit.

For exploratory purposes, it is not necessary to choose the 'best' rotation method, but rather a method that performs well in characterising the factors extracted. This choice requires human decision-making based on knowledge of the system under scrutiny. For more in-depth discussion on rotation methods, consult Warburton (1963) and Browne (2001). 

Key assumptions
  • Standard implementations of FA typically assume linear relationships between variables and factors.
  • A sufficient number of samples (objects), relative to the number of variables, must be present. Different opinions exist on establishing this minimum. See Mundfrom et al. (2005), Comrey and Lee (1992), and Barrett and Kline (1981). The minimum number of objects is generally a function of the ratio of variables to factors and the communality of the variables. High communalities, a small number of common factors, and low variable cross-loadings between factors generally decrease the need for larger sample numbers. 
  • At least some significant correlations exist between variables. A significant result from Bartlett's test of sphericity is a good indication this assumption is met.
  • When using orthogonal FA, the factors are expected to be uncorrelated. If you believe or have foreknowledge that suggests the factors are correlated, consider oblique FA.
  • The sum of partial correlation coefficients between the original variables should be small relative to the sum of their original (zero-order) correlation coefficients. This means that the variables are, in general, well-correlated and hence suitable for representation by common factors. A Kaiser-Meyer-Olkin (KMO) statistic greater than 0.5 indicates this assumption is met while values above 0.8 indicate a 'good' sample (see Cerny and Kaiser, 1977).
  • Normality is assumed by several factor extraction methods, but even these are often robust to violations of normality. Roughly symmetrical distributions without heavy skew are often sufficient.
  • If the number of common factors is close to or equal to the number of original variables, reconsider the need for FA. 
  • FA is sensitive to outliers, and screening should be performed to identify and appropriately handle these.
  • If correlated variables form a few, very different groups, FA may simply create common factors that reflect large differences while ignoring potential sub-groups which may be of greater interest. 
  • The construction of informative factors depends on the quality of the input data. Simply because some variables are poorly correlated with others does not mean they do not belong to an important common factor. It may be the case that their correlates were not measured and hence the factors overlooked. Knowledge of the system being analysed should guide the selection of appropriate input variables. 
  • Caution should be taken when interpreting and naming common factors. Justification for naming factors should come from background knowledge or other corroborating evidence external to the FA itself. 
  • FA is based on covariance matrices, thus, no directionality of relationships can be asserted as in regression analyses.
  • High cross-loadings, low communalities, and a small number of variables associated to any given factor indicate that the data are not suited to FA.
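The KMO check described above can be computed directly from a correlation matrix via its inverse (the anti-image approach). A minimal sketch, with a hypothetical correlation matrix; verify against a reference implementation before relying on it:

```python
import numpy as np

def kmo(R):
    """Kaiser-Meyer-Olkin measure of sampling adequacy for a
    correlation matrix R: zero-order squared correlations relative to
    zero-order plus partial squared correlations."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / d                    # partial (anti-image) correlations
    np.fill_diagonal(partial, 0.0)
    r = R.copy()
    np.fill_diagonal(r, 0.0)              # keep off-diagonal correlations only
    return (r ** 2).sum() / ((r ** 2).sum() + (partial ** 2).sum())

# hypothetical, well-correlated variables
R = np.array([[1.0, 0.7, 0.6],
              [0.7, 1.0, 0.65],
              [0.6, 0.65, 1.0]])
print(round(kmo(R), 3))
```

Values above 0.5 suggest the data are suited to FA, and values above 0.8 a 'good' sample, per the thresholds cited above.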

Implementations

  • R
    • The factanal() function in the stats package (maximum likelihood FA).
    • The "oblimin" rotation is available in the GPArotation package.
  • SAS
  • BMDP
  • IBM SPSS Statistics