
Data transformations

The main idea...

Occasionally, the variables in a "raw" data set have properties that violate an assumption of a statistical procedure (e.g. normally distributed values) or that cannot be compared to other variables due to differences in scale or variability. For example, principal components analysis (PCA) requires that variables be linearly related to one another and on roughly the same scale; otherwise it will perform poorly. Rather than abandoning an analysis due to inappropriate data structure, it may be possible to transform the variables so they satisfy the conditions in question. A transformation applies a mathematical procedure to every value of a given variable or set of variables to create a new set of values. The new values of the transformed variables should still represent the data, but will be more amenable to analysis or comparison. 

The sections below first describe some basic transformations and then discuss transformations specifically geared towards comparing variables. A set of ecologically motivated transformations intended to allow Euclidean representation of ecological dissimilarities by methods such as PCA and redundancy analysis (RDA) is also summarised.

Before you begin transforming your data, ensure there is a defined and well-supported reason to do so. Common rationales include linearising, normalising, or standardising data in order to respect a method's assumptions.



Figure 1: Schematics illustrating a linear and square root transformation. 
a) A linear transformation where the variable "y" is transformed into " y' " through a translation "b" and an expansion "m" transformation. This can be expressed by the linear equation y' = my + b. This transformation may be used to place two or more linearly related variables on the same scale. In this illustration, both "b" and "m" are positive leading to a translation to the right and an expansion, respectively. b) A square root transformation. Larger values of a variable "y" are affected more strongly than smaller values. This transformation is useful when positive data shows a positive skew and a more Gaussian distribution is desired. Hollow circles indicate former positions of values along an axis. 

Basic transformations

A few basic but popular data transformations are described below. The main motivations for applying these transformations include placing variables on similar scales, simplifying calculations, meeting distributional assumptions (such as normality), and dealing with heteroscedasticity.

Linear Using the linear equation y' = my + b, you can rescale a linear variable, y, into y' through a translation (b) and an expansion (m) parameter (see Figure 1a). 
Logarithmic Transforming positive data to a logarithmic scale reduces the range of the data set. The relative change of a variable, whose values are expressed as an exponent with respect to some base, is emphasised over its absolute change. Common bases are 2 (the binary logarithm), Euler's constant e (the natural logarithm), and 10. Transforming exponentially related variables using appropriate bases can linearise their relationship. Further, logarithmic transformations may be used to normalise certain data distributions.
Square root A common power transformation (see below), transforming positive data with a square root transformation reduces the data's range, compressing large values more than smaller values (see Figure 1b). This transformation is useful for variables with a small proportion of large values that distort the overall distribution. 
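As a rough sketch of the three basic transformations above (Python with NumPy is used here purely for illustration; the values and variable names are invented):

```python
import numpy as np

y = np.array([1.0, 4.0, 9.0, 100.0])  # invented example values

# Linear: y' = m*y + b (expansion m, translation b)
m, b = 2.0, 5.0
y_linear = m * y + b

# Logarithmic (base 10): emphasises relative over absolute change
y_log10 = np.log10(y)

# Square root: compresses large values more strongly than small ones
y_sqrt = np.sqrt(y)
```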

Power transformations Power transformations involve raising positive values of a variable to some predetermined exponent (λ) in order to improve linearity, reduce heteroscedasticity and promote symmetric distribution of residuals (and thus normality). The square root transformation is an example of a power transformation (λ = 0.5) as is the inverse function (λ = -1). The definition of the power transformation (Equation 1) invokes a logarithmic transformation when λ = 0 to allow continuous transformations. 
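The piecewise definition in Equation 1 might be sketched as follows (an illustrative Python function, not a library routine):

```python
import numpy as np

def power_transform(y, lam):
    """Power transformation (Equation 1), defined piecewise: y**lam
    when lam != 0, and the natural log when lam == 0, which keeps the
    family continuous as lam approaches zero. Values must be positive."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return y ** lam
```

With λ = 0.5 this reproduces the square root transformation, and with λ = -1 the inverse.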

Box-Cox Similar to a power transformation, the Box-Cox transformation (Equation 2) is part of a procedure to determine the 'best' value of an exponent, λ, which maximises some property of the variable being transformed (e.g. normality, linear correlation) in relation to a known distribution or another variable. Consider using the Box-Cox technique (Box and Cox, 1964; Figure 2) when there is no clear reason to use a specific power transformation.

Equation 1: The power transformation expressed as a piecewise function: y' = y^λ when λ ≠ 0, and y' = ln(y) when λ = 0. Resorting to a log transformation when λ = 0 allows the power transformation to remain continuous in λ. Equation 2: The Box-Cox transformation: y' = (y^λ − 1) / λ when λ ≠ 0, and y' = ln(y) when λ = 0. This transformation is used in the Box-Cox procedure to estimate the value of λ which best transforms the variable to meet some criterion such as normality or linearity (see Figure 2 for illustration).



Figure 2: Box-Cox plots for determining optimal 
λ values for a) normalising and b) linearising transformations. a) λ is chosen such that it maximises the correlation of a Box-Cox-transformed variable, X, with a comparable normal distribution, N(μ,σ). In this illustration, a square root transformation (λ = 0.5) appears to be a good choice. b) λ is chosen such that it maximises the correlation between the variable being transformed, X, and another variable, Y. In the illustrated case, squaring the variable (λ = 2) appears to be a good linearising transformation. If the variables X and Y were negatively correlated, the λ corresponding to the minimum (i.e. most negative) correlation would be chosen.
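If SciPy is available, its stats.boxcox() routine performs a closely related procedure, estimating λ by maximising a log-likelihood rather than the correlation criterion illustrated in Figure 2 (the data here are invented):

```python
import numpy as np
from scipy import stats

x = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0])  # invented positive data

# With lambda fixed at 0, the Box-Cox transform reduces to the natural log
x_log = stats.boxcox(x, lmbda=0.0)

# With no lambda supplied, scipy estimates the 'best' lambda for the data
x_transformed, lam_hat = stats.boxcox(x)
```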

Transformations in aid of comparability

Transformation can also promote the comparability of variables that have different magnitudes, variability, or scale such as those that describe different quantities (e.g. pH and enzyme rates). The transformations described below, discussed in more detail by Legendre and Legendre (1998), are applied to two or more variables in order to place them on comparable scales. Which transformation is appropriate to your data will depend on whether you need to correct for differences in magnitude, variability, or both between the variables in question.

Centring by translation This standardisation is simply a translation (see Figure 1a, parameter "b") of each variable by some scalar quantity. The objective is to remove differences due to the different magnitudes of variables; the transformation does not expand or contract a given variable's distribution of values. Centring on the mean, by subtracting a variable's mean value from each of its original values, is a common approach for data symmetric about the mean.
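A minimal sketch of centring by the mean (NumPy, invented values):

```python
import numpy as np

y = np.array([2.0, 4.0, 6.0, 8.0])  # invented example values

# Centring by the mean: a pure translation; the spread is unchanged
y_centred = y - y.mean()
```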

Scaling by expansion Dividing (or multiplying) a variable's values by some scalar quantity (see Figure 1a, parameter "m") can place them on a relative scale with comparable range. A common method for relative-scale variables is to divide all values of a given variable by the maximum value of that variable. Any maxima are thus expressed as "1" and all other values as fractions of the maximum or maxima. Units are lost in the quotient.
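Scaling by the maximum might look like this (NumPy, invented values):

```python
import numpy as np

y = np.array([0.5, 2.0, 4.0])  # invented example values

# Scaling by the maximum: maxima become 1, all other values become
# fractions of the maximum; the original units cancel in the quotient
y_scaled = y / y.max()
```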

Transformation by translation and expansion A combination of translation and expansion can be used to restrain values of a continuous or ratio-scale variable to a given interval. Legendre and Legendre (1998) discuss Sneath and Sokal's (1973) method of ranging wherein the minimum value of a variable is subtracted from each original value and the difference divided by the range of that variable.
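Ranging, as described above, can be sketched as (NumPy, invented values):

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0, 11.0])  # invented example values

# Ranging (Sneath and Sokal): subtract the minimum, divide by the
# range, constraining the transformed values to the interval [0, 1]
y_ranged = (y - y.min()) / (y.max() - y.min())
```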

Z-scoring This method of data standardisation uses both translation and expansion to create unit-free transformed variables with means of zero and standard deviations of one. The mean of each variable is subtracted from the original values and the difference divided by the variable's standard deviation (Equation 3). Standardised values differ from one another in units of standard deviation, a key difference from ranging.

Equation 3: Z-scoring a variable "y": z = (y − ȳ) / s, where ȳ is the mean and s is the standard deviation of "y".
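Equation 3 in a few lines (NumPy, invented values; the sample standard deviation is used here):

```python
import numpy as np

y = np.array([2.0, 4.0, 6.0, 8.0])  # invented example values

# Z-score (Equation 3): subtract the mean, divide by the standard
# deviation; the result has mean zero and standard deviation one
z = (y - y.mean()) / y.std(ddof=1)
```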

Ecologically motivated transformations 

Presented in Legendre and Gallagher (2001), the transformations listed below are closely related to several (dis)similarity and distance measures and have their collective basis in ecological theory. These transformations may be applied prior to analyses such as principal components analysis (PCA) or redundancy analysis (RDA) of, for example, abundance data. These analyses use simple Euclidean distances in their ordinations, which are often not appropriate for count data. Hence, these transformations may improve the effectiveness of many analyses in representing ecological relationships. Formulae, further explanation, and examples are available in Legendre and Gallagher (2001).

Chord transformation Like the Hellinger transformation, this transformation gives low weights to variables with low counts and many zeros. It divides each value in a data matrix by the square root of its marginal sum of squares, thereby setting the marginal (either row or column) sum of squares to one.
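Applied over rows, this might be sketched as (NumPy; the abundance matrix is invented):

```python
import numpy as np

# Hypothetical abundance matrix: rows are sites, columns are species
Y = np.array([[1.0, 2.0, 2.0],
              [3.0, 0.0, 4.0]])

# Chord transformation: divide each row by the square root of its
# sum of squares, giving every row a sum of squares of one
row_norms = np.sqrt((Y ** 2).sum(axis=1, keepdims=True))
Y_chord = Y / row_norms
```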

Hellinger transformation Particularly suited to species abundance data, this transformation gives low weights to variables with low counts and many zeros. The transformation itself comprises dividing each value in a data matrix by its row sum, and taking the square root of the quotient.
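A minimal sketch (NumPy; the abundance matrix is invented):

```python
import numpy as np

# Hypothetical abundance matrix: rows are sites, columns are species
Y = np.array([[1.0, 2.0, 2.0],
              [3.0, 0.0, 4.0]])

# Hellinger transformation: divide each value by its row sum,
# then take the square root of the quotient
Y_hellinger = np.sqrt(Y / Y.sum(axis=1, keepdims=True))
```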

 χ2 metric Variables with low counts and many zeros are given high weights, which may be advantageous if their presence is highly indicative of a given phenomenon. Each value in a data matrix is divided by the product of its row sum with the square root of its column sum.

χ2 distance This transformation is the product of the values transformed by the χ2 metric and the square root of the sum of all counts in the data matrix. This is the distance used in correspondence analysis (CA) and canonical correspondence analysis (CCA).

Distance between species profiles This transformation is similar to the χ2 metric; however, it does not give high weight to variables with low counts and many zeros. Variables with higher values and fewer zeros contribute more to distance calculations. To transform data using this approach, each value in the data matrix is divided by its row sum.
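The χ2 metric, χ2 distance, and species profiles transformations can be sketched together (NumPy; the abundance matrix is invented):

```python
import numpy as np

# Hypothetical abundance matrix: rows are sites, columns are species
Y = np.array([[1.0, 2.0, 2.0],
              [3.0, 0.0, 4.0]])

row_sums = Y.sum(axis=1, keepdims=True)   # per-site totals
col_sums = Y.sum(axis=0, keepdims=True)   # per-species totals

# Species profiles: each value divided by its row sum
Y_profiles = Y / row_sums

# Chi-square metric: each value divided by the product of its row sum
# with the square root of its column sum
Y_chi_metric = Y / (row_sums * np.sqrt(col_sums))

# Chi-square distance: the chi-square metric scaled by the square root
# of the grand total of the matrix
Y_chi_distance = np.sqrt(Y.sum()) * Y_chi_metric
```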

  • Choose transformations according to need, rather than as a matter of course. Applying transformations that are too "harsh" (i.e. stronger than needed to prepare data for a particular analysis) may distort results and harm interpretation.
  • If a numerical interpretation of the results is desired, it is necessary to back-transform values after conducting an analysis in order to correctly interpret the results.
  • Ecological data that has been transformed using an ecologically motivated function can often be interpreted in a straightforward manner, however, transformations which simply aim to correct for some property in the data should be considered carefully during interpretation  (Legendre and Legendre, 1998).
  • Some transformations, such as power transformations, require values to be positive. Adding a constant to achieve this is acceptable.
  • Treat negative values with caution. Ensure that your transformation adequately represents differences between negative values. If this is not possible, translating values into positive numbers by the addition of a constant scalar quantity may be advisable.
  • R
    • scale() in the base package allows translational and expansion-based scaling.
    • decostand() in the vegan package contains several transformation functions.
    • boxcox() in the MASS package generates a plot of values of λ against the log-likelihood (derived from a linear model).