
Screening

Below is a series of steps common to many data screening procedures. Note that not all of these steps apply to all techniques. For example, non-parametric techniques such as non-metric multidimensional scaling and ANOSIM generally have fewer assumptions than techniques such as canonical correspondence analysis or MANOVA. If you know which method you wish to use, familiarise yourself with its assumptions and screen your data accordingly. At a minimum, data should be screened for missing values and outliers. See Zuur et al. (2010) for an introduction to data screening.

Click the links in the text to navigate to pages describing each step at greater length.
  • Ensure that you are aware of any pseudoreplication in your study.
  • As a first step, check whether your data have missing values.
  • Secondly, screen your data for outliers.
  • If you plan to use parametric analyses (i.e. analyses that assume that your data have a certain distribution which can be summarised by a set of parameters), ensure that your data meet, at least approximately, the distributional assumptions of the analysis. Visual inspection of the distribution of variable values, or of the distribution of residuals, is preferable to relying solely on formal test statistics (e.g. tests of normality).
  • If you plan to use linear methods, such as redundancy analysis or MANOVA, ensure that your response and explanatory variables have, at least approximately, linear relationships to one another. Scatter plots of the residuals of an analysis, or of pairs of the variables themselves, can reveal non-linearity. In many cases, transforming one or more variables will linearise non-linear relationships and allow you to continue the analysis. Importantly, remember that all transformations must be taken into account when interpreting the results (e.g. when comparing variable values, these should be "back-transformed").
  • If you plan to use unimodal methods such as correspondence analysis, ensure that your response variables respond unimodally (i.e. with a single peak) along the gradients under study.
  • Many multivariate methods use some form of correlation to assess relationships between variables. If your explanatory variables are strongly correlated, this may render them redundant and distort your results and/or interpretation. Thus, inspect your data for multicollinearity.
  • Screen your data for heteroscedasticity, especially if you intend to perform any hypothesis tests.
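
As a rough illustration of three of the checks above (missing values, outliers and multicollinearity), the following Python sketch uses only the standard library. The variable names and values are invented for the example, and the cut-offs (1.5 times the interquartile range for outliers, |r| >= 0.8 for strongly correlated pairs) are common conventions assumed here rather than rules; choose thresholds that suit your data and method.

```python
import math
import statistics


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_missing(columns):
    """Return the number of missing entries for each variable."""
    return {name: sum(is_missing(v) for v in values)
            for name, values in columns.items()}


def iqr_outliers(values, k=1.5):
    """Return values more than k * IQR beyond the quartiles (assumed cut-off)."""
    observed = [v for v in values if not is_missing(v)]
    q1, _, q3 = statistics.quantiles(observed, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in observed if v < low or v > high]


def pearson_r(x, y):
    """Pearson correlation, using only cases where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y)
             if not is_missing(a) and not is_missing(b)]
    mean_x = statistics.fmean(a for a, _ in pairs)
    mean_y = statistics.fmean(b for _, b in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold (assumed)."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged


# Hypothetical example data: one variable per key, one sample per position.
data = {
    "temperature": [12.1, 13.4, 11.8, 12.9, 13.0, 12.5, 30.2, 12.7],
    "salinity": [35.0, 34.8, None, 35.1, 34.9, 35.2, 35.0, 34.7],
    "depth": [5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0],
    "pressure": [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0],
}

missing = count_missing(data)
outliers = {name: iqr_outliers(values) for name, values in data.items()}
collinear = collinear_pairs(data)

print("Missing values per variable:", missing)
print("Candidate outliers:", outliers)
print("Strongly correlated pairs:", collinear)
```

In this invented data set, the sketch reports one missing salinity value, flags the temperature reading of 30.2 as a candidate outlier, and identifies depth and pressure as a strongly correlated pair. Flagged values are candidates for closer inspection only; whether to remove, transform or retain them depends on the study.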


References
