The main idea...

"Pseudoreplication is the incorrect modelling of randomness"
- Millar & Anderson, 2004

"A true "replicate" is the smallest experimental unit to which a treatment is independently applied."
- Heffner et al., 1996

"Comparison of community composition on the basis of single, unreplicated samples, with no estimate of variability, is just as crazy and invalid as comparing my height with that of a randomly chosen animal ecologist and concluding, on the basis of these two values, that microbial ecologists are smaller or taller than animal ecologists."
- Prosser, 2010

“Dr. Box chuckled when I read him the letter – he has similar experience to yours, but in chemical experiments, and agrees that treating subsamples from a single experimental unit as if each represented an independent experimental unit is one of the commonest errors of analysis.”
- Joan Fisher Box, in litt. to S. Hurlbert, 27 November 1981 (in Hurlbert, 2009)

Pseudoreplication stems from the assumption that one has more statistically independent experimental or sampling units than is actually the case. An individual object (site, observation, enrichment, sample, etc.) is not necessarily a valid, independent replicate and may have non-random associations (i.e. be interdependent) with other objects. Accidental pseudoreplication is usually the result of misidentifying the experimental or sampling unit, or of failing to account for spatial, temporal, or environmental factors shared between samples which result in non-random relationships between them. For example, if one is interested in the effect of oxic or anoxic conditions on microbial communities across a particular region of the seafloor, the "treatments" (oxia and anoxia) must be replicated across independent samples (e.g. cores) which constitute a representative sample of the region under study. The delay between obtaining each core and their spatial arrangement should not have any meaningful (non-random) impact on the variables of interest. See Figure 1 and Figure 2 for illustrations of this concept.

The consequence of pseudoreplication is an over-estimation of sample size and thus of the degrees of freedom and statistical power available. In Figure 2, for example, if one were to assume a sample size that corresponds to the number of PCR replicates or subsamples of the core despite the sampling unit of interest being an entire core, the data would contain pseudoreplicated units. Treating pseudoreplicated units as replicate units violates the key statistical assumption of independence of samples (or other objects). This can have a major effect on hypothesis testing procedures and other inferential statistics, as random variation inherent in the real experimental or sampling units will probably manifest in all of their subsamples in an interdependent manner. This, in turn, is likely to lead to an underestimate of the variability among units, causing confidence intervals to be artificially narrow and the Type I error rate to increase.
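
The inflation of the Type I error rate can be illustrated with a small simulation. The sketch below is not taken from the references cited in this guide. It assumes a hypothetical design with two "treatments", three cores per treatment and ten subsamples per core, no true treatment effect, normally distributed core-level and subsample-level variation of equal size, and a pooled-variance two-sample t-test judged against tabulated two-sided 5% critical values. All names and parameter values are illustrative.

```python
import random
import statistics

N_CORES = 3      # true replicates (cores) per treatment; hypothetical design
N_SUB = 10       # subsamples (e.g. PCR replicates) per core
SD_CORE = 1.0    # assumed between-core standard deviation
SD_SUB = 1.0     # assumed within-core (subsample) standard deviation

# Two-sided 5% critical values of Student's t, from standard tables.
T_CRIT_SUBSAMPLES = 2.002  # df = 2 * 30 - 2 = 58 (subsamples treated as replicates)
T_CRIT_CORES = 2.776       # df = 2 * 3 - 2 = 4   (core means treated as replicates)


def pooled_t(a, b):
    """Two-sample Student t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    diff = statistics.fmean(a) - statistics.fmean(b)
    return diff / (pooled_var * (1 / na + 1 / nb)) ** 0.5


def simulate_treatment(rng):
    """Return one list of subsample values per core; no treatment effect is added."""
    cores = []
    for _ in range(N_CORES):
        core_effect = rng.gauss(0.0, SD_CORE)  # shared by every subsample of this core
        cores.append([core_effect + rng.gauss(0.0, SD_SUB) for _ in range(N_SUB)])
    return cores


def type_one_rates(n_sim=2000, seed=1):
    """Fraction of null simulations declared 'significant' by each analysis."""
    rng = random.Random(seed)
    false_pos_sub = 0
    false_pos_core = 0
    for _ in range(n_sim):
        a, b = simulate_treatment(rng), simulate_treatment(rng)
        # Pseudoreplicated analysis: every subsample is counted as a replicate.
        flat_a = [v for core in a for v in core]
        flat_b = [v for core in b for v in core]
        if abs(pooled_t(flat_a, flat_b)) > T_CRIT_SUBSAMPLES:
            false_pos_sub += 1
        # Unit-level analysis: one mean per core.
        means_a = [statistics.fmean(core) for core in a]
        means_b = [statistics.fmean(core) for core in b]
        if abs(pooled_t(means_a, means_b)) > T_CRIT_CORES:
            false_pos_core += 1
    return false_pos_sub / n_sim, false_pos_core / n_sim


pseudo_rate, core_rate = type_one_rates()
print(f"subsamples as replicates: {pseudo_rate:.3f}")
print(f"core means as replicates: {core_rate:.3f}")
```

Under these assumptions, the first rate should come out well above the nominal 0.05 while the second stays close to it. The exact values depend on the seed and on the assumed variances.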

A debated issue...

The threat and practice of pseudoreplication in ecological investigations was stressed by Hurlbert (1984) and, subsequently, highlighted in numerous fields (e.g. Hurlbert & White, 1993; Lazic, 2010) including microbial ecology (Prosser, 2010). Indeed, authors such as Knight et al. (2012) have called for replicated experimental design in microbial metagenomics sampling alongside better data and methodological standardisation to assist comparability. Pseudoreplication was further examined in the context of ecological investigations by Oksanen (2001, 2004), who asserted that pseudoreplication may, at times, be a "pseudo-issue" in specific ecological designs and that efforts to prevent it must be balanced against a number of constraints, particularly the issue of scale. Hargrove and Pickering (1992), recognising the conflicting needs to replicate and to study large-scale processes, describe how other domains address difficulties in replicating experimental or sampling units and suggest that regional ecology "embrace" pseudoreplication as an investigative strategy. Cottenie and De Meester (2003) attempted to reconcile the positions of Oksanen (2001) and Hurlbert (1984): these authors stressed that inferential statistics used without reasonable replication and interspersion of "treatments" may only be considered as an extension of descriptive statistics and cannot be generalised to any system other than the one under study. Hurlbert (2004) directly addressed several of the issues raised in Oksanen (2001), adding further qualification to many of his previous positions while reasserting his main arguments. Schank and Koehnle (2009) asserted the existence of several flaws in Hurlbert's (1984) position on pseudoreplication and explored these through simulations. Their findings may have considerable impact on ecological experimental and sampling designs. Authors such as Coss (2009) supported these criticisms and warned that rigid adherence to Hurlbert's arguments may stifle innovative methodology. Hurlbert rebutted many of these criticisms in a commentary (Hurlbert, 2009) wherein he described several aspects of Schank and Koehnle's work he deemed erroneous or confounding and put forth a clarifying framework for the discussion of pseudoreplication and experimental design. In turn, a commentary from Koehnle and Schank (2009) reasserted their position, further discussed the results of their previous article, and raised concerns regarding the treatment of "independent" units by Hurlbert.

An in-depth discussion of this debate is beyond the scope of this guide. Below, we provide a conceptual illustration of pseudoreplication to spur on further reading. Please consult the references above for a more developed discussion. 

Figure 1: Example of a replicated experimental set-up. This example is adapted from Glasby (1999), who tested whether shading and distance from the seafloor affect subtidal epibiontic communities. Settlement plates were either shaded with opaque plexiglass or unshaded, with a procedural control of clear plexiglass over an additional set of plates. Plates were located either close to or away from the seafloor at comparable depths. a) Assuming that the settlement plates were independent of one another (i.e. the community on one plate had no influence on the community of any other), there are two replicates in this design.

Figure 2: Technical replicates. In the illustration above, the "treatments" are oxic and anoxic regimes. A single core was subsampled multiple times, creating technical replicates. These technical replicates do not replicate the treatment conditions independently (they came from a single "application" of the "treatments" to a single sampling unit) and thus will not add statistical power to analyses of the resulting data sets. However, technical replicates are useful in assessing the stability and precision of a given measurement.

Identifying pseudoreplication

Examining a study's experimental or sampling design often reveals whether replication of "treatment" regimes over experimental or sampling units has been performed. Only experimental or sampling units that can be reasonably assumed to be independent should be used as objects in most inferential statistics. See the Examples section, below, for illustration. Extra care should be taken when researchers do not report their designs or do not discuss the independence of their samples. As noted by Hurlbert (1984), many ecological studies involve only one "true replicate" per "treatment" (or set of environmental conditions associated with a sample) due to difficulties in obtaining more replicates. This, in itself, is not a cause to disregard the results of such studies; however, problems arise when pseudoreplicated units are used for certain statistical analyses, the results of which are used to suggest that a study's conclusions are more authoritative than they really are.

Scenarios where the risk of pseudoreplication is increased include:
  • Designs which repeatedly measure the same entity (or set of entities) to generate observations. Many resources are available on repeated measures designs and how to account for interdependence of observations or samples therein
  • Designs where objects are spatially or temporally autocorrelated (i.e. their associated variable values at one point in space or time are related to those at another; however, see Schank & Koehnle, 2009)
  • Designs with a hierarchical or nested structure (objects that are nested in the same cluster are more likely to have non-random associations)
If data (or pilot data) have been analysed and residuals are available, check for non-random patterns in the residuals associated with objects. If groups of objects have very similar residuals, they may be dependent. Further, the data and residuals should be screened for spatial or temporal autocorrelation.
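
One possible way to quantify "very similar residuals" within groups is the intraclass correlation (ICC) estimated from a one-way analysis of variance. This is offered only as an illustrative sketch and is not a method prescribed by the references in this guide. The function name, the grouping by core, and the residual values are hypothetical, and this simple form assumes equally sized groups.

```python
import statistics


def residual_icc(groups):
    """One-way ANOVA estimate of the intraclass correlation, ICC(1).

    `groups` maps a grouping label (e.g. a core ID) to the residuals of the
    objects in that group. This simple form assumes equally sized groups.
    """
    sizes = {len(values) for values in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equally sized groups")
    k = sizes.pop()   # objects per group
    g = len(groups)   # number of groups
    if k < 2 or g < 2:
        raise ValueError("need at least two groups of at least two objects")

    all_values = [v for values in groups.values() for v in values]
    grand_mean = statistics.fmean(all_values)
    group_means = {name: statistics.fmean(values) for name, values in groups.items()}

    # Mean squares between and within groups.
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means.values()) / (g - 1)
    ms_within = sum((v - group_means[name]) ** 2
                    for name, values in groups.items()
                    for v in values) / (g * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical residuals from three cores with three subsamples each.
residuals_by_core = {
    "core_1": [1.0, 1.2, 0.8],
    "core_2": [-1.0, -0.8, -1.2],
    "core_3": [0.1, -0.1, 0.0],
}
icc = residual_icc(residuals_by_core)
print(f"ICC of residuals grouped by core: {icc:.2f}")
```

Values near 1 indicate that residuals within a group resemble one another far more than residuals from different groups, which suggests that the grouped objects should not be treated as independent. Values near or below 0 give no such indication.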

Hurlbert (2004; see the section "Valid tests for treatment effects in absence of treatment replication: special cases") suggests that there are scenarios where tests on a single replicate are valid, if conservative. One such scenario involves regression models, where several unreplicated measurements are taken along the range of a continuous treatment variable. If the model fits the data well, presents low p-values, and has low error values, the effect of the treatment can be supported. Hurlbert also notes that interaction mean squares in an ANOVA conducted on the results of a factorial experiment with only one replicate per treatment combination can indicate the presence of treatment effects.
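
The regression scenario can be sketched as follows, assuming a hypothetical experiment in which six levels of a continuous treatment are each applied to a single, independent unit. The data values and function name are invented for illustration. The sketch fits an ordinary least-squares line and returns the slope, R², and the t statistic of the slope with its degrees of freedom; a p-value would then be obtained from the t distribution with those degrees of freedom.

```python
import statistics


def simple_linear_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns the slope, intercept, R^2, the t statistic of the slope,
    and the residual degrees of freedom (n - 2).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    x_mean, y_mean = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    slope_se = (ss_res / (n - 2) / sxx) ** 0.5  # standard error of the slope
    return slope, intercept, r_squared, slope / slope_se, n - 2


# Hypothetical data: one unreplicated unit per treatment level.
treatment_level = [0, 1, 2, 3, 4, 5]
response = [1.1, 2.9, 5.2, 7.1, 8.8, 11.0]

slope, intercept, r_squared, t_stat, df = simple_linear_fit(treatment_level, response)
print(f"slope = {slope:.3f}, R^2 = {r_squared:.4f}, t = {t_stat:.1f} on {df} df")
```

Here the evidence for a treatment effect comes from the consistency of the trend across levels rather than from replication within any one level, so the units at different levels must still be independent of one another.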

Avoiding pseudoreplication

As recommended by Prosser (2010), when planning an experiment, even an exploratory one, consider whether "deeper" data are really more important than replication. For example, if you wish to evaluate the difference between two environments using targeted sequencing approaches, it will be statistically more powerful to perform shallower sequencing on more replicates than deep sequencing on a few replicates (or a single sample). This is especially true if a shallow sequencing run adequately describes the community composition and leads to the same conclusions (as determined by, e.g., a pilot study).

  • Carefully consider what the hypothesis to be tested is and what experimental or sampling units best allow this hypothesis to be tested. 
  • Clearly note what scale of inference your experimental units support (i.e. can you make a statement about the Pacific Ocean or only the coastal waters off the east of Australia?).
  • If possible, attempt to randomly assign each "treatment" to more than one experimental or sampling unit.
  • Replicate as high up in a design as feasible. Replicating studies is preferable to replicating cores, which is preferable to replicating samples from a core, and so on.
  • Identify any non-random associations in the residuals of any analysis. If such associations are detected, attempt to deal with this potential pseudoreplication accordingly (see below).
  • Always report the details of any statistics used including the number of samples, the values of any statistics with their associated degrees of freedom, and p-values. 
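
Random assignment of "treatments" to units can be sketched in a few lines. The example below is a minimal illustration, assuming a hypothetical set of eight mesocosms and a balanced two-treatment design; the unit names, treatment labels, and function name are not from the original text.

```python
import random


def assign_treatments(units, treatments, seed=None):
    """Randomly assign treatments to units in equal numbers (balanced design)."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # random order decides which unit gets which treatment
    per_treatment = len(shuffled) // len(treatments)
    return {unit: treatments[i // per_treatment] for i, unit in enumerate(shuffled)}


# Hypothetical experimental units and treatments.
units = [f"mesocosm_{i}" for i in range(1, 9)]
assignment = assign_treatments(units, ["oxic", "anoxic"], seed=42)
for unit in units:
    print(unit, assignment[unit])
```

Fixing the seed makes the assignment reproducible, so it can be reported alongside the design. Each treatment is applied to four separate units, which gives four replicates per treatment provided the units themselves are independent.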

Dealing with pseudoreplication

If original sample material is still available, pooling a replicate's sub-samples prior to data generation is a common strategy. When followed by PCR-based methods, this approach is associated with several properties and potential risks, including the underestimation of sample richness and the 'smearing out' of local structure (Manter et al. 2010).

When handling data with pseudoreplicates, a range of measures should be considered to address them. These measures are often associated with certain risks which should be carefully weighed before proceeding. Lazic (2010) and Millar and Anderson (2004) describe some remedies for pseudoreplication:
  • Averaging sub-samples: Here, the values of variables derived from dependent units are averaged, yielding one value for each replicate (independent unit). Before averaging data in this manner, one must be careful to screen for non-random effects. For example, if the values of any response variable are correlated with those of an explanatory variable (including time or space) across a set of sub-samples, there is a risk that the average of these response values is not an appropriate representation of the average sample value. Rather, these values may be associated with some arbitrary region of a gradient, which may be quite distant from the true mean. Similar risks arise if averages are computed from sets of sub-samples with very different sizes, unequal variances, or data that are missing not at random (MNAR), all of which may bias downstream results. Averaging, in itself, is likely to impact the statistical power of a data set and multilevel or nested models should be considered (Koehnle & Schank, 2009).
  • Summary-measure analyses: A measure other than the mean may be used in a summary-measure analysis. Here, some statistic derived from each set of interdependent units is used as the variable value for a replicate sample or unit. For example, a statistic that reflects the change in beta diversity with depth in a core (the core being a replicate and the depth layers being sub-sampled) can be compared across replicates. As with calculating means, the precision of each summary measure calculated must be evaluated to ensure that they are comparable. Further, the number of sub-samples must be sufficient to allow reasonable estimation of whichever summary-measure has been chosen.
  • Random effects models: Also known as mixed effects models, nested models, or multilevel models, random effects models (see Bolker et al., 2009 and Zuur et al., 2009) allow estimation of representative variable values across each set of sub-samples while taking their precision into account. Variables that correspond to a sub-sampling scheme (e.g. "core" or "mesocosm") can be specified as random effects: effects whose influence is not of direct interest, but must be taken into account. Multivariate and nonlinear mixed models are also available and implemented in various software products; however, note that the effort involved to achieve a functional understanding of mixed modelling is not trivial.
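
The first two remedies above can be sketched as a reduction of each replicate's sub-samples to a single value. The example below assumes hypothetical cores whose depth layers each carry a diversity index; it computes the per-core mean (averaging) and the per-core slope of the index against depth (one possible summary measure). The data, core names, and function names are invented for illustration. Mixed models are not shown, as they are best fitted with dedicated statistical software.

```python
import statistics


def ols_slope(x, y):
    """Least-squares slope of y against x."""
    x_mean, y_mean = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return sxy / sxx


def summarise_cores(cores):
    """Collapse the sub-samples of each core into one row of summary values."""
    summary = {}
    for core, layers in cores.items():
        depths = [depth for depth, _ in layers]
        values = [value for _, value in layers]
        summary[core] = {
            "n_layers": len(layers),                   # report how many sub-samples fed each summary
            "mean": statistics.fmean(values),          # averaging remedy
            "depth_slope": ols_slope(depths, values),  # summary-measure remedy
        }
    return summary


# Hypothetical data: (depth in cm, diversity index) for each layer of each core.
cores = {
    "seep_core_1": [(1, 3.0), (3, 2.6), (5, 2.2)],
    "seep_core_2": [(1, 2.8), (3, 2.5), (5, 2.2)],
    "ref_core_1": [(1, 3.5), (3, 3.4), (5, 3.3)],
    "ref_core_2": [(1, 3.6), (3, 3.6), (5, 3.6)],
}

summary = summarise_cores(cores)
for core, row in summary.items():
    print(core, row)
```

Downstream tests would then use one row per core, so the sample size equals the number of cores rather than the number of layers. Keeping the number of layers alongside each summary makes it easier to spot replicates whose summaries rest on too few sub-samples to be comparable.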
If pseudoreplication cannot be dealt with, one must attempt to minimise its influence and report its existence openly, specifically:
  • Prior to data collection, add as much randomness to your study as possible. For example, if samples must be stored in the same incubator, rearrange them frequently and randomly during the experiment.
  • Report pseudo-confidence intervals, declaring the existence and possible impacts of pseudoreplication on your study. Declare that any statistics resulting from the use of pseudoreplicates should be viewed skeptically (Heffner et al., 1996).

Examples

Two sets of sediment samples, each comprising five push cores in close proximity, were obtained from a methane seepage site and neighbouring, inactive sediment, respectively. Microbial community structure was inferred by a DNA fingerprinting method. If the interest lies in determining the differences in community structure between seep and non-seep sites in general, this investigation does not replicate the "treatment" (seep, non-seep) and hence has no replication. The five push cores per site are pseudoreplicates of the seep and non-seep conditions. A statement about the variability of community structure at the two locations sampled can be made, but not generalised.

An experimental nutrient mix is added once to an enrichment culture and its activity is monitored for a period of time. Following data collection, the experimenter waits until the culture's activity returns to its baseline state. At this stage, the experimenter adds the nutrient mix and measures the culture's activity once again. The two data points, coming from the same culture, are not independent. To analyse these data, a repeated-measures approach should be used.

A total of 20 sediment samples were retrieved from random locations across a homogeneous, geothermally active region of the seafloor. Ten of these are placed in an incubator set to four degrees Celsius and the remaining ten are placed in an incubator set to 60 degrees Celsius in order to examine the influence of temperature on microbial community structure and activity. While there are ten sediment samples per incubator, they are all subject to the same set of conditions and anomalies within their respective incubator. Further, factors such as a sample's position in its incubator may have unexpected influence on the response variables. Thus, the samples in each incubator are not likely to be independent experimental units. Introducing randomness by randomly rearranging the samples in their incubators several times during the experiment may mitigate position-dependent effects, but any inferential statistics reported must include a discussion on pseudoreplication and how it was addressed.   

For more examples, including transect-based sampling and the use of random effects, see Millar and Anderson (2004).