

The main idea...

Resampling techniques are a powerful means to assess the precision, significance, or validity of data, statistics, estimates, and models. Rather than referring to a theoretical distribution (e.g. the F-distribution, the normal distribution, etc.), these methods repeatedly sample (resample) the original data to build new distributions against which some analysis outcome can be tested. Each sample (which may be a subset of the original data set or a rearrangement thereof) is subjected to a user-selected treatment (e.g. a hypothesis test, a model fit, etc.). Collectively, the results of these sample treatments reveal interesting properties of the data set which can guide further analysis or help in the evaluation of a given statistic.

Below, a number of popular resampling approaches are briefly described. An extensive literature on resampling methods for ecology is available (see Crowley, 1992; Adams et al., 1997).



Figure 1: Jackknifing and Bootstrapping. a) First-order jackknifing. Repeatedly sampling (resampling) a data set while leaving one object out generates a collection of subsets, referred to as 'samples'. Some statistic or estimate (here "s") is calculated from each sample (s'1, s'2, etc.). These statistics are then compared to the statistic derived from the original data (s), or a specific portion thereof, to establish whether they are in agreement. Marked disagreements suggest that the statistic of choice is reporting something "unexpected" about the data relative to a distribution of that statistic derived from the data itself. b) Bootstrapping is similar to jackknifing; however, it samples the original data set with replacement to create resampled data sets of the same size as the original.


The Jackknife procedure (Quenouille, 1949; Tukey, 1958) is useful for detecting the bias and variance of a given statistic and can be used to suggest an unbiased value of that statistic. The original data set is resampled, leaving one (first-order jackknifing) or a few objects (n-order jackknifing) out of each sample. A jackknifed statistic is then the mean difference between the original statistic (for the whole data set) and bias terms. Bias terms are primarily determined by how much a given sample-derived statistic differs from the original statistic. Sudden changes in the jackknifed statistic or a high variance of that statistic may indicate that some values are outliers or otherwise bias the test statistic.
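The first-order jackknife can be sketched in a few lines. The following is a minimal illustration (not a production implementation) using hypothetical data and the plug-in (divide-by-n) variance as the statistic of interest, a case in which the jackknife bias correction is known to recover the unbiased (divide-by-n-1) variance exactly:

```python
import statistics

def plug_in_var(xs):
    """Plug-in (divide-by-n) variance: a deliberately biased statistic."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

data = [2.1, 2.5, 2.2, 2.8, 2.4, 3.1]   # hypothetical observations
n = len(data)
theta = plug_in_var(data)               # statistic s from the full data set

# First-order jackknife: one sample per left-out object.
samples = [data[:i] + data[i + 1:] for i in range(n)]
theta_i = [plug_in_var(s) for s in samples]   # s'1, s'2, ...

# Bias term: how much the sample-derived statistics differ, on average,
# from the original statistic.
theta_bar = sum(theta_i) / n
bias = (n - 1) * (theta_bar - theta)
theta_corrected = theta - bias          # jackknife bias-corrected statistic
```

Inspecting the spread of `theta_i`, or how `theta_corrected` shifts as objects are added or removed, is one way to spot influential values.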


The bootstrap procedure (Efron, 1979) is a generalisation of the jackknife procedure and is typically used to assess the stability of a statistic or estimate. It assumes that a statistic can best be assessed by referencing the data it is derived from. Multiple data sets (typically 999 and above, if feasible) are created by randomly drawing objects or variables from the original data with replacement. Thus, objects or variables may occur more than once. The parameter or statistic of interest is then recalculated for each data set and its stability can be assessed by examining its variability. Many adaptations of the bootstrap algorithm are available for special applications (e.g. for time series; Bühlmann, 2002).
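A non-parametric bootstrap of the mean can be sketched as follows. The data, the choice of 999 resamples, and the percentile interval are all illustrative assumptions:

```python
import random
import statistics

random.seed(1)  # for a reproducible sketch

data = [2.1, 2.5, 2.2, 2.8, 2.4, 3.1, 2.6, 2.9]  # hypothetical observations
n = len(data)
B = 999  # number of bootstrap data sets

# Draw B samples of size n with replacement; objects may recur within a sample.
boot_means = []
for _ in range(B):
    sample = [random.choice(data) for _ in range(n)]
    boot_means.append(statistics.mean(sample))

# Stability of the statistic: spread of the bootstrap distribution
# (a bootstrap standard error) and a simple percentile interval.
se = statistics.stdev(boot_means)
boot_means.sort()
ci = (boot_means[int(0.025 * B)], boot_means[int(0.975 * B)])
```

A narrow `ci` and small `se` suggest the mean is stable under resampling; a wide interval flags an unstable estimate.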


In its simple form, cross-validation (Kurtz, 1948) is a straightforward approach wherein a data set is split into two or more comparable subsets and the value of an estimate is compared across these subsets. Confidence in the estimate can then be gauged depending on how greatly the estimate value varies between subsets. Values that are concurrent between subsets are generally more trustworthy than those that vary greatly. A cross-validation coefficient can be reported to reflect the agreement between estimate values across subsets. Multicross-validation procedures, in which each subset is used to validate every other subset, can also be used. If the data set is small (i.e. only a small number of objects present), cross-validation approaches are unlikely to be reliable, as estimate values may vary greatly by chance alone.
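The simple form described above can be sketched as follows. The data, the number of subsets (k), and the use of the range of subset estimates as an agreement measure are all illustrative assumptions, not a standard cross-validation coefficient:

```python
import random
import statistics

random.seed(2)  # for a reproducible sketch

data = [2.1, 2.5, 2.2, 2.8, 2.4, 3.1, 2.6, 2.9, 2.3, 2.7]  # hypothetical
k = 5  # number of comparable subsets

# Shuffle the objects, then split them into k comparable subsets.
shuffled = random.sample(data, len(data))
subsets = [shuffled[i::k] for i in range(k)]

# Recompute the estimate (here the mean) on each subset.
estimates = [statistics.mean(s) for s in subsets]

# Agreement across subsets: a small spread relative to the pooled value
# suggests the estimate is trustworthy.
spread = max(estimates) - min(estimates)
```

With only two objects per subset, as here, the spread can be large by chance alone, which is exactly the small-data caveat noted above.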


Permutation, related to randomised exact tests, involves rearrangement of exchangeable units in a data set. It is not, in a strict sense, a resampling method; however, it can be used to accomplish similar tasks and is often grouped with such methods. A page dedicated to permutation can be found here. See Good (2002) for further comparison of resampling and permutation.
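A simple permutation test of a two-group mean difference illustrates the rearrangement of exchangeable units. The two groups, the number of permutations, and the use of random shuffles (rather than an exhaustive, exact enumeration) are illustrative assumptions:

```python
import random
import statistics

random.seed(3)  # for a reproducible sketch

# Hypothetical two-group comparison; the exchangeable units are the objects.
group_a = [2.1, 2.5, 2.2, 2.8, 2.4]
group_b = [3.0, 3.4, 2.9, 3.3, 3.1]
observed = statistics.mean(group_b) - statistics.mean(group_a)

pooled = group_a + group_b
n_a = len(group_a)
B = 5000  # number of random rearrangements

# Rearrange the group labels and recompute the difference each time.
count = 0
for _ in range(B):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[n_a:]) - statistics.mean(pooled[:n_a])
    if abs(diff) >= abs(observed):
        count += 1

# Proportion of rearrangements at least as extreme as the observed value;
# the observed arrangement itself is included in the count.
p_value = (count + 1) / (B + 1)
```

A small `p_value` indicates the observed difference is rarely matched when group labels are exchangeable, i.e. under the null hypothesis of no group effect.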

One use of resampling involves comparing the distribution of a statistic derived from resampled data sets (s'1..n) to its value when derived from a sample (or set of samples) of interest (s1, s2). If statistics derived from samples of interest have markedly different values from those derived from the bulk of the resampled data (s1), they may be unduly influenced by one or more objects or variables in your data set. If such values fall within the bulk of values derived from resampled data (s2), they may be said to be 'typical' or 'expected' values of that data set. The spread of the distribution indicates how stable the statistic is.
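This comparison can be sketched by locating a statistic of interest within the bulk of a resampled (here, bootstrap) distribution. The data, the two statistics of interest, and the 2.5th/97.5th percentile definition of the 'bulk' are illustrative assumptions:

```python
import random
import statistics

random.seed(4)  # for a reproducible sketch

data = [2.1, 2.5, 2.2, 2.8, 2.4, 3.1, 2.6, 2.9]  # hypothetical observations
n = len(data)

# Resampled distribution of the mean (s'1..n), built by bootstrapping.
boot = sorted(
    statistics.mean([random.choice(data) for _ in range(n)])
    for _ in range(999)
)
lo, hi = boot[int(0.025 * 999)], boot[int(0.975 * 999)]

def classify(s):
    """'typical' if s falls within the bulk of the resampled values."""
    return "typical" if lo <= s <= hi else "unexpected"

s1, s2 = 4.0, 2.6  # two hypothetical statistics of interest
```

Here `classify(s1)` flags 4.0 as "unexpected" relative to the resampled distribution, while `classify(s2)` places 2.6 among the "typical" values.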

  • R
    • the "boot" package supports both parametric and non-parametric bootstrapping. The statistic of interest can be customised and weighted bootstrapping and strata can be specified.
    • the "multtest" package from Bioconductor may be used for resampling-based multiple hypothesis testing, initially designed for microarray data.