The main idea...
Resampling techniques are a powerful means to assess the precision, significance, or validity of data, statistics, estimates, and models. Rather than referring to a theoretical distribution (e.g. the F-distribution or the normal distribution), these methods repeatedly sample (resample) the original data to build new distributions against which an analysis outcome can be tested. Each sample (which may be a subset of the original data set or a rearrangement thereof) is subjected to a user-selected treatment (e.g. a hypothesis test or a model fit). Collectively, the results of these treatments reveal properties of the data set which can guide further analysis or help in the evaluation of a given statistic.
Below, a number of popular resampling approaches are briefly described. An extensive literature on resampling methods for ecology is available (see Crowley, 1992; Adams et al., 1997).
Jackknifing
The jackknife procedure (Quenouille, 1949; Tukey, 1958) is useful for estimating the bias and variance of a given statistic and can be used to suggest a bias-corrected value of that statistic. The original data set is resampled, leaving one (first-order jackknifing) or a few objects (n-order jackknifing) out in each sample. A jackknifed statistic is then the original statistic (computed on the whole data set) corrected by a bias term. The bias term is primarily determined by how much the statistics derived from the samples differ, on average, from the original statistic. Sudden changes in the sample-derived statistics or a high variance among them may indicate that some values are outliers or otherwise unduly influence the statistic.
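The first-order jackknife described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `jackknife`, the dictionary keys, and the example data are all assumptions made for this sketch, and it uses the standard leave-one-out formulas for the bias and variance estimates.

```python
import statistics


def jackknife(data, statistic):
    """First-order (leave-one-out) jackknife for a list of observations.

    Returns the original statistic, the estimated bias, the bias-corrected
    (jackknifed) statistic, and the jackknife variance estimate.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two objects to leave one out")

    original = statistic(data)

    # Recompute the statistic n times, leaving out one object each time.
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = sum(leave_one_out) / n

    # Bias term: how far the leave-one-out statistics sit, on average,
    # from the original statistic, scaled by (n - 1).
    bias = (n - 1) * (loo_mean - original)
    corrected = original - bias

    # Jackknife estimate of the variance of the statistic.
    variance = (n - 1) / n * sum((t - loo_mean) ** 2 for t in leave_one_out)

    return {"original": original, "bias": bias,
            "corrected": corrected, "variance": variance}


# Hypothetical example data; pvariance divides by n and is therefore biased.
observations = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
result = jackknife(observations, statistics.pvariance)
print(result)
```

With the biased (divide-by-n) variance as the statistic, the corrected value should coincide with the usual sample variance that divides by n - 1, which makes this a convenient sanity check. Inspecting the individual leave-one-out values can also help to spot objects whose removal changes the statistic abruptly.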
Bootstrapping
Cross-validation
In its simplest form, cross-validation (Kurtz, 1948) is a straightforward approach wherein a data set is split into two or more comparable subsets and the value of an estimate is compared across these subsets. Confidence in the estimate can then be gauged by how much its value varies between subsets. Values that are consistent across subsets are generally more trustworthy than those that vary greatly. A cross-validation coefficient can be reported to reflect the agreement between estimate values across subsets. Multicross-validation procedures, in which each subset is used to validate every other subset, can also be used. If the data set is small (i.e. only a small number of objects are present), cross-validation approaches are unlikely to be reliable, as estimate values may vary greatly by chance alone.
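As a rough illustration of the simple form described above, the following Python sketch randomly splits a data set into k subsets, computes an estimate in each, and reports how widely the estimates are spread. The function names, the use of the range (maximum minus minimum) as a measure of agreement, and the example values are assumptions made for this sketch; it does not compute any particular published cross-validation coefficient.

```python
import random
import statistics


def split_into_subsets(data, k=2, seed=0):
    """Randomly assign the objects to k subsets of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    # Deal the shuffled objects out in turn, like cards, to k subsets.
    return [shuffled[i::k] for i in range(k)]


def estimate_across_subsets(data, estimator, k=2, seed=0):
    """Compute the estimate in every subset and the range of those values."""
    subsets = split_into_subsets(data, k, seed)
    estimates = [estimator(subset) for subset in subsets]
    spread = max(estimates) - min(estimates)
    return estimates, spread


# Hypothetical measurements, for illustration only.
values = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9, 5.8, 4.4, 5.0, 5.2]
estimates, spread = estimate_across_subsets(values, statistics.mean, k=3)
print(estimates, spread)
```

A small spread relative to the size of the estimate suggests the value is stable across subsets. Rerunning with different `seed` values on a small data set shows how much the estimates can vary by chance alone.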
Permutation
Permutation, related to randomised exact tests, involves the rearrangement of exchangeable units in a data set. It is not, in a strict sense, a resampling method; however, it can be used to accomplish similar tasks and is often grouped with such methods. A separate page is dedicated to permutation. See Good (2002) for further comparison of resampling and permutation.
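To make the idea of rearranging exchangeable units concrete, the sketch below runs a simple two-group permutation test on the absolute difference in means. It is one possible illustration under stated assumptions: the function name, the choice of test statistic, the number of permutations, and the example data are all hypothetical, and the group labels are assumed to be exchangeable under the null hypothesis.

```python
import random


def permutation_test(group_a, group_b, n_permutations=999, seed=0):
    """Two-group permutation test on the absolute difference in means."""
    rng = random.Random(seed)

    def abs_mean_diff(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = abs_mean_diff(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    # Count random rearrangements at least as extreme as the observed split.
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if abs_mean_diff(pooled[:n_a], pooled[n_a:]) >= observed:
            as_extreme += 1

    # The observed arrangement is counted as one of the permutations.
    p_value = (as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical measurements from two groups, for illustration only.
site_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
site_b = [11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
observed, p_value = permutation_test(site_a, site_b)
print(observed, p_value)
```

Because the rearrangements are drawn at random rather than enumerated exhaustively, the p-value is an approximation whose resolution is limited by `n_permutations`.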
Implementations
References