Notes on data structure

Most of the analyses presented in this guide assume that your data are organised in the structures described below. Throughout this guide, the words "objects", "sites", and "samples" often refer to entities indexed by the rows of a data table, while the words "variables", "species", and "parameters" refer to entities indexed by its columns.

Perhaps the most common form of ecological data set is the "sites x species" table (Figure 1, a). Here, each row represents a sampling site or other geospatial location and each column represents a species. Each cell contains either the abundance (an integer greater than or equal to zero) or the presence/absence ("1" for present, "0" for absent) of a given species at a given site. The "sites x species" table is assumed to be dimensionally homogeneous, i.e. all variables should have the same units. This table can be accompanied by a "sites x environmental variables" table (Figure 1, b). The rows of this table correspond to the sites in the "sites x species" table, while the columns represent the different environmental parameters measured. Each cell contains the value of a given parameter measured at a given site. The values in the "sites x environmental variables" table do not need to be dimensionally homogeneous; however, many methods require that they be standardised prior to analysis.
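As a rough illustration of the two tables described above, the following Python sketch builds a small "sites x species" table and a matching "sites x environmental variables" table, derives a presence/absence version of the former, and standardises the latter. All site names, species names, and values are invented for illustration, and z-scoring (centre each column on its mean, divide by its sample standard deviation) is assumed here as one common form of standardisation; a given method may call for a different one.

```python
from statistics import mean, stdev

# Hypothetical labels and values, made up purely for illustration.
sites = ["site_1", "site_2", "site_3"]
species = ["sp_A", "sp_B", "sp_C"]

# "sites x species" table: one row per site, one column per species.
# Every cell is a count of individuals, so all columns share the same units.
abundance = [
    [12, 0, 3],
    [0, 5, 7],
    [4, 4, 0],
]

# "sites x environmental variables" table; rows follow the same site order.
# The columns have different units, so they are not dimensionally homogeneous.
env_vars = ["temperature_C", "salinity_PSU"]
environment = [
    [10.0, 30.0],
    [14.0, 34.0],
    [18.0, 35.0],
]


def to_presence_absence(table):
    """Recode abundances as 1 (present) or 0 (absent)."""
    return [[1 if cell > 0 else 0 for cell in row] for row in table]


def standardise_columns(table):
    """Z-score each column. Assumes every column has non-zero variance."""
    scaled_columns = []
    for column in zip(*table):
        m, s = mean(column), stdev(column)
        scaled_columns.append([(value - m) / s for value in column])
    # Transpose back so that rows are sites again.
    return [list(row) for row in zip(*scaled_columns)]


presence = to_presence_absence(abundance)
env_z = standardise_columns(environment)

for site, row in zip(sites, presence):
    print(site, row)
```

After standardisation each environmental column has a mean of zero and a sample standard deviation of one, so variables measured in different units become comparable.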

This data structure may be generalised to any "object" (row) and "variable" (column) combination (e.g. "sample x OTU"); however, you must be sure that your data meet the assumptions of the method you choose. Sample (object) independence and dimensional homogeneity are common requirements.
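Some structural requirements of a generalised "objects x variables" table can be checked mechanically before an analysis is run. The sketch below uses a hypothetical helper, `check_object_tables`, which is not part of any library, to verify that a "sample x OTU" count table has unique row labels, the same row order as its companion table, a rectangular shape, and non-negative integer cells. Assumptions such as sample independence depend on the sampling design and cannot be verified this way.

```python
def check_object_tables(row_labels, counts, other_row_labels):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if len(set(row_labels)) != len(row_labels):
        problems.append("duplicate object labels")
    if list(row_labels) != list(other_row_labels):
        problems.append("row labels differ between the two tables")
    if len(counts) != len(row_labels):
        problems.append("number of rows does not match number of labels")
    if len({len(row) for row in counts}) > 1:
        problems.append("rows have different numbers of columns")
    for row in counts:
        for cell in row:
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(cell, bool) or not isinstance(cell, int) or cell < 0:
                problems.append("cells must be non-negative integers")
                return problems
    return problems


# Hypothetical "sample x OTU" counts, invented for illustration.
samples = ["sample_1", "sample_2"]
otu_counts = [[120, 0, 8], [3, 44, 0]]

print(check_object_tables(samples, otu_counts, ["sample_1", "sample_2"]))
# prints []
print(check_object_tables(samples, otu_counts, ["sample_2", "sample_1"]))
# prints ['row labels differ between the two tables']
```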

For an in-depth discussion of ecological data representation and its connection to ecological theory, see the opening chapters of Legendre and Legendre (1998).



Figure 1: Common data tables used in ecological analysis. a) A "sites x species abundance" table and b) a "sites x environmental parameter" table

  • Legendre P, Legendre L. Numerical Ecology. 2nd ed. Amsterdam: Elsevier, 1998. ISBN 978-0444892508.