Principal coordinates of neighbour matrices

The main idea...

Principal coordinates of neighbour matrices (PCNM; Borcard and Legendre, 2002; Borcard et al., 2004; Dray et al., 2006), also known as Moran's Eigenvector Maps (MEM), is an approach for detecting spatial or temporal structures of varying scale in response data (henceforth, only spatial structures will be discussed). Essentially, spatial variables are used to determine the distances between sites, with a special focus on neighbouring sites. These distances are then decomposed into a new set of orthogonal (and hence linearly independent) spatial variables. These variables may then be used as explanatory variables in an appropriate constrained analysis, and those that show significant explanatory power may then be incorporated into models that account for different spatial scales of variation. PCNM can detect a wide range of spatial structures, including autocorrelation as well as "bumps" and periodic structures. The general approach is illustrated in Figure 1 and described in more detail below.

One proposed strength of PCNM is that each of the spatial variables created can be treated as 'just' another explanatory variable in popular and powerful analyses such as redundancy analysis (RDA) or canonical correspondence analysis (CCA). This is, at times, preferable to converting tables of response and non-spatial explanatory variables into (dis)similarity matrices and resorting to partial Mantel tests (see Legendre et al., 2008).

The approach...

The distances between objects are represented as a Euclidean distance matrix, calculated from spatial data (e.g. latitude and longitude values) associated with the sample locations. As the name suggests, PCNM is primarily concerned with 'neighbouring' sites. Thus, the analyst sets a threshold distance above which distances are simply considered "large". Any Euclidean distance above this value is set to four times the threshold value (for an explanation of why a factor of four is used, see Borcard and Legendre, 2002). This modified distance matrix is then subjected to principal coordinates analysis (PCoA). Due to the 'truncation' of the original distance matrix to create a neighbour matrix, a PCoA on a neighbour matrix will (typically) produce more eigenvectors than the same analysis on a standard distance matrix. All resulting eigenvectors with positive eigenvalues may be used as a new set of explanatory, spatial variables in either a multiple regression approach (for univariate response data) or a multivariate constrained analysis.
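
The truncation step can be sketched in a few lines of Python. This is a minimal illustration and not the code of any particular PCNM implementation: the function name `truncated_distance_matrix`, the five-site transect and the threshold of 1.0 are all hypothetical, and the coordinates are assumed to be planar (already projected) so that Euclidean distances are meaningful.

```python
import math

def truncated_distance_matrix(coords, threshold):
    """Return a neighbour matrix: Euclidean distances between sites,
    with every distance above `threshold` replaced by 4 * threshold."""
    n = len(coords)
    large = 4.0 * threshold
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(coords[i], coords[j])
            # Keep distances between neighbours, flag all others as "large".
            d[i][j] = d[j][i] = dist if dist <= threshold else large
    return d

# Hypothetical transect: five sites spaced one unit apart along a line.
sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
neighbour = truncated_distance_matrix(sites, threshold=1.0)
```

With this threshold, only adjacent sites keep their true distance of 1.0; every other pair is set to 4.0.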

The eigenvectors with positive eigenvalues generated by the PCoA step of this procedure provide a spectral decomposition of the spatial relationships between sample locations. That is, each eigenvector will model a different spatial scale, and the response variables' relationship to non-spatial explanatory variables (e.g. environmental parameters) may be scrutinised independently at each scale. As with any PCoA axes, these are orthogonal to one another and thus linearly independent.
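
The whole procedure, from coordinates to PCNM variables, might look like the following NumPy sketch. It assumes planar coordinates and uses a classical PCoA (Gower double-centring followed by an eigendecomposition). The function name `pcnm_variables`, the relative tolerance used to decide which eigenvalues count as positive, and the scaling of each eigenvector by the square root of its eigenvalue are choices made for this illustration; published implementations may scale or filter differently.

```python
import numpy as np

def pcnm_variables(coords, threshold, tol=1e-8):
    """Sketch of PCNM: truncate distances, run a PCoA, keep positive axes."""
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between sites.
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Neighbour matrix: distances above the threshold become 4 * threshold.
    d[d > threshold] = 4.0 * threshold
    # Gower double-centring of -0.5 * d^2 (the classical PCoA step).
    n = d.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    g = centring @ (-0.5 * d ** 2) @ centring
    # Eigendecomposition of the symmetric centred matrix, largest first.
    eigvals, eigvecs = np.linalg.eigh(g)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep only axes with clearly positive eigenvalues (assumed tolerance).
    keep = eigvals > tol * eigvals.max()
    return eigvals[keep], eigvecs[:, keep] * np.sqrt(eigvals[keep])

# Hypothetical regular transect of ten sites, one unit apart.
transect = [(float(i), 0.0) for i in range(10)]
eigenvalues, pcnm_vars = pcnm_variables(transect, threshold=1.0)
```

Each column of `pcnm_vars` is one candidate spatial variable. The columns are mutually orthogonal and centred on zero, which is what allows them to be used side by side as explanatory variables.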

Figure 1: Illustration of the principal coordinates of neighbour matrices approach. The procedure is described in the main text. The threshold selected in the figure is arbitrary and for illustration only. Note that this approach can also be used for temporal data. For more on positive and negative eigenvalues, see the principal coordinates analysis (PCoA) page.

Results and interpretation

Recall that PCNM may also be used to detect temporal structures; however, the text below refers only to spatial structure.

 The neighbour matrix / threshold value

Depending on the implementation, the neighbour matrix generated by PCNM (described above) or the threshold value will be returned. Noting the threshold value is essential for reproducibility.

 Positive eigenvectors (PCNM variables)

Eigenvectors generated by the PCoA step of the PCNM procedure described here and associated with positive eigenvalues represent independent variables that capture spatial structure of a defined scale. These are extracted from the neighbour matrix.

 PCNM base functions

For each PCoA eigenvector with a positive eigenvalue, the variation of the principal coordinate scores across sampling locations can often be described by a base function. This is especially true when sampling has been conducted at regular spatial intervals. Combining these base functions will approximate the overall spatial variation in the response data.

 Significant PCNM variables

Following a constrained analysis, those PCNM variables that have been found to significantly explain variation in your response data are candidates for inclusion in a spatial model or submodel.

 Spatial submodels 

Significant PCNM variables, or groups of such variables that describe structures on similar scales, may be used to build submodels from the overall PCNM solution. Submodels may include other explanatory variables and allow insight into differential responses at different spatial scales. Submodels are not returned automatically and must be built by the user.

Spatial variables generated by PCNM and found to significantly explain variation in your response data can be incorporated into a more comprehensive model which includes other explanatory matrices (e.g. environmental factors) using, for example, variation partitioning (VP). Because different PCNM variables describe different spatial scales, the interpretation of these models will depend on which PCNM variables they include.
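
As a rough illustration of variation partitioning for a single response variable, the sketch below fits three ordinary least-squares models (environment only, space only, and both) and derives the four fractions from their R² values. Everything here is an assumption made for the example: the function names `r_squared` and `partition`, the simulated data, and the use of unadjusted R² for simplicity (analyses in practice commonly use adjusted R², and multivariate responses would call for RDA rather than multiple regression).

```python
import numpy as np

def r_squared(y, x):
    """R^2 of a least-squares fit of y on the columns of x, with intercept."""
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    total = y - y.mean()
    return 1.0 - (resid @ resid) / (total @ total)

def partition(y, env, space):
    """Split the variation in y into environmental, shared, spatial and
    residual fractions using unadjusted R^2 values."""
    r_env = r_squared(y, env)
    r_space = r_squared(y, space)
    r_both = r_squared(y, np.column_stack([env, space]))
    return {
        "env_only": r_both - r_space,
        "shared": r_env + r_space - r_both,
        "space_only": r_both - r_env,
        "residual": 1.0 - r_both,
    }

# Simulated, purely illustrative data for 30 sites.
rng = np.random.default_rng(0)
n = 30
space = rng.normal(size=(n, 2))                        # stand-ins for two significant PCNM variables
env = 0.5 * space[:, [0]] + rng.normal(size=(n, 1))    # spatially structured environment
y = env[:, 0] + space[:, 1] + rng.normal(scale=0.5, size=n)
fractions = partition(y, env, space)
```

The four fractions sum to one by construction. The "shared" fraction is the part of the variation that cannot be attributed uniquely to either the environmental or the spatial variables.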

Key assumptions
  • Euclidean distances should be representative of the distances between your sampling locations.
  • The threshold value selected to create the neighbour matrix is large enough that every site is connected to at least one other site by a Euclidean distance no greater than the threshold.
  • The constrained analysis technique used to determine significant PCNM variables must be appropriate to the response data.
  • The spatial granularity of your sampling design and strategy should be aligned to the goals of a particular spatially-motivated analysis. If they are not, it is unlikely there will be a reliable estimate of spatial structuring.
  • If the threshold value is larger than the spatial structures affecting the response data, PCNM may not detect these structures. See Borcard and Legendre (2002) for possible approaches to deal with high thresholds caused by a few widely separated objects (e.g. arising from irregular sampling).
  • The threshold value chosen should not be less than the minimum distance required to keep every site connected to at least one other site. Many implementations include a routine, such as a minimum spanning tree calculation, to generate a default threshold value that satisfies this condition. If the threshold is smaller, separate groups of eigenfunctions will be created, one for each connected sub-group of objects. Each group of eigenfunctions will only describe spatial variation within its corresponding sub-group.
  • After fitting PCNM variables to your response data, the residual variation should have no spatial trend.
  • If response data have been collected at irregular intervals, the PCNM functions may have 'unusual' properties and be difficult to interpret.
  • As always, ensure you fully understand the steps of any PCNM implementation you use. Some will deviate from the procedure described here.
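
The minimum spanning tree approach to choosing a default threshold, mentioned in the assumptions above, can be sketched as follows: the longest edge of the minimum spanning tree is the smallest threshold that keeps all sites in a single connected network. This is an illustrative implementation using Prim's algorithm on planar coordinates; the function name `mst_threshold` and the example sites are hypothetical.

```python
import math

def mst_threshold(coords):
    """Longest edge of the minimum spanning tree over the sites, i.e. the
    smallest threshold that still links all sites into one network."""
    n = len(coords)
    if n < 2:
        raise ValueError("at least two sites are required")
    in_tree = [False] * n
    # best[v]: shortest distance from site v to any site already in the tree.
    best = [math.inf] * n
    best[0] = 0.0
    longest = 0.0
    for _ in range(n):
        # Prim's algorithm: add the site closest to the current tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        longest = max(longest, best[u])
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(coords[u], coords[v]))
    return longest

# Hypothetical irregular transect: one site lies far from the others.
irregular_sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
threshold = mst_threshold(irregular_sites)
```

Here the single distant site forces the threshold up to 3.0, even though the remaining sites are only 1.0 apart. This is the situation in which a few widely separated objects inflate the threshold and may hide finer spatial structures.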

Implementations
  • R
    • The pcnm() function in the vegan package returns the eigenvectors and eigenvalues of a PCoA performed on a neighbour matrix. A weights argument is available if canonical correspondence analysis (CCA) is used as the constrained analysis technique.
    • The spacemakeR package, hosted in the R-Forge repository.
    • The PCNM package, hosted in the R-Forge repository.