Predicts site group from indicators

Function predict.indicators takes an object of class indicators and determines the probability of the indicated site group given a community data set. If no new data set is provided, the function calculates the probabilities for the original sites used to build the indicators object.

Usage

# S3 method for class 'indicators'
predict(object, newdata = NULL, cv = FALSE, ...)

Arguments

object: An object of class 'indicators'.
newdata: A community data table (with sites in rows and species in columns) for which predictions are needed. This table can contain either presence-absence or abundance data, but only presence-absence information is used for the prediction. If NULL, then the original data set used to derive the indicators object is used as data.
cv: A boolean flag to indicate that probabilities should be calculated using leave-one-out cross-validation (i.e., recalculating the positive predictive value of indicators after excluding the target site).
...: Additional arguments, not used (included for compatibility with the generic predict).

Value

If confidence intervals are available in object, function predict.indicators returns a matrix with communities in rows and three columns, corresponding to the probability of the indicated site group along with its confidence interval. If confidence intervals are not available in object, or if cv = TRUE, then predict.indicators returns a vector with the probability of the indicated site group for each community.
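Because the return value can be either a matrix or a vector, downstream code may want to normalise it. The following is a minimal sketch of one way to do that. The helper name point_prob is hypothetical (it is not part of the package), and it assumes that the first column of the matrix holds the probability and the other two hold the confidence interval; check the column names of your own result before relying on this.

```r
# Hypothetical helper, not part of the package.
# Assumption: when a matrix is returned, its first column is the probability
# and the remaining two columns are the confidence interval.
point_prob <- function(p) {
  if (is.matrix(p)) {
    # Keep the site names attached to the extracted column
    setNames(p[, 1], rownames(p))
  } else {
    p
  }
}

# Toy input mimicking the matrix form (values are made up for illustration)
m <- matrix(c(0.8, 0.5,   # probability
              0.6, 0.3,   # one confidence bound
              0.9, 0.7),  # the other confidence bound
            nrow = 2, dimnames = list(c("s1", "s2"), NULL))
point_prob(m)
```

With an indicators object named sc, the same helper could be applied as point_prob(predict(sc)).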

Details

Function indicators explores the indicator value of the simultaneous occurrence of sets of species (i.e. species combinations). The method is described in De Cáceres et al. (2012) and is a generalization of the Indicator Value method of Dufrêne & Legendre (1997). Function predict.indicators is used to predict the indicated site group from the composition of a new set of observations.

For communities where one or more of the indicator species combinations are found, the function returns the probability associated with the indicator that has the highest positive predictive value (if confidence intervals are available, the maximum is taken across the lower bounds of the confidence intervals). For communities where none of the indicator species combinations is found, the function returns zero.

If newdata = NULL and cv = TRUE, the function can be used to evaluate the predictive power of a set of indicators in a cross-validated fashion. For each site in the data set, the function recalculates the positive predictive value of the indicators after excluding the information of that site, and then evaluates the probability of the site group for it.
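The prediction rule described above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's implementation: predict_sketch, comb and ppv are hypothetical names, the positive predictive values are supplied by the caller rather than estimated, and the confidence-interval variant (maximum across lower bounds) is not covered.

```r
# Hypothetical helper, not part of the package: illustrates the rule only.
#   comb    : 0/1 matrix, one row per species combination, one column per species
#   ppv     : positive predictive value of each combination (given, not estimated)
#   newdata : community table with sites in rows and species in columns
predict_sketch <- function(comb, ppv, newdata) {
  # Only presence-absence information is used
  pa <- (as.matrix(newdata) > 0) * 1
  pa <- pa[, colnames(comb), drop = FALSE]
  # A combination is found at a site when all of its species are present there
  n_needed <- matrix(rowSums(comb), nrow(pa), nrow(comb), byrow = TRUE)
  found <- (pa %*% t(comb)) == n_needed
  # Highest value among the combinations found, or zero when none is found
  apply(found, 1, function(f) if (any(f)) max(ppv[f]) else 0)
}

# Toy data (made up): two combinations over three species
comb <- matrix(c(1, 1, 0,
                 0, 0, 1), nrow = 2, byrow = TRUE,
               dimnames = list(c("spA+spB", "spC"), c("spA", "spB", "spC")))
ppv <- c(0.9, 0.6)
sites <- data.frame(spA = c(3, 1, 0), spB = c(2, 0, 0), spC = c(5, 1, 0),
                    row.names = c("s1", "s2", "s3"))
res <- predict_sketch(comb, ppv, sites)
res
```

In this toy case site s1 contains both combinations and gets the larger value, s2 contains only the second combination, and s3 contains none and gets zero.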

References

De Cáceres, M., Legendre, P., Wiser, S.K. and Brotons, L. 2012. Using species combinations in indicator analyses. Methods in Ecology and Evolution 3(6): 973-982.

Dufrêne, M. and Legendre, P. 1997. Species assemblages and indicator species: the need for a flexible asymmetrical approach. Ecological Monographs 67: 345-366.

Author

Miquel De Cáceres Ainsa, EMF-CREAF

Examples

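library(indicspecies) ## Provides 'wetland', indicators() and this predict method
library(stats)        ## Provides kmeans()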

data(wetland) ## Loads species data

## Creates three clusters using kmeans
wetkm <- kmeans(wetland, centers=3) 


## Run indicator analysis with species combinations for the first group
sc <- indicators(X=wetland, cluster=wetkm$cluster, group=1, verbose=TRUE, At=0.5, Bt=0.2)
#> Target site group: 1
#> Number of candidate species: 33
#> Number of sites: 41 
#> Size of the site group: 14 
#> Starting species  1 ... accepted combinations: 0 
#> Starting species  2 ... accepted combinations: 0 
#> Starting species  3 ... accepted combinations: 16 
#> Starting species  4 ... accepted combinations: 32 
#> Starting species  5 ... accepted combinations: 32 
#> Starting species  6 ... accepted combinations: 79 
#> Starting species  7 ... accepted combinations: 79 
#> Starting species  8 ... accepted combinations: 82 
#> Starting species  9 ... accepted combinations: 88 
#> Starting species  10 ... accepted combinations: 88 
#> Starting species  11 ... accepted combinations: 88 
#> Starting species  12 ... accepted combinations: 92 
#> Starting species  13 ... accepted combinations: 92 
#> Starting species  14 ... accepted combinations: 92 
#> Starting species  15 ... accepted combinations: 112 
#> Starting species  16 ... accepted combinations: 115 
#> Starting species  17 ... accepted combinations: 115 
#> Starting species  18 ... accepted combinations: 115 
#> Starting species  19 ... accepted combinations: 115 
#> Starting species  20 ... accepted combinations: 120 
#> Starting species  21 ... accepted combinations: 136 
#> Starting species  22 ... accepted combinations: 139 
#> Starting species  23 ... accepted combinations: 143 
#> Starting species  24 ... accepted combinations: 144 
#> Starting species  25 ... accepted combinations: 144 
#> Starting species  26 ... accepted combinations: 144 
#> Starting species  27 ... accepted combinations: 144 
#> Starting species  28 ... accepted combinations: 144 
#> Starting species  29 ... accepted combinations: 144 
#> Starting species  30 ... accepted combinations: 144 
#> Starting species  31 ... accepted combinations: 144 
#> Starting species  32 ... accepted combinations: 144 
#> Starting species  33 ... accepted combinations: 144 
#> Number of valid combinations: 144
#> Number of remaining species: 15 
#> Calculating statistical significance (permutational test)...

## Use the indicators to make predictions of the probability of group #1
## Normally an independent data set should be used, because 'wetland' was used to derive
## indicators. The same would be obtained calling 'predict(sc)' without further arguments.
p <- predict(sc, wetland)

## Calculate cross-validated probabilities (recalculates 'A' statistics once for each site 
## after excluding it, and then calls predict.indicators for that site)
pcv <- predict(sc, cv = TRUE)

## Show original membership to group 1 along with (resubstitution) predicted probabilities  
## and cross-validated probabilities. Cross-validated probabilities can be lower for sites
## originally belonging to the target site group and higher for other sites.
data.frame(Group1 = as.numeric(wetkm$cluster==1), Prob = p, Prob_CV = pcv)
#>    Group1      Prob   Prob_CV
#> 5       0 0.7500000 1.0000000
#> 8       0 0.8000000 1.0000000
#> 13      0 0.6250000 0.7142857
#> 4       0 0.0000000 0.0000000
#> 17      0 0.0000000 0.0000000
#> 3       0 0.0000000 0.0000000
#> 9       0 0.5714286 0.6666667
#> 21      0 0.6428571 0.6923077
#> 16      0 0.6666667 0.8571429
#> 14      0 0.8571429 1.0000000
#> 2       0 0.7777778 0.8750000
#> 15      0 0.0000000 0.0000000
#> 1       0 0.0000000 0.0000000
#> 7       0 0.8333333 1.0000000
#> 10      0 0.6428571 0.6923077
#> 40      1 1.0000000 1.0000000
#> 23      1 1.0000000 1.0000000
#> 25      1 1.0000000 1.0000000
#> 22      1 1.0000000 1.0000000
#> 20      1 1.0000000 1.0000000
#> 6       1 1.0000000 1.0000000
#> 18      1 1.0000000 1.0000000
#> 12      1 1.0000000 1.0000000
#> 39      1 1.0000000 1.0000000
#> 19      1 1.0000000 1.0000000
#> 11      1 1.0000000 1.0000000
#> 30      0 0.5000000 0.6000000
#> 34      0 0.5000000 0.6000000
#> 28      0 0.0000000 0.0000000
#> 31      0 0.5000000 0.6000000
#> 26      1 1.0000000 1.0000000
#> 29      1 1.0000000 1.0000000
#> 33      1 1.0000000 1.0000000
#> 24      0 0.9473684 0.9642857
#> 36      0 0.9473684 1.0000000
#> 37      0 0.0000000 0.0000000
#> 41      0 0.5000000 0.6666667
#> 27      0 0.0000000 0.0000000
#> 32      0 0.0000000 0.0000000
#> 35      0 0.8000000 1.0000000
#> 38      0 0.5000000 0.6000000