Vegetation clustering methods
vegclust.Rd
Performs hard or fuzzy clustering of vegetation data
Usage
vegclust(x, mobileCenters, fixedCenters = NULL, method="NC", m = 2, dnoise = NULL,
eta = NULL, alpha=0.001, iter.max=100, nstart=1, maxminJ = 10, seeds=NULL,
verbose=FALSE)
vegclustdist(x, mobileMemb, fixedDistToCenters = NULL, method="NC", m = 2, dnoise = NULL,
eta = NULL, alpha=0.001, iter.max=100, nstart=1, seeds=NULL, verbose=FALSE)
Arguments
- x
Community data. A site-by-species matrix or data frame (for vegclust) or a site-by-site dissimilarity matrix or dist object (for vegclustdist).
- mobileCenters
A number, a vector of seeds, or coordinates for mobile clusters.
- fixedCenters
A matrix or data frame with coordinates for fixed (non-mobile) clusters.
- mobileMemb
A number, a vector of seeds, or starting memberships for mobile clusters.
- fixedDistToCenters
A matrix or data frame with the distances to fixed cluster centers.
- method
A clustering model. Currently accepted models are:
"KM": K-means or hard c-means (MacQueen 1967)
"KMdd": Hard c-medoids (Krishnapuram et al. 1999)
"FCM": Fuzzy c-means (Bezdek 1981)
"FCMdd": Fuzzy c-medoids (Krishnapuram et al. 1999)
"NC": Noise clustering (Davé and Krishnapuram 1997)
"NCdd": Noise clustering with medoids
"HNC": Hard noise clustering
"HNCdd": Hard noise clustering with medoids
"PCM": Possibilistic c-means (Krishnapuram and Keller 1993)
"PCMdd": Possibilistic c-medoids
- m
The fuzziness exponent to be used (relevant for all models except K-means).
- dnoise
The distance to the noise cluster, relevant for noise clustering (NC).
- eta
A vector of reference distances, relevant for possibilistic c-means (PCM).
- alpha
Threshold used to stop iterations. The maximum difference in the membership matrix of the current vs. the previous iteration will be compared to this value.
- iter.max
The maximum number of iterations allowed.
- nstart
If mobileCenters or mobileMemb is a number, how many random sets should be chosen?
- maxminJ
When random starts are used, they will stop once at least maxminJ runs have ended with the same functional value.
- seeds
If mobileCenters or mobileMemb is a number, a vector indicating which objects are potential initial centers. If NULL, all objects are valid seeds.
- verbose
Flag to print extra output.
Details
Functions vegclust and vegclustdist try to generalize the kmeans function in stats in three ways.
Firstly, they allow different clustering models. Clustering models can be divided into (a) fuzzy or hard; (b) centroid-based or medoid-based; (c) partitioning (KM and FCM families), noise clustering (NC family), and possibilistic clustering (PCM and PCMdd). The reader should refer to the original publications to better understand the differences between models.
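As a rough illustration of how the fuzzy partitioning and noise clustering families differ, the sketch below computes memberships from a matrix of object-to-center distances using the textbook update rules of fuzzy c-means (Bezdek 1981) and noise clustering (Davé and Krishnapuram 1997). The helper names fcm_memb and nc_memb and the toy distances are hypothetical; this is not the package's internal code, and it assumes no distance is exactly zero.

```r
# Hypothetical helpers (not part of the package): textbook membership updates.
# d: matrix of distances, objects in rows, cluster centers in columns.
fcm_memb <- function(d, m = 2) {
  w <- (1 / d^2)^(1 / (m - 1))   # inverse squared distance, sharpened by m
  w / rowSums(w)                 # memberships sum to 1 over the clusters
}

nc_memb <- function(d, dnoise, m = 2) {
  w  <- (1 / d^2)^(1 / (m - 1))
  wn <- (1 / dnoise^2)^(1 / (m - 1))  # constant weight of the noise class
  u  <- w / (rowSums(w) + wn)
  cbind(u, N = 1 - rowSums(u))        # the remainder goes to the noise class
}

# Toy distances of three objects to two centers (made-up values)
d <- matrix(c(0.2, 0.9,
              0.8, 0.3,
              1.5, 1.6), ncol = 2, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("M1", "M2")))

u_fcm <- fcm_memb(d, m = 2)
u_nc  <- nc_memb(d, dnoise = 0.75, m = 2)
```

Object "c" is far from both centers: under the partitioning rule its membership is split between M1 and M2, whereas under the noise rule most of it falls into the noise class N.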
Secondly, users can specify fixed clusters (that is, centroids that do not change their positions during iterations). Fixed clusters are intended to be used when some clusters were previously defined and new data has been collected. One may allow some of these new data points to form new clusters, while some other points will be assigned to the original clusters. In the case of models with cluster repulsion (such as KM, FCM or NC) the new (mobile) clusters are not allowed to 'push' the fixed ones. As a result, mobile clusters will occupy new regions of the reference space.
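The idea of fixed clusters can be sketched in base R with a hard k-means loop in which only the mobile centroids are updated. The function km_fixed, its arguments and the toy data are hypothetical and only meant to convey the principle; the package's own algorithm covers many more models and options.

```r
# Hypothetical sketch (not the package's code): hard k-means in which some
# centers are fixed and only the mobile ones are updated.
km_fixed <- function(x, mobile, fixed, iter.max = 100) {
  x <- as.matrix(x)
  k_m <- nrow(mobile)
  cluster <- rep(0L, nrow(x))
  for (it in seq_len(iter.max)) {
    centers <- rbind(mobile, fixed)   # mobile centers first, then fixed ones
    # Squared Euclidean distance of every object to every center
    d2 <- sapply(seq_len(nrow(centers)), function(j)
      rowSums(sweep(x, 2, centers[j, ])^2))
    new_cluster <- max.col(-d2, ties.method = "first")  # nearest center
    if (all(new_cluster == cluster)) break              # assignments stable
    cluster <- new_cluster
    # Batch update: mobile centroids move only after all objects are assigned;
    # fixed centers are never touched
    for (j in seq_len(k_m)) {
      if (any(cluster == j))
        mobile[j, ] <- colMeans(x[cluster == j, , drop = FALSE])
    }
  }
  list(cluster = cluster, mobile = mobile, fixed = fixed)
}

# Toy data: one group around the origin, one around (5, 5)
x <- rbind(cbind(c(0, 0.1, -0.1, 0), c(0, 0.1, 0, -0.1)),
           cbind(c(5, 5.1, 4.9, 5), c(5, 5, 5.1, 4.9)))
fixed  <- matrix(c(0, 0), nrow = 1)   # previously defined cluster
mobile <- matrix(c(2, 2), nrow = 1)   # starting position of the new cluster
res <- km_fixed(x, mobile, fixed)
```

In this toy run the fixed center keeps the objects around the origin, so the mobile centroid settles on the remaining group. With the package itself, previously defined centers would be supplied through the fixedCenters argument.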
Thirdly, vegclustdist implements the distance-based equivalent of vegclust. The results of vegclust and vegclustdist will be the same (provided the seeds are equal) if the distance matrix is calculated using the Euclidean distance (see function dist). Otherwise, the equivalence holds by resorting to principal coordinates analysis.
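One way to see why a coordinate-based and a distance-based formulation can agree under the Euclidean distance is the well-known identity between the sum of squared distances to a centroid and the pairwise squared distances. The base R sketch below checks that identity on made-up data and shows that principal coordinates (cmdscale in stats) return a configuration with the same distances. It illustrates the underlying geometry only and makes no claim about how the package computes its results.

```r
set.seed(1)
x <- matrix(rnorm(18), nrow = 6)   # 6 objects, 3 variables (toy data)
d <- as.matrix(dist(x))            # Euclidean distances between objects

# Sum of squared distances to the centroid, from the raw coordinates
centroid <- colMeans(x)
ss_raw <- sum(sweep(x, 2, centroid)^2)

# The same quantity obtained from pairwise distances only
# (each pair is counted twice in the full symmetric matrix)
ss_dist <- sum(d^2) / (2 * nrow(x))

# Principal coordinates give coordinates whose Euclidean distances
# match the original distance matrix
pco <- cmdscale(dist(x), k = 3)
d_pco <- as.matrix(dist(pco))

c(ss_raw = ss_raw, ss_dist = ss_dist)
max(abs(d - d_pco))
```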
Note that all data frames or matrices used as input of vegclust should be defined on the same space of species (see conformveg). Unlike kmeans, which allows different specific algorithms, here updates of prototypes (centroids or medoids) are done after all objects have been reassigned (Forgy 1965). In order to obtain hard cluster definitions, users can apply the function defuzzify to the vegclust object.
Value
Returns an object of type vegclust with the following items:
- mode
raw for function vegclust and dist for function vegclustdist.
- method
The clustering model used
- m
The fuzziness exponent used (m=1 in case of kmeans).
- dnoise
The distance to the noise cluster used for noise clustering (NC, HNC, NCdd or HNCdd). This is set to NULL for other models.
- eta
The reference distance vector used for possibilistic clustering (PCM or PCMdd). This is set to NULL for other models.
- memb
The fuzzy membership matrix. Columns starting with "M" indicate mobile clusters, whereas columns starting with "F" indicate fixed clusters.
- mobileCenters
If vegclust is used, this contains a data frame with the coordinates of the mobile centers (centroids or medoids). If vegclustdist is used, it will contain the indices of mobile medoids for models KMdd, FCMdd, HNCdd, NCdd and PCMdd; or NULL otherwise.
- fixedCenters
If vegclust is used, this contains a data frame with the coordinates of the fixed centers (centroids or medoids). If vegclustdist is used, it will contain the indices of fixed medoids for models KMdd, FCMdd, HNCdd, NCdd and PCMdd; or NULL otherwise.
- dist2clusters
The matrix of object distances to cluster centers. Columns starting with "M" indicate mobile clusters, whereas columns starting with "F" indicate fixed clusters.
- withinss
In the case of methods KM, FCM, NC, PCM and HNC it contains the within-cluster sum of squares for each cluster (squared distances to cluster center weighted by membership). In the case of methods KMdd, FCMdd, NCdd, HNCdd and PCMdd it contains the sum of distances to each cluster (weighted by membership).
- size
The number of objects belonging to each cluster. In case of fuzzy clusters the sum of memberships is given.
- functional
The objective function value (the minimum value attained after all iterations).
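As a small illustration of how the size item relates to memb, the sketch below sums a made-up membership matrix by column, leaving out the noise column "N". The matrix values are invented, and treating size as plain column sums is an assumption based on the description above rather than on the package's code.

```r
# Made-up fuzzy membership matrix: two mobile clusters plus the noise class
memb <- matrix(c(0.90, 0.05, 0.05,
                 0.10, 0.70, 0.20,
                 0.05, 0.15, 0.80), ncol = 3, byrow = TRUE,
               dimnames = list(c("s1", "s2", "s3"), c("M1", "M2", "N")))

# Fuzzy cardinality: sum of memberships per cluster, excluding noise
size <- colSums(memb[, colnames(memb) != "N", drop = FALSE])
size
```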
References
Forgy, E. W. (1965) Cluster analysis of multivariate data: efficiency vs interpretability of classifications. Biometrics 21, 768-769.
MacQueen, J. (1967) Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, eds L. M. Le Cam and J. Neyman, 1, pp. 281-297. Berkeley, CA: University of California Press.
Davé, R. N. and R. Krishnapuram (1997) Robust clustering methods: a unified view. IEEE Transactions on Fuzzy Systems 5, 270-293.
Bezdek, J. C. (1981) Pattern recognition with fuzzy objective functions. Plenum Press, New York.
Krishnapuram, R., A. Joshi and L. Yi (1999) A fuzzy relative of the k-medoids algorithm with application to web document and snippet clustering. IEEE International Fuzzy Systems, pp. 1281-1286.
Krishnapuram, R. and J. M. Keller (1993) A possibilistic approach to clustering. IEEE Transactions on Fuzzy Systems 1, 98-110.
De Cáceres, M., X. Font and F. Oliva (2010) The management of numerical vegetation classifications with fuzzy clustering methods. Journal of Vegetation Science 21(6), 1138-1151.
Examples
## Loads the package and the data
library(vegclust)
data(wetland)
## This equals the chord transformation
## (see also 'decostand' in package vegan)
wetland.chord = as.data.frame(sweep(as.matrix(wetland), 1,
sqrt(rowSums(as.matrix(wetland)^2)), "/"))
## Create noise clustering with 3 clusters. Perform 10 starts from random seeds
## and keep the best solution
wetland.nc = vegclust(wetland.chord, mobileCenters=3, m = 1.2, dnoise=0.75,
method="NC", nstart=10)
## Fuzzy membership matrix
wetland.nc$memb
#> M1 M2 M3 N
#> 5 4.631923e-04 1.934923e-04 9.810980e-01 0.0182453582
#> 8 3.853257e-03 2.696115e-03 7.382161e-01 0.2552345091
#> 13 7.864722e-06 4.053434e-06 9.995086e-01 0.0004794780
#> 4 4.326738e-06 2.765733e-06 9.995816e-01 0.0004113170
#> 17 1.771420e-03 2.004422e-03 8.769007e-01 0.1193234439
#> 3 4.507224e-03 3.371931e-03 2.317281e-01 0.7603927883
#> 9 5.545494e-03 4.058727e-03 4.289602e-02 0.9474997554
#> 21 5.009408e-05 1.348443e-05 9.973078e-01 0.0026285984
#> 16 4.775704e-03 1.736775e-03 7.258593e-01 0.2676282154
#> 14 3.767558e-04 1.439041e-04 9.777028e-01 0.0217765177
#> 2 4.978630e-06 2.224694e-06 9.996401e-01 0.0003527178
#> 15 1.039574e-03 6.087027e-04 8.643911e-01 0.1339606695
#> 1 1.936547e-05 6.868146e-06 9.985536e-01 0.0014201274
#> 7 8.933405e-03 4.168937e-03 2.818186e-01 0.7050790391
#> 10 7.932120e-03 3.627818e-03 5.517684e-02 0.9332632237
#> 40 9.982441e-01 2.184429e-05 2.813868e-05 0.0017058790
#> 23 8.786846e-01 1.394202e-03 1.412218e-02 0.1057990526
#> 25 9.699884e-01 2.218667e-04 7.293952e-03 0.0224957781
#> 22 3.255074e-01 3.033436e-03 4.872820e-01 0.1841771737
#> 20 9.620317e-01 3.191525e-04 6.151673e-04 0.0370339485
#> 6 9.950089e-01 1.033829e-04 5.125278e-05 0.0048364332
#> 18 9.975305e-01 2.704033e-05 1.728555e-05 0.0024251398
#> 12 9.995425e-01 5.011688e-06 3.722050e-06 0.0004487221
#> 39 9.844351e-01 2.760139e-04 2.090807e-04 0.0150798520
#> 19 9.972399e-01 3.986057e-05 1.618438e-04 0.0025584371
#> 11 8.855198e-01 1.770112e-03 1.113401e-03 0.1115966737
#> 30 6.221509e-03 3.470612e-02 6.039936e-03 0.9530324381
#> 34 1.656704e-04 9.705787e-01 1.628479e-04 0.0290927778
#> 28 9.042788e-03 3.684396e-02 6.661688e-03 0.9474515647
#> 31 2.265240e-04 9.585717e-01 2.690234e-04 0.0409327962
#> 26 1.525082e-01 1.885451e-02 5.030634e-03 0.8236066965
#> 29 1.948998e-01 4.811058e-02 6.861306e-03 0.7501283180
#> 33 9.771155e-01 5.025501e-04 1.363616e-04 0.0222455946
#> 24 3.259673e-05 9.987283e-01 7.966515e-06 0.0012311549
#> 36 4.793132e-04 9.955968e-01 2.857521e-05 0.0038953389
#> 37 3.670480e-05 9.964571e-01 2.801749e-05 0.0034781754
#> 41 1.520015e-03 8.473454e-01 7.745152e-03 0.1433894721
#> 27 7.144360e-03 2.477801e-02 5.295813e-03 0.9627818187
#> 32 1.140535e-02 5.088722e-02 7.431382e-03 0.9302760570
#> 35 1.950680e-02 1.325170e-02 3.255441e-01 0.6416974428
#> 38 2.323373e-03 2.545335e-03 8.427572e-01 0.1523740657
## Cardinality of fuzzy clusters (i.e., the number of objects belonging to each cluster)
wetland.nc$size
#> M1 M2 M3
#> 11.41565 6.02761 12.49528
## Obtains hard membership vector, with 'N' for objects that are unclassified
defuzzify(wetland.nc$memb)$cluster
#> 5 8 13 4 17 3 9 21 16 14 2 15 1 7 10 40
#> "M3" "M3" "M3" "M3" "M3" "N" "N" "M3" "M3" "M3" "M3" "M3" "M3" "N" "N" "M1"
#> 23 25 22 20 6 18 12 39 19 11 30 34 28 31 26 29
#> "M1" "M1" "M3" "M1" "M1" "M1" "M1" "M1" "M1" "M1" "N" "M2" "N" "M2" "N" "N"
#> 33 24 36 37 41 27 32 35 38
#> "M1" "M2" "M2" "M2" "M2" "N" "N" "N" "M3"
## The same result is obtained with a matrix of chord distances
wetland.d = dist(wetland.chord)
wetland.d.nc = vegclustdist(wetland.d, mobileMemb=3, m = 1.2, dnoise=0.75,
method="NC", nstart=10)