% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/GENERICS-cvi.R, R/S4-TSClusters-methods.R
\docType{methods}
\name{cvi}
\alias{cvi}
\alias{cvi,matrix-method}
\alias{cvi,PartitionalTSClusters-method}
\alias{cvi,PartitionalTSClusters}
\alias{cvi,HierarchicalTSClusters-method}
\alias{cvi,HierarchicalTSClusters}
\alias{cvi,FuzzyTSClusters-method}
\alias{cvi,FuzzyTSClusters}
\title{Cluster validity indices}
\usage{
cvi(a, b = NULL, type = "valid", ..., log.base = 10)

\S4method{cvi}{matrix}(a, b = NULL, type = "valid", ...,
  log.base = 10)

\S4method{cvi}{PartitionalTSClusters}(a, b = NULL, type = "valid", ...,
  log.base = 10)

\S4method{cvi}{HierarchicalTSClusters}(a, b = NULL, type = "valid",
  ..., log.base = 10)

\S4method{cvi}{FuzzyTSClusters}(a, b = NULL, type = "valid", ...,
  log.base = 10)
}
\arguments{
\item{a}{An object returned by \code{\link[=tsclust]{tsclust()}}; for crisp partitions, a vector that can be
coerced to integers indicating the cluster memberships; or, for soft clustering, the membership
matrix.}

\item{b}{If needed, a vector that can be coerced to integers which indicate the cluster
memberships. The ground truth (if known) should be provided here.}

\item{type}{Character vector indicating which indices are to be computed. See supported values
below.}

\item{...}{Arguments to pass to and from other methods.}

\item{log.base}{Base of the logarithm used when calculating the Variation of Information
(\code{"VI"}; see the section on external CVIs).}
}
\value{
The chosen CVIs
}
\description{
Compute different cluster validity indices (CVIs) of a given cluster partition, using the
clustering distance measure and centroid function if applicable.
}
\details{
Clustering is commonly considered to be an unsupervised procedure, so evaluating its performance
can be rather subjective. However, a great amount of effort has been invested in trying to
standardize cluster evaluation metrics by using cluster validity indices (CVIs).

CVIs can be tailored to either crisp or fuzzy partitions. They can also be classified as
internal, external or relative, depending on how they are computed. Focusing on the first two, the
crucial difference is that internal CVIs only consider the partitioned data and try to define a
measure of cluster purity, whereas external CVIs compare the obtained partition to the correct
one. Thus, external CVIs can only be used if the ground truth is known.

Note that even though a fuzzy partition can be changed into a crisp one, making it compatible
with many of the existing crisp CVIs, there are also fuzzy CVIs tailored specifically to fuzzy
clustering, and these may be more suitable in those situations. Fuzzy partitions usually have no
ground truth associated with them, but there are exceptions depending on the task's goal.

Each index defines its own range of values and whether it is to be minimized or maximized. In
many cases, these CVIs can be used to evaluate the result of a clustering algorithm regardless of
how the clustering works internally, or how the partition came to be.

Which CVI will work best cannot be determined a priori, so several should be tested for each
specific application. Usually, many CVIs are computed and compared to each other, possibly using a
majority vote to decide on a final result. Furthermore, many CVIs perform additional distance
calculations when being computed, which can be computationally expensive when using DTW or GAK.
}
\note{
In the original definitions of many internal and fuzzy CVIs, the Euclidean distance and a mean
centroid were used. \strong{The implementations here change this, making use of whatever
distance/centroid was chosen during clustering}. However, some of the CVIs assume that the
distances are symmetric, since cross-distance matrices are calculated and only the upper/lower
triangles are considered. A warning will be given if the matrices are not symmetric and the CVI
assumes that they are.

Because of the above, calculating CVIs for clusterings made with \code{\link[=TADPole]{TADPole()}} is a special case.
Since TADPole uses three distances during its execution (DTW, LB_Keogh and Euclidean), it is not
obvious which one should be used for the calculation of CVIs. Nevertheless, \code{\link[=dtw_basic]{dtw_basic()}} is used
by default.

The formula for the SF index in Saitta et al. (2007) does not correspond to the one in Arbelaitz
et al. (2013). The one specified in the former is used here.

The formulas for the Silhouette index are not entirely correct in Arbelaitz et al. (2013); refer
to Rousseeuw (1987) for the correct ones.

The formulas for the PBMF index are ambiguous in the literature; the ones given in Lin (2013) are
used here.
}
\section{External CVIs}{

\itemize{
\item Crisp partitions (the first 4 are calculated via \code{\link[flexclust:comPart]{flexclust::comPart()}})
\itemize{
\item \code{"RI"}: Rand Index (to be maximized).
\item \code{"ARI"}: Adjusted Rand Index (to be maximized).
\item \code{"J"}: Jaccard Index (to be maximized).
\item \code{"FM"}: Fowlkes-Mallows (to be maximized).
\item \code{"VI"}: Variation of Information (Meila (2003); to be minimized).
}
\item Fuzzy partitions (based on Lei et al. (2017))
\itemize{
\item \code{"RI"}: Soft Rand Index (to be maximized).
\item \code{"ARI"}: Soft Adjusted Rand Index (to be maximized).
\item \code{"VI"}: Soft Variation of Information (to be minimized).
\item \code{"NMIM"}: Soft Normalized Mutual Information based on Max entropy (to be maximized).
}
}
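
As a reminder of what \code{log.base} affects, the Variation of Information between two crisp
partitions \eqn{A} and \eqn{B} is usually written as shown next. This is the definition from
Meila (2003), given as background and not transcribed from the implementation:

\deqn{VI(A, B) = H(A) + H(B) - 2 I(A, B)}{VI(A, B) = H(A) + H(B) - 2 I(A, B)}

Here \eqn{H} denotes the entropy of a partition and \eqn{I} the mutual information between the
two partitions, both computed with logarithms in the chosen base. Under this definition, changing
the base only rescales the index, and identical partitions yield a value of 0.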
}

\section{Internal CVIs}{


The indices marked with an exclamation mark (!) calculate (or re-use if already available) the
whole distance matrix between the series in the data. If you were trying to avoid this in the
first place, then these CVIs might not be suitable for your application.

The indices marked with a question mark (?) depend on the extracted centroids, so bear that in
mind if a hierarchical procedure was used and/or the centroid function has associated
randomness (such as \code{\link[=shape_extraction]{shape_extraction()}} with series of different length).

The indices marked with a tilde (~) require the calculation of a global centroid. Since \code{\link[=DBA]{DBA()}}
and \code{\link[=shape_extraction]{shape_extraction()}} (for series of different length) have some randomness associated,
these indices might not be appropriate for those centroids.
\itemize{
\item Crisp partitions
\itemize{
\item \code{"Sil"} (!): Silhouette index (Rousseeuw (1987); to be maximized).
\item \code{"D"} (!): Dunn index (Arbelaitz et al. (2013); to be maximized).
\item \code{"COP"} (!): COP index (Arbelaitz et al. (2013); to be minimized).
\item \code{"DB"} (?): Davies-Bouldin index (Arbelaitz et al. (2013); to be minimized).
\item \code{"DBstar"} (?): Modified Davies-Bouldin index (DB*) (Kim and Ramakrishna (2005); to be
minimized).
\item \code{"CH"} (~): Calinski-Harabasz index (Arbelaitz et al. (2013); to be maximized).
\item \code{"SF"} (~): Score Function (Saitta et al. (2007); to be maximized; see notes).
}
\item Fuzzy partitions (using the nomenclature from Wang and Zhang (2007))
\itemize{
\item \code{"MPC"}: to be maximized.
\item \code{"K"} (~): to be minimized.
\item \code{"T"}: to be minimized.
\item \code{"SC"} (~): to be maximized.
\item \code{"PBMF"} (~): to be maximized (see notes).
}
}
}

\section{Additionally}{

\itemize{
\item \code{"valid"}: Returns all valid indices depending on the type of \code{a} and whether \code{b} was
provided or not.
\item \code{"internal"}: Returns all internal CVIs. Only supported for \linkS4class{TSClusters} objects.
\item \code{"external"}: Returns all external CVIs. Requires \code{b} to be provided.
}
}

\examples{

cvi(CharTrajLabels, sample(CharTrajLabels), type = c("ARI", "VI"))
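
# A hedged sketch of how the external indices behave at their extremes.
# Comparing a partition with itself should give the best possible values
# (ARI at its maximum, VI at its minimum).
ari_same <- cvi(CharTrajLabels, CharTrajLabels, type = "ARI")
vi_same <- cvi(CharTrajLabels, CharTrajLabels, type = "VI")
c(ari_same, vi_same)

# log.base only matters for VI. Assuming the base enters as a constant factor
# of the logarithms, the two values below should differ only by a rescaling.
set.seed(1L)
shuffled <- sample(CharTrajLabels)
vi_10 <- cvi(CharTrajLabels, shuffled, type = "VI", log.base = 10)
vi_2 <- cvi(CharTrajLabels, shuffled, type = "VI", log.base = 2)
c(base10 = unname(vi_10), base2 = unname(vi_2))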

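# A minimal sketch of the "majority vote" idea mentioned in the details,
# assuming a small subset of the data and dtw_basic/PAM are acceptable. The
# selection of indices and the voting rule are illustrative assumptions only.
series <- CharTraj[1L:20L]
pc_k <- tsclust(series, type = "partitional", k = 2L:4L,
                distance = "dtw_basic", centroid = "pam", seed = 8L)
names(pc_k) <- paste0("k_", 2L:4L)

to_max <- c("Sil", "D")
to_min <- c("COP", "DB", "DBstar")
# One row per index, one column per value of k
scores <- sapply(pc_k, function(cl) {
    vapply(c(to_max, to_min),
           function(idx) unname(cvi(cl, type = idx)),
           numeric(1L))
})
scores

# Each index votes for the k it considers best, respecting its direction
best <- c(apply(scores[to_max, , drop = FALSE], 1L, which.max),
          apply(scores[to_min, , drop = FALSE], 1L, which.min))
votes <- table(colnames(scores)[best])
names(votes)[which.max(votes)]

# Fuzzy partitions, assuming the series are first brought to equal length
# with reinterpolate() so that the L2 distance and fcm centroids apply.
fuzzy_series <- reinterpolate(series, new.length = max(lengths(series)))
fc <- tsclust(fuzzy_series, type = "fuzzy", k = 4L,
              distance = "L2", centroid = "fcm", seed = 8L)
cvi(fc, type = "internal")
cvi(fc, CharTrajLabels[1L:20L], type = "external")
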
}
\references{
Arbelaitz, O., Gurrutxaga, I., Muguerza, J., Perez, J. M., & Perona, I. (2013). An extensive
comparative study of cluster validity indices. Pattern Recognition, 46(1), 243-256.

Kim, M., & Ramakrishna, R. S. (2005). New indices for cluster validity assessment. Pattern
Recognition Letters, 26(15), 2353-2363.

Lei, Y., Bezdek, J. C., Chan, J., Vinh, N. X., Romano, S., & Bailey, J. (2017). Extending
information-theoretic validity indices for fuzzy clustering. IEEE Transactions on Fuzzy Systems,
25(4), 1013-1018.

Lin, H. Y. (2013). Effective Feature Selection for Multi-class Classification Models. In
Proceedings of the World Congress on Engineering (Vol. 3).

Meila, M. (2003). Comparing clusterings by the variation of information. In Learning Theory and
Kernel Machines (pp. 173-187). Springer Berlin Heidelberg.

Rousseeuw, P. J. (1987). Silhouettes: a graphical aid to the interpretation and validation of
cluster analysis. Journal of Computational and Applied Mathematics, 20, 53-65.

Saitta, S., Raphael, B., & Smith, I. F. (2007). A bounded index for cluster validity. In
International Workshop on Machine Learning and Data Mining in Pattern Recognition (pp. 174-187).
Springer Berlin Heidelberg.

Wang, W., & Zhang, Y. (2007). On fuzzy cluster validity indices. Fuzzy Sets and Systems, 158(19),
2095-2117.
}
