Wiley
prozac.txt (29.11 kB)

Dataset for: Some Remarks on the R2 for Clustering

dataset
posted on 2018-05-14, 12:12 authored by Nicola Loperfido, Thaddeus Tarpey
A common descriptive statistic in cluster analysis is the $R^2$, which measures the overall proportion of variance explained by the cluster means. This note highlights properties of the $R^2$ for clustering. In particular, we show that the $R^2$ can generally be artificially inflated by linearly transforming the data, by "stretching" and by projecting. The $R^2$ for clustering will also often be a poor measure of clustering quality in high-dimensional settings. We further investigate the $R^2$ for clustering under misspecified models. Several simulation illustrations highlight weaknesses of the clustering $R^2$, especially in high-dimensional settings. A functional data example shows how the $R^2$ for clustering can vary dramatically depending on how the curves are estimated.
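To make the statistic described in the abstract concrete, the following is a minimal sketch, not the authors' code. It assumes the usual definition $R^2 = 1 - \mathrm{SS}_{\text{within}} / \mathrm{SS}_{\text{total}}$. The function name `clustering_r2`, the simulated two-group data, and the stretch factor of 10 are all illustrative choices and do not come from the dataset. Cluster labels are held fixed rather than re-estimated after each transformation, which is a simplification.

```python
import numpy as np


def clustering_r2(X, labels):
    """Proportion of total variance explained by the cluster means.

    Computed as 1 - (within-cluster sum of squares) / (total sum of squares).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total_ss = ((X - X.mean(axis=0)) ** 2).sum()
    within_ss = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        within_ss += ((members - members.mean(axis=0)) ** 2).sum()
    return 1.0 - within_ss / total_ss


# Hypothetical data: two groups that differ only along the first coordinate.
rng = np.random.default_rng(0)
n = 200
labels = np.repeat([0, 1], n)
X = rng.normal(size=(2 * n, 2))
X[labels == 1, 0] += 3.0

r2_raw = clustering_r2(X, labels)

# "Stretching": rescale the coordinate that separates the groups.
X_stretched = X * np.array([10.0, 1.0])
r2_stretched = clustering_r2(X_stretched, labels)

# Projecting: keep only the coordinate that separates the groups.
X_projected = X[:, :1]
r2_projected = clustering_r2(X_projected, labels)

print(f"raw: {r2_raw:.3f}  stretched: {r2_stretched:.3f}  projected: {r2_projected:.3f}")
```

In this toy setting the grouping is identical in all three cases, yet the $R^2$ rises after stretching and rises again after projecting, because both transformations down-weight the coordinate that carries only within-cluster noise.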

History

collectionID

4065347

    Statistical Analysis and Data Mining