Universitat Politècnica de Catalunya. Departament de Ciències de la Computació
Universitat Politècnica de Catalunya. SOCO - Soft Computing
2015
After decades of intensive use, K-Means is still a common choice for crisp data clustering in real-world applications, particularly in biomedicine and bioinformatics. It is well known that different initializations of the algorithm can lead to different solutions, which hinders replicability. It has also been reported that even solutions with very similar errors may differ widely. A criterion for choosing among clustering solutions according to a combination of error and stability measures has recently been suggested. It is based on Cramér’s V index, calculated from contingency tables, which is valid only for crisp clustering. Here, this criterion is extended to fuzzy and probabilistic clustering by first defining weighted contingency tables and a corresponding weighted Cramér’s V index. The proposed method is illustrated using Fuzzy C-Means in a proteomics problem.
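The abstract does not give the definition of the weighted contingency table, so the following is only a minimal sketch under a stated assumption: each table entry is taken to be the sum, over all data points, of the product of a point's membership in cluster i of one solution and in cluster j of the other. The function names `weighted_contingency` and `cramers_v` are hypothetical, and the paper's actual construction may differ. With crisp (0/1) memberships this reduces to the ordinary contingency table and the standard Cramér’s V.

```python
import numpy as np


def weighted_contingency(u_a, u_b):
    """Soft contingency table between two clustering solutions.

    u_a, u_b: membership matrices of shape (n_points, n_clusters),
    each row summing to 1. ASSUMPTION: entry (i, j) is the sum over
    points of u_a[:, i] * u_b[:, j]; this is an illustrative choice,
    not necessarily the definition used in the paper.
    """
    u_a = np.asarray(u_a, dtype=float)
    u_b = np.asarray(u_b, dtype=float)
    return u_a.T @ u_b


def cramers_v(table):
    """Cramér's V computed from a (possibly non-integer) contingency table."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    k = min(table.shape) - 1
    if k < 1:
        raise ValueError("each solution needs at least two clusters")
    # Expected counts under independence of the two partitions.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    mask = expected > 0  # skip empty rows/columns to avoid 0/0
    chi2 = ((table - expected)[mask] ** 2 / expected[mask]).sum()
    return float(np.sqrt(chi2 / (n * k)))


# Two crisp solutions that agree up to a relabelling of the clusters.
crisp_a = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
crisp_b = crisp_a[:, ::-1]
v_crisp = cramers_v(weighted_contingency(crisp_a, crisp_b))

# A maximally fuzzy solution carries no information about the other one.
fuzzy = np.full((4, 2), 0.5)
v_fuzzy = cramers_v(weighted_contingency(crisp_a, fuzzy))
```

Under this assumption, values near 1 indicate that two runs (for example, two Fuzzy C-Means initializations) produce essentially the same partition, while values near 0 indicate unrelated solutions.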
Peer Reviewed
Postprint (author's final draft)
Conference report
English
UPC subject areas::Computer science::Computer applications::Bioinformatics; Protein research; Fuzzy clustering; K-Means; Clustering stability analysis; Cramér’s V index; G Protein-Coupled Receptors; Proteins -- Research
Springer
http://link.springer.com/chapter/10.1007/978-3-319-16483-0_52
Open Access