A MATLAB package for K-means-based consensus clustering
=============================================================================================
KCC Version 1.2, 2023-04-25
This package is distributed under GNU GENERAL PUBLIC LICENSE (Version 3). (see LICENSE)
Copyright (c) 2017-2023 Hao Lin & Hongfu Liu & Junjie Wu.
=============================================================================================

1. Introduction
===============

KCC is a MATLAB package that implements the K-means-based Consensus Clustering framework
with different utility functions:

- U_c: Category Utility Function with Euclidean distance as distance measure
- U_h: Shannon Entropy Utility Function with KL-divergence as distance measure
- U_cos: Cosine Utility Function with Cosine Similarity as distance measure
- U_lp: Lp Utility Function with Lp-norm as distance measure
- NUx: normalized form of the above utility functions

2. Installation and Basic Usage
===============================

A. Copy all .m files of Matlab/Src to the current directory in your MATLAB environment
   or to a directory on your MATLAB path.

B. To run an illustrative example of KCC with different utility functions, type the
   following in the MATLAB command window:

   > demo

   After executing this command, evaluation metrics of KCC experiments with different
   utility functions will be stored in the result files.

3. Functions in the package
===========================

A. Format of Input Files
------------------------

- Data file: two types of data files are supported. In a file with the suffix '.dat',
  rows correspond to observations and columns correspond to variables. A file with the
  suffix '.mat' stores the data in sparse matrix format. Note that class labels are
  excluded from all data files! See the examples "data\iris.dat" and "data\mm.mat".
- Truelabels file (optional, used when true cluster labels are known): n-by-1 vector of
  known cluster labels for all data points; see the example "data\iris_rclass.dat".

B. Illustrative Example
------------------------

- demo.m: demonstrates how to set up input arguments and use KCC with different utility
  functions
- demoNumberBP.m: demonstrates KCC experiments with an increasing number of basic
  partitions
- demoStrategyBP.m: demonstrates KCC experiments with the RFS strategy
- demoIBPI.m: demonstrates KCC experiments using Strategy-I for generating incomplete
  basic partitions
- demoIBPII.m: demonstrates KCC experiments using Strategy-II for generating incomplete
  basic partitions
- demoEvacluster.m: demonstrates KCC experiments that evaluate the cluster solution using
  internal metrics and determine the best number of clusters for the consensus clustering
- demoEvaTimeMem.m: demonstrates how to measure the full execution time and peak memory
  usage of KCC

C. The process of KCC includes
------------------------------

(1) Generating basic partitions
    There are two functions for generating basic partitions:
    - BasicCluster_RFS: generates basic partitions using the RFS strategy
    - BasicCluster_RPS: generates basic partitions using the RPS strategy

(2) Preprocessing for consensus clustering
    - Preprocess: prepares the data for consensus clustering

(3) Performing consensus function
    - KCC: performs the final consensus function using different utility functions

(4) Evaluating clustering quality
    - exMeasure: computes external validity scores for clustering results
    - inMeasure: computes internal validity scores for clustering results

D. Auxiliary functions
----------------------

- load_sparse: loads input text data as a sparse matrix.
- hungarian: solves the assignment problem using the Hungarian method (auxiliary function
  for permuting labels of clustering results to match true labels as well as possible).
- BasicCluster_RPS_missing: randomly removes data instances from a data set and then
  employs k-means on the incomplete data set (auxiliary function for generating incomplete
  basic partitions using Strategy-I).
- addmissing: randomly removes some labels from complete basic partitions (auxiliary
  function for generating incomplete basic partitions using Strategy-II).
- distance_cos, distance_cos_miss, distance_euc, distance_euc_miss, distance_kl,
  distance_kl_miss, distance_lp, distance_lp_miss: distance calculation on a data set with
  or without missing values using different distance measures, i.e., cosine similarity,
  Euclidean distance, KL-divergence, and Lp-norm.
- gClusterDistribution: calculates the cluster distribution for basic partitions
  (auxiliary function for preprocessing).
- Ucompute, Ucompute_miss: calculate the utility function on a data set with or without
  missing values (auxiliary functions for consensus clustering).
- gCentroid, gCentroid_miss: update the centroid of each cluster on a data set with or
  without missing values (auxiliary functions for consensus clustering).
- sCentroid, sCentroid_miss: initialize the centroid of each cluster on a data set with or
  without missing values (auxiliary functions for consensus clustering).

* Note: to get a description of a function, type "help" followed by the function name in
  the MATLAB command window.

4. Contact
==========

For questions and comments, please feel free to contact Dr. Hao Lin at haolin@buaa.edu.cn.

5. Cite
=======

For use of the software, please cite the paper published in ACM TOMS with the following
BibTeX entry.

@article{lin2023algorithm,
  title={Algorithm xxxx: KCC: A MATLAB Package for K-means-based Consensus Clustering},
  author={Lin, Hao and Liu, Hongfu and Wu, Junjie and Li, Hong and G{\"u}nnemann, Stephan},
  journal={ACM transactions on mathematical software},
  year={2023},
  publisher={ACM New York, NY}
}

6. Ongoing Development
======================

This code is being developed on an ongoing basis at the author's
[Github site](https://github.com/linhaobuaa/KCC). Please go there if you would like to get
a more recent version of the software.
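
Conceptual illustration of steps (2) and (3) in Section 3.C: the sketch below shows, on a
hand-made toy input, how a set of basic partitions can be encoded as a binary matrix and
then clustered with plain k-means to obtain a consensus partition. It is only a sketch
under stated assumptions and does not call any function of this package. The matrix IDX,
the choice K = 2, and the initial centroids are hypothetical. Squared Euclidean distance
is used here in the spirit of U_c. The package's Preprocess and KCC functions may differ
in interface and detail (for example in weighting and missing-label handling), so please
consult "help KCC" and demo.m for the real usage.

```matlab
% Conceptual sketch only: it does not call any function from the KCC package.
% IDX is a hypothetical n-by-r matrix; column j holds the labels of basic partition j.
IDX = [1 1 2;
       1 1 2;
       1 2 2;
       2 2 1;
       2 3 1;
       2 3 1];
K = 2;                          % number of consensus clusters (chosen for this toy case)
[n, r] = size(IDX);

% Step 1: one-hot encode each basic partition and concatenate the blocks.
B = [];
for j = 1:r
    Kj = max(IDX(:, j));        % number of clusters in basic partition j
    block = zeros(n, Kj);
    block(sub2ind([n, Kj], (1:n)', IDX(:, j))) = 1;
    B = [B, block];             %#ok<AGROW>
end

% Step 2: plain Lloyd iterations with squared Euclidean distance on B.
C = B([1, n], :);               % deterministic start for K = 2: first and last rows
labels = zeros(n, 1);
for iter = 1:100
    D = zeros(n, K);
    for k = 1:K
        diffs = B - repmat(C(k, :), n, 1);
        D(:, k) = sum(diffs .^ 2, 2);
    end
    [~, newLabels] = min(D, [], 2);
    if isequal(newLabels, labels)
        break;                  % assignments stopped changing
    end
    labels = newLabels;
    for k = 1:K
        members = (labels == k);
        if any(members)
            C(k, :) = mean(B(members, :), 1);   % centroid = mean of member rows
        end
    end
end

disp(labels');                  % consensus labels for the six toy objects
```

On this toy input the first three objects end up in one consensus cluster and the last
three in the other, even though the second basic partition splits the objects into three
groups.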