durgeshsamariya/awesome-clustering-resources

Awesome Clustering Resources

Clustering or Cluster analysis is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). - Wikipedia
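
The definition above can be made concrete with a small sketch. The toy points, the cluster labels, and the helper names `mean_within_distance` and `mean_between_distance` are all hypothetical and chosen only for illustration. The sketch checks that points sharing a cluster are, on average, closer to each other than to points in the other cluster, using Euclidean distance as one possible notion of similarity.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+

# Hypothetical toy data: two hand-labelled groups of 2-D points.
clusters = {
    "a": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    "b": [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
}


def mean_within_distance(clusters):
    """Average distance between pairs of points that share a cluster."""
    pairs = [
        dist(p, q)
        for points in clusters.values()
        for p, q in combinations(points, 2)
    ]
    return sum(pairs) / len(pairs)


def mean_between_distance(clusters):
    """Average distance between pairs of points from different clusters."""
    pairs = [
        dist(p, q)
        for points_a, points_b in combinations(clusters.values(), 2)
        for p in points_a
        for q in points_b
    ]
    return sum(pairs) / len(pairs)


within = mean_within_distance(clusters)
between = mean_between_distance(clusters)
print(f"mean within-cluster distance:  {within:.2f}")
print(f"mean between-cluster distance: {between:.2f}")
print("grouping looks reasonable:", within < between)
```

A grouping for which the within-cluster average is not clearly smaller than the between-cluster average would be a poor clustering under this particular distance. Other similarity measures can lead to different conclusions.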




1. Books

Data Clustering by Chandan K. Reddy and Charu C. Aggarwal. This textbook covers most clustering techniques and is highly recommended for anyone working in clustering.

Data Clustering: Theory, Algorithms, and Applications by Guojun Gan, Chaoqun Ma and Jianhong Wu. A useful compendium of clustering methods for a variety of data types, with numerous similarity measures and many example algorithms. The emphasis is on the algorithms themselves, down to their implementation in MATLAB or C++.


2. Papers


2.1. Survey Papers

| Title | Publication Venue | Year | Reference | URL |
| --- | --- | --- | --- | --- |
| Survey of clustering algorithms | IEEE Transactions on Neural Networks | 2005 | [1] | [URL] |
| A Survey of Clustering Data Mining Techniques | Springer | 2006 | [2] | [URL] |
| Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering | ACM Transactions on Knowledge Discovery from Data | 2009 | [3] | [PDF] |
| A Survey of Text Clustering Algorithms | Springer | 2012 | [4] | [URL] |
| A Survey of Recent Advances in Hierarchical Clustering Algorithms | The Computer Journal | 1983 | [5] | [PDF] |
| Subspace clustering for high dimensional data: a review | ACM SIGKDD Explorations Newsletter | 2004 | [10] | [PDF] |
| Clustering of time series data—a survey | Pattern Recognition | 2005 | [17] | [PDF] |
| A comprehensive survey of clustering algorithms | Annals of Data Science | 2015 | [18] | [URL] |

2.2. State-of-the-Art Papers

| Title | Publication Venue | Year | Reference | URL |
| --- | --- | --- | --- | --- |
| DBSCAN | KDD | 1996 | [6] | [PDF] |

2.3. Density Based Clustering Algorithms

| Title | Publication Venue | Year | Reference | URL |
| --- | --- | --- | --- | --- |
| DBSCAN | KDD | 1996 | [6] | [PDF] |
| OPTICS: ordering points to identify the clustering structure | ACM SIGMOD Record | 1999 | [7] | [PDF] |
| A distribution-based clustering algorithm for mining in large spatial databases | ICDE | 1998 | [8] | [PDF] |
| An efficient approach to clustering in large multimedia databases with noise | KDD | 1998 | [9] | [PDF] |

2.4. Distance Based Clustering Algorithms

| Title | Publication Venue | Year | Reference | URL |
| --- | --- | --- | --- | --- |

2.5. Time Series Clustering

| Title | Publication Venue | Year | Reference | URL |
| --- | --- | --- | --- | --- |

2.6. Text Clustering

| Title | Publication Venue | Year | Reference | URL |
| --- | --- | --- | --- | --- |
| Frequent term-based text clustering | KDD | 2002 | [15] | [PDF] |
| An Evaluation on Feature Selection for Text Clustering | ICML | 2003 | [16] | [PDF] |

2.7. Subspace Clustering

| Title | Publication Venue | Year | Reference | URL |
| --- | --- | --- | --- | --- |
| Subspace clustering for high dimensional data: a review | ACM SIGKDD Explorations Newsletter | 2004 | [10] | [PDF] |
| Density-Connected Subspace Clustering for High-Dimensional Data | SIAM International Conference on Data Mining | 2004 | [11] | [PDF] |
| Entropy-based subspace clustering for mining numerical data | KDD | 1999 | [12] | [PDF] |
| Low rank subspace clustering (LRSC) | Pattern Recognition Letters | 2014 | [13] | [PDF] |
| DUSC: Dimensionality Unbiased Subspace Clustering | ICDM | 2007 | [14] | [PDF] |

3. Online Courses

Coursera Machine Learning course by Andrew Ng, Stanford University [See Video]

Coursera Machine Learning with Python by IBM [See Video]

Course on Clustering in Machine Learning by Google [See Video]

Coursera Clustering Analysis in Data Mining course by University of Illinois at Urbana-Champaign [See Video]

4. Clustering Datasets

Clustering basic benchmark: http://cs.joensuu.fi/sipu/datasets/

5. List of Journals

Journal of Machine Learning Research

IEEE Transactions on Pattern Analysis and Machine Intelligence

Data Mining and Knowledge Discovery

IEEE Access

Pattern Recognition Letters

ACM SIGKDD Explorations Newsletter

References

[1] Xu, R., & Wunsch, D. (2005). Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3), 645-678.
[2] Berkhin, P. (2006). A survey of clustering data mining techniques. In Grouping multidimensional data (pp. 25-71). Springer, Berlin, Heidelberg.
[3] Kriegel, H. P., Kröger, P., & Zimek, A. (2009). Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Transactions on Knowledge Discovery from Data (TKDD), 3(1), 1-58.
[4] Aggarwal, C. C., & Zhai, C. (2012). A survey of text clustering algorithms. In Mining text data (pp. 77-128). Springer, Boston, MA.
[5] Murtagh, F. (1983). A survey of recent advances in hierarchical clustering algorithms. The Computer Journal, 26(4), 354-359.
[6] Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996, August). A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD (Vol. 96, No. 34, pp. 226-231).
[7] Ankerst, M., Breunig, M. M., Kriegel, H. P., & Sander, J. (1999). OPTICS: ordering points to identify the clustering structure. ACM SIGMOD Record, 28(2), 49-60.
[8] Xu, X., Ester, M., Kriegel, H. P., & Sander, J. (1998, February). A distribution-based clustering algorithm for mining in large spatial databases. In Proceedings 14th International Conference on Data Engineering (pp. 324-331). IEEE.
[9] Hinneburg, A., & Keim, D. A. (1998). An efficient approach to clustering in large multimedia databases with noise. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD'98) (pp. 58-65). AAAI Press.
[10] Parsons, L., Haque, E., & Liu, H. (2004). Subspace clustering for high dimensional data: a review. ACM SIGKDD Explorations Newsletter, 6(1), 90-105. https://doi.org/10.1145/1007730.1007731
[11] Kailing, K., Kriegel, H. P., & Kröger, P. (2004, April). Density-connected subspace clustering for high-dimensional data. In Proceedings of the 2004 SIAM international conference on data mining (pp. 246-256). Society for Industrial and Applied Mathematics.
[12] Cheng, C. H., Fu, A. W., & Zhang, Y. (1999, August). Entropy-based subspace clustering for mining numerical data. In Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 84-93).
[13] Vidal, R., & Favaro, P. (2014). Low rank subspace clustering (LRSC). Pattern Recognition Letters, 43, 47-61.
[14] Assent, I., Krieger, R., Müller, E., & Seidl, T. (2007, October). DUSC: Dimensionality unbiased subspace clustering. In Seventh IEEE International Conference on Data Mining (ICDM 2007) (pp. 409-414). IEEE.
[15] Beil, F., Ester, M., & Xu, X. (2002). Frequent term-based text clustering. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '02) (pp. 436-442). Association for Computing Machinery. https://doi.org/10.1145/775047.775110
[16] Liu, T., Liu, S., Chen, Z., & Ma, W. Y. (2003). An evaluation on feature selection for text clustering. In Proceedings of the 20th international conference on machine learning (ICML-03) (pp. 488-495).
[17] Liao, T. W. (2005). Clustering of time series data—a survey. Pattern Recognition, 38(11), 1857-1874.
[18] Xu, D., & Tian, Y. (2015). A comprehensive survey of clustering algorithms. Annals of Data Science, 2(2), 165-193.

More to come...

More items will be added to the repository. Please feel free to suggest other key resources by opening an issue, submitting a pull request, or emailing me at ([email protected]). Enjoy reading!

Last updated on November 25, 2020
