Clustering or Cluster analysis is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). - Wikipedia
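To make the definition concrete, here is a minimal sketch that groups a handful of 2-D points so that nearby points share a label. It uses plain k-means (Lloyd's iteration), chosen only for illustration because it is short; it is one of many clustering methods and is not taken from the resources listed here. The function name `kmeans`, the sample `points`, and the `seed` parameter are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters with a plain k-means loop."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Two visibly separate groups of points (hypothetical sample data).
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.1), (7.9, 8.3), (8.2, 7.9)]
labels, centroids = kmeans(points, k=2)
print(labels)     # the first three points share one label, the last three the other
print(centroids)  # one centroid near (1, 1), one near (8, 8)
```

Which group receives label 0 or 1 depends on the random initialisation; only the grouping itself is meaningful.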
Data Clustering by Chandan K. Reddy and Charu C. Aggarwal. This textbook covers most clustering techniques and is highly recommended for anyone working on clustering.
Data Clustering: Theory, Algorithms, and Applications by Guojun Gan, Chaoqun Ma and Jianhong Wu. This is a useful compendium of clustering methods for a variety of data types, with numerous similarity measures and many example algorithms. The emphasis is ultimately on the algorithms, including their implementation in MATLAB or C++.
Title | Publication Venue | Year | Reference | URL |
---|---|---|---|---|
Survey of clustering algorithms | IEEE Transactions on Neural Networks | 2005 | [1] | [URL] |
A Survey of Clustering Data Mining Techniques | Springer | 2006 | [2] | [URL] |
Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering | ACM Transactions on Knowledge Discovery from Data | 2009 | [3] | [PDF] |
A Survey of Text Clustering Algorithms | Springer | 2012 | [4] | [URL] |
A Survey of Recent Advances in Hierarchical Clustering Algorithms | The Computer Journal | 1983 | [5] | [PDF] |
Subspace clustering for high dimensional data: a review | ACM SIGKDD Explorations Newsletter | 2004 | [10] | [PDF] |
Clustering of time series data—a survey | Pattern Recognition | 2005 | [17] | [PDF] |
A comprehensive survey of clustering algorithms | Annals of Data Science | 2015 | [18] | [URL] |
Title | Publication Venue | Year | Reference | URL |
---|---|---|---|---|
DBSCAN | KDD | 1996 | [6] | [PDF] |
OPTICS: ordering points to identify the clustering structure | ACM SIGMOD Record | 1999 | [7] | [PDF] |
A distribution-based clustering algorithm for mining in large spatial databases | ICDE | 1998 | [8] | [PDF] |
An efficient approach to clustering in large multimedia databases with noise | KDD | 1998 | [9] | [PDF] |
Title | Publication Venue | Year | Reference | URL |
---|---|---|---|---|
Frequent term-based text clustering | KDD | 2002 | [15] | [PDF] |
An Evaluation on Feature Selection for Text Clustering | ICML | 2003 | [16] | [PDF] |
Title | Publication Venue | Year | Reference | URL |
---|---|---|---|---|
Subspace clustering for high dimensional data: a review | ACM SIGKDD Explorations Newsletter | 2004 | [10] | [PDF] |
Density-Connected Subspace Clustering for High-Dimensional Data | SIAM SDM | 2004 | [11] | [PDF] |
Entropy-based subspace clustering for mining numerical data | KDD | 1999 | [12] | [PDF] |
Low rank subspace clustering (LRSC) | Pattern Recognition Letters | 2014 | [13] | [PDF] |
DUSC: Dimensionality Unbiased Subspace Clustering | ICDM | 2007 | [14] | [PDF] |
Coursera Machine Learning course by Andrew Ng, Stanford University [See Video]
Coursera Machine Learning with Python by IBM [See Video]
Course on Clustering in Machine Learning by Google [See Video]
Coursera Cluster Analysis in Data Mining course by University of Illinois at Urbana-Champaign [See Video]
Clustering basic benchmark: http://cs.joensuu.fi/sipu/datasets/
Journal of Machine Learning Research
IEEE Transactions on Pattern Analysis and Machine Intelligence
Data Mining and Knowledge Discovery
ACM SIGKDD Explorations Newsletter
[1] | Xu, R., & Wunsch, D. (2005). Survey of clustering algorithms. IEEE Transactions on neural networks, 16(3), 645-678. |
[2] | Berkhin, P. (2006). A survey of clustering data mining techniques. In Grouping multidimensional data (pp. 25-71). Springer, Berlin, Heidelberg. |
[3] | Kriegel, H. P., Kröger, P., & Zimek, A. (2009). Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Transactions on Knowledge Discovery from Data (TKDD), 3(1), 1-58. |
[4] | Aggarwal, C. C., & Zhai, C. (2012). A survey of text clustering algorithms. In Mining text data (pp. 77-128). Springer, Boston, MA. |
[5] | Murtagh, F. (1983). A survey of recent advances in hierarchical clustering algorithms. The computer journal, 26(4), 354-359. |
[6] | Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996, August). A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD (Vol. 96, No. 34, pp. 226-231). |
[7] | Ankerst, M., Breunig, M. M., Kriegel, H. P., & Sander, J. (1999). OPTICS: ordering points to identify the clustering structure. ACM Sigmod record, 28(2), 49-60. |
[8] | Xu, X., Ester, M., Kriegel, H. P., & Sander, J. (1998, February). A distribution-based clustering algorithm for mining in large spatial databases. In Proceedings 14th International Conference on Data Engineering (pp. 324-331). IEEE. |
[9] | Hinneburg, A., & Keim, D. A. (1998). An efficient approach to clustering in large multimedia databases with noise. In Proceedings of the fourth international conference on knowledge discovery and data mining (KDD'98) (pp. 58-65). AAAI Press. |
[10] | Parsons, L., Haque, E., & Liu, H. (2004). Subspace clustering for high dimensional data: a review. ACM SIGKDD Explorations Newsletter, 6(1), 90-105. https://doi.org/10.1145/1007730.1007731 |
[11] | Kailing, K., Kriegel, H. P., & Kröger, P. (2004, April). Density-connected subspace clustering for high-dimensional data. In Proceedings of the 2004 SIAM international conference on data mining (pp. 246-256). Society for Industrial and Applied Mathematics. |
[12] | Cheng, C. H., Fu, A. W., & Zhang, Y. (1999, August). Entropy-based subspace clustering for mining numerical data. In Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 84-93). |
[13] | Vidal, R., & Favaro, P. (2014). Low rank subspace clustering (LRSC). Pattern Recognition Letters, 43, 47-61. |
[14] | Assent, I., Krieger, R., Müller, E., & Seidl, T. (2007, October). DUSC: Dimensionality unbiased subspace clustering. In seventh IEEE international conference on data mining (ICDM 2007) (pp. 409-414). IEEE. |
[15] | Beil, F., Ester, M., & Xu, X. (2002). Frequent term-based text clustering. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '02) (pp. 436-442). Association for Computing Machinery. https://doi.org/10.1145/775047.775110 |
[16] | Liu, T., Liu, S., Chen, Z., & Ma, W. Y. (2003). An evaluation on feature selection for text clustering. In Proceedings of the 20th international conference on machine learning (ICML-03) (pp. 488-495). |
[17] | Liao, T. W. (2005). Clustering of time series data—a survey. Pattern recognition, 38(11), 1857-1874. |
[18] | Xu, D., & Tian, Y. (2015). A comprehensive survey of clustering algorithms. Annals of Data Science, 2(2), 165-193. |
More to come...
More items will be added to the repository. Please feel free to suggest other key resources by opening an issue, submitting a pull request, or dropping me an email ([email protected]). Enjoy reading!
Last updated on November 25, 2020