This project contains tools from the Google Graph Mining team, specifically in-memory clustering. Our tools can be used to solve data mining and machine learning problems that either inherently have a graph structure or can be formalized as graph problems. For more information, see our NeurIPS'20 workshop.
Among other things, this repository contains shared-memory parallel clustering algorithms that scale to graphs with tens of billions of edges. They are based on the following research papers:
- Hierarchical Agglomerative Graph Clustering in Poly-Logarithmic Depth, Laxman Dhulipala, David Eisenstat, Jakub Łącki, Vahab Mirrokni, Jessica Shi, NeurIPS'22. See https://github.com/google/graph-mining/tree/main/in_memory/clustering/hac
- Scalable community detection via parallel correlation clustering, Jessica Shi, Laxman Dhulipala, David Eisenstat, Jakub Łącki, Vahab Mirrokni, VLDB'21. See https://github.com/google/graph-mining/tree/main/in_memory/clustering/correlation
- Affinity Clustering: Hierarchical Clustering at Scale, Mohammadhossein Bateni, Soheil Behnezhad, Mahsa Derakhshan, MohammadTaghi Hajiaghayi, Raimondas Kiveris, Silvio Lattanzi, Vahab Mirrokni, NeurIPS'17 (the paper describes a MapReduce algorithm). See https://github.com/google/graph-mining/tree/main/in_memory/clustering/affinity
- Distributed Balanced Partitioning via Linear Embedding, Kevin Aydin, MohammadHossein Bateni, Vahab Mirrokni, WSDM'16 (the paper describes a MapReduce algorithm). See https://github.com/google/graph-mining/tree/main/in_memory/clustering/parline
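To give a rough feel for the kind of algorithm listed above, here is a toy, single-threaded Python sketch of one round of Borůvka-style merging, the idea that affinity clustering builds on: every node picks its highest-similarity edge, and nodes joined by the picked edges end up in the same cluster. This is only an illustration and not this repository's implementation or API. The function name `affinity_round`, the `(u, v, similarity)` edge format, and the example graph are all assumptions made for the sketch.

```python
def affinity_round(num_nodes, edges):
    """Toy sketch of one Boruvka-style merging round (hypothetical helper).

    edges: list of (u, v, similarity) tuples for an undirected graph.
    Returns a list mapping each node to a cluster id (the smallest
    node id in its cluster).
    """
    # For every node, remember the best (similarity, neighbor) seen so far.
    best = {}
    for u, v, w in edges:
        for a, b in ((u, v), (v, u)):
            if a not in best or w > best[a][0]:
                best[a] = (w, b)

    # Union-find over the edges that were picked.
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, (_, b) in best.items():
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller id as the root so labels are deterministic.
            parent[max(ra, rb)] = min(ra, rb)

    return [find(x) for x in range(num_nodes)]


# A path graph with one weak edge (2, 3) in the middle.
edges = [(0, 1, 0.9), (1, 2, 0.8), (2, 3, 0.1), (3, 4, 0.7), (4, 5, 0.95)]
labels = affinity_round(6, edges)
print(labels)  # [0, 0, 0, 3, 3, 3]
```

No node picks the weak edge between nodes 2 and 3, so the path splits into two clusters. A hierarchical method would contract each cluster into a single node and repeat the round on the contracted graph; the sketch stops after one round to stay short.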
This is not an officially supported Google product. For questions/comments, please create an issue on this repository.
- Install Bazel
- Run the example:
      bazel run //examples:quickstart