[Spark-10994] Add clustering coefficient computation in GraphX #9150

Closed
wants to merge 8 commits into from


SherlockYang

The Clustering Coefficient (CC) is a fundamental measure in social (and other types of) network analysis that assesses the degree to which nodes tend to cluster together [1][2]. Together with density, node degree, path length, diameter, connectedness, and node centrality, the clustering coefficient is one of the seven most important properties used to characterise a network [3].
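For reference, the textbook local clustering coefficient of a vertex v in an undirected graph [1][2] is given below, where k_v is the number of distinct neighbours of v and T_v is the number of edges among those neighbours (equivalently, the number of triangles through v). Vertices with k_v < 2 are often assigned 0 by convention. The PR text does not spell out which conventions the proposed operator uses (for example, for directed edges), so this is the standard definition rather than a description of the patch. The network-level average used by Watts and Strogatz [2] takes the mean of C_v over all n vertices.

```latex
C_v = \frac{2\,T_v}{k_v\,(k_v - 1)} \quad (k_v \ge 2),
\qquad
\bar{C} = \frac{1}{n} \sum_{v=1}^{n} C_v
```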

We found that GraphX already implements connectedness, node centrality, and path length, but has no component for computing the clustering coefficient. This gap was our original motivation for implementing an algorithm that computes the clustering coefficient of each vertex of a given graph.
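To make the per-vertex computation concrete, here is a minimal single-machine sketch in Scala. It assumes an undirected simple graph given as an in-memory edge list (self-loops dropped, duplicate and reversed edges merged), does not use Spark, and is not the implementation in this PR; `LocalClusteringSketch` and its methods are hypothetical names chosen for illustration. For each vertex v it counts the edges among v's neighbours (T_v), then computes 2 * T_v / (k_v * (k_v - 1)), assigning 0.0 when v has fewer than two neighbours.

```scala
object LocalClusteringSketch {
  // Mirrors GraphX's VertexId alias, but defined locally so the sketch
  // needs no Spark dependency.
  type VertexId = Long

  // Undirected adjacency sets: self-loops are dropped, and duplicate or
  // reversed edges collapse into a single neighbour entry. Vertices that
  // only appear in self-loops therefore do not show up at all.
  def adjacency(edges: Seq[(VertexId, VertexId)]): Map[VertexId, Set[VertexId]] =
    edges
      .filter { case (a, b) => a != b }
      .flatMap { case (a, b) => Seq(a -> b, b -> a) }
      .groupBy(_._1)
      .map { case (v, pairs) => v -> pairs.map(_._2).toSet }

  // C_v = 2 * T_v / (k_v * (k_v - 1)), where k_v is the number of distinct
  // neighbours and T_v the number of edges among them. By convention here,
  // vertices with fewer than two neighbours get 0.0.
  def localClusteringCoefficients(edges: Seq[(VertexId, VertexId)]): Map[VertexId, Double] = {
    val adj = adjacency(edges)
    adj.map { case (v, nbrs) =>
      val k = nbrs.size
      val coefficient =
        if (k < 2) 0.0
        else {
          // Count each neighbour-neighbour edge once by requiring u < w.
          val links = nbrs.toSeq.map { u =>
            adj(u).count(w => u < w && nbrs.contains(w))
          }.sum
          2.0 * links / (k.toLong * (k - 1))
        }
      v -> coefficient
    }
  }

  // Watts-Strogatz style average of the local coefficients of all listed vertices.
  def averageClustering(lcc: Map[VertexId, Double]): Double =
    if (lcc.isEmpty) 0.0 else lcc.values.sum / lcc.size

  def main(args: Array[String]): Unit = {
    // A triangle 1-2-3 plus a pendant vertex 4 attached to vertex 3.
    val edges: Seq[(VertexId, VertexId)] = Seq(1L -> 2L, 2L -> 3L, 3L -> 1L, 3L -> 4L)
    val lcc = localClusteringCoefficients(edges)
    lcc.toSeq.sortBy(_._1).foreach { case (vid, c) => println(s"$vid: $c") }
    println(s"average: ${averageClustering(lcc)}")
  }
}
```

The sketch keeps the whole adjacency map in memory on one machine. A distributed version in GraphX would instead gather each vertex's neighbour set and intersect neighbour sets across edges, which is how GraphX's existing triangle counting works; since C_v depends only on T_v and k_v, per-vertex triangle counts from `graph.triangleCount()` could in principle be combined with neighbour counts to obtain the same values.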

The clustering coefficient is useful in many real applications, such as user behaviour prediction and structure prediction (e.g., link prediction). We have used it in several of our own papers (e.g., [4-5]), and many other publications also rely on this metric [6-8]. We are confident that this feature will benefit GraphX and attract a large number of users.

References
[1] https://en.wikipedia.org/wiki/Clustering_coefficient
[2] Duncan J. Watts and Steven H. Strogatz. "Collective dynamics of ‘small-world’ networks." Nature 393(6684), 1998, pp. 440-442. (27266 citations)
[3] https://en.wikipedia.org/wiki/Network_science
[4] Jing Zhang, Zhanpeng Fang, Wei Chen, and Jie Tang. "Diffusion of 'Following' Links in Microblogging Networks." IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(8), 2015, pp. 2093-2106.
[5] Yang Yang, Jie Tang, Jacklyne Keomany, Yanting Zhao, Ying Ding, Juanzi Li, and Liangwei Wang. "Mining Competitive Relationships by Learning across Heterogeneous Networks." In Proceedings of the Twenty-First Conference on Information and Knowledge Management (CIKM'12), 2012, pp. 1432-1441.
[6] Aaron Clauset, Cristopher Moore, and Mark E. J. Newman. "Hierarchical structure and the prediction of missing links in networks." Nature 453(7191), 2008, pp. 98-101. (973 citations)
[7] Lada A. Adamic and Eytan Adar. "Friends and neighbors on the web." Social Networks 25(3), 2003, pp. 211-230. (1238 citations)
[8] Ryan N. Lichtenwalter, Jake T. Lussier, and Nitesh V. Chawla. "New perspectives and methods in link prediction." In KDD'10, 2010.

Usage

Here is a usage example for LocalClusteringCoefficient:

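```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

val conf = new SparkConf().setAppName("testApp")
val sc = new SparkContext(conf)

// Load a graph from an edge list file (one "srcId dstId" pair per line)
// and partition its edges with a random vertex cut.
val graph = GraphLoader.edgeListFile(sc, "graph.txt")
  .partitionBy(PartitionStrategy.RandomVertexCut)

// Compute the local clustering coefficient of every vertex.
// Note: localClusteringCoefficient() is the method added by this PR;
// it is not available in released GraphX.
val lccGraph = graph.localClusteringCoefficient()

// Print the coefficient of each vertex. collect() brings every vertex
// to the driver, so this is only suitable for small graphs.
lccGraph.vertices.collect().foreach { case (vid, lcc) =>
  println(s"$vid: $lcc")
}

sc.stop()
```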

@AmplabJenkins

Can one of the admins verify this patch?

SherlockYang changed the title [Spark-10994] Add local clustering coefficient computation in GraphX to [Spark-10994] Add clustering coefficient computation in GraphX on Oct 19, 2015
@rxin
Contributor

rxin commented Jun 15, 2016

Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one. For this one you might want to consider creating a spark-packages.org package.

asfgit closed this in 1a33f2e on Jun 15, 2016

3 participants