[Spark-10994] Add clustering coefficient computation in GraphX #9150

SherlockYang · 2015-10-16T16:53:33Z

The clustering coefficient (CC) is a fundamental measure in the analysis of social and other networks; it assesses the degree to which nodes tend to cluster together [1][2]. Together with density, node degree, path length, diameter, connectedness, and node centrality, it is one of the seven most important properties for characterising a network [3].
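For an undirected graph, the local clustering coefficient of a vertex is usually written as below, following the definition in [1][2]. The symbols $k_i$ and $e_i$ are notation introduced here for illustration and are not taken from the patch.

```latex
C_i = \frac{2\, e_i}{k_i\,(k_i - 1)}, \qquad k_i \ge 2
```

Here $k_i$ is the number of neighbours of vertex $i$ and $e_i$ is the number of edges that exist between those neighbours. For example, a vertex with three neighbours, of which exactly one pair is connected, has $C_i = 2 \cdot 1 / (3 \cdot 2) = 1/3$.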

GraphX already implements connectedness, node centrality, and path length, but it has no component for computing the clustering coefficient. This was our original motivation for implementing an algorithm that computes the clustering coefficient of each vertex of a given graph.

The clustering coefficient is useful in many real applications, such as user behaviour prediction and structure prediction (for example, link prediction). We have used it in several of our own papers (e.g., [4-5]), and many other publications use this metric as well [6-8]. We are confident that this feature will benefit GraphX and attract a large number of users.

References
[1] https://en.wikipedia.org/wiki/Clustering_coefficient
[2] Watts, Duncan J., and Steven H. Strogatz. "Collective dynamics of ‘small-world’ networks." Nature 393.6684 (1998): 440-442. (27266 citations)
[3] https://en.wikipedia.org/wiki/Network_science
[4] Jing Zhang, Zhanpeng Fang, Wei Chen, and Jie Tang. Diffusion of "Following" Links in Microblogging Networks. IEEE Transactions on Knowledge and Data Engineering (TKDE), Volume 27, Issue 8, 2015, Pages 2093-2106.
[5] Yang Yang, Jie Tang, Jacklyne Keomany, Yanting Zhao, Ying Ding, Juanzi Li, and Liangwei Wang. Mining Competitive Relationships by Learning across Heterogeneous Networks. In Proceedings of the Twenty-First Conference on Information and Knowledge Management (CIKM'12). pp. 1432-1441.
[6] Clauset, Aaron, Cristopher Moore, and Mark EJ Newman. Hierarchical structure and the prediction of missing links in networks. Nature 453.7191 (2008): 98-101. (973 citations)
[7] Adamic, Lada A., and Eytan Adar. Friends and neighbors on the web. Social networks 25.3 (2003): 211-230. (1238 citations)
[8] Lichtenwalter, Ryan N., Jake T. Lussier, and Nitesh V. Chawla. New perspectives and methods in link prediction. In KDD'10.

Usage

Here is a usage example for LocalClusteringCoefficient:

import org.apache.spark.graphx._
import org.apache.spark._

val conf = new SparkConf().setAppName("testApp")
val sc = new SparkContext(conf)
// load a graph
val graph = GraphLoader.edgeListFile(sc, "graph.txt").partitionBy(PartitionStrategy.RandomVertexCut)

// compute the local clustering coefficient of every vertex
val lccGraph = graph.localClusteringCoefficient()

// print the coefficient of each vertex
val verts = lccGraph.vertices
verts.collect().foreach { case (vid, coefficient) =>
  println(s"$vid: $coefficient")
}
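
For reference, the per-vertex quantity can also be sketched in plain Scala without Spark. This is a minimal sketch and not the implementation in this patch: the object name `LocalClusteringSketch` and its helper `neighbours` are hypothetical, and the sketch assumes an undirected simple graph (edge direction is ignored, self-loops and duplicate edges are dropped) and assigns 0.0 to vertices with fewer than two neighbours.

```scala
object LocalClusteringSketch {
  type VertexId = Long

  /** Build an undirected neighbour map, dropping self-loops and duplicate edges. */
  def neighbours(edges: Seq[(VertexId, VertexId)]): Map[VertexId, Set[VertexId]] =
    edges
      .filter { case (a, b) => a != b }                 // ignore self-loops
      .flatMap { case (a, b) => Seq(a -> b, b -> a) }   // treat every edge as undirected
      .groupBy(_._1)
      .map { case (v, pairs) => v -> pairs.map(_._2).toSet }

  /** Local clustering coefficient of every vertex that has at least one neighbour. */
  def localClusteringCoefficient(edges: Seq[(VertexId, VertexId)]): Map[VertexId, Double] = {
    val nbrs = neighbours(edges)
    nbrs.map { case (v, ns) =>
      val k = ns.size
      // Each edge between two neighbours of v is seen once from each endpoint,
      // so the sum counts it twice; halve it to get the number of such edges.
      val links = ns.toSeq.map(u => (nbrs(u) & ns).size).sum / 2
      val cc = if (k < 2) 0.0 else 2.0 * links / (k * (k - 1))
      v -> cc
    }
  }
}
```

Calling `LocalClusteringSketch.localClusteringCoefficient(Seq((1L, 2L), (2L, 3L), (1L, 3L), (3L, 4L)))` on a triangle with one pendant vertex gives 1.0 for vertices 1 and 2, 1/3 for vertex 3, and 0.0 for vertex 4. The sketch holds the whole graph in memory, so it is only suitable for checking results on small inputs.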

AmplabJenkins · 2015-10-16T16:57:11Z

Can one of the admins verify this patch?

rxin · 2016-06-15T22:03:45Z

Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one. For this one you might want to consider creating a spark-packages.org package.

SherlockYang added 8 commits October 16, 2015 00:35

add local clustering coeffcient computation along with test suite

20b56b7

revise test suite

83529d9

unit test

169298d

=

97c705e

=

bd8d212

=

18af17c

=

a33789f

=

8e1a356

SherlockYang changed the title ~~[Spark-10994] Add local clustering coefficient computation in GraphX~~ [Spark-10994] Add clustering coefficient computation in GraphX Oct 19, 2015

asfgit closed this in 1a33f2e Jun 15, 2016
