
[SPARK-3190][GraphX] fix VertexRDD.count exceed on large graph #12835

Closed

Conversation

@liyuance commented May 2, 2016

As [SPARK-3190] and #2106 describe, VertexRDDs with more than 4 billion elements are counted incorrectly due to integer overflow when summing partition sizes. That PR fixed the summation by converting each partition size to a Long before summing. However, the issue can still be reproduced when the number of vertices in a single partition exceeds Integer.MAX_VALUE.
The fundamental cause is that `size` is declared with type Int in class VertexPartitionBase:
def size: Int = mask.cardinality()
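
The overflow described above can be sketched as follows (a minimal illustration with hypothetical partition sizes, not the actual GraphX code path):

```scala
// Sketch of the two overflow scenarios discussed in this PR.
object CountOverflowSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical per-partition vertex counts whose total exceeds Int.MaxValue.
    val partitionSizes: Seq[Int] = Seq(Int.MaxValue, 1)

    // Buggy: summing Int sizes wraps around to a negative value.
    val buggyCount: Int = partitionSizes.sum

    // Fix from the earlier PR (#2106): widen each size to Long before summing.
    val fixedCount: Long = partitionSizes.map(_.toLong).sum

    println(buggyCount) // negative due to Int wrap-around
    println(fixedCount) // 2147483648

    // The gap this PR targets: if a *single* partition held more than
    // Int.MaxValue vertices, its own `size: Int` would already be wrong
    // before any summing, so widening only at the sum is not enough.
  }
}
```

Widening at the sum repairs the aggregate, but a `size` declared as Int caps what any one partition can report correctly, which is why the PR argues the declaration itself is the root cause.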

@AmplabJenkins

Can one of the admins verify this patch?

@rxin (Contributor) commented May 2, 2016

cc @ankurdave

maropu added a commit to maropu/spark that referenced this pull request Apr 23, 2017
@asfgit asfgit closed this in b771fed Jun 8, 2017
