[SPARK-10642][PySpark] Fix crash when calling rdd.lookup() on tuple keys #8796

Closed
viirya wants to merge 2 commits

@viirya viirya commented Sep 17, 2015

JIRA: https://issues.apache.org/jira/browse/SPARK-10642

When calling rdd.lookup() on an RDD with tuple keys, portable_hash returns a long. That causes DAGScheduler.submitJob to throw java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer.

@@ -84,7 +84,7 @@ def portable_hash(x):
         h ^= len(x)
         if h == -1:
             h = -2
-        return h
+        return int(h)
Member

Just asking the dumb question here, but is this intended to return an int? sys.maxsize does not appear to be the max positive 32-bit int.

Member Author

Should it use sys.maxint instead?

Member

CC @davies the real question is whether this hash is intended to be 32-bit or 64-bit, and my Python knowledge is too limited to reason about this. It appears that it's computing a 64-bit hash given the size of sys.maxsize but maybe that's platform dependent or something. Anyway: I kind of suspect you're right that it's 32-bit, but I think that has to be verified first.
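The platform question raised here can be checked directly. The snippet below is a minimal sketch, not part of the patch: `INT32_MAX` and `fits_in_int32` are hypothetical names introduced for illustration. It shows that `sys.maxsize` depends on the interpreter build, so masking with it only bounds the hash to the native word size and does not by itself guarantee a 32-bit value.

```python
import sys

# Largest value a signed 32-bit integer can hold (hypothetical helper constant).
INT32_MAX = 2**31 - 1


def fits_in_int32(value):
    """Return True if value is representable as a signed 32-bit integer."""
    return -INT32_MAX - 1 <= value <= INT32_MAX


# sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
print("bits needed for sys.maxsize:", sys.maxsize.bit_length())
print("sys.maxsize fits in 32 bits:", fits_in_int32(sys.maxsize))
```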

Contributor

h becomes a long at h *= 1000003 and stays a long even after h &= sys.maxsize (or sys.maxint).

The fix looks good to me.
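For readers following the thread, here is a minimal sketch of the tuple branch of the hash, pieced together from the diff context and the operations quoted above (`h *= 1000003`, `h &= sys.maxsize`, `h ^= len(x)`). The name `portable_hash_sketch`, the seed `0x345678`, and the fallback to the built-in `hash` are assumptions for illustration, not the exact PySpark source. Python 3 has no separate `long` type, so the final `int(h)` is a no-op there; on Python 2 it turns a `long` that fits in the native word back into an `int`.

```python
import sys


def portable_hash_sketch(x):
    """Simplified tuple hash; a hypothetical stand-in for portable_hash."""
    if isinstance(x, tuple):
        h = 0x345678  # assumed seed, not shown in the diff above
        for item in x:
            h ^= portable_hash_sketch(item)
            h *= 1000003      # on Python 2 this can promote h to long
            h &= sys.maxsize  # bounds the value; Python 2 keeps the long type
        h ^= len(x)
        if h == -1:
            h = -2
        return int(h)         # the fix: normalize the return type
    return hash(x)


key = (1, 2)
h = portable_hash_sketch(key)
print(type(h).__name__, 0 <= h <= sys.maxsize)
```

The mask keeps the value within the native word size, while the `int(...)` call only affects the type that is handed back to the caller.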

SparkQA commented Sep 17, 2015

Test build #42589 has finished for PR 8796 at commit 1eac461.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

viirya commented Sep 17, 2015

retest this please.

SparkQA commented Sep 17, 2015

Test build #42596 has finished for PR 8796 at commit 1eac461.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

viirya commented Sep 17, 2015

retest this please.

SparkQA commented Sep 17, 2015

Test build #42608 has finished for PR 8796 at commit d5bfb01.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Sep 17, 2015
…keys

JIRA: https://issues.apache.org/jira/browse/SPARK-10642

When calling `rdd.lookup()` on an RDD with tuple keys, `portable_hash` returns a long. That causes `DAGScheduler.submitJob` to throw `java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer`.

Author: Liang-Chi Hsieh <[email protected]>

Closes #8796 from viirya/fix-pyrdd-lookup.

(cherry picked from commit 136c77d)
Signed-off-by: Davies Liu <[email protected]>
@asfgit asfgit closed this in 136c77d Sep 17, 2015
davies commented Sep 17, 2015

LGTM, merging into master and 1.5, 1.4, 1.3, 1.2 branches

ashangit pushed a commit to ashangit/spark that referenced this pull request Oct 19, 2016
…keys

JIRA: https://issues.apache.org/jira/browse/SPARK-10642

When calling `rdd.lookup()` on an RDD with tuple keys, `portable_hash` returns a long. That causes `DAGScheduler.submitJob` to throw `java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer`.

Author: Liang-Chi Hsieh <[email protected]>

Closes apache#8796 from viirya/fix-pyrdd-lookup.

(cherry picked from commit 136c77d)
Signed-off-by: Davies Liu <[email protected]>
(cherry picked from commit 9f8fb33)
@viirya viirya deleted the fix-pyrdd-lookup branch December 27, 2023 18:32