-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-19505][Python] AttributeError on Exception.message in Python3 #16845
[SPARK-19505][Python] AttributeError on Exception.message in Python3 #16845
Conversation
Yes, up to my knowledge, |
python/pyspark/broadcast.py
Outdated
@@ -82,7 +82,7 @@ def dump(self, value, f): | |||
except pickle.PickleError: | |||
raise | |||
except Exception as e: | |||
msg = "Could not serialize broadcast: " + e.__class__.__name__ + ": " + e.message | |||
msg = "Could not serialize broadcast: " + e.__class__.__name__ + ": " + str(e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, we should keep in mind that
>>> Exception(u"jörn").message
u'j\xf6rn'
>>> str(Exception(u"jörn"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 1: ordinal not in range(128)
this could cause such encoding problem. I think it'd be really rare case though. It seems this usage is already there in the code base so I assume it is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, I forgot about Python 2's str
/unicode
behavior. It could be a problem if an exception includes user input as part of the message. IMO it's worth handling it rigorously since throwing in the except:
hides the original error. I'll look at fixes tonight, in the worse case we could add a get_exception_message()
helper.
+1 to @HyukjinKwon comment. |
* Add util._exception_message helper * Switch from "e.message" to "_exception_message(e)"
cb5c34d
to
7b8ace4
Compare
Added a helper that falls back from |
(Actually, I personally prefer |
BTW, could anyone say "ok to test"? Anyhow it is true that |
IMO the helper is better since the Exception message is controlled by the user. I thought Holden was +1'ing the problem (not handling unicode) but re-reading she might have been +1'ing that it was fine. Let me know which way you prefer, it's trivial to switch back to using |
Yea, I think she is meant to decide this :). |
Another alternate is to use # Python2.6
>>> import six
>>> six.text_type(Exception(u"unicöde"))
u'unic\xf6de' I also submitted a patch to six to add an |
Let's see if I can ask Jenkins to test or if I need to get another list updates for that. Jenkins ok to test. |
Ok seems like I need to bug one of the other committers to ask Jenkins to test this, maybe @davies can do this in the meantime? |
Ok in theory I've been added, so lets see Jenkins test this please. |
Test build #73446 has finished for PR 16845 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this, I've got a few questions but it looks in pretty good shape already :)
python/pyspark/broadcast.py
Outdated
@@ -82,7 +83,8 @@ def dump(self, value, f): | |||
except pickle.PickleError: | |||
raise | |||
except Exception as e: | |||
msg = "Could not serialize broadcast: " + e.__class__.__name__ + ": " + e.message | |||
msg = ("Could not serialize broadcast: " + e.__class__.__name__ + ": " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this maybe be easier to read using format rather than string concatenation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I was trying to minimize changes but I prefer formatting with %
. Switched.
|
||
if __name__ == "__main__": | ||
import doctest | ||
(failure_count, test_count) = doctest.testmod() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests won't be run by the default test script, we could fix this by updating the list of things to test - but since this function is only used in one place for now maybe it would make sense to just put this in cloudpickle.py anyways. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the util is called from both cloudpickle.py and broadcast.py, so I'd rather leave the helper here and add it the list of things to test. Leaving it as is for now but willing to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems reasonable, in that case you will need to make sure this is called by the test scripts (take a look at ./dev/sparktestsupport/modules.py
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Holden, looking at this today (sorry for the delay, been busy at work).
FWIW the patch to add support for the utility function six
was rejected: benjaminp/six#177
So being able to format the messages in the 2.X line and 3 line seems like a reasonable improvement, even if we are using a deprecated feature in 2.6+ for the 2.X line. Let me know when you have a chance to update the PR so the tests are run. |
f4428a1
to
91dc9bf
Compare
Added to |
Ping! Let me know if you need more work from me. |
LGTM - sorry for the slowness. |
Let's make sure everything is still fine since we've made some changes in the testing infrastructure though - if it passes I'll merge this tonight (european time). Jenkins retest this please. |
Great, thanks Holden! |
Jenkins, retest this please. |
Test build #75631 has finished for PR 16845 at commit
|
Merged to master |
## What changes were proposed in this pull request? Added `util._message_exception` helper to use `str(e)` when `e.message` is unavailable (Python3). Grepped for all occurrences of `.message` in `pyspark/` and these were the only occurrences. ## How was this patch tested? - Doctests for helper function ## Legal This is my original work and I license the work to the project under the project’s open source license. Author: David Gingrich <[email protected]> Closes apache#16845 from dgingrich/topic-spark-19505-py3-exceptions.
What changes were proposed in this pull request?
Added
util._message_exception
helper to usestr(e)
whene.message
is unavailable (Python3). Grepped for all occurrences of.message
inpyspark/
and these were the only occurrences.How was this patch tested?
Legal
This is my original work and I license the work to the project under the project’s open source license.