Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-19505][Python] AttributeError on Exception.message in Python3 #16845

Closed

Conversation

dgingrich
Copy link

@dgingrich dgingrich commented Feb 8, 2017

What changes were proposed in this pull request?

Added util._message_exception helper to use str(e) when e.message is unavailable (Python3). Grepped for all occurrences of .message in pyspark/ and these were the only occurrences.

How was this patch tested?

  • Doctests for helper function

Legal

This is my original work and I license the work to the project under the project’s open source license.

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Feb 8, 2017

Yes, up to my knowledge, message attribute is deprecated from 2.6 and removed in 3.x. It seems reasonable. Let me please cc @holdenk.

@@ -82,7 +82,7 @@ def dump(self, value, f):
except pickle.PickleError:
raise
except Exception as e:
msg = "Could not serialize broadcast: " + e.__class__.__name__ + ": " + e.message
msg = "Could not serialize broadcast: " + e.__class__.__name__ + ": " + str(e)
Copy link
Member

@HyukjinKwon HyukjinKwon Feb 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, we should keep in mind that

>>> Exception(u"jörn").message
u'j\xf6rn'
>>> str(Exception(u"jörn"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 1: ordinal not in range(128)

this could cause such encoding problem. I think it'd be really rare case though. It seems this usage is already there in the code base so I assume it is fine.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I forgot about Python 2's str/unicode behavior. It could be a problem if an exception includes user input as part of the message. IMO it's worth handling it rigorously since throwing in the except: hides the original error. I'll look at fixes tonight, in the worse case we could add a get_exception_message() helper.

@holdenk
Copy link
Contributor

holdenk commented Feb 13, 2017

+1 to @HyukjinKwon comment.

* Add util._exception_message helper
* Switch from "e.message" to "_exception_message(e)"
@dgingrich dgingrich force-pushed the topic-spark-19505-py3-exceptions branch from cb5c34d to 7b8ace4 Compare February 14, 2017 18:02
@dgingrich
Copy link
Author

Added a helper that falls back from e.message to str(e). Also rebased to master.

@HyukjinKwon
Copy link
Member

(Actually, I personally prefer str(e) unless we can identify any actual case although it is a potential problem but maybe too narrow to introduce another util.. Maybe, I am being too cautious.)

@HyukjinKwon
Copy link
Member

BTW, could anyone say "ok to test"? Anyhow it is true that Exception does not have message in Python 3 and we need a fix.

@dgingrich
Copy link
Author

dgingrich commented Feb 18, 2017

IMO the helper is better since the Exception message is controlled by the user. I thought Holden was +1'ing the problem (not handling unicode) but re-reading she might have been +1'ing that it was fine. Let me know which way you prefer, it's trivial to switch back to using str.

@HyukjinKwon
Copy link
Member

Yea, I think she is meant to decide this :).

@dgingrich
Copy link
Author

dgingrich commented Feb 18, 2017

Another alternate is to use six. In the standard library text_type works:

# Python2.6
>>> import six
>>> six.text_type(Exception(u"unicöde"))
u'unic\xf6de'

I also submitted a patch to six to add an exception_message helper (benjaminp/six#177) but I don't know if it will be accepted since Exception.message has been deprecated since 2.6.

@holdenk
Copy link
Contributor

holdenk commented Feb 18, 2017

Let's see if I can ask Jenkins to test or if I need to get another list updates for that. Jenkins ok to test.

@holdenk
Copy link
Contributor

holdenk commented Feb 22, 2017

Ok seems like I need to bug one of the other committers to ask Jenkins to test this, maybe @davies can do this in the meantime?

@holdenk
Copy link
Contributor

holdenk commented Feb 24, 2017

Ok in theory I've been added, so lets see Jenkins test this please.

@SparkQA
Copy link

SparkQA commented Feb 24, 2017

Test build #73446 has finished for PR 16845 at commit 7b8ace4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@holdenk holdenk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this, I've got a few questions but it looks in pretty good shape already :)

@@ -82,7 +83,8 @@ def dump(self, value, f):
except pickle.PickleError:
raise
except Exception as e:
msg = "Could not serialize broadcast: " + e.__class__.__name__ + ": " + e.message
msg = ("Could not serialize broadcast: " + e.__class__.__name__ + ": "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this maybe be easier to read using format rather than string concatenation?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I was trying to minimize changes but I prefer formatting with %. Switched.


if __name__ == "__main__":
import doctest
(failure_count, test_count) = doctest.testmod()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests won't be run by the default test script, we could fix this by updating the list of things to test - but since this function is only used in one place for now maybe it would make sense to just put this in cloudpickle.py anyways. What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the util is called from both cloudpickle.py and broadcast.py, so I'd rather leave the helper here and add it the list of things to test. Leaving it as is for now but willing to change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems reasonable, in that case you will need to make sure this is called by the test scripts (take a look at ./dev/sparktestsupport/modules.py).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Holden, looking at this today (sorry for the delay, been busy at work).

FWIW the patch to add support for the utility function six was rejected: benjaminp/six#177

@holdenk
Copy link
Contributor

holdenk commented Mar 9, 2017

So being able to format the messages in the 2.X line and 3 line seems like a reasonable improvement, even if we are using a deprecated feature in 2.6+ for the 2.X line. Let me know when you have a chance to update the PR so the tests are run.

@dgingrich dgingrich force-pushed the topic-spark-19505-py3-exceptions branch from f4428a1 to 91dc9bf Compare March 9, 2017 19:47
@dgingrich
Copy link
Author

Added to pyspark.util to modules. Let me know if you want any other changes.

@dgingrich
Copy link
Author

Ping! Let me know if you need more work from me.

@holdenk
Copy link
Contributor

holdenk commented Apr 6, 2017

LGTM - sorry for the slowness.

@holdenk
Copy link
Contributor

holdenk commented Apr 6, 2017

Let's make sure everything is still fine since we've made some changes in the testing infrastructure though - if it passes I'll merge this tonight (european time).

Jenkins retest this please.

@dgingrich
Copy link
Author

Great, thanks Holden!

@holdenk
Copy link
Contributor

holdenk commented Apr 9, 2017

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Apr 9, 2017

Test build #75631 has finished for PR 16845 at commit 91dc9bf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in 6297697 Apr 11, 2017
@holdenk
Copy link
Contributor

holdenk commented Apr 11, 2017

Merged to master

peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
## What changes were proposed in this pull request?

Added `util._message_exception` helper to use `str(e)` when `e.message` is unavailable (Python3).  Grepped for all occurrences of `.message` in `pyspark/` and these were the only occurrences.

## How was this patch tested?

- Doctests for helper function

## Legal

This is my original work and I license the work to the project under the project’s open source license.

Author: David Gingrich <[email protected]>

Closes apache#16845 from dgingrich/topic-spark-19505-py3-exceptions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants