[SPARK-19926][PYSPARK] Make pyspark exception more user-friendly #17267

HyukjinKwon · 2017-03-13T13:00:49Z

Hm, does this work for unicode in Python 2, for example, spark.range(1).select("아")? To my knowledge, converting it to ASCII directly throws an exception.

>>> str(u"아")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\uc544' in position 0: ordinal not in range(128)
>>> repr(u"아")
"u'\\uc544'"

Maybe we should check whether this is unicode and call .encode.
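The failure above comes from the implicit ASCII encoding that Python 2 applies when str() is called on a unicode object. A minimal sketch that performs the same encoding explicitly, so it behaves the same way on Python 2 and Python 3 (the variable names are only illustrative):

```python
text = u"\uc544"  # the Hangul syllable used in the example above

# Python 2's str(unicode_obj) encodes with the ASCII codec under the hood;
# doing that encoding explicitly fails in the same way.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding with UTF-8 succeeds and yields a three-byte sequence.
utf8_bytes = text.encode("utf-8")
```

This is why an explicit .encode with a codec that covers the character avoids the error.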

I just tested with this change as below to help:

before

>>> try:
...     spark.range(1).select(u"아")
... except Exception as e:
...     print e
...
u"cannot resolve '`\uc544`' given input columns: [id];;\n'Project ['\uc544]\n+- Range (0, 1, step=1, splits=Some(8))\n"
>>> spark.range(1).select(u"아")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../spark/python/pyspark/sql/dataframe.py", line 992, in select
    jdf = self._jdf.select(self._jcols(*cols))
  File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File ".../spark/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"cannot resolve '`\uc544`' given input columns: [id];;\n'Project ['\uc544]\n+- Range (0, 1, step=1, splits=Some(8))\n"

after

>>> try:
...     spark.range(1).select(u"아")
... except Exception as e:
...     print e
...
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File ".../spark/python/pyspark/sql/utils.py", line 27, in __str__
    return str(self.desc)
UnicodeEncodeError: 'ascii' codec can't encode character u'\uc544' in position 17: ordinal not in range(128)
>>> spark.range(1).select(u"아")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../spark/python/pyspark/sql/dataframe.py", line 992, in select
    jdf = self._jdf.select(self._jcols(*cols))
  File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File ".../spark/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException

@uncleGen, could you double-check whether I did something wrong?

We can add a check under Python 2: if it is unicode, just encode it with UTF-8.
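A minimal sketch of what such a check could look like. The class name and constructor follow the diff in this PR; the body of __str__ is an assumption about how the suggestion might be implemented, not necessarily the code that was committed:

```python
import sys


class CapturedException(Exception):
    def __init__(self, desc, stackTrace):
        self.desc = desc
        self.stackTrace = stackTrace

    def __str__(self):
        desc = self.desc
        # Python 2 only: encode unicode descriptions as UTF-8 so that
        # str(e) / print e does not go through the ASCII codec.
        # `unicode` is never evaluated on Python 3 because `and` short-circuits.
        if sys.version_info[0] < 3 and isinstance(desc, unicode):  # noqa: F821
            return desc.encode("utf-8")
        return str(desc)


# Hypothetical usage with a message shaped like the one in this thread.
err = CapturedException(u"cannot resolve '`\uc544`' given input columns: [id]", None)
message = str(err)
```

On Python 3 the branch is skipped and str(desc) returns the text unchanged; on Python 2 the result is a UTF-8 encoded byte string that a UTF-8 terminal can display.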

@HyukjinKwon Good catch!

Ah, thank you for the confirmation. I thought I was mistaken :).

Maybe another benefit of this change: before it, the error message in your example looks like:

u"cannot resolve '\uc544' given input columns: [id];;\n'Project ['\uc544]

repr shows the unicode escape sequence \uc544. Even if you encode it, repr shows the escaped byte representation. str can show the correct "아" if it is encoded with UTF-8.

If I tested it correctly.
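The difference between the escaped and the readable forms can be illustrated with a short sketch. It is written for Python 3, where the built-in ascii() is used here as a stand-in for what Python 2's repr() did to a unicode object (apart from the u prefix); the variable names are illustrative only:

```python
desc = u"cannot resolve '`\uc544`'"

# Escapes every non-ASCII character, similar to Python 2's repr() on unicode.
escaped = ascii(desc)

# Encoding gives raw UTF-8 bytes; their repr still shows \x.. escapes.
encoded = desc.encode("utf-8")
encoded_repr = repr(encoded)

# Decoding the bytes as UTF-8 (what a UTF-8 terminal effectively does)
# gives back the readable text.
readable = encoded.decode("utf-8")
```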

Yeah, I support this change, and I tested some more cases with that encode.

based on latest commit:

>>> df.select("아")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../spark/python/pyspark/sql/dataframe.py", line 992, in select
    jdf = self._jdf.select(self._jcols(*cols))
  File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File ".../spark/python/pyspark/sql/utils.py", line 75, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: cannot resolve '`아`' given input columns: [age, name];;
'Project ['아]
+- Relation[age#0L,name#1] json

@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
         self.stackTrace = stackTrace

     def __str__(self):
-        return repr(self.desc)
+        return str(self.desc)

 class AnalysisException(CapturedException):


HyukjinKwon Mar 13, 2017 (edited)
HyukjinKwon Mar 13, 2017 (edited)
HyukjinKwon Mar 13, 2017
viirya Mar 13, 2017
viirya Mar 13, 2017
HyukjinKwon Mar 13, 2017
viirya Mar 13, 2017
HyukjinKwon Mar 13, 2017
uncleGen Mar 14, 2017
