[SPARK-21045][PYSPARK]Fixed executor blocked because traceback.format_exc throw UnicodeDecodeError #18324

dataknocker · 2017-06-16T06:56:01Z

What changes were proposed in this pull request?

Check whether the result of traceback.format_exc() is unicode and, only in that case, encode it as UTF-8.
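
The check described above can be sketched outside of Spark as follows. This is a minimal illustration, not the actual worker.py code: the helper name `format_current_exception_as_bytes` and the `text_type` alias are assumptions made for this example. The reason for the type check is that on Python 2, calling `.encode('utf-8')` on a byte string first decodes it implicitly as ASCII, which raises `UnicodeDecodeError` for a non-ASCII message.

```python
# -*- coding: utf-8 -*-
import traceback

# `unicode` on Python 2, `str` on Python 3 (alias introduced for this sketch)
text_type = type(u"")


def format_current_exception_as_bytes():
    """Return the current traceback as bytes, encoding only if it is unicode text."""
    exc_info = traceback.format_exc()
    if isinstance(exc_info, text_type):
        # Unicode text: safe to encode explicitly.
        exc_info = exc_info.encode("utf-8")
    # Otherwise it is already a byte string (Python 2) and is passed through as is.
    return exc_info


def f():
    raise Exception("中")


try:
    f()
except Exception:
    payload = format_current_exception_as_bytes()
```

On both Python versions `payload` ends up as a byte string that can be written to the JVM side without a second encoding step.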

How was this patch tested?

We can run in pyspark:

from pyspark.sql import SparkSession

def f():
    # Non-ASCII exception message that triggers the hang
    raise Exception("中")

spark = SparkSession.builder.master('local').getOrCreate()
spark.sparkContext.parallelize([1]).map(lambda x: f()).count()

Before this fix, the program hangs.
After this fix, the program raises the expected exception.

I have also added a test to pyspark.tests.

dataknocker · 2017-06-16T06:58:30Z

@HyukjinKwon PR #18262 had some problems, so I opened this new PR.
I have added my test for this change.

HyukjinKwon · 2017-06-17T09:39:26Z

python/pyspark/tests.py

+
+                self.sc.parallelize([1]).map(lambda x: f()).count()
+            except Exception:
+                pass


I would check this with assertRaises and the error message too.
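
A minimal sketch of what that suggestion could look like, independent of Spark. The test class and method names are hypothetical, and `f` raises locally instead of inside a Spark job; in the real PySpark test the exception type surfaced by the driver would likely differ. `assertRaisesRegex` is the Python 3 name (Python 2 spells it `assertRaisesRegexp`).

```python
# -*- coding: utf-8 -*-
import unittest


def f():
    raise Exception("中")


class NonAsciiExceptionTest(unittest.TestCase):
    def test_error_message_is_checked(self):
        # Fails if nothing is raised or if the message does not match,
        # unlike a bare `except Exception: pass`.
        with self.assertRaisesRegex(Exception, "中"):
            f()
```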

HyukjinKwon · 2017-06-17T09:40:01Z

python/pyspark/tests.py

+            except Exception:
+                pass
+
+        t = threading.Thread(target=run)


Why should we run this in a thread?

@HyukjinKwon This test mainly checks whether the job blocks. So I run it in a thread and join with a timeout: if it blocks, as it did before the fix, the test waits 10s and exits instead of blocking the other tests.
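
The thread-and-join pattern described here can be sketched as below. This is an illustration under assumptions, not the test from the PR: the helper name `finishes_within` is hypothetical, and the thread is marked as a daemon so that a stuck target cannot keep the interpreter alive.

```python
import threading
import time


def finishes_within(target, timeout):
    """Run `target` in a daemon thread; return True if it ended within `timeout` seconds."""
    t = threading.Thread(target=target)
    t.daemon = True  # a blocked target must not prevent interpreter exit
    t.start()
    t.join(timeout)  # returns after `timeout` seconds even if the thread is still running
    return not t.is_alive()


quick = finishes_within(lambda: None, 1.0)
stuck = finishes_within(lambda: time.sleep(5), 0.1)
```

A test built on this would assert that the helper returns True for the Spark job, so that a hang shows up as a failure after the timeout rather than stalling the whole suite.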

ueshin · 2017-06-19T04:43:59Z

python/pyspark/worker.py

@@ -177,8 +180,11 @@ def process():
            process()
    except Exception:
        try:
+            exc_info = traceback.format_exc()
+            if isinstance(exc_info, unicode):
+                exc_info = exc_info.encode('utf-8')


I guess this PR and #17267 need to follow each other so that both are fixed correctly.

Yes, we should take a closer look. BTW, just note that they are a bit different, in the sense that this one needs to return bytes in Python 3 / string (bytes) in Python 2, whereas #17267 needs to produce string (unicode) in Python 3 / string (bytes) in Python 2.
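
The difference between the two targets can be sketched with two small helpers. Both names (`to_wire_bytes`, `to_native_str`) are hypothetical and chosen for this example only; neither is taken from this PR or from #17267.

```python
# -*- coding: utf-8 -*-
import sys

PY2 = sys.version_info[0] == 2
text_type = type(u"")  # `unicode` on Python 2, `str` on Python 3


def to_wire_bytes(s):
    """Target of this PR: always bytes (which is plain `str` on Python 2)."""
    return s.encode("utf-8") if isinstance(s, text_type) else s


def to_native_str(s):
    """Target of #17267: the native `str` type (bytes on Python 2, unicode on Python 3)."""
    if PY2:
        return s.encode("utf-8") if isinstance(s, text_type) else s
    return s.decode("utf-8") if isinstance(s, bytes) else s
```

On Python 2 the two helpers behave identically; they only diverge on Python 3, where the first returns `bytes` and the second returns `str`.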

@HyukjinKwon @ueshin what do I need to do next?

Let's wait for the resolution of #17267 if you don't mind. I think we should be careful with this change.

cc @zero323 and @davies here too (for the approach here). This instance is a bit different.

IMHO, we have a strong assumption that the string is in UTF-8, and this PR now allows writing out the bytes as they are. This is a hole that I can't come up with a clean solution for because, to my knowledge, it means strings in any other encoding can be written too. Also, the JVM side assumes that this is in UTF-8.

However, to my knowledge, Java mangles the string if it is not in UTF-8 rather than throwing an exception. I guess this is still better than hanging there.

Do you maybe have a better idea for dealing with this, or is there anything I missed here?
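
As a rough analogy for the trade-off described above, the following Python sketch contrasts strict and lenient UTF-8 decoding of bytes that are not valid UTF-8. It only illustrates the idea of "mangling instead of throwing"; it does not exercise the JVM, and whether the JVM side behaves exactly like the lenient case is the assumption stated in the comment above.

```python
# -*- coding: utf-8 -*-
raw = u"中".encode("gbk")  # bytes in another encoding, not valid UTF-8

# Strict decoding raises, which is the failure mode that must not hang the worker.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Lenient decoding substitutes U+FFFD replacement characters: the text is
# mangled, but the rest of the message still gets through.
lenient = raw.decode("utf-8", errors="replace")
```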

@HyukjinKwon is this PR just going to stay on hold?

cc @jiangxb1987

advancedxy · 2019-09-06T02:29:33Z

@dataknocker are there any updates on this PR?

AmplabJenkins · 2019-09-16T18:25:36Z

Can one of the admins verify this patch?

HyukjinKwon · 2019-09-17T00:23:51Z

@advancedxy, would you like to take this over?

advancedxy · 2019-09-17T02:16:03Z

> @advancedxy, would you like to take this over?

All right, let me take this over. I hope we can get this into Spark 3 and backport it to 2.4.

HyukjinKwon · 2019-09-19T08:20:50Z

Closing this in favour of #25847

…from python execution in Python 2

### What changes were proposed in this pull request?

This PR allows non-ascii string as an exception message in Python 2 by explicitly en/decoding in case of `str` in Python 2.

### Why are the changes needed?

Previously PySpark will hang when the `UnicodeDecodeError` occurs and the real exception cannot be passed to the JVM side.

See the reproducer as below:

```python
def f():
    raise Exception("中")
spark = SparkSession.builder.master('local').getOrCreate()
spark.sparkContext.parallelize([1]).map(lambda x: f()).count()
```

### Does this PR introduce any user-facing change?

User may not observe hanging for the similar cases.

### How was this patch tested?

Added a new test and manually checking.

This pr is based on #18324, credits should also go to dataknocker.

To make lint-python happy for python3, it also includes a followup fix for #25814

Closes #25847 from advancedxy/python_exception_19926_and_21045.

Authored-by: Xianjin YE <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>

Commit 6cbc3f7: Fixed executor blocked because traceback.format_exc encode utf8 throw UnicodeDecodeError

dataknocker mentioned this pull request Jun 16, 2017

[SPARK-21045][PYSPARK]Fixed executor blocked because traceback.format_exc throw UnicodeDecodeError #18262

Closed

HyukjinKwon reviewed Jun 17, 2017

ueshin reviewed Jun 19, 2017

jiangxb1987 mentioned this pull request Jun 26, 2018

[SPARK-19926][PYSPARK] Make pyspark exception more user-friendly #17267

Closed

dongjoon-hyun added the PYSPARK label Jun 14, 2019

HyukjinKwon closed this Sep 19, 2019

HyukjinKwon Jun 17, 2017

HyukjinKwon Jun 17, 2017

dataknocker Jun 19, 2017

ueshin Jun 19, 2017

HyukjinKwon Jun 19, 2017

dataknocker Jun 20, 2017

HyukjinKwon Jun 21, 2017

HyukjinKwon Jun 21, 2017

dataknocker Dec 1, 2017

dataknocker Jun 22, 2018
