[SPARK-48706][PYTHON] Python UDF in higher order functions should not throw internal error #47079

HyukjinKwon · 2024-06-25T03:29:23Z

What changes were proposed in this pull request?

This PR fixes the error messages and classes when Python UDFs are used in higher order functions.

Why are the changes needed?

To show the proper user-facing exceptions with error classes.

Does this PR introduce any user-facing change?

Yes. Previously it threw an internal error, for example:

from pyspark.sql.functions import transform, udf, col, array
spark.range(1).select(transform(array("id"), lambda x: udf(lambda y: y)(x))).collect()

Before:

py4j.protocol.Py4JJavaError: An error occurred while calling o74.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 15 in stage 0.0 failed 1 times, most recent failure: Lost task 15.0 in stage 0.0 (TID 15) (ip-192-168-123-103.ap-northeast-2.compute.internal executor driver): org.apache.spark.SparkException: [INTERNAL_ERROR] Cannot evaluate expression: <lambda>(lambda x_0#3L)#2 SQLSTATE: XX000
	at org.apache.spark.SparkException$.internalError(SparkException.scala:92)
	at org.apache.spark.SparkException$.internalError(SparkException.scala:96)

After:

pyspark.errors.exceptions.captured.AnalysisException: [INVALID_LAMBDA_FUNCTION_CALL.UNEVALUABLE] Invalid lambda function call. Python UDFs should be used in a lambda function at a higher order function. However, "<lambda>(lambda x_0#3L)" was a Python UDF. SQLSTATE: 42K0D;
Project [transform(array(id#0L), lambdafunction(<lambda>(lambda x_0#3L)#2, lambda x_0#3L, false)) AS transform(array(id), lambdafunction(<lambda>(lambda x_0#3L), namedlambdavariable()))#4]
+- Range (0, 1, step=1, splits=Some(16))
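To make the difference between the two outputs concrete, here is a minimal sketch that pulls the error class and SQLSTATE out of each message. The helper names (`parse_error`, `is_internal_error`) and the assumed message shape are illustrative only and are not part of PySpark; the message strings are shortened copies of the output quoted above.

```python
import re

# Shortened copies of the "Before" and "After" messages quoted above.
BEFORE = (
    "org.apache.spark.SparkException: [INTERNAL_ERROR] Cannot evaluate "
    "expression: <lambda>(lambda x_0#3L)#2 SQLSTATE: XX000"
)
AFTER = (
    "[INVALID_LAMBDA_FUNCTION_CALL.UNEVALUABLE] Invalid lambda function call. "
    "Python UDFs should be used in a lambda function at a higher order "
    'function. However, "<lambda>(lambda x_0#3L)" was a Python UDF. '
    "SQLSTATE: 42K0D;"
)

# Assumed message shape: "[ERROR_CLASS] text SQLSTATE: XXXXX".
_ERROR_RE = re.compile(
    r"\[(?P<error_class>[A-Z0-9_.]+)\].*?SQLSTATE: (?P<sqlstate>[0-9A-Z]{5})",
    re.DOTALL,
)


def parse_error(message):
    """Hypothetical helper: return (error_class, sqlstate), or None if absent."""
    match = _ERROR_RE.search(message)
    if match is None:
        return None
    return match.group("error_class"), match.group("sqlstate")


def is_internal_error(message):
    """True when the message carries the INTERNAL_ERROR class."""
    parsed = parse_error(message)
    return parsed is not None and parsed[0] == "INTERNAL_ERROR"


print(parse_error(BEFORE))
print(parse_error(AFTER))
```

In PySpark itself the exception object should expose this information directly (for example through `getErrorClass()` and `getSqlState()` on exceptions from `pyspark.errors`), so string parsing like this is mainly useful when only log text is available.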

How was this patch tested?

A unit test was added.

Was this patch authored or co-authored using generative AI tooling?

No.

HyukjinKwon · 2024-06-25T03:30:03Z

cc @cloud-fan and @ueshin

cloud-fan · 2024-06-25T06:43:48Z

This is more like an unsupported feature: we don't support a Python UDF as the lambda function within a higher order function. How about using the UNSUPPORTED_FEATURE error?

BTW, will we ever support it in the future?

HyukjinKwon · 2024-06-25T06:51:38Z

I don't think we will ever support this :-). Let me fix up the error.
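Since a Python UDF inside the lambda is not expected to be supported, one possible alternative (an assumption, not something proposed in this PR) is to move the per-element logic into a single Python UDF that receives the whole array. The sketch below uses hypothetical names (`per_element`, `apply_per_element`, `build_query`) and mirrors the `lambda y: y` from the description.

```python
def per_element(value):
    """Stand-in for the body of the original UDF (lambda y: y)."""
    return value


def apply_per_element(values):
    """Apply per_element to every item of an array value; None stays None."""
    if values is None:
        return None
    return [per_element(v) for v in values]


def build_query(spark):
    """Build a DataFrame that applies the UDF to the whole array column.

    `spark` is assumed to be an existing SparkSession. PySpark is imported
    here so the pure-Python functions above can be used without it.
    """
    from pyspark.sql.functions import array, udf
    from pyspark.sql.types import ArrayType, LongType

    # The UDF takes the array as one argument, so no lambda function
    # (and therefore no higher order function) is involved.
    apply_udf = udf(apply_per_element, ArrayType(LongType()))
    return spark.range(1).select(apply_udf(array("id")).alias("out"))
```

With a running session, `build_query(spark).collect()` would evaluate the UDF once per row rather than once per element inside `transform`.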

common/utils/src/main/resources/error/error-conditions.json

yaooqinn · 2024-06-26T05:43:17Z

Merged to master, thank you @HyukjinKwon @cloud-fan

Closes apache#47079 from HyukjinKwon/SPARK-48706.
Authored-by: Hyukjin Kwon
Signed-off-by: Kent Yao

github-actions bot added SQL PYTHON labels Jun 25, 2024

HyukjinKwon force-pushed the SPARK-48706 branch from 28f4da4 to 8cf8fb1 on June 25, 2024 05:18

Python UDF in higher order functions should not throw internal error

c425405

HyukjinKwon force-pushed the SPARK-48706 branch from 8cf8fb1 to c425405 on June 25, 2024 06:24

Address a comment

c53aa7e

cloud-fan reviewed Jun 25, 2024

fixup

3b60e26

HyukjinKwon force-pushed the SPARK-48706 branch from 982a920 to 3b60e26 on June 25, 2024 08:42

fixup

e9063c8

cloud-fan approved these changes Jun 26, 2024

yaooqinn approved these changes Jun 26, 2024

yaooqinn closed this in 07cbba6 Jun 26, 2024
