Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-23141][SQL][PYSPARK] Support data type string as a returnType for registerJavaFunction. #20307

Closed
wants to merge 3 commits into from

Conversation

ueshin
Copy link
Member

@ueshin ueshin commented Jan 18, 2018

What changes were proposed in this pull request?

Currently UDFRegistration.registerJavaFunction doesn't support data type string as a returnType whereas UDFRegistration.register, @udf, or @pandas_udf does.
We can support it for UDFRegistration.registerJavaFunction as well.

How was this patch tested?

Added a doctest and existing tests.

@ueshin
Copy link
Member Author

ueshin commented Jan 18, 2018

cc @HyukjinKwon

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, LGTM.

@@ -310,14 +310,22 @@ def registerJavaFunction(self, name, javaClassName, returnType=None):
... "javaStringLength", "test.org.apache.spark.sql.JavaStringLength", IntegerType())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, seems we need to fix :param returnType: across all other related APIs saying it takes DDL-formatted type string.

@ueshin, mind opening a minor PR for this - udf, pandas_udf, registerJavaFunction and register separately? If you are busy, will do it tonight. Doing this here is fine to me too, up to you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll update them here soon.

"""

jdt = None
if returnType is not None:
if not isinstance(returnType, DataType):
returnType = _parse_datatype_string(returnType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The param doc needs to be modified too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that's #20307 (comment) :).

@SparkQA
Copy link

SparkQA commented Jan 18, 2018

Test build #86317 has finished for PR 20307 at commit 1a2c01d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 18, 2018

Test build #86322 has finished for PR 20307 at commit d41709f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -2108,7 +2108,8 @@ def udf(f=None, returnType=StringType()):
can fail on special rows, the workaround is to incorporate the condition into the functions.

:param f: python function if used as a standalone function
:param returnType: a :class:`pyspark.sql.types.DataType` object
:param returnType: the return type of the registered user-defined function. The value can be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems typo: the return type of the registered user-defined function. -> the return type of the user-defined function.?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I'll fix it. Thanks!

@SparkQA
Copy link

SparkQA commented Jan 18, 2018

Test build #86339 has finished for PR 20307 at commit c731876.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Merged to master and branch-2.3.

asfgit pushed a commit that referenced this pull request Jan 18, 2018
…for registerJavaFunction.

## What changes were proposed in this pull request?

Currently `UDFRegistration.registerJavaFunction` doesn't support data type string as a `returnType` whereas `UDFRegistration.register`, `udf`, or `pandas_udf` does.
We can support it for `UDFRegistration.registerJavaFunction` as well.

## How was this patch tested?

Added a doctest and existing tests.

Author: Takuya UESHIN <[email protected]>

Closes #20307 from ueshin/issues/SPARK-23141.

(cherry picked from commit 5063b74)
Signed-off-by: hyukjinkwon <[email protected]>
@asfgit asfgit closed this in 5063b74 Jan 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants