[SPARK-23327] [SQL] Update the description and tests of three external API or functions #20495
Test build #87026 has finished for PR 20495 at commit
Test build #87027 has finished for PR 20495 at commit
>>> spark.createDataFrame([('ABC',)], ['a']).select(length('a').alias('length')).collect()
[Row(length=3)]
>>> spark.createDataFrame([('ABC ',)], ['a']).select(length('a').alias('length')).collect()
Actually, this PR does more than update descriptions: it also improves the test coverage and refactors the code.
Could you update the PR title and description to reflect that?
Otherwise, we had better split this PR according to @rdblue's recommendations on our dev mailing list.
Done
+1, LGTM.
python/pyspark/sql/functions.py
Outdated
@@ -1705,10 +1705,12 @@ def unhex(col):
@ignore_unicode_prefix
@since(1.5)
def length(col):
"""Calculates the length of a string or binary expression.
"""Computes the character length of a given string or number of bytes or a binary string.
The length of character strings include the trailing spaces. The length of binary strings
As a side note, why does it call out trailing spaces? What about leading spaces? Aren't all spaces factored into the character length?
I ask because I want to understand this better to see if we should update R https://github.com/apache/spark/blob/master/R/pkg/R/functions.R#L1029
The reason is that LEN in MS SQL Server excludes trailing blanks. :)
Yeah, this PR can also update the R side.
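To make the distinction in this thread concrete, here is a minimal sketch in plain Python (not Spark or SQL Server code). The two helper names are hypothetical and exist only for illustration: the first counts every character, as the updated docstring describes for character strings, while the second mimics a LEN-style function that ignores trailing blanks.

```python
def length_with_trailing_spaces(s: str) -> int:
    """Count every character, including leading and trailing spaces."""
    return len(s)


def length_without_trailing_blanks(s: str) -> int:
    """Mimic a LEN-style function that ignores trailing blanks only."""
    return len(s.rstrip(" "))


# Hypothetical sample values, chosen to show leading vs. trailing spaces.
samples = ["ABC", "ABC ", " ABC", " ABC  "]
results = {
    s: (length_with_trailing_spaces(s), length_without_trailing_blanks(s))
    for s in samples
}

for s, (full, trimmed) in results.items():
    print(repr(s), full, trimmed)
```

Leading spaces are counted by both helpers; only trailing spaces produce different results, which is why the docstring singles them out.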
python/pyspark/sql/functions.py
Outdated
@@ -1705,10 +1705,12 @@ def unhex(col):
@ignore_unicode_prefix
@since(1.5)
def length(col):
"""Calculates the length of a string or binary expression.
"""Computes the character length of a given string or number of bytes or a binary string.
number of bytes of a binary value?
Test build #87081 has finished for PR 20495 at commit
retest this please
LGTM
Test build #87085 has finished for PR 20495 at commit
retest this please
LG, one comment
@@ -1655,15 +1655,17 @@ case class Left(str: Expression, len: Expression, child: Expression) extends Run
*/
// scalastyle:off line.size.limit
@ExpressionDescription(
usage = "_FUNC_(expr) - Returns the character length of `expr` or number of bytes in binary data.",
usage = "_FUNC_(expr) - Returns the character length of `expr` or number of bytes in binary data. " +
Why do other places use "binary string" while here we have "binary data"?
We should be consistent: either "character string" vs "binary string", or "string data" vs "binary data".
+1 for string data / binary data
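The string data / binary data wording can be illustrated with a small plain-Python sketch (an analogy under stated assumptions, not Spark's implementation). The helper `length_of` is hypothetical: it returns a character count for string data and a byte count for binary data, and the sample value is made up for the example.

```python
def length_of(value):
    """Return the character count for str, or the byte count for bytes."""
    if isinstance(value, str):
        return len(value)  # string data: number of characters
    if isinstance(value, (bytes, bytearray)):
        return len(value)  # binary data: number of bytes
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Hypothetical sample: 'cafe' with an accented e, plus one trailing space.
text = "caf\u00e9 "
binary = text.encode("utf-8")  # the accented character takes two bytes

char_length = length_of(text)
byte_length = length_of(binary)
print(char_length, byte_length)
```

The same value yields different lengths depending on whether it is treated as string data or binary data, so the description needs to name both cases with consistent terms.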
Test build #87091 has finished for PR 20495 at commit
Test build #87124 has finished for PR 20495 at commit
… API or functions

## What changes were proposed in this pull request?

Update the description and tests of three external API or functions `createFunction`, `length` and `repartitionByRange`

## How was this patch tested?

N/A

Author: gatorsmile <[email protected]>

Closes #20495 from gatorsmile/updateFunc.

(cherry picked from commit c36fecc)
Signed-off-by: gatorsmile <[email protected]>
Thanks! Merged to master/2.3
What changes were proposed in this pull request?
Update the description and tests of three external API or functions `createFunction`, `length` and `repartitionByRange`
How was this patch tested?
N/A