[SPARK-23327] [SQL] Update the description and tests of three external API or functions #20495

Closed
wants to merge 6 commits into from

gatorsmile
Member

@gatorsmile gatorsmile commented Feb 3, 2018

What changes were proposed in this pull request?

Update the description and tests of three external APIs or functions: createFunction, length and repartitionByRange.

How was this patch tested?

N/A

@gatorsmile
Member Author

cc @srinathshankar @rxin @cloud-fan

@SparkQA

SparkQA commented Feb 3, 2018

Test build #87026 has finished for PR 20495 at commit 9ecc809.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 3, 2018

Test build #87027 has finished for PR 20495 at commit c33cc9a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


>>> spark.createDataFrame([('ABC',)], ['a']).select(length('a').alias('length')).collect()
[Row(length=3)]
>>> spark.createDataFrame([('ABC ',)], ['a']).select(length('a').alias('length')).collect()
Member

@dongjoon-hyun dongjoon-hyun Feb 3, 2018

Actually, this PR not only updates the description but also improves the test coverage and refactors the code.
Could you update the PR description/title to reflect that?
Otherwise, we had better split this PR according to @rdblue 's recommendations in our dev mailing list.

Member Author

Done

@gatorsmile gatorsmile changed the title from "[SPARK-23327] [SQL] Update the description of three external API or functions" to "[SPARK-23327] [SQL] Update the description and tests of three external API or functions" Feb 3, 2018
Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.

@@ -1705,10 +1705,12 @@ def unhex(col):
@ignore_unicode_prefix
@since(1.5)
def length(col):
- """Calculates the length of a string or binary expression.
+ """Computes the character length of a given string or number of bytes or a binary string.
+ The length of character strings include the trailing spaces. The length of binary strings
Member

As a side note, why does it call out trailing spaces? What about leading spaces? Aren't all spaces factored into the character length?

Member

I ask because I want to understand this better to see if we should update R https://github.com/apache/spark/blob/master/R/pkg/R/functions.R#L1029

Member Author

The reason is that LEN in MS SQL Server excludes trailing blanks. : )

Yeah. This PR can also update it on the R side.
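
For readers following the trailing-spaces point, here is a minimal plain-Python sketch of the two behaviors being contrasted. It does not call Spark or SQL Server; both function names are hypothetical and exist only for illustration. One counts every character, as the new docstring describes; the other ignores trailing blanks, as described above for LEN.

```python
def length_counting_all_chars(s: str) -> int:
    # Counts every character, leading and trailing spaces included.
    # This mirrors the behavior the updated docstring calls out.
    return len(s)


def length_ignoring_trailing_blanks(s: str) -> int:
    # Hypothetical model of a LEN-style function: trailing blanks are
    # dropped before counting, leading blanks are still counted.
    return len(s.rstrip(" "))


if __name__ == "__main__":
    for value in ["ABC", "ABC ", " ABC"]:
        print(repr(value),
              length_counting_all_chars(value),
              length_ignoring_trailing_blanks(value))
```

Only the trailing-space input gives different results between the two models, which is presumably why the docstring singles out trailing rather than leading spaces.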

@@ -1705,10 +1705,12 @@ def unhex(col):
@ignore_unicode_prefix
@since(1.5)
def length(col):
- """Calculates the length of a string or binary expression.
+ """Computes the character length of a given string or number of bytes or a binary string.
Contributor

number of bytes of a binary value?
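
To make the character-count versus byte-count distinction concrete, here is a small plain-Python sketch. It is an illustration only, not Spark code: `char_length` and `byte_length` are hypothetical names, and the UTF-8 encoding is an assumption chosen for the example.

```python
def char_length(s: str) -> int:
    # Number of characters (code points) in a character string.
    return len(s)


def byte_length(b: bytes) -> int:
    # Number of bytes in a binary value; zero bytes count like any other.
    return len(b)


if __name__ == "__main__":
    text = "na\u00efve"                # 'i' with diaeresis is a single character
    encoded = text.encode("utf-8")    # assumption: UTF-8, where it takes two bytes
    print(char_length(text), byte_length(encoded))
    print(byte_length(b"\x00ab\x00"))  # binary zeros are included in the count
```

The same text can therefore have different lengths depending on whether it is measured as a character string or as binary data, which is why the wording of the docstring matters.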

@SparkQA

SparkQA commented Feb 6, 2018

Test build #87081 has finished for PR 20495 at commit 9e97db9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@cloud-fan
Contributor

LGTM

@SparkQA

SparkQA commented Feb 6, 2018

Test build #87085 has finished for PR 20495 at commit 9e97db9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

Member

@felixcheung felixcheung left a comment

LG, one comment

@@ -1655,15 +1655,17 @@ case class Left(str: Expression, len: Expression, child: Expression) extends Run
*/
// scalastyle:off line.size.limit
@ExpressionDescription(
- usage = "_FUNC_(expr) - Returns the character length of `expr` or number of bytes in binary data.",
+ usage = "_FUNC_(expr) - Returns the character length of `expr` or number of bytes in binary data. " +
Member

Why do other places use "binary string" while here we have "binary data"?

Contributor

We should be consistent, either character string vs binary string, or string data vs binary data.

Contributor

+1 for string data / binary data

@SparkQA

SparkQA commented Feb 6, 2018

Test build #87091 has finished for PR 20495 at commit 9e97db9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 6, 2018

Test build #87124 has finished for PR 20495 at commit b837053.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Feb 7, 2018
… API or functions

## What changes were proposed in this pull request?
Update the description and tests of three external API or functions `createFunction`, `length` and `repartitionByRange`

## How was this patch tested?
N/A

Author: gatorsmile <[email protected]>

Closes #20495 from gatorsmile/updateFunc.

(cherry picked from commit c36fecc)
Signed-off-by: gatorsmile <[email protected]>
@asfgit asfgit closed this in c36fecc Feb 7, 2018
@gatorsmile
Member Author

Thanks! Merged to master/2.3

robert3005 pushed a commit to palantir/spark that referenced this pull request Feb 12, 2018
… API or functions

## What changes were proposed in this pull request?
Update the description and tests of three external API or functions `createFunction`, `length` and `repartitionByRange`

## How was this patch tested?
N/A

Author: gatorsmile <[email protected]>

Closes apache#20495 from gatorsmile/updateFunc.