[SPARK-20754][SQL] Support TRUNC (number) #18106

Closed
wants to merge 19 commits into from

wangyum
Member

@wangyum wangyum commented May 25, 2017

What changes were proposed in this pull request?

Move TruncDate() from datetimeExpressions.scala to misc.scala and add support for TRUNC(number), which is similar to Oracle's TRUNC(number):

> SELECT TRUNC(1234567891.1234567891, 4);
 1234567891.1234
> SELECT TRUNC(1234567891.1234567891, -4);
 1234560000
> SELECT TRUNC(1234567891.1234567891, 0);
 1234567891
> SELECT TRUNC(1234567891.1234567891);
 1234567891
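
The truncation behavior in the examples above can be sketched in plain Python with the standard `decimal` module. This only illustrates the intended semantics and is not the Catalyst implementation; the helper name `trunc_number` and its string-based input are assumptions made for the sketch.

```python
from decimal import Decimal, ROUND_DOWN


def trunc_number(value: str, scale: int = 0) -> Decimal:
    """Truncate a decimal string toward zero at `scale` digits.

    Hypothetical helper illustrating TRUNC(number, scale): a positive scale
    keeps digits after the decimal point, a negative scale zeroes out digits
    before it, and the default scale of 0 drops the fractional part.
    """
    # 10 ** -scale, e.g. scale=4 gives 0.0001 and scale=-4 gives 1E+4
    quantum = Decimal(1).scaleb(-scale)
    # ROUND_DOWN discards the extra digits instead of rounding them
    return Decimal(value).quantize(quantum, rounding=ROUND_DOWN)


if __name__ == "__main__":
    for scale in (4, -4, 0):
        # The "f" format prints fixed-point notation rather than an exponent
        print(scale, format(trunc_number("1234567891.1234567891", scale), "f"))
```

`ROUND_DOWN` is used because truncation discards digits toward zero for both positive and negative inputs.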

The MOD and POSITION function aliases will be added in a follow-up PR.

How was this patch tested?

unit tests

@SparkQA

SparkQA commented May 25, 2017

Test build #77361 has finished for PR 18106 at commit a5ade70.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Trunc(data: Expression, format: Expression = Literal(0))

@SparkQA

SparkQA commented May 25, 2017

Test build #77370 has finished for PR 18106 at commit c63856b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 26, 2017

Test build #77394 has finished for PR 18106 at commit e7e6e5b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 26, 2017

Test build #77399 has finished for PR 18106 at commit 7157820.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

ghost pushed a commit to dbtsai/spark that referenced this pull request Jun 14, 2017
## What changes were proposed in this pull request?

apache#18106 adds support for TRUNC (number). We should also add function aliases for `MOD` and `POSITION`.

`POSITION(substr IN str)` is a synonym for `LOCATE(substr,str)`, the same as in MySQL: https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_position
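
As a rough illustration of the synonym relationship, the sketch below models `LOCATE` in plain Python under the MySQL-style semantics linked above (1-based index, 0 when the substring is absent). The Python function names `locate` and `position` are hypothetical stand-ins and not Spark APIs.

```python
def locate(substr: str, s: str) -> int:
    """Return the 1-based index of the first occurrence of substr in s, or 0."""
    # str.find returns -1 when substr is absent, so adding 1 yields 0
    return s.find(substr) + 1


def position(substr: str, s: str) -> int:
    """Model POSITION(substr IN str) as an alias that delegates to locate."""
    return locate(substr, s)


if __name__ == "__main__":
    print(position("bar", "foobarbar"))
    print(position("xbar", "foobar"))
```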

## How was this patch tested?

unit tests

Author: Yuming Wang <[email protected]>

Closes apache#18206 from wangyum/SPARK-20754-mod&position.
wangyum added 2 commits June 14, 2017 22:54
Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
	sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MiscExpressionsSuite.scala
	sql/core/src/test/resources/sql-tests/inputs/datetime.sql
	sql/core/src/test/resources/sql-tests/inputs/operators.sql
	sql/core/src/test/resources/sql-tests/results/datetime.sql.out
	sql/core/src/test/resources/sql-tests/results/operators.sql.out
@SparkQA

SparkQA commented Jun 14, 2017

Test build #78057 has finished for PR 18106 at commit 3d92a48.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Trunc(data: Expression, format: Expression)

@SparkQA

SparkQA commented Jun 15, 2017

Test build #78080 has started for PR 18106 at commit b391b6a.

@wangyum
Member Author

wangyum commented Jun 15, 2017

Jenkins, retest this please

@SparkQA

SparkQA commented Jun 15, 2017

Test build #78094 has finished for PR 18106 at commit b391b6a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member Author

wangyum commented Jun 15, 2017

cc @gatorsmile

dataknocker pushed a commit to dataknocker/spark that referenced this pull request Jun 16, 2017
@SparkQA

SparkQA commented Jun 20, 2017

Test build #78273 has started for PR 18106 at commit 5456e61.

@wangyum
Member Author

wangyum commented Jun 20, 2017

Retest this please.

@SparkQA

SparkQA commented Jun 20, 2017

Test build #78325 has started for PR 18106 at commit 5456e61.

@shaneknapp
Contributor

test this please

@SparkQA

SparkQA commented Jun 21, 2017

Test build #78344 has started for PR 18106 at commit 5456e61.

@shaneknapp
Contributor

test this please


override def inputTypes: Seq[AbstractDataType] =
  Seq(TypeCollection(DateType, DoubleType, DecimalType),
    TypeCollection(StringType, IntegerType))
Member

I think this might allow wrong input type combinations, such as (DoubleType, StringType) and (DateType, IntegerType)?

Member

If we are going to have only one trunc for truncating both numbers and datetimes, we should prevent wrong input types.
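
The concern can be sketched in plain Python: checking each argument position independently accepts the full cross product of types, while checking the pair rejects mixed combinations. The string type labels and the set of valid pairs below are assumptions made for illustration (dates take a string format, numbers take an integer scale). They are not Catalyst types or the PR's actual checks.

```python
from itertools import product

# Assumed stand-ins for the two TypeCollection arguments in inputTypes
DATA_TYPES = {"date", "double", "decimal"}
PARAM_TYPES = {"string", "integer"}

# Assumed valid combinations: dates use a string format, numbers an integer scale
VALID_PAIRS = {("date", "string"), ("double", "integer"), ("decimal", "integer")}


def accepted_per_position(data_type: str, param_type: str) -> bool:
    """Mimic a check that validates each argument on its own."""
    return data_type in DATA_TYPES and param_type in PARAM_TYPES


def accepted_as_pair(data_type: str, param_type: str) -> bool:
    """Validate the two argument types together."""
    return (data_type, param_type) in VALID_PAIRS


def wrongly_accepted() -> list:
    """List combinations the per-position check lets through by mistake."""
    pairs = product(sorted(DATA_TYPES), sorted(PARAM_TYPES))
    return [p for p in pairs
            if accepted_per_position(*p) and not accepted_as_pair(*p)]


if __name__ == "__main__":
    for pair in wrongly_accepted():
        print(pair)
```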

@viirya
Member

viirya commented Jun 21, 2017

Is there duplicated code between trunc(number) and trunc(date)? If not, it seems to me we don't necessarily need one expression to provide two different features.

@viirya
Member

viirya commented Jun 21, 2017

Although we then can't use just one trunc function, that seems OK to me, because not all databases use trunc to truncate both numbers and datetimes.

@SparkQA

SparkQA commented Jun 27, 2017

Test build #78693 has finished for PR 18106 at commit f8b1f44.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1028,20 +1028,29 @@ def to_timestamp(col, format=None):


@since(1.5)
-def trunc(date, format):
+def trunc(data, truncParam):
Member

@ueshin @holdenk re: changing param name in python.

Member

I believe this definitely breaks backward compatibility for keyword-argument usage in Python.

@felixcheung
Member

I'll add @gatorsmile since this is SQL.

@wangyum
Member Author

wangyum commented Jun 28, 2017

Jenkins, retest this please

@SparkQA

SparkQA commented Jun 28, 2017

Test build #78786 has finished for PR 18106 at commit f8b1f44.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

retest this please

@SparkQA

SparkQA commented Jul 31, 2017

Test build #80066 has finished for PR 18106 at commit f8b1f44.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

ping @wangyum

@wangyum
Member Author

wangyum commented Aug 1, 2017

I'll fix it

@SparkQA

SparkQA commented Aug 2, 2017

Test build #80150 has finished for PR 18106 at commit 3d40c36.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member Author

wangyum commented Aug 2, 2017

Jenkins, retest this please

@@ -1028,20 +1028,29 @@ def to_timestamp(col, format=None):


@since(1.5)
-def trunc(date, format):
+def trunc(data, truncParam):
Member

@wangyum, would you mind reverting this renaming? It breaks compatibility if a user script calls this with keyword arguments:

trunc(..., format= ...)
trunc(date=..., format= ...)

Contributor

We can work around this with kwargs if it's important to change the name.

Member

Yes, but it adds complexity to both args and kwargs handling, e.g., when both are set, the method signature in the docs, etc. I wonder if it is that important.
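
To make the trade-off concrete, here is a minimal sketch of what a kwargs-based workaround might look like. It is not PySpark's code: this `trunc` is a hypothetical stand-in that returns its resolved arguments instead of building a Column, and the error messages are invented for illustration.

```python
def trunc(data=None, truncParam=None, **kwargs):
    """Hypothetical shim accepting the new names and the legacy ones."""
    # Map each legacy keyword to the parameter that replaced it
    legacy = {"date": "data", "format": "truncParam"}
    values = {"data": data, "truncParam": truncParam}
    for old, new in legacy.items():
        if old in kwargs:
            # The "both set" case mentioned above has to be handled explicitly
            if values[new] is not None:
                raise TypeError(f"trunc() got both '{old}' and '{new}'")
            values[new] = kwargs.pop(old)
    if kwargs:
        raise TypeError(f"trunc() got unexpected keywords: {sorted(kwargs)}")
    if values["data"] is None or values["truncParam"] is None:
        raise TypeError("trunc() requires both a value and a truncation parameter")
    # A real implementation would build a Column here; this sketch returns the args
    return values["data"], values["truncParam"]


if __name__ == "__main__":
    print(trunc("2017-05-25", "year"))
    print(trunc(date="2017-05-25", format="year"))
    print(trunc("2017-05-25", format="year"))
```

Even in this small form, the shim needs conflict checks and a signature that no longer documents itself, which is the complexity raised in the comment above.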

@SparkQA

SparkQA commented Aug 2, 2017

Test build #80152 has finished for PR 18106 at commit 3d40c36.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 2, 2017

Test build #80159 has finished for PR 18106 at commit 931f07d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

This is close to being merged. Could you resolve the conflicts? Then I will review it. Thanks!

@felixcheung
Member

R has trunc in master/2.3 as well

# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
#	sql/core/src/test/resources/sql-tests/inputs/operators.sql
#	sql/core/src/test/resources/sql-tests/results/operators.sql.out
# Conflicts:
#	sql/core/src/test/resources/sql-tests/inputs/datetime.sql
#	sql/core/src/test/resources/sql-tests/results/datetime.sql.out
@SparkQA

SparkQA commented Oct 28, 2017

Test build #83155 has finished for PR 18106 at commit b59a2df.

  • This patch fails PySpark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 28, 2017

Test build #83158 has finished for PR 18106 at commit 679ff98.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ClearCacheCommand() extends RunnableCommand

@dongjoon-hyun
Member

@wangyum .
https://issues.apache.org/jira/browse/SPARK-20754 is resolved by you. Could you close this PR?

@wangyum wangyum closed this Sep 14, 2018
@wangyum
Member Author

wangyum commented Sep 14, 2018

@dongjoon-hyun Actually, TRUNC (number) is not resolved. I will fix it soon.
https://issues.apache.org/jira/browse/SPARK-23906

@dongjoon-hyun
Member

+100, @wangyum . Thanks. :)

@wangyum wangyum deleted the SPARK-20754-trunc branch October 8, 2019 04:25