-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-20754][SQL] Support TRUNC (number) #18106
Conversation
Test build #77361 has finished for PR 18106 at commit
|
Test build #77370 has finished for PR 18106 at commit
|
Test build #77394 has finished for PR 18106 at commit
|
Test build #77399 has finished for PR 18106 at commit
|
## What changes were proposed in this pull request? apache#18106 Support TRUNC (number), We should also add function alias for `MOD `and `POSITION`. `POSITION(substr IN str) `is a synonym for `LOCATE(substr,str)`. same as MySQL: https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_position ## How was this patch tested? unit tests Author: Yuming Wang <[email protected]> Closes apache#18206 from wangyum/SPARK-20754-mod&position.
Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MiscExpressionsSuite.scala sql/core/src/test/resources/sql-tests/inputs/datetime.sql sql/core/src/test/resources/sql-tests/inputs/operators.sql sql/core/src/test/resources/sql-tests/results/datetime.sql.out sql/core/src/test/resources/sql-tests/results/operators.sql.out
Test build #78057 has finished for PR 18106 at commit
|
Test build #78080 has started for PR 18106 at commit |
Jenkins, retest this please |
Test build #78094 has finished for PR 18106 at commit
|
cc @gatorsmile |
## What changes were proposed in this pull request? apache#18106 Support TRUNC (number), We should also add function alias for `MOD `and `POSITION`. `POSITION(substr IN str) `is a synonym for `LOCATE(substr,str)`. same as MySQL: https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_position ## How was this patch tested? unit tests Author: Yuming Wang <[email protected]> Closes apache#18206 from wangyum/SPARK-20754-mod&position.
Test build #78273 has started for PR 18106 at commit |
Retest this please. |
Test build #78325 has started for PR 18106 at commit |
test this please |
Test build #78344 has started for PR 18106 at commit |
test this please |
|
||
override def inputTypes: Seq[AbstractDataType] = | ||
Seq(TypeCollection(DateType, DoubleType, DecimalType), | ||
TypeCollection(StringType, IntegerType)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might lead to wrong input types combinations such as (DoubleType, StringType) and (DateType, IntegerType)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are going to have only trunc
for truncating number and datetime. We should prevent wrong input types.
Is there duplicated codes between trunc(number) and trunc(date)? If no, seems to me we don't necessarily let one expression to have two different features. |
Although then we can't use just one |
Test build #78693 has finished for PR 18106 at commit
|
python/pyspark/sql/functions.py
Outdated
@@ -1028,20 +1028,29 @@ def to_timestamp(col, format=None): | |||
|
|||
|
|||
@since(1.5) | |||
def trunc(date, format): | |||
def trunc(data, truncParam): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this definitely breaks backward compatibility for keyword-argument usage in Python.
I'll add @gatorsmile since this is SQL. |
Jenkins, retest this please |
Test build #78786 has finished for PR 18106 at commit
|
retest this please |
Test build #80066 has finished for PR 18106 at commit
|
ping @wangyum |
I'll fix it |
Test build #80150 has finished for PR 18106 at commit
|
Jenkins, retest this please |
python/pyspark/sql/functions.py
Outdated
@@ -1028,20 +1028,29 @@ def to_timestamp(col, format=None): | |||
|
|||
|
|||
@since(1.5) | |||
def trunc(date, format): | |||
def trunc(data, truncParam): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wangyum, would you mind revert this renaming? This breaks the compatibility if user script calls this by
trunc(..., format= ...)
trunc(date=..., format= ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can work around this with kwargs if it's important to change the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but it brings complexity for both args and kwargs e.g., when both set, method signature in doc and etc. I wonder if it is that important.
Test build #80152 has finished for PR 18106 at commit
|
Test build #80159 has finished for PR 18106 at commit
|
This is close to merge. Could you resolve the conflicts? Then, I will review it. Thanks! |
R has |
# Conflicts: # sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala # sql/core/src/test/resources/sql-tests/inputs/operators.sql # sql/core/src/test/resources/sql-tests/results/operators.sql.out
# Conflicts: # sql/core/src/test/resources/sql-tests/inputs/datetime.sql # sql/core/src/test/resources/sql-tests/results/datetime.sql.out
Test build #83155 has finished for PR 18106 at commit
|
Test build #83158 has finished for PR 18106 at commit
|
@wangyum . |
@dongjoon-hyun Actually |
+100, @wangyum . Thanks. :) |
What changes were proposed in this pull request?
Move
TruncDate()
fromdatetimeExpressions.scala
tomisc.scala
, and add supportTRUNC(number)
, it's similar to Oracle TRUNC(number):The
MOD
andPOSITION
function alias will be added by follow-up PR.How was this patch tested?
unit tests