[SPARK-23906][SQL] Add built-in UDF TRUNCATE(number) #22419

wangyum · 2018-09-14T08:30:15Z

What changes were proposed in this pull request?

Add UDF TRUNCATE(number):

> SELECT TRUNCATE(1234567891.1234567891, 4);
 1234567891.1234
> SELECT TRUNCATE(1234567891.1234567891, -4);
 1234560000
> SELECT TRUNCATE(1234567891.1234567891, 0);
 1234567891
> SELECT TRUNCATE(1234567891.1234567891);
 1234567891

It's similar to MySQL's TRUNCATE(X, D).
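For readers who want to check the four results above without a Spark build, here is a minimal sketch of the same truncation semantics using plain java.math.BigDecimal. It is not the PR's implementation: the object name `TruncateSketch`, the String-based signature, and the default `scale = 0` are assumptions made only for illustration.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

object TruncateSketch {
  // Hypothetical helper, not code from this PR.
  // Drops every digit beyond `scale` without rounding (RoundingMode.DOWN).
  // A negative scale zeroes digits to the left of the decimal point.
  // Omitting `scale` behaves like scale = 0, mirroring TRUNCATE(number).
  def truncate(number: String, scale: Int = 0): String =
    new JBigDecimal(number).setScale(scale, RoundingMode.DOWN).toPlainString

  def main(args: Array[String]): Unit = {
    val x = "1234567891.1234567891"
    println(truncate(x, 4))   // 1234567891.1234
    println(truncate(x, -4))  // 1234560000
    println(truncate(x, 0))   // 1234567891
    println(truncate(x))      // 1234567891
  }
}
```

toPlainString is used so that a negative scale prints as 1234560000 rather than in scientific notation.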

How was this patch tested?

unit tests

SparkQA · 2018-09-14T08:38:02Z

Test build #96067 has finished for PR 22419 at commit b5365e2.

This patch fails to build.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class Truncate(number: Expression, scale: Expression)

SparkQA · 2018-09-14T12:59:42Z

Test build #96068 has finished for PR 22419 at commit bf7103a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2018-09-18T11:58:55Z

Btw, in the title, shouldn't it be "built-in" rather than "UDF"?

maropu · 2018-09-18T12:21:55Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala

+  /**
+   * Returns double type input truncated to scale decimal places.
+   */
+  def trunc(input: Double, scale: Int): Double = {


Why do you put this function in a separate file? Any plan to reuse this?
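The body of MathUtils.trunc is not shown in the hunk above. As a rough idea of what a Double overload with that signature could do, here is a hypothetical stand-in built on scala.math.BigDecimal. The object name `DoubleTruncSketch` and the NaN/infinity handling are assumptions, not the PR's code.

```scala
object DoubleTruncSketch {
  // Hypothetical stand-in for MathUtils.trunc(input: Double, scale: Int).
  // NaN and infinities have no decimal expansion, so they are returned as-is;
  // BigDecimal(input) would throw NumberFormatException for them.
  def trunc(input: Double, scale: Int): Double =
    if (input.isNaN || input.isInfinity) input
    else BigDecimal(input).setScale(scale, BigDecimal.RoundingMode.DOWN).toDouble

  def main(args: Array[String]): Unit = {
    println(trunc(2.789, 2))    // 2.78
    println(trunc(-2.789, 2))   // -2.78 (truncation goes towards zero)
    println(trunc(1234.5, -2))  // 1200.0
  }
}
```

Going through BigDecimal rather than multiplying by a power of ten avoids results that depend on binary floating-point error in the scaled intermediate value.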

maropu · 2018-09-18T12:23:40Z

I just linked to the previous discussion: #18106

ueshin · 2018-09-18T17:28:21Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala

+  override def right: Expression = scale
+
+  override def inputTypes: Seq[AbstractDataType] =
+    Seq(TypeCollection(DoubleType, DecimalType), IntegerType)


Don't we need to support FloatType?

ueshin · 2018-09-18T17:32:15Z

...catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala

+
+    checkEvaluation(Truncate(Literal.create(1D, DoubleType),
+      NonFoldableLiteral.create(null, IntegerType)),
+      null)


Why only NonFoldableLiteral? What if scale is Literal.create(null, IntegerType)?

ueshin · 2018-09-18T17:32:39Z

...catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala

+      null)
+    checkEvaluation(Truncate(Literal.create(null, DoubleType),
+      NonFoldableLiteral.create(null, IntegerType)),
+      null)


Could you add tests for DecimalType, and for FloatType if we need to support it?

ueshin · 2018-09-18T17:48:57Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala

+    number.dataType match {
+      case DoubleType => MathUtils.trunc(input1.asInstanceOf[Double], truncScale)
+      case DecimalType.Fixed(_, _) =>
+        MathUtils.trunc(input1.asInstanceOf[Decimal].toJavaBigDecimal, truncScale)


I guess we have to return Decimal instead of java.math.BigDecimal?

ueshin · 2018-09-18T17:57:26Z

sql/core/src/test/resources/sql-tests/inputs/operators.sql

+select truncate(1234567891.1234567891, -4), truncate(1234567891.1234567891, 0), truncate(1234567891.1234567891, 4);
+select truncate(cast(1234567891.1234567891 as decimal), -4), truncate(cast(1234567891.1234567891 as decimal), 0), truncate(cast(1234567891.1234567891 as decimal), 4);
+select truncate(cast(1234567891.1234567891 as long), -4), truncate(cast(1234567891.1234567891 as long), 0), truncate(cast(1234567891.1234567891 as long), 4);
+select truncate(cast(1234567891.1234567891 as long), 9.03)


Could you add a test omitting scale?

ueshin · 2018-09-18T18:00:34Z

sql/core/src/main/scala/org/apache/spark/sql/functions.scala

+   */
+  def truncate(number: Column, scale: Int): Column = withExpr {
+    Truncate(number.expr, Literal(scale))
+  }


Don't we need def truncate(number: Column) to support omitting scale?

wangyum · 2018-09-19T10:23:34Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala

+  override def checkInputDataTypes(): TypeCheckResult = {
+    super.checkInputDataTypes() match {
+      case TypeCheckSuccess =>
+        if (scale.foldable) {


Same as RoundBase: only a foldable scale is supported:

spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala

Line 1076 in c715694

if (scale.foldable) {

SparkQA · 2018-09-19T14:20:45Z

Test build #96242 has finished for PR 22419 at commit c715694.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

ueshin · 2018-09-21T08:22:49Z

On second thoughts, I'm wondering whether we can reuse RoundBase?
I mean:

case class Truncate(child: Expression, scale: Expression)
  extends RoundBase(child, scale, BigDecimal.RoundingMode.DOWN, "ROUND_DOWN")
    with Serializable with ImplicitCastInputTypes {
  def this(child: Expression) = this(child, Literal(0))
}

If we want to round negative values towards negative infinity instead of towards zero, we should use RoundingMode.FLOOR instead of DOWN, though.

Btw, could you add test cases for negative value child as well?
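To make the DOWN versus FLOOR distinction above concrete, here is a minimal sketch using only scala.math.BigDecimal. It does not touch Spark's RoundBase or Decimal; the object and method names are hypothetical.

```scala
import scala.math.BigDecimal.RoundingMode

object RoundingModeSketch {
  // DOWN truncates towards zero.
  def down(x: BigDecimal, scale: Int): BigDecimal =
    x.setScale(scale, RoundingMode.DOWN)

  // FLOOR rounds towards negative infinity.
  def floor(x: BigDecimal, scale: Int): BigDecimal =
    x.setScale(scale, RoundingMode.FLOOR)

  def main(args: Array[String]): Unit = {
    println(down(BigDecimal("2.57"), 1))    // 2.5
    println(floor(BigDecimal("2.57"), 1))   // 2.5
    println(down(BigDecimal("-2.57"), 1))   // -2.5
    println(floor(BigDecimal("-2.57"), 1))  // -2.6
  }
}
```

The two modes agree on non-negative inputs and differ only on negative ones, which is why test cases with a negative child are needed to pin down the intended behaviour.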


wangyum · 2018-09-21T17:11:28Z

@ueshin Thanks a lot!

SparkQA · 2018-09-21T20:32:07Z

Test build #96448 has finished for PR 22419 at commit 479b31f.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

wangyum · 2018-09-21T23:17:35Z

retest this please

wangyum · 2018-09-21T23:47:08Z

sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala

@@ -413,6 +413,7 @@ object Decimal {
  val ROUND_HALF_EVEN = BigDecimal.RoundingMode.HALF_EVEN
  val ROUND_CEILING = BigDecimal.RoundingMode.CEILING
  val ROUND_FLOOR = BigDecimal.RoundingMode.FLOOR
+  val ROUND_DOWN = BigDecimal.RoundingMode.DOWN


Need this change, otherwise:

Caused by: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, Column 138: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, Column 138: A method named "ROUND_DOWN" is not declared in any enclosing class nor any supertype, nor through a static import

I guess we need to modify Decimal.changePrecision() as well? Could you add it to DecimalSuite.scala#L207 to check?

I added it, and it seems we do not need to change longVal. The test case passes.

ueshin

What about this?

Btw, could you add test cases for negative value child as well?

ueshin · 2018-09-22T00:41:50Z

sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala

@@ -413,6 +413,7 @@ object Decimal {
  val ROUND_HALF_EVEN = BigDecimal.RoundingMode.HALF_EVEN
  val ROUND_CEILING = BigDecimal.RoundingMode.CEILING
  val ROUND_FLOOR = BigDecimal.RoundingMode.FLOOR
+  val ROUND_DOWN = BigDecimal.RoundingMode.DOWN


I guess we need to modify Decimal.changePrecision() as well? Could you add it to DecimalSuite.scala#L207 to check?

SparkQA · 2018-09-22T03:19:00Z

Test build #96460 has finished for PR 22419 at commit 479b31f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-09-22T07:05:01Z

Test build #96466 has finished for PR 22419 at commit b7e3460.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

wangyum · 2018-09-22T07:36:26Z

retest this please

SparkQA · 2018-09-22T11:42:14Z

Test build #96469 has finished for PR 22419 at commit b7e3460.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2018-09-24T14:30:46Z

sql/core/src/main/scala/org/apache/spark/sql/functions.scala

+   * Returns the value of the column `e` truncated to 0 places.
+   *
+   * @group math_funcs
+   * @since 2.4.0


@since 2.5.0 now.

maropu · 2018-09-24T14:30:52Z

sql/core/src/main/scala/org/apache/spark/sql/functions.scala

+   * Scale can be negative to truncate (make zero) scale digits left of the decimal point.
+   *
+   * @group math_funcs
+   * @since 2.4.0


maropu · 2018-09-24T14:37:28Z

#22419 (comment)
This approach looks good to me because it makes the implementation simpler. One thing I worry about, though: is truncating really a kind of rounding, i.e. is it OK for Truncate to extend RoundBase? This might just be a naming issue with RoundBase. cc: @gatorsmile

SparkQA · 2018-09-26T03:52:47Z

Test build #96587 has finished for PR 22419 at commit ae7eb73.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

wangyum · 2018-09-26T03:55:00Z

retest this please

SparkQA · 2018-09-26T05:50:11Z

Test build #96596 has finished for PR 22419 at commit ae7eb73.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

wangyum · 2018-09-26T11:09:43Z

retest this please

SparkQA · 2018-09-26T15:20:58Z

Test build #96628 has finished for PR 22419 at commit ae7eb73.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2018-09-27T15:36:21Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala

+       1234567891
+  """)
+// scalastyle:on line.size.limit
+case class Truncate(child: Expression, scale: Expression)


I still prefer extending trunc. The difference between truncate and trunc is not obvious.

Hi @wangyum, can you still extend trunc?
If not, what were the major reasons you decided to separate these? Thanks!

We can't tell whether a StringType argument should be truncated as a number or as a date.
For example:

SELECT trunc('2.5'); SELECT trunc('2009-02-12');

In that case, it's OK to handle the string as a date. How about only accepting float, double, and decimal for number truncation?

SparkQA · 2018-10-22T07:03:48Z

Test build #97730 has finished for PR 22419 at commit ae7eb73.

This patch fails to generate documentation.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-22T07:05:03Z

Test build #97709 has finished for PR 22419 at commit ae7eb73.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-22T07:05:05Z

Test build #97727 has finished for PR 22419 at commit ae7eb73.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-22T14:25:29Z

Test build #97797 has finished for PR 22419 at commit ae7eb73.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

Support truncate number (b5365e2)

Add MathUtils (bf7103a)

maropu reviewed Sep 18, 2018

wangyum changed the title ~~[SPARK-23906][SQL] Add UDF TRUNCATE(number)~~ [SPARK-23906][SQL] Add built-in UDF TRUNCATE(number) Sep 18, 2018

ueshin reviewed Sep 18, 2018

Add float type. (c715694)

wangyum commented Sep 19, 2018

wangyum added 2 commits September 22, 2018 00:56

Merge remote-tracking branch 'upstream/master' into SPARK-23906 (87cea0b)
# Conflicts: sql/core/src/test/resources/sql-tests/results/operators.sql.out

Implements by BigDecimal.RoundingMode.DOWN (479b31f)

wangyum commented Sep 21, 2018

ueshin reviewed Sep 22, 2018

Add ROUND_DOWN to DecimalSuite (b7e3460)

maropu reviewed Sep 24, 2018

@since 2.4.0 -> @since 2.5.0 (ae7eb73)

gatorsmile reviewed Sep 27, 2018

wangyum closed this Nov 12, 2018
