Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-23906][SQL] Add built-in UDF TRUNCATE(number) #22419

Closed
wants to merge 7 commits into from
Closed

[SPARK-23906][SQL] Add built-in UDF TRUNCATE(number) #22419

wants to merge 7 commits into from

Conversation

wangyum
Copy link
Member

@wangyum wangyum commented Sep 14, 2018

What changes were proposed in this pull request?

Add UDF TRUNCATE(number):

> SELECT TRUNCATE(1234567891.1234567891, 4);
 1234567891.1234
> SELECT TRUNCATE(1234567891.1234567891, -4);
 1234560000
> SELECT TRUNCATE(1234567891.1234567891, 0);
 1234567891
> SELECT TRUNCATE(1234567891.1234567891);
 1234567891

It's similar to MySQL TRUNCATE(X, D)

How was this patch tested?

unit tests

@SparkQA
Copy link

SparkQA commented Sep 14, 2018

Test build #96067 has finished for PR 22419 at commit b5365e2.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Truncate(number: Expression, scale: Expression)

@SparkQA
Copy link

SparkQA commented Sep 14, 2018

Test build #96068 has finished for PR 22419 at commit bf7103a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Sep 18, 2018

btw, in the title, not UDF but built-in?

/**
* Returns double type input truncated to scale decimal places.
*/
def trunc(input: Double, scale: Int): Double = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you put this function in a separate file? Any plan to reuse this?

@maropu
Copy link
Member

maropu commented Sep 18, 2018

I just linked to the previous discussion: #18106

@wangyum wangyum changed the title [SPARK-23906][SQL] Add UDF TRUNCATE(number) [SPARK-23906][SQL] Add built-in UDF TRUNCATE(number) Sep 18, 2018
override def right: Expression = scale

override def inputTypes: Seq[AbstractDataType] =
Seq(TypeCollection(DoubleType, DecimalType), IntegerType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to support FloatType?


checkEvaluation(Truncate(Literal.create(1D, DoubleType),
NonFoldableLiteral.create(null, IntegerType)),
null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only NonFoldableLiteral? What if Literal.create(null, IntegerType) for scale?

null)
checkEvaluation(Truncate(Literal.create(null, DoubleType),
NonFoldableLiteral.create(null, IntegerType)),
null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add tests for DecimalType, and FloatType if we need to support?

number.dataType match {
case DoubleType => MathUtils.trunc(input1.asInstanceOf[Double], truncScale)
case DecimalType.Fixed(_, _) =>
MathUtils.trunc(input1.asInstanceOf[Decimal].toJavaBigDecimal, truncScale)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we have to return Decimal instead of java.math.BigDecimal?

select truncate(1234567891.1234567891, -4), truncate(1234567891.1234567891, 0), truncate(1234567891.1234567891, 4);
select truncate(cast(1234567891.1234567891 as decimal), -4), truncate(cast(1234567891.1234567891 as decimal), 0), truncate(cast(1234567891.1234567891 as decimal), 4);
select truncate(cast(1234567891.1234567891 as long), -4), truncate(cast(1234567891.1234567891 as long), 0), truncate(cast(1234567891.1234567891 as long), 4);
select truncate(cast(1234567891.1234567891 as long), 9.03)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test omitting scale?

*/
def truncate(number: Column, scale: Int): Column = withExpr {
Truncate(number.expr, Literal(scale))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need def truncate(number: Column) to support omitting scale?

override def checkInputDataTypes(): TypeCheckResult = {
super.checkInputDataTypes() match {
case TypeCheckSuccess =>
if (scale.foldable) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same to RoundBase. only support foldable:

@SparkQA
Copy link

SparkQA commented Sep 19, 2018

Test build #96242 has finished for PR 22419 at commit c715694.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin
Copy link
Member

ueshin commented Sep 21, 2018

On second thoughts, I'm wondering whether we can reuse RoundBase?
I mean:

case class Truncate(child: Expression, scale: Expression)
  extends RoundBase(child, scale, BigDecimal.RoundingMode.DOWN, "ROUND_DOWN")
    with Serializable with ImplicitCastInputTypes {
  def this(child: Expression) = this(child, Literal(0))
}

If we want to round negative values towards negative infinity instead of towards zero, we should use RoundingMode.FLOOR instead of DOWN, thought.

Btw, could you add test cases for negative value child as well?

# Conflicts:
#	sql/core/src/test/resources/sql-tests/results/operators.sql.out
@wangyum
Copy link
Member Author

wangyum commented Sep 21, 2018

@ueshin Thanks a lot!

@SparkQA
Copy link

SparkQA commented Sep 21, 2018

Test build #96448 has finished for PR 22419 at commit 479b31f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Copy link
Member Author

wangyum commented Sep 21, 2018

retest this please

@@ -413,6 +413,7 @@ object Decimal {
val ROUND_HALF_EVEN = BigDecimal.RoundingMode.HALF_EVEN
val ROUND_CEILING = BigDecimal.RoundingMode.CEILING
val ROUND_FLOOR = BigDecimal.RoundingMode.FLOOR
val ROUND_DOWN = BigDecimal.RoundingMode.DOWN
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need this change, otherwise:

Caused by: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, Column 138: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, Column 138: A method named "ROUND_DOWN" is not declared in any enclosing class nor any supertype, nor through a static import

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we need to modify Decimal.changePrecision() as well? Could you add it to DecimalSuite.scala#L207 to check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it and seem do not need change longVal. This test case can passed.

Copy link
Member

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this?

Btw, could you add test cases for negative value child as well?

@@ -413,6 +413,7 @@ object Decimal {
val ROUND_HALF_EVEN = BigDecimal.RoundingMode.HALF_EVEN
val ROUND_CEILING = BigDecimal.RoundingMode.CEILING
val ROUND_FLOOR = BigDecimal.RoundingMode.FLOOR
val ROUND_DOWN = BigDecimal.RoundingMode.DOWN
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we need to modify Decimal.changePrecision() as well? Could you add it to DecimalSuite.scala#L207 to check?

@SparkQA
Copy link

SparkQA commented Sep 22, 2018

Test build #96460 has finished for PR 22419 at commit 479b31f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 22, 2018

Test build #96466 has finished for PR 22419 at commit b7e3460.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Copy link
Member Author

wangyum commented Sep 22, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Sep 22, 2018

Test build #96469 has finished for PR 22419 at commit b7e3460.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* Returns the value of the column `e` truncated to 0 places.
*
* @group math_funcs
* @since 2.4.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@since 2.5.0 now.

* Scale can be negative to truncate (make zero) scale digits left of the decimal point.
*
* @group math_funcs
* @since 2.4.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@maropu
Copy link
Member

maropu commented Sep 24, 2018

#22419 (comment)
This approach looks good to me cuz it makes the implementation simpler. But, there is one thing I worry about; truncating is a kind of rounding (Is it ok to extend RoundBase for Truncate)? This might be just a naming issue of RoundBase though. cc: @gatorsmile

@SparkQA
Copy link

SparkQA commented Sep 26, 2018

Test build #96587 has finished for PR 22419 at commit ae7eb73.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Copy link
Member Author

wangyum commented Sep 26, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Sep 26, 2018

Test build #96596 has finished for PR 22419 at commit ae7eb73.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Copy link
Member Author

wangyum commented Sep 26, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Sep 26, 2018

Test build #96628 has finished for PR 22419 at commit ae7eb73.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

1234567891
""")
// scalastyle:on line.size.limit
case class Truncate(child: Expression, scale: Expression)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still preferring to extend trunc. Not straightforward to know the difference between truncate and trunc

Copy link
Member

@ueshin ueshin Oct 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @wangyum, can you still extend trunc?
If not, what were the major reasons you decided to separate these? Thanks!

Copy link
Member Author

@wangyum wangyum Oct 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know we should trunc StringType to number or trunc to date.
For example:

SELECT trunc('2.5');
SELECT trunc('2009-02-12');

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, its ok to handle the string as date. How about only accepting float, double, and decimal for number truncation?

@SparkQA
Copy link

SparkQA commented Oct 22, 2018

Test build #97730 has finished for PR 22419 at commit ae7eb73.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 22, 2018

Test build #97709 has finished for PR 22419 at commit ae7eb73.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 22, 2018

Test build #97727 has finished for PR 22419 at commit ae7eb73.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 22, 2018

Test build #97797 has finished for PR 22419 at commit ae7eb73.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum wangyum closed this Nov 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants