Supports casting between ANSI interval types and integral types #5353
Conversation
Signed-off-by: Chong Gao <[email protected]>
build
Force-pushed from f570c04 to 98a9e29
Depends on #5352. The Spark change is that the CPU throws an exception for this scenario; the GPU does not need to repeat the check, because the CPU already performs it during the analysis phase.
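For context, a hedged illustration (not taken from this PR) of why the GPU can skip the check: Spark's analyzer rejects disallowed cast combinations before any physical plan is built, so the GPU cast only ever sees combinations the analyzer has admitted. The query and session below are assumptions for illustration:

```scala
// Assumed spark-shell session. Casting an ANSI interval to BOOLEAN is not a
// permitted cast, so this fails with an AnalysisException during analysis,
// before either the CPU or the GPU physical cast ever executes.
spark.sql("SELECT CAST(INTERVAL '1' DAY AS BOOLEAN)").collect()
```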
build
```diff
@@ -561,6 +561,39 @@ object GpuCast extends Arm {
       GpuIntervalUtils.castStringToDayTimeIntervalWithThrow(
         input.asInstanceOf[ColumnVector], dayTime)

     // cast(`day time interval` as integral)
     case (dt: DataType, _: LongType) if GpuTypeShims.isSupportedDayTimeType(dt) =>
       GpuIntervalUtils.dayTimeIntervalToLong(input.asInstanceOf[ColumnVector], dt)
```
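For context: a day-time interval is physically a 64-bit count of microseconds, so converting it to an integral type amounts to dividing by the number of microseconds in the interval's end-field unit. A minimal scalar sketch of that idea (a hypothetical helper, not the plugin's columnar implementation):

```scala
object IntervalCastSketch {
  private val MicrosPerSecond = 1000L * 1000
  private val MicrosPerDay = 24L * 60 * 60 * MicrosPerSecond

  // An INTERVAL DAY value is stored as microseconds; casting it to LONG
  // yields the number of complete days. This is a scalar analogue of what
  // a columnar dayTimeIntervalToLong would compute per row.
  def dayIntervalToLong(micros: Long): Long = micros / MicrosPerDay
}
```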
But `input` is not guaranteed to be a ColumnVector for nested types, like casting an array of DayTimeInterval to an array of longs. These need to be ColumnView, but you should be able to treat it exactly the same as a ColumnVector.
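A minimal sketch of what the adjusted arm could look like under that suggestion, assuming GpuIntervalUtils accepts a ColumnView (the names follow the diff above):

```scala
// cast(`day time interval` as integral), nested-safe: ColumnView covers both
// top-level vectors and views into nested children (e.g. ARRAY<INTERVAL DAY>),
// while still treating the data exactly like a ColumnVector.
case (dt: DataType, _: LongType) if GpuTypeShims.isSupportedDayTimeType(dt) =>
  GpuIntervalUtils.dayTimeIntervalToLong(input.asInstanceOf[ColumnView], dt)
```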
Done
sql-plugin/src/main/330+/scala/com/nvidia/spark/rapids/shims/GpuTypeShims.scala
```python
@pytest.mark.parametrize('integral_type', integral_types)
def test_cast_day_time_interval_to_integral_no_overflow(integral_type):
    assert_gpu_and_cpu_are_equal_collect(
        lambda spark: unary_op_df(spark, DayTimeIntervalGen(start_field='day', end_field='day', min_value=timedelta(seconds=-128 * 86400), max_value=timedelta(seconds=127 * 86400)))
```
nit: If we want to keep these as separate queries, then let's have a separate test for each one so we can parallelize the execution and one test failure does not keep another test from running. If we are okay with them being a single test where one failure can mask another, as it is here, then can we combine them into a single query so it runs faster?
Done, combined into a single query to run faster.
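Illustratively, the single-query shape might look like this in Spark terms (a Scala sketch, not the actual pytest change; `df` is assumed to be a DataFrame whose interval column is named `a`, matching what `unary_op_df` produces):

```scala
// One projection casts the interval column to every integral type at once,
// so a single collect exercises all four cases instead of four queries.
df.selectExpr(
  "CAST(a AS BYTE)", "CAST(a AS SHORT)", "CAST(a AS INT)", "CAST(a AS LONG)")
```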
build
No need to update.
@revans2 Please help review, thanks.
Contributes #5113
Closes #5111
Supports: casting between ANSI interval types and integral types
Signed-off-by: Chong Gao <[email protected]>