
[BUG] test_floor_scale_zero fails with DATAGEN_SEED=1700009407 #9722

Closed
abellina opened this issue Nov 15, 2023 · 3 comments · Fixed by #9874
Labels: bug
abellina commented Nov 15, 2023

FAILED ../../../../integration_tests/src/main/python/arithmetic_ops_test.py::test_floor_scale_zero[Double][DATAGEN_SEED=1700009407, INJECT_OOM]

Repro:

SPARK_RAPIDS_TEST_DATAGEN_SEED=1700009407 ./run_pyspark_from_build.sh -k test_floor_scale_zero
----------------------------- Captured stdout call -----------------------------
### CPU RUN ###
### GPU RUN ###
### COLLECT: GPU TOOK 0.10657858848571777 CPU TOOK 0.14875459671020508 ###
--- CPU OUTPUT
+++ GPU OUTPUT
@@ -1944,7 +1944,7 @@
 Row(floor(a, 0)=Decimal('0'))
 Row(floor(a, 0)=None)
 Row(floor(a, 0)=Decimal('0'))
-Row(floor(a, 0)=Decimal('-940459146980391'))
+Row(floor(a, 0)=Decimal('-940459146980392'))
 Row(floor(a, 0)=None)
 Row(floor(a, 0)=None)
 Row(floor(a, 0)=None)
@abellina abellina added bug Something isn't working ? - Needs Triage Need team to review and classify labels Nov 15, 2023
@revans2 revans2 removed the ? - Needs Triage Need team to review and classify label Nov 15, 2023
@jlowe jlowe changed the title [BUG] test_floor_scale_zero fails with DATAGEN_SEED=1700004494 [BUG] test_floor_scale_zero fails with DATAGEN_SEED=1700009407 Nov 28, 2023
@jlowe jlowe self-assigned this Nov 28, 2023
jlowe commented Nov 28, 2023

test_floor is working with the same datagen and seed, so this is ultimately a difference in floor(a) vs floor(a, 0). Here's a snippet from a Spark 3.3.2 shell session showing the issue:

scala> val df2 = sql("select a, floor(a) as f, floor(a, 0) as fa from df")
scala> df2.filter("abs(a) < 940459146980400 and abs(a) > 940459146980000").show(truncate=false)
23/11/28 10:25:12 WARN GpuOverrides: 
!Exec <CollectLimitExec> cannot run on GPU because the Exec CollectLimitExec has been disabled, and is disabled by default because Collect Limit replacement can be slower on the GPU, if huge number of rows in a batch it could help by limiting the number of rows transferred from GPU to CPU. Set spark.rapids.sql.exec.CollectLimitExec to true if you wish to enable it
  @Partitioning <SinglePartition$> could run on GPU

+--------------+----------------+----------------+
|a             |f               |fa              |
+--------------+----------------+----------------+
|-9.40459147E14|-940459146980391|-940459146980392|
+--------------+----------------+----------------+

jlowe commented Nov 28, 2023

The difference is tied to the output types and operations involved. floor(a) produces a long, whereas floor(a, 0) produces a decimal(16,0). The latter is computed first by casting a as a decimal(30,15) and then computing floor at scale=0 on that. What's interesting is that we start to diverge from Spark CPU on the cast to decimal(30,15):

scala> spark.conf.set("spark.rapids.sql.enabled", "false")

scala> Seq(-9.40459146980391E14).toDF("a").repartition(1).createOrReplaceTempView("df")

scala> sql("select a, cast(a as decimal(30,15)) as c, floor(a) as f, floor(a, 0) as fa from df").collect
res22: Array[org.apache.spark.sql.Row] = Array([-9.40459146980391E14,-940459146980391.000000000000000,-940459146980391,-940459146980391])

scala> sql("select a, cast(a as decimal(30,15)) as c, floor(a) as f, floor(a, 0) as fa from df").printSchema
root
 |-- a: double (nullable = false)
 |-- c: decimal(30,15) (nullable = true)
 |-- f: long (nullable = true)
 |-- fa: decimal(16,0) (nullable = true)

scala> spark.conf.set("spark.rapids.sql.enabled", "true")

scala> Seq(-9.40459146980391E14).toDF("a").repartition(1).createOrReplaceTempView("df")

scala> sql("select a, cast(a as decimal(30,15)) as c, floor(a) as f, floor(a, 0) as fa from df").collect
23/11/28 11:07:33 WARN GpuOverrides: 
  !Exec <ShuffleExchangeExec> cannot run on GPU because Columnar exchange without columnar children is inefficient
    @Partitioning <SinglePartition$> could run on GPU
    ! <LocalTableScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.LocalTableScanExec
      @Expression <AttributeReference> a#102 could run on GPU

(the same GpuOverrides warning is repeated three more times)

res26: Array[org.apache.spark.sql.Row] = Array([-9.40459146980391E14,-940459146980391.021204383727616,-940459146980391,-940459146980392])
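One plausible source of the divergence (an illustrative sketch, not necessarily how the plugin actually performs the cast) is scaling the double by 10^15 in floating point before extracting the decimal's unscaled integer. The double -9.40459146980391E14 is itself exact, but the scaled product needs about 100 bits of mantissa, so rounding to a 53-bit double corrupts the low digits:

```python
from decimal import Decimal

x = -9.40459146980391e14

# The double itself is exact: |x| < 2**53, so no precision is lost yet.
assert Decimal(x) == Decimal(-940459146980391)

# A naive cast to decimal(30,15) might scale by 10**15 in double precision
# before taking the unscaled integer. The exact product does not fit in a
# 53-bit mantissa, so the double rounds it and the low digits become garbage.
scaled = x * 1e15
print(int(scaled))
assert int(scaled) != -940459146980391 * 10 ** 15
```

This matches the shape of the GPU result above, where the decimal(30,15) cast picks up nonzero digits (.021204383727616) that the CPU cast does not.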

jlowe commented Nov 28, 2023

So this ends up being solely about the known imprecision when casting floating point values to decimals. The test has to explicitly set spark.rapids.sql.castFloatToDecimal.enabled=true to keep everything on the GPU, and that leads to the difference of -940459146980391.021204383727616 vs. -940459146980391.000000000000000 when casting the double to a decimal(30,15). Performing a floor on those values results in a difference, because floor moves to the next integral value towards negative infinity.

If the GPU matched the CPU when casting floating point to decimals, the test would pass as written.
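The floor-at-scale-0 step can be reproduced with Python's decimal module, using the CPU and GPU cast results from the session above (ROUND_FLOOR rounds toward negative infinity, matching Spark's floor semantics):

```python
from decimal import Decimal, ROUND_FLOOR

# Cast results observed above: CPU vs GPU cast of the double to decimal(30,15)
cpu_cast = Decimal('-940459146980391.000000000000000')
gpu_cast = Decimal('-940459146980391.021204383727616')

# floor(a, 0): round to scale 0, toward negative infinity
cpu_floor = cpu_cast.to_integral_value(rounding=ROUND_FLOOR)
gpu_floor = gpu_cast.to_integral_value(rounding=ROUND_FLOOR)

print(cpu_floor)  # -940459146980391
print(gpu_floor)  # -940459146980392
```

Because the GPU cast lands just below the integral value, the floor steps one further toward negative infinity, producing the off-by-one result in the test diff.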

@jlowe jlowe linked a pull request Nov 28, 2023 that will close this issue
@jlowe jlowe closed this as completed Nov 29, 2023