
[BUG] Max aggregate returns invalid results for a float32 column with nans #4754

Closed · razajafri opened this issue Mar 31, 2020 · 4 comments
Labels: bug (Something isn't working), Spark (Functionality that helps Spark RAPIDS)

@razajafri (Contributor)
Describe the bug
Running a max aggregate on a table containing NaN values returns an invalid result. I haven't gotten around to writing a unit test for this yet, but I can do so if required.

Steps/Code to reproduce bug
Create the following table

scala> spark.sql(""select * from floatsAndDoubles"").show
+-----+------+
|float|double|
+-----+------+
|  NaN|   NaN|
| 1.02|   NaN|
|  NaN|   4.5|
+-----+------+
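The original report does not show the DDL used to build this table; as one possible way to reproduce it, here is a minimal sketch for spark-shell (the view name and column names are taken from the output above):

scala> import spark.implicits._
scala> val df = Seq(
     |   (Float.NaN, Double.NaN),
     |   (1.02f, Double.NaN),
     |   (Float.NaN, 4.5)
     | ).toDF("float", "double")
scala> df.createOrReplaceTempView("floatsAndDoubles")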

Running a max aggregate on the double column, grouped by the float column, results in the following table:

+-----+-----------------------+
|float|            max(double)|
+-----+-----------------------+
| 1.02|-1.7976931348623157E308|
|  NaN|                    4.5|
+-----+-----------------------+

(The bogus value in the first row is -DBL_MAX, which was printed in fixed notation in the original output.)

Expected behavior
It should output this:

scala> spark.sql("select float, max(double) from floatsAndDoubles group by float").show
+-----+-----------+
|float|max(double)|
+-----+-----------+
| 1.02|        NaN|
|  NaN|        NaN|
+-----+-----------+"

Additional context
For comparison, here is what a sum aggregate does in cudf:

+------+-----+
|float | sum |
+------+-----+
| 1.02 | NaN |
| NaN  | NaN |
+------+-----+
@razajafri added the bug (Something isn't working), Needs Triage (Need team to review and classify), and Spark (Functionality that helps Spark RAPIDS) labels on Mar 31, 2020
@harrism (Member) commented Mar 31, 2020:

@razajafri Please provide a title for this issue.

@razajafri changed the title from "[BUG]" to "[BUG] Max aggregate returns invalid results for a float32 column with nans" on Mar 31, 2020
@jrhemstad (Contributor) commented:

Why is there an expectation that NaN is greater than other values?

@jrhemstad (Contributor) commented:

Closing this as redundant with #4753

@jlowe (Member) commented Mar 31, 2020:

> Why is there an expectation that NaN is greater than other values?

Because that would be consistent with the sorted order. See https://github.com/rapidsai/cudf/blob/branch-0.14/cpp/tests/table/row_operators_tests.cu#L43-L44
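To make the ordering argument concrete, here is a small Scala sketch (not from the thread) showing that java.lang.Double.compare, which defines the total order used when sorting doubles, places NaN after every other value, so a max consistent with that sort order returns NaN whenever one is present:

// NaN compares greater than everything else, including +Infinity,
// under java.lang.Double.compare's total order.
val values = Seq(1.02, Double.NaN, 4.5, Double.PositiveInfinity)
val sorted = values.sortWith((a, b) => java.lang.Double.compare(a, b) < 0)
println(sorted.mkString(", "))  // 1.02, 4.5, Infinity, NaN

// A max that agrees with this sort order yields NaN for any group
// containing a NaN, matching the expected output above.
val max = values.reduce((a, b) => if (java.lang.Double.compare(a, b) >= 0) a else b)
println(max)  // NaN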

@bdice removed the Needs Triage (Need team to review and classify) label on Mar 4, 2024