
[BUG] Max aggregate returns invalid results for a float32 column with nans #4754

Closed · razajafri opened this issue Mar 31, 2020 · 4 comments
Labels: bug (Something isn't working), Spark (Functionality that helps Spark RAPIDS)

@razajafri (Contributor)
Describe the bug
Running a max aggregate on a table containing NaN values returns an invalid result. I haven't gotten around to writing a unit test for this yet, but I can do so if required.

Steps/Code to reproduce bug
Create the following table

scala> spark.sql(""select * from floatsAndDoubles"").show
+-----+------+
|float|double|
+-----+------+
|  NaN|   NaN|
| 1.02|   NaN|
|  NaN|   4.5|
+-----+------+
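The original report does not show the DDL used to build this table; as one possible way to reproduce it, here is a minimal sketch for spark-shell (the view name and column names are taken from the output above):

scala> import spark.implicits._
scala> val df = Seq(
     |   (Float.NaN, Double.NaN),
     |   (1.02f, Double.NaN),
     |   (Float.NaN, 4.5)
     | ).toDF("float", "double")
scala> df.createOrReplaceTempView("floatsAndDoubles")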

Running a max aggregate on the double column, grouped by the float column, results in the following table:

+-----+-----------------------+
|float|            max(double)|
+-----+-----------------------+
| 1.02|-1.7976931348623157E308|
|  NaN|                    4.5|
+-----+-----------------------+

(The bogus value in the first row is -DBL_MAX, which was printed in fixed notation in the original output.)

Expected behavior
It should output this:

scala> spark.sql("select float, max(double) from floatsAndDoubles group by float").show
+-----+-----------+
|float|max(double)|
+-----+-----------+
| 1.02|        NaN|
|  NaN|        NaN|
+-----+-----------+"

Additional context
For comparison, here is what a sum aggregate does in cudf:

+------+-----+
|float | sum |
+------+-----+
| 1.02 | NaN |
| NaN  | NaN |
+------+-----+
@razajafri added the bug (Something isn't working), Needs Triage (Need team to review and classify), and Spark (Functionality that helps Spark RAPIDS) labels on Mar 31, 2020
@harrism (Member) commented Mar 31, 2020:

@razajafri Please provide a title for this issue.

@razajafri changed the title from "[BUG]" to "[BUG] Max aggregate returns invalid results for a float32 column with nans" on Mar 31, 2020
@jrhemstad (Contributor) commented:

Why is there an expectation that NaN is greater than other values?

@jrhemstad (Contributor) commented:

Closing this as redundant with #4753

@jlowe (Member) commented Mar 31, 2020:

> Why is there an expectation that NaN is greater than other values?

Because that would be consistent with the sorted order. See https://github.com/rapidsai/cudf/blob/branch-0.14/cpp/tests/table/row_operators_tests.cu#L43-L44
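To make the ordering argument concrete, here is a small Scala sketch (not from the thread) showing that java.lang.Double.compare, which defines the total order used when sorting doubles, places NaN after every other value, so a max consistent with that sort order returns NaN whenever one is present:

// NaN compares greater than everything else, including +Infinity,
// under java.lang.Double.compare's total order.
val values = Seq(1.02, Double.NaN, 4.5, Double.PositiveInfinity)
val sorted = values.sortWith((a, b) => java.lang.Double.compare(a, b) < 0)
println(sorted.mkString(", "))  // 1.02, 4.5, Infinity, NaN

// A max that agrees with this sort order yields NaN for any group
// containing a NaN, matching the expected output above.
val max = values.reduce((a, b) => if (java.lang.Double.compare(a, b) >= 0) a else b)
println(max)  // NaN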

@bdice removed the Needs Triage (Need team to review and classify) label on Mar 4, 2024