Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Aggregations in ANSI mode do not detect overflows #3424

Closed
revans2 opened this issue Sep 9, 2021 · 9 comments · Fixed by #3597
Closed

[BUG] Aggregations in ANSI mode do not detect overflows #3424

revans2 opened this issue Sep 9, 2021 · 9 comments · Fixed by #3597
Assignees
Labels
bug Something isn't working P0 Must have for release

Comments

@revans2
Copy link
Collaborator

revans2 commented Sep 9, 2021

Describe the bug
We recently hit a bug in the tests where ANSI mode was being leaked #3423. That was fixed, but it should have failed on the GPU too, not only on the CPU. If we are in ANSI mode we really should fall back to the CPU for any aggregation that can overflow. This is going to be bad for anyone using ANSI mode, but until we can get help from cudf to support this type of thing we are not going to be able to support it.

There is an added problem that because the aggregations happen in different orders on the GPU, even from one run to the next, we might overflow on one run and not on another if the data includes both negative and positive values in it. If we do implement overflow checking we are going to need to think this through very carefully.

@revans2 revans2 added bug Something isn't working ? - Needs Triage Need team to review and classify labels Sep 9, 2021
@Salonijain27 Salonijain27 added P0 Must have for release and removed ? - Needs Triage Need team to review and classify labels Sep 14, 2021
@revans2 revans2 self-assigned this Sep 20, 2021
@revans2 revans2 added this to the Sep 13 - Sep 24 milestone Sep 20, 2021
@revans2
Copy link
Collaborator Author

revans2 commented Sep 20, 2021

I went through ANSI mode in Spark 3.2.0-rc3 to find all of the operators that checked the ANSI mode config. I then looked at the operators that we already support and the following is the results. N/A means we don't support the operator but it checks ANSI mode. The Aggregations I pulled out just the ones that we support and started to look through them to see if there was anything special about it.

  • Arithmetic:
    • Add (All)
    • Multiply (All)
    • UnaryMinus (All)
    • Abs: SPARK-33275 (3.2.0)
    • Subtract: (All)
    • Divide: SPARK-33008 (3.1.0)
    • IntegralDivide: SPARK-33008 (3.1.0) and SPARK-35152 (3.2.0)
    • Remainder: SPARK-33008 (3.1.0)
    • Pmod: SPARK-33008 (3.1.0)
  • Aggregations:
    • Sum: Relies on Add, except for special case in Decimal (3.1.0)
    • Average: Relies on Add, except for special case in Decimal (3.1.0)
    • CentralMomentAgg Nothing Special
  • Collection Operations:
    • ElementAt: SPARK-33386 (3.1.0) and SPARK-33460 (3.1.0)
  • Complex Type Extractors:
    • GetArrayItem: SPARK-33386 (3.1.0)
    • GetMapValue: SPARK-33460 (3.1.0)
  • Date Time Expressions:
    • ToUnixTimestamp: SPARK-33498 (3.1.0)
    • UnixTimestamp: SPARK-33498 (3.1.0)
    • GetTimestamp: SPARK-33498 (3.1.0)
    • NextDay: N/A
    • DateAddInterval (ALL)
    • MakeDate: N/A
    • MakeTimestampTNZ: N/A
    • MakeTimestampLTZ: N/A
    • MakeTimestamp: N/A
  • Interval Expressions:
    • MultiplyInterval: N/A
    • DivideInterval: N/A
    • MakeInterval: N/A
  • String Expressions:
    • Elt: N/A
    • ParseUrl: N/A

@abellina
Copy link
Collaborator

Does it mean that for all non-N/A there's work to do (since we do implement, but we are likely not checking the ansi mode flag)?

@revans2
Copy link
Collaborator Author

revans2 commented Sep 20, 2021

Does it mean that for all non-N/A there's work to do (since we do implement, but we are likely not checking the ansi mode flag)?

More or less. I have not gone through and done an audit on what our code is doing yet. Will start working though it and updating things. I will probably just have us fall back to the CPU for all of these initially in ANSI mode, and then file follow on issues to have us clean it up so we are doing as much of the right things as we can for each operator.

@abellina
Copy link
Collaborator

I'll take the aggregates for now, may be just preventing them to get on the GPU if ANSI.

@revans2
Copy link
Collaborator Author

revans2 commented Sep 21, 2021

I will take all of the arithmetic ones.

@revans2
Copy link
Collaborator Author

revans2 commented Sep 21, 2021

I will start to work on the Collection Operations and Complex Type Extractors operations now

@revans2
Copy link
Collaborator Author

revans2 commented Sep 21, 2021

Never mind those are all done. I'll start to look at Date Time Expressions instead now.

@revans2
Copy link
Collaborator Author

revans2 commented Sep 21, 2021

It looks like everything is in except for Aggregations right now, so I am going to assign this to @abellina who is working on that code.

@revans2 revans2 assigned abellina and unassigned revans2 Sep 21, 2021
@abellina
Copy link
Collaborator

@revans2 here's the PR for the aggs, #3597

@revans2 revans2 linked a pull request Sep 22, 2021 that will close this issue
@nartal1 nartal1 mentioned this issue Apr 5, 2022
49 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working P0 Must have for release
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants