[BUG] Fix regressions in WindowFunctionSuite with Spark 3.2.0 #3415

Closed
andygrove opened this issue Sep 8, 2021 · 6 comments · Fixed by #3460
Labels: bug (Something isn't working), P0 (Must have for release), Spark 3.2+

Comments

@andygrove
Contributor

Describe the bug

Spark 3.2.0 has a new DayTimeIntervalType which we do not support. See apache/spark@cd649e7

This is causing the following test failures in WindowFunctionSuite:

WITH DECIMALS: [Window] [RANGE] [ ASC] [- [ ] 2 DAYS, 3 DAYS] *** FAILED ***
WITH DECIMALS: [Window] [RANGE] [DESC] [- [ ] 2 DAYS, 3 DAYS] *** FAILED ***
WITH DECIMALS: [Window] [RANGE] [ ASC] [- [ ] 2 DAYS, CURRENT ROW] *** FAILED ***
WITH DECIMALS: [Window] [RANGE] [DESC] [- [ ] 2 DAYS, CURRENT ROW] *** FAILED ***
WITH DECIMALS: [Window] [RANGE] [ ASC] [CURRENT ROW, 3 DAYS] *** FAILED ***
WITH DECIMALS: [Window] [RANGE] [DESC] [CURRENT ROW, 3 DAYS] *** FAILED ***
IGNORE ORDER, WITH DECIMALS: [Window] [MIXED WINDOW SPECS] *** FAILED ***

Steps/Code to reproduce bug

mvn -Dbuildver=320 verify

Expected behavior
Tests should pass

Environment details (please complete the following information)
N/A

Additional context
N/A

andygrove added the "bug", "? - Needs Triage", and "Spark 3.2+" labels on Sep 8, 2021
@wbo4958
Collaborator

wbo4958 commented Sep 9, 2021

I can reproduce the errors. The explain log is below:

*Exec <ProjectExec> will run on GPU
  !Exec <WindowExec> cannot run on GPU because not all expressions can be replaced
    @Expression <Alias> count(dollars#5) windowspecdefinition(uid#0L, _w1#21 ASC NULLS FIRST, specifiedwindowframe(RangeFrame, INTERVAL '-3' DAY, INTERVAL '2' DAY)) AS count_dollars#19L could run on GPU
      !Expression <WindowExpression> count(dollars#5) windowspecdefinition(uid#0L, _w1#21 ASC NULLS FIRST, specifiedwindowframe(RangeFrame, INTERVAL '-3' DAY, INTERVAL '2' DAY)) cannot run on GPU because the type of boundary is not supported in a window range function, found INTERVAL '2' DAY; the type of boundary is not supported in a window range function, found INTERVAL '-3' DAY
        @Expression <AggregateExpression> count(dollars#5) could run on GPU
          @Expression <Count> count(dollars#5) could run on GPU
            @Expression <AttributeReference> dollars#5 could run on GPU
        @Expression <WindowSpecDefinition> windowspecdefinition(uid#0L, _w1#21 ASC NULLS FIRST, specifiedwindowframe(RangeFrame, INTERVAL '-3' DAY, INTERVAL '2' DAY)) could run on GPU
          @Expression <AttributeReference> uid#0L could run on GPU
          @Expression <SortOrder> _w1#21 ASC NULLS FIRST could run on GPU
            @Expression <AttributeReference> _w1#21 could run on GPU
          !Expression <SpecifiedWindowFrame> specifiedwindowframe(RangeFrame, INTERVAL '-3' DAY, INTERVAL '2' DAY) cannot run on GPU because Bounds for Range-based window frames must be specified in Integral type (Boolean exclusive) or CalendarInterval. Found DayTimeIntervalType(0,0); upper expression Literal INTERVAL '2' DAY (DayTimeIntervalType(0,0) is not supported); lower expression Literal INTERVAL '-3' DAY (DayTimeIntervalType(0,0) is not supported)
            !Expression <Literal> INTERVAL '-3' DAY cannot run on GPU because expression Literal INTERVAL '-3' DAY produces an unsupported type DayTimeIntervalType(0,0)
            !Expression <Literal> INTERVAL '2' DAY cannot run on GPU because expression Literal INTERVAL '2' DAY produces an unsupported type DayTimeIntervalType(0,0)
    @Expression <AttributeReference> uid#0L could run on GPU
    @Expression <SortOrder> _w1#21 ASC NULLS FIRST could run on GPU
      @Expression <AttributeReference> _w1#21 could run on GPU
    *Exec <SortExec> will run on GPU
      *Expression <SortOrder> uid#0L ASC NULLS FIRST will run on GPU
      *Expression <SortOrder> _w1#21 ASC NULLS FIRST will run on GPU
      *Exec <ShuffleExchangeExec> will run on GPU
        *Partitioning <HashPartitioning> will run on GPU
        *Exec <ProjectExec> will run on GPU
          *Expression <Alias> cast(dateLong#2L as timestamp) AS dateLong1#18 will run on GPU
            *Expression <Cast> cast(dateLong#2L as timestamp) will run on GPU
          *Expression <Alias> cast(dateLong#2L as timestamp) AS _w1#21 will run on GPU
            *Expression <Cast> cast(dateLong#2L as timestamp) will run on GPU
          *Exec <FileSourceScanExec> will run on GPU
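
A log like the one above is easier to scan when only the blocking nodes are listed. The following is a minimal sketch, assuming the marker convention visible in the log (`*` will run, `!` cannot run, `@` could run); the helper name `blocked_nodes` and the abbreviated sample lines are illustrative and not part of the plugin.

```python
import re

# Marker meanings as they appear in the log above (an assumption read off
# the output, not a documented format):
#   '*' = will run on GPU, '!' = cannot run on GPU, '@' = could run on GPU
LINE_RE = re.compile(r"^\s*([*!@])(\w+) <(\w+)> (.*)$")


def blocked_nodes(explain_text):
    """Return (kind, class_name, reason) for every '!' line in the log."""
    blocked = []
    for line in explain_text.splitlines():
        m = LINE_RE.match(line)
        if m is None or m.group(1) != "!":
            continue
        kind, cls, rest = m.group(2), m.group(3), m.group(4)
        # The text after the first "cannot run on GPU because " is the reason.
        _, sep, reason = rest.partition("cannot run on GPU because ")
        blocked.append((kind, cls, reason if sep else rest))
    return blocked


# Abbreviated lines taken from the log above.
sample = """\
*Exec <ProjectExec> will run on GPU
  !Exec <WindowExec> cannot run on GPU because not all expressions can be replaced
    @Expression <AttributeReference> uid#0L could run on GPU
      !Expression <Literal> INTERVAL '2' DAY cannot run on GPU because expression Literal INTERVAL '2' DAY produces an unsupported type DayTimeIntervalType(0,0)
"""

for kind, cls, reason in blocked_nodes(sample):
    print(f"{kind} {cls}: {reason}")
```

Run against the full log, this would surface the WindowExec, WindowExpression, SpecifiedWindowFrame, and the two Literal nodes, all of which trace back to DayTimeIntervalType.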

It seems there are two issues.

The first is to add DayTimeIntervalType support to the plugin for Spark 3.2.0, especially in the TypeChecks framework.

The second is to support DayTimeIntervalType bounds in range window frames.

The first one does not look easy. Hi @revans2, could you share some ideas on this? Thanks.
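
For reference, the frame in the failing plan (`RangeFrame, INTERVAL '-3' DAY, INTERVAL '2' DAY`, with `count(dollars)` partitioned by `uid` and ordered ascending by a timestamp) can be modelled in a few lines. This is only a sketch of the expected semantics, assuming inclusive bounds and that `count` skips nulls; the function name `range_count` and the sample rows are made up, and the quadratic loop has nothing to do with how Spark or the plugin evaluates windows.

```python
from datetime import datetime, timedelta


def range_count(rows, lower, upper):
    """For each (uid, ts, dollars) row, count the non-null dollars values in
    the same uid partition whose ts lies in [ts + lower, ts + upper]."""
    out = []
    for uid, ts, _ in rows:
        lo, hi = ts + lower, ts + upper
        n = sum(
            1
            for other_uid, other_ts, dollars in rows
            if other_uid == uid and dollars is not None and lo <= other_ts <= hi
        )
        out.append((uid, ts, n))
    return out


# Hypothetical sample data, not taken from the test suite.
rows = [
    (1, datetime(2021, 9, 1), 10.0),
    (1, datetime(2021, 9, 3), 20.0),
    (1, datetime(2021, 9, 8), None),
    (2, datetime(2021, 9, 2), 5.0),
]

# Lower bound of -3 days and upper bound of +2 days, as in the plan above.
for uid, ts, n in range_count(rows, timedelta(days=-3), timedelta(days=2)):
    print(uid, ts.date(), n)
```

The bounds here are day-time intervals added to a timestamp ordering column, which is the combination the type checks in the log reject.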

@revans2
Collaborator

revans2 commented Sep 9, 2021

Yeah, this is not going to be simple, because we will have to move a lot of things into the shim layer to be able to support these new types. I would say we hold off on this until we can get the rest of the build onto 3.2. At that point we can figure out how to deal with it.

@revans2
Collaborator

revans2 commented Sep 15, 2021

This was closed accidentally.

@tgravescs
Collaborator

There are still WindowFunctionSuite failures in the latest run, so I assume this is still being worked on?

@wbo4958
Collaborator

wbo4958 commented Sep 20, 2021

@tgravescs, I just put up a PR for the Window unit tests; see #3547

@andygrove
Contributor Author

The WindowFunctionSuite scala tests are passing for me now.
