feat: add projection to FilterExec #7932

Closed


junjunjd
Contributor

Which issue does this PR close?

Closes #5436.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

github-actions bot added the core (Core DataFusion crate) label on Oct 26, 2023
@Dandandan
Contributor

Nice, @junjunjd. I think the remaining work is to add it to projection pushdown as well :)

github-actions bot added the sql (SQL Planner), logical-expr (Logical plan and expressions), and optimizer (Optimizer rules) labels on Nov 4, 2023
@Dandandan
Contributor

@junjunjd FYI, I merged and pushed some changes towards projection pushdown.

@Dandandan
Contributor

@junjunjd FYI, I've committed a working version. The remaining work is fixing the tests (or their expectations) and any remaining issues.

@junjunjd
Contributor Author

junjunjd commented Nov 6, 2023

Thanks @Dandandan! I will take a look at the tests.

@Dandandan
Contributor

Dandandan commented Dec 1, 2023

@junjunjd Results for the current version (I'm not sure where the regressions on queries 8, 9, and 17 come from, but the results are promising):

--------------------
Benchmark tpch_mem.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ projection_filter_exec ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  91.11ms │                90.66ms │     no change │
│ QQuery 2     │  26.68ms │                24.62ms │ +1.08x faster │
│ QQuery 3     │  52.38ms │                42.46ms │ +1.23x faster │
│ QQuery 4     │  51.02ms │                27.82ms │ +1.83x faster │
│ QQuery 5     │ 115.14ms │                70.60ms │ +1.63x faster │
│ QQuery 6     │   9.75ms │                 8.35ms │ +1.17x faster │
│ QQuery 7     │ 212.57ms │               209.77ms │     no change │
│ QQuery 8     │  60.12ms │                75.56ms │  1.26x slower │
│ QQuery 9     │  59.65ms │                81.61ms │  1.37x slower │
│ QQuery 10    │ 113.51ms │                75.39ms │ +1.51x faster │
│ QQuery 11    │  19.42ms │                19.37ms │     no change │
│ QQuery 12    │  58.83ms │                43.10ms │ +1.37x faster │
│ QQuery 13    │  54.67ms │                30.95ms │ +1.77x faster │
│ QQuery 14    │  18.18ms │                12.36ms │ +1.47x faster │
│ QQuery 15    │  58.83ms │                39.23ms │ +1.50x faster │
│ QQuery 16    │  21.80ms │                22.48ms │     no change │
│ QQuery 17    │  53.05ms │                65.16ms │  1.23x slower │
│ QQuery 18    │ 154.05ms │               142.87ms │ +1.08x faster │
│ QQuery 19    │  34.50ms │                29.88ms │ +1.15x faster │
│ QQuery 20    │  62.34ms │                50.79ms │ +1.23x faster │
│ QQuery 21    │ 247.24ms │               170.29ms │ +1.45x faster │
│ QQuery 22    │  14.14ms │                13.91ms │     no change │
└──────────────┴──────────┴────────────────────────┴───────────────┘
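
For anyone reading the table, the Change column can be reproduced from the two timing columns. Below is a minimal sketch in Rust, assuming the label is the ratio of the two times and that differences within about 5% are reported as "no change". The 5% threshold is an assumption inferred from the rows above, and `change_label` is a hypothetical helper, not part of the benchmark tooling.

```rust
/// Hypothetical helper: label a benchmark delta the way the table above does.
/// `baseline_ms` is the `main` time, `candidate_ms` is the branch time.
fn change_label(baseline_ms: f64, candidate_ms: f64) -> String {
    // Assumed noise threshold: ratios within 5% of 1.0 count as "no change".
    const NOISE: f64 = 0.05;
    let ratio = candidate_ms / baseline_ms;
    if ratio >= 1.0 - NOISE && ratio <= 1.0 + NOISE {
        "no change".to_string()
    } else if ratio < 1.0 {
        // Branch is faster: report how many times faster than main.
        format!("+{:.2}x faster", baseline_ms / candidate_ms)
    } else {
        // Branch is slower: report how many times slower than main.
        format!("{:.2}x slower", ratio)
    }
}

fn main() {
    // Timings taken from the Query 4, Query 8, and Query 1 rows above.
    println!("{}", change_label(51.02, 27.82));
    println!("{}", change_label(60.12, 75.56));
    println!("{}", change_label(91.11, 90.66));
}
```

Under these assumptions the three calls give the same labels as the Query 4, Query 8, and Query 1 rows.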

@Dandandan
Contributor

@junjunjd if you are able to work on this, it would be good to fix the remaining tests (either the tests themselves or their expected output need to change) and to find out why a few queries regress.

@Dandandan
Contributor

OK, I had some more time on a flight to track down the regression. It was related to join selection and statistics.

The current version no longer shows any regressions 🎉

--------------------
Benchmark tpch.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ projection_filter_exec ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 151.50ms │               151.57ms │     no change │
│ QQuery 2     │  49.60ms │                49.71ms │     no change │
│ QQuery 3     │  70.18ms │                68.88ms │     no change │
│ QQuery 4     │  51.75ms │                50.40ms │     no change │
│ QQuery 5     │  82.02ms │                81.63ms │     no change │
│ QQuery 6     │  31.84ms │                31.47ms │     no change │
│ QQuery 7     │ 102.68ms │               102.40ms │     no change │
│ QQuery 8     │ 107.65ms │               107.11ms │     no change │
│ QQuery 9     │ 131.21ms │               130.31ms │     no change │
│ QQuery 10    │ 137.28ms │               134.48ms │     no change │
│ QQuery 11    │  37.14ms │                36.55ms │     no change │
│ QQuery 12    │  85.27ms │                84.92ms │     no change │
│ QQuery 13    │ 190.22ms │               180.37ms │ +1.05x faster │
│ QQuery 14    │  50.64ms │                51.16ms │     no change │
│ QQuery 15    │  61.33ms │                55.79ms │ +1.10x faster │
│ QQuery 16    │  54.56ms │                52.42ms │     no change │
│ QQuery 17    │ 100.46ms │               102.43ms │     no change │
│ QQuery 18    │ 188.07ms │               188.75ms │     no change │
│ QQuery 19    │  98.04ms │                98.17ms │     no change │
│ QQuery 20    │  55.19ms │                53.82ms │     no change │
│ QQuery 21    │ 128.41ms │               120.85ms │ +1.06x faster │
│ QQuery 22    │  38.39ms │                37.76ms │     no change │
└──────────────┴──────────┴────────────────────────┴───────────────┘
--------------------
Benchmark tpch_mem.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ projection_filter_exec ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  91.51ms │                89.51ms │     no change │
│ QQuery 2     │  25.79ms │                22.87ms │ +1.13x faster │
│ QQuery 3     │  53.71ms │                41.13ms │ +1.31x faster │
│ QQuery 4     │  51.74ms │                28.65ms │ +1.81x faster │
│ QQuery 5     │ 115.24ms │                84.50ms │ +1.36x faster │
│ QQuery 6     │   9.28ms │                 8.36ms │ +1.11x faster │
│ QQuery 7     │ 215.76ms │               212.17ms │     no change │
│ QQuery 8     │  59.93ms │                60.06ms │     no change │
│ QQuery 9     │  58.50ms │                57.97ms │     no change │
│ QQuery 10    │ 115.36ms │                79.37ms │ +1.45x faster │
│ QQuery 11    │  19.34ms │                19.77ms │     no change │
│ QQuery 12    │  59.47ms │                36.31ms │ +1.64x faster │
│ QQuery 13    │  54.12ms │                43.69ms │ +1.24x faster │
│ QQuery 14    │  18.04ms │                12.17ms │ +1.48x faster │
│ QQuery 15    │  58.53ms │                38.91ms │ +1.50x faster │
│ QQuery 16    │  21.86ms │                21.94ms │     no change │
│ QQuery 17    │  53.52ms │                53.08ms │     no change │
│ QQuery 18    │ 156.56ms │               145.73ms │ +1.07x faster │
│ QQuery 19    │  35.22ms │                28.24ms │ +1.25x faster │
│ QQuery 20    │  63.21ms │                50.03ms │ +1.26x faster │
│ QQuery 21    │ 249.52ms │               173.98ms │ +1.43x faster │
│ QQuery 22    │  14.70ms │                14.32ms │     no change │
└──────────────┴──────────┴────────────────────────┴───────────────┘

@Dandandan
Contributor

Since adding it to the logical plan seems to cause a lot of trouble, I plan to move this to the physical plan optimization phase only.
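
For readers following along, the idea behind a filter with an embedded projection can be sketched in plain Rust. This is a toy illustration only: `Batch` and `filter_with_projection` are hypothetical names, and the code does not use or mirror DataFusion's actual `FilterExec` API. The point it shows is that columns needed only to evaluate the predicate never have to be copied into the output.

```rust
/// Toy columnar batch: each inner Vec is one column, all of the same length.
type Batch = Vec<Vec<i64>>;

/// Hypothetical stand-in for a filter with an embedded projection:
/// apply `mask` only to the columns listed in `projection`, so columns
/// used just for the predicate are never materialized in the output.
fn filter_with_projection(batch: &Batch, mask: &[bool], projection: &[usize]) -> Batch {
    projection
        .iter()
        .map(|&col| {
            batch[col]
                .iter()
                .zip(mask)
                .filter(|&(_, &keep)| keep)
                .map(|(&value, _)| value)
                .collect()
        })
        .collect()
}

fn main() {
    // Columns: 0 = id, 1 = price, 2 = quantity.
    let batch: Batch = vec![vec![1, 2, 3, 4], vec![10, 20, 30, 40], vec![5, 0, 7, 0]];
    // The predicate `quantity > 0` is evaluated on column 2 only.
    let mask: Vec<bool> = batch[2].iter().map(|&q| q > 0).collect();
    // Downstream operators need only `id`, so only column 0 is filtered and copied.
    let out = filter_with_projection(&batch, &mask, &[0]);
    println!("{:?}", out);
}
```

Without the projection, the filter would copy all three columns for the surviving rows and a separate projection step would then discard two of them.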


Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

github-actions bot added the Stale (PR has not had any activity for some time) label on Apr 25, 2024
@github-actions github-actions bot closed this May 3, 2024
Labels
core (Core DataFusion crate), logical-expr (Logical plan and expressions), optimizer (Optimizer rules), sql (SQL Planner), Stale (PR has not had any activity for some time)
Development

Successfully merging this pull request may close these issues.

Add projection to FilterExec to avoid unecessary output creation
2 participants