Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unnecessary null checks in GroupColumns #12944

Open
Rachelint opened this issue Oct 15, 2024 · 5 comments
Open

Remove unnecessary null checks in GroupColumns #12944

Rachelint opened this issue Oct 15, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@Rachelint
Copy link
Contributor

Rachelint commented Oct 15, 2024

Is your feature request related to a problem or challenge?

As mentioned by @Dandandan in #12809 (comment)

Some null checks are actullay unnecessary for arrays containing no nulls (basically we can just use null_count to check it).

Describe the solution you'd like

As mentioned above, the simple way is using null_count in array to make it.

Furtherly, I found we indeed check which rows are nulls in create_hashes. I think maybe we can reuse this result?

Describe alternatives you've considered

No response

Additional context

No response

@Rachelint
Copy link
Contributor Author

I am working on a poc #12947 for checking the improvement of this idea.

@Rachelint
Copy link
Contributor Author

Rachelint commented Oct 16, 2024

I found it actually now a trivial change.

The challenge is that, we should refactor codes to support vectorize append firstly.

(otherwise, we need to add a branch about which append function we should call for each row's appending, and as a reuslt, we remove a row level branch and add another...)

I am trying it.

@Rachelint
Copy link
Contributor Author

This is the number get from my local:

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       main ┃ vectorize-append-value ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.69ms │                 0.67ms │     no change │
│ QQuery 1     │    67.63ms │                67.32ms │     no change │
│ QQuery 2     │   162.49ms │               164.49ms │     no change │
│ QQuery 3     │   181.03ms │               184.10ms │     no change │
│ QQuery 4     │  1620.87ms │              1601.20ms │     no change │
│ QQuery 5     │  1533.80ms │              1536.84ms │     no change │
│ QQuery 6     │    59.69ms │                61.10ms │     no change │
│ QQuery 7     │    76.13ms │                76.61ms │     no change │
│ QQuery 8     │  2021.02ms │              2143.10ms │  1.06x slower │
│ QQuery 9     │  1926.36ms │              1929.56ms │     no change │
│ QQuery 10    │   526.58ms │               520.71ms │     no change │
│ QQuery 11    │   580.63ms │               595.39ms │     no change │
│ QQuery 12    │  1839.30ms │              1821.41ms │     no change │
│ QQuery 13    │  2994.22ms │              2903.39ms │     no change │
│ QQuery 14    │  2082.50ms │              2030.18ms │     no change │
│ QQuery 15    │  1890.10ms │              1859.78ms │     no change │
│ QQuery 16    │  4108.74ms │              4145.29ms │     no change │
│ QQuery 17    │  3640.05ms │              3663.88ms │     no change │
│ QQuery 18    │  8300.44ms │              8089.79ms │     no change │
│ QQuery 19    │   143.29ms │               143.35ms │     no change │
│ QQuery 20    │  3250.19ms │              3218.41ms │     no change │
│ QQuery 21    │  3909.34ms │              3947.90ms │     no change │
│ QQuery 22    │  9146.31ms │              9170.28ms │     no change │
│ QQuery 23    │ 23824.79ms │             23844.20ms │     no change │
│ QQuery 24    │  1120.92ms │              1125.08ms │     no change │
│ QQuery 25    │   995.13ms │               991.87ms │     no change │
│ QQuery 26    │  1320.37ms │              1298.79ms │     no change │
│ QQuery 27    │  4680.98ms │              4673.34ms │     no change │
│ QQuery 28    │ 23892.37ms │             23543.21ms │     no change │
│ QQuery 29    │   904.17ms │               883.88ms │     no change │
│ QQuery 30    │  1832.41ms │              1798.48ms │     no change │
│ QQuery 31    │  2021.98ms │              1958.83ms │     no change │
│ QQuery 32    │  7319.30ms │              7567.96ms │     no change │
│ QQuery 33    │  9575.96ms │              9609.05ms │     no change │
│ QQuery 34    │  9712.54ms │              9647.64ms │     no change │
│ QQuery 35    │  2772.59ms │              2904.98ms │     no change │
│ QQuery 36    │   248.24ms │               251.74ms │     no change │
│ QQuery 37    │   155.42ms │               150.51ms │     no change │
│ QQuery 38    │   153.19ms │               152.12ms │     no change │
│ QQuery 39    │   642.92ms │               587.60ms │ +1.09x faster │
│ QQuery 40    │    57.70ms │                60.29ms │     no change │
│ QQuery 41    │    53.35ms │                56.26ms │  1.05x slower │
│ QQuery 42    │    66.18ms │                66.08ms │     no change │
└──────────────┴────────────┴────────────────────────┴───────────────┘

@Dandandan
Copy link
Contributor

Makes sense to vectorize appends at the same time, I think this might have even more impact than just omitting null checks.
Removing the null check, but adding another branch will likely not yield any benefit.

@Rachelint
Copy link
Contributor Author

Rachelint commented Oct 18, 2024

Makes sense to vectorize appends at the same time, I think this might have even more impact than just omitting null checks. Removing the null check, but adding another branch will likely not yield any benefit.

I wonder should we vectorize eq together.

I impl a simple version only vectorize the append operation #12996
The benchmark number is generated from it.

But it introduce another branch in the equal_to stage, which I think maybe switching to the all vectorize approach like duckdb can eliminate this new branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants