-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Materialize dictionaries in group keys #8291
Materialize dictionaries in group keys #8291
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @qrilka !
Can we please also add a test for this change too (via an explain plan that shows the schema?)
Maybe we could add a way to show schema in DisplayableExecutionPlan
🤔
/// returns schema with dictionary group keys materialized as their value types | ||
/// The actual convertion happens in `RowConverter` and we don't do unnecessary | ||
/// conversion back into dictionaries | ||
fn materialize_dict_group_keys(schema: &Schema, group_count: usize) -> Schema { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GroupValues
is a trait and there are two implementations of it -- GroupValueRows
and GroupValuesPrimitive
This conversion only applies to GroupValueRows
(though GroupValuesPrimitive
won't be used for input dictionary columns anyways)
I wonder if there is some way to move the conversion into GroupValueRows
so it is clearer from the context when this is needed or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a look
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb I do understand your point and it saw this when creating my PR but I don't quite see a good way around this: the problem is that GroupValues
get's created only during execution in AggregateExec::execute_typed
but the correct ("materialized") schema looks to be needed in AggregateExec
itself, i.e. already at the planning stage (as a part of impl ExecutionPlan for AggregateExec
).
Maybe I miss some option here?
@alamb I think the following test looks to be enough:
Adding schemas into |
Sorry for the late reply @qrilka -- what I was thinking is "if someone undid this code change, would any of the tests fail to know we broke it" I think the answer is no. So that is what I was thinking about the test. I am not sure if you are proposing a new test or saying an existing test is sufficient |
Given that group keys inherently have few repeated values, especially when grouping on a single column, the use of dictionary encoding is unlikely to be yielding significant returns
2b07114
to
2194cca
Compare
@alamb I was proposing a new test (which is what is in my quote) now I added it into the PR. Please take a look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @qrilka
Given that group keys inherently have few repeated values, especially when grouping on a single column, the use of dictionary encoding is unlikely to be yielding significant returns
This PR seems to have caused a regression: #8738 |
Which issue does this PR close?
Closes #7647.
Rationale for this change
From the ticket:
Given that group keys inherently have few repeated values, especially when grouping on a single column, the use of dictionary encoding is unlikely to be yielding significant returns
What changes are included in this PR?
Now
AggregateExec
includesoriginal_schema
which allows us to differentiate between original schema with dictionaries in place andschema
where we materialize them as their value types.Are these changes tested?
Tested locally with
cargo test
Are there any user-facing changes?
The main user-facing change will be in different type returned for group keys using dictionaries.