Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Specialized GroupColumn for Date/Time/Timestamp types for multi-column GROUP BY #13263

Closed
Tracked by #12680
alamb opened this issue Nov 5, 2024 · 8 comments · Fixed by #13457
Closed
Tracked by #12680
Assignees
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Nov 5, 2024

Is your feature request related to a problem or challenge?

In #12269 @jayzhan211 made significant improvements to how group values are stored in multi-column aggregations.

Specifically for queries like

SELECT ... FROM ... GROUP BY col1, ... colN

The improvement relies on implementing specialized versions of GroupColumn for the types of col1, colN

We have implemented the primitive types and Strings/StringViews now, but we have not implemented all types

This means queries like

SELECT ... FROM ... GROUP BY int_col, date_col

Will fall back to the slower (but general) GroupValuesRows:

/// representation.
pub struct GroupValuesRows {

Describe the solution you'd like

Implement GroupColumn for all primitive types.

You can see how to do this here:

macro_rules! downcast_helper {
($t:ty, $d:ident) => {
return Ok(Box::new(GroupValuesPrimitive::<$t>::new($d.clone())))
};
}

and the make sure there are tests for each of those types in queries that group on multiple columns

Describe alternatives you've considered

No response

Additional context

Here is an example for how this was done for Strings: #12809

@alamb alamb added the enhancement New feature or request label Nov 5, 2024
@alamb alamb changed the title Implement Specialized GroupColumn for Date/Time/Timestamp types Implement Specialized GroupColumn for Date/Time/Timestamp types for multi-column GROUP BY Nov 5, 2024
@buraksenn
Copy link
Contributor

buraksenn commented Nov 6, 2024

I plan to wrap up nth_value PR today thanks to @jcsherin's reviews, then start on this one if it is okay. Otherwise, I can drop this

@buraksenn
Copy link
Contributor

take

@jonathanc-n
Copy link
Contributor

I might also be working on an implementation for this alongside if that is fine?

@jonathanc-n
Copy link
Contributor

Another thing, should this be done before or after #13262? I'm thinking after, just to reduce overall work.

@alamb
Copy link
Contributor Author

alamb commented Nov 6, 2024

Another thing, should this be done before or after #13262? I'm thinking after, just to reduce overall work.

I agree after would be good -- I am hoping that #13262 can be quickly banged out / merged.

However, I think most of the work for this PR will be testing (the actual code is likely not all that much) so that might be good to get ready as well

@jonathanc-n
Copy link
Contributor

@buraksenn Will you be working on this? if not, I would like to try working on this.

@buraksenn
Copy link
Contributor

@buraksenn Will you be working on this? if not, I would like to try working on this.

I was waiting #13262 PR but from that PR I see that you've more knowledge on this topic so I don't want to block this :D. Feel free to work if you've time. I'm dropping this issue I can take a look if you cant find time

@buraksenn buraksenn removed their assignment Nov 14, 2024
@jonathanc-n
Copy link
Contributor

take

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
3 participants