Optimize count(col) using table statistics #904
Comments
@Dandandan I wanted to look into this but saw that it was mentioned in #962 as having potential merge conflicts. Is it OK for me to start looking into this, or should I hold off for now?
It may be best to consult @rdettai about the best timing for implementing this particular optimization.
In #965 I have moved that optimization to the physical plan, so if the PR gets accepted, this optimization should definitely be done there instead of in the logical plan.
@rdettai OK, I will wait for that then. Just for my understanding: I thought this was a new optimization rule, so how was it already moved to the physical plan?
@matthewmturner @rdettai was saying he moved all existing cost-based optimizer rules (not this particular one) into the physical plan in that PR :)
Actually, if I'm not mistaken, this optimization would go into the …
Exactly! You can already give it a try on top of my PR, but you would take the risk that, if it does not get merged, the work has to be done again on the …
@rdettai Glad your PR got merged :) Now that it is, is it a good time to start working on this?
Yes, sure!
@Dandandan @rdettai I was going to add this to the existing …
I would write a separate …
@rdettai Got it, thanks. I'm still working on understanding the existing rules. Would you be able to provide some more details on why there is a … Thanks in advance.
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
If both the number of rows and the number of nulls in a column are available as statistics, the value of count(col) can be computed from the statistics alone, without scanning the data:
count(col) = row_count - null_count(col)
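A minimal sketch of the arithmetic above, assuming each statistic is exposed as an optional value (None when it is unavailable) and is exact rather than estimated. The function name count_from_stats is hypothetical and not part of any existing API:

```rust
/// Hypothetical helper: computes count(col) from table statistics alone.
/// Returns None when either statistic is missing (the query must then fall
/// back to scanning the data) or when the statistics are inconsistent.
fn count_from_stats(row_count: Option<u64>, null_count: Option<u64>) -> Option<u64> {
    // Both statistics are required; `?` bails out with None if either is absent.
    let rows = row_count?;
    let nulls = null_count?;
    // count(col) counts only non-null values, so subtract the nulls.
    // checked_sub returns None instead of underflowing if nulls > rows.
    rows.checked_sub(nulls)
}
```

For example, a column with 1000 rows of which 150 are NULL gives count(col) = 1000 - 150 = 850 without reading any data.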
Describe the solution you'd like
Add an optimization rule that matches a count expression on a column and uses the statistics to replace it with the computed value.
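The rule described above can be sketched on simplified, hypothetical types. AggExpr, ColumnStats, TableStats and optimize_count are illustrative stand-ins, NOT DataFusion's actual plan or statistics types; the sketch also assumes the rule should only fire when the statistics are exact:

```rust
/// Hypothetical, simplified aggregate expression (not DataFusion's type).
#[derive(Debug, Clone, PartialEq)]
enum AggExpr {
    /// count(col): number of non-null values in the named column
    Count(String),
    /// An aggregate already replaced by a precomputed value
    Literal(u64),
}

/// Hypothetical per-column statistics.
#[derive(Debug, Clone)]
struct ColumnStats {
    name: String,
    null_count: Option<u64>,
}

/// Hypothetical table-level statistics.
#[derive(Debug, Clone)]
struct TableStats {
    /// True only if the statistics are exact rather than estimates
    is_exact: bool,
    num_rows: Option<u64>,
    columns: Vec<ColumnStats>,
}

/// Rewrites count(col) into a literal when exact statistics allow it;
/// any other case returns the expression unchanged.
fn optimize_count(expr: &AggExpr, stats: &TableStats) -> AggExpr {
    if let AggExpr::Count(col) = expr {
        if stats.is_exact {
            // Look up the null count for the counted column, if known.
            let null_count = stats
                .columns
                .iter()
                .find(|c| c.name == *col)
                .and_then(|c| c.null_count);
            if let (Some(rows), Some(nulls)) = (stats.num_rows, null_count) {
                // count(col) = row_count - null_count(col)
                if let Some(value) = rows.checked_sub(nulls) {
                    return AggExpr::Literal(value);
                }
            }
        }
    }
    expr.clone()
}
```

Returning the expression unchanged whenever a statistic is missing or inexact keeps the rewrite safe: the query then simply runs the normal aggregation.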
Describe alternatives you've considered
Additional context