opt: avg size should be weighted by number of non-NULL rows #86567

Open
mgartner opened this issue Aug 22, 2022 · 0 comments
Labels
A-sql-optimizer SQL logical planning and optimizations. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. E-quick-win Likely to be a quick win for someone experienced. T-sql-queries SQL Queries Team

Comments

@mgartner
Collaborator

mgartner commented Aug 22, 2022

See this comment: #86460 (review)

While we're here, though, we could fix an existing issue with the average size calculation. The value stored in `system.table_statistics.avgSize` and returned by `stat.AvgSize()` covers only non-NULL values. We should probably weight it by the number of non-NULL rows, so that it is an average over all rows instead of an average over non-NULL rows only. That is, it should be something like `stats.AvgColSizes[cols.SingleColumn()] = stat.AvgSize() * (stat.RowCount() - stat.NullCount()) / stat.RowCount()`.

Jira issue: CRDB-18820

@mgartner mgartner added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Aug 22, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Aug 22, 2022
@michae2 michae2 added the A-sql-optimizer SQL logical planning and optimizations. label Aug 22, 2022
@mgartner mgartner added the E-quick-win Likely to be a quick win for someone experienced. label Dec 8, 2022
@mgartner mgartner moved this to Backlog (DO NOT ADD NEW ISSUES) in SQL Queries Jul 24, 2023
@mgartner mgartner moved this from Backlog (DO NOT ADD NEW ISSUES) to New Backlog in SQL Queries Apr 23, 2024
Projects
Status: Backlog
Development

No branches or pull requests

2 participants