Consolidate statistics aggregation #8229
Labels: enhancement (New feature or request)
Comments
This was referenced Nov 15, 2023
Update: I have a PR in progress and expect it to be ready for review in the next day or two.
#8254 has the basics in place, but I need to work on other higher-priority items this week and next, so this has moved down my priority list.
This was referenced Dec 5, 2023
Is your feature request related to a problem or challenge?
There are at least three places in DataFusion where multiple Statistics objects are aggregated together, and they do so inconsistently:
- get_statistics_with_limit: https://github.com/apache/arrow-datafusion/blob/e54894c39202815b14d9e7eae58f64d3a269c165/datafusion/core/src/datasource/statistics.rs#L34-L332
- Parquet::infer_stats: https://github.com/apache/arrow-datafusion/blob/a892300a5a56c97b5b4ddc9aa4a421aaf412d0fe/datafusion/core/src/datasource/file_format/parquet.rs#L503-L581
(and we actually have another version of this in IOx)
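To make "aggregating Statistics" concrete, here is a rough sketch of the kind of merge logic each of these code paths has to get right. It is not the code from either linked function: `Precision`, `ColumnStats`, `FileStats`, `merge`, and `merge_all` are simplified, hypothetical stand-ins (loosely modeled on DataFusion's types, with `i64` min/max instead of scalar values), and the rules shown (sum the counts, take the min of mins and max of maxes, and degrade precision when an input is inexact or missing) are assumptions about what a consistent implementation might do.

```rust
/// Simplified stand-in (an assumption, not the DataFusion type):
/// how much a statistic can be trusted.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Precision<T> {
    Exact(T),
    Inexact(T),
    Absent,
}

impl Precision<usize> {
    /// Sum two counts. The result is exact only if both inputs are exact,
    /// and absent if either input is unknown.
    fn sum_with(self, other: Self) -> Self {
        use Precision::*;
        match (self, other) {
            (Exact(a), Exact(b)) => Exact(a + b),
            (Exact(a) | Inexact(a), Exact(b) | Inexact(b)) => Inexact(a + b),
            _ => Absent,
        }
    }
}

/// Hypothetical per-column statistics; `None` means the bound is unknown.
#[derive(Debug, Clone, PartialEq)]
struct ColumnStats {
    null_count: Precision<usize>,
    min: Option<i64>,
    max: Option<i64>,
}

/// Hypothetical per-file statistics.
#[derive(Debug, Clone, PartialEq)]
struct FileStats {
    num_rows: Precision<usize>,
    columns: Vec<ColumnStats>,
}

/// Merge two sets of statistics, assuming both describe the same schema
/// (same number and order of columns).
fn merge(a: &FileStats, b: &FileStats) -> FileStats {
    let columns = a
        .columns
        .iter()
        .zip(&b.columns)
        .map(|(x, y)| ColumnStats {
            null_count: x.null_count.sum_with(y.null_count),
            // A merged bound is only known if both inputs know theirs.
            min: x.min.zip(y.min).map(|(l, r)| l.min(r)),
            max: x.max.zip(y.max).map(|(l, r)| l.max(r)),
        })
        .collect();
    FileStats {
        num_rows: a.num_rows.sum_with(b.num_rows),
        columns,
    }
}

/// Fold any number of statistics into one; returns `None` for empty input.
fn merge_all(stats: &[FileStats]) -> Option<FileStats> {
    stats.iter().cloned().reduce(|acc, s| merge(&acc, &s))
}
```

Even in this toy version there are several decisions (what to do with a missing min, whether exact plus inexact stays usable) where independent implementations can easily diverge.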
Describe the solution you'd like
I would like to consolidate the three implementations into a StatisticsAggregator that knows how to aggregate multiple Statistics objects and that is both documented and well tested.
Describe alternatives you've considered
No response
Additional context
No response