Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate statistics aggregation #8229

Open
alamb opened this issue Nov 15, 2023 · 2 comments
Open

Consolidate statistics aggregation #8229

alamb opened this issue Nov 15, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Nov 15, 2023

Is your feature request related to a problem or challenge?

There are at least three places in DataFusion where multiple Statistics objects are aggregated together, and they do so inconsistently:

  1. get_statistics_with_limit: https://github.com/apache/arrow-datafusion/blob/e54894c39202815b14d9e7eae58f64d3a269c165/datafusion/core/src/datasource/statistics.rs#L34-L33
    2 . Parquet::infer_stats: https://github.com/apache/arrow-datafusion/blob/a892300a5a56c97b5b4ddc9aa4a421aaf412d0fe/datafusion/core/src/datasource/file_format/parquet.rs#L503-L581
  2. Union::statistics: https://github.com/apache/arrow-datafusion/blob/c2e768052c43e4bab6705ee76befc19de383c2cb/datafusion/physical-plan/src/union.rs#L612-L611

(and we actually have another version of this in IOx)

Describe the solution you'd like

I would like to consolidate the three implementations into a StatisticsAggregator that knows how to aggregate multiple Statistics objects that is both documented and well tested.

Describe alternatives you've considered

No response

Additional context

No response

@alamb
Copy link
Contributor Author

alamb commented Nov 20, 2023

Update here is I have a PR in progress and expect it to be ready for review in the next day or two

@alamb
Copy link
Contributor Author

alamb commented Dec 4, 2023

#8254 has the basics in place, but I need to work on other higher priority items this week and next so this has been pushed lower in my priority list

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant