Adapt column statistics API #717

Dandandan · 2021-07-13T13:19:26Z

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
While looking at adding support for more statistics to the Delta Lake TableProvider implementation, I bumped into a limitation in our statistics API.

Currently column_statistics is an Option<Vec<ColumnStatistics>>.

https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/datasource/datasource.rs#L37

So a provider has to return the statistics by (correct) column index, regardless of the order of the columns in the files.

Describe the solution you'd like
Either:

- Return a HashMap<String, ColumnStatistics> rather than an Option<Vec<ColumnStatistics>>.
- Pass a Schema parameter to TableProvider::statistics so the positions of the fields can be calculated.

FWIW, Delta Lake / delta-rs takes the first approach, which seems straightforward to implement and use.
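To make the two options concrete, here is a minimal sketch of how name-keyed statistics could be mapped onto schema positions. It does not use the real DataFusion types: `ColumnStatistics` is a simplified stand-in with hypothetical fields, and `StatsByName` and `align_to_schema` are names made up for illustration.

```rust
use std::collections::HashMap;

/// Simplified stand-in for DataFusion's `ColumnStatistics`
/// (the fields here are assumptions for illustration only).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ColumnStatistics {
    pub null_count: Option<usize>,
    pub distinct_count: Option<usize>,
}

/// Option 1: statistics keyed by column name, so the column order
/// in the underlying files does not matter.
pub type StatsByName = HashMap<String, ColumnStatistics>;

/// Option 2 (sketch): given the field names of a schema, place the
/// name-keyed statistics at the position of each field. Columns
/// without statistics get the default entry (all `None`).
pub fn align_to_schema(field_names: &[&str], by_name: &StatsByName) -> Vec<ColumnStatistics> {
    field_names
        .iter()
        .map(|name| by_name.get(*name).cloned().unwrap_or_default())
        .collect()
}
```

With something like this, a provider could collect statistics by name while reading file metadata and only convert to the positional Vec once the table schema is known.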

Describe alternatives you've considered

Additional context


Dandandan · 2021-07-13T13:41:49Z

Closing, since this can be done with the schema on the table provider instead.

rdettai · 2021-09-13T09:08:51Z

@Dandandan in #965 I used the schema from the ExecutionPlan trait and it worked fine. But I do agree that it might be better to come up with a data structure that helps assert that the column_statistics vector is well aligned with the schema fields vector (same size, same types...). I'm adding this as an item in #997, so if you want to close this for now that's fine by me 😃
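As a rough illustration of that idea, a wrapper could check the alignment once at construction time. Everything below is hypothetical and not the DataFusion API: `DataType`, `Field` and `ColumnStatistics` are tiny stand-ins, the `data_type` field on the statistics is an assumption made so the types can be compared, and `AlignedStatistics` is an invented name.

```rust
/// Stand-in for an Arrow data type (hypothetical, two variants only).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Utf8,
}

/// Stand-in for a schema field.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
}

/// Stand-in for per-column statistics; carrying `data_type` here is an
/// assumption that lets the constructor compare types.
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnStatistics {
    pub data_type: DataType,
    pub null_count: Option<usize>,
}

/// Statistics that are guaranteed to line up with the schema fields.
#[derive(Debug)]
pub struct AlignedStatistics {
    fields: Vec<Field>,
    column_statistics: Vec<ColumnStatistics>,
}

impl AlignedStatistics {
    /// Fails if the two vectors differ in length or in any column type.
    pub fn try_new(
        fields: Vec<Field>,
        column_statistics: Vec<ColumnStatistics>,
    ) -> Result<Self, String> {
        if fields.len() != column_statistics.len() {
            return Err(format!(
                "expected {} column statistics, got {}",
                fields.len(),
                column_statistics.len()
            ));
        }
        for (i, (field, stats)) in fields.iter().zip(&column_statistics).enumerate() {
            if field.data_type != stats.data_type {
                return Err(format!("type mismatch at column {} ({})", i, field.name));
            }
        }
        Ok(Self { fields, column_statistics })
    }

    /// Looks up the statistics of a column by name.
    pub fn get(&self, name: &str) -> Option<&ColumnStatistics> {
        self.fields
            .iter()
            .position(|f| f.name == name)
            .map(|i| &self.column_statistics[i])
    }
}
```

Because the check happens in `try_new`, later lookups by index or by name would not need to re-validate sizes or types.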

Dandandan added the enhancement (New feature or request) label on Jul 13, 2021

rdettai mentioned this issue Aug 31, 2021

Moving cost based optimizations to physical planning #962

Closed

rdettai mentioned this issue Sep 13, 2021

Improve statistics (umbrella issue) #997

Open

14 tasks
