@Dandandan in #965 I used the schema from the ExecutionPlan trait and it worked fine. But I do agree that it might be better to come up with a data structure that helps assert that the column_statistics vector is well aligned with the schema fields vector (same size, same types...). I'm adding this as an item in #997, so if you want to close this for now that's fine by me 😃
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
While looking at adding support for more statistics on the Delta Lake TableProvider implementation I bumped into a limitation in our statistics API.
Currently column_statistics is an Option<Vec<ColumnStatistics>>: https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/datasource/datasource.rs#L37
So the provider should return each column's statistics at the correct index, regardless of the column order in the files.
Describe the solution you'd like
Either:
- a HashMap<String, ColumnStatistics> rather than an Option<Vec<ColumnStatistics>>, or
- a Schema parameter to TableProvider::statistics, so the positions of the fields can be calculated.
FWIW, Delta Lake / delta-rs takes the first approach, and it seems straightforward to implement and use.
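To make the first approach concrete, here is a minimal sketch of how name-keyed statistics could be lined up with schema order on the consumer side. It uses only the standard library; `ColumnStats` and `align_to_schema` are hypothetical stand-ins for illustration, not the DataFusion or delta-rs API, and the schema is reduced to a slice of field names.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical stand-in for a per-column statistics struct.
/// The real DataFusion `ColumnStatistics` has its own set of fields.
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnStats {
    pub null_count: Option<usize>,
    pub distinct_count: Option<usize>,
}

/// Reorders statistics keyed by column name into schema order.
///
/// `schema_fields` is assumed to list the field names in schema order.
/// Columns without statistics yield `None`, so the result always has
/// exactly one entry per schema field.
pub fn align_to_schema(
    stats_by_name: &HashMap<String, ColumnStats>,
    schema_fields: &[&str],
) -> Vec<Option<ColumnStats>> {
    schema_fields
        .iter()
        .map(|name| stats_by_name.get(*name).cloned())
        .collect()
}
```

With a map like this the provider never needs to know field positions: whoever holds the schema does the lookup by name, and the order of columns in the files stops mattering.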
Describe alternatives you've considered
Additional context