Is your feature request related to a problem or challenge? Please describe what you are trying to do.
When the Ballista scheduler or executor deserializes a ParquetExec, it collects the statistics again, which is redundant. We should serialize the statistics to avoid this extra work.
Describe the solution you'd like
Add Parquet statistics to the serde module.
Describe alternatives you've considered
N/A
Additional context
N/A
In apache/datafusion#962 I am considering making the statistics part of the ExecutionPlan trait (and removing them from TableProvider). But I think that not all nodes will have a cached version of the statistics, only those nodes for which fetching them is expensive and that know they will not change.
We will probably not need the statistics on the executor, because I doubt that any re-optimization will take place there. So it might be an optimization further down the road to optionally leave them out of the serialization in that case.
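To make the idea concrete, here is a rough sketch of a scan node whose cached statistics travel with the serialized plan, and can optionally be left out when the receiver (for example an executor) does not need them. Everything in it is an assumption for illustration: `ScanStatistics`, `ParquetScanMessage`, `encode`, `decode`, and the hand-rolled byte layout are hypothetical and are not Ballista's actual protobuf-based serde or DataFusion's real types.

```rust
// Hypothetical, simplified stand-in for the statistics a Parquet scan caches.
#[derive(Debug, Clone, PartialEq)]
pub struct ScanStatistics {
    pub num_rows: Option<u64>,
    pub total_byte_size: Option<u64>,
}

// Hypothetical wire-level view of a Parquet scan node.
#[derive(Debug, Clone, PartialEq)]
pub struct ParquetScanMessage {
    pub path: String,
    pub statistics: Option<ScanStatistics>,
}

// Write a presence tag (0 or 1) followed by the value, if any.
fn put_opt_u64(buf: &mut Vec<u8>, value: Option<u64>) {
    match value {
        Some(v) => {
            buf.push(1);
            buf.extend_from_slice(&v.to_le_bytes());
        }
        None => buf.push(0),
    }
}

// Split `n` bytes off the front of the input, or fail if it is too short.
fn take<'a>(buf: &mut &'a [u8], n: usize) -> Option<&'a [u8]> {
    let slice: &'a [u8] = *buf;
    if slice.len() < n {
        return None;
    }
    let (head, tail) = slice.split_at(n);
    *buf = tail;
    Some(head)
}

// Outer Option: decoding succeeded. Inner Option: the value was present.
fn take_opt_u64(buf: &mut &[u8]) -> Option<Option<u64>> {
    match take(buf, 1)?[0] {
        0 => Some(None),
        1 => {
            let mut bytes = [0u8; 8];
            bytes.copy_from_slice(take(buf, 8)?);
            Some(Some(u64::from_le_bytes(bytes)))
        }
        _ => None,
    }
}

// Serialize the node. With `include_statistics == false` the statistics
// are dropped, e.g. when sending a plan to an executor.
pub fn encode(msg: &ParquetScanMessage, include_statistics: bool) -> Vec<u8> {
    let mut buf = Vec::new();
    let path = msg.path.as_bytes();
    buf.extend_from_slice(&(path.len() as u32).to_le_bytes());
    buf.extend_from_slice(path);
    match (&msg.statistics, include_statistics) {
        (Some(stats), true) => {
            buf.push(1);
            put_opt_u64(&mut buf, stats.num_rows);
            put_opt_u64(&mut buf, stats.total_byte_size);
        }
        _ => buf.push(0),
    }
    buf
}

// Deserialize the node. Statistics are read from the bytes, never recomputed.
pub fn decode(mut buf: &[u8]) -> Option<ParquetScanMessage> {
    let mut len = [0u8; 4];
    len.copy_from_slice(take(&mut buf, 4)?);
    let path_len = u32::from_le_bytes(len) as usize;
    let path = String::from_utf8(take(&mut buf, path_len)?.to_vec()).ok()?;
    let statistics = match take(&mut buf, 1)?[0] {
        0 => None,
        1 => Some(ScanStatistics {
            num_rows: take_opt_u64(&mut buf)?,
            total_byte_size: take_opt_u64(&mut buf)?,
        }),
        _ => return None,
    };
    Some(ParquetScanMessage { path, statistics })
}
```

In this sketch, `decode(&encode(&msg, true))` gives back the node with its statistics intact, so the receiver has no reason to touch the Parquet files again, while `encode(&msg, false)` produces a smaller message whose decoded `statistics` field is `None`.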