Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently the TableProvider implementations are split by file format (Parquet, CSV...). An alternative would be to organize TableProviders by table format (file system listing, Iceberg, Delta).
Describe the solution you'd like
ExecutionPlan implementations would remain organized by file format. A TableProvider could create different types of execution plans according to its configuration, or by auto-discovering the data file format from the information stored in the table format.
The current implementations for Parquet, CSV, JSON and Avro would go into a ListingTable provider. Implicitly, the table format currently implemented:
- is given a directory as input
- discovers the files using the file system "listing" operation
Schema inference, when required, would be resolved outside the TableProvider and would be exposed as a service by Ballista.
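To make the proposal more concrete, here is a minimal sketch of a provider organized by table format that picks a per-file-format plan. It uses only the standard library. The `TableProvider` trait, `ScanPlan`, `FileFormat` and the extension-based discovery are all simplified, hypothetical stand-ins and do not match DataFusion's real signatures. The real traits are async and work with schemas, projections and object stores.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// File formats that each have their own execution plan (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileFormat {
    Parquet,
    Csv,
    Json,
    Avro,
}

impl FileFormat {
    /// Guess the format from the file extension; `None` if it is unknown.
    /// Assumption: the extension is a good enough hint for this sketch.
    pub fn from_path(path: &Path) -> Option<Self> {
        match path.extension()?.to_str()?.to_ascii_lowercase().as_str() {
            "parquet" => Some(Self::Parquet),
            "csv" => Some(Self::Csv),
            "json" => Some(Self::Json),
            "avro" => Some(Self::Avro),
            _ => None,
        }
    }
}

/// Stand-in for a per-format ExecutionPlan: which format to scan, and which files.
#[derive(Debug, PartialEq, Eq)]
pub struct ScanPlan {
    pub format: FileFormat,
    pub files: Vec<PathBuf>,
}

/// Simplified stand-in for the TableProvider trait (not the DataFusion signature).
pub trait TableProvider {
    fn scan(&self) -> io::Result<ScanPlan>;
}

/// Table format "file system listing": a directory plus an optional format.
pub struct ListingTable {
    pub dir: PathBuf,
    /// `None` means the format is auto-discovered from the listed files.
    pub format: Option<FileFormat>,
}

impl TableProvider for ListingTable {
    fn scan(&self) -> io::Result<ScanPlan> {
        // Discover the files with the file system "listing" operation.
        let mut files = Vec::new();
        for entry in fs::read_dir(&self.dir)? {
            let path = entry?.path();
            if path.is_file() {
                files.push(path);
            }
        }
        files.sort();

        // Use the configured format, or fall back to the first recognizable file.
        let format = match self.format {
            Some(format) => format,
            None => files
                .iter()
                .find_map(|p| FileFormat::from_path(p))
                .ok_or_else(|| {
                    io::Error::new(io::ErrorKind::InvalidData, "no file with a known format")
                })?,
        };

        // Keep only the files that belong to the selected format.
        files.retain(|p| FileFormat::from_path(p) == Some(format));
        Ok(ScanPlan { format, files })
    }
}
```

With this shape, an Iceberg or Delta provider would be another `TableProvider` implementation that reads its file list from table metadata instead of a directory listing, while reusing the same per-format plans.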
Describe alternatives you've considered
An alternative is to leave the table providers organized as they are and handle table formats at a different stage of planning. This is discussed in this design document.
This ListingTable provider could also be added in an external crate. But in that case it would be a partial fork of DataFusion that would need to be maintained separately.
Additional context
TableDescriptor abstraction added in FilePartition and PartitionedFile for scanning flexibility #932