Reorganize table providers by table format #1009

Closed
rdettai opened this issue Sep 16, 2021 · 0 comments · Fixed by #1010
Labels
enhancement New feature or request


rdettai commented Sep 16, 2021

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently, the TableProvider implementations are split by file format (Parquet, CSV...). Another way to organize TableProviders would be by table format (file system listing, Iceberg, Delta).

Describe the solution you'd like

  • ExecutionPlan implementations would remain organized by file format. A TableProvider could create different types of execution plan according to its configuration, or by auto-discovering the data file format from the information stored in the table format
  • the current implementations for Parquet, CSV, JSON and Avro would go into a ListingTable provider. The table format currently implemented implicitly:
    • is given a directory as input
    • discovers the files using the file system "listing" operation
  • Schema inference, when required, would be resolved outside the TableProvider and would be exposed as a service by Ballista
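
To make the proposal above more concrete, here is a minimal sketch of a provider organized by table format that picks a file-format-specific plan. It is an illustration only, not DataFusion's API: `TableProviderSketch`, `ScanPlan`, `FileFormat` and the fields of `ListingTable` are hypothetical names, and detecting the format from file extensions is an assumed stand-in for "auto-discovering the data file format".

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// File formats; in the proposal each one keeps its own ExecutionPlan implementation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileFormat {
    Parquet,
    Csv,
    Json,
    Avro,
}

impl FileFormat {
    /// Assumption: guess the format from the file extension.
    pub fn from_extension(path: &Path) -> Option<Self> {
        match path.extension()?.to_str()? {
            "parquet" => Some(Self::Parquet),
            "csv" => Some(Self::Csv),
            "json" => Some(Self::Json),
            "avro" => Some(Self::Avro),
            _ => None,
        }
    }
}

/// Hypothetical stand-in for an execution plan: which format to scan, over which files.
#[derive(Debug, PartialEq)]
pub struct ScanPlan {
    pub format: FileFormat,
    pub files: Vec<PathBuf>,
}

/// Hypothetical, heavily simplified provider trait (the real TableProvider differs).
pub trait TableProviderSketch {
    fn scan(&self) -> io::Result<ScanPlan>;
}

/// Table format that is given a directory and discovers files by listing it.
pub struct ListingTable {
    pub dir: PathBuf,
    /// Explicit configuration; `None` means "discover the format from the files".
    pub format: Option<FileFormat>,
}

impl TableProviderSketch for ListingTable {
    fn scan(&self) -> io::Result<ScanPlan> {
        // Discover the data files with the file system "listing" operation.
        let mut files = Vec::new();
        for entry in fs::read_dir(&self.dir)? {
            let path = entry?.path();
            if path.is_file() {
                files.push(path);
            }
        }
        files.sort();

        // Use the configured format, otherwise the first recognizable extension.
        let format = self
            .format
            .or_else(|| files.iter().find_map(|f| FileFormat::from_extension(f)))
            .ok_or_else(|| {
                io::Error::new(io::ErrorKind::InvalidData, "could not determine file format")
            })?;

        Ok(ScanPlan { format, files })
    }
}
```

An Iceberg or Delta provider would implement the same trait but obtain its file list and format from the table metadata instead of a directory listing. The sketch does not handle directories with mixed file formats.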

Describe alternatives you've considered
An alternative is to leave the table providers organized as is and try to solve the table formats at a different moment of the planning. This is discussed in this design document.

This ListingTable provider could also be added in an external crate. But in that case it would be a partial fork of DataFusion that would need to be maintained separately.

Additional context
