Support more file format #310

Closed
kettlelinna opened this issue Nov 9, 2023 · 4 comments
Comments

@kettlelinna

Is your feature request related to a problem? Please describe.
Blaze only supports the Parquet file format so far, and its Parquet reader is customized, even though DataFusion already implements a Parquet source.

Describe the solution you'd like
Can we use DataFusion's reader interface? I think it would be easier to extend, and DataFusion already provides multiple readers.

@richox
Collaborator

richox commented Nov 9, 2023

The customized ParquetExec is designed for reading data directly from HDFS via JNI (we don't use object-store or libhdfs because they are too hard to use in a production environment).
I don't think DataFusion's Reader interface outperforms the current ExecutionPlan/SendableRecordBatchStream implementation, and I'm not attracted to DataFusion's built-in formats (like CSV and JSON), as they are not widely used in Spark.

@kettlelinna
Author

Yeah, I see. But it is hard to extend data sources now, because there is no extension interface to support that. If we used DataFusion's readers, we could easily add data sources such as Delta Lake, Avro, etc.

@richox
Collaborator

richox commented Nov 10, 2023

> yeah, I see. but it is hard to extend data source now, it doesn't have extend interface to support that. we can easy to extend datasource if we use datafusion reader something like deltalake, avro, etc

It would be hard. Different formats have lots of specialized reading logic, such as pruning, data type conversion, delimiting, and so on. I don't have any idea how to design an input format interface yet.

@richox
Collaborator

richox commented Jul 4, 2024

related to #498

@richox richox closed this as completed Jul 4, 2024