Bulk load for greptimedb #405
I've investigated bulk loading parquet files last week. Since parquet is our native (and only) supported format, we only need to supply a manifest and our specific metadata (in persistent storage and in the meta server) to make parquet files queryable and even writable. But what about other formats like CSV or JSON? They cannot be directly queried (for now). The two approaches I came up with are
And in a cluster, we would also have to split the file according to the table's partition rule. This is better done in the frontend via some custom SQL. We can also let the frontend deal with more formats like CSV or JSON, converting them to parquet internally.
Yes. We can let the frontend preprocess (split) the file and upload all the pieces to OSS.
I also prefer converting other formats to parquet. Supporting them directly would not be complex, but considering possible modifications in the future, it would be better to unify on one format.
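The frontend-side split described above can be sketched roughly as follows. This is a minimal illustration assuming a simple range partition rule on an integer key column; the bounds, column name, and helper functions are hypothetical and do not reflect GreptimeDB's actual partition API:

```python
import csv
import io

# Hypothetical range partition bounds on an integer key column:
# partition 0: key < 100, partition 1: 100 <= key < 200, partition 2: key >= 200
PARTITION_BOUNDS = [100, 200]

def partition_for(key: int) -> int:
    """Return the index of the range partition the key falls into."""
    for i, bound in enumerate(PARTITION_BOUNDS):
        if key < bound:
            return i
    return len(PARTITION_BOUNDS)

def split_csv_by_partition(text: str, key_column: str) -> dict:
    """Split CSV text into per-partition row lists, keyed by partition index.

    In the scheme discussed above, each bucket would then be converted to
    parquet and uploaded to OSS as a separate file.
    """
    parts: dict = {}
    for row in csv.DictReader(io.StringIO(text)):
        idx = partition_for(int(row[key_column]))
        parts.setdefault(idx, []).append(row)
    return parts

sample = "id,value\n42,a\n150,b\n999,c\n7,d\n"
parts = split_csv_by_partition(sample, "id")
# rows with id 42 and 7 fall into partition 0, 150 into 1, 999 into 2
```

In a real frontend this split would stream rather than buffer whole files in memory, and the partition rule would come from the table's metadata in the meta server instead of a hard-coded list.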
Bulk load data from sources, such as: