Bulk load for greptimedb #405

killme2008 · 2022-11-07T00:55:58Z

Bulk load data from sources, such as:

csv file
json file
parquet file
other tables
mysql table
....

waynexia · 2022-11-07T06:42:36Z

I investigated bulk loading parquet files last week. As parquet is our only natively supported format, we only need to supply a manifest and our specific metadata (in persistent storage and in the meta server) to make parquet files query-able and even writable.

But what about other formats like csv or json? They cannot be queried directly (for now). The two approaches I came up with are:

an offline converter that converts other formats into parquet, after which we ingest the converted parquet file.
add support for those formats.
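
As a rough illustration of the first approach, the sketch below covers only the front half of such a converter: parsing csv or JSON-lines input into rows and pivoting them into columns, which is the shape a columnar writer expects. The function names `read_rows` and `to_columns` are hypothetical, and the parquet writing step is left out because it needs a third-party library; nothing here reflects GreptimeDB's actual implementation.

```python
import csv
import io
import json


def read_rows(text, fmt):
    """Parse csv or JSON-lines text into a list of dict rows (hypothetical helper)."""
    if fmt == "csv":
        # csv carries no type information, so every value stays a string here.
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]
    if fmt == "json":
        # One JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported format: {fmt}")


def to_columns(rows):
    """Pivot rows into {column name: list of values}, filling gaps with None."""
    names = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)
    return {name: [row.get(name) for row in rows] for name in names}
```

A real converter would also have to infer or be told a schema, since csv values arrive untyped, before handing the columns to a parquet writer.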

sunng87 · 2022-11-10T16:05:16Z

> make parquet files query-able and even writable.

And in a cluster, we would have to split the file according to the table's partition rule as well? This is better done in the frontend via some custom SQL like COPY INTO.

And let the frontend deal with more formats like csv or json. We can convert them to parquet internally.
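
The partition-aware split mentioned above could look roughly like the following. This is a minimal sketch that assumes a range partition rule on a single column; the thread does not describe the actual rule, and `split_by_range`, `column`, and `bounds` are hypothetical names chosen for illustration.

```python
from bisect import bisect_right


def split_by_range(rows, column, bounds):
    """Group rows into len(bounds) + 1 partitions by range on `column`.

    `bounds` must be sorted ascending. Partition 0 holds values below
    bounds[0], partition i holds bounds[i-1] <= value < bounds[i], and the
    last partition holds values at or above bounds[-1].
    """
    parts = [[] for _ in range(len(bounds) + 1)]
    for row in rows:
        # bisect_right returns how many bounds are <= the value,
        # which is exactly the index of the partition the row belongs to.
        parts[bisect_right(bounds, row[column])].append(row)
    return parts
```

Each resulting group would then be written out as its own file, so that every file maps to exactly one partition.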

waynexia · 2022-11-11T04:09:58Z

> And in a cluster, we would have to split the file according to the table's partition rule as well?

Yes. We can let the frontend preprocess (split) it and upload all the resulting files to OSS.

> And let the frontend deal with more formats like csv or json. We can convert them to parquet internally.

I also prefer converting other formats to parquet. Supporting them directly is not complex, but considering possible modifications in the future, it would be better to unify the format.

killme2008 · 2023-05-08T07:17:59Z

Already implemented in #1038 and #1064.

killme2008 added the C-enhancement Category Enhancements label Nov 7, 2022

waynexia mentioned this issue Nov 21, 2022

Export persisted data #604

Closed

killme2008 mentioned this issue Jan 28, 2023

Supports INSERT INTO SELECT statement #760

Closed

killme2008 closed this as completed May 8, 2023
