Example for table_ddl option usage? #33
-
Could you give an example of the table_ddl option? I haven't found the case in the documentation yet. I have one highly deep and very large json file in multiple chunks, and it appears that sling is able to import the single chunk correctly, building a nice table in the PostgreSQL DB. Some field types still need to be specified more precisely/differently, so I decided to do it for the entire table using DDL file. Can ddl be provided as a sql file, or which format of the table_ddl is expected here? |
Beta Was this translation helpful? Give feedback.
Replies: 5 comments
-
Beta Was this translation helpful? Give feedback.
-
For the source: S3
target: PG
streams:
's3://bucket/path/to/json_folder':
source_options:
flatten: true
target_options:
table_ddl: |
create or replace table "my_schema"."my_table" (
....
)
object: '"my_schema"."my_table"' # make sure to fully qualify so matches in DDL
mode: full-refresh |
Beta Was this translation helpful? Give feedback.
-
The idea of ingesting the entire directory with json chunks works perfectly. Thanks! The flattening and import of 500 json files (approx 2 Gb size) took roughly 10 minutes. That is indeed quick. But if I'm not mistaken, the 30-40% of the time were spent for migrating columns from the temp table to the final table in the PostgreSQL DB. I know, it is not correct in terms of operations of persistent storage, but is it possible to prevent creation of the temp table and spare that time? There are another tools and procedures to verify the imported table in our workflow. Ok, still need to QC the imported table, and compare with other methods, but I'm thinking about including sling package into the data import pipeline. So, my next step will be to try out the package for importing some 20-200 GB tables, and if that works, a switch to prevent creation of temp tables to save that time would be helpful. |
Beta Was this translation helpful? Give feedback.
-
Interesting suggestion. I guess if it's not a The temp table is a must in those cases ( That would also not work for However, that could work in I'll open a branch for this. 👍 |
Beta Was this translation helpful? Give feedback.
-
Opened #35 |
Beta Was this translation helpful? Give feedback.
For the
table_ddl
, here is a replication YAML example.