Specify encodings for materialised tables on Redshift #1016
Comments
hey @jdub! Thanks for creating this issue. I want to document my thinking around 1) why this doesn't exist already and 2) how we could make it happen in the future.
Creating an empty table from DDL has always been problematic in dbt, as we don't really have a good place to put that DDL code. Moreover, I'd argue that dbt exists to help mitigate the need to write DDL at all! These days, I consider "needing to care about column encoding" a limitation of Redshift, rather than a feature.
The new v2 schema.yml files support column specifications. I think there might be an opportunity to augment these schema specs with things like column types and encodings. dbt could then use the schema spec for a model, coupled with model configs (like sort/dist keys), to generate the DDL for a table. It could run this DDL, then use the model SQL to insert into the table.

I like that this approach doesn't force users to write DDL directly, but I fear it might be a little constraining. There are heaps of other table DDL options, like foreign keys / unique keys / etc., that I don't have a ton of interest in supporting in dbt. It's just not the right tool for that particular job. So, it could be that the better option is to just allow users to write their own DDL.

Regarding some specific things you mentioned in the issue:
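As a rough illustration of that idea, an augmented schema spec might look something like this. Note that the `data_type` and `encoding` keys here are hypothetical, sketching the proposal rather than describing an existing dbt feature:

```yaml
version: 2

models:
  - name: my_model
    columns:
      - name: id
        data_type: bigint        # hypothetical: dbt would use this to build the DDL
        encoding: raw            # hypothetical key; sort keys are typically left unencoded
      - name: payload
        data_type: varchar(1024)
        encoding: zstd
```

dbt would combine this spec with the model's sort/dist key configs to emit a `CREATE TABLE`, then run the model SQL as an `INSERT`.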
Advanced approach: Specify an encoding for individual columns, perhaps falling back to a default.
Definitely keen to hear your thoughts here!
Aha! The
I'm going to close this - until Redshift provides a mechanism for configuring column encodings in a
closing (again) in favor of #1438
In my experience, the workflow in Redshift to add compression to an existing table is to rename that table, make a new table with the desired encodings, insert the data from the previous table into the new table, then drop the previous table.

In my dbt setup, my biggest models are also incremental models. In theory, I can sneak into Redshift, run the above operation on my already-existing incremental tables, and then as long as the table isn't dropped or a

Is it possible to integrate the above steps into the dbt job as some sort of post-hook that only runs after

Apologies if this is already possible, as I am far from a dbt expert!
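The four-step workflow described above can be sketched in Redshift SQL (the schema, table, and column names here are illustrative):

```sql
-- 1. Move the existing table out of the way
alter table analytics.events rename to events_old;

-- 2. Recreate it with explicit column encodings
create table analytics.events (
    event_id   bigint         encode raw,   -- sort key column: leave unencoded
    event_type varchar(64)    encode zstd,
    payload    varchar(65535) encode zstd
)
sortkey (event_id);

-- 3. Copy the data across
insert into analytics.events
select * from analytics.events_old;

-- 4. Clean up
drop table analytics.events_old;
```

Running these steps in a single transaction avoids a window where the table is missing for downstream queries.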
hey @dataxpress - check out this macro. It implements something like what you're describing, using

I'd either invoke this macro from a post-hook (checking the
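One way to wire a macro like that into an incremental model is via a post-hook in the model's config block. The macro name `compress_table` below is hypothetical; the `post_hook` config and the `is_incremental()` check are standard dbt:

```sql
{{
    config(
        materialized='incremental',
        post_hook="{{ compress_table(this) }}"  -- hypothetical macro that rebuilds {{ this }} with encodings
    )
}}

select * from {{ source('raw', 'events') }}

{% if is_incremental() %}
-- only pull rows newer than what's already in the target table
where event_ts > (select max(event_ts) from {{ this }})
{% endif %}
```

The macro itself would need to guard against re-running the rebuild on every incremental invocation, e.g. by checking whether the table's encodings already match the desired state.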
Feature description
My materialised tables are created with pretty good defaults -- no encoding for sort keys, lzo for everything else -- but I'd like to switch to zstd, and perhaps provide my own per-column settings.
Basic approach: Specify a single encoding for unsorted columns.
Advanced approach: Specify an encoding for individual columns, perhaps falling back to a default.
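As a concrete sketch of what the two approaches might produce, here is the kind of DDL dbt could generate (table and column names are illustrative, not part of the proposal):

```sql
-- Basic approach: one default encoding (zstd) for every unsorted column
create table analytics.my_model (
    id         bigint        encode raw,   -- sort key column, left unencoded
    created_at timestamp     encode zstd,
    payload    varchar(4096) encode zstd
)
sortkey (id);
```

The advanced approach would simply let individual columns override that default on a per-column basis.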
Who will this benefit?
Redshift users who want to customise their data encoding. I'd hope that'd be a lot of dbt users!