Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom table DDL integrated with the dbt flow #1017

Closed
jdub opened this issue Sep 20, 2018 · 2 comments
Closed

Custom table DDL integrated with the dbt flow #1017

jdub opened this issue Sep 20, 2018 · 2 comments

Comments

@jdub
Copy link

jdub commented Sep 20, 2018

Feature description

I'd like to keep my DDL in dbt, and use dbt conveniences (loops) to create tables. But, I also want this to integrate nicely with the dbt flow (particularly with materialised tables, etc).

One example problem I've faced a number of times: I use AWS's Data Migration Service (DMS) to replicate data from PostgreSQL into Redshift. DMS doesn't create particularly good tables by default, so usually you'd create them before starting the DMS task. With sharded tables and databases, that can mean a lot of tables! It'd be great to define a table creation "template" in dbt, and then use it in an arbitrary Jinja/Python loop.

Weird idea (note, coming from a dbt beginner):

project/tables/history.sql =

create table {{ schema }}.{{ table }} (
  id integer primary key,
  title varchar(100) not null encode zstd,
  date timestamp with time zone not null encode zstd
);

Then in a sensible place for table creation, I'd make a little loop to create tables with the history template with names like, history_1, history_2 (two shards).

Perhaps for materialisations, one of the config parameters might be which table template to use?

Who will this benefit?

Redshift users (perhaps other backends if they support similar extensions in DDL) who want to create tables appropriately optimised for their data.

This might be related to #901 and #1016.

@drewbanin
Copy link
Contributor

hey @jdub - I think "source" table creation falls under the auspices of the E and L in ELT, and I'm not certain that creating source tables does fit into the dbt flow! dbt is built around repeatability and every dbt user can build the whole project from scratch. I'm not sure how that notion would translate into one-time operations like building source tables.

So, what I can imagine is dbt just running arbitrary jinja-sql scripts for you with some sort of context (think: vars, env_vars, a profile, etc). We're tracking that kind of thing over here: #582

I'm very ok with the idea of folks using dbt to do exactly what you're describing, but I'd hesitate to make it a first-class part of the tool. Lmk what you think about that!

@jdub
Copy link
Author

jdub commented Sep 21, 2018

Yeah, I think the operation concept is a good fit – I'm going to close this ticket in favour of it.

@jdub jdub closed this as completed Sep 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants