Snowflake create or replace #1409
Conversation
Thanks for opening this PR @bastienboutonnet - will give this a look today :)
Some cosmetic comments here, and a couple of areas to simplify these materializations even further. I really like your approach for reconciling that issue with the `on false` clause in the incremental materialization.
This is really stellar! Happy to discuss any of the comments I dropped in here, otherwise, let me know when this is ready for another look. At that point, I'll kick off the integration tests and we can hopefully get this merged :D
@drewbanin thanks a lot for reviewing this. I implemented most of your feedback. I still have a question regarding the …
…eate_or_replace" This reverts commit 4f62978.
```
{%- if unique_key is none -%}
{# -- if no unique_key is provided run regular insert as Snowflake may complain #}
insert into {{ target_relation }} ({{ dest_cols_csv }})
```
this is a really good fix for the `on false` issue with Snowflake's merge statements. Do you think it makes sense to put this logic here? Or should we move it into the Snowflake implementation of `get_merge_sql`?
I like the idea of making materializations represent business logic instead of database logic, as they become a lot more generalizable. Curious what you think!
I think that makes total sense! I actually was feeling a bit "awkward" about having this logic sit there, but I didn't think too much about where else it could live. This is a much better home, so I'm going to go ahead and change it as you suggest.
Great! I think this would be the place to implement it. If `unique_key` is provided, then we can proceed with `common_get_merge_sql`; otherwise we should return the `insert` statement you've built here.
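To make that concrete, here is a minimal sketch of the dispatch. The macro name `snowflake__get_merge_sql`, its argument list, and the `dest_columns` handling are assumptions based on the `common_get_merge_sql` mentioned above, not necessarily the final implementation:

```
{% macro snowflake__get_merge_sql(target, source, unique_key, dest_columns) -%}
  {%- set dest_cols_csv = dest_columns | map(attribute='name') | join(', ') -%}
  {%- if unique_key is none -%}
    {#-- no unique_key: a plain insert sidesteps the "on false" merge issue --#}
    insert into {{ target }} ({{ dest_cols_csv }})
    (
        select {{ dest_cols_csv }}
        from {{ source }}
    )
  {%- else -%}
    {#-- unique_key present: defer to the shared merge implementation --#}
    {{ common_get_merge_sql(target, source, unique_key, dest_columns) }}
  {%- endif -%}
{%- endmacro %}
```

The materialization then always calls `get_merge_sql`, and the adapter decides whether that means a `merge` or a plain `insert`.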
Yep, it's exactly what I just started doing!
One thing: I realised there are no incremental deletes anymore, and the merge statement doesn't call a delete. Do you think we need it here?
The previous implementation of `incremental` models on Snowflake used `delete` statements to approximate an upsert. Before, we did:
```
create temp table as (select * from model code)
delete from destination table where destination unique key = temp table unique key
insert into destination table (select * from temp table)
```
So, records were only deleted if they were going to be immediately re-inserted. We'd actually prefer not to call a `delete`, and instead use the `merge` to update these rows in-place. This should be handled by the `when matched` clause in the `merge` statement.
I do think there's a conversation to be had about performance. I wonder if there's any difference between:
- Deleting existing records and reinserting them (with new values)
- Updating existing records in place
An example
Destination table

| unique_key | value |
|---|---|
| 1 | abc |
| 2 | def |

Temp table (generated from model `select`)

| unique_key | value |
|---|---|
| 2 | ghi |
| 3 | xyz |

Desired destination table state

| unique_key | value |
|---|---|
| 1 | abc |
| 2 | ghi |
| 3 | xyz |
So, there are two ways to accomplish this desired end-state. We can either (pseudocode):
1. delete + insert

```
delete from destination table where id = 2
insert into destination table (id, value) values (2, ghi), (3, xyz)
```
2. update + insert (via `merge`)

```
merge into destination table
from temp table
when matched update -- updates the row with id = 2
when not matched insert -- adds the row with id = 3
```
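Concretely, the `merge` form of the worked example would look something like this in Snowflake SQL (table names are illustrative):

```
merge into destination_table as dest
using temp_table as src
    on dest.unique_key = src.unique_key
when matched then update
    set value = src.value                    -- updates the row with unique_key = 2
when not matched then insert (unique_key, value)
    values (src.unique_key, src.value);      -- inserts the row with unique_key = 3
```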
This does raise an interesting question about edge-case behavior with `merge`. What happens if there are duplicate `unique_key`s in either 1) the destination table or 2) the staging table?
Previously, it was straightforward to understand how the `delete` + `insert` pattern behaved. While having a duplicated `unique_key` would probably lead to undesirable results, the `insert` and `delete` queries would execute successfully.

With the `merge` implementation, I think users will see an error about non-deterministic results if their `unique_key` is not unique! All told, I think this will actually be a good thing, as it should help alert users to bugs in their model code.
Good catch. From what you say, here's what I think: merge is definitely the preferable option, and unless there's really a good reason for it, you should be getting an error if you're trying to insert dupes; there is probably something wrong with the source.

Alternatively, we could add support for the `ERROR_ON_NONDETERMINISTIC_MERGE` session parameter (when `FALSE`, Snowflake picks one of the duplicated rows and inserts it), but there doesn't seem to be a clear way to control which row gets selected, and I think this is just bad anyway. I don't really see the point of inserting a dupe row. So I agree with your last point in that comment, and I think the current implementation is cool.
This PR is in really good shape! Just one comment about non-destructive mode, and maybe an interesting discussion to have about the job of the …

Can you take a pass through and remove/update any "todo" comments in here? Definitely let me know if you still have outstanding questions about these things :)
@bastienboutonnet just fixed a merge conflict (we updated …)
Awesome! Should I be worried that it looks like many tests are failing?
Approved! Thanks for your hard work here @bastienboutonnet - this is going to be a really wonderful addition to dbt on Snowflake ❄️ 🎉 💯
Had a few chats with @drewbanin about solving an issue with Snowflake's lack of proper transactions, which would cause downtime on tables that ended up either truncated or dropped when doing full-refreshes of incremental tables or re-generating regular tables.

I originally suggested doing table swaps, but @drewbanin suggested we do `create or replace`, which actually makes a lot more sense and is neater in implementation (no need to create temporary tables that need to be cleaned up, etc.).

Regarding incremental logic for Snowflake, @drewbanin pointed out that some work had already started on using merge instead of inserts in PR #1307, so it made sense to build on top of that PR to solve the `on false` issue (well, "solve"...) and rework the materialisation logic of incremental runs and tables.

Aims:
Incremental Materialisation/Merge:
- If no `unique_key` is provided, we revert to a regular `insert ...`, as this seemed to cause issues with `on false`.
Full-refresh and table materialisations:
- Leverage `create or replace` in Snowflake for full-refreshes and table materialisations (which should allow deprecating `--non-destructive` in future versions); see the sketch below.
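For illustration, the `create or replace` approach boils down to a single atomic statement (table names here are made up):

```
-- The table is replaced atomically: readers never observe a dropped or
-- truncated table, unlike the previous drop/truncate-then-insert flow.
create or replace table analytics.my_model as (
    select id, value
    from raw.source_table
);
```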
in future versions)Relates to following issues:
#525
#1101