Set unique temp table suffix to allow parallel incremental executions #811

huangxingyi-git · 2024-09-26T15:07:07Z

Resolves #

Description

Set a unique temp table suffix to allow parallel incremental executions.
For some specific cases (e.g. backfilling a very large amount of data), we need to run multiple dbt run invocations of a specific incremental (replace_where) model in parallel, passing the date (or country) as a var argument.
For example, we have a model that we run every day using Airflow, to which we pass a date relative to the Airflow scheduler.
FYI
https://github.com/dbt-labs/dbt-athena/pull/650/files

If we want to process batches of N days in parallel using Airflow concurrency, the tmp table created by each dbt run needs to be unique. Otherwise, N inserts will attempt to run with the same __dbt_tmp name, which causes conflicts and ultimately failures.
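To illustrate the idea (this is a minimal sketch, not the code in this PR): each run derives its temp table name from the usual __dbt_tmp suffix plus a random component, so concurrent runs of the same model never share a name. The helper name unique_tmp_relation_name and the choice of uuid4 are assumptions made for this example only.

```python
import uuid


def unique_tmp_relation_name(model_name: str, base_suffix: str = "__dbt_tmp") -> str:
    """Return a temp table name that differs on every call.

    Hypothetical helper for illustration; it is not part of dbt-databricks.
    """
    # uuid4().hex is 32 lowercase hex characters with no hyphens,
    # so the result stays a plain alphanumeric/underscore identifier.
    return f"{model_name}{base_suffix}_{uuid.uuid4().hex}"


if __name__ == "__main__":
    # Two parallel backfill runs of the same model get different temp tables.
    print(unique_tmp_relation_name("orders"))
    print(unique_tmp_relation_name("orders"))
```

With a fixed suffix, both runs above would resolve to orders__dbt_tmp and collide. With the random component, each run writes to its own temp table before the replace_where insert.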

issue

Checklist

I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

benc-db · 2024-10-04T16:24:27Z

Apologies for the delay; I've been busy getting ready for the 1.9.0 release and Coalesce next week. The code looks straightforward, so I will just verify that it does not break the functional tests.

Signed-off-by: huang xingyi <[email protected]>

huangxingyi-git requested review from andrefurlan-db, benc-db and rcypher-databricks as code owners September 26, 2024 15:07

benc-db had a problem deploying to azure-prod October 4, 2024 16:33 — with GitHub Actions Failure

huangxingyi-git and others added 3 commits October 4, 2024 09:51

Set unique temp table suffix to allow parallel incremental executions

fef1d22

Signed-off-by: huang xingyi <[email protected]>

Set unique temp table suffix to allow parallel incremental executions

919f9be

Signed-off-by: huang xingyi <[email protected]>

apply black

2abb515

benc-db force-pushed the unique-temp-table-name branch from 2b7ec31 to 2abb515 Compare October 4, 2024 16:51

benc-db temporarily deployed to azure-prod October 4, 2024 17:01 — with GitHub Actions Inactive

benc-db previously approved these changes Oct 4, 2024

Changelog

87cfc5a

benc-db dismissed their stale review via 87cfc5a October 4, 2024 17:23

benc-db merged commit d2f3f82 into databricks:main Oct 4, 2024
18 checks passed
