Safe logic for concurrent executions #1132

Open
wants to merge 10 commits into
base: main
from

Conversation

@rattata2me rattata2me commented Jul 19, 2024

resolves #
docs

Problem

When an incremental model in a dbt project runs with the on_schema_change policy set to append_new_columns, two jobs that perform the column check concurrently both generate the same ALTER TABLE statement. This creates a race condition: one job's ALTER TABLE statement succeeds, while the slower job fails with a SQL compilation error stating that the column already exists. The root cause is that nothing guarantees the column schema stays unchanged between the column check and the execution of the ALTER TABLE statement, so failures can occur whenever jobs run concurrently.
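
The interleaving described above can be reproduced with a small Python sketch. It is a toy model only: `FakeTable`, `ColumnAlreadyExists`, `missing_columns`, and `simulate_interleaving` are hypothetical names invented for illustration, not dbt or Snowflake APIs, and the two "jobs" are interleaved by hand instead of running as real processes.

```python
class ColumnAlreadyExists(Exception):
    """Stand-in for the warehouse's 'column already exists' error."""


class FakeTable:
    """Toy in-memory table schema (hypothetical; not a real database client)."""

    def __init__(self, columns):
        self._columns = list(columns)

    def columns(self):
        return list(self._columns)

    def add_column(self, name):
        # Mimics a plain ALTER TABLE ... ADD COLUMN: fails on a duplicate.
        if name in self._columns:
            raise ColumnAlreadyExists(name)
        self._columns.append(name)


def missing_columns(table, wanted):
    """The 'column check' step: which wanted columns are absent right now?"""
    existing = set(table.columns())
    return [c for c in wanted if c not in existing]


def simulate_interleaving():
    table = FakeTable(["id"])
    wanted = ["id", "new_col"]

    # Both jobs run the check before either one alters the table.
    plan_a = missing_columns(table, wanted)
    plan_b = missing_columns(table, wanted)

    outcomes = []
    for job, plan in (("job_a", plan_a), ("job_b", plan_b)):
        try:
            for col in plan:
                table.add_column(col)
            outcomes.append((job, "ok"))
        except ColumnAlreadyExists:
            # The slower job acts on a stale plan and fails.
            outcomes.append((job, "column already exists"))
    return outcomes, table.columns()
```

In this model both plans contain `new_col` because each check ran against the old schema, so the second job to apply its plan hits the duplicate-column error even though the table ends up in the desired state.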

Solution

The solution is to add IF NOT EXISTS and IF EXISTS clauses to the ALTER TABLE statements, so that a column is added only if it does not already exist and dropped only if it exists.
This approach has some drawbacks: more complex SQL logic, the potential to mask underlying schema synchronization issues, minor performance impacts, and the risk of partial schema updates. It is also specific to Snowflake and other databases that support these clauses, which makes it less portable to other database systems.
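
As a rough illustration of the statements this approach produces, the sketch below builds them in Python. It is not the adapter's actual macro (which is written in Jinja): `build_alter_statements` is a hypothetical helper, it emits one statement per column for simplicity, and it assumes the target database accepts `IF NOT EXISTS` / `IF EXISTS` on column operations.

```python
def build_alter_statements(relation, add_columns=(), drop_columns=()):
    """Return idempotent ALTER TABLE statements as strings.

    relation:     fully qualified table name, used as-is (no quoting applied)
    add_columns:  iterable of (column_name, data_type) pairs
    drop_columns: iterable of column names
    """
    statements = []
    for name, data_type in add_columns:
        # Safe to run twice: the second run is a no-op instead of an error.
        statements.append(
            f"alter table {relation} add column if not exists {name} {data_type}"
        )
    for name in drop_columns:
        # Likewise, dropping an already-dropped column does not fail.
        statements.append(
            f"alter table {relation} drop column if exists {name}"
        )
    return statements
```

For example, `build_alter_statements("db.schema.events", add_columns=[("new_col", "varchar")])` yields a single `alter table db.schema.events add column if not exists new_col varchar` statement, which either of two concurrent jobs can run without the slower one failing.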

Checklist

  • [x] I have read the contributing guide and understand what's expected of me
  • [x] I have run this code in development and it appears to resolve the stated issue
  • [x] This PR includes tests, or tests are not required/relevant for this PR
  • [x] This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

@rattata2me rattata2me requested a review from a team as a code owner July 19, 2024 11:35
@cla-bot cla-bot bot added the cla:yes label Jul 19, 2024
@rattata2me
Author

rattata2me commented Jul 19, 2024

This is my proposed solution to the concurrency issues described in #1123. Right now I cannot think of any major drawbacks to fixing the issue with the 'IF EXISTS' clauses; any feedback on this is appreciated.

@colin-rogers-dbt
Contributor

By 'concurrent executions' do you mean performing dbt commands like dbt run against the same target in parallel?

@rattata2me
Author

rattata2me commented Jul 24, 2024

Yes, precisely. This happens when executing dbt run in two separate processes at the same time against the same target table. More info in #1123.
