Safe logic for concurrent executions #1132
This is my proposed solution to the concurrency issues described in #1123. Right now I cannot think of any major drawbacks to fixing the issue by using the 'IF EXISTS' statements; any feedback on this is appreciated.
By 'concurrent executions' do you mean performing dbt commands like |
Yes precisely, this happens when executing |
resolves #
docs
Problem
When an incremental table runs with the on_schema_change policy set to append_new_columns in a dbt project, two jobs that perform the column check operation concurrently both generate the same ALTER TABLE statement. This creates a race condition: one job's ALTER TABLE statement succeeds, while the slower executor encounters a SQL compilation error stating that the column already exists. The root cause is that nothing guarantees the column schema remains unchanged between the column check operation and the execution of the ALTER TABLE statement, so the check can be stale by the time the statement runs.
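To make the interleaving concrete, here is a minimal Python sketch of the check-then-alter race. It does not talk to a warehouse: `FakeTable`, `ColumnAlreadyExists`, `missing_columns`, and `simulate_race` are hypothetical names invented for illustration, and the interleaving (both jobs check before either alters) is forced by hand rather than produced by real threads.

```python
class ColumnAlreadyExists(Exception):
    """Stands in for the warehouse's 'column already exists' compilation error."""


class FakeTable:
    """Hypothetical in-memory stand-in for the target table."""

    def __init__(self, columns):
        self.columns = set(columns)

    def add_column(self, name):
        # Mimics an unguarded ALTER TABLE ... ADD COLUMN.
        if name in self.columns:
            raise ColumnAlreadyExists(name)
        self.columns.add(name)


def missing_columns(table, source_columns):
    """The 'column check' step: which source columns does the target lack?"""
    return [c for c in source_columns if c not in table.columns]


def simulate_race():
    table = FakeTable(["id"])
    source = ["id", "new_col"]

    # Both jobs run the column check before either one alters the table,
    # so both end up with the same plan.
    plan_a = missing_columns(table, source)
    plan_b = missing_columns(table, source)

    outcomes = {}
    for job, plan in (("job_a", plan_a), ("job_b", plan_b)):
        try:
            for col in plan:
                table.add_column(col)
            outcomes[job] = "ok"
        except ColumnAlreadyExists as exc:
            outcomes[job] = f"failed: column '{exc}' already exists"
    return outcomes


if __name__ == "__main__":
    print(simulate_race())
```

In this sketch the second job fails only because its plan was computed from a schema that the first job has since changed, which is the situation described above.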
Solution
The solution to this problem is to incorporate the IF NOT EXISTS and IF EXISTS conditions in the ALTER TABLE statements. By using these conditions, the ALTER TABLE statement will only attempt to add a column if it does not already exist, and drop a column only if it exists.
However, this introduces some drawbacks. These include increased complexity in SQL logic, potential masking of underlying schema synchronization issues, minor performance impacts, and the risk of partial schema updates. Additionally, this solution is specific to Snowflake or databases that support these conditions, making it less portable to other database systems.
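As a rough illustration of the guarded statements, the Python sketch below builds the kind of SQL the proposed change would emit. It is not the macro code from this PR: `guarded_alter_statements` is a hypothetical helper, the relation and column names are made up, and it assumes the target database accepts `ADD COLUMN IF NOT EXISTS` and `DROP COLUMN IF EXISTS` (as Snowflake does). It emits one statement per column to avoid relying on any multi-column syntax.

```python
def guarded_alter_statements(relation, add_columns=(), drop_columns=()):
    """Build idempotent ALTER TABLE statements (illustrative sketch only).

    relation:     fully qualified table name, e.g. "db.schema.my_table"
    add_columns:  iterable of (column_name, data_type) pairs
    drop_columns: iterable of column names
    """
    statements = []
    for name, data_type in add_columns:
        # A second job running this after the first is a no-op, not an error.
        statements.append(
            f"ALTER TABLE {relation} ADD COLUMN IF NOT EXISTS {name} {data_type}"
        )
    for name in drop_columns:
        statements.append(
            f"ALTER TABLE {relation} DROP COLUMN IF EXISTS {name}"
        )
    return statements


if __name__ == "__main__":
    for stmt in guarded_alter_statements(
        "analytics.orders",
        add_columns=[("discount_code", "VARCHAR")],
        drop_columns=["legacy_flag"],
    ):
        print(stmt)
```

Because each statement is safe to run more than once, two jobs that computed the same plan can both execute it without the slower one failing. The drawback noted above still applies: a guarded statement will also succeed silently when the schema differs for reasons other than a concurrent run.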
Checklist