Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlocks on target server when more than 1 dbt thread #195

Closed
deanja opened this issue Dec 23, 2021 · 2 comments · Fixed by #204
Closed

Deadlocks on target server when more than 1 dbt thread #195

deanja opened this issue Dec 23, 2021 · 2 comments · Fixed by #204
Assignees
Labels
Milestone

Comments

@deanja
Copy link

deanja commented Dec 23, 2021

Expected Behaviour

dbt can run many tests in parallel without causing database deadlocks.
dbt can query information_schema and do DDL on separate threads without deadlocks.

Actual Behaviour

Intermittent deadlocks. So far only observed on dbt build (with tests) and dbt test.

('40001', '[40001] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Transaction (Process ID 62) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. (1205) (SQLFetch)')

An example deadlock log, showing implicated queries/ddl, is pasted at the end of this issue.

Steps to reproduce

dbt test --threads 8

or

dbt build--threads 8

Yesterday I could reliably get it to fail with 2 threads. Today it's running fine with 8 sometimes. Oooouch! Seems slightly easier to recreate when there are some bigger relations (10M+ rows) to test.

Configuration

A set of small generic dbt tests - I have ~50 tests
DAG does not affect dbt test. For dbt build I have ~4 paths through the DAG.
dbt 1.0.0
SQL Server 2019
ODBC Driver 2017

Possible causes

The "what changed recently " cause is that from dbt 0.21.1 test creates a lot more tables, as a workaround for there being no nested CTE functionality in SQL Server.

Cause in SQL Server. Not sure, but seems the engine has it's own means of ensuring integrity when querying INFORMATION_SCHEMA and making DDL changes at the same time. In DML, the query would simply get blocked (delayed) while the write completes, not cause a deadlock, except in some edge cases involving index updates. Things were fine pre 0.21.1 probably because of lower volume of operations per second - most tests are quick, and they can take advantage of parallelism.

A root cause is that there's not nested CTE in SQL Server. dbt-core tests and packages like dbt-expectations increasingly use nested CTEs.

Other information

Current workaround: limit dbt threads to 1.

Some discussion of possible causes and solutions on dbt slack

Concern in back of my mind is that we could also see this error then doing dbt run, given a busy enough DAG. I don't know the inner workings of dbt, but if they involve querying INFORMATION_SCHEMA and making DDL changes at the same time on another thread then there is cause for concern. No symptoms so far though :)

Have also been getting SQL engine errors that the name of some test tables exceed SQL Server's maximum of 128 characters. Also intermittent, and only surfaces if there was some other error in the SQL that caused the test run to fail.

Resources

Extract from a deadlock log from SQL Server:

<deadlock>
...
      </executionStack>
      <inputbuf>
select
      table_catalog as [database],
      table_name as [name],
      table_schema as [schema],
      case when table_type = 'BASE TABLE' then 'table'
           when table_type = 'VIEW' then 'view'
           else table_type
      end as table_type

    from [dw2].information_schema.tables
    where table_schema like 'dbt_test__audit'
     </inputbuf>
...
      <inputbuf>
  
  USE [dw2];
  if object_id ('"dbt_test__audit"."source_not_null_relation_column"','U') is not null
      begin
      drop table "dbt_test__audit"."source_not_null_relation_column"
      end
   </inputbuf>
</deadlock>

The log was found using: (credit to credit )

DECLARE @xelfilepath NVARCHAR(260)
SELECT @xelfilepath = dosdlc.path
FROM sys.dm_os_server_diagnostics_log_configurations AS dosdlc;
SELECT @xelfilepath = @xelfilepath + N'system_health_*.xel'
 DROP TABLE IF EXISTS  #TempTable
 SELECT CONVERT(XML, event_data) AS EventData
        INTO #TempTable FROM sys.fn_xe_file_target_read_file(@xelfilepath, NULL, NULL, NULL)
         WHERE object_name = 'xml_deadlock_report'
SELECT EventData.value('(event/@timestamp)[1]', 'datetime2(7)') AS UtcTime, 
            CONVERT(DATETIME, SWITCHOFFSET(CONVERT(DATETIMEOFFSET, 
      EventData.value('(event/@timestamp)[1]', 'VARCHAR(50)')), DATENAME(TzOffset, SYSDATETIMEOFFSET()))) AS LocalTime, 
            EventData.query('event/data/value/deadlock') AS XmlDeadlockReport
     FROM #TempTable
     ORDER BY UtcTime DESC;

Run it in SSMS and you'll get nice links to the xml for each deadlock.

@Elliot2718 Elliot2718 mentioned this issue Jan 27, 2022
@mikaelene
Copy link
Collaborator

Maybe this should be optional? Perhaps set in profiles.yml.

@SportsTribution
Copy link

Isn’t part of the problems with the temporary creation of tables in Test that different tests create the same temporary table? That’s how I read the get_relation part of it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants