Standardizing model execution handling of cancellation and timeouts #8522
Replies: 3 comments
-
The feature is definitely a great idea (and lacking right now; I was contemplating watching the logs to intercept the ids and doing it as a wrapper). I'd suggest going for Option 1 (default) with Option 3 as a possibility, since it would be agnostic of the db features to do so. Then I would move forward with the dbt-bigquery cancel PR to ensure it works fairly well, and plan to backport it to dbt-core as well.
-
I would like to advocate for handling [...] We send [...]
-
Running queries costs money; dbt should support killing runaway queries, as well as killing queries when a dbt run is cancelled or fails.
Currently there is no standard approach (and in many adapters no ability at all) for dbt to time out long-running model executions or to cancel queries when a dbt run fails or is cancelled (i.e. SIGINT/SIGKILL). This discussion is a high-level exploration of the options for how this could be implemented.
Cancellation/Failure
To support the "cancelled" scenario, we should add a method to the base adapter that accepts the identifier of the model being executed on the underlying data warehouse. dbt-core could then simply add a try/except on `KeyboardInterrupt` and attempt to invoke the adapter's cancellation method.

Timeouts
There are two approaches to timing out a model execution: 1) have dbt-core actively track execution time and cancel the execution itself, or 2) delegate the timeout to the underlying data warehouse and rely on it to reap the execution and return an exception.
Option 1 has been explored in this dbt-bigquery PR, as the BigQuery Python client (at the time of writing) does not respect the timeout configured on the BQ project. The upside of this approach is that dbt can make solid guarantees to the user about model execution timing, and each adapter does not need to know how to manage timeouts on the underlying data warehouse. The significant downside is that it adds overhead to model execution, both in code complexity and in runtime performance.
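To make the Option 1 trade-off concrete, here is a minimal sketch of a dbt-managed timeout: run the query in a worker thread, and if it doesn't finish in time, invoke the adapter's cancellation hook and raise. The `execute`/`cancel` callables and `QueryTimeoutError` are hypothetical names for illustration, not dbt's actual API:

```python
import concurrent.futures


class QueryTimeoutError(Exception):
    """Raised when a model execution exceeds its configured timeout."""


def run_with_timeout(execute, cancel, timeout_seconds):
    """Run `execute` in a worker thread; on timeout, call `cancel` and raise.

    `execute` runs the model's SQL and returns a result; `cancel` asks the
    adapter to kill the in-flight query on the warehouse (hypothetical hooks).
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(execute)
        try:
            return future.result(timeout=timeout_seconds)
        except concurrent.futures.TimeoutError:
            cancel()  # best-effort: ask the warehouse to kill the query
            raise QueryTimeoutError(
                f"model execution exceeded {timeout_seconds}s"
            )
```

Note the cost this sketch makes visible: every model execution now pays for a thread handoff and a timer, which is exactly the runtime and complexity overhead described above.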
Option 2 has essentially the inverse pros and cons: we can't make strong guarantees about timing out execution, but it is significantly more performant and less error-prone.
A third possibility is to mix the approaches: have dbt rely on the data warehouse to time out the execution when it supports that, and fall back to a dbt-managed timeout otherwise.
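The mixed approach could be sketched as a thin dispatch layer: prefer a warehouse-enforced timeout when the adapter declares support for one, and fall back to a dbt-managed timer that calls the adapter's cancellation method on expiry. Everything here (`supports_native_timeout`, `set_native_timeout`, `cancel`) is a hypothetical adapter surface, not an existing dbt interface:

```python
import concurrent.futures


def execute_model(adapter, run_query, timeout_seconds):
    """Hybrid timeout strategy (Option 3): delegate when possible.

    `adapter` is assumed (hypothetically) to expose:
      - supports_native_timeout: bool
      - set_native_timeout(seconds): push the limit down to the warehouse
      - cancel(): best-effort kill of the in-flight query
    `run_query` executes the model's SQL and returns its result.
    """
    if adapter.supports_native_timeout:
        # Option 2 path: the warehouse reaps the query itself; cheap, but
        # the enforcement guarantee is only as good as the warehouse's.
        adapter.set_native_timeout(timeout_seconds)
        return run_query()
    # Option 1 fallback: dbt tracks time itself and cancels on expiry.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(run_query)
        try:
            return future.result(timeout=timeout_seconds)
        except concurrent.futures.TimeoutError:
            adapter.cancel()
            raise
```

The same `cancel()` hook would serve the cancellation scenario above: a `KeyboardInterrupt` handler in dbt-core could call it for each in-flight model.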