Standardizing model execution handling of cancellation and timeouts #8522
Replies: 3 comments
-
The feature is definitely a great idea (and lacking right now; I was contemplating watching the logs to intercept the ids and doing it as a wrapper). I'd suggest going for Option 1 (default) with Option 3 as a possibility, since it would be agnostic of the db features to do so. Then I would move forward with the dbt-bigquery cancel PR to ensure it works fairly well, and plan to backport it to dbt-core as well.
-
I would like to advocate for handling [...] We send [...]
-
Running queries costs money; dbt should support killing runaway queries, as well as killing queries when a dbt run is cancelled or fails.
Currently there is no standard approach (and in many adapters no ability at all) for dbt to time out long-running model executions or to cancel queries when a dbt run fails or is cancelled (i.e. SIGINT/SIGKILL). This discussion is a high-level exploration of the options for how this could be implemented.
Cancellation/Failure
To support the "cancelled" scenario, we should add a method to the base adapter that accepts the identifier of the model being executed on the underlying data warehouse. dbt-core could then simply add a try/except on `KeyboardInterrupt` and attempt to invoke the adapter's cancellation method.

Timeouts
There are two approaches to timing out a model execution: 1) have dbt-core actively track execution time and cancel the execution itself, or 2) delegate the timeout to the underlying data warehouse and rely on it to reap the execution and return an exception.
Option 1 has been explored in this dbt-bigquery PR, as the BigQuery Python client (at the time of writing) does not respect the timeout configured on the BQ project. The upside of this approach is that dbt can make solid guarantees to the user about model execution timing, and each adapter does not need to know how to manage timeouts on the underlying data warehouse. The significant downside is that it adds overhead to model execution, both in code complexity and in runtime performance.
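To make the Option 1 trade-off concrete, here is a minimal sketch of a dbt-managed timeout: run the query in a worker thread, and if it doesn't finish in time, invoke the adapter's cancellation hook and raise. The `execute`/`cancel` callables and `QueryTimeoutError` are hypothetical names for illustration, not dbt's actual API:

```python
import concurrent.futures


class QueryTimeoutError(Exception):
    """Raised when a model execution exceeds its configured timeout."""


def run_with_timeout(execute, cancel, timeout_seconds):
    """Run `execute` in a worker thread; on timeout, call `cancel` and raise.

    `execute` runs the model's SQL and returns a result; `cancel` asks the
    adapter to kill the in-flight query on the warehouse (hypothetical hooks).
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(execute)
        try:
            return future.result(timeout=timeout_seconds)
        except concurrent.futures.TimeoutError:
            cancel()  # best-effort: ask the warehouse to kill the query
            raise QueryTimeoutError(
                f"model execution exceeded {timeout_seconds}s"
            )
```

Note the cost this sketch makes visible: every model execution now pays for a thread handoff and a timer, which is exactly the runtime and complexity overhead described above.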
Option 2 has essentially the inverse pros and cons: we can't make strong guarantees about timing out execution, but it is significantly more performant and less error-prone.
A third possibility is to mix the approaches: have dbt rely on the data warehouse to time out the execution when it supports that, and fall back to a dbt-managed timeout otherwise.
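The mixed approach could be sketched as a thin dispatch layer: prefer a warehouse-enforced timeout when the adapter declares support for one, and fall back to a dbt-managed timer that calls the adapter's cancellation method on expiry. Everything here (`supports_native_timeout`, `set_native_timeout`, `cancel`) is a hypothetical adapter surface, not an existing dbt interface:

```python
import concurrent.futures


def execute_model(adapter, run_query, timeout_seconds):
    """Hybrid timeout strategy (Option 3): delegate when possible.

    `adapter` is assumed (hypothetically) to expose:
      - supports_native_timeout: bool
      - set_native_timeout(seconds): push the limit down to the warehouse
      - cancel(): best-effort kill of the in-flight query
    `run_query` executes the model's SQL and returns its result.
    """
    if adapter.supports_native_timeout:
        # Option 2 path: the warehouse reaps the query itself; cheap, but
        # the enforcement guarantee is only as good as the warehouse's.
        adapter.set_native_timeout(timeout_seconds)
        return run_query()
    # Option 1 fallback: dbt tracks time itself and cancels on expiry.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(run_query)
        try:
            return future.result(timeout=timeout_seconds)
        except concurrent.futures.TimeoutError:
            adapter.cancel()
            raise
```

The same `cancel()` hook would serve the cancellation scenario above: a `KeyboardInterrupt` handler in dbt-core could call it for each in-flight model.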