Refactor deferral: always `merge_from_artifact` & support all commands #9040

jtcohen6 · 2023-11-08T20:02:05Z

resolves #7965
resolves #8715
throwback: #2740

Problem

Weird logical divergence between clone and other tasks in terms of how deferral works. (I borrowed some of @MichelleArk's spike work from spike: inferred contracts #8339.)
Some commands don't support --defer & --state, and they currently raise an error when passed those flags. They shouldn't (e.g. if you submit run-operation within the IDE while "Defer to production" is enabled).

Solution

Consolidate how deferral works for all runnable tasks
Support --defer & friends as "global" flags

@runleonarun adding copilot summary

This pull request primarily focuses on consolidating deferral methods and flags within the dbt project. The changes involve modifications to various methods and classes across multiple files, including core/dbt/cli/flags.py and core/dbt/cli/main.py. The most significant changes include the addition of a new params_assigned_from_user set and changes to the _assign_params method, reordering and removal of decorators in core/dbt/cli/main.py, and the introduction of defer_relation in core/dbt/contracts/graph/manifest.py.

Changes to _assign_params method in core/dbt/cli/flags.py:
- A new set params_assigned_from_user is introduced and used to track user-provided and default values across both 'parent' and 'child' levels. This is to support detection of mutually exclusive flags later on. [1] [2]
- The _assign_params method is modified to include params_assigned_from_user as a parameter. [1] [2]
Reordering and removal of decorators in core/dbt/cli/main.py:
- Several decorators are added to the global_flags function and removed from other functions. This change is likely aimed at consolidating the use of these decorators. ([1] [2] [3] and others)
Introduction of defer_relation in core/dbt/contracts/graph/manifest.py:
- For non-ephemeral refable nodes, a new defer_relation attribute is added. This attribute is used when the model is deferred and the adapter doesn't support zero-copy cloning. [1] [2]
Changes in core/dbt/task/clone.py:
- The CloneTask class now always requires a state manifest, regardless of whether the --defer flag has been set.
- The defer_to_manifest method is removed. Unlike other commands, 'clone' always requires a state manifest.
Consolidation of deferral methods and flags:
- An 'Under the Hood' change is added to consolidate deferral methods and flags.

These changes seem to be part of a larger refactoring effort aimed at simplifying the codebase and improving the handling of deferral methods and flags.
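To make the `params_assigned_from_user` idea from the summary above concrete, here is a minimal sketch of how one shared set can feed mutually exclusive flag detection. It is not the code in core/dbt/cli/flags.py: the helper names `assign_params` and `assert_mutually_exclusive`, their signatures, and the flag names `flag_a`, `flag_b`, `flag_c` are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, Set


def assign_params(
    provided: Dict[str, Any],
    defaults: Dict[str, Any],
    params_assigned_from_user: Set[str],
) -> Dict[str, Any]:
    """Resolve each param and record the ones the user set explicitly (hypothetical helper)."""
    resolved: Dict[str, Any] = {}
    for name, default in defaults.items():
        if name in provided:
            resolved[name] = provided[name]
            params_assigned_from_user.add(name)
        else:
            resolved[name] = default
    return resolved


def assert_mutually_exclusive(
    params_assigned_from_user: Set[str], group: Iterable[str]
) -> None:
    """Raise if the user explicitly set more than one flag in the group (hypothetical helper)."""
    set_by_user = sorted(name for name in group if name in params_assigned_from_user)
    if len(set_by_user) > 1:
        raise ValueError(f"{' and '.join(set_by_user)} are mutually exclusive")


# One shared set collects user-provided params from both the parent and child levels.
user_set: Set[str] = set()
parent = assign_params({"flag_a": True}, {"flag_a": False, "flag_b": None}, user_set)
child = assign_params({}, {"flag_c": 1}, user_set)

# Passes: flag_b only has its default value, so it does not conflict with flag_a.
assert_mutually_exclusive(user_set, ["flag_a", "flag_b"])
```

Because defaults never enter the set, a flag that merely carries its default value cannot trigger a false conflict with a flag the user actually passed.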

Checklist

I have read the contributing guide and understand what's expected of me
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX
This PR includes type annotations for new and modified functions

github-actions · 2023-11-08T20:02:23Z

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

codecov · 2023-11-08T20:11:55Z

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison is base (a1f78a8) 86.65% compared to head (1156ca2) 86.60%.

Files	Patch %	Lines
core/dbt/task/runnable.py	90.90%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #9040      +/-   ##
==========================================
- Coverage   86.65%   86.60%   -0.06%     
==========================================
  Files         230      230              
  Lines       27060    26987      -73     
==========================================
- Hits        23449    23372      -77     
- Misses       3611     3615       +4

Flag	Coverage Δ
integration	`83.48% <97.50%> (-0.13%)`	⬇️
unit	`65.22% <75.00%> (-0.04%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

MichelleArk · 2023-12-04T01:18:11Z

core/dbt/task/runnable.py

+            favor_state=bool(self.args.favor_state),
+        )
+        # TODO: is it wrong to write the manifest here? I think it's right...
+        write_manifest(self.manifest, self.config.project_target_path)


We may want to gate on self.config.args.write_json here, similar to what's done when the manifest is initially written: https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/cli/requires.py#L277

Longer term, I wonder if it'd actually make more sense to do this deferral + merging business in the requires.manifest wrapper, to centralize where manifest loading + writing happens. At this point, though, there seems to be enough command-specific functionality to justify keeping it at the task level for now.

The biggest reason to have this at the task level right now is that we're running database queries (= populating + checking the adapter cache) while calculating deferral, to know which selected models exist in the expected target schema(s). Everything in requires.manifest ("parsing") takes place before we've made any database connections.

We could try more substantially changing the way that deferral works: During manifest loading, we add defer_relation as an attribute to every node. Then, during each node's compilation, the RuntimeRefResolver checks to see if that node exists in the expected schema (cache lookup) — or skips the check if --favor-state — and resolves to defer_relation instead if not found. We wouldn't be recording in the manifest that a given node was deferred — but that doesn't feel like such a big loss, and it would still be reflected in the compiled_code of other nodes downstream.

That feels significantly tidier. What's your (& my) appetite to try doing that here? Or in a follow-up PR? :)
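A rough sketch of the resolution rule proposed above, under stated assumptions: `Node`, `resolve_ref`, and the tuple-based relation key are made up for illustration and are not the real RuntimeRefResolver or adapter cache API. Node selection is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

# (database, schema, identifier); a stand-in for a real relation object.
RelationKey = Tuple[str, str, str]


@dataclass
class Node:
    unique_id: str
    relation: RelationKey
    # Populated from the --state manifest during manifest loading.
    defer_relation: Optional[RelationKey] = None


def resolve_ref(
    node: Node, cache: Set[RelationKey], defer: bool, favor_state: bool
) -> RelationKey:
    """Pick the relation a ref() to `node` should compile to (hypothetical helper)."""
    if not defer or node.defer_relation is None:
        return node.relation
    if favor_state:
        # --favor-state: skip the existence check entirely.
        return node.defer_relation
    # Cache lookup: prefer the relation in the expected schema if it exists.
    if node.relation in cache:
        return node.relation
    return node.defer_relation


cache = {("db", "dev", "model_a")}
model_a = Node("model.proj.a", ("db", "dev", "model_a"), ("db", "prod", "model_a"))
model_b = Node("model.proj.b", ("db", "dev", "model_b"), ("db", "prod", "model_b"))

print(resolve_ref(model_a, cache, defer=True, favor_state=False))  # dev relation exists
print(resolve_ref(model_b, cache, defer=True, favor_state=False))  # falls back to prod
```

Nothing is written back to the manifest here: the decision is made per ref at compile time, which is why the "deferred" marker on the node would no longer be needed.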

In fact, come to think of it, this bit of the clone task now doesn't make nearly as much sense, because defer_to_manifest needs to populate the adapter cache as part of resolving which nodes are "deferred" and which just get a defer_relation, which isn't even a meaningful distinction for the purpose of clone:

dbt-core/core/dbt/task/clone.py

Lines 125 to 126 in ed0c432

# unlike in other tasks, we want to add information from the --state manifest *before* caching!

self.defer_to_manifest(adapter, selected_uids)

Risk of breaking changes? Currently, an unselected node will be actually overwritten with the node entry from the stateful manifest. Every single attribute. If you write graph-parsing Jinja logic, and traverse to that node's entry, those are the attributes you'd see. So custom integrations which write those node attributes back to the data warehouse (for example) might be a little bit funky. I think that's quite niche as backwards-incompatible behavior changes go.
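For context, the current overwrite behavior described above looks roughly like the following. This is a simplified sketch, not the actual merge_from_artifact: nodes are plain dicts, `merge_from_state` and its parameters are hypothetical names, and resource types and ephemeral models are ignored.

```python
from typing import Dict, Set

NodeDict = Dict[str, dict]


def merge_from_state(
    current: NodeDict,
    state: NodeDict,
    selected: Set[str],
    existing: Set[str],
    favor_state: bool = False,
) -> Set[str]:
    """Overwrite unselected nodes with their state-manifest entries; return the deferred ids."""
    deferred: Set[str] = set()
    for unique_id, state_node in state.items():
        if unique_id in selected:
            continue  # selected nodes are built in the current target, never deferred
        current_node = current.get(unique_id)
        missing = current_node is None or current_node["relation"] not in existing
        if missing or favor_state:
            # Every attribute comes from the state manifest, plus a deferred marker.
            current[unique_id] = dict(state_node, deferred=True)
            deferred.add(unique_id)
    return deferred


current = {"model.a": {"relation": "dev.a"}, "model.b": {"relation": "dev.b"}}
state = {
    "model.a": {"relation": "prod.a"},
    "model.b": {"relation": "prod.b"},
    "model.c": {"relation": "prod.c"},
}
deferred = merge_from_state(current, state, selected={"model.a"}, existing={"dev.a", "dev.b"})
print(deferred)  # only model.c is missing from the current target
```

Graph-parsing Jinja that reads `model.c` after this merge sees the production attributes wholesale, which is the behavior custom integrations may be relying on.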

This thought got the better of me — here's a first attempt:

Move deferral resolution from merge_from_artifact to RuntimeRefResolver #9199

Let's continue working on the follow-up refactor once this is merged to de-risk the scope of these changes, especially with all the other under-the-hood refactoring going in this release :)

dbeatty10 · 2024-01-12T17:42:48Z

This covers 3 of the 4 UserConfig settings missing from global configs that showed up in a separate and independent audit.

What about indirect_selection? It is the only one of the 4 that showed up there but is not covered here.

jtcohen6 · 2024-01-17T09:24:35Z

core/dbt/task/seed.py

@@ -62,10 +62,6 @@ def print_result_line(self, result):


 class SeedTask(RunTask):
-    def defer_to_manifest(self, adapter, selected_uids):


Discussed offline with @MichelleArk: Should we actually still disable deferral for tasks like seed/freshness/list, since those tasks never (in theory) resolve runtime references?

It's technically possible to have a seed reference another node, via a pre/post hook. Or write custom code in a project (on-run-*) hook that wants to use the defer_relation property of nodes in the graph. I don't think that's good, but it is possible.

At this point it feels like a choice between consistency, or a (small) performance optimization. Either way, I want all commands to support the flags as CLI options. We could keep the overrides in for now, until the follow-up refactor (#9199) that removes deferral logic from tasks entirely, and makes it part of manifest loading. That does feel like the right direction, and this is a step on the way there.

cla-bot bot added the cla:yes label Nov 8, 2023

jtcohen6 force-pushed the jerco/7965-refactor-deferral branch 2 times, most recently from 67543e0 to edf37c0 Compare November 8, 2023 21:46

jtcohen6 changed the title ~~Refactor deferral: always merge_from_artifact & support all commands~~ Refactor deferral: always merge_from_artifact & support all commands Nov 8, 2023

jtcohen6 force-pushed the jerco/7965-refactor-deferral branch from 51642a3 to b9a61d0 Compare November 8, 2023 23:08

jtcohen6 added 5 commits December 3, 2023 19:44

Combine merge_from_artifact + add_from_artifact

20262ab

All commands support --defer

94bcaab

Add changelog entry

b0d6705

Fix mutually exclusive flag detection

dc4f91d

Clone doesnt require --defer

ed0c432

jtcohen6 force-pushed the jerco/7965-refactor-deferral branch from 86415da to ed0c432 Compare December 3, 2023 18:45

jtcohen6 requested a review from MichelleArk December 3, 2023 18:45

jtcohen6 marked this pull request as ready for review December 3, 2023 18:45

jtcohen6 requested a review from a team as a code owner December 3, 2023 18:45

MichelleArk reviewed Dec 4, 2023

tests/unit/test_manifest.py (resolved)

MichelleArk reviewed Dec 4, 2023

core/dbt/context/providers.py (resolved)

MichelleArk reviewed Dec 4, 2023

core/dbt/task/runnable.py (outdated, resolved)

MichelleArk reviewed Dec 4, 2023

jtcohen6 mentioned this pull request Dec 4, 2023

Move deferral resolution from merge_from_artifact to RuntimeRefResolver #9199

Merged

5 tasks

graciegoheen assigned MichelleArk and jtcohen6 Jan 5, 2024

dbeatty10 mentioned this pull request Jan 12, 2024

[CT-2169] [Bug] All global configs should also be settable in ProjectFlags #7036

Open

Merge branch 'main' into jerco/7965-refactor-deferral

d5a9713

MichelleArk reviewed Jan 12, 2024

core/dbt/cli/main.py (resolved)

MichelleArk added 3 commits January 12, 2024 17:24

nits: update comment + test name

ddd5a9e

only write manifest if write_json is true

9b3866b

add indirect_selection as global_flag

1156ca2

MichelleArk approved these changes Jan 15, 2024

jtcohen6 commented Jan 17, 2024

jtcohen6 merged commit 321031c into main Jan 17, 2024
51 checks passed

jtcohen6 deleted the jerco/7965-refactor-deferral branch January 17, 2024 09:24

This was referenced Jan 19, 2024

add missing options dbt-labs/docs.getdbt.com#4740

Merged

Example in profiles.yml for indirect_selection dbt-labs/docs.getdbt.com#4773

Closed

This was referenced Feb 1, 2024

[Bug] Update command-based logic in compiled_code context member, to match sql #9502

Closed

Fix compiled_code, reimplement sql as wrapper #9503

Merged

tlento mentioned this pull request Feb 26, 2024

Update dbt-semantic-interfaces dependency to compatible range #9671

Merged

This was referenced May 8, 2024

[CT-3569] [Feature] Add a --favor-state-selector flag that supports node selection syntax #9410

Closed

Check if ref'd resource is selected before favoring state #10108

Merged
