-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move deferral resolution from merge_from_artifact
to RuntimeRefResolver
#9199
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide. |
fa16a8c
to
7a3598f
Compare
|
||
def message(self) -> str: | ||
return f"Merged {self.num_merged} items from state (sample: {self.sample})" | ||
# Removed A004: MergedFromState |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're no longer fully overwriting deferred manifest nodes with the stateful manifest's alternative, I didn't see much use in keeping this event around
core/dbt/contracts/graph/manifest.py
Outdated
if "defer_relation" in node: | ||
del node["defer_relation"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd excluded this field from serialization when it was just an implementation detail for the clone
task, but now that it's more generally applicable, I think it makes sense to include in serialized manifest.json
.
On the other hand, I think it makes sense to remove the deferred: boolean
field.
@@ -181,10 +180,6 @@ def test_run_and_defer(self, project, unique_schema, other_schema): | |||
assert other_schema not in results[0].node.compiled_code | |||
assert unique_schema in results[0].node.compiled_code | |||
|
|||
with open("target/manifest.json") as fp: | |||
data = json.load(fp) | |||
assert data["nodes"]["seed.test.seed"]["deferred"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're no longer resolving & writing this boolean field. Every node gets a defer_relation
(if appropriate), and the determination of whether or not to use it is made at runtime within RuntimeRefResolver
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #9199 +/- ##
==========================================
- Coverage 88.14% 88.14% -0.01%
==========================================
Files 181 181
Lines 22702 22700 -2
==========================================
- Hits 20011 20008 -3
- Misses 2691 2692 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
core/dbt/contracts/graph/nodes.py
Outdated
@@ -342,7 +342,6 @@ class ParsedNode(NodeInfoMixin, ParsedNodeMandatory, SerializableType): | |||
docs: Docs = field(default_factory=Docs) | |||
patch_path: Optional[str] = None | |||
build_path: Optional[str] = None | |||
deferred: bool = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned, I don't think this field is necessary anymore. However:
- Removing this does require updating a lot of tests
- It's exactly the sort of breaking change (removing a field) that we said we would try to avoid doing for artifacts in v1.8+...
@@ -226,6 +223,7 @@ def test_clone_same_target_and_state(self, project, unique_schema, other_schema) | |||
|
|||
clone_args = [ | |||
"clone", | |||
"--defer", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like currently, dbt clone
only needs to pass --state
(or --defer-state
), but not the --defer
flag. The behavior it's performing is very much analogous to deferral, though.
This does not feel clear in our docs: https://docs.getdbt.com/reference/commands/clone
I'd be okay with requiring it from now on. The alternative is something gross like this in requires.manifest
:
if flags.defer or flags.which == 'clone':
5963a9a
to
4f1fa63
Compare
requires.manifest
+ RefResolver
parse_manifest
+ RuntimeRefResolver
0504c41
to
658999f
Compare
One failing test At the same time, I got myself into quite a tricky loop trying to debug an edge case with dbt unit tests, until I realized that this PR should actually fix that edge case:
|
69ee8b6
to
6f1ad1e
Compare
@@ -181,7 +181,6 @@ class ParsedResource(ParsedResourceMandatory): | |||
docs: Docs = field(default_factory=Docs) | |||
patch_path: Optional[str] = None | |||
build_path: Optional[str] = None | |||
deferred: bool = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirming this is a minor change and safe to make here from a runtime serialization/deserialization perspective 👍 Independent unit tests here: #10066
Additionally, we're not using this field for anything functional in dbt-core's implementation, it was previously just accessed in testing to assert certain behaviour had occurred during deferral.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving minor schema evolution changes
942ae58
to
8e84df8
Compare
…to reduce scope of refactor
8e84df8
to
6b63e31
Compare
write_perf_info: bool, | ||
write: bool, | ||
write_json: bool, | ||
) -> Manifest: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before 6b63e31, this was where I was calling merge_from_artifact
. Now this file diff is just adding type annotations and an inline comment.
@@ -454,10 +454,10 @@ def print_results_line(self, results, execution_time) -> None: | |||
|
|||
def before_run(self, adapter, selected_uids: AbstractSet[str]) -> None: | |||
with adapter.connection_named("master"): | |||
self.defer_to_manifest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of this method is now to add defer_relation
to all nodes in the project.
Previously, we needed to know at this point which objects exist in the target database locations (which requires the cache). Now that's cache check is happening within RuntimeRefResolver
, meaning that we can consistently call defer_to_manifest
before populate_adapter_cache
in every task.
@@ -479,8 +477,8 @@ def populate_adapter_cache( | |||
|
|||
def before_run(self, adapter, selected_uids: AbstractSet[str]): | |||
with adapter.connection_named("master"): | |||
self.defer_to_manifest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment on RunTask.before_run
parse_manifest
+ RuntimeRefResolver
merge_from_artifact
to RuntimeRefResolver
2a47663
to
1377431
Compare
resolves: #10017
follow-up to #9040
Problem
In #9040 (comment), @MichelleArk wondered aloud:
Solution
Within
requires.manifest
: Adddefer_relation
to all relevant nodes right after manifest loadingWithin
RuntimeRefResolver
: Handle logic for picking the target relation versusdefer_relation
, based on--favor-state
and adapter cache checks (does the target relation exist in the expected location?)This seems like a promising way to streamline much of this logic (for all tasks), and move it earlier (outside of tasks entirely).
Checklist