-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add docs as a real node config and support node_color
for coloring the DAG
#5397
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide. |
) | ||
|
||
# we validate that node_color has a suitable value to prevent dbt-docs from crashing | ||
def __post_init__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stu-k How do you feel about implementing color validation at the dbt-core layer vs. the dbt-docs layer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If all CSS/HTML/JS is currently isolated in dbt-docs
, I don't feel great about including CSS specific logic in dbt-core
. This would introduce a case where if you needed to change the functionality of a thing, you'd have to modify a codebase where it is counterintuitive to do so.
cc: @jtcohen6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Stu!
Makes sense to me. We can go ahead and include this defensive code in the dbt-docs
codebase.
Part of our thinking was to make sure dbt tells the developer that they have the wrong colors when compiling their code. I know a lot of dbt users that don't run dbt docs generate
until it hits a scheduled job. Therefore, the troubleshooting happens on the tailend vs. upfront. But sounds like we're okay with that tradeoff given maintainability of the codebase matters more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks both for sharing your perspectives! I see compelling points on both sides, and I could go either way.
Truth: There's really zero other CSS/HTML/JS in dbt-core
. dbt-core
does know a teensy bit about Markdown, just enough to make description
fields populate correctly in dbt-docs
, but it doesn't do any of the rendering itself.
Also, users should include docs generation + manual inspection of docs during development / CI, if they're messing with custom node colors, to makes sure that what they're doing doesn't break their docs site. (I bet we can also add some defensive logic to the docs site that ignores invalid color codes, rather than crashing.)
At the same time — by adding docs.node_color
as a first-class config/property in dbt-core
, similar to docs.show
— we are setting an expectation that dbt-core
knows about it and takes an interest in validating this config, more than it would for "any" fields like meta
. That, along with making this configurable according to file paths (FQNs) and other rule sets, is one of the reasons I'd be supportive of incorporating this logic into dbt-core
directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it makes sense to add it to dbt-core
the next question would be whether it should also be in dbt-docs
. I modified dbt-labs/dbt-docs#281 a few days ago to add the validation on the dbt-docs
side as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having said my piece above, I'm happy to defer to @stu-k's preference, motivated by separation of concerns between dbt-core
and dbt-docs
@@ -285,7 +286,7 @@ def same_contents(cls, unrendered: Dict[str, Any], other: Dict[str, Any]) -> boo | |||
# 'meta' moved here from node | |||
mergebehavior = { | |||
"append": ["pre-hook", "pre_hook", "post-hook", "post_hook", "tags"], | |||
"update": ["quoting", "column_types", "meta"], | |||
"update": ["quoting", "column_types", "meta", "docs"], | |||
"dict_key_append": ["grants"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pulled down this branch locally and it didn't work. Do you need to add to this dict_key_append
similar to https://github.com/dbt-labs/dbt-core/pull/5343/files#diff-007a86f768b23e1516e6d253d71656b96d46672fc7432dbc81d1d9445c98ca05R289 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hold on, I may have messed up my local dev env. One moment please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you get an error or did it just not add values under model_id.config.docs
?
I feel like "update" is what we want to do. If a model has docs
defined because of the folder it is in, if we redefine it at the model level we just want the most specific value to be kept (rather than combining with the folder one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I tested, the config at the individual YAML file was correctly overwriting the folder config. I think what is missing is that in the YAML file, it needs to be defined under config
, like
models:
- name: xxxx
config:
docs:
node_color: "#e0115f"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh yes, this must be it. Thanks for writing this out explicitly. I'm aligned with your approach!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool! I weighed in on the thread about where validation should happen, and also gave me $0.02 about how we'd want to move forward with the docs
config/property merge
) | ||
|
||
# we validate that node_color has a suitable value to prevent dbt-docs from crashing | ||
def __post_init__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks both for sharing your perspectives! I see compelling points on both sides, and I could go either way.
Truth: There's really zero other CSS/HTML/JS in dbt-core
. dbt-core
does know a teensy bit about Markdown, just enough to make description
fields populate correctly in dbt-docs
, but it doesn't do any of the rendering itself.
Also, users should include docs generation + manual inspection of docs during development / CI, if they're messing with custom node colors, to makes sure that what they're doing doesn't break their docs site. (I bet we can also add some defensive logic to the docs site that ignores invalid color codes, rather than crashing.)
At the same time — by adding docs.node_color
as a first-class config/property in dbt-core
, similar to docs.show
— we are setting an expectation that dbt-core
knows about it and takes an interest in validating this config, more than it would for "any" fields like meta
. That, along with making this configurable according to file paths (FQNs) and other rule sets, is one of the reasons I'd be supportive of incorporating this logic into dbt-core
directly.
docs: Dict[str, Any] = field( | ||
default_factory=dict, | ||
metadata=MergeBehavior.Update.meta(), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, what we really want here is to merge the values of node.docs
and node.config.docs
, just as we do today with node.meta
and node.config.meta
. If we can do that, the change will be both sensible and backwards compatible.
There are a few relevant spots for that (and I'm probably not catching all of them, so @dbt-labs/core-language should really advise):
dbt-core/core/dbt/parser/base.py
Lines 284 to 287 in d5608dc
# If we have meta in the config, copy to node level, for backwards | |
# compatibility with earlier node-only config. | |
if "meta" in config_dict and config_dict["meta"]: | |
parsed_node.meta = config_dict["meta"] |
dbt-core/core/dbt/parser/schemas.py
Lines 807 to 821 in d5608dc
# We want to raise an error if 'meta' is in two places, and move 'meta' | |
# from toplevel to config if necessary | |
def normalize_meta_attribute(self, data, path): | |
if "meta" in data: | |
if "config" in data and "meta" in data["config"]: | |
raise ParsingException( | |
f""" | |
In {path}: found meta dictionary in 'config' dictionary and as top-level key. | |
Remove the top-level key and define it under 'config' dictionary only. | |
""".strip() | |
) | |
else: | |
if "config" not in data: | |
data["config"] = {} | |
data["config"]["meta"] = data.pop("meta") |
We're moving toward a world where:
config
can accept any values, set/inherit them via config hierarchy, and get them for use at runtime- any
config
values with important metadata implications should move up to top-level node properties (tags
,meta
, etc today; probablymaterialized
, too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jtcohen6 , I will have a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just adding an additional .02 - it seems that doing the work to support this config in dbt Core means we should also validate it's correct there, despite that introducing some new knowledge to the codebase around valid colors. Basically a +1 for @jtcohen6 's second point above.
From developer experience perspective, it seems very possible that someone could set this incorrectly, and then continue adding / running for a good long time without in the interim doing a dbt docs generate
to refresh (and then catch the error). Totally agree that docs generation should be part of CI also FWIW.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the pointers @jtcohen6
We actually also need to remove/modify this part handling the "legacy" docs (I spent way too much time figuring out why node.docs
was not getting populated):
dbt-core/core/dbt/contracts/graph/parsed.py
Line 524 in d63ad4c
self.docs = patch.docs |
Is the goal to make the docs dict attached to a dataclass with show
/node_color
or would we want it to be just a generic dict
like meta
?
I will update the PR soon with my progress.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the goal to make the docs dict attached to a dataclass with
show
/node_color
or would we want it to be just a genericdict
likemeta
?
I think the former. meta
should remain a sorta special case in its acceptance of absolutely anything, since dbt makes zero promises as far as validating its contents.
Of course, when docs
is serialized into manifest.json
, it should just be a standard dictionary with up to 2 keys, show
(always set) and node_color
(optionally set).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a branch working locally when making docs
a dict
but making it a dataclass is more complex and raises a few errors.
- Making
node_color
optional still requires setting a Default value, so we might need to have some specific code to handle the deletion of the attribute ifnode_color = None
. The error I get without setting a default value isTypeError: non-default argument 'node_color' follows default argument
- When adding the code to populate
docs
fromconfig.docs
with the following snippets, mahsumaro raises the errorNameError: name 'dbt' is not defined
which I can't pinpoint why. If I remove the new lines for "docs",docs
is not moved from across fromconfig
(expected) but it also doesn't raise the error.
Code in core/dbt/parser/base.py
:
# the original code for "meta"
if "meta" in config_dict and config_dict["meta"]:
parsed_node.meta = config_dict["meta"]
# the new code for "docs"
if "docs" in config_dict and config_dict["docs"]:
parsed_node.docs = config_dict["docs"]
Debug error:
name 'dbt' is not defined
17:25:10.583468 [debug] [MainThread]: Traceback (most recent call last):
File "/Users/bper/dev/dbt-core2/core/dbt/main.py", line 129, in main
results, succeeded = handle_and_check(args)
File "/Users/bper/dev/dbt-core2/core/dbt/main.py", line 191, in handle_and_check
task, res = run_from_args(parsed)
File "/Users/bper/dev/dbt-core2/core/dbt/main.py", line 238, in run_from_args
results = task.run()
File "/Users/bper/dev/dbt-core2/core/dbt/task/runnable.py", line 451, in run
self._runtime_initialize()
File "/Users/bper/dev/dbt-core2/core/dbt/task/runnable.py", line 159, in _runtime_initialize
super()._runtime_initialize()
File "/Users/bper/dev/dbt-core2/core/dbt/task/runnable.py", line 92, in _runtime_initialize
self.load_manifest()
File "/Users/bper/dev/dbt-core2/core/dbt/task/runnable.py", line 81, in load_manifest
self.manifest = ManifestLoader.get_full_manifest(self.config)
File "/Users/bper/dev/dbt-core2/core/dbt/parser/manifest.py", line 218, in get_full_manifest
manifest = loader.load()
File "/Users/bper/dev/dbt-core2/core/dbt/parser/manifest.py", line 403, in load
self.write_manifest_for_partial_parse()
File "/Users/bper/dev/dbt-core2/core/dbt/parser/manifest.py", line 549, in write_manifest_for_partial_parse
manifest_msgpack = self.manifest.to_msgpack()
File "/Users/bper/.venv/dbt-core2/lib/python3.9/site-packages/mashumaro/serializer/msgpack.py", line 40, in to_msgpack
self.to_dict(**dict(DEFAULT_DICT_PARAMS, **dict_params)),
File "<string>", line 9, in to_dict
File "<string>", line 9, in <dictcomp>
File "<string>", line 74, in __pack_union_Manifest_nodes__a52031cbc3504db8b4c0b1b1952cd791
NameError: name 'dbt' is not defined
@b-per Love the error handling at compile time! Can you dynamically update the log output to point to the node/folder config that needs to be fixed? example: |
After live conversation with @b-per @matt-winkler @sungchun12:
Pending Benoit's latest code changes, I'm going to mark this one |
We still need to add tests but I have pushed the latest changes with |
FYI, I have pushed the branch test-docs-dataclass with my attempt at making the docs a dataclasss instead of a dictionary. That branch is currently failing with the error:
|
@b-per I bumped our manifest version to v7 as it's needed for the config change. This will get some of the artifact tests passing. I didn't see this as in scope for this work specifically. You can't merge in the PR since many tests are still failing but you can merge my branch directly into yours. Related PR: #5561 |
Thanks to you and Gerda for the pointer in this commit! I checked with a test project and I am getting the same output for In the last commits, I have:
I will work with the team in getting the tests passing and new ones created |
@b-per added tests for overriding |
Hi @emmyoop I wanted to clarify your comment on artifact tests from above. Are you saying that this branch is dependent on those being resolved before it's merge-able, or can we proceed independently with testing this feature? |
@matt-winkler all tests will need to be passing before this is mergable. These changes were causing artifacts tests to fail because of the new node config. That should have been resolved by the code @b-per merged in from my branch. I see there are some conflicts with the code that was merged in last week for the 1.3.0 beta. I'll see if I can resolve these later today. |
Good deal thanks @emmyoop, so in that case it seems like next steps are:
Anything I'm missing? |
I just pushed some changes to fix the unit tests. In a similar way that |
eb3dd02
to
b3ce967
Compare
node_color
for coloring the DAGnode_color
for coloring the DAG
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good!
Thank you for all your hard work and coordination on this one @b-per, @sungchun12 and @matt-winkler! ❤️
I'm so proud of everyone involved in this PR. This was a meaningful, hard problem to solve for. I honestly think we'll be the first mainstream data tool to add custom colors to data lineage(anecdotally). When people talk about the bronze, silver, gold layers, the conversation will be literal now :) @b-per @matt-winkler, this was so fun and yet another role model for open source, async work together! @emmyoop @stu-k @jtcohen6, thank you for the excellent input on mechanical design. You role modeled what great open source stewardship looks and feels like. Everyone, be ready to be celebrated publicly because I'm going to demo this live in front of a face to face audience on 08/18/2022 at this event: https://www.getdbt.com/resources/drinks-data-dbt-ca/ |
…the DAG (#5397) * add Optional node_color config in Docs dataclass * Remove node_color from the original docs config * Add docs config and input validation * Handle when docs is both under docs and config.docs * Add node_color to Docs * Make docs a Dict to avoid parsing errors * Make docs a dataclass instead of a Dict * Fix error when using docs as dataclass * Simplify generator for the default value * skeleton for test fixtures * bump manifest to v7 * + config hierarchy tests * add show override tests * update manifest * Remove node_color from the original docs config * Add node_color to Docs * Make docs a Dict to avoid parsing errors * Make docs a dataclass instead of a Dict * Simplify generator for the default value * + config hierarchy tests * add show override tests * Fix unit tests * Add tests in case of incorrect input for node_color * Rename tests and Fix typos * Fix functional tests * Fix issues with remote branch * Add changie entry * modify tests to meet standards (#5608) Co-authored-by: Matt Winkler <[email protected]> Co-authored-by: Emily Rockman <[email protected]>
…the DAG (dbt-labs#5397) * add Optional node_color config in Docs dataclass * Remove node_color from the original docs config * Add docs config and input validation * Handle when docs is both under docs and config.docs * Add node_color to Docs * Make docs a Dict to avoid parsing errors * Make docs a dataclass instead of a Dict * Fix error when using docs as dataclass * Simplify generator for the default value * skeleton for test fixtures * bump manifest to v7 * + config hierarchy tests * add show override tests * update manifest * Remove node_color from the original docs config * Add node_color to Docs * Make docs a Dict to avoid parsing errors * Make docs a dataclass instead of a Dict * Simplify generator for the default value * + config hierarchy tests * add show override tests * Fix unit tests * Add tests in case of incorrect input for node_color * Rename tests and Fix typos * Fix functional tests * Fix issues with remote branch * Add changie entry * modify tests to meet standards (dbt-labs#5608) Co-authored-by: Matt Winkler <[email protected]> Co-authored-by: Emily Rockman <[email protected]>
related to #5343
related to #5333
This is a variant of #5343 where the docs config can now be defined in
dbt_project.yml
as well as individualyml
files and theconfig()
block.The original
docs
config that acceptsshow:false
has not been touched for now to avoid breaking changes for people using the original config.This PR goes along the other PR dbt-labs/dbt-docs#281 on the dbt-docs repository.
Testing showed that the
dbt-docs
DAG crashes if invalid color names are provided hence why inputs fornode_color
are checked on the dbt-core sideI have not added tests so far as I would be keen to get feedback on the approach first but I will add tests if we decide to go forward with that code.
Copilot description
This pull request includes various changes to improve the functionality and documentation of the codebase. The most important changes include adding a new function for validating HTML colors, updating functions to handle the presence of a
docs
key, modifying the expected manifest to correctly set thenode_color
attribute, and adding new attributes and helper methods for handling documentation.Main functionality changes:
core/dbt/contracts/graph/utils.py
: Added a new functionvalidate_color
for validating HTML colors.core/dbt/parser/base.py
: Updated theupdate_parsed_node_config
function to handle the presence of adocs
key in theconfig_dict
and import theDocs
class. [1] [2]tests/functional/artifacts/expected_manifest.py
: Modified theexpected_manifest.py
file to correctly set thenode_color
attribute in the expected manifest. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17]core/dbt/contracts/graph/model_config.py
: Updated thesame_contents
function and added a newdocs
field to theNodeConfig
class. [1] [2]Documentation changes:
core/dbt/contracts/graph/unparsed.py
: Added a new optional attributenode_color
to theDocs
dataclass..changes/unreleased/Features-20220803-104230.yaml
: Added a new entry to makedocs
configurable indbt_project.yml
and add anode_color
attribute.tests/functional/configs/test_custom_node_colors_configs.py
: Added a new test file to validate the behavior of custom node colors at different levels. (tests/functional/configs/test_custom_node_colors_configs.py)