-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow custom metadata to be attached to every log #5999
Allow custom metadata to be attached to every log #5999
Conversation
@@ -1,2 +1,3 @@ | |||
SECRET_ENV_PREFIX = "DBT_ENV_SECRET_" | |||
DEFAULT_ENV_PLACEHOLDER = "DBT_DEFAULT_PLACEHOLDER" | |||
METADATA_ENV_PREFIX = "DBT_ENV_CUSTOM_ENV_" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name might be natural in terms of some context I don't have yet, but I wonder if it can be improved. Maybe DBT_LOGGING_VAR_ or DBT_METADATA_VAR_?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually an existing name (docs). Metadata is in the manifest and these vars end up there currently. I centralized the constant into this (newish) file as part of my changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be my preference to have a new key for this behavior, such as DBT_LOG_CUSTOM_ENV. Users already have existing DBT_ENV_CUSTOM_ENV environment variables, which they might actually be surprised to find in every log line. @jtcohen6 could you weigh in on this?
To me, this seems to be a different use case than we already have for DBT_ENV_CUSTOM_ENV.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conflating these two use cases also makes it impossible to put things in one place in the artifact output as opposed to in the artifact output plus every log line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure specifically how users are using DBT_ENV_CUSTOM_ENV
currently.
I do see what you're saying with users possibly being surprised these env vars show up in logs now. But is there a use case for these being the exact variables they want to show up? Definitely need @jtcohen6 to weigh in.
It's easy to separate out if that's the use case we want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now those DBT_ENV_CUSTOM_ENV variables show up in the artifacts that are written up in the end, and there was a recent change to put them in the context for access in Jinja.
I'm not familiar with how users might use those, but from some of our test cases, it does look like they were intended for putting things like a run_id or a job_id. So maybe it would be fine to use this prefix string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for tagging me in, and for the good conversation here!
Users set these env vars with the explicit intent of having them show up in metadata artifacts. It makes it easier to debug and quickly cross-reference. I see logs as a metadata artifact, in very much the same vein.
This was also our intent by crafting such an unwieldy prefix. It's hard to type DBT_ENV_CUSTOM_ENV_
by accident. I'm definitely open to a better name (DBT_ENV_CUSTOM_METADATA_
?) that's equally clear about what it's doing. By convention, "functional" env vars should just be prefixed with DBT_
, including env vars setting global configs.
I think it's an okay assumption to automatically coordinate between these. If later on, we hear a valid use case for having them show up in one, and not the other, I'm willing to admit that I was wrong! But I'd rather do this than require users to set two sets of identical env vars from the get-go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the input @jtcohen6! I do think DBT_ENV_CUSTOM_METADATA_
does a better job of saying what the var is for sure, so lean towards this new name.
My understanding is that if we change the name we should add a deprecation warning for the old name and remove it in 2.0. My question is would DBT_ENV_CUSTOM_ENV_
only show up in the artifacts (since that's maintaining current behavior) and DBT_ENV_CUSTOM_METADATA_
shows up in both logs and artifacts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still vote in favor of coordinating the information so that it shows up in both places.
How would we feel about implementing this as DBT_ENV_CUSTOM_ENV_
for now, and then opening a new ticket for the rename (+ deprecation warning)?
In general, this looks fine though. We are in the process of an initiative to move global things to be non-global, but since the good places to put this particular global haven't been defined yet, making it global for now is probably fine. Does need tests :) |
…ta-logs-struct-logging
resolves #5228
Description
Adds all env vars prefixed with the
METADATA_ENV_PREFIX
to all logs.Checklist
changie new
to create a changelog entry