Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow custom metadata to be attached to every log #5999

Merged

Conversation

emmyoop
Copy link
Member

@emmyoop emmyoop commented Oct 3, 2022

resolves #5228

Description

Adds all env vars prefixed with the METADATA_ENV_PREFIX to all logs.

Checklist

@emmyoop emmyoop added the Skip Changelog Skips GHA to check for changelog file label Oct 3, 2022
@emmyoop emmyoop requested review from a team as code owners October 3, 2022 21:20
@cla-bot cla-bot bot added the cla:yes label Oct 3, 2022
@emmyoop emmyoop changed the title Er/ct 624 metadata logs struct logging Allow custom metadata to be attached to every log Oct 3, 2022
@emmyoop emmyoop marked this pull request as draft October 4, 2022 12:24
@emmyoop emmyoop removed request for a team, gshank and ChenyuLInx October 4, 2022 12:25
@@ -1,2 +1,3 @@
SECRET_ENV_PREFIX = "DBT_ENV_SECRET_"
DEFAULT_ENV_PLACEHOLDER = "DBT_DEFAULT_PLACEHOLDER"
METADATA_ENV_PREFIX = "DBT_ENV_CUSTOM_ENV_"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name might be natural in terms of some context I don't have yet, but I wonder if it can be improved. Maybe DBT_LOGGING_VAR_ or DBT_METADATA_VAR_?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually an existing name (docs). Metadata is in the manifest and these vars end up there currently. I centralized the constant into this (newish) file as part of my changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be my preference to have a new key for this behavior, such as DBT_LOG_CUSTOM_ENV. Users already have existing DBT_ENV_CUSTOM_ENV environment variables, which they might actually be surprised to find in every log line. @jtcohen6 could you weigh in on this?

To me, this seems to be a different use case than we already have for DBT_ENV_CUSTOM_ENV.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conflating these two use cases also makes it impossible to put things in one place in the artifact output as opposed to in the artifact output plus every log line.

Copy link
Member Author

@emmyoop emmyoop Oct 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure specifically how users are using DBT_ENV_CUSTOM_ENV currently.

I do see what you're saying with users possibly being surprised these env vars show up in logs now. But is there a use case for these being the exact variables they want to show up? Definitely need @jtcohen6 to weigh in.

It's easy to separate out if that's the use case we want.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now those DBT_ENV_CUSTOM_ENV variables show up in the artifacts that are written up in the end, and there was a recent change to put them in the context for access in Jinja.

I'm not familiar with how users might use those, but from some of our test cases, it does look like they were intended for putting things like a run_id or a job_id. So maybe it would be fine to use this prefix string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tagging me in, and for the good conversation here!

Users set these env vars with the explicit intent of having them show up in metadata artifacts. It makes it easier to debug and quickly cross-reference. I see logs as a metadata artifact, in very much the same vein.

This was also our intent by crafting such an unwieldy prefix. It's hard to type DBT_ENV_CUSTOM_ENV_ by accident. I'm definitely open to a better name (DBT_ENV_CUSTOM_METADATA_?) that's equally clear about what it's doing. By convention, "functional" env vars should just be prefixed with DBT_, including env vars setting global configs.

I think it's an okay assumption to automatically coordinate between these. If later on, we hear a valid use case for having them show up in one, and not the other, I'm willing to admit that I was wrong! But I'd rather do this than require users to set two sets of identical env vars from the get-go.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the input @jtcohen6! I do think DBT_ENV_CUSTOM_METADATA_ does a better job of saying what the var is for sure, so lean towards this new name.

My understanding is that if we change the name we should add a deprecation warning for the old name and remove it in 2.0. My question is would DBT_ENV_CUSTOM_ENV_ only show up in the artifacts (since that's maintaining current behavior) and DBT_ENV_CUSTOM_METADATA_ shows up in both logs and artifacts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still vote in favor of coordinating the information so that it shows up in both places.

How would we feel about implementing this as DBT_ENV_CUSTOM_ENV_ for now, and then opening a new ticket for the rename (+ deprecation warning)?

@emmyoop emmyoop marked this pull request as ready for review October 4, 2022 14:37
@emmyoop emmyoop requested a review from gshank October 4, 2022 14:38
@gshank
Copy link
Contributor

gshank commented Oct 4, 2022

In general, this looks fine though. We are in the process of an initiative to move global things to be non-global, but since the good places to put this particular global haven't been defined yet, making it global for now is probably fine.

Does need tests :)

@emmyoop emmyoop requested review from a team and iknox-fa October 12, 2022 16:47
@emmyoop emmyoop merged commit 5d3d75d into ct-1047-proto_structured_logging Oct 13, 2022
@emmyoop emmyoop deleted the er/ct-624-metadata-logs-struct-logging branch October 13, 2022 13:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:yes Skip Changelog Skips GHA to check for changelog file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants