Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-1983] [Feature] Log the meta-dictionary instead of stringifying #6803

Closed
3 tasks done
tmastny opened this issue Jan 31, 2023 · 8 comments
Closed
3 tasks done

[CT-1983] [Feature] Log the meta-dictionary instead of stringifying #6803

tmastny opened this issue Jan 31, 2023 · 8 comments
Labels
duplicate This issue or pull request already exists enhancement New feature or request logging

Comments

@tmastny
Copy link
Contributor

tmastny commented Jan 31, 2023

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

With the current release the meta field in node_info is stringified before logging: b45de95.

The means that that configs written like this:

{{ config(meta={"owners": ["team1", "team2"]})}}

are logged like this:

{"owners": "['team1', 'team2']"}

This makes it significantly harder to work with meta fields that have nested elements or arrays.

I propose we stop stringifying the meta field and log it as. Here's a PR with the change: #6804

Describe alternatives you've considered

The alternative is to parse the strings in the logging systems that consume it.

In the original PR, there was some concern about logging without stringifying, but I've tested it (manually and with a functional test) and it works.

Who will this benefit?

Users who want to use the meta field in their logging systems.

  • a full meta dictionary (rather than string) allows you to easy have more complicated associations. For example "meta": {"owners": ["team1", "team2"]}
  • logging systems like DataDog automatically parse JSON very well. It's much harder to parse a stringified version like "["team1", "team2"]".

Are you interested in contributing this feature?

Yes, here is a PR: #6804.

Anything else?

No response

@tmastny tmastny added enhancement New feature or request triage labels Jan 31, 2023
@github-actions github-actions bot changed the title [Feature] Log the meta-dictionary instead of stringifying [CT-1983] [Feature] Log the meta-dictionary instead of stringifying Jan 31, 2023
@jtcohen6
Copy link
Contributor

cc @gshank - relevant to forthcoming tech debt ticket about needing to stringify values in proto dictionaries

@tmastny
Copy link
Contributor Author

tmastny commented Jan 31, 2023

@jtcohen6 I'm curious why it's necessary to stringify the values? I've tested it locally and with a functional test in the PR and it seems like logging works without stringifying.

@jtcohen6
Copy link
Contributor

We need to coordinate the event payload within the contract defined by the associated proto(col buffer) schema:

map<string, string> meta = 9;

meta: Dict[str, str] = betterproto.map_field(
9, betterproto.TYPE_STRING, betterproto.TYPE_STRING
)

@tmastny
Copy link
Contributor Author

tmastny commented Jan 31, 2023

How do you tell if the payload is violating a contract? Is it manual or is there a test? (all the tests passed on my PR #6804)

What happens if the contract is violated? (since all the tests pass, nothing immediately breaks)

Could this be a place where it's worth letting the contract disagree with the implementation?

@jtcohen6
Copy link
Contributor

jtcohen6 commented Feb 3, 2023

@tmastny You're asking all the right questions! :)

We planned to add that testing when we actually added first-class support for binary logging, which would require validating the proto serialization of every event. We've delayed that work for now. In the meantime, we should still have automated testing that validates the proto schema/serialization of every event:

@jtcohen6 jtcohen6 removed the triage label Feb 6, 2023
@jtcohen6
Copy link
Contributor

jtcohen6 commented Feb 10, 2023

Next steps here would be:

  • Adding the testing described above
  • Need to figure out how to define proto schemas for dictionaries with arbitrary nested non-string values (Dict[str, Any]). Could this be a Struct?
  • We should implement this for any "map" objects: meta, adapter_response, args (in MainReportArgs) ... any others? ...

Or, if that's not possible:

  • Convert to a JSON string, and validate it as JSON (instead of Dict[str, str])
  • Advise downstream log consumers to extract the stringified values to JSON.

@jtcohen6
Copy link
Contributor

Upping the priority for this one because it's maybe a prerequisite for #6820

@gshank says next steps here:

  • Check out what Struct can do
  • See what map<str, Any> can do
  • Both have their limitations
  • Pick one, and go implement

@jtcohen6
Copy link
Contributor

Closing as a duplicate of #6832, to keep the implementation conversation centralized over there

@jtcohen6 jtcohen6 added the duplicate This issue or pull request already exists label Feb 27, 2023
@jtcohen6 jtcohen6 closed this as not planned Won't fix, can't repro, duplicate, stale Feb 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duplicate This issue or pull request already exists enhancement New feature or request logging
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants