Initial structured logging changes #5954
Conversation
This is a big PR. I had some questions in several places.
# apply our schema tests to every log event from the previous step
# skips any output that isn't valid json
- uses: actions-rs/cargo@v1
  with:
    command: run
    args: --manifest-path test/interop/log_parsing/Cargo.toml
Are we going to somehow check against our proto definitions here in the future? This action now does the same things as our main workflow for integration tests, except it changes the log format to json without validating it.
I removed the Rust piece because a lot of it would have to be refactored and, in my opinion, it doesn't help that much. If someone disagrees or wants to do that re-implementation work, that's fine. I would personally vote for doing it in Python, so it can more easily be run locally.
The test as it is now does check that everything can be serialized and did catch a case where merging main had removed some updates to a fire_event call.
There's another ticket to implement log_format of protobuf and round trip the formatted logs.
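As an aside, a minimal Python sketch of that kind of check: read each log line, skip anything that isn't valid JSON, and assert some structure on the rest. The file path and the `info.code` field here are assumptions for illustration, not the actual schema.

```python
import json
import sys


def check_log_lines(path: str) -> int:
    """Count lines that parse as JSON log events; skip anything else."""
    checked = 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                # Non-JSON output (e.g. plain text from a subprocess) is skipped.
                continue
            # Hypothetical value-level assertion; the real schema may differ.
            assert isinstance(event.get("info", {}).get("code"), str)
            checked += 1
    return checked


if __name__ == "__main__":
    print(f"validated {check_log_lines(sys.argv[1])} events")
```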
@nathaniel-may do you remember why this was written in Rust?
We're consuming the logs from an "outside source" so it could be written in anything. I chose Rust because you can generate most of the code we'd be using here from a type definition. Plus Python's serialization-deserialization story is pretty abysmal so I didn't consider that a good option.
So the way I see it, we only have two ways to verify that our structured logs conform to anything even remotely reasonable for our users to consume:
- mypy in base_types.py, but mypy is limited in what it can check because everything has to be defined at the type level (e.g. it can't catch positive vs. negative integers). These type hints are composed through inheritance and aren't particularly easy to use as a schema.
- These interop tests written in Rust. There are lots of values we can't check at the type level so I think it's good we check values since it's a programmatic interface. There are certainly alternatives to this approach if we want to pin down these invariants another way.
I will concede that I didn't add very many value checks, so @gshank is correct to call out its limited usefulness in its current state. The original idea was that we would add more to these tests over time, which we haven't really done. If Rust isn't the right choice for the team I'm perfectly fine seeing this rewritten, but I don't think it should be removed without being replaced.
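For illustration, a minimal Python sketch of the kind of value-level invariant that can't be expressed at the type level (the event shape and field names here are assumptions, not the actual schema):

```python
from typing import Dict, List


def validate_event(event: Dict) -> List[str]:
    """Return violations of checks the type system can't express."""
    problems = []
    # An annotation can say `int`, but not "non-negative int".
    if event.get("pid", 0) < 0:
        problems.append("pid must be non-negative")
    # A `str` annotation can't enforce a code format, e.g. one letter plus three digits.
    code = event.get("code", "")
    if not (len(code) == 4 and code[0].isalpha() and code[1:].isdigit()):
        problems.append(f"unexpected code format: {code!r}")
    return problems
```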
Will open a ticket to look at restoring some of this testing.
# =======================================================
# A - Pre-project loading
# =======================================================
❤️
# TODO: can we combine this with ConnectionLeftOpen?
This was originally built with the model of events never being reused. That's why ConnectionClosed and ConnectionClosed2 exist currently. Do we need to rethink this model with proto events?
That's a good point. I do think that in some cases we might have a logically equivalent location that's in different places in the code, but we want to use the same event.
Thinking on this more, when we start reusing the events the value of the codes gets lost. We won't know with certainty that the log that came out for code A1234 was generated in file foo.py on line 127. The value there feels higher to me than not having to maintain a few events that are similar, if not the same.
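A simplified illustration of that "one event per call site" model (the real event classes live in core/dbt/events/types.py and carry more fields; the codes below are made up for the example, not the actual assignments):

```python
class ConnectionClosed:
    code: str = "Z001"  # fired from one specific call site


class ConnectionClosed2:
    code: str = "Z002"  # same shape, distinct code, fired from a different call site


# Because each code appears in exactly one place, "which line produced Z002?"
# has a single answer; reusing one event across call sites would lose that.
```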
core/dbt/events/README.md
Outdated
## Compiling types.proto

In the core/dbt/events directory: ```protoc --python_betterproto_out . types.proto```
Can you add a note about when we need to compile the types.proto?
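For readers following along: the compiler output is ordinary Python dataclasses, and it only needs regenerating when `types.proto` itself changes. A rough, illustrative sketch of what betterproto emits (the class and field names here are assumptions, not the actual generated module):

```python
from dataclasses import dataclass

import betterproto


# Roughly the shape of a generated message class; the real module is produced
# by `protoc --python_betterproto_out . types.proto`.
@dataclass
class MainReportVersion(betterproto.Message):
    version: str = betterproto.string_field(1)
```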
core/setup.py
Outdated
@@ -48,6 +48,7 @@
install_requires=[
    "Jinja2==3.1.2",
    "agate>=1.6,<1.6.4",
    "betterproto>=1.2.5",
Do we have a standard on how we're pinning packages? Most (but not all) the time we seem to either pin to an exact version or a range with lower and upper bounds.
I don't know if there's a standard. Do you think we should put == instead?
Our policies for these things are WIP (here), but basically: IMO we shouldn't tightly pin (==) if we don't need to. (It makes sense to do that in "locked" requirements files that guarantee a working install, but not for setup requirements.) And the reason to add an upper bound is just to avoid breaking changes that could come in v2.0.0. Have we tested with older versions of betterproto v1, before 1.2.5? If it can be as simple as betterproto>1,<2, amazing. If we're using capabilities that are new since v1.0, then betterproto>=1.2.5,<2 should do the trick.
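For concreteness, a sketch of that bounded-range approach in a setup file (the package name and other arguments are illustrative; the PR ultimately went with an exact pin instead):

```python
from setuptools import setup

setup(
    name="example-package",  # illustrative, not the real package metadata
    install_requires=[
        # lower bound for the features in use, upper bound to avoid 2.0 breaking changes
        "betterproto>=1.2.5,<2",
    ],
)
```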
v1.2.5 was released in May 2020. It looks like they've been actively working on a 2.0 beta since then. Since the release is older, I would vote to pin it; it seems unlikely they will release any new minor/patch releases for v1. I can't find any timelines for the beta but have an open question in their Slack about it. @gshank were you able to find anything about timelines?
As I recall, the betterproto 2.0 beta has some great features but definitely some breaking/unexpected behavior. Not locking to the current major version seems risky.
Also, there can be unexpected behavior if we compile with a different betterproto version than we use at runtime.
Pinned to 1.2.5
dev-requirements.txt
Outdated
@@ -1,3 +1,4 @@
betterproto[compiler]
we should lock this version as well
Locked
* use extra dict
* remove commented out code
* fix duplicates
* add test
* consolidate
* working, but needs some cleanup
* convert back to globals
* reset globals in tests
* fix up some imports
* fix text for windows
* modify test for windows
…s/dbt-core into ct-1047-proto_structured_logging
Just a few small things to tweak, overall looks good!
core/dbt/events/README.md
Outdated
Note that no attributes can exist in these event classes except for fields defined in the protobuf definitions, because the betterproto metaclass will throw an error. Betterproto provides a to_dict() method to convert the generated classes to a dictionary and from that to json. However some attributes will successfully convert to dictionaries but not to serialized protobufs, so we need to test both output formats.jjkkkkkkkkkkj
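A minimal sketch of testing both output formats described in that paragraph; the event object and its fields are assumptions to be replaced with a real generated class from core/dbt/events:

```python
def check_both_formats(event) -> None:
    # Dict path, used for the json log format.
    as_dict = event.to_dict()
    assert isinstance(as_dict, dict)

    # Binary protobuf path. Some values survive to_dict() but fail here,
    # which is why both formats need coverage.
    wire = bytes(event)
    round_tripped = type(event)().parse(wire)
    assert round_tripped.to_dict() == as_dict
```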
I'm guessing the jjkkkkkkkkkkj at the end was accidental?
pfft
core/dbt/events/README.md
Outdated
- a string attribute `code`, that's unique across events
- assign a log level by extending `DebugLevel`, `InfoLevel`, `WarnLevel`, or `ErrorLevel`
- a method `code`, that's unique across events
- assign a log level by using the Level misin: `DebugLevel`, `InfoLevel`, `WarnLevel`, or `ErrorLevel`
- assign a log level by using the Level misin: `DebugLevel`, `InfoLevel`, `WarnLevel`, or `ErrorLevel`
- assign a log level by using the Level mixin: `DebugLevel`, `InfoLevel`, `WarnLevel`, or `ErrorLevel`
class AdapterEventInfo(InfoLevel, AdapterEventBase, ShowException):
    code: str = "E002"

    def message(self):
        return f"Running with dbt{self.version}"
Should this include the log version as well? You're passing it in in main.py.
That would change the terminal output, which I don't think is necessary for the log format; I just wanted to put it in a structured entry someplace. We can discuss what to do about the log format.
### Description

Fixes linter errors caused by [upstream changes](dbt-labs/dbt-core#5954).

```
dbt/adapters/databricks/connections.py:440: error: Argument "conn_name" to "ConnectionUsed" has incompatible type "Optional[str]"; expected "str" [arg-type]
dbt/adapters/databricks/connections.py:449: error: Argument "conn_name" to "SQLQuery" has incompatible type "Optional[str]"; expected "str" [arg-type]
dbt/adapters/databricks/connections.py:491: error: Argument "conn_name" to "ConnectionUsed" has incompatible type "Optional[str]"; expected "str" [arg-type]
dbt/adapters/databricks/connections.py:496: error: Argument "conn_name" to "SQLQuery" has incompatible type "Optional[str]"; expected "str" [arg-type]
```
resolves #5662
Description
Initial changes for using proto messages in logging events.
Checklist
- `changie new` to create a changelog entry