Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LogRecord Resource Serialization #3346

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Add unit to view instrument selection criteria
([#3341](https://github.com/open-telemetry/opentelemetry-python/pull/3341))
- Upgrade opentelemetry-proto to 0.20 and regen
[#3355](https://github.com/open-telemetry/opentelemetry-python/pull/3355))
([#3355](https://github.com/open-telemetry/opentelemetry-python/pull/3355))
- LogRecord now JSON serializes resource objects to match ReadableSpan and MetricsData equivalents
([#3345](https://github.com/open-telemetry/opentelemetry-python/issues/3345))

## Version 1.18.0/0.39b0 (2023-05-04)

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -212,9 +212,9 @@ def to_json(self, indent=4) -> str:
if self.span_id is not None
else "",
"trace_flags": self.trace_flags,
"resource": repr(self.resource.attributes)
"resource": json.loads(self.resource.to_json())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please pass in the indent to to_json()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lzchen The serialized JSON is immediately deserialized before being reserialized again as part of the larger data structure. The indent should only matter in terms of consistency for that final serialization process, right?

See comparable ReadableSpan that behaves identically:

f_span["resource"] = json.loads(self.resource.to_json())

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add some context, this to_json was originally added for debugging purposes to be used in ConsoleExporter. That's why there is a repr.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lzchen 🎗️

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @sernst, what do you think about making a change to -- instead of converting the Resource to JSON then immediately converting that JSON to an object -- refactor the code a bit to add a method to Resource, e.g. to_dictionary that returns a dictionary representation that both to_json can use and that can be used here, e.g.

"resource": self.resource.to_dictionary()

Totally understand that your code here is consistent with the rest of the codebase, and no worries if you feel my suggestion would be out of scope -- we can definitely make these changes in a separate PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmcollins Yeah, I agree that avoiding the conversion to JSON and back with a direct dictionary serialization is a much better way to go. I would suggest that this should go beyond just the resource object. I noticed the pattern heavily used in within the sub-objects of ReadableSpan, MetricsData and LogRecord conversion. I think it would make sense to make all of the objects involved in the serialization process capable of dictionary serialization without intermediate JSON serialization.

I've just added a second commit to this PR, which does just that:

64946b4

These new changes introduce to_dict to the objects included in the existing JSON serialization process for ReadableSpan, MetricsData, LogRecord, and Resource objects. This includes adding to_dict to objects that are included within the serialized data structures of these objects. In places where repr() serialization was used, it has been replaced by a JSON-compatible serialization instead. Inconsistencies between null and empty string values were preserved, but in cases where attributes are optional, an empty dictionary is provided as well to be more consistent with cases where attributes are not optional and an empty dictionary represents no attributes were specified on the containing object.

These changes also included:

  1. Dictionary typing was included for all the to_dict methods for clarity in subsequent usage.
  2. DataT and DataPointT were did not include the exponential histogram types in point.py, and so those were added with new to_json and to_dict methods as well for consistency. It appears that the exponential types were added later and including them in the types might have been overlooked. Please let me know if that is a misunderstanding on my part.
  3. OrderedDict was removed in a number of places associated with the existing to_json functionality given its redundancy for Python 3.7+ compatibility. I was assuming this was legacy code for previous compatibility, but please let me know if that's not the case as well.
  4. to_dict was added to objects like SpanContext, Link, and Event that were previously being serialized by static methods within the ReadableSpan class and accessing private/protected members. This simplified the serialization in the ReadableSpan class and those methods were removed. However, once again, let me know if there was a larger purpose to those I could not find.

Finally, I used to_dict as the method names here to be consistent with other related usages. For example, dataclasses.asdict(). But, mostly because that was by far the most popular usage within the larger community:

328k files found on GitHub that define to_dict functions, which include some of the most popular Python libraries to date:

https://github.com/search?q=%22def+to_dict%28%22+language%3APython&type=code&p=1&l=Python

versus

3.3k files found on GitHub that define to_dictionary functions:
https://github.com/search?q=%22def+to_dictionary%28%22+language%3APython&type=code&l=Python

However, if there is a preference for this library to use to_dictionary instead let me know and I will adjust.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow awesome solution @sernst! Yes, it appears that to_dict is a better choice 😄

Do you think your second commit warrants its own PR? (I'm not sure what the convention is in this repo). Thanks.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Yeah, I'm not sure what the preference would be here either. I'm happy to separate it out into another PR if that is preferred.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes from the second commit discussed here have been moved into a separate PR:

#3365

tracking against #3364

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srikanthccv In response to yesterday's comment, please see previous on this thread for those topics already being discussed and actions taken. Would have been helpful to have feedback in context rather than a bifurcated thread. Thanks!

if self.resource
else "",
else None,
},
indent=indent,
)
Expand Down
27 changes: 26 additions & 1 deletion opentelemetry-sdk/tests/logs/test_log_record.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@

from opentelemetry.attributes import BoundedAttributes
from opentelemetry.sdk._logs import LogLimits, LogRecord
from opentelemetry.sdk.resources import Resource


class TestLogRecord(unittest.TestCase):
Expand All @@ -32,7 +33,7 @@ def test_log_record_to_json(self):
"trace_id": "",
"span_id": "",
"trace_flags": None,
"resource": "",
"resource": None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nice to add a test to verify that a non-empty Resource also converts to json correctly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, certainly agree that coverage was missing. As a first time contributor here I was trying to things slim, but I'm happy to add it:

https://github.com/open-telemetry/opentelemetry-python/pull/3346/files#diff-c202e3816c113727d0d32d83fa4becda1632ae14c6bf62b7421e96cb936b2c85R45

},
indent=4,
)
Expand All @@ -42,6 +43,30 @@ def test_log_record_to_json(self):
).to_json()
self.assertEqual(expected, actual)

def test_log_record_to_json_with_resource(self):
"""Should JSON serialize/deserialize Resource objects within log records."""
expected = json.dumps(
{
"body": "a log line",
"severity_number": "None",
"severity_text": None,
"attributes": None,
"timestamp": "1970-01-01T00:00:00.000000Z",
"trace_id": "",
"span_id": "",
"trace_flags": None,
"resource": {"attributes": {"foo": "bar"}, "schema_url": ""},
},
indent=4,
)

actual = LogRecord(
timestamp=0,
body="a log line",
resource=Resource(attributes={"foo": "bar"}),
).to_json()
self.assertEqual(expected, actual)

def test_log_record_bounded_attributes(self):
attr = {"key": "value"}

Expand Down