LogRecord Resource Serialization #3346
@@ -211,9 +211,9 @@ def to_json(self, indent=4) -> str:
                 if self.span_id is not None
                 else "",
                 "trace_flags": self.trace_flags,
-                "resource": repr(self.resource.attributes)
+                "resource": json.loads(self.resource.to_json())
Please pass in the indent to to_json()
@lzchen The serialized JSON is immediately deserialized and then reserialized as part of the larger data structure. The indent should only matter in terms of consistency for that final serialization process, right?
See the comparable `ReadableSpan`, which behaves identically:
    f_span["resource"] = json.loads(self.resource.to_json())
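To illustrate the point about indentation, here is a minimal sketch. `DemoResource` and `record_to_json` are hypothetical stand-ins, not the SDK's classes; only the `to_json(indent=...)` shape is borrowed from the diff above. Whatever indent the inner call uses is discarded by `json.loads`, so only the outer `json.dumps` affects the final formatting.

```python
import json


class DemoResource:
    """Hypothetical stand-in for an object exposing to_json(indent)."""

    def __init__(self, attributes):
        self.attributes = dict(attributes)

    def to_json(self, indent=4):
        return json.dumps({"attributes": self.attributes}, indent=indent)


def record_to_json(resource, inner_indent, outer_indent=4):
    # The inner JSON string is parsed back into a dict right away,
    # so its whitespace never reaches the final output.
    payload = {
        "body": "hello",
        "resource": json.loads(resource.to_json(indent=inner_indent)),
    }
    return json.dumps(payload, indent=outer_indent)


resource = DemoResource({"service.name": "demo"})
compact_inner = record_to_json(resource, inner_indent=None)
pretty_inner = record_to_json(resource, inner_indent=4)
```

Both strings come out identical, which is why passing `indent` through to the inner call would be a matter of consistency rather than behavior.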
To add some context, this `to_json` was originally added for debugging purposes to be used in `ConsoleExporter`. That's why there is a `repr`.
@lzchen 🎗️
Hi @sernst, what do you think about making a change to -- instead of converting the `Resource` to JSON then immediately converting that JSON to an object -- refactor the code a bit to add a method to `Resource`, e.g. `to_dictionary`, that returns a dictionary representation that both `to_json` can use and that can be used here, e.g.
    "resource": self.resource.to_dictionary()
Totally understand that your code here is consistent with the rest of the codebase, and no worries if you feel my suggestion would be out of scope -- we can definitely make these changes in a separate PR.
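A rough sketch of that refactor, under stated assumptions: `DemoResource` and `DemoLogRecord` are hypothetical stand-ins rather than the SDK's classes, the `attributes`/`schema_url` fields are assumed for illustration, and `to_dictionary` is simply the name floated in the comment above.

```python
import json
from typing import Any, Dict


class DemoResource:
    """Hypothetical stand-in; field names are assumptions for illustration."""

    def __init__(self, attributes, schema_url=""):
        self.attributes = dict(attributes)
        self.schema_url = schema_url

    def to_dictionary(self) -> Dict[str, Any]:
        # Single source of truth for the serialized shape.
        return {
            "attributes": dict(self.attributes),
            "schema_url": self.schema_url,
        }

    def to_json(self, indent=4) -> str:
        # to_json becomes a thin wrapper around the dictionary form.
        return json.dumps(self.to_dictionary(), indent=indent)


class DemoLogRecord:
    """Hypothetical stand-in that nests the resource without a JSON round trip."""

    def __init__(self, body, resource):
        self.body = body
        self.resource = resource

    def to_json(self, indent=4) -> str:
        return json.dumps(
            {"body": self.body, "resource": self.resource.to_dictionary()},
            indent=indent,
        )


demo_resource = DemoResource({"service.name": "demo"})
demo_record = DemoLogRecord("hello", demo_resource)
parsed_record = json.loads(demo_record.to_json())
```

The design choice here is that the dictionary form is built once and reused, so nested objects never have to be encoded to a string and parsed back.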
@pmcollins Yeah, I agree that avoiding the conversion to JSON and back with a direct dictionary serialization is a much better way to go. I would suggest that this should go beyond just the resource object. I noticed the pattern heavily used within the sub-objects of the `ReadableSpan`, `MetricsData`, and `LogRecord` conversion. I think it would make sense to make all of the objects involved in the serialization process capable of dictionary serialization without intermediate JSON serialization.
I've just added a second commit to this PR, which does just that:
These new changes introduce `to_dict` to the objects included in the existing JSON serialization process for `ReadableSpan`, `MetricsData`, `LogRecord`, and `Resource` objects. This includes adding `to_dict` to objects that are included within the serialized data structures of these objects. In places where `repr()` serialization was used, it has been replaced by a JSON-compatible serialization instead. Inconsistencies between null and empty string values were preserved. However, where attributes are optional, an empty dictionary is now provided, to be consistent with the cases where attributes are not optional and an empty dictionary means that no attributes were specified on the containing object.
These changes also included:
- Dictionary typing was included for all the `to_dict` methods for clarity in subsequent usage.
- `DataT` and `DataPointT` did not include the exponential histogram types in point.py, and so those were added with new `to_json` and `to_dict` methods as well for consistency. It appears that the exponential types were added later and including them in the types might have been overlooked. Please let me know if that is a misunderstanding on my part.
- OrderedDict was removed in a number of places associated with the existing `to_json` functionality given its redundancy for Python 3.7+ compatibility. I was assuming this was legacy code for previous compatibility, but please let me know if that's not the case as well.
- `to_dict` was added to objects like `SpanContext`, `Link`, and `Event` that were previously being serialized by static methods within the `ReadableSpan` class and accessing private/protected members. This simplified the serialization in the `ReadableSpan` class and those methods were removed. However, once again, let me know if there was a larger purpose to those I could not find.
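As a small illustration of the `OrderedDict` point, the sketch below uses made-up keys and values (not taken from the SDK). Since Python 3.7, plain dictionaries preserve insertion order as a language guarantee, so `json.dumps` emits keys in the same order for both containers.

```python
import json
from collections import OrderedDict

# Build the same mapping twice: once with OrderedDict, once with a plain dict.
ordered = OrderedDict()
ordered["name"] = "demo-span"
ordered["context"] = {"trace_id": "0x01"}
ordered["kind"] = "INTERNAL"

plain = {
    "name": "demo-span",
    "context": {"trace_id": "0x01"},
    "kind": "INTERNAL",
}

ordered_json = json.dumps(ordered, indent=4)
plain_json = json.dumps(plain, indent=4)
```

The two serialized strings match, which is the sense in which `OrderedDict` is redundant for this purpose on Python 3.7 and later.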
Finally, I used `to_dict` as the method name here to be consistent with other related usages, for example `dataclasses.asdict()`, but mostly because it is by far the more popular name within the larger community:
328k files found on GitHub that define `to_dict` functions, which include some of the most popular Python libraries to date:
https://github.com/search?q=%22def+to_dict%28%22+language%3APython&type=code&p=1&l=Python
versus
3.3k files found on GitHub that define `to_dictionary` functions:
https://github.com/search?q=%22def+to_dictionary%28%22+language%3APython&type=code&l=Python
However, if there is a preference for this library to use `to_dictionary` instead, let me know and I will adjust.
Wow awesome solution @sernst! Yes, it appears that `to_dict` is a better choice 😄
Do you think your second commit warrants its own PR? (I'm not sure what the convention is in this repo). Thanks.
Thanks! Yeah, I'm not sure what the preference would be here either. I'm happy to separate it out into another PR if that is preferred.
@srikanthccv In response to yesterday's comment, please see the earlier comments in this thread, where those topics were already discussed and acted on. It would have been helpful to have the feedback in context rather than in a bifurcated thread. Thanks!
@@ -31,7 +31,7 @@ def test_log_record_to_json(self):
                 "trace_id": "",
                 "span_id": "",
                 "trace_flags": None,
-                "resource": "",
+                "resource": None,
Might be nice to add a test to verify that a non-empty `Resource` also converts to JSON correctly.
Yep, certainly agree that coverage was missing. As a first-time contributor here I was trying to keep things slim, but I'm happy to add it:
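A test along those lines might look roughly like the sketch below. It is not the test added in the PR: `DemoResource` and `DemoLogRecord` are hypothetical stand-ins so the example runs on its own, and the serialized shape is an assumption.

```python
import json
import unittest


class DemoResource:
    """Hypothetical stand-in; not the SDK's Resource class."""

    def __init__(self, attributes):
        self.attributes = dict(attributes)

    def to_json(self, indent=4):
        return json.dumps({"attributes": self.attributes}, indent=indent)


class DemoLogRecord:
    """Hypothetical stand-in that embeds the resource as a nested object."""

    def __init__(self, body, resource=None):
        self.body = body
        self.resource = resource

    def to_json(self, indent=4):
        resource = None
        if self.resource is not None:
            resource = json.loads(self.resource.to_json())
        return json.dumps(
            {"body": self.body, "resource": resource}, indent=indent
        )


class TestDemoLogRecordToJson(unittest.TestCase):
    def test_non_empty_resource(self):
        record = DemoLogRecord(
            "a log line", DemoResource({"service.name": "foo"})
        )
        parsed = json.loads(record.to_json())
        # The resource must be a nested object, not a repr() string.
        self.assertEqual(
            parsed["resource"], {"attributes": {"service.name": "foo"}}
        )

    def test_missing_resource(self):
        record = DemoLogRecord("a log line")
        self.assertIsNone(json.loads(record.to_json())["resource"])
```

Parsing the output with `json.loads` before asserting keeps the test independent of indentation and key spacing.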
Force-pushed from a63486a to 587581e.
Changes LGTM and seem like a nice improvement.
    def to_dict(self) -> LogRecordDict:
        return {
            "body": self.body,
            "severity_number": getattr(self.severity_number, "value", None),
There is an `UNSPECIFIED` severity value; should we use it here instead of `None`?
Yep, that seems reasonable to me. Made the change and updated the `LogRecordDict` to reflect that `severity_number` will be an `int` instead of an `Optional[int]`.
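The change described above could be sketched as follows. `DemoSeverity`, `DemoLogRecordDict`, and `demo_to_dict` are hypothetical names used only for illustration, and the enum values are assumptions rather than a copy of the SDK's definitions.

```python
import enum
from typing import Optional, TypedDict


class DemoSeverity(enum.Enum):
    """Hypothetical severity enum; the values are assumed for illustration."""

    UNSPECIFIED = 0
    INFO = 9


class DemoLogRecordDict(TypedDict):
    body: str
    # Always an int now, because a missing severity maps to UNSPECIFIED.
    severity_number: int


def demo_to_dict(
    body: str, severity: Optional[DemoSeverity] = None
) -> DemoLogRecordDict:
    # Fall back to the explicit UNSPECIFIED member instead of emitting None.
    if severity is None:
        severity = DemoSeverity.UNSPECIFIED
    return {"body": body, "severity_number": severity.value}


without_severity = demo_to_dict("hello")
with_severity = demo_to_dict("hello", DemoSeverity.INFO)
```

Consumers of the dictionary then never need to handle a null severity, at the cost of no longer distinguishing "not set" from "explicitly unspecified".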
Force-pushed from 64946b4 to 5e47fcd.
This added a lot of public symbols and changes that are not related to the original issue in the last commit. Please keep the changes limited to the linked issue. I wouldn't add all the
Currently, the `LogRecord.to_json` method serializes the resource object using `repr` of its attributes. This differs from how the serialization process is handled in `ReadableSpan.to_json` and `MetricsData.to_json`, which utilize the `Resource.to_json` functionality directly. Using `repr` does not produce a json-parseable output and doesn't follow the same depth of serialization as the other two signal types. Therefore, this change carries over the serialization process from the spans and metrics signal types to the logs type. Fixes open-telemetry#3345
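To make the `repr` problem concrete, here is a small sketch that uses a plain dictionary as a stand-in for the resource attributes (an assumption; the SDK's attribute container may have a different `repr`). The `repr` string uses single quotes and Python literals, so it is not valid JSON, and embedding it yields a string value rather than a nested object.

```python
import json

# Plain dict standing in for resource attributes (assumption for illustration).
attributes = {"service.name": "demo", "enabled": True}

as_repr = repr(attributes)

# repr() output is Python syntax, not JSON, so parsing it fails.
try:
    json.loads(as_repr)
    repr_is_json = True
except json.JSONDecodeError:
    repr_is_json = False

# Embedding the repr gives consumers an opaque string...
record_with_repr = json.loads(json.dumps({"resource": as_repr}))
# ...whereas embedding the mapping gives them a structured object.
record_with_dict = json.loads(
    json.dumps({"resource": {"attributes": attributes}})
)
```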
Force-pushed from 5e47fcd to 0dff7dd.
Force-pushed from cb1a3ce to 37de27a.
Bump. Are you still working on this?
@lzchen Yep, will re-engage here soon.
This PR needs fixes, marking it as draft.
Closing since #3972 did roughly the same change.
related to fix in open-telemetry/opentelemetry-python#3346
Description
Currently, the `LogRecord.to_json` method serializes the resource object using `repr` of its attributes. This differs from how the serialization process is handled in `ReadableSpan.to_json` and `MetricsData.to_json`, which utilize the `Resource.to_json` functionality directly. Using `repr` does not produce a json-parseable output and doesn't follow the same depth of serialization as the other two signal types. Therefore, this change carries over the serialization process from the spans and metrics signal types to the logs type.

Fixes #3345
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Ran tox as specified in CONTRIBUTING.md and tested within a simple test application using the `ConsoleLogExporter` to see the changes echoed in the terminal.
Does This PR Require a Contrib Repo Change?
Checklist: