Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add log-formatting to the otel-LogHandler #3673

Open
wants to merge 30 commits into
base: main
Choose a base branch
from

Conversation

a-recknagel
Copy link

@a-recknagel a-recknagel commented Feb 5, 2024

Description

Added support for log message formatters to opentelemetry.sdk._logs._internal.LogHandler.

Fixes #3664
Fixes #3353

Type of change

  • added a self.format call on the original LogRecord to the code in the LogHandler's emit method, the result is added as the body field of the otel-LogRecord.
  • commented the code that filteres out all standard-LogRecord fields from getting added to the otel-LogRecord. I identified three sections, 1) those that are considered "private" by the LogRecord API (see a note "You shouldn’t need to format this yourself" in https://docs.python.org/3/library/logging.html#logrecord-attributes) or only get used to cache results, 2) those that have a semantic convention and have their key translated accordingly, and 3) those that do not have a semantic convention (yet) and just get plain dropped.
  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

The documentation should include an example which shows how exactly adding a formatter changes the result of what gets emitted to the receiver. I think there is some element of surprise here, because the standard procedure is that, if a formatter was set, the handler emits a string whose content is wholly dictated by said formatter.

The breaking change is due to me removing a check where a non-string msg attribute in the original LogRecord will be attached to the otel-LogRecord as-is. Now, the format call will always return a string, even if no formatter was set and args is empty. I believe this is the right behavior and makes the codebase easier to maintain. Code that looked like this

log.info({"message": "foo", "context": "bar"})

could, before this change, be interpreted as a complex data structure by some log-renderer. I'd argue that the "right way of logging" (with this example here illustrating a migration-path) should be the following:

log.info("foo", extra={"context": "bar"})

How Has This Been Tested?

I wrote three new unit tests, and updated a few other ones that broke due to the change.

Does This PR Require a Contrib Repo Change?

  • No

Checklist:

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

Breaking Change for the log-record's body-field

Type change

Important context here is the formal definition of the body-field in the otel data-model docs. Summed up, the key points are:

  • the type is any, so exporters need to support it as such
  • can be a human-readable string
  • can be structured data, if that is what the application emits

The proposed patch would change the possible type of the emitted data to effectively be just string. This is not in conflict with the data-model, since string is a subtype of any. Exporters should still support any to be compliant with the spec, even if it would not occur in the reference implementation (i.e. users can of course choose to subclass otel's LoggingHandler and overwrite its emit method to write something else than a string into the emitted log-record's body-field).

Content Change

Given a logging-call like

logger = logging.getLogger("foo")
logger.info("bar")

Formerly, the content of the body field of the resulting record would always be just "foo". After this change, running logging.basicConfig would attach a formatter using BASIC_FORMAT which transform that message into "INFO:foo:bar" on the body field. If you'd want to keep the former behavior, you'd have to avoid using logging.basicConfig or explicitly switch its formatter with logging.Formatter("%(message)s") afterwards.

Excursion: What are the args in a log-record for?

If you take a look at the signature of Logger.log (which is what Logger.debug, Logger.info, etc. call with their respective level), there is an *args in there, which is also persisted in the LogRecord as an attribute.

Given a complex object such as

class ComplexObject:
    def __str__(self):
        import time
        time.sleep(5)
        return "zzz... Baz"

complex_object = ComplexObject()

its purpose is to safe time and effort on log-messages with expensive serialization operations to avoid constructs like

import logging

if logging.RootLogger.level <= logging.DEBUG:
    logging.debug(f"Status of our complex object: {complex_object!s}")

where you don't want to spend 5 seconds waiting for complex_object to be turned into a string just to not end up emitting the result at all because your loglevel was higher than debug.

The same can be achieved with just

import logging

logging.debug(f"Status of our complex object: %s", complex_object)

And that's it, that's all that the *args in Logger.log should reasonably be used for. In practice, this rarely ever comes up.

Examples

We are going to use this script to test the behavior of the logging for the current main-branch (original), and what it would look like after applying this patch (patched):

otel_logging.py

import logging.config

from opentelemetry.sdk._logs import LoggerProvider
from opentelemetry.sdk._logs.export import ConsoleLogExporter, SimpleLogRecordProcessor
from opentelemetry.sdk.resources import Resource

logger_provider = LoggerProvider(resource=Resource.create({}))
logger_provider.add_log_record_processor(SimpleLogRecordProcessor(ConsoleLogExporter()))

logging.config.dictConfig({
    "version": 1,
    "formatters": {
        "simple": {"format": "%(name)s [%(module)s.%(lineno)s]: %(message)s"}
    },
    "handlers": {
        "console": {
            "level": "INFO",
            "formatter": "simple",
            "class": "opentelemetry.sdk._logs.LoggingHandler",
            "logger_provider": logger_provider,
        }
    },
    "root": {"handlers": ["console"], "level": "INFO"},
})

logging.info("foo")
logging.info("foo %s %s", "bar", "baz")
logging.info(["bar"])
logging.info({"baz": 1, "qux": None})

The output is quite verbose, so I'll only show the actual differences here, namely the content of the body-field.

original

"body": "foo",
"body": "foo bar baz",
"body": [
    "bar"
],
"body": {
    "baz": 1,
    "qux": null
},

Note that the original type of the first call-argument is retained, string-formatting of the args took place, but the specified logging-formatter was not called. As can be seen in the last call, None has also been transformed into null by the otel processor/exporter, which will be picked up on in the next section.

patched

"body": "root [otel_logging.26]: foo",
"body": "root [otel_logging.27]: foo bar baz",
"body": "root [otel_logging.28]: ['bar']",
"body": "root [otel_logging.29]: {'baz': 1, 'qux': None}",

Note that all of these bodies are strings.

Justification of the change and discarded alternatives

Leave as-is

Python developers expect log-formatting to work. I acknowledge that as far as otel is concerned it is in a bit of an awkward space, since it could be considered a kind of "rendering" that should be handled in the frontend, and not within the client. But it's a staple of python's standard logging, there is an always-on default, and its absence caused a fair bit of confusion already (see linked issues).

Also, the current behavior is buggy for non-jsonable objects. If there were an additional logging event logging.info(set()), it would not appear in the output, and instead be quietly discarded.

Only apply log-formatting if the first argument is a string

This would keep the option open to emit any-type arguments, but would in my personal opinion be inconsistent behavior and cause confusion. It would share the bug of the current implementation with respect to non-jsonable objects, and the output would look like this:

mixed

"body": "root [otel_logging.26]: foo",
"body": "root [otel_logging.27]: foo bar baz",
"body": [
    "bar"
],
"body": {
    "baz": 1,
    "qux": null
},

A possible issue here could be that the bodies lacking the format-information might fall through if dashboard-users are not aware of this discrepancy, e.g. "I know something went wrong in app.py, so I'll search for all bodies with *.\[app\.\d+\].*". Following the least astonishment principle and being consistent with how the body looks in general would be preferable in such a case.

Merge this PR

Log-formatting would work in a predictable, consistent, expected, and robust manner. We would lose the option to log structured data though, but at least for dictionaries there is a migration path. While sparingly documented (in the docs of Logger.debug, if you scroll a bit), you can pass dictionaries during logging with the extra-keyword and have its values persisted as log-record attributes:

logging.info("foo", extra={"bar": ["bar"], "baz": 1, "qux": None})

This is not new functionality, it was supported for a long time and would, as far as I can tell, provide a clean migration-path for users currently relying on passing structured data passed in log-events retaining a parseable shape:

"body": "root [otel_logging.26]: foo",
"attributes": {
    "bar": [
        "bar"
    ],
    "baz": 1,
    ...,
},

Note that qux was filtered out though, since None is not a valid attribute-value as decided by the AttributeValue type and the _clean_attribute function applying it.

@a-recknagel a-recknagel requested a review from a team February 5, 2024 12:15
Copy link

linux-foundation-easycla bot commented Feb 5, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

…before emitting it as the body, and retain all meaningful attributes of the original logrecord.
@bogdandrutu
Copy link
Member

bogdandrutu commented Feb 5, 2024

I think this is against the semantic conventions. To understand what I mean, see #3675 in otel we have standard semantic conventions for things like lineno and we should follow them and not add everything as python defined them.

@a-recknagel
Copy link
Author

@bogdandrutu Yeah it is. Scanning over SpanAttributes, there probably are a bunch more like thread.id. Is it ok if I just create an issue for it, something like "convert logrecord attribute into their correct semantic convention names"?

@bogdandrutu
Copy link
Member

@bogdandrutu Yeah it is. Scanning over SpanAttributes, there probably are a bunch more like thread.id. Is it ok if I just create an issue for it, something like "convert logrecord attribute into their correct semantic convention names"?

It is ok to create an issue, but would be worried merging this PR since then it becomes a breaking change.

@a-recknagel
Copy link
Author

In that case I'll try to find all semantic conventions that exist for the standard logrecord attributes and add them with the right names at least. It might actually be just the code. and thread. ones.

@a-recknagel
Copy link
Author

Ok, I've updated my fork with your changes and suggestions.

@a-recknagel
Copy link
Author

@bogdandrutu @mariojonke if it's not a bother, I'd like to ask if the PR in its current state could be approved, or if there is something left that I should improve on. Thank you for your time.

@lzchen
Copy link
Contributor

lzchen commented Jul 11, 2024

Discussed in 07/11/2024 SIG. Today we have an exclude list of attributes (which essentially includes all of python log record attributes) that are ignored when mapping to Otel LogRecord attributes. We do recognize that the transformation of python logging record to Otel LogRecord is confusing for Python users (especially ones that are coming in to use formatters expecting certain attributes to exist). We believe this issue of discovery and transparency is a separate issue from the inability to allow formatting of logrecords. We can address the discovery issue separately (whether it be part of this PR or not is up to @a-recknagel :) or you can open up a separate tracking issue to maybe improve our documentation). The PR as is is valuable in itself without introducing any attribute changes. @a-recknagel if you agree with this guidance, please update the pr with any changes you think are necessary to reflect the conclusion and let people know when it is ready to review. Thanks for your hard work!

@a-recknagel
Copy link
Author

@lzchen now that I read the pr again, I guess we should have also discussed the matter of the breaking change, my bad. Should it be picked up in the next SIG again? That would give me a bit of time to write the docs, too.

@lzchen
Copy link
Contributor

lzchen commented Jul 11, 2024

@a-recknagel

Yes of course! Feel free to add your topic to the agenda for the Python. Thanks a lot for your patience and hard work on this issue :)

@jeremydvoss
Copy link
Contributor

"Fixes 3673"
This PR says it fixes itself. Is there a second issue that was supposed to be linked here besides 3664?

@a-recknagel
Copy link
Author

"Fixes 3673"
This PR says it fixes itself. Is there a second issue that was supposed to be linked here besides 3664?

I don't know why it's there, it's not part of the issue text so I can't remove it.

@a-recknagel
Copy link
Author

@lzchen @jeremydvoss I have removed the code which changed the attributes on log-records in order to keep this PR more in line with its intended purpose of adding log-formatting, and only that. I hope this will make it easier to add this change to the next release.

self.assertTrue("division by zero" in stack_trace)
self.assertTrue(__file__ in stack_trace)
self.assertIsNotNone(log_record)
self.assertIn("Zero Division Error", log_record.body)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hides a bad breaking change. Exception messages now weirdly add a newline and NoneType parameter.

@jeremydvoss jeremydvoss mentioned this pull request Aug 29, 2024
10 tasks

self.assertIsNotNone(log_record)
self.assertEqual(len(log_record.attributes), 4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this was removed. Attributes should stay the same even when formatted.

@jeremydvoss
Copy link
Contributor

I recommend closing this PR since #4166 has been merged. The other changes that were in the earlier iterations of this PR can be separated into a new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In progress
Status: In Progress
9 participants