Define the purpose of events in OpenTelemetry #4045

Closed
tedsuo opened this issue May 10, 2024 · 17 comments

@tedsuo
Contributor

tedsuo commented May 10, 2024

OpenTelemetry has added the concept of an event, but has not fully defined the purpose of events. This has created some confusion and difficulty in the Event SIG, and will create further confusion as events become stable. Before we go any further, I'm requesting that we agree on the purpose of events, and how they relate to logs, and add this description to the specification. Much like how we recently defined the purpose of OpenTelemetry attributes, clarity here will allow us to make coherent decisions in our design.

Through many discussions, I believe that we have arrived at a useful definition, and the motivations for that definition. I've described it below:

OpenTelemetry Events

Unlike some other systems, OpenTelemetry's definition of an event is not a separate data stream. In OpenTelemetry, an event is a "semantically rigorous log." Events in OpenTelemetry are explicitly defined as semantic conventions for logs.

Because OpenTelemetry is designed to emit logs that are not only human readable, but machine readable, a definition for rigorous semantics is needed. Whenever OpenTelemetry instrumentation emits a log to describe a computer operation, that log MUST have the same level of semantic consistency that we expect from other signals such as traces and metrics. The term that OpenTelemetry uses for these structured logs is "event." Any semantic convention that we define for the log signal MUST be defined as an event.

That's it. That's what an event is in OpenTelemetry. The structure used for creating semantic conventions for logs. Nothing more, nothing less.

OpenTelemetry Event API

What is the purpose of the Event API in OpenTelemetry? OpenTelemetry exposes an Event API, but does not expose a Log API. Why is this? To answer this question, we must divide logs into two categories: logs created by end users and logs created by shared instrumentation.

End users are encouraged to continue using their existing logger in their application, and to add a log appender that sends these logs to the log bridge API. End users are not required to use the Event API.
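
A minimal sketch of the appender pattern described above, using Python's standard `logging` module. The `BridgeAppender` class, the `emit_to_bridge` callable, and the `logger.name` attribute key are hypothetical stand-ins for illustration; they are not the OpenTelemetry log bridge API.

```python
import logging
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for the log bridge: any callable that accepts a
# dict-shaped log record. This is NOT a real OpenTelemetry API.
BridgeEmit = Callable[[Dict[str, Any]], None]


class BridgeAppender(logging.Handler):
    """Forwards records from the application's existing logger to a bridge."""

    def __init__(self, emit_to_bridge: BridgeEmit) -> None:
        super().__init__()
        self._emit_to_bridge = emit_to_bridge

    def emit(self, record: logging.LogRecord) -> None:
        # Translate the framework's record into a neutral shape.
        self._emit_to_bridge(
            {
                "severity_text": record.levelname,
                "body": record.getMessage(),
                "attributes": {"logger.name": record.name},  # assumed key
            }
        )


def demo() -> List[Dict[str, Any]]:
    """Attach the appender to an existing logger and log one message."""
    received: List[Dict[str, Any]] = []
    app_logger = logging.getLogger("checkout")
    app_logger.setLevel(logging.INFO)
    app_logger.propagate = False  # keep the demo quiet on stderr
    handler = BridgeAppender(received.append)
    app_logger.addHandler(handler)
    app_logger.info("order %s placed", 42)
    app_logger.removeHandler(handler)
    return received
```

The application code keeps calling its own logger; only the handler knows about the bridge.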

That said, end users may want to log events. If their logger is capable of emitting a structured log that matches the OpenTelemetry definition of an event, users MAY use their logger to create events, as events are just semantic conventions for logs. However, if their logger is not capable of emitting this kind of structured log, end users SHOULD use the OpenTelemetry Event API for this purpose.

Shared OpenTelemetry instrumentation – library instrumentation designed to be loaded into many applications – MUST use the OpenTelemetry Event API. This is for three reasons.

First, shared instrumentation packages should not emit unstructured logs; they should only emit fully structured and semantically rigorous logs designed for machine analysis, because these packages are only describing computer operations. If the computer operations being described are common, they should be added to our registry of semantic conventions so that they can be consistent across all of the libraries that emit the same event.

The second and more important reason for requiring the use of the Event API is that shared instrumentation must be loaded into many applications, and taking a dependency on a third party logger is incredibly problematic.

If an instrumentation package pulls in a third party logger, the chance that it selects the exact logger chosen by the end user is low. It is unreasonable to assume that all end users will select the same logger. This means that the end user must know of the existence of this additional logger, and must correctly configure it with a log appender to either send the data to the OpenTelemetry log bridge or to the end user's primary logger. Different shared instrumentation packages may choose different loggers. Determining how many loggers and what kind of loggers are present across all of the shared instrumentation packages pulled into their application is a huge burden to place on an end user. From a product perspective, this is a terrible experience.

Finally, and most importantly, shared instrumentation packages may in fact choose the same logger, but choose incompatible versions. This creates exactly the kind of unresolvable dependency conflict that OpenTelemetry works hard to avoid. For this reason alone, all shared instrumentation must use the Event API in order to record OpenTelemetry events. To create an OpenTelemetry API that explicitly avoids dependency conflicts, only to have all of that hard work undone by requiring shared instrumentation to depend upon a third party logger, is a non-starter for OpenTelemetry. It violates a core tenet of OpenTelemetry API design and cannot be allowed.

OpenTelemetry Event SDK

Because OpenTelemetry defines events as logs with rigorous semantic conventions, there is no separate SDK for events. The SDK creates a log object from the Event API, and passes that object into the top of the log pipeline. All events contain an event.name attribute, so it is easy to write log processors that only process events.
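
To illustrate the last point, here is a minimal sketch of a processor that only acts on events, under the assumption that a log record is a simple object with an `attributes` mapping. The `LogRecord` dataclass, `EventOnlyProcessor`, and `on_emit` are hypothetical names chosen for this example, not SDK types.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class LogRecord:
    """Hypothetical, simplified log record (not the SDK type)."""
    body: Any = None
    attributes: Dict[str, Any] = field(default_factory=dict)


def is_event(record: LogRecord) -> bool:
    # An event is a log record that carries an event.name attribute.
    return "event.name" in record.attributes


class EventOnlyProcessor:
    """Passes events to a handler and ignores every other log record."""

    def __init__(self, handle_event: Callable[[LogRecord], None]) -> None:
        self._handle_event = handle_event

    def on_emit(self, record: LogRecord) -> None:
        if is_event(record):
            self._handle_event(record)
```

A processor like this could sit anywhere in the log pipeline, since events and ordinary log records travel through it together.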

Okay, that's it. Those are the definitions and motivations that we have come to. Please let me know what you think in the comments.

@tedsuo tedsuo added the spec:logs Related to the specification/logs directory label May 10, 2024
@trask
Member

trask commented May 10, 2024

+💯 I like this a lot!

@jack-berg
Member

This is similar to what I was trying to get at with #3254. ➕

Related food for thought: if shared instrumentation must use the event API, then we must provide good tooling for getting those events into the third party logger used by the app, while avoiding circular loops. The log SDK excels at sending data to network locations via OTLP. We don't want to take on the burden of recreating the rich config options of existing log frameworks, but we need to find a way to make it so you can see your log records recorded via the OpenTelemetry Event API in stdout / console. OTLP can't be the only way to access them.

@tedsuo
Contributor Author

tedsuo commented May 11, 2024

@jack-berg that makes sense, and I think the solution is to have the constructor for the EventProvider implementation (which is the logging SDK in our case) take an optional LogEmitter. If that is provided, events are sent to that LogEmitter instead of the SDK's logger.Emit function. Users can then implement a LogEmitter interface that sends the data to their logger of choice, which will cause events to funnel into the top of the user's entire logging pipeline without any issues with circular loops.
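
A rough sketch of that idea, assuming a `LogEmitter` is any object with an `emit(record)` method. The class names, the `emit_event` method, and the dict-shaped record are all hypothetical; nothing here is taken from an existing SDK.

```python
from typing import Any, Dict, List, Optional, Protocol


class LogEmitter(Protocol):
    """Hypothetical interface: anything that can accept a log record."""

    def emit(self, record: Dict[str, Any]) -> None: ...


class ListEmitter:
    """Test double that collects records; stands in for a real destination."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []

    def emit(self, record: Dict[str, Any]) -> None:
        self.records.append(record)


class EventProvider:
    """Sends events to the user's emitter if given, else to the SDK's."""

    def __init__(
        self, sdk_emitter: LogEmitter, log_emitter: Optional[LogEmitter] = None
    ) -> None:
        # Exactly one destination is chosen up front, so an event can never
        # travel through both pipelines and loop back.
        self._target = log_emitter if log_emitter is not None else sdk_emitter

    def emit_event(
        self,
        name: str,
        body: Any = None,
        attributes: Optional[Dict[str, Any]] = None,
    ) -> None:
        record = {
            "attributes": {**(attributes or {}), "event.name": name},
            "body": body,
        }
        self._target.emit(record)
```

A user-supplied emitter would wrap their logger of choice, so events enter the top of their existing logging pipeline.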

There's probably some nonsense around keeping the original EventProvider's attributes on the log without them getting overwritten later by the LoggingProvider that the user's logger is sending the data to, but other than that I don't really see any complications.

@alexvanboxel

alexvanboxel commented May 11, 2024

I like the definitions; as a consumer, it's almost what I need. But I have some questions when reading the statement (I haven't kept up-to-date with the track that the Events have gone through).

In the Events section, it's unclear to me whether the semantic conventions are focused on log attributes, the body, or both. Reading on to the API section, it almost feels like the Events are focused on the body, as I have yet to see any existing framework that is able to create the attributes.

In the SDK section it's unclear to me if bridging functionality needs to provide a flow back to the logging framework (maybe with loss of fidelity, or clear mapping rules from attributes to log-line). (edit: this was the same comment as @jack-berg mentioned)

@tedsuo
Contributor Author

tedsuo commented May 12, 2024

@alexvanboxel semantic conventions for Events cover both the attributes and the body. So far, I do not believe that we have seen an example of an event that needed additional attributes, but we've mostly been focusing on events generated by web browsers. That said, there's no reason why events could not have attributes.

@tedsuo
Contributor Author

tedsuo commented May 12, 2024

To clarify, it's possible for an event to only have attributes, and not have a body. But that would likely be an event defined by OpenTelemetry, not an event created by another system such as a browser.

@alexvanboxel

I'm currently thinking about 2 use cases (each with a mix of body and attributes):

A company internal one

Several years ago, we defined a Collibra Telemetry Log Event with mixed results. This event is based entirely on the definition of a JSON body that also includes an event_name field. If we detect this event format in our OTLP logs stream, we reroute it to analytical backends where analysts can analyze the data. The goal was to simplify correlation between different events by using common fields in the JSON. I call it a mixed success because while it's been widely adopted within the company (due to being rerouted to the analytics backend), the more it's used, the more challenging it becomes to accurately correlate events.

So, I've been waiting for Events to work with version v2 of our definition. The idea is to focus primarily on governance around the attributes used across teams. Examples include tenant.id, workflow.id, job.id, session.id, and user.id. Additionally, we want to provide teams with more freedom in defining the event body content. While some attributes like session.id might already be included in semantic conventions, others might be specific to our company's needs.

We want v2 to be used in the backend and the frontend (browser). So this is an example of an event with a body and attributes, but defined outside standard OpenTelemetry semantic conventions.

Proxy and Access logging

I've joined the security semantic conventions SIG to draft an idea for common Proxy and Access logging. I'm looking to base this on the Events API. I already see a mix of attributes (using the already defined semantic conventions for networking, HTTP, users, and sessions) and body. I think it will be impractical to have everything defined in an attribute; anything too domain-specific would go in the body.

@alexvanboxel

So to clarify: if events can combine both, and semconv around the events can describe a combination of body and attributes, I think that's good. That isn't clear to me from the definition above, but maybe that's fine if it can be clarified somewhere else in the spec.

@tedsuo
Contributor Author

tedsuo commented May 14, 2024

@alexvanboxel for clarity: attributes in OpenTelemetry are defined as simple data structures, and are intended to be used as indexes or dimensions. The body field is intended for storing complex data structures. When defining an event, both the attributes and the body need to be consistent. Does that help?
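
As a small illustration of that split, here is a sketch of an event whose attributes are simple values and whose body holds the nested data, plus a check that flags complex values placed in attributes. The event name `workflow.finished`, the body layout, and the helper names are made up for this example; treating "simple" as a primitive or a flat list of primitives is an assumption of the sketch.

```python
from typing import Any, Dict, List

SIMPLE_TYPES = (str, bool, int, float)


def is_simple(value: Any) -> bool:
    """True for a primitive or a flat list/tuple of primitives."""
    if isinstance(value, SIMPLE_TYPES):
        return True
    if isinstance(value, (list, tuple)):
        return all(isinstance(item, SIMPLE_TYPES) for item in value)
    return False


def check_event(event: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the shape looks fine."""
    attributes = event.get("attributes", {})
    problems = []
    if "event.name" not in attributes:
        problems.append("missing event.name")
    for key, value in attributes.items():
        if not is_simple(value):
            problems.append(f"attribute {key!r} is complex; consider the body")
    return problems


# Hypothetical event: attributes act as dimensions, the body carries structure.
EXAMPLE_EVENT = {
    "attributes": {
        "event.name": "workflow.finished",
        "tenant.id": "t-17",
        "session.id": "abc123",
    },
    "body": {
        "steps": [
            {"name": "extract", "ok": True},
            {"name": "load", "ok": False},
        ]
    },
}
```

The body is deliberately left unchecked here, since its schema would come from the event's semantic convention.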

@alexvanboxel

@alexvanboxel for clarity: attributes in OpenTelemetry are defined as simple data structures, and are intended to be used as indexes or dimensions. The body field is intended for storing complex data structures. When defining an event, both the attributes and the body need to be consistent. Does that help?

Yes, it does. Do you already have an idea of how you will describe the body, or is it too early to answer that?

@austinlparker austinlparker added the triage:accepted:ready-with-sponsor Ready to be implemented and has a specification sponsor assigned label May 14, 2024
tedsuo added a commit to tedsuo/opentelemetry-specification that referenced this issue May 16, 2024
Refine use cases based issue open-telemetry#4045
@tedsuo
Contributor Author

tedsuo commented May 16, 2024

@alexvanboxel do you mean how to describe the body as a semantic convention? Yes, we are working on it. You can see an example here: https://github.com/open-telemetry/semantic-conventions/blob/main/model/logs/mobile-events.yaml

@tedsuo
Contributor Author

tedsuo commented May 16, 2024

Based on this discussion, I've updated #3969 to add logging from shared libraries as a reason to use the Event API.

tedsuo added a commit to tedsuo/opentelemetry-specification that referenced this issue May 17, 2024
Refine use cases based issue open-telemetry#4045
@kelko

kelko commented May 25, 2024

If I may extend this question about the purpose of "Event" (I can create a separate ticket if preferred):

These events as described here ("more structured version of logs"), how do they relate to events inside of trace spans?

When I first started to read about OTel I did not know this definition of events and thought a possible way to make use of the distinction between events of trace spans and logs is:

  • logs are every important "state" of global interest, that Ops or Devs want to analyse and filter. E.g. "database not available", "License expired", "Whoops, this code should never have been reached"
  • span events are important situations happening during execution that are relevant to understanding the flow of that execution, e.g. API returned 401, user locked themselves out, ...

And I liked that idea. With trace-related things as events at the span, stored e.g. in Jaeger, the logs would only have the most important things I need to see at a glance, stored e.g. in Loki.

A first test with OTel in .NET seemed to confirm this, as events are added to System.Diagnostics.Activity rather than created individually, and are stored with it, as far as I could tell. Additionally, Microsoft.Extensions.Logging.ILogger is for logging, which I could direct to a different target.

Coming back to the discussion here about Events and Logs:
Are the Events in context of trace spans the same as the Events in context of "more structured logging"?
If so: is that desirable? I liked the ability to divide & conquer.

carlosalberto pushed a commit that referenced this issue Jun 3, 2024
Resolves #3254 and #4045 

The PR adds clarity to the features provided by Events, as well as
guidance on when it is appropriate to use the Event API.

## Changes

* Reorganized the Data Model section for clarity
* Include MUST for the requirement that Event `Body` and `Attributes`
fields conform to a schema.
* Include use cases for when Events are appropriate.
* Include a warning that advanced logging features are not currently
accessible when using Events.
@tedsuo
Contributor Author

tedsuo commented Jun 18, 2024

Hello @kelko! To explain, Events are a replacement for SpanEvents.

In OpenTelemetry, all LogRecords include a SpanContext if there is an active trace when they are recorded. It is possible to include these logs in either a tracing backend or a logging backend; in both cases, using the SpanID and TraceID to index them is helpful. But in general, the industry is moving away from traces, logs, and metrics being stored in separate backends, as this is a very limiting approach. OpenTelemetry data is a graph which contains all of these signals together, so that a single backend can process all of this data together.

But, within the logging signal, it is still possible to separate Events from regular LogRecords, because Events have a specific format. You can add a processor which turns any LogRecord containing event.name into a SpanEvent, and you will get the behavior you described in your comment.
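
A minimal sketch of such a processor, assuming simplified in-memory `LogRecord` and `Span` types and a lookup table of active spans. All names here are hypothetical, and since a span event has no direct counterpart for the log body, this sketch simply drops it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class LogRecord:
    """Hypothetical, simplified log record (not the SDK type)."""
    attributes: Dict[str, Any]
    body: Any = None
    span_id: Optional[str] = None


@dataclass
class Span:
    """Hypothetical, simplified span that collects span events."""
    span_id: str
    events: List[Dict[str, Any]] = field(default_factory=list)


def route(
    record: LogRecord,
    spans_by_id: Dict[str, Span],
    plain_logs: List[LogRecord],
) -> None:
    """Attach events to their span; keep everything else as a regular log."""
    name = record.attributes.get("event.name")
    span = spans_by_id.get(record.span_id) if record.span_id else None
    if name is not None and span is not None:
        extra = {k: v for k, v in record.attributes.items() if k != "event.name"}
        # Assumption of this sketch: the body is not carried over.
        span.events.append({"name": name, "attributes": extra})
    else:
        plain_logs.append(record)
```

Records without `event.name`, or without a known span, stay in the regular log stream, which gives the separation between span-level events and logs described earlier in the thread.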

@kelko

kelko commented Jun 18, 2024

@tedsuo Thanks for the clarification 🙏

@cijothomas
Member

Because OpenTelemetry defines events as logs with rigorous semantic conventions, there is no separate SDK for events.

Is https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/event-sdk.md#events-sdk intended to be removed, based on the above comment?

@tedsuo
Contributor Author

tedsuo commented Jul 8, 2024

@cijothomas no, it is not. To clarify, what I meant was that Events are LogRecords, and are expected to be fed into the Log SDK. As it says in the spec:

The EventLoggerProvider MUST be implemented as a proxy to an instance of LoggerProvider.

If there is some need for Event-specific pre-processing, that can be shimmed in at this layer. Right now, there is nothing defined that would need this. Events have their own Provider in order to create space for this eventuality, and to allow alternate implementations the freedom to do whatever they would like to do with the Event API.
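
The proxy relationship can be sketched as follows. These classes are simplified stand-ins written for this example; the method names `get_logger`, `get_event_logger`, and `emit` are assumptions of the sketch rather than a copy of any SDK's signatures.

```python
from typing import Any, Dict, List, Optional


class Logger:
    """Hypothetical logger that records what it is asked to emit."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: List[Dict[str, Any]] = []

    def emit(self, record: Dict[str, Any]) -> None:
        self.records.append(record)


class LoggerProvider:
    """Hypothetical provider that hands out one Logger per name."""

    def __init__(self) -> None:
        self._loggers: Dict[str, Logger] = {}

    def get_logger(self, name: str) -> Logger:
        if name not in self._loggers:
            self._loggers[name] = Logger(name)
        return self._loggers[name]


class EventLogger:
    """Turns an event into a log record and hands it to a Logger."""

    def __init__(self, logger: Logger) -> None:
        self._logger = logger

    def emit(
        self,
        event_name: str,
        body: Any = None,
        attributes: Optional[Dict[str, Any]] = None,
    ) -> None:
        # Any Event-specific pre-processing could be shimmed in here.
        merged = {**(attributes or {}), "event.name": event_name}
        self._logger.emit({"attributes": merged, "body": body})


class EventLoggerProvider:
    """Proxy: owns no pipeline of its own, only delegates."""

    def __init__(self, logger_provider: LoggerProvider) -> None:
        self._logger_provider = logger_provider

    def get_event_logger(self, name: str) -> EventLogger:
        return EventLogger(self._logger_provider.get_logger(name))
```

Because the event provider holds nothing but a reference to the logger provider, every event ends up in the same pipeline as ordinary log records.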

Given that the PR is now accepted, and no more changes are being proposed, I'm satisfied that this discussion has done its job and I am closing the issue. Thanks y'all!

@tedsuo tedsuo closed this as completed Jul 8, 2024
carlosalberto pushed a commit to carlosalberto/opentelemetry-specification that referenced this issue Oct 31, 2024