Proposal: OpenTelemetry Logs, Events and Domains #2897
Comments
@tedsuo @jack-berg @scheler @zenmoto @alanwest @MSNev @Aneurysm9 @djaglowski please take a look. |
cc @open-telemetry/specs-logs-approvers |
The span solution is poor for discoverability. The system processing the data and / or the user querying the data needs to be aware of the conventions ahead of time. They need to be aware that all HTTP spans require the presence of I assert that for events, the class or type of the event is the core characteristic of the signal. (In contrast, I'd argue that for spans, identifying the class is important, but the core characteristic is likely the hierarchical arrangement.) You want to be able to unambiguously filter for all events of a particular class or type. You want to be able to discover the complete set of distinct classes or types of events that have occurred. This is not possible with the duck typing approach taken with spans. Without a data model level definition of what an event is, you can't have an event API. And without an event API, we have to tell users to use existing log frameworks to create event-like things in OpenTelemetry, since we've been adamant (correctly so) that we should not create yet another logging API (currently the log API is only meant for the log appender use case). Telling users to use existing log frameworks to create event-like things is a bad UX, since it requires that users inspect the source code to understand how the log framework data models translate to the OpenTelemetry data model. My opinion is that we need an event API. We need an unambiguous way of identifying OpenTelemetry events in the data model. We need the identification of events in the data model to have very clear semantics to avoid incorrect mapping from existing systems whose events are semantically different from OpenTelemetry events. My suggestion for changing the event semantic conventions to be less ambiguous is here. |
Yes. I support. I think this does a few things:
As I have not been involved, I can't speak for the client-side group, but trying to put an end user hat on I've always suspected that a browser/mobile specific API would provide better ergonomics. I think it would ultimately be confusing to end users to use a "logger" to record client side interactions.
You had me convinced awhile back that this approach warrants a stronger effort before jumping to a generic solution like Using your example of That said, I'd prefer we push on these limitations. |
This is true. However, why is it acceptable for spans and not for logs/events?
What is the basis for this assertion? I don't understand why this is so.
The way I suggested is equally unambiguous.
Yes. And my suggestion is that we don't need an event API.
This is the part that I mentioned is TBD. Maybe we shouldn't be so adamant about it.
So it sounds like, we need an API, but because we are adamant that it can't be a Logging API let's call it Event API and we are off the hook? :-) |
@alanwest What prevents anything from having an attribute "event.type=browser" even if it was not in fact a browser event? |
Let's say I'm authoring some HTML5 map widget. When users click on my map widget I want to emit a click event with details about the lat/long for the click. What would I do in this case? Have my own custom event or use the standard one with some extra attributes? Adding extra attributes to a standard event might surprise/break backends? Creating a custom event might then ghost these from "browser" or "click" queries completely? I haven't been involved in the client discussions either, but it is unclear to me that we have even solved the original problem 😄 |
This takes us back to the discussion on "combining" them in the first place.. I still propose that they should
IMHO, we need some way to uniquely identify what the log is representing, whether that be a generic log, Kubernetes log or an event (or some other possible structured record). For events, my original preference for identifying an event was a single And unlike Spans where consumers (backends / UX / etc) "infer" what is being reported by the presence of specific attributes which includes the concept of composition (including anything and everything), at least for events this is explicitly NOT a requirement and should be avoided. eg. what do you do with a log event containing Whether this is a From the discussions yesterday, my preference would be to define that log records which contain a "schema" definition and an index ( Failing that a single presence of a Summary (TL;DR)
As long as it's replaced with a
As long as it's split out into its own definition, it would also be "nice" to have a single simple unified Events Api to support the creation of events for a given defined domain. Even if this API is just defining the convenience method(s) for creating and publishing events |
Please, I implore you and the log sig: think about the end user when making this sort of proposal. If I, as an end user, want to create an OpenTelemetry event, what would you be asking me to have to do and know in order to do it, with this proposal? I would need to a) know that I have to use this weird other logging API that isn't my normal logging API, and b) know how to precisely construct a log message so that it would be interpreted as an event by my backend of choice. This is not a user-friendly or ergonomic choice. The end user should not have to understand the logs data model in order to create an OpenTelemetry event. This should be something that is done easily and simply using an Event API. Vendors and creators of backends that support OpenTelemetry events will need to understand these details, of course, but we definitely should not push the requirement of this knowledge onto the end user. Unless, of course, you don't want to have events in OpenTelemetry at all, which is what this proposal seems to be suggesting. |
I agree, but think even this doesn't go far enough. If the Event API is a wrapper around the LogRecord creation API and ensures that the identifying attributes are present, doesn't this just push the issue up one layer of abstraction?
lr := logrecord.New([]logrecord.Attribute{
    {key: "event.domain", value: "browser"},
    {key: "event.name", value: "click"},
    {key: "browser.click.target", value: "#my-cool-widget"},
    ...,
})
How is that any different from:
ev := event.New("browser", "click", []logrecord.Attribute{
    {key: "browser.click.target", value: "#my-cool-widget"},
    ...,
})
If I have domain-specific attributes I still need to know that I need to use this weird other event API and provide it attributes that fit my schema, without the API itself having any knowledge of that schema or ability to help me avoid shooting myself in the foot. Instead, I'd propose that we not have any generic Event API and instead let groups that want to develop specific event creation APIs do so. That way, a browser click event could be something more like this:
ev := browser.Click(element.ID) |
@CodeBlanch you provide the following attributes:
{
"browser.event.type": "click",
"browser.event.lat": 123,
"browser.event.lon": 234,
... whatever other attributes you need to record for the click
}
Why would it break backends? Why is there an expectation that there can't be extra attributes? In fact we have exactly the opposite in our telemetry stability requirements and allow adding log attributes.
Only if they don't follow Otel recommendations on what stability to expect from telemetry. |
@MSNev Yes, we need it. I explained how to do that identification. We can check that
Presence of an attribute seems to be as reliable an indicator as the value of an attribute.
Do you mean containing at the same time? Be restrictive and deal with it like you deal with any other malformed data or be permissive and assume one of the attributes matters and ignore the rest. This is no different than receiving an impossible combination of
Are you arguing that checking for presence is not simple, not unambiguous or is not performant? What exactly is the argument?
Please elaborate. I am not sure I understand what the issue is.
Why can't you define a simple API helper in your domain to create events of your domain's type? For example:
function createEvent(logger, type, attrs) {
  // Copy the caller's attributes, then set the domain's type attribute.
  logger.createLogRecord({ ...attrs, "browser.event.type": type });
} |
@jkwatson good to see you back at Otel! Looks like you are interested in this topic. It would be great if you could help! Join the Log SIG calls if you can.
I suggest that there is no such thing as an OpenTelemetry event. There are, for example, browser events, for which the browser instrumentation library will define a nice API to call.
No, you don't need to do either of those. Call a purpose-built events API exposed by the library you use.
I agree. I think purpose-built APIs give you exactly this. You can have an API that is shaped to match the problem domain (e.g. browser events or mobile events or whatever else it is). |
I think it would be good to list the use cases for Events. RUM is one (both browser and mobile). The other requirement I have seen so far is capturing Custom Events, for which different vendors provide an API to their users. That custom events API spans all products, including RUM, APM and Infra. There is also a third category: receivers of events from other sources, such as Kubernetes Events. Given these multiple use cases, I thought it would be good to have a common representation of Events and an API. I don't know if it helps backends when different groups model Events differently and there is no central definition of Events in the specification. |
@scheler can you please clarify what this is?
These are in the Collector. They don't need an API. |
The backends do not know the source. When they are looking for Events, they are looking for messages with specific common characteristics. Events don't need to be created only through an API. Events must have a data model that backends can rely on. Of course, Kubernetes Events or other events received by the Collector need not be Events. However, this fact must be published so the backends know what they are receiving. For the purpose of this discussion, we can drop this category, as I don't know if anyone really needs them as Events. I don't know if @dmitryax needs K8s events as OpenTelemetry Events or just wanted to align them because they are named events. |
Because it may contain
As called out, if we define the semantics of what is an event vs what is a general log then you also avoid the situation of (for example) someone wanting to send an event (like perhaps an exception) which happens to contain other attributes that would normally be an event. eg. An exception occurred while processing event x -- if both are present what is this...
It's not unambiguous or performant as now the back end needs to know what "event" it might want to route (in order of preference if multiple names are present). As opposed to looking for a single "this is an event" and then routing/validating as such.
If there is a schema present then the backend (may) perform additional validation of the content of the fields and cause the event to be dropped (and not sent for storage / indexing) because the received "event" is not deemed to be valid, examples of possible simple validation
This also helps with simple transformations where a value is passed as a string, but want to be stored as a numeric, or enum value validation. A more extreme example would be if a value is passed as a simple JSON encoded blob, this "could" provide direction on how fields are converted without it needing to be a full OTLP attribute definition Where the schema would define that "key1" is a string value and "key2" is an integer.
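As a minimal sketch of the kind of schema-driven validation and conversion described here: the schema format (a plain map from field name to "string" or "integer") and the `applySchema` helper are assumptions made for illustration, not anything defined by OpenTelemetry. A record that fails validation is rejected by returning `null`, and a numeric value passed as a string is converted to a number.

```javascript
// Hypothetical schema format: field name -> expected type.
const exampleSchema = { key1: "string", key2: "integer" };

// Validates `data` against `schema` and converts string-encoded integers.
// Returns the converted object, or null if the data does not match.
function applySchema(schema, data) {
  const out = {};
  for (const [field, type] of Object.entries(schema)) {
    const raw = data[field];
    if (raw === undefined) return null; // required field is missing
    if (type === "string") {
      if (typeof raw !== "string") return null;
      out[field] = raw;
    } else if (type === "integer") {
      // Accept a number, or a non-empty string that parses to an integer.
      const n = typeof raw === "string" && raw.trim() !== "" ? Number(raw) : raw;
      if (!Number.isInteger(n)) return null;
      out[field] = n;
    } else {
      return null; // unknown type in the schema
    }
  }
  return out;
}

// The event payload arrives as a JSON-encoded blob.
const converted = applySchema(exampleSchema, JSON.parse('{"key1":"abc","key2":"42"}'));
console.log(converted);
```

A backend could apply such a check before storage or indexing and drop records for which the result is `null`.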
I thought that is what I said :-) Create domain-specific API helpers, which could hide the fact that they use the logger and that they add additional (fixed) properties to identify the event type / schema definition, etc.:
function createBrowserEvent(type, attrs) {
  // Copy the caller's attributes, then set the fixed event type.
  _logger.createLogRecord({ ...attrs, "event.type": "browser." + type });
}
function createBrowserPageViewEvent(attrs) {
  _logger.createLogRecord({ ...attrs, "event.type": "browser.pageview" });
} |
If there is to be no event API, then I propose we get rid of an otel logging API for Java altogether. Java does not need another logging API. Period. Full stop. |
I don't know why it's acceptable for spans. I think attribute presence for classification based on prior knowledge is cumbersome and may be fragile in some instances. For example, you can't actually use the presence of
My chain of reasoning is: 1. It's valuable to have an API for emitting events. 2. You can't define an event API without defining what an event is. 3. The presence of an event class or type is a common thread in many existing definitions of events.
Those are unambiguous yes. But the total set of events received is not discoverable. Can only discover the set of browser events.
Consider the following options:
Those clearly aren't the only options, but for me option 1 is simpler to explain to users and simpler for backends to consume. It's not clear to me what we stand to gain by taking away an event API, and / or taking away a clear way to identify the class of the events. Put another way, even if we can move forward without an event API, and without a clear way to identify the class of the events, why impose a constraint which makes things harder to reason about? |
There would still need to be some generic "event.name" and maybe "event.domain" attribute in the data model so that generic events can be supported by all backends simply by following the spec and not require knowing about every type of event -- since not every type of event will be supported by every backend if that direction were taken. A backend could require the user input the custom attribute to look for to create an "event" but that seems like unnecessary work for both the backend UI and the user when the log data model could support this. An event API that acts on top of the logs SDK would be preferable in my opinion for users -- along with continuing to not require a logs API since there are languages already with capable structured log apis. |
I should also add here that one of the "definitions" of an event being considered for the RUM sig is that an "event" is defined by
The rationale for having all of the defined "event" fields in This is also where the "schema" definition would be extremely useful to allow defining the validation on this embedded So taking this further this could also be extended to "move" the name into the example using the browser domain version 1 {
event.schema: "otel://browser/1",
event.data: {
name: "pageview",
....
}
} |
Spans already have a structure; compared to them, logs are going to be super noisy, unstructured, and possibly orders of magnitude more in volume. Always querying for a known set of fields domain & name like so |
OK. To summarize the comments: there is not much love for the presence-based domain indication and there are known downsides. I am going to close this proposal as rejected. I am at KubeCon, so I won't be able to work on it until next week. I still think the current spec is not the best we can do. |
We have had a lot of discussions around events and logs and can't come to an agreement on what exactly to do.
I think we need to step back and understand what problem we are solving.
The key moment for me was when @tedsuo mentioned that the desire for "events" originated from the need to have a "primary index", which is essentially a piece of data on which to base "groupby" or "filter" functions on the backend. For example, perhaps you want to see only Kubernetes events, so you want to be able to filter by this criteria. Or perhaps you want to see all browser events grouped by their type (e.g. "click" events grouped separately from "scroll" events).
This need drove the proposal to have the "event.domain" and "event.name". The tuple ("event.domain", "event.name") becomes that "primary index" that can be used for filtering, grouping and any other similar querying.
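To make the "primary index" idea concrete, here is a minimal sketch that groups records by the ("event.domain", "event.name") tuple. The record shape (`{ attributes: {...} }`), the `countByEventKey` helper and the sample values are assumptions for illustration only.

```javascript
// Hypothetical records; only the two attribute names come from the proposal under discussion.
const sampleRecords = [
  { attributes: { "event.domain": "browser", "event.name": "click" } },
  { attributes: { "event.domain": "browser", "event.name": "scroll" } },
  { attributes: { "event.domain": "browser", "event.name": "click" } },
  { attributes: { "log.level": "info" } }, // a plain log record, not an event
];

// Counts records per (event.domain, event.name) pair; records lacking either attribute are skipped.
function countByEventKey(records) {
  const counts = new Map();
  for (const { attributes } of records) {
    const domain = attributes["event.domain"];
    const name = attributes["event.name"];
    if (domain === undefined || name === undefined) continue;
    const key = domain + "/" + name;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

console.log(countByEventKey(sampleRecords));
```

The keys of the resulting map are also the complete set of event classes seen, without the consumer knowing any domain in advance.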
Now here is the thing: I don't think this is necessary. We have this exact problem in spans and we solve it differently.
For example, you may have "http" spans or you may have "database" spans. How are these differentiated? We don't have an attribute with a value that tells you which domain the span belongs to. Instead we use the presence of a particular attribute to know which domain the span belongs to. If the span has the "http.method" attribute (or any of the required http attributes) then it is an "http" span.
So, if you want to filter all "http" spans, you choose the ones which have the "http.method" attribute present. If you want to see all "http" spans grouped by their method you use the value of "http.method" for grouping.
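A minimal sketch of that presence-based approach, assuming spans are plain objects with a `name` and an `attributes` map. The helper names `filterByPresence` and `groupByValue` and the sample spans are hypothetical.

```javascript
// Hypothetical spans for illustration.
const sampleSpans = [
  { name: "GET /users", attributes: { "http.method": "GET" } },
  { name: "POST /users", attributes: { "http.method": "POST" } },
  { name: "SELECT users", attributes: { "db.system": "postgresql" } },
  { name: "GET /orders", attributes: { "http.method": "GET" } },
];

// Keeps only the items that carry the given attribute, whatever its value.
function filterByPresence(items, key) {
  return items.filter((item) =>
    Object.prototype.hasOwnProperty.call(item.attributes, key));
}

// Groups item names by the value of the given attribute.
function groupByValue(items, key) {
  const groups = {};
  for (const item of items) {
    const value = item.attributes[key];
    if (!groups[value]) groups[value] = [];
    groups[value].push(item.name);
  }
  return groups;
}

const httpSpans = filterByPresence(sampleSpans, "http.method");
console.log(groupByValue(httpSpans, "http.method"));
```

The consumer has to know ahead of time that "http.method" is the attribute to test for, which is the prior knowledge the thread debates.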
What prevents us from using the exact same approach for logs and events?
For example, all browser events can have an attribute "browser.event.type". For click events we will have "browser.event.type=click".
Another example is Kubernetes. For Kubernetes events there is no defined concept of "type" at all, but there is a concept of "reason" so you will have "k8s.event.reason" as the attribute, the presence of which indicates it is a Kubernetes event.
Note how the attribute names for different domains are different. This aligns well with how things work in the tracing world, for span attributes.
So, my question is: why do we even need the new concept of "event.domain" and "event.name"?
Proposal
I suggest deleting "event.domain" and "event.name" semantic conventions.
Experts of a particular area need to come up with their own set of semantic conventions in their own namespace. For example, browser events will be in the "browser.*" namespace. This is exactly what we do for spans and metrics.
I suggest deleting the "Events API". Also delete the event_domain parameter from "Get Logger" API. These are unnecessary.
Otel libraries that help instrumenting the browser can provide helper APIs for creating browser-specific events (DOM events). That specific API can require the "eventType" as parameter and record its value as "browser.event.type" attribute.
FAQ
Does this Solve the Original Problem?
Yes, it does. The "primary indexing" will be based on the presence of the attribute instead of the attribute value. This aligns well with what we do in the tracing world.
Are Backends Capable of Presence Check?
I believe at least some are. See, for example, the Elasticsearch exists query or Splunk's if(isnull()) query.
Backends that aren't (are there any?) will need to implement it if they want to have the associated capabilities.
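For instance, a presence check in Elasticsearch can be written with its `exists` query. The sketch below only builds the request body. The `existsQueryBody` helper is hypothetical, and it assumes the attribute is stored under a field named exactly "browser.event.type", which depends on how records are mapped into the index.

```javascript
// Builds the body of an Elasticsearch search request that matches
// documents in which `field` has an indexed value.
function existsQueryBody(field) {
  return { query: { exists: { field: field } } };
}

// Assumption: the attribute is indexed under this exact field name.
const requestBody = JSON.stringify(existsQueryBody("browser.event.type"));
console.log(requestBody);
```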
What Else Does This Help With?
The entire debate of what is an event and how we define it becomes almost unnecessary. If in a particular domain there is a concept of event then reflect that in the naming of the attributes (e.g. "browser.event.type"). That's all that is needed.
All we will assert is this: "Logs and events are represented as Otel's LogRecords concept".
What About Schema URL?
We discussed that perhaps Schema URL can be the indicator of the domain instead of an attribute value. This is no longer necessary since the domain is indicated by a presence of an attribute on the LogRecord.
What Do We Record on the Scope?
Nothing. No Scope attributes are necessary anymore.
How Do We Record Kubernetes Events?
Exactly as we do today: just use a set of "k8s.event.*" attributes.
How Do We Record Windows Events?
Use "windows.event.*" namespace and "windows.event.category" attribute for the presence check.
How Do I Build a UI That Can Groupby and Filter for Domains?
The UI needs to be aware of what domains it supports and what is the presence attribute for the domain. This is exactly the same situation as with spans.
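A minimal sketch of what such domain awareness could look like: a registry that maps each supported domain to its presence attribute. The registry shape and the `domainsOf` helper are assumptions. The attribute names are the ones used as examples in this proposal.

```javascript
// Hypothetical registry: domain name -> attribute whose presence marks the domain.
const PRESENCE_ATTRIBUTES = {
  browser: "browser.event.type",
  k8s: "k8s.event.reason",
  windows: "windows.event.category",
};

// Returns every known domain whose presence attribute is set on the record.
// An empty result means a plain log record. More than one match is ambiguous
// and is left for the caller to resolve.
function domainsOf(record) {
  return Object.keys(PRESENCE_ATTRIBUTES).filter((domain) =>
    Object.prototype.hasOwnProperty.call(record.attributes, PRESENCE_ATTRIBUTES[domain]));
}

console.log(domainsOf({ attributes: { "browser.event.type": "click" } }));
console.log(domainsOf({ attributes: { "k8s.event.reason": "Started", "browser.event.type": "click" } }));
```

Returning a list rather than a single domain makes the multiple-match case explicit, so the caller can choose between rejecting such records and picking one domain.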
Of course a generic UI that can groupby and filter by an arbitrary attribute will work just fine too.
What does Logs API look like?
TBD, if we agree on this proposal conceptually.