[logs] Add semantic conventions for durable identifiers #372

Closed
CodeBlanch opened this issue Oct 6, 2023 · 26 comments


@CodeBlanch
Member

CodeBlanch commented Oct 6, 2023

We have a need across Microsoft to support durable identifiers in logging. A durable identifier could be just an "id" or it could be an "id" + "name". This should be uniform across languages (C++, .NET, Rust, etc.) so that backends can be coded with rules to act on specific log messages.

Definitions

  • id:
    A durable numerical identifier that can be used to differentiate logs from each other. It is not guaranteed to be unique in any way. It can be used for scenarios such as filtering by identifier or triggering actions (for example, taking a memory dump when a log with a given eventId is fired).

  • name:
    Short event identifier that does not contain varying parts. Name describes what happened (e.g. "ProcessStarted"). Recommended to be no longer than 50 characters. Not guaranteed to be unique in any way. Typically used for filtering and grouping purposes in backends.
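
To make the "filtering and triggering actions" use case concrete, here is a minimal Python sketch of a backend-side rule table keyed on a durable id. The attribute key `log.id`, the record shape (a plain dict with an `attributes` mapping), and the names `RULES`, `apply_rules`, and `take_memory_dump` are assumptions for illustration only; nothing here is an agreed convention or a real backend API.

```python
from typing import Callable, Dict, Optional

# Hypothetical rule table: durable id -> action to run when a matching log arrives.
Rule = Callable[[dict], str]


def take_memory_dump(record: dict) -> str:
    # Placeholder action; a real backend would call into its own tooling here.
    return f"dump requested for id {record['attributes']['log.id']}"


RULES: Dict[int, Rule] = {18: take_memory_dump}


def apply_rules(record: dict) -> Optional[str]:
    """Return the action result if the record's durable id has a rule, else None."""
    log_id = record.get("attributes", {}).get("log.id")
    rule = RULES.get(log_id)
    return rule(record) if rule else None


matched = apply_rules({"body": "Processing failed", "attributes": {"log.id": 18}})
unmatched = apply_rules({"body": "No id here", "attributes": {}})
```

The rule lookup touches only one attribute, so it does not depend on the message text or on any other part of the record.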

Proposal

I am going to propose 2 ways to go about this:

Proposal A

Add new fields on General log identification attributes: https://github.com/open-telemetry/semantic-conventions/compare/main...CodeBlanch:logs-id-and-name?expand=1

I think this is a nice fit.

Proposal B

Loosen the events semantic conventions for event.domain & event.name and add event.id: https://github.com/open-telemetry/semantic-conventions/compare/main...CodeBlanch:events-id?expand=1

I don't think this is a particularly nice fit as these are logs and not "events" as the working group originally intended, however it seems some languages have gone down this road:

/cc @alanwest @reyang @trask @utpilla @vishweshbankwar @rajkumar-rangaraj @cijothomas @jack-berg @ThomsonTan @MSNev

@MSNev
Contributor

MSNev commented Oct 6, 2023

Loosen the events semantic conventions for event.domain & event.name and add event.id

While I "like" the idea of an id this is completely different from what the event.domain and event.name are for.

The event.domain and event.name ARE REQUIRED to identify that the contents of the payload conforms to the semantic conventions as defined by the domain / name combination.

And looking at a Log definition isn't there already a log.record.uid which can be used as a durable identifier???

In my opinion, what you are trying to do "should" really be its own event, so something like

event.domain: "DotNet"
event.name: "LogEvent"

with the payload then being

"event.data":
{
    key: "",
    categoryName: "",
    source: "",
    id: "",
    data: "",
    message: ... // etc
} 

@cijothomas
Member

And looking at a Log definition isn't there already a log.record.uid which can be used as a durable identifier???

That does not look durable...

@CodeBlanch
Member Author

log.record.uid seems to be unique to every log written so I don't think it is a good fit for this.

@jack-berg
Member

Is this durable id essentially just a numeric representation of event.domain + event.name? Sort of like a hashcode / checksum applied to the event domain / event name combo, which uniquely identify the category / type of thing happening?

@MSNev
Contributor

MSNev commented Oct 6, 2023

Note: for "events" we do have an open issue on whether to "keep" both event.domain and event.name vs. just having event.name: "Decide if event.domain separately from event.name is necessary" #2994

So it may become something like
event.name: "DotNet.LogEvent"

So any input / preference you might have for identifying the uniqueness of an event would be welcome.

@CodeBlanch
Member Author

CodeBlanch commented Oct 9, 2023

@jack-berg In .NET durable id is traditionally a manually-assigned thing. Eg:

new EventId(id: 18, name: "ProcessingFailure")
new EventId(id: 100)

There is a category name in .NET but it isn't part of the EventId contract. There really isn't any guarantee about the uniqueness, that is up to the app/dev to do correctly.

There is a new thing coming in .NET 8 (Nov 2023) to auto-generate Id as a hash of Name if the value wasn't provided: https://github.com/dotnet/runtime/pull/87892/files#diff-0a0edfbf348cc08e79a01ff6b0f9023793c86880d2cc9cbbb88c98bb7eb14575R236

@MSNev

I like the structure you are proposing for turning these into "events" but that is probably a non-starter, for a couple of reasons:

  • It makes these into .NET-specific things. We need this to be a standard across languages. If we wanted it to be .NET-specific we would just name these attributes something like dotnet.log.event_id + dotnet.log.event_name and call it a day 😄

  • These are logs and people will want to see them in their log pipelines. Adding a couple of attributes/columns should be easy for backends to consume/handle. Creating a new pipeline/signal seems unnecessarily complex and will likely take a long time? There may also be a mismatch as far as volume. Logs will be high volume. Client events will be low volume (assuming?). I don't think backends will want to parse the complex event.data structure to get at the details in the high-volume path given the primary use case is to peek during ingestion. A primary goal is that access should be fast/cheap.

@MSNev
Contributor

MSNev commented Oct 9, 2023

It makes these into .NET-specific things.

I just gave the domain "dotnet" as an example, if this can be rationalized into something more generic then it would work.

Client events will be low volume (assuming?).

No, I would expect the reverse, as every "client" (browser, Android app, iOS app, etc.) could be sending thousands of (defined) events, e.g. some websites track user clicks. And as part of "decoding" these values they have specific UI to handle these events.

I don't think backends will want to parse the complex event.data structure to get at the details in the high-volume path given the primary use case is to peek during ingestion.

But MS does this today. It doesn't necessarily need to happen at the first receiver, and there are cases of some sub-teams performing this at the point of querying the data, but it DOES happen and is required for defining health alerts in relation to "some" events (such as client-side JavaScript errors).

@MSNev
Contributor

MSNev commented Oct 9, 2023

I don't think backends will want to parse the complex event.data structure

This is actually one of the main sub-points for why we are pushing to define the "shape" of an event (domain, name, payload (event.data)): so that vendors / clients / receivers can decide whether they even want to "parse" the payload (based on the "name") to perform any of

  • validation of the inbound event
  • Store in a different table (for faster / better UX)
  • drop / throttle the event (at the server side, because even if every client only sends 1 event you could still be receiving hundreds of millions of those events: every browser, for every page load, etc.)
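
The three options in the list above can be sketched as a lookup on the domain / name key. This is a minimal Python illustration, not a description of any real receiver: the `POLICIES` table, the policy names, and the `page_view` event name are hypothetical, and the record shape (a dict with an `attributes` mapping) is an assumption.

```python
from typing import Dict, Tuple

# Hypothetical per-event policies keyed on (event.domain, event.name).
POLICIES: Dict[Tuple[str, str], str] = {
    ("browser", "exception"): "validate",
    ("browser", "click"): "store_in_events_table",
    ("browser", "page_view"): "throttle",
}


def route(record: dict) -> str:
    """Pick a handling policy from the key alone, without parsing event.data."""
    attrs = record.get("attributes", {})
    key = (attrs.get("event.domain"), attrs.get("event.name"))
    # Records with no recognized key are treated as ordinary logs.
    return POLICIES.get(key, "treat_as_plain_log")


click_policy = route({"attributes": {"event.domain": "browser", "event.name": "click"}})
plain_policy = route({"body": "just a log line", "attributes": {}})
```

The decision is made before the payload is touched, which is the point being argued: the key alone tells a receiver whether the payload is worth parsing at all.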

@CodeBlanch
Member Author

Ya I don't know about the volume bit. I think a human doing things will always be much less than a cloud cranking out logs at scale. But let's put that aside for a second.

Why do we want these logs to be some other signal? No one in the .NET community has asked for events (so far). What they are asking for are these attributes on their logs.

What is the argument against...

General Logs Attributes

Attribute | Type | Description | Examples | Requirement Level
log.id | string | A durable identifier for the Log Record. | 1; 0x100F | Optional
log.name | string | A name for the Log Record. | RequestProcessed; InvalidResponse | Optional
log.record.uid | string | A unique identifier for the Log Record. [1] | 01ARZ3NDEKTSV4RRFFQ69G5FAV | Opt-In

@jack-berg
Member

log.record.uid already exists.

log.name seems conceptually similar to what we're trying to do with event.domain / event.name. That is, some sort of natural key representing the class / type of thing that happened. I.e. make it easy to query for logs WHERE log.name = 'RequestProcessed' and find all the records which have a similar shape. The event.domain field just adds a namespace to avoid collisions of event.name, but I think the idea is otherwise the same.

log.id - I still don't get this. From what I'm reading here, it's sort of like an integer alternative to what you call log.name, but not all records with the same log.name have the same log.id. If this is an accurate description, I'm not sure what purpose this serves and how to generalize it outside of dotnet ILogger.

Why do we want these logs to be some other signal? No one in the .NET community has asked for events (so far). What they are asking for are these attributes on their logs.

Not sure what you mean by this. Currently, OpenTelemetry logs and events share the same data model. As of now, event is defined as:

Events are recorded as LogRecords that are shaped in a special way: Event LogRecords have the attributes event.domain and event.name (and possibly other LogRecord attributes).

Where event.domain / event.name are conceptually similar to what you suggest as log.name. That definition doesn't seem to be at odds with what the .NET community has asked for. (With the exception of the log.id field, which I'm still trying to understand.)

@CodeBlanch
Member Author

@jack-berg

log.id is by far the more important thing. My understanding is that something like Windows has thousands (millions?) of logs defined each with an assigned unique id (keep me honest here @reyang). In .NET if you supply the structure, the id is required and name is optional. The name is really just a convenience thing for UI display.

As of now, event is defined as...

I think this would be a perfectly suitable solution:

Attribute | Type | Description | Examples | Requirement Level
event.domain | string | The domain identifies the business context for the events. [1] | browser | Optional
event.name | string | The name identifies the event. | click; exception | Optional
event.id | string | The durable identifier for the event. | 1; 0x100F | Optional

But it is the "as of now" bit that is the problem for me. If the future goal is...

"event.domain": "DotNet",
"event.name": "LogEvent",
"event.data":
{
    key: "",
    categoryName: "",
    source: "",
    id: "",
    data: "",
    message: ... // etc
} 

That seems like a non-starter. Because...

a) We can't have a .NET-specific solution. It must suit C++, Rust, and others.
b) Usage of EventId structure will be mixed. Most devs in small shops I would wager do not take the time to manage or specify the structure, which is perfectly fine. But the runtime internals will. We don't want to bifurcate the users' log streams and hope their backend will do the right thing, do we?

Let's say the user has a client application which calls out to a cloud service. Client app emits these RUM events and they show up in a nice event UI. The cloud app emits logs. Because of this bifurcation some logs go to a nice log UI and some logs go to the event UI and show mixed in with the client interactions. Good or bad experience?

@jack-berg
Member

Let's ignore the event payload / event.data for a second. I do have opinions on that but think that talking about them just muddies the waters.

Wondering if you can help me understand how I would use event.id outside of a Windows / .NET use case. An event has an event.domain and event.name. Together, these function as a key that allows you to identify the class / type of thing that happened. I can easily filter log records to find the ones with a particular event.domain / event.name.

Now let's consider a future where we also add an event.id, defined as a durable identifier for the event. This appears to be an alternative to the event.domain / event.name key. Let's consider a user calling a future event API where we have an event.id attribute. The event API requires that you specify the event.domain / event.name because those are key parts of the event semantic conventions. Presumably, it would also accept a new event.id parameter.

How do we describe to a user what value should be used for event.id? How does a user ensure that the event.id they select is unique to others recorded by other instrumentation? How do we describe the difference between event.id and the other event.domain / event.name key (they seem to serve the same function)?

Let's say the user has a client application which calls out to a cloud service. Client app emits these RUM events and they show up in a nice event UI. The cloud app emits logs. Because of this bifurcation some logs go to a nice log UI and some logs go to the event UI and show mixed in with the client interactions. Good or bad experience?

IMO, the point of the Event semantic conventions isn't to be prescriptive to how backends should process / ingest these Events. It's to give backends and users alike a very clean way to identify all the times a particular class / type of thing happened. With spans we rely on duck typing: How do you identify an HTTP server span? You find all spans with span.kind = SERVER AND http.request.method != null. That's an ok heuristic, but we can do better. That's the point of event.domain / event.name. The combo gives you a no-nonsense way to select all the times that particular class / type of thing happened. Sure you could probably do pattern matching on the log body, but that is likely to be less performant (e.g. consider a regex match against a body vs. WHERE event.domain = foo AND event.name = bar) and a pretty crappy experience.
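
The contrast between selecting by key and pattern matching on the body can be sketched in a few lines of Python. The sample records, the body wording, and the helper names `by_key` and `by_body_pattern` are made up for illustration; only the `foo` / `bar` values come from the comparison above.

```python
import re

# Hypothetical records; only the attribute keys matter for the comparison.
records = [
    {"body": "request processed in 12 ms", "attributes": {"event.domain": "foo", "event.name": "bar"}},
    {"body": "request processed in 40 ms", "attributes": {"event.domain": "foo", "event.name": "bar"}},
    {"body": "cache miss for key user:7", "attributes": {"event.domain": "foo", "event.name": "baz"}},
]


def by_key(recs, domain, name):
    """Equivalent of WHERE event.domain = domain AND event.name = name."""
    return [
        r for r in recs
        if r["attributes"].get("event.domain") == domain
        and r["attributes"].get("event.name") == name
    ]


def by_body_pattern(recs, pattern):
    """Fallback when no key exists: match the rendered body text."""
    rx = re.compile(pattern)
    return [r for r in recs if rx.search(r["body"])]


keyed = by_key(records, "foo", "bar")
matched = by_body_pattern(records, r"^request processed in \d+ ms$")
```

Both calls select the same two records here, but the regex version only keeps working as long as nobody rewords the message, while the keyed version does not look at the body at all.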

If a backend wants to do something fancy, and store a particular type of event in a special way, that's fine, but it should also be fine for the backend to just treat all the log records the same, whether or not they have event.domain / event.name attached. We should mean it when we use "log" and "event" interchangeably in the log data model.

@CodeBlanch
Member Author

Let's consider a user calling a future event API

This may be the wrong thinking. What I'm pushing here is for the spec to define where appenders/bridge components can drop this information should the originating framework support such a concept.

.NET ILogger usage is like this:

public class WeatherForecastController
{
	private readonly ILogger<WeatherForecastController> _logger;

	public WeatherForecastController(ILogger<WeatherForecastController> logger)
	{
		_logger = logger;
	}

	public IEnumerable<WeatherForecast> Get()
	{
		WeatherForecast[] forecasts = GenerateForecasts();

		_logger.LogInformation(
			eventId: LogEvents.WeatherForecastsGenerated,
			message: "WeatherForecasts generated {count}",
			args: forecasts.Length);

		return forecasts;
	}
}

internal static class LogEvents
{
	public static EventId WeatherForecastsGenerated { get; } = new(id: 4001);
}

Let's look at some different ways we could send this off.

[Pardon the pseudo-JSON here. Trying to sort of show how it would look for OTLP log data model but I didn't try to nail the structure perfectly.]

One thing we are looking at for the categoryName ("WebApplication1.Controllers.WeatherForecastController" in this case) is to send that as instrumentationScope.name. Something like this:

{
   "instrumentationscope": { "name": "WebApplication1.Controllers.WeatherForecastController" },
   "logs": [
      {
         "body": "WeatherForecasts generated {count}",
         "attributes": [
             { "key": "log.id", "value": 4001 },
             { "key": "count", "value": 100 }
         ]
      }
   ]
}

Or using event stuff:

{
   "instrumentationscope": { "name": "OpenTelemetry.Bridge.ILogger" },
   "logs": [
      {
         "body": "WeatherForecasts generated {count}",
         "attributes": [
             { "key": "event.domain", "value": "WebApplication1.Controllers.WeatherForecastController" },
             { "key": "event.id", "value": 4001 },
             { "key": "count", "value": 100 }
         ]
      }
   ]
}

EventSource (C++, .NET) usage is like this:

[EventSource(Name = "MyLibrary")]
internal class LibraryEventSource : EventSource
{
	public static LibraryEventSource Log = new();

	[Event(100_001, Message = "Unexpected request length: {0}", Level = EventLevel.Warning)]
	public void LogUnexpectedRequestLengthWarning(int requestLength)
	{
		WriteEvent(100_001, requestLength);
	}
}

public class SomeClass
{
	public bool ProcessRequest(int length)
	{
		if (length < 0 || length > 1024)
		{
			LibraryEventSource.Log.LogUnexpectedRequestLengthWarning(length);
			return false;
		}

		// Process request

		return true;
	}
}

Here is how that might look using the same styles:

{
   "instrumentationscope": { "name": "MyLibrary" },
   "logs": [
      {
         "body": "Unexpected request length: {requestLength}",
         "attributes": [
             { "key": "log.id", "value": 100001 },
             { "key": "log.name", "value": "LogUnexpectedRequestLengthWarning" },
             { "key": "requestLength", "value": 9000 }
         ]
      }
   ]
}

Or using event stuff:

{
   "instrumentationscope": { "name": "OpenTelemetry.Bridge.EventSource" },
   "logs": [
      {
         "body": "Unexpected request length: {requestLength}",
         "attributes": [
             { "key": "event.domain", "value": "MyLibrary" },
             { "key": "event.id", "value": 100001 },
             { "key": "event.name", "value": "LogUnexpectedRequestLengthWarning" },
             { "key": "requestLength", "value": 9000 }
         ]
      }
   ]
}

How do we describe to a user what value should be used for event.id?

We shouldn't. Again, I'm not trying to design a new event API. I'm trying to bridge things. It is up to the logging framework to define that. We just need to give it a home.
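
As a rough illustration of what "giving it a home" could mean for a bridge, here is a minimal Python sketch that copies a framework-supplied id and name onto a log record. The function `bridge_log`, the attribute keys `log.id` / `log.name`, and the output shape are all assumptions made for this sketch (attributes are a plain dict rather than the key/value list used in the pseudo-JSON above); it is not the OpenTelemetry .NET bridge or any real API.

```python
from typing import Any, Dict, Optional


def bridge_log(
    scope_name: str,
    body: str,
    event_id: Optional[int] = None,
    event_name: Optional[str] = None,
    **params: Any,
) -> Dict[str, Any]:
    """Map a framework log call to a log-record-like dict.

    The bridge does not invent or validate ids; it only forwards what the
    originating framework supplied, and omits the attribute otherwise.
    """
    attributes: Dict[str, Any] = {}
    if event_id is not None:
        attributes["log.id"] = event_id
    if event_name:
        attributes["log.name"] = event_name
    attributes.update(params)
    return {
        "instrumentationscope": {"name": scope_name},
        "logs": [{"body": body, "attributes": attributes}],
    }


record = bridge_log(
    "MyLibrary",
    "Unexpected request length: {requestLength}",
    event_id=100001,
    event_name="LogUnexpectedRequestLengthWarning",
    requestLength=9000,
)
```

A log call that carries no id simply produces a record without `log.id`, so code that never uses the structure is unaffected.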

@MSNev
Contributor

MSNev commented Oct 11, 2023

IMO, the point of the Event semantic conventions isn't to be prescriptive to how backends should process / ingest these Events. Its to give backends and users alike a very clean way to identify all the times a particular class / type of thing happened.

Exactly

If a backend wants to do something fancy, and store a particular type of event in a special way, that's fine, but it should also be fine for the backend to just treat all the log records the same,

Spot on, the domain/name combination is the "key" value that backends would use to determine (if) they want to do anything special.

Ahh, now that I see all 3 examples (it takes me back to my C++ .COM days) I see that the id in this case is really the unique, specific id for the "Provider" (or DLL back in my C/C++ days) for the error/event (which also handled the encoding / decoding of any encoded binary data). Looking at it from this perspective, you could say that the id is actually the name (if you want to represent it as an OTel "Event") and the included "name" is just an (optional) attribute of the event payload. Although, I think in this case it may be better to just represent this as a log, as most vendors won't want to do this for random application / component "Custom" errors / events. And most users will probably also just deal with it in the same manner (which agrees with your statement that no one has asked for it), because each "message" (id / Provider) is effectively unique.

One of the things we have not yet gotten to defining as part of the event definition is "how would any application represent their own 'Custom' event" and I think this specific case is really about how you take the Windows Event representation and relay it.

So apart from just defining this as a normal LogRecord (which IMHO should avoid using the event.domain and event.name) this could be the starting point for identifying / defining either

  • what a generic "Custom" event should look like (user / application / component), that is, an event.domain that is not defined by OTel
  • A special type of OTel event for this type of Windows-centric LogEvent (as opposed to my original suggestion of a "DotNet"-centric domain)

@CodeBlanch
Member Author

Should we meet on this? It feels like everything is pulling in different directions.

Here is an idea...

In the data model spec, InstrumentationScope for logs has an interesting quirk where the Logger Name SHOULD be recorded as the Instrumentation Scope name. What we're doing in .NET at the moment is putting our category name for ILogger there. For EventSource, I could set the Name as instrumentation scope name.

Let's say we do that and drop event.domain. That leaves just event.name on the events semantic conventions.

We have an area in the spec called General log identification attributes which declares itself as: These attributes may be used for identifying a Log Record

So we delete the events semantic conventions and define...


Attribute | Type | Description | Examples | Requirement Level
log.id | string | A durable identifier for the Log Record. [1] | 1; 0x100F | Optional
log.name | string | A name for the Log Record. | RequestProcessed; InvalidResponse | Optional
log.uid | string | A unique identifier for the Log Record. [2] | 01ARZ3NDEKTSV4RRFFQ69G5FAV | Opt-In

[1]: A durable numerical identifier that can be used to differentiate logs from each other. If a durable id is provided, other log records of the same shape should use the same id. Durable identifiers SHOULD be unique for a given instrumentation scope. A hash of the instrumentation scope name and log template may be used to create a durable identifier.
[2]: If an uid is provided, other log records with the same uid will be considered duplicates and can be removed safely. ...[more stuff exists]...
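
Note [1] mentions that a hash of the instrumentation scope name and log template may be used to create a durable identifier. One way that could look is sketched below in Python. The choice of SHA-256, the truncation to 32 bits, the NUL separator, and the function name `durable_id` are all assumptions for illustration; the proposal does not prescribe any particular hash.

```python
import hashlib


def durable_id(scope_name: str, template: str) -> int:
    """Derive a stable 32-bit id from the scope name and the log template."""
    # A NUL separator keeps the two fields from running together, so that
    # ("ab", "c") and ("a", "bc") are hashed as different inputs.
    data = f"{scope_name}\0{template}".encode("utf-8")
    digest = hashlib.sha256(data).digest()
    # Keep the first 4 bytes as an unsigned integer (an arbitrary choice).
    return int.from_bytes(digest[:4], "big")


weather_id = durable_id(
    "WebApplication1.Controllers.WeatherForecastController",
    "WeatherForecasts generated {count}",
)
```

A cryptographic digest is used here rather than Python's built-in `hash()`, because string hashes from `hash()` are randomized per process by default and so would not be durable across restarts. Hashing the template rather than the rendered message is what keeps the id the same for every record of the same shape.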


Essentially what I'm saying is instrumentation scope name gives us the "domain" or "namespace" we were using event.domain for.

@alanwest
Member

It feels like everything is pulling in different directions.

I agree. Based on my understanding of your requirements, here's my suggestion for this issue. Delete "Proposal B" from the issue description because I think it has distracted the conversation away from your needs. OpenTelemetry is defining a concept of events, but your requirements seem unrelated.

The concept you're proposing of a durable identifier seems different. It sounds like you need a way to classify logs, but not necessarily strictly define the log's schema/shape. My understanding of OpenTelemetry events is that a particular kind of event has a strictly defined schema, so it is the wrong tool for your needs.

So we delete the events semantic conventions and define...

If I'm correct that your requirements are unrelated to OpenTelemetry events, I do not think it makes sense to recommend any alterations to the event conventions. I think in this way you can refocus the effort solely on the introduction of a log.id and/or log.name attribute.

@cjablonski76

cjablonski76 commented Jul 25, 2024

@CodeBlanch was there ever any conclusion for this topic? The temporary OTEL_DOTNET_EXPERIMENTAL_OTLP_EMIT_EVENT_LOG_ATTRIBUTES flag for dotnet specifically is a non-starter for our business to depend on :(.

@cijothomas
Member

@cjablonski76 From what I can tell, no progress and this topic is still open.

@CodeBlanch
Member Author

I think this got too confused with events. I'm going to recreate as something more specific to logs.

@HHobeck

HHobeck commented Aug 27, 2024

In my opinion the event.name attribute, which is already part of the Semantic Conventions for Events (experimental), should be used for the log event name because it is semantically the same. From the application's point of view it should make no difference whether you use the EventAPI to log an event or use the log provider, passing the event.id and event.name natively. The second option is preferred because it has no dependency on third-party instrumentation libraries. The event.id should be just a non-human-readable representation of the event.name, no more, no less. In the end it is up to the application developer how to use it, without guarantees.

As an alternative I could live with log.id and log.name. What matters, in my opinion, is that you come to a decision on this topic, push it to a stable stage, and move forward. I mean all arguments are on the table, right?!

@CodeBlanch: Have you already created a new issue for this?

@HHobeck

HHobeck commented Aug 27, 2024

I want to give a different perspective. The Elastic Common Schema (ECS) is a standard which comes from Elastic. What I have understood is that in the long term the ECS fields will be integrated somehow into the OTel standard (which makes sense). Doesn't that mean that OTel can import concepts from the ECS standard as well?

If you take a look into the Event Fields you will find:

Field | Description
event.action | The action captured by the event.
event.code | Identification code for this event, if one exists.

This is exactly what we are talking about. Is it only .NET Framework related!? No, it is not.

@cijothomas
Member

@HHobeck That is my understanding as well based on #1339 (comment)

(You are right! It is not .NET related. .NET was merely used to show an example as its ILogger has first-class support for a human-readable name and a machine version of it via the EventId struct with id and name in it.)

(Whether we use "event.id" or "event.code" for the machine-friendly version: no strong preference for me either way. It surely looks like event.code from ECS was intended to be the same idea.)

@JamesNK
Contributor

JamesNK commented Sep 18, 2024

This issue was closed but I don't see a link to the new issue. Where is it?

@cijothomas
Member

This issue was closed but I don't see a link to the new issue. Where is it?

#1339
(It'll be some time before we re-start that conversation, given open-telemetry/oteps#265 is just settling...)

@JamesNK
Contributor

JamesNK commented Sep 20, 2024

That PR is closed. A new, new issue is needed. Or reopen this one.

The reason why I'm interested: I have a UI for viewing semantic logs. Filtering by log name/id seems like a first class concept and it would be good to have known attribute names that could be used to filter the applicable attribute.

@cijothomas
Member

Yes agree.
Can you open an issue in the Spec repo? I think it is more of a spec issue now, than just a semantic convention.

These are open questions:

  1. The OTEP Clarifying "What are OTel Events" just got merged. It left an open question: whether EventName should be a top-level field on LogRecord or an attribute "event.name". Depending on the decision, this is something that would take some time to propagate through spec/proto-files etc.

  2. And then the question: is ILogger's EventName the same as what OTel envisions as EventName for its Events? My answer is yes, but I have heard different opinions about this.

  3. And then the question: what about the numerical id? If it's 1:1 with EventName, then EventId should be in the same place as the name! Either as top-level fields together, or as attributes (again together).
