Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Name field back to Log data model #2398

Closed
jjharr opened this issue Mar 3, 2022 · 15 comments
Closed

Add Name field back to Log data model #2398

jjharr opened this issue Mar 3, 2022 · 15 comments
Labels
spec:logs Related to the specification/logs directory

Comments

@jjharr
Copy link

jjharr commented Mar 3, 2022

This issue relates to issue 2074

I propose to add the name field back to the log data model, for the following reasons:

  1. While I realize that OpenTelemetry is not 1.0 yet, there are actually a lot of people using Open Telemetry. The recent removal of the name field hard-breaks all existing code without (as far as I can tell) a compelling reason.

  2. The original intent of the field was as a low cardinality field. This was mentioned in the discussions for issue 2074, but not really addressed adequately. My company, and some of our very large customers, use it in exactly that way. They keep a Grafana dashboard that is a histogram of error types. It is an extremely important dashboard. You can only create a histogram like that if you have a low cardinality field. This is a very broad use case. Yes, you can theoretically do that if you just keep it as an attribute, but it increases processing overhead, which matters at the volumes we deal with.

  3. Finally, and by far most importantly in the long term, this change really undermines good semantic logging patterns. We just barely, after great effort, trained our developers to use fixed strings (instead of interpolated strings) as log messages and augment those fixed strings with context-specific attributes. This pattern makes log searchability vastly better. Demoting name to an attribute completely undermines that pattern. I cannot over-emphasize how important this is. Pretty much every logging API in the world allows the developer to add a single message, and if this message is not cardinality constrained, then developers will just use interpolated strings like they always have. Nobody is going to enter a high-cardinality value and then a low cardinality one for attributes. So by demoting name to attributes, all the benefits of low-cardinality have been discarded. Since Open Telemetry is on track to become a de-facto logging standard, it is my view that by making this change, Open Telemetry is losing the opportunity to meaningfully, and positively, change how developers in next many years do their logging.

@jjharr jjharr added the spec:logs Related to the specification/logs directory label Mar 3, 2022
@epsteina16
Copy link

Copying my comment from #2074

This issue just came up for me as well since we have implemented a new API that customers are using. Our API is based off of the OpenTelemetry log data model and we've placed a significant focus on the Name field.

There are two arguments made in the decision to drop the name field that I would like to challenge:

Dropping name is low risk since if we eventually decide its valuable (e.g. to accommodate an event API), we can add it back with a backwards compatible change.

I understand that OpenTelemetry is in it's early stages but we really need to consider backwards compatibility. Once you have others working off of this specification a change like this IS high risk. This is a breaking change for us.

Finally, as #2074 (comment) by @djaglowski, the data model requires that top level field "semantics must be the same for all known log and event formats and can be mapped directly and unambiguously to this data model". Without having these well defined semantics, name fails to meet the criteria for a top level field.

I completely agree that it is important for existing log formats to map well into the log data model. This excludes the category of new applications that can be written with better logging practices in mind. I strongly believe the Name field should be recommended for use by new logging APIs and new applications because of the low cardinality aspect. Just because other log formats don't have a "Name" equivalent doesn't mean it should be dropped from the log data model. It can just be Optional and left out of example mappings.

@tigrannajaryan
Copy link
Member

@open-telemetry/specs-logs-approvers FYI. This does not prevent declaration of stability since it will be an additive change to the data model.

@jimafrost
Copy link

Hmm. My code uses this field as well. Coding for this has been frustrating given that the spec seems to change every time I come back to look. Is there a particular reason this had to go away? It's certainly useful for grouping warnings or errors by a particular kind when generating summary counts. I've been using it with errno-style values like ECONNREFUSED.

@tigrannajaryan
Copy link
Member

To move forward with this I think we need 3 things:

  1. Show that there is enough community interest. It would be useful to tell who you represent, why this is important for your users/customers. It helps if you can be specific to help understand the size and the impact. Links to implementations that depend on the Name field is also useful.
  2. List of use cases where this field is needed. We need more details with examples.
  3. Definition of the Name field that has clear, understandable semantics that everybody agrees on.

@djaglowski
Copy link
Member

It's more than fair to discuss this again, as clearly not everyone with a stake in it was involved in the decisions. That said, unless I am mistaken, I think all of the points raised on this issue were discussed and considered in the course of the decision. I'll try my best to rehash the rationale for each point.

While I realize that OpenTelemetry is not 1.0 yet, there are actually a lot of people using Open Telemetry. The recent removal of the name field hard-breaks all existing code without (as far as I can tell) a compelling reason.

I think the most compelling reason to remove the field was that there was no clear consensus on how one should populate or interpret this field. Several distinct use cases were presented, each of which applied conflicting assumptions and requirements. The decision to make a breaking change was not taken lightly. A lot of effort was put into trying to justify keeping this field, but no one was able to produce a clear definition that was broadly agreed upon. No one wants breaking changes, but ultimately removal of the field was deemed necessary in order to get to a point where we could guarantee stability going forward. We cannot promising stability of a field that does not have a clear definition.

The original intent of the field was as a low cardinality field. This was mentioned in the discussions for #2074, but not really addressed adequately. My company, and some of our very large customers, use it in exactly that way. They keep a Grafana dashboard that is a histogram of error types. It is an extremely important dashboard. You can only create a histogram like that if you have a low cardinality field. This is a very broad use case. Yes, you can theoretically do that if you just keep it as an attribute, but it increases processing overhead, which matters at the volumes we deal with.

Cardinality constraints are independent of whether a field should be a top-level field or an attribute. There are lots of attributes with cardinality constraints. We can't justify a top-level field based on needing it to be low cardinality, anymore than we can justify a top-level field because we need it to be a string.

The concern about performance was discussed several times. It's not clear that this is a valid concern. Does it really cost that much more to check if a key exists in map vs checking if a top level field is not empty? The presence or absence of an attribute is depended on in many other cases throughout the project. In order to justify a top level field for the purpose of performance optimization, I think a very strong case needs to be made. Maybe there is a case to be made, but no one has made it. Even if it can be demonstrated that a top level field is necessary for performance reasons, we still need a clear definition for that field.

Finally, and by far most importantly in the long term, this change really undermines good semantic logging patterns. We just barely, after great effort, trained our developers to use fixed strings (instead of interpolated strings) as log messages and augment those fixed strings with context-specific attributes. This pattern makes log searchability vastly better. Demoting name to an attribute completely undermines that pattern. I cannot over-emphasize how important this is. Pretty much every logging API in the world allows the developer to add a single message, and if this message is not cardinality constrained, then developers will just use interpolated strings like they always have. Nobody is going to enter a high-cardinality value and then a low cardinality one for attributes. So by demoting name to attributes, all the benefits of low-cardinality have been discarded. Since Open Telemetry is on track to become a de-facto logging standard, it is my view that by making this change, Open Telemetry is losing the opportunity to meaningfully, and positively, change how developers in next many years do their logging.

OpenTelemetry does not plan to define new logging SDKs that developers will directly use to write logs. That is, we're not replacing Java's SLF4J or Go's zap with a new paradigm. Our intention is to take what is passed to those SDKs, and map it into a common data model (e.g. in Java, by providing a log appender).

It would be great if all logging SDKs unambiguously encouraged that a low cardinality Name field should be specified for each log. However, I don't think this is remotely the case. As you said, it took great effort to establish this as a best practice in your organization. I'm sure the same would be true in many cases. In a general sense, I don't think we can make any cardinality assumptions about the message field.

The notion that virtually all logging SDKs require a "message" is well understood and acknowledged though. This value is unambiguously mapped to the log's Body field. If you want to make the case that the message should be mapped to a field called Name, then I think you are basically proposing that we attempt to enforce new semantics on a wide variety of existing SDKs, and until those semantics are widely adopted, deal with significant ambiguity about whether or not the message was intended to be low cardinality or not.

@epsteina16
Copy link

For my use case, we have created a new API for logging with OpenTelemetry. We not only wanted to natively send logs under the hood to an OpenTelemetry collector but we also wanted to enforce good logging practices for high volume use cases. At the time of creating this, the OpenTelemetry logging specification specifically defined the Name field as a low cardinality field.

To be more specific, we guide users of the API to create a set of Names that encompasses both errors they encounter in their code as well as logs that map to specific events or states in their application. This makes it not only faster to search for logs in a high volume system, but also easier to write logs once you have a well defined table of Names.

I still don't understand why the field was removed. If it doesn't map well with existing SDKs like zap or SLF4J that's fine; new APIs and SDKs can still make use of it. Optional fields like Name should not prevent stability for the log data model.

@jjharr
Copy link
Author

jjharr commented Mar 8, 2022

@tigrannajaryan Thank you very much for being willing to listen on this topic.

Responding to your requests:

  1. I work for an established enterprise software company that doesn't normally participate in specifications and committees. I'm working appropriate channels to try to change that.

  2. The two general use cases presented in this thread are related to finite cardinality:
    a) A running histogram of errors in a system. This cannot realistically be done without a finite cardinality field.
    b) Aggregation or analysis of attributes based on the name field. That is to say, a SELECT on name field, followed by some aggregation or analysis of attributes associated with that name. Let's say I want to analyze attribute A from log source Y, but log source X also defines attribute A with a different meaning. How am I going to write a query to disambiguate attribute A? None of the answers to this question are very satisfactory. I could greatly complicate the query to also look at the resource field. Or I could write a regex to pseudo-match the body field. But, not great.

  3. In my view the definition of this field is something like:

"A value containing the body of the log record (see the description of any type above). First-party Applications SHOULD use a string message. However, a structured body may be necessary to preserve the semantics of some existing log formats. MUST NOT vary for each occurrence of the event coming from the same source. For example, if the field is a string, it must a string that is fixed everywhere the the application. It must not be an interpolated string. This field is optional."

@jjharr
Copy link
Author

jjharr commented Mar 8, 2022

@djaglowski :

no clear consensus on how one should populate or interpret this field

Understood. I think the raison d'être for this field is finite cardinality. That's it. Just like there is a conditional mapping for severity text or number based on the traits of the source severity mapping, I think there is a conditional mapping for body or name based on the traits of the source cardinality. Like severity, populating these fields doesn't need to be mutually exclusive, but in practice often would be.

Cardinality constraints are independent of whether a field should be a top-level field or an attribute.

While this might be true in theory, I do not think it's true in practice. The critical aspect here is not whether it's top-level per se; it's that there is no specification for it. Attributes are completely freestyle. Without a designated field for "low-cardinality", what field are disparate log sources to set if they want to identify low cardinality? I could set one for my company, but the next company or project likely isn't going to use the same one.

In a general sense, I don't think we can make any cardinality assumptions about the message field.

... This value is unambiguously mapped to the log's Body field.

Yes. The specification also acknowledges that there may be different kinds of severity mappings, and chose to provide two fields to accommodate those. It didn't really need to. The specification could have define a reserved range of severity numbers and then told implementers to make up their own numbers if they don't map. But having the extra field preserves meaning carried over from the source.

In a similar way, we would like developers to be able to choose whether to map the message to the body or the name field depending on their knowledge about the source, in order to preserve cardinality meaning. In my view, that meaning is very valuable for processing logs; that is, I would process a log record one way if it was finite, and another way if it was not.

This discussion really boils down to whether finite cardinality should be an objective of the format. My view is that in a world, now and in the future, where we're dealing with billions (or more) of log messages per day, finite cardinality is a worthy top-line objective with minimal cost. If others feel strongly that cardinality should not be an objective, then it probably doesn't make sense to include the name field.

@tigrannajaryan
Copy link
Member

While this might be true in theory, I do not think it's true in practice. The critical aspect here is not whether it's top-level per se; it's that there is no specification for it.

Whether the filed is top-level or an attribute is not decided by its cardinality. Please see this section for more on this.

Attributes are completely freestyle.

Semantic conventions can and do define the expectations for the attribute values, they are no completely freestyle. The name field can be an attribute with the same definition as it would be if it was a top-level field and have the exact same cardinality restrictions (I am assuming the Name field type is string and not an enum).

Cardinality alone is not a reason to make this a top-level field.

@jjharr
Copy link
Author

jjharr commented Mar 9, 2022

I'm getting the feeling this is probably futile. But I will persist while the ticket is still open.

Those semantic attributes you refer to are per-log source. Unless I'm missing something, there are no cross-source attributes defined by the specification. If I define a finite-cardinality attribute, that applies to my own app logs only. What I want for a system, to meet the use cases described above, is to have a specification-designated field somewhere, that applies across log sources. It is understood that, like severity number, not all log sources may populate it, but it is defined for sources that do wish to populate it.

To reiterate, the primary issue in my view is not whether or not it is top level. It is whether a finite cardinality message is part of the specification. Right now it is not. I feel it materially reduces the utility of the specification to omit cardinality. But, if the group feels that cardinality should not be part of the specification, and by implication that the use cases above are not sufficiently important, then that resolves the question of whether or not there should be a name field (anywhere).

@tigrannajaryan
Copy link
Member

Those semantic attributes you refer to are per-log source. Unless I'm missing something, there are no cross-source attributes defined by the specification. If I define a finite-cardinality attribute, that applies to my own app logs only.

Why do you think semantic conventions are per-log source? That is not the case. We can define a semantic convention that is very generic and applies to all log sources. See for example enduser.id. It does not say what sources it applies to. Any source can include this attribute.

What I want for a system, to meet the use cases described above, is to have a specification-designated field somewhere, that applies across log sources. It is understood that, like severity number, not all log sources may populate it, but it is defined for sources that do wish to populate it.

Semantic conventions are part of the Otel specification. Any log source that considers the attribute applicable can include it in the log record. This is no different than a top-level field which is optional (and a hypothetical Name field will be optional, we certainly cannot make it mandatory).

@tigrannajaryan
Copy link
Member

Here are a coupld arguments in favour of adding a Name top-level field:

  • Spans have a Name defined as "concisely identifies the work represented by the Span" and "A description of the span's operation". We could define LogRecord Name similarly. Granted, the Span Name definitions are not very precise, but I don't see a reason that we should require the definition of LogRecord Name to clear a much higher bar for precision.
  • Span Events have a Name which does not have a precise definition in the spec, however we can consider it analogous of the proposed LogRecord Name.

That these 2 fields already exist will make the presence of LogRecord Name more legitimate. Perhaps defining what LogRecord Name means can helps us refine the definition of Span Event Name (and may be also refine Span Name definition).

@jjharr
Copy link
Author

jjharr commented Mar 10, 2022

Why do you think semantic conventions are per-log source? That is not the case.

You are right. I missed that. Thank you for pointing that out.

a hypothetical Name field will be optional, we certainly cannot make it mandatory

My hope was that we could treat the Name field similarly to how severity is treated: use it if it maps well to your app, which will apply mostly to newer and future apps, and use Body otherwise. As they are defined somewhat differently, setting both is fine.

Perhaps defining what LogRecord Name means can helps us refine the definition of Span Event Name

It does seem like definition is the crux of the issue. I like your comparison to the span name. We use Name as a generalized, concise description of an event. The use cases are very similar to those for the span variety: it's essentially a key that can be used for aggregation. The span name is not explicitly defined to have finite cardinality, but it's description makes it implicitly so. That's fine.

Deriving from the span name description, what do you think about something like this?

A concise, categorized description of the event name. A best practice is to use the same name everywhere the category applies. For example, if the name is "invalid request", the same name should be used everywhere a request can be invalid. This makes it easier to correlate similar events across implementations. This field is semantically required to be set to non-empty string. This field is optional.

@jack-berg
Copy link
Member

In the previous iteration of this discussion we also discussed that when you have a low-cardinality classification like name, its very useful for it to have a namespace. Without a namespace, its harder for the field to be useful because different systems may have different interpretations of the field. As @jjharr points out, how are you going to disambiguate between two log systems that each use the name field with different meaning?

We could add another top level namespace field, but attributes lend themselves better to solving this problem since backends and users querying the data can filter on the presence of the attribute key of interest, and have confidence that all the values have the same semantic meaning.

Separately, I somewhat disagree with the comparison of log name to span name and span event name:

  • Span name is required and is integral to the span data model. Users must think carefully of how to name their spans when during instrumentation. A low cardinality log classifier is useful but not integral to the log data model (as evidenced by its absence in many existing log frameworks).
  • I think the span event name field missed the mark in that it doesn't have an associated namespace. I'd prefer if we didn't persist that design decision to logs. I could imagine span events someday being replaced by logs, where the span event name manifests as a log attribute of the form otel.span.event={name}.

@austinlparker
Copy link
Member

On review, we believe that the forthcoming Events spec and API will address the requirements raised here. If this does not address the needs here, then feel free to open a new issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec:logs Related to the specification/logs directory
Projects
None yet
Development

No branches or pull requests

8 participants