Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Context parameter to Enabled for metrics instruments #4256

Closed
pellared opened this issue Oct 11, 2024 · 23 comments
Closed

Add Context parameter to Enabled for metrics instruments #4256

pellared opened this issue Oct 11, 2024 · 23 comments
Labels
area:api Cross language API specification issue spec:metrics Related to the specification/metrics directory triage:accepted:needs-sponsor Ready to be implemented, but does not yet have a specification sponsor

Comments

@pellared
Copy link
Member

pellared commented Oct 11, 2024

We should add Context as a parameter similarly to https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/bridge-api.md#enabled.

Context is needed for allowing "context related" behavior. For instance, it enables to user to disable the instrument for some key/value in the context has been set. It can also used to see if the call is within a span. Related PR: #4203

Created here.

@jack-berg
Copy link
Member

Why?

@pellared
Copy link
Member Author

pellared commented Oct 12, 2024

Why?

Description updated. Thanks

@pellared pellared added area:api Cross language API specification issue spec:metrics Related to the specification/metrics directory labels Oct 12, 2024
@jack-berg
Copy link
Member

There's no corresponding SDK behavior that would allow the enabled API to change its response based on context.

@pellared
Copy link
Member Author

There's no corresponding SDK behavior that would allow the enabled API to change its response based on context.

A custom API implementation (e.g. SDK wrapper) could change its response based on context.
The SDK can add such behavior in future.

I find that adding is important for Go that require passing Context explicitly. Adding it later to the API would be a burden.
For reference: https://go.dev/wiki/CodeReviewComments#contexts.

CC @open-telemetry/go-maintainers, @open-telemetry/collector-maintainers

@svrnm
Copy link
Member

svrnm commented Oct 14, 2024

@open-telemetry/technical-committee since there is precedence for this already, we think that we can move forward with this, any objections? (cc @pellared @jpkrohling @danielgblanco)

@svrnm svrnm added the triage:deciding:tc-inbox Needs attention from the TC in order to move forward label Oct 14, 2024
@mx-psi
Copy link
Member

mx-psi commented Oct 14, 2024

Agree with @pellared that adding this is important for Go

@jmacd
Copy link
Contributor

jmacd commented Oct 16, 2024

IMO there is no need to modify the specification here.

To me, this is a Go-specific question. Golang uses explicit context, and all API calls should have context independent of whether OTel specifications explicitly mention context. There is no need to modify the OTel specification to add a Context parameter in Golang. Most OTel APIs rely on implicit context, like being able to get a thread-local context object. Golang does not have thread-local state, so we always expect a Context argument.

It shouldn't require a lot of imagination to see how context is useful for customized SDK purposes. A custom metric SDK could record metric events to the active span when the context is sampled, for example.

@pellared
Copy link
Member Author

OTel APIs rely on implicit context

This is why in almost all places Context is added as required for languages requiring explicit context. I would prefer to have the specification consistent.

@jmacd
Copy link
Contributor

jmacd commented Oct 16, 2024

@pellared I can't find what you're referring to. In the metrics api.md the word "Context" appears very little, certainly not as a required parameter anywhere. For metrics, we see:

  [Measurements](#measurement) recorded by synchronous instruments can be
  associated with the [Context](../context/README.md).

this implies that Enabled() should accept a context.

In the trace api.md, Context is used but it refers to the OTel context concept, not the Golang Context object.

Moreover, if you look at the one operation that absolutely requires a context, tracer.Start(), we see it is permitted to be implicit:

  The API MAY also have an option for implicitly using
  the current Context as parent as a default behavior.

This seems to set a precedent that context can be implicit. The definition of Context itself also repeats this:

Depending on the language, its usage may be either explicit or implicit.

I still believe it is not necessary to modify the specification in this case.

@pellared
Copy link
Member Author

pellared commented Oct 16, 2024

@pellared I can't find what you're referring to.

@jmacd, I added all these in the description of #4262

Context is used but it refers to the OTel context concept

Is there any reason that adding the OTel concept of context would be bad in this scenario? Especially given that in most languages it is always implicitly available.

I still believe it is not necessary to modify the specification in this case.

I prefer the specification to be less ambiguous. I would rather change the specification and call out that adding a Context parameter MAY be added to any operation if the language does not support implicit Context and requires passing Context explicitly.

@MrAlias
Copy link
Contributor

MrAlias commented Oct 17, 2024

+1 for making our specification explicit in its meaning.

@dmathieu
Copy link
Member

+1 as well, for explicitness, but also consistency. When tracing mentions the context, we should also mention it for similar appropriate cases with other signals.

@jack-berg
Copy link
Member

I would rather change the specification and call out that adding a Context parameter MAY be added to any operation if the language does not support implicit Context and requires passing Context explicitly.

Mentioned here, but will repeat again here: Including Context parameter to any operation doesn't seem right. Should languages with explicit context pass it when obtaining a tracer, meter, logger? Should all the span operations to add / update fields include a context argument? Including Context everywhere, even if the language requires explicit context propagation, seems like a bad API design.

I don't think there are actually many operations that benefit from Context. At least, not so many that its cumbersome to explicitly mention it as a parameter.

@jsuereth
Copy link
Contributor

I do think Context is a critical foundation of OTEL. Any method which records telemetry should have it (at least optionally) included. This is the only way, e.g. to allow baggage to work, and future things like context-attributes.

However, this must also be balanced by the expense/cost of context. I think it's appropriate that we be super careful in terms of what we promote and push into context. For example, the reason "green threads" and "co-routines" have such a huge performance win over "raw threads" or "native threads" is due to the context passing - they are able to ignore MOST of the stack copying that would occur in raw threads. As OTEL adds back in things which must be passed, we need to understand the overhead of this and its usage.

To me this means we should assume the following:

  • Things placed in context should be highly vetted, minimal and necessary horizontally for application o11y.
  • Using context in should be relatively "cheap" as the cost has already been paid to propagate it.
  • We should avoid death by a thousand cuts with context -> Due to its highly concurrent usage, if we can interact with any locks/concurrency primitives ONCE in instrumentation critical paths, that's better than many times.

To me this means a pattern of:

var myContext = doTheExpensiveConcurrentAccessThingToGetContext();
if (logger.isEnabled(myContext)) {
  logger.log(myContext, ...expensive to compute things...)
}

Is fine under a few assumptions:

  • Interacting with context does not need any future concurrency-safety-primitives.
  • global "is enabled" check is less costly than performing the expensive operation (forcing cache clear to volatile read a boolean has non-zero cost in throughput).
  • there is no way to pass the expensive computation to the logger itself, e.g. a thunk / function / callback.

Basically, I don't see Context adding significant burden here, given its design and use in OTEL. However, I also am concerned if we partition out enablement checks in very fine grained fashion that we may locally optimise a benchmark at the cost of overhead increasing across the application.

TL;DR; it would be nice to see some prototypes and benchmarks here to address concerns. I'm theoretically supportive, but I really want to understand what is being checked to understand if we're introducing a foot-gun here.

@github-actions github-actions bot added the triage:followup Needs follow up during triage label Oct 29, 2024
@lmolkova lmolkova added the triage:accepted:needs-sponsor Ready to be implemented, but does not yet have a specification sponsor label Oct 30, 2024
@lmolkova lmolkova removed the triage:deciding:tc-inbox Needs attention from the TC in order to move forward label Oct 30, 2024
@trask trask removed the triage:followup Needs follow up during triage label Nov 5, 2024
@pellared
Copy link
Member Author

pellared commented Nov 14, 2024

@jsuereth

Any method which records telemetry should have it (at least optionally) included.

Notice that Enabled is not recording telemetry. The question is whether we want to support cross-cutting concerns or any context related functionality for Enabled API.

In OTel Go, we might need the Context in the Enabled API even if the SDK doesn't use it right away.

For example, in future, the user may want the Logs SDK to filter out span events (emitted via Logs SDK) when they are not inside a recording span. This could be implemented in the SDK by adding disabled_not_recorded_spans to LoggerConfig. The SDK can check the context.Context to see if the Enabled call is inside a recording span. Then the caller would write:

if logger.Enabled(ctx, params) {
	logger.Emit(ctx, createHeavyLogRecord(payload))
}

Alternatively, the caller could check if the span is being recorded by himself:

if trace.SpanFromContext(ctx).IsRecording() && logger.Enabled(params) {
	logger.Emit(ctx, createHeavyLogRecord(payload))
}

This alternative might be better because the caller usually knows better whether the log record is related to tracing.

However, adding the Context later for any other reason would be a big burden for OTel Go users. We might have to add a Logger.EnabledContext method and deprecate Logger.Enabled, which would complicate backward compatibility. Starting with Context as a required parameter avoids this inconvenience.

But maybe we indeed do not want cross-cutting concerns for Enabled API?
On the other hand, there may be other API implementations than the SDK that want to support cross-cutting concerns.

I find the principle "Do not break users" very important. Adding Context parameter decreases this chance a lot (for OTel Go). In Go, it's common to pass a context.Context. This is why I still lean toward including it from the start: it minimizes disruptive API changes in future releases and supports custom API implementations and future SDK enhancements.

Most of the reasons above also apply to Enabled for metrics instruments.

Can you elaborate on

Interacting with context does not need any future concurrency-safety-primitives.

From https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/context/README.md#overview:

A Context MUST be immutable, and its write operations MUST result in the creation of a new Context containing the original values and the specified values updated.

We don't need synchronization when interacting with context because it is immutable.

global "is enabled" check is less costly than performing the expensive operation

I think this relates to the purpose of the Enabled API, not the Context parameter. Most logging libraries offer this feature to improve performance for expensive log records. I can add benchmarks to the OTEP.

there is no way to pass the expensive computation to the logger itself, e.g. a thunk / function / callback.

This might be possible for users of the Logs API when emitting events. But Enabled is crucial for bridging Enabled with other logging libraries such as https://pkg.go.dev/log/slog, https://pkg.go.dev/go.uber.org/zap, https://pkg.go.dev/github.com/go-logr/logr.

@jack-berg
Copy link
Member

For example, in future, the user may want the Logs SDK to filter out span events (emitted via Logs SDK) when they are not inside a recording span. This could be implemented in the SDK by adding disabled_not_recorded_spans to LoggerConfig. The SDK can check the context.Context to see if the Enabled call is inside a recording span.

That's an interesting use case. And you've still retained the ability to have the SDK config be static (i.e. no user provided function that consumes context). I like it. 👍

@jsuereth
Copy link
Contributor

We don't need synchronization when interacting with context because it is immutable.

Quick response to this: You need to get access to the current context. Context itself is immutable, but the notion of "which context is current" is not. You need to go through concurrency primitives to get access to it or (as in Go) you explicitly pass a Context downstream to methods.

@codeboten
Copy link
Contributor

codeboten commented Nov 19, 2024

From the specification SIG on 19-Nov, it sounds like there isn't a strong need to have the context parameter at this time. Will close this issue in 2 weeks if there's no one that brings a strong argument by then.

@pellared
Copy link
Member Author

pellared commented Nov 19, 2024

As the author of this issue I am OK on closing (for metrics) this issue as long as the @open-telemetry/technical-committee DOES NOT block having Context in Logger.Enabled. Let's give others 2 weeks to veto with my statement.

@codeboten
Copy link
Contributor

Thanks @pellared

@jmacd
Copy link
Contributor

jmacd commented Nov 19, 2024

I support closing this issue. I'll make an argument against using the context parameter to the metrics-enabled API.

In the Lightstep metrics SDK, we have added a MeasurementProcessor API which has been discussed in the OpenTelemetry specification as far back as the OpenCensus origin. I would like to see OTel add this feature!

MeasurementProcessor absolutely uses the context to derive metric attributes. However, to conditionally enable or disable a metric instrument based on context would introduce an undetectable bias in the data. For example, if you only enable a metric instrument when the context is traced, then the metric instrument will under-count depending on sampling rate and the consumer of the data will have no way of knowing this. Metric instruments should strive to measure everything--users should let the context be used to adjust which attributes are used, not to conditionalize.

@mx-psi
Copy link
Member

mx-psi commented Nov 19, 2024

While I typically want to future proof APIs, I support closing this issue since adding it later would be relatively easy and with a relatively low cost in terms of maintenance burden.

I shared this issue on the #otel-maintainers CNCF Slack channel: https://cloud-native.slack.com/archives/C01NJ7V1KRC/p1732034582753799 and I also added an item to the discussion topics in the Maintainers call to make sure other people see this. I am not able to join the maintainers call, so I would appreciate if somebody else raises this on my behalf :)

@pellared
Copy link
Member Author

pellared commented Dec 3, 2024

SIG meeting:
We agreed to close it. The languages need to still have a possibility to add parameters to Enabled without breaking the users. Everyone on the call was fine with the fact that OTel Go wants to still add context.Context to the API as it is also a language-specific thing (Go devs seemed even happy with it).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:api Cross language API specification issue spec:metrics Related to the specification/metrics directory triage:accepted:needs-sponsor Ready to be implemented, but does not yet have a specification sponsor
Projects
Status: Spec - Accepted