Add an opt-in environment variable for experimental instrumentation #785

MrAlias · 2024-04-14T14:37:58Z

Instrumentation for packages that do not have a stable output format yet (i.e. semantic conventions are not stable) should not be on by default. It leads users into a false sense of stability.

Instead, this instrumentation should be enabled only if a user sets an environment variable to explicitly opt-in. The environment variable should be prefixed with something like OTEL_GO_EXPERIMENTAL*, OTEL_GO_EXP*, or OTEL_GO_X*. This will better communicate to users the experimental and unstable nature of the instrumentation they will use.

This needs to be accompanied by documentation so a user, when their interest is piqued by the environment variable, can understand the implications of using these experimental instrumentation.

Alternatives considered

Provide a migration path for unstable instrumentation as it changes
Only document the lack of stability for instrumentation

The text was updated successfully, but these errors were encountered:

MrAlias · 2024-04-14T14:38:33Z

This will include instrumentation like kafka and database.

damemi · 2024-04-15T17:20:56Z

Would add that a clear definition of what falls under the experimental flag to help us and users. I think that includes:

Semconv stability
Stability of the library itself
Stability of the library's api or underlying types

Basically if any of these aren't stable, the library would fall under our experimental definition. Or we could discuss which of these need to be met if not all 3

MrAlias · 2024-04-15T17:45:01Z

I agree that the semantic conventions stability is a hard requirement for something to not fall under the experimental flag.

As for the API and package stability that may be a bit harder. We already depend on non-exported types to instrument packages. Including instrumentation for net/http. There are not guarantees, even for stable packages, that these will not change. If we require stable APIs for our stability we will need to re-evaluate how we instrument things or have very little stability.

That said, I can see an argument for requiring stable packages (instead of APIs). This would help our developer burden with minimized churn as the package evolves.

damemi · 2024-04-15T17:49:58Z

@MrAlias totally agree, I was just proposing those 3 requirements. I think ultimately if it's not user facing, then it doesn't have to be a requirement for our stability level. If the underlying library changes and breaks us on main, we'll still have compatibility versioning for our published releases.

edeNFed · 2024-04-16T11:00:55Z

Is this feature implemented in other auto instrumentations as well? If I remember correctly, Java/NodeJS auto instrumentations generate spans for Kafka/databases by default even though the semantic conventions are not stable yet.

pellared · 2024-04-16T11:39:46Z

If I remember correctly, Java/NodeJS auto instrumentations generate spans for Kafka/databases by default even though the semantic conventions are not stable yet.

Same for .NET. The feedback we got is that all of the automatic instrumentation users prefer experimental features enabled by default. We only mark that something is experimental via docs. See: https://opentelemetry.io/docs/languages/net/automatic/config/.
We have an issue to allow changing the default behavior, but even after one year no user asked for it open-telemetry/opentelemetry-dotnet-instrumentation#2439.

damemi · 2024-04-16T15:22:35Z

That's a good point @edeNFed @pellared, I didn't think to consider how other auto-instrumentation agents handle this. I think not deviating from the precedent set by other languages (along with the demonstrated user preference) makes a compelling case to not do this by default.

@MrAlias what do you think? I don't think we talked about how other languages handle this in the sig call last week

MrAlias · 2024-04-16T15:44:26Z

Is there data showing how users feel when things break? I imagine there are many users that want it on by default, but is there any data showing how do user's feel when those instrumentation libraries change their output and break their dashboards/alerts?

I would like examples of users not minding that their observability systems break due those kinds of changes. I have many examples of the opposite where users are not happy with these kinds of changes.

There are also observability providers who have voiced not wanting to impact customers with breaking changes once adoption becomes large enough. Whether it is stable or not. Which means things become de-facto stable even though they are on unstable semantic conventions.

MrAlias · 2024-04-16T15:46:20Z

I'm not opposed to following other languages. But I'm skeptical that they have seen the fallout from the negative impact making this choice has caused yet.

pellared · 2024-04-16T15:48:18Z

I imagine there are many users that want it on by default, but is there any data showing how do user's feel when those instrumentation libraries change their output and break their dashboards/alerts?

I was expecting the same in .NET thus I proposed open-telemetry/opentelemetry-dotnet-instrumentation#2439. Surprisingly so far no user really asked for it.

MrAlias · 2024-04-16T15:50:45Z

I imagine there are many users that want it on by default, but is there any data showing how do user's feel when those instrumentation libraries change their output and break their dashboards/alerts?

I was expecting the same in .NET thus I proposed open-telemetry/opentelemetry-dotnet-instrumentation#2439. Surprisingly so far no user really asked for it.

@pellared have you introduced breaking changes to the telemetry output in any instrumentation?

pellared · 2024-04-16T15:52:13Z

I see that Java released 2.0 when they changed HTTP sem conv: https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/tag/v2.0.0

MrAlias · 2024-04-16T15:55:21Z

I see that Java released 2.0 when they changed HTTP sem conv: https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/tag/v2.0.0

Right, they went through the migration for that though. That may not be relevant data given that was not a hard breaking change like the ones we will implement.

As mentioned in the description, migration is also something we discussed. But we also understood that will be a lot of extra development work to migrate all changes while the semantic conventions are not stable.

pellared · 2024-04-16T15:57:44Z

@pellared have you introduced breaking changes to the telemetry output in any instrumentation?

As far as I remember we the HTTP instrumentation had breaking changes in https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/releases/tag/v1.3.0

The upcoming release is going to have another one because of https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry.Instrumentation.Http/CHANGELOG.md#181

EDIT: I do not say that we should do the same for Go Automatic Instrumentation. Just feeding us with input.

MrAlias · 2024-04-16T16:46:51Z

@edeNFed would it be helpful to expose this as an option as well? Something like WithAllInstrumentation?

That way any distribution of this project could make all the experimental instrumentation enabled.

edeNFed · 2024-04-16T16:49:09Z

@edeNFed would it be helpful to expose this as an option as well? Something like WithAllInstrumentation?

That way any distribution of this project could make all the experimental instrumentation enabled.

Sounds good to me

pellared · 2024-04-16T16:54:22Z

@edeNFed would it be helpful to expose this as an option as well? Something like WithAllInstrumentation?

That way any distribution of this project could make all the experimental instrumentation enabled.

Do we want a single option to enable all potential experimental features such us experimental detectors, experimental exporters, OTel Profiling and not just instrumentations?

MrAlias · 2024-04-16T18:50:32Z

@edeNFed would it be helpful to expose this as an option as well? Something like WithAllInstrumentation?
That way any distribution of this project could make all the experimental instrumentation enabled.

Do we want a single option to enable all potential experimental features such us experimental detectors, experimental exporters, OTel Profiling and not just instrumentations?

I'm open to other ideas. It was more of a general proposal.

What design do you think would work best?

pellared · 2024-04-16T18:57:10Z

My personal preference is still to have experimental features as opt-in, but I think users should be able to enable all of them in a easy way.

Maybe WithExperimental option and OTEL_GO_AUTO_EXPERIMENTAL env var (defaults to false)?

pellared · 2024-04-18T10:20:01Z

I have talked with our Product Manager and I am putting the summary below.

We SHOULD have experimental features/instrumentation disabled by default.

The documentation MUST say which instrumentations produce stable telemetry and which are experimental.

We MUST have env var toggles for each feature/instrumentation so that users can disable them. E.g. https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/blob/main/docs/config.md#instrumentations

We SHOULD have an env var toggle (we can add it even post-GA or someone explicitly asks for it) to control whether by default experimental features should be enabled or not. E.g. OTEL_GO_AUTO_EXPERIMENTAL which defaults to false.

Modified after #785 (comment)

MrAlias · 2024-04-18T14:46:59Z

I still think the stability guarantees we are going to have to provide on unstable instrumentation out-weigh the benefit to the uneducated user wanting to see lights light up.

We can always change the opt-in prior to the GA if we find users desire different behavior. Going the other way will cause frustration.

I think we should move ahead with the opt-in approach and have it so distros are enable to make a different decision.

akubik-splunk · 2024-04-23T14:31:30Z

Observations from introducing experimental features to commercial portfolio

If it is on by default customers will expect it should be tested, relatively safe and supported - if it isn't they escalate and not always want to listen to reason
If it is on by default and works for sometime customers become dependent on it and when it changes later priori GA they escalate and claim impact on their experience

In my opinion it should be opt-in with ability to enable all instrumentations or experimental items. It is clear then that this was a decision of the user (conscious or not).
Even if that approach does not give them all the fireworks it eventually gives better experience in adoption and business outcomes

MrAlias · 2024-10-22T17:16:26Z

The project goals are to head towards custom probes: #1105

Adding this would be a step in an alternate direction. Closing as we do not plan to go this direction anymore.

MrAlias added the enhancement New feature or request label Apr 14, 2024

MrAlias added this to Go Auto Instrumentation: Beta Apr 14, 2024

github-project-automation bot moved this to Todo in Go Auto Instrumentation: Beta Apr 14, 2024

MrAlias mentioned this issue Apr 23, 2024

Add feature gate for unstable/experimental libraries #776

Closed

MrAlias closed this as not planned Won't fix, can't repro, duplicate, stale Oct 22, 2024

github-project-automation bot moved this from Todo to Done in Go Auto Instrumentation: Beta Oct 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add an opt-in environment variable for experimental instrumentation #785

Add an opt-in environment variable for experimental instrumentation #785

MrAlias commented Apr 14, 2024

MrAlias commented Apr 14, 2024

damemi commented Apr 15, 2024

MrAlias commented Apr 15, 2024

damemi commented Apr 15, 2024

edeNFed commented Apr 16, 2024

pellared commented Apr 16, 2024 •

edited

Loading

damemi commented Apr 16, 2024

MrAlias commented Apr 16, 2024

MrAlias commented Apr 16, 2024

pellared commented Apr 16, 2024 •

edited

Loading

MrAlias commented Apr 16, 2024

pellared commented Apr 16, 2024

MrAlias commented Apr 16, 2024

pellared commented Apr 16, 2024 •

edited

Loading

MrAlias commented Apr 16, 2024

edeNFed commented Apr 16, 2024

pellared commented Apr 16, 2024

MrAlias commented Apr 16, 2024

pellared commented Apr 16, 2024

pellared commented Apr 18, 2024 •

edited

Loading

MrAlias commented Apr 18, 2024 •

edited

Loading

akubik-splunk commented Apr 23, 2024

MrAlias commented Oct 22, 2024

Add an opt-in environment variable for experimental instrumentation #785

Add an opt-in environment variable for experimental instrumentation #785

Comments

MrAlias commented Apr 14, 2024

Alternatives considered

MrAlias commented Apr 14, 2024

damemi commented Apr 15, 2024

MrAlias commented Apr 15, 2024

damemi commented Apr 15, 2024

edeNFed commented Apr 16, 2024

pellared commented Apr 16, 2024 • edited Loading

damemi commented Apr 16, 2024

MrAlias commented Apr 16, 2024

MrAlias commented Apr 16, 2024

pellared commented Apr 16, 2024 • edited Loading

MrAlias commented Apr 16, 2024

pellared commented Apr 16, 2024

MrAlias commented Apr 16, 2024

pellared commented Apr 16, 2024 • edited Loading

MrAlias commented Apr 16, 2024

edeNFed commented Apr 16, 2024

pellared commented Apr 16, 2024

MrAlias commented Apr 16, 2024

pellared commented Apr 16, 2024

pellared commented Apr 18, 2024 • edited Loading

MrAlias commented Apr 18, 2024 • edited Loading

akubik-splunk commented Apr 23, 2024

MrAlias commented Oct 22, 2024

pellared commented Apr 16, 2024 •

edited

Loading

pellared commented Apr 16, 2024 •

edited

Loading

pellared commented Apr 16, 2024 •

edited

Loading

pellared commented Apr 18, 2024 •

edited

Loading

MrAlias commented Apr 18, 2024 •

edited

Loading