Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an opt-in environment variable for experimental instrumentation #785

Closed
MrAlias opened this issue Apr 14, 2024 · 23 comments
Closed

Add an opt-in environment variable for experimental instrumentation #785

MrAlias opened this issue Apr 14, 2024 · 23 comments
Labels
enhancement New feature or request

Comments

@MrAlias
Copy link
Contributor

MrAlias commented Apr 14, 2024

Instrumentation for packages that do not have a stable output format yet (i.e. semantic conventions are not stable) should not be on by default. It leads users into a false sense of stability.

Instead, this instrumentation should be enabled only if a user sets an environment variable to explicitly opt-in. The environment variable should be prefixed with something like OTEL_GO_EXPERIMENTAL*, OTEL_GO_EXP*, or OTEL_GO_X*. This will better communicate to users the experimental and unstable nature of the instrumentation they will use.

This needs to be accompanied by documentation so a user, when their interest is piqued by the environment variable, can understand the implications of using these experimental instrumentation.

Alternatives considered

  • Provide a migration path for unstable instrumentation as it changes
  • Only document the lack of stability for instrumentation
@MrAlias
Copy link
Contributor Author

MrAlias commented Apr 14, 2024

This will include instrumentation like kafka and database.

@damemi
Copy link
Contributor

damemi commented Apr 15, 2024

Would add that a clear definition of what falls under the experimental flag to help us and users. I think that includes:

  • Semconv stability
  • Stability of the library itself
  • Stability of the library's api or underlying types

Basically if any of these aren't stable, the library would fall under our experimental definition. Or we could discuss which of these need to be met if not all 3

@MrAlias
Copy link
Contributor Author

MrAlias commented Apr 15, 2024

I agree that the semantic conventions stability is a hard requirement for something to not fall under the experimental flag.

As for the API and package stability that may be a bit harder. We already depend on non-exported types to instrument packages. Including instrumentation for net/http. There are not guarantees, even for stable packages, that these will not change. If we require stable APIs for our stability we will need to re-evaluate how we instrument things or have very little stability.

That said, I can see an argument for requiring stable packages (instead of APIs). This would help our developer burden with minimized churn as the package evolves.

@damemi
Copy link
Contributor

damemi commented Apr 15, 2024

@MrAlias totally agree, I was just proposing those 3 requirements. I think ultimately if it's not user facing, then it doesn't have to be a requirement for our stability level. If the underlying library changes and breaks us on main, we'll still have compatibility versioning for our published releases.

@edeNFed
Copy link
Contributor

edeNFed commented Apr 16, 2024

Is this feature implemented in other auto instrumentations as well? If I remember correctly, Java/NodeJS auto instrumentations generate spans for Kafka/databases by default even though the semantic conventions are not stable yet.

@pellared
Copy link
Member

pellared commented Apr 16, 2024

If I remember correctly, Java/NodeJS auto instrumentations generate spans for Kafka/databases by default even though the semantic conventions are not stable yet.

Same for .NET. The feedback we got is that all of the automatic instrumentation users prefer experimental features enabled by default. We only mark that something is experimental via docs. See: https://opentelemetry.io/docs/languages/net/automatic/config/.
We have an issue to allow changing the default behavior, but even after one year no user asked for it open-telemetry/opentelemetry-dotnet-instrumentation#2439.

@damemi
Copy link
Contributor

damemi commented Apr 16, 2024

That's a good point @edeNFed @pellared, I didn't think to consider how other auto-instrumentation agents handle this. I think not deviating from the precedent set by other languages (along with the demonstrated user preference) makes a compelling case to not do this by default.

@MrAlias what do you think? I don't think we talked about how other languages handle this in the sig call last week

@MrAlias
Copy link
Contributor Author

MrAlias commented Apr 16, 2024

Is there data showing how users feel when things break? I imagine there are many users that want it on by default, but is there any data showing how do user's feel when those instrumentation libraries change their output and break their dashboards/alerts?

I would like examples of users not minding that their observability systems break due those kinds of changes. I have many examples of the opposite where users are not happy with these kinds of changes.

There are also observability providers who have voiced not wanting to impact customers with breaking changes once adoption becomes large enough. Whether it is stable or not. Which means things become de-facto stable even though they are on unstable semantic conventions.

@MrAlias
Copy link
Contributor Author

MrAlias commented Apr 16, 2024

I'm not opposed to following other languages. But I'm skeptical that they have seen the fallout from the negative impact making this choice has caused yet.

@pellared
Copy link
Member

pellared commented Apr 16, 2024

I imagine there are many users that want it on by default, but is there any data showing how do user's feel when those instrumentation libraries change their output and break their dashboards/alerts?

I was expecting the same in .NET thus I proposed open-telemetry/opentelemetry-dotnet-instrumentation#2439. Surprisingly so far no user really asked for it.

@MrAlias
Copy link
Contributor Author

MrAlias commented Apr 16, 2024

I imagine there are many users that want it on by default, but is there any data showing how do user's feel when those instrumentation libraries change their output and break their dashboards/alerts?

I was expecting the same in .NET thus I proposed open-telemetry/opentelemetry-dotnet-instrumentation#2439. Surprisingly so far no user really asked for it.

@pellared have you introduced breaking changes to the telemetry output in any instrumentation?

@pellared
Copy link
Member

I see that Java released 2.0 when they changed HTTP sem conv: https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/tag/v2.0.0

@MrAlias
Copy link
Contributor Author

MrAlias commented Apr 16, 2024

I see that Java released 2.0 when they changed HTTP sem conv: https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/tag/v2.0.0

Right, they went through the migration for that though. That may not be relevant data given that was not a hard breaking change like the ones we will implement.

As mentioned in the description, migration is also something we discussed. But we also understood that will be a lot of extra development work to migrate all changes while the semantic conventions are not stable.

@pellared
Copy link
Member

pellared commented Apr 16, 2024

@pellared have you introduced breaking changes to the telemetry output in any instrumentation?

As far as I remember we the HTTP instrumentation had breaking changes in https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/releases/tag/v1.3.0

The upcoming release is going to have another one because of https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry.Instrumentation.Http/CHANGELOG.md#181

EDIT: I do not say that we should do the same for Go Automatic Instrumentation. Just feeding us with input.

@MrAlias
Copy link
Contributor Author

MrAlias commented Apr 16, 2024

@edeNFed would it be helpful to expose this as an option as well? Something like WithAllInstrumentation?

That way any distribution of this project could make all the experimental instrumentation enabled.

@edeNFed
Copy link
Contributor

edeNFed commented Apr 16, 2024

@edeNFed would it be helpful to expose this as an option as well? Something like WithAllInstrumentation?

That way any distribution of this project could make all the experimental instrumentation enabled.

Sounds good to me

@pellared
Copy link
Member

@edeNFed would it be helpful to expose this as an option as well? Something like WithAllInstrumentation?

That way any distribution of this project could make all the experimental instrumentation enabled.

Do we want a single option to enable all potential experimental features such us experimental detectors, experimental exporters, OTel Profiling and not just instrumentations?

@MrAlias
Copy link
Contributor Author

MrAlias commented Apr 16, 2024

@edeNFed would it be helpful to expose this as an option as well? Something like WithAllInstrumentation?
That way any distribution of this project could make all the experimental instrumentation enabled.

Do we want a single option to enable all potential experimental features such us experimental detectors, experimental exporters, OTel Profiling and not just instrumentations?

I'm open to other ideas. It was more of a general proposal.

What design do you think would work best?

@pellared
Copy link
Member

My personal preference is still to have experimental features as opt-in, but I think users should be able to enable all of them in a easy way.

Maybe WithExperimental option and OTEL_GO_AUTO_EXPERIMENTAL env var (defaults to false)?

@pellared
Copy link
Member

pellared commented Apr 18, 2024

I have talked with our Product Manager and I am putting the summary below.

We SHOULD have experimental features/instrumentation disabled by default.

The documentation MUST say which instrumentations produce stable telemetry and which are experimental.

We MUST have env var toggles for each feature/instrumentation so that users can disable them. E.g. https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/blob/main/docs/config.md#instrumentations

We SHOULD have an env var toggle (we can add it even post-GA or someone explicitly asks for it) to control whether by default experimental features should be enabled or not. E.g. OTEL_GO_AUTO_EXPERIMENTAL which defaults to false.

Modified after #785 (comment)

@MrAlias
Copy link
Contributor Author

MrAlias commented Apr 18, 2024

I still think the stability guarantees we are going to have to provide on unstable instrumentation out-weigh the benefit to the uneducated user wanting to see lights light up.

We can always change the opt-in prior to the GA if we find users desire different behavior. Going the other way will cause frustration.

I think we should move ahead with the opt-in approach and have it so distros are enable to make a different decision.

@akubik-splunk
Copy link

Observations from introducing experimental features to commercial portfolio

  1. If it is on by default customers will expect it should be tested, relatively safe and supported - if it isn't they escalate and not always want to listen to reason
  2. If it is on by default and works for sometime customers become dependent on it and when it changes later priori GA they escalate and claim impact on their experience

In my opinion it should be opt-in with ability to enable all instrumentations or experimental items. It is clear then that this was a decision of the user (conscious or not).
Even if that approach does not give them all the fireworks it eventually gives better experience in adoption and business outcomes

@MrAlias
Copy link
Contributor Author

MrAlias commented Oct 22, 2024

The project goals are to head towards custom probes: #1105

Adding this would be a step in an alternate direction. Closing as we do not plan to go this direction anymore.

@MrAlias MrAlias closed this as not planned Won't fix, can't repro, duplicate, stale Oct 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Development

No branches or pull requests

5 participants