
Ensure that file config has solution for platforms which contribute to config #3966

Closed
jack-berg opened this issue Mar 28, 2024 · 6 comments · Fixed by open-telemetry/opentelemetry-configuration#106
Labels
area:configuration Related to configuring the SDK triage:accepted:ready-with-sponsor Ready to be implemented and has a specification sponsor assigned

Comments

@jack-berg
Member

As discussed here, we want to ensure that platforms (Azure functions, otel operator, AWS lambda) which want to contribute to configuration have a mechanism to do so.

Comments alluding to limitations of these systems (i.e. some may not be able to read a user's configuration file, some may not be able to write to the file system, etc.) are scattered across #3948 and #3752. We should use this ticket to centralize requirements gathering. In particular:

  • What are the limitations of the systems that wish to contribute to config?
  • What types of config do these systems need to contribute?

Several solutions have been proposed informally, each with tradeoffs and limitations. Hopefully a clear winner emerges once we have a clear idea of the requirements.

The file config working group should prioritize this work, as accommodating these scenarios will be key to adoption and proving the validity of file config.

@jsuereth
Contributor

jsuereth commented Apr 4, 2024

Sadly I only had a few minutes to update this bug with some use cases, but here are the first two:

Provide OTLP endpoints based on runtime environment.

Ideally a platform could "cooperate" with config authors by providing known values for available OTLP endpoints to SDKs.

E.g. imagine the developer only has to write one default configuration for use in these environments:

  • When running in production: the OTEL Operator providing knowledge to SDKs of exposed endpoints instead of requiring manual configuration of those endpoints.
  • When running in integration tests, the integration test framework specifies OTLP endpoints to collect data for enforcing semantic conventions.

While we don't expect every configuration to engage with this, and we expect developer needs to grow beyond simple OTLP injection, I believe that use case is already covered via the configuration file and future OpAMP work. However, simple "it works OOTB" behavior, e.g. for the integration test framework or OTel Operator use cases, would greatly augment / simplify things for developers.

Provide Resource attributes from runtime environment

Today, it's somewhat expensive to get consistent resource attributes in SDKs. To avoid conflicting with other platforms or k8s operators, you need to provide a ResourceDetector per language across all languages in OpenTelemetry and hope users configure it. A more consistent approach would be an environment variable (or set of variables) that composes well and allows platforms to inject known attributes for OpenTelemetry users.

  • Imagine default k8s env variables that all otel users could enjoy, rather than being forced to use the downward API or have the otel-operator rewrite their podspecs?
  • Imagine all cloud providers placing OTEL ids and lookups in all managed platforms

@jack-berg
Member Author

When running in production: the OTEL Operator providing knowledge to SDKs of exposed endpoints instead of requiring manual configuration of those endpoints.

I think the operator use case is already accommodated well with only env var substitution. The operator installs a variety of software, but the main things are the collector and language auto instrumentation. For the collector, the guidance straight from the docs is to include a chunk of configuration YAML in the spec definition, with the expectation that the user modifies the YAML to suit their needs.

For auto instrumentation, the operator docs state the expectation that the OTLP endpoint will automatically be set to http://otel-collector:4317, corresponding to the collector instance you presumably also set up with the operator. With file config available, the operator should take auto instrumentation installation in the same direction as the collector, and update the installation snippet to include a default chunk of file config YAML to configure the SDK which the user can customize to suit. The default chunk of YAML can include default OTLP endpoints which point to http://otel-collector:4317, which sets the user expectation that if they change it, they're doing so knowingly. Or, the default chunk of YAML can include an env var substitution reference to ${OTEL_EXPORTER_OTLP_ENDPOINT} which the operator can set automatically, and which the user can reliably reference anywhere in their config file they wish to configure OTLP.
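For illustration only, such a default chunk might look roughly like the sketch below (property names loosely follow the file config schema and may differ by schema version; the service name value and the in-cluster endpoint are placeholders based on the http://otel-collector:4317 example above):

```yaml
# Sketch of a default SDK config chunk the operator could ship; users edit it to suit.
file_format: "0.1"   # version string illustrative
resource:
  attributes:
    service.name: my-service   # placeholder
tracer_provider:
  processors:
    - batch:
        exporter:
          otlp:
            # Reference the env var the operator sets automatically,
            # falling back to the in-cluster collector service.
            endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT:-http://otel-collector:4317}
```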

This is a good user experience. It results in more symmetry between auto instrumentation and collector workflows with the operator, and allows the user to have much more control over SDK configuration than is available today. The workflow today requires users to modify a k8s YAML spec to modify env vars for config changes. With file config, config changes are also accomplished by modifying a k8s YAML spec, which includes an embedded YAML config snippet. There is no degradation of experience for the user.


When running in integration tests, the integration test framework specifies OTLP endpoints to collect data for enforcing semantic conventions.

I don't see what about this requires the integration test framework to specify OTLP endpoints via env vars versus a config file. If anything, env vars are more difficult to set in testing frameworks, since there are often limitations on a test runtime modifying env vars and restoring state after test completion. It's quite common for integration tests which spin up infrastructure to include references to config files for their dependencies. For example, in otel java we have an integration test which round-trip verifies otel java SDK -> collector -> java process, and we include a collector configuration reference here.


you need to provide a ResourceDetector per language across all languages in OpenTelemetry and hope users configure it. A more consistent approach would be an environment variable (or set of variables) that composes well and allows platforms to inject known attributes for OpenTelemetry users.

I'd like to think that this is the crux of the issue. Similar to how GitHub Actions has default environment variables which are well documented and which a user can choose to include in their script, I expect that platforms can document that common config options like the preferred OTLP endpoint are exposed as well-known environment variables for the user to reference with env var substitution. This is a good solution because it allows users to easily use platform config without requiring it. Granting platforms the power to force OpenTelemetry configuration which cannot be opted out of would not be desirable.

The problem is that the key/value nature of resource attributes doesn't map well to env var substitution, and so we need to find a solution for this.

We talked about this topic in the 4/15/24 config SIG, and I proposed that we limit the scope of how platforms can influence configuration to only what is strictly required given constraints in those systems. If a platform can accomplish its objectives for influencing config by reading / writing config file content, it should do so. It's desirable for platforms to make standard information like the OTLP endpoint available through environment variables and document that users can / should reference those in config files.

I propose we do not introduce a generic mechanism for platforms to influence config. Instead, we evaluate requirements on a case-by-case basis, and introduce case-specific solutions as needed. The litmus test for whether a case requires special treatment might be:

  • If the platform has restrictions that prevent reading / writing / modifying a user's config file
  • AND the configuration cannot be accomplished via env var substitution from env vars made available by the platform for optional use
  • AND there is evidence that the problem is not unique to a single platform
  • THEN we consider introducing a special accommodation.

So far, the only special case that I'm aware of is the platform contribution of resource attributes. To solve, we might:

  • State that file config merges attributes from OTEL_RESOURCE_ATTRIBUTES
  • Introduce a new env var specifically for this purpose, i.e. OTEL_CONFIG_FILE_PLATFORM_RESOURCE_ATTRIBUTES. This env var would be ignored outside of file config. Platforms would use it to provide resource attributes without needing to modify config file contents.
  • Introduce a property in the file config schema with semantics which match OTEL_RESOURCE_ATTRIBUTES, so that env var substitution can be used:
```yaml
resource:
  attributes:
    service.name: foo
    other-key: bar
  # A comma separated list of key value pairs which is merged with .resource.attributes. E.g. key1=value,key2=value.
  attribute_key_value_list: ${OTEL_RESOURCE_ATTRIBUTES}
```
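As a usage sketch (values hypothetical), if a platform exported OTEL_RESOURCE_ATTRIBUTES=cloud.provider=aws,cloud.region=us-east-1, the snippet above would resolve via env var substitution to:

```yaml
resource:
  attributes:
    service.name: foo
    other-key: bar
  # Substituted text; parsed as key=value pairs and merged with .resource.attributes.
  attribute_key_value_list: cloud.provider=aws,cloud.region=us-east-1
```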

WDYT? Are there other special cases to consider which pass the litmus test I propose?

@jack-berg
Member Author

cc @zeitlinger

@zeitlinger
Member

WDYT?

Looks great 💯

Option 3

attribute_key_value_list: ${OTEL_RESOURCE_ATTRIBUTES}

This explicit option aligns best with the current design of config files in general, so I'm in favor of this option.

IIUC, a more realistic use case would be attribute_key_value_list: ${AWS_OTEL_RESOURCE_ATTRIBUTES} - which would conform to the requirement you put forward (and which I agree to):

Granting platforms the power to force opentelemetry configuration which cannot be opted out of would not be desirable.

What would happen if AWS_OTEL_RESOURCE_ATTRIBUTES is not present? Will it simply fall back to an empty list, or would users have to mark the env var as optional, e.g. using ${AWS_OTEL_RESOURCE_ATTRIBUTES:-}?
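For reference, the two variants being asked about would look like this in a config file (AWS_OTEL_RESOURCE_ATTRIBUTES is the hypothetical platform variable from above):

```yaml
resource:
  # Without a fallback; behavior when the env var is unset is the open question above.
  attribute_key_value_list: ${AWS_OTEL_RESOURCE_ATTRIBUTES}
  # Explicitly optional form, substituting an empty string when unset
  # (commented out only to avoid a duplicate YAML key in this sketch):
  # attribute_key_value_list: ${AWS_OTEL_RESOURCE_ATTRIBUTES:-}
```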

Other special cases

Are there other special cases to consider which pass the litmus test I propose?

OTEL_EXPORTER_OTLP_HEADERS is important if the platform is responsible for managing a fleet of collectors - which I've seen in the real world.
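Presumably that case could be handled the same way: the platform documents a well-known env var and the user references it from the config file, e.g. (a sketch only; the exact property name for exporter headers in the file config schema is an assumption here):

```yaml
tracer_provider:
  processors:
    - batch:
        exporter:
          otlp:
            endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT:-http://localhost:4317}
            # Property name illustrative; value uses the same comma separated
            # key=value format as the OTEL_EXPORTER_OTLP_HEADERS env var.
            headers_list: ${OTEL_EXPORTER_OTLP_HEADERS:-}
```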

@jsuereth
Contributor

A bit late:

When running in production: the OTEL Operator providing knowledge to SDKs of exposed endpoints instead of requiring manual configuration of those endpoints.

I think the operator use case is already accommodated well with only env var substitution. The operator installs a variety of software, but the main things are the collector and language auto instrumentation. For the collector, the guidance straight from the docs is to include a chunk of configuration YAML in the spec definition, with the expectation that the user modifies the YAML to suit their needs.

For auto instrumentation, the operator docs state the expectation that the OTLP endpoint will automatically be set to http://otel-collector:4317, corresponding to the collector instance you presumably also set up with the operator. With file config available, the operator should take auto instrumentation installation in the same direction as the collector, and update the installation snippet to include a default chunk of file config YAML to configure the SDK which the user can customize to suit. The default chunk of YAML can include default OTLP endpoints which point to http://otel-collector:4317, which sets the user expectation that if they change it, they're doing so knowingly. Or, the default chunk of YAML can include an env var substitution reference to ${OTEL_EXPORTER_OTLP_ENDPOINT} which the operator can set automatically, and which the user can reliably reference anywhere in their config file they wish to configure OTLP.

This is a good user experience. It results in more symmetry between auto instrumentation and collector workflows with the operator, and allows the user to have much more control over SDK configuration than is available today. The workflow today requires users to modify a k8s YAML spec to modify env vars for config changes. With file config, config changes are also accomplished by modifying a k8s YAML spec, which includes an embedded YAML config snippet. There is no degradation of experience for the user.

The issue here is that it neglects a lot of common security / namespace / scaling issues in k8s. What if the operator needs to run more than one collector service, e.g. running both a deployment and a daemonset to deal with the scalability issues of daemonsets? In this instance, you'd like to configure the endpoint differently. While I very much appreciate having a default endpoint, and that enables a lot of use cases, I still think users having a "no-code" or "no-deep-code-understanding" mechanism for altering the endpoint would be ideal.

If it turns out Ops / Platform teams are comfortable modifying complex otel configuration, then I'd be happy to be wrong, but I still think this is a valid use case that should have a solution.

@jack-berg
Member Author

FYI, I have a PR which would resolve this issue here: open-telemetry/opentelemetry-configuration#106

This reflects my comment here.
