Refactor for env vars #69
Conversation
If I understand correctly, the motivation for replacing the array under the tracer provider's span processors is that accessing arrays by index causes difficulties.

This is replaced by `processor_pipeline`, which is ... an array.

Why would accessing elements in the `processor_pipeline` array be any easier than accessing elements in the processors array? What is gained by this model change?

For example, if during a merge a user wants to add one more processor to send traces to the console for debugging, without knowing what processors are already defined, why would the syntax to add a `simple/console` entry under the pipeline be better than the syntax to add a new simple/exporter/console processor under `processors`?
Arrays of strings are more manageable than arrays of objects. If you have an array of strings, you can say that it can be overridden via an environment variable whose value is comma separated. In the example of adding a new console entry, you might have:

And set the following environment variables:

Which would result in:

This assumes the rules are such that if an environment variable references a structure that does not exist, you create that structure. I.e., supporting arrays of objects requires supporting the more complicated env var syntax discussed, which has the types of sharp edges described in this document.
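As a sketch of that rule (the env var name follows the scheme discussed in this PR, but the merge behavior and processor names here are illustrative, not normative), a comma-separated value can simply replace the string array, creating the enclosing structure if needed:

```python
import os

def apply_pipeline_override(config, env=None):
    """Replace the tracer provider's processor_pipeline (an array of
    strings) with the comma-separated value of an env var, creating the
    enclosing structure if it does not exist yet."""
    env = os.environ if env is None else env
    raw = env.get("OTEL_CONF_TRACER_PROVIDER__PROCESSOR_PIPELINE")
    if raw is not None:
        tracer_provider = config.setdefault("tracer_provider", {})
        tracer_provider["processor_pipeline"] = [s.strip() for s in raw.split(",")]
    return config

config = {"tracer_provider": {"processor_pipeline": ["batch/otlp"]}}
env = {"OTEL_CONF_TRACER_PROVIDER__PROCESSOR_PIPELINE": "batch/otlp,simple/console"}
print(apply_pipeline_override(config, env)["tracer_provider"]["processor_pipeline"])
# ['batch/otlp', 'simple/console']
```

Because entries are plain strings, no nested structure ever needs to be encoded in the env var value.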
```yaml
exporter:
  otlp:
    endpoint: http://remote-host:4317
processor_pipeline: [batch/local, batch/remote]
```
why is `processor_pipeline` needed?
Versus just relying on the map of `.tracer_provider.processors`?
Order matters with processors, and there's no guarantee that libraries used to parse YAML will provide access to map entries in the order they appear in the YAML. In Java, I believe SnakeYAML uses LinkedHashMap internally to represent objects so that order is retained, but there's no guarantee this is true in other implementations.
Additionally, the processor pipeline gives an easy way to disable a processor via an env variable without having to support some special env var syntax to "erase" an object. I.e., to remove `batch/remote`, set `OTEL_CONF_TRACER_PROVIDER__PROCESSOR_PIPELINE=batch/local`.
```yaml
disabled: false

# Configure general attribute limits. See also tracer_provider.limits, logger_provider.limits.
attribute_limits:
  # Configure max attribute value size.
  #
  # Environment variable: OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT
  # Environment variable: OTEL_CONF_ATTRIBUTE_LIMITS__ATTRIBUTE_VALUE_LENGTH_LIMIT
```
do you think the `_CONF_` portion is necessary?
It seemed like a good way to clearly distinguish this new environment variable schema from the old one. Not strictly required, but the common prefix catches the eye and there is no mistaking which scheme it refers to.

We should, of course, weigh this against the characters it adds to every single environment variable.
I'm personally in favor of dropping it if the plan is for these to be the (one set of) recommended env vars going forward
```yaml
protocol: http/protobuf
# Configure endpoint.
#
# Environment variable: OTEL_CONF_LOGGER_PROVIDER__PROCESSORS__BATCH__EXPORTER__OTLP__ENDPOINT
```
I don't love the double underscores. Spring has a nice mapping, which would instead result in:

`OTEL_CONF_LOGGERPROVIDER_PROCESSORS_BATCH_EXPORTER_OTLP_ENDPOINT`

I understand the ambiguity that brings tho...
> I understand the ambiguity that brings tho...

To make this concrete: suppose we want to change the `max_queue_size` of the batch span processor given the following config:
```yaml
tracer_provider:
  processors:
    batch:
      exporter:
        otlp:
          endpoint: http://endpoint1
  processor_pipeline: [batch]
```
If we use Spring's mapping, we'd set something like:

`OTEL_CONF_TRACERPROVIDER_PROCESSORS_BATCH_MAXQUEUESIZE=10`
In order to apply this override, we have to know the structure of the schema ahead of time. If we don't, and just blindly try to update the raw YAML, we'd get:
```yaml
tracer_provider:
  processors:
    batch:
      exporter:
        otlp:
          endpoint: http://endpoint1
  processor_pipeline: [batch]
tracerprovider:
  processors:
    batch:
      maxqueuesize: 10
```
In this case, we know that the batch processor has a property called `max_queue_size`. So when evaluating, we can take the position of this property, `.tracer_provider.processors.batch.max_queue_size`, apply a generic transformation yielding `OTEL_CONF_TRACERPROVIDER_PROCESSORS_BATCH_MAXQUEUESIZE`, and check whether any matching environment variable is set.
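That generic transformation can be sketched in a few lines (the `OTEL_CONF_` prefix and path syntax follow the examples in this thread; the helper name is made up):

```python
def spring_style_env_name(path):
    """Map a config path such as '.tracer_provider.processors.batch.max_queue_size'
    to a Spring-style env var name: underscores within each key are dropped,
    keys are uppercased and joined with single underscores."""
    keys = path.strip(".").split(".")
    return "OTEL_CONF_" + "_".join(k.replace("_", "").upper() for k in keys)

print(spring_style_env_name(".tracer_provider.processors.batch.max_queue_size"))
# OTEL_CONF_TRACERPROVIDER_PROCESSORS_BATCH_MAXQUEUESIZE
```

Note that dropping the underscores within each key is lossy: the original key names cannot be recovered from the env var name alone.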
But we don't know the schema ahead of time for custom components. Consider a custom span exporter named `foo` with a property called `my_property`:
```yaml
tracer_provider:
  processors:
    batch:
      exporter:
        foo:
          my_property: http://endpoint1
```
Because we don't know the schema of the foo exporter ahead of time, we have no way to tell whether `OTEL_TRACERPROVIDER_PROCESSORS_BATCH_EXPORTER_FOO_MYPROPERTY` should target `my_property` or `myproperty`.
Prior art that uses the `__` separator includes Python's Dynaconf and .NET Core.
```yaml
batch:
  # Configure delay interval (in milliseconds) between two consecutive exports.
  #
  # Environment variable: OTEL_CONF_TRACER_PROVIDER__PROCESSORS__BATCH__SCHEDULE_DELAY
```
wdyt of shortening the top-level config keys:

- `tracer_provider` -> `traces`
- `meter_provider` -> `metrics`
- `logger_provider` -> `logs`

or, if we think that's too ambiguous (e.g. tracing instrumentation could be considered part of `traces`), maybe having a top-level `sdk` config node, in which case the resulting env vars would start with something like:

- `sdk_traces_`
- `sdk_metrics_`
- `sdk_logs_`
The competing concerns with shortening top-level keys are that on one hand we want shorter env vars, but on the other hand we want the config to mirror the SDK concepts of the specification. I.e., renaming `tracer_provider` to `traces` blurs the fact that you're configuring a tracer provider.
> or if we think that's too ambiguous (e.g. tracing instrumentation could be considered part of traces), maybe having a top-level sdk config node, then the resulting env vars would be something like:

There are other top-level attributes as well, like `.attribute_limits`, `.disabled`, and `.resource`. The `sdk` prefix disambiguates, but becomes repetitive when applied to all the properties. It begs the question of why not just nest all the top-level properties under `.sdk` (i.e. `.sdk.tracer_provider`, `.sdk.meter_provider`, or `.sdk.traces`, `.sdk.metrics`).
The instrumentation question is interesting because it's not clear to what extent instrumentation config will be normalized across languages, and to what extent we should even try (if the long-term vision is instrumentation moving to native, we can't force libraries to follow standard configuration).

One answer is to leave all the SDK config top level and say that all instrumentation config is nested under `.instrumentation`, or something to that effect.
```diff
@@ -8,35 +8,36 @@ exporters:
     certificate: /app/cert.pem
     client_key: /app/cert.pem
     client_certificate: /app/cert.pem
-    headers:
-      api-key: !!str 1234
+    headers: "api-key=1234"
```
I have some concerns with this. Will it be possible later to not have the value appear in the config.yaml / environment variable content, and have a YAML node that represents a secret instead?
Sorry I'm not grasping the question. Can you elaborate?
When some fields are secrets, like a password, or here an auth token:
```yaml
headers:
  foo: 12
  bar: 24
  api-key: actual_auth_token  # <-- visible in plain view
```
The model can be changed to separate key names and key values:
```yaml
headers:
  - key:
      name: foo
      value: 12
  - key:
      name: bar
      value: 24
  - key:
      name: api-key
      value: actual_auth_token  # <-- still visible in plain view
```
and then, a key value can be represented by an alternative node, which is a named proxy:
```yaml
headers:
  - key:
      name: foo
      value: 12
  - key:
      name: bar
      value: 24
  - key:
      name: api-key
      secret:
        secret_name: "vendor_auth_token"  # <-- nothing disclosed here
```
The secret name, "vendor_auth_token", is just a name and not the actual value. The actual value (actual_auth_token) is retrieved from a secure location, referenced by name (vendor_auth_token).

The concern is that serializing key/value pairs in a comma-delimited string prevents this.
Ah, I see. And that syntax is what is used in Kubernetes, so there's definitely (popular) prior art for this type of syntax.

A counter argument is that `OTEL_EXPORTER_OTLP_HEADERS` has been around for a while and I haven't heard many (any?) issues with the fact that it's a comma separated list of key value pairs. This design is essentially saying that because a comma separated list was "good enough" for `OTEL_EXPORTER_OTLP_HEADERS`, it's probably good enough for file configuration too. The trade off of course is expressiveness vs. ease of overriding w/ environment variables. The syntax you give above is highly expressive but not easily overridden with environment variables. Treating headers as a flattened comma separated list is not as expressive but is really easy to override with environment variables for most cases.
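For illustration, parsing the flattened form takes only a few lines; this sketch assumes the same comma-separated `key=value` format that `OTEL_EXPORTER_OTLP_HEADERS` uses:

```python
def parse_headers(raw):
    """Parse 'k1=v1,k2=v2' into a dict; split on the first '=' only,
    so values may themselves contain '=' (e.g. base64 padding)."""
    headers = {}
    for pair in raw.split(","):
        if not pair.strip():
            continue
        key, _, value = pair.partition("=")
        headers[key.strip()] = value.strip()
    return headers

print(parse_headers("api-key=1234,tenant=abc"))
# {'api-key': '1234', 'tenant': 'abc'}
```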
In a world pre `config.yaml`, the application configures the SDK by writing explicit code. When doing so, some configuration can be done with the help of environment variables, or by ignoring environment variables and doing everything programmatically, or both.

Disclosure: for a major app at work, not a single environment variable is used; everything is done programmatically, with the configuration data found "elsewhere".
Now, in a setup using `config.yaml`, the main advantage of unifying all the configuration in a YAML document is that the entire configuration can change without touching a single line of code in the application.

Having auth tokens in a key/value delimited string in an environment variable (for people willing to use this), or in code (for people not willing), was workable before, because the app code was ad hoc anyway. In a world where everything is unified in a YAML config file, parts which stand out for specific reasons (as in, "I would like all the config in YAML ... except the sensitive part") are more likely to cause friction, compared to pre-config.yaml usage. What was good enough before may not be good enough now.

If there is a choice to make between config.yaml and security, it is already made: config.yaml will not be used, period. So, all the work on configuration in general should have provisions to avoid these pitfalls.
As a side note, using key/value pairs in comma-delimited strings (this PR) also broke the use case involving variable substitution:

```yaml
headers:
  foo: 12
  bar: 34
  api-key: ${MY_VENDOR_TOKEN}
```

This setup no longer works.
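For context, the substitution step itself is simple to describe; a sketch of how `${VAR}` references could be expanded in a scalar value before the config is consumed (the regex and fallback behavior here are illustrative, not what the spec defines):

```python
import os
import re

_VAR = re.compile(r"\$\{(\w+)\}")

def substitute(value, env=None):
    """Replace ${NAME} references in a scalar with environment variable
    values; unknown names are left untouched in this sketch."""
    env = os.environ if env is None else env
    return _VAR.sub(lambda m: env.get(m.group(1), m.group(0)), value)

env = {"MY_VENDOR_TOKEN": "actual_auth_token"}
print(substitute("api-key=${MY_VENDOR_TOKEN},foo=12", env))
# api-key=actual_auth_token,foo=12
```

Whether substitution runs before or after the comma-delimited string is parsed is exactly the kind of detail the flattened format forces the spec to pin down.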
```diff
@@ -34,7 +25,7 @@
       "type": "string"
     },
     "headers": {
-      "$ref": "#/$defs/Headers"
+      "type": "string"
```
Is it possible for the schema to support a union type? (i.e., this value can be either a string or a list)

If so, the user could define their YAML as

```yaml
headers: "header1, header2"
```

(which would work well for env variable substitution) or

```yaml
headers:
  - header1
  - header2
```
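JSON Schema does support unions via `oneOf`; a hypothetical sketch of what the `headers` definition might look like (property names beyond `headers` are assumptions, not taken from this PR's schema):

```json
"headers": {
  "oneOf": [
    { "type": "string" },
    { "type": "array", "items": { "type": "string" } }
  ]
}
```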
Refactor the schema to be more friendly to an environment variable override scheme derived from the model names.
An example of such a scheme is described in this document.