Configure multiple providers #5

Open
pellared opened this issue Apr 26, 2023 · 17 comments

Comments

@pellared
Member

pellared commented Apr 26, 2023

It SHOULD be possible to configure multiple trace/meter/logger providers.

Reference https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#tracerprovider

Notwithstanding any global TracerProvider, some applications may want to or
have to use multiple TracerProvider instances,
e.g. to have different configuration (like SpanProcessors) for each
(and consequently for the Tracers obtained from them),
or because it's easier with dependency injection frameworks.
Thus, implementations of TracerProvider SHOULD allow creating an arbitrary
number of TracerProvider instances.

EDIT: Therefore, the schema MUST accept multiple providers.

@jack-berg
Member

Check out opentelemetry-specification#3437, which includes language around this. It uses "MUST", though, which should probably be reduced to "SHOULD" for consistency.

@pellared
Member Author

pellared commented Apr 27, 2023

Just to make it clear. I think that the schema MUST accept multiple providers.

@tsloughter
Member

@pellared why when you can create multiple providers by running multiple configurers?

@pellared
Member Author

pellared commented Apr 27, 2023

@pellared why when you can create multiple providers by running multiple configurers?

I do not know the exact use cases, but I have already heard that some people have, e.g., 2 trace providers. I quickly found one thread here.

I guess the reason is

e.g. to have different configuration (like SpanProcessors) for each provider

I think that it would be more flexible if we had something more or less like

# Configure tracer providers.
tracer_providers: 
  - {}

instead of

# Configure tracer provider.
tracer_provider: {}

Also, I think we would need something that marks a given provider as the one to be set as the "global provider".

@jack-berg
Member

jack-berg commented Apr 27, 2023

Disagree strongly. Multiple tracer providers in one file requires that they be named / identified and that the caller has some way to obtain the instance they want. This workflow is minimally different from simply having separate config files, one per provider, and it sacrifices the user experience of everyone for an esoteric use case.

@pellared
Member Author

pellared commented Apr 27, 2023

This workflow is minimally different than simply having separate config files

Alternatively, the OTEL_CONFIG_FILE env var might need to support multiple file paths.

How will we define that a provider has to be set as a "global provider"?

an esoteric use case.

I would not call it esoteric when the specification calls it out and I have seen people asking for such things 🤷

@pellared
Member Author

pellared commented Apr 27, 2023

Multiple tracer providers in one file requires that they be named / identified and that the caller has some way to obtain the instance they want.

  1. The names/IDs may be optional. They will be needed for some cross-cutting concerns between other elements of the config file (e.g. correlating a trace provider with a tracing instrumentation library).
  2. The "caller" needs to know which provider should be set as the global provider.

@jack-berg
Member

The spec says it should be possible to create multiple providers, but doesn't give any mechanism for identifying these providers or automatically configuring them. That's new to this proposal.

If multiple providers are possible and can be automatically configured via OTEL_CONFIG_FILE, then all instrumentation would have to be aware of this and decide which provider they want:

// Init multiple tracer providers from OTEL_CONFIG_FILE
Map<String, TracerProvider> tracerProviders = Configuration.configure(System.getenv("OTEL_CONFIG_FILE"))
  .getTracerProviders();
HttpServerInstrumentation.create(tracerProviders.get("tracer-provider-1")); // Initialize http instrumentation with "tracer-provider-1"
DbInstrumentation.create(tracerProviders.get("tracer-provider-2")); // Initialize db instrumentation with "tracer-provider-2"

With one provider per file, it's still possible to have multiple providers. The caller is just responsible for referencing each configuration file and passing the resulting providers to the appropriate place in the application:

TracerProvider provider1 = Configuration.configure("/config1.yaml").getTracerProvider();
TracerProvider provider2 = Configuration.configure("/config2.yaml").getTracerProvider();

HttpServerInstrumentation.create(provider1);
DbInstrumentation.create(provider2);

The names/IDs may not be needed unless we have some cross-cutting concerns between other elements of the config file (e.g. correlating a trace provider with a tracing instrumentation library).

If one tracer provider is the global, and that's indicated in the config file, and instrumentations don't select which provider they want because presumably they choose the global, then how are the non-global providers used?

@pellared
Member Author

The spec says it should be possible to create multiple providers, but doesn't give any mechanism for identifying these providers or automatically configuring them. That's new to this proposal.

Correct. I think this is something that would be part of the "Configurer/Configuration API/structure". It does not have to be part of the Traces/Metrics/Logs API. This is only something that needs to be parsed/processed during "telemetry pipeline" setup.

@pellared
Member Author

Personally, I do not want to propose or decide "how" to do it. My proposals were just "drafts" to "visualize" the issue.

First of all, we should decide if this is something that is planned to be addressed.

In my opinion, the Configuration Model MUST allow instantiating multiple providers of the same type to allow complex configurations. One of the reasons we want to use the Configuration Model is to allow setting up complex things which would be almost impossible using env vars.

@jack-berg
Member

In my opinion, the Configuration Model MUST allow instantiating multiple providers of the same type to allow complex configurations.

That requirement is satisfied by the ability to have multiple configuration files / models. It's a great simplifying assumption to say that the configuration model defines one tracer provider / meter provider / logger provider configuration.

A user that insists on putting multiple configurations in a single file can always use the YAML syntax to define multiple documents in a single file:

---
resource: ...
tracer_provider: ...
---
resource: ...
tracer_provider: ...

And parse like:

List<TracerProvider> tracerProviders = ParseDocuments(new File("/multi-config.yaml"))
    .stream()
    .map(document -> Configuration.configure(document).getTracerProvider())
    .collect(toList());

TracerProvider provider1 = tracerProviders.get(0);
TracerProvider provider2 = tracerProviders.get(1);

Having multiple providers is an exceptional case. The link you posted reiterates that. We don't need to burden application owners with this detail; the vast majority of them will only be confused about why they need to define an array of providers. And we don't need to burden SDK / instrumentation authors with trying to figure out what to do when multiple providers are present.
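
Editor's sketch: the multi-document snippet above relies on a ParseDocuments helper that is not shown in the thread. The following is a minimal, dependency-free illustration of what such a helper might do, assuming documents are separated by lines consisting only of "---". The class and method names are hypothetical, and this is not a YAML parser: it only splits the text, and a real implementation would more likely delegate to a YAML library's multi-document loading.

```java
import java.util.ArrayList;
import java.util.List;

final class MultiDocumentSplitter {
    /**
     * Splits a multi-document YAML string into the raw text of each document.
     * Assumption: a document boundary is a line that contains only "---"
     * (trailing whitespace allowed). Other YAML markers are not handled.
     */
    static List<String> splitDocuments(String yaml) {
        List<String> documents = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : yaml.split("\n", -1)) {
            if (line.stripTrailing().equals("---")) {
                addIfNotBlank(documents, current);
                current.setLength(0);
            } else {
                current.append(line).append('\n');
            }
        }
        addIfNotBlank(documents, current);
        return documents;
    }

    // Skips empty chunks, e.g. the one before a leading "---".
    private static void addIfNotBlank(List<String> documents, StringBuilder current) {
        String text = current.toString();
        if (!text.isBlank()) {
            documents.add(text);
        }
    }

    public static void main(String[] args) {
        String yaml = "---\nresource: a\ntracer_provider: one\n"
                    + "---\nresource: b\ntracer_provider: two\n";
        List<String> documents = splitDocuments(yaml);
        System.out.println("documents found: " + documents.size());
    }
}
```

Each returned string could then be handed to whatever configuration entry point the SDK offers, which keeps the "one provider of each type per configuration model" assumption intact while still allowing a single file.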

@MikeGoldsmith
Member

I agree with @jack-berg - I think multiple providers per Configuration Model unnecessarily complicates the schema and doesn't help the use case of accessing multiple providers because they cannot easily be accessed.

The example above, which returns multiple Configuration Models from the same file using multiple YAML documents, will work, and it gives the same index-based access that returning multiple providers from the same model would.

@pellared
Member Author

pellared commented Apr 28, 2023

The example above, which returns multiple Configuration Models from the same file using multiple YAML documents, will work, and it gives the same index-based access that returning multiple providers from the same model would.

People using automatic instrumentation would not be able to (easily?) profit from such an approach.

doesn't help the use case of accessing multiple providers because they cannot easily be accessed.

I do not get it. The schema would simply need to offer linking providers with other components (e.g. instrumentation libraries).

@tsloughter
Member

Linking with instrumentation libraries?

I should say that having named providers would make life easier in Erlang because I guess we do something similar to automatic instrumentation (even though it doesn't actually instrument anything automatically, it just sets up the providers at boot time so tracers are available before dependencies boot). So the user is usually not going to run anything like start_tracer_provider but let it be done on boot based on configuration.

I like the multi-file approach though, and had resigned myself to the idea that the config file provided at boot will start the global providers, and any additional providers would be created by the user by manually calling the configurer and providing names for them (providers are named processes in Erlang) at that time.

@MikeGoldsmith
Member

doesn't help the use case of accessing multiple providers because they cannot easily be accessed.

I do not get it. The schema would simply need to offer linking providers with other components (e.g. instrumentation libraries).

@pellared maybe I misunderstood. I have a few follow-up questions:

  • Do you mean allowing multiple providers to be defined with a distinct name / ID, so that when configuring users of providers (e.g. instrumentation libraries) you can provide the name / ID?
  • What would happen if you didn't set a provider name / ID, or it was invalid and still had multiple providers?
  • Does that mean we would need to define the default provider per signal type, so consumers of the config that don't know / care about multiple providers, can ask for a default?

@pellared
Member Author

pellared commented May 8, 2023

Do you mean allowing multiple providers to be defined with a distinct name / ID, so that when configuring users of providers (e.g. instrumentation libraries) you can provide the name / ID?

Correct.

What would happen if you didn't set a provider name / ID, or it was invalid and still had multiple providers?

I think the name/ID is optional. If none is provided then we say it is the "global" provider.

If the user provides multiple providers of the same types with the same name/ID then we should return a validation error.

Does that mean we would need to define the default provider per signal type, so consumers of the config that don't know / care about multiple providers, can ask for a default?

If the name/ID is not defined, then we assume that it is the default. If some component (e.g. an instrumentation library) does not reference a provider explicitly, then it should use the default one.
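
Editor's sketch: the rules described in the answers above (names are optional, an unnamed provider is the default, duplicate names are a validation error, and a component that references no provider gets the default) can be illustrated with a small lookup table. Everything here is hypothetical and not part of any OpenTelemetry API: ProviderConfig is a stand-in for a provider entry in the config file, and the "exporter" field is only a placeholder for real settings. Two choices go beyond what the thread states and are assumptions: a second unnamed provider is rejected as a duplicate, and a reference to an unknown name fails rather than falling back to the default.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ProviderSelection {
    /** Hypothetical stand-in for one provider entry in a config file. */
    static final class ProviderConfig {
        final String name;      // optional; null or blank means "unnamed"
        final String exporter;  // placeholder for the real provider settings

        ProviderConfig(String name, String exporter) {
            this.name = name;
            this.exporter = exporter;
        }
    }

    // Key under which the unnamed (default) provider is stored.
    private static final String DEFAULT_KEY = "";

    private static String keyFor(String name) {
        return (name == null || name.isBlank()) ? DEFAULT_KEY : name;
    }

    /** Indexes providers by name and rejects duplicates (including two unnamed ones). */
    static Map<String, ProviderConfig> index(List<ProviderConfig> configs) {
        Map<String, ProviderConfig> byName = new HashMap<>();
        for (ProviderConfig config : configs) {
            String key = keyFor(config.name);
            if (byName.putIfAbsent(key, config) != null) {
                String label = key.equals(DEFAULT_KEY) ? "<unnamed>" : key;
                throw new IllegalArgumentException("Duplicate provider name: " + label);
            }
        }
        return byName;
    }

    /** Returns the named provider, or the default one when no name is requested. */
    static ProviderConfig resolve(Map<String, ProviderConfig> byName, String requestedName) {
        ProviderConfig config = byName.get(keyFor(requestedName));
        if (config == null) {
            throw new IllegalArgumentException("No provider configured for: " + requestedName);
        }
        return config;
    }

    public static void main(String[] args) {
        Map<String, ProviderConfig> providers = index(List.of(
            new ProviderConfig(null, "otlp"),
            new ProviderConfig("secondary", "console")));
        System.out.println(resolve(providers, null).exporter);
        System.out.println(resolve(providers, "secondary").exporter);
    }
}
```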

Additional "general" notes

I am not saying we have to support multiple providers up front. My main concern is that I would prefer a design/structure/model that would allow such an addition in the future. Initially, we can say that we support only one instance per provider type. Also, I totally agree that probably 95% (or more) of users would not need this feature, and by default users should not need to provide the provider ID/name.

@sandipb

sandipb commented May 31, 2024

I don't know if this helps, but one use case that I think we have is having two kinds of trace destinations: the default destination is GCP, and for certain LLM parts of our application we would like to send the traces only to an LLM trace observability platform like Arize Phoenix. We don't want to send every trace to Phoenix, and we have no use for LLM traces in GCP.
Currently, it seems like I can only create a single provider which has span processors for both GCP and Phoenix. I would like to create a tracer specific to Phoenix when I want and use a default one otherwise.

Projects
Status: Not blocking stability