-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PoC][DO NOT MERGE] Pipelines without a service #5149
Conversation
Codecov Report
@@ Coverage Diff @@
## main #5149 +/- ##
==========================================
- Coverage 90.34% 90.00% -0.34%
==========================================
Files 182 183 +1
Lines 11031 11150 +119
==========================================
+ Hits 9966 10036 +70
- Misses 840 882 +42
- Partials 225 232 +7
Continue to review full report at Codecov.
|
receiverFactory := otlpreceiver.NewFactory() | ||
receiverCfg := receiverFactory.CreateDefaultConfig().(*otlpreceiver.Config) | ||
receiverCfg.HTTP = nil // I need to explicitly nil HTTP, since this is done in Unmarshal usually | ||
receiverCfg.GRPC = &configgrpc.GRPCServerSettings{ | ||
NetAddr: confignet.NetAddr{ | ||
Transport: "tcp", | ||
// I can't really express 'use the default gRPC settings here' as one can do by setting 'grpc:' on the YAML | ||
Endpoint: "0.0.0.0:4317", | ||
}, | ||
// I only know this by reading the code | ||
ReadBufferSize: 512 * 1024, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This, to me, is the hardest problem to solve: declaring configuration for components is very brittle and has a big number of gotchas. The ones that I have thought about are:
Problem 1: Can't express default configuration (nil but present sections)
On the OTLP receiver and other components that have a custom unmarshaling function, we rely on having empty sections for configuration:
# Example: enable only HTTP with default endpoint
otlp:
protocols:
http:
We can't express that on the Go API easily (setting to nil disables them), and we need to explicitly declare the endpoint. In fact, for the OTLP receiver, we need to explicitly disable an endpoint by setting to nil instead of the other way around.
Problem 2: Users can't rely on zero values
Configuration structs on the different components don't usually have the zero value of the fields as the default. For example, one may be tempted to set the default configuration for gRPC connection as:
// WRONG: ReadBufferSize is 0, which disables read buffering (not the default)
receiverCfg.GRPC = &configgrpc.GRPCServerSettings{
NetAddr: confignet.NetAddr{
Transport: "tcp",
Endpoint: "0.0.0.0:4317",
},
}
while disabling the read buffer is not necessarily catastrophic, in other situations it can lead to more serious unexpected side effects. This can be partially mitigated by components if they assign meaning to zero values, but this is not always possible (e.g. what about booleans?)
Problem 3: Can't express overrides
Imagine a component with a config like the following:
timeout: <time.Duration, 0 disables timeout>
foo:
timeout: <time.Duration, 0 disables timeout, overrides timeout for foos>
bar:
timeout: <time.Duration, 0 disables timeout, overrides timeout for bars>
A component can express this right now via a custom unmarshaling function, by checking if foo::timeout
or bar::timeout
are set and overriding if so. In the Go API, we can't really do this:
cfg := component.Config{
Timeout: 10*time.Second,
Foo: Foo{ Timeout: 30 *time.Second },
// Pipeline doesn't know that Bar was unset, so it doesn't know that it should ignore Bar 0 value and use top-level timeout instead
}
I think all of these are solvable by adding a configuration builder for each component like otlpreceiver.NewConfig(options ...otlpreceiver.ConfigOption)
and nudging users toward using those, but it needs individual component support. Furthermore, since the configs of components are meant to have all fields public and behave like PODs, users can be tempted to use the configuration struct directly regardless. This is not very worrisome (nobody does zap.Logger{}
, even if you can), but it's yet another point of friction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mx-psi I understand all these possible problems, but I am still not sure I understand what are you trying to solve. In your case if I understood correctly you have a "hardcoded" pipeline, and you don't need to let users define the config. Do you care that you have to do some "hardcoded" config? What problem are you trying to solve with this proposal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still not sure I understand what are you trying to solve. In your case if I understood correctly you have a "hardcoded" pipeline, and you don't need to let users define the config
Let me try to describe my use case in more detail. We use the Collector on the Datadog Agent, an observability daemon like the Collector. We have fixed pipelines for different telemetry signals with
- an OTLP receiver,
- some processing (e.g. batching) and
- a custom exporter that converts to the Datadog format and forwards the telemetry data internally to existing components of the Datadog Agent.
Parts of the configuration for the pipelines are hardcoded (e.g. what components the pipeline has), while other parts are customizable by end-users via YAML (but not necessarily in the format of the Collector).
Do you care that you have to do some "hardcoded" config?
I am okay with hardcoding things. We currently use a custom ConfigProvider that assembles a config.Map
from the YAML configuration and the fixed settings, and we run a service.Collector
instead of an individual pipeline.
However, right now I either have to hardcode things in a config.Map or deal with the problems described in my message above, neither option being ideal.
What problem are you trying to solve with this proposal?
The overarching problem I am trying to solve is to run a pipeline within another existing application and integrating it with this application (e.g. by integrating the logging, metrics, CLI or configuration). In particular, this PoC stemmed from my need for overriding telemetry providers so that we can eventually access the different components' telemetry.
I hoped to solve this with #5107, but I understand that service.Collector
may not fit my use case and I want to take your concerns seriously, thus I am exploring the second option of getting rid of it entirely and just having a lightweight pipeline builder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry my comment about What problem are you trying to solve with this proposal?
was not clear, I was referring to the config part not the initial problem you are trying to solve.
So for configuration let me see if I understood the problem:
Datadog Agent wants to provide some configuration for some of the components in this hardcoded pipeline that they need, in a yaml format that is not the collectors yaml
Some solutions that I see:
- You translate the Datadog's YAML into collector's YAML (something that you do right now). In this you need indeed to take care of the default config etc, because the "Unmarshal" function on the component config may make some assumptions about the default configuration.
- You translate the Datadog's YAML into collector's component Config (the unmarshalled config). Here you don't care about the "default config" etc. You define your own rules on how to configure components.
- You want the same component yaml config for Components that you use in the hardcoded pipeline, but in a different high-level yaml structure (no top-level fields like
receiver
,service
, etc.). In this case you should parse that as amap[string]interface{}
->config.Map
-> Maybe do any conversion to the Map -> Unmarshal into the componentConfig
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry my comment about What problem are you trying to solve with this proposal? was not clear, I was referring to the config part not the initial problem you are trying to solve.
Looks like I misunderstood you but I think I managed to answer the question anyway :)
So for configuration let me see if I understood the problem:
Datadog Agent wants to provide some configuration for some of the components in this hardcoded pipeline that they need, in a yaml format that is not the collectors yaml
I think this is a good summary, thanks for taking the time to work with me on this. Right now indeed we are using solution (1), which works even if a bit cumbersome to use. Solution (2) has the pitfalls I described above and I don't feel like it's safe to use with the current API design.
In any case, I still don't feel like this solves the original 'overriding telemetry providers' problem, in that I wouldn't want to use code building pipelines like the one on this PR if it did not live on the Collector repo: things today are mostly designed to work only with the 'Collector distro' use case in mind, and I would worry that I can't reliably keep using the latest version of upstream. Do you think it makes sense to have pipeline-building code on go.opentelemetry.io/collector
? Is this better supported by eventually exposing service.Service
or do we need this different thing?
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
After thinking about this, I don't think this is a solution that offers many advantages over just eventually exposing
|
Description:
This is a PoC for thinking about how pipelines without using
service.Collector
(orservice.service
, if it ever gets exposed) would look like. It's meant to spark discussion more than as actual code to be mergedLink to tracking Issue: Relates to discussion on #4970
Additional notes:
This only supports metrics (since it is what I had an easy testcase to run against). It includes an example or how to use the API in
ExampleNewBuilder
.There are several limitations that can be improved with more code:
Compose(exporters ...component.Exporter) component.Exporter
kind of function).consumer.Metrics|Traces|Logs
instead).There are several limitations that I find harder to think about how to overcome, mostly related to configuration. These are detailed below on a comment