Configuration reloading logic and customizability #5966

swiatekm · 2022-08-25T16:59:31Z

Is your feature request related to a problem? Please describe.
Right now, the collector reloads configuration whenever it gets a notification from any of its configuration sources. In practice, this never happens, as watching is not implemented in any of the existing config providers, but it will the moment that is added.

I don't think this is behaviour users would universally want. In many contexts, you only want to reload the configuration on-demand. Kubernetes is one example of this, where mounted ConfigMaps can be remounted if they change, but it's not desirable behaviour for an application to reload the config without the Pod having been recreated. Running as a systemd unit is another example.

I'd like to figure out what is the best way of letting the user control how the collector reloads configuration. I'd like to explore different approaches - for example, many applications do this on receipt of a SIGHUP signal.

Describe the solution you'd like
I don't know what the right set of solutions is, but I think it should allow:

the user to decide if they want automatic configuration reload based on provider notification, maybe per provider
the user to be able to reload configuration on demand without restarting the collector process

In particular, I'm not sure what the default should be.

Describe alternatives you've considered
Some possible solutions:

Let automatic reloading be controlled by command line options
for example:
```
otelcol --config config.yaml --enable-hot-reload
```
Let automatic reloading be controlled by a section in the configuration
```
service:
  config:
    hotReload: true
```
unconditionally reload configuration on receipt of a SIGHUP signal

I'm open to other ideas.

Additional context
This originally came up because I wanted to add notifications to the fileprovider in #5945 . @TylerHelmuth suggested it should start off behind a feature flag, and I reconsidered whether it should even ultimately be enabled for all users. I looked at how this is done in some other agents, and this issue is the result.

bogdandrutu · 2022-08-29T16:44:06Z

Kubernetes is one example of this, where mounted ConfigMaps can be remounted if they change, but it's not desirable behaviour for an application to reload the config without the Pod having been recreated. Running as a systemd unit is another example.

I am confused about this, I don't think the remounted happens for the existing pods. Imagine an app crash, then restart will pick the new version? Can you investigate a bit more?

(maybe?) for the collector to keep using older working configuration if the new one is invalid
keep the last known working configuration and fall back to it if the new one is invalid

I think it is almost impossible in case of a restart for example, also it becomes impossible to know what version of the config etc. For use cases like this we should create a provider to a proper "configuration" system that deals with all these complications.

unconditionally reload configuration on receipt of a SIGHUP signal

Where is this coming from? SIGHUP seem to mean other things, but I don't have that much experience with this signal.

the user to be able to reload configuration on demand without restarting the collector process

What does it mean? Is this what we have right now?

Can you please re-work/clarify the goals section. Then we can clarify the options, because right now the "possible solutions" section is not clear that I can pick one of that and will solve this problem, it feels more or less a random list of possible things to do and combine that may get us to the "solution".

swiatekm · 2022-08-30T13:54:34Z

I am confused about this, I don't think the remounted happens for the existing pods. Imagine an app crash, then restart will pick the new version? Can you investigate a bit more?

I was also surprised, but it does absolutely happen: https://kubernetes.io/docs/concepts/configuration/configmap/#mounted-configmaps-are-updated-automatically. To make the thing even more confusing, whether this happens depends on the mount type as well. All in all, it's kind of a mess.

Either way, this was just an example of a situation where the user may explicitly not want to reload their config file when it changes.

Where is this coming from? SIGHUP seem to mean other things, but I don't have that much experience with this signal.

It's conventionally used for configuration reloading by daemons, which aren't attached to a terminal and therefore have no use for it otherwise. Some examples from similar projects:

https://docs.fluentd.org/deployment/signals#sighup
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#configuration
https://docs.influxdata.com/influxdb/cloud/telegraf-configs/update/
also the case for nginx, apache, sshd, and so on

This is actually something I think we could easily and safely add. We already catch signals in the main collector loop and configuration reloading is already implemented.

swiatekm · 2022-08-30T14:03:07Z

What does it mean? Is this what we have right now?

What we have right now is automatic reload based on notifications from the config provider. I want to let users have more control over this. Right now, there's not even a way to opt-out of it.

Can you please re-work/clarify the goals section. Then we can clarify the options, because right now the "possible solutions" section is not clear that I can pick one of that and will solve this problem, it feels more or less a random list of possible things to do and combine that may get us to the "solution".

Sorry about that, this is not a very streamlined issue, and I'm not sure what the right set of solutions is. I want to give users more control over configuration reloading, and I'd like to see what options are on the table for that.

portertech · 2022-08-30T18:52:09Z

@swiatekm-sumo you thinking something like this? sensu@995cba5

portertech · 2022-08-30T19:06:18Z

Works https://gist.github.com/portertech/63a33aca67bc0468120abc924eda49e3

bogdandrutu · 2022-08-30T20:21:02Z

@portertech I would prefer to not jump and add more "legacy" things to the collector. We should define the "solution" first then add more functionality.

bogdandrutu · 2022-09-09T02:29:05Z

@swiatekm-sumo I thought more also about this, I think we should do the following:

Add a setting to our "service.CollectorSettings" for the Collector instance to watch or not for config provider updates. Maybe disabled by default, and allow a flag to set this, since cannot be part of the config itself.
Not sure about SIGHUP, but seems to be a standard? Can some more folks confirm that? Also if this is the case still believe we want to have this configured in the "service.CollectorSettings".

codeboten · 2022-09-09T20:40:35Z

1. Add a setting to our "service.CollectorSettings" for the Collector instance to watch or not for config provider updates. Maybe disabled by default, and allow a flag to set this, since cannot be part of the config itself.

2. Not sure about SIGHUP, but seems to be a standard? Can some more folks confirm that? Also if this is the case still believe we want to have this configured in the "service.CollectorSettings".

+1 to @bogdandrutu's suggestions. And from the examples stated above, I do believe using SIGHUP is appropriate here. See the following excerpt from the wikipedia entry https://en.wikipedia.org/wiki/Signal_(IPC)#SIGHUP:

Many daemons (who have no controlling terminal) interpret receipt of this signal as a request to reload their configuration files and flush/reopen their logfiles instead of exiting.[11] nohup is a command to make a command ignore the signal.

swiatekm · 2022-09-19T14:20:39Z

Maybe disabled by default, and allow a flag to set this, since cannot be part of the config itself.

Technically it can, as we read the config normally before starting the main collector loop, unless I'm misunderstanding something. Command-line switch is probably the better choice in general though.

I can try to add this if we have agreement on the mechanism.

Not sure about SIGHUP, but seems to be a standard? Can some more folks confirm that? Also if this is the case still believe we want to have this configured in the "service.CollectorSettings".

Do we need this to be configurable? I can't think of any circumstance where you'd need to disable it. There's no reason to send this signal to the collector other than this functionality.

bogdandrutu · 2022-09-19T22:13:12Z

Do we need this to be configurable? I can't think of any circumstance where you'd need to disable it. There's no reason to send this signal to the collector other than this functionality.

I think it is fine, since if it causes troubles we can have a flag/config to disable it.

swiatekm · 2022-09-20T08:03:00Z

Cool, in that case we have an implementation ready in #6000. CC @portertech

bogdandrutu · 2022-09-21T00:15:13Z

@swiatekm-sumo I think that is just the initial part. There are more to clarify, e.g. what is the behavior/interaction with watchers?

Do we reload if no watcher triggered before? Assuming all the configs have proper watching implement and we know nothing change do we do load and check with previous config?
Do we reload even if new loaded config == old loaded config?
If watching enable do we ignore SIGHUP?
If watching enabled and reload in progress do we ignore the SIGHUP?
If reload in progress because of SIGHUP do we ignore a new request?

swiatekm · 2022-09-21T13:55:16Z

@swiatekm-sumo I think that is just the initial part. There are more to clarify, e.g. what is the behavior/interaction with watchers?

Do we reload if no watcher triggered before? Assuming all the configs have proper watching implement and we know nothing change do we do load and check with previous config?

Do we reload even if new loaded config == old loaded config?

If watching enable do we ignore SIGHUP?

If watching enabled and reload in progress do we ignore the SIGHUP?

If reload in progress because of SIGHUP do we ignore a new request?

I'd just go with the naive solution for now - we reload on every reload event (be it from a watcher or a signal). If we get an event during a reload, we finish the reload and then reload again. And we don't care if the config actually changed or not.

We could try to be clever about it and implement some kind of throttling, but that seems like premature optimization to me. Ultimately all of these events should be controlled by the user, and it should be on them not to overload the collector with signals or configuration changes. At the very least I'd wait until someone complains before thinking about mitigations for hypothetical problems.

The other benefit is that the naive implementation (as seen in #6000) is both simple and easy to understand.

swiatekm · 2023-03-15T17:02:56Z

Summary of recent sig discussion re: config reloading:

We don't really have a good idea of how we want this to work
Because of the above, it should be disabled by default
We should probably let users who want to enable it, do so, but not clear if it should be a single flag like in Add a flag for enabling config watch #6300 or a more fine-grained mechanism
It would be good to get a sense for how much demand there is from users for hot reloading
We should make sure that the current reloading mechanism actually works correctly, @TylerHelmuth claimed it doesn't reload metrics, for one
We need a comprehensive proposal for how this should work before enabling it for all users again
There should be some mechanism (possibly separate) for components to do this, mostly for remote management. You can currently do it with SIGHUP, but that's kind of hacky.

@tigrannajaryan I was told you had some thoughts on this as well. @dmitryax you mentioned you wanted to make hot-reloading configurable per config source - how would that work in practice?

TylerHelmuth · 2023-03-15T17:15:22Z

I was referring to this issue, but it is fixed now.

MaciejGrynda · 2023-11-03T14:33:49Z

Hello,
Last update for this was over a year ago, did something change? SIGHUP is still supported, but it would be great if we had OS independent functionality.
Something like exposing REST API to force reload or definig a file to watch for modification timestamp change

swiatekm · 2023-11-03T14:39:25Z

@MaciejGrynda I think the consensus was that automatically reloading whenever config files changed wasn't something we wanted to do, but mechanisms where the user explicitly requests the reload (like SIGHUP) were fine. I suggest you open a new issue if you want one of those. I know some other CNCF projects take the REST API approach, so that would be an argument in favor.

vikrambe · 2023-12-17T17:04:41Z

Hello, Do we have a plan to support dynamic reload capability soon ? I see a lot of interest and dangling threads on this topic. Here are few question i have

Do we want to make confmap/providers pluggable ?
I do see a watch event in the providers, but none of the provider notify
Also components need to listen to the event and reload configs without collector restarting.

@bogdandrutu @tigrannajaryan Do suggest ?

portertech mentioned this issue Aug 30, 2022

Support SIGHUP configuration reloading #6000

Merged

bogdandrutu mentioned this issue Oct 4, 2022

Configuration reloading needs rethinking from error handling perspective #6226

Open

swiatekm mentioned this issue Oct 13, 2022

Add a flag for enabling config watch #6300

Closed

yehaotian mentioned this issue Aug 25, 2023

Opentelemetry collector memory leak/non-optimized GC open-telemetry/opentelemetry-collector-contrib#26087

Closed

cforce mentioned this issue Oct 10, 2023

Capability to update otel configuration dynamically at runtime #4205

Closed

MaciejGrynda mentioned this issue Nov 3, 2023

Provide an os independent way of forcing configuration reload without process restart #8796

Open

jaronoff97 mentioned this issue Jan 12, 2024

Target Allocator - ServiceMonitor scheme open-telemetry/opentelemetry-operator#1669

Open

chenlujjj mentioned this issue May 30, 2024

Support trigger config reloading via HTTP request besides SIGHUP #10264

Open

codeboten mentioned this issue Sep 25, 2024

Support hot deployment #11269

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Configuration reloading logic and customizability #5966

Configuration reloading logic and customizability #5966

swiatekm commented Aug 25, 2022 •

edited

Loading

bogdandrutu commented Aug 29, 2022

swiatekm commented Aug 30, 2022

swiatekm commented Aug 30, 2022

portertech commented Aug 30, 2022

portertech commented Aug 30, 2022

bogdandrutu commented Aug 30, 2022

bogdandrutu commented Sep 9, 2022

codeboten commented Sep 9, 2022

swiatekm commented Sep 19, 2022

bogdandrutu commented Sep 19, 2022

swiatekm commented Sep 20, 2022

bogdandrutu commented Sep 21, 2022

swiatekm commented Sep 21, 2022

swiatekm commented Mar 15, 2023 •

edited

Loading

TylerHelmuth commented Mar 15, 2023

MaciejGrynda commented Nov 3, 2023

swiatekm commented Nov 3, 2023

vikrambe commented Dec 17, 2023

Configuration reloading logic and customizability #5966

Configuration reloading logic and customizability #5966

Comments

swiatekm commented Aug 25, 2022 • edited Loading

bogdandrutu commented Aug 29, 2022

swiatekm commented Aug 30, 2022

swiatekm commented Aug 30, 2022

portertech commented Aug 30, 2022

portertech commented Aug 30, 2022

bogdandrutu commented Aug 30, 2022

bogdandrutu commented Sep 9, 2022

codeboten commented Sep 9, 2022

swiatekm commented Sep 19, 2022

bogdandrutu commented Sep 19, 2022

swiatekm commented Sep 20, 2022

bogdandrutu commented Sep 21, 2022

swiatekm commented Sep 21, 2022

swiatekm commented Mar 15, 2023 • edited Loading

TylerHelmuth commented Mar 15, 2023

MaciejGrynda commented Nov 3, 2023

swiatekm commented Nov 3, 2023

vikrambe commented Dec 17, 2023

swiatekm commented Aug 25, 2022 •

edited

Loading

swiatekm commented Mar 15, 2023 •

edited

Loading