collector: reload sampling strategies on file content change #1058
Comments
Ideally, we can use https://github.com/fsnotify/fsnotify and have a separate goroutine that watches the file and then updates the strategies here on any change. EDIT: Long term, we need a solution for dynamic configs for everything, but for now this will suffice.
Great! I was just waiting to see if there was interest; I'll work on this this weekend. So, I come from Prometheus, and there we make sure that a SIGHUP or a POST request to … Now the …
I like the idea of having a … For now, I'd prefer just using …
This block does it: https://github.com/prometheus/prometheus/blob/457e4bb58e0fd3c10e16ecaabfeb04fb2c41bac5/cmd/prometheus/main.go#L478-L488, while SIGHUP is registered here: https://github.com/prometheus/prometheus/blob/457e4bb58e0fd3c10e16ecaabfeb04fb2c41bac5/cmd/prometheus/main.go#L469-L470. The web handler is quite straightforward. I'll take a stab soon.
For that, we might want to "reload" the process, like NGINX does, perhaps using https://github.com/cloudflare/tableflip
Hi, I am a newcomer; I can take a look at this if nobody else is working on it yet. Have we decided which approach we want? @jpkrohling, the library you recommended seems relatively new; should we worry about its stability? Is this a pressing issue? If not, wouldn't a reload endpoint for all configs be a more generic solution?
BTW, using SIGHUP for this is standard practice in …
I think using SIGHUP in our services is actually a more robust/portable approach than fsnotify and/or dynamic config.
Since Jaeger uses Viper for configuration, just brainstorming: one solution could be a goroutine that uses Viper to re-read the config periodically (e.g., every 5 seconds). Are we only focusing on the collector for now, or should we provide dynamic configuration for all modules?
Viper does provide a hook to deliver notifications when the config file has changed: https://github.com/spf13/viper#watching-and-re-reading-config-files
Now here are some possible solutions:
Am I missing anything? Which one do we prefer? Personally, I would first try what Viper offers.
The main difficulty is not with reloading of the config (I don't have a strong opinion on any of the above), but with how to structure the code to reflect those changes, since many classes are only initialized with fixed values. Two possible options are:
Option 1 is a lot simpler to implement, IMO. The "dynamic config" can initially be implemented with completely static data, and then extended to support file reloading or even external configuration (e.g., from etcd).
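To illustrate option 1, here is a minimal sketch of a "dynamic config" interface whose first implementation is nothing more than static data. The names `Strategy`, `StrategyProvider` and `staticProvider` are hypothetical, not Jaeger's actual types. The key idea is that callers query the provider on every use instead of copying a value at construction time, so a file-reloading or etcd-backed implementation can later be swapped in without touching the callers.

```go
package main

// Strategy is a hypothetical, simplified sampling strategy: a default
// probability plus optional per-service overrides.
type Strategy struct {
	DefaultProbability float64
	PerService         map[string]float64
}

// StrategyProvider is a hypothetical "dynamic config" interface.
// Components hold the provider, not the value it returns.
type StrategyProvider interface {
	Probability(service string) float64
}

// staticProvider is the first, trivial implementation: its data never
// changes after construction.
type staticProvider struct {
	s Strategy
}

// Probability returns the per-service override if present, else the default.
func (p staticProvider) Probability(service string) float64 {
	if prob, ok := p.s.PerService[service]; ok {
		return prob
	}
	return p.s.DefaultProbability
}
```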
Hi @yurishkuro, I am not quite sure I understand your point correctly, so let me try to rephrase it: Use …
Those might be the easy parts, as they are almost stateless. What if the change is about the property …
@shunge so assume you do that and create a new instance of the Logger. What then? All other components inside the collector like http servers, storage, etc. have been already initialized with the old logger. You need to have a mechanism to tear them down and re-initialize, which is not something we designed for originally. I think dynamic configuration has two types of data:
I am completely onboard with supporting the 1st type of dynamic params, and we have a ticket for that (#355). Reloading of the sampling strategies can be implemented separately with a less generic mechanism, but it would be nice to take #355 into account when designing what that mechanism is.
Sorry @yurishkuro, I think I oversimplified the situation. You are right: there are too many layers that would need to be torn down and reinitialized, so the teardown approach is probably not the best one. I think the question eventually boils down to which components can be initialized and then change their config on the fly (e.g., is the logger one of them?), and which cannot (like the port numbers @yurishkuro and @jpkrohling mentioned). So I assume that if I want to work on this, I should look into which collector configs can be hot-swapped? Thanks.
As I mentioned, I would suggest just starting with reloadable sampling strategies, behind an interface that can be implemented with file watching or SIGHUP now and via some other mechanism later.
@black-adder @yurishkuro: I was looking into the gRPC implementation in the collector; the sampling manager is registered as a service with the gRPC server:
jaeger/proto-gen/api_v2/sampling.pb.go Lines 496 to 498 in 7ae02d4
De-registering/re-registering a service is not supported by gRPC, because it would affect service availability for clients, just like a server restart.
@annanay25 I don't think anyone was suggesting to touch the gRPC handlers. It should return data provided by an internal interface, and the implementation of that interface should have an atomic swap once it gets the new config.
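One way to realize the "atomic swap" described above, sketched under assumptions: the handlers read through a small store that keeps the current strategies behind `sync/atomic`'s `atomic.Pointer` (Go 1.19+), and whatever reload mechanism is chosen (file watching, SIGHUP, a timer) calls `Update` with the new file content. The `atomicStore` type and the JSON field names are hypothetical and do not reflect Jaeger's real strategies-file format. The gRPC service registration never changes; only the data behind it does.

```go
package main

import (
	"encoding/json"
	"sync/atomic"
)

// strategies is a hypothetical, simplified view of the strategies file.
type strategies struct {
	DefaultProbability float64            `json:"default_probability"`
	PerService         map[string]float64 `json:"per_service"`
}

// atomicStore holds the current strategies behind an atomic pointer.
// Handlers call Probability; the reload mechanism calls Update.
type atomicStore struct {
	current atomic.Pointer[strategies]
}

// Update parses new content and swaps it in atomically. On a parse error
// the previously loaded strategies stay in effect.
func (s *atomicStore) Update(raw []byte) error {
	var next strategies
	if err := json.Unmarshal(raw, &next); err != nil {
		return err
	}
	s.current.Store(&next)
	return nil
}

// Probability reads one consistent snapshot without taking a lock.
func (s *atomicStore) Probability(service string) float64 {
	cur := s.current.Load()
	if cur == nil {
		return 1.0 // hypothetical fallback before the first successful load
	}
	if p, ok := cur.PerService[service]; ok {
		return p
	}
	return cur.DefaultProbability
}
```

Because readers always dereference a single pointer, they never observe a half-applied update, and a malformed edit to the file is rejected without disturbing the strategies already being served.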
This seems like an extremely useful feature for enabling on-the-fly sampling of specific span operations. How else do y'all achieve it? Thanks
NB: hot reload of UI config has been implemented in #1688. We need to generalize that to also support reloading of sampling strategies. |
This has been implemented in #2188 with timer-based reloading of the file. I think we can close this issue.
Requirement - what kind of business use case are you trying to solve?
It should be possible to change the sampling strategies without restarting the collector.
Problem - what in Jaeger blocks you from solving the requirement?
I modified the sampling strategies file, only to find that it had no impact.
Proposal - what do you suggest to solve the problem or improve the existing situation?
The collector should ideally watch the file for changes, but it's also okay to just periodically re-read the file and figure out whether anything changed.
The meta issue for this is #355, but for now I think we should reload the sampling strategies on change.