Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

collector: reload sampling strategies on file content change #1058

Closed
gouthamve opened this issue Sep 14, 2018 · 22 comments
Closed

collector: reload sampling strategies on file content change #1058

gouthamve opened this issue Sep 14, 2018 · 22 comments
Labels
enhancement good first issue Good for beginners help wanted Features that maintainers are willing to accept but do not have cycles to implement

Comments

@gouthamve
Copy link
Contributor

Requirement - what kind of business use case are you trying to solve?

It should be possible to change the sampling strategies without restarting the collector.

Problem - what in Jaeger blocks you from solving the requirement?

I modified the sampling strategies file, only to find that it had no impact.

Proposal - what do you suggest to solve the problem or improve the existing situation?

The collector should ideally watch the file for changes but it's also okay to just periodically re-read the file and figure out if anything changed.

The meta issue for this is here: #355 but I think for now we should reload the sampling-strategies on change.

@black-adder
Copy link
Contributor

black-adder commented Sep 14, 2018

Ideally, we can use https://github.com/fsnotify/fsnotify and have a separate go routine that watches the file and then updates the strategies here on any updates.

EDIT: Long term, we need a solution for dynamic configs for everything but for now, this will suffice

@black-adder black-adder added help wanted Features that maintainers are willing to accept but do not have cycles to implement good first issue Good for beginners labels Sep 14, 2018
@gouthamve
Copy link
Contributor Author

Great! Was just waiting to see if there was interest, I'll work on this this weekend.

So, I come from prometheus, and we make sure that a SIGHUP or a POST request to /-/reload is required before we update the config. This is because we don't want to reload a partially written file which could happen if we are using fsnotify.

Now the /-/reload endpoint is really just there to get around k8s limitation (I think), where SIGNAL handling is bad. Your thoughts on this?

@black-adder
Copy link
Contributor

I like the idea of having a reload endpoint but it might broaden the scope of this issue and might require an extensive surgery. Can you point me to the code where prometheus receives the reload signal and then propagates the signal to all relevant processes?

For now, I'd prefer just using fsnotify (if the need is very pressing) but I'm willing to look into adding an endpoint and updating all configs.

@gouthamve
Copy link
Contributor Author

@jpkrohling
Copy link
Contributor

EDIT: Long term, we need a solution for dynamic configs for everything but for now, this will suffice

For that, we might want to "reload" the process, like NGINX does, perhaps using https://github.com/cloudflare/tableflip

@shunge
Copy link

shunge commented Oct 25, 2018

Hi, I am a new comer, I can take a look at this if others are not working on this yet. Have we decided on which approach we want? @jpkrohling the library you recommended seems relatively new, should we worry about stability issues?

Is this a pressing issue? If not, a reload endpoint for all configs sounds like a more generic solution?

@isaachier
Copy link
Contributor

isaachier commented Oct 25, 2018

BTW using SIGHUP for this is standard practice in Linux Unix. See here: https://stackoverflow.com/a/28327659/1930331.

@isaachier
Copy link
Contributor

Ideally, we can use https://github.com/fsnotify/fsnotify and have a separate go routine that watches the file and then updates the strategies here on any updates.

EDIT: Long term, we need a solution for dynamic configs for everything but for now, this will suffice

I think using SIGHUP in our services is actually a more robust/portable approach than fsnotify and/or dynamic config.

@shunge
Copy link

shunge commented Oct 26, 2018

Since jaeger is using viper as a configuration solution, just brainstorming - one solution could be just using a goroutine to use viper to read config periodically (e.g every 5 seconds)? Are we only focusing on collector now? Or we should provide dynamic configuration for all modules?

@jpkrohling
Copy link
Contributor

Viper does provide a hook to deliver notifications when the config file has changed:

https://github.com/spf13/viper#watching-and-re-reading-config-files

@shunge
Copy link

shunge commented Oct 29, 2018

Now here are some possible solutions:

  1. SIGNUP or a POST method similar to the solution in prometheus.
  2. fsnotify https://github.com/fsnotify/fsnotify
  3. tableflip https://github.com/cloudflare/tableflip
  4. Viper's native re-reading config support (with help of fsnotify).

Am I missing anything? Which one do we prefer? Personally, I would try it out with what viper has offered first.

@yurishkuro
Copy link
Member

yurishkuro commented Oct 29, 2018

The main difficulty is not with reloading of the config (I don't have a strong opinion on any of the above), but with how to structure the code to reflect those changes, since many classes are only initialized with fixed values. Two possible options are:

  • have the classes accept a "dynamic config" object at construction and read constant values from that interface rather than from internal fields
  • have some classes (as needed) support "destroy and reload"

Option 1 is a lot simpler to implement, imo. The "dynamic config" can be implemented with a completely static data initially, and then extended to support file reloading or even external configuration (e.g. from etcd).

@shunge
Copy link

shunge commented Oct 31, 2018

Hi @yurishkuro , I am not quite sure I understanding your point correctly - so let me try to rephrase what you put:

Use collector as an example - it will read HealthCheck and Logger configs from a config file through viper, and go ahead to call func NewHealthCheck and func NewLogger to init and return a new HealthCheck and Logger. What you are suggesting is to let NewHealthCheck and NewLogger accept a "dynamic config"? thanks

@jpkrohling
Copy link
Contributor

Those might be the easy parts, as they are almost stateless. What if the change is about the property collector.port? There should be an appropriate process to stop serving new requests while letting all in-flight requests to finish. At the same time, the new port should be ready.

@yurishkuro
Copy link
Member

@shunge so assume you do that and create a new instance of the Logger. What then? All other components inside the collector like http servers, storage, etc. have been already initialized with the old logger. You need to have a mechanism to tear them down and re-initialize, which is not something we designed for originally.

I think dynamic configuration has two types of data:

  1. Some configuration variables can be easily changed "on the fly", as I described in option 1, without requiring tear-down of any internal components. Examples of these parameters include things like timeouts, number of db retries, some while/blacklist of service names or tags (we one that internally), etc. The common trait of these are that they are stored in memory as pure data and therefore can be relatively easily replaced with another set of data. Things like changing a log level threshold should also be in this category (tbd if Zap actually allows that).
  2. Other configuration variables require tear-down and re-initialization, for example a port number, a database address, number of worker goroutines, even a queue length. While it is not impossible to implement, this requires quite a lot of refactoring and adding levels of abstractions so that the clients of a component that is being reloaded would not be aware.

I am completely onboard with supporting the 1st type of dynamic params, and we have a ticket for that (#355). Reloading of the sampling strategies can be implemented separately with a less generic mechanism, but it would be nice to take #355 into account when designing what that mechanism is.

@shunge
Copy link

shunge commented Nov 2, 2018

Sorry @yurishkuro , I think I oversimplified the situation - you are right, there are too many layers that need to be teared down and reinitialized. So the tear down solution might not be the most optimal. I think the question eventually boiled down - which component can be initialized then change config on the fly (e.g. is logger one of them), and for those can't change on the fly (like @yurishkuro @jpkrohling mentions port numbers etc.). So I assume if I want to work on this, I should look into which configs for collector can be hot-swapped? Thanks.

@yurishkuro
Copy link
Member

As I mentioned, I would suggest just starting with reloadable sampling strategies, behind an interface that can be implemented with file watching or SIGHUP now and via some other mechanism later.

@annanay25
Copy link
Member

@black-adder @yurishkuro; I was looking into the GRPC implementation on the collector, the sampling manager is registered as a service with the GRPC server -

api_v2.RegisterSamplingManagerServer(server, sampling.NewGRPCHandler(samplingStrategy))

func RegisterSamplingManagerServer(s *grpc.Server, srv SamplingManagerServer) {
s.RegisterService(&_SamplingManager_serviceDesc, srv)
}

And de-registering/re-registering is not implemented by GRPC because it would impact service availability from the client, the same as a server restart.

@yurishkuro
Copy link
Member

@annanay25 I don't think anyone was suggesting to touch the gRPC handlers. It should return data provided by an internal interface, and the implementation of that interface should have an atomic swap once it gets the new config.

@surendarchandra
Copy link

This seems like a extremely useful feature to inject on-the-fly sampling of specific span operators. How else do y'all achieve it? Thanks

@yurishkuro
Copy link
Member

NB: hot reload of UI config has been implemented in #1688. We need to generalize that to also support reloading of sampling strategies.

@yurishkuro
Copy link
Member

This has been implemented in #2188 with timer-based reloading of the file. I think we can close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement good first issue Good for beginners help wanted Features that maintainers are willing to accept but do not have cycles to implement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants