Add integration for service discovery & kv config stores (dynamic config) #272
Hello @pauldix,
@pauldix @rvrignaud see the PR about etcd here: #651
Hi @titilambert, your PR is really useful for updating the telegraf configuration dynamically, such as changing input and output configurations from time to time. But for service discovery in a system such as AWS, Mesos, or Kubernetes, where things scale dynamically, something like the service discovery features implemented in Prometheus would be really great. @rvrignaud's explanation is here, and the Prometheus documentation shows the different possibilities supported. Having this feature would definitely make me move to InfluxDB while keeping the Prometheus instrumentation library.
@chris-zen that's very interesting! What do you think about it?
Yes, I agree that it is especially important for polling. Telegraf already supports polling inputs, such as the one for Prometheus. Right now the Prometheus input only allows static config, but it would be very useful to support service discovery too. My understanding is that Telegraf is quite versatile and allows both pull and push models, but the pull model without service discovery is worthless in such dynamic environments.
Just dropping this here for reference on what I think is a good service discovery model (from Prometheus): https://prometheus.io/blog/2015/06/01/advanced-service-discovery/. Same as mentioned above, but I think this blog post is a little more approachable than their documentation. I think that the "file-based" custom service discovery will be easy to implement. Doing DNS-SRV, Consul, etc. will take a bit more work, but it's certainly doable. I'm imagining some sort of plugin system for these, where notifications on config changes and additions could be sent down a channel, and whenever Telegraf detects one of these it would apply and reload the configuration.
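Purely as a thought experiment, here is a hedged Go sketch of what such a channel-based discovery plugin system might look like; the interface, type names, and reload hook are hypothetical and not part of Telegraf:

```go
package discovery

// ConfigChange carries a newly discovered set of input definitions that
// the agent should apply on its next reload.
type ConfigChange struct {
	Source string            // e.g. "file", "consul", "dns-srv"
	Inputs map[string]string // hypothetical: plugin name -> TOML snippet
}

// Discoverer is a hypothetical plugin interface: each backend pushes
// config changes down the shared channel until the stop channel closes.
type Discoverer interface {
	Watch(changes chan<- ConfigChange, stop <-chan struct{}) error
}

// Run fans in all discoverers and invokes reload whenever any of them
// reports a change.
func Run(discoverers []Discoverer, reload func(ConfigChange) error) chan struct{} {
	changes := make(chan ConfigChange)
	stop := make(chan struct{})
	for _, d := range discoverers {
		go d.Watch(changes, stop) // errors omitted for brevity in this sketch
	}
	go func() {
		for {
			select {
			case c := <-changes:
				reload(c) // apply and reload the configuration
			case <-stop:
				return
			}
		}
	}()
	return stop
}
```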
My preference would be to start with simple file & directory service discovery. This would be an inotify goroutine that sends a service reload (SIGHUP) to the process whenever it detects a change in any config file, or a config file being added to or removed from a config directory.

This could be extended using https://github.com/docker/libkv or something similar: a goroutine that overwrites the on-disk config file(s) when it detects a change in the kv-store (basically a very simple version of confd).

This would solve some of the issues that I have (and that @johnrengelman and @balboah raised) with integrating with a kv-store. In essence, we wouldn't be dependent on a kv-store, and we wouldn't have any confusion over the currently-loaded config, because the config would always also be on disk.
curious what others think of this design, I'm biased but this is my view:

pros:

cons:
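As a rough illustration of the libkv idea above (a very small confd-like sidecar), here is a sketch under assumed key paths, file locations, and reload behaviour; none of this is existing Telegraf code:

```go
package main

import (
	"io/ioutil"
	"log"
	"syscall"
	"time"

	"github.com/docker/libkv"
	"github.com/docker/libkv/store"
	"github.com/docker/libkv/store/etcd"
)

func main() {
	etcd.Register() // make the etcd backend available to libkv

	kv, err := libkv.NewStore(store.ETCD, []string{"127.0.0.1:2379"},
		&store.Config{ConnectionTimeout: 10 * time.Second})
	if err != nil {
		log.Fatal(err)
	}

	// Watch a single key assumed to hold the rendered telegraf config.
	stop := make(chan struct{})
	events, err := kv.Watch("telegraf/telegraf.conf", stop)
	if err != nil {
		log.Fatal(err)
	}

	for pair := range events {
		// Overwrite the on-disk config so the currently-loaded config is
		// always also on disk, then nudge the agent to reload.
		if err := ioutil.WriteFile("/etc/telegraf/telegraf.conf", pair.Value, 0644); err != nil {
			log.Println("write failed:", err)
			continue
		}
		// Signalling ourselves here for illustration; a real sidecar would
		// send SIGHUP to the telegraf process instead.
		syscall.Kill(syscall.Getpid(), syscall.SIGHUP)
	}
}
```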
I like it. That was one thing that used to be tricky with Redis: you could make commands to alter the running config, but then if you restarted your server without updating the on-disk config you were hosed. A file write isn't a big deal; it's not like they're going to be updating the config multiple times a second, minute, or even hour.
@pauldix You might be updating your config multiple times per hour or more if you are in a highly dynamic environment, like an AWS Auto Scaling group or a Docker Swarm/Kubernetes/fleetd/LXD container thingie.
Hi guys, any updates on this monitoring methodology? @sparrc, can you please share the current state of the design? Thanks
Hi everybody, I'm new to this discussion and would like to add my point of view. Everybody knows how important it is to give our agents the ability to get their configuration, and to discover configuration changes, from a centralized configuration system. As I have read in this thread (and others, e.g. #651), there are different ways to get remote configuration, e.g. https://github.com/docker/libkv (for etcd or other KV store backends). Anyway, the most important thing (IMHO) is adding the ability to easily manage changes across all our distributed agents. When there is no available solution yet, the easiest way is probably the best. So yesterday I made a really simple proposal in #1496 that could be coded in a few lines (the same behaviour if you switch to the https://github.com/spf13/viper library). Once this simple feature is added, we can continue discussing other, more sophisticated ways to get configuration and to integrate with known centralized systems (like etcd and others). I vote for adding a simple centralized way first and an integrated solution after; both will cover the same functionality in different scenarios. What do you think?
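For illustration only, a minimal sketch of the general fetch-config-over-HTTP idea mentioned here; the URL, file path, and behaviour are assumptions and not necessarily what #1496 actually proposes:

```go
package main

import (
	"io/ioutil"
	"log"
	"net/http"
	"os"
)

// fetchConfig downloads a telegraf config from a central HTTP endpoint and
// writes it to the local config path before the agent (re)loads it.
func fetchConfig(url, path string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	return ioutil.WriteFile(path, body, 0644)
}

func main() {
	// Both values are placeholders for illustration only.
	url := "http://config.example.internal/telegraf/telegraf.conf"
	path := "/etc/telegraf/telegraf.conf"

	if err := fetchConfig(url, path); err != nil {
		log.Println("could not fetch remote config:", err)
		os.Exit(1)
	}
	log.Println("wrote", path)
}
```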
@toni-moreno the simplest way to manage it is via files. Although fetching the config over HTTP might be simple for your scenario, I can imagine ways in which it can get complicated (just see the httpjson plugin for examples). Like I said, this feature first needs to be coded as a file watcher, and then we can develop plugins around changing the on-disk file(s).
There is one commonly used abstraction pattern available; the only thing that would be needed is hot config reloading: https://github.com/kelseyhightower/confd/ is a single binary which watches any (or many) kind(s) of backend(s) and renders the configuration file from a template whenever it detects changes. I'm about to implement something for Rancher catalogue items. influxdata/influxdata-docker#9 is related. The pattern is rather simple to manage with sidekicks and shared volumes. One step further:
@sparrc I think this is almost a no-brainer, as only the signalling to the telegraf process would need some extra thought; the rest is taken care of.
the signaling would simply be the file changing on disk; there is no need for confd to signal Telegraf directly, as far as I understand it.
Absolutely right.
Hi @sparrc, any new updates on this?
Hi guys, very interesting discussion. I totally agree with keeping telegraf "separate" from etcd/viper/etc.; however, it needs to somehow track any file changes made by those apps and be able to apply those changes on the fly. Does anyone know if this is going to be the way to go, and how it is going to be implemented?
@3fr61n, yes, the initial implementation will be a file/directory watcher that will be able to dynamically reload the configuration any time the file(s) change. I'm not sure about the "how" yet; maybe this: https://github.com/fsnotify/fsnotify
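A minimal sketch of what such a watcher could look like with fsnotify; the watched paths and the reloadConfig hook are assumptions, not Telegraf's actual implementation:

```go
package main

import (
	"log"

	"github.com/fsnotify/fsnotify"
)

func main() {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		log.Fatal(err)
	}
	defer watcher.Close()

	// Watch the main config file and the config directory (assumed paths).
	for _, p := range []string{"/etc/telegraf/telegraf.conf", "/etc/telegraf/telegraf.d"} {
		if err := watcher.Add(p); err != nil {
			log.Fatal(err)
		}
	}

	for {
		select {
		case event := <-watcher.Events:
			// Any write, create, remove, or rename should trigger a reload.
			if event.Op&(fsnotify.Write|fsnotify.Create|fsnotify.Remove|fsnotify.Rename) != 0 {
				log.Println("config changed:", event.Name)
				reloadConfig() // hypothetical hook: re-read files and apply
			}
		case err := <-watcher.Errors:
			log.Println("watch error:", err)
		}
	}
}

// reloadConfig is a stand-in for whatever the agent would do on SIGHUP.
func reloadConfig() {}
```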
@abraithwaite Can you take a look at the
Unfortunately not. The value that Prometheus provides with Kubernetes is that you configure metrics collection via the service (with Kubernetes annotations) and not through the metrics collection agent. This enables users to configure everything they need without having to set up something outside the scope of their own services. I can provide examples if needed, just lemme know.
@abraithwaite can you link me to the Kubernetes documentation for the method you are using?
Haven't seen any official documentation, actually. Just pieced it together from code, examples and blog posts: https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml
FWIW, I don't use Prometheus with Kubernetes, but the concept is extremely valuable and I'd still love to see it here. I looked at the Telegraf code, though, and I'm certain you'd need to add service discovery as a first-class configuration method.
Just to clarify, the
Right, I understand that. It still requires an explicit dependency between the service and telegraf, instead of an implicit one. When using annotations, there is no PR a user has to make to update the telegraf config in order to start getting metrics from their service collected.
I can agree that "Prometheus kubernetes discovery using annotations is pure gold. I would love to have this in telegraf." We use this to have Prometheus dynamically find new targets. I would love to move back to telegraf for collecting metrics and uptime if this were supported.
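To make the annotation-driven pattern concrete, here is a rough client-go sketch that lists pods which opted in via the common prometheus.io annotations and turns them into scrape targets; this is not a Telegraf feature, and the default port and URL format are illustrative assumptions:

```go
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// scrapeTargets lists pods in all namespaces and keeps those that opted in
// via annotations, the same way the Prometheus example configs do.
func scrapeTargets(clientset *kubernetes.Clientset) ([]string, error) {
	pods, err := clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	var targets []string
	for _, pod := range pods.Items {
		ann := pod.Annotations
		if ann["prometheus.io/scrape"] != "true" || pod.Status.PodIP == "" {
			continue
		}
		port := ann["prometheus.io/port"]
		if port == "" {
			port = "9102" // assumed default, for illustration only
		}
		targets = append(targets, fmt.Sprintf("http://%s:%s/metrics", pod.Status.PodIP, port))
	}
	return targets, nil
}

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}
	targets, err := scrapeTargets(clientset)
	if err != nil {
		log.Fatal(err)
	}
	// In a real integration these would feed the prometheus input's urls.
	for _, t := range targets {
		fmt.Println(t)
	}
}
```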
Hi,
@narayanprabhu I use Puppet to ease that kind of pain. It knows all the services that are “ensured” on each server, and that makes it easier to deploy a matching Telegraf config.
@voiprodrigo Yes, Puppet is a good option; unfortunately my organization does not have that solution. They mainly rely on SCCM for Windows deployment and Ansible for Linux. This thread says that a UI option is being built into Chronograf to manage agent configs; is that option still being built, and is it coming anytime soon? Also, there is something about etcd where we can have one config consumed by the other telegraf agents; is that an option that would help my use case, and does it work for Windows as well?
@danielnelson any update?
Work is on hold right now (for the first item here), but I'm tempted to break this issue up into several issues:
In the InfluxDB 2.0 alpha version there is a Telegraf config generation UI, and Telegraf is guided to take its config from InfluxDB, but InfluxDB seems to have no
Hello, so reloading the (file) config without a restart has not been implemented yet? It's a pity. @blaggacao, didn't you call it "almost a no brainer"? I'd like to use telegraf with a sidecar creating the configs... EDIT: seems like there is a
If there's some standard service discovery to connect to like Consul, it would be cool to have Telegraf connect to that and automatically start collecting data for services that Telegraf supports.
So when a new MySQL server comes on, Telegraf will automatically start collecting data from it.
Just an idea. Users could also get this by just having Telegraf part of their deploys when they create new servers.
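To make the idea concrete, here is a rough sketch of an agent polling Consul's catalog and starting a collector for each newly discovered mysql instance; the polling loop and the startCollector hook are illustrative assumptions only:

```go
package main

import (
	"fmt"
	"log"
	"time"

	consul "github.com/hashicorp/consul/api"
)

func main() {
	client, err := consul.NewClient(consul.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	known := map[string]bool{} // addresses we are already collecting from

	for {
		// Only consider instances whose health checks are passing.
		services, _, err := client.Health().Service("mysql", "", true, nil)
		if err != nil {
			log.Println("consul query failed:", err)
		}
		for _, s := range services {
			host := s.Service.Address
			if host == "" {
				host = s.Node.Address // fall back to the node address
			}
			addr := fmt.Sprintf("%s:%d", host, s.Service.Port)
			if !known[addr] {
				known[addr] = true
				startCollector(addr) // hypothetical: spin up a mysql input
			}
		}
		time.Sleep(30 * time.Second)
	}
}

// startCollector stands in for whatever would configure and launch a
// Telegraf mysql input pointed at the discovered address.
func startCollector(addr string) {
	log.Println("would start collecting mysql metrics from", addr)
}
```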