RFC: Tribe Clusters and Worker Pattern #1584
Comments
@candysmurf, this RFC would satisfy the needs of #773 (HA) and #1558 (No Duplicate Polling).
I feel I have something to add to this. I think the idea of distributing a task across a tribe is a great one in principle. I would really like to see ideas on how this would be implemented, since I have some particular use cases in mind. The primary use case I had in mind was service discovery and collection of metrics from container-bound applications. For example, if you had a pod in Kubernetes running a group of containers that all host a … In the above use case, sharing a task is useful for accomplishing this.

However, this feature seems incomplete without some associated form of service discovery. Snap needs a way to schedule and unschedule shared tasks based on contextual data obtained through some form of service discovery, similar to how Prometheus collects from a Kubernetes cluster. This feature doesn't necessarily have to be integrated directly into snap. This could be done using an external scheduling daemon that exists outside of snap and interacts with it using the Snap REST API. Or it could be implemented directly as a new type of plugin designed for shared tasks that passes configuration forward to a set of collectors. Let me know what you think of this idea; it's something I feel would be really useful in a container-based deployment.
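The external scheduling daemon idea can be sketched as a reconcile step: compare the targets that service discovery reports with the targets that already have a shared task, then create tasks for new targets and remove tasks for vanished ones. This is a minimal sketch under assumptions: `Target` and `reconcile` are hypothetical names, and the actual task creation and removal calls against the Snap REST API are left out rather than guessed.

```go
package main

import (
	"fmt"
	"sort"
)

// Target is a hypothetical discovered metrics endpoint (for example a
// container's metrics port found via service discovery). Not a Snap type.
type Target struct {
	Pod      string
	Endpoint string
}

// reconcile compares the targets reported by service discovery with the
// targets that already have a shared task scheduled (target -> task ID).
// It returns the targets that need a new task and the IDs of tasks whose
// target has disappeared.
func reconcile(discovered []Target, scheduled map[Target]string) (toCreate []Target, toRemove []string) {
	want := make(map[Target]bool, len(discovered))
	for _, t := range discovered {
		want[t] = true
		if _, ok := scheduled[t]; !ok {
			toCreate = append(toCreate, t)
		}
	}
	for t, taskID := range scheduled {
		if !want[t] {
			toRemove = append(toRemove, taskID)
		}
	}
	sort.Strings(toRemove) // map iteration order is random; sort for stable output
	return toCreate, toRemove
}

func main() {
	discovered := []Target{
		{Pod: "web-1", Endpoint: "10.0.0.5:9100"},
		{Pod: "web-2", Endpoint: "10.0.0.6:9100"},
	}
	scheduled := map[Target]string{
		{Pod: "web-1", Endpoint: "10.0.0.5:9100"}: "task-a",
		{Pod: "web-0", Endpoint: "10.0.0.4:9100"}: "task-b",
	}
	create, remove := reconcile(discovered, scheduled)
	fmt.Println("create:", create) // create: [{web-2 10.0.0.6:9100}]
	fmt.Println("remove:", remove) // remove: [task-b]
}
```

Because the comparison runs against the full desired state rather than individual events, a daemon built this way could simply rerun it periodically and recover from missed discovery updates.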
This feature doesn't necessarily have to be integrated directly into snap.
This could be done using an external scheduling daemon that exists outside
of snap and interacts with it using the Snap REST API.
I tend to agree. I would like to see tribe become something that integrates
with snap. This would foster more options for the management layer, the
obvious one being a layer backed by something like etcd instead of
gossip, which is what we use today.
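The pluggable management layer described above could be expressed as a small Go interface that tribe codes against, with gossip and an etcd-backed store as interchangeable implementations. This is a minimal sketch assuming a hypothetical `MembershipBackend` interface; only an in-memory stand-in is implemented, and no real gossip or etcd client calls are shown.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// MembershipBackend is a hypothetical interface for tribe's management layer.
// Tribe uses gossip today; the idea is that a store such as etcd could be
// plugged in behind the same interface.
type MembershipBackend interface {
	Join(member string) error
	Leave(member string) error
	Members() ([]string, error)
}

// memoryBackend is an in-process stand-in used only to illustrate the
// interface; a real backend would talk to gossip peers or an etcd cluster.
type memoryBackend struct {
	mu      sync.Mutex
	members map[string]struct{}
}

func newMemoryBackend() *memoryBackend {
	return &memoryBackend{members: make(map[string]struct{})}
}

func (b *memoryBackend) Join(m string) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.members[m] = struct{}{}
	return nil
}

func (b *memoryBackend) Leave(m string) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	delete(b.members, m)
	return nil
}

// Members returns the current members in sorted order.
func (b *memoryBackend) Members() ([]string, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	out := make([]string, 0, len(b.members))
	for m := range b.members {
		out = append(out, m)
	}
	sort.Strings(out)
	return out, nil
}

func main() {
	var backend MembershipBackend = newMemoryBackend()
	backend.Join("snap-1")
	backend.Join("snap-2")
	backend.Leave("snap-1")
	members, _ := backend.Members()
	fmt.Println(members) // [snap-2]
}
```

The rest of tribe would only depend on `MembershipBackend`, so swapping gossip for an etcd-backed implementation would not require changes elsewhere.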
…On Tue, Apr 4, 2017 at 2:48 PM Emily Gu wrote:
I have to agree that @jtlisi has a good point that something may be achieved outside Snap. @dishmael, would you please add more of your thoughts on how this will work with container replicas?
So it seems that before we will be able to implement this RFC, we need to separate tribe from the main Snap repo; is that right, @jcooklin?

@jtlisi, if you'd like to monitor applications in Kubernetes, you could also consider creating a Snap Third Party Resource (TPR) that associates an application (and its metric endpoint) with a Snap task manifest. Then you would need a watcher on the Kubernetes API that tells you when a new application pod is started and checks whether the corresponding Snap TPR is running. With this information, all you would need is some automation to load plugins and tasks. We will be implementing such a solution in the future. The work will be done under: https://github.com/intelsdi-x/snap-integration-kubernetes
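The watcher-plus-automation flow described above can be sketched as an event handler: for each pod event, look up whether a Snap TPR associates the pod's application with a task manifest, then decide what to load, start, or stop. Everything here is hypothetical: `PodEvent` is a simplified stand-in for a Kubernetes watch event, `SnapManifest` stands in for the proposed TPR, and the handler only returns a description of the action instead of calling the Kubernetes or Snap APIs.

```go
package main

import "fmt"

// PodEvent is a hypothetical, simplified stand-in for a Kubernetes watch
// event; a real watcher would receive these from the Kubernetes API.
type PodEvent struct {
	Type string // "ADDED" or "DELETED"
	Pod  string
	App  string // application label on the pod
}

// SnapManifest is a hypothetical stand-in for the proposed Snap TPR: it
// associates an application with the plugins and task manifest it needs.
type SnapManifest struct {
	Plugins []string
	Task    string
}

// handle decides what the automation should do for one pod event, given the
// application-to-manifest associations known from the TPRs.
func handle(ev PodEvent, tprs map[string]SnapManifest) string {
	m, ok := tprs[ev.App]
	if !ok {
		return "ignore " + ev.Pod + ": no Snap TPR for app " + ev.App
	}
	switch ev.Type {
	case "ADDED":
		return fmt.Sprintf("load plugins %v and start task %s for %s", m.Plugins, m.Task, ev.Pod)
	case "DELETED":
		return fmt.Sprintf("stop task %s for %s", m.Task, ev.Pod)
	default:
		return "ignore " + ev.Pod + ": unhandled event " + ev.Type
	}
}

func main() {
	// Hypothetical plugin and manifest names, for illustration only.
	tprs := map[string]SnapManifest{
		"nginx": {Plugins: []string{"example-nginx-collector"}, Task: "nginx-task.yaml"},
	}
	events := []PodEvent{
		{Type: "ADDED", Pod: "nginx-7f9c", App: "nginx"},
		{Type: "ADDED", Pod: "redis-1a2b", App: "redis"},
		{Type: "DELETED", Pod: "nginx-7f9c", App: "nginx"},
	}
	for _, ev := range events {
		fmt.Println(handle(ev, tprs))
	}
}
```

Pods whose application has no associated TPR are ignored, so only applications that someone has explicitly registered for monitoring trigger plugin and task loading.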
@andrzej-k, do you have statistics on how many of our customers are using tribe?
So it seems that before we will be able to implement this RFC, we need to separate tribe from the main Snap repo; is that right, @jcooklin?
Correct.
…On Wed, Apr 5, 2017 at 4:18 AM Andrzej Kuriata wrote:
Summary
Often, we need to gather metrics from a remote system using a centralized collector or, ideally, a cluster (tribe) of snap collectors. The overarching goal is to define a single task that collects one or more metrics from a remote node and to submit that task to the tribe, which assigns it to a worker for collection.
Proposal
At a configurable rate, the collectors would vote on which collector would be the master and which collectors would be used to gather the metric(s) defined in a task shared among members of a tribe. This can be achieved using, for example, Raft (https://raft.github.io). Busy collectors would naturally be slower to respond, so faster, less-utilized collectors would be selected to gather those metrics. Tribe HA (this RFC) is configurable as a grouping option, allowing users to define which cluster members will operate in an HA model, since not all snap telemetry tribes need to be HA.
Motivation/Use Cases
The Raft link above has a decent description of how cluster consensus might work in the tribe architecture. The following motivation and use cases are targeted:
Benefits
Using a cluster with a master/workers architecture ensures high availability without duplicate polling. A task can be defined once, submitted to the tribe, executed only once, and guaranteed to be collected by one of the workers.
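The "define once, run once, survive a worker failure" property can be sketched as a master that records exactly one owner per task and reassigns a failed worker's tasks. This is a hypothetical illustration, not Snap's tribe code: the `Master` type, least-loaded placement, and the explicit `Fail` call (standing in for real failure detection) are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// Master records a single owning worker per shared task, so each task is
// collected once (no duplicate polling) and moves if its worker fails (HA).
type Master struct {
	healthy    map[string]bool   // worker -> alive
	assignment map[string]string // task -> owning worker
}

func NewMaster(workers ...string) *Master {
	m := &Master{healthy: map[string]bool{}, assignment: map[string]string{}}
	for _, w := range workers {
		m.healthy[w] = true
	}
	return m
}

// load counts how many tasks each healthy worker currently owns.
func (m *Master) load() map[string]int {
	l := map[string]int{}
	for w, ok := range m.healthy {
		if ok {
			l[w] = 0
		}
	}
	for _, w := range m.assignment {
		if _, ok := l[w]; ok {
			l[w]++
		}
	}
	return l
}

// Assign gives the task to the healthy worker owning the fewest tasks (ties
// broken by name). A task already owned by a healthy worker is not moved.
func (m *Master) Assign(task string) (string, bool) {
	if w, ok := m.assignment[task]; ok && m.healthy[w] {
		return w, true
	}
	l := m.load()
	if len(l) == 0 {
		return "", false
	}
	names := make([]string, 0, len(l))
	for w := range l {
		names = append(names, w)
	}
	sort.Strings(names)
	best := names[0]
	for _, w := range names[1:] {
		if l[w] < l[best] {
			best = w
		}
	}
	m.assignment[task] = best
	return best, true
}

// Fail marks a worker dead and reassigns its tasks to the remaining workers.
func (m *Master) Fail(worker string) {
	m.healthy[worker] = false
	var orphaned []string
	for t, w := range m.assignment {
		if w == worker {
			orphaned = append(orphaned, t)
		}
	}
	sort.Strings(orphaned)
	for _, t := range orphaned {
		delete(m.assignment, t)
		m.Assign(t)
	}
}

func main() {
	m := NewMaster("snap-1", "snap-2")
	tasks := []string{"cpu", "mem", "disk"}
	for _, t := range tasks {
		w, _ := m.Assign(t)
		fmt.Println(t, "->", w) // cpu -> snap-1, mem -> snap-2, disk -> snap-1
	}
	m.Fail("snap-1")
	for _, t := range tasks {
		fmt.Println(t, "->", m.assignment[t]) // every task -> snap-2
	}
}
```

Because each task maps to exactly one worker at any moment, failover moves ownership instead of adding a second poller.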
Drawbacks
This may add overhead to tribes and would certainly increase the amount of cross-chatter between snap telemetry instances.
Definitions
The following definitions are used in this RFC:
Issues Addressed
The following issues would be satisfied by implementing this RFC: