Implement leader election for the target allocator #1061
Comments
Interesting suggestion. Improving the resiliency of the target allocation layer would definitely be a plus. Would the followers use the existing API to obtain state from the leader, or would you expect an active-passive setup with failover?
Yes, some kind of HA is definitely needed for the TA. For leader election and state sharing, I'd throw memberlist into the ring. Grafana uses it in all its distributed products (Mimir, Loki, Tempo, ...), so it seems stable. That way we can simply point the collector to the TA using a service, as we do now, and don't need to think about manual failovers.
@Aneurysm9 I was expecting an active-passive setup with failover so as to not complicate any of the existing logic. @secustor using memberlist, would we instead do state sharing, or would we just use that to determine who the active is?
@secustor memberlist might need to be tuned to reach a propagation delay that meets HA expectations.
Right now, the target allocator's allocation strategy (least connection) means that you can only run a single TA pod at a time. This makes things difficult for a consumer who wants a high-availability option for their target allocation. To make this possible, we could use the built-in Go leader election package for the target allocator.
The rough process a collector follows right now looks like this:
If the target allocator is down at step 4, the job will fail (and most likely the entire scrape config). Adding support for HA would improve the reliability of the statefulset collector.
If this is something the community would like, I would be happy to implement and test it.