Implement leader election for the target allocator #1061

jaronoff97 · 2022-08-25T21:26:39Z

Right now, the target allocator's allocation strategy (least connection) means that you can only run a single TA pod at a time. This is a problem for consumers who want a high-availability option for their target allocation. To make this possible, we could use the built-in Go leader election package for the target allocator.
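
To illustrate one plausible reason the current strategy ties us to a single pod, here is a minimal sketch of a least-loaded assignment. It is not the target allocator's actual code: `assignLeastLoaded`, the collector names, and the target names are all made up for illustration, and the sketch assumes at least one collector. The point is only that the result depends on the order in which targets arrive, so two independent replicas could disagree.

```go
package main

import "fmt"

// assignLeastLoaded is a hypothetical helper, not the real allocator.
// It gives each target, in arrival order, to the collector that currently
// holds the fewest targets. Ties go to the earlier collector in the slice.
// It assumes collectors is non-empty.
func assignLeastLoaded(collectors, targets []string) map[string]string {
	load := make(map[string]int, len(collectors))
	assignment := make(map[string]string, len(targets))
	for _, t := range targets {
		best := collectors[0]
		for _, c := range collectors[1:] {
			if load[c] < load[best] {
				best = c
			}
		}
		assignment[t] = best
		load[best]++
	}
	return assignment
}

func main() {
	collectors := []string{"collector-0", "collector-1"}
	// Two hypothetical TA replicas discover the same targets in a different order.
	replicaA := assignLeastLoaded(collectors, []string{"a", "b", "c"})
	replicaB := assignLeastLoaded(collectors, []string{"b", "a", "c"})
	// The replicas disagree about which collector owns target "a".
	fmt.Println(replicaA["a"], replicaB["a"])
}
```

With this kind of state-dependent assignment, a collector asking one replica and another collector asking a second replica could both scrape the same target, or neither could.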

The rough process that a collector does looks like this right now:

1. collector starts up
2. prom receiver loads configuration
3. for each job
4. http_sd_config queries target allocator
5. list of targets and metadata is returned from TA
6. prom receiver runs relabel_configs on targets
7. prom receiver scrapes targets remaining
8. prom receiver applies metric_relabel_configs
9. collector converts prom to otel
10. moves config to processor stage ...

If the target allocator is down at step 4, the job will fail (and most likely the entire scrape config). Adding support for HA would improve the reliability of the statefulset collector.
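
As a rough sketch of what the step 4 failure looks like from the collector's side, the snippet below queries a stand-in HTTP server for targets and then repeats the query after that server has been shut down. This is not the prom receiver's code. `fetchTargets`, `simulateOutage`, the `/jobs/<job>/targets` path, and the sample target are assumptions for illustration; only the response shape (a JSON list of objects with `targets` and `labels`) follows the Prometheus HTTP SD format.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// TargetGroup mirrors one entry of an HTTP SD response.
type TargetGroup struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels"`
}

// fetchTargets performs the step 4 query for one job against a single
// target allocator URL. The URL layout is an assumption for this sketch.
func fetchTargets(client *http.Client, baseURL, job string) ([]TargetGroup, error) {
	resp, err := client.Get(baseURL + "/jobs/" + job + "/targets")
	if err != nil {
		return nil, fmt.Errorf("query target allocator: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("target allocator returned %s", resp.Status)
	}
	var groups []TargetGroup
	if err := json.NewDecoder(resp.Body).Decode(&groups); err != nil {
		return nil, fmt.Errorf("decode targets: %w", err)
	}
	return groups, nil
}

// simulateOutage queries a stand-in TA once while it is up and once after
// it has been shut down, returning both outcomes.
func simulateOutage() (up []TargetGroup, upErr, downErr error) {
	ta := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		_ = json.NewEncoder(w).Encode([]TargetGroup{{
			Targets: []string{"10.0.0.1:9090"},
			Labels:  map[string]string{"job": "demo"},
		}})
	}))
	up, upErr = fetchTargets(ta.Client(), ta.URL, "demo")

	ta.Close() // the only TA pod goes away
	_, downErr = fetchTargets(ta.Client(), ta.URL, "demo")
	return up, upErr, downErr
}

func main() {
	up, upErr, downErr := simulateOutage()
	fmt.Println("targets while up:", len(up), "error:", upErr)
	fmt.Println("error after shutdown:", downErr)
}
```

With a single TA pod there is nowhere else to send the second query, which is the gap an HA setup would close.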

If this is something the community would like, I would be happy to implement and test it.

Aneurysm9 · 2022-08-26T04:00:47Z

Interesting suggestion. Improving the resiliency of the target allocation layer would definitely be a plus. Would the followers use the existing API to obtain state from the leader, or would you expect an active-passive setup with failover?

secustor · 2022-08-26T07:03:32Z

Yes, some kind of HA is definitely needed for TA.

Regarding leader election and state sharing, I'll throw memberlist into the ring. Grafana uses it in all its distributed products (Mimir, Loki, Tempo, ...), so it seems stable.

That way we can simply point the collector to the TA using a service, as we do now, and don't need to think about manual failovers.

jaronoff97 · 2022-08-26T17:07:42Z

@Aneurysm9 I was expecting an active-passive setup with failover so as to not complicate any of the existing logic. @secustor With memberlist, would we do state sharing instead, or would we just use it to determine which instance is active?
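
To make the active-passive idea concrete, here is a minimal in-memory sketch of lease-based failover. It is not a proposal for the actual implementation and does not use any Kubernetes API: `Lease`, `TryAcquire`, `runFailover`, the replica names `ta-0` and `ta-1`, and the 15 second TTL are all hypothetical. A real setup would keep the lease in a shared store rather than in process memory.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Lease is an in-memory stand-in for a shared lock record.
type Lease struct {
	mu      sync.Mutex
	holder  string
	expires time.Time
}

// TryAcquire makes id the leader if the lease is free, has expired, or is
// already held by id (a renewal). It reports whether id is leader afterwards.
func (l *Lease) TryAcquire(id string, now time.Time, ttl time.Duration) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder == "" || l.holder == id || !now.Before(l.expires) {
		l.holder = id
		l.expires = now.Add(ttl)
		return true
	}
	return false
}

// runFailover walks through one failover using fixed timestamps.
func runFailover() []bool {
	lease := &Lease{}
	ttl := 15 * time.Second
	t0 := time.Unix(0, 0)
	return []bool{
		// ta-0 starts first and becomes the active instance.
		lease.TryAcquire("ta-0", t0, ttl),
		// ta-1 tries while the lease is still valid and stays passive.
		lease.TryAcquire("ta-1", t0.Add(5*time.Second), ttl),
		// ta-0 has crashed and stopped renewing; after expiry ta-1 takes over.
		lease.TryAcquire("ta-1", t0.Add(20*time.Second), ttl),
	}
}

func main() {
	fmt.Println(runFailover()) // prints [true false true]
}
```

Only the instance that holds the lease would serve allocations, so the existing allocation logic stays untouched. The cost is that failover takes up to one TTL.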

jeromeinsf · 2022-08-26T18:24:55Z

@secustor memberlist might need to be tuned so that its propagation delay fits HA expectations.

pavolloffay added the area:target-allocator (Issues for target-allocator) label Aug 30, 2022

jaronoff97 mentioned this issue Sep 12, 2022

Added consistent hashing strategy #1087

Merged

pavolloffay closed this as completed in #1087 Sep 16, 2022
