While working on #4326, we discovered four failure cases when deploying non-trivial post-split proposal receive topologies. These result in the Thanos deployment not behaving as expected.
Issues
1: Request Replica > Ingestor Replication Factor
Following the post-split proposal, we are able to configure receive instances to operate in router, ingestor, or combined routing and ingesting modes.
In the example topology above (one router in front of three ingestor instances), if the router is configured with a replication factor of 3, we would expect the data to be replicated to all three ingestor instances.
Instead, we observe that router fails to replicate the data with quorum not reached errors like the following:
Routing logs (formatted for clarity)
09:44:41 receive-d1: level=error name=receive-d1 ts=2021-06-18T08:44:41.943443895Z caller=handler.go:352 component=receive component=receive-handler
err="3 errors:
replicate write request for endpoint receive_distributor_ingestor_mode-receive-i1:9091: quorum not reached:
2 errors: forwarding request to endpoint receive_distributor_ingestor_mode-receive-i2:9091: rpc error: code = InvalidArgument
desc = replica count exceeds replication factor; forwarding request to endpoint receive_distributor_ingestor_mode-receive-i1:9091: rpc error: code = AlreadyExists
desc = store locally for endpoint receive_distributor_ingestor_mode-receive-i1:9091: conflict;
replicate write request for endpoint receive_distributor_ingestor_mode-receive-i2:9091: quorum not reached:
2 errors: forwarding request to endpoint receive_distributor_ingestor_mode-receive-i3:9091: rpc error: code = InvalidArgument
desc = replica count exceeds replication factor; forwarding request to endpoint receive_distributor_ingestor_mode-receive-i2:9091: rpc error: code = AlreadyExists
desc = store locally for endpoint receive_distributor_ingestor_mode-receive-i2:9091: conflict;
replicate write request for endpoint receive_distributor_ingestor_mode-receive-i3:9091: quorum not reached:
2 errors: forwarding request to endpoint receive_distributor_ingestor_mode-receive-i1:9091: rpc error: code = InvalidArgument
desc = replica count exceeds replication factor; forwarding request to endpoint receive_distributor_ingestor_mode-receive-i3:9091: rpc error: code = AlreadyExists
desc = store locally for endpoint receive_distributor_ingestor_mode-receive-i3:9091: conflict" msg="internal server error"
This happens because when router constructs downstream requests, the replica number n is supplied in the request.
When single ingestor instances are spun up, their default receive.replication-factor is set to 1. When they receive the request from the router, they return errBadReplica responses.
The result is that the first downstream request succeeds, but all subsequent ones fail.
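The gist of the mismatch can be shown with a minimal sketch, assuming illustrative names (handler, handleWrite) and a simplified replica-numbering scheme rather than the exact Thanos source:

```go
// Minimal sketch of the replica-count validation involved (illustrative
// names, not the actual Thanos handler). The router fans the request out
// with one replica number per copy; an ingestor started with the default
// --receive.replication-factor=1 rejects any replica number above its own
// factor, so only one of the downstream writes can succeed.
package main

import (
	"errors"
	"fmt"
)

var errBadReplica = errors.New("replica count exceeds replication factor")

type handler struct {
	replicationFactor uint64 // the ingestor's own --receive.replication-factor
}

// handleWrite mirrors the validation an ingestor performs on an incoming
// write request that already carries a replica number.
func (h *handler) handleWrite(replica uint64) error {
	// Anything above the locally configured replication factor is rejected.
	if replica > h.replicationFactor {
		return errBadReplica
	}
	return nil // store locally / forward as usual
}

func main() {
	ingestor := &handler{replicationFactor: 1} // default on a standalone ingestor

	// A router with --receive.replication-factor=3 sends three replicas.
	for replica := uint64(1); replica <= 3; replica++ {
		fmt.Printf("replica %d: %v\n", replica, ingestor.handleWrite(replica))
	}
	// Only the first replica is accepted, so quorum (2 of 3) is never reached.
}
```

With the router fanning out three copies and each standalone ingestor only willing to accept replica numbers up to its own factor of 1, at most one of the three downstream writes succeeds and the 2-of-3 quorum is never reached.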
2: SingleNodeHashring GetN where n > 0
If a single ingestor instance receives a request, and the replica number n is > 0, the request will fail with insufficientNodesError.
The ingestor correctly believes there is only one node in its hashring (itself) and rejects all requests that indicate they are for larger hashrings.
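A simplified, paraphrased view of that behaviour (types trimmed down; not a verbatim copy of the Thanos hashring code):

```go
// SingleNodeHashring routes everything to the one endpoint it wraps. With
// only itself as a member, any replica index above 0 asks for more nodes
// than exist, so the request is rejected.
package main

import "fmt"

type insufficientNodesError struct{ have, want uint64 }

func (e *insufficientNodesError) Error() string {
	return fmt.Sprintf("insufficient nodes; have %d, want %d", e.have, e.want)
}

type SingleNodeHashring string

// GetN returns the endpoint for replica n; n > 0 can never be satisfied.
func (s SingleNodeHashring) GetN(n uint64) (string, error) {
	if n > 0 {
		return "", &insufficientNodesError{have: 1, want: n + 1}
	}
	return string(s), nil
}

func main() {
	h := SingleNodeHashring("receive-i1:9091")
	fmt.Println(h.GetN(0)) // receive-i1:9091 <nil>
	fmt.Println(h.GetN(1)) // insufficient nodes; have 1, want 2
}
```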
3: SimpleHashring GetN where n > len(hashring)
Similarly to the above, if a router or router & ingestor is configured with a hashring of more than one instance and it receives a request with n greater than the number of instances, it will return an insufficientNodesError.
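The multi-node case follows the same pattern; the sketch below is again illustrative (hashing details simplified, names hypothetical) rather than the actual implementation:

```go
// A replica index at or beyond the number of ring members cannot be mapped
// to a distinct node, so an insufficient-nodes error is returned.
package main

import (
	"fmt"
	"hash/fnv"
)

type simpleHashring []string

// getN picks the node for replica n of a series key, refusing indices that
// would need more nodes than the ring contains.
func (s simpleHashring) getN(key string, n uint64) (string, error) {
	if n >= uint64(len(s)) {
		return "", fmt.Errorf("insufficient nodes; have %d, want %d", len(s), n+1)
	}
	h := fnv.New64a()
	h.Write([]byte(key))
	return s[(h.Sum64()+n)%uint64(len(s))], nil
}

func main() {
	ring := simpleHashring{"receive-i1:9091", "receive-i2:9091"}
	fmt.Println(ring.getN("tenantA{series}", 1)) // ok: second replica
	fmt.Println(ring.getN("tenantA{series}", 2)) // error: only 2 nodes in the ring
}
```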
4: Replicated requests can't be replicated again.
Let's consider the following architecture, which is valid under the split proposal.
If router1 and router2 both have receive.replication-factor set to 2, we would expect the data to be replicated three times across all ingestor instances.
Instead, what we find is that the data is only replicated twice.
This happens because when router2 receives the request from router1, the storepb.WriteRequest replica field is > 0, which indicates that this request has already been replicated (which is true).
Since this request has already been replicated, router2 will forward this request to a maximum of one downstream hashring instance (src).
This mechanism prevents infinite loops in fully-connected Router & Ingestor topologies, but prevents us from forming functional trees of depth n.
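A hedged sketch of that fan-out decision (illustrative structure, not the exact handler code) shows why the second router never multiplies the request again:

```go
// Once a request arrives with a non-zero replica number it is treated as
// already replicated and forwarded to a single node, so a second-level
// router adds no replication of its own.
package main

import "fmt"

type router struct {
	name              string
	replicationFactor uint64
}

// fanOut returns how many downstream copies this router produces for a
// request that arrived carrying the given replica number.
func (r *router) fanOut(incomingReplica uint64) uint64 {
	if incomingReplica > 0 {
		// Already replicated upstream: forward to at most one node. This
		// breaks loops in fully-connected topologies, but also stops trees
		// of routers from adding their own replication.
		return 1
	}
	return r.replicationFactor
}

func main() {
	router1 := &router{name: "router1", replicationFactor: 2}
	router2 := &router{name: "router2", replicationFactor: 2}

	// Client -> router1: replica 0, so router1 produces 2 copies.
	fmt.Println("router1 copies:", router1.fanOut(0))

	// Each copy reaches router2 with replica > 0, so it is forwarded once,
	// not replicated again: the series ends up on only 2 ingestors.
	fmt.Println("router2 copies per request:", router2.fanOut(1))
}
```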
Integration Tests
Three additional e2e integration tests that exercise the above failure cases have been added: #4362
We think we have a fix for the above behaviour, but the ergonomics and explainability to end users will be poor. The plan is to propose this PR and discuss it at the next Thanos contributor hours.
Thanos, Prometheus and Golang version used:
HEAD
v2.26.0
go version go1.16.4 linux/amd64
Object Storage Provider:
N/A