While working on #4326, we discovered four failure cases when deploying non-trivial post-split proposal receive topologies. These result in the Thanos deployment not behaving as expected.
Issues
1: Request Replica > Ingestor Replication Factor
Following the post-split proposal, we are able to configure receive instances to operate in router, ingestor, or combined routing and ingesting modes.
In the example topology above (one router in front of three ingestor instances), if the router is configured with a replication factor of 3, we would expect the data to be replicated to all three ingestor instances.
Instead, we observe that router fails to replicate the data with quorum not reached errors like the following:
Routing logs (formatted for clarity)
09:44:41 receive-d1: level=error name=receive-d1 ts=2021-06-18T08:44:41.943443895Z caller=handler.go:352 component=receive component=receive-handler
err="3 errors:
replicate write request for endpoint receive_distributor_ingestor_mode-receive-i1:9091: quorum not reached:
2 errors: forwarding request to endpoint receive_distributor_ingestor_mode-receive-i2:9091: rpc error: code = InvalidArgument
desc = replica count exceeds replication factor; forwarding request to endpoint receive_distributor_ingestor_mode-receive-i1:9091: rpc error: code = AlreadyExists
desc = store locally for endpoint receive_distributor_ingestor_mode-receive-i1:9091: conflict;
replicate write request for endpoint receive_distributor_ingestor_mode-receive-i2:9091: quorum not reached:
2 errors: forwarding request to endpoint receive_distributor_ingestor_mode-receive-i3:9091: rpc error: code = InvalidArgument
desc = replica count exceeds replication factor; forwarding request to endpoint receive_distributor_ingestor_mode-receive-i2:9091: rpc error: code = AlreadyExists
desc = store locally for endpoint receive_distributor_ingestor_mode-receive-i2:9091: conflict;
replicate write request for endpoint receive_distributor_ingestor_mode-receive-i3:9091: quorum not reached:
2 errors: forwarding request to endpoint receive_distributor_ingestor_mode-receive-i1:9091: rpc error: code = InvalidArgument
desc = replica count exceeds replication factor; forwarding request to endpoint receive_distributor_ingestor_mode-receive-i3:9091: rpc error: code = AlreadyExists
desc = store locally for endpoint receive_distributor_ingestor_mode-receive-i3:9091: conflict" msg="internal server error"
This happens because when router constructs downstream requests, the replica number n is supplied in the request.
When single ingestor instances are spun up, their default receive.replication-factor is set to 1. When they receive the request from the router, they return errBadReplica responses.
The result is that the first downstream request succeeds, but all subsequent ones fail.
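The gist of the mismatch can be shown with a minimal sketch, assuming illustrative names (handler, handleWrite) and a simplified replica-numbering scheme rather than the exact Thanos source:

```go
// Minimal sketch of the replica-count validation involved (illustrative
// names, not the actual Thanos handler). The router fans the request out
// with one replica number per copy; an ingestor started with the default
// --receive.replication-factor=1 rejects any replica number above its own
// factor, so only one of the downstream writes can succeed.
package main

import (
	"errors"
	"fmt"
)

var errBadReplica = errors.New("replica count exceeds replication factor")

type handler struct {
	replicationFactor uint64 // the ingestor's own --receive.replication-factor
}

// handleWrite mirrors the validation an ingestor performs on an incoming
// write request that already carries a replica number.
func (h *handler) handleWrite(replica uint64) error {
	// Anything above the locally configured replication factor is rejected.
	if replica > h.replicationFactor {
		return errBadReplica
	}
	return nil // store locally / forward as usual
}

func main() {
	ingestor := &handler{replicationFactor: 1} // default on a standalone ingestor

	// A router with --receive.replication-factor=3 sends three replicas.
	for replica := uint64(1); replica <= 3; replica++ {
		fmt.Printf("replica %d: %v\n", replica, ingestor.handleWrite(replica))
	}
	// Only the first replica is accepted, so quorum (2 of 3) is never reached.
}
```

With the router fanning out three copies and each standalone ingestor only willing to accept replica numbers up to its own factor of 1, at most one of the three downstream writes succeeds and the 2-of-3 quorum is never reached.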
2: SingleNodeHashring GetN where n > 0
If a single ingestor instance receives a request, and the replica number n is > 0, the request will fail with insufficientNodesError.
The ingestor correctly believes there is only one node in its hashring (itself) and rejects all requests that indicate they are for larger hashrings.
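A simplified, paraphrased view of that behaviour (types trimmed down; not a verbatim copy of the Thanos hashring code):

```go
// SingleNodeHashring routes everything to the one endpoint it wraps. With
// only itself as a member, any replica index above 0 asks for more nodes
// than exist, so the request is rejected.
package main

import "fmt"

type insufficientNodesError struct{ have, want uint64 }

func (e *insufficientNodesError) Error() string {
	return fmt.Sprintf("insufficient nodes; have %d, want %d", e.have, e.want)
}

type SingleNodeHashring string

// GetN returns the endpoint for replica n; n > 0 can never be satisfied.
func (s SingleNodeHashring) GetN(n uint64) (string, error) {
	if n > 0 {
		return "", &insufficientNodesError{have: 1, want: n + 1}
	}
	return string(s), nil
}

func main() {
	h := SingleNodeHashring("receive-i1:9091")
	fmt.Println(h.GetN(0)) // receive-i1:9091 <nil>
	fmt.Println(h.GetN(1)) // insufficient nodes; have 1, want 2
}
```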
3: SimpleHashring GetN where n > len(hashring)
Similarly to the above, if a router or router & ingestor is configured with a hashring of more than one instance and it receives a request with n greater than the number of instances, it will return an insufficientNodesError.
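The multi-node case follows the same pattern; the sketch below is again illustrative (hashing details simplified, names hypothetical) rather than the actual implementation:

```go
// A replica index at or beyond the number of ring members cannot be mapped
// to a distinct node, so an insufficient-nodes error is returned.
package main

import (
	"fmt"
	"hash/fnv"
)

type simpleHashring []string

// getN picks the node for replica n of a series key, refusing indices that
// would need more nodes than the ring contains.
func (s simpleHashring) getN(key string, n uint64) (string, error) {
	if n >= uint64(len(s)) {
		return "", fmt.Errorf("insufficient nodes; have %d, want %d", len(s), n+1)
	}
	h := fnv.New64a()
	h.Write([]byte(key))
	return s[(h.Sum64()+n)%uint64(len(s))], nil
}

func main() {
	ring := simpleHashring{"receive-i1:9091", "receive-i2:9091"}
	fmt.Println(ring.getN("tenantA{series}", 1)) // ok: second replica
	fmt.Println(ring.getN("tenantA{series}", 2)) // error: only 2 nodes in the ring
}
```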
4: Replicated requests can't be replicated again.
Let's consider the following architecture, which is valid under the split proposal.
If router1 and router2 both have receive.replication-factor set to 2, we would expect the data to be replicated three times across all ingestor instances.
Instead, what we find is that the data is only replicated twice.
This happens because when router2 receives the request from router1, the storepb.WriteRequest replica field is > 0, which indicates that this request has already been replicated (which is true).
Since this request has already been replicated, router2 will forward this request to a maximum of one downstream hashring instance (src).
This mechanism prevents infinite loops in fully-connected Router & Ingestor topologies, but prevents us from forming functional trees of depth n.
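A hedged sketch of that fan-out decision (illustrative structure, not the exact handler code) shows why the second router never multiplies the request again:

```go
// Once a request arrives with a non-zero replica number it is treated as
// already replicated and forwarded to a single node, so a second-level
// router adds no replication of its own.
package main

import "fmt"

type router struct {
	name              string
	replicationFactor uint64
}

// fanOut returns how many downstream copies this router produces for a
// request that arrived carrying the given replica number.
func (r *router) fanOut(incomingReplica uint64) uint64 {
	if incomingReplica > 0 {
		// Already replicated upstream: forward to at most one node. This
		// breaks loops in fully-connected topologies, but also stops trees
		// of routers from adding their own replication.
		return 1
	}
	return r.replicationFactor
}

func main() {
	router1 := &router{name: "router1", replicationFactor: 2}
	router2 := &router{name: "router2", replicationFactor: 2}

	// Client -> router1: replica 0, so router1 produces 2 copies.
	fmt.Println("router1 copies:", router1.fanOut(0))

	// Each copy reaches router2 with replica > 0, so it is forwarded once,
	// not replicated again: the series ends up on only 2 ingestors.
	fmt.Println("router2 copies per request:", router2.fanOut(1))
}
```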
Integration Tests
Three additional e2e integration tests that exercise the above failure cases have been added: #4362
We think we have a fix for the above behaviour, but the ergonomics and explainability to end users will be poor. The plan is to propose this PR and discuss it at the next Thanos contributor hours.
Thanos, Prometheus and Golang version used:
HEAD
v2.26.0
go version go1.16.4 linux/amd64
Object Storage Provider:
N/A