429s due to replica operations #8140

bcastilho90 · 2024-10-21T19:25:27Z

I have a cluster with 3 master nodes and several data nodes. From time to time, we experience a brief period during which certain nodes return 429s due to replica operations:

es_rejected_execution_exception Reason: "rejected execution of primary operation [coordinating_and_primary_bytes=0, replica_bytes=2210080256, all_bytes=2210080256, primary_operation_bytes=30430, max_coordinating_and_primary_bytes=2147483648
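To make the numbers in that message easier to read, here is a small sketch that parses the `key=value` pairs and compares the replica bytes with the reported limit. The message text is copied from the error above (it is truncated there, so it is truncated here too); the variable names are only illustrative.

```python
import re

# Error text copied from the rejection above (truncated in the original report).
message = (
    "rejected execution of primary operation "
    "[coordinating_and_primary_bytes=0, replica_bytes=2210080256, "
    "all_bytes=2210080256, primary_operation_bytes=30430, "
    "max_coordinating_and_primary_bytes=2147483648"
)

# Collect every numeric key=value pair into a dict of ints.
fields = {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", message)}

limit = fields["max_coordinating_and_primary_bytes"]
overshoot = fields["replica_bytes"] - limit  # bytes of replica work above the limit
limit_gib = limit / 2**30

print(f"limit: {limit_gib:.0f} GiB")
print(f"replica bytes over limit: {overshoot}")
```

In this message the in-flight replica bytes alone are 62596608 bytes (about 60 MiB) above the 2 GiB limit, while coordinating and primary bytes are 0.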

I've scaled the cluster before, but we continue to see this intermittently, and it happens on random nodes. Is there a way to have the readiness probe fail in cases like this so that requests stop being sent to a node that is overloaded?
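To illustrate the kind of check I have in mind, here is a rough sketch, not an existing operator feature. It assumes the node stats API (`GET /_nodes/_local/stats/indexing_pressure`) is reachable from inside the pod, and it reads `memory.current.all_in_bytes` and `memory.limit_in_bytes` from the response. The function names, the `http://localhost:9200` URL, and the 0.9 threshold are hypothetical; authentication and TLS are left out.

```python
import json
import urllib.request


def indexing_pressure_ratio(node_stats: dict) -> float:
    """Return current indexing-pressure bytes as a fraction of the node's limit."""
    memory = node_stats["indexing_pressure"]["memory"]
    limit = memory["limit_in_bytes"]
    if limit <= 0:
        return 0.0
    return memory["current"]["all_in_bytes"] / limit


def is_ready(node_stats: dict, max_ratio: float = 0.9) -> bool:
    """Hypothetical readiness rule: not ready once usage reaches max_ratio."""
    return indexing_pressure_ratio(node_stats) < max_ratio


def fetch_local_node_stats(base_url: str = "http://localhost:9200") -> dict:
    """Fetch indexing-pressure stats for the local node (no auth/TLS handling)."""
    url = f"{base_url}/_nodes/_local/stats/indexing_pressure"
    with urllib.request.urlopen(url, timeout=3) as resp:
        body = json.load(resp)
    # The response is keyed by node ID; "_local" yields a single entry.
    return next(iter(body["nodes"].values()))
```

A probe script could call `is_ready(fetch_local_node_stats())` and exit non-zero when it returns `False`. Whether taking a node out of the service actually helps is unclear to me, since replica operations are sent by other nodes rather than by clients.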

I'm also not sure how to find out what is causing the backlog of replica operations.
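As a starting point, something like the sketch below could at least show which nodes the replica bytes pile up on. It assumes the JSON returned by `GET /_nodes/stats/indexing_pressure` has already been loaded into a dict, and it reads `memory.current.replica_in_bytes` and `memory.total.replica_rejections` per node. The function name and the sample data are made up for illustration. It only locates the backlog; it does not explain the cause.

```python
def replica_pressure_by_node(stats: dict) -> list[tuple[str, int, int]]:
    """Return (node name, current replica bytes, total replica rejections),
    sorted with the most loaded node first."""
    rows = []
    for node_id, node in stats["nodes"].items():
        memory = node["indexing_pressure"]["memory"]
        rows.append((
            node.get("name", node_id),            # fall back to the node ID
            memory["current"]["replica_in_bytes"],
            memory["total"]["replica_rejections"],
        ))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up sample shaped like the node stats response.
sample = {
    "nodes": {
        "id-a": {"name": "data-0", "indexing_pressure": {"memory": {
            "current": {"replica_in_bytes": 1024},
            "total": {"replica_rejections": 0}}}},
        "id-b": {"name": "data-1", "indexing_pressure": {"memory": {
            "current": {"replica_in_bytes": 2210080256},
            "total": {"replica_rejections": 17}}}},
    }
}

for name, replica_bytes, rejections in replica_pressure_by_node(sample):
    print(f"{name}: replica_in_bytes={replica_bytes} replica_rejections={rejections}")
```

Running this periodically against the real stats would show whether the same nodes keep accumulating replica bytes or whether it really moves around at random.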

botelastic bot added the triage label Oct 21, 2024
