[DISCUSS - Segment Replication] SegRep consistency limitations #8700
Labels: discuss, enhancement, feedback needed, Storage, v2.10.0
Background:
Segment Replication (SegRep) currently has a known limitation: it does not support the strong read-after-write mechanisms available to Document Replication (DocRep) indices. These mechanisms are real-time reads via get/multi-get by ID, and writes with RefreshPolicy.WAIT_UNTIL followed by a search. Today, the only way to achieve a strong read with Segment Replication is to use Preference.Prefer_Primary to route requests to primary shards.
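For reference, the two mechanisms look roughly like this with the Java high-level REST client (a minimal sketch; the index name, doc ID, and exact package paths are illustrative and may differ across versions):

```java
import org.opensearch.action.get.GetRequest;
import org.opensearch.action.get.GetResponse;
import org.opensearch.action.index.IndexRequest;
import org.opensearch.action.support.WriteRequest;
import org.opensearch.client.RequestOptions;
import org.opensearch.client.RestHighLevelClient;
import org.opensearch.common.xcontent.XContentType;

public class ReadAfterWriteExamples {
    // Mechanism 1: real-time GET by ID reads from the translog/version map if the doc
    // is not yet searchable, giving read-after-write semantics on DocRep indices.
    static GetResponse realtimeGet(RestHighLevelClient client) throws Exception {
        GetRequest get = new GetRequest("my-index", "doc-1").realtime(true); // realtime is the default
        return client.get(get, RequestOptions.DEFAULT);
    }

    // Mechanism 2: WAIT_UNTIL blocks the index response until a refresh makes the
    // write visible, so a follow-up search is guaranteed to see it.
    static void waitUntilWrite(RestHighLevelClient client) throws Exception {
        IndexRequest index = new IndexRequest("my-index")
                .id("doc-1")
                .source("{\"field\":\"value\"}", XContentType.JSON)
                .setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL);
        client.index(index, RequestOptions.DEFAULT);
    }
}
```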
The Problem:
The issue with supporting these mechanisms under SegRep is that they hold resources in memory. GET requires a “version map” to be maintained in the engine, mapping doc IDs to their translog locations, while writes with WAIT_UNTIL hold open refresh listeners. With DocRep these resources can be cleared with a local refresh when a limit is reached. With SegRep only primaries can issue a local refresh to clear them, because replicas refresh only after receiving copied segments. This means these structures grow without bound on replicas until segment copy completes.
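To illustrate why these structures grow, here is a deliberately simplified sketch; this is not the actual engine code, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified illustration of the per-shard state behind realtime GET and WAIT_UNTIL.
// On DocRep replicas a local refresh can drain both structures; on SegRep replicas a
// refresh only happens after segments are copied from the primary, so until segment
// copy completes both keep growing with every write.
class ShardReadAfterWriteState {
    // "Version map": doc ID -> translog location, consulted by realtime GET
    // before the doc is searchable. Cleared on refresh.
    final Map<String, Long> versionMap = new ConcurrentHashMap<>();

    // Open WAIT_UNTIL listeners, completed once a refresh makes the write visible.
    final List<Runnable> refreshListeners = new ArrayList<>();

    void onIndex(String docId, long translogLocation, Runnable waitUntilListener) {
        versionMap.put(docId, translogLocation);
        if (waitUntilListener != null) {
            refreshListeners.add(waitUntilListener);
        }
    }

    // Only a refresh releases this memory; a SegRep replica cannot trigger one locally.
    void onRefresh() {
        versionMap.clear();
        refreshListeners.forEach(Runnable::run);
        refreshListeners.clear();
    }
}
```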
For context, the issues where we explored supporting these existing mechanisms:
WAIT_UNTIL - #6045
GET/MGET - #8536
A streaming Index API could resolve this limitation by acknowledging a write request only once a chosen consistency level has been reached. However, until that exists, and for get requests in particular, I'd like to list some ideas and start the discussion on how to deal with this. Please comment if there is any option I've missed. I think Option 1 provides the best shorter-term solution until we have the streaming API for search.
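Purely for illustration, an acknowledgment model for such a streaming API might look like the hypothetical interface below; none of these types exist today:

```java
// Hypothetical sketch of a streaming Index API acknowledgment model: the client
// receives acks as the write reaches progressively stronger consistency levels,
// instead of blocking the whole request on WAIT_UNTIL.
enum ConsistencyLevel {
    DURABLE,      // persisted to the primary's translog
    SEARCHABLE,   // visible to search on the primary
    REPLICATED    // replicas have received the containing segments
}

interface StreamingIndexResponseListener {
    // Invoked once per level as the write reaches it; a caller that needs a strong
    // read waits for SEARCHABLE or REPLICATED before issuing its search.
    void onAck(ConsistencyLevel level);
}
```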
1. Primary shard based reads
Internally route all get/mget requests to primary shards when SegRep is enabled and the realtime param is true (the default); more on this option is provided in #8536. Clearly document that this could hurt performance in read-heavy cases or if the primary is overloaded. Require users to update to prefer _primary for any search that currently follows a WAIT_UNTIL write. A rough routing sketch follows this option's pros/cons.
Pros:
Cons:
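A rough sketch of the Option 1 routing decision; this is illustrative only, and the helper and its placement are assumptions rather than existing OpenSearch code:

```java
// Hypothetical helper: force realtime get/mget to the primary when the index
// uses segment replication, since only the primary's version map is up to date.
class SegRepGetRouting {
    static String resolvePreference(boolean segRepEnabled, boolean realtime, String requestedPreference) {
        if (segRepEnabled && realtime) {
            // Route to the primary shard copy regardless of what the client asked for.
            return "_primary";
        }
        return requestedPreference; // DocRep or non-realtime gets keep existing routing
    }
}
```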
2. Do nothing
All requests requiring strong reads would require a client update to prefer primary shards with segment replication; a client-side example follows this option's pros/cons.
Pros:
Cons:
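Under Option 2, clients needing strong reads would set the preference themselves, for example (sketch using the Java high-level REST client; index and ID values are illustrative):

```java
import org.opensearch.action.get.GetRequest;
import org.opensearch.action.search.SearchRequest;

class PreferPrimaryReads {
    // Client-side change Option 2 requires: explicitly prefer primary shard copies
    // for any read that must observe the latest writes.
    static GetRequest strongGet() {
        return new GetRequest("my-index", "doc-1").preference("_primary");
    }

    static SearchRequest strongSearch() {
        return new SearchRequest("my-index").preference("_primary");
    }
}
```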
3. Constrained GET and WAIT_UNTIL requests.
In this approach we would update GET and WAIT_UNTIL to enforce hard caps that safeguard against memory issues; a cap-enforcement sketch follows this option's pros/cons.
Get - For get/mget this means we would need to throttle writes until replicas are caught up, so that the memory footprint of the doc-to-translog-location mapping does not grow unbounded. This is similar to today's SegRep backpressure mechanism, which applies pressure when a replica falls too far behind based on replication lag and checkpoint count; however, it would also include a primary-computed memory threshold.
WAIT_UNTIL - We would put a hard cap on the number of open wait_until writes per shard. In this case we would still support the refresh policy, but rather than relying solely on a local refresh to clear requests, we would track the number of wait_until requests open against replicas and reject writes once this exceeds a limit.
Pros:
Cons:
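A sketch of what the Option 3 hard caps could look like; the limits, names, and hook points are assumptions, not existing OpenSearch code:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-shard cap enforcement for Option 3.
class SegRepWriteCaps {
    // Assumed limits; in practice these would likely be dynamic cluster/index settings.
    static final int MAX_OPEN_WAIT_UNTIL = 1000;
    static final long MAX_VERSION_MAP_BYTES = 64L * 1024 * 1024;

    final AtomicInteger openWaitUntilListeners = new AtomicInteger();
    final AtomicLong versionMapBytes = new AtomicLong();

    // Reject WAIT_UNTIL writes once too many listeners are pending on replica refreshes.
    boolean acceptWaitUntilWrite() {
        if (openWaitUntilListeners.get() >= MAX_OPEN_WAIT_UNTIL) {
            return false; // caller would translate this into a write rejection
        }
        openWaitUntilListeners.incrementAndGet();
        return true;
    }

    // Throttle writes when the primary-computed estimate of the doc-ID ->
    // translog-location map exceeds the memory threshold, alongside the existing
    // SegRep backpressure signals (replication lag, checkpoint count).
    boolean shouldThrottleWrites() {
        return versionMapBytes.get() >= MAX_VERSION_MAP_BYTES;
    }
}
```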