[Feature] Cross Cluster Replication based on segment replication #3020
Comments
Continuing feedback from OpenSearch-2482
Interesting thought, @nknize. Docrep keeps the implementation simple, while segrep would likely require maintaining two flavors of each segment (one for the leader's own use and one for the follower). With more followers, the leader may have to maintain even more flavors of segments, along with the corresponding bookkeeping. In short, building filtering support on top of segrep seems to come at a cost in storage and complexity.
Leveraging cloud storage to durably replicate at scale is definitely a great option to evaluate. In fact, I don't see this as all-or-nothing: the underlying implementation can leverage cloud storage instead of replicating segments directly from nodes. We would still need a CCR layer, mainly to keep a hot standby cluster ready to take over with assurance when disaster strikes, and to simplify CCR management via APIs. Underneath, cloud storage can act as an intermediary to avoid a direct dependency. CCR also takes care of replicating metadata and aliases, and provides APIs that expose stats, facilitate index-level replication, etc. It would be interesting to investigate a continuously replicated "cold replica", wherein the cloud storage in the follower's geography has everything ready up to the point in time disaster struck. This could address cost concerns at the expense of increased time to recovery.
Is your feature request related to a problem?
Currently, CCR implements logical document replication, wherein operations from the translog are replayed on the follower cluster. With segment replication coming into OpenSearch core, basing CCR on it has the potential to bring significant benefits, such as faster replication and reduced memory and CPU usage.
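To make the contrast concrete, here is a toy sketch of the two approaches. It is not OpenSearch or CCR plugin code: `Op`, `LogicalFollower`, and `SegmentFollower` are hypothetical names, documents are plain dicts, and segments are opaque byte blobs keyed by file name. The point is only that logical replication re-executes every operation on the follower, while segment replication copies files the leader has already built.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Op:
    """One translog-style operation (hypothetical shape, for illustration)."""
    seq_no: int
    doc_id: str
    source: Optional[dict]  # None means "delete this document"


class LogicalFollower:
    """Docrep-style: re-executes every operation, so indexing work is repeated."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}
        self.checkpoint = -1   # highest seq_no applied so far
        self.ops_applied = 0   # rough proxy for CPU spent re-indexing

    def replay(self, ops: List[Op]) -> None:
        for op in sorted(ops, key=lambda o: o.seq_no):
            if op.seq_no <= self.checkpoint:
                continue  # already applied, so replaying a batch twice is safe
            if op.source is None:
                self.docs.pop(op.doc_id, None)
            else:
                self.docs[op.doc_id] = op.source
            self.checkpoint = op.seq_no
            self.ops_applied += 1


class SegmentFollower:
    """Segrep-style: copies immutable segment files, with no re-indexing."""

    def __init__(self) -> None:
        self.segments: Dict[str, bytes] = {}

    def sync(self, leader_segments: Dict[str, bytes]) -> List[str]:
        # Fetch only the segment files the follower does not have yet.
        missing = sorted(n for n in leader_segments if n not in self.segments)
        for name in missing:
            self.segments[name] = leader_segments[name]
        # Drop segments the leader no longer references (e.g. after a merge).
        for name in list(self.segments):
            if name not in leader_segments:
                del self.segments[name]
        return missing
```

In this sketch the logical follower does work proportional to the number of operations, whereas the segment follower does work proportional to the number of new segment files, which is where the expected CPU and memory savings would come from.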
What solution would you like?
Implement CCR based on segment replication.
What alternatives have you considered?
Not really, but there is an RFC for a pluggable translog. It would be good to investigate whether there are strong reasons to keep logical replication (instead of physical segment replication) and, if so, whether basing CCR on the pluggable translog has merit.
Do you have any additional context?
Need to ensure the RFC considers the following
There is also a ton of useful discussion on this topic at 2482
cc: @nknize @mikemccand