[POC] Cross Cluster Replication as a core component #7222

Open
ankitkala opened this issue Apr 18, 2023 · 3 comments
Labels

  • enhancement: Enhancement or improvement to existing feature or request
  • Indexing:Replication: Issues and PRs related to core replication framework, e.g. segrep

ankitkala commented Apr 18, 2023

This issue documents the POC for Cross Cluster Replication in core.

Key Highlights:

  • This implementation works only for SegRep & remote store backed indices.
  • Contrary to the existing CCR design, the leader is connected to the follower via seed nodes. The control flow is therefore push-based: follower operations are event-driven instead of polling-based. The data flow is pull-based: the follower pulls the data (and metadata) from the remote store.
  • This implementation relies on the leader and follower indices pointing to the same remote store repository. We need to explore further whether we should move to separate remote stores, with data replicating between the remote buckets instead.
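
To make the push-control / pull-data split concrete, here is a minimal Python sketch. It is not the POC code: `RemoteStore`, `Leader`, `Follower` and their methods are hypothetical stand-ins, and the shared remote store repository is modeled as an in-memory dict. The leader's notification carries only segment file names; the follower fetches any files it is missing from the shared store.

```python
from dataclasses import dataclass, field


@dataclass
class RemoteStore:
    """Hypothetical stand-in for the shared remote store repository."""
    files: dict = field(default_factory=dict)  # segment file name -> bytes


@dataclass
class Follower:
    remote: RemoteStore
    local_files: dict = field(default_factory=dict)

    def on_segments_uploaded(self, segment_names):
        """Event handler invoked by the leader (push); there is no polling loop."""
        missing = [n for n in segment_names if n not in self.local_files]
        for name in missing:
            # Data flow is pull: the follower reads the file from the remote store.
            self.local_files[name] = self.remote.files[name]
        return missing


@dataclass
class Leader:
    remote: RemoteStore
    followers: list = field(default_factory=list)

    def refresh(self, new_segments):
        """Upload segments, then push a notification that carries names only."""
        self.remote.files.update(new_segments)
        for follower in self.followers:
            follower.on_segments_uploaded(sorted(self.remote.files))


store = RemoteStore()
follower = Follower(remote=store)
leader = Leader(remote=store, followers=[follower])
leader.refresh({"_0.cfs": b"a", "_0.si": b"b"})
leader.refresh({"_1.cfs": b"c"})
```

After the two refreshes, the follower holds the same three files as the remote store, and the second notification only triggers a fetch of `_1.cfs`.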

What's covered:
As a user:

  • I'm able to start replication between 2 clusters on SegRep & remote store backed indices.
  • I'm able to continuously replicate all the documents from leader to follower index.

What's not covered:

  • Ability to add multiple followers.
  • Persisting replication states (index and follower level) and metadata.
  • Failover between primary & replica.
  • Failure handling (e.g. node/cluster outages).
  • Integration with security.

Data flow from leader to follower:

[Figure: Indexing flow]

Start Replication flow:

  • The user invokes the API to start full cluster replication with follower aliases.
  • The REST handler starts a persistent task, SupervisorReplicationTask, to monitor the overall replication process.
  • Upon starting, SupervisorReplicationTask bootstraps and:
    • Creates one FollowerReplicationTask per follower. This persistent task monitors the state of the follower cluster and reacts to events such as the follower cluster not responding or replication being started/stopped/paused/resumed (not part of POC).
    • Creates one IndexReplicationTask per leader index.
      • Upon starting, IndexReplicationTask creates the follower index with the same settings & metadata as the leader index.
      • This persistent task tracks replication events for the index being replicated, such as index open/close/delete, mapping and setting updates, and primary relocation/failover (not part of POC).
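
The task hierarchy above can be sketched roughly as follows. This is a simplified Python model, not the actual persistent-task implementation: only the three task names come from the description, while the fields, the `bootstrap`/`start` methods, and the dict used to represent a follower cluster's indices are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FollowerReplicationTask:
    follower_alias: str
    state: str = "started"  # reacting to follower events is not part of the POC


@dataclass
class IndexReplicationTask:
    leader_index: str
    follower_index_created: bool = False

    def start(self, leader_settings, follower_indices):
        # Create the follower index with the same settings as the leader index.
        follower_indices[self.leader_index] = dict(leader_settings)
        self.follower_index_created = True


@dataclass
class SupervisorReplicationTask:
    follower_aliases: list
    leader_indices: dict  # leader index name -> settings
    follower_tasks: list = field(default_factory=list)
    index_tasks: list = field(default_factory=list)

    def bootstrap(self, follower_clusters):
        # One FollowerReplicationTask per follower.
        for alias in self.follower_aliases:
            self.follower_tasks.append(FollowerReplicationTask(alias))
        # One IndexReplicationTask per leader index.
        for name, settings in self.leader_indices.items():
            task = IndexReplicationTask(name)
            for alias in self.follower_aliases:
                task.start(settings, follower_clusters[alias])
            self.index_tasks.append(task)


followers = {"follower-1": {}}  # alias -> indices present on that cluster
supervisor = SupervisorReplicationTask(
    follower_aliases=["follower-1"],
    leader_indices={"logs": {"number_of_shards": 1},
                    "metrics": {"number_of_shards": 2}},
)
supervisor.bootstrap(followers)
```

With one follower and two leader indices, bootstrapping yields one follower task and two index tasks, and both indices exist on the follower with the leader's settings.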

New index Creation:

  • The follower receives a request to create the index. For CCR follower indices, we:
    • Override and load the NRTReplicationEngine for the follower's primary as well as its replicas.
    • Override the remote store path to point to the leader index's remote store.
    • Disable the translog manager (NOOP) (not part of POC).
    • With the follower's primary and replicas pointing to the leader's remote store, index recovery happens via the remote store.
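
As a rough illustration of these overrides, the sketch below models the resulting per-shard configuration in Python. `ShardConfig`, `follower_shard_config`, the field names, and the example path are hypothetical and do not correspond to real OpenSearch settings; only the NRTReplicationEngine name and the override behavior come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardConfig:
    engine: str
    remote_store_path: str
    translog_manager: str
    recovery_source: str


def follower_shard_config(leader_remote_path, is_primary, disable_translog=False):
    """Build the config for a CCR follower shard.

    is_primary is deliberately ignored: the follower's primary and replicas
    get the same engine and the same (leader's) remote store path.
    """
    return ShardConfig(
        engine="NRTReplicationEngine",
        remote_store_path=leader_remote_path,
        # Disabling the translog manager is listed as not part of the POC.
        translog_manager="noop" if disable_translog else "default",
        recovery_source="remote_store",
    )


primary = follower_shard_config("repo/leader-index", is_primary=True)
replica = follower_shard_config("repo/leader-index", is_primary=False)
```

The point of the sketch is that, on a follower, primary and replica shards end up with identical configurations and both recover from the leader's remote store.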

Data sync flow:

  • RemoteStoreRefreshListener.afterRefresh uploads the segments & segmentsInfo snapshot to the remote store.
  • After the upload is complete, it invokes RemoteStoreSegmentUploadNotificationPublisher.notifySegmentUpload.
  • RemoteStoreSegmentUploadNotificationPublisher.notifySegmentUpload will:
    • Call SegmentReplicationCheckpointPublisher.publish to notify the replicas.
    • Call NotifyCCRFollowersAction.publish -> leaderService.syncFollowerSegments, which invokes SyncFromLeaderAction on the follower nodes containing primary and replica shards.
  • SyncFromLeaderAction syncs the segments from the remote store and updates the IndexShard's reader.

Note: Notifying replicas and followers can be parallelized eventually.
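
One possible shape for that fan-out, sketched in Python with a thread pool. The two functions are stand-ins for SegmentReplicationCheckpointPublisher.publish and NotifyCCRFollowersAction.publish and simply return a tuple; the `parallel` flag and the integer checkpoint are assumptions for illustration, and the real actions would be asynchronous transport calls rather than plain functions.

```python
from concurrent.futures import ThreadPoolExecutor


def publish_to_replicas(checkpoint):
    """Stand-in for SegmentReplicationCheckpointPublisher.publish."""
    return ("replicas", checkpoint)


def notify_ccr_followers(checkpoint):
    """Stand-in for NotifyCCRFollowersAction.publish."""
    return ("followers", checkpoint)


def notify_segment_upload(checkpoint, parallel=False):
    """Notify replicas and CCR followers after a segment upload completes."""
    targets = [publish_to_replicas, notify_ccr_followers]
    if not parallel:
        # Sequential notification: replicas first, then followers.
        return [target(checkpoint) for target in targets]
    # Parallel variant: both notifications are submitted at once.
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = [pool.submit(target, checkpoint) for target in targets]
        # Results are collected in submission order, so the output is stable.
        return [future.result() for future in futures]


sequential = notify_segment_upload(7)
concurrent = notify_segment_upload(7, parallel=True)
```

Both variants deliver the same notifications; the parallel one just stops the follower notification from waiting on the replica notification.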

@ankitkala added the enhancement and untriaged labels Apr 18, 2023
@anasalkouz

Related Issue (RFC): #2872

@Bukhtawar added the Indexing:Replication label Jul 27, 2023
@github-project-automation bot moved this to Planned work items in OpenSearch Roadmap May 31, 2024