[RFC] Cross Cluster Replication as a Core Component #2872
Comments
Created a dedicated issue to discuss/track segment replication: cross-cluster-replication/issues/373.
Hi @krishna-ggk -- just continuing this discussion a bit from #2482:
Oh, you can choose where you do the filtering, either in the source cluster or the target cluster. So if security is important, always use … Or maybe just always filter at the source and don't risk security issues :) In Lucene we are also working on making …
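As a rough illustration of the source-side filtering option (a hypothetical sketch only: `LeaderFilterPolicy`, `shouldReplicate`, and the `tenant` field are invented here and are not part of the CCR plugin's actual API), the leader could evaluate each document before any operations are shipped, so filtered-out data never leaves the source cluster:

```java
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical leader-side filter: documents are evaluated on the source
 * cluster before their operations are handed to the replication transport,
 * so filtered-out data never reaches the follower.
 */
final class LeaderFilterPolicy {

    private final Set<String> allowedTenants;

    LeaderFilterPolicy(Set<String> allowedTenants) {
        this.allowedTenants = allowedTenants;
    }

    /** Returns true if the document may be replicated to the follower. */
    boolean shouldReplicate(Map<String, Object> docSource) {
        Object tenant = docSource.get("tenant"); // assumed field name, for illustration
        return tenant != null && allowedTenants.contains(tenant.toString());
    }

    public static void main(String[] args) {
        LeaderFilterPolicy policy = new LeaderFilterPolicy(Set.of("public"));
        System.out.println(policy.shouldReplicate(Map.of("tenant", "public"))); // true
        System.out.println(policy.shouldReplicate(Map.of("tenant", "secret"))); // false -> never shipped
    }
}
```

Filtering at the target would instead require shipping everything and dropping documents on arrival, which is why filtering at the source is the safer default when security matters.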
OK I see. I think this is yet another reason to add a bulk streaming indexing API to OpenSearch -- the synchronous (ack'd on every bulk request) bulk write model is not great for several reasons: it pushes concurrency tweaking (for higher throughput) out to clients, forcing them to also deal with …

With a streaming indexing API instead / in addition, users could send an endless stream of docs and the cluster could "do the right thing", dynamically picking appropriate concurrency and block sizes internally based on available cluster resources. Finally, when the client closes the bulk stream, at that point they get a write ack. This would allow users who do not need such synchronous durability to amortize that cost better.
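To make the contrast concrete, here is a minimal sketch of what such a streaming indexing client could look like (all names here -- `StreamingIndexClient`, `openStream`, the single ack on `close` -- are hypothetical; no such API exists in OpenSearch today). The client just feeds documents; batching and concurrency are decided server-side, and the one acknowledgement arrives when the stream is closed:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical streaming indexing client. Instead of ack'ing every _bulk
 * request, the caller pushes documents into an open stream and receives a
 * single acknowledgement when the stream is closed. Concurrency and block
 * sizes would be chosen by the cluster, not the client.
 */
final class StreamingIndexClient {

    /** Stand-in for a server-side stream; here it just buffers locally. */
    static final class IndexStream implements AutoCloseable {
        private final List<String> pending = new ArrayList<>();

        void send(String jsonDoc) {
            // In a real implementation this would hand the doc to the cluster,
            // which batches and parallelizes internally based on load.
            pending.add(jsonDoc);
        }

        /** Closing the stream is the durability point: one ack for everything sent. */
        @Override
        public void close() {
            System.out.println("ack: " + pending.size() + " docs durably indexed");
        }
    }

    IndexStream openStream(String index) {
        return new IndexStream();
    }

    public static void main(String[] args) {
        StreamingIndexClient client = new StreamingIndexClient();
        try (IndexStream stream = client.openStream("logs")) {
            for (int i = 0; i < 1_000; i++) {
                stream.send("{\"msg\":\"event-" + i + "\"}");
            }
        } // single write ack here, amortizing the durability cost
    }
}
```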
I don't think we should fragment the conversation. Segrep is a core component, and I think we need to look at baking CCR mechanisms with segrep into the core. I'm not convinced that looks like a completely separate CCR module rather than using a segment-based snapshot-leader / restore-follower approach. If it does, that module should live in core. Maybe we could use this as a meta issue and relocate #3020 to core?
I think this makes sense absent remote storage but should still be a solution baked into segrep (more likely for on-prem use cases). If a user opts into remote storage, however, we should rely on the durability of the remote storage mechanism.
This is a HUGE problem today. The Streaming Index API can not only determine the appropriate level of concurrency based on resource load, but also accept a user-defined Durability Policy to guide the type of remote storage (warm, cold, etc.), which documents to replicate across clusters (if any), and whether or not to use a translog. It decouples the durability configuration as input to the segment replication setup.
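A hedged sketch of what such a user-defined durability policy could look like as input to the streaming setup (every name below -- `DurabilityPolicy`, `StorageTier`, `replicateCrossCluster`, `useTranslog` -- is invented for illustration and is not an existing OpenSearch setting):

```java
/**
 * Hypothetical durability policy handed to the streaming indexing setup.
 * It decouples durability configuration (storage tier, cross-cluster copies,
 * translog usage) from the segment replication machinery itself.
 */
final class DurabilityPolicy {

    enum StorageTier { HOT, WARM, COLD }

    final StorageTier remoteTier;          // which remote storage tier to target
    final boolean replicateCrossCluster;   // whether docs are also shipped to follower clusters
    final boolean useTranslog;             // whether a local translog is kept at all

    DurabilityPolicy(StorageTier remoteTier, boolean replicateCrossCluster, boolean useTranslog) {
        this.remoteTier = remoteTier;
        this.replicateCrossCluster = replicateCrossCluster;
        this.useTranslog = useTranslog;
    }

    public static void main(String[] args) {
        // Example: a remote-store-backed index that skips the translog, targets
        // warm remote storage, and replicates to a follower cluster.
        DurabilityPolicy policy = new DurabilityPolicy(StorageTier.WARM, true, false);
        System.out.println("tier=" + policy.remoteTier
            + " crossCluster=" + policy.replicateCrossCluster
            + " translog=" + policy.useTranslog);
    }
}
```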
Done - I transferred cross-cluster-replication/issues/373 into OpenSearch/issues/3020. All I wanted to ensure was that we discuss segrep and the architectural details of moving to core separately. Anyway, agreed on avoiding further fragmentation of the conversation 👍
Thanks for the suggestions. Very likely we will end up filtering at the source from a security standpoint; however, we can evaluate this more deeply while implementing it.
Nice!
Is your feature request related to a problem? Please describe.
CCR is currently an external plugin written in Kotlin that relies on internal components of the engine which do not guarantee backwards compatibility. This has already caused issues with carrying CCR-specific API settings in the core, and now there is a concern about CPU performance when retrieving operations history from the Lucene index. This external plugin requires core implementation guarantees that the core was not designed to provide to external plugins.
Describe the solution you'd like
Refactor CCR as a core module and offer it as a core feature. Not only would this provide the internal compatibility that is not currently guaranteed to external plugins, but it would also allow the feature to be re-implemented on top of the segment replication feature, replicating segments directly to the following cluster instead of slowly replaying operations from the translog.
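As a very rough sketch of that segment-shipping idea (assumed names only -- `SegmentShipper`, `syncToCheckpoint`, and the checkpoint structure are illustrative, not the actual segment replication internals), the follower would diff the leader's latest segment file list against what it already has and copy only the missing files, rather than replaying operations:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical follower-side segment sync: given the leader's latest
 * checkpoint (a list of segment file names), copy only the files the
 * follower does not already have, instead of replaying operations.
 */
final class SegmentShipper {

    /** Files already present on the follower shard. */
    private final Set<String> localFiles = new HashSet<>();

    /** Returns the files that had to be fetched from the leader. */
    List<String> syncToCheckpoint(List<String> leaderSegmentFiles) {
        List<String> fetched = new ArrayList<>();
        for (String file : leaderSegmentFiles) {
            if (localFiles.add(file)) {
                // In a real implementation this would stream the segment file
                // bytes from the leader (or from remote storage) to local disk.
                fetched.add(file);
            }
        }
        return fetched;
    }

    public static void main(String[] args) {
        SegmentShipper follower = new SegmentShipper();
        System.out.println(follower.syncToCheckpoint(List.of("_0.cfs", "_0.si", "segments_1")));
        // Next checkpoint: only the new files are copied.
        System.out.println(follower.syncToCheckpoint(
            List.of("_0.cfs", "_0.si", "_1.cfs", "_1.si", "segments_2")));
    }
}
```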
Note: for now we should explore refactoring CCR from Kotlin to Java, as we have no idea what kind of interpreter trickery Kotlin plays that might lead to crazy direct memory OOMs like the one discovered in the ILM issue.
Describe alternatives you've considered
None. The current implementation should be considered obsolete and refactored once segment replication becomes GA.
Additional context
Related issues: