Auto follow patterns should not auto follow internal hidden indices / data streams #81750

Open
martijnvg opened this issue Dec 15, 2021 · 5 comments
Labels: :Data Management/Data streams, :Distributed Indexing/CCR, >enhancement, Team:Data Management, Team:Distributed (Obsolete)

Comments

@martijnvg
Member

Today, when an auto follow pattern in CCR is set up to follow (almost) everything, internal data streams / indices from the remote cluster can be auto followed too, for example the SLM, ILM, or Watcher history. I don't think that these data streams / indices should be auto followed, because each cluster has their own history for each of those components. Following them would only make the ILM/SLM/Watcher history in the follower cluster harder to understand, since it would contain history from the remote clusters as well as from the local cluster.

There is hard-coded logic in auto follow patterns that never auto follows system data streams and indices. I think we should have something similar for internal hidden data streams / indices.

Maybe we just determine the list of hidden indices/data streams to exclude based on the internal IndexTemplateRegistry instances? I also wonder whether in general hidden indices/data streams should be replicated? Maybe we can add a parameter to auto follow patterns that controls whether hidden data streams/indices are replicated (and default to false)?
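To make the proposed behavior concrete, here is a minimal Python sketch of how an auto follow pattern could decide which remote indices to follow. It is not Elasticsearch code: `RemoteIndex`, `indices_to_follow`, and the `follow_hidden` flag are hypothetical names, pattern matching is approximated with `fnmatch`, and the index names are illustrative only. System indices are always skipped (the existing hard-coded rule), and hidden indices are skipped unless the caller opts in.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Iterable, List


@dataclass(frozen=True)
class RemoteIndex:
    """Hypothetical view of an index (or backing index) on the remote cluster."""
    name: str
    hidden: bool = False
    system: bool = False


def indices_to_follow(
    remote_indices: Iterable[RemoteIndex],
    leader_patterns: List[str],
    follow_hidden: bool = False,  # hypothetical parameter; proposed default is false
) -> List[str]:
    """Return the names of remote indices an auto follow pattern would follow."""
    selected = []
    for index in remote_indices:
        # Existing behavior: system indices / data streams are never auto followed.
        if index.system:
            continue
        # Proposed behavior: hidden indices are skipped unless explicitly requested.
        if index.hidden and not follow_hidden:
            continue
        # fnmatch approximates the wildcard matching of leader index patterns.
        if any(fnmatch(index.name, pattern) for pattern in leader_patterns):
            selected.append(index.name)
    return selected


# Illustrative remote cluster contents (names are examples only).
remote = [
    RemoteIndex("logs-app-000001"),
    RemoteIndex(".ds-ilm-history-5-000001", hidden=True),
    RemoteIndex(".internal-system-index", hidden=True, system=True),
]

print(indices_to_follow(remote, ["*"]))
print(indices_to_follow(remote, ["*"], follow_hidden=True))
```

With the default, a catch-all `*` pattern follows only `logs-app-000001`; opting in with `follow_hidden=True` adds the hidden history index, while the system index stays excluded either way.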

@martijnvg added the :Distributed Indexing/CCR, team-discuss, and :Data Management/Data streams labels on Dec 15, 2021
@elasticmachine added the Team:Data Management and Team:Distributed (Obsolete) labels on Dec 15, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@gwbrown
Contributor

gwbrown commented Mar 22, 2022

> I don't think that these data streams / indices should be auto followed, because each cluster has their own history for each of those components.

I think this reasoning is sound for almost all cases, but I've had users ask if we would consider supporting CCR for system indices to better support the disaster recovery use case, and I suspect the same folks would be using (or want to use) this functionality to replicate hidden indices as well, even though that data is "just" informational.

> Maybe we just determine the list of hidden indices/data streams to exclude based on the internal IndexTemplateRegistry instances? I also wonder whether in general hidden indices/data streams should be replicated?

I'd advocate for "hidden indices don't get replicated" over "Elastic's hidden indices don't get replicated" just for simplicity's sake; anything that involves magical lists of things in code is difficult to troubleshoot.

> Maybe we can add a parameter to auto follow patterns that controls whether hidden data streams/indices are replicated (and default to false)?

I'd prefer doing this over saying that hidden indices never get replicated, as I think this covers the DR case I mentioned above pretty well while simplifying the model for most users. This also gives us a migration path: Add the new parameter & deprecate the default, then eventually switch the default when we can.

@martijnvg
Member Author

Thanks @gwbrown for sharing your thoughts here. I agree with them.

> I'd advocate for "hidden indices don't get replicated" over "Elastic's hidden indices don't get replicated" just for simplicity's sake

👍

> This also gives us a migration path: Add the new parameter & deprecate the default, then eventually switch the default when we can.

👍

@Leaf-Lin
Contributor

We discussed this in the @elastic/es-distributed meeting and agree with the approach proposed above.
