-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement repair functionality for aliases colliding with indices bug #91887
Implement repair functionality for aliases colliding with indices bug #91887
Conversation
Follow up to elastic#91456 implementing an automated fix for indices corrupted in 8.5.
Pinging @elastic/es-distributed (Team:Distributed) |
Jenkins run elasticsearch-ci/part-1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could write one that corrupts the index in 8.5 and then does a rolling upgrade
I think the level of testing here is adequate for ensuring it is fixed on upgrade.
var updatedBuilder = ImmutableOpenMap.builder(aliasesMap); | ||
final var brokenAlias = updatedBuilder.remove(index); | ||
final var fixedAlias = AliasMetadata.newAliasMetadata(brokenAlias, index + "-alias-corrupted-by-8-5"); | ||
aliasesMap = updatedBuilder.fPut(fixedAlias.getAlias(), fixedAlias).build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a risk that adding an alias with the same name as the index for an index created in 8.5 will now succeed and just result in the "corrupted" alias name instead?
Can we perhaps find or add an integration test that verifies that when adding aliases we have validation in place that catches this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right ... this is quite annoying. I could obviously add the validation for this at another level as well (not sure where, it seems not entirely trivial to add this check elsewhere in a way that would only catch mutations) but it does add some extra complexity to the whole thing. Is it worth doing that for this corner case? The name collision checks are quite complicated as it is already and checking for this specifically would add yet more complexity at some other layer ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a buildAndRepair
method that is used from IndexMetadata.fromXContent
, passing down a boolean to a private build
method in order to only do the repair from that path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recon my comment being slightly convoluted, so here is a diff of what I am thinking of:
Index: server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetadata.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetadata.java b/server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetadata.java
--- a/server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetadata.java (revision c6565a39e26f6c91f8e6f71f8021e07f8805639d)
+++ b/server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetadata.java (date 1669302455554)
@@ -2016,6 +2016,10 @@
}
public IndexMetadata build() {
+ return build(false);
+ }
+
+ private IndexMetadata build(boolean repair) {
/*
* We expect that the metadata has been properly built to set the number of shards and the number of replicas, and do not rely
* on the default values here. Those must have been set upstream.
@@ -2144,7 +2148,7 @@
var aliasesMap = aliases.build();
for (AliasMetadata alias : aliasesMap.values()) {
if (alias.alias().equals(index)) {
- if (indexCreatedVersion.equals(Version.V_8_5_0)) {
+ if (repair && indexCreatedVersion.equals(Version.V_8_5_0)) {
var updatedBuilder = ImmutableOpenMap.builder(aliasesMap);
final var brokenAlias = updatedBuilder.remove(index);
final var fixedAlias = AliasMetadata.newAliasMetadata(brokenAlias, index + "-alias-corrupted-by-8-5");
@@ -2208,6 +2212,11 @@
);
}
+ // visible for testing.
+ IndexMetadata buildAndRepair() {
+ return build(true);
+ }
+
@SuppressWarnings("unchecked")
public static void toXContent(IndexMetadata indexMetadata, XContentBuilder builder, ToXContent.Params params) throws IOException {
Metadata.XContentContext context = Metadata.XContentContext.valueOf(
@@ -2476,7 +2485,7 @@
assert settingsVersion : "settings version should be present for indices created on or after 6.5.0";
assert indexCreatedVersion(builder.settings).before(Version.V_7_2_0) || aliasesVersion
: "aliases version should be present for indices created on or after 7.2.0";
- return builder.build();
+ return builder.buildAndRepair();
}
/**
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about that, but the would mean that the fix will only apply with a full cluster restart and not with a rolling restart? Wouldn't that be painful for Cloud?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also build and repair when receiving the index metadata over the wire maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could I guess :) lets do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to from--x-content, reading from the wire and (not sure this is necessary but I found it weird for it to behave different form normal transport read) diff application now :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@@ -504,11 +505,21 @@ public void testIndexAndAliasWithSameName() { | |||
.numberOfShards(1) | |||
.numberOfReplicas(0) | |||
.putAlias(AliasMetadata.builder("index").build()) | |||
.build() | |||
.build(randomBoolean()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we expand this test (or add one) to also verify that for indices created in v8.5, we reject an alias of the same name as the index?
Thanks Henning! |
…elastic#91887) Follow up to elastic#91456 implementing an automated fix for indices corrupted in 8.5.
💔 Backport failed
You can use sqren/backport to manually backport by running |
…elastic#91887) Follow up to elastic#91456 implementing an automated fix for indices corrupted in 8.5.
…es bug (#91993) * Fix corrupted Metadata from index and alias having the same name (#91456) This fixes a bug introduced in #87863 which added a Metadata copy constructor with separate name collision checks that assumed index name and alias names were already validated in `IndexMetada`. => fixed corrupted states by actually adding the validation to `IndexMetadata` to make it impossible to instantiate a broken `IndexMetadata` in the first place. * Implement repair functionality for aliases colliding with indices bug (#91887) Follow up to #91456 implementing an automated fix for indices corrupted in 8.5.
* upstream/main: (209 commits) Remove unused methods and classes from HLRC (elastic#92030) Clean up on exception while chunking XContent (elastic#92024) Add profiling plugin (elastic#91640) Remove unused methods and classes from HLRC (elastic#92012) Remove IndexerState from HLRC (elastic#92023) Ensure cached time elapses in ClusterServiceIT (elastic#91986) Chunked encoding for RestGetIndicesAction (elastic#92016) Simplify shardsWithState (elastic#91991) [DOCS] Updates ML decider docs by mentioning CPU as scaling criterion (elastic#92018) Add chunking to ClusterState.Custom impls (elastic#91963) Speedup time_series agg by caching current tsid ordinal, parent bucket ordinal and buck ordinal (elastic#91784) Drop the ingest listener call count tracking (elastic#92003) [DOCS] fixes issue number 91889 - missing [discrete] header (elastic#91976) Fix PersistentTasksClusterServiceTests (elastic#92002) [docs] Update search-settings documentation to reflect the fact that the indices.query.bool.max_clause_count setting has been deprecated (elastic#91811) Clarify writability in Netty4HttpPipeliningHandler (elastic#91982) Load stable plugins as synthetic modules (elastic#91869) Handle any exception thrown while generating source for an IngestDocument (elastic#91981) fixing Apache HttpHost url on java-rest doc (elastic#91945) Implement repair functionality for aliases colliding with indices bug (elastic#91887) ...
Follow up to #91456 implementing an automated fix for indices corrupted in 8.5.
Not sure this should get an automated test, could write one that corrupts the index in 8.5 and then does a rolling upgrade but maybe manually testing this is good enough in this case?