
Implement repair functionality for aliases colliding with indices bug #91887

Conversation

original-brownbear
Member

Follow up to #91456 implementing an automated fix for indices corrupted in 8.5.

Not sure this should get an automated test. We could write one that corrupts the index in 8.5 and then does a rolling upgrade, but maybe manually testing this is good enough in this case?

@original-brownbear original-brownbear added >non-issue :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v8.6.0 v8.7.0 v8.5.3 labels Nov 24, 2022
@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Nov 24, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@original-brownbear
Member Author

Jenkins run elasticsearch-ci/part-1

Contributor

@henningandersen left a comment


could write one that corrupts the index in 8.5 and then does a rolling upgrade

I think the level of testing here is adequate for ensuring it is fixed on upgrade.

var updatedBuilder = ImmutableOpenMap.builder(aliasesMap);
final var brokenAlias = updatedBuilder.remove(index);
final var fixedAlias = AliasMetadata.newAliasMetadata(brokenAlias, index + "-alias-corrupted-by-8-5");
aliasesMap = updatedBuilder.fPut(fixedAlias.getAlias(), fixedAlias).build();
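The hunk above renames the colliding alias rather than dropping it, so whatever the alias carried survives under a clearly marked name. A minimal sketch of the same idea, using a plain java.util.Map in place of Elasticsearch's ImmutableOpenMap and AliasMetadata (the class, method, and value type here are illustrative, not the actual production code):

import java.util.HashMap;
import java.util.Map;

class AliasCollisionRepairSketch {
    // Mirrors the "-alias-corrupted-by-8-5" marker used in the hunk above.
    private static final String CORRUPTED_SUFFIX = "-alias-corrupted-by-8-5";

    // If an alias shares its name with the index, re-key it under a marker name
    // so the alias metadata is preserved instead of being silently dropped.
    static Map<String, String> repairAliases(String index, Map<String, String> aliases) {
        if (aliases.containsKey(index) == false) {
            return aliases;
        }
        Map<String, String> repaired = new HashMap<>(aliases);
        String brokenAlias = repaired.remove(index);
        repaired.put(index + CORRUPTED_SUFFIX, brokenAlias);
        return repaired;
    }
}

In the real code the values are AliasMetadata instances and the renamed entry is rebuilt via AliasMetadata.newAliasMetadata, as shown in the hunk.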
Contributor


Is there a risk that adding an alias with the same name as the index for an index created in 8.5 will now succeed and just result in the "corrupted" alias name instead?

Can we perhaps find or add an integration test that verifies that when adding aliases we have validation in place that catches this?
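
The validation being referred to is the check added to IndexMetadata's build path in #91456 (visible in the diff further down this thread as the `alias.alias().equals(index)` branch). A heavily simplified, self-contained sketch of that kind of build-time check follows; the class and method names, exception type, and message are assumptions for illustration:

import java.util.Collection;

class AliasValidationSketch {
    // Hypothetical stand-in for the build-time check; the real code iterates AliasMetadata values.
    static void ensureNoAliasShadowsIndex(String index, Collection<String> aliasNames) {
        for (String alias : aliasNames) {
            if (alias.equals(index)) {
                // Assumption: the real validation surfaces this as an IllegalArgumentException.
                throw new IllegalArgumentException(
                    "alias name [" + alias + "] must not be the same as the index name"
                );
            }
        }
    }
}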

Member Author


Oh right ... this is quite annoying. I could obviously add the validation for this at another level as well (not sure where; it seems not entirely trivial to add this check elsewhere in a way that would only catch mutations), but it does add some extra complexity to the whole thing. Is it worth doing that for this corner case? The name collision checks are quite complicated already, and checking for this specifically would add yet more complexity at some other layer ...

Contributor


Could we add a buildAndRepair method that is used from IndexMetadata.fromXContent, passing down a boolean to a private build method in order to only do the repair from that path?

Contributor


I reckon my comment was slightly convoluted, so here is a diff of what I am thinking of:

diff --git a/server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetadata.java b/server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetadata.java
--- a/server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetadata.java	(revision c6565a39e26f6c91f8e6f71f8021e07f8805639d)
+++ b/server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetadata.java	(date 1669302455554)
@@ -2016,6 +2016,10 @@
         }
 
         public IndexMetadata build() {
+            return build(false);
+        }
+
+        private IndexMetadata build(boolean repair) {
             /*
              * We expect that the metadata has been properly built to set the number of shards and the number of replicas, and do not rely
              * on the default values here. Those must have been set upstream.
@@ -2144,7 +2148,7 @@
             var aliasesMap = aliases.build();
             for (AliasMetadata alias : aliasesMap.values()) {
                 if (alias.alias().equals(index)) {
-                    if (indexCreatedVersion.equals(Version.V_8_5_0)) {
+                    if (repair && indexCreatedVersion.equals(Version.V_8_5_0)) {
                         var updatedBuilder = ImmutableOpenMap.builder(aliasesMap);
                         final var brokenAlias = updatedBuilder.remove(index);
                         final var fixedAlias = AliasMetadata.newAliasMetadata(brokenAlias, index + "-alias-corrupted-by-8-5");
@@ -2208,6 +2212,11 @@
             );
         }
 
+        // visible for testing.
+        IndexMetadata buildAndRepair() {
+            return build(true);
+        }
+
         @SuppressWarnings("unchecked")
         public static void toXContent(IndexMetadata indexMetadata, XContentBuilder builder, ToXContent.Params params) throws IOException {
             Metadata.XContentContext context = Metadata.XContentContext.valueOf(
@@ -2476,7 +2485,7 @@
             assert settingsVersion : "settings version should be present for indices created on or after 6.5.0";
             assert indexCreatedVersion(builder.settings).before(Version.V_7_2_0) || aliasesVersion
                 : "aliases version should be present for indices created on or after 7.2.0";
-            return builder.build();
+            return builder.buildAndRepair();
         }
 
         /**

Member Author


I was thinking about that, but that would mean the fix would only apply after a full cluster restart and not after a rolling restart? Wouldn't that be painful for Cloud?

Contributor


We could also build and repair when receiving the index metadata over the wire maybe?

Member Author


We could, I guess :) Let's do it.

Member Author


Added to fromXContent, reading from the wire, and (not sure this is necessary, but I found it weird for it to behave differently from a normal transport read) diff application now :)

Contributor

@henningandersen left a comment


LGTM.

@@ -504,11 +505,21 @@ public void testIndexAndAliasWithSameName() {
 .numberOfShards(1)
 .numberOfReplicas(0)
 .putAlias(AliasMetadata.builder("index").build())
-.build()
+.build(randomBoolean())
Contributor


Could we expand this test (or add one) to also verify that for indices created in v8.5, we reject an alias of the same name as the index?
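
A rough sketch of the kind of test being asked for, reusing the builder chain from the hunk above inside an ESTestCase-style test; the settings key and version constant follow the usual Elasticsearch conventions, but the test name and the exception type are assumptions rather than the test that was eventually merged:

public void testAliasWithSameNameAsIndexIsRejectedFor85CreatedIndex() {
    // Mark the index as created in 8.5.0, the version affected by the corruption bug.
    final Settings settings = Settings.builder()
        .put(IndexMetadata.SETTING_VERSION_CREATED, Version.V_8_5_0)
        .build();
    // Assumption: a plain build() (i.e. no repair path) rejects the colliding alias.
    expectThrows(
        IllegalArgumentException.class,
        () -> IndexMetadata.builder("index")
            .settings(settings)
            .numberOfShards(1)
            .numberOfReplicas(0)
            .putAlias(AliasMetadata.builder("index").build())
            .build()
    );
}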

@original-brownbear
Member Author

Thanks Henning!

@original-brownbear original-brownbear merged commit 046592e into elastic:main Nov 29, 2022
@original-brownbear original-brownbear deleted the repair-index-alias-collision branch November 29, 2022 09:53
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Nov 29, 2022
…elastic#91887)

Follow up to elastic#91456 implementing an automated fix for indices corrupted in 8.5.
@elasticsearchmachine
Collaborator

💔 Backport failed

Status | Branch | Result
       | 8.6    |
       | 8.5    | Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 91887

elasticsearchmachine pushed a commit that referenced this pull request Nov 29, 2022
…#91887) (#91988)

Follow up to #91456 implementing an automated fix for indices corrupted in 8.5.
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Nov 29, 2022
…elastic#91887)

Follow up to elastic#91456 implementing an automated fix for indices corrupted in 8.5.
original-brownbear added a commit that referenced this pull request Nov 29, 2022
…es bug (#91993)

* Fix corrupted Metadata from index and alias having the same name (#91456)

This fixes a bug introduced in #87863
which added a Metadata copy constructor with separate name collision checks that
assumed index name and alias names were already validated in `IndexMetadata`.
=> fixed corrupted states by actually adding the validation to `IndexMetadata`
to make it impossible to instantiate a broken `IndexMetadata` in the first place.

* Implement repair functionality for aliases colliding with indices bug (#91887)

Follow up to #91456 implementing an automated fix for indices corrupted in 8.5.
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Dec 2, 2022
* upstream/main: (209 commits)
  Remove unused methods and classes from HLRC (elastic#92030)
  Clean up on exception while chunking XContent (elastic#92024)
  Add profiling plugin (elastic#91640)
  Remove unused methods and classes from HLRC (elastic#92012)
  Remove IndexerState from HLRC (elastic#92023)
  Ensure cached time elapses in ClusterServiceIT (elastic#91986)
  Chunked encoding for RestGetIndicesAction (elastic#92016)
  Simplify shardsWithState (elastic#91991)
  [DOCS] Updates ML decider docs by mentioning CPU as scaling criterion (elastic#92018)
  Add chunking to ClusterState.Custom impls (elastic#91963)
  Speedup time_series agg by caching current tsid ordinal, parent bucket ordinal and buck ordinal (elastic#91784)
  Drop the ingest listener call count tracking (elastic#92003)
  [DOCS] fixes issue number 91889 - missing [discrete] header (elastic#91976)
  Fix PersistentTasksClusterServiceTests (elastic#92002)
  [docs] Update search-settings documentation to reflect the fact that the indices.query.bool.max_clause_count setting has been deprecated (elastic#91811)
  Clarify writability in Netty4HttpPipeliningHandler (elastic#91982)
  Load stable plugins as synthetic modules (elastic#91869)
  Handle any exception thrown while generating source for an IngestDocument (elastic#91981)
  fixing Apache HttpHost url on java-rest doc (elastic#91945)
  Implement repair functionality for aliases colliding with indices bug (elastic#91887)
  ...
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Dec 12, 2022
DaveCTurner added a commit that referenced this pull request Dec 12, 2022
DaveCTurner added a commit that referenced this pull request Dec 12, 2022
DaveCTurner added a commit that referenced this pull request Dec 12, 2022
@original-brownbear original-brownbear restored the repair-index-alias-collision branch April 18, 2023 20:54