Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Broken Numeric Shard Generations in RepositoryData #57813

Conversation

original-brownbear
Copy link
Member

@original-brownbear original-brownbear commented Jun 8, 2020

Fix broken numeric shard generations when reading them from the wire
or physically from the physical repository.
This seems to me to be the cheapest way to clean up broken shard generations
in a BwC and safe-to-backport manner for now. We can potentially
further optimize this by also not doing the checks on the generations
based on the versions we see in the RepositoryData but I don't think
it matters much since we will read RepositoryData from cache in almost
all cases.

Closes #57798

Fix broken numeric shard generations when reading them from the wire
or physically from the physical repository.
This should be the cheapest way to clean up broken shard generations
in a BwC and safe-to-backport manner for now. We can potentially
further optimize this by also not doing the checks on the generations
based on the versions we see in the `RepositoryData` but I don't think
it matters much since we will read `RepositoryData` from cache in almost
all cases.

Closes elastic#57798
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jun 8, 2020
@original-brownbear
Copy link
Member Author

original-brownbear commented Jun 8, 2020

Hold off one second with reviewing, just thought of a problematic corner case that needs a (simple) fix still

Nevermind, problematic corner case is actually good I think.

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Armin, I've left one stylistic comment, and one question

shardGenerations.put(indexId, i, gens.get(i));
final String parsedGen = gens.get(i);
shardGenerations.put(
indexId, i, fixBrokenShardGens ? ShardGenerations.fixShardGeneration(parsedGen) : parsedGen);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should avoid putting null values into the map (in particular as ShardGenerations.Builder.put does not seem to define @Nullable String generation, and the semantics of putting null into map can be problematic, as the Map interface says the behavior is underspecified whether a NPE is thrown). I would prefer a boolean method here with conditional logic

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ done

@@ -129,7 +129,7 @@ public static void assertConsistency(BlobStoreRepository repository, Executor ex
try (InputStream blob = blobContainer.readBlob("index-" + latestGen);
XContentParser parser = XContentType.JSON.xContent().createParser(NamedXContentRegistry.EMPTY,
LoggingDeprecationHandler.INSTANCE, blob)) {
repositoryData = RepositoryData.snapshotsFromXContent(parser, latestGen);
repositoryData = RepositoryData.snapshotsFromXContent(parser, latestGen, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why true here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right my bad no reason to be lenient here ... moved to false.

@original-brownbear
Copy link
Member Author

Thanks @ywelsch => I pushed 655c7f3

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also get this into 7.7.2 (if we ever going to release that one)?

@original-brownbear
Copy link
Member Author

original-brownbear commented Jun 8, 2020

Thanks!

@ywelsch about the backport, I was going to ask the same about the other PR as well ... I don't see why not? => I'll backport this and the other one to 7.7.2?

@ywelsch
Copy link
Contributor

ywelsch commented Jun 8, 2020

👍

@original-brownbear original-brownbear merged commit db09e80 into elastic:master Jun 8, 2020
@original-brownbear original-brownbear deleted the broken-repo-data-auto-repair branch June 8, 2020 15:14
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jun 8, 2020
Fix broken numeric shard generations when reading them from the wire
or physically from the physical repository.
This should be the cheapest way to clean up broken shard generations
in a BwC and safe-to-backport manner for now. We can potentially
further optimize this by also not doing the checks on the generations
based on the versions we see in the `RepositoryData` but I don't think
it matters much since we will read `RepositoryData` from cache in almost
all cases.

Closes elastic#57798
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jun 8, 2020
Fix broken numeric shard generations when reading them from the wire
or physically from the physical repository.
This should be the cheapest way to clean up broken shard generations
in a BwC and safe-to-backport manner for now. We can potentially
further optimize this by also not doing the checks on the generations
based on the versions we see in the `RepositoryData` but I don't think
it matters much since we will read `RepositoryData` from cache in almost
all cases.

Closes elastic#57798
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jun 8, 2020
Fix broken numeric shard generations when reading them from the wire
or physically from the physical repository.
This should be the cheapest way to clean up broken shard generations
in a BwC and safe-to-backport manner for now. We can potentially
further optimize this by also not doing the checks on the generations
based on the versions we see in the `RepositoryData` but I don't think
it matters much since we will read `RepositoryData` from cache in almost
all cases.

Closes elastic#57798
original-brownbear added a commit that referenced this pull request Jun 8, 2020
Fix broken numeric shard generations when reading them from the wire
or physically from the physical repository.
This should be the cheapest way to clean up broken shard generations
in a BwC and safe-to-backport manner for now. We can potentially
further optimize this by also not doing the checks on the generations
based on the versions we see in the `RepositoryData` but I don't think
it matters much since we will read `RepositoryData` from cache in almost
all cases.

Closes #57798
original-brownbear added a commit that referenced this pull request Jun 8, 2020
Fix broken numeric shard generations when reading them from the wire
or physically from the physical repository.
This should be the cheapest way to clean up broken shard generations
in a BwC and safe-to-backport manner for now. We can potentially
further optimize this by also not doing the checks on the generations
based on the versions we see in the `RepositoryData` but I don't think
it matters much since we will read `RepositoryData` from cache in almost
all cases.

Closes #57798
original-brownbear added a commit that referenced this pull request Jun 8, 2020
Fix broken numeric shard generations when reading them from the wire
or physically from the physical repository.
This should be the cheapest way to clean up broken shard generations
in a BwC and safe-to-backport manner for now. We can potentially
further optimize this by also not doing the checks on the generations
based on the versions we see in the `RepositoryData` but I don't think
it matters much since we will read `RepositoryData` from cache in almost
all cases.

Closes #57798
@original-brownbear original-brownbear restored the broken-repo-data-auto-repair branch August 6, 2020 18:28
@kostasb
Copy link

kostasb commented Sep 3, 2020

Hey @original-brownbear it looks like we missed mentioning this PR in the release notes.
It was backported to 7.7 but we never released 7.7.2, and it's also merged into 7.8 onwards. But I can't find it in the release notes of any version - should it be added to 7.8.0 release notes?

@original-brownbear original-brownbear deleted the broken-repo-data-auto-repair branch September 3, 2020 08:33
@original-brownbear
Copy link
Member Author

@kostasb

should it be added to 7.8.0 release notes?

Yea I guess technically that would be correct. Not sure how relevant it is at this point, but technically it's the first version released with this fix.

@original-brownbear original-brownbear restored the broken-repo-data-auto-repair branch December 6, 2020 19:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.7.2 v7.8.0 v7.9.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Snapshot Repositories Containing a Mix of pre and post v7.6 Snapshots Can Become Corrupted
5 participants