Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deduplicate mappings in persisted cluster state #88479

Conversation

joegallo
Copy link
Contributor

@joegallo joegallo commented Jul 12, 2022

Replacement for #87126, related to #77466

Following up on #82608, which compacted the network serialization of mappings in the cluster state's metadata. In that PR, mappings are serialized as a map keyed by hash, and then when serializing index metadata the appropriate mapping can be referenced by its hash only, saving network traffic in the (expected common) case of indices with duplicate mappings.

This PR extends that same logic to the on-disk serialization of a cluster state via the PersistedClusterStateService. The efficiency gain in this case is greater, however, because in the case of an index settings change, for example, the mapping and associated hash are unchanged and so the on-disk mapping doesn't need to be re-persisted.

/cc @DaveCTurner

DaveCTurner and others added 22 commits May 25, 2022 16:07
to reflect specifically that there's no action required for an index, specifically
There's no tracking of 'mappings changed' because we're storing things
by hash, so mappings are removed and/or added but we can't see
'changes' as a reified thing.
The behavior of writeIncrementalStateAndCommit relies on the
previousState that we pass in actually being a correct representation
of the on-disk state -- this test did not respect that. Rather, it did
clusterState->newClusterState twice in a row, so now we inject a
null->clusterState between the two so that everything works right.

Also we're doing more writes now, so I bumped the 14 constant to 16.
@joegallo joegallo added WIP :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v8.4.0 labels Jul 12, 2022
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jul 12, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@joegallo joegallo requested a review from DaveCTurner July 13, 2022 21:19
@joegallo joegallo removed the WIP label Jul 13, 2022
@joegallo
Copy link
Contributor Author

joegallo commented Jul 13, 2022

I think this is ready for review, @original-brownbear, if you have the time and inclination to take a look. I won't merge it without an explicit ✅ from @DaveCTurner, though, and I know I won't get that until he's back early next week.

@elasticsearchmachine
Copy link
Collaborator

Hi @joegallo, I've created a changelog YAML for you.

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one style nit in the tests

@@ -1629,6 +1894,18 @@ private NodeEnvironment newNodeEnvironment(Path[] dataPaths) throws IOException
);
}

private static MappingMetadata randomMappingMetadata(boolean allowNull) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: slight preference for this being two methods randomMappingMetadata vs randomMappingMetadataOrNull

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍, I was on the fence about that myself. I'll add a commit.

@joegallo joegallo merged commit 9d882e9 into elastic:master Jul 18, 2022
@joegallo joegallo deleted the 2022-05-25-deduplicate-mappings-in-persisted-state branch July 18, 2022 19:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants