-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deduplicate mappings in persisted cluster state #88479
Deduplicate mappings in persisted cluster state #88479
Conversation
to reflect specifically that there's no action required for an index, specifically
There's no tracking of 'mappings changed' because we're storing things by hash, so mappings are removed and/or added but we can't see 'changes' as a reified thing.
The behavior of writeIncrementalStateAndCommit relies on the previousState that we pass in actually being a correct representation of the on-disk state -- this test did not respect that. Rather, it did clusterState->newClusterState twice in a row, so now we inject a null->clusterState between the two so that everything works right. Also we're doing more writes now, so I bumped the 14 constant to 16.
Pinging @elastic/es-distributed (Team:Distributed) |
I think this is ready for review, @original-brownbear, if you have the time and inclination to take a look. I won't merge it without an explicit ✅ from @DaveCTurner, though, and I know I won't get that until he's back early next week. |
Hi @joegallo, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with one style nit in the tests
@@ -1629,6 +1894,18 @@ private NodeEnvironment newNodeEnvironment(Path[] dataPaths) throws IOException | |||
); | |||
} | |||
|
|||
private static MappingMetadata randomMappingMetadata(boolean allowNull) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: slight preference for this being two methods randomMappingMetadata
vs randomMappingMetadataOrNull
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍, I was on the fence about that myself. I'll add a commit.
Replacement for #87126, related to #77466
Following up on #82608, which compacted the network serialization of mappings in the cluster state's metadata. In that PR, mappings are serialized as a map keyed by hash, and then when serializing index metadata the appropriate mapping can be referenced by its hash only, saving network traffic in the (expected common) case of indices with duplicate mappings.
This PR extends that same logic to the on-disk serialization of a cluster state via the PersistedClusterStateService. The efficiency gain in this case is greater, however, because in the case of an index settings change, for example, the mapping and associated hash are unchanged and so the on-disk mapping doesn't need to be re-persisted.
/cc @DaveCTurner