-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix merging component templates with a mix of dotted and nested object mapper definitions #106077
Fix merging component templates with a mix of dotted and nested object mapper definitions #106077
Conversation
…t mapper definitions
Pinging @elastic/es-search (Team:Search) |
Hi @felixbarny, I've created a changelog YAML for you. |
server/src/test/java/org/elasticsearch/index/mapper/MapperServiceTests.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do remember having the feeling that the actual merge reason is not fully followed through, but didn't find a proper test case. I hope there are no additional cases where hard-coded reason is used instead of the actual one.
Other than that, what a magnificent combination of code analysis, fixing and storytelling! I want to see the Gen AI application that does something like that 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this Felix and for the awesome description (a thing of beauty!)
I left some rather minor nits and a proposal to remove one of the merge
methods which I sketched in a branch (to make sure it's not a humongous change - and it doesn't seem like it would be +27, -61
): cdb4711
The purpose is to (hopefully) simplify the matrix of available public methods and avoid a potentially subtle bug where a merge
method would be called with a reason
that's not the same as the one configured in the mapper merge context
We have already a few situations like this in tests (which I fixed in the branch I sketched out the proposal in):
NestedObjectMapper result = (NestedObjectMapper) firstMapper.merge(
secondMapper,
MapperService.MergeReason.INDEX_TEMPLATE,
MapperMergeContext.root(false, false, Long.MAX_VALUE)
);
What do you think? (if you agree feel free to cherry-pick the commit or reimplement, as you wish - it's on a branch based on your branch so it should be straightforward to pick)
server/src/main/java/org/elasticsearch/index/mapper/MapperBuilderContext.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/MapperBuilderContext.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/MapperMergeContext.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/ObjectMapper.java
Outdated
Show resolved
Hide resolved
This removes the (additional) `merge` method with signature ``` public Mapper merge(Mapper mergeWith, MapperService.MergeReason reason, MapperMergeContext mapperMergeContext); ``` in favour of the existing ``` public abstract Mapper merge(Mapper mergeWith, MapperMergeContext mapperMergeContext); ``` as the `MapperMergeContext` arealdy contains the merge reason. This will help the API clients avoid the pitfalls that already exist in tests where the provided merge `reason` is not the same as the `MapperMergeContext#reason`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some rather minor nits and a proposal to remove one of the merge methods which I sketched in a branch (to make sure it's not a humongous change - and it doesn't seem like it would be +27, -61): cdb4711
Thanks, I've cherry-picked your commit.
server/src/main/java/org/elasticsearch/index/mapper/MapperBuilderContext.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/MapperMergeContext.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/ObjectMapper.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating on this Felix.
LGTM (once CI is happy) 🚀
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
…t mapper definitions (elastic#106077) Co-authored-by: Andrei Dan <[email protected]> (cherry picked from commit ab52ef1)
…d object mapper definitions (#106077) (#107254) * Fix merging component templates with a mix of dotted and nested object mapper definitions (#106077) Co-authored-by: Andrei Dan <[email protected]> (cherry picked from commit ab52ef1) * Remove unnecessary subobjects flag in buildMergedMappers this sneaked in while resolving merge conflicts
Fixes #105482
The background
This regression was introduced in #97317. That PR added a new way how component templates get merged. When creating mappings for an index that's about to be created, we previously merged component templates one-by-one. The issue with this sequential merging approach is that some settings (like
subobjects: false
) in one component template may affect how other component templates need to be parsed. This makes it dependent on ordering. Consider that there are component templates c1 and c2 that are merged sequentially. In c1,subobjects
is set to false. This affects how c2 needs to be parsed. However, if c2 setssubobjects: false
, before the PR, it couldn't affect how c1 is parsed, because it has already been parsed before considering c2.The PR introduced a way to bulk-merge multiple component templates at once (see the changes in MetadataCreateIndexService). In the first step, the mapping sources are merged (merging of a
Map<String, Object>
). This ensures that thesubobjects
parameter, for example, is determined for the combination of the mappings, before the mapping source is actually parsed into aRootObjectMapper
.So far, all is well and good.
When there are two component templates that have a mix of dotted and object notation, their sources get merged like this:
Still, so far, so good.
When parsing the mappings, the
RootObjectMapper
will then have twoObjectMapper.Builder
instances for theparent
object. This is also not issue so far, because there's code inObjectMapper.Builder#buildMappers
that deals with that exact scenario and merges the two objects:elasticsearch/server/src/main/java/org/elasticsearch/index/mapper/ObjectMapper.java
Lines 160 to 167 in a97552b
This is where the trouble starts: when looking at the implementation of
ObjectMapper#merge
, it the merge reason is hard-coded toMAPPING_UPDATE
.elasticsearch/server/src/main/java/org/elasticsearch/index/mapper/ObjectMapper.java
Lines 458 to 460 in a97552b
But wait, we're still in a context where the merge reason should be
INDEX_TEMPLATE
- we're merging multiple component templates.Why does the merge reason matter?
Within the merge process, the merge reason determines how to deal with field mappers for the same field:
elasticsearch/server/src/main/java/org/elasticsearch/index/mapper/ObjectMapper.java
Lines 592 to 598 in a97552b
Because the wrong merge reason is selected, the two different definitions of the
child
field ({"child" : {"type" : "keyword"}}
and{"child" : {"type" : "integer"}}
) are merged instead of replaced. However, it's not allowed to change the type of a field when merging two different field mappers. This is why this saga ends with an exception being thrown here:elasticsearch/server/src/main/java/org/elasticsearch/index/mapper/FieldMapper.java
Lines 404 to 408 in 179739e
The fix
The PR #97317 made crucial improvements on how component templates get merged. We don't want to roll these back. Instead, the focus of the bug fix is to preserve the right merge reason, depending on the context.
To do so, I've added the merge reason to
MapperBuilderContext
. This seemed like the most natural choice to hand down the merge reason and avoids having to add anotherMergeReason
parameter to even more methods. This merge reason is the one to use when merging mappers while building the mapper (when callingObjectMapper#merge
inObjectMapper.Builder#buildMappers
).When creating a
MapperBuilderContext
, aMergeReason
can be supplied. I kept thepublic static MapperBuilderContext root(boolean isSourceSynthetic, boolean isDataStream)
method which is using aMAPPING_UPDATE
merge reason by default. This is mainly so that I don't clutter this PR with a gazillion updated callers in test code. There are also a couple of callers in production code that I didn't feel the need to update by explicitly providing theMAPPING_UPDATE
merge reason. YMMV, but previously the entire code base was implicitly assuming aMAPPING_UPDATE
merge reason forObjectMappepr#merge
and this is now adding the ability to specify a different one. But in most places,MAPPING_UPDATE
is the appropriate merge reason.