Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix merging component templates with a mix of dotted and nested object mapper definitions #106077

Merged
merged 10 commits into from
Apr 8, 2024

Conversation

felixbarny
Copy link
Member

@felixbarny felixbarny commented Mar 7, 2024

Fixes #105482

The background

This regression was introduced in #97317. That PR added a new way how component templates get merged. When creating mappings for an index that's about to be created, we previously merged component templates one-by-one. The issue with this sequential merging approach is that some settings (like subobjects: false) in one component template may affect how other component templates need to be parsed. This makes it dependent on ordering. Consider that there are component templates c1 and c2 that are merged sequentially. In c1, subobjects is set to false. This affects how c2 needs to be parsed. However, if c2 sets subobjects: false, before the PR, it couldn't affect how c1 is parsed, because it has already been parsed before considering c2.

The PR introduced a way to bulk-merge multiple component templates at once (see the changes in MetadataCreateIndexService). In the first step, the mapping sources are merged (merging of a Map<String, Object>). This ensures that the subobjects parameter, for example, is determined for the combination of the mappings, before the mapping source is actually parsed into a RootObjectMapper.

So far, all is well and good.

When there are two component templates that have a mix of dotted and object notation, their sources get merged like this:

{
  "properties": {
    "parent.child": {
      "type": "keyword"
    },
    "parent" : {
      "properties" : {
        "child" : {
          "type" : "integer"
        }
      }
    }
  }
}

Still, so far, so good.

When parsing the mappings, the RootObjectMapper will then have two ObjectMapper.Builder instances for the parent object. This is also not issue so far, because there's code in ObjectMapper.Builder#buildMappers that deals with that exact scenario and merges the two objects:

if (existing != null) {
// The same mappings or document may hold the same field twice, either because duplicated JSON keys are allowed or
// the same field is provided using the object notation as well as the dot notation at the same time.
// This can also happen due to multiple index templates being merged into a single mappings definition using
// XContentHelper#mergeDefaults, again in case some index templates contained mappings for the same field using a
// mix of object notation and dot notation.
mapper = existing.merge(mapper, MapperMergeContext.from(mapperBuilderContext, Long.MAX_VALUE));
}

This is where the trouble starts: when looking at the implementation of ObjectMapper#merge, it the merge reason is hard-coded to MAPPING_UPDATE.

public ObjectMapper merge(Mapper mergeWith, MapperMergeContext mapperMergeContext) {
return merge(mergeWith, MergeReason.MAPPING_UPDATE, mapperMergeContext);
}

But wait, we're still in a context where the merge reason should be INDEX_TEMPLATE - we're merging multiple component templates.

Why does the merge reason matter?

Within the merge process, the merge reason determines how to deal with field mappers for the same field:

// If we're merging template mappings when creating an index, then a field definition always
// replaces an existing one.
if (reason == MergeReason.INDEX_TEMPLATE) {
putMergedMapper(mergedMappers, mergeWithMapper);
} else {
putMergedMapper(mergedMappers, mergeIntoMapper.merge(mergeWithMapper, objectMergeContext));
}

Because the wrong merge reason is selected, the two different definitions of the child field ({"child" : {"type" : "keyword"}} and {"child" : {"type" : "integer"}} ) are merged instead of replaced. However, it's not allowed to change the type of a field when merging two different field mappers. This is why this saga ends with an exception being thrown here:

if (Objects.equals(this.getClass(), mergeWith.getClass()) == false) {
throw new IllegalArgumentException(
"mapper [" + name() + "] cannot be changed from type [" + contentType() + "] to [" + mergeWith.contentType() + "]"
);
}

The fix

The PR #97317 made crucial improvements on how component templates get merged. We don't want to roll these back. Instead, the focus of the bug fix is to preserve the right merge reason, depending on the context.

To do so, I've added the merge reason to MapperBuilderContext. This seemed like the most natural choice to hand down the merge reason and avoids having to add another MergeReason parameter to even more methods. This merge reason is the one to use when merging mappers while building the mapper (when calling ObjectMapper#merge in ObjectMapper.Builder#buildMappers).

When creating a MapperBuilderContext, a MergeReason can be supplied. I kept the public static MapperBuilderContext root(boolean isSourceSynthetic, boolean isDataStream) method which is using a MAPPING_UPDATE merge reason by default. This is mainly so that I don't clutter this PR with a gazillion updated callers in test code. There are also a couple of callers in production code that I didn't feel the need to update by explicitly providing the MAPPING_UPDATE merge reason. YMMV, but previously the entire code base was implicitly assuming a MAPPING_UPDATE merge reason for ObjectMappepr#merge and this is now adding the ability to specify a different one. But in most places, MAPPING_UPDATE is the appropriate merge reason.

@felixbarny felixbarny added >bug :Search Foundations/Mapping Index mappings, including merging and defining field types auto-backport-and-merge v8.13.0 v8.14.0 labels Mar 7, 2024
@felixbarny felixbarny requested review from javanna and piergm March 7, 2024 16:18
@elasticsearchmachine elasticsearchmachine added Team:Search Meta label for search team external-contributor Pull request authored by a developer outside the Elasticsearch team labels Mar 7, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine
Copy link
Collaborator

Hi @felixbarny, I've created a changelog YAML for you.

Copy link
Contributor

@eyalkoren eyalkoren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do remember having the feeling that the actual merge reason is not fully followed through, but didn't find a proper test case. I hope there are no additional cases where hard-coded reason is used instead of the actual one.

Other than that, what a magnificent combination of code analysis, fixing and storytelling! I want to see the Gen AI application that does something like that 😉

@andreidan andreidan self-requested a review April 4, 2024 10:00
Copy link
Contributor

@andreidan andreidan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this Felix and for the awesome description (a thing of beauty!)

I left some rather minor nits and a proposal to remove one of the merge methods which I sketched in a branch (to make sure it's not a humongous change - and it doesn't seem like it would be +27, -61): cdb4711

The purpose is to (hopefully) simplify the matrix of available public methods and avoid a potentially subtle bug where a merge method would be called with a reason that's not the same as the one configured in the mapper merge context
We have already a few situations like this in tests (which I fixed in the branch I sketched out the proposal in):

        NestedObjectMapper result = (NestedObjectMapper) firstMapper.merge(
            secondMapper,
            MapperService.MergeReason.INDEX_TEMPLATE,
            MapperMergeContext.root(false, false, Long.MAX_VALUE)
        );

What do you think? (if you agree feel free to cherry-pick the commit or reimplement, as you wish - it's on a branch based on your branch so it should be straightforward to pick)

andreidan and others added 3 commits April 8, 2024 13:18
This removes the (additional) `merge`  method with signature
```
public Mapper merge(Mapper mergeWith, MapperService.MergeReason reason, MapperMergeContext mapperMergeContext);
```
in favour of the existing
```
public abstract Mapper merge(Mapper mergeWith, MapperMergeContext mapperMergeContext);
```
as the `MapperMergeContext` arealdy contains the merge reason.

This will help the API clients avoid the pitfalls that already exist in tests where the provided
merge `reason` is not the same as the `MapperMergeContext#reason`
Copy link
Member Author

@felixbarny felixbarny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some rather minor nits and a proposal to remove one of the merge methods which I sketched in a branch (to make sure it's not a humongous change - and it doesn't seem like it would be +27, -61): cdb4711

Thanks, I've cherry-picked your commit.

Copy link
Contributor

@andreidan andreidan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating on this Felix.

LGTM (once CI is happy) 🚀

@felixbarny felixbarny merged commit ab52ef1 into elastic:main Apr 8, 2024
14 checks passed
@felixbarny felixbarny deleted the fix-template-merging-regression branch April 8, 2024 15:55
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.13 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 106077

@felixbarny
Copy link
Member Author

💚 All backports created successfully

Status Branch Result
8.13

Questions ?

Please refer to the Backport tool documentation

felixbarny added a commit to felixbarny/elasticsearch that referenced this pull request Apr 9, 2024
…t mapper definitions (elastic#106077)

Co-authored-by: Andrei Dan <[email protected]>
(cherry picked from commit ab52ef1)
elasticsearchmachine pushed a commit that referenced this pull request Apr 9, 2024
…d object mapper definitions (#106077) (#107254)

* Fix merging component templates with a mix of dotted and nested object mapper definitions (#106077)

Co-authored-by: Andrei Dan <[email protected]>
(cherry picked from commit ab52ef1)

* Remove unnecessary subobjects flag in buildMergedMappers

this sneaked in while resolving merge conflicts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport pending >bug external-contributor Pull request authored by a developer outside the Elasticsearch team :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v8.13.0 v8.14.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Index template regression when composed of component template with different mapping notations
5 participants