Fix execution of exists query within nested queries on field with doc_values disabled #78841

yrodiere · 2021-10-07T16:27:56Z

Fixes #76362, which is a regression that first occurred in 7.14.0.

In order to fix the problem, we need to add a _field_names field to nested documents. However, just like root documents, this field will only be populated with fields that have no doc values nor norms. Which means that this patch won't change anything, except for users who explicitly opted out of doc values for one of their fields.

I'm sorry if this does not pass the full test suite; I tried running it locally but it fails for some unknown (but unrelated) reason. I executed a few relevant tests manually, and I'll have to wait for CI to give a full report.

If this gets merged, could we please backport this PR to 7.x? That would allow the Hibernate Search test suite to pass on something more recent than ES 7.13.

mark-vieira · 2021-10-07T16:53:48Z

@elasticmachine retest this please

yrodiere · 2021-10-08T06:38:12Z

Thanks. Having a look at the test failures right now.

yrodiere · 2021-10-08T07:06:36Z

It appears MixedClusterClientYamlTestSuiteIT was failing because it checks what happens when running the tests I added against a clustering comprising ES 8 nodes and ES 7.16 nodes, and obviously the bug wasn't fixed in ES 7.16 yet: we need a backport for that.

For now I added guards to prevent testing against ES 7.x, like I've seen it done in that other PR.

I also fixed the tab that was making the "part-1" build fail.

elasticmachine · 2021-10-11T15:18:33Z

Pinging @elastic/es-search (Team:Search)

romseygeek

Thanks for opening @yrodiere!

I think we can do this more simply by changing the implementation of DocumentParserContext.addToFieldNames(); instead of collecting the field names and then adding them to the document in postParse(), we can instead add each field name individually as a new Field to the current document directly in addToFieldNames(). We don't store freqs or positions on the field names field so adding duplicate values won't matter, and then the existing logic that copies fields from nested docs to their parents will handle things naturally.

Would you be interested in trying this out?

yrodiere · 2021-10-14T11:48:05Z

Hey @romseygeek ,

Sure, I can do this.

However, I will ask confirmation about a few things before I start:

we can instead add each field name individually as a new Field to the current document directly in addToFieldNames()

You went out of your way to do the exact opposite, making sure the mapper itself adds the fields names: #71929 (in particular
0c509b4#diff-2996184d1f59c95c1530f925f7308296b84cb7f43fb47aac2845cde01510be5fR159-R166 ).

In fact, that's the main reason I tried to use FieldNamesFieldMapper; I thought that was a newly introduced practice that was preferred over the previous one.

Can you confirm you changed your mind and no longer want that?

the existing logic that copies fields from nested docs to their parents will handle things naturally

By "existing logic", do you mean the one I just added that keeps a reference to the parent in DocumentParserContext? Because that's the main driver for the complexity of this patch, so I'm guessing you won't like that. Before my patch, there wasn't any _field_names field in nested documents, so I doubt there was any code to copy it to the root document.

Or do you mean the include_in_parent parameter of nested field mapping? Because that's optional and disabled by default, so I believe relying on this would change the previous behavior that allowed running an exists query without wrapping it into a nested query.

Or do you mean another "existing logic" that copies _field_names from nested documents to the root document? Because I didn't see that anywhere...

To be clear I personally don't need _field_names to be copied to the parent document: I always wrap exists queries into nested queries when necessary, so I have no horse in this game. I just don't want to introduce backwards-incompatible changes that you will have to deal with later, so I thought I'd bring this to your attention.

romseygeek · 2021-10-14T13:02:25Z

Can you confirm you changed your mind and no longer want that?

I can! I think we can avoid getting the FieldNamesFieldMapper itself, and just add a static helper method to it that would create the relevant IndexableField.

Or do you mean the include_in_parent parameter of nested field mapping? Because that's optional and disabled by default, so I believe relying on this would change the previous behavior that allowed running an exists query without wrapping it into a nested query.

I do indeed mean this. I would say that the current behaviour is a bug - running a field exists query against a top-level document that does not in fact include that field should not return the top-level doc.

yrodiere · 2021-10-14T14:17:35Z

@romseygeek Done.

However, I couldn't use a static method: the _field_names feature can be disabled through a setting (_field_names.enabled?) for indexes created in Elasticsearch 7.x or before, so I need to retrieve the value of the setting to decide whether to add a field name or not, and the easiest way to do so is to just retrieve the FieldNamesFieldMapper.

romseygeek

This looks great, thank you! I think we need one more test scenario to check that includeInParent will work as expected, but other than that this is good to go.

romseygeek · 2021-10-15T11:32:08Z

server/src/test/java/org/elasticsearch/index/mapper/NestedObjectMapperTests.java

+        assertThat(doc.docs().get(4).getNumericValue("nested1.nested2.integer2"), nullValue());
+        assertThat(doc.docs().get(4).get("_field_names"), nullValue());
+    }
+


Can we add a test that these also work for nested documents that have includeInParent set to true?

romseygeek · 2021-10-15T11:32:53Z

@elasticmachine test this please

... instead of always adding them to the root document.

…disabled

…_root

yrodiere · 2021-10-15T11:57:03Z

Thanks. I amended a commit to fix the checkstyle issue (unused import) and added a commit to test include_in_parent/include_in_root (it works as expected).

romseygeek · 2021-10-15T12:54:20Z

@elasticmachine test this please

romseygeek · 2021-10-15T14:02:25Z

@elasticmachine update branch

romseygeek · 2021-10-16T15:36:08Z

@elasticmachine test this please

yrodiere · 2021-10-18T06:32:30Z

I tried to see if there was something I could do to fix the tests, but I'm not sure the failures are related to my changes. There's something about field caps, something about node discovery, ... I'm wondering if those aren't simply flaky tests?

romseygeek · 2021-10-18T07:58:16Z

@elasticmachine update branch

romseygeek · 2021-10-18T07:59:52Z

I tried to see if there was something I could do to fix the tests, but I'm not sure the failures I related to my changes. There's something about field caps, something about node discovery, ... I'm wondering if those aren't simply flaky tests?

We're coming up to a deadline so lots of stuff is going in which makes it difficult to keep up with backwards-compatibility checks. I'll keep updating the branch and retesting and will merge once we get a green CI. Thanks for your patience on this.

romseygeek · 2021-10-18T08:00:01Z

@elasticmachine ok to test

romseygeek · 2021-10-18T11:09:30Z

@elasticmachine update branch

romseygeek · 2021-10-19T08:27:05Z

@elasticmachine update branch

…_values disabled (elastic#78841) The FieldNamesFieldMapper adds non-indexed fields to a special metadata field so that exists queries can be run efficiently. This is built in a postParse method that is run once per document. However, this means that nested documents are not handled correctly - non-indexed field names are added to the parent document's metadata field rather than to the nested document's field. This commit fixes things to add non-indexed field names directly to the nested documents.

…_values disabled (#78841) (#79462) The FieldNamesFieldMapper adds non-indexed fields to a special metadata field so that exists queries can be run efficiently. This is built in a postParse method that is run once per document. However, this means that nested documents are not handled correctly - non-indexed field names are added to the parent document's metadata field rather than to the nested document's field. This commit fixes things to add non-indexed field names directly to the nested documents. Co-authored-by: Yoann Rodière <[email protected]>

* upstream/master: Validate tsdb's routing_path (elastic#79384) Adjust the BWC version for the return200ForClusterHealthTimeout field (elastic#79436) API for adding and removing indices from a data stream (elastic#79279) Exposing the ability to log deprecated settings at non-critical level (elastic#79107) Convert operator privilege license object to LicensedFeature (elastic#79407) Mute SnapshotBasedIndexRecoveryIT testSeqNoBasedRecoveryIsUsedAfterPrimaryFailOver (elastic#79456) Create cache files with CREATE_NEW & SPARSE options (elastic#79371) Revert "[ML] Use a new annotations index for future annotations (elastic#79151)" [ML] Use a new annotations index for future annotations (elastic#79151) [ML] Removing legacy code from ML/transform auditor (elastic#79434) Fix rate agg with custom `_doc_count` (elastic#79346) Optimize SLM Policy Queries (elastic#79341) Fix execution of exists query within nested queries on field with doc_values disabled (elastic#78841) Stricter UpdateSettingsRequest parsing on the REST layer (elastic#79227) Do not release snapshot file download permit during recovery retries (elastic#79409) Preserve request headers in a mixed version cluster (elastic#79412) Adjust versions after elastic#79044 backport to 7.x (elastic#79424) Mute BulkByScrollUsesAllScrollDocumentsAfterConflictsIntegTests (elastic#79429) Fail on SSPL licensed x-pack sources (elastic#79348) # Conflicts: # server/src/test/java/org/elasticsearch/index/TimeSeriesModeTests.java

…_values disabled (elastic#78841) (elastic#79462) The FieldNamesFieldMapper adds non-indexed fields to a special metadata field so that exists queries can be run efficiently. This is built in a postParse method that is run once per document. However, this means that nested documents are not handled correctly - non-indexed field names are added to the parent document's metadata field rather than to the nested document's field. This commit fixes things to add non-indexed field names directly to the nested documents. Co-authored-by: Yoann Rodière <[email protected]>

…_values disabled (#78841) (#79462) (#80274) The FieldNamesFieldMapper adds non-indexed fields to a special metadata field so that exists queries can be run efficiently. This is built in a postParse method that is run once per document. However, this means that nested documents are not handled correctly - non-indexed field names are added to the parent document's metadata field rather than to the nested document's field. This commit fixes things to add non-indexed field names directly to the nested documents. Co-authored-by: Yoann Rodière <[email protected]>

stewi2 · 2021-11-05T00:26:15Z

except for users who explicitly opted out of doc values for one of their fields

To clarify, #80309 happened on a text field, which doesn't support doc_values in the first place. I think the issue was more that we had also disabled norms, meaning that exists() queries had to fall back to using field_names, which didn't exist (as far as I understand).

"first": {
    "type": "text",
    "norms": false
},

"type": "text" basically implies "doc_values": false. Not sure if this PR would fix that case as well?

yrodiere · 2021-11-05T07:51:58Z

To clarify, #80309 happened on a text field, which doesn't support doc_values in the first place.

That's right, this is slightly different from #76362, which this PR is supposed to fix.

Not sure if this PR would fix that case as well?

It should.

I didn't add any code specific to doc_values. Elasticsearch already had code that detects whether a field needs an entry in the _field_names field, and that code already took into account both missing doc_values and missing norms. It worked well enough for both.

All this PR does is make sure that, when Elasticsearch detects that a field needs an entry in the _field_names field, then it adds this entry to the right (nested) document.

So, yes, I think this PR should fix #80309 as well.

If you want to be sure, you can add more tests to rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search/161_exists_query_within_nested_query.yml. Or... just test a patched version of Elasticsearch locally.

elasticsearchmachine added v8.0.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Oct 7, 2021

yrodiere force-pushed the exists-in-nested-query branch from e12665b to 8298689 Compare October 8, 2021 07:06

mark-vieira added the :Search Foundations/Mapping Index mappings, including merging and defining field types label Oct 11, 2021

elasticmachine added the Team:Search Meta label for search team label Oct 11, 2021

jimczi requested a review from romseygeek October 12, 2021 15:58

jimczi added the >bug label Oct 12, 2021

romseygeek suggested changes Oct 14, 2021

View reviewed changes

yrodiere force-pushed the exists-in-nested-query branch from 8298689 to 80694d2 Compare October 14, 2021 14:15

romseygeek suggested changes Oct 15, 2021

View reviewed changes

yrodiere added 3 commits October 15, 2021 13:56

Add field names to nested documents when relevant

17edd74

... instead of always adding them to the root document.

Test execution of exists query within nested queries with doc_values …

bfdb2c4

…disabled

Test field names added to documents with include_in_parent/include_in…

1b395e9

…_root

yrodiere force-pushed the exists-in-nested-query branch from 80694d2 to 1b395e9 Compare October 15, 2021 11:56

Merge branch 'master' into exists-in-nested-query

98cf1eb

Merge branch 'master' into exists-in-nested-query

87a0b5a

Merge branch 'master' into exists-in-nested-query

0090320

Merge branch 'master' into exists-in-nested-query

5895421

romseygeek added v7.16.0 auto-backport-and-merge labels Oct 19, 2021

romseygeek merged commit 3d35144 into elastic:master Oct 19, 2021

jakelandis added v8.0.0-beta1 and removed v8.0.0 labels Oct 27, 2021

romseygeek mentioned this pull request Nov 3, 2021

Fix execution of exists query within nested queries on field with doc… #80274

Merged

romseygeek added the v7.15.2 label Nov 3, 2021

romseygeek mentioned this pull request Nov 4, 2021

‘exists’ query not working for nested docs with norms=false #80309

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix execution of exists query within nested queries on field with doc_values disabled #78841

Fix execution of exists query within nested queries on field with doc_values disabled #78841

yrodiere commented Oct 7, 2021 •

edited

Loading

mark-vieira commented Oct 7, 2021

yrodiere commented Oct 8, 2021

yrodiere commented Oct 8, 2021

elasticmachine commented Oct 11, 2021

romseygeek left a comment

yrodiere commented Oct 14, 2021 •

edited

Loading

romseygeek commented Oct 14, 2021

yrodiere commented Oct 14, 2021

romseygeek left a comment

romseygeek Oct 15, 2021

romseygeek commented Oct 15, 2021

yrodiere commented Oct 15, 2021 •

edited

Loading

romseygeek commented Oct 15, 2021

romseygeek commented Oct 15, 2021

romseygeek commented Oct 16, 2021

yrodiere commented Oct 18, 2021 •

edited

Loading

romseygeek commented Oct 18, 2021

romseygeek commented Oct 18, 2021

romseygeek commented Oct 18, 2021

romseygeek commented Oct 18, 2021

romseygeek commented Oct 19, 2021

stewi2 commented Nov 5, 2021 •

edited

Loading

yrodiere commented Nov 5, 2021

Fix execution of exists query within nested queries on field with doc_values disabled #78841

Fix execution of exists query within nested queries on field with doc_values disabled #78841

Conversation

yrodiere commented Oct 7, 2021 • edited Loading

mark-vieira commented Oct 7, 2021

yrodiere commented Oct 8, 2021

yrodiere commented Oct 8, 2021

elasticmachine commented Oct 11, 2021

romseygeek left a comment

Choose a reason for hiding this comment

yrodiere commented Oct 14, 2021 • edited Loading

romseygeek commented Oct 14, 2021

yrodiere commented Oct 14, 2021

romseygeek left a comment

Choose a reason for hiding this comment

romseygeek Oct 15, 2021

Choose a reason for hiding this comment

romseygeek commented Oct 15, 2021

yrodiere commented Oct 15, 2021 • edited Loading

romseygeek commented Oct 15, 2021

romseygeek commented Oct 15, 2021

romseygeek commented Oct 16, 2021

yrodiere commented Oct 18, 2021 • edited Loading

romseygeek commented Oct 18, 2021

romseygeek commented Oct 18, 2021

romseygeek commented Oct 18, 2021

romseygeek commented Oct 18, 2021

romseygeek commented Oct 19, 2021

stewi2 commented Nov 5, 2021 • edited Loading

yrodiere commented Nov 5, 2021

yrodiere commented Oct 7, 2021 •

edited

Loading

yrodiere commented Oct 14, 2021 •

edited

Loading

yrodiere commented Oct 15, 2021 •

edited

Loading

yrodiere commented Oct 18, 2021 •

edited

Loading

stewi2 commented Nov 5, 2021 •

edited

Loading