Fix execution of exists query within nested queries on field with doc_values disabled #78841
@elasticmachine retest this please
Thanks. Having a look at the test failures right now.
e12665b to 8298689
It appears For now I added guards to prevent testing against ES 7.x, like I've seen it done in that other PR. I also fixed the tab that was making the "part-1" build fail.
Pinging @elastic/es-search (Team:Search)
Thanks for opening @yrodiere!
I think we can do this more simply by changing the implementation of `DocumentParserContext.addToFieldNames()`; instead of collecting the field names and then adding them to the document in `postParse()`, we can add each field name individually as a new `Field` to the current document directly in `addToFieldNames()`. We don't store freqs or positions on the field names field, so adding duplicate values won't matter, and then the existing logic that copies fields from nested docs to their parents will handle things naturally.
Would you be interested in trying this out?
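To make the suggestion above concrete, here is a minimal toy sketch in plain Java. It is not Elasticsearch or Lucene code: `Doc`, `copyToParent` and the field name `nested1.field1` are hypothetical stand-ins, and `addToFieldNames` only borrows the name of the real method. The sketch assumes a document is a flat list of (name, value) pairs and that copying to the parent simply duplicates every pair.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types are Elasticsearch or Lucene classes.
final class FieldNamesSketch {

    static final String FIELD_NAMES = "_field_names";

    /** A document is a flat list of (name, value) pairs; duplicates are allowed. */
    static final class Doc {
        final List<String[]> fields = new ArrayList<>();

        void add(String name, String value) {
            fields.add(new String[] { name, value });
        }

        /** True if this document's own _field_names entries contain fieldName. */
        boolean hasFieldName(String fieldName) {
            for (String[] f : fields) {
                if (f[0].equals(FIELD_NAMES) && f[1].equals(fieldName)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Records the field name on the document currently being built, not on the root. */
    static void addToFieldNames(Doc current, String fieldName) {
        current.add(FIELD_NAMES, fieldName);
    }

    /** Hypothetical stand-in for copying a nested doc's fields to its parent. */
    static void copyToParent(Doc nested, Doc parent) {
        for (String[] f : nested.fields) {
            parent.add(f[0], f[1]);
        }
    }

    public static void main(String[] args) {
        Doc root = new Doc();
        Doc nested = new Doc();
        nested.add("nested1.field1", "1");
        addToFieldNames(nested, "nested1.field1");
        addToFieldNames(nested, "nested1.field1"); // duplicate entry, harmless in this model

        System.out.println(nested.hasFieldName("nested1.field1")); // true
        System.out.println(root.hasFieldName("nested1.field1"));   // false

        copyToParent(nested, root); // only when the nested fields are meant to reach the parent
        System.out.println(root.hasFieldName("nested1.field1"));   // true
    }
}
```

In this model the root document only sees the nested field name when the copy step runs, which is the behaviour the comment argues for.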
Hey @romseygeek , Sure, I can do this. However, I will ask confirmation about a few things before I start:
You went out of your way to do the exact opposite, making sure the mapper itself adds the fields names: #71929 (in particular In fact, that's the main reason I tried to use Can you confirm you changed your mind and no longer want that?
By "existing logic", do you mean the one I just added that keeps a reference to the parent in Or do you mean the Or do you mean another "existing logic" that copies To be clear I personally don't need
I can! I think we can avoid getting the FieldNamesFieldMapper itself, and just add a static helper method to it that would create the relevant IndexableField.
I do indeed mean this. I would say that the current behaviour is a bug - running a field exists query against a top-level document that does not in fact include that field should not return the top-level doc.
8298689 to 80694d2
@romseygeek Done. However, I couldn't use a static method: the
This looks great, thank you! I think we need one more test scenario to check that `includeInParent` will work as expected, but other than that this is good to go.
assertThat(doc.docs().get(4).getNumericValue("nested1.nested2.integer2"), nullValue());
assertThat(doc.docs().get(4).get("_field_names"), nullValue());
}
Can we add a test that these also work for nested documents that have `includeInParent` set to `true`?
@elasticmachine test this please
... instead of always adding them to the root document.
80694d2 to 1b395e9
Thanks. I amended a commit to fix the checkstyle issue (unused import) and added a commit to test
@elasticmachine test this please
@elasticmachine update branch
@elasticmachine test this please
I tried to see if there was something I could do to fix the tests, but I'm not sure the failures are related to my changes. There's something about field caps, something about node discovery, ... I'm wondering if those aren't simply flaky tests?
@elasticmachine update branch
We're coming up to a deadline so lots of stuff is going in which makes it difficult to keep up with backwards-compatibility checks. I'll keep updating the branch and retesting and will merge once we get a green CI. Thanks for your patience on this.
@elasticmachine ok to test
@elasticmachine update branch
@elasticmachine update branch
…_values disabled (elastic#78841)
The FieldNamesFieldMapper adds non-indexed fields to a special metadata field so that exists queries can be run efficiently. This is built in a postParse method that is run once per document. However, this means that nested documents are not handled correctly - non-indexed field names are added to the parent document's metadata field rather than to the nested document's field. This commit fixes things to add non-indexed field names directly to the nested documents.
…_values disabled (#78841) (#79462) The FieldNamesFieldMapper adds non-indexed fields to a special metadata field so that exists queries can be run efficiently. This is built in a postParse method that is run once per document. However, this means that nested documents are not handled correctly - non-indexed field names are added to the parent document's metadata field rather than to the nested document's field. This commit fixes things to add non-indexed field names directly to the nested documents. Co-authored-by: Yoann Rodière <[email protected]>
* upstream/master:
  Validate tsdb's routing_path (elastic#79384)
  Adjust the BWC version for the return200ForClusterHealthTimeout field (elastic#79436)
  API for adding and removing indices from a data stream (elastic#79279)
  Exposing the ability to log deprecated settings at non-critical level (elastic#79107)
  Convert operator privilege license object to LicensedFeature (elastic#79407)
  Mute SnapshotBasedIndexRecoveryIT testSeqNoBasedRecoveryIsUsedAfterPrimaryFailOver (elastic#79456)
  Create cache files with CREATE_NEW & SPARSE options (elastic#79371)
  Revert "[ML] Use a new annotations index for future annotations (elastic#79151)"
  [ML] Use a new annotations index for future annotations (elastic#79151)
  [ML] Removing legacy code from ML/transform auditor (elastic#79434)
  Fix rate agg with custom `_doc_count` (elastic#79346)
  Optimize SLM Policy Queries (elastic#79341)
  Fix execution of exists query within nested queries on field with doc_values disabled (elastic#78841)
  Stricter UpdateSettingsRequest parsing on the REST layer (elastic#79227)
  Do not release snapshot file download permit during recovery retries (elastic#79409)
  Preserve request headers in a mixed version cluster (elastic#79412)
  Adjust versions after elastic#79044 backport to 7.x (elastic#79424)
  Mute BulkByScrollUsesAllScrollDocumentsAfterConflictsIntegTests (elastic#79429)
  Fail on SSPL licensed x-pack sources (elastic#79348)
# Conflicts:
# server/src/test/java/org/elasticsearch/index/TimeSeriesModeTests.java
…_values disabled (elastic#78841) (elastic#79462) The FieldNamesFieldMapper adds non-indexed fields to a special metadata field so that exists queries can be run efficiently. This is built in a postParse method that is run once per document. However, this means that nested documents are not handled correctly - non-indexed field names are added to the parent document's metadata field rather than to the nested document's field. This commit fixes things to add non-indexed field names directly to the nested documents. Co-authored-by: Yoann Rodière <[email protected]>
…_values disabled (#78841) (#79462) (#80274) The FieldNamesFieldMapper adds non-indexed fields to a special metadata field so that exists queries can be run efficiently. This is built in a postParse method that is run once per document. However, this means that nested documents are not handled correctly - non-indexed field names are added to the parent document's metadata field rather than to the nested document's field. This commit fixes things to add non-indexed field names directly to the nested documents. Co-authored-by: Yoann Rodière <[email protected]>
To clarify, #80309 happened on a text field, which doesn't support doc_values in the first place. I think the issue was more that we had also disabled norms, meaning that exists() queries had to fall back to using `_field_names`, which didn't exist (as far as I understand).
"first": {
  "type": "text",
  "norms": false
},
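The fallback described in that comment can be sketched as a small decision function. This is a toy illustration, not the Elasticsearch implementation: `ExistsStrategy`, `strategyFor` and `needsFieldNamesEntry` are hypothetical names, and the order of the checks (doc values first, then norms) is an assumption made for the sketch. The only part taken from the discussion is that a field with neither doc values nor norms has to rely on `_field_names`.

```java
// Toy model only: the names below are hypothetical, not Elasticsearch classes.
final class ExistsFallbackSketch {

    /** Which structure an exists query could consult for a given field. */
    enum ExistsStrategy { DOC_VALUES, NORMS, FIELD_NAMES }

    /** Assumed order of preference: doc values, then norms, then _field_names. */
    static ExistsStrategy strategyFor(boolean hasDocValues, boolean hasNorms) {
        if (hasDocValues) {
            return ExistsStrategy.DOC_VALUES;
        }
        if (hasNorms) {
            return ExistsStrategy.NORMS;
        }
        return ExistsStrategy.FIELD_NAMES;
    }

    /** A field needs a _field_names entry only when nothing else can answer the query. */
    static boolean needsFieldNamesEntry(boolean hasDocValues, boolean hasNorms) {
        return strategyFor(hasDocValues, hasNorms) == ExistsStrategy.FIELD_NAMES;
    }

    public static void main(String[] args) {
        // The "first" field above: text (no doc values) with norms disabled.
        System.out.println(strategyFor(false, false));          // FIELD_NAMES
        System.out.println(needsFieldNamesEntry(false, false)); // true
        // A text field that keeps its norms does not need the fallback in this model.
        System.out.println(needsFieldNamesEntry(false, true));  // false
    }
}
```

Under these assumptions, the mapping quoted above is exactly the case where a missing `_field_names` entry on the nested document makes the exists query return nothing.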
That's right, this is slightly different from #76362, which this PR is supposed to fix.
It should. I didn't add any code specific to doc_values. Elasticsearch already had code that detects whether a field needs an entry in the All this PR does is make sure that, when Elasticsearch detects that a field needs an entry in the So, yes, I think this PR should fix #80309 as well. If you want to be sure, you can add more tests to |
Fixes #76362, which is a regression that first occurred in 7.14.0.
In order to fix the problem, we need to add a `_field_names` field to nested documents. However, just like for root documents, this field will only be populated with fields that have neither doc values nor norms. This means that this patch won't change anything, except for users who explicitly opted out of doc values for one of their fields.
I'm sorry if this does not pass the full test suite; I tried running it locally but it fails for some unknown (but unrelated) reason. I executed a few relevant tests manually, and I'll have to wait for CI to give a full report.
If this gets merged, could we please backport this PR to 7.x? That would allow the Hibernate Search test suite to pass on something more recent than ES 7.13.