-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Synthetic _source: support match_only_text #89516
Synthetic _source: support match_only_text #89516
Conversation
This adds support for synthetic `_source` to the `match_only_text` field type. When synthetic `_source` is enabled `match_only_text` fields create a hidden stored field to contain their text. This should have similar or better search performance for this specific field type, though it will have slightly worse indexing performance because synthetic `_source` is still writing `_recovery_source`, which means we're writing the bits for this field twice.
Pinging @elastic/es-analytics-geo (Team:Analytics) |
Hi @nik9000, I've created a changelog YAML for you. |
Pinging @elastic/es-search (Team:Search) |
} catch (IOException e) { | ||
throw new UncheckedIOException(e); | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a stored field lookup thing in searchExecutionContext.lookup()
but it can't be convinced to load the hidden stored field. If we feel strongly about it I can try and integrate into it, but I'm not super sure how at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's strictly for scripts and is integrated poorly with other stored field lookup stuff, so I don't think it's worth trying to re-use it for the moment. I do think that the document lookup API I'm playing with at the moment will improve this though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has a lovely caching mechanism that I think could be quite nice. If multiple queries need to recheck the source it'll load once while this won't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, somewhat annoyingly, I don't think we will be able to re-use a stored field loader from the SearchLookup here because the underlying API expects a LeafReaderContext, not a LeafSearchLookup. But you should be able to use a LeafStoredFieldLoader rather than a FieldVisitor here which will at least be more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I think it may be worth waiting for a few days to see if I can get document loaders working though as it will tidy up the impl a fair amount.
} catch (IOException e) { | ||
throw new UncheckedIOException(e); | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's strictly for scripts and is integrated poorly with other stored field lookup stuff, so I don't think it's worth trying to re-use it for the moment. I do think that the document lookup API I'm playing with at the moment will improve this though.
@@ -326,6 +371,10 @@ protected void parseCreateField(DocumentParserContext context) throws IOExceptio | |||
Field field = new Field(fieldType().name(), value, fieldType); | |||
context.doc().add(field); | |||
context.addToFieldNames(fieldType().name()); | |||
|
|||
if (context.isSyntheticSource()) { | |||
context.doc().add(new Field(fieldType().originalFieldName(), value, ORIGINAL_FIELD_TYPE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use StoredField
rather than creating a new field type
@@ -279,6 +321,9 @@ public IndexFieldData.Builder fielddataBuilder(FieldDataContext fieldDataContext | |||
throw new IllegalArgumentException(CONTENT_TYPE + " fields do not support sorting and aggregations"); | |||
} | |||
|
|||
private String originalFieldName() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe call this storedFieldName
to make it a bit clearer what it's used for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
} | ||
return fieldMapping(b -> b.field("type", "match_only_text")); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd be happier with coverage here if we explicitly run both tests for the synthetic and non-synthetic case, this feels at the moment like we're only testing 50% of the functionality on each run and I think that will come back to bite us.
👍 |
Adds more tests for the enrich processor around different index types. Right now they all work fine (yay!) but this feels like a good amount of paranoia.
@romseygeek this should be ready for you too! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I almost clicked the merge button! It wouldn't have compiled! sneaky sneaky. |
This adds support for synthetic
_source
to thematch_only_text
fieldtype. When synthetic
_source
is enabledmatch_only_text
fieldscreate a hidden stored field to contain their text. This should have
similar or better search performance for this specific field type,
though it will have slightly worse indexing performance because
synthetic
_source
is still writing_recovery_source
, which meanswe're writing the bits for this field twice.