-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[feature/semantic_text] Register semantic text sub fields in the mapping #106560
[feature/semantic_text] Register semantic text sub fields in the mapping #106560
Conversation
…erver (elastic#105012)" This reverts commit f4d3ab9.
…copy-to-support-inference # Conflicts: # server/src/main/java/org/elasticsearch/action/bulk/BulkShardRequestInferenceProvider.java # server/src/test/java/org/elasticsearch/action/bulk/BulkOperationTests.java
…-fields in the mapping
…instead of re-creating them each time.
…gister_semantic_text
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I'd like to check if we can avoid serializing inference_id
on model settings for the field mapper, to avoid duplicating that information in the mapping.
...nference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapper.java
Outdated
Show resolved
Hide resolved
...nference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapper.java
Show resolved
Hide resolved
...nce/src/test/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapperTests.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments, none that I think block merging though
...nce/src/main/java/org/elasticsearch/xpack/inference/mapper/InferenceMetadataFieldMapper.java
Show resolved
Hide resolved
for (XContentParser.Token token = parser.nextToken(); token != XContentParser.Token.END_OBJECT; token = parser.nextToken()) { | ||
switch (parser.currentName()) { | ||
case RESULTS -> parseResultsList(xContentLocation, parser, context, nestedMapper); | ||
default -> throw new DocumentParsingException(xContentLocation, "Unknown field name " + parser.currentName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We originally skipped unknown field names to be forward-compatible with any new fields added in the future. Any concerns about that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR preserves this behaviour but only for the _inference.$field_name.results
object. I am not a fan of this leniency but it is needed since we'll want to extend the information we provide in the future. I thought that we can be strict on the top level fields here since they should not change.
...nce/src/main/java/org/elasticsearch/xpack/inference/mapper/InferenceMetadataFieldMapper.java
Show resolved
Hide resolved
MappedFieldType mappedFieldType, | ||
CopyTo copyTo, | ||
IndexVersion indexVersionCreated, | ||
IndexAnalyzers indexAnalyzers, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a bit silly that we're defining the index analyzers, given that we're not indexing or storing the text field.
The point of this text field is to store the text that generated the chunk in _source
, correct? What if we made it a keyword
field instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep good call, I wonder if we need to explicitly map this field though. Maybe we can just avoid creating it in the mapping entirely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@carlosdelest WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to index or analyze it in any way; users can use multifields / copy_to in case they want to search for it. I'd vote for skipping the mapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched to a keyword field instead so that it's clear that the field is there but not indexed.
...erence/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextModelSettings.java
Show resolved
Hide resolved
@@ -55,25 +56,7 @@ setup: | |||
index: test-index | |||
id: doc_1 | |||
body: | |||
non_inference_field: "you know, for testing" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests were meant to address the parsing of the inference results. I don't see this explicitly tested in other tests. Should we keep these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved these tests to InferenceMetadataFieldMapperTests since it is not possible anymore to provide the _inference field manually in a bulk request.
*/ | ||
public class SemanticTextFieldMapper extends FieldMapper { | ||
private static final Logger logger = LogManager.getLogger(SemanticTextFieldMapper.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not using the logger anymore I think
...nce/src/main/java/org/elasticsearch/xpack/inference/mapper/InferenceMetadataFieldMapper.java
Outdated
Show resolved
Hide resolved
...nce/src/test/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapperTests.java
Show resolved
Hide resolved
+ TASK_TYPE_FIELD.getPreferredName() | ||
+ "], expected " | ||
+ TEXT_EMBEDDING | ||
+ "or " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String formatting error: There's no space between "TEXT_EMBEDDING" and "or"
This PR refactors the semantic text field mapper to register its sub fields in the mapping instead of re-creating them each time when parsing documents.
It also fixes the generation of these fields in case the semantic text field is defined in an object field.
Lastly this change adds a new section called
model_settings
in the field parameter that is updated by the field mapper when inference results are received from a bulk action. The model settings are available in the fields as soon as the first document with the inference field is ingested and they are used to validate that updates. They are used to ensure consistency between what's used in the bulk action and what's defined in the field mapping.Note: This PR is opened against the feature branch: feature/semantic_text