-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
semantic_text field mapper and inference #107262
semantic_text field mapper and inference #107262
Conversation
Store semantic_text model info in IndexMetadata: On document ingestion, we need to perform inference only once, in the coordinating node. Otherwise, we would be doing inference for each of the shards the document is stored in. The problem with the coordinating node is that it doesn't necessarily hold mapping information if it is not used for storing an index. A pure coordinating node doesn't have any mapping information at all. We need to understand when we need to generate text embeddings on the coordinating node. This means that the model information associated with index fields needs to be efficiently accessed from there. This information needs to be kept up to date with mapping changes, and not be recomputed otherwise. The model / fields information is going to be included as part of the IndexMetadata, to ensure it is communicated to all nodes in the cluster.
Adds SemanticTextInferenceResultFieldMapper, which indexes inference results for semantic_text fields.
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java # server/src/test/java/org/elasticsearch/snapshots/SnapshotResiliencyTests.java
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
…elastic#106357) --------- Co-authored-by: carlosdelest <[email protected]>
…ing (elastic#106560) This PR refactors the semantic text field mapper to register its sub fields in the mapping instead of re-creating them each time when parsing documents. It also fixes the generation of these fields in case the semantic text field is defined in an object field. Lastly this change adds a new section called model_settings in the field parameter that is updated by the field mapper when inference results are received from a bulk action. The model settings are available in the fields as soon as the first document with the inference field is ingested and they are used to validate that updates. They are used to ensure consistency between what's used in the bulk action and what's defined in the field mapping.
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
…ce metadata in `IndexMetadata` (elastic#106743) This change refactors the integration of the field inference metadata in IndexMetadata. Instead of partial diffs, the new class simply sends the entire object as diff if it has changed. This PR also rename the fields and methods related to the inference fields consistently. The inference phase (in the transport shard bulk action) is also changed so that inference is not called if: The document contains a value for the inference input. The document also contains a value for the inference results of that field (in the _inference map). If the document contains no value for the inference input but an inference result for that field, it is marked as failed. --------- Co-authored-by: carlosdelest <[email protected]>
try { | ||
// make sure that we don't expand dots in field names while parsing | ||
context.path().setWithinLeafObject(true); | ||
if (context.path().isWithinLeafObject() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isWithinLeafObject() was not restored previously to its original value. Now we need to have that for parsing inference results.
@@ -106,6 +107,12 @@ public void testCopyToFieldsParsing() throws Exception { | |||
|
|||
fieldMapper = mapperService.documentMapper().mappers().getMapper("new_field"); | |||
assertThat(fieldMapper.typeName(), equalTo("long")); | |||
|
|||
MappingLookup mappingLookup = mapperService.mappingLookup(); | |||
assertThat(mappingLookup.sourcePaths("another_field"), equalTo(Set.of("copy_test", "int_to_str_test", "another_field"))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check that copy_to information is included in source paths for inference fields
@@ -118,23 +118,25 @@ private SparseEmbeddingResults makeResults(List<String> input) { | |||
for (int i = 0; i < input.size(); i++) { | |||
var tokens = new ArrayList<SparseEmbeddingResults.WeightedToken>(); | |||
for (int j = 0; j < 5; j++) { | |||
tokens.add(new SparseEmbeddingResults.WeightedToken(Integer.toString(j), (float) j)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some changes were needed in the test inference service to properly test sparse_vector with actual indexing
} | ||
|
||
@Override | ||
public Collection<ActionFilter> getActionFilters() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creating an ActionFilter
makes the inference action actually pluggable from the inference plugin - thanks Jim!
…into carlosdelest/semantic-text-field-mapping # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java # server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetadata.java # server/src/main/java/org/elasticsearch/index/mapper/InferenceFieldMapper.java # x-pack/plugin/inference/build.gradle # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilter.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapper.java # x-pack/plugin/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/20_semantic_text_field_mapper.yml
…anges' into carlosdelest/semantic-text-field-mapping
4bf1ba3
to
cab2534
Compare
cab2534
to
0f57a5b
Compare
Hi @carlosdelest, I've created a changelog YAML for you. |
…mapping # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java # server/src/main/java/org/elasticsearch/cluster/metadata/InferenceFieldMetadata.java # server/src/test/java/org/elasticsearch/cluster/metadata/IndexMetadataTests.java
…-field-mapping' into carlosdelest/semantic-text-field-mapping
Pinging @elastic/es-search (Team:Search) |
Closing this PR - opening a separate PR for the field mapping, will open a new one with the inference changes |
This PR adds the semantic_text field mapping and inference actions:
semantic_text
dense_vector
orsparse_vector
fields internally