Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature/semantic-text] semantic text copy to support #106689

Conversation

carlosdelest
Copy link
Member

@carlosdelest carlosdelest commented Mar 22, 2024

Adds copy_to support to semantic_text.

Changes:

  • Iterate on the source fields for calculating inference
  • Allow inference to be applied from multiple responses to a single field

jimczi and others added 20 commits March 14, 2024 01:52
…copy-to-support-inference

# Conflicts:
#	server/src/main/java/org/elasticsearch/action/bulk/BulkShardRequestInferenceProvider.java
#	server/src/test/java/org/elasticsearch/action/bulk/BulkOperationTests.java
…ject mappers can provide underlying field mappers
…s and object mappers can provide underlying field mappers"

This reverts commit 05aa06f.
…copy-to-support-inference

# Conflicts:
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferencePlugin.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilter.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/mapper/InferenceResultFieldMapper.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapper.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextModelSettings.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilterTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/mapper/InferenceResultFieldMapperTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapperTests.java
#	x-pack/plugin/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/10_semantic_text_inference.yml
#	x-pack/plugin/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/20_semantic_text_field_mapper.yml
@carlosdelest carlosdelest changed the title Carlosdelest/semantic text copy to support inference semantic text copy to / multifield support Mar 26, 2024
@carlosdelest carlosdelest changed the title semantic text copy to / multifield support semantic text copy to support Mar 27, 2024
@carlosdelest
Copy link
Member Author

I'm dropping support for multifields after our conversations, so this PR is now adding just copy_to support.

@jimczi , @Mikep86 - should we double check that all source fields are provided when doing an update, so we can redo inference on them?

I would say no for the moment. I can think of having source fields that are potentially null, and thus users would not have an easy way out of it. We could include docs that include the limitations for copy_to behaviour in semantic_text. WDYT?

@Mikep86
Copy link
Contributor

Mikep86 commented Mar 27, 2024

@carlosdelest

I'm dropping support for multifields after our conversations, so this PR is now adding just copy_to support.

@jimczi , @Mikep86 - should we double check that all source fields are provided when doing an update, so we can redo inference on them?

I would say no for the moment. I can think of having source fields that are potentially null, and thus users would not have an easy way out of it. We could include docs that include the limitations for copy_to behaviour in semantic_text. WDYT?

I agree. Partial update handling has some more moving parts, such as attributing each chunk to the source that generated it, that are best left to a separate PR.

@@ -1030,7 +1030,7 @@ public final void testMinimalIsInvalidInRoutingPath() throws IOException {
}
}

private String minimalIsInvalidRoutingPathErrorMessage(Mapper mapper) {
protected String minimalIsInvalidRoutingPathErrorMessage(Mapper mapper) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to take back this to protected so it can be overriden by our tests

item.id(),
new FieldInferenceResponseAccumulator(
String fieldName = entry.getKey();
for (var sourceField : entry.getValue().sourceFields()) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we iterate on all the source fields for retrieving inference - so we're adding an additional loop here

Map<String, Object> fieldMap = new LinkedHashMap<>();
fieldMap.put(INFERENCE_ID, model.getInferenceEntityId());

Map<String, Object> fieldMap = (Map<String, Object>) inferenceMap.computeIfAbsent(field, s -> new LinkedHashMap<>());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple inference results can be applied to a single field from each source field - so we need to be prepared for that

@@ -310,3 +310,38 @@ setup:
id: doc_1
body:
non_inference_field: "non inference test"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any other tests come to mind?

@carlosdelest
Copy link
Member Author

@elasticmachine run elasticsearch-ci/part-2

@carlosdelest carlosdelest marked this pull request as ready for review March 27, 2024 17:50
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Mar 27, 2024
@mark-vieira mark-vieira added the :Search Foundations/Mapping Index mappings, including merging and defining field types label Mar 27, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Mar 27, 2024
@carlosdelest carlosdelest changed the title semantic text copy to support [feature/semantic-text] semantic text copy to support Mar 27, 2024
@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Mar 27, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

…copy-to-support-inference

# Conflicts:
#	server/src/test/java/org/elasticsearch/index/mapper/FieldTypeLookupTests.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilter.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/cluster/metadata/SemanticTextClusterMetadataTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilterTests.java
#	x-pack/plugin/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/10_semantic_text_inference.yml
Copy link
Contributor

@Mikep86 Mikep86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, just one question I sign off on it

var value = XContentMapValues.extractValue(sourceField, docMap);
if (value == null) {
if (isUpdateRequest) {
addInferenceResponseFailure(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added this check for disallowing updates that do not use all source fields. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, except for one edge case: Say that a user wants to use an update request to delete a field from a document. Traditionally, they would do this by explicitly setting the field value to null. How do we tell the difference between the null we get when the user didn't set a value and the null we get when the user explicitly set it?

IMO since we have a way to support partial updates in the near future (chunk source attribution), we could just live with this edge case for now.

@carlosdelest carlosdelest requested a review from Mikep86 April 1, 2024 18:58
Copy link
Contributor

@Mikep86 Mikep86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

var value = XContentMapValues.extractValue(sourceField, docMap);
if (value == null) {
if (isUpdateRequest) {
addInferenceResponseFailure(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, except for one edge case: Say that a user wants to use an update request to delete a field from a document. Traditionally, they would do this by explicitly setting the field value to null. How do we tell the difference between the null we get when the user didn't set a value and the null we get when the user explicitly set it?

IMO since we have a way to support partial updates in the near future (chunk source attribution), we could just live with this edge case for now.

@carlosdelest
Copy link
Member Author

Merging this after checking with Jim and Mike.

We can change the way we deal with missing source fields later.

@carlosdelest carlosdelest merged commit b6ca8d2 into elastic:feature/semantic-text Apr 2, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>non-issue :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v8.14.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants