semantic_text field mapper and inference #107262

Closed
56 commits
9332ef9
Merge branch 'main' into feature/semantic-text
Mikep86 Jan 12, 2024
9311f50
Merge branch 'main' into feature/semantic-text
Mikep86 Jan 17, 2024
f86ae02
Merge branch 'main' into feature/semantic-text
Mikep86 Jan 17, 2024
d06038c
Merge branch 'main' into feature/semantic-text
Mikep86 Jan 17, 2024
833469c
Store semantic_text model info in mappings (#103319)
Mikep86 Jan 17, 2024
64b4799
semantic_text inference results indexing (#103978)
Mikep86 Jan 18, 2024
eda88d0
Merge branch 'main' into feature/semantic-text
carlosdelest Jan 18, 2024
94805a6
Merge remote-tracking branch 'origin/main' into feature/semantic-text
carlosdelest Feb 1, 2024
551fe80
Merge remote-tracking branch 'origin/main' into feature/semantic-text
carlosdelest Feb 1, 2024
7e2610b
Merge remote-tracking branch 'origin/main' into feature/semantic-text
carlosdelest Feb 2, 2024
e3b6a65
Move semantic_text field mappers to inference plugin (#105187)
carlosdelest Feb 6, 2024
553484c
Merge remote-tracking branch 'origin/main' into feature/semantic-text
carlosdelest Feb 7, 2024
ca65a70
semantic_text - Field inference (#103697)
carlosdelest Feb 9, 2024
16762be
Merge remote-tracking branch 'origin/main' into feature/semantic-text
carlosdelest Feb 14, 2024
f3d5a78
Merge remote-tracking branch 'origin/main' into feature/semantic-text
carlosdelest Feb 20, 2024
ffa4d40
Merge remote-tracking branch 'origin/main' into feature/semantic-text
carlosdelest Feb 21, 2024
3f7ccde
Merge remote-tracking branch 'origin/main' into feature/semantic-text
carlosdelest Mar 5, 2024
881c394
Merge remote-tracking branch 'origin/main' into feature/semantic-text
carlosdelest Mar 5, 2024
b1a3ee8
Semantic text dense vector support (#105515)
carlosdelest Mar 6, 2024
2039fb3
This was supposed to be merged into #105515 but didn't make it
carlosdelest Mar 6, 2024
db67976
Merge branch 'main' into feature/semantic-text
Mikep86 Mar 18, 2024
3ca808b
semantic_text - extract Index Metadata inference information to separ…
carlosdelest Mar 19, 2024
823fb58
[feature/semantic_text] Refactor inference to run as an action filter…
jimczi Mar 20, 2024
d4e283d
[feature/semantic_text] Register semantic text sub fields in the mapp…
jimczi Mar 22, 2024
9531948
Merge remote-tracking branch 'origin/main' into feature/semantic-text
carlosdelest Mar 27, 2024
122e439
Fix build error
Mikep86 Mar 27, 2024
ef3abd9
[feature/semantic-text] Simplify the integration of the field inferen…
jimczi Mar 28, 2024
2e89d99
Merge branch 'main' into feature/semantic-text
Mikep86 Mar 28, 2024
b6ca8d2
[feature/semantic-text] semantic text copy to support (#106689)
carlosdelest Apr 2, 2024
2c11a3f
Merge remote-tracking branch 'upstream/main' into feature/semantic-text
jimczi Apr 4, 2024
5556763
[feature/semantic-text] Move the inference results back to the origin…
jimczi Apr 5, 2024
17f1fde
semantic_text: Add cluster metadata information for inference field m…
carlosdelest Apr 5, 2024
4025d2c
Add javadoc
carlosdelest Apr 5, 2024
d78acc3
Fix test helper
carlosdelest Apr 5, 2024
7a2b70b
PR Review comments
carlosdelest Apr 5, 2024
6b83424
Merge branch 'refs/heads/main' into feature/semantic-text
carlosdelest Apr 5, 2024
f565596
[feature/semantic-text] Handle chunked error (#107192)
jimczi Apr 8, 2024
dc46e88
Add test coverage for null constructor args
carlosdelest Apr 8, 2024
937572d
Add first query tests
carlosdelest Apr 9, 2024
bef2214
Add inner_hits tests
carlosdelest Apr 9, 2024
84a2735
Add mapping incompatibility tests
carlosdelest Apr 9, 2024
81c864c
Merge remote-tracking branch 'origin/main' into feature/semantic-text
carlosdelest Apr 9, 2024
750f895
Merge branch 'refs/heads/feature/semantic-text' into carlosdelest/sem…
carlosdelest Apr 9, 2024
1f78a4a
Merge branch 'refs/heads/main' into carlosdelest/semantic-text-field-…
carlosdelest Apr 9, 2024
b0e6d43
Add semantic_text field mapper and inference generation
carlosdelest Apr 9, 2024
531d1b1
Merge branch 'main' into carlosdelest/semantic-text-index-metadata-ch…
carlosdelest Apr 9, 2024
537f610
Merge branch 'refs/heads/carlosdelest/semantic-text-add-query-tests' …
carlosdelest Apr 9, 2024
3bce501
Add tests pending from #107256
carlosdelest Apr 9, 2024
3c29dcb
Merge branch 'refs/heads/carlosdelest/semantic-text-index-metadata-ch…
carlosdelest Apr 9, 2024
0f57a5b
Fix merge
carlosdelest Apr 9, 2024
3e847f4
Fix merge
carlosdelest Apr 9, 2024
c0960fa
Fix merge
carlosdelest Apr 9, 2024
ff8365a
Update docs/changelog/107262.yaml
carlosdelest Apr 10, 2024
7d8fe11
Merge branch 'refs/heads/main' into carlosdelest/semantic-text-field-…
carlosdelest Apr 10, 2024
6805726
Merge remote-tracking branch 'carlosdelest/carlosdelest/semantic-text…
carlosdelest Apr 10, 2024
7dbb53b
Update changelog
carlosdelest Apr 10, 2024
5 changes: 5 additions & 0 deletions docs/changelog/107262.yaml
@@ -0,0 +1,5 @@
pr: 107262
summary: semantic_text field mapper and inference generation
area: Mapping
type: feature
issues: []
@@ -294,6 +294,10 @@ private void executeBulkRequestsByShard(
bulkRequest.getRefreshPolicy(),
requests.toArray(new BulkItemRequest[0])
);
var indexMetadata = clusterState.getMetadata().index(shardId.getIndexName());
if (indexMetadata != null && indexMetadata.getInferenceFields().isEmpty() == false) {
bulkShardRequest.setInferenceFieldMap(indexMetadata.getInferenceFields());
}
bulkShardRequest.waitForActiveShards(bulkRequest.waitForActiveShards());
bulkShardRequest.timeout(bulkRequest.timeout());
bulkShardRequest.routedBasedOnClusterVersion(clusterState.version());
@@ -15,13 +15,15 @@
import org.elasticsearch.action.support.replication.ReplicatedWriteRequest;
import org.elasticsearch.action.support.replication.ReplicationRequest;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.cluster.metadata.InferenceFieldMetadata;
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;
import org.elasticsearch.common.util.set.Sets;
import org.elasticsearch.index.shard.ShardId;
import org.elasticsearch.transport.RawIndexingDataTransportRequest;

import java.io.IOException;
import java.util.Map;
import java.util.Set;

public final class BulkShardRequest extends ReplicatedWriteRequest<BulkShardRequest>
@@ -33,6 +35,8 @@ public final class BulkShardRequest extends ReplicatedWriteRequest<BulkShardRequ

private final BulkItemRequest[] items;

private transient Map<String, InferenceFieldMetadata> inferenceFieldMap = null;

public BulkShardRequest(StreamInput in) throws IOException {
super(in);
items = in.readArray(i -> i.readOptionalWriteable(inpt -> new BulkItemRequest(shardId, inpt)), BulkItemRequest[]::new);
@@ -44,6 +48,30 @@ public BulkShardRequest(ShardId shardId, RefreshPolicy refreshPolicy, BulkItemRe
setRefreshPolicy(refreshPolicy);
}

/**
* Public for test
* Set the transient metadata indicating that this request requires running inference before proceeding.
*/
public void setInferenceFieldMap(Map<String, InferenceFieldMetadata> fieldInferenceMap) {
this.inferenceFieldMap = fieldInferenceMap;
}

/**
* Consumes the inference metadata to execute inference on the bulk items just once.
*/
public Map<String, InferenceFieldMetadata> consumeInferenceFieldMap() {
Map<String, InferenceFieldMetadata> ret = inferenceFieldMap;
inferenceFieldMap = null;
return ret;
}

/**
* Public for test
*/
public Map<String, InferenceFieldMetadata> getInferenceFieldMap() {
return inferenceFieldMap;
}

public long totalSizeInBytes() {
long totalSizeInBytes = 0;
for (int i = 0; i < items.length; i++) {
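The `consumeInferenceFieldMap()` accessor added in this hunk follows a read-and-clear pattern: the first caller gets the map, and every later caller gets `null`. A minimal sketch of that behaviour, assuming a hypothetical `ConsumeOnceSketch` class in place of `BulkShardRequest` and `String` values in place of `InferenceFieldMetadata`:

```java
import java.util.Map;

// Hypothetical stand-in for BulkShardRequest; only the inference-map handling is modelled.
final class ConsumeOnceSketch {
    // Value type simplified to String for this sketch (the diff uses InferenceFieldMetadata).
    private Map<String, String> inferenceFieldMap;

    void setInferenceFieldMap(Map<String, String> map) {
        this.inferenceFieldMap = map;
    }

    // Returns the current map and clears the field, so a second call returns null.
    Map<String, String> consumeInferenceFieldMap() {
        Map<String, String> ret = inferenceFieldMap;
        inferenceFieldMap = null;
        return ret;
    }

    public static void main(String[] args) {
        ConsumeOnceSketch request = new ConsumeOnceSketch();
        request.setInferenceFieldMap(Map.of("body", "my-inference-id"));
        System.out.println(request.consumeInferenceFieldMap()); // prints {body=my-inference-id}
        System.out.println(request.consumeInferenceFieldMap()); // prints null
    }
}
```

A caller that treats `null` as "nothing to do" will therefore run its work at most once per request, which appears to be the intent stated in the Javadoc above.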
@@ -555,7 +555,7 @@ public static Map<String, Object> nodeMapValue(Object node, String desc) {
if (node instanceof Map) {
return (Map<String, Object>) node;
} else {
-throw new ElasticsearchParseException(desc + " should be a hash but was of type: " + node.getClass());
+throw new ElasticsearchParseException(desc + " should be a map but was of type: " + node.getClass());
}
}

@@ -696,6 +696,10 @@ private static void failIfMatchesRoutingPath(DocumentParserContext context, Stri
*/
private static void parseCopyFields(DocumentParserContext context, List<String> copyToFields) throws IOException {
for (String field : copyToFields) {
if (context.mappingLookup().getMapper(field) instanceof InferenceFieldMapper) {
// ignore copy_to that targets inference fields, values are already extracted in the coordinating node to perform inference.
continue;
}
// In case of a hierarchy of nested documents, we need to figure out
// which document the field should go to
LuceneDocument targetDoc = null;
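The added check makes `parseCopyFields` skip any `copy_to` target that is an inference field, because (per the code comment) those values are already extracted on the coordinating node. A rough sketch of that filtering, assuming a hypothetical `CopyToSkipSketch` helper where a plain set of field names stands in for the `instanceof InferenceFieldMapper` lookup:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of the PR.
final class CopyToSkipSketch {
    // Returns the copy_to targets that would still be parsed,
    // leaving out targets that are inference fields.
    static List<String> targetsToParse(List<String> copyToFields, Set<String> inferenceFields) {
        List<String> result = new ArrayList<>();
        for (String field : copyToFields) {
            if (inferenceFields.contains(field)) {
                // Mirrors the `continue` in the diff: nothing is copied for this target here.
                continue;
            }
            result.add(field);
        }
        return result;
    }

    public static void main(String[] args) {
        // Field names are made up for illustration.
        List<String> remaining = targetsToParse(List.of("title_copy", "semantic_body"), Set.of("semantic_body"));
        System.out.println(remaining); // prints [title_copy]
    }
}
```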
@@ -1199,7 +1199,7 @@ public static final class Conflicts {
private final String mapperName;
private final List<String> conflicts = new ArrayList<>();

-Conflicts(String mapperName) {
+public Conflicts(String mapperName) {
this.mapperName = mapperName;
}

@@ -1211,7 +1211,7 @@ void addConflict(String parameter, String existing, String toMerge) {
conflicts.add("Cannot update parameter [" + parameter + "] from [" + existing + "] to [" + toMerge + "]");
}

-void check() {
+public void check() {
if (conflicts.isEmpty()) {
return;
}

This file was deleted.

@@ -55,7 +55,7 @@ public static MapperMergeContext from(MapperBuilderContext mapperBuilderContext,
* @param name the name of the child context
* @return a new {@link MapperMergeContext} with this context as its parent
*/
-MapperMergeContext createChildContext(String name, ObjectMapper.Dynamic dynamic) {
+public MapperMergeContext createChildContext(String name, ObjectMapper.Dynamic dynamic) {
return createChildContext(mapperBuilderContext.createChildContext(name, dynamic));
}

@@ -69,7 +69,7 @@ MapperMergeContext createChildContext(MapperBuilderContext childContext) {
return new MapperMergeContext(childContext, newFieldsBudget);
}

-MapperBuilderContext getMapperBuilderContext() {
+public MapperBuilderContext getMapperBuilderContext() {
return mapperBuilderContext;
}

@@ -76,7 +76,7 @@ public CompressedXContent toCompressedXContent() {
/**
* Returns the root object for the current mapping
*/
-RootObjectMapper getRoot() {
+public RootObjectMapper getRoot() {
return root;
}

@@ -230,6 +230,16 @@ protected Parameter<?>[] getParameters() {
return new Parameter<?>[] { elementType, dims, indexed, similarity, indexOptions, meta };
}

public Builder similarity(VectorSimilarity vectorSimilarity) {
similarity.setValue(vectorSimilarity);
return this;
}

public Builder dimensions(int dimensions) {
this.dims.setValue(dimensions);
return this;
}

@Override
public DenseVectorFieldMapper build(MapperBuilderContext context) {
return new DenseVectorFieldMapper(
@@ -754,7 +764,7 @@ public static ElementType fromString(String name) {
ElementType.FLOAT
);

-enum VectorSimilarity {
+public enum VectorSimilarity {
L2_NORM {
@Override
float score(float similarity, ElementType elementType, int dim) {

This file was deleted.

@@ -22,6 +22,7 @@
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;

import static org.elasticsearch.xcontent.XContentFactory.jsonBuilder;
import static org.hamcrest.Matchers.equalTo;
@@ -106,6 +107,12 @@ public void testCopyToFieldsParsing() throws Exception {

fieldMapper = mapperService.documentMapper().mappers().getMapper("new_field");
assertThat(fieldMapper.typeName(), equalTo("long"));

MappingLookup mappingLookup = mapperService.mappingLookup();
assertThat(mappingLookup.sourcePaths("another_field"), equalTo(Set.of("copy_test", "int_to_str_test", "another_field")));
Review comment (Member, Author): Check that copy_to information is included in source paths for inference fields.

assertThat(mappingLookup.sourcePaths("new_field"), equalTo(Set.of("new_field", "int_to_str_test")));
assertThat(mappingLookup.sourcePaths("copy_test"), equalTo(Set.of("copy_test", "cyclic_test")));
assertThat(mappingLookup.sourcePaths("cyclic_test"), equalTo(Set.of("cyclic_test", "copy_test")));
}

public void testCopyToFieldsInnerObjectParsing() throws Exception {
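The new assertions expect `sourcePaths(field)` to contain the field itself plus every field that copies into it through `copy_to`, without following chains of copies. A minimal sketch of that rule, assuming a hypothetical `SourcePathsSketch` class (not the `MappingLookup` implementation) and a `copy_to` layout inferred from the assertions rather than taken from the test mapping:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; MappingLookup.sourcePaths may be implemented differently.
final class SourcePathsSketch {
    // copyTo maps a source field to the fields it copies its value into.
    static Map<String, Set<String>> sourcePaths(Set<String> fields, Map<String, List<String>> copyTo) {
        Map<String, Set<String>> paths = new HashMap<>();
        // Every field is a source path of itself.
        for (String field : fields) {
            paths.computeIfAbsent(field, k -> new HashSet<>()).add(field);
        }
        // Each copy_to edge adds the source field to the target's paths (one hop only).
        for (Map.Entry<String, List<String>> entry : copyTo.entrySet()) {
            for (String target : entry.getValue()) {
                paths.computeIfAbsent(target, k -> new HashSet<>()).add(entry.getKey());
            }
        }
        return paths;
    }

    // copy_to layout inferred from the assertions in the diff (an assumption).
    static Map<String, Set<String>> example() {
        Set<String> fields = Set.of("copy_test", "another_field", "cyclic_test", "int_to_str_test", "new_field");
        Map<String, List<String>> copyTo = Map.of(
            "copy_test", List.of("another_field", "cyclic_test"),
            "int_to_str_test", List.of("another_field", "new_field"),
            "cyclic_test", List.of("copy_test")
        );
        return sourcePaths(fields, copyTo);
    }

    public static void main(String[] args) {
        example().forEach((field, sources) -> System.out.println(field + " <- " + sources));
    }
}
```

Because only direct edges are followed, the cycle between `copy_test` and `cyclic_test` does not pull `cyclic_test` into the paths of `another_field`, which matches the expected sets in the test.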
@@ -224,6 +224,9 @@ public void testSourcePathFields() throws IOException {
final Set<String> fieldsUsingSourcePath = new HashSet<>();
((FieldMapper) mapper).sourcePathUsedBy().forEachRemaining(mapper1 -> fieldsUsingSourcePath.add(mapper1.name()));
assertThat(fieldsUsingSourcePath, equalTo(Set.of("field.subfield1", "field.subfield2")));

assertThat(mapperService.mappingLookup().sourcePaths("field.subfield1"), equalTo(Set.of("field")));
assertThat(mapperService.mappingLookup().sourcePaths("field.subfield2"), equalTo(Set.of("field")));
}

public void testUnknownLegacyFieldsUnderKnownRootField() throws Exception {
@@ -652,7 +652,7 @@ public static MetadataRolloverService getMetadataRolloverService(
AllocationService allocationService = mock(AllocationService.class);
when(allocationService.reroute(any(ClusterState.class), any(String.class), any())).then(i -> i.getArguments()[0]);
when(allocationService.getShardRoutingRoleStrategy()).thenReturn(TestShardRoutingRoleStrategies.DEFAULT_ROLE_ONLY);
-MappingLookup mappingLookup = null;
+MappingLookup mappingLookup = MappingLookup.EMPTY;
if (dataStream != null) {
RootObjectMapper.Builder root = new RootObjectMapper.Builder("_doc", ObjectMapper.Defaults.SUBOBJECTS);
root.add(
@@ -731,6 +731,7 @@ public static IndicesService mockIndicesServices(MappingLookup mappingLookup) th
when(documentMapper.mapping()).thenReturn(mapping);
when(documentMapper.mappers()).thenReturn(MappingLookup.EMPTY);
when(documentMapper.mappingSource()).thenReturn(mapping.toCompressedXContent());
when(documentMapper.mappers()).thenReturn(mappingLookup);
RoutingFieldMapper routingFieldMapper = mock(RoutingFieldMapper.class);
when(routingFieldMapper.required()).thenReturn(false);
when(documentMapper.routingFieldMapper()).thenReturn(routingFieldMapper);
@@ -1030,7 +1030,7 @@ public final void testMinimalIsInvalidInRoutingPath() throws IOException {
}
}

-private String minimalIsInvalidRoutingPathErrorMessage(Mapper mapper) {
+protected String minimalIsInvalidRoutingPathErrorMessage(Mapper mapper) {
if (mapper instanceof FieldMapper fieldMapper && fieldMapper.fieldType().isDimension() == false) {
return "All fields that match routing_path must be configured with [time_series_dimension: true] "
+ "or flattened fields with a list of dimensions in [time_series_dimensions] and "
12 changes: 12 additions & 0 deletions x-pack/plugin/inference/build.gradle
@@ -6,6 +6,13 @@
*/
apply plugin: 'elasticsearch.internal-es-plugin'
apply plugin: 'elasticsearch.internal-cluster-test'
apply plugin: 'elasticsearch.internal-yaml-rest-test'

restResources {
restApi {
include '_common', 'bulk', 'indices', 'inference', 'index', 'get', 'update', 'reindex', 'search'
}
}

esplugin {
name 'x-pack-inference'
@@ -24,4 +31,9 @@ dependencies {
compileOnly project(path: xpackModule('core'))
testImplementation(testArtifact(project(xpackModule('core'))))
testImplementation project(':modules:reindex')
clusterPlugins project(':x-pack:plugin:inference:qa:test-service-plugin')
}

tasks.named('yamlRestTest') {
usesDefaultDistribution()
}
@@ -101,11 +101,6 @@ public TestServiceModel(
super(new ModelConfigurations(modelId, taskType, service, serviceSettings, taskSettings), new ModelSecrets(secretSettings));
}

-@Override
-public TestDenseInferenceServiceExtension.TestServiceSettings getServiceSettings() {
-return (TestDenseInferenceServiceExtension.TestServiceSettings) super.getServiceSettings();
-}
-
@Override
public TestTaskSettings getTaskSettings() {
return (TestTaskSettings) super.getTaskSettings();
@@ -169,7 +169,7 @@ public static TestServiceSettings fromMap(Map<String, Object> map) {
SimilarityMeasure similarity = null;
String similarityStr = (String) map.remove("similarity");
if (similarityStr != null) {
-similarity = SimilarityMeasure.valueOf(similarityStr);
+similarity = SimilarityMeasure.fromString(similarityStr);
}

return new TestServiceSettings(model, dimensions, similarity);
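The diff does not show `SimilarityMeasure.fromString`, so the reason for the switch is not stated here. One plausible motivation is that `Enum.valueOf` only accepts the exact constant name, while a custom `fromString` can normalise its input first. The sketch below illustrates that difference with a hypothetical `Measure` enum; its constants and its normalisation are assumptions, not the real `SimilarityMeasure` code:

```java
import java.util.Locale;

// Hypothetical illustration; not the SimilarityMeasure implementation.
final class SimilarityParsingSketch {
    enum Measure {
        COSINE,
        DOT_PRODUCT;

        // Assumed behaviour: trim and upper-case before the exact-name lookup.
        static Measure fromString(String name) {
            return valueOf(name.trim().toUpperCase(Locale.ROOT));
        }
    }

    public static void main(String[] args) {
        System.out.println(Measure.fromString("cosine")); // prints COSINE
        try {
            Measure.valueOf("cosine"); // exact-name lookup rejects lower case
        } catch (IllegalArgumentException e) {
            System.out.println("valueOf rejected lower-case input");
        }
    }
}
```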