Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Semantic text - field mapper #102971

Merged

Conversation

carlosdelest
Copy link
Member

This creates a new semantic_text field mapper, that receives a model_id parameter.

model_id will be used to perform inference on document ingestion and query time.

PUT test-index
{
    "mappings": {
        "properties": {
            "my-inference-field": {
                "type": "semantic_text",
                "model_id": "my-model-id"
            }
      }
}

This field mapper does not index values - that will be taken care of by a metadata field mapper, as the inference results will be included in a different top level field.

A new getInferenceModel() method has been added to MappedFieldType so we'll be able to get information of the models used in the different fields from the MappingLookup and related structures in future PRs.

@@ -203,6 +203,14 @@ public List<String> dimensions() {
return Collections.emptyList();
}

public String getInferenceModel() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Method added for future retrieval from MappingLookup

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlosdelest I would prefer a new interface & instanceof to check that something satisfies this new API. I don't like the idea of having it at the base MappedFieldType.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unclear on where the method would be called / where would the instanceof check go. Could you clarify?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@javanna I don't think we should update any core methods until we know 100% where they are going to be used. Consequently, I am against adding anything like this to MappedFieldType.

If anything, it seems like no matter where this is called an instanceof check against some other interface would work. @carlosdelest we should defer any changes here until we know where and how its going to be used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on the sentiment, I just could not see what the impact is of an instanceof because the method is currently only exposed but never used. It is also true that this kind of special cases complicate the code in the long run, but I see what you mean.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@javanna I think @carlosdelest is trying to lay the ground work for gathering field capabilities in some way to determine if a model is used and what model.

I agree, it's way too early here to make any changes to wide expansive interfaces like this one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Ben said, this will be used to gather the fields that are using an ML model so they can be included in IndexMetadata in a later stage.

I'm OK with using a new interface and defer the decision to include this method as part of the base class.

@Mikep86, please be aware that I'll be moving getInferenceModel() from MappedFieldType to a new inteface.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 048779b

@@ -447,7 +447,10 @@ public void onIndexModule(IndexModule indexModule) {

@Override
public Function<String, Predicate<String>> getFieldFilter() {
List<Function<String, Predicate<String>>> items = filterPlugins(MapperPlugin.class).stream().map(p -> p.getFieldFilter()).toList();
List<Function<String, Predicate<String>>> items = filterPlugins(MapperPlugin.class).stream()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a new MapperPlugin broke this check. Check that an actual filter has been used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make sense to me. We shouldn't change this. Why do the other plugins that satisfy the mapping plugin don't suffer from this?

What is the actual issue here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the heads up Ben!

Not doing this made some tests fail (see this test result).

My understanding was that we were adding another MapperPlugin that was not accounted for - thus the change to understand that the MapperPlugin did something useful.

I've been checking and after merging with main this is no longer an issue, so I've withdrawn the changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but I still see the change here, you are going to revert this in a future commit here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same case as the other one - fixed in d7d8d5e

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - removing this caused tests to fail. I've looked deeper into this issue:

  • LocalStateCompositeXPackPlugin is used for testing
  • No other subclasses of LocalStateCompositeXPackPlugin use a MapperPlugin as part of its plugins
  • LocalStateMachineLearning has as plugins Security (which is a MapperPlugin) and MachineLearning. Making MachineLearning another MapperPlugin implied that there are 2 MapperPlugins
  • The existing code does not check whether the actual getFieldsFilter() method has been overriden or not, but just the numbrer of MapperPlugins

So I think the changes I've done actually preserve the original intention - we're not looking for how many MapperPlugins there are, but how many of them actually override the method (that is, do not return NOOP_FIELD_FILTER.

Does it make sense? Happy to discuss if not

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlosdelest I see now. Seems good to me

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ I say this is ok.

@carlosdelest carlosdelest marked this pull request as ready for review December 5, 2023 13:50
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Dec 5, 2023
@carlosdelest carlosdelest added Team:Search Meta label for search team and removed needs:triage Requires assignment of a team area label labels Dec 5, 2023
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label and removed Team:Search Meta label for search team labels Dec 5, 2023
@carlosdelest carlosdelest added >non-issue Team:Search Meta label for search team and removed needs:triage Requires assignment of a team area label v8.12.0 labels Dec 5, 2023
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label and removed Team:Search Meta label for search team labels Dec 5, 2023
@carlosdelest carlosdelest requested review from a team, benwtrent and jimczi December 5, 2023 13:52
@carlosdelest carlosdelest added Team:Search Meta label for search team and removed needs:triage Requires assignment of a team area label labels Dec 5, 2023
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label and removed Team:Search Meta label for search team labels Dec 5, 2023
@carlosdelest carlosdelest removed the needs:triage Requires assignment of a team area label label Dec 5, 2023
@thecoop thecoop added :Search Foundations/Mapping Index mappings, including merging and defining field types and removed needs:triage Requires assignment of a team area label labels Dec 6, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Dec 6, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@@ -203,6 +203,14 @@ public List<String> dimensions() {
return Collections.emptyList();
}

public String getInferenceModel() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlosdelest I would prefer a new interface & instanceof to check that something satisfies this new API. I don't like the idea of having it at the base MappedFieldType.

@@ -447,7 +447,10 @@ public void onIndexModule(IndexModule indexModule) {

@Override
public Function<String, Predicate<String>> getFieldFilter() {
List<Function<String, Predicate<String>>> items = filterPlugins(MapperPlugin.class).stream().map(p -> p.getFieldFilter()).toList();
List<Function<String, Predicate<String>>> items = filterPlugins(MapperPlugin.class).stream()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make sense to me. We shouldn't change this. Why do the other plugins that satisfy the mapping plugin don't suffer from this?

What is the actual issue here?


public static class Builder extends FieldMapper.Builder {

private final Parameter<String> modelId = Parameter.stringParam("model_id", false, m -> builder(m).modelId.get(), null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private final Parameter<String> modelId = Parameter.stringParam("model_id", false, m -> builder(m).modelId.get(), null)
private final Parameter<String> modelId = Parameter.stringParam("model_id", false, m -> toType(m).fieldType().modelId, null)

Or at least don't create a builder here. That seems unnecessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call - removing it

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a couple of smaller points on cleaning up the code a little :)


public static final TypeParser PARSER = new TypeParser((n, c) -> new Builder(n), notInMultiFields(CONTENT_TYPE));

public static class SemanticTextFieldType extends SimpleMappedFieldType implements InferenceModelFieldType {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move this nested class to the bottom of the file please? I know we have it like this in other mappers but it makes the code very hard to follow as can be seen by the accidental introduction of duplicate state across the field and mapper here IMO :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense - done in 4b06655

}
}

private final String modelId;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have the modelId on the field, I don't think we also need it on the mapper redundantly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call - we could always retrieve it from the field type. Addressed in 39e061f


private final String modelId;

private final Builder builder;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retaining the builder has been causing trouble with memory in the past. It shouldn't matter too much here but still, could we just retain the explicit data we need from the builder instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the context. I removed the need for storing the Builder previously, so I removed builder refs entirely in 4b06655

LEARNING_TO_RANK("es.learning_to_rank_feature_flag_enabled=true", Version.fromString("8.12.0"), null),
LEARN_TO_RANK("es.learn_to_rank_feature_flag_enabled=true", Version.fromString("8.10.0"), null),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like an unnecessary change? Merge in main?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is - reverting some past commits negated the merge from main. Thanks for the catch

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 07613ae

.feature(FeatureFlag.LEARNING_TO_RANK)
.feature(FeatureFlag.LEARN_TO_RANK)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, seems unnecessary? Merge in main?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before - fixed in 07613ae

@carlosdelest carlosdelest merged commit 8ee4721 into elastic:main Dec 12, 2023
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>non-issue :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v8.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants