-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Semantic text - field mapper #102971
Semantic text - field mapper #102971
Conversation
@@ -203,6 +203,14 @@ public List<String> dimensions() { | |||
return Collections.emptyList(); | |||
} | |||
|
|||
public String getInferenceModel() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Method added for future retrieval from MappingLookup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@carlosdelest I would prefer a new interface & instanceof
to check that something satisfies this new API. I don't like the idea of having it at the base MappedFieldType.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unclear on where the method would be called / where would the instanceof check go. Could you clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@javanna I don't think we should update any core methods until we know 100% where they are going to be used. Consequently, I am against adding anything like this to MappedFieldType.
If anything, it seems like no matter where this is called an instanceof
check against some other interface would work. @carlosdelest we should defer any changes here until we know where and how its going to be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree on the sentiment, I just could not see what the impact is of an instanceof because the method is currently only exposed but never used. It is also true that this kind of special cases complicate the code in the long run, but I see what you mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@javanna I think @carlosdelest is trying to lay the ground work for gathering field capabilities in some way to determine if a model is used and what model.
I agree, it's way too early here to make any changes to wide expansive interfaces like this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Ben said, this will be used to gather the fields that are using an ML model so they can be included in IndexMetadata
in a later stage.
I'm OK with using a new interface and defer the decision to include this method as part of the base class.
@Mikep86, please be aware that I'll be moving getInferenceModel()
from MappedFieldType
to a new inteface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 048779b
@@ -447,7 +447,10 @@ public void onIndexModule(IndexModule indexModule) { | |||
|
|||
@Override | |||
public Function<String, Predicate<String>> getFieldFilter() { | |||
List<Function<String, Predicate<String>>> items = filterPlugins(MapperPlugin.class).stream().map(p -> p.getFieldFilter()).toList(); | |||
List<Function<String, Predicate<String>>> items = filterPlugins(MapperPlugin.class).stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a new MapperPlugin broke this check. Check that an actual filter has been used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't make sense to me. We shouldn't change this. Why do the other plugins that satisfy the mapping plugin don't suffer from this?
What is the actual issue here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the heads up Ben!
Not doing this made some tests fail (see this test result).
My understanding was that we were adding another MapperPlugin that was not accounted for - thus the change to understand that the MapperPlugin did something useful.
I've been checking and after merging with main this is no longer an issue, so I've withdrawn the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, but I still see the change here, you are going to revert this in a future commit here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same case as the other one - fixed in d7d8d5e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - removing this caused tests to fail. I've looked deeper into this issue:
LocalStateCompositeXPackPlugin
is used for testing- No other subclasses of
LocalStateCompositeXPackPlugin
use aMapperPlugin
as part of its plugins LocalStateMachineLearning
has as pluginsSecurity
(which is aMapperPlugin
) andMachineLearning
. MakingMachineLearning
anotherMapperPlugin
implied that there are 2MapperPlugins
- The existing code does not check whether the actual
getFieldsFilter()
method has been overriden or not, but just the numbrer ofMapperPlugins
So I think the changes I've done actually preserve the original intention - we're not looking for how many MapperPlugins
there are, but how many of them actually override the method (that is, do not return NOOP_FIELD_FILTER
.
Does it make sense? Happy to discuss if not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@carlosdelest I see now. Seems good to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ I say this is ok.
Pinging @elastic/es-search (Team:Search) |
@@ -203,6 +203,14 @@ public List<String> dimensions() { | |||
return Collections.emptyList(); | |||
} | |||
|
|||
public String getInferenceModel() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@carlosdelest I would prefer a new interface & instanceof
to check that something satisfies this new API. I don't like the idea of having it at the base MappedFieldType.
@@ -447,7 +447,10 @@ public void onIndexModule(IndexModule indexModule) { | |||
|
|||
@Override | |||
public Function<String, Predicate<String>> getFieldFilter() { | |||
List<Function<String, Predicate<String>>> items = filterPlugins(MapperPlugin.class).stream().map(p -> p.getFieldFilter()).toList(); | |||
List<Function<String, Predicate<String>>> items = filterPlugins(MapperPlugin.class).stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't make sense to me. We shouldn't change this. Why do the other plugins that satisfy the mapping plugin don't suffer from this?
What is the actual issue here?
|
||
public static class Builder extends FieldMapper.Builder { | ||
|
||
private final Parameter<String> modelId = Parameter.stringParam("model_id", false, m -> builder(m).modelId.get(), null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private final Parameter<String> modelId = Parameter.stringParam("model_id", false, m -> builder(m).modelId.get(), null) | |
private final Parameter<String> modelId = Parameter.stringParam("model_id", false, m -> toType(m).fieldType().modelId, null) |
Or at least don't create a builder here. That seems unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call - removing it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a couple of smaller points on cleaning up the code a little :)
|
||
public static final TypeParser PARSER = new TypeParser((n, c) -> new Builder(n), notInMultiFields(CONTENT_TYPE)); | ||
|
||
public static class SemanticTextFieldType extends SimpleMappedFieldType implements InferenceModelFieldType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we move this nested class to the bottom of the file please? I know we have it like this in other mappers but it makes the code very hard to follow as can be seen by the accidental introduction of duplicate state across the field and mapper here IMO :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense - done in 4b06655
} | ||
} | ||
|
||
private final String modelId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have the modelId
on the field, I don't think we also need it on the mapper redundantly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call - we could always retrieve it from the field type. Addressed in 39e061f
|
||
private final String modelId; | ||
|
||
private final Builder builder; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Retaining the builder has been causing trouble with memory in the past. It shouldn't matter too much here but still, could we just retain the explicit data we need from the builder instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the context. I removed the need for storing the Builder previously, so I removed builder refs entirely in 4b06655
LEARNING_TO_RANK("es.learning_to_rank_feature_flag_enabled=true", Version.fromString("8.12.0"), null), | ||
LEARN_TO_RANK("es.learn_to_rank_feature_flag_enabled=true", Version.fromString("8.10.0"), null), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like an unnecessary change? Merge in main?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is - reverting some past commits negated the merge from main. Thanks for the catch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 07613ae
.feature(FeatureFlag.LEARNING_TO_RANK) | ||
.feature(FeatureFlag.LEARN_TO_RANK) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, seems unnecessary? Merge in main?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as before - fixed in 07613ae
This creates a new
semantic_text
field mapper, that receives amodel_id
parameter.model_id
will be used to perform inference on document ingestion and query time.This field mapper does not index values - that will be taken care of by a metadata field mapper, as the inference results will be included in a different top level field.
A new
getInferenceModel()
method has been added toMappedFieldType
so we'll be able to get information of the models used in the different fields from theMappingLookup
and related structures in future PRs.