-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Simplify the Inference Ingest Processor configuration #100205
Conversation
Documentation preview: |
Hi @davidkyle, I've created a changelog YAML for you. |
Pinging @elastic/ml-core (Team:ML) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple comments, I'm not sure they're worth keeping the PR from being merged though.
@@ -74,6 +74,12 @@ public Map<String, Object> asMap() { | |||
return asMap; | |||
} | |||
|
|||
@Override | |||
public Map<String, Object> asMap(String outputField) { | |||
// errors do not have a result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this throw like RawInferenceResults
does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RawInferenceResults
is used internal by the tree ensembles, it should never be seen by the outside world which is why it throws. The comment here isn't very clear but the idea is that we don't write the error message to the result field (which might be mapped to a dense vector when written to the index), instead it goes in a different field
* @param outputField Write the inference result to this field | ||
* @return Map representation of the InferenceResult | ||
*/ | ||
Map<String, Object> asMap(String outputField); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably missing the importance of why we need to pass in the outputField
here but from looking at a few of the implementations of InferenceResults
most seem to have a results
field this is already used with this version Map<String, Object> asMap();
. Why couldn't the caller just instantiate the implementing class with the output field when constructing the object instead of having to pass it in with this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this isn't feasible but the code here:
InferenceResults.writeResultToField(
response.getInferenceResults().get(i),
ingestDocument,
inputs.get(i).outputBasePath(),
inputs.get(i).outputField,
response.getId() != null ? response.getId() : modelId,
i == 0
So I think what I'm suggesting whoever is creating the inference results class that is eventually being placed in the response and then access here with response.getInferenceResults().get(i)
would need to pass in the corresponding output field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is a bit back to front: InferenceResults
has a result_field
and to change that field we need to send an inference config update with the inference request just to update the result_field
in the response. The sender knows where they want to write the result to but we have to use this verbose and long winded way of changing result_field
.
It all breaks down when you have multiple fields and you want the responses to be written to different output fields (e.g title & body), if we used result_field
we would have to create separate requests for each input field just to change the result_field
in each one.
It's much simpler to say write yourself to this location but we have to keep the legacy method so the code isn't very neat. Hopefully this can be refactored away eventually.
Map<String, String> fieldMap | ||
Map<String, String> fieldMap, | ||
List<Factory.InputConfig> inputs, | ||
boolean configuredWithInputsFields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would probably be too big of a change now, but I wonder if in the future we could extract out the need for a boolean here and pass in a small class that handles defining how the ingest processor works if it has multiple input fields. Or maybe split this into two separate class 🤷♂️
That way we can avoid the if-else logic throughout this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ good idea
InferModelAction.Request request; | ||
try { | ||
request = buildRequest(ingestDocument); | ||
} catch (ElasticsearchStatusException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth logging something here?
if (configuredWithInputsFields) { | ||
List<String> requestInputs = new ArrayList<>(); | ||
for (var inputFields : inputs) { | ||
var lookup = (String) fields.get(inputFields.inputField); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to do an instanceof
or type check before casting?
…astic#100335) Following on from elastic#100205 this PR adds more tests and checks for corner cases when parsing the configuration.
Introduces a new way to configure which fields will be passed for inference and where to write the the results to.
field_map
to rename a fieldThe new configuration does not support Data frame analytics, those models accept the full ingest document not specific fields.
Example: Process the field
body
and write the results tobody_tokens
Example: Select multiple input fields can be specified as: