[ML] Simplify the Inference Ingest Processor configuration #100205

Merged · 6 commits merged into elastic:main on Oct 3, 2023

Conversation

davidkyle (Member)

Introduces a new way to configure which fields are passed for inference and where the results are written.

  • Removes the need to use a field_map to rename a field
  • The output location is configured in one place, next to the input
  • Multiple fields from the ingest document can be selected for inference

The new configuration does not support data frame analytics; those models accept the full ingest document rather than specific fields.

Example: Process the field body and write the results to body_tokens

{
    "processors": [
        {
            "inference": {
                "model_id": "elser_model",
                "input_output": [
                    {
                        "input_field": "body",
                        "output_field": "body_tokens"
                    }
                ]
            }
        }
    ]
}
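
As a rough usage sketch (not part of this PR), the processor above can be exercised with the simulate pipeline API, assuming a trained model deployment named elser_model is running:

POST _ingest/pipeline/_simulate
{
    "pipeline": {
        "processors": [
            {
                "inference": {
                    "model_id": "elser_model",
                    "input_output": [
                        {
                            "input_field": "body",
                            "output_field": "body_tokens"
                        }
                    ]
                }
            }
        ]
    },
    "docs": [
        { "_source": { "body": "a short piece of text to expand" } }
    ]
}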

Example: Multiple input fields can be specified:

{
    "processors": [
        {
            "inference": {
                "model_id": "elser_model",
                "input_output": [
                    {
                        "input_field": "body",
                        "output_field": "body_tokens"
                    },
                    {
                        "input_field": "title",
                        "output_field": "title_tokens"
                    }
                ]
            }
        }
    ]
}

github-actions bot (Contributor) commented Oct 3, 2023

Documentation preview:

elasticsearchmachine (Collaborator)

Hi @davidkyle, I've created a changelog YAML for you.

davidkyle marked this pull request as ready for review on October 3, 2023, 15:51
elasticsearchmachine added the Team:ML (Meta label for the ML team) label on Oct 3, 2023
elasticsearchmachine (Collaborator)

Pinging @elastic/ml-core (Team:ML)

davidkyle added the cloud-deploy (Publish cloud docker image for Cloud-First-Testing) label on Oct 3, 2023
@jonathan-buttner (Contributor) left a comment:

Left a couple of comments; I'm not sure they're worth keeping the PR from being merged, though.

@@ -74,6 +74,12 @@ public Map<String, Object> asMap() {
return asMap;
}

@Override
public Map<String, Object> asMap(String outputField) {
// errors do not have a result
Contributor:

Should this throw like RawInferenceResults does?

davidkyle (Member, Author):

RawInferenceResults is used internally by the tree ensembles; it should never be seen by the outside world, which is why it throws. The comment here isn't very clear, but the idea is that we don't write the error message to the result field (which might be mapped to a dense vector when written to the index); instead it goes in a different field.
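
As a purely hypothetical sketch of that idea (the field name and variable here are invented, not the PR's code), an error result can simply ignore the requested output field so the message never lands in a field that might be mapped as a dense vector:

@Override
public Map<String, Object> asMap(String outputField) {
    // deliberately ignore outputField: the error text goes under its own key,
    // not into the field the model output would normally be written to
    return Map.of("error", exceptionMessage);
}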

* @param outputField Write the inference result to this field
* @return Map representation of the InferenceResult
*/
Map<String, Object> asMap(String outputField);
Contributor:

I'm probably missing the importance of why we need to pass in the outputField here, but looking at a few of the implementations of InferenceResults, most seem to have a results field that is already used by the existing Map<String, Object> asMap() version. Why couldn't the caller just instantiate the implementing class with the output field when constructing the object, instead of having to pass it in with this method?

Contributor:

Maybe this isn't feasible but the code here:

InferenceResults.writeResultToField(
                    response.getInferenceResults().get(i),
                    ingestDocument,
                    inputs.get(i).outputBasePath(),
                    inputs.get(i).outputField,
                    response.getId() != null ? response.getId() : modelId,
                    i == 0

So I think what I'm suggesting is that whoever creates the inference results class that is eventually placed in the response, and then accessed here with response.getInferenceResults().get(i), would need to pass in the corresponding output field.
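
Illustrating that suggestion with a rough, hypothetical sketch (the class and variable names are made up, not taken from the codebase): the output field would be supplied when the result object is constructed, so callers could keep using the parameterless asMap():

// hypothetical: the result object is constructed already knowing its destination field
InferenceResults result = new SomeTextExpansionResults(outputField, modelOutput);
// ...so it can serialise itself without being told the field again
ingestDocument.setFieldValue(outputBasePath, result.asMap());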

davidkyle (Member, Author):

The code is a bit back to front: InferenceResults has a result_field, and to change that field we have to send an inference config update with the inference request just to change the result_field in the response. The sender knows where they want the result written, but we have to use this verbose, long-winded way of changing result_field.

It all breaks down when you have multiple fields and you want the responses written to different output fields (e.g. title and body); if we used result_field we would have to create a separate request for each input field just to change the result_field in each one.

It's much simpler to tell the result to write itself to a given location, but we have to keep the legacy method, so the code isn't very neat. Hopefully this can be refactored away eventually.
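
For contrast, this is roughly what the legacy shape looks like (a sketch for illustration, not taken from this PR): the destination is spread across target_field, field_map and an inference_config override, rather than sitting next to the input:

{
    "inference": {
        "model_id": "elser_model",
        "target_field": "ml.inference",
        "field_map": {
            "body": "text_field"
        },
        "inference_config": {
            "text_expansion": {
                "results_field": "body_tokens"
            }
        }
    }
}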

Map<String, String> fieldMap
Map<String, String> fieldMap,
List<Factory.InputConfig> inputs,
boolean configuredWithInputsFields
Contributor:

This would probably be too big of a change now, but I wonder if in the future we could remove the need for a boolean here and pass in a small class that defines how the ingest processor behaves when it has multiple input fields. Or maybe split this into two separate classes 🤷‍♂️

That way we can avoid the if-else logic throughout this class.
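
A rough sketch of what that could look like (hypothetical names, not a concrete proposal from this PR): a small strategy interface replaces the configuredWithInputsFields flag, and the processor delegates to it instead of branching:

import java.util.List;
import java.util.Map;

// hypothetical: one implementation for the new input_output configuration,
// another for the legacy field_map / whole-document behaviour
interface InputHandling {
    // collect the text (or the whole document) to send for inference
    List<String> extractInputs(Map<String, Object> source);

    // write one inference result back into the document source
    void writeResult(Map<String, Object> result, Map<String, Object> target);
}

The processor would hold a single InputHandling instance chosen at construction time, so the if (configuredWithInputsFields) branches disappear.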

davidkyle (Member, Author):

++ good idea

InferModelAction.Request request;
try {
request = buildRequest(ingestDocument);
} catch (ElasticsearchStatusException e) {
Contributor:

Is it worth logging something here?
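
One possible shape for that (hypothetical, not code from the PR; it assumes a logger and a failure handler are in scope in the processor):

} catch (ElasticsearchStatusException e) {
    // hypothetical: record why the request could not be built before failing the document
    logger.debug("failed to build inference request for model [" + modelId + "]", e);
    handler.accept(ingestDocument, e);
    return;
}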

if (configuredWithInputsFields) {
List<String> requestInputs = new ArrayList<>();
for (var inputFields : inputs) {
var lookup = (String) fields.get(inputFields.inputField);
Contributor:

Do we need to do an instanceof or type check before casting?
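
A minimal sketch of that check (the handling of the non-string case is hypothetical, not what the PR decides to do):

Object value = fields.get(inputFields.inputField);
if (value instanceof String text) {
    requestInputs.add(text);
} else {
    // hypothetical: reject non-string values instead of blindly casting
    throw new IllegalArgumentException("input field [" + inputFields.inputField + "] is not a string");
}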

davidkyle merged commit b055204 into elastic:main on Oct 3, 2023
davidkyle added a commit that referenced this pull request Oct 6, 2023
…00335)

Following on from #100205 this PR adds more tests and checks 
for corner cases when parsing the configuration.
davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Oct 6, 2023
…astic#100335)

Following on from elastic#100205 this PR adds more tests and checks 
for corner cases when parsing the configuration.
elasticsearchmachine pushed a commit that referenced this pull request Oct 6, 2023
…00335) (#100416)

Following on from #100205 this PR adds more tests and checks 
for corner cases when parsing the configuration.
Labels
cloud-deploy (Publish cloud docker image for Cloud-First-Testing), >enhancement, :ml (Machine learning), Team:ML (Meta label for the ML team), v8.11.0