Emit multiple fields from a runtime field script #75108

javanna · 2021-07-08T08:29:27Z

We have recently introduced support for grok and dissect to the runtime fields Painless context, which makes it possible to split a field into multiple fields. However, each runtime field can only emit values for a single field. This PR introduces support for emitting multiple fields from the same script.

The API call to define a runtime field that emits multiple fields is the following:

PUT localhost:9200/logs/_mappings
{
    "runtime" : {
      "log" : {
        "type" : "composite",
        "script" : "emit(grok(\"%{COMMONAPACHELOG}\").extract(doc[\"message.keyword\"].value))",
        "fields" : {
            "clientip" : {
                "type" : "ip"
            },
            "response" : {
                "type" : "long"
            }
        }
      }
    }
}

The script context for this new field type accepts two emit signatures:

emit(String, Object)
emit(Map)
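
One way to picture the two signatures is as writes into a per-document map from sub-field name to the values emitted for it. The following is a minimal sketch under that assumption, not the actual CompositeFieldScript; the class `EmitSketch`, its `values` accessor, and the choice to ignore a null map are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a composite script; not the Elasticsearch class.
class EmitSketch {
    // Values collected for the current document, keyed by sub-field name.
    private final Map<String, List<Object>> fieldValues = new HashMap<>();

    // emit(String, Object): add one value for one sub-field.
    void emit(String field, Object value) {
        fieldValues.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }

    // emit(Map): add one value per entry, e.g. the map produced by a grok extraction.
    void emit(Map<String, Object> fields) {
        if (fields == null) {
            return; // assumption: a null map (no match) emits nothing
        }
        for (Map.Entry<String, Object> entry : fields.entrySet()) {
            emit(entry.getKey(), entry.getValue());
        }
    }

    // Values emitted so far for one sub-field, or an empty list.
    List<Object> values(String field) {
        return fieldValues.getOrDefault(field, List.of());
    }
}
```

In this sketch `emit(Map)` is only a convenience over repeated `emit(String, Object)` calls, which matches how the two example scripts in this description use them.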

Sub-fields need to be declared under fields in order to be discoverable through the field_caps API and accessible through the search API. In this first iteration, mapping additional sub-fields requires providing the whole object, with its script and all the existing plus new sub-fields. In a follow-up we will address this by allowing users to provide only what needs to be changed or added.

Multiple fields are emitted by returning multiple MappedFieldTypes from RuntimeField#asMappedFieldTypes. The sub-fields are instances of the runtime fields that are already supported, with a small tweak: the script defined by their parent is adapted to an artificial script factory for each sub-field, which makes the corresponding sub-field accessible. This approach allows us to reuse all of the existing runtime fields code for the sub-fields.
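
The adaptation described above can be illustrated with plain functions: the parent script produces a map of values, and each sub-field gets its own single-valued view over that script. This is a rough sketch only. `SubFieldAdapterSketch`, `parseLine` and `subField` are hypothetical names, and the real script factories carry much more context than a `Function`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration; the real code adapts Painless script factories.
class SubFieldAdapterSketch {
    // Stands in for the parent's script: splits "ip status" into two named values.
    static Map<String, Object> parseLine(String message) {
        String[] parts = message.split(" ");
        Map<String, Object> values = new HashMap<>();
        values.put("clientip", parts[0]);
        values.put("response", Long.parseLong(parts[1]));
        return values;
    }

    // Derives a per-sub-field "script" that runs the parent and keeps one entry.
    static Function<String, Object> subField(Function<String, Map<String, Object>> parentScript, String name) {
        return message -> parentScript.apply(message).get(name);
    }
}
```

For example, `SubFieldAdapterSketch.subField(SubFieldAdapterSketch::parseLine, "response")` behaves like an ordinary single-field script, which is what lets the existing per-type code be reused.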

The runtime section has been flat so far, as it has not supported objects until now. That stays the same, meaning that runtime fields can have dots in their names. However, with the ability to emit multiple fields there are now two ways to create the same field, so we have to make sure that a runtime field with a certain name cannot be defined twice. This is why the following mappings are rejected with the error Found two runtime fields with same name [log.response]:

PUT localhost:9200/logs/_mappings
{
    "runtime" : {
        "log.response" : {
            "type" : "keyword"
        },
        "log" : {
            "type" : "composite",
            "script" : "emit(\"response\", grok(\"%{COMMONAPACHELOG}\").extract(doc[\"message.keyword\"].value)?.response)",
            "fields" : {
                "response" : {
                    "type" : "long"
                }
            }
        }
    }
}
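
The duplicate check can be sketched as flattening every runtime definition to its full dotted names and rejecting any name seen twice. This is an illustration under simplifying assumptions, not the mapper code: `RuntimeNameCheckSketch` is hypothetical, a definition is reduced to its list of declared sub-field names, and a composite parent is assumed to contribute only its sub-fields.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the name-collision check.
class RuntimeNameCheckSketch {
    // runtime: top-level name -> declared sub-field names (empty list for a leaf field).
    static Set<String> flatten(Map<String, List<String>> runtime) {
        Set<String> names = new HashSet<>();
        for (Map.Entry<String, List<String>> entry : runtime.entrySet()) {
            if (entry.getValue().isEmpty()) {
                add(names, entry.getKey()); // leaf field: its own name
            }
            for (String sub : entry.getValue()) {
                add(names, entry.getKey() + "." + sub); // composite: parent.sub
            }
        }
        return names;
    }

    private static void add(Set<String> names, String name) {
        if (names.add(name) == false) {
            throw new IllegalArgumentException("Found two runtime fields with same name [" + name + "]");
        }
    }
}
```

With the mappings above, the leaf `log.response` and the sub-field `response` of `log` flatten to the same name, so the second one to be added triggers the error regardless of iteration order.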

Closes #68203

server/src/main/java/org/elasticsearch/index/mapper/ObjectRuntimeField.java

javanna · 2021-07-08T10:02:55Z

server/src/main/java/org/elasticsearch/index/mapper/ObjectRuntimeField.java

+        builder.startObject("fields");
+        for (RuntimeField subfield : subfields) {
+            subfield.toXContent(builder, params);
+        }


We print out all the parameters above, and fields is also a parameter, but nothing gets printed out for it. That is probably good, as we can only parse RuntimeFields at a later time, which is what we need in order to print out the subfields. Is what I am doing here ok? Do we have to somehow make sure that serializing the parameters does not also serialize the fields? Why does nothing happen at the moment, @romseygeek?

ywelsch

I've reviewed this as someone not familiar with this part of the codebase (so needs another reviewer). I've left a few questions / comments but overall the PR looks in good shape to me.

ywelsch · 2021-08-04T11:10:52Z

server/src/main/java/org/elasticsearch/index/mapper/CompositeRuntimeField.java

+        new RuntimeField.Builder(name) {
+            private final FieldMapper.Parameter<Script> script = new FieldMapper.Parameter<>(
+                "script",
+                false,


Should this be true, i.e., updateable? Same question for fields below.

Runtime field definitions don't get merged, they get replaced entirely, so this parameter doesn't apply.

Reusing the field mapper parameter parsing was a nice hack, but this shows one of the places where it's kind of confusing. I'm not sure what to do about it though.

ywelsch · 2021-08-04T11:52:15Z

server/src/main/java/org/elasticsearch/script/CompositeFieldScript.java

+     * @return the values that were emitted for the provided field
+     */
+    public final List<Object> getValues(String field) {
+        //TODO for now we re-run the script every time a leaf field is accessed, but we could cache the values?


I think this is such a significant limitation that this behavior should at least be documented, so that it does not surprise users that have composite fields with lots of subfields.

It means that the benefits of the PR are mostly of syntactic nature, allowing users not to have to copy paste the same script into two field definitions.

Definitely. There aren't any docs in this PR, they will be added as a follow-up in collaboration with the docs team, but we'll make sure to call out this limitation.

ywelsch · 2021-08-04T11:59:56Z

test/framework/src/main/java/org/elasticsearch/script/MockScriptEngine.java

+            CompositeFieldScript.Factory objectFieldScript = (f, p, s) -> ctx -> new CompositeFieldScript(f, p, s, ctx) {
+                @Override
+                public void execute() {
+                    emit("field", "value");


emit multiple fields here?

ywelsch · 2021-08-04T12:02:45Z

server/src/main/java/org/elasticsearch/script/CompositeFieldScript.java

+        //TODO for now we re-run the script every time a leaf field is accessed, but we could cache the values?
+        fieldValues.clear();
+        execute();
+        return fieldValues.get(field);


Should fieldValues be cleared before returning the field's values, so that it does not hold onto unnecessary state?
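
A minimal sketch of what that suggestion could look like: take the requested list out of the map, then clear the map before returning. The class `ClearStateSketch`, its hard-coded `execute` body and the `retainedFields` accessor are hypothetical and exist only to make the idea testable; this is not the code in the PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of clearing collected state before returning.
class ClearStateSketch {
    private final Map<String, List<Object>> fieldValues = new HashMap<>();

    private void emit(String field, Object value) {
        fieldValues.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }

    // Stands in for the user's script.
    private void execute() {
        emit("clientip", "127.0.0.1");
        emit("response", 200L);
    }

    List<Object> getValues(String field) {
        fieldValues.clear();
        execute();
        List<Object> values = fieldValues.remove(field); // null if the script never emitted this field
        fieldValues.clear(); // drop values emitted for the other sub-fields
        return values;
    }

    // Number of sub-fields whose values are still held after a call.
    int retainedFields() {
        return fieldValues.size();
    }
}
```

The returned list stays valid after the map is cleared because only the map's references are dropped, not the list itself.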

nik9000

Oh boy!

I wonder if we could have emit on the composite script bind its results into all of the "child" scripts immediately. Right now it looks like you can run a script and it'll collect a Map of values. And then, after the script is finished, you extract just the ones you need. That's good because you don't convert the ones you don't need. But it's bad that you silently ignore when you emit invalid values that you aren't reading. It also feels bad that the error comes after the script instead of during it. If we throw the error while running the script we'd get the line it fails on. Painless has try/catch so you could even use that to dodge error cases or add extra information when an error is thrown.

A smaller thing, but now that there are three sorts of subclasses for stuff like DoubleFieldScript maybe it'd be a good idea to have a more naturally shaped supertype. The design of these classes is very much around Painless implementing them, but we have those two cases where we implement it by hand. It might make sense to refactor the base classes so they are less "painless shaped" and make a "painless shaped" layer that it can implement. I'm not sure, and, regardless, I don't think that is a "for now" thing.
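
To make the trade-off concrete, here is a rough sketch of the eager alternative: each declared sub-field registers a converter, and emit applies it right away, so a bad value fails inside the script's own call stack. Everything here (`EagerEmitSketch`, `declare`, `toLong`, ignoring undeclared fields) is a hypothetical illustration, not something this PR implements.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of validating values at emit time.
class EagerEmitSketch {
    // One converter per declared sub-field; each throws on values it cannot accept.
    private final Map<String, Function<Object, Object>> converters = new HashMap<>();
    private final Map<String, Object> converted = new HashMap<>();

    void declare(String field, Function<Object, Object> converter) {
        converters.put(field, converter);
    }

    void emit(String field, Object value) {
        Function<Object, Object> converter = converters.get(field);
        if (converter == null) {
            return; // assumption: values for undeclared sub-fields are ignored
        }
        // Conversion happens here, so a failure surfaces while the script is running.
        converted.put(field, converter.apply(value));
    }

    Object get(String field) {
        return converted.get(field);
    }

    // Example converter for a "long" sub-field.
    static Object toLong(Object value) {
        if (value instanceof Number) {
            return ((Number) value).longValue();
        }
        return Long.parseLong(value.toString()); // NumberFormatException on bad input
    }
}
```

The cost, as noted above, is that every declared sub-field is converted on each run even when only one of them is being read.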

nik9000 · 2021-08-04T14:32:41Z

server/src/main/java/org/elasticsearch/index/mapper/CompositeRuntimeField.java

+    }
+
+    @Override
+    public Collection<MappedFieldType> asMappedFieldTypes() {


Maybe return a Stream from this if you are going to be flatMapping it. If the caller needs a list, they can make one?
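
A small sketch of the shape being suggested, using field names in place of MappedFieldType so that it stays self-contained. The interface `RuntimeFieldLike` and the helpers around it are hypothetical; only `Stream.flatMap` and `Collectors.toList` are real JDK calls.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration of returning a Stream instead of a Collection.
class StreamSketch {
    interface RuntimeFieldLike {
        Stream<String> asFieldNames();
    }

    // A leaf field contributes exactly one name.
    static RuntimeFieldLike leaf(String name) {
        return () -> Stream.of(name);
    }

    // A composite field contributes one name per declared sub-field.
    static RuntimeFieldLike composite(String name, List<String> subFields) {
        return () -> subFields.stream().map(sub -> name + "." + sub);
    }

    // The caller flatMaps directly and materializes a list only if it needs one.
    static List<String> allNames(List<RuntimeFieldLike> fields) {
        return fields.stream().flatMap(RuntimeFieldLike::asFieldNames).collect(Collectors.toList());
    }
}
```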

romseygeek · 2021-08-05T08:18:22Z

I wonder if we could have emit on the composite script bind its results into all of the "child" scripts immediately. Right now it looks like you can run a script and it'll collect a Map of values. And then, after the script is finished, you extract just the ones you need. That's good because you don't convert the ones you don't need. But it's bad that you silently ignore when you emit invalid values that you aren't reading. It also feels bad that the error comes after the script instead of during it. If we throw the error while running the script we'd get the line it fails on. Painless has try/catch so you could even use that to dodge error cases or add extra information when an error is thrown.

At the moment the script is actually run by the child field, and there's no caching or sharing of values between different child fields. So if you have a grok defined on a parent field 'message' and you search on 'message.status' and 'message.ip', both fields will run the painless script separately and then extract the part of the map that they need. This isn't ideal, as Yannick mentions, but it's the way runtime fields work now. It would be great to rework things so that different runtime fields could share a LeafSearchLookup, in which case we could cache these results there, but that's a much bigger project.
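
The difference between the current behaviour and a shared per-document cache can be sketched as follows. This is a toy model only: `PerDocCacheSketch` is hypothetical, the "script" is hard-coded, and a real implementation would have to live somewhere shared between fields, such as the LeafSearchLookup mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: re-running per sub-field versus caching per document.
class PerDocCacheSketch {
    int executions = 0;
    private int cachedDoc = -1;
    private Map<String, Object> cachedValues = Map.of();

    // Stands in for running the parent's Painless script on one document.
    private Map<String, Object> runScript(int docId) {
        executions++;
        Map<String, Object> values = new HashMap<>();
        values.put("status", 200L);
        values.put("ip", "127.0.0.1");
        return values;
    }

    // Current behaviour as described: every sub-field access runs the script again.
    Object uncached(int docId, String subField) {
        return runScript(docId).get(subField);
    }

    // Possible future behaviour: run once per document and share the result.
    Object cached(int docId, String subField) {
        if (docId != cachedDoc) {
            cachedValues = runScript(docId);
            cachedDoc = docId;
        }
        return cachedValues.get(subField);
    }
}
```

Reading `status` and `ip` for the same document costs two executions through `uncached` and one through `cached`.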

jtibshirani

This makes sense to me overall, I left some initial comments. The name composite has really grown on me!

server/src/main/java/org/elasticsearch/index/mapper/FieldMapper.java

server/src/main/java/org/elasticsearch/index/mapper/RuntimeField.java

server/src/main/java/org/elasticsearch/index/mapper/LeafRuntimeField.java

server/src/main/java/org/elasticsearch/index/mapper/RuntimeField.java

romseygeek · 2021-08-05T12:11:52Z

Thanks for the reviews everybody!

I have reworked things so that runtime builders now have two methods, one to create a simple leaf field and one to create a field with a parent. The parsing code can take an optional function that will choose which method to call when building fields, and it nicely separates out composite field construction from 'normal' field construction.
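
As a loose illustration of that shape, the sketch below gives a builder two creation methods and lets the caller pass a function that picks one. The names `createLeaf`, `createChild` and `build` are hypothetical and do not mirror the actual method names in the PR; a built field is reduced to its full name.

```java
import java.util.function.Function;

// Hypothetical illustration of a builder with leaf and child creation paths.
class BuilderSketch {
    static class Builder {
        final String name;

        Builder(String name) {
            this.name = name;
        }

        // Builds a standalone runtime field.
        String createLeaf() {
            return name;
        }

        // Builds a sub-field that belongs to a composite parent.
        String createChild(String parent) {
            return parent + "." + name;
        }
    }

    // The parsing code receives a function that chooses which method to call.
    static String build(Builder builder, Function<Builder, String> how) {
        return how.apply(builder);
    }
}
```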

jtibshirani

This is looking good to me, just left some small comments. As a caveat, I am not deeply familiar with this code and its history (AbstractScriptFieldType, RuntimeField, etc.)

server/src/main/java/org/elasticsearch/index/mapper/AbstractScriptFieldType.java

server/src/main/java/org/elasticsearch/script/CompositeFieldScript.java

server/src/main/java/org/elasticsearch/index/query/SearchExecutionContext.java

server/src/main/java/org/elasticsearch/index/mapper/CompositeRuntimeField.java

romseygeek · 2021-08-10T10:18:41Z

@elasticmachine update branch

markharwood · 2021-08-23T08:24:24Z

Hi @romseygeek I have a null pointer fix for this in master but when I tried backporting that fix it looks like this feature is not backported in 7.x? The label on this PR says 7.15 so I'm not sure if it's still just pending?

markharwood · 2021-08-23T08:49:28Z

Ignore my last comment. I can see the NPE is in a different class in 7.15 - TransportFieldCapabilitiesIndexAction.

mattkime · 2021-08-31T02:25:45Z

@javanna - I'd like to confirm - composite fields can only have 'child' fields one level deep, correct?

romseygeek · 2021-08-31T10:36:03Z

@mattkime yes that's correct

javanna and others added 12 commits June 15, 2021 14:19

wip

a6b13fc

add some more TODOs

83ecb52

wip

6440373

Merge remote-tracking branch 'origin/master' into runtime/multiple-fi…

325f33f

…elds

Merge branch 'master' into poc/emit_multiple_fields

ad0ac3d

Merge branch 'master' into poc/emit_multiple_fields

5a6a0f9

Merge branch 'master' into poc/emit_multiple_fields

15bd02f

remove comment on naming

af19b2e

add parent name

2c9d0d5

javadocs

3c033d0

update assertion

0c8edbd

update assertion

a13d476

javanna added >enhancement release highlight :Search/Search Search-related issues that do not fall into other categories v8.0.0 v7.15.0 labels Jul 8, 2021

Fix serialization

9066065

javanna and others added 10 commits July 8, 2021 12:07

Merge branch 'master' into feature/runtime_fields_multi_emit

52c2e43

Add tests

5023ecb

Merge remote-tracking branch 'origin/master' into feature/runtime_fie…

0eb1439

…lds_multi_emit

rename to 'composite'

32a4f0c

Fix dynamic field shadowing

b91d07e

Add yaml test

add5537

Merge remote-tracking branch 'origin/master' into feature/runtime_fie…

23dbff7

…lds_multi_emit

Merge remote-tracking branch 'origin/master' into feature/runtime_fie…

67577dc

…lds_multi_emit

Merge remote-tracking branch 'origin/master' into feature/runtime_fie…

a78c453

…lds_multi_emit

Merge remote-tracking branch 'origin/master' into feature/runtime_fie…

6752847

…lds_multi_emit

ywelsch reviewed Aug 4, 2021

nik9000 reviewed Aug 4, 2021

well that would never have worked would it?

7872c94

jtibshirani reviewed Aug 5, 2021

feedback

5bd7e51

romseygeek added 2 commits August 9, 2021 09:20

Merge remote-tracking branch 'origin/master' into feature/runtime_fie…

9d48afb

…lds_multi_emit

more cleanups

767633d

jtibshirani reviewed Aug 9, 2021

feedback

803ed32

jtibshirani approved these changes Aug 10, 2021

elasticmachine and others added 2 commits August 10, 2021 20:18

Merge branch 'master' into feature/runtime_fields_multi_emit

4741f5c

wtf

66b7cb6

romseygeek merged commit 32d2f60 into master Aug 10, 2021

romseygeek deleted the feature/runtime_fields_multi_emit branch August 10, 2021 12:07

benwtrent mentioned this pull request Aug 19, 2021

field_caps Throws NPE when running against composite runtime fields #76716

Closed

sebelga mentioned this pull request Aug 26, 2021

[Runtime field editor] Composite runtime in Kibana Data Views elastic/kibana#110226

Merged

5 tasks

romseygeek mentioned this pull request Sep 13, 2021

Add index-time composite fields #77625

Open

jakelandis added v8.0.0-alpha2 and removed v8.0.0 labels Sep 15, 2021

davismcphee mentioned this pull request Jun 21, 2023

[data views] Field editor endpoint versioning and schema validation elastic/kibana#159626

Merged

pquentin mentioned this pull request Sep 5, 2024

Add composite fields to search API elastic/elasticsearch-specification#2070

Merged
