Add support for emitting multiple fields values from a script #68203

javanna · 2021-01-29T11:18:46Z

We recently added support for runtime fields, which are computed at search time by a Painless script. As of today, a runtime field script can emit values for a single field only: the one that the script is declared under.
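To make the single-field limitation concrete, here is a minimal Python sketch (not Elasticsearch code) of a leaf runtime field: the script receives a document and can only call emit(value), so every value lands in the one field it is declared under. The class LeafFieldScriptSketch and the message_length field are hypothetical.

```python
from typing import Any, Callable, Dict, List


class LeafFieldScriptSketch:
    """Hypothetical single-field runtime script: emit(value) is all it gets."""

    def __init__(self, source: Callable[["LeafFieldScriptSketch", Dict[str, Any]], None]):
        self._source = source
        self._values: List[Any] = []

    def emit(self, value: Any) -> None:
        # Every emitted value belongs to the field the script is declared under.
        self._values.append(value)

    def execute(self, doc: Dict[str, Any]) -> List[Any]:
        # Run the script against one document and collect what it emitted.
        self._values = []
        self._source(self, doc)
        return list(self._values)


# Hypothetical "message_length" runtime field derived from "message".
message_length = LeafFieldScriptSketch(lambda script, doc: script.emit(len(doc["message"])))
print(message_length.execute({"message": "GET /index.html"}))  # [15]
```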

We would like to add the ability for a script to emit values for multiple fields. This will be achieved by introducing a new field type (name to be defined) in the runtime section, whose script emits values for the sub-fields that belong to it. This is particularly useful given that scripts support grok and dissect (#68088):

PUT localhost:9200/logs/_mappings
{
  "runtime" : {
    "log" : {
      "type" : "tbd",
      "script": '''
        emit(grok('%{COMMONAPACHELOG}').extract(doc["message"].value));
      ''',
      "fields" : {
        "clientip" : {
          "type" : "ip"
        },
        "verb" : {
          "type" : "keyword"
        },
        "request" : {
          "type" : "keyword"
        },
        "response" : {
          "type" : "long"
        }
      }
    }
  },
  "properties" : {
    "message" : {
      "type" : "keyword"
    }
  }
}

In the example above, the grok function splits the message field into sub-fields based on the provided grok pattern, and the resulting map of fields is emitted with a single emit call. The emitted fields need to be declared under fields in order to specify their type and make them searchable (and discoverable through field_caps) like any other field:

POST /logs*/_search
{
  "aggs": {
    "response_codes": {
      "range": {
        "field": "log.response",
        "ranges": [
          { "to": 300 },
          { "from": 300, "to": 400 },
          { "from": 500 }
        ]
      }
    }
  }
}
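The flow described above (one grok extraction, several typed sub-fields, then an ordinary range aggregation on log.response) can be approximated in Python as follows. This is a sketch under assumptions: the regular expression is a simplified stand-in for the real COMMONAPACHELOG grok pattern, extract returns None on no match (mirroring the null-safe ?. used later in this thread), keys not declared under fields are simply not exposed, and all function names are hypothetical. The buckets follow range aggregation semantics, where from is inclusive and to is exclusive.

```python
import ipaddress
import re
from typing import Any, Callable, Dict, Iterable, List, Optional

# Simplified stand-in for grok's COMMONAPACHELOG (assumption: the real
# pattern is more permissive, e.g. it also accepts hostnames for clientip).
COMMON_APACHE_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\w+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[\d.]+))?" '
    r'(?P<response>\d+) (?P<bytes>\d+|-)'
)

# Sub-fields declared under "fields" in the mapping above, with their types.
LOG_FIELDS = {"clientip": "ip", "verb": "keyword", "request": "keyword", "response": "long"}

# How each declared type parses an emitted string value.
PARSERS: Dict[str, Callable[[str], Any]] = {
    "ip": lambda v: str(ipaddress.ip_address(v)),
    "keyword": str,
    "long": int,
}

# Buckets from the search request: "from" is inclusive, "to" is exclusive.
RANGES = [(None, 300), (300, 400), (500, None)]


def extract(message: str) -> Optional[Dict[str, str]]:
    """Rough analogue of grok(...).extract(...): named captures, or None on no match."""
    match = COMMON_APACHE_LOG.match(message)
    if match is None:
        return None
    return {k: v for k, v in match.groupdict().items() if v is not None}


def log_subfields(message: str) -> Dict[str, List[Any]]:
    """Typed values for each declared log.* sub-field; undeclared keys are dropped."""
    emitted = extract(message) or {}
    return {
        f"log.{name}": [PARSERS[ftype](emitted[name])]
        for name, ftype in LOG_FIELDS.items()
        if name in emitted
    }


def response_range_counts(messages: Iterable[str]) -> List[int]:
    """Counts per range bucket on log.response, like the range aggregation."""
    counts = [0] * len(RANGES)
    for message in messages:
        for code in log_subfields(message).get("log.response", []):
            for i, (low, high) in enumerate(RANGES):
                if (low is None or code >= low) and (high is None or code < high):
                    counts[i] += 1
    return counts


logs = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "POST /login HTTP/1.1" 302 -',
    '10.0.0.3 - - [10/Oct/2000:13:57:12 -0700] "GET /missing HTTP/1.1" 404 512',
    '10.0.0.4 - - [10/Oct/2000:13:58:40 -0700] "GET /boom HTTP/1.1" 503 0',
    'not an access log line',
]
print(log_subfields(logs[0]))
print(response_range_counts(logs))  # [1, 1, 1]: 404 falls in no bucket
```

Note that the ranges in the request leave 400 to 500 uncovered, so 4xx responses are not counted in any bucket.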


elasticmachine · 2021-01-29T11:18:48Z

Pinging @elastic/es-search (Team:Search)

jtibshirani · 2021-02-05T19:04:13Z

An alternative would be to model this using multi-fields:

"message" : {
      "type" : "keyword",
      "script": '''
        Map fields = grok('%{COMMONAPACHELOG}').extract(doc["message"].value);
        for (Map.Entry field : fields.entrySet()) {
          emit(field.getKey(), field.getValue());
        }
      ''',
      "fields" : {
        "clientip" : {
          "type" : "ip"
        }
      }
    }
  },

I think it fits reasonably well into the multi-fields concept: the parent field holds some value, and each sub-field consults the same value but exposes it differently. A benefit of this approach is that there's only one way to define sub-fields, so the conflict in the example isn't possible. A more theoretical point, but I also like that it maintains an invariant: any field that stores and parses a document value is a leaf field type (not an object mapper). This includes fields that expose virtual sub-fields, like flattened, as well as non-virtual ones, like aggregate_metric_double.

javanna · 2021-02-05T19:53:43Z

Thanks for the feedback!

One point of concern is how the runtime field will be shaped once it is made indexed. We'd love to be able to paste its definition under the properties section as-is. I'm afraid that sounds harder if we follow the multi-fields approach. Or maybe not, as long as scripts behave the same under both the runtime and the properties sections.

I also wonder whether it would be harder to follow if the script of a keyword field could emit multiple fields or not depending on whether it has sub-fields defined. The intention so far is to expose the key-value emit (or something along those lines) only to the object variant of a runtime field.

On the naming conflict, I believe you could still define a message.clientip field, with the dot in its name, outside of the message field of type keyword, and we would possibly have to forbid having both. The right behaviour is still up for discussion; my intention was only to point out the potential problem in the description above.
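The conflict can be sketched by flattening a runtime section into full field names. This Python sketch assumes the composite type name adopted in the merged change quoted later in the thread, and that a composite entry contributes one parent.sub name per declared sub-field. The function name is hypothetical, and the error text is copied from that commit message rather than from the real validation code.

```python
from typing import Any, Dict


def resolve_runtime_fields(runtime: Dict[str, Dict[str, Any]]) -> Dict[str, str]:
    """Flatten a runtime section into {full field name: type}, rejecting duplicates."""
    resolved: Dict[str, str] = {}
    for name, definition in runtime.items():
        if definition.get("type") == "composite":
            # Assumption: each declared sub-field becomes "<parent>.<sub>".
            produced = [
                (f"{name}.{sub}", spec["type"])
                for sub, spec in definition.get("fields", {}).items()
            ]
        else:
            # The runtime section is flat, so a plain name may itself contain dots.
            produced = [(name, definition["type"])]
        for full_name, field_type in produced:
            if full_name in resolved:
                raise ValueError(f"Found two runtime fields with same name [{full_name}]")
            resolved[full_name] = field_type
    return resolved


# The conflicting mapping rejected in the merged change.
conflicting = {
    "log.response": {"type": "keyword"},
    "log": {"type": "composite", "fields": {"response": {"type": "long"}}},
}
try:
    resolve_runtime_fields(conflicting)
except ValueError as e:
    print(e)  # Found two runtime fields with same name [log.response]
```

Either definition alone resolves to the same full name, which is why only one of them can be allowed.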

I was also under the impression that the invariant you mentioned is valid with both examples, because you still need to define the leaf fields. The script is meta: it tells what to emit and where to take it from, but the object is still a collection of other fields.

Does this make sense to you?

javanna · 2021-02-06T17:20:57Z

One additional point that came to mind is that the message field itself may be defined under properties, and doc['message'] would refer to it. If the script that splits the field is defined under a message field of type keyword, does that force the message field to be a runtime field too? To my mind, the field holding the script is really just a container of other fields; on its own, it has little meaning when referred to, for instance in a query.

javanna · 2021-06-03T10:11:01Z

Heads up: I have updated the issue description according to recent discussions and the draft PR I opened. The API has changed slightly, and we need to come up with a name for the new field type if we decide not to use object.

We have recently introduced support for grok and dissect to the runtime fields Painless context, which allows splitting a field into multiple fields. However, each runtime field can only emit values for a single field. This commit introduces support for emitting multiple fields from the same script. The API call to define a runtime field that emits multiple fields is the following:

```
PUT localhost:9200/logs/_mappings
{
  "runtime" : {
    "log" : {
      "type" : "composite",
      "script" : "emit(grok(\"%{COMMONAPACHELOG}\").extract(doc[\"message.keyword\"].value))",
      "fields" : {
        "clientip" : {
          "type" : "ip"
        },
        "response" : {
          "type" : "long"
        }
      }
    }
  }
}
```

The script context for this new field type accepts two emit signatures:

* `emit(String, Object)`
* `emit(Map)`

Sub-fields need to be declared under fields in order to be discoverable through the field_caps API and accessible through the search API.

Multiple fields are emitted by returning multiple MappedFieldTypes from RuntimeField#asMappedFieldTypes. The sub-fields are instances of the runtime fields that are already supported, with a small tweak: the script defined by their parent is adapted into an artificial script factory for each sub-field, which makes that sub-field's values accessible. This approach allows reusing all of the existing runtime fields code for the sub-fields.

The runtime section has been flat so far, as it has not supported objects until now. That stays the same, meaning that runtime fields can have dots in their names. However, because the ability to emit multiple fields introduces two ways to create the same field, we have to make sure that a runtime field with a certain name cannot be defined twice, which is why the following mappings are rejected with the error `Found two runtime fields with same name [log.response]`:

```
PUT localhost:9200/logs/_mappings
{
  "runtime" : {
    "log.response" : {
      "type" : "keyword"
    },
    "log" : {
      "type" : "composite",
      "script" : "emit(\"response\", grok(\"%{COMMONAPACHELOG}\").extract(doc[\"message.keyword\"].value)?.response)",
      "fields" : {
        "response" : {
          "type" : "long"
        }
      }
    }
  }
}
```

Closes #68203

Co-authored-by: Luca Cavanna
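The mechanism described in the commit message (one parent script, both emit signatures, and one artificial script factory per sub-field) can be illustrated with a small Python analogue. CompositeScriptSketch and as_sub_field_scripts are hypothetical names, not Elasticsearch classes; skipping null values is an assumption; and this sketch reruns the parent script for every sub-field without modelling any caching the real implementation may do.

```python
from typing import Any, Callable, Dict, Iterable, List, Mapping


class CompositeScriptSketch:
    """Hypothetical parent script context accepting emit(key, value) and emit(map)."""

    def __init__(self, source: Callable[["CompositeScriptSketch", Mapping[str, Any]], None]):
        self._source = source
        self._emitted: Dict[str, List[Any]] = {}

    def emit(self, key_or_map: Any, value: Any = None) -> None:
        if isinstance(key_or_map, Mapping):
            # emit(Map): treat each entry as emit(key, value).
            for key, val in key_or_map.items():
                self.emit(key, val)
        elif value is not None:
            # emit(String, Object); null values are skipped (assumption).
            self._emitted.setdefault(key_or_map, []).append(value)

    def execute(self, doc: Mapping[str, Any]) -> Dict[str, List[Any]]:
        self._emitted = {}
        self._source(self, doc)
        return {key: list(values) for key, values in self._emitted.items()}


def as_sub_field_scripts(
    parent: str, script: CompositeScriptSketch, sub_fields: Iterable[str]
) -> Dict[str, Callable[[Mapping[str, Any]], List[Any]]]:
    """One artificial 'factory' per declared sub-field, loosely analogous to
    returning several mapped field types from a single runtime field."""

    def for_sub_field(sub: str) -> Callable[[Mapping[str, Any]], List[Any]]:
        # Run the parent script and keep only this sub-field's values.
        return lambda doc: script.execute(doc).get(sub, [])

    return {f"{parent}.{sub}": for_sub_field(sub) for sub in sub_fields}


# Parent script using emit(Map), as in the first mapping of the commit message.
log = CompositeScriptSketch(lambda s, doc: s.emit({"clientip": doc["ip"], "response": doc["status"]}))
fields = as_sub_field_scripts("log", log, ["clientip", "response"])
doc = {"ip": "10.0.0.1", "status": 200}
print(fields["log.clientip"](doc), fields["log.response"](doc))  # ['10.0.0.1'] [200]

# Parent script using emit(String, Object) with a possibly-null value.
only_response = CompositeScriptSketch(lambda s, doc: s.emit("response", doc.get("status")))
print(as_sub_field_scripts("log", only_response, ["response"])["log.response"]({}))  # []
```

Each sub-field then looks like an ordinary single-field runtime field to the rest of the system, which is what lets the existing runtime field code be reused.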

javanna added the >enhancement and :Search/Search labels Jan 29, 2021

elasticmachine added the Team:Search label Jan 29, 2021

sebelga mentioned this issue Apr 26, 2021

[Index pattern field editor] Support "object" runtime fields elastic/kibana#98330

Closed


javanna mentioned this issue May 19, 2021

Emit multiple fields from the same runtime field script #73252

Closed

javanna changed the title ~~Add support for runtime object fields~~ Add support for emitting multiple fields values from a script Jun 3, 2021

javanna mentioned this issue Jul 8, 2021

Emit multiple fields from a runtime field script #75108

Merged

romseygeek closed this as completed in #75108 Aug 10, 2021

romseygeek mentioned this issue Aug 10, 2021

Emit multiple fields from a runtime field script (#75108) #76287

Merged

javanna mentioned this issue Jan 20, 2022

Add support for composite indexed field #82878

Closed
