Emit multiple fields from the same runtime field script #73252

javanna · 2021-05-19T19:01:43Z

We have recently introduced support for grok and dissect in the runtime fields Painless context, which allows splitting a field into multiple fields. However, each runtime field can only emit values for a single field. This PR introduces support for emitting multiple fields from the same script.

Note that this is a draft to share the path forward and gather initial feedback. We have not yet settled on the API. I have introduced support for an object field type in the runtime section, with a new script context that accepts two emit signatures:

emit(String, Object)
emit(Map)
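To make the two emit signatures concrete, here is a minimal plain-Java sketch of a script base class that collects emitted values per sub-field. The class name `MultiFieldScriptSketch` and its `run()` method are hypothetical; only the two `emit` signatures and the `computeIfAbsent` pattern come from this PR (the latter from the `ObjectFieldScript` diff reviewed below).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the PR's ObjectFieldScript: collects values per sub-field name.
abstract class MultiFieldScriptSketch {
    // Values emitted for the current document, keyed by sub-field name.
    private final Map<String, List<Object>> fieldValues = new HashMap<>();

    // The user's script body; it calls emit(...) zero or more times.
    protected abstract void execute();

    // emit(String, Object): append one value to the named sub-field.
    protected final void emit(String field, Object value) {
        fieldValues.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }

    // emit(Map): emit every entry, e.g. the map produced by a grok or dissect pattern.
    protected final void emit(Map<String, ?> values) {
        for (Map.Entry<String, ?> entry : values.entrySet()) {
            emit(entry.getKey(), entry.getValue());
        }
    }

    // Runs the script for one document and returns a snapshot of what it emitted.
    final Map<String, List<Object>> run() {
        fieldValues.clear();
        execute();
        return new HashMap<>(fieldValues);
    }
}
```

With this shape, a grok or dissect result can be passed straight to `emit(Map)`, while `emit(String, Object)` covers values computed one at a time; both end up in the same per-sub-field lists.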

We may use the object type, or introduce a new field type.

It emits multiple fields by returning multiple MappedFieldTypes from RuntimeField#asMappedFieldTypes. The sub-fields are instances of the runtime field types that are already supported, with a small tweak: the script defined by their parent is adapted into an artificial script factory for each sub-field, which makes the relevant sub-field accessible. This approach lets the sub-fields reuse all of the existing runtime fields code.
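The sub-field adaptation can be sketched in plain Java as one view per sub-field that runs the parent script and keeps only its own key. This is an illustration under assumptions, not the PR's code: `ParentScript`, `subFields`, and the `<parent>.<sub>` naming are hypothetical, and documents are modeled as plain maps rather than through the real `MappedFieldType` and script factory classes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: adapt one parent script into one single-field "factory" per sub-field.
final class SubFieldAdapterSketch {
    // Parent script: one document in, values for every sub-field out.
    interface ParentScript extends Function<Map<String, Object>, Map<String, List<Object>>> {}

    // Builds one view per declared sub-field; each view runs the parent and keeps only its own values.
    static Map<String, Function<Map<String, Object>, List<Object>>> subFields(
            String parentName, List<String> subFieldNames, ParentScript parent) {
        Map<String, Function<Map<String, Object>, List<Object>>> views = new LinkedHashMap<>();
        for (String sub : subFieldNames) {
            // Assumed naming convention for the full sub-field name: "<parent>.<sub>".
            views.put(parentName + "." + sub,
                doc -> parent.apply(doc).getOrDefault(sub, List.of()));
        }
        return views;
    }
}
```

This naive version re-runs the parent script once per sub-field for each document; a real implementation would likely want to execute the parent once per document and share the result across its sub-fields.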

Once we have settled on the API, besides addressing the TODOs, we will need to add support for making the new field type indexed by allowing its definition in the properties section.

Closes #68203

jdconrad · 2021-05-20T17:28:43Z

This looks good to me as a practical solution for what needs to get done. Thank you for sharing this draft!

A couple thoughts/notes:

Looking at the code, I wonder if it would make sense to have just two emit methods per type: emit(Object) -> emit(def) and emit(String, Object) -> emit(String, def). This would allow the emitToObject methods you have to replace emit, and we would get the conversions we want anyway. I wonder how big the performance hit from boxing really is; in a number of cases the values may already be boxed, depending on what users do with them.
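A rough sketch of what a single converting `emit(def)`-style method could look like for a long-typed field follows. `LongFieldSketch` is hypothetical; the fallback of parsing `toString()` mirrors the behavior described later in this thread for values that are not already the expected type. Accepting `Object` means primitive values are boxed on the way in, which is the performance cost being weighed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: a long-typed field with one lenient emit that converts its argument.
final class LongFieldSketch {
    private final List<Long> values = new ArrayList<>();

    // Analogous to emit(def): accepts any (boxed) value and normalizes it to a long.
    void emit(Object value) {
        Objects.requireNonNull(value, "value must not be null");
        if (value instanceof Number) {
            // Already numeric: convert with longValue(); fractional parts are truncated.
            values.add(((Number) value).longValue());
        } else {
            // Fallback: parse the string form; throws NumberFormatException if it is not a long.
            values.add(Long.parseLong(value.toString()));
        }
    }

    List<Long> values() {
        return values;
    }
}
```

The leniency is the trade-off discussed below: once `emit(4.9)` or `emit("42")` is accepted, it is hard to make the field strict again later.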
Being limited to arity overloading is indeed a bummer; type-based overloading could have saved us a bunch of trouble. Just for transparency (please feel free to skip this, as you may already know it), we don't support type-based overloading for a few reasons:
- Inconsistency between compile-time and runtime resolution: at runtime we could always pick the best method based on the actual type of a def value, but at compile time that's not possible, which could lead to strange situations where unexpected methods are picked.
- It's very challenging to decide what counts as the closest matching method, and there's no simple public library to do this for us.
- It would make planned flexibility extensions to the casting model much more difficult in the future.
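The first reason can be illustrated with plain Java, which resolves overloads from the static type at compile time. This is only an analogy for Painless, and the `emit` overloads below are hypothetical: a dynamic `def` dispatch could inspect the runtime class (`Long`) and pick `emit(long)`, whereas static resolution of the same call picks `emit(Object)`.

```java
// Plain-Java analogy (not Painless): overloads are chosen from the static type at compile time.
final class OverloadResolutionDemo {
    static String emit(long value) {
        return "emit(long)";
    }

    static String emit(Object value) {
        return "emit(Object)";
    }

    public static void main(String[] args) {
        long primitive = 42L;
        Object boxed = 42L; // runtime class is Long, but the static type is Object
        System.out.println(emit(primitive)); // prints emit(long)
        System.out.println(emit(boxed));     // prints emit(Object), even though the value is a Long
    }
}
```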

rjernst · 2021-05-20T12:45:19Z

A couple thoughts

server/src/main/java/org/elasticsearch/script/ObjectFieldScript.java

+    }
+
+    protected final void emit(String field, Object value) {
+        List<Object> values = this.fieldValues.computeIfAbsent(field, s -> new ArrayList<>());


Even without method overloading in Painless, we can do some validation of the type here. It could still be fast by using a Set of allowed types: call value.getClass() and check whether it is in the Set.
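A minimal sketch of the suggested check, assuming an allow-list of boxed types; the contents of `ALLOWED` and the class name are hypothetical, since the thread does not say which types would be permitted. `Set.contains` on an immutable set is a hash-based lookup, and matching on the exact class is safe here because all listed classes are final.

```java
import java.util.Set;

// Hypothetical sketch of validating emitted values against an allow-list of classes.
final class EmitTypeCheckSketch {
    // Assumed allow-list; the real set of permitted types was not decided in this thread.
    private static final Set<Class<?>> ALLOWED = Set.of(
        String.class, Long.class, Integer.class, Double.class, Boolean.class);

    // Throws if the value is null or its exact runtime class is not allowed.
    static void checkType(String field, Object value) {
        if (value == null || !ALLOWED.contains(value.getClass())) {
            String type = value == null ? "null" : value.getClass().getName();
            throw new IllegalArgumentException(
                "unsupported value type [" + type + "] for field [" + field + "]");
        }
    }
}
```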

Given that in most cases, when the type is not exactly what we'd like, we end up trying to parse what toString returns, I am not sure how we would restrict the type of the argument provided here. That may be a consequence of reusing the existing code for parsing from _source, which tries to adapt to the different types that may be found in _source.

.../src/main/resources/org/elasticsearch/painless/spi/org.elasticsearch.script.object_field.txt

javanna · 2021-05-25T13:20:01Z

thanks @jdconrad

> Looking at the code I wonder if it would make sense to just have two emit methods per type where one is emit(Object) -> emit(def) and the other is emit(String, Object) -> emit(String, def). This would allow the emitToObject methods you have to replace emit and get the conversions we want anyway. I wonder how bad boxing really is on the perf hit. In a number of cases it may already be boxed depending on what they do with the values

I was thinking the same; maybe that is a potential follow-up. I think it deserves more discussion. I see two concerns. First, we would introduce more leniency in the existing runtime field types, and once we've done that we will hardly be able to go back. Second, as you said, there could be an auto-boxing performance cost. A big advantage, though, would be helping users tremendously with their type conversions, which I am sure they struggle with at the moment.


javanna · 2021-07-07T19:01:01Z

Closing this; we will work in a feature branch on the last few things that need to be completed and open another PR when ready. Thanks for the early feedback, everybody.

javanna added the >enhancement, release highlight, :Search Foundations/Mapping, v8.0.0 and v7.14.0 labels May 19, 2021

javanna requested review from romseygeek and jtibshirani May 19, 2021 19:01

javanna added the :Search/Search label and removed the :Search Foundations/Mapping label May 19, 2021

rjernst reviewed May 20, 2021


javanna force-pushed the poc/emit_multiple_fields branch 3 times, most recently from 674eb65 to aaf6aa4, May 31, 2021 09:55

javanna mentioned this pull request Jun 1, 2021

Introduce specialized getMatchingFieldTypes that takes a predicate #73618

Closed

javanna force-pushed the poc/emit_multiple_fields branch 2 times, most recently from e998fed to 73d4284, June 8, 2021 15:38

javanna added 3 commits June 15, 2021 14:19

wip

a6b13fc

add some more TODOs

83ecb52

wip

6440373

javanna force-pushed the poc/emit_multiple_fields branch 2 times, most recently from 75f3836 to 6440373, June 15, 2021 12:29

romseygeek and others added 2 commits June 18, 2021 10:15

Merge remote-tracking branch 'origin/master' into runtime/multiple-fi…

325f33f


Merge branch 'master' into poc/emit_multiple_fields

ad0ac3d

mark-vieira added v7.15.0 and removed v7.14.0 labels Jun 30, 2021

javanna added 3 commits July 6, 2021 14:46

Merge branch 'master' into poc/emit_multiple_fields

5a6a0f9

Merge branch 'master' into poc/emit_multiple_fields

15bd02f

remove comment on naming

af19b2e

javanna added 4 commits July 7, 2021 20:43

add parent name

2c9d0d5

javadocs

3c033d0

update assertion

0c8edbd

update assertion

a13d476

javanna removed the :Search/Search, >enhancement, release highlight, v7.15.0 and v8.0.0 labels Jul 7, 2021

javanna closed this Jul 7, 2021
