Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emit multiple fields from the same runtime field script #73252

Closed
wants to merge 12 commits into from

Conversation

javanna
Copy link
Member

@javanna javanna commented May 19, 2021

We have recently introduced support for grok and dissect to the runtime fields Painless context that allows to split a field into multiple fields. Though each runtime field can only emit values for a single field. This PR introduces support for emitting multiple fields from the same script.

Note that this is a draft to share the path forward and gather initial feedback. We have not settled yet on the API. I have introduced support for an object field type on the runtime section, with a new script context that accepts two emit signatures:

  • emit(String, Object)
  • emit(Map)

We may use the object type, or introduce a new field type.

The way that it emits multiple fields is by returning multiple MappedFieldTypes from RuntimeField#asMappedFieldTypes. The sub-fields are instances of the runtime fields that are already supported, with a little tweak to adapt the script defined by their parent to an artificial script factory for each of the sub-fields that makes the relevant sub-field accessible. This approach allows to reuse all of the existing runtime fields code for the sub-fields.

Once we settled on the API, besides addressing the TODOs, we will need to add support for making the new field type indexed by moving its definition to the properties section.

Closes #68203

@javanna javanna added >enhancement release highlight :Search Foundations/Mapping Index mappings, including merging and defining field types v8.0.0 v7.14.0 labels May 19, 2021
@javanna javanna requested review from romseygeek and jtibshirani May 19, 2021 19:01
@javanna javanna added :Search/Search Search-related issues that do not fall into other categories and removed :Search Foundations/Mapping Index mappings, including merging and defining field types labels May 19, 2021
@jdconrad
Copy link
Contributor

This looks good to me as a practical solution for what needs to get done. Thank you for sharing this draft!

A couple thoughts/notes:

  • Looking at the code I wonder if would make sense to just have two emit methods per type where one is emit(Object) -> emit(def) and the other is emit(String, Object) -> emit(String, def). This would allow the emitToObject methods you have to replace emit and get the conversions we want anyway. I wonder how bad boxing really is on the perf hit. In a number of cases it may already be boxed depending on what they do with the values.
  • The arity overloading is indeed a bummer and could've saved us a bunch of trouble. Just for transparency (and please feel free to skip as you may already know this) we don't do this for a few reasons:
    ** Inconsistency between compile time resolution and runtime resolution. IE at runtime we could always pick the best method based on a def type, but at compile time that's not the case and could lead to strange situations where unexpected methods are picked.
    ** It's very challenging to make appropriate determinations about what counts as the closest method, and there's no simple public library to do this for us.
    ** It would make a lot more difficult to do planned flexibility extensions to the casting model in the future.

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple thoughts

}

protected final void emit(String field, Object value) {
List<Object> values = this.fieldValues.computeIfAbsent(field, s -> new ArrayList<>());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even without method overloading in painless, we can do some validation here on the type. It could still be fast by using a Set of allowed types. Call value.getClass() and check for existence in the Set.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that in most cases we end up trying to parse what toString returns when the type is not exactly what we'd like, I am not sure how we would restrict the type of the argument that is provided here. That is maybe a consequence of re-using the existing code for parsing from _source that tries to adapt to the different field types that may be found in _source.

@javanna
Copy link
Member Author

javanna commented May 25, 2021

thanks @jdconrad

Looking at the code I wonder if would make sense to just have two emit methods per type where one is emit(Object) -> emit(def) and the other is emit(String, Object) -> emit(String, def). This would allow the emitToObject methods you have to replace emit and get the conversions we want anyway. I wonder how bad boxing really is on the perf hit. In a number of cases it may already be boxed depending on what they do with the values

I was thinking the same, maybe that is a potential follow-up. I think it deserves more discussion, the two concerns I could see is that we would introduce more leniency in the existing runtime field types, and once we've done that we will hardly have a way to go back. Another concern could be the auto-boxing performance costs, like you said. A big advantage though would be to help users tremendously with their type conversions, which I am sure they struggle with at the moment.

@javanna javanna force-pushed the poc/emit_multiple_fields branch 3 times, most recently from 674eb65 to aaf6aa4 Compare May 31, 2021 09:55
@javanna javanna force-pushed the poc/emit_multiple_fields branch 2 times, most recently from e998fed to 73d4284 Compare June 8, 2021 15:38
@javanna javanna force-pushed the poc/emit_multiple_fields branch 2 times, most recently from 75f3836 to 6440373 Compare June 15, 2021 12:29
@javanna javanna removed :Search/Search Search-related issues that do not fall into other categories >enhancement release highlight v7.15.0 v8.0.0 labels Jul 7, 2021
@javanna
Copy link
Member Author

javanna commented Jul 7, 2021

Closing this, we will work in a feature branch for the last couple of things that need to be completed and open another PR when ready. Thanks for the early feedback everybody.

@javanna javanna closed this Jul 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for emitting multiple fields values from a script
5 participants