-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add typed emit functionality #68231
Add typed emit functionality #68231
Conversation
This creates `.emit()` methods on a few primitive things for `keyword` and `date` style runtime fields. Now you can do stuff like: ``` "d": { "type": "date", "script": "'2020-01-18T17:41:34.000Z'.emit()" } ``` We get to piggy back of painless's dynamic dispatch code to regonize that you are in a `date` and trying to emit a `String` so we can "do the right thing" without any extra runtime cost. In this case we just parse the date using the `formatter` on the field. And since the example above doesn't use a formatter we get ISO8601, the date format of kings. Similarly, you can do stuff like: ``` "s": { "type": "keyword", "script": """ for (int i = 0; i < 100; i++) { i.emit(); } """ } ``` Painless *knows* `i` is an `int` and will call an emit method that emits its string value. Also! Assuming we get the syntax proposed in elastic#68088, because this is a chain of method invocations you can do something like: ``` "i": { "type": "long", "script": """ grok('%{NUMBER:i} %{NUMBER:j}').extract(doc['message'].value)?.i?.emit() """ } ``` This should be read as "if grok matches the message and extracts a value for `i` then emit it to the runtime field." If either the grok doesn't match or doesn't extract an `i` value then nothing will be emitted. As an extra nice thing - we'll automatilly convert whatever the `grok` expression outputs into a `long`. All of it handled for us using painless's standard dynamic dispatch code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I'm in no position to review the painless infrastructure changes, but I like the API from a scripting point of view.
voltage.rounded: | ||
type: keyword | ||
script: | | ||
long v = ((long) doc['voltage'].value).emit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you meant to have this trailing emit
here, but I'm intrigued about its effect - do we just have two identical long values for every document?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Emit twice is indeed a mistake. This won't work either - emit
returns void
so this likely won't compile. I think this is an artifact of me pushing the draft too early. I'll fix.
Ah NOCOMMIT. I'll fix you. |
Just like the standard |
This PR only adds |
@@ -869,8 +887,9 @@ public void visitBreak(SBreak userBreakNode, ScriptScope scriptScope) { | |||
@Override | |||
public void visitAssignment(EAssignment userAssignmentNode, ScriptScope scriptScope) { | |||
boolean read = scriptScope.getCondition(userAssignmentNode, Read.class); | |||
Class<?> compoundType = scriptScope.hasDecoration(userAssignmentNode, CompoundType.class) ? | |||
scriptScope.getDecoration(userAssignmentNode, CompoundType.class).getCompoundType() : null; | |||
Class<?> compoundType = scriptScope.hasDecoration(userAssignmentNode, CompoundType.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh no. I think I ran the formatter over this entire file. Ooops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work you've done here! So, I don't want to stop these changes from going in because I think moving forward for grok/dissect is super important. However, after giving this some thought I would like to see us not have specialized context information as part of the PainlessLookupBuilder. While these are per context, the PainlessLookupBuilder shouldn't need to know about different script types. This information is all contained as part of ScriptClassInfo where we turn the ScriptContext into what amounts to a Painless Context.
Alternatives to the infrastructure added as part of this PR:
First alternative --
- new annotation maybe something like HideAndInjectAnnotation[int position] - this annotation let's us know that the parameter in position is injected by Painless without the user having knowledge of it
- the injected parameter is still added as part of the whitelisted method so we get the benefit of naturally loading it as part of the PainlessLookupBuilder without having context specfic information
- we add the injected value as a member variable of the generated class (script) and store it so it's easily accessible for user defined functions or lambdas
Second alternative --
We extend the current injection annotation to allow for a marker for "this" since these already work with lambdas and member functions.
Either way I think it's practical to still include the parameter as part of the whitelist so the script is naturally included without specialized knowledge in the PainlessLookupBuilder.
Would be interested to hear thoughts from @stu-elastic as well as your thoughts @nik9000 .
|
||
public class InjectScriptAnnotationParser implements WhitelistAnnotationParser { | ||
|
||
public static final InjectScriptAnnotationParser INSTANCE = new InjectScriptAnnotationParser(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you please add a sentence or two JavaDoc for this? The name doesn't make it immediately clear to me that we are passing in around the "this" pointer for the script internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ Same as before. But maybe we should somehow link this into the inject annotation like you were saying? Some way for inject to know its injecting member variables?
@@ -445,7 +445,7 @@ public static CallSite bootstrap(PainlessLookup painlessLookup, FunctionTable fu | |||
switch(flavor) { | |||
// "function-call" like things get a polymorphic cache | |||
case METHOD_CALL: | |||
if (args.length == 0) { | |||
if (args.length < 2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we always pass in the script as an argument? Seems like we should only do that when it's actually required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We pass in two things indy - the recipe like before and whether or not we passed in the script. arg[1] 0 if we didn't pass the script and 1 if we did.
import org.elasticsearch.painless.Location; | ||
import org.elasticsearch.painless.phase.IRTreeVisitor; | ||
|
||
public class LoadScriptNode extends ExpressionNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Long-term this should probably be a LoadThisNode, but that should be a separate PR.
@@ -40,14 +40,16 @@ | |||
private final Map<String, PainlessMethod> painlessMethodKeysToImportedPainlessMethods; | |||
private final Map<String, PainlessClassBinding> painlessMethodKeysToPainlessClassBindings; | |||
private final Map<String, PainlessInstanceBinding> painlessMethodKeysToPainlessInstanceBindings; | |||
private final Map<String, Class<?>> methodNameToInjectedScriptType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this information come from the context (parsed into ScriptClassInfo)? This doesn't seem like the right place to track this information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had it come from here because I wanted to use the type from the whitelist. I figured it could be an interface or a superclass of the actual script. I think something like this makes a lot more sense with your idea of injecting members though.
@@ -233,6 +236,12 @@ public PainlessMethod lookupFunctionalInterfacePainlessMethod(Class<?> targetCla | |||
return targetPainlessClass.functionalInterfaceMethod; | |||
} | |||
|
|||
public Class<?> typeOfScriptToInjectForMethod(String methodName, int methodArity) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I think this would fit better if the script was a parameter of the whitelisted method. Then this method wouldn't be need. It seems awkward to me to have what is specialized context information in the lookup that doesn't really know anything about contexts.
I really like your idea of injecting members! I think its super flexible and let's the whitelists be much more general. |
I talked with @jdconrad out of band:
|
@elasticmachine update branch |
I can make this one smaller but I don't think I can remove it. Mostly because we don't have a |
@nik9000 Yeah, let's reevaluate methodNameToInjectedScriptType after this PR. Thanks for considering the other immediate options, but I agree that none of them look reasonable at this moment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have one question about this in general. Does this actually make scripts more readable? I'm not sure it does. The examples I see in tests seem to simply move the emit method to the end of the line, instead of it being at the beginning. So as an example:
before:
emit(doc['timestamp'].value.plusYears(2))
after:
doc['timestamp'].value.plusYears(2).emit()
So from my perspective, what is the user actually gaining other than yet another way to call emit? I'm overly cautious here for adding syntactic sugar because maintaining this support if we then came up with yet another way to make scripts "easier", these emit() methods on all types would be akin to the dot access for map values that has plagued us for so long now, which is almost impossible to get rid of at this point.
They are gaining dynamic dispatch on the receiver. If painless had dynamic
dispatch on the args I wouldn't have proposed it. I haven't looked into
implementing dynamic dispatch dispatch on args but I'm told it's super
tricky. Maybe it's worth another check.
…On Wed, Feb 3, 2021, 19:18 Ryan Ernst ***@***.***> wrote:
***@***.**** commented on this pull request.
I have one question about this in general. Does this actually make scripts
more readable? I'm not sure it does. The examples I see in tests seem to
simply move the emit method to the end of the line, instead of it being at
the beginning. So as an example:
before:
emit(doc['timestamp'].value.plusYears(2))
after:
doc['timestamp'].value.plusYears(2).emit()
So from my perspective, what is the user actually gaining other than yet
another way to call emit? I'm overly cautious here for adding syntactic
sugar because maintaining this support if we then came up with yet another
way to make scripts "easier", these emit() methods on all types would be
akin to the dot access for map values that has plagued us for so long now,
which is almost impossible to get rid of at this point.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#68231 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABUXIV7QBK4VTPQG4MZVMDS5HRUZANCNFSM4WZE2GUA>
.
|
They can also do null safe chains on the emits to succinctly skip emits
when things don't match. Those sorts of sequences are kind of neat because
we can likely skip auto-boxing in those cases. If have primitives in the
first place.
It's not that this is easier to read. It's that it is easier to integrate
with painless's strengths.
…On Wed, Feb 3, 2021, 19:22 Nikolas Everett ***@***.***> wrote:
They are gaining dynamic dispatch on the receiver. If painless had dynamic
dispatch on the args I wouldn't have proposed it. I haven't looked into
implementing dynamic dispatch dispatch on args but I'm told it's super
tricky. Maybe it's worth another check.
On Wed, Feb 3, 2021, 19:18 Ryan Ernst ***@***.***> wrote:
> ***@***.**** commented on this pull request.
>
> I have one question about this in general. Does this actually make
> scripts more readable? I'm not sure it does. The examples I see in tests
> seem to simply move the emit method to the end of the line, instead of it
> being at the beginning. So as an example:
>
> before:
>
> emit(doc['timestamp'].value.plusYears(2))
>
> after:
>
> doc['timestamp'].value.plusYears(2).emit()
>
> So from my perspective, what is the user actually gaining other than yet
> another way to call emit? I'm overly cautious here for adding syntactic
> sugar because maintaining this support if we then came up with yet another
> way to make scripts "easier", these emit() methods on all types would be
> akin to the dot access for map values that has plagued us for so long now,
> which is almost impossible to get rid of at this point.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <#68231 (review)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AABUXIV7QBK4VTPQG4MZVMDS5HRUZANCNFSM4WZE2GUA>
> .
>
|
This creates
.emit()
methods on a few primitive things forkeyword
and
date
style runtime fields. Now you can do stuff like:We get to piggy back of painless's dynamic dispatch code to regonize
that you are in a
date
and trying to emit aString
so we can "do theright thing" without any extra runtime cost. In this case we just parse
the date using the
formatter
on the field. And since the example abovedoesn't use a formatter we get ISO8601, the date format of kings.
Similarly, you can do stuff like:
Painless knows
i
is anint
and will call an emit method that emitsits string value.
Also! Assuming we get the syntax proposed in #68088, because this is a
chain of method invocations you can do something like:
This should be read as "if grok matches the message and extracts a value
for
i
then emit it to the runtime field." If either the grok doesn'tmatch or doesn't extract an
i
value then nothing will be emitted. Asan extra nice thing - we'll automatilly convert whatever the
grok
expression outputs into a
long
. All of it handled for us usingpainless's standard dynamic dispatch code.