Add typed emit functionality #68231

nik9000 · 2021-01-29T16:46:03Z

This creates .emit() methods on a few primitive things for keyword
and date style runtime fields. Now you can do stuff like:

"d": {
  "type": "date",
  "script": "'2020-01-18T17:41:34.000Z'.emit()"
}

We get to piggyback on painless's dynamic dispatch code to recognize
that you are in a date and trying to emit a String so we can "do the
right thing" without any extra runtime cost. In this case we just parse
the date using the formatter on the field. And since the example above
doesn't use a formatter we get ISO8601, the date format of kings.

Similarly, you can do stuff like:

"s": {
  "type": "keyword",
  "script": """
    for (int i = 0; i < 100; i++) {
      i.emit();
    }
  """
}

Painless knows i is an int and will call an emit method that emits
its string value.
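
A rough Java sketch of what the two examples above amount to, for readers who don't have the Painless internals in their head. The class and field names (`TypedEmitSketch`, `DateField`, `KeywordField`) are hypothetical stand-ins, not classes from this PR, and using `DateTimeFormatter.ISO_INSTANT` as the default formatter is an assumption made only for the sketch:

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; none of these are real Elasticsearch classes.
final class TypedEmitSketch {

    /** Models a date runtime field: emitted values are stored as epoch milliseconds. */
    static final class DateField {
        final List<Long> values = new ArrayList<>();
        // Assumption: with no format configured on the field, strings parse as ISO 8601 instants.
        final DateTimeFormatter formatter = DateTimeFormatter.ISO_INSTANT;

        /** Roughly what a String receiver's emit() would do inside a date field. */
        void emit(String text) {
            values.add(Instant.from(formatter.parse(text)).toEpochMilli());
        }
    }

    /** Models a keyword runtime field: emitted values are stored as strings. */
    static final class KeywordField {
        final List<String> values = new ArrayList<>();

        /** Roughly what an int receiver's emit() would do inside a keyword field. */
        void emit(int value) {
            values.add(Integer.toString(value));
        }
    }

    public static void main(String[] args) {
        DateField d = new DateField();
        d.emit("2020-01-18T17:41:34.000Z");

        KeywordField s = new KeywordField();
        for (int i = 0; i < 100; i++) {
            s.emit(i);
        }
        System.out.println(d.values.size() + " date value(s), " + s.values.size() + " keyword value(s)");
    }
}
```

In the real thing the field type and the receiver type together pick the conversion at the call site; the sketch fakes that with one overload per field class.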

Also! Assuming we get the syntax proposed in #68088, because this is a
chain of method invocations you can do something like:

"i": {
  "type": "long",
  "script": """
    grok('%{NUMBER:i} %{NUMBER:j}').extract(doc['message'].value)?.i?.emit()
  """
}

This should be read as "if grok matches the message and extracts a value
for i then emit it to the runtime field." If either the grok doesn't
match or doesn't extract an i value then nothing will be emitted. As
an extra nice thing - we'll automatically convert whatever the grok
expression outputs into a long. All of it handled for us using
painless's standard dynamic dispatch code.
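
The "emit only if it matched" reading can be sketched in plain Java, with `Optional` standing in for the `?.` chain. This is an analogy, not the grok or Painless implementation: the regex is a simplified stand-in for `%{NUMBER:i} %{NUMBER:j}` (integers only), and `NullSafeEmitSketch`, `extractI` and `run` are made-up names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; a plain regex stands in for grok here.
final class NullSafeEmitSketch {

    // Simplified stand-in for %{NUMBER:i} %{NUMBER:j}: two whitespace-separated integers.
    private static final Pattern TWO_NUMBERS = Pattern.compile("(?<i>-?\\d+) (?<j>-?\\d+)");

    /** Returns the text captured as i, or empty when the message does not match. */
    static Optional<String> extractI(String message) {
        Matcher m = TWO_NUMBERS.matcher(message);
        return m.find() ? Optional.ofNullable(m.group("i")) : Optional.empty();
    }

    /** Emits i converted to a long when it was extracted; emits nothing otherwise. */
    static List<Long> run(String message) {
        List<Long> emitted = new ArrayList<>();
        extractI(message)            // like extract(...)?.i
            .map(Long::parseLong)    // the automatic conversion to long
            .ifPresent(emitted::add); // like ?.emit()
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(run("12 34"));
        System.out.println(run("no numbers here"));
    }
}
```

A message that doesn't match falls out of the chain at the first step, so nothing is emitted and no exception is thrown.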

romseygeek

Nice! I'm in no position to review the painless infrastructure changes, but I like the API from a scripting point of view.

romseygeek · 2021-01-29T17:07:41Z

x-pack/plugin/src/test/resources/rest-api-spec/test/runtime_fields/13_keyword_emit.yml

+            voltage.rounded:
+              type: keyword
+              script: |
+                long v = ((long) doc['voltage'].value).emit();


I don't think you meant to have this trailing emit here, but I'm intrigued about its effect - do we just have two identical long values for every document?

Emit twice is indeed a mistake. This won't work either - emit returns void so this likely won't compile. I think this is an artifact of me pushing the draft too early. I'll fix.

nik9000 · 2021-01-29T17:13:50Z

** elasticsearch-ci/1 ** — Build finished.

Ah NOCOMMIT. I'll fix you.

nik9000 · 2021-01-29T17:53:28Z

Just like the standard emit function, these .emit functions won't work inside of lambdas or functions for now (#68235).

nik9000 · 2021-01-29T19:39:39Z

This PR only adds .emit() for keyword and date typed runtime fields. That'll leave boolean, long, double, ip, and geopoint. Some of them I just haven't done. Some I'm not sure which classes should be able to .emit(). Either way, those are for another PR.

nik9000 · 2021-02-02T15:50:22Z

...ng-painless/src/main/java/org/elasticsearch/painless/phase/DefaultUserTreeToIRTreePhase.java

@@ -869,8 +887,9 @@ public void visitBreak(SBreak userBreakNode, ScriptScope scriptScope) {
    @Override
    public void visitAssignment(EAssignment userAssignmentNode, ScriptScope scriptScope) {
        boolean read = scriptScope.getCondition(userAssignmentNode, Read.class);
-        Class<?> compoundType = scriptScope.hasDecoration(userAssignmentNode, CompoundType.class) ?
-                scriptScope.getDecoration(userAssignmentNode, CompoundType.class).getCompoundType() : null;
+        Class<?> compoundType = scriptScope.hasDecoration(userAssignmentNode, CompoundType.class)


Oh no. I think I ran the formatter over this entire file. Ooops.

jdconrad

Thanks for the work you've done here! So, I don't want to stop these changes from going in because I think moving forward for grok/dissect is super important. However, after giving this some thought I would like to see us not have specialized context information as part of the PainlessLookupBuilder. While these are per context, the PainlessLookupBuilder shouldn't need to know about different script types. This information is all contained as part of ScriptClassInfo where we turn the ScriptContext into what amounts to a Painless Context.

Alternatives to the infrastructure added as part of this PR:

First alternative --

new annotation maybe something like HideAndInjectAnnotation[int position] - this annotation lets us know that the parameter in position is injected by Painless without the user having knowledge of it
the injected parameter is still added as part of the whitelisted method so we get the benefit of naturally loading it as part of the PainlessLookupBuilder without having context specific information
we add the injected value as a member variable of the generated class (script) and store it so it's easily accessible for user defined functions or lambdas

Second alternative --
We extend the current injection annotation to allow for a marker for "this" since these already work with lambdas and member functions.

Either way I think it's practical to still include the parameter as part of the whitelist so the script is naturally included without specialized knowledge in the PainlessLookupBuilder.
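
To make the first alternative a bit more concrete, here is a minimal reflection-based sketch. Everything in it is hypothetical: the `@HideAndInject` annotation only mirrors the name floated above, `FakeScript` and `Whitelisted` are stand-ins, and real Painless would generate bytecode rather than call through reflection. It also does not model storing the injected value as a member of the generated class:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of "hide and inject"; not the Painless implementation.
final class HideAndInjectSketch {

    /** Marks the parameter at `position` as supplied by the engine rather than by the user. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface HideAndInject {
        int position();
    }

    /** Stand-in for the generated script instance that collects emitted values. */
    static final class FakeScript {
        final List<String> emitted = new ArrayList<>();
    }

    /** Stand-in for a whitelisted method: the script is an ordinary, but hidden, parameter. */
    static final class Whitelisted {
        @HideAndInject(position = 1)
        public static void emit(String receiver, FakeScript script) {
            script.emitted.add(receiver);
        }
    }

    /** Calls `method` with the user's arguments, splicing `script` in at the annotated position. */
    static void invoke(Method method, FakeScript script, Object... userArgs) throws ReflectiveOperationException {
        List<Object> args = new ArrayList<>(Arrays.asList(userArgs));
        HideAndInject inject = method.getAnnotation(HideAndInject.class);
        if (inject != null) {
            args.add(inject.position(), script);
        }
        method.invoke(null, args.toArray());
    }

    /** The user "writes" 'hello'.emit(); the engine supplies the script behind the scenes. */
    static List<String> demo() {
        try {
            FakeScript script = new FakeScript();
            Method emit = Whitelisted.class.getDeclaredMethod("emit", String.class, FakeScript.class);
            invoke(emit, script, "hello");
            return script.emitted;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the shape is that the lookup only ever sees a normal method signature; the only context-specific piece is whoever supplies the hidden argument.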

Would be interested to hear thoughts from @stu-elastic as well as your thoughts @nik9000 .

jdconrad · 2021-02-01T17:21:20Z

...pi/src/main/java/org/elasticsearch/painless/spi/annotation/InjectScriptAnnotationParser.java

+
+public class InjectScriptAnnotationParser implements WhitelistAnnotationParser {
+
+    public static final InjectScriptAnnotationParser INSTANCE = new InjectScriptAnnotationParser();


Would you please add a sentence or two of JavaDoc for this? The name doesn't make it immediately clear to me that we are passing around the "this" pointer for the script internally.

++ Same as before. But maybe we should somehow link this into the inject annotation like you were saying? Some way for inject to know it's injecting member variables?

jdconrad · 2021-02-02T15:17:46Z

modules/lang-painless/src/main/java/org/elasticsearch/painless/DefBootstrap.java

@@ -445,7 +445,7 @@ public static CallSite bootstrap(PainlessLookup painlessLookup, FunctionTable fu
        switch(flavor) {
            // "function-call" like things get a polymorphic cache
            case METHOD_CALL:
-                if (args.length == 0) {
+                if (args.length < 2) {


Do we always pass in the script as an argument? Seems like we should only do that when it's actually required?

We pass two things to indy: the recipe like before, and whether or not we passed in the script. arg[1] is 0 if we didn't pass the script and 1 if we did.

jdconrad · 2021-02-02T15:19:11Z

modules/lang-painless/src/main/java/org/elasticsearch/painless/ir/LoadScriptNode.java

+import org.elasticsearch.painless.Location;
+import org.elasticsearch.painless.phase.IRTreeVisitor;
+
+public class LoadScriptNode extends ExpressionNode {


Long-term this should probably be a LoadThisNode, but that should be a separate PR.

jdconrad · 2021-02-02T15:20:54Z

modules/lang-painless/src/main/java/org/elasticsearch/painless/lookup/PainlessLookup.java

@@ -40,14 +40,16 @@
    private final Map<String, PainlessMethod> painlessMethodKeysToImportedPainlessMethods;
    private final Map<String, PainlessClassBinding> painlessMethodKeysToPainlessClassBindings;
    private final Map<String, PainlessInstanceBinding> painlessMethodKeysToPainlessInstanceBindings;
+    private final Map<String, Class<?>> methodNameToInjectedScriptType;


Shouldn't this information come from the context (parsed into ScriptClassInfo)? This doesn't seem like the right place to track this information.

I had it come from here because I wanted to use the type from the whitelist. I figured it could be an interface or a superclass of the actual script. I think something like this makes a lot more sense with your idea of injecting members though.

jdconrad · 2021-02-02T15:23:49Z

modules/lang-painless/src/main/java/org/elasticsearch/painless/lookup/PainlessLookup.java

@@ -233,6 +236,12 @@ public PainlessMethod lookupFunctionalInterfacePainlessMethod(Class<?> targetCla
        return targetPainlessClass.functionalInterfaceMethod;
    }

+    public Class<?> typeOfScriptToInjectForMethod(String methodName, int methodArity) {


So, I think this would fit better if the script was a parameter of the whitelisted method. Then this method wouldn't be needed. It seems awkward to me to have what is specialized context information in the lookup that doesn't really know anything about contexts.

nik9000 · 2021-02-02T16:04:32Z

I really like your idea of injecting members! I think it's super flexible and lets the whitelists be much more general.

nik9000 · 2021-02-02T17:22:11Z

I talked with @jdconrad out of band:

1. I'm going to try to switch from injecting the script to injecting the result of calling methods on the script. That'll make the whitelists a little more portable. In runtime fields those methods will just return this; but it's quite flexible. We like the idea that the whitelist will no longer have to be dependent on the script itself.
2. I'm going to see if I can remove the funny methodNameToInjectedScriptType lookup. That'll take a little performance testing, but we'll see!

mark-vieira · 2021-02-03T00:15:58Z

@elasticmachine update branch

nik9000 · 2021-02-03T13:58:01Z

2. I'm going to see if I can remove the funny `methodNameToInjectedScriptType` lookup. That'll take a little performance testing, but we'll see!

I can make this one smaller but I don't think I can remove it. Mostly because we don't have a this inside lambdas and relying on having one would break every def call in a lambda. Which is terrible. We do want to fix this, but, now is probably not the time. So instead all def calls that might need this will refuse to compile in lambdas. This is probably fine because the only def calls that might need this after this PR are def.emit(). Nothing else needs it. Yet.

jdconrad · 2021-02-03T15:35:55Z

@nik9000 Yeah, let's reevaluate methodNameToInjectedScriptType after this PR. Thanks for considering the other immediate options, but I agree that none of them look reasonable at this moment.

rjernst

I have one question about this in general. Does this actually make scripts more readable? I'm not sure it does. The examples I see in tests seem to simply move the emit method to the end of the line, instead of it being at the beginning. So as an example:

before:

emit(doc['timestamp'].value.plusYears(2))

after:

doc['timestamp'].value.plusYears(2).emit()

So from my perspective, what is the user actually gaining other than yet another way to call emit? I'm overly cautious here for adding syntactic sugar because maintaining this support if we then came up with yet another way to make scripts "easier", these emit() methods on all types would be akin to the dot access for map values that has plagued us for so long now, which is almost impossible to get rid of at this point.

nik9000 · 2021-02-04T00:23:04Z

They are gaining dynamic dispatch on the receiver. If painless had dynamic dispatch on the args I wouldn't have proposed it. I haven't looked into implementing dynamic dispatch on args but I'm told it's super tricky. Maybe it's worth another check.
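
The receiver-versus-args distinction can be illustrated with a plain Java analogy (this is Java, not Painless, and `DispatchSketch` and its members are invented for the illustration). Java picks an overload from the static type of the argument at compile time, but picks an overridden method from the runtime type of the receiver:

```java
// Hypothetical Java analogy for "dispatch on the receiver" versus "dispatch on the args".
final class DispatchSketch {

    // Overloads: chosen at compile time from the STATIC type of the argument.
    static String emit(Object value) { return "object:" + value; }
    static String emit(String value) { return "string:" + value; }

    // Receiver-style: the implementation is chosen at run time from the receiver's actual class.
    interface Emittable {
        String emit();
    }

    static final class StringValue implements Emittable {
        private final String value;
        StringValue(String value) { this.value = value; }
        @Override public String emit() { return "string:" + value; }
    }

    static final class IntValue implements Emittable {
        private final int value;
        IntValue(int value) { this.value = value; }
        @Override public String emit() { return "int:" + value; }
    }

    public static void main(String[] args) {
        Object untyped = "a";                     // static type Object, runtime type String
        System.out.println(emit(untyped));        // uses emit(Object): the static type wins

        Emittable receiver = new StringValue("a");
        System.out.println(receiver.emit());      // uses StringValue.emit(): the runtime type wins
    }
}
```

With an untyped value, only the receiver form gets the type-specific behavior without the caller spelling out a cast, which is the gain being described here.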

nik9000 · 2021-02-04T00:27:48Z

They can also do null safe chains on the emits to succinctly skip emits when things don't match. Those sorts of sequences are kind of neat because we can likely skip auto-boxing in those cases, if we have primitives in the first place. It's not that this is easier to read. It's that it is easier to integrate with painless's strengths.

nik9000 added >enhancement :Search/Search Search-related issues that do not fall into other categories v8.0.0 v7.12.0 labels Jan 29, 2021

nik9000 requested review from javanna, romseygeek and jdconrad January 29, 2021 16:46

romseygeek reviewed Jan 29, 2021

ITer

24bdf4e

nik9000 mentioned this pull request Jan 29, 2021

Runtime Field shorthand for emit(null) #64542

Open

Merge branch 'master' into painless_inject_script

40ddd40

nik9000 added 2 commits January 29, 2021 14:53

TemporalAccessor.emit

efe2abd

Example of truncating

ddca7a5

nik9000 commented Feb 2, 2021

jdconrad approved these changes Feb 2, 2021

rjernst reviewed Feb 4, 2021

williamrandolph added v7.13.0 and removed v7.12.0 labels Feb 18, 2021

nik9000 removed the :Search/Search Search-related issues that do not fall into other categories label Mar 24, 2021

nik9000 removed >enhancement v7.13.0 v8.0.0 labels Mar 24, 2021

nik9000 closed this Mar 24, 2021
