Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add typed emit functionality #68231

Closed
wants to merge 5 commits into from

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Jan 29, 2021

This creates .emit() methods on a few primitive things for keyword
and date style runtime fields. Now you can do stuff like:

"d": {
  "type": "date",
  "script": "'2020-01-18T17:41:34.000Z'.emit()"
}

We get to piggy back of painless's dynamic dispatch code to regonize
that you are in a date and trying to emit a String so we can "do the
right thing" without any extra runtime cost. In this case we just parse
the date using the formatter on the field. And since the example above
doesn't use a formatter we get ISO8601, the date format of kings.

Similarly, you can do stuff like:

"s": {
  "type": "keyword",
  "script": """
    for (int i = 0; i < 100; i++) {
      i.emit();
    }
  """
}

Painless knows i is an int and will call an emit method that emits
its string value.

Also! Assuming we get the syntax proposed in #68088, because this is a
chain of method invocations you can do something like:

"i": {
  "type": "long",
  "script": """
    grok('%{NUMBER:i} %{NUMBER:j}').extract(doc['message'].value)?.i?.emit()
  """
}

This should be read as "if grok matches the message and extracts a value
for i then emit it to the runtime field." If either the grok doesn't
match or doesn't extract an i value then nothing will be emitted. As
an extra nice thing - we'll automatilly convert whatever the grok
expression outputs into a long. All of it handled for us using
painless's standard dynamic dispatch code.

This creates `.emit()` methods on a few primitive things for `keyword`
and `date` style runtime fields. Now you can do stuff like:
```
"d": {
  "type": "date",
  "script": "'2020-01-18T17:41:34.000Z'.emit()"
}
```

We get to piggy back of painless's dynamic dispatch code to regonize
that you are in a `date` and trying to emit a `String` so we can "do the
right thing" without any extra runtime cost. In this case we just parse
the date using the `formatter` on the field. And since the example above
doesn't use a formatter we get ISO8601, the date format of kings.

Similarly, you can do stuff like:
```
"s": {
  "type": "keyword",
  "script": """
    for (int i = 0; i < 100; i++) {
      i.emit();
    }
  """
}
```

Painless *knows* `i` is an `int` and will call an emit method that emits
its string value.

Also! Assuming we get the syntax proposed in elastic#68088, because this is a
chain of method invocations you can do something like:
```
"i": {
  "type": "long",
  "script": """
    grok('%{NUMBER:i} %{NUMBER:j}').extract(doc['message'].value)?.i?.emit()
  """
}
```

This should be read as "if grok matches the message and extracts a value
for `i` then emit it to the runtime field." If either the grok doesn't
match or doesn't extract an `i` value then nothing will be emitted. As
an extra nice thing - we'll automatilly convert whatever the `grok`
expression outputs into a `long`. All of it handled for us using
painless's standard dynamic dispatch code.
@nik9000 nik9000 added >enhancement :Search/Search Search-related issues that do not fall into other categories v8.0.0 v7.12.0 labels Jan 29, 2021
Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I'm in no position to review the painless infrastructure changes, but I like the API from a scripting point of view.

voltage.rounded:
type: keyword
script: |
long v = ((long) doc['voltage'].value).emit();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you meant to have this trailing emit here, but I'm intrigued about its effect - do we just have two identical long values for every document?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emit twice is indeed a mistake. This won't work either - emit returns void so this likely won't compile. I think this is an artifact of me pushing the draft too early. I'll fix.

@nik9000
Copy link
Member Author

nik9000 commented Jan 29, 2021

** elasticsearch-ci/1 ** — Build finished.

Ah NOCOMMIT. I'll fix you.

@nik9000
Copy link
Member Author

nik9000 commented Jan 29, 2021

Just like the standard emit function, these .emit functions won't work inside of lambdas or functions for now (#68235).

@nik9000
Copy link
Member Author

nik9000 commented Jan 29, 2021

This PR only adds .emit() for keyword and date typed runtime fields. That'll leave boolean, long, double, ip, and geopoint. Some of them I just haven't done. Some I'm not sure which classes should be able to .emit(). Either way, those are for another PR.

@@ -869,8 +887,9 @@ public void visitBreak(SBreak userBreakNode, ScriptScope scriptScope) {
@Override
public void visitAssignment(EAssignment userAssignmentNode, ScriptScope scriptScope) {
boolean read = scriptScope.getCondition(userAssignmentNode, Read.class);
Class<?> compoundType = scriptScope.hasDecoration(userAssignmentNode, CompoundType.class) ?
scriptScope.getDecoration(userAssignmentNode, CompoundType.class).getCompoundType() : null;
Class<?> compoundType = scriptScope.hasDecoration(userAssignmentNode, CompoundType.class)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no. I think I ran the formatter over this entire file. Ooops.

Copy link
Contributor

@jdconrad jdconrad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work you've done here! So, I don't want to stop these changes from going in because I think moving forward for grok/dissect is super important. However, after giving this some thought I would like to see us not have specialized context information as part of the PainlessLookupBuilder. While these are per context, the PainlessLookupBuilder shouldn't need to know about different script types. This information is all contained as part of ScriptClassInfo where we turn the ScriptContext into what amounts to a Painless Context.

Alternatives to the infrastructure added as part of this PR:

First alternative --

  1. new annotation maybe something like HideAndInjectAnnotation[int position] - this annotation let's us know that the parameter in position is injected by Painless without the user having knowledge of it
  2. the injected parameter is still added as part of the whitelisted method so we get the benefit of naturally loading it as part of the PainlessLookupBuilder without having context specfic information
  3. we add the injected value as a member variable of the generated class (script) and store it so it's easily accessible for user defined functions or lambdas

Second alternative --
We extend the current injection annotation to allow for a marker for "this" since these already work with lambdas and member functions.

Either way I think it's practical to still include the parameter as part of the whitelist so the script is naturally included without specialized knowledge in the PainlessLookupBuilder.

Would be interested to hear thoughts from @stu-elastic as well as your thoughts @nik9000 .


public class InjectScriptAnnotationParser implements WhitelistAnnotationParser {

public static final InjectScriptAnnotationParser INSTANCE = new InjectScriptAnnotationParser();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you please add a sentence or two JavaDoc for this? The name doesn't make it immediately clear to me that we are passing in around the "this" pointer for the script internally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ Same as before. But maybe we should somehow link this into the inject annotation like you were saying? Some way for inject to know its injecting member variables?

@@ -445,7 +445,7 @@ public static CallSite bootstrap(PainlessLookup painlessLookup, FunctionTable fu
switch(flavor) {
// "function-call" like things get a polymorphic cache
case METHOD_CALL:
if (args.length == 0) {
if (args.length < 2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we always pass in the script as an argument? Seems like we should only do that when it's actually required?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We pass in two things indy - the recipe like before and whether or not we passed in the script. arg[1] 0 if we didn't pass the script and 1 if we did.

import org.elasticsearch.painless.Location;
import org.elasticsearch.painless.phase.IRTreeVisitor;

public class LoadScriptNode extends ExpressionNode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long-term this should probably be a LoadThisNode, but that should be a separate PR.

@@ -40,14 +40,16 @@
private final Map<String, PainlessMethod> painlessMethodKeysToImportedPainlessMethods;
private final Map<String, PainlessClassBinding> painlessMethodKeysToPainlessClassBindings;
private final Map<String, PainlessInstanceBinding> painlessMethodKeysToPainlessInstanceBindings;
private final Map<String, Class<?>> methodNameToInjectedScriptType;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this information come from the context (parsed into ScriptClassInfo)? This doesn't seem like the right place to track this information.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had it come from here because I wanted to use the type from the whitelist. I figured it could be an interface or a superclass of the actual script. I think something like this makes a lot more sense with your idea of injecting members though.

@@ -233,6 +236,12 @@ public PainlessMethod lookupFunctionalInterfacePainlessMethod(Class<?> targetCla
return targetPainlessClass.functionalInterfaceMethod;
}

public Class<?> typeOfScriptToInjectForMethod(String methodName, int methodArity) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I think this would fit better if the script was a parameter of the whitelisted method. Then this method wouldn't be need. It seems awkward to me to have what is specialized context information in the lookup that doesn't really know anything about contexts.

@nik9000
Copy link
Member Author

nik9000 commented Feb 2, 2021

I really like your idea of injecting members! I think its super flexible and let's the whitelists be much more general.

@nik9000
Copy link
Member Author

nik9000 commented Feb 2, 2021

I talked with @jdconrad out of band:

  1. I'm going try to switch from injecting the script to injecting the result of calling methods on the script. That'll make the whitelists a little more portable. In runtime fields those methods will just return this; but it quite flexible. We like the idea that the whitelist will no longer have to be dependent on the script itself.
  2. I'm going to see if I can remove the funny methodNameToInjectedScriptType lookup. That'll take a little performance testing, but we'll see!

@mark-vieira
Copy link
Contributor

@elasticmachine update branch

@nik9000
Copy link
Member Author

nik9000 commented Feb 3, 2021

2\. I'm going to see if I can remove the funny `methodNameToInjectedScriptType` lookup. That'll take a little performance testing, but we'll see!

I can make this one smaller but I don't think I can remove it. Mostly because we don't have a this inside lambdas and relying on having one would break every def call in a lambda. Which is terrible. We do want to fix this, but, now is probably not the time. So instead all def calls that might need this will refuse to compile in lambdas. This is probably fine because the only def calls that might need this after this PR are def.emit(). Nothing else needs it. Yet.

@jdconrad
Copy link
Contributor

jdconrad commented Feb 3, 2021

@nik9000 Yeah, let's reevaluate methodNameToInjectedScriptType after this PR. Thanks for considering the other immediate options, but I agree that none of them look reasonable at this moment.

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one question about this in general. Does this actually make scripts more readable? I'm not sure it does. The examples I see in tests seem to simply move the emit method to the end of the line, instead of it being at the beginning. So as an example:

before:

emit(doc['timestamp'].value.plusYears(2))

after:

doc['timestamp'].value.plusYears(2).emit()

So from my perspective, what is the user actually gaining other than yet another way to call emit? I'm overly cautious here for adding syntactic sugar because maintaining this support if we then came up with yet another way to make scripts "easier", these emit() methods on all types would be akin to the dot access for map values that has plagued us for so long now, which is almost impossible to get rid of at this point.

@nik9000
Copy link
Member Author

nik9000 commented Feb 4, 2021 via email

@nik9000
Copy link
Member Author

nik9000 commented Feb 4, 2021 via email

@nik9000 nik9000 removed the :Search/Search Search-related issues that do not fall into other categories label Mar 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants