-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Emit multiple fields from a runtime field script #75108
Conversation
server/src/main/java/org/elasticsearch/index/mapper/ObjectRuntimeField.java
Outdated
Show resolved
Hide resolved
builder.startObject("fields"); | ||
for (RuntimeField subfield : subfields) { | ||
subfield.toXContent(builder, params); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we print out all the parameters above, and fields is also a parameter, but nothing gets printed out. It is probably good as we can only parse RuntimeFields at a later time, which is what we need to print out the subfields. Is what I am doing here ok? Do we have to somehow make sure that serializing the parameters does not serialize also the fields, why does nothing happen at the moment @romseygeek ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reviewed this as someone not familiar with this part of the codebase (so needs another reviewer). I've left a few questions / comments but overall the PR looks in good shape to me.
new RuntimeField.Builder(name) { | ||
private final FieldMapper.Parameter<Script> script = new FieldMapper.Parameter<>( | ||
"script", | ||
false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be true, i.e., updateable? Same question for fields below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Runtime field definitions don't get merged, they get replaced entirely, so this parameter doesn't apply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reusing the field mapper parameter parsing was a nice hack but this shows one of the places where its kind of confusing. I'm not sure what to do about it though.
* @return the values that were emitted for the provided field | ||
*/ | ||
public final List<Object> getValues(String field) { | ||
//TODO for now we re-run the script every time a leaf field is accessed, but we could cache the values? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is such a significant limitation that this behavior should at least be documented, so that it does not surprise users that are have composite fields with lots of subfields.
It means that the benefits of the PR are mostly of syntactic nature, allowing users not to have to copy paste the same script into two field definitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely. There aren't any docs in this PR, they will be added as a follow-up in collaboration with the docs team, but we'll make sure to call out this limitation.
CompositeFieldScript.Factory objectFieldScript = (f, p, s) -> ctx -> new CompositeFieldScript(f, p, s, ctx) { | ||
@Override | ||
public void execute() { | ||
emit("field", "value"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emit multiple fields here?
//TODO for now we re-run the script every time a leaf field is accessed, but we could cache the values? | ||
fieldValues.clear(); | ||
execute(); | ||
return fieldValues.get(field); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should fieldValues be cleared before returning field so that it does not hold onto unnecessary state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh boy!
I wonder if we could have emit
on the composite script bind its results into all of the "child" scripts immediately. Right now it looks like you can run a script and it'll collect a Map
of values. And then, after the script is finished, you extract just the ones you need. That's good because you don't convert the ones you don't need. But its bad that you silently ignore when you emit invalid values that you aren't reading. It also feels bad that the error comes after the script instead of during it. If we throw the error while running the script we'd get the line it fails on. Painless has try/catch
so you could even use that to dodge error cases or add extra information when an error is thrown.
A smaller thing, but now that there are three sorts of subclasses for stuff like DoubleFieldScript
maybe it'd be a good idea to have a more natural shaped supertype. The design of these classes is very much around painless implementing them, but we have those two cases where we implement it by hand. It might make sense to make refactor the base classes so they are less "painless shaped" and make a "painless shaped" layer that it can implement. I'm not sure, and, regardless, I don't think that is a "for now" thing.
} | ||
|
||
@Override | ||
public Collection<MappedFieldType> asMappedFieldTypes() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe return a Stream
from this if you are going to be flatMap
ing it. If called needs a list they can make one?
At the moment the script is actually run by the child field, and there's no caching or sharing of values between different child fields. So if you have a grok defined on a parent field 'message' and you search on 'message.status' and 'message.ip', both fields will run the painless script separately and then extract the part of the map that they need. This isn't ideal, as Yannick mentions, but it's the way runtime fields work now. It would be great to rework things so that different runtime fields could share a LeafSearchLookup, in which case we could cache these results there, but that's a much bigger project. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me overall, I left some initial comments. The name composite
has really grown on me!
server/src/main/java/org/elasticsearch/index/mapper/FieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/LeafRuntimeField.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/RuntimeField.java
Outdated
Show resolved
Hide resolved
Thanks for the reviews everybody! I have reworked things so that runtime builders now have two methods, one to create a simple leaf field and one to create a field with a parent. The parsing code can take an optional function that will choose which method to call when building fields, and it nicely separates out composite field construction from 'normal' field construction. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good to me, just left some small comments. As a caveat, I am not deeply familiar with this code and its history (AbstractScriptFieldType
, RuntimeField
, etc.)
server/src/main/java/org/elasticsearch/index/mapper/AbstractScriptFieldType.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/script/CompositeFieldScript.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/query/SearchExecutionContext.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/CompositeRuntimeField.java
Show resolved
Hide resolved
@elasticmachine update branch |
We have recently introduced support for grok and dissect to the runtime fields Painless context that allows to split a field into multiple fields. However, each runtime field can only emit values for a single field. This commit introduces support for emitting multiple fields from the same script. The API call to define a runtime field that emits multiple fields is the following: ``` PUT localhost:9200/logs/_mappings { "runtime" : { "log" : { "type" : "composite", "script" : "emit(grok(\"%{COMMONAPACHELOG}\").extract(doc[\"message.keyword\"].value))", "fields" : { "clientip" : { "type" : "ip" }, "response" : { "type" : "long" } } } } } ``` The script context for this new field type accepts two emit signatures: * `emit(String, Object)` * `emit(Map)` Sub-fields need to be declared under fields in order to be discoverable through the field_caps API and accessible through the search API. The way that it emits multiple fields is by returning multiple MappedFieldTypes from RuntimeField#asMappedFieldTypes. The sub-fields are instances of the runtime fields that are already supported, with a little tweak to adapt the script defined by their parent to an artificial script factory for each of the sub-fields that makes its corresponding sub-field accessible. This approach allows to reuse all of the existing runtime fields code for the sub-fields. The runtime section has been flat so far as it has not supported objects until now. That stays the same, meaning that runtime fields can have dots in their names. Because there are though two ways to create the same field with the introduction of the ability to emit multiple fields, we have to make sure that a runtime field with a certain name cannot be defined twice, which is why the following mappings are rejected with the error `Found two runtime fields with same name [log.response]`: ``` PUT localhost:9200/logs/_mappings { "runtime" : { "log.response" : { "type" : "keyword" }, "log" : { "type" : "composite", "script" : "emit(\"response\", grok(\"%{COMMONAPACHELOG}\").extract(doc[\"message.keyword\"].value)?.response)", "fields" : { "response" : { "type" : "long" } } } } } ``` Closes elastic#68203
We have recently introduced support for grok and dissect to the runtime fields Painless context that allows to split a field into multiple fields. However, each runtime field can only emit values for a single field. This commit introduces support for emitting multiple fields from the same script. The API call to define a runtime field that emits multiple fields is the following: ``` PUT localhost:9200/logs/_mappings { "runtime" : { "log" : { "type" : "composite", "script" : "emit(grok(\"%{COMMONAPACHELOG}\").extract(doc[\"message.keyword\"].value))", "fields" : { "clientip" : { "type" : "ip" }, "response" : { "type" : "long" } } } } } ``` The script context for this new field type accepts two emit signatures: * `emit(String, Object)` * `emit(Map)` Sub-fields need to be declared under fields in order to be discoverable through the field_caps API and accessible through the search API. The way that it emits multiple fields is by returning multiple MappedFieldTypes from RuntimeField#asMappedFieldTypes. The sub-fields are instances of the runtime fields that are already supported, with a little tweak to adapt the script defined by their parent to an artificial script factory for each of the sub-fields that makes its corresponding sub-field accessible. This approach allows to reuse all of the existing runtime fields code for the sub-fields. The runtime section has been flat so far as it has not supported objects until now. That stays the same, meaning that runtime fields can have dots in their names. Because there are though two ways to create the same field with the introduction of the ability to emit multiple fields, we have to make sure that a runtime field with a certain name cannot be defined twice, which is why the following mappings are rejected with the error `Found two runtime fields with same name [log.response]`: ``` PUT localhost:9200/logs/_mappings { "runtime" : { "log.response" : { "type" : "keyword" }, "log" : { "type" : "composite", "script" : "emit(\"response\", grok(\"%{COMMONAPACHELOG}\").extract(doc[\"message.keyword\"].value)?.response)", "fields" : { "response" : { "type" : "long" } } } } } ``` Closes #68203 Co-authored-by: Luca Cavanna <[email protected]>
Hi @romseygeek I have a null pointer fix for this in master but when I tried backporting that fix it looks like this feature is not backported in 7.x? The label on this PR says 7.15 so I'm not sure if it's still just pending? |
Ignore my last comment. I can see the NPE is in a different class in 7.15 - TransportFieldCapabilitiesIndexAction. |
@javanna - I'd like to confirm - composite fields can only have 'child' fields one level deep, correct? |
@mattkime yes that's correct |
We have recently introduced support for grok and dissect to the runtime fields Painless context that allows to split a field into multiple fields. Though each runtime field can only emit values for a single field. This PR introduces support for emitting multiple fields from the same script.
The API call to define a runtime field that emits multiple fields is the following:
The script context for this new field type accepts two emit signatures:
Sub-fields need to be declared under
fields
in order to be discoverable through the field_caps API and accessible through the search API. In this first iteration, mapping additional sub-fields requires to provide the whole object with its script and all the existing plus new sub-fields. In a follow-up we will address this by allowing users to provide only what needs to be changed or added.The way that it emits multiple fields is by returning multiple
MappedFieldType
s fromRuntimeField#asMappedFieldTypes
. The sub-fields are instances of the runtime fields that are already supported, with a little tweak to adapt the script defined by their parent to an artificial script factory for each of the sub-fields that makes its corresponding sub-field accessible. This approach allows to reuse all of the existing runtime fields code for the sub-fields.The runtime section has been flat so far as it has not supported objects until now. That stays the same, meaning that runtime fields can have dots in their names. Because there are though two ways to create the same field with the introduction of the ability to emit multiple fields, we have to make sure that a runtime field with a certain name cannot be defined twice, which is why the following mappings are rejected with the error
Found two runtime fields with same name [log.response]
:Closes #68203