-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some queries for runtime fields #58940
Some queries for runtime fields #58940
Conversation
Put together some example script contexts for scripted fields. This is almost certainly wrong, but it gets us something to think about.
This reworks the script contexts to support multiple values for each hit. Just like before, these are almost certainly not the final implementation, but they give us something to iterate on.
Painless extensions work differently now.
Uses a temporary hack to allow us to use painless in unit tests.
@Override | ||
public float matchCost() { | ||
// TODO we have no idea what this should be and no real way to get one | ||
return 1000f; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the cost can be for now the approximation.cost() , it is proportional to maxDocs if I remember correctly
@@ -21,6 +21,8 @@ | |||
|
|||
import org.elasticsearch.search.lookup.SearchLookup; | |||
|
|||
import java.io.IOException; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused import
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
if (ifd instanceof SearchLookupAware) { | ||
((SearchLookupAware) ifd).setSearchLookup(searchLookup); | ||
} | ||
return ifd.load(reader).getScriptValues(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not foresee the need for these changes but I see how they are caused by me not changing fielddataBuilder and rather hacking query shard context. Just to double check: this is to support runtime fields that refer to other runtime fields, otherwise they have no search lookup set?
To keep this contained and have the hack in a single place, would it work to rather modify QueryShardContext#lookup to do the following? It may have weird consequences but in our branch with a big TODO to revert it it may be ok? Also let's mention in a comment specifically why this is needed?
public SearchLookup lookup() {
if (lookup == null) {
lookup = new SearchLookup(getMapperService(), this::getForField);
}
return lookup;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that is what my TODO was about - maybe this should go through getForField
. I can look into making that change in master. I think it is good regardless. And, if it has consequences that we didn't think of we may as well see them earlier rather than later...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this makes sense on master and what difference it would make, but I think it does make sense in our feature branch to isolate the hack around augmenting the fielddata impl.
|
||
@Override | ||
public float matchCost() { | ||
// TODO we don't have a good way of estimating the complexity of the script so we just go with 9000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove the constant, what value does it add? I think that the important part is that a script needs to be run for each document, which is the cost approximation. Then one script can be heavier than another, but I wonder if that is negligible at this stage, unless we can calculate the approximate cost of a script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the javadoc, I think it should actually just be some constant - matchCost
says it is only for a single document.
If I just returned 1
here then I think Lucene'd reason about the query as though it were match_all
, which is probably bad
...rc/main/java/org/elasticsearch/xpack/runtimefields/mapper/RuntimeKeywordMappedFieldType.java
Outdated
Show resolved
Hide resolved
return DocValues::new; | ||
} | ||
|
||
private class DocValues extends SortedBinaryDocValues { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is effectively the docvalues implementation that replaces the one that I had written which you deleted?
I was wondering, and not even sure if it matters, queries don't need sorted doc_values right? But we always implement the interface that we need for fielddata to simplify things? I know that I had implemented the lucene one for query but it made things more complex for almost no reason, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is effectively the docvalues implementation that replaces the one that I had written which you deleted?
Yeah. I had mine sitting around and it had unit tests and stuff. Its the same deal.
I was wondering, and not even sure if it matters, queries don't need sorted doc_values right? But we always implement the interface that we need for fielddata to simplify things? I know that I had implemented the lucene one for query but it made things more complex for almost no reason, I think.
In the patch as it stands now the queries don't use this and look at the array of values directly. And they don't sort them unless they build this thing. When I was writing the queries it felt simpler not to target the doc values implementation - partially because most queries don't care about the values being sorted. And, if they did care about the values being sorted, then they'd want a sorted array. Which is what we use to build the doc values implementation anyway.
be83b55
to
e17fef8
Compare
@@ -297,8 +297,7 @@ MappedFieldType failIfFieldMappingNotFound(String name, MappedFieldType fieldMap | |||
|
|||
public SearchLookup lookup() { | |||
if (lookup == null) { | |||
lookup = new SearchLookup(getMapperService(), | |||
mappedFieldType -> indexFieldDataService.apply(mappedFieldType, fullyQualifiedIndex.getName())); | |||
lookup = new SearchLookup(getMapperService(), this::getForField); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a TODO here, to remind us that we need to figure out if this should be merged upstream or not? once the searchlookupaware hack is removed and fielddatabuilder is adapted, we will not need this change anyways?
@@ -45,8 +45,7 @@ | |||
|
|||
private int docId = -1; | |||
|
|||
LeafDocLookup(MapperService mapperService, Function<MappedFieldType, IndexFieldData<?>> fieldDataLookup, | |||
LeafReaderContext reader) { | |||
LeafDocLookup(MapperService mapperService, Function<MappedFieldType, IndexFieldData<?>> fieldDataLookup, LeafReaderContext reader) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you revert please? Trying to trim down changes to stuff that's outside of the runtime fields plugin
docLookup = new LeafDocLookup(mapperService, | ||
ignored -> fieldData, | ||
null); | ||
docLookup = new LeafDocLookup(mapperService, ignored -> fieldData, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, let's revert?
* <p> | ||
* Inspired by the ForceNoBulkScoringQuery in Lucene's monitor project. | ||
*/ | ||
class ForceNoBulkScoringQuery extends Query { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove this right?
@@ -27,7 +35,7 @@ | |||
private final StringScriptFieldScript.Factory scriptFactory; | |||
|
|||
RuntimeKeywordMappedFieldType(String name, Script script, StringScriptFieldScript.Factory scriptFactory, Map<String, String> meta) { | |||
super(name, false, false, TextSearchInfo.NONE, meta); | |||
super(name, false, false, TextSearchInfo.SIMPLE_MATCH_ONLY, meta); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why SIMPLE_MATCH_ONLY instead of NONE? I think we end up creating a lucene field type here which should not be needed?
public final class DoubleRuntimeFieldHelper { | ||
@FunctionalInterface | ||
public interface NewLeafLoader { | ||
IntConsumer leafLoader(LeafReaderContext ctx, DoubleConsumer sync) throws IOException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some of my confusion is also around where the consumer is placed I think, which forces these indirections. Ideally, the values consumer would be provided anew for each document, either as an execute argument or while we setDocument? Is there some technical reason why it needs to provided with the leaf reader context? Ideally, we would be able to abstract this away and even have in the script base class the collection of values, so that every script allows to simply retrieve the result by calling getResult(docId) ? Do you think that would be possible?
return new Values().new TermQuery(fieldName, value); | ||
} | ||
|
||
public Query termsQuery(String fieldName, double... value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I struggle to see value in these helper classes, but maybe this is due to the confusion I mentioned in other comments. I would have expected these methods implemented in the mapped field type, creating the queries there directly, and have top-level classes like ScriptTermQuery etc. maybe that is because I am hoping that the collection of values can be abstracted and exposed as part of the script object itself or some wrapper around it.
If wonder if this Values and abstractions are some reminiscence of the previous implementation that was trying to cache values.
|
||
@Override | ||
protected void sort() { | ||
Arrays.sort(values, 0, count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorting is always done in the same way regardless of the data type right? I mean calling Arrays.sort
} | ||
|
||
private class Values extends AbstractRuntimeValues { | ||
private String[] values = new String[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we just have the collection of values shared between the different types and use a typed ArrayList? Would that be a tragedy in terms of performance?
return DocValues::new; | ||
} | ||
|
||
private class DocValues extends SortedBinaryDocValues { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
like I said in other comments, I think that you have all the bits that are needed, I only would turn around a bit the dependencies/flow. I would keep the existing doc_values impl, which is only used in aggs/sorting (and it is totally fine that it's not used in queries), and I would make it lighter so that all it does is really returning the result of the script for the current doc, by calling that script.getResult(docId) method I mentioned above. I think that would clarify who does what. At the moment it is hard to follow who needs doc_values who needs them sorted etc.
Pinging @elastic/es-search (:Search/Search) |
shall we close this @nik9000 ? I think that it was a good experiment and we can now proceed getting in what we need step by step. |
This implements a few queries for runtime fields. It doesn't wire them into the
runtime fields infrastructure yet because, well, we haven't committed that. But
the queries are there and they work. As do the doc values implementations.
You may wonder, Nik, why are you reimplementing all of these queries? Lucene
has tons of queries, some of them even against doc values. You make doc values,
why not just query the doc values you make with the Lucene queries? That'd be
a good question! It turns out that Lucene's doc_values queries are quite tied
up with Lucene's doc values implementations. They compare values using the
binary format that lucene stores the doc values in. Which is a great idea! But
it means that they only really work on Lucene's doc values implementations.
It also tries to only call each script once, even asserting that in most tests.
This isn't strictly required but I think it is useful and something that
would be difficult to design "after the fact". Still, this behavior shouldn't
block implementing the feature. We'll keep it if we can. But if we can't, well,
we'll drop and it and come back to it later.
relates to #59332