-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds a rewrite phase to queries on the shard level #16870
Conversation
if (optionalFormat != null) { | ||
throw new UnsupportedOperationException("custom format isn't supported"); | ||
} | ||
if (value instanceof java.lang.Long) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just Long
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can just do this:
if (value instanceof Number) {
return Long.toString(((Number)value).longValue());
}
@colings86 I left some comments, looks great so far |
Just a note, and absolutely not a blocker, but I think this rewrite will "escape" the profiler. Rewrite timings are only recorded when they go through the ContextIndexSearcher, so this sidesteps that ability. Also, we don't know if we should even profile until a little later when the source is parsed. Perhaps slap a "TODO: escapes profiling" in there somewhere so that I can revisit this in the future? I imagine the profiler will need a bit of massaging to accommodate this new style of rewrites (which will only become more common in the future). |
this is a different rewrite than lucene rewrite I think we are fine here? |
Yeah, which is why it doesn't get timed :) I assumed we would want any rewrite (ours or Lucene) to be included in the final rewrite timing, since it's still work that must be done and adds to latency? |
compared to lucene I think we can ignore this rewriting here the one that is costly is MTQ rewrite this one is rather cheap. |
+1 to ignore |
Works for me! :) |
Currently fails with seed -Dtests.seed=B4AEDE8445BFF1A5 |
@@ -734,6 +734,53 @@ public BytesReference ext() { | |||
return ext; | |||
} | |||
|
|||
public SearchSourceBuilder rewrite(QueryShardContext context) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we get some basic javadocs here too?
this looks pretty awesome I left one minor comment... I love the simplification to the |
if (fieldStatsProvider.isDenseField(fieldName)) { | ||
return new MatchAllQueryBuilder(); | ||
} else { | ||
return new ExistsQueryBuilder(fieldName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy we made those exist queries fast. :)
@colings86 I think this issue would close #9526 |
throw new UnsupportedOperationException("custom format isn't supported"); | ||
} | ||
if (value instanceof Number) { | ||
return java.lang.Float.toString(((Number) value).longValue()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm this should be ((Number) value).floatValue()
no? I think we need better dedicated tests for all of those?
throw new UnsupportedOperationException("custom format isn't supported"); | ||
} | ||
if (value instanceof Number) { | ||
return java.lang.Double.toString(((Number) value).longValue()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here doubleValue()
@colings86 this looks awesome - pretty close, I left some comments |
@s1monw I pushed another commit addressing most of your comments |
LGTM |
@s1monw had to push one more commit to fix SimpleSearchIT.testSimpleDateRange(). |
LGTM |
This change adds a rewrite phase to the queries on the shard before they are assessed for caching or executed. This allows the opportunity to rewrite queries as faster running simpler queries based on attributes known to only the shard itself. The first query to implement this is the RangeQueryBuilder which will rewrite to a MatchAllQueryBuilder if the range of terms on the shard is a subset of the query and rewrites to a MatchNoneQueryBuilder if the range of terms on the shard is completely outside the query.
quick note that based on a few production search use cases I've seen, supporting regular queries with hits would also be highly valuable. A typical example is if a dataset both has a "created time" (which drives the index name) and a "last updated time". Ideally querying on a window of "last updated time" that does not match anything in a given older index would make use of this optimization. |
also @agirbal regardless of caching, all range queries will make use of the optimization of checking whether the range intersects with the values in the shard. |
@clintongormley right I don't see what caching has to do with this. I think the query cache will be off like default. Great to hear there will be a "fail-fast" for shards that don't have the range. |
Is it possible to backport this to 2.x? |
@asafm Unfortunately not. This PR is relatively small but there was a huge amount of work that lead up to this change which involved refactoring every part of the search request (every query type, aggregation, highlighter, suggester, etc.) so that it could be parsed into an internal Elasticsearch QueryBuilder object on the coordinating node to allow it to be in a form where it can be rewritten. Unfortunately the changes are just too extensive to be able to backport. |
@colings86 Ok, thanks. When you do estimate Elastic 5.x is production worthy? |
5.x will be production worthy with the release of the 5.0.0 GA. When that will be is not easy to estimate as it depends on feedback we get from the alpha and beta versions. |
This change adds a rewrite phase to the queries on the shard before they are assessed for caching or executed. This allows the opportunity to rewrite queries as faster running simpler queries based on attributes known to only the shard itself. The first query to implement this is the RangeQueryBuilder which will rewrite to a MatchAllQueryBuilder if the range of terms on the shard is a subset of the query and rewrites to a MatchNoneQueryBuilder if the range of terms on the shard is completely outside the query.
Closes #9526