Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds a rewrite phase to queries on the shard level #16870

Merged
merged 1 commit into from
Mar 17, 2016
Merged

Adds a rewrite phase to queries on the shard level #16870

merged 1 commit into from
Mar 17, 2016

Conversation

colings86
Copy link
Contributor

This change adds a rewrite phase to the queries on the shard before they are assessed for caching or executed. This allows the opportunity to rewrite queries as faster running simpler queries based on attributes known to only the shard itself. The first query to implement this is the RangeQueryBuilder which will rewrite to a MatchAllQueryBuilder if the range of terms on the shard is a subset of the query and rewrites to a MatchNoneQueryBuilder if the range of terms on the shard is completely outside the query.

Closes #9526

if (optionalFormat != null) {
throw new UnsupportedOperationException("custom format isn't supported");
}
if (value instanceof java.lang.Long) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just Long?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just do this:

if (value instanceof Number) {
   return Long.toString(((Number)value).longValue());
} 

@s1monw
Copy link
Contributor

s1monw commented Feb 29, 2016

@colings86 I left some comments, looks great so far

@polyfractal
Copy link
Contributor

Just a note, and absolutely not a blocker, but I think this rewrite will "escape" the profiler. Rewrite timings are only recorded when they go through the ContextIndexSearcher, so this sidesteps that ability. Also, we don't know if we should even profile until a little later when the source is parsed.

Perhaps slap a "TODO: escapes profiling" in there somewhere so that I can revisit this in the future? I imagine the profiler will need a bit of massaging to accommodate this new style of rewrites (which will only become more common in the future).

@s1monw
Copy link
Contributor

s1monw commented Feb 29, 2016

Just a note, and absolutely not a blocker, but I think this rewrite will "escape" the profiler. Rewrite timings are only recorded when they go through the ContextIndexSearcher, so this sidesteps that ability. Also, we don't know if we should even profile until a little later when the source is parsed.

this is a different rewrite than lucene rewrite I think we are fine here?

@polyfractal
Copy link
Contributor

this is a different rewrite than lucene rewrite I think we are fine here?

Yeah, which is why it doesn't get timed :) I assumed we would want any rewrite (ours or Lucene) to be included in the final rewrite timing, since it's still work that must be done and adds to latency?

@s1monw
Copy link
Contributor

s1monw commented Feb 29, 2016

Yeah, which is why it doesn't get timed :) I assumed we would want any rewrite (ours or Lucene) to be included in the final rewrite timing, since it's still work that must be done and adds to latency?

compared to lucene I think we can ignore this rewriting here the one that is costly is MTQ rewrite this one is rather cheap.

@jpountz
Copy link
Contributor

jpountz commented Feb 29, 2016

+1 to ignore

@polyfractal
Copy link
Contributor

Works for me! :)

@colings86
Copy link
Contributor Author

Currently fails with seed -Dtests.seed=B4AEDE8445BFF1A5

@@ -734,6 +734,53 @@ public BytesReference ext() {
return ext;
}

public SearchSourceBuilder rewrite(QueryShardContext context) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we get some basic javadocs here too?

@s1monw
Copy link
Contributor

s1monw commented Mar 2, 2016

this looks pretty awesome I left one minor comment... I love the simplification to the RangeQueryBuilderTests but I think we now need a dedicated unittest for FieldStatsProvider can you do that? Other than that LGTM

if (fieldStatsProvider.isDenseField(fieldName)) {
return new MatchAllQueryBuilder();
} else {
return new ExistsQueryBuilder(fieldName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy we made those exist queries fast. :)

@s1monw
Copy link
Contributor

s1monw commented Mar 2, 2016

@colings86 I think this issue would close #9526

throw new UnsupportedOperationException("custom format isn't supported");
}
if (value instanceof Number) {
return java.lang.Float.toString(((Number) value).longValue());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm this should be ((Number) value).floatValue() no? I think we need better dedicated tests for all of those?

throw new UnsupportedOperationException("custom format isn't supported");
}
if (value instanceof Number) {
return java.lang.Double.toString(((Number) value).longValue());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here doubleValue()

@s1monw
Copy link
Contributor

s1monw commented Mar 16, 2016

@colings86 this looks awesome - pretty close, I left some comments

@colings86
Copy link
Contributor Author

@s1monw I pushed another commit addressing most of your comments

@s1monw
Copy link
Contributor

s1monw commented Mar 16, 2016

LGTM

@colings86
Copy link
Contributor Author

@s1monw had to push one more commit to fix SimpleSearchIT.testSimpleDateRange().

@s1monw
Copy link
Contributor

s1monw commented Mar 17, 2016

LGTM

This change adds a rewrite phase to the queries on the shard before they are assessed for caching or executed. This allows the opportunity to rewrite queries as faster running simpler queries based on attributes known to only the shard itself. The first query to implement this is the RangeQueryBuilder which will rewrite to a MatchAllQueryBuilder if the range of terms on the shard is a subset of the query and rewrites to a MatchNoneQueryBuilder if the range of terms on the shard is completely outside the query.
@colings86 colings86 merged commit 0f6b8dd into elastic:master Mar 17, 2016
@colings86 colings86 deleted the feature/queryRewrite branch March 17, 2016 09:12
@agirbal
Copy link

agirbal commented Jul 19, 2016

quick note that based on a few production search use cases I've seen, supporting regular queries with hits would also be highly valuable. A typical example is if a dataset both has a "created time" (which drives the index name) and a "last updated time". Ideally querying on a window of "last updated time" that does not match anything in a given older index would make use of this optimization.

@s1monw
Copy link
Contributor

s1monw commented Jul 19, 2016

@agirbal see #19472

@s1monw s1monw removed the review label Jul 19, 2016
@clintongormley
Copy link
Contributor

also @agirbal regardless of caching, all range queries will make use of the optimization of checking whether the range intersects with the values in the shard.

@agirbal
Copy link

agirbal commented Jul 19, 2016

@clintongormley right I don't see what caching has to do with this. I think the query cache will be off like default. Great to hear there will be a "fail-fast" for shards that don't have the range.

@asafm
Copy link

asafm commented Jul 27, 2016

Is it possible to backport this to 2.x?

@colings86
Copy link
Contributor Author

@asafm Unfortunately not. This PR is relatively small but there was a huge amount of work that lead up to this change which involved refactoring every part of the search request (every query type, aggregation, highlighter, suggester, etc.) so that it could be parsed into an internal Elasticsearch QueryBuilder object on the coordinating node to allow it to be in a form where it can be rewritten. Unfortunately the changes are just too extensive to be able to backport.

@asafm
Copy link

asafm commented Jul 27, 2016

@colings86 Ok, thanks. When you do estimate Elastic 5.x is production worthy?

@colings86
Copy link
Contributor Author

5.x will be production worthy with the release of the 5.0.0 GA. When that will be is not easy to estimate as it depends on feedback we get from the alpha and beta versions.

@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query DSL labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>feature release highlight :Search/Search Search-related issues that do not fall into other categories v5.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants