Refactor parsing of queries/filters, aggs, suggester APIs #10217

clintongormley · 2015-03-23T04:43:40Z

Copied from #9901:

Today we have a massive infrastructure to parse all our requests. We have client-side builders and server-side parsers but no real representation of the query, filter, aggregation etc. until it's executed. What is produced from an XContent binary is a Lucene query directly, which causes huge parse methods in separate classes etc. that are hard to test and don't allow decoupled modifications or actions on the query itself between parsing and executing.

This refactoring splits the parsing from the creation of the lucene query, which has a couple of advantages:

XContent parsing and creation are in one file and can be tested more easily
the class allows a typed in-memory representation of the query that can be modified before a lucene query is built
the query can be normalized and serialized via Streamable to be used as a normalized cache key (not depending on the order of the keys in the XContent)
the query can be parsed on the coordinating node to allow document prefetching etc. forwarding to the executing nodes would work via Streamable binary representation --> Should we parse search requests on the coordinating node? #8150
for the query cache a query tree can be "walked" to rewrite range queries into match all queries with MIN/MAX terms to get cache hits for sliding windows --> Kibana 4 unable to utilize query cache #9526
code wise two classes are merged into one which is nice
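
The cache-key point in the list above can be illustrated with a minimal Java sketch. `TermQuerySketch` and its `fromMap` method are hypothetical stand-ins rather than Elasticsearch classes, and an ordered `Map` plays the role of XContent. The idea is only that two inputs with different key order parse to equal objects, so the parsed form could serve as an order-independent key.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical typed representation of a term query (illustration only,
// not an Elasticsearch class).
final class TermQuerySketch {
    final String field;
    final String value;
    final float boost;

    TermQuerySketch(String field, String value, float boost) {
        this.field = Objects.requireNonNull(field, "field");
        this.value = Objects.requireNonNull(value, "value");
        this.boost = boost;
    }

    // Parsing step: reads keys in whatever order the source provides them.
    static TermQuerySketch fromMap(Map<String, Object> source) {
        String field = null;
        String value = null;
        float boost = 1.0f;
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            switch (entry.getKey()) {
                case "field":
                    field = (String) entry.getValue();
                    break;
                case "value":
                    value = (String) entry.getValue();
                    break;
                case "boost":
                    boost = ((Number) entry.getValue()).floatValue();
                    break;
                default:
                    throw new IllegalArgumentException("unknown key [" + entry.getKey() + "]");
            }
        }
        return new TermQuerySketch(field, value, boost);
    }

    // Equality depends only on the parsed values, not on the source layout.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof TermQuerySketch)) return false;
        TermQuerySketch that = (TermQuerySketch) other;
        return field.equals(that.field)
                && value.equals(that.value)
                && Float.compare(boost, that.boost) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(field, value, boost);
    }
}
```

Because `equals` and `hashCode` look only at the typed fields, the object can be used directly as a key in a `HashMap`-style cache in this sketch.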

Queries (Completed)

Total of 54 Queries
54 done

Former filters were mostly merged or converted to queries and are included in this list.

Aggregations (Completed - #14136)

Total of 44 Aggregations
44 done

Suggesters

Term suggester
Phrase Suggester
Completion Suggester
Context Suggester

Total of 4 Suggesters
4 done, 0 in open PRs

Highlighters

plain
fvh
postings

Total of 3 Highlighters
3 done

Others

APIs to be adapted/revised besides _search

~~search exists api~~ (removed, see Remove search exists api #13911)
explain api: (Explain api: move query parsing to the coordinating node #14270)
validate query api: (Validate query api: move query parsing to the coordinating node #14384)
suggest api (Remove suggest transport action #17198)
percolator (Planned to be removed, see Replace percolate APIs with a percolator query #16349)
~~index warmers~~ (removed, see Remove query warmers and the warmer API. #15614)
alias filters

The above apis don't necessarily have to change to parse queries in our intermediate format, for instance the percolator will still need to parse to lucene query straight-away, but we should still have a look at each of those and double check if anything needs to be adjusted after all the infra changes we have made.

mattweber · 2015-03-23T18:38:06Z

+1, In #3278 I perform a terms lookup after parsing but this happened on all shards and resulted in multiple lookup requests for a single query. This would allow the expensive lookup to be performed once on the coordinating node which would be very beneficial!

javanna · 2015-03-31T17:13:59Z

Here is our rough plan:

First step (everything still happens on the nodes that hold the relevant shards):

split existing QueryParser#parse() method into 1) fromXContent() that parses the query but allows to have an intermediate format for it and 2) toQuery() that allows to create the lucene query out of the intermediate query. Keep the existing Query parse() method around temporarily, which will call both fromXContent and toQuery.
Move toQuery to QueryBuilders so that every query (intermediate format) can be transformed into the corresponding lucene query on each data node.
make all the QueryBuilders implement Streamable so that the intermediate query format can be sent over the wire between the nodes (will be unused at first)
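
As a rough illustration of this first step, the sketch below splits one parse method into a parsing half and a query-building half, keeps a temporary bridge that chains the two, and adds a binary round trip. All names (`SpanTermSketch`, `fromSource`, `toBytes`, `fromBytes`) are hypothetical; a `field:term` string stands in for XContent, a `String` stands in for the Lucene query, and `java.io` data streams stand in for Streamable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical intermediate query format (illustration only).
final class SpanTermSketch {
    final String field;
    final String term;

    SpanTermSketch(String field, String term) {
        this.field = field;
        this.term = term;
    }

    // Step 1: parse the request text into the intermediate form only.
    static SpanTermSketch fromSource(String source) {
        int colon = source.indexOf(':');
        if (colon <= 0 || colon == source.length() - 1) {
            throw new IllegalArgumentException("expected field:term but got [" + source + "]");
        }
        return new SpanTermSketch(source.substring(0, colon), source.substring(colon + 1));
    }

    // Step 2: build the executable query; a String stands in for a Lucene Query.
    String toQuery() {
        return "SpanTerm(" + field + "=" + term + ")";
    }

    // Temporary bridge that keeps the old single-call behaviour.
    static String parse(String source) {
        return fromSource(source).toQuery();
    }

    // Wire format: write the fields in a fixed order.
    byte[] toBytes() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(field);
            out.writeUTF(term);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Wire format: read the fields back in the same order.
    static SpanTermSketch fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new SpanTermSketch(in.readUTF(), in.readUTF());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch a copy rebuilt with `fromBytes(parsed.toBytes())` produces the same `toQuery()` result as the original, which is the property a round-trip test would check.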

Second step (query parsing moved to the coordinating node):

Refactor the whole SearchRequest so that instead of holding the source bytes array in json format, it holds different elements of a search request, all in Streamable format
leverage existing request validation mechanism so that SearchRequest#validate validates the query too (during TransportSearchAction#execute): this is very convenient as it gets called on the coordinating node, no matter where the request comes from (could come as json through rest layer, or as java objects through java api, either transport client or node client).
call fromXContent on the coordinating node and make use of Streamable methods to send queries over the wire
delete parse method from all QueryBuilders, not needed anymore.
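
The validation idea in the second step might look roughly like the following. This is a simplified sketch with hypothetical types (`QuerySketch`, `MatchSketch`, `SearchRequestSketch`) and a plain list of error strings; it does not reproduce the actual SearchRequest#validate signature or its exception type. It only shows a request that holds a structured query and asks that query to validate itself before anything is sent to the shards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical structured query element that can report its own problems.
interface QuerySketch {
    void validate(List<String> errors);
}

// Hypothetical match query holding typed fields instead of raw JSON bytes.
final class MatchSketch implements QuerySketch {
    final String field;
    final String text;

    MatchSketch(String field, String text) {
        this.field = field;
        this.text = text;
    }

    @Override
    public void validate(List<String> errors) {
        if (field == null || field.isEmpty()) {
            errors.add("[match] requires a field name");
        }
        if (text == null) {
            errors.add("[match] requires query text");
        }
    }
}

// Hypothetical request that holds structured elements rather than a source byte array.
final class SearchRequestSketch {
    final QuerySketch query;
    final int size;

    SearchRequestSketch(QuerySketch query, int size) {
        this.query = query;
        this.size = size;
    }

    // Collects request-level and query-level problems in one pass.
    List<String> validate() {
        List<String> errors = new ArrayList<>();
        if (size < 0) {
            errors.add("[size] must not be negative");
        }
        if (query != null) {
            query.validate(errors);
        }
        return errors;
    }
}
```

An empty list means the request may proceed; a non-empty list would be turned into a single failure returned to the caller.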

Things we are not too happy about and might need improvement, will be tackled later on:

rename QueryBuilders by removing the Builders suffix, as they will not be really just builders anymore
java api side of things is heavier on users, since a single class exposes a lot of internal aspects, we might be able to make methods package private

dakrone · 2015-04-02T20:49:12Z

@clintongormley et al,

Because we are touching every single query in this change, it also gives us the ability to remove support for camel-casing in queries where it exists. How do we feel about removing the camel casing in these PRs as well?

clintongormley · 2015-04-05T17:52:55Z

@dakrone ++ - should we be doing this by using parseField and then later using #8963 to warn about the deprecations?

cbuescher · 2015-04-17T19:06:38Z

We decided to reset the feature branch to the current tip of master and branch from there starting with #10580.

This attempts to do to SpanTermQueryBuilder what has already been changed for TermQueryBuilder. The commit tries to avoid code duplication where possible by pulling what is the same for both QueryBuilders and tests into separate classes. Relates to elastic#10217

One final refactoring of the SpanTermQuery - makes sure the class hierarchy works again. Relates to elastic#10217

…ring Refactors SpanTermQueryBuilder. Due to similarities with TermQueryBuilder a lot of code was moved into separate abstract classes that can be used by both - TermQueryBuilder and SpanTermQueryBuilder. Relates to #10217

…est. Split the parse(QueryParseContext ctx) method into a parsing and a query building part, adding Streamable support for serialization and hashCode(), equals() for better testing. This PR also adds test setup for two mapped fields (integer, date) to the BaseQueryTestCase and introduces helper methods for optional conversion of String fields to BytesRef representation that is shared with the already refactored BaseTermQueryBuilder. Relates to #10217 Closes #11108

This commit makes SimpleQueryStringBuilder streamable, add hashCode and equals. Switched to using toLanguageTag/forLanguageTag when parsing Locales. Using LocaleUtils from either Elasticsearch or Apache commons resulted in Locales not passing the roundtrip test. For more info see https://issues.apache.org/jira/browse/LUCENE-4021 Relates to elastic#10217
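
The toLanguageTag/forLanguageTag pairing mentioned in this commit message uses two standard `java.util.Locale` methods. A minimal sketch of such a round trip is shown below; the wrapper class `LocaleWireSketch` and its method names are hypothetical and do not reflect how SimpleQueryStringBuilder is actually written.

```java
import java.util.Locale;

// Hypothetical helper showing a Locale round trip through its BCP 47 language tag.
final class LocaleWireSketch {
    private LocaleWireSketch() {}

    // Serialize: a well-formed Locale becomes a tag such as "de-DE".
    static String write(Locale locale) {
        return locale.toLanguageTag();
    }

    // Deserialize: rebuild the Locale from the tag.
    static Locale read(String tag) {
        return Locale.forLanguageTag(tag);
    }
}
```

For example, `LocaleWireSketch.read(LocaleWireSketch.write(Locale.GERMANY))` equals `Locale.GERMANY`. Locales that are not well-formed language tags may not survive the conversion unchanged, so a round-trip test over the locales actually in use remains worthwhile.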

…n parameter Before the refactoring we didn't check any invalid settings for strategy and relation in the GeoShapeQueryBuilder. However, using SpatialStrategy.TERM and ShapeRelation.INTERSECTS together is invalid and we tried to protect against that in the validate() method. This PR moves these checks to setter for strategy and relation and adds tests for the new behaviour. Relates to elastic#10217

Similarly to what we did with the search api, we can now also move query parsing on the coordinating node for the validate query api. Given that the explain api is a single shard operation (compared to search which is instead a broadcast operation), this doesn't change a lot in how the api works internally. The main benefit is that we can simplify the java api by requiring a structured query object to be provided rather than a bytes array that will get parsed on the data node. Previously if you specified a QueryBuilder it would be serialized in json format and would get reparsed on the data node, while now it doesn't go through parsing anymore (as expected), given that after the query-refactoring we are able to properly stream queries natively. Note that the WrapperQueryBuilder can be used from the java api to provide a query as a string, in that case the actual parsing of the inner query will happen on the data node. Relates to elastic#10217 Closes elastic#14384

For the ongoing search refactoring (elastic#10217) the PhraseSuggestionBuilder gets a way of parsing from xContent that will eventually replace the current SuggestParseElement. This PR adds the fromXContent method to the PhraseSuggestionBuilder and also adds parsing code for the common suggestion parameters to SuggestionBuilder. Also adding links from the Suggester implementations registered in the Suggesters registry to the corresponding prototype that is going to be used for parsing once the refactoring is done and we switch from parsing on shard to parsing on coordinating node.

Refactors all suggestion builders to be able to be parsed on the coordinating node and serialized as objects to the shards. Specifically, all SuggestionBuilder implementations implement NamedWritable for serialization, a fromXContent() method that handles parsing xContent and a build() method that is called on the shard to create the SuggestionContext. Relates to #10217

For the current refactoring of SortBuilders related to elastic#10217, each SortBuilder should get a build() method that produces a SortField according to the SortBuilder parameters on the shard. This change also slightly refactors the current parse method in SortParseElement to extract an internal parse method that returns a list of sort fields and only needs a QueryShardContext as input instead of a full SearchContext. This allows using this internal parse method for testing.

For the refactoring of SortBuilders related to #10217, each SortBuilder needs to get a build() method that produces a SortField according to the SortBuilder parameters on the shard.

javanna · 2016-08-11T11:46:56Z

I had a quick look at what is left to do here and marked inner_hits and sort done. There are some specific open issues marked :Search Refactoring, so I am wondering if we should close this issue. @cbuescher @colings86 Thoughts? What were your plans around alias filters?

colings86 · 2016-08-11T13:49:05Z

I'm for closing this issue. The few issues which are still open and tagged as :Search Refactoring can stand alone as issues and the success of this issue doesn't rely directly on them (though it would be nice to fix those issues). As for the alias filters, maybe we can open a separate issue for them too?

cbuescher · 2016-08-11T13:54:02Z

+1 on closing and opening a separate issue for the alias filters. Are they part of the SearchSourceBuilder? Just curious what place they have in terms of parsing incoming requests.

colings86 · 2016-08-11T13:58:29Z

@cbuescher I think the reason alias filters were on the list was because they are represented by a QueryBuilder object and could be stored as a serialised QueryBuilder rather than JSON. I agree though that it's not directly connected to this issue

javanna · 2016-11-04T17:05:32Z

What was most important to do for alias filters was done with #20916. Each search request against filtered indices was previously sending the filter as a string all the way to the shards, where the actual filter was parsed and converted to lucene query. Now the parsing happens once on the coordinating node, only toFilter gets called on each shard to get the corresponding lucene query. We still store alias filters in compressed XContent format as part of the cluster state, but I don't think that is going to change anytime soon. That said we can close this issue, nothing left to be done. Yay!
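
The "parse once, build per shard" flow described in this comment can be sketched as follows. `AliasFilterSketch`, its `search` method and the `PARSE_CALLS` counter are hypothetical and exist only to make the number of parse calls observable; a `field:value` string stands in for the stored filter source and a `String` stands in for the lucene query returned by toFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical alias filter that is parsed once and converted on every shard.
final class AliasFilterSketch {
    // Counts parse calls so the "parse once" claim can be checked in a test.
    static final AtomicInteger PARSE_CALLS = new AtomicInteger();

    final String field;
    final String value;

    AliasFilterSketch(String field, String value) {
        this.field = field;
        this.value = value;
    }

    // Coordinating-node work: turn the stored source into a structured filter.
    static AliasFilterSketch parse(String source) {
        PARSE_CALLS.incrementAndGet();
        int colon = source.indexOf(':');
        if (colon <= 0 || colon == source.length() - 1) {
            throw new IllegalArgumentException("expected field:value but got [" + source + "]");
        }
        return new AliasFilterSketch(source.substring(0, colon), source.substring(colon + 1));
    }

    // Shard-level work: a String stands in for the per-shard lucene query.
    String toFilter(int shardId) {
        return "shard" + shardId + ":Filter(" + field + "=" + value + ")";
    }

    // Simulated search: one parse, then one toFilter call per shard.
    static List<String> search(String aliasFilterSource, int shardCount) {
        AliasFilterSketch parsed = parse(aliasFilterSource);
        List<String> perShard = new ArrayList<>();
        for (int shard = 0; shard < shardCount; shard++) {
            perShard.add(parsed.toFilter(shard));
        }
        return perShard;
    }
}
```

In a real cluster the parsed filter would travel to the shards in its binary form rather than by reference; the loop here only models that each shard does the conversion and none of them repeats the parsing.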

clintongormley added >enhancement >breaking v2.0.0-beta1 Meta labels Mar 23, 2015

clintongormley mentioned this issue Mar 23, 2015

Roadmap for 2.0 #9970

Closed

14 tasks

colings86 mentioned this issue Mar 24, 2015

Aggregations: Add moving average aggregation #10024

Closed

cbuescher mentioned this issue Mar 30, 2015

Query Refactoring: Merging Parser and Builder classes #10324

Merged

javanna mentioned this issue Mar 31, 2015

Should we parse search requests on the coordinating node? #8150

Closed

colings86 mentioned this issue Apr 7, 2015

Maximum Bucket reducer #10250

Merged

dakrone mentioned this issue Apr 10, 2015

There should be common interfaces for the same name methods in java client library #5361

Closed

cbuescher mentioned this issue Apr 13, 2015

Query refactoring: Introduce toQuery() and fromXContent() methods in QueryBuilders and QueryParsers #10580

Closed

alexksikes mentioned this issue May 5, 2015

Deprecate + Remove More Like This API #10736

Closed

MaineC mentioned this issue May 6, 2015

Refactors SpanTermQueryBuilder. #11005

Merged

cbuescher mentioned this issue May 12, 2015

Query refactoring: BoolQueryBuilder and Parser #11121

Merged

MaineC pushed a commit to MaineC/elasticsearch that referenced this issue May 18, 2015

Adjusting SpanTermQuery to work w/ latest changes.

7a7c7f4

One final refactoring of the SpanTermQuery - makes sure the class hierarchy works again. Relates to elastic#10217

javanna mentioned this issue Sep 25, 2015

Query refactoring: split parse phase into fromXContent and toQuery for all queries #13788

Merged

javanna added :Search Refactoring and removed :Core/Infra/Transport API Transport client API labels Oct 19, 2015

javanna mentioned this issue Oct 19, 2015

Java api: revise SearchRequest and SearchRequestBuilder object structure #14190

Closed

javanna mentioned this issue Oct 31, 2015

Deduplicate search sections parsing code #14413

Closed

clintongormley mentioned this issue Nov 21, 2015

Add phase to prefetch doc for query #6719

Closed

cbuescher mentioned this issue Nov 26, 2015

Make HighlightBuilder implement Writable #15044

Closed

MaineC mentioned this issue Dec 2, 2015

Make sort implement writable #15178

Closed

colings86 mentioned this issue Dec 7, 2015

Replace MovingAvgModel.Streams with NamedWriteable #15279

Closed

clintongormley mentioned this issue Jan 20, 2016

Caching of filters #16108

Closed

colings86 mentioned this issue Jan 28, 2016

Log slow queries as json, not binary. #12992

Closed

cbuescher mentioned this issue Feb 1, 2016

Refactor PhraseSuggestionBuilder.DirectCandidateGenerator #16185

Merged

cbuescher mentioned this issue Feb 29, 2016

Remove QueryInnerHits from HasChildQuery, HasParentQuery and NestedQueryBuilder #16856

Closed

cbuescher mentioned this issue Mar 14, 2016

Refactoring of Suggestions #17096

Merged

cbuescher mentioned this issue Mar 16, 2016

Add build() method to SortBuilder implementations #17146

Merged

areek mentioned this issue Mar 18, 2016

Remove suggest transport action #17198

Merged

javanna closed this as completed Nov 4, 2016

clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Search Refactoring labels Feb 14, 2018
