Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor parsing of queries/filters, aggs, suggester APIs #10217

Closed
clintongormley opened this issue Mar 23, 2015 · 11 comments
Closed

Refactor parsing of queries/filters, aggs, suggester APIs #10217

clintongormley opened this issue Mar 23, 2015 · 11 comments
Labels
>breaking >enhancement Meta :Search/Search Search-related issues that do not fall into other categories

Comments

@clintongormley
Copy link
Contributor

clintongormley commented Mar 23, 2015

Copied from #9901:

Today we have a massive infrastructure to parse all our requests. We have client side builders and server side parsers but no real representation of the query, filter, aggregation etc until it's executed. What is produced from a XContent binary is a Lucene query directly which causes huge parse methods in separate classes etc. that hare hard to test and don't allow decoupled modifications or actions on the query itself between parsing and executing.

This refactoring splits the parsing and the creation of the lucene query, this has a couple of advantages

  • XContent parsing creation are in one file and can be tested more easily
  • the class allows a typed in-memory representation of the query that can be modified before a lucene query is build
  • the query can be normalized and serialized via Streamable to be used as a normalized cache key (not depending on the order of the keys in the XContent)
  • the query can be parsed on the coordinating node to allow document prefetching etc. forwarding to the executing nodes would work via Streamable binary representation --> Should we parse search requests on the coordinating node? #8150
  • for the query cache a query tree can be "walked" to rewrite range queries into match all queries with MIN/MAX terms to get cache hits for sliding windows --> Kibana 4 unable to utilize query cache #9526
  • code wise two classes are merged into one which is nice

Queries (Completed)

Total of 54 Queries
54 done

Former filters were mostly merged or converted to queries and are included in this list.

Aggregations (Completed - #14136)

Total of 44 Aggregations
44 done

Suggesters

  • Term suggester
  • Phrase Suggester
  • Completion Suggester
  • Context Suggester

Total of 4 Suggesters
4 done, 0 in open PRs

Highlighters

  • plain
  • fvh
  • postings

Total of 3 Highlighters
3 done

Others

APIs to be adapted/revised besides _search

The above apis don't necessarily have to change to parse queries in our intermediate format, for instance the percolator will still need to parse to lucene query straight-away, but we should still have a look at each of those and double check if anything needs to be adjusted after all the infra changes we have made.

@mattweber
Copy link
Contributor

+1, In #3278 I perform a terms lookup after parsing but this happened on all shards and resulted in multiple lookup requests for a single query. This would allow the expensive lookup to be performed once on the coordinating node which would be very beneficial!

@javanna
Copy link
Member

javanna commented Mar 31, 2015

Here is our rough plan:

First step (everything still happens on the nodes that hold the relevant shards):

  • split existing QueryParser#parse() method into 1) fromXContent() that parses the query but allows to have an intermediate format for it and 2) toQuery() that allows to create the lucene query ouf the intermediate query. Keep the exisiting Query parse() method around temporarily, which will call both fromXContent and toQuery.
  • Move toQuery to QueryBuilders so that every query (intermediate format) can be transformed into the corresponding lucene query on each data node.
  • make all the QueryBuilders implement Streamable so that the intermediate query format can be sent over the wire between the nodes (will be unused at first)

Second step (query parsing moved to the coordinating node):

  • Refactor the whole SearchRequest so that instead of holding the source bytes array in json format, it holds different elements of a search request, all in Streamable format
  • leverage existing request validation mechanism so that SearchRequest#validate validates the query too (during TransportSearchAction#execute): this is very convenient as it gets called on the coordinating node, no matter where the request comes from (could come as json through rest layer, or as java objects through java api, either transport client or node client).
  • call fromXContent on the coordinating node and make use of Streamable methods to send queries over the wire
  • delete parse method from all QueryBuilders, not needed anymore.

Things we are not too happy about and might need improvement, will be tackled later on:

  • rename QueryBuilders by removing the Builders suffix, as they will not be really just builders anymore
  • java api side of things is heavier on users, since a single class exposes a lot of internal aspects, we might be able to make methods package private

@dakrone
Copy link
Member

dakrone commented Apr 2, 2015

@clintongormley et al,

Because we are touching every single query in this change, it also gives us the ability to remove support for camel-casing in queries where it exists. How do we feel about removing the camel casing in these PRs as well?

@clintongormley
Copy link
Contributor Author

@dakrone ++ - should we be doing this by using parseField and then later using #8963 to warn about the deprecations?

@cbuescher
Copy link
Member

We decided to reset the feature branch to the current tip of master and branch from there starting with #10580.

MaineC pushed a commit to MaineC/elasticsearch that referenced this issue May 6, 2015
This attempts to do to SpanTermQueryBuilder what has already been changed for TermQueryBuilder.

The commit tries to avoid code duplication where possible by pulling what is the same for both QueryBuilders and tests into separate classes.

Relates to elastic#10217
MaineC pushed a commit to MaineC/elasticsearch that referenced this issue May 7, 2015
This attempts to do to SpanTermQueryBuilder what has already been changed for TermQueryBuilder.

The commit tries to avoid code duplication where possible by pulling what is the same for both QueryBuilders and tests into separate classes.

Relates to elastic#10217
MaineC pushed a commit to MaineC/elasticsearch that referenced this issue May 11, 2015
This attempts to do to SpanTermQueryBuilder what has already been changed for TermQueryBuilder.

The commit tries to avoid code duplication where possible by pulling what is the same for both QueryBuilders and tests into separate classes.

Relates to elastic#10217
MaineC pushed a commit to MaineC/elasticsearch that referenced this issue May 12, 2015
This attempts to do to SpanTermQueryBuilder what has already been changed for TermQueryBuilder.

The commit tries to avoid code duplication where possible by pulling what is the same for both QueryBuilders and tests into separate classes.

Relates to elastic#10217
MaineC pushed a commit to MaineC/elasticsearch that referenced this issue May 13, 2015
This attempts to do to SpanTermQueryBuilder what has already been changed for TermQueryBuilder.

The commit tries to avoid code duplication where possible by pulling what is the same for both QueryBuilders and tests into separate classes.

Relates to elastic#10217
MaineC pushed a commit to MaineC/elasticsearch that referenced this issue May 18, 2015
This attempts to do to SpanTermQueryBuilder what has already been changed for TermQueryBuilder.

The commit tries to avoid code duplication where possible by pulling what is the same for both QueryBuilders and tests into separate classes.

Relates to elastic#10217
MaineC pushed a commit to MaineC/elasticsearch that referenced this issue May 18, 2015
This attempts to do to SpanTermQueryBuilder what has already been changed for TermQueryBuilder. The commit tries to avoid code duplication where possible by pulling what is the same for both QueryBuilders and tests into separate classes.

Relates to elastic#10217
MaineC pushed a commit to MaineC/elasticsearch that referenced this issue May 18, 2015
One final refactoring of the SpanTermQuery - makes sure the class hierarchy works again.

Relates to elastic#10217
MaineC pushed a commit that referenced this issue May 18, 2015
…ring

Refactors SpanTermQueryBuilder.

Due to similarities with TermQueryBuilder a lot of code was moved into separate abstract classes that can be used by both - TermQueryBuilder and SpanTermQueryBuilder.

Relates to #10217
cbuescher added a commit that referenced this issue May 18, 2015
…est.

Split the parse(QueryParseContext ctx) method into a parsing and a query building part,
adding Streamable support for serialization and hashCode(), equals() for better testing.

This PR also adds test setup for two mappes fields (integer, date) to the BaseQueryTestCase
and introduces helper methods for optional conversion of String fields to BytesRef representation
that is shared with the already refactored BaseTermQueryBuilder.

Relates to #10217
Closes #11108
MaineC pushed a commit to MaineC/elasticsearch that referenced this issue May 21, 2015
This commit makes SimpleQueryStringBuilder streamable, add hashCode and equals.

Switched to using toLanguageTag/forLanguageTag when parsing Locales. Using LocaleUtils from either Elasticsearch or Apache commons resulted in Locales not passing the roundtrip test. For more info see https://issues.apache.org/jira/browse/LUCENE-4021

Relates to elastic#10217
cbuescher added a commit to cbuescher/elasticsearch that referenced this issue Sep 25, 2015
…n parameter

Before the refactoring we didn't check any invalid settings for strategy and relation
in the GeoShapeQueryBuilder. However, using SpatialStrategy.TERM and ShapeRelation.INTERSECTS
together is invalid and we tried to protect against that in the validate() method.

This PR moves these checks to setter for strategy and relation and adds tests for the new
behaviour.

Relates to elastic#10217
@javanna javanna added :Search Refactoring and removed :Core/Infra/Transport API Transport client API labels Oct 19, 2015
javanna added a commit to javanna/elasticsearch that referenced this issue Nov 2, 2015
 Similarly to what we did with the search api, we can now also move query parsing on the coordinating node for the validate query api. Given that the explain api is a single shard operation (compared to search which is instead a broadcast operation), this doesn't change a lot in how the api works internally. The main benefit is that we can simplify the java api by requiring a structured query object to be provided rather than a bytes array that will get parsed on the data node. Previously if you specified a QueryBuilder it would be serialized in json format and would get reparsed on the data node, while now it doesn't go through parsing anymore (as expected), given that after the query-refactoring we are able to properly stream queries natively. Note that the WrapperQueryBuilder can be used from the java api to provide a query as a string, in that case the actual parsing of the inner query will happen on the data node.

Relates to elastic#10217
Closes elastic#14384
cbuescher added a commit to cbuescher/elasticsearch that referenced this issue Feb 8, 2016
For the ongoing search refactoring (elastic#10217) the PhraseSuggestionBuilder
gets a way of parsing from xContent that will eventually replace the
current SuggestParseElement. This PR adds the fromXContent method
to the PhraseSuggestionBuilder and also adds parsing code for the
common suggestion parameters to SuggestionBuilder.

Also adding links from the Suggester implementations registeres in the
Suggesters registry to the corresponding prototype that is going to
be used for parsing once the refactoring is done and we switch from
parsing on shard to parsing on coordinating node.
cbuescher added a commit that referenced this issue Mar 16, 2016
Refactors all suggestion builders to be able to be parsed on
the coordinating node and serialized as objects to the shards.
Specifically, all SuggestionBuilder implementations implement
NamedWritable for serialization, a fromXContent() method that
handles parsing xContent and a build() method that is called
on the shard to create the SuggestionContext.

Relates to #10217
cbuescher added a commit to cbuescher/elasticsearch that referenced this issue Mar 16, 2016
For the current refactoring of SortBuilders related to elastic#10217,
each SortBuilder should get a build() method that produces a
SortField according to the SortBuilder parameters on the shard.

This change also slightly refactors the current parse method in
SortParseElement to extract an internal parse method that returns
a list of sort fields only needs a QueryShardContext as input
instead of a full SearchContext. This allows using this internal
parse method for testing.
cbuescher added a commit that referenced this issue Mar 22, 2016
For the refactoring of SortBuilders related to #10217, each SortBuilder needs to get a build()
method that produces a SortField according to the SortBuilder parameters on the shard.
@javanna
Copy link
Member

javanna commented Aug 11, 2016

I had a quick look at what is left to do here and marked inner_hits and sort done. There are some specific open issues marked :Search Refactoring, so I am wondering if we should close this issue. @cbuescher @colings86 Thoughts? What were your plans around alias filters?

@colings86
Copy link
Contributor

I'm for closing this issue. The few issues which are still open and tagged as :Search Refactoring can stand alone as issues and the success of this issue doesn't rely directly on them (though it would be nice to fix those issues. As for the alias filters maybe we can open a separate issue for them too?

@cbuescher
Copy link
Member

+1 on closing and opening a separate issue for the alias filters. Are they part of the SearchSourceBuilder? Just curious what place they have in terms of parsing incoming requests.

@colings86
Copy link
Contributor

@cbuescher I think the reason alias filter were on the list was because they are represented by a QueryBuilder object and could be stored as a serialised QueryBuilder rather than JSON. I agree though that it's not directly connected to this issue

@javanna
Copy link
Member

javanna commented Nov 4, 2016

What was most important to do for alias filters was done with #20916. Each search request against filtered indices was previously sending the filter as a string all the way to the shards, where the actual filter was parsed and converted to lucene query. Now the parsing happens once on the coordinating node, only toFilter gets called on each shard to get the corresponding lucene query. We still store alias filters in compressed XContent format as part of the cluster state, but I don't think that is going to change anytime soon. That said we can close this issue, nothing left to be done. Yay!

@javanna javanna closed this as completed Nov 4, 2016
@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Search Refactoring labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking >enhancement Meta :Search/Search Search-related issues that do not fall into other categories
Projects
None yet
Development

No branches or pull requests

6 participants