Refactor parsing of queries/filters, aggs, suggester APIs #10217
Comments
+1. In #3278 I perform a terms lookup after parsing, but this happened on all shards and resulted in multiple lookup requests for a single query. This would allow the expensive lookup to be performed once on the coordinating node, which would be very beneficial!
Here is our rough plan:
First step (everything still happens on the nodes that hold the relevant shards):
Second step (query parsing moved to the coordinating node):
Things we are not too happy about, which might need improvement and will be tackled later on:
@clintongormley et al., because we are touching every single query in this change, we also have the opportunity to remove support for camel-cased parameter names in queries where it exists. How do we feel about removing camelCase support in these PRs as well?
We decided to reset the feature branch to the current tip of master and branch from there, starting with #10580.
This applies to SpanTermQueryBuilder the changes that were already made to TermQueryBuilder. The commit tries to avoid code duplication where possible by pulling what both QueryBuilders and their tests have in common into separate classes. Relates to elastic#10217
One final refactoring of SpanTermQuery, making sure the class hierarchy works again. Relates to elastic#10217
…ring Refactors SpanTermQueryBuilder. Due to similarities with TermQueryBuilder, a lot of code was moved into separate abstract classes that can be used by both TermQueryBuilder and SpanTermQueryBuilder. Relates to #10217
…est. Split the parse(QueryParseContext ctx) method into a parsing part and a query-building part, adding Streamable support for serialization and hashCode() and equals() for better testing. This PR also adds test setup for two mapped fields (integer, date) to the BaseQueryTestCase and introduces helper methods for the optional conversion of String fields to their BytesRef representation, shared with the already refactored BaseTermQueryBuilder. Relates to #10217 Closes #11108
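A minimal sketch of the split described in this commit, using a hypothetical `RangeQuerySketch` class instead of a real QueryBuilder (no Lucene or Elasticsearch types are involved): parsing only fills in fields, `toQuery()` stands in for building the Lucene query on the shard, and `writeTo`/`readFrom` are an assumed stand-in for Streamable. Once `equals()` and `hashCode()` exist, a test can serialize a builder, read it back and compare the two objects directly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Objects;

// Hypothetical builder: holds parsed parameters only, no Lucene objects.
final class RangeQuerySketch {
    final String field;
    final Integer from; // null means unbounded
    final Integer to;   // null means unbounded

    RangeQuerySketch(String field, Integer from, Integer to) {
        this.field = Objects.requireNonNull(field, "field");
        this.from = from;
        this.to = to;
    }

    // Parsing step: request parameters (already decoded into a map here) become a builder.
    static RangeQuerySketch fromMap(String field, Map<String, Integer> params) {
        return new RangeQuerySketch(field, params.get("gte"), params.get("lte"));
    }

    // Query-building step: would create a Lucene query on the shard; a string stands in here.
    String toQuery() {
        return field + ":[" + (from == null ? "*" : from) + " TO " + (to == null ? "*" : to) + "]";
    }

    // Streamable-style serialization: fields are written and read in the same fixed order.
    void writeTo(DataOutputStream out) throws IOException {
        out.writeUTF(field);
        writeOptionalInt(out, from);
        writeOptionalInt(out, to);
    }

    static RangeQuerySketch readFrom(DataInputStream in) throws IOException {
        String field = in.readUTF();
        Integer from = readOptionalInt(in);
        Integer to = readOptionalInt(in);
        return new RangeQuerySketch(field, from, to);
    }

    private static void writeOptionalInt(DataOutputStream out, Integer value) throws IOException {
        out.writeBoolean(value != null); // presence flag first
        if (value != null) {
            out.writeInt(value);
        }
    }

    private static Integer readOptionalInt(DataInputStream in) throws IOException {
        if (!in.readBoolean()) {
            return null;
        }
        return in.readInt();
    }

    // Serializes and deserializes the builder, as would happen when it is sent to a shard.
    static RangeQuerySketch roundTrip(RangeQuerySketch query) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            query.writeTo(out);
            out.flush();
            return readFrom(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RangeQuerySketch)) return false;
        RangeQuerySketch other = (RangeQuerySketch) o;
        return field.equals(other.field) && Objects.equals(from, other.from) && Objects.equals(to, other.to);
    }

    @Override
    public int hashCode() {
        return Objects.hash(field, from, to);
    }
}
```

In a test, the same round trip can be repeated for many randomly generated builders; without `equals()` there would be nothing meaningful to compare the deserialized copy against.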
This commit makes SimpleQueryStringBuilder streamable and adds hashCode() and equals(). It also switches to toLanguageTag/forLanguageTag when parsing Locales, because using LocaleUtils from either Elasticsearch or Apache Commons resulted in Locales not passing the roundtrip test. For more info see https://issues.apache.org/jira/browse/LUCENE-4021 Relates to elastic#10217
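The Locale round trip mentioned in this commit can be illustrated with the JDK alone. The helper class `LocaleRoundTrip` below is hypothetical; `Locale.toLanguageTag()` and `Locale.forLanguageTag(String)` are standard `java.util.Locale` methods. A round trip is expected to hold for locales built from well-formed BCP 47 subtags; the `Locale` Javadoc documents special handling for a few legacy locales.

```java
import java.util.Locale;

// Hypothetical helper: write a Locale as a BCP 47 language tag and read it back.
final class LocaleRoundTrip {
    static String write(Locale locale) {
        return locale.toLanguageTag(); // e.g. "de-DE", or "sr-Latn-RS" when a script is set
    }

    static Locale read(String tag) {
        return Locale.forLanguageTag(tag);
    }

    // True if serializing and parsing again yields an equal Locale.
    static boolean roundTrips(Locale locale) {
        return read(write(locale)).equals(locale);
    }
}
```

Usage: a locale with a script subtag, such as one built with `new Locale.Builder().setLanguage("sr").setScript("Latn").setRegion("RS").build()`, is written as `sr-Latn-RS` and parsed back to an equal `Locale`.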
…n parameter Before the refactoring we didn't check for invalid strategy and relation settings in the GeoShapeQueryBuilder. However, using SpatialStrategy.TERM with any relation other than ShapeRelation.INTERSECTS is invalid, and we tried to protect against that in the validate() method. This PR moves these checks to the setters for strategy and relation and adds tests for the new behaviour. Relates to elastic#10217
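A minimal sketch of validating in setters rather than in a separate validate() step. `GeoShapeQuerySketch` and the two enums are hypothetical stand-ins, and the compatibility rule it enforces (TERM accepted only together with INTERSECTS) is an assumption modelled on later GeoShapeQueryBuilder versions. Both setters call the same check, so an invalid pair is rejected regardless of the order in which the two values are set.

```java
// Hypothetical stand-ins for the enums named in the commit message.
enum SpatialStrategy { TERM, RECURSIVE }

enum ShapeRelation { INTERSECTS, DISJOINT, WITHIN }

final class GeoShapeQuerySketch {
    private SpatialStrategy strategy = null; // null: not set, use the field's default
    private ShapeRelation relation = ShapeRelation.INTERSECTS;

    GeoShapeQuerySketch strategy(SpatialStrategy strategy) {
        checkCompatible(strategy, this.relation); // fail before any state changes
        this.strategy = strategy;
        return this;
    }

    GeoShapeQuerySketch relation(ShapeRelation relation) {
        if (relation == null) {
            throw new IllegalArgumentException("relation must not be null");
        }
        checkCompatible(this.strategy, relation);
        this.relation = relation;
        return this;
    }

    SpatialStrategy strategy() {
        return strategy;
    }

    ShapeRelation relation() {
        return relation;
    }

    // Assumed rule: the TERM strategy only supports the INTERSECTS relation.
    private static void checkCompatible(SpatialStrategy strategy, ShapeRelation relation) {
        if (strategy == SpatialStrategy.TERM && relation != ShapeRelation.INTERSECTS) {
            throw new IllegalArgumentException(
                "strategy [TERM] only supports relation [INTERSECTS], found [" + relation + "]");
        }
    }
}
```

Failing in the setter means an invalid builder can never be constructed, so the error surfaces where the bad value is supplied instead of later, when the request is validated.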
Similarly to what we did with the search api, we can now also move query parsing to the coordinating node for the validate query api. Given that the explain api is a single shard operation (compared to search, which is a broadcast operation), this doesn't change a lot in how the api works internally. The main benefit is that we can simplify the java api by requiring a structured query object to be provided rather than a bytes array that gets parsed on the data node. Previously, if you specified a QueryBuilder, it would be serialized in JSON format and reparsed on the data node; now it no longer goes through parsing (as expected), because after the query refactoring we are able to stream queries natively. Note that the WrapperQueryBuilder can be used from the java api to provide a query as a string; in that case the actual parsing of the inner query happens on the data node. Relates to elastic#10217 Closes elastic#14384
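A minimal sketch contrasting the two paths described in this commit, under simplified assumptions: all class names are hypothetical, a toy `field:value` string stands in for a JSON query, and a static counter records how often parsing runs. A structured query is parsed once before it is fanned out, while a wrapper carrying raw source is parsed again by every shard that receives it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of where parsing happens; no Elasticsearch classes are used.
interface ShardQuerySketch {
    String toQuery(); // runs once per shard
}

final class TermQuerySketch implements ShardQuerySketch {
    static int parseCount = 0; // counts parse calls, for illustration only

    final String field;
    final String value;

    TermQuerySketch(String field, String value) {
        this.field = field;
        this.value = value;
    }

    // Parses a toy "field:value" source string (a stand-in for the JSON query body).
    static TermQuerySketch parse(String source) {
        parseCount++;
        String[] parts = source.split(":", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected field:value but got [" + source + "]");
        }
        return new TermQuerySketch(parts[0], parts[1]);
    }

    @Override
    public String toQuery() {
        return field + ":" + value;
    }
}

// Like a wrapper query: carries raw source, so each shard has to parse it itself.
final class WrapperQuerySketch implements ShardQuerySketch {
    final String source;

    WrapperQuerySketch(String source) {
        this.source = source;
    }

    @Override
    public String toQuery() {
        return TermQuerySketch.parse(source).toQuery();
    }
}

final class CoordinatorSketch {
    // Sends the same query object to every shard and collects what each shard builds.
    static List<String> executeOnShards(ShardQuerySketch query, int shardCount) {
        List<String> perShard = new ArrayList<>();
        for (int shard = 0; shard < shardCount; shard++) {
            perShard.add(query.toQuery());
        }
        return perShard;
    }
}
```

With five shards, the structured query is parsed once in total, whereas the wrapper is parsed five times; the same reasoning applies to the terms lookup and alias filters discussed elsewhere in this thread.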
For the ongoing search refactoring (elastic#10217), the PhraseSuggestionBuilder gets a way of parsing from xContent that will eventually replace the current SuggestParseElement. This PR adds the fromXContent method to the PhraseSuggestionBuilder and also adds parsing code for the common suggestion parameters to SuggestionBuilder. It also adds links from the Suggester implementations registered in the Suggesters registry to the corresponding prototypes that will be used for parsing once the refactoring is done and we switch from parsing on the shard to parsing on the coordinating node.
Refactors all suggestion builders so that they can be parsed on the coordinating node and serialized as objects to the shards. Specifically, all SuggestionBuilder implementations implement NamedWriteable for serialization and provide a fromXContent() method that parses xContent and a build() method that is called on the shard to create the SuggestionContext. Relates to #10217
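A minimal sketch of the parse-on-coordinator, build-on-shard lifecycle for suggesters. Every type here is hypothetical: the request body is assumed to be already decoded into a `Map`, `getWriteableName()` only mimics the name a NamedWriteable exposes, and where the commits link each Suggester to a prototype, this sketch registers a parser function per suggester name instead. The default `size` of 5 is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical shard-side object produced by build().
final class SuggestionContextSketch {
    final int shardId;
    final String field;
    final int size;

    SuggestionContextSketch(int shardId, String field, int size) {
        this.shardId = shardId;
        this.field = field;
        this.size = size;
    }
}

// Hypothetical counterpart of a SuggestionBuilder.
interface SuggestionBuilderSketch {
    String getWriteableName();                  // name used to find the reader when deserializing
    SuggestionContextSketch build(int shardId); // called on each shard
}

final class TermSuggestionSketch implements SuggestionBuilderSketch {
    final String field;
    final int size;

    TermSuggestionSketch(String field, int size) {
        this.field = field;
        this.size = size;
    }

    // Parsing on the coordinating node; the body is already decoded into a map here.
    static TermSuggestionSketch fromMap(Map<String, Object> body) {
        Object field = body.get("field");
        if (!(field instanceof String)) {
            throw new IllegalArgumentException("[field] is required");
        }
        Object size = body.getOrDefault("size", 5); // assumed default
        return new TermSuggestionSketch((String) field, ((Number) size).intValue());
    }

    @Override
    public String getWriteableName() {
        return "term";
    }

    @Override
    public SuggestionContextSketch build(int shardId) {
        return new SuggestionContextSketch(shardId, field, size);
    }
}

// Maps suggester names to parser functions.
final class SuggesterRegistrySketch {
    private final Map<String, Function<Map<String, Object>, SuggestionBuilderSketch>> parsers = new HashMap<>();

    void register(String name, Function<Map<String, Object>, SuggestionBuilderSketch> parser) {
        parsers.put(name, parser);
    }

    SuggestionBuilderSketch parse(String name, Map<String, Object> body) {
        Function<Map<String, Object>, SuggestionBuilderSketch> parser = parsers.get(name);
        if (parser == null) {
            throw new IllegalArgumentException("unknown suggester [" + name + "]");
        }
        return parser.apply(body);
    }
}
```

Usage: register `TermSuggestionSketch::fromMap` under `"term"`, parse the request once, then call `build(shardId)` on each shard that receives the builder.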
For the current refactoring of SortBuilders related to elastic#10217, each SortBuilder should get a build() method that produces a SortField on the shard according to the SortBuilder parameters. This change also slightly refactors the current parse method in SortParseElement, extracting an internal parse method that returns a list of sort fields and only needs a QueryShardContext as input instead of a full SearchContext. This allows using the internal parse method in tests.
For the refactoring of SortBuilders related to #10217, each SortBuilder needs to get a build() method that produces a SortField according to the SortBuilder parameters on the shard.
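A minimal sketch of a sort builder whose build() method produces a sort field on the shard. `FieldSortSketch`, `SortFieldSketch` (standing in for Lucene's SortField, reduced to a field name and a reverse flag) and `ShardMappingsSketch` (standing in for QueryShardContext, reduced to a set of mapped field names) are all hypothetical. The builder only stores parameters; the mapping lookup needs shard context, so it happens in `build()`.

```java
import java.util.Set;

enum SortOrderSketch { ASC, DESC }

// Hypothetical stand-in for Lucene's SortField: a field name plus a reverse flag.
final class SortFieldSketch {
    final String field;
    final boolean reverse;

    SortFieldSketch(String field, boolean reverse) {
        this.field = field;
        this.reverse = reverse;
    }
}

// Hypothetical stand-in for QueryShardContext: only knows which fields are mapped.
final class ShardMappingsSketch {
    private final Set<String> mappedFields;

    ShardMappingsSketch(Set<String> mappedFields) {
        this.mappedFields = mappedFields;
    }

    boolean isMapped(String field) {
        return mappedFields.contains(field);
    }
}

// Hypothetical field sort builder: parameters are set when the request is parsed,
// build() turns them into a sort field on each shard.
final class FieldSortSketch {
    private final String field;
    private SortOrderSketch order = SortOrderSketch.ASC;

    FieldSortSketch(String field) {
        this.field = field;
    }

    FieldSortSketch order(SortOrderSketch order) {
        this.order = order;
        return this;
    }

    SortFieldSketch build(ShardMappingsSketch shard) {
        // The mapping lookup uses the shard context, so it happens here rather than at parse time.
        if (!shard.isMapped(field)) {
            throw new IllegalArgumentException("no mapping found for [" + field + "] to sort on");
        }
        return new SortFieldSketch(field, order == SortOrderSketch.DESC);
    }
}
```

Because `build()` only needs the small context object, it can be tested without setting up a full search context, which is the same motivation given above for extracting the internal parse method.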
I had a quick look at what is left to do here and marked
I'm for closing this issue. The few issues which are still open and tagged as
+1 on closing and opening a separate issue for the alias filters. Are they part of the SearchSourceBuilder? Just curious what place they have in terms of parsing incoming requests.
@cbuescher I think the reason alias filters were on the list is that they are represented by a QueryBuilder object and could be stored as a serialised QueryBuilder rather than as JSON. I agree, though, that it's not directly connected to this issue.
What was most important to do for alias filters was done with #20916. Each search request against filtered indices previously sent the filter as a string all the way to the shards, where it was parsed and converted to a Lucene query. Now the parsing happens once on the coordinating node, only
Copied from #9901:
Today we have a massive infrastructure to parse all our requests. We have client-side builders and server-side parsers, but no real representation of the query, filter, aggregation etc. until it's executed. What is produced from an XContent binary is directly a Lucene query, which leads to huge parse methods in separate classes that are hard to test and don't allow decoupled modifications or actions on the query itself between parsing and executing.
This refactoring splits the parsing from the creation of the Lucene query, which has a couple of advantages.
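A minimal sketch of why an intermediate representation helps, using hypothetical `QueryNode`, `TermNode` and `BoolNode` types (not the actual QueryBuilder API) and strings in place of Lucene queries. Because parsing produces a tree of objects rather than a Lucene query, a step such as `rewrite()` can act on the query between parsing and executing, and each step can be unit-tested on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical intermediate representation of a parsed query.
interface QueryNode {
    String toQuery(); // stand-in for creating the Lucene query

    // Hook for working on the parsed query before it is executed; default: no change.
    default QueryNode rewrite() {
        return this;
    }
}

final class TermNode implements QueryNode {
    final String field;
    final String value;

    TermNode(String field, String value) {
        this.field = field;
        this.value = value;
    }

    @Override
    public String toQuery() {
        return field + ":" + value;
    }
}

final class BoolNode implements QueryNode {
    final List<QueryNode> clauses = new ArrayList<>(); // all clauses must match

    BoolNode must(QueryNode clause) {
        clauses.add(clause);
        return this;
    }

    @Override
    public QueryNode rewrite() {
        BoolNode rewritten = new BoolNode();
        for (QueryNode clause : clauses) {
            rewritten.must(clause.rewrite());
        }
        // A bool with a single required clause matches the same documents as that clause.
        return rewritten.clauses.size() == 1 ? rewritten.clauses.get(0) : rewritten;
    }

    @Override
    public String toQuery() {
        List<String> parts = new ArrayList<>();
        for (QueryNode clause : clauses) {
            String q = clause.toQuery();
            parts.add(clause instanceof BoolNode ? "+(" + q + ")" : "+" + q);
        }
        return String.join(" ", parts);
    }
}
```

Usage: a nested bool around a single term query renders as `+(+title:elasticsearch)` before `rewrite()` and as `title:elasticsearch` after it.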
Queries (Completed)
Total of 54 Queries
54 done
Former filters were mostly merged or converted to queries and are included in this list.
Aggregations (Completed - #14136)
Total of 44 Aggregations
44 done
Suggesters
Total of 4 Suggesters
4 done, 0 in open PRs
Highlighters
Total of 3 Highlighters
3 done
Others
query_binary: (removed, queries should only be specified via type-safe builders in the Java API, see Remove query_binary, filter_binary & aggs_binary #14308)
filter_binary: (removed, filters should only be specified via type-safe builders in the Java API, see Remove query_binary, filter_binary & aggs_binary #14308)
aggregations_binary: (to be removed, aggregations should only be specified via type-safe builders in the Java API, see Remove query_binary, filter_binary & aggs_binary #14308 and Refactoring of Aggregations #14136)
APIs to be adapted/revised besides _search
search exists api (removed, see Remove search exists api #13911)
index warmers (removed, see Remove query warmers and the warmer API. #15614)
The above APIs don't necessarily have to change to parse queries into our intermediate format; for instance, the percolator will still need to parse to a Lucene query straight away. We should still look at each of them, though, and double-check whether anything needs to be adjusted after all the infrastructure changes we have made.