This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Lucene query pushdown optimization #671


@dai-chen commented Aug 7, 2020

Issue #, if available:

Description of changes: The major change in this PR is the new LuceneQuery abstraction and its subclasses, which implement different Lucene queries. To clarify, "Lucene query" here means a Lucene query expressed through the Elasticsearch DSL. We don't bypass the DSL and call Lucene directly.
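A minimal sketch of what such an abstraction might look like, assuming a toy expression tree (`Expr`, `Ref`, `Literal`, `Func`) in place of the plugin's real expression classes, and JSON strings in place of Elasticsearch `QueryBuilder` objects. All class and method names here (`LuceneQuery`, `canSupport`, `doBuild`, `tryPushDown`) are hypothetical, and only the term and range cases are modeled.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: names are illustrative, not the plugin's actual API. */
public class LuceneQuerySketch {

  /** Toy expression tree: a field reference, a literal, or a function call. */
  interface Expr {}

  static final class Ref implements Expr {
    final String field;
    Ref(String field) { this.field = field; }
  }

  static final class Literal implements Expr {
    final Object value;
    Literal(Object value) { this.value = value; }
  }

  static final class Func implements Expr {
    final String name;
    final List<Expr> args;
    Func(String name, Expr... args) { this.name = name; this.args = List.of(args); }
  }

  /** Base class: the shared eligibility check lives here; subclasses only shape the query body. */
  abstract static class LuceneQuery {
    /** Eligible only for f(field, literal): first argument a reference, second a literal. */
    boolean canSupport(Func func) {
      return func.args.size() == 2
          && func.args.get(0) instanceof Ref
          && func.args.get(1) instanceof Literal;
    }

    String build(Func func) {
      Ref ref = (Ref) func.args.get(0);
      Literal literal = (Literal) func.args.get(1);
      return doBuild(func.name, ref.field, literal.value);
    }

    protected abstract String doBuild(String op, String field, Object value);
  }

  /** "=" becomes a term query (JSON string stands in for a QueryBuilder). */
  static final class TermQuery extends LuceneQuery {
    @Override
    protected String doBuild(String op, String field, Object value) {
      return "{\"term\":{" + quote(field) + ":" + jsonValue(value) + "}}";
    }
  }

  /** Comparison operators become a range query with the matching bound. */
  static final class RangeQuery extends LuceneQuery {
    private static final Map<String, String> BOUNDS =
        Map.of("<", "lt", "<=", "lte", ">", "gt", ">=", "gte");

    @Override
    protected String doBuild(String op, String field, Object value) {
      return "{\"range\":{" + quote(field) + ":{" + quote(BOUNDS.get(op)) + ":"
          + jsonValue(value) + "}}}";
    }
  }

  /** Hypothetical operator-to-query registry. */
  static final Map<String, LuceneQuery> QUERIES = Map.of(
      "=", new TermQuery(),
      "<", new RangeQuery(), "<=", new RangeQuery(),
      ">", new RangeQuery(), ">=", new RangeQuery());

  /** Returns the Lucene query if the function is eligible; empty means "fall back to script". */
  static Optional<String> tryPushDown(Func func) {
    LuceneQuery query = QUERIES.get(func.name);
    if (query == null || !query.canSupport(func)) {
      return Optional.empty();
    }
    return Optional.of(query.build(func));
  }

  static String quote(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  static String jsonValue(Object v) {
    return v instanceof String ? quote((String) v) : String.valueOf(v);
  }

  public static void main(String[] args) {
    // Eligible: field reference on the left, literal on the right.
    System.out.println(tryPushDown(new Func("=", new Ref("name"), new Literal("John")))
        .orElse("script query")); // prints {"term":{"name":"John"}}
    System.out.println(tryPushDown(new Func(">", new Ref("age"), new Literal(30)))
        .orElse("script query")); // prints {"range":{"age":{"gt":30}}}
    // Not eligible: the left side is a function call, so the caller would use a script query.
    System.out.println(tryPushDown(new Func("=", new Func("abs", new Ref("age")), new Literal(30)))
        .orElse("script query")); // prints script query
  }
}
```

Keeping the reference/literal check in the base class means each subclass only decides how its query body is shaped, so supporting another operator is one registry entry plus, at most, one subclass.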

Problem Statement: In PR #663, we registered our expression as a new script language and optimized expression evaluation by pushing it down to a script query in the Elasticsearch DSL. However, that did not yet cover how to leverage Lucene queries for expressions that can be optimized. For certain filtering expressions, we want to go further and push them down to Lucene queries, fully or partially.

Solution: The core logic is in the FilterQueryBuilder.visitFunction method, which is executed top-down:

  1. AND, OR, NOT: Translate to a bool query and visit the arguments (left/right side) recursively.
  2. Functions that Lucene may support: Translate to a Lucene query if the left side is a reference (field name) and the right side is a literal. Otherwise, go to 3.
  3. Functions that Lucene doesn't support, or whose arguments are not valid as described in 2: Serialize the expression (the subtree rooted at the current function node) and translate it to a script query.

Here is an example to help understand. AND is translated directly to a bool filter query (case #1). name = 'John' is eligible because = can be translated to a Lucene term query, the left side (first argument) is a reference, and the right side is a literal (case #2). ABS(age) = 30 is translated to a script query because its left side is a function call rather than a reference (case #3).

[Figure: lucene-pushdown]
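The three cases and the example above can be sketched as a small recursive translator. This is an illustration under stated assumptions, not the PR's FilterQueryBuilder: the expression classes are a toy tree, the output is a JSON string rather than a QueryBuilder, only `=` is treated as Lucene-eligible, the script language name `expression` is a placeholder for whatever name PR #663 registered, and `toString()` stands in for the real expression serialization.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch of the top-down translation; names and JSON shape are illustrative. */
public class FilterQuerySketch {

  interface Expr {}

  static final class Ref implements Expr {
    final String field;
    Ref(String field) { this.field = field; }
    @Override public String toString() { return field; }
  }

  static final class Literal implements Expr {
    final Object value;
    Literal(Object value) { this.value = value; }
    @Override public String toString() {
      return value instanceof String ? "'" + value + "'" : String.valueOf(value);
    }
  }

  static final class Func implements Expr {
    final String name;
    final List<Expr> args;
    Func(String name, Expr... args) { this.name = name; this.args = List.of(args); }
    @Override public String toString() {
      return name + "(" + args.stream().map(Object::toString)
          .collect(Collectors.joining(", ")) + ")";
    }
  }

  // Case 2 candidates; only "=" (term query) is modeled in this sketch.
  static final Set<String> LUCENE_FUNCTIONS = Set.of("=");

  static String visit(Expr expr) {
    if (!(expr instanceof Func)) {
      return script(expr); // e.g. a bare boolean field: no function to translate
    }
    Func func = (Func) expr;
    switch (func.name) {
      // Case 1: boolean connectives become bool clauses; children are visited recursively.
      case "and": return bool("filter", func.args);
      case "or":  return bool("should", func.args);
      case "not": return bool("must_not", func.args);
      default:
        // Case 2: supported function with (reference, literal) arguments.
        if (LUCENE_FUNCTIONS.contains(func.name)
            && func.args.size() == 2
            && func.args.get(0) instanceof Ref
            && func.args.get(1) instanceof Literal) {
          String field = ((Ref) func.args.get(0)).field;
          Object value = ((Literal) func.args.get(1)).value;
          return "{\"term\":{" + quote(field) + ":" + jsonValue(value) + "}}";
        }
        // Case 3: serialize the whole subtree rooted here into a script query.
        return script(func);
    }
  }

  static String bool(String clause, List<Expr> children) {
    String inner = children.stream().map(FilterQuerySketch::visit)
        .collect(Collectors.joining(","));
    return "{\"bool\":{" + quote(clause) + ":[" + inner + "]}}";
  }

  static String script(Expr subtree) {
    // "expression" is a placeholder language name; toString() stands in for serialization.
    return "{\"script\":{\"script\":{\"lang\":\"expression\",\"source\":"
        + quote(subtree.toString()) + "}}}";
  }

  static String quote(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  static String jsonValue(Object v) {
    return v instanceof String ? quote((String) v) : String.valueOf(v);
  }

  public static void main(String[] args) {
    // name = 'John' AND ABS(age) = 30
    Expr where = new Func("and",
        new Func("=", new Ref("name"), new Literal("John")),
        new Func("=", new Func("abs", new Ref("age")), new Literal(30)));
    System.out.println(visit(where));
  }
}
```

Running `main` prints a `bool` query whose `filter` array holds a term query for `name` followed by a script query for `=(abs(age), 30)`, mirroring cases #1 to #3 in the example. Mapping AND to `filter` rather than `must` also skips relevance scoring, which suits a WHERE clause.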

Testing: The new unit tests (UT) and PPL integration tests (IT) pass. Since we don't have an explain API yet, SQL IT and doctests will be added later.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@dai-chen self-assigned this Aug 7, 2020
@dai-chen force-pushed the lucene-pushdown-optimization branch from da47f8b to d478810 August 11, 2020 16:00
@dai-chen marked this pull request as ready for review August 13, 2020 19:56
@dai-chen requested review from penghuo and chloe-zh August 13, 2020 19:56

@penghuo left a comment

Thanks for the change!

@chloe-zh left a comment

LGTM, thanks!

@dai-chen merged commit 4962a37 into opendistro-for-elasticsearch:develop Aug 17, 2020
@dai-chen deleted the lucene-pushdown-optimization branch August 17, 2020 18:18
penghuo pushed a commit to penghuo/sql that referenced this pull request Aug 21, 2020
* Add lucene builder interface and term query impl

* Add lucene query interface and term query impl

* Add range query impl

* Add more UT for range query

* Add wildcard query impl

* Add exists query impl

* Pass jacoco test

* Prepare PR

* Only push down filter close to relation

* Prepare PR

* Add limitation doc

* Add limitation doc

* Add limitation doc