-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ES|QL] Combine Disjunctive CIDRMatch #111501
Conversation
Hi @fang-xing-esql, I've created a changelog YAML for you. |
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit on the fence about reusing/extending this rule, but it does make some sense as "generic" OR-combiner; and then also rename to drop the ...ToIn
.
...src/test/java/org/elasticsearch/xpack/esql/optimizer/rules/CombineDisjunctionsToInTests.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/ip.csv-spec
Outdated
Show resolved
Hide resolved
...esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/CombineDisjunctionsToIn.java
Outdated
Show resolved
Hide resolved
...esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/CombineDisjunctionsToIn.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ok re-using this rule to also cover CIDRs. The final result of these combinations (either ==
or IN
or CIDR
) have a common result - the use of a terms
query.
But:
- the javadoc of the class must be updated to reflect the new reality
- other comments in the class itself need update or new comments added
- an additional test I think it's needed in
CombineDisjunctionsToInTests
that cover more complex queries:WHERE host == "alpha" OR host == "gamma" OR CIDR_MATCH(ip1, "127.0.0.2/32") OR CIDR_MATCH(ip1, "127.0.0.3/32") OR card IN ("eth0", "eth1") OR card == "lo0" OR CIDR_MATCH(ip0, "127.0.0.1") OR CIDR_MATCH(ip0, "fe80::cae2:65ff:fece:feb9") OR host == "beta"
- use the same test as above in the
csv-spec
file - look into why the query above is translated into
{
"bool": {
"should": [
{
"esql_single_value": {
"field": "host",
"next": {
"terms": {
"host": [
"alpha",
"gamma",
"beta"
],
"boost": 1
}
},
"source": "host@1:20"
}
},
{
"esql_single_value": {
"field": "card",
"next": {
"terms": {
"card": [
"eth0",
"eth1",
"lo0"
],
"boost": 1
}
},
"source": "card@1:128"
}
},
{
"bool": {
"should": [
{
"esql_single_value": {
"field": "ip1",
"next": {
"terms": {
"ip1": [
"127.0.0.2/32",
"127.0.0.3/32"
],
"boost": 1
}
},
"source": "ip1@1:69"
}
},
{
"esql_single_value": {
"field": "ip0",
"next": {
"terms": {
"ip0": [
"127.0.0.1",
"fe80::cae2:65ff:fece:feb9"
],
"boost": 1
}
},
"source": "ip0@1:184"
}
}
],
"boost": 1
}
}
],
"boost": 1
}
}
There is an additional bool
query inside but I don't think it should be there (there should be only one bool
query with 4 should
statements).
- this query has a strange behavior, maybe we can improve it:
WHERE host == "alpha" OR host == "gamma" OR ip1 IN ("127.0.0.2/32"::ip) OR CIDR_MATCH(ip1, "127.0.0.3/32") OR card IN ("eth0", "eth1") OR card == "lo0" OR ip0 IN ("127.0.0.1"::ip) OR CIDR_MATCH(ip0, "fe80::cae2:65ff:fece:feb9") OR host == "beta"
is translated into (ip0
and ip1
are not included in the same terms
query, even though they could; yes, they are coming from ip0 IN ....
and cidr_match(ip0...)
but the resulting queries could be combined in a better way):
{
"bool": {
"should": [
{
"esql_single_value": {
"field": "card",
"next": {
"terms": {
"card": [
"eth0",
"eth1",
"lo0"
],
"boost": 1
}
},
"source": "card@1:124"
}
},
{
"esql_single_value": {
"field": "ip0",
"next": {
"term": {
"ip0": {
"value": "127.0.0.1"
}
}
},
"source": "ip0@1:169"
}
},
{
"esql_single_value": {
"field": "host",
"next": {
"terms": {
"host": [
"alpha",
"gamma",
"beta"
],
"boost": 1
}
},
"source": "host@1:20"
}
},
{
"bool": {
"should": [
{
"esql_single_value": {
"field": "ip1",
"next": {
"terms": {
"ip1": [
"127.0.0.3/32"
],
"boost": 1
}
},
"source": "ip1@1:100"
}
},
{
"esql_single_value": {
"field": "ip0",
"next": {
"terms": {
"ip0": [
"fe80::cae2:65ff:fece:feb9"
],
"boost": 1
}
},
"source": "ip0@1:208"
}
}
],
"boost": 1
}
}
],
"boost": 1
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, this looks good to me. Bogdan pointed out some good additional edge cases that I also think we should add tests for.
One thing I think would be great is tests that show that the combined CIDR_MATCH now gets pushed down to Lucene; I added one in #105061, it'd be great to have another one like that that comes with different kinds of disjunctions (and conjunctions, for good measure) of CIDR_MATCHes.
I think previously, we were not able to push down multiple CIDR_MATCHes (or at least, not as a single terms query), but combined this should now be possible.
...esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/CombineDisjunctionsToIn.java
Outdated
Show resolved
Hide resolved
Thanks to @astefan @bpintea and @alex-spies for reviewing! These are really good test cases!! The disjunctive or(tree) we build is not a left or right deep tree, it is a balance tree in stead, The |
Thank you for catching these interesting problems! Both queries are added to either |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Left only few minor remarks.
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/planner/QueryTranslatorTests.java
Show resolved
Hide resolved
...gin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/CombineDisjunctions.java
Outdated
Show resolved
Hide resolved
...gin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/CombineDisjunctions.java
Show resolved
Hide resolved
...gin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/CombineDisjunctions.java
Show resolved
Hide resolved
...gin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/CombineDisjunctions.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I have a minor testing suggestion, but this is already good as-is IMO.
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/planner/QueryTranslatorTests.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/planner/QueryTranslatorTests.java
Show resolved
Hide resolved
...gin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/CombineDisjunctions.java
Outdated
Show resolved
Hide resolved
Thank you for the iterations @fang-xing-esql ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* upstream/main: (22 commits) Prune changelogs after 8.15.0 release Bump versions after 8.15.0 release EIS integration (elastic#111154) Skip LOOKUP/INLINESTATS cases unless on snapshot (elastic#111755) Always enforce strict role validation (elastic#111056) Mute org.elasticsearch.xpack.esql.analysis.VerifierTests testUnsupportedAndMultiTypedFields elastic#111753 [ML] Force time shift integration test (elastic#111620) ESQL: Add tests for sort, where with unsupported type (elastic#111737) [ML] Force time shift documentation (elastic#111668) Fix remote cluster credential secure settings reload (elastic#111535) ESQL: Fix for overzealous validation in case of invalid mapped fields (elastic#111475) Pass allow security manager flag in gradle test policy setup plugin (elastic#111725) Rename streamContent/Separator to bulkContent/Separator (elastic#111716) Mute org.elasticsearch.tdigest.ComparisonTests testSparseGaussianDistribution elastic#111721 Remove 8.14 from branches.json Only emit product origin in deprecation log if present (elastic#111683) Forward port release notes for v8.15.0 (elastic#111714) [ES|QL] Combine Disjunctive CIDRMatch (elastic#111501) ESQL: Remove qualifier from attrs (elastic#110581) Force using the last centroid during merging (elastic#111644) ... # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceNamedWriteablesProvider.java
* support CIDRMatch in CombineDisjunctions
* support CIDRMatch in CombineDisjunctions
Resolves #105143
This PR leverages
CombineDisjunctionsToIn
as it is very similar with combining multipleEquals
s orIn
s intoIn
, and efficient if do it in one pass.