Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: Optimize multiple CIDR_MATCHes #105143

Closed
alex-spies opened this issue Feb 5, 2024 · 3 comments · Fixed by #111501
Closed

ESQL: Optimize multiple CIDR_MATCHes #105143

alex-spies opened this issue Feb 5, 2024 · 3 comments · Fixed by #111501
Assignees
Labels
:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@alex-spies
Copy link
Contributor

Follow up to #105042

Two (or more) CIDR_MATCHes in an OR expression can be integrated if they are on the same field.

WHERE CIDR_MATCH(field, "123.0.0.0/24") OR CIDR_MATCH(field, "123.1.2.4.5/24") OR CIDR_MATCH(another_field, "172.0.0.1/16") AND CIDR_MATCH(extra_field, "5.4.2.1/8")

gets optimized into

WHERE CIDR_MATCH(field, "123.0.0.0/24", "123.1.2.4.5/24") OR CIDR_MATCH(another_field, "172.0.0.1/16") AND CIDR_MATCH(extra_field, "5.4.2.1/8")
@alex-spies alex-spies self-assigned this Feb 5, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Feb 5, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@alex-spies
Copy link
Contributor Author

alex-spies commented Feb 5, 2024

We could add a logical optimization that would replace OR expressions by merging any CIDR_MATCH(field, ...) with the same field.

However, I'm not sure about the difference that would make.

Due to how IpFieldMapper.termsQuery works, a pushed down WHERE CIDR_MATCH(ip_field, "127.0.1.0/24", "127.0.0.0/24") results in a BooleanQuery (equivalent to "should": ["term": "127.0.1.0/24", "term": 127.0.0.0/24"]). (IpFieldMapper falls back to MappedFieldType.termsQuery.)

#105061 pushes down WHERE CIDR_MATCH(ip_field, block1, block2, ...) and represents it as a SingleValueQuery wrapped around the BooleanQuery from the previous paragraph.

As a consequence, the difference between

  1. WHERE CIDR_MATCH(ip_field, block1) OR CIDR_MATCH(ip_field, block2) and
  2. WHERE CIDR_MATCH(ip_field, block1, block2)

is that 1. is a BooleanQuery containing two SingleValueQueries (one for each block) and 2. is a SingleValueQuery containing a BooleanQuery.

@costin
Copy link
Member

costin commented Feb 6, 2024

The issue we're trying to avoid is having multiple queries, that is avoid 1 and have 2.
Mainly to address cases we've seen in the past where queries get generated into a composable manner and have a high number of OR clauses which end up blowing things up. See #96236 and #30267.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants