Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EQL: Convert wildcards to LIKE in analyzer #51901

Merged
merged 4 commits into from
Mar 6, 2020
Merged

EQL: Convert wildcards to LIKE in analyzer #51901

merged 4 commits into from
Mar 6, 2020

Conversation

rw-access
Copy link
Contributor

@rw-access rw-access commented Feb 5, 2020

Addressing the comment thread from #51558 (comment).

Added ReplaceWildcards to the optimizer which detects the == "wild*card*" or != "wild*card*" patterns and replaces with LIKE.

This is branched from #51886, so only the last commit is relevant.

Update: Resolves #53104

@rw-access rw-access added the :Analytics/EQL EQL querying label Feb 5, 2020
@rw-access rw-access requested a review from costin February 5, 2020 00:36
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/EQL)

@costin costin requested review from matriv and astefan February 5, 2020 10:14
Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass.
The PR will be easier to complete after #51929.


private static class ReplaceWildcards extends AnalyzeRule<Filter> {

private static boolean isWildcard(Expression expr) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a wildcard appear in any type of string? e.g. some*glob?
I wonder whether the parser could detect it so instead of having Literal that might a string, it could have its own expression rule.

if (expr instanceof Literal) {
Literal l = (Literal) expr;
if (l.value() instanceof String) {
String s = (String) l.value();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A potential improvement is to check whether an expression is foldable instead of being a literal.
Thus if string concatenation were to be added, the rule would still be applied:

if (e.foldable() && e.fold() instanceof String) {
    return e.fold().toString().contains("*");
}

which can be transformed into a one-liner:

return e.foldable() && e.fold() instanceof String && e.fold().toString().contains("*");

@Override
protected LogicalPlan rule(Filter filter) {
return filter.transformExpressionsUp(e -> {
// expr == "wildcard*phrase"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about foo > "wild*card" or other value comparisons?
If that's valid grammar, the verifier should pick the pattern and fail the query.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it is valid grammar, but since it's not == or !=, this will just be a lexicographical comparison

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would mean the operators are not consistent since == would expand the wildcard while > & co would compare against it...

Equals eq = (Equals) e;

if (isWildcard(eq.right())) {
String wcString = (String) ((Literal) eq.right()).value();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isWildcard already does the checks so simply do: eq.fold().toString()

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment regarding tests.

Literal l = (Literal) expr;
if (l.value() instanceof String) {
String s = (String) l.value();
return s.contains("*");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be possible to add more tests that also look at scenarios involving escape characters and all types of string that eql supports?
Can the * be escaped? If so, we should have a test covering this case.

Copy link
Contributor Author

@rw-access rw-access Feb 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I can add more tests. And there's no escape for *. But if you want to perform an exact (+/- case sensitivity) comparison, you can put the wildcard on the left (#51901 (comment)). Whether this functionality is good or not is a fair question, and I think it's fair to change this because I doubt any users are aware of the workaround.

field == "*wildcard*" <==> field LIKE "%wildcard%"
"*wildcard*" == field <==> field == '*wildcard*'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I find this more like a bug, rather than feature (that implies an workaround).
A == B and B == A should be equivalent imo. Equality is not a predicate (like LIKE) where a certain element sits on the right and another one sits on the left and they are not interchangeable.

@costin
Copy link
Member

costin commented Feb 7, 2020

I think it's worth discussing the feature and then figure out the implementation for it.

  1. if == and != are overloaded with the special string with wildcard, the behavior should be consistent with the existing rules.
  2. the overloading is dropped in favor of a dedicated construct (LIKE) which can introduce its own semantics and syntax. The downside here is backwards compatibility.

@rw-access
Copy link
Contributor Author

Agreed that wildcards should be made commutative. I don't think we should add a new construct like LIKE just yet. In the future, I think it could make sense, after we go through the usual deprecation steps.

I think the main question I have is -- should this be a separate rule in the grammar, or should it be handled in the analyzer?

@costin
Copy link
Member

costin commented Feb 10, 2020

If the wildcard can be picked up by the parser through a dedicated rule, I would opt for that.
However I don't see an easy way to do this in the grammar nor the parser itself and considering its commutative nature I would handle it in an optimizer rule.

@andrei how close are you to moving the optimizer rules in?

@astefan
Copy link
Contributor

astefan commented Feb 11, 2020

@andrei how close are you to moving the optimizer rules in?

@costin #52172

@rw-access rw-access requested a review from costin February 19, 2020 16:07
Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -33,6 +43,7 @@ public LogicalPlan optimize(LogicalPlan verified) {
new BooleanSimplification(),
new BooleanLiteralsOnTheRight(),
// needs to occur before BinaryComparison combinations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment seems incorrect as it referred to PropagateEquals

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where should ReplaceWildcards() be moved to?

@costin
Copy link
Member

costin commented Mar 4, 2020

elasticsearch-ci/2 is failing due to a checkstyle violation.

@rw-access rw-access merged commit 51adeaa into elastic:master Mar 6, 2020
rw-access added a commit that referenced this pull request Mar 6, 2020
* EQL: Convert wildcard comparisons to Like
* EQL: Simplify wildcard handling, update tests
* EQL: Lint fixes for Optimizer.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/EQL EQL querying
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EQL: Wildcard matching doesn't work
5 participants