-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: Categorize grouping function testing improvements #118013
Conversation
Pinging @elastic/es-analytical-engine (Team:Analytics) |
// TODO: randomize and try multiple pages. | ||
// TODO: assert the state of the BlockHash after adding pages. Including the categorizer state. | ||
// TODO: also test the lookup method and other stuff. | ||
// TODO: randomize values? May give wrong results |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could see randomizing the values in a separate test and asserting that we get expected results/don't crash out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But later. This is fine now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only reviewed the CSV test + QL changes. They look good to me, except for the capability requirement (see below).
Thanks Ivan!
TRUE, // Whether the expression can become null | ||
FALSE, // The expression can never become null | ||
UNKNOWN // Cannot determine if the expression supports possible null folding | ||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
x-pack/plugin/esql/qa/testFixtures/src/main/resources/categorize.csv-spec
Outdated
Show resolved
Hide resolved
@@ -92,6 +93,12 @@ public boolean foldable() { | |||
return false; | |||
} | |||
|
|||
@Override | |||
public Nullability nullable() { | |||
// Both nulls and empty strings result in null values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ for the comment, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be precisely: any string for which the analyzer returns zero tokens results in a null category. The analyzer discards tokens like numbers, hex.numbers, and stopwords like Jan, Feb, Mon, Tue, ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
💚 Backport successful
|
Added some extra tests on the CategorizeBlockHash. Added NullFold rule comments, and forced nullable() to TRUE on Categorize.
} else if (e instanceof Alias == false | ||
&& e.nullable() == Nullability.TRUE | ||
} else if (e instanceof Alias == false && e.nullable() == Nullability.TRUE | ||
// Categorize function stays as a STATS grouping (It isn't moved to an early EVAL like other groupings), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The story of null folding is not complete (PropagateEvalFoldables has missing pieces), but I appreciate adding this comment here. I was kind of confused and surprised to see here this out-of-nowhere instanceof Categorize
.
Added some extra tests on the CategorizeBlockHash.
More autogenerated/randomized testcases could be added, but given the complexity of the Categorize function, we would probably have to have some kind of whitelist of strings, and that would require some extra time that we won't be investing here yet.
So for now, this PR adds extra checks for all its methods, and some loops to check categories are unchanged.
I planned to remove the Categorize exception we have in NullFold rule here, but it's more complicated than expected.