Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: Categorize grouping function testing improvements #118013

Merged
merged 7 commits into from
Dec 9, 2024

Conversation

ivancea
Copy link
Contributor

@ivancea ivancea commented Dec 4, 2024

Added some extra tests on the CategorizeBlockHash.

More autogenerated/randomized testcases could be added, but given the complexity of the Categorize function, we would probably have to have some kind of whitelist of strings, and that would require some extra time that we won't be investing here yet.
So for now, this PR adds extra checks for all its methods, and some loops to check categories are unchanged.

I planned to remove the Categorize exception we have in NullFold rule here, but it's more complicated than expected.

@ivancea ivancea added >test Issues or PRs that are addressing/adding tests Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v9.0.0 v8.18.0 labels Dec 4, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

// TODO: randomize and try multiple pages.
// TODO: assert the state of the BlockHash after adding pages. Including the categorizer state.
// TODO: also test the lookup method and other stuff.
// TODO: randomize values? May give wrong results
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see randomizing the values in a separate test and asserting that we get expected results/don't crash out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But later. This is fine now.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only reviewed the CSV test + QL changes. They look good to me, except for the capability requirement (see below).

Thanks Ivan!

TRUE, // Whether the expression can become null
FALSE, // The expression can never become null
UNKNOWN // Cannot determine if the expression supports possible null folding
/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

@@ -92,6 +93,12 @@ public boolean foldable() {
return false;
}

@Override
public Nullability nullable() {
// Both nulls and empty strings result in null values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ for the comment, thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be precisely: any string for which the analyzer returns zero tokens results in a null category. The analyzer discards tokens like numbers, hex.numbers, and stopwords like Jan, Feb, Mon, Tue, ...

@ivancea ivancea enabled auto-merge (squash) December 9, 2024 12:00
Copy link
Contributor

@jan-elastic jan-elastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ivancea ivancea merged commit 63ee866 into elastic:main Dec 9, 2024
15 of 17 checks passed
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

ivancea added a commit to ivancea/elasticsearch that referenced this pull request Dec 9, 2024
Added some extra tests on the CategorizeBlockHash.

Added NullFold rule comments, and forced nullable() to TRUE on Categorize.
@ivancea ivancea deleted the categorize-follow-up branch December 9, 2024 13:24
elasticsearchmachine pushed a commit that referenced this pull request Dec 9, 2024
…18262)

Added some extra tests on the CategorizeBlockHash.

Added NullFold rule comments, and forced nullable() to TRUE on Categorize.
} else if (e instanceof Alias == false
&& e.nullable() == Nullability.TRUE
} else if (e instanceof Alias == false && e.nullable() == Nullability.TRUE
// Categorize function stays as a STATS grouping (It isn't moved to an early EVAL like other groupings),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The story of null folding is not complete (PropagateEvalFoldables has missing pieces), but I appreciate adding this comment here. I was kind of confused and surprised to see here this out-of-nowhere instanceof Categorize.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) >test Issues or PRs that are addressing/adding tests v8.18.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants