Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: BUCKET: allow numerical spans as whole numbers #111874

Merged
merged 10 commits into from
Aug 20, 2024

Conversation

bpintea
Copy link
Contributor

@bpintea bpintea commented Aug 14, 2024

This laxes the check on numerical spans to allow them be specified as whole numbers. So far it was required that they be provided as a double.

This also expands the tests for date ranges to include string types.

Resolves #109340, resolves #104646, resolves #105375.

This laxes the check on numerical spans to allow them be specified as
whole numbers. So far it was required that they be provided as a double.

This also expands the tests for date ranges to include string types.
Copy link
Contributor

Documentation preview:

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Aug 14, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/kibana-esql (ES|QL-ui)

@elasticsearchmachine
Copy link
Collaborator

Hi @bpintea, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Hi @bpintea, I've updated the changelog YAML for you.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, but I couldn't find tests for the type validation in case a BUCKET query is invalid. (Though, it's been a while since I looked at BUCKET and only made a cursory search for the tests.) Should they actually be missing, maybe we should add them.

},
{
"name" : "to",
"type" : "keyword",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit out of scope, but: @fang-xing-esql introduced a mechanism for implicit casting. to and from arguments of type keyword and text actually only apply to datetime literals, right?

If so, maybe we can slim down BUCKETs huge type table, and slim down this JSON here as well by moving the handling of datetime strings out of BUCKET and into ImplicitCasting?

(Something for a follow up, but wanted to throw it out there as maybe a nicer long term solution.)

@@ -314,6 +314,21 @@ FROM sample_data
3 |2025-10-01T00:00:00.000Z
;

bucketByYearLowBucketCount#[skip:-8.13.99, reason:BUCKET extended in 8.14]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this test related to this PR, or is it just adding a test for something that existed?

Is the skip accurate, or should this use a capability instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not directly: it just notes that we're not really respecting the contract of providing up to the number of buckets requested, once the bucket size reaches a year.
The skip is accurate, the test isn't using any newly introduced change.

bucketNumericMixedTypesIntegerSpans
required_capability: bucket_whole_number_as_span
ROW long = TO_LONG(100), double = 99., int = 100
| STATS BY b1 = BUCKET(long, double::int), b2 = BUCKET(double, long), b3 = BUCKET(int, 49.5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw: How does that work when the buckets parameter is obtained from a field? Doesn't it have to be a constant for the STATS to make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't it have to be a constant for the STATS to make sense?

It doesn't have to be a constant, but it has to be a foldable expression (i.e. a field declared with ROW).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

row is a good source of constants, but we can also define foldable values in eval (and used with from). Just wanted to point out that "foldable expression" isn't always something that comes from row.

@@ -176,20 +174,20 @@ public Bucket(
) Expression field,
@Param(
name = "buckets",
type = { "integer", "double", "date_period", "time_duration" },
type = { "integer", "long", "double", "date_period", "time_duration" },
description = "Target number of buckets."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pardon my ignorance, but should that be

Suggested change
description = "Target number of buckets."
description = "Target number of buckets, or desired bucket size if `from` and `to` parameters are omitted."

?

: isNumeric(buckets, sourceText(), SECOND).and(checkArgsCount(2));
return isNumeric(buckets, sourceText(), SECOND).and(() -> {
if (bucketsType.isRationalNumber()) {
return checkArgsCount(2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have (verifier) test cases for when the 2nd argument is rational but we provide more than 2 arguments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added new tests around incorrect arguments, both in number and type, thanks.
(This should have been done in my earlier change already.)

return bucketsType.isWholeNumber()
? checkArgsCount(4).and(() -> isNumeric(from, sourceText(), THIRD)).and(() -> isNumeric(to, sourceText(), FOURTH))
: isNumeric(buckets, sourceText(), SECOND).and(checkArgsCount(2));
return isNumeric(buckets, sourceText(), SECOND).and(() -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have (verifier) test cases for when the first argument is numeric but the second is not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

Comment on lines +342 to +343
resolution = checkArgsCount(4).and(() -> isNumeric(from, sourceText(), THIRD))
.and(() -> isNumeric(to, sourceText(), FOURTH));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have (verifier) test cases for when the args count is higher than 4, or the 3rd and 4th are not numeric?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some.
Not for a higher than 4 count, though, since this would be a general parser exception, raised when resolving the function.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the iteration @bpintea !

@@ -374,6 +374,66 @@ public void testGroupingInsideGrouping() {
);
}

public void testInvalidBucketCalls() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I think the validation for BUCKET is complex enough that we'll be very happy about this test when we make changes in the future.

@@ -3514,6 +3514,49 @@ public void testBucketWithAggExpression() {
assertThat(agg.groupings().get(0), is(ref));
}

public void testBucketWithNonFoldingArgs() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: wouldn't these be better placed in the VerifierTests? I wouldn't know to look here for these tests (except when they're failing :) )

Copy link
Contributor Author

@bpintea bpintea Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out we do another validation past optimisations, to be able to take the folding into account. (Not sure if we'll ever need this for anything else than supporting ROW, but that's how it now works.)

@bpintea bpintea merged commit dd49c33 into elastic:main Aug 20, 2024
16 checks passed
@bpintea bpintea deleted the enh/bucket_span_as_whole branch August 20, 2024 11:41
lkts pushed a commit to lkts/elasticsearch that referenced this pull request Aug 20, 2024
This laxes the check on numerical spans to allow them be specified as whole numbers. So far it was required that they be provided as a double.

This also expands the tests for date ranges to include string types.

Resolves elastic#109340, resolves elastic#104646, resolves elastic#105375.
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Sep 4, 2024
This laxes the check on numerical spans to allow them be specified as whole numbers. So far it was required that they be provided as a double.

This also expands the tests for date ranges to include string types.

Resolves elastic#109340, resolves elastic#104646, resolves elastic#105375.
davidkyle pushed a commit to davidkyle/elasticsearch that referenced this pull request Sep 5, 2024
This laxes the check on numerical spans to allow them be specified as whole numbers. So far it was required that they be provided as a double.

This also expands the tests for date ranges to include string types.

Resolves elastic#109340, resolves elastic#104646, resolves elastic#105375.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement ES|QL-ui Impacts ES|QL UI Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) test-update-serverless v8.16.0
Projects
None yet
4 participants