-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: BUCKET: allow numerical spans as whole numbers #111874
Conversation
This laxes the check on numerical spans to allow them be specified as whole numbers. So far it was required that they be provided as a double. This also expands the tests for date ranges to include string types.
Documentation preview: |
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Pinging @elastic/kibana-esql (ES|QL-ui) |
Hi @bpintea, I've created a changelog YAML for you. |
Hi @bpintea, I've updated the changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM, but I couldn't find tests for the type validation in case a BUCKET
query is invalid. (Though, it's been a while since I looked at BUCKET
and only made a cursory search for the tests.) Should they actually be missing, maybe we should add them.
}, | ||
{ | ||
"name" : "to", | ||
"type" : "keyword", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit out of scope, but: @fang-xing-esql introduced a mechanism for implicit casting. to
and from
arguments of type keyword
and text
actually only apply to datetime literals, right?
If so, maybe we can slim down BUCKET
s huge type table, and slim down this JSON here as well by moving the handling of datetime strings out of BUCKET
and into ImplicitCasting
?
(Something for a follow up, but wanted to throw it out there as maybe a nicer long term solution.)
@@ -314,6 +314,21 @@ FROM sample_data | |||
3 |2025-10-01T00:00:00.000Z | |||
; | |||
|
|||
bucketByYearLowBucketCount#[skip:-8.13.99, reason:BUCKET extended in 8.14] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this test related to this PR, or is it just adding a test for something that existed?
Is the skip accurate, or should this use a capability instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not directly: it just notes that we're not really respecting the contract of providing up to the number of buckets requested, once the bucket size reaches a year.
The skip is accurate, the test isn't using any newly introduced change.
bucketNumericMixedTypesIntegerSpans | ||
required_capability: bucket_whole_number_as_span | ||
ROW long = TO_LONG(100), double = 99., int = 100 | ||
| STATS BY b1 = BUCKET(long, double::int), b2 = BUCKET(double, long), b3 = BUCKET(int, 49.5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw: How does that work when the buckets
parameter is obtained from a field? Doesn't it have to be a constant for the STATS
to make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't it have to be a constant for the STATS to make sense?
It doesn't have to be a constant, but it has to be a foldable expression (i.e. a field declared with ROW
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
row
is a good source of constants, but we can also define foldable values in eval
(and used with from
). Just wanted to point out that "foldable expression" isn't always something that comes from row
.
@@ -176,20 +174,20 @@ public Bucket( | |||
) Expression field, | |||
@Param( | |||
name = "buckets", | |||
type = { "integer", "double", "date_period", "time_duration" }, | |||
type = { "integer", "long", "double", "date_period", "time_duration" }, | |||
description = "Target number of buckets." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pardon my ignorance, but should that be
description = "Target number of buckets." | |
description = "Target number of buckets, or desired bucket size if `from` and `to` parameters are omitted." |
?
: isNumeric(buckets, sourceText(), SECOND).and(checkArgsCount(2)); | ||
return isNumeric(buckets, sourceText(), SECOND).and(() -> { | ||
if (bucketsType.isRationalNumber()) { | ||
return checkArgsCount(2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have (verifier) test cases for when the 2nd argument is rational but we provide more than 2 arguments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added new tests around incorrect arguments, both in number and type, thanks.
(This should have been done in my earlier change already.)
return bucketsType.isWholeNumber() | ||
? checkArgsCount(4).and(() -> isNumeric(from, sourceText(), THIRD)).and(() -> isNumeric(to, sourceText(), FOURTH)) | ||
: isNumeric(buckets, sourceText(), SECOND).and(checkArgsCount(2)); | ||
return isNumeric(buckets, sourceText(), SECOND).and(() -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have (verifier) test cases for when the first argument is numeric but the second is not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
resolution = checkArgsCount(4).and(() -> isNumeric(from, sourceText(), THIRD)) | ||
.and(() -> isNumeric(to, sourceText(), FOURTH)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have (verifier) test cases for when the args count is higher than 4, or the 3rd and 4th are not numeric?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some.
Not for a higher than 4 count, though, since this would be a general parser exception, raised when resolving the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the iteration @bpintea !
@@ -374,6 +374,66 @@ public void testGroupingInsideGrouping() { | |||
); | |||
} | |||
|
|||
public void testInvalidBucketCalls() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I think the validation for BUCKET
is complex enough that we'll be very happy about this test when we make changes in the future.
@@ -3514,6 +3514,49 @@ public void testBucketWithAggExpression() { | |||
assertThat(agg.groupings().get(0), is(ref)); | |||
} | |||
|
|||
public void testBucketWithNonFoldingArgs() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: wouldn't these be better placed in the VerifierTests
? I wouldn't know to look here for these tests (except when they're failing :) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out we do another validation past optimisations, to be able to take the folding into account. (Not sure if we'll ever need this for anything else than supporting ROW, but that's how it now works.)
This laxes the check on numerical spans to allow them be specified as whole numbers. So far it was required that they be provided as a double. This also expands the tests for date ranges to include string types. Resolves elastic#109340, resolves elastic#104646, resolves elastic#105375.
This laxes the check on numerical spans to allow them be specified as whole numbers. So far it was required that they be provided as a double. This also expands the tests for date ranges to include string types. Resolves elastic#109340, resolves elastic#104646, resolves elastic#105375.
This laxes the check on numerical spans to allow them be specified as whole numbers. So far it was required that they be provided as a double. This also expands the tests for date ranges to include string types. Resolves elastic#109340, resolves elastic#104646, resolves elastic#105375.
This laxes the check on numerical spans to allow them be specified as whole numbers. So far it was required that they be provided as a double.
This also expands the tests for date ranges to include string types.
Resolves #109340, resolves #104646, resolves #105375.