-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ES|QL bit_length function #115792
Add ES|QL bit_length function #115792
Conversation
Documentation preview: |
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Hi @timgrein, I've created a changelog YAML for you. |
/** | ||
* Support for function {@code GREATEST}. Done in #98630. | ||
*/ | ||
FN_GREATEST, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this for a bit_length
test case, where I use the new function inside another function (see new test case bitLengthInsideOtherFunction
in string.csv-spec
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey! You shouldn't need this. Those capabilities at the CSV tests are just used to know if a node knows about a feature or not. If a node has BIT_LENGTH, it knows about GREATEST too, so just using BIT_LENGTH should be enough.
Also, GREATEST has been in ESQL since long ago, so adding this now would be weird 👀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation :) Adjusted it with Remove FN_GREATEST capability and only use the most recent in bitLeng…
… into esql-string-bit-length-function
@elasticmachine update branch |
… into esql-string-bit-length-function
…expression/function/scalar/string/BitLengthTests.java Co-authored-by: Iván Cea Fontenla <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left one noteworthy remark, LG otherwise.
|
||
@Evaluator | ||
static int process(BytesRef val) { | ||
return val.length * Byte.SIZE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use Math.multiplyExact()
, as this'll overflow with docs under 300MB. Such large docs might hit other limits before this, but better be safe.
We'll then need to declare the thrown exception in the decorator (and thus let it be transformed into a null
by the evaluator).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Adjusted with Catch ArithmeticException in Evaluator and turn expression result into null value.
Really cool meta-programming stuff you've built with code generation based on annotations 👏
if (childrenResolved() == false) { | ||
return new TypeResolution("Unresolved children"); | ||
} | ||
|
||
return isString(field(), sourceText(), DEFAULT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit/style optional: could use the ternary operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point, adjusted with Use ternary operator for type resolution
public static Iterable<Object[]> parameters() { | ||
List<TestCaseSupplier> suppliers = new ArrayList<>(); | ||
|
||
for (DataType stringType : new DataType[] { DataType.KEYWORD, DataType.TEXT }) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use DataType.STRING_TYPES
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -30,7 +30,7 @@ setup: | |||
- method: POST | |||
path: /_query | |||
parameters: [] | |||
capabilities: [ snapshot_test_for_telemetry ] | |||
capabilities: [ snapshot_test_for_telemetry, fn_bit_length ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually doesn't seem to cut it, still running into the issue that rest compatibility tests for older versions 8.x
fail with:
> Task :distribution:bwc:minor:checkoutBwcBranch
Performing checkout of elastic/8.x...
Checkout hash for :distribution:bwc:minor is 9f8cf46bec9afc64bd4f9d63cb547bea6f65fc51
[...]
REPRODUCE WITH: ./gradlew ":x-pack:plugin:yamlRestCompatTest" --tests "org.elasticsearch.xpack.test.rest.XPackRestIT.test {p0=esql/60_usage/Basic ESQL usage output (telemetry) snapshot version}" -Dtests.seed=4FAF1A8CF690062D -Dtests.locale=az-Latn-AZ -Dtests.timezone=Etc/GMT-5 -Druntime.java=22
XPackRestIT > test {p0=esql/60_usage/Basic ESQL usage output (telemetry) snapshot version} FAILED
java.lang.AssertionError: Failure at [esql/60_usage:164]: field [esql.functions] doesn't have length [117]
Expected: <117>
but: was <118>
Any additional ideas @bpintea ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first idea would have been to use only fn_bit_length
. Gut-feeling wise, that should be the only one needed, but I am not certain and I would try it like this as well and maybe dig a bit deeper to understand exactly why XPackRestIT fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjusted with Only use most recent capability in usage tests. Let's see ... 👀 If it still fails I'll try to dig in to see what's going wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify my own understanding of bwc rest compability tests...How does this exactly work? Does the old usage test (from branch 8.x...
; basically mimicking an "old" client) gets executed against current main
? If so, the observed error would make sense as main
has now one ES|QL function more, while the old test expects |esql functions on main| - |functions which were added after 8.x.|
. Also adding the fn_bit_length
capability wouldn't solve the problem, because this change is not present on the older branch, right?
One potential solution could be to rewrite the test to assert that the response contains (at least) an expected explicit set of functions, rather than asserting on the number of functions. If you keep the number of functions assertion you could also change it to a greater or equal than assertion, which account for the fact that new functions can be added, but no function should be removed in later versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the old usage test (from branch 8.x...; basically mimicking an "old" client) gets executed against current main?
Yes. This is documented here: "The build system will download the latest prior version of the YAML rest tests and execute them against the current cluster version."
.
(I thought it would work as the other BWC tests, that run new queries against old code, but it's the other way around. TIL.)
So indeed, updating the test on main (9.x) has no effect on the test run: we need to update the 8.x test.
@astefan, I think we might need to either skip these tests in REST BWC tests on main, or update the way we check for the functions count on 8.x (which might be tricky, but probably checking against a lower limit might be safe).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this s/snapshot_test_for_telemetry
/fn_bit_length
is probably then no longer necessary.
|
||
foldBitLength | ||
required_capability: fn_bit_length | ||
row a = 1 | eval b = bit_length("hello"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could be row b = bit_length("hello")
, but can stay as is too.
// tag::bitLength[] | ||
FROM employees | ||
| KEEP first_name, last_name | ||
| EVAL fn_bit_length = BIT_LENGTH(first_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could add a fn_length = LENGTH(first_name)
for a quick /8 (reviewing) check, but can stay as is too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As these are docs examples it's probably fine to let it solely be focused on bit_length
IMHO.
@timgrein add the following to
The final block should probably look like
|
@timgrein, this one line above might be superfluous, after adding the two Andrei suggests. Should be easy to check locally (and then through the CI). |
…t number of functions won't work on older versions as soon as you add at least one new function on main.
@@ -30,7 +30,7 @@ setup: | |||
- method: POST | |||
path: /_query | |||
parameters: [] | |||
capabilities: [ snapshot_test_for_telemetry ] | |||
capabilities: [ fn_bit_length ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timgrein apologies, I forgot that when I added the two different capabilities I did it with a twist:
SNAPSHOT_TEST_FOR_TELEMETRY(Build.current().isSnapshot())
NON_SNAPSHOT_TEST_FOR_TELEMETRY(Build.current().isSnapshot() == false)
I think these two capabilities must always be placed in those two tests, together with the new capability: capabilities: [ snapshot_test_for_telemetry, fn_bit_length ]
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -656,3 +656,21 @@ FROM sample_data | |||
|
|||
@timestamp:date | client_ip:ip | event_duration:long | message:keyword | |||
; | |||
|
|||
docsBitLength |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need required_capability: fn_bit_length
.
… into esql-string-bit-length-function
💚 Backport successful
|
(Spacetime project) :-)
Adds one new string function aka
BIT_LENGTH
.Example: