Add ES|QL bit_length function #115792

Merged
timgrein merged 32 commits into elastic:main on Nov 7, 2024


timgrein
Contributor

@timgrein timgrein commented Oct 28, 2024

(Spacetime project) :-)

Adds one new string function, BIT_LENGTH.

Example:

PUT test_index
{
  "mappings": {
    "properties": {
      "keyword_field": {
        "type": "keyword"
      },
      "text_field": {
        "type": "text"
      }
    }
  }
}

POST test_index/_doc
{
    "keyword_field": "abc",
    "text_field": "☕"
}

POST _query
{
  "query": "FROM test_index | EVAL x = bit_length(keyword_field) | EVAL y = bit_length(text_field)"
}
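
As a rough illustration of what the two EVAL columns above should contain, the sketch below assumes BIT_LENGTH counts the UTF-8 bytes of the string and multiplies by 8, which matches the `val.length * Byte.SIZE` evaluator discussed later in this thread. The class and method names are hypothetical and not part of the PR.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for BIT_LENGTH: UTF-8 byte count times 8.
// This is an illustration only, not the ES|QL implementation.
final class BitLengthSketch {
    static int bitLength(String value) {
        int byteCount = value.getBytes(StandardCharsets.UTF_8).length;
        return byteCount * Byte.SIZE;
    }

    public static void main(String[] args) {
        // "abc" is three one-byte characters in UTF-8.
        System.out.println(bitLength("abc"));    // 24
        // The coffee cup (U+2615) is a single character encoded as three bytes.
        System.out.println(bitLength("\u2615")); // 24
    }
}
```

Under that assumption both x and y come out as 24, even though the two strings differ in character count.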

@timgrein timgrein added >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL v9.0.0 labels Oct 28, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Collaborator

Hi @timgrein, I've created a changelog YAML for you.

/**
* Support for function {@code GREATEST}. Done in #98630.
*/
FN_GREATEST,
Contributor Author

Added this for a bit_length test case, where I use the new function inside another function (see new test case bitLengthInsideOtherFunction in string.csv-spec)

Contributor

Hey! You shouldn't need this. The capabilities in the CSV tests are only used to tell whether a node knows about a feature. If a node has BIT_LENGTH, it knows about GREATEST too, so requiring just BIT_LENGTH should be enough.

Also, GREATEST has been in ESQL for a long time, so adding this now would be weird 👀

Contributor Author

Thanks for the explanation :) Adjusted it in the commit "Remove FN_GREATEST capability and only use the most recent in bitLeng…"

@timgrein timgrein mentioned this pull request Oct 28, 2024
75 tasks
@timgrein timgrein removed the v8.17.0 label Oct 29, 2024
@timgrein timgrein added auto-backport Automatically create backport pull requests when merged v8.16.0 labels Oct 29, 2024
@timgrein
Contributor Author

@elasticmachine update branch

@timgrein timgrein added v8.17.0 auto-backport Automatically create backport pull requests when merged v9.0.0 and removed auto-backport Automatically create backport pull requests when merged v8.16.0 v9.0.0 labels Oct 29, 2024
timgrein and others added 2 commits October 30, 2024 13:22
Contributor

@bpintea bpintea left a comment

Left one noteworthy remark, LG otherwise.


@Evaluator
static int process(BytesRef val) {
return val.length * Byte.SIZE;
Contributor

We should use Math.multiplyExact(), as this will overflow for values under 300MB: the int result wraps once the length exceeds Integer.MAX_VALUE / 8 bytes, roughly 256 MiB. Documents that large might hit other limits first, but better to be safe.
We'll then need to declare the thrown exception in the @Evaluator annotation, so that the evaluator turns it into a null.
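
To make the overflow concrete, here is a minimal, self-contained sketch comparing plain multiplication with Math.multiplyExact(). The class and method names are hypothetical, and the try/catch only mimics the idea of turning the exception into a null; it is not the generated evaluator code from this PR.

```java
// Hypothetical sketch; not the PR's generated evaluator.
final class BitLengthOverflowSketch {
    // Plain multiplication silently wraps around on int overflow.
    static int unchecked(int byteLength) {
        return byteLength * Byte.SIZE;
    }

    // Math.multiplyExact throws ArithmeticException instead of wrapping.
    // Mapping the exception to null imitates a null result for the row.
    static Integer checkedOrNull(int byteLength) {
        try {
            return Math.multiplyExact(byteLength, Byte.SIZE);
        } catch (ArithmeticException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Largest byte length whose bit length still fits in an int.
        int limit = Integer.MAX_VALUE / Byte.SIZE; // 268435455, just under 256 MiB
        System.out.println(checkedOrNull(limit));     // 2147483640
        System.out.println(unchecked(limit + 1));     // -2147483648 (wrapped)
        System.out.println(checkedOrNull(limit + 1)); // null
    }
}
```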

Contributor Author

Good point! Adjusted in the commit "Catch ArithmeticException in Evaluator and turn expression result into null value".

Really cool meta-programming stuff you've built with code generation based on annotations 👏

Comment on lines 78 to 82
if (childrenResolved() == false) {
return new TypeResolution("Unresolved children");
}

return isString(field(), sourceText(), DEFAULT);
Contributor

Nit/style optional: could use the ternary operator.

Contributor Author

Fair point, adjusted in the commit "Use ternary operator for type resolution".
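
For readers comparing the two styles, the sketch below shows the if/return form next to an equivalent ternary. It uses plain strings as hypothetical stand-ins for TypeResolution, childrenResolved() and isString(), which live in the ES|QL code base and are not reproduced here.

```java
// Hypothetical stand-ins only; the real types are part of the ES|QL code base.
final class TernaryResolutionSketch {
    static final String RESOLVED = "resolved";

    // Stand-in for the string type check on the function argument.
    static String checkString(boolean argumentIsString) {
        return argumentIsString ? RESOLVED : "argument must be a string";
    }

    // If/return form, shaped like the snippet under review.
    static String resolveWithIf(boolean childrenResolved, boolean argumentIsString) {
        if (childrenResolved == false) {
            return "Unresolved children";
        }
        return checkString(argumentIsString);
    }

    // Equivalent ternary form, as suggested by the reviewer.
    static String resolveWithTernary(boolean childrenResolved, boolean argumentIsString) {
        return childrenResolved == false ? "Unresolved children" : checkString(argumentIsString);
    }
}
```

Both methods return the same value for every combination of inputs, so the choice is purely stylistic.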

public static Iterable<Object[]> parameters() {
List<TestCaseSupplier> suppliers = new ArrayList<>();

for (DataType stringType : new DataType[] { DataType.KEYWORD, DataType.TEXT }) {
Contributor

Could use DataType.STRING_TYPES.

@@ -30,7 +30,7 @@ setup:
 - method: POST
   path: /_query
   parameters: []
-  capabilities: [ snapshot_test_for_telemetry ]
+  capabilities: [ snapshot_test_for_telemetry, fn_bit_length ]
Contributor

👍

Contributor Author

This actually doesn't seem to cut it. I'm still running into the issue that the REST compatibility tests for the older 8.x branch fail with:

> Task :distribution:bwc:minor:checkoutBwcBranch
Performing checkout of elastic/8.x...
Checkout hash for :distribution:bwc:minor is 9f8cf46bec9afc64bd4f9d63cb547bea6f65fc51
[...]
REPRODUCE WITH: ./gradlew ":x-pack:plugin:yamlRestCompatTest" --tests "org.elasticsearch.xpack.test.rest.XPackRestIT.test {p0=esql/60_usage/Basic ESQL usage output (telemetry) snapshot version}" -Dtests.seed=4FAF1A8CF690062D -Dtests.locale=az-Latn-AZ -Dtests.timezone=Etc/GMT-5 -Druntime.java=22
XPackRestIT > test {p0=esql/60_usage/Basic ESQL usage output (telemetry) snapshot version} FAILED
    java.lang.AssertionError: Failure at [esql/60_usage:164]: field [esql.functions] doesn't have length [117]
    Expected: <117>
         but: was <118>

Any additional ideas @bpintea ?

Contributor

My first idea would have been to use only fn_bit_length. My gut feeling is that it should be the only one needed, but I am not certain. I would try that as well, and maybe dig a bit deeper to understand exactly why XPackRestIT fails.

Contributor Author

Adjusted in the commit "Only use most recent capability in usage tests". Let's see ... 👀 If it still fails, I'll dig in to see what's going wrong.

Contributor Author

@timgrein timgrein Oct 31, 2024

Just to clarify my own understanding of the BWC REST compatibility tests: how exactly does this work? Does the old usage test (from branch 8.x...; basically mimicking an "old" client) get executed against current main? If so, the observed error would make sense, as main now has one more ES|QL function, while the old test expects |esql functions on main| - |functions which were added after 8.x.|. Also, adding the fn_bit_length capability wouldn't solve the problem, because this change is not present on the older branch, right?

One potential solution could be to rewrite the test to assert that the response contains (at least) an explicit set of expected functions, rather than asserting on the number of functions. If you keep the assertion on the number of functions, you could also change it to a greater-than-or-equal assertion, which accounts for the fact that new functions can be added, but no function should be removed in later versions?
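
The three assertion styles mentioned here can be sketched as follows. This is a plain-Java illustration with hypothetical names and toy data; the real test is a YAML REST test, and the sets below are not the actual ES|QL function list.

```java
import java.util.Set;

// Hypothetical sketch of the assertion styles; names and data are illustrative.
final class FunctionCountAssertionSketch {
    // Exact-count check: fails as soon as the server gains one more function.
    static boolean exactCount(Set<String> reported, int expectedCount) {
        return reported.size() == expectedCount;
    }

    // Lower-bound check: tolerates added functions, still catches removals.
    static boolean atLeast(Set<String> reported, int minimumCount) {
        return reported.size() >= minimumCount;
    }

    // Explicit-set check: every function the old test knows must still be reported.
    static boolean containsAll(Set<String> reported, Set<String> expectedNames) {
        return reported.containsAll(expectedNames);
    }

    public static void main(String[] args) {
        Set<String> knownToOldTest = Set.of("length", "greatest");
        Set<String> reportedByNewServer = Set.of("length", "greatest", "bit_length");

        System.out.println(exactCount(reportedByNewServer, knownToOldTest.size())); // false
        System.out.println(atLeast(reportedByNewServer, knownToOldTest.size()));    // true
        System.out.println(containsAll(reportedByNewServer, knownToOldTest));       // true
    }
}
```

Only the exact-count check breaks when an old test runs against a newer server, which is the failure mode shown in the log above (expected 117, was 118).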

Contributor

"Does the old usage test (from branch 8.x...; basically mimicking an "old" client) get executed against current main?"

Yes. This is documented here: "The build system will download the latest prior version of the YAML rest tests and execute them against the current cluster version.".
(I thought it would work as the other BWC tests, that run new queries against old code, but it's the other way around. TIL.)
So indeed, updating the test on main (9.x) has no effect on the test run: we need to update the 8.x test.
@astefan, I think we might need to either skip these tests in REST BWC tests on main, or update the way we check for the functions count on 8.x (which might be tricky, but probably checking against a lower limit might be safe).

Contributor

Also, this replacement (s/snapshot_test_for_telemetry/fn_bit_length/) is then probably no longer necessary.


foldBitLength
required_capability: fn_bit_length
row a = 1 | eval b = bit_length("hello");
Contributor

nit: could be row b = bit_length("hello"), but can stay as is too.

// tag::bitLength[]
FROM employees
| KEEP first_name, last_name
| EVAL fn_bit_length = BIT_LENGTH(first_name)
Contributor

nit: could add a fn_length = LENGTH(first_name) for a quick /8 (reviewing) check, but can stay as is too.

Contributor Author

As these are docs examples it's probably fine to let it solely be focused on bit_length IMHO.

@astefan
Contributor

astefan commented Oct 31, 2024

@timgrein add the following to elasticsearch\x-pack\plugin\build.gradle, inside the tasks.named("yamlRestCompatTestTransform").configure({ task -> block:

  task.skipTest("esql/60_usage/Basic ESQL usage output (telemetry) snapshot version", "The number of functions is constantly increasing")
  task.skipTest("esql/60_usage/Basic ESQL usage output (telemetry) non-snapshot version", "The number of functions is constantly increasing")

The final block should probably look like

tasks.named("yamlRestCompatTestTransform").configure({ task ->
  task.skipTest("security/10_forbidden/Test bulk response with invalid credentials", "warning does not exist for compatibility")
  task.skipTest("inference/inference_crud/Test get all", "Assertions on number of inference models break due to default configs")
  task.skipTest("esql/60_usage/Basic ESQL usage output (telemetry)", "The telemetry output changed. We dropped a column. That's safe.")
  task.skipTest("esql/80_text/reverse text", "The output type changed from TEXT to KEYWORD.")
  task.skipTest("esql/80_text/values function", "The output type changed from TEXT to KEYWORD.")
  task.skipTest("esql/60_usage/Basic ESQL usage output (telemetry) snapshot version", "The number of functions is constantly increasing")
  task.skipTest("esql/60_usage/Basic ESQL usage output (telemetry) non-snapshot version", "The number of functions is constantly increasing")
})

@bpintea
Contributor

bpintea commented Oct 31, 2024

The final block should probably look like
task.skipTest("esql/60_usage/Basic ESQL usage output (telemetry)", "The telemetry output changed. We dropped a column. That's safe.")

@timgrein, this one line above might be superfluous, after adding the two Andrei suggests. Should be easy to check locally (and then through the CI).

…t number of functions won't work on older versions as soon as you add at least one new function on main.
@@ -30,7 +30,7 @@ setup:
 - method: POST
   path: /_query
   parameters: []
-  capabilities: [ snapshot_test_for_telemetry ]
+  capabilities: [ fn_bit_length ]
Contributor

@timgrein apologies, I forgot that when I added the two different capabilities I did it with a twist:

SNAPSHOT_TEST_FOR_TELEMETRY(Build.current().isSnapshot())
NON_SNAPSHOT_TEST_FOR_TELEMETRY(Build.current().isSnapshot() == false)

I think these two capabilities must always be placed in those two tests, together with the new capability: capabilities: [ snapshot_test_for_telemetry, fn_bit_length ]
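
A small sketch of why both capabilities are needed, assuming a test runs only when the node advertises every capability it lists. The enum, the boolean flag standing in for Build.current().isSnapshot(), and the helper names are hypothetical simplifications, not the actual Elasticsearch classes.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of build-dependent capabilities; not the real ES|QL enum.
final class CapabilitySketch {
    enum Cap {
        SNAPSHOT_TEST_FOR_TELEMETRY,
        NON_SNAPSHOT_TEST_FOR_TELEMETRY,
        FN_BIT_LENGTH
    }

    // Capabilities a node would advertise for the given kind of build.
    // Exactly one of the two telemetry capabilities is enabled.
    static Set<Cap> advertised(boolean snapshotBuild) {
        Set<Cap> caps = EnumSet.of(Cap.FN_BIT_LENGTH);
        caps.add(snapshotBuild ? Cap.SNAPSHOT_TEST_FOR_TELEMETRY : Cap.NON_SNAPSHOT_TEST_FOR_TELEMETRY);
        return caps;
    }

    // Assumption: a test runs only if every required capability is advertised.
    static boolean shouldRun(Set<Cap> required, boolean snapshotBuild) {
        return advertised(snapshotBuild).containsAll(required);
    }

    public static void main(String[] args) {
        Set<Cap> both = EnumSet.of(Cap.SNAPSHOT_TEST_FOR_TELEMETRY, Cap.FN_BIT_LENGTH);
        Set<Cap> onlyNew = EnumSet.of(Cap.FN_BIT_LENGTH);

        System.out.println(shouldRun(both, true));     // true
        System.out.println(shouldRun(both, false));    // false: snapshot test skipped on release builds
        System.out.println(shouldRun(onlyNew, false)); // true: snapshot test would wrongly run
    }
}
```

With only fn_bit_length listed, the snapshot variant of the test would also run on non-snapshot builds, which is what the combined list avoids.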

@timgrein
Contributor Author

@elasticmachine update branch

Contributor

@astefan astefan left a comment

LGTM

@@ -656,3 +656,21 @@ FROM sample_data

@timestamp:date | client_ip:ip | event_duration:long | message:keyword
;

docsBitLength
Contributor

I think you need required_capability: fn_bit_length.

@timgrein timgrein merged commit 81fd1de into elastic:main Nov 7, 2024
16 checks passed
@elasticsearchmachine
Collaborator

💚 Backport successful

Status Branch Result
8.x

timgrein added a commit to timgrein/elasticsearch that referenced this pull request Nov 7, 2024
kderusso pushed a commit to kderusso/elasticsearch that referenced this pull request Nov 7, 2024
jozala pushed a commit that referenced this pull request Nov 13, 2024
alexey-ivanov-es pushed a commit to alexey-ivanov-es/elasticsearch that referenced this pull request Nov 28, 2024
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.17.0 v9.0.0
6 participants