Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ESQL] Add SPACE function #112350

Merged
merged 25 commits into from
Sep 9, 2024
Merged

Conversation

chrisberkhout
Copy link
Contributor

Adds the SPACE(number) function, which is equivalent to REPEAT(" ", number).

Copy link
Contributor

Documentation preview:

@elasticsearchmachine elasticsearchmachine added v8.16.0 Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) external-contributor Pull request authored by a developer outside the Elasticsearch team labels Aug 29, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @chrisberkhout, I've created a changelog YAML for you.

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I've added a note about the "constant" flavor of the evaluator - I feel like you should either remove it or flip it to building the space string up front. I asked for two extra tests as well, one out of an abundance of paranoia and one for wire serialization testing.

While you are there, could you add those two tests to the guide? I'll bet they aren't in there because, well, I forgot.

Copy link
Contributor

@ivancea ivancea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 🚀
Added some comments I think would simplify code a bit

@ivancea
Copy link
Contributor

ivancea commented Aug 30, 2024

@astefan Oh, good catch. Looks like a problem in the toEvaluator .fold() cast to int. It's the evaluator the one converting multivalues to null/warnings, so it's too early to blindly cast there.

About the tests infra, indeed, we should add an automatic test case generator for multivalues, similar to the one we have for nulls. Created an issue here: #112396
We can use that issue to also fix locate and whatever other function that fails

@nik9000
Copy link
Member

nik9000 commented Aug 30, 2024

fails with

Fun! I just bumped into a similar error in case. I think we're not used to dealing with multivalued fields on the folding path.

@chrisberkhout
Copy link
Contributor Author

chrisberkhout commented Aug 31, 2024

I resolved the issue of multi-value on the fold path.

The current logic is:

  • If it's foldable to an integer -> fail hard if the integer is out of range, otherwise build literal spaces.
  • If it's not foldable to an integer -> warn and return null if it's out of range or multi-value, otherwise build the BytesRef.

So that matches the general documentation regarding multi-value handling, but still gives early errors about out-of-range integers when foldable.

If that's too complicated I can take out the fold path and send everything to the evaluator.

Copy link
Contributor

@ivancea ivancea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's too complicated I can take out the fold path and send everything to the evaluator.

I think it's ok that way. It improves what looks like a common usecase of SPACE (constant param), so nice!
We could also throw and fail the query if it's a foldable multivalue, but I'm not sure if we're doing that in other functions, or if we want to do that at all.
Cc-ing @costin in case he has an opinion on that

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Thanks @chrisberkhout

@astefan astefan added the ES|QL-ui Impacts ES|QL UI label Sep 3, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/kibana-esql (ES|QL-ui)

@costin
Copy link
Member

costin commented Sep 3, 2024

We could also throw and fail the query if it's a foldable multivalue, but I'm not sure if we're doing that in other functions, or if we want to do that at all.

Hi - sorry for responding late.
in ESQL functions don't throw exceptions, only null + warnings. Hence the error for out of range integers is not good - better to send the call to the underlying evaluator.
If it's too late (as this PR has been already approved), feel free to address this in a follow PR.

@nik9000
Copy link
Member

nik9000 commented Sep 3, 2024

in ESQL functions don't throw exceptions, only null + warnings. Hence the error for out of range integers is not good - better to send the call to the underlying evaluator.

They do! The exceptions are how we signal null + warnings at runtime. We have, like, a catch list.

And failing hard for invalid stuff is a thing we do all the time at parse time. So long as we can do it at parse time.

@costin
Copy link
Member

costin commented Sep 3, 2024

Right - I mean the users doesn't see the underlying exception. @ivancea pointed out the foldable aspect - for this case look at the Validatable interface; to bail out when the arguments (that are foldable) are invalid but before executing the query.

@nik9000
Copy link
Member

nik9000 commented Sep 3, 2024

I wonder if we can make the Validatable interface more obvious. I often forget about it. And in this case it feels like throwing an exception from fold should communicate the warning well. I suppose it just doesn't?

@costin
Copy link
Member

costin commented Sep 3, 2024

The subtle difference between folding throwing an exception and Validatable is the former calls the evaluator which is used at runtime hence why exceptions are sent as warnings while the latter only occurs during plan validation where an exception is no longer a warning but a query stopper: the query definition is incorrect and should be corrected before warning.

An improvement separate from this PR is to catch the warning in foldable and bubble it up similar to how Validatable does it.

@chrisberkhout
Copy link
Contributor Author

I tried implementing Validatable and found that when the function is foldable, it is folded before it is validated.

In LogicalPlanOptimizer#optimize, RuleExecutor#execute runs before LogicalVerifier#verify. During the rule execution, the PropagateEvalFoldables rule will try to build literals of foldables by folding them.

The two functions that currently implement Validatable with some logic, MvSort and Bucket, do still throw exceptions in toEvaluator. When the function is not foldable, they can generate validation failures regarding parameters that are or should be foldable. The Space function is always foldable if its parameter is foldable, so the case of validation before folding doesn't exist there.

If validation could be done earlier or folding done later, then validation would have a chance to run, but that seems like separate work.

For now it seems like the best options are still:

  1. When foldable, let toEvaluator return a literal or throw an exception (as is)
  2. No special handing for foldable, everything goes to the evaluator

@ivancea
Copy link
Contributor

ivancea commented Sep 9, 2024

Hey @chrisberkhout!
Thanks for checking it! I've been talking about this with Andrei, and we'll see how it's evolved, if it makes sense to use Validatable, or other mechanism. Not only for this specific function, but in general for other potential cases.
So, merge whenever you want; it's nice as it's now, and if we end up with a "fix" or something for that folding error usecase, we'll update and document it!
:shipit:

@chrisberkhout
Copy link
Contributor Author

Thanks @ivancea!

@chrisberkhout chrisberkhout merged commit fbaeb1e into elastic:main Sep 9, 2024
15 checks passed
@chrisberkhout chrisberkhout deleted the esql-functions-space branch September 9, 2024 11:43
elasticsearchmachine pushed a commit that referenced this pull request Sep 9, 2024
Just a race condition while merging two PRs (#112055 and #112350). 

Fixes #112659 Fixes #112660 Fixes #112661
@nik9000 nik9000 mentioned this pull request Sep 10, 2024
75 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement ES|QL-ui Impacts ES|QL UI external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.16.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants