-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ES|QL: Add multi-node Spec tests #98849
ES|QL: Add multi-node Spec tests #98849
Conversation
PR #98938 will fix this. |
@luigidellaquila you wanna separate out the changes to PlanNamedTypes ? It would be good to add a unit test to PlanNamedTypesTests, for this too. |
Yes definitely, this PR is mostly to outline possible problems in multi-node execution, but the single fixes will be in separate PRs |
Current failures are due to non-deterministic results for queries without SORT, see #99045 |
Problems related to non-deterministic CSV tests seem to be fixed. Current failures are due to serialization of |
Pinging @elastic/es-ql (Team:QL) |
Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL) |
I believe we are serializing DocBlock because the LocalExecutionPlanner is either selecting the wrong projection mask or skipping the projection incorrectly. I haven't dug into the details yet. Below are two examples of this failure.
The correct projection mask should be {1, 2, 3} instead of {0, 1, 3}. In the below case, we mistakenly removed the projection.
I won't be able to spend more time on this until tomorrow. |
It seems that conflicting nameId is an issue here as the |
Kudos to @dnhatn for figuring the issue - it relates to the deserialization of a NameId in the plan sent from the coordinator on the data nodes. When running in a multi node cluster, each node has its own counter and depending on how the queries get executed, sometimes the NameIds clash. Currently the serialization saves the id as a long so when it gets dehydrated on the other node, it can and will interfere with the nameId of newly created attributes.
Here's an example - a plan is sent to the data node. |
Our integration tests (i.e., EsqlActionIT) don't identify this issue because the testing Elasticsearch nodes run on the same JVM with a single global NameId. |
With the NameId issue resolved, I still see a couple (two) randomly failing tests, which appear to be unrelated to named id or serialisation. They are:
Reproducibly locally with loopy testing, e.g.
|
Thanks Chris, I'll check them, probably just non-deterministic tests |
@ChrisHegarty your fix on NameId worked great. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Adding a module to run multi-node Spec tests, see #98731
The first run spotted a small serialization problem (InvalidMappedField was not properly mapped - fixed with this PR)
WIP: there seem to be more serialization problems that make the tests fail:
assertion failures on block name checks