[CI] WatchBackwardsCompatibilityIT failures #40841
Pinging @elastic/es-core-features
…Term(...) for updating watch status if all nodes are at least on 6.7.0. Otherwise, fall back to using UpdateRequest#version(...). Closes elastic#40841
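The commit message above describes picking the concurrency-control mechanism for watch status updates based on the oldest node in the cluster. Below is a minimal, self-contained sketch of that decision only. `NodeVersion`, `WriteStrategy`, and `choose` are hypothetical names (not Elasticsearch's `Version` class or Watcher's actual code), and the 6.7.0 threshold is taken from the commit message.

```java
import java.util.List;

public class WatchStatusUpdateStrategy {

    // Hypothetical, simplified stand-in for a node's version (major.minor.patch).
    record NodeVersion(int major, int minor, int patch) implements Comparable<NodeVersion> {
        @Override
        public int compareTo(NodeVersion other) {
            if (major != other.major) return Integer.compare(major, other.major);
            if (minor != other.minor) return Integer.compare(minor, other.minor);
            return Integer.compare(patch, other.patch);
        }
    }

    // Hypothetical labels for the two concurrency-control options.
    enum WriteStrategy { SEQ_NO_AND_PRIMARY_TERM, VERSION }

    // Threshold from the commit message: every node must be on at least 6.7.0.
    static final NodeVersion MIN_VERSION_FOR_SEQ_NO = new NodeVersion(6, 7, 0);

    static WriteStrategy choose(List<NodeVersion> nodeVersions) {
        // The oldest node decides: one older node in a mixed cluster forces the fallback.
        NodeVersion oldest = nodeVersions.stream()
                .min(NodeVersion::compareTo)
                .orElseThrow(() -> new IllegalArgumentException("no nodes"));
        return oldest.compareTo(MIN_VERSION_FOR_SEQ_NO) >= 0
                ? WriteStrategy.SEQ_NO_AND_PRIMARY_TERM
                : WriteStrategy.VERSION;
    }

    public static void main(String[] args) {
        List<NodeVersion> mixed = List.of(new NodeVersion(6, 7, 3), new NodeVersion(6, 1, 0));
        List<NodeVersion> upgraded = List.of(new NodeVersion(6, 7, 3), new NodeVersion(6, 8, 0));
        System.out.println(choose(mixed));    // VERSION
        System.out.println(choose(upgraded)); // SEQ_NO_AND_PRIMARY_TERM
    }
}
```

Deciding on the minimum version (rather than the local node's version) matters during a rolling upgrade, because an upgraded node may route the write to a shard held by a node that is still on the old version.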
This should be fixed by #40888
I have seen a similar error in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.7+bwc-tests/364/console
The error is on the one-third-upgraded node (upgrading from 6.1.0 to 6.7.3). I grabbed the relevant logs.
I suspect https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.7+bwc-tests/366/console and https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.8+bwc-tests/27/console have the same root cause. In both cases, a node seems to have died unexpectedly.
There have been more failures of this today (and I suspect every day last week too, as nobody was doing test triage during the all-hands event).
These all manifest themselves in the test log as a failure to kill a process, for example:
Then, in the log of the one-third-upgraded cluster node, there is the error from the original issue description:
All 6.7 and 6.8 BWC builds still in the history (~11 days) have failed with this problem:
One more here from today: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.8+bwc-tests/62/
I can reproduce this locally with
but not
I am pretty confident that this error:
is not related to Watcher. Still looking, but this could indicate more general issues with rolling upgrades from early versions of 6.x.
Another failure: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.8+bwc-tests/72/console
Got to the root cause of this failure and logged a new issue: #42561. This isn't really a Watcher issue; it shows up in the Watcher tests because Watcher continually issues partial document updates (to .watches) while in a mixed cluster. In a real-world reproduction this results in an IllegalStateException, but since the tests run with assertions enabled, the assertion crashes the running JVM. This only impacts the 6.x branches, and I am a bit hesitant to simply mute all Watcher BWC tests (which would be required) since they discovered a real issue. We could simply remove (or comment out) the assertion; the tests pass just fine without it (though there are extra errors in the logs). I would prefer that we comment out the assertion with a link to the issue. This would allow the tests to pass without having to disable all of Watcher's BWC tests for reasons Watcher does not control. If other BWC tests get caught up on this, it would allow them to pass too. Thoughts on commenting out the assertion in 6.8 (as opposed to muting the tests)?
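The difference between test and real-world behavior described above comes from Java assertions, which are only evaluated when the JVM runs with `-ea` (as test JVMs typically do). The sketch below is hypothetical and is not the actual code path from #42561: `applyPartialUpdate` guards an invariant first with an `assert` and then with an explicit `IllegalStateException`, so which failure a caller sees depends on whether assertions are enabled. Note that `AssertionError` extends `Error`, not `Exception`, so handlers written for exceptions do not catch it.

```java
public class AssertVsException {

    // Hypothetical invariant check standing in for the real one discussed in #42561.
    static void applyPartialUpdate(boolean invariantHolds) {
        // Only evaluated under -ea; throws AssertionError, which is an Error, not an Exception.
        assert invariantHolds : "invariant violated while applying partial update";
        // Always evaluated; with assertions disabled this is the failure callers see.
        if (invariantHolds == false) {
            throw new IllegalStateException("invariant violated while applying partial update");
        }
    }

    // Common idiom for detecting whether assertions are enabled for this class.
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true; // the assignment only runs when assertions are enabled
        return enabled;
    }

    // Returns the simple class name of whatever the call throws, or "none".
    static String outcome(boolean invariantHolds) {
        try {
            applyPartialUpdate(invariantHolds);
            return "none";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
        System.out.println("outcome on violation: " + outcome(false));
    }
}
```

Commenting out the `assert` line, as proposed above, would leave only the `IllegalStateException` path, which matches what a production node (running without `-ea`) would already do.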
Pinging @elastic/es-distributed |
Great find. Thanks @jakelandis. This is fixed by #42596.
Several BWC tests fail because of connection timeouts. In the ES logs of the upgraded node, the following fatal error can be found:
I suspect that this assertion error is caused by Watcher trying to update a watch status, because the Watcher tests were the first to fail with a connection timeout. Also, Watcher's execution service does not take into account nodes before 6.6.0, which don't support the if_seq_no and if_primary_term parameters on write requests.
Build URL: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.7+bwc-tests/226/console
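The if_seq_no and if_primary_term parameters mentioned above implement optimistic concurrency control: a write only succeeds if the document still carries the sequence number and primary term the client last read. The following is a simplified in-memory sketch of that check under stated assumptions: `ConditionalStore`, `VersionedDoc`, `index`, and `indexIf` are hypothetical names, a single counter stands in for one shard's sequence numbers, the primary term never changes, and a rejected write returns `false` where Elasticsearch would report a version conflict.

```java
import java.util.HashMap;
import java.util.Map;

public class ConditionalStore {

    // Hypothetical stored document: its body plus the seq_no/primary_term of its last write.
    record VersionedDoc(String source, long seqNo, long primaryTerm) {}

    private final Map<String, VersionedDoc> docs = new HashMap<>();
    private long nextSeqNo = 0;            // one counter for the whole store, like a single shard
    private final long primaryTerm = 1;    // fixed here; would change if a new primary were promoted

    // Unconditional write; returns the metadata a client would read back.
    synchronized VersionedDoc index(String id, String source) {
        VersionedDoc doc = new VersionedDoc(source, nextSeqNo++, primaryTerm);
        docs.put(id, doc);
        return doc;
    }

    // Conditional write: succeeds only if nobody has written the document since it was read.
    synchronized boolean indexIf(String id, String source, long ifSeqNo, long ifPrimaryTerm) {
        VersionedDoc current = docs.get(id);
        if (current == null || current.seqNo() != ifSeqNo || current.primaryTerm() != ifPrimaryTerm) {
            return false; // Elasticsearch would report a version conflict here
        }
        index(id, source);
        return true;
    }

    public static void main(String[] args) {
        ConditionalStore store = new ConditionalStore();
        VersionedDoc read = store.index("watch-1", "{\"status\":\"a\"}");
        // The first writer uses the metadata it read and succeeds.
        System.out.println(store.indexIf("watch-1", "{\"status\":\"b\"}", read.seqNo(), read.primaryTerm())); // true
        // A second writer holding the same, now stale, metadata is rejected.
        System.out.println(store.indexIf("watch-1", "{\"status\":\"c\"}", read.seqNo(), read.primaryTerm())); // false
    }
}
```

A node older than 6.6.0 does not understand these parameters at all, which is why the version-based fallback is needed while such nodes remain in the cluster.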