[CI] 6.8 BWC tests against many 5.6.x versions failing #48114

Closed
droberts195 opened this issue Oct 16, 2019 · 3 comments
Labels
:Delivery/Build Build or test infrastructure :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. Team:Delivery Meta label for Delivery team >test-failure Triaged test failures from CI

Comments

@droberts195
Contributor

These failures started on 15th October. Since then, many 6.8 BWC tests against 5.6.x versions have failed. For example, look at the failure pattern in:

Not all versions are affected. For example 5.6.13 has been fine:

The errors vary. For example:

Infra has also noticed that some of the workers are running out of disk space due to huge console logs. For example https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.8+default-distro+bwc/BWC_VERSION=5.6.6,nodes=centos-7&&immutable/210/consoleText is 352MB.

Did a recent change increase the parallelism of the 6.8 BWC tests? If so, that probably explains it: the level of parallelism seems to be beyond what the currently configured workers can cope with.

I'm tagging the distributed team in case the java.lang.AssertionError: IndexVersionValue{version=1, seqNo=-2, term=1, location=null} errors are a worry, and also the core build team in case this is all down to trying to run too much in parallel.

@droberts195 droberts195 added :Delivery/Build Build or test infrastructure >test-failure Triaged test failures from CI :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. labels Oct 16, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Build)

@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Engine)

@dnhatn
Member

dnhatn commented Oct 16, 2019

java.lang.AssertionError: IndexVersionValue{version=1, seqNo=-2, term=1, location=null}

#47895 in 6.8 should be the source here. We do not always have sequence numbers for the refresh optimization in a mixed cluster between 6.8 and 5.6.

I have reverted #47895 in 6.8 in 97e64fe.

@droberts195 Thanks for the ping.

@dnhatn dnhatn closed this as completed Oct 16, 2019
dnhatn added a commit that referenced this issue Oct 16, 2019
We might not have sequence numbers in a mixed cluster between 6.8 and 
5.6. In this case, we should refresh unconditionally; otherwise, we can
apply the refresh optimization. An alternative is to use translog
locations instead of the local checkpoint for this optimization.

Closes #48114
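
The condition described in the commit message above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the code from the actual commit: the class `RefreshDecision`, the method `mustRefresh`, and the parameter `lastRefreshedCheckpoint` are hypothetical names, and the sentinel value -2 is assumed (from the `seqNo=-2` in the assertion message) to mark an operation that has no sequence number.

```java
public final class RefreshDecision {

    // Assumption: -2 marks an operation without an assigned sequence number,
    // as suggested by "seqNo=-2" in the AssertionError quoted in this issue.
    public static final long UNASSIGNED_SEQ_NO = -2L;

    private RefreshDecision() {}

    /**
     * Hypothetical helper: decides whether a refresh is required before
     * serving a read of an operation with the given sequence number.
     *
     * @param opSeqNo                 sequence number of the operation, or -2 if none
     * @param lastRefreshedCheckpoint highest checkpoint already covered by a refresh
     */
    public static boolean mustRefresh(long opSeqNo, long lastRefreshedCheckpoint) {
        if (opSeqNo == UNASSIGNED_SEQ_NO) {
            // Mixed 6.8/5.6 cluster: no sequence number to compare against,
            // so the optimization cannot be applied safely. Always refresh.
            return true;
        }
        // Optimization: skip the refresh when the operation is already
        // covered by the last refreshed checkpoint.
        return opSeqNo > lastRefreshedCheckpoint;
    }

    public static void main(String[] args) {
        System.out.println(mustRefresh(UNASSIGNED_SEQ_NO, 10L)); // true
        System.out.println(mustRefresh(11L, 10L));               // true
        System.out.println(mustRefresh(5L, 10L));                // false
    }
}
```

The point of the sketch is that the missing-sequence-number case is checked first and never falls through to the comparison, so an operation replicated from a 5.6 node is never treated as already refreshed.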
@mark-vieira mark-vieira added the Team:Delivery Meta label for Delivery team label Nov 11, 2020