[CI] 6.8 BWC tests against many 5.6.x versions failing #48114

droberts195 · 2019-10-16T09:21:49Z

These failures started occurring on 15th October. Since then, many 6.8 BWC tests against 5.6.x versions have failed. For example, look at the failure pattern in:

Not all versions are affected. For example 5.6.13 has been fine:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.8+default-distro+bwc/BWC_VERSION=5.6.13,nodes=centos-7&&immutable

The errors vary. For example:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.8+default-distro+bwc/BWC_VERSION=5.6.10,nodes=centos-7&&immutable/210/console is java.lang.AssertionError: IndexVersionValue{version=1, seqNo=-2, term=1, location=null}
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.8+default-distro+bwc/BWC_VERSION=5.6.6,nodes=centos-7&&immutable/210/console is java.lang.AssertionError: IndexVersionValue{version=1, seqNo=-2, term=1, location=null}
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.8+default-distro+bwc/BWC_VERSION=5.6.2,nodes=centos-7&&immutable/210/console is a suite timeout
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.8+default-distro+bwc/BWC_VERSION=5.6.4,nodes=centos-7&&immutable/210/console is a suite timeout

Infra has also noticed that some of the workers are running out of disk space due to huge console logs. For example https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.8+default-distro+bwc/BWC_VERSION=5.6.6,nodes=centos-7&&immutable/210/consoleText is 352MB.

Was something changed recently that increased the parallelism of 6.8 BWC tests? If so, that probably explains it: the level of parallelism seems to be beyond what the currently configured workers can cope with.

I'm tagging the distributed team in case the java.lang.AssertionError: IndexVersionValue{version=1, seqNo=-2, term=1, location=null} errors are a worry, and also the core build team in case this is all down to trying to run too much in parallel.


elasticmachine · 2019-10-16T09:21:50Z

Pinging @elastic/es-core-infra (:Core/Infra/Build)

elasticmachine · 2019-10-16T09:21:52Z

Pinging @elastic/es-distributed (:Distributed/Engine)

dnhatn · 2019-10-16T13:59:45Z

java.lang.AssertionError: IndexVersionValue{version=1, seqNo=-2, term=1, location=null}

#47895 in 6.8 is most likely the cause here. In a mixed cluster of 6.8 and 5.6 nodes, we do not always have the sequence numbers that the refresh optimization relies on.

I have reverted #47895 in 6.8 in 97e64fe.

@droberts195 Thanks for the ping.

We might not have sequence numbers in a mixed cluster between 6.8 and 5.6. In this case, we should refresh unconditionally; otherwise, we can apply the refresh optimization. An alternative is to use translog locations instead of the local checkpoint for this optimization.

Closes #48114
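The rule described above can be illustrated with a minimal sketch. This is not the Elasticsearch engine code: the class, its fields, and its method names are hypothetical, and the sentinel value of -2 for "no sequence number" is an assumption taken from the `seqNo=-2` in the assertion message quoted in this issue. The local checkpoint is also simplified to "highest sequence number seen".

```java
// Hypothetical, simplified model of the refresh decision described above.
// None of these names are taken from the Elasticsearch code base.
class RealtimeGetRefreshSketch {
    // Assumed sentinel for "no sequence number", matching seqNo=-2 in the
    // assertion message quoted in this issue.
    static final long UNASSIGNED_SEQ_NO = -2L;

    private long localCheckpoint = -1L;         // highest seqNo indexed so far (simplified)
    private long lastRefreshedCheckpoint = -1L; // checkpoint made visible by the last refresh
    private int refreshCount = 0;

    // Record an indexing operation; operations without a seqNo do not move the checkpoint.
    synchronized void index(long seqNo) {
        if (seqNo >= 0) {
            localCheckpoint = Math.max(localCheckpoint, seqNo);
        }
    }

    // Make everything indexed so far visible to readers.
    synchronized void refresh() {
        lastRefreshedCheckpoint = localCheckpoint;
        refreshCount++;
    }

    // Returns true if a refresh was performed for this realtime get.
    synchronized boolean refreshForRealtimeGet(long seqNo) {
        // No sequence number (e.g. the operation came from a 5.6 node):
        // the optimization cannot be applied, so refresh unconditionally.
        // Otherwise refresh only if the last refresh does not cover seqNo yet.
        if (seqNo == UNASSIGNED_SEQ_NO || lastRefreshedCheckpoint < seqNo) {
            refresh();
            return true;
        }
        return false;
    }

    synchronized int refreshCount() {
        return refreshCount;
    }
}
```

In this sketch, a get for an operation whose sequence number is already covered by the last refresh skips the refresh, while a get for an operation without a sequence number always refreshes instead of tripping an assertion.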

droberts195 added the :Delivery/Build, >test-failure, and :Distributed Indexing/Engine labels Oct 16, 2019

dnhatn closed this as completed Oct 16, 2019

dnhatn mentioned this issue Oct 16, 2019

Fix refresh optimization for realtime get in mixed cluster #48151

Merged

mark-vieira added the Team:Delivery label Nov 11, 2020
