CI: org.elasticsearch.index.shard.IndexShardIT#testIndexCanChangeCustomDataPath #43964

Closed
jkakavas opened this issue Jul 4, 2019 · 4 comments
Assignees
Labels
:Distributed Indexing/Engine (Anything around managing Lucene and the Translog in an open shard.)
>test-failure (Triaged test failures from CI)

Comments

@jkakavas
Member

jkakavas commented Jul 4, 2019

Example reproduction


./gradlew :server:integTest --tests "org.elasticsearch.index.shard.IndexShardIT.testIndexCanChangeCustomDataPath" \
  -Dtests.seed=1CAAEC094ABB61B8 \
  -Dtests.security.manager=true \
  -Dtests.locale=he-IL \
  -Dtests.timezone=CAT \
  -Dcompiler.java=12 \
  -Druntime.java=8

Example failure

https://scans.gradle.com/s/zngdfh37brdo2/

Frequency

11 times since yesterday

I will mute this test.

@jkakavas jkakavas added >test-failure Triaged test failures from CI :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. labels Jul 4, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@ywelsch
Contributor

ywelsch commented Jul 4, 2019

I've assigned this to @tlrx, as I suspect it is related to the translog trimming work, which now runs on closed indices. The test assumes that the data on disk does not change for a closed index.

@mayya-sharipova
Contributor

tlrx added a commit that referenced this issue Jul 4, 2019
The test IndexShardIT.testIndexCanChangeCustomDataPath() fails
on 7.x and 7.3 because the translog cannot be recovered.

While I can't reproduce the issue, I think it was introduced in #43752,
which changed ReadOnlyEngine so that it opens the translog in its
constructor in order to load the translog stats. Opening the translog
writes a new checkpoint file, but because 7.x/7.3 does not wait for
shards to be started after the index is closed, the test immediately
starts copying shard files to a new directory and may not copy all the
required translog files.

By waiting for the shards to be started after being closed, we ensure
that the shards (and engines) have been correctly initialized and that
the translog checkpoint file is not currently being written.

closes #43964
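
For readers unfamiliar with this kind of race, here is a minimal, self-contained sketch of the pattern the commit message describes: block until start-up has finished writing its files, and only then copy the directory. It is not the Elasticsearch test code or the actual fix. FakeShard, openAsync, awaitStarted and copyAfterStarted are hypothetical names invented for the illustration, and the two file names are used only as placeholders.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WaitBeforeCopySketch {

    /** Hypothetical stand-in for a shard whose engine writes files while it opens. */
    static final class FakeShard {
        final Path dataDir;
        private final CountDownLatch started = new CountDownLatch(1);

        FakeShard(Path dataDir) {
            this.dataDir = dataDir;
        }

        /** Simulates asynchronous start-up: write the files, then signal "started". */
        Thread openAsync() {
            Thread t = new Thread(() -> {
                try {
                    Files.write(dataDir.resolve("translog-1.tlog"), "ops".getBytes(StandardCharsets.UTF_8));
                    Files.write(dataDir.resolve("translog.ckp"), "checkpoint".getBytes(StandardCharsets.UTF_8));
                    // Signal only after every file has been written successfully.
                    started.countDown();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            t.start();
            return t;
        }

        /** Returns true once start-up has completed, false if the timeout elapsed first. */
        boolean awaitStarted(long timeout, TimeUnit unit) throws InterruptedException {
            return started.await(timeout, unit);
        }
    }

    /** Copies the shard directory only after the shard reports that it has started. */
    static int copyAfterStarted(FakeShard shard, Path target) throws IOException, InterruptedException {
        if (!shard.awaitStarted(10, TimeUnit.SECONDS)) {
            throw new IllegalStateException("shard did not start in time");
        }
        List<Path> files;
        try (Stream<Path> listing = Files.list(shard.dataDir)) {
            files = listing.collect(Collectors.toList());
        }
        for (Path file : files) {
            Files.copy(file, target.resolve(file.getFileName()));
        }
        return files.size();
    }

    /** Runs the scenario end to end and returns the number of files copied. */
    public static int demo() throws Exception {
        Path source = Files.createTempDirectory("shard-src");
        Path target = Files.createTempDirectory("shard-dst");
        FakeShard shard = new FakeShard(source);
        Thread opener = shard.openAsync();
        int copied = copyAfterStarted(shard, target);
        opener.join();
        return copied;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("copied files: " + demo());
    }
}
```

Without the awaitStarted call, the copy could run while the background thread is still writing and miss one or both files, which is the same shape as the failure suspected here.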
@tlrx
Member

tlrx commented Jul 5, 2019

Fixed in #43978

@tlrx tlrx closed this as completed Jul 5, 2019