Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose translog stats in ReadOnlyEngine #43752

Merged
merged 4 commits into from
Jul 1, 2019

Conversation

tlrx
Copy link
Member

@tlrx tlrx commented Jun 28, 2019

This pull request changes ReadOnlyEngine so that it loads stats from the translog when the engine is created. It also adds tests to check that translog stats are correctly exposed in Stats API for closed and frozen indices.

@tlrx tlrx added >enhancement :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. v8.0.0 v7.3.0 labels Jun 28, 2019
@tlrx tlrx requested review from ywelsch and dnhatn June 28, 2019 14:08
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@tlrx
Copy link
Member Author

tlrx commented Jun 28, 2019

@elasticmachine run elasticsearch-ci/docbldesx

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I left two minor comments.

---
"Translog stats on closed indices":
- skip:
version: " - 7.2.99"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should skip this test until 7.9.99 then adjust to this value after backporting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird that this did not fail the tests? Perhaps BWC is disabled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pfff... I should have seen this. And yes, bwc tests were disabled at that time. I merged master and bwc are enabled again.

@@ -119,6 +121,7 @@ public ReadOnlyEngine(EngineConfig config, SeqNoStats seqNoStats, TranslogStats
reader = wrapReader(reader, readerWrapperFunction);
searcherManager = new SearcherManager(reader, searcherFactory);
this.docsStats = docsStats(lastCommittedSegmentInfos);
this.translogStats = translogStats != null ? translogStats : translogStats(config, lastCommittedSegmentInfos);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assert that if translogStats is null then obtainLock is true? This ensures that we don't open two translog instances at the same time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ especially as opening the translog here writes a new checkpoint file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very good suggestion, thanks

---
"Translog stats on closed indices":
- skip:
version: " - 7.2.99"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird that this did not fail the tests? Perhaps BWC is disabled?

- do:
cluster.health:
expand_wildcards: all
wait_for_no_initializing_shards: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is only needed for the backport.
Alternatively you could use wait_for_active_shards = 1 on the indices.close action

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right

@@ -119,6 +121,7 @@ public ReadOnlyEngine(EngineConfig config, SeqNoStats seqNoStats, TranslogStats
reader = wrapReader(reader, readerWrapperFunction);
searcherManager = new SearcherManager(reader, searcherFactory);
this.docsStats = docsStats(lastCommittedSegmentInfos);
this.translogStats = translogStats != null ? translogStats : translogStats(config, lastCommittedSegmentInfos);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ especially as opening the translog here writes a new checkpoint file.

@tlrx
Copy link
Member Author

tlrx commented Jul 1, 2019

@dnhatn @ywelsch Thanks for your feedback. I updated the code.

@tlrx tlrx requested a review from ywelsch July 1, 2019 09:27
Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Don't forget to add the wait_for_active_shards condition on the 7.x backport.

@tlrx tlrx merged commit 03c2b27 into elastic:master Jul 1, 2019
@tlrx tlrx deleted the expose-translog-stats branch July 1, 2019 11:52
@tlrx
Copy link
Member Author

tlrx commented Jul 1, 2019

Thanks @ywelsch and @dnhatn

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Jul 1, 2019
tlrx added a commit that referenced this pull request Jul 2, 2019
tlrx added a commit that referenced this pull request Jul 4, 2019
The test IndexShardIT.testIndexCanChangeCustomDataPath() fails
 on 7.x and 7.3 because the translog cannot be recovered.

While I can't reproduce the issue, I think it has been introduced in #43752 
which changed ReadOnlyEngine so that it opens the translog in its 
constructor in order to load the translog stats. This opening writes a 
new checkpoint file, but because 7.x/7.3 does not wait for shards to be 
started after being closed, the test immediately starts to copy shard files
 to a new directory and possibly does not copy all the required translog files.

By waiting for the shards to be started after being closed, we ensure 
that the shards (and engines) have been correctly initialized and that
 the translog checkpoint file is not currently being written.

closes #43964
tlrx added a commit that referenced this pull request Jul 4, 2019
The test IndexShardIT.testIndexCanChangeCustomDataPath() fails
 on 7.x and 7.3 because the translog cannot be recovered.

While I can't reproduce the issue, I think it has been introduced in #43752 
which changed ReadOnlyEngine so that it opens the translog in its 
constructor in order to load the translog stats. This opening writes a 
new checkpoint file, but because 7.x/7.3 does not wait for shards to be 
started after being closed, the test immediately starts to copy shard files
 to a new directory and possibly does not copy all the required translog files.

By waiting for the shards to be started after being closed, we ensure 
that the shards (and engines) have been correctly initialized and that
 the translog checkpoint file is not currently being written.

closes #43964
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v7.3.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants