Lucene merges should run on the target shard during recovery #10463

Closed

mikemccand
Contributor

This is already fixed on 2.0, since we let Lucene launch its own merges again.

But in 1.x, Lucene merges might not run on the target during recovery, causing segment explosion when there are many docs to replay and/or the index buffer is low. This then makes recovery time O(N^2) and can cause issues like #9226.
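A rough way to see where the O(N^2) comes from, under simplifying assumptions that are not stated in the PR: every flush writes a fixed number of docs b as one new segment, no merges run, and each replayed doc's version lookup probes every existing segment in the worst case. The symbols N (docs replayed) and b (docs per flush) are introduced here only for illustration.

```latex
% Docs in the j-th flushed batch (j = 0, 1, ..., N/b - 1) each probe j segments.
\[
  \text{probes}
  \;\approx\; \sum_{j=0}^{N/b-1} b\,j
  \;=\; \frac{b}{2}\cdot\frac{N}{b}\left(\frac{N}{b}-1\right)
  \;\approx\; \frac{N^{2}}{2b}
\]
```

Under these assumptions a smaller index buffer (smaller b) makes the constant worse, which matches the "index buffer is low" case described above.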

I just moved launching of the mergeScheduleFuture out of startScheduledTasksIfNeeded (only called once recovery is done) and into createNewEngine. This way whenever the engine is created we also start checking for merges.

I also renamed startScheduledTasksIfNeeded -> startEngineRefresher, and cleaned up a couple unrelated things.
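To illustrate why it matters that merge checks start when the engine is created rather than after recovery, here is a toy model. It is not Elasticsearch or Lucene code: `RecoveryMergeSketch`, `replay`, and the "collapse to one segment" merge rule are hypothetical stand-ins, and the version lookup is assumed to probe every segment.

```java
// Hypothetical, simplified model; not Elasticsearch or Lucene code.
final class RecoveryMergeSketch {

    /**
     * Replays totalDocs operations, flushing a new segment every docsPerFlush docs.
     * When mergeDuringRecovery is true, a merge check runs after each flush and
     * collapses all segments into one once maxSegments is exceeded.
     * Returns {final segment count, total segment probes made by version lookups}.
     */
    static long[] replay(int totalDocs, int docsPerFlush, int maxSegments,
                         boolean mergeDuringRecovery) {
        int segments = 0;
        int buffered = 0;
        long probes = 0;
        for (int doc = 0; doc < totalDocs; doc++) {
            // Assumed worst case: a version lookup probes every flushed segment.
            probes += segments;
            buffered++;
            if (buffered == docsPerFlush) {
                segments++;       // index buffer full: flush a new segment
                buffered = 0;
                if (mergeDuringRecovery && segments > maxSegments) {
                    segments = 1; // crude stand-in for a real merge policy
                }
            }
        }
        return new long[] {segments, probes};
    }

    public static void main(String[] args) {
        long[] unmerged = replay(100_000, 100, 10, false);
        long[] merged = replay(100_000, 100, 10, true);
        System.out.println("no merges:   segments=" + unmerged[0] + " probes=" + unmerged[1]);
        System.out.println("with merges: segments=" + merged[0] + " probes=" + merged[1]);
    }
}
```

In this model the segment count stays bounded only when the merge check is active during replay, which is the effect the change is after.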

@mikemccand
Contributor Author

I moved the mergeScheduleFuture creation to ctor, so now we create it once when the IndexShard is created, not in newEngine.

And I fixed EngineMerge to use engineUnsafe and skip merging if engine is currently null...
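The shape of this revision can be sketched roughly as follows. This is a minimal illustration, not the actual IndexShard code: `ShardSketch`, its nested `Engine` interface, `mergeCheck`, and the `skipped` counter are hypothetical names, and a plain `ScheduledExecutorService` stands in for whatever scheduler the shard really uses.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a shard; not the real Elasticsearch class.
final class ShardSketch implements AutoCloseable {

    /** Hypothetical stand-in for the engine's merge entry point. */
    interface Engine {
        void maybeMerge();
    }

    private final AtomicReference<Engine> engine = new AtomicReference<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final ScheduledFuture<?> mergeScheduleFuture;
    final AtomicInteger skipped = new AtomicInteger();

    ShardSketch(long intervalMillis) {
        // Scheduled once in the constructor, so the periodic check is already
        // running before recovery creates the engine.
        mergeScheduleFuture = scheduler.scheduleWithFixedDelay(
                this::mergeCheck, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Reads the engine without throwing; returns true if a merge check ran. */
    boolean mergeCheck() {
        Engine current = engine.get(); // may be null before the engine exists
        if (current == null) {
            skipped.incrementAndGet();
            return false;              // nothing to merge yet: skip this round
        }
        current.maybeMerge();
        return true;
    }

    void createNewEngine(Engine newEngine) {
        engine.set(newEngine);
    }

    @Override
    public void close() {
        mergeScheduleFuture.cancel(false);
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        try (ShardSketch shard = new ShardSketch(10)) {
            Thread.sleep(50); // the engine has not been created yet
            shard.createNewEngine(() -> System.out.println("merge check ran"));
            Thread.sleep(50);
            System.out.println("rounds skipped before the engine existed: "
                    + shard.skipped.get());
        }
    }
}
```

Skipping on a null engine rather than throwing matters in this sketch because `ScheduledExecutorService` suppresses all later runs of a periodic task once one run throws.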

@bleskes
Contributor

bleskes commented Apr 7, 2015

LGTM

mikemccand added the >bug label Apr 7, 2015
mikemccand added a commit that referenced this pull request Apr 7, 2015
This does not affect 2.0, where we let Lucene launch merges normally
(#8643).

In 1.x, every 1 sec (default), we ask Lucene to kick off any new
merges, but we unfortunately don't turn that logic on in the target
shard until after recovery has finished.

This means if you have a large translog, and/or a smallish index
buffer, way too many segments can accumulate in the target shard
during recovery, making version lookups slower and slower (O(N^2))
and possibly causing slow recovery issues like #9226.

This fix changes IndexShard to launch merges as soon as the shard is
created, so merging runs during recovery.

Closes #10463
mikemccand closed this Apr 7, 2015
clintongormley added the :Core/Infra/Core label Apr 9, 2015
mikemccand added a commit to mikemccand/elasticsearch that referenced this pull request Apr 11, 2015
mikemccand added a commit that referenced this pull request Apr 11, 2015
kimchy added the v1.4.5 label Apr 11, 2015
clintongormley changed the title from "Core: Lucene merges should run on the target shard during recovery" to "Lucene merges should run on the target shard during recovery" May 30, 2015
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015