-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use retention lease in peer recovery of closed indices #48430
Conversation
Pinging @elastic/es-distributed (:Distributed/Distributed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for picking this up Nhat, looks good. Does this mean we can remove the checks for indexSettings().getIndexMetaData().getState() == IndexMetaData.State.OPEN
throughout ReplicationTracker
and simplify useRetentionLeases
to shard.indexSettings().isSoftDeleteEnabled()
in RecoverySourceHandler
?
Edit: Non-replicated closed indices are not instanciated at all and thus have no IndexShard or ReplicationTracker, thanks David for pointing this out.
|
Yes, that is my plan. |
When you say that's your plan, do you mean to do it in a follow-up or in this PR? |
@DaveCTurner In a follow-up. I can make both changes in this PR if you prefer. |
I would prefer the assertions to be adjusted here, yes, since this PR is strengthening those very invariants. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I am seeing occasional failures on this PR branch. For instance, this test fails sometimes:
$ ./gradlew :server:integTest --tests "org.elasticsearch.cluster.ClusterHealthIT.testHealthWithClosedIndices" -Dtests.iters=100 -Dtests.failfast=true
...
2> java.lang.AssertionError:
Expected: <YELLOW>
but: was <RED>
at __randomizedtesting.SeedInfo.seed([2032DCC3D4D6AA6D:92F6D1A1B2C791EF]:0)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
at org.junit.Assert.assertThat(Assert.java:956)
at org.junit.Assert.assertThat(Assert.java:923)
at org.elasticsearch.cluster.ClusterHealthIT.testHealthWithClosedIndices(ClusterHealthIT.java:165)
It fails when it closes the index while there is an ongoing recovery that has just sent a retention lease sync. The mechanism is a bit tricky but here's what I think is happening. Prior to this change this sync would fail during the reroute phase with an IndexClosedException
thrown from the IndexNameExpressionResolver
since it wasn't considering closed indices, but with this change we now resolve this index correctly and wait for a minute for the primary to become active:
1> [2019-10-31T05:04:53,924][WARN ][o.e.i.c.IndicesClusterStateService] [node_s1] [index-1][0] retention lease sync failed
1> org.elasticsearch.action.UnavailableShardsException: [index-1][0] primary shard is not active Timeout: [1m], request: [RetentionLeaseSyncAction.Request{retentionLeases=RetentionLeases{primaryTerm=1, version=2, leases={peer_recovery/-PtC6-JyTJqmI3p3OeA-5g=RetentionLease{id='peer_recovery/-PtC6-JyTJqmI3p3OeA-5g', retainingSequenceNumber=0, timestamp=1572523433841, source='peer recovery'}, peer_recovery/UW1qUPngQKusN4ZjmnqKCA=RetentionLease{id='peer_recovery/UW1qUPngQKusN4ZjmnqKCA', retainingSequenceNumber=0, timestamp=1572523433841, source='peer recovery'}}}, shardId=[index-1][0], timeout=1m, index='index-1', waitForActiveShards=0}]
1> at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.retryBecauseUnavailable(TransportReplicationAction.java:846) [main/:?]
1> at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.retryIfUnavailable(TransportReplicationAction.java:725) [main/:?]
1> at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.doRun(TransportReplicationAction.java:677) [main/:?]
1> at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
1> at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$2.onTimeout(TransportReplicationAction.java:806) [main/:?]
1> at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:325) [main/:?]
1> at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:252) [main/:?]
1> at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:592) [main/:?]
1> at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:699) [main/:?]
1> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
1> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
1> at java.lang.Thread.run(Thread.java:835) [?:?]
The wait is futile, however, because the recovery holds the shard lock from the previous assignment of the shard which prevents us from making another assignment:
1> [2019-11-01T00:15:46,026][WARN ][o.e.i.c.IndicesClusterStateService] [node_s0] [index-3][2] marking and sending shard failed due to [failed to create shard]
1> java.io.IOException: failed to obtain in-memory shard lock
1> at org.elasticsearch.index.IndexService.createShard(IndexService.java:445) ~[main/:?]
1> at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:652) ~[main/:?]
1> at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:164) ~[main/:?]
1> at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:664) [main/:?]
1> at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:640) [main/:?]
1> at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:252) [main/:?]
1> at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$5(ClusterApplierService.java:511) [main/:?]
1> at java.lang.Iterable.forEach(Iterable.java:75) [?:?]
1> at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:508) [main/:?]
1> at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:485) [main/:?]
1> at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432) [main/:?]
1> at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:176) [main/:?]
1> at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:699) [main/:?]
1> at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [main/:?]
1> at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [main/:?]
1> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
1> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
1> at java.lang.Thread.run(Thread.java:835) [?:?]
1> Caused by: org.elasticsearch.env.ShardLockObtainFailedException: [index-3][2]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]
1> at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:860) ~[main/:?]
1> at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:775) ~[main/:?]
1> at org.elasticsearch.index.IndexService.createShard(IndexService.java:365) ~[main/:?]
1> ... 17 more
@DaveCTurner Thank you for digging into the test failure. I appreciate that :). I have adjusted the RetentionLeaseSyncAction to skip the ReroutePhase. Can you please take another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a handful of questions about the bypassing of the reroute phase.
server/src/main/java/org/elasticsearch/index/seqno/ReplicationTracker.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/seqno/RetentionLeaseSyncAction.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a question about a change to the REST client used, but assuming that change was necessary this LGTM. Great work @dnhatn.
@@ -202,7 +202,7 @@ public void testForgetFollower() throws IOException { | |||
|
|||
assertOK(client().performRequest(new Request("POST", "/" + forgetFollower + "/_ccr/pause_follow"))); | |||
|
|||
try (RestClient leaderClient = buildLeaderClient(restClientSettings())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is unexpected to me. Can you explain why it's needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing retention leases requires the admin role.
I think this will fix the test failure, although I haven't tested it: diff --git a/test/framework/src/main/java/org/elasticsearch/cluster/coordination/DeterministicTaskQueue.java b/test/framework/src/main/java/org/elasticsearch/cluster/coordination/DeterministicTaskQueue.java
index 0837f431fff..db3818832f6 100644
--- a/test/framework/src/main/java/org/elasticsearch/cluster/coordination/DeterministicTaskQueue.java
+++ b/test/framework/src/main/java/org/elasticsearch/cluster/coordination/DeterministicTaskQueue.java
@@ -383,7 +383,7 @@ public class DeterministicTaskQueue {
@Override
public Runnable preserveContext(Runnable command) {
- throw new UnsupportedOperationException();
+ return command;
}
@Override |
@DaveCTurner Thank you very much for your thoughtful review. This PR should be a joint work :). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I added a single question, primarily for my understanding I think.
@@ -91,7 +91,7 @@ public synchronized void rescheduleIfNecessary() { | |||
if (logger.isTraceEnabled()) { | |||
logger.trace("scheduling {} every {}", toString(), interval); | |||
} | |||
cancellable = threadPool.schedule(this, interval, getThreadPool()); | |||
cancellable = threadPool.schedule(threadPool.preserveContext(this), interval, getThreadPool()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious about why this change is necessary?
The concrete tasks all seem to be system like and thus should not really depend on the caller context. If there is some dependency, this could become problematic if a user triggers the creation of an IndexService?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ThreadPool#schedule
does not itself preserve the context of the caller and instead runs the scheduled task in the default context which is not a system context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I knew about that. The question was more about the motivation for moving from simply setting the system context inside the task to preserving it here. I did some double checking on the callers and AFAICS it looks ok, just seemed odd to me to make this change for this specific PR. On second thought, I think it makes sense to preserve the context here, given that the AbstractAsyncTask
is not specific to IndexService
, but could be good to maybe add an assertion about being in system-context to IndexService
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I prefer preserving an existing context here since the security implications are much clearer - security bugs lurk in areas where privileges change, so the less of that we do the better. IMO it's a bit trappy that EsThreadPoolExecutor#execute
preserves the caller's context but ThreadPool#schedule
does not, although this is one of the very few places where that matters right now.
It's possible we could assert that we are in system context here, but that seems an overly strong statement to make. We already have tests to assert that we're in a context with appropriate permissions which I think is enough.
The backport depends on #49448.
|
Today we do not use retention leases in peer recovery for closed indices because we can't sync retention leases on closed indices. This change allows that ability and adjusts peer recovery to use retention leases for all indices with soft-deletes enabled. Relates #45136 Co-authored-by: David Turner <[email protected]>
Today we do not use retention leases in peer recovery for closed indices because we can't sync retention leases on closed indices. This change allows that ability and adjusts peer recovery to use retention leases for all indices with soft-deletes enabled.
Relates #45136