
Shard history retention leases #37165

Closed
24 tasks done
jasontedor opened this issue Jan 6, 2019 · 3 comments
Assignees: jasontedor
Labels: :Data Management/ILM+SLM, :Distributed Indexing/CCR, :Distributed Indexing/Distributed, :Distributed Indexing/Recovery, >feature, Meta

Comments

jasontedor (Member) commented Jan 6, 2019

When a shard of a follower index is consuming shard history from the corresponding shard of its leader index, it can happen that the history operations it needs are no longer available on any of the leader shard copies. This occurs if some operations were soft deleted and subsequently merged away before the shard of the follower index had a chance to replicate them. This has catastrophic consequences for the follower index, as its only remaining recovery option is a full file-based recovery. In the context of cross-cluster replication, that recovery can potentially run over a WAN with limited networking resources. During this file-based recovery, the follower index becomes unavailable, defeating the purpose of being an available copy of the leader index in another cluster.

One idea towards solving this problem is for the shard of a follower index to leave a marker on the corresponding shard of its leader index noting how far in shard history the following shard has read. This marker would prevent any operation with a sequence number at or above the marker from being eligible to be merged away.

And thus was born the idea of shard history retention leases. Shard history retention leases are aimed at preventing shard history consumers from having to fall back to expensive file-copy operations when shard history is no longer available from a certain point. These consumers include following indices in cross-cluster replication and local shard recoveries. A future consumer will be the changes API.

Further, index lifecycle management needs to coordinate with some of these consumers; otherwise, it could remove the source before all consumers have finished reading all operations. The notion of shard history retention leases that we are introducing here will also be used to address this problem.

Shard history retention leases are a property of the replication group, managed under the authority of the primary. A shard history retention lease is a combination of an identifier, a retaining sequence number, a timestamp indicating when the lease was acquired or renewed, and a string indicating the source of the lease. Being leases, they have a limited lifespan and will expire if not renewed. The idea is that all operations at or above the minimum of all retaining sequence numbers will be retained during merges (which would otherwise clear away soft-deleted operations). These leases will be periodically persisted (to Lucene and a dedicated state file), restored during recovery, and broadcast to replicas under certain circumstances.
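
To make the lease mechanics concrete, here is a minimal sketch in Java of what a lease and the retention check could look like. The class, field, and method names (and the expiry check) are illustrative assumptions based on the description above, not the actual Elasticsearch implementation.

```java
// Illustrative sketch only -- names and the expiry check are assumptions,
// not the actual Elasticsearch classes.
import java.util.Collection;

final class RetentionLease {
    final String id;                    // unique identifier of the lease holder
    final long retainingSequenceNumber; // retain all operations with seqNo >= this value
    final long timestampMillis;         // when the lease was acquired or last renewed
    final String source;                // e.g. "ccr" or "peer recovery"

    RetentionLease(String id, long retainingSequenceNumber, long timestampMillis, String source) {
        this.id = id;
        this.retainingSequenceNumber = retainingSequenceNumber;
        this.timestampMillis = timestampMillis;
        this.source = source;
    }
}

final class LeaseRetentionCheck {
    /**
     * An operation must survive a merge if its sequence number is at or above the
     * minimum retaining sequence number across all non-expired leases.
     */
    static boolean mustRetain(long operationSeqNo, Collection<RetentionLease> leases,
                              long nowMillis, long leaseExpiryMillis) {
        long minRetained = leases.stream()
                .filter(l -> nowMillis - l.timestampMillis <= leaseExpiryMillis) // expired leases retain nothing
                .mapToLong(l -> l.retainingSequenceNumber)
                .min()
                .orElse(Long.MAX_VALUE); // no active leases: the lease mechanism imposes no retention
        return operationSeqNo >= minRetained;
    }
}
```

Taking the minimum over all active leases means a single slow consumer is enough to keep history around, which is exactly the property a lagging follower needs.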

This issue is a meta-issue for tracking the progress of implementing shard history retention leases. We will proceed with implementing shard history retention leases along the following rough plan:

jasontedor added the >feature, :Distributed Indexing/Recovery, :Distributed Indexing/Distributed, :Distributed Indexing/CCR, and :Data Management/ILM+SLM labels on Jan 6, 2019
jasontedor self-assigned this Jan 6, 2019
elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed

elasticmachine (Collaborator) commented:

Pinging @elastic/es-core-features

martijnvg pinned this issue Jan 8, 2019
matriv unpinned this issue Jan 8, 2019
dnhatn added a commit that referenced this issue Jan 31, 2019
If a new retention lease is added while a primary's soft-deletes policy
is locked for peer-recovery, that lease won't be baked into the Lucene
commit.

Relates #37165
Relates #37375
dnhatn added a commit that referenced this issue Feb 13, 2019
When a primary shard is recovered from its store, we trim the last
commit (when it's unsafe). If that primary crashes before the recovery
completes, we will lose the committed retention leases because they are
baked in the last commit. With this change, we copy the retention leases
from the last commit to the safe commit when trimming unsafe commits.

Relates #37165
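
The commit above can be pictured as carrying the lease metadata of the discarded (unsafe) commit forward into the commit that survives. A hedged sketch of that idea follows; the user-data key and method are hypothetical simplifications, not the actual Elasticsearch trimming code.

```java
// Hedged sketch of the idea only; key name and method are hypothetical.
import java.util.HashMap;
import java.util.Map;

final class TrimUnsafeCommitsSketch {
    static final String RETENTION_LEASES_KEY = "retention_leases"; // hypothetical key name

    /**
     * When the last (unsafe) commit is discarded, carry its retention-lease
     * metadata forward into the safe commit so the leases are not lost with it.
     */
    static Map<String, String> userDataForSafeCommit(Map<String, String> safeCommitUserData,
                                                     Map<String, String> lastCommitUserData) {
        Map<String, String> merged = new HashMap<>(safeCommitUserData);
        String leases = lastCommitUserData.get(RETENTION_LEASES_KEY);
        if (leases != null) {
            merged.put(RETENTION_LEASES_KEY, leases); // the newer commit's leases win
        }
        return merged;
    }
}
```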
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Feb 14, 2019
Today if soft deletes are enabled then we read the operations needed for peer
recovery from Lucene. However we do not currently make any attempt to retain
history in Lucene specifically for peer recoveries so we may discard it and
fall back to a more expensive file-based recovery. Yet we still retain
sufficient history in the translog to perform an operations-based peer
recovery.

In the long run we would like to fix this by retaining more history in Lucene,
possibly using shard history retention leases (elastic#37165). For now, however, this
commit reverts to performing peer recoveries using the history retained in the
translog regardless of whether soft deletes are enabled or not.
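
In other words, an operations-based peer recovery is only viable while the source still retains every operation the target is missing. The sketch below states that condition with illustrative names (here the retained history is the translog, as in the commit above); it is not the actual Elasticsearch recovery code.

```java
// Illustrative sketch of the condition described above; names are assumptions.
final class PeerRecoveryModeSketch {
    /**
     * An operations-based recovery is possible only if the source shard still
     * retains every operation from the target's required starting sequence
     * number onwards (here: in the translog). Otherwise the source must fall
     * back to copying files.
     */
    static boolean canUseOperationsBasedRecovery(long requiredStartingSeqNo, long minRetainedTranslogSeqNo) {
        return minRetainedTranslogSeqNo <= requiredStartingSeqNo;
    }
}
```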
dnhatn added a commit that referenced this issue Feb 20, 2019
This commit introduces retention leases to ESIndexLevelReplicationTestCase,
then adds tests verifying that retention lease replication works correctly
despite primary failover or out-of-order delivery of retention lease sync
requests.

Relates #37165
DaveCTurner added a commit that referenced this issue Mar 15, 2019
Today we load the shard history retention leases from disk whenever opening the
engine, and treat a missing file as an empty set of leases. However in some
cases this is inappropriate: we might be restoring from a snapshot (if the
target index already exists then there may be leases on disk) or
force-allocating a stale primary, and in neither case does it make sense to
restore the retention leases from disk.

With this change we write an empty retention leases file during recovery,
except for the following cases:

- During peer recovery the on-disk leases may be accurate and could be needed
  if the recovery target is made into a primary.

- During recovery from an existing store, as long as we are not
  force-allocating a stale primary.

Relates #37165
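
The rule in the commit above can be summarised as a small decision function. The enum and method below are illustrative assumptions, not the actual Elasticsearch recovery-source types.

```java
// Hedged sketch of the rule described above; names are assumptions.
enum RecoverySourceKind { EMPTY_STORE, EXISTING_STORE, PEER, SNAPSHOT, LOCAL_SHARDS }

final class RetentionLeaseBootstrapSketch {
    /**
     * Keep the on-disk retention leases only when they can still be trusted:
     * during peer recovery, or when recovering from an existing store that is
     * not a force-allocated stale primary. Otherwise start from an empty set.
     */
    static boolean restoreLeasesFromDisk(RecoverySourceKind kind, boolean forceAllocatedStalePrimary) {
        switch (kind) {
            case PEER:
                return true;
            case EXISTING_STORE:
                return forceAllocatedStalePrimary == false;
            default: // snapshot restore, empty store, split/shrink from local shards
                return false;
        }
    }
}
```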
dnhatn (Member) commented May 4, 2019

Closing since the work to integrate shard history retention leases with recovery is tracked in #41536.

dnhatn closed this as completed May 4, 2019