
Shard history retention leases #37165

Closed
24 tasks done
jasontedor opened this issue Jan 6, 2019 · 3 comments
Assignees: jasontedor
Labels: :Data Management/ILM+SLM, :Distributed Indexing/CCR, :Distributed Indexing/Distributed, :Distributed Indexing/Recovery, >feature, Meta

Comments

jasontedor (Member) commented Jan 6, 2019

When a shard of a follower index is consuming shard history from the corresponding shard of its leader index, it can happen that the history operations it needs are no longer available on any of the leader shard copies. This occurs if some operations were soft deleted and subsequently merged away before the shard of the follower index had a chance to replicate them. This has catastrophic consequences for the follower index, as its only remaining recovery option is a full file-based recovery. In the context of cross-cluster replication, that recovery can potentially run over a WAN with limited networking resources. During this file-based recovery, the follower index becomes unavailable, defeating the purpose of being an available copy of the leader index in another cluster.

One idea towards solving this problem is for the shard of a follower index to leave a marker on the corresponding shard of its leader index noting how far in shard history the following shard has read. This marker would prevent any operation with a sequence number at or above the marker from being eligible to be merged away.

And thus was born the idea of shard history retention leases. Shard history retention leases are aimed at preventing shard history consumers from having to fall back to expensive file-copy operations when shard history is no longer available from a certain point. These consumers include following indices in cross-cluster replication and local shard recoveries. A future consumer will be the changes API.

Further, index lifecycle management needs to coordinate with some of these consumers; otherwise, it could remove the source before all consumers have finished reading all operations. The notion of shard history retention leases that we are introducing here will also be used to address this problem.

Shard history retention leases are a property of the replication group, managed under the authority of the primary. A shard history retention lease is a combination of an identifier, a retaining sequence number, a timestamp indicating when the lease was acquired or renewed, and a string indicating the source of the lease. Being leases, they have a limited lifespan and will expire if not renewed. The idea is that all operations at or above the minimum of all retaining sequence numbers will be retained during merges (which would otherwise clear away soft-deleted operations). These leases will be periodically persisted (to Lucene and a dedicated state file), restored during recovery, and broadcast to replicas under certain circumstances.
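
To make the lease mechanics concrete, here is a minimal sketch in Java of what a lease and the retention check could look like. The class, field, and method names (and the expiry check) are illustrative assumptions based on the description above, not the actual Elasticsearch implementation.

```java
// Illustrative sketch only -- names and the expiry check are assumptions,
// not the actual Elasticsearch classes.
import java.util.Collection;

final class RetentionLease {
    final String id;                    // unique identifier of the lease holder
    final long retainingSequenceNumber; // retain all operations with seqNo >= this value
    final long timestampMillis;         // when the lease was acquired or last renewed
    final String source;                // e.g. "ccr" or "peer recovery"

    RetentionLease(String id, long retainingSequenceNumber, long timestampMillis, String source) {
        this.id = id;
        this.retainingSequenceNumber = retainingSequenceNumber;
        this.timestampMillis = timestampMillis;
        this.source = source;
    }
}

final class LeaseRetentionCheck {
    /**
     * An operation must survive a merge if its sequence number is at or above the
     * minimum retaining sequence number across all non-expired leases.
     */
    static boolean mustRetain(long operationSeqNo, Collection<RetentionLease> leases,
                              long nowMillis, long leaseExpiryMillis) {
        long minRetained = leases.stream()
                .filter(l -> nowMillis - l.timestampMillis <= leaseExpiryMillis) // expired leases retain nothing
                .mapToLong(l -> l.retainingSequenceNumber)
                .min()
                .orElse(Long.MAX_VALUE); // no active leases: the lease mechanism imposes no retention
        return operationSeqNo >= minRetained;
    }
}
```

Taking the minimum over all active leases means a single slow consumer is enough to keep history around, which is exactly the property a lagging follower needs.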

This issue is a meta-issue for tracking the progress of implementing shard history retention leases. We will proceed with implementing shard history retention leases along the following rough plan:

jasontedor added the >feature, :Distributed Indexing/Recovery, :Distributed Indexing/Distributed, :Distributed Indexing/CCR, and :Data Management/ILM+SLM labels on Jan 6, 2019
jasontedor self-assigned this Jan 6, 2019
elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed

elasticmachine (Collaborator) commented:

Pinging @elastic/es-core-features

martijnvg pinned this issue Jan 8, 2019
matriv unpinned this issue Jan 8, 2019
dnhatn added a commit that referenced this issue Jan 31, 2019
If a new retention lease is added while a primary's soft-deletes policy
is locked for peer-recovery, that lease won't be baked into the Lucene
commit.

Relates #37165
Relates #37375
dnhatn added a commit that referenced this issue Feb 13, 2019
When a primary shard is recovered from its store, we trim the last
commit (when it's unsafe). If that primary crashes before the recovery
completes, we will lose the committed retention leases because they are
baked in the last commit. With this change, we copy the retention leases
from the last commit to the safe commit when trimming unsafe commits.

Relates #37165
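
The commit above can be pictured as carrying the lease metadata of the discarded (unsafe) commit forward into the commit that survives. A hedged sketch of that idea follows; the user-data key and method are hypothetical simplifications, not the actual Elasticsearch trimming code.

```java
// Hedged sketch of the idea only; key name and method are hypothetical.
import java.util.HashMap;
import java.util.Map;

final class TrimUnsafeCommitsSketch {
    static final String RETENTION_LEASES_KEY = "retention_leases"; // hypothetical key name

    /**
     * When the last (unsafe) commit is discarded, carry its retention-lease
     * metadata forward into the safe commit so the leases are not lost with it.
     */
    static Map<String, String> userDataForSafeCommit(Map<String, String> safeCommitUserData,
                                                     Map<String, String> lastCommitUserData) {
        Map<String, String> merged = new HashMap<>(safeCommitUserData);
        String leases = lastCommitUserData.get(RETENTION_LEASES_KEY);
        if (leases != null) {
            merged.put(RETENTION_LEASES_KEY, leases); // the newer commit's leases win
        }
        return merged;
    }
}
```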
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Feb 14, 2019
Today if soft deletes are enabled then we read the operations needed for peer
recovery from Lucene. However we do not currently make any attempt to retain
history in Lucene specifically for peer recoveries so we may discard it and
fall back to a more expensive file-based recovery. Yet we still retain
sufficient history in the translog to perform an operations-based peer
recovery.

In the long run we would like to fix this by retaining more history in Lucene,
possibly using shard history retention leases (elastic#37165). For now, however, this
commit reverts to performing peer recoveries using the history retained in the
translog regardless of whether soft deletes are enabled or not.
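
In other words, an operations-based peer recovery is only viable while the source still retains every operation the target is missing. The sketch below states that condition with illustrative names (here the retained history is the translog, as in the commit above); it is not the actual Elasticsearch recovery code.

```java
// Illustrative sketch of the condition described above; names are assumptions.
final class PeerRecoveryModeSketch {
    /**
     * An operations-based recovery is possible only if the source shard still
     * retains every operation from the target's required starting sequence
     * number onwards (here: in the translog). Otherwise the source must fall
     * back to copying files.
     */
    static boolean canUseOperationsBasedRecovery(long requiredStartingSeqNo, long minRetainedTranslogSeqNo) {
        return minRetainedTranslogSeqNo <= requiredStartingSeqNo;
    }
}
```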
dnhatn added a commit that referenced this issue Feb 20, 2019
This commit introduces retention leases to ESIndexLevelReplicationTestCase,
then adds tests verifying that retention lease replication works correctly
despite primary failover or out-of-order delivery of retention lease sync
requests.

Relates #37165
DaveCTurner added a commit that referenced this issue Mar 15, 2019
Today we load the shard history retention leases from disk whenever opening the
engine, and treat a missing file as an empty set of leases. However in some
cases this is inappropriate: we might be restoring from a snapshot (if the
target index already exists then there may be leases on disk) or
force-allocating a stale primary, and in neither case does it make sense to
restore the retention leases from disk.

With this change we write an empty retention leases file during recovery,
except for the following cases:

- During peer recovery the on-disk leases may be accurate and could be needed
  if the recovery target is made into a primary.

- During recovery from an existing store, as long as we are not
  force-allocating a stale primary.

Relates #37165
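
The rule in the commit above can be summarised as a small decision function. The enum and method below are illustrative assumptions, not the actual Elasticsearch recovery-source types.

```java
// Hedged sketch of the rule described above; names are assumptions.
enum RecoverySourceKind { EMPTY_STORE, EXISTING_STORE, PEER, SNAPSHOT, LOCAL_SHARDS }

final class RetentionLeaseBootstrapSketch {
    /**
     * Keep the on-disk retention leases only when they can still be trusted:
     * during peer recovery, or when recovering from an existing store that is
     * not a force-allocated stale primary. Otherwise start from an empty set.
     */
    static boolean restoreLeasesFromDisk(RecoverySourceKind kind, boolean forceAllocatedStalePrimary) {
        switch (kind) {
            case PEER:
                return true;
            case EXISTING_STORE:
                return forceAllocatedStalePrimary == false;
            default: // snapshot restore, empty store, split/shrink from local shards
                return false;
        }
    }
}
```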
dnhatn (Member) commented May 4, 2019

Closing since the work to integrate shard history retention leases with recovery is tracked in #41536.

dnhatn closed this as completed May 4, 2019