-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Lucene soft-deletes in peer recovery #30522
Conversation
We are targeting to replace translog by Lucene history in peer-recovery. In peer-recovery, the primary transfers a safe commit to replicas then sends subsequent operations after the local checkpoint of that safe commit. This requires the primary having all operations after the checkpoint of the current safe commit. This commit hardens the soft-deletes retention merge policy by taking safe-commit into account.
Pinging @elastic/es-distributed |
@elasticmachine test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some questions
@@ -1049,6 +1049,7 @@ public void testFilterCacheStats() throws Exception { | |||
assertEquals(DocWriteResponse.Result.DELETED, client().prepareDelete("index", "type", "2").get().getResult()); | |||
if (IndexSettings.INDEX_SOFT_DELETES_SETTING.get(settings)) { | |||
persistGlobalCheckpoint("index"); // Need to persist the global checkpoint for the soft-deletes retention MP. | |||
forceMerge(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this needed? can we randomly do it?
if (primary.indexSettings().isSoftDeleteEnabled()) { | ||
// Need to sync global checkpoint and create a new safe commit as the soft-deletes MergePolicy depends on it. | ||
primary.flush(new FlushRequest()); | ||
primary.forceMerge(new ForceMergeRequest().onlyExpungeDeletes(true)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this needed? the force merge seems not necessary maybe do it randomly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was wrong - a flush is enough. I pushed 9ae627c.
private final LongSupplier globalCheckpointSupplier; | ||
private int retentionLockCount; | ||
private long checkpointOfSafeCommit; | ||
private volatile long minRequiredSeqNoForRecovery; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just make all methods synchronized then we don't need to do the volatile thing here. the contention on these locks seems very low.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed 176b497
*/ | ||
synchronized Releasable acquireRetentionLock() { | ||
retentionLockCount++; | ||
assert retentionLockCount > 0 : "Invalid number of retention locks [" + retentionLockCount + "]"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is needed I think the assertion should happen before we mutate?
@simonw I've addressed your comments. Can you please take another look? Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left some minors. I think @bleskes should also look this. LGTM
// Package-level for testing | ||
synchronized long getMinSeqNoToRetain() { | ||
final long minSeqNoForQueryingChanges = globalCheckpointSupplier.getAsLong() + 1 - retentionOperations; | ||
return Math.min(minRequiredSeqNoForRecovery, minSeqNoForQueryingChanges); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of adding the 1
on assignment which is done in 2 places should we do it here?
@@ -2919,7 +2917,12 @@ public void testSegmentMemoryTrackedInBreaker() throws Exception { | |||
|
|||
// Deleting a doc causes its memory to be freed from the breaker | |||
deleteDoc(primary, "_doc", "0"); | |||
primary.sync(); // need to sync global checkpoint as the soft-deletes retention MergePolicy depends on it. | |||
if (primary.indexSettings().isSoftDeleteEnabled()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure about this I think @bleskes needs to look at this
persistGlobalCheckpoint("index"); // Need to persist the global checkpoint for the soft-deletes retention MP. | ||
// Need to persist the global checkpoint and flush a new safe commit | ||
// as the soft-deletes retention merge policy depends on the local-checkpoint of the safe commit. | ||
persistGlobalCheckpoint("index"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure about this I think @bleskes needs to look at this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow the comment - @dnhatn can you please explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments. I would also like to understand (and see a test) how this interact with ongoing merges that have started before the safe commit was updated and thus use old information (I think it's OK but we need to know for sure). This PR tries to guarantee that all operations above the commit's local checkpoint will be available in any future NRT reader - that may include the result of such a merge.
private void updateMinRequiredSeqNoForRecovery() { | ||
assert Thread.holdsLock(this) : Thread.currentThread().getName(); | ||
if (retentionLockCount == 0) { | ||
this.minRequiredSeqNoForRecovery = checkpointOfSafeCommit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name this storeRecovery? I think it will help clarify that this is only used for store recovery.
persistGlobalCheckpoint("index"); // Need to persist the global checkpoint for the soft-deletes retention MP. | ||
// Need to persist the global checkpoint and flush a new safe commit | ||
// as the soft-deletes retention merge policy depends on the local-checkpoint of the safe commit. | ||
persistGlobalCheckpoint("index"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow the comment - @dnhatn can you please explain?
primary.sync(); // need to sync global checkpoint as the soft-deletes retention MergePolicy depends on it. | ||
if (primary.indexSettings().isSoftDeleteEnabled()) { | ||
// Need to persist the global checkpoint and flush a new safe commit | ||
// as the soft-deletes retention merge policy depends on the local-checkpoint of the safe commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here. Can you please explain more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment for this.
* make sure that all operations after the local checkpoint of the safe commit are retained until the lock is released. | ||
* This is a analogy to the translog's retention lock; see {@link Translog#acquireRetentionLock()} | ||
*/ | ||
synchronized Releasable acquireRetentionLock() { |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
@bleskes and @s1monw I've included the peer-recovery usage in this PR so that we can have the whole picture of this change. I will continue working on tests. |
Looks great IMO |
@bleskes this is still waiting on you |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. My only concern is that we should randomize the usage of soft deletes in our standard testing. We don't plan to turn it on by default for 6.x and we should make sure everything works without it as well as with.
@@ -257,6 +257,18 @@ private LocalCheckpointTracker createLocalCheckpointTracker( | |||
return localCheckpointTrackerSupplier.apply(maxSeqNo, localCheckpoint); | |||
} | |||
|
|||
private SoftDeletesPolicy newSoftDeletesPolicy() throws IOException { | |||
final Map<String, String> commitUserData = store.readLastCommittedSegmentsInfo().userData; | |||
final long lastSeqNoSeenByMergePolicy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rename to minRetainedSeqNo ?
@@ -468,18 +480,39 @@ public void syncTranslog() throws IOException { | |||
} | |||
|
|||
@Override | |||
public Closeable acquireTranslogRetentionLock() { | |||
return getTranslog().acquireRetentionLock(); | |||
public Translog.Snapshot newTranslogSnapshotBetween(long minSeqNo, long maxSeqNo) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this as a follow up, in favor of readHistoryOperations ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we will remove this method after making primary-replica resync use Lucene.
store = createStore(); | ||
engine = createEngine(config(indexSettings, store, createTempDir(), newMergePolicy(), null, null, globalCheckpoint::get)); | ||
assertThat(engine.getMinRetainedSeqNo(), equalTo(0L)); | ||
long lastSeqNoSeenByMP = engine.getMinRetainedSeqNo(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rename
@bleskes Once this PR is in, I will make a follow-up to randomize the soft-deletes setting? Are you okay? |
Of course. |
@elasticmachine test this please |
This commit adds Lucene soft-deletes as another source for peer-recovery besides translog. Relates #29530
Today a file-based recovery will replay all existing translog operations from the primary on a replica so that that replica can have a full history in translog as the primary. However, with soft-deletes enabled, we should not do it because: 1. All operations before the local checkpoint of the safe commit exist in the commit already. 2. The number of operations before the local checkpoint may be considerable and requires a significant amount of time to replay on a replica. Relates elastic#30522 Relates elastic#29530
…es (#33190) Today a file-based recovery will replay all existing translog operations from the primary on a replica so that that replica can have a full history in translog as the primary. However, with soft-deletes enabled, we should not do it because: 1. All operations before the local checkpoint of the safe commit exist in the commit already. 2. The number of operations before the local checkpoint may be considerable and requires a significant amount of time to replay on a replica. Relates #30522 Relates #29530
…es (#33190) Today a file-based recovery will replay all existing translog operations from the primary on a replica so that that replica can have a full history in translog as the primary. However, with soft-deletes enabled, we should not do it because: 1. All operations before the local checkpoint of the safe commit exist in the commit already. 2. The number of operations before the local checkpoint may be considerable and requires a significant amount of time to replay on a replica. Relates #30522 Relates #29530
This PR integrates Lucene soft-deletes (LUCENE-8200) into Elasticsearch. Highlight works in this PR include: 1. Replace hard-deletes by soft-deletes in InternalEngine 2. Use _recovery_source if _source is disabled or modified (elastic#31106) 3. Soft-deletes retention policy based on the global checkpoint (elastic#30335) 4. Read operation history from Lucene instead of translog (elastic#30120) 5. Use Lucene history in peer-recovery (elastic#30522) These works have been done by the whole team; however, these individuals (lexical order) have significant contribution in coding and reviewing: Co-authored-by: Adrien Grand <[email protected]> Co-authored-by: Boaz Leskes <[email protected]> Co-authored-by: Jason Tedor <[email protected]> Co-authored-by: Martijn van Groningen <[email protected]> Co-authored-by: Nhat Nguyen <[email protected]> Co-authored-by: Simon Willnauer <[email protected]>
This PR integrates Lucene soft-deletes(LUCENE-8200) into Elasticsearch. Highlight works in this PR include: - Replace hard-deletes by soft-deletes in InternalEngine - Use _recovery_source if _source is disabled or modified (#31106) - Soft-deletes retention policy based on the global checkpoint (#30335) - Read operation history from Lucene instead of translog (#30120) - Use Lucene history in peer-recovery (#30522) Relates #30086 Closes #29530 --- These works have been done by the whole team; however, these individuals (lexical order) have significant contribution in coding and reviewing: Co-authored-by: Adrien Grand [email protected] Co-authored-by: Boaz Leskes [email protected] Co-authored-by: Jason Tedor [email protected] Co-authored-by: Martijn van Groningen [email protected] Co-authored-by: Nhat Nguyen [email protected] Co-authored-by: Simon Willnauer [email protected]
This PR integrates Lucene soft-deletes(LUCENE-8200) into Elasticsearch. Highlight works in this PR include: - Replace hard-deletes by soft-deletes in InternalEngine - Use _recovery_source if _source is disabled or modified (elastic#31106) - Soft-deletes retention policy based on the global checkpoint (elastic#30335) - Read operation history from Lucene instead of translog (elastic#30120) - Use Lucene history in peer-recovery (elastic#30522) Relates elastic#30086 Closes elastic#29530 --- These works have been done by the whole team; however, these individuals (lexical order) have significant contribution in coding and reviewing: Co-authored-by: Adrien Grand <[email protected]> Co-authored-by: Boaz Leskes <[email protected]> Co-authored-by: Jason Tedor <[email protected]> Co-authored-by: Martijn van Groningen <[email protected]> Co-authored-by: Nhat Nguyen <[email protected]> Co-authored-by: Simon Willnauer <[email protected]>
This PR integrates Lucene soft-deletes(LUCENE-8200) into Elasticsearch. Highlight works in this PR include: - Replace hard-deletes by soft-deletes in InternalEngine - Use _recovery_source if _source is disabled or modified (#31106) - Soft-deletes retention policy based on the global checkpoint (#30335) - Read operation history from Lucene instead of translog (#30120) - Use Lucene history in peer-recovery (#30522) Relates #30086 Closes #29530 --- These works have been done by the whole team; however, these individuals (lexical order) have significant contribution in coding and reviewing: Co-authored-by: Adrien Grand <[email protected]> Co-authored-by: Boaz Leskes <[email protected]> Co-authored-by: Jason Tedor <[email protected]> Co-authored-by: Martijn van Groningen <[email protected]> Co-authored-by: Nhat Nguyen <[email protected]> Co-authored-by: Simon Willnauer <[email protected]>
This commit adds Lucene soft-deletes as another source for peer-recovery besides translog.
Relates #29530