Replace hard-deletes by soft-deletes to maintain document history #29549
Conversation
Today we can use the soft-deletes feature from Lucene to maintain a history of a document. This change simply replaces hard-deletes with soft-deletes in Engine. Besides marking a document as deleted, we also index a tombstone associated with that delete operation. Storing delete tombstones allows us to keep a history of sequence-based operations, which can be used in recovery or rollback.
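For readers new to the idea, the distinction the PR description draws can be sketched with a toy in-memory model. This is plain Java, not Elasticsearch or Lucene code; all names are illustrative. A hard delete removes the entry outright, while a soft delete keeps it in the index flagged as deleted, and a tombstone records the delete operation's sequence number so history can be replayed:

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of soft deletes + delete tombstones (not real engine code).
public class SoftDeleteSketch {
    record Doc(String id, long seqNo, boolean softDeleted, boolean tombstone) {}

    static final List<Doc> index = new ArrayList<>();

    static void index(String id, long seqNo) {
        index.add(new Doc(id, seqNo, false, false));
    }

    // A hard delete would remove the entry; a soft delete flags the existing
    // entries and adds a tombstone carrying the delete's sequence number.
    static void softDelete(String id, long seqNo) {
        index.replaceAll(d -> d.id().equals(id)
            ? new Doc(d.id(), d.seqNo(), true, d.tombstone()) : d);
        index.add(new Doc(id, seqNo, true, true));
    }

    // The live view excludes soft-deleted docs; the full list retains history.
    static long liveCount() {
        return index.stream().filter(d -> !d.softDeleted()).count();
    }

    public static void main(String[] args) {
        index("1", 0);
        softDelete("1", 1);
        System.out.println(liveCount());  // 0 live docs
        System.out.println(index.size()); // 2 entries kept for history
    }
}
```

The point of the sketch is that after the delete, the index still holds two entries (the flagged original and the tombstone), so the sequence-based history survives even though no live document remains.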
Pinging @elastic/es-distributed
/cc @jasontedor and @martijnvg
Thanks Nhat. I left some comments.
@@ -80,6 +81,7 @@
    private final CircuitBreakerService circuitBreakerService;
    private final LongSupplier globalCheckpointSupplier;
    private final LongSupplier primaryTermSupplier;
    private final MetaDocSupplier metaDocSupplier;
is TombstoneDoc a better name?
Yes, it's better - more explicit.
    private ParseContext.Document newMetaDoc(String type, String id, long seqno, long primaryTerm, long version) {
        final SourceToParse source = SourceToParse.source(shardId.getIndexName(), type, id, new BytesArray("{}"), XContentType.JSON);
        final ParsedDocument parsedDocument = docMapper(type).getDocumentMapper().parse(source);
Instead of creating everything and removing what we don't need, can we maybe fold this into the document mapper and add a createTombstoneDoc method to it that only does what it needs to create the right fields (probably only calling the preParse / postParse methods on the right fields, similar to here)?
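The suggestion above could be sketched roughly as follows. This is a hypothetical shape, not the actual DocumentMapper API: a dedicated method builds only the metadata fields a tombstone needs, instead of parsing an empty source and stripping fields afterwards:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a createTombstoneDoc that emits only the required
// metadata fields. Field and method names mirror the PR discussion; they are
// assumptions, not the real mapper implementation.
public class TombstoneDocSketch {
    static Map<String, Object> createTombstoneDoc(String type, String id,
                                                  long seqNo, long primaryTerm, long version) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_type", type);
        doc.put("_id", id);
        doc.put("_seq_no", seqNo);          // sequence number of the delete operation
        doc.put("_primary_term", primaryTerm);
        doc.put("_version", version);
        return doc;                          // no _source: tombstones carry no body
    }
}
```

The design point is that nothing has to be parsed or discarded: only the fields a tombstone actually needs are ever constructed.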
    private ParseContext.Document newMetaDoc(String type, String id, long seqno, long primaryTerm, long version) {
        final SourceToParse source = SourceToParse.source(shardId.getIndexName(), type, id, new BytesArray("{}"), XContentType.JSON);
        final ParsedDocument parsedDocument = docMapper(type).getDocumentMapper().parse(source);
        parsedDocument.updateSeqID(seqno, primaryTerm);
I prefer to be consistent with how the engine set these during indexing - i.e., during the addition to lucene.
@@ -2962,13 +2979,13 @@ public void testSegmentMemoryTrackedWithRandomSearchers() throws Exception {
        for (Thread t : threads) {
            t.join();
        }
        // Close remaining searchers
        IOUtils.close(searchers);
Why was this moved earlier? I think I'm missing something - can you explain?
It's a left-over.
please run all tests.
@elasticmachine please test this
run sample packaging tests.
LGTM. I left some questions; none of them are blockers.
@@ -363,4 +367,13 @@ public CircuitBreakerService getCircuitBreakerService() {
    public LongSupplier getPrimaryTermSupplier() {
        return primaryTermSupplier;
    }

    @FunctionalInterface
    public interface TombstoneDocSupplier {
maybe add some javadocs here?
    }

    public ParsedDocument createTombstoneDoc(String index, String type, String id) throws MapperParsingException {
        final SourceToParse emptySource = SourceToParse.source(index, type, id, new BytesArray("{}"), XContentType.JSON);
do we need some identifier that this doc is a tombstone? I am not sure we do at this point, but down the road we would, no?
+1. We don't plan to keep many of these around, so the overhead is minimal, and we'd have the ability to search for them and debug. The question is how to do this. I guess the easiest would be to add a boolean metadata field to the mapping, but that feels like overkill. In any case, I think this can be a follow-up.
Yes, I will think about it and make it in a follow-up.
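The boolean-metadata-field idea discussed above could look roughly like this toy sketch. The field name `_tombstone` is an assumption for illustration, not the field Elasticsearch eventually used:

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch: tag tombstone docs with a boolean metadata field so they can be
// found and inspected later. The "_tombstone" name is hypothetical.
public class TombstoneFlagSketch {
    static Map<String, Object> tagAsTombstone(Map<String, Object> doc) {
        Map<String, Object> tagged = new HashMap<>(doc);
        tagged.put("_tombstone", true);
        return tagged;
    }

    // Searching for tombstones then becomes a simple filter on that field.
    static boolean isTombstone(Map<String, Object> doc) {
        return Boolean.TRUE.equals(doc.get("_tombstone"));
    }
}
```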
This LGTM, but I think we want a unit test on the engine level for this? What am I missing?
We planned to add real tests in the next PR, when indexing stale deletes and no-ops. This PR is just a cut-over. I am fine with adding a simple test here.
Cool. Thanks for explaining.
Relates #29530