
Replace hard-deletes by soft-deletes to maintain document history #29549

Merged
10 commits merged into elastic:ccr from soft-deletes on Apr 20, 2018

Conversation

dnhatn
Member

@dnhatn dnhatn commented Apr 17, 2018

Today we can use the soft-deletes feature from Lucene to maintain a
history of a document. This change simply replaces hard-deletes by
soft-deletes in Engine.

Besides marking a document as deleted, we also index a tombstone
associated with that delete operation. Storing delete tombstones allows
us to have a history of sequence-based operations which can serve in
recovery or rollback.

Relates #29530
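The core idea can be illustrated with a small self-contained sketch in plain Java. This is not the actual Lucene or Engine API; all names here are hypothetical, and the point is only the semantics: a hard delete erases the document, while a soft delete flags it and records a tombstone, so the sequence-based history survives for recovery or rollback.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the soft-delete idea (not the real Engine): documents are
// never removed; deletes set a flag and index a tombstone carrying the
// sequence number of the delete operation.
class SoftDeleteSketch {
    record Doc(String id, long seqNo, boolean softDeleted) {}
    record Tombstone(String id, long seqNo) {}

    final Map<String, Doc> live = new LinkedHashMap<>();
    final List<Tombstone> tombstones = new ArrayList<>();
    long seqNo = 0;

    void index(String id) {
        live.put(id, new Doc(id, seqNo++, false));
    }

    // Soft delete: keep the document, mark it deleted, and index a
    // tombstone recording the delete's sequence number.
    void softDelete(String id) {
        Doc d = live.get(id);
        if (d != null) {
            live.put(id, new Doc(id, d.seqNo(), true));
        }
        tombstones.add(new Tombstone(id, seqNo++));
    }

    // Every indexed doc and every tombstone still exists, ordered by
    // sequence number, so the operation history is recoverable.
    int historySize() {
        return live.size() + tombstones.size();
    }
}
```

With a hard delete, the document for "1" would be gone; here both the flagged document and the tombstone remain in the history.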

@dnhatn dnhatn added the >enhancement and :Distributed Indexing/Engine (anything around managing Lucene and the translog in an open shard) labels Apr 17, 2018
@dnhatn dnhatn requested review from s1monw and bleskes April 17, 2018 02:56
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@dnhatn
Member Author

dnhatn commented Apr 17, 2018

/cc @jasontedor and @martijnvg

Contributor

@bleskes bleskes left a comment


Thanks Nhat. I left some comments.

@@ -80,6 +81,7 @@
private final CircuitBreakerService circuitBreakerService;
private final LongSupplier globalCheckpointSupplier;
private final LongSupplier primaryTermSupplier;
private final MetaDocSupplier metaDocSupplier;
Contributor


is TombstoneDoc a better name?

Member Author


Yes, it's better: more explicit.


private ParseContext.Document newMetaDoc(String type, String id, long seqno, long primaryTerm, long version) {
final SourceToParse source = SourceToParse.source(shardId.getIndexName(), type, id, new BytesArray("{}"), XContentType.JSON);
final ParsedDocument parsedDocument = docMapper(type).getDocumentMapper().parse(source);
Contributor


Instead of creating everything and removing what we don't need, can we maybe fold this into the document mapper and add a createTombstoneDoc method to it that only does what it needs to create the right fields (probably only calling the preParse / postParse methods on the right fields, similar to here)?

private ParseContext.Document newMetaDoc(String type, String id, long seqno, long primaryTerm, long version) {
final SourceToParse source = SourceToParse.source(shardId.getIndexName(), type, id, new BytesArray("{}"), XContentType.JSON);
final ParsedDocument parsedDocument = docMapper(type).getDocumentMapper().parse(source);
parsedDocument.updateSeqID(seqno, primaryTerm);
Contributor


I prefer to be consistent with how the engine sets these during indexing, i.e. during the addition to Lucene.
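Taken together, the two suggestions in this thread amount to: build only the metadata fields a tombstone needs (rather than parsing an empty source and stripping fields), and leave seq_no and primary_term to be filled in when the document is added to Lucene. A rough sketch of that shape, with a hypothetical TombstoneDoc type and made-up field names (not the actual mapper API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a tombstone document carrying only the metadata
// fields a delete operation needs (uid, seq_no, primary_term).
class TombstoneDoc {
    final Map<String, Object> fields = new LinkedHashMap<>();

    static TombstoneDoc create(String type, String id) {
        TombstoneDoc doc = new TombstoneDoc();
        doc.fields.put("_uid", type + "#" + id);
        // seq_no and primary_term are deliberately NOT set here; per the
        // review comment, the engine assigns them when the doc is added
        // to Lucene, not at creation time.
        return doc;
    }

    // Called by the engine at indexing time, mirroring how it sets these
    // for regular index operations.
    void updateSeqId(long seqNo, long primaryTerm) {
        fields.put("_seq_no", seqNo);
        fields.put("_primary_term", primaryTerm);
    }
}
```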

@@ -2962,13 +2979,13 @@ public void testSegmentMemoryTrackedWithRandomSearchers() throws Exception {
for (Thread t : threads) {
t.join();
}
// Close remaining searchers
IOUtils.close(searchers);
Contributor


Why was this moved earlier? I think I'm missing something; can you explain?

Member Author


It's a left-over.

@dnhatn
Member Author

dnhatn commented Apr 18, 2018

@bleskes and @s1monw This is ready for review. Can you please take a look? Thank you!

@dnhatn
Member Author

dnhatn commented Apr 18, 2018

please run all tests.

@dnhatn
Member Author

dnhatn commented Apr 18, 2018

@elasticmachine please test this

@dnhatn
Member Author

dnhatn commented Apr 18, 2018

run sample packaging tests.

Contributor

@s1monw s1monw left a comment


LGTM. I left some questions; none of them are blockers.

@@ -363,4 +367,13 @@ public CircuitBreakerService getCircuitBreakerService() {
public LongSupplier getPrimaryTermSupplier() {
return primaryTermSupplier;
}

@FunctionalInterface
public interface TombstoneDocSupplier {
Contributor


maybe add some javadocs here?

}

public ParsedDocument createTombstoneDoc(String index, String type, String id) throws MapperParsingException {
final SourceToParse emptySource = SourceToParse.source(index, type, id, new BytesArray("{}"), XContentType.JSON);
Contributor


do we need some identifier that this doc is a tombstone? I am not sure we do at this point, but down the road we would, no?

Contributor


+1 . We don't plan to keep many of these around so the overhead is minimal and we'd have the ability to search for them and debug. The question is how to do this. I guess the easiest would be to add a boolean metadata field to the mapping, but that feels like an overkill. In any case, I think this can be a follow up.

Member Author


Yes, I will think about it and address it in a follow-up.
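The marker discussed above could be as simple as one extra boolean metadata field on the tombstone, which would make tombstones searchable for debugging. A minimal sketch of that idea; the field name "_tombstone" is made up here and is not the mapping that was eventually implemented:

```java
import java.util.List;
import java.util.Map;

// Sketch: tag tombstone docs with a boolean marker field so they can be
// found and inspected later. The "_tombstone" field name is hypothetical.
class TombstoneMarker {
    static Map<String, Object> markAsTombstone(Map<String, Object> doc) {
        doc.put("_tombstone", true);
        return doc;
    }

    // With a marker in place, tombstones can be filtered out of a set of
    // documents for debugging or accounting.
    static long countTombstones(List<Map<String, Object>> docs) {
        return docs.stream()
                .filter(d -> Boolean.TRUE.equals(d.get("_tombstone")))
                .count();
    }
}
```

Since few tombstones are kept around, the storage overhead of such a field would be minimal, which matches the reasoning in the comment above.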

Contributor

@bleskes bleskes left a comment


This LGTM, but I think we want a unit test on the engine level for this? What am I missing?

@dnhatn
Member Author

dnhatn commented Apr 19, 2018

This LGTM, but I think we want a unit test on the engine level for this? What am I missing?

We planned to add real tests in the next PR when indexing stale deletes and no-ops. This PR is just a cut-over. I am fine with adding a simple test here.

@bleskes
Contributor

bleskes commented Apr 19, 2018

We planned to add real tests in the next PR when indexing stale deletes and no-ops. This PR is just a cut-over. I am fine to add a simple test here.

Cool. Thanks for explaining.

@dnhatn
Member Author

dnhatn commented Apr 20, 2018

Thanks @bleskes and @s1monw for reviewing.

@dnhatn dnhatn merged commit ac84879 into elastic:ccr Apr 20, 2018
@dnhatn dnhatn deleted the soft-deletes branch April 20, 2018 00:45
dnhatn added a commit that referenced this pull request May 10, 2018