
Replace hard-deletes by soft-deletes to maintain document history #29549

Merged
10 commits merged into elastic:ccr from soft-deletes on Apr 20, 2018

Conversation

dnhatn
Member

@dnhatn dnhatn commented Apr 17, 2018

Today we can use the soft-deletes feature from Lucene to maintain a
history of a document. This change simply replaces hard-deletes by
soft-deletes in Engine.

Besides marking a document as deleted, we also index a tombstone
associated with that delete operation. Storing delete tombstones allows
us to have a history of sequence-based operations which can serve in
recovery or rollback.

Relates #29530
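The core idea can be illustrated with a small self-contained sketch in plain Java. This is not the actual Lucene or Engine API; all names here are hypothetical, and the point is only the semantics: a hard delete erases the document, while a soft delete flags it and records a tombstone, so the sequence-based history survives for recovery or rollback.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the soft-delete idea (not the real Engine): documents are
// never removed; deletes set a flag and index a tombstone carrying the
// sequence number of the delete operation.
class SoftDeleteSketch {
    record Doc(String id, long seqNo, boolean softDeleted) {}
    record Tombstone(String id, long seqNo) {}

    final Map<String, Doc> live = new LinkedHashMap<>();
    final List<Tombstone> tombstones = new ArrayList<>();
    long seqNo = 0;

    void index(String id) {
        live.put(id, new Doc(id, seqNo++, false));
    }

    // Soft delete: keep the document, mark it deleted, and index a
    // tombstone recording the delete's sequence number.
    void softDelete(String id) {
        Doc d = live.get(id);
        if (d != null) {
            live.put(id, new Doc(id, d.seqNo(), true));
        }
        tombstones.add(new Tombstone(id, seqNo++));
    }

    // Every indexed doc and every tombstone still exists, ordered by
    // sequence number, so the operation history is recoverable.
    int historySize() {
        return live.size() + tombstones.size();
    }
}
```

With a hard delete, the document for "1" would be gone; here both the flagged document and the tombstone remain in the history.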

@dnhatn dnhatn added the >enhancement and :Distributed Indexing/Engine (anything around managing Lucene and the translog in an open shard) labels Apr 17, 2018
@dnhatn dnhatn requested review from s1monw and bleskes April 17, 2018 02:56
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@dnhatn
Member Author

dnhatn commented Apr 17, 2018

/cc @jasontedor and @martijnvg

Contributor

@bleskes bleskes left a comment


Thanks Nhat. I left some comments.

@@ -80,6 +81,7 @@
private final CircuitBreakerService circuitBreakerService;
private final LongSupplier globalCheckpointSupplier;
private final LongSupplier primaryTermSupplier;
private final MetaDocSupplier metaDocSupplier;
Contributor


is TombstoneDoc a better name?

Member Author


Yes, it's better: more explicit.


private ParseContext.Document newMetaDoc(String type, String id, long seqno, long primaryTerm, long version) {
final SourceToParse source = SourceToParse.source(shardId.getIndexName(), type, id, new BytesArray("{}"), XContentType.JSON);
final ParsedDocument parsedDocument = docMapper(type).getDocumentMapper().parse(source);
Contributor


Instead of creating everything and removing what we don't need, can we maybe fold this into the document mapper and add a createTombstoneDoc method to it that only does what it needs to create the right fields (probably only calling the preParse / postParse methods on the right fields, similar to here)?

private ParseContext.Document newMetaDoc(String type, String id, long seqno, long primaryTerm, long version) {
final SourceToParse source = SourceToParse.source(shardId.getIndexName(), type, id, new BytesArray("{}"), XContentType.JSON);
final ParsedDocument parsedDocument = docMapper(type).getDocumentMapper().parse(source);
parsedDocument.updateSeqID(seqno, primaryTerm);
Contributor


I prefer to be consistent with how the engine sets these during indexing, i.e. during the addition to Lucene.
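Taken together, the two suggestions in this thread amount to: build only the metadata fields a tombstone needs (rather than parsing an empty source and stripping fields), and leave seq_no and primary_term to be filled in when the document is added to Lucene. A rough sketch of that shape, with a hypothetical TombstoneDoc type and made-up field names (not the actual mapper API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a tombstone document carrying only the metadata
// fields a delete operation needs (uid, seq_no, primary_term).
class TombstoneDoc {
    final Map<String, Object> fields = new LinkedHashMap<>();

    static TombstoneDoc create(String type, String id) {
        TombstoneDoc doc = new TombstoneDoc();
        doc.fields.put("_uid", type + "#" + id);
        // seq_no and primary_term are deliberately NOT set here; per the
        // review comment, the engine assigns them when the doc is added
        // to Lucene, not at creation time.
        return doc;
    }

    // Called by the engine at indexing time, mirroring how it sets these
    // for regular index operations.
    void updateSeqId(long seqNo, long primaryTerm) {
        fields.put("_seq_no", seqNo);
        fields.put("_primary_term", primaryTerm);
    }
}
```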

@@ -2962,13 +2979,13 @@ public void testSegmentMemoryTrackedWithRandomSearchers() throws Exception {
for (Thread t : threads) {
t.join();
}
// Close remaining searchers
IOUtils.close(searchers);
Contributor


Why was this moved earlier? I think I'm missing something; can you explain?

Member Author


It's a left-over.

@dnhatn
Member Author

dnhatn commented Apr 18, 2018

@bleskes and @s1monw This is ready for review. Can you please take a look? Thank you!

@dnhatn
Member Author

dnhatn commented Apr 18, 2018

please run all tests.

@dnhatn
Member Author

dnhatn commented Apr 18, 2018

@elasticmachine please test this

@dnhatn
Member Author

dnhatn commented Apr 18, 2018

run sample packaging tests.

Contributor

@s1monw s1monw left a comment


LGTM. I left some questions; none of them are blockers.

@@ -363,4 +367,13 @@ public CircuitBreakerService getCircuitBreakerService() {
public LongSupplier getPrimaryTermSupplier() {
return primaryTermSupplier;
}

@FunctionalInterface
public interface TombstoneDocSupplier {
Contributor


maybe add some javadocs here?

}

public ParsedDocument createTombstoneDoc(String index, String type, String id) throws MapperParsingException {
final SourceToParse emptySource = SourceToParse.source(index, type, id, new BytesArray("{}"), XContentType.JSON);
Contributor


do we need some identifier that this doc is a tombstone? I am not sure we do at this point, but down the road we would, no?

Contributor


+1 . We don't plan to keep many of these around so the overhead is minimal and we'd have the ability to search for them and debug. The question is how to do this. I guess the easiest would be to add a boolean metadata field to the mapping, but that feels like an overkill. In any case, I think this can be a follow up.

Member Author


Yes, I will think about it and address it in a follow-up.
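The marker discussed above could be as simple as one extra boolean metadata field on the tombstone, which would make tombstones searchable for debugging. A minimal sketch of that idea; the field name "_tombstone" is made up here and is not the mapping that was eventually implemented:

```java
import java.util.List;
import java.util.Map;

// Sketch: tag tombstone docs with a boolean marker field so they can be
// found and inspected later. The "_tombstone" field name is hypothetical.
class TombstoneMarker {
    static Map<String, Object> markAsTombstone(Map<String, Object> doc) {
        doc.put("_tombstone", true);
        return doc;
    }

    // With a marker in place, tombstones can be filtered out of a set of
    // documents for debugging or accounting.
    static long countTombstones(List<Map<String, Object>> docs) {
        return docs.stream()
                .filter(d -> Boolean.TRUE.equals(d.get("_tombstone")))
                .count();
    }
}
```

Since few tombstones are kept around, the storage overhead of such a field would be minimal, which matches the reasoning in the comment above.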

Contributor

@bleskes bleskes left a comment


This LGTM, but I think we want a unit test on the engine level for this? What am I missing?

@dnhatn
Member Author

dnhatn commented Apr 19, 2018

This LGTM, but I think we want a unit test on the engine level for this? What am I missing?

We planned to add real tests in the next PR when indexing stale deletes and no-ops. This PR is just a cut-over. I am fine with adding a simple test here.

@bleskes
Contributor

bleskes commented Apr 19, 2018

We planned to add real tests in the next PR when indexing stale deletes and no-ops. This PR is just a cut-over. I am fine to add a simple test here.

Cool. Thanks for explaining.

@dnhatn
Member Author

dnhatn commented Apr 20, 2018

Thanks @bleskes and @s1monw for reviewing.

@dnhatn dnhatn merged commit ac84879 into elastic:ccr Apr 20, 2018
@dnhatn dnhatn deleted the soft-deletes branch April 20, 2018 00:45
dnhatn added a commit that referenced this pull request May 10, 2018