Integrates soft-deletes into Elasticsearch #33222
Conversation
This PR integrates Lucene soft-deletes (LUCENE-8200) into Elasticsearch. Highlights of this PR include:
1. Replace hard-deletes by soft-deletes in InternalEngine
2. Use _recovery_source if _source is disabled or modified (elastic#31106)
3. Soft-deletes retention policy based on the global checkpoint (elastic#30335)
4. Read operation history from Lucene instead of translog (elastic#30120)
5. Use Lucene history in peer-recovery (elastic#30522)

This work was done by the whole team; however, the following individuals (in lexical order) contributed significantly to coding and reviewing:
Co-authored-by: Adrien Grand <[email protected]>
Co-authored-by: Boaz Leskes <[email protected]>
Co-authored-by: Jason Tedor <[email protected]>
Co-authored-by: Martijn van Groningen <[email protected]>
Co-authored-by: Nhat Nguyen <[email protected]>
Co-authored-by: Simon Willnauer <[email protected]>
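To make the core idea concrete, here is a minimal conceptual sketch (plain Java, not the actual Elasticsearch or Lucene API): a hard delete removes a document from the index, while a soft delete keeps it and marks it dead with a tombstone flag, so the full operation history stays readable for recovery and changes snapshots. All names here are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only: soft deletes append tombstones instead of erasing documents.
class SoftDeleteSketch {
    static final class Op {
        final String id;
        final long seqNo;
        final boolean tombstone;
        Op(String id, long seqNo, boolean tombstone) {
            this.id = id; this.seqNo = seqNo; this.tombstone = tombstone;
        }
    }

    final List<Op> history = new ArrayList<>();

    void index(String id, long seqNo) {
        history.add(new Op(id, seqNo, false));
    }

    // Soft delete: append a tombstone instead of erasing the document.
    void softDelete(String id, long seqNo) {
        history.add(new Op(id, seqNo, true));
    }

    // Live view: the latest op per id wins; tombstoned ids are not live.
    List<String> liveIds() {
        Map<String, Op> latest = new HashMap<>();
        for (Op op : history) {
            Op prev = latest.get(op.id);
            if (prev == null || op.seqNo > prev.seqNo) {
                latest.put(op.id, op);
            }
        }
        List<String> live = new ArrayList<>();
        for (Op op : latest.values()) {
            if (!op.tombstone) live.add(op.id);
        }
        return live;
    }
}
```

Unlike a hard delete, the delete operation itself remains visible in `history`, which is what makes operation-based recovery from Lucene possible.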
Pinging @elastic/es-distributed
@elastic/es-distributed Can you please also have a look? Thank you!
We should enable this by default in 7.0 in a follow-up.
Looks awesome. Great piece of work! I left some comments; most of them are nits, but a couple are minor issues.
}
// TODO: Avoid recalculating numDocs every time.
int numDocs = 0;
for (int i = 0; i < hardLiveDocs.length(); i++) {
We are still waiting for a Lucene release to fix this, right? This is https://issues.apache.org/jira/browse/LUCENE-8458?
If so, please put a comment in here referencing the issue.
Yes, but there is something not clear on my side. I will add a comment then make the change in a follow-up so we can clarify things.
What is not clear on your side?
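For context on the TODO above: Lucene's `Bits` interface only exposes `get(i)` and `length()`, so the number of live documents has to be counted by scanning, which is why the comment suggests caching the result instead of recomputing it. A stand-in sketch of that loop, using `java.util.BitSet` in place of Lucene's `Bits` (illustrative only):

```java
// Illustrative stand-in for the loop quoted above; BitSet substitutes for Lucene's Bits.
class LiveDocCounter {
    static int countLive(java.util.BitSet hardLiveDocs, int length) {
        int numDocs = 0;
        for (int i = 0; i < length; i++) {
            if (hardLiveDocs.get(i)) { // document i is live (not hard-deleted)
                numDocs++;
            }
        }
        return numDocs;
    }
}
```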
/**
 * Specifies if the index should use soft-delete instead of hard-delete for update/delete operations.
 */
public static final Setting<Boolean> INDEX_SOFT_DELETES_SETTING =
I wonder if we should use a validator here like we use here https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetaData.java#L200 to validate that the index this is set on is on version 7.0 or higher? I think this would prevent setting this on other indices and prevent confusion.
I tried but was unable to get it done with the current Validator. I think we need to extend the Validator to pass an entire Settings instance to make this possible (previous discussion #25560 (comment)). Would it be okay if I do this in a follow-up?
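The kind of check being discussed could look roughly like the following. This is a hypothetical sketch, not the actual `Setting.Validator` API (which, per the comment above, could not see the full Settings instance at the time): reject enabling soft-deletes on indices created before version 7.0.

```java
// Hypothetical sketch of the version guard discussed above; names are illustrative.
class SoftDeletesVersionCheck {
    static void validate(boolean softDeletesEnabled, int indexCreatedMajorVersion) {
        if (softDeletesEnabled && indexCreatedMajorVersion < 7) {
            throw new IllegalArgumentException(
                "soft deletes can only be enabled on indices created on or after 7.0");
        }
    }
}
```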
 * documents increases the chance of operation-based recoveries and allows querying a longer history of documents.
 * If soft-deletes is enabled, an engine by default will retain all operations up to the global checkpoint.
 **/
public static final Setting<Long> INDEX_SOFT_DELETES_RETENTION_OPERATIONS_SETTING =
same comment as above?
should we only allow setting this one if soft deletes are enabled?
+1
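My reading of the retention description above can be sketched as a single bound: operations whose seq_no is at or above the bound are retained; older soft-deleted documents may be reclaimed by merges. This is an illustrative formula, not the actual SoftDeletesPolicy code.

```java
// Sketch of a minimum retained seq_no derived from the global checkpoint and
// the retention-operations setting discussed above (illustrative only).
class RetentionBound {
    static long minRetainedSeqNo(long globalCheckpoint, long retentionOperations) {
        // Keep retentionOperations extra operations below the global checkpoint.
        return Math.max(0L, globalCheckpoint + 1 - retentionOperations);
    }
}
```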
/**
 * Creates a new history snapshot from Lucene for reading operations whose seq_no is in the requested seq_no range (both inclusive).
 */
public abstract Translog.Snapshot newLuceneChangesSnapshot(String source, MapperService mapperService,
can we just call this newChangesSnapshot?
builder.add(new DocValuesFieldExistsQuery(recoverySourceField), BooleanClause.Occur.FILTER);
builder.add(retainSourceQuerySupplier.get(), BooleanClause.Occur.FILTER);
IndexSearcher s = new IndexSearcher(reader);
s.setQueryCache(null);
I know this is not necessary per se, but I think we should call s.rewrite(builder.build()) and pass the result to s.createWeight(...). We missed this in Lucene too, i.e. if you passed a prefix query to this it would fail. I realized this yesterday when I worked on something else. :)
Done
// TODO: We don't have timestamps for Index operations in Lucene yet; we need to loosen this check when the timestamp is missing.
// We don't store versionType in the Lucene index, so we need to exclude it from this check.
final boolean sameOp;
if (newOp instanceof Translog.Index && prvOp instanceof Translog.Index) {
Can we maybe fix the equals method of Operation rather than this? We can do it in a follow-up, no worries. I think versionType is maybe not needed for comparison?
I removed versionType from the comment. I will move this comparison to the Operation's equals.
try (Translog.Snapshot snapshot =
         newLuceneChangesSnapshot(source, mapperService, Math.max(0, startingSeqNo), Long.MAX_VALUE, false)) {
    return snapshot.totalOperations();
} catch (IOException ex) {
should we catch Exception here instead of IOException and check if we need to fail?
I removed this catch and added a tragic check when we create a new snapshot.
@Override
public void afterRefresh(boolean didRefresh) {
    if (didRefresh) {
        refreshedCheckpoint.set(pendingCheckpoint);
can we add an assertion that we actually set the pendingCheckpoint?
There is a bug with the current implementation. I added a test and fixed this in 4ed7907.
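The pending/refreshed checkpoint handoff being reviewed here can be simplified to the following sketch (illustrative names, not the actual InternalEngine listener): beforeRefresh records the checkpoint the refresh will expose, and afterRefresh publishes it only if the refresh actually happened, with the assertion the reviewer asked for.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical simplification of the refresh listener discussed above.
class LastRefreshedCheckpointListener {
    final AtomicLong refreshedCheckpoint;
    long pendingCheckpoint = -1;

    LastRefreshedCheckpointListener(long initial) {
        this.refreshedCheckpoint = new AtomicLong(initial);
    }

    void beforeRefresh(long currentCheckpoint) {
        // All operations up to this checkpoint will be visible after the refresh.
        pendingCheckpoint = currentCheckpoint;
    }

    void afterRefresh(boolean didRefresh) {
        assert pendingCheckpoint >= refreshedCheckpoint.get() : "pendingCheckpoint was not set";
        if (didRefresh) {
            refreshedCheckpoint.set(pendingCheckpoint);
        }
    }
}
```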
 */
final long lastRefreshedCheckpoint() {
    return lastRefreshedCheckpointListener.refreshedCheckpoint.get();
}
extra newline please
@@ -101,6 +104,9 @@ private void updateTranslogDeletionPolicy() throws IOException {
    assert minRequiredGen <= lastGen : "minRequiredGen must not be greater than lastGen";
    translogDeletionPolicy.setTranslogGenerationOfLastCommit(lastGen);
    translogDeletionPolicy.setMinTranslogGenerationForRecovery(minRequiredGen);

    softDeletesPolicy.setLocalCheckpointOfSafeCommit(
        Long.parseLong(safeCommit.getUserData().get(SequenceNumbers.LOCAL_CHECKPOINT_KEY)));
Do we need to care about commits that don't have values for this key?
We ensure that all index commits have this key.
private static final class IndexingStrategy {
private void addStaleDocs(final List<ParseContext.Document> docs, final IndexWriter indexWriter) throws IOException {
    assert softDeleteEnabled : "Add history documents but soft-deletes is disabled";
    docs.forEach(d -> d.add(softDeletesField));
nit: a for loop would feel more natural here?
@Override
public Closeable acquireRetentionLockForPeerRecovery() {
    final Closeable translogLock = translog.acquireRetentionLock();
    final Releasable softDeletesLock = softDeletesPolicy.acquireRetentionLock();
do we need to take care of releasing the translog lock if acquiring the soft deletes lock fails?
Great catch. We should acquire one of them but not both.
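The leak the reviewer caught has a standard shape: when acquiring the second lock throws, the first must be released before the exception propagates. A self-contained sketch of that pattern (illustrative names, not the actual InternalEngine code):

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch of acquiring two retention locks safely: release the first if
// acquiring the second fails, so neither lock is leaked.
class RetentionLocks {
    interface LockSource {
        Closeable acquire() throws IOException;
    }

    static Closeable acquireBoth(LockSource translog, LockSource softDeletes) throws IOException {
        Closeable first = translog.acquire();
        Closeable second;
        try {
            second = softDeletes.acquire();
        } catch (Exception e) {
            first.close(); // release the translog lock on failure
            throw e;
        }
        // Closing the combined lock releases both underlying locks.
        return () -> {
            try (first; second) {
            }
        };
    }
}
```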
final Query rangeQuery = LongPoint.newRangeQuery(SeqNoFieldMapper.NAME, fromSeqNo, toSeqNo);
final Sort sortedBySeqNoThenByTerm = new Sort(
    new SortedNumericSortField(SeqNoFieldMapper.NAME, SortField.Type.LONG),
    new SortedNumericSortField(SeqNoFieldMapper.PRIMARY_TERM_NAME, SortField.Type.LONG, true)
you could use regular SortField instances since these fields are single-valued
}

private TopDocs searchOperations(ScoreDoc after) throws IOException {
    final Query rangeQuery = LongPoint.newRangeQuery(SeqNoFieldMapper.NAME, fromSeqNo, toSeqNo);
I'm not sure how much it would help, but when after is not null, we could cast it to a FieldDoc to extract its seq_no and use it as a lower bound of the range query. That would help skip documents that have already been visited more efficiently.
Yes, I use "lastSeenSeqNo + 1" as the lower bound.
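The optimization can be illustrated with a plain-Java sketch (names illustrative, not the actual LuceneChangesSnapshot code): the range lower bound is raised to lastSeenSeqNo + 1 so already-visited documents are skipped. Note the caveat found later in a follow-up (#33318): when two documents share the same seq_no (e.g. parent/child documents), the second one is skipped as well.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of paging operations by seq_no with a raised lower bound.
class SeqNoPaging {
    static List<Long> nextBatch(List<Long> seqNosInOrder, long lastSeenSeqNo, int batchSize) {
        List<Long> batch = new ArrayList<>();
        for (long seqNo : seqNosInOrder) {
            if (seqNo >= lastSeenSeqNo + 1) { // the raised lower bound
                batch.add(seqNo);
                if (batch.size() == batchSize) {
                    break;
                }
            }
        }
        return batch;
    }
}
```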
if (fromSeqNo < 0 || toSeqNo < 0 || fromSeqNo > toSeqNo) {
    throw new IllegalArgumentException("Invalid range; from_seqno [" + fromSeqNo + "], to_seqno [" + toSeqNo + "]");
}
if (searchBatchSize < 0) {
I think we need to reject 0 too?
LGTM
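The agreed-upon checks can be sketched as follows (names illustrative), with the batch-size test tightened to reject zero as the reviewer suggested:

```java
// Sketch of the argument validation discussed in this thread.
class ChangesSnapshotArgs {
    static void validate(long fromSeqNo, long toSeqNo, int searchBatchSize) {
        if (fromSeqNo < 0 || toSeqNo < 0 || fromSeqNo > toSeqNo) {
            throw new IllegalArgumentException(
                "Invalid range; from_seqno [" + fromSeqNo + "], to_seqno [" + toSeqNo + "]");
        }
        if (searchBatchSize <= 0) { // reject 0 as well as negative sizes
            throw new IllegalArgumentException(
                "Search_batch_size must be positive [" + searchBatchSize + "]");
        }
    }
}
```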
Today we add a NoOp to both Lucene and the translog if we fail to process an indexing operation; however, for failed delete operations we only add a NoOp to the translog. In order to have a complete history in Lucene, we should add NoOps for failed delete operations to both Lucene and the translog. Relates elastic#29530
Hmm. CI failed due to #32299. @elasticmachine retest this please.
@elasticmachine run sample packaging tests please
@elasticmachine retest this please.
Another watcher test failure. @elasticmachine test this please.
Revert to correct co-author tags. This reverts commit 6dd0aa5.
This PR integrates Lucene soft-deletes (LUCENE-8200) into Elasticsearch. Highlights of this PR include:
- Replace hard-deletes by soft-deletes in InternalEngine
- Use _recovery_source if _source is disabled or modified (elastic#31106)
- Soft-deletes retention policy based on the global checkpoint (elastic#30335)
- Read operation history from Lucene instead of translog (elastic#30120)
- Use Lucene history in peer-recovery (elastic#30522)

Relates elastic#30086
Closes elastic#29530

---

This work was done by the whole team; however, the following individuals (in lexical order) contributed significantly to coding and reviewing:
Co-authored-by: Adrien Grand <[email protected]>
Co-authored-by: Boaz Leskes <[email protected]>
Co-authored-by: Jason Tedor <[email protected]>
Co-authored-by: Martijn van Groningen <[email protected]>
Co-authored-by: Nhat Nguyen <[email protected]>
Co-authored-by: Simon Willnauer <[email protected]>
* 6.x:
  - Mute test watcher usage stats output
  - [Rollup] Fix FullClusterRestart test
  - TEST: Disable soft-deletes in ParentChildTestCase
  - TEST: Disable randomized soft-deletes settings
  - Integrates soft-deletes into Elasticsearch (#33222)
  - drop `index.shard.check_on_startup: fix` (#32279)
  - Fix AwaitsFix issue number
  - Mute SmokeTestWatcherWithSecurityIT tests
  - [DOCS] Moves ml folder from x-pack/docs to docs (#33248)
  - TEST: mute more SmokeTestWatcherWithSecurityIT tests
  - [DOCS] Move rollup APIs to docs (#31450)
  - [DOCS] Rename X-Pack Commands section (#33005)
  - Fixes SecurityIntegTestCase so it always adds at least one alias (#33296)
  - TESTS: Fix Random Fail in MockTcpTransportTests (#33061) (#33307)
  - MINOR: Remove Dead Code from PathTrie (#33280) (#33306)
  - Fix pom for build-tools (#33300)
  - Lazy evaluate java9home (#33301)
  - SQL: test coverage for JdbcResultSet (#32813)
  - Work around to be able to generate eclipse projects (#33295)
  - Different handling for security specific errors in the CLI. Fix for #33230 (#33255)
  - [ML] Refactor delimited file structure detection (#33233)
  - SQL: Support multi-index format as table identifier (#33278)
  - Enable forbiddenapis server java9 (#33245)
  - [MUTE] SmokeTestWatcherWithSecurityIT flaky tests
  - Add region ISO code to GeoIP Ingest plugin (#31669) (#33276)
  - Don't be strict for 6.x
  - Update serialization versions for custom IndexMetaData backport
  - Replace IndexMetaData.Custom with Map-based custom metadata (#32749)
  - Painless: Fix Bindings Bug (#33274)
  - SQL: prevent duplicate generation for repeated aggs (#33252)
  - TEST: Mute testMonitorClusterHealth
  - Fix serialization of empty field capabilities response (#33263)
  - Fix nested _source retrieval with includes/excludes (#33180)
  - [DOCS] TLS file resources are reloadable (#33258)
  - Watcher: Ensure TriggerEngine start replaces existing watches (#33157)
  - Ignore module-info in jar hell checks (#33011)
  - Fix docs build after #33241
  - [DOC] Repository GCS ADC not supported (#33238)
  - Upgrade to latest Gradle 4.10 (#32801)
  - Fix/30904 cluster formation part2 (#32877)
  - Move file-based discovery to core (#33241)
  - HLRC: add client side RefreshPolicy (#33209)
  - [Kerberos] Add unsupported languages for tests (#33253)
  - Watcher: Reload properly on remote shard change (#33167)
  - Fix classpath security checks for external tests. (#33066)
  - [Rollup] Only allow aggregating on multiples of configured interval (#32052)
  - Added deprecation warning for rescore in scroll queries (#33070)
  - Apply settings filter to get cluster settings API (#33247)
  - [Rollup] Re-factor Rollup Indexer into a generic indexer for re-usability (#32743)
  - HLRC: create base timed request class (#33216)
  - HLRC: Use Optional in validation logic (#33104)
  - Painless: Add Bindings (#33042)
We can have multiple documents in Lucene with the same seq_no for parent-child documents (or without rollback). In this case, using "lastSeenSeqNo + 1" as the lower bound is an off-by-one error, as it may miss some documents. This error merely affects the `skippedOperations` contract. See: #33222 (comment) Closes #33318
Today we don't store the auto-generated timestamp of append-only operations in Lucene, and we assign -1 to every index operation constructed from LuceneChangesSnapshot. This looks innocent, but it generates duplicate documents on a replica if a retry append-only arrives first via peer-recovery, then the original append-only arrives via replication. Since the retry append-only (delivered via recovery) does not have a timestamp, the replica will happily optimize the original request while it should not. This change transmits the max auto-generated timestamp from the primary to replicas before the translog phase in peer recovery. This timestamp will prevent replicas from optimizing append-only requests if retry counterparts have been processed. Relates #33656 Relates #33222
This change enables soft-deletes by default on ES 7.0.0 or later. Relates #33222 Co-authored-by: Jason Tedor <[email protected]>
This PR integrates Lucene soft-deletes (LUCENE-8200) into Elasticsearch.
Highlights of this PR include:
- Replace hard-deletes by soft-deletes in InternalEngine
- Use _recovery_source if _source is disabled or modified (#31106)
- Soft-deletes retention policy based on the global checkpoint (#30335)
- Read operation history from Lucene instead of translog (#30120)
- Use Lucene history in peer-recovery (#30522)

These pieces were reviewed already in the feature branch, but we would like to give them an extra look before pulling into the upstream.

Relates #30086
Closes #29530
This work was done by the whole team; however, the following individuals (in lexical order) contributed significantly to coding and reviewing:
Co-authored-by: Adrien Grand [email protected]
Co-authored-by: Boaz Leskes [email protected]
Co-authored-by: Jason Tedor [email protected]
Co-authored-by: Martijn van Groningen [email protected]
Co-authored-by: Nhat Nguyen [email protected]
Co-authored-by: Simon Willnauer [email protected]