Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use exact numDocs in synced-flush and metadata snapshot #30228

Merged
merged 20 commits into from
May 15, 2018

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Apr 28, 2018

Since #29458, we use a searcher to calculate the number of documents for a commit stats. Sadly, that approach is flawed. The searcher might no longer point to the last commit if it's refreshed. As synced-flush requires an exact numDocs to work correctly, we have to exclude all soft-deleted docs.

This commit makes synced-flush stop using CommitStats but read an exact numDocs directly from an index commit.

Relates #29458
Relates #29530

Since elastic#29458, we use a searcher to calculate the number of documents for
a commit stats. Sadly, that approach is flawed. The searcher might no
longer point to the last commit if it's refreshed.

This commit uses SoftDeletesDirectoryReaderWrapper to exclude the
soft-deleted documents from numDocs in a SegmentInfos. I chose to modify
the method Luence#getNumDocs so that we can read a store metadata
snapshot correctly without opening an engine.

Relates elastic#29458
@dnhatn dnhatn added >bug :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. labels Apr 28, 2018
@dnhatn dnhatn requested a review from s1monw April 28, 2018 03:00
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

dnhatn added 2 commits April 28, 2018 19:38
It might be incorrect if forceMerge happens
@dnhatn
Copy link
Member Author

dnhatn commented Apr 29, 2018

@elasticmachine test this please

@bleskes
Copy link
Contributor

bleskes commented Apr 30, 2018

I wonder if we should do this. Classically number of docs was always total docs in the segments. I'm not saying this isn't debatable but I'm also not sure it merits the breakage price. Also we want this to go into 6.x. Instead of this, maybe we should another field to the stats to reflect soft deletes? then sycned flush is freed to do what it thinks is right and we have more information.

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left one comment

*/
public static int getNumDocs(SegmentInfos info) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep this like it is. We should rather, in the engine do this conditionally if we have soft-deletes enabled. Also please make sure that we only load this once and cache it. We don't commit very often and commit stats are fetched rarely we can do this when it's needed and cache it per commit.

@dnhatn
Copy link
Member Author

dnhatn commented Apr 30, 2018

@bleskes and @simonw

I understand your concerns. However, we get numDocs from a commit not only when sealing an index but also when verifying sync_id in peer recovery. In peer recovery, we read numDocs directly from Store as a target shard has not opened its engine yet. Thus, fixing this in an engine might not be enough.

numDocs = Lucene.getNumDocs(segmentCommitInfos);

Since we don't seal an index if numDocs are different, is it ok to remove the numDocs checking at recovery time?

@bleskes
Copy link
Contributor

bleskes commented Apr 30, 2018

@dnhatn Simon will be a better judge, but I think we need a better utility method like Lucene. getNumDocs that takes soft deletes into account? we can work directly on the index for this?

@dnhatn
Copy link
Member Author

dnhatn commented Apr 30, 2018

I think we need a better utility method like Lucene. getNumDocs that takes soft deletes into account?

@bleskes Yes, we should do it.

@dnhatn
Copy link
Member Author

dnhatn commented May 1, 2018

@s1monw I take a different approach for this. Can you please have a look? Thank you.

@dnhatn dnhatn requested review from bleskes and s1monw May 1, 2018 02:26
@dnhatn
Copy link
Member Author

dnhatn commented May 1, 2018

@elasticmachine retest this please

dnhatn added a commit that referenced this pull request May 2, 2018
@s1monw
Copy link
Contributor

s1monw commented May 3, 2018

I am ok with the changing the semantics here. I think we try to make something work that isn't necessarily giving us enough information. I think we should keep it the way it is and deprecate it in 6.x? I really wonder what the purpose of this stats is compared to other stats that have a live numDocs that is accurate?

@s1monw
Copy link
Contributor

s1monw commented May 3, 2018

Double checking - the way it is in master or in the PR? can you clarify?

the way it is in master

other stats that have a live numDocs that is accurate?

I mean for instance docStats here: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-stats.html

dnhatn added 2 commits May 8, 2018 20:28
# Conflicts:
#	server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java
dnhatn added a commit that referenced this pull request May 10, 2018
@dnhatn
Copy link
Member Author

dnhatn commented May 10, 2018

@s1monw and @bleskes,

I've updated the PR to restore the CommitStats and made synced-flush using an exact numDocs only when soft-deletes is enabled. There are still some issues that I am not sure.

  1. With this change, only the synced-flush call uses an exact numDocs. Should we still need to cache this value in an engine?

  2. In peer-recovery, a replica shard needs to send an exact numDocs to the primary shard if soft-deletes is enabled and an index was sealed. I see two possible approaches:

  • Pass an extra parameter "requiredExactNumDocs" in getStoreMetadataSnapshot. Internally we always calculate an exact numDocs if this parameter is true. This does not optimize the case when the index was not sealed and may cause confusion for other StoreMetadataSnapshot's usages.

  • Override the numDocs by an exact numDocs when the soft-deletes is true and an index was sealed. This is not nice but the change is minimal and does not affect StoreMetadataSnapshot's other usages.

Please have a look and let me know your thought. Thank you!

// The primary shard may need the "exact" numDocs to verify if the commit has syncId.
final boolean softDeleteEnabled = recoveryTarget.indexShard().indexSettings().isSoftDeleteEnabled();
if (softDeleteEnabled && Strings.hasText(snapshot.getSyncId())) {
final List<IndexCommit> commits = DirectoryReader.listCommits(recoveryTarget.store().directory());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to use safe commit. This is the commit the engine will use (probably an existing bug) .

@bleskes
Copy link
Contributor

bleskes commented May 10, 2018

Thx @dnhatn . I like where this is going. IMO neither synced flush nor recover are frequent operations. We should just use exact numbers all the time and everywhere. keep it simple. if soft deletes are not enabled, it's the same as the "fast" numDocs (and we can assert for that).

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

* Unlike {@link #getNumDocs(SegmentInfos)} this method returns a numDocs that always excludes soft-deleted docs.
* This method is expensive thus prefer using {@link #getNumDocs(SegmentInfos)} unless an exact numDocs is required.
*/
public static int getExactNumDocs(IndexCommit commit) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

final SegmentInfos segmentInfos = Lucene.readSegmentInfos(commitRef.getIndexCommit());
final int numDocs;
if (indexShard.indexSettings().isSoftDeleteEnabled()) {
numDocs = Lucene.getExactNumDocs(commitRef.getIndexCommit());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can call the getExactNumDocs all the time. it has no overhead if soft_deletes are not there

@@ -341,7 +342,8 @@ public void phase1(final IndexCommit snapshot, final Supplier<Integer> translogO
recoverySourceSyncId.equals(recoveryTargetSyncId);
if (recoverWithSyncId) {
final long numDocsTarget = request.metadataSnapshot().getNumDocs();
final long numDocsSource = recoverySourceMetadata.getNumDocs();
final boolean softDeletesEnabled = shard.indexSettings().isSoftDeleteEnabled();
final long numDocsSource = softDeletesEnabled ? Lucene.getExactNumDocs(snapshot) : recoverySourceMetadata.getNumDocs();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here just use getExactNumDocs

dnhatn added 2 commits May 14, 2018 11:02
# Conflicts:
#	server/src/main/java/org/elasticsearch/indices/flush/SyncedFlushService.java
@dnhatn
Copy link
Member Author

dnhatn commented May 14, 2018

@elasticmachine test this please

@dnhatn
Copy link
Member Author

dnhatn commented May 15, 2018

We should just use exact numbers all the time and everywhere. keep it simple.

@bleskes I've updated this PR to always have an exact numDocs when capturing the store metadata snapshot. Would you please have a look? Thank you!

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks Nhat.

@dnhatn
Copy link
Member Author

dnhatn commented May 15, 2018

Thanks @bleskes and @s1monw

@dnhatn dnhatn changed the title Exclude soft-deleted documents in commit stats Use exact numDocs in synced-flush and store metadata snapshot May 15, 2018
@dnhatn dnhatn changed the title Use exact numDocs in synced-flush and store metadata snapshot Use exact numDocs in synced-flush and metadata snapshot May 15, 2018
@dnhatn dnhatn merged commit b12c2f6 into elastic:ccr May 15, 2018
@dnhatn dnhatn deleted the fix-commit-stats branch May 15, 2018 22:00
dnhatn added a commit that referenced this pull request May 23, 2018
This test suite fails due to Lucene#getExactNumDocs.

Relates #30228
dnhatn added a commit that referenced this pull request May 30, 2018
Since #29458, we use a searcher to calculate the number of documents for
a commit stats. Sadly, that approach is flawed. The searcher might no
longer point to the last commit if it's refreshed. As synced-flush
requires an exact numDocs to work correctly, we have to exclude all
soft-deleted docs.

This commit makes synced-flush stop using CommitStats but read an exact
numDocs directly from an index commit.

Relates #29458
Relates #29530
dnhatn added a commit that referenced this pull request Jun 5, 2018
This PR adapts/utilizes recent enhancements in Lucene-7.4:

- Replaces exactNumDocs by the soft-deletes count in SegmentCommitInfo.
This enhancement allows us to back out changes introduced in #30228.

- Always configure the soft-deletes field in IWC
dnhatn added a commit that referenced this pull request Jun 5, 2018
This PR adapts/utilizes recent enhancements in Lucene-7.4:

- Replaces exactNumDocs by the soft-deletes count in SegmentCommitInfo.
This enhancement allows us to back out changes introduced in #30228.

- Always configure the soft-deletes field in IWC
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants