Use delCount of SegmentInfos to calculate numDocs #36323

Merged
merged 1 commit into from
Dec 9, 2018

dnhatn
Member

@dnhatn dnhatn commented Dec 6, 2018

Today, we iterate over the hardLiveDocs bitset to calculate the number of
live docs. This calculation can be expensive when soft-deletes is enabled
(the default) on old indices that previously had soft-deletes disabled
and contain hard-deletes.

Once soft-deletes is enabled, we no longer hard-update or hard-delete
documents directly. Hard-deletes then arise in only two scenarios: (1) from
old segments written while soft-deletes was disabled, and (2) when
IndexWriter hits non-aborting exceptions. In both cases, IndexWriter flushes
SegmentInfos before exposing the hard-deletes, so we can use the hard-delete
count from SegmentInfos instead of scanning the bitset.
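
To illustrate the difference, here is a minimal, self-contained sketch. It is not the code changed in this PR: `SegmentMeta`, `numDocsByScan`, and `numDocsByDelCount` are hypothetical names, and `java.util.BitSet` stands in for Lucene's live-docs bits. It only shows why reading a recorded delete count is cheaper than scanning the bitset, assuming the recorded count is up to date.

```java
import java.util.BitSet;

public class NumDocsSketch {

    /** Hypothetical stand-in for per-segment metadata; not a Lucene class. */
    static final class SegmentMeta {
        final int maxDoc;        // total docs in the segment, live or deleted
        final int hardDelCount;  // hard-deleted docs recorded at flush time

        SegmentMeta(int maxDoc, int hardDelCount) {
            this.maxDoc = maxDoc;
            this.hardDelCount = hardDelCount;
        }
    }

    /** O(maxDoc): visits every doc id and counts the ones still live. */
    static int numDocsByScan(BitSet hardLiveDocs, int maxDoc) {
        int numDocs = 0;
        for (int docId = 0; docId < maxDoc; docId++) {
            if (hardLiveDocs.get(docId)) {
                numDocs++;
            }
        }
        return numDocs;
    }

    /** O(1): relies on the delete count already recorded for the segment. */
    static int numDocsByDelCount(SegmentMeta meta) {
        return meta.maxDoc - meta.hardDelCount;
    }

    public static void main(String[] args) {
        int maxDoc = 8;
        BitSet hardLiveDocs = new BitSet(maxDoc);
        hardLiveDocs.set(0, maxDoc); // all docs start live
        hardLiveDocs.clear(2);       // hard-delete doc 2
        hardLiveDocs.clear(5);       // hard-delete doc 5

        SegmentMeta meta = new SegmentMeta(maxDoc, 2);

        // Both approaches agree as long as the recorded count matches the bitset.
        System.out.println(numDocsByScan(hardLiveDocs, maxDoc));
        System.out.println(numDocsByDelCount(meta));
    }
}
```

The constant-time variant is only correct if the recorded count reflects every hard-delete visible in the bitset, which is the guarantee the description attributes to IndexWriter flushing SegmentInfos first.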

@dnhatn dnhatn added >enhancement v7.0.0 :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. v6.6.0 labels Dec 6, 2018
@dnhatn dnhatn requested review from jpountz and s1monw December 6, 2018 18:19
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

Contributor

@s1monw s1monw left a comment

LGTM

@dnhatn
Member Author

dnhatn commented Dec 9, 2018

Thanks @s1monw and @jpountz.

@dnhatn dnhatn merged commit d41cf6a into elastic:master Dec 9, 2018
@dnhatn dnhatn deleted the livedocs branch December 9, 2018 02:24
dnhatn added a commit that referenced this pull request Dec 9, 2018
jasontedor added a commit to liketic/elasticsearch that referenced this pull request Dec 9, 2018
* elastic/6.x: (37 commits)
  [HLRC] Added support for Follow Stats API (elastic#36253)
  Exposed engine must have all ops below gcp during rollback (elastic#36159)
  TEST: Always enable soft-deletes in ShardChangesTests
  Use delCount of SegmentInfos to calculate numDocs (elastic#36323)
  Add soft-deletes upgrade tests (elastic#36286)
  Remove LocalCheckpointTracker#resetCheckpoint (elastic#34667)
  Option to use endpoints starting with _security (elastic#36379)
  [CCR] Restructured QA modules (elastic#36404)
  RestClient: on retry timeout add root exception (elastic#25576)
  [HLRC] Add support for put privileges API (elastic#35679)
  HLRC: Add rollup search (elastic#36334)
  Explicitly recommend to forceMerge before freezing (elastic#36376)
  Rename internal repository actions to be internal (elastic#36377)
  Core: Remove parseDefaulting from DateFormatter (elastic#36386)
  [ML] Prevent stack overflow while copying ML jobs and datafeeds (elastic#36370)
  Docs: Fix Jackson reference (elastic#36366)
  [ILM] Fix issue where index may not yet be in 'hot' phase (elastic#35716)
  Undeprecate /_watcher endpoints (elastic#36269)
  Docs: Fix typo in bool query (elastic#36350)
  HLRC: Add delete template API (elastic#36320)
  ...
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v6.6.0 v7.0.0-beta1