Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate source of an index in LuceneChangesSnapshot #32288

Merged
merged 5 commits into from
Jul 27, 2018

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Jul 23, 2018

Today it's possible to encounter an Index operation in Lucene whose _source is disabled, and _recovery_source was pruned by the MergePolicy. If it's the case, we create a Translog#Index without source and let the caller validate it later. However, this approach is challenging for the caller.

  1. Deletes and No-Ops don't allow invoking "source()" method. The caller has to make sure to call "source()" only on index operations. The current implementation in CCR does not follow this and fail to replica deletes or no-ops.

  2. It's easier to reason if a Translog#Index always has the source.

Today we make _source optional in the LuceneChangesSnapshot then
validate it in ShardChangesAction. This approach, however, prevents
deletes and no-ops from replicating in CCR because accessing _source is
forbidden in Deletes and Noops. Moreover, _source should be required for
an index.

This change checks _source in LuceneChangesSnapshot and fails a read
request if _source is not available.
@dnhatn dnhatn added >bug :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features labels Jul 23, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@dnhatn dnhatn changed the title CCR: _source should be mandatory for an index CCR: Source should be mandatory for an index Jul 23, 2018
@dnhatn dnhatn changed the title CCR: Source should be mandatory for an index Validate source of an index in LuceneChangesSnapshot Jul 23, 2018
Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though I asked a question..

@@ -258,10 +258,19 @@ private TopDocs searchOperations(ScoreDoc after) throws IOException {
assert assertDocSoftDeleted(leaf.reader(), segmentDocID) : "Delete op but soft_deletes field is not set [" + op + "]";
} else {
final BytesReference source = fields.source();
if (source == null) {
if (requiredFullRange) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this tied to requiredFullRange so deep?

PS - I thought we were heading towards always requiring full ranges? this make this problem moot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CCR always requires full-range but recovery does not require full-range.

why is this tied to requiredFullRange so deep?

Good question. I will think about it.

@bleskes
Copy link
Contributor

bleskes commented Jul 24, 2018 via email

@dnhatn
Copy link
Member Author

dnhatn commented Jul 25, 2018

@dnhatn
Copy link
Member Author

dnhatn commented Jul 26, 2018

@elasticmachine test this please

@dnhatn
Copy link
Member Author

dnhatn commented Jul 27, 2018

@elasticmachine retest this please

@dnhatn
Copy link
Member Author

dnhatn commented Jul 27, 2018

Thanks @s1monw and @bleskes for reviewing

@dnhatn dnhatn merged commit 8474f8a into elastic:ccr Jul 27, 2018
@dnhatn dnhatn deleted the ccr-fix-source-issue branch July 27, 2018 12:16
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jul 31, 2018
* elastic/ccr: (57 commits)
  ShardFollowNodeTask should fetch operation once (elastic#32455)
  Do not expose hard-deleted docs in Lucene history (elastic#32333)
  Tests: Fix convert error tests to use fixed value (elastic#32415)
  IndicesClusterStateService should replace an init. replica with an init. primary with the same aId (elastic#32374)
  REST high-level client: parse back _ignored meta field (elastic#32362)
  [CI] Mute DocumentSubsetReaderTests testSearch
  Reject follow request if following setting not enabled on follower (elastic#32448)
  TEST: testDocStats should always use forceMerge (elastic#32450)
  TEST: avoid merge in testSegmentMemoryTrackedInBreaker
  TEST: Avoid deletion in FlushIT
  AwaitsFix IndexShardTests#testDocStats
  Painless: Add method type to method. (elastic#32441)
  Remove reference to non-existent store type (elastic#32418)
  [TEST] Mute failing FlushIT test
  Fix ordering of bootstrap checks in docs (elastic#32417)
  [TEST] Mute failing InternalEngineTests#testSeqNoAndCheckpoints
  Validate source of an index in LuceneChangesSnapshot (elastic#32288)
  [TEST] Mute failing testConvertLongHexError
  bump lucene version after backport
  Upgrade to Lucene-7.5.0-snapshot-608f0277b0 (elastic#32390)
  ...
dnhatn added a commit that referenced this pull request Aug 2, 2018
Today it's possible to encounter an Index operation in Lucene whose
_source is disabled, and _recovery_source was pruned by the MergePolicy.
If it's the case, we create a Translog#Index without source and let the
caller validate it later. However, this approach is challenging for the
caller.

Deletes and No-Ops don't allow invoking "source()" method. The caller
has to make sure to call "source()" only on index operations. The
current implementation in CCR does not follow this and fail to replica
deletes or no-ops. Moreover, it's easier to reason if a Translog#Index
always has the source.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants