
[CCR] Read changes from Lucene instead of translog #30120

Merged: 24 commits into elastic:ccr on May 9, 2018

Conversation

martijnvg (Member) commented Apr 25, 2018:

This change cuts CCR over from reading the translog to reading the Lucene changes history.
The autoGeneratedIdTimestamp will be handled in a follow-up.

@martijnvg martijnvg added review :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features labels Apr 25, 2018
@martijnvg martijnvg requested a review from s1monw April 25, 2018 09:37
elasticmachine (Collaborator):

Pinging @elastic/es-distributed

return versionDvField.longValue();
}

private static boolean isDeleteOperation(LeafReaderContext leafReaderContext, int segmentDocId) throws IOException {
Contributor:

This is not correct, I think; we are blocked here until we can identify tombstones.

dnhatn (Member) commented May 3, 2018:

@martijnvg Discussed with @jasontedor, I will continue your work here.

@dnhatn dnhatn self-assigned this May 7, 2018
@dnhatn dnhatn requested review from bleskes, s1monw and jasontedor May 7, 2018 02:14
dnhatn (Member) commented May 7, 2018:

@martijnvg I've backed out the mapping logic to keep this PR a pure cut-over from translog to Lucene ops. We can do that in a follow-up. Thank you.

@dnhatn dnhatn changed the title [CCR] use IndexSearcher to read operations from Lucene index instead of using the translog [CCR] Read changes from Lucene instead of translog May 7, 2018
@dnhatn dnhatn added the >feature label May 7, 2018
martijnvg (Member, Author):

@dnhatn No problem, I can make that change in a different PR.

bleskes (Contributor) left a comment:
I like this. I left some initial comments and questions

if (lastRefreshedCheckpoint() < maxSeqNo) {
refresh(source, SearcherScope.INTERNAL);
}
refresh(source, SearcherScope.INTERNAL);
Contributor:

This seems like a mistake?

@@ -2388,4 +2404,28 @@ public long softUpdateDocuments(Term term, Iterable<? extends Iterable<? extends
return super.softUpdateDocuments(term, docs, softDeletes);
}
}

/**
* Returned the maximum local checkpoint value has been refreshed internally.
Contributor:

nit: the last local checkpoint

@Override
public void afterRefresh(boolean didRefresh) {
if (didRefresh) {
refreshedCheckpoint.getAndUpdate(prev -> Math.max(prev, pendingCheckpoint));
Contributor:

I think this is unsafe? You may capture things that didn't make it into the reader.

Member:

We only mark seq# as completed after adding its op to Lucene and RefreshListener is notified serially under lock. I think it's safe but we need to discuss to make sure that we won't add something unsafe here.

Contributor:

Are you sure this blocks ongoing indexing? I would definitely double-check with @s1monw that we want to rely on these semantics (if that is the case). IMO we should keep it simple and just pre-capture the local checkpoint.

Member:

@bleskes Yeah, I think I made it too complicated. I replaced getAndUpdate with set.
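The approach agreed on here (pre-capture the local checkpoint before the refresh, then publish it with a plain set once the refresh completes) can be sketched with plain-Java stand-ins. All names below are illustrative, not the actual Elasticsearch classes:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: capture the local checkpoint before the refresh starts, then
// publish it with a plain set() once the refresh has completed. The
// AtomicLong "localCheckpoint" stands in for the engine's checkpoint
// tracker, which indexing advances concurrently.
final class RefreshedCheckpointListener {
    private final AtomicLong localCheckpoint;     // advanced by indexing
    private final AtomicLong refreshedCheckpoint = new AtomicLong(-1);
    private long pendingCheckpoint = -1;

    RefreshedCheckpointListener(AtomicLong localCheckpoint) {
        this.localCheckpoint = localCheckpoint;
    }

    // Called before the new reader is opened: everything up to this
    // checkpoint is already in Lucene, so it is safe to expose it once
    // the refresh finishes.
    void beforeRefresh() {
        pendingCheckpoint = localCheckpoint.get();
    }

    // Refresh listeners are notified serially, so a plain set() suffices;
    // reading localCheckpoint here instead could publish ops that did not
    // make it into the new reader.
    void afterRefresh(boolean didRefresh) {
        if (didRefresh) {
            refreshedCheckpoint.set(pendingCheckpoint);
        }
    }

    long refreshedCheckpoint() {
        return refreshedCheckpoint.get();
    }
}
```

Ops indexed after beforeRefresh are deliberately not published, since they may not be visible in the refreshed reader.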

this.lastSeenSeqNo = fromSeqNo - 1;
this.requiredFullRange = requiredFullRange;
boolean success = false;
final Engine.Searcher engineSearcher = searcherFactory.get();
Contributor:

Why do we need a supplier? Can we just pass the searcher as a param?

Contributor:

Oh, I think I see why: it's for closing. I think it's still better to pass in a searcher and close it on exception, as you do now.

Member:

+1. I passed an engine searcher directly.

final Query rangeQuery = LongPoint.newRangeQuery(SeqNoFieldMapper.NAME, fromSeqNo, toSeqNo);
final Sort sortedBySeqNoThenByTerm = new Sort(
new SortedNumericSortField(SeqNoFieldMapper.NAME, SortField.Type.LONG),
new SortedNumericSortField(SeqNoFieldMapper.PRIMARY_TERM_NAME, SortField.Type.LONG, true)
Contributor:

As discussed - this should be needed in the future. Maybe we should remove it and instead assert that we never have duplicate seq#

Member:

I think I'm missing something here, because I think we need it for now, but not in the future once we have a Lucene rollback. I will reach out to discuss this.

Contributor:

I'm sorry, but I dropped a "not" in my comment: "this should not be needed in the future." It's only relevant in cases where the primary dies while indexing is ongoing and we have more than one replica. In those cases this primary sort doesn't help, because you also need some kind of deduping mechanism to really make it work. Such deduping is fairly easy to implement, but I'm on the fence as to whether we should.

Member:

We have dedup in this PR already (lines 161-163). The lastSeenSeqNo is used for dedup and the range check. I am fine with removing the primary sort and the dedup mechanism.
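The dedup described here can be sketched in isolation. This toy snapshot walks a pre-sorted array of sequence numbers instead of Lucene documents; DedupingSnapshot and its methods are illustrative names, not the actual PR code:

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of lastSeenSeqNo-based dedup: ops arrive sorted by sequence
// number, possibly with duplicates (e.g. the same seq# written under
// different primary terms); an op is skipped when its seq# equals the
// last one emitted.
final class DedupingSnapshot {
    private final long[] sortedSeqNos; // ascending, may contain duplicates
    private int pos = 0;
    private long lastSeenSeqNo = -1;

    DedupingSnapshot(long[] sortedSeqNos) {
        this.sortedSeqNos = sortedSeqNos;
    }

    // Returns the next distinct seq#, or -1 when the snapshot is exhausted.
    long next() {
        while (pos < sortedSeqNos.length) {
            long seqNo = sortedSeqNos[pos++];
            if (seqNo != lastSeenSeqNo) { // dedup against the last emitted op
                lastSeenSeqNo = seqNo;
                return seqNo;
            }
        }
        return -1;
    }

    // Convenience: drain the whole snapshot into a list of distinct seq#s.
    static List<Long> drain(long[] seqNos) {
        DedupingSnapshot snapshot = new DedupingSnapshot(seqNos);
        List<Long> out = new ArrayList<>();
        for (long op = snapshot.next(); op != -1; op = snapshot.next()) {
            out.add(op);
        }
        return out;
    }
}
```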

Contributor:

I see, I missed it. I think it's surprising to put it in readDocAsOp and shortcut. I'd prefer to do it in next, where we do all our state updates, so everything is together. It's rare anyway and doesn't require optimization, IMO. That said, it's all nits; if you prefer it otherwise, I'm good. Thanks for clarifying.

Member:

I agree, we should not mutate anything in readDocAsOp. I will update this.

Member:

@bleskes I moved this to next, but we also need to dedup for nested docs, so I moved this to readDocAsOp again. I think we should optimize for nested docs. I am open to suggestions here.

@Override
public Translog.Operation next() throws IOException {
final Translog.Operation op = nextOp();
if (requiredFullRange && lastSeenSeqNo < toSeqNo) {
Contributor:

Why do we need to check here that lastSeenSeqNo < toSeqNo? Shouldn't we stop reading before this happens?

Contributor:

Do we also want to assert that seqNo != lastSeenSeqNo?

Member:

The caller should continue consuming the snapshot until the next method returns null. On the last call, lastSeenSeqNo equals toSeqNo and op is null. This guard is added to avoid the check in that case. I am +1 on the assertion.
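The consumption contract described here (keep calling next() until it returns null) looks roughly like this on the caller side. Snapshot and SnapshotConsumer are hypothetical stand-ins, not the actual Elasticsearch interfaces:

```java
import java.util.Iterator;
import java.util.List;

// Stand-in for the snapshot contract: next() yields operations until it
// returns null, at which point the full requested range has been seen.
interface Snapshot<T> {
    T next(); // null once the snapshot is exhausted
}

final class SnapshotConsumer {
    // The caller-side loop: keep calling next() until null, collecting
    // each operation; returns the number of ops consumed.
    static <T> int consume(Snapshot<T> snapshot, List<T> sink) {
        int count = 0;
        for (T op = snapshot.next(); op != null; op = snapshot.next()) {
            sink.add(op);
            count++;
        }
        return count;
    }

    // Adapter used for demonstration: exposes an Iterator as a Snapshot.
    static <T> Snapshot<T> fromIterator(Iterator<T> it) {
        return () -> it.hasNext() ? it.next() : null;
    }
}
```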

Contributor:

I'm confused: you check for op == null later on? Maybe just put the op != null check on this outer if?

final Translog.Operation op;
final boolean isTombstone = isTombstoneOperation(leaf, segmentDocID);
if (isTombstone && fields.uid() == null) {
op = new Translog.NoOp(seqNo, primaryTerm, ""); // TODO: store reason in ignored fields?
Contributor:

I tend to say yes? It's very rare and it feels like a good debugging tool. I wonder what other people think?

Contributor:

++

Member:

I will do it in a follow-up.

@dnhatn dnhatn requested a review from bleskes May 9, 2018 03:12
return op;
}

private boolean isTombstoneOperation(LeafReaderContext leaf, int segmentDocID) throws IOException {
Contributor:

maybe this should just take a LeafReader?

Contributor:

I also wonder if we want to pull the tombstoneDV into the ctor, next to List<LeafReaderContext> leaves, with a List<NumericDocValues> for seqIds. I think this would be nice and prevent getting stuff from the reader over and over again.

return false;
}

private long readNumericDV(LeafReaderContext leaf, String field, int segmentDocID) throws IOException {
Contributor:

maybe this should just take a LeafReader?

@@ -610,7 +609,7 @@ protected static void assertVisibleCount(InternalEngine engine, int numDocs, boo
default:
throw new UnsupportedOperationException("unknown version type: " + versionType);
}
if (randomBoolean()) {
if (true || randomBoolean()) {
Contributor:

Looks like a leftover?

Member:

Yeah, I removed it.

final Translog.Operation op;
final boolean isTombstone = isTombstoneOperation(leaf, segmentDocID);
if (isTombstone && fields.uid() == null) {
op = new Translog.NoOp(seqNo, primaryTerm, ""); // TODO: store reason in ignored fields?
Contributor:

++

s1monw (Contributor) left a comment:

Very cool change! I left some ideas but LGTM overall.

this.onClose = engineSearcher;
success = true;
} finally {
if (success == false) {
Contributor:

I think this should be handled on the caller side? We are not responsible for this reference to the engine searcher unless fully constructed?

Member:

Done

}

private boolean isTombstoneOperation(LeafReaderContext leaf, int segmentDocID) throws IOException {
final NumericDocValues tombstoneDV = leaf.reader().getNumericDocValues(SeqNoFieldMapper.TOMBSTONE_NAME);
Contributor:

I wonder if we can pull all these into the constructor, into an array that we can access by the index of the leaf reader. This is how we do things in Lucene for stuff we access frequently.

Member:

@s1monw I tried, but realized that the NumericDocValues#advanceExact method requires increasing docID values, which is not the case here. Do you have any suggestion for this?

  /** Advance the iterator to exactly {@code target} and return whether
   *  {@code target} has a value.
   *  {@code target} must be greater than or equal to the current
   *  {@link #docID() doc ID} and must be a valid doc ID, ie. &ge; 0 and
   *  &lt; {@code maxDoc}.
   *  After this method returns, {@link #docID()} retuns {@code target}. */
  public abstract boolean advanceExact(int target) throws IOException;

Member:

I think I need to reset the DV :)

dnhatn (Member) commented May 9, 2018:

@s1monw I've updated the snapshot to cache/reload docValues. Can you please have a look? Thank you!

s1monw (Contributor) left a comment:

Left a comment but LGTM in general.

private NumericDocValues tombstoneDV;

CombinedDocValues(LeafReader leafReader) {
this.leafReader = leafReader;
Contributor:

Please don't load stuff lazily. Go and load it all in the ctor; they are in memory anyway.

}
}

private static final NumericDocValues EMPTY_DOC_VALUES = new NumericDocValues() {
Contributor:

this is unnecessary.

s1monw (Contributor) left a comment:

Minor nit. No need for another review.

@@ -223,76 +224,57 @@ private boolean assertDocSoftDeleted(LeafReader leafReader, int segmentDocId) th
private NumericDocValues primaryTermDV;
Contributor:

these can all be final?

Member:

@s1monw Sadly no. We sometimes need to reload these DocValues if the target docId is smaller than the current docID:

if (seqNoDV.docID() > segmentDocId) {
   seqNoDV = leafReader.getNumericDocValues(SeqNoFieldMapper.NAME);
}

Do you have any other idea for this?
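The reload trick in the snippet above can be sketched with a plain-Java stand-in. ForwardOnlyValues mimics the forward-only contract of Lucene's NumericDocValues; all names here are illustrative, not real Elasticsearch or Lucene classes:

```java
import java.util.function.Supplier;

// Sketch: doc-values iterators only move forward, so when a lookup
// targets a docId behind the iterator's current position, the iterator
// is thrown away and re-created.
final class CachedDocValues {
    interface ForwardOnlyValues {
        int docID();
        boolean advanceExact(int target); // target must be >= docID()
        long longValue();
    }

    // Plays the role of leafReader.getNumericDocValues(field).
    private final Supplier<ForwardOnlyValues> factory;
    private ForwardOnlyValues dv;

    CachedDocValues(Supplier<ForwardOnlyValues> factory) {
        this.factory = factory;
        this.dv = factory.get();
    }

    long read(int segmentDocId) {
        if (dv.docID() > segmentDocId) {
            dv = factory.get(); // target is behind the iterator: reload it
        }
        if (dv.advanceExact(segmentDocId) == false) {
            throw new IllegalStateException("no value for doc " + segmentDocId);
        }
        return dv.longValue();
    }

    // Toy backing store for demonstration: the value of doc i is values[i].
    static ForwardOnlyValues over(long[] values) {
        return new ForwardOnlyValues() {
            private int doc = -1;
            public int docID() { return doc; }
            public boolean advanceExact(int target) {
                if (target < doc) throw new IllegalStateException("cannot advance backwards");
                doc = target;
                return true;
            }
            public long longValue() { return values[doc]; }
        };
    }
}
```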

dnhatn (Member) commented May 9, 2018:

Thanks @martijnvg for the great initial work and @bleskes and @s1monw for helpful reviews.

@dnhatn dnhatn merged commit bb6586d into elastic:ccr May 9, 2018
dnhatn pushed a commit that referenced this pull request May 10, 2018
This commit adds an API to read a translog snapshot from Lucene,
then cuts CCR over from the existing translog to the new API.

Relates #30086
Relates #29530
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request May 11, 2018
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request May 14, 2018
* es/ccr: (78 commits)
  Upgrade to Lucene-7.4-snapshot-6705632810 (elastic#30519)
  add version compatibility from 6.4.0 after backport, see elastic#30319 (elastic#30390)
  Security: Simplify security index listeners (elastic#30466)
  Add proper longitude validation in geo_polygon_query (elastic#30497)
  Remove Discovery.AckListener.onTimeout() (elastic#30514)
  Build: move generated-resources to build (elastic#30366)
  Reindex: Fold "with all deps" project into reindex (elastic#30154)
  Isolate REST client single host tests (elastic#30504)
  Solve Gradle deprecation warnings around shadowJar (elastic#30483)
  SAML: Process only signed data (elastic#30420)
  Remove BWC repository test (elastic#30500)
  Build: Remove xpack specific run task (elastic#30487)
  AwaitsFix IntegTestZipClientYamlTestSuiteIT#indices.split tests
  Enable soft-deletes in v6.4
  LLClient: Add setJsonEntity (elastic#30447)
  [CCR] Read changes from Lucene instead of translog (elastic#30120)
  Expose CommonStatsFlags directly in IndicesStatsRequest. (elastic#30163)
  Silence IndexUpgradeIT test failures. (elastic#30430)
  Bump Gradle heap to 1792m (elastic#30484)
  [docs] add warning for read-write indices in force merge documentation (elastic#28869)
  ...
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Aug 29, 2018
This PR integrates Lucene soft-deletes (LUCENE-8200) into Elasticsearch.
Highlights of this PR include:

1. Replace hard-deletes by soft-deletes in InternalEngine
2. Use _recovery_source if _source is disabled or modified (elastic#31106)
3. Soft-deletes retention policy based on the global checkpoint (elastic#30335)
4. Read operation history from Lucene instead of translog (elastic#30120)
5. Use Lucene history in peer-recovery (elastic#30522)

This work has been done by the whole team; however, these individuals
(in lexical order) made significant contributions in coding and reviewing:

Co-authored-by: Adrien Grand <[email protected]>
Co-authored-by: Boaz Leskes <[email protected]>
Co-authored-by: Jason Tedor <[email protected]>
Co-authored-by: Martijn van Groningen <[email protected]>
Co-authored-by: Nhat Nguyen <[email protected]>
Co-authored-by: Simon Willnauer <[email protected]>
dnhatn added a commit that referenced this pull request Aug 31, 2018
This PR integrates Lucene soft-deletes (LUCENE-8200) into Elasticsearch.
Highlights of this PR include:

- Replace hard-deletes by soft-deletes in InternalEngine
- Use _recovery_source if _source is disabled or modified (#31106)
- Soft-deletes retention policy based on the global checkpoint (#30335)
- Read operation history from Lucene instead of translog (#30120)
- Use Lucene history in peer-recovery (#30522)

Relates #30086
Closes #29530

---
This work has been done by the whole team; however, these individuals
(in lexical order) made significant contributions in coding and reviewing:

Co-authored-by: Adrien Grand [email protected]
Co-authored-by: Boaz Leskes [email protected]
Co-authored-by: Jason Tedor [email protected]
Co-authored-by: Martijn van Groningen [email protected]
Co-authored-by: Nhat Nguyen [email protected]
Co-authored-by: Simon Willnauer [email protected]
dnhatn added two further commits referencing this pull request on Aug 31, 2018, with the same backport message as above.
weizijun (Contributor) commented Nov 4, 2019:

@martijnvg @dnhatn Excuse me, what is the reason for "Read changes from Lucene instead of translog"?

jasontedor (Member):

@weizijun The read patterns of CCR effectively require random access to the translog, which it doesn’t support. Without random access, reading changes was too slow. We could bolt random access on top of the translog, but then it over-complicates the translog.
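The contrast described here can be illustrated with a toy model, assuming a sorted array stands in for the _seq_no point index; none of this is real Elasticsearch code:

```java
import java.util.Arrays;

// Toy model: a translog generation can only be scanned sequentially from
// the start, while an index sorted by _seq_no lets a follower seek
// straight to the start of the range it needs.
final class SeqNoAccessDemo {
    // Translog-style access: O(n) ops examined to reach the first op
    // with seq# >= fromSeqNo.
    static int opsExaminedByScan(long[] translogOps, long fromSeqNo) {
        int examined = 0;
        for (long op : translogOps) {
            examined++;
            if (op >= fromSeqNo) {
                break;
            }
        }
        return examined;
    }

    // Index-style access: O(log n) seek via binary search, analogous to
    // the LongPoint range query on the _seq_no field used by this PR.
    static int firstIndexAtLeast(long[] sortedSeqNos, long fromSeqNo) {
        int idx = Arrays.binarySearch(sortedSeqNos, fromSeqNo);
        return idx >= 0 ? idx : -idx - 1;
    }
}
```

A follower that repeatedly fetches small seq#-ranges pays the full scan cost each time in the translog model, which is why bolting random access onto the translog was rejected in favor of reading from Lucene.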
