Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CBG-2579: TestReplicatorCheckpointOnStop flake #6474

Merged
merged 3 commits into from
Sep 27, 2023
Merged

CBG-2579: TestReplicatorCheckpointOnStop flake #6474

merged 3 commits into from
Sep 27, 2023

Conversation

gregns1
Copy link
Contributor

@gregns1 gregns1 commented Sep 27, 2023

CBG-2579

What i've noticed from logs on the test failure:

We seem to be exiting the checkpoint now function early. We see this log line in the failure case -> https://github.com/couchbase/sync_gateway/blob/f7b2ac82b93a040b8533bb2f3f69015eb421ad75/db/active_replicator_checkpointer.go#L238C2-L238C2 but not this log line ->

base.InfofCtx(c.ctx, base.KeyReplicate, "checkpointer: calculated seq: %v", seq)

The only thing that can be forcing this to exit early is the checkpointer not finding a safe sequence. Then I notice in the logs we have expectedSeq list at [1] and processedSeq list empty at []. See below:

13:19:33 2023-09-26T05:18:34.875-07:00 [TRC] Replicate+: c:TestReplicatorCheckpointOnStop-push checkpointer: running -- db.(*Checkpointer).CheckpointNow() at active_replicator_checkpointer.go:238
13:19:33 2023-09-26T05:18:34.875-07:00 [TRC] Replicate+: c:TestReplicatorCheckpointOnStop-push checkpointer: _updateCheckpointLists(expectedSeqs: [1], processedSeqs: map[]) -- db.(*Checkpointer)._updateCheckpointLists() at active_replicator_checkpointer.go:263
13:19:33 2023-09-26T05:18:34.875-07:00 [TRC] Replicate+: c:TestReplicatorCheckpointOnStop-push Inside AddProcessedSeq and context has been cancelled -- db.(*Checkpointer).AddProcessedSeq() at active_replicator_checkpointer.go:125
13:19:33 2023-09-26T05:18:34.875-07:00 [TRC] Replicate+: c:TestReplicatorCheckpointOnStop-push Inside update checkpoint lists -- db.(*Checkpointer)._updateCheckpointLists() at active_replicator_checkpointer.go:264
13:19:33 2023-09-26T05:18:34.875-07:00 [DBG] Replicate+: c:TestReplicatorCheckpointOnStop-push stats reporter goroutine stopped
13:19:33 2023-09-26T05:18:34.876-07:00 [TRC] Replicate+: c:TestReplicatorCheckpointOnStop-push closing blip sender -- db.(*activeReplicatorCommon)._disconnect() at active_replicator_common.go:256
13:19:33 2023-09-26T05:18:34.927-07:00 [TRC] Replicate+: c:TestReplicatorCheckpointOnStop-push closing blip sync context -- db.(*activeReplicatorCommon)._disconnect() at active_replicator_common.go:262

Noticed that the log line Inside AddProcessedSeq and context has been cancelled is after the _updateCheckpointLists logging. I think the callback to update processed sequences isn't firing before the context is cancelled on the active replicator leaving the processed sequences list empty. Thus no checkpoint being created.

Added a wait for the processed sequence count to increase to 1 before test continues.

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Integration Tests

@gregns1 gregns1 requested a review from bbrks September 27, 2023 15:24
Copy link
Member

@bbrks bbrks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find!

This closes up the race for this test, and although technically could be encountered in non-test code, it's not a functional issue beyond a slightly outdated final checkpoint.

I'm OK with this considering the potential complexity of synchronizing the replicator and checkpointer to ensure that all "in-flight" sequences are processed before stopping the replicator and doing the final checkpoint.

@bbrks bbrks merged commit 621641f into master Sep 27, 2023
@bbrks bbrks deleted the CBG-2579 branch September 27, 2023 17:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants