Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random failures of 'CheckReplOplog' in *_passthrough tests suites #108

Open
igorsol opened this issue Oct 5, 2017 · 7 comments
Open

Random failures of 'CheckReplOplog' in *_passthrough tests suites #108

igorsol opened this issue Oct 5, 2017 · 7 comments

Comments

@igorsol
Copy link
Contributor

igorsol commented Oct 5, 2017

Hello,

This issue affects all passthrough test suites. Here is the list (probably incomplete):

  • read_concern_majority_passthrough
  • aggregation_read_concern_majority_passthrough
  • replica_sets_jscore_passthrough
  • read_concern_linearizable_passthrough

This issue also affects both 3.4 and master branches. You can see example failures here (see the red lines in the left column on the pages):

Our investigation has shown that this issue is caused by this commit: 422336 by @AdallomRoy

We also have a possible fix here: #107
We test this fix now and so far our tests are good but it would be great to have more opinions and reviews.

@igorsol igorsol changed the title Random failures in of 'CheckReplOplog' in *_passthrough tests suites Random failures of 'CheckReplOplog' in *_passthrough tests suites Oct 5, 2017
@AdallomRoy
Copy link

Hi Igor,
Great catch!
Doesn't this patch basically remove the optimization for backward cursors? or am I missing something?

Roy.

@igorcanadi
Copy link
Contributor

@AdallomRoy Yes, I think this just removes the optimization for the backward cursors. However, master branch of mongo reimplements oplog visibility rules significantly, so this code-path will require big rewrite soon anyway. See #106

@AdallomRoy
Copy link

Got it. Probably the better fix would be that the backward cursor would check if the record is currently visible in terms of oplog visibility... but not sure it's critical.
Can you also merge it to v3.2? Thanks!

@igorcanadi
Copy link
Contributor

Done

@igorsol
Copy link
Contributor Author

igorsol commented Oct 11, 2017

Hi Roy and Igor,

You are right, my fix disabled optimisation for backward cursors.
But since we still want this optimisation for v3.4 and v3.2 I did additional debugging and found what is going wrong with backward cursors:

During the tests sometimes save()/restore() sequence is called on just created backward cursor BEFORE first call to next(). This causes lost of _skipNextAdvance state which should be true in case of oplog first record retrieval hack. Thus when next() is called redundant advance happens and cursor loses one document.

The fix for this is here: igorsol@031488e
It re-enables first record retrieval hack for backward cursors and fixes CheckReplOplog issues.
I will create pull request after additional tests.

@igorcanadi
Copy link
Contributor

Good find, thanks @igorsol !

@igorsol
Copy link
Contributor Author

igorsol commented Oct 17, 2017

Our testing has finally finished. Pull request is here: #111

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants