Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VReplication Last Error: retry error if it happens after timeout #12114

Merged
merged 4 commits into from
Feb 9, 2023

Conversation

rohit-nayak-ps
Copy link
Contributor

@rohit-nayak-ps rohit-nayak-ps commented Jan 17, 2023

Description

This PR fixes a bug where a VReplication workflow doesn't restart if the same error happened far enough in the past as to be beyond the defined --vreplication_max_time_to_retry_on_error.

See related issue for more details.

Signed-off-by: Rohit Nayak [email protected]

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Documentation was added or is not required

@vitess-bot vitess-bot bot added NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says labels Jan 17, 2023
@vitess-bot
Copy link
Contributor

vitess-bot bot commented Jan 17, 2023

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.
  • If a test is added or modified, there should be a documentation on top of the test to explain what the expected behavior is what the test does.

If a new flag is being introduced:

  • Is it really necessary to add this flag?
  • Flag names should be clear and intuitive (as far as possible)
  • Help text should be descriptive.
  • Flag names should use dashes (-) as word separators rather than underscores (_).

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow should be required, the maintainer team should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should include a link to an issue that describes the bug.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from VTop, if used there.

@rohit-nayak-ps rohit-nayak-ps removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says labels Feb 7, 2023
@mattlord mattlord self-requested a review February 8, 2023 12:27
@rohit-nayak-ps rohit-nayak-ps marked this pull request as ready for review February 8, 2023 16:09
Copy link
Contributor

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The meat of this LGTM, I just had some questions about other changes.

Copy link
Contributor

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving since at this point my questions/requests are only code comment related. This way you can address those code comment related requests as you think best and are unblocked at that point.

Copy link
Contributor

@shlomi-noach shlomi-noach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like a couple more unit tests and comments could help clarify the behavior of lastError, but otherwise LGTM

Signed-off-by: Rohit Nayak <[email protected]>
Copy link
Contributor

@shlomi-noach shlomi-noach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you ❤️

@shlomi-noach
Copy link
Contributor

feel free to ignore onlineddl_vrepl for now.

rohit-nayak-ps added a commit that referenced this pull request Mar 27, 2023
…er timeout (#12114) (#12302)

* Retry error if it happens after timeout

Signed-off-by: Rohit Nayak <[email protected]>

* Address review comments

Signed-off-by: Rohit Nayak <[email protected]>

* Address review comments

Signed-off-by: Rohit Nayak <[email protected]>

* Improve unit tests

Signed-off-by: Rohit Nayak <[email protected]>

---------

Signed-off-by: Rohit Nayak <[email protected]>
Co-authored-by: Rohit Nayak <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug Report: Vreplication can stop suddenly after an error instead of retrying
3 participants