Fix error handling for initial delta snapshot #165
Is there any reason we call TakeFullSnapshot followed by TakeFullSnapshotAndResetTimer if the first call returns an error, instead of making a single call to TakeFullSnapshotAndResetTimer?
bbed3e5 to 02e74a7
@swapnilgm @amshuman-kr I have addressed the review comments. PTAL.
Thanks for the changes. LGTM
LGTM. Thanks for addressing the comments. I have added one more small suggestion. Please address it.
Signed-off-by: Shreyas Rao <[email protected]>
02e74a7 to 36bd479
@swapnilgm I've addressed the review comment. PTAL.
LGTM
Signed-off-by: Shreyas Rao [email protected]
What this PR does / why we need it:
This PR fixes the error handling for the initial delta snapshot. Currently, if the initial delta snapshot fails, we retry the etcd probe and attempt the initial delta snapshot again, repeating until it succeeds. Consider the case where the backup sidecar has already taken some snapshots and then stops for a while, during which etcd keeps running and is compacted. When the sidecar starts again, the watch for the initial delta snapshot can fail because the latest revision recorded in the snapstore is no longer available on etcd (it was compacted). This throws the backup sidecar into an infinite failure loop.
This PR fixes this behavior by taking a full snapshot when the initial delta snapshot fails, instead of retrying the delta snapshot. Error handling for the full snapshot is as expected, i.e., it is retried after the subsequent etcd probe.
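The fallback described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: the `snapshotter` interface, `fakeSnapshotter`, `takeInitialSnapshot`, `errCompacted`, and the `TakeDeltaSnapshot` method name are hypothetical, and the zero-argument signature given to `TakeFullSnapshotAndResetTimer` is assumed for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// snapshotter is a hypothetical stand-in for the sidecar's snapshotter.
// Method names and signatures are assumptions made for this sketch.
type snapshotter interface {
	TakeDeltaSnapshot() error
	TakeFullSnapshotAndResetTimer() error
}

// errCompacted is a hypothetical error imitating a watch that fails because
// the revision stored in the snapstore was already compacted on etcd.
var errCompacted = errors.New("required revision has been compacted")

// fakeSnapshotter returns preconfigured errors and counts calls.
type fakeSnapshotter struct {
	deltaErr, fullErr     error
	deltaCalls, fullCalls int
}

func (f *fakeSnapshotter) TakeDeltaSnapshot() error {
	f.deltaCalls++
	return f.deltaErr
}

func (f *fakeSnapshotter) TakeFullSnapshotAndResetTimer() error {
	f.fullCalls++
	return f.fullErr
}

// takeInitialSnapshot attempts the initial delta snapshot exactly once.
// If it fails, it falls back to a full snapshot instead of retrying the
// delta snapshot. An error is returned only when the full snapshot fails,
// in which case the caller is expected to retry after the next etcd probe.
func takeInitialSnapshot(s snapshotter) error {
	if err := s.TakeDeltaSnapshot(); err != nil {
		fmt.Printf("initial delta snapshot failed: %v; taking full snapshot instead\n", err)
		return s.TakeFullSnapshotAndResetTimer()
	}
	return nil
}

func main() {
	// Simulate a sidecar restart after etcd was compacted.
	s := &fakeSnapshotter{deltaErr: errCompacted}
	err := takeInitialSnapshot(s)
	fmt.Println("error:", err, "delta calls:", s.deltaCalls, "full calls:", s.fullCalls)
}
```

With this shape, a compaction error on the delta path no longer loops: the delta snapshot is attempted once, and only a failure of the full snapshot propagates to the caller's probe-and-retry logic.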
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Release note: