Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not stop the restore process after the "point of no return". #2310

Merged
merged 2 commits into from
Nov 29, 2016

Conversation

pivanof
Copy link
Contributor

@pivanof pivanof commented Nov 29, 2016

Once vttablet shuts down MySQL and starts deleting all files, it won't be able
to recover if the context is cancelled (e.g. because the RPC got cancelled).
That will render the tablet completely unusable. To avoid that let's just not
interrupt the restore process once it passed that point of no return.

Once vttablet shuts down MySQL and starts deleting all files, it won't be able
to recover if the context is cancelled (e.g. because the RPC got cancelled).
That will render the tablet completely unusable. To avoid that let's just not
interrupt the restore process once it passed that point of no return.
@@ -563,11 +563,11 @@ def start_vttablet(

return self.proc

def wait_for_vttablet_state(self, expected, timeout=60.0, port=None):
def _wait_for_vttablet_var(self, var_name, expected, timeout=60.0, port=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we already have poll_for_vars in test/utils.py.

After your change, this function sounds to be very similar to the other one? Maybe merge them?
(poll_for_vars is mostly used in worker.py.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turned out that there's also utils.wait_for_tablet_type() that I can use instead of creating a new function. So I just reverted these changes and modified the test to use utils.wait_for_tablet_type().

@alainjobart
Copy link
Contributor

For the record, the longer term fix we had talked about was to change the RPCs to a few RPCs:
XXXStart (starts the action in the background, returns an action UUID).
XXXStatus (streaming RPC that sends progress for a UUID).
XXXCancel (to really interrupt it, whatever the consequences).
and apply it for long-running operations, like Backup and Restore. The added bonus of that approach is to get a progress, and be able to display it in an automation UI (that can re-connect to a running action if restarting a workflow from a checkpoint, and the UUID was check-pointed).

Short of that, this approach looks good to me, after addressing Michael's comments.

@alainjobart
Copy link
Contributor

alainjobart commented Nov 29, 2016

LGTM

Approved with PullApprove

@pivanof pivanof merged commit 075d60b into vitessio:master Nov 29, 2016
@pivanof pivanof deleted the restore_no_stop branch November 29, 2016 20:51
frouioui pushed a commit to planetscale/vitess that referenced this pull request Mar 26, 2024
…socket files are directly used (vitessio#2310)

* cherry pick of 13198

* Fix conflicts

Signed-off-by: Rohit Nayak <[email protected]>

---------

Signed-off-by: Rohit Nayak <[email protected]>
Co-authored-by: Rohit Nayak <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants