[ws-manager] gracefully shuts down workspace, leaving behind bound PVCs, avoiding backup of user data, after unknown event #14266
Comments
As you mentioned, this could be related to several ws-manager restarts in that cluster due to issues with it.
Thanks @sagor999! I updated the Steps to Reproduce in the description with this in mind. I also updated the Anything Else section to hint that the backup timeout is still 1h with PVC, and that we should reconsider that. For example, it could also explain why the PVCs were left dangling. I'm not sure if we log that the 1h timeout was hit. Can you check?
@jenting can you focus on this issue? It is new (created on Friday), and it seems like it will impact our ability to upload user data in a timely manner, potentially causing an odd experience for workspace restarts (when the snapshot has not been done yet).
This looks similar to #13282.
So, we recreate the startWorkspace request, and the 2nd workspace pod fails to start, correct?
@jenting good observation, although I see the same failures logged here too. Does a PVC already exist when we force delete the pod? Would it make sense to also force delete the PVC, and create a new one as part of the 2nd startWorkspace request?
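If we do decide to force delete the claim and recreate it, a minimal client-go sketch of the cleanup half could look like the following. The namespace, claim name, and helper function are assumptions for illustration, not the actual ws-manager flow.

```go
// Minimal sketch, not the actual ws-manager code: delete a PVC left over from
// the failed first attempt before retrying the startWorkspace request.
// Namespace and claim name are placeholders.
package main

import (
	"context"
	"log"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func deleteLeftoverPVC(ctx context.Context, client kubernetes.Interface, ns, name string) error {
	err := client.CoreV1().PersistentVolumeClaims(ns).Delete(ctx, name, metav1.DeleteOptions{})
	if apierrors.IsNotFound(err) {
		log.Printf("pvc %s/%s already gone, nothing to clean up", ns, name)
		return nil
	}
	// Note: the kubernetes.io/pvc-protection finalizer keeps the claim in
	// Terminating while a pod still uses it, so the delete only completes
	// once the workspace pod is actually gone.
	return err
}
```

A fresh claim for the retried startWorkspace request could then be created the same way as for a brand-new workspace.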
@jenting another consideration for this issue is that the startWorkspace request could be for an existing workspace rather than a new one. In that case, what is the user experience like? For example, I assume the 2nd workspace never starts. But if I try restarting the stopped workspace, is my data restored from the PVC snapshot?
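On the restart question: if the stop did produce a VolumeSnapshot, restoring from it usually means creating the new claim with the snapshot as its data source. A hedged sketch, assuming such a snapshot exists; the names, namespace, and size are placeholders, and this is not necessarily how ws-manager wires it up:

```go
// Sketch: create a PVC that restores its contents from an existing
// VolumeSnapshot. All names and the size are placeholders.
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func restoreFromSnapshot(ctx context.Context, client kubernetes.Interface, ns, pvcName, snapshotName string) error {
	apiGroup := "snapshot.storage.k8s.io"
	pvc := &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: pvcName, Namespace: ns},
		Spec: corev1.PersistentVolumeClaimSpec{
			AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			// Restore the workspace content from the snapshot taken at stop time.
			DataSource: &corev1.TypedLocalObjectReference{
				APIGroup: &apiGroup,
				Kind:     "VolumeSnapshot",
				Name:     snapshotName,
			},
			// On newer client-go versions this field is corev1.VolumeResourceRequirements.
			Resources: corev1.ResourceRequirements{
				Requests: corev1.ResourceList{corev1.ResourceStorage: resource.MustParse("30Gi")},
			},
		},
	}
	_, err := client.CoreV1().PersistentVolumeClaims(ns).Create(ctx, pvc, metav1.CreateOptions{})
	return err
}
```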
Closing this issue because it is a duplicate of #13282.
Bug description
In other words, we don't experience data loss, but the pod stops gracefully, and when the user starts the workspace again, they would not have their data, even though we have it in a PV.
I tried deleting us72, but could not because there were two dangling PVCs.

For the first, given the workspace logs and this workspace trace:
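As a side note on diagnosing this state, here is a minimal client-go sketch (an assumed helper, not an existing Gitpod tool) that lists the claims in a namespace with their phase and finalizers, to spot ones left behind after their pods stopped:

```go
// Diagnostic sketch: print every PVC in the given namespace with its phase,
// bound volume, and finalizers. The namespace is a placeholder.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func listDanglingPVCs(ctx context.Context, client kubernetes.Interface, ns string) error {
	pvcs, err := client.CoreV1().PersistentVolumeClaims(ns).List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	for _, pvc := range pvcs.Items {
		fmt.Printf("%s\tphase=%s\tvolume=%s\tfinalizers=%v\n",
			pvc.Name, pvc.Status.Phase, pvc.Spec.VolumeName, pvc.Finalizers)
	}
	return nil
}
```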
Steps to reproduce
This could be because of either of the following. So, either:
- Stop a bunch of workspaces, and while they're stopping (before, during, or after the snapshot), stop ws-manager (see the sketch below), or maybe
- let the backup time out after an hour.
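For the ws-manager step, one way to make it reproducible is to scale the deployment to zero replicas while the workspaces are stopping. A sketch, assuming the deployment is called ws-manager; the namespace is a placeholder for the preview/test environment:

```go
// Reproduction helper sketch: scale the ws-manager deployment to zero replicas
// while workspaces are stopping. Deployment name and namespace are assumptions.
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func stopWSManager(ctx context.Context, client kubernetes.Interface, ns string) error {
	scale, err := client.AppsV1().Deployments(ns).GetScale(ctx, "ws-manager", metav1.GetOptions{})
	if err != nil {
		return err
	}
	scale.Spec.Replicas = 0
	_, err = client.AppsV1().Deployments(ns).UpdateScale(ctx, "ws-manager", scale, metav1.UpdateOptions{})
	return err
}
```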
Workspace affected
gitpodio-templatetypesc-qxnleu3pzu4
Expected behavior
There are a few things:
Questions:
Example repository
No response
Anything else?
We currently stop trying to back up after a 1h timeout. This was a design decision for object-storage-based backups, and it should be revisited as part of the PVC work.
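To illustrate the current behavior, here is a hedged sketch of a 1h deadline around the backup; the function names are illustrative, not ws-manager's actual code. It also shows the kind of explicit log line that would answer whether we record that the timeout was hit.

```go
// Illustrative only, not ws-manager's actual implementation: bound the backup
// by a 1h deadline and log explicitly when that deadline is what stopped it.
package main

import (
	"context"
	"errors"
	"log"
	"time"
)

// doBackup stands in for the real content backup of a stopping workspace.
func doBackup(ctx context.Context) error {
	select {
	case <-time.After(2 * time.Hour): // pretend the backup is slow
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func backupWithTimeout(parent context.Context) error {
	ctx, cancel := context.WithTimeout(parent, 1*time.Hour)
	defer cancel()

	err := doBackup(ctx)
	if errors.Is(err, context.DeadlineExceeded) {
		// An explicit line like this would make it easy to tell from the logs
		// that the 1h timeout, rather than some other failure, aborted the backup.
		log.Printf("backup aborted: 1h timeout reached")
	}
	return err
}
```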