[ws-manager] cannot restart stopped workspace - no backup found
#14451
Comments
@sagor999 I still have this ephemeral cluster and can recreate the issue; is it something you want to peek at with me? 🙏
I think we have an issue for this already, where sometimes(?) errors from the image build are not propagating into the workspace error. But yes, the real reason should have been shown, and it is a bug.
Was there a previous instance of this workspace that was healthy? Or was this the first instance of the workspace that also failed?
Depends. Webapp should GC them. If they are all related to the same workspace, they should be GC'd faster (I don't remember the exact time it takes right now); otherwise the last VS for a workspace will be alive for 28(?) days.
This is probably the root of the problem here. Please schedule this in, and @jenting or I will take a look.
Yes, the original (
Really? Even though your average workspace cluster is ~7 days old, instead of 28? Or do these get created in "current" workspace clusters when WebApp tries to delete PVC snapshots that were created on "older" clusters?
If it's a regular workspace, we could GC the older VolumeSnapshots for this workspace, leaving only the newest one. Note that since our VolumeSnapshotClass delete policy is
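The keep-only-the-newest GC idea above can be sketched as a small pipeline over (name, creationTimestamp) pairs. In a real cluster the input would come from `kubectl` output for the workspace's VolumeSnapshots; the snapshot names and timestamps below are made up for illustration:

```shell
# Mock VolumeSnapshot list for one workspace: name<TAB>creationTimestamp.
# In a cluster this would come from kubectl output instead of printf.
printf 'vs-old-1\t2022-10-01T00:00:00Z\nvs-newest\t2022-11-03T00:00:00Z\nvs-old-2\t2022-10-15T00:00:00Z\n' \
  | sort -k2 -r \
  | tail -n +2 \
  | cut -f1
# Prints the snapshots that would be deleted (everything except the newest):
# vs-old-2
# vs-old-1
```

Sorting descending on the ISO-8601 timestamp puts the newest snapshot first, `tail -n +2` drops it, and what remains is the deletion candidate list.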
@jenting that is a separate enhancement, I think. I created this issue because, after stopping my workspace, I was unable to restart it. The most recent restart attempt reported a missing backup. I'm not sure what caused it to go missing.
That is by design: when we GC snapshots, they are then automatically removed from GCP as well. Otherwise, as you mentioned, we would need to run
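For reference, the delete policy being discussed lives on the cluster's VolumeSnapshotClass. A minimal sketch, assuming the GCP PD CSI driver; the class name is illustrative, not Gitpod's actual configuration:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: workspace-snapshots     # illustrative name
driver: pd.csi.storage.gke.io   # GCP Persistent Disk CSI driver, as an example
deletionPolicy: Delete          # deleting the VolumeSnapshot also deletes the backing snapshot in GCP
```

With `deletionPolicy: Delete`, removing the in-cluster VolumeSnapshot object cascades to the underlying cloud snapshot, which is why GC'ing the Kubernetes objects is enough.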
Removed from scheduled groundwork for now; let's treat this as a Day 2 item, pending how things go with PVC running at 10%.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Bug description
I stopped a workspace by going over the ephemeral storage limit, then tried to restart the workspace, but could not.
The error I see is:
Here are logs for my workspace (there were five instances). Here is a related trace with errors.
It looks like `ws-manager` is treating this as an unknown phase? Here is a screenshot of what I see as a user:
Steps to reproduce
Not sure; I'm unable to recreate it.
The command I used to exceed ephemeral storage is:
I tried in a new workspace, but was unsuccessful.
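The author's original command isn't captured above. As a hypothetical illustration, a command of this shape can exceed an ephemeral-storage limit by filling the workspace's local disk; the path and size here are made up and scaled down so it runs safely:

```shell
# Illustrative only: write a fixed-size file to local (ephemeral) storage.
# A real reproduction would use a size larger than the workspace's
# ephemeral-storage limit, causing the kubelet to evict the pod.
dd if=/dev/zero of=/tmp/fill-ephemeral bs=1M count=10 2>/dev/null
ls -l /tmp/fill-ephemeral
rm -f /tmp/fill-ephemeral
```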
Workspace affected
https://gitpodio-gitpod-no9dms43jkb.ws-ephemeral-101.gitpod.io/
Expected behavior
The real reason I cannot start the workspace is `no backup found`; I think I should see this? Also, I should be able to start my workspace.
Example repository
n/a
Anything else?
In this case, I was using PVC, the first workspace instance was ws-3ce2b1f6-be70-47d4-a778-77b08ed8d2fe. Here are related details for my VS.
Is it normal to leave behind so many VolumeSnapshots in the cluster? Should we create a separate issue to GC them?