Epic: Ensure durability for user workspace files #7901
This is not required for XFS.
This can be optional for the first iteration.
@sagor999 as a heads-up, I added a few observability tasks. One of the first things we'll need (if it doesn't already exist) is the ability to inspect backups and restores now that they are done with tar, so that, for example, we can measure the duration of both.
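As a minimal sketch only, assuming Prometheus is used for metrics (as elsewhere in Gitpod): the metric name, buckets, and the `Timed` helper below are hypothetical, not the actual Gitpod instrumentation.

```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical histogram for tar-based backup/restore durations; the
// metric name and buckets are illustrative.
var tarDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "workspace_tar_duration_seconds",
		Help:    "Duration of tar-based workspace backups and restores.",
		Buckets: prometheus.ExponentialBuckets(1, 2, 10), // 1s .. ~17min
	},
	[]string{"operation"}, // "backup" or "restore"
)

func init() {
	prometheus.MustRegister(tarDuration)
}

// Timed runs a backup or restore function and records how long it took.
func Timed(operation string, fn func() error) error {
	start := time.Now()
	err := fn()
	tarDuration.WithLabelValues(operation).Observe(time.Since(start).Seconds())
	return err
}
```

With one labeled histogram, backup and restore durations can be compared on a single dashboard and alerted on via quantiles.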
@sagor999 @jenting are there any more integration tests that need to be added for the new code we've written? In other words, I see you've fixed existing tests, but I wanted to double-check whether new tests are needed. For example, one test I can think of would kill a pod, rely on a process to back up the orphaned PVC, and then assert that the PVC is gone (because it was snapshotted).
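For illustration only, such a test could look like the sketch below, assuming client-go access to the test cluster; the namespace, pod, and PVC names ("ws-example", "ws-example-pvc") are placeholders, and Gitpod's actual integration-test framework may differ.

```go
package integration

import (
	"context"
	"os"
	"testing"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// TestOrphanedPVCIsBackedUp kills a workspace pod, then asserts the orphaned
// PVC eventually disappears (i.e. it was snapshotted and cleaned up).
func TestOrphanedPVCIsBackedUp(t *testing.T) {
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		t.Fatalf("cannot load kubeconfig: %v", err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		t.Fatalf("cannot create client: %v", err)
	}

	ctx := context.Background()
	const ns, pod, pvc = "default", "ws-example", "ws-example-pvc"

	// Simulate a crash by force-deleting the workspace pod.
	grace := int64(0)
	if err := client.CoreV1().Pods(ns).Delete(ctx, pod, metav1.DeleteOptions{GracePeriodSeconds: &grace}); err != nil {
		t.Fatalf("failed to kill pod: %v", err)
	}

	// Wait for the backup process to snapshot the orphaned PVC and remove it.
	err = wait.PollImmediate(5*time.Second, 10*time.Minute, func() (bool, error) {
		_, getErr := client.CoreV1().PersistentVolumeClaims(ns).Get(ctx, pvc, metav1.GetOptions{})
		if apierrors.IsNotFound(getErr) {
			return true, nil // PVC is gone: it was snapshotted and deleted
		}
		return false, getErr
	})
	if err != nil {
		t.Fatalf("PVC %s was not cleaned up after backup: %v", pvc, err)
	}
}
```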
Question: How would someone who ran out of hours get their data back? (re: #14393)
Question: Will this change prevent us from downloading a single file from the workspace? (Will the "Download..." button in the right-click menu of a file still be available?)
No, this is about downloading the workspace content backup. You can still download individual files from your running workspace, depending on how you connect to it. E.g. with VS Code, just drag and drop.
Maybe this issue should be part of this epic, so that I don't lose my workspace's content on a regular basis: #11183
Update: After internal discussions, given that the backup success ratio is high and stable following adjacent improvements, and that the new design will be considerably faster to implement after #11416, we have decided to pause this effort until then. PS: @6uliver I believe the root cause of that issue is different from the context of this one. I will follow up on it there. 🙏
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Summary
Better protect user data
Context
Sometimes a workspace, node, or workspace cluster fails and the user data cannot be backed up to cloud storage, resulting in data loss. There is a related incident involving a global outage, and a related RFC where we are discussing solutions.
Value
By better protecting user data, we give users confidence that even if the Gitpod service becomes unavailable, they will not lose data once it is back online.
Acceptance criteria
User data is persisted in such a way that even if there is a workspace, node, or cluster failure, the data is accessible to be backed up at a later time.
Tasks
Ops:
automate deployment of GCP storageClasses as part of the cluster creation operation (specify the `discard` mount option) in workspace-clusters; see the sketch below
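As a hedged sketch of the storageClass task above: cluster-creation automation could create a GCP Persistent Disk StorageClass with the `discard` mount option via client-go. The class name and parameters here are assumptions, not the actual workspace-clusters configuration.

```go
package ops

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	storagev1 "k8s.io/api/storage/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createWorkspaceStorageClass provisions a GCP Persistent Disk StorageClass
// with the "discard" mount option. Name and parameters are illustrative.
func createWorkspaceStorageClass(ctx context.Context, client kubernetes.Interface) error {
	reclaim := corev1.PersistentVolumeReclaimDelete
	binding := storagev1.VolumeBindingWaitForFirstConsumer
	sc := &storagev1.StorageClass{
		ObjectMeta:  metav1.ObjectMeta{Name: "gitpod-workspace-ssd"},
		Provisioner: "pd.csi.storage.gke.io", // GCP PD CSI driver
		Parameters:  map[string]string{"type": "pd-ssd"},
		// "discard" enables online TRIM on the mounted filesystem.
		MountOptions:      []string{"discard"},
		ReclaimPolicy:     &reclaim,
		VolumeBindingMode: &binding,
	}
	_, err := client.StorageV1().StorageClasses().Create(ctx, sc, metav1.CreateOptions{})
	return err
}
```

Creating the class programmatically during cluster creation keeps it in lockstep with the rest of the automation; shipping it as a static manifest would work equally well.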
Design:
Product changes:
`/workspace` directory is owned by `nobody` #14003
installer: add a test to ensure that CSI is working as expected prior to installing Gitpod #10614 (Moves to Epic: PVC (Persistent Volume Claims) on Self-Hosted #11476)
Tests:
Manually test PVC CSI snapshot backup/restore in AWS #10211 (Moves to Epic: PVC (Persistent Volume Claims) on Self-Hosted #11476)
Manually test PVC CSI snapshot backup/restore in Azure #10212 (Moves to Epic: PVC (Persistent Volume Claims) on Self-Hosted #11476)
Bug
Should solve:
Day 2:
`no backup found` is hidden #14451
Front conversations