Implement inactive workspace archival #347

Closed
Tracked by #328
bloodearnest opened this issue Mar 7, 2022 · 11 comments

Comments

@bloodearnest
Member

We need a way to compress and archive inactive workspaces.

Proposed initial design:

  • cli command to archive a workspace
  • zips $HIGH_PRIVACY_STORAGE_BASE/workspaces/$WORKSPACE to $HIGH_PRIVACY_STORAGE_BASE/archives/$WORKSPACE.zip (should probably do some integrity checks on the zip before deleting the workspace dir; see the sketch after this list)
  • when job-runner runs a job for a workspace whose directory doesn't exist on disk, rather than creating it, it checks whether a matching archive exists; if it does, it fails the job with a suitable message, otherwise it creates the directory as normal
  • the separate path is to allow mounting onto different archive storage
  • unarchive command to reverse.
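
For concreteness, roughly what the archive step and the job-runner check might look like (a minimal sketch only; the helper names here are placeholders, not the actual implementation):

```python
# Sketch of the proposed archive flow; function names are placeholders, and
# the $HIGH_PRIVACY_STORAGE_BASE layout is the one described in this issue.
import os
import shutil
import zipfile
from pathlib import Path

HIGH_PRIVACY_STORAGE_BASE = Path(os.environ["HIGH_PRIVACY_STORAGE_BASE"])


def archive_workspace(workspace: str) -> Path:
    """Zip workspaces/$WORKSPACE into archives/$WORKSPACE.zip, then delete the dir."""
    src = HIGH_PRIVACY_STORAGE_BASE / "workspaces" / workspace
    dst = HIGH_PRIVACY_STORAGE_BASE / "archives" / f"{workspace}.zip"
    dst.parent.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(dst, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(src))

    # Integrity check before deleting the workspace dir: testzip() re-reads
    # every member's CRC and returns the first bad filename, or None if ok.
    with zipfile.ZipFile(dst) as zf:
        bad = zf.testzip()
    if bad is not None:
        raise RuntimeError(f"{dst} failed integrity check on member {bad}")

    shutil.rmtree(src)
    return dst


def workspace_is_archived(workspace: str) -> bool:
    """What job-runner would check before (re)creating a missing workspace dir."""
    return (HIGH_PRIVACY_STORAGE_BASE / "archives" / f"{workspace}.zip").exists()
```

unarchive would just be the reverse: extract the zip back into workspaces/$WORKSPACE and remove the archive.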
@sebbacon
Contributor

sebbacon commented Mar 7, 2022

It would be really nice to be able to show that a workspace is archived in the UI, but I can see why you left it out of this proposal. I guess the cli tool could conceivably poke an API endpoint in job server to toggle a flag. But we also want to get this out the door ASAP

@bloodearnest
Member Author

So, workspaces already have an archive action/UI on job-server. The eventual plan is that archiving on job-server would trigger this action on every job-runner that the workspace has been run on. But we don't yet have a C&C channel and execution model for that kind of action.

@sebbacon
Contributor

sebbacon commented Mar 7, 2022

Until we do, is there any pragmatic way we can avoid confusion from the unlinked states?

@bloodearnest
Member Author

Maybe to enforce that for users it's a one-way transition from active to archived? I.e. that unarchiving requires an admin?

@sebbacon
Contributor

sebbacon commented Mar 7, 2022

Could be. How would we make decisions about what to zip? Should we consider anything archived fair game, at a schedule that suits us?

@bloodearnest
Member Author

Initial list of workspaces to archive would be those that are currently archived on job-server. We might then go through the larger ones on disk and see if we can archive them on job-server too.

@sebbacon
Contributor

sebbacon commented Mar 7, 2022

Yes, so how do we keep them synchronised?

I think we're saying only archived workspaces (i.e. that have the archive flag set) can be zipped?

That means anyone who wants to zip a workspace must archive it first.

So ideally we'd have a forcing function for that. Perhaps just a reminder in the cli tool ("I have checked the corresponding workspace is archived in the job server y/N")?
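
Something like this, just to illustrate the sort of prompt I mean (not real tooling):

```python
# Illustrative reminder prompt the cli tool could show before zipping.
def confirm_archived_on_job_server(workspace: str) -> bool:
    answer = input(
        f"I have checked that workspace '{workspace}' is archived "
        "on job-server [y/N]: "
    )
    return answer.strip().lower() == "y"
```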

@bloodearnest
Member Author

I don't think we need that level of strictness for this very low frequency change.

From a user's perspective both states result in the same outcome - you can't run jobs.

If it's archived on job-server, you can't even submit a job, and you don't care about the state on the job-runner. As operators, we can occasionally manually archive.

If it's archived on job-runner, you'll get an actionable message ("contact the tech team").

There's nothing that inherently breaks or isn't handled if the states are mismatched. And this manual coupling should only be for a limited time, until we implement that automatic archival.

@bloodearnest
Member Author

Basic version of this implemented in #349

A real version needs a bit of thought, as it may affect the executor API

@bloodearnest
Member Author

This is currently being used to zip stuff, but it's very very slow

@bloodearnest
Member Author

#349 has the current implementation
