Implement inactive workspace archival #347

Closed
Tracked by #328
bloodearnest opened this issue Mar 7, 2022 · 11 comments

Comments

@bloodearnest
Member

We need a way to compress and archive inactive workspaces.

Proposed initial design:

  • cli command to archive a workspace
  • zips $HIGH_PRIVACY_STORAGE_BASE/workspaces/$WORKSPACE to $HIGH_PRIVACY_STORAGE_BASE/archives/$WORKSPACE.zip (should probably do some integrity checks on the zip before deleting the workspace dir; see the sketch after this list)
  • when job-runner runs a job for a workspace whose directory doesn't exist on disk, rather than creating it, it checks whether a matching archive exists; if it does, it fails the job with a suitable message, otherwise it creates the directory as normal
  • the separate path is to allow mounting onto different archive storage
  • unarchive command to reverse.
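
For concreteness, roughly what the archive step and the job-runner check might look like (a minimal sketch only; the helper names here are placeholders, not the actual implementation):

```python
# Sketch of the proposed archive flow; function names are placeholders, and
# the $HIGH_PRIVACY_STORAGE_BASE layout is the one described in this issue.
import os
import shutil
import zipfile
from pathlib import Path

HIGH_PRIVACY_STORAGE_BASE = Path(os.environ["HIGH_PRIVACY_STORAGE_BASE"])


def archive_workspace(workspace: str) -> Path:
    """Zip workspaces/$WORKSPACE into archives/$WORKSPACE.zip, then delete the dir."""
    src = HIGH_PRIVACY_STORAGE_BASE / "workspaces" / workspace
    dst = HIGH_PRIVACY_STORAGE_BASE / "archives" / f"{workspace}.zip"
    dst.parent.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(dst, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(src))

    # Integrity check before deleting the workspace dir: testzip() re-reads
    # every member's CRC and returns the first bad filename, or None if ok.
    with zipfile.ZipFile(dst) as zf:
        bad = zf.testzip()
    if bad is not None:
        raise RuntimeError(f"{dst} failed integrity check on member {bad}")

    shutil.rmtree(src)
    return dst


def workspace_is_archived(workspace: str) -> bool:
    """What job-runner would check before (re)creating a missing workspace dir."""
    return (HIGH_PRIVACY_STORAGE_BASE / "archives" / f"{workspace}.zip").exists()
```

unarchive would just be the reverse: extract the zip back into workspaces/$WORKSPACE and remove the archive.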
@sebbacon
Contributor

sebbacon commented Mar 7, 2022

It would be really nice to be able to show that a workspace is archived in the UI, but I can see why you left it out of this proposal. I guess the cli tool could conceivably poke an API endpoint in job server to toggle a flag. But we also want to get this out the door ASAP

@bloodearnest
Member Author

So, workspaces already have an archive action/UI on job-server. The eventual plan is that archiving on job-server would trigger this action on every job-runner that the workspace has been run on. But we don't yet have a C&C channel and execution model for that kind of action.

@sebbacon
Contributor

sebbacon commented Mar 7, 2022

Until we do, is there any pragmatic way we can avoid confusion from the unlinked states?

@bloodearnest
Member Author

Maybe to enforce that for users it's a one-way transition from active to archived? I.e. that unarchiving requires an admin?

@sebbacon
Contributor

sebbacon commented Mar 7, 2022

Could be. How would we make decisions about what to zip? Should we consider anything archived fair game, at a schedule that suits us?

@bloodearnest
Member Author

Initial list of workspaces to archive would be those that are currently archived on job-server. We might then go through the larger ones on disk and see if we can archive them on job-server too.

@sebbacon
Contributor

sebbacon commented Mar 7, 2022

Yes, so how do we keep them synchronised?

I think we're saying only archived workspaces (i.e. that have the archive flag set) can be zipped?

That means anyone who wants to zip a workspace must archive it first.

So ideally we'd have a forcing function for that. Perhaps just a reminder in the cli tool ("I have checked the corresponding workspace is archived in the job server y/N")?
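
Something like this, just to illustrate the sort of prompt I mean (not real tooling):

```python
# Illustrative reminder prompt the cli tool could show before zipping.
def confirm_archived_on_job_server(workspace: str) -> bool:
    answer = input(
        f"I have checked that workspace '{workspace}' is archived "
        "on job-server [y/N]: "
    )
    return answer.strip().lower() == "y"
```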

@bloodearnest
Member Author

I don't think we need that level of strictness for this very low frequency change.

From a user's perspective both states result in the same outcome - you can't run jobs.

If it's archived on job-server, you can't even submit a job, and you don't care about the state on the job-runner. As operators, we can occasionally manually archive.

If it's archived on job-runner, you'll get an actionable message ("contact the tech team").

There's nothing that inherently breaks or isn't handled if the states are mismatched. And this manual coupling should only be for a limited time, until we implement that automatic archival.

@bloodearnest
Member Author

Basic version of this implemented in #349

A real version needs a bit of thought, as it may affect the executor API

@bloodearnest
Member Author

This is currently being used to zip stuff, but it's very very slow

@bloodearnest
Member Author

#349 has the current implementation
