Asynchronously attach PVs to Workspaces #15384
This would also help in solving issues with e.g. Gluster being too slow for some operations.
Another option, mentioned by @gorkem, is to leverage ephemeral containers, introduced in Kubernetes 1.16. That would allow us to avoid using rsync.
Hello, here are some notes about optimization.
One goal is to start the workspace as fast as possible. For that case, assume we create a new workspace (no previous state):
If there is previous data, the IDE needs to wait for the projects to be restored before displaying the full layout.
Optimization: clean up the 'unpacked' folder and keep only the zip files if the unpacked files have not been used for a long time.
Another optimization:
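The cleanup idea above (keep the zip archives, drop unpacked folders that have not been touched in a while) could be sketched as follows. This is only an illustration: the names (`cleanup_unpacked`, `MAX_IDLE_SECONDS`) and the 30-day threshold are assumptions, not part of Che.

```python
import os
import time

# Assumption: "a long time" means ~30 days of no access.
MAX_IDLE_SECONDS = 30 * 24 * 3600


def is_stale(path, now, max_idle=MAX_IDLE_SECONDS):
    """True if the path was last accessed more than max_idle seconds ago."""
    return now - os.stat(path).st_atime > max_idle


def cleanup_unpacked(unpacked_dir, now=None):
    """Return stale unpacked entries; the zip files themselves stay untouched.

    A real implementation would shutil.rmtree() each returned directory.
    """
    now = time.time() if now is None else now
    stale = []
    for name in os.listdir(unpacked_dir):
        full = os.path.join(unpacked_dir, name)
        if os.path.isdir(full) and is_stale(full, now):
            stale.append(full)
    return stale
```

Re-unpacking from the kept zip on next use is cheap compared to re-downloading, which is what makes this trade-off attractive.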
Just a couple of notes:
About 1: we read the docs as well, but thx :-)
No doubt, but the fact that we're relying on unproven technology might merit a bit of discussion, no? Is this tech ready for our customers?
@tsmaeder we're not considering that for now, as the solution should work on all OpenShift/Kubernetes instances.
Do we really need per-workspace PV attachment? Couldn't we have a single Data Sync service/deployment used by all the workspaces instead?
@gazarenkov it's per user namespace (all workspaces of a user) first.
@benoitf Ok, thanks, then do we really need it per-namespace? :)
@gazarenkov at first because, for example, on che.openshift.io you won't be able to mount a "super big" PV to store all workspaces' data (and then how do you manage per-user quotas as we do today?), plus cross-cluster concerns, etc.
@benoitf
@gazarenkov that's trickier, because the service that does the sync needs to deal with files of different users. That can be implemented as a second iteration, though. But let's keep this first iteration simple and implement a PV per user.
@l0rd
So, I'd definitely suggest considering a single data service as an option before we go to implementation.
Just some thoughts about this issue, the ongoing work on the Workspace CRD, and cloud shell. One important point mentioned in epic #15425 is the big scalability gain that would be brought.
In the light of this, I would prefer starting this work with the option that is, as much as possible, compatible with both use-cases:
So it seems to argue for a per-user-namespace solution first.
@davidfestal Could you please elaborate on your vision of a layer which persists project code between user sessions (i.e. temporary), in light of workspace management decentralization.
Physical storage for workspace data is already per-user (if not per-workspace), through namespaced PVs; it is not centralized and common to all the users. I don't see what should change here with the Workspace CRD architecture: workspace data physical storage is already decentralized, and I don't see why it would be required to change the existing way and store workspace data in a PV common to all users.

But even without going into all the technical details, my point was that requiring an additional centralized service, in an architecture that should ultimately be compatible with workspace management decentralization, seems strange to me. Afaict, the initial proposal from @benoitf, with per-user-namespace storage, would fit the existing and future structure of the Workspace CRD POC. But sure, a centralized workspace storage service could, at some point, be an optimization option for some use cases.
Wouldn't a single big PV require the ReadWriteMany access mode?
@gorkem I would guess RWO will work fine for a single Data Store pod; if a second (or more) pod spins up, it depends on whether the scheduler puts it on the same node (should work) or a different one (will not).
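For reference, the access mode is declared on the PersistentVolumeClaim. A sketch of what the shared-storage claim might look like (names and sizes here are placeholders, not from the issue):

```yaml
# Illustrative PVC only; whether ReadWriteMany is available depends on the
# storage backend (e.g. NFS/Gluster support it, many block providers do not).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-data
spec:
  accessModes:
    - ReadWriteOnce   # single node; enough while all Data Sync pods share a node
  # - ReadWriteMany   # required if sync pods can land on different nodes
  resources:
    requests:
      storage: 10Gi
```

This is exactly the constraint @gazarenkov describes: with RWO, scaling the Data Sync deployment beyond one node breaks unless the backend offers RWX.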
@gazarenkov why do you think one central service is simpler? In a centralized service we have to build a secured-to-the-bone mechanism that matches users with folders. And we need to consider scalability as well. A problem with that service and users won't be able to access their data, or, even worse, will have access to the data of other users. I don't want to deal with those problems right now. As for the reuse of existing code, that's an implementation detail; I would let the team that works on the code decide.
@l0rd I do not think the user should have direct access to this data (which is a hot backup of projects), only via the Data Sync service, which presumably can scale its pods the same way as a usual K8s Deployment. I think it may even work without this service, exactly as it does with ephemeral storage now, i.e. the user has access to the instance storage only, and syncing this data is an exclusively internal mechanism. An additional bonus of this approach may be zero PV attaching/mounting time (like ephemeral again). So, to me, it looks like an option to consider before coding, no?
About the Central Service
Some other considerations
@ibuziuk was there something left here, or can we close the epic?
Is your enhancement related to a problem?
No matter how fast we get to bootstrap a Che workspace, no matter how many external resources we are able to pre-pull (images, extensions, examples source code), we will always need to wait 20+s for a PV to be attached and mounted on the workspace pod.
Describe the solution you'd like
New Workspace lifecycle:
Workspace components in Read-only mode
In the "Startup data sync phase" the user will already be able to use the editor and plugins but those should behave in a read-only mode until all the data has been synced to the ephemeral volume. That means that Che editors (for example theia) should be able to work on read only mode (initially this can be done by showing a progress bar that shows the data sync and not allowing the user to access theia).
rsync protocol
Rsync is mentioned as the remote file synchronisation protocol, but that's just an example. If there is a better alternative, let's use it.
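As a concrete illustration of the rsync option, here is a minimal sketch that only builds the command line for the initial sync from persistent storage into the workspace's ephemeral volume. The paths and the idea of running this over a pod exec or sidecar are assumptions, not part of the proposal:

```python
def rsync_command(source, dest, delete=False):
    """Build an rsync invocation for syncing a directory tree.

    -a (archive) preserves permissions, timestamps, and symlinks;
    --compress helps when the transport crosses the network.
    """
    cmd = ["rsync", "-a", "--compress"]
    if delete:
        # Mirror mode: remove files at dest that no longer exist at source,
        # useful for the reverse (workspace -> PV) periodic sync.
        cmd.append("--delete")
    # Trailing slash on the source means "sync the contents", not the dir itself.
    cmd += [source.rstrip("/") + "/", dest]
    return cmd
```

In practice the command would be executed inside the cluster (e.g. via a sync sidecar) once the PV finishes attaching, which is what makes the attach asynchronous from the user's point of view.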
Ideas to improve performance (even more)
Florent's edit:
Tasks