Async PV for workspace: PoC with data sync pod #16042

Closed
ibuziuk opened this issue Feb 14, 2020 · 18 comments
Labels: kind/task, severity/P1

@ibuziuk
Member

ibuziuk commented Feb 14, 2020

Is your task related to a problem? Please describe.

It was decided to create a PoC for Hosted Che before starting work on the epic #15384.

image

Describe the solution you'd like

rsync client should be a plugin enabled at the devfile level
rsync server pod should be running inside the {$username} namespace

Describe alternatives you've considered

N/A

Additional context

TODO

@ibuziuk ibuziuk added the kind/task Internal things, technical debt, and to-do tasks to be performed. label Feb 14, 2020
@ibuziuk ibuziuk added this to the Backlog - Hosted Che milestone Feb 14, 2020
@ibuziuk ibuziuk added severity/P1 Has a major impact to usage or development of the system. team/hosted-che labels Feb 14, 2020
@gazarenkov
Contributor

@ibuziuk could you please elaborate a bit on:

  • What is the goal / success criteria? (workability of this sync scheme in general, some performance aspects, etc.)
  • On what environment? (probably the most interesting factor is the Persistent Volume type: Gluster and/or something else)
  • What are the sizes, types, and workloads of the projects to be tested?
  • Are numbers/comparisons expected as a result of the experiments? (it would be a really interesting exercise)

@ibuziuk ibuziuk modified the milestones: Backlog - Hosted Che, 7.10.0 Feb 19, 2020
@vparfonov
Contributor

@ibuziuk could you please elaborate a bit on:

  • What is the goal / success criteria? (workability of this sync scheme in general, some performance aspects, etc.)
  • On what environment? (probably the most interesting factor is the Persistent Volume type: Gluster and/or something else)
  • What are the sizes, types, and workloads of the projects to be tested?
  • Are numbers/comparisons expected as a result of the experiments? (it would be a really interesting exercise)

Great questions, I have the same ones, but I think they should be addressed to @benoitf.
I wanted to collect metrics before such an improvement and started to clone and build the Eclipse Che project,
but I can't do it on Hosted Che; it looks like there is not enough disk space to build the user dashboard.
@benoitf you probably already have some metrics. If so, can you share the devfiles and steps?

The second thing I want to ask is what we actually want to improve. From the initial description, it looks like we are talking only about project source sync, so in this case we will speed up only the second start of a workspace but not the build process. Is that correct?

@benoitf
Contributor

benoitf commented Feb 20, 2020

What is the goal / success criteria? (workability of this sync scheme in general, some performance aspects, etc.)

For the PoC, I would say it is a success if it works. Performance on recovery may be poor at first, but there are ideas to improve it: using zip (or another approach like git, where you transfer a binary but still unpack files at the end), parallel mode, etc. It's still better than pure ephemeral, and it can be opted into per workspace, so it will not directly impact all workspaces.

On what environment? (probably the most interesting factor is the Persistent Volume type: Gluster and/or something else)

Yes, che.openshift.io is a good target (or any Gluster cluster), as well as other clusters where a PV takes a long time to start.

What are the sizes, types, and workloads of the projects to be tested?

The major failure for now with Gluster is Node.js projects (it doesn't work at all).
Let's say you can take any Node.js project to test, as it uses a lot of small files, which heavily impact all the rsync work.
Of course you can take Java and Python projects, but they're less impacted right now.

Are numbers/comparisons expected as a result of the experiments? (it would be a really interesting exercise)

About metrics, for a fresh workspace:

  • it should be the same as an ephemeral workspace (as that's what will be used)
  • saving to the PV is done asynchronously while the workspace is alive, so it should not impact it.

For an existing workspace:

  • it was too slow when doing 'rsync' from folders, especially for Node.js projects where there are tons of small files.
  • transferring a zip (or tgz) file and unpacking it took less than 5 seconds for a 300MB /projects archive (this is why I said that zip is better for recovering a workspace)
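
The archive-based recovery described above can be sketched as follows. This is a minimal illustration and not the PoC's actual code: the function names `pack_dir` and `restore_dir` are hypothetical, and the sketch assumes the archive is written outside the directory being packed.

```shell
# Pack a directory tree into one compressed archive, so that a single
# large file is transferred instead of many small ones.
# Assumption: the archive path lies outside the source directory.
pack_dir() {
    src_dir=$1
    archive=$2
    # -C switches into the source first, so the archive holds relative paths.
    tar -czf "$archive" -C "$src_dir" .
}

# Unpack an archive produced by pack_dir into a (possibly new) directory.
restore_dir() {
    archive=$1
    dest_dir=$2
    mkdir -p "$dest_dir"
    tar -xzf "$archive" -C "$dest_dir"
}
```

A call such as `pack_dir /projects /tmp/projects.tgz` followed later by `restore_dir /tmp/projects.tgz /projects` would mirror the flow; the `/tmp/projects.tgz` path is only an example.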

So it's still interesting to have numbers like:

  • for projects with a {small, medium, large} number of files in /projects, and the overall size: how many seconds it takes to recover (have the files in /projects)
  • how many seconds to wait after stopping a workspace before being able to start it again (waiting for the last rsync)
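
One way to collect numbers of this kind is sketched below. The helper names (`count_files`, `size_kb`, `elapsed_seconds`) are made up for this illustration, and timing is only accurate to whole seconds.

```shell
# Print the number of regular files under a directory.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Print the total size of a directory in kilobytes.
size_kb() {
    du -sk "$1" | cut -f1
}

# Run a command and print how many whole seconds it took.
# The command's own output and exit status are discarded.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $((end - start))
}
```

For example, `count_files /projects` and `size_kb /projects` describe the project, and `elapsed_seconds some_restore_command` would time a recovery step, where `some_restore_command` stands in for whatever actually performs the restore.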

@vparfonov
Contributor

vparfonov commented Feb 20, 2020

  • transferring a zip (or tgz) file and unpacking it took less than 5 seconds for a 300MB /projects archive (this is why I said that zip is better for recovering a workspace)

Let me try to rephrase my question. Are we talking only about syncing the /projects folder? What about the Maven repository or npm deps?

@benoitf
Contributor

benoitf commented Feb 20, 2020

Yes, it's about the /projects folder, but npm dependencies are stored in the project's node_modules folder (so they are part of the /projects data).
So let's say you clone and build theia or che-theia: your /projects folder can easily reach this kind of size.

@skabashnyuk
Contributor

@benoitf what are we going to do with maven/gradle based projects?

@benoitf
Contributor

benoitf commented Feb 20, 2020

@skabashnyuk I mean it's like today: it already stores /projects, and you can mount an extra volume for .m2 dependencies.
Are you talking about produced artifacts (the target folder, for example) in /projects?

@skabashnyuk
Contributor

I'm talking about .m2 and @vparfonov's question: "Are we talking only about syncing the /projects folder? What about the Maven repository or npm deps?"

So are we talking about all volumes or only about the one with /projects?

@benoitf
Contributor

benoitf commented Feb 20, 2020

@skabashnyuk ok got it 👍

I think the PoC should first check one volume (/projects), while keeping multiple volumes in mind, and provide support for multiple volumes later.

Even if you lose .m2, your own data is almost safe; it's an optimization (step 2, to avoid downloading the artifacts again), while losing your changes in /projects is more dramatic.

@benoitf
Contributor

benoitf commented Feb 20, 2020

... started to clone and build the Eclipse Che project,
but I can't do it on Hosted Che; it looks like there is not enough disk space to build the user dashboard.

@vparfonov I think you can go without the dashboard (pure Maven project) or with only the dashboard (npm project), as the dashboard will live in its own repository soon.

@vparfonov
Contributor

So the DoD for the PoC will be: the same speed as we have now in Ephemeral mode, but with source changes restored.

@ibuziuk ibuziuk modified the milestones: 7.10.0, 7.11.0 Mar 11, 2020
@vparfonov
Contributor

vparfonov commented Mar 11, 2020

The current state is demonstrated here: https://youtu.be/8N1uxU-iYlY.
During the next sprint:

  • will continue investigating how to back up sources on the workspace stop action; as a possible solution, will try to use container hooks.
  • polish the Dockerfiles for the rsync containers

After that we can close the PoC issue.

@gazarenkov
Contributor

Isn't it also a goal of this PoC to make sure it works with persistent storage like Gluster, or whatever we want to use in prod?
(see the discussion about the goals above)

@vparfonov
Contributor

vparfonov commented Mar 18, 2020

UPD:
Created 2 drafts to support k8s container lifecycle events:

Sources are successfully backed up on workspace stop and restored on start, using container lifecycle events.
As a next step, will try some big projects and permanent storage.
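
For readers unfamiliar with container lifecycle events, the sketch below prints the kind of `lifecycle` fragment a Kubernetes container spec can carry to run a command before the container stops. It is an illustration only: the helper name `print_prestop_hook` and any script path passed to it are hypothetical, and the thread does not show how the drafts actually wire the hooks.

```shell
# Print a container "lifecycle" fragment whose preStop hook runs the
# given command. The command is supplied by the caller; nothing here is
# taken from the PoC's real manifests.
# Assumption: the command contains no double quotes, since it is
# embedded in a quoted YAML string as-is.
print_prestop_hook() {
    backup_cmd=$1
    cat <<EOF
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "$backup_cmd"]
EOF
}
```

Calling `print_prestop_hook "/scripts/backup.sh"` (an example path) emits a fragment that could be placed under a container entry. Note that Kubernetes only waits for a preStop hook for the pod's termination grace period (`terminationGracePeriodSeconds`, 30 seconds by default), so a long backup would likely need that value raised.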

@ibuziuk ibuziuk changed the title Async PV for workspace: PoC for Hosted Che with data sync pod in the {$username} namespace Async PV for workspace: PoC for Hosted Che with data sync pod Mar 18, 2020
@vparfonov
Contributor

Time for backup/restore with real projects on the container file system

Prerequisites

  • Lenovo ThinkPad P50
  • minishift v1.34.2+83ebaab
  • minishift start --cpus 4 --memory 16384 --vm-driver=virtualbox

Cases:

  1. Start an empty workspace.
  • clone Eclipse Che
  • checkout 7.10.0
  • build (the built artifacts are kept so that they are restored too)

find /projects -type f | wc -l: 49333
du -h: 1.2G

Repeat workspace start/stop 10 times, track the time of the backup/restore procedure, and calculate the average:

real 0m 21.66s
user 0m 6.27s
sys 0m 2.49s

  2. Start an empty workspace.

2.1 Clone 4 projects:

  - https://github.com/angular/angular.js.git
  - https://github.com/facebook/react.git
  - https://github.com/facebook/react-native.git
  - https://github.com/vuejs/vue.git

2.2 Execute yarn in each project
2.3

find /projects -type f | wc -l: 166012
du -h: 3.1G

Repeat workspace start/stop 10 times, track the time of the backup/restore procedure, and calculate the average:

real 1m 37.91s
user 0m 15.43s
sys 0m 25.59s
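
The averaging step over repeated runs can be sketched like this. The helper name `average` is made up for the illustration; it takes per-run durations in seconds as arguments.

```shell
# Print the arithmetic mean of the numbers given as arguments,
# rounded to two decimal places. Fails if no arguments are given.
average() {
    [ $# -gt 0 ] || return 1
    printf '%s\n' "$@" |
        LC_ALL=C awk '{ sum += $1 } END { printf "%.2f\n", sum / NR }'
}
```

With ten recorded durations, the call would be `average t1 t2 ... t10`, where each `t` is one run's wall-clock time in seconds; the individual per-run values behind the averages above are not given in the thread.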

@ibuziuk ibuziuk changed the title Async PV for workspace: PoC for Hosted Che with data sync pod Async PV for workspace: PoC with data sync pod Mar 25, 2020
@vparfonov
Contributor

vparfonov commented Mar 27, 2020

Repeated the same for the workspace with 4 JavaScript projects on EBS storage (gp2):

find /projects -type f | wc -l: 166012
du -h: 3.1G

        restore       backup
real    2m 19.52s     2m 42.91s
user    0m 13.44s     0m 12.89s
sys     0m 17.67s     0m 1.72s

@vparfonov
Contributor

vparfonov commented Mar 30, 2020

Repeated the same for the workspace with Eclipse Che (after the build) on EBS storage (gp2):

find /projects -type f | wc -l: 49333
du -h: 1.2G

        restore       backup
real    0m 42.04s     0m 47.64s
user    0m 5.67s      0m 5.51s
sys     0m 7.39s      0m 1.19s
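
To compare runs of different sizes, the timings can be turned into a rough throughput figure. The sketch below is only a convenience for that arithmetic; the helper names `to_seconds` and `throughput_mib_s` are hypothetical.

```shell
# Convert minutes and seconds (as printed by "time") to seconds.
to_seconds() {
    LC_ALL=C awk -v m="$1" -v s="$2" 'BEGIN { printf "%.2f\n", m * 60 + s }'
}

# Print throughput in MiB/s for a size in MiB and a duration in seconds.
# Prints nothing if the duration is not positive.
throughput_mib_s() {
    LC_ALL=C awk -v size="$1" -v secs="$2" \
        'BEGIN { if (secs > 0) printf "%.2f\n", size / secs }'
}
```

For instance, treating the 1.2G reported by `du -h` as roughly 1229 MiB, `throughput_mib_s 1229 42.04` gives about 29 MiB/s for the restore above. Since `du -h` rounds its output, this is an approximation rather than a measured result.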

@vparfonov
Contributor

Demo video here: https://youtu.be/MrUnsB_D0Rw
