This repository has been archived by the owner on Oct 21, 2024. It is now read-only.

Performance improvement by "caching" folders in containers #197

Open
mickaelperrin opened this issue Jun 30, 2016 · 12 comments

Comments

@mickaelperrin

Hi,

After looking at the performance report in #166 (comment), I ran a simple read-performance test:

grep -r chainthatdoesnotexist /path/to/core/of/large/project/with/multiple/files

Result:

  • On my Mac: 0.59 s
  • In a container: 1.46 s
  • In a container with the folder mounted: 25.1 s
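
For anyone who wants to repeat the measurement in the three environments without relying on grep being installed, here is a minimal sketch that does roughly the same work: it walks a tree, reads every file, and reports the elapsed time. The function name `scan_tree` is made up for illustration, and the numbers it produces are not directly comparable to the grep timings above.

```python
import os
import time
from typing import Tuple


def scan_tree(root: str, needle: bytes) -> Tuple[int, float]:
    """Read every file under root looking for needle.

    Returns (number of files read, elapsed seconds).
    """
    start = time.perf_counter()
    files_read = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    # Read the whole file, as a recursive grep has to.
                    _ = needle in handle.read()
            except OSError:
                continue  # Skip unreadable entries such as broken symlinks.
            files_read += 1
    return files_read, time.perf_counter() - start
```

Running `scan_tree("/path/to/project", b"chainthatdoesnotexist")` on the host, in a container, and in a container with the folder mounted should show the same kind of gap if the mount is the bottleneck.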

This makes me think I could gain a lot of performance with a small script that:

  • watches for changes in a mounted folder and syncs them in real time to a non-mounted folder inside the container,
  • watches for changes in a non-mounted folder of the container and syncs them in real time to a host-mounted folder.

The main issue I currently see is that fsevents_to_vm doesn't support deletion events, so those would have to be handled manually.

Before I start working on this, I would like your opinion. Do you have any advice?

Thanks

@mickaelperrin
Author

Did some testing using lsyncd, so far it's working well...

@codekitchen
Owner

This has come up quite a few times in the past, and the blocker has always been finding a good tool that does two-way syncing. I haven't heard of lsyncd but from a quick glance at the project page, it looks like it's primarily designed for one-way syncing as well. If you could find something that works well, though, that'd be great.

@mickaelperrin
Author

mickaelperrin commented Jul 1, 2016

Here is what I am currently investigating: https://hub.docker.com/r/mickaelperrin/lsyncd. It works well so far. I currently use it to share a large PHP codebase, and the website is far more responsive.

However, I think I will surely hit a problem with the fact that fsevents_to_vm doesn't forward delete events. I will have to handle this by hand.

@mickaelperrin
Author

In fact, rather than dealing with forwarding delete events into the VM, it should be simpler to implement a real-time rsync, like docker-osx-dev does, and use lsyncd in the container to get changes back to the host.

I am even wondering whether, implemented that way, we couldn't get two-way syncing.

Say that I want to share a volume like this: -v ./src:/src.

1/ Dinghy creates a temporary folder in the VM, say /src.tmp, which is shared over NFS.
2/ The dinghy daemon running on the host responds to host events to provide a real-time rsync between ./src on the host and /src in the container.
3/ The lsyncd daemon running in the dedicated container responds to container events to provide a real-time rsync between /src and /src.tmp in the container, and NFS gets the changes back to the host.
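
To make the three legs of this proposal easier to follow, here is a small sketch that derives them from a volume spec. Nothing here exists in dinghy: `SyncLeg` and `plan_two_way_sync` are hypothetical names, and the `.tmp` suffix is simply taken from the example above.

```python
from typing import List, NamedTuple


class SyncLeg(NamedTuple):
    source: str
    target: str
    transport: str


def plan_two_way_sync(volume: str) -> List[SyncLeg]:
    """Derive the three sync legs for a '-v host:container' volume spec."""
    host_path, container_path = volume.split(":", 1)
    # Assumption: the NFS-backed staging folder sits next to the container path.
    staging = container_path.rstrip("/") + ".tmp"
    return [
        SyncLeg(host_path, container_path, "rsync driven by host events"),
        SyncLeg(container_path, staging, "lsyncd driven by container events"),
        SyncLeg(staging, host_path, "NFS mount"),
    ]
```

For `./src:/src` this yields host to `/src` via rsync, `/src` to `/src.tmp` via lsyncd, and `/src.tmp` back to `./src` through NFS.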

@mickaelperrin
Author

mickaelperrin commented Jul 2, 2016

Just tried this by hacking docker-osx-dev. In the end, not that good...

I hadn't noticed before that docker-osx-dev 'simply' launches a full rsync with --delete on the whole volume after each event. This causes unwanted file deletions in the container when files are created there...

I thought events were handled more precisely, adding or removing files depending on the event. I will investigate whether I can handle file deletion another way, by directly running rm commands in the mounted folder.

If you have better ideas...
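
One way to picture the per-event handling described above, as opposed to a full mirror with --delete, is the sketch below. It is not how docker-osx-dev or any fork of it is implemented: `apply_event` and the event kinds `"created"`, `"modified"`, and `"deleted"` are assumptions made for illustration, and local copies stand in for the rsync and rm calls.

```python
import os
import shutil


def apply_event(kind: str, rel_path: str, src_root: str, dst_root: str) -> None:
    """Apply a single file event to the destination tree only.

    Unlike a full mirror with deletion, files that exist only on the
    destination side are never touched.
    """
    src = os.path.join(src_root, rel_path)
    dst = os.path.join(dst_root, rel_path)
    if kind in ("created", "modified"):
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)
    elif kind == "deleted":
        if os.path.isdir(dst) and not os.path.islink(dst):
            shutil.rmtree(dst)
        elif os.path.lexists(dst):
            os.remove(dst)
    else:
        raise ValueError("unknown event kind: " + kind)
```

The trade-off is that correctness now depends on every event being delivered, which is exactly where the missing delete events become a problem.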

@mickaelperrin
Author

So, I forked docker-osx-dev to improve its handling of file deletion, and packaged it in a demo project to show how we could try to do bidirectional sync between a host folder and a non-mounted folder in a container.

I will use it this week in my development workflow and will update this thread with feedback, mainly on whether it is reliable.

@codekitchen
Owner

Cool, sounds interesting. I look forward to hearing how it works in practice.

@mickaelperrin
Author

So, after nearly one week of usage, here is my first report:

The good

  • The performance difference between a mounted and a non-mounted folder is incredible. I finally get nearly the same performance I was used to on Linux before switching to Mac. Really, really happy with that. Websites now respond instantly.
  • For basic day-to-day development, it works well: create / edit / delete files on the host > the sync to the container is nearly instantaneous > launch the script / website > some files are created in the container and synced back to the host, also nearly instantaneously.

So is it magic? Definitely not, there are some drawbacks to be aware of...

The bad

  • Configuration is not transparent. dinghy and docker-osx-dev are totally transparent solutions: run dinghy create && dinghy start and forget about it, then use docker the way you are used to. At the moment, my setup needs: dedicated volumes in the containers, a special comment to configure the rsync from the host to the container, a special container to configure the rsync from the container to the host, and finally a script running on the host to perform the host-to-container rsync. Not trivial at all for someone new to docker.
  • docker-compose only. So far, it has only been developed with docker-compose files in mind.
  • I was afraid of an endless sync loop: container > host > container > host... This doesn't happen, but after each sync back from the container to the host, a sync is performed again to the container.
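
The redundant sync mentioned in the last point is an echo: the watcher sees a change that the sync itself just wrote. One possible way to filter such echoes, not something the setup described here does, is to remember a digest of what was last written and ignore events whose content matches. `EchoFilter` is a hypothetical helper:

```python
import hashlib
from typing import Dict


class EchoFilter:
    """Remember what the sync wrote so its own events can be ignored."""

    def __init__(self) -> None:
        self._last_written: Dict[str, str] = {}

    @staticmethod
    def _digest(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def record_write(self, rel_path: str, data: bytes) -> None:
        # Call this right after the sync writes rel_path on this side.
        self._last_written[rel_path] = self._digest(data)

    def is_echo(self, rel_path: str, data: bytes) -> bool:
        # True when the event carries exactly the content the sync wrote.
        return self._last_written.get(rel_path) == self._digest(data)
```

Hashing every changed file has a cost of its own, so this is a sketch of the idea rather than a recommendation for a 25k-file tree.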

The ugly

The first test was terrible. I needed to set up a large project with composer, a dependency management tool for PHP. When I say large, it's around 25k files. With that setup I had a lot of trouble: very high CPU usage and files not getting synced. Here is why, and how I dealt with it:

  • composer extracts zip files into a temporary folder inside a folder watched by the lsyncd daemon, and then renames that folder to the package name. This happens too quickly, and sync events are lost because the original path has changed before the whole folder has finished syncing. To resolve this, I needed to patch composer to use a temporary unwatched folder for extracting the zip archives. With this patch, the 25k files are correctly synced back to the host. So, if you do intensive creation / deletion of files in the container in a folder watched by lsyncd, you may run into trouble; prefer an unwatched folder for this.
  • docker-osx-dev is simpler than I thought. It performs an rsync with the --delete option for each fsevent in the watched folder on the host. This deletes all content created in the container. I needed to hack it to perform an rsync without --delete and to handle deletions by forwarding rm commands. The problem: syncing back from the container to the host triggers an event... so in my previous use case, you get 25k triggers of a full rsync... your CPU won't like that. To resolve this, I enabled watching on the host only once the initial rsync back was done.
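
The workaround in the first point, extracting outside the watched tree and only then moving the result in, can be sketched generically as follows. This is not the composer patch itself: `extract_via_staging` and its parameters are hypothetical, and `staging_root` is assumed to be a folder the watcher does not observe.

```python
import os
import shutil
import tempfile
import zipfile


def extract_via_staging(archive: str, watched_dir: str,
                        package_name: str, staging_root: str) -> str:
    """Extract archive in an unwatched folder, then move it into place."""
    staging = tempfile.mkdtemp(dir=staging_root)
    try:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(staging)
        final_path = os.path.join(watched_dir, package_name)
        if os.path.exists(final_path):
            raise FileExistsError(final_path)
        # The watcher only sees the finished folder appear under its final name.
        shutil.move(staging, final_path)
        return final_path
    finally:
        # No-op after a successful move; cleans up if anything failed.
        shutil.rmtree(staging, ignore_errors=True)
```

The watcher then never observes a half-extracted folder under a temporary name that is about to be renamed.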

Despite these issues, it works well for my day-to-day usage, and I am pretty happy with it. I think there is definitely room for improvement. I opened a StackOverflow question to see if we can prevent the rsync storm of docker-osx-dev, and I think I also need to handle file renaming in the sync from the host to the container.
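
A common way to tame this kind of storm is to coalesce bursts of events so that one sync runs per burst instead of one per file. The sketch below shows only the grouping logic, on recorded timestamps. It is not taken from docker-osx-dev or from any answer to the question mentioned above, and `coalesce` and `quiet_period` are names chosen for illustration.

```python
from typing import Iterable, List, Optional, Set, Tuple


def coalesce(events: Iterable[Tuple[float, str]],
             quiet_period: float) -> List[Set[str]]:
    """Group (timestamp, path) events into batches.

    A new batch starts whenever more than quiet_period seconds pass
    without any event; each batch would trigger a single sync.
    """
    batches: List[Set[str]] = []
    last_time: Optional[float] = None
    for timestamp, path in sorted(events):
        if last_time is None or timestamp - last_time > quiet_period:
            batches.append(set())
        batches[-1].add(path)
        last_time = timestamp
    return batches
```

With a burst of 25k file events arriving close together, this grouping would produce a single batch, and therefore a single rsync run, once the burst goes quiet.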

@seeruk

seeruk commented Jan 25, 2017

Has unison ever come up? Some of the projects I run are too large to run on my MacBook Pro, so I run them on my dedicated server instead and use unison to sync up the files. My only concern with this kind of approach would be the number of files that would need to be watched. That may be a blocker for any approach like this.

@Jean85

Jean85 commented Feb 9, 2017

I've tried docker-sync, which uses unison, on my Symfony app, but it's a mess. The two-way sync goes mad when it finds files edited on both sides, and this may happen a lot in my use case. I'm still searching for a good setup. I will give dinghy a go, but I fear it will be as slow as NFS with VMware, which I tried yesterday.

@paolomainardi

paolomainardi commented Feb 9, 2017 via email

@Jean85

Jean85 commented Feb 9, 2017

I've tried Dinghy on my Symfony app and it's horribly slow (10x to 15x slower than native Linux Docker).
docker-machine-nfs is just 5x slower, but it's still unusable for everyday work.
I'm starting to be convinced that big Symfony apps are a special case in this matter, so I will use my fallback solution: putting the code inside the VM and syncing it to the outside.
