Performance issues - multi core? #116
Hi, which release was faster?
Yes, encryption of a single file transfer is single-threaded, but multiple transfers run in parallel (each transfer gets its own thread).
@rfjakob I have only tested 1.3 so far; the transfer was done with rsync. Is it possible to do the encryption in parallel to utilize more cores?
Ah ok, I misunderstood. I thought it was 50 MB/s slower than some other version.
So given the "-speed" numbers, I don't think encryption is the bottleneck.
Two questions:
1) What is the CPU load on the used core? 100%?
2) What is the underlying storage? A local ext4 on a hard disk? An SSD?
Normal transfers (e.g. from the ZFS pool to the same ZFS pool) hit a steady 110-120 MB/s with the same rsync command as above, just not to a gocryptfs mount point on the very same ZFS pool.
This sounds like poor random write/read performance due to raidz2 parity overhead on top of the encryption overhead. Quite possibly the non-fixed stripe sizes of raidz in combination with FUSE are also a factor. Is anything abnormal showing up in iotop?
Collect all the plaintext and pass everything to contentenc in one call. This will allow easier parallelization of the encryption. #116
@jkaberg A difference between plain rsync and rsync+gocryptfs is that gocryptfs writes the data in 128 KB blocks, while rsync probably uses bigger blocks. This is a FUSE limitation: the kernel always splits the data into 128 KB blocks. What throughput do you get when you write to the ZFS pool in 128 KB blocks? Like this:
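For example, dd can write with a fixed 128 KB block size, mirroring the request size the kernel uses for FUSE writes (the target path below is a placeholder; point it at a file on your ZFS mount):

```shell
# Write 512 MB in 128 KB blocks; conv=fsync flushes to disk at the
# end so the reported throughput includes the actual write-out.
dd if=/dev/zero of=/tank/ddtest bs=128k count=4096 conv=fsync
rm /tank/ddtest
```

Comparing this number against the rsync throughput shows how much of the slowdown comes from the 128 KB write pattern alone, before any encryption overhead.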
Then, to find out why we are running at 100% CPU: Can you post a cpu profile of gocryptfs? Mount with this option:
then run the rsync and unmount. Thanks, Jakob
This is what I meant by stripe sizes in combination with FUSE. The 128 KB block size is probably what is bottlenecking. In this case, compiling a custom kernel with FUSE_MAX_PAGES_PER_REQ set higher than 32 may help alleviate the issue.
Yes, increasing FUSE_MAX_PAGES_PER_REQ should increase the throughput. However, this is not something I can ask of users. So I think behaving like
@rfjakob Here's the output (/media/xfiles is the ZFS mountpoint)
The CPU profile can be found here. The strange thing is that rsync (using flags avP) to the same unencrypted mount somehow tops out at around 60 MB/s.
The CPU profile (rendered as PDF: pprof001.svg.pdf) shows that we spend our time on:
I have already sped up nonceGenerator.Get quite a bit in 80516ed. We cannot do anything about the pwrite syscall. That leaves gcmAesEnc. My benchmarks suggest that we can get a big improvement by parallelizing the encryption: results.txt
On a 4-core, 8-thread machine (Xeon E3-1245) we get a superlinear (!!) improvement by switching from one to two threads:
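The two-way parallelism discussed here can be sketched in Go: split the plaintext into fixed-size blocks and let two goroutines encrypt alternating blocks with AES-GCM. This is a minimal illustration under assumed parameters, not gocryptfs's actual implementation; the all-zero key, the index-derived nonces, and the 4 KB block size are placeholders.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"encoding/binary"
	"fmt"
	"sync"
)

const blockSize = 4096

func main() {
	key := make([]byte, 32) // placeholder key, for illustration only
	blk, err := aes.NewCipher(key)
	if err != nil {
		panic(err)
	}
	gcm, err := cipher.NewGCM(blk)
	if err != nil {
		panic(err)
	}

	plain := bytes.Repeat([]byte{0xab}, 32*blockSize)
	nBlocks := len(plain) / blockSize
	out := make([][]byte, nBlocks)

	// Two-way parallelism: worker w encrypts blocks w, w+2, w+4, ...
	// Each goroutine writes to disjoint slots of out, so no locking
	// is needed around the results.
	var wg sync.WaitGroup
	for w := 0; w < 2; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := w; i < nBlocks; i += 2 {
				// Each block needs a unique nonce; derive it from the
				// block index here (real code would use random nonces).
				nonce := make([]byte, gcm.NonceSize())
				binary.BigEndian.PutUint64(nonce, uint64(i))
				out[i] = gcm.Seal(nil, nonce, plain[i*blockSize:(i+1)*blockSize], nil)
			}
		}(w)
	}
	wg.Wait()

	// Sanity check: block 5 must decrypt back to the original plaintext.
	nonce := make([]byte, gcm.NonceSize())
	binary.BigEndian.PutUint64(nonce, 5)
	dec, err := gcm.Open(nil, nonce, out[5], nil)
	if err != nil || !bytes.Equal(dec, plain[5*blockSize:6*blockSize]) {
		panic("round-trip failed")
	}
	fmt.Printf("encrypted %d blocks of %d bytes with 2 goroutines\n", nBlocks, blockSize)
}
```

Because each 128 KB FUSE write already arrives split into independent blocks, this block-level fan-out is a natural fit; the scheduling caveat mentioned below (threads landing on the same core) is what limits the real-world gain.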
Impressive numbers and work, @rfjakob. Would you mind publishing a build for me to test (Linux amd64)?
Also, good news from libfuse: https://github.com/libfuse/libfuse/releases/tag/fuse-3.0.2 "Internal: calculate request buffer size from page size and kernel page limit instead of using hardcoded 128 kB limit." (libfuse/libfuse@4f8f034) This should help speed things up a bit 😄
The numbers I posted are from a synthetic benchmark ( https://github.com/rfjakob/gocryptfs-microbenchmarks ); I'm working on getting it into gocryptfs. I'll probably not get the same improvement in gocryptfs due to the FUSE overhead. Will keep you updated here! The page size thing, unfortunately, only applies to architectures other than x86. I believe arm64 and powerpc have a bigger page size, so they would get much bigger blocks.
I have added two-way encryption parallelism. If you can test, here is the latest build: |
@rfjakob Indeed, I'm seeing on average a 20 MB/s increase (with rsync). Very nice! 😄 I did a CPU profile for you as well: https://cloud.eth0.im/s/jEwCnsLJElFz8E0 While doing the rsync job I noticed my CPU is not going above 130%. From the commit messages I reckon you limited threading to 2 threads; do you think bumping up that value would make a difference?
Great, thanks! Rendered CPU profile: pprof002.svg.pdf. I saw about a 20% increase in my testing and, to be honest, I was a bit underwhelmed. It turns out that the encryption threads often get scheduled to the same core. This gets worse with more threads, which is why I have limited it to two-way parallelism for now.
I think this can be closed. Check out the performance history in performance.txt; the last commits gave us quite a boost.
@rfjakob Thanks. I'll have a go when I'm back from vacation 😄 |
Just tried the 1.3 release and I'm seeing lower transfer numbers (roughly 50-60 MB/s) on an HDD/ZFS pool, where speeds are usually around 110 MB/s.
CPU supports AES-NI (24 cores)
gocryptfs speedtest
While mounting the filesystem and doing a larger transfer (10 GB), I notice one core gets full load, but no additional cores get used.
Is gocryptfs (or alternatively the encryption process) limited to one core? If so, consider this a feature request for multi-core encryption 😄
If not, any ideas what the bottleneck might be?