Performance issues - multi core? #116
Hi, which release was faster?
Yes, encryption of a single file transfer is single-threaded, but multiple transfers run in parallel (each transfer gets its own thread).
@rfjakob I have only tested 1.3 so far; the transfer was done with rsync. Is it possible to do the encryption in parallel to utilize more cores?
Ah ok, I misunderstood. I thought it was 50 MB/s slower than some other version.
So given the "-speed" numbers, I don't think encryption is the bottleneck.
Two questions:
1) What is the CPU load on the used core? 100%?
2) What is the underlying storage? A local ext4 on a hard disk? An SSD?
Normal transfers (e.g. from the ZFS pool to the same ZFS pool) hit a steady 110-120 MB/s with the same rsync command as above, just not to a gocryptfs mount point on the very same ZFS pool.
This sounds like poor random write/read performance due to raidz2 parity overhead on top of the encryption overhead. Quite possibly the non-fixed stripe sizes of raidz in combination with FUSE are also a factor. Is anything abnormal showing up in iotop?
Collect all the plaintext and pass everything to contentenc in one call. This will allow easier parallelization of the encryption. #116
@jkaberg A difference between plain rsync and rsync+gocryptfs is that gocryptfs writes the data in 128 KB blocks, while rsync probably uses bigger blocks. This is a FUSE limitation: the kernel always splits the data into 128 KB blocks. What throughput do you get when you write to the ZFS pool in 128 KB blocks? Like this:
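For example, dd can write with a fixed 128 KB block size, mirroring the request size the kernel uses for FUSE writes (the target path below is a placeholder; point it at a file on your ZFS mount):

```shell
# Write 512 MB in 128 KB blocks; conv=fsync flushes to disk at the
# end so the reported throughput includes the actual write-out.
dd if=/dev/zero of=/tank/ddtest bs=128k count=4096 conv=fsync
rm /tank/ddtest
```

Comparing this number against the rsync throughput shows how much of the slowdown comes from the 128 KB write pattern alone, before any encryption overhead.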
Then, to find out why we are running at 100% CPU: Can you post a cpu profile of gocryptfs? Mount with this option:
then run the rsync and unmount. Thanks, Jakob
This is what I meant by stripe sizes in combination with FUSE. The 128 KB block size is probably what is bottlenecking. In this case, compiling a custom kernel with FUSE_MAX_PAGES_PER_REQ set higher than 32 may help alleviate the issue.
Yes, increasing FUSE_MAX_PAGES_PER_REQ should increase the throughput. However, this is not something I can ask of users. So I think behaving like
@rfjakob Here's the output (/media/xfiles is the ZFS mountpoint)
The CPU profile can be found here. The strange thing is that rsync (using flags avP) to the same unencrypted mount somehow tops out at around 60 MB/s.
The CPU profile (rendered as PDF: pprof001.svg.pdf) shows that we spend our time on:
I have already sped up nonceGenerator.Get quite a bit in 80516ed. We cannot do anything about the pwrite syscall. That leaves gcmAesEnc. My benchmarks suggest that we can get a big improvement by parallelizing the encryption: results.txt
On a 4-core, 8-thread machine (Xeon E3-1245) we get a superlinear (!!) improvement by switching from one to two threads:
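The two-way parallelism discussed here can be sketched in Go: split the plaintext into fixed-size blocks and let two goroutines encrypt alternating blocks with AES-GCM. This is a minimal illustration under assumed parameters, not gocryptfs's actual implementation; the all-zero key, the index-derived nonces, and the 4 KB block size are placeholders.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"encoding/binary"
	"fmt"
	"sync"
)

const blockSize = 4096

func main() {
	key := make([]byte, 32) // placeholder key, for illustration only
	blk, err := aes.NewCipher(key)
	if err != nil {
		panic(err)
	}
	gcm, err := cipher.NewGCM(blk)
	if err != nil {
		panic(err)
	}

	plain := bytes.Repeat([]byte{0xab}, 32*blockSize)
	nBlocks := len(plain) / blockSize
	out := make([][]byte, nBlocks)

	// Two-way parallelism: worker w encrypts blocks w, w+2, w+4, ...
	// Each goroutine writes to disjoint slots of out, so no locking
	// is needed around the results.
	var wg sync.WaitGroup
	for w := 0; w < 2; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := w; i < nBlocks; i += 2 {
				// Each block needs a unique nonce; derive it from the
				// block index here (real code would use random nonces).
				nonce := make([]byte, gcm.NonceSize())
				binary.BigEndian.PutUint64(nonce, uint64(i))
				out[i] = gcm.Seal(nil, nonce, plain[i*blockSize:(i+1)*blockSize], nil)
			}
		}(w)
	}
	wg.Wait()

	// Sanity check: block 5 must decrypt back to the original plaintext.
	nonce := make([]byte, gcm.NonceSize())
	binary.BigEndian.PutUint64(nonce, 5)
	dec, err := gcm.Open(nil, nonce, out[5], nil)
	if err != nil || !bytes.Equal(dec, plain[5*blockSize:6*blockSize]) {
		panic("round-trip failed")
	}
	fmt.Printf("encrypted %d blocks of %d bytes with 2 goroutines\n", nBlocks, blockSize)
}
```

Because each 128 KB FUSE write already arrives split into independent blocks, this block-level fan-out is a natural fit; the scheduling caveat mentioned below (threads landing on the same core) is what limits the real-world gain.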
Impressive numbers and work, @rfjakob. Would you mind publishing a build for me to test (Linux amd64)?
Also, good news from libfuse: https://github.com/libfuse/libfuse/releases/tag/fuse-3.0.2 "Internal: calculate request buffer size from page size and kernel page limit instead of using hardcoded 128 kB limit." (libfuse/libfuse@4f8f034) This should help speed things up a bit 😄
The numbers I posted are from a synthetic benchmark ( https://github.com/rfjakob/gocryptfs-microbenchmarks ); I'm working on getting it into gocryptfs. I'll probably not get the same improvement in gocryptfs due to the FUSE overhead. Will keep you updated here! The page size thing, unfortunately, only applies to architectures other than x86. I believe arm64 and powerpc have a bigger page size, so they would get much bigger blocks.
I have added two-way encryption parallelism. If you can test, here is the latest build: |
@rfjakob Indeed, I'm seeing on average a 20 MB/s increase (with rsync). Very nice! 😄 I did a CPU profile for you as well: https://cloud.eth0.im/s/jEwCnsLJElFz8E0 While doing the rsync job I noticed my CPU is not going above 130%. From the commit messages I reckon you limited threading to 2 threads; do you think bumping up that value would make a difference?
Great, thanks! Rendered CPU profile: pprof002.svg.pdf. I saw about a 20% increase in my testing and, to be honest, I was a bit underwhelmed. It turns out that the encryption threads often get scheduled to the same core. This gets worse with more threads, which is why I have limited it to two-way parallelism for now.
I think this can be closed. Check out the performance history in performance.txt; the last commits gave us quite a boost.
@rfjakob Thanks. I'll have a go when I'm back from vacation 😄 |
Just tried the 1.3 release and I'm seeing lower transfer numbers (roughly 50-60 MB/s) on an HDD/ZFS pool, where speeds are usually around 110 MB/s.
CPU supports AES-NI (24 cores)
gocryptfs speedtest
While mounting the filesystem and doing a larger transfer (10 GB), I notice one core gets full load, but no additional cores get used.
Is gocryptfs (or alternatively the encryption process) limited to one core? If so, consider this a feature request for multi-core encryption 😄
If not, any ideas what the bottleneck might be?