Feature Request: encryption primitives for devices without AES cpu instructions #452

Closed
DavyLandman opened this issue Feb 6, 2020 · 79 comments


@DavyLandman

Hi @rfjakob,

Thank you for this great application! The reverse mode is what really sets it apart from other options.

I checked the issues, and it doesn't seem to have been discussed yet, but what do you think about adding support for a different set of encryption primitives that are better suited to low-end devices?

I'm running gocryptfs on a few ARMv6/7-based NAS machines. They are nice: low energy and quite fast. But they lack native AES instructions: my fastest ARM device (Odroid XU4) maxes out at 40 MB/s, while, for example, the Raspberry Pis and friends are quite a bit slower (the RPi 1 is at 15 MB/s).

Maybe Google Adiantum (also added to Linux kernel 5.0 for cryptfs) is a nice fit. Adiantum is based on XChaCha12 and Poly1305 and is roughly 5 times faster than AES-XTS on devices without AES instructions.

For the reverse mode maybe something based on ChaCha20Poly1305?

Just for comparison, on my Odroid XU4, ChaCha20Poly1305 runs at 320MB/s, on my RPi1 it gets close to 40MB/s.

So I'm just wondering what your view is on this topic.

Cheers,
Davy

@rfjakob
Owner

rfjakob commented Feb 13, 2020

Hi, would you mind running gocryptfs -speed on your ARM machines and posting the result? (and cat /proc/cpuinfo | grep -E "model name|flags" | head -2).

I'd like to add it to our CPU zoo at ( https://github.com/rfjakob/gocryptfs/wiki/CPU-Benchmarks )

@DavyLandman
Author

I've tested all the different kinds of ARM devices I have:

Odroid XU4 (Exynos 5422 - ARM Cortex-A15 - 2 GHz)

model name      : ARMv7 Processor rev 3 (v7l)
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae
$ gocryptfs -speed
AES-GCM-256-OpenSSL       34.26 MB/s    (selected in auto mode)
AES-GCM-256-Go            17.24 MB/s
AES-SIV-512-Go            17.58 MB/s
$ openssl speed -evp chacha20-poly1305 && openssl speed -evp aes-256-gcm
...
The 'numbers' are in 1000s of bytes per second processed.
type                 16 bytes    64 bytes     256 bytes    1024 bytes   8192 bytes   16384 bytes
chacha20-poly1305    64066.72k   130153.44k   275532.80k   306572.84k   320018.56k   307903.74k
aes-256-gcm          40323.87k   49980.74k    64734.47k    70323.03k    71862.66k    71786.19k

Raspberry Pi 3 B rev 1.2 (BCM2837 - ARM Cortex-A53 - 1.2 GHz)

model name      : ARMv7 Processor rev 4 (v7l)
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
$ gocryptfs -speed
AES-GCM-256-OpenSSL       17.13 MB/s    (selected in auto mode)
AES-GCM-256-Go             5.27 MB/s
AES-SIV-512-Go             4.31 MB/s
$ openssl speed -evp chacha20-poly1305 && openssl speed -evp aes-256-gcm
...
The 'numbers' are in 1000s of bytes per second processed.
type                 16 bytes     64 bytes     256 bytes    1024 bytes   8192 bytes   16384 bytes
chacha20-poly1305    30020.39k    63560.13k    77169.32k    82019.33k    83536.55k    83645.78k
aes-256-gcm          16137.38k    19500.97k    20668.33k    20986.20k    21127.17k    21135.36k

Raspberry Pi B rev 2 (BCM2835 - ARM 11 - 700 MHz)

model name      : ARMv6-compatible processor rev 7 (v6l)
Features        : half thumb fastmult vfp edsp java tls
$ gocryptfs -speed
AES-GCM-256-OpenSSL        4.80 MB/s    (selected in auto mode)
AES-GCM-256-Go             1.85 MB/s
AES-SIV-512-Go             1.50 MB/s
$ openssl speed -evp chacha20-poly1305 && openssl speed -evp aes-256-gcm
...
The 'numbers' are in 1000s of bytes per second processed.
type                  16 bytes    64 bytes     256 bytes    1024 bytes   8192 bytes   16384 bytes
chacha20-poly1305     8090.97k    18202.65k    23222.03k    24960.34k    25666.44k    24958.29k
aes-256-gcm           4525.91k    6268.65k     6972.36k     7141.38k     7230.33k     7150.88k
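For comparison with the `gocryptfs -speed` numbers, the `openssl speed` cells (thousands of bytes per second) can be converted to MB/s. A small sketch in Go, the project's language; the helper name `kToMBps` is hypothetical, and the inputs are copied from the 8192-byte column of the three runs above:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// kToMBps converts an `openssl speed` cell such as "320018.56k"
// (thousands of bytes per second) into MB/s (10^6 bytes per second).
func kToMBps(cell string) (float64, error) {
	v, err := strconv.ParseFloat(strings.TrimSuffix(cell, "k"), 64)
	if err != nil {
		return 0, err
	}
	return v / 1000, nil
}

func main() {
	// 8192-byte column of the three `openssl speed` runs above.
	boards := []struct{ name, chacha, aes string }{
		{"Odroid XU4", "320018.56k", "71862.66k"},
		{"Raspberry Pi 3 B", "83536.55k", "21127.17k"},
		{"Raspberry Pi B", "25666.44k", "7230.33k"},
	}
	for _, b := range boards {
		c, err := kToMBps(b.chacha)
		if err != nil {
			panic(err)
		}
		a, err := kToMBps(b.aes)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-16s chacha20-poly1305 %6.1f MB/s  aes-256-gcm %5.1f MB/s  ratio %.1fx\n",
			b.name, c, a, c/a)
	}
}
```

On these three boards, chacha20-poly1305 comes out roughly 3.5x to 4.5x faster than aes-256-gcm at 8 KiB.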

@rfjakob
Owner

rfjakob commented Feb 29, 2020

Awesome, thanks! Added to https://github.com/rfjakob/gocryptfs/wiki/CPU-Benchmarks .

rfjakob added a commit that referenced this issue Feb 29, 2020
@rfjakob
Owner

rfjakob commented Feb 29, 2020

I have added an XChaCha20-Poly1305 benchmark to gocryptfs -speed in the xchacha20 branch. On my PC, the results look very promising, with xchacha20 being almost as fast as hardware-accelerated AES-GCM:

$ gocryptfs -speed
AES-GCM-256-OpenSSL 	 585.92 MB/s	
AES-GCM-256-Go      	 899.28 MB/s	(selected in auto mode)
AES-SIV-512-Go      	 164.05 MB/s	
XChaCha20-Poly1305-Go	 773.27 MB/s	

HOWEVER, looking at https://github.com/golang/crypto/tree/master/chacha20poly1305 , there only seems to be an optimized assembly version for amd64 (xxx_amd64.s).

Could you run gocryptfs -speed from the xchacha20 branch on one of your ARM devices, to see how fast the Go implementation is there?

EDIT: But there is a chacha_arm64.s here: https://github.com/golang/crypto/tree/master/chacha20

@rfjakob
Owner

rfjakob commented Feb 29, 2020

I have compiled that branch for Armv7, binary: gocryptfs.xchacha20.armv7.tar.gz

rfjakob added a commit that referenced this issue Feb 29, 2020
@DavyLandman
Author

DavyLandman commented Mar 2, 2020

Thanks for the binary:

on the Odroid XU4:

$ ./gocryptfs.xchacha20.armv7 --speed
AES-GCM-256-OpenSSL         N/A
AES-GCM-256-Go            17.04 MB/s    (selected in auto mode)
AES-SIV-512-Go            14.79 MB/s
XChaCha20-Poly1305-Go     23.37 MB/s
$ gocryptfs --speed
AES-GCM-256-OpenSSL       41.12 MB/s    (selected in auto mode)
AES-GCM-256-Go            16.92 MB/s
AES-SIV-512-Go            19.10 MB/s

I'll have to try the other ARM devices later.

Pity Go has not added asm ChaCha versions for ARMv7 yet. Maybe use the same OpenSSL bridge for speed?

@rfjakob
Owner

rfjakob commented Mar 2, 2020

I had the same idea; unfortunately, OpenSSL does not have XChaCha20 yet: openssl/openssl#5523

They do have chacha20, but this cannot be used with random nonces (too high risk of collisions)

@DavyLandman
Author

That's a shame. Could you add an option to also benchmark the chacha20 case?

Just to get a sense of the impact of the non-asm version: it might be that chacha20 is faster than xchacha20?

@DavyLandman
Author

I'm reading a bit, and the size & message restrictions on chacha20 are not that bad, right?

https://pycryptodome.readthedocs.io/en/latest/src/cipher/chacha20.html
https://libsodium.gitbook.io/doc/advanced/stream_ciphers/chacha20

@rfjakob
Owner

rfjakob commented Mar 2, 2020

The table on https://pycryptodome.readthedocs.io/en/latest/src/cipher/chacha20.html is very nice!

The problem with ChaCha20 (64-bit nonce) and random nonces: max. 200 000 messages. In gocryptfs, one "message" is a 4 KiB data block, so that's a limit of about 800 MB of data written over the lifetime of the filesystem!
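A quick check of that arithmetic, as a minimal Go sketch (the helper name is hypothetical): every 4096-byte data block consumes one random nonce, so the message limit times the block size gives the lifetime data limit.

```go
package main

import "fmt"

// lifetimeLimitBytes returns how much data can be written if every
// message (one gocryptfs data block) consumes one random nonce.
func lifetimeLimitBytes(maxMessages, blockSize int64) int64 {
	return maxMessages * blockSize
}

func main() {
	const blockSize = 4096 // 4 KiB gocryptfs data block
	n := lifetimeLimitBytes(200_000, blockSize)
	// prints "819200000 bytes = 819.2 MB = 781.25 MiB"
	fmt.Printf("%d bytes = %.1f MB = %.2f MiB\n", n, float64(n)/1e6, float64(n)/(1<<20))
}
```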

@DavyLandman
Author

The normal one in Go (and, I think, also in OpenSSL) is the second row in that table (the 96-bit nonce).

@lechner
Contributor

lechner commented Mar 3, 2020

Hi, I previously ported Gocryptfs to use wolfSSL. Does the code below allow the use of a random nonce with ChaCha20?

https://github.com/wolfSSL/wolfssl/blob/master/wolfcrypt/src/chacha.c#L111

@rfjakob
Owner

rfjakob commented Mar 3, 2020

@DavyLandman I see, 96-bit nonces, that's less bad. gocryptfs used 96-bit nonces in earlier versions. I moved to 128 bits because 96 bits is too little for very large filesystems; I have the calculations saved in #17 (comment).

And also, https://pkg.go.dev/golang.org/x/crypto/chacha20poly1305 says,

XChaCha20-Poly1305 is a ChaCha20-Poly1305 variant that takes a longer nonce, suitable to be generated randomly without risk of collisions. It should be preferred when nonce uniqueness cannot be trivially ensured, or whenever nonces are randomly generated.

so I'd rather not go with ChaCha20.

@lechner Yes it does, but only 96 bits according to the function comment

this version uses the typical AEAD 96 bit nonce
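Why the nonce length matters for random nonces: after n messages with N = 2^bits possible nonces, the collision probability is roughly n²/(2N) (birthday bound). The Go sketch below inverts that to get the number of messages allowed for a target probability p. The value p = 1e-9 is an assumption, chosen only because it roughly reproduces the pycryptodome table (about 200 000 messages for 64-bit nonces, about 13 billion for 96-bit); it is not necessarily the threshold used in #17.

```go
package main

import (
	"fmt"
	"math"
)

// maxMessages approximates how many random nonces of the given bit length
// can be drawn before the collision probability reaches p, using the
// birthday approximation p ≈ n² / (2·2^bits), i.e. n ≈ sqrt(2·2^bits·p).
func maxMessages(nonceBits int, p float64) float64 {
	return math.Sqrt(2 * math.Ldexp(1, nonceBits) * p)
}

func main() {
	const p = 1e-9         // assumed acceptable collision probability
	const blockSize = 4096 // one gocryptfs data block per nonce
	// 64: ChaCha20 (original), 96: IETF ChaCha20, 128: gocryptfs AES-GCM,
	// 192: XChaCha20.
	for _, bits := range []int{64, 96, 128, 192} {
		n := maxMessages(bits, p)
		fmt.Printf("%3d-bit nonce: %.3g messages, %.3g bytes\n", bits, n, n*blockSize)
	}
}
```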

@DavyLandman
Author

DavyLandman commented Mar 4, 2020

@DavyLandman I see, 96-bit nonces, that's less bad. gocryptfs used 96-bit nonces in earlier versions. I moved to 128 bits because 96 bits is too little for very large filesystems; I have the calculations saved in #17 (comment).

I was just reading RFC 7539, and it specifically notes that a random nonce is not needed: as long as it is unique, a simple counter is just as secure.

4. Security Considerations

The most important security consideration in implementing this
document is the uniqueness of the nonce used in ChaCha20. Counters
and LFSRs are both acceptable ways of generating unique nonces

Also discussed on Crypto SE.

Assuming 4 KiB blocks, you would have to write 2^96 × 4 KiB = 2^108 bytes before this counter overflows, which is about 324,518,554 yottabytes. That should be good enough, right? ;)
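The overflow estimate can be checked with exact integer arithmetic. A minimal Go sketch, assuming one 96-bit counter value per 4096-byte block and decimal yottabytes (10^24 bytes); the function name is hypothetical:

```go
package main

import (
	"fmt"
	"math/big"
)

// counterCapacityYB returns how many yottabytes (10^24 bytes) can be
// written before a nonceBits-wide counter wraps, if each counter value
// encrypts one block of blockSize bytes. Rounded to the nearest integer.
func counterCapacityYB(nonceBits uint, blockSize int64) *big.Int {
	total := new(big.Int).Lsh(big.NewInt(1), nonceBits) // 2^nonceBits blocks
	total.Mul(total, big.NewInt(blockSize))             // total bytes
	yotta := new(big.Int).Exp(big.NewInt(10), big.NewInt(24), nil)
	half := new(big.Int).Rsh(yotta, 1) // add 0.5 YB so Div rounds to nearest
	total.Add(total, half)
	return total.Div(total, yotta)
}

func main() {
	fmt.Println(counterCapacityYB(96, 4096), "YB") // prints "324518554 YB"
}
```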

@DavyLandman
Author

Was reading SE and by chance a relevant question popped up: https://crypto.stackexchange.com/questions/77982/how-to-generate-a-nonce-for-chacha20-poly1305

@rfjakob
Owner

rfjakob commented Mar 6, 2020

Using a counter as the nonce would be nice; unfortunately, I don't think we can. There may be multiple gocryptfs processes writing to the folder at the same time (use case: encrypted folder on a shared network drive).

@DavyLandman
Author

DavyLandman commented Mar 7, 2020 via email

@rfjakob
Owner

rfjakob commented Mar 8, 2020

I have added the gocryptfs.xchacha20.armv7 results to https://github.com/rfjakob/gocryptfs/wiki/CPU-Benchmarks .

I'm afraid using XChaCha20-Poly1305-Go does not make sense, as it is slower than AES-GCM-256-OpenSSL.

We can revisit when openssl gets XChaCha20.

rfjakob closed this as completed Mar 8, 2020
rfjakob added the wontfix label Mar 8, 2020
rfjakob reopened this Apr 7, 2020
@rfjakob
Owner

rfjakob commented Apr 7, 2020

Actually, on a Raspberry Pi 4 with 64-bit Ubuntu, things look different:

$ ./gocryptfs -speed
AES-GCM-256-OpenSSL 	  21.50 MB/s	(selected in auto mode)
AES-GCM-256-Go      	  21.75 MB/s	
AES-SIV-512-Go      	  17.64 MB/s	
XChaCha20-Poly1305-Go	 109.78 MB/s	

@DavyLandman
Author

I just ran it on my rpi3:

$ ./gocryptfs.xchacha20.armv7 --speed
AES-GCM-256-OpenSSL         N/A
AES-GCM-256-Go             4.86 MB/s    (selected in auto mode)
AES-SIV-512-Go             4.53 MB/s
XChaCha20-Poly1305-Go      9.26 MB/s
$ gocryptfs --speed
AES-GCM-256-OpenSSL       16.83 MB/s    (selected in auto mode)
AES-GCM-256-Go             5.24 MB/s
AES-SIV-512-Go             4.20 MB/s

@rfjakob
Owner

rfjakob commented Apr 8, 2020

Needs a 64 bit gocryptfs to be fast. Go has optimized xchacha assembly for arm64.

@DavyLandman
Author

Ah, yes, okay so it's for the zoo then ;)

@DavyLandman
Author

With quite some work you could link/cgo these asm versions: https://github.com/floodyberry/chacha-opt/tree/master/app/extensions/chacha

@DavyLandman
Author

DavyLandman commented Aug 30, 2021

xu4 (armv7, running on tmpfs)

normal:

WRITE: 262144000 bytes (262 MB, 250 MiB) copied, 14.4193 s, 18.2 MB/s
READ:  262144000 bytes (262 MB, 250 MiB) copied, 9.40416 s, 27.9 MB/s
UNTAR: 108.232
MD5:   66.366
LS:    13.074
RM:    17.661

xchacha:

WRITE: 262144000 bytes (262 MB, 250 MiB) copied, 9.28123 s, 28.2 MB/s
READ:  262144000 bytes (262 MB, 250 MiB) copied, 6.03132 s, 43.5 MB/s
UNTAR: 87.695
MD5:   46.046
LS:    12.658
RM:    18.049

@durdin85

Just tested it on armv7l (Orange Pi One). XChaCha20 is now the fastest of the Go implementations, but what is a bit interesting is that OpenSSL is still the fastest overall. I built the binary myself from the git sources. BTW, for some reason the compiled binary was named "v2" instead of "gocryptfs"; I haven't figured out why, maybe the old Go version? Anyway, this is somewhat outdated hardware now, so no miracles are expected.

gocryptfs v2.1-44-g4e3b770-dirty; go-fuse v2.1.1-0.20210825171523-3ab5d95a30ae; 2021-08-30 go1.11.6 linux/arm
AES-GCM-256-OpenSSL 	  14.76 MB/s	(selected in auto mode)
AES-GCM-256-Go      	   4.11 MB/s	
AES-SIV-512-Go      	   3.51 MB/s	
XChaCha20-Poly1305-Go	   8.61 MB/s	(use via -xchacha flag)

The benchmark won't fit into tmpfs, so it runs from the SD card (-xchacha):

WRITE: 262144000 bytes (262 MB, 250 MiB) copied, 62.7842 s, 4.2 MB/s
READ:  262144000 bytes (262 MB, 250 MiB) copied, 29.1741 s, 9.0 MB/s
UNTAR: 476.098
MD5:   145.932
LS:    24.879
RM:    26.344

@rfjakob
Owner

rfjakob commented Aug 30, 2021

The old Go compiler may also hurt performance. Can you check whether the binary I posted above gives better results?

@durdin85

I thought the same, so I've now tested with Go 1.15, and the results seem better:

$ ./gocryptfs -speed
gocryptfs v2.1-44-g4e3b770; go-fuse v2.1.1-0.20210825171523-3ab5d95a30ae; 2021-08-30 go1.15.9 linux/arm
AES-GCM-256-OpenSSL       14.42 MB/s    (selected in auto mode)
AES-GCM-256-Go             3.72 MB/s
AES-SIV-512-Go             3.46 MB/s
XChaCha20-Poly1305-Go     11.39 MB/s    (use via -xchacha flag)

and this is using the downloaded binary (Go 1.17):

$ ./gocryptfs -speed
gocryptfs v2.1-37-g91d3b30 without_openssl; go-fuse v2.1.1-0.20210825070001-74a933d6e856; 2021-08-27 go1.17 linux/arm
AES-GCM-256-OpenSSL         N/A
AES-GCM-256-Go             4.29 MB/s   (selected in auto mode)
AES-SIV-512-Go             3.71 MB/s
XChaCha20-Poly1305-Go     11.43 MB/s   (use via -xchacha flag)

@rfjakob
Owner

rfjakob commented Aug 30, 2021

@DavyLandman did the "normal" run have OpenSSL support? (the binaries I posted do not). Also, revisiting this:

I just want to bring back a single point: I proposed chacha20-poly1305 for devices that do not have crypto extensions, so armv8 devices are not part of that bunch.

I read through the benchmarks in this ticket and in the wiki again, and, unfortunately, 32-bit ARM (armv7) devices don't gain anything with this iteration of xchacha support. On 32-bit ARM, AES-GCM-256-OpenSSL is faster than XChaCha20-Poly1305-Go, because OpenSSL has optimized assembly, and Go does not.

Using -xchacha now makes sense on:

  • amd64 (=Intel/AMD 64 bit) CPUs that lack AES acceleration. These are mostly older and low power CPUs.
  • arm64 (=ARM 64 bit) CPUs that lack AES acceleration. That's most of them.

On other platforms, however, something else is faster:

  • amd64 with AES acceleration: AES-GCM-256-Go
  • armv7: AES-GCM-256-OpenSSL
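The rules of thumb above can be written down as a small lookup. This is a hypothetical helper for illustration only, not part of gocryptfs; `hasAES` has to come from the caller (for example the `aes` entry in the /proc/cpuinfo Features line), and any combination not covered above is left to `gocryptfs -speed`.

```go
package main

import (
	"fmt"
	"runtime"
)

// suggestBackend encodes the August 2021 rules of thumb from the comment
// above. Hypothetical helper, not gocryptfs code; hasAES must be supplied
// by the caller (e.g. from the "aes" flag in /proc/cpuinfo).
func suggestBackend(goarch string, hasAES bool) string {
	switch {
	case goarch == "arm":
		// 32-bit ARM: OpenSSL has optimized assembly, Go does not.
		return "AES-GCM-256-OpenSSL"
	case (goarch == "amd64" || goarch == "arm64") && !hasAES:
		// No AES acceleration, but Go has optimized ChaCha20 assembly here.
		return "XChaCha20-Poly1305-Go (-xchacha)"
	case goarch == "amd64" && hasAES:
		return "AES-GCM-256-Go"
	default:
		// Not covered by the comment above: measure instead.
		return "run gocryptfs -speed"
	}
}

func main() {
	cases := []struct {
		arch   string
		hasAES bool
	}{
		{"amd64", true},
		{"amd64", false},
		{"arm64", false},
		{"arm", false},
	}
	for _, c := range cases {
		fmt.Printf("%-5s aes=%-5v -> %s\n", c.arch, c.hasAES, suggestBackend(c.arch, c.hasAES))
	}
	fmt.Println("this binary was built for", runtime.GOARCH)
}
```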

@DavyLandman
Author

DavyLandman commented Aug 30, 2021 via email

@DavyLandman
Author

DavyLandman commented Aug 30, 2021

So, I just got out my Odroid N2 (which has a very beefy arm64 CPU with crypto extensions):

odroidn2:~:# cat /proc/cpuinfo
processor       : 0
model name      : ARMv8 Processor rev 4 (v8l)
BogoMIPS        : 48.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4
./gocryptfs -speed
gocryptfs v2.1-45-gc505e73; go-fuse v2.1.1-0.20210825171523-3ab5d95a30ae; 2021-08-30 go1.16.2 linux/arm64
AES-GCM-256-OpenSSL      282.90 MB/s
AES-GCM-256-Go           580.28 MB/s    (selected in auto mode)
AES-SIV-512-Go            88.85 MB/s
XChaCha20-Poly1305-Go    188.07 MB/s    (use via -xchacha flag)

benchmark:

Testing gocryptfs   at /tmp/benchmark.bash.cJW: gocryptfs v2.1-45-gc505e73; go-fuse v2.1.1-0.20210825171523-3ab5d95a30ae; 2021-08-30 go1.16.2 linux/arm64
/tmp/benchmark.bash.cJW.mnt is a mountpoint
Downloading linux-3.0.tar.gz
/tmp/linux-3.0.tar.gz                100%[===================================================================>]  92.20M  24.1MB/s    in 4.2s
2021-08-30 20:44:52 URL:https://cdn.kernel.org/pub/linux/kernel/v3.0/linux-3.0.tar.gz [96675825/96675825] -> "/tmp/linux-3.0.tar.gz" [1]
WRITE: 262144000 bytes (262 MB, 250 MiB) copied, 1.64059 s, 160 MB/s
READ:  262144000 bytes (262 MB, 250 MiB) copied, 0.711058 s, 369 MB/s
UNTAR: 23.326
MD5:   8.565
LS:    3.315
RM:    4.802
root@odroidn2:/tmp/gocryptfs# ./benchmark.bash -xchacha
Testing gocryptfs -xchacha  at /tmp/benchmark.bash.P0M: gocryptfs v2.1-45-gc505e73; go-fuse v2.1.1-0.20210825171523-3ab5d95a30ae; 2021-08-30 go1.16.2 linux/arm64
/tmp/benchmark.bash.P0M.mnt is a mountpoint
WRITE: 262144000 bytes (262 MB, 250 MiB) copied, 2.26622 s, 116 MB/s
READ:  262144000 bytes (262 MB, 250 MiB) copied, 1.32793 s, 197 MB/s
UNTAR: 25.113
MD5:   11.183
LS:    3.027

@DavyLandman
Author

@DavyLandman did the "normal" run have OpenSSL support? (the binaries I posted do not). Also, revisiting this:

Here is the one that came with the distro (and has OpenSSL enabled):

Testing gocryptfs   at /tmp/benchmark.bash.O7d: gocryptfs 1.6.1; go-fuse 0.0~git20190214.58dcd77; 2019-03-11 go1.11.5
/tmp/benchmark.bash.O7d.mnt is a mountpoint
WRITE: 262144000 bytes (262 MB, 250 MiB) copied, 7.2748 s, 36.0 MB/s
READ:  262144000 bytes (262 MB, 250 MiB) copied, 4.49824 s, 58.3 MB/s
UNTAR: 82.685
MD5:   43.736
LS:    5.387
RM:    17.843

so indeed, for armv7, not an improvement.

rfjakob added a commit that referenced this issue Sep 2, 2021
$ ./gocryptfs -speed
gocryptfs v2.1-56-gdb1466f-dirty.stupidchacha; go-fuse v2.1.1-0.20210825171523-3ab5d95a30ae; 2021-09-02 go1.17 linux/amd64
AES-GCM-256-OpenSSL       	 529.53 MB/s
AES-GCM-256-Go            	 833.85 MB/s	(selected in auto mode)
AES-SIV-512-Go            	 155.27 MB/s
XChaCha20-Poly1305-Go     	 715.33 MB/s	(use via -xchacha flag)
XChaCha20-Poly1305-OpenSSL	 468.94 MB/s

#452
@rfjakob
Owner

rfjakob commented Sep 2, 2021

Dear armv7 users, I have something brewing in the "stupidchacha" branch. Could somebody build it on armv7:

git clone https://github.com/rfjakob/gocryptfs.git
cd gocryptfs
git checkout stupidchacha
./build.bash # yes, must be with openssl

And then run

./gocryptfs -speed

?

@DavyLandman
Author

DavyLandman commented Sep 2, 2021

Nice one 👏🏼 @rfjakob looks like openssl contains arm optimized xchacha indeed:

gocryptfs v2.1-57-g54e56ab.stupidchacha; go-fuse v2.1.1-0.20210825171523-3ab5d95a30ae; 2021-09-02 go1.16.7 linux/arm
AES-GCM-256-OpenSSL               41.85 MB/s    (selected in auto mode)
AES-GCM-256-Go                    15.87 MB/s
AES-SIV-512-Go                    16.52 MB/s
XChaCha20-Poly1305-Go             33.77 MB/s    (use via -xchacha flag)
XChaCha20-Poly1305-OpenSSL        75.68 MB/s

(on the odroid xu4)

@rfjakob
Owner

rfjakob commented Sep 2, 2021

Ok, not bad! Thanks!

However, the "openssl speed" numbers you posted for the xu4 show 306 MB/s for block size 1024 (gocryptfs uses 4k blocks, so 1k should be comparable).

In other words, we lose a factor of 4 somewhere?

@DavyLandman
Author

DavyLandman commented Sep 2, 2021

Ok, not bad! Thanks!

However, the "openssl speed" numbers you posted for the xu4 show 306 MB/s for block size 1024 (gocryptfs uses 4k blocks, so 1k should be comparable).

In other words, we lose a factor of 4 somewhere?

Sorry for the confusion, that was the n2 (armv8 with AES extensions).

the xu4 reported this in the benchmark:

Testing gocryptfs   at /tmp/benchmark.bash.O7d: gocryptfs 1.6.1; go-fuse 0.0~git20190214.58dcd77; 2019-03-11 go1.11.5
/tmp/benchmark.bash.O7d.mnt is a mountpoint
WRITE: 262144000 bytes (262 MB, 250 MiB) copied, 7.2748 s, 36.0 MB/s
READ:  262144000 bytes (262 MB, 250 MiB) copied, 4.49824 s, 58.3 MB/s
UNTAR: 82.685
MD5:   43.736
LS:    5.387
RM:    17.843

(Update: scratch this comment, I'm mixing things up.)

@rfjakob
Owner

rfjakob commented Sep 2, 2021

PS: openssl does not have xchacha. In "XChaCha20-Poly1305-OpenSSL" , the "X" is from the Go crypto library and "ChaCha20-Poly1305" is from openssl. So it's expected to be somewhat slower than straight openssl chacha20-poly1305.

@DavyLandman
Author

ah, so you run the first block manually? and then give it over to openssl to continue?

@rfjakob
Owner

rfjakob commented Sep 2, 2021

The 306 MB/s was from #452 (comment)

@DavyLandman
Author

ah, true, just ran it again, and indeed.

type                  16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
chacha20-poly1305    66557.18k   125680.49k   275370.18k   302992.35k   302139.10k   310047.96k

@DavyLandman
Author

Is this marshalling overhead for cgo? If I remember correctly, there are some very specific ways to use C libraries in Go that avoid memory copying, but my cgo is a bit rusty currently.

@DavyLandman
Author

DavyLandman commented Sep 2, 2021

@rfjakob if you make a version on the branch that is just purely piping chacha20-poly1305 from openssl (so removing the X part), we could check what happens there? I'd be happy to compile and run the -speed on the xu4 again.

@durdin85

durdin85 commented Sep 3, 2021

Dear armv7 users, I have something brewing in the "stupidchacha" branch. Could somebody build it on armv7:

git clone https://github.com/rfjakob/gocryptfs.git
cd gocryptfs
git checkout stupidchacha
./build.bash # yes, must be with openssl

And then run

./gocryptfs -speed

?

I had some trouble building/running it against OpenSSL 1.1.1d, but once I updated to 1.1.1k it went fine, and the speed benefit is clearly visible:

$ ./gocryptfs -speed
gocryptfs v2.1-57-g54e56ab.stupidchacha; go-fuse v2.1.1-0.20210825171523-3ab5d95a30ae; 2021-09-03 go1.15.9 linux/arm
AES-GCM-256-OpenSSL       	  12.99 MB/s	(selected in auto mode)
AES-GCM-256-Go            	   3.97 MB/s	
AES-SIV-512-Go            	   3.44 MB/s	
XChaCha20-Poly1305-Go     	  11.33 MB/s	(use via -xchacha flag)
XChaCha20-Poly1305-OpenSSL	  36.76 MB/s	

@rfjakob
Owner

rfjakob commented Sep 4, 2021

If you "git pull" now, you should see double-digit % improvements

@DavyLandman
Author

DavyLandman commented Sep 4, 2021

If you "git pull" now, you should see double-digit % improvements

gocryptfs v2.1-68-gedf9d4c.stupidchacha; go-fuse v2.1.1-0.20210825171523-3ab5d95a30ae; 2021-09-04 go1.16.7 linux/arm
AES-GCM-256-OpenSSL               56.84 MB/s    (selected in auto mode)
AES-GCM-256-Go                    16.61 MB/s
AES-SIV-512-Go                    16.49 MB/s
XChaCha20-Poly1305-Go             39.08 MB/s    (use via -xchacha flag)
XChaCha20-Poly1305-OpenSSL       141.82 MB/s

Still no 300 MB/s, but quite an improvement indeed. Looking at the commits, it's all about cgo overhead? :( Although these insights might also improve the AES-GCM via OpenSSL performance?

in case you are interested:

/app/internal/speed # go test -bench .
goos: linux
goarch: arm
pkg: github.com/rfjakob/gocryptfs/v2/internal/speed
BenchmarkStupidGCM-8                       17583             71483 ns/op          57.30 MB/s
BenchmarkStupidGCMDecrypt-8                17916             66884 ns/op          61.24 MB/s
BenchmarkGoGCM-8                            5727            215568 ns/op          19.00 MB/s
BenchmarkGoGCMDecrypt-8                     5780            205670 ns/op          19.92 MB/s
BenchmarkAESSIV-8                           5294            236888 ns/op          17.29 MB/s
BenchmarkAESSIVDecrypt-8                    5380            226851 ns/op          18.06 MB/s
BenchmarkXchacha-8                         10000            101972 ns/op          40.17 MB/s
BenchmarkXchachaDecrypt-8                  11912             99798 ns/op          41.04 MB/s
BenchmarkStupidXchacha-8                   43904             30586 ns/op         133.92 MB/s
BenchmarkStupidXchachaDecrypt-8            38722             27773 ns/op         147.48 MB/s
BenchmarkStupidChacha-8                    49257             26900 ns/op         152.27 MB/s
BenchmarkStupidChachaDecrypt-8             51540             25752 ns/op         159.05 MB/s
PASS
ok      github.com/rfjakob/gocryptfs/v2/internal/speed  20.293s

all on the trusty old xu4 ;_

@DavyLandman
Author

You could also consider either porting the arm specific asm from openssl, or trying to get the golang team to take up the assembly versions of chacha20-poly1305?

Here is the source: https://github.com/openssl/openssl/blob/master/crypto/chacha/asm/chacha-armv4.pl

interestingly it works for armv4+.

@rfjakob
Owner

rfjakob commented Sep 7, 2021

it's all about cgo overhead? :( although these insights might also improve the AES-GCM via OpenSSL performance?

Yes, it's mostly C call overhead ( https://www.cockroachlabs.com/blog/the-cost-and-complexity-of-cgo/ ). And the improvement is to call only once into C and do all needed openssl calls there ( b3e5ed8 ).

Yes, AES-GCM sees an improvement as well ( commit 275ebc1 ):

I managed to get a 32-bit ARM docker container running on my rpi4, branch stupidchacha (currently at edf9d4c):

root@f13b37d6334c:~/gocryptfs/internal/speed# go test -bench .
goos: linux
goarch: arm
pkg: github.com/rfjakob/gocryptfs/v2/internal/speed
BenchmarkStupidGCM-4              	   14812	     80181 ns/op	  51.08 MB/s
BenchmarkStupidGCMDecrypt-4       	   14978	     79943 ns/op	  51.24 MB/s
BenchmarkGoGCM-4                  	    4616	    233316 ns/op	  17.56 MB/s
BenchmarkGoGCMDecrypt-4           	    4884	    232717 ns/op	  17.60 MB/s
BenchmarkAESSIV-4                 	    4827	    242162 ns/op	  16.91 MB/s
BenchmarkAESSIVDecrypt-4          	    4678	    241086 ns/op	  16.99 MB/s
BenchmarkXchacha-4                	   10000	    108352 ns/op	  37.80 MB/s
BenchmarkXchachaDecrypt-4         	   10000	    108356 ns/op	  37.80 MB/s
BenchmarkStupidXchacha-4          	   49172	     23936 ns/op	 171.13 MB/s
BenchmarkStupidXchachaDecrypt-4   	   49736	     24128 ns/op	 169.76 MB/s
BenchmarkStupidChacha-4           	   57219	     20778 ns/op	 197.13 MB/s
BenchmarkStupidChachaDecrypt-4    	   57183	     20882 ns/op	 196.15 MB/s
PASS
ok  	github.com/rfjakob/gocryptfs/v2/internal/speed	16.650s

Current master without the changes:

root@f13b37d6334c:~/gocryptfs/internal/speed# git checkout master
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
root@f13b37d6334c:~/gocryptfs/internal/speed# go test -bench .
goos: linux
goarch: arm
pkg: github.com/rfjakob/gocryptfs/v2/internal/speed
BenchmarkStupidGCM-4   	    9729	    109484 ns/op	  37.41 MB/s
BenchmarkGoGCM-4       	    4522	    239585 ns/op	  17.10 MB/s
BenchmarkAESSIV-4      	    4574	    250865 ns/op	  16.33 MB/s
PASS
ok  	github.com/rfjakob/gocryptfs/v2/internal/speed	3.395s
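The MB/s column is simply the payload per op divided by the time per op. The figures above are consistent with one 4096-byte block per op (4096 B / 80181 ns ≈ 51.08 MB/s), which makes it easy to quantify the gain. A small Go sketch using the two BenchmarkStupidGCM-4 lines; the 4096-byte payload is inferred from the numbers, not taken from the benchmark source:

```go
package main

import "fmt"

// blockSize is the payload per benchmark op, inferred from the reported
// numbers (4096 B / 80181 ns ≈ 51.08 MB/s).
const blockSize = 4096

// mbPerSec converts a Go benchmark ns/op figure into MB/s (10^6 bytes/s):
// bytes/ns * 1e9 gives bytes/s, divided by 1e6 gives MB/s.
func mbPerSec(nsPerOp float64) float64 {
	return blockSize / nsPerOp * 1e3
}

func main() {
	before := 109484.0 // BenchmarkStupidGCM-4, master
	after := 80181.0   // BenchmarkStupidGCM-4, stupidchacha at edf9d4c
	fmt.Printf("master: %.2f MB/s, stupidchacha: %.2f MB/s, speedup %.0f%%\n",
		mbPerSec(before), mbPerSec(after), (before/after-1)*100)
}
```

That works out to roughly 37% for AES-GCM via OpenSSL on this container.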

I'll also attach the cpu profiles for later reference.

You could also consider either porting the arm specific asm from openssl, or trying to get the golang team to take up the assembly versions of chacha20-poly1305?

Now that I am used to Go, writing C code already feels like juggling chainsaws. I will not touch asm :)
But looking at the xchacha.pdf CPU profile, the Go part runs really fast and does not seem to slow us down (HChaCha20).

BTW, this is how XChaCha20-Poly1305-OpenSSL works: the HChaCha20 function (from Go's golang.org/x/crypto) mixes the key and nonce into a new key for each encryption. What remains is normal ChaCha20-Poly1305, so we can call OpenSSL at this point:

key, nonce -> Go HChaCha20 -> key2, nonce2 -> OpenSSL ChaCha20-Poly1305

chacha.pdf
xchacha.pdf

@DavyLandman
Author

DavyLandman commented Sep 7, 2021

Now that I am used to Go, writing C code already feels like juggling chainsaws. I will not touch asm :)

That seems wise

But looking at the xchacha.pdf CPU profile, the Go part runs really fast and does not seem to slow us down (HChaCha20).

Indeed, quite optimal. A pity about the cgo overhead, but still much better than where we started.

BTW, this is how XChaCha20-Poly1305-OpenSSL works: the HChaCha20 function (from Go's golang.org/x/crypto) mixes the key and nonce into a new key for each encryption. What remains is normal ChaCha20-Poly1305, so we can call OpenSSL at this point:

key, nonce -> Go HChaCha20 -> key2, nonce2 -> OpenSSL ChaCha20-Poly1305

Thanks for the refresher 👍🏼 (and also, creative solution 👏🏼 )

rfjakob added a commit that referenced this issue Sep 7, 2021
rfjakob added a commit that referenced this issue Sep 7, 2021
Maybe interesting for people following
#452
@rfjakob
Owner

rfjakob commented Sep 30, 2021

gocryptfs v2.2.0 has been released, this is done.

rfjakob closed this as completed Sep 30, 2021
giraffe2k mentioned this issue Feb 7, 2022