Feature Request: encryption primitives for devices without AES cpu instructions #452
Comments
Hi, would you mind running I'd like to add it to our CPU zoo at ( https://github.com/rfjakob/gocryptfs/wiki/CPU-Benchmarks ) |
I've run it on all the different kinds of ARM devices I have: Odroid XU4 (Exynos 5422 - ARM Cortex-A15 - 2 GHz)
Raspberry Pi 3 B rev 1.2 (BCM2835 - ARM Cortex-A53 - 1.2 GHz)
Raspberry Pi B rev 2 (BCM2835 - ARM 11 - 700 MHz)
|
Awesome, thanks! Added to https://github.com/rfjakob/gocryptfs/wiki/CPU-Benchmarks . |
I have added an XChaCha20-Poly1305 benchmark to
HOWEVER, looking at https://github.com/golang/crypto/tree/master/chacha20poly1305 , there only seems to be an optimized assembly version for amd64 (xxx_amd64.s). Could you run EDIT: But there is a chacha_arm64.s here: https://github.com/golang/crypto/tree/master/chacha20 |
I have compiled that branch for Armv7, binary: gocryptfs.xchacha20.armv7.tar.gz |
Thanks for the binary: on the Odroid XU4:
I'll have to try the other ARM devices later. Pity golang has not added asm chacha versions yet, maybe use the same openssl bridge for speed? |
I had the same idea, unfortunately, openssl does not have xchacha20 yet: openssl/openssl#5523 They do have chacha20, but this cannot be used with random nonces (too high risk of collisions) |
that's a shame, could you add an option to also bench chacha20 case? Just to get a sense of the impact of non-asm version, it might be that chacha20 is faster than xchacha20? |
I'm reading a bit, and the size & message restrictions on chacha20 are not that bad right? https://pycryptodome.readthedocs.io/en/latest/src/cipher/chacha20.html |
The table on https://pycryptodome.readthedocs.io/en/latest/src/cipher/chacha20.html is very nice! The problem with ChaCha20: |
The normal one in go (and I think also openssl) is the second row in that table. |
Hi, I previously ported Gocryptfs to use wolfSSL. Does the code below allow the use of a random nonce with https://github.com/wolfSSL/wolfssl/blob/master/wolfcrypt/src/chacha.c#L111 |
@DavyLandman I see, 96 bit nonces, that's less bad. gocryptfs used 96 bit nonces in earlier versions. I moved to 128 bits because 96 bits is too little for very large filesystems, I have the calculations saved in #17 (comment) . And also, https://pkg.go.dev/golang.org/x/crypto/chacha20poly1305 says,
so I'd rather not go with ChaCha20. @lechner Yes it does, but only 96 bits according to the function comment
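The concern about random nonces can be made concrete with the birthday bound. The sketch below is only an approximation under stated assumptions: every written block draws a fresh, uniformly random nonce, and the function name `collision_probability` and the example block count are illustrative, not taken from gocryptfs or from the calculations in #17.

```python
import math


def collision_probability(num_nonces: int, nonce_bits: int) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -num_nonces * (num_nonces - 1) / 2 ** (nonce_bits + 1)
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(exponent)


if __name__ == "__main__":
    # Assumption for illustration: 2^40 block writes under one key
    # (4 PiB worth of 4 KiB blocks, counting every rewrite as a new nonce).
    block_writes = 2 ** 40
    for bits in (96, 128, 192):  # ChaCha20 (IETF), gocryptfs AES-GCM, XChaCha20
        p = collision_probability(block_writes, bits)
        print(f"{bits:3d}-bit random nonce: collision probability ~ {p:.3e}")
```

The point is the relative gap between the nonce sizes rather than the absolute numbers, which depend entirely on the assumed number of block writes.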
|
I was just reading RFC 7539, and it specifically notes that a random nonce is not needed: as long as it is unique, a simple counter is just as secure.
Also discussed on Crypto SE. Assuming 4KiB sectors, you would have to write (2^96 * 4 KiB) bytes before this counter overflows. Which is after 324.518.554 yottabytes. That should be good enough right ? ;) |
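The figure above can be checked with a few lines of integer arithmetic. This is a sketch that assumes decimal yottabytes (10^24 bytes) and one nonce per 4 KiB sector, as in the comment.

```python
# One 96-bit counter value is consumed per 4 KiB sector written.
NONCE_BITS = 96
SECTOR_BYTES = 4 * 1024
YOTTABYTE = 10 ** 24  # decimal (SI) yottabyte

# Total bytes that can be written before the counter wraps around.
total_bytes = 2 ** NONCE_BITS * SECTOR_BYTES
yottabytes = total_bytes / YOTTABYTE

if __name__ == "__main__":
    print(f"bytes before overflow: 2^108 = {total_bytes}")
    print(f"in yottabytes: about {yottabytes:,.0f}")
```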
Was reading SE and per chance a relevant question popped up: https://crypto.stackexchange.com/questions/77982/how-to-generate-a-nonce-for-chacha20-poly1305 |
Using a counter as the nonce would be nice, unfortunately, I don't think we can. There may be multiple gocryptfs processes writing to the folder at the same time (use case: encrypted folder on shared network drive). |
Ah, my bad, I hope something like a cluster offset/index or inode index would suffice.
Must be too simplistic of me.
|
I have added the gocryptfs.xchacha20.armv7 results to https://github.com/rfjakob/gocryptfs/wiki/CPU-Benchmarks . I'm afraid using XChaCha20-Poly1305-Go does not make sense, as it is slower than AES-GCM-256-OpenSSL. We can revisit when openssl gets XChaCha20. |
Actually, on a Raspberry Pi 4 with Ubuntu 64 bit, things look different:
|
I just ran it on my rpi3:
|
Needs a 64 bit gocryptfs to be fast. Go has optimized xchacha assembly for arm64. |
Ah, yes, okay so it's for the zoo then ;) |
With quite some work you could link/cgo these asm versions: https://github.com/floodyberry/chacha-opt/tree/master/app/extensions/chacha |
xu4 (armv7, running on tmpfs)normal:
xchacha:
|
Just tested it on armv7l, Orange Pi One. It is now the fastest of the Go implementations, but what is a bit interesting is that OpenSSL is still the fastest. I've built the binary myself from git sources. BTW, for some reason the compiled binary was named "v2" instead of "gocryptfs", but I haven't figured out why; maybe the old Go version? Anyway, this is somewhat outdated hardware now, so no miracles are expected.
the benchmark won't fit into tmpfs, so it runs from sd card (-xchacha):
|
The old Go compiler may also hurt your performance; can you see if the binary I posted above gives better results? |
I thought the same, so I've now tested using Go 1.15 and the results seem better:
and this is using downloaded binary (go 1.17):
|
@DavyLandman did the "normal" run have OpenSSL support? (the binaries I posted do not). Also, revisiting this: "I just want to bring back a single point, I proposed chacha20-poly1305 for devices that do not have crypto-extensions, so armv8 devices are not part of that bunch."
I read through the benchmarks in this ticket and in the wiki ( https://github.com/rfjakob/gocryptfs/wiki/CPU-Benchmarks ) again, and, unfortunately, 32-bit ARM (armv7) devices don't gain anything with this iteration of xchacha support. On 32-bit ARM, AES-GCM-256-OpenSSL is faster than XChaCha20-Poly1305-Go, because OpenSSL has optimized assembly, and Go does not.
Using -xchacha now makes sense on:
- amd64 (=Intel/AMD 64 bit) CPUs that lack AES acceleration. These are mostly older and low power CPUs.
- arm64 (=ARM 64 bit) CPUs that lack AES acceleration. That's most of them.
On these, however, something else will be faster:
- amd64 with AES acceleration: AES-GCM-256-Go
- armv7: AES-GCM-256-OpenSSL
|
Ah, my bad, indeed these were the binaries you posted. I'm gonna get my armv8 out and try it there as well.
|
So, I just got out my Odroid N2 (which has a very beefy arm64 with crypto extensions):
benchmark:
|
here is the one that came with the distro (and has openssl enabled).
so indeed, for armv7, not an improvement. |
$ ./gocryptfs -speed
gocryptfs v2.1-56-gdb1466f-dirty.stupidchacha; go-fuse v2.1.1-0.20210825171523-3ab5d95a30ae; 2021-09-02 go1.17 linux/amd64
AES-GCM-256-OpenSSL          529.53 MB/s
AES-GCM-256-Go               833.85 MB/s  (selected in auto mode)
AES-SIV-512-Go               155.27 MB/s
XChaCha20-Poly1305-Go        715.33 MB/s  (use via -xchacha flag)
XChaCha20-Poly1305-OpenSSL   468.94 MB/s
#452
Dear armv7 users, I have something brewing in the "stupidchacha" branch. Could somebody build it on armv7:
And then run
? |
Nice one 👏🏼 @rfjakob looks like openssl contains arm optimized xchacha indeed:
(on the odroid xu4) |
Ok, not bad! Thanks! However, the "openssl speed" numbers you posted for the xu4 show 306MB/s for blocksize 1024 (gocryptfs uses 4k blocks, 1k should be comparable). In other words, we lose a factor of 4 somewhere? |
Sorry for the confusion, that was the n2 (armv8 with AES extensions). the xu4 reported this in the benchmark:
(Update: scratch this comment, I'm mixing things up.) |
PS: openssl does not have xchacha. In "XChaCha20-Poly1305-OpenSSL" , the "X" is from the Go crypto library and "ChaCha20-Poly1305" is from openssl. So it's expected to be somewhat slower than straight openssl chacha20-poly1305. |
ah, so you run the first block manually? and then give it over to openssl to continue? |
The 306 MB/s was from #452 (comment) |
ah, true, just ran it again, and indeed.
|
Is this marshalling overhead for cgo? If I remember correctly, there are some very specific ways to use C libraries in Go to avoid memory copying? But my cgo is a bit rusty currently. |
@rfjakob if you make a version on the branch that is just purely piping chacha20-poly1305 from openssl (so removing the X part), we could check what happens there? I'd be happy to compile and run the |
I had some trouble building and running it against openssl 1.1.1d, but once updated to 1.1.1k it went fine, and the speed benefit is clearly visible:
|
If you "git pull" now, you should see double-digit % improvements |
Still no 300MB/s, but quite an improvement indeed. Looking at the commits, it's all about cgo overhead? :( Although these insights might also improve the AES-GCM via OpenSSL performance? In case you are interested:
all on the trusty old xu4 ;_ |
You could also consider either porting the arm specific asm from openssl, or trying to get the golang team to take up the assembly versions of chacha20-poly1305? Here is the source: https://github.com/openssl/openssl/blob/master/crypto/chacha/asm/chacha-armv4.pl interestingly it works for armv4+. |
Yes, it's mostly C call overhead ( https://www.cockroachlabs.com/blog/the-cost-and-complexity-of-cgo/ ). And the improvement is to call only once into C and do all needed openssl calls there ( b3e5ed8 ). Yes, AES-GCM sees an improvement as well ( commit 275ebc1 ): I managed to get a 32-bit ARM docker container running on my rpi4, branch stupidchacha (currently at edf9d4c):
Current master without the changes:
I'll also attach the cpu profiles for later reference.
Now that I am used to Go, writing C code already feels like juggling chainsaws. I will not touch asm :) BTW how XChaCha20-Poly1305-OpenSSL works is this: The HChaCha20 function (from the Go x/crypto library) mixes key and nonce to get a new key for each encryption. What remains is normal ChaCha20-Poly1305, so we can call OpenSSL at this point:
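The key-derivation step described above can be sketched in pure Python. This is not the gocryptfs code (which is Go plus a cgo call into OpenSSL); it is a minimal illustration of the XChaCha construction as specified in the IETF XChaCha draft, and the names `hchacha20` and `derive_xchacha_inputs` are hypothetical helpers chosen for this example.

```python
import struct

MASK32 = 0xFFFFFFFF


def _rotl32(value: int, count: int) -> int:
    """Rotate a 32-bit word left by count bits."""
    return ((value << count) & MASK32) | (value >> (32 - count))


def _quarter_round(s: list, a: int, b: int, c: int, d: int) -> None:
    """ChaCha quarter round, applied in place to state s."""
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = _rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = _rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = _rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = _rotl32(s[b] ^ s[c], 7)


def hchacha20(key: bytes, nonce16: bytes) -> bytes:
    """Derive a 32-byte subkey from a 32-byte key and a 16-byte nonce prefix."""
    if len(key) != 32 or len(nonce16) != 16:
        raise ValueError("need a 32-byte key and a 16-byte nonce prefix")
    state = list(struct.unpack("<4I", b"expand 32-byte k"))
    state += struct.unpack("<8I", key)
    state += struct.unpack("<4I", nonce16)
    for _ in range(10):  # 10 double rounds = 20 rounds
        # Column rounds
        _quarter_round(state, 0, 4, 8, 12)
        _quarter_round(state, 1, 5, 9, 13)
        _quarter_round(state, 2, 6, 10, 14)
        _quarter_round(state, 3, 7, 11, 15)
        # Diagonal rounds
        _quarter_round(state, 0, 5, 10, 15)
        _quarter_round(state, 1, 6, 11, 12)
        _quarter_round(state, 2, 7, 8, 13)
        _quarter_round(state, 3, 4, 9, 14)
    # Unlike the ChaCha20 block function, the initial state is not added back;
    # the subkey is the first and last rows of the final state.
    return struct.pack("<8I", *(state[0:4] + state[12:16]))


def derive_xchacha_inputs(key: bytes, nonce24: bytes):
    """Split a 24-byte XChaCha nonce into (subkey, 12-byte ChaCha20 nonce)."""
    if len(nonce24) != 24:
        raise ValueError("XChaCha20 uses a 24-byte nonce")
    subkey = hchacha20(key, nonce24[:16])
    chacha_nonce = b"\x00" * 4 + nonce24[16:]
    return subkey, chacha_nonce
```

The returned pair is what a plain ChaCha20-Poly1305 implementation with a 96-bit nonce would then be called with, which is the point where a library such as OpenSSL can take over.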
|
That seems wise
Indeed, quite optimal. A pity about the overhead for cgo, but still, much better than where we started.
Thanks for the refresher 👍🏼 (and also, creative solution 👏🏼 ) |
Maybe interesting for people following #452
gocryptfs v2.2.0 has been released, this is done. |
Hi @rfjakob,
Thank you for this great application! The reverse mode is what really sets it apart from other options.
I checked the issues, and it doesn't seem to be discussed yet, but what do you think about adding support for a different collection of encryption primitives that are better suited for more low-end devices?
I'm running gocryptfs on a few ARMv6/7 based NAS machines, they are nice: low energy, and quite fast. But they lack native AES instructions, my fastest ARM device (Odroid XU4) maxes out at 40MB/s, while for example the raspberry-pi's and friends are quite a bit slower (rpi1 is at 15MB/s).
Maybe Google Adiantum (also added to linux kernel 5.0 for cryptfs) is a nice fit. Adiantum is based on XChaCha12 and Poly1305 and is roughly 5 times quicker than AES-XTS for devices without AES instructions.
For the reverse mode maybe something based on ChaCha20Poly1305?
Just for comparison, on my Odroid XU4, ChaCha20Poly1305 runs at 320MB/s, on my RPi1 it gets close to 40MB/s.
So I'm just wondering what your view is on this topic.
Cheers,
Davy