Add BLAKE2b ID hash functions #1800

Closed
enkore opened this issue Nov 3, 2016 · 14 comments

@enkore
Contributor

enkore commented Nov 3, 2016

  • New key types which use BLAKE2b-256 for ID hash (and MAC?).
  • authenticated: no encryption
  • repokey|keyfile-blake2: the usual encryption.

From #1044
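As a rough illustration of the two ID hash flavours discussed in this issue, here is a minimal sketch using only Python's standard library. The function names are made up for this example, and the BLAKE2b variant uses `hashlib`'s built-in keyed mode, which is an assumption and may differ from how the eventual implementation keys the hash.

```python
import hashlib
import hmac
import os


def id_hash_hmac_sha256(id_key: bytes, chunk: bytes) -> bytes:
    """Keyed 256-bit chunk ID based on HMAC-SHA256."""
    return hmac.new(id_key, chunk, hashlib.sha256).digest()


def id_hash_blake2b_256(id_key: bytes, chunk: bytes) -> bytes:
    """Keyed 256-bit chunk ID based on BLAKE2b (hypothetical keying scheme).

    hashlib's keyed BLAKE2b accepts keys of at most 64 bytes.
    """
    return hashlib.blake2b(chunk, key=id_key, digest_size=32).digest()


id_key = os.urandom(32)              # stand-in for the id key from a keyfile/repokey
chunk = b"example chunk data" * 100  # stand-in for one chunk produced by the chunker

old_id = id_hash_hmac_sha256(id_key, chunk)
new_id = id_hash_blake2b_256(id_key, chunk)
print(old_id.hex())
print(new_id.hex())
```

Both functions return 32-byte IDs, so the ID length stays the same in this sketch; only the values differ.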


💰 there is a bounty for this

@enkore enkore added this to the 1.1 - near future goals milestone Nov 3, 2016
@enkore enkore mentioned this issue Nov 3, 2016
7 tasks
@ThomasWaldmann
Member

For the encryption=none mode and to avoid having to do big changes in little time, I think we should just use the unauthenticated hash for 1.1 - otherwise we'll need a keyfile/repokey for that mode also (to store the id-mac key).

In general, doing big changes would create lots of conflicts in the crypto-aead pull request; I'd like to avoid that.

So, is it maybe easier to read the desired hash / mac type from repo config? And just create new repos with configured blake2b hash / mac, while defaulting to sha256 / hmac-sha256?
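The config-driven idea could look roughly like the sketch below. The `id_hash` option name and the lookup table are hypothetical and exist only for illustration; nothing in this thread says the repo config has such an option. Repos without the option fall back to hmac-sha256.

```python
import configparser
import hashlib
import hmac

# Hypothetical registry of selectable ID hash functions.
ID_HASHES = {
    "hmac-sha256": lambda key, data: hmac.new(key, data, hashlib.sha256).digest(),
    "blake2b-256": lambda key, data: hashlib.blake2b(data, key=key, digest_size=32).digest(),
}
DEFAULT_ID_HASH = "hmac-sha256"


def id_hash_from_config(config_text: str):
    """Return (name, function) for the ID hash named in a repo config.

    Reads the assumed option `id_hash` from the [repository] section and
    falls back to the default when the option is absent.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    name = parser.get("repository", "id_hash", fallback=DEFAULT_ID_HASH)
    if name not in ID_HASHES:
        raise ValueError(f"unsupported id_hash: {name!r}")
    return name, ID_HASHES[name]


old_repo = "[repository]\nversion = 1\n"
new_repo = "[repository]\nversion = 1\nid_hash = blake2b-256\n"
print(id_hash_from_config(old_repo)[0])
print(id_hash_from_config(new_repo)[0])
```

The trade-off of this design is that the hash choice lives in the repository rather than in the key type, which is a different route from the new key types proposed at the top of the issue.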

@enkore
Contributor Author

enkore commented Nov 4, 2016

Portable vs native/sse doesn't make that big of a difference:

[Image: benchmark plot, cycles vs. message length in bytes]

3.7 CPB for the sse version, and 3.9 CPB for the ref version at 1.5 KB message length.

gcc version 6.2.1 20160830 (GCC)

But maybe I did something wrong here. So the diff on BLAKE2 for that:

diff --git a/bench/do.gplot b/bench/do.gplot
index 6a6832a..558d5da 100644
--- a/bench/do.gplot
+++ b/bench/do.gplot
@@ -1,8 +1,8 @@
-maxx = 256 
+maxx = 512
 set xrange [1:maxx]
 set xlabel "bytes "
 set ylabel "cycles"
-set xtics 0,32,maxx
+set xtics 0,64,maxx
 set grid
 set key left

@@ -12,8 +12,10 @@ set terminal pdfcairo
 set output "plotcycles.pdf"  

 plot    "blake2b.data" using 1:2 with lines title "BLAKE2b" 
+replot    "blake2b-ref.data" using 1:2 with lines title "BLAKE2b-ref"
 replot  "blake2s.data" using 1:2 with lines title "BLAKE2s"
 replot  "md5.data" using 1:2 with lines title "MD5"
+replot  "sha256.data" using 1:2 with lines title "SHA256"

 set output "plotcycles.pdf"  
-replot
\ No newline at end of file
+replot
diff --git a/bench/makefile b/bench/makefile
index b637c82..e06c067 100644
--- a/bench/makefile
+++ b/bench/makefile
@@ -7,14 +7,18 @@ all: bench

 bench: bench.c
        $(CC) $(FILES) $(CFLAGS) ../sse/blake2b.c -o blake2b
+       $(CC) $(FILES) $(CFLAGS) ../ref/blake2b-ref.c -o blake2b-ref
        $(CC) $(FILES) $(CFLAGS) ../sse/blake2s.c -o blake2s
        $(CC) $(FILES) $(CFLAGS) md5.c -o md5  -lcrypto -lz
+       $(CC) $(FILES) $(CFLAGS) sha256.c -o sha256  -lcrypto -lz

 plot: bench
        ./blake2b > blake2b.data
+       ./blake2b-ref > blake2b-ref.data
        ./blake2s > blake2s.data
        ./md5 > md5.data
+       ./sha256 > sha256.data
        gnuplot do.gplot

 clean:
-       rm -f blake2b blake2s md5 plotcycles.pdf blake2b.data blake2s.data md5.data
+       rm -f blake2b blake2s md5 plotcycles.pdf blake2b.data blake2b-ref.data blake2s.data md5.data sha256.data
diff --git a/bench/sha256.c b/bench/sha256.c
new file mode 100644
index 0000000..aca605b
--- /dev/null
+++ b/bench/sha256.c
@@ -0,0 +1,22 @@
+/*
+   BLAKE2 reference source code package - benchmark tool
+  
+   Copyright 2012, Samuel Neves <[email protected]>.  You may use this under the
+   terms of the CC0, the OpenSSL Licence, or the Apache Public License 2.0, at
+   your option.  The terms of these licenses can be found at:
+  
+   - CC0 1.0 Universal : http://creativecommons.org/publicdomain/zero/1.0
+   - OpenSSL license   : https://www.openssl.org/source/license.html
+   - Apache 2.0        : http://www.apache.org/licenses/LICENSE-2.0
+  
+   More information about the BLAKE2 hash function can be found at
+   https://blake2.net.
+*/
+#include <stddef.h>
+#include <openssl/sha.h>
+
+int crypto_hash( unsigned char *out, const unsigned char *in, unsigned long long inlen )
+{
+  SHA256( in, inlen, out );
+  return 0;
+}
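For readers who want a quick sanity check without building the BLAKE2 bench tool, a rough analogue of the comparison above can be sketched with Python's `hashlib`. This measures wall-clock nanoseconds per byte, including interpreter overhead, and not CPU cycles, so the numbers are not comparable to the CPB figures quoted earlier. The function name and the round count are arbitrary choices for this example.

```python
import hashlib
import time


def ns_per_byte(hash_name: str, msg_len: int, rounds: int = 2000) -> float:
    """Average wall-clock nanoseconds per hashed byte for one message length."""
    data = bytes(msg_len)                     # all-zero message of msg_len bytes
    constructor = getattr(hashlib, hash_name)
    start = time.perf_counter_ns()
    for _ in range(rounds):
        constructor(data).digest()
    elapsed = time.perf_counter_ns() - start
    return elapsed / (rounds * msg_len)


if __name__ == "__main__":
    # 1536 bytes matches the 1.5 KB message length mentioned above.
    for name in ("blake2b", "blake2s", "sha256"):
        print(f"{name:8s} {ns_per_byte(name, 1536):.3f} ns/byte")
```

As with the original benchmark, results are only meaningful on an otherwise idle machine.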

@enkore enkore closed this as completed Nov 4, 2016
@enkore enkore reopened this Nov 4, 2016
@enkore enkore self-assigned this Nov 4, 2016
@enkore
Contributor Author

enkore commented Nov 4, 2016

Maybe that's the reason why OpenSSL also only uses the C implementation and not the SSE implementation; little performance difference, but more maintenance: https://github.com/openssl/openssl/tree/master/crypto/blake2

Although I remember there being another implementation, coded directly in (IIRC SSE) assembly, which was significantly faster. It also seemed to be abandoned or little used.

@ThomasWaldmann
Member

Can you do a plot without the tiny msg lengths (they spoil the scaling and we rarely have such tiny msgs)?
Maybe 512B .. 512kB?

@enkore
Contributor Author

enkore commented Nov 4, 2016

Beyond ~2 kB it doesn't change any more - the O(n) part takes over the constant setup time.

[Image: benchmark plot for longer message lengths]

Note: to get such "smooth" results one has to kill or halt all resource-intensive processes (e.g. Firefox) and run at high priority; otherwise there will be spikes, especially at the longer message lengths, due to scheduling. This is expected.
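The flattening beyond roughly 2 kB is what a simple cost model with a fixed setup cost plus a linear per-byte cost would predict. A minimal sketch follows; the two constants are made-up placeholders chosen only to show the shape of the curve and are not measurements from the plots.

```python
def cycles_per_byte(n_bytes: int, setup_cycles: float, per_byte_cycles: float) -> float:
    """CPB under the model total_cycles(n) = setup_cycles + per_byte_cycles * n."""
    return per_byte_cycles + setup_cycles / n_bytes


# Hypothetical constants, for illustration only.
SETUP_CYCLES = 300.0
PER_BYTE_CYCLES = 3.5

for n in (64, 512, 2048, 512 * 1024):
    cpb = cycles_per_byte(n, SETUP_CYCLES, PER_BYTE_CYCLES)
    print(f"{n:7d} bytes: {cpb:.3f} cycles/byte")
```

The `setup_cycles / n_bytes` term shrinks as the message grows, so for long messages the CPB approaches the per-byte constant.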

@enkore
Contributor Author

enkore commented Nov 7, 2016

Mostly done. I'll conduct some additional tests and will submit a PR.

But: it definitely improves performance when deduplicating. With repokey I get ~245 MB/s deduplication rate (=chunker + ID hash). With repokey-blake2 it's 325 MB/s, almost one third faster. When writing fresh data the impact will be less, of course (because I/O, encryption, MAC take their share).
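The "almost one third faster" figure follows directly from the two rates quoted above; a short worked calculation, with the per-gigabyte times added as a derived illustration (1 GB is taken as 1000 MB here):

```python
repokey_rate = 245.0         # MB/s, deduplication rate with repokey (from the comment)
repokey_blake2_rate = 325.0  # MB/s, deduplication rate with repokey-blake2

speedup = repokey_blake2_rate / repokey_rate
percent_faster = (speedup - 1.0) * 100.0

# Time to deduplicate 1 GB at each rate.
seconds_per_gb_old = 1000.0 / repokey_rate
seconds_per_gb_new = 1000.0 / repokey_blake2_rate

print(f"speedup: {speedup:.3f}x ({percent_faster:.1f}% faster)")
print(f"1 GB: {seconds_per_gb_old:.2f} s -> {seconds_per_gb_new:.2f} s")
```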

I added the authenticated mode (repokey-like) - it's very little additional code. We already know most of it from PlaintextKey.

To keep the diff minimal I decided to not change the MAC function to B2 as well (but this doesn't really matter a lot due to #1034 and #1044).

#1034 does not apply cleanly, but all conflicts are easy (import lines, and git being cautious about places where both branches added hunks).

@ThomasWaldmann
Member

@enkore claim the bounty.

@rugk
Contributor

rugk commented Apr 18, 2017

So is there any way to convert existing repos to use blake2 after updating borg?

@ThomasWaldmann
Member

@rugk no, there is no converter yet and also no coexistence of different id-hashes.
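To illustrate why mixing ID hashes in one repo is a problem for deduplication, here is a toy sketch. The in-memory set stands in for a chunk index, and the function names and keying are assumptions made for this example, not the project's actual code.

```python
import hashlib
import hmac


def sha256_id(key: bytes, chunk: bytes) -> bytes:
    return hmac.new(key, chunk, hashlib.sha256).digest()


def blake2b_id(key: bytes, chunk: bytes) -> bytes:
    return hashlib.blake2b(chunk, key=key, digest_size=32).digest()


key = b"\x01" * 32                 # fixed demo key
chunk = b"unchanged file contents"

# Toy chunk index: IDs of chunks stored by earlier backups.
index = {sha256_id(key, chunk)}

found_with_same_hash = sha256_id(key, chunk) in index    # the chunk is recognised
found_with_other_hash = blake2b_id(key, chunk) in index  # the same data looks new

print(found_with_same_hash, found_with_other_hash)
```

In this toy model, switching the hash makes every existing chunk look new, so a converter would have to read and re-hash all stored data.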

@rugk
Contributor

rugk commented Apr 18, 2017

Mhh, that's bad. Is there any plan to add it or should I open a new issue for that?

@ThomasWaldmann
Member

not sure we need one. for existing systems and repos of them, the daily backup just deals with the changes since last backup, which is usually not much. blake2b gives some speedup if there is a lot to hash, especially in the first backup or if there are new/changed big files.

in 1.2, we'll add some more flexibility for the crypto stuff, maybe we can deal better with this by then.

@rugk
Contributor

rugk commented Apr 18, 2017

I think updating to newer crypto is often desired by some people… And if it just improves the speed of future backups a bit…

@ThomasWaldmann
Member

Sure, we moved the blake2b change from 1.2 to 1.1 to make it available earlier than 1.2.
But we need some other stuff that will come in 1.2 for better flexibility...

@enkore
Contributor Author

enkore commented Apr 18, 2017

There will be no further crypto changes for 1.1.

There are no plans for a converter for either 1.1 or 1.2.
