Optimize HashTask using parallelism #1849
Comments
Very nice, thank you!!! But have you tested it with real cases, with files of different sizes, to see whether the times of other tasks stay the same (or increase) and whether the overall time decreases? Maybe the extra threads could compete for CPU with other tasks...
PS1: On #1834 I tested an approach that I think is similar but more general, and unfortunately the overall results weren't good...
PS2: Maybe a similar approach could be applied to EntropyTask. I have seen it spend a lot of time on big files, but that is another improvement...
So far I've only tested using a folder with a single file as evidence. I'll test on some real images I have here.
I noticed that my personal computer's CPU (AMD Ryzen 7 5700G) has instructions for SHA algorithms (https://en.wikipedia.org/wiki/Intel_SHA_extensions). I know that openssl uses such instructions when they are supported, so I hashed a 10G file. The performance gain is huge:
$ pv file-10G.bin | openssl dgst -sha256
9,31GiB 0:00:06 [1,38GiB/s] [=====================================================>] 100%
SHA2-256(stdin)= 792cd8bfd20e81a5c01648d1e45fcc301b255ff94a4865d9183440944fde0364
Compared with the classical sha256sum:
$ pv file-10G.bin | sha256sum
9,31GiB 0:00:27 [ 342MiB/s] [=====================================================>] 100%
792cd8bfd20e81a5c01648d1e45fcc301b255ff94a4865d9183440944fde0364 -
I looked for a JCE Provider that supports such instructions and found this one: So I measured HashTask processing the same file. Using the default Provider:
Using
|
I think the use |
Excellent! But it seems AmazonCorrettoCryptoProvider unfortunately supports only Linux and macOS... We try to use only native dependencies that support at least Windows and Linux. Maybe Linux users could add it to the plugins folder if they want?
There is an issue for that: corretto/amazon-corretto-crypto-provider#48
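For illustration only, the opt-in idea above (use the accelerated provider when a user has added it, otherwise keep the default) could be sketched as below. This is not code from the project: the class `DigestProviderSketch` and the method `newDigest` are hypothetical names, and the provider name string is an assumption about how that provider registers itself. The lookup by name only succeeds if the provider has already been registered with `java.security.Security`; otherwise the sketch falls back to the JDK default.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.NoSuchProviderException;

public class DigestProviderSketch {

    // Assumption: the name under which the optional provider is registered.
    static final String PREFERRED_PROVIDER = "AmazonCorrettoCryptoProvider";

    /**
     * Returns a digest from the preferred provider when it is registered and
     * offers the algorithm; otherwise falls back to the default provider.
     */
    public static MessageDigest newDigest(String algorithm, String preferredProvider)
            throws NoSuchAlgorithmException {
        try {
            return MessageDigest.getInstance(algorithm, preferredProvider);
        } catch (NoSuchProviderException | NoSuchAlgorithmException e) {
            // Provider missing (e.g. on Windows) or algorithm not offered by it.
            return MessageDigest.getInstance(algorithm);
        }
    }

    public static void main(String[] args) throws Exception {
        MessageDigest md = newDigest("SHA-256", PREFERRED_PROVIDER);
        System.out.println("Using provider: " + md.getProvider().getName());
    }
}
```

With this shape, platforms without the native provider keep working unchanged, and the faster path is used only where someone has installed and registered it.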
I was processing a case (using version 4.1.4) that contained a big file (~68G) and I noticed that HashTask was taking a lot of time.
Analyzing the code, I noticed that hashes could be parallelized.
I implemented it and did some experiments.
¹ Reads the buffer and then hashes it, in sequence.
² Reads the next buffer while other threads are hashing.
One counterpoint is that with this optimization HashTask uses more threads than it is supposed to use (just one).
I'll submit the code and reference in this issue.
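As a rough sketch of the second variant described above (reading the next buffer while other threads hash the previous one), something like the following could work. It is not the submitted code: the class `ParallelHashSketch`, the method `hashAll`, the 1 MiB buffer size and the one-thread-per-algorithm pool are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelHashSketch {

    /** Hashes one stream with several algorithms, updating the digests in parallel. */
    public static Map<String, byte[]> hashAll(InputStream in, List<String> algorithms)
            throws IOException, NoSuchAlgorithmException, InterruptedException, ExecutionException {
        List<MessageDigest> digests = new ArrayList<>();
        for (String alg : algorithms) {
            digests.add(MessageDigest.getInstance(alg));
        }
        // Assumption: one worker thread per algorithm.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, digests.size()));
        try {
            // Two buffers: workers hash one while this thread fills the other.
            byte[][] buffers = { new byte[1 << 20], new byte[1 << 20] };
            List<Future<?>> pending = new ArrayList<>();
            int current = 0;
            int len;
            while ((len = in.read(buffers[current])) != -1) {
                // Wait for the previous chunk so each digest sees chunks in order
                // and the other buffer is free to be overwritten by the next read.
                for (Future<?> f : pending) {
                    f.get();
                }
                pending.clear();
                final byte[] chunk = buffers[current];
                final int chunkLen = len;
                for (MessageDigest md : digests) {
                    pending.add(pool.submit(() -> md.update(chunk, 0, chunkLen)));
                }
                current = 1 - current; // next read goes to the other buffer
            }
            for (Future<?> f : pending) {
                f.get();
            }
            Map<String, byte[]> result = new LinkedHashMap<>();
            for (int i = 0; i < digests.size(); i++) {
                result.put(algorithms.get(i), digests.get(i).digest());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

The counterpoint raised above shows up directly here: the pool adds worker threads on top of the thread that calls `hashAll`, so the task no longer stays within a single thread.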