#1849: Hash each digest algo in separated threads #1850
Conversation
Thank you @aberenguel! Just pushed a small change to use fewer threads when possible, with the same upper bound. I'll run a small test and a medium-size one to see how performance behaves.
Great! Thanks, @lfcnassif! Now I think it is very good!
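The idea discussed above (one task per digest algorithm, on a pool bounded by both the number of algorithms and the available cores) can be sketched roughly as below. This is a hypothetical illustration, not IPED's actual implementation; the class and method names are made up.

```java
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelDigests {

    // Compute several digests of the same data, one algorithm per task.
    // The pool size is min(number of algorithms, available processors),
    // i.e. "fewer threads when possible, with the same upper bound".
    static Map<String, byte[]> digestAll(byte[] data, List<String> algos)
            throws Exception {
        int threads = Math.min(algos.size(),
                Runtime.getRuntime().availableProcessors());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<byte[]>> futures = new LinkedHashMap<>();
            for (String algo : algos) {
                futures.put(algo, pool.submit(() ->
                        MessageDigest.getInstance(algo).digest(data)));
            }
            Map<String, byte[]> result = new LinkedHashMap<>();
            for (Map.Entry<String, Future<byte[]>> e : futures.entrySet()) {
                result.put(e.getKey(), e.getValue().get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "hello world".getBytes("UTF-8");
        Map<String, byte[]> parallel =
                digestAll(data, Arrays.asList("MD5", "SHA-1", "SHA-256"));
        // Sanity check: each parallel digest must equal the sequential one.
        for (Map.Entry<String, byte[]> e : parallel.entrySet()) {
            byte[] expected =
                    MessageDigest.getInstance(e.getKey()).digest(data);
            if (!Arrays.equals(expected, e.getValue())) {
                throw new AssertionError(e.getKey() + " mismatch");
            }
        }
        System.out.println("all digests match");
    }
}
```

In a real pipeline the input would be streamed in chunks and fed to each `MessageDigest` rather than held in one byte array; this sketch only shows the thread-bounding scheme.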
I ran it on a small case (500k items) and didn't see any significant difference.
My small case (250K items) test result was the same as yours, no significant difference. But the medium-size test (2.2M items) result was not good... Total processing time increased from ~6,600s to ~8,600s. The evidence was on the local network, so it could have been affected by other storage or network activity, although the test was run after midnight... I'll copy the evidence locally and repeat the test. Let's see how your test behaves.
Another thought: ideally we should also test this on a commodity computer with less CPU power, since not every user has a machine like ours...
Just finished the test with the 2.2M items evidence (750GB) processed locally. Results seem fine:
Great! I think the bigger difference will show when a big file is being hashed while other workers are idle.
This test evidence is very mixed, with files of many different kinds; there is a 40GB VDI inside it. Of course other cases can benefit much more from this optimization. My concern was about performance loss in some cases, but it seems we shouldn't worry about that.
Hi @aberenguel, did you have time to run this on your larger case? If it seems good, I think we can merge this.
I am processing it again. The previous run was not reliable because I was doing other things on the computer. Now the test is scripted. Tomorrow I'll post the results.
I processed a folder with 1.1M files on my personal computer. I ran it twice with the default profile.
Great! Seems fine to me; the differences look like normal fluctuations. If this is your larger (last) test and no one objects, I'll merge this soon.
I ran it again on a case with 8,354,170 items (560,222 MB).
I'm running the test again one last time.
Another test on an E01 file (corresponding to a 160GB disk), with the forensic profile and OCR enabled. master -> 09h 27m 26s
Great, thank you! Have you had a chance to repeat the previous test that got a bit worse?
With OCR enabled, there is more pressure on the CPU, so I think this is a good scenario to test higher thread concurrency, and the results were fine. @aberenguel, can I merge this, or do you intend to repeat the previous test that got a bit worse?
Given the previous tests, I'm going to merge this. If anyone thinks this could cause a significant performance regression in some processing scenario and has any evidence of that, please let me know.
Thank you @aberenguel!
Implements #1849