
#1849: Hash each digest algo in separated threads #1850

Merged · 5 commits · Oct 11, 2023

Conversation

aberenguel (Contributor)

Implements #1849
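For readers skimming the thread, the core idea of #1849, computing each enabled digest algorithm in its own thread instead of sequentially, can be sketched roughly as below. This is a hypothetical illustration in Python (IPED itself is Java); `hash_parallel` and the algorithm set are assumptions for the example, not the PR's actual code.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Example set of algorithms; IPED's enabled set is configurable.
ALGORITHMS = ["md5", "sha1", "sha256"]

def hash_parallel(data: bytes) -> dict[str, str]:
    """Compute all digests of the same buffer, one thread per algorithm."""
    def digest(algo: str) -> tuple[str, str]:
        h = hashlib.new(algo)
        h.update(data)
        return algo, h.hexdigest()

    # All threads read the same immutable buffer, so no locking is needed.
    with ThreadPoolExecutor(max_workers=len(ALGORITHMS)) as pool:
        return dict(pool.map(digest, ALGORITHMS))
```

The win comes when a single large file would otherwise keep one worker busy computing several digests back to back.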

@lfcnassif lfcnassif linked an issue Aug 30, 2023 that may be closed by this pull request
@lfcnassif (Member)

Thank you @aberenguel! Just pushed a small change to use fewer threads when possible, with the same upper bound. I'll run a small test and a medium-size one to see how performance behaves.
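The tweak described above ("fewer threads when possible, with the same upper bound") amounts to sizing the pool by the number of enabled algorithms, capped at a fixed limit. A minimal sketch, where `pool_size` and the value of `MAX_HASH_THREADS` are assumptions, not the PR's actual constants:

```python
# Assumed upper bound for illustration only; the PR's real cap is not stated here.
MAX_HASH_THREADS = 4

def pool_size(num_algorithms: int) -> int:
    # Use fewer threads when few algorithms are enabled,
    # but never exceed the upper bound.
    return min(num_algorithms, MAX_HASH_THREADS)
```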

@aberenguel (Contributor, Author)

Great! Thanks, @lfcnassif !
I had misunderstood how Tasks are instantiated and finished. I thought they were instantiated per item, but I've seen that they are instantiated per worker.

Now I think it is very good!

@aberenguel (Contributor, Author)

I ran it on a small case (500k items) and haven't seen any significant difference.
Now I'll try a bigger one.

@lfcnassif (Member) commented Aug 31, 2023

My small case (250K items) test result was the same as yours, no significant difference.

But the medium-size test (2.2M items) result was not good... Total processing time increased from ~6,600s to ~8,600s. The evidence was on the local network, so the test could have been affected by other storage or network activity, although it was run after midnight... I'll copy the evidence locally and repeat the test.

Let's see how your test behaves.

@lfcnassif (Member) commented Aug 31, 2023

Another thought, ideally we should also test this on a commodity computer with less CPU power, not every user has a machine like ours...

@lfcnassif (Member)

Just finished the test with the 2.2M items evidence (750GB) processed locally. Results seem fine:

  • HashTask time decreased from 635s to 545s
  • Total processing time decreased just a bit from 6652s to 6635s.

@aberenguel (Contributor, Author)

Great! I think the biggest difference will appear when a big file is being hashed while other workers are idle.

@lfcnassif (Member)

This test evidence is very mixed, with files of different kinds. There is a 40GB VDI inside it. Of course other cases can benefit much more from this optimization. My concern was about performance loss in some cases, but it seems we shouldn't worry about it.

@lfcnassif (Member)

Hi @aberenguel, did you have time to run this on your larger case? If it seems good, I think we can merge this.

@aberenguel (Contributor, Author)

> Hi @aberenguel, did you have time to run this on your larger case? If it seems good, I think we can merge this.

I am processing it again. The previous run was not reliable because I was doing other things on the computer. Now the test is scripted. I'll post the results tomorrow.

@aberenguel (Contributor, Author)

On my personal computer I processed a folder with 1.1M files, running the default profile twice.

| Profile | master (f6eae92) [seconds] | hashparallel (merged) [seconds] |
| --- | --- | --- |
| default | 1697 / 1742 | 1716 / 1810 |
| forensic | 2707 | 2678 |

@lfcnassif (Member)

Great! Seems fine to me; the differences look like normal fluctuations. If that was your larger (last) test and no one objects, I'll merge this soon.

@aberenguel (Contributor, Author) commented Sep 11, 2023

I ran it again on a case with 8,354,170 items (560,222 MB).
It seems it got worse.

| master | hashparallel |
| --- | --- |
| 03:45:55 | 04:00:47 |

I'm running the test one last time.

@aberenguel (Contributor, Author)

Another test on an E01 file (a 160GB disk image), with the forensic profile and OCR enabled.

master -> 09h 27m 26s
hash_parallel -> 09h 25m 16s

@lfcnassif (Member)

Great, thank you! Have you had a chance to repeat the previous test that got a bit worse?

@lfcnassif (Member)

> Another test on an E01 file (a 160GB disk image), with the forensic profile and OCR enabled.
>
> master -> 09h 27m 26s; hash_parallel -> 09h 25m 16s

With OCR enabled there is more pressure on the CPU, so I think this is a good scenario to test the extra thread concurrency, and the results were fine. @aberenguel, can I merge this, or do you intend to repeat the previous test that got a bit worse?

@lfcnassif (Member)

Given the previous tests, I'm going to merge this. If anyone thinks this can cause a significant performance regression in some processing scenario and has any evidence of that, please let me know.

@lfcnassif (Member) left a comment

Thank you @aberenguel!

@lfcnassif lfcnassif merged commit 883b72a into sepinf-inc:master Oct 11, 2023
@aberenguel aberenguel deleted the hash_parallel branch October 4, 2024 21:52
Linked issue: Optimize HashTask using parallelism (#1849)