
Big enhancement: Parallel Tasks #6

Merged
merged 49 commits into main from feat/parallel-tasks on Dec 19, 2024

Conversation

Daru-san
Owner

This is something I have wanted to work on for quite some time, and now I aim to make it a reality.

Cloning would result in copying every single image file, which took up a lot of memory very quickly.
This lets us pass the queue between different threads, which is useful for parallel image decoding and, soon, parallel processing.
@Daru-san
Owner Author

It seems that using std::mem::take() lets me move data out much more efficiently, without having to clone images constantly, which massively reduces memory usage.
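A minimal sketch of the `std::mem::take()` trick, with a hypothetical buffer of frame data standing in for the real image type: `take` swaps the value with its `Default` (an empty `Vec` here), so ownership moves to the caller with no copy of the contents.

```rust
use std::mem;

// Take ownership of the buffered frames without cloning them.
// The slot is left as an empty Vec, ready to be refilled.
fn take_frames(slot: &mut Vec<Vec<u8>>) -> Vec<Vec<u8>> {
    mem::take(slot)
}

fn main() {
    let mut slot = vec![vec![0u8; 4], vec![1u8; 4]];
    let frames = take_frames(&mut slot);
    assert_eq!(frames.len(), 2);
    assert!(slot.is_empty()); // the original slot was emptied, not copied
}
```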

This lets me reference it multiple times while decoding different images, fixing the display issue.
This would cause the program to process every image sequentially, as the next thread would wait for the lock while the current image was processed or saved. We fix that by dropping the locks before processing begins.
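The lock-dropping fix can be sketched like this (names hypothetical): take one item while holding the mutex guard, release the guard immediately, and only then do the slow processing, so other threads are not serialized behind the lock.

```rust
use std::sync::Mutex;

// Pop one work item, then explicitly drop the guard so the lock is
// released before any slow processing or saving begins.
fn next_image(queue: &Mutex<Vec<String>>) -> Option<String> {
    let mut guard = queue.lock().unwrap();
    let item = guard.pop();
    drop(guard); // lock released here, before the heavy work
    item
}

fn main() {
    let queue = Mutex::new(vec!["a.png".to_string()]);
    while let Some(img) = next_image(&queue) {
        // processing/saving happens here, with the lock already released
        let _ = img;
    }
    assert!(queue.lock().unwrap().is_empty());
}
```

The explicit `drop(guard)` is what prevents the sequential behavior: without it, the guard would live for as long as the processing does.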
This makes the conversions noticeably faster, since we convert and write to memory, then write to disk all at once.
I need to test a bit more, though.
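The encode-to-memory-then-flush idea, sketched with a placeholder payload instead of a real image encoder:

```rust
use std::io::Write;

// Hypothetical sketch: the encoder writes into an in-memory buffer,
// which is then flushed to disk in a single call.
fn encode_in_memory(payload: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    buf.write_all(payload)?; // an image encoder would write here instead
    Ok(buf)
}

fn main() -> std::io::Result<()> {
    let encoded = encode_in_memory(b"fake-encoded-bytes")?;
    assert_eq!(encoded, b"fake-encoded-bytes");
    // One write for the whole image, e.g.:
    // std::fs::write("out.png", &encoded)?;
    Ok(())
}
```

This trades a little extra memory for fewer, larger disk writes, which is usually the better deal when converting many files.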
@Daru-san
Owner Author

Parallel decoding seems to work really well and brings a lot of improvement.
The issue is that parallel processing, on the other hand, is not as good. As expected, the advantages of parallelism only go so far, so I may make this a hidden feature, or at least implement it for decoding only.

@Daru-san
Owner Author

Using parallel iterators seems to bring pretty sizable improvements to performance, quite a bit more than I expected.
Comparing the same operation, removing the backgrounds from 288 images, run single-threaded and in parallel:

Single threaded

[1/3] Task complete: 288 images were decoded successfully ^.^
[2/3] Task complete: Paths created successfully.
[288/288]   288 images were saved successfully.
[3/3] 3 tasks completed with 0 errors in 81s
######################################## [00:01:21]      

Parallel

[1/3] Task complete: 288 images were decoded successfully ^.^
[2/3] Task complete: Paths created successfully.
[287/287]   287 images were saved successfully.
[3/3] 3 tasks completed with 0 errors in 50s
######################################## [00:00:50]       

From my testing, it seems that using par_iter() is a much better multi-threading implementation than using a thread pool.

Processing large batches of images with the thread pool causes the program to get killed (on my system, by earlyoom), which does not happen when running with parallel iterators.

However, thread pools can be a bit faster, since we can specify the number of threads; we do not need that, though.
@Daru-san
Owner Author

From a bit more testing, it seems that our decode and process speeds see a significant increase thanks to the parallelizing. Sadly, however, writing images to disk is the slowest stage and ends up taking the most time in my tests. This makes sense, because there we are limited by disk speeds.

@Daru-san
Owner Author

However, it seems that the thread pool method was a bit faster in this regard. I assume that is because of the way threads are continually spawned in the pool in a queue-like fashion: each task begins immediately after the previous one, which is pretty useful when reading and writing.

@Daru-san
Owner Author

The problem with the thread pool is that it causes massive CPU utilization, and processing large amounts of images would easily hang the system. On my own machine, the process was forcibly killed by earlyoom because of its CPU thread usage and massive memory usage.

@Daru-san
Owner Author

Through my debugging I have noticed that rayon's parallel iterators let threads sleep for very short periods of time, although I need to research that a bit more.
The looping spawn method in the thread pool left no moments for threads to sleep, which is great for speed but terrible for the rest of the system.

@Daru-san Daru-san merged commit 29a2daf into main Dec 19, 2024
5 checks passed
@Daru-san Daru-san deleted the feat/parallel-tasks branch December 19, 2024 12:52