
feat: add concurrent working threads option to CLI args #760

Merged

Conversation

@ttys3 (Contributor) commented Nov 30, 2024

add concurrent working threads option to CLI args
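
For context, a minimal sketch of what such a flag could look like with clap's derive API. The flag name, short alias, and default value here are assumptions for illustration, not necessarily what this PR merged.

```rust
// Hedged sketch only: flag name, short alias, and default are assumptions,
// not the PR's actual code.
use clap::Parser;

#[derive(Parser, Debug)]
struct CliArgs {
    /// Number of concurrent working threads; 0 lets rayon pick (all cores).
    #[arg(short = 'c', long = "threads", default_value_t = 0)]
    threads: usize,
}

fn main() {
    let args = CliArgs::parse();
    println!("requested worker threads: {}", args.threads);
}
```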

@ttys3 ttys3 force-pushed the feature/add-concurrent-working-threads-option branch from 639ffbe to 221084b Compare November 30, 2024 17:48
@ttys3 (Contributor, author) commented Nov 30, 2024

Closed due to ThreadPoolBuildError { kind: GlobalPoolAlreadyInitialized }.

spawn_logger_thread() is called first and it uses rayon, so the global pool is already initialized there before we try to configure it.
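
For reference, a minimal reproduction sketch of how this error arises (an assumption for illustration, not ouch's actual code): any rayon API call lazily initializes the global pool, after which a later `build_global()` fails.

```rust
// Minimal reproduction sketch: rayon::spawn lazily creates the global pool,
// so a later attempt to configure it with build_global() fails.
fn main() {
    // Something like spawn_logger_thread() touching rayon first...
    rayon::spawn(|| eprintln!("logger thread started"));

    // ...makes this later attempt to size the global pool fail with
    // ThreadPoolBuildError { kind: GlobalPoolAlreadyInitialized }.
    let result = rayon::ThreadPoolBuilder::new()
        .num_threads(1)
        .build_global();
    println!("{:?}", result);
}
```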

@ttys3 ttys3 closed this Nov 30, 2024

@marcospb19 (Member) commented:

You could call parse_and_validate_args, build the threadpool, and then call spawn_logger_thread, as parse_and_validate_args doesn't require the logger thread to be alive yet.
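
A minimal sketch of that ordering; the two function bodies below are stubs with assumed signatures, standing in for ouch's real ones.

```rust
// Sketch of the suggested ordering; stubbed functions, assumed signatures.
struct Args {
    threads: usize,
}

fn parse_and_validate_args() -> Args {
    // Stub standing in for ouch's real argument parsing; no logger needed yet.
    Args { threads: 1 }
}

fn spawn_logger_thread() {
    // Stub standing in for ouch's real logger, which may use rayon internally.
    rayon::spawn(|| eprintln!("logger running"));
}

fn main() {
    // 1. Parse and validate CLI args before anything touches rayon.
    let args = parse_and_validate_args();

    // 2. Configure the global rayon pool with the requested thread count.
    rayon::ThreadPoolBuilder::new()
        .num_threads(args.threads)
        .build_global()
        .expect("no rayon use should have happened before this point");

    // 3. Only now start the logger thread.
    spawn_logger_thread();
}
```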

@ttys3 (Contributor, author) commented Dec 2, 2024

> You could call parse_and_validate_args, build the threadpool, and then call spawn_logger_thread, as parse_and_validate_args doesn't require the logger thread to be alive yet.

Yes, that's a workaround.

But the problem is that I did not expect the logger and the business code to share the same global rayon thread pool.

When I set the threads option (the one I want to add) to 1, what I mean is: decompress the files one by one, to reduce the disk I/O impact.

But since the logger may require more concurrent threads, I'm unsure if this will cause problems for it. Is it okay for us to do this?

@ttys3 (Contributor, author) commented Dec 2, 2024

Maybe we could use async? That way, the async logger wouldn't have to depend on the rayon crate to achieve that.

@marcospb19 (Member) commented:

> But since the logger may require more concurrent threads, I'm unsure if this will cause problems for it. Is it okay for us to do this?

Oh, I see your concern, I overlooked the case where parallelism is set to 1.

The solution is then to use thread::spawn instead of rayon::spawn in the logger thread, so we limit rayon threads just to the decompression.

Setting up an async logger wouldn't help because we'd need at least one thread in the async runtime's thread pool, and that would be equivalent to (but more complicated than) just having a separate thread::spawn thread for it.
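
A minimal sketch of that approach, with an assumed message type and channel setup: the logger lives on a plain std thread, so rayon (even when limited to one worker) is reserved for decompression.

```rust
use std::sync::mpsc;
use std::thread;

// Dedicated OS thread for logging, independent of rayon's global pool.
// The String message type and channel-based design are assumptions.
fn spawn_logger_thread() -> (mpsc::Sender<String>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<String>();
    let handle = thread::spawn(move || {
        // Drain messages until every sender has been dropped.
        for msg in rx {
            eprintln!("{}", msg);
        }
    });
    (tx, handle)
}

fn main() {
    let (log, logger) = spawn_logger_thread();

    // rayon can now be limited to a single worker for decompression
    // without starving the logger.
    rayon::ThreadPoolBuilder::new()
        .num_threads(1)
        .build_global()
        .unwrap();

    log.send("decompressing files one by one".into()).unwrap();

    drop(log); // close the channel so the logger loop can finish
    logger.join().unwrap();
}
```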

@ttys3 ttys3 reopened this Dec 2, 2024
@ttys3 (Contributor, author) commented Dec 2, 2024

Oops, seems that the UI tests failed

@marcospb19 (Member) commented:

(screenshot attached)

It changed what the --help output looks like, and we snapshot that in the UI tests, so you need to run cargo insta test and cargo insta review, or update those snapshots manually until they match.

I know this is kinda annoying, but it makes life easier in the long term.

@ttys3 (Contributor, author) commented Dec 5, 2024

UI tests fixed

@ttys3 ttys3 force-pushed the feature/add-concurrent-working-threads-option branch from b50674b to d7ebd27 Compare December 5, 2024 11:06
@ttys3 ttys3 marked this pull request as draft December 5, 2024 11:08
@ttys3 ttys3 marked this pull request as ready for review December 5, 2024 11:08
@ttys3 ttys3 force-pushed the feature/add-concurrent-working-threads-option branch from d7ebd27 to 0707403 Compare December 9, 2024 17:38
@marcospb19 (Member) left a review comment:


Ty, let me know if you think the default shouldn't be all cores.

@marcospb19 marcospb19 merged commit 28d0933 into ouch-org:main Dec 14, 2024
15 checks passed
@ttys3 (Contributor, author) commented Dec 15, 2024

> Ty, let me know if you think the default shouldn't be all cores.

If the compression algorithm itself is fast enough (and can effectively utilize all cores), I believe we should have all CPU cores focus on processing a single archive instead of dividing the cores across multiple archives. That way, a single archive is decompressed more quickly. In that scenario, if we decompress all archives concurrently, the bottleneck will not be the CPU but disk I/O, and threads can easily enter the D state (the uninterruptible-sleep state shown in top or htop, mostly iowait).

If the compression algorithm cannot efficiently utilize the CPU (for example, 7z and rar decompression: despite claiming to support multithreading, they cannot fully utilize the CPU, and overall decompression is very slow), then decompressing archives one by one will be relatively slow.

Therefore, I feel that neither approach (processing one archive at a time vs. concurrently processing multiple archives) has absolute pros and cons; it depends on the situation.
