[FR] improve thread scheduling and affinity #305
Comments
Completed the optimization and tested on MacOS (M1/arm64 and x64), Windows x64, Debian x64, Ubuntu x64, Android Termux arm64, RPi3 w/ Debian-based Linux, and Cygwin. [old: This should compile on FreeBSD, but I have not been able to test it there since we don't have a FreeBSD machine available.] New: I received confirmation that the code is correct for FreeBSD. The thread affinity and priority are set for the calling thread as follows in the code I wrote (updated to support DragonFly and NetBSD):
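For orientation, here is a minimal Linux-only sketch of setting affinity for the calling thread. It is not the code committed to ugrep: the helper names `pin_calling_thread` and `first_allowed_cpu` are made up for illustration, it relies on Linux `sched_setaffinity` with pid 0 (which applies to the calling thread), and it leaves out priority/QoS as well as the BSD, MacOS, and Windows branches.

```cpp
#if defined(__linux__)
#include <sched.h>
#endif

// Hypothetical helper (not ugrep's code): pin the calling thread to one
// logical CPU. Returns true on success; false when the index is out of range,
// the kernel refuses the request, or the platform is not covered here.
bool pin_calling_thread(unsigned cpu)
{
#if defined(__linux__)
  if (cpu >= static_cast<unsigned>(CPU_SETSIZE))
    return false;
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  // pid 0 means "the calling thread" for sched_setaffinity on Linux
  return sched_setaffinity(0, sizeof(set), &set) == 0;
#else
  (void)cpu;
  return false;
#endif
}

// Hypothetical helper: lowest CPU index the calling thread is currently
// allowed to run on, or -1 when that cannot be determined.
int first_allowed_cpu()
{
#if defined(__linux__)
  cpu_set_t set;
  CPU_ZERO(&set);
  if (sched_getaffinity(0, sizeof(set), &set) != 0)
    return -1;
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
    if (CPU_ISSET(cpu, &set))
      return cpu;
#endif
  return -1;
}
```

Querying the allowed set first matters in containers and on systems where the set of usable cores is restricted, since pinning to a CPU outside that set fails.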
Note: the performance impact on MacOS is not observable in my tests; the results do not differ from the current benchmarks. This is not entirely unexpected, because MacOS doesn't offer a thread affinity API as far as I know, and I've found the thread performance on MacOS quite optimal already. Setting the MacOS thread QoS and priority is probably a good idea anyway, and it can't hurt. Android with Termux is a different story. It won't always set affinity with the code above. The number of available cores changes all the time. Several online resources confirm this. The updated ugrep v4.3.1-1 is committed and can be cloned from this repo to build and test. I will release v4.3.2 later, after more testing.
set thread affinity and priority #305 & improve TUI regex syntax highlighting of --bool AND/OR/NOT
Included with release v4.3.2.
I was asked by a ugrep user if I had plans to further accelerate the performance of recursive searching by setting thread affinity in ugrep and to improve worker job scheduling, if it helps. So yes, I thought about that and it is relatively easy to do with pthreads (I taught it in my HPC class at FSU), but the pthreads C++11 code is not portable, so I hesitated to do this until later.
We all know that setting thread affinity can improve threading performance when certain conditions are met in the way threads are used (thread lifetime, RAM/cache use, memory access time versus CPU time ratio).
Running a preliminary test of ugrep on Ubuntu (quad core x64 w/ HTT, 8 logical cores) shows that a search of /usr with thread affinity runs in 0.12 seconds, which is up to 2x faster than without affinity. So yes, it is worthwhile to set the thread affinity of the worker thread pool in ugrep. However, MacOS and Windows do not seem to benefit much, if at all.
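One simple way to decide which worker of a pool goes on which CPU is a round-robin plan over the CPUs the process is allowed to use. The sketch below only computes that plan; the function name `plan_worker_cpus` is hypothetical, and this is an illustration of the idea rather than how ugrep assigns its workers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical planning helper (not taken from ugrep): given the CPU indices
// the process may use, assign each of num_workers pool threads one CPU in
// round-robin order. An empty result means "no CPUs known, do not pin".
std::vector<int> plan_worker_cpus(std::size_t num_workers,
                                  const std::vector<int>& allowed_cpus)
{
  std::vector<int> plan;
  if (allowed_cpus.empty())
    return plan;
  plan.reserve(num_workers);
  for (std::size_t i = 0; i < num_workers; ++i)
    plan.push_back(allowed_cpus[i % allowed_cpus.size()]);
  return plan;
}
```

Each worker would then request affinity for its planned CPU once, at thread start, and simply continue unpinned if the request fails.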
On Debian and Ubuntu 2.9GHz Intel Core i7 quad core with hyperthreading (8 logical cores) and 16GB 2133MHz LPDDR3 I get around 700% to 750% CPU utilization when searching `ugrep -Ilr zodiaq /usr` in a container in 0.12 seconds (20437 files and 2011 directories, none of them matching `zodiaq`). Now, we all know that CPU percentage is generally meaningless. It could just be running a busy-wait loop for that matter (so it's easy to increase CPU% by doing whatever stuff). The real time (a.k.a. wall clock time) is ultimately the measure of performance to get the work done in parallel. It measures the time elapsed for the work performed with a level of parallelism, i.e. when compared to a single CPU performance we get the speedup.
To my surprise, recursively searching with option `-z` (`--decompress`) also runs a lot faster. I was surprised by this, because decompression threads are used in ugrep to feed decompressed streams or plain input (when not compressed) to the search engines of the worker threads, thereby increasing the concurrency of ugrep beyond the number of available physical cores. Still, it looks even better now to make a push to set thread affinity of the worker thread pool in ugrep.
This optimization will be included in the upcoming 4.3.2 release. Obviously, I need more time to test different programmatic ways to set thread affinity and measure the performance impact.
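Since wall clock time is the measure that counts here, a comparison with and without affinity can be reduced to two elapsed-time measurements and their ratio. The following is a minimal sketch of that bookkeeping; `wall_seconds` and `speedup` are hypothetical helper names, not part of ugrep.

```cpp
#include <chrono>

// Hypothetical helper: elapsed wall clock seconds for one call of work().
// steady_clock is used because it is monotonic and unaffected by clock
// adjustments while the measurement runs.
template<typename F>
double wall_seconds(F&& work)
{
  auto start = std::chrono::steady_clock::now();
  work();
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}

// Speedup as the ratio of baseline wall time to improved wall time.
// The caller must pass a positive improved_seconds.
double speedup(double baseline_seconds, double improved_seconds)
{
  return baseline_seconds / improved_seconds;
}
```

A run that takes half the wall clock time of the baseline gives a speedup of 2, regardless of what the CPU utilization figure shows.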