Strange performance and scalability issues with some of the build systems #3163
Comments
In the downstream bug report for Arch (https://bugs.archlinux.org/task/75104), "Antonio Rojas (arojas)" made an interesting discovery: apparently the issue is with line 28, EnableCompilerFlag("-std=c99" true false). Commenting out this line brings performance up to par with the plain make version, as does setting it to a newer standard. I have no idea why the C standard version would matter for threaded performance, but if so the fix is rather simple. For comparison, GCC's default is a newer standard anyway (gnu11 or gnu17, depending on the version), not c99.
Sidenote: it seems silly to maintain so many different build systems. The risk of something like this happening is much greater, and the testing matrix becomes much larger.
Confirmed on OpenMandriva (building with clang rather than gcc by default) as well. -std=c99 lowers performance significantly (especially with multiple threads), while higher standards are fine. No significant difference between cXX and gnuXX. Also confirmed that (even on a machine with many CPUs -- a ThreadRipper 1950X, 16 cores, 32 threads) adding more threads beyond a certain point actually lowers performance.
Might be good to cap -T0 at something way lower than the number of available CPUs.
Regarding the degradation with more threads, I could observe this on my system as well, but at other points. For me the peak is at -T17; afterwards it degrades slowly (though sometimes one more thread is a bit faster again; see below for all results). Anyway, point being, it might not be straightforward to automatically decide the ideal number of threads to use, because it seems to depend heavily on the specific CPU (and not just its core/thread count). [Full benchmarking results elided.]
Update: Just confirmed that the behavior is similar with 1.5.2 (built with just make), though there the peak is at -T16 instead of -T17, and the performance is generally better than with 1.4.8. [Results elided.]
Also discovered on FreeBSD half a year ago; see the report on the FreeBSD mailing list.
Officially only Make is supported; the others are all third-party contributions and, I believe, primarily exist to enable functionality such as Meson subproject wraps or CMake integration. Somewhat relevant as well: #2261. Also interesting: #1609 changed Meson from c99 to gnu99 for unclear reasons, which is inconsistent with both CMake and Make.
Can't confirm on macOS 12.4 arm64; hence it seems to be a gcc issue.
The build systems have no inherent fault; the culprits are presumably the compiler and code here that relies on different clock assumptions.
I had a quick look at the code; maybe this is due to this […]. To test, I ran it under […].
And this is the result for cmake: […]
Invocation for both cases was […].
Maybe the way I'm benchmarking it with […] skews the results.
Edit: Oh, this was already discovered in the FreeBSD mailing list post, sorry.
What's the official upstream build system for zstd? It also seems that the different build systems have different install targets and different configuration options as well.
it's not expected to be useful and can actually lead to subtle side effects such as #3163.
This issue has been featured on Phoronix.
I don't see any […].
fixed
yep
Describe the bug
After reading a recent Phoronix benchmark (a bit down the page) I decided to investigate why Arch Linux was so much slower (10-20x) in zstd performance. It turned out that something is wrong with some of the build systems included with zstd!
When zstd is built with the cmake or meson build systems there is negative scaling with the number of threads, while when building with the Makefile in the top level directory, there is positive scaling with the number of threads.
To Reproduce
Steps to reproduce the behavior:
make
mkdir build && cd build && cmake ../zstd-1.5.2/build/cmake/ && make
meson setup builddir && cd builddir && ninja
path/to/zstd -T1 -b4 path/to/FreeBSD-13.1-RELEASE-amd64-memstick.img
path/to/zstd -T6 -b4 path/to/FreeBSD-13.1-RELEASE-amd64-memstick.img
(adjust -T6 based on the number of cores you have) Note! I see the same pattern at other compression levels such as 6 and 8, not just 4. So that value doesn't really appear to matter, as long as it is consistent of course.
Expected behavior
I expect that all build systems should result in binaries with roughly the same behaviour. Performance and scaling should be similar.
Actual results
The output below has been abbreviated for clarity: repeated command lines have been elided, showing only the output. Three runs were performed for each combination of program and flags. As can be seen, the results are relatively consistent run-to-run (at least consistent enough given the huge discrepancies).
Analysis of results
For CMake and Meson: it can be seen that the performance goes down between 1 thread and 6 threads: ~1100 MB/s to ~700 MB/s.
For plain make, the performance goes up between 1 thread and 6 threads: ~1100 MB/s to ~3500 MB/s.
Decompression speed (the second value) does not seem to vary significantly across the experiments however.