Minor changes to improve compilation speed #137

Open

wants to merge 5 commits into master

Conversation

Aeledfyr

This crate is small and compiles quickly, but it's also used everywhere, so trying to improve the compilation speed is worth it.

This PR has a number of optimizations; each commit is independent, so feel free to cherry-pick and take whichever changes you want. This PR preserves the current MSRV of 1.13.

Changes:

  • Add rust-version = "1.13" to the Cargo.toml, to partially document the current MSRV
  • Change internal methods that use AsRef<Path> to just take &Path (see the sketch after this list)
  • Use loops instead of iterator combinators in parsing functions
  • Make the debug! macro generate no code when not in use
  • Use a Vec instead of a HashMap when counting physical cores; using the HashMap requires compiling a lot more code. (The Vec should also be faster.)
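
As a rough illustration of the AsRef<Path> change (the function names and bodies below are hypothetical, not the crate's actual code): a generic helper is monomorphized once per path type it is called with, while a non-generic one is compiled only once.

```rust
use std::fs;
use std::path::Path;

// Before: generic over P, so each caller type (&str, PathBuf, &Path, ...)
// produces its own monomorphized copy of the whole body.
fn read_sysfs_value_generic<P: AsRef<Path>>(path: P) -> Option<String> {
    fs::read_to_string(path.as_ref()).ok()
}

// After: a single non-generic function; internal callers already have a
// &Path (or can call .as_ref() at the call site), so nothing is lost.
fn read_sysfs_value(path: &Path) -> Option<String> {
    fs::read_to_string(path).ok()
}
```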

My non-scientific benchmarks of a clean build of the library (libc + num_cpus):

  • in release mode: 0.8s -> 0.7s (-13%)
  • in debug mode: 0.63s -> 0.60s (-5%)

Measuring the amount of code passed to LLVM, using cargo-llvm-lines (`cargo llvm-lines --lib -p num_cpus`, with `--release` for release builds):

  • in release mode: 16325 -> 13039 (-20%)
  • in debug mode: 24263 -> 18781 (-22%)

Iterator combinators have a noticeable compilation time overhead
because they must be monomorphized and end up passing more code
to LLVM.  This code does get optimized out, but that takes time
and slows down the overall build.
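
A minimal sketch of the kind of rewrite this refers to (the parsing here is simplified and not the crate's actual logic):

```rust
// Combinator version: the filter/count adapters and the closure are all
// generic and get monomorphized, handing extra IR to LLVM.
fn count_processors_combinators(cpuinfo: &str) -> usize {
    cpuinfo
        .lines()
        .filter(|line| line.starts_with("processor"))
        .count()
}

// Loop version: the same logic as a plain `for` loop produces less
// intermediate code before optimization, even though it compiles down
// to the same machine code in the end.
fn count_processors_loop(cpuinfo: &str) -> usize {
    let mut count = 0;
    for line in cpuinfo.lines() {
        if line.starts_with("processor") {
            count += 1;
        }
    }
    count
}
```
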
I'm not sure how much of a compile-time impact this makes, but this
prevents the compiler from having to generate formatting code when
the debug macro is not in use.
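
One way to get that effect, as a sketch (the cfg flag name here is made up, and the crate's actual gating may differ, e.g. an environment-variable check): let the macro expand to nothing when debugging is disabled, so no core::fmt code is generated at its call sites.

```rust
// Debugging enabled: forward to eprintln! as usual.
#[cfg(num_cpus_debug)]
macro_rules! debug {
    ($($arg:tt)*) => { eprintln!($($arg)*) };
}

// Debugging disabled: the macro expands to nothing, so the format string
// and arguments are discarded at expansion time and no formatting
// machinery is generated for these call sites.
#[cfg(not(num_cpus_debug))]
macro_rules! debug {
    ($($arg:tt)*) => {};
}
```
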
The current implementation uses a HashMap to deduplicate the output
from each core of the same CPU.  This commit instead collects the
output for each core into a Vec, and then sorts it to deduplicate
physical CPUs.

This reduces the code size processed by LLVM by 15-20%, as counted
by `cargo llvm-lines --lib -p num_cpus` on both debug and release.

These implementations have different performance characteristics:
- the HashMap must hash each key, and SipHash is slow on small keys
- the number of cores will be small (<1024) so sorting the list
  should be very fast
- the list will likely already be sorted

I have not benchmarked this code, but it should be around the same
speed or slightly faster (from testing against randomized lists).
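
A sketch of the two approaches (the `(physical id, core id)` pair representation is an assumption for illustration; the real code derives it from the CPU topology information it parses):

```rust
use std::collections::HashMap;

// HashMap-based deduplication: every insert hashes the key with SipHash,
// and HashMap pulls a large amount of generic code into the build.
fn count_physical_with_map(core_ids: &[(u32, u32)]) -> usize {
    let mut seen: HashMap<(u32, u32), ()> = HashMap::new();
    for &id in core_ids {
        seen.insert(id, ());
    }
    seen.len()
}

// Vec-based deduplication: collect, sort, dedup. Far less generic code,
// and sorting a small, mostly already-sorted list is cheap.
fn count_physical_with_vec(core_ids: &[(u32, u32)]) -> usize {
    let mut ids: Vec<(u32, u32)> = core_ids.to_vec();
    ids.sort();
    ids.dedup();
    ids.len()
}
```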
Aeledfyr commented Feb 25, 2024

Improved benchmarks, compiling only num_cpus by invoking rustc directly (using the command printed by cargo's --verbose output):

  • release build: from 364.4 ms ± 7.6 ms to 270.4 ms ± 7.8 ms (-25%)
  • debug build: from 70.1 ms ± 3.0 ms to 60.4 ms ± 2.7 ms (-13%)

Aeledfyr (Author) commented:

It looks like most of the CI failures are due to an updated Rust toolchain:

  • Nightly tests fail to compile because of a new warning (and #[deny(warnings)])
  • Cross-compiling to some architectures fails because of missing rust-std components

The only test that actually fails is test-cgroups, which fails in the initial (non-cgroups) test because it expects to run on a machine with 2 cores, while the CI now runs it on a 4-core machine. I'm not 100% sure that's what's happening, but it may be worth rerunning CI on the main branch to see if that's the issue.
