Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"How Should I Benchmark Small Functions?" section in book is nonsense #784

Open
orlp opened this issue Jun 28, 2024 · 1 comment
Open

"How Should I Benchmark Small Functions?" section in book is nonsense #784

orlp opened this issue Jun 28, 2024 · 1 comment

Comments

@orlp
Copy link

orlp commented Jun 28, 2024

This section uses the following motivating example to show how you should directly put your tiniest operations (such as a single i + 10 addition) into Criterion and it will benchmark it down to the single instruction level:

fn compare_small(c: &mut Criterion) {
    use criterion::black_box;

    let mut group = c.benchmark_group("small");
    group.bench_with_input("unlooped", 10, |b, i| b.iter(|| i + 10));
    group.bench_with_input("looped", 10, |b, i| b.iter(|| {
        for _ in 0..10000 {
            black_box(i + 10);
        }
    }));
    group.finish();
}

Allow me to demonstrate this is nonsense, and you're just benchmarking a pre-computed input being movd:

fn compare_nonsense(c: &mut Criterion) {
    use criterion::black_box;

    let mut group = c.benchmark_group("small");
    group.bench_with_input("unlooped", &10, |b, i| b.iter(|| i + 10));
    group.bench_with_input("looped", &10, |b, i| b.iter(|| {
        for _ in 0..10000 {
            black_box(i + 10);
        }
    }));
    group.bench_with_input("unlooped_nonsense", &10, |b, i| b.iter(|| (*i as f64).sin() as i32));
    group.bench_with_input("looped_nonsense", &10, |b, i| b.iter(|| {
        for _ in 0..10000 {
            black_box((*i as f64).sin() as i32);
        }
    }));
    group.finish();
}
small/unlooped          time:   [300.52 ps 301.17 ps 301.93 ps]
small/looped            time:   [3.0068 µs 3.0114 µs 3.0158 µs]
small/unlooped_nonsense time:   [299.58 ps 300.03 ps 300.45 ps]
small/looped_nonsense   time:   [3.0083 µs 3.0120 µs 3.0157 µs]

Needless to say, f64::sin does not take 300 picoseconds, it's optimized out to compute it once and re-use that computation.

It is sometimes suggested that benchmarks of small (nanosecond-scale) functions should iterate the function to be benchmarked many times internally to reduce the impact of measurement overhead. This is not required with Criterion.rs, and it is not recommended.

I've had a different experience, I would consider this bad general advice, especially when you need varying input to your function with something like iter_batched_ref which then also adds Vec::extend into your measurement, as well as a bunch of memory bandwidth.

@waywardmonkeys
Copy link
Contributor

FWIW, the black_box used before might not have been working, but now it is suggested to just use std::hint::black_box which might be better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants