Weirdly different benchmark results for code that should be fairly identical #31

Open
vlovich opened this issue Nov 17, 2023 · 3 comments

@vlovich

vlovich commented Nov 17, 2023

I have some benchmarks that look like this:

use std::mem::MaybeUninit;

fn main() {
    let _ = memcache::CRATE_USED;
    divan::main();
}

fn weird_results_impl(b: divan::Bencher, size: usize) {
    const NUM_ITEMS: usize = 100_000;
    const CAPACITY: usize = NUM_ITEMS;
    // Vec<Box<[MaybeUninit<u8>]>>, inferred from the assignment in the closure below.
    let cache = vec![Default::default(); CAPACITY];
    let values = (0..NUM_ITEMS)
        .map(|_| vec![MaybeUninit::<u8>::uninit(); size].into_boxed_slice())
        .collect::<Vec<_>>();
    b.counter(divan::counter::ItemsCount::new(NUM_ITEMS))
        .with_inputs(|| {
            (
                cache.clone(),
                values
                    .iter()
                    .enumerate()
                    .map(|(idx, v)| (idx % CAPACITY, v.clone()))
                    .collect::<Vec<_>>(),
            )
        })
        .bench_local_refs(|(cache, refs)| {
            // Move each boxed slice into its cache slot; take only swaps pointers.
            for (entry, mem) in refs {
                cache[*entry] = std::mem::take(mem);
            }
        });
}

#[divan::bench]
fn weird_results_4kib(b: divan::Bencher) {
    weird_results_impl(b, 4 * 1024);
}

#[divan::bench]
fn weird_results_10b(b: divan::Bencher) {
    weird_results_impl(b, 10);
}

There's a fairly large discrepancy between the two:

my-crate               fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ weird_results_4kib  165.4 µs      │ 211.1 µs      │ 173.5 µs      │ 174.8 µs      │ 100     │ 100
│                      604.2 Mitem/s │ 473.5 Mitem/s │ 576.2 Mitem/s │ 571.8 Mitem/s │         │
╰─ weird_results_10b   80.53 µs      │ 110.5 µs      │ 83.22 µs      │ 84.07 µs      │ 100     │ 100
                       1.241 Gitem/s │ 904.2 Mitem/s │ 1.201 Gitem/s │ 1.189 Gitem/s │         │

This was run with mimalloc set as the allocator. AFAICT I'm not dropping any memory within the benchmark loop, and the loop body shouldn't be doing anything more than shuffling some pointers around (i.e. it should be the same amount of shuffling in both runs). Is there something wrong with my benchmark, or is this a bug in divan?
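
As far as I understand, std::mem::take on a Box<[MaybeUninit<u8>]> only swaps the slot with an empty boxed slice and hands back the old box without touching the payload bytes, so the per-item cost should not depend on size. A minimal sketch of that claim (the 4 KiB size here is just illustrative):

use std::mem::MaybeUninit;

fn main() {
    // take replaces the slot with its Default value (an empty boxed slice)
    // and returns the old box; none of the payload bytes are copied.
    let mut slot: Box<[MaybeUninit<u8>]> = vec![MaybeUninit::uninit(); 4 * 1024].into_boxed_slice();
    let taken = std::mem::take(&mut slot);
    assert_eq!(taken.len(), 4 * 1024);
    assert_eq!(slot.len(), 0);
}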

@nvzqz
Owner

nvzqz commented Nov 17, 2023

I'm not able to reproduce your results when I don't add memcache or mimalloc. I can try again later with those added.

[Screenshot: benchmark results on nvzqz's machine, 2023-11-17 07:55]

Also, you can use NUM_ITEMS directly since usize implements IntoCounter<Counter = ItemsCount>:

- b.counter(divan::counter::ItemsCount::new(NUM_ITEMS))
+ b.counter(NUM_ITEMS)
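
For a standalone illustration (counter_from_usize is a hypothetical name, not from this issue), the item count can be passed as a plain usize wherever a counter is expected:

fn main() {
    divan::main();
}

#[divan::bench]
fn counter_from_usize(b: divan::Bencher) {
    const NUM_ITEMS: usize = 100_000;
    // Equivalent to b.counter(divan::counter::ItemsCount::new(NUM_ITEMS)).
    b.counter(NUM_ITEMS)
        .bench_local(|| std::hint::black_box((0..NUM_ITEMS).sum::<usize>()));
}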

@vlovich
Author

vlovich commented Nov 17, 2023

memcache is the name of my own crate and can be ignored. Strange that you're not seeing it; mimalloc might be needed to make the difference more obvious. You must have a faster machine for this benchmark, since my 13900 doesn't get that fast with the standard allocator.

@OliverKillane
Contributor

My suspicion is that:

The benchmarks vary only in the size parameter, which in turn determines the size of the values slices and makes the clone inside .with_inputs(..) more expensive for the larger size.

The with_inputs time appears to be counted in the main benchmark time, hence the large difference between the benchmarks.

See #55
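
One way to check this (a rough sketch with hypothetical names like clone_inputs_impl, reusing the same setup as the original benchmark) is to time only the input generation, i.e. the same clones that with_inputs performs; if that alone shows a similar gap between the 10 B and 4 KiB cases, the with_inputs cost would account for the discrepancy:

use std::mem::MaybeUninit;

fn main() {
    divan::main();
}

// Time only the clones that with_inputs performs in the original benchmark.
fn clone_inputs_impl(b: divan::Bencher, size: usize) {
    const NUM_ITEMS: usize = 100_000;
    const CAPACITY: usize = NUM_ITEMS;
    let cache: Vec<Box<[MaybeUninit<u8>]>> = vec![Default::default(); CAPACITY];
    let values = (0..NUM_ITEMS)
        .map(|_| vec![MaybeUninit::<u8>::uninit(); size].into_boxed_slice())
        .collect::<Vec<_>>();
    b.counter(NUM_ITEMS).bench_local(|| {
        std::hint::black_box((
            cache.clone(),
            values
                .iter()
                .enumerate()
                .map(|(idx, v)| (idx % CAPACITY, v.clone()))
                .collect::<Vec<_>>(),
        ))
    });
}

#[divan::bench]
fn clone_inputs_4kib(b: divan::Bencher) {
    clone_inputs_impl(b, 4 * 1024);
}

#[divan::bench]
fn clone_inputs_10b(b: divan::Bencher) {
    clone_inputs_impl(b, 10);
}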
