
Python Comparison Scripts #9

Open
alliepiper opened this issue Mar 29, 2021 · 9 comments
Labels
P0: must have Absolutely necessary. Critical issue, major blocker, etc. type: enhancement New feature or request.

Comments

@alliepiper
Collaborator

alliepiper commented Mar 29, 2021

NVBench has a work-in-progress JSON output format, and I'm working on a very basic Python script to compare two JSON files.

We should grow this functionality into a more complete set of analysis tools. At a minimum, this should cover the features provided by Google Benchmark's excellent comparison scripts.

If anyone is interested in writing some Python to help with this, let me know. I'll update this issue once I've finalized the JSON output format.

Basic Regression Testing

  • P0: Compare two JSON files:
    compare.py baseline.json test.json
  • P0: Specify a custom error threshold:
    compare.py --gpu-threshold 5 baseline.json test.json
    (gpu-threshold, cpu-threshold, batch-threshold)
  • P2: Run a benchmark executable and compare it with a JSON file:
    compare.py baseline.json --run test.exe -b 3 -a T=[I32,U64] -a Elements[pow2]=30

These should:

  • Compare the benchmarks with the same name + config.
  • Print abs/rel changes for cpu/gpu/batch measurements.
  • Highlight any entries that exceed a threshold time.
  • Return an error code if any exceed thresholds.
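The regression-testing behavior above could be sketched roughly like this. Note this is only an illustration: the flat `{(name, config): seconds}` input shape and the function names are assumptions for the sketch, not the actual NVBench JSON schema or script API.

```python
# Hypothetical sketch of the P0 comparison logic: match entries by
# (benchmark name, config), compute absolute/relative changes, flag
# anything whose relative change exceeds a percentage threshold, and
# derive a process exit code from the flags.

def compare_entries(baseline, test, threshold_pct=5.0):
    """Return [(key, abs_diff_s, rel_diff_pct, exceeded), ...] for every
    (name, config) key present in both result sets."""
    rows = []
    for key in sorted(baseline.keys() & test.keys()):
        old, new = baseline[key], test[key]
        abs_diff = new - old
        rel_diff = 100.0 * abs_diff / old if old else float("inf")
        rows.append((key, abs_diff, rel_diff, abs(rel_diff) > threshold_pct))
    return rows


def exit_code(rows):
    """Non-zero when any entry exceeds its threshold, matching the
    proposed regression-testing behavior."""
    return 1 if any(exceeded for *_, exceeded in rows) else 0
```

A real script would additionally read the two JSON files, handle entries missing from one side, and pretty-print the rows.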

Analysis modes

Compare benchmarks with different names, answering questions like:

  • How much faster is benchmark X for input type T vs. U for a variety of input sizes?
  • Does Algorithm X take more time to run than Algorithm Y for the same inputs?

These will need some way of specifying the sets of configurations to compare. Google Benchmark has worked out a general syntax for this; we should adapt their approach to the NVBench axis syntax.
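The core of such an analysis mode is lining up shared axis configurations across two benchmark names so their times can be compared side by side. A minimal sketch, assuming the same illustrative flat result format as above (the names and layout are hypothetical):

```python
# Rough sketch of the analysis-mode pairing: given results keyed by
# (benchmark name, axis config), yield the configs measured under both
# benchmark names together with the two times.

def paired_configs(results, name_x, name_y):
    """Yield (config, time_x, time_y) for every axis config that was
    measured under both name_x and name_y."""
    configs_x = {cfg: t for (name, cfg), t in results.items() if name == name_x}
    configs_y = {cfg: t for (name, cfg), t in results.items() if name == name_y}
    for cfg in sorted(configs_x.keys() & configs_y.keys()):
        yield cfg, configs_x[cfg], configs_y[cfg]
```

Selecting *which* configs to pair (e.g. only `T=I32` vs. `T=U64`) is where the adapted axis syntax would come in.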

Output

Ideally markdown formatted, similar to NVBench's default output.
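For illustration, emitting such a table is straightforward; the column layout below is just a guess at what "similar to NVBench's default output" might look like, not the actual format.

```python
# Tiny helper that renders rows as a GitHub-flavored markdown table.

def markdown_table(headers, rows):
    """Return a markdown table string for the given headers and rows."""
    lines = ["| " + " | ".join(headers) + " |",
             "|" + "|".join("---" for _ in headers) + "|"]
    for row in rows:
        lines.append("| " + " | ".join(str(c) for c in row) + " |")
    return "\n".join(lines)
```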


@alliepiper alliepiper added the P0: must have Absolutely necessary. Critical issue, major blocker, etc. label Mar 29, 2021
@alliepiper alliepiper self-assigned this Mar 29, 2021
@alliepiper alliepiper added the type: enhancement New feature or request. label Mar 29, 2021
@shwina
Contributor

shwina commented Apr 28, 2021

I'd be interested to help out here!

@vyasr
Contributor

vyasr commented Apr 28, 2021

I can help as well if you need extra hands.

@alliepiper
Collaborator Author

Initial work is in NVIDIA/thrust#14.

@vyasr
Contributor

vyasr commented Jan 5, 2022

@allisonvacanti what's next here? NVIDIA/thrust#14 helped close the gap, but I don't recall exactly how far it got us or what we still need to do. RAPIDS is making a push to formalize and analyze our benchmarks more, so migrating fully to NVBench will probably become a priority in the near future. I'm happy to help make sure we have sufficient feature parity with Google Benchmark.
CC @shwina in case you want to continue being involved too.

@robertmaynard
Collaborator

Basic thresholding and comparison of multiple files were added in NVIDIA/thrust#48.

@alliepiper
Collaborator Author

There's a lot we could still do, such as filtering the results by benchmark name/index and axis values. But I don't think these are essential right now.

Are there any "must have" features for RAPIDS that we're missing?

BTW, I'm working on a branch that makes some changes to the JSON file layout to make things more consistent. I hope to have that merged by the end of the week, time permitting 🤞

@vyasr
Contributor

vyasr commented Jan 12, 2022

I assume the changes you're referring to are NVIDIA/thrust#70? It looks great! 🎉

@robertmaynard @jrhemstad @harrism any thoughts on what we would need to see in nvbench to make the transition from gbench smooth for RAPIDS?

@jrhemstad
Collaborator

The gbench compare script had the ability to run a Mann-Whitney U test between two samples to determine whether there was a statistically significant difference between the populations. Do we have anything like that in nvbench yet?

It's very helpful when looking at small differences in performance to establish if the difference is just "noise" or actually meaningful.
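For reference, the U statistic underlying that test is simple to compute from scratch, as sketched below. This is only the statistic; a real implementation would also derive a p-value from it (e.g. via `scipy.stats.mannwhitneyu`, which is what gbench's script builds on) rather than stopping here.

```python
# Sketch of the Mann-Whitney U statistic: U for sample_a is the number
# of (a, b) pairs with a < b, counting ties as one half. Comparing U to
# its distribution under the null hypothesis yields the significance
# test the gbench compare script performs.

def mann_whitney_u(sample_a, sample_b):
    """Return the U statistic for sample_a versus sample_b."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a < b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u
```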

@alliepiper
Collaborator Author

alliepiper commented Jan 12, 2022

@vyasr Yep! That PR has all of my pending changes to the JSON/Python stuff.

@jrhemstad We don't have anything like that at the moment.


One feature that I'd like to see at some point is the ability to compare performance between different benchmarks that use the same axes.

For example, see NVIDIA/cccl#720, which points out that thrust::all_of is slower than thrust::count_if. It'd be nice to be able to write some automated tests that check the performance of equivalent algorithms and identify these sorts of issues.

@jrhemstad jrhemstad added this to CCCL Aug 14, 2022
@jrhemstad jrhemstad moved this to Needs Triage in CCCL Aug 14, 2022
@jrhemstad jrhemstad removed the status in CCCL Aug 14, 2022