Pick benchmarking tool #1
Comments
Very much to my surprise, I've yet to find any tools which record performance stats at regular time intervals as each workload is run. (Which perhaps isn't surprising, because most existing benchmarking tools assume the workload will take a tiny fraction of a second.) I'm gonna put a little effort into exploring whether we could roll our own. Here's the draft design doc.
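To illustrate what I mean by "recording stats at regular intervals", here's a rough sketch (not the actual design; psutil and all the names are just placeholders) of sampling system stats in a background thread while a workload runs:

```python
# Rough sketch only: poll psutil at a fixed interval while the workload runs.
# `sample_interval`, `monitor`, and `run_with_monitoring` are made-up names.
import threading
import time

import psutil


def monitor(stop_event, samples, sample_interval=0.1):
    """Append a dict of system-wide stats to `samples` every `sample_interval` seconds."""
    while not stop_event.is_set():
        disk = psutil.disk_io_counters()
        samples.append(
            {
                "timestamp": time.time(),
                "cpu_percent": psutil.cpu_percent(interval=None),
                "disk_read_bytes": disk.read_bytes,
                "disk_write_bytes": disk.write_bytes,
            }
        )
        time.sleep(sample_interval)


def run_with_monitoring(workload):
    """Run `workload()` while recording a time series of system stats."""
    samples = []
    stop_event = threading.Event()
    thread = threading.Thread(target=monitor, args=(stop_event, samples))
    thread.start()
    try:
        workload()  # e.g. read a Zarr array from disk or object storage
    finally:
        stop_event.set()
        thread.join()
    return samples
```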
I've started implementing a general-purpose IO-centric benchmarking tool: perfcapture.
I'm gonna mark this as "done" now because we're using perfcapture.
List of benchmarking tools as shared in today's meeting by @jakirkham:
Thanks! In my limited understanding, there's a distinction to be made between benchmarking tools and profiling tools. (Although I could be wrong!) My understanding is that benchmarking tools (like the frameworks listed in this issue) provide a "test harness" for defining and running workloads, and measure how long those workloads take. Profiling tools, in contrast, tend not to provide a "test harness" for defining and running workloads. Instead they measure the behaviour of any given process, often in minute detail (CPU cache hits, memory bandwidth, IO bandwidth, etc.). I'll copy-and-paste @MSanKeys963's wonderful list into a new issue, to remind me to try at least one of the profiling tools.
We'd like to benchmark the performance of existing Zarr implementations, starting with Zarr-Python.
We've identified 5 benchmarking frameworks:
My current sense is that none of these packages are a perfect fit. The task is to decide if any of these packages fit well enough to be useful. I plan to more rigorously compare these 5 packages against our requirements.
The first 4 benchmarking frameworks are very focused on measuring the execution time of CPU-bound tasks and detecting performance regressions. They answer questions like "does the latest release of the code reduce the runtime of the matrix multiplication function?". None of the first 4 frameworks are particularly interested in IO behaviour, or in comparing the behaviour of different projects.
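To make that concrete, the kind of benchmark those frameworks are built for looks roughly like this (illustrative only; numpy and the array sizes are my own choice, not something any of the frameworks require):

```python
# Illustrative only: a plain execution-time benchmark of a CPU-bound task,
# the style of measurement the first 4 frameworks focus on.
import timeit

import numpy as np

a = np.random.rand(500, 500)
b = np.random.rand(500, 500)

# Best of 5 repeats, each timing 10 matrix multiplications.
runtime = min(timeit.repeat(lambda: a @ b, number=10, repeat=5))
print(f"10 matmuls: {runtime:.3f} s (best of 5 repeats)")
```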
I'm probably biased but I'd like our benchmarks to help answer questions like:
I'm almost certain that none of the 5 benchmarking frameworks listed above can help us answer these questions. So I'm wondering if we might be better off rolling our own benchmarking framework. (Which shouldn't be too hard: psutil can measure utilisation of CPU(s), IO, etc.; we could persist the benchmarks as JSON; and we could use something like streamlit to build a web UI.)

Or maybe I'm getting over-excited and we should just use an existing benchmarking framework and be happy with just measuring execution time 🙂. We almost certainly don't want to try to answer all my questions for every PR! Maybe the automated benchmark suite should just measure execution time, and then we can do one-off, detailed, manual analyses to answer the questions above. Although I do think it could be extremely powerful to be able to share detailed, interactive analysis of Zarr's performance across a wide range of compute platforms, storage media, and Zarr implementations 🙂. And some of the questions above can only be answered after collecting a lot of performance data. So it might be nice to at least collect (but not analyse) lots of data on every benchmark run.
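As a sketch of the "persist the benchmarks as JSON" idea (the schema and names here are made up, not a proposal), each run could just dump its samples to a file that a streamlit app later reads:

```python
# Hypothetical sketch: persist one benchmark run as JSON.
# `samples` is a list of per-interval dicts of stats (e.g. collected via psutil).
import json
from pathlib import Path


def save_run(samples, workload_name, results_dir="benchmark_results"):
    """Write one run's samples to <results_dir>/<workload_name>.json."""
    out_dir = Path(results_dir)
    out_dir.mkdir(exist_ok=True)
    out_path = out_dir / f"{workload_name}.json"
    out_path.write_text(json.dumps({"workload": workload_name, "samples": samples}, indent=2))
    return out_path
```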