Continuous performance benchmarking #234

Closed
TomNicholas opened this issue Jun 27, 2023 · 7 comments
Labels: benchmarks, Example benchmark problem, optimization

Comments

@TomNicholas
Member

It would be useful if Cubed had performance tests that could check for regressions. This would be especially useful for preventing complex optimizations (e.g. #221) from accidentally degrading performance or stability.

There are really three things to test here: scaling (can we even run a large workload?), stability (how likely is it to fail?), and performance (how fast does it complete?). This is therefore related to, but hopefully somewhat separable from, #7.

Xarray uses airspeed velocity ("asv") for performance regression testing. Once set up, it can be run on any PR by adding a run-benchmark label, or run locally. (I think there is supposed to be an HTML dashboard of results somewhere, but the link on the readme badge appears to be broken.)
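
For reference, asv benchmarks are just small Python classes whose `time_*` methods get timed automatically. A minimal sketch of what a Cubed benchmark might look like (the Spec/chunks arguments here are illustrative, not a settled choice of workload):

```python
# benchmarks/benchmarks.py -- asv times any method whose name starts with time_.
# The Cubed calls below are illustrative; exact Spec/chunk sizes would need tuning.
import cubed
import cubed.array_api as xp


class MeanSuite:
    def setup(self):
        spec = cubed.Spec(allowed_mem="200MB")
        self.a = xp.ones((5000, 5000), chunks=(500, 500), spec=spec)

    def time_mean(self):
        xp.mean(self.a).compute()
```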

@tomwhite
Member

This would be very useful. Is asv the right tool for this though? I always thought it was aimed at smaller benchmarks that run in a matter of seconds.

@TomNicholas
Member Author

Matt has suggested that we could use the coiled/benchmarks repo to run Cubed benchmarks. We would need to provide the API key for the AWS/GCP account that pays for the compute, though. Alternatively, we might just look there for inspiration rather than actually merging Cubed benchmarks into that repo.

@TomNicholas
Member Author

I was thinking that one good way to solve this problem in general would be to make a database, host it somewhere, and append to it whenever we run a new benchmark job... Then I looked closer at coiled/benchmarks and saw that that's pretty much exactly what they do.

I suggest we fork it into a new cubed_benchmarks repo¹, strip out all the non-array stuff and the complex dask diagnostics that get recorded, and point it at a different bucket for writing results. Then we can add our own keys to run Cubed on AWS/GCP, and run dask on Coiled for comparison.

Once that works locally, we could add a GitHub Action to this repo that imports and runs the benchmarks defined in the benchmarks repo under certain conditions (e.g. pushes to main, plus any PR that has a run-benchmark label added to it).

A first step might be to decide what information we want to record, and adjust the database schema accordingly. The new compute id in #382 gives us the unique run id at the top level; a rough sketch of the kind of record I have in mind is below.
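
Something along these lines, perhaps (the field names and the SQLite backend here are purely illustrative; we'd adapt whatever schema coiled/benchmarks already uses):

```python
# Rough sketch of one row per benchmark run; field names are illustrative only.
import sqlite3

conn = sqlite3.connect("benchmarks.db")
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS runs (
        compute_id TEXT PRIMARY KEY,   -- unique run id from #382
        started_at TEXT,               -- ISO timestamp of the run
        benchmark TEXT,                -- which workload was run
        executor TEXT,                 -- e.g. lithops / modal / dask-on-coiled
        cubed_version TEXT,
        duration_s REAL,               -- total wall-clock time
        peak_mem_bytes INTEGER,
        succeeded INTEGER              -- 1 = success, 0 = failure
    )
    """
)
conn.commit()
```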

What do you think? Is that over-engineering it?

Footnotes

  1. Should we make a cubed-dev GitHub organization? Then we could have cubed-dev/cubed, cubed-dev/cubed_xarray, and cubed-dev/benchmarks all in one place.

@tomwhite
Member

Sounds great!

@TomNicholas
Member Author

I noticed that the latest lithops release just added this:

[Stats] Added new CPU, Memory and Network statistics in the function results

That sounds like it could be a useful thing to record.

More generally, I was wondering how much information we want to store in the benchmark results database: just the total job execution time, or the start and end times of every single container?
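
For context, lithops exposes per-invocation stats on the futures it returns, so pulling per-task timings might look something like this (the exact stats keys vary by lithops version, so treat the names below as illustrative):

```python
# Rough sketch of reading per-task stats from lithops futures.
# The keys in future.stats depend on the lithops version; names here are illustrative.
import lithops


def work(x):
    return x * x


fexec = lithops.FunctionExecutor()
futures = fexec.map(work, range(10))
fexec.wait(futures)

for f in futures:
    stats = f.stats  # dict of per-invocation metrics
    print(stats.get("worker_start_tstamp"), stats.get("worker_end_tstamp"))
```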

TomNicholas changed the title from "Performance regression testing using asv?" to "Continuous performance benchmarking" on Feb 22, 2024
@tomwhite
Member

That's interesting. I hadn't seen that. If it's easy to store fine-grained information for each task then go ahead, but I don't think it's needed at the moment.
