Continuous performance benchmarking #234

Closed
TomNicholas opened this issue Jun 27, 2023 · 7 comments
Labels: benchmarks, Example benchmark problem, optimization

Comments

@TomNicholas
Member

It would be useful if Cubed had performance tests that could check for regressions. This would be especially useful for preventing complex optimizations (e.g. #221) from accidentally degrading performance or stability.

There are really three things to test here: scaling (can we even run a large workload?), stability (how likely is it to fail?), and performance (how fast does it complete?). This is therefore related to, but hopefully somewhat separable from, #7.

Xarray uses airspeed velocity ("asv") for performance regression testing. Once set up, it can be run on any PR by adding a run-benchmark label, or run locally. (I think there is supposed to be an HTML dashboard of results somewhere, but the link on the readme badge appears to be broken.)
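
For reference, asv benchmarks are just small Python classes whose `time_*` methods get timed automatically. A minimal sketch of what a Cubed benchmark might look like (the Spec/chunks arguments here are illustrative, not a settled choice of workload):

```python
# benchmarks/benchmarks.py -- asv times any method whose name starts with time_.
# The Cubed calls below are illustrative; exact Spec/chunk sizes would need tuning.
import cubed
import cubed.array_api as xp


class MeanSuite:
    def setup(self):
        spec = cubed.Spec(allowed_mem="200MB")
        self.a = xp.ones((5000, 5000), chunks=(500, 500), spec=spec)

    def time_mean(self):
        xp.mean(self.a).compute()
```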

@tomwhite
Member

This would be very useful. Is asv the right tool for this though? I always thought it was aimed at smaller benchmarks that run in a matter of seconds.

@TomNicholas
Member Author

Matt has suggested that we could use the coiled/benchmarks repo to run Cubed benchmarks. We would need to provide the API key for the AWS/GCP account that pays for the compute, though. Alternatively, we might just look there for inspiration rather than actually merging Cubed benchmarks into that repo.

@TomNicholas
Member Author

I was thinking that one good way to solve this problem in general would be to make a database, host it somewhere, and append to it whenever we run a new benchmark job... Then I looked closer at coiled/benchmarks and saw that that's pretty much exactly what they do.

I suggest we fork it into a new cubed_benchmarks repo¹, strip out all the non-array stuff and the complex dask diagnostics that get recorded, and point it at a different bucket for writing results. Then we can add our own keys to run Cubed on AWS/GCP, and run dask on Coiled for comparison.

Once that works locally, we could add a GitHub Action to this repo that imports and runs the benchmarks defined in the benchmarks repo under certain conditions (e.g. pushes to main, plus any PR that has a run-benchmark label added to it).

A first step might be to decide what information we want to record, and adjust the database schema accordingly. The new compute id in #382 gives us the unique run id at the top level; a rough sketch of the kind of record I have in mind is below.
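
Something along these lines, perhaps (the field names and the SQLite backend here are purely illustrative; we'd adapt whatever schema coiled/benchmarks already uses):

```python
# Rough sketch of one row per benchmark run; field names are illustrative only.
import sqlite3

conn = sqlite3.connect("benchmarks.db")
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS runs (
        compute_id TEXT PRIMARY KEY,   -- unique run id from #382
        started_at TEXT,               -- ISO timestamp of the run
        benchmark TEXT,                -- which workload was run
        executor TEXT,                 -- e.g. lithops / modal / dask-on-coiled
        cubed_version TEXT,
        duration_s REAL,               -- total wall-clock time
        peak_mem_bytes INTEGER,
        succeeded INTEGER              -- 1 = success, 0 = failure
    )
    """
)
conn.commit()
```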

What do you think? Is that over-engineering it?

Footnotes

  1. Should we make a cubed-dev GitHub organization? Then we could have cubed-dev/cubed, cubed-dev/cubed_xarray, and cubed-dev/benchmarks all in one place.

@tomwhite
Member

Sounds great!

@TomNicholas
Member Author

I noticed that the latest lithops release just added this:

[Stats] Added new CPU, Memory and Network statistics in the function results

That sounds like it could be a useful thing to record.

More generally, I was wondering how much information we want to store in the benchmark results database: just the total job execution time, or the start and end times of every single container?
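
For context, lithops exposes per-invocation stats on the futures it returns, so pulling per-task timings might look something like this (the exact stats keys vary by lithops version, so treat the names below as illustrative):

```python
# Rough sketch of reading per-task stats from lithops futures.
# The keys in future.stats depend on the lithops version; names here are illustrative.
import lithops


def work(x):
    return x * x


fexec = lithops.FunctionExecutor()
futures = fexec.map(work, range(10))
fexec.wait(futures)

for f in futures:
    stats = f.stats  # dict of per-invocation metrics
    print(stats.get("worker_start_tstamp"), stats.get("worker_end_tstamp"))
```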

TomNicholas changed the title from "Performance regression testing using asv?" to "Continuous performance benchmarking" on Feb 22, 2024
@tomwhite
Member

That's interesting. I hadn't seen that. If it's easy to store fine-grained information for each task then go ahead, but I don't think it's needed at the moment.
