Performance benchmarks #42
Comments
Is there a way to have a public record of the benchmarks? I'm thinking of something like what https://codecov.io is to |
There's no current public record. I didn't prioritize publishing results because it seemed the lack of dedicated, consistent hardware would be a barrier to useful records. But https://labs.quansight.org/blog/2021/08/github-actions-benchmarks suggests that GitHub Actions could be sufficient to identify performance changes >50%. |
That's a really nice blog post, thanks for sharing! The GitHub Actions setup doesn't look trivial though 😅 I did find https://github.com/benchmark-action/github-action-benchmark but they don't support |
After #168 we'll have a pretty good suite of benchmarks. The following two tasks remain for closing out this issue:
|
We're starting to experiment with using If there's interest, I can help with setting up the CI infrastructure for CodSpeed this year. This would require some refactoring of the current benchmarks from ASV to |
I recently started using CodSpeed for ndpyramid after reading your comment and it seems really neat! I agree that it could work well for xbatcher, since the necessity for locally running the benchmarks is a barrier to use. It's also nice that the same code can be used for tests and benchmarks. Fully support you trying it for xbatcher! |
Xbatcher is meant to make it easy to generate batches from Xarray datasets and feed them into machine learning libraries. As we wrote in its roadmap, we are also considering various options to improve batch generation performance. I think it's clear to everyone that naively looping through arbitrary xarray datasets will not be sufficiently performant for most applications (see #37 for examples / discussion). We need tools/models/etc. to handle things like caching, shuffling, and parallel loading, and we need a framework to evaluate the performance benefits of added features.
Proposal
Before we start optimizing xbatcher, we should develop a framework for evaluating performance benefits. I propose we set up ASV and develop a handful of basic batch generation benchmarks. ASV is used by Xarray and a bunch of other related projects. It allows writing custom benchmarks like this:
example 1:
We could do the same here, but with a focus on batch generation. As we talk about adding performance optimizations, I think this is the only way we can begin to evaluate their benefits.
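To make the idea concrete, here is a rough sketch of what a batch generation benchmark could look like in ASV's class-based style. It is not taken from any existing suite: the class name, the parameter values, and the `iter_batches` helper are all hypothetical, and the helper is a plain-Python stand-in so the snippet runs without xarray or xbatcher installed. A real benchmark would build an Xarray dataset in `setup` and iterate over xbatcher's batch generator in the timed method.

```python
# Sketch of an ASV-style benchmark module (file name and contents are illustrative).
# ASV calls `setup` before timing and times every method whose name
# starts with `time_`.


def iter_batches(data, batch_size):
    """Hypothetical stand-in for a batch generator: yield consecutive slices."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


class BatchGeneration:
    # ASV runs each benchmark once per parameter value.
    params = [10, 100, 1000]
    param_names = ["batch_size"]

    def setup(self, batch_size):
        # Build the input outside the timed region
        # (stand-in for constructing an Xarray dataset).
        self.data = list(range(100_000))

    def time_iterate_batches(self, batch_size):
        # Only the body of this method is timed.
        for _ in iter_batches(self.data, batch_size):
            pass
```

Parameterizing over batch size (and later over things like caching or shuffling options) would let us compare the same workload before and after each optimization.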