Performance benchmarks #42

jhamman opened this issue Nov 18, 2021 · 6 comments

Comments


jhamman commented Nov 18, 2021

Xbatcher is meant to make it easy to generate batches from Xarray datasets and feed them into machine learning libraries. As we wrote in its roadmap, we are also considering various options to improve batch generation performance. I think it's clear to everyone that naively looping through arbitrary Xarray datasets will not be sufficiently performant for most applications (see #37 for examples / discussion). We need tools/models/etc. to handle things like caching, shuffling, and parallel loading, and we need a framework to evaluate the performance benefits of added features.

Proposal

Before we start optimizing xbatcher, we should develop a framework for evaluating performance benefits. I propose we set up ASV and develop a handful of basic batch generation benchmarks. ASV is used by Xarray and a number of other related projects. It allows writing custom benchmarks like this:

example 1:

import os

import numpy as np
import xarray as xr


class HugeAxisSmallSliceIndexing:
    # https://github.com/pydata/xarray/pull/4560
    def setup(self):
        # Create the test file once and reuse it across runs.
        self.filepath = "test_indexing_huge_axis_small_slice.nc"
        if not os.path.isfile(self.filepath):
            xr.Dataset(
                {"a": ("x", np.arange(10_000_000))},
                coords={"x": np.arange(10_000_000)},
            ).to_netcdf(self.filepath, format="NETCDF4")

        self.ds = xr.open_dataset(self.filepath)

    def time_indexing(self):
        # ASV times this method: a small slice into a huge axis.
        self.ds.isel(x=slice(100))

    def cleanup(self):
        self.ds.close()

We could do the same here, but with a focus on batch generation. As we talk about adding performance optimizations, I think this is the only way we can begin to evaluate their benefits.
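
For concreteness, a first xbatcher-oriented benchmark in the same ASV style might look roughly like the sketch below. The class name, dataset shape, and batch size are purely illustrative, and it assumes xbatcher's BatchGenerator interface (BatchGenerator(ds, input_dims=...)):

import numpy as np
import xarray as xr

import xbatcher


class TimeBatchGeneration:
    # Illustrative only: iterate through every batch of a synthetic dataset.
    def setup(self):
        self.ds = xr.Dataset(
            {"foo": (("time", "x"), np.random.rand(1_000, 100))},
            coords={"time": np.arange(1_000), "x": np.arange(100)},
        )

    def time_iterate_batches(self):
        # ASV times this method: generating and looping over all batches.
        bgen = xbatcher.BatchGenerator(self.ds, input_dims={"time": 10})
        for batch in bgen:
            pass

The same pattern could extend to ASV peakmem_ benchmarks or to parameterized setups (e.g. varying batch size or chunking).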


weiji14 commented Sep 1, 2022

Is there a way to have a public record of the benchmarks? I'm thinking of something like what https://codecov.io is to pytest-cov. I found airspeed-velocity/asv#796, which is a GitHub Actions-based solution, but was wondering if there's a nicer way to track performance over time on a line chart with each merged PR.

maxrjones commented

There's no current public record. I didn't prioritize publishing results because the lack of dedicated, consistent hardware seemed like a barrier to producing useful records. But https://labs.quansight.org/blog/2021/08/github-actions-benchmarks suggests that GitHub Actions could be sufficient to identify performance changes >50%.


weiji14 commented Sep 1, 2022

That's a really nice blog post, thanks for sharing! The GitHub Actions workflow doesn't look trivial to set up though 😅 I did find https://github.com/benchmark-action/github-action-benchmark, but they don't support asv (yet). Maybe we should find a way to piggyback onto https://pandas.pydata.org/speed/xarray?

maxrjones commented

After #168 we'll have a pretty good suite of benchmarks.

The following two tasks remain for closing out this issue:

  • Periodically run benchmarks in CI to identify any issues with the asv setup or performance regressions
  • Configure asv to compare new Xarray releases, since xbatcher's performance is so tied to Xarray's


weiji14 commented Jan 1, 2024

We're starting to experiment with using pytest-codspeed at PyGMT for benchmarking (see GenericMappingTools/pygmt#2910 and GenericMappingTools/pygmt#2908). CodSpeed seems to solve the problem of inconsistency by measuring CPU cycles and memory accesses instead of execution time, but this can be less intuitive in some cases, since more CPU cycles used doesn't always mean slower execution time.

If there's interest, I can help with setting up the CI infrastructure for CodSpeed this year. It would require some refactoring of the current benchmarks from ASV to pytest-benchmark, but it would allow us to track performance benchmarks publicly, as with Codecov (see https://codspeed.io/explore), rather than having to compare runs locally. Thoughts anyone?
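
For a sense of what that refactor would involve, here's a rough sketch of an ASV-style batch-generation benchmark rewritten as a pytest-benchmark test (my understanding is that pytest-codspeed can reuse the same benchmark fixture); the dataset shape and batch size are again just illustrative:

import numpy as np
import pytest
import xarray as xr

import xbatcher


@pytest.fixture
def sample_ds():
    # Small synthetic dataset; the shape is arbitrary.
    return xr.Dataset(
        {"foo": (("time", "x"), np.random.rand(1_000, 100))},
        coords={"time": np.arange(1_000), "x": np.arange(100)},
    )


def test_batch_generation(benchmark, sample_ds):
    def iterate_batches():
        # Generate and loop over all batches, mirroring the ASV benchmark.
        for batch in xbatcher.BatchGenerator(sample_ds, input_dims={"time": 10}):
            pass

    # pytest-benchmark times the callable; under CodSpeed it measures
    # CPU cycles and memory accesses instead of wall-clock time.
    benchmark(iterate_batches)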

maxrjones commented

I recently started using CodSpeed for ndpyramid after reading your comment and it seems really neat! I agree that it could work well for xbatcher, since needing to run the benchmarks locally is a barrier to use. It's also nice that the same code can be used for tests and benchmarks. Fully support you trying it for xbatcher!
