How we do benchmarking #120

Open
nstarman opened this issue May 15, 2024 · 11 comments

nstarman commented May 15, 2024

Is there a reason, beyond historical, that we use asv over pytest-benchmark?
Looking at the two tools, pytest-benchmark has 1.5x more stars and 8x greater usage (as measured by GitHub dependents). Also, pytest-benchmark integrates into our existing pytest framework, so this repo could pull tests directly from astropy's test suite using a pytest mark (e.g. pytest.mark.benchmark_only).
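
For concreteness, a minimal sketch of what that could look like, assuming a custom benchmark_only marker registered in this repo's pytest configuration (the marker name is the one proposed above, not an existing astropy marker) and pytest-benchmark's benchmark fixture:

```python
# Hypothetical benchmark test selected via a custom marker; ``benchmark_only``
# would need to be registered in pytest.ini / conftest.py.
import numpy as np
import pytest

from astropy import units as u


@pytest.mark.benchmark_only
def test_quantity_scaling(benchmark):
    q = np.arange(1_000_000, dtype=float) * u.km
    # ``benchmark`` is the fixture pytest-benchmark provides: it calls the
    # wrapped callable repeatedly, records timing statistics, and returns
    # the callable's result.
    result = benchmark(lambda: (q * 2).to(u.m))
    assert result.unit == u.m
```

Selecting only these tests would then just be `pytest -m benchmark_only`, with everything else running as a normal test suite.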


Cadair commented May 15, 2024

As I very recently discovered, the big feature asv has that pytest-benchmark does not is the ability to do memory-usage benchmarking as well.
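
For reference, a minimal sketch of how asv does this, as I understand its naming conventions (the workload is illustrative, not one of our actual benchmarks):

```python
# asv conventions: ``time_*`` methods are timed, ``mem_*`` methods report the
# size of the returned object, and ``peakmem_*`` methods report the peak
# resident memory of the process while the method runs.
import numpy as np

from astropy import units as u


class QuantityMemorySuite:
    def setup(self):
        # asv runs ``setup`` before each benchmark in the class.
        self.data = np.arange(1_000_000, dtype=float)

    def mem_quantity(self):
        # asv records the in-memory size of this returned Quantity.
        return self.data * u.km

    def peakmem_quantity_arithmetic(self):
        # asv records the peak memory used by the process during this call.
        q = self.data * u.km
        (q ** 2).to(u.m ** 2)
```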


nstarman commented May 15, 2024

https://github.com/bloomberg/pytest-memray is a thin wrapper connecting https://github.com/bloomberg/memray (12K stars) to pytest.

Timing tests and memory benchmark tests are often two different tests, so IMO it's fine to use 2 popular tools, one for each.
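
A minimal sketch of what the memory side could look like with pytest-memray (the limit below is an arbitrary illustration, not a real budget for any astropy code path):

```python
# pytest-memray provides a ``limit_memory`` marker; the test fails if the
# allocations made while it runs exceed the declared threshold.
import numpy as np
import pytest

from astropy import units as u


@pytest.mark.limit_memory("100 MB")
def test_quantity_allocation_budget():
    q = np.arange(1_000_000, dtype=float) * u.km
    assert q.unit == u.km
```

As I understand it, the tracking is activated by running pytest with the plugin's --memray flag.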


pllim commented May 15, 2024

Still not clear to me whether the pytest way can benchmark over a long period of time like asv can.


nstarman commented May 15, 2024

There is native capability for this: https://pytest-benchmark.readthedocs.io/en/latest/comparing.html. But this is also where #117 comes in: it would ingest the saved data from pytest-benchmark and present it in a more granular and explorable way. It's much the same as how we do code coverage, where pytest measures coverage but we use a service (codecov.io) to better present and assess the data.
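
A rough sketch of the kind of post-processing a #117-style service could do with the JSON files pytest-benchmark saves (e.g. via --benchmark-autosave); the field names ("benchmarks", "fullname", "stats"/"mean") are my reading of the current saved format and should be double-checked:

```python
import json
from pathlib import Path


def mean_times(path: Path) -> dict[str, float]:
    """Map each benchmark's full name to its mean runtime in seconds."""
    data = json.loads(path.read_text())
    return {b["fullname"]: b["stats"]["mean"] for b in data["benchmarks"]}


def regressions(old: Path, new: Path, factor: float = 1.2) -> list[str]:
    """Benchmarks whose mean time grew by more than ``factor`` between runs."""
    before, after = mean_times(old), mean_times(new)
    return [
        name
        for name, t in after.items()
        if name in before and t > factor * before[name]
    ]
```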


pllim commented May 15, 2024

Given the dissatisfaction with Codecov since it was bought out, I am still unsure...


nstarman commented May 15, 2024

But with common open-source report formats we can switch visualization methods. For coverage, the tools use the same standard, so we could switch from Codecov to Coveralls if we wanted. pytest-benchmark is similar in that people are building on its output format, since it integrates with pytest, so we can use pytest-benchmark out of the box or augment it. I find these concerns to be an argument in favor of using swappable tools and things that hook into standard frameworks!


Cadair commented May 16, 2024

What other options exist for actually visualising the data? If there are already well-maintained options, I would feel a lot happier about it.

On the memory vs. timing question: while I agree the benchmarks are likely to be different, running two different pytest plugins and (presumably?) two different visualisation/reporting tools feels like more effort. Maybe that's worth it, but I am not familiar enough with it all to know.

@astrofrog

Just to put it on the table: if we wanted, we could use pytest-benchmark to define/run the benchmarks and asv for visualization - see the discussion here - basically we would 'just' need to convert pytest-benchmark JSON to asv JSON.
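
To give a feel for the shape of that conversion, a rough sketch that does not claim to reproduce asv's actual result-file schema (which has additional required fields such as machine and environment metadata); the paths and keys here are illustrative assumptions:

```python
# Illustrative only: reads a pytest-benchmark JSON file and emits a
# name -> mean-time mapping in an asv-like layout. A real converter would
# have to target asv's actual result schema, which is not reproduced here.
import json
from pathlib import Path


def pytest_benchmark_to_asv_like(path: Path, commit_hash: str) -> dict:
    data = json.loads(path.read_text())
    return {
        "commit_hash": commit_hash,  # asv keys results by commit
        "results": {
            b["fullname"]: b["stats"]["mean"] for b in data["benchmarks"]
        },
    }


if __name__ == "__main__":
    out = pytest_benchmark_to_asv_like(
        Path(".benchmarks/latest.json"),  # hypothetical path to a saved run
        commit_hash="abc1234",
    )
    print(json.dumps(out, indent=2))
```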

@astrofrog

Interesting comment in that discussion

@nstarman

Pandas also has a good discussion about asv: pandas-dev/pandas#45049


astrofrog commented May 16, 2024

Looks like speed.python.org uses codespeed, not to be confused with https://codspeed.io (which apparently must be to measure the speed of fish)
