
Baseline benchmarking tests #54

Merged: 5 commits merged into master from the benchmarking branch, Jul 7, 2017

Conversation

@goodboy (Contributor) commented Jul 5, 2017

This adds some basic speed tests for the _MultiCall loop.
The goal is to extend this out so that we avoid performance degradation when addressing #23 and #15.
The next set of tests will include the full code path(s) initiated by _HookCaller methods.
I was also thinking of adding collection speed tests.

Treat this PR as a first draft to get some opinions going.
So far the results are about what I'd expect: pypy killing it, with py27 and py36 as the runners-up.
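For readers unfamiliar with pytest-benchmark, here is a minimal, self-contained sketch of the test pattern being described; fake_multicall is only an illustrative stand-in for pluggy's _MultiCall loop, not the actual code in this PR:

import pytest


def hook(arg1, arg2, arg3):
    return arg1 + arg2 + arg3


def fake_multicall(methods, kwargs):
    # call each registered "hookimpl" with the shared keyword arguments
    return [method(**kwargs) for method in methods]


@pytest.mark.benchmark
def test_call_loop_speed(benchmark):
    # the `benchmark` fixture calls the target repeatedly and records timing stats
    benchmark(fake_multicall, [hook] * 10, {"arg1": 1, "arg2": 2, "arg3": 3})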

@RonnyPfannschmidt (Member) commented:

At first glance it looks good. Is there an easy way to publish the results?

@goodboy (Contributor, Author) commented Jul 5, 2017

@RonnyPfannschmidt good question.
The Travis CI runs contain the results in the console output, but I don't know if there's a better way.

Hopefully @ionelmc can comment.


@pytest.mark.benchmark
def test_hookwrapper_speed(benchmark):
    benchmark(inner_exec, [w1, w2])
@ionelmc (Member) commented on this snippet:

You could have something like

execute = MC(methods, {'arg1': 1, 'arg2': 2, 'arg3': 3}).execute
benchmark(execute, [w1, w2])

to avoid including the wrapper overhead in your timings. This would matter if the execute method is really really fast.

@goodboy (Contributor, Author) replied:

@ionelmc the wrapper overhead is actually more true to real call usage by a _HookCaller, which internally calls _HookCaller._hookexec(), which in turn eventually invokes PluginManager._inner_hookexec().

My next move once this stuff lands is to add benchmarks around those higher-level components so that we speed-test the full API code paths.
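As a rough sketch of what such a higher-level benchmark could look like (the spec/plugin names below are made up for illustration; this is not the code added later):

import pytest
from pluggy import PluginManager, HookspecMarker, HookimplMarker

hookspec = HookspecMarker("example")
hookimpl = HookimplMarker("example")


class Spec:
    @hookspec
    def myhook(self, arg1, arg2):
        pass


class Plugin:
    @hookimpl
    def myhook(self, arg1, arg2):
        return arg1 + arg2


@pytest.mark.benchmark
def test_hookcaller_speed(benchmark):
    pm = PluginManager("example")
    pm.add_hookspecs(Spec)
    pm.register(Plugin())
    # exercises the public path: _HookCaller.__call__ -> _hookexec -> _inner_hookexec
    benchmark(pm.hook.myhook, arg1=1, arg2=2)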

@ionelmc by the way, did you see our question about how to better publish the benchmarking results for CI or on GitHub? I noticed in your docs that you've got notes on how to render results to JSON, but is there any known way to present results here?

@ionelmc (Member) replied:

Yes, right. I don't know much about the pluggy internals.

About displaying results and running pytest-benchmark in CI ... well, it's not good. CI will run each build with lots of external bias, so timings aren't consistent. You could look at perf counters or profile data (my plugin only shows some profile data, not well integrated into the stats yet), or you could compare against a baseline (e.g. you have a test that only does 10 function calls to establish a baseline to compare against; if there's a perf regression you will get slower relative to the baseline).
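A small sketch of that baseline idea, with illustrative names and numbers (not code from this PR):

import pytest


def noop():
    pass


def ten_calls():
    # a fixed, trivial workload used purely as a reference point
    for _ in range(10):
        noop()


@pytest.mark.benchmark(group="baseline")
def test_baseline_ten_calls(benchmark):
    benchmark(ten_calls)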

Just to restate: saving benchmark data somewhere and comparing against previous runs would be unreliable due to virtualization and contention in Travis CI. We can talk about instruction counters I guess (since those shouldn't care about preemption, right?). What would be useful?

And testing on pypy further complicates things because of warmup (the plugin has such a feature but you need to set it correctly).
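A hedged sketch of enabling warmup through the marker for the PyPy case (the exact settings would still need tuning; this is not from the PR):

import pytest


def hot_loop():
    return sum(range(100))


# warmup lets the JIT stabilize before timed rounds are collected
@pytest.mark.benchmark(warmup=True)
def test_with_warmup(benchmark):
    benchmark(hot_loop)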

@goodboy (Contributor, Author) replied:

@ionelmc I didn't even think about the effects of the CI server... very good point.

or you could compare against a baseline (e.g. you have a test that only does 10 function calls to establish a baseline to compare against; if there's a perf regression you will get slower relative to the baseline)

Can you elaborate on this? Are you suggesting this approach will work in CI?

I think I'll still include a baseline from the machine I develop on at least as a reference.

We can talk about instruction counters I guess (since those shouldn't care about preemption, right?). What would be useful?

Yes, this would be super useful. Can I pull this from the profiling data? I'm guessing it can't be used with --benchmark-compare-fail?

@goodboy force-pushed the benchmarking branch 2 times, most recently from 4a2f399 to d8d6fdd on July 5, 2017 18:11
@nicoddemus (Member) commented:

I like the output, nice work!

How do we plan to track whether some change improves or degrades performance? Manually? Any idea if it would be possible to automate it without too much effort?

@goodboy (Contributor, Author) commented Jul 5, 2017

@nicoddemus I was planning a couple of progress-tracking ideas:

  • parameterize tests with different core-component implementations and group them to demonstrate perf differences (a sketch follows below)
  • commit snapshots of benchmark results and track regressions with the --benchmark-compare=NUM|ID and --benchmark-compare-fail=EXPR flags
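A hypothetical sketch of the first idea: parametrize one benchmark over competing implementations and group the results so pytest-benchmark reports them side by side. loop_v1 and loop_v2 below are illustrative stand-ins, not pluggy code:

import pytest


def hook(arg1, arg2):
    return arg1 + arg2


def loop_v1(methods, kwargs):
    # explicit loop variant
    results = []
    for method in methods:
        results.append(method(**kwargs))
    return results


def loop_v2(methods, kwargs):
    # list-comprehension variant
    return [method(**kwargs) for method in methods]


@pytest.mark.parametrize("impl", [loop_v1, loop_v2], ids=["v1", "v2"])
@pytest.mark.benchmark(group="call-loop")
def test_call_loop_impls(benchmark, impl):
    benchmark(impl, [hook] * 10, {"arg1": 1, "arg2": 2})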

I was going to add this in my next PR, which will attempt to address #23 with accompanying benchmarks to make everyone confident about the changes.

This PR should be viewed as just an initial (and non-invasive) introduction of pytest-benchmark.

@goodboy (Contributor, Author) commented Jul 5, 2017

Also note the appveyor builds are failing because of missing pypy?
Do we actually need the appveyor runs given we already have travis?

@nicoddemus (Member) commented:

@tgoodlet awesome, sounds like a good plan.

One idea: how about creating a new environment just to run the pytest-benchmark tests? That would avoid cluttering all runs with them, plus it would give us flexibility to run different comparisons on the CI.

@nicoddemus (Member) commented:

Also note the appveyor builds are failing because of missing pypy?
Do we actually need the appveyor runs given we already have travis?

Well, it is good practice, given that we also test all Python versions on both Linux and Windows. It should be working though; pytest basically uses the same mechanism.

@nicoddemus (Member) commented:

Actually pypy is installed; it is just crashing midway through. But as I mentioned earlier, I suggest we have a separate env just for the benchmark tests. This would solve this problem, plus we can choose to skip them on Windows.

@goodboy (Contributor, Author) commented Jul 5, 2017

@nicoddemus ah, I see the error now:
ERROR: InvocationError: 'C:\\projects\\pluggy\\.tox\\pypy-pytest30\\bin\\py.test.EXE testing/'
Any idea what's going on there? Are you saying it's due to the new tests?

@nicoddemus in terms of separate (tox) envs, that sounds fine to me, although won't we still be interested in performance diffs for each pytest and Python version anyway? I was also thinking it might make more sense to put all the perf tests in a separate test module to clearly isolate them, instead of relying solely on marks. I'll try to add the separate env later today.

@nicoddemus (Member) commented:

Any idea what's going on there - are you saying it's due to the new tests?

It seems to me the tests are terminating abruptly, which indicates a crash.

in terms of separate (tox) envs that sounds fine to me although won't we still be interested in performance diffs on each pytest and python version anyway?

We might have different benchmark envs (bench-py27-pytest28, etc.), although I feel that initially we should keep things simple with a single benchmark env; we want to keep benchmark logs in the repository for comparison, and adding another level of complexity might be overkill at this point.

I was also thinking maybe it makes more sense to stick all the perf tests in a separate test module clearly isolating them instead of relying solely on marks.

Good point. In fact we can name it so that pytest won't pick it up by default (benchmarks.py?), which makes sense, so the benchmark env would look like this:

[testenv:bench]
commands = py.test testing/benchmarks.py
deps =
    pytest
    pytest-benchmark

@ionelmc (Member) commented Jul 6, 2017

In fact we can name it so pytest won't pick it by default (benchmarks.py?), which makes sense, so the benchmark env would look like this

You can also stick --benchmark-disable or --benchmark-skip in addopts or somewhere for the same effect.

Tyler Goodlet added 3 commits July 7, 2017 01:40
This begins an effort to incorporate run-time speed tests using
`pytest-benchmark`. This initial test set audits the `_MultiCall`
loop with hookimpls, hookwrappers and the combination of both.
The intention is to eventually have a reliable set of tests which
enable making core component modifications without disrupting
performance as per #37.
@goodboy (Contributor, Author) commented Jul 7, 2017

@nicoddemus @RonnyPfannschmidt alright, I made the requested changes.
I added two Travis runtimes for py27 and py36 which can be used to track results.

If possible I'd like to get this PR in before adding more tests.
Let me know what you guys think!

@goodboy (Contributor, Author) commented Jul 7, 2017

Just made it even better by parametrizing over multiple hook/wrapper counts.

@nicoddemus (Member) left a review comment:

Nice work!

Just a note: you might want to consider pushing your branches to your fork instead of the main repository; this avoids running the CI services twice. That's our recommendation for pytest and its plugins, at least. 👍

@goodboy (Contributor, Author) commented Jul 7, 2017

you might want to consider to push to your branches to your fork instead of the main repository

@nicoddemus I'm fine with doing that if it's preferred. What do you mean by CI running twice? It seems to me it's the same either way?

@goodboy merged commit 9b2bc8c into master on Jul 7, 2017
@goodboy deleted the benchmarking branch on July 7, 2017 19:07
@nicoddemus (Member) commented:

What do you mean by CI running twice? Seems to me it's the same either way?

The end result is the same, it's just that Travis and AppVeyor will each run twice: once for the PR and once for the branch (at least in the current configuration). Also, this leaves the main repository cleaner, without topic branches. But really, it is no big deal; I'm mentioning it just because four builds might take longer than two. 👍
