
Get the Pyston macrobenchmarks working on main #175

Open
ericsnowcurrently opened this issue Dec 13, 2021 · 42 comments
Assignees
Labels
benchmarking Anything related to measurement: Adding new benchmarks, benchmarking infrastructure etc.

Comments

@ericsnowcurrently
Collaborator

ericsnowcurrently commented Dec 13, 2021

@gvanrossum
Collaborator

@brandtbucher Could you contribute the specific list of benchmark failures that you've experienced? Maybe we need a separate issue for each (unless they're all just "needs to use the fixed Cython release" :-).

@brandtbucher
Member

Yep!

2021-12-11 01:49:11,016: 9 benchmarks failed:
2021-12-11 01:49:11,016: - aiohttp
2021-12-11 01:49:11,016: - djangocms
2021-12-11 01:49:11,016: - genshi
2021-12-11 01:49:11,016: - gevent_hub
2021-12-11 01:49:11,016: - gunicorn
2021-12-11 01:49:11,016: - kinto
2021-12-11 01:49:11,016: - mypy
2021-12-11 01:49:11,016: - pylint
2021-12-11 01:49:11,016: - pytorch_alexnet_inference

Give me a moment to gather the individual failure messages...

@brandtbucher
Member

brandtbucher commented Dec 13, 2021

The logs are too long to leave in a comment. I've dumped the failures into a Gist instead:

https://gist.github.com/brandtbucher/9c1368c43f31b1d31c1efbd3d2fdf5e8

@gvanrossum
Collaborator

So the "gist" is:

  • aiohttp: can't build dependency uvloop
  • djangocms: ditto Pillow
  • gevent_hub: ditto greenlets
  • gunicorn: depends on aiohttp (!)
  • kinto: failed to build dep uWSGI
  • mypy: failure building typed-ast (IIRC I've seen a fix; maybe a faulty version pin?)
  • pytorch_alexnet_inference: can't find torch 1.5.1

Runtime failures:

  • genshi: instantiates types.CodeType() with incorrect parameters (should be a simple fix)
  • pylint: uses inspect.formatargspec; the 3.10 docs state: "Deprecated since version 3.5: Use signature() and Signature Object, which provide a better introspecting API for callables."
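(For context, both runtime failures come from version-sensitive introspection APIs. A minimal sketch of the forward-compatible replacements; the sample functions here are hypothetical, not the actual genshi/pylint code:)

```python
import inspect

# genshi-style breakage: calling types.CodeType(...) with positional
# arguments breaks whenever CPython adds a field (co_posonlyargcount in
# 3.8, co_qualname in 3.11).  Since 3.8, CodeType.replace() names only
# the fields being changed and survives such additions:
def sample():
    pass

renamed = sample.__code__.replace(co_name="renamed")
print(renamed.co_name)  # renamed

# pylint-style breakage: inspect.formatargspec() (deprecated since 3.5,
# removed in 3.11) can usually be replaced with inspect.signature():
def greet(name, punctuation="!"):
    return name + punctuation

print(inspect.signature(greet))  # (name, punctuation='!')
```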

@gvanrossum
Collaborator

I propose to create separate TODO issues for each. I suspect the "In Review" column of the Project board might come in handy once we start working down the list...

@brandtbucher
Member

brandtbucher commented Dec 13, 2021

@gvanrossum
Collaborator

Do you mind if I assign this to you? Somebody needs to track all those fixes, nudge them along if needed, and update the requirements.txt files in PyPerformance once the fixes have been released. Since you tracked down so many of these already maybe we can make it official that you are the "whip" for this topic.

@brandtbucher brandtbucher self-assigned this Dec 13, 2021
@brandtbucher
Member

Yup, I can herd these.

@JelleZijlstra
Contributor

mypy pins an outdated version of typed-ast. The current release (0.910) requires typed_ast >= 1.4.0, < 1.5.0, while typed-ast 1.5.1 has the fixes for 3.11.

typed-ast is an optional dependency for mypy, though, only needed on Python <3.8 or to typecheck Python 2 code. Perhaps you can get away with removing it from your benchmark.

@gvanrossum
Collaborator

Benchmarks usually pin some older version of a package since we want benchmark numbers to be comparable over time. (Though sometimes we have to just toss old data anyway.)

@ericsnowcurrently
Collaborator Author

Also see python/pyperformance#113.

@ericsnowcurrently
Collaborator Author

@brandtbucher, what is the status on this? I ran the benchmarks today and am still seeing all the same failures. (#257) What else needs to be done to get the benchmarks to run successfully?

@gvanrossum
Collaborator

Is this issue just about the pyston benchmarks?

@brandtbucher
Member

Is this issue just about the pyston benchmarks?

Yeah, it should probably be renamed.

Also, I haven't forgotten about this; I'm just prioritizing the inline caching stuff right now since that needs to be done before the beta freeze. I have a hunch that the Pyston macrobenchmarks will help us out a lot with 3.12, but not so much with 3.11.

@markshannon
Member

@kmod
OOI, do you have the full pyperformance results for Pyston? Something like https://github.com/faster-cpython/ideas/blob/main/main-vs-310.rst

@brandtbucher
Member

@brandt Is there hope for fixing the djangocms and kinto dependencies?

Wrong Brandt. 😉

I haven't looked at those since last year, so I'm not entirely sure. If I remember correctly, the Pillow issue can be fixed by apt-installing some required JPEG library?

Looking at my old gist of the failures, the kinto ones look like they would be fairly mechanical fixes for our older frame changes. Not positive, though.

@gvanrossum
Collaborator

Hm, I cannot get myself to claim that this is a priority before beta 1. Though it would be nice to have more numbers at the language summit, I don't think we should drop other work for it.

@kmod

kmod commented Apr 11, 2022

Here are the full Pyston results on pyperformance:

+-------------------------+---------------------+------------------+--------------+------------------------+
| Benchmark               | cpython-3.8.12.json | pyston-main.json | Change       | Significance           |
+=========================+=====================+==================+==============+========================+
| 2to3                    | 381 ms              | 253 ms           | 1.51x faster | Significant (t=289.82) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| chameleon               | 10.7 ms             | 5.60 ms          | 1.90x faster | Significant (t=360.60) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| chaos                   | 123 ms              | 54.0 ms          | 2.28x faster | Significant (t=326.78) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| crypto_pyaes            | 123 ms              | 75.0 ms          | 1.63x faster | Significant (t=517.55) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| deltablue               | 7.86 ms             | 3.86 ms          | 2.03x faster | Significant (t=197.85) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| django_template         | 55.4 ms             | 29.5 ms          | 1.88x faster | Significant (t=141.57) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| dulwich_log             | 104 ms              | 62.4 ms          | 1.66x faster | Significant (t=245.64) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| fannkuch                | 537 ms              | 293 ms           | 1.83x faster | Significant (t=624.22) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| float                   | 132 ms              | 70.1 ms          | 1.88x faster | Significant (t=318.20) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| genshi_text             | 33.1 ms             | 17.3 ms          | 1.91x faster | Significant (t=125.83) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| genshi_xml              | 70.1 ms             | 39.9 ms          | 1.76x faster | Significant (t=161.55) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| go                      | 284 ms              | 152 ms           | 1.87x faster | Significant (t=244.38) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| hexiom                  | 11.1 ms             | 5.02 ms          | 2.20x faster | Significant (t=496.59) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| html5lib                | 104 ms              | 47.1 ms          | 2.21x faster | Significant (t=92.37)  |
+-------------------------+---------------------+------------------+--------------+------------------------+
| json_dumps              | 14.7 ms             | 11.8 ms          | 1.24x faster | Significant (t=132.17) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| json_loads              | 28.7 us             | 28.5 us          | 1.01x faster | Not significant        |
+-------------------------+---------------------+------------------+--------------+------------------------+
| logging_format          | 10.9 us             | 5.00 us          | 2.17x faster | Significant (t=182.81) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| logging_silent          | 216 ns              | 95.1 ns          | 2.28x faster | Significant (t=154.70) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| logging_simple          | 9.80 us             | 4.62 us          | 2.12x faster | Significant (t=116.45) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| mako                    | 17.8 ms             | 9.61 ms          | 1.85x faster | Significant (t=503.72) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| meteor_contest          | 122 ms              | 98.0 ms          | 1.24x faster | Significant (t=239.17) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| nbody                   | 151 ms              | 55.0 ms          | 2.75x faster | Significant (t=563.10) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| nqueens                 | 107 ms              | 66.4 ms          | 1.62x faster | Significant (t=451.94) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| pathlib                 | 22.8 ms             | 19.1 ms          | 1.19x faster | Significant (t=82.63)  |
+-------------------------+---------------------+------------------+--------------+------------------------+
| pickle                  | 11.3 us             | 10.3 us          | 1.09x faster | Significant (t=18.35)  |
+-------------------------+---------------------+------------------+--------------+------------------------+
| pickle_dict             | 27.8 us             | 28.1 us          | 1.01x slower | Not significant        |
+-------------------------+---------------------+------------------+--------------+------------------------+
| pickle_list             | 3.57 us             | 3.61 us          | 1.01x slower | Not significant        |
+-------------------------+---------------------+------------------+--------------+------------------------+
| pickle_pure_python      | 498 us              | 234 us           | 2.13x faster | Significant (t=207.23) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| pidigits                | 206 ms              | 206 ms           | 1.00x faster | Not significant        |
+-------------------------+---------------------+------------------+--------------+------------------------+
| pyflate                 | 758 ms              | 420 ms           | 1.80x faster | Significant (t=335.77) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| python_startup          | 10.8 ms             | 9.98 ms          | 1.08x faster | Significant (t=171.85) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| python_startup_no_site  | 7.12 ms             | 6.90 ms          | 1.03x faster | Significant (t=58.10)  |
+-------------------------+---------------------+------------------+--------------+------------------------+
| raytrace                | 543 ms              | 264 ms           | 2.06x faster | Significant (t=424.18) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| regex_compile           | 202 ms              | 88.5 ms          | 2.28x faster | Significant (t=349.36) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| regex_dna               | 195 ms              | 210 ms           | 1.08x slower | Significant (t=-93.74) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| regex_effbot            | 3.13 ms             | 3.31 ms          | 1.06x slower | Significant (t=-17.33) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| regex_v8                | 24.5 ms             | 23.6 ms          | 1.04x faster | Significant (t=16.80)  |
+-------------------------+---------------------+------------------+--------------+------------------------+
| richards                | 76.8 ms             | 42.4 ms          | 1.81x faster | Significant (t=72.64)  |
+-------------------------+---------------------+------------------+--------------+------------------------+
| scimark_fft             | 406 ms              | 233 ms           | 1.74x faster | Significant (t=571.42) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| scimark_lu              | 167 ms              | 77.2 ms          | 2.17x faster | Significant (t=199.71) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| scimark_monte_carlo     | 118 ms              | 45.5 ms          | 2.60x faster | Significant (t=415.92) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| scimark_sor             | 226 ms              | 94.4 ms          | 2.39x faster | Significant (t=264.28) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| scimark_sparse_mat_mult | 5.30 ms             | 3.16 ms          | 1.68x faster | Significant (t=628.21) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| spectral_norm           | 154 ms              | 97.1 ms          | 1.59x faster | Significant (t=637.32) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| sqlalchemy_declarative  | 189 ms              | 133 ms           | 1.43x faster | Significant (t=64.29)  |
+-------------------------+---------------------+------------------+--------------+------------------------+
| sqlalchemy_imperative   | 29.2 ms             | 15.7 ms          | 1.87x faster | Significant (t=84.83)  |
+-------------------------+---------------------+------------------+--------------+------------------------+
| sqlite_synth            | 3.38 us             | 2.87 us          | 1.18x faster | Significant (t=48.58)  |
+-------------------------+---------------------+------------------+--------------+------------------------+
| sympy_expand            | 616 ms              | 337 ms           | 1.83x faster | Significant (t=169.36) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| sympy_integrate         | 27.3 ms             | 18.2 ms          | 1.50x faster | Significant (t=126.03) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| sympy_str               | 376 ms              | 211 ms           | 1.78x faster | Significant (t=167.17) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| sympy_sum               | 218 ms              | 133 ms           | 1.65x faster | Significant (t=250.17) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| telco                   | 7.26 ms             | 5.71 ms          | 1.27x faster | Significant (t=77.84)  |
+-------------------------+---------------------+------------------+--------------+------------------------+
| tornado_http            | 180 ms              | 115 ms           | 1.56x faster | Significant (t=81.05)  |
+-------------------------+---------------------+------------------+--------------+------------------------+
| unpack_sequence         | 71.4 ns             | 37.8 ns          | 1.89x faster | Significant (t=140.30) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| unpickle                | 16.2 us             | 15.3 us          | 1.06x faster | Significant (t=3.94)   |
+-------------------------+---------------------+------------------+--------------+------------------------+
| unpickle_list           | 5.54 us             | 5.13 us          | 1.08x faster | Significant (t=30.67)  |
+-------------------------+---------------------+------------------+--------------+------------------------+
| unpickle_pure_python    | 371 us              | 183 us           | 2.02x faster | Significant (t=247.67) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| xml_etree_generate      | 102 ms              | 67.2 ms          | 1.51x faster | Significant (t=207.40) |
+-------------------------+---------------------+------------------+--------------+------------------------+
| xml_etree_iterparse     | 115 ms              | 91.3 ms          | 1.26x faster | Significant (t=67.72)  |
+-------------------------+---------------------+------------------+--------------+------------------------+
| xml_etree_parse         | 174 ms              | 160 ms           | 1.08x faster | Significant (t=25.69)  |
+-------------------------+---------------------+------------------+--------------+------------------------+
| xml_etree_process       | 80.6 ms             | 50.5 ms          | 1.60x faster | Significant (t=174.50) |
+-------------------------+---------------------+------------------+--------------+------------------------+

@gvanrossum
Collaborator

@kmod We are supposed to have the pyperformance issues ironed out in the next release: 1.0.5, hopefully this week. To make sure that we're not overlooking anything, could you try to run that one with pyperformance checked out from its repo head? (That should work. If it doesn't we may have to hold up the 1.0.5 release until we've fixed the new issue.)

We finally released pyperformance 1.0.5, which should be able to build and run all benchmarks. So now you should be able to update the pyston speed center with one more bar. :-)

@mdboom
Contributor

mdboom commented Jun 8, 2022

I'm working on picking this back up.

It seems like there are two categories of issues: (1) dependencies not yet compatible with Python 3.11, and (2) the issue of measuring a webserver running in a separate process.

For (1), that's just fiddly work, and we can probably produce some custom wheels and put them somewhere until packages are updated.

For (2), @kmod, did you get anywhere with this? I can at least attempt to carry the baton forward.

I think these could be addressed by starting up a single webserver and using it for all calls to the benchmarking function. I tried a quick hack of this but it didn't work due to the multi-process architecture of pyperf/pyperformance.
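(As a rough illustration of that single-server idea, using a stdlib HTTP server as a stand-in for the real benchmark workloads; this is not the actual pyperf integration, which is complicated by the multi-process worker model:)

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # silence per-request logging

# Start one server and reuse it for every timed call, rather than
# paying the server startup cost inside each benchmark run.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

def bench_requests(loops):
    # Shaped like a pyperf time function: run the workload `loops`
    # times and return the total elapsed time.
    t0 = time.perf_counter()
    for _ in range(loops):
        urllib.request.urlopen(url).read()
    return time.perf_counter() - t0

elapsed = bench_requests(10)
server.shutdown()
```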

@kmod

kmod commented Jun 8, 2022

Yes! This issue is addressed in the latest version of our macrobenchmarks repo. I still consider my solution a hack, but chatting briefly with @ericsnowcurrently we didn't come up with a proper solution that seemed worth the effort.

Anyway, while it's a bit hacky, it does work, and I switched from our custom script to using pyperformance for the numbers I collected in our blog post today.

@mdboom
Contributor

mdboom commented Jun 9, 2022

Great news!

Maybe I'm just doing something wrong -- I think there still might be guards against running the benchmarks in pyperformance.

Running

python3 -m pyperformance run --manifest $PWD/benchmarks/MANIFEST -b all

I still get

Exception: pyperformance doesn't support the pyston macrobenchmark suite yet

from most of the benchmarks.

@kmod

kmod commented Jun 9, 2022

Oh, sorry: we only look at WEB_MANIFEST on a daily basis, so I'm not sure the other ones are all properly migrated at this point.

I believe though that that particular error message should be gone with my recent commits; are you maybe using the macrobenchmarks submodule from our main pyston repo? The PR to update the submodule commit hasn't been merged yet so that one will still give you this message.

@kmod

kmod commented Jun 9, 2022

Separately, I'll work on migrating the other benchmarks that need it.

@mdboom
Contributor

mdboom commented Jun 9, 2022

Sorry -- it looks like many of these were fixed just in the last 2 days and I hadn't updated my fork.

@mdboom mdboom moved this from Todo to In Progress in Fancy CPython Board Jun 10, 2022
@mdboom mdboom self-assigned this Jun 10, 2022
@mdboom
Contributor

mdboom commented Jun 13, 2022

A status update:

I have all but three of the benchmarks working with Python 3.11. You can see my changes/hacks that were necessary here.

Benchmarks requiring changes

aiohttp (DONE: pyston/python-macrobenchmarks#8), gunicorn: (PR: pyston/python-macrobenchmarks#9)

  • Re-Cythonizing the dependencies uvloop and yarl.

mypy: (DONE: pyston/python-macrobenchmarks#7)

  • Update mypy to 0.961 and the corresponding requirements.
  • Pass clean_exit=True to mypy to prevent it from tearing down the process quickly with os._exit and then missing the opportunity to report the benchmark timing.
  • Remove the mypyc benchmark, which is identical except for running fewer loops but reports the same "name" to bench_time_func, raising an error.

pylint: (DONE: Submitted upstream)

  • Update wrapt to 1.14.1

Benchmarks still not working

gevent_hub:

  • greenlet's git main works on Python 3.11; however, gevent needs API changes to make this work. Waiting to see whether the 3.11 changes to greenlet will be backported to a stable branch.
  • Lots of detail on what's blocking this here

kinto:

  • uwsgi requires changes to replace deprecated C API calls. I was able to get it building and passing uwsgi's test suite; however, the benchmark itself doesn't seem to start nginx successfully. I don't know whether those two things are related.

pytorch_alexnet_inference:

Haven't tackled this one yet.

@brandtbucher brandtbucher removed their assignment Jun 20, 2022
@Lalufu

Lalufu commented Jun 27, 2022

@mdboom could you share what you did to make uwsgi build and work against 3.11?

@mdboom
Contributor

mdboom commented Jun 27, 2022

I have a branch here that compiles and passes the test suite, but I don't consider it a "mergeable" solution. In particular, I think the initialization of the interpreter via the new PyConfig API may not be correct: earlier versions let "initialization" things happen later, but the new API forces a stricter ordering that isn't really compatible with uwsgi's current logic.

@Lalufu

Lalufu commented Jun 27, 2022

Thanks, that's pretty helpful.

@Lalufu

Lalufu commented Jun 30, 2022

There's a PR on uwsgi for making 3.11 work coming from the Fedora side which looks roughly similar (at least to me): unbit/uwsgi#2453

@mdboom mdboom moved this from In Progress to Todo in Fancy CPython Board Jul 26, 2022
@mdboom mdboom added the benchmarking Anything related to measurement: Adding new benchmarks, benchmarking infrastructure etc. label Aug 2, 2022
@mdboom mdboom moved this from Todo to In Progress in Fancy CPython Board Aug 16, 2022
@stuaxo

stuaxo commented Sep 27, 2022

So the uwsgi patch mentioned above has been merged; if that works, that leaves just gevent, I guess?

@mdboom mdboom moved this from In Progress to Done in Fancy CPython Board Oct 31, 2022
@markshannon
Member

@mdboom Is this done now?

@mdboom
Contributor

mdboom commented Jan 8, 2023

No, the gevent stuff is still pending as far as I can tell.

@mdboom
Contributor

mdboom commented Feb 28, 2023

Still only waiting on gevent.
