Each instruction is two codewords, and consists of "opcode, oparg, 0, 0" #100106

Closed
wants to merge 8 commits into from

iritkatriel
Member

This emits "opcode, oparg, 0, 0" for each instruction.
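
For readers who want to picture the layout, here is a minimal sketch of the encoding in Python. It is not the code from this PR. It assumes each of the four fields is one byte, so that an instruction fills two 16-bit codewords, and that `oparg` fits in a single byte. The names `encode` and `decode` and the opcode numbers in the usage note are hypothetical.

```python
from typing import Iterable, List, Tuple

CODEWORD_SIZE = 2               # bytes per codeword (assumption: 16-bit code units)
INSTR_SIZE = 2 * CODEWORD_SIZE  # two codewords per instruction


def encode(instrs: Iterable[Tuple[int, int]]) -> bytes:
    """Emit "opcode, oparg, 0, 0" for each (opcode, oparg) pair."""
    out = bytearray()
    for opcode, oparg in instrs:
        # This sketch only handles values that fit in one byte each.
        if not (0 <= opcode <= 0xFF and 0 <= oparg <= 0xFF):
            raise ValueError("opcode and oparg must each fit in one byte")
        out += bytes((opcode, oparg, 0, 0))
    return bytes(out)


def decode(code: bytes) -> List[Tuple[int, int]]:
    """Recover (opcode, oparg) pairs, ignoring the two padding bytes."""
    if len(code) % INSTR_SIZE != 0:
        raise ValueError("code length must be a multiple of the instruction size")
    return [(code[i], code[i + 1]) for i in range(0, len(code), INSTR_SIZE)]
```

With placeholder opcode numbers, `encode([(100, 1), (83, 0)])` yields eight bytes, and `decode` returns the original pairs. The second codeword carries no information in this layout, so the bytecode is twice as long as with one codeword per instruction.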

I'm still debugging some test failures related to line numbers and tracing, but this works well enough to benchmark with pyperformance:

+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| Benchmark               | /home/benchmarking/BENCH/REQUESTS/req-compile-bench-1670439089-iritkatriel-linux/pyperformance-results.json.gz | /home/benchmarking/BENCH/REQUESTS/req-compile-bench-1670428040-iritkatriel-linux/pyperformance-results.json.gz |
+=========================+================================================================================================================+================================================================================================================+
| 2to3                    | 247 ms                                                                                                         | 255 ms: 1.03x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| async_generators        | 356 ms                                                                                                         | 360 ms: 1.01x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| async_tree_none         | 533 ms                                                                                                         | 541 ms: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| async_tree_cpu_io_mixed | 741 ms                                                                                                         | 762 ms: 1.03x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| async_tree_io           | 1.33 sec                                                                                                       | 1.34 sec: 1.01x slower                                                                                         |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| async_tree_memoization  | 636 ms                                                                                                         | 677 ms: 1.06x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| chameleon               | 6.57 ms                                                                                                        | 6.30 ms: 1.04x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| chaos                   | 67.3 ms                                                                                                        | 69.4 ms: 1.03x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| bench_thread_pool       | 769 us                                                                                                         | 785 us: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| coroutines              | 25.2 ms                                                                                                        | 25.9 ms: 1.03x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| crypto_pyaes            | 77.0 ms                                                                                                        | 74.9 ms: 1.03x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| deepcopy                | 329 us                                                                                                         | 335 us: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| deepcopy_reduce         | 2.86 us                                                                                                        | 2.95 us: 1.03x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| deepcopy_memo           | 34.3 us                                                                                                        | 34.9 us: 1.02x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| deltablue               | 3.24 ms                                                                                                        | 3.44 ms: 1.06x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| django_template         | 32.7 ms                                                                                                        | 33.3 ms: 1.02x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| docutils                | 2.49 sec                                                                                                       | 2.52 sec: 1.01x slower                                                                                         |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| dulwich_log             | 61.0 ms                                                                                                        | 61.9 ms: 1.02x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| fannkuch                | 380 ms                                                                                                         | 387 ms: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| float                   | 72.8 ms                                                                                                        | 76.6 ms: 1.05x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| genshi_text             | 20.6 ms                                                                                                        | 20.7 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| genshi_xml              | 47.9 ms                                                                                                        | 47.4 ms: 1.01x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| go                      | 137 ms                                                                                                         | 143 ms: 1.05x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| hexiom                  | 6.11 ms                                                                                                        | 6.35 ms: 1.04x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| html5lib                | 59.0 ms                                                                                                        | 62.1 ms: 1.05x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| json_dumps              | 9.29 ms                                                                                                        | 9.34 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| logging_format          | 6.27 us                                                                                                        | 6.43 us: 1.03x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| logging_silent          | 91.6 ns                                                                                                        | 94.8 ns: 1.03x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| logging_simple          | 5.71 us                                                                                                        | 5.81 us: 1.02x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| mako                    | 9.73 ms                                                                                                        | 9.62 ms: 1.01x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| mdp                     | 2.51 sec                                                                                                       | 2.59 sec: 1.03x slower                                                                                         |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| nbody                   | 94.3 ms                                                                                                        | 90.2 ms: 1.05x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| nqueens                 | 83.3 ms                                                                                                        | 81.1 ms: 1.03x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| pickle                  | 10.1 us                                                                                                        | 10.2 us: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| pickle_dict             | 30.9 us                                                                                                        | 31.1 us: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| pickle_list             | 4.16 us                                                                                                        | 4.06 us: 1.02x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| pickle_pure_python      | 280 us                                                                                                         | 290 us: 1.04x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| pycparser               | 1.13 sec                                                                                                       | 1.12 sec: 1.02x faster                                                                                         |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| pyflate                 | 405 ms                                                                                                         | 425 ms: 1.05x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| python_startup          | 8.56 ms                                                                                                        | 8.59 ms: 1.00x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| python_startup_no_site  | 6.28 ms                                                                                                        | 6.31 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| raytrace                | 278 ms                                                                                                         | 284 ms: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| regex_compile           | 130 ms                                                                                                         | 133 ms: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| regex_dna               | 206 ms                                                                                                         | 202 ms: 1.02x faster                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| regex_effbot            | 3.76 ms                                                                                                        | 3.62 ms: 1.04x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| regex_v8                | 22.2 ms                                                                                                        | 21.9 ms: 1.02x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| richards                | 42.3 ms                                                                                                        | 43.4 ms: 1.03x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| scimark_fft             | 315 ms                                                                                                         | 310 ms: 1.02x faster                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| scimark_lu              | 106 ms                                                                                                         | 109 ms: 1.03x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| scimark_monte_carlo     | 68.3 ms                                                                                                        | 69.2 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| scimark_sor             | 105 ms                                                                                                         | 119 ms: 1.13x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| scimark_sparse_mat_mult | 4.24 ms                                                                                                        | 3.99 ms: 1.06x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| spectral_norm           | 99.4 ms                                                                                                        | 95.8 ms: 1.04x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sqlglot_parse           | 1.34 ms                                                                                                        | 1.36 ms: 1.02x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sqlglot_transpile       | 1.63 ms                                                                                                        | 1.65 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sqlglot_optimize        | 50.9 ms                                                                                                        | 51.3 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sqlglot_normalize       | 105 ms                                                                                                         | 106 ms: 1.01x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sqlite_synth            | 2.59 us                                                                                                        | 2.64 us: 1.02x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sympy_expand            | 454 ms                                                                                                         | 463 ms: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sympy_integrate         | 20.4 ms                                                                                                        | 20.9 ms: 1.02x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sympy_sum               | 163 ms                                                                                                         | 165 ms: 1.01x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sympy_str               | 281 ms                                                                                                         | 287 ms: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| telco                   | 6.32 ms                                                                                                        | 6.58 ms: 1.04x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| thrift                  | 763 us                                                                                                         | 750 us: 1.02x faster                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| unpack_sequence         | 42.1 ns                                                                                                        | 43.8 ns: 1.04x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| unpickle_list           | 4.93 us                                                                                                        | 4.98 us: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| unpickle_pure_python    | 202 us                                                                                                         | 214 us: 1.06x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| xml_etree_iterparse     | 106 ms                                                                                                         | 103 ms: 1.03x faster                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| xml_etree_generate      | 76.7 ms                                                                                                        | 77.2 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| xml_etree_process       | 53.1 ms                                                                                                        | 53.8 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| Geometric mean          | (ref)                                                                                                          | 1.01x slower                                                                                                   |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+

Benchmark hidden because not significant (13): bench_mp_pool, coverage, generators, json, json_loads, meteor_contest, mypy, pathlib, pidigits, pprint_safe_repr, pprint_pformat, unpickle, xml_etree_parse
Ignored benchmarks (3) of /home/benchmarking/BENCH/REQUESTS/req-compile-bench-1670428040-iritkatriel-linux/pyperformance-results.json.gz: aiohttp, gunicorn, tornado_http

@iritkatriel iritkatriel marked this pull request as draft December 8, 2022 10:45
Member

@gvanrossum gvanrossum left a comment

Cool work. So the doubling of the instruction size only costs us 1%. That means if we can realize the removal of LOAD/STORE_FAST and LOAD_CONST we should be able to gain quite a bit.

Do you envision we could do a gradual transition to the register world, where some instructions use registers and others still use the stack?

@iritkatriel
Member Author

> Do you envision we could do a gradual transition to the register world, where some instructions use registers and others still use the stack?

I think so. A register can be an index into the stack, and some opcodes can just push and pop as before. This makes the transition incremental.
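
A minimal sketch of that idea, assuming a toy interpreter rather than CPython's actual `ceval.c`: the locals and the evaluation stack share one flat array of slots, so a "register" is simply an index into that array. The `Frame` class, the `run` function, and the `ADD_REG` opcode are hypothetical names made up for illustration; the other opcode names are borrowed from CPython only as labels.

```python
# Toy model only: Frame, run and ADD_REG are hypothetical, not CPython APIs.
class Frame:
    def __init__(self, nlocals, consts):
        # One flat array of slots: locals first, evaluation stack on top.
        self.slots = [None] * nlocals
        self.consts = consts

    def push(self, value):
        self.slots.append(value)

    def pop(self):
        return self.slots.pop()


def run(frame, code):
    """Execute (opcode, a, b, c) tuples, mixing stack and register styles."""
    for op, a, b, c in code:
        if op == "LOAD_CONST":      # stack style: push a constant
            frame.push(frame.consts[a])
        elif op == "LOAD_FAST":     # stack style: push a local
            frame.push(frame.slots[a])
        elif op == "STORE_FAST":    # stack style: pop into a local
            frame.slots[a] = frame.pop()
        elif op == "BINARY_ADD":    # stack style: pop two, push the sum
            right = frame.pop()
            left = frame.pop()
            frame.push(left + right)
        elif op == "ADD_REG":       # register style: slots[c] = slots[a] + slots[b]
            frame.slots[c] = frame.slots[a] + frame.slots[b]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return frame


frame = Frame(nlocals=3, consts=(2, 3))
program = [
    ("LOAD_CONST", 0, 0, 0),
    ("STORE_FAST", 0, 0, 0),
    ("LOAD_CONST", 1, 0, 0),
    ("STORE_FAST", 1, 0, 0),
    ("ADD_REG", 0, 1, 2),      # register-style instruction
    ("LOAD_FAST", 2, 0, 0),    # stack-style instructions keep working
    ("LOAD_CONST", 0, 0, 0),
    ("BINARY_ADD", 0, 0, 0),
    ("STORE_FAST", 0, 0, 0),
]
run(frame, program)
```

Because both styles address the same array, individual opcodes could in principle be converted one at a time while the rest keep pushing and popping.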

@gvanrossum
Member

> I think so. A register can be an index into the stack, and some opcodes can just push and pop as before. This makes the transition incremental.

Sounds good. Maybe we should add that to faster-cpython/ideas#485 (or one of the other issues about registers?)

Python/ceval.c (outdated, resolved)
Member

@gvanrossum gvanrossum left a comment

Time to start making one simple instruction use an extra oparg? Without even optimizing LOAD/STORE -- we could just tackle UNARY_NEGATIVE and give it a second oparg that designates the destination, and make the compiler write the bytecode like that.
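
One possible reading of this suggestion, sketched as a toy model: the stack form of `dst = -src` needs three instructions, while a register-style form could carry the source in the first oparg and the destination in the second. `UNARY_NEGATIVE_R`, the helper functions, and the tuple layout are hypothetical and are not taken from the PR.

```python
# Toy model only: UNARY_NEGATIVE_R and these helpers are hypothetical.
def negate_stack_form(src, dst):
    """Stack-based sequence for `slots[dst] = -slots[src]`."""
    return [
        ("LOAD_FAST", src, 0),
        ("UNARY_NEGATIVE", 0, 0),
        ("STORE_FAST", dst, 0),
    ]


def negate_register_form(src, dst):
    """Single instruction: first oparg is the source, second the destination."""
    return [("UNARY_NEGATIVE_R", src, dst)]


def execute(code, slots):
    """Run (opcode, oparg1, oparg2) tuples over a copy of `slots`."""
    slots = list(slots)
    stack = []
    for op, a, b in code:
        if op == "LOAD_FAST":
            stack.append(slots[a])
        elif op == "STORE_FAST":
            slots[a] = stack.pop()
        elif op == "UNARY_NEGATIVE":
            stack.append(-stack.pop())
        elif op == "UNARY_NEGATIVE_R":
            slots[b] = -slots[a]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return slots


# Both forms compute the same result; the register form is one instruction.
stack_result = execute(negate_stack_form(0, 1), [5, None])
register_result = execute(negate_register_form(0, 1), [5, None])
```

In this model the saving comes entirely from dropping the surrounding load and store, which is the gain the earlier comment about removing LOAD/STORE_FAST anticipates.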

Lib/dis.py (resolved)
@@ -230,6 +230,9 @@ extern "C" {
#define NB_INPLACE_TRUE_DIVIDE 24
#define NB_INPLACE_XOR 25

/* number of codewords for opcode+oparg(s) */
#define OPSIZE 2
Member

I guess for now we're not contemplating the size depending on the opcode. Probably just as well.

Member Author

Yeah, it won’t be hard to change this macro if we decide to do that.
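
To illustrate the fixed-size layout under discussion, here is a rough sketch of encoding and decoding instructions as "opcode, oparg, 0, 0". It assumes a codeword is two bytes; `CODEWORD_BYTES`, `encode`, and `decode` are hypothetical names, and the opcode numbers in the example are arbitrary placeholders rather than real CPython opcode values.

```python
OPSIZE = 2          # codewords per instruction, mirroring the macro in the diff
CODEWORD_BYTES = 2  # assumption: one codeword is two bytes


def encode(instructions):
    """Pack (opcode, oparg) pairs; unused codewords are zero-filled."""
    out = bytearray()
    for opcode, oparg in instructions:
        out += bytes([opcode, oparg])                # first codeword
        out += bytes(CODEWORD_BYTES * (OPSIZE - 1))  # remaining codewords: zeros
    return bytes(out)


def decode(code):
    """Step through the byte string one fixed-size instruction at a time."""
    step = OPSIZE * CODEWORD_BYTES
    if len(code) % step:
        raise ValueError("code length is not a multiple of the instruction size")
    return [(code[i], code[i + 1]) for i in range(0, len(code), step)]


# Arbitrary placeholder opcode numbers, not real CPython values.
raw = encode([(100, 1), (83, 0)])
```

Since every consumer in this sketch derives its stride from `OPSIZE`, changing the constant changes the layout in one place. A size that depends on the opcode would need a per-opcode lookup in `decode` instead of a fixed stride.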

@iritkatriel
Member Author

I made a new PR with this stuff on today's version of main: #100276.
