Split micro-ops that have different behavior depending on low bit of oparg. #115457

markshannon · 2024-02-14T14:42:52Z

Splitting these micro-ops will improve performance by reducing the number of branches, the size of code generated, and the number of holes in the JIT stencils. There is no real downside; the increase in complexity at runtime is negligible and there isn't much increased complexity in the tooling.

Take _LOAD_ATTR_INSTANCE_VALUE as an example, since it is dynamically the most common:

    op(_LOAD_ATTR_INSTANCE_VALUE, (index/1, owner -- attr, null if (oparg & 1))) {
        ...

can be split into

    op(_LOAD_ATTR_INSTANCE_VALUE_0, (index/1, owner -- attr)) {
        assert((oparg & 1) == 0);
        ...

and

    op(_LOAD_ATTR_INSTANCE_VALUE_1, (index/1, owner -- attr, null)) {
        assert((oparg & 1) == 1);
        ...

Each of these is simpler, thus smaller and faster than the base version.
We can always choose one of the two split versions when projecting the trace, so we don't need an implementation of the base version at all. This means that the tier 2 interpreter and stencils aren't much bigger than before.
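
As a rough illustration of the selection step during trace projection, here is a minimal Python sketch. The names `SPLIT_UOPS` and `project_uop` are hypothetical and are not taken from the CPython sources; the sketch only shows that the low bit of oparg fully determines which split version is emitted, so the base uop never needs to appear in a trace.

```python
# Hypothetical sketch: SPLIT_UOPS and project_uop are illustrative names,
# not part of CPython's actual trace projection code.
SPLIT_UOPS = {"_LOAD_ATTR_INSTANCE_VALUE"}  # assumed set of uops marked for splitting


def project_uop(name: str, oparg: int) -> str:
    """Return the name of the uop to emit into the trace for (name, oparg)."""
    if name in SPLIT_UOPS:
        # The low bit of oparg selects the variant: _0 pushes only attr,
        # _1 pushes attr followed by null.
        return f"{name}_{oparg & 1}"
    # Uops that are not split are emitted unchanged.
    return name


print(project_uop("_LOAD_ATTR_INSTANCE_VALUE", 2))  # low bit clear
print(project_uop("_LOAD_ATTR_INSTANCE_VALUE", 3))  # low bit set
```

Because the choice depends only on `oparg & 1`, it can be made once at projection time rather than by a branch inside the uop body each time it runs.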

Linked PRs

GH-115457: Support splitting and replication of micro ops. #115558


markshannon · 2024-02-16T11:56:19Z

It makes sense to do replication at the same time as splitting.

By replication, I mean creating a copy of the replicated uop for each oparg in a given set.
This is not such an obvious win, as we need multiple stencils, but each stencil can be significantly smaller than the original.
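
To make the idea concrete, here is a minimal Python sketch of replication. The table `REPLICATED`, the oparg sets in it, and both helper functions are assumptions made for illustration, not the actual tooling: each oparg in the set gets its own specialized copy, and any other oparg falls back to the general uop.

```python
# Hypothetical sketch: the table contents and helper names are illustrative
# assumptions, not the real cases generator or optimizer.
REPLICATED = {
    "_LOAD_FAST": range(4),                # assumed set of replicated opargs
    "_INIT_CALL_PY_EXACT_ARGS": range(4),  # assumed set of replicated opargs
}


def replicas(name: str) -> list[str]:
    """List the specialized copies generated for a replicated uop."""
    return [f"{name}_{oparg}" for oparg in REPLICATED.get(name, ())]


def select_uop(name: str, oparg: int) -> str:
    """Pick the specialized copy when one exists, else the general uop."""
    if oparg in REPLICATED.get(name, ()):
        return f"{name}_{oparg}"
    return name


print(replicas("_LOAD_FAST"))
print(select_uop("_LOAD_FAST", 0))
print(select_uop("_LOAD_FAST", 100))
```

Unlike splitting, the general version has to stay, because only some opargs have a copy. That is why replication costs extra stencils.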

The best example is _INIT_CALL_PY_EXACT_ARGS which has a stencil of 753 bytes, whereas the stencil for _INIT_CALL_PY_EXACT_ARGS_0 is only 308 bytes, a ~60% saving.

_LOAD_FAST shows a smaller saving, from 45 bytes down to 31 for _LOAD_FAST_0.
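
For reference, the relative savings implied by those stencil sizes can be checked with a few lines of Python. The helper `saving` is just an illustrative name; the byte counts are the ones quoted above.

```python
def saving(before: int, after: int) -> float:
    """Fraction of stencil bytes saved by the specialized copy."""
    return (before - after) / before


# Stencil sizes in bytes, as quoted in this comment.
print(f"_INIT_CALL_PY_EXACT_ARGS_0: {saving(753, 308):.1%}")  # 59.1%
print(f"_LOAD_FAST_0: {saving(45, 31):.1%}")                  # 31.1%
```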


markshannon added the performance Performance or resource usage label Feb 14, 2024

markshannon mentioned this issue Feb 14, 2024

Things to do for 3.13 faster-cpython/ideas#654

Open

9 tasks

erlend-aasland added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Feb 14, 2024

markshannon self-assigned this Feb 15, 2024

bedevere-app bot mentioned this issue Feb 16, 2024

GH-115457: Support splitting and replication of micro ops. #115558

Merged

markshannon added a commit that referenced this issue Feb 20, 2024

GH-115457: Support splitting and replication of micro ops. (GH-115558)

626c414

markshannon closed this as completed Feb 21, 2024

woodruffw pushed a commit to woodruffw-forks/cpython that referenced this issue Mar 4, 2024

pythonGH-115457: Support splitting and replication of micro ops. (pyt…

12ba88b

…honGH-115558)

diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024

pythonGH-115457: Support splitting and replication of micro ops. (pyt…

567e3ca

…honGH-115558)
