gh-116968: Reimplement Tier 2 counters #117144
Conversation
Results of the benchmark are available here: 1% faster geom. mean, 0% faster HPT, 1% less memory. Though notably this strongly improves a lot of the interpreter-heavy benchmarks. https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20240321-3.13.0a5+-716c0c6-JIT

Additionally, concerning the results, it's probably safe to say this PR makes things better, but given that #116206 increased the overall benchmark times from 1h15 to 2h30, they are probably noisier than usual.
I missed that. Do we run every benchmark twice for incremental GC? Or did that make CPython twice as slow? Regardless, it seems unfortunate that the benchmarks now take 2h30.

Regarding the benchmark numbers, I'm guessing the improvements come from not wasting so much time on fruitless efforts like in hexiom. And possibly because the

There's still a lot of cleanup to do in this PR. @markshannon What do you think of my general approach?
Nothing changed in how we run the benchmarks -- #116206 just seems to be a large regression overall, though more than made up by the follow-up in #117120. Once that's merged we should hopefully have working pystats and closer-to-baseline timings again.

Exciting!
Regarding the WASI failures, "call stack exhausted" means a stack overflow. Our WASI builds have a stack of about 8MB. From the notes in the build script, that's derived from the stack size on Linux. There are two possibilities with the failures here:
In either case I'd normally expect a new "call stack exhausted" crash to indicate that we were already close to the limit. I'm not sure how well that does or doesn't apply for this PR. That said, given that the stack size on WASI is meant to be similar to Linux, I'd expect a problem on WASI to manifest on Linux too. Perhaps WASI is simply our canary in a coal mine here?

The next step is probably to take a look at how close we are getting to the stack size on Linux (since we know we're hitting it on WASI). If we're not getting close then we'll need to see what's so special about WASI here.

For similar failures see https://github.com/python/cpython/issues?q=is%3Aissue+%22call+stack+exhausted%22. There were cases where lowering the Python recursion limit was the solution. However, I don't think that applies here. CC @brettcannon
Also note that 226 tests did pass. |
This is still in draft mode. Here's my plan:
Once that's all done (EDIT: and the tests pass) I'll request reviews. |
This reverts commit 8f79a60.
This changes a lot of things but the end result is arguably better.
- The initial exit temperature is 64; this must be greater than the specialization cooldown value (52), otherwise we might create a trace before we have re-specialized the Tier 1 bytecode.
- There's now a handy helper function for every counter initialization.
I'm starting 3 benchmarks:
I also have prepared a blurb, but I'll merge it only when I have something else to merge (to save CI resources):

> Introduce a unified 16-bit backoff counter type (``_Py_BackoffCounter``), shared between the Tier 1 adaptive specializer and the Tier 2 optimizer. The API used for adaptive specialization counters is changed but the behavior is (supposed to be) identical.
>
> The behavior of the Tier 2 counters is changed:
>
> - There are no longer dynamic thresholds (we never varied these).
> - All counters now use the same exponential backoff.
> - The counter for ``JUMP_BACKWARD`` starts counting down from 16.
> - The ``temperature`` in side exits starts counting down from 64.
```c
@@ -477,13 +473,9 @@ write_location_entry_start(uint8_t *ptr, int code, int length)
#define ADAPTIVE_COOLDOWN_VALUE 52
#define ADAPTIVE_COOLDOWN_BACKOFF 0

#define MAX_BACKOFF_VALUE (16 - ADAPTIVE_BACKOFF_BITS)

static inline uint16_t
adaptive_counter_bits(uint16_t value, uint16_t backoff) {
```
I could dispense with `adaptive_counter_bits()`, using `make_backoff_counter()` directly below, but I would oppose getting rid of the `cooldown()` and `warmup()` helpers, because they are used in several/many places.
```diff
@@ -89,7 +89,7 @@ static inline uint16_t uop_get_error_target(const _PyUOpInstruction *inst)

 typedef struct _exit_data {
     uint32_t target;
-    int16_t temperature;
+    _Py_BackoffCounter temperature;
```
`temperature` is now a bit of a misnomer, since it counts down. Maybe it should be renamed to `counter` (same as in `CODEUNIT`)?
I'm fine with either.
I'll keep `temperature`, despite the misnomer -- it stands out and makes it easy to grep for this particular concept.
```c
{
    return counter.value == 0;
}

static inline uint16_t
initial_backoff_counter(void)
/* Initial JUMP_BACKWARD counter.
```
There's a lot of boilerplate for these initial values; I followed your lead for `adaptive_counter_warmup()` and `adaptive_counter_cooldown()`, more or less.

Benchmarking results comment (will update as they complete):
Looks good. Thanks for doing this.

A possible further improvement (for another PR) would be to make the code generator aware of counters, changing

```c
specializing op(_SPECIALIZE_TO_BOOL, (counter/1, value -- value)) {
    if (ADAPTIVE_COUNTER_TRIGGERS(counter)) {
        ...
        ADVANCE_ADAPTIVE_COUNTER(this_instr[1].counter);
```

to

```c
specializing op(_SPECIALIZE_TO_BOOL, (counter/1, value -- value)) {
    if (backoff_counter_triggers(counter)) {
        ...
        advance_backoff_counter(counter);
```

by having the code generator generate:

```c
_Py_BackoffCounter *counter = &this_instr[1].counter;
```

instead of
This change appears to have broken building scipy.

Confirmed: scipy builds with 63bbe77.
Introduce a unified 16-bit backoff counter type (``_Py_BackoffCounter``), shared between the Tier 1 adaptive specializer and the Tier 2 optimizer. The API used for adaptive specialization counters is changed but the behavior is (supposed to be) identical.

The behavior of the Tier 2 counters is changed:

- There are no longer dynamic thresholds (we never varied these).
- All counters now use the same exponential backoff.
- The counter for ``JUMP_BACKWARD`` starts counting down from 16.
- The ``temperature`` in side exits starts counting down from 64.
- Fix a few places where we were not using atomics to (de)instrument opcodes.
- Fix a few places where we weren't using atomics to reset adaptive counters.
- Remove some redundant non-atomic resets of adaptive counters that presumably snuck in as merge artifacts of python#118064 and python#117144 landing close together.
This introduces a unified 16-bit backoff counter type (`_Py_BackoffCounter`), shared between the Tier 1 adaptive specialization machinery and the Tier 2 optimizer. The latter's side exit temperature now uses exponential backoff and starts at an initial value of 64, to avoid creating side exit traces for code that hasn't been respecialized yet (since respecialization only happens after the cooldown counter has reached zero from an initial value of 52).

The threshold value for back-edge optimizations is no longer dynamic; we just use a backoff counter initialized to (16, 4).