
Tier 2 optimizer. #557

Open
markshannon opened this issue Mar 9, 2023 · 15 comments
Labels
epic-tier2-optimizer Linear code region optimizer for 3.13 and beyond.

Comments

@markshannon
Member

This is the top level issue for the tier 2 optimizer for CPython 3.13 and beyond.

Try to keep discussion here at a high level and discuss the details on the sub-issues.

The tier 2 optimizer has always promised to optimize larger regions than the tier 1 (PEP 659) optimizer. But we have been a bit vague as to what those regions would be.

In an earlier discussion, I referred to them as "projected short traces".
The term "trace" is a bit misleading, as it suggests some sort of recording of program execution.

The optimization I propose is more akin to basic block versioning (BBV) than the trace recording of PyPy.
However, instead of basic blocks, we would be optimizing dynamic superblocks.
The extent of the superblocks would be determined at runtime from profiling data gathered by the tier 1 interpreter.

The term "superblocks" might also be a bit misleading, as they might include inlined calls, but it's the best name I could come up with for now. We could call the tier 2 optimization "superblock versioning", as we intend to handle polymorphism in much the same way as BBV.

For this to work, we need to be able to do the following:

  • Modify the tier 1 interpreter to detect "hotspots" to determine where to start the dynamic superblocks.
  • Create the superblocks
  • Optimize the superblocks (it might make sense to merge some or all of this with the creation of the superblocks)
  • Deoptimize the superblocks (to enable more speculation in the optimizer)
  • Manage the superblocks, discarding cold ones and preventing memory use from growing too large.
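The first step above, hotspot detection, can be sketched in miniature. Here is a toy counter-based detector; the threshold, class, and method names are all invented for illustration and are not CPython's actual mechanism:

```python
# Hypothetical sketch of counter-based hotspot detection, loosely modelled on
# how a tier 1 interpreter might decide where to start a dynamic superblock.
# HOT_THRESHOLD and all names here are invented for illustration.

HOT_THRESHOLD = 16  # executions of a backward jump before projecting a superblock

class HotspotCounters:
    def __init__(self):
        self.counts = {}  # instruction offset -> execution count

    def record_backedge(self, offset):
        """Called on each backward jump. Returns True when the jump target
        is hot enough to be the entry of a new superblock."""
        n = self.counts.get(offset, 0) + 1
        self.counts[offset] = n
        if n >= HOT_THRESHOLD:
            self.counts[offset] = 0  # reset so we don't re-project immediately
            return True
        return False

counters = HotspotCounters()
# Simulate 40 executions of the same backward jump; projection fires
# every HOT_THRESHOLD-th execution.
fired = [i for i in range(40) if counters.record_backedge(100)]
print(fired)  # [15, 31]
```

A real implementation would also apply exponential backoff after a failed projection, but a plain counter is enough to show the shape of the idea.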
@markshannon markshannon added the epic-tier2-optimizer Linear code region optimizer for 3.13 and beyond. label Mar 9, 2023
@Fidget-Spinner
Collaborator

What's the general timeline you're looking for implementing all these?

@markshannon
Member Author

> What's the general timeline you're looking for implementing all these?

3.13, except the compiler, which will probably be 3.14.

@Fidget-Spinner
Collaborator

Fidget-Spinner commented Mar 10, 2023

Sorry I'm confused: which compiler are you referring to? The one used for the optimization passes on the tier 2 code?

@markshannon
Member Author

What is a "superblock"?

A superblock is a linear piece of code with one entry and multiple exits. It differs from an extended (basic) block in that it may duplicate some code.

Take this CFG:

[diagram: basic_blocks.gv]

With extended blocks the CFG would look like this:
[diagram: extended_blocks.gv]

Whereas with superblocks, we get this:
[diagram: super_blocks.gv]

Why use superblocks?

When considering regions for optimization, we want regions as large as possible, without excessive duplication.
But we also want regions that we can optimize quickly and effectively.
Linear sequences of code can be optimized quickly, but basic blocks are too small to be optimized effectively.
So we choose superblocks.

In the example above, the duplication of D allows it to be specialized for the context.
Obviously we don't want excessive duplication of code, but used judiciously it can help optimization considerably.
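The projection idea above can be shown with a toy example: starting from a hot entry block, follow the most likely successor at each branch, duplicating blocks as needed. The CFG and edge weights below are invented; real projection would use profiling data from the tier 1 interpreter:

```python
# Toy sketch of superblock projection over a small CFG.
# Each block maps to its successors with observed taken-counts
# (all numbers invented for illustration).
cfg = {
    "A": [("B", 90), ("C", 10)],
    "B": [("D", 90)],
    "C": [("D", 10)],
    "D": [],
}

def project_superblock(entry, cfg, max_len=8):
    """Build a linear region by always following the hottest successor.
    The cold path (here C) would get its own duplicated copy of D
    in a later, separate projection."""
    trace = [entry]
    block = entry
    while cfg[block] and len(trace) < max_len:
        # Pick the successor with the highest observed count.
        block = max(cfg[block], key=lambda edge: edge[1])[0]
        trace.append(block)
    return trace

print(project_superblock("A", cfg))  # ['A', 'B', 'D']
```

The copy of D reached via the hot path A -> B can then be specialized for that context, which is exactly the duplication the diagram shows.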

@markshannon
Member Author

> Sorry I'm confused: which compiler are you referring to? The one used for the optimization passes on the tier 2 code?

The compiler to machine code. Nothing else in this tier is referred to as a "compiler", to avoid any more confusion.
It is already confusing enough that we have the source -> bytecode compiler, the C compiler and, in the future, a bytecode -> machine code compiler.

[Accidentally edited your comment earlier]

@Fidget-Spinner
Collaborator

Fidget-Spinner commented Mar 10, 2023

> Whereas with superblocks, we get this: [diagram: super_blocks.gv]

The basic blocks in the lazy basic block versioning paper generate this, not the former (in fact, it's exactly the same as the diagram you drew). The paper's name is a little misleading: the only part of it that is about "basic blocks" is that type propagation occurs at basic block boundaries; the code actually generated is superblocks as well.

In fact, this is how it would look:
[diagram]

@markshannon
Member Author

markshannon commented Mar 10, 2023

I don't see how they have the type information available to compile the specialized D' when compiling A.
They want the final code to look like the above, but it is built a basic block at a time, isn't it?
They cannot know whether B or C is the most likely successor of A, just which happens to execute first immediately after A is compiled.

Compiling one block at a time allows propagation of type information, but can prevent effective partial evaluation.

@Fidget-Spinner
Collaborator

Fidget-Spinner commented Mar 10, 2023

> They cannot know whether B or C is the most likely successor of A, just which happens to execute first immediately after A is compiled.

Yes, their argument is that the code will naturally be generated like that, since it is generated according to the path taken at runtime.

> They want the final code to look like the above, but it is built a basic block at a time, isn't it?

Yes. The code object is like a stack, so the next generated block is written into the empty space following the previous block. That illustration shows the state after both branches have been generated.

> Compiling one block at a time allows propagation of type information, but can prevent effective partial evaluation.

If we're emitting tier 2 bytecode, the final machine code compiler can take the entire chunk generated by basic block versioning and do the same large-scale partial evaluation and other optimisations that this proposal suggests, right?

So like

Tier 1 --(n times)--> Tier 2 bytecode --(x times)--> Machine code

By the time the machine code is generated, assuming the tier 2 bytecode is executed many times, there will be enough basic blocks generated to do the same large scale optimisations you're suggesting.

@markshannon
Member Author

We don't want to have to convert to machine code to do partial evaluation.

Creating larger regions and optimizing those is the job of the tier 3 optimizer.

@markshannon
Member Author

The optimizer pipeline should look something like this:

[diagram: optimize_pipeline.gv]

@maximecb

> Yes, their argument is that the code will naturally be generated like that, since it is generated according to the path taken at runtime.

When considering branches that are taken for dynamic type tests, it happens very often that these branches are heavily biased in one direction (or even only ever go one direction). So, being able to see which branch is executed first at run time can give you a nice linear sequence of blocks. It's a very good heuristic in practice.

@GoEdgar

GoEdgar commented Jun 19, 2023

Am I correct that peephole optimizations, like removing unnecessary stack operations, will be performed on the uops that were formed a bit earlier by the tracer? If so, how do we convert the optimized uops back to super-instructions? It looks like we would need to return to the interpreter with a super-instruction whose logic could be any of a huge number of combinations of uops, and in that case the interpreter would need to know the particular combination in order to interpret it.

@diohabara

Hi @markshannon!
I have a question about the tier 2 optimizer. If this feature is implemented, will the format of future .pyc files be impacted, or will this change influence only the runtime bytecode format?

@markshannon
Member Author

This all happens at runtime. The format of .pyc files will not be changed.
We might change the format of .pyc files for other reasons, like faster imports, though.

@gvanrossum
Collaborator

@GoEdgar

> Am I correct that peephole optimizations, like removing unnecessary stack operations, will be performed on the uops that were formed a bit earlier by the tracer? If so, how do we convert the optimized uops back to super-instructions? It looks like we would need to return to the interpreter with a super-instruction whose logic could be any of a huge number of combinations of uops, and in that case the interpreter would need to know the particular combination in order to interpret it.

Maybe there's some terminological confusion here. "Super-instructions" are the exact opposite of what we emit to superblocks (for the latter, we emit micro-ops, usually called uops). For an example of a superblock, see this comment: python/cpython#106529 (comment)

Also, we're not using tracing to form superblocks -- we're using projection.
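The uop/super-instruction distinction can be made concrete with a small sketch: each tier 1 instruction expands into a sequence of micro-ops, and a projected superblock is the concatenation of those sequences. The instruction and uop names below are illustrative only, not CPython's actual uop set:

```python
# Illustrative sketch: projecting a linear tier 1 instruction sequence
# into a uop superblock. All names are invented for illustration.

UOP_EXPANSION = {
    "LOAD_FAST": ["_LOAD_FAST"],
    # A specialized binary add splits into a type guard plus the work.
    "BINARY_ADD": ["_GUARD_BOTH_INT", "_BINARY_ADD_INT"],
    "STORE_FAST": ["_STORE_FAST"],
}

def project_to_uops(tier1_code):
    """Expand each tier 1 instruction into its micro-ops and
    concatenate them into one linear superblock."""
    uops = []
    for instr in tier1_code:
        uops.extend(UOP_EXPANSION[instr])
    # Every superblock ends with an exit back to the tier 1 interpreter.
    uops.append("_EXIT_TRACE")
    return uops

trace = project_to_uops(["LOAD_FAST", "LOAD_FAST", "BINARY_ADD", "STORE_FAST"])
print(trace)
```

A super-instruction would go the other way: fusing several tier 1 instructions into one handler, rather than splitting one instruction into many uops as here.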
