Tier 2 optimizer. #557
Comments
What's the general timeline you're looking at for implementing all of these?
3.13, except the compiler, which will probably be 3.14.
Sorry, I'm confused: which compiler are you referring to? The one used for the optimization passes on the tier 2 code?
The compiler to machine code. Nothing else is referred to as a "compiler" in this tier, in order to avoid any more confusion. [Accidentally edited your comment earlier]
The basic blocks in the lazy basic block versioning paper generate this and not the former (in fact, it's exactly the same as the diagram you drew). The paper's name is a little misleading: the only part of it that's about "basic blocks" is that the type propagation occurs at basic block boundaries; the code actually generated consists of superblocks as well.
I don't see how they have the type information available to compile the specialized code. Compiling one block at a time allows propagation of type information, but can prevent effective partial evaluation.
Yes, their argument is that the code will naturally be generated like that, since it's just generated according to the path taken at runtime.
Yes. The code object is like a stack, so the next generated block is written onto the empty space following the next basic block. That illustration is after both branches are generated.
If we're emitting tier 2 bytecode, the final machine code compiler can convert the entire chunk generated by basic block versioning and do the same large-scale partial evaluation and other optimisations that this proposes, right? So, by the time the machine code is generated, assuming the tier 2 bytecode is executed many times, there will be enough basic blocks generated to do the same large-scale optimisations you're suggesting.
We don't want to have to convert to machine code to do partial evaluation. Creating larger regions and optimizing those is the job of the tier 3 optimizer.
When considering branches that are taken for dynamic type tests, it happens very often that these branches are heavily biased in one direction (or even only ever go one direction). So, being able to see which branch is first executed at run time can give you a nice linear sequence of blocks. It's a very good heuristic in practice.
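To make the heuristic concrete, here is a minimal sketch of the idea: count how each conditional branch goes, and if one side dominates, project along that side. All names (`BranchProfile`, `biased_direction`, the threshold) are illustrative assumptions, not CPython's actual profiling machinery.

```python
# Hypothetical sketch: per-branch counters like the ones a tier 1
# interpreter might gather, used to pick the hot successor when
# projecting a linear sequence of blocks. Names are illustrative.
from collections import Counter


class BranchProfile:
    """Counts how often each conditional branch goes each way."""

    def __init__(self):
        self.counts = Counter()  # (branch_id, taken) -> hit count

    def record(self, branch_id, taken):
        self.counts[(branch_id, taken)] += 1

    def biased_direction(self, branch_id, threshold=0.9):
        """Return True/False if one side dominates, else None."""
        taken = self.counts[(branch_id, True)]
        not_taken = self.counts[(branch_id, False)]
        total = taken + not_taken
        if total == 0:
            return None
        if taken / total >= threshold:
            return True
        if not_taken / total >= threshold:
            return False
        return None


profile = BranchProfile()
for _ in range(99):
    profile.record("loop_exit", False)  # the loop almost never exits
profile.record("loop_exit", True)
assert profile.biased_direction("loop_exit") is False
```

For a type-test branch that only ever goes one way, the profile immediately picks that side, which is why a single observed execution is often enough to project a good linear sequence.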
Hi @markshannon!
This all happens at runtime. The format of
Maybe there's some terminological confusion here. "Super-instructions" are the exact opposite of what we emit into superblocks (for the latter, we emit micro-ops, usually called uops). For an example of a superblock, see this comment: python/cpython#106529 (comment) Also, we're not using tracing to form superblocks -- we're using projection.
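The distinction can be sketched in code: projection walks the bytecode forward from a hot point, expanding each instruction into uops and following the profiled likely side of each branch, without recording an actual execution. This is a simplified illustration under stated assumptions; the opcode names, the expansion table, and `project_superblock` are hypothetical, not CPython's real tables or APIs.

```python
# Hypothetical sketch of "projection": instead of recording a trace,
# walk the bytecode forward, expanding each instruction into micro-ops
# (uops) and following the profiled likely side of each branch.
# A guard uop exits the superblock if the projection was wrong.
# All opcode/uop names here are illustrative.

UOP_EXPANSION = {
    "LOAD_FAST": ["_LOAD_FAST"],
    "BINARY_OP_ADD_INT": ["_GUARD_BOTH_INT", "_BINARY_OP_ADD_INT"],
    # A real projector would pick a guard matching the projected
    # direction; we assume the fall-through (condition true) side.
    "POP_JUMP_IF_FALSE": ["_GUARD_IS_TRUE"],
}


def project_superblock(code, start, likely_taken):
    """Project a linear superblock from (opcode, arg) pairs."""
    uops = []
    pc = start
    while pc < len(code):
        opcode, arg = code[pc]
        uops.extend(UOP_EXPANSION.get(opcode, [f"_{opcode}"]))
        if opcode == "POP_JUMP_IF_FALSE":
            # Follow the direction the profile says is likely.
            pc = arg if likely_taken.get(pc, False) else pc + 1
        else:
            pc += 1
        if opcode == "RETURN_VALUE":
            break
    return uops


code = [
    ("LOAD_FAST", 0),
    ("LOAD_FAST", 1),
    ("BINARY_OP_ADD_INT", None),
    ("POP_JUMP_IF_FALSE", 6),
    ("LOAD_FAST", 0),
    ("RETURN_VALUE", None),
]
uops = project_superblock(code, 0, likely_taken={3: False})
assert uops == [
    "_LOAD_FAST", "_LOAD_FAST",
    "_GUARD_BOTH_INT", "_BINARY_OP_ADD_INT",
    "_GUARD_IS_TRUE", "_LOAD_FAST", "_RETURN_VALUE",
]
```

Note how nothing here records an execution: the superblock is projected purely from the static bytecode plus branch-bias data, which is the difference from trace recording.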
This is the top level issue for the tier 2 optimizer for CPython 3.13 and beyond.
Try to keep discussion here at a high level and discuss the details on the sub-issues.
The tier 2 optimizer has always promised to optimize larger regions than the tier 1 (PEP 659) optimizer. But we have been a bit vague as to what those regions would be.
In an earlier discussion, I referred to them as "projected short traces".
The term "trace" is a bit misleading, as it suggests some sort of recording of program execution.
The optimization I propose is more akin to basic block versioning than to the trace recording of PyPy.
However, instead of basic blocks, we would be optimizing dynamic superblocks.
The extent of the superblocks would be determined at runtime from profiling data gathered by the tier 1 interpreter.
The term "superblocks" might also be a bit misleading, as they might include inlined calls, but it's the best name I could come up with for now. We could call the tier 2 optimization "superblock versioning", as we intend to handle polymorphism in much the same way as BBV.
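The BBV-style handling of polymorphism mentioned above can be sketched as follows: keep one specialized version of a superblock per observed type context, up to a small limit, then fall back to a generic version. The class, the version limit, and the `specialize` callback are all hypothetical illustrations, not an actual CPython design.

```python
# Hypothetical sketch of handling polymorphism via versioning, in the
# spirit of basic block versioning (BBV): one specialized version of a
# superblock per observed type context, capped at a small limit.


class SuperblockVersions:
    MAX_VERSIONS = 4  # beyond this, fall back to a generic version

    def __init__(self, specialize, generic):
        self.specialize = specialize  # type context -> specialized code
        self.generic = generic
        self.versions = {}

    def get(self, type_context):
        """Return the version specialized for this type context."""
        if type_context in self.versions:
            return self.versions[type_context]
        if len(self.versions) >= self.MAX_VERSIONS:
            return self.generic  # too polymorphic: stop specializing
        version = self.specialize(type_context)
        self.versions[type_context] = version
        return version


blocks = SuperblockVersions(
    specialize=lambda ctx: f"superblock[{','.join(ctx)}]",
    generic="superblock[generic]",
)
assert blocks.get(("int", "int")) == "superblock[int,int]"
assert blocks.get(("int", "int")) == "superblock[int,int]"  # cached
assert blocks.get(("float", "float")) == "superblock[float,float]"
```

The cap is the usual BBV trade-off: a handful of monomorphic versions covers the common biased cases, while truly megamorphic sites degrade gracefully to generic code instead of exploding the number of versions.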
For this to work, we need to be able to do the following: