dead store elimination in the compiler #104635
Comments
I tried this out of curiosity (please bear in mind that I'm not an optimization expert). I think this is because the super-instructions POP_TOP__POP_TOP / POP_TOP__STORE_FAST don't exist. If we implement this optimization, do you think we must also implement those super-instructions? Please let me know if I've misunderstood your suggestion :)
baseline
suggestion
benchmark
import pyperf

runner = pyperf.Runner()
runner.timeit(name="bench dead_store",
              stmt="""
a, a, b, b = 1, 2, 3, 4
""")
|
With super-instruction: 6-7% faster
For a more realistic example
|
pyperformance benchmark: https://gist.github.com/corona10/2cfd404ac0cc9d9a61c71d4365e7c5bb |
Ah, I hadn't considered the possibility that we'd need new super-instructions to make this a win instead of a regression. Thanks for trying it out! Not sure if there are reasons to avoid adding new super-instructions, but apart from that issue, this still seems worth it to me. Made some comments on the draft PR. |
OOI what are the advantages of this over the superoptimizer approach? The superoptimizer would need to understand stores, but that doesn't seem too complex. |
In my opinion, the two optimizations operate at different layers. I believe that flowgraph-based optimization can do its work at lower cost than simulating stack side effects, while the superoptimizer can optimize a wider window than a flowgraph pass can handle. So I think the two complement each other, and as long as flowgraph-based optimization does not interfere with the superoptimizer, it is acceptable to add further optimizations there. Maybe @carljm has a different opinion from me. |
@markshannon This issue is just tracking the fact that we can (and should) replace dead stores with POP_TOP. I don't have strong opinions about whether we do this via the existing … The main wrinkle is that it seems we need to add some … |
We also need to make sure there isn't something between them that could raise an exception. We don't currently have this information; we probably need to add it to the opcode_metadata. |
Are there many dead stores in Python, given that two writes must occur on the same line for the first to count as dead? I suspect that they are quite rare. Has anyone tried to measure how many? |
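One rough way to approach the measurement question is to count, at the source level, assignment statements that bind the same plain name more than once. The sketch below does this with the standard `ast` module. It is only a proxy: the helper names are hypothetical, it does not check whether the names are fast locals, it ignores the comprehension pattern, and it does not verify that all stores fall on a single line.

```python
import ast
from collections import Counter


def _target_names(node):
    """Yield plain names bound by an assignment target (hypothetical helper)."""
    if isinstance(node, ast.Name):
        yield node.id
    elif isinstance(node, (ast.Tuple, ast.List)):
        for elt in node.elts:
            yield from _target_names(elt)
    elif isinstance(node, ast.Starred):
        yield from _target_names(node.value)
    # Attribute and subscript targets are skipped: they are not local stores.


def count_repeated_targets(source):
    """Count Assign statements that bind the same plain name more than once."""
    hits = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            names = Counter(
                name for target in node.targets for name in _target_names(target)
            )
            if any(count > 1 for count in names.values()):
                hits += 1
    return hits
```

Running `count_repeated_targets` over the source files of a large code base would give an upper-bound style estimate of how often the pattern appears.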
Assuming that the super-instruction is applied whenever the corresponding code pattern is used, it is true that, observed indirectly, this is a small proportion. However, this only means that the super-instruction does not need to be applied, and whether reducing the opcodes for ( |
Okay, we don't need to add extra super-instructions for this optimization after Mark's PR is merged.
Benchmarks
|
I also suspect it's not common. But given that the latest PR shows a clear improvement for this case without the need to add super-instructions, and with just a few lines in the compiler, it seems an almost pure win to do it, even if the case is not common. |
There are several code patterns that can cause multiple stores to the same locals index in near succession, without any intervening instructions that can possibly use the stored value. One such code pattern is e.g. `a, a, b = x, y, z`. Another (since PEP 709) is `a = [_ for a in []]`. There are likely others.

In such cases, all but the final store can be replaced with `POP_TOP`. This alone isn't a huge gain, but this can also allow `apply_static_swaps` to take effect in these cases (because it can't safely reorder writes to the same location, but it can safely reorder POP_TOPs at will), removing SWAPs, which shrinks code size and reduces runtime overhead.

Not sure how much of a gain this will be in realistic cases, but it's also not hard to implement, so it seems worth doing as a marginal improvement to compiler output.
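To see which stores, pops, and swaps the compiler emits for the first pattern on a given interpreter, the function can be disassembled with the standard `dis` module. This is a minimal sketch; `f` and `interesting_ops` are hypothetical names, and the exact opcodes printed depend on the Python version, so no expected output is shown.

```python
import dis


def f(x, y, z):
    # The repeated-target pattern: the first store to `a` is dead.
    a, a, b = x, y, z
    return a, b


def interesting_ops(func):
    """Return (opname, argrepr) for local stores, pops, and swaps in func."""
    return [
        (ins.opname, ins.argrepr)
        for ins in dis.get_instructions(func)
        if ins.opname.startswith("STORE_FAST") or ins.opname in ("POP_TOP", "SWAP")
    ]


for op in interesting_ops(f):
    print(op)
```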
Note that in order to maintain tracing / debugger compatibility, we can only do this if the line number is the same for all the store instructions. And we definitely can't eliminate the final store, even if it also appears to be unused, since it can always be visible to tracers/debuggers/locals()/etc.
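The rule described above can be sketched on a toy instruction list. This is not the CPython implementation: `Instr` and `eliminate_dead_stores` are hypothetical names, and the sketch assumes a deliberately conservative reading in which only directly adjacent `STORE_FAST` instructions on the same line are considered, so nothing in between can read the value or raise.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    """Toy instruction: opcode name, local name (if any), source line."""
    opname: str
    arg: Optional[str]
    lineno: int


def eliminate_dead_stores(block):
    """Within each run of adjacent same-line STORE_FASTs, replace every
    store except the last one to each local with POP_TOP."""
    result = list(block)
    i = 0
    while i < len(result):
        if result[i].opname != "STORE_FAST":
            i += 1
            continue
        # Find the end of the run of adjacent stores on the same line.
        j = i
        while (
            j < len(result)
            and result[j].opname == "STORE_FAST"
            and result[j].lineno == result[i].lineno
        ):
            j += 1
        # Walk the run backwards so the final store to each local survives.
        seen = set()
        for k in range(j - 1, i - 1, -1):
            if result[k].arg in seen:
                result[k] = Instr("POP_TOP", None, result[k].lineno)
            else:
                seen.add(result[k].arg)
        i = j
    return result
```

For the stores of `a, a, b = x, y, z` on one line, the first `STORE_FAST a` becomes `POP_TOP` while the last store to `a` and the store to `b` are kept. Stores on different lines, or stores separated by any other instruction, are left untouched.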