-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-106529: Make FOR_ITER a viable uop #112134
Conversation
|
e909a83
to
b28effa
Compare
Tools/cases_generator/flags.py
Outdated
@@ -176,7 +175,7 @@ def variable_used_unspecialized(node: parsing.Node, name: str) -> bool: | |||
tokens: list[lx.Token] = [] | |||
skipping = False | |||
for i, token in enumerate(node.tokens): | |||
if token.kind == "MACRO": | |||
if token.kind == "CMACRO": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE: This fix resulted in _SPECIALIZE_UNPACK_SEQUENCE
becoming a viable uop. It was missing a TIER_ONE_ONLY
marker; I've added it back. (The fix is needed to restore the feature that this function doesn't look inside #if TIER_ONE
.)
@brandtbucher Let me know how bad this interferes with the JIT branch. The stuff I put in |
So, this doesn't actually look too bad. I think with a tiny bit of reworking, this could be merged into the JIT branch with no pain:
|
Okay, I think I managed to do that. I can now merge into the justin branch with only a single trivial merge conflict (we both added something to the end of ceval_macros.h). Next up of course, how would I redefine CURRENT_TARGET() in templace.c? |
|
Yup, and I also needed to move the It seems to work, but unsure how to prove it (it compiles and passes tests, but it would too if the JIT was never invoked). Anyway, we should probably wait until we've decided that making Tier 2 5% slower is a good idea. |
There seem to be a few unrelated changes in this PR. |
Python/bytecodes.c
Outdated
/* iterator ended normally */ | ||
Py_DECREF(iter); | ||
STACK_SHRINK(1); | ||
/* HACK: Emulate DEOPT_IF to jump over END_FOR */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No hacks, please 🙂
The code should look like this:
if (next == NULL) {
if (_PyErr_Occurred(tstate)) {
if (!_PyErr_ExceptionMatches(tstate, PyExc_StopIteration)) {
GOTO_ERROR(error);
}
_PyErr_Clear(tstate);
}
/* iterator ended normally */
Py_DECREF(iter);
STACK_SHRINK(1);
DEOPT_IF(true);
}
The trace generator can adjust the target, so it points after the END_FOR
.
Sure, see gh-112214. I've merged that into main, since the tests pass, next I'll merge it into this PR and do the other thing you requested. |
Okay, here's the new version. @markshannon Want to review one more time? |
Benchmark says 4% slower (with Tier 2):
|
FWIW, I have a branch on top of this that makes the uop interpreter 3% faster. The approach is to let the code generator generate code that extracts |
Honestly, as long as it’s done in macros, it shouldn’t be too bad. Macros are a great escape hatch in the short term before we start generating bespoke “JIT cases”. |
So it could generate something like
where the default definitions for those (in ceval_macros.h) would be
|
Yup! |
- Double max trace size to 256 - Add a dependency on executor_cases.c.h for ceval.o - Mark `_SPECIALIZE_UNPACK_SEQUENCE` as `TIER_ONE_ONLY` - Add debug output back showing the optimized trace - Bunch of cleanups to Tools/cases_generator/
This uses the new mechanism whereby certain uops are replaced by others during translation, using the `_PyUop_Replacements` table. We further special-case the `_FOR_ITER_TIER_TWO` uop to update the deoptimization target to point just past the corresponding `END_FOR` opcode. Two tiny code cleanups are also part of this PR.
- Double max trace size to 256 - Add a dependency on executor_cases.c.h for ceval.o - Mark `_SPECIALIZE_UNPACK_SEQUENCE` as `TIER_ONE_ONLY` - Add debug output back showing the optimized trace - Bunch of cleanups to Tools/cases_generator/
This uses the new mechanism whereby certain uops are replaced by others during translation, using the `_PyUop_Replacements` table. We further special-case the `_FOR_ITER_TIER_TWO` uop to update the deoptimization target to point just past the corresponding `END_FOR` opcode. Two tiny code cleanups are also part of this PR.
Also clean up a few nits in the code generator, and add back an important Make dependency for ceval.o.