-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better codegen for (T)x | (T)y
#58727
Conversation
I added a morph pass to fold expressions like `(T)x | (T)y` into `(T)(x | y)`. This results in fewer `movzx` instructions in the asm. Fixes #13816
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just nits from me, Egor's noted the substantial correctness issues already.
Codegen diffs for this change will be asked for, one of the easier ways to get them is with the SPMI tool.
I also note that there are more variants of this optimizations possible, since, e. g. AND
with a zero-extending cast makes the other cast effectively zero-extending as well. Seems unlikely to be that valuable to catch them all here, however.
Thank you very much for the feedback! I will fix up the code with your suggestions this afternoon. |
* Rename function to fgMorphCastedBitwiseOp * Don't fold checked arithmetic * Reuse op1 node for return value * Don't run outside global morphing * Various code style and comment tweaks
If it was folded, it was folded to a unary (cast) operation and gtGetOp2() will crash. I also tweaked fgMorphCastedBitwiseOp to return nullptr if it didn't do anything (to match behaviour of fgMorphCommutative)
* Removed all but one csproj * Added tests for scenarios with overflow, compound exprs, side effects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, modulo suggestions. You will also need to fix the formatting.
* Formatting * Use getters instead of fields * set flags on op1
@benjamin-hodgson Thanks for your first contribution. |
Sorry for the delay, looks good to me |
aspnet.run.windows.x64.checked.mch:
Detail diffs
coreclr_tests.pmi.windows.x64.checked.mch:
Detail diffs
libraries.crossgen2.windows.x64.checked.mch:
Detail diffs
|
A few small regressions due to CSE, can be fixed if we enable this OPT post GlobalMorph I guess, but not sure it's worth it. Thanks, @benjamin-hodgson! |
This is causing a build break with gcc 11: https://dev.azure.com/dnceng/public/_build/results?buildId=1416610&view=logs&j=9e42d44c-0ead-5638-7035-5fe3dab45232&t=a04f53ad-2969-5223-5463-9bb65c1630a9&l=2110 Simple fix is here: #60333 |
I added a morph pass to fold expressions like
(T)x | (T)y
into(T)(x | y)
. This results in fewermovzx
instructions in the asm.This is my first contribution to the JIT and I am still learning the code, so any feedback you have for me would be much appreciated! In particular, I'm not certain I got the code to manipulate the
GenTree
exactly right - I was following other examples in the codebase but I may have missed something about theGenTree
API.I added my morph to the bottom-up ("post-order") passes in
fgMorphSmpOp
, right after the morph which folds bitwise rotations. If you think this isn't an appropriate place for this optimisation please let me know.The last example from the original issue (
DowncastPtr
) doesn't seem to work yet - it seems the JIT doesn't recognise*(T*)&x == (T)x
(when the cast is a downcast). That seems to me like a separate optimisation; I can file an issue and/or fix it if you like.I added a test in
CodeGenBringUpTests
to make sure this branch was being exercised, but I'm not sure whether that's the right place for it. I couldn't find any lower-level tests which look atGenTree
s directly, I assumed that was a deliberate testing policy choice but would be happy to add one if I missed it.Tested on my 64-bit Intel MacBook.
Fixes #13816