Global flag optimizations #4027
Merged
alyssarosenzweig force-pushed the opt/global-flag branch 5 times, most recently from 45fbec5 to 1c2ecf0 (September 6, 2024 15:21)
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Signed-off-by: Alyssa Rosenzweig <[email protected]>
more optimized than TestNZ if we don't care about the sign bit. Signed-off-by: Alyssa Rosenzweig <[email protected]>
so we can get the new axflag optimizations on Billy's X13s. Signed-off-by: Alyssa Rosenzweig <[email protected]>
so we can optimize it globally Signed-off-by: Alyssa Rosenzweig <[email protected]>
so we can gate optimizations efficiently Signed-off-by: Alyssa Rosenzweig <[email protected]>
alyssarosenzweig force-pushed the opt/global-flag branch 2 times, most recently from 699e861 to cf055aa (September 7, 2024 14:45)
this makes reasoning about them a little easier, e.g. for flags. About a 1% win in Node.js. Signed-off-by: Alyssa Rosenzweig <[email protected]>
not needed and getting in the way Signed-off-by: Alyssa Rosenzweig <[email protected]>
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Signed-off-by: Alyssa Rosenzweig <[email protected]>
nonzero <==> not dummy Signed-off-by: Alyssa Rosenzweig <[email protected]>
NFC (no functional change). Signed-off-by: Alyssa Rosenzweig <[email protected]>
Gather a control flow graph and use it to propagate flags throughout the program. Signed-off-by: Alyssa Rosenzweig <[email protected]>
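A minimal sketch of what CFG-based flag propagation can look like: a standard backward liveness analysis over a per-block flag bitmask, iterated to a fixed point. The `Block` struct, the `FLAG_*` bit assignments, `ComputeFlagLiveness`, and the assumption that every flag is live when control leaves the multiblock are all hypothetical illustrations, not FEX's actual IR or pass.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical flag bits (not FEX's real encoding).
enum FlagBit : uint8_t {
  FLAG_N = 1 << 0,
  FLAG_Z = 1 << 1,
  FLAG_C = 1 << 2,
  FLAG_V = 1 << 3,
  FLAG_PF = 1 << 4,
  FLAG_AF = 1 << 5,
};

// Hypothetical per-block summary of flag reads/writes.
struct Block {
  uint8_t Uses = 0;        // flags read before being written in this block
  uint8_t Defs = 0;        // flags written (killed) in this block
  std::vector<int> Succs;  // indices of successor blocks
  uint8_t LiveIn = 0;
  uint8_t LiveOut = 0;
};

// Backward liveness, iterated until nothing changes:
//   LiveOut(b) = union of LiveIn(s) over successors s  (or LiveAtExit for exits)
//   LiveIn(b)  = Uses(b) | (LiveOut(b) & ~Defs(b))
void ComputeFlagLiveness(std::vector<Block>& Blocks, uint8_t LiveAtExit) {
  bool Changed = true;
  while (Changed) {
    Changed = false;
    // Visiting blocks in reverse order usually converges faster for a
    // backward problem, but any order reaches the same fixed point.
    for (int i = static_cast<int>(Blocks.size()) - 1; i >= 0; --i) {
      Block& B = Blocks[i];
      // Assumption: flags escaping the multiblock must be treated as live.
      uint8_t Out = B.Succs.empty() ? LiveAtExit : 0;
      for (int S : B.Succs) {
        assert(S >= 0 && S < static_cast<int>(Blocks.size()));
        Out |= Blocks[S].LiveIn;
      }
      uint8_t In = static_cast<uint8_t>(B.Uses | (Out & ~B.Defs));
      if (In != B.LiveIn || Out != B.LiveOut) {
        B.LiveIn = In;
        B.LiveOut = Out;
        Changed = true;
      }
    }
  }
}
```

For example, if a branch block has one successor that immediately overwrites NZCV and another that reads only ZF, the branch block's `LiveOut` comes out as just `FLAG_Z`, so a producer there only needs to materialize ZF.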
now that we know whether flags are killed on the edge, we can improve branch isel Signed-off-by: Alyssa Rosenzweig <[email protected]>
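A hedged sketch of how edge liveness can feed branch instruction selection: for x86 code like `test eax, eax; jz target`, if none of the flags the `test` writes are live on either outgoing edge, the backend never needs to set NZCV and can emit a single ARM64 `cbz`; otherwise it materializes flags with `tst` and branches with `b.eq`. The `ZeroTestBranch` struct, `SelectBranch`, the flag masks, and the string-based output are hypothetical simplifications, not FEX's real instruction selector.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical description of a "branch if value == 0" sequence.
struct ZeroTestBranch {
  std::string Reg;            // ARM64 register holding the tested value
  std::string Target;         // label of the taken successor
  uint8_t FlagsWritten;       // flag bits the x86 test instruction writes
  uint8_t LiveOnTaken;        // flags live-in at the taken successor
  uint8_t LiveOnFallthrough;  // flags live-in at the fallthrough successor
};

// Returns the ARM64 sequence (one instruction per line) for the branch.
std::string SelectBranch(const ZeroTestBranch& B) {
  assert(!B.Reg.empty() && !B.Target.empty());
  const uint8_t LiveOnEdges = B.LiveOnTaken | B.LiveOnFallthrough;
  if ((LiveOnEdges & B.FlagsWritten) == 0) {
    // Flags are killed on both edges: fuse the test into the branch.
    return "cbz " + B.Reg + ", " + B.Target;
  }
  // Some flag escapes: set NZCV (tst = ands with zero register), then branch.
  return "tst " + B.Reg + ", " + B.Reg + "\nb.eq " + B.Target;
}
```

The fused form saves an instruction and avoids writing NZCV, which is only legal because no successor can observe the flags.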
this saves uops. Signed-off-by: Alyssa Rosenzweig <[email protected]>
Signed-off-by: Alyssa Rosenzweig <[email protected]>
another global CFG-based optimization -- if we know that the raw PF is already 1-bit we can skip parity evaluation, saving work with floating point compares. Signed-off-by: Alyssa Rosenzweig <[email protected]>
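Why a 1-bit raw PF lets parity evaluation be skipped: x86 sets PF when the low byte of a result has an even number of set bits. Assuming PF is stored as a raw value and its parity is evaluated lazily when read (the function names here are hypothetical), full evaluation needs a population count, but when the raw value is known to be 0 or 1 its popcount equals the value itself, so parity reduces to a single XOR.

```cpp
#include <cassert>
#include <cstdint>

// Full lazy evaluation: PF = 1 iff the low 8 bits contain an even number of 1s.
inline uint32_t ParityFromRaw(uint32_t Raw) {
  uint32_t Bits = 0;
  for (uint32_t Byte = Raw & 0xFFu; Byte != 0; Byte >>= 1) {
    Bits += Byte & 1u;
  }
  return (Bits & 1u) ^ 1u;
}

// Shortcut when the raw value is known to be 1-bit (0 or 1):
// popcount(Raw) == Raw, so the parity is just Raw ^ 1.
inline uint32_t ParityFromOneBitRaw(uint32_t Raw) {
  assert(Raw <= 1u);
  return Raw ^ 1u;
}
```

A global analysis is what makes the shortcut safe: it has to prove that every definition of the raw PF reaching a given read is 1-bit, which may span several blocks.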
it's only really load-bearing for PF/AF, which is handled as a global flag opt now. This mitigates some of the compile-time hit from globalizing flag opts. Signed-off-by: Alyssa Rosenzweig <[email protected]>
Signed-off-by: Alyssa Rosenzweig <[email protected]>
alyssarosenzweig force-pushed the opt/global-flag branch from cf055aa to 46f36dc (September 7, 2024 14:59)
Sonicadvance1 approved these changes on Sep 8, 2024
Nice little opts
Soup up our flag opt pass to optimize globally, for a speedup with multiblock. Then use that framework to implement peephole passes that fuse comparisons and axflag into branches when the flags don't otherwise escape.
Perf numbers aren't as impressive as I'd hoped, but Billy found a 0.9% improvement in Geekbench on the X13s with an early version of this series. Hopefully it's better now. At this point I'd like to cut my losses and get this in, since it's no worse, and XTA does something similar, so I was going to get to it eventually anyway.