-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make til::color's COLORREF conversion more optimal #7619
Conversation
Clang (10) has no trouble optimizing the COLORREF conversion operator, below, to a simple 32-bit load with mask (!) even though it's a series of bit shifts across mutliple struct members. MSVC (19.24) doesn't make the same optimization decision, and it emits three 8-bit loads and some shifting. In any case, the optimimization only applies at -O2 (clang) and above. In this commit, we leverage the spec-legality of using unions for type conversions and the overlap of four uint8_ts and a uint32_t to make the conversion very obvious to both compilers. x86_64 msvc | O0 | O1 | O2 ------------|----|----|-------------------- shifts | 12 | 11 | 11 (fully inlined) union | 5 | 1 | 1 (fully inlined) x86_64 clang | O0 | O1 | O2 + O3 -------------|----|----|-------------------- shifts | 14 | 5 | 1 (fully inlined) union | 9 | 3 | 1 (fully inlined)
Poke around at the implementation here: |
You work on the weirdest things in your free time 😄 |
@zadjii-msft HA! haha. You know, you're right about that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well that's a micro optimization if I've seen one. But sure.
Hello @DHowett! Because this pull request has the p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (
|
@DHowett This is awesome! Thank you for doing this. |
@j4james Thanks for being a champion for perf here! 😄 |
Clang (10) has no trouble optimizing the COLORREF conversion operator to a simple 32-bit load with mask (!) even though it's a series of bit shifts across multiple struct members. MSVC (19.24) doesn't make the same optimization decision, and it emits three 8-bit loads and some shifting. In any case, the optimization only applies at -O2 (clang) and above. In this commit, we leverage the spec-legality of using unions for type conversions and the overlap of four uint8_ts and a uint32_t to make the conversion very obvious to both compilers. x86_64 msvc | O0 | O1 | O2 ------------|----|----|-------------------- shifts | 12 | 11 | 11 (fully inlined) union | 5 | 1 | 1 (fully inlined) x86_64 clang | O0 | O1 | O2 + O3 -------------|----|----|-------------------- shifts | 14 | 5 | 1 (fully inlined) union | 9 | 3 | 1 (fully inlined) j4james brought up some concerns about til::color's minor wastefulness in #7578 (comment). This is a clear, simple transformation that saves us a few instructions in a relatively common case, so I'm accepting a micro-optimization even though we don't have data showing this to be a hot spot. (cherry picked from commit c17f448)
Clang (10) has no trouble optimizing the COLORREF conversion operator to
a simple 32-bit load with mask (!) even though it's a series of bit
shifts across multiple struct members.
MSVC (19.24) doesn't make the same optimization decision, and it emits
three 8-bit loads and some shifting.
In any case, the optimization only applies at -O2 (clang) and above.
In this commit, we leverage the spec-legality of using unions for type
conversions and the overlap of four uint8_ts and a uint32_t to make the
conversion very obvious to both compilers.
j4james brought up some concerns about til::color's minor wastefulness
in #7578 (comment).
This is a clear, simple transformation that saves us a few instructions
in a relatively common case, so I'm accepting a micro-optimization even
though we don't have data showing this to be a hot spot.