Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make til::color's COLORREF conversion more optimal #7619

Merged
3 commits merged into from
Sep 14, 2020
Merged

Conversation

DHowett
Copy link
Member

@DHowett DHowett commented Sep 13, 2020

Clang (10) has no trouble optimizing the COLORREF conversion operator to
a simple 32-bit load with mask (!) even though it's a series of bit
shifts across multiple struct members.

MSVC (19.24) doesn't make the same optimization decision, and it emits
three 8-bit loads and some shifting.

In any case, the optimization only applies at -O2 (clang) and above.

In this commit, we leverage the spec-legality of using unions for type
conversions and the overlap of four uint8_ts and a uint32_t to make the
conversion very obvious to both compilers.

x86_64 msvc O0 O1 O2
shifts 12 11 11 (fully inlined)
union 5 1 1 (fully inlined)
x86_64 clang O0 O1 O2 + O3
shifts 14 5 1 (fully inlined)
union 9 3 1 (fully inlined)

j4james brought up some concerns about til::color's minor wastefulness
in #7578 (comment).

This is a clear, simple transformation that saves us a few instructions
in a relatively common case, so I'm accepting a micro-optimization even
though we don't have data showing this to be a hot spot.

Clang (10) has no trouble optimizing the COLORREF conversion operator,
below, to a simple 32-bit load with mask (!) even though it's a series
of bit shifts across mutliple struct members.

MSVC (19.24) doesn't make the same optimization decision, and it emits
three 8-bit loads and some shifting.

In any case, the optimimization only applies at -O2 (clang) and above.

In this commit, we leverage the spec-legality of using unions for type
conversions and the overlap of four uint8_ts and a uint32_t to make the
conversion very obvious to both compilers.

x86_64 msvc | O0 | O1 | O2
------------|----|----|--------------------
shifts      | 12 | 11 | 11 (fully inlined)
union       |  5 |  1 |  1 (fully inlined)

x86_64 clang | O0 | O1 | O2 + O3
-------------|----|----|--------------------
shifts       | 14 |  5 |  1 (fully inlined)
union        |  9 |  3 |  1 (fully inlined)
@DHowett DHowett changed the title Adjust til::color's 32-bit conversion to be more optimal Adjust til::color's COLORREF to be more optimal Sep 13, 2020
@DHowett DHowett changed the title Adjust til::color's COLORREF to be more optimal Adjust til::color's COLORREF conversion to be more optimal Sep 13, 2020
@DHowett
Copy link
Member Author

DHowett commented Sep 13, 2020

Poke around at the implementation here:

https://godbolt.org/z/oWvj6z

@zadjii-msft
Copy link
Member

You work on the weirdest things in your free time 😄

@DHowett
Copy link
Member Author

DHowett commented Sep 13, 2020

@zadjii-msft HA! haha. You know, you're right about that.

Copy link
Member

@miniksa miniksa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well that's a micro optimization if I've seen one. But sure.

@DHowett DHowett changed the title Adjust til::color's COLORREF conversion to be more optimal Make til::color's COLORREF conversion more optimal Sep 14, 2020
@DHowett DHowett added the AutoMerge Marked for automatic merge by the bot when requirements are met label Sep 14, 2020
@ghost
Copy link

ghost commented Sep 14, 2020

Hello @DHowett!

Because this pull request has the AutoMerge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@ghost ghost merged commit c17f448 into master Sep 14, 2020
@ghost ghost deleted the dev/duhowett/optimal branch September 14, 2020 18:51
@j4james
Copy link
Collaborator

j4james commented Sep 14, 2020

@DHowett This is awesome! Thank you for doing this.

@DHowett
Copy link
Member Author

DHowett commented Sep 14, 2020

@j4james Thanks for being a champion for perf here! 😄

DHowett added a commit that referenced this pull request Sep 18, 2020
Clang (10) has no trouble optimizing the COLORREF conversion operator to
a simple 32-bit load with mask (!) even though it's a series of bit
shifts across multiple struct members.

MSVC (19.24) doesn't make the same optimization decision, and it emits
three 8-bit loads and some shifting.

In any case, the optimization only applies at -O2 (clang) and above.

In this commit, we leverage the spec-legality of using unions for type
conversions and the overlap of four uint8_ts and a uint32_t to make the
conversion very obvious to both compilers.

x86_64 msvc | O0 | O1 | O2
------------|----|----|--------------------
shifts      | 12 | 11 | 11 (fully inlined)
union       |  5 |  1 |  1 (fully inlined)

x86_64 clang | O0 | O1 | O2 + O3
-------------|----|----|--------------------
shifts       | 14 |  5 |  1 (fully inlined)
union        |  9 |  3 |  1 (fully inlined)

j4james brought up some concerns about til::color's minor wastefulness
in #7578 (comment).

This is a clear, simple transformation that saves us a few instructions
in a relatively common case, so I'm accepting a micro-optimization even
though we don't have data showing this to be a hot spot.

(cherry picked from commit c17f448)
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AutoMerge Marked for automatic merge by the bot when requirements are met
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants