-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set memory order on slow atomics #6920
Conversation
@lhecker and @greg904.... you two seemed to fully understand memory ordering in the SPSC PR. Can you look at my changes here and confirm that my ordering is correct? I sat down this morning to try to understand the ordering and this was what I gathered was the correct behavior. I've tested it out locally and I don't see any ill effects from it. And I've run performance analysis and it's definitely faster this way. |
{ | ||
// <-- | ||
// If `NotifyPaint` is called at this point, then it will not | ||
// set the event because `_fWaiting` is not `true` yet so we have | ||
// to check again below. | ||
|
||
_fWaiting.store(true); | ||
_fWaiting.store(true, std::memory_order_release); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I understand it properly, this one probably shouldn't be relaxed because you're checking two different atomics instead of just one (and hoping for consistency across both), right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be "relaxed" because the "release" would make prior writes from this thread visible to other threads who "acquire" _fWaiting
's value. But in this case the only one "acquiring" its value is NotifyPaint()
, which doesn't need any prior writes done by _ThreadProc
, since the only thing NotifyPaint()
does is write to _fNextFrameRequested
(but it doesn't rely upon its value, nor does it rely on any other value).
(This information is supplied without liability. ⚖😄)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blocking b/c discussion of making it relaxed so nobody merges it in the meantime
If we're concerned about this, then let's just leave it as it stands right now with the release/acquire as I wrote it here. It's better than seq_cst already. Also @lhecker said in a chat that "relaxed" isn't really a thing on x86 or amd64 anyway. It only works on ARM. And we build ARM64 but it's like 0.00001% of our usage. So I'd rather just leave it for now. |
@@ -218,13 +218,13 @@ DWORD WINAPI RenderThread::_ThreadProc() | |||
|
|||
void RenderThread::NotifyPaint() | |||
{ | |||
if (_fWaiting.load()) | |||
if (_fWaiting.load(std::memory_order_acquire)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean... Technically you can shave off a bit time here on x86 as well. While x86 doesn't make a distinction between relaxed and release, it does invalidate caches during an acquire (and that way previously "released" data is being sync'd into your "acquiring" thread).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm just going to leave it for now. I'm still not 100% comfortable with all this "manual memory order" business. And you did say above that your relaxed recommendation was provided without liability. :P So if the liability is mine, I'd like to stay in my comfy place for right now and consider relaxed in the future.
Hello @miniksa! Because this pull request has the p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (
|
Carlos Zamora (1) * UIA: use full buffer comparison in rects and endpoint setter (GH-6447) Dan Thompson (2) * Tweaks: normalize TextAttribute method names (adjective form) (GH-6951) * Fix 'bcz exclusive' typo (GH-6938) Dustin L. Howett (4) * Fix VT mouse capture issues in Terminal and conhost (GH-7166) * version: bump to 1.3 on master * Update Cascadia Code to 2007.15 (GH-6958) * Move to the TerminalDependencies NuGet feed (GH-6954) James Holderness (3) * Render the SGR "underlined" attribute in the style of the font (CC-7148) * Add support for the "crossed-out" graphic rendition attribute (CC-7143) * Refactor grid line renderers with support for more line types (CC-7107) Leonard Hecker (1) * Added til::spsc, a lock-free, single-producer/-consumer FIFO queue (CC-6751) Michael Niksa (6) * Update TAEF to 10.57.200731005-develop (GH-7164) * Skip DX invalidation if we've already scrolled an entire screen worth of height (GH-6922) * Commit attr runs less frequently by accumulating length of color run (GH-6919) * Set memory order on slow atomics (GH-6920) * Cache the viewport to make invalidation faster (GH-6918) * Correct comment in this SPSC test as a quick follow up to merge. Related work items: MSFT-28208358
🎉 Handy links: |
By default, the memory order on atomics is
seq_cst
. This is a relatively expensive ordering and it shows in situations where we're rapidly signaling a consumer to pick up something from a producer. I've instead attempted to switch these torelease
(producer) andacquire
(consumer) to improve the performance of these signals.Validation Steps Performed
time cat big.txt
andtime cat ls.txt
under VS Performance Profiler.PR Checklist