Set memory order on slow atomics #6920

miniksa · 2020-07-14T21:58:23Z

By default, the memory order on atomics is seq_cst. This is a relatively expensive ordering and it shows in situations where we're rapidly signaling a consumer to pick up something from a producer. I've instead attempted to switch these to release (producer) and acquire (consumer) to improve the performance of these signals.
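Here is a minimal illustration of the release/acquire pairing described above. It is a generic sketch with hypothetical names (`Handoff`, `produce`, `consume`), not the Terminal source. The producer writes ordinary data and then publishes it with a release store to a flag. Once the consumer observes that flag with an acquire load, it is guaranteed to see the earlier write. Nothing in this pattern needs the single global order that seq_cst provides.

```cpp
#include <atomic>
#include <string>
#include <thread>
#include <utility>

// Hypothetical single-producer/single-consumer handoff (not the Terminal code).
struct Handoff
{
    std::string payload;               // ordinary, non-atomic data
    std::atomic<bool> ready{ false };  // the signal

    void produce(std::string value)
    {
        payload = std::move(value);                    // 1. write the data
        ready.store(true, std::memory_order_release);  // 2. publish it
    }

    std::string consume()
    {
        // Spin until the flag is set. This acquire load pairs with the
        // release store above, so the read of `payload` below cannot
        // observe a value older than the one written before the release.
        while (!ready.load(std::memory_order_acquire))
        {
            std::this_thread::yield();
        }
        return payload;
    }
};
```

Only one producer and one consumer touch the flag, which mirrors the single-producer/single-consumer signaling this PR targets.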

Validation Steps Performed

Ran `time cat big.txt` and `time cat ls.txt` under the VS Performance Profiler.

PR Checklist


miniksa · 2020-07-14T22:00:15Z

@lhecker and @greg904, you two seemed to fully understand memory ordering in the SPSC PR. Can you look at my changes here and confirm that the ordering is correct?

I sat down this morning to try to understand memory ordering, and this is what I concluded the correct behavior is.

I've tested it out locally and I don't see any ill effects from it. And I've run performance analysis and it's definitely faster this way.

miniksa · 2020-07-14T22:10:51Z

[Before and after: profiler screenshots]

It's more pronounced in the `NotifyPaint` case than in the `ThrottledFunc::Run<>` case.
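The second commit applies the same change to an atomic flag in `ThrottledFunc`. A hypothetical coalescing flag can show why such a flag typically needs only `acq_rel` on its read-modify-write operations rather than seq_cst. The `Coalescer` class and its `Request`/`Take` methods are invented for illustration and are not `ThrottledFunc`'s API. Every access goes through one atomic, so there is no second variable whose relative order the threads must agree on.

```cpp
#include <atomic>

// Hypothetical request coalescer (invented names, not ThrottledFunc's code).
// Many callers may request work; only the caller that flips `_pending` from
// false to true should schedule it. The worker clears the flag before running.
class Coalescer
{
public:
    // Returns true if this call should schedule the work (nothing was pending).
    bool Request() noexcept
    {
        // acq_rel: the acquire half synchronizes with the worker's clearing
        // exchange; the release half publishes the caller's earlier writes
        // to the worker that later takes the request.
        return !_pending.exchange(true, std::memory_order_acq_rel);
    }

    // Called by the worker before doing the work; returns whether a request was pending.
    bool Take() noexcept
    {
        return _pending.exchange(false, std::memory_order_acq_rel);
    }

private:
    std::atomic<bool> _pending{ false };
};
```

Repeated `Request()` calls before a `Take()` collapse into one scheduled run, which is the throttling behavior.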

DHowett · 2020-07-15T20:12:35Z

src/renderer/base/thread.cpp

```diff
         {
             // <--
             // If `NotifyPaint` is called at this point, then it will not
             // set the event because `_fWaiting` is not `true` yet so we have
             // to check again below.

-            _fWaiting.store(true);
+            _fWaiting.store(true, std::memory_order_release);
```


So if I understand it properly, this one probably shouldn't be relaxed because you're checking two different atomics instead of just one (and hoping for consistency across both), right?
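The concern about two different atomics is the case where seq_cst genuinely matters. The textbook example is the store-buffering ("Dekker") litmus test below, where each thread sets its own flag and then reads the other's. Under seq_cst, the outcome where both reads return 0 is forbidden. With release stores and acquire loads it is allowed, and x86 can actually produce it, because a store may still sit in the store buffer while a later load from a different address completes. This is a generic sketch: `store_buffering_round` is a hypothetical helper, and the code does not model `_ThreadProc` and `NotifyPaint`.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Store-buffering litmus test (hypothetical helper). Each thread stores to
// its own atomic, then loads the other one. Returns {r1, r2}.
std::pair<int, int> store_buffering_round(std::memory_order store_order,
                                          std::memory_order load_order)
{
    std::atomic<int> x{ 0 }, y{ 0 };
    int r1 = -1, r2 = -1;

    std::thread a([&] {
        x.store(1, store_order);
        r1 = y.load(load_order);
    });
    std::thread b([&] {
        y.store(1, store_order);
        r2 = x.load(load_order);
    });
    a.join();
    b.join();

    // With seq_cst everywhere, r1 == 0 && r2 == 0 can never happen.
    // With release/acquire it is a legal (if rare) result.
    return { r1, r2 };
}
```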

lhecker · 2020-07-15

It can be "relaxed". A "release" store would make this thread's prior writes visible to other threads that "acquire" `_fWaiting`'s value. But here the only one "acquiring" it is `NotifyPaint()`, which doesn't need any prior writes done by `_ThreadProc`. The only thing `NotifyPaint()` does is write to `_fNextFrameRequested`, and it doesn't rely on that value or on any other.
(This information is supplied without liability. ⚖😄)
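The reasoning above can be shown with a signal-only flag. When the consumer reads nothing else that the producer wrote, release/acquire has nothing to publish, and relaxed is enough. In this hypothetical sketch (`DirtyFlag` is an invented name), every operation targets a single atomic. Per-location coherence therefore ensures a `Set()` is never lost: either the current `TestAndClear()` observes it, or the flag stays `true` for the next one.

```cpp
#include <atomic>

// Hypothetical signal-only flag (not the Terminal code). The producer only
// says "something changed"; no other data is handed over through this flag,
// so relaxed ordering is sufficient on both sides.
class DirtyFlag
{
public:
    void Set() noexcept
    {
        _dirty.store(true, std::memory_order_relaxed);
    }

    // Returns true (and clears the flag) if Set() was called since the last check.
    // A read-modify-write always reads the latest value in the flag's
    // modification order, so a concurrent Set() is seen now or by the next call.
    bool TestAndClear() noexcept
    {
        return _dirty.exchange(false, std::memory_order_relaxed);
    }

private:
    std::atomic<bool> _dirty{ false };
};
```

If the consumer also needed data written before `Set()`, the store would have to become `release` and the read `acquire`, as in the producer/consumer handoff pattern.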

DHowett · 2020-07-17 (requested changes)

blocking b/c discussion of making it relaxed so nobody merges it in the meantime

miniksa · 2020-07-17T16:25:29Z

> blocking b/c discussion of making it relaxed so nobody merges it in the meantime

If we're concerned about this, let's just leave it as it stands now, with release/acquire as I wrote it here. That's already better than seq_cst. Also, @lhecker said in a chat that "relaxed" isn't really a distinct thing on x86 or amd64 anyway; it only makes a difference on ARM. We do build ARM64, but it's something like 0.00001% of our usage, so I'd rather just leave it for now.
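The x86 point can be checked against typical code generation. On x86-64, a relaxed store and a release store both compile to a plain `mov`, because x86 never reorders a store with earlier loads or stores. Only a seq_cst store needs the more expensive `xchg` (some older GCC versions emit `mov` followed by `mfence` instead), which is likely where most of the x86 savings here come from. On AArch64, relaxed uses `str`, while release and seq_cst both use `stlr`. The comments below record typical output from mainstream compilers with optimization enabled, not a guarantee. The function names are hypothetical.

```cpp
#include <atomic>

// Hypothetical global used only to compare orderings on the store side.
std::atomic<int> g_flag{ 0 };

// Typical codegen noted per line; verify with your own compiler.
void StoreRelaxed(int v) { g_flag.store(v, std::memory_order_relaxed); }  // x86-64: mov  | AArch64: str
void StoreRelease(int v) { g_flag.store(v, std::memory_order_release); }  // x86-64: mov  | AArch64: stlr
void StoreSeqCst(int v)  { g_flag.store(v, std::memory_order_seq_cst); }  // x86-64: xchg | AArch64: stlr
```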

lhecker · 2020-07-17T16:32:44Z

src/renderer/base/thread.cpp

```diff
@@ -218,13 +218,13 @@ DWORD WINAPI RenderThread::_ThreadProc()
 void RenderThread::NotifyPaint()
 {
-    if (_fWaiting.load())
+    if (_fWaiting.load(std::memory_order_acquire))
```


I mean... technically you can shave off a bit of time here on x86 as well. While x86 doesn't distinguish between relaxed and release, it does invalidate caches during an acquire (that's how previously "released" data gets synced into your "acquiring" thread).
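The load side is worth comparing as well. On x86-64, mainstream compilers typically emit an ordinary `mov` for relaxed, acquire, and seq_cst loads alike, and the hardware keeps caches coherent for every load regardless of the ordering requested. The practical difference is what may be reordered around the load. Acquire forbids moving later reads and writes ahead of it; on x86 that is only a compiler constraint, but on AArch64 it is a hardware one. Relaxed leaves the compiler free to reorder. Any x86 saving from relaxed would therefore come from extra optimization freedom rather than a cheaper instruction. The names below are hypothetical, and the codegen comments are typical rather than guaranteed.

```cpp
#include <atomic>

// Hypothetical global used only to compare orderings on the load side.
std::atomic<int> g_state{ 0 };

// Typical codegen noted per line; verify with your own compiler.
int LoadRelaxed() { return g_state.load(std::memory_order_relaxed); }  // x86-64: mov | AArch64: ldr
int LoadAcquire() { return g_state.load(std::memory_order_acquire); }  // x86-64: mov | AArch64: ldar (ldapr with RCpc)
int LoadSeqCst()  { return g_state.load(std::memory_order_seq_cst); }  // x86-64: mov | AArch64: ldar
```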

miniksa · 2020-07-17

I think I'm just going to leave it for now. I'm still not 100% comfortable with all this "manual memory order" business, and you did say above that your relaxed recommendation was provided without liability. :P If the liability is mine, I'd like to stay in my comfy place for now and consider relaxed in the future.

ghost · 2020-07-17T17:11:42Z

Hello @miniksa!

Because this pull request has the AutoMerge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (`@msftbot`) and give me an instruction to get started! Learn more here.

Carlos Zamora (1)
* UIA: use full buffer comparison in rects and endpoint setter (GH-6447)

Dan Thompson (2)
* Tweaks: normalize TextAttribute method names (adjective form) (GH-6951)
* Fix 'bcz exclusive' typo (GH-6938)

Dustin L. Howett (4)
* Fix VT mouse capture issues in Terminal and conhost (GH-7166)
* version: bump to 1.3 on master
* Update Cascadia Code to 2007.15 (GH-6958)
* Move to the TerminalDependencies NuGet feed (GH-6954)

James Holderness (3)
* Render the SGR "underlined" attribute in the style of the font (CC-7148)
* Add support for the "crossed-out" graphic rendition attribute (CC-7143)
* Refactor grid line renderers with support for more line types (CC-7107)

Leonard Hecker (1)
* Added til::spsc, a lock-free, single-producer/-consumer FIFO queue (CC-6751)

Michael Niksa (6)
* Update TAEF to 10.57.200731005-develop (GH-7164)
* Skip DX invalidation if we've already scrolled an entire screen worth of height (GH-6922)
* Commit attr runs less frequently by accumulating length of color run (GH-6919)
* Set memory order on slow atomics (GH-6920)
* Cache the viewport to make invalidation faster (GH-6918)
* Correct comment in this SPSC test as a quick follow up to merge.

Related work items: MSFT-28208358

ghost · 2020-08-26T17:01:42Z

🎉 Windows Terminal Preview v1.3.2382.0 has been released, which incorporates this pull request. 🎉


miniksa added 2 commits July 14, 2020 14:50

use acquire/release instead of seq_cst for single producer single consumer situation.

824e07d

use release/acquire over seq cst on the atomic flag in throttledfunc

226cc95

miniksa self-assigned this Jul 14, 2020

ignore std memory_order spelling.

fcbbd72

miniksa marked this pull request as ready for review July 15, 2020 16:34

carlos-zamora approved these changes Jul 15, 2020

DHowett reviewed Jul 15, 2020

DHowett requested changes Jul 17, 2020

ghost added the Needs-Author-Feedback label Jul 17, 2020

ghost removed the Needs-Author-Feedback label Jul 17, 2020

DHowett approved these changes Jul 17, 2020

lhecker reviewed Jul 17, 2020

miniksa added the AutoMerge label Jul 17, 2020

ghost merged commit ea2bd42 into master Jul 17, 2020

ghost deleted the dev/miniksa/perf_ordering branch July 17, 2020 17:11

This pull request was closed.
