Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix output stuttering using a ticket lock #10653

Merged
2 commits merged into from
Jul 14, 2021
Merged

Fix output stuttering using a ticket lock #10653

2 commits merged into from
Jul 14, 2021

Conversation

lhecker
Copy link
Member

@lhecker lhecker commented Jul 14, 2021

SRWLOCK, as used by std::shared_mutex, is a inherently unfair mutex
and makes no guarantee whatsoever whether a thread may acquire the lock
in a timely manner. This is problematic for our renderer which relies on
being able to acquire the lock in a timely and predictable manner.
Drawing stalls of up to one minute have been observed in tests.

This issue can be solved with a primitive ticket lock, which is 10x
slower than a SRWLOCK but still sufficiently fast for our use case
(10M locks per second per thread). It's likely that any non-trivial lock
duration will diminish the difference to "negligible".

Validation Steps Performed

  • It still blends ✔️

@lhecker lhecker added Area-Quality Stability, Performance, Etc. Product-Terminal The new Windows Terminal. labels Jul 14, 2021
@skyline75489
Copy link
Collaborator

May I ask what exactly is the "stuttering" you mentioned? Also how can I repro it locally.

@zadjii-msft
Copy link
Member

Are there any before/after benchmarks on this?

@WSLUser
Copy link
Contributor

WSLUser commented Jul 14, 2021

Would something like https://github.com/AlexeyAB/object_threadsafe be a good alternative that also improves speed?

@lhecker
Copy link
Member Author

lhecker commented Jul 14, 2021

@WSLUser While that project shows very impressive benchmark numbers, it's an unfair lock design as well and would lead to the same issue that the SRWLOCK has in our use case.

@lhecker
Copy link
Member Author

lhecker commented Jul 14, 2021

@zadjii-msft I've finished benchmarking the new lock.

Before:

Function Name Total CPU [unit, %] Self CPU [unit, %]
ControlCore::_connectionOutputHandler 7863 (12.83 %) 2 (0.00 %)
┣ StateMachine::ProcessString 7840 (12.79 %) 308 (0.50 %)
┣ [External Call] RtlAcquireSRWLockExclusive 11 (0.02 %) 11 (0.02 %)
┣ [External Call] RtlReleaseSRWLockExclusive 5 (0.01 %) 5 (0.01 %)
┣ _ReceivedOutputHandlers 5 (0.01 %) 0 (0.00 %)

After:

Function Name Total CPU [unit, %] Self CPU [unit, %]
ControlCore::_connectionOutputHandler 7592 (13.10 %) 1 (0.00 %)
┣ StateMachine::ProcessString 7572 (13.07 %) 284 (0.49 %)
┣ til::ticket_lock::lock 15 (0.03 %) 1 (0.00 %)
┣ [External Call] RtlpWakeByAddress 2 (0.00 %) 2 (0.00 %)
┣ _ReceivedOutputHandlers 2 (0.00 %) 0 (0.00 %)

The difference appears to be within the margin of error.

I'd love to make a video of the stuttering/freezes, but they only occur when the stars align, since it heavily depends on how the scheduler feels like on any given day. It's pretty spectacular when it happens though, because it seemingly freezes the screen for multiple seconds.
In fact I'm about 100% certain that this is exactly what happened in the later parts of the refterm v2 YouTube video, when WT seemingly freezes when printing long lines. If you didn't see it yet, I recommend checking out those freezes - it's pretty "spectacular" indeed.

@DHowett DHowett requested a review from miniksa July 14, 2021 16:55
src/inc/til/ticket_lock.h Show resolved Hide resolved
for (;;)
{
const auto current = _now_serving.load(std::memory_order_acquire);
if (current == ticket)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is provably no case where now_serving will jump multiple steps above the expected ticket value, correct?

Copy link
Member Author

@lhecker lhecker Jul 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible, if someone where to call unlock() multiple times.
But if you don't, it works exactly like the Wikipedia article describes the algorithm.

Copy link
Member

@DHowett DHowett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's interesting to consider the work we've done to make sure we were using a reader/writer lock correctly, and possibly how little of an impact it actually had on our performance. Is there a possible net loss for operations that require reading, as they now contend with operations that require writing? Ideally, reading would be much more common and therefore "fast-pathable".

@lhecker
Copy link
Member Author

lhecker commented Jul 14, 2021

@DHowett A shared lock, like the one we used before this PR, would only help with concurrent reads, as locking the mutex as a reader prevents any writers from progressing.
And I don't believe we have any high-frequency concurrent reads into our console buffer. We only use it for:

  • Retrieving the current selected text - low read rate
  • When hovering or clicking links - low read rate
  • When hovering a cell (I guess? it's UpdateHoveredCell which I don't understand what it's for) - low read rate
  • Accessibility cursor notifications (CurrentCursorPosition event) - might be high-ish read rate?
  • Rendering - high read rate, high lock duration

We can't really build a lock that fast-paths reads, because that would lead to an unfair lock again.
However we could build a lock which allows concurrent reads, but the lock design would then be much more complicated than it currently is.

Copy link
Member

@DHowett DHowett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this.

As a note, I edited your pull request body to match the commit message style of the repo.

//
// But we can abuse the fact that the surrounding members rarely change and are huge
// (std::function is like 64 bytes) to create some natural padding without wasting space.
til::ticket_lock _readWriteLock;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add the alignas now, would that make it safer (in case somebody changes the order) and not compromise the layout by adding wasted padding space?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ask because... this is a micro-optimization somebody could easily accidentally remove or not fully understand and fail to maintain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alignas would add 56 bytes of padding space and force _readWriteLock to be 64 bytes large.
I had hoped that the comment I added there would be enough to prevent others from accidentally changing the order. It someone would move it to be next to the later members in this class it would double the overhead of the ticket lock (from 0.03% to 0.06% CPU usage), but I think the comment should be an okay-ish reminder to not do that. I hope?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comment is OK. I wonder if we'd ever be possessed to remove all the std::functions though and then the trick wouldn't work anymore.

@DHowett
Copy link
Member

DHowett commented Jul 14, 2021

@msftbot make sure @miniksa signs off

@ghost ghost added the AutoMerge Marked for automatic merge by the bot when requirements are met label Jul 14, 2021
@ghost
Copy link

ghost commented Jul 14, 2021

Hello @DHowett!

Because you've given me some instructions on how to help merge this pull request, I'll be modifying my merge approach. Here's how I understand your requirements for merging this pull request:

  • I'll only merge this pull request if it's approved by @miniksa

If this doesn't seem right to you, you can tell me to cancel these instructions and use the auto-merge policy that has been configured for this repository. Try telling me "forget everything I just told you".

{
return std::shared_lock<std::shared_mutex>(_readWriteLock);
return std::unique_lock{ _readWriteLock };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's no difference any more between Locking for Writing and Reading thanks to the mechanics of the ticket lock... wouldn't it make sense to have one call the other to make that super clear until the future revision where you build that more complicated fair read/write lock for funsies?

//
// But we can abuse the fact that the surrounding members rarely change and are huge
// (std::function is like 64 bytes) to create some natural padding without wasting space.
til::ticket_lock _readWriteLock;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comment is OK. I wonder if we'd ever be possessed to remove all the std::functions though and then the trick wouldn't work anymore.

@ghost
Copy link

ghost commented Aug 31, 2021

🎉Windows Terminal Preview v1.11.2421.0 has been released which incorporates this pull request.:tada:

Handy links:

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Quality Stability, Performance, Etc. AutoMerge Marked for automatic merge by the bot when requirements are met Product-Terminal The new Windows Terminal.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants