-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize hot path in textBufferCellIterator #10621
Changes from all commits
3f67edf
8aadccb
7feea77
737fe3c
f04c94e
53cfb27
7ed2a76
050dc7e
8753573
2a12085
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1021,6 +1021,7 @@ IAction | |
IApi | ||
IApplication | ||
IBase | ||
ICache | ||
icacls | ||
iccex | ||
icch | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,7 +6,7 @@ Module Name: | |
- OutputCellView.hpp | ||
|
||
Abstract: | ||
- Read-only view into a single cell of data that someone is attempting to write into the output buffer. | ||
- Read view into a single cell of data that someone is attempting to write into the output buffer. | ||
- This is done for performance reasons (avoid heap allocs and copies). | ||
|
||
Author: | ||
|
@@ -36,6 +36,21 @@ class OutputCellView | |
TextAttribute TextAttr() const noexcept; | ||
TextAttributeBehavior TextAttrBehavior() const noexcept; | ||
|
||
void UpdateText(const std::wstring_view& view) noexcept | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure I understand why calling each of these bits separately is more performant than replacing them all at once via a new-construction. It's also sort of confusing why There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't like exposing these methods, either. I love the idea of read-only view. Here's my understanding of why this helps. Before this PR, the code looks like this: const auto diff = gsl::narrow_cast<ptrdiff_t>(newPos.X) - gsl::narrow_cast<ptrdiff_t>(_pos.X);_
_attrIter += diff;
// Here we know _attrIter is hot and in cache.
//
// ...Some code here.
_view = OutputCellView(_pRow->GetCharRow().GlyphAt(_pos.X),
_pRow->GetCharRow().DbcsAttrAt(_pos.X),
// After all the _pRow operations, _attrIter may not be in cache anymore.
*_attrIter,
TextAttributeBehavior::Stored); After this PR: const auto diff = gsl::narrow_cast<ptrdiff_t>(newX) - gsl::narrow_cast<ptrdiff_t>(oldX);
_attrIter += diff;
// We know for sure _attrIter is hot. And after this point, it can be safely discarded.
_view.UpdateTextAttribute(*_attrIter); This is why I think the fine-grained I'd love to see how @lhecker see this in the assembly view, which would be more accurate. I for one can't read assembly that well yet. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Well, it's because I don't need |
||
{ | ||
_view = view; | ||
}; | ||
|
||
void UpdateDbcsAttribute(const DbcsAttribute& dbcsAttr) noexcept | ||
{ | ||
_dbcsAttr = dbcsAttr; | ||
} | ||
|
||
void UpdateTextAttribute(const TextAttribute& textAttr) noexcept | ||
{ | ||
_textAttr = textAttr; | ||
} | ||
|
||
bool operator==(const OutputCellView& view) const noexcept; | ||
bool operator!=(const OutputCellView& view) const noexcept; | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -94,20 +94,93 @@ bool TextBufferCellIterator::operator!=(const TextBufferCellIterator& it) const | |
// - Reference to self after movement. | ||
TextBufferCellIterator& TextBufferCellIterator::operator+=(const ptrdiff_t& movement) | ||
{ | ||
// Note that this method is called intensively when the terminal is under heavy load. | ||
// We need to aggressively optimize it, comparing to the -= operator. | ||
ptrdiff_t move = movement; | ||
auto newPos = _pos; | ||
while (move > 0 && !_exceeded) | ||
if (move < 0) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's say we are doing the "1 million yes" benchmark with a typical 120x80 windows. The First, this line here is the early branching to avoid the very rare case (move < 0) and leave it to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if Same goes for I'm OK with it being more complex inside this operator specifically as long as it's well documented why things are ordered and computed in the way they are (like specifically calling out which items below are taking advantage of cache improvements so no one tries to refactor it out to be "easier to read" but significantly worse performance in the future.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Besides the "early optimization is the root of all evil" cliche, I'd like to point out that Take a look at my "before" screenshot and you will see Another example: I've actually tried to remove the code inside this method step by step. At some point, even a single line like In conclusion, it takes accurate benchmarking to "port" this optimization somewhere else. And the optimization may very well be NOT necessary in other code paths. I got the inspiration from @lhecker and he might want to add his opinions here, too. I'll try to add more comments as documentation to prevent "easier to read" refactoring in the future. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Great justification. I'm fine with you not optimizing the other ones. Just call out some of this in the comments in the code and I'm sold. |
||
{ | ||
_exceeded = !_bounds.IncrementInBounds(newPos); | ||
// Early branching to leave the rare case to -= operator. | ||
// This helps reducing the instruction count within this method, which is good for instruction cache. | ||
return *this -= -move; | ||
} | ||
|
||
// The remaining code in this function is functionally equivalent to: | ||
// auto newPos = _pos; | ||
// while (move > 0 && !_exceeded) | ||
// { | ||
// _exceeded = !_bounds.IncrementInBounds(newPos); | ||
// move--; | ||
// } | ||
// _SetPos(newPos); | ||
// | ||
// _SetPos() necessitates calling _GenerateView() and thus the construction | ||
// of a new OutputCellView(). This has a high performance impact (ICache spill?). | ||
// The code below inlines _bounds.IncrementInBounds as well as SetPos. | ||
// In the hot path (_pos.Y doesn't change) we modify the _view directly. | ||
|
||
// Hoist these integers which will be used frequently later. | ||
const auto boundsRightInclusive = _bounds.RightInclusive(); | ||
const auto boundsLeft = _bounds.Left(); | ||
const auto boundsBottomInclusive = _bounds.BottomInclusive(); | ||
const auto boundsTop = _bounds.Top(); | ||
const auto oldX = _pos.X; | ||
const auto oldY = _pos.Y; | ||
|
||
// Under MSVC writing the individual members of a COORD generates worse assembly | ||
// compared to having them be local variables. This causes a performance impact. | ||
auto newX = oldX; | ||
auto newY = oldY; | ||
|
||
while (move > 0) | ||
{ | ||
if (newX == boundsRightInclusive) | ||
{ | ||
newX = boundsLeft; | ||
newY++; | ||
if (newY > boundsBottomInclusive) | ||
{ | ||
newY = boundsTop; | ||
_exceeded = true; | ||
break; | ||
} | ||
} | ||
else | ||
{ | ||
newX++; | ||
_exceeded = false; | ||
} | ||
move--; | ||
} | ||
while (move < 0 && !_exceeded) | ||
|
||
if (_exceeded) | ||
{ | ||
_exceeded = !_bounds.DecrementInBounds(newPos); | ||
move++; | ||
// Early return because nothing needs to be done here. | ||
return *this; | ||
} | ||
_SetPos(newPos); | ||
return (*this); | ||
|
||
if (newY == oldY) | ||
{ | ||
// hot path | ||
const auto diff = gsl::narrow_cast<ptrdiff_t>(newX) - gsl::narrow_cast<ptrdiff_t>(oldX); | ||
_attrIter += diff; | ||
_view.UpdateTextAttribute(*_attrIter); | ||
|
||
const CharRow& charRow = _pRow->GetCharRow(); | ||
_view.UpdateText(charRow.GlyphAt(newX)); | ||
_view.UpdateDbcsAttribute(charRow.DbcsAttrAt(newX)); | ||
_pos.X = newX; | ||
} | ||
else | ||
{ | ||
// cold path (_GenerateView is slow) | ||
_pRow = s_GetRow(_buffer, { newX, newY }); | ||
_attrIter = _pRow->GetAttrRow().cbegin() + newX; | ||
_pos.X = newX; | ||
_pos.Y = newY; | ||
_GenerateView(); | ||
} | ||
|
||
return *this; | ||
} | ||
|
||
// Routine Description: | ||
|
@@ -118,7 +191,22 @@ TextBufferCellIterator& TextBufferCellIterator::operator+=(const ptrdiff_t& move | |
// - Reference to self after movement. | ||
TextBufferCellIterator& TextBufferCellIterator::operator-=(const ptrdiff_t& movement) | ||
{ | ||
return this->operator+=(-movement); | ||
ptrdiff_t move = movement; | ||
if (move < 0) | ||
{ | ||
return (*this) += (-move); | ||
} | ||
|
||
auto newPos = _pos; | ||
while (move > 0 && !_exceeded) | ||
{ | ||
_exceeded = !_bounds.DecrementInBounds(newPos); | ||
move--; | ||
} | ||
_SetPos(newPos); | ||
|
||
_GenerateView(); | ||
return (*this); | ||
} | ||
|
||
// Routine Description: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's.. not read-only any more? You can write through it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, you can now UpdateSomething in it, which disqualified the “read-only” part, right?
Update: ah I now see what you mean. No, you can't write through it to the real buffer. It's still only a "view" of the buffer. But now you can update properties in it, so I don't think it's "read-only" anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I've tried various ways to keep the original
_GenerateView
(and preserve the read-only-ness inOutputCellView
), but I failed to find a way to achieve the same level of performance as the current implementation. I've discussed with @lhecker about this, and he seems to concur with the solution.