Introduce AtlasEngine - A new text rendering prototype #11623

lhecker · 2021-10-27T02:05:12Z

This commit introduces "AtlasEngine", a new text renderer based on DxEngine.
But unlike it, DirectWrite and Direct2D are only used to rasterize glyphs.
Blending and placing these glyphs into the target view is being done using
Direct3D and a simple HLSL shader. Since this new renderer more aggressively
assumes that the text is monospace, it simplifies the implementation:
The viewport is divided into cells, and its data is stored as a simple matrix.
Modifications to this matrix involve only simple pointer arithmetic and is easy
to understand. But just like with DxEngine however, DirectWrite
related code remains extremely complex and hard to understand.

Supported features:

Basic text rendering with grayscale AA
Foreground and background colors
Emojis, including zero width joiners
Underline, dotted underline, strikethrough
Custom font axes and features
Selections
All cursor styles
Full alpha support for all colors
Should work with Windows 7

Unsupported features:

A more conservative GPU memory usage
The backing texture atlas for glyphs is grow-only and will not shrink.
After 256MB of memory is used up (~20k glyphs) text output
will be broken until the renderer is restarted.
ClearType
Remaining gridlines (left, right, top, bottom, double underline)
Hyperlinks don't get full underlines if hovered in WT
Softfonts
Non-default line renditions

Performance:

Runs at up to native display refresh rate
Unfortunately the frame rate often drops below refresh rate, due us
fighting over the buffer lock with other parts of the application.
CPU consumption is up to halved compared to DxEngine
AtlasEngine is still highly unoptimized. Glyph hashing
consumes up to a third of the current CPU time.
No regressions in WT performance
VT parsing and related buffer management takes up most of the CPU time (~85%),
due to which the AtlasEngine can't show any further improvements.
~2x improvement in raw text throughput in OpenConsole
compared to DxEngine running at 144 FPS
≥10x improvement in colored VT output in WT/OpenConsole
compared to DxEngine running at 144 FPS

zadjii-msft

well I'm at 29/56 but obviously haven't reviewed the ~~scary~~good stuff yet

src/host/settings.cpp

src/host/selection.cpp

zadjii-msft · 2021-10-27T12:18:40Z

src/renderer/base/RenderEngineBase.cpp

@@ -74,5 +74,6 @@ HRESULT RenderEngineBase::PrepareLineTransform(const LineRendition /*lineRenditi
 // - Blocks until the engine is able to render without blocking.
 void RenderEngineBase::WaitUntilCanRender() noexcept
 {
-    // do nothing by default
+    // Throttle the render loop a bit by default.
+    Sleep(8);


This Sleep(8) was removed from RenderThread and moved here, allowing AtlasEngine to run at more than 60 FPS.

‽

Look at that beautiful use of an interrobang.

src/interactivity/win32/window.cpp

DHowett · 2021-10-27T16:10:07Z

src/renderer/dx/DxRenderer.hpp

@@ -40,98 +40,61 @@ namespace Microsoft::Console::Render
    {
    public:
        DxEngine();
-        ~DxEngine();


this is a lot of churn for a renderer that isn't changing in this PR!

It seems like the changes to this file include...

rearrange the functions

remove the <ctor> = default stuff

add const noexcept to ToggleShaderEffects

clean up the #includes at the top

remove _EnableDisplayAccess

remove the const after pointers (i.e. const XXX* const --> const XXX*)

Honestly, this seems mostly fine to me, just a bit jarring. Mainly concerned about the last two removals.

DHowett · 2021-10-27T16:10:53Z

src/renderer/inc/IRenderEngine.hpp

@@ -14,12 +14,16 @@ Author(s):

 #pragma once

+#include <d2d1.h>


fwiw this is some unusual layering -- giving the general interface knowledge of D2D primitives seems like the wrong way to go given that 4/6 of the engines aren't D2D-based

Good! Because I also don't like having D2D stuff in my API. The solution here would be to have our own enum instead. I think that would be a good fit for a followup PR, since it's a minor change.

Splitting this interface into one for 4/6 engines and one for 2/6 engines, would not just unnecessarily increase the binary size further, but also unnecessarily Javarize the code base if I'm honest. I think we are better served by both reducing the number of interfaces as well as the number of functions on those interfaces.
There's technically no reason for the IRenderEngine to have that many functions if we:

use inversion of control and write "config providers" instead of having setters for everything

use fat data transfer objects instead of finely granular setters

we only render at ~240 FPS at most. 240 COM calls per second are nothing to the CPU, so if we use both of the previous points, we can just ask the IRenderData provider 240 per second for the complete set of configuration anew and performance would be better than it is now (due to lack of CFG checks on each virtual call)

I remember discussing with @lhecker about something like this, which eventually narrowed down to a single question: should we pretend that we don't know what exact engines are
there in the codebase. Code-wise, we all know what the "perfect" solution might look like ("Javazire" perhaps?), but if that's causing too much performance penalty than it should, I'm OK with a more practical solution, even though it may seem a bit unorthodox.

I've been sorely tempted to do this in the past as well (add the d2d headers... its sort of where the initial til::point/til::coord/til::rectangle came from was avoiding that.)

We've proven @lhecker is super correct about us needing to reduce the number of calls and make big data transfers in a future work item.

DHowett · 2021-10-27T16:11:55Z

src/renderer/inc/IRenderEngine.hpp

+        // be abstracted away and integrated into the above or simply get removed.
+
+        // DxRenderer - getter
+        virtual HRESULT Enable() noexcept { return S_OK; }


FWIW this is our first foray into having a base implementation in this renderer; we should observe the cost it has on conhost

Ha! I knew this question would come up. The answer is: none. At least not one we will care about for now.
The reason for this is that abstract classes aren't actually compiled into binaries. Only vtables are and those only exist on concrete classes. Since the number of concrete classes hasn't changed the size increase is limited to virtual_functions * concrete_classes * sizeof(void*) which is 168 byte for inbox conhost on x64. If I had put these functions into a new, special purpose interface just for AtlasEngine and DxEngine, we'd only save 84 byte. If we merge IRenderEngine and BaseEngine together however we already save like 2kB alone.

DHowett · 2021-10-27T16:13:22Z

src/inc/til/pair.h

+    // pair is a simple clone of std::pair, with one difference:
+    // copy and move constructors and operators are explicitly defaulted.
+    // This allows pair to be std::is_trivially_copyable, if both T and S are.
+    // --> pair can be used with memcpy(), unlike std::pair.


do we need a followup to deduplicate til::rle_pair?

No I think it's fine if their live their independent lives...
til::pair has first/second members and til::rle_pair calls them length/value, which is a lot more descriptive.

src/renderer/atlas/AtlasEngine.cpp

DHowett · 2021-10-27T16:16:24Z

src/renderer/atlas/AtlasEngine.cpp

+        {
+            _recreateSizeDependentResources();
+        }
+        if (WI_IsFlagSet(_invalidations, InvalidationFlags::font))


looks like James' enumset

After merging my enumset PR, I'll gladly use it.
Until then I'd like to refrain from depending on std::bitset.

DHowett · 2021-10-27T16:22:55Z

src/renderer/atlas/AtlasEngine.cpp

+    static constexpr
+#endif
+        // Why D3D11_CREATE_DEVICE_PREVENT_INTERNAL_THREADING_OPTIMIZATIONS:
+        // This flag prevents the driver from creating a large thread pool for things like shader computations


Would this flag help DxEngine as well?

DHowett · 2021-10-27T16:23:33Z

src/renderer/atlas/AtlasEngine.cpp

+    {
+        deviceFlags |= D3D11_CREATE_DEVICE_DEBUG;
+
+        const auto DXGIGetDebugInterface = reinterpret_cast<HRESULT(WINAPI*)(REFIID, void**)>(GetProcAddress(module.get(), "DXGIGetDebugInterface"));


is all of this stuff publicly documented? Is there a header you could use to get the signature for DXGIGetDebugxxx, so we can simplify this and use the wil module helper?

Why use D3D11 and not D3D12?

Let me turn this question around: Why use D3D12 and not D3D11? Newer software isn't faster than old software.
But hardware is and D3D12 allows you to better access these newer capabilities.
However this engine is so simple that the core rendering loop only needs 3 graphics commands per frame. There's not much that the newer API could help with performance here.

But most importantly, this engine is supposed to work on Windows 7 which only supports D3D10 (accessible through D3D11), because it is used in Visual Studio 2019 (and that one supports Windows 7 until 2029 - not kidding). As such D3D12 could only ever be added as an alternative engine and not be the only implementation.

Oh I just found out that there's a "D3D12onWin7" document!
But it's only for x64 and doesn't support WARP, which we need because we have to support DirectX 9 hardware. 😔

One day I'll probably check out D3D12 anyways, because it's interesting. 😄
I suspect that it won't improve performance, because - as far as I understand it - D3D12 helps parallelize things, whereas this engine doesn't have enough complexity to benefit from this. But maybe I'm wrong and in that case it'd be fun having a cool D3D12 engine for modern hardware.

DHowett · 2021-10-27T16:26:22Z

src/renderer/atlas/AtlasEngine.cpp

+        if (SUCCEEDED(DXGIGetDebugInterface(IID_PPV_ARGS(infoQueue.addressof()))))
+        {
+            // I didn't want to link with dxguid.lib just for getting DXGI_DEBUG_ALL. This GUID is publicly documented.
+            static constexpr GUID dxgiDebguAll = { 0xe48ae283, 0xda80, 0x490b, { 0x87, 0xe6, 0x43, 0xe9, 0xa9, 0xcf, 0xda, 0x8 } };


debgu.

i'd encourage not excluding the entire atlas renderer from the spell checker ;)

The atlas engine introduces about 20 new words to the spell checker.
Personally I expect that I'm going to rewrite large chunks of the renderer in the future, potentially multiple times, until I find a good solution that fits all requirements.
That's why I've decided to put spell checking on hold for now. I'd like to enable spell checking when I feel that the implementation is good enough. Until then this saves me quite a headache since I can't run the spell check locally.

src/renderer/atlas/AtlasEngine.cpp

DHowett · 2021-10-27T16:39:31Z

Bad news! It doesn't work on ARM64.

zadjii-msft · 2021-10-27T16:53:37Z

WT: no perf change
(Additional WT buffer management and VT processing dwarves rendering overhead at ~85% vs. ~5% CPU usage.)

It's really not fast
This PR doesn't include any changes necessary to make this engine fast.
The current text buffer API is modeled around GDI and DirectWrite and doesn't lend itself to this engine's design paradigms yet. It'll require a series of followup PRs, each of which will unblock this engine's capabilities and bring significant performance improvements with them.

I'm curious what those proposals are, if you've got a good idea for what they will be. Is the fact that it doesn't really improve WT throughput mostly because of conpty limiting frames to 60FPS / painting a frame every time the screen scrolls? Or is there some other reason?

this kind of chart: #1064 (comment) might be helpful for illustrative purposes

skyline75489 · 2021-10-28T09:58:47Z

Regarding to @zadjii-msft's question about throughput in WT: besides the reason you mentioned, the "Text Buffer Rewrite of 2021" epic (deliberately no link here to reduce the noise) is a huge deal when it comes to pure throughput. I'm pretty sure @lhecker has a lot to say here.

If this PR is to be merged later, I think it definitely deserves its own follow-up issue to track future works. We don't want to touch those old issues any more, right? Because of...obvious reasons.

skyline75489 · 2021-10-28T10:05:57Z

src/cascadia/TerminalSettingsEditor/Resources/en-US/Resources.resw

@@ -923,6 +923,10 @@
    <value>Controls what happens when the application emits a BEL character.</value>
    <comment>A description for what the "bell style" setting does. Presented near "Profile_BellStyle".{Locked="BEL"}</comment>
  </data>
+  <data name="Profile_UseAtlasEngine.Header" xml:space="preserve">


I was expecting this to be an option in "Rendering" section, where you can choose to use "Software rendering". Like a dropdown menu to choose between DX, Atlas and DX-software. But this is just my opinion.

meh, I kinda like this being per-profile. For example, WSL profiles with this + passthrough (#11264) would probably have the highest possible throughput, but powershell has a super hard time with passthrough.

I did it this way as it simplifies comparing the DxEngine and AtlasEngine with each other.
You can have two profiles for the same shell using different engines.

I was also expecting it to be in the other location... but I agree with the utility of setting it per profile.

skyline75489 · 2021-10-28T10:10:02Z

src/renderer/atlas/AtlasEngine.cpp

+    }
+
+    // Due to the current IRenderEngine interface (that wasn't refactored yet) we need to assemble
+    // the current buffer line first as the remaining function operates on whole lines of text.


@zadjii-msft An example of how text buffer implementation bottlenecks the throughput.

Yeah this really needs a better way of bulk transferring this information.

skyline75489 · 2021-10-28T10:19:29Z

src/renderer/atlas/AtlasEngine.cpp

+    //
+    // # What do we want?
+    //
+    // Segment a line of text (_api.bufferLine) into unicode "clusters".


There's a small library https://github.com/contour-terminal/libunicode here, which is used in https://github.com/contour-terminal/contour to do similar thing. I think there might be other OSS libraries that can also be helpful.

By know the clusters segmentation before generating glyphs, you can safely avoid "call IDWriteFontFallback::MapCharacters first". Same idea as in #10858

Last I recall, @christianparpart hadn't yet finished up on that lib though I do agree that WT could very much benefit from it, once he's ready to declare it suitable for others.

Oh i can help on that. I am in commute but can carefully read and reply tonight. :)

Okay. Hello!
So libunicode I specifically developed to serve the Unicode needs in my terminal emulator. I could have used ICU, but I really did not feel comfortable with it nor did the API convince me to be suitable for my usecase.
So all in all, I'd love to get libunicode evolve into a general purpose unicode library, in fact it is, but I'd like it to be more complete (no BiDi algo implemented yet for example).

When i was learning about how to properly render all kinds of emoji and other complex unicode sequences including having support for programming ligatures, I did read up on "Blink's text stack" as well as Blink's source code. You may see some API similarity when you look at libunicode, such as emoji segmentation and script segmentation is inspired by Blink's API.

Since the whole subject was so god damn complicated (at least if you start from scratch with like-zero prior knowledge) I decided to write it all up such that I can read it myself whenever I need to get into the subject again.

I can highly recommend you having a look at it: A Look into a terminal emulator's text stack. (so I do not need to spam this thread with details or unnecessarily duplicate text in here).

If there are any questions (or corrections!), I'm all open and in. :-)

p.s.: wrt libunicode, I'd be happy if it serves a purpose outside contour. Licensing should not be an issue. Maybe this lib will also help others :)

I'm unsubscribing this issue due to my own personal reason. Huge thanks to @christianparpart for jumping in and offering valuable information on this specific topic.

src/inc/operators.hpp

christianparpart · 2021-10-28T18:38:14Z

src/renderer/atlas/AtlasEngine.cpp

+    // ## The unfortunate preface
+    //
+    // DirectWrite seems "reluctant" to segment text into clusters and I found no API which offers simply that.
+    // What it offers are a large number of low level APIs that can sort of be used in combination to do this.


This is exactly what I have implemented in libunicode. As I use harfbuzz for text shaping, harfbuzz also does not provide this functionality.

christianparpart · 2021-10-28T18:41:32Z

src/renderer/atlas/AtlasEngine.cpp

+    //
+    // DirectWrite seems "reluctant" to segment text into clusters and I found no API which offers simply that.
+    // What it offers are a large number of low level APIs that can sort of be used in combination to do this.
+    // The resulting text parsing is very slow unfortunately, consuming up to 95% of rendering time in extreme cases.


Text segmentation isn't too expensive. Text shaping indeed the hell is. But since you can also segment by word boundaries, it should be easy to use a "word" as cache key for the text shaping result and the resulting glyph indices in that result again can be used as cache key into the texture atlas. I know there are some terminal emulators out there trying hard not to implement complex text shaping but in my personal experience (and with the caching explained also in the document above), I do not see any reason not to do that.

Currently I'm using "clusters" the way harfbuzz defines them:

In text shaping, a cluster is a sequence of characters that needs to be treated as a single, indivisible unit. A single letter or symbol can be a cluster of its own. Other clusters correspond to longer subsequences of the input code points — such as a ligature or conjunct form — and require the shaper to ensure that the cluster is not broken during the shaping process.

I wrote this DirectWrite code to break the text into such clusters and use the characters in each cluster as the key for a hash map that holds texture atlas indices. I tried using glyph indices as the hash key first, but if you do that you need to pair the indices with the font face they come from (since each font has it's own list of glyph indices right?). But how do you hash a IDWriteFontFace? I didn't find a good answer to this question so I'm just hashing the characters of clusters for now instead of the glyphs they result in.
The reason I wrote it this way (segmentation into clusters) is because it seemed to me the most straightforward choice of building a Unicode glyph atlas.

Does caching the shaping result of words bring a worthwhile performance advantage, despite the overhead of the cache itself?

I'm asking as the biggest CPU consumption occurs during MapCharacters in this renderer, or in other words during font fallback. This contributes about 85% CPU usage when drawing a viewport full of Hindi characters. Both IDWriteTextAnalyzer1::GetTextComplexity as well as IDWriteTextAnalyzer::GetGlyphs which do the cluster segmentation for me are rather fast fortunately (~10% CPU usage).

In text shaping, a cluster is a sequence of characters that needs to be treated as a single, indivisible unit
...

I wrote this DirectWrite code to break the text into such clusters

Exactly. And especially with regards to what harfbuzz says, that is a grapheme cluster (user perceived character composed of one or more codepoints). This segmentation step should NOT be done by the renderer but instead on the VT parser side already. At least that is how I do it, and I highly recommend doing so because this information is also required for correct cursor placement. The link I posted above goes in more detail on that matter. And apart from that, I started drafting a Unicode Core spec for terminals that you can read up here. I'd love you (all) to have a look at it. It clearly defines the semantics on that matter and I wanted to ping the WT team on that spec anyways in order to find a common concensus in the TE land (at least a little) such that toolkits/apps can rely on this new extended behaviour. Feedback on this we do not need to spam here, just use that repo's discussion area or there where also jerch and j4james helped me already on getting this started.

I tried using glyph indices as the hash key first, but if you do that you need to pair the indices with the font face they come from

Yup. I'm doing that here and there - mind, I'm not using the most efficient way for cache keys here either. I'll very soon move to strong cache keys so that I can avoid that double-mapping and to also have simply integer values as keys.

Does caching the shaping result of words bring a worthwhile performance advantage, despite the overhead of the cache itself?

I know that my code was definitely slow before I started to implement caching. So I started naive, too. Then moved to unordered_map for caching, which is naive and dangerous, and now it's an LRU-cache, much better in terms of resource usage. the key should (as mentioned) by en integer to further improve lookup performance and avoid an extra strcmp inside the hash map impl.

Text shaping is definitely expensive! I chose caching words (delimited by spaces but also by common SGR) because the text shaper cannot deal with mixed font faces in one shape call as I am passing in one font face per shape call (either regular or italic/bold/...), and then walk down the font-fallback chain until a shape call could fully shape the whole word. This result ends up in the cache, so that I do not need to rerun the whole font-fallback queue again and again ... .

I'm asking as the biggest CPU consumption occurs during MapCharacters in this renderer, or in other words during font fallback. This contributes about 85% CPU usage when drawing a viewport full of Hindi characters.

That is exactly the part that is hidden behind the cache. I shape only if I don't have it in cache, and the shape() function does all the fallback logic.

Ok, sorry, all those hidden links point to master instead of explicit commit hash. They may not work for years. ;-( (I may fix that later?)

This segmentation step should NOT be done by the renderer but instead on the VT parser side already. At least that is how I do it, and I highly recommend doing so because this information is also required for correct cursor placement.

This worries me. (It's possible I'm misunderstanding -- I am jumping between threads!) I know that ideally we'd be doing segmentation earlier, but I don't think we can do it that early². Cursor positioning with regards to the measurable size of a set of codepoints specifically cannot depend on the font in the terminal, right? Apart from any changes resulting from future Unicode spec docs¹, there's no easy way for a client over e.g. ssh to be able to do cursor positioning (or glyph positioning, even) accounting for the font.

¹ This is really cool, by the way! @reli-msft over here is working on one as well. Renzhi, please make sure you're looped in on Christian's spec proposal.

² Under the assumption that the font would be required for segmentation. If it's not, carry on, we should almost certainly have a better text buffer.

³ (Thanks for jumping in to discuss this with us!)

Oh. I believe my misapprehension was that we would need to use the segmented stream as the backing store. "Doing segmentation" while data streams in doesn't mean that we need to impose any requirement on the font that a client application would be able to observe.

Hey; so segmentation is done in multiple stages.

grapheme cluster (GC) segmentation can be done on the VT backend side. (font independant). this is the required here because with that you can reliably associate characters (GC) to grid cells.

glyph run segmentation is done on the renderer side. so after the characters (grapheme clusters) have been aligned into the grid already. this does therefore not even impact VT backend performance and is a step that can be easily cached (tm). (also font independant). This stage of segmentation is required to properly feed the text shaping engine (harfbuzz, DirectWrite, CoreText), see my text-stack-link I posted above :-)

Apart from any changes resulting from future Unicode spec docs¹, there's no easy way for a client over e.g. ssh to be able to do cursor positioning (or glyph positioning, even) accounting for the font.

Luckily this should never depend on font and currently also does not. All the algorithms (I used) are font file independent.

¹ This is really cool, by the way! @reli-msft over here is working on one as well. Renzhi, please make sure you're looped in on Christian's spec proposal.

Yeah would be nice if we can converge here on something. In my TE I do not yet implement the VT sequence proposed in that spec, but the semantics are on by default in my case anyways. I'll be adding this mode soon though - unless you have strong objections against such a thing, then we need to rethink, but I'm pretty certain that is the right way to go. Mind: BiDi is explicitly not addressed. That's another big undefined thing in the TE-land that could be specced at a later point. :)

src/cascadia/TerminalSettingsEditor/Profiles.xaml

lhecker · 2021-10-29T02:14:47Z

src/renderer/dx/CustomTextLayout.cpp

+        const auto& features = _fontRenderData->DefaultFontFeatures();
+#pragma warning(suppress : 26492) // Don't use const_cast to cast away const or volatile (type.3).
+        DWRITE_TYPOGRAPHIC_FEATURES typographicFeatures = { const_cast<DWRITE_FONT_FEATURE*>(features.data()), gsl::narrow<uint32_t>(features.size()) };


I don't understand how this code is passing right now, but this fails the AuditMode for me locally so I was forced to change this code to get this to pass. &featureList[0] isn't valid code for null pointers so that got fixed as well (dereferencing a null pointer to a struct is UB IIRC).

Bonus effect: The std::vector for features isn't copied on every call anymore.
Negative effect: Const-correctness for the DWrite headers really isn't good. DWRITE_TYPOGRAPHIC_FEATURES wants its argument to be a mutable pointer. Had to use const_cast to circumvent that.

Oh well. if it works it works. I'm not that sad about casting away const for APIs that didn't define it.

FontInfoBase and it's descendents are missing noexcept annotations, which virally forces other code to not be noexcept as well during AuditMode checks. Apart from adding noexcept, this commit also * Passes std::wstring_view by reference. * Pass the FillLegacyNameBuffer argument as a simple pointer-to-array, allowing us to fill the buffer with a single memcpy. (gsl::span's iterators inhibit any internal STL optimizations.) * Move operator== declarations inside the class to reduce code size. All other changes are an effect of the virality of noexcept. This is an offshoot from #11623. ## Validation Steps Performed * It still compiles ✔️

lhecker · 2021-10-29T13:56:11Z

src/host/ft_host/CJK_DbcsTests.cpp

-        // expected read is either the size of the buffer or the number of characters remaining, whichever is smaller.
-        DWORD const dwReadExpected = (DWORD)std::min(cbBuffer, cchExpectedText - cchRead);
-


In case you wonder: I can't compile these tests locally otherwise, as the v143 toolchain introduced a few new warnings at /W4.

So do you have to #pragma suppress these? Or... do we just not need dwReadExpected?

The warnings were due to these variables never being used.

src/interactivity/win32/lib/win32.LIB.vcxproj

miniksa · 2021-11-11T22:37:46Z

src/renderer/atlas/AtlasEngine.cpp

+                {
+                    _emplaceGlyph(mappedFontFace.get(), scale, idx + i, idx + i + 1u);
+                }
+            }


I feel like this doesn't save us that much in the majority case because of the regional variations in i/j that ruin it for Cascadia Code, the default font.

Cascadia Mono is the default font isn't it? That one definitely doesn't have the i/j issue and all basic ASCII characters are considered "simple" for that font (which makes it a lot faster than Cascadia Code). 🙂

src/renderer/atlas/AtlasEngine.cpp

miniksa · 2021-11-11T22:40:31Z

src/renderer/atlas/AtlasEngine.cpp

+    // Additionally IDWriteTextAnalyzer::GetGlyphs requires an instance of DWRITE_SCRIPT_ANALYSIS,
+    // which can only be attained by running IDWriteTextAnalyzer::AnalyzeScript first.
+    //
+    // Font fallback with IDWriteFontFallback::MapCharacters is very slow.


Nnnn ok. Now I understand why you want that other segmenting/shaping.

src/renderer/atlas/AtlasEngine.cpp

miniksa · 2021-11-11T22:46:20Z

src/renderer/atlas/AtlasEngine.r.cpp

+    // > IDXGISwapChain::Present: Partial Presentation (using a dirty rects or scroll) is not supported
+    // > for SwapChains created with DXGI_SWAP_EFFECT_DISCARD or DXGI_SWAP_EFFECT_FLIP_DISCARD.
+    // ---> No need to call IDXGISwapChain1::Present1.
+    //      TODO: Would IDXGISwapChain1::Present1 and its dirty rects have benefits for remote desktop?


Solid maybe. Do note that the most recent versions of Remote Desktop have an automatic identification of areas of high motion on the screen and can effectively MPEG-video-streamify what's happening there to the remote client as a subsection of the display. I've successfully played Guild Wars 2 through an RDP session and watched YouTube videos through one at reasonable resolution (720p+). I bet we don't have to worry about this as much as we think we do (or did in the past).

RDP maintainers confirmed poggers.

miniksa · 2021-11-11T22:47:35Z

src/renderer/atlas/AtlasEngine.r.cpp

+
+AtlasEngine::f32x4 AtlasEngine::_getGammaRatios(float gamma) noexcept
+{
+    static constexpr f32x4 gammaIncorrectTargetRatios[13]{


gammaIncorrectTargetRatios

"Incorrect"?

Yeah there's also gamma-correct correction, uhm, somewhere iykwim.
Now you don't start assuming that I know what I'm doing here, alright? I think - and that's a huge, the way Donald calls stuff huge - stretch of my saying "I think"... I think gamma incorrect correction applies to display-native gamma surfaces and gamma correct correction (lol) applies to sRGB surfaces. gammaIncorrectTargetRatios was """definitely""" the table to be used for D3D non-sRGB surfaces. I think. Maybe, but probably.

Okay. I thought you were making a joke on the opposite of "correct" as you explained to me the inverse correction required over chat the other day.

miniksa · 2021-11-11T22:56:13Z

src/renderer/atlas/AtlasEngine.r.cpp

+        // (avoiding either a batch flush or the system maintaining multiple copies of the resource behind the scenes).
+        //
+        // Since our shader only draws whatever is in the atlas, and since we don't replace glyph tiles that are in use,
+        // we can safely (?) tell the GPU that we don't overwrite parts of our atlas that are in use.


safely (?)

the ? made me laugh

Don't you know how people need to print "don't eat the plastic" on food because people would choke on it?
Well the D3D folks didn't do that. I'm choking.

miniksa

I'm not going to hold this back. Overall, this is fantastic and I'm so excited. It's exactly what I hoped for and more. There's still plenty to do, but it's definitely more with-it than DxEngine was when it launched, so I think it's fair to roll with it and improve from here.

Thank you so much for all of this, Leonard!

carlos-zamora

Still have AtlasEngine.*.cpp to review, but I don't think I'll get through them today. So figured I'd post a few comments for now.

src/renderer/atlas/shader_vs.hlsl

src/renderer/atlas/shader_ps.hlsl

carlos-zamora · 2021-11-11T21:55:42Z

src/renderer/atlas/AtlasEngine.h

@@ -0,0 +1,759 @@
+// Copyright (c) Microsoft Corporation.


This header file is huge. Would there be any value/detriment to breaking it up into a few separate header files in src/renderer/atlas? I get that they're all a part of the atlas engine, but it's a lot.

That might also help provide descriptions for each one and make this a bit more clear (as abstracts)? It's just pretty overwhelming and for somebody like me who doesn't know much about the renderer, it's pretty easy to get lost and lose context by the time I get to line 200/759 of this file.

It would definitely be helpful to break it up. I intend to do so in the future when I'm more certain about what's needed and what isn't. The other AtlasEngine is just as messy as the header file and needs a similar treatment.
It would be difficult for me to clean it up at this point already, because it's highly likely that I'll rewrite at least half of the code in the header file (anything related to the glyph cache basically).

DHowett

52/56, obviously deferred the hard ones for ROUND II

DHowett · 2021-11-12T20:14:00Z

src/cascadia/TerminalControl/TermControl.cpp

@@ -1873,36 +1865,47 @@ namespace winrt::Microsoft::Terminal::Control::implementation
        //      The family is only used to determine if the font is truetype or
        //      not, but DX doesn't use that info at all.
        //      The Codepage is additionally not actually used by the DX engine at all.
-        FontInfo actualFont = { fontFace, 0, fontWeight.Weight, { 0, gsl::narrow_cast<short>(fontHeight) }, CP_UTF8, false };
+        FontInfo actualFont = { fontFace, 0, fontWeight.Weight, { 0, gsl::narrow_cast<short>(fontSize) }, CP_UTF8, false };
        FontInfoDesired desiredFont = { actualFont };

        // Create a DX engine and initialize it with our font and DPI. We'll
        // then use it to measure how much space the requested rows and columns
        // will take up.
        // TODO: MSFT:21254947 - use a static function to do this instead of


I'm moving this MSFT backlog item out to GitHub -- we really should do this.

src/interactivity/win32/lib/win32.LIB.vcxproj

DHowett · 2021-11-12T20:18:23Z

src/renderer/atlas/atlas.vcxproj

+      <HeaderFileOutput>$(OutDir)$(ProjectName)\%(Filename).h</HeaderFileOutput>
+      <TreatWarningAsError>true</TreatWarningAsError>
+      <AdditionalOptions>/Zpc %(AdditionalOptions)</AdditionalOptions>
+      <AdditionalOptions Condition="'$(Configuration)'=='Release'">/O3 /Qstrip_debug /Qstrip_reflect %(AdditionalOptions)</AdditionalOptions>


magic compiler flags -- cause for concern?

They're not that magic at all! For some weird reason neither of these 3 parameters are available as msbuild options.
But they're what we at Microsoft use ourselves in our own sample code. If you search for these inside this GitHub org you'll find dozens if not hundreds of results. 🙂
/O3 is for future compat and enabled all available optimizations and /Qstrip_debug/reflect remove debug/reflection metadata from the bytecode, reducing it's size a lot.

DHowett · 2021-11-12T20:18:51Z

src/renderer/atlas/atlas.vcxproj

+  <ItemDefinitionGroup>
+    <ClCompile>
+      <PrecompiledHeaderFile>pch.h</PrecompiledHeaderFile>
+      <AdditionalIncludeDirectories>$(OutDir)$(ProjectName)\;%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>


FWIW this is only necessary for projects that produce header files as part of the build. does Atlas have that?

Yes, the shaders are compiled into header files at $(OutDir)$(ProjectName)\%(Filename).h.

DHowett · 2021-11-12T20:24:35Z

src/renderer/vt/vtrenderer.hpp

-        [[nodiscard]] HRESULT GetFontSize(_Out_ COORD* const pFontSize) noexcept override;
-        [[nodiscard]] HRESULT IsGlyphWideByFont(const std::wstring_view glyph, _Out_ bool* const pResult) noexcept override;
+        [[nodiscard]] HRESULT GetFontSize(_Out_ COORD* pFontSize) noexcept override;
+        [[nodiscard]] HRESULT IsGlyphWideByFont(std::wstring_view glyph, _Out_ bool* pResult) noexcept override;


I believe these signature changes either (1) do not matter for Bgfx/Wddmcon, or (2) will fail the windows build because we did not update Bgfx/Wddm 😄

A while ago I hacked our project files to build the WDDM one and it worked.
If you're just asking about const-ness of these declarations then those don't matter: #11623 (comment)

DHowett

Header: reviewed.

DHowett · 2021-11-12T20:30:24Z

src/renderer/atlas/AtlasEngine.h

+
+            bool is_inline() const noexcept
+            {
+                return (__builtin_bit_cast(uintptr_t, allocated) & 1) != 0;


Sorry, I don't totally understand. Since this is a union, how do we ensure that the inline value (which shares storage with allocated) doesn't have this bit set in this position?

Okay, I had to think about how this works for a while. This will always hold a T, either internal or allocated, but sometimes that T will have been allocated with extra space at the end.
The T inside this union needs to be complicit - it has to have a member at the end that is effectively flexible but also has the appropriate amount of padding (if it wants).

I still don't understand how we ensure that the inlined value doesn't have this bit set.

SmallObjectOptimizer is pretty much an implementation detail of AtlasKey and AtlasValue and the code doesn't make any sense whatsoever outside of that context.
The reason this works is because both AtlasKeyData as well as AtlasValueData coincidentially don't use the first bit for anything except for flagging whether inlined storage is used (for instance CellFlags::Inlined).

This code doesn't have to be super duper good, because I'm planning to make it a lot more robust and simpler with my custom hashmap implementation, which is required in order to get the LRU behavior we need. It will simultaneously allow us write simpler code here, as we can choose ourselves how to manage our linking behavior. For instance we don't need to allocate AtlasValue on the heap, if we simply use the linked list of the LRU itself to chain multiple atlas positions together.
In either case I expect this code to be rewritten and only the idea to remain, but in an improved form.

DHowett · 2021-11-12T20:39:42Z

src/renderer/atlas/AtlasEngine.h

+            }
+
+            T* initialize(size_t byteSize)
+            {


I believe it's UB to not placement-new a T into the returned space, even if T is_trivially_copyable and has_unique_object_representation.

Oh it totally is! I can't even use malloc() here, because malloc() is UB in C++ in general. P0593R6 should deal with this. Until then we should just assume that the compiler will be reasonable and treat neither malloc() nor operator new() as UB....

DHowett · 2021-11-12T20:41:47Z

src/renderer/atlas/AtlasEngine.h

+            // * Minimum alignment is 4 bytes (like `#pragma pack 4`)
+            // * Members cannot straddle 16 byte boundaries
+            //   This means a structure like {u32; u32; u32; u32x2} would require
+            //   padding so that it is {u32; u32; u32; <4 byte padding>; u32x2}.


can the job here be done with alignas?

It is! The way this code is written, as long as others adhere to this style, it'll produce compilation errors if you violate this warning.

DHowett

LET'S DO THIS.

ghost · 2021-11-13T00:09:56Z

Hello @DHowett!

Because this pull request has the AutoMerge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (`@msftbot`) and give me an instruction to get started! Learn more here.

jero288

Nice

## Summary of the Pull Request Currently, the TermControl and ControlCore recieve a settings object that implements `IControlSettings`. They use for this for both reading the settings they should use, and also storing some runtime overrides to those settings (namely, `Opacity`). The object they recieve currently is a `T.S.M.TerminalSettings` object, as well as another `TerminalSettings` object if the user wants to have an `unfocusedAppearance`. All these are all hosted in the same process, so everything is fine and dandy. With the upcoming move to having the Terminal split into multiple processes, this will no longer work. If the `ControlCore` in the Content Process is given a pointer to a `TerminalSettings` in a certain Window Process, and that control is subsequently moved to another window, then there's no guarantee that the original `TerminalSettings` object continues to exist. In this scenario, when window 1 is closed, now the Core is unable to read any settings, because the process that owned that object no longer exists. The solution to this issue is to have the `ControlCore`'s own their own copy of the settings they were created with. that way, they can be confident those settings will always exist. Enter `ControlSettings`, a dumb struct for just storing all the contents of the Settings. I used x-macros for this, so that we don't need to copy-paste into this file every time we add a setting. Changing this has all sorts of other fallout effects: * Previewing a scheme/anything is a tad bit more annoying. Before, we could just sneak the previewed scheme into a `TerminalSettings` that lived between the settings we created the control with, and the settings they were actually using, and it would _just work_. Even explaining that here, it sounds like magic, because it was. However, now, the TermControl can't use a layered `TerminalSettings` for the settings anymore. Now we need to actually read out the current color table, and set the whole scheme when we change it. So now there's also a `Microsoft.Terminal.Core.Scheme` _struct_ for holding that data. - Why a `struct`? Because that will go across the process boundary as a blob, rather than as a pointer to an object in the other process. That way we can transit the whole struct from window to core safely. * A TermControl doesn't have a `IControlSettings` at all anymore - it initalizes itself via the settings in the `Core`. This will be useful for tear-out, when we need to have the `TermControl` initialize itself from just a `ControlCore`, without being able to rebuild the settings from scratch. * The `TabTests` that were written under the assumption that the Control had a layered `TerminalSettings` obviously broke, as they were designed to. They've been modified to reflect the new reality. * When we initialize the Control, we give it the settings and the `UnfocusedAppearance` all at once. If we don't give it an `unfocusedAppearance`, it will just use the focused appearance as the unfocused appearance. * The Control no longer can _write_ settings to the `ControlSettings`. We don't want to be storing things in there. Pretty much everything we set in the control, we store somewhere other than in the settings object itself. However, `opacity` and `useAcrylic`, we need to store in a handy new `RUNTIME_SETTING` property. We can write those runtime overrides to those properties. * We no longer store the color scheme for a pane in the persisted state. I'm tracking that in #9800. I don't think it's too hard to add back, but I wanted this in front of eyes sooner than later. ## References * #1256 * #5000 * #9794 has the scheme previewing in it. * #9818 is WAY more possible now. ## PR Checklist * [x] Surprisingly there wasn't ever a card or issue for this one. This was only ever a bullet point in #5000. * A bunch of these issues were fixed along the way, though I never intended to fix them: * [x] Closes #11571 * [x] Closes #11586 * [x] Closes #7219 * [x] Closes #11067 * [x] I think #11623 actually ended up resolving this one, but I'm double tapping on it here: Closes #5703 * [x] I work here * [x] Tests added/passed * [n/a] Requires documentation to be updated ## Detailed Description of the Pull Request / Additional comments Along the way I tried to clean up code where possible, but not too agressively. I didn't end up converting the various `MockTerminalSettings` classes used in tests to the x macros quite yet. I wanted to merge this with #11416 in `main` before I went too crazy. ## Validation Steps Performed * [x] Scheme previewing works * [x] Adjusting the font size works * [x] focused/unfocused appearances still work * [x] mouse-wheeling opacity still works * [x] acrylic & cleartype still does the right thing * [x] saving the settings still works * [x] going wild on sliding the opacity slider in the settings doesn't crash the terminal * [x] toggling retro effects with a keybinding still works * [x] toggling retro effects with the command palette works * [x] The matrix of (`useAcrylic(true,false)`)x(`opacity(50,100)`)x(`antialiasingMode(cleartype, grayscale)`) works as expected. Slightly changed, falls back to grayscale more often, but looks more right.

lhecker force-pushed the dev/lhecker/atlas-engine branch from bc106c5 to 8c4c708 Compare October 27, 2021 02:08

Introduce AtlasEngine - A new text rendering prototype

602714c

lhecker force-pushed the dev/lhecker/atlas-engine branch from 8c4c708 to 602714c Compare October 27, 2021 02:14

zadjii-msft reviewed Oct 27, 2021

View reviewed changes

DHowett reviewed Oct 27, 2021

View reviewed changes

This comment has been minimized.

Sign in to view

Merge remote-tracking branch 'origin/main' into dev/lhecker/atlas-engine

8aef31b

lhecker mentioned this pull request Oct 27, 2021

Improve conhost CPU usage during text selection #11634

Merged

This comment has been minimized.

Sign in to view

skyline75489 reviewed Oct 28, 2021

View reviewed changes

lhecker mentioned this pull request Oct 28, 2021

Add noexcept to all FontInfo structs #11640

Merged

lhecker commented Oct 28, 2021

View reviewed changes

src/inc/operators.hpp Outdated Show resolved Hide resolved

christianparpart reviewed Oct 28, 2021

View reviewed changes

DHowett reviewed Oct 28, 2021

View reviewed changes

src/cascadia/TerminalSettingsEditor/Profiles.xaml Outdated Show resolved Hide resolved

Fix build errors, fix crashes, address comments

c1eccc9

This comment has been minimized.

Sign in to view

lhecker added 2 commits October 29, 2021 03:40

Fix spelling, address comments

753f6bd

Fix build errors

d995f6d

lhecker commented Oct 29, 2021

View reviewed changes

lhecker added 2 commits October 29, 2021 14:47

Fix build errors

6fa6190

Fix build errors

ecc139b

lhecker commented Oct 29, 2021

View reviewed changes

lhecker added 2 commits October 29, 2021 16:14

Merge remote-tracking branch 'origin/main' into dev/lhecker/atlas-engine

3b6d786

Scaling fix for SwapChainPanel, Emoji alpha blending, All cursor types

0823eee

miniksa reviewed Nov 11, 2021

View reviewed changes

src/renderer/atlas/AtlasEngine.cpp Show resolved Hide resolved

miniksa reviewed Nov 11, 2021

View reviewed changes

src/renderer/atlas/AtlasEngine.cpp Show resolved Hide resolved

miniksa reviewed Nov 11, 2021

View reviewed changes

miniksa previously approved these changes Nov 11, 2021

View reviewed changes

carlos-zamora reviewed Nov 11, 2021

View reviewed changes

Addressed feedback, Fixed font fallback replacement

254f118

lhecker dismissed miniksa’s stale review via 254f118 November 12, 2021 02:08

miniksa previously approved these changes Nov 12, 2021

View reviewed changes

Fixed font scaling after swap chain recreation

07e2ee1

lhecker dismissed miniksa’s stale review via 07e2ee1 November 12, 2021 19:01

miniksa approved these changes Nov 12, 2021

View reviewed changes

DHowett reviewed Nov 12, 2021

View reviewed changes

DHowett reviewed Nov 13, 2021

View reviewed changes

DHowett approved these changes Nov 13, 2021

View reviewed changes

DHowett added the AutoMerge Marked for automatic merge by the bot when requirements are met label Nov 13, 2021

ghost merged commit 2353349 into main Nov 13, 2021

ghost deleted the dev/lhecker/atlas-engine branch November 13, 2021 00:10

zadjii-msft mentioned this pull request Nov 16, 2021

Change the ControlCore layer to own a copy of its settings #11619

Merged

18 tasks

jero288 reviewed Nov 17, 2021

View reviewed changes

migueldeicaza mentioned this pull request Jan 2, 2022

Metal renderer migueldeicaza/SwiftTerm#202

Open

lhecker added the Area-AtlasEngine label Jan 26, 2022

cinnamon-msft mentioned this pull request Jan 31, 2022

experimental.useAtlasEngine missing from JSON schema #12302

Closed

lhecker mentioned this pull request Mar 24, 2024

Implement grapheme clusters #16916

Merged

This pull request was closed.

		// expected read is either the size of the buffer or the number of characters remaining, whichever is smaller.
		DWORD const dwReadExpected = (DWORD)std::min(cbBuffer, cchExpectedText - cchRead);

Introduce AtlasEngine - A new text rendering prototype #11623

Introduce AtlasEngine - A new text rendering prototype #11623

Conversation

lhecker commented Oct 27, 2021 • edited Loading

zadjii-msft left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lhecker Oct 27, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lhecker Oct 27, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lhecker Oct 31, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

DHowett commented Oct 27, 2021

zadjii-msft commented Oct 27, 2021

This comment has been minimized.

This comment has been minimized.

skyline75489 commented Oct 28, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

christianparpart Oct 28, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

DHowett Oct 28, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

christianparpart Oct 29, 2021 • edited Loading

Choose a reason for hiding this comment

This comment has been minimized.

lhecker Oct 29, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lhecker Nov 10, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lhecker Nov 12, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lhecker Nov 12, 2021 • edited by DHowett Loading

Choose a reason for hiding this comment

miniksa left a comment

Choose a reason for hiding this comment

carlos-zamora left a comment

lhecker commented Oct 27, 2021 •

edited

Loading

lhecker Oct 27, 2021 •

edited

Loading

lhecker Oct 27, 2021 •

edited

Loading

lhecker Oct 31, 2021 •

edited

Loading

christianparpart Oct 28, 2021 •

edited

Loading

DHowett Oct 28, 2021 •

edited

Loading

christianparpart Oct 29, 2021 •

edited

Loading

lhecker Oct 29, 2021 •

edited

Loading

lhecker Nov 10, 2021 •

edited

Loading

lhecker Nov 12, 2021 •

edited

Loading

lhecker Nov 12, 2021 •

edited by DHowett

Loading