Introduce breaking changes to ReadConsoleOutput #13321

lhecker · 2022-06-18T00:26:04Z

#8000 will change the way we store text from a strict grid/matrix where
one UTF16 character or surrogate pair always equals 1 column with the
possibility of joining exactly 2 to a wide character pair, to a dynamic
buffer where 1 or more characters can form 1 or more columns in any
arbitrary combination. Our long term goal is to properly support both
complex grapheme clusters like Emojis and complex ligatures that a wider
than 2 columns. This change requires us to break our API as
ReadConsoleOutputA/W assumes the existence of exactly this grid/matrix
storage. Since we store wide characters like "い" as a single codepoint
that is simply marked as being 2 columns wide in the future, we cannot
reconstruct trailing DBCS characters that were written to the buffer
like we used to. On the other hand this new behavior allows us to
implement better Unicode support and most likely significantly improve
our performance.

Minor breaking changes

ReadConsoleOutputA will now always zero the high byte in
(CHAR_INFO).Char.UnicodeChar. Only the .AsciiChar can be used
then. This prevents users from storing "additional" data in the
terminal buffer.
ReadConsoleOutputA will now zero the .AsciiChar if it fails to
convert the Unicode character into an appropriate DBCS.
- Example: It's possible to write "い" into a narrow column despite
  being a wide character. In these cases WriteConsoleOutputA will
  now return 0x00 instead of 0x44 (the lower half of い's code
  point 0x3044).

Major breaking changes

ReadConsoleOutputW will now repeat the leading Unicode character
twice and ignore the trailing one.
- Example 1: Writing the pair 0x3044 0xabcd with
  WriteConsoleOutputW used to yield the same 0x3044 0xabcd if read
  back with ReadConsoleOutputW. This worked because conhost
  effectively ignored the trailing codepoint, allowing one to
  "smuggle" data. In the future this trailing character will be
  discarded and produce 0x3044 0x3044 instead.
- Example 2: Writing い with WriteConsoleOutputA can be done with
  code page 932 (Shift-JIS) and the DBCS 0x82 0xa2. If read back
  with ReadConsoleOutputW this would previously yield the two
  Unicode characters 0x3044 0xffff. After this commit it'll yield
  0x3044 0x3044.

Alternative approaches

It's possible to "tag"/"mark" written data as originating from
WriteConsoleOutputA/W so that it can be reconstructed accurately later
on. However this lead to implementation complexities that we're actively
trying to avoid in the new buffer implementation. Effectively
everything that touches the buffer's text would have to handle these
marks and either write or clear them. Given the most likely small amount
of users who depend on the current quirky behavior, it'd be an
unwarranted maintenance and performance burden and prevent Windows
Terminal to ever truly migrate to full Unicode support.

Validation Steps Performed

Adjusted feature tests complete successfully ✅

zadjii-msft

So, from what I remember about the discussion in #13339, the tl;dr was "this never worked right, so changing what it does in the broken case shouldn't make anyone more broke", then

:ship_it:

zadjii-msft · 2022-07-11T12:13:02Z

src/host/directio.cpp

+                // If .AsciiChar and .UnicodeChar have the same offset (since they're a union),
+                // we can just write the latter with a byte-sized value to set the former
+                // _and_ simultaneously clear the upper byte of .UnicodeChar to 0. Nice!
+                static_assert(offsetof(CHAR_INFO, Char.AsciiChar) == offsetof(CHAR_INFO, Char.UnicodeChar));


This might just be me being obtuse, but it seems weird to me that this static_assert is inside both loops, and not like, at the top of the method, or just at the top of the file or something.

The other assertion is in the DBCS test file unfortunately. I felt like keeping those assertions close to the code is better since there are alternative ways to write this code (for instance assigning UnicodeChar and AsciiChar independently).

…king-changes

lhecker · 2022-07-15T20:34:59Z

@msftbot do not merge unless @DHowett approves

ghost · 2022-07-15T20:35:06Z

Hello @lhecker!

Because you've given me some instructions on how to help merge this pull request, I'll be modifying my merge approach. Here's how I understand your requirements for merging this pull request:

I'll only merge this pull request if it's approved by @DHowett

If this doesn't seem right to you, you can tell me to cancel these instructions and use the auto-merge policy that has been configured for this repository. Try telling me "forget everything I just told you".

ghost · 2022-09-13T17:04:53Z

🎉Windows Terminal Preview v1.16.252 has been released which incorporates this pull request.:tada:

Handy links:

#13626 contains a small "regression" compared to #13321: It now began to store trailers in the buffer wherever possible to allow a region of the buffer to be backed up and restored via Read/WriteConsoleOutput. But we're unfortunately still ill-equipped to handle anything but UCS-2 via WriteConsoleOutput, so it's best to again ignore trailers just like in #13321. ## Validation Steps Performed * Added unit test ✅

#13626 contains a small "regression" compared to #13321: It now began to store trailers in the buffer wherever possible to allow a region of the buffer to be backed up and restored via Read/WriteConsoleOutput. But we're unfortunately still ill-equipped to handle anything but UCS-2 via WriteConsoleOutput, so it's best to again ignore trailers just like in #13321. ## Validation Steps Performed * Added unit test ✅ (cherry picked from commit 9dcdcac) Service-Card-Id: 88719297 Service-Version: 1.17

zadjii-msft added this to the Terminal v1.16 milestone Jun 20, 2022

Extend test coverage of TestDbcsWriteRead

f1eb06a

lhecker force-pushed the dev/lhecker/8000-more-tests branch from 714dbf9 to f1eb06a Compare July 2, 2022 14:17

Introduce breaking changes to ReadConsoleOutput

444d44f

lhecker force-pushed the dev/lhecker/8000-breaking-changes branch from d74ced1 to 444d44f Compare July 2, 2022 14:17

zadjii-msft approved these changes Jul 11, 2022

View reviewed changes

ghost deleted the branch main July 11, 2022 22:03

ghost closed this Jul 11, 2022

lhecker reopened this Jul 11, 2022

Base automatically changed from dev/lhecker/8000-more-tests to main July 11, 2022 22:04

lhecker added 2 commits July 12, 2022 00:07

Merge remote-tracking branch 'origin/main' into dev/lhecker/8000-brea…

e1ab2cd

…king-changes

Fix formatting

85b1868

lhecker marked this pull request as ready for review July 15, 2022 20:34

ghost added the AutoMerge Marked for automatic merge by the bot when requirements are met label Jul 15, 2022

DHowett approved these changes Jul 22, 2022

View reviewed changes

ghost merged commit 8a26f14 into main Jul 22, 2022

ghost deleted the dev/lhecker/8000-breaking-changes branch July 22, 2022 23:02

zadjii-msft mentioned this pull request Jan 13, 2023

Rendering errors in tmux split panes #6987

Closed

lhecker mentioned this pull request Feb 13, 2023

Ignore CHAR_INFO trailers during WriteConsoleOutput #14840

Merged

This pull request was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce breaking changes to ReadConsoleOutput #13321

Introduce breaking changes to ReadConsoleOutput #13321

lhecker commented Jun 18, 2022 •

edited by DHowett

Loading

zadjii-msft left a comment

zadjii-msft Jul 11, 2022

lhecker Jul 11, 2022

lhecker commented Jul 15, 2022

ghost commented Jul 15, 2022

ghost commented Sep 13, 2022

Introduce breaking changes to ReadConsoleOutput #13321

Introduce breaking changes to ReadConsoleOutput #13321

Conversation

lhecker commented Jun 18, 2022 • edited by DHowett Loading

Minor breaking changes

Major breaking changes

Alternative approaches

Validation Steps Performed

zadjii-msft left a comment

Choose a reason for hiding this comment

zadjii-msft Jul 11, 2022

Choose a reason for hiding this comment

lhecker Jul 11, 2022

Choose a reason for hiding this comment

lhecker commented Jul 15, 2022

ghost commented Jul 15, 2022

ghost commented Sep 13, 2022

lhecker commented Jun 18, 2022 •

edited by DHowett

Loading