Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce breaking changes to ReadConsoleOutput #13321

Merged
4 commits merged into from
Jul 22, 2022

Conversation

lhecker
Copy link
Member

@lhecker lhecker commented Jun 18, 2022

#8000 will change the way we store text from a strict grid/matrix where
one UTF16 character or surrogate pair always equals 1 column with the
possibility of joining exactly 2 to a wide character pair, to a dynamic
buffer where 1 or more characters can form 1 or more columns in any
arbitrary combination. Our long term goal is to properly support both
complex grapheme clusters like Emojis and complex ligatures that a wider
than 2 columns. This change requires us to break our API as
ReadConsoleOutputA/W assumes the existence of exactly this grid/matrix
storage. Since we store wide characters like "い" as a single codepoint
that is simply marked as being 2 columns wide in the future, we cannot
reconstruct trailing DBCS characters that were written to the buffer
like we used to. On the other hand this new behavior allows us to
implement better Unicode support and most likely significantly improve
our performance.

Minor breaking changes

  • ReadConsoleOutputA will now always zero the high byte in
    (CHAR_INFO).Char.UnicodeChar. Only the .AsciiChar can be used
    then. This prevents users from storing "additional" data in the
    terminal buffer.
  • ReadConsoleOutputA will now zero the .AsciiChar if it fails to
    convert the Unicode character into an appropriate DBCS.
    • Example: It's possible to write "い" into a narrow column despite
      being a wide character. In these cases WriteConsoleOutputA will
      now return 0x00 instead of 0x44 (the lower half of い's code
      point 0x3044).

Major breaking changes

  • ReadConsoleOutputW will now repeat the leading Unicode character
    twice and ignore the trailing one.
    • Example 1: Writing the pair 0x3044 0xabcd with
      WriteConsoleOutputW used to yield the same 0x3044 0xabcd if read
      back with ReadConsoleOutputW. This worked because conhost
      effectively ignored the trailing codepoint, allowing one to
      "smuggle" data. In the future this trailing character will be
      discarded and produce 0x3044 0x3044 instead.
    • Example 2: Writing い with WriteConsoleOutputA can be done with
      code page 932 (Shift-JIS) and the DBCS 0x82 0xa2. If read back
      with ReadConsoleOutputW this would previously yield the two
      Unicode characters 0x3044 0xffff. After this commit it'll yield
      0x3044 0x3044.

Alternative approaches

It's possible to "tag"/"mark" written data as originating from
WriteConsoleOutputA/W so that it can be reconstructed accurately later
on. However this lead to implementation complexities that we're actively
trying to avoid in the new buffer implementation. Effectively
everything that touches the buffer's text would have to handle these
marks and either write or clear them. Given the most likely small amount
of users who depend on the current quirky behavior, it'd be an
unwarranted maintenance and performance burden and prevent Windows
Terminal to ever truly migrate to full Unicode support.

Validation Steps Performed

  • Adjusted feature tests complete successfully ✅

@zadjii-msft zadjii-msft added this to the Terminal v1.16 milestone Jun 20, 2022
@lhecker lhecker force-pushed the dev/lhecker/8000-more-tests branch from 714dbf9 to f1eb06a Compare July 2, 2022 14:17
@lhecker lhecker force-pushed the dev/lhecker/8000-breaking-changes branch from d74ced1 to 444d44f Compare July 2, 2022 14:17
Copy link
Member

@zadjii-msft zadjii-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, from what I remember about the discussion in #13339, the tl;dr was "this never worked right, so changing what it does in the broken case shouldn't make anyone more broke", then

:ship_it:

// If .AsciiChar and .UnicodeChar have the same offset (since they're a union),
// we can just write the latter with a byte-sized value to set the former
// _and_ simultaneously clear the upper byte of .UnicodeChar to 0. Nice!
static_assert(offsetof(CHAR_INFO, Char.AsciiChar) == offsetof(CHAR_INFO, Char.UnicodeChar));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might just be me being obtuse, but it seems weird to me that this static_assert is inside both loops, and not like, at the top of the method, or just at the top of the file or something.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other assertion is in the DBCS test file unfortunately. I felt like keeping those assertions close to the code is better since there are alternative ways to write this code (for instance assigning UnicodeChar and AsciiChar independently).

@ghost ghost deleted the branch main July 11, 2022 22:03
@ghost ghost closed this Jul 11, 2022
@lhecker lhecker reopened this Jul 11, 2022
Base automatically changed from dev/lhecker/8000-more-tests to main July 11, 2022 22:04
@lhecker lhecker marked this pull request as ready for review July 15, 2022 20:34
@lhecker
Copy link
Member Author

lhecker commented Jul 15, 2022

@msftbot do not merge unless @DHowett approves

@ghost ghost added the AutoMerge Marked for automatic merge by the bot when requirements are met label Jul 15, 2022
@ghost
Copy link

ghost commented Jul 15, 2022

Hello @lhecker!

Because you've given me some instructions on how to help merge this pull request, I'll be modifying my merge approach. Here's how I understand your requirements for merging this pull request:

  • I'll only merge this pull request if it's approved by @DHowett

If this doesn't seem right to you, you can tell me to cancel these instructions and use the auto-merge policy that has been configured for this repository. Try telling me "forget everything I just told you".

@lhecker lhecker added Product-Conhost For issues in the Console codebase Area-Server Down in the muck of API call servicing, interprocess communication, eventing, etc. Area-Quality Stability, Performance, Etc. this-will-be-a-breaking-change and removed Area-Server Down in the muck of API call servicing, interprocess communication, eventing, etc. labels Jul 16, 2022
@ghost ghost merged commit 8a26f14 into main Jul 22, 2022
@ghost ghost deleted the dev/lhecker/8000-breaking-changes branch July 22, 2022 23:02
@ghost
Copy link

ghost commented Sep 13, 2022

🎉Windows Terminal Preview v1.16.252 has been released which incorporates this pull request.:tada:

Handy links:

DHowett pushed a commit that referenced this pull request Feb 15, 2023
#13626 contains a small "regression" compared to #13321:
It now began to store trailers in the buffer wherever possible to allow
a region
of the buffer to be backed up and restored via Read/WriteConsoleOutput.
But we're unfortunately still ill-equipped to handle anything but UCS-2
via
WriteConsoleOutput, so it's best to again ignore trailers just like in
#13321.

## Validation Steps Performed
* Added unit test ✅
DHowett pushed a commit that referenced this pull request Mar 31, 2023
#13626 contains a small "regression" compared to #13321:
It now began to store trailers in the buffer wherever possible to allow
a region
of the buffer to be backed up and restored via Read/WriteConsoleOutput.
But we're unfortunately still ill-equipped to handle anything but UCS-2
via
WriteConsoleOutput, so it's best to again ignore trailers just like in
#13321.

## Validation Steps Performed
* Added unit test ✅

(cherry picked from commit 9dcdcac)
Service-Card-Id: 88719297
Service-Version: 1.17
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Quality Stability, Performance, Etc. AutoMerge Marked for automatic merge by the bot when requirements are met Product-Conhost For issues in the Console codebase this-will-be-a-breaking-change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants