-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove CHAR_INFO munging for raster fonts #17681
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure you note it in https://github.com/microsoft/terminal/wiki/Console:-Potential-Breaking-Changes
Love this cleanup work. Thanks!
{ | ||
// When written with WriteConsoleOutputW and read back with ReadConsoleOutputW when the font is Raster, | ||
// we will get a deduplicated set of Unicode characters with no lead/trailing markings and space padded at the end... | ||
// ... except something weird happens with truncation (TODO figure out what) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yikes
Given that I was wrong in my initial PR description, I've done some more & better due diligence: This essentially uses WriteConsole as a test. The test string is:
This basically shows what I meant with that "munging" was removed from anything but these two I'm considering writing some additional tests to run on each version, but merely looking at the code I know that any version past the first one lacks support for the Unicode "falsification" and it's certainly not in our main branch right now either. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
RealUnicodeToFalseUnicode
was described as:In other words, it takes a UCS2 string, translates it to the current
codepage and translates it back to UCS2 in the US version of Windows.
In the "eastern" DBCS version it "reinterprets" the DBCS string as
CP_USA
(a particularly weird quirk).The original implementation used to do this translation at every
opportunity where text went into or out of conhost.
The translation was weird, but it was consistent.
In Windows 10 RS1 conhost got a new UCS2-aware text buffer and
this translation was removed from most places, as the text buffer
was converted to store proper UCS2. This broke the entire concept
of the translation though. Whatever data you previously wrote with
something like
WriteConsoleOutputCharacter
now came back withsomething entirely else via
ReadConsoleOutput
.In other words, I believe past RS1 there was technically never any
point in "munging"
CHAR_INFO
s, as this only covered 2 API functions.Still, this does mean that this PR represents an API breaking change.
It's a minor one though, because it only affects 2 API functions.
And more importantly, it's a necessary breaking change as we move
further and further away from correlating codepoint and column counts.
Validation Steps Performed