ConPTY mangles U+1F600 to U+FFFD #2770
Comments
Sorry, does this fail in Windows Terminal? If so: we're seeing it working here. What shell are you using? Windows PowerShell and CMD have very poor support for high-unicode input, whereas PowerShell Core supports it handily. We made some fixes to ConPTY that aren't in 19H1, but are in Terminal, which could explain why Alacritty is having trouble.
This is failing for me in both Windows Terminal and Alacritty-with-ConPTY, with both cmd and WSL bash as the shell, running both Python (e.g. Command Prompt and Alacritty-with-WinPTY are not experiencing this problem.
I repro'd this against master today with an Ubuntu tab inside Windows Terminal, with the WinCompose utility emitting U+1F600 on Right Alt+Left Shift+;+D. I've traced through the Windows Terminal project, and the U+1F600 character is landing as 0xF0 0x9F 0x98 0x80 (this is the valid UTF-8 per http://www.fileformat.info/info/unicode/char/1f600/index.htm) in the call that sends it into the PTY's input channel. The next step is to attach to the underlying PTY and see how it might be losing it.
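For anyone who wants to double-check the byte sequence quoted above, here is a minimal Python sketch. It only confirms the UTF-8 encoding of U+1F600 and does not touch Windows Terminal or ConPTY; the variable names are illustrative.

```python
# Encode U+1F600 (GRINNING FACE) as UTF-8 and show each byte in hex.
ch = "\U0001F600"
utf8 = ch.encode("utf-8")

# Prints: 0xF0 0x9F 0x98 0x80
print(" ".join(f"0x{b:02X}" for b in utf8))
```

The output matches the bytes seen in the call that writes to the PTY's input channel, so the character is still intact at that point.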
OK. This goes off the rails here: terminal/src/terminal/adapter/InteractDispatch.cpp Lines 60 to 86 in 0c8a4df
In InteractDispatch, we appear to be feeding each piece of the surrogate-pair UTF-16 sequence in one wchar_t at a time. This heads into terminal/src/types/convert.cpp Lines 135 to 168 in 0c8a4df
At this point, we're identifying these things as invalid keys and synthesizing numpad events instead. But there's not really a valid numpad equivalent to half of a surrogate pair ( terminal/src/types/convert.cpp Line 160 in 0c8a4df
We did identify above on the terminal/src/types/convert.cpp Line 147 in 0c8a4df
CharType has been filled with 0x800 at this point, which does map to C3_HIGHSURROGATE .
The conversion here ends up becoming effectively just the equivalent input of the ALT button going down and up. It then does the same thing for the low surrogate half. This then travels for a bit until it hits terminal/src/host/inputBuffer.cpp Lines 598 to 673 in 0c8a4df
None of the code finds an appropriate mapping for the ALT so it is just stored into the input buffer. The input buffer alerts waiting readers that data is available. The WSL driver calls
Typing the 😀 character inserts this into the buffer: Pasting the 😀 character inserts this into the buffer: Typing the 😂 character inserts this into the buffer: Pasting the 😂 character inserts this into the buffer: 😀 does not work. So it appears that the buffer state is consistent between typing and pasting, ruling that out. Now I suspect the WSL TTY code. I'll look there next.
Huh, does look like something related to WSL TTY code now. I had incorrectly assumed that running cmd.exe was equivalent to running cmd.exe from inside WSL, but it turns out not to be. Shows how poor a grasp I have on the moving parts!
@benhillis, this is For a The
😀
↑ wch:0xd83d '☐' mod:None (0x00000000) repeat:0x0001 vk:0x0012 vsc:0x0038
↑ wch:0xde00 '☐' mod:None (0x00000000) repeat:0x0001 vk:0x0012 vsc:0x0038
😂
↑ wch:0xd83d '☐' mod:None (0x00000000) repeat:0x0001 vk:0x0012 vsc:0x0038
↑ wch:0xde02 '☐' mod:None (0x00000000) repeat:0x0001 vk:0x0012 vsc:0x0038
For the 😀 case, the high surrogate passes the check and is appended to the TTY input. The low surrogate fails and is not appended. Then For the 😂 case, the high and low surrogate pass the check, are appended to the TTY input, and then converted successfully to UTF8 and forwarded into the WSL instance. So overall, it looks like the
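To illustrate why a dropped low surrogate would surface as U+FFFD, here is a small Python sketch. It does not reproduce the WSL TTY code, which is not shown in this thread; it uses Python's own UTF-16 decoder with `errors="replace"` as a stand-in, and the helper names `utf16_units` and `units_to_utf8` are made up for this example.

```python
import struct

def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))

def units_to_utf8(units):
    """Convert UTF-16 code units to UTF-8, replacing invalid sequences.

    Assumption: the real converter substitutes U+FFFD for an unpaired
    surrogate, as Python's "replace" error handler does.
    """
    data = struct.pack(f"<{len(units)}H", *units)
    return data.decode("utf-16-le", errors="replace").encode("utf-8")

units = utf16_units("\U0001F600")      # high and low surrogate
full = units_to_utf8(units)            # both halves present
dropped = units_to_utf8(units[:1])     # low surrogate missing

print([hex(u) for u in units])
print(full.hex(" "))
print(dropped.hex(" "))
```

With both halves present the result is the expected four UTF-8 bytes; with only the high surrogate it is `ef bf bd`, the UTF-8 encoding of U+FFFD, which matches the symptom in the issue title.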
Now tracking internally as MSFT:23541483
The fix for this just went out with WSL in insiders’ build 19002!
@DHowett - Wow you beat me to it, was planning on looping back here to close this :)
@benhillis y'can't beat a guy on the train to the airport with nothing to do except check twitter! 😁
Environment
Windows build number: 10.0.18362.356
Windows Terminal version: 0.4.2382.0
Also reproducible with Alacritty on alacritty/alacritty#2438 with its ConPTY backend (enable_experimental_conpty_backend: true), and not with its WinPTY backend.
Steps to reproduce
Type U+1F600. (Ways of achieving this are discussed below.)
Expected behavior
U+1F600, 😀, should be the input.
Actual behavior
U+FFFD, REPLACEMENT CHARACTER, gets sent through instead.
This doesn’t affect most characters. This is the only such character that I regularly use that is affected. U+1F600 and U+1F700 are two examples of affected characters, while U+1F5FF and U+1F601 are examples of characters that are not affected.
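A quick way to compare the affected and unaffected code points is to look at their UTF-16 surrogate pairs. The sketch below uses the standard UTF-16 encoding arithmetic; `surrogate_pair` is a helper written for this example and is not part of any Windows or WSL API.

```python
def surrogate_pair(cp):
    """Split a supplementary-plane code point into (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

samples = (
    ("affected", (0x1F600, 0x1F700)),
    ("not affected", (0x1F5FF, 0x1F601)),
)

for label, code_points in samples:
    for cp in code_points:
        high, low = surrogate_pair(cp)
        print(f"U+{cp:X} ({label}): high=0x{high:04X} low=0x{low:04X}")
```

All four share the high surrogate 0xD83D. In this small sample, the two affected code points have low surrogates 0xDE00 and 0xDF00, both ending in a zero byte, while the unaffected ones have 0xDDFF and 0xDE01. That is only an observation from four data points, not a confirmed explanation.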
How to reproduce if you’re not sure
If you don’t have an IME that lets you insert such characters, you can install WinCompose and type Compose 1f600 Enter, or put a line like this into %USERPROFILE%\.XCompose:
… which will allow you to use Compose:D.
To inspect what’s been sent, I like to use Vim with its ga command.