Extra spacing after emoji variants #1978
Comments
First off, don't use them in a shell. Shells are full of Unicode handling bugs. Instead, run cat and see if the issue reproduces there; if so, let me know and I will take a look. |
I copied the unicode sequence from here: https://emojipedia.org/rainbow-flag/ First cat is just that emoji in a text file ten times. nospace.txt:
with-space.txt:
|
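For anyone wanting to reproduce the two files, here is a minimal sketch. It assumes the rainbow flag sequence given in the issue text (U+1F3F3 U+FE0F U+200D U+1F308), ten repetitions, and a single ASCII space as the separator in with-space.txt; the original file contents are not preserved in this thread, and `write_repro_files` is a hypothetical helper name.

```python
from pathlib import Path

# Rainbow flag ZWJ sequence: white flag, variation selector-16, ZWJ, rainbow.
RAINBOW_FLAG = "\U0001F3F3\uFE0F\u200D\U0001F308"


def write_repro_files(directory, count=10):
    """Write nospace.txt and with-space.txt into `directory`.

    The exact layout of the original files is an assumption: `count`
    flags on one line, either back to back or separated by one space.
    """
    directory = Path(directory)
    nospace = directory / "nospace.txt"
    with_space = directory / "with-space.txt"
    nospace.write_text(RAINBOW_FLAG * count + "\n", encoding="utf-8")
    with_space.write_text(" ".join([RAINBOW_FLAG] * count) + "\n", encoding="utf-8")
    return nospace, with_space
```

Calling `write_repro_files(".")` and then running cat on each file (outside of any shell line editor) should show whether the spacing differs between the two.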
I think that the emoji are making this problem seem harder than it is. It is at its core an error in ligature handling. Consider Nimbus Mono PS, or my font TT2020 Style B (which is how I first knew about this problem): (Note: This will only work for Nimbus Mono PS post–d250555, because I removed the specific processing for it now that Kitty renders ligatures across multiple cells according to the |
There is no wcswidth of the completed form, since in general the |
Indeed, I agree fully. My "fix" would be just better spacing—the extra space wouldn't look so bad if it was just factored into the whole word. We already have ability to Thought? |
I don't think that's possible. In kitty character images per cell (or |
Could not an extra step be added right at the end, to look for badly spaced ligatures, and fix them? |
There is no end. Rendering works by mapping each cell to a number. That |
Idea for you @kovidgoyal, and I'm sorry, it might be a stupid one. Maybe so stupid you don't even want to reply, which I will understand and not hold against you. But in case you haven't thought of this already: The OSWindow contains one or more Windows, which are made of a Screen, which contains cells. Each cell is literally a quad built out of an array of OpenGL vertices, known internally as Let's say we have the word affiliate, and the following sprites: «a f_f_i i l t e». Laid out it would look something like:
We know that when we rendered But why we can't bump the vertices around to make things look a little nicer, when cursor not over text? (In formal terms, assuming a quad of four vertices Q0…Q3, add a bump factor b to the x axis of all four vertices.) As I see it there are two possible ways to bump the vertices: either you bump them to remove the space where it is (“bump style A”), and the bump is always expressible in the form b × cell_width, or you bump cells according to a formula (“bump style B”). Assuming bump style A, the layout appears as:
Cells 0 and 1 go unbumped, cells 2 and 3 are bumped to the end, every other cell until the end of the word is bumped backwards one cell. Bump style B is not expressible in monospace terms, because it works by breaking the grid; cells are bumped in fractions of a cell. It assumes bump style A has already been applied. So, the absolute value of So all cells which were not bumped to the end need to be bumped one cell:
We can, and ought to I think, also include cell 9 in our calculation; after all, it is visible space. So, really, S=3 and we need to bump each cell one and one half cells. Is this overkill and am I overthinking it? Could very well be. But if cells can be nudged around, I don't see why it can't work. |
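The arithmetic of the proposal above can be sketched as follows. This only illustrates "add a bump factor b to the x axis of all four vertices"; it is not kitty's renderer (the reply that follows explains that kitty has no per-cell vertex data), and `cell_quad` and `bump_quad` are hypothetical names with an assumed top-left origin.

```python
def cell_quad(col, row, cell_width, cell_height):
    """Return the four corner vertices Q0..Q3 of a grid cell as (x, y)."""
    x0 = col * cell_width
    y0 = row * cell_height
    return [
        (x0, y0),                             # Q0: top left
        (x0 + cell_width, y0),                # Q1: top right
        (x0 + cell_width, y0 + cell_height),  # Q2: bottom right
        (x0, y0 + cell_height),               # Q3: bottom left
    ]


def bump_quad(quad, b, cell_width):
    """Shift every vertex of `quad` by b * cell_width along the x axis.

    Whole-number b stays on the grid (bump style A); fractional b
    breaks the grid (bump style B).
    """
    dx = b * cell_width
    return [(x + dx, y) for (x, y) in quad]
```

For example, `bump_quad(cell_quad(4, 0, 10, 20), -1, 10)` yields the same vertices as `cell_quad(3, 0, 10, 20)`, i.e. the cell moved back by exactly one column.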
Rendering happens per cell, there is no vertex data for a cell. The You could of course have a separate array to displace cells that you |
I think this should be closed then. Improved spacing is really the best solution to this problem. |
The general problem of ligatures certainly. This particular issue, however, might be solved by improving wcswidth calculations for flags. I have to look into it someday when I have time. |
No @kovidgoyal, not without making the flag itself have a widechar width of 0, but 🏳 on its own has a widechar width of 2. Using the word "affiliate" is just an easier way to think about the general problem because all its components have a widechar width of 1, instead of 2-0-0-2 as in the rainbow flag ZWJ sequence. |
Yes, the idea would be for wcswidth to know about flag sequences. I don't know how feasible that is; I will have to see. |
It's not feasible if wcswidth(S) must always equal sum(wcwidth(C) for C in S); changing that will certainly break applications...wcwidth itself was always a hack :-) |
wcswidth is definitely not always equal to sum(wcwidth()); if it were, then emoji variation selectors would not work. See the kitty implementation of wcswidth() in screen.c. The only reason for wcswidth to exist at all is that it is not in general equal to sum(wcwidth()) |
Sorry to say that if your internal implementation does not match glibc's (which works the way I explained), the mismatch will lead to subtle rendering errors (unless Kitty somehow forces client applications to use its internal wcswidth?). But anyway, most programmers familiar with the glibc implementation consider them always equal and wcswidth to be a mere convenience function, so I don't understand how this can work. |
And yet it does. Pretty much all advanced terminal applications use their own wcswidth() implementations, precisely because glibc's is a broken umm POS. glibc is not the canonical source for how to calculate widths; the Unicode standard is. And kitty's wc(s)width is autogenerated from the Unicode standard. Indeed, using the system libc's wcwidth() is fundamentally a bad idea because it can be arbitrarily old and broken. Not to mention it can vary between systems when you ssh. Any serious terminal application needs to use a standards-based implementation. This has all been discussed before, ad nauseam; search this issue tracker for wcwidth |
Interesting :-) Well in that case, yes, to my knowledge all emoji ZWJ sequences have a visual wcswidth of 2, but a glibc wcswidth of 4 or more. I'm sorry to waste your time with repeated discussions |
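The difference between the two width calculations discussed above can be sketched like this. The per-code-point widths are hard-coded from the 2-0-0-2 figures quoted in this thread rather than taken from any real wcwidth table, and `zwj_aware_width` is a hypothetical simplification (the code point after a ZWJ contributes no extra width), not kitty's or glibc's actual algorithm.

```python
ZWJ = "\u200d"
VS16 = "\ufe0f"

# Widths as quoted in the thread for the rainbow flag sequence (2-0-0-2).
# Hard-coded for illustration only; not a real wcwidth table.
THREAD_WIDTHS = {"\U0001F3F3": 2, VS16: 0, ZWJ: 0, "\U0001F308": 2}

RAINBOW_FLAG = "\U0001F3F3" + VS16 + ZWJ + "\U0001F308"


def naive_width(text):
    """Sum of per-code-point widths; unknown code points count as 1."""
    return sum(THREAD_WIDTHS.get(ch, 1) for ch in text)


def zwj_aware_width(text):
    """Like naive_width, but a code point joined by a preceding ZWJ adds 0."""
    total = 0
    joined = False
    for ch in text:
        if not joined:
            total += THREAD_WIDTHS.get(ch, 1)
        joined = ch == ZWJ
    return total
```

Under these assumptions `naive_width(RAINBOW_FLAG)` is 4 while `zwj_aware_width(RAINBOW_FLAG)` is 2, which is the mismatch that shows up as extra space after the flag.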
No worries, and yes looking into modifying wcswidth for ZWJ/flags is why this issue remains open. |
Hi - is this the same issue I'm seeing when I do For some reason extra spaces are added after the sun emojis. I've tried this in a few other terminals (gnome-terminal, alacritty, st) and they all render this as expected. Also tried multiple different emoji fonts with the same result. This only seems to happen in kitty. |
@jsravn: No, there is no zero width joiner or flags in that output. The reason you're seeing extra spaces after the sun emojis is that the output actually contains extra spaces. This is probably done because most other terminals wrongly consider the sun emoji to be only one character wide, so that service has added spaces until it aligns in those terminals. You can verify this by running this command:
The sun emoji is the two escaped characters, and as you see, I've added a space between the emoji and the text. In kitty this renders as sun-emoji, space, text, which is the correct rendering. In all other terminals I've tried, except qterminal, you don't see the space. If you remove the space and run `echo -e "\u2600\ufe0fSunny"`, you can see that the S appears on top of the sun emoji in the terminals you mention. |
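To check what a piece of output actually contains, one option is to list its code points. The sketch below uses Python's standard `unicodedata` module; `describe` and `contains_zwj` are hypothetical helper names.

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in `text`."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


def contains_zwj(text):
    """True if `text` contains a ZERO WIDTH JOINER (U+200D)."""
    return "\u200d" in text


SUN = "\u2600\ufe0f"  # the two escaped characters from the echo command
```

Here `describe(SUN)` lists BLACK SUN WITH RAYS followed by VARIATION SELECTOR-16, and `contains_zwj(SUN)` is false, consistent with the point that no zero width joiner is involved in that output.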
I don't understand anything of the discussion about the low-level rendering implementation, so I won't tell you how to do it, but I would just say it's possible to implement ZWJ support correctly; examples are Visual Studio Code and Firefox. I'm working on Unicode research. Writing code full of weird Unicode sequences in VS Code is not a problem, but as soon as I need to copy stuff into a language interpreter (in the terminal with kitty) it ends up being a nightmare, e.g. when pasting 👩❤️👨 (screenshots: Kitty, Alacritty, Foot, Hyper). It's also the same for Konsole, Tabby, Qterminal, Darktile, Extraterm etc. I found the reason why most TEs fail to implement ZWJ support even when they claim full Unicode support: https://github.com/nmeum/saneterm#motivation
saneterm was the only TE that rendered graphemes with ZWJ correctly, but it is just a PoC and not really a daily usable TE. So in the end maybe a violent breaking change is required to take a similar approach, or this issue should be closed because it will never be possible to implement this in an escape sequence based terminal. |
This problem is not caused by the terminal emulator, but by the shell you're using. Run e.g. cat first and paste the emoji and you'll see that it's rendered correctly, but has some extra spacing after it. That extra spacing is what this issue is about. A terminal emulator that does implement ZWJ sequences correctly and uses the correct width is foot.
The issue described there is not why kitty uses the wrong width for ZWJ sequences. What is described there also applies for emojis using variant selectors, and kitty supports that.
I don't think that is required, but the terminal emulator and the terminal application have to agree on the display widths of grapheme clusters, which there currently isn't any way to do. What this issue is about (extra spacing) is possible to fix without that though, but it would make the problem worse for TUIs that don't use the correct display widths (which is most). |
It's what I feared, my previous message was offtopic.
Do you know a shell with decent Unicode support? Even a non-POSIX one. |
I've only tried zsh, bash and briefly fish. Of those, only zsh has the problem you describe. In both bash and fish 👩❤️👨 is rendered correctly, but you may encounter some other problems. In bash you get problems when it tries to change the background color (which happens on paste) and if you move the cursor before the emoji. With fish in kitty it seems to work fine, since fish seems to use the same widths as kitty. You get problems in foot though, because that considers the emoji to be 2 characters wide rather than 6. |
While this screenshot illustrates that 👩❤️👨 can be displayed in the foot terminal, and by the fish and bash shells but not by zsh, there is still an issue with kitty rendering ZWJ sequences (beyond the extra spacing). Kitty can display the ZWJ sequence, but as soon as you hit the spacebar or Enter key the grapheme is decomposed into its code points. The following video was recorded with bash, and there is a comparison with foot. ZJW-kitty.webm |
@noraj, What you see there is what your shell is making out of this paste. When you paste a byte sequence into your terminal, this will be fed into your terminal application's standard input, which is ZSH in your case. And ZSH cannot handle grapheme clusters, but tries to be clever by rendering ZWJ / unknown or unprintable (to ZSH's knowledge) Unicode codepoints in the form of On the other hand, most likely they won't fix it, because most terminals themselves can't handle grapheme clusters. So far I only know one (you mentioned |
As I put in my previous message the video is recorded with bash not zsh. |
No, this is one of the issues I mentioned you'll encounter with bash. Try with
Nowadays, several terminal emulators more or less implement this. At least kitty, foot, wezterm, Konsole and as you mentioned contour. |
@noraj i apologize for not being specific. i was referring to your comment here: #1978 (comment) :) |
Ok @trygveaa, I apologize once more for not having understood the issue correctly.
@christianparpart They have no issue tracker, just an email address, I got an answer but I feel the person didn't understand the issue.
There may be more; all my examples may have been wrong, because the issue was zsh and not the TEs themselves. |
Variant selectors are handled correctly in kitty. As for ZWJ, that is on my TODO list, see #3810 |
I ran across this with the "rainbow flag" emoji, which is in hex =>
\u1f3f3\ufe0f\u200d\u1f308
, or "waving white flag", "variant selector", "zero-width joiner" and "rainbow". However I input it, either by copy/pasting it or using the ctrl-shift-u kitty unicode input, it always renders with extra space afterwards. If I try to print several of them in a row, it prints only a few of them, with really wide spacing. If I add an ASCII space between each, then it prints all of them, and closer together.
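One note for anyone trying the sequence in code: the hex notation above is shorthand, not a literal escape. In Python (and in C-style escapes generally) `\u` takes exactly four hex digits, so code points above U+FFFF need the eight-digit `\U` form. A minimal sketch, with `from_hex` as a hypothetical helper:

```python
def from_hex(code_points):
    """Build a string from an iterable of hexadecimal code point strings."""
    return "".join(chr(int(cp, 16)) for cp in code_points)


# The sequence from the issue: white flag, VS16, ZWJ, rainbow.
RAINBOW_FLAG = from_hex(["1f3f3", "fe0f", "200d", "1f308"])

# The same string written with escapes: note \U0001F3F3, not \u1f3f3.
RAINBOW_FLAG_ESCAPED = "\U0001F3F3\uFE0F\u200D\U0001F308"

# Pitfall: "\u1f3f3" is U+1F3F followed by the ASCII digit "3".
PITFALL = "\u1f3f3"
```

`RAINBOW_FLAG` and `RAINBOW_FLAG_ESCAPED` are the same four code points, while `PITFALL` is two unrelated characters.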