-
-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace go-runewidth with uniseg #73
Comments
Pull request #71 addresses this issue. As mentioned in the pull request, there are performance regressions we need to look into. This may be a case of increased complexity means more CPU cycles and we may never be able to achieve the original performance, however this needs to be thoroughly investigated before we accept these performance reductions. |
@muesli Any guidance you can provide on how we should tackle the breaking nature of these changes? @rivo Seeking any help you can provide in helping us track down the performance issues. @meowgorithm This will have a ripple effect on many of the other |
I believe all the work for this task is complete and we just need some review of the pull request. Hoping @muesli will be able to provide his feedback to help move this work closer to being merged. |
@muesli Any further thoughts regarding this? All direct references to Heads up for @meowgorithm that we are making serious progress in dealing with emoji and unicode alignment issues. Nearly there 🤞 |
Nearly there indeed! I'm currently testing your changes with a bunch of "historically" problematic (edge) cases and in various terminals. Thanks for the all the effort you put into this. |
Is there anything I can do to help? I realise there is still a lot more work to do to get the Charm suite of tools working better with Unicode. I feel this is only the beginning of the process and know I likely have to move onto many more repositories to do changes for. Feel free to allocate any of the "wood chopping" work. Even if it is just adding a ton of test cases, I am willing to step in and pick up some of the boring but necessary work. |
Regarding your edge cases with terminals, have you seen these 2 blog entries? https://www.jeffquast.com/post/ucs-detect-test-results/ This all leads back to this repository: The results of terminals are listed here: You can test to the results directly against the terminal with this: Unfortunately, it has become obvious that while many libraries are doing a great job of evaluating printable widths, the terminals themselves have been causing issues as well. Hopefully the results of |
Looks like @jquast reinvented grapheme clustering in his "Specification". Unfortunately, he's also missing some things. For example, U+FE0E (VS15) changes the width of an emoji from 2 to 1: From what I can see, his algorithm would return 2 for both of these characters. I don't blame him, though. Even Chrome (at least on macOS) gets this wrong, Firefox renders it correctly. iTerm2 (which I used for this screenshot) renders the emojis correctly but also assigns a width of 2 to both of them. As you noted, and as evidenced by his study results, there are lots of variations among terminals. This is the result of everyone inventing their own algorithm or simply copying such an invention (many of them the original I had a discussion about the example above elsewhere with the authors of the VS Code editor:
They find my suggestions "reasonable" but then still refuse to implement them. Unless the Unicode team publishes a formal specification for character widths, we will always have this messy and, shall I say, ugly landscape. |
Hello @rivo, thanks for attention at this matter. You are right, that I should return a final width of 1 for I will add this to the specification and add a VS-15 test to the ucs-detect tool. Thank you. |
Closing this issue as this problem has been resolved for the Charm suite of tools by the creation of the new |
There are a number of issues with the
go-runewidth
package that makes it problematic when used with emojis. Working with runes is a flawed approach as explained by the author ofuniseg
:All references to
RuneWidth
andStringWidth
need to be replaced to improve the width calculations when dealing with emojis.It is trivial to replace
StringWidth
fromgo-runewidth
touniseg
as they provide the same input and output. There is also a performance increase of 4x migrating to using theuniseg
version so there are some very clearly benefits here.The complexity lies with what to do about the
RuneWidth
function. The right choice will be stop thinking in runes and switch over to grapheme clusters. This will require logic changes to any function using runes.From an acceptance criteria perspective, the goals should be:
go-runewidth
as a dependency.We can expect different results for some of the calculations but these should be considered improvements and bug fixes. It is likely this will cause breaking changes to applications that depend on this package. This may warrant a major version bump to reflect that. We should also make it clear in the documentation that the calculations involving some emojis will have changed.
The text was updated successfully, but these errors were encountered: