UTF-16 text interpretation #74
datasynergyuk started this conversation in General
Replies: 1 comment
-
That's because UTF-16 support is still experimental and unfinished, in places not even started. The same goes for every other wide encoding as well.
-
Thank you for fixing the display of non-Latin characters encoded in UTF-16.
There is still something odd about the way this text is displayed. I know you are already aware of this and that it is a work in progress, so this post is just to describe the problem and provide a test case.
I have attached a test file called "Multilanguage.txt". This contains a mixture of text in different languages and scripts:
The first text is in Chinese and consists of approximately 14 symbols over 30 hex bytes (two rows of standard hex). In WinHex this is displayed correctly on two text rows but in the current HexCtrl all symbols are displayed on the first text row. The Chinese characters on the second hex row are rendered on the row above. This is shown in red below.
The second text is in Russian and consists of approximately 30 symbols over about 60 hex bytes. In WinHex, this is shown correctly starting at offset 0x28 and running over 4.5 lines of text. However, in HexCtrl, this is shown starting at offset 0x15 and running over about 2 lines of text. This starting point is actually still in the previous Chinese text. This pattern repeats with each successive language.
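To make the expected layout concrete, here is a minimal sketch of how a byte offset would map to a text row and text cell. It assumes 16 bytes per hex row and 2 bytes per UTF-16 code unit; `TextCell` and `CellForOffset` are hypothetical names used for illustration, not part of HexCtrl.

```cpp
#include <cstddef>

// Assumptions (not taken from HexCtrl): 16 bytes per hex row,
// 2 bytes per UTF-16 code unit, so at most 8 text cells per row.
constexpr std::size_t kBytesPerRow = 16;
constexpr std::size_t kBytesPerUnit = 2;

struct TextCell {
    std::size_t row;   // zero-based hex row containing the offset
    std::size_t cell;  // zero-based text cell within that row (0..7)
    bool aligned;      // true if the offset starts on a code-unit boundary
};

// Hypothetical helper: map a byte offset to the text cell that should display it.
inline TextCell CellForOffset(std::size_t offset)
{
    const std::size_t inRow = offset % kBytesPerRow;
    return { offset / kBytesPerRow, inRow / kBytesPerUnit, inRow % kBytesPerUnit == 0 };
}
```

Under these assumptions, offset 0x28 falls on row 2, cell 4, on a code-unit boundary, whereas offset 0x15 falls on row 1 in the middle of a code unit, which would be consistent with the text not being placed by byte offset.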
I'm not sure why this happens. It may be due to the variable-length encoding, but I suspect something more basic: perhaps HexCtrl still assumes there are 16 text symbols per text row, or it interprets the text as one continuous string rather than as individual characters. Maybe the solution would be to decode each text character in isolation in UTF-16 mode?
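As a rough illustration of what "decode each character in isolation" could look like, here is a minimal sketch that turns the bytes of one hex row into one display character per 2-byte cell. It assumes little-endian UTF-16 and is not how HexCtrl is actually implemented; `DecodeRowUtf16LE` is a hypothetical name. As a simplification, surrogate code units and control characters are shown as a `.` placeholder rather than being combined into pairs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of HexCtrl: decode the bytes of ONE hex row
// as UTF-16LE, producing exactly one display character per 2-byte cell.
// Because each row is decoded on its own, characters can never spill onto
// the row above or below.
std::vector<char32_t> DecodeRowUtf16LE(const std::uint8_t* data, std::size_t size)
{
    constexpr char32_t kPlaceholder = U'.';
    std::vector<char32_t> cells;
    cells.reserve(size / 2);

    // Step through the row two bytes at a time; a trailing odd byte is ignored.
    for (std::size_t i = 0; i + 1 < size; i += 2) {
        // Little-endian: low byte first, high byte second.
        const char32_t unit = static_cast<char32_t>(data[i] | (data[i + 1] << 8));

        // Simplification: surrogates and control characters become a placeholder.
        const bool isSurrogate = unit >= 0xD800 && unit <= 0xDFFF;
        const bool isControl = unit < 0x20 || (unit >= 0x7F && unit < 0xA0);
        cells.push_back((isSurrogate || isControl) ? kPlaceholder : unit);
    }
    return cells;
}
```

With a 16-byte row this yields at most 8 cells, so the text pane would stay aligned with the hex pane regardless of which script the bytes encode.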
Multilanguage.txt