UTF-16 text interpretation #74

datasynergyuk · 2021-11-24T09:50:18Z

Thank you for fixing the display of non-Latin characters encoded in UTF-16.

There is still something odd about the way this text is displayed. I know you are already aware of this and that it is a work in progress, so this post is just to describe the problem and provide a test case.

I have attached a test file called "Multilanguage.txt". This contains a mixture of text in different languages and scripts:

The first text is in Chinese and consists of approximately 14 symbols over 30 hex bytes (two rows of standard hex). In WinHex this is displayed correctly on two text rows but in the current HexCtrl all symbols are displayed on the first text row. The Chinese characters on the second hex row are rendered on the row above. This is shown in red below.

The second text is in Russian and consists of approximately 30 symbols over about 60 hex bytes. In WinHex, this is shown correctly starting at offset 0x28 and running over 4.5 lines of text. However, in HexCtrl, this is shown starting at offset 0x15 and running over about 2 lines of text. This starting point is actually still in the previous Chinese text. This pattern repeats with each successive language.

I'm not sure why this happens. It may be related to the variable-length encoding, but I suspect the cause is something more basic: HexCtrl may still assume there are 16 text symbols per text row, or it may be interpreting the text as one string rather than as individual characters. Maybe the solution would be to decode each text character in isolation in UTF-16 mode?
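To make the "decode each character in isolation" idea concrete, here is a minimal sketch in Python rather than HexCtrl's C++. The function name `utf16le_rows`, the 16-byte row width, the little-endian byte order, and the `.` placeholder are all assumptions for illustration, not HexCtrl's actual code. Each 2-byte code unit is decoded on its own and placed on the row where its bytes start, so a 16-byte row holds at most 8 symbols.

```python
import struct
from typing import List

BYTES_PER_ROW = 16  # assumed width of one hex row


def utf16le_rows(data: bytes, bytes_per_row: int = BYTES_PER_ROW) -> List[str]:
    """Return one text string per hex row, decoding UTF-16LE unit by unit."""
    row_count = (len(data) + bytes_per_row - 1) // bytes_per_row
    rows: List[List[str]] = [[] for _ in range(row_count)]
    offset = 0
    while offset + 1 < len(data):  # need at least two bytes for a code unit
        (unit,) = struct.unpack_from("<H", data, offset)
        size = 2
        char = "."  # placeholder for control codes and lone surrogates
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: only valid when followed by a low surrogate.
            if offset + 3 < len(data):
                (low,) = struct.unpack_from("<H", data, offset + 2)
                if 0xDC00 <= low <= 0xDFFF:
                    code = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
                    char = chr(code)
                    size = 4
        elif 0xDC00 <= unit <= 0xDFFF:
            pass  # lone low surrogate keeps the placeholder
        elif unit >= 0x20 and unit != 0x7F:
            char = chr(unit)
        # The symbol belongs to the row that contains its first byte.
        rows[offset // bytes_per_row].append(char)
        offset += size
    return ["".join(row) for row in rows]


if __name__ == "__main__":
    sample = ("汉字" * 7 + "Привет").encode("utf-16-le")
    for index, text in enumerate(utf16le_rows(sample)):
        print(f"{index * BYTES_PER_ROW:08X}  {text}")
```

In this sketch the 14 Chinese characters spread over two rows instead of collapsing onto the first one, and the Russian text starts at the byte offset where its first code unit actually sits. A trailing odd byte is ignored.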

Multilanguage.txt

jovibor · 2021-11-26T01:37:45Z

Maintainer

That's because it's all experimental and unfinished, not even properly started. The same goes for absolutely any other wide encoding as well.
Take it as just a simple addition, nothing more.
