META - UTF-8 and BiDi support for the various languages #19

Open
9 of 16 tasks
XVilka opened this issue Feb 3, 2015 · 32 comments

Comments

@XVilka
Contributor

XVilka commented Feb 3, 2015

@XVilka XVilka self-assigned this Feb 3, 2015
@XVilka
Contributor Author

XVilka commented Apr 11, 2015

For the RTL part: Arabic filenames and comments are supported; only the text direction is wrong.

@schrotthaufen

schrotthaufen commented Apr 16, 2015

#include <stdio.h>

int main(void) {
        /* Saved as UTF-8, this literal is 13 bytes long, including the '\n'. */
        printf("ÄÖæxßf®\n");
        return 0;
}

Actual result:
[0x00400410]> fs strings
[0x00400410]> f
0x004005a4 13 str.____x__f

Expected result:
[0x00400410]> fs strings
[0x00400410]> f
0x004005a4 13 str.ÄÖæxßf®\n

@radare

radare commented Apr 16, 2015

Flags can't contain strange chars, and by strange I mean non-7-bit-ASCII.

That's how r_name_filter() works.

Not going to fix/change this before the release. Flags need to be rewritten to use sdb.
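To illustrate the kind of filtering radare describes, here is a minimal C sketch in the same spirit. It is not r2's actual r_name_filter(): the name flag_name_filter() and the allowed character set (ASCII letters, digits, '.' and '_') are assumptions, and the real function clearly differs in detail, given the `str.____x__f` result reported above.

```c
/* Hypothetical sketch, NOT radare2's r_name_filter(): the allowed set
 * (ASCII letters, digits, '.' and '_') is an assumption. */
static int is_flag_char(unsigned char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '.' || c == '_';
}

/* Replaces, in place, every byte outside the allowed 7-bit ASCII set with
 * '_'. Each byte of a multi-byte UTF-8 character is replaced separately. */
void flag_name_filter(char *name) {
    for (unsigned char *p = (unsigned char *)name; *p != '\0'; p++) {
        if (!is_flag_char(*p)) {
            *p = '_';
        }
    }
}
```

For example, `str.Äx` (UTF-8, where `Ä` is the two bytes C3 84) becomes `str.__x`: a byte-oriented filter like this cannot keep non-ASCII names, so Unicode flag names require changing the filter itself.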


@XVilka
Contributor Author

XVilka commented Apr 16, 2015

@radare updated the bug, thanks!

@XVilka
Contributor Author

XVilka commented Apr 18, 2015

Btw, Arabic comments work in mlterm.

@XVilka XVilka changed the title from "Support for the Asian languages (except RTL ones)" to "Support for the various languages (except RTL ones)" Jun 27, 2015
@XVilka XVilka changed the title from "Support for the various languages (except RTL ones)" to "Support for the various languages" Sep 27, 2015
@holdsworth

With Hebrew in the xterm terminal, comments are displayed, but their characters come out in reverse order. This is a general xterm problem: the first character is drawn on the left and each following one to its right, i.e. LTR instead of RTL, which reverses the text.

Example:

  1. Proper ("this is a comment"):
    זאת הערה
  2. How it is actually displayed:
    הרעה תאז

As stated in xterm's README.i18n (https://github.com/joejulian/xterm/blob/master/README.i18n):
"Xterm does not support bi-directional or RTL languages such as Hebrew
and Arab."

After a bit of research, I am looking into another console that might support RTL languages; it would be more correct to run r2 under that console.

@XVilka
Contributor Author

XVilka commented Sep 28, 2015

@holdsworth try mlterm

@holdsworth

@XVilka mlterm doesn't display Hebrew at all, while in Konsole it works perfectly fine for some reason.

@radare

radare commented Nov 2, 2016

please update the checkboxes

@holdsworth

please be more precise

@radare

radare commented Jul 9, 2017

cc @kazarmy

@Maijin Maijin changed the title from "Support for the various languages" to "META - UTF-8 / Support for the various languages" Sep 10, 2017
@Maijin

Maijin commented Sep 10, 2017

CC @queenp

@kazarmy

kazarmy commented Sep 10, 2017

I've run r2 in a (Linux) Emacs shell and it works fine for RTL and Arabic shaping (visual mode is unusable though). For consoles that don't support RTL, implementing a full-fledged Unicode Bidirectional Algorithm in r2 appears to require humongous and probably-not-worth-it effort but a simple algorithm that reverses Arabic and Hebrew character sequences shouldn't be hard to do. An Arabic shaping algorithm shouldn't be hard to do either.
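kazarmy's "simple algorithm that reverses Arabic and Hebrew character sequences" could look roughly like the C sketch below. Assumptions: only the Hebrew and Arabic Unicode blocks count as RTL, spaces between RTL letters belong to the run, and reverse_rtl_runs() and utf8_next() are hypothetical names; inside r2 the existing helpers in libr/util/utf8.c would replace the local decoder. It does no shaping, does not keep combining marks (e.g. Hebrew points) after their base letters, and also reverses digits inside a run, all of which a full Unicode Bidirectional Algorithm would handle.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: only Hebrew (U+0590..U+05FF) and Arabic (U+0600..U+06FF)
 * count as RTL here; real bidi classification covers far more. */
static int is_rtl(uint32_t cp) {
    return cp >= 0x0590 && cp <= 0x06FF;
}

/* Minimal UTF-8 decoder (no overlong/surrogate checks). An invalid or
 * truncated sequence is consumed as one byte and reported as U+FFFD. */
static size_t utf8_next(const unsigned char *s, size_t len, uint32_t *cp) {
    size_t n;
    uint32_t c;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; }
    else { *cp = 0xFFFD; return 1; }
    if (n > len) { *cp = 0xFFFD; return 1; }
    for (size_t k = 1; k < n; k++) {
        if ((s[k] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
        c = (c << 6) | (s[k] & 0x3F);
    }
    *cp = c;
    return n;
}

/* Copies `in` into a separate buffer `out` (at least strlen(in) + 1 bytes),
 * reversing each run of RTL characters character by character so that an
 * LTR-only terminal draws the run right to left. */
void reverse_rtl_runs(const char *in, char *out) {
    const unsigned char *s = (const unsigned char *)in;
    size_t len = strlen(in), i = 0, o = 0;
    while (i < len) {
        uint32_t cp;
        size_t n = utf8_next(s + i, len - i, &cp);
        if (!is_rtl(cp)) {
            /* Non-RTL (or invalid) bytes are copied unchanged. */
            memcpy(out + o, in + i, n);
            o += n;
            i += n;
            continue;
        }
        /* Extend the run over RTL characters and spaces; it ends right
         * after the last RTL character, so trailing spaces stay put. */
        size_t j = i, run_end = i;
        while (j < len) {
            size_t m = utf8_next(s + j, len - j, &cp);
            if (is_rtl(cp)) { j += m; run_end = j; }
            else if (cp == ' ') { j += m; }
            else { break; }
        }
        /* Emit the run's characters last to first, keeping each
         * character's own bytes in their original order. */
        size_t p = run_end;
        while (p > i) {
            size_t q = p - 1;
            while (q > i && (s[q] & 0xC0) == 0x80) {
                q--; /* step back over UTF-8 continuation bytes */
            }
            memcpy(out + o, in + q, p - q);
            o += p - q;
            p = q;
        }
        i = run_end;
    }
    out[o] = '\0';
}
```

With holdsworth's Hebrew example, the logical string זאת הערה is stored as הרעה תאז after this pass, which xterm then draws so that it reads correctly from right to left.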

@XVilka
Contributor Author

XVilka commented Sep 12, 2017

Maybe we could have an option to use FriBiDi somehow for bidirectional text. But first, all visual modes should be fixed to work properly with Unicode and RTL characters.

@kazarmy

kazarmy commented Sep 12, 2017

Maybe we could have an option to use FriBiDi somehow for bidirectional text.

Yep. Output of:

r2 -c 'iz~Arabic:2' -q bins/pe/testapp-msvc64.exe | fribidi --nobreak

appears promising (testapp-msvc64.exe is an r2r binary).

@XVilka
Contributor Author

XVilka commented Dec 29, 2017

We need a better page for bidirectional text support across terminals, like the one we did for TrueColors. That will make it easier to push terminal developers and to track progress.

@XVilka
Contributor Author

XVilka commented Mar 6, 2018

radareorg/radare2#9608 is also related

@XVilka
Contributor Author

XVilka commented May 11, 2018

@Vane11ope can you please try to enable the Unicode reflines and related features in the visual panels code?

@XVilka
Contributor Author

XVilka commented Jul 3, 2018

@cyanpencil @Vane11ope @kazarmy please help review the current state and what still needs to be done.

@cyanpencil

@kazarmy do you by any chance have a binary on which we can test the things you just posted?

Or did you test them by inserting a comment in the disasm?

@kazarmy

kazarmy commented Jul 3, 2018

Try looking at the r2r bins bins/pe/testapp-msvc64.exe, bins/elf/strenc and bins/elf/strenc-guess-utf32le. They don't cover all the things posted though and could be more comprehensive.

The test binaries were produced using string literals (C / C++).

Btw, there are some UTF utility functions in libr/util/utf8.c / utf16.c / utf32.c. Better not to have duplicate code.

I see you did:

  • Support for fullwidth characters in graph mode

Thanks!
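Supporting fullwidth characters in a boxed layout such as graph mode generally means measuring text in terminal columns rather than bytes or code points, so that boxes and edges line up. A minimal C sketch of that measurement follows; utf8_width(), cp_width() and utf8_next() are hypothetical names, and the wide ranges are a partial list modelled on Markus Kuhn's wcwidth implementation, not the complete Unicode East Asian Width table. Zero-width combining marks and control characters are not handled. POSIX wcwidth() is the standard alternative, but its result depends on the current locale.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Partial list of East Asian Wide/Fullwidth ranges (modelled on Markus
 * Kuhn's wcwidth); everything else is assumed to take one column. */
static int cp_width(uint32_t cp) {
    if ((cp >= 0x1100 && cp <= 0x115F) ||                 /* Hangul Jamo leading consonants */
        (cp >= 0x2E80 && cp <= 0xA4CF && cp != 0x303F) || /* CJK radicals .. Yi */
        (cp >= 0xAC00 && cp <= 0xD7A3) ||                 /* Hangul syllables */
        (cp >= 0xF900 && cp <= 0xFAFF) ||                 /* CJK compatibility ideographs */
        (cp >= 0xFE30 && cp <= 0xFE4F) ||                 /* CJK compatibility forms */
        (cp >= 0xFF00 && cp <= 0xFF60) ||                 /* fullwidth ASCII variants */
        (cp >= 0xFFE0 && cp <= 0xFFE6) ||                 /* fullwidth signs */
        (cp >= 0x20000 && cp <= 0x3FFFD)) {               /* CJK extension planes */
        return 2;
    }
    return 1;
}

/* Minimal UTF-8 decoder (no overlong/surrogate checks). An invalid or
 * truncated sequence is consumed as one byte and reported as U+FFFD. */
static size_t utf8_next(const unsigned char *s, size_t len, uint32_t *cp) {
    size_t n;
    uint32_t c;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; }
    else { *cp = 0xFFFD; return 1; }
    if (n > len) { *cp = 0xFFFD; return 1; }
    for (size_t k = 1; k < n; k++) {
        if ((s[k] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
        c = (c << 6) | (s[k] & 0x3F);
    }
    *cp = c;
    return n;
}

/* Number of terminal columns a UTF-8 string occupies under cp_width(). */
size_t utf8_width(const char *str) {
    const unsigned char *s = (const unsigned char *)str;
    size_t len = strlen(str), i = 0, cols = 0;
    while (i < len) {
        uint32_t cp;
        i += utf8_next(s + i, len - i, &cp);
        cols += (size_t)cp_width(cp);
    }
    return cols;
}
```

For example, `AＡb` (with U+FF21 FULLWIDTH LATIN CAPITAL LETTER A in the middle) is 5 bytes and 3 code points, but 4 columns wide; a box sized by bytes or code points would be misaligned.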

@radareorg radareorg deleted a comment from kazarmy Jul 3, 2018
@Maijin

Maijin commented Jul 3, 2018

Merged @kazarmy's checkbox into the main post.

@cyanpencil
Copy link

Btw, there are some UTF utility functions in libr/util/utf8.c / utf16.c / utf32.c. Better not to have duplicate code.

Unfortunately, I knew that file existed but did not read it thoroughly, and re-implemented the function r_utf8_decode... will fix it in the next PR.

Thanks for the heads up!

@XVilka
Contributor Author

XVilka commented Jul 4, 2018

From what I know, shaping and diacritics will require a dependency on ICU, no less. Those are very complex algorithms and depend heavily on the language.

@kazarmy

kazarmy commented Jul 4, 2018

Actually, at least for Arabic, shaping is not that complex. Remember that in real life, kids have to do the shaping by hand.
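As a rough illustration of kazarmy's point, contextual Arabic shaping can be sketched as choosing, for each letter, one of up to four presentation forms (isolated, final, initial, medial) depending on whether its neighbours connect to it. The table below covers only four letters (ALEF, BEH, LAM, MEEM) with their Unicode Arabic Presentation Forms-B code points; arabic_shape() and the table are hypothetical. A real implementation would need the full joining-type data (Unicode's ArabicShaping.txt), the mandatory LAM-ALEF ligatures, and transparent diacritics that must be skipped when looking for neighbours.

```c
#include <stddef.h>
#include <stdint.h>

enum { ISOLATED, FINAL, INITIAL, MEDIAL };

/* forms[] holds the Presentation Forms-B code points in the order
 * isolated, final, initial, medial; 0 means "no such form". */
struct arabic_letter {
    uint32_t cp;
    uint32_t forms[4];
};

/* Tiny hypothetical subset of the Arabic joining table. */
static const struct arabic_letter letters[] = {
    {0x0627, {0xFE8D, 0xFE8E, 0, 0}},           /* ALEF: joins only to the previous letter */
    {0x0628, {0xFE8F, 0xFE90, 0xFE91, 0xFE92}}, /* BEH: dual-joining */
    {0x0644, {0xFEDD, 0xFEDE, 0xFEDF, 0xFEE0}}, /* LAM: dual-joining */
    {0x0645, {0xFEE1, 0xFEE2, 0xFEE3, 0xFEE4}}, /* MEEM: dual-joining */
};

static const struct arabic_letter *lookup(uint32_t cp) {
    for (size_t i = 0; i < sizeof letters / sizeof letters[0]; i++) {
        if (letters[i].cp == cp) {
            return &letters[i];
        }
    }
    return NULL; /* not an Arabic letter we know: breaks joining */
}

/* A letter can connect to the following one only if it has an initial form. */
static int joins_next(const struct arabic_letter *l) {
    return l != NULL && l->forms[INITIAL] != 0;
}

/* Shapes cps[0..n), given in logical order, in place. */
void arabic_shape(uint32_t *cps, size_t n) {
    const struct arabic_letter *prev = NULL; /* original previous letter */
    for (size_t i = 0; i < n; i++) {
        const struct arabic_letter *cur = lookup(cps[i]);
        /* cps[i + 1] has not been rewritten yet, so this sees the original. */
        const struct arabic_letter *next = i + 1 < n ? lookup(cps[i + 1]) : NULL;
        if (cur != NULL) {
            int from_prev = joins_next(prev);
            int to_next = joins_next(cur) && next != NULL;
            int form = from_prev ? (to_next ? MEDIAL : FINAL)
                                 : (to_next ? INITIAL : ISOLATED);
            cps[i] = cur->forms[form];
        }
        prev = cur;
    }
}
```

Shaping works on the logical order, so it would run before any reordering for display. For example, BEH ALEF BEH (باب) becomes U+FE91 U+FE8E U+FE8F: the first BEH joins forward and takes its initial form, ALEF joins only backward and takes its final form, and the last BEH is isolated because ALEF does not connect to the following letter.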

@radareorg radareorg deleted a comment from alkeryn Jul 13, 2018
@radareorg radareorg deleted a comment from alkeryn Jul 13, 2018
@radareorg radareorg deleted a comment from XVilka Jul 13, 2018
@XVilka
Contributor Author

XVilka commented Dec 21, 2018

For future reference: mintty/mintty#837

@XVilka
Contributor Author

XVilka commented Jan 7, 2019

This library also seems interesting: https://github.com/JuliaStrings/utf8proc

@XVilka
Contributor Author

XVilka commented Jan 9, 2019

See also radareorg/radare2#12629

@XVilka
Contributor Author

XVilka commented Jan 30, 2019

See https://terminal-wg.pages.freedesktop.org/bidi/ for details/proposal about BiDi

@XVilka XVilka changed the title from "META - UTF-8 / Support for the various languages" to "META - UTF-8 and BiDi support for the various languages" Mar 29, 2019
@XVilka
Copy link
Contributor Author

XVilka commented Jul 2, 2019

@XVilka
Copy link
Contributor Author

XVilka commented Oct 12, 2019

Note that, with the release of GNOME 3.34, BiDi support is available in GNOME Terminal out of the box, which makes testing and implementing it in other programs, such as radare2, much easier. FYI @deepakchethan

@ret2libc
Copy link

This issue has been moved from radareorg/radare2 to radareorg/ideas as we are trying to clean our backlog and this issue was probably created a long while ago. This is an effort to help contributors understand which items are actionable, prioritize issues better, and help users find active or duplicated issues more easily. If this is not an enhancement/improvement/general idea but a bug, feel free to ask for it to be transferred back to the main repo. Thanks for your understanding and contribution with this issue.

@ret2libc ret2libc transferred this issue from radareorg/radare2 Jun 23, 2020
8 participants