S with caron rendered as tofu by elinks, rendered correctly by links #249
Comments
A similar problem occurs with the Unicode non-breaking space: it is rendered as tofu by elinks, but not by links.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<head>
<title>Test UTF-8 NBSP</title>
</head>
<body>
<p class="indent">Chapter 8</p>
</body>
</html> |
Did you compile elinks with utf-8 enabled? |
@rkd77 I have never used meson, so am not sure if the build options in
I am not sure if
|
Here is a simple build script for meson:
and cd /dev/shm/builddir && ninja install
It seems the configure script also built the binary with UTF-8 support. On Debian 12, with konsole and LANG=pl_PL.UTF-8, the page is displayed fine. |
@rkd77 On macOS aarch64. macOS doesn't have
Terminal: tested on multiple: iTerm2, Alacritty, WezTerm, Kitty. They all have different fonts too... I wonder if it is the
UPDATE: configure with --enable-combining doesn't change anything. |
UPDATE: I installed elinks in an Arch VM and opened it in the same tmux session on macOS. The locale, font, terminal, tmux, and terminfo are the same, but it renders correctly in the Arch VM and incorrectly on macOS, in an adjacent pane of the same tmux session. For some reason --version does not produce anything on macOS. Here's the --version output:
macOS (+/- Fastmem doesn't matter):
Arch Linux:
|
@amanvm Could you confirm whether the same bug (wrong UTF-8 letter) occurs in a FreeBSD VM? I have no access to such hardware, but I guess FreeBSD is similar to macOS in this case. |
@rkd77 Just tested, it does not happen in the FreeBSD VM! It renders correctly in both FreeBSD and Linux. Both were tested in the same tmux session on macOS with defaults (no config). I also tried the default setup (no config) on macOS, but the problem persists there, so it is not a config problem either. Mine is an aarch64 macOS machine, not sure if that affects anything. Searching "macos virtual machine on linux" shows a whole bunch of videos and guides... |
@rkd77 One observation: Unlike most other systems where libs/include files are in standard directories `/usr/local On macOS I used this:
macOS (
ArchLinux (
|
I suspect an unsigned char -> char conversion made in the past. |
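Some background on why such a conversion is a plausible suspect: whether plain `char` is signed is implementation-defined in C, so a UTF-8 byte such as 0xC5 (the lead byte of Š) can turn negative once it is stored in a `char`. Passing a negative value other than EOF to the `<ctype.h>` functions is undefined behavior. The sketch below only illustrates the pitfall; both function names are hypothetical and not taken from the ELinks source.

```c
/* Hypothetical helper (not from ELinks): read a byte from a char buffer
 * as a value in 0..255, regardless of whether plain char is signed. */
static int byte_value(const char *buf, int i)
{
    return (unsigned char)buf[i];
}

/* What a naive conversion yields: a negative number for bytes >= 0x80
 * when plain char is signed, and the same as byte_value() otherwise. */
static int naive_value(const char *buf, int i)
{
    return buf[i];
}
```

With the buffer `"\xC5\xA0"` (Š in UTF-8), `byte_value` always reports 0xC5 and 0xA0, while `naive_value` depends on the platform's `char` signedness.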
@rkd77 Tested on 0.13.0, it's the same there! I wonder if there is any other Unicode shaping library that you could use, or how links is able to display these glyphs. |
What if charset is added in meta?
|
@rkd77 For 1) I get an error "Bad url syntax" with elinks, but not with links. For 2) (the -dump option): same problem. The output from links has the correct S with caron; the dump from elinks has tofu there. It's a little late here, so any other follow-up will be after a while. Thanks! |
I guess it has something to do with encoding detection. If this ^ commit did not resolve it, I have no idea. |
@rkd77 Didn't resolve it. One comment I have: a lot of Unicode seems to render just fine. It's only a subset that doesn't. If you could think of a patch that logs interesting function arguments and return values as text for an input test document like this, I can volunteer to run it for sure. |
@amanvm You can prepare test cases, save dumps (elinks --dump), and show a hex view of these dumps. |
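As a rough illustration of what to look for in such a hex view: in UTF-8, Š (U+0160) is the two bytes C5 A0 and the no-break space (U+00A0) is C2 A0, so those pairs should survive intact in a correct dump. The helper below is a minimal sketch with a made-up name; any ordinary hex viewer gives the same information.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: write the bytes of `data` into `out` as
 * space-separated uppercase hex pairs. Returns the number of characters
 * written (excluding the terminating NUL), or -1 if `out` is too small. */
static int hex_view(const unsigned char *data, size_t len,
                    char *out, size_t out_size)
{
    size_t pos = 0;

    if (out_size == 0)
        return -1;
    for (size_t i = 0; i < len; i++) {
        /* Two hex digits per byte, plus a separator after the first. */
        size_t need = (i > 0 ? 1 : 0) + 2;
        if (pos + need + 1 > out_size)
            return -1;
        if (i > 0)
            out[pos++] = ' ';
        snprintf(out + pos, 3, "%02X", (unsigned)data[i]);
        pos += 2;
    }
    out[pos] = '\0';
    return (int)pos;
}
```

Feeding it the bytes of a dump line makes a split or altered multi-byte sequence easy to spot, for example a lone C5 that is no longer followed by A0.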
@rkd77 I am not a Unicode/UTF-8 expert, and we might end up doing a lot of back and forth that way. Don't you want to add an fprintf or two to some important functions that shape or process Unicode data, so that we converge faster? A branch or patch with some fprintfs would help. |
@amanvm There are many places where it can break. First I want to know what it "looks" like. |
Another question. How is plain text with this character rendered? Also as "tofu", or OK? |
@rkd77 OK, I have something for you! I used the test document from pragmatapro's repository, as that is one of the most comprehensive terminal fonts and the test file lists all of its glyphs with their Unicode code points. There is a free clone called pragmasevka; you are welcome to try viewing the documents with that. There are 3 files and 2 screenshots attached here:
Files: Screenshots:
More tofus can be seen on the screen, and you can download the txt files to see for yourself. Regarding your other question about "ELinks 0.17.GIT c09b5da-dirty": I just tried installing the Homebrew version from the head branch using command |
@amanvm on branch utf I added debug statements and test/chars.txt. |
@rkd77 Here we go, this is what the log file has:
|
Added more debug statements. Could you rerun test? |
@rkd77 Here's the output:
Clang, as gcc/g++ is actually clang/clang++ on Apple macOS. This is also the aarch64 (ARM64) version.
|
Added another commit to the utf branch. I disabled maybe_preformat_hook in dump to rule it out as a suspect. |
@rkd77 It's the same stderr output as before:
stdout has this:
|
@rkd77 One observation: if I open the document without -dump, the character renders correctly (no error is seen). Are you sure the code path taken for -dump is the same? In fact, the error wasn't seen for this Å on a previous version of elinks either when not opened with -dump. But the error is always seen with S with caron (Š). So there is some difference between -dump and non-dump behavior. Let's try the S with caron (Š) and its neighbors perhaps? |
-dump interprets the document as HTML; the normal view treats it as plain text. You can check the latest commits and show the log. |
@amanvm, thanks, could you continue? I made a mistake in the commit log, but we are getting closer. |
I guess isspace returns different results than on Linux. The warnings don't matter here, at least not yet. |
I added code for isspace. Could you check whether it works? You can redirect stderr to /dev/null |
@rkd77 It works well everywhere now! No tofus! Btw, I would recommend mentioning somewhere that users should close all other instances of elinks before trying a new version. If an old version is open, the old behavior persists for some reason even with the new binary. When I closed all old instances, the new binary's behavior kicked in. I know you have some socket file for communication between elinks instances, but I'm not sure how it is used and couldn't find much about it in the documentation. |
This commit was added to the master branch. More characters will likely need to be added to isspace. The docs have information about sessions and elinks instances.
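One way to read the fix, as a sketch rather than the actual ELinks commit: the C library's `isspace` is locale-dependent, and its results for values above 0x7F can differ between platforms. If such a check is applied to individual bytes of UTF-8 text and a continuation byte is classified as whitespace, the multi-byte sequence is split. The characters in this thread fit that pattern: Š is C5 A0, the no-break space is C2 A0, and Å is C3 85. A locale-independent check avoids the problem. The function names below are hypothetical, and the set of characters accepted here (the six ASCII whitespace characters) is an assumption; the thread does not show which characters the real replacement accepts.

```c
#include <stddef.h>

/* Hypothetical replacement for isspace() on raw bytes: true only for
 * the six ASCII whitespace characters, independent of the locale. */
static int ascii_isspace(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n' ||
           c == '\v' || c == '\f' || c == '\r';
}

/* Count whitespace bytes in a buffer using the ASCII-only check, so
 * UTF-8 continuation bytes such as 0xA0 or 0x85 are never counted. */
static size_t count_space_bytes(const unsigned char *s, size_t len)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++)
        if (ascii_isspace(s[i]))
            n++;
    return n;
}
```

With this check, the bytes of "a Š" followed by a no-break space and a newline contain exactly two whitespace bytes (the space and the newline), whatever the platform's locale tables say about 0xA0.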
S with caron (Š) rendered as tofu by elinks, rendered correctly by links
links output:
elinks output: