Add documentation for "less" LESSUTFCHARDEF (Private Use Area characters definition) option #1337

Closed
3 tasks done
tkapias opened this issue Aug 13, 2023 · 69 comments · Fixed by #1716

Comments

@tkapias

tkapias commented Aug 13, 2023

  • I have searched the issues for my request and found nothing related and/or helpful
  • I have searched the FAQ for help
  • I have searched the Wiki for help

Issue

On Linux, less, the most common pager, no longer prints Private Use Area characters by default.
I don't know when the transition took place: the version in the Debian testing repositories (590) still displays Nerd Fonts characters, but the compiled master (633) no longer does.

A commit dated September 25, 2022 introduced a new option (LESSUTFCHARDEF) that lets you declare how characters are displayed, per code point range.

Using this option with the latest version to declare the code points listed in the wiki forces less to display Nerd Fonts characters.

export LESSUTFCHARDEF=23fb-23fe:p,2665:p,26a1:p,2b58:p,e000-e00a:p,e0a0-e0a2:p,e0a3:p,e0b0-e0b3:p,e0b4-e0c8:p,e0ca:p,e0cc-e0d4:p,e200-e2a9:p,e300-e3e3:p,e5fa-e6a6:p,e700-e7c5:p,ea60-ebeb:p,f000-f2e0:p,f300-f32f:p,f400-f532:p,f500-fd46:p,f0001-f1af0:p

Solution

I think that instructions to export the correct variable should be given to Nerd-fonts users in the wiki or the readme.

Other

image

@acidghost

Thanks for the fix!

I just spent an entire morning trying to pin-point the issue. I found this #1337 issue while looking for the code points used by Nerd Fonts.

For additional reference, the version of less that introduced LESSUTFCHARDEF is 632; more details are discussed in gwsw/less#275.

From what I understand of the linked issue, PUA characters were never meant to be displayed. Maybe it has something to do with the v3 changes in Nerd Fonts?
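
For anyone who shares one shell config across machines with different less versions, a version check might look roughly like the sketch below. It assumes that the first line of `less --version` starts with `less <number>` and that 632 really is the first version that honours LESSUTFCHARDEF (as stated above). The function name `less_has_utfchardef` is made up, and the exported value is a deliberately short placeholder covering only the basic Private Use Area, not a full Nerd Fonts list.

```shell
# Hypothetical helper: succeeds when the given first line of `less --version`
# reports version 632 or newer (assumed first version with LESSUTFCHARDEF).
less_has_utfchardef() {
    ver=$(printf '%s\n' "$1" | sed -n 's/^less \([0-9][0-9]*\).*/\1/p')
    [ -n "$ver" ] && [ "$ver" -ge 632 ]
}

# Export the variable only when the installed less is new enough.
# Placeholder value: replace it with the ranges you actually need.
if less_has_utfchardef "$(less --version 2>/dev/null | head -n 1)"; then
    export LESSUTFCHARDEF='e000-f8ff:p'
fi
```

Older versions should simply ignore the variable, so the check is mostly useful for keeping the environment tidy.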

@Finii
Collaborator

Finii commented Sep 22, 2023

more is just more than less ;-D

After it took me years to switch from more to less, I will probably switch back ;-)

image

I will try to come up with a documentation change.
Thank you both very much!! 🏅

@Cyb3rGhoul

Can you assign me this issue?

@Finii
Collaborator

Finii commented Oct 16, 2023

@Cyb3rGhoul Of course, thank you

@powerman
Collaborator

powerman commented Jul 9, 2024

Here is a value suitable for the Nerd Fonts v3.2.1 non-Mono version (with nice double-width large icons 😄):

LESSUTFCHARDEF=23fb-23fe:w,2665-2665:w,2b58-2b58:w,e000-e00a:w,e0a0-e0a3:p,e0b0-e0bf:p,e0c0-e0c8:w,e0ca-e0ca:w,e0cc-e0d7:w,e200-e2a9:w,e300-e3e3:w,e5fa-e6b5:w,e700-e7c5:w,ea60-ec1e:w,ed00-efce:w,f000-f2ff:w,f300-f375:w,f400-f533:w,f0001-f1af0:w

The Mono version needs the same value with all :w replaced by :p.
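
That replacement can be done mechanically. A minimal sketch, assuming the value contains `:w` only as a type suffix (hex digits never include `w`, so a plain substitution should be safe); the function name `to_mono` is just for illustration:

```shell
# Non-Mono value copied from the comment above.
nonmono='23fb-23fe:w,2665-2665:w,2b58-2b58:w,e000-e00a:w,e0a0-e0a3:p,e0b0-e0bf:p,e0c0-e0c8:w,e0ca-e0ca:w,e0cc-e0d7:w,e200-e2a9:w,e300-e3e3:w,e5fa-e6b5:w,e700-e7c5:w,ea60-ec1e:w,ed00-efce:w,f000-f2ff:w,f300-f375:w,f400-f533:w,f0001-f1af0:w'

# Turn every wide (:w) entry into a printable single-width (:p) entry.
to_mono() {
    printf '%s\n' "$1" | sed 's/:w/:p/g'
}

mono=$(to_mono "$nonmono")
printf 'export LESSUTFCHARDEF=%s\n' "$mono"
```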

@injust

injust commented Jul 10, 2024

Here is a value suitable for the Nerd Fonts v3.2.1 non-Mono version (with nice double-width large icons 😄):

LESSUTFCHARDEF=23fb-23fe:w,2665-2665:w,2b58-2b58:w,e000-e00a:w,e0a0-e0a3:p,e0b0-e0bf:p,e0a0-e0c8:w,e0ca-e0ca:w,e0cc-e0d7:w,e200-e2a9:w,e300-e3e3:w,e5fa-e6b5:w,e700-e7c5:w,ea60-ec1e:w,ed00-efce:w,f000-f2ff:w,f300-f375:w,f400-f533:w,f0001-f1af0:w

The Mono version needs the same value with all :w replaced by :p.

How is this value generated? I'm not sure how to tell which code points are double-width.

Also, I think e0a0-e0c8:w overlaps with e0a0-e0a3:p and e0b0-e0bf:p?

@Finii Finii added this to the v3.3.0 milestone Jul 10, 2024
@powerman
Collaborator

How is this value generated? I'm not sure how to tell which code points are double-width.

Manually. 😞

Also, I think e0a0-e0c8:w overlaps with e0a0-e0a3:p and e0b0-e0bf:p?

Yeah, sorry, it was a typo that I had already fixed elsewhere but missed here: it should be e0c0-e0c8:w (just fixed in my original comment).
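
Since these values are maintained by hand, a small overlap check can catch this kind of typo. The sketch below is only an illustration: `find_overlaps` is a made-up name, and it assumes every entry has the form `lo-hi:type` or `cp:type` with hex code points.

```shell
# Hypothetical checker: print every pair of entries in a LESSUTFCHARDEF-style
# value whose code point ranges overlap.
find_overlaps() {
    old_ifs=$IFS
    IFS=,
    set -- $1            # split the value into one positional parameter per entry
    IFS=$old_ifs
    for a in "$@"; do
        shift            # drop $a so the inner loop only sees later entries
        ra=${a%%:*}      # strip the ":p" / ":w" suffix
        a_lo=$((0x${ra%%-*}))
        a_hi=$((0x${ra##*-}))
        for b in "$@"; do
            rb=${b%%:*}
            b_lo=$((0x${rb%%-*}))
            b_hi=$((0x${rb##*-}))
            # Two closed intervals overlap when each starts before the other ends.
            if [ "$a_lo" -le "$b_hi" ] && [ "$b_lo" -le "$a_hi" ]; then
                echo "$a overlaps $b"
            fi
        done
    done
}
```

Called as `find_overlaps "$LESSUTFCHARDEF"`, it prints nothing for a clean value. Run against the value from the very first comment, it should flag f400-f532:p against f500-fd46:p.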

@injust

injust commented Jul 10, 2024

According to https://github.com/gwsw/less/blob/v661/less.nro.VER#L2015-L2017, the code point ranges can also be single code points. It's probably clearer to do that for 2665, 2b58, and e0ca, which results in:

LESSUTFCHARDEF=23fb-23fe:w,2665:w,2b58:w,e000-e00a:w,e0a0-e0a3:p,e0b0-e0bf:p,e0c0-e0c8:w,e0ca:w,e0cc-e0d7:w,e200-e2a9:w,e300-e3e3:w,e5fa-e6b5:w,e700-e7c5:w,ea60-ec1e:w,ed00-efce:w,f000-f2ff:w,f300-f375:w,f400-f533:w,f0001-f1af0:w
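
The same simplification can be scripted. This is a rough sketch using a POSIX sed back-reference; `collapse_singles` is a made-up name, and the pattern assumes well-formed entries (it is not anchored to entry boundaries, so it should not be trusted on malformed input).

```shell
# Hypothetical helper: rewrite degenerate ranges such as "2665-2665:w"
# as the single code point form "2665:w".
collapse_singles() {
    printf '%s\n' "$1" | sed 's/\([0-9a-fA-F]\{1,\}\)-\1:/\1:/g'
}

collapse_singles '23fb-23fe:w,2665-2665:w,2b58-2b58:w,e0ca-e0ca:w'
```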

@nikunjmathur08
Contributor

nikunjmathur08 commented Oct 10, 2024

Is this issue still open? Can I work on this? @tkapias @Finii

@Finii
Collaborator

Finii commented Oct 10, 2024 via email

@tkapias
Author

tkapias commented Oct 10, 2024

Please do @nikunjmathur08,

I was still using the outdated values that I wrote in the first comment.

I will now switch to the ones in @injust's post.

The only way to close this issue will probably be by finding an updated and parsable list of codepoints in the repository (I don't know if they are listed in other places than the wiki) and writing a script in bin/script/ to help users generate the right env variables for less.
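
As a starting point, such a script could look like the sketch below. It is only an illustration under stated assumptions: it expects a plain list of hex code points, one per line, on stdin (no such file is known to exist in the repository in that exact form), and the name `chardef_from_codepoints` is made up. Consecutive code points are merged into ranges.

```shell
# Hypothetical generator: read hex code points (one per line) on stdin and
# print a LESSUTFCHARDEF value, merging consecutive code points into ranges.
# $1 is the type suffix attached to every range (for example p or w).
chardef_from_codepoints() {
    while read -r cp; do
        # Convert hex to decimal so sort and awk can compare numerically.
        if [ -n "$cp" ]; then echo "$((0x$cp))"; fi
    done | sort -n | awk -v t="$1" '
        function emit() {
            printf "%s%x", sep, lo
            if (hi > lo) printf "-%x", hi
            printf ":%s", t
            sep = ","
        }
        NR == 1      { lo = hi = $1; next }
        $1 == hi     { next }                  # skip duplicates
        $1 == hi + 1 { hi = $1; next }         # extend the current range
                     { emit(); lo = hi = $1 }  # gap: start a new range
        END          { if (NR > 0) { emit(); print "" } }
    '
}
```

For example, `printf 'e0a0\ne0a1\ne0a2\ne0b0\n2665\n' | chardef_from_codepoints p` prints `2665:p,e0a0-e0a2:p,e0b0:p`.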

@Finii
Collaborator

Finii commented Oct 10, 2024 via email

@nikelborm

nikelborm commented Nov 11, 2024

To avoid constantly updating ranges you can just use this:

export LESS="-r"
export LESSUTFBINFMT='*n%C'

@Finii Finii closed this as completed in 4b5ec06 Nov 20, 2024
Finii added a commit that referenced this issue Nov 20, 2024
…uctions

Add LESSUTFCHARDEF instructions for less PUA character display Fixes: #1337
@Finii
Collaborator

Finii commented Nov 20, 2024

Added @nikelborm's fix to the Wiki page.

@powerman
Collaborator

Terminal:
image
Less with "large" (a lot of ranges) LESSUTFCHARDEF:
image
image
Less without LESSUTFCHARDEF:
image
Less with LESSUTFCHARDEF=e000-e8ff:p,f0001-fffff:p:
image

So, yeah, looks like these lower ranges are handled correctly by default.

My unicode-symbols.txt file was created a long time ago, so it may contain some outdated ranges, but there are some ranges which do not work with LESSUTFCHARDEF=e000-e8ff:p,f0001-fffff:p:

<U+EA60> nf-cod-add; ea60
<U+EA61> nf-cod-lightbulb; ea61
...
<U+EC1D> nf-cod-git_fetch; ec1d
<U+EC1E> nf-cod-copilot; ec1e

<U+ED00> nf-fa-location_dot; ed00
<U+ED01> nf-fa-medapps; ed01
...
<U+F2FE> nf-fa-poo; f2fe
<U+F2FF> nf-fa-magento; f2ff

<U+F300> nf-linux-alpine; f300
<U+F301> nf-linux-aosc; f301
...
<U+F374> nf-linux-postmarketos; f374
<U+F375> nf-linux-qt; f375

<U+F400> nf-oct-light_bulb; f400
<U+F401> nf-oct-repo; f401
...
<U+F532> nf-oct-zoom_out; f532
<U+F533> nf-oct-bookmark_slash; f533

@Finii
Collaborator

Finii commented Nov 20, 2024

Added a comment in the aforementioned Issue at less.

Thank you for testing. But what you test is just your terminal emulator and not less at all, right?

Screenshot 2024-11-20 at 13 12 51

How far the cursor advances after 2665, for example, has nothing to do with less and its settings; that is a terminal emulator decision, and less only needs to know whether the line has to be cut off on the right.

So the difference between p and w is not how the glyph is rendered (your terminal does that, and less has no power to change any of it), but only which cell coordinate less thinks the cursor is at after outputting the characters, for the purpose of paging correctly.

@Finii
Collaborator

Finii commented Nov 20, 2024

I will already add you as contributor 💚 This is not to silence you ;)
Thank you for all the input / dialogue.

@Finii
Collaborator

Finii commented Nov 20, 2024

@allcontributors please add @powerman for doc

Contributor

@Finii

@powerman already contributed before to doc

@Finii
Collaborator

Finii commented Nov 20, 2024

Oh :-D

@powerman
Collaborator

So the difference between p and w is not how the glyph is rendered (your terminal does that, and less has no power to change any of it), but only which cell coordinate less thinks the cursor is at after outputting the characters, for the purpose of paging correctly.

Good point!

So the only issue is the missing ranges starting somewhere near EA60 and ending somewhere near F533. If the current version of Nerd Fonts still uses these ranges, they should be included in LESSUTFCHARDEF.

@powerman
Collaborator

As for p and w… well, okay, the terminal decides whether a glyph is 1 or 2 cells wide, but shouldn't LESSUTFCHARDEF use p and w accordingly? Sure, this is probably out of scope for the Nerd Fonts project, but if the wiki mentions only :p, it may be worth at least adding a note about this issue (e.g. "Depending on your terminal settings some ranges may be shown using 2 cells, and these ranges should use :w in place of :p in LESSUTFCHARDEF.")?

@Finii
Collaborator

Finii commented Nov 20, 2024

Screenshot 2024-11-20 at 13 20 55

These are codepoints not in PUA (which ends with E8FF), and Nerd Font dropped them with v3.0.0.
Well, specifically because of that: we had patched non-PUA codepoints and thus broke glyphs that rightfully belong on those codepoints (Chinese, etc.).

Hmm, I see some are still PUA characters albeit being in no PUA block ?!

Screenshot 2024-11-20 at 13 33 37

I'll add what you mentioned in your last comment.

@Finii
Collaborator

Finii commented Nov 20, 2024

Screenshot 2024-11-20 at 13 40 47

Edit: Correct image

@powerman
Collaborator

These are codepoints not in PUA (which ends with E8FF)

Hmm. https://www.unicode.org/charts/PDF/UE000.pdf says:

Range: E000-F8FF

@Finii
Collaborator

Finii commented Nov 20, 2024

🤦‍♀️

Whatever, correcting, tnx

@Finii
Collaborator

Finii commented Nov 20, 2024

Screenshot 2024-11-20 at 13 44 27

@Finii
Collaborator

Finii commented Nov 20, 2024

/spent 3h

Wah, this is not Gitlab ;)

@powerman
Collaborator

Sorry about this! :)

I'm a bit afraid to say so, but to me it looks like your latest version (the one with :w) should include these too: 23fb-23fe:w,2665:w,2b58:w.

Thanks!!!

P.S. Probably I'll have to update ranges in https://github.com/powerman/wcwidth-icons this way too.

@Finii
Collaborator

Finii commented Nov 20, 2024

This whole issue that less creates is so ridiculous...

WHY is it assuming 2665 is single width, when Unicode says it's ambiguous?
For the PUA they said: it's ambiguous width, so we cannot display it.

And then, why not assume E000 to be single width, as it does with 2665?

Screenshot 2024-11-20 at 15 32 09

@nikelborm

nikelborm commented Nov 20, 2024

export LESSUTFBINFMT='*n%C'

For the PUA they said: it's ambiguous width, so we cannot display it.

Just curious: what effect does LESSUTFBINFMT have here?

@Finii
Collaborator

Finii commented Nov 20, 2024

I would assume ...

*     I have no clue
n     normal (not bold or whatnot)
%C    just use the char itself

@powerman They themselves suggest to set what we do:

Screenshot 2024-11-20 at 16 14 34

@nikelborm

nikelborm commented Nov 20, 2024

I've read the doc. I mean, in practice, if you set this env var, does it help render the character in the screenshot beautifully? It helped me in all the cases where I saw this <U+E000> kind of rendering, because *n%C should render all characters as is.

@Finii
Collaborator

Finii commented Nov 20, 2024

"Beautifully"? Well, that is all just about enabling less to know how long one line is, to cut it off at the right place.

It has nothing to do with how the glyphs are rendered/placed by the terminal. less can not influence that.

Now the question is what you mean by beauty ;)

@nikelborm

nikelborm commented Nov 20, 2024

"Beautifully"? Well, that is all just about enabling less to know how long one line is, to cut it off at the right place.

It has nothing to do with how the glyphs are rendered/placed by the terminal. less can not influence that.

Now the question is what you mean by beauty ;)

Sorry, I'm not a native speaker. I don't know how to say it in other words. It's less's job to give the character to the terminal. And if less sees a character which is not whitelisted, it prints it as a sequence of ASCII characters representing the hex value of the original character. I mean that <U+E000> is ugly and whatever symbol was behind it is beautiful.

@Finii
Collaborator

Finii commented Nov 20, 2024

This is why less wants to know how wide the character is rendered by the terminal...

$ cd git/nerd-fonts
$ less -S font-patcher
Screenshot 2024-11-20 at 16 27 39

less, when cutting off long lines, needs to know how wide each char is rendered.

@nikelborm

nikelborm commented Nov 20, 2024

export LESSUTFBINFMT='*n%C'

Setting this variable at least gives the terminal the chance to decide if it can render it. Without it, the terminal emulator (?) (GUI program) just gets a sequence of ASCII characters.

rendered/placed by the terminal. less can not influence that

So it's not just can. It does.

And basically I'm asking: if you set this var, will your terminal be able to render the character, or does it just break?

@Finii
Collaborator

Finii commented Nov 20, 2024

less can not influence that

So it's not just can. It does.

less has no way to influence the glyph rendering of the terminal emulator. It can just output a character or not.
It needs the information about how wide the terminal renders a glyph (which it also cannot ask for, so we need to tell less manually via an environment variable) to cut off overlong lines in the right position.

@Finii
Collaborator

Finii commented Nov 20, 2024

And basically I'm asking: if you set this var, will your terminal be able to render the character, or does it just break?

Well, your terminal can render what it can. less does not know that, nor can it change that behaviour. So setting the variable does not change the terminal's capabilities.
With the variable unset, less just "protects" the user from strange line endings by not outputting whole ranges of characters.

If you set this to *n%C it should pass all characters unchanged, and not change them to the "user friendly" (or not) hexdump format that no one wants. Whether the terminal can display it depends on your terminal and the font used.
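
Written out as shell configuration, the catch-all approach from earlier in the thread would look roughly like this. It is a sketch of what was posted here, not the wiki text. The comments reflect my reading of the less manual (a leading `*` plus one letter selects the display attribute, `n` meaning normal) and the explanation given above for `%C`, so treat them as assumptions.

```shell
# Catch-all alternative to listing ranges in LESSUTFCHARDEF.
# -r: output raw control characters; per the less manual this means less can
#     no longer track the real screen layout, so long lines may be split in
#     the wrong place.
export LESS='-r'

# *n -> show otherwise-rejected characters with the normal attribute
# %C -> print the character itself instead of the default <U+XXXX> form
export LESSUTFBINFMT='*n%C'
```

The trade-off is the one discussed above: less stops hiding the characters, but it still does not know how wide the terminal renders them.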

@powerman
Collaborator

how wide the terminal renders a glyph (which it also can not ask for

At a glance, less can ask for it: using wcwidth(3). The correctness of the result will vary, of course, but it's still better to fix such a response in a single place than to fix it for each individual app like less or vim that decides not to use this API.

@Finii
Collaborator

Finii commented Nov 20, 2024

But it can not ask the terminal, and that is free to render them as it likes. 🤷‍♀️

All these problems do not occur if you use Nerd Font or Nerd Font Mono, the fonts designed for terminal uses, that utilize additional blanks for rendering purposes, that the typical applications all understand. Nerd Font Propo is a new addition and intended to use in proportional contexts like GUI elements etc.

@powerman
Collaborator

All these problems do not occur if you use Nerd Font or Nerd Font Mono

These fonts have a much more critical issue: icons that are too small, all using just 1 cell! 😢 Small enough to be hardly usable at all, at least for people with less than perfect vision.

@Finii
Collaborator

Finii commented Nov 20, 2024

Hmm, Nerd Font and Nerd Font Propo have the icons at the exact same size.

Only Nerd Font Mono scales them down into one cell.

🤷‍♀️

Maybe have a look at #1103, but then, you know that already I guess.

@powerman
Collaborator

powerman commented Nov 20, 2024

Ahh, I see. Thanks for the explanation!

TBH, I don't think this will change anything, for me at least. Having to add an extra space after an icon always felt like a bug to me. It turns out to be a feature… well, it happens. 😄 These extra spaces really create issues with a lot of tools: they might insert an unwanted line wrap after an icon, they might show only the left half of an icon if it happens to start at the last cell, cursor movement over icons becomes counter-intuitive, and it breaks word counting/matching/etc. using regexps/patterns. GUI tools (e.g. browsers) will actually render that extra space. Probably worst of all, an icon is often not in the middle of a word but at the end, or is a separate "word", which leads to 2 spaces after the icon, and many tools will "fix" this by removing the 2nd "extra" space.

So, to me it looks like the only real way to solve these issues is to use Propo fonts, use LD_PRELOAD=/path/to/libwcwidth-icons.so to ensure most apps (including the terminal) know that all "wide" icons are actually 2 cells wide, and additionally configure (or patch) the few "too clever" apps like less or vim which hardcode their own lists of wide codepoints instead of using the wcwidth(3) API. This may feel like a set of "hacks", but I'd prefer to have 2-3 such hacks in my whole OS+apps setup than an "extra space after each icon" hack in each text file, shell prompt configuration, etc.
