Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What do columns in Position mean? #1

Closed
michaelpj opened this issue Apr 22, 2022 · 7 comments
Closed

What do columns in Position mean? #1

michaelpj opened this issue Apr 22, 2022 · 7 comments
Labels
documentation Improvements or additions to documentation

Comments

@michaelpj
Copy link

"Columns" of text are not well-defined in the presence of unicode input. I'd suggest specifying that it means unicode code points, and maybe testing with some awkward unicode text.

@Mesabloo
Copy link
Owner

You are absolutely right!
This is something that I completely forgot to specify.

I am aware that I unfortunately don't support (yet) variable-width characters (e.g. tabs, but also some other unicode characters if I recall correctly), which will break the output (though colors should still be applied correctly).
What would be a correct way of handling those?

@Mesabloo Mesabloo added the documentation Improvements or additions to documentation label Apr 22, 2022
Mesabloo added a commit that referenced this issue Apr 22, 2022
As per #1, the term 'columns' has been specified to refer to a unicode codepoint count
@michaelpj
Copy link
Author

I think if you just say that the column numbers are code-point offsets in the line, then you're fine. All you have to worry about then is creating your markers, which I guess require generating whitespace which takes up a given number of code points (easy) and generating a marker element that takes up a given number of code points (also easy?).

I actually have no idea how tabs work 😅

@Mesabloo
Copy link
Owner

All you have to worry about then is creating your markers, which I guess require generating whitespace which takes up a given number of code points (easy) and generating a marker element that takes up a given number of code points (also easy?).

If everything has been correctly implemented, then all this should already be the case.
This still requires some testing on some weird looking Unicode strings though.

I actually have no idea how tabs work sweat_smile

As far as I'm aware, they just are single byte characters which are rendered differently depending on many factors (editor configuration, terminal emulators, user environment, etc).
However, in the case of this library, this may produce very inconsistent output, as tabs will be output as simple spaces.
So this is something that I have to look after.
prettyprinter may also include a way to work with tabs, which would solve all these problems I believe.


Thanks for the issue! 🙂

@michaelpj
Copy link
Author

Yeah, the most important thing is just documenting what they're supposed to be so clients can put the right thing in!

@sofia-m-a
Copy link

The solution which I want to integrate into my error library Chapelure is to use wcswidth. Sadly there is not yet a standalone binding package for this on Hackage (although there is Graphics.Text.Width from vty). A proper wcswidth implementation handling emoji and double-width unicode characters, together with some preset for converting tabs into n spaces, seems to be the most 'standard' approach, although there's hardly a consensus between text editors, programming language diagnostics, and terminals about what a 'column' is.

@Mesabloo
Copy link
Owner

Mesabloo commented Apr 25, 2022

Thanks for the link to wcwidth/wcswidth. I actually didn't know about it!
As for a Haskell implementation, there are two ways to go for:

I do not want to depend on vty, as it is a very big dependency and does not currently work on Windows.
Edit: just found at that this package is actually available on Hackage: https://hackage.haskell.org/package/wcwidth

A proper wcswidth implementation handling emoji and double-width unicode characters, together with some preset for converting tabs into n spaces, seems to be the most 'standard' approach

This is ideally what I would like to follow, now knowing that functions like those exist. 😀

@Mesabloo
Copy link
Owner

Using wcwidth + space conversion for TABs seems to work perfectly!
image
The space after the weird sum is a TAB, and is rendered above as 4 spaces (though this is now configurable).
Both the 🎉 emoji and the chinese character take 2 cells, which has correctly been identified.

Thanks again everybody for all the precisions and the issue itself!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

3 participants