Text encoding detection #580
Comments
For implementation there are 2 approaches:
I believe a single fallback (ISO-8859-15) would be enough for almost everyone, but an enhanced version could let the user define the fallback charset per channel.
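The fallback idea described above could look roughly like the following. This is a minimal Python sketch, not the project's actual implementation: `decode_irc_line`, `CHANNEL_FALLBACKS`, and the channel names are all hypothetical and exist only for illustration. It tries strict UTF-8 first and only uses the fallback charset when that fails.

```python
from typing import Optional

# Hypothetical per-channel overrides; channel names and charsets are made up.
CHANNEL_FALLBACKS = {"#suomi": "iso-8859-15", "#polska": "iso-8859-2"}
DEFAULT_FALLBACK = "iso-8859-15"


def decode_irc_line(raw: bytes, channel: Optional[str] = None) -> str:
    """Decode as strict UTF-8 first, then fall back to a single-byte charset."""
    try:
        # Strict mode raises UnicodeDecodeError on any invalid sequence.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        fallback = CHANNEL_FALLBACKS.get(channel, DEFAULT_FALLBACK)
        # errors="replace" guards against fallbacks with undefined bytes.
        return raw.decode(fallback, errors="replace")
```

For example, `decode_irc_line("päivää".encode("iso-8859-15"))` would go through the fallback path, while the same text encoded as UTF-8 is decoded directly. One known limitation of this approach: a few non-UTF-8 byte sequences happen to be valid UTF-8 and would be decoded incorrectly, although this is rare in ordinary text.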
We implemented this using the node-irc encoding option: TeDomum@6f556eb. It is far from perfect and the heuristics sometimes backfire, but it does 99% of the job.
If I understand correctly, it sets the encoding for the server connection. It does not prevent clients from using any other encoding, unless the server has some logic for it.
The option is weirdly named in node-irc, but it does enable heuristic detection of other clients' encodings and automatic conversion to UTF-8 (the default encoding Node.js uses when converting between buffers and strings).
I just checked how this could be done client-side. It looks like it is not possible: all invalid characters arrive as 65535, so there is no way to distinguish between ä, ö and other problem characters.
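The information loss described here can be reproduced in a few lines. The sketch below is in Python rather than the project's JavaScript, and it assumes a lenient UTF-8 decoder that substitutes a replacement character for invalid bytes. In Python that character is U+FFFD (65533); the code point seen in the browser may differ depending on the decoding path.

```python
# Latin-1 / ISO-8859-15 bytes for "ä" and "ö"; neither is valid UTF-8 alone.
a_umlaut = b"\xe4"
o_umlaut = b"\xf6"

# A lenient UTF-8 decoder substitutes U+FFFD for each invalid byte.
decoded_a = a_umlaut.decode("utf-8", errors="replace")
decoded_o = o_umlaut.decode("utf-8", errors="replace")

# Both characters collapse to the same replacement character...
same_after_decoding = decoded_a == decoded_o == "\ufffd"

# ...whereas the raw bytes still decode correctly with the right charset.
recovered = (a_umlaut + o_umlaut).decode("iso-8859-15")
```

Once the replacement has happened the original byte is gone, which is why the fallback has to be applied where the raw bytes are still available, that is, on the server side.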
Added ability to set fallback encoding for non-UTF-8 strings. Implements #580.
This has now been implemented and works fine. Thanks!
It has difficulties with IRC messages containing mIRC colour codes, which leads to a double-encoding issue (e.g. see the following Matrix message (
Traditional IRC networks don't specify which text encoding is used. This has led to a situation where some users use UTF-8 while others still use Latin-1 or other encodings. This is a pain for non-English-speaking users, as clients must be able to detect which encoding other users are using.
I suggest:
For example, Irssi does this really well.