Allow Unicode in nicknames #259
Comments
There are existing implementations of this (e.g. InspIRCd's m_nationalchars) but nothing standard. I believe that @DanielOaks was looking into trialling RFC 3454 in @mammon-ircd with a desire for standardising it though. It isn't as simple as just allowing it though. Compatibility is a concern (there are clients which break when they get a CASEMAPPING which is not ascii or rfc1459) as well as masquerading with characters that look similar (e.g. character 97 "a" looks very similar to character 1072 "а"). |
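To make the look-alike problem concrete, here is a minimal Python sketch. The helper names `scripts_used` and `is_mixed_script` are hypothetical, and guessing the script from the first word of the Unicode character name is only a rough stand-in; it is not the confusable detection that UTS 39 describes.

```python
import unicodedata

latin_a = "a"          # character 97, U+0061
cyrillic_a = "\u0430"  # character 1072, U+0430

def scripts_used(nick):
    """Rough script guess: first word of each letter's Unicode name (assumption)."""
    return {unicodedata.name(ch).split()[0] for ch in nick if ch.isalpha()}

def is_mixed_script(nick):
    """True when the letters of the nick appear to come from more than one script."""
    return len(scripts_used(nick)) > 1

# The two characters render almost identically but are distinct code points.
print(ord(latin_a), unicodedata.name(latin_a))
print(ord(cyrillic_a), unicodedata.name(cyrillic_a))

# A nick that swaps in the Cyrillic letter is flagged by the heuristic.
print(is_mixed_script("p" + cyrillic_a + "ypal"))
```

A server could use a check along these lines to refuse mixed-script nicks, at the cost of also refusing some legitimate ones.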
There's also cases of servers improperly implementing rfc1459 vs. strict-rfc1459 (see inspircd/inspircd#1017). Ideally, wouldn't we want this to match how it is done for channel names? |
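For reference, a sketch of how the three legacy CASEMAPPING values differ when folding to lowercase, assuming the commonly documented ranges (`rfc1459` treats `[]\^` as the uppercase forms of `{}|~`, `strict-rfc1459` leaves out the `^`/`~` pair). The `fold` helper and the table names are made up for illustration.

```python
import string

UPPER = string.ascii_uppercase
LOWER = string.ascii_lowercase

# Translation tables for folding a name to its lowercase form.
FOLD_TABLES = {
    "ascii": str.maketrans(UPPER, LOWER),
    "strict-rfc1459": str.maketrans(UPPER + "[]\\", LOWER + "{}|"),
    "rfc1459": str.maketrans(UPPER + "[]\\^", LOWER + "{}|~"),
}

def fold(name, casemapping="rfc1459"):
    """Fold a nick or channel name under the given CASEMAPPING value."""
    return name.translate(FOLD_TABLES[casemapping])

for mapping in ("ascii", "strict-rfc1459", "rfc1459"):
    print(mapping, fold("Nick[a]^", mapping))
```

A server that advertises one mapping but compares names with another will disagree with its clients about which nicks are equal, which is the kind of mismatch described above.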
For what it's worth, I made a test branch for hexchat supporting rfc3454, though afaik no network implements it, so there is nothing to try it against. |
Relevant reading:
UTR 36: Unicode Security Considerations
UTS 39: Unicode Security Mechanisms

Bitlbee has a 'utf8_nicks' setting, disabled by default and with a small warning about potential breakage in the help text. It doesn't perform any cleanup, deferring that to the IM server (XMPP for example cleans them with the nodeprep/resourceprep profiles of stringprep), but I'd really like to change this. I haven't heard of clients with big issues when enabling this, just minor visual issues like miscalculating the width when displaying the nicks in a terminal. |
How are we going to maintain the backward compatibility? |
In practice, it's probably already compatible because many clients don't care. |
Hmm, well then this should really be in IRCv3.2, it's awesome |
With So long as you continue to disallow characters that break the protocol (i.e. commas, periods in client names, etc), and reject nicks/channel names that fail to casefold (i.e. strings that fail because they contain a character prohibited by the profile), I haven't seen too many issues with it. |
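A rough sketch of that kind of server-side check, in Python. Everything here is an assumption for illustration: the `PROTOCOL_BREAKING` set is a deliberately strict guess at the characters a server would refuse, and rejecting the Unicode "Other" and "Separator" categories is only a crude proxy for "prohibited by the profile", not an implementation of any stringprep or PRECIS profile.

```python
import unicodedata

# Hypothetical set of characters that would confuse IRC message parsing
# or nick!user@host masks if they appeared in a nickname.
PROTOCOL_BREAKING = set(" ,*?!@.:#&")

def is_acceptable_nick(nick):
    """Return True if the nick passes this sketch's (assumed) rules."""
    if not nick:
        return False
    for ch in nick:
        if ch in PROTOCOL_BREAKING:
            return False
        # Categories starting with C (control, format, unassigned, ...)
        # or Z (separators) are refused outright.
        if unicodedata.category(ch)[0] in ("C", "Z"):
            return False
    return True

print(is_acceptable_nick("ąãå"))
print(is_acceptable_nick("a,b"))
```

The point is only that rejection happens up front, so a name that cannot be folded never enters the server's tables.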
In charybdis, we plan to implement |
It is a joke, but it is a solution: convert it to punycode (or similar) for non-Unicode clients.
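Joke or not, the idea is easy to sketch with the punycode codec in Python's standard library. The function names `ascii_fallback` and `from_ascii_fallback` are hypothetical, and nothing in this thread specifies such a mapping.

```python
def ascii_fallback(nick):
    """ASCII-only rendering of a nick for clients that cannot handle Unicode."""
    if nick.isascii():
        return nick  # already safe, leave untouched
    return nick.encode("punycode").decode("ascii")

def from_ascii_fallback(encoded):
    """Reverse the punycode step (only valid for nicks that were encoded)."""
    return encoded.encode("ascii").decode("punycode")

encoded = ascii_fallback("bücher")
print(encoded)
print(from_ascii_fallback(encoded))
```

One catch: a plain ASCII nick and an encoded one are indistinguishable on the wire, so a real scheme would need some marker (as IDNA does with its prefix) to say which nicks should be decoded.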
Not sure what you mean by that, many clients respect the casemapping and rely upon its behavior. |
@kaniini That makes sense, once it's implemented/specced out give me a yell and I can see about switching my personal stuff over to use it as well. |
There is no plan in charybdis for backwards compatibility. Deployments which switch from |
How would tab completion work if someone on an international channel used a nick that is not in the Latin alphabet? What if the client is configured to use a non-UTF-8 charset? What if the person using a UTF-8 nick uses something that my client cannot show due to the old glibc on my system, which is still in the wild? (ref: weechat/weechat#79) |
Presumably the same as it does with the latin alphabet.
I think detecting a specifically UTF-8-based casemapping from the server should make the client default to using UTF-8, if they're not already. If the user decides not to use it, they may get corrupted characters, just like what happens today when two clients using utf8 and non-utf8 try to send weird characters to each other.
Then it will not show those characters because your client (or the system you're using it on) does not work properly. I don't think this is an issue for us to worry about, it's a bug that will get fixed by more distros over time, and I especially think will be fixed enough for us to not care about it by the time a unicode casemapping actually gets into proper usage. |
I fully support moving away from legacy rfc1459 towards rfc7700. |
Not sure if possible, but ideally
Clients which support Existing clients would work the same way they already do when someone sends a UTF-8 message (i.e. some would detect UTF-8 anyway, others would mis-decode it as ISO-8859-42 or whatever such).
🤷 I guess it'd be less likely to happen if only "word" characters were accepted, similar to how Python etc. filter characters allowed in variable names. |
So long as the client takes the casefolding into account when evaluating tab-complete matches, should work without an issue I'd imagine. |
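A minimal client-side sketch of that, in Python. It uses `str.casefold()` (Unicode full case folding) as a stand-in; a real client would fold with whatever casemapping the server advertises, and the `complete` helper is a hypothetical name.

```python
def complete(prefix, nicks):
    """Return the nicks whose folded form starts with the folded prefix."""
    folded_prefix = prefix.casefold()
    return [nick for nick in nicks if nick.casefold().startswith(folded_prefix)]

channel_members = ["Страус", "strauss", "СТРЕЛА", "Straße"]

print(complete("стр", channel_members))
print(complete("STRASS", channel_members))
```

This only covers case differences. Typing a prefix in a script the user has no keyboard layout for is a separate problem that folding does not address.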
Maybe we can do some math in the IRC server to create an alias and send to the client? so the client uses the alias to complete the actual nick? Like if the nick is This way the user can just put like Or, the math part might be left up to the clients, as the whole thing is really client side anyways. |
I'd suggest it's just up to the clients to implement tab completion in a sane manner. User interfaces shouldn't be specced in a protocol. |
Right. |
Hmm, how do people use tab-completion in the existing ISO-2022-JP networks? |
Wouldn't this just mean that aaa and ąãå were the same nick, along with every variation of ąãå that the IRCd would interpret as aaa, which would get very confusing? This is why I gave 👎 to your comment. |
Another idea could be to treat it like capitalizations? |
Proposed client behaviour would be in a non-normative part of the spec at best, so it's not even worth bothering with. I suspect with the way this discussion is going, this will be an area where the IRCv3 process fails us and we just form a coalition of IRCd vendors to make it happen, and then IRCv3 maybe documents it after the point. |
So business as usual, then? |
Pretty much what @kaniini says. It's not a huge issue to worry about. |
I'd still be concerned about breakage - even clients which support UTF-8 messages have likely made assumptions about nicknames, particularly any clients which support tab completion or which maintain a cached member list for channels for some purpose. I'd be afraid that this is likely to expose a lot of undefined behaviors around input sanitization of nicknames received from the server (or the lack thereof). Some possible manifestations of incompatibility with UTF-8 nicknames
Some of these issues already exist today with channel names and chat messages, but nicknames are more fundamental, as they are identifiers that the client absolutely has to deal with correctly: if a channel name breaks a client, the user can avoid that channel, but a user can't necessarily choose to avoid all users with UTF-8 nicknames. There's also a severe usability concern that needs to be addressed: a channel operator MUST be able to quickly and unambiguously specify nicknames for use in commands with only keyboard input, regardless of what language's characters might happen to be in those nicknames. Even if that client properly supports UTF-8 nicknames, if the use of such nicknames complicates the effective management of channels in the slightest, then user acceptance of internationalized nicknames will either be dead in the water as a feature users rebel against, or there will be demands for restrictive channel modes to prohibit all internationalized nicknames on a channel. (Yes, I realize that in most cases a user has access to a GUI, tab completion, or copy/paste, but there is no guarantee of this; there are environments where none of these will be a viable option. Tab completion, for example, often requires that the user specify at least a partial match or iterate through every nickname on the channel; copy/paste may not be available if the user is at an actual console session rather than running a terminal inside a GUI; GUI userlists aren't available in a terminal; and so on.) |
rfc7700, when properly implemented, handles all of those issues and more. have you read it? |
I have, and it is so extremely light on practical details about exactly how it would be implemented within the IRC protocol that it leaves more questions than answers. While IRC is mentioned as a possible application, aside from that mention, the rest of the RFC consists of a set of guidelines that can be generically applied to problems inherent with nickname internationalization across a wide variety of existing and future protocols. While the specifications set out in the RFC address a number of potential issues, the lack of any formal guidance on how to integrate them into the IRC protocol, combined with a lack of IRC-specific recommendations, effectively makes it nothing more than a building block, and my concerns from a user standpoint above about IRC-specific implementation details remain at most partially addressed by RFC7700. Of more concern, there are some security considerations that should be readily apparent to any long-time user of IRC, which are not mentioned - specifically, the potential for disruption if the effective use of channel management and ignore functionality is obstructed or defeated by internationalized nicknames. This is especially important here because users might first have to learn how to deal with inputting i18n nicknames while under the pressure of an ongoing disruption or attack. Any demonstration or reference implementations will have to be especially aware of these and other considerations, to avoid an implementation that is perceived as creating more problems than it solves. |
If you look at the IRCX Draft v04 (Microsoft, 1998) it provides a way to allow Unicode nicknames in IRC. Clients that don't support Unicode (non-IRCX in the draft) see:
This has been supported in many clients / servers since the 1990s, why not use it? |
Curious which ones? |
Just remember to ban for security reasons all Unicode confusable symbols (allow only one version of those chars). |
What about handling emojis? 👸🏻 may appear as either multiple characters or a single character while being visually different or identical to 👸 depending on the system or application support. Additionally, many clients will use a shortcode such as Both |
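For anyone wondering what "multiple characters" means at the code point level, a small Python sketch. The `strip_skin_tone` helper is hypothetical and only covers the five skin tone modifiers (U+1F3FB to U+1F3FF); it says nothing about how a server should treat them.

```python
# The five emoji skin tone modifiers, U+1F3FB through U+1F3FF.
SKIN_TONE_MODIFIERS = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}

def code_points(text):
    """List the code points in a string as U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in text]

def strip_skin_tone(text):
    """Drop skin tone modifiers so both variants compare equal."""
    return "".join(ch for ch in text if ch not in SKIN_TONE_MODIFIERS)

princess = "\U0001F478"
princess_light = "\U0001F478\U0001F3FB"

print(code_points(princess))
print(code_points(princess_light))
print(strip_skin_tone(princess_light) == princess)
```

Whether the two forms should count as the same nick is a policy decision; the sketch only shows that they are different strings unless something folds them together.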
Shortcodes are handled explicitly by the client (i.e. if the client wants to convert them then cool), the protocol doesn't treat shortcodes any differently or give them any special conversion. At least in #272 right now, it allows emoji as a part of names so far as rfc7700 does, but servers are free to block whatever characters they want. |
@grawity NFKD + case folding is likely to help with the tab completion for accented characters. Decomposing characters will let you handle diacritics by either matching or skipping them, and compatibility decompositions will handle a lot of other stuff. (It's designed for search operations.) See http://unicode.org/reports/tr15/ But Unicode mapping tables won't handle things like a -> あ, since they don't have tables for romanization of non-Latin scripts. It's not really an easily standardizable thing... in many languages, there's multiple possibilities; e.g. Chinese has several formalized romanization schemes in common use, and Persian is romanized rather haphazardly by Persian-speakers. |
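A minimal sketch of that NFKD plus case folding idea using Python's `unicodedata` module. The `search_key` name is made up, and dropping combining marks is one of the two options mentioned (skipping diacritics rather than matching them). This is meant for client-side matching, not for deciding which nicks a server considers identical.

```python
import unicodedata

def search_key(text):
    """Build an accent- and case-insensitive key for matching nicks."""
    # Decompose, fold case, then decompose again in case folding
    # produced characters that are themselves decomposable.
    decomposed = unicodedata.normalize("NFKD", text)
    folded = unicodedata.normalize("NFKD", decomposed.casefold())
    # Skip combining marks (diacritics) left behind by the decomposition.
    return "".join(ch for ch in folded if not unicodedata.combining(ch))

def matches(typed, nick):
    """True if what the user typed is a prefix of the nick's search key."""
    return search_key(nick).startswith(search_key(typed))

print(search_key("ąãå"))
print(matches("aa", "ĄÃå"))
print(search_key("あ"))
```

As noted above, this does nothing for あ: there is no Unicode table that turns it into "a", so romanization would have to come from somewhere else.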
Worth considering whether just using a metadata key might resolve this sufficiently. e.g. |
RFC 1459 only allows ASCII letters, numerals and some special characters in nicknames, leaving people from non-anglophone countries at a disadvantage. Using the wealth of human writing is possible in the body of messages; it should be possible in nicknames too.