IRC formatting codes break UTF-8 auto detection and force encodingFallback use #44
Comments
On another channel where my client is configured to send ISO-8859-15, it works correctly:
IRC:
Matrix:
|
matrix-org/matrix-appservice-irc#580 (comment) made me realise that the notices that get mangled have text formatting codes in them:

    {
      "content": {
        "body": "[1] Dagfinn Ilmari Mannsåker",
        "format": "org.matrix.custom.html",
        "formatted_body": "<b>[1]</b> Dagfinn Ilmari Mannsåker",
        "msgtype": "m.notice"
      },
      "event_id": "$1591183492575491NNmPv:irc.snt.utwente.nl",
      "origin_server_ts": 1591183492274,
      "sender": "@_ircnet_lorelai:irc.snt.utwente.nl",
      "type": "m.room.message",
      "unsigned": {
        "age": 3376
      },
      "room_id": "!WNiVmWxmsBkMsusLnT:irc.snt.utwente.nl"
    }

|
Explanation by @leonerd:
|
These codes are not only used in CTCPs; the payload itself may contain them as well, for colours and text formatting. So a proper implementation should parse the message codes first and then recode. |
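
To illustrate the "parse the codes first, then recode" idea, here is a minimal sketch in Python (not the bridge's actual code). It assumes the encoding detector is a pluggable function and that ISO-8859-15 is the configured fallback. The names `strip_formatting`, `looks_like_utf8` and `decode_irc_payload` are hypothetical. The detector only ever sees a copy of the payload with the formatting bytes removed, while the original bytes are what gets decoded.

```python
import re

# IRC formatting control bytes: colour (0x03 plus optional "fg[,bg]" digits),
# bold, reset, monospace, reverse, italic, strikethrough, underline.
_FORMAT_RE = re.compile(
    rb"\x03(?:\d{1,2}(?:,\d{1,2})?)?|[\x02\x0f\x11\x16\x1d\x1e\x1f]"
)


def strip_formatting(raw: bytes) -> bytes:
    """Return the payload with IRC formatting codes removed."""
    return _FORMAT_RE.sub(b"", raw)


def looks_like_utf8(data: bytes) -> bool:
    """Stand-in detector (an assumption): strict UTF-8 validation."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def decode_irc_payload(raw: bytes, fallback: str = "iso-8859-15",
                       detector=looks_like_utf8) -> str:
    # Run detection on a probe copy without formatting codes...
    probe = strip_formatting(raw)
    encoding = "utf-8" if detector(probe) else fallback
    # ...but decode the original bytes, so the formatting survives for
    # later conversion (for example to HTML on the Matrix side).
    return raw.decode(encoding, errors="replace")


utf8_notice = b"\x02[1]\x02 Dagfinn Ilmari Manns\xc3\xa5ker"
latin_notice = b"\x02[1]\x02 Dagfinn Ilmari Manns\xe5ker"
print(decode_irc_payload(utf8_notice))
print(decode_irc_payload(latin_notice))
```

Both calls should yield the same text with the bold markers intact. A heuristic detector could be passed as `detector` instead of the strict check used here.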
Yep, this becomes a bit complex, as the CTCP messages need quite a lot more tuning. Perhaps we could just disable fallback encoding on any CTCP strings - this way at least UTF-8 actions would work as expected. |
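
A rough Python sketch of that CTCP-only workaround, under the assumption that a CTCP payload is recognised by its 0x01 delimiters and that ISO-8859-15 is the configured fallback. The names `is_ctcp` and `decode_payload` are hypothetical and do not come from the bridge's code.

```python
CTCP_DELIM = b"\x01"


def is_ctcp(raw: bytes) -> bool:
    """True if the payload is wrapped in CTCP delimiters (0x01 ... 0x01)."""
    return (len(raw) >= 2
            and raw.startswith(CTCP_DELIM)
            and raw.endswith(CTCP_DELIM))


def decode_payload(raw: bytes, fallback: str = "iso-8859-15") -> str:
    if is_ctcp(raw):
        # Fallback disabled: always treat CTCP (e.g. ACTION) as UTF-8 and
        # replace undecodable bytes instead of switching encodings.
        return raw.decode("utf-8", errors="replace")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Ordinary messages keep the existing fallback behaviour.
        return raw.decode(fallback)


print(decode_payload(b"\x01ACTION eats sm\xc3\xb6rg\xc3\xa5s\x01"))
print(decode_payload(b"sm\xf6rg\xe5s"))
```

The trade-off is that actions sent by clients configured for the fallback encoding would then show replacement characters.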
That still doesn't solve it for messages with colour/formatting codes. |
Discussed this today and got a new possible solution idea:
Implementing this should be relatively simple. Although a bit hacky, it should do the trick. |
This turns out to be because An alternative mentioned in the above ticket is |
Good find, thanks. |
PR #49 made. |
Believe this is fixed now. |
On the IRCnet bridge, actions appear to always be interpreted as the fallback encoding (ISO-8859-15).
On the IRC side (client configured to send UTF-8):
On the Matrix side:
While my notice was correctly decoded as UTF-8 here, on another channel a bot's UTF-8 notices are decoded as ISO-8859-15:
IRC:
Matrix: