Handling ISO-8859-1 characters #157

Closed
ossiangrr opened this issue Mar 19, 2013 · 6 comments

Comments

@ossiangrr

I'm not sure if this is a problem with IRC in general, with JavaScript, or with node.

I have been writing a simple bot that works as a search engine for a card game (VTES). Some cards have names with foreign characters, and I'd like them to be searchable by literal character.
I am listening with addListener("message#",callback) and addListener("pm",callback)

If someone sends a UTF-8 character -- say, ö or ç -- it works great!

But if their encoding is ISO-8859-1, my bot sees all of the "special" characters as the same character sequence: �
Not even a different sequence of bytes that I could brute-force translate.

How can I get my bot to see these as different characters?
Or is this just a limitation of javascript/node that I'll have to suck up and deal?

(I do have an option for users to search by "ascii-ized" versions of the name, so there's a workaround, but it would be nice if I could handle more literally-typed or copy-pasted strings)

Here is a real-world excerpt.
In the first of each of these cases, the "foreign" character is UTF-8. In the second case, it is ISO-8859-1.

-> gramle whois Zöe
Gramle Zöe. Clan: Malkavian Group: 2 Capacity: 3 cel obf AUS
Gramle Camarilla: Zöe does not get the usual +1 stealth when hunting.

-> gramle whois Zöe
Gramle No results found for 'whois Z�e'.


-> gramle whois Monçada
Gramle Ambrosio Luis Monçada, Plenipotentiary. Clan: Lasombra Group: 2 Capacity: 10 aus for DOM OBT POT PRE
Gramle Sabbat cardinal: Monçada cannot block. Other Methuselahs' actions targeting Monçada cost an additional pool. If Monçada is ready during your discard phase, he can untap another ready Lasombra.

-> gramle whois Monçada
Gramle No results found for 'whois Mon�ada'.
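
The identical � in both failed lookups is consistent with the raw bytes being decoded as UTF-8 before the listener sees them. Here is a minimal sketch of that effect, assuming the second client sent ISO-8859-1 bytes (0xF6 for ö, 0xE7 for ç); the variable names are made up for illustration and nothing here is node-irc's own code:

```javascript
// ISO-8859-1 encodes ö as the single byte 0xF6 and ç as the single byte 0xE7.
const zoeBytes = Buffer.from([0x5a, 0xf6, 0x65]); // "Zöe"
const moncadaBytes = Buffer.from([0x4d, 0x6f, 0x6e, 0xe7, 0x61, 0x64, 0x61]); // "Monçada"

// Decoded as UTF-8, the lone 0xF6 / 0xE7 bytes are not valid sequences,
// so the decoder substitutes U+FFFD (the "�" replacement character).
const zoeAsUtf8 = zoeBytes.toString('utf8');
const moncadaAsUtf8 = moncadaBytes.toString('utf8');

// Decoded as latin1 (ISO-8859-1), the same bytes give the intended text.
const zoeAsLatin1 = zoeBytes.toString('latin1');
const moncadaAsLatin1 = moncadaBytes.toString('latin1');

console.log(JSON.stringify(zoeAsUtf8), JSON.stringify(zoeAsLatin1));
console.log(JSON.stringify(moncadaAsUtf8), JSON.stringify(moncadaAsLatin1));
```

So the distinction between ö and ç only exists at the byte level; it is lost once the bytes have been turned into a UTF-8 string.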

@katanacrimson
Contributor

You can do most of this using the buffer builtin. http://nodejs.org/api/buffer.html#buffer_new_buffer_str_encoding

You'll need to determine somehow whether the incoming text is UTF-8 or not. That part will have to be up to you.
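
One possible way to make that determination, sketched under the assumption that you have the raw bytes as a Buffer and a reasonably recent Node (one with the global `TextDecoder` and its `fatal` option): try a strict UTF-8 decode first and fall back to ISO-8859-1 if it fails. `decodeBestEffort` is a hypothetical helper name, not part of node-irc or Node.

```javascript
// Hypothetical helper: strict UTF-8 first, ISO-8859-1 as the fallback.
function decodeBestEffort(bytes) {
  try {
    // With fatal: true, decode() throws a TypeError on invalid UTF-8
    // instead of silently inserting U+FFFD.
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch (err) {
    // Every byte sequence is valid ISO-8859-1, so this never throws.
    return Buffer.from(bytes).toString('latin1');
  }
}

console.log(decodeBestEffort(Buffer.from('Zöe', 'utf8')));      // UTF-8 input
console.log(decodeBestEffort(Buffer.from([0x5a, 0xf6, 0x65]))); // ISO-8859-1 input
```

This is only a heuristic: a few ISO-8859-1 byte sequences also happen to be valid UTF-8 and would be decoded as UTF-8, though that is uncommon in ordinary text.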

@ossiangrr
Author

Well, the earliest moment that I have access to the string (inside an addListener callback), it's already in the "garbled" state.
So I guess what you're saying to me is that the changes would have to be made inside the node-irc library itself. I guess I could attempt to locally modify it and see what happens... I'm just a relative newcomer to node so I was hoping there was something within the irc library that I had just overlooked.
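
For anyone attempting that kind of change, the general shape would be to keep incoming data as Buffers, split lines at the byte level, and only then decode each line. The following is a rough sketch of that idea and not node-irc's actual code; `createLineSplitter`, the `#vtes` channel name, and the sample message are all hypothetical.

```javascript
// Hypothetical sketch: accumulate raw chunks, split on "\r\n" at the byte
// level, and hand each complete line to a caller-supplied decoder.
function createLineSplitter(decodeLine, onLine) {
  let pending = Buffer.alloc(0);
  return function onData(chunk) {
    pending = Buffer.concat([pending, chunk]);
    let end;
    while ((end = pending.indexOf('\r\n')) !== -1) {
      onLine(decodeLine(pending.subarray(0, end)));
      pending = pending.subarray(end + 2); // skip the "\r\n" terminator
    }
  };
}

// Usage: feed a message split across two chunks, decoding as ISO-8859-1.
const lines = [];
const onData = createLineSplitter(
  (bytes) => bytes.toString('latin1'),
  (line) => lines.push(line)
);
onData(Buffer.concat([Buffer.from('PRIVMSG #vtes :whois Z', 'ascii'), Buffer.from([0xf6])]));
onData(Buffer.from('e\r\nPING', 'ascii')); // "PING" stays pending until its "\r\n" arrives
console.log(lines);
```

With a real `net.Socket` this would only work if `setEncoding()` is never called on the socket, since that is what makes `'data'` events deliver strings instead of Buffers. Any per-line decoder could be passed in place of the fixed `latin1` one.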

@katanacrimson
Contributor

@ossiangrr is there an actual difference when looking at the buffer's state directly?

Check this. Use console.dir on the string provided there and look at the hex values to see whether they differ. That'll tell you how low you've gotta go.

@ossiangrr
Author

Yeah, those still come out as the "same character" using console.dir, so it would have to be something inside node-irc.
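
That observation can be reproduced without IRC at all. A small sketch, assuming the strings reaching the callback already contain U+FFFD where the ö and ç used to be (the two sample strings are reconstructed from the excerpt above, not captured from the bot):

```javascript
// What the callback plausibly receives once the bytes were decoded as UTF-8.
const garbledZoe = 'Z\ufffde';
const garbledMoncada = 'Mon\ufffdada';

// The character itself is U+FFFD in both strings.
console.log(garbledZoe.charCodeAt(1).toString(16));     // fffd
console.log(garbledMoncada.charCodeAt(3).toString(16)); // fffd

// Re-encoding to bytes only yields the UTF-8 form of U+FFFD (ef bf bd),
// not the original 0xF6 or 0xE7.
const zoeHex = Buffer.from(garbledZoe, 'utf8').toString('hex');
const moncadaHex = Buffer.from(garbledMoncada, 'utf8').toString('hex');
console.log(zoeHex);     // 5aefbfbd65
console.log(moncadaHex); // 4d6f6eefbfbd616461
```

Nothing in the string distinguishes the two original characters any more, which is why the decoding would have to change before the string is created.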

@ossiangrr
Author

I've found references in node-irc's forums about "encoding" patches but I don't understand node and/or github enough to figure out if I can use this patch: #113

I have also found this: https://github.com/bnoordhuis/node-iconv
Which, again, I would use to modify node-irc itself if I was a little more well-versed in the code.

Maybe the core node-irc team could work with these links better than me?

@jacobrask

Did anyone figure out a solution, in node-irc or outside? I have both ISO-8859-1 users and UTF-8 users.
