Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sopel's casefolding is hard-coded #2161

Closed
dgw opened this issue Jul 15, 2021 · 1 comment · Fixed by #2231
Closed

Sopel's casefolding is hard-coded #2161

dgw opened this issue Jul 15, 2021 · 1 comment · Fixed by #2231
Labels
Bug Things to squish; generally used for issues Core/IRC Protocol Handling
Milestone

Comments

@dgw
Copy link
Member

dgw commented Jul 15, 2021

We've had some shenanigans with casemapping before, e.g. #1744. In fact, I mentioned there that ISUPPORT provides a CASEMAPPING token… but I/we forgot about this as 7.x development went on. We should think about it again. Details about allowed values for the CASEMAPPING token are, as usual, at Horse Docs.

While this is in the 8.0 milestone at present, I doubt that it's going to be a one-and-done fix. It will require some careful planning to avoid breaking existing databases in weird ways. The last thing we want is for someone to boot up Sopel after an update and suddenly have some channels' or users' DB values "lost" (still in the DB, but not found any more) because the casemapping that obeys ISUPPORT doesn't match what Sopel has been assuming (mostly rfc1459).

One big step along the way is probably, finally, supporting one-off DB migrations (#1784). Even with that, though, we'll have to account for plugins with custom tables (e.g. github), plugins that store data in their own files (e.g. remind and tell), etc.—which means we'll want hooks (#1908) that plugins can subscribe to for triggering their own migrations based on what core does.

And since I'm already thinking about how to handle even more of what we already did before, plus the need to handle servers changing their existing CASEMAPPING value just in case (no pun intended), it almost seems like we could use an Identifier.casefolds property that lists all of the possibilities. The database and other uses of "normalized" nicks could check against all of them in some priority order, starting with the value of CASEMAPPING from the server's ISUPPORT if present…

You know, when I started drafting this yesterday, the problem didn't seem nearly as messy.

@dgw dgw added the Bug Things to squish; generally used for issues label Jul 15, 2021
@dgw dgw added this to the 8.0.0 milestone Jul 15, 2021
@Exirel
Copy link
Contributor

Exirel commented Dec 29, 2021

Alright, I've been doing some homework on that. First, I want to talk about CASEMAPPING:

  • ascii, rfc1459, and rfc1459-strict are the well supported options
  • rfc7613 doesn't look ready yet and involve using UTF8MAPPING as well (I followed the link given in the doc to this PR that was closed without decision taken yet)

So my current target is to implement a way to use the CASEMAPPING when possible:

  • if not given: use our current default (I think it's rfc1459)
  • if given, but not from the list of supported value: use our current default (or use ascii?)
  • if given and supported: use it!
  • ignore UTF8MAPPING for now, for simplicity. It'll be always possible to implement that part later if it becomes relevant

Second, I want to talk about what need to be done: since the Server is the source of truth, we need to listen to what it says. Therefore, we need the connection, listen to isupport, etc. And then, we need to pass that information to every single Identifier created here and there.

As a result, there are 3 things (at least) that I want to do:

  • control where the Identifier are created so no one is responsible but the bot to create them with the right case-mapping
  • add some fancy shortcuts on sopel.irc.AbstractBot, like AbstractBot.lower or AbstractBot.make_identifier
  • make sure tools don't create Identifier themselves, or at least not without the bot around (I'm thinking about the DB in particular); this might generate some backward incompatibility or create some friction with existing code, but that's why we are having a 8.0, right?

As for the one-off migration... that'll be for another time. I'm still thinking about it tho!

@Exirel Exirel linked a pull request Dec 30, 2021 that will close this issue
4 tasks
@dgw dgw closed this as completed in #2231 Jan 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Things to squish; generally used for issues Core/IRC Protocol Handling
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants