Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

irc: implement CASEMAPPING parameter for ISUPPORT #2231

Merged
merged 17 commits into from
Jan 21, 2022

Conversation

Exirel
Copy link
Contributor

@Exirel Exirel commented Dec 30, 2021

Description

This PR contains various changes with the end goal to use the server's casemapping instead of the hardcoded one. Sadly, this includes breaking changes.

Non-breaking changes

  • The bot has a new method make_identifier: it uses the CASEMAPPING parameter from isupport to select the appropriate casemapping function (from sopel.tools.identifiers)
  • The Identifier class has several changes:
    • Identifier now lives its best life in sopel.tools.identifiers, mostly to prevent cyclic import errors. This comes with some more documentation.
    • Identifier doesn't override str.__new__ anymore, I figured out a way to use __init__ instead, and it works fine.
    • Identifier has a new argument: casemapping. It's a callable that must take one parameter (a string) and return a string.
    • The default value of casemapping is rfc1459_lower, making this argument optional (at the moment).
    • self.casemapping is now used in place of Identifier._lower to get a lowercase version of the identifier string
  • Classes that create Identifier have a new argument: It's a callable that must take one parameter (a string) and return an instance of Identifier with properly configured casemapping.
    • User and Channel (both from sopel.tools.target)
    • PreTrigger (from sopel.trigger)
    • SopelDB (from sopel.db)
    • SopelIdentifierMemory (from sopel.tools)
  • The new argument is used to instantiate Identifier when necessary
  • coretasks and other modules are now using bot.make_identifier when they need an Identifier
  • All method from SopelDB now accept a str (some used to accept Identifier only) since it can always convert them into an Identifier when necessary
  • Various type hint here and there because why not

Breaking changes

  • Identifier._lower doesn't lower all character as it used to do, now it uses the rfc1459_lower function which lowercase only [A-Z] characters. In theory, this is not a breaking change because IRC identifiers shouldn't accept non-ASCII characters anyway. However, in the case of IRC server supporting Unicode identifiers (for nicknames and/or channels), this is a breaking change. A solution to that would be to implement rfc7613 (PRECIS) as suggested by https://modern.ircdocs.horse/index.html#casemapping-parameter
  • core.nick is now a str and not an Identifier anymore, meaning that any plugin using that config option will need to transform it into an Identifier; this is considered a breaking change, even tho it shouldn't be a problem considering that most Identifier-related operations are through objects that already convert str into Identifier when necessary

Going further

The next steps would be:

  • to merge sopel: move deprecated() to sopel.lifecycle #2232
  • to implement a PRECIS profile for casemapping—I think it would even work on current ascii-only identifier, making Sopel UTF-8 ready before servers can be
  • to move memory classes into their own submodule
  • and probably to move other things outside of sopel/tools/__init__.py
  • to implement dynamic chantype for identifiers

Checklist

  • I have read CONTRIBUTING.md
  • I can and do license this contribution under the EFLv2
  • No issues are reported by make qa (runs make quality and make test)
  • I have tested the functionality of the things this change touches

@Exirel Exirel linked an issue Dec 30, 2021 that may be closed by this pull request
Copy link
Member

@dgw dgw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it through. 😩 Expected to break and finish tomorrow, but nope.

While I'm (understandably?) annoyed by all the extra constructor parameters that had to happen to make this work, I get that it's impossible—or at least, very code-smell-y—to do it another way.

sopel/tools/identifiers.py Outdated Show resolved Hide resolved
docs/source/api.rst Outdated Show resolved Hide resolved
docs/source/plugin/bot.rst Show resolved Hide resolved
sopel/tools/__init__.py Show resolved Hide resolved
sopel/tools/time.py Outdated Show resolved Hide resolved
sopel/db.py Outdated Show resolved Hide resolved
sopel/tools/target.py Outdated Show resolved Hide resolved
sopel/tools/target.py Outdated Show resolved Hide resolved
sopel/trigger.py Outdated Show resolved Hide resolved
sopel/trigger.py Show resolved Hide resolved
@Exirel Exirel force-pushed the identifier-casemapping branch from 5705f24 to 02d7044 Compare January 9, 2022 22:46
@Exirel
Copy link
Contributor Author

Exirel commented Jan 9, 2022

I did a rebase and squashed fixup commits. There was some conflicts a bit complicated to sort out properly, I'm glad I made atomic commit so it was easier to test every single one with py.test and flake8 (except for one commit but heh it's squashed now).

I added one commit to update SopelIdentifierMemory's docstring, as requested (I almost forgot about it, so nice catch!).

What's left is:

  • to decide if you want me to move memory classes into their own submodule of sopel.tools in that PR or not
  • to write the PR's description, that must contain a list of all the breaking changes and things to consider for the upgrade

I won't write a migration script in that PR, that's too big of a job, and I'm not even sure how to properly do it at the moment. I know it's doable, because I had to write a database migration system into one of my application a few years ago that was using SQLAlchemy and Alembic, so it's clearly doable.

@Exirel Exirel added this to the 8.0.0 milestone Jan 10, 2022
@Exirel Exirel marked this pull request as ready for review January 10, 2022 11:54
@Exirel
Copy link
Contributor Author

Exirel commented Jan 15, 2022

What's left is:

  • to decide if you want me to move memory classes into their own submodule of sopel.tools in that PR or not
  • to write the PR's description, that must contain a list of all the breaking changes and things to consider for the upgrade

This is ready for a last round of review.

Copy link
Member

@dgw dgw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I found a surprising number of things left to say, despite the many previous comments and despite only re-reviewing about half the files (mostly the ones that "Changed since your last review"). Nobody would ever accuse me of being a code-review Yes Man, huh?

sopel/db.py Outdated Show resolved Hide resolved
sopel/irc/__init__.py Show resolved Hide resolved
sopel/irc/__init__.py Outdated Show resolved Hide resolved
sopel/modules/adminchannel.py Show resolved Hide resolved
test/test_db.py Show resolved Hide resolved
sopel/tools/__init__.py Outdated Show resolved Hide resolved
sopel/tools/identifiers.py Outdated Show resolved Hide resolved
sopel/tools/identifiers.py Show resolved Hide resolved
sopel/tools/identifiers.py Show resolved Hide resolved
sopel/tools/identifiers.py Outdated Show resolved Hide resolved
@Exirel
Copy link
Contributor Author

Exirel commented Jan 16, 2022

Alright this is good, the last review is all about a better presentation and attention to details, I really like it. 👍

@dgw dgw added the Breaking Change Stuff that probably should be mentioned in a migration guide label Jan 17, 2022
@Exirel
Copy link
Contributor Author

Exirel commented Jan 17, 2022

That should be good now, @dgw you can read my replies and see if I need to take action or if I can rebase & squash away.

Copy link
Member

@dgw dgw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one conversation from before left unresolved—but I started some new ones. Hopefully they won't make you 🙁 too much, since it's about fixing details & consistency in the new changes.

sopel/irc/__init__.py Show resolved Hide resolved
sopel/modules/adminchannel.py Show resolved Hide resolved
test/test_db.py Outdated Show resolved Hide resolved
test/test_db.py Outdated Show resolved Hide resolved
test/test_db.py Outdated Show resolved Hide resolved
test/test_db.py Outdated Show resolved Hide resolved
sopel/tools/identifiers.py Outdated Show resolved Hide resolved
sopel/tools/identifiers.py Outdated Show resolved Hide resolved
sopel/tools/identifiers.py Outdated Show resolved Hide resolved
@Exirel
Copy link
Contributor Author

Exirel commented Jan 20, 2022

And once again! But this time, no new docstrings. 😁

Exirel added a commit to Exirel/sopel that referenced this pull request Jan 20, 2022
@dgw
Copy link
Member

dgw commented Jan 20, 2022

I looked, all good. Squash what you're going to squash when ready.

@Exirel Exirel force-pushed the identifier-casemapping branch from 39541fb to 151a9bf Compare January 20, 2022 17:56
Copy link
Member

@dgw dgw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to do what I should have done earlier and actually approve this.

You can squash the fixup into the commit SHA I gave and trust me (and CI) that everything will work out post-merge, or rebase on master if you like.

Exirel and others added 9 commits January 20, 2022 22:13
And fix the default casemapping to follow the RFC it is supposed to
implement (RFC 1459): only A-Z letters should be lowercased.

The lowercase function is now part of a set of public functions, each
implement a different "lower" rule:

* ascii: implement the "ascii" CASEMAPPING, i.e. A-Z only
* rfc1459: implement the "rfc1459" CASEMAPPING, i.e. A-Z + []\~
* rfc1459-strict: implement the same RFC but correctly, i.e. no
  replacement for ~ (it shouldn't be)

The `Identifier` class now use the rfc1459 function, instead of using
str.lower.

In order to propagate this change correctly, I updated `sopel.db` as
well, because str.lower() is not equivalent to Identifier.lower()
anymore.

Note that this doesn't support unicode Identifier, which would use
str.lower directly; the "PRECIS" version of the IRC v3 spec is not ready
yet, so it is not implemented for the moment. Also, it would requires to
manage not only CASEMAPPING, but also UTF8MAPPING, and that's for
another day/feature!

See also https://modern.ircdocs.horse/index.html#casemapping-parameter
and any links from that document.

Co-authored-by: dgw <[email protected]>
Instead of instantiating `tools.Identifier` directly, built-in plugins
are now using `bot.make_identifier(nick)` to get an Identifier for
nicks and channel. This method comes from `sopel.irc.AbstractBot`, and
can be used as an Identifier factory: take a string as input, return an
Identifier.

In the future, this factory will be able to take advantage of the
configuration and the ISUPPORT to select the appropriate CASEMAPPING
function.

The next steps are:

* to use that factory outside of plugins (i.e. triggers and targets)
* to implement a different CASEMAPPING depending on context
* add documentation about "how to get an Identifier"
Instead of using `tools.Identifier(nick)`, the PreTrigger class now uses
a factory to instantiate an Identifier. As a result, the bot
(irc.AbstractBot) provides such factory.

The Test Factories have been updated to also use the bot's factory when
creating test Trigger & PreTrigger. While doing so, type-hint has been
added to said factories.
As per PreTrigger, Channel no longuer instantiate Identifier itself: it
relies on an Identifier factory to do so.

When creating a Channel in coretasks, it provides bot.make_identifier as
a factory.

Docstrings have been updated, and type-hint have been added for the
occasion.

Co-authored-by: dgw <[email protected]>
As per PreTrigger and Channel, SopelIdentifierMemory also uses an
identifier factory.

Sopel and the built-in plugins have been updated accordingly.
The core configuration "nick" option was an Identifier: it was
convenient to always have an Identifier for the configured nick.
However, it doesn't make a lot of sense: the nick is used in various
place as it is (no lowercase), and where it needs to be an Identifier it
is always to compare to to another one.

As a result:

* type(core.nick) is now str
* when comparing the config value to an identifier, use
  bot.make_identifier
And Sopel provides its `make_identifier` method as a factory.
Exirel and others added 8 commits January 20, 2022 22:13
ISUPPORT provide a "CASEMAPPING" parameter, that can tell which
casemapping function to use:

* ascii: lower A-Z only
* rfc1459: ascii + map some special char
* rfc1459-strict: ascii + map some special char strictly (no mapping
  for ~)

Whenever the bot received the ISUPPORT parameter, it automatically
rebuild its nick according to that parameter.

Note: this doesn't implement the UTF8MAPPING parameter; future PR

Co-authored-by: dgw <[email protected]>
For type hint, I followed the same type hint provided by SQLAlchemy for
connection and session (i.e. return type is an implicit "Any").

Then I switched back the get nick/channel value to accepting an str
instead of only an Identifier, since we can always use the Identifier
factory now.

Co-authored-by: dgw <[email protected]>
@Exirel Exirel force-pushed the identifier-casemapping branch from 4ce843d to 5f605ea Compare January 20, 2022 21:15
@dgw dgw merged commit aecc6db into sopel-irc:master Jan 21, 2022
@Exirel Exirel deleted the identifier-casemapping branch February 23, 2022 10:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Breaking Change Stuff that probably should be mentioned in a migration guide Core/IRC Protocol Handling Feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sopel's casefolding is hard-coded
2 participants