Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent SopelMemory behavior with Identifier keys & lookups by string #1341

Closed
dgw opened this issue Jun 5, 2018 · 8 comments · Fixed by #1938
Closed

Inconsistent SopelMemory behavior with Identifier keys & lookups by string #1341

dgw opened this issue Jun 5, 2018 · 8 comments · Fixed by #1938

Comments

@dgw
Copy link
Member

dgw commented Jun 5, 2018

Looking up a unicode/str key that is not all-lowercase in a SopelMemory (dict) that uses Identifiers as keys works inconsistently. For example, 'Sopel' == Identifier('Sopel'), but as far as SopelMemory.__contains__() is concerned it's not equal.

The cause is, unicode/str objects are case sensitive when hashed, whereas Identifier always returns self._lowered.__hash__(). Among other things, this affects the ip module looking up hostnames by nick (as string) from bot.users without casting to Identifier first (functionality introduced in #1033).

I can think of two solutions. One is just always casting to Identifier when indexing dictionaries by nickname (or channel name), but that's not as batteries-included as I'd like. The other solution, which I'm toying with, is creating a subclass of SopelMemory that's designed for using Identifiers as keys, and always normalizes everything. Call it an IdentifierMemory or Idict (Sopel already has a Ddict), for now. (Other ideas are welcome, as always.)

I'd like to get this resolved for 6.6.0, ideally. It's already been "broken" for some time, just patched around in my (and presumably others') modules by always manually casting string inputs to Identifier when needed. I don't think those manual casts should be necessary.


Thoughts on what direction to take

SopelMemory could be re-implemented as an extension of collections.MutableMapping, since extending dict might not be the best idea. As Jochen Ritzel says there, what we really need is the interface of a dict. What's under the hood doesn't matter that much. Abstract base classes are available from Python 2.6 and up, so this option is safe on the compatibility front. The main concern with it is performance—the dict built-in type is generally implemented in lower-level code (for example, C in CPython implementations), and defining a custom class in Python would mean slower operations on the object.

I'm going to also investigate overriding more of dict to get the desired behavior, of course. It would be ideal to preserve as much performance as possible. Just haven't looked deep enough into that path yet; focused on too many other things.

@dgw dgw added this to the 6.6.0 milestone Jun 5, 2018
@dgw dgw self-assigned this Jun 5, 2018
@kwaaak
Copy link
Contributor

kwaaak commented Sep 28, 2018

Looks like a "|" in the nickname also trips it up. Running bot.users.get("nick|name") doesn't find the user, it works with nicks that do not contain the pipe character.

@dgw
Copy link
Member Author

dgw commented Sep 29, 2018

I wonder if that's the same issue… | is considered to be "lowercase" \ per RFC 2812, so in theory that shouldn't be a problem.

Now that I take a close look at what Identifier._lower() actually does, I think it's going the wrong way, converting | to \, etc.—using the "uppercase" symbols as defined in the RFC with lowercase letters.

Fixing it wouldn't help with this, though, even if it really is doing the wrong thing. It'd just make e.g. bot.users.get("nick\name") fail instead of bot.users.get("nick|name").

@kwaaak
Copy link
Contributor

kwaaak commented Sep 29, 2018

Good shout! Using str.replace("|","\") on the nickname before getting it works!

@dgw
Copy link
Member Author

dgw commented Sep 29, 2018

@kwaaak You should instead from sopel.tools import Identifier and do value = Identifier(value) rather than str.replace. That will 1) handle all of the characters, not just |, and 2) be future-proof against fixing the underlying bug (see #1389).

@dgw dgw modified the milestones: 6.6.0, 7.0.0 Jan 3, 2019
@dgw
Copy link
Member Author

dgw commented Jan 3, 2019

In the interest of getting 6.6.0 out sooner—it's already taken a month longer than I hoped—I'm punting this fix. Jumped the gun on changing the milestone, though, since the planned change should also be mentioned in the changelog along with #1389.

@dgw dgw modified the milestones: 6.6.0, 7.0.0 Jan 5, 2019
@dgw
Copy link
Member Author

dgw commented Jan 5, 2019

Changelog drafted; milestone reminder no longer needed.

@dgw dgw modified the milestones: 7.0.0, 7.1.0 Nov 13, 2019
@dgw
Copy link
Member Author

dgw commented Nov 13, 2019

Re-prioritizing stuff for 7.0

@Exirel Exirel self-assigned this Sep 12, 2020
@dgw dgw closed this as completed in #1938 Sep 26, 2020
@Exirel
Copy link
Contributor

Exirel commented Sep 26, 2020

🥳

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants