Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow mypy to flag str matching against Sequence[str] as an error #11001

Open
sirosen opened this issue Aug 20, 2021 · 6 comments
Open

Allow mypy to flag str matching against Sequence[str] as an error #11001

sirosen opened this issue Aug 20, 2021 · 6 comments
Labels

Comments

@sirosen
Copy link

sirosen commented Aug 20, 2021

Feature

A new optional setting for mypy which forbids matching str against Sequence[str] or Iterable[str]. This will catch a wide variety of bugs which users encounter when defining APIs which consume these types and are unintentionally passed strings. It also allows correct overloading of str vs Sequence[str] to describe behavior.

The option would require the explicit (but "redundant") Union[str, Sequence[str]] or Union[str, Iterable[str]] for values which can be a string or non-string iterable of strings.

Pitch

python/typing#256 has been an open issue for a while, with no clear resolution within the type system itself.

However the pytype maintainers recently weighed in to mention that they took this approach, forbidding str from matching Sequence[str], and their users have been happy with the results. I think mypy should match this behavior if possible/feasible.

In answer to Guido's question about whether or not this should apply to AnyStr, the answer is "ideally yes". However, if that's significantly harder, only applying to str is probably fine: str is likely the most common case for python developers.

Use Cases

# simple function which the author doesn't realize can consume a single string
# count_starts_lowercase("some string") should fail
def count_starts_lowercase(x: Sequence[str]) -> int:
    return len([c for c in x if c[0] in string.ascii_lowercase])

# function with overloaded definitions based on str/Sequence[str]
# the following definition should work
@typing.overload
def poly_getlen(value: str) -> int: ...
@typing.overload
def poly_getlen(value: Sequence[str]) -> List[int]: ...
def poly_getlen(value: Union[str, Sequence[str]]) -> Union[int, List[int]]:
    if isinstance(value, str):
        return len(value)
    return [len(x) for x in value]

# a sequence of unions containing str should be carefully considered
# ideally `count_terminal_digit_is_0("00011101")` would fail
# if this adds too much complexity, it can be allowed
def count_terminal_digit_is_0(x: Sequence[Union[str, int]]) -> int:
    terminal_digits = [(n % 10 if isinstance(n, int) else n[-1]) for n in x]
    return len([d for d in terminal_digits if d in (0, "0")])
@hauntsaninja
Copy link
Collaborator

Linking #5090

@jakkdl
Copy link

jakkdl commented Feb 19, 2023

Any plans on implementing this? It bites me semi-reliably and the pytype approach seems to be working out.

@sirosen
Copy link
Author

sirosen commented Feb 20, 2023

As the OP on this, I'll note that I've changed my perspective a bit in the time since opening this.

Without this being a part of the type system with some support, library maintainers run into trouble. It becomes unclear what types should be published if, for example, an API is meant to take any Sequence[str] except for str. That problem is exacerbated in the stdlib. Is passing a str to random.choice forbidden?

There's currently an ongoing thread on discuss.python.org which explores this a bit.

Some folks in that thread want to put this idea into typeshed, which seems problematic. That would mean the types are actively lying about the interfaces of str.

I think this is a valuable linting check. But if it's not in the type system's strict rules, it's not really a type-check anymore. It's hard to say that code which passes a str to random.shuffle is "wrong" -- it just might surprise you. Does that mean mypy can't or shouldn't do it? Unclear. No tool other than a type checker is well positioned to implement this lint.

This whole situation reminds me of some of flake8-bugbear's "opinionated rules", which are explicitly defined, but only opt-in. Maybe mypy is the right place, but only ever opt-in?

@bluenote10
Copy link
Contributor

bluenote10 commented May 31, 2023

That would mean the types are actively lying about the interfaces of str.

In my opinion the problem is that str is a Sequence[str] (or str is an Iterable[str]) is also partially a lie.

Technically these statements are true, but semantically most often not.

The perhaps ~99% use case of a str is to store multiple characters. In practice, everything involving all kinds of names, identifiers, texts, etc. involves arbitrary length strings. Python unfortunately cannot distinguish between characters and arbitrary length strings, and the issue we are facing here is basically a result of that: str satisfies Sequence[str] in the sense that it can provide a sequence of characters but not a sequence of arbitrary length strings. In that sense the str lies that it can serve as sequence of str, because it will never be able to provide a sequence of arbitrary length strings. So from a practical perspective, it is pretty obvious that passing a str into e.g. a variable names: Sequence[str] is a bug, because names semantically wants "multiple arbitrary length strings" and not "multiple characters".

Regarding the rare use cases that actually want a sequence of characters: Since this can be easily solved explicitly via characters: Union[str, Sequence[str]], I'd very much favor the pytype behavior.

JonathonReinhart added a commit to JonathonReinhart/scuba that referenced this issue Sep 9, 2023
When make_vol_opt() was called with options=[], it would return a volume
spec which erroneously contained a trailing colon.

Also, this removes the unnecessary flexibility of passing a single
option string (as opposed to a sequence of option strings). This wasn't
intended by the type signature for options, but unforunately a 'str'
is a 'Sequence[str]'. See python/mypy#11001.

This fixes #215.
@hauntsaninja
Copy link
Collaborator

I've added a SequenceNotStr type to https://github.com/hauntsaninja/useful_types

@jakkdl
Copy link

jakkdl commented Jun 26, 2024

What would it take to implement this? I have yet to see any strong arguments against adding an optional flag, and pytype continues to be happy, so I feel like somebody could "just" sit down and try to code this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants