Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with str being treated as Sequence[str] #5090

Closed
pawelswiecki opened this issue May 21, 2018 · 16 comments
Closed

Problem with str being treated as Sequence[str] #5090

pawelswiecki opened this issue May 21, 2018 · 16 comments

Comments

@pawelswiecki
Copy link

pawelswiecki commented May 21, 2018

Are you reporting a bug, or opening a feature request?

I think it's a bug or very unexpected behavior at least. Related issues I found: #1965, #4334.

Code

d = {'k1': 'qwe', 'k2': []}
reveal_type(d)
value: str = d['k1']

Actual Output

$ mypy file.py
file.py:2: error: Revealed type is 'builtins.dict[builtins.str*, typing.Sequence*[builtins.str]]'
file.py:3: error: Incompatible types in assignment (expression has type "Sequence[str]", variable has type "str")

So, as far as I understand, mypy tried to find the strictest type for values of d, which is Sequence[str] in this case. In Python, string is a Sequence of (sub)strings (s[0] == s[0][0] == s[0][0][0]), also list is a Sequence, so this is understandable. Unfortunately, the rule that string is a sequence of (sub)strings does not work in case of value: str = d['k1'].

Expected output
To get rid of the error I need to manually give more general type: d: Dict[str, Any], so this is not such a big problem. Having said that, expected behavior would be: No "Incompatible types in assignment" error for the given code.

Versions
Python 3.6.5
mypy==0.600

@pawelswiecki pawelswiecki changed the title Problem with treating str as Sequence[str] Problem with str being treated as Sequence[str] May 21, 2018
@gvanrossum
Copy link
Member

gvanrossum commented May 21, 2018 via email

@pawelswiecki
Copy link
Author

pawelswiecki commented May 21, 2018

@gvanrossum Thank you for the reply.

One more thing (which is probably obvious for someone more knowlegable). Why mypy treats those two cases differently:

d1 = {'k1': 'qwe', 'k2': []}
reveal_type(d1)

d2 = {'k1': [1], 'k2': []}                                                     
reveal_type(d2) 

mypy:

$ mypy file2.py 
file2.py:2: error: Revealed type is 'builtins.dict[builtins.str*, typing.Sequence*[builtins.str]]'
file2.py:5: error: Revealed type is 'builtins.dict[builtins.str*, builtins.object*]'

I'd say type of dict's value in the second shoud be, analogically, Sequence[int], no?

EDIT: I think I can guess: in the first case empty list is treated kinda as an empty string, which is quite wrong.

@ilevkivskyi
Copy link
Member

The second example is a known issue #5045

@Zac-HD
Copy link
Contributor

Zac-HD commented May 22, 2018

We can suggest workarounds etc. for the example code, but I wonder -- does any code that declares it wants a Sequence[str] really expect a single str object?

Yep!

For example, in Hypothesis we have sampled_from, which supports ordered collections including Sequence, OrderedDict, and Enum. Excluding str from that would just force a pointless conversion from sampled_from(astring) to sampled_from(list(astring)) on all our users, or we would have to add string types to the type hint.

So at the very least, I have a strong concrete (as well as aesthetic) preference to keep treating str as Sequence[str] whenever needed.

@gvanrossum
Copy link
Member

gvanrossum commented May 22, 2018 via email

@bwo
Copy link
Contributor

bwo commented May 24, 2019

We can suggest workarounds etc. for the example code, but I wonder -- does any code that declares it wants a Sequence[str] really expect a single str object?

I also have a use case for this, where I have parsers some of which expect Sequence[T] (or Iterable[T]) and others of which expect T, and I'd like to be able to use them together when T == str in combinators that (e.g.) sequence one parser after another.

@stereobutter
Copy link

I think both the runtime behavior of treating a string as instance of collections.abc.Iterable and static checks like hello world matching Iterable[str] are design defects introducing weird corner cases in the code of most users. I suspect the following is a canonical example of unexpected behavior:

def process_words(seq: Iterable[str]) -> Iterable[str]:
    return [s.upper() for s in seq]

where the intended use case is

process_words(['hello', 'world'])  # ['HELLO', 'WORLD']

and an unintended side effects of the implementation is

process_words('hello' world')  # ['H', 'E', 'L', 'L', 'O', ' ', 'W', 'O', 'R', 'L', 'D']

Given that the runtime behavior of str is unlikely to change (I assume?), at least making mypy respect the difference between Iterable[str] and str will warn the users of this pitfall.

@bwo
Copy link
Contributor

bwo commented Sep 24, 2019

But str is an iterable of str. Making mypy lie about what is actually the case because it might be an error seems extremely backwards to me.

@emmatyping
Copy link
Collaborator

But str is an iterable of str

Yes, and this is actually really nice, having to deal with character types adds a lot of unneeded complexity. The correct way to deal with this is that whenever you want an Iterable[str], have an assert not isinstance(seq, str) or similar, to make sure it isn't passed a str. Sometimes handing a function a str is acceptable though...

@stereobutter
Copy link

stereobutter commented Sep 26, 2019

A clean but albeit somewhat ugly solution would be to introduce new types in typing (and hopefully also new abstract base classes in collections.abc) for iterables and sequences that treat str as atomic and not a member of said types.

Example:

x: typing.NewIterable[str] = ['hello', 'world'] # typechecks
y: typing.NewIterable[str] = 'hello world' # doesn't typecheck

isinstance(['hello', 'world'], collections.abc.NewIterable) # True
isinstance('hello world', collections.abc.NewIterable) # False

I think this is do-able but alas

There are only two hard things in Computer Science: cache invalidation and naming things.
-- Phil Karlton

What should be the names of the new Sequence and Iterable? My previous comment was made with the thought in mind that in properly typed code the use of NewSequence and NewIterable would probably be higher than of the regular Sequence and Iterable.

In the spirit of the python Zen (explicit is better than implicit, there should be one obvious way to do it, yada yada) annotations like Union[str, NewIterable[str]] for cases where both a single string and an iterable sequence of strings are valid, beat the current situation imho.

@wei-hai
Copy link

wei-hai commented Jul 27, 2020

experiencing the same issue, any update?

@JukkaL
Copy link
Collaborator

JukkaL commented Jul 31, 2020

One way to deal with this would be to introduce a new optional error that is generated when str is used while Iterable[str] or Sequence[str] is expected. It might be hard to catch all cases of this, but at least the most common ones such as passing an str argument, assigning a str value and iterating over a str should not be difficult to handle.

@bwo
Copy link
Contributor

bwo commented Jul 31, 2020

Again, str is an Iterable[str].

My understanding of the goal of mypy as a project is that it's a type system for Python as it exists, not as it might be if it were slightly nicer.

@asford
Copy link

asford commented Aug 6, 2021

One way to deal with this would be to introduce a new optional error that is generated when str is used while Iterable[str] or Sequence[str] is expected. It might be hard to catch all cases of this, but at least the most common ones such as passing an str argument, assigning a str value and iterating over a str should not be difficult to handle.

@JukkaL how challenging do you think it would be implement a putative --warn-iterable-anystr or --no-iterable-anystr in mypy? I'd imagine that the semantics would be as you described...

  • Raise a warning or error when an AnyStr (str, bytes) is used where an abstract collection of the type (eg. Iterable[str], Iterable[AnyStr]) is expected.
  • Unless the AnyStr type is provided via a Union (eg. Union[str, Iterable[str]]).
  • This applies to the collections.abc abstract types (and matching typing aliases) : Iterable, Collection, Sequence.

For example:

def just_sequence(vals: Iterable[str]) -> Any:
    ...

def sequence_or_str(val_or_vals: Union[str, Iterable[str]]) -> Any:
   ...

# mypy error or warning if `--warn-iterable-anystr` or `--no-iterable-anystr` is provided
just_sequence("bad") 

just_sequence(["this", "is", "fine"])
sequence_or_str("this_is_fine")
sequence_or_str(["this", "is", "fine"])

This feels like a longstanding issue, eg python/typing#256, and there's strong evidence that this is both a common error and pain point in the language. So much so, that it's one of the very few deviations between python and starlark's semantics.

However it seems clear from the discussion here and elsewhere that a typing-level or language level change would be controversial. I think support for this error detection in mypy would be a valuable immediate addition and may provide a useful starting point to determine if this issue warrants enough effort to consider typing level enhancements to address it in the future.

I don't believe this can be implemented through mypy's existing plugin interface, but would be very happy to understand if you think the plugin interface can support this kind of feature.

@ghost
Copy link

ghost commented Jun 1, 2023

I for one would make heavy use of an ability to specify that I expect some strings to not be treated as Sequence[str]. Note that str and Sequence[Sequence[Sequence[...[str]]]] are synonymous, which results in this unreasonable behavior:

NestedIntList = Union[int, Sequence["NestedIntList"]]

def complex_interpreter(src: str) -> NestedIntList:
    nesting = src.count("[")
    result: NestedIntList = src.replace("[", "")
    for _ in range(nesting):
        result = [result]
    return result
$ pyright demo6.py
0 errors, 0 warnings, 0 informations

$ mypy demo6.py
Success: no issues found in 1 source file

$ pytype demo6.py
Success: no errors found

I'd be heavily in favor of a deprecation cycle that allows str to no longer typecheck as Sequence, except for an opt-in annotation of RecursiveStr but since that seems infeasible due to existing code's inertia, I'd settle for an opt-out type annotation of AtomicStr which satisfies a str requirement but not a Sequence[Sequence[str]]. (Naming could use work, obviously.)

@hauntsaninja
Copy link
Collaborator

Closing as a duplicate of the more popular #11001. I've also added a SequenceNotStr type to https://github.com/hauntsaninja/useful_types

@hauntsaninja hauntsaninja closed this as not planned Won't fix, can't repro, duplicate, stale Sep 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests