Introduce ReadableBuffer and WriteableBuffer Union aliases #4232
Since typing doesn't yet have a way to express buffer protocol objects (python/typing#593), various interfaces have ended up with a mish-mash of options: some list just bytes (or just bytearray, when writable), some include mmap, some include memoryview, I think none of them include array.array even though it's explicitly mentioned as bytes-like, etc. I ran into problems because RawIOBase.readinto didn't allow for memoryview.

To allow for some uniformity until the fundamental issue is resolved, I've introduced _typeshed.ReadableBuffer and _typeshed.WriteableBuffer, and applied them in stdlib/3/io.pyi as an example. If these get rolled out in more places, it will mean that we have only one place where they have to get tweaked in future, or swapped out for a public protocol.

This unfortunately does have the potential to break code that inherits from RawIOBase/BufferedIOBase and overrides these methods, because the base method is now more general and so the override now needs to accept these types as well (which is why I've also updated gzip and lzma). However, it should be a reasonably easy fix, and will make the downstream annotations more correct.

I'm not 100% happy with the names: bytes-like is slightly stricter than just buffer protocol (it must be able to export a C-contiguous buffer), but in practice I'd be surprised if there are types for which there is a difference at static analysis time (e.g. not every memoryview instance is bytes-like, but that's a property of instances, not types).
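As a rough illustration of the idea (not the actual contents of `_typeshed`), the sketch below defines the two aliases and uses them in an ordinary function and in a `RawIOBase` override. The union members are an assumption based on the types named in the description, and `byte_length` and `ZeroSource` are hypothetical names used only here.

```python
import array
import io
import mmap
from typing import Union

# Assumed membership; the real aliases live in _typeshed and may differ.
ReadableBuffer = Union[bytes, bytearray, memoryview, array.array, mmap.mmap]
WriteableBuffer = Union[bytearray, memoryview, array.array, mmap.mmap]


def byte_length(data: ReadableBuffer) -> int:
    """Hypothetical helper: size in bytes of any readable buffer."""
    return memoryview(data).nbytes


class ZeroSource(io.RawIOBase):
    """Hypothetical raw stream that fills the caller's buffer with zeros."""

    def readable(self) -> bool:
        return True

    def readinto(self, buffer: WriteableBuffer) -> int:
        # Annotating with the alias (rather than just bytearray) keeps the
        # override compatible with the widened base-class signature.
        view = memoryview(buffer).cast("B")  # flat byte view of the buffer
        view[:] = bytes(len(view))           # overwrite every byte with zero
        return len(view)
```

An override annotated with only `bytearray` would be narrower than the base method, which is the kind of downstream breakage the description warns about.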
Some other places where these types could (I think) be used:
Good idea! I haven't verified that the specific changes are correct but this is a good way to make progress on the buffer interface.
Thanks! This looks reasonable. If it turns out that it breaks too much, we might need to temporarily change the types, but collecting the types that accept buffers in _typeshed.pyi is reasonable in any case.
Based on the Python stdlib documentation:
- Since Python 3.5, mmap.{find,rfind,write} all accept any bytes-like. I've used the _typeshed.ReadableBuffer alias defined in python#4232.
- Since Python 3.6, mmap.write returns the number of bytes written.
- Since Python 3.3, mmap.read allows None as the parameter; while in Python 2 the argument cannot be omitted.
This is a follow-up on python#4232. memoryview, hashlib, and hmac are updated to use the ReadableBuffer type instead of their own home-spun unions of bytes, bytearray and whatever else each use case used. mmap is being handled in python#4244, and I'll leave BinaryIO for another day (or possibly another person) because it's going to require some messy code duplication, since the relevant methods are defined in IO[AnyStr]. There's one corner case I'm not quite sure how best to handle: the documentation for hmac.digest claims that the parameters have the same meanings as in hmac.new, but in CPython the latter has an explicit check that `key` is bytes or bytearray while the former works with a memoryview. For now I've matched the documentation. Also, the documentation for HMAC.update says that `msg` can be any type supported by hashlib from Python 3.4; but I can't see anything in the Python 2.7 implementation that would prevent it also taking bytes-like objects, so I've not tried to treat Python 2 any differently from Python 3.
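A short runtime sketch of what the wider annotations describe, assuming Python 3.7 or later for `hmac.digest`. The key is deliberately kept as plain bytes, since the key type is the corner case discussed above; the payload and key values are made up for illustration.

```python
import array
import hashlib
import hmac

payload = b"abc"

# hashlib accepts any object that exports a buffer, not just bytes.
views = [payload, bytearray(payload), memoryview(payload), array.array("B", payload)]
digests = {hashlib.sha256(v).hexdigest() for v in views}

# The key stays as bytes here; only the message uses other buffer types.
key = b"secret"
mac = hmac.new(key, memoryview(payload), "sha256")
mac.update(bytearray(b"def"))

# One-shot equivalent over the concatenated message (Python 3.7+).
one_shot = hmac.digest(key, b"abcdef", "sha256")
```

All four inputs hash to the same value, and the incremental and one-shot HMACs agree.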
* Update mmap stubs for newer Python versions

Based on the Python stdlib documentation:
- Since Python 3.5, mmap.{find,rfind,write} all accept any bytes-like. I've used the _typeshed.ReadableBuffer alias defined in #4232.
- Since Python 3.6, mmap.write returns the number of bytes written.
- Since Python 3.3, mmap.read allows None as the parameter; while in Python 2 the argument cannot be omitted.

* Further clean up mmap.pyi

Use the fact that Python 3.0-3.4 are no longer supported to clean up the version-dependent logic. Functions that always have different signatures in Python 2/3 are moved from the base _mmap[bytes] to the mmap subclass.