Introduce ReadableBuffer and WriteableBuffer Union aliases #4232

Merged 1 commit on Jun 19, 2020

@bmerry bmerry commented Jun 15, 2020

Since typing doesn't yet have a way to express buffer protocol objects
(python/typing#593), various interfaces have ended up with a mish-mash
of options: some list just bytes (or just bytearray, when writable),
some include mmap, some include memoryview, I think none of them include
array.array even though it's explicitly mentioned as bytes-like, etc. I
ran into problems because RawIOBase.readinto didn't allow for
memoryview.

To allow for some uniformity until the fundamental issue is resolved,
I've introduced _typeshed.ReadableBuffer and _typeshed.WriteableBuffer,
and applied them in stdlib/3/io.pyi as an example. If these get rolled
out in more places, it will mean that we have only one place where they
have to get tweaked in future, or swapped out for a public protocol.
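
As a rough illustration of the idea, the aliases could look like the sketch below. The member types listed here are an assumption based on the types named in this description, not a copy of the stub; the real aliases live in the stub-only `_typeshed` module and cannot be imported at runtime. `fill_from` and `payload_size` are hypothetical helpers that only show how a signature would use the aliases.

```python
import array
import io
import mmap
from typing import Union

# Assumed member lists, for illustration only; the real definitions are in
# the stub-only _typeshed module and may differ.
ReadableBuffer = Union[bytes, bytearray, memoryview, array.array, mmap.mmap]
WriteableBuffer = Union[bytearray, memoryview, array.array, mmap.mmap]


def fill_from(stream: io.BufferedIOBase, target: WriteableBuffer) -> int:
    """Hypothetical helper: read into any writable buffer, return bytes read."""
    n = stream.readinto(target)
    return 0 if n is None else n


def payload_size(data: ReadableBuffer) -> int:
    """Hypothetical helper: size in bytes of any readable buffer."""
    return memoryview(data).nbytes


buf = bytearray(5)
count = fill_from(io.BytesIO(b"hello world"), memoryview(buf))
```

With a single pair of aliases, widening or narrowing the accepted types later means editing one definition instead of every signature.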

This unfortunately does have the potential to break code that inherits
from RawIOBase/BufferedIOBase and overrides these methods, because the
base method is now more general and so the override now needs to accept
these types as well (which is why I've also updated gzip and lzma).
However, it should be a reasonably easy fix, and will make the
downstream annotations more correct.
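
A minimal sketch of what the downstream fix might look like, assuming a hypothetical `PatternReader` subclass and a simplified local stand-in for the alias: the override widens its parameter annotation to match the base method and writes through a `memoryview`, so it behaves the same for `bytearray` and `memoryview` arguments.

```python
import io
from typing import Union

# Simplified local stand-in for _typeshed.WriteableBuffer (assumption).
WriteableBuffer = Union[bytearray, memoryview]


class PatternReader(io.RawIOBase):
    """Hypothetical raw stream that serves a fixed byte string once."""

    def __init__(self, data: bytes) -> None:
        super().__init__()
        self._data = data
        self._pos = 0

    def readable(self) -> bool:
        return True

    # Annotating `b` as plain bytearray would now be narrower than the base
    # method, which a type checker reports as an incompatible override.
    def readinto(self, b: WriteableBuffer) -> int:
        view = memoryview(b).cast("B")  # uniform byte-level view of the target
        chunk = self._data[self._pos:self._pos + len(view)]
        view[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


reader = PatternReader(b"abcdef")
target = bytearray(4)
first = reader.readinto(memoryview(target))
remainder = reader.read()
```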

I'm not 100% happy with the names: bytes-like is slightly stricter than
just buffer protocol (it must be able to export a C-contiguous buffer),
but in practice I'd be surprised if there are types for which there is a
difference at static analysis time (e.g. not every memoryview instance
is bytes-like, but that's a property of instances, not types).
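
The instance-versus-type point can be seen with a strided slice, sketched below. `accepted_by_write` is a hypothetical helper; it catches `TypeError` as well as `BufferError` as a precaution, since the exact exception is an implementation detail of CPython.

```python
import io

contiguous = memoryview(b"abcdef")
strided = contiguous[::2]  # every second byte: same type, not C-contiguous


def accepted_by_write(buf) -> bool:
    """Hypothetical helper: does BytesIO.write accept this object?"""
    try:
        io.BytesIO().write(buf)
    except (BufferError, TypeError):
        return False
    return True


results = {
    "contiguous": (contiguous.c_contiguous, accepted_by_write(contiguous)),
    "strided": (strided.c_contiguous, accepted_by_write(strided)),
}
```

Both objects have the static type `memoryview`, so a type checker cannot tell them apart; only the runtime check can.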

bmerry commented Jun 15, 2020

Some other places where these types could (I think) be used:

  • mmap.{find,rfind,write}
  • hashlib (currently has a local _DataType)
  • hmac (currently has a local _B, which is different to what hashlib supports)
  • typing.BinaryIO
  • memoryview.__init__ (has a union)

@JelleZijlstra
Member

Good idea! I haven't verified that the specific changes are correct but this is a good way to make progress on the buffer interface.

@srittau srittau left a comment

Thanks! This looks reasonable. If it turns out that it breaks too much, we might need to temporarily change the types, but collecting the types accepting buffers in _typeshed.pyi is reasonable in any case.

@srittau srittau merged commit e05fbab into python:master Jun 19, 2020
@bmerry bmerry deleted the byteslike-aliases branch June 19, 2020 11:51
bmerry added a commit to bmerry/typeshed that referenced this pull request Jun 19, 2020
Based on the Python stdlib documentation:
- Since Python 3.5, mmap.{find,rfind,write} all accept any bytes-like.
  I've used the _typeshed.ReadableBuffer alias defined in python#4232.
- Since Python 3.6, mmap.write returns the number of bytes written.
- Since Python 3.3, mmap.read allows None as the parameter; while in
  Python 2 the argument cannot be omitted.
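
A short sketch of the runtime behaviour these stub changes describe, assuming Python 3.6 or later and an anonymous 16-byte mapping chosen only for the example:

```python
import mmap

m = mmap.mmap(-1, 16)  # anonymous, zero-filled mapping

# write() accepts any bytes-like object and returns the number of bytes written.
written = m.write(bytearray(b"hello"))

# find() also accepts bytes-like objects. Its default start is the current
# file position, so pass 0 explicitly to search from the beginning.
found = m.find(memoryview(b"llo"), 0)

# read(None) returns everything from the current position to the end.
m.seek(0)
rest = m.read(None)
m.close()
```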
bmerry added a commit to bmerry/typeshed that referenced this pull request Jun 19, 2020
This is a follow-up on python#4232. memoryview, hashlib, and hmac are updated
to use ReadableBuffer type instead of their own home-spun unions of
bytes, bytearray and whatever else each use case used. mmap is being
handled in python#4244, and I'll leave BinaryIO for another day (or possibly
another person) because it's going to require some messy code
duplication because the relevant methods are defined in IO[AnyStr].
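
To illustrate why a single ReadableBuffer alias fits hashlib, the sketch below hashes the same payload through several buffer types; the payload value is arbitrary.

```python
import array
import hashlib

payload = b"typeshed"

# Four different buffer-protocol objects holding the same bytes.
variants = [
    payload,
    bytearray(payload),
    memoryview(payload),
    array.array("B", payload),
]

# Every variant is accepted by the constructor and yields the same digest.
digests = {hashlib.sha256(v).hexdigest() for v in variants}

# update() takes the same range of types.
h = hashlib.sha256()
h.update(memoryview(payload))
```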

There's one corner case I'm not quite sure how best to handle: the
documentation for hmac.digest claims that the parameters have the same
meanings as in hmac.new, but in CPython the latter has an explicit check
that `key` is bytes or bytearray while the former works with a
memoryview. For now I've matched the documentation.

Also, the documentation for HMAC.update says that `msg` can be any type
supported by hashlib from Python 3.4; but I can't see anything in the
Python 2.7 implementation that would prevent it also taking bytes-like
objects, so I've not tried to treat Python 2 any differently from Python 3.
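
The `key` check in hmac.new can be observed directly, as sketched below with arbitrary key and message values and a hypothetical helper name. Whether hmac.digest accepts a memoryview key may depend on the build, so the sketch does not assert that; it only shows that converting the key to bytes works for both entry points.

```python
import hmac

key_view = memoryview(b"secret-key")
msg = b"payload"


def new_rejects_memoryview_key() -> bool:
    """Hypothetical helper: does hmac.new refuse a memoryview key?"""
    try:
        hmac.new(key_view, msg, "sha256")
    except TypeError:
        return True
    return False


rejected = new_rejects_memoryview_key()

# Converting the key to bytes satisfies the isinstance check in hmac.new
# and gives the same result as the one-shot hmac.digest.
mac_from_new = hmac.new(bytes(key_view), msg, "sha256").digest()
mac_one_shot = hmac.digest(bytes(key_view), msg, "sha256")
```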
JelleZijlstra pushed a commit that referenced this pull request Jun 21, 2020
* Update mmap stubs for newer Python versions

Based on the Python stdlib documentation:
- Since Python 3.5, mmap.{find,rfind,write} all accept any bytes-like.
  I've used the _typeshed.ReadableBuffer alias defined in #4232.
- Since Python 3.6, mmap.write returns the number of bytes written.
- Since Python 3.3, mmap.read allows None as the parameter; while in
  Python 2 the argument cannot be omitted.

* Further clean up mmap.pyi

Use the fact that Python 3.0-3.4 are no longer supported to clean up the
version-dependent logic. Functions that always have different signatures
in Python 2/3 are moved from the base _mmap[bytes] to the mmap subclass.
JelleZijlstra pushed a commit that referenced this pull request Jun 22, 2020
This is a follow-up on #4232. memoryview, hashlib, and hmac are updated
to use ReadableBuffer type instead of their own home-spun unions of
bytes, bytearray and whatever else each use case used. mmap is being
handled in #4244, and I'll leave BinaryIO for another day (or possibly
another person) because it's going to require some messy code
duplication because the relevant methods are defined in IO[AnyStr].

There's one corner case I'm not quite sure how best to handle: the
documentation for hmac.digest claims that the parameters have the same
meanings as in hmac.new, but in CPython the latter has an explicit check
that `key` is bytes or bytearray while the former works with a
memoryview. For now I've matched the documentation.

Also, the documentation for HMAC.update says that `msg` can be any type
supported by hashlib from Python 3.4; but I can't see anything in the
Python 2.7 implementation that would prevent it also taking bytes-like
objects, so I've not tried to treat Python 2 any differently from Python 3.
vishalkuo pushed a commit to vishalkuo/typeshed that referenced this pull request Jun 26, 2020