Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression: Python FromString(bytearray()) broken on upb #15911

Closed
jensbjorgensen opened this issue Feb 21, 2024 · 11 comments
Closed

Regression: Python FromString(bytearray()) broken on upb #15911

jensbjorgensen opened this issue Feb 21, 2024 · 11 comments
Assignees
Labels
inactive Denotes the issue/PR has not seen activity in the last 90 days.

Comments

@jensbjorgensen
Copy link
Contributor

What version of protobuf and what language are you using?
Python protobuf 4.25.3
Version: since python backend changed to upb
Language: Python

What operating system (Linux, Windows, ...) and version?
Linux (ubuntu 22.04.3)

What runtime / compiler are you using (e.g., python version or gcc version)
Python 3.10.12

What did you do?
python -c 'from google.protobuf import timestamp_pb2; timestamp_pb2.Timestamp.FromString(bytearray())'

What did you expect to see
The program should run without error.

What did you see instead?

Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: expected bytes, bytearray found

Workarounds

  1. Install a 3.* version of the python package
  2. export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
  3. Use a memoryview:
with memoryview(byte_array) as mv:
    ts = timestamp_pb2.Timestamp.FromString(mv)

I've copied this in from issue #10774 which was closed and archived. This is still not fixed and is a nasty surprise if your python protobuf package requirement is not pinned to 3.x. Bytearrays are more efficient than bytes to work with so I would expect this usage is very common.

@jensbjorgensen jensbjorgensen added the untriaged auto added to all issues by default when created. label Feb 21, 2024
@honglooker honglooker removed the untriaged auto added to all issues by default when created. label Feb 22, 2024
@honglooker
Copy link
Contributor

@ericsalo do you mind taking a look?

@ericsalo
Copy link
Member

ericsalo commented Apr 5, 2024

Bouncing this over to @anandolee who knows way more about this stuff than I do.

@ericsalo ericsalo removed their assignment Apr 5, 2024
@chrismiller
Copy link

Ouch, I've just been hit by this trying to upgrade our large Python large codebase from an old protobuf version to 5.26.1:

$ python -c 'import google.protobuf; from google.protobuf import timestamp_pb2; print(google.protobuf.__version__); timestamp_pb2.Timestamp.FromString(bytearray())'
5.26.1
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: expected bytes, bytearray found

Note that this behaviour is at odds with the documentation, which says:

Parameters: serialized (bytes) – Any object that allows us to call memoryview(serialized) to access a string of bytes using the buffer interface.

Some (admittedly very quick and dirty) benchmarks show that having to convert via bytes(bytearray) or memoryview(bytearray) has a significant performance hit for us:

msg.ParseFromString(bytes_data) time: 1.6248738765716553
msg.ParseFromString(bytes(array_data)) time: 2.5375828742980957
msg.ParseFromString(memoryview(array_data)) time: 2.834777355194092

Even so, 5.26.1 is still quite a bit faster than the old version (which works fine with bytearray), so the concern I have is more about the 50+ places in our codebase that will need wrapping with bytes() conversions than the performance overhead.

Out of interest, here's some comparison benchmarks for our old protobuf version, which show no overhead for using bytearray vs bytes:

msg.ParseFromString(bytes_data) time: 4.132054328918457
msg.ParseFromString(array_data) time: 4.144273519515991

@jensbjorgensen
Copy link
Contributor Author

@anandolee I submitted a quite small PR that would fix this, do you have time to take a look? 🙏

copybara-service bot pushed a commit that referenced this issue Jun 12, 2024
fix is pretty simple, just check if the type is bytearray and get the bytes if it is

addresses issue: #15911

Closes #16691

COPYBARA_INTEGRATE_REVIEW=#16691 from jensbjorgensen:main 6249e62
PiperOrigin-RevId: 642623917
TinyTinni pushed a commit to TinyTinni/protobuf that referenced this issue Jun 15, 2024
fix is pretty simple, just check if the type is bytearray and get the bytes if it is

addresses issue: protocolbuffers#15911

Closes protocolbuffers#16691

COPYBARA_INTEGRATE_REVIEW=protocolbuffers#16691 from jensbjorgensen:main 6249e62
PiperOrigin-RevId: 642623917
deannagarcia pushed a commit to deannagarcia/protobuf that referenced this issue Jun 20, 2024
fix is pretty simple, just check if the type is bytearray and get the bytes if it is

addresses issue: protocolbuffers#15911

Closes protocolbuffers#16691

COPYBARA_INTEGRATE_REVIEW=protocolbuffers#16691 from jensbjorgensen:main 6249e62
PiperOrigin-RevId: 642623917
@chrismiller
Copy link

This fix doesn't seem to be in the new 27.2 release from yesterday, is that to be expected?

@jensbjorgensen
Copy link
Contributor Author

jensbjorgensen commented Jun 26, 2024

I haven't tested it myself, but looking at the tag it should be in there. Did you build it yourself?

@chrismiller
Copy link

Ah OK - no I haven't tried it, I just didn't see it mentioned in the release notes and I couldn't see it on the 27.x branch, but I probably just misunderstand the release process. I'll grab 27.2 and give it a go. Thanks! :)

@jensbjorgensen
Copy link
Contributor Author

jensbjorgensen commented Jun 27, 2024

I take it back @chrismiller I looked at the log and also checked out v27.2 in my working copy and I concur that the change is not present in this tag. git merge-base main v27.2 shows:

commit 57a6e8d (tag: v27-dev)
Author: Mike Kruskal [email protected]
Date: Wed Apr 17 16:31:41 2024 -0700

So I guess the commit that fixed the issue didn't and won't make it into 27.x at all :-(

@chrismiller
Copy link

Ah that's unfortunate, thanks for confirming though. Not sure quite how other changes do make it into the 27.x branch, but I guess 28.x won't be all that far away at least.

Copy link

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago. This issue will be closed and archived after 14 additional days without activity.

@github-actions github-actions bot added the inactive Denotes the issue/PR has not seen activity in the last 90 days. label Sep 26, 2024
@jensbjorgensen
Copy link
Contributor Author

the PR to fix this was merged into version 28

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
inactive Denotes the issue/PR has not seen activity in the last 90 days.
Projects
None yet
Development

No branches or pull requests

5 participants