Protobuf lacks framed stream of messages #54
Comments
To be clear, protobuf does support a framed stream of messages. Are you talking specifically about protobuf in Python? In C++ and Java, you can use CodedInputStream/CodedOutputStream to read/write varints or any other protobuf wire-format data.
Yes, Python. Can we get a stable, officially blessed interface to the underlying operations in Python?
On 15 October 2014 09:20, Brian Olson [email protected] wrote:
Doesn't CodedOutputStream already provide that?
It does. Have you seen the Python implementation?
CodedOutputStream is in the C++ library. In Python, I think what I want is buried in google.protobuf.internal.encoder._EncodeVarint and google.protobuf.internal.decoder._DecodeVarint.
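The internal helpers named above implement base-128 varints. For readers unfamiliar with the encoding, here is a minimal stdlib-only sketch of equivalent encode/decode logic (the function names are my own, not the protobuf API):

```python
import io

def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (least-significant group first)."""
    out = bytearray()
    while True:
        bits = value & 0x7F
        value >>= 7
        if value:
            out.append(bits | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(bits)
            return bytes(out)

def decode_varint(stream: io.BufferedIOBase) -> int:
    """Decode one varint from a binary stream."""
    result = 0
    shift = 0
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("truncated varint")
        byte = b[0]
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):  # high bit clear: this was the last byte
            return result
        shift += 7

# 300 encodes as two bytes, 0xAC 0x02, per the protobuf wire format
assert encode_varint(300) == b"\xac\x02"
assert decode_varint(io.BytesIO(b"\xac\x02")) == 300
```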
I recently implemented a Bazel persistent worker in Python. The lack of varint-delimited reading/writing APIs was an obstacle, and I worked around it by using the private APIs. This is a case of two Google products not working well together. Would it be possible to publish these APIs?
I would like to revive this thread by sharing our experience. We have a distributed system that communicates using protobuf messages. Since its aggregation points aggregate a LOT of protobufs, it's important for the aggregation to be fast. We've tried different ways of creating a list of protobufs and found that the following way is the fastest.
Here's our benchmark and results: my_proto.proto:
benchmark.py:
Results for a list of 1M protos:
As you can see, with pure-Python protos the stream code not only produces smaller output but is also almost twice as fast. We need to do even better, though, so we use the C++-backed protos. With CPP Python protos:
Here you can see that the stream code is more than 1.5 times slower :(. Can we please get streams as part of this package? It seems that doing it any other way will not give us good enough speed.
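For context on what "the stream code" pattern looks like: each serialized message is written as a varint length prefix followed by its bytes, and read back with a generator. This is my own dependency-free sketch, not the benchmark's actual code; opaque byte strings stand in for serialized protos:

```python
import io
from typing import BinaryIO, Iterator

def write_delimited(stream: BinaryIO, payload: bytes) -> None:
    """Write one length-prefixed record: varint(len(payload)) followed by payload."""
    n = len(payload)
    while True:
        bits = n & 0x7F
        n >>= 7
        stream.write(bytes([bits | 0x80 if n else bits]))
        if not n:
            break
    stream.write(payload)

def read_delimited(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each length-prefixed record until a clean EOF at a record boundary."""
    while True:
        size = 0
        shift = 0
        while True:
            b = stream.read(1)
            if not b:
                if shift:
                    raise EOFError("truncated length prefix")
                return  # clean EOF between records
            size |= (b[0] & 0x7F) << shift
            if not (b[0] & 0x80):
                break
            shift += 7
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("truncated payload")
        yield payload

buf = io.BytesIO()
for blob in (b"first", b"second", b"third"):
    write_delimited(buf, blob)
buf.seek(0)
assert list(read_delimited(buf)) == [b"first", b"second", b"third"]
```

A generator like this lets a consumer process millions of records without ever holding the whole stream in memory, which is the appeal of the streaming layout over one giant wrapper message.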
Now that Python is implemented on top of upb, this has become a upb issue. First up is to implement a proto text parser, which is something I am doing now. The initial implementation will be limited to contiguous buffers; after that, we will look into adding support for stream I/O. I can't say yet when (or even whether) this may float to the top of the work queue, but it is definitely on my radar, so I am reassigning this to myself.
We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment. This issue is labeled
Python is officially getting a public API for length-prefixed streams of messages: #16965. It is not released yet, but it will be included in the next minor version. If you have any performance issues with this API, please open a separate issue for it.
Lots of applications want a stream of protobuf messages in a file or a network stream.
It could be as simple as exposing the internal utility functions to write a varint to a stream. An application could then write a varint length prefix and then the blob of serialized protobuf.
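The proposal above can be sketched end to end. Assuming the serialized message is just an opaque byte string, nothing beyond the stdlib is needed (the function names here are hypothetical, chosen only for illustration):

```python
import io

def write_framed(stream, blob: bytes) -> None:
    """Frame one serialized message: varint length prefix, then the raw bytes."""
    n = len(blob)
    prefix = bytearray()
    while True:
        bits = n & 0x7F
        n >>= 7
        prefix.append(bits | 0x80 if n else bits)
        if not n:
            break
    stream.write(bytes(prefix))
    stream.write(blob)

def read_framed(stream) -> bytes:
    """Read one framed message back: decode the varint, then read that many bytes."""
    size = 0
    shift = 0
    while True:
        byte = stream.read(1)[0]
        size |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            break
        shift += 7
    return stream.read(size)

buf = io.BytesIO()
write_framed(buf, b"serialized-proto-bytes")
buf.seek(0)
assert read_framed(buf) == b"serialized-proto-bytes"
```

In real use, `blob` would be the result of `message.SerializeToString()`, and the bytes returned by `read_framed` would be passed to `message.ParseFromString()`; the framing itself never needs to inspect the payload.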