-
Notifications
You must be signed in to change notification settings - Fork 15.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C# ByteString, CodedInputStream performance improvements desired for messages with large byte array field #4206
Comments
Somewhat related gRPC issue: grpc/grpc#14378 |
I would want to have support for unsafe ByteString that would support setting it's length as well. When sending chunked data, last chunk is usually smaller. @d79ima are you are aware that if you allocate 85000 bytes or more in .NET Framework allocation goes directly to LOH (Large Object Heap) and skips GC generations? It will add lot of GC pressure and LOH can only be compacted manually. So if you want to send binary data it should done in less than 85000 byte chunks. |
I'm seconding this wish for a garbage free api. |
+1, we are observing the same problem. How do we move forward on this? |
A simple enhancement that would address most of this issue is if the C# gpc library had a way to pool no longer needed ByteString (like Go's sync.Pool) and so then could be internally reused. This would prevent the crazy firehose of ByteStrings constantly being created that the GC has to chase. If I were implementing it, I'd make 4-8 size ByteStrings pools/buckets internally so the internal faux-allocator chooses from a pool such that waste isn't excessive. And there could be a manual polling call to drain them or a periodic task that slowly drains 1-5% of each bucket every 10-60 secs. That way no timestamps/tracking is needed, it covers the lion's share of cases, and there are no freak/worst-case scenarios. That extension could be added and tested in a couple days by one person. I'd volunteer to do if someone here told me there was a good chance of the enhancement being picked up. Please help us Protobuf heroes! We absolutely love protobufs and we're astonished they're not used more. I do think that will change as interoperability is recognized as more important. Drew |
I believe the way forward is implementing #5888 (which allows parsing messages directory from a ReadOnlySequence). For avoiding excessive allocations in ByteString, that's a separate problem and we'd need to come up with a separate design. The main problem of using array pools with ByteString is that ByteString wasn't designed to be disposable, so there's no way of expressing that the rented array can be returned to the pool. |
Hi! Maybe I'm not following correctly, but my vision was that a gpb consumer would explicitly drop off ByteStrings that are known to be no longer used ( Sure, I love the idea of 5888, but it seems to be a larger bolder ask of development, whereas adding a ByteString Hopefully I'm making sense or maybe I'm missing what you're saying. |
Would something similar to Stream.Read(byte[], int, int) work? https://docs.microsoft.com/en-us/dotnet/api/system.io.stream.read |
I'm having trouble imagining some API additions for ByteString that would make it possible to manage them more efficiently:
Because ReadOnlySequence is ephemeral, the only approach I could see is somehow making ByteString parsing customizable (an advanced API) and optionally make ByteString only remember a Slice of the input ReadOnlySequence (which will become invalid as soon as the input ROS becomes invalid). After that, the user would need to make sure to extract the data from byteString themselves before the input ROS is invalidated. This could be useful in some high-performance scenarios, but it's sound like an api that's very specialized and hard to use, so I'm not a fan. |
I think there are two scenarios for creating ByteString:
protobuf/csharp/src/Google.Protobuf/ByteString.cs Lines 57 to 78 in 6263268
To emphasis its unsafe nature, the name could alternatively be Adding this new way of creation would sacrifice the current safety ByteString has of it always being immutable. I don't know if this is important or not. If it is, perhaps ByteString could have a flag on it to identify whether it is an "unsafe" or not?
|
It's it possible to codegen a different type besides ByteString for the unsafe "optimized" variant? |
Is there any progress / workaround for this issue? I belive that, as @JamesNK mentioned, new public |
I wrote a PR that allows a ByteString to be created without copying the data. It matches what is available in Java - #7645 No progress on merging it right now. |
When will the change arrive? For gRPC messages with large payloads (read images) a GC friendly design is a must. |
We need this please. |
…improvement for fixing memiry spikes, looks this is a known issue for bytestring. bytesring is not designed for disposeble, refer to: protocolbuffers/protobuf#4206
Currently if you have a message with an array of bytes field that translates into a ByteString immutable object.
ByteString can't be created without incurring a GC memory allocation and an array copy.
When you deserialize a ByteString using CodedInputStream, the same penalty is paid.
Furthermore, when you then go to consume the bytes you received you pay this same penalty yet again because of ByteString.ToByteArray().
In my application these penalties are adding up to excessive GC pressure and memory copies which is reducing my message throughput substantially. This leaves me some unatractive options for my .NET C# application, either use C++ protobuffs communication layer or send my large byte array messages natively without using protobuffs.
It would be really nice if you allowed an option to have more fine grained control over the "bytes" field such that the code generator would spit out just a native byte[] type field or a new class (i.e. UnsafeByteString) which bypasses the above mentioned penalties and user takes on full responsibility for the fact that it is a mutable unsafe byte array.
CodedInputStream also needs to provide an option for user to inject his own byte array allocator so that user application can have fine grained control over byte array memory allocations to minimize GC penalties.
I think this is really important to allow for truly high performance C# protobuf apps.
The text was updated successfully, but these errors were encountered: