Streaming API #6

Open
LevitatingBusinessMan opened this issue Jun 1, 2023 · 13 comments

Comments

@LevitatingBusinessMan

LevitatingBusinessMan commented Jun 1, 2023

Are there any plans for a streaming API? That is, the ability to serialize into an impl Write and deserialize from an impl Read.

I want to be able to deserialize from a TcpStream.

bincode and postcard both support this.

@caibear
Collaborator

caibear commented Jun 1, 2023

> Are there any plans for a streaming API? The ability to serialize/deserialize impl Read and impl Write.

For our use case we don't need this feature so I am hesitant to add and maintain it.

From an API perspective it would involve duplicating encode as encode_into(w: &mut impl Write, t: &impl Encode) -> Result<(), Error> and decode as decode_from<T: Decode>(r: &mut impl Read) -> Result<T, Error>.

From an internal code perspective it would need to avoid regressing the current performance without duplicating too much code.

> I want to be able to deserialize from a TcpStream.

Can you read from the TcpStream into a Vec<u8> and pass that to bitcode?

Are your messages so large that they would consume too much memory? I kind of doubt this, because serialized bitcode typically consumes less memory than the deserialized type does.

@LevitatingBusinessMan
Author

> Can you read from the TcpStream into a Vec<u8> and pass that to bitcode?

Yes, but that vector could include multiple structs. Postcard has a method take_from_bytes which returns the slice of unused bytes.

> From an internal code perspective it would need to avoid regressing the current performance without duplicating too much code.

I was worried that streaming wasn't possible because bitcode relied on knowing where the serialized data ends. In hindsight I realize that doesn't make sense.

@finnbear
Member

finnbear commented Jun 2, 2023

> that vector could include multiple structs

We work exclusively with WebSockets, which provide their own framing of messages. The easiest way to use bitcode on a raw TcpStream might be to transmit the length (e.g. a 4-byte unsigned integer in network byte order, i.e. big-endian) and then the bytes from bitcode.
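
A minimal sketch of that length-prefix framing, using only the standard library. The names write_frame and read_frame and the max_len guard are assumptions made for illustration and are not part of bitcode; the payload is whatever bytes the encoder produced, and any Read/Write implementor (such as a TcpStream) can be passed in.

```rust
use std::io::{self, Read, Write};

/// Writes one frame: a 4-byte big-endian length followed by the payload.
/// (Hypothetical helper, not a bitcode API.)
fn write_frame(w: &mut impl Write, payload: &[u8]) -> io::Result<()> {
    if payload.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "payload too large"));
    }
    let len = payload.len() as u32;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(payload)
}

/// Reads one frame written by `write_frame` and returns its payload.
/// `max_len` is an assumed safety limit so a corrupt length cannot
/// trigger a huge allocation.
fn read_frame(r: &mut impl Read, max_len: usize) -> io::Result<Vec<u8>> {
    let mut len_bytes = [0u8; 4];
    r.read_exact(&mut len_bytes)?;
    let len = u32::from_be_bytes(len_bytes) as usize;
    if len > max_len {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok(payload)
}
```

The sender would call write_frame once per encoded message; the receiver calls read_frame in a loop and hands each returned payload to the decoder, so one buffer never contains more than one message.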

@LevitatingBusinessMan
Author

> > that vector could include multiple structs
>
> We work exclusively with WebSockets, which provide their own framing of messages. The easiest way to use bitcode on a raw TcpStream might be to transmit the length (e.g. a 4-byte unsigned integer in network byte order, i.e. big-endian) and then the bytes from bitcode.

Yes, that's a good solution, thanks. But first I might take a crack at modifying the bitcode codebase to allow for reading a slice partially or reading from a stream.

@NiseVoid

NiseVoid commented Sep 6, 2023

I think having a way to pack multiple types into one big packet is quite an essential feature. When encoding, I guess we can just call .extend_from_slice() with the slice from Buffer::encode. But there currently seems to be no way to read multiple messages packed together without including the length of each message (which, as far as I can tell, would be redundant information).

In my use case (sending game data in UDP packets) I currently use a Cursor and decode messages (using bincode) in a loop until the entire packet has been consumed, but even something as simple as getting (T, usize) as a return value, where usize is the number of bytes that were decoded, would be enough.
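
A rough sketch of what that (T, usize) shape could look like on the caller's side. Everything here is hypothetical: Msg, decode_prefix and decode_packet are made-up names, and the wire format is a toy (a tag byte plus a fixed-size body) standing in for a real codec, not bitcode's or bincode's format.

```rust
/// Toy message type used only for this illustration.
#[derive(Debug, PartialEq)]
enum Msg {
    Ping,
    Move { x: i16, y: i16 },
}

/// Hypothetical prefix decoder: decodes one message from the front of
/// `bytes` and reports how many bytes it consumed.
/// Returns None if the input is truncated or has an unknown tag.
fn decode_prefix(bytes: &[u8]) -> Option<(Msg, usize)> {
    match *bytes.first()? {
        0 => Some((Msg::Ping, 1)),
        1 => {
            let body = bytes.get(1..5)?;
            let x = i16::from_le_bytes([body[0], body[1]]);
            let y = i16::from_le_bytes([body[2], body[3]]);
            Some((Msg::Move { x, y }, 5))
        }
        _ => None,
    }
}

/// Decodes every message in a packet without any per-message length field,
/// advancing by the byte count each decode reports.
fn decode_packet(mut packet: &[u8]) -> Option<Vec<Msg>> {
    let mut messages = Vec::new();
    while !packet.is_empty() {
        let (msg, used) = decode_prefix(packet)?;
        messages.push(msg);
        packet = &packet[used..];
    }
    Some(messages)
}
```

This only works because the toy format is self-delimiting at byte granularity; whether a given codec can report a consumed byte count cheaply depends on its internals.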

@caibear
Collaborator

caibear commented Sep 19, 2023

> but even something as simple as getting (T, usize) as a return value, where usize is the number of bytes that were decoded, would be enough.

This would still result in redundant information since each message would be padded to the nearest byte.
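
As a rough illustration of that padding cost, here is a small calculation with made-up bit counts (they are not measurements of bitcode output, and the function names are hypothetical). It compares padding every message to a byte boundary against packing all messages into one bit stream and padding only the end.

```rust
/// Bytes needed to hold `bits` bits, rounded up to a whole byte.
fn padded_bytes(bits: usize) -> usize {
    (bits + 7) / 8
}

/// Total bytes when each message is padded to a byte boundary on its own.
fn bytes_padded_individually(message_bits: &[usize]) -> usize {
    message_bits.iter().map(|&bits| padded_bytes(bits)).sum()
}

/// Total bytes when all messages share one bit stream and only the end is padded.
fn bytes_packed_together(message_bits: &[usize]) -> usize {
    padded_bytes(message_bits.iter().sum::<usize>())
}
```

For ten 3-bit messages, padding each one costs 10 bytes, while packing them together costs 4 bytes (30 bits rounded up). The gap shrinks as individual messages get longer.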

finnbear closed this as not planned on Sep 20, 2023
@LevitatingBusinessMan
Author

Well that's a bummer

@finnbear
Member

finnbear commented Sep 20, 2023

> This would still result in redundant information since each message would be padded to the nearest byte.

It's necessary for TCP streams which don't support transmitting fractional bytes, unless the end of each message waited until the start of the next message.

> Well that's a bummer

To be clear, a streaming API in the sense of impl Read + Write is not planned due to performance and compatibility issues.

We're considering an API that allows you to:

  • append messages with minimal padding and no 'length' field
  • decode the prefix of received data as a message and know where the decoder left off

This would slightly reduce the overhead of using bitcode in a stream-like context.

Edit: Closing this issue may have been premature. I've reopened it until there is an issue more focused on what we can actually implement.

finnbear reopened this on Sep 20, 2023
@LevitatingBusinessMan
Author

> decode the prefix of received data as a message and know where the decoder left off

❤️

@caibear
Collaborator

caibear commented Feb 22, 2024

The new version of bitcode (#19) has the potential to add streaming without high overhead.

@MOZGIII

MOZGIII commented May 4, 2024

It looks like the code would actually work great with streaming APIs, if only the codec mod were publicly available. Or, rather, the View and Decoder traits for decoding.

I'd have a loop with roughly this:

  1. Try T::populate(1) on the buffer;
  2. Fails? Read and buffer more data;
  3. Works? Decode and return the value, advance the buffer to match what T::populate did to the slice we gave it.

Note that this way the bitcode crate does not do any IO itself; external code would be responsible for that. This is the way I'd recommend doing it, since async exists and everyone there has their own traits for read/write operations.
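
A minimal sketch of that loop, under stated assumptions. try_decode is a hypothetical stand-in for the part the codec would expose; here it decodes a toy self-delimiting format (one u64 as an LEB128 varint) and is not bitcode's populate or any real bitcode API. StreamDecoder is the external code that owns the buffer and the IO.

```rust
use std::io::{self, Read};

/// Hypothetical codec hook: decodes one value from the front of `buf`.
/// Returns the value and the bytes consumed, or None if `buf` ends
/// mid-value. A real decoder would also need a distinct error for
/// malformed input; this toy version does not make that distinction.
fn try_decode(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7F) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Caller-side buffering: the codec only ever sees byte slices.
struct StreamDecoder {
    buf: Vec<u8>,
}

impl StreamDecoder {
    fn new() -> Self {
        Self { buf: Vec::new() }
    }

    /// Returns the next message, Ok(None) on a clean end of stream,
    /// or an error if the stream ends in the middle of a message.
    fn next_message(&mut self, r: &mut impl Read) -> io::Result<Option<u64>> {
        loop {
            // Steps 1 and 3: try to decode; on success drop the consumed prefix.
            if let Some((value, used)) = try_decode(&self.buf) {
                self.buf.drain(..used);
                return Ok(Some(value));
            }
            // Step 2: not enough data yet, so read and buffer more.
            let mut chunk = [0u8; 1024];
            let n = r.read(&mut chunk)?;
            if n == 0 {
                return if self.buf.is_empty() {
                    Ok(None)
                } else {
                    Err(io::Error::new(io::ErrorKind::UnexpectedEof, "stream ended mid-message"))
                };
            }
            self.buf.extend_from_slice(&chunk[..n]);
        }
    }
}
```

An async caller would replace the blocking r.read call with its own read operation and keep the same buffer handling, which is the reason for keeping IO out of the codec.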

@caibear
Collaborator

caibear commented May 4, 2024

> I'd have a loop with roughly this:
>
>   1. Try T::populate(1) on the buffer;
>   2. Fails? Read and buffer more data;
>   3. Works? Decode and return the value, advance the buffer to match what T::populate did to the slice we gave it.

You could achieve the same effect by length prefixing your messages. If your messages are long, this shouldn't add much overhead. If your messages are short and you encode them one at a time, bitcode won't provide any benefit over bincode.

Note: If your use case is packing multiple small messages into a UDP packet, see bitcode_packet_packer. It's able to encode multiple messages at once, but produces discrete packets that don't exceed a limit. Included is a benchmark of various techniques, including encoding messages one at a time.

@MOZGIII

MOZGIII commented May 4, 2024

This is, obviously, a workaround that is universal and well known, and it is what I'm using currently. I'm specifically interested in bitcode providing this functionality, not because there's no other way but because bitcode already has everything that is required to do it.

Thanks for sharing the bitcode_packet_packer.

My use case is passing data over WebTransport streams, and currently it is for an example app. It basically sends a packet and waits for a reply: a very old-fashioned state machine on both ends, without the need to pack multiple messages at once.

I'm currently using the tokio_util::codec::length_delimited::LengthDelimitedCodec - but just that, without the rest of FramedCodec infrastructure, as we don't use tokio io types.
