TS Bebop views are shared which leads to difficult to diagnose bugs #209
Comments
I’ll provide a more thoughtful response later this evening but to comment quickly:
Bebop is focused on performance across every language precisely because, for us, every millisecond spent allocating or wasting CPU cycles impacts end-user performance. To state otherwise runs counter to why we made Bebop, and isn't a mindset that is productive for us to subscribe to. Browsers can be slow and it's our job to make Bebop resilient (also, V8 is pretty fast).
When you have a chance to review more closely, I do think the bring-your-own BebopView would be a good middle ground: it prevents the default from being hard-to-understand behavior without blocking users from optimizing.
For sure. We do "bring your own buffer" in C# too so you can leverage array pooling. I have notes from our initial BebopView designs lying around, so I'll include those and have @lynn chime in once she's back.
As outlined in #209, the TypeScript runtime utilizes a shared buffer for encoding, which can lead to subtle bugs when decoding multiple types. For example:

```typescript
// should be the data of 'CreateUser'
const createUserBuffer = CreateUser.encode({ email: "[email protected]" });

// encode another type
CreateOk.encode({ now: 1 });

// fails to decode as the data of 'createUserBuffer' is now the data of 'CreateOk.encode'
const user = CreateUser.decode(createUserBuffer);
```

This change ensures we create and return a new buffer from `toArray()`; yes, it allocates additional data, but it means users don't need to think about what happens when encoding multiple types while they might still need the data buffer.
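For context, the change comes down to whether `toArray()` returns a view into the shared buffer or a copy of it. A minimal sketch, assuming the view tracks its bytes in `buffer` and its write position in `length` (the real field names may differ):

```typescript
class ViewSketch {
  buffer = new Uint8Array(256); // shared scratch space, reused across encodes
  length = 0;                   // bytes written by the current encode

  // Unsafe variant: the returned array aliases the shared buffer,
  // so the next encode silently overwrites its contents.
  toArrayUnsafe(): Uint8Array {
    return this.buffer.subarray(0, this.length);
  }

  // Safe variant, in the spirit of #225: slice() copies the bytes,
  // so the result survives later encodes at the cost of an allocation.
  toArraySafe(): Uint8Array {
    return this.buffer.slice(0, this.length);
  }
}
```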
I implemented a fix for this in #225 but I'm holding off on merging, since copying memory, even on small data structures, reduces performance substantially.

The question: do we prioritize safety over speed, or do we more clearly document this runtime behavior to maintain speed over ergonomics and safety? @lynn, tagging you for comment since you wrote the original implementation and I'd appreciate your perspective.
I would lean towards making the change out of the "principle of least surprise", and then exposing the old fast-but-dangerous behavior as
Describe the bug
Okay, this might technically be a feature, but it was undocumented, so I spent a good bit of time assuming my code was wrong, only to find out the problem was that my assumptions about the serialization logic didn't hold up.
To Reproduce
This test passes
And this test fails
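For illustration, the failing case has roughly this shape, assuming hypothetical generated types `Inner` (with a `name` field) and `Outer` (with a raw `data: Uint8Array` field) and a Jest-style test runner:

```typescript
import { Inner, Outer } from "./generated"; // hypothetical generated code

it("round-trips Inner even after encoding Outer", () => {
  // encode the inner message; innerBytes aliases the shared view's buffer
  const innerBytes = Inner.encode({ name: "hello" });

  // encoding anything else reuses that same shared buffer...
  Outer.encode({ data: innerBytes });

  // ...so innerBytes no longer holds Inner's bytes, and this fails
  const roundTripped = Inner.decode(innerBytes);
  expect(roundTripped.name).toBe("hello");
});
```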
Basically, just use the serialized data from one type as a `byte[]` value for another, serialize, and weird things happen.

There is another case where this can and will pop up: if you serialize data and then, for ANY reason, `await` before using the buffer. This could happen in networking or in subtle code logic, like the shape sketched at the end of this section, because it is very possible, depending on what triggers `myFn`, to have `buf` change before `doOtherThing` is called. This is also an issue where you call, say, a database function to do something with `buf` and, without you even knowing, it chooses to await something before it uses your provided buffer. At that point, without knowing about this behavior, it would be almost impossible to debug why the data in your database is not the data that was submitted to the function.

You will likely only run into this secondary case if you are writing network code which triggers serialization, as multiple concurrent operations may occur naturally. What is worse is that it is effectively a race condition that is very subtle and almost impossible to reproduce and debug.
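A sketch of that await-before-use shape, with `SomeRequest` as a hypothetical generated type and `somethingElse`/`doOtherThing` standing in for whatever the surrounding code does:

```typescript
import { SomeRequest } from "./generated"; // hypothetical generated type

declare function somethingElse(): Promise<void>;        // placeholder async work
declare function doOtherThing(bytes: Uint8Array): void; // placeholder consumer

async function myFn(): Promise<void> {
  // buf aliases the shared encode buffer
  const buf = SomeRequest.encode({ id: 1 });

  // yielding here lets any other code call encode and overwrite
  // the shared buffer that buf still points into
  await somethingElse();

  // by now buf's contents may belong to a completely different message
  doOtherThing(buf);
}
```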
Expected behavior
When I get the serialized data, I expect it to continue living until I drop the last reference to it unless otherwise documented.
IMO, in a language like C++ it makes some sense to do this and expect the user to know about it and proceed with caution, but there you would also probably use a mutex on the buffer. In TS there is NO mutex, which is fine as long as you don't hit that secondary case of await-before-use, because JS is single-threaded. At minimum this needs to be documented, but JS is not a language focused on performance, so I think it would be better to create a new `BebopView` every time `encode` is called by default, as that is least likely to cause weird results. We should also allow the user to bring their own view, which would allow more efficient serialization when they know what they are doing; one possible shape for that opt-in is sketched below.
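A hypothetical sketch of what that could look like (this is not the current Bebop API; `encodeInto` and the `BebopView` constructor used here are assumptions for illustration):

```typescript
import { CreateUser } from "./generated"; // type from the example above
import { BebopView } from "bebop";        // runtime view (constructor usage is hypothetical)

// Default, safe path: every call returns a freshly allocated buffer.
const safeBytes = CreateUser.encode({ email: "user@example.com" });

// Hypothetical opt-in: the caller supplies and manages the view,
// trading aliasing safety for fewer allocations.
const view = new BebopView(new Uint8Array(1024));
const fastBytes = CreateUser.encodeInto({ email: "user@example.com" }, view);
// fastBytes aliases view's buffer; copy or consume it before encoding
// anything else into the same view.
```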
Bebop info:
- `rpc-ts` branch

Desktop (please complete the following information):
Additional context
Working on RPC, and for this we have a datagram which contains a raw data field that is another Bebop-serialized type. This is how the behavior was discovered by me, @Eadword, a person who was not particularly familiar with the serialization implementation and who made what I thought were reasonable assumptions about its behavior.