I have been looking into this periodically since very early in the life of kio. This ticket is for tracking progress, discussing nuances, and some general brain-dumping of ideas and findings.
The idea is simple to describe. In order to speed up serde, we re-implement all parsing and serialization in Rust using PyO3. The main entry point to parsing is entity_reader and its internal read_entity. Both would be implemented as Rust functions that, just like the current implementation, introspect an entity class and from that are able to parse a stream of bytes.
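To make the introspection approach concrete, here is a minimal Python sketch of the general pattern, not kio's actual code. The names entity_reader_sketch and ExampleHeader, and the convention of storing a reader function in dataclass field metadata, are assumptions made up for this illustration. A Rust port would have to reproduce this same behavior: walk the fields of an entity class once, then read each field in order from the byte stream.

```python
import io
import struct
from dataclasses import dataclass, field, fields
from typing import IO, Any, Callable


def read_int16(buffer: IO[bytes]) -> int:
    # Big-endian signed 16-bit integer.
    (value,) = struct.unpack(">h", buffer.read(2))
    return value


def read_int32(buffer: IO[bytes]) -> int:
    # Big-endian signed 32-bit integer.
    (value,) = struct.unpack(">i", buffer.read(4))
    return value


@dataclass(frozen=True)
class ExampleHeader:
    # Hypothetical entity: each field names its reader in the field metadata.
    api_key: int = field(metadata={"reader": read_int16})
    correlation_id: int = field(metadata={"reader": read_int32})


def entity_reader_sketch(entity_type: type) -> Callable[[IO[bytes]], Any]:
    # Introspect the class once, up front, and collect one reader per field.
    readers = [(f.name, f.metadata["reader"]) for f in fields(entity_type)]

    def read_entity(buffer: IO[bytes]) -> Any:
        # Fields are read in declaration order, advancing the stream position.
        values = {name: reader(buffer) for name, reader in readers}
        return entity_type(**values)

    return read_entity


read_header = entity_reader_sketch(ExampleHeader)
header = read_header(io.BytesIO(b"\x00\x12\x00\x00\x00\x07"))
```

Because the introspection happens once per entity class, the per-message cost is dominated by the field readers themselves, which is the part a Rust implementation would be expected to speed up.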
The tooling for this kind of setup is mature. There is some boilerplate to set it up initially, but not a lot.
Testing strategy
My idea for this is that we have achieved a very solid test suite already, and that we should reap the fruits of that investment. We have high trust that the current suite asserts correctness. Therefore, when we rewrite implementations in Rust, it will be valuable to keep running the same test suite in Python.
There might also be cases where we want to add additional testing on the Rust level, but as I see it, that would mostly be to cover utilities that are not exposed to Python.
memoryview instead of IO[bytes]
In order to achieve zero copy semantics, when used in client code, we need to rewrite the current implementation to use memoryview as the main interface to read bytes from rather than IO[bytes]. Since memoryview doesn't maintain a position in the stream, this necessarily changes the interface to all parsing functions.
My solution for this is to change the signature of a parser of T from (IO[bytes]) -> T into (memoryview) -> (memoryview, T), so that every function, in addition to the parsed value, also returns a new memoryview of the remaining bytes that are yet to be parsed. Creating new memoryviews from existing ones in this way is cheap and still maintains the zero copy semantics. It also seems more ergonomic than, for instance, having every function return the number of consumed bytes as an integer.
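As a rough sketch of what the (memoryview) -> (memoryview, T) shape could look like in Python: the function names below are hypothetical and only illustrate the threading of the remaining view from one parser to the next. They are not taken from kio.

```python
import struct


def read_int16(view: memoryview) -> tuple[memoryview, int]:
    # Parse a big-endian signed 16-bit integer from the start of the view.
    (value,) = struct.unpack_from(">h", view)
    # Slicing a memoryview creates a new view onto the same buffer, no copy.
    return view[2:], value


def read_int32(view: memoryview) -> tuple[memoryview, int]:
    # Parse a big-endian signed 32-bit integer from the start of the view.
    (value,) = struct.unpack_from(">i", view)
    return view[4:], value


def read_pair(view: memoryview) -> tuple[memoryview, tuple[int, int]]:
    # Composite parser: each step rebinds `view` to the unparsed remainder.
    view, first = read_int16(view)
    view, second = read_int32(view)
    return view, (first, second)


data = b"\x00\x12\x00\x00\x00\x07rest"
remaining, pair = read_pair(memoryview(data))
```

Here pair holds the two parsed integers and remaining is a view over the trailing bytes of the original buffer. Rebinding the same name at each step keeps composite parsers readable, which is the ergonomic advantage over returning a consumed-byte count and tracking offsets by hand.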
Using memoryview does come with issues though [1][2]. It's not yet clear to me how best to approach this, and whether there have been recent improvements to best practices. I'm currently looking closer into it.
Footnotes
1. https://alexgaynor.net/2022/oct/23/buffers-on-the-edge/
2. https://discuss.python.org/t/pep-draft-safer-mutability-semantics-for-the-buffer-protocol/42346/5