Feature proposal: implement native parsing and serialization #214

antonagestam · 2024-10-13T10:37:10Z

I have been looking into this periodically since very early in the life of kio. This ticket is for tracking progress, discussing nuances, and general brain-dumping of ideas and findings.

The idea is simple to describe. In order to speed up serde, we re-implement all parsing and serialization in Rust using PyO3. The main entry point to parsing is `entity_reader` and its internal `read_entity`. Both would be implemented as Rust functions that, just like the current implementation, introspect an entity class and from that are able to parse a stream of bytes.
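To make the introspection step concrete, here is a minimal pure-Python sketch of the pattern that would be ported. It is not kio's actual code: `ExampleHeader`, the `READERS` table, and the `kafka_type` metadata key are assumptions made for illustration, and only two primitive types are covered.

```python
import io
import struct
from dataclasses import dataclass, field, fields
from typing import IO, Callable, TypeVar

E = TypeVar("E")


def read_int16(buffer: IO[bytes]) -> int:
    # Big-endian signed 16-bit integer.
    (value,) = struct.unpack(">h", buffer.read(2))
    return value


def read_int32(buffer: IO[bytes]) -> int:
    # Big-endian signed 32-bit integer.
    (value,) = struct.unpack(">i", buffer.read(4))
    return value


# Hypothetical lookup table from a type name to its primitive reader.
READERS = {"int16": read_int16, "int32": read_int32}


@dataclass(frozen=True)
class ExampleHeader:
    # Hypothetical entity; the "kafka_type" metadata key is an assumption.
    api_key: int = field(metadata={"kafka_type": "int16"})
    correlation_id: int = field(metadata={"kafka_type": "int32"})


def entity_reader(entity_type: type[E]) -> Callable[[IO[bytes]], E]:
    # Introspect the entity class once, building a (name, reader) plan.
    plan = [
        (f.name, READERS[f.metadata["kafka_type"]])
        for f in fields(entity_type)
    ]

    def read_entity(buffer: IO[bytes]) -> E:
        # Fields are read in declaration order, matching the wire layout.
        return entity_type(**{name: reader(buffer) for name, reader in plan})

    return read_entity


data = struct.pack(">hi", 18, 42)
header = entity_reader(ExampleHeader)(io.BytesIO(data))
```

The point of the sketch is that the per-class introspection happens once in `entity_reader`, while the hot path is the returned `read_entity` closure. That closure is the part a Rust implementation would be expected to speed up.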

The tooling for this kind of setup is mature. There is some boilerplate to set it up initially, but not a lot.

Testing strategy

My idea for this is that we have achieved a very solid test suite already, and that we should reap the fruits of that investment. We have high trust that the current suite asserts correctness. Therefore, when we rewrite implementations in Rust, it will be valuable to keep running the same test suite in Python.

There might also be cases where we want to add additional testing on the Rust level, but as I see it, that would mostly be to cover utilities that are not exposed to Python.

`memoryview` instead of `IO[bytes]`

To achieve zero-copy semantics when used in client code, we need to rewrite the current implementation to read bytes from a `memoryview` rather than an `IO[bytes]`. Since a `memoryview` doesn't maintain a position in the stream, this necessarily changes the interface of all parsing functions.

My proposed solution is to change parser signatures from `(IO[bytes]) -> T` to `(memoryview) -> (memoryview, T)` for a parser of `T`, so that every function returns, in addition to the parsed value, a new `memoryview` of the remaining bytes that are yet to be parsed. Creating new memoryviews from existing ones in this way is cheap and maintains the zero-copy semantics. It also seems more ergonomic than, for instance, having every function return the number of consumed bytes as an integer.
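A minimal sketch of what this signature change could look like in pure Python. The function names and the `Parser` alias are illustrative only, not kio's actual API.

```python
import struct
from typing import Callable, TypeVar

T = TypeVar("T")

# Illustrative alias for the proposed parser shape.
Parser = Callable[[memoryview], tuple[memoryview, T]]


def read_int16(view: memoryview) -> tuple[memoryview, int]:
    # Decode in place, then return a view of the unparsed remainder.
    (value,) = struct.unpack_from(">h", view, 0)
    return view[2:], value


def read_int32(view: memoryview) -> tuple[memoryview, int]:
    (value,) = struct.unpack_from(">i", view, 0)
    return view[4:], value


def read_header(view: memoryview) -> tuple[memoryview, tuple[int, int]]:
    # Composite parsers thread the remaining view through each step.
    view, api_key = read_int16(view)
    view, correlation_id = read_int32(view)
    return view, (api_key, correlation_id)


payload = struct.pack(">hi", 18, 42) + b"rest"
remaining, header = read_header(memoryview(payload))
```

Slicing a `memoryview` yields another view over the same underlying buffer, so `remaining` still refers to `payload` and no bytes are copied along the way.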

Using `memoryview` does come with issues, though ¹². It's not yet clear to me how best to approach this, or whether there have been recent improvements to best practices. I'm currently looking closer into it.
