gh-121710: Add PyBytesWriter API #121726
@picnixz: This kind of coding-style review is not helpful. It's the first version of the PR; we haven't discussed the API yet. I would prefer comments about the API rather than 7 comments about spaces and PEP 7. It's too early for that.
Oh OK, my bad (I'll try to cut down on those kinds of reviews in the future). Then I have a suggestion: maybe expose something for byte substrings (similar to So, ideally:
(Again, sorry for my nitpicky review.)
@picnixz: I would prefer to keep the API as small as possible. Where do these use cases come from? Did you see them used with the _PyBytesWriter API in CPython's current C code? In your list, the most appealing is
While it's tempting to add as many features as the PyUnicodeWriter API has, I'm not convinced that this function is needed right now.
You can already call
You can already call
PyBytesWriter is not PyUnicodeWriter. The intended usage is to allocate enough bytes and then write directly through the str pointer. Example:
A WriteChar() function would be much slower.
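A minimal plain-C sketch of the "allocate enough bytes up front, then write through the raw pointer" pattern, so it compiles without CPython. SketchWriter, sketch_writer_create(), sketch_writer_finish() and hexlify_sketch() are hypothetical names for illustration, not the PR's actual API; the point is only that the hot loop stores bytes through `str` with no per-byte function call, which is what a WriteChar()-style function would add.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical writer for illustration only; NOT the PR's PyBytesWriter. */
typedef struct {
    char *buffer;   /* start of the allocation */
} SketchWriter;

/* Allocate `size` bytes up front; return the raw write pointer or NULL. */
static char *sketch_writer_create(SketchWriter *w, size_t size)
{
    w->buffer = malloc(size > 0 ? size : 1);
    return w->buffer;
}

/* `str` is the caller's final write position: the length is the pointer
   difference, and the buffer is shrunk to that exact size. Ownership of
   the returned buffer passes to the caller. */
static char *sketch_writer_finish(SketchWriter *w, char *str, size_t *len)
{
    *len = (size_t)(str - w->buffer);
    char *result = realloc(w->buffer, *len > 0 ? *len : 1);
    if (result == NULL) {
        result = w->buffer;   /* shrinking failed: keep the larger block */
    }
    w->buffer = NULL;
    return result;
}

/* Hex-encode `n` bytes. The worst case (2 chars per byte) is known up
   front, so a single allocation suffices and the loop writes through
   `str` directly. */
static char *hexlify_sketch(const unsigned char *in, size_t n, size_t *len)
{
    static const char digits[] = "0123456789abcdef";
    if (n > SIZE_MAX / 2) {
        return NULL;          /* 2 * n would overflow */
    }
    SketchWriter w;
    char *str = sketch_writer_create(&w, 2 * n);
    if (str == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        *str++ = digits[in[i] >> 4];
        *str++ = digits[in[i] & 0x0f];
    }
    return sketch_writer_finish(&w, str, len);
}
```

The design choice here is that the writer never sees individual bytes: it only hands out a pointer and later learns the final length from the pointer difference.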
Err... my idea was to have parity with the PyUnicodeWriter API, but if you want to keep it as small as possible, I'm fine with that (I never had to use the private API to write a bytes string either, and I didn't actually check the code for occurrences of it). I think the
Look at the current usage of PyUnicodeWriter is used for other use cases and so it deserves a different API.
Don't call _PyBytesWriter_Dealloc() on error.
The caller must check str. Any additional check has a negative impact on performance.
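The error-handling contract from these two review comments can be sketched in plain C. It assumes, as the comments read, that a failing Prepare-style call frees the writer's own buffer and returns NULL, so the caller only checks the returned pointer and must not deallocate a second time, and that the writer itself does not validate `str`. SketchWriter, sketch_writer_prepare(), sketch_writer_dealloc() and write_two_parts() are hypothetical names, not CPython's _PyBytesWriter functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical writer for illustration; NOT CPython's _PyBytesWriter.
   It must be initialized with a non-NULL buffer. */
typedef struct {
    char *buffer;
    size_t allocated;
} SketchWriter;

static void sketch_writer_dealloc(SketchWriter *w)
{
    free(w->buffer);
    w->buffer = NULL;
    w->allocated = 0;
}

/* Make room for `extra` bytes after `str`. On failure the writer frees its
   own buffer and returns NULL: the caller must not deallocate it again. */
static char *sketch_writer_prepare(SketchWriter *w, char *str, size_t extra)
{
    size_t used = (size_t)(str - w->buffer);
    if (extra > SIZE_MAX - used) {
        sketch_writer_dealloc(w);    /* size overflow */
        return NULL;
    }
    if (used + extra <= w->allocated) {
        return str;                  /* fast path; str is never validated */
    }
    char *buffer = realloc(w->buffer, used + extra);
    if (buffer == NULL) {
        sketch_writer_dealloc(w);    /* out of memory */
        return NULL;
    }
    w->buffer = buffer;
    w->allocated = used + extra;
    return buffer + used;
}

/* Caller pattern: check `str` after each call and simply bail out. */
static int write_two_parts(SketchWriter *w, const char *a, const char *b,
                           size_t *len)
{
    size_t la = strlen(a), lb = strlen(b);
    char *str = sketch_writer_prepare(w, w->buffer, la);
    if (str == NULL) {
        return -1;                   /* writer already cleaned up */
    }
    memcpy(str, a, la);
    str += la;
    str = sketch_writer_prepare(w, str, lb);
    if (str == NULL) {
        return -1;
    }
    memcpy(str, b, lb);
    str += lb;
    *len = (size_t)(str - w->buffer);
    return 0;
}
```

Keeping the NULL check in the caller means the writer's fast path does no extra work, which matches the performance argument in the comment above.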
PR rebased on main to fix a merge conflict.
I created a C API Working Group issue: capi-workgroup/decisions#39
Microbenchmark comparing bytearray to the PyBytesWriter API: Result:
PyBytesWriter is 4x faster than using bytearray (93.0 ns => 23.4 ns).
It seems that all the needed information could fit inside the memory of a
With that, we could avoid allocating the big struct. With a very quick-and-dirty proof of concept, I get reasonable results, which suggest this could be useful for alternative implementations or future CPython versions:
For the API, it would mean that
It's used if you call Prepare() multiple times.
It's used. It defers creating the Python bytes object until Finish(), which can avoid having to call the inefficient _PyBytes_Resize() function.
Why do you want to avoid that? I would like to reuse the existing private
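A plain-C sketch of the two points made in this reply: calling Prepare() repeatedly triggers overallocation, so appending byte-by-byte costs only a few reallocations, and the final object is built once, at its exact size, in Finish(), which here stands in for creating the bytes object late instead of shrinking it with _PyBytes_Resize(). GrowWriter, grow_prepare(), grow_finish(), repeat_byte(), the 64-byte inline buffer and the 25% growth factor are all assumptions for illustration, not the private API's actual names or values.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_SMALL 64   /* hypothetical inline-buffer size */

typedef struct {
    char small[SKETCH_SMALL]; /* used until the data outgrows it */
    char *heap;               /* NULL while `small` is in use */
    size_t allocated;
    int reallocs;             /* counts heap (re)allocations, for the demo */
} GrowWriter;

static char *grow_data(GrowWriter *w) { return w->heap ? w->heap : w->small; }

static char *grow_init(GrowWriter *w)
{
    w->heap = NULL;
    w->allocated = SKETCH_SMALL;
    w->reallocs = 0;
    return w->small;
}

/* Ensure room for `extra` more bytes after `str`, overallocating by 25% so
   repeated calls need only O(log n) reallocations. Frees itself on error. */
static char *grow_prepare(GrowWriter *w, char *str, size_t extra)
{
    size_t used = (size_t)(str - grow_data(w));
    if (used > SIZE_MAX / 2 || extra > SIZE_MAX / 2 - used) {
        free(w->heap);
        w->heap = NULL;
        return NULL;              /* would overflow the growth formula */
    }
    size_t needed = used + extra;
    if (needed <= w->allocated) {
        return str;
    }
    size_t alloc = needed + needed / 4;
    char *buf;
    if (w->heap == NULL) {
        buf = malloc(alloc);      /* leave the inline buffer: copy it out */
        if (buf != NULL) {
            memcpy(buf, w->small, used);
        }
    } else {
        buf = realloc(w->heap, alloc);
    }
    if (buf == NULL) {
        free(w->heap);
        w->heap = NULL;
        return NULL;
    }
    w->heap = buf;
    w->allocated = alloc;
    w->reallocs++;
    return buf + used;
}

/* Build the result once, with the exact size, only at the end. */
static char *grow_finish(GrowWriter *w, char *str, size_t *len)
{
    *len = (size_t)(str - grow_data(w));
    char *result = malloc(*len > 0 ? *len : 1);
    if (result != NULL) {
        memcpy(result, grow_data(w), *len);
    }
    free(w->heap);
    w->heap = NULL;
    return result;
}

/* Append `n` copies of `c`, calling prepare once per byte (worst case). */
static char *repeat_byte(char c, size_t n, size_t *len, int *reallocs)
{
    GrowWriter w;
    char *str = grow_init(&w);
    for (size_t i = 0; i < n; i++) {
        str = grow_prepare(&w, str, 1);
        if (str == NULL) {
            return NULL;          /* writer already released its memory */
        }
        *str++ = c;
    }
    *reallocs = w.reallocs;
    return grow_finish(&w, str, len);
}
```

With 1000 single-byte prepare calls, this sketch performs about a dozen heap reallocations rather than one per byte; with only 10 bytes it never leaves the inline buffer.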
It would be nice to keep the possibility open for other implementations, if possible.
But this does not expose all of the private interface, and I'm not sure the existing tests apply to what's exposed here. In particular, the public API doesn't have:
An alternate implementation would not need the machinery for those features and could do without
I decided to reject this API for now. It's too low-level and too error-prone: capi-workgroup/decisions#39 (comment)
📚 Documentation preview 📚: https://cpython-previews--121726.org.readthedocs.build/