[C API] Add an efficient public PyUnicodeWriter API #119182
Move the private _PyUnicodeWriter API to the internal C API.
Benchmark using:
The difference comes from overallocation: if I add The
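Overallocation matters because appending to an exactly-sized buffer forces a reallocation, and possibly a copy, on every write. The standalone sketch below counts reallocations with and without overallocation. It uses a hypothetical toy buffer: `grow_buf` is our own name, not a CPython type, and the 25% growth margin is an assumption, not CPython's exact policy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer (not CPython's _PyUnicodeWriter). */
typedef struct {
    char *data;
    size_t size;       /* bytes in use */
    size_t capacity;   /* bytes allocated */
    int overallocate;  /* nonzero: reserve extra room on each resize */
    size_t reallocs;   /* number of realloc() calls, for illustration */
} grow_buf;

/* Append n bytes; returns 0 on success, -1 on memory error. */
static int grow_buf_append(grow_buf *b, const char *s, size_t n)
{
    assert(b->size <= b->capacity);
    size_t needed = b->size + n;
    if (needed > b->capacity) {
        size_t new_cap = needed;
        if (b->overallocate) {
            /* Assumed policy: 25% extra, so capacity grows geometrically. */
            new_cap += needed / 4;
        }
        char *p = realloc(b->data, new_cap);
        if (p == NULL) {
            return -1;
        }
        b->data = p;
        b->capacity = new_cap;
        b->reallocs++;
    }
    if (n > 0) {
        memcpy(b->data + b->size, s, n);
    }
    b->size = needed;
    return 0;
}
```

Appending "abc" 1000 times costs 1000 reallocations when the buffer is always sized exactly. With this growth rule it costs only about two dozen, and the final contents are identical.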
By the way, PyPy provides
Article about this performance problem in Python: https://lwn.net/Articles/816415/
Curious if this warrants a further API … I know, I know, hyper-generalization, yet this is what the Union example is screaming for... I suppose we can add those later. How long has the internal writer API existed? Would these be in the Stable ABI / Limited API from the start? (API-wise these look stable.)
I suppose that you mean
There is already a collection of helper functions accepting a writer, and I find this really cool. It's not "slot-based", since each function has many formatting options:

    extern int _PyLong_FormatWriter(
        _PyUnicodeWriter *writer,
        PyObject *obj,
        int base,
        int alternate);
    extern int _PyLong_FormatAdvancedWriter(
        _PyUnicodeWriter *writer,
        PyObject *obj,
        PyObject *format_spec,
        Py_ssize_t start,
        Py_ssize_t end);
    extern int _PyFloat_FormatAdvancedWriter(
        _PyUnicodeWriter *writer,
        PyObject *obj,
        PyObject *format_spec,
        Py_ssize_t start,
        Py_ssize_t end);
    extern int _PyComplex_FormatAdvancedWriter(
        _PyUnicodeWriter *writer,
        PyObject *obj,
        PyObject *format_spec,
        Py_ssize_t start,
        Py_ssize_t end);
    extern int _PyUnicode_FormatAdvancedWriter(
        _PyUnicodeWriter *writer,
        PyObject *obj,
        PyObject *format_spec,
        Py_ssize_t start,
        Py_ssize_t end);
    extern Py_ssize_t _PyUnicode_InsertThousandsGrouping(
        _PyUnicodeWriter *writer,
        Py_ssize_t n_buffer,
        PyObject *digits,
        Py_ssize_t d_pos,
        Py_ssize_t n_digits,
        Py_ssize_t min_width,
        const char *grouping,
        PyObject *thousands_sep,
        Py_UCS4 *maxchar);

These functions avoid memory copies. For example,
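A helper like _PyLong_FormatWriter() avoids memory copies in roughly this way. It computes how much space it needs, reserves that space in the writer's buffer, and writes the digits in place. The alternative is to build a temporary string and then copy it in. The sketch below is a hypothetical standalone illustration, not CPython code; `text_buf` and its functions are invented names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer. */
typedef struct {
    char *data;
    size_t size;
    size_t capacity;
} text_buf;

/* Make room for n more bytes, doubling to amortize resizes. */
static int text_buf_reserve(text_buf *b, size_t n)
{
    if (b->size + n <= b->capacity) {
        return 0;
    }
    size_t new_cap = (b->size + n) * 2;
    char *p = realloc(b->data, new_cap);
    if (p == NULL) {
        return -1;
    }
    b->data = p;
    b->capacity = new_cap;
    return 0;
}

static int text_buf_write(text_buf *b, const char *s)
{
    size_t n = strlen(s);
    if (n == 0) {
        return 0;
    }
    if (text_buf_reserve(b, n) < 0) {
        return -1;
    }
    memcpy(b->data + b->size, s, n);
    b->size += n;
    return 0;
}

/* Format a long in base 10 directly into the buffer: count the digits,
   reserve exactly that much space, then fill it from the right.
   No temporary string is created. */
static int text_buf_write_long(text_buf *b, long value)
{
    /* Unsigned arithmetic keeps LONG_MIN's magnitude representable. */
    unsigned long mag = value < 0 ? 0UL - (unsigned long)value
                                  : (unsigned long)value;
    size_t len = (value < 0) ? 2 : 1;  /* optional sign + first digit */
    for (unsigned long t = mag; t >= 10; t /= 10) {
        len++;
    }
    if (text_buf_reserve(b, len) < 0) {
        return -1;
    }
    char *end = b->data + b->size + len;
    do {
        *--end = (char)('0' + mag % 10);
        mag /= 10;
    } while (mag != 0);
    if (value < 0) {
        *--end = '-';
    }
    assert(end == b->data + b->size);
    b->size += len;
    return 0;
}
```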
12 years: I added it in 2012.
I wrote this API to fix the major performance regression after PEP 393 – Flexible String Representation was implemented. After my optimization work, many string operations on Unicode objects became faster than Python 2 operations on bytes, especially when handling only ASCII characters, which is the most common case. I mostly optimized …

In 2016, I wrote an article about the two "writer" APIs that I wrote for these optimizations: https://vstinner.github.io/pybyteswriter.html
I would prefer not to add it to the limited C API directly, but to wait one Python version to see how it goes.
(Yes, I meant WriteRepr.) I like these other helpers -- can we just add them all to the public API? Or are there issues with any of them?
I added the following function, which should fit most of these use cases:

    PyAPI_FUNC(int) PyUnicodeWriter_FromFormat(
        PyUnicodeWriter *writer,
        const char *format,
        ...);

Example to write …:

    PyUnicodeWriter_FromFormat(writer, "%R", obj);

Example to write …:

    PyUnicodeWriter_FromFormat(writer, "%S", obj);

It's the same format as …:

    PyUnicodeWriter_FromFormat(writer, "Hello %s, %i.", "Python", 123);
Thank you, this looks very useful! I see that … The va_arg function is problematic for non-C languages, but it's possible to get the same functionality with other functions, especially if we add a number-writing helper, so I'm OK with adding it. The proposed API is nice and minimal. My bet about what users will ask for next goes to …

Name bikeshedding:
I see the PR hides the underscored API that some existing projects use. I thought we weren't doing that any more.
"WriteChar" name comes from PyUnicode_ReadChar() and PyUnicode_WriteChar() names. I don't think that mentioning UCS4 is useful.
I would prefer just "PyUnicodeWriter_Format()". I prefer not to support str.format(), which is more of a "Python API" than a C API; it's less convenient to use in C. If we don't support str.format(), "PyUnicodeWriter_Format()" is fine for the "PyUnicode_FromFormat()" variant.
Yeah, I think that using unqualified
I propose to add PyUnicodeWriter_WriteString(), which decodes from UTF-8 (in strict mode). PyUnicodeWriter_WriteASCIIString() has undefined behavior if the string contains non-ASCII characters. Maybe it should be removed in favor of PyUnicodeWriter_WriteString(), which is safer: its behavior for non-ASCII characters is well-defined, since they are decoded from UTF-8.
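The difference between the two functions is whether the writer trusts its input. A strict UTF-8 decoder, the behavior proposed for PyUnicodeWriter_WriteString(), rejects malformed input instead of producing undefined results. The sketch below is a hypothetical standalone decoder (`utf8_decode_strict` is our own name, not CPython's implementation). It rejects truncated sequences, stray continuation bytes, overlong forms, surrogates, and values above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode n bytes of UTF-8 into code points written to out, which must have
   room for n entries. Returns the number of code points, or -1 if the
   input is not strictly valid UTF-8. */
static ptrdiff_t utf8_decode_strict(const unsigned char *s, size_t n,
                                    uint32_t *out)
{
    /* Smallest code point allowed for each sequence length (index = len). */
    static const uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    assert(s != NULL || n == 0);
    size_t i = 0, k = 0;
    while (i < n) {
        unsigned char c = s[i];
        uint32_t cp;
        size_t len;
        if (c < 0x80) {            /* ASCII fast path */
            out[k++] = c;
            i++;
            continue;
        }
        if ((c & 0xE0) == 0xC0) {
            cp = c & 0x1F; len = 2;
        } else if ((c & 0xF0) == 0xE0) {
            cp = c & 0x0F; len = 3;
        } else if ((c & 0xF8) == 0xF0) {
            cp = c & 0x07; len = 4;
        } else {
            return -1;             /* stray continuation or invalid lead */
        }
        if (n - i < len) {
            return -1;             /* truncated sequence */
        }
        for (size_t j = 1; j < len; j++) {
            if ((s[i + j] & 0xC0) != 0x80) {
                return -1;         /* expected a continuation byte */
            }
            cp = (cp << 6) | (uint32_t)(s[i + j] & 0x3F);
        }
        /* Reject overlong forms, UTF-16 surrogates and values > U+10FFFF. */
        if (cp < min_cp[len] || (cp >= 0xD800 && cp <= 0xDFFF)
                || cp > 0x10FFFF) {
            return -1;
        }
        out[k++] = cp;
        i += len;
    }
    return (ptrdiff_t)k;
}
```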
Add unicode_decode_utf8_writer() to write characters directly into a _PyUnicodeWriter writer, avoiding the creation of a temporary string. Optimize PyUnicode_FromFormat() by using the new unicode_decode_utf8_writer(). Rename unicode_fromformat_write_cstr() to unicode_fromformat_write_utf8().

Microbenchmark on the code:

    return PyUnicode_FromFormat(
        "%s %s %s %s %s.",
        "format", "multiple", "utf8", "short", "strings");

Result: 620 ns +- 8 ns -> 382 ns +- 2 ns: 1.62x faster.
The main problem with the current private _PyUnicodeWriter C API is that it requires allocating the _PyUnicodeWriter value on the stack, but its layout is an implementation detail, and exposing such an API would prevent future changes. The proposed new C API allocates the data in dynamic memory, which makes it more portable and future-proof. But this can add overhead. Also, if we use dynamic memory, why not make PyUnicodeWriter a subclass of PyObject? Then Py_DECREF could be used to destroy it, we could store multiple writers in a collection, and we could even provide a Python interface for it.
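The portability argument rests on the opaque-handle pattern. Callers only see an incomplete struct type and go through create/write/finish/discard functions, so the struct layout can change between versions without breaking them. Below is a hypothetical toy in that style. `toy_writer` and its functions are invented names, loosely modeled on the PyUnicodeWriter_Finish() lifecycle discussed in this issue, not on CPython's implementation. The price is one extra heap allocation per writer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* --- "Public header" part: only an incomplete type is visible. --- */
typedef struct toy_writer toy_writer;
toy_writer *toy_writer_create(size_t size_hint);
int toy_writer_write(toy_writer *w, const char *s, size_t n);
char *toy_writer_finish(toy_writer *w);  /* consumes w */
void toy_writer_discard(toy_writer *w);  /* consumes w; NULL is allowed */

/* --- "Implementation" part: this layout can change freely. --- */
struct toy_writer {
    char *data;
    size_t size;
    size_t capacity;
};

toy_writer *toy_writer_create(size_t size_hint)
{
    toy_writer *w = calloc(1, sizeof *w);
    if (w == NULL) {
        return NULL;
    }
    if (size_hint > 0) {
        w->data = malloc(size_hint);
        if (w->data == NULL) {
            free(w);
            return NULL;
        }
        w->capacity = size_hint;
    }
    return w;
}

int toy_writer_write(toy_writer *w, const char *s, size_t n)
{
    assert(w != NULL);
    if (w->size + n > w->capacity) {
        size_t new_cap = (w->size + n) * 2;  /* overallocate */
        char *p = realloc(w->data, new_cap);
        if (p == NULL) {
            return -1;
        }
        w->data = p;
        w->capacity = new_cap;
    }
    if (n > 0) {
        memcpy(w->data + w->size, s, n);
    }
    w->size += n;
    return 0;
}

char *toy_writer_finish(toy_writer *w)
{
    /* Truncate to the exact size (plus a NUL), then free the writer. */
    char *p = realloc(w->data, w->size + 1);
    if (p == NULL) {
        toy_writer_discard(w);
        return NULL;
    }
    p[w->size] = '\0';
    free(w);
    return p;
}

void toy_writer_discard(toy_writer *w)
{
    if (w == NULL) {
        return;
    }
    free(w->data);
    free(w);
}
```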
I ran benchmarks, and the proposed public API remains worthwhile in terms of performance: see the benchmarks below.
Adding a Python API is appealing, but I prefer to restrict this discussion to the C API and discuss the idea of exposing it at the Python level later. For the C API, I don't think that Py_DECREF() semantics and inheriting from PyObject are really worth it.
I renamed functions:
Right, I would like to hide/remove the internal API from the public C API in Python 3.14 while adding the new public C API. The private …

I prepared a PR for pythoncapi-compat to check that it's possible to implement the new API on Python 3.6-3.13: python/pythoncapi-compat#95
There is some confusion with names. The … So, for consistency, we should use …
We can refer to them as "Unicode", such as:
…ython#120809) The public PyUnicodeWriter API enables overallocation by default and so is more efficient. It also makes the code simpler and shorter.
) Add PyUnicodeWriter_WriteWideChar() and PyUnicodeWriter_DecodeUTF8Stateful() functions. Co-authored-by: Serhiy Storchaka
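As we understand its purpose, a stateful UTF-8 decoder such as PyUnicodeWriter_DecodeUTF8Stateful() must not fail when a chunk of input ends in the middle of a multi-byte sequence. Instead, it decodes what it can and reports how many bytes it consumed, so the caller can prepend the leftover bytes to the next chunk. The hypothetical helper below (`utf8_consumable` is our own name) shows only that boundary decision. Validating the complete sequences is left to a strict decoder.

```c
#include <assert.h>
#include <stddef.h>

/* Return how many leading bytes of s[0..n) can be decoded now: all of
   them, except an incomplete multi-byte sequence at the very end, which
   the caller should keep and prepend to the next chunk. */
static size_t utf8_consumable(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
    /* Step back over up to 3 trailing continuation bytes (10xxxxxx). */
    size_t i = n, back = 0;
    while (i > 0 && back < 3 && (s[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0) {
        return n;  /* no lead byte: leave the error to the strict decoder */
    }
    unsigned char lead = s[i - 1];
    size_t need;
    if (lead < 0x80) {
        need = 1;
    } else if ((lead & 0xE0) == 0xC0) {
        need = 2;
    } else if ((lead & 0xF0) == 0xE0) {
        need = 3;
    } else if ((lead & 0xF8) == 0xF0) {
        need = 4;
    } else {
        return n;  /* invalid lead byte: also left to the strict decoder */
    }
    size_t have = n - (i - 1);  /* bytes from the lead byte to the end */
    return have < need ? i - 1 : n;
}
```

For the chunk "ab" followed by the first two bytes of "€" (E2 82), the helper returns 2. The caller decodes "ab" and carries E2 82 over to the next chunk.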
Use PyUnicodeWriter_WriteWideChar() in PyUnicode_FromFormat()
…120799) The public PyUnicodeWriter API enables overallocation by default and so is more efficient. Benchmark:

    python -m pyperf timeit \
        -s 't = list[int, float, complex, str, bytes, bytearray, ' \
        'memoryview, list, dict]' \
        'str(t)'

Result: 1.49 us +- 0.03 us -> 1.10 us +- 0.02 us: 1.35x faster
…on#120797) The public PyUnicodeWriter API enables overallocation by default and so is more efficient. Benchmark:

    python -m pyperf timeit \
        -s 't = int | float | complex | str | bytes | bytearray' \
        ' | memoryview | list | dict' \
        'str(t)'

Result: 1.29 us +- 0.02 us -> 1.00 us +- 0.02 us: 1.29x faster
Use strchr() and ucs1lib_find_max_char() to optimize the code path formatting sub-strings between '%' formats.
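The idea behind this optimization is to handle the literal text between '%' conversions in bulk. One strchr() call finds the next '%'. The whole run is then scanned once for its maximum character and copied with a single memcpy(). The sketch below is a hypothetical toy formatter (`toy_format` is our own name) that supports only %s and %%. A plain loop stands in for CPython's internal ucs1lib_find_max_char().

```c
#include <assert.h>
#include <stdarg.h>
#include <string.h>

/* Hypothetical toy formatter supporting only "%s" and "%%". Writes at most
   cap bytes to out (no NUL terminator) and stores the largest byte value
   seen in literal text in *max_literal. Returns the output length, or -1
   if the output does not fit or the format is unsupported. */
static long toy_format(char *out, size_t cap, unsigned char *max_literal,
                       const char *fmt, ...)
{
    assert(out != NULL && max_literal != NULL && fmt != NULL);
    va_list ap;
    size_t len = 0;
    unsigned char maxc = 0;
    va_start(ap, fmt);
    while (*fmt != '\0') {
        /* One strchr() call finds the end of the literal run. */
        const char *pct = strchr(fmt, '%');
        size_t run = pct != NULL ? (size_t)(pct - fmt) : strlen(fmt);
        if (run > 0) {
            if (run > cap - len) {
                goto error;
            }
            /* Max-char scan of the run (stand-in for ucs1lib_find_max_char). */
            for (size_t i = 0; i < run; i++) {
                if ((unsigned char)fmt[i] > maxc) {
                    maxc = (unsigned char)fmt[i];
                }
            }
            memcpy(out + len, fmt, run);  /* copy the whole run at once */
            len += run;
            fmt += run;
        }
        if (pct == NULL) {
            break;
        }
        /* fmt now points at '%'. */
        const char *arg;
        if (fmt[1] == '%') {
            arg = "%";
        } else if (fmt[1] == 's') {
            arg = va_arg(ap, const char *);
        } else {
            goto error;
        }
        size_t n = strlen(arg);
        if (n > cap - len) {
            goto error;
        }
        memcpy(out + len, arg, n);
        len += n;
        fmt += 2;
    }
    va_end(ap);
    *max_literal = maxc;
    return (long)len;
error:
    va_end(ap);
    return -1;
}
```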
See also #121710: [C API] Add PyBytesWriter API.
Feature or enhancement
Creating a Python string object in an efficient way is complicated. Python has a private _PyUnicodeWriter API. It's being used by these projects:

Affected projects (5):
I propose making the API public to promote it and help C extension maintainers write more efficient code to create Python string objects.
API:
The internal writer buffer is overallocated by default. PyUnicodeWriter_Finish() truncates the buffer to the exact size if the buffer was overallocated.

Overallocation reduces the number of buffer resizes when adding short strings in a loop; without it, building a string piece by piece can become quadratic. Use PyUnicodeWriter_SetOverallocate(writer, 0) to disable overallocation just before the last write.

The writer takes care of the internal buffer kind: Py_UCS1 (latin1), Py_UCS2 (BMP) or Py_UCS4 (full Unicode Character Set). It also implements an optimization if a single write is made using PyUnicodeWriter_WriteStr(): the writer returns the string unchanged, without any copy.

Example of usage (simplified code from Python/unionobject.c):
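The buffer-kind handling described above can be illustrated with a hypothetical standalone toy. It is not CPython code and not the unionobject.c example. Characters are stored 1, 2 or 4 bytes wide, using the narrowest kind that fits the largest character written so far. The buffer is widened and copied only when a wider character arrives. `kind_writer` and its functions are invented names, and doubling the capacity when full is an assumed, simple form of overallocation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical writer storing code points 1, 2 or 4 bytes wide (the
   PEP 393 "kinds"). Start it as {NULL, 0, 0, 1}. */
typedef struct {
    void *data;
    size_t len;   /* code points stored */
    size_t cap;   /* code points allocated */
    int kind;     /* bytes per code point: 1, 2 or 4 */
} kind_writer;

/* Narrowest kind able to store ch. */
static int kind_for(uint32_t ch)
{
    return ch <= 0xFF ? 1 : ch <= 0xFFFF ? 2 : 4;
}

static uint32_t kw_get(const kind_writer *w, size_t i)
{
    switch (w->kind) {
    case 1:  return ((const uint8_t *)w->data)[i];
    case 2:  return ((const uint16_t *)w->data)[i];
    default: return ((const uint32_t *)w->data)[i];
    }
}

static void kw_set(void *data, int kind, size_t i, uint32_t ch)
{
    assert(kind == 1 || kind == 2 || kind == 4);
    switch (kind) {
    case 1:  ((uint8_t *)data)[i] = (uint8_t)ch; break;
    case 2:  ((uint16_t *)data)[i] = (uint16_t)ch; break;
    default: ((uint32_t *)data)[i] = ch; break;
    }
}

/* Append one code point. The buffer is reallocated only when it is full
   (capacity doubles) or when ch needs a wider kind than the current one. */
static int kw_write_char(kind_writer *w, uint32_t ch)
{
    int kind = kind_for(ch);
    if (kind < w->kind) {
        kind = w->kind;  /* never narrow an existing buffer */
    }
    if (kind != w->kind || w->len == w->cap) {
        size_t new_cap = w->len == w->cap ? (w->cap + 1) * 2 : w->cap;
        void *p = malloc(new_cap * (size_t)kind);
        if (p == NULL) {
            return -1;
        }
        /* Copy existing characters, widening them if the kind changed. */
        for (size_t i = 0; i < w->len; i++) {
            kw_set(p, kind, i, kw_get(w, i));
        }
        free(w->data);
        w->data = p;
        w->cap = new_cap;
        w->kind = kind;
    }
    kw_set(w->data, w->kind, w->len++, ch);
    return 0;
}
```

Writing 'a' and 'é' keeps the 1-byte kind. Writing '€' then widens the buffer to 2 bytes per character, and an emoji widens it to 4. The single-write shortcut of PyUnicodeWriter_WriteStr() is not modeled here.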
Linked PRs