gh-125196: Use PyUnicodeWriter in error handlers #125262

Closed

@vstinner (Member) commented Oct 10, 2024

PyCodec_ReplaceErrors() and PyCodec_BackslashReplaceErrors() now use the public PyUnicodeWriter API.

@@ -700,27 +700,38 @@ PyObject *PyCodec_IgnoreErrors(PyObject *exc)
}


static inline int
Contributor

Wouldn't it be better to move this inline function into the header file? That would avoid a separate declaration in the header and fit in better, since the function looks like a public API.

Member Author

In fact, I implemented #125201, which should replace this static inline function.

@vstinner
Member Author

Benchmark:

import codecs
import pyperf

runner = pyperf.Runner()
# Look up the registered error handlers so they can be called directly.
replace = codecs.lookup_error('replace')
backslashreplace = codecs.lookup_error('backslashreplace')

# Short and long inputs: per-call overhead vs. per-character cost.
for LEN in (1, 1000):
    # Each exception spans the whole input (start=0, end=len).
    text = "\xff" * LEN
    exc = UnicodeEncodeError("ascii", text, 0, len(text), "reason")
    runner.bench_func(f"replace encode len={LEN}", replace, exc)

    text = "\xff" * LEN
    exc = UnicodeTranslateError(text, 0, len(text), "reason")
    runner.bench_func(f"replace translate len={LEN}", replace, exc)

    # Mix of an ASCII byte and two non-ASCII bytes.
    data = b"\x20\x80\xff" * LEN
    exc = UnicodeDecodeError("ascii", data, 0, len(data), "reason")
    runner.bench_func(f"backslashreplace decode len={LEN}", backslashreplace, exc)

    # One character for each escape width: \xNN, \uNNNN and \UNNNNNNNN.
    text = "\x20\xff\u20ac\U0010ffff" * LEN
    exc = UnicodeEncodeError("ascii", text, 0, len(text), "reason")
    runner.bench_func(f"backslashreplace encode len={LEN}", backslashreplace, exc)

    text = "\x20\xff\u20ac\U0010ffff" * LEN
    exc = UnicodeTranslateError(text, 0, len(text), "reason")
    runner.bench_func(f"backslashreplace translate len={LEN}", backslashreplace, exc)

Results, Python built with gcc -O3, CPU isolation:

+-------------------------------------+---------+------------------------+
| Benchmark                           | ref     | change                 |
+=====================================+=========+========================+
| replace encode len=1                | 92.9 ns | 110 ns: 1.19x slower   |
+-------------------------------------+---------+------------------------+
| replace translate len=1             | 103 ns  | 138 ns: 1.34x slower   |
+-------------------------------------+---------+------------------------+
| backslashreplace decode len=1       | 95.6 ns | 140 ns: 1.46x slower   |
+-------------------------------------+---------+------------------------+
| backslashreplace encode len=1       | 120 ns  | 166 ns: 1.37x slower   |
+-------------------------------------+---------+------------------------+
| backslashreplace translate len=1    | 126 ns  | 169 ns: 1.35x slower   |
+-------------------------------------+---------+------------------------+
| replace encode len=1000             | 157 ns  | 2.09 us: 13.31x slower |
+-------------------------------------+---------+------------------------+
| replace translate len=1000          | 178 ns  | 2.23 us: 12.51x slower |
+-------------------------------------+---------+------------------------+
| backslashreplace decode len=1000    | 3.03 us | 27.2 us: 8.98x slower  |
+-------------------------------------+---------+------------------------+
| backslashreplace encode len=1000    | 14.4 us | 46.3 us: 3.22x slower  |
+-------------------------------------+---------+------------------------+
| backslashreplace translate len=1000 | 14.4 us | 45.9 us: 3.18x slower  |
+-------------------------------------+---------+------------------------+
| Geometric mean                      | (ref)   | 3.03x slower           |
+-------------------------------------+---------+------------------------+
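
As a sanity check on the last row, the geometric mean can be recomputed from the rounded "x slower" ratios printed in the table. This is only a sketch: pyperf works from the unrounded timings, so the result is expected to match to about two decimal places rather than exactly.

```python
from statistics import geometric_mean

# "x slower" ratios copied from the "change" column of the table above
# (already rounded by pyperf, so the result is approximate).
slowdowns = [1.19, 1.34, 1.46, 1.37, 1.35, 13.31, 12.51, 8.98, 3.22, 3.18]

overall = geometric_mean(slowdowns)
print(f"Geometric mean: {overall:.2f}x slower")
```

The geometric mean is used instead of the arithmetic mean so that the three 9x to 13x outliers at len=1000 do not dominate the summary.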

It's way slower :-(

This PR has a naive PyUnicodeWriter_Fill() implementation calling PyUnicodeWriter_WriteChar() in a loop. I wrote PR gh-125201 which is way more efficient:

+-------------------------------------+---------+------------------------+-----------------------+
| Benchmark                           | ref     | change                 | fill                  |
+=====================================+=========+========================+=======================+
| replace encode len=1000             | 157 ns  | 2.09 us: 13.31x slower | 176 ns: 1.12x slower  |
+-------------------------------------+---------+------------------------+-----------------------+
| replace translate len=1000          | 178 ns  | 2.23 us: 12.51x slower | 276 ns: 1.55x slower  |
+-------------------------------------+---------+------------------------+-----------------------+
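
The difference between the "change" and "fill" columns is the difference between writing one character per call and writing a whole run of identical characters in one operation. The following is a rough Python analogy only, not the C code from either PR; the helper names `fill_char_by_char` and `fill_bulk` are hypothetical.

```python
import io


def fill_char_by_char(ch: str, count: int) -> str:
    """Append one character per call, like looping over a write-char API."""
    buf = io.StringIO()
    for _ in range(count):
        buf.write(ch)  # one call (and its overhead) per character
    return buf.getvalue()


def fill_bulk(ch: str, count: int) -> str:
    """Produce the whole run of identical characters in a single operation."""
    return ch * count


# Both strategies build the same string; only the number of calls differs.
same_output = fill_char_by_char("?", 1000) == fill_bulk("?", 1000)
```

Both functions return the same result, so the choice between them only affects speed, which is what the len=1000 rows measure.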

@vstinner marked this pull request as draft October 10, 2024 14:50
@vstinner
Member Author

For the benchmark, I had to call the error handler functions directly. Going through the ASCII codec doesn't work (it doesn't measure my change), since Objects/unicodeobject.c has a fast path for the most common error handlers, such as replace and backslashreplace.
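
To make the "call the handler directly" approach concrete, here is a minimal sketch that uses only the documented `codecs.lookup_error()` function and the built-in Unicode exception types. A handler receives the exception object and returns a tuple of the replacement and the position at which to resume. The variable names are illustrative.

```python
import codecs

replace = codecs.lookup_error("replace")
backslashreplace = codecs.lookup_error("backslashreplace")

# Direct call: build the exception by hand and pass it to the handler.
enc_exc = UnicodeEncodeError("ascii", "\xff" * 3, 0, 3, "reason")
enc_result = replace(enc_exc)

dec_exc = UnicodeDecodeError("ascii", b"\x20\x80\xff", 0, 3, "reason")
dec_result = backslashreplace(dec_exc)

# Going through the codec gives the same replacement text, but per the
# comment above it may be served by the fast path in
# Objects/unicodeobject.c instead of the registered handler function.
via_codec = "\xff\xff\xff".encode("ascii", "replace")

print(enc_result, dec_result, via_codec)
```

Because the exception covers the whole input (start 0, end `len`), each direct call processes every character in one handler invocation, which is what the benchmark times.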

@vstinner
Member Author

Let's keep the private API for now, since it's faster.

@vstinner closed this Oct 15, 2024
@vstinner deleted the writer_codecs branch October 15, 2024 14:37