gh-119182: Add PyUnicodeWriter_DecodeUTF8Stateful() #120639
This does nothing in a non-debug build.
Assertions are always built in _testcapi.c: the NDEBUG macro is undefined early in parts.h.
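The reply relies on standard C behavior: `<assert.h>` may be included more than once, and each inclusion defines `assert()` according to whether `NDEBUG` is defined at that point. A minimal standalone sketch of the mechanism (not the actual contents of `parts.h`); `assertions_enabled()` is a hypothetical helper added only for illustration:

```c
/* Even if the build passes -DNDEBUG (as release builds usually do),
   undefining it before including <assert.h> makes assert() active
   for the rest of this translation unit. */
#undef NDEBUG
#include <assert.h>

/* Hypothetical helper: reports whether NDEBUG is defined here. */
static int assertions_enabled(void)
{
#ifdef NDEBUG
    return 0;
#else
    return 1;
#endif
}
```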
Also test surrogate pairs and non-BMP characters.
Since the code depends on the kind of the buffer string, you need to test different combinations: write different strings after writing a UCS2 or UCS4 string.
I suggest implementing in C a function which creates a PyUnicodeWriter, writes the first argument as a Python string, then converts the second argument to a `wchar_t*` string, writes it with the size given by an optional third argument, and returns the result. This helper function can be called from Python code with different arguments. The result will be checked even in a non-debug build, and you can test many more cases.
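Why the combinations matter: the writer's buffer stores every character with the same width (1, 2 or 4 bytes, the "kind"), chosen from the largest code point written so far, and where `wchar_t` is 16-bit (Windows) a non-BMP character arrives as a surrogate pair that has to be combined first. A minimal standalone sketch of that logic, assuming UTF-16 input passed as `uint16_t` units; `decode_utf16()` and `kind_for_maxchar()` are hypothetical helpers, not CPython functions:

```c
#include <stddef.h>
#include <stdint.h>

/* Decode UTF-16 code units into code points, combining surrogate pairs.
   Lone surrogates are copied through unchanged. `out` must have room for
   `n` entries. Returns the number of code points written and stores the
   largest code point in *maxchar. */
static size_t decode_utf16(const uint16_t *in, size_t n,
                           uint32_t *out, uint32_t *maxchar)
{
    size_t count = 0;
    uint32_t max = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t ch = in[i];
        /* High surrogate followed by a low surrogate: one non-BMP char. */
        if (ch >= 0xD800 && ch <= 0xDBFF && i + 1 < n
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            ch = 0x10000 + ((ch - 0xD800) << 10) + ((uint32_t)in[i + 1] - 0xDC00);
            i++;
        }
        out[count++] = ch;
        if (ch > max) {
            max = ch;
        }
    }
    *maxchar = max;
    return count;
}

/* Bytes per character needed to store code points up to maxchar. */
static int kind_for_maxchar(uint32_t maxchar)
{
    if (maxchar <= 0xFF) {
        return 1;
    }
    if (maxchar <= 0xFFFF) {
        return 2;
    }
    return 4;
}
```

A test matrix built on this idea would pair a first string of each kind (for example ASCII, U+20AC, U+1F600) with a second string of each kind, because appending a character that needs a wider kind than the buffer currently has forces the buffer to be converted.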
Since this API will be used a lot to build Python Unicode objects from `wchar_t` input, I think it's better to optimize it and avoid creating a temporary object.
PyUnicode_FromWideChar() could be refactored to use a private helper shared by both PyUnicode_FromWideChar() and PyUnicodeWriter_WriteWideChar() to make this possible: https://github.com/python/cpython/blob/main/Objects/unicodeobject.c#L1794
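The refactoring idea can be sketched independently of CPython: one private helper converts `wchar_t` units into a destination buffer, and both the "create a new string" path and the "append to a writer" path call it, so the writer path needs no temporary object. A minimal toy sketch, assuming a 32-bit `wchar_t` (as on Linux and macOS, so each unit is one code point and no surrogate handling is done) and a toy writer that always stores 4-byte code points; `ToyWriter`, `copy_wide()`, `toy_from_wide()` and `toy_writer_write_wide()` are hypothetical names, not the code in `unicodeobject.c`:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Toy writer: a growable buffer of 4-byte code points. */
typedef struct {
    uint32_t *data;
    size_t len;
    size_t cap;
} ToyWriter;

/* Shared helper: copy `size` wchar_t units into `dest` as code points.
   Assumes 32-bit wchar_t, so each unit is one code point. */
static void copy_wide(uint32_t *dest, const wchar_t *src, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        dest[i] = (uint32_t)src[i];
    }
}

/* "Create a new string" path: allocate exactly, then use the helper. */
static uint32_t *toy_from_wide(const wchar_t *src, size_t size)
{
    uint32_t *buf = malloc((size ? size : 1) * sizeof(uint32_t));
    if (buf != NULL) {
        copy_wide(buf, src, size);
    }
    return buf;
}

/* "Append to a writer" path: grow the writer's buffer if needed and copy
   straight into it, with no temporary string object.
   Returns 0 on success, -1 on allocation failure. */
static int toy_writer_write_wide(ToyWriter *w, const wchar_t *src, size_t size)
{
    if (w->len + size > w->cap) {
        size_t new_cap = (w->len + size) * 2;
        uint32_t *p = realloc(w->data, new_cap * sizeof(uint32_t));
        if (p == NULL) {
            return -1;
        }
        w->data = p;
        w->cap = new_cap;
    }
    copy_wide(w->data + w->len, src, size);
    w->len += size;
    return 0;
}
```

The design point is that the conversion loop lives in one place; the writer path only differs in where the destination memory comes from.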
Ok. I optimized PyUnicodeWriter_WriteWideChar(). I ran a benchmark on `_testcapi.test_unicodewriter_widechar()`: it's 1.4x faster, so it's worth it. It saves around 53 ns for 3 calls to `PyUnicodeWriter_WriteWideChar()`.