-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cast bytearray to string #3707
Cast bytearray to string #3707
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is great, with a couple extra tests. I'm not too sure about my if (bytes)
worry. Let's see what other maintainers think.
tests/test_builtin_casters.py
Outdated
"""Tests the ability to pass bytearray to C++ string-accepting functions. Note that this is | ||
one-way: the only way to return bytes to Python is via the pybind11::bytearray class.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment is slightly misleading and this test is completely clear without the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, the correct term should be return bytearray to Python
. I think one-way is true that bytearray has to form explicitly with py::bytearray similarly as py::bytes? I can correct that or do you think both bytes/bytearray tests are self-explainable?
tests/test_builtin_casters.py
Outdated
def test_bytearray_to_string(): | ||
"""Tests the ability to pass bytearray to C++ string-accepting functions. Note that this is | ||
one-way: the only way to return bytes to Python is via the pybind11::bytearray class.""" | ||
# Issue #2799 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a PR, not an Issue. I think it's better to delete this, too. We have the git history with full context.
# Issue #2799 | ||
assert m.string_length(bytearray(b"Hi")) == 2 | ||
assert m.strlen(bytearray(b"bytearray")) == 9 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add tests with empty bytearray
s?
And with malformed utf-8, to prove that there really isn't any encoding (see e.g. malformed_utf8
in test_pytypes.py).
I looked at the Python C code underneath PyByteArray_AsString
, it special-cases empty arrays. But in any case, tests for corner cases are best practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, an empty bytearray test case will be added.
For malformed_utf8, I'm not sure it's the same case as casting to python str. In test_pytypes, it's casting a single byte to str, and str is not fulfilled with a utf-8 resulting "b'\x80'". While passing b"\x80" to C++ string it would be treated as a byte and has size 1. Is this the case you're looking for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The motivation/idea is very simple and high-level:
- We want to be sure this works for malformed utf-8.
We know the current implementation does. We want to be sure that's not accidentally broken somehow (e.g. refactoring).
My suggestion was based on past experience, where that actually happened.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point, the test is added. I'm trying to make sure assert m.string_length(bytearray(b"\x80")) == 1
is the behavior we're looking for 😊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it just be better to parameterize the bytes_to_str tests and have them be rerun with all the bytes args wrapped as bytearrays(via pytest fixture for instance? or by having a function that either returns the bytes or the bytes wrapped in a bytearray depending on test parameters)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Skylion007, I also came up wrapping up the params for better coverage. Considering there're at least 3 dimensions (bytearray vs bytes, string_lenth vs strlen, bytearray with/without params), and a few corner cases(empty bytearray), I would suggest the current intuitive version should give us similar coverage with simplicity. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
include/pybind11/cast.h
Outdated
// We were passed a bytearray; accept it into a std::string or char* | ||
// without any encoding attempt. | ||
const char *bytes = PyByteArray_AsString(src.ptr()); | ||
if (bytes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe (!) this, and the existing code above, are a bit awkward, because bytes
will always be true in this context, or something bad happened upstream (e.g. a bug in an alpha release of a new Python version), and this code will silently ignore it. Also, assuming bytes
could actually be legitimately false, do we need to call PyErr_Clear()
?
Maybe:
if (!bytes) pybind11_fail("Unexpected PyByteArray_AsString() failure.");
Or just remove the if
and let it segfault? Almost seems best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm uncertain if there will be a case bytes is nullptr/NULL. I agree with you we better shout out for such an error.
I'll wait for more comments for this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you added the checks, thank you!
This PR looks great to me. The only reason I'm holding off clicking the approve button is that I want to run this through our (Google's) global testing. It will probably take me a couple days to restore my ability to do that; I need to merge master into the smart_holder branch first, which will take some effort after the sweeping clang-format changes.
I'll get back here asap.
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This passed global testing! (2022-02-18 08:00 batch: OCL:429501428:BASE:429631186:1645218189257:b364f587)
The magic long number helps me keeping track, JIC there are questions later.
@kururu002 could you please add a suggestion for the changelog in the PR description, and also give some context for the reference to 2799 (just one terse sentence is fine)? |
Thank you for testing! I updated the description, not sure what the suggestion for the changelog is for, so I follow characteristics from other PRs. |
Description
A follow-up change for #2799 to complete bytearray functionalities.
Suggested changelog entry: