Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cast bytearray to string #3707

Merged
merged 12 commits into from
Feb 23, 2022
Merged

Cast bytearray to string #3707

merged 12 commits into from
Feb 23, 2022

Conversation

porrashuang
Copy link
Contributor

@porrashuang porrashuang commented Feb 9, 2022

Description

  • Support bytearray to string casting
  • Add test case
  • Rename load_bytes to load_raw

A follow-up change for #2799 to complete bytearray functionalities.

Suggested changelog entry:

Support bytearray casting to string

Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is great, with a couple extra tests. I'm not too sure about my if (bytes) worry. Let's see what other maintainers think.

Comment on lines 141 to 142
"""Tests the ability to pass bytearray to C++ string-accepting functions. Note that this is
one-way: the only way to return bytes to Python is via the pybind11::bytearray class."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is slightly misleading and this test is completely clear without the comment.

Copy link
Contributor Author

@porrashuang porrashuang Feb 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, the correct term should be return bytearray to Python. I think one-way is true that bytearray has to form explicitly with py::bytearray similarly as py::bytes? I can correct that or do you think both bytes/bytearray tests are self-explainable?

def test_bytearray_to_string():
"""Tests the ability to pass bytearray to C++ string-accepting functions. Note that this is
one-way: the only way to return bytes to Python is via the pybind11::bytearray class."""
# Issue #2799
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a PR, not an Issue. I think it's better to delete this, too. We have the git history with full context.

# Issue #2799
assert m.string_length(bytearray(b"Hi")) == 2
assert m.strlen(bytearray(b"bytearray")) == 9

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add tests with empty bytearrays?

And with malformed utf-8, to prove that there really isn't any encoding (see e.g. malformed_utf8 in test_pytypes.py).

I looked at the Python C code underneath PyByteArray_AsString, it special-cases empty arrays. But in any case, tests for corner cases are best practice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, an empty bytearray test case will be added.
For malformed_utf8, I'm not sure it's the same case as casting to python str. In test_pytypes, it's casting a single byte to str, and str is not fulfilled with a utf-8 resulting "b'\x80'". While passing b"\x80" to C++ string it would be treated as a byte and has size 1. Is this the case you're looking for?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The motivation/idea is very simple and high-level:

  • We want to be sure this works for malformed utf-8.

We know the current implementation does. We want to be sure that's not accidentally broken somehow (e.g. refactoring).

My suggestion was based on past experience, where that actually happened.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, the test is added. I'm trying to make sure assert m.string_length(bytearray(b"\x80")) == 1 is the behavior we're looking for 😊

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it just be better to parameterize the bytes_to_str tests and have them be rerun with all the bytes args wrapped as bytearrays(via pytest fixture for instance? or by having a function that either returns the bytes or the bytes wrapped in a bytearray depending on test parameters)

Copy link
Contributor Author

@porrashuang porrashuang Feb 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Skylion007, I also came up wrapping up the params for better coverage. Considering there're at least 3 dimensions (bytearray vs bytes, string_lenth vs strlen, bytearray with/without params), and a few corner cases(empty bytearray), I would suggest the current intuitive version should give us similar coverage with simplicity. What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

// We were passed a bytearray; accept it into a std::string or char*
// without any encoding attempt.
const char *bytes = PyByteArray_AsString(src.ptr());
if (bytes) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe (!) this, and the existing code above, are a bit awkward, because bytes will always be true in this context, or something bad happened upstream (e.g. a bug in an alpha release of a new Python version), and this code will silently ignore it. Also, assuming bytes could actually be legitimately false, do we need to call PyErr_Clear()?

Maybe:

    if (!bytes) pybind11_fail("Unexpected PyByteArray_AsString() failure.");

Or just remove the if and let it segfault? Almost seems best.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm uncertain if there will be a case bytes is nullptr/NULL. I agree with you we better shout out for such an error.
I'll wait for more comments for this one.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you added the checks, thank you!

This PR looks great to me. The only reason I'm holding off clicking the approve button is that I want to run this through our (Google's) global testing. It will probably take me a couple days to restore my ability to do that; I need to merge master into the smart_holder branch first, which will take some effort after the sweeping clang-format changes.

I'll get back here asap.

Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This passed global testing! (2022-02-18 08:00 batch: OCL:429501428:BASE:429631186:1645218189257:b364f587)

The magic long number helps me keeping track, JIC there are questions later.

@rwgk
Copy link
Collaborator

rwgk commented Feb 18, 2022

@kururu002 could you please add a suggestion for the changelog in the PR description, and also give some context for the reference to 2799 (just one terse sentence is fine)?

@porrashuang
Copy link
Contributor Author

@kururu002 could you please add a suggestion for the changelog in the PR description, and also give some context for the reference to 2799 (just one terse sentence is fine)?

Thank you for testing! I updated the description, not sure what the suggestion for the changelog is for, so I follow characteristics from other PRs.

@Skylion007 Skylion007 merged commit da15bb2 into pybind:master Feb 23, 2022
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Feb 23, 2022
@henryiii henryiii removed the needs changelog Possibly needs a changelog entry label Mar 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants