Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added ToPyObject and IntoPy<PyObject> impls for PyBackedStr. #4205

Merged
merged 12 commits into from
Jun 1, 2024

Conversation

JRRudy1
Copy link
Contributor

@JRRudy1 JRRudy1 commented May 24, 2024

I have a PyBackedStr that I extracted from a PyObject, and would like to pass it back to Python through another method call. However, PyBackedStr currently does not implement the conversion traits necessary to do so without cloning. This PR simply adds these conversion methods, which is trivial since the PyBackedStr is already storing a pointer to the Python string as a PyObject.

@JRRudy1 JRRudy1 marked this pull request as ready for review May 24, 2024 22:06
Copy link

codspeed-hq bot commented May 24, 2024

CodSpeed Performance Report

Merging #4205 will not alter performance

Comparing JRRudy1:pybackedstr-intopy (43fb3f7) with main (cb34737)

Summary

✅ 67 untouched benchmarks

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think this implementation makes complete sense and probably should have been added when I first added the type 🤦

I think PyBackedBytes could use the same treatment, perhaps?

Also, CI is failing.

…ls to the case where `cfg(any(Py_3_10, not(Py_LIMITED_API)))`. When this cfg does not apply, the conversion is less trivial since the `storage` is actually `PyBytes`, not `PyString`.
src/pybacked.rs Outdated
Comment on lines 89 to 96
#[cfg(any(Py_3_10, not(Py_LIMITED_API)))]
impl ToPyObject for PyBackedStr {
fn to_object(&self, py: Python<'_>) -> Py<PyAny> {
self.storage.clone_ref(py)
}
}

#[cfg(any(Py_3_10, not(Py_LIMITED_API)))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cfgs are unfortunate; I think it should still be possible to write the implementations even if it's a bit more work.

You are right to call out the complexity though; we should add some tests for these methods :)

@JRRudy1
Copy link
Contributor Author

JRRudy1 commented May 25, 2024

I actually just noticed that the conversion is not as trivial when cfg(not(any(Py_3_10, not(Py_LIMITED_API)))) since the backing PyString gets converted to a PyBytes when the PyBackedStr is created; this isn't necessarily a problem, but means the object passed back to Python would be bytes instead of str as expected. I just added attributes to limit these impls to the trivial case, but maybe this is not the best solution. I see three options:

  1. limit the impls as I just did
  2. allow the impls unconditionally, accepting that the Python object will be bytes in some cases
  3. flesh out the impls to handle the limited-API case by converting back to PyString; I would need some advice on the correct way to do this.

Regarding the CI, do you have any tips on why it would be failing and what I can do to fix that? The change seems too small to actually effect anything

@davidhewitt
Copy link
Member

https://github.com/PyO3/pyo3/actions/runs/9238604511/job/25416822459?pr=4205#step:7:23

It looks like cargo fmt will rearrange the imports.

@davidhewitt
Copy link
Member

3. flesh out the impls to handle the limited-API case by converting back to PyString; I would need some advice on the correct way to do this.

Something like PyString::new(py, self) probably works, because self as a PyBackedStr will deref to a str.

JRRudy1 added 2 commits May 25, 2024 17:40
…`cfg(not(any(Py_3_10, not(Py_LIMITED_API))))` case by converting the `PyBytes` back to `PyString`.
@JRRudy1
Copy link
Contributor Author

JRRudy1 commented May 25, 2024

Something like PyString::new(py, self) probably works, because self as a PyBackedStr will deref to a str.

I guess I was overthinking it because I was expecting to need some ffi wizardry like PyUnicode_FromString or something haha. But is my understanding correct that the limited-API case will involve copying the whole string, while the other case simply copies a reference to the original string?

I updated the impls and will look into adding some tests.

@JRRudy1
Copy link
Contributor Author

JRRudy1 commented May 26, 2024

I went ahead and added ToPyObject and IntoPy<PyObject> impls for PyBackedBytes as well, but it is a bit unclear whether the PyBackedBytesStorage::Rust variant should covert to PyBytes or PyByteArray. Mirroring the FromPyObject impl suggests PyByteArray so this is what I went with, but now I'm questioning whether it would be more intuitive to consistently return PyBytes in both cases. The data is getting copied either way so I don't see a technical reason to prefer one over the other.

What do you think @davidhewitt?

I also added a couple tests, but let me know if you have any suggestions to improve them.

JRRudy1 added 2 commits May 25, 2024 19:22
…oduce `PyBytes` regardless of the backing variant. Updated tests to demonstrate this.
@JRRudy1
Copy link
Contributor Author

JRRudy1 commented May 29, 2024

I changed my mind and updated the PyBackedBytes conversion impls to return a Python bytes object regardless of the backing storage type. My logic is that the storage type is an implementation detail, and the user shouldn't have to know what it is in order to predict the resulting Python type. The Rust type is called "PyBackedBytes", so intuitively the Python type should be bytes as well. The only weird aspect is that the round-trip conversion would be PyByteArray -> PyBackedBytes -> PyBytes instead of returning to PyByteArray where it started, but I think that's a worthy tradeoff.

JRRudy1 added 2 commits May 29, 2024 12:47
…BackedStr` instead of `&str` since the latter is not supported under some `cfg`'s.
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks great to me!

Overall I agree that getting bytes out in all cases is most user-friendly. They can always create a bytearray manually when needed.

Just one bit which I think needs fixing, then let's merge 👍

src/pybacked.rs Outdated Show resolved Hide resolved
src/pybacked.rs Outdated Show resolved Hide resolved
… both storage types as intended. Updated test to properly catch the error.
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super, LGTM! Thanks 👍

@davidhewitt davidhewitt enabled auto-merge May 31, 2024 15:01
@JRRudy1
Copy link
Contributor Author

JRRudy1 commented Jun 1, 2024

I think one of the runners crashed and the checks need to be restarted

@davidhewitt davidhewitt added this pull request to the merge queue Jun 1, 2024
Merged via the queue into PyO3:main with commit 5d47c4a Jun 1, 2024
41 checks passed
@JRRudy1 JRRudy1 deleted the pybackedstr-intopy branch June 2, 2024 00:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants