Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable writing to python stdio streams #3920

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

goulart-paul
Copy link

Enables creation of handles for printing directly to python sys.stdout and sys.stderr. This is a (partial) fix for #2247.

Usually calling Rust's println!result's in output appearing in the python interpreter. However, in some cases this fails (particularly in some, but not all, Jupyter notebooks and on Google Colab) because the Rust std::io::stdout and std::io::stderr streams are not redirected to match Python's sys.stdout and sys.stderr.

This does not directly fix the problem with println!, but instead enables printing via
writeln!(pyo3::stdio::stdout(),...)

I have not written a unit test for this because it's unclear what such a test should do other than not crash. I don't see an obvious way of checking via a unit test whether text piped to a python stream actually appears.

@davidhewitt
Copy link
Member

Thanks for opening this PR! I will do my best to review tomorrow.

src/stdio.rs Outdated Show resolved Hide resolved
src/stdio.rs Outdated Show resolved Hide resolved
src/stdio.rs Outdated Show resolved Hide resolved
@davidhewitt
Copy link
Member

Thanks @adamreichold for picking up the review 👍

As well as the above, I think tests are definitely worth adding here. It would be possible to achieve that by temporarily assigning the Python sys.stdout / sys.stderr streams to io.StringIO objects (or similar), use these APIs to write to them, and validate the final contents.

src/stdio.rs Outdated Show resolved Hide resolved
src/stdio.rs Outdated


struct PyStdio<T: PyStdioRawConfig> {
inner: LineWriter<PyStdioRaw<T>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need the Rust-side line buffering here? From the Python documentation

When interactive, the stdout stream is line-buffered. Otherwise, it is block-buffered like regular text files. The stderr stream is line-buffered in both cases. You can make both streams unbuffered by passing the -u command-line option or setting the PYTHONUNBUFFERED environment variable.

I would infer that the line buffering is happening inside the sys.stdout/err Python objects (if it is desired/enabled) to which PySys_WriteStdout/err eventually defer. Meaning that we actually should not make a buffering decision here and also defer to whatever these Python objects decide for the buffer strategy.

Finally, this also makes me wonder whether we should go through the formatting machinery of PySys_WriteStdout/err at all instead of calling write on pystream same as we call flush. The code at

https://github.com/python/cpython/blob/5dc8c84d397110f9edfa56793ad8887b1f176d79/Python/sysmodule.c#L3895

at least does not seem to do anything more special than what we could do directly if intern! is used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least does not seem to do anything more special than what we could do directly if intern! is used.

And we could avoid the cost of the runtime formatting machinery entirely, just passing the byte slice directly to write.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented it with the LineWrite wrapper because when I tested it I seemed to be getting output immediately from every call to write!(stream,...) from within Rust, rather then getting full lines, i.e. seemingly no buffering was happening. I'm happy to remove it though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry that this is coming so piece-meal, but maybe we just want a public wrapper type that will adapt any Py<PyAny> as a Write impl by calling its write and flush methods, e.g.

pub struct PyWrite(Py<PyAny>);

impl PyWrite {
  pub fn new(ob: Py<PyAny>) -> Self;
}

impl Write for PyWrite { .. }

pub fn stdout(py: Python<'_>) -> PyResult<PyWrite> {
   let module = PyModule::import_bound(py, "sys")?;
   let stdout = module.getattr("stdout")?.into();
   PyWrite::new(stdout)
},

pub fn stderr() -> PyWrite;

(I am also wondering whether we should have a variant storing Bound<'py, PyAny> to avoid repeatedly calling with_gil in the Write impl. It should be easy enough to convert between PyWrite and PyWriteBound.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because when I tested it I seemed to be getting output immediately from every call to write!(stream,...) from within Rust

I think this depends on under what conditions you tested it, but in any case, the decision for buffering or not just be with the stream stored at sys.stdout/err, not with our wrapper of it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're happy for me to do so, I could add a buffered method to create a LineWriter wrapper as before, i.e.

impl PyWriter {
    pub fn buffered(&self) -> LineWriter<PyWriter> {
        LineWriter::new(self)
    }
}

Or, alternatively, an unbuffered that works the other way around. Fine if not though, since it's easy enough to implement such a feature myself as a user.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're happy for me to do so, I could add a buffered method to create a LineWriter wrapper as before, i.e.

We can certainly add a convenience method to add optional buffering. I would suggest calling it line_buffered though as it is a relatively specific buffering strategy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If pyo3::stdio::stdout() were to create something implementing std::fmt::Write instead, then any use of writeln! requires different use declarations depending on context. It also looks as if std::io::Write is used throughout std::io, so doing something like reading from a file and printing to stdout would be awkward if I want to use std::io::copy.

I have to think about this but I am not really convinced as sys.stdout is not the same the Unix stdout on which Rust's std::io::stdout module is operatring, it is defined as a text stream and not a byte stream. I think if you want to expose an impl of std::io::Write, you should use sys.stdout.buffer as the backing stream and not sys.stdout. But note that this will not always be available. (Silently corrupting the written data using from_utf8_lossy is not option IMHO as it is just too surprising if e.g. std::io::copy produces garbage ZIP archives for that reason.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a solution would be to implement new for PyWriterBound, and to test at the point of object creation whether the inner PyAny is an instance of either io.BufferedIOBase or io.TextIOBase. Subsequent write behaviour could then be configured accordingly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subsequent write behaviour could then be configured accordingly.

But what would that be? I don't think we can support io::Write using io.TextIOBase. (The other way around works, i.e. fmt::Write can write into io.BufferedIOBase by calling as_bytes, but it would need to do that explicitly to avoid errors on the Python side.)

So personally, I think we should

  • Provide generic wrappers for io::Write and fmt::Write, or since you can here with a specific use case in mind, at least for fmt::Write.
  • Provide constructor functions like io::stdout (accessing sys.stdout.buffer) and fmt::stdout (accessing sys.stdout) which only succeed if the correct Python writer is present.

@goulart-paul
Copy link
Author

goulart-paul commented Mar 3, 2024

Thanks @adamreichold for picking up the review 👍

As well as the above, I think tests are definitely worth adding here. It would be possible to achieve that by temporarily assigning the Python sys.stdout / sys.stderr streams to io.StringIO objects (or similar), use these APIs to write to them, and validate the final contents.

Good idea. I had not thought to redirect the python side streams someplace else for inspection. I will give this a try.

Edit : unit test added.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants