RFC: io and os reform: initial skeleton #517
I will have to read this in more detail tomorrow, but I just wanted to mention that it seems that adding |
// these all return partial results on error
fn read_to_end(&mut self) -> NonatomicResult<Vec<u8>, Vec<u8>, Err> { ... }
fn read_to_string(&self) -> NonatomicResult<String, Vec<u8>, Err> { ... }
fn read_at_least(&mut self, min: uint, buf: &mut [u8]) -> NonatomicResult<uint, uint, Err> { ... }
It's never been totally clear to me what the exact use case for this is. Is this method ever called with `min` not equal to `buf.len()`?
Conceptually I've considered this in terms of a buffered reader. For example if you ask for 10 bytes from a buffered reader, the buffered reader can pass its entire buffer to the underlying reader, but request that only 10 bytes be actually read. In that sense I think it's a bit of a performance optimization where you're willing to accept a huge amount of bytes but only require a few.
I don't think this is implemented or used much in practice though, so the benefit may be fairly negligible to have the extra argument.
Actually requesting only 10 bytes sounds different than what this function's name describes. For the "only read 10 bytes" case, I'd expect one would pass a `buffer.slice_to(10)` to `read` (well, some form of `read` that always reads the amount requested).
In that sense I think it's a bit of a performance optimization
I don't quite understand what kind of performance gain you expect.
- Reducing the number of `read()`-like calls against `BufferedReader`, or
- reducing the number of `read()` calls by `BufferedReader` against the underlying "real" stream such as `File` or `TcpStream`?
The former will only save a negligible number of nanoseconds (if any) because `BufferedReader::read()` etc. are memory operations in user space. The latter is a matter of tuning internal parameters of `BufferedReader`.
Did I miss anything? My understanding of its behavior is:
let mut b = [0u8, .. 30];
let res = r.read_at_least(10, b.slice_to_mut(20));
`res` can be any of `Ok(10)`, `Ok(15)`, `Ok(20)`, or `Err(PartialResult(5, EndOfFile))`. It will be tedious to handle the contents of `b` differently depending on the `Ok()` value.
I can't think of any practical usages of `read_at_least()`. @alexcrichton How would you use it in, say, your `tar` code?
In the case of a buffered reader, I would consider it a performance optimization in terms of the number of reads of the underlying stream. The buffered reader can pass down a very large buffer but only request that a tiny part gets filled, and if more than that is filled in then it results in, for example, fewer syscalls (in theory).
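The semantics being debated can be sketched against today's `std::io::Read` trait. This is only an illustration under stated assumptions: the free function `read_at_least` and its `(usize, io::Error)` error pair are hypothetical stand-ins for the proposed method and for `NonatomicResult`, and end of stream is detected via `Ok(0)` as in current Rust rather than via an `EndOfFile` error.

```rust
use std::io::{self, Read};

/// Reads until at least `min` bytes are in `buf`, accepting up to `buf.len()`.
/// On failure, returns the number of bytes read so far together with the error,
/// mimicking the partial-result idea behind `NonatomicResult` (hypothetical shape).
fn read_at_least<R: Read>(
    r: &mut R,
    min: usize,
    buf: &mut [u8],
) -> Result<usize, (usize, io::Error)> {
    assert!(min <= buf.len(), "min must fit in the buffer");
    let mut filled = 0;
    while filled < min {
        // Offer the whole remaining buffer, so one call may return more than `min`.
        match r.read(&mut buf[filled..]) {
            // In today's std::io, Ok(0) signals end of stream.
            Ok(0) => {
                let eof = io::Error::new(io::ErrorKind::UnexpectedEof, "stream ended early");
                return Err((filled, eof));
            }
            Ok(n) => filled += n,
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err((filled, e)),
        }
    }
    Ok(filled)
}
```

A caller that asks for 10 bytes with a 20-byte buffer may get back any count from 10 to 20, which is the "accept a lot, require a little" behavior described above.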
@alexcrichton The current implementation of `BufferedReader` simply calls `read()` with its entire internal buffer. Then it should suffice to have a simpler convenience method like the one below in `Reader`, because it will be "inherited" by `BufferedReader`:
fn read_exact(&mut self, buf: &mut [u8]) -> NonatomicResult<(), uint, Err> {
    let mut read_so_far = 0;
    while read_so_far < buf.len() {
        match self.read(buf.slice_from_mut(read_so_far)) {
            Ok(n) => read_so_far += n,
            Err(e) => return NonatomicResult(read_so_far, e)
        }
    }
    Ok(())
}
(cf. PR rust-lang/rust#18059)
Deadlined { deadline: deadline, inner: inner }
}

pub fn deadline(&self) -> u64 {
s/u64/Duration/
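As a rough illustration of that substitution, the accessor could return a typed `Duration` instead of a bare integer. Everything beyond the `Deadlined` name and its two fields is an assumption here: the `new` and `into_inner` helpers are hypothetical, and the sketch uses today's `std::time::Duration`, not the type that existed when the RFC was written.

```rust
use std::time::Duration;

/// Hypothetical wrapper pairing an I/O object with a deadline.
struct Deadlined<T> {
    deadline: Duration,
    inner: T,
}

impl<T> Deadlined<T> {
    /// Constructor name and argument order are assumptions for illustration.
    fn new(inner: T, deadline: Duration) -> Deadlined<T> {
        Deadlined { deadline, inner }
    }

    /// Returns the deadline as a `Duration`, so the unit travels with the value.
    fn deadline(&self) -> Duration {
        self.deadline
    }

    /// Gives back the wrapped object.
    fn into_inner(self) -> T {
        self.inner
    }
}
```

With a `u64` the caller has to know whether the number means milliseconds or nanoseconds; with `Duration` that question does not arise.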
and `read_char` is removed in favor of the `chars` iterator. These
iterators will be changed to yield `NonatomicResult` values.

The `BufferedReader`, `BufferedWriter` and `BufferedStream` types stay
Are these being renamed? If not, they'll have to live in a different module from the `BufferedReader` trait, right?
It might also be worth thinking about whether we want to keep `BufferedStream::with_capacities` as is, or have it take a single size that's used for both buffers. I'm not really sure if anyone wants different buffer sizes for readers and writers.
Argh! The renaming to `BufferedReader` came late and I didn't catch this.
I'm not particularly happy with the trait name. @alexcrichton prefers `Buffered`, but I worry that we may eventually want something for buffered writers.
Suggestions welcome.
re: `with_capacities`, I agree; we could simplify for now, and add back this functionality later if needed.
I'm a bit worried about the timeout changes making some uses of the current infrastructure impossible, or maybe just painful/awkward. For example, rust-postgres provides an iterator over asynchronous notifications sent from the database. A method defined on the iterator is
The current setup works fine, if a bit awkwardly: https://github.com/sfackler/rust-postgres/blob/39ad5ff651199287e92aa65ec771267c2f54ea8b/src/message.rs#L279-L285
With the new infrastructure, it'll still be possible to take the same strategy, but probably through some kind of gross hackery like reading the first byte with a timeout, and then passing that byte to the main message read function without the timeout.
What would really be ideal is to have the ability to wait on the socket for data to be ready to read for a certain period of time. Is something like that feasible to implement before 1.0 in a cross-platform manner?
I don’t understand why this is undefined behavior and |
I like the explanation linked there. Good find. 👍 |
impl OsStr {
    pub fn from_str(value: &str) -> &OsStr;
    pub fn as_str(&self) -> Option<&str>;
Should this be `-> Result<&str, ()>`?
Or as a larger point (perhaps out of scope for this RFC), should `Option<T>` return types be `Result<T, ()>` instead when `None` kind of represents an error, in order to interoperate with `try!` and other error-handling infrastructure we might add?
I recently changed `from_utf8` to return `Result<&str, Utf8Error>`, so this can probably pick up that error. I suspect this will probably just continue to return the same value as `str::from_utf8`.
I do think that in general `Option` should only be used where `None` is a normal value, not an error (to use with `try!` as you pointed out). In the second pass of stabilization we're going to look closely at all this.
A string not being UTF-8 or a file not existing aren't any more of an error than a key not existing in a map. Attempting to open a file or parse text is also a way to discover if what you were looking for was there, just like a map lookup. There are few remaining use cases for `Option` if it's not meant to be used this way... any missing value can be considered an error, just like a missing file / whatever.
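The trade-off in this exchange can be seen with today's standard library, where `str::from_utf8` returns `Result<&str, Utf8Error>` and `OsStr::to_str` (the released counterpart of the `as_str` in the diff) returns `Option<&str>`. The sketch below is illustrative only: the helper names `decode` and `os_str_len` are hypothetical, and `?` stands in for the `try!` macro mentioned above.

```rust
use std::ffi::OsStr;
use std::str::{self, Utf8Error};

/// `str::from_utf8` reports why decoding failed through `Utf8Error`,
/// so it composes directly with `?`.
fn decode(bytes: &[u8]) -> Result<&str, Utf8Error> {
    str::from_utf8(bytes)
}

/// `OsStr::to_str` yields an `Option`, so the caller has to pick an error
/// value (here a plain message) before `?` can be used in a `Result` function.
fn os_str_len(s: &OsStr) -> Result<usize, &'static str> {
    let text = s.to_str().ok_or("not valid Unicode")?;
    Ok(text.len())
}
```

The `ok_or` bridge is cheap, which supports the "missing value is not necessarily an error" view; the `Result` form carries more information, which supports the other side.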
Currently Rust |
In addition, `read_line` is removed in favor of the `lines` iterator,
and `read_char` is removed in favor of the `chars` iterator (now on
These methods are occasionally very useful when you don't need to read the entire stream but only a few lines or characters. I have several of these in my code base. The supposed replacement
let line = r.lines().next().unwrap()
doesn't look really good.
Also, why is `chars()` on `Reader`? Doesn't reading characters require buffering?
Yes we were thinking that the code you listed would be the replacement for a bare `read_line`. In general we're trying to move as much functionality to iterators as possible, and we could possibly add some form of method on an iterator which peels off the first element, failing if it's `None`, if this becomes too unergonomic.
The current implementation for `chars()` doesn't actually use buffering at all, it just peeks at a byte and then might read some more bytes. We thought that if we're exposing `bytes()` on `Reader`, which is not speedy unless buffered, then we may as well expose `chars()` as well.
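The "read one byte, then possibly a few more" approach described above can be sketched against today's `std::io::Read`. This is not the implementation under discussion: the function `read_char` below is a hypothetical helper, and it relies on the fact that a Rust `char` encodes to at most 4 bytes in UTF-8, so a small stack array is enough and no buffered reader is required.

```rust
use std::io::{self, Read};

/// Reads one UTF-8 encoded `char` from `r`, returning `Ok(None)` at end of stream.
fn read_char<R: Read>(r: &mut R) -> io::Result<Option<char>> {
    let mut buf = [0u8; 4]; // a scalar value needs at most 4 bytes in UTF-8
    // Read the leading byte; zero bytes read means a clean end of stream.
    if r.read(&mut buf[..1])? == 0 {
        return Ok(None);
    }
    // The leading byte determines the total length of the sequence.
    let len = match buf[0] {
        0x00..=0x7F => 1,
        0xC0..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF7 => 4,
        _ => {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "invalid UTF-8 lead byte",
            ))
        }
    };
    // Fetch the continuation bytes (none for ASCII).
    r.read_exact(&mut buf[1..len])?;
    // Let the standard library validate the complete sequence.
    let s = std::str::from_utf8(&buf[..len])
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    Ok(s.chars().next())
}
```

On an unbuffered source each character costs one or two `read` calls, which is the same performance caveat raised for `bytes()`.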
@netvl Note that the `.unwrap()` part (or something like `try!()`) is necessary in either case. I don't think
let first = r.lines().next().unwrap();
let second = r.lines().next().unwrap();
or
let mut lines = r.lines();
let first = lines.next().unwrap();
let second = lines.next().unwrap();
looks that much worse than
let first = r.read_line().unwrap();
let second = r.read_line().unwrap();
As for `chars()`, any Unicode character in UTF-8 occupies at most 4 bytes (6 under the original, pre-RFC 3629 definition). So `read()` into a fixed array on the stack will suffice.
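For comparison, here is how the two styles weighed above look with today's `std::io::BufRead`, where both `lines()` and `read_line` ended up being available. The helper names are hypothetical, and `?` replaces the `.unwrap()` calls from the snippets; note that in current Rust `read_line` appends to a caller-supplied `String` and keeps the trailing newline, unlike the method discussed in this thread.

```rust
use std::io::{self, BufRead};

/// Takes the first two lines through the `lines()` iterator (newlines stripped).
fn first_two_via_lines<R: BufRead>(r: R) -> io::Result<(String, String)> {
    let mut lines = r.lines();
    let eof = || io::Error::new(io::ErrorKind::UnexpectedEof, "missing line");
    // The first `?` handles a missing line, the second an I/O error.
    let first = lines.next().ok_or_else(eof)??;
    let second = lines.next().ok_or_else(eof)??;
    Ok((first, second))
}

/// Does the same with `read_line`, which keeps the trailing newline.
fn first_two_via_read_line<R: BufRead>(mut r: R) -> io::Result<(String, String)> {
    let mut first = String::new();
    r.read_line(&mut first)?;
    let mut second = String::new();
    r.read_line(&mut second)?;
    Ok((first, second))
}
```

Either way the caller has to deal with both "no more input" and "I/O failure", which is the point made above about `.unwrap()` being needed in both versions.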
RFC discusses the most significant problems below.

This section only covers specific problems with the current library; see
[Vision for IO] for a higher-level view. section.
...for a higher-level view. section.
Typo?
The |
@aturon given that the eventual goal is to have both blocking and nonblocking implementations of the Summary: Long Story: Option 1: Call the APIs
Option 2: Call the APIs
Option 3: Call the APIs
I would be happy with either Option 2 or 3, preferring 3. Option 1 just leaves me a little uneasy.. |
@yazaddaruvala I think it's more practical to have |
I don't understand this @nodakai. |
@tshepang I meant re-export such as |
@nodakai I understand the desire to have a "default" io, especially given that its currently the only io. But is it really the right long term philosophy? If the only current difference is I've only played around with Nodejs a little, and some of its ideas are definitely controversial, but I think everyone can agree the one thing it did really well was educate all of its users about the difference between blocking and nonblocking io. And yes you could achieve this through documentation.. but similar to immutable by default, syntax/explicitness (at minimal cost) is the best way to educate people. |
I've actually been doing a lot of experimentation and work on non-blocking IO and I can say that I do think that blocking IO is a better default model for Rust. It fits much better in with the borrow system for resource management, since the usage of all resources is deterministic relative to the structure of the code, whereas this is not true at all for asynchronous actions. Additionally, until (if?) Rust has true The non-clean, extremely low-level bindings for asynchronous IO already exist in the form of mio, and I really think there is no need to integrate these things into |
|
I agree with what several others have said here: I think |
I agree, too. |
An amendment for |
Regarding
impl<T> Vec<T> where T: Copy {
    pub unsafe fn fill_more<F, E>(&mut self, len: usize, op: F) -> Result<usize, E>
        where F: FnOnce(&mut [T]) -> Result<usize, E>
    { ... }
}
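One way to picture how such a `fill_more` might behave is the safe variant below. It is an assumption-laden sketch, not the proposed API: it is a free function instead of an inherent `unsafe` method, and it adds a `Default` bound so the new tail can be initialized before `op` sees it, which avoids exposing uninitialized memory at the cost of an extra write pass.

```rust
/// Safe variant (an assumption, not the proposed API): grows `v` by up to `len`
/// default-initialized elements, lets `op` fill them, and keeps only as many
/// as `op` reports having written.
fn fill_more<T, F, E>(v: &mut Vec<T>, len: usize, op: F) -> Result<usize, E>
where
    T: Copy + Default,
    F: FnOnce(&mut [T]) -> Result<usize, E>,
{
    let start = v.len();
    v.resize(start + len, T::default());
    match op(&mut v[start..]) {
        Ok(n) => {
            // Clamp in case `op` claims more than it was given.
            let n = n.min(len);
            v.truncate(start + n);
            Ok(n)
        }
        Err(e) => {
            // Leave the vector as it was before the call.
            v.truncate(start);
            Err(e)
        }
    }
}
```

Because the closure shape matches `Read::read`, a call such as `fill_more(&mut v, 4096, |buf| reader.read(buf))` appends whatever a single read produced, with `reader` being any `std::io::Read` value.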
@mzabaluev: Or maybe |
…nstructors, r=alexcrichton `std::io` does not currently expose the `stdin_raw`, `stdout_raw`, or `stderr_raw` functions. According to the current plans for stdio (see rust-lang/rfcs#517), raw access will likely be provided using the platform-specific `std::os::{unix,windows}` modules. At the moment we don't expose any way to do this. As such, delete all mention of the `*_raw` functions from the `stdin`/`stdout`/`stderr` function documentation. While we're at it, remove a few `pub`s from items that aren't exposed. This is done just to lessen the confusion experienced by anyone who looks at the source in an attempt to find the `*_raw` functions.
This RFC proposes a significant redesign of the `std::io` and `std::os` modules in preparation for API stabilization. The specific problems addressed by the redesign are given in the Problems section below, and the key ideas of the design are given in Vision for IO.
Note about RFC structure
This RFC was originally posted as a single monolithic file, which made it difficult to discuss different parts separately.
It has now been split into a skeleton that covers (1) the problem statement, (2) the overall vision and organization, and (3) the `std::os` module. Other parts of the RFC are marked with `(stub)` and will be filed as follow-up PRs against this RFC.