
add try fill #18059

Closed

wants to merge 1 commit into from

Conversation

@mahkoh
Contributor

mahkoh commented Oct 15, 2014

A more useful version of read_at_least.
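For context, here is a minimal sketch of the semantics being proposed, written against the modern std::io::Read trait for readability (the original patch targeted the pre-1.0 Reader trait, where EOF was reported as an error rather than as Ok(0)); the name try_fill comes from the patch, everything else is illustrative:

```rust
use std::io::{self, Read};

/// Keep reading until the buffer is full, a zero-length read occurs, or
/// EOF is reached; return the total number of bytes read. Any non-EOF
/// error aborts the fill and is returned instead.
fn try_fill<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut total = 0;
    while total < buf.len() {
        match reader.read(&mut buf[total..]) {
            // With the modern trait, Ok(0) covers both EOF and the
            // zero-length reads debated below; stop in either case.
            Ok(0) => break,
            Ok(n) => total += n,
            Err(e) => return Err(e),
        }
    }
    Ok(total)
}
```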

@sfackler
Member

Why does the behavior of this differ from read_at_least with respect to 0-byte reads?

@mahkoh
Contributor Author

mahkoh commented Oct 15, 2014

It is not clear to me in which situations a Reader returns 0. This is the conservative choice that will always perform well.

@alexcrichton
Member

Can you provide some rationale for why this method should be added? "A more useful version of read_at_least" isn't very useful. I also agree with @sfackler that this is inconsistent with the way 0-byte reads are handled elsewhere throughout the Reader trait, which is worrisome. To land, this will also require some tests; can you add some?
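To illustrate the kind of test being asked for, here is a hypothetical pair of unit tests against the try_fill sketch above; OneByteReader and the test names are invented for this example:

```rust
use std::io::{self, Read};

// A deliberately awkward reader that yields at most one byte per call,
// exercising the short-read loop in `try_fill`.
struct OneByteReader<'a>(&'a [u8]);

impl<'a> Read for OneByteReader<'a> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.0.is_empty() || buf.is_empty() {
            return Ok(0); // EOF (or an empty destination buffer)
        }
        buf[0] = self.0[0];
        self.0 = &self.0[1..];
        Ok(1)
    }
}

#[test]
fn try_fill_survives_short_reads() {
    let mut rdr = OneByteReader(b"hello world");
    let mut buf = [0u8; 5];
    assert_eq!(try_fill(&mut rdr, &mut buf).unwrap(), 5);
    assert_eq!(&buf, b"hello");
}

#[test]
fn try_fill_stops_at_eof() {
    let mut rdr = OneByteReader(b"hi");
    let mut buf = [0u8; 5];
    assert_eq!(try_fill(&mut rdr, &mut buf).unwrap(), 2);
}
```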

@sfackler
Member

I'm also not sure why this is better than just calling read. Either way you have to be looping over it.

@mahkoh
Contributor Author

mahkoh commented Oct 15, 2014

Why do I have to loop?

@sfackler
Member

It seems like the point of this is intended to be to avoid dealing with short reads, but you can still end up with a short read if read ever returns 0. Why would one use this instead of rdr.read_at_least(buf, buf.len())?

@sfackler
Member

Another way of thinking about this is that the semantics of read_fill are identical to read, except that there's a higher probability of getting a full buffer.

@mahkoh
Contributor Author

mahkoh commented Oct 15, 2014

read_at_least is useless for any value of min except 1 because it throws the buffer away if it can't fill it or encounters EOF. It is also useless if you want performant code that doesn't make a thousand unnecessary function calls when read returns 0.

The behavior of try_fill is clearly documented, which means you should not use it with Readers that can return 0; and no sensible Reader should ever return 0.

@mahkoh
Contributor Author

mahkoh commented Oct 15, 2014

Another way of thinking about this is that the semantics of read_fill are identical to read, except that there's a higher probability of getting a full buffer.

There is a zero probability of getting a full buffer if you read from stdin.

@mahkoh
Contributor Author

mahkoh commented Oct 15, 2014

cc @kballard

@lilyball
Contributor

There is a zero probability of getting a full buffer if you read from stdin.

Only true if stdin is an interactive terminal and is being driven by a human (as opposed to, say, expect).

In any case, the only utility here is "a higher chance of getting a full buffer". Can you provide any compelling reason why that's useful enough to be in the standard library? If a client needs a certain number of bytes in order to process, they can use read_at_least(). If a client doesn't need that many bytes, then what benefit does using try_fill() have over simply handling the output of read()?

As for being a "more useful version of read_at_least", that's not true at all. It doesn't serve the same purpose. read_at_least allows clients to avoid having to deal with 0-length reads, and it also does all the work of dealing with bad readers that return 0 indefinitely. try_fill() does neither.

@lilyball
Contributor

Additionally, the behavior of try_fill() around 0-length reads is not very good. Why terminate when receiving a 0-length read? 0-length reads should be very uncommon, but they're certainly not errors. As an example, in Go, zero-length reads were returned sometimes from the TLS implementation, as part of their workaround for BEAST. Where OpenSSL took the approach of sending a 1-byte packet followed by an n-1-byte packet, Go's TLS implementation sent a 0-byte packet followed by an n-byte packet. Both approaches prevent the BEAST attack, but Go's approach resulted in TLS sockets sometimes producing 0-length reads.

Basically, a 0-length read does not mean that clients should stop processing anything. If your goal is "fill a buffer as much as possible", then you should read until you encounter an error. A 0-length read is meaningless. However, you should also be doing this reading using read_at_least(1), to protect against badly-behaving readers that return 0 indefinitely.


However, all that ends up being irrelevant, because this approach is fundamentally flawed. std::io provides a synchronous I/O interface, so issuing a second read after the first one returns is highly likely to block. If a Reader has 10 bytes buffered, and you issue a read() with a 1024-byte buffer, it will give you 10 bytes. If you then reissue a read() with the remaining 1014-byte buffer, it will block until it has another byte to give you (or until various implementation details cause a 0-length read, such as the TLS example given above). Surely the client of try_fill() does not want any blocking if any bytes have already been read.

In light of that, a function like try_fill() only makes sense if there exists an API for doing a non-blocking read. As std::io has no such API, try_fill() ends up being an unsuitable API. And of course if we did have a non-blocking read API, we'd still be back at the original challenge, which is providing a reason why try_fill() is worth having in the standard library.
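To make the blocking concern concrete, here is a hypothetical illustration reusing the try_fill sketch above (fill_from_socket is invented for this example):

```rust
use std::io;
use std::net::TcpStream;

// Suppose the peer has sent only 10 bytes so far. The first inner `read`
// returns those 10 bytes immediately, but `try_fill` keeps going: the
// next `read` blocks until the peer sends more data or closes the
// connection, even though the caller already had bytes in hand.
fn fill_from_socket(stream: &mut TcpStream) -> io::Result<usize> {
    let mut buf = [0u8; 1024];
    try_fill(stream, &mut buf)
}
```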

@mahkoh
Contributor Author

mahkoh commented Oct 15, 2014

There is no justification given for Readers ever returning 0. Such a Reader is broken and should be fixed.

If a client doesn't need that many bytes, then what benefit does using try_fill() have over simply handling the output of read()?

Algorithms that work on blocks of a certain size and pad only the last block will need a full buffer.

It doesn't serve the same purpose.

I'm not sure what purpose read_at_least serves at all. A Reader that returns 0 is broken and even if you want to handle such Readers, there is no need for the min argument.
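The block-oriented use case mentioned above might look roughly like this, again assuming the try_fill sketch from the top of the thread (process_blocks, handle_block, and the 16-byte block size are all invented for illustration):

```rust
use std::io::{self, Read};

fn handle_block(_block: &[u8; 16]) {
    // placeholder: feed the block to a block cipher, hash, etc.
}

// Consume the stream in fixed-size blocks, padding only the final short
// block with zeros.
fn process_blocks<R: Read>(mut reader: R) -> io::Result<()> {
    let mut block = [0u8; 16];
    loop {
        let n = try_fill(&mut reader, &mut block)?;
        if n == block.len() {
            handle_block(&block); // full block
        } else {
            if n > 0 {
                block[n..].fill(0); // pad the last, short block
                handle_block(&block);
            }
            return Ok(()); // EOF reached
        }
    }
}
```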

@mahkoh
Contributor Author

mahkoh commented Oct 15, 2014

BEAST

I don't know anything about that, and I doubt it justifies an API design that requires working around zero-length reads most of the time. In your workaround, you do 1000 unnecessary function calls in the worst case. Where is the justification for that? POSIX seems to do fine without 0-sized reads.

Surely the client of try_fill() does not want any blocking if any bytes have already been read.

I'm sure the client does, because it is the purpose of try_fill to fill the complete buffer unless we're at EOF (ignoring 0-length reads for the moment).

In light of that, a function like try_fill() only makes sense if there exists an API for doing a non-blocking read.

Using try_fill for non-blocking reads would make no sense since try_fill would return an error as soon as there are no more bytes available. This is not the purpose of try_fill.

///
/// # Error
///
/// If an error different from EOF occurs at any point, that error is returned, and no
/// further bytes are read.

Member

What happens if EOF occurs? It's not returned, and further bytes are read??

Contributor Author

If read returns 0 or EOF, the total number of bytes read will be returned.

I guess I'll reformulate that a bit:

If read returns 0 or EOF, the total number of bytes read is returned immediately.

@mahkoh
Contributor Author

mahkoh commented Oct 15, 2014

In other words: just like read_at_least is supposed to shield clients from having to deal with 0-sized reads, try_fill is supposed to shield clients from Readers that don't have all of their data available immediately (stdin, network, ...). The difference is that there is no justification for Readers ever returning 0. I will open an issue regarding this shortly.

After reading your Go example again, I'm quite confused. You're talking about how Go created packets with a 0-sized payload, but this has nothing to do with Readers. It is very unlikely that a Rust implementation dealing with such packets would use a Reader to forward the payload to the user.

@nodakai
Contributor

nodakai commented Oct 17, 2014

read_at_least() is a weird API. br.read_at_least(n, vec.slice_to_mut(m)) is pretty much saying "Give me at least n bytes, and if you feel generous, you can give me more. But please don't be too generous, I'm not ready to receive more than m bytes." Given that BufferedReader is... well... a buffered reader, such a nondeterministic API design benefits nobody (it could have been meaningful if it were a real system call and someone wanted to reduce the number of kernel/user context switches). I'll always say "give me exactly m bytes, except at the (untimely?) end of the stream" with this pattern:

br.read_at_least(buf.len(), buf) // buf is a mut slice

I don't particularly support @mahkoh's patch as it is now, but I'll be glad to see a convenient wrapper of read_at_least() which saves me from typing buf.len() over and over again.
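For what it's worth, the wrapper being asked for here is essentially what later stabilized as Read::read_exact in modern std::io. A hand-rolled sketch under that assumption (read_full is an invented name):

```rust
use std::io::{self, Read};

// Fill `buf` completely or fail; equivalent in spirit to the later
// `Read::read_exact`.
fn read_full<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<()> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            Ok(0) => {
                return Err(io::Error::new(
                    io::ErrorKind::UnexpectedEof,
                    "stream ended before the buffer was filled",
                ));
            }
            Ok(n) => filled += n,
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```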

@mahkoh
Contributor Author

mahkoh commented Oct 17, 2014

There will be no wrapper of read_at_least because it does a completely different thing. I'm not saying that you cannot improve its API, but if you change it enough to replace this function, then it has become the same function. If it can be agreed that Readers should never return 0 (†), then read_at_least can be completely replaced by try_fill.

†: I've realized that, in the case where the buffer is empty, the return value should be unspecified.

@huonw
Member

huonw commented Oct 17, 2014

(BTW, #18059 (comment) hasn't been addressed... or even acknowledged.)

@mahkoh
Contributor Author

mahkoh commented Oct 17, 2014

@huonw: Why would I add tests before it is clear that this functionality will get merged?

@huonw
Member

huonw commented Oct 17, 2014

Because any sane patch will have tests, tests can help clarify the intended behaviour, and reviewing the tests is part of reviewing the patch. You could, at the very least, just inform us that you've seen the comment and will add tests when a decision is made (we can't read your mind).

@mahkoh
Contributor Author

mahkoh commented Oct 17, 2014

When I opened the PR it was not clear to me that such a small function would even need a test. I will add one if I get a preliminary r+.

@huonw
Member

huonw commented Oct 17, 2014

Tests are always necessary. Even a small function can have a typo (or a refactoring/renaming later can introduce a typo) and it is not acceptable for stdlib stuff to be untested (any places that are currently untested should be regarded as bugs).

Maybe it's fine in your own personal libraries, but this is functionality that may exist 'forever' and be used everywhere and anywhere, so it should be as bulletproof as possible. Of course bugs are hard to avoid, but having tests is a big part of the fight against them.

@mahkoh
Contributor Author

mahkoh commented Nov 28, 2014

Time to close this one. See #18289

mahkoh closed this Nov 28, 2014