
Attempt to return dropped Body to pool #94

Open
davidiw opened this issue Jun 25, 2020 · 8 comments
Labels
help wanted Extra attention is needed

Comments

@davidiw
Contributor

davidiw commented Jun 25, 2020

In one of my apps, I use ureq to read/write from a key/value server. When I perform writes, the status response is sufficient to know failure or success. However, if I do not read the response body, the connection doesn't return to the pool since the data remains on the stream. This seems odd and requires adding an explicit read on the response in my app just to allow the stream to return to the pool. I was curious why this is the case and whether or not we could just flush the buffer:

https://github.com/algesten/ureq/blob/master/src/pool.rs#L218

@jsha
Collaborator

jsha commented Jun 25, 2020

At the HTTP protocol level, something has to read the full response body before the TCP connection is ready to carry another request. In ureq, that's done with into_reader, into_string, etc. Reading diem/diem#4735 it looks like you do read the body, but only on 200s. That means 403s, 500s, etc. will not get returned to the pool and will get closed when the Response is dropped. If you're seeing something other than that, please let us know. It seems like what you most likely want is to unconditionally call into_string() and then check the status code.

We've talked recently about a possible breaking change to the API to use Result. I'm planning to write up a detailed proposal soon, but I think that returning an Err for connection-breaking errors would perhaps make it clearer that anything receiving an Ok needs to read the stream.

Also on diem/diem#4735 I see someone mention implementing drop. I assume that means "read all outstanding response bytes when Response is dropped." I think that's not the right approach, since that can wind up performing large amounts of work or even blocking indefinitely.

@algesten
Owner

algesten commented Jun 26, 2020

The connection is automatically put back in the pool on drop (if possible); we could add an explicit API for that.

On the Response object we could add something like this:

  fn is_poolable(&self) -> bool {
    // check if response is exhausted
  }

  fn return_to_pool(self) {
    // 1. exhaust response into void
    // 2. by virtue of taking `self`, the rest just happens
  }

The idea is that it would aid workflows where pooling is desirable.

@davidiw
Contributor Author

davidiw commented Jun 26, 2020

Yeah, in my case it is obvious that recycling should happen without any issues, but I can totally see, as @jsha alluded to, that this cannot be the default behavior -- if someone attempts to download a file just to see if it exists, for instance, a naive approach would end up downloading the entire file /just/ to reuse a connection.

@jsha
Collaborator

jsha commented Jun 26, 2020

I'm not keen on adding an explicit return_to_pool, when we already have various functions that accomplish the same goal, like into_string(). I would rather make the API more obvious / easier to use in a way that maximizes efficiency.

Here are a couple of other interventions that could increase connection pooling in cases where the user isn't necessarily calling into_string() or into_reader():

  • If there is no response body (that is, the response headers do not contain Content-Length or Transfer-Encoding), go ahead and return the connection to the pool immediately without waiting for into_string() / into_reader(). Store some internal state to remember that we gave up the Stream and return a zero-length reader if asked.
  • If the response body is known to be zero-length (Content-Length: 0), do the same thing.
  • Similarly, if the response body is known to be small (Content-Length: N for N < 1024), proactively read the body into an internal buffer, and return a Cursor over that buffer when asked for into_reader().

By the way, thinking about this made me realize another trickiness of the API. Since into_reader() / into_string() consume the Response, the status code and headers become unavailable to users once those methods are called. I think in many cases the user may want to do something like:

  let resp = ureq::get("http://example.com").call();
  let resp_body = resp.into_string()?;
  println!("response status {}, body {}", resp.status(), resp_body);

The above fails because resp was moved / consumed by into_string(). A user can work around it by storing a copy of the status code and headers before consuming the body, but it's a bit awkward.

I don't have a particular recommendation for how to change this API right now but will think about it some more.

@algesten
Owner

algesten commented Jun 27, 2020

> easier to use in a way that maximizes efficiency.

I like these ideas. Repooling early is not a thought that crossed my mind.

If we're considering breaking changes, one idea could be to use the informally standardized Rust http crate. In hreq I made that a design goal, and it has some advantages for the user.

  • It's standard.
  • It's agnostic wrt body.
  • Body is separate.

A common approach with http would be into_parts(). Here's some fantasy API:

  use ureq::Body;
  let resp: http::Response<Body> = ureq::get("http://example.com").call();
  let (parts, body) = resp.into_parts();

  println!("response status {}, body {}", parts.status, body.into_string()?);

And we could use extension traits to make it even more ergonomic.

  use ureq::ResponseExt;
  use ureq::Body;
  let resp: http::Response<Body> = ureq::get("http://example.com").call();

  println!("response status {}, body {}", resp.status(), resp.read_string()?);

@tv42

tv42 commented Jan 13, 2021

For what it's worth, in a similar scenario of wanting to reuse a socket with unread response leftovers, Go's net/http consumes & discards up to 2 KiB of data; if that doesn't reach the end of the response, the socket is thrown out. It also checks Content-Length first and doesn't bother reading at all if it looks like there'd be too much to consume.

https://github.com/golang/go/blob/go1.15.6/src/net/http/client.go#L693-L701

(Tangent: this is only true for HTTP/1; HTTP/2 and /3 are different beasts. Not that that would matter for ureq as it is today.)

@jsha
Collaborator

jsha commented Jan 13, 2021

That's a very helpful reference. Thank you!

@algesten algesten changed the title Stream (socket) pooling depends on reading streams to completion Read up to 2kb of data for a Dropped body to attempt to return to pool Nov 26, 2024
@algesten algesten changed the title Read up to 2kb of data for a Dropped body to attempt to return to pool Attempt to return dropped Body to pool Nov 26, 2024
@algesten algesten added the help wanted Extra attention is needed label Nov 26, 2024
@algesten
Owner

In ureq 3.x this still needs doing. Dropping the Body before it is finished currently results in that connection being gone.

  • A new config option to attempt to return connections of dropped bodies to the pool. Defaults to false.
  • impl Drop trait for the relevant types
  • Attempt to discard up to 2 KiB (or a configurable amount)
