Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stuck in "Timed out in bb8" #67

Closed
m9xiuz opened this issue May 19, 2020 · 31 comments
Closed

Stuck in "Timed out in bb8" #67

m9xiuz opened this issue May 19, 2020 · 31 comments

Comments

@m9xiuz
Copy link

m9xiuz commented May 19, 2020

I have a web api application that uses bb8-postgres, tokio_postgres and hyper. When I started to do performance tests, I noticed that if there're too many requests, it shows this error "Timed out in bb8" and it doesn't go away, only restart helps to get rid of it.
I do create a single instance of bb8::Pool and clone it for each request. Then, I do db_pool.run(|conn| async {Ok((<some_func_that_does_queries_using_conn>.await, conn))}).await.
I'm not sure if it's a bug or I'm doing something wrong. Could you help me, please?

@djc
Copy link
Owner

djc commented May 19, 2020

I'm afraid there are still ways to "leak" connections out of the pool. Can you tell me how you have configured/instantiated the Pool?

@m9xiuz
Copy link
Author

m9xiuz commented May 19, 2020

pub type Pool = bb8::Pool<bb8_postgres::PostgresConnectionManager<tokio_postgres::NoTls>>;
pub type Conn = tokio_postgres::Client;

pub async fn create_pool(settings: &Settings) -> Result<Pool, tokio_postgres::Error> {
    let db_settings = &settings.database;

    let pg_mgr = PostgresConnectionManager::new_from_stringlike(
        format!(
            "postgresql://{}:{}@{}:{}/{}",
            &db_settings.user, &db_settings.pass,
            &db_settings.host, &db_settings.port,
            &db_settings.name,
        ),
        tokio_postgres::NoTls,
    ).map_err(|err| {
        error!("{}", err);
    }).unwrap();

    Pool::builder().build(pg_mgr).await
}

@m9xiuz
Copy link
Author

m9xiuz commented May 19, 2020

and then I pass it hyper like this:

    let db_pool = create_pool(settings).await?;
    let addr = (host, port).into();

    let make_service = make_service_fn(move |_| {
        let db_pool = db_pool.clone();
        async move {
            Ok::<_, Error>(service_fn(move |req| {
                let db_pool = db_pool.clone();
                
                async move {
                    handle_request(req, db_pool).await
                }
            }))
        }
    });

@m9xiuz
Copy link
Author

m9xiuz commented May 19, 2020

In Cargo.toml:
bb8-postgres = "0.4"
and I use re-exported bb8 and tokio-postgres from https://docs.rs/bb8-postgres/0.4.0/bb8_postgres/#reexports

@djc
Copy link
Owner

djc commented May 21, 2020

Can you try with #68? I think this should fix the problem.

@m9xiuz
Copy link
Author

m9xiuz commented May 21, 2020

I've changed Cargo.toml:
bb8-postgres = { git = "https://github.com/khuey/bb8", branch = "inner-guard" }
and rebuilt the app, and it didn't help.
Btw, I've already switched to another connection pool library and it works great.

@fakeshadow
Copy link

fakeshadow commented May 21, 2020

It could be because the unfairness of Mutex. Certain threads would acquire the lock quickly while others starving. It's only noticeable when the lock is under very heavy load.

@djc
Copy link
Owner

djc commented May 21, 2020

Interesting. I pushed a change in 8ec0c0a to switch to tokio's Mutex, which is fair.

@m9xiuz
Copy link
Author

m9xiuz commented May 21, 2020

This commit also didn't help solve the problem. With this connection pool https://github.com/bikeshedder/deadpool everything is ok. Maybe you could check how it's implemented and do the same thing, but I suspect its implementation is fundamentally different.

@djc
Copy link
Owner

djc commented May 21, 2020

Yeah, deadpool uses a very different approach. I have one more idea which is in progress, would be happy if you could test that as soon as I finish it.

@m9xiuz
Copy link
Author

m9xiuz commented May 22, 2020

Ok, I'm still here.

@djc
Copy link
Owner

djc commented May 22, 2020

I just pushed some more changes to the inner-guard branch. If you have a chance to test those, that would be great!

@m9xiuz
Copy link
Author

m9xiuz commented May 22, 2020

No, it doesn't fix the problem. The error message doesn't appear now, requests just hang forever.

@djc
Copy link
Owner

djc commented May 22, 2020

So how are you stress-testing your project?

@m9xiuz
Copy link
Author

m9xiuz commented May 22, 2020

I used jmeter. But I noticed that this problem appears even if I simply take the url of the slowest GET request (~100ms), put it into Firefox's address bar and press F5 frequently and too many times (it takes ~10 seconds for the problem to appear.) So your changes I tested this way and didn't use jmeter tests.
After your last change, firsts requests are working fine, but eventually it hangs forever, I cannot send any request anymore, neither from the browser, nor from curl. But the error message doesn't appear. It just behaves as if all timeouts are infinite and that's why it doesn't complain about timeout errors anymore, but the problem is there anyway.

@sync
Copy link

sync commented May 23, 2020

i was experiencing the same issue when testing pooling with redis, would get stuck in the Timed out in bb8, when refreshing my browser a couple of time, using warp and websocket

@fakeshadow
Copy link

Interesting. I pushed a change in 8ec0c0a to switch to tokio's Mutex, which is fair.

I was just guessing so I could be wrong. Altough tokio's mutex is backed by a FIFO linked list which makes it fair but it also make the connection returning to pool have a very long wait for the lock or even a dead lock. I imagine a lock where it's fair and can also give returning connection a highest priority in the wait list would be the best for bb8.

@fggarcia
Copy link

Hi @djc do you have any progress about this issue? I know cdrs use bb8 as its connection pool to cassandra.

@djc
Copy link
Owner

djc commented Oct 23, 2020

I have a branch in progress, but haven't been able to spend much time on it recently. I'll see if I can spend some time on it soon.

@djc
Copy link
Owner

djc commented Nov 5, 2020

I've spent a bunch of time over the past days to improve the internals, and I think issues like this should be fixed. If you're still interested, please give current master a shot and let me know if it improves things for you.

@djc djc mentioned this issue Nov 5, 2020
@fggarcia
Copy link

fggarcia commented Nov 5, 2020

Hi @djc thank you so much. I read in another that someone has tried master version and issue is solved. Do you plan to release a new version as 0.4.x?

@djc
Copy link
Owner

djc commented Nov 5, 2020

No, in order to provide more robustness I've had to make some API changes. In particular, the run() API has been removed, and ManageConnection::is_valid() now takes a &mut PooledConnection. So these changes will be released as 0.5, though I was hoping to gather some feedback before I release it.

I'll likely also release the 0.6 version soon after, which will rely on tokio 0.3. That will not come with redis support until the redis crate is also updated to tokio 0.3, though.

This is just my thinking, any feedback on what people are looking for is much appreciated.

@djc
Copy link
Owner

djc commented Nov 15, 2020

0.5.0 has been released, so I'm going to close this issue for now. If you still see issues, please open a new issue!

@djc djc closed this as completed Nov 15, 2020
@lassipulkkinen
Copy link

This seems to be reproducible (again?) in 0.6.0. To clarify, I'm using bb8-postgres. This consistently happens when I stress test my server using wrk with a large number of connections.

@djc
Copy link
Owner

djc commented Dec 10, 2020

Can you say a bit more about how you've configured the pool and how you use it? Just pool.get()?

@lassipulkkinen
Copy link

lassipulkkinen commented Dec 10, 2020

Just pool.get(). The pool is configured with a maximum size of 20, and default values for everything else.

I made a wrapper around the future returned by Pool::get that logs when it's polled or dropped. Looks like some of those futures were being dropped before they completed, i.e. Hyper was canceling some request handlers early. Therefore this very likely has to do with bb8 not handling canceling properly. (EDIT: the earlier fixes seem to be related to this as well.)

As for why the handlers were being canceled, my best guess is that at the end of the benchmark wrk forcefully closed all the connections that were still waiting for a response, and Hyper decided that there was no reason to wait for their handlers to complete.

@lassipulkkinen
Copy link

lassipulkkinen commented Dec 12, 2020

This minimally reproduces the bug:

use std::{convert::Infallible, future::Future, task::Poll};

use async_trait::async_trait;
use futures::future::poll_fn;

struct ConnectionManager;
struct Connection;

impl bb8::ManageConnection for ConnectionManager {
    type Connection = Connection;
    type Error = Infallible;

    async fn connect(&self) -> Result<Self::Connection, Self::Error> {
        Ok(Connection)
    }

    async fn is_valid(&self, _: &mut bb8::PooledConnection<'_, Self>) -> Result<(), Self::Error> {
        Ok(())
    }

    fn has_broken(&self, _: &mut Self::Connection) -> bool {
        false
    }
}

// Works with flavor = "current_thread", too.
#[tokio::main]
async fn main() {
    let pool = bb8::Pool::builder().max_size(20).build(ConnectionManager).await.unwrap();

    let mut connections = Vec::new();

    // With flavor = "current_thread" this issue never occurs if the number of connections acquired
    // here is less then the pool size. With flavor = "multi_thread", however, this still
    // *sometimes* occurs as long as at least one connection is acquired.
    for _ in 0..20u32 {
        connections.push(pool.get().await);
    }

    let mut futures = Vec::new();

    for _ in 0..20u32 {
        futures.push(Box::pin(pool.get()));

        // Poll the future once.
        poll_fn(|context| {
            let _ = futures.last_mut().unwrap().as_mut().poll(context);
            Poll::Ready(())
        }).await;
    }

    // The order in which these are dropped is important, and the futures need to be dropped, not awaited.
    drop(connections);
    drop(futures);

    pool.get().await.unwrap(); // error!
}

@djc
Copy link
Owner

djc commented Dec 12, 2020

That is awesome, thanks! I'll check it out ASAP.

to266 added a commit to to266/eventually-rs that referenced this issue Dec 21, 2020
@djc
Copy link
Owner

djc commented Dec 21, 2020

@lassipulkkinen #91 fixes your minimal reproduction. Can you check if it fixes your actual application?

@lassipulkkinen
Copy link

@lassipulkkinen #91 fixes your minimal reproduction. Can you check if it fixes your actual application?

That fixed it, thanks!

@djc
Copy link
Owner

djc commented Dec 22, 2020

I released this fix as 0.5.2 (tokio 0.2) and 0.6.2 (tokio 0.3), thanks for the quick feedback!

ar3s3ru added a commit to get-eventually/eventually-rs that referenced this issue Dec 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants