-
Notifications
You must be signed in to change notification settings - Fork 911
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
drivers.ErrBadConn returned for DB.QueryRowContext.Scan when context is cancelled #1062
Comments
We are facing similar issue, cancelling the context by http server when user cancels the request triggers |
FWIW, I haven't seen unexpected connection pool exhaustion or problematic reconnects. The workaround I put in my code is to check for the database/sql/driver.ErrBadConn explicitly and treating it similarly to how I treat context.Canceled errors: marking it a user error, not a server error. That saves me getting alerted. |
…d make rows.Next return it (#1064) * add reproducer for issue 1062 see #1062 * Keep track of an error for a connection. Instead of just whether the connection is ErrBadConn. Often times, the error will still be ErrBadConn. But for expired/cancelled contexts, it will be the error for the context. Most functions still return ErrBadConn per the database/sql/driver contract ("ErrBadConn should only be returned from [...] a query method"). For rows.Next() we return the context-related error. The database/sql/driver contract doesn't look very precise. Is Next a "query method" and should database/sql handle ErrBadConns when Next returns them? Do we have more functions that should return the canceled-context error message? * in test for QueryRowContext, fail for all unexpected errors, not just for the one unexpected error * in TestContextCancel*, accept context.Canceled as valid error * explicitly test for driver.ErrBadConn in test that breaks the connection feedback from otan * move the mutex-protected data in a struct with the mutex, and fix an unsafe usage. there may be unsafeness in this file. feedback from otan
This issue was solved in release 1.10.4 via via #1064 ? |
Should be! |
I remember from last looking at the code thinking that lib/pq may not be adhering fully to the requirements set by database/sql about when it returns ErrBadConn. You could investigate in that direction... |
I have been seeing unexpected "driver: bad connection" (driver.ErrBadConn) errors in my logging. In my case, these are returned by calls to database/sql's DB.QueryRowContext.Scan (and likely also on Tx's) when their context is cancelled. Cancelling contexts happens for me in practice because I pass contexts from http.Request's on to lib/pq. I believe DB.QueryRowContext.Scan should not be made to return driver.ErrBadConn when contexts are cancelled.
I'll add a small reproducer after I've created the issue.
I looks the following ordered events take place in lib/pq:
conn.QueryContext
,conn.query()
returns successfully.conn.watchCancel()
(scheduled inconn.QueryContext
) handles the cancellation, callingconn.setBad()
to mark the connection.rows.Next
(forScan
afterDB.QueryRowContext
) checks the connection withconn.getBad()
and returns a driver.ErrBadConn.I would expect the
Scan
on the sqlRow
to return context.Canceled in this case.Perhaps lib/pq's connections could start to keep track not just whether connections are bad, but also the error to return to future calls on the connection. For expired contexts, the errors would become eg context.Canceled or context.DeadlineExceeded.
The text was updated successfully, but these errors were encountered: