Bug Report: Connection Pool Full error locks up vttablet #15745
Comments
I did a small review here. I think the … (see vitess/go/mysql/sqlerror/constants.go, lines 51 to 63 in 64d9037)
Interestingly, after the changes I introduced in the new Pool PR, none of the possible … What you're seeing here is an error packet coming directly from MySQL, which we've translated and mis-reported as a … Potential culprits: …
Please let us know what you find. The issue is clearly arising from …
This ended up being related to this vtorc bug that consumed too many connections.
This issue may have been caused by this bug.
I also want to clarify this statement for future readers, since it had us scratching our heads for a while:
The connection pool implementation will return a …
Overview of the Issue
We're randomly seeing vttablet lock up with errors like this:
PoolFull: skipped 60 log messages
Once it gets into that state, health checks start failing and query serving never recovers until I restart the whole tablet pod.
This is a complete guess, but I wonder if this happens in k8s because when vttablet is unhealthy, that gets reported to the k8s service, which then stops routing traffic to it, while the health check waits for new queries before it can become healthy again.
We haven't seen any connection pool errors in ~5 years (it's possible we missed them in the logs, but there was never an outage). The only flag we removed when upgrading from v18 to v19 was --queryserver-config-query-cache-size 100.

@deepthi pointed out this PR from @vmg, #14034, a major refactor of the connection pools, as the first place to look.

Maybe we're in an edge case because of our usage of message tables. Until v15, there was a separate flag/pool, --queryserver-config-message-conn-pool-size, for messaging connections, so maybe those aren't accounted for?

Reproduction Steps
vttablet flags
Binary Version
Operating System and Environment details
Log Fragments