Error in `node': corrupted double-linked list (TLS / SSL crashes) #933
Comments
First, I think these are separate errors. The first "Handshake failed" error most likely indicates that a client tried to connect with a set of TLS cipher suites that the server does not support. I'm less sure about the second error, but my guess is that you're using mutual TLS and the client does not have a recognized certificate. The getpeername error is probably a relatively minor bug; I would bet that the error messages you're seeing are the main impact. The "corrupted double-linked list" error is one I have also seen very occasionally, but I do not know how to reproduce it either. I'm pretty sure it's not actually a memory usage problem; it looks like memory corruption. The first thing that would be useful to know is how frequently you see that error on your servers, as an error rate over time and/or relative to the number of calls handled. That will help figure out how to proceed. One solution may be to use the |
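For anyone gathering the frequency numbers requested above, here is a minimal sketch of the arithmetic. The function name and field names are hypothetical, and the inputs would have to come from your own logs or metrics; nothing here is part of grpc-node.

```javascript
// Hypothetical helper: turns raw counts into the two rates asked about,
// crashes relative to calls handled and crashes over time.
function crashRates({ crashes, callsHandled, uptimeHours }) {
  if (callsHandled <= 0 || uptimeHours <= 0) {
    throw new RangeError('callsHandled and uptimeHours must be positive');
  }
  return {
    // Crashes per one million handled calls.
    perMillionCalls: (crashes / callsHandled) * 1e6,
    // Crashes per 24 hours of server uptime.
    perDay: (crashes / uptimeHours) * 24,
  };
}

// Made-up example numbers: 3 crashes over 72 hours and 12 million calls.
console.log(crashRates({ crashes: 3, callsHandled: 12e6, uptimeHours: 72 }));
```

Reporting both figures helps distinguish a crash that tracks load from one that tracks wall-clock time.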
Thanks for the response. Unfortunately, we do not have control over the implementation of clients in many cases. We do not use mTLS for our handshaking. I am interested in the Given that the core dump references SSL, we're going to put a proxy in front of our service and have it perform TLS termination instead of grpc to remove TLS being a factor. Will update when I'm able to. |
An update: putting a proxy in front definitely eliminated our SSL error messages, and this specific issue (as in we don't get the Proxy does TLS termination -> grpc service running without TLS enabled (e.g. Given that, there has to be something in the TLS/SSL implementation here that's causing it. Sorry the update took so long; we had to go through our lengthy internal vetting/QA process before it was staged against production-level traffic. |
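The thread does not say which proxy was used. As a rough illustration of the "proxy terminates TLS, service runs in plaintext" layout described above, here is a sketch built only on Node's standard `tls` and `net` modules. The function name, hosts, ports, and file paths are assumptions, and a production setup would more likely use a dedicated proxy.

```javascript
const net = require('net');
const tls = require('tls');

// Sketch of a TLS-terminating TCP proxy. It decrypts client traffic and
// forwards the plaintext bytes to a backend that listens without TLS.
// All names and addresses here are hypothetical.
function createTlsTerminator({ key, cert, backendHost, backendPort }) {
  const server = tls.createServer(
    // gRPC runs over HTTP/2, so advertise "h2" via ALPN.
    { key, cert, ALPNProtocols: ['h2'] },
    (clientSocket) => {
      const backend = net.connect(backendPort, backendHost);
      clientSocket.pipe(backend);
      backend.pipe(clientSocket);
      // If either side fails, tear down both sockets.
      const teardown = () => {
        clientSocket.destroy();
        backend.destroy();
      };
      clientSocket.on('error', teardown);
      backend.on('error', teardown);
    }
  );
  // Failed handshakes (for example, no shared cipher suite) surface here,
  // in the proxy, instead of inside the gRPC server process.
  server.on('tlsClientError', (err) => {
    console.error('TLS handshake failed:', err.message);
  });
  return server;
}

// Usage (paths and ports are placeholders):
//   const fs = require('fs');
//   createTlsTerminator({
//     key: fs.readFileSync('server.key'),
//     cert: fs.readFileSync('server.crt'),
//     backendHost: '127.0.0.1',
//     backendPort: 50051,
//   }).listen(443);
```

With this layout the native TLS code path in the service is never exercised, which matches the observation that the crashes stopped once termination moved to the proxy.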
This is a reproducible bug in the gRPC server. It relates to grpc/grpc#19430, and I have been able to reproduce it myself. Simply spamming a grpc server with The |
Running this repro with valgrind
|
Appears to be a double unref bug in The call to |
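For readers unfamiliar with the term, a "double unref" means the same reference is released twice. The sketch below is a generic illustration in JavaScript, not the grpc core code; the class and the `endpoint` name are hypothetical. In C, the second release would free already-freed memory, which can damage the allocator's bookkeeping; glibc reports that kind of damage with messages such as "corrupted double-linked list".

```javascript
// Illustrative reference-counted resource that detects a double unref
// instead of silently destroying the resource twice.
class RefCounted {
  constructor(name, onDestroy) {
    this.name = name;
    this.onDestroy = onDestroy;
    this.refs = 1; // the creator holds the initial reference
  }

  ref() {
    if (this.refs <= 0) throw new Error(`${this.name}: ref after destroy`);
    this.refs += 1;
    return this;
  }

  unref() {
    if (this.refs <= 0) throw new Error(`${this.name}: double unref`);
    this.refs -= 1;
    if (this.refs === 0) this.onDestroy();
  }
}

const endpoint = new RefCounted('endpoint', () => console.log('endpoint destroyed'));
endpoint.unref(); // an error path releases the only reference
try {
  endpoint.unref(); // a later cleanup path releases it again
} catch (err) {
  console.log(err.message);
}
```

Unlike this sketch, an unchecked counter in native code gives no error at the point of the second release, so the crash tends to show up later and far from the actual bug.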
If that is the problem, it's fixed on the latest version of the library. The |
I no longer work at the company in question, but thank you for eventually fixing it / integrating the fix! |
Upgrading and replaying the repro I can confirm that this is indeed fixed in 1.24.11. There may be some scope for improving the peer read error handling (should an endpoint be created at all if we know the socket is already closed?) but I'm satisfied that this resolves the issue. |
This library has been deprecated for almost 9 months. We will still make security fixes like this one, but not broader improvements like that. |
I would put a notice on the project readme that it's in security-fix mode and to use the javascript-based client instead. |
The notice is there right at the top of the library's README, and it is marked as deprecated in npm. If you're talking about this repository as a whole, both packages live here so that would not be the appropriate place for that notice. |
Got it. Yeah, I was referring to the main repo for this lib. Following the link does go to the readme with the notice. |
We attempted an upgrade to grpc-js but we had performance issues so we're stuck with this for now. I suspect it's not that grpc-js is particularly slow on its own, rather it's just an issue of piling extra work on the main thread. |
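One way to check the main-thread suspicion above is to sample event loop delay while the service is under load. This is a sketch using `monitorEventLoopDelay` from Node's `perf_hooks` module, which requires a newer Node release than the 10.x line mentioned in this issue; the wrapper function and its field names are hypothetical.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

// Starts sampling event loop delay and returns a handle whose stop()
// reports summary statistics in milliseconds.
function startLoopDelayMonitor(resolutionMs = 10) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();
  return {
    stop() {
      histogram.disable();
      // The histogram records nanoseconds; convert to milliseconds.
      return {
        meanMs: histogram.mean / 1e6,
        p99Ms: histogram.percentile(99) / 1e6,
        maxMs: histogram.max / 1e6,
      };
    },
  };
}

// Usage: start before a load test, stop afterwards, and compare runs.
//   const monitor = startLoopDelayMonitor();
//   ... drive traffic ...
//   console.log(monitor.stop());
```

If the delay figures climb sharply under the same load after switching libraries, that would support the idea that the extra work is landing on the main thread.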
Problem description
App crashes with a core trace. Unsure what is the cause. I've attached the trace log.
dump.txt
Excerpt from it:
Questions:
corrupted double-linked list
type error, but an error specific to running out of memory, correct?
Environment
grpc-node versions we've tried:
1.20.3
1.16.1
Node Version:
10.15.2
(we've also upgraded node along the way in the 10.x series)
OS: Debian-based (using docker node:10.5.2 image), we do not do any package updates (e.g. apt update, install latest devtools, etc.)
Additional info
We've noticed that we get a lot of these entries in our logs - it could be clients that are misconfigured, unsure if it eventually leads to crashing:
We also get these in our logs too: