@grpc/grpc-js has a memory leak issue with the latest version #2448
Comments
This isn't really enough information to find the problem. Can you provide code that demonstrates this memory leak, preferably without needing to access an external service? Can you provide heap dumps from at least two different points in a process's lifetime that show the memory growth? Please also check whether you are creating multiple separate client objects. If you are, either change your code to not do that or make sure you are closing them when you are done with them. See #2436, #2402, and #2003 for previous examples of this.
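
For anyone following along, one way to capture comparable heap dumps is Node's built-in `v8.writeHeapSnapshot` (available since Node 11.13). A minimal sketch, with placeholder filenames and timings:

```ts
import { writeHeapSnapshot } from 'v8'

// Take a baseline snapshot shortly after startup, then a second one after the
// process has served traffic for a while; diff the two in Chrome DevTools.
setTimeout(() => writeHeapSnapshot('baseline.heapsnapshot'), 60_000)
setTimeout(() => writeHeapSnapshot('after-load.heapsnapshot'), 30 * 60_000)
```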
Here is a sample query that we use. We have a wrapper around Spanner and three clients:
which is used in the GraphQL server's graceful shutdown here:
The reason we had three clients to Spanner is that those clients actually talk to three separate Spanner databases. Right now we do not use two of them; I can try removing them and upgrading grpc back to the latest version to see if that fixes the issue. Thanks for the tips.
Just three clients shouldn't be an issue. The problem I was talking about comes from creating a large number of clients.
Yes, I am also experiencing memory leaks with a library that uses this one. Newer versions of
Do you know how quickly you were leaking memory, in terms of the amount of memory leaked per request made or per period of time? An order-of-magnitude estimate would be enough to help narrow this down.
Update: we have been unable to reproduce a memory leak with our own tests. Please provide a complete reproduction that leaks memory so that we can investigate this further.
I also have this issue and will try to provide a reproduction when time allows. I'm coming from the last few years in Go, so IoC/DI frameworks in Node are new to me, and I originally believed my problem originated there, with unintentionally creating multiple clients. I'm still open to the possibility that it's inversify misuse, but like @alanzhao, I'm more inclined to believe the issue is with grpc-js.
@SophiaH67 In order to clearly demonstrate a significant memory leak, your test would need to do the following:
If you do this and you observe memory retention roughly proportional to the number of requests, then please share your test code and I can investigate.
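
A hedged sketch of what such a test could look like (the `makeRequest` helper is hypothetical, standing in for whatever gRPC call is under investigation; requires running with `--expose-gc`):

```ts
// Run with: node --expose-gc harness.js
// makeRequest() is a hypothetical stand-in for the gRPC call under test.
declare const makeRequest: () => Promise<unknown>

const heapAfter = async (iterations: number): Promise<number> => {
  for (let i = 0; i < iterations; i++) {
    await makeRequest()
  }
  global.gc?.() // force a full collection so lazy GC isn't mistaken for a leak
  return process.memoryUsage().heapUsed
}

const run = async () => {
  const small = await heapAfter(1_000)
  const large = await heapAfter(10_000)
  // A genuine leak shows retained memory growing roughly in proportion to the
  // number of requests; a constant offset points at one-time allocations.
  console.log({ small, large })
}

run()
```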
Thanks for your time! I should've prefaced this by saying it's my first time debugging memory leaks, so my bad if I miss anything. From what I could tell, the record-snapshot button automatically triggers garbage collection, but I can't find a source for that, so I don't know where I learned it. After rerunning my tests in our production app with manual garbage collection, [...] In a small reproduction setup, however, I could not find any problem with this library, which leads me to think it's probably something in either a different library or our own code!
Have you also confirmed that the memory keeps growing with multiple successive calls? gRPC creates some resources on the first request that it reuses in later requests, so seeing more memory after the first request than before only confirms that; it doesn't show that anything is wrong.
I have found the issue our team was running into finally! We were using nestjs's official terminus package for health checks, which created a new channel for every health check attempt. After doing some digging, it seems that these channels cannot be garbage collected by NodeJS, due to several self-references (in timeouts and other places). This by itself could also be considered a bug, but I'm not familiar enough with the garbage collector to solve it. I did make a PR in terminus to cache the channels it opens, which I built and deployed in our app. As you can see here, this has totally solved our issue, and explains why I couldn't reproduce it with a minimal setup! Still thanks to @murgatroid99 for your time and explaining garbage collector stuff to me ❤️
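
For reference, the caching pattern amounts to something like the following (a sketch, not the actual terminus patch; `Channel` and `ChannelCredentials` are real grpc-js exports, while the cache itself is illustrative):

```ts
import { Channel, ChannelCredentials, ChannelOptions } from '@grpc/grpc-js'

// Reuse one Channel per target instead of constructing a new one (with its
// internal timers and self-references) on every health check.
const channelCache = new Map<string, Channel>()

const getChannel = (target: string, options: ChannelOptions = {}): Channel => {
  let channel = channelCache.get(target)
  if (!channel) {
    channel = new Channel(target, ChannelCredentials.createInsecure(), options)
    channelCache.set(target, channel)
  }
  return channel
}
```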
Calling
We have seen memory leaks on our services too. The ECS tasks get abruptly killed with
"rate limit" generally refers to a limit on how frequently you can call an API. So, an error that says "Exceeded rate limit" generally refers to excess API call volume, not memory growth. |
It was a false alarm. We had upgraded our application from Node 14 to Node 18. The restarts were caused by internal code that was throwing an unhandled promise rejection, which led to the application's termination. Apparently, terminating the process on unhandled rejections was introduced in Node 15. https://maximorlov.com/node-js-15-is-out-what-does-it-mean-for-you/
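
For context (my addition, not from the original comment): since Node 15 an unhandled promise rejection crashes the process by default, and registering a process-level handler restores the old warn-and-continue behavior while the offending code is tracked down:

```ts
// With a listener registered, Node no longer applies its default behavior of
// throwing (and terminating the process) on unhandled promise rejections.
process.on('unhandledRejection', (reason) => {
  console.error('Unhandled promise rejection:', reason)
})
```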
I've been seeing the same behavior, @murgatroid99. Basically, my test setup is: in a
While I'm doing that, I'm logging the heap usage every second. If I run this test script with the NodeJS option
My setup:
```ts
import { Client, Metadata, credentials } from '@grpc/grpc-js'

// some metrics I want to keep track of
const metrics = {
  frequency: 1000, // 1s
  countSinceLastTick: 0,
  countTotal: 0,
  speed: 0,
}

// this reporter will log the metrics every second
const reporter = setInterval(() => {
  process.stdout.clearLine(0)
  console.log(`processed ${metrics.countTotal}`)
  process.stdout.clearLine(0)
  console.log(`speed ${Math.round(metrics.countSinceLastTick)} calls per sec`)
  process.stdout.clearLine(0)
  console.log(`mem usage ${process.memoryUsage().heapUsed / 1000000}MB`)
  metrics.countSinceLastTick = 0
  process.stdout.moveCursor(0, -3)
}, metrics.frequency)

// this call makes a single rpc request, instantiating a new client + closing it at the end
const makeRequest = async () => {
  const sslCreds = credentials.createSsl(
    Buffer.from(process.env.ROOT_CERT!, 'utf-8'),
    Buffer.from(process.env.PRIVATE_KEY!, 'utf-8'),
    Buffer.from(process.env.CERT!, 'utf-8')
  ) // this is just because in my case I use mTLS (env vars assumed to be set)
  const options = {
    'grpc.keepalive_time_ms': 120000,
    'grpc.http2.min_time_between_pings_ms': 120000,
    'grpc.keepalive_timeout_ms': 20000,
    'grpc.http2.max_pings_without_data': 0,
    'grpc.keepalive_permit_without_calls': 1,
    'grpc.service_config_disable_resolution': 1,
  }
  const grpcClient = new Client('your.host.com:8700', sslCreds, options)
  const deserialize = <T>(argument: T) => argument
  const serialize = (argument: Uint8Array) => Buffer.from(argument)
  await new Promise((resolve, reject) => {
    grpcClient.makeUnaryRequest(
      '/your.Service/Healthcheck',
      serialize,
      deserialize,
      Uint8Array.from([]),
      new Metadata(),
      (err, res) => {
        if (err) {
          reject(err)
        } else {
          resolve(res || Uint8Array.from([]))
        }
      }
    )
  })
  grpcClient.close()
}

// main function
const main = async () => {
  while (true) {
    await makeRequest()
    metrics.countSinceLastTick++
    metrics.countTotal++
  }
}

main()
```
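
A possibly useful extension to the script above (my addition, not part of the original snippet): `heapUsed` misses Buffers and native handles, so also logging `rss` and `external` can reveal memory growth outside the JS heap:

```ts
// process.memoryUsage() also reports rss and external, which capture native
// allocations (e.g. TLS buffers) that heapUsed alone would miss.
const { rss, heapUsed, external } = process.memoryUsage()
console.log(
  `rss ${rss / 1e6}MB, heap ${heapUsed / 1e6}MB, external ${external / 1e6}MB`
)
```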
@Thibault2ss I think I fixed that with #2606, according to local testing.
That fix is out in version 1.9.8.
Same issue here:
@Valeronlol The code in #2448 (comment) does not show a memory leak in 1.9.9 in my local testing. If you are experiencing a memory leak, please share code that demonstrates it.
I am closing this issue, because it has covered too many different memory leaks. If you find this issue because you are experiencing a memory leak in this library, please file a new issue.
@murgatroid99 thanks for the information about keeping one client. Is that documented somewhere? I tried looking for it in the grpc-js docs. A while back we were also wondering whether to keep one instance of the client or create one per request, and we landed on the latter based on some assumptions about how grpc-js works, but it seems we were incorrect. So I was interested to see if there are docs that go into this, and I'm also curious whether there are other best practices we should be following with grpc-js.
This is a recommendation for gRPC overall, as stated in our performance best practices doc. It also seems to me that this should be the default assumption in object-oriented programming generally: you create a single object that is responsible for some domain of functionality, and then you call multiple methods on that object to invoke that functionality. You only create multiple instances when you have a compelling reason to do so, such as needing different constructor arguments.
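
Concretely, the recommended shape is roughly the following (a sketch; the endpoint is a placeholder and `Client` here is the generic grpc-js client):

```ts
import { Client, credentials } from '@grpc/grpc-js'

// One client for the lifetime of the process, shared by all requests.
const sharedClient = new Client('your.host.com:443', credentials.createSsl())

export const getClient = (): Client => sharedClient

// Close it exactly once, during graceful shutdown.
process.on('SIGTERM', () => sharedClient.close())
```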
Problem description
We use the Spanner Node.js library for our API, and the latest grpc-js, 1.8.14, has memory leak issues, the same as 1.8.13.
Reproduction steps
We use the Spanner Node.js library for our API. We recently upgraded to Node 18, so in package-lock.json, grpc-js was upgraded to 1.8.14. Before the upgrade, we did not have the memory leak issue. After the upgrade, we have a slow memory leak that manifests over 4-7 days, and our pods restart due to low memory.
Our stop-gap solution is to downgrade @grpc/grpc-js to 1.6.2, and that seems to fix the issue.
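
For anyone needing the same stop gap: assuming npm 8.3+, a transitive dependency can be pinned from the top-level package.json via `overrides` (the version below mirrors the one mentioned above):

```json
{
  "overrides": {
    "@grpc/grpc-js": "1.6.2"
  }
}
```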
Thanks for looking into this. Much appreciated.
Environment
Additional context
We have about 100 pods in prod, and all of them exhibit the symptom after the library gets updated.