
Fix memory leak in grpc-js bidi streams #2653

Closed
mcollina wants to merge 1 commit from the fix-memory-leak-in-grpc-js branch

Conversation

@mcollina commented Jan 31, 2024

This PR fixes a relatively bad memory leak that occurs when the client sends a huge stream of messages to a server that is slow to process them. In the current implementation, messages were kept buffered inside the .messagesToPush array. This PR instead pauses the underlying H2 stream, so that the memory leak does not occur.
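Conceptually, the fix swaps unbounded buffering for flow control on the source stream. A minimal sketch of that pattern (the `wrapWithBackpressure` and `decode` names are hypothetical, not the actual grpc-js code):

```ts
import { Readable } from 'stream';

// Sketch of the idea: instead of buffering every decoded message, pause
// the source stream when the consumer's buffer fills and resume it only
// when the consumer asks for more.
function wrapWithBackpressure(
  source: Readable,
  decode: (chunk: Buffer) => unknown,
): Readable {
  const out = new Readable({
    objectMode: true,
    read() {
      // The consumer wants another message: let the source flow again.
      source.resume();
    },
  });

  source.on('data', (chunk: Buffer) => {
    // push() returns false when the consumer's buffer is full; stop
    // reading from the wire until read() is called again.
    if (!out.push(decode(chunk))) {
      source.pause();
    }
  });
  source.on('end', () => out.push(null));
  return out;
}
```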

@ttessarolo left a comment

I've tested the fix in my environment and it works like a charm!

@murgatroid99 (Member)

#2650 rewrites a lot of this server code, and it includes a change that should fix this bug.

@mcollina (Author)

Did you run the test? I don't think that PR overlaps with this area of code.

I think the streaming logic could be simplified further.

@murgatroid99 (Member)

That PR completely deletes and replaces the Http2ServerCallStream class, and the replacement does contain logic that is similar to the logic in this change. However, that means that this test does not work with that PR, because the test code digs into internals that will no longer exist.

@murgatroid99 (Member)

Specifically, I am referring to this section

@mcollina (Author)

> However, that means that this test does not work with that PR, because the test code digs into internals that will no longer exist.

I think we can prepare a test that fails in that PR too.

Specifically, there is no backpressure handling when calling stream.push()

@murgatroid99 (Member)

There is backpressure, but there's a bit of action at a distance. At the interceptor API level, a single call to startRead eventually produces a single message or end event. startRead is called by the stream's _read method, which will only be called when the application is ready to read another message.
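Roughly, the pairing looks like this (a hypothetical sketch of the shape, not the real grpc-js classes):

```ts
import { Readable } from 'stream';

// Each _read() issues exactly one startRead(), which later yields exactly
// one message or an end event from the layer below.
interface CallLike {
  startRead(): void; // asks the layer below for a single message
}

class ServerReadableSketch extends Readable {
  constructor(private call: CallLike) {
    super({ objectMode: true });
  }

  // Node only calls _read() when the application is ready for more data,
  // so the one-startRead-per-_read pairing is the backpressure mechanism.
  _read() {
    this.call.startRead();
  }

  // Invoked by the layer below when the requested message arrives.
  onReceiveMessage(message: unknown) {
    this.push(message);
  }

  onReceiveEnd() {
    this.push(null);
  }
}
```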

@mcollina (Author)

You are right!

I would put a comment at https://github.com/murgatroid99/grpc-node/blob/eeaa6c0e6e360b4d884f4654c07796f7149645f3/packages/grpc-js/src/subchannel-call.ts#L386, because I think that's the pause() that prevents the leak.
Note that this only works because it "automatically" pauses after one message and waits for a subsequent call.startRead().
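A condensed sketch of that pause-per-message behavior (the names are hypothetical; this is not the actual subchannel-call.ts code):

```ts
import { Readable } from 'stream';

// After delivering each message, the stream pauses itself and only
// resumes when the next startRead() arrives.
class OneMessagePerRead {
  constructor(
    private source: Readable,
    private onMessage: (msg: Buffer) => void,
  ) {
    source.on('data', (msg: Buffer) => {
      // Deliver one message, then stop the flow until asked for more.
      source.pause();
      this.onMessage(msg);
    });
  }

  startRead() {
    // The consumer is ready for the next message: let the source flow again.
    this.source.resume();
  }
}
```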

After taking a look at that whole logic, I think it could be greatly simplified by using stream.read() and on('readable') instead of on('data'). It's clear that you want to process one message at a time, and stream.read() will give you that (or null if you need to wait for a readable event). It would likely be significantly faster, and you would not have to worry about leaks.
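For example, a minimal sketch of that shape (assuming a hypothetical handleMessage consumer):

```ts
import { Readable } from 'stream';

// Pull exactly one message at a time with read(), relying on 'readable'
// to wake up when more data arrives.
function processMessages(
  stream: Readable,
  handleMessage: (msg: unknown) => void,
) {
  stream.on('readable', () => {
    let message;
    // In objectMode, read() returns a single message, or null once the
    // internal buffer is empty; 'readable' fires again when more arrives.
    while ((message = stream.read()) !== null) {
      handleMessage(message);
    }
  });
  stream.on('end', () => {
    // No more messages will arrive.
  });
}
```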


What's the timeline for the interceptor PR? Might be worthwhile shipping this in a patch.

@murgatroid99 (Member)

That link points to client code. The code there is similar, but it's not actually involved here. The pause call at a similar level on the server side is here.

It's not really clear to me that using read() and on('readable') instead of pause(), resume(), and on('data') would actually result in simpler code, but I'd be happy to consider it as a change after the initial implementation goes out.

I'm hoping to get the server interceptor change out within the next week.

@murgatroid99 (Member)

Release 1.10.0 includes the server interceptors change.

@mcollina deleted the fix-memory-leak-in-grpc-js branch on February 7, 2024.