Memory leak due to process.fork()? #15651

Closed
Reggino opened this issue Sep 28, 2017 · 40 comments

Labels: child_process (Issues and PRs related to the child_process subsystem), confirmed-bug (Issues with confirmed bugs)

Reggino commented Sep 28, 2017

  • Version: v8.5.0
  • Platform: Linux

I am experiencing a memory leak in an app that uses process.fork() a lot. The child processes get sent messages via process.send() with a sendHandle and are terminated later on.

I ran into memory-management issues here. Heap dumps show that even after the child processes have exited, the ChildProcess instances are retained in the master process. I learned that using subprocess.disconnect() partly fixes that issue, but one more retainer can be found here:

node/lib/net.js, line 1665 in 20259f9:

Server.prototype._setupWorker = function(socketList) {

How, where, and when should this socketList be removed from the _workers array?

mscdex added the `question` label Sep 28, 2017
cjihrig (Contributor) commented Sep 28, 2017

Do you have code that reproduces the memory leak?

Reggino (Author) commented Sep 28, 2017

I tried to make a simplified setup of my app:

server.js

const http = require('http');
const { fork } = require('child_process');

const server = http.createServer((req, res) => {
    const workerProcess = fork('worker.js', [], {execArgv: []});
    workerProcess.on('message', (status) => {
        switch (status) {
            case 'ready':
                workerProcess.send({
                    method: req.method,
                    headers: req.headers,
                    path: req.path,
                    httpVersionMajor: req.httpVersionMajor,
                    query: req.query,
                }, res.socket);
                break;

            case 'done':
                res.socket.end();
        }
    });
});

server.listen(8080);

setInterval(() => {
  console.log(process.memoryUsage());
}, 10000);

worker.js

const http = require('http');

process.on('message', (req, socket) => {
    const res = new http.ServerResponse(req);
    res._finish = function _finish() {
        res.emit('prefinish');
        socket.end();
    };
    res.assignSocket(socket);
    socket.once('finish', () => {
        process.send('done', () => {
            process.exit(0);
        });
    });
    res.end(process.pid.toString());
});

process.send('ready');

When simulating load on this server using ApacheBench, memory usage of the server process slowly rises.

ab -n 1000000 -c 50  http://localhost:8080/

You'll have to be patient: it does take a few hours on my machine to become critical...

Reggino (Author) commented Sep 28, 2017

Node 8.6.0 has the same issue.

Reggino (Author) commented Sep 29, 2017

When limiting the available memory for the server to, say, 100 MB, it dies at a certain point within a few minutes:

node --max_old_space_size=100 server.js 

...

<--- Last few GCs --->

[15664:0x39d9410]   504893 ms: Mark-sweep 252.2 (299.0) -> 252.2 (268.0) MB, 219.9 / 0.1 ms  (+ 0.0 ms in 0 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 220 ms) last resort 
[15664:0x39d9410]   505119 ms: Mark-sweep 252.2 (268.0) -> 252.2 (268.0) MB, 226.7 / 0.1 ms  last resort 


<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x3b40f62a8799 <JSObject>
    2: text(aka utf8Text) [string_decoder.js:201] [bytecode=0x4695cc3cbd1 offset=29](this=0x54b29377d39 <StringDecoder map = 0x16700ae4d0a9>,buf=0x17c2ab58e4b1 <Uint8Array map = 0x139076a04e01>,i=0)
    4: write [string_decoder.js:85] [bytecode=0x4695cc3c8c1 offset=97](this=0x54b29377d39 <StringDecoder map = 0x16700ae4d0a9>,buf=0x17c2ab58e4b1 <Uint8Array map = 0x139076a04e01>)
    6: onread [int...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: node::Abort() [node]
 2: 0x13576dc [node]
 3: v8::Utils::ReportOOMFailure(char const*, bool) [node]
 4: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [node]
 5: v8::internal::Factory::NewStruct(v8::internal::InstanceType) [node]
 6: v8::internal::Factory::NewTuple3(v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>) [node]
 7: v8::internal::LoadIC::LoadFromPrototype(v8::internal::Handle<v8::internal::Map>, v8::internal::Handle<v8::internal::JSObject>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::Smi>) [node]
 8: v8::internal::LoadIC::GetMapIndependentHandler(v8::internal::LookupIterator*) [node]
 9: v8::internal::IC::ComputeHandler(v8::internal::LookupIterator*) [node]
10: v8::internal::LoadIC::UpdateCaches(v8::internal::LookupIterator*) [node]
11: v8::internal::LoadIC::Load(v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Name>) [node]
12: v8::internal::Runtime_LoadIC_Miss(int, v8::internal::Object**, v8::internal::Isolate*) [node]
13: 0x1745398046fd
Aborted (core dumped)

bnoordhuis added a commit to bnoordhuis/io.js that referenced this issue Sep 29, 2017
Entries in the `net.Server#_workers` array that is used to track handles
sent from the master to workers were not deleted when a worker exited,
resulting in a slow but inexorable memory leak.

Fixes: nodejs#15651
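
The idea described in that commit message might look roughly like the sketch below (not necessarily the exact patch from #15679; it assumes the SocketList object emits an 'exit' event once its worker has gone away):

Server.prototype._setupWorker = function(socketList) {
  this._workers.push(socketList);
  // Sketch: forget the SocketList once its worker has exited, so handles
  // tracked for a dead child no longer keep memory alive in the master.
  socketList.once('exit', (list) => {
    const index = this._workers.indexOf(list);
    if (index !== -1)
      this._workers.splice(index, 1);
  });
};
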
bnoordhuis (Member):
@Reggino Can you check that #15679 fixes the issue for you?

Reggino (Author) commented Sep 29, 2017

@bnoordhuis Thanks! While I do think your change is part of the solution, this issue still isn't completely resolved: even now the server process keeps leaking memory until it crashes...

bnoordhuis (Member):
Do you mean with the example you posted or with a larger application?

Reggino (Author) commented Sep 29, 2017

The example...

jasnell closed this as completed in 2e59ec0 Oct 1, 2017
Reggino (Author) commented Oct 1, 2017

The issue still exists...

gibfahn reopened this Oct 1, 2017
targos reopened this Oct 5, 2017
bnoordhuis (Member):
I was going to remove the Fixes: tag but James merged the PR before I could do so. Inconvenient.

bnoordhuis (Member):
@Reggino I can't reproduce this anymore after #15679. I let your test case run for a couple of hours with --max_old_space_size=32 and it's rock solid.

Reggino (Author) commented Oct 6, 2017

Just checked out and built 2e59ec0 again and ran the test on my 64-bit Linux environment...

Output of server.js:

./node --max_old_space_size=32 server.js 
{ rss: 50212864,
  heapTotal: 24485888,
  heapUsed: 8344752,
  external: 8609 }
{ rss: 57155584,
  heapTotal: 50176000,
  heapUsed: 12755848,
  external: 17203 }
{ rss: 65495040,
  heapTotal: 55943168,
  heapUsed: 17016640,
  external: 17179 }
{ rss: 73113600,
  heapTotal: 62234624,
  heapUsed: 21116696,
  external: 17261 }
{ rss: 80306176,
  heapTotal: 68001792,
  heapUsed: 25077032,
  external: 17243 }
{ rss: 86573056,
  heapTotal: 73244672,
  heapUsed: 28898720,
  external: 17203 }
{ rss: 92934144,
  heapTotal: 78487552,
  heapUsed: 32581744,
  external: 17267 }
{ rss: 96854016,
  heapTotal: 80060416,
  heapUsed: 36295896,
  external: 17309 }
{ rss: 98816000,
  heapTotal: 80060416,
  heapUsed: 39852152,
  external: 17210 }

<--- Last few GCs --->

[19835:0x30f51b0]    92906 ms: Mark-sweep 38.3 (77.9) -> 38.3 (46.9) MB, 37.0 / 0.0 ms  (+ 0.0 ms in 0 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 37 ms) last resort GC in old space requested
[19835:0x30f51b0]    92944 ms: Mark-sweep 38.3 (46.9) -> 38.3 (46.9) MB, 37.4 / 0.0 ms  last resort GC in old space requested


<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x20b4bb5a5e49 <JSObject>
    1: pushValueToArray [bootstrap_node.js:~245] [pc=0x202eb9386617](this=0x2c7b79a1b319 <JSArray[1]>)
    2: arguments adaptor frame: 6->0

==== Details ================================================

[1]: pushValueToArray [bootstrap_node.js:~245] [pc=0x202eb9386617](this=0x2c7b79a1b319 <JSArray[1]>) {
// optimized frame
--------- s o u r c e   c o d e ---------
function pushValueToArray() {\...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: node::Abort() [./node]
 2: 0x128126c [./node]
 3: v8::Utils::ReportOOMFailure(char const*, bool) [./node]
 4: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [./node]
 5: v8::internal::Factory::NewFixedArray(int, v8::internal::PretenureFlag) [./node]
 6: v8::internal::DeoptimizationInputData::New(v8::internal::Isolate*, int, v8::internal::PretenureFlag) [./node]
 7: v8::internal::compiler::CodeGenerator::PopulateDeoptimizationData(v8::internal::Handle<v8::internal::Code>) [./node]
 8: v8::internal::compiler::CodeGenerator::FinalizeCode() [./node]
 9: v8::internal::compiler::PipelineImpl::FinalizeCode() [./node]
10: v8::internal::compiler::PipelineCompilationJob::FinalizeJobImpl() [./node]
11: v8::internal::Compiler::FinalizeCompilationJob(v8::internal::CompilationJob*) [./node]
12: v8::internal::OptimizingCompileDispatcher::InstallOptimizedFunctions() [./node]
13: v8::internal::StackGuard::HandleInterrupts() [./node]
14: v8::internal::Runtime_StackGuard(int, v8::internal::Object**, v8::internal::Isolate*) [./node]
15: 0x202eb920463d
Aborted (core dumped)

@bnoordhuis What exact commit did you run the test with? Or could it be platform-related?

bnoordhuis (Member):
Today's master, 4f339b5. That's about 70 commits ahead of 2e59ec0. It might be a platform-specific issue, but I can't reproduce it on my x86_64 Ubuntu 16.04 box or on my MBA.

Reggino (Author) commented Oct 6, 2017

Hmmm... tried again with 4f339b5, but no cigar: I get the same output.

Not sure what to do next... Should we close this issue since it isn't reproducible?

Reggino (Author) commented Oct 6, 2017

@fruitl00p @AubreyHewes @mishavantol any luck in reproducing this issue?

bnoordhuis (Member):
@Reggino Perhaps you can investigate with llnode? Its findjsobjects command can tell you what's on the heap.
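
For example (a hypothetical session; the node binary and core dump paths are placeholders), you could load a core dump of the server process and list what is on the heap:

llnode /path/to/node -c /path/to/core
(llnode) v8 findjsobjects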

gibfahn (Member) commented Oct 6, 2017

@Reggino what version of Linux are you using?

Reggino (Author) commented Oct 6, 2017

Tested on both Ubuntu 16.04 and 17.04; both show the same issue.

@bnoordhuis Did you see the worker PID when requesting http://localhost:8080? Did ab report all requests as successful (up until stopping it)?

bnoordhuis added a commit to bnoordhuis/io.js that referenced this issue Oct 29, 2017
Entries in the `net.Server#_slaves` array that is used to track handles
sent from the master to workers were not deleted when a worker exited,
resulting in a slow but inexorable memory leak.

PR-URL: nodejs#15679
Fixes: nodejs#15651
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
gibfahn pushed a commit that referenced this issue Oct 30, 2017
Entries in the `net.Server#_slaves` array that is used to track handles
sent from the master to workers were not deleted when a worker exited,
resulting in a slow but inexorable memory leak.

PR-URL: #15679
Backport-PR-URL: #16586
Fixes: #15651
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
gibfahn pushed a commit that referenced this issue Oct 31, 2017
Entries in the `net.Server#_slaves` array that is used to track handles
sent from the master to workers were not deleted when a worker exited,
resulting in a slow but inexorable memory leak.

PR-URL: #15679
Backport-PR-URL: #16586
Fixes: #15651
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
dhenson02:
I would like to bump this again, as I am now seeing this in production environments. Will post reproduction code once available.

amanthegreatone commented Feb 19, 2018

I am also experiencing a memory leak with child_process.fork in one of my apps. There is an MQTT client connected to an MQTT broker in index.js, and on receiving a message it forwards it to a child process for processing using child_process.send(). In the child process I catch this message using process.on() and then do the relevant processing.

There are 5 sensors sending data every 5 s to the app, and every 5 minutes or so the child process exits with SIGABRT; I see the heap usage going to 1 GB+ just around that instant.

node version 8.9.4

https://stackoverflow.com/questions/48801587/nodejs-child-process-ipc-communication-issue-on-server

santigimeno (Member):
I think the problem with the original code was that after passing the socket to the worker, the underlying handle is automatically closed, but the HTTP objects (HTTPParser, etc.) associated with the socket don't notice, so they are never freed. Is this something the child_process module should handle correctly itself?

To avoid this, you can pass the option keepOpen: true when sending the socket, so the handle is not automatically closed after being sent. That way, when the done message is received, you can end the socket connection normally and the HTTP resources associated with it will be freed.
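
Applied to the server.js example above, the suggested workaround would look roughly like this (a sketch only, with the message payload trimmed):

// In server.js: keep the handle open in the parent when passing the socket
// to the worker, then end the connection from the parent on 'done'.
workerProcess.on('message', (status) => {
    switch (status) {
        case 'ready':
            workerProcess.send({
                method: req.method,
                headers: req.headers,
                httpVersionMajor: req.httpVersionMajor,
            }, res.socket, { keepOpen: true });
            break;

        case 'done':
            // Still valid: the parent kept its copy of the socket open.
            res.socket.end();
    }
});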

Reggino (Author) commented Feb 19, 2018

@santigimeno Thank you for your suggestion.

I tried it, and indeed the server stays alive a bit longer and doesn't seem to run out of memory, but it stops responding after about 2 minutes... weird?

santigimeno (Member):
@Reggino can you try sending the done message from the worker on the close event instead of on finish?
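
Applied to the worker.js example above, that suggestion would look something like this sketch (the rest of the handler unchanged):

// In worker.js: report 'done' only once the socket has fully closed, rather
// than when its writable side emits 'finish'.
socket.once('close', () => {
    process.send('done', () => {
        process.exit(0);
    });
});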

Reggino (Author) commented Feb 20, 2018

@santigimeno same result: the server always crashes, with or without { keepOpen: true }.

santigimeno (Member) commented Feb 20, 2018

Weird, I ran the code for several hours and the heap size seemed pretty stable. Is it the same crash? Did it take longer to crash?

Can you try with this patch, without { keepOpen: true }?:

diff --git a/lib/internal/child_process.js b/lib/internal/child_process.js
index 58bb46e8a2..91e4cbc978 100644
--- a/lib/internal/child_process.js
+++ b/lib/internal/child_process.js
@@ -18,6 +18,8 @@ const SocketList = require('internal/socket_list');
 const { convertToValidSignal } = require('internal/util');
 const { isUint8Array } = require('internal/util/types');
 const spawn_sync = process.binding('spawn_sync');
+const { HTTPParser } = process.binding('http_parser');
+const { freeParser } = require('_http_common');
 
 const {
   UV_EAGAIN,
@@ -94,6 +96,14 @@ const handleConversion = {
       if (!options.keepOpen) {
         handle.onread = nop;
         socket._handle = null;
+        socket.setTimeout(0);
+        // In case of an HTTP connection socket, release the associated resources
+        if (socket.parser && socket.parser instanceof HTTPParser) {
+          freeParser(socket.parser, null, socket);
+          if (socket._httpMessage) {
+            socket._httpMessage.detachSocket(socket);
+          }
+        }
       }
 
       return handle;

Reggino added a commit to Reggino/node-issue-15651-test that referenced this issue Feb 21, 2018
Reggino (Author) commented Feb 21, 2018

@santigimeno Thanks for the patch. I tried it, but now, when I try some requests on the server, I get an error like:

TypeError: Cannot read property 'end' of null
    at ChildProcess.workerProcess.on (/test/server.js:23:28)
    at ChildProcess.emit (events.js:129:13)
    at emit (internal/child_process.js:793:12)
    at process._tickCallback (internal/process/next_tick.js:115:19)

To make sure you (and @bnoordhuis) are able to run the test, I set up a repo containing it here: https://github.com/Reggino/node-issue-15651-test . It ensures the failing test is reproducible and that we all use the same Node flags and load-testing / benchmarking tool (this example uses https://github.com/carlos8f/slam ).

santigimeno (Member) commented Feb 21, 2018

@Reggino Yeah, with the patch the socket is detached from the response, so it's set to null. In fact, you don't need to call socket.end() at all, as the socket is automatically closed after being sent to the child. Can you comment out that line?
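
For the server.js example, that presumably amounts to commenting out the explicit end() call in the 'done' handler (a sketch under that assumption):

case 'done':
    // Not needed: the handle is closed automatically after being sent to
    // the child, and the patch detaches the socket from the response, so
    // res.socket is now null here.
    // res.socket.end();
    break;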

Reggino (Author) commented Feb 21, 2018

@santigimeno awesome! Indeed, this seems to fix the issue! 🙇‍♂️

amanthegreatone:
Hi,

Can someone help me with my issue? nodejs/help#1123

gireeshpunathil (Member):
Is this a candidate for closing?
@santigimeno - was your patch intended as a workaround, or is it going into the code base?

FWIW, I am able to reproduce the leak consistently on my Mac with the latest.

santigimeno (Member):
@gireeshpunathil I forgot about this one. I'll try to prepare a PR and see how it goes.

gireeshpunathil (Member):
Thanks! You may ping me for any interim testing as required, as I have a reliable reproduction.

santigimeno added a commit to santigimeno/node that referenced this issue Apr 25, 2018
After passing an HTTP socket, release its associated resources.

Fixes: nodejs#15651
BridgeAR added the `confirmed-bug` and `child_process` labels and removed the `question` label Apr 25, 2018
MylesBorins pushed a commit that referenced this issue May 4, 2018
After passing an HTTP socket, release its associated resources.

PR-URL: #20305
Fixes: #15651
Reviewed-By: Gireesh Punathil <[email protected]>
Reviewed-By: James M Snell <[email protected]>
ellisium:
Hello,

I updated my application to Node.js 10.15; however, the memory leak is still present.
I ran a test on OpenShift with a simple "hello world" and a liveness TCP health check:
https://drive.google.com/file/d/1bTOkFKETvcWLO4HTbq5eFKXH8NrlHpe_/view?usp=sharing

Regards

santigimeno (Member):
@ellisium can you post code that reproduces the leak? Thanks

ellisium:
It's a boilerplate used at my work, so I'm not allowed to share it. I will confirm it with another test using only native Node.js functions and will be back with results by the end of this week.

Reggino (Author) commented Feb 13, 2019

@ellisium could you try https://github.com/Reggino/node-issue-15651-test ? I was able to reproduce the issue and validate the actual fix with it...

ellisium:
OK, I will try, but I'm focusing on testing cluster.fork first. Sorry, it may be a different use case; if so, you can ignore my last messages.

ellisium:
Hello, after a few days of testing the curve stabilized and it seems OK. I guess it depends on request frequency, which affects the memory-allocation ramp-up.
