This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

child_process: stdout data is lost if not read quick enough #6595

Closed
malern opened this issue Nov 27, 2013 · 6 comments

Comments

@malern

malern commented Nov 27, 2013

Unread data that is still in the child_process stdout stream when the process exits seems to be discarded. So if you attempt to read the data after the process has exited, the read() function returns null rather than the expected data.

For example, the following code always gets null from proc.stdout.read(), even though the child process is actually outputting data.

var proc = require('child_process').spawn('ls');

proc.stdout.on('readable', function() {
    console.log('readable');

    setTimeout(function() {
        while (true) {
            var data = proc.stdout.read();
            console.log('read: '+data);
            if (data == null) { break; }
        }
    }, 1000);
});

The problem is that this forces you to read the data as quickly as it's produced, so there's no opportunity to apply back pressure to the stream.

version: 0.10.22-linux-x64

@bnoordhuis
Member

I'm not sure whether this is a valid issue, other committers should weigh in. My observations:

  • You call proc.stdout.read() after the 'close' event has been emitted so IMO it's proper and expected that it returns null.
  • The point about (lack of) back-pressure seems valid. Node.js reads the data from the child process but subsequently drops it. I defer to the streams experts on whether that's expected and/or desired behavior. ISTM it could be remedied by calling uv_read_start() only when there are readers. How that should interact with the 'close' and 'exit' events is left as an exercise for the reader.

/cc @isaacs @tjfontaine

@malern
Author

malern commented Nov 28, 2013

Thanks for the response.

You call proc.stdout.read() after the 'close' event has been emitted so IMO it's proper and expected that it returns null.

The problem is there's no way to know when the 'close' event is going to come. The only way to prevent reading data after the 'close' event is to always read all the available data in the same tick that you receive the 'readable' event.

And if a 'readable' event always forces you to read straight away, doesn't that make it effectively the same as the 'data' event? Both events result in an immediate read (but 'data' saves you having to write the line to do it).

This makes it hard to process the data asynchronously. Currently I'm having to push the data into an array when it's available, then pop the data off again when my async function is ready for it. But if the process produces output quicker than the async func consumes it then my array/buffer grows indefinitely.

Also, for comparison, net sockets do allow you to read the data after the 'close' event, and file streams only emit 'end' after all the data has been read().

@timbertson

I also hit this bug, and additionally noticed that you can miss the data entirely if you remove the readable listener, even if you re-add it in the same tick:

var childProcess = require('child_process');
var inspect = require('util').inspect;

var child = childProcess.spawn('echo', ['1234'], {stdio: ['ignore', 'pipe',2]});
child.stdout.setEncoding('ascii');
child.on('exit', function(code) {
  console.log("CHILD EXIT", code);
});

var reader = function() {
  console.log(" (data available)");
  var chunk;
  while (null !== (chunk = child.stdout.read())) {
    console.log('CHUNK: ' + inspect(chunk));
  }

  // pretend we're not ready to read data for a short time,
  // even though this is in the same tick:
  child.stdout.removeListener('readable', reader);
  child.stdout.on('readable', reader);
};

child.stdout.on('readable', reader);

Output:

CHILD EXIT 0
 (data available)
 (data available)
 (data available)

So the stream emits 3 "readable" events, but none of them actually has any data available. It seems that any non-flowing use of child streams cannot be trusted not to drop data. Also note that the 'exit' event comes before I even had a chance to read any data - possibly because of the time it takes to run console.log().

(node v0.10.26)

@bguezzie

I've noticed similar behavior when using unbuffered output on a child process (specifically "python -u") and the 'data' event. When I do this, my node process reads a portion of the data, but drops the last several writes from the child process (which we do control, and we are even flushing stdout after each write just to make sure).

Specifically, our child process outputs yaml documents over a period of time (status updates, essentially), and when the process exits not all of the yaml has been received by my 'data' listener prior to the 'close' event. As mentioned above, no data exists in child.stdout for reading, so it appears that the data is permanently missing.

@jasnell
Member

jasnell commented May 28, 2015

@trevnorris ... any thoughts on this one?

@trevnorris

I think there was a libuv patch that went in recently that helped with this.
