Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handing spawn() stdio as socket wont duplex properly #15714

Closed
devinivy opened this issue Oct 1, 2017 · 14 comments · Fixed by #55322
Closed

Handing spawn() stdio as socket wont duplex properly #15714

devinivy opened this issue Oct 1, 2017 · 14 comments · Fixed by #55322
Labels
child_process Issues and PRs related to the child_process subsystem. doc Issues and PRs related to the documentations. good first issue Issues that are suitable for first-time contributors. help wanted Issues that need assistance from volunteers or PRs that need help to proceed.

Comments

@devinivy
Copy link

devinivy commented Oct 1, 2017

  • Version: At least v6.9.1 and v8.5.0
  • Platform: Darwin Kernel Version 14.5.0: Wed Jul 29 02:26:53 PDT 2015; x86_64

I'm trying to implement a sort of PassThrough-style stream that can be used as mock stdio for ChildProcess.spawn(). PassThrough itself isn't appropriate because it doesn't have a file descriptor. So I'm doing this by creating a socket that speaks to an echo server. What I'm finding is that the server receives the data and successfully pipes it, but the passthrough socket never seems to receive it.

I've poked around the node internals a bit to figure-out why, but I don't have much to share at this time. It looks like spawn() may create another socket that's readable and not writable, and uses the socket I passed as its handle [here]. But my socket remains readable and writable as I would expect, so I still can't explain the behavior!

Possibly related to #9413. Here's some code to reproduce the issue and play with.

'use strict';

const ChildProcess = require('child_process');
const Net = require('net');

const makePassThrough = (cb) => {

    const srv = Net.createServer((sock) => {

        sock.on('data', (data) => console.log('server sock saw data:', data.toString()));
        sock.once('close', () => console.log('server sock closed'));
        sock.once('end', () => console.log('server sock ended'));
        sock.once('error', (err) => console.log('server sock errored', err));
        sock.once('timeout', () => console.log('server sock timed-out'));

        sock.pipe(sock);
    });

    srv.once('close', () => console.log('server closed'));
    srv.once('error', cb);
    srv.listen(() => {

        const sock = new Net.Socket();

        sock.once('close', () => console.log('client sock closed'));
        sock.once('end', () => console.log('client sock ended'));
        sock.once('error', (err) => console.log('client sock errored', err));
        sock.once('timeout', () => console.log('client sock timed-out'));

        sock.once('error', cb);
        sock.connect(srv.address().port, () => cb(null, sock));
    });
};

makePassThrough((err, stdout) => {

    if (err) {
        throw err;
    }

    stdout.on('data', (data) => console.log('subproc stdout saw data:', data.toString()));

    const subproc = ChildProcess.spawn('echo', ['is anybody out there?'], {
        stdio: ['ignore', stdout, 'ignore']
    });

    subproc.on('close', process.exit);
});

/* Program outputs the following,
      $ node index.js 
      server sock saw data: is anybody out there?
      # Does not exit
*/

For my purposes I'm all set– I can use stdio: 'pipe' and pipe the process's stdio thru actual passthrough streams. This is still a curiosity, though!

@mscdex mscdex added the child_process Issues and PRs related to the child_process subsystem. label Oct 1, 2017
@gireeshpunathil
Copy link
Member

@devinivy - is this still outstanding? From your description, I understood upto the point you were saying that the child data reaches upto the socket in the server. Then what? where else it should flow as per your expectation?

@devinivy
Copy link
Author

devinivy commented Apr 8, 2018

I just tried this on node v8.11.1 and got the same results. After coming back to this issue after some time, I can see how it's a little confusing—sorry for that. I will try to be a little clearer.

I was trying to author a passthrough-style stream that can be used with spawn(). Because the stream must have an underlying file descriptor in order to be used with spawn(), I decided to try to create a passthrough stream using Net by implementing an "echo server" (hence sock.pipe(sock)). In other words stdout is implemented as the client of an echo server: anything that it writes should be written back to it.

As such, I expected the output of the program to be,

      server sock saw data: is anybody out there?
      subproc stdout saw data: is anybody out there?

Instead I only get the first line, which is very hard for me to explain to myself. Especially because if I write directly to stdout (rather than through spawn()), it works correctly as a passthrough:

// ...
makePassThrough((err, stdout) => {

    if (err) {
        throw err;
    }

    stdout.on('data', (data) => console.log('subproc stdout saw data:', data.toString()));

    stdout.write('is anybody out there?');
});

/* Program outputs the following,
      $ node index.js 
      server sock saw data: is anybody out there?
      subproc stdout saw data: is anybody out there?
      # Does not exit
*/

Does this provide some more useful context?

@gireeshpunathil
Copy link
Member

thanks. one or two iterations and we should be good to understand each other!

so: no data being available through:

stdout.on('data', (data) => console.log('subproc stdout saw data:', data.toString()))

is this the problem?

@devinivy
Copy link
Author

devinivy commented Apr 8, 2018

Yes, exactly! I would expect that line to log subproc stdout saw data: is anybody out there? in the program's execution. I would additionally expect the subprocess to close, and the program to exit.

@gireeshpunathil
Copy link
Member

Here is the relevant sections in the strace dump:

PID 30062: child (echo)

// stdin
[pid 30062] dup2(16, 0)                 = 0
[pid 30062] fcntl(0, F_GETFL)           = 0x8000 (flags O_RDONLY|O_LARGEFILE)
[pid 30062] close(16)                   = 0

//stdout == 12 client's socket, all writes into this goes to fd 13 in the parent
[pid 30062] dup2(12, 1)                 = 1
[pid 30062] fcntl(1, F_GETFL)           = 0x802 (flags O_RDWR|O_NONBLOCK)
[pid 30062] fcntl(1, F_SETFL, O_RDWR)   = 0
[pid 30062] open("/dev/null", O_RDWR)   = 16

//stderr
[pid 30062] dup2(16, 2)                 = 2
[pid 30062] fcntl(2, F_GETFL)           = 0x8002 (flags O_RDWR|O_LARGEFILE)
[pid 30062] close(16)                   = 0
[pid 30062] close(12)                   = 0

[pid 30062] execve("/bin/echo", ["echo", "is anybody out there?"], [/* 32 vars */] <unfinished ...>

// stdout = socket type
[pid 30062] fstat(1, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0

// write the echo
[pid 30062] write(1, "is anybody out there?\n", 22) = 22

// that is read in the parent (30052) in fd 13, the server's fd.
[pid 30052] read(13, "is anybody out there?\n", 65536) = 22

// child exits
[pid 30062] close(1)                    = 0
[pid 30062] close(2)                    = 0
[pid 30062] exit_group(0)               = ?
[pid 30062] +++ exited with 0 +++

// in parent, socket data handler gets called
[pid 30052] write(9, "server sock saw data: is anybody"..., 45) = 45

// in parent, the data is writtent back into fd 13 (due to self-piping)
[pid 30052] write(13, "is anybody out there?\n", 22) = 22

[pid 30052] epoll_ctl(3, EPOLL_CTL_MOD, 13, {EPOLLIN, {u32=13, u64=13}}) = 0
[pid 30052] epoll_wait(3, {{EPOLLIN, {u32=6, u64=6}}, {EPOLLIN, {u32=12, u64=12}}}, 1024, -1) = 2

// that is read in fd 12, client's fd.
[pid 30052] read(12, "is anybody out there?\n", 65536) = 22

// client's fd data handler gets invoked: but where it goes?????
[pid 30052] write(9, "is anybody out there?\n\n", 23) = 23

I guess the summary of the puzzle is, where is fd 9 pointing to?

@devinivy devinivy changed the title Handing spawn() stdio a socket wont duplex properly Handing spawn() stdio as socket wont duplex properly Apr 9, 2018
@gireeshpunathil
Copy link
Member

I installed 2 handlers: one on the socket in question and one on subproc.stdout, the socket that acts as the parent end of the IPC pipe thus:

stdout.on('data', dh1)
subproc.stdout.on('data', dh2)

and the trace revealed this:

bind socket 0x2cdc5bb49dc9 with data handler dh1:

   8:        ~_addListener+0(this=0x114ba1d022d1 <undefined>, 0xca8f86d0781 <Socket map = 0x2cdc5bb49dc9>, 0xeccc2135af9 <String[4]: data>, 0xca8f86d5b99 <JSFunction dh1 (sfi = 0x1395dbc19269)>, 0x114ba1d023e1 <false>) {

emit 'data' on socket 0x2cdc5bb4a501. No handler for this socket, so no handler invoked, the stream is read again, and chunk is ignored.

   5:     ~emit+3(this=0xca8f86da089 <Socket map = 0x2cdc5bb4a501>, 0xeccc2135af9 <String[4]: data>) {
   5:     } -> 0x114ba1d023e1 <false>
   5:     ~Readable.read+0(this=0xca8f86da089 <Socket map = 0x2cdc5bb4a501>, 0) {
   6:      ~debugs.(anonymous function)+0(this=0x114ba1d022d1 <undefined>) {
   6:      } -> 0x114ba1d022d1 <undefined>
   6:      ~howMuchToRead+0(this=0x114ba1d022d1 <undefined>, 0, 0xca8f86da189 <ReadableState map = 0x2cdc5bb42c59>) {
   6:      } -> 0

bind socket 0x10312b449dc9 with data handler dh2:

   8:        ~_addListener+0(this=0x18f7d9e022d1 <undefined>, 0x35913b1da081 <Socket map = 0x10312b449dc9>, 0x38cbdc435af9 <String[4]: data>, 0x35913b1d5bd9 <JSFunction dh2 (sfi = 0x21fd77299311)>, 0x18f7d9e023e1 <false>) {

emit 'data' on socket 0x10312b44a501 that is still different from the one for which we istalled dh2, but the handler seem to have invoked as if the data handler is added on that socket!

   5:     ~emit+3(this=0x35913b1da081 <Socket map = 0x10312b44a501>, 0x38cbdc435af9 <String[4]: data>) {
   6:      ~dh2+0(this=0x35913b1da081 <Socket map = 0x10312b44a501>, 0x35913b1dcf39 <Uint8Array map = 0x10312b44b6e1>) {

Something beyond my understanding, so /cc @nodejs/streams to see what is going on: the precreated socket is not used for IPC is understood, but how do we link it with the active socket it is the key requriement here.

@mcollina
Copy link
Member

mcollina commented Apr 9, 2018

I still do not understand what is the issue and what are you trying to achieve.

@gireeshpunathil
Copy link
Member

  • being able to establish a communication channel between two processes,
  • wherein, one is a parent and one is a child,
  • and wherein, the parent has pre-created duplex streams for the IPC, not the one returned by spawn.

so looks like pre-rcreated stream looses its association with the underlying socket and / or the event handlers. The data is written to the stream, but no user callback was invoked.

@mcollina
Copy link
Member

mcollina commented Apr 9, 2018

I'm still lost why we should support such a thing. It seems very fragile, and very prone to breakage.

the parent has pre-created duplex streams for the IPC, not the one returned by spawn.

I think the spawned process hold a reference to the handle/fd, and when that gets collected that is freed. Please verify this.

The data is written to the stream, but no user callback was invoked.

This is normal. If you call  write()  on a stream, there is no guarantee that the callback is called. If the streams error, the callbacks are all dropped.

@devinivy
Copy link
Author

devinivy commented Apr 9, 2018

I'm still lost why we should support such a thing. It seems very fragile, and very prone to breakage.

I attempted this for testing purposes. I was authoring a CLI tool and wanted to mock stdin/stdout for my tests. One of my CLI commands uses spawn(), so I was trying to pass my mock stdio (which were PassThrough streams) directly to the subprocess. This didn't work because PassThrough streams naturally aren't backed by a file descriptor, so I tried to write a passthrough stream that is backed by a file descriptor. This was one approach I tried that didn't work, but I couldn't understand why it didn't work.

This is normal. If you call write() on a stream, there is no guarantee that the callback is called. If the streams error, the callbacks are all dropped.

Is this to say that the program's execution makes sense specifically if the stream errored?

Is the caveat I missed simply that you can't pass a stdin as a writable or stdout/stderr as readables under spawn()'s stdio option? If there's a practical/known limitation there, I'd be happy to make a PR to the documentation to address it.

@mcollina
Copy link
Member

mcollina commented Apr 9, 2018

I attempted this for testing purposes. I was authoring a CLI tool and wanted to mock stdin/stdout for my tests. One of my CLI commands uses spawn(), so I was trying to pass my mock stdio (which were PassThrough streams) directly to the subprocess. This didn't work because PassThrough streams naturally aren't backed by a file descriptor, so I tried to write a passthrough stream that is backed by a file descriptor. This was one approach I tried that didn't work, but I couldn't understand why it didn't work.

I agree that we should document that this is not really feasible.

@BridgeAR BridgeAR added doc Issues and PRs related to the documentations. good first issue Issues that are suitable for first-time contributors. help wanted Issues that need assistance from volunteers or PRs that need help to proceed. labels Jan 2, 2020
@HCelante
Copy link

Hi, @BridgeAR a question, is this issue still open? I would like to start collaborating on this project.

@JPMoles
Copy link

JPMoles commented Jun 30, 2023

Hi, I'm looking to contribute to this project. Is documentation still needed for this issue, considering that the previous pull request was closed?

@Ed-roro
Copy link
Contributor

Ed-roro commented Oct 8, 2024

I do not know if this issue is no longer a thing or is necessary. But i made a PR to update the documentation to reflect the discovery found in this thread and sent it out.

Please review and request changes if needed 😊🤟🏿

aduh95 pushed a commit that referenced this issue Oct 23, 2024
louwers pushed a commit to louwers/node that referenced this issue Nov 2, 2024
tpoisseau pushed a commit to tpoisseau/node that referenced this issue Nov 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
child_process Issues and PRs related to the child_process subsystem. doc Issues and PRs related to the documentations. good first issue Issues that are suitable for first-time contributors. help wanted Issues that need assistance from volunteers or PRs that need help to proceed.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants