Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

awaiting between creating a readline interface, and using async iterator will cause iterator to sometimes skip. #33463

Closed
Tatsujinichi opened this issue May 19, 2020 · 7 comments
Labels
doc Issues and PRs related to the documentations. readline Issues and PRs related to the built-in readline module.

Comments

@Tatsujinichi
Copy link

Tatsujinichi commented May 19, 2020

What steps will reproduce the bug?

if you perform something like
https://github.com/nodejs/node/blob/2a7432dadec08bbe7063d84f1aa4a6396807305c/test/parallel/test-readline-async-iterators.js

async function testSimple() {
  for (const fileContent of testContents) {
    fs.writeFileSync(filename, fileContent);

    const readable = fs.createReadStream(filename);
    const rli = readline.createInterface({
      input: readable,
      crlfDelay: Infinity
    });

    const iteratedLines = [];
    for await (const k of rli) {
      iteratedLines.push(k);
    }

    const expectedLines = fileContent.split('\n');
    if (expectedLines[expectedLines.length - 1] === '') {
      expectedLines.pop();
    }
    assert.deepStrictEqual(iteratedLines, expectedLines);
    assert.strictEqual(iteratedLines.join(''), fileContent.replace(/\n/gm, ''));
  }
}

If you add some kind of await xxx() between creating the interface and iterating, the iterator will miss lines. In my case I input 100k lines from a file, then output those same lines to a new file. several thousand lines will go missing.

How often does it reproduce? Is there a required condition?

100%
Need to add some await async between creating the interface and using for await

What is the expected behavior?

not to miss iterations

What do you see instead?

Additional information

@devsnek
Copy link
Member

devsnek commented May 19, 2020

readline isn't a stream, if you don't attach a handler you will miss events. that being said, exposing an async iterable doesn't make that very clear.

@BridgeAR BridgeAR added question Issues that look for answers. readline Issues and PRs related to the built-in readline module. labels May 23, 2020
@BridgeAR
Copy link
Member

As @devsnek outlined there's nothing actually wrong with the behavior. We might want to improve our docs when it comes to that?

@BridgeAR BridgeAR added doc Issues and PRs related to the documentations. help wanted Issues that need assistance from volunteers or PRs that need help to proceed. good first issue Issues that are suitable for first-time contributors. and removed question Issues that look for answers. labels May 23, 2020
@preyunk
Copy link
Contributor

preyunk commented May 27, 2020

@devsnek @BridgeAR I would like to take up this issue, could you guide me through the process for achieving the same as I am a first time contributor.

@jfriend00
Copy link

Similar problem shown here: https://stackoverflow.com/questions/62885667/why-does-this-readline-async-iterator-not-work-properly.

I see only this output in the console:

before for() loop
finished
finally
done

The for await (const line1 of rl1) loop never goes into the for loop - it just skips right over it:

const fs = require('fs');
const readline = require('readline');
const { once } = require('events');

async function test(file1, file2) {
    try {
        const stream1 = fs.createReadStream(file1);
        await once(stream1, 'open');
        const rl1 = readline.createInterface({input: stream1, crlfDelay: Infinity});

        const stream2 = fs.createReadStream(file2);
        await once(stream2, 'open');
        const rl2 = readline.createInterface({input: stream2, crlfDelay: Infinity});

        console.log('before for() loop');
        for await (const line1 of rl1) {
            console.log(line1);
        }
        console.log('finished');
    } finally {
        console.log('finally');
    }
}

test("data/numbers.txt", "data/letters.txt").then(() => {
    console.log(`done`);
}).catch(err => {
    console.log('Got rejected promise:', err);
})

@jfriend00
Copy link

So, there is no intent to ever actually "fix" this. You're just going to document it as is and leave it that way forever? So, no asynchronous operations are permitted between creating the interface and consuming the async iterator to read the lines? What if you're trying to use this along with some other asynchronous stuff together? That's just a busted implementation IMO. Documenting the existing behavior is OK for a stop-gap, but not really OK for just sweeping the issue under the rug forever.

@Trott
Copy link
Member

Trott commented Aug 30, 2020

@nodejs/readline Should this be re-opened?

@Trott Trott removed good first issue Issues that are suitable for first-time contributors. help wanted Issues that need assistance from volunteers or PRs that need help to proceed. labels Aug 30, 2020
richardlau pushed a commit that referenced this issue Sep 1, 2020
Fixes: #33463

PR-URL: #34675
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Anto Aravinth <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Gus Caplan <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
@govindrai
Copy link

govindrai commented Apr 9, 2021

I agree with @jfriend00. This behaves like an improper implementation. Even after following the guidelines to prevent loss of line data, we are facing issues where have a function that is doing some heavy async operations on the result of each line (i.e. each line may take 5 or minutes to complete) and what we get is readline running some lines and eventually hanging and not issuing any more lines -> the program hangs forever.

async function heavyAsyncOperationsPerLine() {
    const rl = readline.createInterface({ input: getReadStream() }); // this file has 800K lines
 // called directly after 
    for await (const line of rl) {
        await operation1()
        await operation2()
    }
}

benjamingr added a commit to benjamingr/io.js that referenced this issue Jan 26, 2022
Beforehand the async iterator for readline was created lazily when the
stream was accessed which meant no events were buffered. In addition
stream handling was modernized and improved in the process to support
backpressure correctly. A test was added to assert this.

Fixes: nodejs#28565
Fixes: nodejs#33463
richardlau pushed a commit that referenced this issue Apr 3, 2022
Fixes: #33463

PR-URL: #34675
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Anto Aravinth <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Gus Caplan <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Issues and PRs related to the documentations. readline Issues and PRs related to the built-in readline module.
Projects
None yet
7 participants