Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

npm test hangs infinitly on windows #4425

Closed
derdeka opened this issue Jan 14, 2020 · 27 comments · Fixed by #4657
Closed

npm test hangs infinitly on windows #4425

derdeka opened this issue Jan 14, 2020 · 27 comments · Fixed by #4657
Assignees
Labels
bug developer-experience Issues affecting ease of use and overall experience of LB users os:Windows Issues specific to Windows
Milestone

Comments

@derdeka
Copy link
Contributor

derdeka commented Jan 14, 2020

Steps to reproduce

  • check out loopback-next repo
  • npm install
  • npm test or DEBUG=mocha:* npm test

Current Behavior

Many tests are executed - one test does not end. npm test never ends and runs infinitely.
Tried with cmd, powershell and git bash.

mocha:runner run suite app-generator (SLOW) +17ms

^^^^^^is the last logged console output.

Expected Behavior

npm test should end successfully or with some error message.

Additional information

Windows 10 Pro ver 1909
node v10.15.3
Docker version 19.03.4, build 9013bf5
@loopback/cli version: 1.28.1

when executing DEBUG=mocha:* npm test these are the last few log messages:

..  mocha:runner run suite TodoApplication +60ms
  mocha:runnable timeout 30000 +11s
  mocha:runnable timeout 30000 +2s
...  mocha:runnable timeout 30000 +25ms
..  mocha:runner run suite TodoApplication when dealing with a single persisted todo +3s
........  mocha:runner run suite GeoLookupService +71ms
.  mocha:runner run suite TodoController +18ms
  mocha:runner run suite TodoController createTodo +0ms
..  mocha:runner run suite TodoController findTodoById +37ms
.  mocha:runner run suite TodoController findTodos +14ms
...  mocha:runner run suite TodoController replaceTodo +36ms
.  mocha:runner run suite TodoController updateTodo +12ms
.  mocha:runner run suite TodoController deleteTodo +11ms
.  mocha:runner run suite app-generator (SLOW) +17ms
  mocha:runnable timeout 60000 +235ms
@derdeka derdeka added the bug label Jan 14, 2020
@dougal83 dougal83 added the developer-experience Issues affecting ease of use and overall experience of LB users label Jan 14, 2020
@dougal83
Copy link
Contributor

I found it runs horribly on Win10 (other issues, /w some tests). The best solution for me was to use a virtual machine with linux.

If you have HyperV capable PC then it's extremely quick to set up a VM.

@bajtos
Copy link
Member

bajtos commented Jan 14, 2020

This is a known issue, but also difficult to reproduce locally. See the discussion in #1350 and nodejs/node#21210 for more details.

To avoid this issue on CI, we are using Node.js 8.x on Windows:

https://github.com/strongloop/loopback-next/blob/c1cff12ba579975e17a9f0d03e8b8bf0d3e3c09a/appveyor.yml#L3

Considering that Node.js 8.x is no longer supported by the Node.js project, I think it would be great to take another look at this problem and find a way how to make our test suite pass on Windows versions run by CI provides like AppVeyor, GitHub Actions or Azure Pipelines.

@bajtos bajtos added the os:Windows Issues specific to Windows label Jan 14, 2020
@bajtos
Copy link
Member

bajtos commented Jan 14, 2020

Re-posting #1431 (comment)

Weird ... it failed without any messages, timeouts from Mocha or anything.

This is the same problem we have been experiencing from the start, see #1350 (comment):

looks like there is some other bug on Windows, the test run freezes inside our test suite.

I have described details in nodejs/node#21210. In short, the callback installed via setImmediate is never invoked, the Node.js process hangs forever and after one hour, AppVeyor gives up waiting and kills the build as timed out.

@dhmlau
Copy link
Member

dhmlau commented Jan 16, 2020

@bajtos, are you suggesting we should move away using setImmediate?

FYI - I was able to reproduce on my home computer - Win10 Home. When running npm test, I got similar results as @derdeka:

  .......  mocha:runner run suite GeoLookupService +34ms
.  mocha:runner run suite TodoController +8ms
  mocha:runner run suite TodoController createTodo +0ms
..  mocha:runner run suite TodoController findTodoById +15ms
.  mocha:runner run suite TodoController findTodos +7ms
...  mocha:runner run suite TodoController replaceTodo +22ms
.  mocha:runner run suite TodoController updateTodo +7ms
.  mocha:runner run suite TodoController deleteTodo +7ms
.  mocha:runner run suite app-generator (SLOW) +7ms
  mocha:runnable timeout 60000 +116ms

@derdeka
Copy link
Contributor Author

derdeka commented Jan 16, 2020

I've tried to replace all setImmediate(callback) with setTimeout(callback, 0), but without any difference. I think that might not be the root cause.

@dougal83
Copy link
Contributor

Alternative to Win 10 native(or full VM) is to set up WSL. VSCode has an extension to connect to remote.

Just run test and got following result:

   2960 passing (4m)
  8 pending
  1 failing

  1) HttpServer (integration)
       reports error when the server cannot be started:
     AssertionError: expected [Promise] to be rejected
      at Assertion.fail (packages/testlab/node_modules/should/as-function.js:275:17)
      at /mnt/i/dev/loopback-next/packages/testlab/node_modules/should/as-function.js:1747:16
      at runMicrotasks (<anonymous>)
      at processTicksAndRejections (internal/process/task_queues.js:94:5)
      at Context.<anonymous> (packages/http-server/src/__tests__/integration/http-server.integration.ts:204:5)

I could write a small guide on setting this up as Windows users might need some hand holding on setting up env in linux.

@derdeka
Copy link
Contributor Author

derdeka commented Jan 17, 2020

@dougal83 I would like to test the WSL approach to continue the work on a PR. Can you send me a guide?

@dougal83
Copy link
Contributor

dougal83 commented Jan 17, 2020

From what I can remember offhand:

For Ubuntu on WSL bash prompt(powershell > ubuntu cmd) you'll need to install basics:

  • check for updates install them: sudo apt update > apt list --upgradable > sudo apt upgrade
  • Nodejs
  • Git: sudo apt install git

I think that is most of it. Happy to help if you have any issues. If you can document any issues and resolutions we could put together a guide together if you like.

The pro tip is... git config --global core.autocrlf true or you'll be setting all files as modified in git. :)

@bajtos
Copy link
Member

bajtos commented Jan 20, 2020

@derdeka @dougal83 thank you for looking into better support for developing LoopBack on Windows ❤️ It would be awesome if you could describe the recommended setup in our developer documentation, see https://github.com/strongloop/loopback-next/blob/master/docs/site/DEVELOPING.md

For longer term, I'd like to figure out what's going on under the hood, why are the test freezing on Windows; and hopefully find a fix. I am happy to do the investigation myself, but need to find time in respect to other priorities our team has.

@bajtos
Copy link
Member

bajtos commented Jan 24, 2020

I am able to reproduce the problem on my personal Windows laptop.

  • Windows 10 Pro, Version 1909, Build 18363.592
  • Node.js 12.14.1
  • Source code from the current HEAD - 9de1133

Here is a fast way how to trigger the problem:

  1. Checkout the code

  2. npm install

  3. Inside packages/testlab:

    npm run build
    
  4. Inside packages/cli:

    .\node_modules\.bin\lb-mocha -R list test\acceptance\app-run.acceptance.js
    

@bajtos
Copy link
Member

bajtos commented Jan 24, 2020

While debugging the code, I found that BaseGenerator.prototype.setOptions() is trying to read JSON configuration from stdin. I think we may have a bug in this condition:

https://github.com/strongloop/loopback-next/blob/9de1133ecdaaf7625ba3cf740567dc6190fa9a52/packages/cli/lib/base-generator.js#L87-L90

My wild guess is that on Unix, process.stdin.isTTY is truthy and therefore we don't try to read from stdin. On Windows, the terminal is not TTY-like.

The second problem is in the prompt helper. When the terminal is not TTY, the prompt aborts the generator in a LoopBack specific way: no error is throws to Yeoman, we just enable this.exitGeneration flag.

https://github.com/strongloop/loopback-next/blob/9de1133ecdaaf7625ba3cf740567dc6190fa9a52/packages/cli/lib/base-generator.js#L202-L211

This would be fine on its own if our generators were correctly handling this.exitGeneration flag. Unfortunately, that's not always the case. When a generator step detects exit condition and return undefined, then all seems to be ok. However, when a generator step returns false, it halts the queue and application freezes.

https://github.com/strongloop/loopbaack-next/blob/9de1133ecdaaf7625ba3cf740567dc6190fa9a52/packages/cli/generators/app/index.js#L81

I find it strange that some generators steps are returning false, while most other steps returning undefined. See e.g. here:

https://github.com/strongloop/loopback-next/blob/9de1133ecdaaf7625ba3cf740567dc6190fa9a52/packages/cli/generators/app/index.js#L71-L82

My conclusions:

  1. We need to fix our CLI to allow prompts in non-TTY environments, or at least detect when a generator is executed with mocked prompts.

  2. Nice to have: revisit our error handling strategy and flow control related to exiting after an expected failure, ensure that we don't freeze the process.

@raymondfeng You are the author of the error handing strategy, could you please take a look at the second item? I'll try to take a look at TTY-related problems myself.

@bajtos
Copy link
Member

bajtos commented Jan 24, 2020

How to uncover the TTY-related problem on a Unix machine: pipe /dev/null to stdin when running the tests.

For example, inside packages/cli directory:

$ ./node_modules/.bin/lb-mocha test/integration/generators/model.integration.js < /dev/null

@raymondfeng
Copy link
Contributor

raymondfeng commented Jan 24, 2020

I find it strange that some generators steps are returning false, while most other steps returning undefined.

As far as I know, the return value from a generator method does not have impact how the queued steps are executed.

However, when a generator step returns false, it halts the queue and application freezes.

Let me dig into the source code to check.

See https://github.com/yeoman/generator/blob/master/lib/index.js#L449. It does check the returned value from the method

@raymondfeng
Copy link
Contributor

FYI: The intention of if (jsonFileOrValue === 'stdin' || !process.stdin.isTTY) { was to read the input from stdin or a piped stdin such as < config.json.

@bajtos Does process.stdin.isTTY return false on Windows?

@bajtos
Copy link
Member

bajtos commented Feb 6, 2020

FYI: The intention of if (jsonFileOrValue === 'stdin' || !process.stdin.isTTY) { was to read the input from stdin or a piped stdin such as < config.json.

I understand that intention, it's just seems that the current implementation does not work as expected. Personally, I'd prefer to have an explicit CLI option like --interactive (when prompting the user, enabled by default) and --no-interactive (when reading JSON from stdin).

Does process.stdin.isTTY return false on Windows?

This is tricky. It usually returns true :

C:\>node -e "console.log('isTTY?', process.stdin.isTTY)"
isTTY? true

However, when I run Mocha tests from VS Code debugger, stdin is not TTY (isTTY returns undefined).

I think that explains my observations described in #4425 (comment). Unfortunately, it also means that this problem is very difficult to troubleshoot in the debugger. IMO, that's one more reason for abandoning isTTY checks, or at least implementing a way how to enforce the same behavior regardless of whether we are running the code from a TTY terminal or inside a debugger.

I added some more logging, here is what I am observing. When running the tests in test\integration\generators\model.integration.js only, the shared tests pass and then I get here:

prompting: isTTY? true
CALLING THE ORIGINAL PROMPT [
  {
    type: 'input',
    name: 'name',
    message: 'Model class name:',
    when: false,
    default: undefined,
    validate: [Function]
  }
]
GOT ANSWERS {}

We call BaseGenerator.prompt, which decides to call super.prompt. We ask a single prompt for name and receive an empty answers object back. At this point the process hangs.

I see two problems here:

  1. Why is the process hanging? IMO, the generator should abort with an error. I think this is the problem I described in npm test hangs infinitly on windows #4425 (comment), this should be easy to debug & fix on any OS (e.g. MacOS).
  2. super.prompt is supposed to call mock prompt provided by Yeoman test utils. Why is the answers object empty? AFAICT, the test provides name answer in the mocked prompt data, see here.

@bajtos
Copy link
Member

bajtos commented Feb 7, 2020

Based on the discussion with @dhmlau, let's timebox the February work on this issue to one week. Learn more about the problem, fix what can be fixed quickly and then propose the next steps and/or follow-up issues.

@bajtos
Copy link
Member

bajtos commented Feb 10, 2020

However, when I run Mocha tests from VS Code debugger, stdin is not TTY (isTTY returns undefined).

This part can be solved by running the debugger manually via node --inspect-brk.

@bajtos
Copy link
Member

bajtos commented Feb 10, 2020

We call BaseGenerator.prompt, which decides to call super.prompt. We ask a single prompt for name and receive an empty answers object back. At this point the process hangs.

I see two problems here:

  1. Why is the process hanging? IMO, the generator should abort with an error. I think this is the problem I described in npm test hangs infinitly on windows #4425 (comment), this should be easy to debug & fix on any OS (e.g. MacOS).

AFAICT, when I run tests with stdin read from /dev/null, then we don't get as far as prompting for options like name, however the tests don't always pass. The outcome is different depending on how I limit the tests executed and whether Mocha is configured with no timeout.

Few examples:

  • lb-mocha "test/integration/generators/model.integration.js" -b -t 0 < /dev/null
    All tests pass

  • $ lb-mocha "test/integration/generators/model.integration.js" -b < /dev/null
    The test does not run without package.json fails on 2s timeout. (More tests would fail on timeout too if I didn't provide -b argument to Mocha.)

  • $ lb-mocha "test/**/*.js" -b < /dev/null

    Fails with the following report:

    1) app-generator (SLOW)
         "before all" hook: install dependencies for "passes `npm test` for the generated project":
       ValidationError: No packages remain after filtering [ '@loopback/sandbox-app' ]
        at filterPackages (/Users/bajtos/src/loopback/next/node_modules/@lerna/filter-packages/filter-packages.js:54:13)
        at /Users/bajtos/src/loopback/next/node_modules/@lerna/filter-options/lib/get-filtered-packages.js:39:5
    

I am not sure if this is directly related to problems on Windows, but I think it's still worth fixing to make troubleshooting of CLI bugs easier in general.

@bajtos
Copy link
Member

bajtos commented Feb 10, 2020

I am not sure if this is directly related to problems on Windows, but I think it's still worth fixing to make troubleshooting of CLI bugs easier in general.

On the second thought, I think it's better to discuss this problem in a different issue and keep the discussion here focused on Windows. See #4607

@bajtos
Copy link
Member

bajtos commented Feb 13, 2020

With #4605 and #4605 landed, npm test is no longer freezing on my Windows machine when using Node.js version 12.16.0. There are few test failures though, I opened #4643 to fix some of them. There will be few more fixes required until npm test is green, I'll continue my investigation.

@dougal83
Copy link
Contributor

@bajtos Happy to confirm tests work on my Win 10 machine. Just say the word and I'll check.

@bajtos
Copy link
Member

bajtos commented Feb 13, 2020

@dougal83 that's great news! Are the tests actually passing for you? I am seeing several test failures when running the tests from Windows CLI (cmd.exe). I am fixing some of the problems in #4643, can you please review?

@dougal83
Copy link
Contributor

dougal83 commented Feb 13, 2020

OK I will review #4643 next. On my main PC via powershell(run as administrator) npm run test (EDIT: had to regenerate snapshots) returns:

  3087 passing (5m)
  12 pending
  5 failing

  1) HttpCachingProxy
       handles the case where backend service is not running:
     Error: Timeout of 2000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves. (I:\dev\loopback-next\packages\http-caching-proxy\dist\__tests__\integration\http-caching-proxy.integration.js)
      at listOnTimeout (internal/timers.js:531:17)
      at processTimers (internal/timers.js:475:7)

  2) HttpServer (integration)
       stops server:
     Error: Timeout of 2000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves. (I:\dev\loopback-next\packages\http-server\dist\__tests__\integration\http-server.integration.js)     
      at listOnTimeout (internal/timers.js:531:17)
      at processTimers (internal/timers.js:475:7)

  3) HttpServer (integration)
       stops server with grace period and inflight request:
     Error: Timeout of 2000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves. (I:\dev\loopback-next\packages\http-server\dist\__tests__\integration\http-server.integration.js)     
      at listOnTimeout (internal/timers.js:531:17)
      at processTimers (internal/timers.js:475:7)

  4) HttpServer (integration)
       stops server with shorter grace period and inflight request:
     Error: Timeout of 2000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves. (I:\dev\loopback-next\packages\http-server\dist\__tests__\integration\http-server.integration.js)     
      at listOnTimeout (internal/timers.js:531:17)
      at processTimers (internal/timers.js:475:7)

  5) app-generator with tilde project path
       "before all" hook for "scaffold a new application for tilde-path-app":   

      AssertionError [ERR_ASSERTION]: true == false
      + expected - actual

      -true
      +false

      at Context.<anonymous> (packages\cli\test\integration\generators\app.integration.js:281:12)
      at processImmediate (internal/timers.js:439:21)

@bajtos
Copy link
Member

bajtos commented Feb 14, 2020

Thank you @dougal83 for reviewing and testing #4643. My take is that we are on the right track and have few more tests to fix.

Here are the next steps I see:

  1. Land Fix Windows-related issues in CLI tests #4643
  2. Investigate app-generator with tilde project path test failure. I am not sure if Windows allows ~ in file and directory names?
  3. Investigate HttpCachingProxy and HttServer test failures. I suspect they may have the same cause.

Do you have any bandwidth @dougal83 to help me and work on the second or the third task yourself?

@dougal83
Copy link
Contributor

@bajtos I'll take a look at (2). Tilde is used to shorten paths to my knowledge. Hopefully you find (3) the same!

@dougal83
Copy link
Contributor

dougal83 commented Feb 14, 2020

@bajtos I did wonder why a sandbox folder appeared in packages and now we have a/the culprit.

app.integration.js#L259 appears it should read const rootDir = path.join(__dirname, '../../../../../'); Change passes on Win10 + WSL.
EDIT: Ditto for L220 (Both changes pass on Linux + Win).

PR created.

dougal83 added a commit to dougal83/loopback-next that referenced this issue Feb 14, 2020
Fixes error on windows tests.  Removes instances of sandbox folder
erroniously placed in packages folder.

See loopbackio#4425

Signed-off-by: Douglas McConnachie <[email protected]>
raymondfeng pushed a commit that referenced this issue Feb 14, 2020
Fixes error on windows tests.  Removes instances of sandbox folder
erroniously placed in packages folder.

See #4425

Signed-off-by: Douglas McConnachie <[email protected]>
dougal83 added a commit to dougal83/loopback-next that referenced this issue Feb 15, 2020
give windows extra time to complete tests

Fix loopbackio#4425

Signed-off-by: Douglas McConnachie <[email protected]>
dougal83 added a commit to dougal83/loopback-next that referenced this issue Feb 15, 2020
give windows extra time to complete tests

Fix loopbackio#4425

Signed-off-by: Douglas McConnachie <[email protected]>
@dougal83
Copy link
Contributor

@bajtos (3) was the same. Thanks for doing all the hard work! ;)

dougal83 added a commit to dougal83/loopback-next that referenced this issue Feb 17, 2020
give windows extra time to complete tests

Fix loopbackio#4425

Signed-off-by: Douglas McConnachie <[email protected]>
raymondfeng pushed a commit that referenced this issue Feb 17, 2020
give windows extra time to complete tests

Fix #4425

Signed-off-by: Douglas McConnachie <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug developer-experience Issues affecting ease of use and overall experience of LB users os:Windows Issues specific to Windows
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants