[PRIORITY] Persistent failures on Windows #1005

Closed
rvagg opened this issue Feb 28, 2015 · 36 comments
Labels
windows Issues and PRs related to the Windows platform.

Comments

@rvagg
Member

rvagg commented Feb 28, 2015

See https://jenkins-iojs.nodesource.com/job/iojs+any-pr+multi/215/nodes=iojs-win2012r2/console for sample output, consistent with 2008 build and consistent across runs at the moment too.

10 failures in total. These have gone uncaught for a little while because of a combination of:

  • Jenkins :trollface:
  • Jenkins configuration problems/mistakes (mainly on my part)
  • The fact that test-ci still doesn't work on any platform so we have to resort to more hacky means of making enough tests run
  • Errors in vcbuild.bat that have meant we haven't been running the full suite, fixed in build: improve vcbuild.bat #998

So we've been getting lots of blue builds when they really should have been red, and the failures have been totally off everyone's radar.

I can't assess the severity of these failures at a glance, nor can I see a single common theme that would point to something to address. When I have time I'll go back and find a run that had these passing so we can at least start a manual bisect.

_test-child-process-stdio-big-write-end fixed in #1008_

=== release test-child-process-stdio-big-write-end ===
Path: parallel/test-child-process-stdio-big-write-end
events.js:141
      throw er; // Unhandled 'error' event
            ^
Error: write ENOTSUP
    at exports._errnoException (util.js:734:11)
    at Socket._writeGeneric (net.js:668:26)
    at Socket._writev (net.js:682:8)
    at doWrite (_stream_writable.js:293:12)
    at clearBuffer (_stream_writable.js:389:5)
    at onwrite (_stream_writable.js:338:7)
    at Socket.WritableState.onwrite (_stream_writable.js:88:5)
    at WriteWrap.afterWrite (net.js:765:12)
Command: c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\Release\iojs.exe c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\parallel\test-child-process-stdio-big-write-end.js
=== release test-timers-first-fire ===
Path: parallel/test-timers-first-fire
timer fired in -0.19568000000000296
assert.js:87
  throw new assert.AssertionError({
        ^
AssertionError: Timer fired early
    at null._onTimeout (c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\parallel\test-timers-first-fire.js:11:10)
    at Timer.listOnTimeout (timers.js:88:15)
Command: c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\Release\iojs.exe c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\parallel\test-timers-first-fire.js
=== release test-child-process-double-pipe ===
Path: parallel/test-child-process-double-pipe
grep stdin write 7
echo exit
grep stdin write 18
Command: c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\Release\iojs.exe c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\parallel\test-child-process-double-pipe.js
--- TIMEOUT ---
=== release test-child-process-exit-code ===
Path: parallel/test-child-process-exit-code
Command: c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\Release\iojs.exe c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\parallel\test-child-process-exit-code.js
--- TIMEOUT ---
=== release test-child-process-spawnsync ===
Path: parallel/test-child-process-spawnsync
sleep started
Command: c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\Release\iojs.exe c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\parallel\test-child-process-spawnsync.js
--- TIMEOUT ---
=== release test-child-process-spawnsync-input ===
Path: parallel/test-child-process-spawnsync-input
Command: c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\Release\iojs.exe c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\parallel\test-child-process-spawnsync-input.js
--- TIMEOUT ---
=== release test-http-curl-chunk-problem ===
Path: parallel/test-http-curl-chunk-problem
dd command:  "c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\Release\iojs.exe" "c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\fixtures\create-file.js" "c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\tmp.11\big" 10485760
Server running at http://localhost:8080
making curl request
Command: c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\Release\iojs.exe c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\parallel\test-http-curl-chunk-problem.js
--- TIMEOUT ---
=== release test-process-kill-null ===
Path: parallel/test-process-kill-null
Command: c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\Release\iojs.exe c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\parallel\test-process-kill-null.js
--- TIMEOUT ---

_test-pipe-head fixed in #1008_

=== release test-pipe-head ===
Path: sequential/test-pipe-head
Command: c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\Release\iojs.exe c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\sequential\test-pipe-head.js
--- TIMEOUT ---
=== release test-stdin-from-file ===
Path: sequential/test-stdin-from-file
"c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\Release\iojs.exe" "c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\fixtures\echo-close-check.js" < "c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\fixtures\stdin.txt"
c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\sequential\test-stdin-from-file.js:37
  if (err) throw err;
                 ^
Error: Command failed: C:\Windows\system32\cmd.exe /s /c ""c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\Release\iojs.exe" "c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\fixtures\echo-close-check.js" < "c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\fixtures\stdin.txt""


assert.js:87

  throw new assert.AssertionError({

        ^

AssertionError: false == true

    at Server.<anonymous> (c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\fixtures\echo-close-check.js:17:5)

    at Server.g (events.js:257:16)

    at emitNone (events.js:67:13)

    at Server.emit (events.js:163:7)

    at net.js:1184:12

    at process._tickCallback (node.js:350:11)


    at ChildProcess.exithandler (child_process.js:716:12)
    at emitTwo (events.js:87:13)
    at ChildProcess.emit (events.js:169:7)
    at maybeClose (child_process.js:984:16)
    at Process.ChildProcess._handle.onexit (child_process.js:1057:5)
Command: c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\Release\iojs.exe c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\sequential\test-stdin-from-file.js
@rvagg
Member Author

rvagg commented Feb 28, 2015

ping @iojs/platform-windows, this needs fairly urgent attention from anyone capable of digging in and figuring these out.

@rvagg
Member Author

rvagg commented Feb 28, 2015

Two of these have been fixed by #1008

The last complete test-simple run I can find is https://jenkins-iojs.nodesource.com/job/iojs+any-pr+multi/nodes=iojs-win2012r2/54/console, back on January 12th. So it seems we've been happily seeing passes in CI for Windows builds the whole time we've been "releasing", and none of us noticed that it's mainly been test-message that's been running. This is entirely because it was assumed that vcbuild.bat ... test-simple test-message would run both suites in one invocation, but unfortunately vcbuild.bat never had that working properly. Fixed now in #998.

That means two things:

  1. We're much further behind on Windows quality than we thought
  2. These fixes probably aren't quite as urgent as we thought, because some of these may have been broken for longer than just 1.4.1 and we haven't had direct bug reports about them.

/cc @iojs/build

@rvagg
Member Author

rvagg commented Mar 1, 2015

I had limited success in reproducing many of these. Running them locally on a Windows machine, and even running them on the exact same server that is giving us the failures via Jenkins, I can only make 3 or 4 tests fail. The block of timeouts seems to be somehow Jenkins-specific.

Can other Windows users verify against v1.x HEAD and report back? vcbuild x64 release nosign test-simple (add test-message at the end to do exactly the same as Jenkins).

@domenic
Contributor

domenic commented Mar 1, 2015

I get these failures: https://gist.github.com/domenic/ad191da152fc6632fa32 Unsure what to make of them compared to yours above.

I guess the first one, maybe others, is due to not having the openssl binary installed. I'll go check my prereqs and report back if anything else changes.

@rvagg
Member Author

rvagg commented Mar 1, 2015

Install git bash and you'll get what you need. Google "git windows" and download and install that one.

@domenic
Contributor

domenic commented Mar 1, 2015

Yeah I have Git bash but when I try to run .bat files from within it all hell breaks loose.

@rvagg
Member Author

rvagg commented Mar 1, 2015

You should be able to run vcbuild.bat from cmd.exe but git bash comes with curl and other Unix utils that can go in your PATH. I think that's an install option though.

@domenic
Contributor

domenic commented Mar 1, 2015

OK yeah, getting the same failures minus the first one (the openssl one).

@seishun
Contributor

seishun commented Mar 5, 2015

I have 32-bit Windows at home so I ran vcbuild release nosign test-simple against b72fa03. The result is similar to @domenic's: https://gist.github.com/seishun/ebba8198423259444892

After deleting tmp.0 manually I re-ran parallel/test-fs-access and it passed.

@rvagg
Member Author

rvagg commented Mar 6, 2015

git bisect says that @indutny's changes in b968623 / #840 introduced the test-stdin-from-file failures

I'm still having trouble reproducing the timeout failures, perhaps something odd about the way the processes are being spawned via Jenkins is feeding in to that.

@piscisaureus
Contributor

@rvagg Before that patch Server.fd would be undefined on windows, after the patch the value is -1. But that test has already been fixed: abd3ecf
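
To make the before/after difference concrete, here is a minimal sketch of a check that treats both values as "no descriptor". `hasUsableFd` is a hypothetical helper written for illustration; it is not part of the test suite or of the fix in abd3ecf, and it assumes only what the comment above states (undefined before the patch, -1 after it).

```javascript
'use strict';

// Hypothetical helper: returns true only for a real, non-negative descriptor.
// Both `undefined` (pre-patch value on Windows) and -1 (post-patch value)
// are treated as "no usable fd".
function hasUsableFd(fd) {
  return typeof fd === 'number' && fd >= 0;
}
```

A test written this way would not care which of the two "missing" values a platform reports.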

@rvagg
Member Author

rvagg commented Mar 6, 2015

Right, sorry, I'm barking up the wrong test. I need to narrow this down, but there are definitely more failures for Windows after that merge than before; cross-referencing with what's already fixed is tough.

so, ignore my comment for now!

@rmg
Contributor

rmg commented Mar 6, 2015

I haven't looked at the tests or the Jenkins config, so forgive me if this isn't helpful..

The combination of Jenkins and Windows immediately brings to mind the problems I've had with build console logs being garbled or out of order because it is non-blocking. Depending on how the scripts are wired up, that could introduce timing issues.

@rvagg
Member Author

rvagg commented Mar 9, 2015

Restarting Jenkins seems to have fixed the timeouts. I was playing with running Jenkins from cmd rather than as a service but soon discovered it was simply a matter of needing to restart. I also did a full cleanout of the build workspace, so that could have helped too; I really don't know. Jenkins :trollface:

These are the remaining persistent failures on both 2008 and 2012.

If you want to contribute but don't feel confident enough to actually find and fix the underlying problem, then running a git bisect would be a great help to narrow down the commits where these were introduced (I suppose one at a time, best not to assume they are all related). I don't know a good commit to start from though, so you'll have to go fishing.

Build with vcbuild x64 release nosign, then test with Release\iojs.exe test\parallel\test-http-content-length.js (etc.).

=== release test-http-content-length ===
Path: parallel/test-http-content-length
events.js:141
      throw er; // Unhandled 'error' event
            ^
Error: read ECONNRESET
    at exports._errnoException (util.js:734:11)
    at TCP.onread (net.js:538:26)
Command: c:\workspace\iojs+any-pr+multi\nodes\iojs-win2008r2\Release\iojs.exe c:\workspace\iojs+any-pr+multi\nodes\iojs-win2008r2\test\parallel\test-http-content-length.js
=== release test-regress-GH-io-1068 ===
Path: parallel/test-regress-GH-io-1068
events.js:141
      throw er; // Unhandled 'error' event
            ^
Error: shutdown EPIPE
    at exports._errnoException (util.js:734:11)
    at Socket.onSocketFinish (net.js:218:26)
    at emitNone (events.js:67:13)
    at Socket.emit (events.js:163:7)
    at finishMaybe (_stream_writable.js:477:14)
    at endWritable (_stream_writable.js:486:3)
    at Socket.Writable.end (_stream_writable.js:452:5)
    at Socket.end (net.js:393:31)
    at process._tickCallback (node.js:349:13)
    at Function.Module.runMain (module.js:487:11)
Command: c:\workspace\iojs+any-pr+multi\nodes\iojs-win2008r2\Release\iojs.exe c:\workspace\iojs+any-pr+multi\nodes\iojs-win2008r2\test\parallel\test-regress-GH-io-1068.js
=== release test-tls-over-http-tunnel ===
Path: parallel/test-tls-over-http-tunnel
CLIENT: Making CONNECT request
PROXY: got a client connection
PROXY: got CONNECT request
PROXY: creating a tunnel
PROXY: replying to client CONNECT request
CLIENT: got CONNECT response
CLIENT: Making HTTPS request
SERVER: got request
SERVER: sending response
CLIENT: got HTTPS response
events.js:141
      throw er; // Unhandled 'error' event
            ^
Error: read ECONNRESET
    at exports._errnoException (util.js:734:11)
    at TCP.onread (net.js:538:26)
Command: c:\workspace\iojs+any-pr+multi\nodes\iojs-win2008r2\Release\iojs.exe c:\workspace\iojs+any-pr+multi\nodes\iojs-win2008r2\test\parallel\test-tls-over-http-tunnel.js

@Fishrock123
Contributor

test-http-content-length is from #1062

See also #1137

More info: (edit):

First CI run: https://jenkins-iojs.nodesource.com/view/iojs/job/iojs+any-pr+multi/250/

CI was not run on the PR.

Edit2: fixed in 53e200a

@Fishrock123
Contributor

test-regress-GH-io-1068 is from #1073

I reported that the test case was failing on windows a week ago in the original issue: #1068 (comment)

More info: (edit):

First CI: https://jenkins-iojs.nodesource.com/view/iojs/job/iojs+any-pr+multi/247/

CI was run on this PR and it did error, but the failure was unreproducible outside CI and no further investigation has happened yet.

@Fishrock123
Contributor

Fwiw, both of these have been failing since they were added.

@Fishrock123
Contributor

There's currently a new one: #881 (comment)

@chrisdickinson
Contributor

#1150 reverts preload.

@Fishrock123
Contributor

test-tls-over-http-tunnel does indeed appear to have been fixed by #1155

https://jenkins-iojs.nodesource.com/view/iojs/job/iojs+any-pr+multi/312/

@Fishrock123
Contributor

As of recently, test-net-reconnect-error has been timing out (almost?) every run on windows...

=== release test-net-reconnect-error ===
Path: parallel/test-net-reconnect-error
CLIENT disconnect
(...)
CLIENT disconnect
CLIENT error: ECONNREFUSED
(...)
CLIENT error: ECONNREFUSED
Command: c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\Release\iojs.exe c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\parallel\test-net-reconnect-error.js
--- TIMEOUT ---

@indutny
Member

indutny commented Mar 18, 2015

Doesn't happen for me on my Windows box.

@mathiask88
Contributor

I can repro test-net-reconnect-error timeout on win8.1. I think it comes from f19e9b6 where a 50s timeout was added to tests. And on my machine the tcp timeout is about 1s so this test is quite tight.

@Fishrock123
Contributor

@mathiask88 thanks for pointing that out. Given that is a changelog commit I believe it was committed in error from dev testing. #1198

ping @rvagg

@rvagg
Member Author

rvagg commented Mar 19, 2015

I restarted jenkins again on the 2008 machines (actually the 2012 ones as well) and we're back to parity between test runs on both: https://jenkins-iojs.nodesource.com/job/iojs+any-pr+multi/331/nodes=iojs-win2008r2/console

So now it's just test-regress-GH-io-1068 and test-net-reconnect-error left.

@Fishrock123
Contributor

As of dd37fb4 there might not be any more persistent windows timeouts: #1198

(Which leaves only the dreaded test-regress-GH-io-1068)

@rvagg
Member Author

rvagg commented Mar 19, 2015

improving, but we now have test-child-process-stdout-flush-exit in the mix occasionally, not sure if this is a new failure but it's shown up a few times in recent runs.

@mathiask88
Contributor

Maybe it helps, but on my windows machine test-regress-GH-io-1068 works.

@Fishrock123
Contributor

The Windows build(s) are green as of #1233, cheers all! 🍻

Of course, special thanks to @indutny for his hard work on lots of this. :)

@indutny
Member

indutny commented Mar 22, 2015

🍺 to everyone! Now to figure out some odd leak.

@rvagg
Member Author

rvagg commented Mar 22, 2015

woo! thanks all

@rvagg rvagg closed this as completed Mar 22, 2015
@piscisaureus
Contributor

Thank you very much @indutny !

@artnikpro

Same thing on OS X Yosemite

$ http-server
events.js:141
      throw er; // Unhandled 'error' event
      ^

Error: listen EADDRINUSE 0.0.0.0:8080
    at Object.exports._errnoException (util.js:837:11)
    at exports._exceptionWithHostPort (util.js:860:20)
    at Server._listen2 (net.js:1231:14)
    at listen (net.js:1267:10)
    at net.js:1376:9
    at doNTCallback3 (node.js:440:9)
    at process._tickCallback (node.js:346:17)
    at Function.Module.runMain (module.js:477:11)
    at startup (node.js:117:18)
    at node.js:951:3

@nikolmarku1

Same thing on OS X Yosemite. Could we reopen this issue?

@indutny
Member

indutny commented Sep 30, 2015

Perhaps you have a server listening on port 8080?

@nodejs nodejs locked and limited conversation to collaborators Oct 6, 2015
@Fishrock123
Contributor

I'm going to lock this. This is historical only. Please report new issues in a new issue thread if they arise. :)
