Intermittent repl test failures #7640

Closed
staticfloat opened this issue Jul 17, 2014 · 19 comments
Labels: bug, priority, test

Comments

@staticfloat
Member

I'm getting intermittent test failures on OSX for the repl test:

...
        From worker 4:       * floatapprox
        From worker 4:       * readdlm
        From worker 5:       * regex
        From worker 3:       * float16
        From worker 5:       * combinatorics
        From worker 3:       * sysinfo
        From worker 5:       * rounding
        From worker 5:       * ranges
        From worker 3:       * mod2pi
        From worker 3:       * euler
        From worker 3:       * show
        From worker 4:       * lineedit
        From worker 3:       * replcompletions
        From worker 4:       * repl
        From worker 3:       * test
        From worker 3:       * examples
Worker 4 terminated.
ERROR: ProcessExitedException()
 in wait at ./task.jl:279
 in wait at ./task.jl:189
 in wait_full at ./multi.jl:594
 in remotecall_fetch at multi.jl:696
 in remotecall_fetch at multi.jl:701
 in anonymous at task.jl:1348
while loading /Users/sabae/tmp/julia-packaging/osx10.7+/julia-master/test/runtests.jl, in expression starting on line 46

@Keno any ideas of what I could do to debug this? The backtrace isn't particularly helpful. Note that this mostly happens when I'm running tests inside Homebrew or the nightly build process; I haven't been able to trigger it reliably yet.

@staticfloat
Member Author

Hmmm. Looks like it's not just OSX either.
https://travis-ci.org/JuliaLang/julia/jobs/30194012

staticfloat removed the mac label Jul 17, 2014
staticfloat added this to the 0.3 milestone Jul 17, 2014
@tkelman
Contributor

tkelman commented Jul 17, 2014

Related to 3e2a2fa maybe?

@jiahao
Member

jiahao commented Jul 18, 2014

Just ran into this too on Travis: https://travis-ci.org/JuliaLang/julia/jobs/30245693#L1517

@vtjnash
Member

vtjnash commented Jul 22, 2014

> Related to 3e2a2fa maybe?

no, that one would result in a hung task

@JeffBezanson
Member

This is failing quite frequently on Travis. It's hard to get a passing build. We really need to fix this.

@Fedster

Fedster commented Jul 22, 2014

Hi,

failing again on the latest build:

https://gist.github.com/pao/65690e5bfbbc8c88eb39

I'm on OSX, 10.9.4.

Best

F

@pao
Member

pao commented Jul 22, 2014

@Fedster Thanks for the report. Since this is the same failure as in the original report, and I don't believe it adds any new information, I have moved the contents into a gist (available at https://gist.github.com/pao/65690e5bfbbc8c88eb39 and linked in place above) which keeps this issue's discussion concise.

@Fedster

Fedster commented Jul 22, 2014

@pao thanks for doing that

@staticfloat
Member Author

I've got a bad feeling about this. </starwars>

Looks like it's not necessarily related to the repl test at all.

@Fedster

Fedster commented Jul 22, 2014

Just did git fetch && git merge origin

...
From git://github.com/JuliaLang/julia
9299b25..e874beb master -> origin/master
Updating 9299b25..e874beb
Fast-forward
doc/manual/control-flow.rst | 4 ++--
doc/stdlib/linalg.rst | 14 +++++++-------
2 files changed, 9 insertions(+), 9 deletions(-)

and testall passes

@staticfloat
Member Author

Try running it a couple times. I find it's rather intermittent.
-E


@Fedster

Fedster commented Jul 22, 2014

I just tried twice more and testall passed every time (i.e. three passes in a row).

vtjnash mentioned this issue Jul 23, 2014
@vtjnash
Member

vtjnash commented Jul 23, 2014

https://gist.github.com/vtjnash/837ec5dc9e06a68a99a9
https://gist.github.com/vtjnash/14235f4ff896cdcdfd2f

I find that failures in the repl test don't produce an error log, however.

@vtjnash
Member

vtjnash commented Jul 23, 2014

This is the closest I got to what might be the backtrace:
https://gist.github.com/vtjnash/d02a9a275d0dbbad89ee

vtjnash referenced this issue Jul 24, 2014
…ation"

This reverts commit d303568.

This commit breaks linux error handling. For example:
julia> f()=f();f()
Segmentation fault (core dumped)
vtjnash closed this as completed Jul 24, 2014
@vtjnash
Member

vtjnash commented Jul 24, 2014

@tkelman is pretty sure this is fixed now. I agree that not having functions lying around containing invalid pointers is probably a good thing for avoiding segfaults.

@staticfloat
Member Author

I'm still getting failures on Linux which look suspiciously similar (This log is a Linux build of 4448e7c). I have a different error on OSX now, but I'm not certain it's related.

@tkelman
Contributor

tkelman commented Jul 24, 2014

Spawn segfault, 32 bit? Seems like a different problem to me. At 4448e7c on 64 bit Ubuntu 14.04 I got 10 successful runs of make testall in a row. Reverting to 0195310 fails the repl test the first time (still collecting more runs).

@tkelman
Contributor

tkelman commented Jul 24, 2014

On the same machine at 0195310, ten runs of make testall gave 6 repl test failures. Looks much better now to me, but if anyone still sees repl failures on current master, speak up.
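
For anyone who wants to collect a similar failure count, here is a minimal sketch, assuming a POSIX shell and a source checkout where `make testall` is the test target mentioned in this thread. The helper name `count_failures` is hypothetical and not part of the Julia build system; it only runs a command repeatedly and reports how many runs exited non-zero.

```shell
# count_failures RUNS CMD [ARGS...]
# Hypothetical helper: runs CMD the given number of times and prints
# how many of those runs exited with a non-zero status.
count_failures() {
    runs=$1
    shift
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # Discard the command's output; only the exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Example (assumes the current directory is a Julia source checkout):
#   count_failures 10 make testall
```

Because the output is discarded, this only gives a failure rate; to see which test failed, redirect each run to its own log file instead of `/dev/null`.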

@staticfloat
Member Author

Having slept on it, I believe @tkelman is right. Thanks, @vtjnash for hunting that down!

7 participants