Investigate flaky tests: test_wait_signal, test_wait, test_execve #251
Including output from @viraptor's report in #260:
new flaky test:
testing: increase stability by removing thread parallelism
Currently, several of the tests fail intermittently. After some research, it appears that these failures only occur when thread parallelism is enabled (as it is by default). To test, I just ran the failing tests over and over. I consistently saw errors when running the following: `$ while true; do target/debug/test-7ec4d9681e812f6a; done`
When I forced single-threaded execution, I no longer saw failures: `$ while true; do RUST_TEST_THREADS=1 target/debug/test-7ec4d9681e812f6a; done`
I was mostly looking at the test_unistd failures, which call fork() and then wait(). There is only one parent process, so the wait() call could (and frequently does) return the pid of some other child that just happened to terminate, one forked by a test running on another thread. That is why, when one of these tests fails, the other one does too. I couldn't think of an obvious fix other than disabling thread parallelism in the short term; the tests still run very quickly.
nix-rust#251
Signed-off-by: Paul Osborne <[email protected]>
test_sigwait still seems to fail sometimes, even with single-threaded execution. It must be a separate problem.
Thoughts on these issues:
I'm going to make them wait on the child they spawn instead of any old child. I'm still not sure what is going on in
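A minimal sketch of waiting on the spawned child rather than on any child, assuming a Unix target where `pid_t` is a 32-bit `int` (true on Linux and macOS). To need only `std`, it declares `fork`, `waitpid`, and `_exit` from the C library by hand; it is not nix's API or the test suite's actual code, and the helpers `spawn_and_reap` and `exit_code` are hypothetical names.

```rust
use std::os::raw::c_int;

// Assumption: pid_t is a 32-bit int (Linux, macOS).
type Pid = i32;

extern "C" {
    fn fork() -> Pid;
    fn waitpid(pid: Pid, status: *mut c_int, options: c_int) -> Pid;
    fn _exit(status: c_int) -> !;
}

/// Forks a child that exits immediately with `code`, then reaps *that*
/// child by pid. Returns (child pid, reaped pid, raw wait status).
fn spawn_and_reap(code: c_int) -> (Pid, Pid, c_int) {
    let child = unsafe { fork() };
    assert!(child >= 0, "fork failed");
    if child == 0 {
        // In the child: leave at once, without unwinding or running
        // destructors, so no lock held by a parent thread is touched.
        unsafe { _exit(code) }
    }
    let mut status: c_int = 0;
    // waitpid(child, ..) only reaps this pid; wait() or waitpid(-1, ..)
    // could reap a child forked by another test thread instead.
    let reaped = unsafe { waitpid(child, &mut status, 0) };
    (child, reaped, status)
}

/// WIFEXITED / WEXITSTATUS, using the Linux and macOS status encoding.
fn exit_code(status: c_int) -> Option<c_int> {
    if status & 0x7f == 0 {
        Some((status >> 8) & 0xff)
    } else {
        None
    }
}

fn main() {
    // Several "tests" forking concurrently, each reaping only its own child.
    let handles: Vec<_> = (1..=4)
        .map(|code| std::thread::spawn(move || spawn_and_reap(code)))
        .collect();
    for (code, h) in (1..=4).zip(handles) {
        let (child, reaped, status) = h.join().unwrap();
        assert_eq!(child, reaped);
        assert_eq!(exit_code(status), Some(code));
    }
    println!("each thread reaped its own child");
}
```

The child calls `_exit` rather than returning. In a multithreaded process only async-signal-safe functions are safe to call in the forked child, and `_exit` is one of them; unlike a normal exit, it also skips `atexit` handlers and stdio flushing.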
Those errors (mentioned on Jan 31) look like the ones I saw as well.
I have some ideas on this as well, but I'll check your link. I think it's a separate issue from the flaky / intermittent failures, though.
I'll bet it's a race where SIGUSR1 is delivered before the wait is called. The error:
seems to be saying that the process was killed by SIGUSR1. If a context switch happened after the raise and before the call to wait, then it's entirely possible that the signal gets processed before the wait, and since no one is waiting for the signal yet, it terminates the process. But I'm really unfamiliar with the OS X kernel, so I could be spouting nonsense.
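If that diagnosis is right, the usual way to rule the race out is to block the signal in the waiting thread *before* raising it, so it stays pending until `sigwait()` collects it. A minimal sketch, assuming the test raises SIGUSR1 on the same thread that later waits for it. It uses only `std`: the libc functions are declared by hand, `SigSet` is an oversized stand-in for `sigset_t`, and the signal constants are the Linux (x86/ARM) and macOS values, so it will not build elsewhere. This is not the nix test's code. If a test instead sends the signal to the whole process (for example with `kill(getpid(), ..)`), POSIX lets any thread that does not block it receive it, which would also fit the suspicion that the test is not running single-threaded.

```rust
use std::os::raw::c_int;

// Assumption: Linux (x86/ARM) or macOS; these values differ by platform.
#[cfg(target_os = "linux")]
const SIGUSR1: c_int = 10;
#[cfg(target_os = "linux")]
const SIG_BLOCK: c_int = 0;
#[cfg(target_os = "linux")]
const SIG_SETMASK: c_int = 2;
#[cfg(target_os = "macos")]
const SIGUSR1: c_int = 30;
#[cfg(target_os = "macos")]
const SIG_BLOCK: c_int = 1;
#[cfg(target_os = "macos")]
const SIG_SETMASK: c_int = 3;

// Stand-in for sigset_t: 128 bytes, 8-byte aligned, at least as large as
// glibc's sigset_t and macOS's 4-byte one. Only libc reads or writes it.
#[repr(C)]
struct SigSet([u64; 16]);

extern "C" {
    fn sigemptyset(set: *mut SigSet) -> c_int;
    fn sigaddset(set: *mut SigSet, signum: c_int) -> c_int;
    fn pthread_sigmask(how: c_int, set: *const SigSet, old: *mut SigSet) -> c_int;
    fn raise(sig: c_int) -> c_int;
    fn sigwait(set: *const SigSet, sig: *mut c_int) -> c_int;
}

/// Blocks SIGUSR1 in this thread before raising it, so the signal stays
/// pending until sigwait() takes it. A context switch between raise() and
/// sigwait() can no longer let the default action (terminate) run first.
fn raise_and_sigwait() -> c_int {
    let mut set = SigSet([0; 16]);
    let mut old = SigSet([0; 16]);
    let mut received: c_int = 0;
    unsafe {
        assert_eq!(sigemptyset(&mut set), 0);
        assert_eq!(sigaddset(&mut set, SIGUSR1), 0);
        // Per-thread mask: other threads in the test binary are unaffected.
        assert_eq!(pthread_sigmask(SIG_BLOCK, &set, &mut old), 0);
        // raise() targets the calling thread, which now blocks SIGUSR1.
        assert_eq!(raise(SIGUSR1), 0);
        assert_eq!(sigwait(&set, &mut received), 0);
        // Restore the thread's original mask.
        assert_eq!(pthread_sigmask(SIG_SETMASK, &old, std::ptr::null_mut()), 0);
    }
    received
}

fn main() {
    let sig = raise_and_sigwait();
    assert_eq!(sig, SIGUSR1);
    println!("collected signal {} via sigwait", sig);
}
```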
I opened #303 for the more general how-to-test issue.
To me it looks like the test does not run in a single-threaded process. I thought we changed the test behaviour to guarantee that?
638: Make aio, chdir, and wait tests thread safe r=Susurrus
Fix thread safety issues in aio, chdir, and wait tests
They have four problems:
* The chdir tests change the process's cwd, which is global. Protect them all with a mutex.
* The wait tests will reap any subprocess, and several tests create subprocesses. Protect them all with a mutex so that only one subprocess-creating test runs at a time.
* When a multithreaded test forks, the child process can sometimes block in the stack-unwinding code. It blocks on a mutex that was held by a different thread in the parent, but that thread doesn't exist in the child, so a deadlock results. Fix this by immediately calling `std::process::exit` in the child processes.
* My previous attempt at thread safety in the aio tests didn't work, because anonymous MutexGuards drop immediately. Fix this by naming the SIGUSR2_MTX MutexGuards.
Fixes #251
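A minimal sketch of the last bullet's pitfall, using only `std`: `let _ = m.lock()` binds nothing, so the guard is a temporary that is dropped at the end of the statement, while `let _m = ...` keeps it alive until the end of the scope. `FORK_MTX`, `lock_forks`, and the two `held_with_*` functions are hypothetical names, not the ones in nix's test suite; the same pattern would apply to a lock guarding the chdir tests. Recent rustc versions reject the first form with the deny-by-default `let_underscore_lock` lint, which is why the sketch explicitly allows it.

```rust
use std::sync::{Mutex, MutexGuard};

// Hypothetical process-wide lock serializing every test that forks and
// waits. Mutex::new is a const fn, so no lazy initialization is needed.
static FORK_MTX: Mutex<()> = Mutex::new(());

/// Takes the lock, recovering it if an earlier test panicked while holding
/// it, so one failing test does not make every later one fail too.
fn lock_forks() -> MutexGuard<'static, ()> {
    FORK_MTX.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// The buggy pattern: `let _ =` does not bind, so the guard is dropped at
/// the end of the statement and the lock is released immediately.
#[allow(let_underscore_lock)]
fn held_with_underscore() -> bool {
    let _ = lock_forks();
    // try_lock succeeds here: nothing is serialized any more.
    FORK_MTX.try_lock().is_err()
}

/// The fix: a named binding (even one starting with `_`) keeps the guard
/// alive until the end of the scope.
fn held_with_named_guard() -> bool {
    let _m = lock_forks();
    // try_lock fails because this thread still holds the lock.
    FORK_MTX.try_lock().is_err()
}

fn main() {
    assert!(!held_with_underscore());
    assert!(held_with_named_guard());
    println!("only the named guard keeps the lock held");
}
```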
I've seen these fail occasionally:
* sys::test_wait::test_wait_signal
* test_unistd::test_wait
* test_unistd::test_execve
Maybe some others.