sync: TestWaitGroupMisuse2 test is flaky #11443
Comments
Could you describe your PC?
How many CPUs? Which? Which OS, OS version, arch (32-bit or 64-bit)? Dmitry, maybe 1e6 should be bigger in the test? /cc @dvyukov
OS: 64-bit Debian 8.1 running on VMWare player 7.1 (host OS Windows 8). Hardware: 4 CPUs (Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz)
I've just tested it under Windows 8 OS (the same hardware). The test is still flaky though success rate is higher.
@kostya-sh What is the iteration count in the test at which it stops being flaky?
And how long does the test take to run at said iteration count?
It fails reliably on my Linux laptop in an LXC container running Ubuntu 14.10 (3/3 times):
ok strings 0.156s
My CPU's details (this is a Nehalem class mobile CPU):
$ lscpu
My OS's details: Ubuntu Linux 14.10 (x86_64), running kernel 4.0.6 (Linus' mainline) in an unprivileged LXC container within an Ubuntu 15.04 (x86_64) host:
marebri@utopic:~/devel/go.git/src$ uname -a
Other information: I reproduced this on the Ubuntu 15.04 (x86_64) host (failed 1/3 times):
ok strings 0.179s
vs the passing 2 tests:
ok strings 0.236s
ok strings 0.312s
EDIT: Both host and LXC container are on the current git tip: d0ed87d
We can see this on the dragonfly buildbot: http://build.golang.org/log/fd18334c684f5cec5c7d4f939c39f26ec7c30741 by 03a48eb.
Even with 8e7 iterations the test is still flaky on an x86_64 Debian VMware VM. With this many iterations it takes about 100 seconds to fail, and anywhere from 0.5 to 50 seconds to succeed.
I reproduced this on Ubuntu 15.04 vivid (4 times)
After all 1e6 iterations, the expected panic never occurred.
I've also seen the failure on an m3.xlarge AWS EC2 instance (4 vCPU, 15 GiB), running ami-ee793a86 (Ubuntu), golang commit a76c1a5:
ok cmd/nm 1.763s
GOMAXPROCS=2 runtime -cpu=1,2,4
ok runtime 36.327s
sync -cpu=10
--- FAIL: TestWaitGroupMisuse2 (1.58s)
@dvyukov with iteration count
But I don't think increasing the iteration count is a good choice. It may be better to find a way to make the panic more likely. Actually, maybe the iteration count should depend on the number of CPUs. For example, 8 CPUs –
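The idea of tying the iteration count to the CPU count could look roughly like the sketch below. The helper name `iterationsFor` and the linear scaling rule are assumptions made for illustration; the comment above does not say which formula was intended, and nothing in this thread shows that scaling makes the test reliable.

```go
package main

import (
	"fmt"
	"runtime"
)

// baseIterations is the 1e6 iteration count discussed in this thread.
const baseIterations = 1000000

// iterationsFor is a hypothetical helper that scales the iteration
// count linearly with the number of CPUs (an assumed rule).
func iterationsFor(cpus int) int {
	if cpus < 1 {
		cpus = 1
	}
	return baseIterations * cpus
}

func main() {
	n := runtime.NumCPU()
	fmt.Printf("CPUs: %d, iterations: %d\n", n, iterationsFor(n))
}
```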
I don't see any way to increase the probability of the panic. We probably need to not run it in short mode. But that will make the test effectively useless...
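Not running a test in short mode is normally done with `testing.Short()` and `t.Skip`. Below is a minimal sketch with the decision factored into a hypothetical helper, `skipReason`, so that it can run as a standalone program; the skip message is illustrative and not taken from any actual change.

```go
package main

import "fmt"

// skipReason is a hypothetical helper: it returns a non-empty reason
// when a probabilistic test should be skipped. In a real _test.go
// file the call site would look like:
//
//	if r := skipReason(testing.Short()); r != "" {
//		t.Skip(r)
//	}
func skipReason(short bool) string {
	if short {
		return "skipping flaky test in short mode; see issue 11443"
	}
	return ""
}

func main() {
	fmt.Printf("short=true:  %q\n", skipReason(true))
	fmt.Printf("short=false: %q\n", skipReason(false))
}
```

The trade-off is the one raised in the comment: quick runs with `-short` would no longer exercise the check at all.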
It sounds like the test just sucks. I think it should be deleted if it can't be made reliable.
But if you want to whitelist it per builder, we'll need to finish #11346
This is also failing on my personal Linux server. Physical hardware, Ubuntu vivid 3.16.0-39-generic, 8 CPUs in /proc/cpuinfo:
Disabling this for now in https://go-review.googlesource.com/11721
Update #11443
Change-Id: Icb7ea291a837dcf2799a791a2ba780fd2a5e712b
Reviewed-on: https://go-review.googlesource.com/11721
Reviewed-by: Brad Fitzpatrick <[email protected]>
Reviewed-by: Dmitry Vyukov <[email protected]>
@dvyukov, do you care to make this test reliable before I delete it?
The test looks good to me. It just shows that scheduler sucks and doesn't execute runnable goroutines. 1.4:
1.7:
Part of the problem is the next argument of runtime.runqput. But there may be other problems.
/cc @aclements because the scheduler reportedly sucks.
Just in case, this deflakes the test:
CL https://golang.org/cl/36841 mentions this issue.
Using tip (ca91de7) TestWaitGroupMisuse2 test fails approximately 19 times out of 20 on my PC: