os/exec: TestExtraFiles is flaky on 32-bit linux builders #25628
That is odd. The errors show relatively large descriptor numbers. This implies that something on the system opened a high-numbered descriptor before make.bash was run.
There is another recurring failure mode in … An example failure (https://build.golang.org/log/cef4b584d2ccb30621e2965de0517f6077aa2f92):
Yes, they could be related. The os/exec package tries to guess which descriptor is being used for testlog.txt; if an unexpectedly open descriptor makes it guess wrong, that could cause the broken pipe error.
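Why a single stray descriptor throws off a guess like this: POSIX requires the kernel to hand out the lowest-numbered free descriptor, so any extra open descriptor shifts every later number. The following is only a simplified model, not the os/exec test's code; `lowestFree` and `guessNextFD` are hypothetical names, and the assumption that the child starts with exactly stdin, stdout, stderr and one extra file is illustrative.

```go
package main

import "fmt"

// lowestFree models the POSIX rule that open(2) and friends return the
// lowest-numbered descriptor not currently open in the process.
func lowestFree(open map[int]bool) int {
	fd := 0
	for open[fd] {
		fd++
	}
	return fd
}

// guessNextFD is a hypothetical guess of the kind described above: it
// assumes only 0, 1, 2 and nExtra inherited descriptors (3, 4, ...) are
// open, so the next file must land immediately after them.
func guessNextFD(nExtra int) int {
	return 3 + nExtra
}

func main() {
	// The child as the guess expects it: 0-2 plus one extra file at 3.
	expected := map[int]bool{0: true, 1: true, 2: true, 3: true}
	fmt.Println(guessNextFD(1), lowestFree(expected)) // 4 4

	// The same child, but something else (say, a library) has opened fd 4.
	leaked := map[int]bool{0: true, 1: true, 2: true, 3: true, 4: true}
	fmt.Println(guessNextFD(1), lowestFree(leaked)) // 4 5
}
```

In the second case the guess points at descriptor 4, which belongs to someone else, while the file actually lives at 5.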
Since this doesn't seem to be happening anywhere else, this is most likely a problem specific to the linux-386-sid builder. Something on that system is causing the os/exec tests to run with an unexpectedly open file descriptor. @bradfitz Is there anything unusual about the linux-386-sid builder?
The 386-sid builder is an amd64 Docker container (like all our linux-386-* builders) with the "debian:sid" base layer (https://github.com/golang/build/blob/master/env/linux-x86-sid/Dockerfile). We occasionally rebuild it to track the ever-changing Debian sid. Last updates:
This was also reported to happen on a linux_386 VirtualBox machine running an updated Ubuntu MATE 18.04 LTS with bootstrap Go version 1.4.3 (issue #26261).
Ran into this issue with 1.11rc2 on Fedora 27 i386 as well:
This happens on NixOS as well. There it seems to happen reliably: I can reproduce it on my laptop again and again, and the NixOS build server reproduces the error on every build. This happens with Go 1.11. For reference: https://hydra.nixos.org/build/80999166/nixlog/12
This seems to only happen on i686 on go 1.11. It also happens on Debian. For more information: golang/go#25628
This seems to only happen on i686 on go 1.11. It also happens on Debian. For more information: golang/go#25628 (cherry picked from commit 0bf6b44)
@bradfitz The test will try to run lsof when it finds a leaked FD. That isn't happening on linux-386-sid. Could we get the lsof package installed on that system?
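A minimal sketch of that kind of diagnostic, assuming only that an `lsof` binary may or may not be on PATH. The function name `describeOpenFiles` is hypothetical and this is not the os/exec test's actual code; it simply shows why the diagnostic silently produces nothing useful on a machine without lsof.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
)

// describeOpenFiles runs `lsof -p <pid>` for the current process if lsof
// is installed, and otherwise returns a note explaining why nothing was
// listed.
func describeOpenFiles() string {
	path, err := exec.LookPath("lsof")
	if err != nil {
		return "lsof not available: " + err.Error()
	}
	out, err := exec.Command(path, "-p", strconv.Itoa(os.Getpid())).CombinedOutput()
	if err != nil {
		return fmt.Sprintf("lsof failed: %v\n%s", err, out)
	}
	return string(out)
}

func main() {
	fmt.Println(describeOpenFiles())
}
```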
@ianlancetaylor, I filed #29347 for somebody to do that.
Still flaky, but no
I am a bit confused about this code:
This is in a … If it is required, then it looks like this code is racing against netpollinit, in which case (1) IsPollDescriptor should be using atomics, if only to prevent the compiler from optimizing away the repeated loads, and (2) it needs to be double-checked after we detect an issue. The epfd might be -1 at the start of the loop, but its creation is racing with our os.NewFile, and epfd wins: epfd gets the fd we want, and we get fd+1. In that case, we should
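The two fixes suggested here, atomic loads plus a re-check after a suspicious result, can be sketched as follows. This is a simplified model under stated assumptions: `pollFD`, `isPollDescriptor` and `gapExplainedByPoller` are hypothetical stand-ins, not the runtime's or internal/poll's actual code, and the real poller may open more than one descriptor.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pollFD is a hypothetical stand-in for the runtime's epoll descriptor:
// -1 until some other goroutine initializes the poller. Every access goes
// through sync/atomic so the compiler cannot cache or hoist the load.
var pollFD int32 = -1

func setPollFD(fd int32) { atomic.StoreInt32(&pollFD, fd) }

func isPollDescriptor(fd int) bool {
	return int32(fd) == atomic.LoadInt32(&pollFD)
}

// gapExplainedByPoller is called after opening a file that was expected to
// receive descriptor want but actually received got. It re-reads pollFD
// after the open, because the poller may have been created concurrently
// and claimed want after any check made before the open.
func gapExplainedByPoller(want, got int) bool {
	if got == want {
		return true // no gap to explain
	}
	return got == want+1 && isPollDescriptor(want)
}

func main() {
	fmt.Println(gapExplainedByPoller(5, 6)) // false: poller not initialized yet
	setPollFD(5)                            // the poller wins the race for fd 5
	fmt.Println(gapExplainedByPoller(5, 6)) // true
}
```

The key design point is that the check happens after the open, so a poller created mid-race is still seen.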
The network is now started whenever you create a timer, which could conceivably happen in an … For this issue, I don't think it matters. The point of that code was to report whether any descriptors are open when the test starts. The logs show that they aren't. If there were somehow a race on
I sent https://golang.org/cl/225278 for the
Change https://golang.org/cl/225278 mentions this issue: |
For #25628
Change-Id: If1dce7ba9310e1418e67b9954c989471b775a28e
Reviewed-on: https://go-review.googlesource.com/c/go/+/225278
Run-TryBot: Ian Lance Taylor
Reviewed-by: Bryan C. Mills
TryBot-Result: Gobot Gobot
We have a hit with the new readlink logging, but it doesn't seem particularly illuminating: 2020-03-26T16:12:18-d1ecfcc/linux-386-clang
That's just weird. The code looks fine to me. It seems like this could only happen if some other goroutine briefly opens and closes a descriptor. But I can't think of what would do that, or why it would be specific to these builders. I sent CL 225799 to run the test under strace. Perhaps that will show us something.
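Running a subprocess under strace can be sketched by rewriting the command line when strace is available. This is an illustrative sketch, not CL 225799's code: `straceCommand` is hypothetical, and the flags shown (`-f` to follow forked children, `-e trace=desc` to log descriptor-related syscalls) are one reasonable choice among many.

```go
package main

import (
	"fmt"
	"os/exec"
)

// straceCommand builds a command that runs name under strace when strace
// is installed, following forked children (-f) and logging only
// descriptor-related system calls. Without strace it falls back to
// running name directly.
func straceCommand(name string, args ...string) *exec.Cmd {
	path, err := exec.LookPath("strace")
	if err != nil {
		return exec.Command(name, args...)
	}
	straceArgs := []string{"-f", "-e", "trace=desc", name}
	return exec.Command(path, append(straceArgs, args...)...)
}

func main() {
	// Only builds the command line; nothing is executed here.
	fmt.Println(straceCommand("true").Args)
}
```

Using `trace=desc` rather than naming individual syscalls avoids errors on architectures such as arm64 that lack legacy calls like open(2).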
Change https://golang.org/cl/225799 mentions this issue: |
On linux-386 builders run the TestExtraFiles subprocess under strace, in hopes of finding out where the unexpected descriptor is coming from.
For #25628
Change-Id: I9a62d6a5192a076525a616ccc71de74bbe7ebd58
Reviewed-on: https://go-review.googlesource.com/c/go/+/225799
Run-TryBot: Ian Lance Taylor
TryBot-Result: Gobot Gobot
Reviewed-by: Bryan C. Mills
The new debugging code seems to be deadlocking or otherwise timing out instead of generating output: 2020-04-04T01:01:04-fff7509/linux-386-sid
Change https://golang.org/cl/227517 mentions this issue: |
Try to get some output even if the subprocess hangs.
For #25628
Change-Id: I4cc0a8f2c52b03a322b8fd0a620cba37b06ff10a
Reviewed-on: https://go-review.googlesource.com/c/go/+/227517
Run-TryBot: Ian Lance Taylor
TryBot-Result: Gobot Gobot
Reviewed-by: Bryan C. Mills
Looks like we need a longer grace period on the deadline (https://build.golang.org/log/8cdf7d284333d1fd9307916177b4ea66f1d898f0):
I'll send a CL.
Change https://golang.org/cl/227765 mentions this issue: |
…line
Updates #25628
Change-Id: I938a7646521b34779a3a57833e7ce9d508b58faf
Reviewed-on: https://go-review.googlesource.com/c/go/+/227765
Run-TryBot: Bryan C. Mills
TryBot-Result: Gobot Gobot
Reviewed-by: Ian Lance Taylor
It worked! We have some
Excellent. In both cases the extraneous file descriptor was due to opening … But I do not know why this happens only on 386 systems. I may be missing a step.
I filed https://sourceware.org/bugzilla/show_bug.cgi?id=25817 for glibc to stop doing this. |
Change https://golang.org/cl/228099 mentions this issue: |
TestExtraFiles seems to be flaky on GNU/Linux systems when using cgo, because creating a new thread will call malloc, which can create a new arena, which can open a file to see how many processors there are. Try to avoid the flake by creating several new threads at process startup time.
For #25628
Change-Id: Ie781acdbba475d993c39782fe172cf7f29a05b24
Reviewed-on: https://go-review.googlesource.com/c/go/+/228099
Reviewed-by: Bryan C. Mills
Hmm, the extra threads seem to have made it flaky on 64-bit builders: 2020-04-16T03:19:50-ab3bd2c/linux-arm64-packet. That's... unexpected. Maybe we should skip this test on cgo-enabled configurations?
I think I must have missed something, and you're right that we should run this test without cgo. But simply skipping it when cgo is enabled would effectively mean that we never run the test. We should be able to make this work.
Change https://golang.org/cl/228639 mentions this issue: |
Hi, does TestExtraFiles now require lsof to be installed in the test environment? I ask because I still see this test fail on our linux/amd64 and linux/arm64 machines.
Repo: 4eaf855
Arch: linux/arm64
Fortunately it looks like your failure appeared before https://golang.org/cl/228639 was committed, as I think that CL should fix the test.
I've seen several of these on the linux-386-sid builder in the past weeks. Here's one: https://build.golang.org/log/b95e0220b4130af6c1bcfed59ec934568717d0ef