testscript: "signal: killed" exec errors on MacOS 12 #200
Comments
Just as a data point, I've never managed to reproduce this on macOS 10, despite many attempts. |
I've never seen this error on macos-10 nor macos-11 on GitHub Actions despite having used testscript for years, only on its macos-12, so I'm pretty sure that this bug only happens on MacOS 12. |
Just bit me in the last PR as well: https://github.com/rogpeppe/go-internal/actions/runs/4125273301/jobs/7125604762
I see this as well |
I see this as well, but on Ubuntu (ubuntu-latest) with Go 1.20. I needed to remove Go 1.20 from the build matrix of one project ... I will investigate later. |
@bep perhaps I'm missing something, but I can't find a single mention of either "signal" or "killed" in those logs. The failures seem pretty normal - some commands inside your testscripts are failing when the script expects them to succeed. |
@mvdan you're right about that, but all the tests (which use the testscript package in this repo) fail with |
"unexpected command failure" is one of the relatively normal error messages you will get with testscript when the tests fail. This particular bug only happens on MacOS 12, and no one has been able to reproduce it on Linux that I am aware of, so it seems very unlikely that you're running into the same bug. |
As I said, I'm pretty sure these |
Per rogpeppe#200, macos-12 can cause sporadic `signal: killed` testscript failures, and we have started seeing them in some jobs within go-internal itself as well. Downgrade to macos-11 for now, like we've done in other projects, as we still don't know the cause. Also drop test-gotip; we haven't been keeping it up to date for a while now, so it's clearly not needed at the moment. If we want to ensure that go-internal works on new major versions of Go before they are released, using the beta or RC releases seems like a better and easier approach.
@rogpeppe has been seeing the same failures now on Perhaps whatever changed in macos-12 to trigger this bug was backported to macos-11 now. |
@ldemailly, that sounds like a reasonable idea, but if that's the case, how come it doesn't happen every time, and how come it doesn't happen when you run |
By adding a small sleep before `TestingM.Run()` to allow the write of the test commands to be flushed to disk. Fixes rogpeppe#200
This has become a real issue for me, so I decided to take a look at it, and I found that there's a correlation between the size/amount of commands added to See #219 |
On `MacOS` there are lots of reports of unexpectedly failing tests with output similar to this:
```
[signal: killed]
FAIL: testscripts/myecho.txt:1: unexpected command failure
```
On CI builds the workaround has been to downgrade to a builder with MacOS <= 11 (e.g. macos-11 on GitHub). In development on a MacBook, this is not an option. This commit works around what seems to be an upstream bug in `os.Link` by adding a small sleep before `TestingM.Run()` to allow the write of the test command hard links to be ready. See rogpeppe#200
By doing a full copy and not a hard link of the binaries. This is the fallback already used on Windows. Testing shows that this removes the unexpected test failures with error output similar to:
```
[signal: killed]
FAIL: testscripts/myecho.txt:1: unexpected command failure
```
See rogpeppe#200
It's Where do you see |
So, this works:

    if runtime.GOOS != "windows" {
        if runtime.GOOS == "darwin" {
            // macOS: clone the file rather than hard-linking it.
            if err := unix.Clonefile(from, to, 0); err == nil {
                return nil
            }
        } else {
            // Other non-Windows systems: keep using a hard link.
            if err := os.Link(from, to); err == nil {
                return nil
            }
        }
    }

But it seems to create a copy of the binary. Or, according to the docs, it shouldn't...
Full copy on Windows, Clonefile on Mac, and hard links on Linux sound good to me. Beware that we likely need build tags now, since
I can prepare a PR with something ala the above in a few hours. |
To fix unexpected errors of type:
```
[signal: killed]
FAIL: testscripts/myecho.txt:1: unexpected command failure
```
Fixes rogpeppe#200
@bep - thanks very much for digging in here to get this fixed. |
it's fantastic to have a solution, I'm just a bit confused as to why symlink isn't even better/simpler? |
@ldemailly see the previous comments, particularly #200 (comment). |
Now that it seems we found a fix to rogpeppe#200, there is no reason to stick to macos-11, which will likely be deprecated soon. Update actions/setup-go to its latest version as well. The new version uses caching by default, which we do not need. While here, tidy up the cloneFile docs a bit.
oic "[tools like go] will use the symlink target" that seems like a bug though if they cause the args0 to change as it's fairly common to rely on that (also that mr did pass all tests so presumably it's actually working ?) what's the size of the directory when using clonefile? |
oic, neat, from the man page:
except for EXDEV possible issue (are we sure we stay on same FS?) |
There is no guarantee that the source and destination are the same filesystem. Which is why there is a fallback with regular file copying. However, they will often be the same filesystem, as |
Thanks for the explanation (that there is a fallback already). So a symlink, in theory, if not for that (possibly outdated?) Either way I think the current solution is great and matches the previous hardlink behavior (fs wise), so it's awesome and indeed I don't see "killed" anymore on my mac 🎉 |
I've seen this in a number of projects of mine, like:
@rvagg mentions the same crash in ipld/go-car#364, and in the past, others like @mr-joshcrane have mentioned the same error on Slack.
This must be something going wrong with either testscript or Go, because for example, that `TestScript/flags` test from above was just running `exec shfmt -h`, showing the help output from a Go program. You can see that the testscript file is rather boring, so it's not doing anything particularly worrying.
Personally, I've worked around this by downgrading from `macos-latest` on GitHub Actions (which switched to `macos-12` late last year) to `macos-11`, which seems to make the failures go away entirely. But of course that's not a complete fix.
I first hoped that this would be fixed in Go 1.20 with https://go-review.googlesource.com/c/go/+/460476, and that may still be true, given that there are four distinct os/exec bugs for Mac there. But it's just a good guess, I haven't verified this yet - nor do I have a Mac machine to test with. Help would be appreciated.
The only other recent mentions of "signal: killed" upstream for Mac are golang/go#57418 and golang/go#57239, and they both seem to point to processes being OOM-killed by the system. This could be the case for us as well, perhaps either due to the OS version upgrade changing the OOM behavior, or perhaps because the `macos-12` GitHub machines have less available memory. But I'd also find it hard to believe, given that testscript doesn't use a particularly high amount of memory.
Filing this issue to track investigation and progress.