-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: corrupt binary export data seen after signal preemption CL #35326
Comments
I get many errors like it when I attempt to test my Go installation, too - via |
I built 50 commits behind the current master (cc4b824) and |
I'm seeing this error on
Which parts of the tests and which packages it fails on varies, but it looks like it's always vet failing. Re-running
|
The trailing part of that looks like valid iexport data, except the first 32 bytes have been corrupted. This seems like memory corruption, either from miscompilation or a GC issue. |
In all three of the examples, the first 32 bytes of the buffer were corrupted. |
BTW, the thing that makes me suspect memory corruption within cmd/compile is that "unknown bexport format" means that the Also, cmd/compile explicitly flushes its write buffer after writing Finally, this data isn't specially aligned at all within the file, so it's not like it could be generic file-level corruption. Not to mention the index out of range error within the runtime during compilation seems bad. |
@mvdan @siebenmann Can you two describe your test environments in more detail so maybe we can try to identify more commonalities? E.g., it looks like you're both using linux/amd64. @siebenmann mentioned Fedora 30; @mvdan what Linux distro and version are you using? Also, how many cores? |
The machine I'm seeing this on is an i7-8700K (not overclocked) with 32 GB of RAM and HyperThreading still enabled, so a 6/12 cores/threads machine. At the moment it has Fedora's kernel |
I am on a Thinkpad with a i5-8350U, HT left on (4 real cores, 8 total), and 16GiB of RAM. I run Arch Linux, with its current 5.3.8 Linux kernel. Aside from that my Go setup is a pretty standard build from source, using 1.13.4 to bootstrap. I don't have any fancy go env vars other than |
I've now reproduced this on another Fedora 30 linux/amd64 machine, also with kernel I've git bisected this and on both of my Fedora 30 machines, the commit where the failures start is 62e53b7, 'runtime: use signals to preempt Gs for suspendG', which is targeted at #10958 and #24543. I think this makes this issue CC @aclements. |
Linux kernel version-specific signal handling behavior sounds fuuuuun. |
At least it's good that enough people test tip, so that this was caught at the start of the freeze :) I'm happy to help debug this further, if the runtime maintainers can't reproduce this themselves. |
@mvdan Oh yeah, definitely. I'm glad you and @siebenmann both reported the issue and have been able to reproduce it reliably enough to track down a culprit CL and identify some likely commonality between test setups. (I'll check the Linux kernel version I was using later, and also give it a shot on my personal laptop running Fedora 31.) I'm just a little nervous because this is already the second issue discovered in the runtime preemption feature that landed last week. |
I see similar spurious problems, e.g. tonights tip c2edcf4:
I'm similarly running Fedora with a 5.3.10 kernel and an intel i7-8550U (4 core with hyperthreading disabled). I haven't tried bisecting this yet, but can |
Thanks for all of the reports! Could someone who can reproduce this try setting The two "index out of range" failures are interesting. In both cases they're in |
Is it useful to do this if I can only reliably reproduce the vet-based failures, not the crashes? And is there something smaller to reproduce problems than running all.bash, which produces several MB of output with these options that I don't know how to usefully reduce? (I can make the full output available if it would be useful.) |
Yes, it's useful for the vet-based failures, too. I only need the output from the failed process (basically, after the last "# ..."), which should be substantially less. |
I just ran into this via
|
Sorry, sent too early there. System details as follows:
|
@aclements I'm attaching the two logs you wanted. It's hard to reproduce it when running
conservative.txt If these files don't contain enough information, let me know and I can try again. |
go version devel +43ec1b12f5 Wed Nov 13 01:15:54 2019 +0000 linux/amd64 I have been seeing very repeatable failures building Builds/tests are successful under Here is my debug output in case it's useful:
all-debugScanConservative.txt |
I just saw this too, while running some benchmarks:
|
@aclements assuming for the moment that this is caused by async pre-emption, is there a way to disable stackguard-based pre-emption? That might make this reproduce more easily. It seems like it should suffice to comment out this line from gp.stackguard0 = stackPreempt but I don't know for sure, or know what else that might break. :) |
Doesn't Maybe worth using fork+kill instead of threads to better recreate our current minimized test case? |
Well, maybe, but we're seeing the problem with Go's runtime package signal preemption which uses thread-directed signals. |
I set out to bisect the kernel and discovered that it also depends on the GCC version:
So this may happen in earlier kernel versions but be masked by their incompatibility with GCC 9. My guess is that GCC 9 started using AVX registers for something that GCC 8 didn't? I'm going to see if a later v5.2.x kernel builds with GCC 9. |
This is not the first time that the kernel, gcc, and signals have gotten tangled up in a mess: https://news.ycombinator.com/item?id=13313563 |
I'm finally ready to actually bisect. Setting CONFIG_RETPOLINE to n works around the GCC 9 incompatibility. With this configuration, I've been able to reproduce the failure with v5.2, but NOT with v5.1. I'm now bisecting between the two. For reference, here's the script I'm using to configure, build, and load the kernel. This is based on the ubuntu-1910-eoan-v20191114 GCE image (grown to 50 GiB to fit the kernel build, sigh). #!/bin/sh
set -e
# Start with system's 5.3 config
cp /boot/config-5.3.0-1008-gcp .config
# Work around GCC 9 flag incompatibility
sed -r -i 's/^(CONFIG_RETPOLINE)=y/\1=n/' .config
# Remove file systems that are modules
sed -r -i 's/^(CONFIG_.*_FS)=m/\1=n/' .config
# Disable big hardware subsystems we don't need
sed -r -i 's/^(CONFIG_(BT|WLAN|WIRELESS|SOUND|INFINIBAND|USB_SUPPORT|MEDIA_SUPPORT|FIREWIRE|ISDN|IIO|INPUT_(JOYSTICK|TOUCHSCREEN|TABLET|MISC)))=./\1=n/' .config
# Turn off ETHERNET (we use VIRTIO_NET) and things that force it on
sed -r -i 's/^(CONFIG_(ETHERNET|SCSI_CXGB3_ISCSI|SCSI_BNX2_ISCSI|SCSI_BNX2X_FCOE))=./\1=n/' .config
# Clean up the config
make olddefconfig
# Build kernel and install modules
make -j12 && sudo make modules_install
# Load kexec image. Command line from /boot/grub/grub.cfg
sudo kexec -l arch/x86/boot/bzImage --append="root=PARTUUID=2954cd84-339a-4c4a-b80d-c039bd33478c ro scsi_mod.use_blk_mq=Y console=ttyS0" --initrd=/boot/initrd.img-5.3.0-1008-gcp
# Run sudo kexec -e to reboot into loaded kernel (does NOT do a clean shut down) |
I've bisected the issue to kernel commit torvalds/linux@d9c9ce3. I haven't dug into this commit yet, but it appears to be a fix for bug that was introduced earlier (or possibly a redo of an earlier commit). For reference, torvalds/linux@a352a3b in that same commit series introduced another bug first that produced similar failure types, but at a far higher rate (I couldn't even run cmd/go). However, this was fixed later in the series, somewhere between torvalds/linux@e0d3602 and torvalds/linux@1d731e7. My first bisect log is below. This led to the fail-fast failure. # bad: [0ecfebd2b52404ae0c54a878c872bb93363ada36] Linux 5.2 # good: [e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd] Linux 5.1 git bisect start 'v5.2' 'v5.1' # bad: [a2d635decbfa9c1e4ae15cb05b68b2559f7f827c] Merge tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm git bisect bad a2d635decbfa9c1e4ae15cb05b68b2559f7f827c # bad: [82efe439599439a5e1e225ce5740e6cfb777a7dd] Merge tag 'devicetree-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux git bisect bad 82efe439599439a5e1e225ce5740e6cfb777a7dd # bad: [78438ce18f26dbcaa8993bb45d20ffb0cec3bc3e] Merge branch 'stable-fodder' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs git bisect bad 78438ce18f26dbcaa8993bb45d20ffb0cec3bc3e # good: [275b103a26e218b3d739e5ab15be6b40303a1428] Merge tag 'edac_for_5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp git bisect good 275b103a26e218b3d739e5ab15be6b40303a1428 # good: [962d5ecca101e65175a8cdb1b91da8e1b8434d96] Merge tag 'regmap-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap git bisect good 962d5ecca101e65175a8cdb1b91da8e1b8434d96 # good: [9bff9dfc513bd5de72cb59f4bffb72cf0a5aa526] Merge tag 'spi-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi git bisect good 9bff9dfc513bd5de72cb59f4bffb72cf0a5aa526 # good: [573de2a6e844cb230c4483833f29b8344a6a17cc] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching git bisect good 573de2a6e844cb230c4483833f29b8344a6a17cc # good: [b62de322579702f07175fc275ecb2c3afae6cd96] hugetlb: make use of ->free_inode() git bisect good b62de322579702f07175fc275ecb2c3afae6cd96 # bad: [d9c9ce34ed5c892323cbf5b4f9a4c498e036316a] x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails git bisect bad d9c9ce34ed5c892323cbf5b4f9a4c498e036316a # good: [577ff465f5a6a5a0866d75a033844810baca20a0] x86/fpu: Only write PKRU if it is different from current git bisect good 577ff465f5a6a5a0866d75a033844810baca20a0 # # Below here the failure mode changed to fail almost instantly in cmd/go, # though the types of failures looked similar. # # bad: [e0d3602f933367881bddfff310a744e6e61c284c] x86/fpu: Inline copy_user_to_fpregs_zeroing() git bisect bad e0d3602f933367881bddfff310a744e6e61c284c # good: [383c252545edcc708128e2028a2318b05c45ede4] x86/entry: Add TIF_NEED_FPU_LOAD git bisect good 383c252545edcc708128e2028a2318b05c45ede4 # bad: [a352a3b7b7920212ee4c45a41500c66826318e92] x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD git bisect bad a352a3b7b7920212ee4c45a41500c66826318e92 # good: [69277c98f5eef0d9839699b7825923c3985f665f] x86/fpu: Always store the registers in copy_fpstate_to_sigframe() git bisect good 69277c98f5eef0d9839699b7825923c3985f665f # first bad commit: [a352a3b7b7920212ee4c45a41500c66826318e92] x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD After this I backed up and looked for the original failure. This bisect also happened to reveal that the fast failure had been fixed before the original failure was introduced. # bad: [d9c9ce34ed5c892323cbf5b4f9a4c498e036316a] x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails # good: [e0d3602f933367881bddfff310a744e6e61c284c] x86/fpu: Inline copy_user_to_fpregs_zeroing() git bisect start 'd9c9ce34ed5c892323cbf5b4f9a4c498e036316a' 'e0d3602f933367881bddfff310a744e6e61c284c' # good: [1d731e731c4cd7cbd3b1aa295f0932e7610da82f] x86/fpu: Add a fastpath to __fpu__restore_sig() git bisect good 1d731e731c4cd7cbd3b1aa295f0932e7610da82f # good: [06b251dff78704c7d122bd109384d970a7dbe94d] x86/fpu: Restore regs in copy_fpstate_to_sigframe() in order to use the fastpath git bisect good 06b251dff78704c7d122bd109384d970a7dbe94d # good: [a5eff7259790d5314eff10563d6e59d358cce482] x86/pkeys: Add PKRU value to init_fpstate git bisect good a5eff7259790d5314eff10563d6e59d358cce482 # first bad commit: [d9c9ce34ed5c892323cbf5b4f9a4c498e036316a] x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails |
Progress update. Eagerly faulting in the gsignal stack immediately after allocating it (by just writing zeros to the whole stack) successfully mitigates the issue. This indicates that it's related to the failure path of torvalds/linux@d9c9ce3, which deals with the stack not being mapped for the xsave area. This is a workaround, but not a great one, since other things could cause those stack pages to be faulted out (though this isn't very likely). I tried modifying the C reproducer to mmap the sigaltstack and Things that haven't changed the behavior:
|
We could use Would be really nice to have a C reproducer, though. If you write |
What about just mmap'ing new signal stack VMA each iteration of the main loop? |
I just got the C reproducer working. I'm working on tidying it up and I'll post it. Both madvising and mmaping the sigaltstack work to clear the pages (and that is necessary). The other missing ingredient was just running lots of the processes simultaneously. |
Here's the C reproducer. This fails almost instantly on 5.3.0-1008-gcp, and torvalds/linux@d9c9ce3 (5.1.0-rc3+). It does not fail at the parent of that commit (torvalds/linux@a5eff72). I'll work on filing this upstream with Linux. // Build with: gcc -pthread test.c
//
// This demonstrates an issue where AVX state becomes corrupted when a
// signal is delivered where the signal stack pages aren't faulted in.
//
// There appear to be three necessary ingredients, which are marked
// with "!!!" below:
//
// 1. A thread doing AVX operations using YMM registers.
//
// 2. A signal where the kernel must fault in stack pages to write the
// signal context.
//
// 3. Context switches. Having a single task isn't sufficient.
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <sys/wait.h>
static int sigs;
static stack_t altstack;
static pthread_t tid;
static void die(const char* msg, int err) {
if (err != 0) {
fprintf(stderr, "%s: %s\n", msg, strerror(err));
} else {
fprintf(stderr, "%s\n", msg);
}
exit(EXIT_FAILURE);
}
void handler(int sig __attribute__((unused)),
siginfo_t* info __attribute__((unused)),
void* context __attribute__((unused))) {
sigs++;
}
void* sender(void *arg) {
int err;
for (;;) {
usleep(100);
err = pthread_kill(tid, SIGWINCH);
if (err != 0)
die("pthread_kill", err);
}
return NULL;
}
void dump(const char *label, unsigned char *data) {
printf("%s =", label);
for (int i = 0; i < 32; i++)
printf(" %02x", data[i]);
printf("\n");
}
void doAVX(void) {
unsigned char input[32];
unsigned char output[32];
// Set input to a known pattern.
for (int i = 0; i < sizeof input; i++)
input[i] = i;
// Mix our PID in so we detect cross-process leakage, though this
// doesn't appear to be what's happening.
pid_t pid = getpid();
memcpy(input, &pid, sizeof pid);
while (1) {
for (int i = 0; i < 1000; i++) {
// !!! Do some computation we can check using YMM registers.
asm volatile(
"vmovdqu %1, %%ymm0;"
"vmovdqa %%ymm0, %%ymm1;"
"vmovdqa %%ymm1, %%ymm2;"
"vmovdqa %%ymm2, %%ymm3;"
"vmovdqu %%ymm3, %0;"
: "=m" (output)
: "m" (input)
: "memory", "ymm0", "ymm1", "ymm2", "ymm3");
// Check that input == output.
if (memcmp(input, output, sizeof input) != 0) {
dump("input ", input);
dump("output", output);
die("mismatch", 0);
}
}
// !!! Release the pages of the signal stack. This is necessary
// because the error happens when copy_fpstate_to_sigframe enters
// the failure path that handles faulting in the stack pages.
// (mmap with MMAP_FIXED also works.)
//
// (We do this here to ensure it doesn't race with the signal
// itself.)
if (madvise(altstack.ss_sp, altstack.ss_size, MADV_DONTNEED) != 0)
die("madvise", errno);
}
}
void doTest() {
// Create an alternate signal stack so we can release its pages.
void *altSigstack = mmap(NULL, SIGSTKSZ, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
if (altSigstack == MAP_FAILED)
die("mmap failed", errno);
altstack.ss_sp = altSigstack;
altstack.ss_size = SIGSTKSZ;
if (sigaltstack(&altstack, NULL) < 0)
die("sigaltstack", errno);
// Install SIGWINCH handler.
struct sigaction sa = {
.sa_sigaction = handler,
.sa_flags = SA_ONSTACK | SA_RESTART,
};
sigfillset(&sa.sa_mask);
if (sigaction(SIGWINCH, &sa, NULL) < 0)
die("sigaction", errno);
// Start thread to send SIGWINCH.
int err;
pthread_t ctid;
tid = pthread_self();
if ((err = pthread_create(&ctid, NULL, sender, NULL)) != 0)
die("pthread_create sender", err);
// Run test.
doAVX();
}
void *exiter(void *arg) {
sleep(60);
exit(0);
}
int main() {
int err;
pthread_t ctid;
// !!! We need several processes to cause context switches. Threads
// probably also work. I don't know if the other tasks also need to
// be doing AVX operations, but here we do.
int nproc = sysconf(_SC_NPROCESSORS_ONLN);
for (int i = 0; i < 2 * nproc; i++) {
pid_t child = fork();
if (child < 0) {
die("fork failed", errno);
} else if (child == 0) {
// Exit if the parent dies.
prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
doTest();
}
}
// Exit after a while.
if ((err = pthread_create(&ctid, NULL, exiter, NULL)) != 0)
die("pthread_create exiter", err);
// Wait for a failure.
int status;
if (wait(&status) < 0)
die("wait", errno);
if (status == 0)
die("child unexpectedly exited with success", 0);
fprintf(stderr, "child process failed\n");
exit(1);
} |
For your nerd-sniping amusement (and not mine), the bug is somewhere in the differences in these two disassembled files. Link Linux with the gcc8 one, all is well, link with the other, it goes bad. signal.8.dis.txt There's a ridiculous amount of inlining going on, not sure it helps to have the source annotation in the disassembly, but here it is: signal.8.il.dis.txt The source file in question, I verified this by building two kernels, one entirely compiled by gcc8 except for Enjoy! |
For completeness, I have also reproduced at torvalds/linux@b81ff10, which fixed a bug in the kernel code in question (but apparently not this bug), and at v5.4. |
I've filed the upstream kernel bug here: https://bugzilla.kernel.org/show_bug.cgi?id=205663 |
@dr2chase One difference I see in the disassembly is that GCC 9 is caching the address of the thread-local variable checked by |
FYI, I reproduced, at Linux tip (5.4 + a little), the dependence of the bug on how
The exact gcc versions are
and
|
@ianlancetaylor @dr2chase seems like at this point further diagnosis of the issue should occur over at bugzilla, so that kernel developers will have the benefit of it. |
Austin said he'd copy my remarks over there (I don't think I've got an account) and this is pretty much the end of it for me. I'm pretty sure we gave them a nice start. Ian's observation has a good smell to it. |
Great work @aclements et al identifying the root cause here. Should we start discussing how Go should respond to this issue? Is there an easy/reliable way to detect affected kernels? E.g., is kernel version number a reliable test? @ianlancetaylor suggested mlock'ing the msignal stack into memory; we probably only need to lock the top page or two (not sure exactly how big sigcontext is w/ AVX-512 register files, but 32 512-bit registers by itself is already 2KiB). As a fallback (e.g., when mlock fails due to ulimit), we could just explicitly fault the pages at allocation time, and maybe again before sending an inter-thread signal? |
There is a patch from Sebastian Andrzej Siewior now and he asks the reporters to test and verify. |
Change https://golang.org/cl/209899 mentions this issue: |
Change https://golang.org/cl/246200 mentions this issue: |
Go 1.14 included a (rather awful) workaround for a Linux kernel bug that corrupted vector registers on x86 CPUs during signal delivery (https://bugzilla.kernel.org/show_bug.cgi?id=205663). This bug was introduced in Linux 5.2 and fixed in 5.3.15, 5.4.2 and all 5.5 and later kernels. The fix was also back-ported by major distros. This workaround was necessary, but had unfortunate downsides, including causing Go programs to exceed the mlock ulimit in many configurations (#37436). We're reasonably confident that by the Go 1.16 release, the number of systems running affected kernels will be vanishingly small. Hence, this CL removes this workaround. This effectively reverts CLs 209597 (version parser), 209899 (mlock top of signal stack), 210299 (better failure message), 223121 (soft mlock failure handling), and 244059 (special-case patched Ubuntu kernels). The one thing we keep is the osArchInit function. It's empty everywhere now, but is a reasonable hook to have. Updates #35326, #35777 (the original register corruption bugs). Updates #40184 (request to revert in 1.15). Fixes #35979. Change-Id: Ie213270837095576f1f3ef46bf3de187dc486c50 Reviewed-on: https://go-review.googlesource.com/c/go/+/246200 Run-TryBot: Austin Clements <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Ian Lance Taylor <[email protected]>
After building Go from master, I sometimes see errors like:
Here's another crash from earlier today, with a slightly modified (and freshly built) Go master - you can see the error mentions a different std package:
@heschik correctly points out that this could be a bad version of vet in play, since the bexport format has been replaced with the iexport. However, I already nuked my $GOBIN, and
go test -x
seems to be running/home/mvdan/tip/pkg/tool/linux_amd64/vet
, which is freshly built.Usually I'd assume this is an issue with my setup, but I can't find anything wrong with it, and I've only started seeing these errors today.
/cc @mdempsky @griesemer @alandonovan
The text was updated successfully, but these errors were encountered: