Signal Handling #1682
(Updated with more details on synchronous signals, improved formatting, and Unity details)
(Updated with more details about signal latency, execution overshoot, and signal queue merging)
Apart from blocking signals during critical sections, with signal deferring we can handle the signal after the critical section ends, or at specific signal handling spots. This greatly reduces the complexity and overhead of avoiding deadlocks.
Another complication is signals that cannot be blocked or caught, namely SIGKILL and SIGSTOP. While these are less of an issue when (a) targeted at the process itself, they may be an issue if (b) they can be directed at threads. pthread_kill doesn't allow that to happen, however tgkill/tkill might. This also ties in with our thread/process lifetime handling. Complications from (a) might show up around post-guest-process-termination cleanup work, such as flushing/merging AOT/code caching logic. Complications from (b) might additionally show up as thread-specific memory leaks, like code buffers or helper objects.

Ideas for solutions: The only way to handle SIGKILL gracefully is a sleight of hand in the sender, either sending another signal beforehand, or replacing SIGKILL with some special signal. This won't work 100% correctly with guest/host interop. Another way is to have a "watcher" daemon that takes care of things as threads and processes die. This is more complex, but can work 100% correctly with guest/host interop.

[1] pthread_suspend_np & friends
Another edge case that we have to consider is the host side of a thunked library registering signal handlers.
(Deferred signals investigation is in #1666; will update with a summary here once that is closed)
Another edge case is how to handle CPU state reconstruction around thunks. Of course, reconstructing context is impossible if we deliver the signal while the thunk is running. Also, depending on the library that is thunked, it might be unreasonable for the guest to assume things about the CPU state or the code that is being run. Exceptions are applications that, e.g., might try to interpret the opcodes around segfaults in the signal handler. Another case could be detecting interrupted system calls. Thunks may perform syscalls, block, or execute for an arbitrary amount of time, so deferring delivery until the return to guest could lead to bugs. And synchronous signals cannot be blocked. A "plausible" state to present to the guest would be {invalid register context, RIP pointing to the guest-thunk function thunkOp}. Handling signal delivery in thunks complicates things a bit for fully deferred signals, as it forces us to store the host state along with the guest context.
On storing the host context and returning with
Defining some terms here to make things more understandable.
Reading our code, we generate the context on the guest stack by reading RSP without any reconstruction, which can lead to us overwriting the guest stack. We back off 128 bytes (the redzone) on x86, which makes the problem less likely to happen. We could either require the guest RSP to be synchronized or reconstructed, or have a guaranteed uncertainty boundary that we skip over.
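The redzone backoff can be sketched as a small address computation. This is an illustrative sketch, not FEX's code: the function name is hypothetical, the constants follow the x86-64 SysV ABI (128-byte redzone, 16-byte stack alignment), and a real frame layout has additional kernel-defined fields:

```cpp
#include <cstdint>

// Sketch: pick an address for the guest signal frame below the guest RSP,
// skipping the ABI redzone so leaf-function data isn't clobbered.
constexpr uint64_t RedzoneSize = 128;  // x86-64 SysV ABI redzone below RSP

uint64_t AllocateGuestSigFrame(uint64_t GuestRSP, uint64_t FrameSize) {
  uint64_t SP = GuestRSP - RedzoneSize;  // step over the redzone first
  SP -= FrameSize;                       // then reserve the frame itself
  SP &= ~uint64_t{15};                   // keep 16-byte stack alignment
  return SP;
}
```

Note that this only helps if `GuestRSP` is accurate; with an unsynchronized or unreconstructed RSP, no fixed backoff is guaranteed to be safe, which is the point made above.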
ia32 compat in x64 kernels is in https://elixir.bootlin.com/linux/latest/source/arch/x86/ia32/ia32_signal.c |
https://elixir.bootlin.com/linux/latest/source/arch/x86/ia32/ia32_signal.c#L347 https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/sigframe.h#L23
One more note on the actual flow in the Linux kernel. For non-usermode-linux kernels, signal delivery starts from
Syscall restarting
The syscall number is saved in
From there on,
Complication: Signal mask handling vs thunks
A host thunk may modify the signal mask, and the guest will not be informed about it, so if it goes
We could synchronize the masks upon entering the guest after a thunk (either recursed or returning), making sure we forward the masking to the thunk. This may not be 100% correct, as we need to always have a couple of signals unblocked. Also, a thunk may call sigaction behind our backs and cause further issues there.
Complication: Signal mask handling on guest signal returns
When returning from a guest signal handler we need to give the guest's signal mask back to the kernel, not the host's. A guest application can modify the signal mask stored in
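One way this could look, as a sketch: on guest sigreturn, apply the guest's saved mask but strip out the signals the emulator itself must always be able to receive. The function name and the exact set of protected signals are assumptions for illustration, not FEX's actual API:

```cpp
#include <signal.h>

// Hypothetical sketch: hand the guest's saved mask back to the kernel on
// guest sigreturn, but never allow the signals the emulator depends on
// (synchronous faults used internally) to become blocked at the host level.
void ApplyGuestMaskOnSigreturn(const sigset_t* GuestSavedMask) {
  sigset_t HostMask = *GuestSavedMask;
  sigdelset(&HostMask, SIGSEGV);  // emulator needs these for its own operation
  sigdelset(&HostMask, SIGBUS);
  sigdelset(&HostMask, SIGILL);
  sigprocmask(SIG_SETMASK, &HostMask, nullptr);
}
```

The guest's *view* of its mask would then have to be tracked separately from the host mask, since the two deliberately diverge for the protected signals.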
To Investigate: X86ContextBackup/ArmContextBackup may not contain or restore all of the context
AVX and other extensions in x86, SVE and other extensions in ARM. There's a
From discussion with @Sonicadvance1, this is also an issue with AVX512 contexts (see https://sourceware.org/bugzilla/show_bug.cgi?id=20305)
Ran into guest stack overflows in a sample app that used 4kb stacks, both with
Another interesting tidbit from
Watch out for how signal queueing has changed how this works a bit.
Yes, I'm not trusting anything (except how the kernel actually implements things) at this point
And everything is subject to a test case
Note: As a general design goal, it would be nice to move more (most?) of the guest signal handling to
Bug:
Note from https://man7.org/linux/man-pages/man2/sigaction.2.html
Note: There are several flags of
Note: pselect6 (and possibly others) takes the signal mask as a parameter, which might have complications around host -> guest signal handover.
Issue: Our internal signals that shouldn't be blocked (SIGRT31, any others?) will pass a guest-sent, guest-blocked signal to the guest instead of holding it. We need to at least defer it in our signal handling and not deliver it to the guest, though that can get tricky, and is probably leaky wrt sigwait and other system calls. From what I can see, there's no simple way (and maybe none at all) for us to "safely" steal or overload signals from the guest. I did a quick search for alternative solutions, and possibly ptracing ourselves is a cleaner solution to all this, though that needs deeper investigation.
Issue: glibc internally uses signals 32 and 33 (and possibly 34, only in LinuxThreads, which is no longer used). Guest glibc and host glibc handling might cause issues there. We need test cases. This can also cause issues around thunks.
Note: Host thunks should probably use the guest signal interface and delivery, to present a consistent view to the guest. Otherwise, our internal signal handling state would get corrupted.
from https://stackoverflow.com/questions/12680624/what-has-happened-to-the-32-and-33-kill-signals
Note:
This might have some implications for thunk interworking
GLIBC internal signals: https://github.com/bminor/glibc/blob/b92a49359f33a461db080a33940d73f47c756126/sysdeps/unix/sysv/linux/internal-signals.h#L30
For some timer details, https://man7.org/linux/man-pages/man2/timer_create.2.html Likely to be those two
Digging deeper into glibc, it seems to create a helper thread ( which does The only other users are The thread itself is created in It is re-set to be re-created post-fork (https://github.com/bminor/glibc/blob/b92a49359f33a461db080a33940d73f47c756126/sysdeps/unix/sysv/linux/timer_routines.c#L118) called via as only the forking thread makes it across the fork.
We correctly forward SIG32 to the guest (wrote two tests about it), and it should be safe to use posix timers with We cannot use safely While not tested yet, SIG33 should work fine for the guest, however doing SIG33 is sent through via and finally and received with While a bit ugly, we could piggyback on SIG33, as it is (almost? citation needed) never blocked, and use
Looking in detail at our implementation,
We need to handle SIGSEGV and SIGBUS internally, which means that if the guest masks them, we handle them incorrectly. Based on my testing, SIGSEGV, SIGBUS, SIGILL, and SIGFPE all terminate the process if masked and generated synchronously. They obey normal queueing rules otherwise (kill, tkill/tgkill, sigqueue, timers?).
Interesting read: http://davmac.org/davpage/linux/rtsignals.html
Something to be mindful of
Another thing I realised today: we can have double faulting on arm64 due to atomics emulation, when a guest SIGSEGV arises during the host's SIGBUS handling of an unaligned atomic.
Splitting from #1558 & #1677, as well as discussions with @neobrain and @Sonicadvance1.
The issues
(a) Signals can interrupt the JIT compiler, syscall handlers, other FEX-related code, third-party libraries, or thunked libraries, which are not guaranteed to be signal re-entrant safe. Any code that touches non-stack memory or uses mutexes is possibly not signal safe. We currently block signals around some code, either using ScopedSignalMaskWith* guards or manually (e.g. the dispatcher disabling signal handling around calls to CompileCode).
(b) Signals can interrupt the translated code in the middle of operations that would normally be atomic wrt signals. This may or may not be a problem, depending on how we have implemented x86. A good example is REP* operations. This can be an issue even without LSE elimination, as the recovered guest state might be "torn".
(c) Similar to the above, signals can interrupt the translated code in places where we can't recover the guest architectural state, due to optimisations.
(d) Similar to the above, synchronous signals might be generated which need to recover a full context and cannot be deferred.
Group 1: From x86 instructions
Group 2: Handled from the x86 frontend (int3 or int 0x3)
Group 3: Generated from system calls
(e) Signal latency. Whenever we disable the signal mask, as we do around ::CompileCode, or with the signal + mutex lock guards, signal delivery gets delayed. This is mostly a concern for long-standing/non-constant-time signal blocking, like around ::CompileCode (which can take 10+ milliseconds with complex blocks). There is an argument to be made that we should compile blocks faster, though that will never 100% solve the issue. Also, signal handlers can be delayed while code for them is getting compiled, particularly during their first run.
(f) With deferred signals the opposite problem also appears: we consume the signal too fast. I'm not sure if this results in an extra signal possibly being queued while a signal is deferred. Also, the signal might appear 'dequeued' to the sender while it is still 'pending' in FEX, which might lead to some guest instructions running (a bit of 'execution overshoot'), a condition that can be detected, but is extremely unlikely to matter to the guest.
(g) While signal delivery is not guaranteed to happen at any particular speed, lovely features like signal queue merging, which can lose information about the delivered signals, can uncover bugs / assumptions made in the guest code.
Current status
Our current "signal safety strategy" for (a) is to sprinkle signal-disabling code around regions that deadlock. This is very inconsistent throughout the codebase, and there are several bugs waiting to be hit. In general, this is a compromise between "likely to lock up" and "performant code".
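The guard pattern that a more consistent strategy for (a) would lean on can be sketched as RAII. This is a hypothetical illustration, not FEX's actual `ScopedSignalMaskWith*` helpers: block everything on entry, restore the previous mask on scope exit, so every early return and exception path is covered.

```cpp
#include <signal.h>

// Hypothetical sketch of an RAII signal-blocking guard: all signals are
// blocked for the lifetime of the object, and the previous mask is
// restored automatically when the scope is left.
class ScopedSignalBlock {
  sigset_t Previous{};

public:
  ScopedSignalBlock() {
    sigset_t All;
    sigfillset(&All);
    pthread_sigmask(SIG_SETMASK, &All, &Previous);
  }
  ~ScopedSignalBlock() {
    pthread_sigmask(SIG_SETMASK, &Previous, nullptr);  // restore on exit
  }
  ScopedSignalBlock(const ScopedSignalBlock&) = delete;
  ScopedSignalBlock& operator=(const ScopedSignalBlock&) = delete;
};
```

A "guaranteed stability" mode could require such a guard around every non-signal-safe region, trading the latency cost from (e) for correctness.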
For (b) and (c) we currently only partially recover the guest architectural state, store it alongside the host architectural state, and hope the guest code doesn't care too much about the contents of the guest state and will not modify it. We depend on returning to the interrupted host code using the stored host architectural state, in order to resume execution in the middle of any torn instructions and eventually exit from some point with a valid guest state. This poses another limitation: the interrupted block cannot be discarded from the code cache, so the code cache cannot be cleared. This might also have further implications around SMC and code invalidations.
Proposed solutions
For (a) I'd like us to have clear guidelines on how to handle this, as well as a mode that might be slower but offers guaranteed stability. This needs some thought, but is not too hard.
For (b) and (c) the only viable solution I can think of is a combination of deferring the signal delivery until we have a fully recoverable guest state, and storing metadata that can help us exit from the middle of a block. (c) can be avoided by limiting store elimination from LSE and disabling DSE. We can have a tradeoff between "defer delay" vs "runtime performance".
For (d.1), we'll need special state flushing semantics and/or recovery metadata and/or exit blocks in instructions that may cause them. This requires extra caution around SRA.
For (d.2), the frontend can take care of everything.
For (d.3), we can likely merge it with the syscall handling case of (a).
For (e) we can implement some form of 'aborts' for long-running cases with blocked signals, i.e. early exits during ::CompileCode, or even possibly 'conditional aborts', i.e. temporarily pausing the execution but only aborting if re-executed before getting resumed.
For (f), we can modify the behaviour of syscalls where signal queueing status can be detected, and make them take actual signal delivery by FEX to the guest into account. This cannot be perfect during guest/host process interop.
For (g), we can implement 'user mode queueing', possibly on top of (f), to get closer to native guest behaviour.
(e) + (f) + (g) are edge-case behaviours that are unlikely to matter in practice, and can mostly get triggered by compilation stutter completely altering the expected timing of the guest application.
Related Tickets
#518, #650, #1228, #1666
Other information
Unity depends on at least graceful handling of asynchronous SIGPWR, SIGXCPU (GC, loose context requirements) and SIGSEGV w/ null pointers (NullReferenceException generation, strict context requirements).