-
Notifications
You must be signed in to change notification settings - Fork 565
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
safe_read magic unstable during detach from threaded app. #3535
Labels
Comments
Still verifying if the SIGSEGV was really caused by a safe_read_tls_magic in the signal handler, or from somewhere else.. |
hgreving2304
pushed a commit
that referenced
this issue
Apr 12, 2019
Removes two races that happen if asynchronous signals are delivered during detach, and the app's signal handler is restored before the safe_read_tls_magic's SIGSEGV is delivered. There are still possible detach related races left, if asynchronous process-directed signals are delivered after detach is done, or the app has already started a re-attach. Issue: #3535
This was referenced Apr 12, 2019
hgreving2304
pushed a commit
that referenced
this issue
Apr 15, 2019
…h. (#3539) Removes two races that happen if asynchronous signals are delivered during detach, and the app's signal handler is restored before the safe_read_tls_magic's SIGSEGV is delivered. There are still possible detach related races left, if asynchronous process-directed signals are delivered after detach is done, or the app has already started a re-attach. Tested with local reproducer. Issue: #3535
xref #26 |
hgreving2304
pushed a commit
that referenced
this issue
Apr 22, 2019
…h. (#3539) Removes two races that happen if asynchronous signals are delivered during detach, and the app's signal handler is restored before the safe_read_tls_magic's SIGSEGV is delivered. There are still possible detach related races left, if asynchronous process-directed signals are delivered after detach is done, or the app has already started a re-attach. Tested with local reproducer. Issue: #3535
I am hitting this while adding signal mask testing to the new api.detach_signal test. The mask testing distinguishes the app handler from DR's handler b/c DR's blocks more signals. |
derekbruening
added a commit
that referenced
this issue
Dec 29, 2020
Eliminates a race on detach where the detaching thread removes DR's SIGSEGV handler while a now-native detached thread is in the middle of having a signal delivered (natively) and invokes a TLS magic field safe read, whose SIGSEGV then goes to the application. The detaching thread is the one doing all the real cleanup, so we simply avoid any safe reads or TLS for detaching threads by recording the detacher's ID when we start the detach process. This var is not cleared until re-init, so we have no race with the end of detach. Tested on api.detach_signal with the forthcoming signal mask checks, which trigger when the handler is invoked for a DR signal instead of an app-generated signal. Without this fix, the test fails easily: about 1 in 5 runs in debug build. With this fix, it succeeds 200x in a loop. I still see one type of crash in debug build, a rare race where d_r_stats is set to NULL in between the check and use of a LOG(), but that is limited to debug and is beyond the scope of this fix and is much lower priority: I filed it as #4641. Fixes #3535
derekbruening
added a commit
that referenced
this issue
Dec 30, 2020
Eliminates a race on detach where the detaching thread removes DR's SIGSEGV handler while a now-native detached thread is in the middle of having a signal delivered (natively) and invokes a TLS magic field safe read, whose SIGSEGV then goes to the application. The detaching thread is the one doing all the real cleanup, so we simply avoid any safe reads or TLS for detaching threads by recording the detacher's ID when we start the detach process. This var is not cleared until re-init, so we have no race with the end of detach. Tested on api.detach_signal with the forthcoming signal mask checks, which trigger when the handler is invoked for a DR signal instead of an app-generated signal. Without this fix, the test fails easily: about 1 in 5 runs in debug build. With this fix, it succeeds 200x in a loop. I still see one type of crash in debug build, a rare race where d_r_stats is set to NULL in between the check and use of a LOG(), but that is limited to debug and is beyond the scope of this fix and is much lower priority: I filed it as #4641. Fixes #3535
derekbruening
added a commit
that referenced
this issue
Dec 30, 2020
Adds proper signal mask setting for native signals, which requires adding handler-masked signals to the DR frame's mask instead of setting the mask in the kernel. Adds tests to api.detach_signal. These mask tests also detect when a DR-caused crash from a DR bug during detach shows up in the app's handler post-detach due to detach races, including from #3535. Issue: #1921, #3535
derekbruening
added a commit
that referenced
this issue
Dec 30, 2020
Adds proper signal mask setting for native signals, which requires adding handler-masked signals to the DR frame's mask instead of setting the mask in the kernel. Adds tests to api.detach_signal. These mask tests also detect when a DR-caused crash from a DR bug during detach shows up in the app's handler post-detach due to detach races, including from #3535. Issue: #1921, #3535
derekbruening
added a commit
that referenced
this issue
Jan 7, 2021
Puts in place 6 fixes for handling signals during DR initialization, typically in a start/stop setup where other threads are alive. 1) Copy the app handler at init time for delivering native signals during init. 2) Reorder signal_arch_init(), which obtains the signal frame size, to run before DR installs its handler. 3) Obtains the app handler before installing DR's handler, eliminating a (narrow) race window. 4) Until DR starts executing the app, continues delivering native signals and using the globally recorded app handler, to match how DR init works. 5) Set detacher_tid between init and setup to avoid races like in DR's handler at the end of init that were under #3535 6) Handle a race where the init thread has set the global try_except, causing master_signal_handler_C to think an app thread's signal is DR's. We add a global_try_tid and check the thread id to solve this. Augments api.detach_signal with signals sent during init. Also tested on a large proprietary application. Issue: #1921, #3535
derekbruening
added a commit
that referenced
this issue
Jan 9, 2021
Puts in place 6 fixes for handling signals during DR initialization, typically in a start/stop setup where other threads are alive. 1) Copy the app handler at init time for delivering native signals during init. 2) Reorder signal_arch_init(), which obtains the signal frame size, to run before DR installs its handler. 3) Obtains the app handler before installing DR's handler, eliminating a (narrow) race window. 4) Until DR starts executing the app, continues delivering native signals and using the globally recorded app handler, to match how DR init works. 5) Set detacher_tid between init and setup to avoid races like in DR's handler at the end of init that were under #3535 6) Handle a race where the init thread has set the global try_except, causing master_signal_handler_C to think an app thread's signal is DR's. We add a global_try_tid and check the thread id to solve this. Augments api.detach_signal with signals sent during init. This causes the test to hit a DR bug in the vsyscall hook (#4664) on 32-bit Linux. For now we disable the test for 32-bit. Also tested on a large proprietary application. Issue: #1921, #3535, #4664
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Following happens with a small reproducer that we produced off Google code.
<=>
The text was updated successfully, but these errors were encountered: