i#5437 glibc2.34: Workaround for SIGFPE #5695

Merged Oct 22, 2022 (7 commits)

derekbruening

Adds a workaround for the SIGFPE in glibc 2.34+ __libc_early_init() by setting two ld.so globals located via hardcoded offsets. The hardcoded offsets make this fragile, so it is considered temporary.

Tested on glibc 2.34, where every libc-using client crashes with SIGFPE without this fix and works with it.

Adds an Ubuntu22 GA CI run. If it has failures for other reasons, the plan is to drastically shrink the set of tests run, or to abandon the run if it is too much work right now.

Issue: #5437
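
To make the fragility concrete, here is a minimal sketch of the general technique of writing two fields into a block of globals at hardcoded byte offsets. It is not the code from this PR: the offsets, the written values, and all function names are hypothetical, and a plain byte array stands in for the ld.so globals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical offsets, for illustration only: the real values depend on the
 * exact glibc build and are not stated in this description. */
#define EXAMPLE_FIELD_A_OFFS 0x10
#define EXAMPLE_FIELD_B_OFFS 0x18

/* Writes a size_t at a byte offset from base. memcpy avoids alignment and
 * aliasing problems when poking into a block whose layout we do not own. */
static void
write_field_at(unsigned char *base, size_t block_size, size_t offs, size_t value)
{
    assert(offs + sizeof(value) <= block_size);
    memcpy(base + offs, &value, sizeof(value));
}

/* Reads back a size_t from a byte offset, for verification. */
static size_t
read_field_at(const unsigned char *base, size_t offs)
{
    size_t value;
    memcpy(&value, base + offs, sizeof(value));
    return value;
}

/* Stand-in for the workaround: fills in the two fields that the early-init
 * code reads. If a library update shifts the layout, these writes silently
 * land on the wrong fields, which is why the approach is fragile. */
static void
apply_workaround(unsigned char *globals, size_t block_size, size_t value_a,
                 size_t value_b)
{
    write_field_at(globals, block_size, EXAMPLE_FIELD_A_OFFS, value_a);
    write_field_at(globals, block_size, EXAMPLE_FIELD_B_OFFS, value_b);
}
```

Nothing in such code can detect that the layout has changed, so any validation has to come from elsewhere, such as a version check or a decode of the consuming function.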

@derekbruening

The hardcoded offsets here will break on the next glibc update that inserts a var earlier in that globals list. I don't know enough about it to know how often that happens, whether vars are always appended, or what.

One option is to decode __libc_early_init and look for the offsets there, but we'd be looking for hardcoded code patterns, which is also fragile. Maybe there is some simpler exported function to decode.

We could make runtime options for the offsets so a user could work around this.
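
As a rough sketch of what such a runtime option could look like (the function name, the option format, and the default value are all hypothetical, not existing DynamoRIO options), the offset could be parsed from a user-supplied string and fall back to the built-in value when the string is missing or malformed:

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical built-in default, for illustration only. */
#define DEFAULT_FIELD_OFFS 0x10UL

/* Returns the offset parsed from opt (decimal or 0x-prefixed hex), or
 * default_offs if opt is NULL, empty, negative, or otherwise malformed. */
static unsigned long
offset_from_option(const char *opt, unsigned long default_offs)
{
    char *end;
    unsigned long val;
    if (opt == NULL || *opt == '\0' || strchr(opt, '-') != NULL)
        return default_offs;
    errno = 0;
    val = strtoul(opt, &end, 0); /* Base 0 accepts both "680" and "0x2a8". */
    if (errno != 0 || end == opt || *end != '\0')
        return default_offs;
    return val;
}
```

Falling back to the default on bad input keeps the current behavior for users who never set the option.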

2.34 seems to work if we don't even call __libc_early_init (but 2.32 crashes if we don't): we could avoid the call for 2.34, maybe if a decode of the func doesn't match our expected pattern.
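
A version gate along these lines could be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the version string is assumed to have the form "major.minor", and treating everything from 2.34 onward as skippable is an extrapolation from the single 2.34 observation above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Parses a "major.minor" version string such as "2.34". */
static bool
parse_glibc_version(const char *s, int *major, int *minor)
{
    return s != NULL && sscanf(s, "%d.%d", major, minor) == 2;
}

/* Returns true if the version is one where skipping the early-init call
 * might be acceptable. Unparseable versions return false, so that the
 * default remains to make the call (2.32 crashes without it). */
static bool
may_skip_early_init(const char *version)
{
    int major, minor;
    if (!parse_glibc_version(version, &major, &minor))
        return false;
    return major > 2 || (major == 2 && minor >= 34);
}
```

A check like this could be combined with the decode-based pattern match, skipping the call only when both the version and the failed match point that way.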

OTOH I'm not sure it's worth a ton of work if we think it's better to put that time into refactoring the private loading process.

@derekbruening

I'm also looking for reviewer input on what to do with the ubuntu22 test I added here. We have the following failures (but many passes, indicating this fix is working there; otherwise nearly every client test would fail):

	code_api|common.fib 
	code_api|common.nativeexec 
	code_api|common.nativeexec_retakeover 
	(ignore: i#2941) 	code_api|common.nativeexec_exe 
	(ignore: i#2941) 	code_api|common.nativeexec_retakeover_opt 
	(ignore: i#2941) 	code_api|common.nativeexec_bindnow 
	code_api|common.nativeexec_exe_opt 
	(ignore: i#2941) 	code_api|common.nativeexec_bindnow_opt 
	code_api|security-common.codemod 
	code_api|security-common.decode-bad-stack_FLAKY 
	code_api|security-common.selfmod2 
	code_api|security-common.selfmod-big 
	code_api|security-common.selfmod 
	code_api|security-common.TestAllocWE 
	code_api|security-common.TestMemProtChg_FLAKY 
	code_api|tool.drcachesim.phys_SUDO 
	code_api|tool.drcachesim.phys-threads_SUDO 
	code_api|tool.drcachesim.delay-global 
	code_api|tool.drcachesim.threads 
	code_api|tool.drcachesim.threads-with-config-file 
	code_api|tool.drcachesim.coherence 
	code_api,satisfy_w_xor_x|security-common.selfmod2 

Actually, some if not all of those are the rseq bug #5431.

@derekbruening

Actually that list is for the release build; many more failed on debug, but again they are mostly the rseq bug #5431.

@derekbruening derekbruening merged commit cacb542 into master Oct 22, 2022
@derekbruening derekbruening deleted the i5437-sigfpe-workaround branch October 22, 2022 02:15
abhinav92003 added a commit that referenced this pull request Nov 18, 2022
Fixes issues with DR's rseq handling in glibc 2.35+.

Glibc 2.35 added support for the Linux rseq feature. See
https://lwn.net/Articles/883104/ for details. TL;DR: glibc registers
its own struct rseq at init time, and stores its offset from the
thread pointer in __rseq_offset. The glibc-registered struct rseq is
located in the struct pthread. If glibc's rseq support isn't
available, either due to some issue or because the user disabled
it by exporting GLIBC_TUNABLES=glibc.pthread.rseq=0, glibc
sets __rseq_size to zero.
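
The relationship described above can be sketched as a small pure function. This is an illustration, not DR's code: the function name is hypothetical, and the parameters stand in for the thread pointer and for the values of __rseq_offset and __rseq_size (which glibc 2.35+ is understood to export via <sys/rseq.h>), so that the sketch compiles on any system.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the address of the glibc-registered struct rseq, computed as
 * thread pointer plus offset, or 0 when rseq_size is zero (meaning glibc's
 * rseq support is unavailable or was disabled by the user). */
static uintptr_t
glibc_rseq_address(uintptr_t thread_pointer, ptrdiff_t rseq_offset,
                   unsigned int rseq_size)
{
    if (rseq_size == 0)
        return 0;
    /* The offset may be negative; unsigned wraparound gives the right sum. */
    return thread_pointer + (uintptr_t)rseq_offset;
}
```

For example, a thread pointer of 0x1000 with an offset of -0x40 yields 0xfc0, and any size of zero yields 0 regardless of the other inputs.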

Improves the heuristic to find the registered struct rseq. For the
glibc-support case: on AArch64, it is at a negative offset from the app lib
seg base, whereas on x86 it's at a positive offset. On both AArch64
and x86, the offset has the opposite sign from what it would have
if the app registered the struct rseq manually in its static TLS
(which happens for older glibc and when glibc's rseq support
is disabled).

Detects whether the glibc rseq support is enabled by looking at
the sign of the struct rseq offset.
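
The sign-based detection can be illustrated with the following sketch. The enum and function names are hypothetical and this is not the code from the commit; it only encodes the rule stated above, with a zero offset treated as "not glibc" by assumption.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { ARCH_X86, ARCH_AARCH64 } arch_t;

/* Returns true if the sign of the struct rseq offset from the app lib seg
 * base matches the glibc-registered case: negative on AArch64, positive on
 * x86. The opposite sign indicates an app-registered struct rseq in static
 * TLS. */
static bool
offset_indicates_glibc_rseq(arch_t arch, ptrdiff_t offset_from_seg_base)
{
    if (arch == ARCH_AARCH64)
        return offset_from_seg_base < 0;
    return offset_from_seg_base > 0;
}
```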

Removes the drrun -disable_rseq workaround added by #5695.

Adjusts the linux.rseq test to get the struct rseq registered by
glibc, when it's available. Also fixes some issues in the test.

Adds the Ubuntu_22 tag to rseq tests so that they are enabled.

Our Ubuntu-20 CI tests the case without rseq support in glibc,
where the app registers the struct rseq. This also helps test the
case where the app is not using glibc.

Also, our Ubuntu-22 CI tests the case with glibc rseq support.
Manually tested the disabled rseq support case on glibc 2.35,
but this does not add a CI version of it.

Fixes #5431
derekbruening added a commit that referenced this pull request Mar 9, 2023
Adds the same workaround for the SIGFPE in glibc 2.34+
__libc_early_init() as for 64-bit in PR #5695: we hardcode the 32-bit
offsets of the two globals written by the workaround.

Tested on glibc 2.34 where every libc-using client crashes with SIGFPE
but they work with this fix.

Adds an Ubuntu22 GA CI 32-bit run.

Issue: #5437
derekbruening added a commit that referenced this pull request Mar 10, 2023