Recent kernel causes -fPIE ASan executables to abort on x86_64 #837
Comments
It is possible, but at a code size and execution time cost that we are not willing to pay. |
This would not be the first time a kernel change has broken the sanitizers. What we really need here is a way to tell at link time where the shadow is. |
I always wondered if it would be possible to express the shadow mapping as an ELF program header. That would be the ultimate way to communicate shadow memory needs to the kernel. |
I'm not sure - I'm just a user who happened to stumble across the bug. You might be able to get them to change where the executable gets mapped, but they could argue that PIE executables should be prepared to be loaded at any address.
I don't see how that is possible with PIE / ASLR. The entire point is that you don't know where the executable will be loaded, so you can't know what bits of memory will be free until runtime. |
We could have a program header that means "please reserve the first N bytes of the address space for the application". Then the kernel can use that as a minimum for ELF_ET_DYN_BASE. |
@dvyukov can you confirm that the fresh kernel breaks the sanitizers? |
I think I am hitting this bug:
(Just compiled a trivial Hello world with |
A possible workaround seems to be the following:
That way, (Yes, my |
I independently bisected this in the kernel and opened a bug there: https://bugzilla.kernel.org/show_bug.cgi?id=196537 but didn't have a lot of knowledge about the underlying issues. |
Bringing this over from twitter (https://twitter.com/kayseesee/status/894594085608013825), my basic view is that this is a bug in the ASAN library code. Assuming you can use a particular virtual address range is not valid (it could already be in use for some reason, as you're now seeing), and even if it were valid, it's not safe for something that can be used in deployment; it exposes potentially sensitive information at an attacker-known address. ASAN simply needs to pay the cost of using a variable address chosen at runtime. |
@richfelker ASAN has been using fixed addresses since 2011.
That's one way to look at it. But a much better resolution would be to have a kernel<=>userspace interface that allows the use of a fixed address. And in the meantime, revert the change that broke ASAN.
If you want to discuss this topic, please open a separate issue, let's not mix too many things in a single place. |
Like I said on the initial Twitter thread, I don't think I have much of value to say beyond "I think what you're doing is badly wrong" and "it happened to work before is not a good argument to do it (or for changing the kernel)". If we disagree then we disagree... |
@kcc: You mentioned a dynamic shadow base. Could you please elaborate on that? Is it available in the current stable release of LLVM? If so, can you please point me to some documentation? I think that information would be useful for downstream projects that find the runtime overhead of a dynamic shadow base acceptable. |
@kcc I don't think this is good advice. Pretty sure that the change fixes some security issue, so you shouldn't revert that. |
I agree strongly with @bennofs. Address assignment/ASLR for production systems should not be tiptoeing around (and possibly impacting security) for the sake of a tool that's only suitable in debugging situations and not production. I'd like ASAN to be usable in production (which is why I mentioned that above) but at present it's not. |
One more discussion thread is here: http://marc.info/?t=149973272100048&r=1&w=2 |
In clang there is
All these arguments are perfectly valid, but who is going to pay for the increased CPU usage and code size? Or, if we end up supporting both configurations on linux (dynamic and static) who is going to pay for the extra maintenance overhead? We really need to come up with a solution where the application requests a fixed address range at startup and the kernel can't refuse. |
@kcc: Forcing the dynamic shadow doesn't work on my system! (Archlinux x86_64 with clang 4.0.1) |
@FSMaxB please open a separate bug with details. |
Requesting a fixed address range at startup is non-PIE. Normal non-PIE ELF already has a way to do that: PT_LOAD segments (e.g. with PROT_NONE or just BSS you can MAP_FIXED over later). The whole point of an executable being PIE is that it doesn't demand specific addresses. Since current kernels don't, and future kernels probably won't, support the invalid usage of assuming a particular fixed address range is free, the fixed address mode should just be removed and dynamic always used. This will also reduce the amount of code that needs to be maintained (since Windows already needs dynamic). Performance is not likely to be significantly worse, but ASAN already performs badly and is intended and understood as a costly (but less so than some other approaches) tool for debugging (and possibly in the future, for hardening). |
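For readers unfamiliar with the reserve-then-map pattern mentioned above, here is a minimal runtime sketch of it in C: first claim a range at whatever address the kernel picks, then use MAP_FIXED only inside the range that is already owned. The helper names `reserve_range` and `commit_range` are hypothetical, and this is not how compiler-rt sets up its shadow; it only illustrates why MAP_FIXED is safe over a prior reservation and unsafe over an address that was merely assumed to be free.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS and MAP_NORESERVE on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Reserve `size` bytes of address space at a kernel-chosen address.
 * PROT_NONE plus MAP_NORESERVE claims addresses without committing memory.
 * Returns NULL on failure. */
static inline void *reserve_range(size_t size) {
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Make part of an existing reservation readable and writable by mapping
 * over it with MAP_FIXED. This cannot clobber anyone else's mapping,
 * because the caller already owns [base, base + len). */
static inline void *commit_range(void *base, size_t len) {
    assert(base != NULL && len > 0);
    void *p = mmap(base, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

A caller would reserve once at startup, remember the returned base, and commit pieces on demand; the base then has to be stored somewhere, which is the "dynamic" cost discussed in this thread.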
Asan's shadow being at a fixed offset does not really contradict PIE -- the rest of the addresses could be anywhere they want to (except for the shadow region). BTW, I am trying to get the fresh perf numbers on spec for static vs dynamic shadow. |
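To make the static-versus-dynamic distinction concrete, here is a minimal sketch of the address-to-shadow computation. It assumes ASan's usual 8-to-1 mapping (a shift by 3) and the 0x7fff8000 offset used on Linux x86_64. The names `dynamic_shadow_base` and `set_dynamic_shadow_base` are hypothetical stand-ins for whatever a runtime would choose at startup, not the actual compiler-rt code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8 application bytes map to 1 shadow byte. */
#define SHADOW_SCALE 3

/* Assumption: the fixed shadow offset used on Linux x86_64. */
#define STATIC_SHADOW_OFFSET UINT64_C(0x7fff8000)

/* Static shadow: the offset is a compile-time constant. */
static inline uint64_t shadow_static(uint64_t addr) {
    return (addr >> SHADOW_SCALE) + STATIC_SHADOW_OFFSET;
}

/* Hypothetical variable holding a base chosen at program startup. */
static uint64_t dynamic_shadow_base;

/* Hypothetical setter; this sketch requires a 4 KiB aligned base. */
static inline void set_dynamic_shadow_base(uint64_t base) {
    assert((base & UINT64_C(0xfff)) == 0);
    dynamic_shadow_base = base;
}

/* Dynamic shadow: same arithmetic, but the base is loaded from memory. */
static inline uint64_t shadow_dynamic(uint64_t addr) {
    return (addr >> SHADOW_SCALE) + dynamic_shadow_base;
}
```

The arithmetic is identical in both forms. The difference is roughly that the static offset can be an immediate in every check, while the dynamic base has to be loaded first, which is where the code size and CPU cost under discussion would come from.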
The view I'm putting forward, which you're free to disagree with but I think is worthwhile, is that the definition of PIE is "no fixed mappings", not "some non-fixed mappings". In this definition, PIE ELF programs can even be loaded in rather esoteric environments like a shared address space (multiple programs in the same process) or a nommu system (where all processes share an address space). There are very good reasons to consider any fixed mappings a design bug; in places where they've been used recently, they've repeatedly come back to bite the designers and users. The Linux/glibc x86_64 "vsyscall" mess, ARM kuserhelper page, etc. come to mind. |
BTW my view of these matters is somewhat broader than "Linux" because I'm thinking of/interested in the usage case of non-Linux implementations loading and executing programs using the Linux user-kernel ABI. This sort of generality is part of why I disagree with the view that the kernel is obligated to lay out memory the same way past versions did. |
May make sense to measure sanitized DSOs (where |
Relevant discussion in oss-security |
I've done an overnight run of SPEC2006 on my machine.
I was also surprised to see that the code size with dynamic shadow is actually better (~0.3%). Dynamic:
Static:
Next steps:
|
The difference between regular executables and PIE:
PIE (or -shared-libasan):
|
Btw, ASan on 32-bit Android maps shadow at 0000 0000 .. 2000 0000, because all executables are PIE, and it is slightly faster that way (and requires less code). This is now broken. |
This will not always work. If library A depends on library B, then a constructor of B may call A before A's constructors have run. |
…e breaks them. Cf. google/sanitizers#837 Tracked in https://bugs.swift.org/browse/SR-6257
…e breaks them. (#12681) Cf. google/sanitizers#837 Tracked in https://bugs.swift.org/browse/SR-6257
As recent kernel update broke them. Cf. google/sanitizers#837 Tracked in https://bugs.swift.org/browse/SR-6257
The kernel commit was ultimately reverted. Do we want to keep this issue open? |
I don't think it was reverted. |
Oh, I think it was reverted in Ubuntu kernel, but not in upstream. |
- Revert to previous trusty image (ref. google/sanitizers/issues/837)
- Switch to xcode9 on osx
- Scaffolding to get the sanitizer job using clang 5.0
- ASAN: Disable stack protector
- ASAN: Ignore sigsegv and sigbus
I am writing this for everyone who is trying to find a solution to the problem of running the sanitizers on Linux and arrives at this thread from a search. As you might infer from the problem described in this thread, you have to disable ASLR on Linux via the "nokaslr" option to be able to run the sanitizers, but that puts you at a potential security risk, so what I would recommend is the following:
|
If we need to fix this, I think the best solution would be to use the dynamic shadow offset feature ( @kcc had concerns in 2017 about the performance of this change. He ran some benchmarks in this comment, and the results were in the noise. If someone can produce new results on a less noisy machine, I don't think there are any other objections. We can make the change and fix this issue for good. Maybe this interacts with the new ASan codegen that @kda added, I'm not sure. |
I don't know if they will ever come around to fixing this issue, since it has been around for 5 years. I have submitted a workaround to use until then. |
It looks like it was reverted upstream in August 2017: torvalds/linux@c715b72 There was then a minor fix in November 2017 for 5-level-paging (torvalds/linux@be739f4), but it has no impact on this issue; that's the last time ELF_ET_DYN_BASE was modified for x64. This means there is only a very narrow time window between when the breaking change was made (July 2017) and when it was reverted (August 2017); any kernel outside of that 5-week period should be compatible with ASan. |
Since this is still open, I thought I'd comment that I ran into this today on Ubuntu 22.04, kernel 6.2.0-34-generic (a little out of date). |
Perhaps your Ubuntu installation has an aggressive address-space layout randomization (ASLR) setting. Could you please run:
and report what it prints? N.B. the ASan layout was slightly tweaked in April (https://reviews.llvm.org/D148280) - perhaps just missing the release date of Ubuntu 22.04 - which increases ASan's compatibility with ASLR. |
|
Thanks! That's weird, even 2022 ASan should not have any problems with 28-bit ASLR. Could you please provide an example program and command line where you encounter the issue? FWIW I spun up an Ubuntu 22.04 VM with a toy example, and it worked fine:
|
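As background for the "28-bit ASLR" figure, here is a hedged sketch of how one might read the mmap randomization width on Linux and turn it into a byte span. It assumes the relevant setting is the one exposed in `/proc/sys/vm/mmap_rnd_bits` and that pages are 4 KiB; the helper names `read_rnd_bits` and `aslr_span_bytes` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Read a single integer from `path` (for example
 * "/proc/sys/vm/mmap_rnd_bits"). Returns -1 if it cannot be read. */
static inline int read_rnd_bits(const char *path) {
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int bits = -1;
    if (fscanf(f, "%d", &bits) != 1)
        bits = -1;
    fclose(f);
    return bits;
}

/* Size in bytes of the window over which a mapping base can move:
 * `bits` random bits, each step being one page of 2^page_shift bytes. */
static inline uint64_t aslr_span_bytes(int bits, int page_shift) {
    assert(bits >= 0 && page_shift >= 0 && bits + page_shift < 64);
    return UINT64_C(1) << (bits + page_shift);
}
```

Under these assumptions, 28 random bits with 4 KiB pages give a 1 TiB window, so a larger value widens the range of addresses where mappings can land.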
This is just a heads-up about this Linux kernel commit recently committed and pending on a number of stable queues:
torvalds/linux@eab0953
It seems to move the default load address for -fPIE executables into the location ASan uses for its shadow memory map (on x86_64). This then causes ASan to abort on startup. Example error:
With ASLR enabled, you can sometimes get lucky with the load address and the program runs, but most of the time ASan aborts with this error.
Is it possible for ASan to be a bit more flexible about where it places the shadow map on startup to fix this?
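The collision described here can be sketched as a simple range check. The constants are assumptions about the usual Linux x86_64 ASan layout (shadow scale 3, offset 0x7fff8000, 47-bit user address space), and the function names are hypothetical. The addresses mentioned in the note after the code are illustrative values, not taken from the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout parameters for Linux x86_64. */
#define SHADOW_SCALE  3
#define SHADOW_OFFSET UINT64_C(0x7fff8000)
#define USER_TOP      UINT64_C(0x7fffffffffff)

/* First and last byte of the range the shadow (including its gap) needs. */
static inline uint64_t shadow_begin(void) {
    return SHADOW_OFFSET;
}
static inline uint64_t shadow_end(void) {
    return (USER_TOP >> SHADOW_SCALE) + SHADOW_OFFSET;
}

/* True if the mapping [start, start + len) intersects that range. */
static inline bool overlaps_shadow(uint64_t start, uint64_t len) {
    assert(len > 0 && start <= UINT64_MAX - len);
    uint64_t last = start + len - 1;
    return start <= shadow_end() && last >= shadow_begin();
}
```

With these numbers the reserved range is 0x7fff8000 through 0x10007fff7fff. An executable loaded just above 4 GiB (for example 0x100000000) lands inside it, while one loaded around 0x555555554000 does not, which matches the "sometimes you get lucky" behaviour only if randomization can push the base out of the range.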