Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#[zero_stack] attribute #1853

Closed
burdges opened this issue Jan 14, 2017 · 18 comments
Closed

#[zero_stack] attribute #1853

burdges opened this issue Jan 14, 2017 · 18 comments
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.

Comments

@burdges
Copy link

burdges commented Jan 14, 2017

There are various crates that attempt to provide secure memory storage for secret data with perhaps the most advanced being stouset/secrets. A minimal approach is simply to zero any data when dropping it, but mostly these tools employ heap allocation so that rustc and llvm cannot move it, possibly leaking secret data.

Any routines that process secret data must use such an allocator as well, which mildly complicates usage with #![no_std] and starts grows suspicious. Are there any initialized parts of the chacha stream cipher left on the stack by a call like this?

let mut chacha = MySecretWrapper::new( ChaCha20::new(key,nonce) );

An alternative would be a #[zero_stack] attribute for fns that ensured the fn zeroed any stack space it knowingly touched with program data when the fn finished. A few details :

Any stack allocated with alloca #1808 should be zeroed on exit as well.

An fn marked #[zero_stack] should never return data on the stack, only in registers. Accepting and using a mutable reference is fine of course, even if that reference points to the stack. In principle, this rule against returning data on the stack could be relaxed if the calling routine is also #[zero_stack]. An attribute #[no_zero_stack] might exempt specific data from zeroing, including the value to be returned.

An fn marked #[zero_stack] should inline any fns it calls, unless that fn is also marked #[zero_stack]. It should throw a warning anytime it calls a non-#[zero_stack] fn that cannot be inlined, although this warning could be overridden with a call site attribute #[no_zero_stack].

Any nested fn or closures should implicitly be made #[zero_stack] as well, unless marked #[no_zero_stack]. Any coroutines #1823 marked #[zero_stack] should zero their own stack when yielding, but some behavior might require the caller by #[zero_stack] as well.

An fn marked #[zero_stack] might zero additional stack space beyond its own usage if there is any OS specific reason to do so. Ideally, a panic might still zero any stack allocated data, but this could be omitted if problematic for whatever reasons.

Afaik, there is no reason for attributes like #[zero_stack] to try to provide all the functionality of secret data wrapper types, which might even encrypt the data in memory. I think an attribute for zeroing stack allocated data would make using rust to process secure data far more like using rust to process regular data.

cc @stouset @tarcieri @valarauca @DaGenix @briansmith @Manishearth @vks @Philipp91

@cesarb
Copy link
Contributor

cesarb commented Jan 14, 2017

See also #1496

@cesarb
Copy link
Contributor

cesarb commented Jan 14, 2017

I wrote something to zero the stack at https://crates.io/crates/clear_on_drop. It is a bit manual (you have to specify how much of the stack should be zeroed), but it should work correctly with panics (the zeroing is done within a Drop, so it zeros even during a panic unwind).

@burdges
Copy link
Author

burdges commented Jan 15, 2017

Nice. Zeroing once might be faster, depending upon your call graph, even if you must zero more, except it'll get slow as soon as you pass the existing cache lines. I wonder if there is any way to measure the memory usage and try not to zero too much?

I suppose the LLVM situation has not improved since last year, so all the issues with #1496 still apply here. :( Aside from your clear_on_drop trick with closures, we're likely stuck fixing code to only use the heap.

@valarauca
Copy link

One thing that should be noted is this change will have to ensure the zero'ing function doesn't skip the redzone that x64 adds onto the stack for leaf functions. That 128byte area could hold the majority of key in some scenarios.

@briansmith
Copy link

There was a talk at RWC about this idea: http://www.cl.cam.ac.uk/~lmrs2/publications/erasing_ram_rwc17.pdf. The main takeaway is that it has to be implemented carefully in order to avoid hurting performance by a considerable amount.

Although the RFC that attempted to specify #[clear_on_drop] was closed, I think an approach similar to that-where every secret value is annotated in some way, and the compiler automatically zeros secret values-is much better in the long term. In particular, #[clear_on_drop]-like semantics would make the cost of zeroing proportional to the amount of secret data (possibly) written, instead of proportional to the amount of all data (secret plus non-secret) possibly written. In other words, #[clear_on_drop]-like semantics is more consistent with Rust's goal of abstractions being zero-cost.

@burdges
Copy link
Author

burdges commented Jan 17, 2017

We can close this in favor of the original #[clear_on_drop] RFC #1496 or maybe tweak #[clear_on_drop] to include anything I mention here that appeared recently, like say the alloca or corountines comments. I think however we'd still face the underlying poor support from LLVM that scuttled that RFC.

An interesting middle ground might be a type struct SharedOwned<T>(pub T) that behaved like the shared reference type Shared<T> in that it warned the optimizer that additional references existed but actually owned it's data instead of being a reference. And Shared<T> could be equivalent to or even defined as roughly NonZer(*mut SharedOwned<T>) or something.

I ran into trouble with my attempts to handle clearable owned data in @cesarb's clear_on_drop crate. Just using Borrow<T>+BorrowMut<T> makes T non-cannonical https://github.com/burdges/clear_on_drop/blob/borrow_fail/src/clear_on_drop.rs but even if one solved that then the optimizer still moves around owned data too much https://github.com/burdges/clear_on_drop/blob/owned_fail/src/owned.rs for zeroing to be effective.

@llogiq
Copy link
Contributor

llogiq commented Jan 25, 2017

#1808 does not specify what happens to the data on drop, and I'm positive that LLVM doesn't zero the result of allocas. Requiring a memset on dropping an allocated array would negate any performance wins this might have. Why don't wrapper types over slices that explicitly zero on drop work for you?

@burdges
Copy link
Author

burdges commented Jan 25, 2017

There are too many situations where data gets moved around on the stack. In particular, you cannot make a by-value wrapper type that correctly zeros data because ::std::mem::drop will make a copy and drop only that one, at least when compiled in debug mode.

Instead, you must manipulate everything behind a pointer, so ClearOnDrop<Box<T>> or ClearOnDrop<&mut T>. That works, but Rust favors passing by-value. In particular, you cannot necessarily write f() -> Result<ZeroingWrapper<[u8; 32]>,Error> but instead must write either f() -> Result<ClearOnDrop<Box<[u8; 32]>>,Error> or f(&mut buf) -> Result<(),Error>

In principle, immovable types #1858 might fix this by disallowing by-value calls or coercing them into by reference calls, but consider this :

fn f(mut arg: ImmovableInput) -> ImmovableOutput { .. }

We can coerce mut arg: ImmovableInput into arg: &mut ImmovableInput as officially the caller shall never again touch it, but maybe causes issues with raw pointers the caller held. We could maybe require the caller used ImmovableOutput wherever f leaves it on the stack, but that sounds problematic too.

According to https://www.cl.cam.ac.uk/~lmrs2/publications/erasing_ram_rwc17.pdf there are much more efficient options than zeroing after every function call anyways. Analyzing the call graph can provide zeroing with minimal cost, but breaks too many things. Appears the best option would be instrumenting functions to track their stack usage.

@briansmith
Copy link

It would be bad to rely on immovable types. It's not idiomatic. We need zero-on-move and zero-on-drop types.

@eddyb
Copy link
Member

eddyb commented Jan 25, 2017

We need zero-on-move and zero-on-drop types.

Why? You're both wasting time moving data around and later zeroing it, why not keep it boxed?

Besides, without some kind of unbound (i.e. ?Move) or boxing, any generic abstraction could use unsafe code inside to do a move and it can very well assume that bit copies are a valid way to move any type.

Vec calls realloc. That does the moving internally, how would you hook that?

@briansmith
Copy link

Why?

Because you may not have a heap (e.g. #![no_std] and/or your crypto library may return things by value because that's what's idiomatic and/or otherwise necessary (e.g. ring).

You're both wasting time moving data around and later zeroing it

It's up to the compiler to minimize the zeroing overhead during dead store elimination, given knowledge about values that need to be zeroed at least once.

Vec calls realloc. That does the moving internally, how would you hook that?

Vec and related things would need to change, most likely.

@briansmith
Copy link

Vec calls realloc. That does the moving internally, how would you hook that?
Vec and related things would need to change, most likely.

This is probably a good motive for using a non-libc-based libstd, like steed.

@burdges
Copy link
Author

burdges commented Jan 26, 2017

I believe most dynamically allocating data structures need a complete redesign anyways, well unless you use them with a second layer of allocation like say

type SecureHashMap<K,V> = HashMap<K,ClearOnDrop<Box<V>>>`.  

Yes, Vec sounds simpler than say HashMap, but .into_boxed_slice() abandons the ability to reallocate easily enough, so it's fine if you can just avoid reallocations.

I'm worried about this for return-by-value, which provides numerous advantages, especially in Rust. It's also so strongly idiomatic that even we avoid it in crypto libraries then users will still pass key material around that way.

Right now, the clear_on_drop crate zeros stack pages, but I'd imagine this causes a stack of cache misses due to zeroing memory the program never touched.

In the short term, we could probably instrument crypto libraries so that they generated and saved stack usage information when benchmarks were run, and then called some explicit zeroing function at appropriate points. You'd need to run the benchmarks twice before they'd give the right answer, but that's fine.

In the longer term, I could imagine a hybrid approach between call graph analysis and instrumentation. Any function that used a dynamically sized alloca, recursion, etc. would have an instrumented version. Any function for which this information was available statically would offer it statically instead. Both approaches can be viewed as implementing a trait like the following, but static stack usage means the usize argument can be inlined :

trait FnStackUsage : ???  {
    type Arguments;
    type Output;
    fn fn_stack_usage(args: Arguments) -> (usize,Output);
}

@ticki
Copy link
Contributor

ticki commented Jan 26, 2017

People here seem to think zeroing is a major performance hit. It's not. Even if you had a lot of data, you could just encrypt it and zero the key instead.

@burdges
Copy link
Author

burdges commented Jan 26, 2017

Yes, encrypting provides an easy redesign option for allocating data structures, but it's not useful for pass-by-value and return-by-value situations.

@ticki
Copy link
Contributor

ticki commented Jan 26, 2017

True that.

@llogiq
Copy link
Contributor

llogiq commented Jan 26, 2017

Performance is always hinging on the workload. If you have a lot of stack churn, the zeroing may well become noticeable. By that point it's no longer a zero-cost-abstraction: Everyone pays for the privilege of zeroing their data, sensitive or not.

@burdges
Copy link
Author

burdges commented Jan 27, 2017

As I understand Laurent Simon's work, zeroing the secret data needed in cryptography can be cheap compared with the cost of doing the cryptography that uses it, but there are two bad situations one must avoid :

  • If all functions zero their own data, then we zero needlessly anytime a loop repeatedly calls the same functions, as always happens in cryptography.
  • Any data we zero but never touch results in cache misses, and squandered cache lines, as happens with the clear_on_drop crate's clear_stack_on_return function.

I'll close up this issue now because it takes the wrong approach. We do not want stack types that zero themselves per se. And boxing the secret data ala ClearOnDrop<Box<T>> does not work either unless you can manually tune specific workspaces that way.

Instead, we should store intermediate sensitive data in a stack-like way so that the same memory can easily be reused for different mathematical objects, but we need to zero exactly the used stack space when done with sensitive code, and take care with how else this stack gets used.

For this, we need instrumentation that tracks exactly how much stack these sensitive functions use, statically if possible, but dynamically when necessary, along with a volatile_zero_stack(bytes: usize) function to zero the exact quantity of stack space after use.

@burdges burdges closed this as completed Jan 27, 2017
@Centril Centril added the T-lang Relevant to the language team, which will review and decide on the RFC. label Feb 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

No branches or pull requests

8 participants