Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't reuse RandomState seeds #37470

Merged
merged 1 commit into from
Nov 5, 2016
Merged

Don't reuse RandomState seeds #37470

merged 1 commit into from
Nov 5, 2016

Conversation

arthurprs
Copy link
Contributor

@arthurprs arthurprs commented Oct 29, 2016

cc #36481

@rust-highfive
Copy link
Collaborator

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Oct 30, 2016
@alexcrichton
Copy link
Member

@rfcbot fcp merge

Looks good to me, thanks @arthurprs! Want to run it by @rust-lang/libs just to be super sure though we're on board with this solution to #36481

@rfcbot
Copy link

rfcbot commented Oct 30, 2016

Team member @alexcrichton has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@sfackler
Copy link
Member

I would not describe this as a solution to #36481. Performance is still unnacceptable with non-default hashers. It does work around the issue in the common case, and we should land it though.

@arthurprs
Copy link
Contributor Author

Do you think I need to reword the comment mentioning #36481?

let r = rand::OsRng::new();
let mut r = r.expect("failed to create an OS RNG");
(r.gen(), r.gen())
UnsafeCell::new((r.gen(), r.gen()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this UnsafeCell can be Cell, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I'll change it.

@alexcrichton
Copy link
Member

The in-code comment is fine to me, but I changed Solves #... in the description to cc #...

@brson brson added the relnotes Marks issues that should be documented in the release notes of the next release. label Oct 31, 2016
@brson
Copy link
Contributor

brson commented Oct 31, 2016

I agree this is a good and efficient interim solution. To be clear, how much confidence do we have that addition here is actually a mathematically valid way to ensure that merging two hashmaps won't collide? My understanding is that it is clearly better than the status quo, that we suspect that in practice it will prevent collisions, but there's not any mathematical basis for knowing it will be effective (i.e. some clever person may find a way to generate collisions by incrementing the initialization value).

@brson
Copy link
Contributor

brson commented Oct 31, 2016

Here are the benchmarks comparing this approach of incrementing this value to completely regenerating it. This approach is 2ns vs 6-15ns.

@arthurprs
Copy link
Contributor Author

The fix relies on the backing hash function being high-quality, as in "Every input bit should affect every output bit about ~50% of the time" (seed is also an input).

@rfcbot
Copy link

rfcbot commented Nov 3, 2016

🔔 This is now entering its final comment period, as per the review above. 🔔

psst @alexcrichton, I wasn't able to add the final-comment-period label, please do so.

@alexcrichton
Copy link
Member

@bors: r+

@bors
Copy link
Contributor

bors commented Nov 3, 2016

📌 Commit eba93c3 has been approved by alexcrichton

sophiajt pushed a commit to sophiajt/rust that referenced this pull request Nov 4, 2016
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Nov 4, 2016
@bors
Copy link
Contributor

bors commented Nov 5, 2016

⌛ Testing commit eba93c3 with merge b243964...

@bors
Copy link
Contributor

bors commented Nov 5, 2016

💔 Test failed - auto-win-msvc-64-opt-rustbuild

@alexcrichton
Copy link
Member

@bors: retry

On Fri, Nov 4, 2016 at 10:25 PM, bors [email protected] wrote:

💔 Test failed - auto-win-msvc-64-opt-rustbuild
https://buildbot.rust-lang.org/builders/auto-win-msvc-64-opt-rustbuild/builds/2925


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#37470 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAD95DLFh58rBk3IDkcYSThEg1WQzM1Sks5q7BNegaJpZM4KkOFr
.

bors added a commit that referenced this pull request Nov 5, 2016
@bors
Copy link
Contributor

bors commented Nov 5, 2016

⌛ Testing commit eba93c3 with merge cae6ab1...

bors added a commit that referenced this pull request Nov 5, 2016
@bors bors merged commit eba93c3 into rust-lang:master Nov 5, 2016
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Nov 5, 2016
bors added a commit that referenced this pull request Nov 6, 2016
@pczarn
Copy link
Contributor

pczarn commented Nov 13, 2016

@arthurprs The seed is not an input. The seed describes a choice of a function from a family of functions.

Regardless, the seed and the input are mixed in a similar way. Perhaps the solution is effective. I don't know cryptography. I can only read what cryptographers have explained.

@Manishearth
Copy link
Member

Shouldn't Clone also reinitialize the random state? We should never be copying the random state afaict.

@funny-falcon
Copy link

But what if change order of iteration just will fix original issue?

We can use just simplest lcg generator for walking order:

  // Pseudo code for iterator
  if (hash.size == 0) return
  i := 0
  do {
    if (slot_occupied(hash, i))
       yield hash.key[i]
    i = (i *5 + 1) & hash.mask
  } until (i != 0) /* returned to first position, so we walked all hash */

@funny-falcon
Copy link

funny-falcon commented Nov 24, 2016

@sfackler points out that given output of iteration one may construct new input values in desired order to DoS the application.

Ok, lcg is very tunable, so we can show different order on every iteration:

  // Pseudo code for iterator
  if (hash.size == 0) return
  ptis = ptis * 5 + 1 //per_thread_iteration_seed
  delta := ptis * 2 + 1
  mult := (ptis ^ (ptis>>16)) * 4 + 1
  i := 0
  do {
    if (slot_occupied(hash, i))
       yield hash.key[i]
    i = (i * mult + delta) & hash.mask
   } until (i != 0)

@arthurprs
Copy link
Contributor Author

arthurprs commented Nov 24, 2016

@funny-falcon the problem with that approach is that the performance hit for iteration is possibly large. It should be easily measured though.

@funny-falcon
Copy link

@arthurprs you right: cache misses on big hashtables will destroy performance. So, my suggestion is not acceptable.

@sacundim
Copy link

sacundim commented Nov 26, 2016

Quoting the SipHash paper, pp. 5-6 (my boldface):

Note that the standard PRF and MAC security goals allow the attacker access to the output of SipHash on messages chosen adaptively by the attacker. However, they do not allow access to any “leaked” information such as bits of the key or the internal state. They also do not allow “related keys”, “known keys”, “chosen keys”, etc.

In the patch attached to this pull request, the RandomState::new() function increments the second word of the key on each call. This means that RandomStates created in the same thread have related keys, which violates the preconditions of SipHash's security claims. Does anybody have an argument that SipHash achieves its security goal under these conditions?

One candidate solution that does not fall afoul of the random keys precondition would be:

  1. Use the same seed for many maps.
  2. Associate each map with unique ID that differs between all maps that share the same seed.
  3. Instead of hashing inputs verbatim, prefix them with the map's unique ID.

This way the incrementing counter is part of the message, not the seed, which is fine because the PRF security claim allows the even more adverse case of "messages chosen adaptively by the attacker."

With thread-local seeds, the unique ID can be an additional u64 counter. Since SipHash supports incremental computation, one simple optimization is to precompute, cache and reuse the hasher's state after absorbing the prefixed counter. A further optimization would be to defer this step until the first time the map needs to hash an input so that creating empty maps does not pay the cost of that.

@funny-falcon
Copy link

I think, it is a time to ask Jean-Phillipe Aumasson @veorq : does SipHash allows incrementing key, or will it fail? even though attacker never sees whole output of hashsum (it can only guess low bits, and even those bits are not exact cause of collision resolution)?

@veorq
Copy link

veorq commented Nov 26, 2016

As noted above we didn't make claims in the case of "related keys", but I haven't seen attack in the related-key model.

TL;DR: I think it's ok.

Details: IIUC the scenario here is the following:

  • at time t1 you compute SipHash( key, _)
  • at time t2 you'll compute SipHash( key + 1, _)
  • at time t2 you'll compute SipHash( key + 2, _)
  • etc.

The attacker can choose what messages are hashed. We can assume that the attacker sees the hashes of the messages with key + i for different i's and wants to predit SipHash(key + j, m) for some j>i. The attacker will only succeed if there's some relation between hashes of different messages (standard PRF security) or if there's some relation between hashes of different or identical messages with different keys (related-key PRF security).

Given the previously published cryptanalysis results on SipHash, I'm confident that SipHash will remain secure under related keys and as used in Rust.

bors added a commit that referenced this pull request Feb 16, 2017
Adaptive hashmap implementation

All credits to @pczarn who wrote rust-lang/rfcs#1796 and contain-rs/hashmap2#5

 **Background**

Rust std lib hashmap puts a strong emphasis on security, we did some improvements in #37470 but in some very specific cases and for non-default hashers it's still vulnerable (see #36481).

This is a simplified version of rust-lang/rfcs#1796 proposal sans switching hashers on the fly and other things that require an RFC process and further decisions. I think this part has great potential by itself.

**Proposal**
This PR adds code checking for extra long probe and shifts lengths (see code comments and rust-lang/rfcs#1796 for details), when those are encountered the hashmap will grow (even if the capacity limit is not reached yet) _greatly_ attenuating the degenerate performance case.

We need a lower bound on the minimum occupancy that may trigger the early resize, otherwise in extreme cases it's possible to turn the CPU attack into a memory attack. The PR code puts that lower bound at half of the max occupancy (defined by ResizePolicy). This reduces the protection (it could potentially be exploited between 0-50% occupancy) but makes it completely safe.

**Drawbacks**

* May interact badly with poor hashers.  Maps using those may not use the desired capacity.
* It adds 2-3 branches to the common insert path, luckily those are highly predictable and there's room to shave some in future patches.
* May complicate exposure of ResizePolicy in the future as the constants are a function of the fill factor.

**Example**

Example code that exploit the exposure of iteration order and weak hasher.

```
const MERGE: usize = 10_000usize;
#[bench]
fn merge_dos(b: &mut Bencher) {
    let first_map: $hashmap<usize, usize, FnvBuilder> = (0..MERGE).map(|i| (i, i)).collect();
    let second_map: $hashmap<usize, usize, FnvBuilder> = (MERGE..MERGE * 2).map(|i| (i, i)).collect();
    b.iter(|| {
        let mut merged = first_map.clone();
        for (&k, &v) in &second_map {
            merged.insert(k, v);
        }
        ::test::black_box(merged);
    });
}
```

_91 is stdlib and _ad is patched (the end capacity in both cases is the same)

```
running 2 tests
test _91::merge_dos              ... bench:  47,311,843 ns/iter (+/- 2,040,302)
test _ad::merge_dos              ... bench:     599,099 ns/iter (+/- 83,270)
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes Marks issues that should be documented in the release notes of the next release. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.