Add BlockRng abstraction #281

Merged
merged 11 commits into from
Mar 16, 2018

Contributor

@pitdicker pitdicker commented Mar 3, 2018

The idea here is that many cryptographic RNGs generate values in blocks of 16 or more values. Examples that are currently in rand are ChaCha, HC-128 and ISAAC. Performance of these RNGs can be improved considerably by using optimized methods to read the values from the results buffer, and these functions are not trivial. HC-128 currently contains the fastest variants, but it would be nice to make them easily available for all RNGs that generate values in blocks.

Another interesting case is ReseedingRng. With every call to next_u32, next_u64 etc. it has to check whether it is time to reseed. But this does not make sense if the values are already generated and in the results buffer of such a 'block' RNG. Checking whether to reseed only when it is time to generate a new block can improve performance by ~40%.

So this PR adds a trait BlockRngCore, which contains the core algorithm of such a block RNG, and a wrapper BlockRng that implements the RngCore methods optimally. I have implemented the trait for ChaCha, HC-128 and ReseedingRng. It is not hard to add ISAAC, but I think the PR is already challenging enough to review as it is 😄.
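
A minimal sketch of the abstraction described above, not the PR's actual code. It assumes a core trait with a single `generate` method over a `u32` buffer and shows only `next_u32`. `CounterCore` is a hypothetical toy core for illustration (it is not a real RNG), and the field names and the `new` constructor are assumptions as well.

```rust
/// Core algorithm of a block RNG: fills a whole buffer of results at once.
/// (Sketch only; the trait in the PR may differ in its details.)
pub trait BlockRngCore {
    /// Buffer type holding one block of `u32` results.
    type Results: AsRef<[u32]> + Default;

    /// Generate a fresh block of results.
    fn generate(&mut self, results: &mut Self::Results);
}

/// Wrapper that hands out values from the buffer and refills it when empty.
pub struct BlockRng<R: BlockRngCore> {
    pub core: R,
    pub results: R::Results,
    pub index: usize,
}

impl<R: BlockRngCore> BlockRng<R> {
    pub fn new(core: R) -> Self {
        let results = R::Results::default();
        // Start with an "exhausted" buffer so the first call triggers `generate`.
        let index = results.as_ref().len();
        BlockRng { core, results, index }
    }

    pub fn next_u32(&mut self) -> u32 {
        if self.index >= self.results.as_ref().len() {
            self.core.generate(&mut self.results);
            self.index = 0;
        }
        let value = self.results.as_ref()[self.index];
        self.index += 1;
        value
    }
}

/// Hypothetical toy core for illustration: a counter, NOT a real RNG.
pub struct CounterCore(pub u32);

impl BlockRngCore for CounterCore {
    type Results = [u32; 16];

    fn generate(&mut self, results: &mut Self::Results) {
        for r in results.iter_mut() {
            *r = self.0;
            self.0 = self.0.wrapping_add(1);
        }
    }
}
```

With `BlockRng::new(CounterCore(0))`, twenty calls to `next_u32()` yield 0 through 19, and `generate` runs only on the first and the seventeenth call.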

A little fine-tuning still has to happen. Some benchmarks alternate between fast and slow (reseeding_hc128_u32 for example between 1150 MB/s and 1380 MB/s). Some probably just need some smarter inlining, like StdRng. And I have seen reseeding_hc128_u64 reach 1800 MB/s, although at the moment it stays at ~1535 MB/s. But I mostly wanted to get the basis out the door.

And there is the open question of how to make the BlockRng wrapper not only support 32-bit generators but also 64-bit using specialisation (when available). Although ISAAC-64 is the only 64-bit algorithm I know of.

Benchmarks (using cargo benchcmp):

 name                             before ns/iter        after ns/iter         diff ns/iter   diff %  speedup 
 gen_bytes_chacha12               1,283,701 (797 MB/s)  1,254,126 (816 MB/s)       -29,575   -2.30%   x 1.02 
 gen_bytes_chacha20               1,954,827 (523 MB/s)  1,930,305 (530 MB/s)       -24,522   -1.25%   x 1.01 
 gen_bytes_chacha8                947,070 (1081 MB/s)   915,006 (1119 MB/s)        -32,064   -3.39%   x 1.04 
 gen_bytes_hc128                  452,401 (2263 MB/s)   430,523 (2378 MB/s)        -21,878   -4.84%   x 1.05 
 gen_bytes_std                    450,843 (2271 MB/s)   431,065 (2375 MB/s)        -19,778   -4.39%   x 1.05 
 gen_u32_chacha12                 6,422 (622 MB/s)      6,507 (614 MB/s)                85    1.32%   x 0.99 
 gen_u32_chacha20                 8,396 (476 MB/s)      9,196 (434 MB/s)               800    9.53%   x 0.91 
 gen_u32_chacha8                  4,442 (900 MB/s)      4,528 (883 MB/s)                86    1.94%   x 0.98 
 gen_u32_hc128                    2,890 (1384 MB/s)     2,869 (1394 MB/s)              -21   -0.73%   x 1.01 
 gen_u32_std                      3,582 (1116 MB/s)     3,768 (1061 MB/s)              186    5.19%   x 0.95 
 gen_u64_chacha12                 11,246 (711 MB/s)     10,279 (778 MB/s)             -967   -8.60%   x 1.09 
 gen_u64_chacha20                 16,544 (483 MB/s)     15,520 (515 MB/s)           -1,024   -6.19%   x 1.07 
 gen_u64_chacha8                  8,565 (934 MB/s)      7,629 (1048 MB/s)             -936  -10.93%   x 1.12 
 gen_u64_hc128                    4,418 (1810 MB/s)     4,277 (1870 MB/s)             -141   -3.19%   x 1.03 
 gen_u64_std                      5,114 (1564 MB/s)     5,210 (1535 MB/s)               96    1.88%   x 0.98 
 reseeding_hc128_bytes            450,510 (2272 MB/s)   431,435 (2373 MB/s)        -19,075   -4.23%   x 1.04 
 reseeding_hc128_u32              4,570 (875 MB/s)      3,486 (1147 MB/s)           -1,084  -23.72%   x 1.31 
 reseeding_hc128_u64              5,763 (1388 MB/s)     5,098 (1569 MB/s)             -665  -11.54%   x 1.13 
 thread_rng_u32                   4,847 (825 MB/s)      3,321 (1204 MB/s)           -1,526  -31.48%   x 1.46 
 thread_rng_u64                   6,935 (1153 MB/s)     6,377 (1254 MB/s)             -558   -8.05%   x 1.09

For a bit of history: this is my fifth attempt... The first attempt was 17 December (dhardy#76 (comment)). All kinds of cleanup had to happen before this PR was possible, and the master branch kept changing so much that this had to start from scratch 4 times.

I have tried to make as few functional changes as possible. RngCore gets its methods from Hc128Rng. ChaChaRng and Hc128Rng should be just a refactor, with no real changes.
ReseedingRng did need significant changes to its logic, but all the pieces of the puzzle combine to give the same logic as #255. Only #255 (comment) had a performance impact of 15%, and I did not re-implement it in the end.

@pitdicker
Contributor Author

Added a commit that modifies some inline hints, which takes care of the lower performance in StdRng and partly of ThreadRng and the reseeding_* benchmarks. Only reseeding_hc128_u32 left to figure out...

Member

@dhardy dhardy left a comment

I am wondering if we should try to support #[derive(RngCore, SeedableRng)] somehow, though that can be done later of course.

Mostly looks good, but I'm a little concerned with the error handling of generate(). I learned before that the let _ = ... pattern can hide problems too easily.

I still want to go through the reseeding stuff more closely, though not concerned.

src/impls.rs Outdated
///
/// `next_u32` simply indexes the array. `next_u64` tries to read two `u32`
/// values at a time if possible, and handles edge cases like when only one
/// value is left. `try_fill_bytes` is optimized to even attempt to use the
Member

Documentation indicates this is specific to u32 internal values, so why not rename it to Block32Rng or something?

Third sentence would be better put something like:

try_fill_bytes is optimized to use the BlockRngCore implementation directly when generating fresh values.

Contributor Author

I still hope there is some way to use specialization in the future to use BlockRng also with u64 values. No problem to rename it, do you think I should?

Member

I don't get why you would want to do that though — the next_u32 and next_u64 impls would have to be rewritten, so it's only the try_fill_bytes bit that gets reused — and some abstraction could allow that. So you might as well just have Block32Rng and Block64Rng?

Contributor Author

The idea was that you can then use both 64- and 32-bit RNGs with wrappers like ReseedingRng (just like now). But as I haven't got it working, I am not sure. Maybe it is best to keep the name this one has, and if specialisation doesn't work out, name only the other one Block64Rng. A 32-bit variant is by far the most common anyway.

Member

I think you still could with separate Block64Rng, though there might be a lot of code redundancy. But your plan sounds good 👍

src/impls.rs Outdated
self.index += 2;
// Read an u64 from the current index
if cfg!(any(target_arch = "x86", target_arch = "x86_64")) {
unsafe { *(&self.results.as_ref()[index] as *const u32 as *const u64) }
Member

Maybe add a comment that this assumes both LE and support for unaligned reads
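
The two assumptions named in this comment can be made explicit in code. The sketch below is not taken from the PR: `read_u64` and `read_u64_fast` are hypothetical helper names. The first is a portable version; the second takes the pointer-cast path only on little-endian targets and uses `std::ptr::read_unaligned` so that the read does not depend on the pointer being 8-byte aligned.

```rust
/// Portable version: combine two consecutive `u32` results into a `u64`.
/// The first word supplies the low half, which is what a little-endian
/// memory read of the same 8 bytes would produce.
pub fn read_u64(results: &[u32], index: usize) -> u64 {
    let lo = u64::from(results[index]);
    let hi = u64::from(results[index + 1]);
    (hi << 32) | lo
}

/// Fast path sketch. Assumption 1: little-endian byte order (checked with
/// `cfg!`). Assumption 2: unaligned reads are acceptable, which
/// `read_unaligned` expresses explicitly instead of dereferencing a cast
/// pointer.
pub fn read_u64_fast(results: &[u32], index: usize) -> u64 {
    // Slicing performs the bounds check for both words.
    let pair = &results[index..index + 2];
    if cfg!(target_endian = "little") {
        // Safe because `pair` covers exactly 8 valid, initialized bytes.
        unsafe { std::ptr::read_unaligned(pair.as_ptr() as *const u64) }
    } else {
        (u64::from(pair[1]) << 32) | u64::from(pair[0])
    }
}
```

Both functions return the same value on every target; only the code path differs.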

src/impls.rs Outdated
let _ = self.core.generate(&mut self.results);
self.index = 2;
let x = self.results.as_ref()[0] as u64;
let y = self.results.as_ref()[1] as u64;
Member

You didn't want to use the x86 optimisation here? You could probably put that inside a local read_u64 function/closure to save repetition.

Contributor Author

Didn't even think about doing that. Do you think it is worth the extra complexity? Not that things are easy now...

Member

Didn't seem like it would add much complexity to me.

Contributor Author

Oops, I didn't look close enough. We already do the same a few lines up 😄 (now near a computer again).

src/impls.rs Outdated

let len_remainder =
(dest.len() - filled) % (self.results.as_ref().len() * 4);
let len_direct = dest.len() - len_remainder;
Member

Rename len_direct → end_direct? It's not the length (because it includes filled_u8).
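
To illustrate why the value is an end offset rather than a length, here is a small sketch of the same arithmetic as a free function. The function name `split_dest` and its parameters are assumptions for illustration; only the two formulas come from the snippet under review.

```rust
/// Given the destination length, the number of bytes already filled from the
/// old buffer, and the block size in `u32` words, return
/// `(end_direct, len_remainder)`:
/// - `dest[filled..end_direct]` is a whole number of blocks and can be
///   generated directly into the output slice;
/// - the last `len_remainder` bytes must come from a freshly generated buffer.
pub fn split_dest(dest_len: usize, filled: usize, block_words: usize) -> (usize, usize) {
    let block_bytes = block_words * 4;
    let len_remainder = (dest_len - filled) % block_bytes;
    // This is an offset into `dest`, not a length: it still includes `filled`.
    let end_direct = dest_len - len_remainder;
    (end_direct, len_remainder)
}
```

For a 150-byte destination with 10 bytes already filled and 16-word (64-byte) blocks, this gives `end_direct = 138` and `len_remainder = 12`, so exactly two blocks (bytes 10 to 138) are written directly.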

src/impls.rs Outdated

// As an optimization we try to write directly into the output buffer.
// This is only enabled for platforms where unaligned writes are known to
// be safe and fast.
Member

on LE platforms where — as above, there are two requirements I believe

src/impls.rs Outdated
(y << 32) | x
} else {
let x = self.results.as_ref()[len-1] as u64;
let _ = self.core.generate(&mut self.results);
Member

unwrap instead of let _ = please

src/impls.rs Outdated
}

fn fill_bytes(&mut self, dest: &mut [u8]) {
let _ = self.try_fill_bytes(dest);
Member

unwrap instead of let _ = please

x.rotate_right(17) ^ x.rotate_right(19) ^ (x >> 10)
// Cannot be derived because [u32; 1024] does not implement Clone in
// Rust < 1.21.0 (since https://github.com/rust-lang/rust/pull/43690)
impl Clone for Hc128Core {
Member

We now require Rust ≥ 1.22.0 anyway

@@ -39,73 +40,125 @@ use {RngCore, SeedableRng, Error, ErrorKind};
/// `ReseedingRng` with the ISAAC RNG. That algorithm, although apparently
/// strong and with no known attack, does not come with any proof of security
/// and does not meet the current standards for a cryptographically secure
-/// PRNG. By reseeding it frequently (every 32 MiB) it seems safe to assume
+/// PRNG. By reseeding it frequently (every 32 kiB) it seems safe to assume
Member

👍, given that this is discussing the old StdRng

/// It is usually best to use the infallible methods `next_u32`, `next_u64` and
/// `fill_bytes` because they can make use of this error handling strategy.
/// Use `try_fill_bytes` and possibly `try_reseed` if you want to handle
/// reseeding errors explicitly.
Member

Why delete this doc?

Contributor Author

Because you changed try_reseed to have the same error handling, so no reason to prefer them.

@pitdicker
Contributor Author

The idea with let _ = ... and continuing after an error in try_fill_bytes is mostly to make the error handling of ReseedingRng work. PRNGs are supposed not to suddenly stop working, as I sort-of wrote in the comment above the impl. If reseeding failed somewhere in the loop from try_fill_bytes the error will be preserved and returned when all bytes are filled. This also makes sure that fill_bytes ends up with a completely filled slice, even when reseeding failed somewhere in-between. Does this make some sense?

src/reseeding.rs Outdated
// we should return the error from reseeding.
// The only way to get this behaviour without destroying performance
// was to split part of the function out into a
// `reseed_and_generate` method.
Member

This was about branch prediction then, presumably? I can't find anything Rust-specific, and the answers here contradict each other... so maybe this is the best option (and more portable than relying on which branch is most likely to be predicted).

Contributor Author

I don't think it had to do with branch prediction, but with the size of the function and LLVM trying to combine both generate() functions and both returns, moving things around and adding checks outside the single branch we have now.

Member

@dhardy dhardy left a comment

Sorry, I'm still not happy with the error handling (see suggestions).

src/reseeding.rs Outdated
/// through some combination of retrying and delaying reseeding until later.
/// It will also report the error with `ErrorKind::Transient` with the
/// original error as cause.
fn auto_reseed(&mut self) -> Result<(), Error> {
Member

I don't really like the changes to the name and doc. The name auto_reseed doesn't really imply anything more than reseed (since obviously it's talking about itself).

src/reseeding.rs Outdated
// Behaviour is identical to `try_reseed`; we just squelch the error.
let _res = self.try_reseed();
fn reseed(&mut self) -> Result<(), Error> {
R::from_rng(&mut self.reseeder).map(|result| self.core = result)
}
Member

... this is called by ReseedingRng::reseed (though it probably shouldn't be) and by auto_reseed. Just inline it; we don't need the extra function. Then you can rename auto_reseed → reseed.

Contributor Author

Oh, I forgot about this function in the first comment, and also with the recent work around ReseedingRng.

Before, ReseedingRng had a method users could manually call to force the RNG to reseed. Now the reseed method has gotten much smarter (thanks to you): it delays reseeding and wraps the error. That is very useful for the situations where we reseed 'because it is time'. But I think it is useful to also expose the basic functionality again.

That is why the extra function in ReseedingRng::reseed (and Reseeder::reseed)...

Member

All the "smart" logic does is potentially delay automatic reseeding and change the error kind. I don't see how that is a problem here. It still "forces" reseeding just as much as the old logic; it just acts a bit differently if the attempt fails.


/// Reseed the internal PRNG.
pub fn reseed(&mut self) -> Result<(), Error> {
self.0.core.reseed()
Member

This doesn't adjust bytes_until_reseed; it should probably call auto_reseed instead...

Contributor Author

This is done on purpose, see the previous comment

Member

But if reseeding is successful, then it makes sense to reset bytes_until_reseed IMO.

src/reseeding.rs Outdated
BlockRng {
core: Reseeder {
core: rng,
reseeder: reseeder,
Member

These names are confusing. "core" is used both for the core of the BlockRng and as the inner RNG in Reseeder. Reseeder is an internal type used to implement ReseedingRng; reseeder is the RNG used for reseeding.

  • core 1: no change
  • core 2: rename to inner or rng or inner_rng?
  • Reseeder: maybe ReseederBlockRng? Or leave as is
  • reseeder: possibly reseeding_rng, or leave as is but only if above name is changed

src/reseeding.rs Outdated
let res1 = self.auto_reseed();
self.bytes_until_reseed -= results.as_ref().len() as i64 * 4;
let res2 = self.core.generate(results);
if res2.is_err() { res2 } else { res1 }
Member

Since reseed_and_generate is only used within generate, I suggest ignoring any error from auto_reseed here, but passing on errors from generate — because if reseeding fails we can still output fresh random numbers, but if the latter fails (if that's even possible) then we have a much more serious problem.

This implies that if generate doesn't return a Result then this shouldn't either; I'm not sure which way it should go, but IMO generate should never return errors from reseeding because there's nothing useful to do with them and it just ends up with bad handling like I already commented on above.
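
One way to read this suggestion is sketched below. It is not the PR's code: `ToyCore`, the closure-based `entropy` source, the `failed_reseeds` counter and the retry-on-next-block policy are all assumptions made for illustration, and `ToyCore` is a plain LCG rather than a real PRNG. The sketch keeps the split into `generate` and `reseed_and_generate` from the reviewed snippet, checks the reseed counter only once per block, resets it after a successful reseed, and never lets a reseeding failure prevent output.

```rust
/// Hypothetical toy core (an LCG), standing in for a real block PRNG.
pub struct ToyCore {
    pub state: u32,
}

impl ToyCore {
    pub fn generate(&mut self, results: &mut [u32; 16]) {
        for r in results.iter_mut() {
            self.state = self.state.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
            *r = self.state;
        }
    }
}

/// Hypothetical reseeding wrapper around `ToyCore`.
pub struct Reseeder<F: FnMut() -> Result<u32, &'static str>> {
    pub inner: ToyCore,
    pub entropy: F, // fallible source of fresh seeds (assumption)
    pub threshold: i64,
    pub bytes_until_reseed: i64,
    pub failed_reseeds: u32,
}

impl<F: FnMut() -> Result<u32, &'static str>> Reseeder<F> {
    /// Reseed and, on success, reset the byte counter.
    pub fn reseed(&mut self) -> Result<(), &'static str> {
        let seed = (self.entropy)()?;
        self.inner = ToyCore { state: seed };
        self.bytes_until_reseed = self.threshold;
        Ok(())
    }

    /// Called once per block, so the reseed check is also once per block.
    pub fn generate(&mut self, results: &mut [u32; 16]) {
        if self.bytes_until_reseed <= 0 {
            return self.reseed_and_generate(results);
        }
        self.bytes_until_reseed -= results.len() as i64 * 4;
        self.inner.generate(results);
    }

    /// Cold path, kept out of line (the attribute is an assumption).
    #[inline(never)]
    fn reseed_and_generate(&mut self, results: &mut [u32; 16]) {
        // A failed reseed is recorded but not returned: the old state can
        // still produce output, and the counter stays <= 0, so the next
        // block retries.
        if self.reseed().is_err() {
            self.failed_reseeds += 1;
        }
        self.bytes_until_reseed -= results.len() as i64 * 4;
        self.inner.generate(results);
    }
}
```

In a real implementation the failure would more plausibly be logged than counted; the counter just makes the behaviour observable in a test.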

@pitdicker
Contributor Author

> Mostly looks good, but I'm a little concerned with the error handling of generate(). I learned before that the let _ = ... pattern can hide problems too easily.

> Sorry, I'm still not happy with the error handling (see suggestions).

I will try to write a little better about my reasoning, but otherwise time for a discussion 😄.

In all but this last variant of the BlockRngCore trait, generate did not return a result. My reasoning was that it is only intended as an implementation detail for PRNGs, and as they are just pure algorithms they should never need to return errors. ReseedingRng would provide all the next_u32 etc. methods, duplicating the code from the BlockRng wrapper.

Then, it seemed cleaner to reuse the BlockRng wrapper inside ReseedingRng. But then there would have to be a way to thread errors through. Should the BlockRngCore trait get an extra try_generate method? Did not seem very useful by itself.

With your PR #255 things became easier, with both reseed and try_reseed implementing the same delaying logic, and reseed ignoring the result from try_reseed.

The same thing was possible here, but then the implementations in BlockRng should ignore the results of generate. This seems like a safe thing to do, because (1) PRNGs shouldn't return any errors (didn't design the trait to return errors in the first place), and (2) it is part of the implementation of a PRNG, so everything is under the author's control. Also BlockRng is a useful building block, but using it is not required...

Also continuing to request bytes even after a PRNG returns an error seems reasonable to me, in try_fill_bytes. I imagine it can return an error because of a failed reseed, because it reached some kind of limit, or it 'noticed' some kind of interference (fork protection?). Nothing prevents the PRNG from continuing to generate a few more values. But it should preserve the error and return it together with a filled slice. That is better than a failed reseed causing a half-filled slice.

I prefer giving an error over never giving any indication of a failed reseed, also because of all the effort we put into that 😄.

But I agree that if we keep things like this, the documentation of BlockRng should change to make clear it is primarily intended for PRNGs. It will silently ignore errors, and only return the first error from regenerate in try_fill_bytes. It is not meant to be used with external RNGs that are expected to fail once in a while, and do not keep generating results.

@dhardy
Member

dhardy commented Mar 7, 2018

> Nothing prevents the PRNG from continuing to generate a few more values. But it should preserve the error and return it together with a filled slice. That is better than a failed reseed causing a half-filled slice.

But if the error originated in the PRNG's try_fill_bytes then your slice won't have been filled properly; i.e. your error handling is incorrect if the inner PRNG can fail. My suggestion is to either panic or return an error if the inner PRNG fails, but handle any reseeding errors internally (delay) and don't report (other than via log messages).

@dhardy
Member

dhardy commented Mar 7, 2018

@pitdicker
Contributor Author

> This is some of the changes I had in mind.

Very helpful 😄.

I have made most of the changes you recommended. But to be honest I still think the error handling story is not that bad as implemented now... Still it is all very theoretical, with PRNGs that should never fail, and reseeding which not only uses OsRng (which basically never fails), but also has a backup entropy source.

> But if the error originated in the PRNG's try_fill_bytes then your slice won't have been filled properly; i.e. your error handling is incorrect if the inner PRNG can fail.

You mean generate? Still I assume a PRNG should never fail, at least not that badly that they can no longer generate results.

@dhardy
Member

dhardy commented Mar 8, 2018

Still sounds to me like your error handling is doing theoretically the wrong thing with the excuse it doesn't matter because the PRNG will never fail. But we don't know what PRNG people might use, and in any case this is a poor excuse. Are there any drawbacks to the error-handling I implemented?

@rust-random rust-random deleted a comment from pitdicker Mar 8, 2018
@pitdicker
Contributor Author

We are looking differently at the error handling here, and I am not sure why. I don't see how relying on a PRNG to keep working is "theoretically the wrong thing". Is it because you have been bitten in the past by let _ = ...?

> Still sounds to me like your error handling is doing theoretically the wrong thing with the excuse it doesn't matter because the PRNG will never fail. But we don't know what PRNG people might use, and in any case this is a poor excuse.

I think that the fact that PRNGs can't fail is one of the foundations of APIs in rand. gen(), gen_range, the distributions, all assume the RNG just works. It is just a purely mathematical algorithm after all.

Or maybe you expect the BlockRng wrapper to be used in more cases than I do? For example in OsRng? I agree it may make the wrong choices for such a scenario. But can't we just say "don't do that, for such an RNG, implement RngCore yourself"?

> Are there any drawbacks to the error-handling I implemented?

It removes the main reason I added error handling to the BlockRngCore trait (only added it last week). If we go that route, I would prefer to remove the Result type and the error story completely from BlockRngCore.

@pitdicker
Contributor Author

I know I'm being difficult with the error handling. But this PR does not change or regress any scenario we have now. Only the BlockRng wrapper may not be optimal for hypothetical fallible PRNGs that can stop working. But that this is not supported is clearly documented.

I would like to move this PR forward. Basically there is one part left where we differ in opinion, and I would argue the most minor part of it. If we agree the abstraction is worth it, and there are no other concerns, can this be merged?

@dhardy
Member

dhardy commented Mar 13, 2018

Sorry @pitdicker I missed your comment from 4 days ago. Yes, I've been bitten by let _ = f(); swallowing errors that should have been handled in the past; part of the reason for this is that if the return type of f() is changed the result still gets ignored; one way around this is to use let _: T = f(); instead. But on the whole I don't like the pattern.
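
The difference between the two patterns can be shown with a small sketch. The functions `reseed` and `next_block` and the `ReseedError` type are hypothetical and exist only to illustrate the point.

```rust
#[derive(Debug)]
pub struct ReseedError;

/// Hypothetical fallible operation.
pub fn reseed() -> Result<(), ReseedError> {
    Err(ReseedError)
}

/// Hypothetical infallible operation.
pub fn next_block() -> u32 {
    7
}

pub fn demo() -> u32 {
    // Compiles no matter what `reseed` returns, so the error here is
    // dropped silently, and would be even if the return type changed.
    let _ = reseed();

    // With a type annotation the ignore is tied to today's return type:
    // if `next_block` were changed to return `Result<u32, _>`, this line
    // would stop compiling and force a decision about the error.
    let _: u32 = next_block();

    next_block()
}
```

The annotated form does not handle anything by itself; it only turns a later signature change into a compile error instead of a silently ignored result.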

We've constructed a boundary between things that can be expected to fail and things that can't, and the way this is handled is that things not expected to fail (like calling gen() on a PRNG) would panic if they unexpectedly fail — I consider a panic acceptable in this circumstance. But your code can silently ignore unexpected errors — I'm less happy about that. I agree in principle your handling should work fine for any envisaged usage; I just don't like potentially allowing ignored errors.

I'll have another look over this later and see if I have any other ideas if you like. I'd also be happy removing the Result return type completely and just panicking I think.

@dhardy dhardy mentioned this pull request Mar 13, 2018
@dhardy dhardy added X-enhancement P-high Priority: high D-review Do: needs review and removed P-medium labels Mar 14, 2018
@dhardy
Member

dhardy commented Mar 14, 2018

It is assumed that the generate() function never returns serious errors from the inner PRNG. The only other error source is auto_reseed, which only returns Transient errors, which can thus be ignored. So maybe the best option is to change generate to not return errors? Note that the only errors the current code could return are Transient reseeding errors through try_fill_bytes, so by removing the error handling code we lose almost no functionality and simplify the code, as well as resolving our disagreement.

> Basically there is one part left where we differ in opinion, and I would argue the most minor part of it.

Sorry, but details are important.

@dhardy
Member

dhardy commented Mar 14, 2018

BTW do you want to tack this on to this PR?

@pitdicker
Contributor Author

> BTW do you want to tack this on to this PR?

That was actually part of the PR, but I couldn't easily get it to compile. I will do a second try.

@pitdicker
Contributor Author

pitdicker commented Mar 14, 2018

> Sorry, but details are important.

Yes they are. I was only thinking of the time and (as it seems to me) endless rebases this took, and now I have to rebase again 🙁. Edit: no problem though.

I am slowly leaning towards removing the error story from generate. Are you really sure we don't want to return some error when reseeding fails? Although now of course we have (optional) logging.

@dhardy
Member

dhardy commented Mar 14, 2018

You can just merge if you prefer... hopefully the new rand-core crate doesn't cause too many problems. Yes, it's a pain dealing with many branches (I'm currently rebasing experimental again... I lose history but merges just weren't working any more with all the diverged history).

It's true, we lose all reseeding errors that way. I'm not too fussed really; if thread_rng returns an error most people won't want to deal with it. Okay, this means our whole error-handling story is getting derailed again 🤕

I think the "correct" way to do error handling like you've been trying to do would be to check the error kind, drop the error if it's Transient and panic otherwise. But I'm not sure if it's worth doing that — it might be slow?
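
A sketch of that "check the kind" approach follows. The `ErrorKind` and `Error` types here are simplified stand-ins defined for the example (they only borrow the `Transient` name from the discussion), and `ignore_transient` is a hypothetical helper name.

```rust
/// Simplified stand-in for an RNG error kind; not the crate's real type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ErrorKind {
    /// Temporary failure; retrying later may succeed.
    Transient,
    /// Permanent failure of the source.
    Unavailable,
}

#[derive(Debug)]
pub struct Error {
    pub kind: ErrorKind,
    pub msg: &'static str,
}

/// Drop transient errors, panic on anything else.
pub fn ignore_transient(res: Result<(), Error>) {
    if let Err(e) = res {
        if e.kind != ErrorKind::Transient {
            panic!("unrecoverable RNG error: {:?}", e);
        }
        // Transient: intentionally ignored here (a real implementation
        // would probably log it).
    }
}
```

On the success path this adds a single branch on the `Result`; whether that is measurable in the hot loop would have to be benchmarked.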

@pitdicker pitdicker force-pushed the blockrng_take5 branch 2 times, most recently from f99f9cb to f684ed6 Compare March 14, 2018 17:58
@pitdicker
Contributor Author

I have removed all traces of error handling from BlockRngCore (I think), and added the CryptoRng implementation to ReseedingRng. Let me know what you think.

Member

@dhardy dhardy left a comment

BTW I'm merging this with master for you, plus making a few tweaks. I'll let you see before committing to master.

value
}

#[inline(always)]
Member

Are you sure these functions should always be inlined? They're relatively large.

@burdges
Contributor

burdges commented Mar 15, 2018

I've several questions:

  1. I'd think an associated type Item makes more sense than a type parameter T, no? It'll simplify supporting Item=u64 via specialization or whatever too, right?
  2. Why BlockRngCore: Sized? It'll eliminate one reason if you make it the last element in BlockRng. I doubt this causes alignment issues since index: usize but if so I'd hope the compiler would optimize a Sized one to live before the index.
  3. An associated type Error causes no trouble given this remains so low level, right?
  4. Is there any reason for the pub bits?
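pub trait BlockRngCore {
    // Element type of the results buffer (e.g. u32 or u64).
    type Item;
    type Results: AsRef<[Self::Item]> + Default;
    type Error;
    fn generate(&mut self) -> Result<Self::Results, Self::Error>;
}
pub struct BlockRng<R: BlockRngCore<Item=u32>> {
    index: usize,
    results: R::Results,
    // Placed last so that `R` could in principle be unsized.
    core: R,
}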
  5. Are there any "block PRNGs" that utilize their output buffer? I suppose this makes cycling length unpredictable, which sounds undesirable. If they exist, can this system implement one? I suppose yes but Results: AsMut<[Item]>+Default with Results::default().as_ref().len() > 0 gets tricky, but doable. If we combine enough recent unmerged RFCs then we've a fairly clean generic version:
struct ArrayDefaultableRef<T: Default; const LEN: usize>(&[T; LEN])
impl<T: Default; const LEN: usize> Default for ArrayDefaultableRef<T; LEN> {
    fn default() -> ArrayDefaultableRef {
        static default_data : [T; LEN] = [T::default(); LEN];
        ArrayDefaultableRef(&default_data)
    }
}

or maybe

#[derive(Default)]
struct ArrayDefaultableRef<T: Default; const LEN: usize>(Option<&[T; LEN]>)
impl<T: Default; const LEN: usize> AsRef<[T]> for ArrayDefaultableRef<T; LEN> {
    fn as_ref(&self) -> ArrayDefaultableRef {
        static default_data : [T; LEN] = [T::default(); LEN];
        self.0.as_ref().unwrap_or(&default_data)
    }
}

And non-generic solutions work right now, of course.

@dhardy
Member

dhardy commented Mar 15, 2018

Thanks for reviewing @burdges

  1. I'll give the associated type thing a try.
  2. I'll try that too (unsized core trait).
  3. I guess you reviewed an older version because the latest has fn generate(&mut self, results: &mut Self::Results);, i.e. it never returns an error. This may not be ideal but seems a reasonable option; see above comments.
  4. Which pub bits, the BlockRng fields? Yes, e.g. ChaChaRng implements functions in the wrapper which simply call e.g. self.0.core.set_rounds(rounds) and reset the index directly. We could use an accessor fn plus fn reset_index(&mut self) instead, I suppose, but I don't see much point.
  5. The current impl of fill_bytes can make generate target the output slice directly to save a copy; this would be impossible if we allowed the PRNG to use its own buffer. I don't know if there are PRNGs which use the buffer to generate the next results, but they often use both an "advance state" function and a "hash output" function; for crypto-rngs in particular it is important that output results cannot be used to predict the internal state or future output. Since this buffering is mostly used by crypto-rngs, the answer may be no.
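
A rough sketch of the shape described in points 3 and 4, with generate writing into a caller-supplied buffer and the wrapper fields left public. The trait here is simplified to u32 items, and `CounterCore` is a toy, non-cryptographic core invented only to exercise the wrapper:

```rust
/// Core of a block RNG: fills a whole results block per call and never fails.
pub trait BlockRngCore {
    type Results: AsRef<[u32]> + Default;
    fn generate(&mut self, results: &mut Self::Results);
}

/// Toy core (hypothetical, NOT a real PRNG): emits consecutive integers.
pub struct CounterCore {
    pub counter: u32,
}

impl BlockRngCore for CounterCore {
    type Results = [u32; 4];

    fn generate(&mut self, results: &mut Self::Results) {
        for word in results.iter_mut() {
            *word = self.counter;
            self.counter = self.counter.wrapping_add(1);
        }
    }
}

/// Wrapper that hands out buffered words one at a time.
pub struct BlockRng<R: BlockRngCore> {
    pub index: usize,
    pub results: R::Results,
    pub core: R,
}

impl<R: BlockRngCore> BlockRng<R> {
    pub fn new(core: R) -> Self {
        let results = R::Results::default();
        // Start with the index at the end so the first call generates a block.
        let index = results.as_ref().len();
        BlockRng { index, results, core }
    }

    pub fn next_u32(&mut self) -> u32 {
        if self.index >= self.results.as_ref().len() {
            // Only here, once per block, would a reseeding check be needed.
            self.core.generate(&mut self.results);
            self.index = 0;
        }
        let value = self.results.as_ref()[self.index];
        self.index += 1;
        value
    }
}
```

With public fields, a wrapper type can adjust `core` and then set `index` to the buffer length to force a fresh block on the next call.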

@dhardy dhardy added B-API Breakage: API T-RNG and removed D-review Do: needs review P-high Priority: high labels Mar 15, 2018
@burdges
Contributor

burdges commented Mar 15, 2018

  3. I suggested returning an associated error type from generate if the stream is exhausted. In practice, I think panicking sounds fine for CSPRNGs, but folks might do non-cryptographic things that benefit from returning an error. I suggested the more idiomatic form because it should optimize to avoid any extra stack copy. In other words, I expect fn generate(&mut self) -> Result<Results,Error>; is always optimized to fn generate(&mut self, results: &mut Results) -> Result<(),Error>;, assuming Error is small. If Error is large, then it likely optimizes to fn generate(&mut self, results: &mut Results, error: &mut Error) -> Result<(),()>;. In either case, I'd imagine self.results = self.core.generate()?; is optimized to self.core.generate(&mut self.results)?; or similar. If these optimizations do not occur, then that should be filed as a rustc bug.

  4. Ok. Do we expect BlockRng to be used only by PRNG crates? If so, we have no reason to keep the fields private, so fine.

  5. Interesting. If we return Results as I proposed in 3, then afaik your default fill_bytes should still avoid the extra copy when Results = [Item; LEN]. It'll incur the extra copy for anyone using my ArrayDefaultableRef trick too, but that's fine. I suspect my ArrayDefaultableRef trick produces incorrect behavior with your version, however, so my suggestion to be idiomatic enables more options and prevents more possible bugs.
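
To make the return-by-value proposal concrete, here is a hedged sketch. The names `FallibleBlockCore`, `FiniteCore`, `Exhausted` and `try_next_u32` are hypothetical and exist only for this example; whether the compiler actually elides the copy in `self.results = self.core.generate()?` is not something the sketch demonstrates:

```rust
/// Variant of the core trait where generate returns the block by value
/// and may fail with an associated error type.
pub trait FallibleBlockCore {
    type Results: AsRef<[u32]> + Default;
    type Error;
    fn generate(&mut self) -> Result<Self::Results, Self::Error>;
}

/// Error returned once the toy stream below runs out of blocks.
#[derive(Debug, PartialEq)]
pub struct Exhausted;

/// Toy core (NOT a real PRNG) that can only produce a limited number of blocks.
pub struct FiniteCore {
    pub blocks_left: u32,
    pub counter: u32,
}

impl FallibleBlockCore for FiniteCore {
    type Results = [u32; 4];
    type Error = Exhausted;

    fn generate(&mut self) -> Result<[u32; 4], Exhausted> {
        if self.blocks_left == 0 {
            return Err(Exhausted);
        }
        self.blocks_left -= 1;
        let mut block = [0u32; 4];
        for word in block.iter_mut() {
            *word = self.counter;
            self.counter = self.counter.wrapping_add(1);
        }
        Ok(block)
    }
}

pub struct FallibleBlockRng<R: FallibleBlockCore> {
    pub index: usize,
    pub results: R::Results,
    pub core: R,
}

impl<R: FallibleBlockCore> FallibleBlockRng<R> {
    pub fn new(core: R) -> Self {
        let results = R::Results::default();
        let index = results.as_ref().len();
        FallibleBlockRng { index, results, core }
    }

    pub fn try_next_u32(&mut self) -> Result<u32, R::Error> {
        if self.index >= self.results.as_ref().len() {
            // On error the old buffer and index are left untouched.
            self.results = self.core.generate()?;
            self.index = 0;
        }
        let value = self.results.as_ref()[self.index];
        self.index += 1;
        Ok(value)
    }
}
```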

@dhardy
Member

dhardy commented Mar 16, 2018

  3. Yes, that optimisation is very important to Rust, but that's not why we do generate(&mut self, results: &mut Self::Results); the reason is point 5. As for generate(&mut self, results) returning Result<(), Error> with an associated error type, that might be a good idea, but since we've already been arguing over error handling for so long I'd rather get the existing version merged first, then do another PR on error handling.

  5. Sorry, I don't follow. Are you saying that if we do let results = generate()?; and then copy results to either the buffer self.results or part of the slice dest, the compiler should optimise that copy out?

@dhardy dhardy merged commit 08924a3 into rust-random:master Mar 16, 2018
@burdges
Contributor

burdges commented Mar 16, 2018

  5. I'm saying (a) self.core.generate(&mut self.results); prevents using BlockRng with any PRNG that must keep its own results buffer internally, while (b) doing self.results = self.core.generate(); should work with such PRNGs and should not incur any penalty in optimized builds with normal PRNGs that do not keep their own internal results buffer.

@dhardy
Member

dhardy commented Mar 16, 2018

5.a) I explained why crypto-rngs will never use the results buffer to generate the next output while non-crypto RNGs don't usually use a results buffer, so I don't think there's any need to support internal results buffers.
5.b) It sounds like there would still be an extra copy where the current fill_bytes code calls core.generate(&mut dest[a..b]).
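
As an illustration of the copy-avoiding pattern referred to in 5.b, a simplified sketch that works on u32 words rather than bytes. It assumes a core whose generate takes a plain slice; `SliceBlockCore`, `SliceCounterCore`, `BLOCK_WORDS` and `fill_words` are hypothetical names, and this is not the actual fill_bytes code from the PR:

```rust
/// Hypothetical block size, in u32 words.
pub const BLOCK_WORDS: usize = 4;

pub trait SliceBlockCore {
    /// Writes exactly one block; `out.len()` must equal `BLOCK_WORDS`.
    fn generate(&mut self, out: &mut [u32]);
}

/// Toy core (NOT a real PRNG): emits consecutive integers.
pub struct SliceCounterCore {
    pub counter: u32,
}

impl SliceBlockCore for SliceCounterCore {
    fn generate(&mut self, out: &mut [u32]) {
        assert_eq!(out.len(), BLOCK_WORDS);
        for word in out.iter_mut() {
            *word = self.counter;
            self.counter = self.counter.wrapping_add(1);
        }
    }
}

/// Fills `dest`, generating full blocks straight into the destination.
pub fn fill_words<C: SliceBlockCore>(core: &mut C, dest: &mut [u32]) {
    let mut chunks = dest.chunks_exact_mut(BLOCK_WORDS);
    for chunk in &mut chunks {
        // Full block: the core writes into `dest` with no intermediate buffer.
        core.generate(chunk);
    }
    let tail = chunks.into_remainder();
    if !tail.is_empty() {
        // Partial block: generate into a temporary and copy only what fits.
        let mut block = [0u32; BLOCK_WORDS];
        core.generate(&mut block);
        let n = tail.len();
        tail.copy_from_slice(&block[..n]);
    }
}
```

Only the final partial block goes through a temporary; a core that insisted on handing out its own internal buffer would need a copy for every block.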

@pitdicker pitdicker deleted the blockrng_take5 branch March 16, 2018 20:59
@burdges
Contributor

burdges commented Mar 16, 2018

We should imho file a rustc bug report if dest[a..b] = core.generate(); incurs an extra copy in release builds. In that case, it's not merely incurring an unnecessary copy, but wasting stack space that's possibly scarce on some embedded platforms, and maybe annoying for their growing interest in stack pinning, etc.

In the worst case, there are still many tricks one can play even if this incurs a copy, like maybe *(&mut dest[a..b]) = ... behaves differently, and dest[a..b] can even be made Sized using the arrayref crate. We could even examine the curve25519-dalek crate too because they've gone to great lengths to avoid unnecessary copies.

@burdges
Contributor

burdges commented Mar 16, 2018

In fact, I suppose dest[a..b] = core.generate(); might not even type check because we'd need Results: Sized but I think dest[a..b]: [T] is unsized. Instead, we might need

*array_mut_ref!(dest,a,LEN) = core.generate()?;

where Results = [Item; LEN], but this requires const generics.

I'd say do this your way for now but plan to change generate once const generics work right in 2019 or so.
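
For reference, a sketch of how that idea might be written now that const generics are stable, using the standard library's slice-to-array-reference conversion in place of the arrayref macro. `ArrayBlockCore`, `ArrayCounterCore` and `fill_block_at` are hypothetical names for this example only:

```rust
use std::convert::{Infallible, TryFrom};

/// Core that returns one fixed-size block by value.
pub trait ArrayBlockCore<const LEN: usize> {
    type Error;
    fn generate(&mut self) -> Result<[u32; LEN], Self::Error>;
}

/// Toy core (NOT a real PRNG): emits consecutive integers for any block length.
pub struct ArrayCounterCore {
    pub counter: u32,
}

impl<const LEN: usize> ArrayBlockCore<LEN> for ArrayCounterCore {
    type Error = Infallible;

    fn generate(&mut self) -> Result<[u32; LEN], Self::Error> {
        let mut block = [0u32; LEN];
        for word in block.iter_mut() {
            *word = self.counter;
            self.counter = self.counter.wrapping_add(1);
        }
        Ok(block)
    }
}

/// Writes one generated block into `dest[a..a + LEN]`.
/// Panics if that range is out of bounds.
pub fn fill_block_at<C, const LEN: usize>(
    core: &mut C,
    dest: &mut [u32],
    a: usize,
) -> Result<(), C::Error>
where
    C: ArrayBlockCore<LEN>,
{
    // View the unsized sub-slice as a sized `&mut [u32; LEN]` so that the
    // by-value result can be assigned to it.
    let window = <&mut [u32; LEN]>::try_from(&mut dest[a..a + LEN])
        .expect("sub-slice has exactly LEN elements");
    *window = core.generate()?;
    Ok(())
}
```

Whether the assignment through `window` compiles down to an in-place write is up to the optimizer; the sketch only shows that the expression type-checks once the destination is viewed as a sized array.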
