-
Notifications
You must be signed in to change notification settings - Fork 152
Allowing seeds for randomness #138
Comments
Both approaches have the downside of requiring each module that uses it duplicate the code to do with managing the rng. I can't see any way out of that without some sort of factory for everything that uses randomness, along with lifetimes on the generated objects that would definitely be worse. Of the two approaches, the second seems much easier for the end user, since it allows for a default of However, unless I'm misunderstanding box ownership (I might be, it always gets me a bit confused), the second option still results in ownership of the rng. And of course, it also results in dynamic dispatch, which is slower. So, I think the choice boils down to either slightly more complexity for the user, or slightly longer runtimes. I implemented a trial run on LDA, and I've discovered it causes some strange issues with mutability. That is, any method which uses randomness must be mutable, since all of the methods for generating randomness are mutable. I'm not sure this makes sense, so I've gone around it using pub trait Randomizable {
fn set_rng(&mut self, rng: Box<Rng>);
fn rng<'a>(&'a self) -> RefMut<'a, Box<Rng>>;
}
pub struct LDAFitter {
iterations: usize,
topic_count: usize,
alpha: f64,
beta: f64,
rng: RefCell<Box<Rng>>
}
impl Default for LDAFitter {
fn default() -> LDAFitter {
LDAFitter {
iterations: 30,
topic_count: 10,
alpha: 0.1,
beta: 0.1,
rng: RefCell::new(Box::new(thread_rng()))
}
}
}
impl Randomizable for LDAFitter {
fn set_rng(&mut self, rng: Box<Rng>) {
self.rng = RefCell::new(rng);
}
fn rng<'a>(&'a self) -> RefMut<'a, Box<Rng>> {
self.rng.borrow_mut()
}
} So, this would have to be done for every module that depended on it. Alternately we could get around the need to use |
We frequently use randomness throughout rusty-machine and in all cases default to the
rand::thread_rng
(except in the newShuffler
added by #135). It is very valuable for users to be able to set a seed to ensure that their models behave the same way on different runs.It would be nice if we could capture this with minimal effect on the current API. It is probably a requirement that the models (and other components) own their own random number generators. We would likely need to add a new generic type parameter on the models:
The alternative is to use a Box for this field and provide a trait which allows the user to modify the generator for the model:
This approach will keep the API a little cleaner. We can also control access to shared code requiring randomness by using the
Randomizable
trait. For example we could wrap some of therand_utils
functions to work on an object which implementsRandomizable
. I'd still like therand_utils
functions to be accessed in their current state though.Thoughts on this are very welcome. I suspect we will need to implement something to get a good idea of the benefits/drawbacks of any approach.
The text was updated successfully, but these errors were encountered: