Allowing seeds for randomness #138

AtheMathmo · 2016-09-22T15:40:52Z

We frequently use randomness throughout rusty-machine and in all cases default to the rand::thread_rng (except in the new Shuffler added by #135). It is very valuable for users to be able to set a seed to ensure that their models behave the same way on different runs.

It would be nice if we could capture this with minimal effect on the current API. It is probably a requirement that the models (and other components) own their own random number generators. We would likely need to add a new generic type parameter on the models:

pub struct MLModel<T1, T2, ..., Tn, R> where R: Rng {
    // All the previous fields
   rng: Rng,
}

The alternative is to use a Box for this field and provide a trait which allows the user to modify the generator for the model:

pub trait Randomizable {
    fn set_rng(&miut self, rng: Rng);

    fn rng(&self) -> Box<Rng>;
}

This approach will keep the API a little cleaner. We can also control access to shared code requiring randomness by using the Randomizable trait. For example we could wrap some of the rand_utils functions to work on an object which implements Randomizable. I'd still like the rand_utils functions to be accessed in their current state though.

Thoughts on this are very welcome. I suspect we will need to implement something to get a good idea of the benefits/drawbacks of any approach.

The text was updated successfully, but these errors were encountered:

dyule · 2017-02-21T19:33:23Z

Both approaches have the downside of requiring each module that uses it duplicate the code to do with managing the rng. I can't see any way out of that without some sort of factory for everything that uses randomness, along with lifetimes on the generated objects that would definitely be worse.

Of the two approaches, the second seems much easier for the end user, since it allows for a default of thread_rng() if they don't want to bother setting their own.

However, unless I'm misunderstanding box ownership (I might be, it always gets me a bit confused), the second option still results in ownership of the rng. And of course, it also results in dynamic dispatch, which is slower.

So, I think the choice boils down to either slightly more complexity for the user, or slightly longer runtimes.

I implemented a trial run on LDA, and I've discovered it causes some strange issues with mutability. That is, any method which uses randomness must be mutable, since all of the methods for generating randomness are mutable. I'm not sure this makes sense, so I've gone around it using RefCells, but I don't know how great that is either. Anyways, the relevant pieces are here:

pub trait Randomizable {
    fn set_rng(&mut self, rng: Box<Rng>);

    fn rng<'a>(&'a self) -> RefMut<'a, Box<Rng>>;
}

pub struct LDAFitter {
    iterations: usize,
    topic_count: usize,
    alpha: f64,
    beta: f64,
    rng: RefCell<Box<Rng>>
}

impl Default for LDAFitter {
    fn default() -> LDAFitter {
        LDAFitter {
            iterations: 30,
            topic_count: 10,
            alpha: 0.1,
            beta: 0.1,
            rng: RefCell::new(Box::new(thread_rng()))
        }
    }
}

impl Randomizable for LDAFitter {
    fn set_rng(&mut self, rng: Box<Rng>) {
        self.rng = RefCell::new(rng);
    }

    fn rng<'a>(&'a self) -> RefMut<'a, Box<Rng>> {
        self.rng.borrow_mut()
    }
}

So, this would have to be done for every module that depended on it. Alternately we could get around the need to use RefCells if trait definitions were changed to make fit mutable.

AtheMathmo mentioned this issue Oct 13, 2016

Adding an information criterion to the gmm module for dynamic K optimisation #151

Closed

AtheMathmo mentioned this issue Feb 21, 2017

Latent Dirichlet Allocation #172

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allowing seeds for randomness #138

Allowing seeds for randomness #138

AtheMathmo commented Sep 22, 2016

dyule commented Feb 21, 2017 •

edited

Loading

Allowing seeds for randomness #138

Allowing seeds for randomness #138

Comments

AtheMathmo commented Sep 22, 2016

dyule commented Feb 21, 2017 • edited Loading

dyule commented Feb 21, 2017 •

edited

Loading