Comparing distributions #560
Comments
Well, not really sure what your use-case is. Also bear in mind that distributions could change significantly; see #290.
Also re: #290, I noticed that distributions in stat-rs do implement `PartialEq`.
This is a nice idea in theory, but I don't see it as something that is going to work. You would need to have exactly the same RNG algorithm, with the same seed (and initialization), and the same algorithms to convert the raw RNG output to sample values. As you and @dhardy say, not all input values are kept around; they get converted to internal parameters. Keeping them around is something not many users are going to need. And because of the fun that floating point is, it is also not always possible to exactly recalculate the input values from the internal parameters. Implementing this kind of comparison therefore doesn't seem worth it to me.
I don't think I explained clearly the point of what I'm trying to do. Suppose you have 2 programs: one program is in numpy, the other program is in Rust using rand. Yes, both the RNGs and the distributions in rand and numpy are quite different. But both programs are supposed to be "similar" in that the random number sequences they each draw are nominally "equivalent": for example, both make the same sequence of distribution calls with the same parameters. Now, you first run the numpy program using its Mersenne Twister RNG and its distributions, and this run generates a "tape" or "trace" of the actual values sampled. The key point is that we do not want to re-implement the numpy RNG/distributions in Rust; we just want to replay (in Rust) the recorded sequence of random numbers (from numpy) for the purpose of reproducibility, particularly to facilitate the debugging and testing of numerical code by eliminating a source of non-determinism at the level of bits.
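For what it's worth, here is a minimal sketch of how such a replay wrapper could look against rand's `Distribution` trait. The `ReplayTape` name and the exhaustion behaviour are assumptions for illustration, not anything rand provides:

```rust
use rand::distributions::Distribution;
use rand::Rng;
use std::cell::Cell;

/// Hypothetical replay wrapper: instead of sampling, it yields values
/// previously recorded from another program (e.g. a numpy run).
struct ReplayTape {
    values: Vec<f64>,
    pos: Cell<usize>,
}

impl ReplayTape {
    fn new(values: Vec<f64>) -> Self {
        ReplayTape { values, pos: Cell::new(0) }
    }
}

impl Distribution<f64> for ReplayTape {
    fn sample<R: Rng + ?Sized>(&self, _rng: &mut R) -> f64 {
        let i = self.pos.get();
        self.pos.set(i + 1);
        // Panics when the tape runs out; a real version needs a policy
        // for exhaustion (error, wrap around, or fall back to sampling).
        self.values[i]
    }
}

fn main() {
    let tape = ReplayTape::new(vec![0.1, 0.7, 0.3]); // values recorded elsewhere
    let mut rng = rand::thread_rng();                // the RNG is ignored
    let x: f64 = tape.sample(&mut rng);
    assert_eq!(x, 0.1);
}
```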
Yes, this is my takeaway as well. It sounds like comparing distributions may be better served by special-purpose types elsewhere.
It would be easy to expose the original parameters on some distributions. Otherwise, in theory it is possible to convert whatever internal parameters are used back to a mean / mode / std. dev. / ... (excepting that not all measures make sense for all distributions). However, we should expect some accuracy to be lost (thus exact equality would not be useful).
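If parameters were recovered that way, comparison would presumably need a tolerance rather than `==`. A small sketch, purely illustrative:

```rust
/// Hypothetical helper: compare two recomputed parameters up to a
/// relative tolerance, since round-tripping through a distribution's
/// internal parameters can lose a few bits of accuracy.
fn approx_eq(a: f64, b: f64, rel_tol: f64) -> bool {
    let scale = a.abs().max(b.abs()).max(f64::MIN_POSITIVE);
    (a - b).abs() <= rel_tol * scale
}
```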
It seems to me like you want to compare samples from arbitrary distributions and determine how likely it is that they are from the same distribution. Is that correct? In this case you could use something like the Kolmogorov–Smirnov test. It might make sense to have that in Rand, although we currently only implement sampling (also see #290).
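To make the suggestion concrete, here is a rough sketch of the two-sample Kolmogorov–Smirnov statistic (the `ks_statistic` function is a hypothetical helper, not part of rand; a full test would also convert the statistic into a p-value):

```rust
/// Hypothetical two-sample Kolmogorov–Smirnov statistic: the largest
/// distance between the empirical CDFs of the two samples.
fn ks_statistic(mut a: Vec<f64>, mut b: Vec<f64>) -> f64 {
    a.sort_by(|x, y| x.partial_cmp(y).unwrap());
    b.sort_by(|x, y| x.partial_cmp(y).unwrap());
    let (n, m) = (a.len() as f64, b.len() as f64);
    let (mut i, mut j, mut d) = (0usize, 0usize, 0.0f64);
    while i < a.len() && j < b.len() {
        // Take the next pooled data point and advance past all copies of
        // it in both samples, so ties are handled consistently.
        let x = a[i].min(b[j]);
        while i < a.len() && a[i] == x { i += 1; }
        while j < b.len() && b[j] == x { j += 1; }
        // Gap between the two empirical CDFs just after this jump point.
        let gap = (i as f64 / n - j as f64 / m).abs();
        if gap > d { d = gap; }
    }
    d
}
```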
Yes, this is how I've ended up doing it now. Also, in light of #290 and related issues, it seems the state of distributions in rand may be a bit in flux, so in my own code I'll likely mess around with some slightly alternative distribution sampling APIs.
Not quite. I'm interested in (1) type-wise comparison of distributions (e.g. is it a `Normal` or a `Uniform`?) and (2) comparison of their parameters, rather than a statistical comparison of samples.
It seems like you have a solution to your problem, and we still have a lot of work to do with distributions (contributions welcome in #290!), so I'm going to close this now.
Given an arbitrary distribution w/ parameters, e.g. `Uniform` or `Normal`, we might want to compare its parameters to those of another distribution of the same type. Depending on how we come across these structs, we might not easily have access to the underlying distribution parameters. I've thought of a few considerations:

1. Is it worth implementing some way of comparing the distributions in rand?
2. How should comparison be implemented? Float types generally only implement `PartialEq`, so it doesn't necessarily make sense for `Distribution` to generally require `Eq`, only `PartialEq`. Alternatively, one could introduce a new trait with a method `fn dist_eq(&self, other: &Self) -> bool` to be implemented by comparable distributions. Even more simply, one could manually expose the distribution parameters on a per-distribution basis, e.g. add `Normal::mean` and `Normal::std_dev` functions, and leave comparison to the user. (A rough sketch of these options follows below.)
3. Related to (2), it is possible that distributions with the same "public" parameters are implemented with different "private" parameters that determine how samples are drawn from an RNG. Would this complicate the notion of comparing distributions?
Thoughts?