Random API #131
Recently I came across a Twitter thread on randomness by @colmmacc and wanted to share it here. A short summary:
I hope this will help us design Kotlin random APIs in a way that prevents such common misuses. The original thread:
On the Kotlin forum there was recently a post about adding shuffle for arrays as well. There is also an issue about this. I think it might be worth considering adding those functions to this proposal.
@voddan Thanks for the link. We have taken care of the implementation and API so that our users can avoid these and some other pitfalls of random number generation. For example, there are overloads of If you like, you can review the implementation here.
I would like to pitch for some random sampling methods such as
The benefit of having such functions in the stdlib is preventing developers from implementing naive-but-incorrect implementations themselves. For example, for
Another inefficient approach could be the implementation below, which is very easy to write with the current proposal, and thus even more likely to be a source of problems.
As far as I understand, one of the correct and efficient algorithms for sampling is the "Vose's Alias" algorithm, which I frankly know nothing about, but you can read about it here: http://www.keithschwarz.com/darts-dice-coins/
@voddan I think it's better to start another KEEP for this family of functions.
@ilya-g IMHO sampling and shuffling functions should go in the same KEEP, partly because of how similar they are, and partly because of how easy it is to (inefficiently) implement one through the other. That said, maybe it is better to move shuffling out of the scope of this proposal and combine it with the sampling APIs.
@voddan Shuffling is already supported in the standard library; only an overload that takes a particular RNG is missing in common. On the other hand, sampling is a completely new API with its own design questions.
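For illustration, a shuffle that takes an explicit RNG can be sketched as a Fisher-Yates loop. This is not the standard library's implementation; `shuffleWith` is a hypothetical name chosen here to avoid clashing with existing functions.

```kotlin
import kotlin.random.Random

// Hypothetical helper for illustration only: in-place Fisher-Yates shuffle
// driven by a caller-supplied RNG, so a seeded generator gives a repeatable order.
fun <T> MutableList<T>.shuffleWith(random: Random) {
    for (i in lastIndex downTo 1) {
        // Pick a position uniformly from 0..i and swap it into slot i.
        val j = random.nextInt(i + 1)
        val tmp = this[i]
        this[i] = this[j]
        this[j] = tmp
    }
}
```

Passing the same seeded `Random` twice yields the same permutation, which is the point of an overload that accepts a particular RNG.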
There's an interesting article about the performance and correctness of selecting random numbers from a range here: http://www.pcg-random.org/posts/bounded-rands.html See the conclusion for what is probably the best overall algorithm (in C, might be different for the JVM).
@chrismiller Thanks for the article, it is very enlightening. Currently we use the "Debiased Mod (x1)" algorithm to select numbers from a range. "Debiased Int Multiplication" looks enticing, but Kotlin/JS has no efficient 32-bit int multiplication that returns the upper part of the product, not to mention 64-bit ints, for which there is no such multiplication even in Kotlin/JVM. We need to decide which one to use before releasing a repeatable RNG, because different algorithms can produce different number sequences from the same source of random bits.
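For readers unfamiliar with the technique, here is a minimal sketch of a debiased modulo with a single `%` per attempt, in the style of `java.util.Random.nextInt(bound)`. It is not necessarily the exact code used in the Kotlin implementation, and `boundedInt` is a hypothetical name.

```kotlin
import kotlin.random.Random

// Hypothetical helper for illustration: uniform value in [0, bound) using
// rejection to remove modulo bias.
fun boundedInt(random: Random, bound: Int): Int {
    require(bound > 0) { "bound must be positive, was $bound" }
    var bits: Int
    var value: Int
    do {
        bits = random.nextInt() ushr 1   // 31 uniformly random non-negative bits
        value = bits % bound
        // bits - value is the start of the block of size `bound` containing bits.
        // If that block would extend past Int.MAX_VALUE it is incomplete, the
        // addition overflows to a negative number, and the draw is rejected.
    } while (bits - value + (bound - 1) < 0)
    return value
}
```

Because rejected draws consume extra random bits, switching to a different bounding algorithm changes the sequence produced from the same seed, which is the compatibility concern raised above.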
I don't know if this can be used here, but it seems like there is a better algorithm than the currently used xorwow (which is part of the Xorshift family): Xoroshiro128+ and related
Apparently xorwow also fails a few tests in the BigCrush test suite. This is the author's site where they recommend
Given everyone who participated in this discussion, I'm guessing that you'd all be interested in this proposal as well: Add a CSPRNG (e.g. SecureRandom) to the Kotlin StdLib #184. I'd love your feedback.
Might I suggest including a
Can you please explain why the two methods you present for taking random samples are "naive-but-incorrect"? "Vose's Alias", which you cited, is an algorithm for getting random numbers with a biased (non-uniform) distribution. I cannot see how this fits into your example.
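For context, the alias method draws an index from a weighted (non-uniform) discrete distribution in O(1) per draw after O(n) setup. The following is a minimal sketch of Vose's construction as described in the article linked earlier in the thread; `AliasSampler` is a hypothetical class, not a proposed stdlib API.

```kotlin
import kotlin.random.Random

// Hypothetical class for illustration: weighted index sampling via Vose's alias method.
class AliasSampler(weights: DoubleArray) {
    private val n = weights.size
    private val prob = DoubleArray(n)   // chance of keeping column i
    private val alias = IntArray(n)     // fallback index for column i

    init {
        require(n > 0 && weights.all { it >= 0.0 } && weights.sum() > 0.0)
        val total = weights.sum()
        // Scale so that the average weight is exactly 1.
        val scaled = DoubleArray(n) { weights[it] * n / total }
        val small = ArrayDeque<Int>()
        val large = ArrayDeque<Int>()
        for (i in 0 until n) if (scaled[i] < 1.0) small.addLast(i) else large.addLast(i)
        while (small.isNotEmpty() && large.isNotEmpty()) {
            val s = small.removeLast()
            val l = large.removeLast()
            prob[s] = scaled[s]
            alias[s] = l                            // top up column s from l
            scaled[l] = scaled[l] + scaled[s] - 1.0
            if (scaled[l] < 1.0) small.addLast(l) else large.addLast(l)
        }
        // Leftovers are exactly 1 up to floating-point rounding.
        while (large.isNotEmpty()) prob[large.removeLast()] = 1.0
        while (small.isNotEmpty()) prob[small.removeLast()] = 1.0
    }

    fun next(random: Random): Int {
        val i = random.nextInt(n)                   // pick a column uniformly
        return if (random.nextDouble() < prob[i]) i else alias[i]
    }
}
```

As the comment above notes, this addresses weighted selection, which is a different problem from drawing a uniform subset of a list.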
The first one is incorrect because it's slow and possibly non-terminating. When sample size is close to list length The second one is incorrect as it is very slow when list length is large and sample size is small.
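One way to avoid both costs is a partial Fisher-Yates shuffle that records only the positions it touches. The sketch below assumes the two approaches under discussion are "retry until distinct" and "shuffle everything, then take k" (the original snippets are not shown here), and it assumes a random-access list. `sampleDistinct` is a hypothetical name, not part of the proposal.

```kotlin
import kotlin.random.Random

// Hypothetical helper for illustration: k distinct elements chosen uniformly,
// in O(k) expected time and memory, without copying or shuffling the whole list.
fun <T> List<T>.sampleDistinct(k: Int, random: Random = Random.Default): List<T> {
    require(k in 0..size) { "k must be in 0..$size, was $k" }
    // Sparse view of a virtual index array: position -> index currently stored there.
    val moved = HashMap<Int, Int>()
    val result = ArrayList<T>(k)
    for (i in 0 until k) {
        val j = random.nextInt(i, size)   // uniform in [i, size)
        val atJ = moved[j] ?: j
        val atI = moved[i] ?: i
        result.add(this[atJ])             // value "swapped" into slot i
        moved[j] = atI                    // slot i is never read again, so only j is stored
    }
    return result
}
```

The loop always terminates after exactly k steps, and its cost does not depend on the list length.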
Of course, considering performance, the first is only recommendable for small subsets and the second one for large subsets. But they are both statistically sound. You indicated that they have a "different distribution than Random.nextInt" and I just wanted to point out that their results are just as good (or bad) as the built-in RNG/nextInt.
@quickstep24, @voddan was the one to suggest that it's a different distribution. I have no idea on that matter.
Just looking for an update to see if this "bug" is going to be fixed anytime soon. I am resorting to using java.util.Random or java.security.SecureRandom, so I'm just wondering.
Can you clarify what exactly you are interested in? Check out
I just wanted to know if it was ever going to be cryptographically secure, but you answered my question. Thx.
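For JVM users who need a cryptographically secure source behind the Kotlin `Random` interface, one option is to wrap `java.security.SecureRandom` with the `asKotlinRandom()` adapter from `kotlin.random`. This is a JVM-only sketch; `newToken` is a hypothetical helper name.

```kotlin
import java.security.SecureRandom
import kotlin.random.Random
import kotlin.random.asKotlinRandom

// JVM only: expose SecureRandom through the kotlin.random.Random API.
private val secureRandom: Random = SecureRandom().asKotlinRandom()

// Hypothetical helper for illustration: random bytes from the secure source.
fun newToken(byteCount: Int = 16): ByteArray = secureRandom.nextBytes(byteCount)
```

Any function that accepts a `kotlin.random.Random` parameter can then be given `secureRandom` instead of `Random.Default`.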
Discussions about the Random API proposal will be held here.
Pull request: #132