[Medium] Switching to an external/more powerful random number generator #184
Thank you for opening this, and for your comments elsewhere. We appreciate your time and effort, as well as the things you are exploring in the code. RANLIB has certainly been around for a while, and review of our use of it is welcome; it may be that current practice in your field is ahead of ours. The important things for us about any RNG are: (1) speed, as we make heavy use of sampling; (2) thread safety, with an RNG per thread one way or another; and (3) repeatability, so that a given seed generates deterministic results. There are always discussions about "how random is random", especially with different seeds per thread, and whether we can improve on RANLIB in that respect is a good question. We typically do large numbers (10,000s) of runs with different seeds to look at the ensemble properties in terms of noise/stochasticity.
Nice, thank you for your comment. I knew there were real people behind this :-)
Oh, and by the way, can you make sure I get a few thumbs up in the other issue about the original source :P We need to show these guys the reality behind all this. I knew it was something with grad students piling on new features and never actually refactoring... something I've seen too often in academia XD
@weshinsley Oh yes, this library is used in multi-threaded, single-seed Monte Carlo high-energy physics code.
Why not use e.g. std::binomial_distribution and std::poisson_distribution etc.? It looks like a step up from the current implementation, and a much simpler switch than going to CLHEP (which for large runs does make sense).
I can't tell from the std::poisson_distribution reference page whether they're static functions or whether they hold state. The goal here is repeatability, so they need to hold their state. But I'm not familiar with them. It also seems to me that there are plenty of occurrences of their use here and there in the code... And I think that the minimum run takes 20GB of RAM, so in my mind all runs were big... But it's worth looking into; they might be more optimized. There are no references in the examples about seeding... Oh... OK, I see it now... well, nice!
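For what it's worth, the std:: distributions do hold state, but all of the seedable, repeatable state lives in the engine object you pass in. A minimal sketch (the parameter values here are arbitrary, not taken from this project):

```cpp
#include <random>

// Each std:: distribution is an object invoked with an engine; the engine
// (here std::mt19937) carries the seedable state, so two identically
// seeded engines replay the same stream of samples.
int poisson_sample_sum(unsigned seed, int n) {
    std::mt19937 eng(seed);                    // all repeatable state is here
    std::poisson_distribution<int> pois(4.5);  // lambda = 4.5 (arbitrary)
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += pois(eng);
    return sum;
}
```

One caveat relevant to cross-platform repeatability: the standard pins down the engines bit-for-bit, but the distributions' algorithms are implementation-defined, so the same seed can give different samples under libstdc++, libc++ and MSVC.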
I raised concerns about the std:: stats library a while back (https://stackoverflow.com/questions/34977728/c-gamma-distribution-returning-infinity/34981326#34981326), and trying to get a fix for that was a tedious and uncertain process; I got the "we can't reproduce this" response for a while, later an acknowledgement that it was being looked into, but never a clear answer on whether it was fixed. That wasn't in the context of this model, but there was no perceived benefit at that time in changing the RNGs, and if we had had any thoughts of moving to std:: then, that experience would have made us all the more cautious. Relying on open-source libraries we can include in full, that are demonstrably very close to the maths, has considerable value, and I think it remains to be shown whether there is a driving need for anything external or more powerful than the simple and classical generators we are using. But that is what the open-source process is about, so we appreciate this discussion.
@weshinsley I was looking into CLHEP more closely and I remembered how elegantly the units were implemented in Geant4. There is also a whole lot more there than the random number generators. I think that at some point some of your more mathematical phenomena could be described by linear algebra constructs such as fields or matrices, and by other tools more appropriate to the specific job at hand, all of which is very rigorously implemented in CLHEP...
@weshinsley Just to ask whether you're aware of a reproduction of this result in any compiler more recent than VS2010? Is there a bug ID for it?
I just did a very quick test to compare the ranf routine in this project with the generator recommended in Numerical Recipes (3rd ed., p. 342, which they call Ran). That generator appears to run at around the same speed as ranf:

[steve@fogou test]$ ./rtest

In both cases that's the time taken to generate 100,000,000 pseudo-random doubles in a uniform distribution between 0 and 1 when run on my laptop. On page 354 of Numerical Recipes, they give the code for a generator which they claim is faster if you are specifically after floats between 0 and 1 only (the Ran generator uses mostly integer maths, and can generate pseudo-random integers too). This second generator is not as good as Ran, though, it appears. I don't know whether it would make sense to use here, since I'm not quite sure what is "good enough" in this case, so I've not gone the additional step of testing it to see how much faster it might be.

I'm happy to share my test code if it is of help, but really all I've done is use the routine in the book. Note that earlier editions of Numerical Recipes have different recommended random number generators, so if you have an older edition, that will be different code. Hopefully that is a useful data point.
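A harness along these lines is easy to reproduce. This is a hypothetical sketch of such a microbenchmark (not the actual test code): time N uniform draws from any engine, writing into a volatile sink so the loop isn't optimised away.

```cpp
#include <chrono>
#include <random>

// Time n uniform [0,1) draws from an arbitrary engine, in seconds.
template <typename Gen>
double seconds_for_uniforms(Gen& gen, long long n) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    volatile double sink = 0.0;   // defeat dead-code elimination
    auto t0 = std::chrono::steady_clock::now();
    for (long long i = 0; i < n; ++i) sink = u(gen);
    auto t1 = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double>(t1 - t0).count();
}

// Convenience wrapper with a fixed engine and seed, for sanity checking.
double seconds_for_mt19937_64(long long n) {
    std::mt19937_64 gen(12345);
    return seconds_for_uniforms(gen, n);
}
```

Swapping in ranf (wrapped to match the engine interface) against a std:: or Numerical-Recipes-style generator gives a like-for-like comparison.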
@weatherhead99 - as reported in the Stack Overflow issue, that was with 2016 compilers. There is no bug ID because there was no public issue tracker for std:: bugs at that time, only email support - which also explains my caution. But it is not helpful to suggest it is "clearly not thread safe": it is completely thread-safe in all our Intel Inspector tests, and the code makes sense to us. Submit an issue if you have a counter-example. @StevenWhitehouse - thanks, but beware the licensing of Numerical Recipes. Although we all have the book on our shelves, the licensing is not at all simple, and restricts the open-source usage we want. In any case, there is no strong motivation for us to change what we use.
@weshinsley Yes, licensing can be an issue, although I'm sure we could figure out a way around that by finding a different/similar implementation if it was worthwhile. In this case though, based on the results that I've got, I suspect that any gains that way will be minimal. Also there is potentially a significant investment in time to check a new RNG meets the requirements too, which is not worth it without a significant gain in speed to offset that. So I think next time I find a spare moment, I'll start looking at the next level up, i.e. the consumers of the random numbers, and see if I spot anything there. At the moment I'm just spending a little time checking things that I hope might be useful to look at, but if you have a todo list or something like that, it might be helpful as a guide. |
Some more random thoughts... I moved my investigation one step up the stack to look at the callers of ranf(), and there are a large number of them. So I started with the callers in Rand.cpp, which generate various distributions. There does seem to be a lot of code duplication there - in fact ranf and ranf_mt are almost identical - so there is scope for some clean-up. I also noticed what I think (but have not yet fully investigated) is duplication between snorm and gen_norm, which appear to be different routines for generating the same distribution. Although gen_norm takes the usual mean and sd parameters, they are constant 0 and 1 in all callers, but the compiler should sort that out for us.

So the next question was how fast those routines are: gen_norm uses the Marsaglia polar method, while snorm looks like it is using a rejection method that's been translated from Fortran. At the moment I've not verified that both are doing the same thing, but the result of a quick test of their relative performance is interesting:

[steve@fogou test]$ perf record ./ntest

So snorm seems much faster than gen_norm - but is it? The answer turns out to be no, since gen_norm actually generates two values per call, but currently discards one. The net result is that it could be twice as fast as it is at the moment, so that seems worth looking into. An initial test added a static variable to store the second value and return it on every other call, and then the performance looked like:

[steve@fogou test]$ ./ntest

So, almost a 2x gain in performance. That isn't thread-safe, of course, but it shows that gen_norm can be trivially sped up. So the next step was to make it thread-safe. I looked at ranf and ranf_mt to see how that had been done, and the answer seemed to be that it was done badly. The Xcg1 & Xcg2 variables are much larger than actually required: 512 bytes per thread.
The CACHE_LINE_SIZE, which is in bytes, was being multiplied by sizeof(long), resulting in a very large size. Also, Xcg1 & Xcg2 for the same thread definitely do want to be in the same cache line, so having them separate like that makes no sense at all. With all that in mind, I came up with a patch, which I'll now attempt to copy & paste in here :-) You'll likely not be able to make it apply if you copy & paste it from here, so I'll email it to anybody who wants it. It should give a rough idea of what I'm thinking, though. Benefits are:
The caveat is that it needs careful review and testing. Also, I don't dare replace snorm calls with gen_norm until I'm 100% sure that they are the same, but there would be a small additional performance benefit if that were possible. Hopefully this is useful info :-)
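The two-values-per-call idea above can be sketched like this (illustrative code, not the project's gen_norm: a std:: engine stands in for ranf, and thread_local makes the cached spare per-thread):

```cpp
#include <cmath>
#include <random>

// Marsaglia polar method: each pass through the rejection loop yields TWO
// independent N(0,1) values. The spare is cached in thread-local storage,
// so alternate calls skip the loop entirely, and each thread has its own
// cache (addressing the thread-safety point above).
double polar_norm(std::mt19937& eng) {
    thread_local bool have_spare = false;
    thread_local double spare = 0.0;
    if (have_spare) {
        have_spare = false;
        return spare;
    }
    std::uniform_real_distribution<double> u(-1.0, 1.0);
    double x, y, s;
    do {
        x = u(eng);
        y = u(eng);
        s = x * x + y * y;
    } while (s >= 1.0 || s == 0.0);
    double f = std::sqrt(-2.0 * std::log(s) / s);
    spare = y * f;        // keep the second value for the next call
    have_spare = true;
    return x * f;
}

// Sanity-check helper: sample mean of n draws, should be close to 0.
double polar_norm_mean(unsigned seed, int n) {
    std::mt19937 eng(seed);
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += polar_norm(eng);
    return sum / n;
}
```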
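A sketch of what correctly sized per-thread state could look like (assuming a 64-byte cache line; the names are illustrative, not the patch itself): both halves of a thread's generator state share one line, and alignas pads each struct to exactly one line so neighbouring threads never false-share.

```cpp
#include <cstddef>
#include <cstdint>

// CACHE_LINE_SIZE is in BYTES - no extra sizeof(long) multiplier.
constexpr std::size_t CACHE_LINE_SIZE = 64;

// One cache line per thread: Xcg1 and Xcg2 for a thread live together
// (they are always used together), and alignas pads the struct so each
// thread's state occupies exactly one line.
struct alignas(CACHE_LINE_SIZE) PerThreadRngState {
    std::int32_t xcg1;
    std::int32_t xcg2;
};

static_assert(sizeof(PerThreadRngState) == CACHE_LINE_SIZE,
              "exactly one cache line per thread");
```

An array of PerThreadRngState indexed by thread number then gives 64 bytes per thread instead of 512, with no sharing between threads.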
@weshinsley For example, the function SampleWithoutReplacement() is not thread-safe: it uses a global pointer which neither has thread-local storage nor is guarded by any synchronisation primitives. Lack of thread safety does not imply the appearance of non-deterministic thread-related errors such as would be reported by a tool like Inspector. As far as I can see, that particular example can't result in a deadlock or a logical-flow-related race hazard, merely possibly a data race.
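To illustrate the distinction being drawn here (hypothetical code, not the project's SampleWithoutReplacement): a buffer shared between threads is a potential data race even when no tool reports an error, whereas thread-local (or per-thread-indexed) storage removes the race by construction.

```cpp
#include <vector>

// NOT thread-safe: every thread writes the same global buffer.
static std::vector<int> g_scratch;

int fill_shared(int v) {
    g_scratch.assign(8, v);   // data race if two threads run this at once
    return g_scratch[0];
}

// Thread-safe: each thread gets its own independent copy of the buffer.
int fill_per_thread(int v) {
    thread_local std::vector<int> scratch;
    scratch.assign(8, v);     // touches only this thread's storage
    return scratch[0];
}
```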
A random history... When trying to figure out how to improve something, it is always good to know where you are coming from, so with that in mind I've tracked down some info about the current random number generator. Hopefully this is useful... I have some further details that I hope to post in due course, but this seems to be a self-contained topic, and maybe it will be useful to others looking at this.

The random number generator currently in use as ranf was originally described in this paper: L'Ecuyer's random number generator uses two independent LCGs which have been chosen to give a long period - the text claims circa 2.3e18. That seems like it should be more than adequate for this situation, since the test data uses (I measured this) approx 1.7e9 random numbers for the longest simulation. So no concerns there. L'Ecuyer's generator is quite obviously popular; a quick google for some of the constant values reveals a number of other users, and the one that tops the list (for me, anyway) is gnuplot.

Since we mentioned Numerical Recipes above, it is worth noting that this generator appears in their 2nd ed. (on p. 273 in my Fortran copy - I gave my 2nd-ed. C copy away when I bought the 3rd ed.!) in combination with a Bays-Durham shuffle, which was added to break up any serial correlations in the output, implying that they were concerned the basic algorithm was not good enough. That algorithm was named ran2. By the time of the 3rd ed., Numerical Recipes were warning about the perils of RNGs which rely solely on LCGs, due to the generated numbers lying on a limited number of hyperplanes (if one considers vectors of N consecutive random numbers as points in N-dimensional space). Their newly recommended generator (Ran) combines a 64-bit LCG (post-processed with a simple hash function) with a 64-bit xorshift generator (see this paper by Marsaglia: https://www.jstatsoft.org/article/view/v011i05) and also an MWC, which are all combined at the output.
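For concreteness, L'Ecuyer's combined generator can be sketched as follows (written from the published 1988 constants, using Schrage's method to keep the products within 32-bit range; a sketch, not the project's ranf):

```cpp
#include <cstdint>

// L'Ecuyer (1988) combined generator: two independent 32-bit LCGs, each
// advanced with Schrage's trick to avoid overflow, whose difference taken
// modulo the larger modulus gives the output.
struct LEcuyer {
    std::int32_t s1, s2;
    explicit LEcuyer(std::int32_t a, std::int32_t b) : s1(a), s2(b) {}
    double next() {
        std::int32_t k = s1 / 53668;
        s1 = 40014 * (s1 - k * 53668) - k * 12211;   // LCG 1, mod 2147483563
        if (s1 < 0) s1 += 2147483563;
        k = s2 / 52774;
        s2 = 40692 * (s2 - k * 52774) - k * 3791;    // LCG 2, mod 2147483399
        if (s2 < 0) s2 += 2147483399;
        std::int32_t z = s1 - s2;
        if (z < 1) z += 2147483562;
        return z * 4.656613057e-10;   // z / 2147483563, a double in (0,1)
    }
};
```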
It is worth noting here that the ranf generator only produces 32 bits of output, so although my quick test above suggested it was the same speed as Ran, one could split the 64 bits from Ran into two 32-bit words (not something you'd want to do with an LCG-only generator!), making Ran effectively twice the speed. There are clearly things that could be done to improve the ranf function, but whether they would make any difference to the quality of the results depends on how those numbers are used, and I don't have enough understanding of what is going on at the next stage up the stack to be sure. Based on my research so far, though, it probably would make sense to change to something more modern. Probably a 64-bit LCG, plus an output hash, in parallel with an xorshift would be enough. There is no need to use any of the NR code, since there are plenty of other references to these techniques and implementation is straightforward.

Having run the test through perf, the results suggest that however lightning-fast the RNG might be, it will have little effect on the overall speed of the simulation: approx 50% of the CPU time was spent in InfectSweep and only 2.71% in ranf in my test. The first few lines of the output from perf are included here in case they are useful:

Overhead  Command  Shared Object  Symbol
Anything below this line used less than 1% of the CPU time. Oh, and I did also confirm that snorm is basically the same thing as gen_norm(0, 1), having tracked down the paper about that too: While we are on the subject of papers, and noting also the comments on SampleWithoutReplacement() above, the comment in the code claims that it is based upon the SG algorithm from this paper: A quick inspection shows that only the "else" part of the big if statement appears to be related to the SG algorithm, and even then it doesn't look quite the same. I've not had a chance to check in any detail, though, so there may be more to it... that's probably enough work on this topic for this weekend :-)
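The "something more modern" described above - a 64-bit LCG with an output hash, run in parallel with a Marsaglia xorshift - could be sketched like this (illustrative constants: Knuth's MMIX LCG multiplier and the common 13/7/17 xorshift triple; none of this is taken from NR's code):

```cpp
#include <cstdint>

struct Combined64 {
    std::uint64_t lcg, xs;
    explicit Combined64(std::uint64_t seed)
        : lcg(seed), xs(seed | 1ULL) {}   // xorshift state must be non-zero
    std::uint64_t next() {
        // 64-bit LCG (Knuth MMIX constants)...
        lcg = lcg * 6364136223846793005ULL + 1442695040888963407ULL;
        // ...post-processed with a cheap xor-fold "hash" of its output.
        std::uint64_t h = lcg ^ (lcg >> 33);
        // Marsaglia xorshift64, advanced in parallel.
        xs ^= xs << 13;
        xs ^= xs >> 7;
        xs ^= xs << 17;
        return h ^ xs;   // combine the two streams at the output
    }
    // Use the top 53 bits for a double in [0,1).
    double uniform() { return (next() >> 11) * (1.0 / 9007199254740992.0); }
};
```

Being 64-bit, each call yields two 32-bit words if the consumer only needs 32 bits, which is the doubling-of-throughput point made above.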
@weatherhead99 - can you be any more specific on what you've observed? What global pointer do you refer to in SampleWithoutReplacement? The only non-local I can see is SamplingQueue[tn][...] - where tn is thread number, making it safe. |
I've turned some of the ideas above into a pull request. It doesn't fix everything, but since this will be my first pull request, I thought it was a good plan to show what I'd got so far and get some feedback before going too far down the line. Hopefully it is useful :-) |
Issue clean-up - closing comments:
#455 discusses the potential (I think) of making a better interface so it is easy to pick/choose RNG - I suggest any other discussion on RNGs continues there. |
This weekend I will try to branch from your project in order to change the random number generation algorithms, which seem to be a bit outdated. I will use this library: https://gitlab.cern.ch/CLHEP/CLHEP
and add it as a submodule to your git repository, making sure the algorithm choice is a parameter. I might also add a boost::program_options part to deal with parameters... I haven't had time to check how they are handled. This might mean the removal of the Rand.cpp file and related methods, which would make the code more modular and of better overall quality. It might also positively affect performance, since this library has been going for a while and has nice optimizations. I don't know which C++ standard they use, so it might also mean moving up to C++11 or later, if the project is not already there.