Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Mersenne Twister to generate random numbers. #329

Merged
merged 2 commits into from
Jan 16, 2015

Conversation

rhysd
Copy link
Contributor

@rhysd rhysd commented Jan 12, 2015

Crystal now uses LCG (in libc) as its random number generator. However, you know, LCG has a defect in terms of the period.

This PR replaces the LCG random number generator with Mersennne Twister(mt19937) random number generator. Mt19937 is employed in Ruby standard library because of the high-quality of random numbers which it generates.

I added Random class as Ruby's standard library and added Random::MT19937 class to implement mt19937. This is because I wanted to isolate the interfaces of Crystal's random number API and its implementation. In the future, other random number generators (e.g. xor128) may be implemented in addition to mt19937.

If you know the detail of mt19937, see:

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html.

I also added a test for mt19937 random number generator. It compares random numbers generated by Random::MT19937 with random numbers generated by an official C implementation.

The source code of C implementation is available in below.

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.c

Crystal used LCG (in libc) as its random number generator.  However, you know, LCG
has a defect in terms of the period.

This commit replaces the LCG random number generator with Mersennne
Twister(mt19937) random number generator.  Mt19937 is employed in Ruby
standard library because of the quality of random numbers which it generates.

I added `Random` class as Ruby's standard library and added
`Random::MT19937` class to implement mt19937.
This is because I wanted to isolate the interfaces of Crystal's random
number API and its implementation.  In the future, other random number
generators (e.g. xor128) may be implemented in addition to mt19937.

If you know the detail of mt19937, see
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html.
A test for mt19937 random number generator is added.  It compares random
numbers generated by `Random::MT19937` with random numbers generated by
an official C implementation.

The source code of C implementation is available in below.

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.c
@asterite
Copy link
Member

すごい!

This is indeed very needed in the standard library. I really like it that you included that big spec verifying the generated numbers.

I have a few comments/questions, but I'll accept the pull request and later we can change those things.

asterite pushed a commit that referenced this pull request Jan 16, 2015
Use Mersenne Twister to generate random numbers.
@asterite asterite merged commit fae352d into crystal-lang:master Jan 16, 2015

C.srand(Intrinsics.read_cycle_counter.to_u32)
def initialize(seeds = Array(UInt32).new(4){ Intrinsics.read_cycle_counter.to_u32 })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the default seeds we could use:

StaticArray(UInt32, 4).new { Intrinsics.read_cycle_counter.to_u32 }

In that way no extra memory is allocated. What do you think?

It won't compile because later the method size is not found for StaticArray, but we can add it (or change the code to say length).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add size :P

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, StaticArray can be used here if the size problem is resolved.

@asterite
Copy link
Member

More comments, questions:

  1. Some languages provide a single Random class. For example: Java, C#, Ruby, Go. Others have many implementations, for example Rust and D. I think I prefer Java, C# and Ruby's way, because it's simpler for the programmer: I want a random number, I do Random.new.next or just rand, I don't have to stop and think which generator I need. If I really need a specific generator, then I choose it (maybe in another library, maybe it's in the standard library). But I don't know, I'd like to hear more opinions about this. Maybe we can have global rand that uses a default generator, and then many random implementations, but no default Random class. Alternativel, maybe we can have Random and Random::MT19937 but we can make Random.new return an instance of Random::MT19937 instead of having an engine. So many decisions to make :-)

  2. In every random implementation there is some paragraph concerning thread-safety. I tried to use the generator in this program:

THREADS = 20
TIMES = 1_000

# Generate 20_000 random numbers in v2 in parallel
m = Random::MT19937.new [0x123, 0x234, 0x345, 0x456]
values = (1..THREADS).map do |i|
  Thread.new do
    (1..TIMES).map { m.next_number }
  end
end.map &.join

v2 = [] of typeof(m.next_number)
values.each do |vs|
  v2.concat vs
end

v2.sort!

# Geenrate 20_000 random numbers in sequence
m = Random::MT19937.new [0x123, 0x234, 0x345, 0x456]
v3 = (1..THREADS * TIMES).map { m.next_number }
v3.sort!

puts v2 == v3 #=> false

I'm not sure, but I think the last result should be true. I don't know how to make it thread-safe, but since we don't have much support for concurrency right now we can leave this problem for later. In Rust they have a random number generator per thread (thread local) and rand uses that one, so that might be a good solution (of course, if you use m like above you would have to synchronize the access yourself).

k -= 1
end

# Use to_i because substituting 0x80000000 causes SEGV
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this comment for? Could it be related to the SEGV issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I forgot to remove out-dated comment...
Yes it was a comment for the SEGV issue but it is no longer needed.

@asterite
Copy link
Member

Another question: should we include the copyright notice of the original source code? I saw it in Ruby and D's source code.

@jhass
Copy link
Member

jhass commented Jan 16, 2015

Actually Ruby sort of has two Random classes, Random, which is a PNG, and SecureRandom, which tries to make OS level random data generation methods available.

I think merging those into one interface and being able to choose the source upon initialization is a good choice. Kernel#rand should just delegate to an instance using the default engine, if you want something specific you do stuff like generator = Random.new(:mt19937) or generator = Random.new(:system) or even Random.new(MyCustomEngine.new) and then generator.rand, generator.hex(64) and stuff like that.

@jhass
Copy link
Member

jhass commented Jan 16, 2015

Oh, and other stdlib methods should utilize that interface:

some_array.sample # delegate to the same generator Kernel#rand uses
some_array.sample(Random.new(:system)) # use passed generator

Same for .shuffle and so on.

@papamarkou
Copy link
Contributor

I have done quite a lot of work coding the most up-to-date and faster version of the Mersenne twister under discussion here :) It was suggested that it shouldn't go into base but rather be maintained as an external package. Not sure why the older version of the Mersenne twister is regarded as more useful and seems to be preferred as more acceptable :)

@asterite
Copy link
Member

@Scidom Yes, and it's very appreciated. But the downside of that version is that we would need to include the C source code of that generator in the compiler, or change the Makefile to download that and then link it. The current generator (the one of this pull request) is written in Crystal so it has no dependencies. We could write dsfmt in Crystal, but I'm not sure we can just now because it might use SIMD instructions.

Also, this generator generates 10_000_000 numbers in 0.037 seconds, which I don't think it's too bad (In Ruby it takes 0.85 seconds). So having a simple, efficient enough, free of dependencies generator as the default one is good, I think. But later if you/us implement dsfmt we can replace the default one if it is much better :-)

@papamarkou
Copy link
Contributor

I have pretty much finished the implementation under scidom/random.cr; what is primarily missing at the moment is just to provide automatic seed creation (have been way too busy with some software I code for my department and that's why I haven't found the chance to do this yet). I agree with you, in general it is better to avoid inducing external dependencies; is the current PR based on an external C lib or is it enirely native? If the latter is true, then I agree it should be preferable.

I will try to complete random.cr and compare the speed of the two approaches in the near future. Happy new year :)

@rhysd
Copy link
Contributor Author

rhysd commented Jan 17, 2015

@asterite

Thank you for the merge!

  1. Some languages provide a single Random class.

As advance knowledge, random number generators have some trade-offs. For example, MT19937 generates very high-quality random numbers but its internal state is big and generating a number is a bit heavy. Xor128 generates good-quality random numbers and the generation is fast but it is difficult to choose initial seed. So it is good idea that users can select a random number generator.

However, as you said, it should be done by language or its library and it depends on a judgement of implementer of the language. So I think it is good to follow your idea 'I don't have to stop and think which generator I need.'. For example, Ruby has the same idea and Ruby's random module has only mt19937. Matz thinks that other RNG should be implemented as a library. And D implements many RNGs because it intends to be used for things where choosing RNG is a severe problem (e.g. games).

  1. In every random implementation there is some paragraph concerning thread-safety.

Random number generator has its internal state and updates it when generating a number. So, using the same generator by multiple threads causes a data race. In order to make all number generations work well in multi-threads, generating number must be guarded by mutex.

I think, singleton method rand, srand, Random::rand and Random::srand should be thread-safe because they use DEFAULT_RANDOM and they should be used casually from multi-threads. However, instance methods Random#rand and Random#srand should not be thread-safe and the thread-safety of number generation should be guaranteed by the programmer. It is because locking the entire generation of random number may make performance issue. If making the instance of random number generator, I think it is good idea that programmer guarantees the thread-safety of the generator.

Another question: should we include the copyright notice of the original source code

I referred this C implementation heavily and its license says:

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

and the condition says:

  1. Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.

It means that below copyright notice is needed, I think.

Copyright (C) 1997 - 2002, Makoto Matsumoto and Takuji Nishimura,
All rights reserved.

@jhass

Your idea 'merging those into one interface' looks good for me. I don't know why Ruby's module is isolated like Random and SecureRandom...

@Scidom

Oh, sorry that I don't know your work. I'm interested in your implementation of mt19937. I think your good implementation should be merged into this PR's implementation. Generally, I think a specific language's library should be written by its language because improvements of the language reflects its library directly.

@papamarkou
Copy link
Contributor

@rhysd, I agree with your opinion; it is better for many reasons to implement std lib functionality natively (in Crystal in our case). At the moment, the dSFMT implementation is a wrapper around the corresponding C library. Nothing stops us from actually implementing dSFMT natively; it would only take some more studying of the original paper, but this would be good anyhow because it would result in a better understanding of the underlying algorithm.

@papamarkou
Copy link
Contributor

One more thing @jhass; if you are interested in adding one more random generator that is oriented towards safety, therefore being more suitable for web dev, you may look at the Isaac generator. It is fairly easy to implement it in any language (Rust has a native implementation of it).

@sdogruyol
Copy link
Member

Just learned about Mersenne Twister today. Thanks a lot @rhysd 🙏 We really miss you 😢

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants