The Twisted Road to Randomness

A Topic Revisited, a Code Revised


by Scott Robert Ladd
4 January 2004
 

Index
  Randomness by Deviation
  Randomizing Algorithms
      Linear Congruential
      Mersenne Twister
  Seed Values
  Implementations
      C++
      Fortran 95
      Why No Java?
C++ Downloads
  libcoyotl-3.0.0.tar.gz Fortran 95 Downloads
  mtprng.f90 (generator)
  stdtypes.f90 (types)
  test_mtprng.f90 (testing)

I originally presented this article in early-2002; in the time since, my implementations of the Mersenne Twister have evolved along with my needs and new knowledge. I've updated this article to reflect major changes in the code, including a vastly improved Fortran 95 code.

Random numbers are a serious subject, and deserve careful consideration. Genetic algorithms, games, statistical tests, and simulations all rely on a program's ability to generate "random" numbers. Unfortunately, most programming languages included built-in "random number" functions that prove inadequate for applications where lots, and lots (and lots!) of random values need to be generated. Even a simple genetic algorithm requires thousands of random values -- and some of my applications use billions of random numbers in their calculations. In such circumstances, a random number "generator" must not produce repetitive or cyclical values.

To understand the issues at hand, let's begin by looking at why I use quotes around words like "random" and "generator".

Randomness by Deviation

By nature, a "random" value cannot be predicted. Randomness sounds simple, but it isn't -- and while the human mind has been known to be unpredictable, it isn't very good at generating a completely unrelated set of numbers. Try creating a list of twenty random integers in the range 1 through 100. Are you certain that your numbers are really random, and not simply fragments of old telephone numbers or checkbook balances? And wouldn't it be tedious if you had to generate a thousand -- or a million -- random numbers?

Such repetitive tasks seem well-suited to a computer program -- until you consider that programs consist of algorithms, and algorithms, by definition, produce a predictable output based on a given set of parameters. The phrase "random number algorithm" is a bit of an oxymoron; truly random numbers can't be generated by algorithm, since the last thing we want is something predictable!

The best we can do with a computer is create an algorithm that appears to generate a random sequence of numbers. Such pseudo-random numbers (PRNs) aren't really random -- a human with a sharp mind or a calculator could predict the numbers in the sequence by following the process defined by the algorithm. But a pseudo-random sequence of numbers can be carefully constructed as to be very "unobvious" and very difficult to follow; a human looking at the values should not be able to see any pattern to them. While we can't have what we really want (true random numbers), a good pseudo-random number generator (PRNG) will suffice, both statistically and aesthetically.

Herman D. (Skip) Knoble, a Research Associate at Penn State University, suggested that I use the term "quasi-random" in place of "pseudo-random". To quote from his e-mail: In your paper, the use of the phrase "pseudo-random" may better be served by the phrase "quasi-random". This is not my idea, but rather it was attributed to Tausworthe quite a while back in: Collings and Hembree, JACM 33(4), Pages 706-711, OCTOBER 1986 (or was it Lewis and Payne, JACM 20(3), Pages 456-468, July 1973).

In my opinion, "psuedo" is a better term, as "quasi" already has a meaning in reference to generated sequences. John Savard (who runs an interesting site of his own) wrote me to say: The book Numerical Recipes gives examples of quasirandom number generating algorithms; these are sequences that are in some sense more smoothly equi-distributed than random numbers; thus, they deliberately fail some statistical tests, in order to more uniformly fill a sample space for short Monte Carlo runs.

Different applications require different "flavors" of PRNs. A cryptographic application, for example, requires that xn + 1 cannot be predicted from xn. Other "random" algorithms include Monte Carlo methods, random bit generators, and scores of specialized sequences for particular applications.

In this article, I'm concerned with the needs of stochastic algorithms, games, and simulations. For those applications, a useful PRNG produces a sequence of values such that every number in a given range has an equal chance of being produced. This is what numericists refer to as a uniform deviate.

"Randomizing" Algorithms

Most programming languages implement a PRNG via a pair of functions. One function initializes the generator with a seed value, providing a starting point for a calculated sequence; the other function performs a series of mathematical operations on the seed, generating a pseudo-random number. The result of the second function is returned and becomes the next seed value.

Many researchers have devoted copious time to inventing and analyzing algorithms for this purpose, with the goal of producing the most unpredictable sequence of values. Designing a good random number generator involves solving two problems:

  • Increasing the period (size of the repetition cycle.)
    As the algorithm is applied, the seed will eventually return to its starting value, and the values begin repeating themselves. An algorithm that repeats after generating a trillion numbers is more useful than an algorithm that repeats itself after only a hundred values.
  • Avoiding predictability.
    An example of a useless random number generator is one that always returns values with the same last digit. In general, any obvious patterns in the output render a generator worthless for stochastic (non-deterministic) algorithms.

Two algorithms fit the both of those criteria; one is very traditional and highly-studied, and the other algorithm is a very recent invention that may just be the best pseudo-random number generator around. Let's start with the traditional technique.

The Linear Congruential Method

Though many fancy and complicated algorithms can generate pseudo-random numbers, the most commonly-used algorithm is also one of the simplest. First introduced by D. Lehmer in 1951, the linear congruential method involves only two multiplicative operations. The formula is:

Ni+1 = (a · Ni + c) mod m

This is an iterative formula: For any pseudo-random number, Ni, in the sequence, this formula shows how to compute Ni+1, the next number in the sequence. Different values of the parameters a, c, and m produce different sequences. The choice of parameters determine the "randomness" of the generated values and the number of iterations that can be performed before the values start to repeat. The maximum repetition period is m, but not every combination of a, c, and m produces a maximal period; in fact, most factor sets produce useless sequences. For example, if you set all three parameters to one (1), the algorithm will simply count by ones!

Numerical research by S. K. Park and K. W. Miller has identified a theoretical "best" set of parameters. For the linear congruential algorithm to be effective, a and m can take on only a very few values, with m most certainly being prime. Park and Miller identified the parameter values a = 16807, m = 2147483647, c = 0 as producing the most statistically-random values for 32-bit signed integers. For producing 16-bit values, a good set of parameters is a = 171, m = 30269, c = 0. If an application requires 32-bit numbers, Park and Miller suggested a = 42871 or a = 69621.

If N is large enough, multiplying by another large value exceeds the maximum value of a 32-bit integer, causing an arithmetic overflow. To prevent overflow, we can use an approximate factorization of m, based on the formula known as Schrage's Method:

q = m / a
r = m mod a
k = ni / q
ni+1 = a * (ni - k * q) - r * k

With c set to 0 (zero), the linear congruential algorithm becomes (a * n) % m. To eliminate the problem of overflow, Schrage's Method precalculates the constant ratio of a / m, and uses that factor to reduce the size of both the seed and the constants.

Elaborations on the linear congruential method exist, including various techniques for combining a pair of generators to produce longer sequences. I've implemented Pierre L'Ecuyer's algorithms in many of my books -- and these techniques are, indeed, effective in producing large sets of uniform deviates.

A terrific source of information on random numbers is pLAB, a website maintained by Peter Hellekalek at the University of Salzburg's Mathematics Department.

The Mersenne Twister

So am I going to present implementation of L'Ecuyer's algorithms here? No.

In 2001, I was introduced to a quite different algorithm out of Japan... and it has worked exteremely well for me, even if it hasn't the long analytical history of the linear congruential generators. The Mersenne Twister was invented by Makoto Matsumoto and Takuji Nishimura; their website includes numerous implementations of the Mersenne Twister, and I direct you to their original paper (postscript, pdf) for the formal description of the algorithm.

Essentially, the Mersenne Twister is a very large linear-feedback shift register. The algorithm operates on a 19,937 bit seed, stored in an 624-element array of 32-bit unsigned integers. The value 219937-1 is a Mersenne prime; the technique for manipulating the seed is based on an older "twisting" algorithm -- hence the name "Mersenne Twister".

John Savard provides an excellent description of the algorithm here.

One of the appealing aspects of the Mersenne Twister is its use of binary operations -- as opposed to time-consuming multiplication -- for generating numbers. The algorithm's period stems from the size of it's shift register -- 219937-1 (~106001), which is quite large when compared to a period of ~108 for the best variants of the linear congruential methods. For most practical purposes, the Mersenne Twister doesn't repeat itself. And the basic algorithm generates 32-bit integers, which provide a greater granularity than 16-bit linear congruential generators.

Granularity allows finer distinctions between individual values and is particularly important when dealing with floating-point numbers. PRNGs output integers; to obtain a floating point number -- say, between 0 and 1 -- divide the random integer by the maximum possible value for that size. When using a 16-bit PRNG, the biggest difference in floating-point values is 1/216, or 0.000015, while a 32-bit PRNG provides a granularity of 1/232, or 0.00000000023. While a 16-bit generator is really only useful for the 7-digit precision of a 32-bit IEEE floating-point number, a 32-bit PRNG produces values suitable for the 64-bit IEEE "double" type. Smaller granularity means a broader range of results, which is required by stochastic algorithms that solve large problems to a fine degree.

Seed Values

All PRNGs start with a seed, some initial value that initializes the algorithm. For each unique seed value, the algorithm should produce the same sequence of PRNGs, repeating a sequence for testing or comparison purposes. For games and other tasks, it is useful for the seed to be an unknown value, such that a new sequence is produced for every execution of an application. The more unpredictable the seed, the less likely it is for a sequence to be predictable or repeated.

Of course, if we haven't initialized the PRNG, where do we get a "random" seed from? The current time, expressed as some sort of integer, is the simplest and most common seed value. However, the granularity of computer clocks may leave something to be desired; for example, many programming languages only report the time in seconds. A fast application might request two seeds in less than a second, producing two identical random sequences. Such problems might seem unlikely, to be sure -- but I've had it happen to me, when using two different PRNGs in the same algorithm, with identical seeds derived from time values.

On some Unix systems -- including Linux -- a useful source of seeds is /dev/random and /dev/urandom. Implemented by the random device driver, these psuedo-files contain values derived from stochastic sources such as keystroke timings, maintaining an "entropy pool" used to generate random values. The /dev/random device returns random bytes based on the available bits in the entropy pool; if the pool is empty, reading /dev/random will block until bits become available. If you require an unlimited set of values, /dev/urandom uses an algorithm to generate as many random bytes as you request. In general, use /dev/random for obtaining seeds.

View the manual pages for /dev/random and /dev/urandom with this command:

$ man 4 random 

Implementations

Most programming languages provide PRNGs of varying (and usually low) quality. I remember one BASIC interpreter that produced alternating odd and even numbers! Below, I detail my impressions of the random numbers generators included with C++, Java and Fortran. I've implemented the Mersenne Twister in C++ and Fortran 95; while there already exist versions of the algorithm in those languages, none of thesemet my criteria, and some of the implementations contain language and logic errors.

If there is sufficient interest, I'll translate my code into other languages, such as Python.

C++

Standard C (and thus Standard C++) explicitly defines the following linear congruential generator for implementing the rand and srand functions:

static unsigned long next = 1;

int rand(void)
{
    next = next * 1103515245 + 12345;
    return ((unsigned int) (next / 65536UL) % 32767UL);
}

void srand(unsigned int seed)
{
    next = seed;
}

That algorithm is a slight elaboration on the basic linear congruential algorithm, in that it uses a long (32-bit signed integer) for the seed, but returns only a positive int (16-bit signed integer).

I generally don't use rand in serious applications, for many reasons.

  • Since srand and rand are two separate functions, the "seed" is a global variable. It's best to avoid global variables, even those that can be hidden using the static keyword.
  • With only one seed value, only one sequence of pseudo-random numbers is generated in a program. Often, I like to have separate random number generators for different elements of a program.
  • The ANSI rand function returns values between 0 and 32,767. In most cases, I want to retrieve random values that are within a specific range, say from 1 to 100, or between 0.0 and 1.0. Dividing the result of rand() by 32,767 produces floating-point values with a granularity of approximately 0.00003 -- much too large for many applications. 
  • Some mathematically-inept compiler vendors try to improve on rand, using cute little byte-swapping tricks that only reduce the period of repetition!
  • Statistically, even the best linear congruential generators suffer from convergence in their numeric sequences. The ANSI generator is not the theoretical best.

I've implemented the best linear congruential generators in my books -- but in my future work, I intend to use the Mersenne Twister because of it's long period and performance. The source files mtprng.h and mtprng.cpp define and implement a class-based generator that I'll be using in articles on this site.

Fortran 95

You'll find my Fortran 95 implementation of the Mersenne Twister in mtprng.f90; you'll also want the support file stdtypes.f90, and the test program test_mtprng.f90.

In early 2002, I wrote my first Mersenne Twister in Fortran 95; I had just begun working in Fortran again after a decade-long hiatus. I'd spent most of the 1990s working in C, C++, and Java -- but my career began with Fortran, first on DEC PDP-11 system and later on the VAX. Given the limited options for Fortran employment and the lack of interest by publishers in books about "obsolete" tools, it took parallel programming to bring me back into the Fortran fold.

I'm a programming polymath and agnostic; in my opinion, the best tool for the job should be used, based on available expertise and technical considerations. The current trend toward forcing every application into Java or C++ is simply wrong and ignorant; much as I respect and like C-derived languages, they have inherent weaknesses in expressiveness and logic that can't be easily glossed over with fancy coding and extensions. I see no point in writing arcane template code in C++ to accomplish a task that can be described succinctly and efficiently in Fortran 95.

Yes, I'm impressed by Fortran 95, and I'll likely write an essay about why it and other languages need to be promoted, used, and preserved for the good of computing in general. However, this article is about the Mersenne Twister, and it is to that topic I return.

In the last two years, I've written a lot of Fortran 95, and my understanding of the language has improved. My original implementations suffered several faults, one of which constituted a bug. For example, the original module has a single state; the current version employs a state structure to allow the coexistence of multiple generators.

Also, my original code lost some precision and may have had other errors due to my poor use of Fortran's signed integers. Fortran 95 specifies a symmetric integer model, where every negative number has a positive counterpart. From a mathematical standpoint, this makes sense -- yet it also renders the smallest negative two's complement value (2b-1) invalid. In Fortran 95, -2,147,483,648 (-231) is not a valid 32-bit constant (although it can be a valid integer value). The reference implementation of the Mersenne Twister, however, is written in C, which imposes no such restriction (other than to say that abs(-231) is undefined). This is perhaps a cosmetic matter, and the same value can be attained by subtracting 1 from -2,147,483,647) -- however, Fortran also declares that binary operations on the sign bit are platform-specific.

I was unaware of those issues, and blithely implemented a Mersenne Twister using the constant -2147483648. And for over a year-and-a-half, the code worked, with both Intel and Lahey compilers. I note that other Fortran implementations also use such out-of-range values, or they implement special bit manipulation routines.

Then I involved myself in the GNU gfortran project; gfortran rejected the constant -2147483648 as invalid. Indeed, the "free-as-in-liberty" compiler used a strict interpretation of the Fortran 95 Standard (section 13.7, to be precise). Faced with both forum debates and uncertainties, I reexamined the Mersenne Twister module in detail, and, in the end, largely rewrote it. I modified the code to use a state structure, such that I can use several generators at the same time; I also fixed a few stylistic problems, and integrated the code with my forthcoming Hypatia library.

I solved the integer range problem by storing the seed components in the lower 32 bits of 64-bit integers. Storing the seed in 312 64-bit elements was not a solution, since I would then have had to work around the invalid constant -263. Nor did I want to implement additional bit manipulation rotuines that obscure the underlying algorithm. In the end, the 64-bit elements merely waste a very few bytes of memory in exchange for algorithmic clarity and simplicity. Performance remains excellent.

Why No Java?

The java.util.Random class defines a good linear congruential generator based on Donald Knuth's presentation in The Art of Computer Programming, Vol 2. Random describes self-contained objects, so multiple generators can exist simultaneously.

Java, like Fortran 95, does not implement unsigned integers, although it does support shift operations that treat signed values as unsigned. Still, I haven't had much incentive for implementing a Java version of the Mersenne Twister. I don't do much experimental work in Java these days; for the most part, my Java work is web-centric or business-oriented, where random numbers just don't come into play.

Onward

Software development is often a choice between "rolling your own" code and using default tools provided by compiler or toolkit vendors. Programmers, creative beings that they are, find it tempting to create something "better", even when such effort provides minimal benefit at the expense of extra development time. In some cases, though, standard libraries and language definitions can be inadequate for a given task, lacking capacity, performance, or capability. The trick is in balancing development time versus real, identifiable needs in your software.

In the case of random numbers, I found several inadequacies in the standard generators for C, C++, and Fortran, leading me to create the PRNGs presented here. I've been quite happy with the results, and will be using these tools in the forseeable future.

As always, I'm interested in your comments.

- Scott

 
Send E-mail

Consulting Services
Scott's CV

FAQ
Scott's Books
Reviews
Bibliography

Privacy Policy
Legal Stuff



©  2008
Scott Robert Ladd
All rights reserved.
Established 1996