Home
Blog
Pseudo-random number generators: Overview Analysis
Prime Numbers
Cisco Passwords
Bandwidth Calc.
Epoch Time Conversion
2-letter country codes
Comms line speeds


Off-site links
Squish's DNS Checker

PSEUDO-RANDOM NUMBER GENERATORS



(REPORT II)



A report submitted in partial fulfilment of the requirements for the degree of Bachelor of Science

Z. O'Connell (950786535)



Department of Information Systems and Computing,

Brunel University

June 1999



Table of Contents



Table of Contents 2

1 Background 4

1.1 Definition and Scope 4

1.2 Objectives 4

2 Overall design 5

2.1 Structure 5

2.2 Files 6

2.3 Testing 7

3 Testers 8

3.1 Introduction 8

3.2 API 8

3.3 Library routines 8

3.4 Tests implemented 9

3.5 Unimplemented tests 12

3.6 General observations and conclusion 12

4 Generators 13

4.1 Introduction 13

4.2 API 13

4.3 Library routines 14

4.4 Implemented generators 15

4.5 Unimplemented Generators 19

4.6 General observations and conclusion 19

5 Front-ends 21

5.1 Tester 21

5.2 test-static 21

6 Conclusions 22

References 23

Appendix A - Evaluation 24

A.1 Project Overview 24

A.2 The Approach 24

A.3 Possible Future Work In The Area 24

Appendix B - Tables of results 25

B.1 Empirical test results 25

B.2 Speed test results 26

Appendix C - Source Code 27



1 Background



1.1 Definition and Scope



There are many known algorithms for generating numbers which are, or at least appear to be, random. There are also a large number of tests to check that what is produced by such generators does closely resemble a random stream. There is not, however, a currently freely available suite of generators and testing routines allowing the user to eaisly define new generators and testers for experimental purposes. This project implements the framework for such a system.



It would be impossible in the timescale to study every possible generator and test procedure available. The project does however implement the most common of these and it should be very simple to add new libraries as required.



1.2 Objectives



There are two main objectives:

To produce a portable set of pseudo-random number generators that can be easily used as part of any application in a variety of programming languages

To test the random number generators for speed and randomness



The main consideration when writing these generators is portability and usefulness. Although in many cases the generators are coded in such a way that they are slower than they could be trying to speed up the generators would make the code less readable. The main suite of random number generators/testers currently available, Diehard, [9 ] consists mostly of highly-optimized FORTRAN code which has then been automatically converted to C. Even with a detailed knowledge of how the generators and testers work it is usually impossible to figure out how the code works or adapt it for other uses.



To achieve the objectives, we can break the problem down into three tasks which need to be considered:

The generator library and a well-defined API (Application Programmers Interface)

The testing library and a well-defined API

A front-end that can be used to test the generators



2 Overall design



2.1 Structure

Each generator or tester is presented as a stand-alone file that, with the exception of combination generators, does not rely on other generators. These libraries are dynamically loadable at run time or can be compiled into a program, allowing the choice of generator to be specified by the programmer or end-user as desired. There is also a central library of functions used by all the libraries to avoid excessive duplication of code.



The code is all written in C. There are several reasons for this choice of language. Firstly, it is fast. Most other languages are harder to optimize and perform bounds checking on arrays and variables which, while normally desirable, will slow the generators down quite significantly. Secondly, it is portable. C code can be compiled on almost any computer currently in existence. This allows most of the generators, except combination ones which rely on dlopen() as discussed below, to be compiled on virtually any platform. Finally, C code is easy to build into libraries and most programming languages allow C code to be called from within them.



The development platform is Unix, (Linux primarily, also tested on Solaris 2.5.1) however as the code is written with portability in mind and it should be easy to compile on other systems with only minor modifications. There are two notable exceptions to this - dlopen() and gmp. Dlopen() is used to load libraries dynamically and is only available on some systems. It is used primarily by the front-end tester but also by some of the "combination" generators. The combination generators could be modified to link in their component generators statically although this would take quite a bit of work to allow all the generators to have "alternative" function names when used as component generators. Combination generators perform poorly both in terms of speed and randomness however so this doesn't seem worthwhile.



The ICG generator uses the GMP GNU multi=precision math library for doing inverse multiplication. Although this library will compile on the majority of systems it is not generally installed as standard.



Both the generators and testers work on bit-streams rather than floating point numbers. Although most older random number generator systems generate floating point numbers they are harder to analyze and reliance on floating point math would slow down the generators. It would be trivial, if required, to convert the output from this API to a floating point number. Care should be taken to ensure that the granularity of the numbers obtained from a generator is sufficient however - clearly taking a single bit at a time is going to produce poor results if three decimal places of randomness is required.



2.2 Files

Makefile Instructions for building the programs (See below)

*.c Source code

*.so Compiled dynamic-link libraries

librnd_name.* Generator "name" (See section 4.4)

libtest_name.* Tester "name" (See section 3.4)

random.h Global function definitions, included by every generator.

librandom.* Useful routines used by many generators and testers. (See sections 3.3 and 4.3)

tester Simple testing program that loads in generators and testers dynamically. (See section 5.1)

test-static Statically linked example showing the display tester and the Mersenne Twister generator. (See section 5.2)

random.dat Random source used for the file generator and as a seed by other generators. (See below)



Each generator/tester is a separate source (.c) file and compiles into a separate object (.o) or library (.so) file.



2.2.1 Makefile

The Makefile is used to compile the code by simply typing make on a Unix system. make -k should normally be used to force make build all the libraries even if some fail as the ICG generator will only compile if the gmp library is present. There are only two things that need to be changed in the Makefile in the normal course of events. The first is the CCOPTIONS line. Normal options would be -O6 -Wall, indicating full optimization and all compiler warnings. Other useful options are -DVERBOSE to turn on extra output in generators and testers and -g instead of -O6 to enable debugging symbols in the output files for use with debuggers such as gdb. -DQUIET can be used to suppress all output from testers. This is for use when testers might want to be included in a larger program although isn't the most common case and therefor isn't the default option.



If desired, the libraries compiled into test-static can be easily changed here, demonstrating how easy it would be to incorporate a new generator as part of a program using this API. Debuggers don't handle dynamic linking particularly well so in most cases it is eaiser to link a specific library statically as part of test-static for debugging.



Typing "make clean" will delete all the compiled libraries and executable.



The variable LIB_GCC is a workaround for an apparent bug in the gcc and egcs compilers. For some reason, under certain circumstances involving dynamic linking not all the correct functions for unsigned long long math are included in librandom.so. Adding the libgcc library specifically resolves this problem.



2.2.2 Random.dat

For a "true" random source and as a seed for some generators, this file was downloaded from a hardware source on the Internet. [10] The web site doesn't allow large downloads in one go, so around a hundred separate files were downloaded and concatenated together to produce one larger file.



2.3 Testing

Although the generators are tested by the testers, the testers themselves need to be validated to ensure they give the expected results. To do this, we can use generators known to be "perfect'', such as cryptographic generators, which should not fail any statistical test. There is a small probability that a truly random source will produce non-random output however this possibility is very small given the amount of data being generated and this problem was not encountered.



There are also certain expected results for most tests, as almost every paper discussing a testing mechanism gives several tests and the results obtained by the author. However,there are many ways of implementing certain tests and some generators do fail tests they have been shown to pass in other situations.



3 Testers



3.1 Introduction

The testers are presented as separate libraries which can either be linked in with a program at compile-time or, if supported on the specific platform, dynamically selected and loaded at run-time. The output is a boolean pass or fail indication for a particular generator, although all the testers produce additional human-readable text to showing what results were produced. It is not expected that testers will be called automatically by programs using the libraries, however in same cases this may be desirable. Some applications may wish to produce a different set of random numbers each time without saving the state of the generators between each run. However, some "seed" values for generators may produce poor output. A good tester, such as Maurer, can be quickly applied to a particular generator with a certain seed value to check if it produces sane output.



3.2 API

There are only two functions required for the tester modules:



int rantest_test(void *state, getbits function)

char *rantest_info_text(void)



Rantest_test takes a pointer to a generators state array and a pointer to a getbits function and produces a single boolean result to indicate pass or fail. All of the testers produce some sort of diagnostic output however giving actual p-values where appropriate, and if VERBOSE is defined in the Makefile they will usually give details of the internal counters. Verbose output is usually unlabelled however and only really interesting for debugging purposes.



Rantest_info_text simply copies a descriptive string into scratch and returns a pointer to that. It is implemented this way, rather than by a static text string in the generator, so that extra dynamic information on the test library could be output, as with the generators, although this is never actually used.



3.3 Library routines

There are two routines in librandom.c called by testing routines. These are:



double ChiSqPN(unsigned long df, double x)

double KolSmirP(unsigned long df, double x)



ChiSqPN takes degrees of freedom and a value and calculates the p-value for the Chi-squared distribution from x with df degrees of freedom. The code is adapted from poorly documented Javascript by Ritter [8] as no other readily-implementable formulae or algorithms were available. This routine works by normalising x and then passing that to NormalP which actually calculates the p-value for the normal distribution using a fairly simple formulae.



KolSmirP is also adapted from code by Ritter but is far more complicated and difficult to understand. Similarly to ChiSqPN it calculates the p-value for the Kolmogorov-Smirnov distribution from x with df degrees of freedom. Unfortunately, the Kolmogorov-Smirnov distribution has even less information available about it than the Chi-squared distribution and the source of the underlying algorithm isn't known.



3.4 Tests implemented



3.4.1 Display tester

This "display" test simply prints the first 100,000 16-bit numbers from the generator being "tested". This is for manually checking the output of a generator is sane, (No leading zeros/all zeros output) doesn't have an incredibly short period and for manually checking that a generator is iterating according to the correct formula.



3.4.2 Speed tester

The speed test measures how fast the generator can produce 2,000,000 16 bit numbers and the results are presented in Appendix A. Three platforms were measured, Intel, Sun and Alpha, although not all platforms were directly available so only Intel results are given in full. The test is slightly unfair in that generators that produce results in multiples of 16 bits will perform better as rangen_generic_getbits is faster when bits are being requested in multiples of the generator bits per iteration. The reflects how the generators would probably be used in practice however. There are few surprises here, the generators perform as would generally be expected and in roughly the same order on all platforms. The file generator appears slightly faster than all zeros on Intel because it loads a 64-bit value into STATE->output every iteration whereas the zero generator only loads 32 bits a time.



The tester actually measures the CPU time elapsed for the process rather than the actual time elapsed so even on a fairly heavily loaded machine results will be accurate. On faster generators, however, the granularity of the system clock seems to cause a problem and affect results by about 5%. The compiler used can make a more significant difference and can affect results by as much as 50%. On all platforms, the tests used the gcc compiler. On the sun platform however the sunpro compiler was found to be between 30% and 300% faster than gcc for various generators.



The speed of a generator can heavily affect the choice of generator for an application - a generator such as the linux /dev/random, whilst highly random and unpredictable, is far too slow if tens of millions of random numbers are required.



The speed test always passes a generator.



3.4.3 Chi-squared

The simple statistical chi-squared test fails a surprising number of generators given that it is independent of ordering. We test ten million 16-bit values however, which will tend to cause any generator with a period lower than a few million to fail badly. Almost every generator which fails another test also fails Chi-squared although this is more likely to be coincidence that poor generators also have shorter periods that are detected by Chi-squared. Generators with short periods fail because slight deviations from the expected distribution are exaggerated as they loop and give the same values again.



Chi-squared is a two-tailed test so a low value is as bad as a high one - LCG 2 fails as the values are too close to the expected distribution to be random. A generator passes this test if 0.005 < p < 0.995 - i.e. it passes at the two-tailed 1% level.



Because this generator is "two-tailed" it detects sequences that are "too random" - i.e. that fir the expected distribution improbably well - as well as those that are not random enough. LCG 2 exhibits this property - it gives a Chi-squared value of zero.



3.4.4 Kolmogorov-Smirnov

The KS test is really just a weaker version of the Chi-squared test and fails most of the generators that Chi-squared picks up. Each result is one-tailed - I.e. it only fails if too high not too low - but two values are produced and the test fails if either are too high. This tester is weaker than Chi-squared primarily because calculating the p-value for ten million degrees of freedom is too time-consuming computationally whereas the million degrees of freedom used are much faster. This test and other tests based on it passes the test if p < 0.99 - i.e. it passes at the one-tailed 1% level.



3.4.5 Gap test

The gap test measures the distributions of gaps between occurrences of a number fits the expected distribution. This test actually measures using chi-squared the distribution of gaps between each possible 8-bit number individually and then performs a Kolmogorov-Smirnov test on the results. Because of this the test proves to be very powerful - it picks up flaws in generators that every other test passes. LFSRs and FCSRs failing is a surprise, although looking at debug output by uncommenting the commented out code lines in the tester shows that the two linear feedback generators are failing due to extremely long gaps between certain numbers. (This produces a lot of output)



3.4.6 Poker test

The poker tests measures occurrences of the same digit in a four-digit hexadecimal number generated by the generators. Because we are doing this test on bit streams rather than floating point numbers, any non-randomness here should also be picked up by the Chi-squared test and this is indeed the case.



3.4.7 Runs up/runs down

This test proved very difficult to implement as Knuth's description for calculating the matrices required for this test isn't clear. The matrixes, a and b, are hard-coded into the program however as they do not change when other parameters of the tester change. This tests fails weak generators as expected and is reasonably powerful. One surprise is that it fails the FCSR generator, which no other tester other than the gap test fails. The two results this tester produces are Chi-squared vlues for runs up and runs down.



3.4.8 Runs above/below the mean

This is a very weak test and fails only the poorest of generators. This generator produces four outputs - a pair of KS results for each of runs above and below the mean. This test doesn't fail the zero generator because of a peculiarity in the formulae which will only occur when the entire input is above or below the mean.



3.4.9 Maurer's Universal Statistical bit test

Maurer's test almost lives up to his claim of being able to detect any flaw in a generator detectable by other means although fails to spot flaws spotted by the gap tester. Unfortunately, his original paper lists an incorrect expected value when calculated on 16-bit numbers, so the test was conducted only on 8-bit. Maurer's paper lists the expected value as 15.1670.001 whereas experimental evidence strongly indicates a value closer to 15.159. The version of the code implemented is a hand-converted C version of Maurer's Pascal code [5] and performs flawlessly on 8-bit numbers so the error appears to be in the expected values provided. The error appears to be mathematical, not typographical, as the same value is repeated elsewhere within the paper. Unfortunately, calculating the value to the required precision for testing a million random numbers (Maurer's code only tests 20,000 8-bit numbers) requires 200,000 iterations of the formula which to achieve in a reasonable amount of time requires more computing power than readily available. (A bc script to calculate it had not completed after over two days of running on a PPro200 when the machine was shut down)



3.4.10 File tester

This tester simply writes 2 million bytes of the output of the generator as binary data to a file called output.dat. This file can then either be renamed for use as a seed/input for other generators or used for some other purpose.



3.5 Unimplemented tests



3.5.1 Serial correlation

This test isn't implemented as no exact version of it appears to exist. Knuth gives a version but only gives conjectured formulae for the expected values which appear not to fit well with actual empirical results. This test is not included in modern random number packages and appears to have fallen into disuse, possibly because it measures similar properties to other non-statistical tests such as the run and gap tests.



3.5.2 Marsagla's "Stringent" tests

Without further analysis, Marsagla's tests are not easy to implement. The information supplied with Diehard [9] is insufficient to easily figure out the algorithm being used for some tests. Unfortunatly, as has been mentioned previously, the code in Diehard is totally impenetrable and gives no clue as to the authors intentions.



3.6 General observations and conclusion

Implementing some of the testers, particularly the more common "Knuth" types, proved quite difficult due to conflicting and inaccurate literature. For example, Banks et al. Give the number of degrees of freedom for a gap test as the sample size whereas in practice it is actually the number of discrete values the output can have. (256 for an 8-bit output) The error in Maurer's paper, mentioned previously, is another example of this problem. Peer-review of papers in this area appears poor, possibly because most current work concentrates on cryptographic generators and there is currently little interest in non-cryptographic implementations.



When testing new generators, it would seem sensible to test using Maurer's test or the Gap test first as they are the most powerful tests which are more likely to reject a generator than any other if it is flawed. The generator should then be tested against the other, less powerful, tests if desired.

4 Generators



4.1 Introduction

The generators use a similar API to the testers. The internals of the generator are normally kept hidden from the calling program so that generators can be changed and the program relinked with no modification. There are routines for changing the internal variables of a generator however and these can be used by programs with a greater knowledge of the internals of a specific generator or by a user to select different values for the generator, either for testing purposes or because specific values might be more appropriate for a certain applications. Programs can get values from a generator by means of a routine which extracts a specified number of bits at a time. There is also an information routine in each generator which will output a textual description of a generator.



4.2 API



There are many more routines used by the generators than there are for testers. These are:



char *rangen_info_text(void *state);

unsigned int rangen_info_statesize(void);

void *rangen_create(void);

void rangen_init(void *);

void rangen_iterate(void *state);

char *rangen_info_getparametername(unsigned int number);

unsigned long long rangen_info_getparametervalue(unsigned int parameter void *state);

void rangen_init_setparameter(unsigned int parameter, unsigned long long value, void *state);

unsigned long long rangen_getbits(void *state, unsigned int bits);



In addition, the state structure should begin as follows:



struct state_struct {

unsigned long long output;

unsigned int outputbitsavailable; };

Rangen_info_text takes a pointer to generator state and returns a pointer to a string giving details of the generator. If VERBOSE is defined in the Makefile this can include details of the current generator settings. Rangen_info_statesize returns the current size of the state array which can be used to make a copy of the generator or, if desired, save it's current state to disk. Generators that cannot be copied or saved to disk in this way return 0.



Rangen_create allocates memory for a generator state and returns a pointer. It will also usually fill in some reasonable default values for the generator. Rangen_init should be called once all parameters have been set using the state pointer to initialize any arrays and values before rangen_getbits is called for the first time. If parameters are changed and rangen_init is not called before the next call to rangen_getbits, behavior is undefined. This will make some generators crash as checks for this are not performed in the interests of speed.



Rangen_info_getparametername returns a pointer to a string describing parameter number. If number is higher than the maximum parameter number, NULL is returned. If the first three letters of a parameter are "Alg" (Short for algorithm) the value required is a pointer to a generator function and the next parameter is a pointer to generator state. Rangen_info_getparametervalue takes a state pointer in addition to a parameter number and returns the value of that parameter. Similarly, rangen_info_setparametervalue can be used to set a value. If parameter zero is available it should be a value appropriate for "seeding" the generator. No guarantees are made as to how well any particular seed value will perform however.



There is no sanity checking performed on values selected in this way and unpredictable things (Like crashes) could happen if invalid values are entered.



Rangen_getbits is the most-used routine and returns an appropriate number of bits from the generator. In all the generators implemented her this is simply a call to the library routine rangen_generic_getbits discussed below. Rangen_iterate is intended primarily for use by librandom for putting a new value into STATE->output. STATE->output always contains the current generator output and STATE->outputbits the number of bits of random number left in STATE->output.



Throughout the generators, STATE is used as a shorthand for ((struct state_struct *)state). This is because the state is always passed as a void pointer so that programs without knowledge of the generator internals can pass it around and thus it requires explicit typing every time it is referenced.



4.3 Library routines

Two library routines are used by generators. These are:



int countbits(unsigned long long);

unsigned long long rangen_generic_getbits(void *state, unsigned int bits, void (*iterate)(void *));



Countbits is used simply to count the position of the left-most one in a binary number and is used by generators such as Von Neumann and LCGs either as part of the algorithm or to calculate the number of random bits generated per iteration.



Rangen_generic_getbits takes a state pointer, which must start in the way described above, a number of bits desired and a pointer to an iteration function and will iterate an appropriate number of times to produce the desired number of bits. Every generator implemented here uses this routine.



4.4 Implemented generators



4.4.1 "Reference" generators

There are three generators which are implemented to allow testing of the testers. These are Linux, File and Zero - any valid test should pass the first two and fail the zero generator.



The "Linux" generator uses the crypto-random generator present in the Linux operating system kernel which uses internal hardware sources of the machine to generate an unpredictable bit stream. The generator actually uses the non-cryptographic file /dev/urandom as this is much faster than the cryptographically secure generator /dev/random. It still uses cryptographic functions and hardware sources so is indeed random.



"File" reads random data from a file "random.dat". The file used in testing was generated from http://www.random.org/, which gets data from a hardware random source. The reason this generator fails the Chi-squared test is because the file in use was only 2Mb and twenty million bytes (10,000,000 16-bit numbers) were tested.



Although primarily intended to test the testers, these two generators could be used in real-world applications if relatively small amount of randomness is required. There are faster and more convenient generators available however.



"Zero", as the name implies, simply generates a stream of zeros.



The Zero and File generators also give an indication of the maximum possible speed of a generator - I.e. how much is lost in function call overhead compared to actual computational time. The file generator is marginally faster on Intel because it works on 64 bits values whereas the zero generator only generates one 32-bit per call to rangen_iterate. (The file generator was reading data off a remote filesystem on the Sun machine, hence the lower speed)



None of these generators take any parameters and their state cannot be saved. (Rangen_info_stateside returns zero)



4.4.2 C Library generators

The "Rand" generator tests the properties of the random number generator random() in the system C library. Rand 1 is results from a Linux machine running glibc2, which performs admirably and is also reasonably fast. The source code for the library doesn't give references as to the source of the generator but it appears to be a "polynomial" type of some description. On a glibc2 machine, this generator is certainly more than acceptable for randomness although slower than other generators.



Rand 2 are results from a Solaris 2.5.1 machine and passes all the tests except Chi-squared. Documentation claims this is a "non-linear additive feedback" generator with a period in the order of 2^31 which if true indicates that this failure isn't in fact due to a small period in the generator but to a major flaw in it's implementation.



This generator takes no parameters.



4.4.3 Von Neumann

The Von Neumann generator as implemented performs poorly in almost all tests. The version used is a "Binary" version - taking the middle binary digits and squaring them - rather than the original decimal version Von Neumann proposed. A hexadecimal version could also easily be implemented, although would offer no obvious benefits over the binary version. The original "decimal" version of this generator would be much slower and also offer no improvement over the binary version.



This generator takes one parameter - number zero is the current internal state of the generator and can be used to "seed" the generator. The default seed value used is one that has been observed to perform reasonably well in the Maurer test, which is the most powerful test available, compared to other seed values. Although this is a relatively fast generator on most platforms it's failure in over half the tests makes it unsuitable for use.



4.4.4 Linear Congruential Generator

The values used in LCG 1 are those suggested by Knuth and those in LCG 2 are the defaults compiled into the generator as suggested by Ritter. Neither set performs particularly well, failing even the Chi-squared test suggesting that the period of these generators is quite low. Although other values may perform better the generator as implemented is reliant to m being a power of two as we are producing random bit streams rather than random floating point numbers. Converting from values of m other than powers of two into a random bit stream is very difficult to do without destroying the randomness of the generator. LCGs are the most common generators in current usage however because although slower for large batches of numbers they are easily implemented and fast if only a few values are needed.



This generator takes four parameters. Parameter zero is the internal state of the generator and can be used as a seed. The first, second and third parameters are m, c and a respectively. M is the modules of the generator, c the value added each cycle (Zero for pure Multiplicative Congruential Generators) and a the multiplication value.



4.4.5 Additive Congruential Generators

ACGs perform incredibly well given their simplicity and speed. The initial table used is based on the first few values in the file used for the "File" generator with a default "lag" of 97. This is the only generator actually implemented other than Mersenne Twisters that actually passes all the tests and achieve over a million numbers per second on a PPro200.



This generator takes two parameters. Parameter one is m, the modules and number two is k, the "lag". As this generator cannot be usefully seeded in it's current incarnation there is no parameter zero.



4.4.6 Inversive Congruential Generators

ICGs perform disappointingly, failing every single test, even the very weak runs above/below the mean tests. Looking at the output it is clear this generator has a very low period - a few hundred - making it totally useless. There appear to be some suggestions that this is a good generator however I can find no source that states this with any certainly. Due to it's use of inverse multiplication it is also very slow, making it unsuitable for general use. This generator is further hindered by it's use of the external library "gmp" - the GNU Multi-Precision math library - for inversive multiplication, which is probably not present on all systems.



Although it might be possible to speed up this generator and increase it's period, there seems little reason to do so. As there is no "hidden" state in this generator it's maximum period is it's modules as with LCGs and there appear to be no benefits of this generator over LCGs despite being slower so the effort does not seem to be worthwhile.



The parameters of the ICG are the same as for the LCG - zero is the state then one to three are m, c and a.



4.4.7 L'Ecuyer

The L'Ecuyer combination generator also performs poorly. L'Ecuyer doesn't give a convincing argument as to why the output should be uniformly distributed and looking at the verbose output of Chi-squared is clear there is bias towards particularly high or low results. The bias is only significant for a reasonably large sample size, greater than 10,000, which is probably why L'Ecuyers original testing of his generator did not spot this flaw.



The API as defined can not handle an arbitrary number of sub-generators, as L'Ecuyer uses, so the configuration of this generator is static and there are no parameters.



4.4.8 Algorithm M

Algorithm M performs comparably to the LCGs - It is based on the same generator as LCG 1 for it's actual output, shuffled by another identical LCG generator. Although the output of the generator in terms of it's ordering is "more random" than a plain LCG, the LCGs used pass most of the ordering-dependant However, LCGs do fail a Chi-squared test and shuffling the output does not help here as Algorithm M also fails the Chi-squared test.



Algorithm M has quite a few parameters. Firstly, powerk defines how big a "shuffle box" is required - the implementation limits this to powers of two so, for example, setting powerk to 8 gives a box of size 256. Parameter two is the number of bits obtained from the generator each cycle. Parameters three through six are the algorithm and state for the two generators X (Number source) and Y. (Shuffle source)



4.4.9 Linear Feedback Shift Register

This generator passes most of the tests well. Rather surprisingly it fails the gap test, which examination of verbose output reveals is due to very high gaps between certain reoccurances of the same number.



Unfortunately, because they have to work bit-by-bit it is rather slow to implement in software and are nearly and order of magnitude slower than Mersenne Twisters and ACGs which perform just as well. The default settings for the tap bits on this generator is taken from Schneider [4].



There are fiver parameters for this generator. Parameter zero is, as always, the current state and can be used to seed the generator. (Although a seed of zero will produce a zero output) Parameters one to four are the tap bit locations, from 0 to 63.



4.4.10 Feedback with Carry Shift Register generators

As with LFSRs, FCSRs suffer from the same failure in the gap test and are even slower. They offer no obvious advantage over FCSRs and take the same parameters.



4.4.10 Mersenne Twisters

The code for the Mersenne Twisters is adapted from the original code by Matsumoto and Nishimura [12] with no major modifications. Despite it's reliance on a relatively large number of operations per cycle mathematical operations it is very fast as it generates over eight thousand random bits each time. Rangen_iterate doesn't return all generated bits every time it is called but breaks them down into more manageable 32-bit chunks - the real iterate routine is called as required. This generator passes all the tests easily without any problems and because of it's speed seems the generator of choice where anything more than a few random numbers are required. "Seeding" this generator would be difficult as the initial state is relatively large (Over 8 thousand random bits) although this could be overcome simply by taking an offset into an existing random source, such as the random.dat currently used to seed the generator.



Mersenne Twisters have no configurable parameters.



4.5 Unimplemented Generators



4.5.1 Wichmann & Hill Combined

This generator, presented in [2] only produces floating point results and there is insufficient information in the paper to determine the granularity of the generator which would be required to convert the output to a bit-stream efficiently.



4.5.2 Polynomial Congruential

As seen by the results of the Rand test, Polynomal Congruential generators are both reasonably fast and quite random. Unfortunately, although this syle of generator is used by glibc2 and also mentioned by Knuth insufficient information is available to attempt a decent implementation.



4.6 General observations and conclusion

Implementing the generators proved a much eaiser task than the testers. To construct a good generator far more theory is required and invariably all the generators include either working code or an algorithm that can be used to quickly produce working code.



It is clear from the table of results in Appendix B that Mersenne Twisters and ACGs are by far the generators of choice. They are both very fast and fail none of the tests presented here.

5 Front-ends



5.1 Tester

The first and most useful front-end, "tester", takes the name of a generator and tester library and runs them. This program shows how to dynamically load tester libraries on a Unix system that supports dlopen(). It is necessary to set the environment variable LD_LIBRARY_PATH to the directory holding the libraries or install the libraries in the system libraries directory so that the dynamic linker can find them.



Tester is called with two arguments, firstly the full filename of a generator library and second the full filename of a tester library, for example:



tester librnd_mersenne.so libtest_display.so

(Assuming "tester" is in the current PATH)



If VERBOSE is defined when compiling a library then tester first outputs a short text description of each library obtained using the info_text functions.



If there are any configurable parameters for the generator then tester will prompt for them one by one, showing default values. The default values can be accepted by just pressing enter without entering any data.Tester cannot handle "Alg" type parameters and ignores them.



5.2 test-static

The second tester, test-static, shows how simple it is to include a generator (And tester) in a program. The libraries included in the static executable can easily be changed by modifying the STATIC_LIBRARIES variable in the Makefile. Default settings produce a completely static executable, i.e. one that requires no support libraries other than the standard C libraries. Substituting the .so files for .o gives a dynamically linked version, where the library files are required to run the program, but with no modifications to the source. This latter method is the preferred method of linking in the random number libraries as if a generic name, such as librndgen.so, is used the generator can be changed merely by replacing the library file with no need to change the source code.



6 Conclusions

The results here are not entirely surprising. As noted in the first report, both Mersenne Twisters and ACGs were expected to do reasonably well. The failure of LFSRs is a surprise however, as these seem to be regarded as reasonably proficient generators and although slow in software are often used in hardware implementations. The failure of combined generators is also somewhat surprising, although recent research work has moved away from this and into totally new generators.



There is little to be said about the testers as they all performed as expected. It is worth reiterating the poor quality of work on random number testing however, in the hope that someone may eventually produce a work that actually includes algorithms for implementing tests quickly and accurately. Algorithms for this are very thin on the ground and often had to be derived from ambiguous statements and the ambiguities resolved by experimentation.

References

[1] Knuth, D.E., 1969, The Art of Computer Programming, Volume 2 - Seminumerical Algorithms, Addison-Wesley

[2] Wichmann, B.A., Hill, I.D., 1982, A Pseudo-Random Number Generator, NPL Report DITC 6/82, National Physical Laboratory.

[3] L'Ecuyer, P., 1988, Efficient and Portable Combined Random Number Generators, pp.742-749, 774, Communications of the ACM Volume 31, Number 6.

[4] Schneier, B., 1996, Applied Cryptography, John Wiley & Sons.

[5] Maurer, U.M., 1992, A Universal Statistical Test for Random Bit Generators, pp.89-105, Journal of Cryptology, Vol.5, No. 2.

[6] Banks, J., Carson 2nd, J., Nelson, B., 1996, Discrete-Event System Simulation, Prentice-Hall.

[7] Fishman, G., 1973, Concepts and Methods in Discrete Event Digital Simulation, John Wiley & Sons.

[8] Ritter, T., Ciphers by Ritter, http://www.io.com/~ritter/

[9] DIEHARD, http://stat.fsu.edu/~geo/diehard.html

[10] Haahr, M., http://www.random.org/

[11] Marsaglia, G., 1984, A current view of Random Number Generators, Computer Science and Statistics: The Proceedings of the 16th Symposium on the Interface, Atlanta, Elsevier Press.

[12] M. Matsumoto and T. Nishimura, 1998, Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator, ACM Transactions on Modeling and Computer Simulation, Vol. 8, No. 1, pp.3-30.

[13] Sullivan, J., 1998, Numerical Analysis & Associated Fields Resource Guide, http://net.indra.com/~sullivan/q10.html

Appendix A - Evaluation



A.1 Project Overview



Initial work was on structuring the API in such a way that it was flexible enough to cope with most generators and testers although still specific enough to allow generators to be interchanged easily in a program. Once a rough outline had been produced the "reference" generators and display and speed tester were written as a "proof of concept" and the API refined as a result of this. Constructing the API and resolving issues surrounding the basic framework took around a quater of the time, primarily because constructing dynamic-link libraries and particularly using the dlopen() call causes problems not encountered in normal programming. Debugging also proved a problem - although most generators can be linked statically combination generators such as L'Ecuyer rely on dlopen() to work which makes using standard debuggers such as gdb tricky.



Only around ten percent of the time was spent implementing generators as this proved a relatively easy task given the wide availability of algorithms to perform the most common generation functions. Writing the testers and running tests took over half the time taken. This was due primarily to the difficulties in implementing testing libraries that have been discussed at length previously. The remaining time was spent writing the report.

A.2 The Approach



There is little to be said about the approach that has not already been covered in the body of the report. No major assumptions were required for implementation and the reasoning behind the structure of the project and choice of language has already been covered in section 2.1.



A.3 Possible Future Work In The Area



This project could be extended quite considerably to encompass far more generators and testers than are currently implemented. Trying to implement even a significant portion of the available generators would be quite a significant work however as there are a large number of generators and an almost infinite number of possible parameters for those generators.

Appendix B - Tables of results

B.1 Empirical test results

Chi-sq KS Gap Poker Runs Mean Maurer
Linux 0.362208 0.510559

0.675486

0.712583

0.000526

0.478271

0.428924

0.359596

0.874975

0.247679

0.393694

0.184395

0.458329

7.184869
File 1.000000 0.738718

0.948190

0.935290

0.001834

0.011070

0.011181

0.497735 0.132207 0.130925

0.047258

0.174363

0.272003

7.184507
Zero 1.000000 1.000000

0.000000

0.000000

1.000000

1.000000

1.000000

1.000000

1.000000

0.000000

0.000000

0.000000

0.000000

0.000001
Rand 1 0.715177 0.820229

0.018466

0.480549

0.096585

0.414513

0.399850

0.138942

0.473569

0.252494

0.710310

0.138706

0.399274

7.184058
Rand 2 1.000000 0.753250

0.382823

0.071074

0.162250

0.131359

0.109035

0.228733

0.801380

0.473434

0.175626

0.588748

0.818538

7.185135
Von N. 1.000000 1.000000

1.000000

0.000000

1.000000

0.996104

0.972135

1.000000

1.000000

0.859635

0.757239

0.931238

0.753966

7.196248
LCG 1 1.000000 0.247861

0.804789

0.000000

1.000000

0.154877

0.061890

0.230819

0.159677

0.215360

0.324168

0.477696

0.260543

7.198858
LCG 2 0.000000 0.033461

0.952883

0.000000

1.000000

0.032041

0.083639

0.014468

0.013934

0.885685

0.897066

0.914775

0.975280

7.233509
ACG 0.191041 0.771454

0.167969

0.376785

0.270558

0.022058

0.093272

0.292254

0.498318

0.180289

0.403473

0.231975

0.144920

7.183310
ICG 1.000000 1.000000

1.000000

0.999999

0.998766

0.999999

0.998766

1.000000

1.000000

0.928292

0.739950

0.999918

1.000000

7.180711
L'Ecuyer 1.000000 1.000000

1.000000

0.000000

1.000000

0.852401

0.554670

1.000000

1.000000

0.118549

0.250390

0.180989

0.411692

7.169784
Alg. M 1.000000 0.033915

0.811274

0.000000

1.000000

0.149044

0.061221

0.010887

0.498318

0.295884

0.919160

0.324374

0.510433

7.196667
LFSR 0.505779 0.657751

0.104576

0.000000

1.000000

0.033919

0.012633

0.160175

0.184342

0.138039

0.101322

0.291267

0.243762

7.184588
FCSR 0.084193 0.517227

0.511065

0.000000

1.000000

0.033539

0.056973

0.821765

0.818449

0.580329

0.835615

0.836863

0.335689

7.183938
Mer.Tw. 0.806905 0.642702

0.057305

0.532591

0.116994

0.097397

0.119505

0.424862

0.960693

0.717156

0.727823

0.245693

0.405926

7.183670



Red: Failure at the 1% level Yellow: Failure at the 5% level Green: Pass

B.2 Speed test results

Intel PPro200 Sun Ultra 5/10

(Sun4u 299MHz)

Alpha 21164 533MHz
file 1851 2251
zero 1770 2483 6736
Mersenne 1470 2125
Von Neumann 1418 727 5120
ACG 1190 1309 4808
Rand 1190 1636
LCG 1058 920 4367
L'Ecuyer 794 336 1295
Algorithm M 374 304 1694
LFSR 160 176
FCSR 141 171
ICG 54
Linux 34



All values are thousands of 16-bit values per second.

Appendix C - Source Code

©1995-2006 Zoë O'Connell