next up previous contents
Next: Serial Tests for Pseudorandom Up: Generalized -DivergenceandFrequency Analysis in Previous: The Central Limit Theorem

   
Serial Tests and Markov Chains

Serial tests are a backbone in the equidistribution and correlation analysis of pseudorandom number generators (PRNGs). A PRNG produces a sequence $(y_l)_{l \in {\rm I\!N}_0}$ of so-called pseudorandom numbers (PRNs) in the unit interval [0,1) which are assumed to behave like a realization of an independent sequence of random variables distributed uniformly on [0,1). We refer the reader to [19] for an introduction.

As an example, consider the linear congruential generator, LCG for short. Defined by integer parameters M, a, b, and <tex2htmlverbmark>1<tex2htmlverbmark>0, the LCG (M,a,b,<tex2htmlverbmark>1<tex2htmlverbmark>0)produces a sequence $({\verb*+prn+}_l)_{l \ge 0}$of integers by ${\verb*+prn+}_{l+1} = (a \; {\verb*+prn+}_l + b) (\bmod\, M)$, i.e.  <tex2htmlverbmark>1<tex2htmlverbmark>l+1 is the integer remainder of dividing $a \; {\verb*+prn+}_l + b$ by M. A sequence $(y_l)_{l \geq 0}$ of pseudorandom numbers in [0,1) is defined by yl = <tex2htmlverbmark>1<tex2htmlverbmark>l / M. The LCG's output is a periodic sequence whose period length ${\varrho}$ is mostly M and depends on the choice of parameters [39, p.169].

Pseudorandom number generators have received increasing interest over the past years since many numerical algorithms depend on a reliable source of pseudo-randomness. For instance, think about the field of stochastic simulation where one is interested in the typical behaviour of a (real-world) system such as a queue in a bank. The state of the system Xn at time n (e.g. the number of customers in the line) is supposed to depend in a deterministic way on a set of parameters Yi, i in some index set, (Yi could be the arrival time of the customer i for example) for which no purely deterministic description is available. Based on the assumption that these parameters can be modelled as random variables with a certain distribution function, the state of the system itself becomes a random variable. If direct mathematical analysis of the properties such as expectation, variance, or distribution function, is not feasible, simulation provides a way of estimating these quantities based on a sample. The PRNs are used as realizations for the parameters Yi. We will sort of reverse this setup and deduce criteria on the quality of the pseudorandom numbers from known asymptotic properties of a chain model.

Other applications of PRNGs such as numerical integration in higher dimension or optimization seem to involve no randomness at first. Here, randomization can be used to control the ratio between the speed of computation and the accuracy of the result. The value in demand is viewed as expected value of a random variable and thereafter estimated based upon a sample. As for numerical integration, this leads to so-called Monte Carlo methods, as for optimization, we have simulated annealing, genetic algorithms, and others.

In any case, the quality of the chosen source of randomness - the PRNG - can decisively influence the results. Although there exist non-deterministic devices such as amplified noise from a transistor, which play an upcoming role in the field of encryption, the generator is usually implemented by means of a deterministic algorithm which is able to ``fake'' the required aspects of randomness. Deterministic algorithms have the advantage that their results can be replicated and controlled, and that the modelling and debugging process is simplified significantly. Moreover, they allow rigorous analysis and testing to be performed independently of the actual application. The term pseudo-random number generator has been adopted to stress the deterministic provenance of the numbers which are hereafter supposed to behave like realizations of independent random variables distributed uniformly on [0,1). In the following we will call this assumption H0. H0 is in fact no severe restriction since any arbitrary distribution function can be obtained by a transformation of uniform variates. For an implementation and discussion of some recent transformation methods see [51,52].

Note that due to the finite state space of the computer program which implements the generator, the output of every PRNG becomes periodic, eventually.

In order to be accepted by the scientific community and by the practitioners every PRNG has to undergo an extensive testing procedure. Such tests are aimed to assure that the generated pseudo randomness cannot be distinguished from ``real'' randomness in a sense which strongly depends on the target application. As discussed in [18,29] it is clear that no test can ever prove that a given generator is flawless, because such general purpose generators do not exist: every fixed generator (i.e. every fixed periodic sequence of numbers in the unit interval) will fail in several tests if no further restrictions are imposed on the set of admissible tests, see also Chapters 1 and 2 in [55]. As a consequence, different types of generators are used - for instance - for ciphering clear text and for integrating high-dimensional functions.

However, tests can improve or destroy our confidence into that a generator will do well in certain applications. So-called theoretical tests, [9,10,12,13,14,15,40] prove properties of the generated sequence of numbers based on a mathematical analysis of the underlying algorithm of the generator. These tests are done a-priori, that is, before any single pseudorandom number has been generated on a computer. They are often used in order to parameterize a generator such like tests for the period length of LCGs given the parameters M, a, and b, for instance. A drawback of theoretical tests is that they usually only allow prognostications of properties of the whole period of PRNs. Since nowadays generators offer period lengths ${\varrho}$ up to 219937, such tests tell us only little about the comparatively small samples $y_0,\ldots,y_{n-1}$that will actually be used in an application of the generator. Regarding the whole period of PRNs as sample space and regarding the seed (i.e. the starting point of the subsequence of length n) as random variable distributed uniformly over the period length, the equidistribution analysis using theoretical tests assures the average quality of samples, however.

Theoretical tests always have to be complemented by empirical tests where the sample size n $(n<<{\varrho})$ is chosen with respect to the application, see [2,5,28,30,32,46,53]. Here, the generator is treated as a black box and its behavior with respect to several test statistics is studied. If the distribution of the test statistic is known under H0, this hypothesis can be tested on a desired level of significance. An empirical test is especially valuable if the used test statistic resembles the target application, see [55, Chapter 2] and the forthcoming [19].

These arguments motivate empirical tests for pseudo-randomness which are based on the simulation of a Markov chain. The chain model permits a kind of flexibility with respect to parameters such as the sample size and the number of possible states of the system - actually, Markov chains are often used to model certain aspects of a real-world system. The distribution of a suitable test statistic, on the other hand, can be derived from known facts about the long-time behavior of MCs, as we will see below.

In this chapter we recall a standard empirical test known as ``Serial test'' and model it by counting the number of visits of certain Markov chains in each state in the state space during a given interval of time, see Section 3.1. We complete the construction of a test statistic by applying the chi-square statistic and a generalization thereof in Section 3.2. The chi-square statistic is used to measure the deviation of the counter vector from its expectation. In Chapter 4 we will deal with a more general test statistic for this purpose. For a case study with respect to a well known PRNG we refer the reader to Chapter 5.



 
next up previous contents
Next: Serial Tests for Pseudorandom Up: Generalized -DivergenceandFrequency Analysis in Previous: The Central Limit Theorem
Stefan Wegenkittl
1998-05-19