In order to build a test for s-dimensional random numbers we have to construct
such vectors from the sequence of onedimensional random numbers. As any other
step in the construction of the test statistic, this transformation will be expressed
by a function which can be applied to the random numbers
, but also to the random
variables
. The result of the former are vectors of random numbers which will be
tested,
the result of the later are random vectors from which we deduce the distribution of
the test statistic.
The first simplification that we have to make concerns the transformation of the sequence
of
random variables and PRNs, respectively, into a sequence of s-dimensional vectors,
where
. We will only consider the
following two, somewhat natural ways to proceed:
As we are thinking of an empirical test that should be evaluated by a computer in a finite
amount of time, we have to limit the number of PRNs that are to be tested. We will only use
a finite sample of length
consisting of the vectors
.
Let us denote this sample by
and a realization of it by
.
The difference between the two methods to form s-dimensional vectors is the
different common distribution of the resulting random vectors.
Random vectors
formed from nonoverlapping tuples are
independent
, whereas random vectors
formed from overlapping tuples are
dependent. The common distribution function for overlapping tuples will not
factor to the product of the distributions of the single random variables.
In many applications it will be desirable to have independent vectors and one will choose the nonoverlapping form of tuples. However, in building statistical tests we can use both forms of tuples since we will pay attention to the common distribution while calculating the distribution of the test statistic.

Note that choosing overlapping tuples is more general in that the resulting sequence will
also include all vectors resulting from nonoverlapping tuples. However, the information
represented
by the two possible sequences of random vectors is exactly the same, since every single
random variable
can be reconstructed from both forms.
For the moment let us concentrate on the nonoverlapping case in order to develop the
ideas.
The empirical distribution function for such a sample
is given by
![]()
for every
, where
![]()
When we substitute the sequence
for the sample
, the Glivenko-Cantelli
theorem proves that the empirical distribution function of these random vectors will
almost
surely converge to the distribution function of
, for
, which is given
by
![]()
since the components of
are independent uniformly distributed random
variables
.
Convergence in the above sense is expressed by the distance of the two functions with
respect to supremum norm: put d the distance
![]()
then d will converge to zero almost surely when we increase the sample
size
.
Almost sure is defined with respect to
the probability space
which contains all infinite sequences
of random numbers: the described convergence will happen for almost all
infinite sequences of random numbers.
In other words, d degenerates to a random
variable
having value zero on a set of measure one if
. This is a consequence of
the
strong law of large numbers and the compact range of probabilities.
For the 'esoterical' case
we thus have found a test statistic d
expressing
the numerical properties we consider
important in our definition of iud sequences of PRNs. The set of sequences for which d
does not converge to zero thus contains valid sequences of random numbers which do not
lead to the desired results with our 'simulation problem' d.
But since we have only finite sequences of some length
we have to calculate the
distribution of the quantity d for this given sample size. Note that
is a
random
variable which is not only defined on
but also on the probability space
which contains all sequences of
random numbers of length
. Due to the construction of
, the distribution of
is the
same with
respect to
these two probability spaces.
If it would be possible to calculate the distribution of d for a given
, we could
mark large
values
of d within its range as the critical region and build a statistical test. The test would
exclude sequences of vectors of random numbers of length
for which the
empirical
distribution function differs - in the sense of supremum norm - too much from the
theoretical
distribution function of iud random vectors.
The weak law of large numbers tells us that the proportion of sequences of length
that
approximate the theoretical distribution function up to an error less or equal
becomes one as
increases towards infinity.
Two problems remain. The first one is a mathematical one. The distribution of d is
unknown for
dimensions s>1. Even in the case s=1 the distribution of d is only asymptotically
independent
of the shape of
.
The second problem is a computational one. The computation of d is practically
impossible for dimensions s>1 and interesting sample sizes (
) since the complexity of the algorithm is about
, which is
exponential in
s.
These problems are solved by a two-step procedure. We first partition the range of the random vectors into a finite number of classes thereby simplifying the calculation of the empirical distribution function, which can be expressed by counting the number of hits for every such class.
In the second step, we
calculate a certain distance
measure
which expresses the distance of the observed counters to the expected
value of each such counter. The distribution of
can be calculated approximately
and provides a test statistic for iud PRNs.
We will define the partition of
by splitting each coordinate into a finite
number
of classes
![]()
The components of our random vectors
are the random variables
and we
will use the following notations in order to denote the partition of these: put
![]()
an alphabet of cardinality
containing a symbol for each class of the partition and
let
![]()
be a function that maps the range of
onto the alphabet. Finally put
![]()
thus becomes a sequence of abstract random
variables from
to
.
We will use the notation
![]()
in order to denote the partitioned random numbers.
We now again form vectors using nonoverlapping s-tuples of these symbols setting
![]()
and
![]()
Note that each vector is an element of
. The unit cube
has been
partitioned into
classes which we denote by
![]()
A sample of
such vectors will be denoted by
and
respectively.
The partition transforms the multidimensional continuous random variables respectively
random numbers into discrete random variables. For realizations
of such
random variables
the information represented by the empirical distribution function, that is the
function
![]()
where
and
means that
![]()
is equivalent to knowing the 'empirical probability density', that is the set of values
![]()
since either quantity can be constructed from the other one. We thus can simplify the
evaluation of
the
empirical distribution function to counting hits in classes. The empirical probability
density is an estimator for the probability density of the random counters of hits in the
classes. This function will be written
![]()
We again have to introduce arbitrariness into our test, namely the way in which we
split the unit cube. An easy and evident way to do so will be to divide [0,1[ into l equals
segments of length 1/l, thereby generating
half-open
mini-cubes in
. The number l has to be chosen small enough in order to allow
the
evaluation and storage of the empirical probability density but also large enough in order to
represent actual applications which will usually work
with very fine partitions,
if
they split the vectors at all.
Since we are working with computers, a quite general and easy to implement
way to split the unit cube,
including the above one for l of the form
, are bit manipulations: we will
consider the single random numbers
as, say, 32-bit integers, that is, we transform
them by multiplying
with
. Thus, our sequence
becomes a sequence of 32 bit numbers.
We now cut out some k different bits of every number. These k bits denote a unique
symbol in the set
, where
denotes the number of combinations possible with k bits,
.
This can be described by a function
![]()
where k(x) is the binary number represented by the k bits selected from the random
number x.
The same procedure applied to the sequence
results in
a sequence of independent random variables distributed uniformly on the alphabet
since the probability for each symbol is the same:
![]()
The described bit manipulations always lead to uniformly distributed symbols, since the
whole
measure on [0,1[ is divided into two parts of equal measure by every bit in a binary
representation.
In the tests presented in Chapter 4, we used one of the following bit manipulations:
If you are familiar with statistical testing, you will already have recognized that we are
constructing an usual
goodness of fit test. Such tests are used to
decide
whether a set of data arises from a specified
probabilistic model, or not. The test is based on two properties of the multinomial
distribution:
Let us introduce the distance measure
. For every class
in our
partition we compute the distance between the actual number of hits
and the expected number of hits
![]()
A weighted sum of the squares of
these distances becomes the overall test statistic. The weights are chosen in order to
stabilize the mean and variance of the distribution of the test statistic as a function of
.
This is needed to get a convergence in distribution which in turn is required since we can
only calculate the asymptotic distribution. We therefore define

This really is something like a distance between the empirical distribution function for
and the theoretical distribution function of
, since the function becomes zero
if and
only if the mentioned distance is zero. To see this, observe that

The original random numbers
have got more freedom to vary since the partition
provides only a reduced amount of information. By this, the empirical distribution function of
the
needs only to be near to the theoretical
in order to get
.
The theorem of Pearson states, that the test statistic
will asymptotically be distributed
with
degrees of freedom if we substitute R for r. We can thus perform a
statistical test on
any set of random numbers
by marking a set of values within the range of such a
variable as the critical region.
The quality of the approximation depends on the expectation of the number of hits in the
classes and on the total number of such classes.
The higher the smallest of these expectations or the higher the number of classes,
the better the approximation. We
will return to this question in the next section.
As we have constructed the test in order to exclude random numbers whose empirical
distribution function is far from the theoretical one, the critical region should be placed at the
utmost end of the values of
. However, traditional statistics often uses a critical region
including values near zero as well as
very high values. A test with such a critical region would exclude random numbers whose
empirical distribution function is either too close to or too far from the theoretical one. The
construction of the critical region
as well as the selection of the other parameters
, s,
m and the set of bits is called test design. It will be covered in the next subsection.
Yet another method is the combination of a
test statistic with a Kolmogorov-
Smirnov test. As has already been mentioned, such a test is used to compare the
empirical distribution function of realizations of an onedimensional random variable Y to
the distribution function
itself by means
of the supremum norm
![]()
Provided that the distribution function
is continuous,
the asymptotic distribution of
for
is the Kolmogorov-Smirnov
distribution, which is
independent of the shape of
.
A good approximation is available for sample
sizes
k>4 using an approximate distribution function which is given in [40]. See also
[17, p.184,] for further information.
We will denote this test statistic by
The combined test is performed by first evaluating k times the
test statistic. On
these approximately
distributed values a final K-S test is calculated. The critical
region of the K-S test is set to include high values since we want to exclude sequences of
PRNs that do not approximate the
distribution within a fixed amount of error d.
Usually the critical region is set to the interval
which has a probability of
approximately 0.01 if k=32.
The sample tests in Chapter 4 have all been performed using such combined tests. In our opinion, they are most likely to yield sequences of PRNs that are useful in the class of applications where the user makes not only one but k simulation runs in order to get a good approximation to the distribution of their model.