
Estimation techniques and the empirical distribution function

  The empirical distribution function (EDF) of a finite sequence of realizations of a random variable lies at the heart of statistical inference. We will first introduce the concept of estimation and then define the EDF.

Consider a random variable $X$ with distribution function $F$. We call $X_1, \dots, X_n$ a random sample from $F$ if, for all $i$, $X_i$ has distribution function $F$ and the $X_i$ are independent. Let $\theta$ be a property of $F$, e.g. a parameter in the model describing $X$ such as the expectation, the variance or a general statement like the probability that $X$ will assume a value less than $t$, $P(X < t)$.
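 Definition (estimator). A real-valued measurable function $\hat{\theta} = \hat{\theta}(X_1, \dots, X_n)$ of the random sample that is used to approximate $\theta$ is called an estimator for $\theta$.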

Estimators are used to measure properties of distribution functions. In the field of statistical inference they are used to construct models from empirical data. We have already commented on the principal difficulty arising from such a setup: the estimates cannot be falsified. However, they are a very important tool for both practical and theoretical statistics. If an estimator is calculated from some $k$ experiments yielding the random numbers $x_1, \dots, x_k$, the resulting number is a random number itself, i.e. a realization of the random variable $\hat{\theta}(X_1, \dots, X_k)$.
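The point that an estimate is itself a random number can be made concrete with a small sketch. It assumes, purely for illustration, that $X$ is uniform on $[0, 1)$ and uses two hypothetical helper functions, `estimate_mean` and `estimate_prob_less_than`, together with an arbitrarily chosen seed; none of these names come from the text.

```python
import random

def estimate_mean(sample):
    """Point estimator for the expectation: the sample mean."""
    return sum(sample) / len(sample)

def estimate_prob_less_than(sample, t):
    """Point estimator for P(X < t): the fraction of observations below t."""
    return sum(1 for x in sample if x < t) / len(sample)

rng = random.Random(1996)  # fixed seed, chosen arbitrarily for reproducibility
k = 1000                   # number of experiments per run

# Repeat the whole run of k experiments several times. Every run yields a
# different estimate, i.e. a new realization of the estimator.
estimates = []
for _ in range(5):
    sample = [rng.random() for _ in range(k)]  # assumption: X uniform on [0, 1)
    estimates.append(
        (estimate_mean(sample), estimate_prob_less_than(sample, 0.25))
    )

for mean_hat, prob_hat in estimates:
    print(f"mean estimate {mean_hat:.4f}   P(X < 0.25) estimate {prob_hat:.4f}")
```

Under the stated assumption the estimates should scatter around 0.5 and 0.25 respectively, without ever being exactly reproducible from one run to the next.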

An estimator of the above form is called a 'point estimate' since it yields a single real value. There also exist so-called 'interval estimates' that yield intervals containing the parameter $\theta$ with a given probability. We will, however, make use of an estimator for the whole distribution function $F$. Such an estimator needs to be a function itself. We therefore define the empirical distribution function.


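 Definition (empirical distribution function). Let $X_1, \dots, X_n$ be a random sample from $F$. The empirical distribution function $\hat{F}_n$ is defined by
$$\hat{F}_n(t) = \frac{1}{n} \sum_{i=1}^{n} 1_{(-\infty, t)}(X_i), \qquad t \in \mathbb{R},$$
i.e. $\hat{F}_n(t)$ is the proportion of the $X_i$ that are less than $t$.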
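As a minimal sketch of how an EDF can be evaluated for concrete numbers, the following builds the step function "proportion of observations less than $t$" from a list. The helper name `make_edf` is hypothetical, and the strict inequality is an assumption taken from the phrase "a value less than t"; with "less than or equal" one would use `bisect_right` instead.

```python
import bisect

def make_edf(observations):
    """Return the EDF t -> (number of observations < t) / n as a function."""
    ordered = sorted(observations)
    n = len(ordered)

    def edf(t):
        # bisect_left returns the number of entries strictly less than t
        return bisect.bisect_left(ordered, t) / n

    return edf

edf = make_edf([0.7, 0.1, 0.4, 0.4])
print([edf(t) for t in (0.0, 0.4, 0.5, 1.0)])  # [0.0, 0.25, 0.75, 1.0]
```

Sorting once and then searching keeps each evaluation at logarithmic cost, which matters when the EDF is evaluated at many points.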

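The strong law of large numbers (S.L.L.N.) immediately gives pointwise convergence of the EDF $\hat{F}_n$ to the distribution function $F$ if the random sample is drawn such that the $X_i$ are independent:
$$\lim_{n \to \infty} \hat{F}_n(t) = F(t) \quad \text{almost surely, for every } t.$$
The theorem of Glivenko-Cantelli (Theorem 20.6 in [3]) even proves uniform convergence in $t$:
$$\lim_{n \to \infty} \sup_{t} \left| \hat{F}_n(t) - F(t) \right| = 0 \quad \text{almost surely}.$$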
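The uniform convergence can be observed numerically. The sketch below assumes the uniform distribution on $[0, 1]$, so that $F(t) = t$, and uses Python's built-in generator with an arbitrary seed as a stand-in for a random sample; the function name `sup_distance_to_uniform` is hypothetical.

```python
import random

def sup_distance_to_uniform(observations):
    """sup over t of |EDF(t) - t| for the uniform distribution on [0, 1]."""
    ordered = sorted(observations)
    n = len(ordered)
    # The EDF is a step function, so the supremum is approached at a jump:
    # compare F(x) = x with the EDF value just before and just after the jump.
    return max(
        max(abs(i / n - x), abs((i + 1) / n - x))
        for i, x in enumerate(ordered)
    )

rng = random.Random(3)  # arbitrary fixed seed
distances = {}
for n in (100, 10_000, 100_000):
    sample = [rng.random() for _ in range(n)]
    distances[n] = sup_distance_to_uniform(sample)
    print(f"n = {n:>7}: sup distance = {distances[n]:.5f}")
```

For a well-behaved generator the printed distances should shrink as $n$ grows, roughly at the rate $1/\sqrt{n}$.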

The empirical distribution function thus serves as an estimator for the distribution function of a random variable. It can easily be extended to the multivariate case by defining
$$\hat{F}_n(\mathbf{t}) = \frac{1}{n} \sum_{i=1}^{n} 1_{(-\infty, \mathbf{t})}(\mathbf{X}_i),$$
where
$$(-\infty, \mathbf{t}) = \prod_{j=1}^{s} (-\infty, t_j)$$
and $s$ is the dimension of the vector $\mathbf{X}$.
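The multivariate case can be sketched in the same way: a sample point is counted if every one of its components lies below the corresponding component of $\mathbf{t}$. The example assumes $s = 2$ with independent uniform components, so that the true value is $t_1 t_2$; the function name `multivariate_edf` and the seed are illustrative choices.

```python
import random

def multivariate_edf(points, t):
    """Fraction of sample points lying componentwise below t."""
    inside = sum(
        1 for p in points
        if all(p_j < t_j for p_j, t_j in zip(p, t))
    )
    return inside / len(points)

rng = random.Random(7)  # arbitrary fixed seed
s = 2                   # dimension of the random vector
points = [tuple(rng.random() for _ in range(s)) for _ in range(20_000)]

t = (0.5, 0.3)
estimate = multivariate_edf(points, t)
print(f"EDF at {t}: {estimate:.4f}   (F(t) = {t[0] * t[1]:.4f})")
```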

Note that the independence of the random variables $X_i$ is essential to get the desired limiting behavior of the EDF. It can easily be seen that the Glivenko-Cantelli theorem remains valid if we substitute any measurable function $g(X)$ for $X$ and consider the distance between the EDF of the $g(X_i)$ and the distribution function of $g(X)$. This property shows again the great difference between random variables and random numbers: if we fix a sequence $x_1, x_2, \dots$ of random numbers, we could get the right limiting behavior, e.g. convergence, for one function $g_1$ and a wrong behavior for another function $g_2$! This is due to the fact that our sequence may be contained in the set of measure zero for which the Glivenko-Cantelli theorem does not claim convergence for $g_2$. This set can differ from the corresponding set for the function $g_1$.
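A deliberately artificial example, not taken from the text, may illustrate how one fixed sequence can behave correctly for one function and wrongly for another. It assumes the target distribution is uniform on $[0, 1)$ and fixes the deterministic sequence $x_i = i\varphi \bmod 1$ with $\varphi = (\sqrt{5} - 1)/2$. For the identity the EDF is close to the uniform distribution function. For the indicator function of the set of sequence points, the value is 0 with probability one under the uniform distribution, yet it equals 1 on every element of the sequence. Only the first `n` points are stored here, which is enough to evaluate the function on those points.

```python
import math

PHI_FRAC = (math.sqrt(5) - 1) / 2  # fractional part of the golden ratio
n = 2000
xs = [(i * PHI_FRAC) % 1.0 for i in range(1, n + 1)]  # the fixed sequence
members = set(xs)

def g1(x):
    """Identity: g1(X) is again uniform on [0, 1)."""
    return x

def g2(x):
    """Indicator of the sequence points: g2(X) = 0 with probability one."""
    return 1.0 if x in members else 0.0

def edf_at(values, t):
    """Proportion of values strictly less than t."""
    return sum(1 for v in values if v < t) / len(values)

# g1: sup distance between the EDF of g1(x_i) and F(t) = t
ordered = sorted(g1(x) for x in xs)
d1 = max(
    max(abs(i / n - x), abs((i + 1) / n - x))
    for i, x in enumerate(ordered)
)

# g2: distance at t = 0.5, where P(g2(X) < 0.5) = 1 but no g2(x_i) is below 0.5
d2 = abs(edf_at([g2(x) for x in xs], 0.5) - 1.0)

print(f"g1: sup distance       = {d1:.4f}")
print(f"g2: distance at t = 0.5 = {d2:.4f}")
```

The distance for `g1` is small, while the distance for `g2` stays at its maximal value 1 no matter how many points are used: the whole sequence lies inside the null set on which convergence for `g2` is not guaranteed.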

The term EDF denotes a random function, i.e. an infinite-dimensional random vector, as well as a concrete realization of this vector, which results from substituting random numbers for the random sample. From now on we will always refer to the second meaning of EDF, thus:


tabular1658



Stefan Wegenkittl
Tue Dec 3 09:56:35 MET 1996