![]() ![]() ![]() |
The sequences generated by the cellular automaton of Eq. (3.1) may be considered effectively random if no feasible procedure can identify a pattern in them, or allow their behaviour to be predicted. Even though it may not be possible to prove that no such procedure can exist, circumstantial evidence can be accumulated by trying various statistical procedures and finding that they reveal no regularities. The basic approach is to compare statistical results on sequences generated by (3.1) with those calculated for sequences whose elements occur purely according to probabilities.
To establish the validity of (3.1) as a general-purpose random sequence generator, one should apply a variety of statistical procedures, related to various different kinds of calculations. The choice of tests is necessarily as ad hoc as the choice of calculations done. Appendix A lists those used here. (But see also [35].) Some can be considered related to Monte Carlo simulations of physical and other systems. Others to statistical analyses that would be done on data from various kinds of measurements. While quite ad hoc, the tests seem to be sensitive, and reasonably independent.

temporal sequences generated from all possible seeds by evolution according to Eq. (3.1) in a length
circular register. Results for successive values of
are given in successive columns. The results are plotted in Fig. 9.3.
sequences obtained by evolution from all possible seeds according to Eq. (3.1) in a size
circular register. The three-dimensional view is from the point
, with elevation 2.As an example, consider the ``equidistribution'' or ``frequency'' test. If a sequence of zeroes and ones is to be random, the digits zero and one must occur in it with equal frequency. In general, in fact, all
possible length
blocks of digits must also occur with equal frequency. (The measure entropy of (4.2) is maximal exactly when such equidistribution occurs.) However, in a finite sample of length
, there are expected to be statistical fluctuations, which lead to slightly different numbers of zeroes and ones. (The value of entropy deduced from a finite sample is thus almost always not maximal, even if it would be maximal were the sequence to be continued forever.) As a consequence, one can never definitively conclude by studying a finite sample that the complete sequence is not random. One can however calculate the probability that a truly random sequence would have the properties seen in the finite sample.
To do this (e.g., [36]), one evaluates
, defined in terms of the observed and expected frequencies
and
as

Here
gives the number of degrees of freedom, or number of distinct objects whose frequencies are included in the sum. If blocks of length
are studied then
. Now one must find the probability that a value of
larger than that observed would occur for a random sequence. This ``confidence interval'' is obtained immediately from the integral of the
distribution (e.g., [36]).
If the confidence interval is very close to zero or one, then the observed
is unlikely to be produced from a random sequence, and one may infer that the observed sequence is not random. Of course, if say a total of
tests are done, it is to be expected that the confidence interval for at least one of them will be less than
. Evidence for nonrandomness in a sequence must come from an excess of confidence interval values close to zero or one, over and above the number expected for a uniform distribution.
Table 10.1 gives results from the statistical tests described in Appendix A for sequences generated by the cellular automaton (3.1) in a finite circular register. Except when the sample sequence is comparable in length to the period of the system, as given by Table 9.2, no significant deviations from randomness are found.

(
) generated by the cellular automaton of Eq. (3.1) (rule number 30) in circular registers of length
. In each case, the seed used consists of a single nonzero site. The numbers given are the probabilities (confidence intervals) for statistical averages of truly random sequences to exceed those of the sequences analysed. The numbers should be uniformly distributed between 0 and 1 if the sequences analysed are indeed truly random. Results below 0.05 and above 0.95 are shown in bold type. Accumulations close to 0 or 1 suggest deviations from randomness. Such accumulations are seen in this case only when the period of the cellular automaton is comparable to the length of the sequence sampled. (The statistical test programs used here were written in C by Don Mitchell.)
circular register. LFSR is a linear feedback shift register of length
with period
. For
the shift register taps are at positions 14 and 17; for
they are at positions 27 and 29. For CA60 and LFSR seeds consisting of a single nonzero site were used. LCG is the linear congruential generator
(used, for example, in many implementations of the UNIX operating system). The seed
was used. The behaviour of CA60, LFSR and LCG are illustrated in Fig. 11.1.
,
and
are the binary digit sequences of the square root of two, the exponential constant, and pi, respectively. (These digit sequences were obtained by R. W. Gosper using a Symbolics 3600 LISP machine.)Table 10.2 gives statistical results for sequences generated by other procedures. Those obtained from linear feedback shift registers, while provably random in some respects (e.g., [4]), are revealed as significantly nonrandom by several of the tests used here. Many sequences obtained from linear congruential generators are also found to be significantly nonrandom with respect to these tests. No regularities are detected in the digit sequence of
(and other surds tried) (cf. [37]). There is, however, some possible evidence for nonrandomness in the digit sequences of
and
(cf. [38]). (This will be explored elsewhere.)
Table 10.3 gives statistical results for temporal sequences in the pattern of Figure 6.1 obtained by evolution according to Eq. (3.1) from a single nonzero initial site on an infinite lattice. Once again, no significant deviations from randomness are seen.
If deviations from randomness were detected by some statistical procedure, then this procedure could be used to make statistical predictions about the sequence. In addition, it could be used to obtain a compressed representation for the sequence, and would thus demonstrate that the sequence did not have maximal information content. The fact that deviations from randomness have not been found by any of the statistical procedures considered lends strong support to the belief that sequences produced by Eq. (3.1) with large
are indeed random for practical purposes.

in the pattern of Fig. 6.1 generated by evolution according to Eq. (3.1) from a single nonzero initial site on an infinite lattice. Leading zeroes in each sequence were truncated. (The sequences were obtained by Jim Salem using a prototype Connection Machine computer.)