## 10. Statistical Properties

The sequences generated by the cellular automaton of Eq. (3.1) may be considered effectively random if no feasible procedure can identify a pattern in them, or allow their behaviour to be predicted. Even though it may not be possible to prove that no such procedure can exist, circumstantial evidence can be accumulated by trying various statistical procedures and finding that they reveal no regularities. The basic approach is to compare statistical results on sequences generated by (3.1) with those calculated for sequences whose elements occur purely according to probabilities.

To establish the validity of (3.1) as a general-purpose random sequence generator, one should apply a variety of statistical procedures, related to various different kinds of calculations. The choice of tests is necessarily as ad hoc as the choice of calculations done. Appendix A lists those used here. (But see also [35].) Some can be considered related to Monte Carlo simulations of physical and other systems. Others to statistical analyses that would be done on data from various kinds of measurements. While quite ad hoc, the tests seem to be sensitive, and reasonably independent.

As an example, consider the ''equidistribution'' or ''frequency'' test. If a sequence of zeroes and ones is to be random, the digits zero and one must occur in it with equal frequency. In general, in fact, all possible length blocks of digits must also occur with equal frequency. (The measure entropy of (4.2) is maximal exactly when such equidistribution occurs.) However, in a finite sample of length , there are expected to be statistical fluctuations, which lead to slightly different numbers of zeroes and ones. (The value of entropy deduced from a finite sample is thus almost always not maximal, even if it would be maximal were the sequence to be continued forever.) As a consequence, one can never definitively conclude by studying a finite sample that the complete sequence is not random. One can however calculate the probability that a truly random sequence would have the properties seen in the finite sample.

To do this (e.g., [36]), one evaluates , defined in terms of the observed and expected frequencies and as

Here gives the number of degrees of freedom, or number of distinct objects whose frequencies are included in the sum. If blocks of length are studied then . Now one must find the probability that a value of larger than that observed would occur for a random sequence. This ''confidence interval'' is obtained immediately from the integral of the distribution (e.g., [36]).

If the confidence interval is very close to zero or one, then the observed is unlikely to be produced from a random sequence, and one may infer that the observed sequence is not random. Of course, if say a total of tests are done, it is to be expected that the confidence interval for at least one of them will be less than . Evidence for nonrandomness in a sequence must come from an excess of confidence interval values close to zero or one, over and above the number expected for a uniform distribution.

Table 10.1 gives results from the statistical tests described in Appendix A for sequences generated by the cellular automaton (3.1) in a finite circular register. Except when the sample sequence is comparable in length to the period of the system, as given by Table 9.2, no significant deviations from randomness are found.

Table 10.2 gives statistical results for sequences generated by other procedures. Those obtained from linear feedback shift registers, while provably random in some respects (e.g., [4]), are revealed as significantly nonrandom by several of the tests used here. Many sequences obtained from linear congruential generators are also found to be significantly nonrandom with respect to these tests. No regularities are detected in the digit sequence of (and other surds tried) (cf. [37]). There is, however, some possible evidence for nonrandomness in the digit sequences of and (cf. [38]). (This will be explored elsewhere.)

Table 10.3 gives statistical results for temporal sequences in the pattern of Figure 6.1 obtained by evolution according to Eq. (3.1) from a single nonzero initial site on an infinite lattice. Once again, no significant deviations from randomness are seen.

If deviations from randomness were detected by some statistical procedure, then this procedure could be used to make statistical predictions about the sequence. In addition, it could be used to obtain a compressed representation for the sequence, and would thus demonstrate that the sequence did not have maximal information content. The fact that deviations from randomness have not been found by any of the statistical procedures considered lends strong support to the belief that sequences produced by Eq. (3.1) with large are indeed random for practical purposes.