## 1. Random Sequence Generation

Sequences that seem random are needed for a wide variety of purposes. They are used for unbiased sampling in the Monte Carlo method, and to imitate stochastic natural processes. They are used in implementing randomized algorithms which require arbitrary choices. And their unpredictability is used in games of chance, and potentially in data encryption.

To generate a random sequence on a digital computer, one starts with a fixed length seed, then iteratively applies some transformation to it, progressively extracting as long as possible a random sequence (e.g., [1]). In general one considers a sequence ''random'' if no patterns can be recognized in it, no predictions can be made about it, and no simple description of it can be found (e.g., [2]). But if in fact the sequence can be generated by iteration of a definite transformation, then a simple description of it certainly does exist. (1) The sequence can nevertheless seem random if no computations done on it reveal this simple description. The original seed must be transformed in such a complicated way that the computations cannot recover it.

The degree of randomness of a sequence can be defined in terms of the classes of computations which cannot discern patterns in it. A sequence is ''random enough'' for application in a particular system if the computations that the system effectively performs are not sophisticated enough to be able to find patterns in the sequence. So, for example, a sequence might be random enough for Monte Carlo integration if the values it yields are distributed sufficiently uniformly. The existence say of particular correlations in the sequence might not be discerned in this calculation. Whenever a computation that uses a random sequence takes a bounded time, there is a limit to the degree of randomness that the sequence need have. Statistical tests of randomness emulate various simple computations encountered in practice, and check that statistical properties of the sequence agree with those predicted if every element occurred purely according to probabilities. It would be better if one could show in general that patterns could not be recognized in certain sequences by any computation whatsoever that, for example, takes less than a certain time. No such results can yet be proved, so one must for now rely on more circumstantial evidence for adequate degrees of randomness.

The fact that acceptably random sequences can indeed be generated efficiently by digital computers is a consequence of the fact that quite simple transformations, when iterated, can yield extremely complicated behaviour. Simple computations are able to produce sequences whose origins can apparently be deduced only by much more complex computations.

Most current practical random sequence generation computer programs are based on linear congruence relations (of the form ) (e.g., [1]), or linear feedback shift registers [4] (analogous to the linear cellular automata discussed below). The linearity and simplicity of these systems has made complete algebraic analyses possible and has allowed certain randomness properties to be proved [1, 4]. But it also leads to efficient algebraic algorithms for predicting the sequences (or deducing their seeds), and limits their degree of randomness.

An efficient random sequence generator should produce a sequence of length in a time at most polynomial in (and linear on most kinds of computers). It is always possible to deduce the seed (say of length ) for such a sequence by an exhaustive search which takes a time at most . But if in fact such an exponentially long computation were needed to find any pattern in the sequence, then the sequence would be random enough for almost any practical application (so long as it involved less than exponential time computations).

No such lower bounds on computational complexity are yet known. It is however often possible to show that one problem is computationally equivalent to a large class of others. So, for example, one could potentially show that the problem of deducing the seed for certain sequences was NP-complete [5]: special instances of the problem would then correspond to arbitrary problems in the class NP, and the problem would in general be as difficult as any in NP. (One should also show some form of uniform reducibility to ensure that the problem is difficult almost always, as well as in the worst case.) The class NP (nondeterministic polynomial time) includes many well-studied problems (such as integer factorization), which involve finding objects (such as prime factors) that satisfy polynomial-time-testable conditions, but for which no systematic polynomial time (P) algorithms have ever been discovered.

Random sequence generators have been constructed with the property that recognizing patterns in the sequences they produce is in principle equivalent to solving certain difficult number theoretical problems [2] (which are in the class NP, but are not NP-complete). An example is the sequence of least significant bits obtained by iterating the transformation , where and are large primes (congruent to 3 modulo 4) [6]. Making predictions from this sequence is in principle equivalent to factoring the integer [6, 7].

There are in fact many standard mathematical processes which are simple to perform, yet produce sequences so complicated that they seem random. An example is taking square roots of integers. Despite the simplicity of its computation, no practical statistical procedures have revealed any regularity in say the digit sequence of (e.g., [8]). (Not even its normality or equidistribution has however actually been proved.) An even simpler example is multiplication by , say in base 6. (2) Starting with 1, one obtains the pattern shown in Fig. 1.1 The center vertical column of values, corresponding to the leading digit in the fractional part of , seems random [10]. (Though again not even its normality has actually been proved.) Given the complete number obtained at a particular stage, multiplication by suffices to reproduce the original seed. But given only the center column, it seems difficult to deduce the seed.

Many physical processes also yield seemingly random behaviour. In some cases, the randomness can be attributed to the effects of external random input. Thus, for example, ''analog'' random sequence generators such as noise diodes work by sampling thermal fluctuations associated with a heat bath containing many components. Coin tossings and Roulette wheels produce outcomes that depend sensitively on initial velocities determined by complex systems with many components. It seems however that in all such cases, sequences extracted sufficiently quickly can depend on only a few components of the environment, and must eventually show definite correlations. One suspects in fact that randomness in many physical systems (probably including turbulent fluids) arises not from external random input, but rather through intrinsic mathematical processes [11]. This paper discusses the generation of random sequences by simple procedures which seem to capture many features of this phenomenon. The investigations described may not only suggest practical methods for random sequence generation, but also provide further understanding of the nature and origins of randomness in physical processes.