Computation Theory of Cellular Automata (1984)

Section 2 showed that after any finite number of time steps, the set of configurations generated by any cellular automaton forms a regular language. Some cellular automata yield regular languages even in the infinite time limit; others appear to generate limit sets corresponding to more complicated formal languages. Cellular automata which exhibit different classes of overall behaviour appear to yield characteristically different limiting languages.

As discussed in Sect. 4, some cellular automata in Tables 3.1--3.3 yield regular languages which attain a fixed form after a few time steps. The limit sets for such cellular automata are thus regular languages. In fact, except for surjective rules, the limit sets found appear to contain only temporally periodic configurations, and are therefore finite complement languages. These cellular automata exhibit simple large time behaviour, characteristic of classes 1 and 2.

Rule 128 provides a more complicated example, discussed in Sect. 4. After time steps, any pair of ones in configurations generated by this rule must be separated by at least sites. The complete set of possible configurations forms a finite complement regular language, with a minimal DFA illustrated in Fig. 4.1 whose size increases linearly with time. After many time steps, almost all initial configurations evolve to the null configuration. However, even after an arbitrarily long time, configurations containing just a single block of ones may still appear. A block of ones, flanked by infinite sequences of zeroes, is generated after any number of time steps from a block of ones. Such configurations therefore have exactly one predecessor under any number of time steps of the cellular automaton evolution. They thus appear in the limit set for rule 128, although if all initial configurations are given equal weight, they are generated with zero probability. Once generated, their evolution is never periodic. An increasing number of distinct blocks are excluded from the successive obtained by evolution under rule 128. The set of configurations generated in the infinite time limit does not, therefore, correspond to a finite complement language. Nevertheless, the set does form a regular language, shown in Fig. 6.1. While the set contains an infinite number of configurations, its entropy vanishes, as given by the limit of Eq. (4.1).

Several rules given in Tables 3.1--3.3 exhibit behaviour similar to rule 128: they generate (finite complement) regular languages whose minimal grammars increase in size linearly or quadratically with time, but in the infinite time limit, yield regular language limit sets. These limit sets contain one or a few periodic configurations, together with an infinite number of aperiodic configurations, generated from a set of initial configurations of measure zero. The sets have zero entropy, and do not correspond to finite complement languages. (Only trivial finite complement languages can have zero entropy.) All the class 1 cellular automata (except for the trivial rule 0) in Tables 3.1--3.3 exhibit such limiting behaviour. The generation of limit sets corresponding to regular languages that are not finite complement languages appears to be a general feature of class 1 cellular automata.

Tables 3.1--3.3 suggest the result, discussed in Sect. 4, that cellular automata capable of class 3 or 4 behaviour give rise to sets of configurations represented by regular languages whose complexity increases rapidly with time. The limit sets for such cellular automata are therefore presumably not usually regular languages. If a finite description of them can be given, it must be in terms of more complicated formal languages.

Any language that can be described by a regular grammar must obey the regular language ''pumping lemma'' (e.g. [7]). This requires that it be possible to write all sufficiently long symbol sequences appearing in the language in the form so that for any the symbol sequence also appears in the language. (This result follows from the fact that any sufficiently long sequence must correspond to a path containing a cycle in the DFA. This cycle may then be traversed any number of times, yielding arbitrarily repeated symbol sequences.) The sets generated after a finite number of time steps in cellular automaton evolution always obey this condition: arbitrary repetitions of the string are obtained by evolution from initial configurations containing arbitrarily-repeated sequences evolving to .

It is possible to construct cellular automata for which the regular language pumping lemma fails in the large time limit, and which therefore yield non-regular language limit sets. In one class of examples [39, 34], there are pairs of localized structures which propagate with opposite velocities from point sources. After time steps, such cellular automata generate configurations consisting roughly of repetitions of sequences

In the infinite time limit, arbitrarily long identical pairs of symbol sequences thus appear. The limit sets for such cellular automata are therefore not regular languages. Instead it appears that they correspond to context-free languages.

The pumping lemma for regular languages may be generalized to context-free languages. Any sufficiently long sequence in a context-free language must be of the form such that is also in the language for any . The possibility for separated equal length identical substrings is a reflection of the non-local nature of context-free languages, manifest for example in the indefinitely large memory stacks required in machines to recognize them.

Limiting sets of configurations of the form (6.1) that violate the regular language pumping lemma nevertheless obey the context-free language pumping lemma, and thus correspond to context-free languages.

The correspondence between sets of infinite cellular automaton configurations and context-free languages is slightly more complicated than for regular languages. In all cases, the cellular automaton configurations correspond to infinite symbol sequences generated according to a formal grammar. For regular languages, it is also possible to construct finite automata which recognize words in the language, starting at any point. The necessity for a stack memory in the generation of context-free languages makes their recognition starting at any point in general impossible. Infinite configurations generated by context-free grammars must thus be viewed as concatenations of finite context-free language words. Only at the boundaries between these words is the stack memory for the machine generating the configuration empty, so that sequences of symbols may be recognized. Configurations generated by context-sensitive and more complicated grammars must be considered in an analogous way.

If the limit set for a cellular automaton is a context-free language, whose generation requires a computer with an indefinitely large stack memory, then one expects that the regular language sets obtained at successive finite time steps in the evolution of the cellular automaton would require progressively larger finite size stack memories. If the limiting context-free grammar contains say (non-terminal) productions, then there are possible stack configurations after time steps, and the set of configurations obtained may be recognized by a finite automaton with about states. In addition, the context-free pumping lemma is satisfied for repetitions of substrings of length up to about . Regular languages that approximate context-free languages for time steps should have comparatively simple repetitive forms. The regular languages of Fig. 4.1 generated at finite times by rule 128 have roughly the expected form, but their limit is in fact a regular language. The absence of obvious patterns in the regular grammars such as Figs. 4.2--4.4 generated by typical class 3 cellular automata after even a few time steps suggests that the limiting languages in these cases are not context-free. They are presumably context-sensitive or unrestricted languages.

The entropies of regular languages are always logarithms of algebraic integers [as in Eq. (3.5)]. Context-free languages may, however, have entropies given by logarithms of general algebraic numbers (whose minimal polynomials are not necessarily monic). The enumeration of words in a formal language may be cast in algebraic terms by considering the sequence of words in the language as a formal power series satisfying equations corresponding to the production rules for the language (e.g. [40]). For the simple regular language (repetition understood) of Fig. 1.3b, with production rules

(where the terminal symbols and represent 0 and 1 respectively), the corresponding equations are

Solving for as the start symbol one obtains

The expansion of this generating function (accounting for the non-commutative nature of symbol string concatenation) yields the sequence of possible words in the language. Replacing all terminal symbols in the generating function by a dummy variable , the coefficient of in its expansion gives the number of distinct symbol sequences of length in the language. The asymptotic growth rate of this number, and thus the entropy of the language, are then determined by the smallest real root of the (monic) denominator polynomial. The generating function for any regular language is always a rational function of . For a context-free language, however, the equations analogous to (3.8) are in general non-linear in the . At least for unambiguous languages, the positions of the leading poles in the resulting generating functions obtained by solving these simultaneous polynomial equations are nevertheless algebraic numbers [41].

There is a finite procedure to find the minimal regular grammar that generates a given regular language, as described in Sect. 2. No finite procedure exists in general, however, to find the minimal context-free or other grammar corresponding to a more complicated language. The analogue of the regular language complexity is thus formally non-computable for context-free and more complicated languages. This is an example of the result that no finite procedure can in general determine the shortest input program that generates a particular output when run for an arbitrarily long time on some computer (e.g. [42]). Explicit testing of successively longer programs is inadequate, since the insolubility of the halting problem implies that no upper bound can in general be given for the time before the required sequence is generated. Particular simple cases of this problem are nevertheless soluble, so that, for example, the minimal grammars for regular languages are computable.

The entropies for regular and context-free languages may be computed by the finite procedures described above. The entropies for context-sensitive (type 1) and unrestricted (type 0) languages are, however, in general non-computable numbers [43]. Bounds on them may be given. But no finite procedure exists to calculate them to arbitrary precision. (They are in many respects analogous to the non-computable probabilities for universal computers to halt given random input.) If many class 3 and 4 cellular automata do indeed yield limit sets corresponding to context-sensitive or unrestricted languages, then the entropies of these sets are in general non-computable.

The discussion so far has concerned the generation of infinite configurations by cellular automaton evolution. One may also consider the evolution of initial configurations in which nonzero sites exist only in a finite region. Then for class 3 cellular automata with almost all initial states, the region of nonzero sites expands linearly with time. (Such expansion is guaranteed if, for example, and so on.) For class 4 cellular automata, the region may expand and contract with time. One may characterize the structures generated by considering the set of finite sequences generated at any time by evolution from a set of finite initial configurations. For class 3 cellular automata, this set appears to be no more complicated than a context-sensitive language, while for class 4 cellular automata, it may be an unrestricted language. Notice that the set generated after a fixed finite number of time steps always corresponds to a regular language, just as for infinite configurations. (The regular grammar for these finite configurations consists of all paths with the relevant length that begin and end at the 000 node of the NDFA analogous to Fig. 2.1.)

Consider the language formed by the set of sequences of length generated after any number of time steps in the evolution of a class 3 cellular automaton from all possible initial configurations with size (14).This language appears to be at most context-sensitive, since a word of length in it can presumably be recognized in a finite time by a computer with a memory of size at most . In its simplest form, the computer operates by testing configurations generated by evolution from all possible initial states. Since the configurations expand steadily with time, the evolution of each configuration need be traced only until it is of size ; the required configuration of length is either reached at that time, or will never be reached.

In a class 4 cellular automaton, evolution from an initial configuration of size may yield arbitrarily large configurations, but then ultimately contract to give a size configuration. No upper bound on the time or memory space required to generate the size configuration may therefore be given. The problem of determining whether a particular finite configuration is ever generated in the evolution of a class 4 cellular automaton from one of a finite set of initial configurations may therefore in general be formally undecidable. No finite computation can give all the structures of a particular size ultimately generated in the evolution of a class 4 cellular automaton.

The procedure for recognizing finite configurations generated by class 3 cellular automata, while finite in principle, may require large computational resources. Whenever the context-sensitive language corresponding to the set of finite configurations cannot be described by a context-free or simpler grammar, the problem of recognizing words in the language is PSPACE-complete with respect to the lengths of the words (e.g. [28]). It can thus presumably be performed essentially no more efficiently than by testing the structures generated by the evolution of each of the possible finite initial configurations.

As well as considering the evolution of finite complete configurations, one may also consider the generation of finite sequences of symbols in the evolution of infinite configurations. Enumeration of sets of length sequences that can and cannot occur provide partial characterizations of sets of infinite configurations. However, even for configurations generated at a finite time , such enumeration in general requires large computational resources. A symbol sequence of length appears only if at least one length initial block evolves to it after time steps. A computation time polynomial in and suffices to determine whether a particular candidate initial block evolves to a required sequence. The problem of determining whether any such initial block exists is therefore in the class NP. One may expect that for many cellular automata, this problem is in fact -complete. (The procedure of Sect. 2 provides no short cut, since the construction of the required DFA is an exponential computational process.) It may therefore effectively be solved essentially only by explicit simulation of the evolution of all exponentially-many possible initial sequences.

In the limit of infinite time, the problem of determining whether a particular finite sequence is generated in the evolution of a cellular automata becomes in general undecidable. For a cellular automaton with only class 1 or 2 behaviour, the limit set always appears to correspond to a regular language, for which the problem is decidable. But for class 3 and 4 cellular automata, whose limit sets presumably correspond to more complicated formal languages, the problem may be undecidable. (The problem is in general in the undecidability class [46]; the set of finite sequences that occur is thus recursively enumerable, but not necessarily recursive.) Even when the general problem is undecidable, the appearance of particular finite sequences in the limit set for a cellular automaton may be decidable. The fraction of particular sequences whose appearance in the limit set is undecidable provides a measure of the degree of unpredictability or ''computational achievement'' of the cellular automaton evolution (presumably related to ''logical depth'' [47]).