Quantities such as entropy and dimension, suggested by information theory, give only rough characterizations of cellular automaton behaviour. Computation theory suggests more complete descriptions of self-organization in cellular automata (and other systems). Sets of cellular automaton configurations may be viewed as formal languages, consisting of sequences of symbols (site values) forming words according to definite grammatical rules.
The set of all possible initial configurations corresponds to a trivial formal language. The set of configurations obtained after any finite number of time steps are found to form a regular language. The words in a regular language correspond to the possible paths through a finite graph representing a finite state machine. It can be shown that a unique smallest finite graph reproduces any given regular language (see ref. ). Examples of such graphs are shown in Fig. 8. These graphs give complete specifications for sets of cellular automaton configurations (ignoring probabilities). The number of nodes in the smallest graph corresponding to a particular set of configurations may be defined as the `regular language complexity' of the set. It specifies the size of the minimal description of the set in terms of regular languages. Larger correspond to more complicated sets. (Note that the topological entropy of a set is given by the logarithm of the algebraic integer obtained as the largest root of the characteristic polynomial for the incidence matrix of the corresponding graph. The characteristic polynomials for the graphs in Fig. 7 are , and
The regular language complexity for sets generated by cellular automaton evolution almost always seems to be nondecreasing with time. Increasing signals increasing self-organization. may thus represent a fundamental property of self-organizing systems, complementary to entropy. It may, in principle, be extracted from experimental data.
Cellular automata that exhibit only class 1 and 2 behaviour always appear to yields sets that correspond to regular languages in the large time limit. Class 3 and 4 behaviour typically gives rise, however, to a rapid increase of with time, presumably leading to limiting sets not described by regular languages.
Formal languages are recognized or generated by idealized computers with a `central processing unit' containing a fixed finite number of internal states, together with a `memory'. Four types of formal languages are conventionally identified, corresponding to four types of computer:
Examples are known of cellular automata whose limiting sets correspond to all four types of language (L. Hurd, in preparation). Arguments can be given that the limit sets for class 3 cellular automata typically form context-sensitive languages, while those for class 4 cellular automata correspond to unrestricted languages. (Note that while a minimal specification for any regular language may always be found, there is no finite procedure to obtain a minimal form for more complicated formal languages; no generalization of the regular language complexity may thus be given.)