Stephen Wolframstephenwolfram.com
Publications by Stephen Wolfram * Articles * Cellular Automata * Universality and Complexity in Cellular Automata (1984)
Universality and Complexity in Cellular Automata (1984)


4. Quantitative Characterizations of Cellular Automaton Behaviour

This section describes quantitative statistical measures of order and chaos in patterns generated by cellular automaton evolution. These measures may be used to distinguish the four classes of behaviour identified qualitatively above.

Consider first the statistical properties of configurations generated at a particular time step in cellular automaton evolution. A disordered initial state, in which each site takes on its possible values with equal independent probabilities, is statistically random. Irreversible cellular automaton evolution generates deviations from statistical randomness. In a random sequence, all possible subsequences (``blocks'') of length must occur with equal probabilities. Deviations from randomness imply unequal probabilities for different subsequences. With probabilities for the possible sequences of site values in a length block, one may define a specific ``spatial set entropy''

where for and , and a specific ``spatial measure entropy''

In both cases, the superscript indicates that ``spatial'' sequences (obtained at a particular time step) are considered. The ``set entropy'' (4.1) is determined directly by the total number of length blocks generated (with any nonzero probability) in cellular automaton evolution, according to

In the ``measure entropy'' (4.2) each block is weighted with its probability, so that the result depends explicitly on the probability measure for different cellular automaton configurations, as indicated by the subscript . Set entropy is often called ``topological entropy''; measure entropy is sometimes referred to as ``metric entropy'' (2) (e.g. [6]). For blocks of length 1, the measure entropy is related to the densities , of sites with each of the possible values . is related to the densities of ``digrams'' (blocks of length 2), and so on. In general, the measure entropy gives the average ``information content'' per site computed by allowing for correlations in blocks of sites up to length . Note that the entropies (4.1) and (4.2) may be considered to have units of (-ary) bits per unit distance.

In the equations below, stands for either set entropy or for measure entropy .

The definitions (4.1) and (4.2) yield immediately

The first inequality is saturated (equality holds) only for ``equidistributed'' systems, in which all nonzero block probabilities are equal. The second inequality is saturated if all possible length blocks of site values occur, but perhaps with unequal probabilities. only for ``-random'' sequences [7], in which all possible sequences of site values occur with equal probabilities. In addition to (4.4), the definitions (4.1) and (4.2) imply

if and only if just one length block occurs with nonzero probability, so that also. As discussed below, the inequality (4.5) is saturated for class 1 cellular automata.

Both set and measure entropies satisfy the subadditivity condition

The inequality is saturated if successive blocks of sites are statistically uncorrelated. In general, it implies some decrease in with (for example, ). For cellular automata with translation invariant initial probability measures, stronger constraints may be obtained (analogous to those for ``stationary'' processes in communication theory [8]). First, note that bounds on valid for any set of probabilities also apply to , since may formally be reproduced from the definition (4.2) for by a suitable (extreme) choice of the . The probability for the sequence of site values is given in general by

where denotes the conditional probability for a site value , preceded by site values . Defining a total entropy

and corresponding conditional total entropy

one obtains

Hence

so that the set and measure entropies for a translationally invariant system decrease monotonically with the block size . One finds in addition in this case that

so that is a convex function of .

With the definition , this implies that there exists a critical block size , such that

The significance and values of the critical block size will be discussed in section 7.

The entropies and may be evaluated either for many blocks in a single cellular automaton configuration, or for blocks in an ensemble of different configurations. For smooth probability measures on the ensemble of possible initial configurations, the results obtained in these two ways are almost always the same. (A probability measure will be considered ``smooth'' if changes in the values of a few sites in an infinite configuration lead only to infinitesimal changes in the probability for the configuration.) The set entropy is typically independent of the probability measure on the ensemble, for any smooth measure. The measure entropy in general depends on the probability measure for initial configurations, although for class 3 cellular automata, it is typically the same for at least a large class of smooth measures. Notice that with smooth measures, the values of and are the same whether the length blocks used in their computation are taken disjoint or overlapping.

The entropies (4.1) and (4.2) are defined for infinite cellular automata. A corresponding definition may be given for finite cellular automata, with a maximum block length given by the total number of sites in the cellular automaton. The entropies and are related to global properties of the state transition diagram for the finite cellular automaton. The value of at a particular time is determined by the fraction of possible configurations which may be reached at that time by evolution from any initial configuration. The limiting value of at large times is determined by the fraction of configuration on cycles in the state transition graph. Starting from an initial ensemble in which all configurations occur with equal probabilities, the limiting value of is equal to the limiting value of if all transient trees in the state transition graph for the finite cellular automaton are identical, so that all configurations with nonzero probabilities are generated with the same probability (cf. [2]).

As mentioned in section 2, the configurations of an infinite cellular automaton may be considered as elements of a Cantor set. For an ensemble of disordered configurations (in which each site takes on its possible values with equal independent probabilities), this Cantor set has fractal dimension 1. Irreversible cellular automaton evolution may lead to an ensemble of configurations corresponding to elements of a Cantor set with dimension less than one. The limiting value of as gives the fractal or ``set'' dimension of this set.

Relations between entropy and dimension may be derived in many ways (e.g. [6], [7], [8] and [9]). Consider a set of numbers in the interval of the real line. Divide this interval into bins of width , and let the fraction of bins containing numbers in the set be . For large (small bin width), this number grows as . The exponent is the Kolmogorov dimension (or ``capacity'' (cf. [8])) of the set. If the set contains all real numbers in the interval , then , and , as expected. If the set contains only a finite number of points, then must tend to a constant for large , yielding . The classic Cantor set consists of real numbers in the interval , whose ternary decomposition contains only the digits and . Dividing the interval into equal bins, it is clear that of these bins contain points in the set. The dimension of the set is thus . This dimension may also be found by an explicit recursive geometrical construction, using the fact that the set is ``self-similar'', in the sense that with appropriate magnification, its parts are identical to the whole.

The example above suggests that one may define a ``set dimension'' according to

where is the number of bins which contain elements of the set. The bins are of equal size, and their total number is taken as . Except in particularly pathological examples, (3) the dimension obtained with this definition is equal to the more usual Hausdorff (or ``fractal'') dimension (e.g. [11]) obtained by considering the number of patches at arbitrary positions required to cover the set (rather than the number of fixed bins containing elements of the set).

The definition (4.14) may be applied directly to cellular automaton configurations. The ``bins'' may be taken to consist of cellular automaton configurations in which a block of sites has a particular sequence of values. The definition (4.3) of set entropy then shows that the set dimension is given by

A disordered cellular automaton configuration, in which all possible sequences of site values occur with nonzero probability (or an ensemble of such configurations), gives , as expected. Similarly, a homogeneous configuration, such as the null configuration, gives .

The set of configurations which appear at large times in the evolution of a cellular automaton constitute the attractors for the cellular automaton. The set dimension of these attractors is given in terms of the entropies for configurations appearing at large times by eq. (4.15).

Accurate direct evaluation of the set entropy from cellular automaton configurations typically requires sampling of many more than length blocks. Inadequate samples yield systematic underestimates of . Direct estimates are most accurate when all nonzero probabilities for length blocks are equal. In this case, a sample of blocks yields an entropy underestimated on average by approximately

Unequal probabilities increase the magnitude of this error, and typically prevent the generation of satisfactory estimates of from direct simulations of cellular automaton evolution. (If the probabilities follow a log normal distribution, as in many continuous chaotic dynamical systems [12], then the exponential in eq. (4.16) is apparently replaced by a power [13].)

The dimension (4.15) is given as the limiting exponent with which increases for large . In the formula (4.15), this exponent is obtained as the limit of for large . If indeed increases roughly exponentially with , then the alternative formula

is typically more accurate if entropy values are available only for small .

The set dimension (4.15) may be used to characterize the set of configurations occurring on the attractor for a cellular automaton, without regard to their probabilities. One may also define a ``measure dimension'' which characterizes the probability measure for the configurations (cf. [12]):

It is clear that

The measure dimension is equal to the ``average information per symbol'' contained in the sequence of site values in a cellular automaton configuration. If the sequence is completely random (or ``-random'' [7]), then the probabilities for all sequences of length must be equal for all , so that . In this case, there is no redundancy or pattern in the sequence of site values, so that determination of each site value represents accquisition of one (-ary) bit of information. A cellular automaton configuration with any structure or pattern must give .

In direct simulations of cellular automaton evolution, the probabilities for each possible length block are estimated from the frequencies with which the blocks occur. These estimated probabilities are thus subject to Gaussian errors. Although the individual estimated probabilities are unbiased, the measure entropy deduced from them according to eq. (4.2), is systematically biased. Its mean typically yields a systematic underestimate of the true measure entropy, and with fixed sample size, the underestimate deteriorates rapidly with increasing , making an accurate estimate of impossible. However, since an unbiased estimate may be given for any polynomial function of the , unbiased estimated upper and lower bounds for the measure entropy may be obtained from estimates for polynomials in just larger and just smaller than for [14]. In this way, it may be possible to obtain more accurate estimates of for large , and thus of .

The ``spatial'' entropies (4.1) and (4.2) were defined in terms of the sequence of site values in a cellular automaton configuration at a particular time step. One may also define ``temporal'' entropies which characterize the sequence of values taken on by a particular site through many time steps of cellular automaton evolution, as illustrated in fig. 7. With probabilities for the possible sequences of values for a site at successive time steps, one may define a specific temporal set entropy in analogy with eq. (4.1) by

and a specific temporal measure entropy in analogy with eq. (4.2) by



[ Figure 7 ] Space-time regions sampled in the computation of (a) spatial entropies, (b) temporal entropies and (c) patch or mapping entropies. In case (c), the values of sites in the cross-hatched area are completely determined by values in the black ``rind''.

These entropies satisfy relations directly analogous to those given in eqs. (4.3) through (4.6) for spatial entropies. They obey relations analogous to (4.11) and (4.12) only for cellular automata in ``equilibrium'', statistically independent of time. The temporal entropies (4.20) and (4.21) may be considered to have units of (-ary) bits per unit time.

Sequences of values in particular cellular automaton configurations typically have little similarity with the ``time series'' of values attained by a particular site under cellular automaton evolution. The spatial and temporal entropies for a cellular automaton are therefore in general quite different. Notice that the spatial entropy of a cellular automaton configuration may be considered as the temporal entropy of a pure shift mapping applied to the cellular automaton configuration.

Just as dimensions may be assigned to the set of spatial configurations generated in cellular automaton evolution, so also one may assign dimensions to the set of temporal sequences generated by the evolution. The temporal set dimension may be defined in analogy with eq. (4.15) by

and the temporal measure dimension may be defined by

If the evolution of a cellular automaton is periodic, so that each site takes on a fixed cycle of values, then

As discussed in section 6 below, class 2 cellular automata yield periodic structures at large times, so that the correspondingly temporal entropies vanish.

As a generalization of the spatial and temporal entropies introduced above, one may consider entropies associated with space-time ``patches'' in the patterns generated by cellular automaton evolution, as illustrated in fig. 7. With probabilities for the possible patches of spatial width and temporal extent , one may define a set entropy

and a measure entropy

Clearly

If no relation existed between configurations at successive time steps then the entropies (4.25) and (4.26) would be bounded simply by

The cellular automaton rules introduce definite relations between successive configurations and tighten this bound. In fact, the values of all sites in a space-time patch are determined according to the cellular automaton rules by the values in the ``rind'' of the patch, as indicated in fig. 7. The rind contains only sites (where is the ``range'' of the cellular automaton rule, defined in section 2), so that

For large (and fixed ), therefore

If both and tend to infinity with fixed, eq. (4.30) implies that the ``information per site'' in a patch must tend to zero. The evolution of cellular automata can therefore never generate random space-time patterns.

With , fixed, the length horizontal section of the rind makes a negligible contribution to the entropies. The entropy is maximal if the vertical columns in the rind are statistically independent, so that

In addition,

where the bounds are saturated for large if the time series associated with different sets of sites are statistically uncorrelated.

The limiting set entropy

for temporally-extended patches is a fundamental quantity equivalent to the set (or topological) entropy of the cellular automaton mapping in symbolic dynamics. may be considered as a dimension for the mapping. It specifies the asymptotic rate at which the number of possible histories for the cellular automaton increases with time. The limiting measure entropy

gives the average amount of ``new information'' contained in each cellular automaton configuration, and not already determined from previous configurations. Equations (4.31) and (4.32) show that

In addition,

The basic cellular automaton time evolution rule (2.1) implies that the value of a site at a particular time step depends on sites a maximum distance away on the previous time step according to the function . After time steps, the values of the site could depend on sites at distances up to , so that features in patterns generated by cellular automaton evolution could propagate at ``speeds'' up to sites per time step. For many rules, however, the value of a site after many time steps depends on fewer initial site values, and features may propagate only at lower speeds. In general, let denote the minimum for which the value of site depends only on the initial values of sites . Then the maximum propagation speed associated with the cellular automaton rule may be defined as

(The rule is assumed symmetric; for nonsymmetric rules, distinct left and right propagation speeds may be defined.) Clearly,

When , finite regions of the cellular automaton must ultimately become isolated, so that

The construction of fig. 8 shows that for any ,

In the limit , the construction implies

The ratio of temporal to spatial entropy is thus bounded by the maximum propagation speed in the cellular automaton. The relation is consistent with the assignment of units to the spatial and temporal entropies mentioned above.

The corresponding inequalities for mapping entropies are:

The quantity defined by eq. (4.37) gives the maximum speed with which any feature in a cellular automaton may propagate. With many cellular automaton rules, however, almost all ``features'' propagate much more slowly. To define an appropriate maximum average propagation speed, consider the effect after many time steps of changes in the initial state. Let denote the probability that the value of a site at position is changed when the value of a site at position is changed time steps before. The form of for various cellular automaton rules is suggested by fig. 3. may be considered as a Green function for the cellular automaton evolution. For large , typically vanishes outside a ``cone'' defined by . may then be considered as a maximum average propagation speed. In analogy with eqs. (4.41) and (4.42), one expects

Mapping and temporal entropies thus vanish for cellular automata with zero maximum average propagation speed. Cellular automata in class 2 have this property.



[ Figure 8 ] Pattern of dependence of temporal sequences on spatial sequences, used in the proof of inequalities between spatial and temporal entropies.

The maximum average propagation speed specifies a cone outside which almost always vanishes. One may also define a minimum average propagation speed , such that for almost any .

The Green function gives the probability that a particular site is affected by changes in a previous configuration. The total effect of changes may be measured by the ``Hamming distance'' between configurations before and after the changes, defined as the total number of site values which differ between the configurations after time steps. ( is analogous to Lyapunov exponents for continuous dynamical systems.) Changing the values of initial sites in a small region, may be given as a space integral of the Green function, and for large obeys the inequality

to be compared with the result (4.43) obtained above.

The definitions and properties of dimension given above suggest that the behaviour of these quantities determines the degree of ``chaotic'' behaviour associated with cellular automaton evolution. ``Spatial chaos'' occurs when , and ``temporal chaos'' when . Temporal chaos requires a nonzero maximum average propagation speed for features in cellular automaton patterns, and implies that small changes in initial conditions lead to effects ever-increasing with time.

previous  l  next