A New Kind of Science and the Future of Mathematics

AMS/MAA joint invited address at the Joint Mathematics Meetings, Phoenix, Arizona, January 7, 2004

Thank you for that nice introduction. And thank you for inviting me.

Well, what I’d like to do this morning is to talk about some things I’ve done over the course of many years—that I think have some significance for mathematics and mathematicians. Mostly I’m just going to sketch things; if you want details, they’re mostly in that big book.

Well, I guess it all started with a little computer experiment that I did at the beginning of the 1980s. That I’d like to show you.

Well, I’d been working on some basic problems in physics, and a little bit also in biology. And I’d wanted the simplest possible models. And that led me to a particular kind of very simple model—that turned out to be a version of what had been called a cellular automaton.

Here’s how a one- dimensional cellular automaton works. It’s got a line of cells, each either black or white. [See A New Kind of Science, page 24.]

And it evolves down the page, in discrete steps. [See A New Kind of Science, page 24.]

Replay Animation

With the color of each cell at each step being determined by a fixed rule—from the color of the cell and its neighbors on the step before. So that if a sub i is the color of the i^th cell at step t:

Well, you can represent that rule by an icon that just shows what happens in each of the 8 possible cases. [See A New Kind of Science, page 24.]

So starting off for example from a single black cell, this rule just gives a simple uniform region of black. [See A New Kind of Science, page 24.]

OK, let’s try changing the rule a bit. Here’s another example. [See A New Kind of Science, page 25.]

This gives a checkerboard pattern.

So far none of this is at all surprising. We’ve got very simple rules, and we’re getting very simple behavior out.

So is that always how it works? Well, take a look at this. [See A New Kind of Science, page 25.]

It doesn’t seem to repeat. Let’s run it longer. [See A New Kind of Science, page 26.]

Well, it’s a very regular nested fractal pattern.

And perhaps there’s a general result: that if the rule and then initial conditions are simple, then the behavior will always have to be correspondingly regular and simple. Well, at least back in 1981 that’s what I assumed was true.

But one day I decided to do a systematic experiment. And just to run every single one of the possible simplest cellular automata.

There are 256 of them, and you can number their rules like this. [See A New Kind of Science, page 53.]

And here’s what they all do. There’s quite a bit of diversity. But mostly the behavior looks pretty simple. Uniform, or repetitive, or at least nested. [See A New Kind of Science, page 55.]

But if you look carefully, there are some cases where other things seem to be going on. The first one is rule 30. [See A New Kind of Science, page 27.]

So let’s take a look at that.

It seems quite complicated. It’s asymmetric because the rule was. Let’s try running it a bit longer. [See A New Kind of Science, page 30.]

Well, there are a bunch of periodic stripes over on the left. But otherwise, this looks really complicated. Actually kind of random.

Well, when I first saw this, I thought it must just be a problem with our visual system. That somehow there is regularity, but our visual system just doesn’t pick it up.

So I started trying all sorts of statistical and mathematical methods on it. And there are certainly lots of things one can say about rule 30. It’s an onto map. Small perturbations always grow. It has a certain spectrum of periodic points.

And in the pattern here a depth d diagonal has period at most 2^d. On the left, the period’s much much lower. But on the right, it’s close to the bound.

And let’s say one looks at the center vertical column of cells. One can imagine using it to make a (quotes) “random walk”. Here are the first million steps. [See A New Kind of Science, page 871.]

It certainly doesn’t ever seem to repeat. And if one adds in an occasional neighboring cell, one can prove that. And in fact it seems to pass every standard statistical test of randomness. Including cryptanalytic ones.

And actually we’ve been using essentially this sequence to make random integers in Mathematica for the past 16 years. And it’s passed zillions of tests—even while many other proposed pseudorandom generators have failed. [See A New Kind of Science, page 30.]

But still, rule 30 is so simple underneath, one might figure that out there in the world of math, there has to be some way to analyse it.

Well, as a warmup let’s take a look at rule 90. [See A New Kind of Science, page 26.]

It turns out this rule just adds neighbors mod 2.

So it makes Pascal’s triangle mod 2. [See A New Kind of Science, page 611.]

And that means you can find the color at position x and step t from a binomial coefficient. Or actually also just by taking bitwise ORs of the digits in the integers x and t. [See A New Kind of Science, page 608.]

Well, here’s another example like that. Rule 150.

Another additive rule. [See A New Kind of Science, page 611.]

But now dealing with trinomials, not binomials. So it involves Gegenbauer functions. [See A New Kind of Science, page 610.]

Well, any additive rule will give a nested pattern. That I suspect can be described by some generalized hypergeometric function.

But rule 30 isn’t additive, and there’s no special function that’s going to describe it. [See A New Kind of Science, page 30.]

Of course at each step it’s just a simple Boolean function. [See A New Kind of Science, page 616.]

Which one can write in several ways.

But the point is that its iterates aren’t simple. Iterating rule 90, everything stays pretty simple. [See A New Kind of Science, page 618.]

But iterating rule 30, it doesn’t. [See A New Kind of Science, page 618.]

These are the minimal DNF expressions one gets up to 5 steps. And here are the absolute minimal NAND trees up to two steps. [See A New Kind of Science, page 619.]

And one can go on. But the basic story is that there’s definitely something real going on. There’s definitely something that’s stopping us from cracking rule 30. [See A New Kind of Science, page 30.]

But is that interesting? Is it just some isolated curiosity about rule 30? Or is it the first sign of some deep and general phenomenon?

Well, here are some other cellular automata. [See A New Kind of Science, page 67.]

Replay Animation

Very much the same kind of thing happens. In one dimension. Or two dimensions. [See A New Kind of Science, page 174.]

Or three. [See A New Kind of Science, page 183.]

But what about other systems? Cellular automaton rules have lots of special features. So what happens if you give those up?

Here’s a Turing machine for example. [See A New Kind of Science, page 78.]

In many ways it’s like a cellular automaton. But one difference is that it only ever updates one cell at each step.

Well, the first few thousand Turing machines do only these kinds of things. [See A New Kind of Science, page 79.]

But if one keeps going, still with very simple rules, one gets this. [See A New Kind of Science, page 81.]

The same kind of thing we saw in cellular automata.

And here’s an iterated finite automaton that some group theory folk recently got me interested in. [See A New Kind of Science Forum]

Again, same story.

Well, what happens if we don’t have a fixed array of cells? Here’s a sequential substitution system, that just iteratively rewrites strings. [See A New Kind of Science, page 90.]

And it takes only simple rules to get very complex behavior. [See A New Kind of Science, page 92.]

Here’s a rather different type of system: a register machine. [See A New Kind of Science, page 100.]

It’s like a minimal idealization of machine language. And this is the smallest register machine with complex behavior.

Here are what I call symbolic systems. [See A New Kind of Science, page 102.]

Kind of minimal idealizations of the underlying structure of Mathematica. Or generalizations of combinators. And again, it’s the same story. [See A New Kind of Science, page 104.]

One thing about all the systems I’ve showed so far is that they all just evolve sequentially from one step to the next. Well, here’s one that doesn’t. [See A New Kind of Science, page 205.]

I call it a multiway system. It has rules for rewriting strings, that get applied in all possible ways. It’s kind of like a semi-semi-group. And again, even simple rules give all sorts of complex behavior. [See A New Kind of Science, page 206.]

One doesn’t actually have to have any kind of process of evolution at all. Like consider a tiling problem, where one has the constraint that around every cell, only certain neighborhoods are allowed. [See A New Kind of Science, page 213.]

Well, with exactly this setup, it turns out that if one can tile the plane at all, then one can always do it periodically in one of these 171 ways. [See A New Kind of Science, page 214.]

But if one also says that the plane can’t be all white, then about the 19 millionth constraint gives this. [See A New Kind of Science, page 219.]

It’s the first necessarily nonperiodic tiling. Kind of the minimal square quasicrystal. Or an interesting two-dimensional subshift of finite type.

Well, going a bit further one sees cases like this. [See A New Kind of Science, page 221.]

A tiling forced to make a rule 30 pattern. Which would be pretty interesting to find in a real crystal.

Well, OK, what about numbers? Here are the digit sequences one gets by successively incrementing an integer. [See A New Kind of Science, page 118.]

A simple nested pattern.

Here are powers of two in base two. [See A New Kind of Science, page 119.]

Quite trivial. But here are the powers of 3 in base 2. [See A New Kind of Science, page 119.]

Kind of like a minimal linear congruential pseudorandom generator. And already remarkably complicated—very much like a cellular automaton. And actually, in cases like these, it turns out to be exactly a cellular automaton. [See A New Kind of Science, page 614.]

Here’s another example with digits. [See A New Kind of Science, page 126.]

Iterating addition and digit reversal.

Well, OK, what if one deals with integers as a whole, not digit sequences. Let’s say one defines a recursive function. Well, again there are remarkably simple ones that give really complex behavior. [See A New Kind of Science, page 130.]

Primitive recursive functions are particularly easy to enumerate. And then I can tell you that this is the very simplest one that gives complex behavior. [See A New Kind of Science, page 908.]

Last summer we had an NKS summer school, and did a class project on recursive functions. Where we found some really neat things. Like here’s the very simplest nestedly recursive function with complex behavior.

OK, what about real numbers? Here’s a continuous cellular automaton, with values from 0 to 1 at each cell. [See A New Kind of Science, page 157.]

Same story.

But what about PDEs? Well, most of the PDEs people normally study have rather simple behavior, at least with simple initial conditions. [See A New Kind of Science, page 163.]

Like here’s the diffusion equation. And the wave equation. And the sine-Gordon equation.

But a few years ago I decided just to do a systematic search of possible PDEs. Many of them have pathologies. But after a little while I found these. [See A New Kind of Science, page 165.]

They’re just simple nonlinear wave equations. But even starting from a little Gaussian they give all this stuff. That looks a bit like rule 30.

Of course, it’s hard to know exactly what these PDEs do. We’ve been using these as test cases for numerical PDE solving in Mathematica for ages. But without already more or less knowing the answer, it’s hard for numerical analysis to be sure things are real.

But with the likes of rule 30 it’s a different story. [See A New Kind of Science, page 30.]

Because the bits are the bits. And so there’s no question one’s discovered something real.

And actually it’s so straightforward to get, grade schoolers could do it. And I sometimes wonder whether, a bit like these… [See A New Kind of Science, page 43.]

…some Babylonian mosaic of rule 30 might one day be unearthed.

But I doubt it. Because I think if rule 30 had been known in antiquity, science and math would have developed rather differently. Of course, now that one’s seen the rule 30 phenomenon, one can go back and find hints of it from really long ago.

Like the primes, for example. The Greeks knew an easy rule for making them. [See A New Kind of Science, page 122.]

But the pattern of primes that comes out is irregular and in some ways quite random.

Or the digits of pi. [See A New Kind of Science, page 137.]

Which seem for all practical purposes random.

There are all sorts of examples. Particularly from recreational math.

But somehow they were always treated as curiosities. That didn’t fit together, and didn’t really seem to add up to much.

Well, here’s the critical fact: they’re all signs of a very general phenomenon. That’s deep and important. And there’s a whole new conceptual framework that can be built around it—a whole new kind of science. [See A New Kind of Science, page 30.]

It changes a lot of things. Like the way we approach modelling.

I mean, for three hundred years there’s been this idea that if one wants to make a really good model of something, it should be based on equations. And that certainly worked well for Newton and friends in figuring out orbits of comets. And for lots and lots of things since then. But when we see behavior that really looks complex, it hasn’t worked very well.

Now, if one’s going to be able to do theoretical science, the systems one studies had better follow definite rules. But why should those rules be based on the constructs we’ve studied in mathematics: on numbers and calculus and so on? Can’t they be more general?

In the past one wouldn’t have had much of a way to think about more general rules. But now we have computers, with programs that in effect implement arbitrarily general rules. Of course, the programs we use in practice are long and complicated, set up for particular purposes.

But here’s a basic science question that’s at the core of the new kind of science I’ve been building. If one just enumerates simple programs, what do they do?

In computer science we’re used to studying programs constructed for particular purposes. But what if we just go out and explore the computational universe?

It’s exciting. It’s kind of like 400 years ago when telescopes were first invented. One could just point one’s telescope somewhere in the sky and see what’s out there. Well, now we get to do that in the computational universe.

And actually, being able to do that is one of the main reasons I built Mathematica. People of course use Mathematica a lot to do traditional math, and we’re proud of that. But at its core Mathematica is a lot more general. General enough to really let one explore the computational universe.

And it’s incredible what happens when one does that. First, one sees all this strange and wonderful flora and fauna. But then—and it took me years—one gradually builds new principles, and new intuition.

Our normal intuition tends to be that it’s complicated to make something complicated. But that’s just not true in the general computational universe. Look at rule 30. But in doing things like engineering we usually have the constraint that we have to foresee what the things we build are going to do. And so we’ve been forced to use only rules that have simple, foreseeable, behavior.

But the point is that nature is under no such constraint. So there’s nothing wrong with it using rules like rule 30. And I think that’s the secret of how it makes complexity.

Well, OK, so how can we use what we learn from simple programs? We can make technology. New kinds of devices. Or efficient algorithms that don’t just have iteration, or recursion, but rule 30 type behavior.

We can also make new kinds of models of things. In the past, if we saw this…

…we’d probably assume its rules were really complicated. But now we might just search, and find that this is all that’s going on.

So, OK, so let’s say we have a system in nature. Like a snowflake. [See A New Kind of Science, page 370.]

How do we make a good model of it? Not just a picture, but a model of how it works. Well, snowflakes grow by adding pieces of ice to their surface. And we can make a simple hexagonal cellular automaton that does that. [See A New Kind of Science, page 369.]

Replay Animation

Not much like a snowflake. But let’s add the next important effect: growth inhibition from latent heat. Let’s try the simplest cellular automaton rule that captures that. Here’s what we get. [See A New Kind of Science, page 371.]

Replay Animation

And here are all the stages. [See A New Kind of Science, page 371.]

Remarkably like real snowflakes. It really seems like we’ve captured what makes snowflakes have the shapes they do. And we get predictions, like that big snowflakes have little holes in them. Which indeed they do.

OK, but there are obviously details of snowflakes that we haven’t captured. But that’s always going to happen. Because the whole point of a model is to capture certain features, and idealize everything else away. And depending on what one’s interested in, one may pick different features. And so the cellular automaton model is good, for example, if one’s interested in finding a snowflake shape distribution. But it’s not so useful if one wants to calculate how fast each arm grows at a certain temperature.

There are certainly traditional PDEs for snowflakes. But they’re complicated and hard to solve. And the point is that for many questions, a cellular automaton model picks out much more directly what one really needs to know.

And it also lets one generalize, for example enumerating other shapes that might grow. ([See A New Kind of Science, page 373.]

OK. Let’s take another example. Fluid turbulence. [See A New Kind of Science, page 377.]

Where does all that randomness in a turbulent flow come from?

Well, one can ask that about randomness in any system. And I think there are really three ways randomness can arise.

First, it can be continually injected from the environment. Like with a boat bobbing on an ocean. Where there’s random motion because of the randomness of the ocean.

Usually we’d build a stochastic or probabilistic model for that, with randomness built right in. [See A New Kind of Science, page 591.]

But in recent decades another source of randomness that’s been talked about is chaos theory. Where the randomness isn’t continually injected, but instead comes from the details of the initial conditions for the system. Like when one tosses a coin or spins a wheel. Where the final outcome depends sensitively on the initial speed. [See A New Kind of Science, page 305.]

Well, there are stronger version of this, where one’s effectively taking numbers that represent initial conditions, and continually excavating higher and higher order digits from them. Like in the shift map, x[t+1] = FractionalPart[2 x[t]]. [See A New Kind of Science, page 308.]

Each step shifts every digit one position to the left. So that eventually even the most insignificant digit will have a big effect.

Well, so why does this make randomness? It makes randomness if the initial digit sequence is random. And with the usual view of real numbers, one images that a typical number picked as an initial condition would almost always have that kind of randomness.

But if one thinks in terms of information, this isn’t really unsatisfactory. It’s like saying that one gets randomness out because one put randomness in. And that the randomness is just coming from outside the system we’re looking at.

Well, is there any other possibility? Yes. Just look at rule 30. There’s no randomness from the environment—it’s a deterministic rule. And there’s no randomness in the initial conditions—they’re just a single black cell. But still, the actual evolution of the system intrinsically generates apparent randomness. [See A New Kind of Science, page 30.]

It’s a bit tricky at first to analyse this. Not least because systems often have both sensitive dependence on initial conditions, and intrinsic randomness generation. Rule 30 has sensitive dependence; the slopes of these difference patterns give its Lyapunov exponents. [See A New Kind of Science, page 251.]

And iterated maps can show intrinsic randomness generation. [See A New Kind of Science, page 150.]

But that’s not what dynamical systems theory picks up. Because it’s talking about all possible trajectories. Not just trajectories that have “simple to make” initial conditions.

But if you really want to explain randomness, you need to get more precise.

Well, let’s go back to fluid turbulence. [See A New Kind of Science, page 377.]

Where does its randomness come from? The Navier–Stokes equations haven’t told us. It could even be that we need to add Brownian motion terms. We know there are some instabilities, but we don’t really know their large-scale effects. And in numerical experiments, it’s hard to distinguish numerical glitches from genuine randomness.

Well, a different approach to modelling helps with that. At the lowest level a fluid just consists of a bunch of molecules. And we know that the details aren’t too important—we always get the same continuum behavior. So we can try just making a minimal model for the molecules. Put them on a discrete grid, with discrete velocities, and so on. [See A New Kind of Science, page 378.]

Well, in the appropriate limit, that gives Navier–Stokes, and one can actually do nice practical fluid dynamics. [See A New Kind of Science, page 380.]

But one can also investigate some basic questions. And one finds that one can indeed get large-scale apparent randomness, without any random input. Which means there’s intrinsic randomness generation going on. [See A New Kind of Science, page 30.]

One can test that in physical experiments. Because if randomness comes from the environment, or from initial conditions, it’ll always be different every time one runs an experiment. But with intrinsic randomness generation, sufficiently carefully controlled experiments will give repeatable randomness, which one can look for. You know, probabilistic models are very common. But I think a lot of them really aren’t necessary. And that a lot of systems can just make their own randomness, without getting it from outside.

Like in an Ising model. Where one normally assumes that spins are kicked around by a heat bath. Well, there’s a simple deterministic model that seems to have the same average behavior—and that’s probably much closer to an actual spin system. [See A New Kind of Science, page 982.]

It seems like there are a lot of probabilistic models where there are deterministic systems that have the same behavior. Like here’s a deterministic 2D cellular automaton on a square grid that almost—though not quite—makes enough randomness to give a circle like a random walk. [See A New Kind of Science, page 178.]

By the way, I should say something about what I mean by randomness. In everyday speech, we usually call things random if we can’t find a compressed description of them. Well, the ultimate is algorithmic randomness, where there just is no compressed description: there’s no program shorter than a sequence that will reproduce it. It’s an interesting definition. But it probably says that nothing in our universe can be considered random. Certainly not a rule 30 sequence. Because after all, it’s generated by a very small program.

But the point is that we can’t find that program. Statistical analysis, data compression, visual perception, cryptanalysis—none of them can crack rule 30. [See A New Kind of Science, page 561.]

So that’s why its sequences seem random. And one can make that more precise by talking about analysis with all possible simple programs.

This turns out to be related to the Second Law of Thermodynamics: the law of entropy increase. It’s easy to make a system like a cellular automaton microscopically reversible—so it’s bijective, just like the laws of physics seem to be. [See A New Kind of Science, page 437.]

But the point is that intrinsic randomness generation in effect encrypts even simple initial conditions to the point where they seem typical of the ensemble—at least with respect to any reasonably simple computation. And from this one seems finally to get at least the outline of a real proof of the Second Law. As well as seeing potential exceptions. [See A New Kind of Science, page 440.]

OK, well that’s a little about physics. What about biology? Where does all the complexity in biology really come from? [See A New Kind of Science, page 385.]

It’s never been clear why natural selection should give it. And in fact, I think it’s much more just a consequence of properties of simple programs. That if one picks a genetic program at random, it’ll often show complex behavior.

Let me give you an example. Here’s a mollusc shell. [See A New Kind of Science, page 423.]

Which we might have thought must be the result of some sophisticated optimization process. But actually its pattern is incredibly similar to what we get just from rule 30. [See A New Kind of Science, page 30.]

In the actual mollusc there’s a line of cells that lay down pigment as the shell grows. And we can model that as a cellular automaton. [See A New Kind of Science, page 414.]

Well, let’s say we just consider all possible rules of the simplest kind. Here are all the patterns they make. [See A New Kind of Science, page 424.]

And here’s a remarkable fact. The classes of patterns we get seem to be just the same as in actual molluscs. [See A New Kind of Science, page 423.]

It’s as if the mollusc species of the Earth just sample different possible programs, and then we see the results displayed on their shells.

You know, it usually seems hopeless to make a predictive theory of biology. Because everything seems to depend on historical accidents.

But if different species just uniformly sample simple programs, we get to make predictions just by knowing abstract properties of programs. Let me give you another example. Shapes of leaves. [See A New Kind of Science, page 403.]

They’re pretty diverse. But it turns out that there’s a simple program that seems to capture most of them. It’s just based on successive repeated branchings. [See A New Kind of Science, page 400.]

That fill in leaf material. And get shapes that look just like actual leaves—sometimes smooth, sometimes jagged, and so on. [See A New Kind of Science, page 402.]

Well, what happens with all possible such models? Here’s a simple case where it’s easy to lay out the possible shapes. [See A New Kind of Science, page 406.]

And one can summarize these in a parameter space set. [See A New Kind of Science, page 407.]

From whose properties one can deduce all sorts of things about leaves and their likely evolution.

It’s actually a mathematically rather interesting set. With properties a bit like the Mandelbrot set. By the way, for people who know about such things, my set is formed by looking at whether each tree reaches a given point. If you look instead at connectedness, you get a different set that’s much less interesting set. Well, there are lots of applications in biology. Mixing traditional math—like differential geometry—with simple programs gives lots of results on how things grow. [See A New Kind of Science, page 418.]

And in molecular biology there’s much to say about using primitives based on simple programs to describe biological processes.

But let me talk a little more about physics. And in particular fundamental physics.

Well, now that we’ve seen the things simple programs can do, we might ask the ultimate question: underneath all of physics could there just be a simple program? A little program that if run for long enough would reproduce in precise detail the whole history of our universe. In a sense the ultimate reduction of physics to an abstract system.

Well, so what might the program for the universe be like? One thing that’s kind of inevitable is that very few familiar features of the universe will immediately be visible in it. There just isn’t room. I mean, in a small program there’s no way to fit in separate identifiable pieces that represent electrons, or gravity, or even space and time.

In fact, if the program’s going to be really small, it sort of has to have the very least possible structure built in. And for example I think a cellular automaton already has far too much structure. It’s got a whole rigid array of cells laid out in space. And it also separates the notion of space from the notion of states of cells. And I don’t think one needs even that.

In physics, space is normally a kind of background—on top of which matter and everything else exists. But in an ultimate model I think space is the only thing one needs.

So what might space then be? I suspect it’s ultimately just a bunch of points with a certain set of connections. There are various ways to formulate it. But one is just to say it’s a giant trivalent network—a cubic graph. [See A New Kind of Science, page 1039.]

In which one just specifies the combinatorics of what point is connected to what.

So how does space as we know it emerge from that? The idea is that with enough nodes, it approximates a continuum. Like with molecules in a fluid. But more subtle.

Here are some networks that look like one-, two- and three-dimensional space. [See A New Kind of Science, page 477.]

But remember, all we’ve actually specified is how the nodes are connected. So how do we ultimately tell what dimension a network corresponds to? Just start from a node. Then look at how many distinct nodes you reach by going r connections. [See A New Kind of Science, page 477.]

That’s essentially the volume of a ball in the network. And if the network is going to correspond to d-dimensional space, that volume must grow like r^d.

Here I’ve laid out nodes so that their x position is determined by their minimum distance from a starting point. [See A New Kind of Science, page 479.]

And you can see the different growth rates for different dimensions.

What about curved space? In a flat hexagonal lattice, the number of nodes at distance r goes exactly like r². [See A New Kind of Science, page 532.]

But with curvature there’s a correction. That turns out to be proportional to the Ricci scalar curvature for the limiting manifold. [See A New Kind of Science, page 1050.]

Well, it’s pretty common to go from manifolds to more discrete things. Like in triangulating a manifold. [See A New Kind of Science, page 533.]

But there one’s still giving embedding information: saying how to lay out the points in an ambient space. In combinatorial topology one gets rid of the embedding. But one still has data on which nodes bound a face, and so on.

But in my combinatorial networks one doesn’t have even that. So it’s sort of remarkable how much limiting differential geometry one can get. Ricci scalars from growth rates of balls. Ricci tensors from growth rates of cylinders generated by geodesics. But, significantly, one can’t directly get Riemann tensors. Well, OK, in our universe there’s not only space, but also time. And in ordinary physics, one thinks of time as just another coordinate. But in programs it tends to be very different. Like in a cellular automaton. Where one moves in space by going from one cell to another. But one moves in time by actually applying the cellular automaton rule.

Well, I suspect that’s closer to how things work in physics. But it won’t be exactly like a cellular automaton. Because that would need a global clock to synchronize everything.

OK, so how might it actually work? Well, here’s something that at first sounds kind of crazy. How about if the universe is like a Turing machine—or what I call a mobile automaton—where only one cell gets updated at each step? [See A New Kind of Science, page 487.]

There’s no problem with synchronization, because only one thing happens at a time. But how can it possibly be right? I certainly don’t have the impression that for example first I’m getting updated, then you’re getting updated, and so on. But how would I know? Because until I’m updated I can’t tell whether you’ve been updated or not. And, if one follows this through one realizes that all that we can ever actually know about is a kind of a causal network of what event influences what other event.

Here’s an example of how one can go from an underlying sequence of updates to a causal network. And even though each update only affects one cell at a time, the final causal network—the partially ordered set of events—corresponds to something kind of uniform in spacetime. [See A New Kind of Science, page 489.]

OK, so how might time work in the space networks I talked about? One could imagine that there’s an updating rule that just says that whenever there’s a piece of network that has a particular form, it should get replaced. Like in these rules. [See A New Kind of Science, page 509.]

But there’s an immediate problem: what if there are several places a replacement could be done? Different updating orders in general lead to different causal networks. So one will get a whole tree of possible histories for the universe. So that to say what actually happens in our universe, one will somehow have to say which branch one’s on.

But there’s another possibility. It turns out that with some underlying rules it actually doesn’t matter what order they’re applied in: there’s what I call causal invariance, and the causal network is always the same.

This is related to so-called confluence or Church–Rosser properties in rewrite systems: it also happens to be related to canonical forms in Mathematica. But anyway, there are conditions about avoiding overlaps, which turn out for example to allow rules based on graphs like these. [See A New Kind of Science, page 515.]

So that whenever one uses just graphs like these, one gets causal invariance, and that means that in a sense there’s always just a single thread of time in the universe.

But it turns out that this also has another, rather remarkable consequence: it implies that special relativity must hold. It’s a slightly complicated story, but roughly, different updating orders correspond to different spacelike hypersurfaces. [See A New Kind of Science, page 516.]

And that’s why—subject to various tricky issues of limits and averages and things—causal invariance implies relativistic invariance.

OK, well that’s remarkable. But what about general relativity? I mentioned getting Ricci tensors from space networks. For general relativity, one needs spacetime Ricci tensors. But one can get those from growth rates of volumes of cones in causal networks.

And then one discovers something rather remarkable. If there’s microscopic randomness, and if things stay finite dimensional, then in a suitable limit, and with various other conditions, one can show that the spacetime Ricci tensor must vanish. Which is equivalent to saying that the Einstein equations hold.

Well, that’s rather remarkable. To have started with so little. But to have been able to derive something sophisticated like that.

But, OK, what about matter, and particles, like electrons and photons? Remember that all we have in the universe is space. So what are particles? Well, here’s how things can work in something like a cellular automaton. [See A New Kind of Science, page 229.]

This starts off randomly. But quickly organizes into a few persistent localized structures. That act just like particles. Like here are a couple of collisions between them. [See A New Kind of Science, page 295.]

Two particles come in, and interact, and then a whole bunch of particles come out. Almost like an explicit version of a Feynman diagram.

Well, OK, so how does something like this work in a network? Here’s a very simple example. Imagine we have rules that always preserve planarity. Then imagine that we introduce some lumps of nonplanarity. Well, those end up being conserved until they collide, much like particles. [See A New Kind of Science, page 527.]

And in this simple example we know a lot about how to classify the possible particle-like objects. Kuratowski’s theorem tells us that in our networks single lumps of nonplanarity must be like K_3,3s. But the actual nonplanarity is sort of hard to localize. A bit like in quantum mechanics.

Well, in more realistic cases it’s much more complicated, and there’s lots of unknown graph theory. But there are signs of all sorts of other features of quantum field theory, and gauge invariance, and so on.

It all looks extremely promising. It really seems like out there in the computational universe there may be a simple program that’s a precise version of our universe. And if nothing else, the process of trying to find it is bringing up some pretty interesting issues linking geometry, computation, graph theory, recursion theory and so on.

Well, OK. I want to come back now to what really launched everything I’ve been talking about: the discovery that simple programs—like rule 30—can make immense complexity. So why does that happen? What’s the fundamental reason? [See A New Kind of Science, page 30.]

Well, to answer that one needs a somewhat new conceptual framework. And the basis is to think about all processes as computations. The initial conditions are the input. And the behavior that’s generated is the output.

Well, sometimes the computations are ones that we—kind of—immediately know the point of. Like here’s a cellular automaton that computes the square of any number. [See A New Kind of Science, page 639.]

You give it a block of n cells, and it generates a block of n² cells. And here’s a cellular automaton that generates the primes. [See A New Kind of Science, page 640.]

But actually any cellular automaton can be thought of as doing a computation. It just isn’t necessarily a computation that we know the point of beforehand.

OK, so we have all sorts of systems and they do all sorts of computations. But what general statements can we make? How do all these computations compare?

Well, we might have thought that to do a different kind of computation we’d always need a different kind of system. But the remarkable idea that’s now about 70 years old is that no, that’s not true. Instead, it’s possible to have a universal system that can do any computation if fed the right input. And of course that’s been a pretty important idea. Because it’s the idea that makes software possible, and it’s the idea that launched the computer revolution.

Well, here’s a universal cellular automaton. [See A New Kind of Science, page 645.]

That with different initial conditions can be made to act like any other cellular automaton. So here it’s behaving like rule 254, which makes a simple uniform pattern. [See A New Kind of Science, page 646.]

Here’s it behaving like rule 90. [See A New Kind of Science, page 647.]

And here it’s behaving like rule 30. And remember, each of these pictures is of the same universal cellular automaton, with the same underlying rules. But what’s happening is that by giving different initial conditions, we’re effectively programming the universal cellular automaton to emulate other cellular automata. And in fact it can emulate absolutely any other cellular automaton, with rules of any size.

One might have thought that a system would only be able to emulate systems that were somehow simpler than itself. But universality says that there are fixed systems that can emulate any system, however complicated. So in a sense, once one’s got universality, one’s maxed out from a computational point of view—one can do essentially any computation, however sophisticated.

OK, so what systems are like rule 30? Or systems we see in nature? How sophisticated are the computations that they’re doing? [See A New Kind of Science, page 30.]

Well, I spent a long time thinking about this and accumulating all sorts of evidence. And I ended up concluding something that at first seems pretty surprising. I call it the Principle of Computational Equivalence. It’s a very general principle. And in its roughest form what it says is this: that essentially any time the behavior of a system looks complex, it will correspond to a computation of exactly equivalent sophistication. If we see repetitive or nested behavior then it’s pretty obvious that it corresponds to a simple computation. [See A New Kind of Science, page 58.]

But what the Principle of Computational Equivalence says is that when we don’t see regularities like those, we’re almost always seeing a process that’s in a sense maximally computationally sophisticated.

Now at first that’s pretty surprising. Because we might have thought that the sophistication of the computations that get done would depend on the sophistication of the rules that got put in. But the Principle of Computational Equivalence says it doesn’t. And that immediately gives us a prediction. It says that even though their rules are extremely simple systems like rule 30 should be computationally universal. [See A New Kind of Science, page 30.]

Well, normally we’d imagine that to achieve something as sophisticated as computation universality we’d somehow need sophisticated underlying rules. And certainly actual computers have CPU chips with millions of components, and so on. But the Principle of Computational Equivalence says you don’t need all of that. It says that even cellular automata with very simple rules should be universal.

Well, here’s one them. It’s the hundred-and-tenth elementary cellular automaton, rule 110. [See A New Kind of Science, page 229.]

It’s got a very simple rule. But as you can see, it does some fairly complicated things. It’s got lots of little structures running around doing things. But can one really assemble them to get something one can see is universal? Well, after a lot of painstaking work, one gets the result: that indeed rule 110 is universal!

Well, that’s just what the Principle of Computational Equivalence said should be true. But it’s really a remarkable thing. Because it means that this little rule

can in effect produce behavior that’s as complex as any system. One doesn’t need anything like a whole computer CPU to do universal computation. One just needs this little rule.

And that has some very important consequences, for example in thinking about nature. Because we wouldn’t expect to find whole computer CPUs just lying around in nature, but we definitely can expect to find things like rule 110. And that means for example that lots of everyday systems are likely to be universal. By the way, in the past this was the simplest known universal Turing machine.

But actually this much simpler Turing machine is also universal. [See A New Kind of Science, page 707.]

And I suspect this is the very simplest possible one that’s universal. [See A New Kind of Science, page 709.]

Well, there’s a lot to say about what kind of a thing the Principle of Computational Equivalence is, and what it means. One thing it does is to make Church’s thesis definite by saying that there really is a hard upper limit on the computations that can be done in our universe. But the place where it really starts to get teeth is when it says that not only is there an upper limit—but that limit is actually reached most of the time. The very simplest rules will just give simple behavior—say repetitive or nested. But what the Principle of Computational Equivalence says is that if one looks at just a few more rules, one will suddenly cross a threshold—and end up with maximal computational sophistication.

Actually the principle goes even further than that. Normally with universal computation one imagines being able to set up whatever initial conditions one wants. But the Principle of Computational Equivalence says that that’s not necessary. It says that even when the initial conditions are simple, there’ll still usually be maximally sophisticated computation going on. OK, so what does this all mean? Well, first it gives us a way to answer the original question of how something like rule 30 manages to seem so complex. [See A New Kind of Science, page 30.]

The point is that there’s always a competition between an observer and a system they’re observing. And if the observer is somehow computationally more sophisticated than the system, then they can in a sense decode what the system is doing—so it’ll look simple to them. But what the Principle of Computational Equivalence says is that the observer will usually be exactly computationally equivalent to the system they’re observing. So inevitably the behavior of the system will seem to them complex.

Well, a related consequence of the Principle of Computational Equivalence is a phenomenon I call computational irreducibility.

Let’s say you know the rules and the initial conditions for a system. Well, then you can certainly work out what the system will do just by explicitly running it. But the question is whether you can somehow shortcut that process. Can you for example just work out a formula for what will happen?

That kind of computational reducibility is at the core of most traditional theoretical science. If you want to work out where an Earth in an elliptical orbit will be a million years from now, you don’t have to trace a million orbits; you just have to plug a number into a formula. But the problem is, what happens when the behavior is more complex? It’s easy to shortcut repetitive behavior, and fairly easy to shortcut nested behavior.

But what about a case like this? There’s certainly no obvious way to shortcut this, and in fact I think it’s computationally irreducible: there’s essentially no way to work out what the system will do by any procedure that takes less computational effort than just running the system and seeing what happens. [See A New Kind of Science, page 740.]

In traditional theoretical science, there’s sort of been an idealization made that the observer is infinitely computationally powerful relative to the system they’re observing. But the point is that when there’s complex behavior, the Principle of Computational Equivalence says that instead the system is just as computationally sophisticated as the observer. And that’s what leads to computational irreducibility. And that’s in a sense why traditional theoretical science hasn’t been able to make more progress when one sees complexity. There are always pockets of reducibility—like the kinds that can be represented by all those special functions of mathematical physics on our Functions Site—but there’s always a core of computational irreducibility.

Well, I think computational irreducibility is a pretty important phenomenon. With some pretty interesting philosophical as well as scientific implications. It’s also in a sense what’s inside the phenomenon of undecidability. Look at this cellular automaton. [See A New Kind of Science, page 754.]

And ask the question: starting from a given initial condition, will the pattern that’s produced eventually die out, or will it just keep going forever? In the first case, running the cellular automaton tells one that after 36 steps it dies out. In the next case, though, it takes 1017 steps to find that out. And in the next two cases, even after 10 million steps it’s still not clear what’s going to happen.

And the point is that if there’s computational irreducibility, there’s no way to shortcut this evolution—so no finite computation can always figure out what will happen after an infinite time. And that means one has to say that it’s in general formally undecidable.

Well, undecidability has been known about in mathematics and in computer science since the 1930s. But what we’re seeing is that it’s also relevant to natural science.

If one asks about infinite time or infinite size limits the answers can be undecidable. Like whether a body will ever escape in a three-body problem. [See A New Kind of Science, page 973.]

Or whether there’s a way to arrange some complicated molecule into a crystal below a certain temperature. Or whether some idealized biological cell line will grow forever or will eventually die out.

Well, there’s another big place where I think computational irreducibility is very important—and that’s in the foundations of mathematics. It may sound kind of obvious, but it’s really a deep fact that mathematics is often hard to do. Yet it’s based on pretty simple axioms. In fact, right here are ones for essentially all of current mathematics—from logic, to algebra, geometry, and set theory. [See A New Kind of Science, page 773.]

But even though these axioms are all quite simple, our proofs of things like the Four-Color Theorem or Fermat’s Last Theorem are really long. And one can think of that as another example of computational irreducibility.

Here’s an example of a simple proof. At the top are some axioms—in this case specifying equivalences in logic. And then here’s a proof. It starts from one expression, then keeps on using the axioms, and eventually proves the theorem that the expression at the top is equivalent to the one at the bottom. [See A New Kind of Science, page 775.]

Well, OK, just like we made minimal models of natural systems like snowflakes, we can also think about doing that for math. So as a kind of minimal idealization of math, one can imagine that the axioms just define transformations between strings. So with the axioms at the bottom here, here are proofs of a few theorems. [See A New Kind of Science, page 776.]

But how long do the proofs need to be? Well, here’s a picture for three axiom systems, showing the network of all possible transformations. And the way this works, each possible path through the network corresponds to the proof of some theorem. [See A New Kind of Science, page 778.]

Well, the point is that the shortest path from one particular string to another may be really long. And that means that the theorem that the strings are equivalent has only a really long proof.

Well, when people were thinking about formalizing mathematics a century ago, they kind of assumed that in any reasonable axiom system it’d always eventually be possible to give a proof of whether a particular statement was true or false. So it was a big shock in 1931 when Gödel’s theorem showed that that wasn’t true for Peano arithmetic.

What Gödel actually did, though, was to look at the statement “this statement is unprovable”. Well, just by what it says, the statement obviously can’t consistently be proved true or false. But as such it doesn’t seem like a statement in arithmetic. And Gödel’s real achievement was essentially to show that arithmetic is universal, so that in particular it can encode his statement.

Well, for the foundations of math, Gödel’s theorem was a pretty big deal. But somehow in all these years it’s never seemed too relevant to most of the things working mathematicians deal with. And if you really needed something as funky as Gödel’s statement to get undecidability that wouldn’t be surprising.

But here’s the thing: the Principle of Computational Equivalence should also apply to systems in mathematics. And it then says that computational irreducibility—and undecidability—should actually not be rare at all.

So where are all these undecidable statements in mathematics? Well, one knows that there are Diophantine equations about which there are undecidable statements. Here’s an explicit example—a Diophantine equation set up to emulate rule 110. [See A New Kind of Science, page 786.]

Well, this is obviously pretty complicated, and not something that would show up everyday. But what about simpler Diophantine equations? Here are a bunch. Well, linear Diophantine equations were cracked in antiquity. Quadratic ones two centuries ago. And so far another kind seems to be cracked roughly every fifty or a hundred years. But I’m guessing that that’s not going to go on. And that actually many of the currently unsolved problems in number theory will turn out to be undecidable. [See A New Kind of Science, page 790.]

OK, but why has so much math successfully been done without running into undecidability? I think it’s kind of like theoretical physics: it’s tended to stick to places where there’s computational reducibility, and where its methods can make progress.

But at least in recent times mathematics has sometimes prided itself on somehow being as general as possible. So then why haven’t rule 30 and rule 110 and all the phenomena I’ve found in simple programs showed up?

Well, I think part of the reason is that mathematics isn’t really quite as general as one might think. As one way to see what it could be, one can just imagine enumerating possible axiom systems. And for example this shows what theorems are true for a sequence of different axiom systems. It’s like an ultimately desiccated form of mathematics: the axioms go down the left, the theorems go across the top—and every time there’s a theorem that’s true for a particular axiom system there’s a black dot. [See A New Kind of Science, page 812.]

So is there something special about the actual axiom systems that get used in mathematics? Perhaps something that makes undecidability less rampant? Well, if one looks at axiom systems from textbooks, they’re usually pretty complicated. Here’s propositional logic, for instance. [See A New Kind of Science, page 773.]

Well, actually, it’s been known for a hundred years that one doesn’t have to have all those operators in there. The single NAND operator—or Sheffer stroke—is enough. But the obvious axiom system with that is still pretty complicated. [See A New Kind of Science, page 773.]

Well, from my intuition from the world of simple programs, I suspected that there should actually be a really simple axiom system for logic—probably with just one axiom. And so I just did a systematic search, and lo and behold, I found one. [See A New Kind of Science, page 773.]

And I can tell you that this is the very simplest possible axiom system for logic. Well, knowing this I can say that if one just enumerates axiom systems, logic will be about the 50,000^th one that one reaches. [See A New Kind of Science, page 773.]

But what about all the other ones? Well, most of them are perfectly reasonable axiom systems too. They just don’t happen to be known fields of mathematics.

But why not? Well, actually what I think is that mathematics has been in a sense tremendously constrained in the way it’s developed. And in fact at some level it’s really still just about various direct generalizations of the arithmetic and geometry that got studied in ancient Babylon. It’s explored only a tiny part of what’s out there in the general computational world.

But why really is that? I think a lot of it is the way generalization has been done in mathematics. Usually one has some object, that satisfies certain theorems. And then one generalizes by looking for a larger class of objects that satisfy the theorems.

Well, this is a good strategy if one wants to be sure one can still prove theorems. And probably it’s why undecidability hasn’t been more common in math research. But it means that mathematics hasn’t particularly sampled what’s out there in the general world of abstract systems.

In natural science we’re used to just doing experiments to find out what’s there. But in mathematics, we haven’t been. So we just haven’t known about all this stuff.

But OK, is the stuff actually “interesting”? Well, it seems important for understanding nature. And for making technology.

But what about for math? I mean, if we’re just handed an axiom system, what would make it “math-like”?

Maybe it’s like defining life. It’s easy to tell living systems on Earth, because they all have common historical details. But abstract definitions—from thermodynamics, self-reproduction, etc.—don’t really seem to work.

So what about math? Is there a definition that’s more than historical? That’s one of those foundational questions with very practical consequences. Because in Mathematica we’d really like to make a sort of intermediate layer between the decision procedures for definite mathematical theories (yes, that’s new Version 5 stuff), and the general symbolic language underneath.

Well, so if we look at existing math axiom systems, can we tell what’s special about them? There doesn’t seem to be anything in the distribution of theorems they make true. Or in the lengths of proofs one finds in them.

One might think that what’s special is that the axioms are actually “about things”. That somehow the constraints they define force one to be talking about some definite kind of object. But actually there are often a lot of “models” that are consistent with a given axiom system. So there’s nothing special there. [See A New Kind of Science, page 796.]

In an existing field of mathematics it’s interesting to do a sort of empirical metamathematics. Looking at the dependency network of known theorems. Like here’s the one for Euclid’s Elements. The longest proof is 30 steps—for the theorem that there are five Platonic solids. [See A New Kind of Science, page 1176.]

Well, one can ask many global questions about theorem networks. Like geometrical ones. And one can also try to find precise graph properties that capture our general notions of powerful theorems, or surprising theorems—or interesting theorems.

Here are the first few hundreds theorems of propositional logic, written out in a straightforward order. [See A New Kind of Science, page 817.]

The ones that are highlighted are the ones considered interesting enough to be given names in typical textbooks. Well, here’s a remarkable fact. You can predict which ones those will be. It turns out they are exactly those theorems that can’t be deduced from ones earlier in the list. They’re the first ones that in a sense give new pieces of information.

Well, so that’s a criterion for interestingness. And one can apply it to any axiom system at all. And sort of automatically begin to build up traditional math-like structure. But without the history.

Well, OK, so what’s going to happen with all the stuff I’ve talked about here today? The big book I spent a decade writing has lots and lots of detailed results. That’ll certainly take quite a while to absorb.

But I think three big things will emerge. First, a new kind of basic science. Like a physics, or a chemistry. Or even a mathematics. But concerned with what’s out there in the computational universe. Second, a whole bunch of applications. To science. To technology. To other areas. And third, a new way of thinking. That’s relevant for many things. Including for example education.

You know, it’s been exciting in the year and a half since my book came out to watch some of this start to unfold. It’s been tremendous to see how many talented people want to get involved.

It’s a challenge to set up the infrastructure for it all. We’ve been building a growing website.

And making available programs

and software for doing computer experiments.

We had a very successful NKS 2003 conference. And we’re having an NKS 2004 conference in Boston in April. We had a summer school last summer. And we’re having another one this year.

There’s a lot to do. And I’ve started writing down some of the obvious open problems and projects. It’s averaging more than one per page of the book. So there’ll be more than a thousand in the end.

Really a lot is happening. Lots of papers being published. Particularly in what one might call Applied NKS.

In Pure NKS, a big initiative is the atlas we’re building. A terabyte or so of data about what’s out there in the computational universe. That one can consult and search to find models, or algorithms, or for example raw material for math. Like here’s a trip into a tiny corner of the latest development version of the Atlas. We could look at the live version later. We’re looking at some properties of rule 30. Simple initial conditions.

Single black cell.

State transition graphs.

And a table of the one for all rules.

Can you explain those pictures? In this particular corner, there’s some nice combinatorics, and group theory, involved. And there’s really a lot that can be done in pure NKS with existing mathematical methods. And we’re counting on mathematicians to get involved.

One of the exciting things about NKS is how broad the base of people is who can get involved. If they understand the methodology of doing good computer experiments, it’s perfectly realistic for a high school student to discover something new and interesting. But there are also plenty of things that reach the frontiers of math.

I wrote my book to be very accessible. Though with lots of definite stuff, particularly in the notes. And I think that worked out well. Because it’s let a tremendous range of people read it. And begin to get involved with it.

And the result right now feels a little like an intellectual analog of a new industry starting up. One of those rare exciting times when a big new direction is just getting defined.

To cover what’s in the book in a serious way is probably a several-semester undertaking. So I’ve only been able to scratch the surface and sketch a few things here today. But I hope I’ve been able to give you at least some idea of what’s in that big book of mine. And what at least I’ve been so excited about all these years. Thank you.