Mathematica, NKS and Wolfram|Alpha: Some Mathematical Perspectives

This talk was given at The Ninth Gathering 4 Gardner (G4G9) on Friday, March 26, 2010

Well, I’ve been hoping to come to one of these gatherings for more than a decade—so I’m glad I’ve finally been able to do so.

I’ve spent the past 30 years or so on a small number of big projects, and I thought I’d try to touch briefly on each of them here.

Let me talk first about Mathematica, because it’s the tool that’s made all the other ones possible.

The prehistory of Mathematica is that when I was a kid, I used to do theoretical physics.

For which I wanted to do all these complicated calculations.

That I realized could be done much better by a computer than by me.

Well, I built some systems for doing that.

And at some point I got to thinking: can I just build one unified system that’ll do all the kinds of computations I’ll ever want to do?

It was a bit like a physics problem. Thinking about all these kinds of computations.

And trying to drill down and figure out the primitives from which they could all be made.

Well, eventually I came up with a structure that’s based on the idea of symbolic programming.

And on that structure I started to build Mathematica.

Well, the first version of Mathematica came into the world on June 23, 1988.

And since that time—with an increasing number of very talented people—we’ve been pouring more and more capabilities into the system.

One of the big principles has been to try to automate as much as possible.

To make it so you just have to say what you want to do, and Mathematica will automatically figure out which fancy algorithm to use to actually do it.

Or how best to produce a visualization. Or whatever.

One of my big goals has been to keep the whole system coherent, and unified. To make sure that every piece fits together with every other one.

And actually, particularly in recent years, that’s paid off in more and more spectacular ways.

If you look at the plot of the number of functions in Mathematica against time… there’s this wonderful acceleration that’s happening.

That comes because every new piece that’s added gets to be built with larger and larger building blocks.

So that when you’re in the middle of some algorithm of one type, it’s kind of free to just dip into all sorts of other algorithms.

I think Mathematica long ago outgrew the “math” part of its name.

But we’ve had a lot of very active R&D inventing new algorithms in the math area.

Partly because they’re useful for other things. And partly as a kind of math philanthropy.

Well, OK. I suspect a lot of you use Mathematica pretty routinely.

But I think I should at least show a few tiny examples of modern Mathematica.

This idea of instantly building interactive interfaces is pretty powerful for studying things.

And for demonstrating things.

For the last several years we’ve had our Demonstrations Project website.

Where lots of people have contributed now over 6000 interactive Demonstrations of all sorts of things.

It’s a pretty neat way to communicate ideas.

A very practical way. Because if you see a Demonstration you like, the code’s probably quite simple, and you can pull it out and use it yourself.

Once you’ve written a Demonstration, there’s a free player that lets anyone run it and interact with it.

And actually, this year, there’s going to be a new thing called CDF—the computable document format—that makes our Demonstrations technology incredibly portable.

OK. Well I have to say that the #1 reason I built Mathematica was quite personal: I wanted to use it myself.

To do computational experiments.

Well, so what are the most basic computational experiments one can do?

One could just start studying the simplest possible programs.

Normally we deal with pretty complicated programs, that we’ve constructed for particular purposes.

But what if you just start enumerating all possible programs?

Well, I started doing this back in the early 1980s with a class of programs called cellular automata.

Here’s how they work.

Let’s run one.

Start with a single black cell. Apply the same rule for every cell.

In this case we get a simple pattern.

But let’s try changing the rule a bit.

Slightly more complicated. But still repetitive.

Try again.

OK. Now we get something really intricate.

But if we run it longer we can see it’s ultimately very regular: just a nested fractal pattern.

OK. So is that all that can happen?

Well, we can do a simple computer experiment to find out.

It’s a lot easier now with a modern Mathematica than it was when I first did this.

But here. Let’s enumerate the first 64 possible cellular automaton rules.

What happens is pretty diverse. But usually pretty simple.

But what’s going on here? At rule number 30.

Let’s run it a bit longer.

Now remember that we’re starting out from just one black cell.

And we’re just using that rule at the bottom.

But look at what happens.

Well, when I first saw this, it was really a shock to my intuition.

I mean, from doing engineering and so on, we’re used to the idea that to make something complicated, you have set up complicated rules, and go to lots of effort.

But here, we’ve just got this tiny rule. And yet it’s producing this really complicated pattern.

Well, of course, we often see really complicated patterns in nature.

It seems like nature has some secret that lets it produce those.

That we in our engineering and so on don’t have.

Well, I think rule 30 shows us how that secret works.

It’s basically that nature is sampling programs out there in the computational universe—and many of those programs do things like rule 30.

Now of course one can wonder just how general this rule 30 behavior actually is.

I think a lot of you have actually studied systems of one kind or another whether things like this happen.

And one of the things I want to communicate is that those aren’t just curiosities or novelties.

They’re actually examples of a central phenomenon that’s really important for understanding nature, and that’s at the core of a whole new kind of science.

A kind of science that’s based on studying the whole computational universe of possible simple programs.

There’s a lot of interesting stuff out there.

I spent a long time writing a big book that talks about some of it, and some of its implications.

Let me flash up a few random slides of things I discovered.

Rule 30 like phenomena in Turing machines.

In primitive recursive functions.

Even in the core of traditional mathematical physics: partial differential equations.

Not the usual ones from textbooks.

But ones one quickly finds out in the universe of all possibilities.

The phenomenon also occurs in systems like tilings that work with constraints.

Well, here’s the simplest square tiling that’s forced to show nonperiodicity.

It’s nested. But I think not too far away are tilings that are forced to be random, like rule 30.

You know, there are just a zillion things to explore and study in the computational universe. There’s a whole new basic science—a kind of generalization of mathematics—out there to do.

But so how does all this map onto nature?

You can find some pretty clear correspondences.

Like in mollusc shell patterns.

That work incredibly like 1D cellular automata.

With the molluscs of the Earth effectively just sampling different programs.

And sometimes producing simple patterns. And sometimes things like rule 30.

OK. So there are lots of phenomena to see in the computational universe.

What general principles emerge?

The most significant I think is what I call the Principle of Computational Equivalence.

Think about all these programs out there.

Some behave in trivial ways.

But a lot seem complicated.

Well, you might think that as you make the rules more complicated, the behavior would somehow become correspondingly more complicated.

But I don’t think that’s what happens.

I think above some very low threshold, all these systems are actually in a sense precisely equivalent in the sophistication of the computations they’re doing.

And this is the Principle of Computational Equivalence.

It’s a pretty deep principle. With, needless to say, a complicated epistemological character.

But with a lot of predictions.

One of the more obvious ones is that as soon as the behavior of a system in some sense looks sufficiently complicated, the system will be capable of universal computation.

Well, we’re gradually getting more and more evidence for this.

Here are two of the best pieces.

The simplest cellular automaton that’s known to be universal—rule 110.

And now—thanks to a little competition we held a couple of years ago—also the simplest possible universal Turing machine.

Now from following the history of engineering you might have thought that to make a universal computer you’d have to have all this engineered stuff.

But you don’t. In fact, say in cellular automata, just enumerating possible rules, already by the 110^th rule you’ve got something universal.

Well, that fact alone has a lot of significance.

For example, it means that it’s a lot easier to make a universal computer, say out of molecules.

But this Principle of Computational Equivalence also has another important implication: what I call computational irreducibility.

We’re used to the idea that the big achievement of theoretical science is to make predictions.

To say where the Earth will be many years from now by just applying some kind of formula, without having to trace every orbit.

Well, that may be what happens in the systems that things like physics have been successful with so far.

But there are lots of systems out there in the computational universe that don’t work that way at all.

Take this, for example.

I think this is computationally irreducible. You can’t figure out much about what will happen by any procedure that’s much more efficient than just running each step and watching the outcome.

If we were going to be able to reduce this, it would in a sense mean that we as observers or predictors would have to be doing a more sophisticated computation than the system itself.

But the Principle of Computational Equivalence says that the computations we do with our brains, or our mathematics, or our computers, are actually just equivalent to what the system itself is doing.

And that’s why its behavior is computationally irreducible.

Well, this has lots of implications. About old philosophical problems like free will. About thinking about things like extraterrestrial intelligence. And about the practicalities of having to do simulation to figure out how systems behave.

It also means at a theoretical level that limiting questions about systems are often undecidable.

And that brings up something about mathematics.

You know, ever since Gödel’s theorem, we’ve known that in mathematics we can in principle formulate questions that are undecidable.

But mostly it hasn’t seemed to matter to the research that gets done in mathematics.

Now I have to say that I think the mathematical things many of you do give good evidence that actually it does matter.

There’s been this view that somehow undecidability is far away from interesting mathematics.

But I don’t think so. It’s part of my Principle of Computational Equivalence.

That, say, in Diophantine equations, we’re not going to see progressive steady progress.

We’re going to reach—very soon—a threshold where there’s undecidability all over the place.

So why hasn’t mathematics seen more of this already?

I think it’s really because mathematics is very much a historical artifact. It’s grown from the arithmetic and geometry of ancient Babylon by a certain process of generalization that specifically keeps it away from all that undecidability.

But when we start just enumerating things, like in the computational universe, we’re immediately confronted with lots of undecidability.

It’s there in nature, I think. It’s just that our mathematics has been engineered to avoid it.

Well, what about our mathematics?

You can write down all the axioms that are commonly used on just a couple of pages.

And from these few rules, in a sense, all those millions of theorems that mathematicians have worked out emerge.

But, OK, could one pick different axioms?

One can enumerate axiom systems just like one enumerates programs.

Each one in a sense defines a different field of mathematics.

So here’s an ultimately desiccated version of mathematics.

Enumerate axiom systems down the left, theorems across the top.

And there’s a black dot for each theorem that’s tried in a particular axiom system.

So: can you pick out the particular mathematics that we use?

Where does it lie in the space of all “possible mathematics”?

Here’s an example of that.

Consider logic. Boolean algebra.

Here are some standard axioms for it.

They look pretty complicated. Pretty far out in the space of “possible mathematics”.

But we know that you don’t need all those connectives.

Nand—or the Sheffer stroke—is enough.

But still the obvious axiom system one gets is pretty complicated.

Well, a few years ago I decided to try to find the very simplest possible axiom system for Boolean algebra.

It turns out to be very simple. It’s just a single axiom. With 6 Nands in it.

These days Mathematica can automatically prove that this is correct in a few seconds.

Here’s a version of the proof.

But it’s sort of interesting. This tells us that if we just start enumerating axiom systems, logic is perhaps the 50,000^th one we’d encounter.

And dotted around out there in the space of all possible axiom systems, we can find all sorts of other familiar areas of mathematics too.

But what about the rest of what’s out there?

Well, it’s in a sense possible mathematics that just doesn’t happen to be what we’ve encountered in that historical tradition we call mathematics.

But with the new kind of science that comes from exploring the computational universe, we’ve got something much more general—which has our current mathematics just as a little dot.

Well, OK. So what can we do with all this stuff out there in the computational universe?

Well, for example, it’s great raw material for making models of systems in nature and elsewhere.

It’s also a great source of technology.

It’s a bit like with physical materials. Where we go out into the physical universe, and find all sorts of materials that we then end up using for our technology.

Well, in the computational universe there are all these programs—each defining an algorithm. And to make technology out of them we have to match them up with our human purposes.

Like rule 30 turns out to be a really good source of randomness.

We’ve used something that’s like the center column of this pattern as a source of pseudorandom numbers in Mathematica for the past 23 years.

Well, different programs out there in the computational universe can get used for different purposes.

And in fact now in Mathematica there are lots and lots of algorithms that we’ve just discovered by searching the computational universe.

Things that no engineer would ever have constructed. But that we can prove do the job.

You know, in working on all this, I’d wondered for a long time what would be the first real killer app that would emerge.

Well, I think we can now see that. It might not quite be what one expected.

It actually comes more than anything from really taking seriously the paradigm that exploring the computational universe gives us.

That it’s possible to construct a lot from simple rules.

Well, think about human knowledge. About all the systematic facts and methods that our civilization has produced.

Can we imagine being able to make a system that captures all this? And lets one use it to automatically compute answers to factual questions?

Well, I’ve wondered about this for a long time. And people like Leibnitz already thought about it a very long time ago.

I’d always assumed that it was somehow completely out of reach.

But the science that I’ve done makes one think that perhaps it isn’t.

And with all the practical structure that we’ve built in Mathematica, I decided a few years ago that it was worth trying to see if it was actually possible to build a system that would capture the computable knowledge of the world.

I frankly wasn’t at all sure that this would be possible this decade, or even this century.

But with a lot of help from NKS—the new kind of science—and from Mathematica, we’ve come a long way.

And last year we released out into the wild Wolfram|Alpha—a serious computational knowledge engine.

The first form of the technology is a free website that millions of people now use every day.

Let’s give it a try.

2+2
integrate x^2 sin^3 x
gdp france / germany
banana consumption USA
uncle’s uncle’s uncle
weather atlanta
(P || Q || R) && (~P || ~Q)
ATTCTTGTTATTAA
.353535335

So, Wolfram|Alpha knows about lots and lots of kinds of things. It’s got by now, a pretty good coverage of everything you might find in a standard reference library and so on.

A crucial point is that we want to let people ask their questions however they think of them—in completely free form.

Which means we’ve had to figure out how to understand all that weird human linguistic input.

Which is a huge challenge. But it turns out that we’ve been able to use ideas from the science of the computational universe to figure out some completely new ways to solve this problem.

Which seem to work remarkably well.

The general idea of Wolfram|Alpha is to be able to get expert-level answers to all kinds of specific questions.

We’re not just searching the web to find things people happen to have written down before.

We’re putting knowledge into computable form, so that we can explicitly compute fresh answers to questions when they’re asked.

It’s a huge project—which, like Mathematica for example, will keep going forever.

But I am pretty pleased with how far we’ve already got.

With curating data from thousands of different domains.

With implementing the methods and models and algorithms from all different fields.

And with figuring out how to present the answers in the best human way.

We’ve still only got about 200 people at our company working on this—though that’s expanding rapidly.

Wolfram|Alpha is by now a pretty big Mathematica system—about 8 million lines long.

Running in parallel on thousands of CPUs, with zillions of real-time data feeds coming in and so on.

And as a business, it’s increasingly dealing not only with public knowledge and data, but now also with internal data of various kinds.

I must say that this is an exciting project for me.

Interestingly different from Mathematica.

You see, in working on Mathematica for the past 23 years I’ve been creating this sort of perfect structure that one can build arbitrarily large systems with.

In Wolfram|Alpha, though, we’re trying to create something that in a sense caters to the whole messiness of the world.

Where in a sense anything goes, but we’re able to cover vastly more.

Actually, some of the most exciting things right at the moment are taking shape on the boundary between Mathematica and Wolfram|Alpha.

Using the Mathematica interactive interface inside Wolfram|Alpha, even on the web.

Calling Wolfram|Alpha from inside Mathematica.

And being able to have a Mathematica session, where one mixes precise Mathematica language syntax with the freeform linguistics of Wolfram|Alpha.

You know, from a Wolfram|Alpha point of view, I’m starting to understand all sorts of stuff in new ways.

Like algorithm discovery in the computational universe.

Wolfram|Alpha right now is basically relying on things that exist in the knowledge base of our civilization, from the last few hundred years of science and so on.

But by going out into the computational universe, we can have it not just compute, but also actually discover on the fly.

In effect creating new models and new algorithms on demand.

I’ll mention one more thing that I’ve slowly been figuring out recently.

Having to do with the computerization of mathematics.

You know, a hundred years ago there was this idea—culminating with Whitehead and Russell—that one could formalize and mechanize mathematics by making a system that constructs mathematical proofs.

Well, if one looks at history, this turns out not to have been such a fertile idea.

A much more useful idea—which we used in Mathematica—is to formalize mathematics in terms of computations: as a way to derive answers from questions.

Well, in Mathematica we have in a sense two levels of structure when it comes to mathematics.

One is at the lowest level: the very general symbolic computational structure of Mathematica.

The other is in a sense much more specific: the actual implementation of all those constructs from 19^th century mathematics. The special functions. The types of equations and so on.

But I’ve always wondered: how would one usefully implement things like Bourbaki-style mathematics?

Mathematics that’s not so much about answers as about defining structures.

Well, with the Wolfram|Alpha paradigm I think we finally have a way.

Thinking of mathematics much more like a cultural artifact, with all these named objects and so on.

Where you tell Wolfram|Alpha something about you’re thinking of.

And it tries to generating interesting facts about the thing.

Not an answer. Just interesting facts. Like it would about a country or a movie.

Though with mathematics there are ultimately more interconnections that can computed about.

Well, anyway, I have negative time here.

But I did just want to mention what one might think of as perhaps the ultimate puzzle from all of this.

It’s about fundamental physics.

And the question is whether out there in the computational universe of simple programs might lie our physical universe.

This is a long story.

But I have to say that I think it’s making some pretty exciting progress.

Starting in a sense from structures below space and time.

And trying to hunt for particular rules that correspond to our universe.

One representation starts from a network like this.

That in a large scale approximates a manifold corresponding to space.

But with little structures inside that correspond to things like quantum particles.

One imagines rewriting this network using simple rules.

And it turns out for certain kinds of rules one can actually—and very nontrivially—show that on a large scale the manifold that emerges from the network shows exactly the right curvature to correspond to gravity and Einstein’s equations.

Well, at the front lines of universe hunting, one’s just going through lots and lots of simple rules, trying to see what they do.

In each case, one can tell one doesn’t have our universe.

Because it has no notion of space, or of time, or some other pathology.

But in general one is bitten by computational irreducibility: one starts one’s universe off and it just bobs around.

And one can’t easily tell what it will do, or whether it correctly corresponds to our universe.

But the exciting thing is that caught in the net are already rules that are not obviously not our universe.

There’s a lot of computational, mathematical, and physical technology that has to be developed to figure out more.

And that’s a big project I’m hoping to do in coming years.

But, OK, I’d better stop here.

I’d be happy to talk separately about Mathematica, NKS, Wolfram|Alpha, physics, or probably lots of other things too.

And particularly right now for Wolfram|Alpha, we’re very keen to find good ideas and good people to help with what’s in a sense an insanely huge project.