Stephen Wolfram: Strong Opinions

Michael Swaine, Dr. Dobb’s Journal 18 (February 1993) 105–108.

This month we continue with the conversation Ray Valdés and I had with Stephen Wolfram. Last month Wolfram talked about Mathematica and programming paradigms. This month he talks about science, programming, business, and why (some) mathematicians don’t like him. It’s a fact that not everyone likes Stephen Wolfram. He can be exceedingly impatient—he’s never had the patience to complete any sort of formal academic degree program, for example, although he was granted a well-deserved doctorate by Cal Tech. And he can come across as more than a little arrogant. Whether that’s an accurate impression, though, is open to question. Perhaps he simply knows how good his mind is and sees no point in false modesty. But when, in the conversation that follows, he claims to know “a lot of the leading scientists in a lot of different areas,” don’t think for a moment that he’s exaggerating.

DDJ: You’re a scientist and a mathematician, which makes you a target user of Mathematica, and you say you use it every day. But we’ve talked to various mathematician friends and they express mistrust for most mathematical programs, including Mathematica.

SW: I spend a part of my time doing science and I know a lot of the leading scientists in a lot of different areas. And in mathematics, for example, I think it’s fair to say that the people I know who are the best mathematicians in their various areas use Mathematica, and they use it in many cases extremely enthusiastically. Now there are two sociological phenomena that go on in mathematics. One of them is this idea that computers are anathema to mathematics. This is a mistake. It’s a fundamental intellectual mistake. You don’t have to take that from me; just look at what’s going on in the field, look at the fact that the best mathematicians use them.

DDJ: And the other phenomenon?

SW: There is this very weird thing about mathematics that’s different from every other science. The way you make progress in mathematics is that you think of a theorem and generate a proof for it. It’s a purely theoretical activity that proceeds in this theorem-proof mode. In every other field of science, experiment is the thing that originally drives what goes on. People don’t make models and theories and work out their consequences. But one of the things that Mathematica is making possible in mathematics—and this is one of the things that good mathematicians who use it often say about it—is that you can actually do experiments. In a reasonably short amount of time you can do a reasonably nontrivial experiment and find out what’s likely to be true before you go through the traditional mathematical approach.

DDJ: It’s a highly respected approach.

SW: Which is, guess what’s true and then try to prove it. From the point of view of a sort of mental exercise, guess what’s true and try to prove it is way out there; it’s one of the hardest things people can imagine doing. On the other hand, that’s not the way you’re likely to make the most progress, by doing the hardest thing people can imagine doing. You’re likely to make more progress by making things easier for yourself. If you look at every other field of science, physics, anything else, there are a lot more experimentalists than theoreticians.

DDJ: Then mathematicians have to become more like the experimentalists in being aware of the limitations of their instruments for experiment. There was an article in The Mathematica Journal by a mathematician at Berkeley who was using Mathematica to point out flaws in some other paper. The [original paper’s] conclusion was right but the process was wrong because of a floating-point error, which Mathematica didn’t have. Or maybe it had a different floating-point error….

SW: No.

DDJ: OK, but they need to be aware of their instruments, it seems.

SW: That’s true, but most of the proofs that are published are probably wrong. In detail, I mean. Checking a mathematical proof is at least as hard as debugging a computer program perfectly. The only difference is that, with a computer program, you can run it, so you can actually see what’s happening, whereas a proof just sits there in a journal and generations of graduate students try to [understand] it. One thing is sure: People have to learn to use these tools well. If they’re all writing Fortran-like programs in Mathematica, that’s not going to get them that far. In terms of understanding the limitations of these tools, yes, they have to have some concept of what’s going on. Being able to predict, if I do a calculation of such-and-such a size, will I be likely to run into a memory chip that blows up or a bug in the program—that’s hard for anybody to assess, really. And it’s certainly true in doing experiments—I know it is when I do experiments—the most likely form of error is human error. That your program has a bug in it is much more likely than that Mathematica has a bug in it, which is again more likely than that the CPU that you’re using has a bug in its logic.

DDJ: Which does happen.

SW: In testing out Mathematica, we found things like that. If you look at the history of computer programming, one of the extremely unexpected things that happened was that there were bugs in programs. This was not anticipated. In Turing’s original paper on the universal Turing machine, the program was riddled with bugs. From the very earliest computer program, programs had bugs. That was not something that people expected. And it’s an important conceptual thing for people to realize, that human fallibility is at its most obvious in writing computer programs. The mathematician who comes up to Mathematica and types something in and gets something they don’t expect and says, “That must be the right answer, I’m going to write it in my paper,” is a foolish person indeed. Because the chances are, if they didn’t expect it and can’t understand it and can’t explain it, then probably there was a bug in the program they wrote.

DDJ: But the idea of accepting the existence of bugs in programs—or proofs—somehow doesn’t sound like the kind of idea that would sit well with a mathematician. How would you characterize the acceptance of Mathematica by mathematicians?

SW: The mathematics community is a most puristic community. In a sense, I’ve been pleasantly surprised with how easily Mathematica has been accepted in that community. There’s another thing, quite honestly, that that community has a hard time with. They sort of hate one aspect of what I have done, which is to take intellectual developments and make a company out of them and sell things to people.

DDJ: Probably not surprising, if mathematicians are the most puristic of scientists.

SW: My own view of that, which has hardened over the years, is, my god, that’s the right thing to do. If you look at what’s happened with TeX, for example, which went in the other direction…well, Mathematica could not have been brought to where it is today if it had not been done as a commercial effort. The amount of money that has to be spent to do all the details of development, you just can’t support that in any other way than this unique American idea of the entrepreneurial company. If you ask a mathematician why they don’t like Mathematica—it’s more why they don’t like me than why they don’t like Mathematica—that’s it.

DDJ: There’s a lot of work involved in bringing a product up to commercial standards and making it something you can support.

SW: In the research I’ve been doing, one of the people who’s been working with me has developed a nice program that allows you to lay out networks on the screen. It’s a problem that I’ve wanted to have solved for ten years or so, and he’s got a fairly nice solution and a nice interactive program and all that. I’ve talked to people about it, so people know that my company did this program. The problem is, we didn’t develop this program in a commercial way. We developed it because I needed this thing for solving a particular problem. Most of the things that have been developed in the technical computing community have been developed in that kind of way, and I realized, knowing the standards that one has for having a commercial product that is properly supported, what incredible distance there is between this fairly nice piece of code that does something fairly useful and something like Mathematica. And in fact, I have a hard time even thinking of giving it to friends of mine because I know they will expect that, since it was produced by somebody who works for my company, it should be a thing like Mathematica. I suppose it’s obvious, but I think that many in the mathematics community don’t realize the distance.

DDJ: We know you’ve given a lot of thought to programming paradigms. Do you have any opinions on the dominant paradigms of today, and about which will survive into the next decade?

SW: I think the transformational-rule paradigm is working fairly well. I think the functional paradigm is largely working well. I think the procedural paradigm sucks, basically. I think the fundamental problem with it is there’s much too much hidden state in the procedural paradigm. You have these weird variables that are getting updated and things that are happening that you can’t see. I strongly believe that there is a way to do procedural programming that does not use hidden states. For example, here’s a thing that I’d love to be able to do: make a kind of symbolic template of the execution history of a program. The kind of thing that trace does—

DDJ: Right.

SW: —of taking a program that’s executing and giving you back the symbolic representation. What I’d like to be able to do is program by saying, here’s what I want—it’s not quite a flowchart, it’s something beyond a flowchart—my program to be like, now I would actually do the things that make it do this. That’s kind of vague, and the reason it’s vague is because I haven’t figured out how to do it. But I think one of the directions that could be very fruitful is how you take these conceptual ideas about procedural programming and turn them into something that’s easier to look at once you have a program. I mean, the idea of procedural programming, of loops and so on, people have no trouble grasping. But once they’ve written their programs, they have a lot of trouble grasping what the programs do. And if one could have a more explicit way of representing these things, one would be in good shape, I think.
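
[Editor’s illustration, not part of the interview: the contrast between hidden state and an explicit execution history can be sketched in a few lines. This is a minimal Python sketch under our own assumptions; the function names `total_hidden` and `total_traced` are hypothetical, and it is not the symbolic-template design Wolfram describes, which he says he has not worked out.]

```python
from itertools import accumulate


def total_hidden(values):
    """Procedural style: `total` is overwritten on every pass, so its history is lost."""
    total = 0
    for v in values:
        total += v  # the previous value of `total` can no longer be inspected
    return total


def total_traced(values):
    """Same computation, but every intermediate state is returned as plain data."""
    # accumulate(..., initial=0) yields the running total before and after each element
    return list(accumulate(values, initial=0))


if __name__ == "__main__":
    history = total_traced([3, 1, 4])
    print(history)      # the full sequence of states, starting from 0
    print(history[-1])  # the final state equals total_hidden([3, 1, 4])
```

The second function does no extra arithmetic; it only keeps the states that the loop in the first function throws away, which is one simple way to make a program’s behavior easier to look at after it has been written.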

DDJ: Yeah, so one rationale behind procedural programming is that it’s easy to learn. But one rationale for a hidden state is an optimization of some sort.

SW: I don’t think people need optimization any more.

DDJ: Oh, really?

SW: There are very few programs that are written for the first time where execution speed is an issue. When you’re running your word processor, you don’t want the scrolling to be slow, but that’s a different point. If you look at the history of programming-language design, almost every major screw-up is a consequence of people pandering to some optimization, starting from Fortran Hollerith-format statements. The trick is figuring out how to get there, rather than worrying all the time about how you’re going to get there. Another direction that I’ve thought about is parallel processing and its relationship to languages. There was a language called C* that I made the original design for, for the Connection Machine. Unfortunately, what C* finally became was extremely far from what I had worked on. That’s one of the reasons I don’t do consulting any more.

DDJ: Parallel processing and its relation to language? What’s the question?

SW: One of the questions is, are there paradigms that are applicable to parallel programming that aren’t applicable to sequential programming, and what are they? Functional programming, list-based programming, things like that are readily applicable to parallel systems. In fact, they work very nicely and elegantly in parallel systems. That is indeed the main algorithm that is used in the various [parallel] Fortrans. There is a question, particularly with respect to SIMD architectures, of [whether] there are other fundamental kinds of programming-language ideas that we just don’t have yet.

DDJ: Do you have an answer?

SW: Well, I’ve spent a lot of time thinking about them and I didn’t come up with them. It could be that there aren’t any. One of the ways that you can get a clue about this relates to the other side of my life, which is trying to do science. The kinds of things I’m interested in are using fundamental ideas from computation to understand more about scientific systems. And one of the things one is led to there is what kind of simple computational systems really capture the essence of what’s going on in a biological system, in a growing plant. Figuring out the answer to that question has been one of my big projects. And what I’ve found is that so far, with one exception that I’m still grappling with, all of the things that I have found to be useful as fundamental models—whether they’re things like Turing machines, or cellular automata, or register machines, or graphs—turn out to be very simple to do in our existing programming paradigms. So one question is, is there something out there in nature that is working according to a different programming paradigm that we should be able to learn from? If you look at the construction of organisms, for instance, there are many segmented organisms: biology discovered iteration. There are many branching organisms: biology discovered recursion. There are a few other of these kinds of things that are a little less familiar but are still one line of Mathematica code, and that are commonly used in biology. Does that mean we’ve really discovered all the useful programming paradigms? And if nature presents us with something we can’t understand along those lines, that’s a good clue that there’s another programming paradigm out there to be figured out. And as far as I can tell, there isn’t much else out there.
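
[Editor’s illustration, not part of the interview: to give a sense of how little code one of the models mentioned above needs, here is a minimal Python sketch of a one-dimensional, two-state, nearest-neighbor cellular automaton. The function names `step` and `run`, the default rule number 30, and the wraparound edges are our assumptions, chosen for illustration; the interview does not specify any particular rule or implementation.]

```python
def step(cells, rule=30):
    """Apply one update of an elementary cellular automaton with wraparound edges."""
    n = len(cells)
    return [
        # The three neighbors form a number 0..7 that selects one bit of `rule`.
        (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


def run(cells, steps, rule=30):
    """Return the initial row followed by `steps` successive updates."""
    rows = [list(cells)]
    for _ in range(steps):
        rows.append(step(rows[-1], rule))
    return rows


if __name__ == "__main__":
    start = [0] * 15
    start[7] = 1  # a single live cell in the middle
    for row in run(start, 7):
        print("".join("#" if c else "." for c in row))
```

The whole model is a list comprehension plus a loop, which fits the claim that such systems are simple to express in existing programming paradigms.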

DDJ: In the second Artificial Life conference proceedings Doyne Farmer says he’s now of the view that partial differential equations is the most general model….

SW: He is completely and utterly, unquestionably, unequivocally, totally wrong. It’s interesting you would pick him as an example, because his responses to things that I have been right about in the past have been as wrong as they could possibly be. As a matter of fact, I always find it very amusing—the equations of general relativity are partial differential equations, as you probably know, and there is this fairly amusing thing that is said about these equations, which is that there can be singularities, and the laws of physics as we know them break down. Well, what does this mean? This means the partial differential equations show singular behavior which can no longer be described by the partial differential equations. Well, it turns out that in the same sense physics breaks down around the space shuttle. Because the equations of fluid dynamics have in them an approximation that works just fine so long as there aren’t certain kinds of strong shock waves involved. When you get into hypersonic flow, which is what happens around the shuttle as it enters the atmosphere, the shock front has a thickness that is less than the mean free path that molecules go before they collide with other molecules. So in this same sense, physics as we know it breaks down. Of course it doesn’t. What actually is going on is that partial differential equations are an approximation that turns out to be not a very good approximation in the case of hypersonic flow. It’s actually an interesting historical thing that I’ve been studying, how partial differential equations ended up being thought by people to be the fundamental equations of physics. It’s very bizarre, because it isn’t true, and not only is it not true, even the fact that atoms exist makes it clear that it’s not true. So why is it that people will [say] that the fundamental equations of physics are partial differential equations?
What happened, I think, is that when these models were first developed, the only methods for figuring out what the consequences were was hand calculation. Computers are a very recent phenomenon in the history of science, and the fundamental models that exist in science have not yet adapted to computation. And that’s my next big thing.

[Editor’s note: In retrospect, Ray Valdés feels he may have overstated Farmer’s position. Here’s what Farmer actually said: “Connectionist models are a useful tool for solving problems in learning and adaptation…. However, connectionism represents a level of abstraction that is ultimately limited by such factors as the need to specify connections explicitly, and the lack of built-in spatial structure. Many problems in adaptive systems ultimately require models such as partial differential equations or cellular automata with spatial structure.” (“A Rosetta Stone for Connectionism,” in Emergent Computation, MIT Press, 1991, p. 183.)]