The CPU, a Necessary Evil
The universe can be seen as the ultimate parallel computer. I say ‘ultimate’ because, instead of having a single central processor that processes everything, every fundamental particle is its own little processor that operates on a small set of properties. The universe is a reactive computer as well because every action performed by a particle is a reaction to an action by another particle. Ideally, our own computers should work the same way. Every computer program should be a collection of small reactive processors that perform elementary actions (operations) on their assigned data in response to actions by other processors. In other words, an elementary program is a tiny behaving machine that can sense and effect changes in its environment. It consists of at least two actors (a sensor and an effector) and a changeable environment (data variable). In addition, the sensor must be able to communicate with the effector. I call this elementary parallel processor the Universal Behaving Machine (UBM).
More complex programs can have an indefinite number of UBMs and a sensor can send signals to more than one effector. Unfortunately, even though computer technology is moving in the general direction of our ideal parallel computer (one processor per elementary operator), we are not there yet. And we won’t be there for a while, I’m afraid. The reason is that computer memory can be accessed by only one processor at a time. Until someone finds a solution to this bottleneck, we have no choice but to use a monster known as the CPU, a necessary evil that can do the work of a huge number of small processors. We get away with it because the CPU is very fast. Keep in mind that understanding the true purpose of the CPU is the key to solving the parallel programming problem.
Multicore and The Need for Speed
Although the CPU is fast, it is never fast enough. The reason is that the number of operations we want it to execute in a given interval keeps growing all the time. This has been the main driving force behind CPU research. Over the last few decades, technological advances insured a steady stream of ever faster CPUs but the technology has gotten to a point where we can no longer make them work much faster. The solution, of course, is a no-brainer: just add more processors into the mix and let them share the load, and the more the better. Multicore processors have thus become all the rage. Unsurprisingly, we are witnessing an inexorable march toward our ideal computer in which every elementary operator in a program is its own processor. It’s exciting.
Mathematicians and the Birth of the Algorithmic Computer
Adding more CPU cores to a processor should have been a relatively painless evolution of computer technology but it turned out to be a real pain in the ass, programming wise. Why? To understand the problem, we must go back to the very beginning of the computer age, close to a hundred and fifty years ago, when an Englishman named Charles Babbage designed the world’s first general purpose computer, the analytical engine. Babbage was a mathematician and like most mathematicians of his day, he longed for a time when he would be freed from the tedium of performing long calculation sequences. All he wanted was a reasonably fast calculator that could reliably execute mathematical sequences or algorithms. The idea of using a single fast central processor to emulate the behaviors of multiple small parallel processors was the furthest thing from his mind. Indeed, the very first program written for the analytical engine by Babbage’s friend and fellow mathematician, Lady Ada Lovelace, was a table of instructions meant to calculate the Bernoulli numbers, a sequence of rational numbers. Neither Babbage nor Lady Ada should be faulted for this but current modern computers are still based on Babbage’s sequential model. Is it any wonder that the computer industry is having such a hard time making the transition from sequential to parallel computing?
Square Peg vs. Round Hole
There is a big difference between our ideal parallel computer model in which every element is a parallel processor and the mathematicians’ model in which elements are steps in an algorithm to be executed sequentially. Even if we are forced to use a single fast CPU to emulate the parallel behavior of a huge number of parallel entities, the two models require different frames of mind. For example, in a true parallel programming model, parallelism is implicit but sequential order is explicit, that is to say, sequences must be explicitly specified by the programmer. In the algorithmic model, by contrast, sequential order is implicit and parallelism must be explicitly specified. But the difference is even more profound than this. Whereas an element in an algorithm can send a signal to only one other element (the successor in the sequence) at a time, an element in a parallel program can send a signal to as many successors as necessary. This is what is commonly referred to as fine-grain or instruction-level parallelism, which is highly desirable but impossible to obtain in an MIMD execution model using current multicore CPU technology.
The image above represents a small parallel program. A signal enters at the left and a ‘done’ signal is emitted at the right. We can observe various elementary parallel operators communicating with one another. Signals flow from the output of one element (small red circle) to the input of another (white or black circle). The splitting of signals into multiple parallel streams has no analog in an algorithmic sequence or thread. Notice that parallelism is implicit but sequential order is explicit. But that’s not all. A true parallel system that uses signals to communicate must be synchronous, i.e., every operation must execute in exactly one system cycle. This insures that the system is temporally deterministic. Otherwise signal timing quickly gets out of step. Temporal determinism is icing on the parallel cake because it solves a whole slew of problems related to reliability and security.
It should be obvious that using Babbage’s and Lady Ada’s 150-year old computing model to program a parallel computer is like trying to fit a square peg into a round hole. One would think that, by now, the computer industry would have figured out that there is something fundamentally wrong with the way it builds and programs computers but, unfortunately, the mathematicians are at it again. The latest trend is to use functional languages like Erlang for thread-based parallel programming. Thread-based, coarse-grain parallelism is a joke, in my opinion. There is a way to design a fine-grain, self-balancing multicore CPU for an MIMD execution environment that does not use threads. Threaded programs are error-prone, hard to program and difficult to understand. Decidedly, the notion of a computer as a calculating machine will die hard. It is frustrating, to say the least. When are we going to learn?
Lifting the Curse of the Algorithm
To solve the parallel programming problem, we must lift the curse of the algorithm. We must abandon the old model and switch to a true parallel model. To do so, we must reinvent the computer. What I mean is that we must change, not only our software model, but our hardware model as well. Current CPUs were designed and optimized for the algorithmic model. We need a new processor architecture (both single core and multicore) that is designed from the ground up to emulate non-algorithmic, synchronous parallelism. It’s not rocket science. We already know how to emulate parallelism in our neural networks and our cellular automata. However, using current CPUs to do so at the instruction level would be too slow. The market wants super fast, fine-grain, self-balancing and auto-scalable multicore processors that use an MIMD execution model. It wants parallel software systems that are easy to program and do not fail. Right now there is nothing out there that fits the bill.
The Next Computer Revolution
It remains to be seen who, among the various processor manufacturers, will be the first to see the light. Which nation will be the standard bearer of the new computing paradigm? When will the big switch happen? Who knows? But when it does, it will be the dawning of the next computer revolution, one which will make the first one pale in comparison. We will be able to build super fast computers and programs of arbitrary complexity that do not fail. It will be the true golden age of automation. I can’t wait.
[This article is part of my downloadable e-book on the parallel programming crisis.]
Nightmare on Core Street
Why Parallel Programming Is So Hard
The Age of Crappy Concurrency: Erlang, Tilera, Intel, AMD, IBM, Freescale, etc…
Half a Century of Crappy Computing
Parallel Computers and the Algorithm: Square Peg vs. Round Hole