Sunday, December 7, 2008

How to Solve the Parallel Programming Crisis (Repost)

[I will repost this article periodically because it seems to be having an impact on some readers. See original article for comments.]


Solving the parallel computing problem will require a universal computing model that is easy to program and is equally at home in all types of computing environments. In addition, the applications must be rock-solid. Such a model must implement fine-grained parallelism within a deterministic processing environment. This, in essence, is what I am proposing.

No Threads

The solution to the parallel programming problem is to do away with threads altogether. Threads are evil. There is a way to design and program a parallel computer that is 100% threadless. It is based on a method that has been around for decades. Programmers have been using it to simulate parallelism in such apps as neural networks, cellular automata, simulations, video games and even VHDL. Essentially, it requires two buffers and an endless loop. While the parallel objects in one buffer are being processed, the other buffer is filled with the objects to be processed in the next cycle. At the end of the cycle, the buffers are swapped and the cycle begins anew. Two buffers are used in order to prevent racing conditions. This method guarantees rock-solid deterministic behavior and is thus free of all the problems associated with multithreading. Determinism is essential to mission and safety-critical environments where unreliable software is not an option.

Speed, Transparency and Universality

The two-buffer/loop mechanism described above works great in software but only for coarse-grain objects such as neurons in a network or cells in a cellular automaton. For fine-grain parallelism, it must be applied at the instruction level. That is to say, the processor instructions themselves become the parallel objects. However, doing so in software would be much too slow. What is needed is to make the mechanism an inherent part of the processor itself by incorporating the two buffers on the chip and use internal circuitry for buffer swapping. Of course, this simple two-buffer system can be optimized for performance by adding one or more buffers for use with an instruction prefetch mechanism if so desired. Additionally, since the instructions in the buffer are independent, there is no need to process them sequentially with a traditional CPU. Ideally, the processor core should be a pure MIMD (multiple instructions, multiple data) vector core, which is not to be confused with a GPU core, which uses an SIMD (single instruction, multiple data) configuration.

The processor can be either single core or multicore. In a multicore processor, the cores would divide the instruction load in the buffers among themselves in a way that is completely transparent to the programmer. Adding more cores would simply increase processing power without having to modify the programs. Furthermore, since the model uses fine-grain parallelism and an MIMD configuration, the processor is universal, meaning that it can handle all types of applications. There is no need to have a separate processor for graphics and another for general purpose computing. A single homogeneous processor can do it all. This approach to parallelism will do wonders for productivity and make both GPUs and traditional CPUs obsolete.

Easy to Program

The main philosophy underlying this parallel processing model is that software should behave logically more like hardware. A program is thus a collection of elementary objects that use signals to communicate. This approach is ideal for graphical programming and the use of plug-compatible components. Just drag them and drop them, and they connect themselves automatically. This will open up programming to a huge number of people that were heretofore excluded.

Admittedly, the solution I am proposing will require a reinvention of the computer and of software construction methodology, as we know them. But there is no stopping it. The sooner we get our heads out of the threaded sand and do the right thing, the better off we will be.

Related Articles:

Transforming the TILE64 into a Kick-Ass Parallel Machine
Heralding the Impending Death of the CPU
Parallel Computing: Both CPU and GPU Are Doomed
Parallel Computing: Why the Future Is Non-Algorithmic
Why Software Is Bad and What We Can Do to Fix It
Why Parallel Programming Is So Hard
Parallel Programming, Math and the Curse of the Algorithm
Half a Century of Crappy Computing
The COSA Saga