Part I, II, III, IV, V
In the previous post, I wrote that the computer industry’s multicore strategy will not work because it is based on multithreading, a technique that was never intended to be the basis of a parallel software model. It was meant only as a mechanism for executing multiple sequential (not parallel) algorithms concurrently. I am proposing an alternative model that is inherently non-algorithmic, deterministic and parallel. It is called the COSA software model and it incorporates the qualities of both MIMD and SIMD parallelism without their shortcomings. The initial motivation behind COSA was to solve one of the most pressing problems in computer science today, software unreliability. As it turns out, COSA addresses the parallel programming problem as well.
The COSA Model
Any serious attempt to formulate a parallel software model would do well to emulate parallel systems in nature. One such system is a biological neural network. Imagine an interconnected spiking (pulsed) neural network. Each elementary cell (neuron) in the network is a parallel element or processor that waits for a discrete signal (a pulse or spike) from another cell or a change in the environment (event), performs an action (executes an operation) on its environment and sends a signal to one or more cells. There is no limit to the number of cells that can be executed simultaneously. What I have just described is a behaving system, i.e., a reactive network of cells that use signals to communicate with each other. This is essentially what a COSA program is. In COSA, the cells are the operators, and these can be either effectors (addition, subtraction, multiplication, division, etc.) or sensors (comparison or logic/temporal operators). The environment consists of the data variables and/or constants. Below is an example of a COSA low-level module that consists of five elementary cells.
Alternatively, a COSA program can be viewed as a logic circuit with lines linking various gates (operators or cells) together. Indeed, a COSA program can potentially be turned into an actual electronic circuit. This aspect of COSA has applications in exciting future computing technologies like the one being investigated by the Phoenix Project at Carnegie Mellon University. The main difference between a COSA program and a logic circuit is that, in COSA, there is no signal racing. All gates are synchronized to a global virtual clock and signal travel times are equal to zero, i.e., they occur within one cycle. A global clock means that every operation is assumed to have equal duration, one virtual cycle. The advantage of this convention is that COSA programs are 100% deterministic, meaning that the execution order (concurrent or sequential) of the operations in a COSA program is guaranteed to remain the same. Temporal order determinism is essential for automated verification, which, in turn, leads to rock-solid reliability and security.
The COSA Process Emulation
Ideally, every COSA cell should be its own processor, like a neuron in the brain or a logic gate. However, such a super-parallel system must await future advances. In the meantime we are forced to use one or more very fast processors to do the work of multiple parallel cells. In this light, a COSA processor (see below) should be seen as a cell emulator. The technique is simple and well known. It is used in neural networks, cellular automata and simulations. It is based on an endless loop and two cell buffers. Each pass through the loop represents one cycle. While the processor is executing the cells in one buffer, the downstream cells to be processed during the next cycle are appended to the other buffer. As soon as all the cells in the first buffer are processed, the buffers are swapped and the cycle begins anew. Two buffers are used in order to prevent the signal racing conditions that would otherwise occur.
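For readers who prefer code, here is a minimal Python sketch of the two-buffer loop just described. It is only an illustration under my own assumptions, not the COSA implementation: the names Cell, run, increment and below_limit are hypothetical, the "endless" loop is capped by a max_cycles parameter so that the example terminates, and only the signals are double-buffered (cells within a cycle are simply executed one after the other against a shared environment).

```python
class Cell:
    """Hypothetical elementary cell: one operation plus outgoing signal lines."""

    def __init__(self, name, action):
        self.name = name
        self.action = action   # callable(env) -> True if the cell emits a signal
        self.targets = []      # downstream cells that receive the signal


def run(start_cells, env, max_cycles=1000):
    """Emulate cells with two buffers; each loop pass is one virtual cycle."""
    current, upcoming = list(start_cells), []
    trace = []  # names of the cells processed in each cycle
    while current and len(trace) < max_cycles:
        trace.append([cell.name for cell in current])
        for cell in current:
            if cell.action(env):
                for target in cell.targets:
                    # A signaled cell runs in the NEXT cycle, never this one.
                    if target not in upcoming:
                        upcoming.append(target)
        current.clear()
        current, upcoming = upcoming, current  # swap the buffers
    return trace


def increment(env):
    """Effector: add 1 to x and always signal downstream."""
    env["x"] += 1
    return True


def below_limit(env):
    """Sensor: signal downstream only while x is below 3."""
    return env["x"] < 3


# Two cells wired in a loop: add -> less_than -> add -> ...
add = Cell("add", increment)
less_than = Cell("less_than", below_limit)
add.targets.append(less_than)
less_than.targets.append(add)

env = {"x": 0}
trace = run([add], env)
print(env["x"], len(trace))  # prints "3 6"
```

Because a signaled cell is always appended to the other buffer, it cannot run in the same cycle as the cell that signaled it. Running the same program again from the same starting state produces the same trace.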
The COSA Processor
As seen above, we already know how to emulate deterministic parallel processes in software and we can do it without the use of threads. It is not rocket science. However, using a software loop to emulate parallelism at the instruction level would be prohibitively slow. For performance purposes, the two buffers should be integrated into the COSA processor and the cell emulation performed by the processor. In a multicore processor, each core should have its own pair of buffers. I previously began to write a series of blog articles on how to build a self-balancing, fine-grain multicore processor based on the COSA model. I did not finish the series due to business considerations.
Comparison Between COSA and Other Parallel Software Models
I plan to go over each item in the table above in detail in a future article. Let me just mention here that ease of programming is one of the better attributes of COSA. The reason is that programming in COSA is graphical and consists almost entirely of connecting objects together. Most of the time, all that is necessary is to drag the object into the application. High-level objects are plug-compatible and know how to connect themselves automatically.
The figure above is an example of a COSA high-level module under construction. Please take a look at the Project COSA web page for further information.
How to Solve the Parallel Programming Crisis
Parallel Computing: The End of the Turing Madness
Parallel Programming: Why the Future Is Non-Algorithmic
Parallel Programming: Why the Future Is Synchronous
Parallel Computing: Why the Future Is Reactive
Why Parallel Programming Is So Hard
Parallel Programming, Math and the Curse of the Algorithm
The COSA Saga
PS. Everyone should read the comments at the end of Parallel Computing: The End of the Turing Madness. Apparently, Peter Wegner and Dina Goldin of Brown University have been ringing the non-algorithmic/reactive bell for quite some time. Without much success, I might add; otherwise there would be no parallel programming crisis to speak of.