Sunday, December 7, 2008

How to Solve the Parallel Programming Crisis (Repost)

[I will repost this article periodically because it seems to be having an impact on some readers. See original article for comments.]


Solving the parallel computing problem will require a universal computing model that is easy to program and is equally at home in all types of computing environments. In addition, the applications must be rock-solid. Such a model must implement fine-grained parallelism within a deterministic processing environment. This, in essence, is what I am proposing.

No Threads

The solution to the parallel programming problem is to do away with threads altogether. Threads are evil. There is a way to design and program a parallel computer that is 100% threadless. It is based on a method that has been around for decades. Programmers have been using it to simulate parallelism in such applications as neural networks, cellular automata, simulations, video games and even VHDL. Essentially, it requires two buffers and an endless loop. While the parallel objects in one buffer are being processed, the other buffer is filled with the objects to be processed in the next cycle. At the end of the cycle, the buffers are swapped and the cycle begins anew. Two buffers are used in order to prevent race conditions. This method guarantees rock-solid deterministic behavior and is thus free of all the problems associated with multithreading. Determinism is essential to mission- and safety-critical environments where unreliable software is not an option.
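As a sketch, here is the two-buffer loop applied to a coarse-grain example. The cell rule (a one-dimensional XOR automaton) is my own illustrative choice, not part of the proposal itself:

```python
# Two-buffer update loop: deterministic, threadless parallelism.
# Illustrative example: a 1-D cellular automaton (rule 90).

def step(current):
    """Compute the next generation by reading only the current buffer."""
    n = len(current)
    nxt = [0] * n  # second buffer: written to, never read, this cycle
    for i in range(n):
        left = current[(i - 1) % n]
        right = current[(i + 1) % n]
        nxt[i] = left ^ right  # rule 90: XOR of the two neighbors
    return nxt

buf = [0, 0, 0, 1, 0, 0, 0]  # initial state
for _ in range(3):
    buf = step(buf)  # the swap: the next buffer becomes the current one
```

Because each cell reads only the old buffer and writes only the new one, no update can ever see a half-finished cycle, which is exactly what rules out race conditions.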

Speed, Transparency and Universality

The two-buffer/loop mechanism described above works great in software but only for coarse-grain objects such as neurons in a network or cells in a cellular automaton. For fine-grain parallelism, it must be applied at the instruction level. That is to say, the processor instructions themselves become the parallel objects. However, doing so in software would be much too slow. What is needed is to make the mechanism an inherent part of the processor itself by incorporating the two buffers on the chip and using internal circuitry for buffer swapping. Of course, this simple two-buffer system can be optimized for performance by adding one or more buffers for use with an instruction prefetch mechanism if so desired. Additionally, since the instructions in the buffer are independent, there is no need to process them sequentially with a traditional CPU. Ideally, the processor core should be a pure MIMD (multiple instructions, multiple data) vector core, not to be confused with a GPU core, which uses a SIMD (single instruction, multiple data) configuration.

The processor can be either single core or multicore. In a multicore processor, the cores would divide the instruction load in the buffers among themselves in a way that is completely transparent to the programmer. Adding more cores would simply increase processing power without having to modify the programs. Furthermore, since the model uses fine-grain parallelism and an MIMD configuration, the processor is universal, meaning that it can handle all types of applications. There is no need to have a separate processor for graphics and another for general purpose computing. A single homogeneous processor can do it all. This approach to parallelism will do wonders for productivity and make both GPUs and traditional CPUs obsolete.
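A toy software analogy of this transparency, with a worker pool standing in for cores (the pool and worker count are the analogy's assumptions; in the actual proposal the division of labor happens in silicon): since the objects in a buffer are independent, the result is the same no matter how many workers share the load.

```python
# Transparent core scaling, in miniature: the independent "instructions"
# in the current buffer are divided among however many workers exist.
from concurrent.futures import ThreadPoolExecutor

def run_cycle(buffer, workers):
    """Process every instruction in the buffer. The outcome does not
    depend on how many workers divide the load among themselves."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda instr: instr(), buffer))

# A buffer of independent instructions (here, simple additions).
buffer = [lambda a=a, b=b: a + b for a, b in [(1, 2), (3, 4), (5, 6)]]
assert run_cycle(buffer, workers=1) == run_cycle(buffer, workers=4) == [3, 7, 11]
```

Adding workers changes only how fast the cycle completes, never what it computes, which is the sense in which more cores would "simply increase processing power without having to modify the programs."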

Easy to Program

The main philosophy underlying this parallel processing model is that software should behave logically more like hardware. A program is thus a collection of elementary objects that use signals to communicate. This approach is ideal for graphical programming and the use of plug-compatible components. Just drag them and drop them, and they connect themselves automatically. This will open up programming to a huge number of people who were heretofore excluded.

Admittedly, the solution I am proposing will require a reinvention of the computer and of software construction methodology, as we know them. But there is no stopping it. The sooner we get our heads out of the threaded sand and do the right thing, the better off we will be.

Related Articles:

Transforming the TILE64 into a Kick-Ass Parallel Machine
Heralding the Impending Death of the CPU
Parallel Computing: Both CPU and GPU Are Doomed
Parallel Computing: Why the Future Is Non-Algorithmic
Why Software Is Bad and What We Can Do to Fix It
Why Parallel Programming Is So Hard
Parallel Programming, Math and the Curse of the Algorithm
Half a Century of Crappy Computing
The COSA Saga

Monday, November 17, 2008

Time Off

A close family member is in the process of dying and I need to take some time off. I'll get back to writing blog articles on parallel programming and other equally important subjects as soon as I can. Thanks for your patience.

Saturday, October 25, 2008

Why Software Is Bad, Part IV

Part I, II, III, IV

The Silver Bullet

The Billion Dollar Question

The billion (trillion?) dollar question is: What is it about the brain and integrated circuits that makes them so much more reliable in spite of their essential complexity? But even more important, can we emulate it in our software? If the answer is yes, then we have found the silver bullet.

Why Software Is Bad

Algorithmic software is unreliable for the following reasons:

  • Brittleness

    An algorithm is not unlike a chain. Break a link and the entire chain is broken. As a result, algorithmic programs tend to suffer from catastrophic failures even in situations where the actual defect is minor and globally insignificant.

  • Temporal Inconsistency

    With algorithmic software it is virtually impossible to guarantee the timing of various processes because the execution times of subroutines vary unpredictably. They vary mainly because of a construct called ‘conditional branching’, a necessary decision mechanism used in instruction sequences. But that is not all. While a subroutine is being executed, the calling program goes into a coma. The use of threads and message passing between threads does somewhat alleviate the problem but the multithreading solution is way too coarse and unwieldy to make a difference in highly complex applications. And besides, a thread is just another algorithm. The inherent temporal uncertainty (from the point of view of the programmer) of algorithmic systems leads to program decisions happening at the wrong time, under the wrong conditions.

  • Unresolved Dependencies

    The biggest contributing factor to unreliability in software has to do with unresolved dependencies. In an algorithmic system, the enforcement of relationships among data items (part of what Brooks defines as the essence of software) is solely the responsibility of the programmer. That is to say, every time a property is changed by a statement or a subroutine, it is up to the programmer to remember to update every other part of the program that is potentially affected by the change. The problem is that relationships can be so numerous and complex that programmers often fail to resolve them all.

Why Hardware Is Good

Brains and integrated circuits are, by contrast, parallel signal-based systems. Their reliability is due primarily to three reasons:
  • Strict Enforcement of Signal Timing through Synchronization

    Neurons fire at the right time, under the right temporal conditions. Timing is consistent because of the brain's synchronous architecture (**). A similar argument can be made with regard to integrated circuits.

  • Distributed Concurrent Architecture

    Since every element runs independently and synchronously, the localized malfunctions of a few (or even many) elements will not cause the catastrophic failure of the entire system.

  • Automatic Resolution of Event Dependencies

    A signal-based synchronous system makes it possible to automatically resolve event dependencies. That is to say, every change in a system's variable is immediately and automatically communicated to every object that depends on it.
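A minimal software sketch of this last idea, with illustrative names (`Variable`, `watch`) that belong to the example rather than to any real system:

```python
# Minimal reactive cell: every change to the variable is immediately
# and automatically communicated to every object that depends on it.
class Variable:
    def __init__(self, value):
        self._value = value
        self._watchers = []  # dependents register once, up front

    def watch(self, callback):
        self._watchers.append(callback)

    def set(self, value):
        self._value = value
        for callback in self._watchers:  # automatic dependency resolution
            callback(value)

log = []
temperature = Variable(20)
temperature.watch(lambda v: log.append(f"display shows {v}"))
temperature.watch(lambda v: log.append(f"alarm checks {v}"))
temperature.set(75)  # both dependents react; there is nothing to remember
```

The contrast with the algorithmic case is that the dependency is declared once, at the variable, instead of being re-enforced by hand at every statement that might change it.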

Programs as Communication Systems

Although we are not accustomed to thinking of it as such, a computer program is, in reality, a communication system. During execution, every statement or instruction in an algorithmic procedure essentially sends a signal to the next statement, saying: 'I'm done, now it's your turn.' A statement should be seen as an elementary object having a single input and a single output. It waits for an input signal, does something, and then sends an output signal to the next object. Multiple objects are linked together to form a one-dimensional (single path) sequential chain. The problem is that, in an algorithm, communication is limited to only two objects at a time, a sender and a receiver. Consequently, even though there may be forks (conditional branches) along the way, a signal may only take one path at a time.

My thesis is that this mechanism is too restrictive and leads to unreliable software. Why? Because there are occasions when a particular event or action must be communicated to several objects simultaneously. This is known as an event dependency. Algorithmic development environments make it hard to attach orthogonal signaling branches to a sequential thread and therein lies the problem. The burden is on the programmer to remember to add code to handle delayed reaction cases: something that occurred previously in the procedure needs to be addressed at the earliest opportunity by another part of the program. Every so often we either forget to add the necessary code (usually, a call to a subroutine) or we fail to spot the dependency.

Event Dependencies and the Blind Code Problem

The state of a system at any given time is defined by the collection of properties (variables) that comprise the system's data, including the data contained in input/output registers. The relationships or dependencies between properties determine the system's behavior. A dependency simply means that a change in one property (also known as an event) must be followed by a change in one or more related properties. In order to ensure flawless and consistent behavior, it is imperative that all dependencies are resolved during development and are processed in a timely manner during execution.

It takes intimate knowledge of an algorithmic program to identify and remember all the dependencies. Due to the large turnover in the software industry, programmers often inherit strange legacy code, which aggravates the problem. Still, even good familiarity is not a guarantee that all dependencies will be spotted and correctly handled. Oftentimes, a program is so big and complex that its original authors completely lose sight of old dependencies. This is what I mean by blind code: code whose dependencies can no longer be seen. Blind code leads to wrong assumptions, which often result in unexpected and catastrophic failures. The problem is so pervasive and so hard to fix that most managers in charge of maintaining complex mission-critical software systems will try to find alternative ways around a bug that do not involve modifying the existing code.

Related Articles:

How to Solve the Parallel Programming Crisis
Parallel Computing: The End of the Turing Madness
Parallel Computing: Why the Future Is Synchronous
Parallel Computing: Why the Future Is Reactive
Why Parallel Programming Is So Hard
Parallel Programming, Math and the Curse of the Algorithm
The COSA Saga

Tuesday, October 21, 2008

Why Software Is Bad, Part III

Part I, II, III, IV

Why The Experts Are Wrong (continued)

Turing's Monster

It is tempting to speculate that, had it not been for our early infatuation with the sanctity of the Turing computability model (TCM), we might not be in the sorry mess that we are in today. Software engineers have had to deal with defective software from the very beginning. Computer time was expensive and, as was the practice in the early days, a programmer had to reserve access to a computer days and sometimes weeks in advance. So programmers found themselves spending countless hours meticulously scrutinizing program listings in search of bugs. By the mid-1970s, as software systems grew in complexity and applicability, people in the business began to talk of a reliability crisis. Innovations such as high-level languages and structured and object-oriented programming did little to solve the reliability problem. Turing's baby had quickly grown into a monster.

Vested Interest

Software reliability experts (such as the folks at Cigital) have a vested interest in seeing that the crisis lasts as long as possible. It is their raison d'être. Computer scientists and software engineers love Dr. Brooks' ideas because an insoluble software crisis affords them a well-paying job and a lifetime career as reliability engineers. Not that these folks do not bring worthwhile advances to the table. They do. But looking for a breakthrough solution that will produce Brooks' order-of-magnitude improvement in reliability and productivity is not on their agenda. They adamantly deny that such a breakthrough is even possible. Brooks' paper is their new testament and 'no silver bullet' their mantra. Worst of all, most of them are sincere in their convictions. This attitude (pathological denial) has the unfortunate effect of prolonging the crisis. Most of the burden of ensuring the reliability of software now rests squarely on the programmer's shoulders. An entire reliability industry has sprouted, with countless experts and tool vendors touting various labor-intensive engineering recipes, theories and practices. But more than thirty years after people began to refer to the problem as a crisis, it is worse than ever. As the Technology Review article points out, the cost has been staggering.

There Is a Silver Bullet After All

Reliability is best understood in terms of complexity vs. defects. A program consisting of one thousand lines of code is generally more complex and less reliable than one with a hundred lines of code. Due to its sheer astronomical complexity, the human brain is the most reliable behaving system in the world. Its reliability is many orders of magnitude greater than that of any complex program in existence (see devil's advocate). Any software application with the complexity of the brain would be so riddled with bugs as to be unusable. Conversely, given their relatively low complexity, software applications with the reliability of the brain would almost never fail. Imagine how complex it is to be able to recognize someone's face under all sorts of lighting conditions, velocities and orientations. Just driving a car around town (taxi drivers do it all day long, every day) without getting lost or into an accident is incredibly more complex than anything any software program in existence can accomplish. Sure, brains make mistakes, but the things that they do are so complex, especially the myriads of little things that we are oblivious to, that the mistakes pale in comparison to the successes. And when they do make mistakes, it is usually due to physical reasons (e.g., sickness, intoxication, injuries, genetic defects, etc.) or to external circumstances beyond their control (e.g., they did not know). Mistakes are rarely the result of defects in the brain's existing software.

The brain is proof that the reliability of a behaving system (which is what a computer program is) does not have to be inversely proportional to its complexity, as is the case with current software systems. In fact, the more complex the brain gets (as it learns), the more reliable it becomes. But the brain is not the only proof that we have of the existence of a silver bullet. We all know of the amazing reliability of integrated circuits.
No one can seriously deny that a modern CPU is a very complex device, what with some of the high-end chips from Intel, AMD and others sporting hundreds of millions of transistors. Yet, in all the years that I have owned and used computers, only once did a CPU fail on me and it was because its cooling fan stopped working. This seems to be the norm with integrated circuits in general: when they fail, it is almost always due to a physical fault and almost never to a defect in the logic. Moore's law does not seem to have had a deleterious effect on hardware reliability since, to my knowledge, the reliability of CPUs and other large scale integrated circuits did not degrade over the years as they increased in speed and complexity.

Deconstructing Brooks' Complexity Arguments

Frederick Brooks' arguments fall apart in one important area. Although Brooks' conclusion is correct as far as the unreliability of complex algorithmic software is concerned, it is correct for the wrong reason. I argue that software programs are unreliable not because they are complex (Brooks' conclusion), but because they are algorithmic in nature. In his paper, Brooks defines two types of complexity, essential and accidental. He writes:

The complexity of software is an essential property, not an accidental one.
According to Brooks, one can control the accidental complexity of software engineering (with the help of compilers, syntax and buffer overflow checkers, data typing, etc...), but one can do nothing about its essential complexity. Brooks then explains why he thinks this essential complexity leads to unreliability:

From the complexity comes the difficulty of enumerating, much less understanding, all the possible states of the program, and from that comes the unreliability.
This immediately raises several questions: Why must the essential complexity of software automatically lead to unreliability? Why is this not also true of the essential complexity of other types of behaving systems? In other words, is the complexity of a brain or an integrated circuit any less essential than that of a software program? Brooks is mum on these questions even though he acknowledges in the same paper that the reliability and productivity problem has already been solved in hardware through large-scale integration.

More importantly, notice the specific claim that Brooks is making. He asserts that the unreliability of a program comes from the difficulty of enumerating and/or understanding all the possible states of the program. This is an often repeated claim in the software engineering community but it is fallacious nonetheless. It overlooks the fact that it is equally difficult to enumerate all the possible states of a complex hardware system. This is especially true if one considers that most such systems consist of many integrated circuits that interact with one another in very complex ways. Yet, in spite of this difficulty, hardware systems are orders of magnitude more robust than software systems (see the COSA Reliability Principle for more on this subject).

Brooks backs up his assertion with neither logic nor evidence. But even more disturbing, nobody in the ensuing years has bothered to challenge the validity of the claim. Rather, Brooks has been elevated to the status of a demigod in the software engineering community and his ideas on the causes of software unreliability are now bandied about as infallible dogma.

Targeting the Wrong Complexity

Obviously, whether essential or accidental, complexity is not, in and of itself, conducive to unreliability. There is something inherent in the nature of our software that makes it prone to failure, something that has nothing to do with complexity per se. Note that, when Brooks speaks of software, he has a particular type of software in mind:

The essence of a software entity is a construct of interlocking concepts: data sets, relationships among data items, algorithms, and invocations of functions.
By software, Brooks specifically means algorithmic software, the type of software which is coded in every computer in existence. Just like Alan Turing before him, Brooks fails to see past the algorithmic model. He fails to realize that the unreliability of software comes from not understanding the true nature of computing. It has nothing to do with the difficulty of enumerating all the states of a program. In the remainder of this article, I will argue that all the effort in time and money being spent on making software more reliable is being targeted at the wrong complexity, that of algorithmic software. And it is a particularly insidious and intractable form of complexity, one which humanity, fortunately, does not have to live with. Switch to the right complexity and the problem will disappear.

Next: Part IV, The Silver Bullet

Related Articles:

How to Solve the Parallel Programming Crisis
Parallel Computing: The End of the Turing Madness
Parallel Computing: Why the Future Is Synchronous
Parallel Computing: Why the Future Is Reactive
Why Parallel Programming Is So Hard
Parallel Programming, Math and the Curse of the Algorithm
The COSA Saga

Sunday, October 19, 2008

Why Software Is Bad, Part II

Turing's Baby

Early computer scientists of the twentieth century were all trained mathematicians. They viewed the computer primarily as a tool with which to solve mathematical problems written in an algorithmic format. Indeed, the very name computer implies the ability to perform a calculation and return a result. Soon after the introduction of electronic computers in the 1950s, scientists fell in love with the ideas of famed British computer and artificial intelligence pioneer, Alan Turing. According to Turing, to be computable, a problem has to be executable on an abstract computer called the universal Turing machine (UTM). As everyone knows, a UTM (an infinitely long tape with a movable read/write head) is the quintessential algorithmic computer, a direct descendant of Lovelace's sequential stored program. It did not take long for the Turing computability model (TCM) to become the de facto religion of the entire computer industry.

A Fly in the Ointment

The UTM is a very powerful abstraction because it is perfectly suited to the automation of all sorts of serial tasks for problem solving. Lovelace and Babbage would have been delighted, but Turing's critics could argue that the UTM, being a sequential computer, cannot be used to simulate [see comments below] real-world problems which require multiple simultaneous computations. Turing's advocates could counter that the UTM is an idealized computer and, as such, can be imagined as having infinite read/write speed. The critics could then point out that, idealized or not, an infinitely fast computer introduces all sorts of logical/temporal headaches since all computations are performed simultaneously, making it unsuitable to inherently sequential problems. As the saying goes, you cannot have your cake and eat it too. At the very least, the TCM should have been extended to include both sequential and concurrent processes. However, having an infinite number of tapes and an infinite number of heads that can move from one tape to another would destroy the purity of the UTM ideal.

The Hidden Nature of Computing

The biggest problem with the UTM is not so much that it cannot be adapted to certain real-world parallel applications but that it hides the true nature of computing. Most students of computer science will recognize that a computer program is, in reality, a behaving machine (BM). That is to say, a program is an automaton that detects changes in its environment and effects changes in it. As such, it belongs in the same class of machines as biological nervous systems and integrated circuits. A basic universal behaving machine (UBM) consists, on the one hand, of a couple of elementary behaving entities (a sensor and an effector) or actors and, on the other, of an environment (a variable).

More complex UBMs consist of arbitrarily large numbers of actors and environmental variables. This computing model, which I have dubbed the behavioral computing model (BCM), is a radical departure from the TCM. Whereas a UTM is primarily a calculation tool for solving algorithmic problems, a UBM is simply an agent that reacts to one or more environmental stimuli. As seen in the figure below, in order for a UBM to act on and react to its environment, sensors and effectors must be able to communicate with each other.
The main point of this argument is that, even though communication is an essential part of the nature of computing, this is not readily apparent from examining a UTM. Indeed, there are no signaling entities, no signals and no signal pathways on a Turing tape or in computer memory. The reason is that, unlike hardware objects which are directly observable, software entities are virtual and must be logically inferred.
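The sensor/effector pair of a basic UBM can be modeled in a few lines. The variable, the change detection, and the action below are all illustrative choices of mine, not a definition taken from the text:

```python
# A minimal universal behaving machine (UBM): one sensor, one effector,
# one environmental variable.
class UBM:
    def __init__(self):
        self.variable = 0   # the environment
        self.last_seen = 0  # what the sensor last observed

    def sensor(self):
        """Detects a change in the environment."""
        changed = self.variable != self.last_seen
        self.last_seen = self.variable
        return changed

    def effector(self):
        """Effects a change in the environment."""
        self.variable += 1

machine = UBM()
machine.effector()               # act on the environment
assert machine.sensor() is True  # the change is detected
assert machine.sensor() is False # no further change, no signal
```

Note that the sensor and effector communicate only through the shared variable, which is the signaling that, as argued above, a Turing tape keeps hidden.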

Fateful Choice

Unfortunately for the world, it did not occur to early computer scientists that a program is, at its core, a tightly integrated collection of communicating entities interacting with each other and with their environment. As a result, the computer industry had no choice but to embrace a method of software construction that sees the computer simply as a tool for the execution of instruction sequences. The problem with this approach is that it forces the programmer to explicitly identify and resolve a number of critical communication-related issues that, ideally, should have been implicitly and automatically handled at the system level. The TCM is now so ingrained in the collective mind of the software engineering community that most programmers do not even recognize these issues as having anything to do with either communication or behavior. This would not be such a bad thing except that a programmer cannot possibly be relied upon to resolve all the dependencies of a complex software application during a normal development cycle. Worse, given the inherently messy nature of algorithmic software, there is no guarantee that they can be completely resolved. This is true even if one had an unlimited amount of time to work on it. The end result is that software applications become less predictable and less stable as their complexity increases.

Part III: Why the Experts Are Wrong (continued)

Related Articles:

How to Solve the Parallel Programming Crisis
Parallel Computing: The End of the Turing Madness
Parallel Computing: Why the Future Is Non-Algorithmic

Monday, October 13, 2008

Why Software Is Bad, Part I

Part I, II, III, IV


I first wrote Why Software Is Bad and What We Can Do to Fix It back in June 2002. It was my first web article on what I think is wrong with computer programming. The ideas that had led to it had been brewing in my head since 1980. I think that now is a good time to revisit it with a critical eye looking through the lens of parallel programming. The next few posts will consist of chosen extracts from the original piece. I may add comments here and there as I go along.

The 'No Silver Bullet' Syndrome

Not long ago, in an otherwise superb article [pdf] on the software reliability crisis published by MIT Technology Review, the author blamed the problem on everything from bad planning and business decisions to bad programmers. The proposed solution: bring in the lawyers. Not once did the article mention that the computer industry's fundamental approach to software construction might be flawed. The reason for this omission has to do in part with a highly influential paper that was published in 1987 by a now famous computer scientist named Frederick P. Brooks. In the paper, titled "No Silver Bullet--Essence and Accidents of Software Engineering", Dr. Brooks writes:

But, as we look to the horizon of a decade hence, we see no silver bullet. There is no single development, in either technology or in management technique, that by itself promises even one order-of-magnitude improvement in productivity, in reliability, in simplicity.
Not only are there no silver bullets now in view, the very nature of software makes it unlikely that there will be any--no inventions that will do for software productivity, reliability, and simplicity what electronics, transistors, and large-scale integration did for computer hardware.
No other paper in the annals of software engineering has had a more detrimental effect on humanity's efforts to find a solution to the software reliability crisis. Almost single-handedly, it succeeded in convincing the entire software development community that there is no hope in trying to find a solution. It is a rather unfortunate chapter in the history of programming. Untold billions of dollars and even human lives have been and will be wasted as a result.

When Brooks wrote his famous paper, he apparently did not realize that his arguments applied only to algorithmic complexity. Most people in the software engineering community wrongly assume that algorithmic software is the only possible type of software. Non-algorithmic or synchronous reactive software is similar to the signal-based model used in electronic circuits. It is, by its very nature, extremely stable and much easier to manage. This is evident in the amazing reliability of integrated circuits.
Calling in the lawyers and hiring more software experts schooled in an ancient paradigm will not solve the problem. It will only be costlier and, in the end, deadlier. The reason is threefold. First, the complexity and ubiquity of software continue to grow unabated. Second, the threat of lawsuits means that the cost of software development will skyrocket (lawyers, experts and trained engineers do not work for beans). Third, the incremental stopgap measures offered by the experts are not designed to get to the heart of the problem. They are designed to provide short-term relief at the expense of keeping the experts employed. In the meantime, the crisis continues.

Ancient Paradigm

Why ancient paradigm? Because the root cause of the crisis is as old as Lady Ada Lovelace, who invented the sequential stored program (or table of instructions) for Charles Babbage's analytical engine around 1842. Built out of gears and rotating shafts, the analytical engine was the first true general-purpose numerical computer, the ancestor of the modern electronic computer. But the idea of using a step-by-step procedure in a machine is at least as old as Jacquard's punched cards, which were used to control the first automated loom in 1801. The Persian mathematician Muhammad ibn Mūsā al-Khwārizmī is credited with having invented the algorithm in 825 AD as a problem-solving method. The word algorithm derives from 'al-Khwārizmī.'

Part II: Why the Experts Are Wrong

Thursday, October 9, 2008

COSA: A New Kind of Programming, Part VI

Part I, II, III, IV, V, VI


In Part V, I came out against the use of messaging for common communication between components and showed my preference for the flexibility of reactive data sensing and data connectors. In this post, I describe a new approach to task prioritization based on a simple extension to the COSA behavior control mechanism.

Thread-Based Prioritization

One of the few advantages of using multithreading is that it makes it possible to prioritize threads. Even though this capability will not matter much with the advent of processors sporting hundreds or thousands of cores, the fact remains that, whenever processing power is at a premium, it makes sense to allocate more of the processor’s cycles to critical functions that must execute in real time. It goes without saying that thread-based priority scheduling wreaks havoc with deterministic timing, which is one of the reasons that the COSA software model does not use threads.

Message-Based Prioritization

Since COSA is not multithreaded, my initial approach was to use a message-based scheme for task prioritization. Lately, however, I have been having serious doubts about the suitability of messaging for inter-component communication. The approach that I originally had in mind would use a prioritized message queue within a client-server context. High priority messages would simply go to the head of the queue. It then occurred to me that a queued client-server approach makes no sense in an inherently parallel environment. Indeed, why have a serial server processing queued requests one at a time when using multiple parallel server clones could scale in performance as the number of processor cores is increased? Not a good idea.

Behavior-Based Prioritization

I have already explained how basic behavior control is done in COSA. While the Stop command can be used at times to save cycles, this is not its main intended function. Its main function is to prevent conflicts among cooperating components. Besides, an improperly timed Stop command will probably result in failure because intermediate results are discarded. I figure that the best way is to introduce a new control mechanism that can temporarily pause a component. I call it the Pause Effector. Its purpose is to give the system the ability to alleviate the processor’s load so that the more time-critical tasks can be processed in real time. The caveat is that any signal received by the component during its comatose state will be ignored.
The way it works is as follows. When a signal arrives at the ‘Pause’ terminal, the component goes into a coma and all cell activities in the component are suspended. Internally, the system sets a ‘stop’ flag in all the cells that causes the processor to stop processing them. All cells that were about to be processed in the next cycle are placed in a temporary hold buffer. On reception of a ‘Continue’ signal, the system clears the ‘stop’ flag in all the cells and the cells that were in the temporary hold buffer are transferred into the processor’s input buffer for processing. A signal is emitted to indicate that the component is once again active.
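The Pause/Continue mechanism described above can be sketched in code. This is a minimal illustration only; the class names, the 'stop' flag, and the buffer handling are my own assumptions about how the mechanism might look, not part of any published COSA implementation.

```python
# Illustrative sketch of the Pause Effector (hypothetical names throughout).

class Cell:
    def __init__(self, name):
        self.name = name
        self.stopped = False   # the per-cell 'stop' flag set by the system

class Component:
    def __init__(self, cells):
        self.cells = cells
        self.hold_buffer = []  # cells held while the component is paused

    def pause(self, input_buffer):
        """'Pause' signal: suspend all cell activity in the component."""
        for cell in self.cells:
            cell.stopped = True
        # Cells about to be processed next cycle go into a temporary hold buffer.
        self.hold_buffer = [c for c in input_buffer if c in self.cells]
        input_buffer[:] = [c for c in input_buffer if c not in self.cells]

    def resume(self, input_buffer):
        """'Continue' signal: clear the 'stop' flags and reschedule held cells."""
        for cell in self.cells:
            cell.stopped = False
        input_buffer.extend(self.hold_buffer)
        self.hold_buffer = []
```

A paused component's cells are simply absent from the processor's input buffer, which is what "the component goes into a coma" amounts to in this sketch.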

Load Manager

At this point, I believe that the Pause effector should not be directly accessible by user applications. Every COSA operating system will have a Load Manager whose job it is to manage processor load according to every component’s load requirement. My current thinking is that only the Load Manager should control the Pause Effector but this may change if a good enough reason is brought to my attention. Again, I will say that I don’t think that task prioritization will be needed in the future with the advent of processors with hundreds and even thousands of cores.

In a future article, I will introduce the COSA Connection Manager, a special COSA system component that makes it possible for a component to modify its programming on the fly. This is essential for certain adaptive applications like neural networks and other machine learning programs.

Related Articles:

How to Solve the Parallel Programming Crisis
Parallel Computing: Why the Future Is Compositional

Monday, October 6, 2008

COSA: A New Kind of Programming, Part V

Part I, II, III, IV, V, VI


In Part IV, I described the overall control architecture used in the COSA programming model and what happens to a component when it receives a ‘Start’ or ‘Stop’ signal. In this installment, I explain why simple reactive data connectors are better than message connectors.

The Problem with Messages

Whenever two or more components cooperate on performing a common task, they must not only be attentive to changes in their own immediate environment, but also, to what the other components are doing. When I first devised the COSA model, I had assumed that sending messages was the best way for components to communicate with one another. I have since changed my mind.

The problem with sending messages is that the message sender has no way of knowing when the message should be sent and, as a result, the receiver has no choice but to query the sender for info. This complicates things because it requires the use of a two-way communication mechanism involving two message connectors or a two-line multi-connector. Alternatively, the sender is forced to send messages all the time whether or not the receiver needs it. I now think that messaging should be used strictly in conjunction with a messenger component dedicated to that purpose. I am also slowly coming to the conclusion that messaging should be used only in a distributed environment between components that reside on separate machines. This means that the COSA pages, in particular, the operating system and software composition pages, are in dire need of a revision. I will expand on this in a future post.

Reactive Data Sensing

The communication method that I prefer is reactive data sensing in which components simply watch one another’s actions within a shared data environment. This way, sharers can immediately sense any change in relevant data and react to it if desired. Since it is up to the receiving components to decide whether or not a change is relevant to them, it makes future extensions easier to implement. Sure, you can have restrictions such as which components have permission to effect changes but the component that owns the data should not impose any restriction on when or whether those changes are detected and used by others. Reactive data sensing is a simple matter of creating sensors (comparison operators) and associating them to relevant effectors. Effector-sensor association is done automatically in COSA.
In the figure above, the dotted line means that the + effector is associated with the != sensor. The way it works is that the != comparison operation is performed automatically every time the effector executes its operation. If the assigned change occurs, the sensor emits a signal.
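The effector-sensor association just described can be sketched as follows. The class names and the callback wiring are illustrative assumptions; the point is only that the != comparison runs automatically every time the associated effector executes.

```python
# Sketch of reactive data sensing via effector-sensor association
# (all names are hypothetical, not an actual COSA API).

class Sensor:
    """A comparison operator that emits a signal when its test passes."""
    def __init__(self, test, on_fire):
        self.test = test
        self.on_fire = on_fire

class Effector:
    def __init__(self, operation):
        self.operation = operation
        self.sensors = []      # sensors associated with this effector

    def associate(self, sensor):
        self.sensors.append(sensor)

    def execute(self, data):
        old = dict(data)
        self.operation(data)   # e.g. the + effector updates its variable
        for s in self.sensors: # associated sensors are invoked automatically
            if s.test(old, data):
                s.on_fire()

signals = []
data = {'x': 0}
plus = Effector(lambda d: d.__setitem__('x', d['x'] + 1))
plus.associate(Sensor(lambda old, new: old['x'] != new['x'],
                      lambda: signals.append('x changed')))
plus.execute(data)   # x changes from 0 to 1, so the != sensor fires
```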

Control Timing

Reactive sensing makes sense within the context of behavior control. The primary reasons for stopping or deactivating a component are that a) its services are either not needed at certain times (deactivation saves cycles) or b) they would conflict with the work of another component (this prevents errors). The main job of a COSA Supervisor is to precisely time the activation and deactivation of various components under its charge so as to ensure smooth cooperation while eliminating conflicts and improving system performance.

Data Connectors

A data connector is just a mechanism for sharing data. The data must reside in only one component, the owner or server. The latter has read/write permission on its own data. A client component that is attached to a data owner via a data connector has only read permission by default. It is up to the component designer to specify whether client components can have write permission as well. The figure below illustrates the use of both data and signal connectors. To the right is the color convention that I currently use for the connectors. Note that control connectors are signal connectors.
The data assigned to a data connector can be a single variable, a list of variables, a C++-like data structure or an array. In a design environment, double-clicking on a data connector will open up a box showing the type of data that is assigned to it and the effector and sensors assigned to each data item, if any.

In Part VI, I will introduce a new COSA high-level behavior control mechanism with two complementary commands, ‘Pause’ and ‘Continue’.

Related Articles:

How to Solve the Parallel Programming Crisis
Parallel Computing: Why the Future Is Compositional

Saturday, October 4, 2008

COSA: A New Kind of Programming, Part IV

Part I, II, III, IV, V, VI

In Part III, I introduced the concept of the control effector, a high-level behavior control mechanism that makes it possible to control a COSA component as if it were a low-level effector. In this post, I describe the overall control architecture used in the COSA programming model and I explain what happens internally to a component under control.

Control Architecture

None of this is chiseled in stone but the way I envisioned it, every high-level component should contain a single master supervisor in charge of one or more slave components. The only components that do not contain a supervisor are low-level components, i.e., components that are composed of cells. It goes without saying that a supervisor is a low-level component. (See Two Types of Components). In the figure below, a component is shown with its supervisor and five slave components. Note that only the control connections are shown for clarity. Normally, the slave components will make data connections with one another and some of the connectors may be visible outside the component. Every one of the slave components may have internal supervisors of their own, if necessary.

Only a supervisor can access the control connector (small red circle) of a slave. The latter cannot access the control connector of its supervisor or that of another slave. The control connector of a supervisor component can only be accessed by an external supervisor component. When a supervisor receives a start or stop signal, it may pass it to the slaves concurrently or in a given sequential order dictated by the design. In an actual development environment, the order in which the components are activated can be shown in slow motion by visually highlighting them in some fashion.

Control Effector

A control effector is a special COSA cell that is used to activate and/or deactivate a low-level component. In a previous illustration (reproduced below) I showed a control effector connected to its 3-input multi-connector. That is the default configuration. This is not a requirement, however. It is up to the designer to decide what to do with the start and stop signals. For example, the component may need to initialize or reset certain variables after starting and before stopping. Or it may do nothing other than outputting a ‘Done’ signal when it receives a ‘Stop’ signal (by routing the ‘Stop’ signal to the ‘Done’ terminal). It also has the option of stopping itself for whatever reason by sending a 'Stop' signal to its control effector.

Stopping a component means to deactivate it so that it can no longer process signals. Deactivating a component is a simple matter of clearing a special activation flag in every cell of the component. This causes the processor to ignore them. A component is fully deactivated only if its control effector receives a 'Stop' signal.


Activating or starting a component is a little more complicated than just resetting the activation flags. Recall that, unlike conventional programming models, COSA is a change-based or reactive model that uses reactive sensing. That is to say, in a COSA program, a comparison operation (i.e., sensor) is not explicitly executed by the program but is invoked automatically whenever there is a change in a data variable that may potentially affect the comparison (See Effector-Sensor Associations). Being a change detector, a sensor must compare the current state of its assigned variable with its previous state. It fires only when there is a correct change. For example, a non-zero sensor fires only if its variable changes from zero to non-zero. The problem is that, when a component is activated, its sensors have no idea what the previous states were. The solution is to set all sensors to their default states upon activation and invoke them immediately afterwards. This way, when a component is activated, all of its sensors perform their assigned tests or comparisons on their assigned data and fire if necessary. A component is fully activated when its control effector receives a 'Start' signal.
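The activation procedure just described can be made concrete with a small sketch. The sensor class and the reset logic below are my own illustrative assumptions, using the non-zero sensor from the example in the text.

```python
# Sketch of component activation (hypothetical names). On a 'Start' signal,
# every sensor is reset to its default state and then invoked immediately,
# so no change present at activation time is missed.

class NonZeroSensor:
    """Fires when its variable changes from zero to non-zero."""
    def __init__(self, get_value):
        self.get_value = get_value
        self.previous = 0        # default state
        self.fired = False

    def invoke(self):
        current = self.get_value()
        self.fired = (self.previous == 0 and current != 0)
        self.previous = current  # remember state for the next comparison

def activate(sensors):
    for s in sensors:
        s.previous = 0           # reset to default state...
        s.invoke()               # ...then test immediately

data = {'x': 5}
sensor = NonZeroSensor(lambda: data['x'])
activate([sensor])               # x is already non-zero, so the sensor fires
```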

In Part V, I will describe reactive data connectors and explain why they are preferable to active message passing.

Related Articles:

How to Solve the Parallel Programming Crisis
Parallel Computing: Why the Future Is Compositional

Thursday, October 2, 2008

COSA: A New Kind of Programming, Part III

Part I, II, III, IV, V, VI


In Part II, I showed that behavior control in the COSA programming model is based on a simple master/slave mechanism that uses complementary start/stop control commands. In this post, I reveal how the same method can be used to control high-level components as well.

Component As Effector

Controlling a component simply means that the component can be seen, for all intents and purposes, as a low-level COSA effector. Every component will have a special control/effector cell connected to a multi-connector with three connections: two input connections for the start and stop signals and one output connection for the ‘done’ signal that is emitted when the component is finished with its task. The control effector can be used both internally and externally.

Connectors are great because they enforce plug-compatibility and make drag-and-drop software composition possible, but they don’t do much for program comprehension. The reason is that part of a connector’s purpose is to hide information so as not to overwhelm the user with too much complexity. My philosophy is that information should be delivered on an as-needed basis only. That being said, it would be nice if we could summon a special view of a group of components in order to see exactly how they interact together.

An Example

Let’s say we have a water level component that is used to control a pump that maintains water in a tank at a certain level. Suppose we don’t want the pump to be turned on too often for maintenance or cost reasons. To do that, we can use a generic real-time timer cell to wait, say, 30 minutes between activations. We could incorporate the timer cell directly into the water level component but the latter would no longer be a generic component. A better alternative is to create a separate supervisor component that uses the timer cell to control the activation of the water level component. The figure below shows the two components, as they would normally appear.
In a complex program, the supervisor component would normally be extended to control an indefinite number of slave components. Note that, while the figure shows the components involved, it does not tell us how the water level component is controlled. For that, we need a control view. As seen below, the control view is a simple diagram that depicts the manner in which the water level component is controlled by the supervisor. It displays only the control connections; all other connections, if any, are omitted.
Essentially, the water level component emits a signal when it's done. This signal is used to start the delay timer. At the end of the delay, the timer emits a Done signal, which is used to start the water level component and the cycle begins anew.
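The control cycle above can be simulated with a few lines of code. The function names are hypothetical, and the 30-minute delay is collapsed into an immediate call so the routing of signals is visible in one pass.

```python
# Toy simulation of the supervisor's control view: the water level
# component's 'Done' signal starts the timer, and the timer's 'Done'
# signal restarts the water level component (names are illustrative).

log = []

def water_level_done():
    log.append('water level done')
    start_timer()              # 'Done' signal routed to the timer's 'Start'

def start_timer():
    log.append('timer started')
    timer_done()               # in a real system this fires after 30 minutes

def timer_done():
    log.append('timer done')
    # ...the timer's 'Done' signal would now restart the water level
    # component, and the cycle would begin anew.

water_level_done()
```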

In Part IV, I will describe the overall control architecture of a COSA component and explain what happens internally to a component when it receives a start or a stop signal.

Related articles:

How to Solve the Parallel Programming Crisis
Parallel Computing: Why the Future Is Compositional

Saturday, September 27, 2008

COSA: A New Kind of Programming, Part II

Part I, II, III, IV, V, VI


In Part I of this multi-part article, I wrote that the master/slave approach to elementary behavior control in the COSA programming model should be extended to high-level software construction. My thesis is that this control mechanism is so simple and intuitive that it will revolutionize computer programming by moving it out of the exclusive realm of computer nerds into that of the average computer user. Since COSA is inherently parallel, this will, in effect, solve the parallel programming problem. Below, I go over the elementary principles of control used in COSA.

Effector Control

Most of us are familiar with the controls that come with many of our electrical and electronic appliances. Some may be a little bit fancier than others but almost all come equipped with a minimum set of controls: the on/off (start/stop) buttons. To reuse the metaphor of the previous post, we are the masters and the appliances are the slaves. It turns out that motor command neurons in the brain’s basal ganglia use a similar method to control their target muscles: excitatory (start) and inhibitory (stop) signals. The way it works is that the neuron begins firing as soon as it receives an excitatory signal and stops firing when it receives an inhibitory signal. This is what gave me the original idea for COSA effectors. The addition effector shown below will repeat its operation over and over until it receives a stop command.

A single motor command neuron may receive excitatory and inhibitory signals from hundreds or even thousands of afferent synaptic connections. It goes without saying that the basal ganglia are using some kind of error detection mechanism in order to keep all those incoming control signals from conflicting with one another. COSA effectors, too, use a special mechanism that automatically detects command conflicts. It is applied during application development for debugging purposes and it is based on what I call the principle of motor coordination:
No action can be started if it has already started, or stopped if it is already stopped.
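As a toy illustration, the principle of motor coordination might be enforced like this during development. The class and the error message are assumptions of mine; the source only states the principle, not how conflicts are reported.

```python
# Sketch of automatic command-conflict detection (hypothetical names):
# starting an already-started effector, or stopping a stopped one,
# violates the principle of motor coordination.

class Effector:
    def __init__(self):
        self.started = False

    def start(self):
        if self.started:
            raise RuntimeError("motor conflict: effector already started")
        self.started = True

    def stop(self):
        if not self.started:
            raise RuntimeError("motor conflict: effector already stopped")
        self.started = False

e = Effector()
e.start()
try:
    e.start()        # a second start signal is a coordination error
except RuntimeError as err:
    conflict = str(err)
```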

In sum, low-level behavior control in COSA is a very simple thing that even children can grasp. In Part III, I will explain how to control the behavior of high-level COSA components by applying the same principles used with elementary objects.

Related articles:

How to Solve the Parallel Programming Crisis
Parallel Computing: Why the Future Is Compositional

Thursday, September 25, 2008

COSA: A New Kind of Programming, Part I

Part I, II, III, IV, V, VI


A few exciting ideas related to the on-going evolution of the COSA programming model have been percolating in my mind for quite some time. I wrote a little about them in my recent article, Parallel Computing: Command Hierarchy. These ideas form the basis of a radically new way of looking at software construction that is so intuitive, it promises (or threatens, as the case may be) to reclassify computer programming from a mostly geek-only occupation into something that the average computer user can partake in and enjoy. This multi-part article is an account of the reasoning that led to my current thinking.

Something Is Missing

There is no doubt that the graphical approach to parallel programming can do wonders for productivity and program comprehension. It is true that the use of plug-compatible components in COSA will facilitate drag-and-drop software composition but the simple and intuitive feel that one gets from connecting a sensor to an effector is absent in the realm of high-level components.

Even though looking at a bunch of interconnected COSA components may give one a sense of the various functional parts of a program, the manner in which the parts cooperate to accomplish a task is not obvious. Something is missing.

Masters and Slaves

I realized that the only way to solve the problem is to return to COSA’s roots. The COSA philosophy is that a program is a behaving machine that senses and effects changes in its environment. Of course, there is more to the design of a behaving machine than realizing that programs behave. We, as designers, want the program to behave in a certain way, that is to say, we want control. At the lowest level, we can control the behavior of a program by connecting specific sensors to specific effectors. The applicable metaphor, in this example, is that the sensor is the master or controller and the effector is the slave, i.e., the object that is under control. I rather like the master/slave symbolism because it perfectly illustrates the principle of complementarity. Those of you who have followed my work over the years know that I have always maintained that complementarity is the most important of all the COSA principles because it is the basic underlying principle of every complex organism.

In part II, I will describe how behavior control in COSA works at the elementary level.

Related articles:

How to Solve the Parallel Programming Crisis
Parallel Computing: Why the Future Is Compositional

Friday, September 19, 2008

Parallel Computing: Both CPU and GPU Are Doomed

Tim Sweeney

A few weeks after I wrote Heralding The Impending Death of the CPU, Tim Sweeney, the renowned founder of Epic Games and pioneering game engine designer, predicts the impending fall of the GPU. In an interview titled Twilight of the GPU: an epic interview with Tim Sweeney, published by Ars Technica last Saturday, the same day hurricane Ike ripped Texas a new one, Sweeney does for the GPU what I did for the CPU. Here is something that Sweeney said that caught my attention:
In the next console generation you could have consoles consist of a single non-commodity chip. It could be a general processor, whether it evolved from a past CPU architecture or GPU architecture, and it could potentially run everything—the graphics, the AI, sound, and all these systems in an entirely homogeneous manner. That's a very interesting prospect, because it could dramatically simplify the toolset and the processes for creating software.
This is exactly what I have been saying for a long time. Homogeneity and universality are the names of the new game. I may not agree with Sweeney on what development tools we will use in the future (he seems to be married to the old C, C++ linguistic approach), but he is absolutely correct about the future of parallel processors.


This brings me to thinking about Nvidia. Unlike Intel and AMD, Nvidia’s financial future is not tied to the CPU. The CPU will soon join the vacuum tube and the buggy whip in the heap of obsolete technologies. The future of parallel computing is in vector processing and, as we all know, Nvidia’s GPUs are vector processors. Sure, GPUs are not universal parallel processors because they use an SIMD (single instruction, multiple data) configuration. However, this is a problem that Nvidia will eventually correct by switching over to a pure MIMD (multiple instruction, multiple data) vector architecture. In my opinion, Nvidia is ideally positioned to dominate the processor industry in the decades to come. That is, assuming its leadership is shrewd enough to see and heed the writing on the wall.

PS. As an aside, a little over a month ago, Tim wrote a couple of comments to my article Larrabee: Intel's Hideous Heterogeneous Beast.

Related Articles:

Parallel Computing: The Fourth Crisis
Radical Future of Computing, Part II
Heralding the Impending Death of the CPU
Transforming the TILE64 into a Kick-Ass Parallel Machine
How to Solve the Parallel Programming Crisis

Wednesday, September 10, 2008

Parallel Computing: Command Hierarchy


I had originally decided not to publish this article because it reveals a powerful aspect of the COSA software model that I could conceivably use to my advantage should I eventually form a company around COSA, but who cares? I have already given away tons of my ideas; so one more will not matter much. Long ago, I realized that the best ideas are the ones that can be explained via metaphors. The reason is that a metaphor uses a familiar and well-understood concept to elucidate one that is not so familiar but is structurally similar. Below, I describe how the metaphor of a command hierarchy can serve as a powerful organizational principle for composing parallel applications.

Command Hierarchy

In the real world, parallel entities normally go about their business without the need for a hierarchical command structure. They just interact via sensors and effectors according to their set behaviors. However, any time there is a need to perform a complex task or to reach a long-term goal that requires the cooperation of many, it pays to divide the subtasks among various individuals under the direction of a supervisor. If the task is very complex, the supervisors may in turn be placed under the supervision of an even higher authority, and so on. The use of a command hierarchy is a very powerful way to efficiently organize any complex, goal-oriented system consisting of multiple actors or individuals. It is for this reason that it is used by all large organizations such as corporations, churches, armies, etc. Since a COSA application is a collection of behaving entities, it makes sense to use the same method to organize and control their behavior.

The COSA Supervisor

Some of my readers may be familiar with object-oriented programming languages like C++, Smalltalk or Java. It is important not to confuse a class hierarchy in OOP with a command hierarchy. Unlike high-level C++ objects that have their own methods, a high-level command component in COSA does nothing other than supervise the manner in which its subordinate components perform their actions. This is not unlike the way a conductor conducts a musical orchestra. Remember that a COSA component can be either active or inactive. When activated, a component’s sensors immediately detect input changes, if any. If necessary, the component accomplishes its task and, when the task is done, it sends an output signal before going back to sleep. The job of a COSA supervisor component is to determine the right time to activate (or deactivate) various components under its control in order to accomplish a task. This does not mean that the tasks are necessarily consecutive, however. Any number of components may be running concurrently.

COSA QuickSort Revisited

One of the nice things about using a supervisor is that there is no longer a need for low-level components to send messages to each other. They just communicate with the supervisor at the start and end of their tasks. These low-level components will use what I call a data sensor, a simple connector that allows one component to see or sense data changes in another. If time permits, I will redesign the COSA QuickSort component to include the use of a COSA supervisor and a command hierarchy.

See Also:
COSA: A New Kind of Programming

Friday, September 5, 2008

The Radical Future of Computing, Part II

Part I, II


This post is a continuation of my response to reader Marc’s interesting comments at the end of my recent article, Heralding the Impending Death of the CPU.

The Market Wants Speed and Low Energy Consumption

The microprocessor market is also highly fragmented between cheap low-end processor makers like Microchip and Atmel, and desktop makers. The desktop players have their own mindset that has made them successful in the past. The obviously-easily parallelizable tasks (sound, graphics...) are so common that custom parallel processors were designed for them. You might be able to get Microchip to squeeze in 20 16f84 microcontrollers on one piece of silicon and could easily use a bunch of cheap PICs to emulate a bunch of 20 vector processors with current technology at a chip cost of maybe $100. But then, the optimum bus design would vary on the application.

What application would be most compelling to investors? I don't know... But I think an FPGA or multi-PIC proof of concept would help your idea become implemented at low cost, and a "suggestion software on how to parallelize applications" for sequentially-thinking programmers, combined with a parallel processor emulator for conventional chip architectures would help programmers see parallel programming as an approachable solution instead of a venture capitalist buzzword.

Well, I am not so sure that this would attract the people with the money. I sense that, when it comes to processors, people are more impressed with proven performance than anything else. And, nowadays, people also want low energy usage to go with the speed. Sure, it would be cool if I could demonstrate a powerful parallel programming tool, but it would be an expensive thing to develop and it would not prove the superiority of the target processor. What I would like to deliver, as an introduction, is a low wattage, general-purpose, single-core processor that is several times more powerful (measured in MIPS) than say, an Intel or AMD processor with four or more cores. I think I can do it using vector processing. This, too, is not something that can be built cheaply, in my estimation. It must be designed from scratch.

SIMD Vector Processor: Who Ordered That?

At this point in the game, there should be no doubt in anyone’s mind that vector processing is the way to go. As GPUs have already amply demonstrated, vector processing delivers both high performance and fine-grain deterministic parallelism. Nothing else can come close. That multicore vendors would want to use anything other than a vector core is an indication of the general malaise and wrongheadedness that have gripped the computer industry. As everyone knows, multithreading and vector processing are incompatible approaches to parallelism. For some unfathomable reason that will keep future psycho-historians busy, the computer intelligentsia cannot see past multithreading as a solution to general purpose parallel computing. That's too bad because, unless they change their perspective, they will fall by the wayside.

When I found out that Intel was slapping x86 cores, laced with SIMD vector units, into its upcoming Larrabee GPU, I could not help cringing. What a waste of good silicon! The truth is that the only reason that current vector processors (GPUs) are not suitable for general-purpose parallel applications is that they use an SIMD (single instruction, multiple data) configuration. This is absurd to the extreme, in my opinion. Why SIMD? Who ordered that? Is it not obvious that what is needed is an MIMD (multiple instruction, multiple data) vector core? And it is not just because fine-grain MIMD would be ideal for general-purpose parallel applications, it would do wonders for graphics processing as well. Why? Because (correct me if I’m wrong) it happens that many times during processing, a bunch of SIMD vector units will sit idle because the program calls for only a few units (one instruction at a time) to be used on a single batch of data. The result is that the processor is underutilized. Wouldn't it be orders of magnitude better if other batches of data could be processed simultaneously using different instructions? Of course it would, if only because the parallel performance of a processor is directly dependent on the number of instructions that it can execute at any given time.

MIMD Vector Processing Is the Way to Go

Most of my readers know that I absolutely abhor the multithreading approach to parallelism. I feel the same way about CPUs. A day will come soon when the CPU will be seen as the abomination that it always was (see Heralding the Impending Death of the CPU for more on this topic). However, SIMD vector processors are not the way to go either even if they have shown much higher performance than CPUs in limited domains. It is not just that they lack universality (an unforgivable sin, in my view) but the underutilization problem that is the bane of the SIMD model will only get worse when future vector processors are delivered with thousands or even millions of parallel vector units. The solution, obviously, is to design and build pure MIMD vector processors. As I explained in a previous article on Tilera’s TILE64, the best way to design an MIMD vector processor is to ensure that the proportion of vector units for every instruction reflects the overall usage statistics for that instruction. This would guarantee that a greater percentage of the units are used most of the time, which would, in turn, result in much lower power consumption and greater utilization of the die’s real estate for a given performance level. Of course, a pure MIMD vector core is useless unless you also have the correct parallel software model to use it with, which is what COSA is all about.
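The allocation rule above reduces to simple arithmetic. The usage percentages and unit count below are invented for the example; real numbers would come from profiling actual instruction mixes.

```python
# Toy illustration of proportional unit allocation in an MIMD vector core:
# the number of vector units per instruction type mirrors that instruction's
# measured share of overall usage (percentages here are made up).

usage = {'add': 0.40, 'mul': 0.25, 'load': 0.20, 'branch': 0.15}
total_units = 1000   # hypothetical number of vector units on the die

units = {op: round(total_units * share) for op, share in usage.items()}
# e.g. {'add': 400, 'mul': 250, 'load': 200, 'branch': 150}
```

With this layout, the fraction of units busy at any moment tracks the instruction mix of the running program, which is the stated goal of lower power consumption per unit of performance.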

Have you looked at any Opencores designs?
No, I haven’t. The open source issue is very interesting but it opens a whole can of worms that I'd better leave for a future article.

Wednesday, September 3, 2008

The Radical Future of Computing, Part I

Part I, II


A reader named Marc left an interesting comment at the end of my previous post, Heralding the Impending Death of the CPU. I hope Marc will forgive me for using his words as a vehicle on which to piggyback my message.

Linguistic Origin of Programming

I think that algorithmic programming is popular because it is similar to the way many of us write in western natural language; people plan whether a thought should be after or before a previous one in academic essays, which is inherently sequential in nature.
I agree. I believe that the modern computer evolved from the sequential/algorithmic needs of mathematicians like Charles Babbage and Ada Lovelace. As you know, linguistic or textual symbols are perfect for expressing mathematical algorithms. I have often wondered what kind of computers we would have if clockmakers or locomotive engineers had had a more direct influence on the development of early computer technology. Those folks are more accustomed to having multiple interacting mechanisms performing their functions concurrently.

Note also that the typewriter predated modern computers and served as a model for the input device (keyboard) of the first mass market computers and has profoundly affected the way we perceive them. Although the mouse was a major influence in changing human-computer interactions, the event-driven approach to programming that it engendered somehow failed to convince computer scientists that every action in programming should be a reaction to some other action (event-driven), down to the instruction level. Hopefully, the new multi-touch screen technologies will drastically change our idea of what a computer is or should be.

Petri Nets, Conditions, Metaphors and Simplicity

Native parallel programming requires that the programmer (or implementer if you'd rather call it that) decides what conditions have to be met for each cell to trigger and what outputs are produced based on those conditions, so it requires skills that are part user, part coder. Petri Nets are a great graphical symbolism for this. It actually requires that people focus on the problem instead of style.
I agree. Nobody should have to worry about syntax or have to learn the meaning of a token (in someone else’s native language and alphabet) in order to program a computer. Only a few graphical symbols (sensors, effectors, pathways, data and encapsulations) should be allowed. Labeling should be done in the designer’s preferred language. I believe that the main reason that graphical programming languages have not taken off is that their designers not only don’t seem to appreciate the importance of encapsulation (information hiding), but they have a tendency to multiply symbols beyond necessity. I am a fanatic when it comes to simplicity.

One of my goals is to turn computer programming into something that the lay public will find approachable and enjoyable. In this regard, I think that even Petri Nets, in spite of their simplicity compared to other programming models, are still too complicated and too abstract, making them unpalatable to the masses or the casual developer. I rather like PNs and I am sorry that the concept never really became mainstream. However, I have a bone to pick with the notion of conditions (places?). Don’t get me wrong; I don’t disagree that there is a need for conditions. I just don’t think the token concept is intuitive or concrete enough to appeal to the layperson. In my opinion, everything should be driven by events (changes or transitions). What Petri calls a transition is what I call a sensor. A condition, to me, is just a past or current event and, as such, it should be used in conjunction with sensors (logic sensors, sequence detectors, etc.). This makes it easy to extend the idea of conditions to include that of temporal expectations, a must for reliability in COSA.
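The sensor-versus-condition distinction drawn above can be illustrated with a toy change detector: a sensor fires on a transition (an event), not while a condition merely holds. The class name and interface here are my own invention, not anything from Petri net theory or a COSA specification.

```python
# A minimal sketch: a sensor fires once when the watched value
# *changes* to the target state, and stays quiet while the
# condition simply continues to hold.

class ChangeSensor:
    """Fires only on a transition into the target state."""
    def __init__(self, target):
        self.target = target
        self.previous = None  # no reading seen yet

    def sense(self, value):
        fired = (value == self.target) and (self.previous != self.target)
        self.previous = value
        return fired

temp_high = ChangeSensor(target=True)
readings = [False, False, True, True, False, True]
events = [temp_high.sense(r) for r in readings]
print(events)  # fires on the False -> True transitions only
```

Note how the second consecutive True produces no event: the condition holds, but nothing changed, so the sensor stays silent.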

That being said, the ideal programming metaphors, in my opinion, are those taken from the behavioral sciences such as psychology and neuroscience: stimulus/response, sensor/effector, sequence memory, action/reaction, environment (variables), etc. The reason is that a computer program is really a behaving machine that acts and reacts to changes in its environment. A layperson would have little trouble understanding these metaphors. Words like ‘transitions’, ‘tokens’ and ‘places’ don’t ring familiar bells. Let me add that, even though I applaud the clean graphical look of PNs, my main criticism is that they are not deterministic. In my view, this is an unpardonable sin. (I confess that I need to take another close look at PNs because it seems that they have evolved considerably over the years.)

New Leadership to Replace the Old

To me, starting with a software specification before implementing a solution seems obvious, but my son has mainly sold freelance projects to business types based on his suggested user interface first; when he tried to tell his potential customers what data sources he used and how he got to his results, the customers' eyes would often glaze over...
Yes. People love to see pretty pictures, which is understandable because business types tend to see technology from a user’s perspective. They want to see the big picture and they don’t care about what makes it work. You make an interesting point because I have pretty much given up on selling my ideas directly to techies. I am slowly coming to the conclusion that the next computer revolution will have to emerge out of some government-funded initiative or some industry-wide consortium under the leadership of an independent, strategy-minded think tank. The reason is that the industry is in a deep malaise caused by the aging baby boomers who drove computer innovation in the last half of the 20th century, but lately have run out of ideas simply because they are old and set in their ways. I don't want to generalize too much but I think this is a major part of the problem. Their training has taught them to have a certain perspective on computing that is obviously not what is needed to solve the parallel programming and software reliability crises. Otherwise, they would have been solved decades ago. In fact, it is their perspective on computing that got the industry and the world into this mess in the first place.

As a case in point, consider this recent article at HPCwire by Michael Wolfe. It pretty much sums up what the pundits are thinking. Michael believes that “the ONLY reason to consider parallelism is for better performance.” I don’t know how old Michael is but it is obvious to me that his thinking is old and in serious need of an update. The problem is that the older computer nerds are still in charge at the various labs/universities around the world and they hold the purse strings that fund research and development. These folks have titles like CTO or Project Director or Chief Science Officer. That does not bode well for the future of computing.

As I wrote somewhere recently, the computer industry is in dire need of a seismic paradigm shift and there is only one way to do it. The old computer nerds must be forced into retirement and new leadership must be brought in. The new mandate should be to reevaluate the computing paradigms and models of the last century and assess their continued adequacy to the pressing problems that the industry is currently facing, such as the parallel programming and software reliability crises. If they are found to be inadequate (no doubt about it from my perspective), then they should be replaced. These kinds of strategic decisions are not going to be made by the old techies but by the business leaders, both in the private sector and within the government. Sometimes it pays not to be too married to the technology; otherwise you can’t see the forest for the trees.

Software Should Be More Like Hardware and Vice Versa

There is plenty of parallel processing already going around in graphics processors, Field-programmable Gate Arrays and other Programmable Logic chips. It's just that people with software experience who are used to a certain type of tool are afraid to make the effort to acquire what they see as hardware-type electrical engineer thought-habits; I know my programmer son would have an issue. The US has developed a dichotomy between electrical engineers and computer scientists.
Which is rather unfortunate, in my opinion. In principle, there should be no functional distinction between hardware and software, other than that software is flexible. I foresee a time when the distinction will be gone completely. The processor core as we know it will no longer exist. Instead, every operator will be a tiny, super-fast, parallel processor that can randomly access its data directly at any time without memory bus contention problems. We will have a kind of soft, super-parallel hardware that can instantly morph into any type of parallel computing program.

Programming for the Masses

Also "Talking heads" have a vested interest in promoting certain products that are only incremental improvements over the existing tools, because otherwise they would need to educate the clients about the details of the new paradigm, which would require extended marketing campaigns which would only pay back over the long term.
Yeah, legacy can be a big problem but it doesn’t have to be. I wrote about this before but you bring out the important issue of client education, which is a major part of the paradigm shift that I am promoting. I think the time has come to move application design and development from the realm of computer geeks into that of the end user. The solution to the parallel programming problem gives us an unprecedented opportunity to transform computer programming from a tedious craft that only nerds can enjoy into something that almost anybody can play with, even children. Now that multi-touch screens are beginning to enter the consumer market, I envision people using trial-and-error methods together with their hands and fingers (and possibly spoken commands) to quickly manipulate parallel 3-D objects on the screen in order to create powerful and novel applications. I see this as the future of programming, kind of like putting Lego blocks together. In this regard, I don’t think we will need to reeducate traditional programmers to accept and use the new paradigm. They will either have to get with it or risk becoming obsolete.

PS. I’ll respond to the rest of your message in Part II.

Thursday, August 28, 2008

Heralding the Impending Death of the CPU

Ancient Model vs. Modern Necessities

The modern CPU may seem like a product of the space age but its roots are rather ancient. British mathematician Charles Babbage first conceived of the principles behind the CPU more than 150 years ago and he was building on ideas that were older still, such as Jacquard’s punch card-driven loom and the algorithm, a mathematical concept invented a thousand years earlier by the Persian mathematician al-Khwārizmī. Like most mathematicians of his day, Babbage longed for a reliable machine that could automate the tedious task of executing algorithms, that is, tables of instructions. Parallel computing was the furthest thing from his mind. Yet amazingly, a century and a half later, the computer industry is still clinging to Babbage’s ancient computing model in the age of multicore computers.

The Good Way vs. the Bad Way

There are two main competing approaches to parallelism. The thread-based approach, the one chosen by the computer industry for general purpose computing, calls for having multiple instruction sequences processed side by side. The other approach, vector-based parallelism, calls for having multiple parallel lists of instructions, with the lists processed one after the other.
In the figure above, the small circles represent instructions or operations to be executed by the processor. The down arrows show the direction of execution. In thread-based parallelism, each thread can potentially be processed by a separate sequential core (CPU). In vector-based parallelism, the operations are fed into the processor as a collection of elements to be processed in parallel. For that, one would need a pure MIMD (multiple instructions, multiple data) vector processor in which every instruction is an independent vector that can be processed in parallel with the others. Both approaches will increase performance but, as I have explained elsewhere, the vector-based approach is better because it is deterministic and fine-grained, and it reflects the way parallel objects normally behave in nature.
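The cycle that drives the vector-based approach can be sketched as a software analogy in a few lines of Python. This is only an illustration; the reaction rule at the bottom is invented for the example.

```python
# Two-buffer simulation loop: while the objects in the current buffer
# are processed, their reactions fill the next buffer. Swapping the
# buffers at the end of each cycle prevents race conditions and makes
# every run deterministic.

def run_cycles(initial_ops, react, cycles):
    current, nxt = list(initial_ops), []
    for _ in range(cycles):
        for op in current:          # every op in this buffer is independent,
            nxt.extend(react(op))   # so they could all execute in parallel
        current, nxt = nxt, []      # swap buffers; a new cycle begins
    return current

# Invented reaction rule: each signal of strength n spawns two signals
# of strength n + 1, until strength 2 is reached.
fanout = lambda n: [n + 1, n + 1] if n < 2 else []
print(run_cycles([0], fanout, 2))  # -> [2, 2, 2, 2]
```

Because the order in which ops are pulled from the current buffer cannot affect the result, the same inputs always produce the same outputs, cycle after cycle; that is the determinism argued for above.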

Our brains are temporal signal-processing machines. The ability to sense the temporal relationships between events is essential to our understanding of the world around us. We can make sense of the vector-based approach because we can easily determine which processes are concurrent and which are sequential, and we can make a clear distinction between predecessors and successors. This sort of predictability is the sine qua non of learning and understanding. This is part of the basis of the COSA computing model. Multithreaded applications, by contrast, are hard to understand and maintain precisely because they are temporally non-deterministic. The idea that we can solve the parallel programming crisis by holding on to a flawed and inadequate programming model is preposterous, to say the least.

The World Does Not Need Another CPU

I was happy to read the recent news (see Techlogg’s analysis) that Nvidia has denied rumors that it was planning on developing its own x86 compatible CPU in order to compete against Intel and AMD. Good for them. The last thing the world needs is another x86 CPU, or any CPU for that matter. Nvidia should stick to vector processors because that is where the future of computing is. However, it will have to do something in order to counteract the fierce competition from AMD’s ATI graphics products and Intel’s upcoming Larrabee. The most sensible thing for Nvidia to do, in my opinion, is to transform its base GPU from an SIMD (single instruction, multiple data) vector core into a pure MIMD vector core. This would make it an ideal processor core for Tilera’s TILE64™ as I suggested in my recent article on Tilera. Come to think of it, maybe Nvidia should just acquire Tilera. That would be a superb marriage of complementary technologies, in my opinion.


The CPU is on its deathbed. The doctors are trying their best to keep it alive but they can only postpone the inevitable for a little while longer. Soon it will die from old age but I, for one, will not be shedding any tears. Good riddance! It is not really hard to figure out what will replace it. It is not rocket science. It will take courage and fortitude more than brains. Who will be the standard bearer for the coming computer revolution? Which organization? Which company? Which country will see the writing on the wall and dominate computing for decades to come? Will it be Intel, AMD, Nvidia, or Tilera? Will it be the US, India, China, Germany, France, Spain, Sweden, Singapore, Japan or Taiwan? Who knows? Only time will tell but I sense that it won’t be long now.
Next: The Radical Future of Computing, Part I

Related Articles:

Parallel Computing: Both CPU and GPU Are Doomed
How to Solve the Parallel Programming Crisis
Transforming the TILE64 into a Kick-Ass Parallel Machine
Parallel Computing: Why the Future Is Non-Algorithmic
Why Parallel Programming Is So Hard
Parallel Computing: Why Legacy Is Not such a Big Problem
Parallel Computing, Math and the Curse of the Algorithm