Saturday, October 25, 2008

Why Software Is Bad, Part IV

Part I, II, III, IV

The Silver Bullet

The Billion Dollar Question

The billion (trillion?) dollar question is: What is it about the brain and integrated circuits that makes them so much more reliable in spite of their essential complexity? But even more important, can we emulate it in our software? If the answer is yes, then we have found the silver bullet.

Why Software Is Bad

Algorithmic software is unreliable because of the following reasons:

  • Brittleness

    An algorithm is not unlike a chain. Break a link and the entire chain is broken. As a result, algorithmic programs tend to suffer from catastrophic failures even in situations where the actual defect is minor and globally insignificant.

  • Temporal Inconsistency

    With algorithmic software it is virtually impossible to guarantee the timing of various processes because the execution times of subroutines vary unpredictably. They vary mainly because of a construct called ‘conditional branching’, a necessary decision mechanism used in instruction sequences. But that is not all. While a subroutine is being executed, the calling program goes into a coma. The use of threads and message passing between threads does somewhat alleviate the problem but the multithreading solution is way too coarse and unwieldy to make a difference in highly complex applications. And besides, a thread is just another algorithm. The inherent temporal uncertainty (from the point of view of the programmer) of algorithmic systems leads to program decisions happening at the wrong time, under the wrong conditions.

  • Unresolved Dependencies

    The biggest contributing factor to unreliability in software has to do with unresolved dependencies. In an algorithmic system, the enforcement of relationships among data items (part of what Brooks defines as the essence of software) is solely the responsibility of the programmer. That is to say, every time a property is changed by a statement or a subroutine, it is up to the programmer to remember to update every other part of the program that is potentially affected by the change. The problem is that relationships can be so numerous and complex that programmers often fail to resolve them all.
Why Hardware is Good

Brains and integrated circuits are, by contrast, parallel signal-based systems. Their reliability is due primarily to three reasons:
  • Strict Enforcement of Signal Timing through Synchronization

    Neurons fire at the right time, under the right temporal conditions. Timing is consistent because of the brain's synchronous architecture (**). A similar argument can be made with regard to integrated circuits.

  • Distributed Concurrent Architecture

    Since every element runs independently and synchronously, the localized malfunctions of a few (or even many) elements will not cause the catastrophic failure of the entire system.

  • Automatic Resolution of Event Dependencies

    A signal-based synchronous system makes it possible to automatically resolve event dependencies. That is to say, every change in a system's variable is immediately and automatically communicated to every object that depends on it.

Programs as Communication Systems

Although we are not accustomed to think of it as such, a computer program is, in reality, a communication system. During execution, every statement or instruction in an algorithmic procedure essentially sends a signal to the next statement, saying: 'I'm done, now it's your turn.' A statement should be seen as an elementary object having a single input and a single output. It waits for an input signal, does something, and then sends an output signal to the next object. Multiple objects are linked together to form a one-dimensional (single path) sequential chain. The problem is that, in an algorithm, communication is limited to only two objects at a time, a sender and a receiver. Consequently, even though there may be forks (conditional branches) along the way, a signal may only take one path at a time.

My thesis is that this mechanism is too restrictive and leads to unreliable software. Why? Because there are occasions when a particular event or action must be communicated to several objects simultaneously. This is known as an event dependency. Algorithmic development environments make it hard to attach orthogonal signaling branches to a sequential thread and therein lies the problem. The burden is on the programmer to remember to add code to handle delayed reaction cases: something that occurred previously in the procedure needs to be addressed at the earliest opportunity by another part of the program. Every so often we either forget to add the necessary code (usually, a call to a subroutine) or we fail to spot the dependency.

Event Dependencies and the Blind Code Problem

The state of a system at any given time is defined by the collection of properties (variables) that comprise the system's data, including the data contained in input/output registers. The relationships or dependencies between properties determine the system's behavior. A dependency simply means that a change in one property (also known as an event) must be followed by a change in one or more related properties. In order to ensure flawless and consistent behavior, it is imperative that all dependencies are resolved during development and are processed in a timely manner during execution. It takes intimate knowledge of an algorithmic program to identify and remember all the dependencies. Due to the large turnover in the software industry, programmers often inherit strange legacy code which aggravates the problem. Still, even good familiarity is not a guarantee that all dependencies will be spotted and correctly handled. Oftentimes, a program is so big and complex that its original authors completely lose sight of old dependencies. Blind code leads to wrong assumptions which often result in unexpected and catastrophic failures. The problem is so pervasive and so hard to fix that most managers in charge of maintaining complex mission-critical software systems will try to find alternative ways around a bug that do not involve modifying the existing code.

Related Articles:

How to Solve the Parallel Programming Crisis
Parallel Computing: The End of the Turing Madness
Parallel Computing: Why the Future Is Synchronous
Parallel Computing: Why the Future Is Reactive
Why Parallel Programming Is So Hard
Parallel Programming, Math and the Curse of the Algorithm
The COSA Saga

Tuesday, October 21, 2008

Why Software Is Bad, Part III

Part I, II, III, IV

Why The Experts Are Wrong (continued)

Turing's Monster

It is tempting to speculate that, had it not been for our early infatuation with the sanctity of the TCM, we might not be in the sorry mess that we are in today. Software engineers have had to deal with defective software from the very beginning. Computer time was expensive and, as was the practice in the early days, a programmer had to reserve access to a computer days and sometimes weeks in advance. So programmers found themselves spending countless hours meticulously scrutinizing program listings in search of bugs. By the mid 1970s, as software systems grew in complexity and applicability, people in the business began to talk of a reliability crisis. Innovations such as high-level languages, structured and/or object-oriented programming did little to solve the reliability problem. Turing's baby had quickly grown into a monster.

Vested Interest

Software reliability experts (such as the folks at Cigital) have a vested interest in seeing that the crisis lasts as long as possible. It is their raison d'être. Computer scientists and software engineers love Dr. Brooks' ideas because an insoluble software crisis affords them with a well-paying job and a lifetime career as reliability engineers. Not that these folks do not bring worthwhile advances to the table. They do. But looking for a breakthrough solution that will produce Brooks' order-of-magnitude improvement in reliability and productivity is not on their agenda. They adamantly deny that such a breakthrough is even possible. Brooks' paper is their new testament and 'no silver bullet' their mantra. Worst of all, most of them are sincere in their convictions.This attitude (pathological denial) has the unfortunate effect of prolonging the crisis. Most of the burden of ensuring the reliability of software is now resting squarely on the programmer's shoulders. An entire reliability industry has sprouted with countless experts and tool vendors touting various labor-intensive engineering recipes, theories and practices. But more than thirty years after people began to refer to the problem as a crisis, it is worse than ever. As the Technology Review article points out, the cost has been staggering.

There Is a Silver Bullet After All

Reliability is best understood in terms of complexity vs. defects. A program consisting of one thousand lines of code is generally more complex and less reliable than a one with a hundred lines of code. Due to its sheer astronomical complexity, the human brain is the most reliable behaving system in the world. Its reliability is many orders of magnitude greater than that of any complex program in existence (see devil's advocate). Any software application with the complexity of the brain would be so riddled with bugs as to be unusable. Conversely, given their low relative complexity, any software application with the reliability of the brain would almost never fail. Imagine how complex it is to be able to recognize someone's face under all sorts of lighting conditions, velocities and orientations. Just driving a car around town (taxi drivers do it all day long, everyday) without getting lost or into an accident is incredibly more complex than anything any software program in existence can accomplish. Sure brains make mistakes, but the things that they do are so complex, especially the myriads of little things that we are oblivious to, that the mistakes pale in comparison to the successes. And when they do make mistakes, it is usually due to physical reasons (e.g., sickness, intoxication, injuries, genetic defects, etc...) or to external circumstances beyond their control (e.g., they did not know). Mistakes are rarely the result of defects in the brain's existing software.The brain is proof that the reliability of a behaving system (which is what a computer program is) does not have to be inversely proportional to its complexity, as is the case with current software systems. In fact, the more complex the brain gets (as it learns), the more reliable it becomes. But the brain is not the only proof that we have of the existence of a silver bullet. We all know of the amazing reliability of integrated circuits. No one can seriously deny that a modern CPU is a very complex device, what with some of the high-end chips from Intel, AMD and others sporting hundreds of millions of transistors. Yet, in all the years that I have owned and used computers, only once did a CPU fail on me and it was because its cooling fan stopped working. This seems to be the norm with integrated circuits in general: when they fail, it is almost always due to a physical fault and almost never to a defect in the logic. Moore's law does not seem to have had a deleterious effect on hardware reliability since, to my knowledge, the reliability of CPUs and other large scale integrated circuits did not degrade over the years as they increased in speed and complexity.

Deconstructing Brooks' Complexity Arguments

Frederick Brooks' arguments fall apart in one important area. Although Brooks' conclusion is correct as far as the unreliability of complex algorithmic software is concerned, it is correct for the wrong reason. I argue that software programs are unreliable not because they are complex (Brooks' conclusion), but because they are algorithmic in nature. In his paper, Brooks defines two types of complexity, essential and accidental. He writes:

The complexity of software is an essential property, not an accidental one.
According to Brooks, one can control the accidental complexity of software engineering (with the help of compilers, syntax and buffer overflow checkers, data typing, etc...), but one can do nothing about its essential complexity. Brooks then explains why he thinks this essential complexity leads to unreliability:

From the complexity comes the difficulty of enumerating, much less understanding, all the possible states of the program, and from that comes the unreliability.
This immediately begs several questions: Why must the essential complexity of software automatically lead to unreliability? Why is this not also true of the essential complexity of other types of behaving systems? In other words, is the complexity of a brain or an integrated circuit any less essential than that of a software program? Brooks is mum on these questions even though he acknowledges in the same paper that the reliability and productivity problem has already been solved in hardware through large-scale integration.

More importantly, notice the specific claim that Brooks is making. He asserts that the unreliability of a program comes from the difficulty of enumerating and/or understanding all the possible states of the program. This is an often repeated claim in the software engineering community but it is fallacious nonetheless. It overlooks the fact that it is equally difficult to enumerate all the possible states of a complex hardware system. This is especially true if one considers that most such systems consist of many integrated circuits that interact with one another in very complex ways. Yet, in spite of this difficulty, hardware systems are orders of magnitude more robust than software systems (see the COSA Reliability Principle for more on this subject).

Brooks backs up his assertion with neither logic nor evidence. But even more disturbing, nobody in the ensuing years has bothered to challenge the validity of the claim. Rather, Brooks has been elevated to the status of a demigod in the software engineering community and his ideas on the causes of software unreliability are now bandied about as infallible dogma.

Targeting the Wrong Complexity

Obviously, whether essential or accidental, complexity is not, in and of itself, conducive to unreliability. There is something inherent in the nature of our software that makes it prone to failure, something that has nothing to do with complexity per se. Note that, when Brooks speaks of software, he has a particular type of software in mind:

The essence of a software entity is a construct of interlocking concepts: data sets, relationships among data items, algorithms, and invocations of functions.
By software, Brooks specifically means algorithmic software, the type of software which is coded in every computer in existence. Just like Alan Turing before him, Brooks fails to see past the algorithmic model. He fails to realize that the unreliability of software comes from not understanding the true nature of computing. It has nothing to do with the difficulty of enumerating all the states of a program. In the remainder of this article, I will argue that all the effort in time and money being spent on making software more reliable is being targeted at the wrong complexity, that of algorithmic software. And it is a particularly insidious and intractable form of complexity, one which humanity, fortunately, does not have to live with. Switch to the right complexity and the problem will disappear.

Next: Part IV, The Silver Bullet

Related Articles:

How to Solve the Parallel Programming Crisis
Parallel Computing: The End of the Turing Madness
Parallel Computing: Why the Future Is Synchronous
Parallel Computing: Why the Future Is Reactive
Why Parallel Programming Is So Hard
Parallel Programming, Math and the Curse of the Algorithm
The COSA Saga

Sunday, October 19, 2008

Why Software Is Bad, Part II

Turing's Baby

Early computer scientists of the twentieth century were all trained mathematicians. They viewed the computer primarily as a tool with which to solve mathematical problems written in an algorithmic format. Indeed, the very name computer implies the ability to perform a calculation and return a result. Soon after the introduction of electronic computers in the 1950s, scientists fell in love with the ideas of famed British computer and artificial intelligence pioneer, Alan Turing. According to Turing, to be computable, a problem has to be executable on an abstract computer called the universal Turing machine (UTM). As everyone knows, a UTM (an infinitely long tape with a movable read/write head) is the quintessential algorithmic computer, a direct descendent of Lovelace's sequential stored program. It did not take long for the Turing computability model (TCM) to become the de facto religion of the entire computer industry.

A Fly in the Ointment

The UTM is a very powerful abstraction because it is perfectly suited to the automation of all sorts of serial tasks for problem solving. Lovelace and Babbage would have been delighted, but Turing's critics could argue that the UTM, being a sequential computer, cannot be used to simulate [see comments below] real-world problems which require multiple simultaneous computations. Turing's advocates could counter that the UTM is an idealized computer and, as such, can be imagined as having infinite read/write speed. The critics could then point out that, idealized or not, an infinitely fast computer introduces all sorts of logical/temporal headaches since all computations are performed simultaneously, making it unsuitable to inherently sequential problems. As the saying goes, you cannot have your cake and eat it too. At the very least, the TCM should have been extended to include both sequential and concurrent processes. However, having an infinite number of tapes and an infinite number of heads that can move from one tape to another would destroy the purity of the UTM ideal.

The Hidden Nature of Computing

The biggest problem with the UTM is not so much that it cannot be adapted to certain real-world parallel applications but that it hides the true nature of computing. Most students of computer science will recognize that a computer program is, in reality, a behaving machine (BM). That is to say, a program is an automaton that detects changes in its environment and effects changes in it. As such, it belongs in the same class of machines as biological nervous systems and integrated circuits. A basic universal behaving machine (UBM) consists, on the one hand, of a couple of elementary behaving entities (a sensor and an effector) or actors and, on the other, of an environment (a variable).

More complex UBMs consist of arbitrarily large numbers of actors and environmental variables. This computing model, which I have dubbed the behavioral computing model (BCM), is a radical departure from the TCM. Whereas a UTM is primarily a calculation tool for solving algorithmic problems, a UBM is simply an agent that reacts to one or more environmental stimuli. As seen in the figure below, in order for a UBM to act on and react to its environment, sensors and effectors must be able to communicate with each other.
The main point of this argument is that, even though communication is an essential part of the nature of computing, this is not readily apparent from examining a UTM. Indeed, there are no signaling entities, no signals and no signal pathways on a Turing tape or in computer memory. The reason is that, unlike hardware objects which are directly observable, software entities are virtual and must be logically inferred.

Fateful Choice

Unfortunately for the world, it did not occur to early computer scientists that a program is, at its core, a tightly integrated collection of communicating entities interacting with each other and with their environment. As a result, the computer industry had no choice but to embrace a method of software construction that sees the computer simply as a tool for the execution of instruction sequences. The problem with this approach is that it forces the programmer to explicitly identify and resolve a number of critical communication-related issues that, ideally, should have been implicitly and automatically handled at the system level. The TCM is now so ingrained in the collective mind of the software engineering community that most programmers do not even recognize these issues as having anything to do with either communication or behavior. This would not be such a bad thing except that a programmer cannot possibly be relied upon to resolve all the dependencies of a complex software application during a normal development cycle. Worse, given the inherently messy nature of algorithmic software, there is no guarantee that they can be completely resolved. This is true even if one had an unlimited amount of time to work on it. The end result is that software applications become less predictable and less stable as their complexity increases.

Part III: Why the Experts Are Wrong (continued)

Related Articles:

How to Solve the Parallel Programming Crisis
Parallel Computing: The End of the Turing Madness
Parallel Computing: Why the Future Is Non-Algorithmic

Monday, October 13, 2008

Why Software Is Bad, Part I

Part I, II, III, IV


I first wrote Why Software Is Bad and What We Can Do to Fix It back in June 2002. It was my first web article on what I think is wrong with computer programming. The ideas that had led to it had been brewing in my head since 1980. I think that now is a good time to revisit it with a critical eye looking through the lens of parallel programming. The next few posts will consist of chosen extracts from the original piece. I may add comments here and there as I go along.

The 'No Silver Bullet' Syndrome

Not long ago, in an otherwise superb article [pdf] on the software reliability crisis published by MIT Technology Review, the author blamed the problem on everything from bad planning and business decisions to bad programmers. The proposed solution: bring in the lawyers. Not once did the article mention that the computer industry's fundamental approach to software construction might be flawed. The reason for this omission has to do in part with a highly influential paper that was published in 1987 by a now famous computer scientist named Frederick P. Brooks. In the paper, titled "No Silver Bullet--Essence and Accidents of Software Engineering", Dr. Brooks writes:

But, as we look to the horizon of a decade hence, we see no silver bullet. There is no single development, in either technology or in management technique, that by itself promises even one order-of-magnitude improvement in productivity, in reliability, in simplicity.
Not only are there no silver bullets now in view, the very nature of software makes it unlikely that there will be any--no inventions that will do for software productivity, reliability, and simplicity what electronics, transistors, and large-scale integration did for computer hardware.
No other paper in the annals of software engineering has had a more detrimental effect on humanity's efforts to find a solution to the software reliability crisis. Almost single-handedly, it succeeded in convincing the entire software development community that there is no hope in trying to find a solution. It is a rather unfortunate chapter in the history of programming. Untold billions of dollars and even human lives have been and will be wasted as a result.

When Brooks wrote his famous paper, he apparently did not realize that his arguments applied only to algorithmic complexity. Most people in the software engineering community wrongly assume that algorithmic software is the only possible type of software. Non-algorithmic or synchronous reactive software is similar to the signal-based model used in electronic circuits. It is, by its very nature, extremely stable and much easier to manage. This is evident in the amazing reliability of integrated circuits.
Calling in the lawyers and hiring more software experts schooled in an ancient paradigm will not solve the problem. It will only be costlier and, in the end, deadlier. The reason is threefold. First, the complexity and ubiquity of software continue to grow unabated. Second, the threat of lawsuits means that the cost of software development will skyrocket (lawyers, experts and trained engineers do not work for beans). Third, the incremental stopgap measures offered by the experts are not designed to get to the heart of the problem. They are designed to provide short-term relief at the expense of keeping the experts employed. In the meantime, the crisis continues.

Ancient Paradigm

Why ancient paradigm? Because the root cause of the crisis is as old as Lady Ada Lovelace who invented the sequential stored program (or table of instructions) for Charles Babbage's analytical engine around 1842. Built out of gears and rotating shafts, the analytical engine was the first true general-purpose numerical computer, the ancestor of the modern electronic computer. But the idea of using a step-by-step procedure in a machine is at least as old as Jacquard's punched cards, which were used to control the first automated loom in 1801. The Persian mathematician Muhammad ibn Mūsā al-Khwārizmī is credited for having invented the algorithm in 825 AD, as a problem solving method. The word algorithm derives from 'al-Khwārizmī.'

Part II: Why the Experts Are Wrong

Thursday, October 9, 2008

COSA: A New Kind of Programming, Part VI

Part I, II, III, IV, V, VI


In Part V, I came out against the use of messaging for common communication between components and showed my preference for the flexibility of reactive data sensing and data connectors. In this post, I describe a new approach to task prioritization based on a simple extension to the COSA behavior control mechanism.

Thread-Based Prioritization

One of the few advantages of using multithreading is that it makes it possible to prioritize threads. Even though this capability will not matter much with the advent of processors sporting hundreds or thousands of cores, the fact remains that, whenever processing power is at a premium, it makes sense to allocate more of the processor’s cycles to critical functions that must execute in real time. It goes without saying that thread-based priority scheduling wreaks havoc with deterministic timing, which is one of the reasons that the COSA software model does not use threads.

Message-Based Prioritization

Since COSA is not multithreaded, my initial approach was to use a message-based scheme for task prioritization. Lately, however, I have been having serious doubts about the suitability of messaging for inter-component communication. The approach that I originally had in mind would use a prioritized message queue within a client-server context. High priority messages would simply go to the head of the queue. It then occurred to me that a queued client-server approach makes no sense in an inherently parallel environment. Indeed, why have a serial server processing queued requests one at a time when using multiple parallel server clones could scale in performance as the number of processor cores is increased? Not a good idea.

Behavior-Based Prioritization

I have already explained how basic behavior control is done in COSA. While the Stop command can be used at times to save cycles, this is not its main intended function. Its main function is to prevent conflicts among cooperating components. Besides, an improperly timed Stop command will probably result in failure because intermediate results are discarded. I figure that the best way is to introduce a new control mechanism that can temporarily pause a component. I call it the Pause Effector. Its purpose is to give the system the ability to alleviate the processor’s load so that the more time-critical tasks can be processed in real time. The caveat is that any signal received by the component during its comatose state will be ignored.
The way it works is as follows. When a signal arrives at the ‘Pause’ terminal, the component goes into a coma and all cell activities in the component are suspended. Internally, the system sets a ‘stop’ flag in all the cells that causes the processor to stop processing them. All cells that were about to be processed in the next cycle are placed in a temporary hold buffer. On reception of a ‘Continue’ signal, the system clears the ‘stop’ flag in all the cells and the cells that were in the temporary hold buffer are transferred into the processor’s input buffer for processing. A signal is emitted to indicate that the component is once again active.

Load Manager

At this point, I believe that the Pause effector should not be directly accessible by user applications. Every COSA operating system will have a Load Manager whose job it is to manage processor load according to every component’s load requirement. My current thinking is that only the Load Manager should control the Pause Effector but this may change if a good enough reason is brought to my attention. Again, I will say that I don’t think that task prioritization will be needed in the future with the advent of processors with hundreds and even thousands of cores.

In a future article, I will introduce the COSA Connection Manager, a special COSA system component that makes it possible for a component to modify its programming on the fly. This is essential for certain adaptive applications like neural networks and other machine learning programs.

Related Articles:

How to Solve the Parallel Programming Crisis
Parallel Computing: Why the Future Is Compositional

Monday, October 6, 2008

COSA: A New Kind of Programming, Part V

Part I, II, III, IV, V, VI


In Part IV, I described the overall control architecture used in the COSA programming model and what happens to a component when it receives a ‘Start’ or ‘Stop’ signal. In this installment, I explain why simple reactive data connectors are better than message connectors.

The Problem with Messages

Whenever two or more components cooperate on performing a common task, they must not only be attentive to changes in their own immediate environment, but also, to what the other components are doing. When I first devised the COSA model, I had assumed that sending messages was the best way for components to communicate with one another. I have since changed my mind.

The problem with sending messages is that the message sender has no way of knowing when the message should be sent and, as a result, the receiver has no choice but to query the sender for info. This complicates things because it requires the use of a two-way communication mechanism involving two message connectors or a two-line multi-connector. Alternatively, the sender is forced to send messages all the time whether or not the receiver needs it. I now think that messaging should be used strictly in conjunction with a messenger component dedicated to that purpose. I am also slowly coming to the conclusion that messaging should be used only in a distributed environment between components that reside on separate machines. This means that the COSA pages, in particular, the operating system and software composition pages, are in dire need of a revision. I will expand on this in a future post.

Reactive Data Sensing

The communication method that I prefer is reactive data sensing in which components simply watch one another’s actions within a shared data environment. This way, sharers can immediately sense any change in relevant data and react to it if desired. Since it is up to the receiving components to decide whether or not a change is relevant to them, it makes future extensions easier to implement. Sure, you can have restrictions such as which components have permission to effect changes but the component that owns the data should not impose any restriction on when or whether those changes are detected and used by others. Reactive data sensing is a simple matter of creating sensors (comparison operators) and associating them to relevant effectors. Effector-sensor association is done automatically in COSA.
In the figure above, the dotted line means that the + effector is associated with the != sensor. The way it works is that the != comparison operation is performed automatically every time the effector executes its operation. If the assigned change occurs, the sensor emits a signal.

Control Timing

Reactive sensing makes sense within the context of behavior control. The primary reasons for stopping or deactivating a component are that a) its services are either not needed at certain times (deactivation saves cycles) or b) they would conflict with the work of another component (this prevents errors). The main job of a COSA Supervisor is to precisely time the activation and deactivation of various components under its charge so as to ensure smooth cooperation while eliminating conflicts and improving system performance.

Data Connectors

A data connector is just mechanism for sharing data. The data must reside in only one component, the owner or server. The latter has read/write permission on its own data. A client component that is attached to a data owner via a data connector has only read permission by default. It is up to the component designer to specify whether client components can have write permission as well. The figure below illustrates the use of both data and signal connectors. To the right is the color convention that I currently use for the connectors. Note that control connectors are signal connectors.
The data assigned to a data connector can be a single variable, a list of variables, a C++ like data structure or an array. In a design environment, double-clicking on a data connector will open up a box showing the type of data that is assigned to it and the effector and sensors assigned to each data item, if any.

In Part VI, I will introduce a new COSA high-level behavior control mechanism with two complementary commands, ‘Pause’ and ‘Continue’.

Related Articles:

How to Solve the Parallel Programming Crisis
Parallel Computing: Why the Future Is Compositional

Saturday, October 4, 2008

COSA: A New Kind of Programming, Part IV

Part I, II, III, IV, V, VI

In Part III, I introduced the concept of the control effector, a high-level behavior control mechanism that makes it possible to control a COSA component as if it were a low-level effector. In this post, I describe the overall control architecture used in the COSA programming model and I explain what happens internally to a component under control.

Control Architecture

None of this is chiseled in stone but the way I envisioned it, every high-level component should contain a single master supervisor in charge of one or more slave components. The only components that do not contain a supervisor are low-level components, i.e., components that are composed of cells. It goes without saying that a supervisor is a low-level component. (See Two Types of Components). In the figure below, a component is shown with its supervisor and five slave components. Note that only the control connections are shown for clarity. Normally, the slave components will make data connections with one another and some of the connectors may be visible outside the component. Every one of the slave components may have internal supervisors of their own, if necessary.

Only a supervisor can access the control connector (small red circle) of a slave. The latter cannot access the control connector of its supervisor or that of another slave. The control connector of a supervisor component can only be accessed by an external supervisor component. When a supervisor receives a start or stop signal, it may pass it to the slaves concurrently or in a given sequential order dictated by the design. In an actual development environment, the order in which the components are activated can be shown in slow motion by visually highlighting them in some fashion.

Control Effector

A control effector is a special COSA cell that is used to activate and/or deactivate a low-level component. In a previous illustration (reproduced below) I showed a control effector connected to its 3-input multi-connector. That is the default configuration. This is not a requirement, however. It is up to the designer to decide what to do with the start and stop signals. For example, the component may need to initialize or reset certain variables after starting and before stopping. Or it may do nothing other than outputting a ‘Done’ signal when it receives a ‘Stop’ signal (by routing the ‘Stop’ signal to the ‘Done’ terminal). It also has the option of stopping itself for whatever reason by sending a 'Stop' signal to its control effector.

Stopping a component means to deactivate it so that it can no longer process signals. Deactivating a component is a simple matter of clearing a special activation flag in every cell of the component. This causes the processor to ignore them. A component is fully deactivated only if its control effector receives a 'Stop' signal.


Activating or starting a component is a little more complicated than just resetting the activation flags. Recall that, unlike conventional programming models, COSA is a change-based or reactive model that uses reactive sensing. That is to say, in a COSA program, a comparison operation (i.e., sensor) is not explicitly executed by the program but is invoked automatically whenever there is a change in a data variable that may potentially affect the comparison (See Effector-Sensor Associations). Being a change detector, a sensor must compare the current state of its assigned variable with its previous state. It fires only when there is a correct change. For example, a non-zero sensor fires only if its variable changes from zero to non-zero. The problem is that, when a component is activated, its sensors have no idea what the previous states were. The solution is to set all sensors to their default states upon activation and invoke them immediately afterwards. This way, when a component is activated, all of its sensors perform their assigned tests or comparisons on their assigned data and fire if necessary. A component is fully activated when its control effector receives a 'Start' signal.

In Part V, I will describe reactive data connectors and explain why they are preferable to active message passing.

Related Articles:

How to Solve the Parallel Programming Crisis
Parallel Computing: Why the Future Is Compositional

Thursday, October 2, 2008

COSA: A New Kind of Programming, Part III

Part I, II, III, IV, V, VI


In Part II, I showed that behavior control in the COSA programming model is based on a simple master/slave mechanism that uses complementary start/stop control commands. In this post, I reveal how the same method can be used to control high-level components as well.

Component As Effector

Controlling a component simply means that the component can be seen, for all intents and purposes, as a low-level COSA effector. Every component will have a special control/effector cell connected to a multi-connector with three connections: two input connections for the start and stop signals and one output connection for the ‘done’ signal that is emitted when the component is finished with its task. The control effector can be used both internally and externally.

Connectors are great because they enforce plug-compatibility and make drag-and-drop software composition possible, but they don’t do much for program comprehension. The reason is that part of a connector’s purpose is to hide information so as not to overwhelm the user with too much complexity. My philosophy is that information should be delivered on an as-needed basis only. That being said, it would be nice if we could summons a special view of a group of a component in order to see exactly how they interact together.

An Example

Let’s say we have a water level component that is used to control a pump that maintains water in a tank at a certain level. Suppose we don’t want the pump to be turned on too often for maintenance or costs reasons. To do that, we can use a generic real-time timer cell to wait, say, 30 minutes between activations. We could incorporate the timer cell directly into the water level component but the latter would no longer be a generic component. A better alternative is to create a separate supervisor component that uses the timer cell to control the activation of the water level component. The figure below shows the two components, as they would normally appear.
In a complex program, the supervisor component would normally be extended to control an indefinite number of slave components. Note that, while the figure shows the components involved, it does not tell us how the water level component is controlled. For that, we need a control view. As seen below, the control view is a simple diagram that depicts the manner in which the water level component is controlled by the supervisor. It displays only the control connections; all other connections, if any, are omitted.
Essentially, the water level component emits a signal when it's done. This signal is used to start the delay timer. At the end of the delay, the timer emits a Done signal, which is used to start the water level component and the cycle begins anew.

In Part IV, I will describe the overall control architecture of a COSA component and explain what happens internally to a component when it receives a start or a stop signal.

Related articles:

How to Solve the Parallel Programming Crisis
Parallel Computing: Why the Future Is Compositional