The Silver Bullet
The Billion Dollar Question
The billion (trillion?) dollar question is: What is it about the brain and integrated circuits that makes them so much more reliable in spite of their essential complexity? But even more important, can we emulate it in our software? If the answer is yes, then we have found the silver bullet.
Why Software Is Bad
Algorithmic software is unreliable because of the following reasons:
- Brittleness
An algorithm is not unlike a chain. Break a link and the entire chain is broken. As a result, algorithmic programs tend to suffer from catastrophic failures even in situations where the actual defect is minor and globally insignificant. - Temporal Inconsistency
With algorithmic software it is virtually impossible to guarantee the timing of various processes because the execution times of subroutines vary unpredictably. They vary mainly because of a construct called ‘conditional branching’, a necessary decision mechanism used in instruction sequences. But that is not all. While a subroutine is being executed, the calling program goes into a coma. The use of threads and message passing between threads does somewhat alleviate the problem but the multithreading solution is way too coarse and unwieldy to make a difference in highly complex applications. And besides, a thread is just another algorithm. The inherent temporal uncertainty (from the point of view of the programmer) of algorithmic systems leads to program decisions happening at the wrong time, under the wrong conditions. - Unresolved Dependencies
The biggest contributing factor to unreliability in software has to do with unresolved dependencies. In an algorithmic system, the enforcement of relationships among data items (part of what Brooks defines as the essence of software) is solely the responsibility of the programmer. That is to say, every time a property is changed by a statement or a subroutine, it is up to the programmer to remember to update every other part of the program that is potentially affected by the change. The problem is that relationships can be so numerous and complex that programmers often fail to resolve them all.
Brains and integrated circuits are, by contrast, parallel signal-based systems. Their reliability is due primarily to three reasons:
- Strict Enforcement of Signal Timing through Synchronization
Neurons fire at the right time, under the right temporal conditions. Timing is consistent because of the brain's synchronous architecture (**). A similar argument can be made with regard to integrated circuits. - Distributed Concurrent Architecture
Since every element runs independently and synchronously, the localized malfunctions of a few (or even many) elements will not cause the catastrophic failure of the entire system. - Automatic Resolution of Event Dependencies
A signal-based synchronous system makes it possible to automatically resolve event dependencies. That is to say, every change in a system's variable is immediately and automatically communicated to every object that depends on it.
Programs as Communication Systems
Although we are not accustomed to think of it as such, a computer program is, in reality, a communication system. During execution, every statement or instruction in an algorithmic procedure essentially sends a signal to the next statement, saying: 'I'm done, now it's your turn.' A statement should be seen as an elementary object having a single input and a single output. It waits for an input signal, does something, and then sends an output signal to the next object. Multiple objects are linked together to form a one-dimensional (single path) sequential chain. The problem is that, in an algorithm, communication is limited to only two objects at a time, a sender and a receiver. Consequently, even though there may be forks (conditional branches) along the way, a signal may only take one path at a time.
My thesis is that this mechanism is too restrictive and leads to unreliable software. Why? Because there are occasions when a particular event or action must be communicated to several objects simultaneously. This is known as an event dependency. Algorithmic development environments make it hard to attach orthogonal signaling branches to a sequential thread and therein lies the problem. The burden is on the programmer to remember to add code to handle delayed reaction cases: something that occurred previously in the procedure needs to be addressed at the earliest opportunity by another part of the program. Every so often we either forget to add the necessary code (usually, a call to a subroutine) or we fail to spot the dependency.
Event Dependencies and the Blind Code Problem
The state of a system at any given time is defined by the collection of properties (variables) that comprise the system's data, including the data contained in input/output registers. The relationships or dependencies between properties determine the system's behavior. A dependency simply means that a change in one property (also known as an event) must be followed by a change in one or more related properties. In order to ensure flawless and consistent behavior, it is imperative that all dependencies are resolved during development and are processed in a timely manner during execution. It takes intimate knowledge of an algorithmic program to identify and remember all the dependencies. Due to the large turnover in the software industry, programmers often inherit strange legacy code which aggravates the problem. Still, even good familiarity is not a guarantee that all dependencies will be spotted and correctly handled. Oftentimes, a program is so big and complex that its original authors completely lose sight of old dependencies. Blind code leads to wrong assumptions which often result in unexpected and catastrophic failures. The problem is so pervasive and so hard to fix that most managers in charge of maintaining complex mission-critical software systems will try to find alternative ways around a bug that do not involve modifying the existing code.
Related Articles:
How to Solve the Parallel Programming Crisis
Parallel Computing: The End of the Turing Madness
Parallel Computing: Why the Future Is Synchronous
Parallel Computing: Why the Future Is Reactive
Why Parallel Programming Is So Hard
Parallel Programming, Math and the Curse of the Algorithm
The COSA Saga
More complex UBMs consist of arbitrarily large numbers of actors and environmental variables. This computing model, which I have dubbed the behavioral computing model (BCM), is a radical departure from the TCM. Whereas a UTM is primarily a calculation tool for solving algorithmic problems, a UBM is simply an agent that reacts to one or more environmental stimuli. As seen in the figure below, in order for a UBM to act on and react to its environment, sensors and effectors must be able to communicate with each other.
The main point of this argument is that, even though communication is an essential part of the nature of computing, this is not readily apparent from examining a UTM. Indeed, there are no signaling entities, no signals and no signal pathways on a Turing tape or in computer memory. The reason is that, unlike hardware objects which are directly observable, software entities are virtual and must be logically inferred.
The way it works is as follows. When a signal arrives at the ‘Pause’ terminal, the component goes into a coma and all cell activities in the component are suspended. Internally, the system sets a ‘stop’ flag in all the cells that causes the processor to stop processing them. All cells that were about to be processed in the next cycle are placed in a temporary hold buffer. On reception of a ‘Continue’ signal, the system clears the ‘stop’ flag in all the cells and the cells that were in the temporary hold buffer are transferred into the processor’s input buffer for processing. A signal is emitted to indicate that the component is once again active.
In the figure above, the dotted line means that the + effector is associated with the != sensor. The way it works is that the != comparison operation is performed automatically every time the effector executes its operation. If the assigned change occurs, the sensor emits a signal.
The data assigned to a data connector can be a single variable, a list of variables, a C++ like data structure or an array. In a design environment, double-clicking on a data connector will open up a box showing the type of data that is assigned to it and the effector and sensors assigned to each data item, if any.
Only a supervisor can access the control connector (small red circle) of a slave. The latter cannot access the control connector of its supervisor or that of another slave. The control connector of a supervisor component can only be accessed by an external supervisor component. When a supervisor receives a start or stop signal, it may pass it to the slaves concurrently or in a given sequential order dictated by the design. In an actual development environment, the order in which the components are activated can be shown in slow motion by visually highlighting them in some fashion.
Deactivation
Connectors are great because they enforce plug-compatibility and make drag-and-drop software composition possible, but they don’t do much for program comprehension. The reason is that part of a connector’s purpose is to hide information so as not to overwhelm the user with too much complexity. My philosophy is that information should be delivered on an as-needed basis only. That being said, it would be nice if we could summons a special view of a group of a component in order to see exactly how they interact together.
In a complex program, the supervisor component would normally be extended to control an indefinite number of slave components. Note that, while the figure shows the components involved, it does not tell us how the water level component is controlled. For that, we need a control view. As seen below, the control view is a simple diagram that depicts the manner in which the water level component is controlled by the supervisor. It displays only the control connections; all other connections, if any, are omitted.
Essentially, the water level component emits a signal when it's done. This signal is used to start the delay timer. At the end of the delay, the timer emits a Done signal, which is used to start the water level component and the cycle begins anew.