[I think the message in this article on software reliability is so important to society that I will repost it about once a month until it sinks into the collective consciousness of the computer industry.]
Software unreliability is a monumental problem. Toyota's brake pedal troubles are just the tip of the iceberg. Yet the solution is so simple that I am almost tempted to conclude that computer scientists are incompetent. As I showed in my previous post, the usual 'no silver bullet' excuse (Brooks's excuse) for unreliable code is bogus. Contrary to Fred Brooks's claim in his famous No Silver Bullet paper, it is not necessary to enumerate every state of a program to determine its correctness. What matters is the set of conditions or temporal expectations that dictate the program's behavior. Timing is fundamental to the solution. Below, I expand on my thesis by arguing that the computer can in fact automatically discover everything that may go wrong in a complex program, even the things the programmer overlooks. Please read Unreliable Software, Parts I-III before continuing.
Expectations and Abnormalities
Jeff Voas, a software reliability expert and a co-founder of Cigital, once said, "it's the things that you never thought of that get you every time." Voas is not in any hurry to see a solution to the unreliability problem because he would be out of a job if that happened. Still, I agree with him that it is observably true that the human mind cannot think of everything that can go wrong with a complex software system but (and this is my claim) the computer is not so limited. This is because the computer has a certain advantage over the human brain: it can do an exhaustive search of what I call the expectation space of a computer program. The expectation space consists of all the possible decision pathways that might occur within a program as a result of expected events.
A billion mathematicians jumping up and down and foaming at the mouth notwithstanding, software is really all about stimuli and responses, or actions and reactions. That function calculation stuff is just icing on the cake. Consider that every decision (reaction) made by a program in response to a sensed event (a stimulus) implicitly expects a pattern of sequential and/or simultaneous events to have preceded the decision. This expected temporal signature is there even if the programmer is not aware of it. During the testing phase, it is easy for a diagnostic subprogram to determine the patterns that drive decisions within the application under test. It suffices to exercise the application multiple times to determine its full expectation pattern. Once this is known, it is trivial for the subprogram to automatically generate abnormality sensors that activate whenever the expectations are not met. In other words, the system can be made to think of everything even if the programmer is not thorough. Abnormality sensors can be automatically connected to an existing error or alarm component or to a component constructed for that purpose. The system should then be tested under simulated conditions that force the activation of every abnormality sensor in order to determine its robustness under abnormal conditions.
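To make the idea concrete, here is a rough sketch in Python of what such a diagnostic subprogram might look like. It is not COSA code and not a real implementation. The trace format, the function names (`learn_expectations`, `make_sensor`, `generate_sensors`), the `window` parameter and the example event names are all made up for illustration. The sketch also assumes that the recorded test runs exercise every normal pathway, so that anything not seen during testing can be treated as abnormal.

```python
from collections import defaultdict

# Hypothetical trace format: a list of ("event", name) or ("decision", name)
# tuples, in the order they occurred during one test run.

def learn_expectations(traces, window=2):
    """For each decision, collect the event patterns that preceded it."""
    expected = defaultdict(set)
    for trace in traces:
        events = []
        for kind, name in trace:
            if kind == "event":
                events.append(name)
            else:
                # Temporal signature: the last `window` events before the decision.
                expected[name].add(tuple(events[-window:]))
    return dict(expected)

def make_sensor(patterns, window=2):
    """Build an abnormality sensor for one decision."""
    def sensor(recent_events):
        # Fires (returns True) when the observed pattern was never seen in testing.
        return tuple(recent_events[-window:]) not in patterns
    return sensor

def generate_sensors(expected, window=2):
    """Automatically generate one sensor per decision found during testing."""
    return {decision: make_sensor(patterns, window)
            for decision, patterns in expected.items()}

# Two test runs of an imaginary braking component (assumed names).
traces = [
    [("event", "pedal_pressed"), ("event", "speed_nonzero"), ("decision", "apply_brake")],
    [("event", "speed_nonzero"), ("event", "pedal_pressed"), ("decision", "apply_brake")],
]
expected = learn_expectations(traces)
sensors = generate_sensors(expected)

normal = sensors["apply_brake"](["pedal_pressed", "speed_nonzero"])
abnormal = sensors["apply_brake"](["throttle_open", "speed_nonzero"])
```

In this toy run the sensor stays quiet for a pattern it saw during testing and fires for one it never saw. In a real system the sensor's output would be wired to the alarm component rather than returned to a caller.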
Learn to Relax and Love the Complexity
The above will guarantee that a program is 100% reliable within its scope. The only prerequisite to having a diagnostic subprogram like the one I described is that the software model employed must be synchronous and reactive. This ensures rock-solid deterministic program behavior and timely reactions to changes, which are the main strengths of the COSA software model. The consequences of this are enormous for the safety-critical software industry. It means that software developers no longer need to worry about bugs in their programs as a result of complexity. In fact, adding new functionality to a system makes it even more robust and reliable. Why? Because new functionality cannot break the system's existing expectations without triggering an alarm. It must conform to the functionality that is already in place. Expectations are like constraints and the more complex a program is, the more constraints it has. We can make our programs as complex as necessary without incurring a reliability penalty. So there is no longer any reason not to have a completely automated mass transportation or air traffic control system.
This is the part where I step on my soapbox and start yelling. This blog is read every day by academics from various institutions around the world and from research labs in the computer industry. I know, I have the stats. If you are a computer scientist and you fail to act on this information, then you are a gutless coward and an asshole, pardon my French. Society should and probably will hold you personally responsible for the over 40,000 preventable traffic fatalities on U.S. roads alone. You have no excuse, goddammit.
Why the FAA's Next Generation Air Traffic Control System Will Fail
Computer Scientists Created the Parallel Programming Crisis
Parallel Computing: Why the Future Is Synchronous
Parallel Computing: Why the Future Is Reactive
How to Solve the Parallel Programming Crisis
Parallel Computing: The End of the Turing Madness
Half a Century of Crappy Computing
Why Software Is Bad and What We can Do to Fix It
The COSA Operating System