Tuesday, October 21, 2008

Why Software Is Bad, Part III

Part I, II, III, IV

Why The Experts Are Wrong (continued)

Turing's Monster

It is tempting to speculate that, had it not been for our early infatuation with the sanctity of the TCM, we might not be in the sorry mess we are in today. Software engineers have had to deal with defective software from the very beginning. Computer time was expensive and, as was the practice in the early days, a programmer had to reserve access to a computer days and sometimes weeks in advance. So programmers found themselves spending countless hours meticulously scrutinizing program listings in search of bugs. By the mid-1970s, as software systems grew in complexity and applicability, people in the business began to talk of a reliability crisis. Innovations such as high-level languages and structured and object-oriented programming did little to solve the reliability problem. Turing's baby had quickly grown into a monster.

Vested Interest

Software reliability experts (such as the folks at Cigital) have a vested interest in seeing that the crisis lasts as long as possible. It is their raison d'ĂȘtre. Computer scientists and software engineers love Dr. Brooks' ideas because an insoluble software crisis affords them a well-paying job and a lifetime career as reliability engineers. Not that these folks do not bring worthwhile advances to the table. They do. But looking for a breakthrough solution that will produce Brooks' order-of-magnitude improvement in reliability and productivity is not on their agenda. They adamantly deny that such a breakthrough is even possible. Brooks' paper is their new testament and 'no silver bullet' their mantra. Worst of all, most of them are sincere in their convictions.

This attitude (pathological denial) has the unfortunate effect of prolonging the crisis. Most of the burden of ensuring the reliability of software now rests squarely on the programmer's shoulders. An entire reliability industry has sprouted, with countless experts and tool vendors touting various labor-intensive engineering recipes, theories and practices. But more than thirty years after people began to refer to the problem as a crisis, it is worse than ever. As the Technology Review article points out, the cost has been staggering.

There Is a Silver Bullet After All

Reliability is best understood in terms of complexity vs. defects. A program consisting of one thousand lines of code is generally more complex and less reliable than one with a hundred lines of code. Due to its sheer astronomical complexity, the human brain is the most reliable behaving system in the world. Its reliability is many orders of magnitude greater than that of any complex program in existence (see devil's advocate). Any software application with the complexity of the brain would be so riddled with bugs as to be unusable. Conversely, given software's relatively low complexity, any application with the reliability of the brain would almost never fail.

Imagine how complex it is to recognize someone's face under all sorts of lighting conditions, velocities and orientations. Just driving a car around town (taxi drivers do it all day long, every day) without getting lost or into an accident is vastly more complex than anything any software program in existence can accomplish. Sure, brains make mistakes, but the things they do are so complex, especially the myriad little things we are oblivious to, that the mistakes pale in comparison to the successes. And when they do make mistakes, it is usually for physical reasons (e.g., sickness, intoxication, injury, genetic defects) or because of external circumstances beyond their control (e.g., they did not know). Mistakes are rarely the result of defects in the brain's existing software.

The brain is proof that the reliability of a behaving system (which is what a computer program is) does not have to be inversely proportional to its complexity, as is the case with current software systems. In fact, the more complex the brain gets (as it learns), the more reliable it becomes. But the brain is not the only proof we have of the existence of a silver bullet. We all know of the amazing reliability of integrated circuits. No one can seriously deny that a modern CPU is a very complex device, what with some of the high-end chips from Intel, AMD and others sporting hundreds of millions of transistors. Yet, in all the years that I have owned and used computers, only once did a CPU fail on me, and that was because its cooling fan stopped working. This seems to be the norm with integrated circuits in general: when they fail, it is almost always due to a physical fault and almost never due to a defect in the logic. Moore's law does not seem to have had a deleterious effect on hardware reliability: to my knowledge, the reliability of CPUs and other large-scale integrated circuits has not degraded over the years as they have increased in speed and complexity.
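To put a rough number on that inverse relationship, here is a back-of-the-envelope illustration (my own, not something from Brooks or the article): if a system only works when all of its parts work, and each part independently works with probability r, then reliability decays exponentially with the number of parts.

    # A minimal sketch, assuming a naive "weakest link" model: the system
    # fails if any one of its n parts fails, and each part independently
    # works with probability r. Illustrative only.
    def system_reliability(r: float, n: int) -> float:
        return r ** n

    for n in (100, 1_000, 10_000):
        print(f"{n:>6} parts -> reliability {system_reliability(0.999, n):.3f}")
    #    100 parts -> reliability 0.905
    #   1000 parts -> reliability 0.368
    #  10000 parts -> reliability 0.000  (about 4.5e-5)

Under this toy model, even 99.9%-reliable pieces give you a coin flip at a thousand pieces. The point of the paragraph above is that the brain and the CPU plainly do not obey this curve.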

Deconstructing Brooks' Complexity Arguments

Frederick Brooks' arguments fall apart in one important area. Although Brooks' conclusion is correct as far as the unreliability of complex algorithmic software is concerned, it is correct for the wrong reason. I argue that software programs are unreliable not because they are complex (Brooks' conclusion), but because they are algorithmic in nature. In his paper, Brooks defines two types of complexity, essential and accidental. He writes:


The complexity of software is an essential property, not an accidental one.
According to Brooks, one can control the accidental complexity of software engineering (with the help of compilers, syntax and buffer-overflow checkers, data typing, etc.), but one can do nothing about its essential complexity. Brooks then explains why he thinks this essential complexity leads to unreliability:


From the complexity comes the difficulty of enumerating, much less understanding, all the possible states of the program, and from that comes the unreliability.
This immediately raises several questions: Why must the essential complexity of software automatically lead to unreliability? Why is this not also true of the essential complexity of other types of behaving systems? In other words, is the complexity of a brain or an integrated circuit any less essential than that of a software program? Brooks is mum on these questions, even though he acknowledges in the same paper that the reliability and productivity problem has already been solved in hardware through large-scale integration.

More importantly, notice the specific claim that Brooks is making. He asserts that the unreliability of a program comes from the difficulty of enumerating and/or understanding all the possible states of the program. This is an often-repeated claim in the software engineering community, but it is fallacious nonetheless. It overlooks the fact that it is equally difficult to enumerate all the possible states of a complex hardware system. This is especially true if one considers that most such systems consist of many integrated circuits that interact with one another in very complex ways. Yet, in spite of this difficulty, hardware systems are orders of magnitude more robust than software systems (see the COSA Reliability Principle for more on this subject).
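For a sense of scale (my own illustration, not Brooks'): the state count of even a trivial program explodes combinatorially, and so does that of an ordinary hardware register.

    # A minimal sketch: a program (or circuit) whose state is just n
    # independent boolean flags (or flip-flops) has 2**n distinct states.
    for n in (10, 32, 64):
        print(f"{n} bits of state -> {2**n:,} states")
    # 10 bits of state -> 1,024 states
    # 32 bits of state -> 4,294,967,296 states
    # 64 bits of state -> 18,446,744,073,709,551,616 states

A single 64-bit register already has more states than anyone could ever enumerate, yet the hardware that implements it almost never fails logically. If unenumerable states were the true cause of unreliability, hardware should be just as fragile as software.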

Brooks backs up his assertion with neither logic nor evidence. But even more disturbing, nobody in the ensuing years has bothered to challenge the validity of the claim. Rather, Brooks has been elevated to the status of a demigod in the software engineering community and his ideas on the causes of software unreliability are now bandied about as infallible dogma.

Targeting the Wrong Complexity

Obviously, whether essential or accidental, complexity is not, in and of itself, conducive to unreliability. There is something inherent in the nature of our software that makes it prone to failure, something that has nothing to do with complexity per se. Note that, when Brooks speaks of software, he has a particular type of software in mind:


The essence of a software entity is a construct of interlocking concepts: data sets, relationships among data items, algorithms, and invocations of functions.
By software, Brooks specifically means algorithmic software, the type of software that is coded into every computer in existence. Just like Alan Turing before him, Brooks fails to see past the algorithmic model. He fails to realize that the unreliability of software comes from not understanding the true nature of computing. It has nothing to do with the difficulty of enumerating all the states of a program. In the remainder of this article, I will argue that all the effort in time and money being spent on making software more reliable is being targeted at the wrong complexity, that of algorithmic software. And it is a particularly insidious and intractable form of complexity, one which humanity, fortunately, does not have to live with. Switch to the right complexity and the problem will disappear.
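As a rough teaser for Part IV (my own toy construction, not COSA itself, and all names below are made up for illustration), here is the shape of a non-algorithmic, signal-based program: instead of a sequential instruction stream, the program is a set of cells that react to signals in discrete, synchronous steps.

    # Toy sketch only: a "program" as a network of reactive cells.
    # Each entry maps an incoming signal to the signals its cell emits.
    cells = {
        "button_pressed": ["open_valve"],   # sensor cell
        "open_valve":     ["valve_open"],   # effector cell
        "valve_open":     [],               # terminal signal
    }

    def tick(signals: set[str]) -> set[str]:
        """One synchronous step: every active signal fires its cell at once."""
        out: set[str] = set()
        for s in signals:
            out.update(cells.get(s, []))
        return out

    wave = {"button_pressed"}
    while wave:
        print("tick:", sorted(wave))
        wave = tick(wave)
    # tick: ['button_pressed']
    # tick: ['open_valve']
    # tick: ['valve_open']

The point of the toy is only the shape of the model: every reaction within a step happens "at the same time," the way signals propagate through a logic circuit, which is the hardware analogy this article keeps returning to.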

Next: Part IV, The Silver Bullet

Related Articles:

How to Solve the Parallel Programming Crisis
Parallel Computing: The End of the Turing Madness
Parallel Computing: Why the Future Is Synchronous
Parallel Computing: Why the Future Is Reactive
Why Parallel Programming Is So Hard
Parallel Programming, Math and the Curse of the Algorithm
The COSA Saga

3 comments:

Louis Savain said...

Over the years, I have received a lot of criticism over the above but guess what? After reading it over, I stand by everything I wrote.

Matthew B. Richards said...

I think the best point you made in this article was that Brooks contradicts himself. Can't have it both ways, but people always tend to focus on what they want to believe, rather than what is real.

VicDiesel said...

I've been reading your posts, and while they are intriguing, I think you're doing a much better job of pointing out the failings in the existing approach to complexity (in particular parallelism) than of outlining the solution. But I'll keep reading, and will follow some of your pointers to COSA.

Just a thought for now: our brains may be better than computers in any number of ways, but we don't use them to compute the same sorts of things. Our brains do complicated pattern recognition; our computers solve coupled partial differential equations, for instance to design a supersonic plane.

It is not obvious to me at all that the computations of mathematical physics (which is what the heap big parallel computers mostly do) can be done with brain-like architecture.