Sunday, September 30, 2007

Korea Advanced Institute Of Science And Technology

All right. You guys at KAIST (Daejeon, Seoul) have been hitting the Silver Bullet and Project COSA pages almost daily for the last few weeks. KAIST has a solid reputation for top-notch research and education in Asia and the rest of the world. What are you working on that involves Project COSA? Is this a class assignment or what? Talk to me.

Saturday, September 29, 2007

Functional Programmers Encourage Crappy Parallel Computing

All Functional Programming Articles

The Computer Is a Behaving Machine, Not a Calculator

The idea of a computer as a calculating machine has a long history. Ever since the Persian mathematician Muhammad ibn Mūsā al-Khwārizmī invented the algorithm (the word algorithm derives from 'al-Khwārizmī) in 825 AD as a problem solving method, people have dreamt of creating machines that could perform long and tedious calculation sequences automatically. Charles Babbage was the first to design and build (partially) such a machine. Babbage’s friend and mathematician, Lady Ada Lovelace, became the world’s first programmer for having written the first computer program (table of instructions) for Babbage’s analytical engine. I have nothing against mathematicians wanting to build function calculators. But I do have a problem with mathematicians wanting to force the world to adopt their antiquated concept of what a general purpose computer should be, especially in the 21st century. The computer is not a calculator, for crying out loud. It is a behaving machine. Calculating is just one of the many types of behaviors that computers can perform.

A Computing Model for the 21st Century

The idea of a computer as a machine for calculating sequences of operations (algorithms or threads) is fundamentally wrong. It is the primary reason that the computer industry is in the mess that it is: software is buggy, expensive and hard to develop. The problem with software will disappear only when the computer industry shakes its addiction to the algorithmic software model and wakes up to the fact that a computer is not a calculator. A computer program should be seen as a collection of elementary parallel, synchronous, reactive, behaving entities that use signals to communicate with each other. The algorithmic model has served us well in the last century but now that the industry is transitioning from sequential computing to massive parallelism, it is time to switch to a new model, one that is worthy of the 21st century.

Thread Monkeys

If you think that functional languages like Erlang or Haskell should be used for concurrent programming, you are a thread monkey. You are a hindrance to progress in computer technology. Why? Because you are encouraging multicore CPU manufacturers like Intel, AMD, IBM, Sun Microsystems, Freescale Semiconductor, Tilera and others, to continue to make multicore CPUs that support coarse-grain, thread-based parallelism at a time when they should be trying to build fine-grain, auto-scalable and self-balancing multicore CPUs. You people are promoting what I have been calling, the age of crappy concurrency.

Inadequacy of Functional Programming

The market wants super fast multicore CPUs and operating systems that support fine-grain parallel computing. It wants systems that are bug-free and easy to program. Do functional languages provide what the market wants? Not even close, in my opinion. Sure, they are a better choice for thread-based parallelism than imperative languages but, as I explained in a previous article, they lack something that is essential to reliable software and that is the ability to automatically find and resolve data dependencies, a must for reliability. This feature can only be implemented in a purely reactive, deterministic and synchronous system. And please, don’t give me the crap about Erlang being great for writing reliable concurrent programs. The whole idea behind concurrency in Erlang, as stated by Joe Armstrong himself, is to provide a mechanism for fault tolerance. Fault tolerance assumes unreliability. It does not prevent it. FP is inadequate for safety-critical applications for this reason.

In conclusion, my advice to FP fanatics is to promote functional languages for what they do best, solving math functions. Do not advertise FP as the solution to the parallel programming problem. It is not. Crappy parallelism, yes. But true fine-grain parallelism, I think not.

Thursday, September 27, 2007

How to Make Computer Geeks Obsolete: The Future of Software Design

Charles Simonyi

I just finished reading a very interesting article over at MIT Technology Review about former Microsoft programming guru and billionaire, Charles Simonyi. Essentially, Simonyi, much like everyone else in the computer business with a head on their shoulders, realized that there is something fundamentally wrong with the way we construct software. So, while working at Microsoft, he came up with a new approach called intentional programming to attack the problem. Seeing that his bosses at Microsoft were not entirely impressed, Simonyi quit his position and founded his own company, Intentional Software Corporation, to develop and market the idea. It’s been a while, though. I am not entirely sure what’s holding things up at Intentional but methinks they may have run into a brick wall and, knowing what I know about Simonyi’s style, he is probably doing some deconstruction and reconstruction.

Sorry, Charlie, Geeks Love the Dark Ages

There is a lot of secrecy surrounding the project but, in my opinion, Simonyi and the folks at Intentional will have to come around to the conclusion that the solution will involve the use of graphical tools. At this week’s Emerging Technology Conference at MIT, Simonyi tried to convince programmers to leave the Dark Ages (LOL), as he put it. His idea is to bring the business people (i.e., the domain experts) into software development. I applaud Simonyi’s courage but my question to him is this, if your goal is to turn domain experts into developers, why give a talk at a techie conference? The last thing a computer geek wants to hear is that he or she may no longer be needed. In fact, based on my own personal experience, the geeks will fight Simonyi every step of the way on this issue. Ironically enough, geeks are the new luddites of the automation age. Unfortunately for the geeks but fortunately for Simonyi, he is not exactly looking for venture capital. With about a billion dollars in his piggy bank, a mega-yacht in the bay and Martha Stewart at his side, the man can pretty much do as he pleases.

The Future of Software Development

In my opinion, Simonyi does not go far enough. In his picture of the future of software development, he sees the domain expert continuing to work side by side with the programmer. In my picture, by contrast, I see only the domain expert gesturing in front of one of Jeff Han’s multi-touch screens and speaking into a microphone. The programmer is nowhere to be seen. How can this be? Well, the whole idea of automation is to make previous expertise obsolete so as to save time and money, right? Programmers will have joined blacksmiths and keypunch operators as the newest victims of the automation age. Sorry. I am just telling it like I see it. But don't feel bad if you're a programmer because, eventually, with the advent of true AI, even the domain expert will disappear from the picture.

Intentional Design vs. Intentional Programming

The way I see it, future software development will be strictly about design and composition. Forget programming. I see a software application as a collection of concurrent, elementary behaving entities organized into plug-compatible modules that communicate via message connectors. Modules are like pieces in a giant picture puzzle. The main difference is that modules are intelligent: they know how to connect to one another. For example, let’s say you are standing in front of your beautiful new multi-touch screen and you are composing a new business application. Suppose you get to a point where you have some floating point data that you want the program to display as a bar graph. You simply say “give me bar graph display module” into the microphone. Problem is, there are all sorts of bar graph display modules available and the computer displays them all on the right side of the screen. No worry. You simply grab all of them with your right hand and throw them into your app space like confetti driven by the wind. And, lo and behold, the one that is compatible with your data magically and automatically connects itself to your app and voila! You smile and say “clean up!” and all the incompatible modules disappear, as if by magic. You suddenly remember Tom Cruise’s character, John Anderton, in the movie Minority Report and you can barely keep from laughing. Creating software is so much fun! This tiny glimpse of the future of software development is brought to you by Project COSA.

In conclusion, my advice to Charles Simonyi is to start thinking in terms of reactive, plug-compatible parallel objects and to get somebody like Jeff Han on board. Also, stop trying to convince the geeks.

Wednesday, September 26, 2007

COSA and Macromedia Director MX 2004™

Religious Fervor

My motivation in getting the COSA software model adopted by the computer industry is due mainly to my interest in artificial intelligence. We are going to need fast, reliable, easy to program and powerful parallel computers for the coming AI revolution. The current crop of multicore CPUs leaves a lot to be desired, in my opinion. We need auto-scalable, self-balancing CPUs that support fine-grain (no more coarse-grain, thread-based CPUs, please) parallelism using an MIMD execution model. I had always thought that my arguments in favor of adopting a non-algorithmic software model (and a new CPU architecture to support the new model) would be enough to galvanize interest in the developer community. I was wrong. Sure, a handful of people write to tell me that they agree with me but the groundswell of enthusiasm that I had hoped for did not materialize. It turns out that I had badly underestimated the religious fervor of the attachment that computer geeks feel vis-à-vis the current paradigm, especially vis-à-vis their favorite programming languages.

I Don't Like Computer Geeks (LOL)

This blog and Project COSA do generate a lot of interest from around the world but as soon as the loud atheist majority within the computer geek community finds out about my religious beliefs (in this regard, my position is simple: if my religious beliefs bother you, then don't read my blog, goddamnit; it's not meant for you) and about my views on the rampant crackpottery that I see in the physics community (“chicken feather voodoo physics” is one my favorite putdowns, LOL), they use it to attack me personally and brand me as some sort of religious nut or a crank. Not that I care, mind you (the assholes do not put food on my table, thank God), but it only serves to slow the progress of Project COSA. Many of my readers write to advise me that I should write a prototype COSA Editor but, frankly, I am too busy with my AI project to spend much time writing code in order to convince a bunch of computer geeks of the soundness of the COSA model. First, I don’t think that will do it and second, I have a low opinion of computer geeks in general. I want clear thinking software and hardware professionals on my side, not a bunch of know-it-all grownup nerds who get all excited about Star-Trek physics crap like multiple universes, brain uploads and time travel through wormholes.

Macromedia Director

With this in mind, I am investigating whether or not Adobe’s Macromedia Director is a good tool with which to quickly develop a COSA Development Studio (CDS). My idea is to use Director to build the user interface and app generator and use C++ or C# to create a small and fast COSA kernal/virtual machine to run the apps. Again, I can’t spend much time on this. I am hoping that I'll be able to start the project and then release it so that others can continue to work on it. I’m going to play with Director for a few days. I’ll let you know what I think. Later.

Monday, September 24, 2007

Why Functional Languages Should Not Be Used for Safety-Critical Applications

All Functional Programming Articles

I Actually Like Functional Languages But With a Caveat

This may come as a surprise to my enemies in the functional programming community but I happen to like functional languages. Their power of expression cannot be denied. I especially like Mathematica, if for nothing else than its beauty alone. My bone of contention with FP proponents is not that FP is not useful, but that it should not be used for complex, real-time, safety-critical and automation-type systems. I believe that FP should be used mainly for solving mathematical functions for which it is ideally suited. I have used words like ‘crap’ and ‘abomination’ to characterize FP but that’s mostly for effect. It bothers me that FP proponents are championing functional languages (especially Erlang) as the solution for the parallel programming problem. It is not. Erlang is encouraging coarse-grain, inherently non-deterministic, thread-based parallelism at a time when the computer industry should be shooting for fine-grain deterministic parallelism, an essential requirement for safe and reliable software systems. I also want to draw attention to the fact that FP is not a software model. A functional language is mainly a specialized problem-solving tool that sits on top of an existing model which happens to be the algorithmic software model. As such, it inherits the principal drawback of algorithms, unreliability.

FP and Safety-Critical Applications

There is a real danger in thinking that FP even comes close to being a silver bullet for the software reliability problem. If it was, there would be no reason to advertise Erlang as a language for the construction of fault-tolerant concurrent software systems. Fault tolerance subsumes the likely presence of hidden faults, does it not? Note that I am not saying that there is anything wrong with fault tolerance, as long as we are talking about hardware faults. My problem is with software faults. They bother me. They do because I, unlike Joe Armstrong, the main inventor of Erlang, don’t think they are inevitable. The pundits may insist over and over that unreliability is an inherent characteristic of complex software systems but, from my standpoint, they are referring only to algorithmic software. There is no doubt in my mind that, should the computer industry decide to abandon its long sordid love affair with the algorithmic model and settles down with a nice, non-algorithmic, synchronous reactive model instead, the reliability and productivity crisis would simply vanish.

Finding and Resolving Data Dependencies

The irony of it all is that the very thing that FP gurus are railing against (the use of variables) turns out to be its Achilles’ heel. I understand their rationale. The use of variables introduces unwanted side effects that invariably lead to software failure. However, as I argued in my previous article, getting rid of variables is not the answer. Besides, FP does not really get rid of variables, it replaces them with functions and these keep their changing values on the stack. It just so happens that traditional variables are part of the solution to a vexing problem in software systems. I am talking about the discovery and resolution of data dependencies. Traditionally, this is done by the programmer but, and this is my main point, it does not have to be. It makes no difference whether or not one is using declarative or imperative languages, it is easy to miss dependencies in a complex software system. This is especially true if the programmer is in charge of maintaining a legacy system that he or she is not familiar with. Any addition or modification can potentially introduce an unwanted side effect.

Using Variables to Automatically Find and Resolve Data Dependencies

A data dependency exists whenever one part of a program depends on data changes occurring in another part. Keeping dependent objects up-to-date with the latest changes must be done in a timely fashion. For example, the movements of the mouse or the clicking of a button must be communicated immediately to every target program or module that needs them. What is nice about this is that we can run a simulation program that generates mouse clicks and movements without having to modify the target programs. This is an example of dependencies being resolved automatically in a reactive manner. Let me take it a bit further. Any time you write code to perform a comparison operation on a data item, a new data dependency is born. If you add new code that modifies the data but you forget to invoke the comparison operation in a timely manner, you have a situation where a part of the program is unaware of the change and continues to operate under a false assumption. Failure is bound to ensue. Like I said, this is a major problem when maintaining complex legacy systems. There is only one way to solve the problem and that is to adopt a non-algorithmic, reactive, synchronous software model and allow the use of variables. The idea is to associate every effector (operator) that changes a variable with every sensor (comparator) that may be affected by the change. This can be done automatically in a reactive system, eliminating the problem altogether. You can read the fine details elsewhere.

The FP Data Dependency Problem

The problem is that FP forbids the use of variables in functions and FP systems are not reactive. So the question is, how can FP automatically resolve data dependencies? Answer, it cannot. This is the reason, in my opinion, that FP is not suited for safety and mission-critical applications like avionics, air traffic control, power plant control, medical and financial systems, etc… In other words, use it at your own risk.

Sunday, September 23, 2007

Functional Programming Is Worse Than Crap: State Changes Are Essential to Reliability

Geekism Is a Menace to Society

All Functional Programming Articles

(Skip this rant if you’re only interested in my criticism of FP)

I had assumed that functional programming was a mere harmless sideshow conducted by a bunch of math nerds pretty much for their own benefit and entertainment. Now that I had a little bit more time to think about it and after discovering (from doing a few searches on Google) that FP fanatics have been frantically promoting FP for use in safety-critical applications, I must say that I am becoming rather alarmed. Those geeks have found themselves a new religion to latch onto and have taken to proselytizing with a passion akin to that of Catholic missionaries in the days after Columbus stepped foot in the new world. They are convinced that they have found the holy grail of computing and will not consider any argument to the contrary. Any criticism is seen as a threat to the religion. In a sense, this is understandable since the religion has become their bread and butter.

In my opinion, computing has become too much a vital part of our daily lives to be entrusted entirely to a bunch of nerds whose main goal in life is to convince others of the superiority of their gray matter. Geekism is a threat to national security and society at large because it aims to create an elite class of individuals who consider themselves above public scrutiny and even above scrutiny from the government. In fact, they look down on the general public, not unlike the high priests of ancient religions. There is a need for checks and balances.

The Jihad Against States and State Changes

One of the stated goals of FP is that side effects between different parts of a program should be eliminated because they are inherently harmful. Never mind, for now, that we are already in crackpot territory (see below) but the FP fanatic’s prescription for avoiding side effects is a little strange, to say the least. They want to eliminate variables (changeable states) from programming altogether. And how do they propose to do this amazing feat, pray tell, given that computing is all about sensing and effecting changes? Well, never underestimate a geek’s capacity for self-deception. Their bright idea is that, whenever a variable must be changed, a new variable should be initialized and used in the place of the old one. Huh? Now, hold on a second. Run that by me one more time, por favor. How does creating a new variable to replace the old one not constitute a state change, pray tell? Aren’t these newly created immutable variables used as arguments to other functions so as to generate a new effect? When a function uses a new set of arguments that is not equal in value to the previous set, does that not constitute a change of state? And how is that any different from changing the value of the previous set?

Crackpot Territory

What most FP theorists fail to explain is that, in FP, the function itself is the variable. The variable value of functions are kept on the stack and are used as arguments for other functions. One function affects another. Insisting that there are no variables and thus no side effects in FP is wishful thinking at best and crackpottery at worst. The side effects are obvious to anybody who is willing to look. Certainly, you may claim that there is no side effects within a function but so what? It remains that one part of a program will affect another and that is unavoidable. Dependencies are a fact of life in software. The point I am driving at is that FP harbors a fundamental flaw in the way it deals with dependencies. Let me explain.

It often happens that a programmer is given the job of adding new functionality to a legacy program. It may happen that the programmer creates a new function to operate on a list but forgets to update an existing function with the new list? Worse, he or she may not even be aware of the existence of the other function or the need to update it. This is what I call a hidden side effect or blind code. That is to say, code that is unaware of changes that are relevant to itself and the proper functioning of the program. Publish and subscribe is not the answer. First, the programmer has to remember to subscribe and second, if the programmer is not familiar with the code, he or she may have no idea that a new subscription may be needed. What is needed is a software system that automatically discovers and resolves all data dependencies in a program. And, unfortunately for the FP lobby, this little bit of magic can only be performed with the use of shared mutable variables.

Why I Came to Love State Changes and Their Side Effects

Changes and effects are unavoidable characteristics of software. This is a given. The way I see it, the problem with traditional imperative languages is not that they allow side effects from state changes but that they are neither reactive nor deterministic. There are good side effects and bad side effects. A bad side effect is one that is either not propagated at the right time or not propagated at all, thus leading to false assumptions. A good side effect is one that is propagated immediately to every part of the program that is dependent on or affected by the change. A good side effect is one that cannot be ignored by the software designer because the system will not allow it. If the effect is unwanted, this is immediately obvious and the designer must take steps to correct the problem. The automatic discovery and resolution of state (data) dependencies is a must for reliable software. It can only be done in a purely reactive and deterministic system that uses named variables. For more on this issue, see how COSA handles blind code elimination.

Temporal determinism is essential to software reliability. It means that the execution order (concurrent or sequential) of reactions to changes is guaranteed to remain the same throughout the lifetime of the system. Deterministic control over the precise timing of changes is the only way to prevent deadlocks and unwanted side effects in a parallel system that allows the use of shared variables. There is more to it than that, though. Since the execution order of changes is expected to be stable, any violation should be seen as a potential defect. Timing watchdogs can be inserted in various places within a deterministic program in order to sound an alarm in the event of a violation. Deterministic timing can be used for security purposes but it primary use is for insuring reliability. It forces the designer to investigate any change to the timing of previously created software modules. Furthermore, a timing change may introduce a motor conflict, i.e., a situation where a data object is modified by multiple operators simultaneously. Motor conflicts are temporal in nature and are detectable only in deterministic system.

Why Functional Programming Is an Abomination

FP fanatics love to boast that FP is perfectly suited for parallel computers because, unlike threads in conventional programs, pure functions are free of side effects. By assigning functions to threads and using message passing between threads, one can create a thread-safe concurrent system. This is all fine and dandy but what the FP crowd conveniently forget to mention is that there are major drawbacks to using FP in a parallel environment.

  1. FP forces the use of a particular type of parallelism, thread-based parallelism. In other words, if your computer uses fine-grain parallelism (this is the future of computing, you can bet on it), FP will not support it because functions are inherently algorithmic. So if you want to implement a fast, fine-grain, parallel QuickSort routine, you might as well forget using FP. The widespread adoption of FP will only result in encouraging what I call the age of crappy concurrency.
  2. FP programs are notoriously wasteful of CPU cycles because they do not allow the use of shared variables or data. A collection of data can only be modified by one thread at a time. The entire collection must be copied onto a message queue and sent to another function/thread. FP proponents are afraid of side effects because they don't know how to handle them safely.
  3. The structure of FP is such that an automatic mechanism for the discovery and resolution of data dependencies cannot be implemented. The reason is that such a mechanism is only possible in a system that uses variables. This means that additions to legacy FP programs can introduce hidden dependency problems that are potentially catastrophic. FP is not suited for safety-critical applications for this reason.
  4. FP is non-deterministic. This is a serious flaw, in my opinion, because the precise execution order of changes, a must for reliable software, is not guaranteed. This is especially true in a thread-based parallel environment.

Invasion of the Mind Snatchers

(Another rant)

People ask me why I use pejorative qualifiers like “crap” and “geeks” in my arguments. The answer is simple. I use them for psychological effect. Computer geeks are notoriously political animals who cannot fathom that others may look down on their supposed intellectual superiority. Nothing infuriates them more than a total lack of respect. It’s like throwing holy water on vampire. But unlike a vampire, they want the minds and admiration of the populace, not their blood. Realize that I have nothing to hide. I am engaged in a battle to get the computer industry to adopt what I believe to be the correct approach to computing. It’s an uphill battle because I am asking for nothing less than the reinvention of the computer, both from a hardware and a software point of view. My main opposition comes from computer geeks. Why do they oppose me? Because their livelihoods are at stake, that’s why. If my approach is adopted, they become irrelevant. And the more famous a computer geek is, the more he feels threatened by any paradigm that may supplant the one he or she is championing. They’ll defend their turf with everything they got. I, on the other hand, am the fearless and relentless barbarian at the gate, the infidel climbing the makeshift ladder, threatening to overthrow the ramparts. My advice to the geeks is, give up now and join me before it's too late. It’s only a matter of time before the walls come crumbling down. I am in top shape and I've just begun to fight. ahahaha...

Next: Why Functional Languages Should Not Be Used for Safety-Critical Applications

Thursday, September 20, 2007

Why I Think Functional Programming Languages Like Erlang and Haskell are Crap

All Multicore and Parallel Programming Articles
All Erlang Related Articles

Computing Is About Behaving, Not Calculating

The main problem with functional programming is that it was created by mathematicians for mathematicians. In the minds of mathematicians, computation is synonymous with function evaluation. They see the computer strictly as a glorified calculator, a machine for the evaluation of math problems. My position, by contrast, is that computing is not about function evaluation at all but about behavior. In other words, a computer is a behaving machine. Let me explain the difference.

Hidden Variables and Side Effects

Functional programming proponents love to talk about how their style of programming avoids states and state changes. What they fail to tell you is that functions are high-level constructs and that the compiler creates, initializes and manipulates hidden states internally on your behalf. For example, here’s a C function that calculates the sum of integers from 1 to 10:

sum = 0;
for (i=1; i<=10; ++i)
sum += i;
And here is how it’s done in Haskell:

Now, the Haskell fanatics will tell you that there was no variable updates in the sum function. Don’t you believe it! There is no way the compiler can add a list of integers without updating at least one variable. The language simply hides it. You may say, so what? As long as the programmer does not have to deal with variables, it prevents side effects, right? Wrong. The worst side effects in programming have to do with timing and unless you can precisely control how and when those variables are changed, you cannot control timing.
sum [1..10]

Temporal Side Effects

Let’s say you encapsulate a simple sum function in an Erlang process. You can then invoke the process by sending it a message. Subsequently, the process will send another message with the result back to the original sender. Never mind, for now, that Erlang stupidly requires the entire list of integers to be copied onto a message queue and that the receiving process creates a new variable, which is copied onto another queue to be returned to the sender (what an unnecessary waste of CPU cycles!). The point I am getting at is that the time between sending and receiving in an Erlang (or any functional language) program is non-deterministic!

On the Crucial Importance of Temporal Determinacy

This, ladies and gentlemen, is the biggest problem with functional languages: they are temporally non-deterministic. Why is this so important? Well, let’s take security as a case in point. It is entirely possible that some malware surreptitiously replaces your little Erlang process with a spying module. The sending process or client has no way of knowing this. The best and most effective way to detect malware is to use a temporally deterministic system. Such a system is necessarily based on fine-grain concurrency and has a stable temporal signature. What this means is that a number of timing watchdogs can be inserted at various places in the system to detect changes in its temporal signature. Any code modification (such as the insertion of a spying module) would change the signature and trigger an alarm (see Impregnable Security in Concurrent Reactive Systems). Granted, not every part of a software system can be deterministic but every reaction or even chain reaction is.

Security is one thing. Overall reliability and maintainability is another. It is true that a lot of math problems are not concerned with the precise order of most operations but a huge number of real world applications (e.g., simulations, automation, neural networks, etc…) certainly are and the ability to automatically detect changes in event timing at the elementary operation level is crucial to maintainability.

Faking Purity and the Mark of the Beast

The idea that one can implement real world software systems using a purely functional style of programming is fantasy at best, and an outright lie at worst. Let us take the example of a robot that uses two I/O registers to control the movement of a specific actuator. Let’s say that one register is used to start the motion of the actuator and the other to stop it. You can repeat until you turn blue that those registers are not variables but they are. You can scream “MONADS” at the top of your lungs until you drop. Makes no difference, they are still variables. Jumping up and down and foaming at the mouth is not going to help either. :-D

Faking purity and adherence to a mathematical ideal while using impure loopholes to mask the inadequacy of a paradigm is the ultimate exercise in self-deception. Those two control registers are variables that may be used by various behavior subsystems or modules within the robot’s brain. Worst, they may be accessible by any number of cells within a given module. There is only one way to properly manipulate them without introducing all sorts of signal racing and conflict problems and that is to use a temporally deterministic system. Forget threads and all the evil :-) thread monkeys (at Intel, AMD, Sun, IBM, Tilera, Freescale, etc…) who promote them for their crappy multithreaded CPUs. Threads are the absolute work of the devil. Why? Because they are non-deterministic, that’s why. There is a better way to build concurrent deterministic systems without threads. We will need non-threaded, fine-grain, auto-scalable, MIMD multicore CPUs. Unfortunately for those of us who refuse to accept the mark of the Beast (the Algorithm), unless you have the mark on your forehead, you can neither buy nor sell in this business. But that will change soon enough. Slowly but surely, the number of rebels is growing and, sooner or later, the time for the uprising will come. Ahahaha…

Fine-Grain Parallelism, Deterministic Timing and Behavior

I wrote above that computing is about behavior. What I mean is that every elementary operation in a software system (additions, subtractions, divisions, multiplications, comparisons, etc…) should be seen as a basic behaving entity that acts on or reacts to an environmental variable. A behaving entity can be either a sensor (comparator) or an effector (operator). A computer should be viewed as a universal behaving machine (UBM), one which consists of a number of concurrent behaving and communicating entities. The only way to build a truly deterministic system is by adopting a convention whereby every behaving entity executes itself in one system cycle immediately upon receiving a signal. This is called synchronous reactive computing (not the same thing as synchronous messaging). A UBM is a radically different type of computer with a new kind of CPU and programming model. The goal of Project COSA is to pave the way for the development of the first UBM and the first true non-algorithmic, concurrent, synchronous, reactive OS. Until that happens, I will continue to bash existing systems and paradigms and gain new converts, one at a time. It’s not easy but, at least, it’s fun. :-D

See Also:
How to Solve the Parallel Programming Crisis

Next: Why Functional Programming Is Worse Than Crap: State Changes Are Essential to Reliability

Tuesday, September 18, 2007

Don’t Like Deadlocks, Data Races and Traffic Accidents? Kill the Threads

All Multicore and Parallel Programming Related Articles

Bad Software Model

If we want to get rid of all the nasty problems that plague concurrent software, then we must get rid of the threads. Multithreading is the second worst thing to have happened to computing. The worst is the algorithm proper, the mother of all threads and of everything that is bad with software. We have become addicted to a hopelessly flawed paradigm. Our multithreaded software systems are full of bugs (hidden or otherwise) and they are hard to program. But the problem is far more serious than this. The algorithmic model is the reason that we can’t prevent over 43,000 people from dying in traffic accidents every year. And that's just the US statistics. There is a better way to do things. Let me explain.

Non-Deterministic Timing

If there is anything more unpredictable and prone to errors than a complex algorithmic program, it is a bunch of algorithms running concurrently, communicating with each other and accessing the same data in memory. That is what threads are, concurrent algorithms. Algorithms are inherently non-deterministic. By this, I mean that the precise timing of the actions (operations) in a complex algorithm cannot be predicted. Why? Because of an inherently non-deterministic construct used in algorithms for decision making (I am talking about the if-then construct). As a result, it is hard to determine when a call to a subroutine, for example, will return. Threads are far worse because context switching and varying CPU load introduce even more uncertainties into the mix.

The Thread Monkeys

So why do the multicore CPU manufacturers want to force thread-based parallelism down our throats even though multithreaded software is bug-prone and hard to develop? The answer is three-fold. First, they don’t know how to make a fine-grain multicore CPU for a MIMD parallel environment; second, they don’t know how to build a non-algorithmic OS and the required development tools, and last but not least, they make a lot of money supporting legacy software and tools. In sum, all they really know is threads. It is up to us, the consumers, to demand better products for our money.

The Solution

The only way to build rock-solid, complex software systems is to use deterministic processes. This means we must change over to a non-algorithmic, synchronous, reactive software model. You may ask, how can one build a parallel system without threads? Consider that programmers of video games, cellular automata, neural networks and simulation-type applications have been doing it for years. We now have cellular automata with thousands and even millions of cells running in parallel. In a fine-grain parallel computer, the cells are the elementary operators (add, subtract, compare, etc…). The only problem is that such a system would be too slow because current CPUs are not designed and optimized for true parallel software. This is the reason that we must design a new kind of multicore CPU, one that is self-balancing and auto-scalable.

How to Prevent Traffic Accidents

The unreliability of the algorithmic software model puts an upper limit on the complexity of our software systems. The real cost of unreliability is much higher than we might suppose. As I wrote elsewhere on this blog, we could conceivably be riding in self-driving vehicles right now (yes, it can be done without AI), but concern over safety and reliability will not allow it. In addition, the cost of developing safety-critical software rises exponentially with the level of complexity. What will it take to get the Intels, AMDs, IBMs, Googles and Microsofts of the world to do the right thing and switch to the right software model? I really don’t know. All I can do is write articles on my blog and hope that somebody important will take notice and realize the error of our ways. It’s a big task but it must be done. The right model will solve the BIG problem of parallel computing and unreliable software. The world cannot wait any longer. Thousands of people are dying needlessly.

Sunday, September 16, 2007

Killing the Beast

All Multicore Related Articles

The Big Daddy of All Thread Monkeys: Intel

One would think that, with all the money Intel has been spending in researching new programming tools to tackle the BIG problem of parallel programming, it would occur to at least one clever PhD at Intel’s multicore research labs that, maybe, just maybe, threads are not the answer to parallel computing. I mean, how long have those big-brained guys and gals been playing with threads anyway? Decades? Well, let us not hold our breath. Multithreading is to Intel what blood is to a vampire. It can't live without it. Intel’s customers have invested billions of dollars over the years writing trillions of lines of code to run on Intel’s multithreading processors. Intel cannot possibly abandon its customer base and introduce a new CPU that is not compatible with all that legacy code. At least, this is Intel’s thinking: whatever you do, don't touch those threads. This reality has colored Intel's perception to the point where it has boxed itself into a corner and there is no way out. Like it or lump it, Intel is the big daddy of all thread monkeys. Well, that is too bad, in my opinion, because, unless Intel finds an effective cure for its vampire lust, nothing will remain of its entire castle in the air, not a thread (pun intended). The computer revolution waits for nobody, not even Goliaths like Intel.

We Must Kill the Beast

There is only one way to solve the BIG problem, and that is to kill the Beast. By this I mean that we must eliminate threads from programming altogether. But it goes deeper than that. Threads are direct descendents of the algorithmic software model, an ancient paradigm that saw its first use in computing when Lady Ada wrote a table of instructions for Babbage’s analytical engine. There is no need to deny it, threads are algorithms. What is wrong with the algorithm, you ask? Well, in my opinion, the algorithm IS the BIG problem. It has been the BIG problem from the start, long before parallel computers and multicore CPUs became all the rage. I will even claim that, with the exception of the Von Neumann bottleneck, the algorithmic software model is the cause of every ill that ails the computer industry, from unreliability to low productivity. Yes, it is an old and nasty Beast but, unfortunately, one with a huge army of dedicated followers who will fight teeth and nails to save it from extinction. It makes no difference, though. Sooner or later, we must kill the Beast.

The Anti-Beast

Killing the Beast is one thing. Finding a successor to the Beast is another. The Anti-Beast must be the opposite of the Beast. Here is a partial list of the characteristics of both for comparison:

Additionally, the Anti-Beast must support fine-grain parallelism and asynchronous messaging. For more information on the Anti-Beast, see Project COSA.

PS. Sorry, Intel. If you can't kill the Beast, someone else will. For your sake, let us hope it's not AMD. :-) Easy now, that was just a joke. AMD is a big-time thread monkey, too. It's like they say, monkey see, monkey do.

Saturday, September 15, 2007

The Age of Crappy Concurrency: Erlang, Tilera, Intel, AMD, IBM, Freescale, etc…

I’ll get right to the point. If your multicore CPU or concurrent programming language or operating system does not support fine-grain, instruction-level parallelism in a MIMD (multiple instruction, multiple data) environment, it is crap. Ok, I know that you are offended and you don’t think that your company’s CPU, language, or OS is crap but it is. And I mean Crap with a capital C. The computer market is looking for fine-grain parallel systems that are fast, secure, easy to program, auto-scalable and bug free. The crap that I see out there does not even come close to delivering what the market wants. Let me twist the knife in the wound a little deeper because I don’t think you people (I’m tempted to say, you idiots :-), but I’ll let it slide) are getting the message. Here is a list of highly desirable things that the market wants right now, things that your crappy products are not supporting but should be.

  • Fast, fine-grain, instruction-level parallelism using MIMD. What is the point of parallelism otherwise? And please, don't tell your customers that it can only be done using SIMD. As I wrote elsewhere on this blog, using SIMD to develop software is like pulling teeth with a crowbar. You don't know how to do it with MIMD because you are a bunch of thread monkeys, that's all.
  • Easy software composition. This means a graphical interface for non-algorithmic programming, among other things. It also means plug-compatible components. Just drag’m and drop’m. No more (text-based) computer languages, por favor! Computer languages are a primitive, awkward, unnecessarily complex and messy legacy from the 20th century. They are a stupid way to program computers. Only computer geeks love computer languages. The market is not about what geeks like. It's about profit, reliability, security and productivity. To give you an idea of what I'm talking about, take a look at this parallel QuickSort using COSA cells. All labels can be created by the designer/developer using a natural language of his/her choosing.
  • Deterministic timing of events at the elementary operation level. Deterministic temporal order is a must for reliability. This is possible only by adopting a convention whereby all instructions (elementary operations) are parallel reactive processes and have equal durations based on a virtual system-wide clock. That is to say, they all execute in exactly one virtual cycle. This is called ‘synchronous reactive computing’ (not to be confused with synchronous messaging).
  • Implicit parallelism at the design level, and by this, I don’t mean compilers that extract parallelism from crappy sequential programs. I mean the programming environment should use objects that are inherently parallel. Otherwise, you're just fooling yourself that you're doing parallel programming. Implicit parallelism is the natural way to program and that's the way it should have been from the start, even with single core CPUs.
  • Explicit sequential order. Either you’re doing parallel programming or you’re not. If you are, then sequences should not be implicit but explicitly specified by the developer. Otherwise, things become hard to organize and understand.
  • Automatic resolution of data dependencies. This eliminates blind code and otherwise hidden side effects of code modification. It is an essential ingredient for reliability. It can only be done using a reactive synchronous software model.
  • Fast asynchronous message passing using shared memory structures. Copying an entire message to a queue a la Erlang is a joke.
  • Automatic transparent scalability. The developer should never have to think about cores. Programs should take advantage of all the cores automatically.
  • Impregnable security. This is possible only in a system that enforces deterministic timing.
I could go on but that’s enough for now. What is my point? My point is that you people in the computer industry who insist on putting out one crappy product after another, have no excuse. And please, don’t say anything about customers wanting to preserve their legacy tools and software. It’s bullshit. If you gave your customers the things that I list above, they would switch to it as fast as they can. Why? Because it’s not crap, that’s why. It would save them a shitload of money and eliminate the headaches. It would allow them to develop systems of arbitrary complexity without having to worry about unreliability and insecurity. We would finally be able to implement truly automated transportation systems (no, you don’t need AI for this) and eliminate the need for human drivers on the road, thereby saving 40,000 lives a year in the US alone! And don’t tell me that it cannot be done because I know otherwise.

The public wants cars, trains and buses that drive themselves and planes that fly themselves. They want automated air traffic control that never fails. Your crappy products are killing people out there, damnit! by default and otherwise. You have no excuses! If you don’t get off your sorry asses :-) right now and investigate the soundness of the COSA software model, you should all be prosecuted for gross negligence. I mean it.

PS. There is one cool thing that happened in the industry recently and that's Jeff Han's multi-touch screen technology. You people should take a good look at Jeff''s stuff and study that user interface closely because fast, touch 'n drag composition is part of the future of parallel programming.

See Also:

Erlang Is Not the Solution
Nightmare on Core Street

Parallel Programming, Math and the Curse of the Algorithm
Half a Century of Crappy Computing
Parallel Computers and the Algorithm: Square Peg vs. Round Hole

Thursday, September 13, 2007

Alienating Computer Geeks

The Risk of Alienation

Read this article only if you’re a computer geek but do ignore the rest of my blog because it is not meant for you. And don’t even bother responding because I will trash your comments. Someone wrote to me today to advise me to stop promoting my blog at various programming news sites (e.g., DZone). His advice is that, unless I provide code for people (meaning computer geeks) to play with, I risk alienating the same people who might be interested in my work. Oh well. What if I told you that alienating computer geeks is exactly what I want to do? Would that piss you off? Good. Let me make it clear that I have very little respect for computer geeks. I have worked with geeks for many years and most of them (although not all) are full of themselves. And here is why.

Computer Geeks Are Stupid

Computer programming has been a profession for at least half a century. During all this time, not once did it occur to all those legions of computer geeks out there that the way they build and program computers might be fundamentally wrong. And these are people who consider themselves among the smartest people on earth. What is wrong with this picture? I remember realizing that something was wrong with computing as soon I picked up a book to learn 6502 assembly language. That was way back in 1980 and I had just bought myself a $400 single board Rockwell AIM 65 computer with 4k of memory (I still have it). The basic idea of building and programming computers has not changed much since. Heck, it has not changed in 150 years! Since the days of Babbage and Ada Lovelace for crying out loud! What is my point? My point is that it’s crap. That’s my point.

So then let me give it to you straight. You computer geeks out there don’t know shit as you ought to know. You are the last people on earth that I want on my side. You are stupid, in my opinion. If I alienate you, so be it. No skin off my back. You don’t put food on my table. I want smart people on my side, real thinkers. Not a bunch of nerds who get their jollies from writing a chicken shit sort routine in C++, Erlang, Haskell or what have you. It’s all crap, I tell you.

Making Geeks Obsolete

So there you go. I just did not want the word to spread that I am trying to persuade a bunch of computer nerds on DZone (or wherever) to be on my side. You are not smart enough (the few smart ones among you know who you are). The computer world is about to change drastically, not because of you, but in spite of you. You are the cause of the software crisis, not the solution. In fact, the computer world is going to change to the point where you will become obsolete and forced to hit the unemployment line, fondling a code listing of Python, Ruby, C++ or Erlang in your shirt pockets, longing for the days when people actually paid you for writing those crappy codes. So yes, that’s one of the goals of Project COSA, to make you obsolete.

Ok. I feel much better now. I think I'll have me a glass of wine. Thank you. :-D

Impregnable Security in Concurrent Reactive Systems

The Security Problem

Government agencies, public and private corporations, and organizations of all sorts have legitimate reasons to protect their private information from prying malevolent eyes. This is a serious problem. A week does not go by without some report in the media about a new virus, worm, ID theft, break-in or other nefarious act perpetrated against computer users. In a previous article on the importance of timing in programming, I wrote that the temporal signature of a concurrent reactive system is guaranteed to remain fixed during the lifetime of the system. The reason is that the timing or temporal order of reactions in such a system is deterministic. This charateristic of concurrent reactive system can be used to great advantage in the battle against unreliability and malware.

Temporal Signature

How can reactive concurrent systems be used to secure a system against viruses, key loggers, spyware, intruders, worms and other malicious programs? Well, deterministic timing means that a program or operating system based on the COSA software model has a certain fixed temporal signature. A number of timing watchdogs can be inserted in a COSA system that will trigger an alarm as soon as the slightest change in the system’s temporal signature is detected. There is no way that someone can surreptitiously insert a spying program into a COSA system without changing its temporal signature and triggering an alarm. This is one more advantage of the COSA model, rock solid security. I just added it to the list of claims I am making for COSA. Check out the COSA Pitch for more.

Tuesday, September 11, 2007

Why Timing Is the Most Important Thing in Computer Programming

An Analogy

Architects, carpenters, mechanical and civil engineers expect things to have specific sizes. If, during construction, the sizes of parts are found to be different than specified or do not match properly with other parts, a search will be conducted to find the cause of the discrepancy and correct it. In this article, I will argue that size (distance) is to architecture what timing is to computing. In other words, deterministic timing is essential to software reliability.

Deterministic Timing in Reactive Concurrent Systems

In a Von Neumann computer, it is unrealistic to expect every operation of a program to occur at a specific relative time based on a real time clock. The reason is that operations must wait their turn to be processed by the CPU. The problem is twofold. First, the CPU load varies from moment to moment and second, algorithmic software is such that it is impossible to predict the duration of every subroutine in an average program. However, it is possible to simulate a parallel, signal-based reactive system based on a virtual system clock. In such a system, every operation is required to be purely reactive, that is to say, it must execute within one system cycle immediately upon receiving its signal. These two requirements (every elementary operation is reactive and is processed in one cycle) are sufficient to enforce deterministic timing in a program, based on the virtual system clock. Deterministic timing means that reaction times are predictable. It does not mean that all the events (such as the movements of a mouse) that trigger the reactions are predictable. However, one event may trigger one or more chains of reactions and these, too, are deterministic, relative to the first event.

Timing Watchdogs

One nice thing about concurrent reactive systems is that interval detectors can be used to automatically find invariant intervals between any number of signals within a program. We can place timing watchdogs at various places in the program (this, too, can be done automatically) so that any discrepancy between an expected interval and the actual measured value will trigger an alarm. The temporal signature of a reactive system remains fixed for the life of the system and this makes for rock-solid reliability. So there are only two ways a timing watchdog can trigger an alarm; either the code was modified or there was a local physical system failure.

Automatic Discovery and Resolution of Data and Event Dependencies

Another nice aspect of concurrent reactive systems is that they are based on change. A change to a program variable is immediately communicated to every part of the program that may be affected by the change. The development environment can automatically link every entity or operator that changes a variable to every sensor that detects the change. This essentially eliminates blind code.

Side Effects in Complex Systems

We all know how hard it is to maintain complex legacy systems. A minor modification often triggers unforeseen side effects that may lay dormant for a long time after release. The right combination of events may cause a system failure that can be directly linked to the modification. For this reason, most system managers will look for alternative ways around a problem before committing to modifying the code. The side effects problem not only places an upper limit to the complexity of software systems, but the cost of continued development and maintenance soon becomes prohibitive. This is a problem that will never go away as long as we continue to use algorithmic systems. Luckily, the problem becomes nonexistent in the temporally deterministic reactive system that I described above. This is because blind code elimination and the use of timing watchdogs make it impossible to introduce undetected side effects. Indeed, timing is so deterministic and precise in a purely reactive system that the smallest modification is bound to violate a temporal expectation and trigger an alarm. It is up to the designer to either accept the new temporal signature, change it or revert back to the old code. As a result, we can build our software as complex as possible without having to worry about hidden bugs. In fact, and this is rather counterintuitive, more complex software will mean more timing constraints and thus more correct and robust systems, i.e., systems that work according to specs.

The Entire Computer Industry Is Wrong

All right. I know this sounds arrogant but it is true. We have been building and programming computers the wrong way from the beginning (since the days of Charles Babbage and Lady Ada Lovelace). To solve the unreliability and low productivity problems that have been plaguing the industry, we must change to a new way of doing things. We cannot continue with the same old antiquated stuff. It is no good. We must abandon the algorithmic software model and adopt a concurrent reactive model. And what better time is there to change than now, seeing that the industry is just beginning to transition from sequential computing to massive parallelism? Reactive concurrent systems are right at home in a parallel, multicore universe. We must change now or continue to suffer the consequences of increasingly unreliable and hard to develop software. This is what Project COSA is about.

Yukihiro "Matz" Matsumoto Takes Offence

Yukihiro "Matz" Matsumoto, the Japanese inventor of the English-based computer language, Ruby, is not happy that I criticize Erlang for being an English-based language. Why am I not surprised? Maybe I should write an article on the seven deadly sins of Ruby, just for fun?

The COSA Pitch, Pure and Simple

All COSA Related Articles

Recently, this blog and the COSA pages have received increased attention from various Silicon Valley venture capital firms and many technological and research organizations around the world. My position is simple. It is a two-way street. If you are interested, I am interested. The following is a list of what I am claiming for COSA. Note that Project COSA’s goal is to change not only the way we build software but also the way we build computers. COSA promises the following:

The power of COSA comes mainly from recognizing the crucial importance of deterministic event timing to software reliability. It can be summed up in one sentence: Interval is to computer programming what distance is to architecture.

Possible COSA Business Models

Monday, September 10, 2007

AMD Can Kick Intel’s Ass But Only By Forging a New Market

All Multicore Related Articles

More Than Just a Slap in the Face

On the same day that cash poor AMD released its long awaited Quad-Core “Barcelona” Opteron™ CPU, its much bigger rival, Intel Corporation, decided to rain on its parade by announcing close to ten billion dollars in revenue for the third quarter, better than they expected. Intel’s gross margins was more than 52 percent. This is more than just a slap in AMD’s face and everyone knows it. Intel is flexing its muscles and signaling that it is about to inflict major pain on the competition, you can bet on it.

Deep Trouble

AMD is in deep trouble. It just barely survived a vicious price war with Intel; its chief sales and marketing officer recently left and took a similar position at another chip company, Freescale Semiconductor; and it doesn’t look like Barcelona is going to offer much of an advantage over Intel’s own quad-core offerings. It would not surprise me if Intel suddenly announced deep price cuts on all their server chips. That’s a blow that could conceivably send a wounded and bleeding AMD crawling into chapter eleven. That would be a disaster for everybody.

Desperately Seeking Relief

It is obvious that AMD is waging a valiant but ultimately losing battle against a determined and fierce Goliath. How can AMD extricate itself from this predicament? Well, there is a large sector of the computer market that is living in constant fear. These are people who run extremely high-risk, mission-critical businesses and they depend on fast, reliable computers to keep them in business. A minor glitch or malfunction in their systems can mean millions and even tens of millions of dollars in lost business. At times, even human lives are at stake. I am talking about the power generation, transportation, aerospace, defense, financial and health industries. Failure is simply not an option for these people. They are desperately seeking relief from the high cost of system development and the nagging feeling that something may go terribly wrong at any time. But relief is hard to come by.

AMD’s Ace in the Hole

Obviously the current computing paradigm is not adequate for the job. The relief can only come from a paradigm shift, one which will eventually supersede the old one and leave Intel and other overconfident and late adopters holding empty bags. Luckily for AMD, there is still time to jump on this opportunity and claim the new frontier for itself. It is time to stop playing the me-too fiddle and become a real leader and pioneer worthy of the 21st century. It is time to leave the old stuff behind and forge ahead. The new market wants super fast, auto-scalable, rock-solid parallel software applications that are easy to develop and the specially designed, fine-grain, multicore CPUs that will make it all possible. AMD needs an ace in the hole that will give it the confidence and the know-how it needs to kick ass and take names in this high-stakes, cut-throat business. That’s what Project COSA is about. Kicking ass and taking names.

See Also:
The Age of Crappy Concurrency: Erlang, Tilera, Intel, AMD, IBM, Freescale, etc…
Half a Century of Crappy Computing
Parallel Programming, Math and the Curse of the Algorithm
Parallel Computers and the Algorithm: Square Peg vs. Round Hole

Is Erlang a Parallel Language? Simple Answer: No

All Multicore Related Articles
All Erlang Related Articles

In a recent article, Fady Younan wrote “We have to realize that the world is going towards parallel processing even we don't accept Erlang as a language I think it will take place in the next decade”. I am not entirely sure what Fady is saying here. Is he saying that Erlang will be accepted as the de facto parallel language in the next decade? If so, I beg to differ. A true parallel language must handle fine-grain parallel processing. Erlang is anything but fine grain. A true parallel language will have no trouble initializing a collection of variables simultaneously. Can Erlang be used to implement a parallel QuickSort module on a multi-core system that is faster than the single-core sequential implementation? I doubt it.

The point that I am driving at is that, if the computing world is going toward massive parallelism (Tilera and others are already talking about the coming of 1000-core CPUs), in order to take full advantage of all those cores, we need a parallel programming model that scales up linearly, up to the elementary operation level. Unless there are as many processes/threads running as the number of available processors (cores), Erlang’s lightweight concurrent processes are not fine enough to take advantage of a huge number of processors. As soon as the number of processes falls below the number of available cores, performance suffers. The best way to take full advantage of massive parallelism is to increase the number of processes by shrinking their size down to elementary operations such as, addition, subtraction, etc… At that point, the model is fully parallel (parallelism is implicit) and sequential order is explicit and signal-based. Indeed, why limit asynchronous signaling to lightweight processes? Why should elementary operations be excluded? I know, some may think that asynchronous signaling at the operation level would be too slow. Sure, this is true, but only because current CPUs are designed and optimized for the algorithm. There is a way to design a multicore MIMD CPU to handle fine-grain parallelism at the operation level that does not incur any performance penalty. (see How to Solve the Parallel Programming Crisis)

One of my bones of contention with concurrent languages like Erlang is that they don’t encourage multicore designers (AMD, Intel, Tilera, etc…) to create architectures for fine grain parallelism in an MIMD environment. In fact, they discourage it. That’s too bad, in my opinion, because the market wants super fast, auto-scalable, fine-grain parallel computers and applications. And what the market wants, the market will get. You can bet on it.

See Also:

Erlang Is Not the Solution
How to Solve the Parallel Programming Crisis
Parallel Programming, Math and the Curse of the Algorithm

The Age of Crappy Concurrency: Erlang, Tilera, Intel, AMD, IBM, Freescale, etc…
Half a Century of Crappy Computing
Parallel Computers and the Algorithm: Square Peg vs. Round Hole

Sunday, September 9, 2007

Parallel Computers and the Algorithm: Square Peg vs. Round Hole

All Multicore Related Articles

Ada Did It

Lately, a lot has been said about how hard it is to program parallel computers. It never occurs to the pundits that we might be trying to force a square peg into a round hole. The algorithmic software model saw its first use, about a century and a half ago, when Lady Ada Lovelace wrote the first algorithm (or table of instructions) for Babbage’s analytical engine. This programming model has been with us ever since and has served us well. It is a natural fit for our sequential computers. However, now that the computer industry has decided to transition from sequential computing to massive parallelism, should we expect the algorithmic model to be the correct one for parallel computing? I think not. And yet, this is exactly what Intel, AMD, Tilera and other multicore CPU manufacturers are forcing down our throats.

What the Market Wants

What the market wants is not hard to figure out. The market wants super fast multicore applications that are painlessly auto-scalable, are easy to develop, do not fail and use fine-grain parallelism. The market does not want parallel computers that cannot parallelize a QuickSort algorithm because they are using a coarse-grain, thread-based programming model. In reality, the market does not want to think about how to deal with cores at all. It just wants ease of programming, linear transparent scalability and speed, just like in the old days of Moore’s law. The market believes in a simple equation: more cores = faster computers. No code redesign after adding cores, no jumping through loops, no futzing with a bunch of hard to program threads. Simplicity and speed. That’s all. Unfortunately, Intel, AMD, Microsoft, Tilera, Sun Microsystems and the others are not delivering the goods. Why? Because they don’t know how. All they know is square pegs and round holes.

Stealing the Pot of Gold

Having said that, let me ask a pertinent question. Even if they did know how, would they deliver the goods? I have the funny suspicion that they wouldn’t. The reason is that they have a lot (trillions?) invested in the old paradigm. They want to milk that puppy for everything it’s got. They are not going to switch to an incompatible model that may destroy their old cash cow. Besides, a lot of well-paid programmers and engineers would become obsolete since their expertise with the old stuff would no longer fetch as much in the new marketplace. The fact is that there is a better way to build and program computers. It is called COSA. It is an implicitly parallel, explicitly sequential model that would solve what ails the computer industry, if it would only open its eyes. But then again, maybe some of them do have their eyes open. They just don’t like the consequences. That’s too bad, in my opinion. Judging from the number of visits this blog and the COSA pages have been getting recently from Silicon Valley and certain well known venture firms, an unlikely startup may sneak behind them and steal the pot of gold. Oh sweet irony!

Saturday, September 8, 2007

How to Design a Self-Balancing Multicore Processor, Part III

Part I,  IIIII


In Part I and II of this three-part article, I argued that a processor should be seen as a device for simulating or processing vast numbers of concurrent, elementary processes or cells. A cell is a basic entity that performs an elementary operation. Unlike other concurrent languages, the programming model used for this processor design assumes that parallelism is implicit but sequential order must be explicitly specified. Simulating a parallel collection of cells requires the use of two processing buffers (A and B). Essentially, while the cells in buffer A are being processed, buffer B is being filled with the cells that will be processed during the next consecutive cycle. This is done in order to avoid signal racing conditions that would otherwise occur. As soon as all the cells in buffer A are processed, the two buffers are swapped (A becomes B and B becomes A) and the cycle begins anew.

Note: Both cells and data are shown sharing the same memory space. This is for convenience only. In a real processor, they would be separate and would use separate address and data busses.

The Task Manager

As already mentioned, both buffers should reside on the processor and buffer management can be handled by relatively simple on-chip circuitry. On a multicore processor, however, the load must be distributed among the cores for fine-grain parallel processing. Note that this is all MIMD processing. Using SIMD for development is like pulling teeth. Besides, SIMD is worthless when it comes to enforcing the deterministic timing of events. There are probably several ways to design a fast mechanism to manage load balancing among multiple cores. The one that I envision calls for every core to have its own pair of processing buffers, A and B. However, the job of populating the B buffers should be that of a special on-chip controller that I call the Task Manager (TM).

Essentially, the job of filling the buffers with instructions is that of the task manager. Whenever the TM encounters new cells to be processed, it should distribute them as equally as possible amongst the B buffers. The TM must have intimate knowledge of the status (the number of items) of the A and B buffers of every core. It should also know the execution duration (in terms of clock cycles) of every cell according to its type. This way, the TM can intelligently assign cells to each core and equalize their individual loads as much as possible. The TM has only one more job to do: as soon as all the A buffers are empty, it must signal the cores to swap the buffers and repeat the cycle. One nice thing about this approach is that the TM works in parallel with the cores and does not slow their performance. Another advantage has to do with power consumption. Since the TM has perfect knowledge of processorload at all times, it can automatically turn off a percentage of the cores, depending on the load, in order to save energy.

The Road Ahead

As far as I know, no other multicore architecture provides for fine-grain, self-balancing parallelism using an MIMD execution model. There is no doubt in my mind that it is the correct approach to designing the multicore architectures of the future. There are additional advantages that are inherent in the software model, such as fault tolerance, deterministic timing, the automatic discovery and enforcement of data dependencies. The result is rock-solid software reliability and high performance.

Thursday, September 6, 2007

How to Design a Self-Balancing Multicore Processor, Part II

Part I,  IIIII

Implicit Parallelism, Explicit Sequential Order

In Part I, I wrote that a processor should be seen as a necessary evil. It is needed only because cells (elementary objects or instructions) in computer memory cannot process themselves. Thus a processor should be seen as a cell processor. From a programming point of view, the cells are the elementary operations that comprise a program residing in memory. In a truly parallel programming environment, parallelism is implicit and sequential order is explicit. In other words, unless specified otherwise, all cells are assumed to run concurrently with other cells. Every cell must explicitly specify its successor or successors, if any. A parallel program thus consists of multiple linked lists of cells (there are ways to use simple lists for optimization purposes but that’s another story). Immediately after executing its operation, a cell must send a signal to its successor or successors, if any, alerting them that it is their turn to execute. In a reactive system like COSA, execution and signaling count as one system cycle, meaning that they happen simultaneously.

Two Instruction Buffers

In order to prevent the troublesome signal racing conditions that would otherwise occur, the cell processor must use two instruction or cell buffers, A and B. The way it works is rather straightforward. As the cells in buffer A are being processed, buffer B is populated with successor cells that will be processed during the next cycle. As soon as all the cells in buffer A are processed, the two buffers are swapped and the cycle begins anew. Simple. I use this exact method in my own neural network experiments. Of course, this method can be modified and optimized for performance in a processor. Instead of just two buffers one can have three, four or more buffers and an instruction prefetch mechanism can be used to fill the buffers ahead of time. This is important because load balancing must happen before the processor is ready to process a buffer.

The beauty of the two-buffer scheme is that the parallelism is fine grained and scalable and it can be performed entirely by the processor. It can be used in both single core and multicore architectures. It should be done using an MIMD configuration, if possible, an order to take full advantage of vectorization. Highly parallel programs will automatically scale to take advantage of all available cores. In fact, neither the program nor the programmer need be aware of the number of cores. The cores are presented with a list of cells to process at every system cycle and they divide the cell load among themselves. Program modules can send messages to each other via shared memory structures. Both buffers should reside on the chip for fast processing. The size of the buffers should be big enough to accommodate the most demanding situation so as to avoid using main memory. In a single-core processor, managing the buffers should be straightforward. In a multicore processor, however, we need a special controller, a task manager. In Part III, I will describe the task manager’s job, which is mostly load balancing.

See Also: