Why is parallel programming so hard? Why are Intel, AMD, Tilera, Ambric, and the rest having such a hard time getting programmers to take full advantage of their fancy new zillion-core processors? Answer: the Beast.
He is an ancient Beast who first came to life thousands of years ago as an egg in the land of Persia, the land of frankincense and woven rugs, at the crossroads between the Occident and the kingdoms of silk. There he lay dormant for centuries, waiting for the right time when his shell would be cracked open and he would grow to establish himself a kingdom, a kingdom of priests and devoted followers. Eventually he was brought to a new land, the land of the setting sun, where he stayed for many long years, for the time of his kingdom had not yet arrived. And then one day, a princess by the name of Ada cracked the egg open and out came the Beast. And all who saw him fell in love with him, for he had been given great powers of persuasion. And a new name was given unto him, Algorithm, for his steps were ordered in darkness. And he went out conquering and to conquer, and the whole world worshipped the Beast. And he caused as many as did not have his name or the number of his name on their foreheads to be put to death.
All right, enough with the metaphors! You get the message. The reason that parallel programming is hard is that the algorithmic software model is the natural enemy of parallelism. There is another model, one that is non-algorithmic and inherently parallel. We must make a choice. We cannot retain the algorithmic model while pretending that we are engaged in parallel programming. Lightweight threads or processes à la Erlang are not the answer. A thread or process is just a little algorithmic computer. And a bunch of small processes running concurrently and using messages to communicate with each other is no different from a bunch of little sequential computers communicating over a network. Big deal! Sure, you get the benefit of fault tolerance (though only down to the thread level), but you lose the advantages of temporal determinism and program comprehension, and you lose the performance advantage of fine-grain parallelism within a processor-intensive stand-alone application. In a properly designed parallel programming environment, parallelism should be implicit and sequential order explicit, not the other way around.
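To make the point concrete, here is a minimal sketch (my own illustration in Python; the names and structure are mine, not anything from Erlang itself) of the lightweight message-passing concurrency criticized above. Each "process" is just a little algorithmic computer running a sequential loop, and message queues are its only link to its peers:

```python
import queue
import threading

def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """A little sequential computer: loop, receive, compute, reply."""
    while True:
        msg = inbox.get()       # block until a message arrives
        if msg is None:         # poison pill: shut down
            break
        outbox.put(msg * 2)     # do some sequential work, send a reply

inbox: queue.Queue = queue.Queue()
outbox: queue.Queue = queue.Queue()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()

inbox.put(21)
reply = outbox.get()            # each exchange is still strictly sequential
print(reply)                    # → 42
inbox.put(None)
t.join()
```

Note that nothing here is inherently parallel: each worker executes one step at a time, and the parallelism lives only in how many such sequential loops you spawn.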
Computing is behaving, and behaving depends on communication between behaving entities. The question we should ask is this: if the goal is fault tolerance, why limit the entities to lightweight processes? Why not go all the way down to the elementary operations? Why should the operations be forbidden to use asynchronous signaling? What is good for the goose is good for the gander. Yes, that is the main difference between the algorithmic and non-algorithmic software models. In the latter, synchronous messaging is forbidden altogether, not just partially, as is now done in Erlang and other concurrent languages. The advantage of super-fine-grain parallelism is that it makes possible something that is impossible with lightweight processes: we can impose deterministic timing. That is to say, we can dictate that all atomic or elementary processes have equal durations, one system cycle. This alone is a good enough reason to abandon the algorithmic model, because it does wonders for reliability. See the COSA model for more on deterministic timing.
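For illustration only, here is a small Python sketch of what deterministic, signal-driven timing might look like. The `Cell` class and the two-buffer scheduler are my own assumptions, not the actual COSA design: every cell firing consumes exactly one system cycle, and a cell's output signals reach their targets on the following cycle.

```python
class Cell:
    """An elementary operation that fires in exactly one system cycle."""
    def __init__(self, name, action):
        self.name = name
        self.action = action    # the one-cycle elementary operation
        self.targets = []       # cells to signal on the next cycle

def run(start, cycles):
    """Drive the network for a fixed number of system cycles."""
    current = [start]           # cells signaled for this cycle
    for cycle in range(cycles):
        nxt = []                # double-buffering keeps timing deterministic
        for cell in current:
            cell.action(cycle)  # every firing lasts exactly one cycle
            nxt.extend(cell.targets)
        current = nxt

log = []
a = Cell("a", lambda cyc: log.append(("a", cyc)))
b = Cell("b", lambda cyc: log.append(("b", cyc)))
c = Cell("c", lambda cyc: log.append(("c", cyc)))
a.targets = [b, c]              # 'a' signals 'b' and 'c' in parallel

run(a, 2)
print(log)                      # → [('a', 0), ('b', 1), ('c', 1)]
```

The point of the sketch is the timing guarantee: because `b` and `c` both fire on cycle 1, every run of the network produces the same schedule, which is exactly the temporal determinism that thread-based concurrency gives up.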
There is a way to build and program computers to support scalable, fine-grain parallelism without the use of threads. It is not rocket science. One excuse I have heard for not using the non-algorithmic model is that current processors do not support it. Darn it, that is the lamest excuse in the world. Since when did CPU designers tell the software designers what to do? We, the software experts, are the real bosses of computing. We come up with the right software model and the hardware designers must follow suit. It is up to us, not them.
OK, so what do we do with the trillions of lines of legacy code that have already been written? Simple, in my opinion: they can continue to run on the legacy computers. In the meantime, we must forge full speed ahead with the new paradigm. We must build a new generation of non-algorithmic parallel computers and create the awesome new software that will run on them. We must abandon that lightweight-concurrency nonsense and embrace full, fine-grain, scalable parallelism. Eventually, the legacy stuff will join the slide rule and other ancient technologies in their new home, the museum. Meanwhile, we will be busy enjoying the golden age of the second computer revolution, the real one, the one that will make the first pale in comparison.
In conclusion, I strongly advise AMD, Intel, Tilera, Ambric, and the other multicore CPU makers to hold your horses. Don't throw your zillion-core processors at us and expect us to figure out how to program the monsters. We will throw them back at you in a jiffy. You know why? Because they are crap, that is why. Even Tilera's pretty little baby is crap (sorry, Professor). So you just wait. Otherwise, be prepared to lose a shipload of that VC and shareholders' money, ahahaha... because it won't be long before your current multicore technology becomes obsolete; you can bet on it. Something better is bound to come out. The software community is just now beginning to wake up to the non-algorithmic paradigm. Soon we will have the correct model in place, one that is scalable and that will take full advantage of the available cores. It is only a matter of time. The market wants it, and what the market wants, the market will get. Only then should you jump into designing the new CPUs that will support the model we hand to you. Not before. We, the software designers, are the lords of the software realm, not you. Sorry to break the news to you in this manner.