Thursday, August 28, 2008

Heralding the Impending Death of the CPU

Ancient Model vs. Modern Necessities

The modern CPU may seem like a product of the space age but its roots are rather ancient. British mathematician Charles Babbage first conceived of the principles behind the CPU more than 150 years ago and he was building on ideas that were older still, such as Jacquard’s punch card-driven loom and the algorithm, a mathematical concept that was invented a thousand years earlier by Persian mathematician, al-Khwārizmī. Like most mathematicians of his day, Babbage longed for a reliable machine that could automate the tedious task of calculating algorithms or tables of instructions. Parallel computing was the furthest thing from his mind. Yet amazingly, a century and a half later, the computer industry is still clinging to Babbage’s ancient computing model in the age of multicore computers.

The Good Way vs. the Bad Way

There are two main competing approaches to parallelism. The thread-based approach, the one chosen by the computer industry for general purpose computing, calls for having multiple instruction sequences processed side by side. The other approach, vector-based parallelism, calls for having multiple parallel lists of instructions, with the lists processed one after the other.
In the figure above, the small circles represent instructions or operations to be executed by the processor. The down arrows show the direction of execution. In thread-based parallelism, each thread can potentially be processed by a separate sequential core (CPU). In vector-based parallelism, the operations are fed into the processor as a collection of elements to be processed in parallel. For that, one would need a pure MIMD (multiple instructions, multiple data) vector processor in which every instruction is an independent vector that can be processed in parallel with the others. Both approaches will increase performance but, as I have explained elsewhere, the vector-based approach is better because it is deterministic and fine-grained; and it reflects the way parallel objects normally behave in nature.

Our brains are temporal signal-processing machines. The ability to sense the temporal relationships between events is essential to our understanding of the world around us. We can make sense of the vector-based approach because we can easily determine which processes are concurrent and which are sequential, and we can make a clear distinction between predecessors and successors. This sort of predictability is the sine qua non of learning and understanding. This is part of the basis of the COSA computing model. Multithreaded applications, by contrast, are hard to understand and maintain precisely because they are temporally non-deterministic. The idea that we can solve the parallel programming crisis by holding on to a flawed and inadequate programming model is preposterous, to say the least.

The World Does Not Need Another CPU

I was happy to read the recent news (see Techlogg’s analysis) that Nvidia has denied rumors that it was planning on developing its own x86 compatible CPU in order to compete against Intel and AMD. Good for them. The last thing the world needs is another x86 CPU, or any CPU for that matter. Nvidia should stick to vector processors because that is where the future of computing is. However, it will have to do something in order to counteract the fierce competition from AMD’s ATI graphics products and Intel’s upcoming Larrabee. The most sensible thing for Nvidia to do, in my opinion, is to transform its base GPU from an SIMD (single-instruction, multiple data) vector core into a pure MIMD vector core. This would make it an ideal processor core for Tilera’s TILE64™ as I suggested in my recent article on Tilera. Come to think of it, maybe Nvidia should just acquire Tilera. That would be a superb marriage of complementary technologies, in my opinion.

Conclusion

The CPU is on its deathbed. The doctors are trying their best to keep it alive but they can only postpone the inevitable for a little while longer. Soon it will die from old age but I, for one, will not be shedding any tears. Good riddance! It is not really hard to figure out what will replace it. It is not rocket science. It will take courage and fortitude more than brains. Who will be the standard bearer for the coming computer revolution? Which organization? Which company? Which country will see the writings on the wall and dominate computing for decades to come? Will it be Intel, AMD, Nvidia, or Tilera? Will it be the US, India, China, Germany, France, Spain, Sweden, Singapore, Japan or Taiwan? Who knows? Only time will tell but I sense that it won’t be long now.
Next: The Radical Future of Computing, Part I

Related Articles:

Parallel Computing: Both CPU and GPU Are Doomed
How to Solve the Parallel Programming Crisis
Transforming the TILE64 into a Kick-Ass Parallel Machine
Parallel Computing: Why the Future Is Non-Algorithmic
Why Parallel Programming Is So Hard
Parallel Computing: Why Legacy Is Not such a Big Problem
Parallel Computing, Math and the Curse of the Algorithm

4 comments:

Marc said...

I work in a manufacturing facility so I immediately thought of ladder logic of course. I am not a programmer.

I think that algorithmic programming is popular because it is similar to the way many of us write in western natural language; people plan whether a thought should be after or before a previous one in academic essays, which is inherently sequential in nature. Essay to pseudo-code to code seems natural to us but greatly increases the likelihood of error as the project gets bigger as you point out... On the other hand the analogy between written text and code makes a program easily copyrightable (the US is all about lawyers after all, other countries maybe less so)

Native parallel programming requires that the programmer (or implementer if you'd rather call it that) decides what are the conditions that have to be met for each cell to trigger and what are the outputs that are produced based on those conditions so it requires skills that are part-user, part coder. Petri Nets are a great graphical symbolism for this. It actually requires that people focus on the problem instead of style. To me, starting with a software specification before implementing a solution seems obvious, but my son has mainly sold freelance projects to business types based on his suggested user interface first; when he tried to tell his potential customers what data sources he used and how he got to his results, the customers' eyes would often glaze over...

There is plenty of parallel processing already going around in graphics processors, Field-programmable Gate Arrays and other Programmable Logic chips. It's just that people with software-experience who are used to a certain type of tool are afraid to make the effort to acquire what they see as hardware-type electrical engineer thought-habits; I know my programmer son would have an issue. The US has developed a dichotomy between electrical engineers and computer scientists.

Also "Talking heads" have a vested interest in promoting certain products that are only incremental improvements over the existing tools, because otherwise they would need to educate the clients about the details of the new paradigm, which would require extended marketing campaigns which would only pay back over the long term.

The microprocessor market is also highly fragmented between cheap low end processor makers like Microchip and Atmel, and desktop makers. The desktop players have their own mindset that has made them successful in the past. The obviously-easily parallelizable tasks (sound, graphics...) are so common that custom parallel processors were designed for them. You might be able to get Microchip to squeeze in 20 16f84 microcontrollers on one piece of silicon and could easily use a bunch of cheap PICs to emulate a bunch of 20 vector processors with current technology at a chip cost of maybe $100. But then, the optimum bus design would vary on the application. Have you looked at any Opencores designs?

What application would be most most compelling to investors? I don't know... But I think an FPGA or multi-PIC proof of concept would help your idea become implemented at low cost, and a "suggestion software on how to parallelize applications" for sequentially-thinking programmers, combined with a parallel processor emulator for conventional chip architectures would help programmers see parallel programming as an approachable solution instead of a venture capitalist buzzword.

Louis Savain said...

Marc,

Thanks for your constructive comments. You make several interesting points that I would like to respond to in my upcoming article.

Matthew said...

Here is an attempt to use petri nets in a graphical programming language:

http://perchta.fit.vutbr.cz:8000/projekty/12


I do not think it has much in common with COSA, but it might be of interest.

Louis Savain said...

Matthew,

Thanks for that link. Very interesting. I will have more to say about PNs in the near future.