Thursday, March 6, 2008

Nightmare on Core Street, Part III

SIMD


Recap

In Part II of this five-part article, I went over the pros and cons of MIMD multicore CPU architectures, which are designed to run coarse-grain, multithreaded applications. Current MIMD multicore architectures are an evolutionary step from single-core architectures in that they let existing threaded applications make the transition to multicore processing without much modification. The bad thing is that multithreaded applications are unreliable and too hard to program and maintain. In addition, coarse-grain parallelism is not well suited to many important types of computations such as graphics and scientific/engineering simulations. Here I describe the advantages and disadvantages of SIMD (single instruction, multiple data) parallelism, also known as data-level or vector parallelism.

SIMD

Most multicore processors can be configured to run in SIMD mode. In this mode, all the cores are forced to execute the same instruction on multiple data simultaneously. SIMD is normally used in high-performance computers running scientific applications and simulations. It works well when a given operation must be performed on a large data set and when programs have low data dependencies, i.e., when the outcome of an operation rarely affects the execution of a succeeding operation.
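To make the idea of data dependency concrete, here is a minimal sketch in Python. The function names are made up for illustration, and ordinary Python does not execute anything in lockstep; the point is only to contrast a loop whose iterations are independent (the kind a SIMD machine could perform in one step) with a loop in which each iteration needs the result of the previous one.

```python
def scale_all(values, factor):
    # Every element is computed independently of the others, so a SIMD
    # machine could apply the multiply to all elements at the same time.
    return [v * factor for v in values]


def running_sum(values):
    # Each result needs the previous result (a data dependency), so in this
    # straightforward form the additions must happen one after another.
    total = 0
    out = []
    for v in values:
        total += v
        out.append(total)
    return out


data = [1, 2, 3, 4]
print(scale_all(data, 10))   # [10, 20, 30, 40]
print(running_sum(data))     # [1, 3, 6, 10]
```

The first function is the "same operation on a large data set" case described above. The second is the kind of computation where the outcome of one operation affects the next.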

Many graphics processors use SIMD because graphics processing is data intensive. If you have a computer with decent graphics capabilities, chances are that it has a special co-processor that uses SIMD to handle the graphics display. Companies like NVIDIA and ATI (now part of AMD) make and sell SIMD graphics processors. In the last few years, many people in the business have come to realize that these dedicated graphics processors can do more than just handle graphics. They can be equally well suited to non-graphical scientific and/or simulation applications that can benefit from a similar data-flow approach to parallel processing.

The Good

One of the advantages of SIMD processors is that, unlike general-purpose MIMD multicore processors, they handle fine-grain parallel processing, which can result in very high performance under certain conditions. Another advantage is that SIMD processing is temporally deterministic, that is to say, operations are guaranteed to always execute in the same temporal order. Temporal order determinism is icing on the parallel cake, so to speak. It is a very desirable property to have in a computer because it is one of the essential ingredients of stable and reliable software.

The Bad

The bad thing about SIMD is that it is lousy in situations that call for a mixture of operations to be performed in parallel. Under these conditions, performance degrades significantly. Applications that have high data dependencies will also perform poorly. I am talking about situations where a computation is performed based on the outcome of a previous computation. An SIMD processor will choke if you have too many of these. Unfortunately, many applications are like that.
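The cost of mixed operations can be sketched with a toy lockstep model in Python. This is an assumption-laden illustration, not a description of any real processor: `run_lockstep` and the `OPS` table are hypothetical names, and the model simply assumes that in each pass every lane must perform the same operation while lanes wanting a different operation sit idle.

```python
import operator

# Hypothetical operation table for the toy model.
OPS = {"add": operator.add, "mul": operator.mul, "sub": operator.sub}


def run_lockstep(lane_ops, a, b):
    # lane_ops[i] names the operation lane i wants to apply to a[i] and b[i].
    results = [None] * len(lane_ops)
    passes = 0
    # One pass per distinct operation: only the lanes that want the current
    # operation do useful work, the rest are masked off and idle.
    for name in sorted(set(lane_ops)):
        passes += 1
        for i, wanted in enumerate(lane_ops):
            if wanted == name:
                results[i] = OPS[name](a[i], b[i])
    return results, passes


a = [1, 2, 3, 4]
b = [10, 10, 10, 10]

print(run_lockstep(["add"] * 4, a, b))                    # ([11, 12, 13, 14], 1)
print(run_lockstep(["add", "mul", "sub", "add"], a, b))   # ([11, 20, -7, 14], 3)
```

With a uniform workload, all four lanes finish in a single pass. With three different operations, the same four results take three passes, and most lanes are idle during each one.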

Hybrid Processors

The latest trend in multicore CPU design is to mix MIMD and SIMD processing cores on the same die. AMD has been working hard on its Fusion processor, which it plans to release in 2009. Not to be outdone, Intel is quietly working on its own GPU/CPU multicore offering. Indeed, Intel started the trend of mixing graphics and general-purpose cores with its failed MMX Pentium processor a while back. Sony, Toshiba and IBM already have a multicore processor that mixes SIMD and MIMD processing cores on one chip. It is called the Cell processor and it is the processor being shipped with Sony’s PlayStation 3 video game console.

The idea behind these so-called heterogeneous processors is that SIMD and MIMD complement each other’s capabilities, which is true. In addition, having both types of cores on the same chip increases performance because communication between cores does not have to go through a slow external bus. The problem with hybrid processors, however, is that programming them is extremely painful. In the past, I have compared it to pulling teeth with a crowbar. The industry is acutely aware of this, and hundreds of millions of dollars are currently being spent on finding a solution that will alleviate the pain.

Fundamental Flaw

In my opinion, all of the current approaches to multicore parallel processing will fail in the end and they will fail miserably. They will fail because they are fundamentally flawed. And they are flawed because, 150 years after Babbage designed the first general-purpose computer, neither academia nor the computer industry has come to understand the true purpose of a CPU. In Part IV of this series, I will explain why I think the computer industry is making a colossal error that will come back to haunt it. Stay tuned.
