Monday, March 3, 2008

Nightmare on Core Street, Part II


Part I, II, III, IV, V


In Part I of this five-part article, I wrote that the computer industry is in a sort of panic, their planned transition from sequential computing to parallel processing having hit a brick wall. The existing multicore architectures are not only hard to program, most legacy applications cannot take advantage of the parallelism. Some experts, especially MIT Professor Anant Agarwal, CTO and co-founder of Tilera Corporation, are adamant that the industry needs to come up with a new software model because current thread-based operating systems that use cache coherency snooping will not scale up to the many-core processors of the future (source: EETimes). I agree with Professor Agarwal. In this installment, I will describe the difference between fine and coarse grain parallelism, the origin of threads and the pros and cons of thread-based MIMD multicore processors. MIMD simply means that the parallel cores can execute different instructions on multiple data simultaneously.

Fine Grain Vs. Coarse Grain

General-purpose multicore processors (the kind built in the newest laptops, desktops and servers) use an MIMD architecture. These processors implement coarse-grain parallelism. That is to say, applications are divided into multiple concurrent modules or threads of various sizes. Each core can be assigned one or more threads so as to share the load; this results in faster processing. By contrast, in fine-grain parallelism, applications are broken down to their smallest constituents, i.e., the individual instructions. Ideally, these instructions can be assigned to separate cores for parallel execution. Fine grain parallelism is much more desirable than coarse-grain parallelism because it makes it possible to parallelize well-known functions like those used in array sorting or tree searching.


Threads are a legacy of the early years of electronics digital computing. They stem from an idea called multitasking that was originally invented to eliminate a problem with sequential batch processing. In the old days, multiple programs (jobs) were stacked in a batch and a job controller was used to feed them into the computer to be executed one after the other. This was time consuming and very expensive. Multitasking made it possible for multiple jobs to run simultaneously on the same sequential processor. The rationale is that the processor is so fast that it can switch execution from one concurrent task to another many times a second. Each task would have its own memory space and would behave as if it had the processor entirely to itself. It did not take long for someone to figure out that a single application could be divided into multiple concurrent internal mini-tasks running in the same memory space. These are called threads.

The Good

Even though multitasking and multithreading were never intended to be a parallel programming model, this is nevertheless the model that most major multicore CPU vendors have embraced. Early on, everybody understood that threads and/or tasks could be divided among multiple parallel cores running concurrently and that it would result in faster processing. In addition, threads provided a direct evolutionary path from single-core to multicore computing without upsetting the cart too much, so to speak. Many existing applications that already used threads extensively could make the transition with little effort and programmers could continue to use the same old compilers and languages. Even non-threaded applications could take advantage of the new CPUs if several of them can be kept running concurrently on the same multicore computer.

The Bad

The need for programming continuity and for compatibility with the existing code base is the reason that companies like AMD and Intel are trying their best to encourage programmers to use as many threads as possible in their code. There are many problems with threads, however, too many to discuss in a blog article. So I’ll just mention a few here. The biggest problem is programming difficulty. Even after decades of research and hundreds of millions of dollars spent on making multithreaded programming easier, threaded applications are still a pain in the ass to write. Threads are inherently non-deterministic and, as a result, they tend to be unreliable and hard to understand, debug and maintain. In addition, the coarse-grain parallelism used in threaded programs is not well suited to data-parallel applications such as graphics processing, simulations and a whole slew of scientific computations. These types of programs run much faster in a fine-grain parallel environment. Another problem is that a huge number of legacy applications were not written with threads in mind and cannot take advantage of the processing power of multicore CPUs. Converting them into threaded applications is a monumental undertaking that will prove to be prohibitively costly and time consuming. Rewriting them from scratch will be equally costly.

Obviously, thread-based MIMD parallelism is not the answer the industry is looking for, wishful thinking notwithstanding. In Part III, I will examine the pros and cons of SIMD and heterogeneous multicore processors.

See also:

Why Parallel Programming Is So Hard

No comments: