Monday, August 4, 2008

Larrabee: Intel’s Hideous Heterogeneous Beast

The Battle of the Cores

According to the latest indications from Intel, its new processor, code-named Larrabee, will feature between 8 and 48 x86 cores and is slated for both general-purpose and graphics processing (source: Computer World). Compare this to Nvidia’s Tesla 10P with its 240 cores and AMD’s soon-to-be-released Firestream 9250, which will sport a whopping 500 cores. So obviously, when it comes to number crunching, Intel’s offering does not even come close. Worse, it won’t be released until 2009 at the earliest. Intel counters that Larrabee will be compatible with existing programming languages such as C and C++ and will thus take advantage of the wide familiarity that programmers already have with those languages. Well, whoop-dee-doo! That’ll teach them.

Update (8/5): Please read the comments below. A reader wrote to point out that the core counts AMD and Nvidia advertise for their graphics processors are misleading.

Hideous to the Core

Larry Seiler, chief architect in Intel’s visual computing group, claims that Larrabee “will combine the full programmability of the CPU with the kinds of parallelism and other special capabilities of graphics processors” (source: SlashGear). This can only mean one thing: a programmer will have the option of using some of the cores for general-purpose, coarse-grained, MIMD multithreaded parallelism while dedicating the rest to fine-grained, SIMD vector processing for graphics. How the partitioning will be realized, or whether it will be fixed or programmable, is anyone’s guess. It remains that Larrabee is hideous to the core, if you’ll pardon the pun. Heterogeneous processors will wreak havoc on productivity because of the added difficulty of programming two incompatible modes of execution. In addition, effective load balancing across all the cores becomes a nightmare to manage, if it is possible at all.
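
Since Intel has published no partitioning API, here is my own minimal sketch, in plain C, of the two modes a programmer would have to juggle on such a chip: coarse-grained MIMD threads on one side, a fine-grained SIMD-friendly loop on the other. All the names are made up for illustration.

    #include <pthread.h>
    #include <stdio.h>

    #define N 1024
    float a[N], b[N], c[N];

    /* Coarse-grained MIMD: independent threads doing unrelated, branchy work. */
    void *ai_task(void *arg)      { (void)arg; /* ... irregular code ... */ return NULL; }
    void *physics_task(void *arg) { (void)arg; /* ... another task ...   */ return NULL; }

    /* Fine-grained SIMD: one operation applied across many data lanes.
       A vectorizing compiler would map this loop onto wide vector units. */
    void vector_add(void)
    {
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, ai_task, NULL);      /* MIMD side */
        pthread_create(&t2, NULL, physics_task, NULL);
        vector_add();                                  /* SIMD side */
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("c[0] = %f\n", c[0]);
        return 0;
    }

Each half is manageable on its own. Writing, debugging and load-balancing both at once, on the same chip, is where productivity goes to die.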

Intel’s Folly and Industry Lemmings

The following is a quote from an EETimes article:



Intel says it has a number of internal teams, projects and software-related efforts underway to speed the transition, but the tera-scale research program has been the single largest investment in Intel’s technology research and has partnered with more than 400 universities, DARPA and companies such as Microsoft and HP to move the industry in this direction.

This is too horrible to even contemplate. In my considered opinion, Intel is single-handedly driving the computer industry over a cliff, and the rest of the industry is cheerfully following along like a bunch of lemmings. Oh, the humanity!

Intel Needs to Go to Rehab

Intel is high on its own dope. Over the past several months, from the time I posted my Nightmare on Core Street series of articles, Intel Research has visited my blog hundreds of times. My message is simple: the industry must get rid of its addiction to multithreading and algorithmic computing and adopt a universal, non-algorithmic parallel programming model, one that handles general-purpose computing and graphics processing, or anything else you throw at it, with equal ease. The heterogeneous or hybrid approach to parallelism is absurd in the extreme. It hurts me just to think about it. Justin Rattner needs to go to computer science rehab and take all his buddies with him, in my opinion. Which reminds me of Amy Winehouse’s song “Rehab”:

[Embedded video: Amy Winehouse, “Rehab”]

I’m sorry, but the only thing keeping me from tearing my hair out is a little bit of humor. Besides, I like Amy Winehouse, drugs, alcohol and all. :-D

Related articles:

How to Solve the Parallel Programming Crisis
Heralding the Impending Death of the CPU
Parallel Computing: Both CPU and GPU Are Doomed

Update (8/5):

CNET has a nice article on Larrabee that explains certain details about the chip’s vector and scalar units. The author is not very impressed (he compares it to a science project that got out of hand), but check it out anyway.

6 comments:

Louis Savain said...

I received several comments from a few irate, or should I say deluded, readers. Here's my new commenting policy: no more anonymous comments. If I can face the music, so can everybody else. Please identify yourself in your comment; otherwise, I will not post it.

Tim Sweeney said...

Note that the quoted core counts for AMD and NVIDIA are misleading.

A GPU vendor quoting "240 cores" is actually referring to a 15-core chip, with each core supporting 16-wide vectors (15 x 16 = 240). This would be roughly comparable to a 15-core Larrabee chip.

Also keep in mind that a game engine need not use an architecture such as this heterogeneously. A cleaner implementation approach would be to compile and run 100% of the codebase on the GPU, treating the CPU solely as an I/O controller. Then the programming model is homogeneous, cache-coherent, and straightforward.
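
In outline, the structure looks something like this (a sketch with stand-in names only, not any real GPU API):

    #include <stdio.h>

    /* Stand-in for the whole engine, compiled for and run on the GPU
       as one homogeneous, cache-coherent parallel program. */
    static void entire_engine_on_gpu(float *frame, int n)
    {
        for (int i = 0; i < n; i++)
            frame[i] *= 0.5f;
    }

    int main(void)
    {
        float frame[4] = { 1, 2, 3, 4 };  /* CPU: input I/O only  */
        entire_engine_on_gpu(frame, 4);   /* GPU: 100% of compute */
        printf("%g\n", frame[0]);         /* CPU: output I/O only */
        return 0;
    }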

Given that GPUs in the 2009 timeframe will have multiple TFLOPS of computing power, versus under 100 GFLOPS for the CPU, there's little to lose by underutilizing the CPU.

If Larrabee-like functionality eventually migrates onto the main CPU, then you're back to being purely homogeneous, with no computing power wasted.

Tim Sweeney
Epic Games

Louis Savain said...

Tim,

Thanks for that comment and for correcting my mistaken assumption. In that case, I apologise to Intel for misrepresenting their processor.

That being said, my criticism of heterogeneous multicore processors remains the same. And it is not directed solely at Intel; I am equally critical of AMD and Nvidia, since they essentially use the same approach. A game engine is a domain-specific tool that may be fine for game development, but the world of applications is much wider than video gaming.

Mixing programming models on the same chip is a match made in hell, in my opinion, because it does nothing to improve productivity or move parallel programming into the mainstream. I am adamant that what is needed is a single universal model that does everything with equal ease. The pundits say it cannot be done. I disagree, because I know otherwise. Intel is not helping. As the leader of the processor manufacturing industry, it has a duty to do the right thing.

Tim Sweeney said...

Louis,

I agree that a homogeneous architecture is not just ideal, but a prerequisite to most developers adopting large-scale parallel programming.

In consumer software, games are likely the only applications whose developers are hardcore enough to even contemplate a heterogeneous model. And even then, the programming model is sufficiently tricky that the non-homogeneous components will be underutilized.

The big lesson we can learn from GPUs is that a powerful, wide vector engine can boost the performance of many parallel applications dramatically. This adds a whole new dimension to the performance equation: it's now a function of Cores * Clock Rate * Vector Width.
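
With made-up but plausible numbers (hypothetical figures, not any announced chip), that equation works out like this:

    #include <stdio.h>

    int main(void)
    {
        /* Hypothetical figures, for illustration only. */
        double cores        = 16;
        double clock_hz     = 2.0e9;   /* 2 GHz */
        double vector_width = 16;      /* single-precision lanes */
        double ops_per_lane = 2;       /* a multiply-add counts as two ops */

        double peak = cores * clock_hz * vector_width * ops_per_lane;
        printf("Peak: %.0f GFLOPS\n", peak / 1e9);  /* prints 1024 */
        return 0;
    }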

For the past decade, this point has been obscured by the underperformance of SIMD vector extensions like SSE and Altivec. In those cases, the basic idea was sound, but the resulting vector model wasn't a win: it was far too narrow and lacked the essential scatter/gather vector memory-addressing instructions.
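
Here is a rough sketch of the kind of loop I mean; without a hardware gather, each lane of the vector turns back into a scalar load:

    #include <stdio.h>

    /* An indexed load ("gather"): each element's address depends on data.
       Narrow SIMD extensions without gather/scatter must issue one scalar
       load per lane, forfeiting most of the vector speedup. */
    void gather(float *out, const float *table, const int *idx, int n)
    {
        for (int i = 0; i < n; i++)
            out[i] = table[idx[i]];
    }

    int main(void)
    {
        float table[4] = { 10, 20, 30, 40 }, out[3];
        int idx[3] = { 3, 0, 2 };
        gather(out, table, idx, 3);
        printf("%g %g %g\n", out[0], out[1], out[2]);  /* 40 10 30 */
        return 0;
    }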

All of this shows there's a compelling case for Intel and AMD to put Larrabee-like vector units in future mainstream CPUs, gaining 16x more performance on data-parallel code very economically.

Louis Savain said...

Tim,

I think we agree on principles. The problem with the current parallel computing models (fine-grained GPU and coarse-grained CPU) is that they are limited to certain types of processing and are not compatible with each other. What is needed is a new universal model that exhibits the advantages of both while eliminating their disadvantages. This has been my crusade for many years.

It pains me to see new processors from the likes of Intel, Nvidia and AMD that essentially reinforce an approach to parallel computing that is going to hurt the industry in the long run. There is a right way to do things, and Intel should take the lead in this effort. If its current crop of lead engineers and thinkers cannot or will not make the transition to the correct model (there is only one correct model, by the way), then Intel should replace them with better thinkers. Otherwise, someone else will move to center stage and show everybody how to do it. Being an industry giant does not confer invincibility. This road is fraught with potholes and goblins, so to speak.

There are many countries and organizations in the world that would not hesitate to seize a clear opportunity to dominate the computer industry in this century. Very big money is in the balance, and the industry is at a crossroads. The parallel programming crisis is quickly getting to the point where something radical will have to rise to the surface to solve the problem. The market wants nothing less than a solution, and it wants it yesterday. Intel should not be so complacent and arrogant as to think that the heads of its engineering research teams are the brightest in the world or that its vision of the future of computing is flawless. There is nothing visionary or revolutionary about Larrabee, regardless of the fine engineering that went into it.

Vincent SONG said...

Larrabee seems to me to be the best solution yet (compared to existing processors): it's a massively many-core symmetric processor, and I am eagerly waiting for it.

The ability to "[...] combine the full programmability of the CPU with the kinds of parallelism and other special capabilities of graphics processors" is optional (I hope). Larrabee is no longer a heterogeneous processor if you choose to use it entirely for general-purpose computing.

Tim Sweeney has already said that massive many-core chips could be used to build next-generation 3D software engines (http://arstechnica.com/gaming/news/2008/09/gpu-sweeney-interview.ars/4).