Part I, II, III, IV
Please read the following articles before continuing:
How to Solve the Parallel Programming Crisis
Tilera’s TILE64: The Good, the Bad and the Possible, Part I, II, III
Tilera Corporation’s TILE64™ processor technology can serve as a springboard for a new type of multicore processor that can catapult Tilera to the forefront of the computer universe. Should Tilera decide to act on this advice, the entire computer industry, the world over, would come to worship at its feet. It must act quickly, however, because many others have shown a marked interest in Project COSA and the parallel programming articles on this blog. In this multi-part article, I will describe several modifications that Tilera can make to the TILE64 that will turn it into a kick-ass multicore processor second to none.
Radical Core Transformation
Assuming that Tilera wants to be the industry leader and not just another me-too multicore vendor, the first thing it must do is to change the design of its processor core from a sequential or scalar core into a pure vector core. And by pure vector core, I don’t mean a GPU. I mean a pure MIMD vector processor (see section on vector processing below) in which every single instruction is a vector that can perform its operation in parallel! That’s right. You are not going to find this kind of processor core lying around anywhere. ARM doesn’t have one and neither do MIPS, Sun Microsystems, IBM, Nvidia, Intel or anybody else.
This is a major change and it is highly doubtful that Tilera could just modify the MIPS-derived core that it is currently using. The best thing to do is to redesign the core from scratch. Although this transformation is not absolutely necessary in order to support the COSA programming model, the performance increase would be so tremendous that it would be foolish not to do it. In my estimation, even a single-core, pure MIMD vector processor would see, on average, at least an order of magnitude increase in performance over a conventional scalar or sequential processor. Even a superscalar architecture would look like a snail in comparison. Imagine that! A one-core, general-purpose, fine-grain, deterministic, parallel processor! This is the kind of benefit that can be obtained by adopting the COSA model. My thesis is that this is the way CPUs should have been designed from the beginning. (In a future article, I will examine the pros and cons of having multiple cores with a few vector units vs. having a single core with a huge number of vector units.)
So, if Tilera were to transform every core of its 64-core processor into a pure vector core, Intel’s much-ballyhooed 48-core Larrabee (with its ancient x86 technology and its 16-wide SIMD vector unit per core) would look like a miserable slug in comparison. LOL.
MIMD Vector Processing
Why a pure MIMD vector processor and how is it even possible? The answer is that, in the COSA software model, instructions are not presented to the processor in a sequence but as an array of independent elements to be executed in parallel, if possible. This means that, ideally, a COSA core should be designed as an MIMD (multiple instructions, multiple data) vector processor as opposed to an SISD (single instruction, single data) scalar processor. This way, every instruction is an elementary parallel vector with its own dedicated registers. Normally a vector processor operates in an SIMD (single instruction, multiple data) mode. A graphics processor (GPU) is an excellent example of an SIMD vector processor. The problem with the GPU, however, is that its performance takes a major hit when it is presented with multiple different instructions to run in parallel (for example, when parallel threads take different branches), because it is forced to process them one after another, which defeats the purpose of parallelism. Not so with an MIMD vector processor. Pure parallel bliss is what it is.
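To make the SIMD/MIMD distinction above concrete, here is a minimal software sketch. It is only an illustration, not a description of COSA or of any real hardware: the Instr record, the OPS table and the mimd_step and simd_passes functions are all hypothetical names invented for this example. It treats a batch of independent instructions as an array, runs it once as an MIMD step, and then counts how many passes an SIMD-style unit would need if all lanes had to share one operation per pass.

```python
import operator
from collections import namedtuple

# Hypothetical instruction record: one operation plus its own operand "registers".
Instr = namedtuple("Instr", ["op", "a", "b"])

# Assumed toy instruction set for the illustration.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def mimd_step(instructions):
    """One MIMD step: every element applies its own operation to its own data.

    The list comprehension stands in for hardware lanes firing together; the
    elements are independent, so evaluation order does not change the result.
    """
    return [OPS[i.op](i.a, i.b) for i in instructions]

def simd_passes(instructions):
    """SIMD-style execution: all lanes share one operation per pass, so a
    mixed batch needs one pass per distinct operation."""
    results = [None] * len(instructions)
    passes = 0
    # dict.fromkeys keeps the distinct operations in first-seen order.
    for op in dict.fromkeys(i.op for i in instructions):
        passes += 1
        for idx, instr in enumerate(instructions):
            if instr.op == op:
                results[idx] = OPS[op](instr.a, instr.b)
    return results, passes

batch = [
    Instr("add", 1, 2),
    Instr("mul", 3, 4),
    Instr("sub", 9, 5),
    Instr("add", 10, 20),
]

mimd_results = mimd_step(batch)
simd_results, simd_pass_count = simd_passes(batch)
print(mimd_results, simd_pass_count)
```

Both functions produce the same results; the difference is that the mixed batch costs the SIMD-style model one pass per distinct operation, while the MIMD model handles it in a single step. Real hardware is far more involved, so take the pass count as a rough picture of the serialization penalty and nothing more.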
In Part II of this article, I will talk about vector optimization.