In Part I and II of this three-part article, I argued that a
Note: Both cells and data are shown sharing the same memory space. This is for convenience only. In a real processor, they would be separate and would use separate address and data busses.
The Task Manager
As already mentioned, both buffers should reside on the processor and buffer management can be handled by relatively simple on-chip circuitry. On a multicore processor, however, the load must be distributed among the cores for fine-grain parallel processing. Note that this is all MIMD processing. Using SIMD for development is like pulling teeth. Besides, SIMD is worthless when it comes to enforcing the deterministic timing of events. There are probably several ways to design a fast mechanism to manage load balancing among multiple cores. The one that I envision calls for every core to have its own pair of processing buffers, A and B. However, the job of populating the B buffers should be that of a special on-chip controller that I call the Task Manager (TM).
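To make the per-core buffer pair concrete, here is a minimal software sketch of one core with its A and B buffers. It only illustrates the idea and is not a description of actual circuitry. The names Core, step and swap are hypothetical, and cells are represented by plain strings purely for illustration.

```python
from collections import deque


class Core:
    """One core with two processing buffers (hypothetical model).

    Buffer A holds the cells being executed in the current cycle.
    Buffer B collects the cells to be executed in the next cycle.
    """

    def __init__(self):
        self.a = deque()  # cells executed during the current cycle
        self.b = deque()  # cells queued for the next cycle

    def step(self):
        """Execute (here: simply remove) one cell from buffer A.

        Returns the cell, or None if buffer A is already empty.
        """
        return self.a.popleft() if self.a else None

    def swap(self):
        """Exchange the roles of A and B once A has been drained."""
        if self.a:
            raise RuntimeError("buffer A must be empty before swapping")
        self.a, self.b = self.b, self.a
```

For example, after core.b.extend(["add", "compare"]) and core.swap(), two calls to core.step() return "add" and then "compare", and a third call returns None, which is the point at which the buffers may be swapped again.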
Whenever the TM encounters new cells to be processed, it should distribute them as evenly as possible among the B buffers. To do so, the TM must have intimate knowledge of the status (the number of items) of the A and B buffers of every core. It should also know the execution duration (in clock cycles) of every cell according to its type. This way, the TM can intelligently assign cells to each core and equalize their individual loads as much as possible. The TM has only one more job to do: as soon as all the A buffers are empty, it must signal the cores to swap the buffers and repeat the cycle. One nice thing about this approach is that the TM works in parallel with the cores and does not slow them down. Another advantage has to do with power consumption. Since the TM has perfect knowledge of the processor load at all times, it can automatically turn off a percentage of the cores, depending on the load, in order to save energy.
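The balancing policy can be sketched in a few lines of software. What follows is one possible policy under stated assumptions, not a specification of the TM: each new cell goes to the core with the least queued work, measured in clock cycles. The cost table CELL_COST, the function names distribute and cores_needed, and the per-core cycle budget are all hypothetical and chosen only for illustration.

```python
import heapq
from math import ceil

# Hypothetical execution durations per cell type, in clock cycles.
CELL_COST = {"add": 1, "compare": 1, "multiply": 3, "divide": 12}


def distribute(cells, num_cores, cost=CELL_COST):
    """Assign each incoming cell to the least-loaded B buffer.

    Returns (buffers, loads): buffers[i] lists the cells queued for
    core i, and loads[i] is the total cycle count of those cells.
    """
    buffers = [[] for _ in range(num_cores)]
    # Min-heap of (queued cycles, core index); all cores start empty.
    heap = [(0, i) for i in range(num_cores)]
    for cell in cells:
        load, i = heapq.heappop(heap)  # core with the least work
        buffers[i].append(cell)
        heapq.heappush(heap, (load + cost[cell], i))
    loads = [0] * num_cores
    for load, i in heap:
        loads[i] = load
    return buffers, loads


def cores_needed(total_cycles, cycles_per_core, num_cores):
    """Cores to keep powered for a given load (assumed policy).

    Keeps at least one core on and never asks for more than exist.
    """
    wanted = ceil(total_cycles / cycles_per_core)
    return max(1, min(num_cores, wanted))
```

Calling distribute(["divide", "add", "add", "multiply", "compare"], 2) places the long divide cell alone on one core and the four short cells on the other. Because cells are placed as they arrive, the split is not always optimal. A hardware TM would also have to count what is still pending in each A buffer, which this sketch leaves out.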
The Road Ahead
As far as I know, no other multicore architecture provides for fine-grain, self-balancing parallelism using an MIMD execution model. There is no doubt in my mind that it is the correct approach to designing the multicore architectures of the future. There are additional advantages that are inherent in the software model, such as fault tolerance, deterministic timing, and the automatic discovery and enforcement of data dependencies. The result is rock-solid software reliability and high performance.