Sunday, September 16, 2007

Killing the Beast

The Big Daddy of All Thread Monkeys: Intel

One would think that, with all the money Intel has been spending on research into new programming tools to tackle the BIG problem of parallel programming, it would occur to at least one clever PhD at Intel’s multicore research labs that, maybe, just maybe, threads are not the answer to parallel computing. I mean, how long have those big-brained guys and gals been playing with threads anyway? Decades? Well, let us not hold our breath. Multithreading is to Intel what blood is to a vampire: it can't live without it. Intel’s customers have invested billions of dollars over the years writing trillions of lines of code to run on Intel’s multithreaded processors. Intel cannot possibly abandon its customer base and introduce a new CPU that is not compatible with all that legacy code. At least, this is Intel’s thinking: whatever you do, don't touch those threads. This reality has colored Intel's perception to the point where it has boxed itself into a corner and there is no way out. Like it or lump it, Intel is the big daddy of all thread monkeys. That is too bad, in my opinion, because, unless Intel finds an effective cure for its vampire lust, nothing will remain of its entire castle in the air, not a thread (pun intended). The computer revolution waits for nobody, not even Goliaths like Intel.

We Must Kill the Beast

There is only one way to solve the BIG problem, and that is to kill the Beast. By this I mean that we must eliminate threads from programming altogether. But it goes deeper than that. Threads are direct descendants of the algorithmic software model, an ancient paradigm that saw its first use in computing when Lady Ada wrote a table of instructions for Babbage’s analytical engine. There is no need to deny it: threads are algorithms. What is wrong with the algorithm, you ask? Well, in my opinion, the algorithm IS the BIG problem. It has been the BIG problem from the start, long before parallel computers and multicore CPUs became all the rage. I will even claim that, with the exception of the von Neumann bottleneck, the algorithmic software model is the cause of every ill that ails the computer industry, from unreliability to low productivity. Yes, it is an old and nasty Beast but, unfortunately, one with a huge army of dedicated followers who will fight tooth and nail to save it from extinction. It makes no difference, though. Sooner or later, we must kill the Beast.

The Anti-Beast

Killing the Beast is one thing. Finding a successor to the Beast is another. The Anti-Beast must be the opposite of the Beast. Here is a partial list of the characteristics of both for comparison:

The Beast: algorithmic, implicit sequencing, explicit parallelism.
The Anti-Beast: non-algorithmic, explicit sequencing, implicit parallelism.

Additionally, the Anti-Beast must support fine-grain parallelism and asynchronous messaging. For more information on the Anti-Beast, see Project COSA.

PS. Sorry, Intel. If you can't kill the Beast, someone else will. For your sake, let us hope it's not AMD. :-) Easy now, that was just a joke. AMD is a big-time thread monkey, too. It's like they say, monkey see, monkey do.

8 comments:

Keymone said...

can you explain a bit what your "implicit parallelism" means? every action / reaction / signal / message can be interpreted as part of an algorithm. in fact even your most parallel program must have an entry point and an exit point, and everything between the entry point and the exit point is an algorithm. am i missing something?

Louis Savain said...

can you explain a bit what your "implicit parallelism" means?

Thanks for your interest in this blog, Keymone.

Implicit parallelism means that no special notation, instruction, keyword or directive is needed to specify that an object or module is concurrent. All objects are inherently concurrent. By contrast, a sequence of two actions must be explicitly specified by the programmer.

An algorithmic procedure is defined as a single path or thread of actions or operations: every operation has exactly one predecessor and one successor. In a non-algorithmic system, however, every action or operation may spawn multiple new parallel paths. Additionally, in a synchronous reactive system, the speeds of the signals along their paths are equal. This makes for deterministic timing, a must for reliability. Deterministic timing is a crucial aspect of the COSA software model.

A parallel module may have any number of entry and exit points. An entry point is an incoming signal path and an exit point is an outgoing signal path.
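
To make this more concrete, here is a rough Python sketch of such a synchronous, signal-based graph. It is not actual COSA code; every name in it is invented for illustration:

```python
# A minimal, hypothetical sketch of a synchronous, signal-based model.
# None of these names come from COSA; they are invented for illustration.

class Cell:
    def __init__(self, name, action, targets=None):
        self.name = name
        self.action = action          # work performed when the cell fires
        self.targets = targets or []  # downstream cells to signal next

def run(initial_cells, max_ticks=10):
    """Advance the whole graph one global tick at a time.

    Every cell signaled during tick N fires during tick N + 1, so all
    signals travel at the same speed and timing is deterministic.
    """
    ready = list(initial_cells)
    for tick in range(max_ticks):
        if not ready:
            break
        next_ready = []
        for cell in ready:                   # conceptually simultaneous
            cell.action(tick)
            next_ready.extend(cell.targets)  # fan-out: implicit parallelism
        ready = next_ready

# One start signal fans out to two parallel paths: A explicitly
# sequences B and C after itself; B and C are implicitly concurrent.
def report(name):
    return lambda tick: print(f"tick {tick}: {name} fires")

b = Cell("B", report("B"))
c = Cell("C", report("C"))
a = Cell("A", report("A"), targets=[b, c])
run([a])
# tick 0: A fires
# tick 1: B fires
# tick 1: C fires
```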

Keymone said...

but aren't special notations just abstractions of the low-level machine code? it is about language, not about programming paradigm. when i type spawn in Erlang i am sequentially starting a new process that will sequentially produce some results and sequentially pass them as messages to other sequentially spawned processes, and blah-blah-blah. but why do you think it is the wrong way?

Louis Savain said...

keymone wrote: but aren't special notations just abstractions of the low-level machine code? it is about language, not about programming paradigm. when i type spawn in Erlang i am sequentially starting a new process that will sequentially produce some results and sequentially pass them as messages to other sequentially spawned processes, and blah-blah-blah. but why do you think it is the wrong way?

The problem is that the sequencing of instructions (operations) within an Erlang process is implicit. They are executed in the order in which they are listed, and the timing or speed of execution is non-deterministic. Also, there is no fine-grain parallelism. In a COSA program, by contrast, every instruction (operator) can send a signal to an indefinite number of other operators, and these are processed in parallel.

For example, an operator may send a signal to 10 other operators that initialize 10 variables. In Erlang, the variables are initialized one after the other. In COSA, they are initialized concurrently, which is more efficient in a multicore environment. The difference has to do mainly with one using fine-grain parallelism and the other using coarse-grain parallelism. However, the most important difference, in my opinion (from the point of view of reliability and security), is that Erlang is non-deterministic whereas COSA is 100% deterministic.
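
To illustrate the scheduling difference, here is a rough Python sketch. It is neither Erlang nor COSA code; the helper names are invented for illustration:

```python
# A rough sketch of the two schedules described above. This is neither
# Erlang nor COSA; init_var is a hypothetical stand-in for real work.
from concurrent.futures import ThreadPoolExecutor

def init_var(i):
    # Stand-in for whatever work initializing one variable involves.
    return i * i

# Coarse-grain / sequential: the ten initializations run one after
# the other, like instructions inside a single Erlang process.
values_seq = [init_var(i) for i in range(10)]

# Fine-grain / concurrent: all ten are submitted at once and may run
# in parallel, loosely mimicking one operator signaling ten others.
with ThreadPoolExecutor(max_workers=10) as pool:
    values_par = list(pool.map(init_var, range(10)))

assert values_seq == values_par  # same results, different schedule
```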

Keymone said...

In Erlang, the variables are initialized one after the other

AFAIK Erlang does not have any variables at all (i'm not counting ETS as variables); instead, the process is the variable. But i agree with you that Erlang is sequential in its nature and there is no workaround to make it spawn 10 processes at once or send 10 signals at once. But you would need 10 cores for such an operation, and i don't think the number of cores is such a flexible parameter in a hardware environment. There is definitely a need for a true parallel modelling environment, but it should make use of all possible architectures.

Louis Savain said...

keymone wrote: AFAIK Erlang does not have any variables at all

Well, strictly speaking, it does, since the variables have to be initialized.

i agree with you that Erlang is sequential in its nature and there is no workaround to make it spawn 10 processes at once or send 10 signals at once. But you would need 10 cores for such an operation, and i don't think the number of cores is such a flexible parameter in a hardware environment.

Even if you had only 2 cores, it should make a difference in performance because each core would execute 5 operations. The nice thing about a COSA multicore CPU is that scaling is automatic and fine-grained. Whether you move from 1 core to 2 or 10 cores, you immediately notice a speedup.
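
Here is a back-of-the-envelope Python sketch of that claim. The numbers are idealized and ignore scheduling and communication overhead:

```python
# Idealized scaling of ten independent operations over N cores.
# Hypothetical numbers: real speedup depends on scheduling and
# communication overhead, which this sketch deliberately ignores.
import math

def ticks_needed(operations, cores):
    # If each core executes one operation per tick, the ideal
    # completion time is ceil(operations / cores) ticks.
    return math.ceil(operations / cores)

for cores in (1, 2, 10):
    print(f"{cores} core(s): {ticks_needed(10, cores)} ticks")
# 1 core(s): 10 ticks
# 2 core(s): 5 ticks
# 10 core(s): 1 tick
```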

dvyukov said...

I have one more "objection to adopting a non-algorithmic, signal-based, synchronous software model" (http://www.rebelscience.org/Cosas/objections.htm).

It's not exactly an objection. Personally, I think that the "signal-based, synchronous software model" is good in general. It's more a reason why COSA software will *not* be fail-free.

An important reason why hardware is so reliable is that it works in a relatively deterministic environment. In software we have to cope with things such as: low-memory conditions, a broken TCP connection, a dropped network packet, 100,000 simultaneous connections to a server, a user with a 5-year-old browser, a user with a Latvian keyboard layout, a user with two monitors, a user with a resolution lower than 1024x768, a user behind a firewall or proxy, a database that does not fully support the SQL standard, a file that is too large, and so on and on.

You can say that failing under such conditions (which the program was not intended for) is not exactly a software bug. And I agree. This can be called an "environmental" bug. But this is exactly the kind of thing that *is* called a bug by users and program managers. And this is the source of a significant part of software crashes and hangups.

To draw an analogy with hardware: what if you just pull a memory module out of a working computer, or try to put a Core 2 Quad processor into a Socket 370 motherboard, or just disconnect some of the interconnects on the motherboard? The hardware will not give you a user-friendly message box; it will just hang or crash silently.

Software *has* to work in a significantly more nondeterministic environment. And this is a given. This is what you call a "physical malfunction". And software has to cope with it every day.

You just have to foresee working on a computer with two monitors, no matter what software model you use. And no software model will help you with this.

So with a good software model you can eliminate about 50% of bugs: the roughly 50% that are relatively easy to find with good testing, unit tests, or code reviews, and relatively easy to correct. But the remainder of the bugs, the hardest to find and correct, will remain in the system anyway.

Any thoughts?

Dmitriy V'jukov

Louis Savain said...

Dmitriy V'jukov wrote: Any thoughts?

Dmitriy, this is an excellent criticism. I am confident that the COSA model can solve the problems you raise. I am preparing an article to address your concerns.