Wednesday, April 29, 2015

No, a Deep Learning Machine Did Not Solve the Cocktail Party Problem

Irresponsible Hype from MIT Technology Review

MIT Technology Review is running a story claiming that a group of machine learning researchers used a convolutional deep learning neural network to solve the cocktail party problem. Don't you believe it. The network they used must first be trained separately on individual vocals and musical instruments before it can separate the vocals from the background music. In other words, it can only separate voice from music.

The human brain needs no such training. We can instantly latch on to any voice or sound, even one that we have never heard before, while ignoring all others. We have no trouble focusing on an unfamiliar voice speaking a foreign language in a room full of talking people, with or without music playing. This is what the true cocktail party problem is about. A deep learning network cannot pay attention to an arbitrary voice while ignoring the others. To do this, it would have to be pre-trained on all the voices individually.

Note: I posted a protest comment at the end of the article but MIT Tech Review editors chose to censor it. I guess it is easier to attract visitors with a lie than with the truth.

It Is Not about Speech

Contrary to rumors, the cocktail party problem has nothing specifically to do with speech or sounds. To focus on individual sounds, the brain uses the same mechanism that it normally uses to pay attention to anything, be it a bird, the letters and words on the computer screen or grandma's voice. The attention mechanism of the brain is universal and is an inherent part of the architecture of memory and how objects are represented in it. Unlike deep learning neural networks, it does not have to be trained separately for every sound or object. The ability of the cortex to instantly model a novel visual or auditory object is a major part of the brain's attention mechanism.

It is clear that the auditory cortex can quickly model a new sound on the fly and tune its attention mechanism to it. No deep learning network can do that. And knowing what I know about how the brain's attention mechanism works, I can confidently say that no deep learning network can ever do that.

See Also:

Did OSU Researchers Solve the Cocktail Party Problem?
In Spite of the Successes, Mainstream AI is Still Stuck in a Rut
Why Deep Learning Is a Hindrance to Progress Toward True AI

2 comments:

Unknown said...

100% correct!

One thing the brain is really good at is filling in the pattern, as it does with the eye's blind spot. It's right in front of us and yet we seem to miss the point, pun intended.

Ao said...

Have you ever read about John W. Keely's life and work?
"I assume that sound, like odor, is a real substance of unknown and wonderful tenuity, emanating from a body where it has been induced by percussion, and throwing out absolute corpuscles of matter - interatomic particles - with a velocity of 1120 feet per second, in vacuo 20,000. The substance which is thus disseminated is a part and parcel of the mass agitated, and if kept under this agitation continuously would, in the course of a certain cycle of time, become thoroughly absorbed by the atmosphere; or, more truly, would pass through the atmosphere to an elevated point of tenuity corresponding to the condition of subdivision that govern its liberation from its parent body. The sounds from vibratory forks, set so as to produce etheric chords, while disseminating their compound tones permeate most thoroughly all substances that come under the range of their atomic bombardment. The clapping of a bell in vacuo liberates these atoms with the same velocity and volume as one in the open air; and were the agitation of the bell kept up continuously for a few millions of centuries, it would thoroughly return to its primitive element. If the chamber were hermetically sealed, and strong enough, the vacuous volume surrounding the bell would be brought to a pressure of many thousands of pounds to the square inch, by the tenuous substance evolved. In my estimation, sound truly defined is the disturbance of atomic equilibrium, rupturing actual atomic corpuscles; and the substance thus liberated must certainly be a certain order of etheric flow. Under these conditions is it unreasonable to suppose that, if this flow were kept up, and the body thus robbed of its element, it would in time disappear entirely? All bodies are formed primitively from this high tenuous ether, animal, vegetal and mineral, and they only return to their high gaseous condition when brought under a state of differential equilibrium."