Sunday, February 9, 2014

Why Deep Learning Will Go the Way of Symbolic AI

Abstract

Deep learning is a machine learning technique for pattern representation and recognition based on multi-layered, statistical neural networks. Deep learning is all the rage lately. Big corporations such as Google and Facebook are spending billions to set up labs and to acquire experts and companies with experience in the technology. In this article, I argue that the current approach to deep learning will not lead to human-like intelligence because this is not the way the brain does it.
Related:
Why Convolutional Neural Networks Miss the Mark
The Billion Dollar AI Castle in the Air

Hierarchical Representation

There is no question that the brain classifies knowledge using a hierarchical architecture. The representation of objects in memory is compositional; that is to say, higher-level representations are built on top of lower-level ones. For example, low-level visual representations might consist of edges and lines. These can be combined to form higher-level objects such as a nose or an eye. So the one thing deep learning neural networks have going for them is that they use multiple layers to form a hierarchical structure of representations.

Weighted Connections

A deep learning network consists of multiple layers of neurons. Each pair of adjacent layers forms a restricted Boltzmann machine, or RBM.
[Image: a restricted Boltzmann machine]
The visible units of an RBM receive data from input sensors, and the hidden units are the outputs of the machine. In a deep learning network, the hidden units of one RBM serve as the visible units of the RBM immediately above it in the hierarchy. Each neuron (or hidden unit) in an RBM has a number of inputs represented by connections. Each connection is weighted; that is, it has a strength that is tuned by a learning algorithm during training on a set of examples. Loosely speaking, a connection strength represents the belief (or degree of certainty) that a particular input activation contributes to the activation of a hidden unit. A hidden unit is activated by computing a nonlinear function of its weighted inputs.
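
As a concrete illustration, here is a minimal sketch in Python of how a single RBM layer computes its hidden activations. The layer sizes, weights and input vector below are made-up placeholders for illustration, not any particular trained network:

    import numpy as np

    def sigmoid(x):
        # Logistic function: squashes a weighted sum into a (0, 1) value.
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)

    # Toy RBM: 6 visible units fully connected to 3 hidden units.
    W = rng.normal(scale=0.1, size=(6, 3))  # connection strengths, tuned during training
    b = np.zeros(3)                         # hidden-unit biases

    v = np.array([1, 0, 1, 1, 0, 0])        # binary activations on the visible units

    # Each hidden unit computes a nonlinear function of its weighted inputs;
    # the result is the probability that the unit switches on.
    p_h = sigmoid(v @ W + b)
    h = (rng.random(3) < p_h).astype(int)   # sampled binary hidden states

    # In a deep network, h would serve as the visible vector of the next RBM up.
    print(p_h, h)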

Biologically Implausible

There are a number of problems with deep learning networks that make them unsuitable for the goal of emulating the brain. I list them below.
  1. A deep learning network encodes knowledge by adjusting the strengths of the connections between visible and hidden units. There is no evidence that the brain uses variable synaptic strengths to encode degrees of certainty during sensory learning.
  2. Every visible unit is connected to every hidden unit in an RBM. There is no evidence that sensors make connections with every downstream neuron in the brain's cortex. In fact, as the brain learns, the number of connections (synapses) between sensors and the cortex is drastically reduced. The same is true for intracortical connections.
  3. Deep learning networks must be fine-tuned using supervised learning or backpropagation. There is no evidence that sensory learning in the brain is supervised.
  4. Deep learning networks are ill-suited for invariant pattern recognition, something that the brain does with ease.
  5. Deep learning networks use highly complex learning algorithms based on complex mathematical functions that require fast processors. There is no evidence that cortical neurons compute complex functions.
  6. Deep learning networks use static examples whereas the brain is bombarded with a constantly changing stream of sensory signals. Timing is essential to learning in the brain.

Winner Takes All

Current approaches to deep learning assume that the brain learns visual representations by computing input statistics. As a result, one would expect a gradation in the way patterns are recognized, especially in ambiguous images. However, psychological experiments with optical illusions suggest otherwise.
[Image: a classic "hidden cow" optical illusion]

When looking at the picture above, two things can happen: either you see a cow or you don't. There is no in-between. Some people never see the cow. Furthermore, if you do see the cow, the recognition seems to happen instantly.

It seems much more likely that the cortex uses a winner-takes-all strategy whereby all possible patterns and sequences are learned regardless of probability. The only criterion is that they must occur often enough to be considered above mere random noise. During recognition, the patterns and sequences compete for activation and the ones with the highest number of hits are the winners. This kind of pattern learning is simple (no fancy math is needed), fast and requires no supervision.
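
As a rough sketch of the idea (the pattern store and hit counting below are my own toy simplification, not a published algorithm), recognition is a competition over raw hit counts, with nothing more than a noise threshold and no probability math:

    # Toy winner-takes-all recognizer: each learned pattern is just a set
    # of input (sensor) ids; no weights, no probabilities.
    patterns = {
        "cow":  {2, 3, 5, 8, 13},
        "tree": {1, 4, 6, 7},
        "face": {2, 5, 9, 11, 12},
    }

    def recognize(active_inputs, min_hits=3):
        # Count hits: how many of each pattern's inputs are currently active.
        hits = {name: len(p & active_inputs) for name, p in patterns.items()}
        winner = max(hits, key=hits.get)
        # Below the noise threshold there is no winner at all: you either
        # see the cow or you don't.
        return winner if hits[winner] >= min_hits else None

    print(recognize({2, 3, 5, 13}))  # -> cow (4 hits, wins outright)
    print(recognize({4, 9}))         # -> None (nothing beats the noise floor)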

See Secrets of the Holy Grail, Part II for more on this alternative approach to pattern learning.

Conclusion

In view of the above, I conclude that, in spite of its initial success, deep learning is just a red herring on the road to true AI. It is not true that the brain maintains internal probabilistic models of the world. After all is said and done, deep learning will be just a footnote in the annals of AI history. The same can be said about the Bayesian brain hypothesis, by the way.

See Also

The Myth of the Bayesian Brain
Why Convolutional Neural Networks Miss the Mark

16 comments:

Jaroslav Kuboš said...

Hi Louis,

Nice article, as usual. Recently I found this article http://www.sciencedaily.com/releases/2010/08/100812151632.htm referring to https://wiki.brown.edu/confluence/download/attachments/8257/Hausser2010.pdf?version=1&modificationDate=1285622676000

Are you aware of it? It seems to me that it is a pretty strong experiment showing that the widespread model of the neuron (weighted inputs and a threshold function, like http://www.codeproject.com/KB/recipes/NeuralNetwork_1/NN1.png) really sucks.

Louis Savain said...

Jaroslav,

Thank you for the interesting comment. No, I was not aware of those experiments. That's rather exciting because my research in sequence learning and detection suggested that such neurons ought to exist.

You are right about neuron models that use weighted inputs. This stuff is a legacy of ANN models that go back to the perceptron. This is so far out of the ballpark, it's not even wrong. This is not the way the brain works, of course. But a lot of big money is being wasted on them as I write.

Armando said...


"Current approaches to deep learning assume that the brain learns visual representations by computing input statistics."

This is not correct. See the generative models: they generate input from an abstract representation of images.

Armando said...

See the tutorials by G. Hinton on Google Tech Talks.

Louis Savain said...

Armando,

Thanks for the input. I will take your advice and try to improve my understanding of deep learning. I am no expert, for sure, but, based on what I now understand, I am not optimistic about the future of the model.

Pierre Sermanet said...

Louis, you may be right in that deep learning will not be the ultimate answer, but you're confused on a number of points:
1- the goal of deep learning is not to mimic the brain
2- one does not have to copy nature to solve AI, take the typical example of man being able to fly without flapping-wing planes.
3- layers don't have to be fully connected, in fact many deep learning models have sparse connection tables.
4- deep learning can be used unsupervised too, there's lots of literature about this.
5- I'm not sure where "deep learning is ill-suited for invariant pattern recognition" comes from; that's exactly what deep learning does, invariant pattern recognition, and it does it well on a number of tasks.
6- the learning algorithm used by the most successful approaches to speech or object recognition is simply gradient descent with a rectified linear non-linearity; these are not complex mathematical functions at all. And you're comparing the brain with our current hardware: the brain is orders of magnitude more powerful than current computers, so it's amazing that we can achieve so much with such small models. To give you an example, the biggest models currently trained have ~150 million "neurons"; the brain has ~100 billion.
7- deep learning is not restricted to static examples at all; it's only a matter of how you feed data to deep learning models.
8- the models used in Yann LeCun's group are not statistical; no, deep learning does not assume statistical internal representations.

Michael said...

"deep learning" may or may not be suitable for a particular task, and it may even be "wrong", but I'm not sure I understand your logic that it is wrong because it doesn't work the same way the brain does. Neuromorphic architectures also have promise of course, but using the human brain as the measuring stick or as a template seems rather limiting, and not necessary how one wants to execute things on computer. We already know what human brains do and are capable of (even if we don't understand them completely) - so copying them doesn't seem like as interesting an exercise as creating something equally or more powerful that works completely differently. As for "going the way of symbolic AI", I think we will see a resurgence there as well - some of the AI programs of the 60's-80's were incredibly successful given the limited computing power and lack of data, and now that both limitations are gone, symbolic AI (which has the advantage that it is more transparent as not as black-boxy as stochastic methods popular in the BigData space) will also benefit.

Louis Savain said...

Pierre,

Thank you for the comment. You wrote:

1- the goal of deep learning is not to mimic the brain

IMO, it should be. There is a reason that the brain is so good at what it does.

2- one does not have to copy nature to solve AI, take the typical example of man being able to fly without flapping-wing planes.

I have seen this argument many times and I disagree with it. The Wright brothers did study the gliding flight of birds to arrive at their initial design. Besides, birds and airplanes use the same aerodynamic principles whether we are talking about wings or propellers.

3- layers don't have to be fully connected, in fact many deep learning models have sparse connection tables.

OK. I stand corrected. However, there seems to be a difference between a sparse architecture and sparse recognition. IMO, the brain recognizes objects using the latter. IOW, only a subset of the data is needed for recognition. This does not mean that the brain uses a sparse architecture.

4- deep learning can be used unsupervised too, there's lots of literature about this.

The information that I have on unsupervised deep learning networks is that they are not very good. Google's cat recognizing network is a case in point.

5- I'm not sure where "deep learning is ill-suited for invariant pattern recognition" comes from; that's exactly what deep learning does, invariant pattern recognition, and it does it well on a number of tasks.

Are you saying that deep learning networks are invariant to the types of transformations the brain can handle with ease? I doubt it.

6- the learning algorithm used by the most successful approaches to speech or object recognition is simply gradient descent with a rectified linear non-linearity; these are not complex mathematical functions at all. And you're comparing the brain with our current hardware: the brain is orders of magnitude more powerful than current computers, so it's amazing that we can achieve so much with such small models. To give you an example, the biggest models currently trained have ~150 million "neurons"; the brain has ~100 billion.

I think the brain's advantage (billions of neurons) is overrated. This is not where its power comes from, IMO. As I said, the brain can do what I call 'sparse recognition' using a small subset of the available data. This implies that good recognition can occur using only a small subset of the available neurons. This is where a winner-takes-all model will shine, IMO.

7- deep learning is not restricted to static examples at all; it's only a matter of how you feed data to deep learning models.

True, but what is missing is the timing. Precise timing is essential to certain transformations.

8- the models used in Yann LeCun's group are not statistical; no, deep learning does not assume statistical internal representations.

This is news to me. My understanding is that the Boltzmann machine is a statistical learning machine. Are you saying that Yann LeCun uses a winner-takes-all approach? Statistical or not, there has to be a way to handle the uncertainty that is inherent in the sensory space.

Louis Savain said...

Michael,

I see your point and I don't deny that deep learning architectures will be very useful in niche applications. However, based on my research, I am convinced that the brain has the ideal architecture for general intelligence. Everything else is doomed to pale in comparison.

Michael said...

I am convinced that the brain has the ideal architecture for general intelligence

Given the device that you used to come to this conclusion, maybe you/we are a bit biased :)
I'm not sure what "general intelligence" is - usually if you use the word "general", you have a set of instances to base it on - whereas, to date, we only have ourselves as both the judge/agent and the target of intelligence.

For a controlling agent that needs to act autonomously in the "real world" the brain is certainly well adapted -- but it doesn't follow that that's the only thing we want to build (e.g., cyborgs) - intelligence that is completely virtual does not need to "learn" the same way we do, but can be "bootstrapped" between generations with new "innate" ability, split/merge itself with other individuals, run billions of experimental copies of itself, etc. - so we shouldn't expect a "brain like" thing to evolve here. (Though restricting something to be brain-like is interesting for other reasons.)

"The brain" of course is a "system" of components, whereas "deep-learning" is a one method of achieving the goals of a particular kind of component - so it's a little like apples and orchards anyway.

I do agree that numbers of neurons alone does not intelligence make.

Louis Savain said...

Michael:

"The brain" of course is a "system" of components, whereas "deep-learning" is a one method of achieving the goals of a particular kind of component - so it's a little like apples and orchards anyway.

I can see the logic whereby a deep learning network is a single component within a much larger intelligent architecture. A deep learning network should be seen only as a static pattern learner. The problem is that its proponents want to turn it into something much bigger than it should be, such as an invariant pattern recognizer. In so doing they turn it into a fuzzy recognizer, one that is incapable of noticing fine nuances.

In order to have a true invariant recognizer (i.e., one that can, say, recognize someone's hand regardless of its orientation, position and shape), a pattern recognizer must be combined with a sequence learner/recognizer. The latter must, likewise, use a hierarchical architecture whereby a branch in the hierarchy represents a single invariant object.

A good perceptual learner must be able to focus on a single object in an image full of different objects. This cannot be done properly without a hierarchical sequence recognizer. In speech recognition, such a system should be able to solve the so-called cocktail party problem. I don't see that happening at all with current approaches to deep learning.

lucky?strike said...

Louis,

With respect to your post, I think you are highly misinformed about the general landscape of deep learning; in fact, most of your points are outright wrong. I think if you really want to post something so assertive about deep learning, you need to research your facts more.

Is this a post just about RBMs or about deep learning in general? There are many other models under the umbrella of deep learning that are not RBMs.

1. A deep learning network encodes knowledge by adjusting the strengths of the connections between visible and hidden units. There is no evidence that the brain uses variable synaptic strengths to encode degrees of certainty during sensory learning.

The weights learnt in each neuronal connection of an NN model are not individually correlated with degrees of certainty either. The behavior of groups of neurons in the brain is "similar" at a very high level to neurons in an NN. The main differences are the lack of phase information and the difference in spiking functions (among a ton of other differences).

2. As Pierre said, sparse connections are nothing new, do your research.

3. Deep learning is not emulating the brain.
Your reply to Pierre:

I have seen this argument many times and I disagree with it. The Wright brothers did study the gliding flight of birds to arrive at their initial design. Besides, birds and airplanes use the same aerodynamic principles whether we are talking about wings or propellers.


This is like the story of making a monkey and an elephant climb a tree and measuring their IQ. We often borrow ideas from neuroscience findings, but creating a "brain emulator" would be such a fail because of fundamental hardware differences between the brain and modern computing chips.

This point is also your whim, I guess everyone can have an opinion.

4. Deep learning networks are ill-suited for invariant pattern recognition, something that the brain does with ease.

As Pierre said, this is what they are well-suited for. I think you need to do your research here again. Just disagreeing with Pierre (without any reasoning whatsoever) doesn't necessarily make you right.

5. Deep learning networks use highly complex learning algorithms based on complex mathematical functions that require fast processors. There is no evidence that cortical neurons compute complex functions.

I'm not sure if you know what you are talking about anymore. Neuronal spiking functions are often fairly complex and diverse. The activation functions in practically implemented DL/NN models are actually super simple. Thresholding, sigmoid, tanh: these are not complex compared to neuronal spiking characteristics.

6. Deep learning networks use static examples whereas the brain is bombarded with a constantly changing stream of sensory signals. Timing is essential to learning in the brain.

Are Wired articles your source of "deep learning" concepts, or have you actually looked into the literature? There's tons of research going on (and already done) into feeding streaming signals as input.

Winner Takes All

Surprise surprise, this has been and is being used for a while now; in fact most of the top results on very challenging benchmarks in the visual, audio and NLP domains use a winner-takes-all layer. You can figure out what it's actually called if you read the papers.

In your reply to Pierre, you talk about "sparse recognition" as distinct from "sparse connections". Again, you need to go back to Google Scholar and give that search a shot. There's been tons of research on key learning algorithms that involve sparsity constraints.

Your point about unsupervised learning not working, that's moot. Just because Google did not get their cat detector to work doesn't mean anything. There's been tons of work in learning unsupervised embeddings that has been seminal to the field of NLP, and also in the audio and time-series domain.

Louis Savain said...

lucky?strike

Thanks for your comment. Before I reply to some of your points, let me say that I realize that some of you are professionals in the field of deep learning and you make a living from your expertise. It is only natural that you should defend it against any criticism. Take note that I am not arguing that deep learning is worthless. All I am saying is that it will not lead to human-like perception.

You wrote:

Is this a post just about RBMs or about deep learning in general? There are many other models under the umbrella of deep learning that are not RBMs.

Fine. However, RBM or not, I have not seen any effective deep learning approach to pattern classification that does not use statistical representations. I reiterate my assertion that the brain does not use statistical modelling.

The weights learnt in each neuronal connection of an NN model are not individually correlated with degrees of certainty either.

You're kidding me? What are they for then?

2. As Pierre said, sparse connections are nothing new, do your research.

I don't think I ever said that sparse distributed representations are new. I have studied Jeff Hawkins's hierarchical temporal memory and he makes a big deal about it. However, most deep learning networks don't seem to use it. Besides, I don't believe the brain uses sparse distributed memory. Sparse recognition is a different matter.

This is like the story of making a monkey and an elephant climb a tree and measuring their IQ. We often borrow ideas from neuroscience findings, but creating a "brain emulator" would be such a fail because of fundamental hardware differences between the brain and modern computing chips.

Sorry. This makes no sense, in my opinion. Hardware has nothing to do with it. We can emulate any process in software, including the brain. Some, such as Henry Markram, are trying to do just that.

4. Deep learning networks are ill-suited for invariant pattern recognition, something that the brain does with ease.
As Pierre said, this is what they are well-suited for. I think you need to do your research here again. Just disagreeing with Pierre (without any reasoning whatsoever) doesn't necessarily make you right.


IMO, the invariance claimed for deep learning networks is shallow and it comes at the expense of accuracy. True invariance requires both a pattern learner and a sequence learner that stitches multiple pattern representations together.

I'm not sure if you know what you are talking about anymore. Neuronal spiking functions are often fairly complex and diverse...

If one does not understand what is going on, one can complicate the hell out of things. I use spiking neurons in my research and I can assure you that they are simple. Once you realize that intelligence is mostly about timing and temporal correlations, you also realize that there are only two types of correlations: signals are either concurrent or sequential.

Are Wired articles your source of "deep learning" concepts, or have you actually looked into the literature? There's tons of research going on (and already done) into feeding streaming signals as input.

So? The fact remains that sensory streaming is not a fundamental part of deep learning research.

Winner Takes All
Surprise surprise, this has been and is being used for a while now...


My argument is that a winner-takes-all approach should be the only approach worth considering. This is the way it works in the brain and there is a very good reason for it: It does not calculate probabilities and it is very thorough.

Your point about unsupervised learning not working, that's moot. Just because Google did not get their cat detector to work doesn't mean anything.

Sure it does. Unsupervised deep learners are not that good and that is a fact.

Louis Savain said...

I have a question for those who insist that unsupervised deep learning networks can do true invariant recognition.

If I hold my hand in front of my face and rotate it, move it side to side, up and down, make a fist, a peace sign, etc., at no point during the transformations will there be any doubt in my mind that I am looking at a hand. My question is this:

Can an unsupervised deep learning network recognize a hand under similar transformations?

lucky?strike said...

Hey,

Maybe my post did come off as a little aggressive, sorry about that.

I know the shortcomings of deep learning, and I'm in no way defending it. There's tons of work needed in reinforcement learning, and I understand that these approaches still can't learn unseen topologies. I'm just saying that your perception of what deep learning is, is very narrow, and you definitely need to read more about it before you make a decision; the shortcomings that you mention in your blog post are actually not shortcomings.

The weights learnt in each neuronal connection of an NN model are not individually correlated with degrees of certainty either.

You're kidding me? What are they for then?

First off, the weights are in an energy space; they are not probabilities unless you use something like softmax at every layer.
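
A quick toy illustration of that distinction (the numbers are made up; any scores would do): raw unit outputs live on an arbitrary energy/score scale and only become probabilities after an explicit softmax normalization:

    import numpy as np

    scores = np.array([2.0, 0.5, -1.0])  # raw outputs: arbitrary scale, don't sum to 1

    def softmax(z):
        e = np.exp(z - z.max())          # shift by the max for numerical stability
        return e / e.sum()

    print(softmax(scores))               # a proper distribution: non-negative, sums to 1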

IMO, the invariance claimed for deep learning networks is shallow and it comes at the expense of accuracy. True invariance requires both a pattern learner and a sequence learner that stitches multiple pattern representations together.

Wrong. Convolutional neural networks are exactly that: they stitch together multiple pattern activations in a hierarchical model. The invariance in convnets is actually better for generalization; it doesn't come at the expense of accuracy.

So? The fact remains that sensory streaming is not a fundamental part of deep learning research.

...that you've read. Most of the new buzz is around ImageNet, which is still images. But there's lots of research on video sequences and audio, which has been going on in parallel. It is just not as popular yet.

My argument is that a winner-takes-all approach should be the only approach worth considering. This is the way it works in the brain and there is a very good reason for it: It does not calculate probabilities and it is very thorough.

There's actually a paper that tries to formalize this with a bit of mathematical theory. It basically states that, depending on the problem, the ideal solution lies somewhere between "average weighting" and "winner-takes-all".

Sure it does. Unsupervised deep learners are not that good and that is a fact.

Okay.

Louis Savain said...

lucky?strike,

I was under the impression that convolutional neural networks (CNNs) were only invariant to translations because the architecture is hardwired to pool neighboring units. Apparently, this only applies to visual systems.
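
For concreteness, here is a toy illustration of that pooling idea as I understand it (the numbers are made up): max-pooling over neighboring units leaves the output unchanged when a feature shifts within a pooling window.

    import numpy as np

    def maxpool(x, size=2):
        # Pool neighboring units: keep only the max of each window.
        return x.reshape(-1, size).max(axis=1)

    a = np.array([0, 9, 0, 0, 0, 0, 0, 0])
    b = np.array([9, 0, 0, 0, 0, 0, 0, 0])  # same feature, shifted within the window
    print(maxpool(a), maxpool(b))            # identical pooled outputs: [9 0 0 0]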

I can even imagine how CNNs could be extended to have rotational or depth invariance, but I don't see how they can be invariant to the hand transformations I mentioned in my previous comment.

That being said, the brain can also learn auditory invariance. For example, a musical tune played in different keys can be recognized as the same tune. I doubt that the brain uses hardwired neurons to achieve invariance; it is all learned on the fly. Animals whose optic nerves were redirected to their auditory cortex in the embryonic stage were able to learn to see and navigate fairly normally.

First off, the weights are in an energy space, they are not probabilities unless you use something like Softmax at every layer.

Ok, fine. However, I disagree that the brain uses varying synaptic weights to encode knowledge. In my opinion, synapses are either connected or they are not.

PS. My own experiments in using non-weighted, winner-takes-all machine learning for speech recognition have taught me a few things. Full invariance can be learned (not hardwired) by using two hierarchies, one for patterns and one for sequences of patterns. There is a simple way to do pattern learning that automatically takes care of the connections. I hope to publish my results in the not-too-distant future.