Friday, September 22, 2017

Fast Unsupervised Pattern Learning Using Spike Timing

Abstract

In my previous article on the problem with backpropagation, I made the case for using timing as the critic for unsupervised learning. In this article, I define what a sensory spike is, explain the difference between pattern learning in the brain and in neural networks, and reveal a simple and superfast method for learning concurrent patterns. Please note that this is all part of an ongoing project. I will have a demo program ready at some point in the future. Still, these articles will give out enough information for someone with adequate programming skills to implement their own unsupervised spiking neural network.



Sensors and Spikes

A sensor is an elementary mechanism that emits a discrete signal (a spike or pulse) when it detects a phenomenon, i.e., a change or transition in the environment. A spike is a discrete temporal marker that alerts an intelligent system that something just happened. The precise timing of spikes is extremely important: the brain cannot learn without it. There are two types of spikes, one for the onset of a stimulus and the other for its offset. This calls for two types of sensors, positive and negative: a positive sensor detects the onset of a phenomenon while a negative sensor detects its offset.
For example, a positive audio sensor might detect when the amplitude of a sound rises above a certain level, and a complementary negative sensor would detect when the amplitude falls below that level. The diagram above depicts an amplitude waveform plotted over time. The horizontal line represents an amplitude level. The red circle A represents the firing of a positive sensor and B that of a negative sensor. In this example, sensor A fires twice as we follow the amplitude from left to right. To properly sense a variable phenomenon such as the amplitude of an audio signal, the system must have many sensors, each handling a different amplitude level. A complex intelligent system such as the human brain has millions of elementary sensors that respond to different amplitude levels and different types of phenomena. Sensors send their signals directly to pattern memory, where they are grouped into concurrent patterns. Every sensor can make multiple connections with neurons in pattern memory.
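To make this concrete, here is a minimal Python sketch of a complementary sensor pair implemented as level-crossing detectors. The class name, the sample values and the 0-to-1 amplitude scale are illustrative assumptions, not a prescribed implementation:

    # A positive (onset) or negative (offset) sensor watching one amplitude level.
    class LevelSensor:
        def __init__(self, level, positive=True):
            self.level = level        # the amplitude level being watched
            self.positive = positive  # True: onset sensor; False: offset sensor
            self.prev = None          # previous sample (unknown at start)

        def step(self, sample, t):
            """Return the spike time t if a crossing occurred, else None."""
            spike = None
            if self.prev is not None:
                rising = self.prev < self.level <= sample
                falling = self.prev >= self.level > sample
                if (self.positive and rising) or (not self.positive and falling):
                    spike = t
            self.prev = sample
            return spike

    # Sensors A (positive) and B (negative) watch the same level, as in the
    # diagram; on this waveform each fires twice.
    wave = [0.1, 0.3, 0.6, 0.8, 0.5, 0.2, 0.4, 0.7, 0.3]
    a = LevelSensor(0.5, positive=True)
    b = LevelSensor(0.5, positive=False)
    for t, s in enumerate(wave):
        if a.step(s, t) is not None:
            print("A fires at t =", t)
        if b.step(s, t) is not None:
            print("B fires at t =", t)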

Pattern Learning: Brain Versus Neural Networks

To a spiking neural net such as the brain's sensory cortex, a pattern is a set of spikes that often arrive concurrently. To a deep neural net, a pattern is a set of data values. Unlike a neural network, the brain's pattern memory does not learn to detect very complex patterns such as a face, a car, an animal or a tree. Strangely enough, in the brain, the detection of complex objects is not the job of pattern memory but of sequence memory. Pattern memory only learns to detect small elementary patterns (e.g., lines, dots and edges), which are the building blocks of all objects. The brain's sequence memory combines, or pools, many small pattern signals in order to instantly detect complex objects, even objects it has never encountered before.

Note: I will explain the architecture and working of sequence memory in an upcoming article.

Pattern Memory

Knowledge in the brain is organized hierarchically like a tree. In my view (which is, unfortunately, not shared by Jeff Hawkins' team at Numenta), an unsupervised perceptual learning system must have two memory hierarchies, one for pattern detection and the other for sequence detection. As seen in the diagram below, the pattern hierarchy consists of multiple levels arranged like a binary tree. I predict, based on my research, that the brain's pattern hierarchy resides in the thalamus (there is no other place for it to be) and that it has 10 levels. Since every pattern neuron pools two inputs from the level below, pattern complexity in the brain ranges from a minimum of 2 sensory inputs at the bottom level to a maximum of 2^10 = 1024 at the top level. I have my reasons for this but they are beyond the scope of this article.


Sensors are connected to the bottom level (level 1) of the hierarchy. A pattern neuron (the small red filled circles) can have only two inputs but, like a sensor, it can send output signals to an indefinite number of target neurons. Connections are made only between adjacent levels in the hierarchy; this is what makes it a binary tree arrangement. Every pattern neuron in the hierarchy also makes reciprocal connections to a sequence neuron (not shown) at the bottom level of sequence memory (more on this later). The hierarchical structure of pattern memory makes it possible to learn a huge number of different pattern combinations while using as few connections as possible.
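As a rough illustration, here is a minimal Python sketch of such a hierarchy: a fixed pool of two-input pattern neurons per level, arranged as a binary tree. The class layout and the pool size are illustrative assumptions:

    # Every pattern neuron takes exactly two inputs from the level below and
    # can feed any number of targets (higher neurons or sequence memory).
    class PatternNeuron:
        def __init__(self, level):
            self.level = level
            self.inputs = []      # at most two sources: sensors or lower neurons
            self.targets = []     # unlimited outputs
            self.promoted = False

    def build_hierarchy(levels, neurons_per_level):
        """One fixed pool of unconnected neurons per level, bottom to top."""
        return [[PatternNeuron(lv) for _ in range(neurons_per_level)]
                for lv in range(1, levels + 1)]

    hierarchy = build_hierarchy(levels=10, neurons_per_level=1000)
    # With two inputs per neuron, a level-10 neuron pools 2**10 = 1024 sensors.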

Fast Unsupervised Pattern Learning

To repeat, the goal of pattern learning is to discover non-random elementary patterns in the sensory stream. Pattern learning is fully unsupervised in the brain, as it should be. That is to say, it is a bottom-up process dictated solely by the environment and the signals emitted by the sensors. Every learning system is based on trial and error and, as such, must have a critic to correct it in case of error. In the brain, the critic lies in the precise temporal correlations between the sensory spikes. The actual pattern learning process is rather simple. It is based on the observation that non-random patterns occur frequently. It works as follows:
  • Start with a fixed number of unconnected pattern neurons at every level of the hierarchy.
  • Make random connections between the sensors and the neurons at the bottom level.
  • If the input connections of a neuron fire concurrently 10 times in a row, the neuron is promoted and the connections become permanent.
  • If a connection fails the test even once, it is immediately disconnected. Failed inputs are quickly resurrected and retried randomly.
As soon as a neuron gets promoted, it can make connections with the sequence hierarchy (not shown) and with the level immediately above its own, if any. The same concurrency test is applied at every level, but perfect pattern detection is a must during learning. Excellent results can be obtained even if some inputs are never connected. Pattern learning is fast, efficient and can be scaled to suit different applications: just use as many or as few sensors and neurons as are necessary for a given task. Connections are sparse, which means that bandwidth requirements are low.
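Here is a minimal Python sketch of this learning rule for the bottom level. The discrete time-window abstraction and the names are illustrative assumptions; the criterion of 10 concurrent firings in a row is taken from the list above:

    import random

    PROMOTION_COUNT = 10
    SENSORS = list(range(100))   # illustrative sensor population

    class TrialNeuron:
        def __init__(self):
            self.rewire()

        def rewire(self):
            """Disconnect and retry: pick two random trial connections."""
            self.inputs = tuple(random.sample(SENSORS, 2))
            self.hits = 0
            self.promoted = False

        def observe(self, fired):
            """fired: the set of sensor ids that spiked in this time window."""
            a, b = self.inputs
            if self.promoted or (a not in fired and b not in fired):
                return                      # nothing to test in this window
            if a in fired and b in fired:   # inputs fired concurrently
                self.hits += 1
                self.promoted = self.hits >= PROMOTION_COUNT
            else:                           # a single failure is enough
                self.rewire()

    neurons = [TrialNeuron() for _ in range(50)]
    # Training loop: for each time window, every neuron observes the fired set.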

Given that sensory signals are not always reliable and that only perfect pattern detections are used during learning, the process slows down as one goes up the hierarchy. This limits the number of levels in the hierarchy and the maximum complexity of learned patterns. This is why the pattern hierarchy has only 10 levels. In a computer application, we can use fewer levels and still get good overall results. The goal is to create enough elementary pattern detectors to enable object detection in the sequence hierarchy. Note that the system does not assume that the world is probabilistic; no probabilistic computations are required. The system assumes that the world is deterministic and perfect. Errors or missing information are attributed to accidents, and the system will try to correct them if possible.

But why require 10 consecutive firings? Why not 2, 5 or 20? Keep in mind that this is a search for concurrent patterns that occur often enough to rise above mere random noise. The choice of 10 is a compromise: using fewer than 10 would run the risk of learning useless noise, while using more than 10 would slow down the learning process.
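A back-of-the-envelope illustration of the compromise, assuming (purely for illustration) that an unrelated input passes a single concurrency test with probability p:

    # The chance that noise passes the test n times in a row is p**n.
    # With p = 0.5: n = 2 admits 25% of noise, n = 10 about 0.1%, and
    # n = 20 makes each promotion take twice as long as n = 10.
    p = 0.5
    for n in (2, 5, 10, 20):
        print(n, p ** n)   # 0.25, 0.03125, 0.0009765625, about 9.5e-07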

Pattern Pruning

The pattern hierarchy must be pruned periodically in order to remove redundancies. A redundancy is the result of a closed loop in the hierarchy.


Looking at the diagram above, we see a closed loop formed by sensor D and the pattern neurons A, B and C. This is forbidden because signals emitted by sensor D arrive at B via two pathways, D-A-B and D-C-B. One or the other must be eliminated; it does not matter which. Note that eliminating a pathway is not enough to prevent the closed loop from forming again. In the diagram above, either pattern neuron A or C should be barred permanently. That is to say, an offending pattern neuron should not be destroyed but simply prevented from forming output connections. This keeps the learning process from repeating the same mistake. In the brain, pattern pruning is done during REM sleep because it would interfere with sensory perception during waking hours. In a computer program, it can be done instantly, even during learning.
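In code, the check is straightforward: a pattern neuron sits on a closed loop when its two input subtrees share a source sensor. The following minimal Python sketch reuses the two-input neurons from the earlier sketch; the helper names are illustrative assumptions:

    def sensor_footprint(node):
        """The set of sensors reachable below a node (sensors are strings here)."""
        if isinstance(node, str):          # a sensor
            return {node}
        result = set()
        for source in node.inputs:         # a two-input pattern neuron
            result |= sensor_footprint(source)
        return result

    def has_closed_loop(neuron):
        left, right = neuron.inputs
        return bool(sensor_footprint(left) & sensor_footprint(right))

    def prune(neuron):
        """Bar one offending input permanently rather than destroying it."""
        if has_closed_loop(neuron):
            offender = neuron.inputs[0]    # A or C in the diagram; either will do
            offender.barred = True         # may no longer form output connections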

Pattern Detection

Intuitively, one would expect a pattern neuron to recognize a pattern when all of its input signals arrive concurrently. But, strangely enough, this is not the way it works in the brain. The reason is that patterns are rarely perfect due to occlusions, noise pollution and other accidents. Uncertainty is a major problem that has dogged mainstream AI for decades. The customary solution in mainstream AI is to perform probabilistic computations on the sensory inputs. However, this is out of the question as far as the brain is concerned because its neurons are too slow. The brain uses a completely different and rather clever solution, and so should we.

Pattern recognition is a cooperative process between pattern memory and sequence memory. During detection, all sensory signals travel rapidly up the pattern hierarchy and continue all the way up to the top sequence detectors of sequence memory where actual recognition decisions are made. If enough signals reach a top sequence detector in the sequence hierarchy, they trigger a recognition event. The sequence detector immediately fires a recognition signal that travels all the way back down to the source pattern neurons which, in turn, trigger their own recognition events. Thus a pattern neuron recognizes its pattern, not when its input signals arrive, but upon receiving a feedback signal from sequence memory. This way, a pattern neuron can recognize a sensory pattern even if the pattern is imperfect.
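Here is a minimal Python sketch of this feedback scheme. The 70 percent threshold and the class layout are illustrative assumptions, not figures from the article:

    # Recognition is decided at the top of sequence memory and fed back down.
    class TopSequenceDetector:
        def __init__(self, sources, threshold=0.7):
            self.sources = sources      # the pattern neurons that feed it
            self.threshold = threshold  # fraction of signals required to fire

        def receive(self, arrived):
            """arrived: the set of source neurons whose signals made it up."""
            if len(arrived & set(self.sources)) >= self.threshold * len(self.sources):
                for neuron in self.sources:
                    neuron.recognized = True  # feedback triggers recognition,
                return True                   # even for an imperfect pattern
            return False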

Coming Soon

In the next article in this series, I will explain how to do unsupervised learning in sequence memory. This is where the really fun stuff happens. Hang in there.

See Also:

Unsupervised Machine Learning: What Will Replace Backpropagation?
AI Pioneer Now Says We Need to Start Over. Some of Us Have Been Saying This for Years
In Spite of the Successes, Mainstream AI is Still Stuck in a Rut
Why Deep Learning Is A Hindrance to Progress Toward True AI
The World Is its Own Model or Why Hubert Dreyfus Is Still Right About AI

11 comments:

Rick Deckard said...

Have you looked into cybernetics? I think you inadvertently preach some of it here

Cybernetics, too, is a “theory of machines”, but it treats, not things but ways of behaving. It does not ask what is this thing but “what does it do?” Thus it is very interested in such a statement as “this variable is undergoing a simple harmonic oscillation”, and is much less concerned with whether the variable is the position of a point on a wheel, or a potential in an electric circuit.

Louis Savain said...

Hi Rick,

Good observation. A good learning mechanism should not care about the various origins of the sensory signals. It makes no difference if the signals come from audio, tactile or visual sensors. It only cares about their temporal relationships. This is what makes it a universal learning system. By contrast, supervised deep learning is not universal because it needs labels to identify the patterns.

Alexander Buianov said...

you wrote...

Start with a fixed number of unconnected pattern neurons at every level of the hierarchy.
Make random connections between the sensors and the neurons at the bottom level.
If the input connections of a neuron fire concurrently 10 times in a row, the neuron is promoted and the connections become permanent.
If a connection fails the test even once, it is immediately disconnected. Failed inputs are quickly resurrected and retried randomly.

...Pattern learning is fast, efficient and can be scaled to suit different applications. Just use as many or as few sensors and neurons as is necessary for a given task.... Connections are sparse, which means that bandwidth requirements are low....



Oh no Louis!
That's a blow for me. I didn't expect you to step into the same shit as Hawkins did with his Numenta. Fuck random connections! There should be none! That is not fast at all. You wrote yourself that "The Brain Assumes a Perfect World". Why the fuck do you need random connections if you have a perfect world and therefore perfect information coming in?
Maybe I don't know the other ways to do it (or maybe I do :)), but there should be some. Using randomness is a signal that you are doing something wrong.
10 repeats? Are you mad? Looks fine for a moron AI.

Fuck sparse distributed patterns too! I don't know how to make it right, but don't YOU try to repeat Numenta's way of making "fast! efficient! scaled!" and "sparse!!!" patterns. They tried to succeed at this for 15 years and finally... they succeeded in making a long list of "what is wrong with those people who don't see Numenta's brilliant progress on making cool sparse patterns and anomaly detection".

Alexander Buianov said...

Sorry for being rude.

Louis Savain said...

Alexander,

That's a blow for me. I didn't expect you to step into the same shit as Hawkins did with his Numenta. Fuck random connections! There should be none! That is not fast at all.

1. I had no idea that Numenta was doing the same thing since they don't even use a pattern hierarchy. They have a single hierarchy for everything AFAIK.

2. Random connections are very fast. I know because I use them in my experiments. Learning can be even faster if done the way the brain does it: start with a huge number of connections to all neurons; as soon as a neuron acquires two valid inputs, the others are immediately disconnected. This can be extremely fast but it requires a lot of computing power.

You wrote yourself that "The Brain Assumes a Perfect World". Why the fuck do you need random connections if you have a perfect world and therefore perfect information coming in?

Since there is no way of knowing which connections will form patterns together, random connections are a must. This is the whole idea behind trial and error.

Fuck sparse distributed patterns too!

Of course, the connections are sparse after learning. This is what is observed in the brain. 'Sparse' simply means that not everything is connected to everything else.

PS. Maybe I don't understand your arguments. If so, I'm sorry.

Louis Savain said...

Sorry for being rude.

No problem. I only get offended when I sense bad intentions.

owen said...

I do not think you can do pattern pruning in real time. Some patterns arise over a period of time. I am not sure how a computer would determine the time to wait, but being fast can also be a disadvantage for computers.

Da Fo said...

Louis,

A few questions about what you wrote.

You stated, "every pattern neuron in the hierarchy also makes reciprocal connections to a sequence neuron at the bottom level of sequence memory"

and then later on you say:

"As soon as a neuron gets promoted, it can make connections with the sequence hierarchy "

Just curious, but these statements seem to be contradictory. Can you clarify when this is allowed? Did you mean to say that every 'promoted' pattern neuron makes a connection to a sequence neuron at the bottom level of the sequence memory?

Also, you state, "If a connection fails the test even once, it is immediately disconnected. Failed inputs are quickly resurrected and retried randomly"

Can you clarify what 'test' is being done? So if a pattern neuron has two inputs connected and a signal comes across one input, but there is no 'concurrent' signal from the other input (is this the test?), then BOTH are disconnected?

Louis Savain said...

Hi Owen,

Sorry, I don't think I understand your objection to real time pattern pruning.

Louis Savain said...

Hi Da Fo,

You write:

Just curious, but these statements seem to be contradictory. Can you clarify when this is allowed? Did you mean to say that every 'promoted' pattern neuron makes a connection to a sequence neuron at the bottom level of the sequence memory?

Well, in the brain, the feedback pathways from sequence memory to pattern neurons are wired at birth even though the connections to the sequence nodes are not yet made. In a computer program, both feedforward and feedback connections should be made only after a pattern neuron is promoted in order to save resources.

Also, you state, "If a connection fails the test even once, it is immediately disconnected. Failed inputs are quickly resurrected and retried randomly"

Can you clarify what 'test' is being done? So if a pattern neuron has two inputs connected and a signal comes across one input, but there is no 'concurrent' signal from the other input (is this the test?), then BOTH are disconnected?


Good point. I copied and pasted that part of the article from a document that I wrote a long time ago.

The test is a concurrency test.

I used to disconnect both input connections to a pattern neuron as soon as the test failed. Then I quickly realized that it was not necessary. Here's what I do now in my own experimental program.

1. Connect every sensor to a pattern neuron at the bottom level. This is the first and permanent connection to the neuron.

2. Make random connections from the sensors to the pattern neurons at the bottom level. These are trial connections. Note: Make sure that no sensor makes more than one connection to the same pattern neuron.

3. Perform the concurrency test only when the first connection fires. Do it 10 times. If the test fails just once, disconnect the trial connection only and reconnect it elsewhere. If the test passes, the neuron is promoted.

In my own experiment, I have a function that wakes up every once in a while and makes random connections if needed. Also, if you have a fast computer, you can test more than 2 connections per pattern neuron at once. This makes for extremely fast learning. Just make sure you disconnect all other connections (if any) as soon as the neuron is promoted.
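In minimal Python, the three steps might look like this (the names and the time-window abstraction are placeholders, not the actual program):

    import random

    class BottomNeuron:
        def __init__(self, anchor):
            self.anchor = anchor   # step 1: the first, permanent connection
            self.trial = None      # step 2: one random trial connection
            self.hits = 0
            self.promoted = False

        def attach_trial(self, sensors):
            """Step 2: try a random sensor other than the anchor."""
            self.trial = random.choice([s for s in sensors if s != self.anchor])
            self.hits = 0

        def observe(self, fired):
            """Step 3: run the test only in windows where the anchor fires."""
            if self.promoted or self.trial is None or self.anchor not in fired:
                return
            if self.trial in fired:
                self.hits += 1
                self.promoted = self.hits >= 10
            else:
                self.trial = None  # disconnect the trial connection only

    # A background task can call attach_trial() on neurons whose trial is None.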

I hope this helps.

owen said...

It's not an objection. I'm just saying that if you prune in real time, you might lose some patterns that take longer to form.