Wednesday, September 27, 2017

I Need Time to Think

I have spent many years researching the brain and intelligence. I use unconventional methods that most would consider crazy but I have made tremendous progress. I have arrived at a sharply different understanding than the mainstream. The problem is that I don't really know what I should do with it. A week ago, I decided that the time had come for me to publish at least part of what I have discovered. Perception is the meat and potatoes of intelligence. Once you figure it out, the rest just falls into place.

Then I take a look at the miserable conditions of the world and the precarious state of international relations and I get cold feet. Humanity has a death wish. There is a terrible feeling that takes over me and paralyzes me. It's all deja vu but I can't shake it. I need a couple of days to think things through. Hang in there.

Monday, September 25, 2017

Fast Unsupervised Sequence Learning Using Spike Timing (1)

Novel Memory Architecture

Previously in this series on unsupervised learning, I explained how to implement a fast unsupervised pattern learning network based on spike timing. In this article, I introduce a novel architecture for sequence memory. Its purpose is to emulate the brain's flexible perceptual abilities.

Note: I originally intended this to be a single blog post but the subject is too vast for one post to do it justice. Expect one or more installments after this one.

The Magic of Sequence Memory

Sequence memory is the seat of knowledge and cognition. This is where most of the magic of perception happens. It is the part of the brain that gives us a common sense, cause-effect understanding of the world in all of its 3-dimensional grandeur. Equally impressive is its ability to make highly accurate guesses when presented with incomplete or noisy sensory information. This ability is the reason that we have no difficulty recognizing highly stylized art or seeing faces and other objects in the clouds. Take a look at the image below. Those of us who are familiar with farm animals will instantly recognize the face of a cow even if we have never seen the picture before. Don't worry. Some of us never see the cow.

Font designers rely on the brain's ability to almost instantly classify objects according to their similarity to other known objects. Without it, we would have a hard time recognizing words written in unfamiliar fonts. It can also be used to play tricks on the brain. Cognitive scientist Douglas Hofstadter and others have written about this. Consider the ambigram below. We can read the bottom word as either 'WAVe' or 'particle'. How is that possible?
This magical flexibility is the gift of sequence memory. The brain can quickly recognize sequences at various levels of abstraction based on very little or even faulty information. My point here is that, unless we can design and build our neural networks to exhibit the same capabilities as the human brain, we will have failed. I am proposing a novel architecture for sequence memory that, I believe, will solve these problems and open up the field of AGI to a glorious future.

Note: Sequence memory is also the source of all voluntary motor signals and is essential to motor learning. I will cover this topic in a future article.

Math Is Not the Answer

At this point, some of you may be wondering why I use no math in my articles on AI. The reason is that the brain does not use it. Why? Only because its neurons are too slow and there is no time for lengthy calculations. Not that I have anything against math, mind you, but if you hear anyone claiming that AGI cannot be achieved without doing some fancy math (which is just about everybody in mainstream AGI research), you can rest assured that he or she hasn't a clue as to what intelligence is really about.

The Brain Assumes a Perfect World

One of the most specious yet ubiquitous myths in mainstream AI research is the notion that the world is uncertain and that, therefore, our intelligent machines should use probabilistic methods to make sense of it. It is a powerful myth that has severely retarded progress in AI research. I am not the first to argue this point. "People are not probability thinkers but cause-effect thinkers." These words were spoken by none other than famed computer scientist Dr. Judea Pearl during a 2012 Cambridge University Press interview. Pearl, an early champion of the Bayesian approach to AI, apparently had a complete change of heart. Unfortunately, the AI community is completely oblivious to any truth that contradicts their paradigm.

As I have said elsewhere, we can forget about computing probabilities because the brain's neurons are not fast enough. There is very little time for computation in the brain. The surprising truth is that the brain is rather lazy and does not compute anything while it is perceiving the world. It assumes that the world is perfectly deterministic and that the world performs its own computations. The laws of classical physics and geometry are precise, universal and permanent. Any uncertainty comes from the limitations of our sensors. The brain learns how the world behaves and expects that this behavior is perfect and will not deviate. The perceptual process is comparable to that of a coin sorting machine whereby the machine assumes that the various sizes of the coins automatically determine which slots they belong to.

We cannot hope to solve the AGI problem unless we emulate the brain. But how can the brain capture the perfection that is in the world if it cannot rely on its sensors? It turns out that sensory signals are not always imperfect. Every once in a while, even if for a brief interval, they are indeed perfect. The brain is ready to capture this perfection in both pattern and sequence memories. None of the magic of perception I spoke of earlier would be possible without this capability.

Sequence Memory

Sequence memory is organized hierarchically and receives input signals from pattern memory. These signals arrive at the bottom level of the hierarchy and a few percolate upward to the top level. The number of levels depends on design requirements. I happen to know that the brain's cortical hierarchy has 20 levels. This is much more than is necessary for most purposes in my opinion. It is a sign that we can think at very high levels of abstraction. I estimate that most of our intelligent machines, at least in the beginning, will require fewer than half that number. In a future article on motor learning and behavior, I will explain how the bottom level of the sequence hierarchy serves as the source of all motor signals.
In the diagram above, we see three sequence detectors A, B and C (red filled circles) on two levels. Sequences A and C on level 1 receive inputs directly from 7 pattern neurons (blue filled circles). Unfinished sequence B on level 2 has only two inputs arriving from sequences A and C. The red lines represent connections to the output nodes (see below) which are the only pathways up the hierarchy.

The sequence is the building block of sequence memory. It is a 7-node segment within a longer series that I call the vine. The 7th node is the output node of the sequence. Every node in a sequence receives signals from either a pattern neuron or another sequence. Vines and sequences receive signals in a specific order separated by an interval. The interesting thing about a sequence is that it does not have a fixed duration. That is to say, the interval between nodes can vary. This is extremely important because, without it, we would not be able to make sense of the 3D geometry of the world or to understand events when their rates of occurrence change over time.
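To make the variable-interval property concrete, here is a minimal sketch in Python. It is not the learning or detection method of this series, which has not been described yet. It assumes that a sequence is just an ordered list of 7 source ids, that out-of-order signals are ignored, and that the 7th node fires when the last expected signal arrives. The names detect and SEQUENCE_LENGTH are hypothetical.

```python
SEQUENCE_LENGTH = 7  # the text's figure: a sequence is a 7-node segment


def detect(sequence, events):
    """Return the time at which the 7th (output) node fires, or None.

    sequence: ordered list of source ids, one per node.
    events: iterable of (time, source) spikes.
    Only the order of arrival is checked; the interval between
    consecutive nodes is free to vary (an assumption of this sketch).
    """
    position = 0
    for time, source in sorted(events):
        if source == sequence[position]:
            position += 1
            if position == len(sequence):
                return time
    return None


sequence = list("abcdefg")
assert len(sequence) == SEQUENCE_LENGTH

slow = [(i * 10, s) for i, s in enumerate(sequence)]             # one node every 10 ticks
fast = [(i * 2, s) for i, s in enumerate(sequence)]              # same order, 5 times faster
backward = [(i, s) for i, s in enumerate(reversed(sequence))]    # wrong order

print(detect(sequence, slow), detect(sequence, fast), detect(sequence, backward))
```

The slow and fast streams are both detected because only the order matters in this sketch, while the reversed stream is not.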
In the early days of my quest to understand the brain and intelligence, I used to think that the sequence hierarchy was just a way to organize various combinations of long sequences. I had assumed that a sequence at the top of the hierarchy was just a container for other shorter sequences at the lower levels. I cannot go, at this time, into how I eventually changed my mind but I was completely wrong. It turns out that the brain builds all of its sequences/vines at the bottom level of the sequence hierarchy. The upper levels are used primarily for finding temporal correlations between sequences and for building special structures called branches which are used for the invariant detection of complex objects in the world.

Coming Soon

In my next article in this series, I will explain how to use spike timing to do fast unsupervised sequence learning. I will explain how sequence detection occurs with relatively few sensory signals. I will also introduce a model for the brain's cortical column based on this architecture. Stay tuned.

Note (11/16/2017): I will eventually publish the next article in this series. Now is not the time.

See Also:

Fast Unsupervised Pattern Learning Using Spike Timing
Unsupervised Machine Learning: What Will Replace Backpropagation?
AI Pioneer Now Says We Need to Start Over. Some of Us Have Been Saying This for Years
In Spite of the Successes, Mainstream AI is Still Stuck in a Rut
Why Deep Learning Is A Hindrance to Progress Toward True AI
The World Is its Own Model or Why Hubert Dreyfus Is Still Right About AI

Friday, September 22, 2017

Fast Unsupervised Pattern Learning Using Spike Timing


In my previous article on the problem with backpropagation, I made the case for using timing as the critic for unsupervised learning. In this article, I define what a sensory spike is, I explain the difference between pattern learning in the brain and neural networks and I reveal a simple and superfast 10-step method for learning concurrent patterns. Please note that this is all part of an ongoing project. I will have a demo program ready at some point in the future. Still, I will give out enough information in these articles that someone with adequate programming skills can use to implement their own unsupervised spiking neural network.

Sensors and Spikes

A sensor is an elementary mechanism that emits a discrete signal (a spike or pulse) when it detects a phenomenon, i.e., a change or transition in the environment. A spike is a discrete temporal marker that alerts an intelligent system that something just happened. The precise timing of spikes is extremely important because the brain cannot learn without it. There are two types of spikes, one for the onset of stimuli and the other for the offset. This calls for two types of sensors, positive and negative. A positive sensor detects the onset of a phenomenon while a negative sensor detects the offset.
For example, a positive audio sensor might detect when the amplitude of a sound rises above a certain level. And a complementary negative sensor would detect when the amplitude falls below that level. The diagram above depicts an amplitude waveform plotted over time. The horizontal line represents an amplitude level. The red circle A represents the firing of a positive sensor and B that of a negative sensor. In this example, sensor A fires twice as we follow the amplitude from left to right. To properly sense a variable phenomenon such as the amplitude of an audio frequency, the system must have many sensors to handle many amplitude levels. A complex intelligent system such as the human brain has millions of elementary sensors that respond to different amplitude levels and different types of phenomena. Sensors send their signals directly to pattern memory where they are grouped into concurrent patterns. Every sensor can make multiple connections with neurons in pattern memory.
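The audio example can be sketched in a few lines of Python. This is only an illustration: it assumes the amplitude arrives as a list of samples and that one positive and one negative sensor watch a single level. The names Spike and threshold_spikes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Spike:
    time: int       # sample index at which the sensor fired
    polarity: str   # "onset" (positive sensor) or "offset" (negative sensor)
    level: float    # amplitude level watched by the sensor


def threshold_spikes(samples, level):
    """Emit an onset spike when the amplitude rises above `level`
    and an offset spike when it falls back to or below it."""
    spikes = []
    for t in range(1, len(samples)):
        prev, cur = samples[t - 1], samples[t]
        if prev <= level < cur:
            spikes.append(Spike(t, "onset", level))
        elif prev > level >= cur:
            spikes.append(Spike(t, "offset", level))
    return spikes


waveform = [0.0, 0.4, 0.9, 0.6, 0.2, 0.7, 1.0, 0.3]
spikes = threshold_spikes(waveform, level=0.5)
for spike in spikes:
    print(spike.time, spike.polarity)
```

With this waveform, the positive sensor fires twice and so does the negative one. Covering many amplitude levels simply means calling threshold_spikes once per level.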

Pattern Learning: Brain Versus Neural Networks

To a spiking neural net, such as the brain's sensory cortex, a pattern is a set of spikes that often arrive concurrently. To a deep neural net, a pattern is a set of data values. Unlike neural networks, the brain's pattern memory does not learn to detect very complex patterns, such as a face, a car, an animal or a tree. Strangely enough, in the brain, the detection of complex objects is not the job of pattern memory but of sequence memory. Pattern memory only learns to detect small elementary patterns (e.g., lines, dots and edges) which are the building blocks of all objects. The brain's sequence memory combines or pools many small pattern signals together in order to instantly detect complex objects, even objects that it has never encountered before.

Note: I will explain the architecture and working of sequence memory in an upcoming article.

Pattern Memory

Knowledge in the brain is organized hierarchically like a tree. In my view (which is, unfortunately, not shared by Jeff Hawkins' team at Numenta), an unsupervised perceptual learning system must have two memory hierarchies, one for pattern detection and the other for sequence detection. As seen in the diagram below, the pattern hierarchy consists of multiple levels arranged like a binary tree. I predict, based on my research, that the brain's pattern hierarchy resides in the thalamus (there is no other place for it to be) and that it has 10 levels. This means that pattern complexity in the brain ranges from a minimum of 2 inputs at the bottom level to a maximum of 1024 inputs at the top level. I have my reasons for this but they are beyond the scope of this article.

Sensors are connected to the bottom level (level 1) of the hierarchy. A pattern neuron (small red filled circle) can have only two inputs. But like a sensor, it can send output signals to an indefinite number of target neurons. Connections are made only between adjacent layers in the hierarchy. This is known as a binary tree arrangement. Every pattern neuron in the hierarchy also makes reciprocal connections to a sequence neuron (not shown) at the bottom level of sequence memory (more on this later). The hierarchical structure of pattern memory makes it possible to learn as many different pattern combinations as possible while using as few connections as possible.
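The 2-to-1024 range follows from the two-input rule: a neuron at level n can combine at most 2^n sensor signals. The short Python check below assumes a full binary tree in which no sensor is shared between branches. The helper names are hypothetical.

```python
def max_sensor_inputs(level):
    """Upper bound on the sensors feeding one pattern neuron at `level`."""
    return 2 ** level


def build_full_tree(level, names):
    """Build a nested (left, right) tuple tree; leaves are sensor names."""
    if level == 0:
        return next(names)
    return (build_full_tree(level - 1, names), build_full_tree(level - 1, names))


def sensors_under(node):
    """Collect the distinct sensors that feed a node, directly or indirectly."""
    if isinstance(node, str):
        return {node}
    left, right = node
    return sensors_under(left) | sensors_under(right)


top = build_full_tree(10, (f"s{i}" for i in range(1024)))
print(max_sensor_inputs(1), max_sensor_inputs(10), len(sensors_under(top)))
```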

Fast Unsupervised Pattern Learning

To repeat, the goal of pattern learning is to discover non-random elementary patterns in the sensory stream. Pattern learning is fully unsupervised in the brain, as it should be. That is to say, it is a bottom-up process dictated solely by the environment and the signals emitted by the sensors. Every learning system is based on trial and error, and as such, must have a critic to correct it in case of error. In the brain, the critic is in the precise temporal correlations between the sensory spikes. The actual pattern learning process is rather simple. It is based on the observation that non-random patterns occur frequently. It works as follows:
  • Start with a fixed number of unconnected pattern neurons at every level of the hierarchy.
  • Make random connections between the sensors and the neurons at the bottom level.
  • If the input connections of a neuron fire concurrently 10 times in a row, the neuron is promoted and the connections become permanent.
  • If a connection fails the test even once, it is immediately disconnected. Failed inputs are quickly resurrected and retried randomly.
As soon as a neuron gets promoted, it can make connections with the sequence hierarchy (not shown) and with the level immediately above its own, if any. The same concurrency test is applied at every level but perfect pattern detection is a must during learning. Excellent results can be obtained even if some inputs are never connected. Pattern learning is fast, efficient and can be scaled to suit different applications. Just use as many or as few sensors and neurons as is necessary for a given task. Connections are sparse, which means that bandwidth requirements are low.
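The concurrency test described above can be sketched as follows in Python. This is a sketch under stated assumptions, not the promised demo program. It assumes that time advances in discrete steps, that a step in which neither input fires is not a test, and that when only one input fires, the silent input is the connection that failed and gets replaced at random. The class name PatternNeuron and its fields are hypothetical.

```python
import random

REQUIRED_STREAK = 10  # the text's figure: 10 concurrent firings in a row


class PatternNeuron:
    """A two-input pattern neuron that is promoted after a perfect streak."""

    def __init__(self, inputs):
        self.inputs = list(inputs)  # ids of the two current input connections
        self.streak = 0
        self.promoted = False

    def observe(self, fired, candidates, rng):
        """Apply the concurrency test for one time step.

        fired: set of ids that spiked at this instant.
        candidates: ids available for random reconnection.
        """
        if self.promoted:
            return  # connections are permanent after promotion
        hits = [source in fired for source in self.inputs]
        if all(hits):
            self.streak += 1
            if self.streak >= REQUIRED_STREAK:
                self.promoted = True
        elif any(hits):
            # Assumption: the silent input is the one that failed the test.
            self.streak = 0
            silent = hits.index(False)
            options = [c for c in candidates if c not in self.inputs]
            if options:
                self.inputs[silent] = rng.choice(options)
        # If neither input fired, there is nothing to test at this instant.


rng = random.Random(0)
sensors = ["a", "b", "c"]
neuron = PatternNeuron(["a", "c"])  # starts with a wrong random guess
for _ in range(11):                 # sensors "a" and "b" always fire together
    neuron.observe({"a", "b"}, sensors, rng)
print(neuron.inputs, neuron.promoted)
```

In this run, the first step exposes the bad connection to "c", which is swapped for "b", and the next 10 perfect steps promote the neuron. The same class can be reused at higher levels by passing the ids of promoted neurons as candidates.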

Given that sensory signals are not always reliable and that only perfect pattern detections are used during learning, the process slows down as one goes up the hierarchy. This limits the number of levels in the hierarchy and the upper complexity of learned patterns. This is why the number of levels in the pattern hierarchy is only 10. In a computer application, we can use fewer levels and get good overall results. The goal is to create enough elementary pattern detectors to enable object detection in the sequence hierarchy. Note that the system does not assume that the world is probabilistic. No probabilistic computations are required. The system assumes that the world is deterministic and perfect. Errors or missing information are attributed to accidents and the system will try to correct them if possible.

But why require 10 consecutive firings? Why not 2, 5 or 20? Keep in mind that this is a search for concurrent patterns that occur often enough to be considered above mere random noise. The choice of 10 is a compromise. Using fewer than 10 would run the risk of learning useless noise while using more than 10 would result in a slow learning process.

Pattern Pruning

The pattern hierarchy must be pruned periodically in order to remove redundancies. A redundancy is the result of a closed loop in the hierarchy.

Looking at the diagram above, we see a closed loop formed by sensor D and the pattern neurons A, B and C. This is forbidden because signals emitted by sensor D arrive at B via two pathways, D-A-B and D-C-B. One or the other must be eliminated. It does not matter which. Note that eliminating a pathway is not enough to prevent the closed loop from forming again. In the diagram above, either pattern neuron A or C should be barred permanently. That is to say, an offending pattern neuron should not be destroyed but simply prevented from forming output connections. This prevents the learning process from repeating the same mistake. In the brain, pattern pruning is done during REM sleep because it would interfere with sensory perception during waking hours. In a computer program, it can be done instantly even during learning.
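Here is a minimal Python sketch of the closed-loop check. It assumes the hierarchy is stored as a dictionary that maps each pattern neuron to its two inputs, and that any id missing from the dictionary is a sensor. The extra sensors E and F and the function names are hypothetical. Rewiring the freed input of B is left out.

```python
def sensors_feeding(node, inputs):
    """Return the set of sensors that reach `node` through any pathway."""
    if node not in inputs:  # not a pattern neuron, so it is a sensor
        return {node}
    result = set()
    for source in inputs[node]:
        result |= sensors_feeding(source, inputs)
    return result


def find_redundant_neurons(inputs):
    """Map each neuron in a closed loop to the sensors that reach it twice."""
    redundant = {}
    for neuron, (left, right) in inputs.items():
        shared = sensors_feeding(left, inputs) & sensors_feeding(right, inputs)
        if shared:
            redundant[neuron] = shared
    return redundant


def choose_barred(inputs):
    """Pick one offending input neuron per loop to bar from output connections."""
    barred = set()
    for neuron in find_redundant_neurons(inputs):
        offenders = [source for source in inputs[neuron] if source in inputs]
        if offenders:
            barred.add(offenders[0])  # arbitrary: it does not matter which
    return barred


# Sensor D reaches B through both A and C, as in the text.
hierarchy = {"A": ("D", "E"), "C": ("D", "F"), "B": ("A", "C")}
print(find_redundant_neurons(hierarchy), choose_barred(hierarchy))
```

A barred neuron is kept in the dictionary, in line with the text: it is not destroyed, only prevented from forming output connections. Enforcing that during learning would be the job of the connection code.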

Pattern Detection

Intuitively, one would expect a pattern neuron to recognize a pattern if all of its input signals arrive concurrently. But, strangely enough, this is not the way it works in the brain. The reason is that patterns are rarely perfect due to occlusions, noise pollution and other accidents. Uncertainty is a major problem that has dogged mainstream AI for decades. The customary solution in mainstream AI is to perform probabilistic computations on sensory inputs. However, this is out of the question as far as the brain is concerned because its neurons are too slow. The brain uses a completely different and rather clever solution and so should we.

Pattern recognition is a cooperative process between pattern memory and sequence memory. During detection, all sensory signals travel rapidly up the pattern hierarchy and continue all the way up to the top sequence detectors of sequence memory where actual recognition decisions are made. If enough signals reach a top sequence detector in the sequence hierarchy, they trigger a recognition event. The sequence detector immediately fires a recognition signal that travels all the way back down to the source pattern neurons which, in turn, trigger their own recognition events. Thus a pattern neuron recognizes its pattern, not when its input signals arrive, but upon receiving a feedback signal from sequence memory. This way, a pattern neuron can recognize a sensory pattern even if the pattern is imperfect.
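The feedback loop can be sketched in Python as follows. The text does not say how many signals are "enough", so the min_signals parameter is a hypothetical stand-in, and the whole hierarchy is collapsed into a single top sequence detector for brevity.

```python
class SequenceDetector:
    """A top-level detector that sends feedback to its source pattern neurons."""

    def __init__(self, sources, min_signals):
        self.sources = set(sources)      # pattern neurons feeding this detector
        self.min_signals = min_signals   # hypothetical threshold for "enough"

    def recognize(self, arrived):
        """Return the pattern neurons that receive a recognition signal.

        arrived: ids of the pattern neurons whose signals reached the detector.
        """
        if len(self.sources & set(arrived)) >= self.min_signals:
            # Feedback goes to every source, including the ones whose
            # own input signals were missing or corrupted.
            return set(self.sources)
        return set()


detector = SequenceDetector(["p1", "p2", "p3", "p4", "p5"], min_signals=4)
recognized = detector.recognize({"p1", "p2", "p3", "p5"})   # p4 was occluded
too_few = detector.recognize({"p1", "p2"})
print(sorted(recognized), sorted(too_few))
```

In the first call, p4 recognizes its pattern even though its own signal never arrived, because recognition is triggered by the feedback and not by the inputs.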

Coming Soon

In the next article in this series, I will explain how to do unsupervised learning in sequence memory. This is where the really fun stuff happens. Hang in there.

See Also:

Unsupervised Machine Learning: What Will Replace Backpropagation?
AI Pioneer Now Says We Need to Start Over. Some of Us Have Been Saying This for Years
In Spite of the Successes, Mainstream AI is Still Stuck in a Rut
Why Deep Learning Is A Hindrance to Progress Toward True AI
The World Is its Own Model or Why Hubert Dreyfus Is Still Right About AI

Wednesday, September 20, 2017

Unsupervised Machine Learning: What Will Replace Backpropagation?

The Great Awakening?

At long last, the AI research community is showing signs of waking up from its decades-old, self-induced stupor. Deep learning pioneer Geoffrey Hinton has finally acknowledged something that many of us with an interest in the field have known for years: AI cannot move forward unless we discard backpropagation and start over. What took him so long? Certainly, the deep learning community can continue on its merry way but there is no question that AI research must retrace its steps back to the beginning and choose a new path. In this article, I argue that the future of machine learning will be based on the precise timing of discrete sensory signals, aka spikes. Welcome to the new age of unsupervised spiking neural networks.

The Problem With Backpropagation

The problem with backpropagation, the learning mechanism used in deep neural nets, is that it is supervised. That is to say, the system must be told when it makes an error. Supervised neural nets do not learn to classify patterns on their own. A human or some other entity does the classification for them. The system only creates algorithmic links between given patterns and given classes or categories. This type of learning (if we can call it that) is a big problem because we must manually attach a label (class) to every single pattern the system must classify and every label can have hundreds if not thousands of possible patterns.

Of course, anybody with a lick of sense knows that this is not how the brain learns. We do not need labels to learn to recognize anything. Backpropagation would require a little homunculus inside the brain that tells it when it activates a wrong output. This is absurd, of course. Reinforcement (pain and pleasure) signals cannot be used as labels since they cannot possibly teach the brain about the myriad intricacies of the world. The deep learning community has no idea how the brain does it. Strangely enough, some of their most famous experts (e.g., Demis Hassabis) still believe that the brain uses backpropagation.

The World Is Its Own Model

Loud denials notwithstanding, supervised deep learning is just the latest incarnation of symbolic AI, aka GOFAI. It is a continuation of the persistent but deeply flawed idea that an intelligent system must somehow model the world by creating internal representations of things in the world. As the late philosopher Hubert Dreyfus was fond of saying, the world is its own model. Unlike a neural net which cannot detect a pattern unless it has been trained to recognize it (it already has a representation of it in memory), the adult human brain can instantly see and understand an object it has never seen before. How is that possible?

This is where we must grok the difference between a pattern recognizer and a pattern sensor. The brain does not learn to recognize complex patterns; it learns how to sense complex patterns in the world directly. To repeat, it can do so instantly even if it has never encountered them before. Unless a sensed pattern is sufficiently rehearsed, the brain will not remember it. And if it does remember it, the memory is fuzzy and inaccurate, something that is well-known to criminal lawyers: eyewitness accounts are notoriously unreliable. But how does the brain do it? One thing is certain: we will not solve the perceptual learning problem unless we get rid of our representationalist baggage. Only then will the scales fall off our eyes so that we may see the brain for what it really is: a sensory organ connected to a motor organ and controlled by a motivation organ.

The Critic Is In the Data

How does the brain learn to see the world? Every learning system is based on trial and error. The trial part consists of making guesses and the error part is a mechanism that tells the system whether or not the guesses are correct. The error mechanism is what is known as a critic. Both supervised and unsupervised systems must have a critic. Since the critic cannot come from inside an unsupervised system (short of conjuring a homunculus), it can only come from the data itself. But where in the data? And what kind of data are we talking about? To answer these questions, we must rely on neurobiology.

How to Make Sense of the World: Timing

One of the amazing things about the cortex is that it does not process data in the programming sense. It does not receive numerical values from its sensors. The cortex only receives discrete signals or spikes. A spike is a discrete temporal marker that indicates that a change/event just occurred. It is not a binary value. It is a precisely timed signal. There is a difference. The brain must somehow find order in the spikes. Here is the clincher. The only order that can be found in multiple sensory streams of discrete signals is temporal order. And there can only be two kinds of temporal order: the signals can be either concurrent or sequential.
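The two kinds of temporal order can be stated in a few lines of Python. This sketch assumes that spike times are numbers on a common clock and that two spikes count as concurrent when they fall within a small tolerance. The window parameter is hypothetical, since the text does not give a value.

```python
def temporal_order(time_a, time_b, window=1):
    """Classify two spikes as concurrent or sequential.

    window: largest time difference still treated as concurrent
    (a hypothetical tolerance chosen for this sketch).
    """
    if abs(time_a - time_b) <= window:
        return "concurrent"
    return "a before b" if time_a < time_b else "b before a"


print(temporal_order(100, 100))
print(temporal_order(100, 140))
print(temporal_order(140, 100))
```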

This here is the key to unsupervised learning. In order to make sense of the world, the brain must have the ability to time its sensory inputs. In this light, the brain should be seen as a vast timing mechanism. It uses timing for everything, from perceptual learning to motor behavior and motivation.

Coming Soon

In my next article, I will explain how sensors generate spikes and how the brain uses timing as the critic for fast and effective unsupervised learning. I will also explain how it creates a fixed set of small elementary concurrent pattern detectors/sensors as the building blocks of all perception. It uses the same elementary pattern sensors to sense everything. It also uses cortical feedback to handle uncertainty in the sensory data. Hang in there.

See Also:

Fast Unsupervised Pattern Learning Using Spike Timing
AI Pioneer Now Says We Need to Start Over. Some of Us Have Been Saying This for Years
In Spite of the Successes, Mainstream AI is Still Stuck in a Rut
Why Deep Learning Is A Hindrance to Progress Toward True AI
The World Is its Own Model or Why Hubert Dreyfus Is Still Right About AI

Saturday, September 16, 2017

AI Pioneer Now Says We Need to Start Over. Some of Us Have Been Saying This for Years

This Bothers Me

This is just a short post to point out how progress in science and technology can be held back by those who set themselves up as the leaders. Artificial Intelligence pioneer Geoffrey Hinton now says that we should discard backpropagation, the learning technique used in deep neural nets, and start over. This bothers me because I and many others have been saying this for years. Some of us, including Jeff Hawkins, have known that this was not the way to go since the 1990s. Here is an article I wrote about this very topic back in 2015: Why Deep Learning Is a Hindrance to Progress Toward True AI.

Demis Hassabis, the Champion of Backpropagation

What is amazing about this is that Geoffrey Hinton is a famous Google employee (engineering fellow) and AI expert. He is now directly contradicting Demis Hassabis, another famous Google employee and co-founder of DeepMind, an AI company that has been acquired by Google. Hassabis and his team at DeepMind recently published a peer-reviewed paper in which they suggested that backpropagation is used by the brain and that their research may uncover biologically plausible models of backprop. I wrote an article about this recently: Why Google's DeepMind Is Clueless About How Best to Achieve AGI.

I find the whole thing rather annoying because these are people who are paid millions of dollars to know better. Oh, well.

See Also:

Unsupervised Machine Learning: What Will Replace Backpropagation?
In Spite of the Successes, Mainstream AI is Still Stuck in a Rut
Why Deep Learning Is A Hindrance to Progress Toward True Intelligence
Mark Zuckerberg Understands the Problem with DeepMind's Brand of AI
The World Is its Own Model or Why Hubert Dreyfus Is Still Right About AI