Thursday, May 26, 2011

Jeff Hawkins, Atheism, Christianity and the Brain

Abstract

Several people have written to me recently to point out that my theory of intelligence is very similar to that of Jeff Hawkins and the folks at Numenta, a company Hawkins co-founded in 2005 with Dileep George (1) and Donna Dubinsky. The amazing thing is that Hawkins and I have arrived at a similar understanding of the brain via very dissimilar routes. Hawkins draws his inspiration from his knowledge of neuroscience and I get mine mostly from my interpretation of ancient Biblical metaphorical texts. I just finished reading a portion of Numenta's HTM Cortical Learning Algorithms (pdf). I think I now understand enough about the theory underlying HTM to form an educated opinion. Let me come right out and say that I think that Numenta's overall philosophy with regard to the function and operation of the cortex is basically sound but their current design is flawed. Furthermore, there is no way I can cooperate with Hawkins, Numenta or anybody associated with their HTM technology. I explain below.

Temporality, Hierarchy and Prediction

In 2004, Hawkins published his book, On Intelligence, in which he revealed his theory of intelligence. Based on the parts of the book that I've read, I think that Hawkins makes a convincing case for his ideas. He argues that the ability to anticipate the future and to learn sequences of patterns is the basis of intelligence. He further argues that cortical learning is universal in the sense that the neocortex uses the same learning and prediction algorithms to process visual, tactile or auditory information. What is important, he maintains, is the temporal relationships that can be inferred from parallel streams of discrete sensory data. Those relationships can be learned and stored in a hierarchical memory structure. I think others have made similar arguments before but Hawkins is the first to pin it down in an easy-to-read book. I essentially agree with Hawkins' position on intelligence. As some of my readers already know, I have been saying pretty much the same thing for many years.

Complexity

After reading Dileep George's interesting PhD thesis, How the Brain Might Work: A Hierarchical and Temporal Model for Learning and Recognition (pdf), I can't help but conclude that academics, in general, love to complicate things just for the sake of complexity. It might also be a way to impress one's peers. I mean, unless one can show one's mathematical prowess, regardless of its relevance to one's thesis, one can forget about becoming a doctor of philosophy. Dr. George apparently believes that mathematics, in the form of Bayesian belief propagation equations, is essential to perceptual learning and recognition. I think he is mistaken. I believe that a fully functional artificial brain can be implemented with nothing fancier than basic arithmetic operations. What Dr. George calls belief propagation can be done simply by having every node at the input level in a hierarchy trigger a recognition signal whenever its input signals add up to a majority. This works for both sequential and concurrent pattern recognition. Repeat the same method at every level in the hierarchy until an entire branch is activated, indicating that a certain object has been recognized. By the way, this is what Zechariah was alluding to in his little occult book when he wrote:
[...] Behold, I will bring forth my servant the branch.
[...] And I will remove the iniquity of that land in one day.
The books of Zechariah and Revelation use terms like filthy garments or iniquity to refer to noisy and incomplete data. The ability to work with corrupted (filthy) data is what pattern recognition experts call pattern completion. It is a natural consequence of an intelligent system's predictive capability. It is a very powerful and effective mechanism, yet extremely simple. I know this because I use it in Animal's tree of knowledge. It is powerful because it allows an intelligent system to recognize sensory patterns even in situations where only partial or noisy information is available. Again, it can all be done with simple arithmetic operations. No need for any fancy math. In my opinion, mathematics is very much overrated. I have found that it is not needed in almost all cases.
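To make the majority-trigger idea concrete, here is a minimal sketch (illustrative only; the node and signal names are made up, and this is not Animal's actual code). Each node fires when more than half of its expected inputs are active, so partial or noisy (filthy) data can still activate a branch:

```python
# Minimal sketch of majority-vote recognition with pattern completion.
# Node and signal names are hypothetical, for illustration only.

def node_fires(expected_inputs, active_signals):
    """A node recognizes its pattern when a majority of its expected
    inputs are present, so incomplete or noisy data still triggers it."""
    hits = sum(1 for s in expected_inputs if s in active_signals)
    return hits * 2 > len(expected_inputs)

# Two-level hierarchy: low-level nodes feed a higher-level object node.
low_level = {
    "edge_a": {"s1", "s2", "s3"},
    "edge_b": {"s4", "s5", "s6"},
    "edge_c": {"s7", "s8", "s9"},
}

def recognize(active_signals):
    # Level 1: which low-level nodes fire on the raw signals?
    fired = {name for name, inputs in low_level.items()
             if node_fires(inputs, active_signals)}
    # Level 2: the object node fires on a majority of its child nodes.
    return node_fires(set(low_level), fired)

# Complete data recognizes the object...
print(recognize({"s1", "s2", "s3", "s4", "s5", "s7", "s8"}))  # True
# ...and so does partial, noisy data, as long as a majority survives.
print(recognize({"s1", "s2", "s4", "s5", "noise"}))           # True
```

Notice that there is nothing here beyond counting and comparing: no probabilities, no belief propagation equations.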

[Why do I bring up my work on Biblical symbolism in this critique? Because, as seen in the last section below, it is a crucial part of the point that I want to make.]

Non-Universality

In spite of Hawkins' claim that Numenta's HTM technology is universal in that it can handle any kind of sensory learning, Numenta's current focus is restricted to visual learning and recognition. So it is no surprise that their literature frequently mentions spatial patterns as seen below:
An HTM region learns about its world by finding patterns and then sequences of patterns in sensory data. The region does not “know” what its inputs represent; it works in a purely statistical realm. It looks for combinations of input bits that occur together often, which we call spatial patterns. It then looks for how these spatial patterns appear in sequence over time, which we call temporal patterns or sequences.
I think that using the term "spatial patterns" to refer to concurrent inputs is distracting and misleading because, from the point of view of the learning algorithm, there is no such thing as a spatial signal. It is all temporal, in the sense that the only relationships that can exist between discrete signals are simultaneity and sequentiality. The problem is that Numenta's approach forces them to determine in advance the boundaries of those spatial patterns. For example, in the case of vision, the designer is forced to select a small square area of pixels to act as inputs for the low-level concurrent pattern learner. This is a mistake, in my opinion, primarily because important information can be overlooked as a result of using arbitrarily restrictive boundaries. One wonders how Numenta would set boundaries for concurrent audio patterns.

[Diagram: a five-node sequence]
Furthermore, I disagree with Numenta's idea that concurrent patterns (depicted as multi-input nodes in the above diagram) must be learned before sequential patterns. In my opinion, the suitability of input signals to a given concurrent pattern (a node in a particular sequence) is not determined solely by their simultaneity but also by whether or not the node belongs to the sequence. In other words, frequency is the main fitness criterion for temporal learning. I have written about this before.

Note: I have since changed my mind on this. I now believe that patterns must be learned independently of sequences. (10/23/13)

Christian AI Versus Atheist AI

Hawkins is very dismissive of those he disagrees with although I don't fault him for that. I think that, right or wrong, we should all have the courage to stand for what we believe in. Hawkins essentially dismisses the entire symbolic approach to AI taken during the latter part of the twentieth century by early AI pioneers like Herbert Simon, Marvin Minsky, John McCarthy and many others. He does not come right out and say that the symbolic AI gang is out to lunch (which they are, in my view) but it is obvious that his ideas on intelligence leave no room for all that symbolic nonsense. However, even though Hawkins preaches that atheists should not go around proclaiming their atheism for fear of antagonizing religious folks (the majority), he himself makes no bones about the fact that he is an atheist and an evolutionist. He wears the label proudly. Those of us who believe that the universe was intelligently designed and created are all a bunch of idiots in his view.

One of the reasons that I am bringing this up is that someone suggested in a recent comment that I should embrace Hawkins' technology and consider partnering with others on AI projects based on Numenta's HTM. The truth is that I have indeed thought of doing so in the past but I have since decided that Hawkins' virulent atheism turns me off. Not just because I am a Christian, but because Hawkins preaches that the best way to promote atheism is for atheists to accomplish great technological and scientific feats in the name of atheism. I, by contrast, am of the opinion that the best way to advance Christianity is for Christians to accomplish great technological and scientific feats in the name of Christianity. And, like Hawkins, I put my money where my mouth is. I am already claiming that I obtained almost all of my understanding of the brain and intelligence from deciphering a few ancient Biblical metaphorical texts. I am even willing to go out on a limb and claim that, based on my interpretation of the ancient texts, I understand enough about the brain to write a computer program that, given adequate computing resources, will learn to behave intelligently in a manner similar to human beings.

It may indeed be possible to study human behavior and the brain's biology and use one's findings to eventually figure out how human intelligence works but, in my opinion, this approach will take a very long time. It is the approach that I originally took when I first became interested in AI. It did not get me very far because neuroscience is too chaotic. Finding relevant information by browsing the literature is like searching for the proverbial needle in a haystack. I have since found what I believe to be a much better and faster way to solve the AI problem. I realize that my approach is rather unconventional but I have researched it on and off for more than eight years and I am convinced now more than ever that I am on the right track. After all, how is it possible that my findings are in basic agreement with neuroscience and Hawkins' own ideas?

There can be no doubt that my world view and my approach to AI is anathema to atheists like Hawkins. Paul Z. Myers, an atheist biologist who seems to have a major bone to pick with Christians, once wrote an entire blog article to ridicule me. But there is a flip side to this coin. I am just as dismissive of Hawkins and his atheist colleagues in the scientific community as they are of Christians like me. Hawkins is one of those deeply religious people (we are all religious, especially if we are convinced that we are not) that I have taken to calling dirt believers. Essentially, a dirt believer is a person who is convinced that matter (dirt) sprang out of nothing all by itself and that life sprang out of dirt all by itself. I think this is an excruciatingly idiotic view (for reasons that I will not go into because they are beyond the scope of this post). Consequently, there is no way I will cooperate or partner with either Jeff Hawkins, Dileep George or any other atheist in intelligence research or pretty much anything else. The way I see it, what we have here is a battle between the atheist AI and the Christian AI. May the best religion win.

See Also:

Missing Pieces in Numenta's Memory Model
Rebel Cortex

1. Dr. Dileep George has left Numenta to form his own AI company, Vicarious Systems. Like Hawkins, Dr. George is also an atheist.

8 comments:

Luther said...

I have been reading your blog for about a year now. I have gone back and read all of your posts in the archive, and all of the articles on your website.

I must say, I am extremely impressed! I am not very old (27 in June), but I have been fascinated by AI for roughly 12 years now. My area of interest is in chatter bots: programs that learn to speak intelligently. I did my research in the form of reading books on psychology, theories of how emotion and memory function, and on general programming, just to name a few. I looked at code for current chatter bots and dissected how they functioned and analyzed where they failed.

About 4 years ago I came up with an extremely vague idea of how to accomplish my goal. I had an idea for storing the memory of the bot, for weighting the inputs to make them more useful, and even a way to correct output for the bot on the fly. What I lacked was a suitable way to parse the saved information and make a decision of what to say now based on all this information.

Fast forward to now, and I read about your tree of knowledge. At first I thought that it was SIMILAR to my idea. As I read, I realized that it was exactly the SAME as my idea! I was dumbfounded! It was like you were reading out of my private notebooks.

I say this to let you know that your last two posts have given me the idea I was missing. They showed me a way to generate the output! Previously I was treating the input and the output as two separate signals. But I see now where this is wrong. The input and output need to go in the same tree. Then the output will follow the input as sure as night follows day! And using the correction mechanism (reward and punishment, just like yours), I could then account for deviations from the input/output patterns for different topics, or “moods”. It is so very simple as to be almost blinding!

I am also Christian. I do not claim to get any of my research data from the bible, but I do believe the verse which states that the bible contains everything related to “life and Godliness”.

That said, I would love to get in contact with you and “compare notes”.

-Looking forward to the next Article
Luther M. Ramsey

Louis Savain said...

Luther,

Thanks for the comment and for your interest in my work. I must say that chatter bots never really piqued my interest because I saw them as just a continuation of the symbolic AI and Turing test mindset of the last century. I think that the symbol manipulation approach to AI is a waste of time and effort. In fact, I think it's a stupid way to approach the intelligence problem. As you know, my own approach considers only discrete sensory signals and their temporal relationships.

Having said that, I can see how text strings can be viewed as temporal data (sequences) created by various sensors. In this light, it should be possible to store the sequences in a hierarchical tree and this could conceivably lead to a powerful memory structure that can be used to generate intelligent text strings. However, I have serious doubts about the ultimate effectiveness of this approach.

The problem is that it is not grounded in the kind of sensory experience that comes only with interaction with the environment. Text strings have no meaning in and of themselves. Meaning in text or speech must come from its association with our knowledge of the world around us.

On the subject of output, you are correct that the input and the output are in the same tree. I would even say that the input is the output. This is not very obvious and it took me a long time to get it but, once you do get it, it's like being struck by a sudden bolt of enlightenment.

I am also Christian. I do not claim to get any of my research data from the bible, but I do believe the verse which states that the bible contains everything related to “life and Godliness”.

Yes. The Master does nothing that he has not revealed to his servants the prophets of old.

That said, I would love to get in contact with you and “compare notes”.

Unfortunately, I have very little spare time for personal correspondence. I get too much email as it is. Keep reading my articles and post comments and questions on the blog if you need to. I try to answer all comments as time permits. Above all, hang on to your faith. Forget about righteousness (we don't have any). Faith is what's important because, as it is written, when the Master returns, will he find faith in the world? Take care.

Luther said...

Having said that, I can see how text strings can be viewed as temporal data (sequences) created by various sensors. In this light, it should be possible to store the sequences in a hierarchical tree and this could conceivably lead to a powerful memory structure that can be used to generate intelligent text strings. However, I have serious doubts about the ultimate effectiveness of this approach.

The problem is that it is not grounded in the kind of sensory experience that comes only with interaction with the environment. Text strings have no meaning in and of themselves. Meaning in text or speech must come from its association with our knowledge of the world around us.


I see what you are saying, but my goal is not to make something capable of thought. My goal is to make something that looks like it is capable of thought.

By using an automated "part of speech" detection system, of which many exist, I can break input into parts of speech, then store them in a specialized memory with a unique identifier, the part of speech, and the word itself. Then the TOK links to the unique identifier instead of the actual word, which will greatly compress the amount of space required for the tree, and it will be able to see sentence structure, not just the words in use as part of the TOK, without having to add any extra storage.

Whether it uses the structure or the actual words as its output will be decided based on weights, and its short-term memory (a smaller, specialized TOK) will help to weight the search for context.
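Roughly, the interning scheme could look like this (a sketch only; the tagger output here is hand-made, and a real system would use an automated part-of-speech detector):

```python
# Sketch of storing each (part-of-speech, word) pair once under a
# unique identifier so the tree of knowledge can reference compact
# IDs instead of strings. The tagged sentence below is hypothetical.

class Lexicon:
    def __init__(self):
        self.by_key = {}   # (pos, word) -> id
        self.by_id = []    # id -> (pos, word)

    def intern(self, pos, word):
        """Return the existing id for this pair, or assign a new one."""
        key = (pos, word)
        if key not in self.by_key:
            self.by_key[key] = len(self.by_id)
            self.by_id.append(key)
        return self.by_key[key]

lex = Lexicon()
tagged = [("DET", "the"), ("NOUN", "dog"), ("VERB", "sees"),
          ("DET", "the"), ("NOUN", "cat")]
sentence_ids = [lex.intern(pos, w) for pos, w in tagged]
print(sentence_ids)   # [0, 1, 2, 0, 3] -- repeated ("DET", "the") reuses id 0
# The sentence structure is recoverable from the IDs alone:
pos_pattern = [lex.by_id[i][0] for i in sentence_ids]
print(pos_pattern)    # ['DET', 'NOUN', 'VERB', 'DET', 'NOUN']
```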

You say you are not sure it can be extended to fit with chatter bots, but by your own admission, you have not looked into them much.

I could well be wrong (it has happened), but I think this will work.

As to lacking a sense of meaning, children astound me with their ability to use words in proper context without the slightest idea what the words mean. Pattern matching (which computers excel at) is all that is required for that.

-Luther

Louis Savain said...

Luther,

I see what you are saying, but my goal is not to make something capable of thought. My goal is to make something that looks like it is capable of thought.

Sorry. I misunderstood.

By using an automated "part of speech" detection system, of which many exist, I can break input into parts of speech, then store them in a specialized memory with a unique identifier, the part of speech, and the word itself. Then the TOK links to the unique identifier instead of the actual word, which will greatly compress the amount of space required for the tree, and it will be able to see sentence structure, not just the words in use as part of the TOK, without having to add any extra storage.

I think the TOK's temporal learning algorithm can discover what you call "parts of speech" automatically from many samples of pre-written text. So I don't think you need to separate the low-level detection system from the TOK. What I mean is that the learning algorithm is the same for every level of the TOK hierarchy. Remember that every node in the hierarchy is a short sequence of lower-level nodes. Also, the TOK is very efficient because it reuses nodes as much as possible.
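Here is a toy sketch of what I mean (illustrative only, not the actual Animal implementation): frequent short chunks of the input stream are promoted to nodes, identical chunks reuse the same node, and the rewritten stream can be fed to the next level of the hierarchy with the same algorithm.

```python
# Toy sketch of one TOK level: frequent fixed-length chunks of the
# stream become nodes; identical chunks reuse the same node.
# Hypothetical illustration only, not Animal's real code.

from collections import Counter

def learn_nodes(stream, chunk=2, min_count=2):
    """Promote every chunk seen at least min_count times to a node."""
    counts = Counter(tuple(stream[i:i + chunk])
                     for i in range(len(stream) - chunk + 1))
    nodes = {}
    for seq, c in counts.items():
        if c >= min_count:
            nodes[seq] = f"N{len(nodes)}"   # one shared node per sequence
    return nodes

def rewrite(stream, nodes, chunk=2):
    """Re-express the stream as node IDs (greedy, left to right).
    The result can serve as input to the next level up."""
    out, i = [], 0
    while i < len(stream):
        seq = tuple(stream[i:i + chunk])
        if seq in nodes:
            out.append(nodes[seq])
            i += chunk
        else:
            out.append(stream[i])
            i += 1
    return out

stream = list("abcabcabd")
nodes = learn_nodes(stream)
print(nodes)                    # ('a','b'), ('b','c'), ('c','a') become nodes
print(rewrite(stream, nodes))   # ['N0', 'N2', 'N1', 'N0', 'd']
```

Note how the node N0 is reused twice in the rewritten stream; that reuse is what keeps the tree compact.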

Whether it uses the structure or the actual words as its output will be decided based on weights, and its short-term memory (a smaller, specialized TOK) will help to weight the search for context.

I see short term memory simply as an active branch in the tree of knowledge. The branch can be low-level (syllables and words) or high level (phrases and sentences). In Animal, a node consists only of seven lower nodes. This is in keeping with the fact that STM capacity in humans is limited to seven items.

You say you are not sure it can be extended to fit with chatter bots, but by your own admission, you have not looked into them much.

I think you misunderstood. I actually think the hierarchical temporal approach is well suited to text processing and chatter bots. I just don't think a tree of knowledge based solely on text strings is sufficient for generating truly intelligent conversation.

As to lacking a sense of meaning, children astound me with their ability to use words in proper context without the slightest idea what the words mean. Pattern matching (which computers excel at) is all that is required for that.

Yes, I agree. It will show an ability to stay within context and that will be enough to appear intelligent most of the time. You may even win the Loebner Prize with it. However, I don't think it will show a true causal understanding of what is being talked about. You will need true AI for that. At any rate, it all sounds very interesting and it is a good way to show the power of hierarchical memory.

juha.ranta said...

Robert Hecht-Nielsen also has a model of cortex. In some sense, and by a hunch, it seems even more interesting than that of Hawkins. I've got his book and did some of the exercises. In addition, there's a paper using his model about the cocktail party problem, which I find a very interesting problem.

However, related to what Louis said above, I think you need understanding in addition to just words to create truly intelligent speech and interaction. Hecht-Nielsen's model can create some quite impressive "chatter bot" type of speech, but it's still not a comrade you can discuss things with.

I think that, roughly, some areas of interest in the human brain are Broca's area and Wernicke's area. For instance, if your Wernicke's area is injured, you may start to produce fluent speech which basically makes no sense.

Louis Savain said...

Juha,

Thanks for the comment. I am not familiar with Hecht-Nielsen's model. After a quick search on Google, I see that his confabulation theory also uses a hierarchy. It seems that the use of Bayesian propagation within a hierarchically organized memory has been a common aspect of the more powerful cortical models for some time. So I don't really see what is new with Hawkins' HTM technology other than his insistence on temporality and prediction.

One of the things about what I've read so far of confabulation theory that immediately made me cringe is Hecht-Nielsen's frequent use of the word "symbol" to describe a memory unit or node. I think this is a relic of last century's symbolic AI nonsense and it should not be used when talking about the brain. I also object to his (and Hawkins') use of the word "spatio-temporal". There is nothing spatial about discrete signals. It's all temporal.

One of the problems I have with HTMs is that it does not seem to have a mechanism for speed (rate of change) detection. At least, I haven't seen it yet. We use speed detection all the time to predict the motion of various objects. For example, we use it when driving or when crossing the street in order to predict the behavior of moving vehicles and pedestrians. We could not survive without this capability.
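What I have in mind is something as simple as this (a bare-bones sketch of my own, not anything taken from HTM): compare successive positions of a tracked signal to estimate a rate of change and extrapolate the next position.

```python
# Bare-bones sketch of speed (rate of change) detection by finite
# differences. Hypothetical illustration; units are arbitrary ticks.

def estimate_speed(positions, dt=1.0):
    """Finite-difference speed estimate from the last two positions."""
    return (positions[-1] - positions[-2]) / dt

def predict_next(positions, dt=1.0):
    """Linear extrapolation: where will the object be next tick?"""
    return positions[-1] + estimate_speed(positions, dt) * dt

track = [0.0, 2.0, 4.0, 6.0]    # a vehicle moving at 2 units per tick
print(estimate_speed(track))     # 2.0
print(predict_next(track))       # 8.0
```

This is the kind of prediction we perform constantly when crossing the street, and I see no mechanism for it in Numenta's published design.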

Another problem is that HTMs receive signals directly from sensors. I think this is a serious mistake. I am convinced that sensory signals must first go through a separation layer where signals are separated according to fixed temporal correlations over multiple fixed time scales. The tree of knowledge (hierarchical memory) should only be used for variable time scales. The ancient texts are adamant about this.

juha.ranta said...

Thanks for the comment. I am not familiar with Hecht-Nielsen's model. After a quick search on Google, I see that his confabulation theory also uses a hierarchy. It seems that the use of Bayesian propagation within a hierarchically organized memory has been a common aspect of the more powerful cortical models for some time. So I don't really see what is new with Hawkins' HTM technology other than his insistence on temporality and prediction.

Yeah, I'm a bit fed up with Hawkins taking so much attention, and I felt the same way years back. I bought his book "On Intelligence" and while I enjoyed it, much of the stuff was something I had already read somewhere else or figured out already. Further, I'm afraid he'll try to patent some general rules of the way our brain works.

Some say his new idea is that the whole cortex uses the same simple rules, but people such as Hecht-Nielsen have already been saying the same thing for a long time.

By the way, concerning the Bayesian thing..

"One of the reasons confabulation theory was probably not discovered long ago is that cogency maximization is NOT consistent with so-called 'Bayesian mathematics' (which essentially calls for choosing that conclusion which has the highest probability of being true, given the assumption that the assumed facts are true). The Bayesian mathematics juggernaut (a system of beliefs, not indisputable facts) has dominated many areas of information processing research for decades. This dominance probably strongly deterred researchers from considering other possibilities."

http://www.scholarpedia.org/article/Confabulation_theory_(computational_intelligence)

I don't understand Bayesian maths enough to comment much further on this.


One of the things about what I've read so far of confabulation theory that immediately made me cringe is Hecht-Nielsen's frequent use of the word "symbol" to describe a memory unit or node. I think this is a relic of last century's symbolic AI nonsense and it should not be used when talking about the brain. I also object to his (and Hawkins') use of the word "spatio-temporal". There is nothing spatial about discrete signals. It's all temporal.


Yeah, I've seen some documents where the memory nodes (groups of neurons) are called symbols. For instance, in some region of the brain (I'd say somewhere around the temporal lobes) there may be a "symbol" where the general idea of the "apple" is stored. This symbol may have connections to symbols in other regions of the brain which are connected to the idea of "apple". For instance, the symbol for the word "apple" probably has a connection with this symbol meaning the "apple". Other things the "apple" symbol is perhaps connected to include the "red" symbol in the visual region, the "munching" symbol in some motor area, the apple texture symbol in the touch area, etc.

So, though he uses the perhaps unfortunate term "symbol", he's not really trying to reinvent the symbolic AI.

Louis Savain said...

Yeah, I'm a bit fed up with Hawkins taking so much attention, and I felt the same way years back. I bought his book "On Intelligence" and while I enjoyed it, much of the stuff was something I had already read somewhere else or figured out already. Further, I'm afraid he'll try to patent some general rules of the way our brain works.

Yeah. It's disturbing, to say the least. If you do a Google search for numenta patents, it's kind of scary what comes up. According to faqs.org Numenta has at least 12 patents on hierarchical temporal memory. Hawkins seems to be in this field primarily for the money he hopes to make from it. He's setting himself up for years of IP litigation because the way the brain works cannot possibly be patentable (how do you patent sequences and hierarchies?). He can only patent his particular computer implementation (the source code) of the cortex. Anyone else can write their own implementation if they understand the principles involved. Hawkins sounds a little bit deluded if you ask me.

"One of the reasons confabulation theory was probably not discovered long ago is that cogency maximization is NOT consistent with so-called 'Bayesian mathematics' (which essentially calls for choosing that conclusion which has the highest probability of being true, given the assumption that the assumed facts are true). The Bayesian mathematics juggernaut (a system of beliefs, not indisputable facts) has dominated many areas of information processing research for decades. This dominance probably strongly deterred researchers from considering other possibilities."

http://www.scholarpedia.org/article/Confabulation_theory_(computational_intelligence)

I don't understand Bayesian maths enough to comment much further on this.


I am not so sure that maximization of cogency is really that different from Bayesian belief propagation. Personally, I think too much is being made of both approaches. The problem to be solved is really not that complicated. No complex math is needed. Essentially, there is a huge number of sensory signals competing for the brain's attention. Various branches of the tree of knowledge must decide on whether or not to wake up (activate) based on the number of signals that satisfy their nodes.

The problem is that there are many objects (phenomena) in one's environment that can be recognized simultaneously. It's a problem because the brain must focus on one thing at a time, otherwise it cannot behave in a coherent manner. I don't think that either Numenta or the others have come up with an effective solution yet. Sure, one can use a winner-take-all solution but there are other things to consider such as which object is more important to one's survival or which one is more interesting to the intelligent agent. How does one define 'interesting'?
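A bare-bones winner-take-all sketch might look like this (the scoring rule, branch names and weights are all hypothetical, purely for illustration; the hard, unsolved part is where the salience weights come from):

```python
# Sketch of winner-take-all attention: each recognized branch competes
# with a score mixing recognition strength and a salience weight that
# stands in for "importance to survival" or "interest". All values
# here are made up for illustration.

def select_focus(candidates, salience):
    """candidates: {branch: fraction of its nodes satisfied}.
    salience: {branch: assumed importance weight}.
    Returns the single branch that wins the competition."""
    def score(branch):
        return candidates[branch] * salience.get(branch, 1.0)
    return max(candidates, key=score)

recognized = {"oncoming_car": 0.6, "shop_window": 0.9, "birdsong": 0.8}
weights = {"oncoming_car": 5.0, "shop_window": 1.0, "birdsong": 0.5}
print(select_focus(recognized, weights))   # oncoming_car
```

Note that the weakly recognized but highly salient branch wins over the strongly recognized but unimportant ones; hand-coding those weights is easy, but learning them is the real problem.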

So, though he uses the perhaps unfortunate term "symbol", he's not really trying to reinvent the symbolic AI.

OK. I'll forgive him for the use of the term. However, it bothers me because it smacks of political correctness on his part, as if he's trying hard not to offend the likes of Marvin Minsky and the rest of the symbolic gang.