In The Myth of the Bayesian Brain, I wrote that there is a trick to learning perfect patterns but that I could not divulge it at the time. The trick has to do with properly factoring patterns so as to maximize the reuse of low-level patterns in the composition of higher-level patterns. In my opinion, the pattern hierarchy used by the brain is the ultimate classification and data compression mechanism in existence. The trick is amazingly simple, and I wondered whether anybody else had thought about the problem or come up with a solution. In an unusually revealing article in today's New York Times Bits about Numenta's Grok technology, I was pleasantly surprised to read that Numenta's founder, Jeff Hawkins, has thought about it:
Patterns of one or the other are reinforced over time. As new data streams in, the brain figures out if it is capturing more complexity, which requires either modifying the understanding of the original pattern or splitting it into two patterns, making for new knowledge. Sometimes, particularly if it is not repeated, the data is discarded as irrelevant information. Thus, over time, sounds become words, words occupy a grammatical structure, and ideas are conveyed.

Assuming that my definition of pattern (a concurrent group of related signals) is the same as Hawkins', I find the above amazing. Not only does Hawkins understand the power of pattern hierarchies (what others are calling deep learning), he also seems to grok the need for efficient composition. Grok's ability to split a pattern into two constituents is what caught my attention. If Numenta really knew how to do this automatically and efficiently, they would have a genuine breakthrough. Maybe it is time for Hawkins to consider applying Grok to other perceptual learning problems such as speech or visual recognition. I mean, if Grok's underlying technology is as great as he portrays it to be, it should be able to solve something like the cocktail party problem. That would truly be impressive.
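To make the splitting idea concrete, here is a minimal toy sketch, and nothing more than that: it is not Numenta's algorithm, and every name in it is my own invention for illustration. It assumes the article's definition of a pattern as a concurrent group of related signals, stores each pattern as a set of co-occurring signals with a repetition count, splits a stored pattern into a shared part and a leftover part when new data only partially matches it, and discards patterns that are not repeated:

```python
# Illustrative sketch only -- not Numenta's or Hawkins' actual method.
# A "pattern" here is a frozenset of signals that fire together.

class PatternStore:
    def __init__(self, min_count=2):
        self.counts = {}            # pattern (frozenset) -> observation count
        self.min_count = min_count  # patterns seen fewer times are noise

    def observe(self, signals):
        obs = frozenset(signals)
        for pat in list(self.counts):
            shared = pat & obs
            if shared and shared != pat:
                # New data captures more structure: split the stored
                # pattern into the shared part and the leftover part.
                n = self.counts.pop(pat)
                self.counts[shared] = self.counts.get(shared, 0) + n
                rest = pat - shared
                if rest:
                    self.counts[rest] = self.counts.get(rest, 0) + n
        self.counts[obs] = self.counts.get(obs, 0) + 1

    def patterns(self):
        # Data that is not repeated is discarded as irrelevant.
        return {p for p, n in self.counts.items() if n >= self.min_count}
```

For example, after observing {'a', 'b', 'c'} twice and then {'a', 'b'} once, the store splits the original pattern into {'a', 'b'} and {'c'}. A real system would of course need to do this statistically and hierarchically, which is exactly the hard part.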
The Bayesian Curse
Having said that, there is no doubt in my mind that whatever solution Hawkins came up with is handicapped by the Bayesian mindset that afflicts the artificial intelligentsia. It is, in all likelihood, a complicated kludge. I say this because of the way the problem is phrased in the NYT Bits article. Knowing what I know, the fact that Grok needs to split patterns into smaller patterns tells me that Hawkins is aware of the problem, but his Bayesian glasses prevent him from seeing the correct solution. The correct solution does not involve splitting complex patterns into smaller ones, because it automatically prevents the formation of patterns that are more complex than their level in the hierarchy requires. Hawkins is so close, yet so far.
The Future Is Not What It Used to Be
Brain-like artificial intelligence will arrive on the world scene much sooner than the AI community expects and it will come from a most unexpected and inconvenient source. Stay tuned.
Jeff Hawkins Develops a Brainy Big Data Company
The Second Great AI Red Herring Chase
The Myth of the Bayesian Brain
The Holy Grail of Robotics