Thursday, September 8, 2011

Rebel Speech Update, September 8, 2011

Signal Separation Blues

Well, it seems that I may have to revisit my signal separation hypothesis. I can't get it to work properly. I get the best results by completely bypassing the separation layer and feeding the sensory signals directly to the bottom level of the tree of knowledge (TOK), aka hierarchical memory. If you remember, I wrote a while back that sensory signals must first go through a fixed time scale separation layer before they are fed to the TOK where they are organized according to a variable time scale. At this point, I'm thinking that the problem may be due to some bug in my code (believe me, this shit is complicated) that is preventing the separation layer from doing what it's supposed to do.

Patterns and Sequences

For now, I am putting signal separation on the back burner and moving full speed ahead on coding the TOK. Learning sequences of patterns, while easy in principle, is a pain in the ass to program. And I am not even talking about the task of implementing a spiking neural network framework and all the classes required for a variety of neurons and synapses. And let's not forget the workhorse thread that runs the parallel simulation underneath and the management module that is required for making and severing connections without causing a fatal runtime exception. Luckily, I had most of the neural stuff done from my work on Animal. Still, precisely timing events on a serial von Neumann computer is a daunting task.

Surprisingly enough, it turns out that, during learning, the system must rely on deterministic timing. In other words, a pattern is considered recognized only if all of its input signals arrive concurrently. Contrast this with the probabilistic Bayesian learning approach used in Numenta's HTM and the probabilistic Hidden Markov Model used in most commercial speech recognition programs. As I have written previously, in the Rebel Cortex intelligence hypothesis, probability only comes into play during actual recognition, when multiple sequences and branches of the hierarchy compete in a winner-take-all struggle to become active.

Deterministic timing does not mean, however, that two concurrent signals must always occur concurrently. It means that they must occur concurrently often enough to be captured by the learning mechanism and recognized whenever they reoccur. Only the learning mechanism needs to be deterministic, not the signals and certainly not the act of recognition.
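To make the distinction concrete, here is a minimal Python sketch of the idea, not the actual Rebel Speech code. It assumes time is cut into discrete steps, that "concurrent" means "active in the same step", and that a pair of signals is learned once it has co-occurred some number of times. The names `learn_patterns`, `recognized` and the `min_count` threshold are all made up for illustration.

```python
from collections import Counter

def learn_patterns(frames, min_count=3):
    """Count how often each pair of signals is active in the same time step.

    frames: list of sets, one set of active signal ids per time step.
    Returns the pairs that co-occurred at least min_count times.
    min_count is a hypothetical threshold, not a value from the post.
    """
    counts = Counter()
    for active in frames:
        ordered = sorted(active)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                counts[(a, b)] += 1
    return {pair for pair, n in counts.items() if n >= min_count}

def recognized(pattern, active):
    """A pattern fires only if every one of its inputs is active right now."""
    return set(pattern) <= set(active)

# Signals 1 and 2 are not always concurrent, but they are often enough.
frames = [{1, 2}, {1, 2, 3}, {2}, {1, 2}, {3}, {1}]
patterns = learn_patterns(frames, min_count=3)
```

The counting rule is exact, with no probabilities involved, yet signals 1 and 2 each show up alone in some steps. That is the sense in which the learning can be deterministic while the signals are not.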

One of the problems that I am wrestling with is determining how many sequences should be created in memory. I am considering limiting the number of branches that a sensory input can have to a fixed preset value. I don't think this is the way to do it, but it will have to do for the time being. I have more pressing issues to deal with.
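A fixed cap like that is simple to express. The Python sketch below is only an illustration under my own assumptions: the `SensoryInput` class, the `add_branch` method and the value of `MAX_BRANCHES` are hypothetical, and new branches are simply refused once the cap is hit (evicting an old branch would be another option).

```python
MAX_BRANCHES = 4  # hypothetical preset; no actual value is given in the post

class SensoryInput:
    """A sensory input that can feed at most max_branches sequences."""

    def __init__(self, name, max_branches=MAX_BRANCHES):
        self.name = name
        self.max_branches = max_branches
        self.branches = []

    def add_branch(self, sequence_id):
        """Attach a sequence; return False once the cap is reached."""
        if sequence_id in self.branches:
            return True  # already connected, nothing to do
        if len(self.branches) >= self.max_branches:
            return False  # cap reached, refuse the new branch
        self.branches.append(sequence_id)
        return True
```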


Nothing lifts one's spirit more than obtaining good results. Although I have only worked on coding the bottom level of the TOK (it's the hardest part), I am already getting tantalizing glimpses of glorious things to come. Patterns form quickly when I repeatedly speak the numbers 1 to 10 into the microphone. This is encouraging because I am using only 4 audio amplitude levels and only the lower 24 frequencies out of a spectrum of 512 (11 kHz sampling rate). One peculiar thing that I noticed is that the learned sequences rapidly taper off. What I mean is that one of the patterns in a sequence will have, say, 25 concurrent inputs, but the number quickly decreases to 3 or 4 in the other patterns. Another thing is that the learned sequences contain at most 5 or 6 patterns. I am not entirely sure, but I think that the learning mechanism may be automatically limiting the pattern sequences to short phonemes. It's kind of scary when you don't fully understand what your own program is doing. I seriously need to write some code to graphically represent the sequences in real time. This would give me a visual understanding of what's going on.
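For readers wondering how coarse that input really is, here is a rough Python sketch of reducing one spectrum frame to 24 bins at 4 levels. It is an illustration, not the actual Rebel Speech front end: the function name `quantize`, the `max_magnitude` normalization and the choice of evenly spaced (linear) level boundaries are all assumptions.

```python
NUM_BINS = 24    # lowest frequency bins kept, as described above
NUM_LEVELS = 4   # amplitude levels, as described above

def quantize(magnitudes, max_magnitude=1.0):
    """Reduce one spectrum frame to NUM_BINS integers in 0..NUM_LEVELS-1.

    magnitudes: sequence of bin magnitudes (for example 512 of them).
    max_magnitude: assumed full-scale value; larger inputs are clipped.
    Level boundaries are evenly spaced, which is an assumption.
    """
    levels = []
    for m in magnitudes[:NUM_BINS]:
        clipped = min(max(m, 0.0), max_magnitude)
        level = int(clipped / max_magnitude * NUM_LEVELS)
        levels.append(min(level, NUM_LEVELS - 1))  # full scale maps to top level
    return levels

# A made-up 512-bin frame: a few strong low bins, the rest near silent.
frame = [0.0, 0.3, 0.6, 1.0] + [0.1] * 508
levels = quantize(frame)
```

Everything above bin 24 is thrown away and each remaining bin carries only two bits, so each frame boils down to 24 small integers.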

Coming Up

The next task on my agenda is to implement the upper levels of the memory hierarchy and the branch mechanism proper. This will be the really fun part because, once that is working properly, I will know whether or not Rebel Speech is correctly recognizing my utterances. It's exciting. Stay tuned.

See Also:

Rebel Speech Recognition
Rebel Cortex
Invariant Visual Recognition, Patterns and Sequences
Rebel Cortex: Temporal Learning in the Tree of Knowledge
