31 
AN ARTIFICIAL NEURAL NETWORK FOR SPATIO- 
TEMPORAL BIPOLAR PATTERNS: APPLICATION TO 
PHONEME CLASSIFICATION 
Toshiteru Homma 
Les E. Atlas 
Robert J. Marks H 
Interactive Systems Design Laboratory 
Department of Electrical Engineering, FT-10 
University of Washington 
Seattle, Washington 98195 
ABSTRACT 
An artificial neural network is developed to recognize spatioqemporal 
bipolar patterns associatively. The function of a formal neuron is generalized by 
replacing multiplication with convolution, weights with transfer functions, and 
thresholding with nonlinear transform following adaptation. The Hebbian learn- 
ing rule and the delta learning rule are generalized accordingly, resulting in the 
learning of weights and delays. The neural network which was first developed 
for spatial patterns was thus generalized for spario-temporal patterns. It was 
tested using a set of bipolar input patterns derived from speech signals, showing 
robust classification of 30 model phonemes. 
1. INTRODUCTION 
Learning spatio-temporal (or dynamic) patterns is of prominent importance in biological 
systems and in artificial neural network systems as well. In biological systems, it relates to such 
issues as classical and operant conditioning, temporal coordination of sensorimotor systems and 
temporal reasoning. In artificial systems, it addresses such real-world tasks as robot control, 
speech recognition, dynamic image processing, moving target detection by sonars or radars, EEG 
diagnosis, and seismic signal processing. 
Most of the processing elements used in neural network models for practical applications 
have been the formal neuron 1 or its variations. These elements lack a memory flexible to tem- 
poral patterns, thus limiting most of the neural network models previously proposed to problems 
of spatial (or static) patterns. Some past solutions have been to convert the dynamic problems to 
static ones using buffer (or storage) neurons, or using a layered network with/without feedback. 
We propose in this paper to use a "dynamic formal neuron" as a processing element for 
learning dynamic patterns. The operation of the dynamic neuron is a temporal generalization of 
the formal neuron. As shown in the paper, the generalization is straightforward when the activa- 
tion part of neuron operation is expressed in the frequency domain. Many of the existing learn- 
ing rules for static patterns can be easily generalized for dynamic patterns accordingly. We show 
some examples of applying these neural networks to classifying 30 model phonemes. 
American Institute of Physics 1988 
32 
2. FORMAL NEURON AND DYNAMIC FORMAL NEURON 
The formal neuron is schematically drawn in Fig. l(a), where 
Input 
Activation 
Output 
Transmittance 
Node operator 
Neuron operation 
= [x x2    xt.] r 
Yi, i = 1,2 ..... N 
zi, i = 1,2 ..... N 
= [wi wi2''' w.] r 
where 1(') is a nonlinear memoryless transform 
(2.1) 
Note that a threshold can be implicitly included as a transmittance from a constant input. 
In its original form of formal neuron, xi  {0,1} and '1(') is a unit step function u (-). A 
variation of it is a bipolar formal neuron where xi  {-1,1} and 11(. ) is the sign function sgn('). 
When the inputs and output are converted to frequency of spikes, it may be expressed as 
xi  R and 11(') is a rectifying function r('). Other node operators such as a sigmoidal function 
may be used. 
We generalize the notion of formal neuron so that the input and output are functions of 
time. In doing so, weights are replaced with transfer functions, multiplication with convolution, 
and the node operator with a nonlinear transform following adaptation as often observed in bio- 
logical systems. 
Fig. l(b) shows a schematic diagram of a dynamic formal neuron where 
Input 
Activation 
Output 
Transfer function 
Adaptation 
Node operator 
Neuron operation 
(t) = [Xl(t ) x2(t )    xr(t)] ? 
yi(t), i = 1,2 ..... N 
zi (t), i = 1,2 ..... N 
v(t) = [Wil(t ) Wi2(t ) ''' w.(t)] r 
ai (t) 
11 where '1(') is a nonlinear memoryless transform 
z i (t) = 11 (a i (-t), q (t)T, 2(t )) 
(2.2) 
For convenience, we denote 
with b(0 is equivalent to correlating a(-0 with b(t). 
If the Fourier transforms (f ) = F {(t)}, 
ai Or ) = F {ai (t )} exist, then 
yi O c ) = ai O c) [qOc) cr 
where q (f)cr is the conjugate transpose of /(t). 
 X Wi L Yl = W ZI 
 (a) 
/(f) = F {/(t)}, Yi(f) = F{yi(t)}, 
. as correlation instead of convolution. Note that convolving a(t) 
and 
x,(I) 
X2(I) W (I) w.(t) 
.,,,,,'"''WL() 
x,lt) 
Fig. 1. Formal Neuron and Dynamic Formal Neuron. 
(2.3) 
z(i) 
33 
3. LEARNING FOR FORMAL NEURON AND DYNAMIC FORMAL NEURON 
A number of learning rules for formal neurons has been proposed in the past. In the fol- 
lowing paragraphs, we formulate a learning problem and describe two of the existing learning 
rules, namely, Hebbian learning and delta learning, as examples. 
Present to the neural network M pairs of input and desired output samples 
{k), k)}, k = 1,2 ..... M , in order. Let W () = [t ) 2 ()  .. )]r where ?) is the 
transmittance vector at the k-th step of learning. Likewise, let 
_Z () = [z 40 z 4-)  ' ' z4)], and D () = [0) (2) ... ()], 
where 
k) = W( ), z ) = 1'()), and q(.) = [q(Y0 q(Y2) ' ' ' al(Yv)] r. 
The Hebbian learning rule 2 is described as follows*: 
W (k) = W(-0 + eu() 
The delta learning (or LMS learning) rule 3, 4 is described as follows: 
_w() = _w(k-o _ a{_w(-) _ 
(3.1) 
(3.2) 
The leaming rules described in the previous section are generalized for the dynamic formal 
neuron by replacing multiplication with correlation. First, the problem is reformulated and then 
the generalized rules are described as follows. 
Present to the neural network M pairs of time-varing input and output samples 
{k)(t), )(t)}, k = 1,2 ..... M , in order. Let W()(t) = [g(t)()(t) ?)(t).   k)(t)] r 
where ,Vi()(t) is the vector whose elements w?)(t) are transfer functions connecting the input j 
to the neuron i at the k-th step of learning. The Hebbian learning rule for the dynamic neuron is 
then 
W()(t ) = W(-O(t ) + a(-t ), k)(t ), )(t ) r . (3.3) 
The delta learning rule for dynamic neuron is then 
W()(t ) = W(-O(t ) - a(-t ), {W(-0(t ), k)(t ) - )(t )}, )(t )r . (3.4) 
This generalization procedure can be applied to other learning rules in some linear discrim- 
inant systems 5 , the serf-organizing mapping system by Kohonen 6 , the perceptton 7, the back- 
propagation model 3 , etc. When a system includes a nonlinear operation, more careful analysis 
is necesssay as pointed out in the Discussion section. 
4. DELTA LEARNING, PSEUDO INVERSE AND REGULARIZATION 
This section reviews the relation of the delta learning rule to the pseudo-inverse and the 
technique known as regularization. 4, 6, 8, 9, lo 
Consider a minimization problem as described below: Find __W which minimizes 
k 
subject to y(k) = 
A solution by the delta rule is, using a gradient descent method, 
_w() = __w( - ) _ a-R () 
* This interpretation assumes a strong supervising signal at the output while learning. 
(4.1) 
(4.2) 
34 
where R (t') = II t') - ')l[  The minimum norm solution to the problem, _W*, is unique and 
can Ie expressed as 
IV* = DX t (4.3) 
where/t is the Moore-Penrose pseudo-inverse of/ , i.e., 
X t = lim(XrX + o'2I)-lX T = IinlXT(XX T + o21) -1. (4.4) 
2 
On the condition that 0 < ct <  where  is the maximum eigenvalue of XrX, ') and 
S t') am independent, and IVO,) is uncorrelated with '), 
E {_w* 3 =  {w(')3 (4.5) 
where  {x } denotes the expected value of x. One way to make use of this relation is to calcu- 
late W* for known standard data and refine it by (4.2), thereby saving time in the early stage of 
learning, 
However, this solution results in an ill-conditioned IV often in practice. When the prob- 
lem is ill-posed as such, the technique known as regularization can alleviate the ill-conditioning 
of IV. The problem is reformulated by finding IV which minimizes 
R (o) = llY '') - ')ll  + o2y'.ll ,ll  (4.6) 
subject to ') = __WX ') where IV = [2 ' "t] r  
This reformulation regularizes (4.3) to 
Iv(o) = D_Xr(X_X_ r + o2/) - (4.7) 
which is statistically equivalent to W(') when the input has an additive noise of variance o a 
uncorrelated with '). Interestingly, the leaky LMS algorithm II leads to a statistically 
equivalent solution 
Ivtk) = _ _ (4.8) 
2 
where 0 < 15 < 1 and 0 < ct <  These solutions am related as 
if o a = 1-- when W (t') is uncorrelated with ') .li 
(4.9) 
Equation (4.8) can be generalized for a network using dynamic formal neurons, resulting in 
a equation similar to (3.4). Making use of (4.9), (4.7) can be generalized for a dynamic neuron 
network as 
Iv(t ;(X) = F- {D (f )X_ (f )Cr (x (f )X (f ) cr + (x2/) 43 (4.10) 
where F - denotes the inverse Fourier transform. 
5. SYNTHESIS OF BIPOLAR PHONEME PATTERNS 
This section illustrates the scheme used to synthesize bipolar phoneme patterns and to 
form prototype and test patterns. 
The fundamental and first three formant frequencies, along with their bandwidths, of 
phoneroes provided by Klat0 2 were taken as parameters to synthesize 30 prototype phoneme pat- 
terns. The phoneroes were labeled as shown in Table 1. An array of L (=100) input neurons 
Covered the range of 100 to 4000 Hz. Each neuron had a bipolar state which was +1 only when 
one of the frequency bands in the phoneme presented to the network was within the critical band 
35 
of the neuron and -1 otherwise. The center frequencies (fc) of critical bands were obtained by 
dividing the 100 to 4000 Hz range into a log scale by L. The critical bandwidth was a constant 
100 Hz up to the center frequency fc = 500 Hz and 0.2fc Hz when f >500 Hz. 13 
The parameters shown in Table 1 were used to construct Table 1. Labels of Phoneroes 
30 prototype phoneme patterns. For O, it was constructed as a 
combination of t and 0. F, F 2 ,F s were the first, second, and 
third formants, and B, B 2, and B3. were corresponding 
bandwidths. The fundamental frequency Fo = 130 Hz with B0 = 
10 Hz was added when the phoneme was voiced. For plosives, 
there was a stop before formant traces start. The resulting bipo- 
lar paRems are shown in Fig.2. Each pattern had length of 5 
time units, composed by linearly interpolating the frequencies 
when the formant frequency was gliding. 
A sequence of phonemes converted from a continuous 
pronunciation of digits, {o, zero, one, two, three, four, five, six, 
seven, eight, nine }, was translated into a bipolar pattern, adding 
two time units of transition between two consequtive phonemes 
by interpolating the frequency and bandwidth parameters 
linearly. A flip noise was added to the test pattern and created a 
noisy test pattern. The sign at every point in the original clean 
test pattern was flipped with the probability 0.2. These test pat- 
terns are shown in Fig. 3. 
Label Phoneme 
1 [i y] 
2 [I ] 
3 [eY] 
4 
6 [a] 
7 [o] 
8 
9 [o w] 
lO [u ] 
11 [u TM] 
12 
13 [a y] 
14 [a TM] 
15 [o y] 
16 [w] 
17 [y] 
18 [r] 
19 [1] 
20 (q 
21 [v] 
22 [0] 
23 [] 
24 [s] 
25 [z] 
26 [p] 
27 [t] 
28 (d] 
29 [k] 
30 [n] 
Fig. 2. Prototype Phoneme Patterns. (Thirty phoneme patterns are shown 
in sequence with intervals of two time units.) 
6. SIMULATION OF SPATIO-TEMPORAL FILTERS FOR PHONEME CLASSIFICATION 
The network system described below was simulated and used to classify the prototype 
phoneme patterns in the test patterns shown in the previoius section. It is an example of gen- 
eralizing a scheme developed for static patterns 13 to that for dynamic patterns. Its operation is 
in two stages. The first stage operation is a spatio-temporal filter bank: 
36 
(a) 
Fig. 3. Test Patterns. (a) Clean Test Pattern. (b) Noisy Test Pattern. 
y'(t ) = __W (t ),:(t ) , and '(t ) = _r(a(-t )St(t )) . 
The second stage operation is the "winner-take-all" lateral inhibition: 
0'(t) = '(t), and o'(t+a) = r_(A_(-t),(t) - h), 
and 
(6.1) 
(6.2) 
for all f . (6.6) 
This minimizes 
R (o,f 
A_(t) = (1 + )/_.(t) -  (t-nA). (6.3) 
where h  is a constant threshold vector with elements h i = h and (') is the Kronecker delta 
function. This operation is repeated a sufficient number of times, No .13,1n The output is 
O'(t + No-a). 
Two models based on different learning rules were simulated with parameters shown 
below. 
Model 1 (Spario-temporal Matched Filter Bank) 
Let (x(t) = (t), f() =  in (3.3) where F is a unit vector with its elements e/ = 5(k-i). 
___W (t) = X (t)T. (6.4) 
4 1 
h=200, and a(t ) = -(t-n,x). 
n=O  
Model 2 (Spario-temporal Pseudo-inverse Filter) 
Let D =/ in (4.10). Using the alternative expression in (4.4), 
W(t ) = F-I{(x0 e )CTxoe ) + (2I_)-lxCT}. (6.5) 
h =0.05,o 2= 1000.0,and a(t)=(t). 
37 
Because the time and frequency were finite and discrete in simulation, the result of the 
inverse discrete Fourier transform in (6.5) may be aliased. To alleviate the aliasing, the transfer 
functions in the prototype matrix X(t) were padded with zeros, thereby doubling the lengths. 
Further zero-padding the transfer functions did not seem to change teh result significantly. 
The results are shown in Fig. 4(a)-(d). The arrows indicate the ideal response positions at 
the end of a phoneme. When the program was run with different thresholds and adaptation func- 
tion a (t), the result was not very sensitive to the threshold value, but was, nevertheless affected 
by the choice of the adaptation function. The maximum number of iterations for the lateral inhi- 
bition network to converge was observed: for the experiments shown in Fig. 4(a) - (d), the 
numbers were 44, 69, 29, and 47, respectively. Model 1 missed one phoneme and falsely 
responded once in the clean test pattern. It missed three and had one false response in the noisy 
test pattern. Model 2 correctly recognized all phonemes in the clean test pattern, and false- 
alarmed once in the noisy test pattern. 
7. DISCUSSION 
The notion of convolution or correlation used in the models presented is popular in 
engineering disciplines and has been applied extensively to designing filters, control systems, etc. 
Such operations also occur in biological systems and have been applied to modeling neural net- 
works.15,16 Thus the concept of dynamic formal neuron may be helpful for the improvement of 
artificial neural network models as well as the understanding of biological systems. A portion of 
the system described by Tank and Hopfield 17 is similar to the matched filter bank model simu- 
lated in this paper. 
The matched filter bank model (Model 1) performs well when all phonemes (as above) are 
of the same duration. Otherwise, it would perform poorly unless the lengths were forced to a 
maximum length by padding the input and transfer functions with -l's during calculation. The 
pseudo-inverse filter model, on the other hand, should not suffer from this problem. However, 
this aspect of the model (Model 2) has not yet been explicitly simulated. 
Given a spatio-temporal pattern of size L x K, i.e., L spatial elements and K temporal ele- 
ments, the number of calculations required to process the first stage of filtering by both models is 
the same as that by a static formal neuron network in which each neuron is connected to the L x 
K input elements. In both cases, L x K multiplications and additions are necessary to calculate 
one output value. In the case of bipolar patterns, the mutiplication used for calculation of activa- 
tion can be replaced by sign-bit check and addition. A future investigation is to use recursive 
filters or analog filters as transfer functions for faster and more efficient calculation. There are 
various schemes to obtain optimal recursive or analog filters. is, 19 Besides the lateral inhibition 
scheme used in the models, there are a number of alternative procedures to realize a "winner- 
take-all" network in analog or digital fashion. 15,2,21 
As pointed out in the previous section, the Fourier transform in (6.5) requires a precaution 
concerning the resulting length of transfer functions. Calculating the recursive correlation equa- 
tion (3.4) also needs such preprocessing as windowing or truncation. 22 
The generalization of static neural networks to dynamic ones along with their learning 
rules is strainghfforward as shown if the neuron operation and the learning rule are linear. Gen- 
eralizing a system whose neuron operation and/or leaming rule are nonlinear requires more care- 
ful analysis and remains for future work. The system described by Watrous and Shastri 16 is an 
example of generalizing a backpropagation model. Their result showed a good potential of the 
model and a need for more rigorous analysis of the model. Generalizing a system with recurrent 
connections is another task to be pursued. In a system with a certain analytical nonlinearity, the 
signals are expressed by Volterra functionals, for example. A practical learning system can then 
be constructed if higher kernels are neglected. For example, a cubic function can be used instead 
of a sigmoidal function. 
38 
(a) 
3 
Co) 
Fig. 4. Performance of Models. (a) Model 1 with Clean Test Pattern. (b) 
Model 2 with Clean Test Pattern. (c) Model 1 with Noisy Test Pattern. 
(d) Model 2 with Noisy Test Pattern. Arrows indicate the ideal response 
positions at the end of phoneme. 
8. CONCLUSION 
The formal neuron was generalized to the dynamic formal neuron to recognize spario- 
temporal patterns. It is shown that existing learning rules can be generalized for dynamic formal 
neurons. 
An artificial neural network using dynamic formal neurons was applied to classifying 30 
model phonemes with bipolar patterns created by using parameters of formant frequencies and 
their bandwidths. The model operates in two stages: in the first stage, it calculates the correla- 
tion between the input and prototype paRems stored in the transfer function matrix, and, in the 
second stage, a lateral inhibition network selects the output of the phoneme pattern dose to the 
input pattern. 
39 
(c) 
(d) 
Fig. 4 (continued.) 
Two models with different transfer functions were tested. Model 1 was a matched filter 
bank model and Model 2 was a pseudo-inverse filter model. A sequence of phoneme patterns 
corresponding to continuous pronunciation of digits was used as a test pattern. For the test pat- 
tern, Model 1 missed to recognize one phoneme and responded falsely once while Model 2 
correctly recognized all the 32 phonemes in the test pattern. When the flip noise which flips the 
sign of the pattern with the probability 0.2, Model 1 missed three phonemes and falsely 
responded once while Model 2 recognized all the phonemes and false-alarmed once. Both 
models detected the phonems at the correct position within the continuous stream. 
References 
W. S. McCulloch and W. Pitts, "A logical calculus of the ideas imminent in nervous 
activity," Bulletin of Mathematical Biophysics, vol. 5, pp. 115-133, 1943. 
D. O. Hebb, The Organization of Behavior, Wiley, New York, 1949. 
40 
3. D.E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by 
error propagation," in Parallel Distributed Processing, Vol. 1, MIT, Cambridge, 1986. 
4. B. Widrow and M. E. Hoff, "Adaptive switching circuits," Institute of Radio Engineers, 
Western Electronics Show and Convention, vol. Convention Record Part 4, pp. 96-104, 
1960. 
5. R.O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, Chapter 5, Wiley, 
New York, 1973. 
6. T. Kohonen, Self-organization and Associative Memory, Springer-Verlag, Berlin, 1984. 
7. F. Rosenblatt, Principles of Neurodynamics, Spartan Books, Washington, 1962. 
8. J.M. Varah, "A practical examination of some numerical methods for linear discrete ill- 
posed problems," SIAM Review, vol. 21, no. 1, pp. 100-111, 1979. 
9. C. Koch, J. Marroquin, and A. Yuille, "Analog neural networks in early vision," Proceed- 
ings of the National Academy of Sciences, USA, vol. 83, pp. 4263-4267, 1986. 
10. G.O. Stone, "An analysis of the delta rule and the learning of statistical associations," in 
Parallel Distributed Processing., Vol. 1, MIT, Cambridge, 1986. 
11. B. Widrow and S. D. Steams, Adaptive Signal Processing, Prentice-Hall, Englewood 
Cliffs, 1985. 
12. D.H. Klatt, "Software for a cascade/parallel formant synthesizer," Journal of Acoustical 
Society of America, vol. 67, no. 3, pp. 971-995, 1980. 
13. L.E. Arias, T. Homma, and R. J. Marks II, "A neural network for vowel classification," 
Proceedings International Conference on Acoustics, Speech, and Signal Processing, 1987. 
14. R.P. Lippman, "An introduction to computing with neural nets," IEEE ASSP Magazine, 
April, 1987. 
15. S. Amari and M. A. Arbib, "Competition and cooperation in neural nets," in Systems Neu- 
roscience, ed. J. Metzler, pp. 119-165, Academic Press, New York, 1977. 
16. R.L. Watrous and L. Shastri, "Learning acoustic features from speech data using connec- 
fionist networks," Proceedings of The Ninth Annual Conference of The Cognitive Science 
Society, pp. 518-530, 1987. 
17. D. Tank and J. J. Hopfield, "Concentrating information in time: analog neural networks 
with applications to speech recognition problems," Proceedings of International Confer- 
ence on Neural Netoworks, San Diego, 1987. 
18. J.R. Treichler, C. R. Johnson, Jr., and M. G. Larimore, Theory and Design of Adaptive 
Filters, Chapter 5, Wiley, New York, 1987. 
19. M Schetzen, The Volterra and Wiener Theories of Nonlinear Systems, Chapter 16, Wiley, 
New York, 1980. 
20. S. Grossberg, "Associative and competitive principles of learning," in Competition and 
Cooperation in Neural Nets, ed. M. A. Arbib, pp. 295-341, Springer-Verlag, New York, 
1982. 
21. R.J. Marks II, L. E. Atlas, J. J. Choi, S. Oh, K. F. Cheung, and D.C. Park, "A perfor- 
mance analysis of associative memories with nonlinearities in the correlation domain," 
(submitted to Applied Optics), 1987. 
22. D.E. Dudgeon and R. M. Mersereau, Multidimensional Digital Signal Processing, pp. 
230-234, Prentice-Hall, Englewood Cliffs, 1984. 
