An Adaptive WTA using Floating Gate 
Technology 
W. Fritz Kruger, Paul Hasler, Bradley A. Minch, and Christof Koch 
California Institute of Technology 
Pasadena, CA 91125 
(SS) a95- 2S2 
stretch@klab.caltech.edu 
Abstract 
We have designed, fabricated, and tested an adaptive Winner- 
Take-All (WTA) circuit based upon the classic WTA of Lazzaro, 
et al [1]. We have added a time dimension (adaptation) to this 
circuit to make the input derivative an important factor in winner 
selection. To accomplish this, we have modified the classic WTA 
circuit by adding floating gate transistors which slowly null their 
inputs over time. We present a simplified analysis and experimen- 
tal data of this adaptive WTA fabricated in a standard CMOS 2um 
process. 
I Winner-Take-All Circuits 
In a WTA network, each cell has one input and one output. For any set of inputs, the 
outputs will all be at zero except for the one which is from the cell with the maximum 
input. One way to accomplish this is by a global nonlinear inhibition coupled with a 
self-excitation term [2]. Each cell inhibits all others while exciting itself; thus a cell 
with even a slightly greater input than the others will excite itself up to its maximal 
state and inhibit the others down to their minimal states. The WTA function is 
important for many classical neural nets that involve competitive learning, vector 
quantization and feature mapping. The classic WTA network characterized by 
Lazzaro et. al. [1] is an elegant, simple circuit that shares just one common line 
among all cells of the network to propagate the inhibition. 
Our motivation to add adaptation comes from the idea of saliency maps. Picture 
a saliency map as a large number of cells each of which encodes an analog value 
An Adaptive WTA using Floating Gate Technology 721 
Vtun01 Vtun02 
 C1 C1  
, -I- 
 Vfg 1 I 02 
Vl 
CL_-- 
Vdd 
v 
Figure 1: The circuit diagram of a two input winner-take-all circuit. 
V2 
reflecting some measure of the importance (saliency) of its input. We would like 
to pay attention to the most salient cell, so we employ a WTA function to tell us 
where to look. But if the input doesn't change, we never look away from that one 
cell. We would like to introduce some concept of fatigue and refraction to each cell 
such that after winning for some time, it tires, allowing other cells to win, and then 
it must wait some time before it can win again. We call this circuit an adaptive 
WTA. 
In this paper, we present an adaptive WTA based upon the classic WTA; Figure 1 
shows a two-input, adaptive WTA circuit. The difference between the classic and 
adaptive WTA is that M4 and M5 are pFET single transistor synapses. A single 
transistor synapse [3] is either an nFET or pFET transistor with a floating gate and 
a tunneling junction. This enhancement results in the ability of each transistor to 
adapt to its input bias current. The adaptation is a result of the electron tunneling 
and hot-electron injection modifying the charge on the floating gate; equilibrium is 
established when the tunneling current equals the injection current. The circuit is 
devised in such a way that these are negative feedback mechanisms, consequently 
the output voltage will always return to the same steady state voltage determined 
by its bias current regardless of the DC input level. Like the autozeroing amplifier 
[4], the adaptive WTA is an example of a circuit where the adaptation occurs as a 
natural part of the circuit operation. 
2 pFET hot-electron injection and electron tunneling 
Before considering the behavior of the adaptive WTA, we will review the processes of 
electron tunneling and hot-electron injection in pFETs. In subthreshold operation, 
we can describe the channel current of a pFET (Ip) for a differential change in gate 
( 
voltage, AV9, around a fixed bias current Iso, as Ip = Iso exp - u, ] where np is 
the amount by which AV 9 affects the surface potential of the pFET, and UT is kT 
q 
We will assume for this paper that all transistors are identical. 
First, we consider electron tunneling. We start with the classic model of electron 
722 W. E Kruger, P. Hasler, B. A. Minch and C. Koch 
E(Si02) Drain 
" Z ! / / 'a/ . Ec 
Depletion Region 
Source Channel 
Figure 2: pFET Hot Electron Injection. (a) Band diagram of a subthreshold pFET 
transistor for favorable conditions for hot-electron injection. (b) Measured data of pFET 
injection efficiency versus the drain to channel voltage for four source currents. Injection 
efficiency is the ratio of injection current to source current. At (I)dc equal to 8.2V, the 
injection efficiency increases a factor of e for an increase (I)dc of 250reV. 
tunneling through a silicon - SiOn. system [5]. As in the autozeroing amplifier [4], 
the tunneling current will be only a weak function for the voltage swing on the 
floating gate voltage through the region of subthreshold currents; therefore we will 
approximate the tunneling junction as a current source supplying Itu,o current to 
the floating gate. 
Second, we derive a simple model of pFET hot-electron injection. Figure 2a shows 
the band diagram of a pFET operating at bias conditions which are favorable for 
hot-electron injection. Hot-hole impact ionization creates electrons at the drain edge 
of the depletion region. These secondary electrons travel back into the channel 
region gaining energy as they go. When their energy exceeds that of the SiO2 
barrier, they can be injected through the oxide to the floating gate. The hole 
impact ionization current is proportional to the source current, and is an exponential 
function of the voltage drop from channel to drain (dc). The injection current is 
proportional to the hole impact ionization current and is an exponential function 
of the voltage drop from channel to drain. We will neglect the dependence of the 
floating-gate voltage for a given source current and dc as we did in [4]. Figure 
2b shows measured injection efficiency for several source currents, where injection 
efficiency is the ratio of the injection current to source current. The injection 
efficiency is independent of source current and is approximately linear over a 1 
- 2V swing in ac; therefore we model the injection efficiency as proportional to 
d ) within that I to 2V swing, where V/nj is a measured device parameter 
exp - ,j 
which for our process is 250mV at a bias ac - 8.2V, and Aac is the change in 
ac from the bias level. An increasing voltage input will increase the pFET surface 
potential by capacitive coupling to the floating gate. Increasing the pFET surface 
potential will increase the source current thereby decreasing a for a fixed output 
voltage and lowering the injection efficiency. 
An Adaptive WTA using Floating Gate Technology 723 
.10 ' 
o 
o 
o 
o 
o 
o 
Input CulTent Slep (% of Ia cument) 
(b) 
150 
Figure 3: Illustration of the dynamics for the winning and losing input voltages. (a) 
Measured V verses time due to an upgoing and a downgoing input current step. The 
initial input voltage change due to the input step is much smaller than the voltage change 
due to the adaptation. (b) Adaptation time of a losing input voltage for several tunneling 
voltages. The adaptation time is the time from the start of the input current step to the 
time the input voltage is within 10% of its steady state voltage. A larger tunneling current 
decreases the adaptation time by increasing the tunneling current supplied to the floating 
gate. 
3 Two input Adaptive WTA 
We will outline the general procedure to derive the general equations to describe 
the two input WTA shown in Fig. 1. We first observe that transistors Mx, 
and Ma make up a differential pair. Regardless of any adaptation, the middle V 
node and output currents are set by the input voltages (V1 and V2), which are set 
by the input currents, as in the classic WTA [1]. The dynamics for high frequency 
operation are also similar to the classic WTA circuit. Next, we can write the two 
Kirchhoff Current Law (KCL) equations at V1 and V2, which relate the change in 
V and V2 as a function of the two input currents and the floating gate voltages. 
Finally, we can write the two KCL equations at the two floating gates Vfg and 
Vfg2, which relates the changes in the floating gate voltages in terms of V and V2. 
This procedure is directly extendable to multiple inputs. A full analysis of these 
equations is very difficult and will be described in another paper. 
For this discussion, we present a simplified analysis to develop the intuition of the 
circuit operation. At sufficiently high frequencies, the tunneling and injection cur- 
rents do not adapt the floating gate voltages sufficiently fast to keep the input 
voltages at their steady state levels. At these frequencies, the adaptive WTA acts 
like the classic WTA circuit with one small difference. A change in the input volt- 
ages, V or V. is linearly related to V by the capacitive coupling (AV -- -- AV), 
where this relationship is exponential in the classic WTA. There is always some ca- 
pacitance C., even if not explicitly drawn due to the overlap capacitance from the 
floating gate to drain. This property gives the designer the added freedom to mod- 
ify the gain. We will assume the circuit operates in its intended operating regime 
where the floating gate transistors settle sufficiently fast such that their channel 
724 W. E Kruger, P. Hasler, B. A. Minch and C. Koch 
lO -1 
can,=-*   (A) 
(a) (b) 
Figure 4: Measured change in steady state input voltages as a function of bias current. 
(a) Change in the two steady state output voltages as a function of the bias current of the 
second input. The bias current of the first input was held fixed at 8.14nA. (b) Change in 
the RMS noise of the two output voltages as a function of the bias current of the second 
input. The RMS noise is much higher for the losing input than for the winning input. 
Note that where the two bias currents cross roughly corresponds to the location where the 
RMS nmse on the two input voltages is equal. 
current equals the input currents 
nA Vf gi ) dIi n dVf 9 i 
Ii = Iso exp( TT '--> d--- = Ii T d-'-- (1) 
for all inputs indexed by i, but not necessarily fast enough for the floating gates to 
settle to their final steady state levels. 
To develop some initial intuition, we shall begin by considering one half of the two 
input WTA: transistors M1, Ma and M4 of Figure 1. First, we notice that IoutX is 
equal to lb (the current through transistor M1); note that this is not true for the 
multiple input case. By equating these two currents we get an equation for V as 
V = nV1 - nVb, where we will assume that l is a fixed bias voltage. Assuming the 
input current equals the current through M4, V1 obeys the equation 
(nCx+C.)dVx CTUTdI1 (Ix exp( AV1 ) 
= + (2) 
where CT is the total capacitance connected to the floating gate. The steady state 
of (2)is 
(h) 
A=  In I--o (3) 
which is exactly the same expression for each input in a multiple input WTA. We get 
a linear differential equation by making the substitution X = exp(v.-v.) [4], and we 
get similar solutions to the behavior of the autozeroing amplifier. Figure 3a shows 
measured data for an upgoing and a downgoing current step. The input current 
change results in an initial fast change in the input voltage, and the input voltage 
then adapts to its steady state voltage which is a much greater voltage change. 
From the voltage difference between the steady states, we get that j is roughly 
500reV. 
An Adaptive WTA using Floating Gate Technology 725 
1c 
,,, I 
( 4c 
o 
5 10 15 20 25 30 35 40 45 50 
Time(s) 
(b) 
Figure 5: Experimental time traces measurements of the output current and voltage 
for small differential input current steps. (a) Time traces for small differential current 
steps around nearly identical bias currents of 8.6nA. (b) Time traces for small differential 
current steps around two different bias currents of 8.7nA and 0.88nA. In the classic WTA, 
the output currents would show no response to the input current steps. 
Returning to the two input case, we get two floating gate equations by assuming 
that the currents through M4 and M5 are equal to their respective input currents 
and writing the KCL equations at each floating gate. If V and V. do not cross 
each other in the circuit operation, then one can easily solve these KCL equations. 
Assume without loss of generality that V is the winning voltage; which implies that 
AV ---- nAVx. The initial input voltage change before the floating gate adaptation 
due to a step in the two input currents of I- - Ix + and I- - I2 + is 
AVx - nCx In , AV2  In (4) 
for C2 much less than nCx. In this case, V1 moves on the order of the floating gate 
voltage change, but V2 moves on the order of the floating gate change amplified up 
by --. The response of AV1 is governed by an identical equation to (2) of the earlier 
half-analysis, and therefore results in a small change in V. Also, any perturbation 
of V is only slightly amplified at Vx due to the feedback; therefore any noise at V 
will only be slightly amplified into V. The restoration of V2 is much quicker than 
the Vx node if C2 is much less than nC; therefore after the initial input step, one 
can safely assume that V is nearly constant. The voltage at V is amplified by -c_q_z 
at V2; therefore any noise at V is amplified at the losing voltage, but not at the 
winning voltage as the data in Fig. 4b shows. The losing dynamics are identical 
to the step response of an autozeroing amplifier [4]. Figure 3b shows the variation 
 of the adaptation time verses the percent input current change for several values of 
tunneling voltages. 
The main difficulty in exactly solving these KCL equations is the point in the 
dynamics where V1 crosses V2, since the behavior changes when the signals move 
726 W. E Kruger, P Hasler, B. A. Minch and C. Koch 
through the crossover point. If we get more than a sufficient  decrease to reach 
the starting V equilibrium, then the rest of the input change is manifested by an 
increase in V2. If the voltage V crosses the voltage Vx, then V will be set by the 
new steady state, and V1 is governed by losing dynamics until  m V2. At this 
point Vx is nearly constant and V is governed by losing dynamics. This analysis is 
directly extendible to arbitrary number of inputs. 
Figure 5 shows some characteristic traces from the two-input circuit. Recall that the 
winning node is that with the lowest voltage, which is reflected in its corresponding 
high output current. In Fig. 5a, we see that as an input step is applied, the output 
current jumps and then begins to adapt to a steady state value. When the inputs 
are nearly equal, the steady state outputs are nearly equal; but when the inputs 
are different, the steady state output is greater for the cell with the lesser input. 
In general, the input current change that is the largest after reaching the previous 
equilibrium becomes the new equilibrium. This additional decrease in Vx would 
lead to an amplified increase in the other voltage since the losing stage roughly 
looks like an autozeroing amplifier with the common node as the input terminal. 
The extent to which the inputs do not equal this largest input is manifested as a 
proportionally larger input voltage. The other voltage would return to equilibrium 
by slowly, linearly decreasing in voltage due to the tunneling current. This process 
will continue until V equals V. Note in general that the inputs with lower bias 
currents have a slight starting advantage over the inputs with higher bias currents. 
Figure 5b illustrates the advantage of the adaptive WTA over the classic WTA. In 
the classic WTA, the output voltage and current would not change throughout the 
experiment, but the adaptive WTA responds to changes in the input. The second 
input step does not evoke a response because there was not enough time to adapt 
to steady state apter the previous step; but the next step immediately causes it to 
win. Also note in both of these traces that the noise is very large in the loosing 
node and small in the winner because of the gain differences (see Figure 4b). 
References 
[1] 
J. Lazzaro, S. Ryckebusch, M.A. Mahowald, and C.A. Mead "Winner-Take- 
All Networks of O(N) Complexity", NIPS I Morgan Kaufmann Publishers, 
San Mateo, CA, 1989, pp 703 - 711. 
[2] 
Grossberg S. "Adaptive Pattern Classification and Universal Recoding: I. Par- 
allel Development and Coding of Neural Feature Detectors." Biological Cyber- 
netics vol. 23, 121-134, 1988. 
[5] 
P. Hasler, C. Diorio, B. A. Minch, and C. Mead, "Single Transis- 
tor Learning Synapses", NIPS 7, MIT Press, 1995, 817-824. Also at 
http://www.pcmp.caltech.edu/anaprose/paul. 
P. Hasler, B. A. Minch, C. Diorio, and C. Mead, "An autozeroing amplifier 
using pFET Hot-Electron Injection", ISCAS, Atlanta, 1996, III-325 - III-328. 
Also at http://www. pcmp.caltech.edu/anaprose/paul. 
M. Lenzlinger and E. H. Snow (1969), "Fowler-Nordheim tunneling into ther- 
mally grown Si02," J. Appl. Phys., vol. 40, pp. 278-283, 1969. 
