Neural Computation with Winner-Take-All as 
the only Nonlinear Operation 
Wolfgang Maass 
Institute for Theoretical Computer Science 
Technische Universitit Graz 
A-8010 Graz, Austria 
email: maassigi.tu-graz.ac.at 
http://www. cis.tu-graz.ac.at/igi/maass 
Abstract 
Everybody "knows" that neural networks need more than a single layer 
of nonlinear units to compute interesting functions. We show that this is 
false if one employs winner-take-all as nonlinear unit: 
• Any boolean function can be computed by a single k-winner-take-all
  unit applied to weighted sums of the input variables.
• Any continuous function can be approximated arbitrarily well by
  a single soft winner-take-all unit applied to weighted sums of the
  input variables.
• Only positive weights are needed in these (linear) weighted sums.
This may be of interest from the point of view of neurophysiology,
since only 15% of the synapses in the cortex are inhibitory. In addition
it is widely believed that there are special microcircuits in the
cortex that compute winner-take-all.
Our results support the view that winner-take-all is a very useful 
basic computational unit in Neural VLSI: 
• it is well known that winner-take-all of n input variables can
  be computed very efficiently with 2n transistors (and a total
  wire length and area that is linear in n) in analog VLSI
  [Lazzaro et al., 1989]
• we show that winner-take-all is not just useful for special purpose
  computations, but may serve as the only nonlinear unit for
  neural circuits with universal computational power
• we show that any multi-layer perceptron needs quadratically in
  n many gates to compute winner-take-all for n input variables,
  hence winner-take-all provides a substantially more powerful
  computational unit than a perceptron (at about the same cost
  of implementation in analog VLSI).
Complete proofs and further details of these results can be found in
[Maass, 2000].
1 Introduction 
Computational models that involve competitive stages have so far been neglected in com- 
putational complexity theory, although they are widely used in computational brain models, 
artificial neural networks, and analog VLSI. The circuit of [Lazzaro et al., 1989] computes 
an approximate version of winner-take-all on n inputs with just 2n transistors and wires 
of length O(n), with lateral inhibition implemented by adding currents on a single wire of 
length O(n). Numerous other efficient implementations of winner-take-all in analog VLSI 
have subsequently been produced. Among them are circuits based on silicon spiking neu- 
rons ([Meador and Hylander, 1994], [Indiveri, 1999]) and circuits that emulate attention in 
artificial sensory processing ([Horiuchi et al., 1997], [Indiveri, 1999]). Preceding analytical 
results on winner-take-all circuits can be found in [Grossberg, 1973] and [Brown, 1991]. 
We will analyze in section 4 the computational power of the most basic competitive
computational operation: winner-take-all (= $1\text{-WTA}_n$). In section 2 we will discuss the somewhat
more complex operation k-winner-take-all ($k\text{-WTA}_n$), which has also been implemented
in analog VLSI [Urahama and Nagao, 1995]. Section 3 is devoted to soft winner-take-all, 
which has been implemented by [Indiveri, 1999] in analog VLSI via temporal coding of 
the output. 
Our results show that winner-take-all is a surprisingly powerful computational module
in comparison with threshold gates (= McCulloch-Pitts neurons) and sigmoidal gates. 
Our theoretical analysis also provides answers to two basic questions that have been 
raised by neurophysiologists in view of the well-known asymmetry between excitatory 
and inhibitory connections in cortical circuits: how much computational power of neural 
networks is lost if only positive weights are employed in weighted linear sums, and how 
much learning capability is lost if only the positive weights are subject to plasticity. 
2 Restructuring Neural Circuits with Digital Output 
We investigate in this section the computational power of a k-winner-take-all gate computing
the function

$$k\text{-WTA}_n : \mathbb{R}^n \to \{0,1\}^n$$

[Figure: a $k\text{-WTA}_n$ gate with inputs $x_1, x_2, \dots, x_n$ and outputs $b_1, b_2, \dots, b_n \in \{0,1\}$.]

with

$b_i = 1 \Leftrightarrow x_i$ is among the $k$ largest of the inputs $x_1, \dots, x_n$.

[precisely: $b_i = 1 \Leftrightarrow x_j > x_i$ holds for at most $k - 1$ indices $j$]
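The precise definition above translates almost literally into code. The following is a minimal Python sketch; the function name `k_wta` is ours and is used only for illustration.

```python
def k_wta(x, k):
    """Return the output bits b_1..b_n of a k-winner-take-all gate.

    b_i = 1 iff x_j > x_i holds for at most k - 1 indices j.
    """
    return [1 if sum(xj > xi for xj in x) <= k - 1 else 0 for xi in x]


print(k_wta([0.3, 0.9, 0.1, 0.5], 2))  # [0, 1, 0, 1]
print(k_wta([1.0, 1.0, 0.2], 1))       # [1, 1, 0]
```

The second call shows a consequence of the precise definition: when inputs are tied, more than k output bits can be 1.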
Neural Computation with Vnner-Take-All 295 
Theorem 1. Any two-layer feedforward circuit $C$ (with $m$ analog or binary input
variables and one binary output variable) consisting of threshold gates (= perceptrons)
can be simulated by a circuit $W$ consisting of a single k-winner-take-all gate
$k\text{-WTA}_n$ ¹ applied to weighted sums of the input variables with positive weights. This holds
for all digital inputs, and for analog inputs except for some set $S \subseteq \mathbb{R}^m$ of inputs that has
measure 0.
In particular, any boolean function
$$f : \{0,1\}^m \to \{0,1\}$$
can be computed by a single k-winner-take-all gate applied to positive weighted sums of
the input bits.
Remarks 
1. If $C$ has polynomial size and integer weights, whose size is bounded by a polynomial
   in $m$, then the number of linear gates $S_j$ in $W$ can be bounded by a polynomial
   in $m$, and all weights in the simulating circuit $W$ are natural numbers whose size
   is bounded by a polynomial in $m$.
2. The exception set of measure 0 in this result is a union of finitely many hyperplanes
   in $\mathbb{R}^m$. One can easily show that this exception set $S$ of measure 0 in
   Theorem 1 is necessary.
3. Any circuit that has the structure of $W$ can be converted back into a 2-layer threshold
   circuit, with a number of gates that is quadratic in the number of weighted
   sums (= linear gates) in $W$. This relies on the construction in section 4.
Proof of Theorem 1: Since the outputs of the gates on the hidden layer of $C$ are from
$\{0,1\}$, we can assume without loss of generality that the weights $\alpha_1, \dots, \alpha_n$ of the output
gate $G$ of $C$ are from $\{-1, 1\}$ (see for example [Siu et al., 1995] for details; one first
observes that it suffices to use integer weights for threshold gates with binary inputs, one
can then normalize these weights to values in $\{-1, 1\}$ by duplicating gates on the hidden
layer of $C$). Thus for any circuit input $z \in \mathbb{R}^m$ we have

$$C(z) = 1 \Leftrightarrow \sum_{j=1}^{n} \alpha_j G_j(z) \ge \Theta,$$

where $G_1, \dots, G_n$ are the threshold gates on the hidden layer of $C$, $\alpha_1, \dots, \alpha_n$ are from
$\{-1, 1\}$, and $\Theta$ is the threshold of the output gate $G$. In order to eliminate the negative
weights in $G$ we replace each gate $G_j$ for which $\alpha_j = -1$ by another threshold gate $\hat{G}_j$ so
that $\hat{G}_j(z) = 1 - G_j(z)$ for all $z \in \mathbb{R}^m$ except on some hyperplane.² We set $\hat{G}_j := G_j$
for all $j \in \{1, \dots, n\}$ with $\alpha_j = 1$. Then we have for all $z \in \mathbb{R}^m$, except for $z$ from some
exception set $S$ consisting of up to $n$ hyperplanes,

$$\sum_{j=1}^{n} \alpha_j G_j(z) = \sum_{j=1}^{n} \hat{G}_j(z) - |\{j \in \{1, \dots, n\} : \alpha_j = -1\}|.$$

Hence

$$C(z) = 1 \Leftrightarrow \sum_{j=1}^{n} \hat{G}_j(z) \ge c$$

for all $z \in \mathbb{R}^m - S$, for some suitable $c \in \mathbb{N}$.
Let $w_1^j, \dots, w_m^j \in \mathbb{R}$ be the weights and $\Theta^j \in \mathbb{R}$ be the threshold of gate $\hat{G}_j$, $j = 1, \dots, n$.
¹ of which we only use its last output bit
² We exploit here that $\neg \left( \sum_{i=1}^{m} w_i z_i \ge \Theta \right) \Leftrightarrow \sum_{i=1}^{m} (-w_i) z_i > -\Theta$ for arbitrary $w_i, z_i, \Theta \in \mathbb{R}$.
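[Figure: the two-layer threshold circuit $C$ (left) and the simulating circuit $W$ (right), both with inputs $z_1, \dots, z_m$; each can be converted into the other. In $C$, $G_1, \dots, G_n$ are arbitrary threshold gates and $G$ is a threshold gate with weights from $\{-1, 1\}$. In $W$, $S_1, \dots, S_{n+1}$ are linear gates (with positive weights only, which are sums of absolute values of weights from the gates $\hat{G}_1, \dots, \hat{G}_n$).]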
Thus O(z_)= I  
and 
E Iwlzi- E Iwlzi >- OJ. Hence with 
i:w >o i:w <o 
i:w <o - i:wf>o 
forj = 1,... ,n 
j=l i:w >0 
we have for every j E {1,... ,n} and every z E IR "  
Sn+l _> s   Iilz-  Ilz _> o  (_)= 1. 
i:w>O i:w<O 
This implies that the (n + 1)st output bn+ of the k-winner-take-all gate k-WTA,+i for 
k := n - c + 1 applied to S1,... , $n+1 satisfies 
bn+l - 1 
I{J E {1,... ,n + 1}: $,+1 >_ $j}l > k + 1 
q= ]{j e {1,... ,n} ' Sn+l >_ S5}1 _> [c 
4= C(z) = 1. 
Note that all the coefficients in the sums S1,..., Sn+ are positive. 
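The construction in this proof can be checked on small boolean examples. The Python sketch below is only an illustration under stated assumptions, not the construction's definitive form: all function and variable names are ours; weights and thresholds are assumed to be integers and inputs binary, so that the strict inequality of footnote 2 can be written as a threshold gate with threshold $-\Theta + 1$; and the threshold for the number of active flipped gates is taken as the output threshold plus the number of negative output weights. A common constant `shift` is added to all sums so that no constant term is negative, which does not change their order.

```python
from itertools import product


def threshold_gate(w, theta, z):
    return int(sum(wi * zi for wi, zi in zip(w, z)) >= theta)


def k_wta(x, k):
    # b_i = 1 iff x_j > x_i holds for at most k - 1 indices j
    return [int(sum(xj > xi for xj in x) <= k - 1) for xi in x]


def circuit_C(hidden, alpha, theta_out, z):
    """Two-layer threshold circuit with output weights alpha_j in {-1, 1}."""
    g = [threshold_gate(w, th, z) for w, th in hidden]
    return int(sum(a * gj for a, gj in zip(alpha, g)) >= theta_out)


def build_W(hidden, alpha, theta_out):
    """Flip the gates with alpha_j = -1 and compute k (assumes integer data)."""
    flipped = []
    for (w, th), a in zip(hidden, alpha):
        if a == 1:
            flipped.append((list(w), th))
        else:
            # not(w.z >= th)  <=>  (-w).z > -th  <=>  (-w).z >= -th + 1 (integers)
            flipped.append(([-wi for wi in w], -th + 1))
    n = len(hidden)
    c = theta_out + sum(1 for a in alpha if a == -1)
    return flipped, n - c + 1


def circuit_W(flipped, k, z):
    """Single k-WTA gate applied to the sums S_1..S_{n+1}; uses the last bit."""
    n = len(flipped)
    pos = [sum(abs(wi) * zi for wi, zi in zip(w, z) if wi > 0) for w, _ in flipped]
    neg = [sum(abs(wi) * zi for wi, zi in zip(w, z) if wi < 0) for w, _ in flipped]
    total_pos = sum(pos)
    shift = max(0, -min(th for _, th in flipped))  # keeps constant terms >= 0
    S = [neg[j] + flipped[j][1] + (total_pos - pos[j]) + shift for j in range(n)]
    S.append(total_pos + shift)
    return k_wta(S, k)[-1]


# XOR as a two-layer threshold circuit: [z1 + z2 >= 1] - [z1 + z2 >= 2] >= 1
xor_hidden = [([1, 1], 1), ([1, 1], 2)]
xor_alpha = [1, -1]
xor_flipped, xor_k = build_W(xor_hidden, xor_alpha, 1)
for z in product([0, 1], repeat=2):
    print(z, circuit_C(xor_hidden, xor_alpha, 1, z), circuit_W(xor_flipped, xor_k, z))
```

For each of the four inputs, the two printed output bits should agree with each other and with the XOR of the input bits.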
3 Restructuring Neural Circuits with Analog Output 
In order to approximate arbitrary continuous functions with values in [0, 1] by circuits that 
have a similar structure as those in the preceding section, we consider here a variation of a 
winner-take-all gate that outputs analog numbers between 0 and 1, whose values depend on 
the rank of the corresponding input in the linear order of all the n input numbers. One may
argue that such a gate is no longer a "winner-take-all" gate, but in agreement with common
terminology we refer to it as a soft winner-take-all gate. Such a gate computes a function
from $\mathbb{R}^n$ into $[0, 1]^n$

[Figure: a soft winner-take-all gate with inputs $x_1, \dots, x_n$ and outputs $r_1, r_2, \dots, r_n \in [0, 1]$.]

whose $i$th output $r_i \in [0, 1]$ is roughly proportional to the rank of $x_i$ among the numbers
$x_1, \dots, x_n$. More precisely: for some parameter $T \in \mathbb{N}$ we set

$$r_i = \frac{|\{j \in \{1, \dots, n\} : x_i \ge x_j\}| - \frac{n}{2}}{T},$$

rounded to 0 or 1 if this value is outside $[0, 1]$. Hence this gate focuses on those
inputs $x_i$ whose rank among the $n$ input numbers $x_1, \dots, x_n$ belongs to the set
$\{\frac{n}{2}, \frac{n}{2} + 1, \dots, \min\{n, T + \frac{n}{2}\}\}$. These ranks are linearly scaled into $[0, 1]$.³
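A soft winner-take-all gate of this kind can be sketched in a few lines of Python. The sketch assumes the reading $r_i = (|\{j : x_i \ge x_j\}| - n/2) / T$ of the definition, clipped to $[0, 1]$; the offset $n/2$ and the non-strict comparison are assumptions of this sketch, and the function name `soft_wta` is ours.

```python
def soft_wta(x, T):
    """Rank-based soft winner-take-all gate (assumed form, see lead-in)."""
    n = len(x)
    outputs = []
    for xi in x:
        rank = sum(xi >= xj for xj in x)   # rank of x_i, counting x_i itself
        value = (rank - n / 2) / T         # linear scaling of the rank
        outputs.append(min(1.0, max(0.0, value)))  # round into [0, 1]
    return outputs


print(soft_wta([0.1, 0.4, 0.2, 0.9], T=2))  # [0.0, 0.5, 0.0, 1.0]
```

In this example the two smallest inputs fall below the focus range and are mapped to 0, while the ranks 3 and 4 are scaled linearly to 0.5 and 1.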
Theorem 2. Circuits consisting of a single soft winner-take-all gate (of which we only use
its first output $r_1$) applied to positive weighted sums of the input variables are universal
approximators for arbitrary continuous functions from $\mathbb{R}^m$ into $[0, 1]$.
³ It is shown in [Maass, 2000] that actually any continuous monotone scaling into $[0, 1]$ can be
used instead.
A circuit of the type considered in Theorem 2 (with a soft winner-take-all gate applied to
$n$ positive weighted sums $S_1, \dots, S_n$) has a very simple geometrical interpretation: Over
each point $z$ of the input "plane" $\mathbb{R}^m$ we consider the relative heights of the $n$ hyperplanes
$H_1, \dots, H_n$ defined by the $n$ positive weighted sums $S_1, \dots, S_n$. The circuit output depends
only on how many of the other hyperplanes $H_2, \dots, H_n$ are above $H_1$ at this point $z$.
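This geometrical reading can be made concrete with a short Python sketch. It assumes the rank-based form of the soft winner-take-all gate with offset $n/2$ and a non-strict comparison; the weight matrix `W_example`, the parameter value `T = 2`, and all function names are hypothetical and chosen only for illustration.

```python
def weighted_sums(W, z):
    """S_j(z) for each row of positive weights in W."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in W]


def planes_above_first(W, z):
    """Number of hyperplanes H_2..H_n that lie strictly above H_1 over z."""
    S = weighted_sums(W, z)
    return sum(s > S[0] for s in S[1:])


def first_output(W, z, T):
    """First output r_1 of the soft WTA gate, computed from that count alone."""
    n = len(W)
    rank = n - planes_above_first(W, z)   # = |{j : S_1 >= S_j}|
    return min(1.0, max(0.0, (rank - n / 2) / T))


W_example = [[1.0, 1.0], [2.0, 0.5], [0.5, 2.0], [0.5, 0.5]]
for z in [(3, 1), (1, 3), (2, 2)]:
    print(z, planes_above_first(W_example, z), first_output(W_example, z, T=2))
```

The inputs (3, 1) and (1, 3) give different sums but the same number of hyperplanes above $H_1$, so the circuit output is the same at both points.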
4 A Lower Bound Result for Winner-Take-All 
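One can easily see that any k-WTA gate with $n$ inputs can be computed by a 2-layer threshold
circuit consisting of $\binom{n}{2} + n$ threshold gates:

[Figure: 2-layer threshold circuit for k-WTA on inputs $x_1, \dots, x_n$. The first layer consists of $\binom{n}{2}$ threshold gates, one for each pair of inputs $x_i$, $x_j$, which compare $x_i$ with $x_j$; the second layer consists of $n$ threshold gates.]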
Hence the following result provides an optimal lower bound. 
Theorem 3. Any feedforward threshold circuit (= multi-layer perceptron) that computes
1-WTA for $n$ inputs needs to have at least $\binom{n}{2} + n$ gates.
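The upper-bound construction described at the start of this section can be sketched in Python as follows. This is an illustration under the assumption that all inputs are pairwise distinct, so that a single comparison gate per pair suffices; the function names and the exact choice of second-layer thresholds are ours.

```python
def threshold_gate(weights, inputs, theta):
    return int(sum(w * v for w, v in zip(weights, inputs)) >= theta)


def k_wta_threshold_circuit(x, k):
    """k-WTA via a 2-layer threshold circuit (assumes pairwise distinct inputs).

    Returns the output bits and the number of threshold gates used.
    """
    n = len(x)
    # Layer 1: one gate per pair i < j, computing [x_i - x_j >= 0].
    g = {}
    for i in range(n):
        for j in range(i + 1, n):
            w = [0] * n
            w[i], w[j] = 1, -1
            g[(i, j)] = threshold_gate(w, x, 0)
    # Layer 2: gate i fires iff x_i is at least as large as n - k other inputs.
    # The number of such inputs is sum_{j>i} g_ij + sum_{j<i} (1 - g_ji).
    b = []
    for i in range(n):
        weights = [-1] * i + [1] * (n - 1 - i)
        inputs = [g[(j, i)] for j in range(i)] + [g[(i, j)] for j in range(i + 1, n)]
        b.append(threshold_gate(weights, inputs, n - k - i))
    return b, len(g) + n


bits, gates = k_wta_threshold_circuit([0.3, 0.9, 0.1, 0.5], 2)
print(bits, gates)  # [0, 1, 0, 1] 10
```

For $n = 4$ the circuit uses $\binom{4}{2} + 4 = 10$ gates, in line with the count stated in the text.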
5 Conclusions 
The lower bound result of Theorem 3 shows that the computational power of winner-take-all
is quite large, even if compared with the arguably most powerful gate commonly studied
in circuit complexity theory: the threshold gate (also referred to as a McCulloch-Pitts neuron
or perceptron).
It is well known ([Minsky and Papert, 1969]) that a single threshold gate is not able to 
compute certain important functions, whereas circuits of moderate (i.e., polynomial) size 
consisting of two layers of threshold gates with polynomial size integer weights have re- 
markable computational power (see [Siu et al., 1995]). We have shown in Theorem 1 that 
any such 2-layer (i.e., 1 hidden layer) circuit can be simulated by a single k-winner-take-all 
gate, applied to polynomially many weighted sums with positive integer weights of poly- 
nomial size. 
We have also analyzed the computational power of soft winner-take-all gates in the context 
of analog computation. It is shown in Theorem 2 that a single soft winner-take-all gate 
may serve as the only nonlinearity in a class of circuits that have universal computational 
power in the sense that they can approximate any continuous function.
Furthermore our novel universal approximators require only positive linear operations be- 
sides soft winner-take-all, thereby showing that in principle no computational power is lost 
if in a biological neural system inhibition is used exclusively for unspecific lateral inhibi- 
tion, and no adaptive flexibility is lost if synaptic plasticity (i.e., "learning") is restricted to 
excitatory synapses. 
Our somewhat surprising results regarding the computational power and universality of 
winner-take-all point to further opportunities for low-power analog VLSI chips, since 
winner-take-all can be implemented very efficiently in this technology. 
References 
[Brown, 1991] Brown, T. X. (1991). Neural Network Design for Switching Network
Control. Ph.D. Thesis, CALTECH.
[Grossberg, 1973] Grossberg, S. (1973). Contour enhancement, short term memory, and 
constancies in reverberating neural networks. Studies in Applied Mathematics, vol. 52, 
217-257. 
[Horiuchi et al., 1997] Horiuchi, T. K., Morris, T. G., Koch, C., DeWeerth, S. P. (1997). 
Analog VLSI circuits for attention-based visual tracking. Advances in Neural Informa- 
tion Processing Systems, vol. 9, 706-712. 
[Indiveri, 1999] Indiveri, G. (1999). Modeling selective attention using a neuromorphic 
analog VLSI device, submitted for publication. 
[Lazzaro et al., 1989] Lazzaro, J., Ryckebusch, S., Mahowald, M. A., Mead, C. A. (1989).
Winner-take-all networks of O(n) complexity. Advances in Neural Information Processing
Systems, vol. 1, Morgan Kaufmann (San Mateo), 703-711.
[Maass, 2000] Maass, W. (2000). On the computational power of winner-take-all, Neural 
Computation, in press. 
[Meador and Hylander, 1994] Meador, J. L., and Hylander, P. D. (1994). Pulse coded 
winner-take-all networks. In: Silicon Implementation of Pulse Coded Neural Networks, 
Zaghloul, M. E., Meador, J., and Newcomb, R. W., eds., Kluwer Academic Publishers 
(Boston), 79-99. 
[Minsky and Papert, 1969] Minsky, M. C., Papert, S. A. (1969). Perceptrons, MIT Press 
(Cambridge). 
[Siu et al., 1995] Siu, K.-Y., Roychowdhury, V., Kailath, T. (1995). Discrete Neural
Computation: A Theoretical Foundation. Prentice Hall (Englewood Cliffs, NJ, USA).
[Urahama and Nagao, 1995] Urahama, K., and Nagao, T. (1995). k-winner-take-all circuit
with O(N) complexity. IEEE Trans. on Neural Networks, vol. 6, 776-778.
