On 
Neural 
Networks with Minimal 
Weights 
Vasken Bohossian 
Jehoshua Bruck 
California Institute of Technology 
Mail Code 136-93 
Pasadena, CA 91125 
E-maih {vincent, bruckparadise.caltech.edu 
Abstract 
Linear threshold elements are the basic building blocks of artificial 
neural networks. A linear threshold element computes a function 
that is a sign of a weighted sum of the input variables. The weights 
are arbitrary integers; actually, they can be very big integers-- 
exponential in the number of the input variables. However, in 
practice, it is difficult to implement big weights. In the present 
literature a distinction is made between the two extreme cases: 
linear threshold functions with polynomial-size weights as opposed 
to those with exponential-size weights. The main contribution of 
this paper is to fill up the gap by further refining that separation. 
Namely, we prove that the class of linear threshold functions with 
polynomial-size weights can be divided into subclasses according 
to the degree of the polynomial. In fact, we prove a more general 
result--that there exists a minimal weight linear threshold function 
for any arbitrary number of inputs and any weight size. To prove 
those results we have developed a novel technique for constructing 
linear threshold functions with minimal weights. 
1 Introduction 
Human brains are by far superior to computers for solving hard problems like combi- 
natorial optimization and image and speech recognition, although their basic build- 
ing blocks are several orders of magnitude slower. This observation has boosted 
interest in the field of artificial neural networks [Hop field 82], [Rumelhart 82]. The 
latter are built by interconnecting multiple artificial neurons (or linear threshold 
gates), whose behavior is inspired by that of biological neurons. Artificial neural 
networks have found promising applications in pattern recognition, learning and 
On Neural Networks with Minimal Weights 247 
other data processing tasks. However most of the research has been oriented to- 
wards the practical aspect of neural networks, simulating or building networks for 
particular tasks and then comparing their performance with that of more traditional 
methods for those particular tasks. To compare neural networks to other compu- 
tational models one needs to develop the theoretical settings in which to estimate 
their capabilities and limitations. 
1.1 Linear Threshold Gate 
The present paper focuses on the study of a single linear threshold gate (artificial 
neuron) with binary inputs and output as well as integer weights (synaptic coeffi- 
cients). Such a gate is mathematically described by a linear threshold function. 
Definition I (Linear Threshold Function) 
A linear threshold function of n variables is a Boolean function f 
-1, 1} that can be written as 
f 1 
f(M) = sgn(F()) = -1 
, for F(37) >_ 0 
, where F(37) = t-37 = y wixi 
, otherwise 
i=l 
for any 37 6 {-1, 1)" and a fixed  6 Z". 
Although we could allow the weights wi to be real numbers, it is known [Muroga 71], 
[Raghavan 88] that for a, binary input neuron, one needs O(n log n) bits per weight, 
where n is the number of inputs. So in the rest of the paper, we will assume without 
loss of generality that all weights are integers. 
1.2 Motivation 
Many experimental results in the area of neural networks have indicated that the 
magnitudes of the coefficients in the linear threshold elements grow very fast with 
the size of the inputs and therefore limit the practical use of the network. One 
natural question to ask is the following. How limited is the computational power of 
the network if one limits oneself to threshold elements with only "small" growth in 
the size of the coefficients? To answer that question we have to define a measure of 
the magnitudes of the weights. Note that, given a function f, the weight vector  
is not unique (see Example I below). 
Definition 2 (Weight Space) 
Given a lineat' threshold function f we define W as the set of all weights that satisfy 
Definition 1, that is W 
Here follows a measure of the size of the weights. 
Definition 3 (Minimal Weight Size) 
We define the size of a weight vector as the sum of the absolute values of the weights. 
The minimal weight size of a linear threshold function is defined as: 
S[f] = min ( Iw, I) 
W  
i----1 
The particular vector that achieves the minimum is called a minimal weight vector. 
Naturally, S[f] is a function of n. 
248 V. BOHOSSIAN, J. BRUCK 
It has been shown [Hastad 94], [Myhill 611, [Shawe-Taylor 92], [Siu 91] that there 
exists a linear threshold function that can be implemented by a single threshold 
element with exponentially growing weights, S[f] ~ 2 ', but cannot be implemented 
by a threshold element with smaller: polynomialy growing weights, S[f]~ n a, d 
constant. In light of that result the above question was dealt with by defining a 
class within the set of linear threshold functions: the class of functions with "small" 
(i.e. polynomialy growing) weights [Siu 91]. Most of the recent research focuses on 
the power of circuits with small weights, relative to circuits with arbitrary weights 
[Goldmann 92], [Goldman 93]. Rather than dealing with circuits we are interested 
in studying a single threshold gate. The main contribution of the present paper is 
to further refine the division of small versus arbitrary weights. We separate the set 
of functions with small weights into classes indexed by d, the degree of polynomial 
growth and show that all of them are non-empty. In particular, we develop a 
technique for proving that a weight vector is minimal. We use that technique to 
construct a function of size S[f] = s for an arbitrary s. 
1.3 Approach 
The main difficulty in analyzing the size of the weights of a threshold element is due 
to the fact that a single linear threshold function can be implemented by different 
sets of weights as shown in the following exanple. 
Example 1 (A Threshold Function with Minimal Weights) 
Consider the following two sets of weights (weight vectors). 
t 1 = (1 24), FI() -' Xl +2x2+4xa 
//2 -- (2 4 8), F.2( ) -- 2Zl -[- 4x2 -]- 8z3 
They both implement the same threshold function 
f() = sgn(Fe(Z)) = sgn(2F1 (')) = sgn(F (3)) 
A closer look reveals that f() = sgn(xa), implying that none of the above weight 
vectors has minimal size. Indeed, the minimal one is a = (0 0 1) and S[f] = 1. 
It is in general difficult to determine if a given set of weights is minimal [Amaldi 93], 
[Willis 63]. Our technique consists of limiting the study to only a particular subset 
of linear threshold functions, a subset for which it is possible to prove that a given 
weight vector is minimal. That subset is loosely defined by the requirement that 
there exist input vectors for which .f() = .f(-). The existence of such a vector, 
called a root of f, puts a constraint on the weight vector used to implement f. The 
larger the set of roots - the larger the constraint on the set of weight vectors, which 
in turn helps determine the minimal one. A detailed description of the technique is 
given in Section 2. 
1.4 Organization 
Here follows a brief outline of the rest of the paper. Section 2 mathematically defines 
the setting of the problem as well as derives some basic results on the properties 
of functions that admit roots. Those results are used as building blocks for the 
proof of the main results in Section 3. It also introduces a construction method 
for functions with minimal weights. Section 3 presents the main result: for any 
weight size, s, and any number of inputs, n, there exists an n-input linear threshold 
function that requires weights of size S[f] = s. Section 4 presents some applications 
of the result of Section 3 and indicates future research directions. 
On Neural Networks with Minimal Weights 249 
2 Construction of Minimal Threshold Functions 
The present section defines the mathematical tools used to construct functions with 
minimal weights. 
2.1 Mathematical setting 
We are interested in constructing functions for which the minimal weight is easily 
determined. Finding the minimal weight involves a search, we are therefore inter- 
ested in finding functions with a constrained weight spaces. The following tools 
allows us to put constraints on W. 
Definition 4 (Root Space of a Boolean Function) 
A vector ff E {-1, 1 )'* such that f(v = f(-v is called a root of f. We define the 
root space, R, as the set of all roots of f. 
Definition 5 (Root Generator Matrix) 
For a given weight vector  E W and a root ff  R, the root generator matrix, 
G = (go), is a (n x k)-matrix, with entries in {-1,0, 1 ), whose rows ff are orthogonal 
to u7 and equal to 7 at all non-zero coordinates, namely, 
2. gij -- 0 or go = vj for all i and j. 
Example 2 (Root Generator Matrix) 
Suppose that we are given a linear threshold function specified by a weight 
vector  = (1,1,2,4,1,1,2,4). By inspection we determine one root ff = 
(1,1, 1, 1,-1,-1,-1,-1). Notice that Wl + w2 -w7 - 0 which can be written 
as tY' t = 0, where tY = (1, 1,0,0,0,0,-1,0) is a row of G. Set F= 7- 2tY. Since  
is equal to ff at all non-zero coordinates, F  {-1, 1} '. Also g.  = if. + tY' t = 0. 
We have generated a new root: F= (-1, -1, 1, 1,-1,-1, 1, -1). 
Lemma 6 (Orthogonality of G and W) 
For a given weight vector 7  W and a root ff  R 
fig r -' 6 
holds for any weight vector ff E W. 
Proof. For an arbitrary 3  W and an arbitrary row, g-, of G, let 
By definition offfi, 9   {-1,1)'* and 7   = 0. That implies f() = f(-O 
is a root of f. For any weight vector 3  W, sgn(3. ff) = sgn(-3. ff). Therefore 
3- 07- 2ffi) = 0 and finally, since if. 3 = 0 we get 3. g- = 0. [] 
Lemma 7 (Minimality) 
For a given weight vector   W and a root  E R if rank(G) = n - 1 (i.e. G 
has n - 1 independent rows) and Iwil = 1 for some i, then  is the minimal weight 
vector. 
Proof. From Lemma 6 any weight vector 3 satisfies 3G T = . rank(G) = n - 1 
implies that dim(W) = 1, i.e. all possible weight vectors are integer multiples of 
each other. Since Iwil = 1, all vectors are of the form 3 = ku7, for k _> 1. Therefore 
7 has the smallest size. [] 
We complete Example 2 with an application of Lemma 7. 
250 V. BOHOSSIAN, J. BRUCK 
Example 3 (Minimality) 
Given  = (1, 1,2,4, 1, 1,2,4) and 
/1 0 
0 1 
0 0 
G= 0 0 
1 0 
1 1 
1 1 
It is easy to verify that rank(G) 
minimal and S[f] = 16. 
7- (1, 1, 1, 1,-1,-1,-1,-1) we can construct  
0 0 -1 0 0 0 
0 0 0 -1 0 0 
1 0 0 0 -1 0 
0 1 0 0 0 -1 
0 0 0 -1 0 0 
0 0 0 0 -1 0 
1 0 0 0 0 -1 
= n- I = 7 and therefore, by Lemma 7,  is 
2.2 Construction of minimal weight vectors 
In Example 3 we saw how, given a weight vector, one can show that it is minimal. 
In this section we present an example of a linear threshold function with minimal 
weight size, with an arbitrary number of input variables. 
We would like to construct a weight vector and show that it is minimal. Let 
the number of inputs, n, be even. Let t consist of two identical blocks : 
(Wl, W2, ..., Wn/2, Wl, W2,..., Wn/2). Clearly, ff - (1, 1, ,.., 1, - 1, - 1, ..., - 1) is a root 
and G is the corresponding generator matrix. 
G ____ 
f 1 0 0 0 ... 0 0 0 -1 0 0 0 ... 0 0 
0 1 0 0 ... 0 0 0 0 -1 0 0 ... 0 0 
0 0 ! 0 ... 0 0 0 0 0 -1 0 ... 0 0 
0 0 0 0 ... 0 1 0 0 0 0 0 ... 0 -! 0 
0 0 0 0 ... 0 0 1 0 0 0 0 ... 0 0 -1 
3 The Main Result 
The following theorem states that given an integer s and a number of variables n 
there exists a function of n variables and minimal weight size s. 
Theorem 8 (Main Result) 
For any pair (s, n) that satisfies 
( 2 for n even 
1. n < s < 2,. t ' 
- - +2" ,lornodd 
2. s even 
there exists a linear threshold function of n variables, f, with minimal weight size 
s[f]= s. 
Proof. Given a pair (s, n), that satisfies the above conditions we first construct 
a weight vector u that satisfies Ein__l IWil --' $, then show that it is the minimal 
weight vector of the function f(x) = sgn(v7. . The proof is shown only for n even. 
CONSTRUCTION. 
1. Define (al,aa,...,an/2) - (1,1,..., 1). 
On Neural Networks with Minimal Weights 251 
V"n/2 . 
2. If .i----1 ai < s/2 then increase by one the smallest ai such that ai < 2 i-2 
(In the case of a tie take the wi with smallest index i). 
v 'n/2 = s/2 or (al,a2,..., 
3. Repeat the previous step until z_i=l ai aN) = 
(1, 1,2, 4, ..., 2]-9'). 
4. Set t -- (al, a2, ..., an/2, al, a2, ..., an/2). 
Because we increase the size by one unit at a time the algorithm will converge to the 
desired result for any integer s that satisfies n _ s _ 2]. We have a construction 
for any valid (s, n) pair. Let us show that  is minimal. 
MINIMALITY. Given that t -- (al,a2, ...,an/2, al, a2, ..., aa/) we find a root ff- 
(1, 1, ..., 1,-1,-1, ...,-1) and n/2 rows of the generator matrix G corresponding to 
the equations wi - wi+,. To form additional rows note that the first k ai's are 
i--1 
powers of two (where k depends on s and n). Those can be written as ai = 'y=l aj 
and generate k - 1 rows. And finally note that all other ai, i > k, are smaller than 
2 k+l. Hence, they can be written as a binary expansion ai - '=1 aiyay where 
aij E {0, 1). There are  - k such weights. G has a total of n - 1 independent rows. 
rank(G) = n - 1 and w - 1, therefore by Lemma 7, t is minimal and S[f] - s. [] 
Example 4 (A Function of 10 variables and size S[.f] = 26) 
We start with - (1,1, 1,1, 1). We iterate: (1, 1,2,1, 1), (1,1,2,2, 1), (1, 1,2,2,2), 
(1,1,2,3,2), (1,1,2,3,3), (1,1,2,4,3), (1,1,2,4,4), and finally (1,1,2,4,5). The 
construction algorithm converges to 8- (1, 1, 2,4,5). We claim that - (8, ) - 
(1, 1,2, 4, 5, 1, 1,2, 4, 5) is minimal. Indeed, ff = (1, 1,1, 1,1, -1, -1, -1, -1, -1) and 
/1 
o 
o 
o 
G= 0 
1 
1 
1 
1 
oooo- o o o 
1 0 0 0 0 -1 0 0 0 
0 1 0 0 0 0 -1 0 0 
0 0 1 0 0 0 0 -1 0 
0 0 0 1 0 0 0 0 
0 0 0 0 0 -! 0 0 0 
I 0 0 0 0 0 -1 0 0 
1 1 0 0 0 0 0 -1 0 
0 0 i 0 0 0 0 0 -1 
is a matrix of rank 9. 
Example 5 (Functions with Polynomial Size) 
This example shows an application of Theorem 8. We define "T (d) as the set of 
linear threshold functions for which sir] _ n d. The Theorem states that for any 
even n there exists a function f of n variables and minimum weight sir] = n d. The 
implication is that for all d, L (d-l) is a proper subset of -T (d) 
4 Conclusions 
We have shown that for any reasonable pair of integers (n, s), where s is even, there 
exists a linear threshold function of n variables with minimal weight size S[f] -- s. 
We have developed a novel technique for constructing linear threshold functions 
with minimal weights that is based on the existence of root vectors. An interesting 
application of our method is the computation of a lower bound on the number 
of linear threshold functions [Smith 66]. In addition, our technique can help in 
studying the trade-offs between a number of important parameters associated with 
252 V. BOHOSSIAN, J. BRUCK 
linear threshold (neural) circuits, including, the number of elements, the number of 
layers, the fan-in, fan-out and the size of the weights. 
Acknowledgements 
This work was supported in part by the NSF Young Investigator Award CCR- 
9457811, by the Sloan Research Fellowship, by a grant from the IBM Almaden 
Research Center, San Jose, California, by a grant from the AT&T Foundation and 
by the center for Neuromorphic Systems Engineering as a part of the National 
Science Foundation Engineering Research Center Program; and by the California 
Trade and Commerce Agency, Office of Strategic Technology. 
References 
[Amaldi 93] E. Amaldi and V. Kann. The complexity and approximability of finding 
maximum feasible subsystems of linear relations. Ecole Polytechnique Federale 
De Lausanne Technical Report, ORWP 93/11, August 1993. 
[Goldmann 92] M. Goldmann, J. Hastad, and A. Razborov. Majority gates vs. gen- 
eral weighted threshold gates. Computational Complexity, (2):277-300, 1992. 
[Goldman 93] M. Goldmann and M. Karpinski. Simulating threshold circuits by 
majority circuits. In Proc. 5th ACM STOC, pages pp. 551-560, 1993. 
[Hastad 94] J. Hastad. On the size of weights for threshold gates. SIAM. J. Disc. 
Math., 7:484-492, 1994. 
[Hopfield 82] J. Hop field. Neural networks and physical systems with emergent col- 
lective computational abilities. Proc. of the USA National Academy of Sciences, 
79:2554-2558, 1982. 
[Muroga 71] M. Muroga. Threshold Logic and its Applications. Wiley-Interscience, 
1971. 
[Myhill 61] J. Myhill and W. H. Kautz. On the size of weights required for linear- 
input switching functions. IRE Trans. Electronic Computers, (EC10):pp. 288- 
290, 1961. 
[Raghavan 88] P. Raghavan. Learning in threshold networks: a computational 
model and applications. Technical Report RC 13859, IBM Research, July 
1988. 
[Rumelhart 82] D. Rumelhart and J. McClelland. Parallel distributed processing: 
Explorations in the microstructure of cognition. MIT Press, 1982. 
[Shawe-Taylor 92] J. S. Shawe-Taylor, M. H. G. Anthony, and W. Kern. Classes 
of feedforward neural networks and their circuit complexity. Neural Networks, 
Vol. 5:pp. 971-977, 1992. 
[Siu 91] K. Siu and J. Bruck. On the power of threshold circuits with small weights. 
SIAM J. Disc. Math., Vol. 4(No. 3):pp. 423-435, August 1991. 
[Smith 66] D. R. Smith. Bounds on the number of threshold functions. IEEE 
Transactions on Electronic Computers, June 1966. 
[Willis 63] D. G. Willis. Minimum weights for threshold switches. In Switching 
Theory in Space Techniques. Stanford University Press, Stanford, Calif., 1963. 
