On The Circuit Complexity of Neural Networks 
V. P. Roychowdhury 
Information Systems Laboratory 
Stanford University 
Stanford, CA, 94305 
A. Orlitsky 
AT&T Bell Laboratories 
600 Mountain Avenue 
Murray Hill, NJ, 07974 
K. Y. Siu 
Information Systems Laboratory 
Stanford University 
Stanford, CA, 94305 
T. Kailath 
Information Systems Laboratory 
Stanford University 
Stanford, CA, 94305 
Abstract 
We introduce a geometric approach for investigating the power of threshold 
circuits. Viewing n-variable boolean functions as vectors in $\mathbb{R}^{2^n}$, 
we invoke tools from linear algebra and linear programming to derive new 
results on the realizability of boolean functions using threshold gates. 
Using this approach, one can obtain: (1) upper bounds on the number of 
spurious memories in Hopfield networks, and on the number of functions 
implementable by a depth-d threshold circuit; (2) a lower bound on the 
number of orthogonal input functions required to implement a threshold 
function; (3) a necessary condition for an arbitrary set of input functions to 
implement a threshold function; (4) a lower bound on the error introduced 
in approximating boolean functions using sparse polynomials; (5) a limit 
on the effectiveness of the only known lower-bound technique (based on 
computing correlations among boolean functions) for the depth of threshold 
circuits implementing boolean functions; and (6) a constructive proof 
that every boolean function f of n input variables is a threshold function 
of polynomially many input functions, none of which is significantly 
correlated with f. Some of these results lead to generalizations of key results 
concerning threshold circuit complexity, particularly those that are based 
on the so-called spectral or harmonic analysis approach. Moreover, our 
geometric approach yields simple proofs, based on elementary results from 
linear algebra, for many of these earlier results. 
1 Introduction 
An $S$-input threshold gate is characterized by $S$ real weights $w_1, \ldots, w_S$. 
It takes $S$ inputs $x_1, \ldots, x_S$, each either $+1$ or $-1$, and outputs $+1$ if 
the linear combination $\sum_{i=1}^{S} w_i x_i$ is positive and $-1$ if the linear 
combination is negative. Threshold gates were recently used to implement several 
functions of practical interest (including Parity, Addition, Multiplication, 
Division, and Comparison) with fewer gates and smaller depth than conventional 
circuits using AND, OR, and NOT gates [12, 4, 11]. This success has led to a 
considerable amount of research on the power of threshold circuits 
[1, 10, 9, 11, 3, 13]. However, even simple questions remain unanswered. It 
is not known, for example, whether there is a function that can be computed by a 
depth-3 threshold circuit with polynomially many gates but cannot be computed 
by any depth-2 circuit with polynomially many threshold gates. 
Geometric approaches have proven useful for analyzing threshold gates. An $S$-input 
threshold gate corresponds to a hyperplane in $\mathbb{R}^S$. This has been used, for 
example, to count the number of boolean functions computable by a single threshold 
gate [6], and also to determine functions that cannot be implemented by a single 
threshold gate. However, threshold circuits of depth two or more do not carry a 
simple geometric interpretation in $\mathbb{R}^S$. The inputs to gates in the second 
level are themselves threshold functions, hence the linear combination computed at 
the second level is a non-linear function of the inputs. Lacking a geometric view, 
researchers [5, 3] have used indirect approaches, applying harmonic-analysis 
techniques to analyze threshold gates. These techniques, apart from their 
complexity, restricted the input functions of the gates to be of very special 
types: input variables or parities of the input variables, thus not applying even 
to depth-two circuits. 
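To make the depth-two remark concrete, here is a minimal sketch of a two-level threshold circuit for the exclusive-or of two variables. It assumes the convention that +1 means "true" and assumes a constant +1 input is available to every gate to play the role of a threshold; neither assumption is stated in the text, and the weights are just one workable choice.

```python
from itertools import product

def gate(weights, inputs):
    """Threshold gate: sign of the weighted sum (the sum must be nonzero)."""
    total = sum(w * x for w, x in zip(weights, inputs))
    assert total != 0, "gate output is undefined when the sum is zero"
    return 1 if total > 0 else -1

def xor_depth2(x1, x2):
    # First level: two gates on the inputs plus a constant +1 input (assumption).
    g_or = gate([1, 1, 1], [x1, x2, 1])     # +1 unless both inputs are -1
    g_and = gate([1, 1, -1], [x1, x2, 1])   # +1 only when both inputs are +1
    # Second level: its inputs are themselves threshold functions of x1, x2.
    return gate([1, -1, -1], [g_or, g_and, 1])

# Truth table of the circuit over all four +/-1 assignments.
table = {x: xor_depth2(*x) for x in product([1, -1], repeat=2)}
```

The second-level gate is linear in `g_or` and `g_and` but not in `x1` and `x2`, which is exactly why a hyperplane picture in the space of the original inputs no longer applies.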
In this paper, we describe a simple geometric relation between the output function 
of a threshold gate and its set of input functions. This applies to arbitrary sets of 
input functions. Using this relation, we can prove the following results: (1) upper 
bounds on (a) the number of threshold functions of any set of input functions, (b) 
the number of spurious memories in a Hopfield network, and (c) the number of 
functions implementable by threshold circuits of depth d; (2) a lower bound on the 
number of orthogonal input functions required to implement a threshold function; 
(3) a quantifiable necessary condition for a set of functions to implement a threshold 
function; (4) a lower bound on the error in approximating boolean functions using 
sparse polynomials; (5) a limit on the effectiveness of the correlation method used 
in [7] to prove that a certain function cannot be implemented by depth-two circuits 
with polynomially many gates and polynomially bounded weights; (6) a proof that 
every function f is a threshold function of polynomially many input functions, none 
of which is significantly correlated with f. 
Special cases of some of these results, where the input functions to a threshold gate 
are restricted to the input variables, or parities of the input variables, were proven 
in [5, 3] using harmonic-analysis tools. Our technique shows that these tools are 
not needed, providing simpler proofs for more general results. 
Due to space limitations, we cannot present the full details of our results. Instead, 
we shall introduce the basic definitions followed by a technical summary of the 
results; the emphasis will be on pointing out the motivation and relating our results 
with those in the literature. The proofs and other technical details will appear in a 
complete journal paper. 
2 Definitions and Background 
An n-variable boolean function is a mapping $f: \{-1, 1\}^n \to \{-1, 1\}$. We view $f$ 
as a (column) vector in $\mathbb{R}^{2^n}$. Each of $f$'s $2^n$ components is either $-1$ 
or $+1$ and represents $f(x)$ for a distinct value assignment $x$ of the n boolean 
variables. We view the $S$ weights of an $S$-input threshold gate as a weight vector 
$w = (w_1, \ldots, w_S)^T$ in $\mathbb{R}^S$. 
Let the functions $f_1, \ldots, f_S$ be the inputs of a threshold gate $w$. The gate 
computes a function $f$ (or $f$ is the output of the gate) if the following vector 
equation holds: 

$$f = \mathrm{sgn}\Big(\sum_{i=1}^{S} f_i w_i\Big) \qquad (1)$$

where 

$$\mathrm{sgn}(x) = \begin{cases} +1 & \text{if } x > 0, \\ -1 & \text{if } x < 0, \\ \text{undefined} & \text{if } x = 0. \end{cases}$$

Note that this definition requires that all components of $\sum_{i=1}^{S} f_i w_i$ be 
nonzero. It is convenient to write Equation (1) in a matrix form: 

$$f = \mathrm{sgn}(Yw)$$

where the input matrix 

$$Y = [f_1 \cdots f_S]$$

is a $2^n$ by $S$ matrix whose columns are the input functions. The function $f$ is a 
threshold function of $f_1, \ldots, f_S$ if there exists a threshold gate (i.e., $w$) 
with inputs $f_1, \ldots, f_S$ that computes $f$. 
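The vector view can be sketched in a few lines of Python. The example below is only an illustration under assumed choices: n = 3, the input functions are the three input variables, and the weight vector is all ones, so the gate output should be the majority function. The helper names are hypothetical.

```python
from itertools import product

n = 3
points = list(product([1, -1], repeat=n))       # all 2^n assignments

def as_vector(func):
    """Truth table of func as a +/-1 vector with 2^n components."""
    return [func(x) for x in points]

def sgn_vector(values):
    assert all(v != 0 for v in values), "sgn is undefined at zero"
    return [1 if v > 0 else -1 for v in values]

# Input functions f_1..f_S: here simply the input variables (S = n).
inputs = [as_vector(lambda x, i=i: x[i]) for i in range(n)]
Y = [list(row) for row in zip(*inputs)]         # 2^n-by-S input matrix

w = [1, 1, 1]                                   # weight vector of the gate
Yw = [sum(y * wi for y, wi in zip(row, w)) for row in Y]

majority = as_vector(lambda x: 1 if sum(x) > 0 else -1)
is_threshold = sgn_vector(Yw) == majority       # does f = sgn(Yw) hold?
```

Every component of `Yw` is an odd integer here, so the nonzero requirement of Equation (1) is met automatically.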
These definitions form the basis of our approach. Each function, being a $\pm 1$ vector 
in $\mathbb{R}^{2^n}$, determines an orthant in $\mathbb{R}^{2^n}$. A function $f$ is the 
output of a threshold gate whose input functions are $f_1, \ldots, f_S$ if and only if 
the linear combination $\sum_{i=1}^{S} f_i w_i$ defined by the gate lies inside the 
orthant determined by $f$. 
Definition 1 The correlation of two n-variable boolean functions $f_1$ and $f_2$ is: 

$$C_{f_1 f_2} = (f_1^T f_2)/2^n;$$

the two functions are uncorrelated or orthogonal if $C_{f_1 f_2} = 0$. 
Note that $C_{f_1 f_2} = 1 - 2 d_H(f_1, f_2)/2^n$, where $d_H(f_1, f_2)$ is the Hamming 
distance between $f_1$ and $f_2$; thus, the correlation can be interpreted as a measure 
of how 'close' the two functions are. 
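As a quick numerical check of the correlation definition and its Hamming-distance form, the following sketch uses n = 3 with the majority function and two input variables as example functions (the choice of functions is an assumption made only for illustration).

```python
from fractions import Fraction
from itertools import product

n = 3
points = list(product([1, -1], repeat=n))

def correlation(f, g):
    """C_{fg} = f^T g / 2^n for truth-table vectors f and g."""
    return Fraction(sum(a * b for a, b in zip(f, g)), len(f))

def hamming(f, g):
    """Number of assignments on which f and g disagree."""
    return sum(a != b for a, b in zip(f, g))

x1 = [x[0] for x in points]
x2 = [x[1] for x in points]
maj = [1 if sum(x) > 0 else -1 for x in points]

c_maj_x1 = correlation(maj, x1)     # majority agrees with x1 on 6 of 8 points
c_x1_x2 = correlation(x1, x2)       # distinct input variables are orthogonal
```

Exact rational arithmetic via `Fraction` keeps the comparison with $1 - 2 d_H / 2^n$ free of rounding issues.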
Fix the input functions $f_1, \ldots, f_S$ to a threshold gate. The correlation vector 
of a function $f$ with the input functions is 

$$C_{fY} = (Y^T f)/2^n = (C_{ff_1}\ C_{ff_2}\ \cdots\ C_{ff_S})^T.$$

Next, we define $\hat{C}$ as the maximum in magnitude among the correlation 
coefficients, i.e., $\hat{C} = \max\{|C_{ff_i}| : 1 \le i \le S\}$. 
3 Summary of Results 
The correlation between two n-variable functions is a multiple of $2^{-(n-1)}$, bounded 
between $-1$ and $1$, hence can assume $2^n + 1$ values. The correlation vector 
$C_{fY} = (C_{ff_1}, \ldots, C_{ff_S})^T$ can therefore assume at most $(2^n + 1)^S$ 
different values. There are $2^{2^n}$ Boolean functions of n Boolean variables, hence 
many share the same correlation vector. However, the next theorem says that a 
threshold function of $f_1, \ldots, f_S$ does not share its correlation vector with any 
other function. 
Uniqueness Theorem Let $f$ be a threshold function of $f_1, \ldots, f_S$. Then, for all 
$g \ne f$, 

$$C_{gY} \ne C_{fY}.$$
Corollary 1 There are at most $(2^n + 1)^S$ threshold functions of any set of $S$ input 
functions. 
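The Uniqueness Theorem and Corollary 1 can be checked by brute force for a small case. The sketch below assumes n = 3 and a deliberately non-orthogonal input set (the three variables plus OR of the first two), and it only enumerates gates with small integer weights, so it samples a subset of the threshold functions rather than all of them.

```python
from itertools import product

n = 3
points = list(product([1, -1], repeat=n))
size = len(points)                                  # 2^n

# Input functions (not mutually orthogonal): x1, x2, x3 and OR(x1, x2).
inputs = [[x[i] for x in points] for i in range(n)]
inputs.append([1 if x[0] == 1 or x[1] == 1 else -1 for x in points])

def corr_vector(f):
    """Unnormalized correlation vector Y^T f (integer entries)."""
    return tuple(sum(a * b for a, b in zip(f, fi)) for fi in inputs)

# Threshold functions reachable with small integer weights (a subset of all).
threshold = set()
for w in product(range(-2, 3), repeat=len(inputs)):
    sums = [sum(wi * fi[j] for wi, fi in zip(w, inputs)) for j in range(size)]
    if all(s != 0 for s in sums):
        threshold.add(tuple(1 if s > 0 else -1 for s in sums))

# Group all 2^(2^n) Boolean functions by correlation vector.
by_corr = {}
for g in product([1, -1], repeat=size):
    by_corr.setdefault(corr_vector(g), []).append(g)

# Each threshold function found should be alone in its group.
unique = all(by_corr[corr_vector(f)] == [f] for f in threshold)
```

Dropping the $1/2^n$ normalization does not affect which functions share a correlation vector, and it keeps all arithmetic in integers.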
The special case of the Uniqueness Theorem where the functions $f_1, \ldots, f_S$ are 
the input variables had been proven in [5, 9]. The proof used harmonic-analysis 
tools such as Parseval's theorem. It relied on the mutual orthogonality of the input 
functions (namely, $C_{f_i f_j} = 0$ for all $i \ne j$). Another special case where the 
input functions are parities of the input variables was proven in [3]. The same proof 
was used; see, e.g., pages 419-422 of [9]. Our proof shows that the harmonic-analysis 
tools and assumptions are not needed, thereby (1) significantly simplifying 
the proof, and (2) showing that the functions $f_1, \ldots, f_S$ need not be orthogonal: 
the Uniqueness Theorem holds for all collections of functions. The more general 
result of the Uniqueness Theorem can be applied to obtain the following two new 
counting results. 
Corollary 2 The number of stable states in a Hopfield network with $n$ elements 
which is programmed by the outer product rule to store $s$ given vectors is 
$\le 2^{s \log(n+1)}$. 
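The bound of Corollary 2 can be compared against an exhaustive count on a toy network. This sketch rests on several assumptions not fixed by the text: the logarithm is taken base 2, the weight matrix keeps its diagonal, a state counts as stable only if every unit's field is nonzero and has the sign of that unit, and the two stored vectors are arbitrary examples.

```python
from itertools import product
from math import log2

n = 4
stored = [(1, 1, 1, 1), (1, 1, -1, -1)]             # s = 2 example vectors
s = len(stored)

# Outer-product rule: W = sum_k v_k v_k^T (diagonal kept; an assumption).
W = [[sum(v[i] * v[j] for v in stored) for j in range(n)] for i in range(n)]

def is_stable(x):
    """x is stable if every unit's field is nonzero and agrees in sign with x."""
    fields = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
    return all(h * xi > 0 for h, xi in zip(fields, x))

# Exhaustive search over all 2^n states.
stable = [x for x in product([1, -1], repeat=n) if is_stable(x)]
bound = 2 ** (s * log2(n + 1))                      # equals (n + 1)^s
```

For this particular pair of stored vectors the field at each unit is twice the sum of the two units in its block, so only states that are constant on each block survive the stability test.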
Corollary 3 Let $F_n(S(n), d)$ be the number of n-variable boolean functions computed 
by depth-d threshold circuits with fan-in bounded by $S(n)$ (we assume $S(n) > n$). 
Then, for all $d, n > 1$, 

$F_n(S(n), d) \le$ [right-hand side illegible in the source] 
It follows easily from our geometric framework that if $C_{fY} = 0$ then $f$ is not a 
threshold function of $f_1, \ldots, f_S$: every linear combination of $f_1, \ldots, f_S$ 
is orthogonal to $f$, hence cannot intersect the orthant determined by $f$. 
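The orthogonality argument can be illustrated with the parity of two variables, which is uncorrelated with each variable. If some gate computed $f$, the inner product of $f$ with the linear combination would equal the sum of the absolute values of its components and so be strictly positive. The sketch below, which samples random integer weights as an illustrative assumption, shows that this inner product is always zero instead; the sampling is a sanity check, not a proof.

```python
from itertools import product
import random

points = list(product([1, -1], repeat=2))
f = [a * b for a, b in points]                      # parity of x1 and x2
inputs = [[a for a, _ in points], [b for _, b in points]]

random.seed(0)
inner_products = []
realized = False
for _ in range(1000):
    w = [random.randint(-9, 9) for _ in inputs]
    combo = [sum(wi * fi[j] for wi, fi in zip(w, inputs)) for j in range(4)]
    # f^T (Yw): zero whenever f is orthogonal to every input function.
    inner_products.append(sum(fj * cj for fj, cj in zip(f, combo)))
    # Does this gate compute f (nonzero sums whose signs all match f)?
    if all(cj * fj > 0 for cj, fj in zip(combo, f)):
        realized = True
```

Integer weights are used so the inner products come out exactly zero rather than zero up to rounding.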
Next, we consider the case where $C_{fY} \ne 0$. Define the generalized spectrum to be 
the $S$-dimensional vector: 

$$\beta = (\beta_1, \ldots, \beta_S)^T = (Y^T Y)^{-1} Y^T f$$

(the reason for the definition and the name will be clarified soon). 
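Spectral-Bound Theorem 
If $f$ is a linear threshold function of $f_1, \ldots, f_S$, then 

$$\sum_{i=1}^{S} |\beta_i| \ge 1, \quad \text{hence} \quad S \ge 1/\hat{\beta},$$

where $\hat{\beta} = \max\{|\beta_i| : 1 \le i \le S\}$. 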
The Spectral-Bound Theorem provides a way of lower bounding the number $S$ of 
input functions. Specifically, if $\beta_i$ is exponentially small (in n) for all 
$i \in \{1, \ldots, S\}$, then $S$ must be exponentially large. 
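A small exact computation can illustrate the generalized spectrum and the bound. The sketch assumes n = 3, the majority function as output, and the linearly independent but non-orthogonal input set x1, x2, x3, OR(x1, x2); the `solve` helper is a hypothetical exact Gauss-Jordan routine written for this example, and it assumes the matrix $Y^T Y$ is invertible.

```python
from fractions import Fraction
from itertools import product

n = 3
points = list(product([1, -1], repeat=n))
inputs = [[x[i] for x in points] for i in range(n)]
inputs.append([1 if x[0] == 1 or x[1] == 1 else -1 for x in points])
f = [1 if sum(x) > 0 else -1 for x in points]       # majority = sgn(x1+x2+x3)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A z = b exactly by Gauss-Jordan elimination (A assumed invertible)."""
    m = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(m):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[-1] for row in M]

gram = [[dot(fi, fj) for fj in inputs] for fi in inputs]    # Y^T Y
rhs = [dot(fi, f) for fi in inputs]                         # Y^T f
beta = solve(gram, rhs)                                     # (Y^T Y)^{-1} Y^T f

l1_norm = sum(abs(b) for b in beta)
beta_hat = max(abs(b) for b in beta)
```

Here `l1_norm` is the left-hand side of the theorem's inequality and `len(inputs)` can be compared with `1 / beta_hat`.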
In the special case where the input functions are parities of the input variables, all 
input functions are orthogonal; hence $Y^T Y = 2^n I_S$ and 

$$\beta = \frac{1}{2^n} Y^T f = C_{fY}.$$

Note that every parity function $p$ is a basis function of the Hadamard transform, 
hence $C_{fp}$ is the spectral coefficient corresponding to $p$ in the transform (see 
[8, 2] for more details on spectral representation of boolean functions). Therefore, 
the generalized spectrum in this case is the real spectrum of $f$. In that case, the 
Spectral-Bound Theorem implies that $S \ge 1/\max\{|C_{ff_i}| : 1 \le i \le S\}$. 
Therefore, the number of input functions needed is at least the reciprocal of the 
maximum magnitude among the spectral coefficients (i.e., $\hat{C}$). This special case 
was proved in [3]. Again, their proofs used harmonic-analysis tools and assumptions 
that we prove are unnecessary, thereby generalizing them to arbitrary input 
functions. Moreover, our geometric approach considerably simplifies the exposition 
by presenting simple proofs based on elementary results from linear algebra. 
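For the parity special case, the orthogonality of the input functions and the resulting spectral coefficients can be computed directly. The sketch below assumes n = 3 and uses the majority function as the example output; indexing parities by subsets of variable positions is a choice made for this illustration.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

n = 3
points = list(product([1, -1], repeat=n))
subsets = [c for k in range(n + 1) for c in combinations(range(n), k)]

# One parity function per subset of variables (the empty subset gives constant +1).
parities = {sub: [prod(x[i] for i in sub) for x in points] for sub in subsets}

def dot(f, g):
    return sum(a * b for a, b in zip(f, g))

# Y^T Y = 2^n I: distinct parities are orthogonal, each has squared norm 2^n.
orthogonal = all(dot(parities[a], parities[b]) == (2 ** n if a == b else 0)
                 for a in subsets for b in subsets)

maj = [1 if sum(x) > 0 else -1 for x in points]
# With orthogonal inputs the generalized spectrum reduces to the correlations.
spectrum = {sub: Fraction(dot(maj, p), 2 ** n) for sub, p in parities.items()}
c_hat = max(abs(c) for c in spectrum.values())
```

Majority of three variables is a threshold function of the three single-variable parities, and three inputs is consistent with the bound $S \ge 1/\hat{C}$ computed from `c_hat`.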
In general, we can show that if the input functions $f_i$ are orthogonal (i.e., 
$C_{f_i f_j} = 0$ for $i \ne j$) or asymptotically orthogonal (i.e., 
$\lim_{n \to \infty} C_{f_i f_j} = 0$), then the number of input functions satisfies 
$S \ge 1/\hat{C}$, where $\hat{C}$ is the largest (in magnitude) correlation of the 
output function with any of its input functions. 
We can also use the generalized spectrum to derive a lower bound on the error 
incurred in approximating a boolean function, $f$, using a set of basis functions. 
The lower bound can then be applied to show that the Majority function cannot be 
closely approximated by a sparse polynomial. In particular, it can be shown that if a 
polynomial of the input variables with only polynomially many (in n) monomials is 
used to approximate an n-variable Majority function, then the approximation error 
is $\Omega(1/(\log\log n)^{3/2})$. This provides a direct spectral approach for proving 
lower bounds on the approximation error. 
The method of proving lower bounds on $S$ in terms of the correlation coefficients 
$C_{ff_i}$ of $f$ with the possible input functions can be termed the method of 
correlations. Hajnal et al. [7] used a different aspect of this method to prove a 
lower bound on the depth of a threshold circuit that computes the 
Inner-product-mod-2 function. (They did not use exactly the correlation approach 
introduced in this paper, but rather an equivalent framework.) 
Our techniques can be applied to investigate the method of correlations in more 
detail and prove some limits to its effectiveness. We can show that the number, 
$S$, of input functions need not be inversely proportional to the largest correlation 
coefficient $\hat{C}$. In particular, we give two constructive procedures showing that 
any function $f$ is a threshold function of $O(n)$ input functions, each having an 
exponentially small correlation with $f$: $|C_{ff_i}| \le 2^{-(n-1)}$. 
Construction 1 Every boolean function $f$ of $n$ variables (for $n$ even) can be 
expressed as a threshold function of $3n$ boolean functions $f_1, f_2, \ldots, f_{3n}$ 
such that (1) $C_{ff_i} = 0$, $\forall\ 1 \le i \le 3n-1$, and (2) $C_{ff_{3n}} = 2^{-(n-1)}$. 
Construction 2 Every boolean function $f$ of $n$ variables can be expressed as a 
threshold function of $2n$ boolean functions $f_1, f_2, \ldots, f_{2n}$ such that 
(1) $C_{ff_i} = 0$, $\forall\ 1 \le i \le 2n-2$, and 
(2) $C_{ff_{2n-1}} = C_{ff_{2n}} = 2^{-(n-1)}$. 
The results of the above constructions are surprising. For example, in Construction 
1, the output function of the threshold gate is uncorrelated with all but one of 
the input functions, and the only non-zero correlation is the smallest possible 
($= 2^{-(n-1)}$). Note that $f$ is not a threshold function of a set of input functions, 
each of which is orthogonal to $f$. 
The above results thus provide a comprehensive understanding of the so-called 
method of correlations. In particular: (1) If the input functions are mutually 
orthogonal (or asymptotically orthogonal), then the method of correlations is 
effective even if exponential weights are allowed, i.e., if a function has 
exponentially small correlation with every function from a pool of possible input 
functions, then one would require exponentially many inputs to implement the given 
function using a threshold gate; (2) If the input functions are not mutually 
orthogonal, then the method of correlations need not be effective, i.e., one can 
construct examples where the output function has exponentially small correlation 
with every input function, and yet it can be implemented as a threshold function of 
polynomially many input functions. 
Furthermore, the constructive procedures can also be considered as constituting a 
preliminary answer to the following question: Given an n-variable boolean function 
$f$, are there efficient procedures for expressing it as a threshold function of 
polynomially many (in n) input functions? A procedure for so decomposing a given 
function $f$ will be referred to as a threshold-decomposition procedure; moreover, a 
decomposition procedure can be considered efficient if the input functions have 
simpler threshold implementations than $f$ (i.e., are easier to implement or require 
less depth/size). Constructions 1 and 2 present two such threshold-decomposition 
procedures. At present, the efficiency of these constructions is not clear and further 
work is necessary. We hope, however, that the general methodology introduced here 
may lead to subsequent work resulting in more efficient threshold-decomposition 
procedures. 
4 Concluding Remarks 
We have outlined a new geometric approach for investigating the properties of 
threshold circuits. In the process, we have developed a unified framework where 
many of the previous results can be derived simply as special cases, and without 
introducing too many seemingly difficult concepts. Moreover, we have derived several 
new results that quantify the input/output relationships of threshold gates, derive 
lower bounds on the number of input functions required to implement a given 
function using a threshold gate, and also analyze the limitations of a well-known 
lower-bound technique for threshold circuits. 
Acknowledgements 
This work was supported in part by the Joint Services Program at Stanford University 
(US Army, US Navy, US Air Force) under Contract DAAL03-88-C-0011, the 
SDIO/IST, managed by the Army Research Office under Contract DAAL03-90-G-0108, 
and the Department of the Navy, NASA Headquarters, Center for Aeronautics 
and Space Information Sciences under Grant NAGW-419-S6. 
References 
[1] E. Allender. A note on the power of threshold circuits. IEEE Symp. Found. 
Comp. Sci., 30, 1989. 
[2] Y. Brandman, A. Orlitsky, and J. Hennessy. A Spectral Lower Bound Technique 
for the Size of Decision Trees and Two-Level AND/OR Circuits. IEEE Trans. 
on Computers, 39(2):282-287, February 1990. 
[3] J. Bruck. Harmonic Analysis of Polynomial Threshold Functions. SIAM Journal 
on Discrete Mathematics, May 1990. 
[4] A. K. Chandra, L. Stockmeyer, and U. Vishkin. Constant depth reducibility. 
SIAM J. Comput., 13:423-439, 1984. 
[5] C. K. Chow. On the Characterization of Threshold Functions. Proc. Symp. 
on Switching Circuit Theory and Logical Design, pages 34-38, 1961. 
[6] T. M. Cover. Geometrical and Statistical Properties of Systems of Linear 
Inequalities with Applications in Pattern Recognition. IEEE Trans. on Electronic 
Computers, EC-14:326-334, 1965. 
[7] A. Hajnal, W. Maass, P. Pudlak, M. Szegedy, and G. Turan. Threshold circuits 
of bounded depth. IEEE Symp. Found. Comp. Sci., 28:99-110, 1987. 
[8] R. J. Lechner. Harmonic analysis of switching functions. In A. Mukhopadhyay, 
editor, Recent Developments in Switching Theory. Academic Press, 1971. 
[9] P. M. Lewis and C. L. Coates. Threshold Logic. John Wiley & Sons, Inc., 1967. 
[10] I. Parberry and G. Schnitger. Parallel Computation with Threshold Functions. 
Journal of Computer and System Sciences, 36(3):278-302, 1988. 
[11] J. Reif. On Threshold Circuits and Polynomial Computation. In Structure in 
Complexity Theory Symp., pages 118-123, 1987. 
[12] K. Y. Siu and J. Bruck. On the Power of Threshold Circuits with Small 
Weights. To appear in SIAM J. Discrete Math. 
[13] K. Y. Siu, V. P. Roychowdhury, and T. Kailath. Computing with Almost 
Optimal Size Threshold Circuits. Submitted to JCSS, 1990. 
