The Devil and the Network: 
What Sparsity Implies to Robustness and 
Memory 
Sanjay Biswas and Santosh S. Venkatesh 
Department of Electrical Engineering 
University of Pennsylvania 
Philadelphia, PA 19104 
Abstract 
Robustness is a commonly bruited property of neural networks; in particu- 
lar, a folk theorem in neural computation asserts that neural networks--in 
contexts with large interconnectivity--continue to function efficiently, al- 
beit with some degradation, in the presence of component damage or loss. 
A second folk theorem in such contexts asserts that dense interconnectiv- 
ity between neural elements is a sine qua non for the efficient usage of 
resources. These premises are formally examined in this communication 
in a setting that invokes the notion of the "devil"[1] in the network as an
agent that produces sparsity by snipping connections. 
1 ON REMOVING THE FOLK FROM THE THEOREM
Robustness in the presence of component damage is a property that is commonly 
attributed to neural networks. The content of the following statement embodies 
this sentiment. 
Folk Theorem 1: Computation in neural networks is not substantially 
affected by damage to network components. 
While such a statement is manifestly not true in general--witness networks with
"grandmother cells" where damage to the critical cells fatally impairs the
computational ability of the network--there is anecdotal evidence in support of it in
situations where the network has a more "distributed" flavour with relatively dense
interconnectivity of elements and a distributed format for the storage of information.
Qualitatively, the phenomenon is akin to holographic modes of storing information
where the distributed, non-localised format of information storage carries with it a
measure of security against component damage.

Footnote 1: Well, maybe an imp.
The flip side to the robust folk theorem is the following observation, robustness 
notwithstanding: 
Folk Theorem 2: Dense interconnectivity is a sine qua non for efficient
usage of resources; in particular, sparser structures exhibit a degradation
in computational capability.
Again, disclaimers have to be thrown in on the applicability of such a statement. 
In recurrent network architectures, however, this might seem to have some merit. 
In particular, in associative memory applications, while structural robustness might 
guarantee that the loss in memory storage capacity with increased interconnection 
sparsity may not be catastrophic, nonetheless intuitively a drop in capacity with 
increased sparsity may be expected. 
This communication represents an effort to mathematically codify these tenets. In 
the setting we examine we formally introduce sparse network interconnectivity by 
invoking the notion of a (puckish) devil in the network which severs interconnection 
links between neurons. Our results involve some surprising consequences--viewed
in the light of the two folk theorems--of sparse interconnectivity for robustness
and for memory storage capability. Only the main results are stated here; for
extensions and details of proofs we refer the interested reader to Venkatesh (1990) 
and Biswas and Venkatesh (1990). 
Notation We denote by $\mathbb{B}$ the set $\{-1, 1\}$. For every integer $k$ we denote the set
of integers $\{1, 2, \ldots, k\}$ by $[k]$. By ordered multiset we mean an ordered collection
of elements with repetition of elements allowed, and by k-set we mean an ordered
multiset of k elements. All logarithms in the exposition are to base e.
2 RECURRENT NETWORKS 
2.1 INTERCONNECTION GRAPHS 
We consider a recurrent network of n formal neurons. The allowed pattern of
neural interconnectivity is specified by the edges of a (bipartite) interconnectivity
graph, $G_n$, on vertices $[n] \times [n]$. In particular, the existence of an edge $\{i,j\}$ in
$G_n$ indicates that the state of neuron j is input to neuron i.[2] The network is
characterised by an $n \times n$ matrix of weights, $W = [w_{ij}]$, where $w_{ij}$ denotes the
(real) weight modulating the state of neuron j at the input of neuron i. If $u \in \mathbb{B}^n$
is the current state of the system, an update, $u_i \mapsto u_i'$, of the state of neuron i is
specified by the linear threshold rule

$$u_i' = \mathrm{sgn}\Bigl(\sum_{j:\,\{i,j\} \in G_n} w_{ij} u_j\Bigr).$$

Footnote 2: Equivalently, imagine a devil loose with a pair of scissors snipping those
interconnections for which $\{i,j\} \notin G_n$. For a complementary discussion of sparse
interconnectivity see Komlós and Paturi (1988).
The network dynamics describe trajectories in a state space comprised of the vertices 
of the n-cube.[3] We are interested in an associative memory application where we
wish to store a desired set of states--the memories--as fixed points of the network, 
and with the property that errors in an input representation of a memory are 
corrected and the memory retrieved. 
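The threshold dynamics described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, a severed link is represented by a zero weight, and the convention that a zero input sum maps to +1 is an assumption the text does not fix.

```python
def sgn(x):
    # Assumed convention: a zero input sum maps to +1.
    return 1 if x >= 0 else -1


def synchronous_update(weights, state):
    """Apply the threshold rule to every neuron at once.

    weights[i][j] plays the role of w_ij; a severed link is a zero entry.
    """
    n = len(state)
    return [sgn(sum(weights[i][j] * state[j] for j in range(n)))
            for i in range(n)]


def is_fixed_point(weights, state):
    """A state is a fixed point if one synchronous step leaves it unchanged."""
    return synchronous_update(weights, state) == list(state)


# Toy weights built from one memory u as w_ij = u_i * u_j (illustrative only).
u = [1, -1, 1, -1]
W = [[ui * uj for uj in u] for ui in u]
probe = [1, -1, 1, 1]  # u with its last component flipped

corrected = synchronous_update(W, probe)
print(corrected == u, is_fixed_point(W, u))
```

In this toy case the single stored state is a fixed point and the one-bit error is corrected in one synchronous step, which is the behaviour the associative memory application asks for.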
2.2 DOMINATORS 
Let $u \in \mathbb{B}^n$ be a memory and $0 \le \rho < 1$ a parameter. Corresponding to the memory
u we generate a probe $\hat{u} \in \mathbb{B}^n$ by independently specifying the components, $\hat{u}_j$, of
the probe as follows:

$$\hat{u}_j = \begin{cases} u_j & \text{with probability } 1 - \rho, \\ -u_j & \text{with probability } \rho. \end{cases} \qquad (1)$$

We call $\hat{u}$ a random probe with parameter $\rho$.
Definition 2.1 We say that a memory, u, dominates over a radius $\rho n$ if, with
probability approaching one as $n \to \infty$, the network corrects all errors in a random
probe with parameter $\rho$ in one synchronous step. We call $\rho$ the (fractional)
dominance radius. We also say that u is stable if it is a 0-dominator.
REMARKS: Note that stable memories are just fixed points of the network. Also,
the expected number of errors in a probe is $\rho n$.
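The random probe can be simulated directly. The sketch below is illustrative only: `random_probe`, `hamming_distance`, and the name `rho` for the probe parameter are hypothetical, and the seed is fixed purely for reproducibility.

```python
import random


def random_probe(memory, rho, rng):
    """Return a copy of memory with each component flipped with probability rho."""
    return [-x if rng.random() < rho else x for x in memory]


def hamming_distance(a, b):
    """Number of components in which two state vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
n, rho = 1000, 0.1
memory = [rng.choice((-1, 1)) for _ in range(n)]
probe = random_probe(memory, rho, rng)
print(hamming_distance(memory, probe), "errors; expected about", rho * n)
```

A memory dominates over the corresponding radius when, for large n, one synchronous step maps such a probe back to the memory with high probability.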
2.3 CODES 
For given integers $m \ge 1$, $n \ge 1$, a code, $\mathcal{K}_n^m$, is a collection of ordered multisets of
size m from $\mathbb{B}^n$. We say that an m-set of memories is admissible iff it is in $\mathcal{K}_n^m$.[4]
Thus, a code just specifies which m-sets are allowable as memories. Examples of
codes include: the set of all multisets of size m from $\mathbb{B}^n$; a single multiset of size
m from $\mathbb{B}^n$; all collections of m mutually orthogonal vectors in $\mathbb{B}^n$; all m-sets of
vectors in $\mathbb{B}^n$ in general position.
Define two ordered multisets of memories to be equivalent if they are permutations 
of one another. We define the size of a code, $\mathcal{K}_n^m$, to be the number of distinct
equivalence classes of m-sets of memories. We will be interested in codes of relatively
large size: $\log |\mathcal{K}_n^m| / n \to \infty$ as $n \to \infty$. In particular, we require at least
an exponential number of choices of (equivalence classes of) admissible m-sets of 
memories. 
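As a concrete instance of the size of a code, consider the first example listed above, the code containing every ordered multiset of size m. Its equivalence classes under permutation are the unordered multisets of size m drawn from the 2^n binary vectors, so its size is the multiset coefficient C(2^n + m - 1, m). The sketch below evaluates this count; the function name and the choice m = n/4 are illustrative assumptions.

```python
from math import comb, log


def complete_code_size(n, m):
    """Number of unordered multisets of size m drawn from the 2**n vectors."""
    return comb(2 ** n + m - 1, m)


# n = 2, m = 2: four multisets with a repeated vector plus six distinct pairs.
print(complete_code_size(2, 2))

for n in (8, 16, 32):
    m = n // 4  # an arbitrary illustrative choice of m
    print(n, m, log(complete_code_size(n, m)) / n)
```

The printed ratio grows with n in this example, in line with the large-size requirement on codes stated above.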
Footnote 3: As usual, there are Liapunov functions for the system under suitable
conditions on the interconnectivity graph and the corresponding weights.
Footnote 4: We define admissible m-sets of memories in terms of ordered multisets
rather than sets so as to obviate certain technical nuisances.
2.4 CAPACITY 
For each fixed n and interconnectivity graph, $G_n$, an algorithm, $X$, is a prescription
which, given an m-set of memories, produces a corresponding set of interconnection
weights, $w_{ij}$, $i \in [n]$, $\{i,j\} \in G_n$. For $m \ge 1$ let $\mathcal{A}(u^1, \ldots, u^m)$ be some attribute
of m-sets of memories. (The following, for instance, are examples of attributes of
admissible sets of memories: all the memories are stable in the network generated
by $X$; almost all the memories dominate over a radius $\rho n$.) For given n and m, we
choose a random m-set of memories, $u^1, \ldots, u^m$, from the uniform distribution on
$\mathcal{K}_n^m$.
Definition 2.2 Given interconnectivity graphs $G_n$, codes $\mathcal{K}_n^m$, and algorithm $X$,
a sequence, $\{C_n\}_{n=1}^{\infty}$, is a capacity function for the attribute $\mathcal{A}$ (or $\mathcal{A}$-capacity for
short) if for $\lambda > 0$ arbitrarily small:
a) $P\{\mathcal{A}(u^1, \ldots, u^m)\} \to 1$ as $n \to \infty$ whenever $m \le (1 - \lambda) C_n$;
b) $P\{\mathcal{A}(u^1, \ldots, u^m)\} \to 0$ as $n \to \infty$ whenever $m \ge (1 + \lambda) C_n$.
We also say that $C_n$ is a lower $\mathcal{A}$-capacity if property (a) holds, and that $C_n$ is an
upper $\mathcal{A}$-capacity if property (b) holds.
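For $m \ge 1$ let $u^1, \ldots, u^m \in \mathbb{B}^n$ be an m-set of memories.
The outer-product algorithm specifies the interconnection weights, $w_{ij}$, according
to the following rule: for $i, j \in [n]$,

$$w_{ij} = \begin{cases} \sum_{\beta=1}^{m} u_i^{\beta} u_j^{\beta} & \text{if } \{i,j\} \in G_n, \\ 0 & \text{otherwise.} \end{cases} \qquad (2)$$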
In general, if the interconnectivity graph, Gn, is symmetric then, under a suitable 
mode of operation, there is a Liapunov function for the network specified by the 
outer-product algorithm. Given graphs $G_n$, codes $\mathcal{K}_n^m$, and the outer-product
algorithm, for fixed $0 < \rho < 1/2$ we are interested in the attribute $\mathcal{D}_\rho$ that each of the
m memories dominates over a radius $\rho n$.
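A minimal sketch of the outer-product prescription restricted to an interconnectivity graph follows. It assumes the standard Hebbian form, in which each retained weight is the sum over memories of the product of the two components, with zero weight on absent edges. The graph is represented as a set of ordered index pairs, self-connections are kept whenever the pair (i, i) is an edge, and all names are hypothetical.

```python
def outer_product_weights(memories, edges):
    """w_ij = sum over memories of u_i * u_j on retained edges, else 0."""
    n = len(memories[0])
    return [[sum(u[i] * u[j] for u in memories) if (i, j) in edges else 0
             for j in range(n)]
            for i in range(n)]


def full_graph(n):
    """All n * n ordered pairs: the fully-interconnected case."""
    return {(i, j) for i in range(n) for j in range(n)}


memories = [[1, 1, 1, 1], [1, -1, 1, -1]]  # two orthogonal memories
W = outer_product_weights(memories, full_graph(4))
print(W)

# Removing an edge zeroes the corresponding weight.
sparse = full_graph(4) - {(0, 2)}
print(outer_product_weights(memories, sparse)[0])
```

With a symmetric edge set the resulting weight matrix is symmetric, which is the situation in which the Liapunov-function remark above applies.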
3 RANDOM GRAPHS 
We investigate the effect of a random loss of neural interconnections in a recurrent 
network of n neurons by considering a random bipartite interconnectivity graph
$RG_n$ on vertices $[n] \times [n]$ with

$$P\{\{i,j\} \in RG_n\} = p$$

for all $i \in [n]$, $j \in [n]$, and with these probabilities being mutually independent.
The interconnection probability p is called the sparsity parameter and may depend
on n. The system described above is formally equivalent to beginning with a fully-
interconnected network of neurons with specified interconnection weights $w_{ij}$, and
then invoking a devil which randomly severs interconnection links, independently
retaining each interconnection weight $w_{ij}$ with probability p, and severing it
(replacing it with a zero weight) with probability $q = 1 - p$.
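The devil's action is easy to simulate. In the sketch below the function name `sever_links`, the all-ones placeholder weights, and the seed are illustrative assumptions; only the retain-with-probability-p rule comes from the text.

```python
import random


def sever_links(weights, p, rng):
    """Keep each weight independently with probability p; otherwise set it to 0."""
    return [[w if rng.random() < p else 0 for w in row] for row in weights]


rng = random.Random(1)  # fixed seed, for reproducibility only
n, p = 200, 0.3
dense = [[1] * n for _ in range(n)]  # placeholder weights, all equal to 1
sparse = sever_links(dense, p, rng)
retained = sum(map(sum, sparse)) / (n * n)
print("fraction of links retained:", retained)
```

Because each entry is treated independently, the severed matrix is in general not symmetric even when the original weights are.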
Let /C denote the complete code of all choices of ordered multisets of size rn from 
IU n . 
Theorem 3.1 Let $0 \le \rho < 1/2$ be a fixed dominance radius, and let the sparsity
parameter p satisfy $pn^2 \to \infty$ as $n \to \infty$. Then $\frac{(1 - 2\rho)^2\, pn}{2 \log pn^2}$ is a $\mathcal{D}_\rho$-capacity
for random interconnectivity graphs $RG_n$, complete codes $C\mathcal{K}_n^m$, and the
outer-product algorithm.
REMARKS: The above result graphically validates Folk Theorem 1 on the fault-
tolerant nature of the network; specifically, the network exhibits a graceful
degradation in storage capacity as the loss in interconnections increases. Catastrophic
failure occurs only when p is smaller than $\log n / n$: each neuron need retain only of
the order of $\Omega(\log n)$ links of a total of n possible links with other neurons for useful
associative properties to emerge.
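The graceful degradation can be made concrete by evaluating the capacity expression numerically. The sketch below reads the capacity of Theorem 3.1 as (1 - 2 rho)^2 p n / (2 log(p n^2)), with `rho` the dominance radius and `p` the sparsity parameter; this reading of the typeset formula, and the function name, are assumptions.

```python
import math


def random_graph_capacity(n, p, rho=0.0):
    """Evaluate (1 - 2*rho)**2 * p*n / (2 * log(p * n**2)); needs p * n**2 > 1."""
    return (1 - 2 * rho) ** 2 * p * n / (2 * math.log(p * n * n))


n = 10_000
for p in (1.0, 0.5, 0.1, 0.01):
    print(p, round(random_graph_capacity(n, p), 1))
```

At p = 1 and rho = 0 the expression reduces to n / (4 log n), the fully-interconnected figure quoted in Corollary 4.3, and it falls off roughly in proportion to p as links are removed.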
4 BLOCK GRAPHS 
One of the simplest (and most regular) forms of sparsity that a favourably disposed 
devil might enjoin is block sparsity where the neurons are partitioned into disjoint 
subsets of neurons with full-interconnectivity within each subset and no neural 
interconnections between subsets. The weight matrix in this case takes on a block 
diagonal form, and the interconnectivity graph is composed of a set of disjoint, 
complete bipartite sub-graphs. 
More formally, let $1 \le b \le n$ be a positive integer, and let $\{I_1, \ldots, I_{n/b}\}$ partition
$[n]$ such that each subset of indices, $I_k$, $k = 1, \ldots, n/b$, has size $|I_k| = b$.[5] We call
each $I_k$ a block and b the block size. We specify the edges of the (bipartite) block
interconnectivity graph $BG_n$ by $\{i,j\} \in BG_n$ iff i and j lie in a common block.
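The block interconnectivity graph can be built as follows. This is a sketch under stated assumptions: indices are 0-based, blocks are consecutive runs of b indices, b is assumed to divide n, and the function names are hypothetical.

```python
def block_edges(n, b):
    """Pairs (i, j), 0-based, whose indices fall in the same block of size b."""
    if n % b != 0:
        raise ValueError("this sketch assumes b divides n")
    return {(i, j) for i in range(n) for j in range(n) if i // b == j // b}


def block_mask(n, b):
    """0/1 matrix with ones exactly on the block-diagonal positions."""
    edges = block_edges(n, b)
    return [[1 if (i, j) in edges else 0 for j in range(n)] for i in range(n)]


for row in block_mask(6, 2):
    print(row)
```

Multiplying a weight matrix entrywise by this mask gives the block diagonal form described above; each neuron retains exactly b inputs.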
Theorem 4.1 Let the block size b be such that $b = \Omega(n)$ as $n \to \infty$, and let
$0 \le \rho < 1/2$ be a fixed dominance radius. Then $\frac{(1 - 2\rho)^2\, b}{2 \log bn}$ is a $\mathcal{D}_\rho$-capacity
for block interconnectivity graphs $BG_n$, complete codes $C\mathcal{K}_n^m$, and the outer-product
algorithm.
Corollary 4.2 Under the conditions of Theorem 4.1 the fixed point memory capacity
is $b / 2 \log bn$.
Corollary 4.3 For a fully-interconnected graph, complete codes $C\mathcal{K}_n^m$, and the
outer-product algorithm, the fixed point memory capacity is $n / 4 \log n$.
Corollary 4.3 is the main result shown by McEliece, Posner, Rodemich, and 
Venkatesh (1987). Theorem 4.1 extends the result and shows (formally validat- 
ing the intuition espoused in Folk Theorem 2) that increased sparsity causes a loss 
in capacity if the code is complete, i.e., all choices of memories are considered ad- 
missible. It is possible, however, to design codes to take advantage of the sparse 
interconnectivity structure, rather at odds with the Folk Theorem. 
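The loss in capacity under complete codes can be tabulated numerically. The sketch below reads the capacity of Theorem 4.1 as (1 - 2 rho)^2 b / (2 log(b n)); this reading, the function name, and the sample values of n and b are assumptions made for illustration.

```python
import math


def block_capacity(n, b, rho=0.0):
    """Evaluate (1 - 2*rho)**2 * b / (2 * log(b * n))."""
    return (1 - 2 * rho) ** 2 * b / (2 * math.log(b * n))


n = 4096
print("full graph:", block_capacity(n, n), "vs n/(4 log n):", n / (4 * math.log(n)))
for b in (n // 2, n // 4, n // 8):
    print("b =", b, "capacity ~", round(block_capacity(n, b), 1))
```

Setting b = n recovers the fully-interconnected value n / (4 log n) of Corollary 4.3, and halving the block size roughly halves the capacity.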
Footnote 5: Here, as in the rest of the paper, we ignore details with regard to integer
rounding.
Without loss of generality let us assume that block $I_1$ consists of the first b indices,
$[b]$, block $I_2$ the next b indices, $[2b] - [b]$, and so on, with the last block $I_{n/b}$ consisting
of the last b indices, $[n] - [n - b]$. We can then partition any vector $u \in \mathbb{B}^n$ as

$$u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_{n/b} \end{pmatrix}, \qquad (3)$$

where for $k = 1, \ldots, n/b$, $u_k$ is the vector of components corresponding to block $I_k$.
. M,/b 
For M >_ 1 we form the block code I5, as follows: to each ordered multiset of 
,'' ' Mn/b 
M vectors, u I ., u M from IB", we associate a unique ordered multiset in 
by lexicographically ordering all M n/b vectors of the form 
u  
 , 0ll, et2,... , et,/b  [M]. 
un/b 
Thus, we obtain an admissible set of M n/b memories from any ordered multiset 
of M vectors in IB" by "mixing" the blocks of the vectors. We call each M-set of 
vectors, u , ..., u M 6 IB n, the generating vectors for the corresponding admissible 
K. M 
set of memories in --n  
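EXAMPLE: Consider a case with n = 4, block size b = 2, and M = 2 generating
vectors. To any 2-set of generating vectors there corresponds a unique $4\,(= M^{n/b})$-set
in the block code as follows, where $u_k^{\alpha}$ denotes block k of generating vector $u^{\alpha}$:

$$\left\{ \begin{pmatrix} u_1^1 \\ u_2^1 \end{pmatrix}, \begin{pmatrix} u_1^2 \\ u_2^2 \end{pmatrix} \right\} \mapsto \left\{ \begin{pmatrix} u_1^1 \\ u_2^1 \end{pmatrix}, \begin{pmatrix} u_1^1 \\ u_2^2 \end{pmatrix}, \begin{pmatrix} u_1^2 \\ u_2^1 \end{pmatrix}, \begin{pmatrix} u_1^2 \\ u_2^2 \end{pmatrix} \right\}.$$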
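The block-mixing construction can also be written out as a short program. The sketch below is illustrative: the function name is hypothetical, vectors are plain lists, blocks are consecutive runs of b components, and `itertools.product` is used because it enumerates the block choices in lexicographic order.

```python
from itertools import product


def block_code(generators, b):
    """Mix blocks of the generating vectors in lexicographic order of choices."""
    n = len(generators[0])
    blocks = n // b  # assumes b divides n
    memories = []
    for alphas in product(range(len(generators)), repeat=blocks):
        vector = []
        for k, a in enumerate(alphas):
            # Take block k from generating vector number a.
            vector.extend(generators[a][k * b:(k + 1) * b])
        memories.append(vector)
    return memories


# n = 4, b = 2, M = 2 as in the example; the component values are arbitrary.
generators = [[1, 1, 1, 1], [-1, -1, 1, -1]]
for memory in block_code(generators, 2):
    print(memory)
```

Two generating vectors with two blocks each yield four memories, the first and last of which are the generating vectors themselves.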
Theorem 4.4 Let $0 \le \rho < 1/2$ be a fixed dominance radius. Then we have the
following capacity estimates for block interconnectivity graphs $BG_n$, block codes
$B\mathcal{K}_n^{M^{n/b}}$, and the outer-product algorithm:
a) If the block size b satisfies $n \log\log bn / b \log bn \to 0$ as $n \to \infty$ then the
$\mathcal{D}_\rho$-capacity is

$$\left[ \frac{(1 - 2\rho)^2\, b}{2 \log bn} \right]^{n/b}.$$

b) Define for any $\gamma$ the quantity $C_n(\gamma)$ [display not recoverable from the source; its
surviving terms involve $\log\log bn$ and $\log\bigl(2 (1 - 2\rho)^{-2}\bigr)$]. If the block size b satisfies
$b / \log n \to \infty$ and $b \log bn / \log\log bn = O(n)$ as $n \to \infty$, then $C_n(\gamma)$ is a lower
$\mathcal{D}_\rho$-capacity for any choice of $\gamma < 3/2$ and $C_n(\gamma)$ is an upper $\mathcal{D}_\rho$-capacity for
any $\gamma > 3/2$.
Corollary 4.5 If, for fixed $t \ge 1$, we have $b = n/t$, then, under the conditions of
Theorem 4.4, the $\mathcal{D}_\rho$-capacity is

$$\left[ \frac{(1 - 2\rho)^2\, n}{4 t \log n} \right]^{t}.$$
Corollary 4.6 For any fixed dominance radius $0 \le \rho < 1/2$, and for any $\tau < 1$, a
constant $c > 0$ and a code of size $\Omega\bigl(2^{cn^{(\cdot)}}\bigr)$ [exponent of n illegible in the source] can be
found such that it is possible to achieve lower $\mathcal{D}_\rho$-capacities which are $\Omega\bigl(2^{n^{\tau}}\bigr)$ in
recurrent neural networks with interconnectivity graphs of degree $O(n^{1-\tau})$.
REM.RKS: If the number of blocks is kept fixed as n grows (i.e., the block size 
grows linearly with n) then capacities polynomial in n are attained. If the num- 
ber of blocks increases with n (i.e., the block size grows sub-linearly with n) then 
super-polynomial capacities are attained. Furthermore, we have the surprising re- 
sult rather at odds with Folk Theorem 2 that very large storage capacities can 
be obtained at the expense of code size (while still retaining large code sizes) in 
increasingly sparse networks. 
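The growth in capacity with increasing sparsity can be illustrated numerically. The sketch below rests on an assumption that should be read as such: each block is taken to support about (1 - 2 rho)^2 b / (2 log(b n)) generating vectors (the per-block figure of Theorem 4.1), and mixing blocks raises that count to the power n/b, the number of blocks. The function name and the sample values are hypothetical, and the logarithm of the capacity is reported to avoid overflow.

```python
import math


def log2_block_code_capacity(n, b, rho=0.0):
    """log2 of [(1 - 2*rho)**2 * b / (2 * log(b*n))] ** (n / b)."""
    per_block = (1 - 2 * rho) ** 2 * b / (2 * math.log(b * n))
    return (n / b) * math.log2(per_block)


n = 2 ** 20
for b in (n, n // 4, 2 ** 14, 2 ** 10):
    print("b =", b, " log2(capacity) ~", round(log2_block_code_capacity(n, b), 1))
```

Under this assumption the exponent grows as the blocks shrink: a fixed number of blocks gives a capacity polynomial in n, while a block count that grows with n gives a super-polynomial one.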
Acknowledgements
The support of research grants from E. I. Dupont de Nemours, Inc. and the Air 
Force Office of Scientific Research (grant number AFOSR 89-0523) is gratefully 
acknowledged. 
References 
Biswas, S. and S.S. Venkatesh (1990), "Codes, sparsity, and capacity in neural 
associative memory," submitted for publication. 
Komlós, J. and R. Paturi (1988), "Effects of connectivity in associative memory
models," Technical Report CS88-131, University of California, San Diego, 1988. 
McEliece, R. J., E. C. Posner, E. R. Rodemich, and S. S. Venkatesh (1987), "The 
capacity of the Hopfield associative memory," IEEE Trans. Inform. Theory, vol. IT- 
33, pp. 461-482. 
Venkatesh, S. S. (1990), "Robustness in neural computation: random graphs and
sparsity," to appear in IEEE Trans. Inform. Theory.
