History-dependent Attractor Neural 
Networks 
Isaac Meilijson Eytan Ruppin 
School of Mathematical Sciences 
Raymond and Beverly Sackler Faculty of Exact Sciences 
Tel-Aviv University, 69978 Tel-Aviv, Israel. 
Abstract 
We present a methodological framework enabling a detailed de- 
scription of the performance of Hopfield-like attractor neural net- 
works (ANN) in the first two iterations. Using the Bayesian ap- 
proach, we find that performance is improved when a history-based 
term is included in the neuron's dynamics. A further enhancement 
of the network's performance is achieved by judiciously choosing 
the censored neurons (those which become active in a given itera- 
tion) on the basis of the magnitude of their post-synaptic poten- 
tials. The contribution of biologically plausible, censored, history- 
dependent dynamics is especially marked in conditions of low firing 
activity and sparse connectivity, two important characteristics of 
the mammalian cortex. In such networks, the performance at- 
tained is higher than the performance of two 'independent' iter- 
ations, which represents an upper bound on the performance of 
history-independent networks. 
1 Introduction 
Associative Attractor Neural Network (ANN) models provide a theoretical back- 
ground for the understanding of human memory processes. Considerable effort has 
been devoted recently to narrow the gap between the original ANN Hopfield model 
(Hopfield 1982) and the realm of the structure and dynamics of the brain (e.g., 
Amit & Tsodyks 1991). In this paper, we contribute to the examination of the 
performance of ANNs under cortical-like architectures, where neurons are typically 
572 
History-dependent Attractor Neural Networks 573 
connected to only a fraction of their neighboring neurons, and have a low firing 
activity (Abeles et. al. 1990). We develop a general framework for examining var- 
ious signalling mechanisms (firing functions) and activation rules (the mechanism 
for deciding which neurons are active in some interval of time). 
The Hopfield model is based on memoryless dynamics, which identify the notion of 
'post-synaptic potential' with the input field received by a neuron from the neurons 
active in the current iteration. We follow a Bayesian approach under which the 
neuron's signalling and activation decisions are based on the current a-posteriori 
probabilities assigned to its two possible true memory states, +1. As we shall 
see, the a-posteriori belief in +1 is the sigmoidal function evaluated at a neuron's 
generalized field, a linear combination of present and past input fields. From a 
biological perspective, this history-dependent approach is strongly motivated by 
the observation that the time span of the different channel conductances in a given 
neuron is very broad (see Lytton 1991 for a review). While some channels are active 
for only microseconds, some slow-acting channels may remain open for seconds. 
Hence, a synaptic input currently impending on the neuron may influence both its 
current post-synaptic membrane potential, and its post-synaptic potential at some 
future time. 
2 The Model 
The neural network model presented is characterized as follows. There are m 'ran- 
dom memories' ', 1 _ Y _ m, and one 'true' memory m+x = . The (m + 1)N 
entries of these memories are independent and identically distributed, with equally 
likely values of +1 or -1. The initial state X has similarity P(Xi = i) = (1 + e)/2, 
P(Xi = -) = (1 - e)/2, independently of everything else. The weight of the 
synaptic connection between neurons i and j (i :fi j) is given by the simple Hebbian 
law 
Each neuron receives incoming synaptic connections from a random choice of K of 
the N neurons in the network in such a way that if a synapse exists, the synapse 
in the opposite direction exists with probability r, the refiexivity parameter. In the 
first iteration, a random sample of Lx neurons become active (i.e., 'fire'), thus on 
the average nx - LxK/N neurons update the state of each neuron. The field fi () 
of neuron i in the first iteration is 
N 
21 j=l 
(2) 
where Iij denotes the indicator function of the event 'neuron i receives a synaptic 
connection fi'om neuron j', and Ij  denotes the indicator function of the event 
'neuron j is active in the t'th iteration'. Under the Bayesian approach we adopt, 
neuron i assigns an a-priori probability hi  = P(i = +11Xi) = (1 + Xi)/2 to 
having +1 as the correct memory state and evaluates the corresponding a-posteriori 
probability Ai () - P(i = +11Xi, fi(x)), which turns out to be expressible a,s the 
574 Meilijson and Ruppin 
sigmoidal function 1/(1 + exp(-2x)) evaluated at some linear combination of Xi 
and fi(1). 
In the second iteration the belief ,ki (1) of a neuron determines the probability that 
the neuron is active. We illustrate two extreme modes for determining the active 
updating neurons, or activation: the random case where L2 active neurons are 
randomly chosen, independently of the strength of their fields, and the censored 
case, which consists of selecting the L2 neurons whose belief belongs to some set. 
The most appealing censoring rule from the biological point of view is tail-censoring, 
where the active neurons are those with the strongest beliefs. Performance, however, 
is improved under interval-censoring, where the active neurons are those with mid- 
range beliefs, and even further by combining tail and interval censoring into a hybrid 
rule. 
Let n2 -- L2K/N. The activation rule is given by a function C'[, 1] --, [0, 1]. 
Neuron j, with belief ,kj (1) in -51, becomes active with probability C(rnax(,j (1), 1 - 
,j (1))), independently of everything else. For example, the random case corresponds 
to C' -- _a and the tail-censored case corresponds to C(,k) - I or 0 depending on 
N 
whether rnax(,k, l - ,) exceeds some threshold. The output of an active neuron j 
is a signal fimction S(,Xj [1)) of its current belief. The field fi(2) of neuron i in the 
second iteration is 
N 
fi(2)_ 1 j , 
-- - Wij Iijlj(2)oC'(.'j(1)). (3) 
Neurou i now evaluates its a-posteriori belief 
,,i(2) ___ -P(i -- +11Xi, Ii(1),fi(1),fi(2)). As we shall see, /i  is, again, the 
sigmoidal function evaluated at some linear combination of the neuron's history 
Xi, Xili , fi(1) and fi . In contrast to the common history-independent Hopfield 
dynamics where the signal emitted by neuron j in the t'th iteration is a function 
of fj (t-l) only, Bayesian history-dependent dynamics involve signals and activation 
rules which depend on the neuron's generalized field, obtained by adaptively incor- 
porating fj(t-) to its previous generalized field. The final state Xi (2) of neuron i 
is taken as -1 or +1, depending on which of 1 - ,i (2) and /i  exceeds 1/2. 
For n/N, n2/N, m/N, KIN constant, and N large, we develop explicit expressions 
for the performance of the network, for any signal function (e.g., S1(,) = San(X - 
1/2) or S2(;) = 2;- 1) and activation rule. Performance is measured by the final 
overIapc"  -]iXi  (orequivalentlybythefinalsimilarity(l+e")/2). Various 
=N 
possible combinations of activation modes and signal functions described above are 
then examined under varying degrees of connectivity and neuronal activity. 
3 Single-iteration optimization: the Bayesian approach 
Consider the following well known basic fact in Bayesian Hypothesis Testing, 
Lemma 1 
Express the prior probability as 
1 
= 1)- 1 + 
(4) 
History-dependent Attractor Neural Networks 575 
and assume an observable Y which, given , is distributed according to 
1~ N(it, ") (5) 
for some constants it 6 (-cx>, cx>) and rr 2 6 (0, cx>). Then the posterior probability 
is 
1 
P( = IIY: Y) = I + e-(+(/)) ' (6) 
Applying this Lemma to Y = fi(1) with g = e and a 2 m : a, we see that 
where 7(e) 
/i(1) __ P(i = llX, fi(1)) __-- 1 
1 + e-2'('(')x'+s'(W) ' 
(7) 
=  log.l+' Hence, P( -- 1IX i fi(1)) > 1/2 if and only if fi (1) + 
1--e' ' 
cq?(e)Xi > 0. The single-iteration performance is then given by the similarity 
2 
l+e' (( ) 
"' 2 = P I() +"l*(e)x) > o1 = 
__---- Q(e, C1) 
(8) 
where (I) is the standard normal distribution function. The Hopfield dynamics, mod- 
ified by redefining W, as rn?(e) (in the Neural Network terminology) is equivalent 
(in the Bayesian jargon) to the obvious optimal policy, under which a neuron sets 
for itself the sign with posterior probability above 1/2 of being correct. 
4 Two-iterations optimization 
For mathematical convenience, we will relate signals and activation rules to nor- 
malized generalized fields rather than to beliefs. We let 
1 
h(:): X(x + e-) ' p(): c max x + -' x 
1+e_2c  (9) 
for c = e/x/-&-[. The signal function h is assumed to be odd, and the activation 
function p, even. 
In order to evaluate the belief Ai (2), we need the conditional distribution of fi () 
given Xi, ii(1) and fi(1), for i = -1 or i = +1. We adopt the working assumption 
that the pair of random variables (fi (1), fi ) has a bivariate normal distribution 
given i, Ii () and Xi, with i, Ii (1) and Xi affecting means but not variances or 
correlations. Under this working assumption, fi  is conditionally normal given 
(i, ii(1), Xi, fi(1)), with constant variance and a mean which we will identify. This 
working assumption allows us to model performance via the following well knmvn 
regression model. 
576 Meilijson and Ruppin 
Lemma 2 
If two random variables U and V with finite variances are such that E(VIU ) is a 
linear function of U and Var(VIU ) is constant, then 
Coy(U, V) 
E(VIU) = E(V) + Vat(U) (U- E(U)) (10) 
and 
Va,-(VlCr) = va,,(v)- (Coy(Or, v)) 
Va,'(v) (11) 
Letting U = fi(x) and V = fi(2), we obtain 
i(2) __ P(i - llXi,//(1), fi(1), fi(2)) = 
1 
(12) 
1 q- exp{-2  fi(1)/O 1 q- ?()X i q- r 
() -- bXiIi(1) - 
afi(x))]) 
1 q- exp{-2 [(e7() b(e*-ae) l"(1))Xi q- (: a(e*--ae) e*aefi(2)] } 
-- r' z r' ) fi(1) q_ 
which is the sigmoidal function evaluated at some generalized field. Expression (12) 
shows that the correct definition of a final state Xi(2), as the most likely value among 
+1 or -1, is 
and the performance is given by 
* -- a 
e a(e* - ae) fiO) q_ fi(2 ) 
Ol 1 T 2 T 2 
(la) 
v(xg (2) = il) - 
where the one-iteration performance function Q is defined by (8), and 
0'* -- '- 
n+rn 
(14) 
(15) 
We see that the performance is conveniently expressed as the single-iteration optimal 
performance, had this iteration involved n* rather than nl sampled neurons. This 
formula yields a numerical and analytical tool to assess the network's performance 
with different signal functions, activation rules and architectures. Due to space 
restrictions, the identification of the various parameters used in the above formulas 
is not presented. However, it can be shown that in the sparse limit arrived at 
by fixing a and a2 and letting both I(/rn and N/K go to infinity, it is always 
better to replace an iteration by two smaller ones. This suggests that Bayesian 
History-dependent Attractor Neural Networks 577 
updating dynamics should be essentially asynchronous. We also show that the 
1 ) is superior to the performance 
two-iterations performance Q e, 
Q (2Q(e, c) - 1, c2) of two independent optimal single iterations. 
5 Heuristics on activation and signalling 
t 
1 
-1 
-3 
-4 
Figure 1: A typical plot of R(x)= l(:g)/00(lg). Network parameters are N = 500, 
K = 500, nl = n2 = 50 and m = 10. 
By (14) and (15), performance is mostly determined by the magnitude of (e*-ae) 2. 
It can be shown that 
and 
(* -- a)l a : p(ig)h(gg)(l(gg)dig 
(16) 
= p(x)Oo(x)dx (17) 
where 51 and 50 are some specific linear combinations of Gaussian densities and 
their derivatives, and  = n2/K is the activity level. High performance is achieved 
by maximizing over p and possibly over h the absolute value of expression (16) 
keeping (17) fixed. In complete analogy to Hypothesis Testing in Statistics, where 
 takes the role of level of significance and (e* -ae)CIt the role of power, p(x) 
should be i or 0 (activate the neuron or don't) depending on whether the field 
value x is such that the likelihood ratio h(x)Cfl(X)/c)o(x) is above or below a given 
threshold, determined by (17). Omitting details, the ratio R(x) = C)l(X)/c)o(X) 
looks as in figure 1, and converges to -oo as x  oo. 
We see that there are three reasonable ways to make the ratio h(x)q51(x)/c)o(x) 
large: we can take a negative threshold such as tl in figure 1, activate all neurons 
with generalized field exceeding /a (tail-censoring) and signal h(x) = -Sgn(x), 
578 Meilijson and Ruppin 
or take a positive threshold such as t2 and activate all neurons with field value 
between ]1 and fi2 (interval-censoring) and signal h(x) = Sgn(x). Better still, we 
can consider the hybrid signalling-censoring rule: Activate all neurons with absolute 
field value between ]1 and fi2, or beyond fi3. The first group should signal their 
preferred sign, while those in the second group should signal the sign opposite to 
the one they so strongly believe in ! 
6 Numerical results 
Performance predicted experimental 
Random activation 0.955 0.951 
Tail censoring 0.972 0.973 
Interval/Hybrid censoring 0.975 0.972 
Hopfield - zero diagonal - 0.902,0.973 
Independent 7(e) diagonal 0.96 - 
Independent zero diagonal 0.913 - 
Table 1: Sparsely connected, low activity network: N = 1500, K = 50, r*l = r*2 -- 
20, rn = 5. 
;?':'..,1 .. . ,  .......-.....:..'. 
0.95 
0.90 
0.85 
 First Iteration 
....... Random Activation 
...... Tail-censoring 
* .... -,, Interval-censoring 
........ Hybrid censoring-signalling 
0.80 -- '  ' ' '  '  ' 
0.0 20000.0 40000.0 60000.0 80000.0 100000.0 
K 
Figure 2: Performance of a large-scale cortical-like 'columnar' ANN, at different 
values of connectivity K, for initial similarity 0.75. N -- 105 r*l -- n2 -- 200, 
, 
m - 50. The horizontal line denotes the performance of a single iteration. 
History-dependent Attractor Neural Networks 579 
Our theoretical performance predictions show good correspondence with simula- 
tion results, already at fairly small-scale networks. The superiority of history- 
dependent dynamics is apparent. Table 1 shows the performance achieved in a 
sparsely-connected network. The predicted similarity after two iterations is re- 
ported, starting from initial similarity 0.75, and compared with experimental results 
averaged over 100 trials. 
Figure 2 illustrates the theoretical two-iterations performance of large, low-activity 
'cortical-like' networks, as a function of connectivity. We see that interval-censoring 
can maintain high performance throughout the connectivity range. The perfor- 
mance of tail-censoring is very sensitive to connectivity, almost achieving the per- 
formance of interval censoring at a narrow low-connectivity range, and becoming 
optimal only at very high connectivity. The superior hybrid rule improves on the 
others only under high connectivity. As a cortical neuron should receive the con- 
comitant firing of about 200 - 300 neurons in order to be activated (Treves & Rolls 
1991), we have set n = 200. We find that the optimal connectivity per neuron, for 
biologically plausible tail-censoring activation, is of the same order of magnitude as 
actual cortical connectivity. The actual number nN/K of neurons firing in every 
iteration is about 5000, which is in close correspondence with the evidence suggest- 
ing that about 4% of the neurons in a module fire at any given moment (Abeles et. 
al. 1990). 
References 
[1] J.J. Hopfield. Neural networks and physical systems with emergent collective 
abilities. Proc. Nat. Acad. Sci. USA, 79:2554, 1982. 
[2] D. J. Amit and M. V. Tsodyks. Quantitative study of attractor neural net- 
work retrieving at low spike rates: I. substrate-spikes, rates and neuronal gain. 
Network, 2:259-273, 1991. 
[3] M. Abeles, E. Vaadia, and H. Bergman. Firing patterns of single units in the 
prefrontal cortex and neural network models. Network, 1:13-25, 1990. 
[4] W. Lytton. Simulations of cortical pyramidal neurons synchronized by inhibitory 
interneurons. J. Neurophysiol., 66(3):1059-1079, 1991. 
[5] A. Treves and E. T. Rolls. What determines the capacity of autoassociative 
memories in the brain? Network, 2:371-397, 1991. 
