Statistical Mechanics of Temporal Association 
in Neural Networks with Delayed Interactions 
Andreas V.M. Herz 
Division of Chemistry 
Caltech 139-74 
Pasadena, CA 91125 
Zhaoping Li 
School of Natural Sciences 
Institute for Advanced Study 
Princeton, NJ 08540 
J. Leo van Hemmen 
Physik-Department 
der TU Miinchen 
D-8046 Garching, FRG 
Abstract 
We study the representation of static patterns and temporal associa- 
tions in neural networks with a broad distribution of signal delays. 
For a certain class of such systems, a simple intuitive understanding 
of the spatio-temporal computation becomes possible with the help 
of a novel Lyapunov functional. It allows a quantitative study of 
the asymptotic network behavior through a statistical mechanical 
analysis. We present analytic calculations of both retrieval quality 
and storage capacity and compare them with simulation results. 
I INTRODUCTION 
Basic computational functions of associative neural structures may be analytically 
studied within the framework of attractor neural networks where static patterns are 
stored as stable fixed-points for the system's dynamics. If the interactions between 
single neurons are instantaneous and mediated by symmetric couplings, there is a 
Lyapunov function for the retrieval dynamics (Hopfield 1982). The global compu- 
tation corresponds in that case to a downhill motion in an energy landscape created 
by the stored information. Methods of equilibrium statistical mechanics may be ap- 
plied and permit a quantitative analysis of the asymptotic network behavior (Amit 
et al. 1985, 1987). The existence of a Lyapunov function is thus of great con- 
ceptual as well as technical importance. Nevertheless, one should be aware that 
environmental inputs to a neural net always provide information in both space and 
time. It is therefore desirable to extend the original Hopfield scheme and to explore 
possibilities for a joint representation of static patterns and temporal associations. 
176 
Statistical Mechanics of Temporal Association in Neural Networks 177 
Signal delays are omnipresent in the brain and play an important role in biolog- 
ical information processing. Their incorporation into theoretical models seems to 
be rather convincing, especially if one includes the distribution of the delay times 
involved. Kleinreid (1986) and Sompolinsky and Kanter (1986) proposed models 
for temporal associations, but they only used a single delay line between two neu- 
rons. Tank and Hopfield (1987) presented a feedforward architecture for sequence 
recognition based on multiple delays, but they just considered information relative 
to the very end of a given sequence. Besides these deficiences, both approaches lack 
the quality to acquire knowledge through a true learning mechanism: Synaptic ef- 
ficacies have to be calculated by hand which is certainly not satisfactory both from 
a neurobiological point of view and also for applications in artificial intelligence. 
This drawback has been overcome by a careful interpretation of the Hebb principle 
(1949) for neural networks with a broad distribution of transmission delays (Herz 
et al. 1988, 1989). After the system has been taught stationary patterns and 
temporal sequences -- by the same principle ! -- it reproduces them with high 
precission when triggered suitably. In the present contribution, we focus on a special 
class of such delay networks and introduce a Lyapunov (energy) functional for the 
deterministic retrieval dynamics (Li and Herz 1990). We thus generalize Hopfield's 
approach to the domain of temporal associations. Through an extension of the usual 
formalism of equilibrium statistical mechanics to time-dependent phenomena, we 
analyze the network performance under a stochastic (noisy) dynamics. We derive 
quantitative results on both the retrieval quality and storage capacity, and close 
with some remarks on possible generalizations of this approach. 
2 DYNAMICS OF THE NEURONS 
Throughout what follows, we describe a neural network as a collection of N two- 
state neurons with activities Si = i for a firing cell and Si = -1 for a quiescent 
one. The cells are connected by synapses with modifiable efficacies Jij(r). Here r 
denotes the delay for the information transport from j to i. We focus on a soliton- 
like propagation of neural signals, characteristic for the (axonal) transmission of 
action potentials, and consider a model where each pair of neurons is linked by 
several axons with delays 0 _< r < truax. Other architectures with only a single 
link have been considered elsewhere (Coolen and Gielen 1988; Herz et al. 1988, 
1989; Kerszberg and Zippelius 1990). External stimuli are fed into the system via 
receptors tri--4-1 with input sensitivity 7. The postsynaptic potentials are given by 
N Tm,x 
h,(t) = (1 - J,j()sj(t - +  (1) 
j=l r=0 
We concentrate on synchronous dynamics (Little 1974) with basic time step At- 
1. Consequently, signal delays take nonnegative integer values. Synaptic noise is 
described by a stochastic Glauber dynamics with noise level 13= T-  (Peretto 1984), 
1 
Prob[Si(t + 1)= 4-1] = 3{1 4- tanh[13hi(t)]}, (2) 
where Prob denotes probability. For 13cx>, we arrive at a deterministic dynamics, 
1, if hi(t) > 0 
Si(t + 1)=sgn[hi(t)] _= -1, if hi(t) < 0 (3) 
178 Herz, Li, and van Hemmen 
3 HEBBIAN LEARNING 
During a learning session the synaptic strengths may change according to the Hebb 
principle (1949). We focus on a connection with delay r between neurons i and j. 
According to Hebb, the corresponding efficacy Jij (r) will be increased if cell j takes 
part in firing cell i. In its physiological context, this rule was originaly formulated 
for excitatory synapses only, but for simplicity, we apply it to all synapses. 
Due to the delay r in (1) and the parallel dynamics (2), it takes r+ ltime steps until 
neuron j actually influences the siaie of neuron i. Jij(v) thus changes by an amount 
proportional to the product of $j(t-r) and $i(t+l). Starting with Jij(r)=O, we 
obtain after P learning sessions, labeled by It and each of duration Dr, 
p D. 
Jij(T) = E(T)N-1 Z Z 
+ - ,9 --  
(4) 
The parameters e(r), normalized by x-""* e(r) = 1 take morphological character- 
Z_r:O , 
istics of the delay lines into account; N-  is a scaling factor useful for the theoretical 
analysis. By (4), synapses act as microscopic feature detectors during the learning 
sessions and store correlations of the taught sequences in both space (i, j) and time 
(r). In general, they will be asymmetric in the sense that 
During learning, we set T = 0 and 7 = 1 to achieve a "clamped learning scenario" 
where the system evolves strictly according to the external stimuli, $i(t)= ai(t,-1). 
We study the case where all input sequences 0'i(i) are cyclic with equal periods 
D t, = D, i.e., a(t) = a(t q-D) for all It. In passing we note that one should offer 
the sequences already rmax time steps before allowing synaptic plasticity & la (4) so 
that both Si and S1 are in well defined states during the actual learning sessions. 
We define patterns 0 by 0 =_ ai(t = a) for 0 _< a < D and get 
P D--1 
 (5) 
/=1 a=0 
Our learning scheme is thus a generalization of outer-product rules to spatio- 
temporal patterns. As in the following, temporal arguments of the sequence pattern 
states  and the synaptic couplings should always be understood modulo D. 
4 LYAPUNOV FUNCTIONAL 
Using formulae (1)-(5), one may derive equations of motion for macroscopic order 
parameters (Herz et al. 1988, 1989) but this kind of analysis only applies to the case 
P << logN. However, note that from (4) and (5), we get Jii(r): Jii(D - (2 + r)). 
For all networks whose a priori weights e(r) obey e(r) = e(D - (2 + r)) we have 
thus found an "extended synaptic symmetry" (Li and Herz 1990), 
%) = h(o- (2 + 
(6) 
generalizing Hopfield's symmetry assumption Jij - Jji in a natural way to the 
temporal domain. To establish a Lyapunov functional for the noiseless retrieval 
Statistical Mechanics of Temporal Association in Neural Networks 179 
dynamics (3), we take 7=0 in (1) and define 
N D-1 
1 
H(t) --  , , Jij(r)Si(t - a)Sj(t -(a + r + 1)%D) , (7) 
i '= a =0 
where a%b = a mod b. The functional H depends on allstates between t+i-D and 
t so that solutions with constan! H, like D-periodic cycles, need not be static fixed 
points of the dynamics. By (1), (5) and (6), the difference /XH(t)-_-H(t)-H(t-1) 
is 
N 
AH(t) = -y[Si(t)-Si(t-D)]h,(t-1) 
i=1 
e(D- 1) t D-1 
2N 
/=1 a=0 i=1 
(8) 
The dynamics (3) implies that the first term is nonpositive. Since e(r) > 0, the same 
holds true for the second one. For finite N, H is bounded and AH has to vanish 
as t--*e. The system therefore settles into a state with Si(t)=Si(t-D) for all i. 
We have thus exposed two important facts: (a) the retrieval dynamics is governed 
by a Lyapunov functional, md (b) the system relaxes to a static state or a limit 
cycle with Si(t)=Si(t -D) --oscillatory solutions with the same period as that of 
the taught cycles or a period which is equal to an integer fraction of D. 
Stepping back for an overview, we notice that H is a Lyapunov functional for all 
networks which exhibit an "extended synaptic symmetry" (6) and for which the 
matrix J(D - 1) is positive semi-definite. The ttebbian synapses (4) constitute an 
important special case and will be the main subject of our further discussion. 
5 STATISTICAL MECHANICS 
We now prove that a limit cycle of the retrieval dynamics indeed resembles a stored 
sequence. ;Ve proceed in two steps. First, we demonstrate that our task concerning 
cyclic temporal associations can be mapped onto a symmetric network without 
delays. Second, we apply equilibrium statistical mechanics to study such %quivalen! 
systems" and derive analytic results for the retrieval quality and storage capacity. 
D-periodic oscillatory solutions of the retrieval dynamics can be interpreted as static 
states in a "D-plicated" system with D columns and N rows of cells with activities 
$ia. A network state will be written A = (Ao,A,... ,AD-1) with Aa = {Si; 1 _< 
i < N}. To reproduce the parallel dynamics of the original system, neurons 
with a: t%D are updated at time t. The time evolution of the new network 
therefore has a pseudo-sequential characteristic: synchronous within single columns 
and sequentially ordered with respect to these columns. Accordingly, the neural 
activities at time t are given by Sia(t) -= Si(a + nt) for a < t%D and Sia(t) 
Si(a + nt - D) for a > t%D, where nt is defined through t = nt +t%D. Due to (6), 
symmetric efficacies J= J)? may be contructed for the new system by 
Jj* = Jij((b - a - 1)%D) , (9) 
allowing a well-defined HamiltonJan, equal to that of a Hopfield net of size ND, 
N D-1 
1 
=-- . 
i,j=l a,b=O 
180 Herz, Li, and van Hemmen 
An evaluation of (10) in terms of the former state variables reveals that it is identical 
to the Lyapunov functional (7). The interpretation, however, is changed: a limit 
cycle of period D in the original network corresponds to a fized-point of the new 
system of size ND. We have thus shown that the time evolution of a delay network 
with extended symmetry can be understood in terms of a downhill motion in the 
energy landscape of its "equivalent system". 
For Hebbian couplings (5), the new efficacies Ji b take a particularly simple form if 
we define patterns {'ff; 1  i  N, 0  a  D-l) by iff = i,(a-)%D, i.e., if we create 
column-shifted copies of the prototype a . Setting ab = e((b- a - 1)%D) - a 
leads to 
P D--1 
.]b _. abN-1 E E ff  (11) 
p---1 
Storing one cycle ai(t) = 0 in the delay network thus corresponds to memorizing 
D shifted duplicates ff, 0 _( a < D, in the equivalent system, reflecting that a 
D-cycle can be retrieved in D different time-shifted versions in the original network. 
If, in the second step, we now switch to the stochastic dynamics (2), the important 
question arises whether H also determines the equilibrium distribution p of the 
system. This need not be true since the column-wise dynamics of the equivalent 
network differs from both the Little and Hopfield model. An elaborate proof (Li 
and Herz, 1990), however, shows that there is indeed an equilibrium distribution  
la Gibbs, 
p(A) = Z -1 exp[-/n(A)], (12) 
where Z = Tr^ exp[-/H(A)]. In passing we note that for D = 2 there are only 
links with zero delay. By (6) we have Jij(0) = hi(O), i.e., we are dealing with a 
symmetric Little model. We may introduce a reduced probability distribution /5 
for this special case, p(A1)=Tr^o p(AoAx), and obtain P(A1)=Z -x exp[-(A1)] 
with 
N N 
 __-- __-1  ln[2 cosh( -. JijSj)]. (13) 
i=1 j=l 
We thus have recovered both the effective Hamiltonian of the Little model as derived 
by Peretto (1984) and the duplicated-system technique of van Hemmen (1986). 
We finish our argument by turning to quantative results. We focus on the case 
where each of the P learning sessions corresponds to teaching a (different) cycle 
of D patterns a , each lasting for one time step. We work with unbiased random 
patterns where a  = 4-1 with equal probability, and study our network at a finite 
storage level a = limN-.oo(P/N) > 0. A detailed analysis of the case where the 
number of cycles remains bounded as N --. cx can be found in (Li and Herz 1990). 
As in the replica-symmetric theory of Amit et al. (1987), we assume that the network 
is in a state highly correlated with a finite number of stored cycles. The remaining, 
extensively many cycles are described as a noise term. We define "partial" overlaps 
by rna = N-  i  
ia $ia. These macroscopic order parameters measure how close 
the system is to a stored pattern u" at a specific column a. We consider retrieval 
Statistical Mechanics of Temporal Association in Neural Networks 181 
solutions, i.e., rn" 
1990) 
where 
= rat'6.,o, and arrive at the fixed-point equations (Li and Herz 
rat` = <<t`o tanh[/3{-, ravvo + Vr-z}])) , (14) 
r = q [1 - - 
and q = (( tanh2[/3{-. ravsev + x/"z}])). (15) 
Double angular brackets represent an average with respect to both the "condensed" 
cycles and the normalized Gaussian random variable z. The A() are eigenvalues 
of the matrix . Retrieval is possible when solutions with ra t' > 0 for a single cycle 
/ exist, and the storage capacity cc is reached when such solutions cease to exist. It 
should be noted that each cycle consists of D patterns so that the storage capacity 
for single patterns is c = Dc. During the recognition process, however, each of 
them will trigger the cycle it belongs to and cannot be retrieved as a static pattern. 
For systems with a "maximally uniform" distribution, ab = (D- 1)-(1- 5ab), we 
get 
D 2 3 4 5 oo 
a 0.100 0.110 0.116 0.120 0.138 
where the last result is identical to that for the corresponding Hop field model since 
the diagonal terms of  can be neglected in that case. The above findings agree 
well with estimates from a finite-size analysis (N _< 3000) of data from numerical 
simulations as shown by two examples. For D=3, we have found c=0.120+0.015, 
for D=4, c=0.125 + 0.015. Our results demonstrate that the storage capacity for 
temporal associations is comparable to that for static memories. As an example, 
take D = 2, i.e., the Little model. In the limit of large N, we see that 0.100.N 
two-cycles of the form 0  0 may be recalled as compared to 0.138.N static 
patterns (Fontanari and KSberle 1987); this leads to an 1.45-fold increase of the 
information content per synapse. 
The influence of the weight distribution on the network behavior may be demon- 
strated by some choices of e(r) for D = 4: 
r = 0 1 2 3 (c rnc 
e(r) = 1/3 1/3 1/3 0 0.116 0.96 
e(r) = 1/2 0 1/2 0 0.100 0.93 
e(r) = 0 1 0 0 0.050 0.93 
The storage capacity decreases with decreasing number of delay lines, but measured 
per synapse, it does increase. However, networks with only a few number of delays 
are less fault-tolerant as known from numerical simulations (Herz et al. 1989). For 
all studied architectures, retrieved sequences contain less than 3.5% errors. 
Our results prove that an extensive number of temporal associations can be stored 
as spatio-temporal attractors for the retrieval dynamics. They also indicate that 
dynamical systems with delayed interactions can be programmed in a very efficient 
manner to perform associative computations in the space-time domain. 
182 Herz, Li, and van Hemmen 
6 CONCLUSION 
Learning schemes can be successful only if the structure of the learning task is 
compatible with both the network architecture and the learning algorithm. In the 
present context, the task is to store simple temporal associations. It can be accom- 
plished in neural networks with a broad distribution of signal delays and Hebbian 
synapses which, during learning periods, operate as microscopic feature detectors 
for spatio-temporal correlations within the external stimuli. The retrieval dynamics 
utilizes the very same delays and synapses, and is therefore rather robust as shown 
by numerical simulations and a statistical mechanical analysis. 
Our approach may be generalized in various directions. For example, one can investi- 
gate more sophisticated learning rules or switch to continuous neurons in "iterated- 
map networks" (Marcus and Westervelt 1990). A generalization of the Lyapunov 
functional (7) covers that case as well (Herz, to be published) and allows a direct 
comparison of theoretical predictions with results from hardware implementations. 
Finally, one could try to develop a Lyapunov functional for a continuous-time dy- 
namics with delays which seems to be rather significant for applications as well as 
for the general theory of functional differential equations and dynamical systems. 
Acknowledgement s 
It is a pleasure to thank Bernhard Sulzer, John Hopfield, Reimer Kfihn and Wulf- 
ram Gerstner for many helpful discussions. AVMH acknowledges support from the 
Studienstiftung des Deutschen Volkes. ZL is partly supported by a grant from the 
Seavet Institute. 
References 
Amit D J, Gutfreund H and Sompolinsky H 1985 Phys. Rev. A 32 1007 
-- 1987 Ann. Phys. (N.Y.) 173 30 
Coolen A C C and Gielen C C A M 1988 Europhys. Lett. 7 281 
Fontanari J F and KSberle R 1987 Phys. Rev. A 36 2475 
Hebb D O 1949 The Organization of Behavior Wiley, New York 
van Hemmen J L 1986 Phys. Rev. A 34 3435 
Herz A V M, Sulzer B, Kiihn R and van Hemmen J L 1988 Europhys. Lelt. 7 663 
1989 Biol. Cybern. 60 457 
Hopfield J J 1982 Proc. Natl. Acad. Sci. USA 79 2554 
Kerszberg M and Zippelius A 1990 Phys. Scr. T33 54 
Kleinfeld D 1986 Proc. Natl. Acad. Sci. USA 83 9469 
Li Z and Herz A V M 1990 in Lecture Notes inPhysics 368 pp287 Springer,Heidelberg 
Little W A 1974 Math. Biosci. 19 101 
Marcus C M and Westervelt R M 1990 Phys. Rev. A 42 2410 
Peretto P 1984 Biol. Cybern. 50 51 
Sompolinsky H and Kanter I 1986 Phys. Rev. Lett. 57 2861 
Tank D W and Hopfield J J 1987 Proc. Natl. Acad. Sci. USA 84 1896 
