Absence of Cycles in Symmetric Neural 
Networks 
Xin Wang 
Computer Science Dept 
UCLA 
Los Angeles, CA 90024 
xwang@cs.ucla.edu 
Arun Jagota, Fernanda Botelho, Max (3arzon 
Dept of Mathematical Sciences 
University of Memphis 
Memphis, TN 38152 
jagota, botelhof, garzonm@hermes.msci.memst.edu 
Abstract 
For a given recurrent neural network, a discrete-time model may 
have asymptotic dynamics different from the one of a related 
continuous-time model. In this paper, we consider a discrete-time 
model that discretizes the continuous-time leaky integrator model 
and study its parallel and sequential dynamics for symmetric net- 
works. We provide sufficient (and necessary in many cases) con- 
ditions for the discretized model to have the same cycle-free dy- 
namics of the corresponding continuous-time model in symmetric 
networks. 
I INTRODUCTION 
For an n-neuron recurrent ntwork, a much-studied and widely-used continuous- 
time (CT) model is the leaky integrator model (Hertz, e! al., 1991; Hopfield, 1984), 
given by a system of nonlinear differential equations: 
ri dt - -xi + ai wiixi + Ii), t _ O, i = 1, ..., n, (1) 
j=l 
and a related discrete-time (DT) version is the sigmoidal model (Hopfield, 1982; 
Marcus &; Westervelt, 1989), specified by a system of nonlinear difference equations: 
xi(t + 1)=ai(Zwijxj(t)+ Ii), t - 0, 1,..., i= 1,...,n, (2) 
j----1 
where xi(t), taking values in a compact interval [a, b], represents the state of neuron 
i at time t, ri is the time constant, W = [Wij ] is the real-valued weight matrix, 
ai '  - [a, b] is the activation function which often takes a sigmoidal form and Ii 
Absence of Cycles in Symmetric Neural Networks 3 73 
is the constant external input to neuron i. When the network is symmetric (i.e., 
I/V is symmetric), the dynamics of both models have been well understood: the CT 
model (1) is always convergent, namely, every initial state will approach a fixed 
point asymptotically (Hirsch, 1989; Hertz, et al., 1991; Hopfield, 1984), and the 
DT model (2) is either convergent or approaches a periodic orbit of period 2 (i.e., 
a 2-cycle) (Goles, et al., 1985; Marcus &; Westervelt, 1989; Koiran, 1994). For 
results and analyses of fixed points and cycles in networks that are not necessarily 
symmetric, see (Brown, 1992; Bruck, 1990; Goles, 1986). 
For a given symmetric network (n, l/V, ai, Ii), the existence of possible 2-cycles in 
its discrete-time operation is sometimes trouble-some and undesirable, especially in 
associative memory and neural optimization applications where only fixed points are 
used to represent memory patterns (Hopfield, 1982) or to encode feasible solutions 
(Hertz, et al., 1991). Originally in (Hopfield, 1982) a type of sequential dynamics 
(in which only one randomly chosen neuron updates its state at any time) had 
to be employed in order to ensure the convergent dynamics of (2). A great deal 
of work on asymptotic behavior of (2) has focused on constraining the symmetric 
matrix I/V so that the model exhibits only convergent dynamics. It was shown in 
(Goles, et al., 1985) that, for ai equal to the -1/+1 signum function, if I/V is positive 
definite on the set {-1, 0, 1} n, then the model (2) is convergent only to fixed points. 
In (Marcus &; Westervelt, 1989), a similar condition on W and neuron gains was 
derived for networks with differentiable ai's (see also (Marcus, et al., 1990; Waugh 
&; Westervelt, 1993)). Nevertheless, the fact remains as that not all symmetric 
networks that are convergent in (1) show the same convergent dynamics in (2). 
Such implausibility of the DT model (2) in fully inheriting the dynamics of the CT 
model (1) leads to study of another DT model in this paper, which generalizes (2) 
with some new parameters. For symmetric networks, this model has the same types 
of parallel and sequential dynamics of (2). But, under some conditions on the new 
parameters (rather than on the weight matrix itself), this model has the same global 
convergent parallel dynamics and the same local stability around fixed points of (1). 
Moreover, with these new parameters as bifurcation parameters, the existence of 
possible 2-cycles can be understood in this model as resulting from the existence of 
possible period-doubling bifurcation when the parameters are varied. Finally, it is 
this model, rather than (2), that is used more often in practice as a discrete-time 
approximation of (1). Based on all of the above, the DT model studied here is a 
more appropriate discrete-time model of neural networks for purposes of theoretical 
investigation, numerical simulation and practical application. 
2 A DISCRETE-TIME MODEL 
The DT model that is studied in this paper is 
xi(t + 1) = (1 - ai)xi(t) + aiai( wijxj(t ) + Ii), 
j=l 
t=0,1,..., i= 1,...,n, (3) 
where ci's are newly introduced parameters, taking values in (0, 1]. This model is 
based on the Euler discretization of the CT model (1) with all ri = I 1, with xi(s) 
and (xi(s + 1)- xi(s))/ai approximating xi(t) and dxi(t)/dt in (1), respectively, 
at t: s.ci. It takes the model (2) as its special case of all ci = 1. The new 
neuron state xi(t + 1) is now a linear combination of the activation function value 
When (1) is globally convergent to fixed points, neglecting all r does not change its 
dynamics. 
3 74 X. WANG, A. JAGOTA, F. BOTELHO, M. GARZON 
ai(Y?= wlixj(t) + Ii) and the old state xi(t). Because of oq  (0, 11, the model (3) 
is well defined, in that the iterative maps resulting from the model, 
Fi(z) = (1 - ai)zi + aiai( wijz + Ii), (4) 
j=l 
preserve neuron states in the compact interval [a, b]. 
For the purposes of this paper, neuron activation functions al are sumed to satisfy 
the following constraints: 
(i) ai have continuous first-order derivatives a(U) for all U 6 R; 
(ii) al are monotone increing with a(U) > 0; 
(iii) a(U)  0  y  ; and 
(iv) a(U) take mimal values , which are usually referred to  neuron gains. 
Such functions are fairly general, including often-used [-1, 1]- and [0, 1]-sigmoids, 
such  tanh(iU), 2/tan-(iU/2), and 1/(1 + e-,). The constraints on ai's 
are sufficient for the functions defined by 
ei(xi) = ?l(y)dy. (S) 
to have the following properties that will be used subsequently in proofs of several 
propositions of this paper: 
(i) Gi(y) = a(y), and particularly G(xi(t)/ai + xi(t)) : 1 wijxi(t) + Ii; 
(ii) Gi(y)- Gi(z)  G'(y)(y- z)- 1/(2yi)(y- z)   G'(y)(y- z); 
(iii) Gi'(y)= 1/(Fl(y)); and 
(iv) Gi(yo + y) - Gi(yo) > (rain, 
-- i [ ))Yl  Yl/i, 
where Axi(t) = xi(t + 1)- xi(t). 
3 PARALLEL AND SEQUENTIAL DYNAMICS 
In parallel dynamics, also called synchronous dynamics, all neurons update their 
states in each time step. In sequential dynamics, a single neuron updates its state 
in each time step in such a way that each neuron updates its state infinitely-many 
times, over all time steps t. The most widely studied special case of sequential 
dynamics is called asynchronous dynamics (Hopfield, 1982), in which the neuron 
whose state is updated is chosen at random. This models asynchronous evolution 
of a neural network circuit composed of autonomous neurons. 
It is easy to see that the discretized DT model (3) shares the same set of fixed 
points with the CT model (1); that is, a point x* is a fixed point of (3) (i.e., 
x? = Fi(x*) with Fi given in (4)), if and only if it is a fixed point of (1) (i.e., 
+ i(Ej + = 0). 
However, as the result of discretization, fixed points may have different asymptotic 
stability (Wang & Blum, 1992) and periodic points that are not fixed points may 
occur (Blum & Wang, 1992; Marcus & Westervelt, 1989) in the DT model, especially 
when all ai = 1. Nevertheless, the discretized DT model retains the same type of the 
global parallel dynamics and sequential dynamics of (2), as stated in the following 
two propositions. These results extend the results for all cq = 1 in (Marcus & 
Westervelt, 1989) to cq  (0, 1]. 
Absence of Cycles in Symmetric Neural Networks 3 75 
Proposition I If W is symmetric, any trajectory in parallel dynamics of (3) tends 
to either a fized point or a -cycle. 
Proof. Consider the following function: 
E(t) = - _ wijxi(t)xj(t - 1) -  Ii[xi(t) + xi(t - 1)] 
i,j i 
+ (2 - ci)Gi(xi(t - 1)) +  ciGi(Axi(t - 1)/ai + xi(t - 1)). 
i 
When cq = 1, this function is the one used in (Goles, et al., 1985; Marcus & 
Westervelt, 1989). It can be shown (the details is omitted due to space limitation) 
that the one-step change of E(t), AE(t) = E(t+ 1)-E(t), is always less than or equal 
to 0 and AE(t) = 0 implies that the two-step change Av.z(t) = z(t + 1)- z(t- 1) is 
necessarily equal to zero. As E(t) is bounded from below, the network is therefore 
convergent to either a fixed point or a 2-cycle. [] 
Proposition 2 If I/V is symmetric with all wii  -(2 - i)/(ili), the DT model 
(3) has the sequential dynamics convergent to ficed points for any ci 6 (0, 1]. 
Proof. Consider the function used in (Hopfield, 1984; Marcus & Westervelt, 1989), 
1 
L(t) = -7 w,jx,(t)xj(t) - + (6) 
i,j i 
If at time t only neuron i is chosen to update its state and all the others remain 
unchanged, then L(t) is not increasing, and it is strictly decreasing when the one- 
step change in xi(t), Axi(t), is not 0. (The derivation is omitted due to space 
limitation.) Hence, any sequential trajectory tends to some fixed point. 13 
4 GLOBAL CONVERGENCE 
Call a model of a neural network cycle-free if it is globally convergent to fixed points 
only. The following proposition provides a condition that eliminates the possible 
"spurious" periodic dynamic behaviors of the discretized DT model (3). 
Proposition 3 If I/V is symmetric, a suJficient condition for (3) to be cycle-free in 
parallel dynamics is 
the matrix W + (2I - A)A-1M -1 is positive definite, (7) 
where A - diag(ci) and M - diag(tzi ) are the diagonal matrices formed by the 
parameters (i and the neuron gains tzi. 
Proof. Use the energy function L(t) used in the proof of Proposition 2. The one-step 
difference AL(t) of L(t) along any trajectory x(t) has an upper bound 
L(t) + (s) 
The condition (7) implies that the upper bound is negative and hence the parallel 
dynamics is globally convergent. [] 
In a simple case where all gains Bi - 1 (e.g., ai(z) - tanh(z)) and ci - a, this 
proposition says that the model is cycle-free if the matrix I/V-l-[(2-a)/a]I is positive 
definite. 
3 76 X. WANG, A. JAGOTA, F. BOTELHO, M. GARZON 
The sufficient condition (7) generalizes many existing conditions for the cycle-free 
dynamics in the literature. When cq - 1, it reduces to that matrix I/V -k M-1 is 
positive definite, which is the one presented in (Marcus k Westervelt, 1989) (with 
all P/- I in their model) for the DT model (2) to be cycle-free. Moreover, when 
/ -- oo the sigmoidal functions tend to the signum function. If in this case ci _> c 
for some fixed positive c, the condition (7) reduces to that the weight matrix W be 
positive definite, which is the one in (Goles, et al., 1985), except that in the latter 
case I/V need be positive definite only on the set {-1, 0, 1) n. 
When ai are sufficiently small, the matrix in (7) will be dominated by its positive 
diagonal entries and become positive definite. In fact, 
Corollary ! Let )train be the minimum eigenvalue of the symmetric matri W. If 
either (i) )train > 0 (i.e., W is positive definite itself) and i are arbitrary in (0, 1], 
or (ii) )train _ 0 and all oq 's satisfy (2 - ai)/(ilzi) > -)train, then the model (3) 
is cycle-free. 
Proof. Let I/V = pTAp be an orthogonal decomposition of l/V; that is, A is a 
diagonal matrix formed by the eigenvalues of I/V and P is some orthogonal matrix 
with its transpose pT _ p-1. The condition (7) is equivalent to that the diagonal 
matrix 
A + (2I - A)A- 1 M- 1 is positive definite. 
The later condition can be fulfilled by either condition (i) or (ii). The conclusion 
then follows from Proposition (3). [] 
This corollary implies that if the weight matrix I/V is formed according to the Hebb 
rule as constructed in (Hopfield, 1982), then the model is cycle-free. This is because 
I/V is an outer-product I/V - VV T - mI of a collection of some "memory" vectors 
V - Ivy,..., vra], and it is positive definite. 
5 LOCAL ASYMPTOTIC STABILITY 
When all ai = a, the condition (7) in Proposition 3 becomes 
2 
the matrix W + - a M-1 is positive definite. 
This is the one given in (Wang k Blum, 1992) that ensures consistency of the DT 
model (3) with the CT model (1) on local asymptotic dynamics around fixed points 
for symmetric networks. The consistency means that any fixed point has exactly 
the same asymptotic stability in both (3) and (1). If these two models are consistent 
in this regard, a fixed point is an attractor (saddle point or repellot, respectively) 
of (3) if and only if it is an attractor (saddle point or repellot) of (1). This answers 
the issue raised in (Marcus k Westervelt, 1989) on why a stable fixed point of (1) 
is also stable in (2), if a specific version of the condition (7) is met. For symmetric 
networks, the consistency condition on the local asymptotic dynamics between the 
CT and DT models turns out to be a consistency condition between them on the 
global convergent dynamics as well. It is certainly interesting to see if this type of 
relationship between the local and global consistencies can be extended to general 
(non-symmetric) networks. 
6 PERIOD-DOUBLING BIFURCATION 
In many cases, the condition (7) in Proposition 3 is also necessary for the network 
to be cycle-free. This can be addressed from a bifurcation point of view by treating 
Absence of Cycles in Symmetric Neural Networks 377 
the parameters ai as bifurcation parameters. Essentially, the condition (7) gives no 
room for existence of period-doubling bifurcation, which is the source of generating 
possible 2-cycles. 
Proposition 4 Let the activation functions ai be symmetric, i.e., ai '  -- [-a, a], 
and satisfy 
i(0) = 0, = Iti. 
Let the ecternal bias vector I = O. Then condition (7) is also a necessary condition 
for the network to be cycle-free. 
Proof. Define C = {(Oq,...,Otn) I W + (2I- A)A-1M - is positive definite). Let 
Ci denote the projection of the i th components of the n-tuples in C. Because 
 <   Ci implies cr 6 Ci, each Ci is either the entire interval (0, 1] or an open 
interval (0, Cio) for some 0 < Clo < 1. Notice that 0 is a fixed point of the network. 
The Jacobian of the iterative maps in (4) at the fixed point 0 is 
(- A) + AtW. (9) 
Notice that the condition (7) is equivalent to that the eigenvalues of (21 - A) + 
AMW are all positive, which is further equivalent to that the Jacobian (9) has all 
eigenvalues A >_ -1. 
If C = (0, 1]", the model has no cycles, according to Proposition 3, for any 
(Crl,...,a,) 6 (0,1]". However, if C/ = (O, cio) for some i and Cio < 1, some 
eigenvalue of the Jacobian (9) becomes less than -1 when ai exceeds the "thresh- 
old" Cio. During this course of changing Hi, the network undergoes generically a 
period-doubling bifurcation (Ruelle, 1989), resulting in emergence of some 2-cycles. 
Thus, in this case the condition (7) in Proposition 3 is also necessary to prevent this 
type of period-doubling bifurcation from happening around fixed points and hence 
to eliminate possibility of generating 2-cycles. D 
Examples of ai's satisfying hypotheses of Proposition 4 are tanhi :  -- I-l, l] 
with tanh,(z) = tanh(itiz). 
7 
EFFECT OF NEURON GAINS IN NEURAL 
COMPUTATIONS 
Considerable research has been conducted on using (1) in neural computations such 
as solving optimization problems approximately; see (Hertz, et al., 1991, Chapter 
4) for an overview. Often, the neuron gains Iti are also modified while the network 
is evolving. A popular algorithm of this kind uses mean field annealing (MFA) 
(Peterson g; Anderson, 1988) to solve optimization problems, in which small neuron 
gains are used initially, and increased gradually. Similar situations also happen in 
some learning algorithms. 
In practice, a discretized model such as (3) is used instead. Proposition 3 gives some 
criterion on how to choose the "discretization step-sizes" (i as functions of Iti. If 
efficiency, for example, were the paramount consideration, one might want to choose 
ai as large as possible while ensuring that the sufficient condition of Proposition 3 
is met. 
The effect of changing It on the largest sufficing a can be examined as follows. For 
simplicity, consider the case where all neuron gains Iti equal It and all ai equal to a. 
Let ci, i = 1, 2, be the respective supremums of ('s such that W+(2-()/((di)I are 
positive definite when the neuron gains It are equal to two different values d and 
3 78 X. WANG, A. JAGOTA, F. BOTELHO, M. GARZON 
d2. Then Cl and c2 satisfy (2-Cl)/(Cldl)- (2-c2)(c2d2). Letting/ = d2/dl, the 
above gives c2 = 2cl/(cl +/(2 - cl )). Clearly, c2 is proportional to the reciprocal of 
the ratio/. Thus, when/ is small, c can be taken larger than when/ is large. This 
may be used to evolve the network efficiently in the beginning and slow it down 
later, while ensuring that 2-cycles are never retrieved. 
References 
E.K. Blum k X. Wang. (1992) Stability of fixed points and periodic orbits and 
bifurcations in analog neural networks. Neural Networks, 5:577-587. 
D.P. Brown. (1992) Matrix tests for period 1 and 2 limit cycles in discrete threshold 
networks. IEEE Trans. on Systems, Man,  Cybernetics, 22:552-554. 
J. Bruck. (1990) On the convergence properties of the Hopfield model. Proc. IEEE, 
78:1579-1585. 
E. Goles, F. Fogelman-Soulie k D. Pellegrin. (1985) Decreasing energy functions as 
a tool for studying threshold networks. Discrete Applied Mathematics, 12:261-277. 
E. Goles. (1982) Fixed point behaviour of threshold functions on a finite set. SIAM 
J. of Algorithmic Discrete Methods, 3:529-531. 
E. Goles. (1986) Antisymmetrical neural networks. Discrete Applied Mathematics, 
13:97-100. 
J. Hertz, A. Krogh k R.G. Palmer. (1991) Introduction to the Theory of Neural 
Computation. Addison-Wesley. 
M.W. Hirsch. (1989) Convergent activation dynamics in continuous time networks. 
Neural Networks, 2:331-349. 
J.J. Hopfield. (1982) Neural networks and physical systems with collective compu- 
tational abilities. Proc. of the National Academy of Sciences, USA, 79:2554-2558. 
J.J. Hopfield. (1984) Neurons with graded response have collective computational 
properties like those of two-state neurons. Proc. of the National Academy of Sci- 
ences, USA, 81:3088-3092. 
P. Koiran. (1994) Dynamics of discrete time, continuous state Hopfield networks. 
Neural Computation, 6:459-468. 
C.M. Marcus k R.M. Westervelt. (1989) Dynamics of iterated-map neural net- 
works. Physical Review A, 40:501-504. 
C. Peterson k J.R. Anderson. (1988) Neural networks and NP-complete optimiza- 
tion problems; a performance study on the graph bisection problem. Complec 
Systems, 2:59-89. 
D. Ruelle. (1989) Elements of Differentiable dynamics and Bifurcation Theory. 
Academic Press, Inc. 
X. Wang k E.K. Blum. (1992) Discrete-time versus continuous-time neural net- 
works. J. of Computer and System Sciences, 49:1-17. 
F.R. Waugh k R.M. Westervelt. (1993) Analog neural networks with local com- 
petition. I. Dynamics and stability. Physical Review E, 47:4524-4536. 
PART IV 
ALGORITHMS AND ARCHITECTURES 
