Some results on convergent unlearning algorithm
Serguei A. Semenov & Irina B. Shuvalova 
Institute of Physics and Technology 
Prechistenka St. 13/7 
Moscow 119034, Russia 
Abstract 
In this paper we consider the probabilities of the different asymptotics of the
convergent unlearning algorithm for the Hopfield-type neural network
(Plakhov & Semenov, 1994), treating the case of unbiased random patterns.
We also show that failed unlearning results in total memory breakdown.
1 INTRODUCTION 
In the past years unsupervised learning schemes have aroused strong interest among
researchers, but for the time being little is known about the underlying learning
mechanisms, and still fewer rigorous results, such as convergence theorems, have been
obtained in this field. One promising concept along this line is the so-called
"unlearning" for Hopfield-type neural networks (Hopfield et al., 1983; van Hemmen &
Klemmer, 1992; Wimbauer et al., 1994). Elaborating on these elegant ideas, a convergent
unlearning algorithm has recently been proposed (Plakhov & Semenov, 1994) which
executes without presentation of the patterns. It is aimed at correcting the initial
Hebbian connectivity in order to provide extensive storage of arbitrarily correlated data.
This algorithm is stated as follows. Pick at iteration step $m$, $m = 0, 1, 2, \ldots$, a
random network state $S^{(m)} = (S_1^{(m)}, \ldots, S_N^{(m)})$, with the values $S_i^{(m)} = \pm 1$ having
equal probability $1/2$, calculate the local fields generated by $S^{(m)}$,

$$h_i^{(m)} = N^{-1/2} \sum_{j=1}^{N} J_{ij}^{(m)} S_j^{(m)}, \qquad i = 1, \ldots, N,$$

and then update the synaptic weights by

$$J_{ij}^{(m+1)} = J_{ij}^{(m)} - \varepsilon\, h_i^{(m)} h_j^{(m)}, \qquad i, j = 1, \ldots, N. \qquad (1)$$
Here $\varepsilon > 0$ stands for the unlearning strength parameter. We stress that self-
interactions, $J_{ii}$, are necessarily involved in the iteration process. The initial con-
dition for (1) is given by the Hebb matrix, $J_{ij}^{(0)} = J_{ij}^H$:

$$J_{ij}^H = N^{-1} \sum_{\mu=1}^{p} \xi_i^\mu \xi_j^\mu \qquad (2)$$

with arbitrary $(\pm 1)$-patterns $\{\xi^\mu\}$, $\mu = 1, \ldots, p$.
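As a concrete illustration (our sketch, not part of the original algorithm statement), iteration (1) with the Hebbian initial condition (2) takes only a few lines; the function names and parameter values below are arbitrary choices, and the normalization $h_i = N^{-1/2}\sum_j J_{ij} S_j$ is our assumption, chosen for consistency with the contraction identity (4):

```python
import numpy as np

def hebb_matrix(patterns):
    # Hebb matrix (2): J^H_ij = N^{-1} sum_mu xi_i^mu xi_j^mu
    # (self-interactions are kept, as the paper stresses).
    N = patterns.shape[1]
    return patterns.T @ patterns / N

def unlearn(J, eps, steps, rng):
    # Iteration (1): pick a random +-1 state, compute local fields,
    # and apply the rank-one unlearning correction.
    N = J.shape[0]
    for _ in range(steps):
        S = rng.choice([-1.0, 1.0], size=N)
        h = J @ S / np.sqrt(N)           # local fields generated by S
        J = J - eps * np.outer(h, h)     # unlearning step
    return J

rng = np.random.default_rng(0)
N, p = 64, 8
patterns = rng.choice([-1.0, 1.0], size=(p, N))
J = unlearn(hebb_matrix(patterns), eps=0.05, steps=30000, rng=rng)
Jhat = J / np.linalg.norm(J)             # normalized synaptic matrix
```

Here $\varepsilon = 0.05$ is below $p^{-1} = 0.125$, so by the Proposition of Section 2 the run falls into case (A): `Jhat` approaches $s^{-1/2}P$, the rescaled projector on the pattern subspace.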
For $\varepsilon < \varepsilon_c$ the (rescaled) synaptic matrix has been proven to converge with proba-
bility one to the projection matrix on the linear subspace spanned by a maximal subset
of linearly independent patterns (Plakhov & Semenov, 1994). As a sufficient
condition for that convergence to occur, the value of the unlearning strength $\varepsilon$ should
be less than $\varepsilon_c = \lambda_{\max}^{-1}$, where $\lambda_{\max}$ denotes the largest eigenvalue of the Hebb
matrix. Very often in real-world situations there are no means to know $\varepsilon_c$ in ad-
vance, and therefore it is of interest to explore the asymptotic behaviour of the iterated
synaptic matrix for arbitrary values of $\varepsilon$. As it turns out, there are only three possi-
ble limiting behaviours of the normalized synaptic matrix (Plakhov, 1995; Plakhov
& Semenov, 1995). The corresponding convergence theorems relate the
spectrum dynamics to the limiting behaviour of the normalized synaptic matrix $\hat J^{(m)} = J^{(m)}/\|J^{(m)}\|$,
$\|J\| = \big(\sum_{i,j} J_{ij}^2\big)^{1/2}$, which can be described in terms of $\lambda_{\min}^{(m)}$, the smallest
eigenvalue of $J^{(m)}$:
I. If $\lambda_{\min}^{(m)} = 0$ for every $m = 0, 1, 2, \ldots$, with the multiplicity of the zero eigenvalue being
fixed, then

$$\text{(A)} \qquad \lim_{m\to\infty} \hat J_{ij}^{(m)} = s^{-1/2} P_{ij},$$

where $P$ marks the projection matrix on the linear subspace $\mathcal{L} \subset \mathbf{R}^N$ spanned by
the nominated patterns set $\{\xi^\mu\}$, $\mu = 1, \ldots, p$, $s = \dim\mathcal{L} \le p$;

II. if $\lambda_{\min}^{(m)} = 0$, $m = 0, 1, 2, \ldots$, but at some (at least one) steps the multiplicity of the
zero eigenvalue increases, then

$$\text{(B)} \qquad \lim_{m\to\infty} \hat J_{ij}^{(m)} = s'^{-1/2} P'_{ij},$$

where $P'$ is the projector on some subspace $\mathcal{L}' \subset \mathcal{L}$, $s' = \dim\mathcal{L}' < s$;

III. if $\lambda_{\min}^{(m)} < 0$ starting from some value of $m$, then

$$\text{(C)} \qquad \lim_{m\to\infty} \hat J_{ij}^{(m)} = -\xi_i \xi_j \qquad (3)$$

with some (not a $\pm 1$) unit random vector $\xi = (\xi_1, \ldots, \xi_N)$.
These three cases exhaust all possible asymptotic behaviours of $\hat J_{ij}^{(m)}$, that is, their
total probability is unity: $P_A + P_B + P_C = 1$. The patterns set is supposed to be
fixed.

The convergence theorems say nothing about the relative probabilities of the specific
asymptotics as functions of the model parameters. In this paper we present some general
results elucidating this question and verify them by numerical simulation.

We show further that the limiting synaptic matrix for the case (C), which equals
$-\xi\xi^T$, cannot maintain any associative memory. A brief discussion of
the retrieval properties of the intermediate case (B) is also given.
2 PROBABILITIES OF POSSIBLE LIMITING BEHAVIOURS OF $\hat J^{(m)}$
The unlearning procedure under consideration is stochastic in nature. Which result
of the iteration process, (A), (B) or (C), will realize depends upon the value of $\varepsilon$, the size
and statistical properties of the patterns set $\{\xi^\mu,\ \mu = 1, \ldots, p\}$, and the realization of the
unlearning sequence $\{S^{(m)},\ m = 0, 1, 2, \ldots\}$.

For a fixed patterns set, the probability of appearance of each limiting behaviour of the
synaptic matrix is determined by the value of the unlearning strength $\varepsilon$ only. In this
section we consider these probabilities as functions of $\varepsilon$.
Generally speaking, the considered probabilities exhibit a strong dependence on the patterns
set, making it impossible to calculate them explicitly. It is possible, however, to obtain
some general knowledge concerning these probabilities, namely: $P_A(\varepsilon) \to 1$ as $\varepsilon \to 0^+$,
and hence $P_{B,C}(\varepsilon) \to 0$; on the other hand, $P_C(\varepsilon) \to 1$ as $\varepsilon \to \infty$, and $P_{A,B}(\varepsilon) \to 0$,
because of $P_A + P_B + P_C = 1$. This means that the risk of failed unlearning
rises as $\varepsilon$ increases. Specifically, we are able to prove the following:

Proposition. There exist positive $\varepsilon_1$ and $\varepsilon_2$ such that $P_A(\varepsilon) = 1$ for $0 < \varepsilon < \varepsilon_1$,
and $P_C(\varepsilon) = 1$ for $\varepsilon > \varepsilon_2$.
Before psing to the proof we bring forward an alternative formulation of the above 
stated clsification. After multiplying both sides of (1)by S?)S ) and summing 
up over all i and j, we obtain in the matrix notation 
S(m)T j(m+I)s (m) = AS(m) J()S () (4) 
where the contraction factor Am = i -eN-1S(m)J(m)S () controls the ymp- 
totics of j(m),  it is suggested by detailed analysis (Plakhop k Semenov, 1995). 
(Here and below superscript T designates the transpose.) The hypothesis of conver- 
gence theorems can be thus restated in terms of Am instead of X() respectively: 
I. A > 0 Vm; II. Am = 0 for l steps m,...,ml; III. Am < 0 at some step 
Proof. It is obvious that $\Lambda_m \ge 1 - \varepsilon\lambda_{\max}^{(m)}$, where $\lambda_{\max}^{(m)}$ marks the largest eigen-
value of $J^{(m)}$. From (4) it follows that the sequence $\{\lambda_{\max}^{(m)},\ m = 0, 1, 2, \ldots\}$ is
nonincreasing, and consequently $\Lambda_m \ge 1 - \varepsilon\lambda_{\max}^{(0)}$, with

$$\lambda_{\max}^{(0)} = \sup_{|x|=1} x^T J^H x = \sup_{|x|=1} N^{-1}\sum_{\mu=1}^{p}\Big(\sum_{i=1}^{N}\xi_i^\mu x_i\Big)^2
\le \sup_{|x|=1} N^{-1}\sum_{\mu=1}^{p}\Big(\sum_{i=1}^{N}(\xi_i^\mu)^2\Big)\Big(\sum_{i=1}^{N}x_i^2\Big) = p.$$

From this it is straightforward to see that, if $\varepsilon < p^{-1}$, then $\Lambda_m > 0$ for any $m$. By the
convergence theorem (Plakhov & Semenov, 1995), the iteration process (1) thus leads
to the limiting relation (A).
Let by definition $\gamma = \min_S N^{-1} S^T J^H S$, where the minimum is taken over those $(\pm 1)$-
vectors $S$ for which $J^H S \ne 0$ ($\gamma > 0$, in view of the positive semidefiniteness of $J^H$),
and put $\varepsilon > \gamma^{-1}$. Let us further denote by $n$ the iteration step such that $J^H S^{(m)} = 0$,
$m = 0, 1, \ldots, n-1$, and $J^H S^{(n)} \ne 0$; until that step the matrix remains unchanged, $J^{(n)} = J^H$.
Needless to say, this condition may be satisfied even at the initial step, $n = 0$: $J^H S^{(0)} \ne 0$. At step $n$ one has

$$\Lambda_n = 1 - \varepsilon N^{-1} S^{(n)T} J^H S^{(n)} \le 1 - \varepsilon\gamma < 0.$$
The latter implies loss of positive semidefiniteness of $J^{(n+1)}$, which results in asymp-
totics (C) (Plakhov, 1995; Plakhov & Semenov, 1995). By choosing $\varepsilon_1 = p^{-1}$ and
$\varepsilon_2 = \gamma^{-1}$ we come to the statement of the Proposition.
Comparison of numerical estimates of the considered probabilities with analytical ap-
proximations can be made for simple patterns statistics. In what follows the patterns
are assumed to be random and unbiased.

The dependence $P_A(\varepsilon)$ has been found in computer simulation with unbiased random
patterns. It is worth noting, in passing, that calculating $\Lambda_m$ from current simu-
lation data supplies a good control of the unlearning process, owing to the alternative
formulation of the convergence theorems. In the simulation we calculate $P_A(\varepsilon)$ averaged
over the sets of unbiased random patterns, as well as over the realizations of the un-
learning sequence. As $N$ increases, with $\alpha = p/N$ remaining fixed, the curves slope
steeply down, approaching the step function $P_A(\varepsilon) = \theta(\alpha^{-1} - \varepsilon)$ (Plakhov & Semenov,
1995). Without presenting a derivation or proof, we will advance the reasoning
suggestive of it. First, it can be checked that $\Lambda_m$ is a self-averaging quantity with
mean $1 - \varepsilon N^{-1}\operatorname{Tr} J^{(m)}$ and variance vanishing as $N$ goes to infinity. Initially one
has $N^{-1}\operatorname{Tr} J^H = \alpha$, and obviously the sequence $\{\operatorname{Tr} J^{(m)},\ m = 0, 1, 2, \ldots\}$ is nonin-
creasing. Therefore $\Lambda_0 \approx 1 - \varepsilon\alpha$, and all other $\Lambda_m$ are not less than $\Lambda_0$. If one
chooses $\varepsilon < \alpha^{-1}$, then all $\Lambda_m$ will be positive, and the case (A) will realize. On the
other hand, when $\varepsilon > \alpha^{-1}$, we have $\Lambda_0 < 0$, and the case (C) will take place.
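The self-averaging argument suggests classifying finite runs by the observed sign of $\Lambda_m$. The following sketch (our construction, not the paper's simulation protocol; all sizes and the run length are arbitrary) estimates $P_A$ and $P_C$ on the two sides of $\varepsilon = \alpha^{-1}$:

```python
import numpy as np

def classify_run(eps, N, p, steps, rng):
    # Run unlearning while monitoring the contraction factor Lambda_m:
    # case (C) is signalled as soon as some Lambda_m < 0; if Lambda_m
    # stays positive for the whole run we attribute the run to case (A).
    xi = rng.choice([-1.0, 1.0], size=(p, N))
    J = xi.T @ xi / N                        # Hebb matrix (2)
    for _ in range(steps):
        S = rng.choice([-1.0, 1.0], size=N)
        if 1.0 - eps * (S @ J @ S) / N < 0:  # Lambda_m went negative
            return "C"
        h = J @ S / np.sqrt(N)
        J -= eps * np.outer(h, h)            # update (1)
    return "A"

rng = np.random.default_rng(2)
N, p = 50, 10                                # alpha = p/N = 0.2
P_A = np.mean([classify_run(0.5, N, p, 2000, rng) == "A" for _ in range(20)])
P_C = np.mean([classify_run(8.0, N, p, 2000, rng) == "C" for _ in range(20)])
```

With $\varepsilon = 0.5$ well below $\alpha^{-1} = 5$ the runs stay in case (A), while $\varepsilon = 8 > \alpha^{-1}$ drives $\Lambda_0$ negative and the runs end in case (C).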
What is the probability for asymptotics (B) to appear? We will adduce an argument
(the detailed analysis (Plakhov & Semenov, 1995) is rather cumbersome and is omitted
here) indicating that this probability is quite small. First note that, for a given patterns
set, it is nonzero for isolated values of $\varepsilon$ only. Under the assumption that the patterns
are random and unbiased, we have calculated the probability of an $l$-fold appearance of
$\Lambda_m = 0$, summed over those isolated values of $\varepsilon$. Using a Gaussian approximation at
large $N$, we have found that this probability scales as a negative power of $N$ whose
exponent depends on $l$ and $m$. The total probability can then be obtained through
summing over the integer values $l$, $0 < l \le s$, and all the iteration steps $m = 0, 1, 2, \ldots$.
As a result, the main contribution to the total probability comes from the $m = 0$ term,
which is of the order $N^{-3/2}$.
3 LIMITING RETRIEVAL PROPERTIES 
How does the reduction of dimension of the "memory space" in the case (B), $s \to s' = s - l$,
affect the retrieval properties of the system? They may vary considerably depending on
$l$. In the most probable case $l = 1$ it is expected that there will be a slight decrease
in storage capacity, but the size of the attraction basins will change negligibly. This is
corroborated by calculating the stability parameter for each pattern $\mu$,

$$\gamma_i^\mu = \xi_i^\mu \sum_{j \ne i} J_{ij}\,\xi_j^\mu. \qquad (5)$$

Let $\tilde\xi^\mu = P'\xi^\mu/|P'\xi^\mu|$ be the normalized projection of the pattern on $\mathcal{L}'$, with
components $\tilde\xi_i^\mu \approx N^{-1/2}\xi_i^\mu$, $i = 1, \ldots, N$.
Then the stability parameter (5) for $J = P'$ is estimated by

$$\gamma_i^\mu = \xi_i^\mu\big((P'\xi^\mu)_i - P'_{ii}\,\xi_i^\mu\big) \approx 1 - P'_{ii} + O(N^{-1/2}).$$
Since $P'_{ii}$ has mean $\alpha$ and variance vanishing as $N \to \infty$, we thus conclude that the
stability parameter only slightly differs from that calculated for the projector rule
($s' = s$) (Kanter & Sompolinsky, 1987).
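For the pure projector rule this estimate is exact: with $J = P$ and $P\xi^\mu = \xi^\mu$, formula (5) gives $\gamma_i^\mu = 1 - P_{ii}$ identically, and $\operatorname{Tr}P/N = \alpha$. A small numerical check (our sketch; sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 100, 10
xi = rng.choice([-1.0, 1.0], size=(p, N))

# Projector on the span of the patterns (random patterns are linearly
# independent with probability one for p < N).
P = xi.T @ np.linalg.solve(xi @ xi.T, xi)

# Stability parameter (5) with the self-interaction term removed:
# gamma_i^mu = xi_i^mu * ((P xi^mu)_i - P_ii xi_i^mu) = 1 - P_ii.
gamma = xi * (xi @ P - np.diag(P) * xi)    # shape (p, N)
```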
On the other hand, in the situation $0 < s'/s \ll 1$ (the possible case $s' = 0$ is trivial)
the system will be capable of retrieving only a few nominated patterns, and which ones we
cannot specify beforehand. As mentioned above, this case realizes with very small
but finite probability.

The main effect of the self-interactions $J_{ii}$ lies in a substantial decrease in storage capacity
(Kanter & Sompolinsky, 1987). This is relevant when considering the cases (A)
and (B). In the case (C) the system possesses an interesting dynamics exhibiting a
permanent walk over the state space. There are no fixed points at all. To show this,
we write down the fixed point condition for an arbitrary state $S$: $S_i \sum_{j=1}^{N} J_{ij} S_j > 0$,
$i = 1, \ldots, N$. By using the explicit expression (3) for the limiting matrix $J_{ij}$ and
summing over $i$, we get as a result $\big(\sum_j \xi_j S_j\big)^2 < 0$, which is impossible.
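For a small network the absence of fixed points can be verified exhaustively (an illustrative sketch; $N$ and the draw of $\xi$ are arbitrary choices):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
N = 10
xi = rng.standard_normal(N)
xi /= np.linalg.norm(xi)             # generic unit vector, not +-1 valued
J = -np.outer(xi, xi)                # limiting matrix (3), case (C)

# Fixed point condition with self-interactions kept: S_i (J S)_i > 0
# for every i.  Summing over i gives -(xi . S)^2 > 0, so no +-1 state
# can satisfy it; here we simply check all 2^N states.
fixed = [S for S in product([-1.0, 1.0], repeat=N)
         if np.all(np.asarray(S) * (J @ np.asarray(S)) > 0)]
print(len(fixed))                    # -> 0
```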
If the self-interactions are excluded from the local fields at the stage of network dynamics,
it is then driven by the energy function of the form $H = -(2N)^{-1}\sum_{i \ne j} J_{ij} S_i S_j$.
(Zero-temperature sequential dynamics, either random or regular, is assumed.)
In the rest of this section we examine the dynamics of the network equipped with the limit-
ing synaptic matrix (3) of case (C). We will show that in this limit the system lacks any
associative memory. There is a single global maximum of $H$ given by $S_i = \operatorname{sgn}(\xi_i)$
and exponentially many shallow minima concentrated close to the hyperplane or-
thogonal to $\xi$. Moreover, it turns out that all the metastable states are unstable
against a single spin flip only, whatever the realization of the limiting vector $\xi$. Therefore,
after a spin flips, the system can relax into a new nearby energy minimum. Through
a sequence of steps, each consisting of a single spin flip followed by relaxation, one
can, in principle, pass from one metastable state to another.

We will prove in what follows that from any given metastable state $S^1$ one can pass to
any other one $S$ through a sequence of steps, each consisting of a single spin flip
and subsequent relaxation to some new metastable state. Note that this general
statement gives no indication concerning the order of spin flips when moving along
a particular trajectory in the state space.
We now turn to the proof. Let us enumerate the spins in increasing order of the
absolute values of the vector components, $0 \le |\xi_1| \le \ldots \le |\xi_N|$. The proof is carried out
by induction on $j = 1, \ldots, N$, where $j$ is the maximal index for which $S_j^1 \ne S_j$.
For $j = 1$ the statement is evident. Assuming that it holds for $1, \ldots, j-1$ ($2 \le j \le N$),
let us prove it for $j$. One has $j = \max\{i : S_i^1 \ne S_i\}$. After flipping spin
$j$ in the state $S^1$, we allow relaxation by flipping spins $1, \ldots, j-1$ only. The
system finally reaches a state $S^2$ realizing the conditional energy minimum under
fixed $S_j, \ldots, S_N$.
Let us show that $S^2$ is a true energy minimum. There are two possibilities:

(i) For some $i$, $1 \le i \le j-1$, one has $\operatorname{sgn}(\xi_i S_i^2) = \operatorname{sgn}(\xi^T S^2)$. The fixed point
condition for $S^2$ can then be written as

$$|\xi^T S^2| \le \min\{|\xi_i| : 1 \le i \le j-1,\ \operatorname{sgn}(\xi_i S_i^2) = \operatorname{sgn}(\xi^T S^2)\}.$$

From this, in view of the increasing order of the $|\xi_i|$'s, one gets immediately

$$|\xi^T S^2| \le \min\{|\xi_i| : 1 \le i \le N,\ \operatorname{sgn}(\xi_i S_i^2) = \operatorname{sgn}(\xi^T S^2)\},$$

which implies that $S^2$ is a true energy minimum.
(ii) sgn(iS)  sgn(rS 2) for all 1 < i < j- 1. 
If :rS2 = 0, the fixed point condition for $2 is automatically satisfied. Otherwise, 
for l_<i_<j-lonehas 
i $/ = -sgn( r $2)Iil, 
and 
j--1 N 
r$2 = _sgn(r$2)  Iil q- (6) 
i=1 i=j 
For the sake of definiteness, we set rS > 0. (The opposite case is treated analo- 
gously.) In this case rS2 > 0, since otherwise, according to (6), it should be 
j- N 
i=1 i=j 
what contradicts our setting. 
One thus obtains

$$\xi^T S^2 = -\sum_{i=1}^{j-1}|\xi_i| + \sum_{i=j}^{N}\xi_i S_i \le \xi^T S, \qquad (7)$$

and using the fixed point condition for $S$ one gets

$$\xi^T S \le \min\{|\xi_i| : \xi_i S_i > 0\} \le \min\{|\xi_i| : j \le i \le N,\ \xi_i S_i > 0\}
= \min\{|\xi_i| : j \le i \le N,\ \xi_i S_i^2 > 0\}. \qquad (8)$$

In the latter equality of (8) one uses that $\xi_i S_i^2 < 0$ for $1 \le i \le j-1$ and $S_i^2 = S_i$ for
$j \le i \le N$. Taking into account (7) and (8), we come to the condition for
$S^2$ to be a true energy minimum:

$$0 < \xi^T S^2 \le \min\{|\xi_i| : \xi_i S_i^2 > 0\}.$$

According to the inductive hypothesis, since $S_i^2 = S_i$, $j \le i \le N$, from the state $S^2$ one
can pass to $S$, and therefore from $S^1$ through $S^2$ to $S$. This proves the statement.
In general, metastable states may be grouped in clusters surrounded by high energy
barriers. The meaning of the proven statement resides in excluding the possibility of
even such a type of memory. Conversely, by allowing a sequence of single spin flips (for
instance, this can be done at finite temperatures), it is possible to walk through the
whole set of metastable states.
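The landscape described above is easy to inspect for a small system. With $J = -\xi\xi^T$ and self-interactions excluded, $H(S) = \big((\xi^T S)^2 - 1\big)/(2N)$ (using $\sum_i \xi_i^2 = 1$), so the metastable states are exactly the local minima of $(\xi^T S)^2$ under single spin flips. A sketch with arbitrary $N$ and $\xi$ (our illustration):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(5)
N = 12
xi = rng.standard_normal(N)
xi /= np.linalg.norm(xi)

def H(S):
    # Energy with self-interactions excluded, J = -xi xi^T.
    return ((xi @ S) ** 2 - 1.0) / (2 * N)

states = [np.array(S) for S in product([-1.0, 1.0], repeat=N)]

def is_metastable(S):
    # Local minimum of H under single spin flips.
    e0 = H(S)
    for i in range(N):
        T = S.copy()
        T[i] = -T[i]
        if H(T) < e0:
            return False
    return True

minima = [S for S in states if is_metastable(S)]
overlaps = [abs(xi @ S) for S in minima]
```

One finds the global maximum of $H$ at $S = \operatorname{sgn}(\xi)$, while the many minima all satisfy $|\xi^T S| \le \max_i |\xi_i|$, i.e. they crowd the hyperplane orthogonal to $\xi$.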
4 CONCLUSION 
In this paper we have begun a study of the probabilities of the different asymptotics of the
convergent unlearning algorithm, considering the case of unbiased random patterns.
We have also shown that failed unlearning results in total memory breakdown.
References 
Hopfield, J.J., Feinstein, D.I. & Palmer, R.G. (1983) "Unlearning" has a stabilizing 
effect in collective memories. Nature 304:158-159. 
van Hemmen, J.L. & Klemmer, N. (1992) Unlearning and its relevance to REM
sleep: Decorrelating correlated data. In J. G. Taylor et al. (eds.), Neural Network
Dynamics, pp. 30-43. London: Springer.
Wimbauer, U., Klemmer, N. & van Hemmen, J.L. (1994) Universality of unlearning.
Neural Networks 7:261-270.
Plakhov, A.Yu. & Semenov, S.A. (1994) Neural networks: iterative unlearning
algorithm converging to the projector rule matrix. J. Phys. I France 4:253-260.
Plakhov, A.Yu. (1995) Private communication.
Plakhov, A.Yu. & Semenov, S.A. (1995) Preprint, IPT.
Kanter, I. & Sompolinsky, H. (1987) Associative recall of memory without errors.
Phys. Rev. A 35:380-392.
