Bidirectional 
Retrieval from Associative 
Memory 
Friedrich T. Sommer and Glinther Palm 
Department of Neural Information Processing 
University of Ulm, 89069 Ulm, Germany 
{ sommer, palm} inf ormat ik. uni-ulm. de 
Abstract 
Similarity based fault tolerant retrieval in neural associative mem- 
ories (NAM) has not lead to wiedespread applications. A draw- 
back of the efficient Willshaw model for sparse patterns [Ste61, 
WBLH69], is that the high asymptotic information capacity is of 
little practical use because of high cross talk noise arising in the 
retrieval for finite sizes. Here a new bidirectional iterative retrieval 
method for the Willshaw model is presented, called crosswise bidi- 
rectional (CB) retrieval, providing enhanced performance. We dis- 
cuss its asymptotic capacity limit, analyze the first step, and com- 
pare it in experiments with the Willshaw model. Applying the very 
efficient CB memory model either in information retrieval systems 
or as a functional model for reciprocal cortico-cortical pathways 
requires more than robustness against random noise in the input: 
Our experiments show also the segmentation ability of CB-retrieval 
with addresses containing the superposition of pattens, provided 
even at high memory load. 
1 INTRODUCTION 
From a technical point of view neural associative memories (NAM) provide data 
storage and retrieval. Neural models naturally imply parallel implementation of 
storage and retrieval algorithms by the correspondence to synaptic modification 
and neural activation. With distributed coding of the data the recall in NAM 
models is fault tolerant: It is robust against noise or superposition in the addresses 
and against local damage in the synaptic weight matrix. As biological models NAM 
676 E T. Sommr and G. Palm 
have been proposed as general working schemes of networks of pyramidal cells in 
many places of the cortex. 
An important property of a NAM model is its information capacity, measuring 
how efficient the synaptic weights are used. In the early sixties Steinbuch realized 
under the name "Lernmatrix" a memory model with binary synapses which is now 
known as Willshaw model [Ste61, WBLH69]. The great variety of NAM models 
proposed since then, many triggered by Hopfield's work [Hop82], do not reach the 
high asymptotic information capacity of the Willshaw model. 
For finite network size, the Willshaw model does not optimally retrieve the 
stored information, since the inner product between matrix colurn and input 
pattern determines the activity for each output neuron independently. For au- 
toassociative pattern completion iterative retrieval can reduce cross talk noise 
[GM76, GR92, PS92, SSP96]. A simple bidirectional iteration - as in bidirectional 
associative memory (BAM) [Kos87] - can, however, not improve heteroassociative 
pattern mapping. For this task we propose CB-retrieval where each retrieval step 
forms the resulting activity pattern in an autoassociative process that uses the con- 
nectivity matrix twice before thresholding, thereby exploiting the stored information 
more efficiently. 
2 WILLSHAW MODEL AND CB EXTENSION 
Here pattern mapping tasks x v -+ yV are considered for a set of memory patterns: 
((x,y )  x  E (0,1)n,y  E (0,1)m, = 1,...,M). The number of 1-components 
in a pattern is called pattern activity. The Willshaw model works efficiently, if 
the memories are sparse, i.e., if the memory patterns have the same activities: 
v = = =  a 
--' Ei=I Xi 0, [y[ b with << n and b << m. During 
learning the set of memory patterns is transformed to the weight matrix by 
Uij = min(1, x i yj ) -- sup X i yj. 
For a given initial pattern 5 the retrieval yields the output pattern  by forming 
in each neuron the dendritic sum [U2]j = Y'i Cj2 and by calculating the activity 
value by threshold comparison 
^u = H([C'u]j - 0) Vj, (1) 
Yj 
with the global threshold value 0 and H(x) denoting the Heaviside function. 
For finite sizes and with high memory load, i.e., 0 << P1 := Prob [Uij = 1] (< 0.5), 
the Willshaw model provides no tolerance with respect to errors in the address, see 
Fig. 1 and 2. A bidirectional iteration of standard simple retrieval (1), as proposed in 
BAM models [Kos87], can therefore be ruled out for further retrieval error reduction 
[SP97]. In the energy function of the Willshaw BAM 
E(x, y) - -  Cijxiyj + O' -. xi + !3  yj 
ij i j 
we now indroduce a factor accounting for the magnitudes of dendritic potentials at 
activated neurons 
E(x,y) = -  Cijxiyj a + b +  xi + !3 y'yj. (2) 
ij i j 
Bidirectional Retrieval from Associative Memory 677 
Differentiating the energy function (2) yields the gradient descent equations 
As new terms in (3) and (4) sums over pattern components weighted with the 
quantities w and w occur. w is the overlap between the matrix columns j and 
k conditioned by the pattern x, which we call a conditioned link between y-units. 
Restriction on the conditioned link terms yields a new iterative retrieval scheme 
which we denote as crosswise bidirectional (CB) retrieval 
y(r+l)j = H( E Cij[CTy(r-1)]i-O) (5) 
x(r+l)i = H( E Cij[Cx(r-1)]j-O') (6) 
jey(r) 
For r = 0 pattern y(r.-1) has to be replaced by H([Cx(O)] - 0), for r > 2 Boolean 
ANDing with results from timestep r - i can be applied which has been shown to 
improve iterative retrieval in the Willshaw model for autoassociation [SSP96]. 
3 MODEL EVALUATION 
Two possible retrieval error types can be distinguished: a "miss" error converts a 
1-entry in yU to '0' and a "add" error does the opposite. 
35 
30 
25 
20 
35 
30 
25 
20 
10 
5 
10 20 30 40 
simple r. ad error ..... ' ' ' / 
C B-.maidsd error  
_ CB-r error--11,i ' .,..,.,,,,..,,_ 
 I ....,..,,' ......... - ............... f ....... . ...... i 
10 15 20 25 30 
Figure 1: Mean retrieval error rates for n = 2000, M - 15000, a -- b = 10 
corresponding to a memory load of P1 - 0.3. The x-axes display the address 
activity: ]bu] - 10 corresponds to a errorfree learning pattern, lower activities are 
due to miss errors, higher activities due to add errors. Left: Theory - Add errors 
for simple retrieval, eq. (7) (upper curve) and lower bound for the first step of 
CB-retrieval, eq. (9). Right: Simulations - Errors for simple and CB retrieval. 
The analysis_ of simple retrieval from the address 5yields with optimal threshold 
setting 0 = k the add error rate, i.e, the expectation of spurious ones: 
& = (m-b)Prob [r >_ ], (7) 
678 E T. Sornmer and G. Palm 
with the binomial random variable Prob[r=l] = where B(n,p)t := 
()pt(1 _ p)n-t.  denotes the add error rate and k =  the number of 
correct 1-s in the address. 
For the first step of CB-retrieval a lower bound of the add error rate a(1) can be 
derived by the analysis of CB-retrieval with fixed address x(0): 2 and the perfect 
learning pattern yt, as starting patterns in the y-layer. In this case the add error 
rate is: 
a = (m - b)Prob [rl + r2 _> }b], (8) 
where the random variables rl and r2 have the distributions: 
Prob[rx = l/b] = B(,Px)t and Probit2 = 1] = B(b,(px)2)t. Thus, 
a(1) _> (m -b)E B(,P1)sB$ [b, (Px)2, ( - s)b], 
s----0 
(9) 
where B$[n,p,t] := Y'Lt B(n,p)t is the binomial sum. 
In Fig. 1 the analytic results for the first step (7) and (9) can be compared with 
simulations (left versus right_diagram). In the experiments simple retrieval is per- 
formed with threshold 0 -- k. CB-retrieval is iterated in the y-layer (with fixed 
address ) starting with three randomly chosen 1-s from the simple retrieval result 
. The iteration is stopped, if a stable pattern at threshold 0 = b is reached. 
The memory capacity can be calculated per pattern component under the assump- 
tion that in the memory patterns each component is independent, i.e., the proba- 
bilities for a 1 are p = a/n or q - b/m respectively, and the probabilities of an add 
and a miss error are simply the renormalized rates denoted by , ] and &,/ for 
x-patterns and by "/, 5 for y-patterns. The information about the stored pattern 
contained in noisy initial or retrieved patterns is then given by the transinforma- 
tion t(p, c , fi) :- i(p) - i(p, c , ]), where i(p) is the Shannon information, and 
i(p, (, ]) the conditional information. The heteroassociative mapping is evaluated 
by the output capacity: A(,/ ) :- Mrn t(q,"/,5)/mn (in units bit/synapse). It 
depends on the initial noise since the performance drops with growing initial errors 
and assumes the maximum, if no fault tolerance is provided, that is, with noiseless 
initial patterns, see Fig. 2. Autoassociative completion of a distorted x-pattern is 
evaluated by the completion capacity: C(& t,/)') := Mn(t(p, a',/)- t(p, &',/'))/rnn. 
A BAM maps and completes at the same time and should be therefore evaluated 
by the search capacity $ := C + A. 
The asymptotic capacity of the Willshaw model is strikingly high: The completion 
capacity (for autoassociation) is C + = ln[2]/4, the mapping capacity (for heteroas- 
sociation with input noise) is A + = ln[2]/2 bit/syn [Pa191], leading to a value for 
the search capacity of (3 ln[2])/4 = 0.52 bit/syn. To estimate $ for general retrieval 
procedures one can consider a recognition process of stored patterns in the whole 
space of sparse initial patterns; an initial pattern is "recognized", if it is invari- 
ant under a bidirectional retrieval cycle. The so-called recognition capacity of this 
process is an upper bound of the completion capacity and it had been determined 
as ln[2]/2, see [PS92]. This is achieved again with parameters M,p,q providing 
A = ln[2]/2 yielding ln[2] bit/syn as upper bound of the asymptotic search capac- 
ity. In summary, we know about the asymptotic search capacity of the CB-model: 
0.52 _< S + _< 0.69 bit/syn. For experimental results, see Fig. 4. 
Bidirectional Retrieval from Associative Memory 679 
4 EXPERIMENTAL RESULTS 
The CB model has been tested in simulations and compared with the Willshaw 
model (simple retrieval) for addresses with random noise (Fig. 2) and for addresses 
composed by two learning patterns (Fig. 3). In Fig. 2 the widely enlarged range of 
high qualtity retrieval in the CB-model' is demonstrated for different system sizes. 
CB-r. -...,'" 
5 10 15 20 25 30 
6 
5 
4 
3 
2 
1 
0 
50 
45 
40 
35 
30 
25 
20 
15 
10 
5 
0 
. simpler. - .... 
',, CB-r. -- 
5 10 15 20 25 30 
simple r. - .... ""- .... 
 CB-r. -- , 
I I I I I 
5 10 15 20 25 30 
1 
output miss errors 
5 
15 20 25 30 
output add errors 
3 
2 
1 
101214161820 
5 10 15 20 25 30 
nsinformation in output pattern (bit) 
, , , , , 100 , , , , 
"' 60 
I I I I I 
5 10 15 20 25 30 
Fig. 2: Retrieval from addresses with random 
noise. The x-axis labeling is as in Fig. 1. Small 
system with n = 100, M = 35 (left), system size 
as in Fig. 1, two trials (right). Output activities 
adjusted near lYl =  by threshold setting. 
2 
101214161820 
101214161820 
1 
101214161820 
40 ', 
0 I !  ! I I I I 
101214161820 101214161820 
Fig. 3: Retrieval from addresses 
composed by two learning pat- 
terns. Parameters as in right col- 
umn of Fig. 2, explanation of left 
and right column, see text. 
In Fig. 3 the address contains one learning pattern and 1-components of a second 
learning pattern successively added with increasing abscissae. On the right end 
of each diagram both patterns are completely superimposed. Diagrams in the left 
column show errors and transinformation, if retrieval results are compared with 
the learning pattern which is for I'1  20 dominantly addressed. Simple retrieval 
errors behave similiar as for random noise in the address (Fig. 2) while the error 
level of CB-retrieval raises faster if more than 7 adds from the second pattern are 
present. Diagrams in the right column show the same quantities, if the retrieval 
result is compared with the closest of the two learning patterns. It can be observed 
i) that a learning pattern is retrieved even if the address is a complete superposi- 
tion and ii) if the second pattern is almost complete in the address the retrieved 
pattern corresponds in some cases to the second pattern. However, in all cases CB- 
retrieval yields one of the learning pattern pairs and it could be used to generate 
a good address for further retrieval of the other by deletion of the corresponding 
1-components in the original address. 
680 F. T. $ommer and G. Palm 
0.48 
0.46 
0.44 
0.42 
0.4 
0.38 
output c. - .... 
search c.  
8 10 12 14 16 18 
Fig. 4: Output and search capacity of CB retrieval in 
bit/syn with x-axis labeling as in Fig. 2 for n = m = 2000, 
a = b = 10 2/= 20000. The difference between both curves 
is the contribution due to x-pattern completion, the com- 
pletion capacity C'. It is zero for Ix(0)l = 10, if the initial 
pattern is errorfree. 
The search capacity of the CB model in Fig. 4 is close to the theoretical expectations 
from Sect. 3, increasing with input noise due to the address completion. 
5 SPARSE CODING 
To apply the proposed NAM model, for instance, in information retrieval, a coding 
of the data to be accessed into sparse binary patterns is required. A useful extraction 
of sparse features should take account of statistical data properties and the way the 
user is acting on them. There is evidence from cognitive psychology that such a 
coding is typically quite easy to find. The feature encoding, where a person is 
extracting feature sets to characterize complex situations by a few present features, 
is one of the three basic classes of cognitive processes defined by Sternberg [Ste77]. 
Similarities in the data are represented by feature patterns having a large number 
of present features in common, that is a high overlap: o(x, x ) :- Yi xi xi' For text 
retrieval word fragments used in existing indexing techniques can be directly taken 
as sparse binary features [Geb87]. For image processing sparse coding strategies 
[Zet90], and neural models for sparse feature extraction by anti-Hebbian learning 
[F5190] have been proposed. Sparse patterns extracted from different data channels 
in heterogeneous data can simply be concatenated and processed simultaneously in 
NAM. If parts of the original data should be held in a conventional memory, also 
these addresses have to be represented by distributed and sparse patterns in order 
to exploit the high performance of the proposed NAM. 
6 CONCLUSION 
A new bidirectional retrieval method (CB-retrieval) has been presented for the Will- 
shaw neural associative memory model. Our analysis of the first CB-retrieval step 
indicates a high potential for error reduction and increased input fault tolerance. 
The asymptotic capacity for bidirectional retrieval in the binary Willshaw matrix 
has been determined between 0.52 and 0.69 bit/syn. In experiments CB-retrieval 
showed significantly increased input fault tolerance with respect to the standard 
model leading to a practical information capacity in the order of the theoretical 
expectations (0.5 bit/syn). Also the segmentation ability of CB-retrieval with am- 
biguous addresses has been shown. Even at high memory load such input pat- 
terns can be decomposed and corresponding memory entries returned individually. 
The model improvement does not require sophisticated individual threshold setting 
[GW95], strategies proposed for BAM like more complex learning procedures, or 
"dummy augmentation" in the pattern coding [WCM90, LCL95]. 
The demonstrated performance of the CB-model encourages applications as mas- 
sively parallel search strategies in Information Retrieval. The sparse coding re- 
quirement has been briefly discussed regarding technical strategies and psycholog- 
ical plausibility. Biologically plausible variants of CB-retrieval contribute to more 
Bidirectional Retrieval from Associative Memory 681 
refined cell assembly theories, see [SWP98]. 
Acknowledgement: One of the authors (F.T.S.) was supported by grant SO352/3-1 
of the Deutsche Forschungsgemeinschaft. 
References 
Ira90] 
[Geb87] 
[GM76] 
[GR92] 
[GW95] 
[Hop82] 
[Kos87] 
[LCL95] 
[Pal91] 
[PS921 
[SP971 
[SSP961 
[Ste61] 
[Ste77] 
[SWP981 
[WBLH691 
[WCM901 
[Zet90] 
P. F51diak. Forming sparse representations by local anti-hebbian learning. 
Biol. Cybern., 64:165-170, 1990. 
F. Gebhardt. Text signatures by superimposed coding of letter triplets and 
quadruplets. Information Systems, 12(2):151-156, 1987. 
A,R. Gardner-Medwin. The recall of events through the learning of associ- 
ations between their parts. Proceedings of the Royal Society of London B, 
194:375-402, 1976. 
W.G. Gibson and J. Robinson. Statistical analysis of the dynamics of a sparse 
associative memory. Neural Networks, 5:645-662, 1992. 
B. Graham and D. Willshaw. Improving recall from an associative memory. 
Biological Cybernetics, 72:337-346, 1995. 
J.J. Hopfield. Neural networks and physical systems with emergent collective 
computational abilities. Proceedings of the National Academy of Sciences, 
USA, 79, 1982. 
B. Kosko. Adaptive bidirectional associative memories. Applied Optics, 
26(23):4947-4971, 1987. 
C.-S. Leung, L.-W. Chan, and E. Lai. Stability, capacity and statistical dy- 
namics of second-order bidirectional associative memory. IEEE Trans. Syst, 
Man Cybern., 25(10):1414-1424, 1995. 
G. Palm. Memory Capacities of Local Rules for Synaptic Modification. Con- 
cepts in Neuroscience, 2:97-128, 1991. 
G. Palm and F. T. Sommer. Information capacity in recurrent McCulloch-Pitts 
networks with sparsely coded memory states. Network, 3:1-10, 1992. 
F. T. Sommer and G. Palm. Improved bidirectional retrieval of sparse patterns 
stored by Hebbian learning. Submitted to Neural Networks, 1997. 
F. Schwenker, F. T. Sommer, and G. Palm. Iterative retrieval of sparsely coded 
associative memory patterns. Neural Networks, 9(3):445 - 455, 1996. 
K. Steinbuch. Die Lernmatrix. Kybernetik, 1:36-45, 1961. 
R. J. Sternberg. Intelligence, information processing and anaIogicaI reasoning. 
Hillsdale, N J, 1977. 
F. T. Sommer, T. Wennekers, and G. Palm. Bidirectional completion of Cell 
Assemblies in the cortex. In Computational Neuroscience: Trends in Research. 
Plenum Press, 1998. 
D. J. Willshaw, O. P. Buneman, and H. C. Longuet-Higgins. Nonholographic 
associative memory. Nature, 222:960-962, 1969. 
Y. F. Wang, J. B. Cruz, and J. H. Mulligan. Two coding stragegies for bidirec- 
tional associative memory. IEEE Trans. Neural Networks, 1(1):81-92, 1990. 
C. Zetsche. Sparse coding: the link between low level vision and associative 
memory. In R. Eckmiller, G. Hartmann, and G. Hauske, editors, Parallel 
Processing in Neural Systems and Computers. Elsevier Science Publishers B. 
V. (North Holland), 1990. 
