Dynamic Cell Structures 
Jiirg Bruske and Gerald Sommer 
Department of Cognitive Systems 
Christian Albrechts University at Kiel 
24105 Kiel - Germany 
Abstract 
Dynamic Cell Structures (DCS) represent a family of artificial neural 
architectures suited both for unsupervised and supervised learning. 
They belong to the recently [Martinetz94] introduced class of Topology 
Representing Networks (TRN) which build perfectly topology pre- 
serving feature maps. DCS employ a modified K0honen learning rule 
in conjunction with competitive Hebbian learning. The Kohonen type 
learning rule serves to adjust the synaptic weight vectors while Hebbian 
learning establishes a dynamic lateral connection structure between 
the units reflecting the topology of the feature manifold. In case of super- 
vised learning, i.e. function approximation, each neural unit implements 
a Radial Basis Function, and an additional layer of linear output units 
adjusts according to a delta-rule. DCS is the first RBF-based approxima- 
tion scheme attempting to concurrently learn and utilize a perfectly to- 
pology preserving map for improved performance. 
Simulations on a selection of CMU-Benchmarks indicate that the DCS 
idea applied to the Growing Cell Structure algorithm [Fritzke93] leads 
to an efficient and elegant algorithm that can beat conventional models 
on similar tasks. 
1 Introduction 
The quest for smallest topology preserving maps motivated the introduction of growing 
feature maps like Fritzke's Growing Cell Structures (GCS). In GCS, see [Fritzke93] for de- 
tails, one starts with a k-dimensional simplex of N = k+l neural units and (k + 1)  k?2 
lateral connections (edges). Growing of the network is performed such that after insertion 
498 Jtrg Bruske, Gerald Sommer 
of a new unit the network consists solely of k dimensional simplices again. Thus, like Ko- 
honen's SOM, GCS can only learn a perfectly topology preserving feature map I if k 
meets the actual dimension of the feature manifold. Assuming that the lateral connections 
do reflect the actual topology the connections serve to define a neighborhood for a Kohonen 
like adaptation of the synaptic vectors w i and guide the insertion of new units. Insertion 
happens incrementally and does not necessitate a retraining of the network. The principle 
is to insert new neurons in such a way that the expected value of a certain local error mea- 
sure, which Fritzke calls the resource, becomes equal for all neurons. For instance, the 
number of times a neuron wins the competition, the sum of distances to stimuli for which 
the neuron wins or the sum of errors in the neuron's output can all serve as a resource and 
dramatically change the behavior of GCS. Using different error measures and guiding in- 
sertion by the lateral connections contributes much to the success of GCS. 
The principle of DCS is to avoid any restriction of the topology of the network (lateral con- 
nection scheme between the neural units) but to concurrently learn and utilize a perfectly 
topology preserving ..map. This.is achieved by adapting the lateral connection structure ac- 
cording to a competitive Hebban learning rule: 
Cu(t + 1) 
rnax {Yi . Yj, Cij(t) } : yi . Yj> yk . yt V ( l < k, l < N) 
0 : Cij(t ) <0 "" (1) 
: otherwise, "" 
OC/j (t) 
where c, 0 < c < 1 is a forgetting constant, 0, 0 < 0 < 1 serves as a threshold for deleting 
lateral connections, and Yi = R (11v - will) is the activation of the i-th unit with w i as the 
centre of its receptive field on pre'entati6n of stimulus v. R(.) can be any positive continu- 
ously monotonically decreasing function. For batch learning with a training set T of fixed 
size IT1, o = Irlf is a good choice. 
Since the isomorphic representation of the topology of the feature manifold M in the lateral 
connection structure is central to performance, in many situations a DCS algorithm may be 
the right choice. These situations are characterized by missing a priori knowledge of the 
topology of the feature manifold M or a topology of M which cannot be readily mapped to 
the existing models. Of course, if such a priori knowledge is available then models like 
GCS or Kohonen's SOM allowing to incorporate such knowledge have an advantage, es- 
pecially if training data are sparse. 
Note that DCS algorithms can also aid in cluster analysis: In a perfectly topology preserv- 
ing map clusters which are bounded by regions of P(v) = 0 can be identified simply by a 
connected component analysis. However, without prior knowledge about the feature man- 
ifold M it is in principal impossible to check for perfect topology preservation of $. Noise 
in the input data may render perfect topology learning even more difficult. So what can per- 
fect topology learning be used for? The answer is simply that for every set S of reference 
vectors perfect topology learning yields maximum topology preservation 3 with respect to 
this set. And connected components with respect to the lateral connection structure C may 
well serve as an initialization for postprocessing by hierarchical cluster algorithms. 
1. We use the term "perfectly topology preserving feature map" in accordance with its rigorous 
definition in [Martinetz93]. 
2. In his very recent and recommendable article [Martinetz94] the term Topology Representing 
Network (TRN) is coined for any network employing competitive Hebbian learning for topolo- 
gy learning. 
3. if topology preservation is measured by the topographic function as defined in [Viiimann94]. 
Dynamic Cell Structures 499 
The first neural algorithm attempting to learn perfectly topology preserving feature maps 
is the Neural Gas algorithm ofT. Martinetz [Martinetz92]. However, unlike DCS the Neu- 
ral Gas does not further exploit this information: In every step the Neural Gas computes 
the k nearest neighbors to a given stimulus and, in the supervised case, employs all of them 
for function approximation. DCS avoids this computational burden by utilizing the lateral 
connection structure (topology) learned so far, and it restricts interpolation between acti- 
vated units to the submanifold of the current stimulus. 
Applying the principle of DCS to Fritzke's GCS yields our DCS-GCS algorithm. This al- 
gorithm sticks very closely to the basic structure of its ancestor GCS except the predefined 
k-dimensional simplex connection structure being replaced by perfect topology learning. 
Besides the conceptual advantage of perfect topology learning, DCS-GCS does decrease 
overhead (Fritzke has to handle quite sophisticated data structures in order to maintain the 
k-dimensional simplex structure after insertion/deletion of units) and can be readily imple- 
mented on any serial computer. 
2 Unsupervised DCS-GCS 
The unsupervised DCS-GCS algorithm starts with initializing the network (graph) to two 
neural units (vertices) n I and n 2. Their weight vectors w l, w 1 (centres of receptive fields) 
are set to points v l, v 2  M which are drawn from M according to P(v). They are connected 
by a lateral connection of weight C = C = 1 Note that lateral connections in DCS 
 2 21 .  
are always bidirectional and have symmetric weights. 
Now the algorithm enters its outer loop which is repeated until some stopping criterium is 
fulfilled. This stopping criterium could for instance be a test whether the quantization error 
has already dropped below a predefined accuracy. 
The inner loop is repeated for ) times. In off-line learning ) can be set to the number ex- 
amples in the training set T. In this case, the inner loop just represents an epoch of training. 
Within the inner loop, the algorithm first draws an input stimulus v  M from M according 
to P(v) and then proceeds to calculate the two neural units which weight vectors are first 
and second closest to v. 
In the next step, the lateral connections between the neural units are modified according to 
eq. (1), the competitive Hebbian learning rule. As has already been mentioned, in off-line 
learning it is a good idea to set o = . 
Now the weight vectors w i of the best matching unit and its neighbors are adjusted in a 
Kohonen like fashion: 
AWbmu "- B (12 -- Wbmu) and Awj - Nh (12 - wj) , (2) 
where the neighborhood Nh (j) of a unit j is defined by 
Sh(j) - {il(CjiO,l <i<S)} . 
The inner loop ends with updating the resource value of the best matching unit. The re- 
source of a neuron is a local error measure attached to each neural unit. As has been pointed 
out, one can choose alternative update functions corresponding to different error measures. 
For our experiments (section 2.1 and section 3.1) we used the accumulated squared distance 
to the stimulus, i.e. A'gbm u "- IIv- Wbmull ' 
The outer loop now proceeds by adding a new neural unit r to the network. This unit is lo- 
cated in-between the unit I with largest resource value and its neighbor n with second larg- 
est resource value: 4 
500 JOrg Bruske, Gerald Sommer 
The exact location of its centre of receptive field w r is calculated according to the ratio of 
the resource values x l, x n , and the resource values of units n and I are redistributed among 
r, n and h 
1 1 
 '- 'lJn/('lJ n q- 'll) , AX l '- , ( 1 -- t) ,1 l and AX n -' ,t n (3) 
w r '- w I q- (w n - Wl) , X r '- A n q- AX t , X t = X l - AX t and '1 n '- X n -- A n . (4) 
This gives an estimate of the resource values if the new unit had been in the network right 
from the start. Finally the lateral connections are changed, 
Crl '- Clr '- 1, Crn '- Crn --- 1 and Cnl "' Cln -' 0, (5) 
connecting unit r to unit l and disconnecting n and 1. 
This heuristic guided by the lateral connection structure and the resource values promises 
insertion of new units at good initial positions. It is responsible for the better performance 
of DCS-GCS and GCS compared to algorithms which do not exploit the neighborhood re- 
lation between existing units. 
The outer loop closes by decrementing the resource values of all units, 
x i (t + 1) = x: (t) , 1 < i < N, where 0 <  < 1 is a constant. This last step just avoids 
overflow of the resource variables. For off-hne learning, 13 = 0 is the natural choice. 
2.1 Unsupervised DCS simulation results 
Let us first turn to our simulation on artificial data. The training set T contains 2000 exam- 
ples randomly drawn from a feature manifold M consisting of three squares, two of them 
connected by a line. The development of our unsupervised DCS-GCS network is depicted 
in Figure 1, with the initial situation of only two units shown in the upper left. Examples 
are represented by small dots, the centres of receptive fields by small circles and the lateral 
connections by lines connecting the circles. From left to right the network is examined after 
0, 9 and 31 epochs of training (i.e. after insertion of 2, 11 and 33 neural units). 
After 31 epochs the network has built a perfectly topology preserving map of M, the lateral 
connection structure nicely reflecting the shape of M: Where M is 2-dimensional the lateral 
connection structure is 2-dimensional, and it is 1-dimensional where M is 1-dimensional. 
Note, that a connected component analysis could recognize that the upper right square is 
separated from the rest of M. The accumulated squared distance to stimuli served as the re- 
source. 
The quantization error Eq = 1  ii v-  dropped from 100% (3 units) to 3% (33 
units). n v e r 
The second simulation deals with the two-spirals benchmark. Data were obtained by run- 
ning the program "two-spirals" (provided by CMU) with parameters 5 (density) and 6.5 
(spiral radius) resulting in a training set T of 962 examples. The data represent two distinct 
spirals in the x-y-plane. Unsupervised DCS-GCS at work is shown in Figure 2, after inser- 
tion of 80, 154 and, finally, 196 units. With 196 units a perfectly topology preserving map 
of M has emerged, and the two spirals are clearly separated. Note that the algorithm has 
learned the separation in a totally unsupervised manner, i.e. not using the labels of the data 
4. Fritzke inserts new units at a slightly different location, using not the neighbor with second 
largest resource but the most distant neighbor. 
Dynamic Cell Structures 501 
Figure 1' Unsupervised DCS-GCS on artificial data 
points (which are provided by CMU for supervised learning). Again, the accumulated 
squared distance to stimuli served as the resource. 
Figure 2: Unsupervised learning of two spirals 
3 Supervised DCS-GCS 
In supervised DCS-GCS examples consist not only of an input vector v but also include an 
additional teaching output vector u. 
The supervised algorithm actually does work very similar to its unsupervised version ex- 
cept 
 when a neural unit n i is inserted an output vector o i will be attached to it with 
Oi'-U. 
 the output y of the network is calculated as a weighted sum of the best matching unit's 
output vector O,mu and the output vectors of its neighbors o i, i  Nh (bmu), 
Y = (i {brnuL;Nh(bmu)} aii) ' (6) 
502 Jdirg Bruske, GeraM Sommer 
where = 1/(ollv-w,112+ 1) is the activation of neuron i on stimulus v, 
,  >alJ, representing the size of the receptive fields. In our simulations, the size of 
receptive fields have been equal for all units. 
 adaption of output vectors by the delta-rule: A simple delta-rule is employed to adjust 
the output vectors of the best matching unit and its neighbors. 
Most important, the approximation (classification) error can be used for resource updating. 
This leads to insertion of new units in regions where the approximation error is worst, thus 
promising to outperform dynamic algorithms which do not employ such a criterion for in- 
sertion. In our simulations we used the accumulated squared distance of calculated and 
teaching output, A'gOm u = Ily - ull 2. 
3.1 Supervised DCS-GCS simulation results 
We applied our supervised DCS-GCS algorithm to three CMU benchmarks, the supervised 
two-spiral problem, the speaker independent vowel recognition problem and the sonar 
mine/rock separation problem. 5 
The tvgo spirals benchmark contains 194 examples, each consisting of an input vector 
v  91" and a binary label indicating to which spiral the point belongs. The spirals can not 
be linearly separated. The task is to train the examples until the learning system can pro- 
duce the correct output for all of them and to record the time. 
The decision regions learned by supervised DCS-GCS are depicted in Figure 3 after 110 
and 135 epochs of training, where the classification error on the training set has dropped to 
0%. Black indicates assignment to the fist, white assignment to the second spiral. The net- 
work and the examples are overlaid. 
Figure 3: Supervised learning of two spirals 
Results reported by others are 20000 epochs of Backprop for a MLP by Lang and Witbrok 
[Lang89], 10000 epochs of Cross Entropy Backprop and 1700 epochs of Cascade-Correla- 
tion by Fahlman and Lebiere [Fahlman90] and 180 epochs of GCS training by Fritzke 
[Fritzke93]. 
5. For details of simulation, parameters and additional statistics for all of the reported experi- 
ments the reader is refered to [Bruske94] which is also available viaftp. informatik. uni-kiel. de in 
directory pub/kiel/publications/TechnicalReports/Ps.Z/ as 1994trO3.ps.Z 
Dynamic Cell Structures 503 
The data for the speaker independent recognition of 11 vowels comprises a training set of 
582 examples and a test set of 462 examples, see [Robinson89]. 
We obtained 65% correctly classified test samples with only 108 neural units in the DCS- 
GCS network. This is superior to conventional models (including single and multi layer 
perceptron, Kanerva Model, Radial Basis Functions, Gaussian Node Network, Square 
Node Network and Nearest Neighbor) for which figures well below 57% have been report- 
ed by Robinson. It also qualitatively compares to GCS (jumps above the 60% margin), for 
which Fritzke reports best classification results of 61%(158 u nits) up to 67 % (154 units) for 
a 3-dim GCS. On the other hand, our best DCS-GCS used much fewer units. Note that 
DCS-GCS did not rely on a pre-specified connection structure (but learned it!). 
Our last simulation concerns a data set used by Gorman and Sejnowski in their study of 
classification of sonar data, [Gorman88]. The training and the test set contain 104 examples 
each. 
Gorman and Sejnowski report their best results of 90.4% correctly classified test examples 
for a standard BP network with 12 hidden units and 82.7% for a nearest neighbor classifier. 
Supervised DCS-GCS reached a peak classification rate of 95% after only 88 epochs of 
training. 
4 Conclusion 
We have introduced the idea of RBF networks which concurrently learn and utilize perfect- 
ly topology preserving feature maps for adaptation and interpolation. This family of ANNs, 
which we termed Dynamic Cell Structures, offers conceptual advantage compared to clas- 
sical Kohonen type SOMs since the emerging lateral connection structure maximally pre- 
serves topology. We have discussed the DCS-GCS algorithm as an instance of DCS. 
Compared to its ancestor GCS of Fritzke, this algorithm elegantly avoids computational 
overhead for handling sophisticated data structures. If connection updates (eq.(1)) are re- 
strjcted to the best matching unit and its neighbors, DCS has linear (s_e,rial) time complexi- 
ty" and thus may also be considerqd as an improvement of Martinetz s Neural Gas idea 7. 
Space complexity of DCS is O (N ~) in general and can be shown to become linear if the 
feature manifold M is two dimensional. The simulations on CMU-Benchmarks indicate 
that DCS indeed has practical relevance for classification and approximation. 
Thus encouraged, we look forward to apply DCS at various sites in our active computer 
vision project, including image compression by dynamic vector quantization, sensorimotor 
maps for the oculomotor system and hand-eye coordination, cartography and associative 
memories. A recent application can be found in [Bruske95] where a DCS network attempts 
to learn a continous approximation of the Q-function in a reinforcement learning problem. 
6. Here we refer to the serial time a DCS algorithm needs to process a single stimulus (including 
response calculation and adaptation). 
7. The serial time complexity of the Neural Gas is f2 (N) , approaching O (NlogP0 for 
k - N, k the number of nearest neighbors. 
504 JOrg Bruske, Gerald Sommer 
References 
[Bruske94] J. Bruske and G. Sommer, Dynamic Cell Structures: Radial Basis Function Networks 
with Perfect Topology Preservation, Inst. f. Inf. u. Prakt. Math, CAU zu Kiel, Technical Report 9403. 
[Bruske95] J. Bruske, I. Ahrns and G. Sommer, Heuristic Q-Learning, submitted to ECML 95. 
[Fahlman90] S.E. Fahlman, C.Lebiere, The Cascade-Correlation Learning Architecture, Advances 
in Neural Information processing systems 2, Morgan Kaufman, San Mateo, pp.524-534. 
[Fahlman93] S.E. Fahlman, CMU Benchmark Collection for Neural Net Learning Algorithms, 
Carnegie Mellon Univ., School of Computer Science, machine-readable data repository, Pittsburgh. 
[Fritzke92] B. Fritzke, Growing Cell Structures - a self organizing network in k dimensions, Arti- 
ficial Neural Networks 2, I.Aleksander & J.Taylor eds., North-Holland, Amsterdam, 1992. 
[Fritzke93] B. Fritzke, Growing Cell Structures - a self organizing network for unsupervised and 
supervised training, ICSI Berkeley, Technical Report, tr-93-026. 
IGorman88] R.B. Gorman and T.J. Sejnowski, Analysis of Hidden Units in a Layered Network 
Trained to Classify Sonar Targets, Neural Networks, Vol. 1, pp. 75-89 
[Lang89] K.J. Lang & M.J. Witbrock, Learning to tell two spirals apart, Proc. of the 1988 Connec- 
tionist Models Summer School, Morgan Kaufmann, pp.52-59. 
[Martinetz92] Thomas Martinetz, Selbstorganisierende neuronale Netzwerke zur Bewegungss- 
teuerung, Dissertation, DIFKI-Verlag, 1992. 
[Martinetz93] Thomas Martinetz, Competitive Hebbian Learning Rule Forms Perfectly Topology 
Preserving Maps, Proc. of the ICANN 93, p.426-438, 1993. 
[Martinetz94] Thomas Martinetz and Klaus Schulten, Topology Representing Networks, Neural 
Networks, No. 7, Vol. 3, pp. 505-522, 1994. 
[Moody89] J.Moody, C.J. Darken, Fast Learning in Networks of Locally-Tuned Processing Units, 
Neural Computation Vol. 1 Num.2, Summer 1989. 
[Robinson89] A.J. Robinson, Dynamic Error Propagation Networks, Cambridge Univ., Ph.D. the- 
sis, Cambridge. 
[Viilmann94] T. Villmann and R. Der and T. Martinetz, A NovelApproach to Measure the Topology 
Preservation of Feature Maps, Proc. of the ICANN 94, 1994. 
