Optimizing Cortical Mappings 
Geoffrey J. Goodhill 
The Salk Institute 
10010 North Torrey Pines Road 
La Jolla, CA 92037, USA 
Steven Finch 
Human Communication Research Centre 
University of Edinburgh, 2 Buccleuch Place 
Edinburgh EH8 9LW, GREAT BRITAIN 
Terrence J. Sejnowski 
The Howard Hughes Medical Institute 
The Salk Institute for Biological Studies 
10010 North Torrey Pines Road, La Jolla, CA 92037, USA 
& 
Department of Biology, University of California San Diego 
La Jolla, CA 92037, USA 
Abstract 
"Topographic" mappings occur frequently in the brain. A pop- 
ular approach to understanding the structure of such mappings 
is to map points representing input features in a space of a few 
dimensions to points in a 2 dimensional space using some self- 
organizing algorithm. We argue that a more general approach 
may be useful where similarities between features are not con- 
strained to be geometric distances, and the objective function for 
topographic matching is chosen exphcitly rather than being spec- 
ified implicitly by the self-organizing algorithm. We investigate 
analytically an example of this more general approach applied to 
the structure of interdigitated mappings, such as the pattern of 
ocular dominance columns in primary visual cortex. 
1 INTRODUCTION 
A prevalent feature of mappings in the brain is that they are often "topographic". 
In the most straightforward case this simply means that neighbouring points on 
a two-dimensional sheet (e.g. the retina) are mapped to neighbouring points in a 
more central two-dimensional structure (e.g. the optic tectum). However a more 
complex case, still often referred to as topographic, is the mapping from an abstract 
space of features (e.g. position in the visual field, orientation, eye of origin etc) to 
Optimizing Cortical Mappings 331 
the cortex (e.g. layer 4 of V1). In many cortical sensory areas, the preferred sensory 
stimuli of neighbouring neurons changes slowly, except at discontinuous jumps, 
suggestive of an optimization principle that attempts to match "similar" features 
to nearby points in the cortex. In this paper, we (1) discuss what might constitute 
an appropriate measure of similarity between features, (2) outline an optimization 
principle for matching the similarity structure of two abstract spaces (i.e. a measure 
of the degree of topography of a mapping), and (3) use these ideas to analyse the 
case where two equivalent input variables am mapped onto one target structure, 
such as the "ocular dominance" mapping from the right and left eyes to V1 in the 
cat and monkey. 
2 SIMILARITY MEASURES 
A much-investigated computational approach to the study of mappings in V1 is 
to consider the input features as points in a multidimensional euclidean space 
[1, 5, 9]. The input dimensions then consist of e.g. spatial position, orientation, 
ocular dominance, and so on. Some distribution of points in this space is assumed 
which attempts, in some sense, to capture the statistics of these features in the visual 
world. For instance, in [5], distances between points in the space are interpreted 
as a decreasing function of the degree to which the corresponding features are 
correlated over an ensemble of images. Some self-organizing algorithm is then 
applied which produces a mapping from the high-dimensional feature space to 
a two-dimensional sheet representing the cortex, such that nearby points in the 
feature space map to nearby points in the two-dimensional sheet. 1 
However, such approaches assume that the dissimilarity structure of the input 
features is well-captured by euclidean distances in a geometric space. There is 
no particular mason why this should be true. For instance, such a representation 
implies that the dissimilarity between features can become arbitrarily large, an 
unlikely scenario. In addition, it is difficult to capture higher-order relationships in 
such a representation, such as that two oriented line-segment detectors will be more 
correlated if the line segments are co-linear than if they are not. We propose instead 
that, for a set of features, one could construct directly from the statistics of natural 
stimuli a feature matrix representing similarities or dissimilarities, without regard 
to whether the resulting relationships can be conveniently captured by distances in 
a euclidean feature space. There are many ways this could be done; one example is 
given below. Such a similarity matrix for features can then be optimally matched 
(in some sense) to a similarity matrix for positions in the output space. 
A disadvantage from a computational point of view of this generalized approach is 
that the self-organizing algorithms of e.g. [6, 2] can no longer be applied, and pos- 
sibly less efficient optimization techniques am required. However, an advantage 
of this is that one may now explore the consequences of optimizing a whole range 
of objective functions for quantifying the quality of the mapping, rather than hav- 
ing to accept those given explicitly or implicitly by the particular self-organizing 
algorithm. 
We mean this in a rather loose sense, and wish to include here the principles of mapping 
nearby points in the sheet to nearby points in the feature space, mapping distant points in 
the feature space to distant points in the sheet, and so on. 
332 G.J. GOODHILL, S. FINCH, T.J. SEJNOWSKI 
Vi n Vout 
Figure 1: The mapping flamework. 
3 OPTIMIZATION PRINCIPLES 
We now outline a general framework for measuring to what degree a mapping 
matches the structure of one similarity matrix to that of another. It is assumed that 
input and output matrices are of the same (finite) dimension, and that the mapping 
is bijective. Consider an input space Vi and an output space Vox, t, each of which 
contains N points. Let M be the mapping from points in V to points in Vox, t (see 
figure 1). We use the word "space" in a general sense: either or both of Vi and 
Vox, t may not have a geometric interpretation. Assume that for each space there is 
a symmetric "similarity" function which, for any given pair of points in the space, 
specifies how similar (or dissimilar) they are. Call these functions F for Vi and G 
for Vox, t. Then we define a cost functional C as follows 
N 
c = 5- 5- 
i=! 
(1) 
where  and j label points in Vi, and M([) and M(j) are their respective images in 
Vox, t. The sum is over all possible pairs of points in Vi. Since M is a bijection it is 
invertible, and C can equivalently be written 
N 
c = 5- 5- 
i=I j<i 
(2) 
where now  and j label points in Vox, t, and h/t- is the inverse map. A good (i.e. 
highly topographic) mapping is one with a high value of C. However, if one of F or 
G were given as a dissimilarity function (i.e. increasing with decreasing similarity) 
then a good mapping would be one with a low value of C. How F and G are defined 
is problem-specific. 
C has a number of important properties that help to justify its adoption as a 
measure of the degree of topography of a mapping (for more details see [3]). For 
instance, it can be shown that if a mapping that preserves ordering relationships 
between two similarity matrices exists, then maximizing C will find it. Such maps 
are homeomorphisms. However not all homeomorphisms have this propert3 
so we refer to such "perfect" maps as "topographic homeomorphisms". Several 
previously defined optimization principles, such as m'mimum path and m'mimum 
Optimizing Cortical Mappings 333 
wiring [1], are special cases of C. It is also closely related (under the assumptions 
above) to Luttrell's minimum distortion measure [7], if F is euclidean distance in a 
geometric input space, and G gives the noise process in the output space. 
4 INTERDIGITATED MAPPINGS 
As a particular application of the principles discussed so far, we consider the case 
where the similarity structure of Vi can be expressed in matrix form as 
Qc Qs 
where Qs and Qc are of dimension N/2. This means that Vi consists of two 
halves, each with the same intemal similarity structure, and an in general different 
similarity structure between the two halves. The question is how best to match 
this dual similarity structure to a single similarity structure in Vot. This is of 
mathematical interest since it is one of the simplest cases of a mismatch between 
the similarity structures of Vi and Vot, and of biological interest since it abstractly 
represents the case of input from two equivalent sets of receptors coming together 
in a single cortical sheet, e.g. ocular dominance columns in primary visual cortex 
(see e.g. [8, 5]). For simplicity we consider only the case of two one-dimensional 
retinae mapping to a one-dimensional cortex. 
The feature space approach to the problem presented in [5] says that the dissim- 
ilarities in Vi are given by squared euclidean distances between points arranged 
in two parallel rows in a two-dimensional space. That is, 
I-Jl 2  ,, j in same half of V 
F(,j)= I-j-N/212+k 2  , j in different halves of V (3) 
assuming that indices 1 ... N/2 give points in one half and indices N/2 + 1 ... N 
give points in the other half. G (, j) is given by 
I 1  ,j neighbouring 
G(i, j) = 0  otherwise (4) 
It can be shown that the globally optimal mapping (i.e. m'mimum of C) when k > 1 
is to keep the two halves of Vin entirely separate in Vot [5]. However, there is also a 
local minimum for an interdigitated (or "striped") map, where the interdigitations 
have width rt = 2k. By varying the value of k it is thus possible to smoothly vary 
the periodicity of the locally optimal striped map. Such behavior predicted the 
outcome of a recent biological experiment [4]. For k < 1 the globally optimal map 
is stripes of width rt = 2. 
However, in principle many alternative ways of measuring the similarity in Vi 
are possible. One obvious idea is to assume that similarity is given directly by the 
degree of correlation between points within and between the two eyes. A simple 
assumption about the form of these correlations is that they are a gaussian function 
of physical distance between the receptors (as in [8]). That is, 
e-0li-il   i, j in same half of V 
F(,j) = ce -oli-i-n/21'  ,j in different halves of Vi (5) 
with c < 1. We assume for ease of analysis that G is still as given in equation 4. 
This directly implements an intuitive notion put forward to account for the inter- 
digitation of the ocular dominance mapping [4]: that the cortex tries to represent 
334 G.J. GOODHILL, S. FINCH, T.J. SEJNOWSKI 
similar inputs close together, that similarity is given by the degree of correlation 
between the activities of points (cells), and additionally that natural visual scenes 
impose a correlational structure of the same qualitative form as equation 5. We 
now calculate C analytically for various mappings (c.f. [5]), and compare the cost 
of a map that keeps the two halves of Vi entirely separate in Vout to those which 
interdigitate the two halves of Vi with some regular periodicity. The map of the 
first type we consider will be refered to as the "up and down" map: moving from 
one end of Vout to the other implies moving entirely through one half of Vi, then 
back in the opposite direction through the other half. For this map, the cost Cud is 
given by 
Cud '- 2(N - 1)e - + c. (6) 
For an interdigitated (striped) map where the stripes are of width rt > 2: 
(7) 
where for rt even f(rt) = g(rt) = (_)2 and for rt odd f(r 0 = (-2) 2, g(rt) = 
( _2)2 To characterize this system we now analyze how the rt for which Cs (rt) has 
a local maximum varies with c, 0c, fi, and when this local maximum is also a global 
maximum. Setting dCs() _ 0 does not yield analytically tractable expressions 
(unlike [5]). However, more direct methods can be used: there is a local maximum 
at rt if Cs(rt - 1) < C(rt) > C(rt+ 1). Using equation 7we derive conditions on c 
for this to be true. For rt odd, we obtain the condition c < c < c2 where c = c2; 
that is, there are no local maxima at odd values of ft. For rt even, we also obtain 
C! < C < C2 where now 
2e -0 
c = rte_(,_)  _ (r[- 2)e-('-)  
and c2(rt) = c(rt + 2). c(rt) and c2(rt) are plotted in figure 2, from which one 
can see the ranges of c for which particular rt are local maxima. As f5 increases, 
maxima for larger values of rt become apparent, but the range of c for which they 
exist becomes rather small. It can be shown that Cud is always the global maximum, 
except when e-0 > c, when rt = 2 is globally optimal. As c decreases the optimal 
stripe width gets wider, analogously to k increasing in the dissimilarities given by 
equation 3. When f5 is such that there is no local maximum the only optimum is 
stripes as wide as possible. This fits with the intuitive idea that if corresponding 
points in the two halves of Vi (i.e. li - J l = N/2) are sufficiently similar then it is 
favorable to interdigitate the two halves in Vout, otherwise the two halves are kept 
completely separate. 
The qualitative behavior here is similar to that for equation 3. rt = 2 is a global 
optimum for large c (small k), then as c decreases (k increases) rt = 2 first becomes a 
local optimum, then the position of the local optimum shifts to larger ft. However, 
an important difference is that in equation 3 the dissimilarities increase without 
limit with distance, whereas in equation 5 the similarities tend to zero with dis- 
tance. Thus for equation 5 the extra cost of stripes one unit wider rapidly becomes 
negligible, whereas for equation 3 this extra cost keeps on increasing by ever larger 
amounts. As rt - oc>, Cud ,,o Cs(11. ) for the similarities defined by equation 5 (i.e. 
there is the same cost for traversing the two blocks in the same direction as in the 
opposite direction), whereas for the dissimilarities defined by equation 3 there is a 
quite different cost in these two cases. That F and G should tend to a bounded value 
as i and j become ever more distant neighbors seems biologically more plausible 
than that they should be potentially unbounded. 
Optimizing Cortical Mappings 335 
(a) 
1.0 
0.9 
0.8 
0.7' 
0.6- 
0.5- 
0.4- 
0.3- 
0.2- 
0.1 - 
0.0 
0 
1.0 - 
0.9- 
0.8- 
0.7'- 
0.6- 
0.,5- 
0.4- 
0.3- 
0.2- 
0.1 - 
Co) 
2 4 6 8 10 12 14 0 2 4 6 8 10 12 14 
n n 
Figure 2: The ranges of c for which particular r are local maxima. (a) 0c = fi = 0.25. (b) 
0c = 0.25, f5 = 0.1. When the c2 (dashed) line is below the c (solid) line no local maxima 
exist. For each (even) value of r to the left of the crossing point, the vertical range between 
the two lines gives the values of c for which that r is a local maximum. Below the solid line 
and to the right of the crossing point the only maximum is stripes as wide as possible. 
Issues such as those we have addressed regarding the transition from "triped" to 
"blocked" solutions for combining two sets of inputs distinguished by their intra- 
and inter-population similarity structure may be relevant to understanding the 
spatial representation of functional attributes across cortex. The results suggest 
the hypothesis that two variables are interdigitated in the same area rather than 
being represented separately in two distinct areas if the inter-population similarity 
is sufficiently high. An interesting point is that the striped solutions are often 
only local optima. It is possible that in reality developmental constraints (e.g. a 
chemically defined bias towards overlaying the two projections) impose a bias 
towards finding a striped rather than blocked solution, even though the latter may 
be the global optimum. 
5 DISCUSSION 
We have argued that, in order to understand the structure of mappings in the 
brain, it could be useful to examine more general measures of similarity and of 
topographic matching than those implied by standard feature space models. The 
consequences of one particular alternative set of choices has been examined for the 
case of an interdigitated map of two variables. Many alternative objective functions 
for topographic matching are of course possible; this topic is reviewed in [3]. Two 
issues we have not discussed are the most appropriate way to define the features 
of interest, and the most appropriate measures of similarity between features (see 
[10] for an interesting discussion). 
A next step is to apply these methods to more complex structures in V1 than just the 
ocular dominance map. By examining more of the space of possibilities than that 
occupied by the current feature space models, we hope to understand more about 
the optimization strategies that might be being pursued by the cortex. Feature 
space models may still turn out to be more or less the right answer; however even 
if this is true, our approach will at least give a deeper level of understanding why. 
336 G.J. GOODHILL, S. FINCH, T.J. SEJNOWSKI 
Acknowledgements 
We thank Gary Blasdel, Peter Dayan and Paul Viola for stimulating discussions. 
References 
[10] 
[1] Durbin, R. & Mitchison, G. (1990). A dimension reduction framework for 
understanding cortical maps. Nature, 343, 644-647. 
[2] Durbin, R. & Willshaw, D.J. (1987). An analogue approach to the travelling 
salesman problem using an elastic net method. Nature, 326, 689-691. 
[3] Goodhill, G. J., Finch, S. & Sejnowski, T. J. (1995). Quantifying neighbour- 
hood preservation in topographic mappings. Institute for Neural Com- 
putation Technical Report Series, No. INC-9505, November 1995. Avail- 
able from ftp://salk.edu/pub/geoff/goodhill_finch_sejnowski_tech95.ps.Z 
or http://cnl.salk.edu/,,,geoff. 
[4] Goodhill, G.J. & Ltiwel, S. (1995). Theory meets experiment: correlated neu- 
ral activity helps determine ocular dominance column periodicity. Trends in 
Neurosciences, 18, 437-439. 
[5] Goodhill, G.J. & Willshaw, D.J. (1990). Application of the elastic net algorithm 
to the formation of ocular dominance stripes. Network, 1, 41-59. 
[6] Kohonen, T. (1982). Self-organized formation of topologically correct feature 
maps. Biol. Cybern., 43, 59-69. 
[7] Luttrell, S.P. (1990). Derivation of a class of training algorithms. IEEE Trans. 
Neural Networks, 1, 229-232. 
[8] Miller, K.D., Keller, J.B. & Stryker, M.P. (1989). Ocular dominance column 
development: Analysis and simulation. Science, 245, 605-615. 
[9] Obermayer, K., Blasdel, G.G. & Schulten, K. (1992). Statistical-mechanical 
analysis of self-organization and pattern formation during the development 
of visual maps. Phys. Rev. A, 45, 7568-7589. 
Weiss, Y. & Edelman, S. (1995). Representation of similarity as a goal of early 
sensory coding. Network, 6, 19-41. 
