828 Cowan 
Neural networks: the early days 
J.D. Cowan 
Department of Mathematics, Committee on 
Neurobiology, and Brain Research Institute, 
The University of Chicago, 5734 S. Univ. Ave., 
Chicago, Illinois 60637 
ABSTRACT 
A short account is given of various investigations of neural network 
properties, beginning with the classic work of McCulloch & Pitts. 
Early work on neurodynamics and statistical mechanics, analogies with 
magnetic materials, fault tolerance via parallel distributed processing, 
memory, learning, and pattern recognition, is described. 
1 INTRODUCTION 
In this brief account of the early days in neural network research, it is not possible to be 
comprehensive. This article then is a somewhat subjective survey of some, but not all, of 
the developments in the theory of neural networks in the twenty-five-year period from
1943 to 1968, when many of the ideas and concepts that define the field of neural
network research were formulated. This comprises work on connections with automata
theory and computability; neurodynamics, both deterministic and statistical; analogies 
with magnetic materials and spin systems; reliability via parallel and parallel distributed 
processing; modifiable synapses and conditioning; associative memory; and supervised 
and unsupervised learning. 
2 McCULLOCH-PITTS NETWORKS 
The modern era may be said to have begun with the work of McCulloch and Pitts (1943).
This is too well-known to need commenting on. Let me just make some historical re- 
marks. McCulloch, who was by training a psychiatrist and neuroanatomist, spent some 
twenty years thinking about the representation of events in the nervous system. From 1941
to 1951 he worked in Chicago.
Figure 1: Warren McCulloch circa 1962
Chicago at that time was one of the centers of neural
network research, mainly through the work of the Rashevsky group in the Committee on
Mathematical Biology at the University of Chicago. Rashevsky, Landahl, Rapaport and 
Shimbel, among others, carried out many early investigations of the dynamics of neural 
networks, using a mixture of calculus and algebra. In 1942 McCulloch was introduced to 
Walter Pitts, then a 17-year-old student of Rashevsky's. Pitts was a mathematical prodigy
who had joined the Committee sometime in 1941. There is an (apocryphal) story that 
Pitts was led to the Rashevsky group after a chance meeting with the philosopher 
Bertrand Russell, at that time a visitor to the University of Chicago. In any event Pitts 
was already working on algebraic aspects of neural networks, and it did not take him long 
to see the point behind McCulloch's quest for the embodiment of mind. In one of 
McCulloch's later essays (McCulloch 1961) he describes the history of his efforts thus:
My object, as a psychologist, was to invent a least psychic event, or 
"psychon", that would have the following properties: First, it was to be 
so simple an event that it either happened or else it did not happen. 
Second, it was to happen only if its bound cause had happened--shades
of Duns Scotus!--that is, it was to imply its temporal antecedent.
Third, it was to propose this to subsequent psychons. Fourth, these
were to be compounded to produce the equivalents of more 
complicated propositions concerning their antecedents...In 1921 it 
dawned on me that these events might be regarded as the all-or- 
nothing impulses of neurons, combined by convergence upon the next 
neuron to yield complexes of propositional events. 
Their subsequent 1943 paper was remarkable in many respects. It is best appreciated 
within the zeitgeist of the era when it was written. As Papert has documented in his 
introduction to a collection of McCulloch's papers (McCulloch 1967), 1943 was a seminal
year for the development of the science of the mind. Craik's monograph The Nature
of Explanation and the paper "Behavior, Purpose and Teleology", by Rosenblueth,
Wiener and Bigelow, were also published in 1943. As Papert noted, "The common
feature [of these publications] is their recognition that the laws governing the 
embodiment of mind should be sought among the laws governing information rather than 
energy or matter". The paper by McCulloch and Pitts certainly lies within this 
framework. 
Figure 2: Walter Pitts circa 1952 
McCulloch-Pitts networks (henceforth referred to as MP networks) are finite state
automata embodying the logic of propositions, with quantifiers, as McCulloch wished;
they permit the framing of sharp hypotheses about the nature of brain mechanisms, in a
form equivalent to computer programs. This was a remarkable achievement. It
established, once and for all, the validity of making formal models of brain mechanisms,
if not their veridicality. It also established the possibility of a rigorous theory of mind, in 
that neural networks with feedback loops can exhibit purposive behavior, or as 
McCulloch and Pitts put it: 
both the formal and the final aspects of that activity which we are 
wont to call mental are rigorously deducible from present 
neurophysiology...[and] that in [imaginable networks]..."Mind" no 
longer "goes more ghostly than a ghost". 
2.1 FAULT TOLERANCE 
MP networks were the first designed to perform specific logical tasks; and of course logic 
can be mapped into arithmetic. Landahl, McCulloch and Pitts (1943), for example, 
noted that the arithmetical operations +, 1-, and × can be obtained in MP networks via the
logical operations OR, NOT, and AND. Thus the arithmetical expression a - a·b = a·(1 - b)
corresponds to the logical expression a AND NOT b, and more generally, all (finite)
arithmetical calculations can be implemented in an MP network. But what happens if 
such a network malfunctions from time to time, or is damaged? It was this problem that 
attracted von Neumann. In 1951-2, following conversations with McCulloch, and with 
Brueckner and Gell-Mann, von Neumann took up the problem of designing MP networks
to function reliably despite malfunctions and failures of their component neurons, or 
misconnected wires, or damage to a portion of the network (von Neumann 1956). 
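The correspondence between logical and arithmetical operations can be checked directly. The following is a minimal sketch in Python, assuming the usual threshold-unit reading of an MP neuron (binary inputs, fixed weights, a firing threshold). The function names and the particular weights and thresholds are illustrative choices, not taken from the 1943 papers.

```python
from itertools import product

def mp_neuron(inputs, weights, threshold):
    """Fire (1) if the weighted input sum reaches the threshold, else stay silent (0)."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

def OR(a, b):
    return mp_neuron((a, b), (1, 1), 1)

def AND(a, b):
    return mp_neuron((a, b), (1, 1), 2)

def NOT(a):
    # A single inhibitory input with threshold 0 fires only when the input is silent.
    return mp_neuron((a,), (-1,), 0)

def and_not(a, b):
    # One unit with an excitatory input a and an inhibitory input b.
    return mp_neuron((a, b), (1, -1), 1)

# Exhaustive check over all binary input pairs.
for a, b in product((0, 1), repeat=2):
    assert AND(a, b) == a * b                  # AND as multiplication
    assert NOT(a) == 1 - a                     # NOT as 1 - a
    assert OR(a, b) == a + b - a * b           # OR as addition, corrected for overlap
    assert and_not(a, b) == a - a * b == a * (1 - b)
    assert and_not(a, b) == AND(a, NOT(b))     # same function built from two units
```

The last assertion shows that "a AND NOT b" can be realized either by one unit with an inhibitory input or by composing the NOT and AND units, which is the sense in which finite arithmetic reduces to networks of such elements.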
Von Neumann solved the reliability problem in two different ways. His first solution was
to make use of the error-correcting properties of majority logic elements. Such an
element executes the logical function m(a,b,c) = (a AND b) OR (b AND c) OR (c AND
a). The procedure is to triplicate each logical function to be executed, i.e., execute
each logical function three times in parallel, and then feed the outputs through majority
logic elements. Let $\epsilon$ be the probability of a majority element malfunctioning, and let $\eta$
be the probability of error on any of its input lines. The general result is that provided $\epsilon$
< 0.007, an output error $\eta^*$ equal to about $4\epsilon$ can be achieved. However the
redundancy is high. Let $\mu$ be the logical depth of the function to be computed, i.e., the
longest serial chain of MP units in the original network. Then about $3^\mu$ MP neurons are
required to achieve outputs whose probability of error is about four times the probability
of error of the MP units. If $\mu$ = 2 the requisite redundancy is 9:1, but if $\mu$ = 6 it is about
700:1, and if $\mu$ = 10 it is about 60,000:1. Von Neumann's second solution to the
reliability problem was to multiplex, i.e., use N MP circuits to do the job of one. In such
networks one bit of information (the choice between "1" and "0") is signaled not by the
activation of one MP neuron, but instead by the synchronous activation of many MP
neurons. Let $\Delta$ be a number between 0 and 1. Then "1" is signaled if $\xi$, the fraction of
activated MP neurons involved in any job, exceeds $\Delta$; otherwise "0" is signaled.
Evidently a multiplexed MP network will function reliably only if $\xi$ is close to either 0
or 1. Von Neumann achieved this with networks made up entirely of NAND logic
elements. Let $\epsilon$ now be the probability of such an element malfunctioning. Von
Neumann then proved that with $\Delta$ = 0.07 and $\epsilon$ < 0.0107, $\eta^*$ can be made to decrease
with increasing N. With $\epsilon$ = 0.005, von Neumann showed that $\eta^* \sim a N^{-1/2} 10^{-bN}$,
where a = 6.4 and b = $8.6 \times 10^{-4}$. It follows that $\eta^*$ can be made less than $\epsilon$ provided N >
2,000. This is achieved with a redundancy of 3N:1 that is independent of the logical
depth $\mu$; thus for logical computations of large depth the method of multiplexing is
superior to majority logic decoding.
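The numbers quoted above are easy to reproduce. The sketch below, in Python, evaluates the majority function, the growth of the triplication redundancy with logical depth, and the asymptotic multiplexing error for an element error of 0.005. The function names are hypothetical, and the error formula is used only as quoted in the text (a = 6.4, b = 8.6e-4), not derived here.

```python
def majority(a, b, c):
    """m(a,b,c) = (a AND b) OR (b AND c) OR (c AND a) for binary inputs."""
    return (a & b) | (b & c) | (c & a)

def triplication_redundancy(depth):
    """Approximate redundancy ratio 3**depth for a function of the given logical depth."""
    return 3 ** depth

def multiplex_error(n, a=6.4, b=8.6e-4):
    """Quoted asymptotic output error a * N**(-1/2) * 10**(-b*N), for element error 0.005."""
    return a * n ** -0.5 * 10 ** (-b * n)

# A single corrupted copy is outvoted by the other two.
print(majority(1, 1, 0), majority(0, 0, 1))

# Redundancy of the triplication scheme grows exponentially with depth.
for depth in (2, 6, 10):
    print(depth, triplication_redundancy(depth))

# Output error of the multiplexed scheme for a few bundle sizes N.
for n in (1000, 2000, 5000):
    print(n, multiplex_error(n))
```

With these constants the multiplexed error is still above 0.005 at N = 1000 and below it at N = 2000, in line with the bound stated in the text, while the redundancy 3N:1 does not depend on the depth at all.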
2.2 PARALLEL DISTRIBUTED PROCESSING 
This solution to the problem of reliable computing with unreliable elements is general 
since the NAND logic element is universal. Is it biologically plausible? The answer 
seems to be negative since real neurons have thousands of synaptic contacts, so that it is 
not necessary to concatenate many NAND elements in circuits of large logical depth to 
implement logical functions of many variables. This suggests that real neurons can
implement logical functions of many variables with a probability of error $\epsilon$ not much
greater than that of the NAND element. This observation motivated Winograd and me
(1963) to study the limiting case in which $\epsilon$ is independent of the logical complexity of
the function to be implemented. In such a case it is possible to use error-correcting 
codes just as is done in noisy communication channels (Shannon 1948, Hamming 1950) 
in which a message comprising K symbols is transmitted via a signal comprising N
symbols, N - K of which are used by the receiver for error detection and correction. In the
computing case this implies that K computations be implemented by an MP-network 
comprising N/K times as many elements as would be required in the error-free case. 
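To make the coding idea concrete, here is a minimal Python sketch of the Hamming (7,4) code: K = 4 message bits are sent as N = 7 bits, and the N - K = 3 extra bits let the receiver correct any single error. It illustrates the communication-channel case only; it is not the Winograd-Cowan construction for computing networks, and the function names are hypothetical.

```python
from itertools import product

def encode(data):
    """Map 4 data bits to a 7-bit codeword (positions 1..7, parity at 1, 2, 4)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4          # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4          # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def decode(received):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(received)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4   # 0 means no error detected
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

# Every message survives every possible single-bit error.
for message in product((0, 1), repeat=4):
    codeword = encode(message)
    for flipped in range(7):
        noisy = list(codeword)
        noisy[flipped] ^= 1
        assert decode(noisy) == list(message)
```

The redundancy here is N/K = 7/4, far below the 3N:1 of multiplexing, which is the efficiency argument made above for spreading the computation over the whole network.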
The overall effect of such an encoding scheme is to distribute the logical functions to be 
implemented, over the entire MP network. Such a scheme works most efficiently with 
large K and N; in effect in a parallel distributed architecture. Thus the Winograd-Cowan 
scheme is an early example of Parallel Distributed Processing or PDP. Of course it can 
be argued that the scheme is not realistic, in that all the extra coding machinery is 
assumed to be error free. As we have noted this may not be true for simple logical 
elements, but it may be more plausible for real neurons. In unpublished work Winograd 
and Cowan studied this issue in more detail and found a realistic optimal scheme 
involving both multiplexing and PDP (Cowan and Winograd, In preparation). 
These solutions to the fault tolerance problem provided an insight into the way neural 
networks in the brain might function reliably despite damage. Ever since Hughlings 
Jackson's neurological studies of brain-damaged patients (Taylor, 1932), and Lashley's 
demonstration of the spared cognitive abilities of brain-damaged rats (Lashley, 1942), it 
has become apparent that, although different regions of the brain are specialized for 
differing functions, the scale of such a localization of function need not extend down to 
single neurons. In terms of the von Neumann-Winograd-Cowan analysis, the 
representation of a bit of information need not be unary, but may be redundant or even 
distributed. There has been much debate on this point. Lashley, for example, proposed 
that different brain regions are equipotent with respect to function (Lashley, 1950)--any 
region can implement a given task--the very antithesis of regional localization. More 
recently, Barlow (1972) asserted that the level of redundancy in brain functioning is 
reduced, the further one moves from peripheral to central regions of the brain, 
culminating in a unary representation of information deep in the brain. In current 
terminology, we speak of "grandmother" neurons, supposedly activated only when 
grandmother is perceived. 
2.3 CELL ASSEMBLIES AND NEURODYNAMICS 
Lashley's notion of the equipotentiality of brain regions is reflected in the work of Hebb 
(1949). In his book The Organization of Behavior, Hebb proposed that the connectivity 
of the brain is continually changing as an organism learns differing functional tasks, and 
that cell assemblies are created by such changes. Hebb followed up an early suggestion 
of Cajal and introduced his now famous postulate: repeated activation of one neuron by 
another, across a particular synapse, increases its conductance. It follows that groups of 
weakly connected cells, if synchronously activated, will tend to organize into more 
strongly connected assemblies. Here again, the representation of a bit of information is 
distributed. Hebb's book has proved to be very influential. The cell assembly theory has
triggered many investigations of learning in neural networks and of the way in which 
synchronized neural activity is generated and propagated. Studies of this topic, now 
known as neurodynamics, began with Rashevsky (1938). Rashevsky and his co-workers 
represented activation and propagation in neural networks in terms of differential 
equations and tried to make contact with related applications to physical problems. A 
more elaborate mathematical approach to neurodynamics was introduced by Wiener
(1948) in his influential book, Cybernetics, or Control and Communication in the Animal
and the Machine, and continued with Rosenblueth, Pitts, and Garcia Ramos (1949) in a
series of investigations of reverberations in excitable networks. 
2.4 CONTINUUM NEURODYNAMICS 
It was Beurle (1956), however, who first provided a detailed analysis of the triggering and
propagation of large-scale brain activity. Beurle focussed not on the activation
of individual neurons, but on the proportion of neurons becoming activated per unit time
in a given volume element of a slab of model brain tissue consisting of randomly connected
neurons (see Fig. 3). In modern terms, this is the continuum approximation of
neural activity. Beurle's work triggered many computer simulations of randomly 
connected neural networks. Farley and Clark (1961), for example, simulated the action of 
1024 randomly connected model neurons, somewhat more complicated and realistic than 
McCulloch-Pitts neurons. They confirmed the Wiener-Beurle deduction of the existence 
of traveling and rotating waves in nerve tissue. 
Figure 3. Computer simulation of activity in a sheet of neural tissue. 
White denotes regions of activated neurons; black, regions of neurons 
which have just been activated and are therefore insensitive, and gray,
regions in which neurons have recovered and are again sensitive to 
incoming excitation (Reproduced from Beurle 1962). 
3 MEMORY & LEARNING 
3.1 ANALOGIES WITH LATTICE SPIN SYSTEMS 
Following Hebb's work, perhaps the most stimulating suggestion concerning the
properties of cell assemblies was that of Cragg and Temperley (1954), who noted that, 
just as neurons can be either activated (emitting action potentials), or quiescent (at rest), 
so can atoms in an assembly or lattice be in one of two energetic states, e.g., with spins 
pointing "up" or "down". Furthermore, just as neurons either excite or inhibit one 
another, so do spinning atoms exert magnetic forces on their neighbors tending to set 
their spins in either the same or opposite direction. Therefore the properties of neurons in 
a densely connected network should be analogous to those of spinning atoms (or binary 
alloys) in a lattice. Systems of spins showing various kinds of order provide good 
models of the properties of magnetic materials. For example, a ferromagnet, which 
consists of atoms tending to force each other to spin in the same direction, has long-range 
order; an antiferromagnet, which consists of atoms tending to force each other to spin in 
the opposite direction, also has long-range order, whereas a paramagnet is disordered. It 
is plausible that neural networks should exhibit analogous properties. Cragg and 
Temperley therefore suggested (a) that the domain patterns which are a ubiquitous 
feature of ferromagnets, comprising patches of "up" or "down" spins, should show up in 
neural networks as patches of excited or quiescent neurons, and (b) that neural networks 
should show effects similar to ferromagnetic hysteresis in transitions between disordered 
and ordered states. This implied that neural domain patterns, once triggered by external 
stimuli, should be stable against spontaneous random activity, and could therefore con- 
stitute a memory of the stimulus (Cragg & Temperley, 1955). It is interesting to note that 
20 years later Little (1974) arrived at virtually the same conclusions as Cragg and 
Temperley concerning the existence of persistent neural states, via the mathematical 
analysis of a lattice spin system. 
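The hysteresis argument can be illustrated with a toy model. The Python sketch below assumes a fully connected set of two-state (+1/-1) units with uniform excitatory ("ferromagnetic") coupling and no noise. This is a deliberately crude stand-in for the lattice systems discussed by Cragg and Temperley, and the names `relax` and `magnetization` are hypothetical.

```python
def relax(spins, field, coupling=1.0):
    """Asynchronously update +1/-1 units in place until no unit wants to change."""
    n = len(spins)
    changed = True
    while changed:
        changed = False
        for i in range(n):
            # Input from all other units plus the external stimulus.
            local = coupling * (sum(spins) - spins[i]) / n + field
            new_state = 1 if local > 0 else -1
            if new_state != spins[i]:
                spins[i] = new_state
                changed = True
    return spins

def magnetization(spins):
    """Mean activity: +1 if all units are 'up', -1 if all are 'down'."""
    return sum(spins) / len(spins)

spins = [-1] * 20
relax(spins, field=2.0)            # strong positive stimulus switches the network
relax(spins, field=0.0)            # stimulus removed
m_after_positive = magnetization(spins)

relax(spins, field=-0.5)           # weak opposing input does not erase the state
m_after_weak_opposition = magnetization(spins)

relax(spins, field=-2.0)           # strong negative stimulus switches it back
relax(spins, field=0.0)
m_after_negative = magnetization(spins)
```

At zero field the network sits in whichever state the last strong stimulus left it in, so the same input (none) is compatible with two different stable states. That history dependence is the sense in which a domain pattern could act as a memory.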
3.2 MODIFIABLE SYNAPSES 
It was Hebb's proposal of synaptic modification during learning, however, that triggered 
even more work on neural networks; specifically on adaptive networks which could learn 
to perform specified tasks. Early work toward this goal was carried out by Uttley (1956), 
who demonstrated that neural networks with modifiable connections could indeed learn 
to classify simple sets of binary patterns into equivalence classes. Uttley's first suggestion 
was that synaptic weights represent conditional probabilities. Let $u_i(t)$ be the binary
variable representing the state of the ith neuron at time t, and $\Theta[v]$ the Heaviside step
function. Let time be measured in quantal units $\Delta t$, $t = n\Delta t$. As is well-known, the activation
of an MP neuron can be expressed by the equation: $u_i(n + 1) = \Theta[\sum_j w_{ij} u_j(n) - V_{TH}]$,
where $w_{ij}$ is the "weight" of the (j --> i)th connection, and where $V_{TH}$ is the voltage
threshold. Suppose instead that $u_i(t + \Delta t) = \Theta[\sum_j w_{ij} e_j(t) - V_{TH}]$, where
$e_i(t) = \sum_k u_i(t - t_k)\, \tau^{-1} e^{-(t - t_k)/\tau}$, $\tau$ is the neural membrane time-constant and $t_k$ denotes the time of
arrival of an incoming current impulse. This is the so-called "leaky integrate and fire"
or LIF neuron, later formally analyzed by Caianiello (1961). By Uttley's hypothesis
$w_{ij} = -k \log_2 [e_{ij}(t)/e_j(t)]$, a time-averaged representation of $-\log_2 \Pr[u_i(t) \mid u_j(t)]$. Uttley
built a hydraulic computer to calculate such probabilities (see Fig. 4 for details), and
demonstrated that such machinery could perform simple pattern classification after a
period of conditioning. In later work Uttley (1966) introduced the hypothesis that
$w_{ij} = -k \log_2 [g_{ij}(t)/(g_i(t) g_j(t))]$, where $g_i(t)$ is a further weighted time average of $e_i(t)$: thus $w_{ij}$ is
proportional to the mutual information provided by the ith and jth firing patterns.
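Both of Uttley's weight rules can be sketched in a few lines of Python. The sketch assumes that plain frequency counts over recorded binary firing sequences stand in for the exponentially weighted time averages described above, and that the two units fire together at least once. The function names are hypothetical.

```python
import math

def conditional_weight(u_i, u_j, k=1.0):
    """w_ij = -k * log2( Pr[u_i = 1 | u_j = 1] ), estimated from firing counts."""
    joint = sum(a & b for a, b in zip(u_i, u_j))
    count_j = sum(u_j)
    return -k * math.log2(joint / count_j)

def information_weight(u_i, u_j, k=1.0):
    """w_ij = -k * log2( P(i,j) / (P(i) * P(j)) ), the form of the later hypothesis."""
    n = len(u_i)
    p_ij = sum(a & b for a, b in zip(u_i, u_j)) / n
    p_i = sum(u_i) / n
    p_j = sum(u_j) / n
    return -k * math.log2(p_ij / (p_i * p_j))

# Unit i fires on half of the occasions on which unit j fires.
u_j = [1, 1, 0, 0, 1, 1, 0, 0]
u_i = [1, 0, 0, 0, 1, 0, 1, 0]
w_conditional = conditional_weight(u_i, u_j)

# Statistically independent firing patterns carry no mutual information.
w_independent = information_weight([1, 0, 1, 0], [1, 1, 0, 0])
```

A weight of zero under the first rule means that unit i always fires when unit j does; under the second rule it means that the two firing patterns are independent.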
Sz S S R R 
s,, 
C C C. C 
S 
() 
A 
(d) 
Figure 4: A hydraulic computer of conditional probability. (a) A 
counter [of Pr{A}] using an exponential scale. A siphon R connects 
two containers C1 and C2; normally tap S1 is off and S2 is on. To effect
a count S1 is turned on and S2 off. C1 then empties. The two taps are
returned to their original positions and a fixed fraction of the liquid in 
C2 is siphoned into C1. The height of the liquid in C2 is the measure of 
the total count. (b) An equivalent electrical counter. (c) A simplified 
electrical counter. (d) A conditional certainty computer which indicates 
when the conditional probability Pr{AB/A} exceeds a critical value. 
(Reproduced from Uttley 1959). 
3.3 ASSOCIATIVE MEMORY 
Another topic which was investigated in the 1950s is associative memory, beginning 
with the work of W.K. Taylor (1956). Fig. 5 shows Taylor's original network. Note its
structural similarity to an elementary Perceptron with no hidden units, except that the
units are not M-P neurons, but analog devices operating in the fashion shown in Fig. 6. 
Ioto;' eu/pu! c#11s 
Outputs Znput$ 
Figure 5: Taylor network. This network uses analog neurons with 
modifiable weights, and can be trained to associate differing sets of 
stimulus patterns, see text for details (Reproduced from Taylor 1956). 
The training procedure also differs from that in Perceptrons and Adalines: it is simply 
Hebb's rule. The network learns to associate differing sensory patterns through repeated 
presentation of pairs of patterns, one of which initially elicits a motor response. 
Eventually the other pattern triggers the response. Thus Taylor networks exhibit 
simple Pavlovian conditioning, and the associated memory is stored in a distributed
fashion in the pattern of weights. In later work (Taylor, 1964) Taylor constructed a more
elaborate network in which motor output units inhibit each other: in modern parlance, a
"winner-take-all circuit". Such a network is capable of forming associations with paired 
stimuli in a more reliable and controllable way than the earlier network, and also of 
pattern discrimination in the style of Perceptrons and Adalines. Taylor suggested that the 
association areas of cerebral cortex and thalamus contained such networks. 
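The conditioning procedure described above can be sketched with a single output unit. The Python example below is a simplification under stated assumptions: a binary threshold unit replaces Taylor's analog neurons, there are only two input lines (an unconditioned and a conditioned stimulus), and the learning rate, threshold and names are illustrative.

```python
def respond(weights, stimulus, threshold=0.5):
    """Motor response: 1 if the weighted stimulus reaches the threshold, else 0."""
    drive = sum(w * x for w, x in zip(weights, stimulus))
    return 1 if drive >= threshold else 0

def hebb_update(weights, stimulus, response, rate=0.2):
    """Hebb's rule: strengthen a connection when input and output are active together."""
    return [w + rate * response * x for w, x in zip(weights, stimulus)]

UNCONDITIONED = (1, 0)   # initially elicits the response
CONDITIONED = (0, 1)     # initially neutral
PAIRED = (1, 1)

weights = [1.0, 0.0]
response_before = respond(weights, CONDITIONED)

# Repeated paired presentation: the unconditioned input drives the response,
# and Hebb's rule strengthens every co-active connection.
for _ in range(3):
    response = respond(weights, PAIRED)
    weights = hebb_update(weights, PAIRED, response)

response_after = respond(weights, CONDITIONED)
```

No error signal or desired output is used anywhere: the association is formed purely by coincidence of activity, which is the contrast with Perceptron and Adaline training drawn in the text.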
Figure 6: Analog neurons used by Taylor. The output firing rate in- 
creases smoothly with increasing input current. There is also a stored 
quantity which increases with the firing rate, equivalent to a lowering 
of the threshold. (Reproduced from Taylor 1956). 
Figure 7: Structure of Steinbuch's Learning Matrix. It consists of a pla- 
nar array of switches, each of which can be either open or closed. Each 
switch is connected to a receptor. However there are switches for both 
receptor ON and receptor OFF configurations. The switches control a 
set of relays, connected together in a winner-take-all circuit as in a Taylor
network. The correspondence between the elements of a Learning
Matrix--receptors, switches and relays--and those of Perceptrons and
Taylor networks is evident (Reproduced from Steinbuch 1961). 
Shortly before this a very similar network was introduced by K. Steinbuch (1961), the 
"Learning Matrix" (see Fig. 7). It consists of a planar network of switches interposed 
between arrays of "sensory" receptors and "motor" effectors. As in Taylor's scheme, the 
network learns to associate sensory with motor patterns. The associated memory is again 
stored in the pattern of opened switches. 
3.4 PERCEPTRONS AND ADALINES 
Some fifteen years after the publication of McCulloch and Pitts' paper, a major approach 
to the pattern recognition problem was introduced by Rosenblatt (1958) in his work on 
the Perceptron. Shortly thereafter Widrow and Hoff (1960) introduced the Adaline. As 
is well-known, the only difference between Perceptrons and Adalines lies in the training 
procedure. What is not appreciated is the confusion which such results generated in the 
late 1950s and early 1960s. It was not at all clear what had been accomplished. I 
remember vividly Rosenblatt's first lecture on the Perceptron at MIT, in the fall of 1958. 
To put it mildly, the lecture was not well-received. However Novikoff's proof of the 
Perceptron convergence theorem (Novikoff 1963) clarified things somewhat, although 
the initial claims of the agency sponsoring Rosenblatt's work (ONR), left a residue of 
disbelief, particularly at MIT. This led to the demonstration by Minsky and Papert 
(1969) that there are limits to the performance of elementary Perceptrons and Adalines. 
They proved that such devices are not computationally universal, even with modifiable 
connections. In addition, they conjectured that hidden units in multilayer Perceptrons 
cannot be trained, or in other words, that the problem of assigning credit to hidden units 
is unsolvable. Rosenblatt (1961) had almost solved the problem with his 
backpropagation scheme, but as we all know, he had the wrong neurons. 
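The difference in training procedure between the two devices can be made explicit. The Python sketch below trains one linear threshold unit on the AND function in both ways: the Perceptron rule changes weights only after a misclassification, while the Widrow-Hoff (LMS) rule follows the gradient of the squared error measured before thresholding. The task, learning rates and epoch counts are illustrative assumptions, not taken from the original papers.

```python
# AND function with targets in {-1, +1}.
DATA = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]

def net(w, x):
    """Weighted sum; w[2] is the bias weight."""
    return w[0] * x[0] + w[1] * x[1] + w[2]

def classify(w, x):
    return 1 if net(w, x) > 0 else -1

def train_perceptron(epochs=50, rate=1.0):
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, t in DATA:
            if classify(w, x) != t:            # update only on a mistake
                w = [w[0] + rate * t * x[0],
                     w[1] + rate * t * x[1],
                     w[2] + rate * t]
    return w

def train_adaline(epochs=500, rate=0.05):
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, t in DATA:
            err = t - net(w, x)                # error before thresholding
            w = [w[0] + rate * err * x[0],
                 w[1] + rate * err * x[1],
                 w[2] + rate * err]
    return w

w_perceptron = train_perceptron()
w_adaline = train_adaline()
perceptron_ok = all(classify(w_perceptron, x) == t for x, t in DATA)
adaline_ok = all(classify(w_adaline, x) == t for x, t in DATA)
```

Both rules solve this linearly separable task. Neither says how to adjust a unit whose desired output is not given, which is the credit assignment problem for hidden units mentioned above.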
Interestingly, Adalines had been anticipated somewhat in the work of Gabor (1954), one
of the early pioneers of communication theory and cybernetics, and the inventor of 
holography, who also invented the "Learning filter". This operates in the following 
fashion. Let $s(t)$ be a signal of bandwidth $F$, and let $s_0, s_1, s_2, \ldots, s_N$ be past samples of
$s(t)$. Then the output of a learning filter can be written in the form:
$$O(s) = \sum_{i=0}^{N} w_i s_i + \sum_{i=0}^{N} \sum_{j=0}^{N} w_{ij} s_i s_j + \cdots,$$
where the coefficients $w_{ij}$ are adjusted via gradient descent to minimize $\langle (O(s) - s_d)^2 \rangle$,
where $s_d$ is the desired output. If $s(t)$ is a noisy message, then $s_d$ can be the pure message,
and $O(s)$ will be a filtered version of $s(t)$, or $s_d$ can be retarded, in which case $O(s)$ will
predict $s_d$. The learning filter is clearly analogous to a network of input, output, and
hidden units, and it seems that Gabor had, in effect, solved the credit assignment 
problem! 
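A truncated version of such a filter is straightforward to simulate. The Python sketch below keeps only the linear and quadratic terms, uses two past samples, and adjusts the coefficients by stochastic gradient descent on the squared error. The training set, target function, learning rate and names are assumptions made for illustration and are not Gabor's implementation.

```python
from itertools import product

def features(samples):
    """Linear terms s_i followed by all pairwise products s_i * s_j."""
    return list(samples) + [a * b for a in samples for b in samples]

def output(weights, samples):
    """Filter output: weighted sum of the linear and quadratic terms."""
    return sum(w * f for w, f in zip(weights, features(samples)))

def mean_square_error(weights, data):
    return sum((output(weights, s) - d) ** 2 for s, d in data) / len(data)

def train(data, epochs=200, rate=0.1):
    weights = [0.0] * len(features(data[0][0]))
    for _ in range(epochs):
        for samples, desired in data:
            err = output(weights, samples) - desired
            # Gradient of err**2 / 2 with respect to each coefficient is err * feature.
            weights = [w - rate * err * f for w, f in zip(weights, features(samples))]
    return weights

# Desired output contains a linear and a quadratic part (an arbitrary test target).
grid = (-1.0, -0.5, 0.0, 0.5, 1.0)
data = [((s0, s1), 0.5 * s0 - 0.3 * s0 * s1) for s0, s1 in product(grid, repeat=2)]

initial_error = mean_square_error([0.0] * 6, data)
weights = train(data)
final_error = mean_square_error(weights, data)
```

The product terms play the role of fixed nonlinear intermediate units, so gradient descent only ever has to adjust coefficients that enter the output linearly.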
4 ANALOG NEURAL NETWORKS 
The MP neuron is of course a very simplified representation of real neural properties. 
Since 1943 it has been extended and elaborated in a number of ways. Perhaps the first 
significant extension was the LIF neuron. Neither neural model generates network 
equations which are mathematically tractable. It was for this reason that I introduced the 
sigmoid firing characteristic (Cowan, 1967) and the smooth firing condition $e_i(t + \Delta t) =
\phi[\sum_j w_{ij} e_j(t) + h_i]$, where $e_i(t)$ is the firing rate of the ith element (a time-averaged
version of $u_i(t)$), t is measured in units of $\tau$, $h_i(t)$ is an additional external stimulus, and
$\phi[x]$ is the logistic function. The differential equation version of this condition is of
course the equation $\tau\, de_i/dt = -e_i + \phi[\sum_j w_{ij} e_j(t) + h_i]$. One of my colleagues at
Imperial College, J.J. Sparkes, then devised the transistor circuit shown in Fig. 8 to
implement the function $\phi$.
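The differential equation version of the firing condition can be integrated numerically. The Python sketch below uses a simple forward Euler scheme; the step size, parameter values and function names are illustrative assumptions.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate(weights, stimulus, rates, tau=1.0, dt=0.01, steps=2000):
    """Euler integration of tau * de_i/dt = -e_i + logistic(sum_j w_ij * e_j + h_i)."""
    e = list(rates)
    n = len(e)
    for _ in range(steps):
        drive = [sum(weights[i][j] * e[j] for j in range(n)) + stimulus[i]
                 for i in range(n)]
        e = [e[i] + dt / tau * (-e[i] + logistic(drive[i])) for i in range(n)]
    return e

# An isolated, unstimulated unit relaxes to logistic(0) = 0.5.
isolated = simulate([[0.0]], [0.0], [0.9])

# Two mutually excitatory units settle at a common elevated firing rate.
pair = simulate([[0.0, 4.0], [4.0, 0.0]], [-1.0, -1.0], [0.2, 0.8])
```

Because the logistic function is bounded between 0 and 1 and the decay term pulls each rate toward it, firing rates that start in that interval remain there.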
Figure 8: A transistor circuit which implements the logistic function
$\phi$, using the (approximately) exponential collector current-base emitter
voltage characteristic of a transistor (Sparkes, personal communication 
circa 1965). 
4.1 STATISTICAL NEUROMECHANICS 
I made the further observation (Cowan 1968) that if $w_{ij} = -w_{ji}$, $w_{ii} = 0$, and if $\tau$ is
large, then this equation can be rewritten in the form:
$$\frac{dv_i}{dT} = \sum_j w_{ij} \frac{\partial G}{\partial v_j},$$
where $v_i = \sum_j w_{ij} e_j(t) + h_i$ and $G = \sum_i \{\log[1 + q_i \exp(v_i)] - q_i v_i\}$, with $h_i + \sum_j w_{ij} q_j = 0$, is
a "constant of motion" of the network. Since the matrix $iW$, $W = \{w_{ij}\}$, is Hermitian,
we can form the block diagonal matrix $W'$, made out of 2 × 2 skew-symmetric blocks,
by way of the congruence transformation $W' = A W A^{T}$, where A is built up from the
eigenvectors of W. Then if $y_i = A v_i$, we have the Hamiltonian equations $dy_i/dT = \partial G/\partial y_j$.
The physical content of this result is that a network of neural-like elements with skew- 
symmetric coupling constants can generate neutrally stable oscillations in $x_i$ in the range
between 0 and 1. Moreover, because G is a constant of the motion, one can introduce a
form of equilibrium statistical mechanics, in which the probability of being in the state
$\{y_1, y_2, \ldots, y_N\}$ takes the form $Z^{-1}\exp[-\alpha G]$, where $Z = \sum_{\{y\}} \exp[-\alpha G]$. One can then
compute various statistical averages of the behavior of such networks. Of course the
skew-symmetry of the coupling coefficients $w_{ij}$ is rather artificial, as is the effective neglect
of the damping produced by the term $-e_i$ in the original differential equation. For
similar reasons I did not pursue the study of the symmetric case! 
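The existence of a conserved quantity under skew-symmetric coupling can be checked numerically. The Python sketch below rests on stated assumptions: it uses a logistic-type rate equation $dx_i/dT = x_i(1 - x_i)(h_i + \sum_j w_{ij} x_j)$, chosen because it has an exactly conserved quantity of the same shape as G. It is not claimed to be the precise equation of Cowan (1968), and the G used here is normalized with the odds ratio $q_i/(1 - q_i)$ in place of $q_i$ inside the logarithm. All names and parameter values are hypothetical.

```python
import math

W = [[0.0, 1.0], [-1.0, 0.0]]      # skew-symmetric: w_ij = -w_ji, w_ii = 0
Q = [0.5, 0.5]                      # assumed equilibrium rates
N = len(Q)
# Stimulus chosen so that h_i + sum_j w_ij q_j = 0.
H = [-sum(W[i][j] * Q[j] for j in range(N)) for i in range(N)]

def velocity(x):
    """Assumed dynamics: dx_i/dT = x_i (1 - x_i) (h_i + sum_j w_ij x_j)."""
    return [x[i] * (1 - x[i]) * (H[i] + sum(W[i][j] * x[j] for j in range(N)))
            for i in range(N)]

def rk4_step(x, dt):
    """One fourth-order Runge-Kutta step."""
    k1 = velocity(x)
    k2 = velocity([x[i] + 0.5 * dt * k1[i] for i in range(N)])
    k3 = velocity([x[i] + 0.5 * dt * k2[i] for i in range(N)])
    k4 = velocity([x[i] + dt * k3[i] for i in range(N)])
    return [x[i] + dt / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(N)]

def G(x):
    """Conserved quantity, with v_i measured from the equilibrium q_i on a logit scale."""
    total = 0.0
    for xi, qi in zip(x, Q):
        v = math.log(xi / (1 - xi)) - math.log(qi / (1 - qi))
        total += math.log(1 + qi / (1 - qi) * math.exp(v)) - qi * v
    return total

x = [0.3, 0.6]
g_start = G(x)
trajectory = [x]
for _ in range(5000):
    x = rk4_step(x, 0.01)
    trajectory.append(x)
drift = abs(G(x) - g_start)
```

In this sketch the rates circulate around the equilibrium on a closed orbit without decaying toward it or growing away from it, and G stays constant to within integration error, which is the "neutrally stable oscillation" behavior described above.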
5 CONCLUSIONS 
It is evident that by the late 1960s most of the ideas and concepts necessary to solve the
Perceptron credit assignment problem were already formulated, as were many of the 
ideas underlying Hopfield networks. Why did it take so long? I believe that there are 
two or three reasons for the lag. One was technological. There weren't personal
computers and workstations to try things out on. For example, when Gabor developed
the learning filter, it took him and his students a further seven years to implement the filter
with analog devices. Similar delays obtained for others. A second reason was in part
psychological, in part financial. Minsky and Papert's monograph certainly did not 
encourage anyone to work on Perceptrons, nor agencies to support them. A third reason 
was that the analogy between neural networks and lattice spins was premature. The 
Sherrington-Kirkpatrick spin-glass was not invented until 1975. 
Acknowledgements 
We thank The University of Chicago Brain Research Foundation and the US Department 
of the Navy, Office of Naval Research, (Grant # N00014-89-J-1099) for partial support 
of this work. 
References: 
Barlow, H.B. (1972) Single units and sensation: a neuron doctrine for perceptual psy- 
chology? Perception 1, 371-394. 
Beurle, R.L. (1956) Properties of a mass of cells capable of regenerating pulses, Phil. 
Trans. Roy. Soc. Lond. B, 240, 669, 55-94; (1962) Functional organization in random
networks, Principles of Self-Organization, (Eds.) Von Foerster, H. & Zopf, G.W. Jr.,
Pergamon Press. 
Caianiello, E.R. (1961) Outline of a theory of thought processes and thinking machines, 
J. Theor. Biol., 1, 204-235. 
Cowan, J.D. (1967) A mathematical theory of central nervous activity, Thesis, University 
of London; (1968) Statistical mechanics of nervous nets, Neural Networks, (Ed.), E.R. 
Caianiello, 181-188, Springer-Verlag, Berlin.
Cragg, B.G. & Temperley, H.N.V. (1954) The organisation of neurones: a cooperative 
analogy, EEG Clin.Neurophysiol., 6, 85-92; (1955) Memory: the analogy with fer- 
romagnetic hysteresis, Brain, 78, II, 304-316. 
Craik, K.J.W. (1943) The nature of explanation, Cambridge Univ. Press, Cambridge. 
Farley, B.G. & Clark, W. A. (1961) Activity in networks of neuron-like elements, 
Information Theory, 4, (Ed.) Cherry, E.C., 242-251, Butterworths, London. 
Gabor, D. (1954) Communication theory and cybernetics, IRE Trans., CT-1, 4, 19-31.
Hamming, R.W. (1950) Error detecting and error correcting codes, Bell Syst. Tech. J., 
29, 147-160. 
Hebb, D.O. (1949) The Organization of Behavior, Wiley, New York. 
Landahl, H.D., McCulloch, W.S. & Pitts, W. (1943) A statistical consequence of the 
logical calculus of nervous nets, Bull. Math. Biophys. 5, 135-137.
Lashley, K.S. (1942) Persistent problems in the evolution of mind, Quart. Rev. Biol., 24, 
1, 28-42; (1950) In search of the engram, Symp. Soc. Expt. Biol., 4, 454-482. 
Little, W.A. (1974) The existence of persistent states in the brain, Math. Biosci., 19, 101- 
120. 
McCulloch, W.S. (1961) What Is a Number, that a Man May Know It, and a Man, that 
He May Know a Number?, General Semantics Bull., 26 - 27, 7-18; (1967) Embodiments 
of Mind, MIT Press, Cambridge Mass. 
McCulloch, W.S. & Pitts, W. (1943) A logical calculus of the ideas immanent in nervous
activity, Bull. Math. Biophys. 5, 115-133. 
Minsky, M. & Papert, S. (1969) Perceptrons: an introduction to computational geometry,
MIT Press. 
Neumann, J. von (1956) Probabilistic logics and the synthesis of reliable organisms from 
unreliable components, Automata Studies, (Eds.) Shannon, C.E. & McCarthy, J. 
Princeton University Press, Princeton, New Jersey, 43-98. 
Novikoff, A. (1963) On convergence proofs for Perceptrons, Symp. on Mathematical
Theory of Automata, (Ed.) Fox, J., Polytechnic Press New York (1963), 615-622. 
Rashevsky, N. (1938) Mathematical Biophysics, Univ. of Chicago Press, Chicago. 
Rosenblatt, F. (1958) The Perceptron, a probabilistic model for information storage and 
organization in the brain, Psych. Rev., 65, 386-408; (1961) Principles of Neurodynamics:
Perceptrons and the Theory of Brain Mechanisms, Spartan Books, Washington DC. 
Rosenblueth, A., Wiener, N. & Bigelow, J. (1943) Behavior, Purpose and Teleology,
Philosophy of Science, 10, 18-24.
Rosenblueth, A., Wiener, N., Pitts, W., & Garcia Ramos, J. (1949) A statistical
analysis of synaptic excitation, J. Cell. Comp. Physiol., 34, 173-205. 
Shannon, C.E. (1948) A mathematical theory of communication, Bell Syst. Tech. J., 27,
379-423; 623-656. 
Steinbuch, K. (1961) Die Lernmatrix, Kybernetik, 1, 1, 36-45.
Taylor, J. (1932) Selected writings of John Hughlings Jackson, Hodder & Stoughton, 
London, reprinted (1958) New York. 
Taylor, W.K. (1956) Electrical simulation of some nervous system functional activities, 
Information Theory, 3, (Ed.) Cherry, E.C., Butterworths, London, 314-328; (1964) Cortico-thalamic
organization and memory, Proc. Roy. Soc. Lond. B, 159, 466-478.
Uttley, A.M. (1956) A theory of the mechanism of learning based on the computation of 
conditional probabilities, Proc. 1st Int. Conf. on Cybernetics, Namur, Gauthier-Villars, 
Paris; (1959) The design of conditional probability computers, Information & Control 2, 
1-24; (1966) The transmission of information and the effect of local feedback in 
theoretical and neural networks, Brain Research 2, 21-50. 
Widrow, B. & Hoff, M.E. (1960) Adaptive switching circuits, WESCON convention 
record, IV, 96-104. 
Wiener, N. (1948) Cybernetics, or Control and Communication in the Animal and the 
Machine, Wiley, New York. 
Winograd, S. & Cowan, J.D. (1963) Reliable Computation in the Presence of Noise,
MIT Press, Cambridge, Mass.
