Information Factorization in 
Connectionist Models of Perception 
Javier It. Movellan 
Department of Cognitive Science 
Institute for Neural Computation 
University of California San Diego 
James L. McClelland 
Center for the Neural Bases of Cognition 
Department of Psychology 
Carnegie Mellon University 
Abstract 
We examine a psychophysical law that describes the influence of 
stimulus and context on perception. According to this law choice 
probability ratios factorize into components independently con- 
trolled by stimulus and context. It has been argued that this pat- 
tern of results is incompatible with feedback models of perception. 
In this paper we examine this claim using neural network models 
defined via stochastic differential equations. We show that the law 
is related to a condition named channel separability and has little 
to do with the existence of feedback connections. In essence, chan- 
nels are separable if they converge into the response units without 
direct lateral connections to other channels and if their sensors are 
not directly contaminated by external inputs to the other chan- 
nels. Implications of the analysis for cognitive and computational 
neurosicence are discussed. 
1 Introduction 
We examine a psychophysical law, named the Morton-Massaro law, and its implica- 
tions to connectionist models of perception and neural information processing. For 
an example of the type of experiments covered by the Morton-Massaro law consider 
an experiment by Massaro and Cohen (1983) in which subjects had to identify syn- 
thetic consonant sounds presented in the context of other phonemes. There were 
two response alternatives, seven stimulus conditions, and four context conditions. 
The response alternatives were/1/and/r/, the stimuli were synthetic sounds gen- 
erated by varying the onset frequency of the third formant, followed by the vowel 
/i/. Each of the 7 stimuli was placed after each of four different context consonants, 
/v/, /s/, /p/, and/t/. Morton (1969) and Massaro independently showed that in a 
remarkable range of experiments of this type, the influence of stimulus and context 
on response probabilities can be accounted for with a factorized version of Luce's 
strength model (Luce, 1959) 
!s(i, k) !c(j, k) 
P(R = k lS = i,C = j) = Evs(i, 1)vc(j, 1)' for (i,j,k) e $ x  x 7. (1) 
Here $, C and R are random variables representing the stimulus, context and the 
subject's response, $,  and 7 are the set of stimulus, context and response al- 
46 J.. R. Movellan and J.. L. McClelland 
ternatives, $(i, k) > 0 represents the support of stimulus i for response k, and 
r/c(j, k) > 0 the support of context j for response k. Assuming no strength param- 
eter is exactly zero, (1) is equivalent to 
P(R=kIS=i,C=j) 
P(R-11$ -i,C -j) 
= \ n(i, t) ] j 
, for all (i, j, k)  $ x  x 7. 
(2) 
This says that response probability ratios factorize into two components, one which 
is affected by the stimulus but unaffected by the context and one affected by the 
context but unaffected by the stimulus. 
2 Diffusion Models of Perception 
Massaro (1989) conjectured that the Morton-Massaro law may be incompatible 
with feedback models of perception. This conjecture was based on the idea that in 
networks with feedback connections the stimulus can have an effect on the context 
units and the context can have an effect on the stimulus units making it impossible 
to factorize the influence of information sources. In this paper we analyze such 
a conjecture and show that, surprisin. gly, the Morton-Massaro law has little to do 
with the existence of feedback and lateral connections. We ground our analysis 
on continuous stochastic versions of recurrent neural networks . We call these 
models diffusion (neural) networks for they are stochastic diffusion processes defined 
by adding Brownian motion to the standard recurrent neural network dynamics. 
Diffusion networks are defined by the following stochastic differential equation 
dYi(t) - !i(Y(t),X) dt + rr dBi(t) for i  {1,..-,n}, (3) 
where Y/(t) is a random variable representing the internal potential at time t of the 
i th unit, Y(t) = (Y (t),-.. , Yn(t))', X represents the external input, which consists 
of stimulus and context, and Hi is Brownian motion, which acts as a stochastic 
driving term. The constant rr > 0, known as the dispersion, controls the amount 
of noise injected onto each unit. The function/i, known as the dr/f, determines 
the average instantaneous change of activation and is borrowed from the standard 
recurrent neural network literature: this change is modulated by a matrix w of 
connections between units, and a matrix v that controls the influence of the external 
inputs onto each unit. 
1 
ti(ri(t),X) -- i(ri(t) ) (i(t) - ri(t)), for all/ {1,.--,n}, (4) 
where 1/;i is a positive function, named the capacitance, controlling the speed of 
processing and 
?i(t)=Ewi,jZj(t)+Evi,kXk, for all i e {1,-.- ,n}, (5) 
Zj(t) = 9i(Yj(t)) = 9(ci Yj(t)) = 1/(1 + e -a' (t)). (6) 
Here wi,j, an element of the connection matrix w, is the weight from unit j to unit i, 
vi,& is an element of the matrix v, 9 is the logistic activation function and the ci > 0 
terms are gain parameters, that control the sharpness of the activation functions. 
For large values of ci the activation function of unit i converges to a step function. 
The variable Zj (t) represents a short-time mean firing rate (the activation) of unit 
1For an analysis grounded on discrete time networks with binary states see McClelland 
Information Factorization 47 
j scaled in the (0, 1) range. Intuition for equation (4) can be achieved by thinking 
of it as a the limit of a discrete time difference equation, in such case 
Y(t + At) = Y/(t) + !i(Yi(t),X)At + av/Ni(t), 
(7) 
where the Ni(t) are independent standard Gaussian random variables. For a fixed 
state at time t there are two forces controlling the change in activation: the drift, 
which is deterministic, and the dispersion which is stochastic. This results in a 
distribution of states at time t + At. As At goes to zero, the solution to the 
difference equation (7) converges to the diffusion process defined in (4). In this 
paper we focus on the behavior of diffusion networks at stochastic equilibrium, i.e., 
we assume the network is given enough time to approximate stochastic equilibrium 
before its response is sampled. 
3 Channel Separability 
In this section we show that the Morton-Massaro is related to an architectural con- 
straint named channel separability, which has nothing to do with the existence of 
feedback connections. In order to define channel separability it is useful to char- 
acterize the function of different units using the following categories: 1) Response 
specification units: A unit is a response specification unit, if, when the state of all 
the other units in the network is fixed, changing the state of this unit affects the 
probability distribution of overt responses. 2) Stimulus units: A unit belongs to 
the stimulus channel if: a) it is not a response unit, and b) when the state of the 
response units is fixed, the probability distribution of the activations of this unit is 
affected by the stimulus. 3) Context units: A unit belongs to the context channel if: 
a) it is not a response unit, and b) when the states of the response units are fixed, 
the probability distribution of the activations of this unit can be affected by the 
context. Given the above definitions, we say that a network has separable stimulus 
and context channels if the stimulus and context units are disjoint: no unit simul- 
taneously belongs to the stimulus and context channels. In essence, channels are 
structurally separable if they converge into the response units without direct lateral 
connections to other channels and if their sensors are not directly contaminated by 
external inputs to the other channels (see Figure 1). 
In the rest of the paper we show that if a diffusion network is structurally separable 
the Morton-Massaro law can be approximated with arbitrary precision regardless of 
the existence of feedback connections. For simplicity we examine the case in which 
the weight matrix is symmetric. In such case, each state has an associated goodness 
function that greatly simplifies the analysis. In a later section we discuss how the 
results generalize to the non-symmetric case. 
Let y  11 ' represent the internal potential of a diffusion network. Let zi = 99(otiYi) 
for i = 1,..-,n represent the firing rates corresponding to y. Let z s, z c and 
z r represent the components of z for the units in the stimulus channel, context 
channel and response specification module. Let x be a vector representing an input 
and let x s, x c be the components of x for the external stimulus and context. Let 
ot = (Oil,''' ,Otn) be a fixed gain vector and Za(t) a random vector representing 
the firing rates at time t of a network with gain vector c. Let Z a = limt-oo Z a (t), 
represent the firing rates at stochastic equilibrium. In Movellan (1998) it is shown 
that if the weights are symmetric i.e., w = w' and 1/ni(x) = d99i(x)/dx then the 
equilibrium probability density of Z a is as follows 
s z r 1 exp((2/cr 2) Ga(zS,zrlxs xc)) (S) 
PzIx(z ,z c, I x*'xC) = K(x,,xc) ' ' 
48 J. R. Movellan and J. L. McClelland 
Rqonse 
Spicatlon 
Units 
Stimulus Context 
!iays Rdays 
Stimulus Context 
Smsors Sensors 
$timuim / 
Input 
Context 
Figure 1: A network with separable context and stimulus processing channels. The 
stimulus sensor and stimulus relay units make up the stimulus channel units, and 
the context sensor and context channel units make up the context channel units. 
Note that any of the modules can be empty except the response module. 
where 
Ka(xs,xc) = f exp((2/a 2) Ga(z I xs,xc)) dz, (9) 
n 
o( Ix) = H(z Ix) -  S, (z), (10) 
i=1 
H(z Ix) = z' w z/2 + z' v x, (11) 
Sc,,(zi)= ai(log(zi)+log(1 zi))+l( 
- zlog(z) + (1 - zi)log(1 - z) . (12) 
Without loss of generality hereafter we set a 2 = 2. When there are no direct con- 
nections between the stimulus and context units there are no terms in the goodness 
function in which x  or z  occur jointly with x c or z c. Because of this, the goodness 
can be separated into three additive terms, that depend on x , x c and a third term 
which depends on the response units: 
Z r 
o,(z',z c, I x',x ) = a(z',zr l x') + a2(zr, zC l x) + a,(z) , 
where 
8 f' I 8 
a(, I x ) = (s)%,,/2 + (z)',r + (s)%,,x  + () w,x -  s(4), 
i 
(14) 
c c I 
c(z, I x ') = (C)'w,c/2 + () w,z + ()'v,x c + (r)'w,x  -  s), 
i 
(15) 
C,(z") = (z[')'w,.,,.z"/2- E S(z[') . (16) 
i 
Inf ormat'on Facto rization 49 
where ws,r is a submatrix of w connecting the stimulus and response units. Similar 
notation is used for the other submatrices of w and v. It follows that we can write 
the ratio of the joint probability density' of two states z and 2 as follows: 
Pz"lx(zS'zC'zr l xS'xC) = exp(G2(zS'z* l x*) + G2(zC'z* l c) + G2(z*)) (17) 
pzlx(2,2c, 2 l x,x*) exp(G(:,2r l x ) + G(2,2 l x ) + G(2)) ' 
whh factorizes  desired. To get probability densities for the response units, we 
integrate over the states of l the other units 
Pz:Ix( zr I xS, xc) = PzIx(Z*, zc, I x c) dz s dz c (18) 
d after rerging terms 
i + 
pz:lx(z l x*,x ) = K(x,,x ) 
(1) 
which also factorizes. All is left is mapping continuous states of the response units 
to discrete externM responses. To do so we ptition the space of the response 
specification units into screte regions. The probability of a response becomes the 
integral of the probability density over the region corresponding to that response. 
The problem is that the integral of probability densities does not necessarily fac- 
torize even though the densities factorize at every point. 
Fortunately there are two importt ces for which the law holds, at let  a 
good approximation. The first ce is when the response reons e small and thus 
we c approximate the inteM over that re,on by the density at a point times the 
volume of the re,on. In such a ce the ratio of the integrMs can be appromated 
by the ratio of the probability densities of those individuM states. The second ce 
applies to models, like McClelld and Rumelhm's (1981) interactive activation 
model, in which each response is sociated with a distinct response unit. These 
models typicMly have negative connections mongst the response units so that at 
equilibrium one unit tends to be active while the others e inactive. In such a 
ce a common response policy picks the response corresponding to the tive unit. 
We now show that such a policy can approximate the Morton-Mso law to an 
bitrary level of precision  the gain pmeter of the response units is incre,ed. 
Let z represent the joint state of a network d let the first r components of z 
be the states of the response specification units. Let z (1) = (1, 0, 0,..., 0) , z (2) = 
(0, 1,0,... ,0)  be two r-dimensionM vectors representing states of the response 
specification units. For i  {1, 2} and A  (0, 1) let 
,) = (1 - ,('))5 + ('))(1 - 5), (20) 
2) = { e :  e ((1 - 5),J'),5 + (1 - 5),J')), for  = 1,..-,}. (21) 
The sets R? and R? 
external responses. We 
these two responses as 
corners of [0, 1] . 
lim P(Z  R? I X = x) 
o P(Z  R(2) I x = ) 
lim A'Pzlx(z() l x) 
,x-o ZXpz:lx(z(2) lx) 
are regions of the [0, 1] r space mapping into two distinct 
now investigate the convergence of the probability ratio of 
we let A -+ 0, i.e., as the response regions collapse into 
= lim fR? pz:lx(ulx)du _ 
/-o fR(2) pzxlx(u l x)du - 
= lim f f er;((2)''zl)dzS dzc 
-o f f er((2),o,l)dzS dz c' 
(22) 
(23) 
50 J. R. Movellan and J. L. McClelland 
Table 1: Predictions by the Morton-Massaro law (left side) versus diffusion network 
(square brackets) for subject 7 of Massaro and Cohen (1983) Experiment 2. Each 
prediction of the diffusion network is based on 100 random samples. 
Context 
Stimulus V S P T 
0 0.0017 [0.01] 0.0000 [0.00] 0.0152 [0.03] 0.9000 [0.91] 
i 0.0126 [0.00] 0.0000 [0.00] 0.1008 [0.10] 0.9849 [0.97] 
2 0.1105 [0.19] 0.0008 [0.00] 0.5208 [0.45] 0.9984 [1.00] 
3 0.5463 [0.54] 0.0079 [0.00] 0.9133 [0.91 0.9998 '1.00] 
4 0.9827 [1.00] 0.2756 [0.30] 0.9980 [1.00 0.9999 1.00] 
5 0.9999 [1.00] 0.9924 [0.99] 0.9999 [1.00] 1.0000 [1.00] 
6 0.9999 [1.00] 0.9924 [1.00] 0.9999 [1.00] 1.0000 [1.00] 
Now note that 
= 
i=1 i j 
(24) 
and since 2=1Soi (Z(Aii) ---- 2ir=1Soi (Z(ff!i), it follows that 
lim P(Z  R() ] X = x) 
f f eH(Z),z,z c I x)-E, $,(zi)-Ej $j( j)dz s dz c 
f f eH(Z(2),z,zc I x)-E, S,(zl)-Ei Ss( j)dz s dz c 
(25) 
It is easy to show that this ratio factorizes. Moreover, for all A > 0 if we let 
o/1 = ''' = 0/r -- 0/, where 0/> 0 then 
lim P(Za r  [A, 1- A] r) =O, 
(26) 
since as the gain of the response units increases Sot decreases very fast at the corners 
of (0, 1) r. Thus as 0/- cx> the random variable Za r converges in distribution to a 
discrete random variable with mass at the corner of the [0, 1] r hypercube and with 
factorized probability ratios as expressed on (25). Since the indexing of the response 
units is arbitrary the argument applies to all the responses. 
4 Discussion 
Our analysis establishes that in diffusion networks the Morton-Massaro law is not 
incompatible with the presence of feedback and lateral connections. Surprisingly, 
even though in diffusion networks with feedback connections stimulus and context 
units are interdependent, it is still possible to factorize the effect of stimulus and 
context on response probabilities. 
The analysis shows that the Morton-Massaro can be arbitrarily approximated as 
the sharpness of the response units is increased. In practice we have found very 
good approximations with relatively small values of the sharpness parameter (see 
Table i for an example). The analysis assumed that the weights were symmetric. 
Mathematical analysis of the general case with non-symmetric weights is difficult. 
Information Factorization 51 
However useful approximations exist (Movellan & McClelland, 1995) showing that 
if the noise parameter a is relatively small or if the activation function T is approx- 
imately linear, symmetric weights are not needed to exhibit the Morton-Massaro 
law. 
The analysis presented here has potential applications to investigate models of per- 
ception and the functional architecture of the brain. For example the interactive 
activation model of word perception has a separable architecture and thus, diffusion 
versions of it adhere to the Morton Massaro law. The analysis also points to po- 
tential applications in computational neuroscience. It would be of interest to study 
whether the Morton-Massaro holds at the level of neural responses. For example, 
we may excite a neuron with two different sources of information and observe its 
short term average response to combination of stimuli. If the observed distribution 
of responses exhibits the Morton-Massaro law, this would be consistent with the 
existence of separable channels converging into that neuron. Otherwise, it would 
indicate that the channels from the two input areas to the response may not be 
structurally separable. 
References 
Luce, R. D. (1959). Individual choice behavior. New York: Wiley. 
Massaro, D. W. (1989). Testing between the TRACE Model and the fuzzy logical 
model of speech perception. Cognitive Psychology, 21, 398-421. 
Massaro, D. W. (1998). Perceiving Talking Faces. Cambridge, Massachusetts: MIT 
Press. 
Massaro, D. W. & Cohen, M. M. (1983a). Phonological constraints in speech per- 
ception. Perception and Psychophysics, 36, 338-348. 
McClelland, J. L. (1991). Stochastic interactive activation and the effect of context 
on perception. Cognitive Psychology, 23, 1-44. 
Morton, J. (1969). The interaction of information in word recognition. Psychological 
Review, 76, 165-178. 
Movellan, J. R. (1998). A Learning Theorem for Networks at Detailed Stochastic 
Equilibrium. Neural Computation, i0(5), 1157-1178. 
Movellan, J. R. & McClelland, J. L. (1995). Stochastic interactive processing, chan- 
nel separability and optimal perceptual inference: an examination of Mor- 
ton's law. Technical Report PDP. CNS.95.4, Available at http://cnbc.cmu.edu, 
Carnegie Mellon University. 
