Closed-Form Inversion of Backpropagation 
Networks: Theory and Optimization Issues 
Michael L. Possen 
HNC, Inc. 
5501 Oberlin Drive 
San Diego, CA 92121 
rossen@amos.ucsd.edu 
Abstract 
We describe a closed-form technique for mapping the output of a trained 
backpropagation network into input activity space. The mapping is an in- 
verse mapping in the sense that, when the image of the mapping in input 
activity space is propagated forward through the normal network dynam- 
ics, it reproduces the output used to generate that image. When more 
than one such inverse mappings exist, our inverse mapping is special in 
that it has no projection onto the nullspace of the activation flow opera- 
tor for the entire network. An important by-product of our calculation, 
when more than one inverse lnappings exist, is an orthogonal basis set of 
a significant portion of the activation flow operator nullspace. This basis 
set can be used to obtain an alternate inverse mapping that is optimized 
for a particular real-world application. 
I Overview 
This paper describes a closed-form technique for mapping a particular output of a 
trained backpropagation network into input activity space. The mapping produced 
by our technique is a.n i, verse mappiag in the sense that, when the image in input 
space of the mapping of an ontput activity is propagated forward through the 
normal network dynamics, it reproduces the output used to generate it. 1 When 
multiple inverse ma.ppings exist, our inverse mapping is unique in that it has no 
 It is possible that no such inverse mappings exist. This point is addressed in section 4. 
868 
Closed-Form Inversion of Backpropagation Networks 869 
projection onto the nullspace of the activation flow operator for the entire network. 
An important by-product of our calculation is an orthogonal basis set of a significant 
portion of this nullspace. Any vector within this nullspace can be added to the 
image from the inverse mapping, producing a new point in input space that is 
still an inverse mapping image in the above sense. Using this nullspace, the inverse 
mapping can be optimized for a particular application by minimizing a cost function 
over the input elements, relevant to that application, to obtain the vector from the 
nullspace to add to the original inverse mapping image. For this reason and because 
of the closed-form we obtain for calculation of the network inverse mapping, our 
method compares favorably to previously proposed iterative methods of network 
inversion [Widrow & Stearns, 1985, Linden & Kinderman, 1989]. We now briefly 
summarize our method of closed-form inversion of a backpropagation network. 
2 The Inverse Mapping Operator 
To outline the calculation of our inverse mapping operator, we start by consid- 
ering a trained feed-forward backpropagation network with one hidden layer and 
bipolar sigmoidal activation fimctions. We calculate this inverse as a sequence of 
the inverses of the sub-operations constituting the dynamics of activation flow. If 
we use the 'I, It, O' as subscripts indicating input, hidden and output modules of 
model neurons, respectively, the activation flow fi'om input through hidden module 
to output module is: 
= O Wto,m O 
where 
a  bipolar sigmoid function; 
]/Vta,,t,,o,,.)' Matrix operator of connection veights, indexed 
by ' ' 
..ource and 'dest'(des[ina[ion) modnles; 
[)  Vector of activities for module 'k'. 
M is defined here as the activation flow operator for [he en[ire ne[work. The symbol 
 separates operators sequentially applied to the argumen[. 
Since the sub-operators cons[itning M are applied sequen[ially, [he inverse [ha[ we 
calculate, M +, is equal to a composition of inverses of [he individual sub-opera[ors, 
wi[h the order of the composition reversed from [he order in activation flow. The 
closed-form mapping of a specified output o) to input space is then' 
= A + (2) 
- W + b a- W + a- 
- (o,n ; (n,)   (o), 
where 
Inverse of the bipolar logistic sigmoid; 
14;,t..o,,.,   Pseudo-inverse of ]4;(aest,,ource)  
870 Rossen 
Subject to the existence conditions discussed in section 4, t) is an inverse mapping 
off.(o ) in that it reproduces f--(o) when it is propagated forward through the network: 
: .,4  
We use singular value decomposition (SVD), a well-known matrix analysis method 
(e.g., [Lancaster, 1985]), to calculate a particular matrix inverse, the pseudo-inverse 
142 + (also known as the Moore-Penrose inverse) of each connection weight matrix 
(j,i) 
block. In the case of W(ztd), for example, SVD yields the two unitary matrices, 
$(z/,z) and V(z/,z), and a rectangular matrix )(z/,z), all zero except for the singular 
values on its diagonal, such that 
where 
V*  
(H,I) Transposes of$(/,t), Vt'/d), respectively; 
Pseudo-inverse of Utd), which is simply its transpose 
xvith each non-zero singular value replaced by its inverse. 
3 Uniqueness and Optimization Considerations 
The pseudo-inverse (calculated by SVD or other methods) is one of a class of solu- 
tions to the inverse of a matrix operator that may exist, called generalized inverses. 
For our purposes, each of these generalized inverses, if they exist, are inverses in 
the useful sense that when substitued for Wj,0 in eq. (2), the resultant ) will be 
and inverse mapping image a.s defined by eq. (3). 
When a matrix operator W does not have a nullspace, the pseudo-inverse is the 
only generalized inverse that exists. If W does have a. nullspace, the pseudo-inverse 
is special in that its range contains no projection onto the nullspace of W. It follows 
that if either of the lnatrix operators W(ud) or W(o,) in eq. (1) have a nullspace, 
then multiple inverse lnapping operators will exist. However, the inverse mapping 
operator .A + calculated using pseudo-inverses will be the only inverse mapping 
operator that has no projection in the nullspace of .A. The derivation of these 
properties follow in a straightforward lnanner from the discussion of generalized 
inverses in [Lancaster, 1985]. An interesting result of using SVD to obtain the 
pseudo-inverse is that: 
SVD provides a direct method for varying t) within the space of inverse 
mapping images in input space of f-4o' 
This becomes clear when we uote that if r = P(W(Hd)) is the rank of W(H,O, only 
the first r singular values in 2D(/,) are non-zero. Thus, only the first r columns of 
S(/d) and 1;(tf,t) participate in the activity flow of the network from input module 
to hidden module. 
Closed-Form Inversion of Backpropagation Networks 871 
The columns {X.(H,l)(i)}i>r of 12(H j) span the nullspace of 4](H,I). This nullspace is 
also the nullspace of.A, or at least a significant portion thereof? If) is an inverse 
mapping image of f--(o), then the addition of any vector from the nullspace to ) 
would still be an inverse mapping image of f__(o), satisfying eq. (3). If an inverse 
mapping inage ) obtained from eq. (2) is unphysical, or somehow inappropriate 
for a particular application, it could possibly be optimized by combining it with a 
vector from the nullspace of.A. 
4 Existence and Stability Considerations 
There are still implementational issues of importance to address: 
1. For a given f--(ol, can eq. (2) produce some mapping image ()? 
2. For a given f--(o), will the image P__(r) produced by eq. (2) be a true inverse 
mapping image; i.e., will it. satisfy eq. (3)? If not, is it a best a.pproximation in 
some sense? 
3. 1tow stable is an inverse mapping from f--(o) that produces the answer 'yes' to 
questions 1 and 2; i.e., if f--(o) is perturbed to produce a new output point, will 
this new output point satisfy questions 1 and 2? 
In general, eq. (2) will produce an image for any output point generated by the 
forward dynamics of the network, eq. (1). If f--(o) is chosen arbitrarily, however, 
then whether it is in the domain of.A + is purely a function of the network weights. 
The domain is restricted because the domain of the inverse sigmoid sub-operator is 
restricted to (-1,+l). 
Whether an image produced by eq. (2) will be an inverse mapping image, i.e., 
satisfying eq. (3), is dependent on both the network weights and the network ar- 
chitecture. A strong sufficient conditiou for guaranteeing this condition is that the 
network have a coverget architecture; that is: 
 The dimension of input space is greater than or equal to the dimension of 
output space. 
 The rank of T)(Hd) is greater than or equal to the rank of (o,u). 
The stability of inverse mappings of a desired output away from such an actual 
output depends wholly on the weights of the network. The range of singular values 
of weight matrix block W(O,H) can be used to address this issue. If the range is 
much more than one order of magnitude, then random perturbations about a given 
point in output space will often be outside the domain of .A +. This is because the 
columns of $(O,H'I and (o,) associated with small singular values during forward 
eSince its first sub-operation is linear, and the sigmoid non-linearity we employ maps 
zero to zero, the non-linear operator J can still have a nullspace. Subsequent layers of the 
network might add to this nullspace, however, and the added region may not be a linear 
subspace. 
872 Rossen 
activity flow are associated with proportionately large inverse singular values in the 
inverse mapping. Thus, if singular value do,hi is small, a random perturbation 
with a projection on column o,n)(i of S(o,n) will cause a large magnitude swing 
in the inverse sub-operator Wo,n), with the result possibly outside the domain of 
5 
Summary 
 ]Ve have shown that a closed-form inverse mapping operator of a backprop- 
agation network can be obtained using a composition of pseudo-inverses and 
inverse sigmoid operators. 
 This inverse mapping operator, specified in cq. (2), operating on any point in 
the network's output space, will obtain an inverse image of that point that 
satisfies cq. (3), if such an inverse image exists. 
 When many inverse images of an output point exist, an extension of the SVD 
analyses used to obtain the original inverse image can be used to obtain an 
alternate inverse image optimized to satisfy the problem constraints of a par- 
ticular application. 
 The existence of an inverse image of a particular output point depends on that 
output point and the network weights. The dependence on the network can be 
expressed conveniently in ternis of the singular values and the singular value 
vectors of the network weight matrices. 
 Application for these techniqnes inchide explanation of network operation and 
process control. 
References 
[Lancaster, 1985] 
[Linden & Kinderman, 1989] 
[Widrow & Stearns, 1985] 
Lancaster, P., & Tismenetsky, M. (1985). The The- 
ory of Matrices. Orlando: Academic. 
Linden, A., & Kinderman, J. (1989). Inversion of 
mnltilayer nets. Proceedings of the Third Annual In- 
ternational Joint Conference on Neural Networks, 
Vol H, 425-430. 
Widrow, B., & Stearns, S.D. (1985). Adpative Signal 
Processing. Englewood Cliffs: Prentice-Hall. 
Part XIII 
Learning and Generalization 
