Recovering a Feed-Forward Net 
From Its Output 
Charles 'efferman* and Scott Markel 
David Sarnoff Research Center 
CN5300 
Princeton, NJ 08543-5300 
e-mail: cfmath.princeton.edu 
smarkel@sarnoff. com 
ABSTRACT 
We study feed-forward nets with arbitrarily many layers, using the stan- 
dard sigmoid, tanh x. Aside from technicalities, our theorems are: 
1. Complete knowledge of the output of a neural net for arbitrary inputs 
uniquely specifies the architecture, weights and thresholds; and 2. There 
are only finitely many critical points on the error surface for a generic 
training problem. 
Neural nets were originally introduced as highly simplified models of the nervous 
system. Today they are widely used in technology and studied theoretically by 
scientists from several disciplines. However, they remain little understood. 
Mathematically, a (feed-forward) neural net consists of:
(1) A finite sequence of positive integers $(D_0, D_1, \ldots, D_L)$;
(2) A family of real numbers $(\omega^\ell_{jk})$ defined for $1 \le \ell \le L$, $1 \le j \le D_\ell$,
$1 \le k \le D_{\ell-1}$; and
(3) A family of real numbers $(\theta^\ell_j)$ defined for $1 \le \ell \le L$, $1 \le j \le D_\ell$.
The sequence $(D_0, D_1, \ldots, D_L)$ is called the architecture of the neural net, while the
$\omega^\ell_{jk}$ are called weights and the $\theta^\ell_j$ thresholds.
*Alternate address: Dept. of Mathematics, Princeton University, Princeton, NJ 08544-1000.
Neural nets are used to compute non-linear maps from $\mathbb{R}^n$ to $\mathbb{R}^m$ by the following
construction. We begin by fixing a nonlinear function $\sigma(x)$ of one variable. Analogy
with the nervous system suggests that we take $\sigma(x)$ asymptotic to constants as $x$
tends to $\pm\infty$; a standard choice, which we adopt throughout this paper, is
$\sigma(x) = \tanh(x)$. Given an "input" $(t_1, \ldots, t_{D_0}) \in \mathbb{R}^{D_0}$, we define real numbers $x^\ell_j$ for
$0 \le \ell \le L$, $1 \le j \le D_\ell$ by the following induction on $\ell$.
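(4) If $\ell = 0$ then $x^\ell_j = t_j$.
(5) If the $x^{\ell-1}_k$ are known with $\ell$ fixed ($1 \le \ell \le L$), then we set
$$x^\ell_j = \sigma\Big(\sum_{1 \le k \le D_{\ell-1}} \omega^\ell_{jk}\, x^{\ell-1}_k + \theta^\ell_j\Big) \quad \text{for } 1 \le j \le D_\ell.$$
Here $x^\ell_1, \ldots, x^\ell_{D_\ell}$ are interpreted as the outputs of $D_\ell$ "neurons" in the $\ell$th "layer"
of the net. The output map of the net is defined as the map
(6) $\Phi : (t_1, \ldots, t_{D_0}) \mapsto (x^L_1, \ldots, x^L_{D_L})$.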
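The induction (4)-(5) can be sketched in a few lines of Python. This is only an illustration: the function name `forward`, the nested-list layout of the weights, and the numerical values are assumptions made here, not part of the paper.

```python
import math

def forward(weights, thresholds, inputs):
    """Evaluate the output map (6).

    weights[l-1][j][k] and thresholds[l-1][j] hold the weights and
    thresholds of layer l (0-based node indices, hypothetical layout).
    """
    x = list(inputs)                      # step (4): layer 0 is the input
    for w_layer, th_layer in zip(weights, thresholds):
        # step (5): each neuron applies tanh to a weighted sum plus its threshold
        x = [math.tanh(sum(w * xk for w, xk in zip(row, x)) + th)
             for row, th in zip(w_layer, th_layer)]
    return x

# Hypothetical net with architecture (D_0, D_1, D_2) = (2, 3, 1).
weights = [[[0.5, -1.2], [0.3, 0.8], [-0.7, 0.1]], [[1.1, -0.4, 0.6]]]
thresholds = [[0.2, -0.5, 0.9], [0.05]]
y = forward(weights, thresholds, [0.3, -1.0])
```

The list `y` has one entry per output neuron, here a single value strictly between -1 and 1.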
In practical applications, one tries to pick the neural net
$[(D_0, D_1, \ldots, D_L), (\omega^\ell_{jk}), (\theta^\ell_j)]$ so that the output map $\Phi$ approximates a given map about which we have
only imperfect information. The main result of this paper is that under generic
conditions, perfect knowledge of the output map $\Phi$ uniquely specifies the architecture,
the weights and the thresholds of a neural net, up to obvious symmetries.
More precisely, the obvious symmetries are as follows. Let $(\gamma_0, \gamma_1, \ldots, \gamma_L)$ be permutations,
with $\gamma_\ell : \{1, \ldots, D_\ell\} \to \{1, \ldots, D_\ell\}$; and let $\{\varepsilon^\ell_j : 0 \le \ell \le L,\ 1 \le j \le D_\ell\}$ be
a collection of $\pm 1$'s. Assume that $\gamma_\ell = $ (identity) and $\varepsilon^\ell_j = +1$ whenever $\ell = 0$ or
$\ell = L$. Then one checks easily that the neural nets
(7) $[(D_0, D_1, \ldots, D_L), (\omega^\ell_{jk}), (\theta^\ell_j)]$ and
(8) $[(D_0, D_1, \ldots, D_L), (\tilde\omega^\ell_{jk}), (\tilde\theta^\ell_j)]$
have the same output map if we set
(9) $\tilde\omega^\ell_{jk} = \varepsilon^\ell_j\, \omega^\ell_{[\gamma_\ell j][\gamma_{\ell-1} k]}\, \varepsilon^{\ell-1}_k$ and $\tilde\theta^\ell_j = \varepsilon^\ell_j\, \theta^\ell_{[\gamma_\ell j]}$.
This reflects the facts that the neurons in layer $\ell$ are interchangeable ($1 \le \ell \le L-1$),
and that the function $\sigma(x)$ is odd. The nets (7) and (8) will be called isomorphic
if they are related by (9). Note in particular that isomorphic neural nets have the
same architecture. Our main theorem asserts that, under generic conditions, any
two neural nets with the same output map are isomorphic.
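As a numerical sanity check of the symmetry (9), the sketch below builds an isomorphic net from a small hypothetical one and compares the two output maps at a few probe inputs. The helper names (`forward`, `apply_symmetry`), the 0-based indexing, and all numerical values are assumptions of this illustration.

```python
import math

def forward(weights, thresholds, inputs):
    x = list(inputs)
    for w_layer, th_layer in zip(weights, thresholds):
        x = [math.tanh(sum(w * xk for w, xk in zip(row, x)) + th)
             for row, th in zip(w_layer, th_layer)]
    return x

def apply_symmetry(weights, thresholds, perms, signs):
    """Build the net (8) from the net (7) via (9); all indices are 0-based."""
    new_w, new_th = [], []
    for l in range(1, len(perms)):
        w, th = weights[l - 1], thresholds[l - 1]
        g, g_prev, e, e_prev = perms[l], perms[l - 1], signs[l], signs[l - 1]
        new_w.append([[e[j] * w[g[j]][g_prev[k]] * e_prev[k]
                       for k in range(len(g_prev))]
                      for j in range(len(g))])
        new_th.append([e[j] * th[g[j]] for j in range(len(g))])
    return new_w, new_th

# Hypothetical net with architecture (2, 3, 1).
weights = [[[0.5, -1.2], [0.3, 0.8], [-0.7, 0.1]], [[1.1, -0.4, 0.6]]]
thresholds = [[0.2, -0.5, 0.9], [0.05]]
perms = [[0, 1], [2, 0, 1], [0]]      # identity on the input and output layers
signs = [[1, 1], [1, -1, -1], [1]]    # +1 on the input and output layers
new_weights, new_thresholds = apply_symmetry(weights, thresholds, perms, signs)

probes = [[0.3, -1.0], [2.0, 0.7], [-0.4, 0.0]]
max_gap = max(abs(forward(weights, thresholds, p)[0] -
                  forward(new_weights, new_thresholds, p)[0]) for p in probes)
```

The two nets have different weights, yet `max_gap` stays at the level of floating-point rounding.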
We discuss the generic conditions which we impose on neural nets. We have to
avoid obvious counterexamples such as:
(10) Suppose all the weights $\omega^\ell_{jk}$ are zero. Then the output map $\Phi$ is constant.
The architecture and thresholds of the neural net are clearly not uniquely
determined by $\Phi$.
(11) Fix $\ell_0, j_1, j_2$ with $1 \le \ell_0 \le L-1$ and $1 \le j_1 < j_2 \le D_{\ell_0}$. Suppose we have
$\theta^{\ell_0}_{j_1} = \theta^{\ell_0}_{j_2}$ and $\omega^{\ell_0}_{j_1 k} = \omega^{\ell_0}_{j_2 k}$ for all $k$. Then (5) gives $x^{\ell_0}_{j_1} = x^{\ell_0}_{j_2}$. Therefore, the
output depends on $\omega^{\ell_0+1}_{j j_1}$ and $\omega^{\ell_0+1}_{j j_2}$ only through the sum $\omega^{\ell_0+1}_{j j_1} + \omega^{\ell_0+1}_{j j_2}$. So
the output map does not uniquely determine the weights.
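Counterexample (11) is easy to reproduce numerically. In the sketch below (hypothetical names and values, architecture (1, 2, 1)), both hidden neurons share the same weight and threshold, so only the sum of the two outgoing weights matters.

```python
import math

def output(t, out_weights, w=0.8, theta=0.3, out_theta=-0.1):
    """Net whose two hidden neurons are identical copies of each other."""
    h = math.tanh(w * t + theta)      # value computed by both hidden neurons
    return math.tanh(out_weights[0] * h + out_weights[1] * h + out_theta)

# Two different weight pairs with the same sum 1.2.
gaps = [abs(output(t, (0.5, 0.7)) - output(t, (1.0, 0.2)))
        for t in (-2.0, -0.3, 0.0, 1.5)]
```

Every entry of `gaps` is zero up to rounding, although the two nets are not related by a permutation or sign change.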
Our hypotheses are more than adequate to exclude these counterexamples. Specifically,
we assume that
(12) $\theta^\ell_j \ne 0$ and $|\theta^\ell_j| \ne |\theta^\ell_{j'}|$ for $j \ne j'$.
(13) $\omega^\ell_{jk} \ne 0$; and for $j \ne j'$, the ratio $\omega^\ell_{jk}/\omega^\ell_{j'k}$ is not equal to any fraction of the
form $p/q$ with $p, q$ integers and $1 \le q \le 100\,D_\ell$.
Evidently, these conditions hold for generic neural nets. The precise statement of
our main theorem is as follows. If two neural nets satisfy (12), (13) and have the
same output, then the nets are isomorphic. It would be interesting to replace (12),
(13) by minimal hypotheses, and to study functions $\sigma(x)$ other than $\tanh(x)$.
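For nets with exactly rational parameters, conditions (12) and (13) can be checked mechanically. The sketch below is a hypothetical helper, not part of the paper. It assumes the bound on q is 100 times the number of nodes in the layer, following the poster's wording of the conditions, and it uses `fractions.Fraction` because the test "equals p/q" is not meaningful for rounded floating-point values.

```python
from fractions import Fraction

def is_generic(weights, thresholds, factor=100):
    """Check (12) and (13) for exact rational weights[l][j][k], thresholds[l][j]."""
    for w_layer, th_layer in zip(weights, thresholds):
        abs_th = [abs(Fraction(th)) for th in th_layer]
        if 0 in abs_th or len(set(abs_th)) != len(abs_th):
            return False                      # violates (12)
        if any(w == 0 for row in w_layer for w in row):
            return False                      # a zero weight violates (13)
        bound = factor * len(w_layer)         # assumed bound: 100 * D_l
        for j, row in enumerate(w_layer):
            for other in w_layer[j + 1:]:
                for w, w2 in zip(row, other):
                    ratio = Fraction(w) / Fraction(w2)
                    # (13) applies to both orderings of the pair, so test the
                    # reduced denominator of the ratio and of its inverse.
                    if ratio.denominator <= bound or abs(ratio.numerator) <= bound:
                        return False
    return True

good = is_generic([[[Fraction(211)], [Fraction(223)]]],
                  [[Fraction(1, 2), Fraction(1, 3)]])
bad_ratio = is_generic([[[Fraction(1)], [Fraction(2)]]],
                       [[Fraction(1, 2), Fraction(1, 3)]])
bad_threshold = is_generic([[[Fraction(211)], [Fraction(223)]]],
                           [[Fraction(1, 2), Fraction(-1, 2)]])
```

A ratio equals some p/q with 1 <= q <= bound exactly when its denominator in lowest terms is at most the bound, which is what the check relies on.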
We now sketch the proof of our main result, sacrificing accuracy for simplicity.
After a trivial reduction, we may assume $D_0 = D_L = 1$. Thus, the outputs of the
nodes $x^\ell_j(t)$ are functions of one variable, and the output map of the neural net is
$t \mapsto x^L_1(t)$. The key idea is to continue the $x^\ell_j(t)$ analytically to complex values of $t$,
and to read off the structure of the net from the set of singularities of the $x^\ell_j$. Note
that $\sigma(x) = \tanh(x)$ is meromorphic, with poles at the points of an arithmetic
progression $\{(2m+1)\pi i/2 : m \in \mathbb{Z}\}$. This leads to two crucial observations.
(14) When $\ell = 1$, the poles of $x^\ell_j(t)$ form an arithmetic progression $\Pi^1_j$, and
(15) When $\ell > 1$, every pole of any $x^{\ell-1}_k(t)$ is an accumulation point of poles of
any $x^\ell_j(t)$.
In fact, (14) is immediate from the formula $x^1_j(t) = \sigma(\omega^1_{j1} t + \theta^1_j)$, which is merely
the special case $D_0 = 1$ of (5). We obtain
(16) $\Pi^1_j = \left\{ \dfrac{(2m+1)\pi i/2 - \theta^1_j}{\omega^1_{j1}} : m \in \mathbb{Z} \right\}$.
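The progression of poles of a first-layer neuron can be checked numerically. Since tanh = sinh/cosh, its poles are the zeros of cosh, which lie at the odd multiples of $\pi i/2$. The sketch below uses hypothetical values for the weight and threshold and assumes a single input node.

```python
import cmath
import math

def layer1_poles(omega, theta, ms):
    """Poles of tanh(omega*t + theta): solve omega*t + theta = (2m+1)*pi*i/2."""
    return [((2 * m + 1) * math.pi * 1j / 2 - theta) / omega for m in ms]

omega, theta = 1.3, 0.4               # hypothetical first-layer parameters
poles = layer1_poles(omega, theta, range(-2, 3))

# cosh vanishes (up to rounding) at each pole, so tanh blows up there.
residuals = [abs(cmath.cosh(omega * p + theta)) for p in poles]

# Consecutive poles differ by the constant step pi*i/omega.
steps = [poles[i + 1] - poles[i] for i in range(len(poles) - 1)]
```

All poles share the real part $-\theta/\omega$ and are spaced $\pi/|\omega|$ apart along a vertical line.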
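To see (15), fix $\ell, j, \bar k$, and assume for simplicity that $x^{\ell-1}_{\bar k}(t)$ has a simple pole at
$t_0$, while $x^{\ell-1}_k(t)$ ($k \ne \bar k$) is analytic in a neighborhood of $t_0$. Then
(17) $x^{\ell-1}_{\bar k}(t) = \dfrac{\lambda}{t - t_0} + f(t)$, with $f$ analytic in a neighborhood of $t_0$.
From (17) and (5), we obtain
(18) $x^\ell_j(t) = \sigma\big(\omega^\ell_{j\bar k}\,\lambda\,(t - t_0)^{-1} + g(t)\big)$, with
(19) $g(t) = \omega^\ell_{j\bar k} f(t) + \sum_{k \ne \bar k} \omega^\ell_{jk}\, x^{\ell-1}_k(t) + \theta^\ell_j$ analytic in a neighborhood of $t_0$.
Thus, in a neighborhood of $t_0$, the poles of $x^\ell_j(t)$ are the solutions $\hat t_m$ of the equation
(20) $\dfrac{\omega^\ell_{j\bar k}\,\lambda}{\hat t_m - t_0} + g(\hat t_m) = \dfrac{(2m+1)\pi i}{2}$, $m \in \mathbb{Z}$.
There are infinitely many solutions of (20), accumulating at $t_0$. Hence, $t_0$ is an
accumulation point of poles of $x^\ell_j(t)$, which completes the proof of (15).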
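The accumulation in (15) can be observed in the smallest possible case, a net with one neuron per layer. In the sketch below (hypothetical parameter values), the second-layer poles are found in closed form with `cmath.atanh`, and their distance to a first-layer pole $t_0$ is seen to shrink as $m$ grows.

```python
import cmath
import math

w1, th1 = 1.3, 0.4        # first layer (hypothetical values)
w2, th2 = 0.7, -0.2       # second layer (hypothetical values)

# A pole of x1(t) = tanh(w1*t + th1).
t0 = (math.pi * 1j / 2 - th1) / w1

def second_layer_pole(m):
    """Solve w2*tanh(w1*t + th1) + th2 = (2m+1)*pi*i/2 for t.

    Uses the principal branch of atanh, which for m >= 0 picks the
    solutions that cluster at t0.
    """
    c = ((2 * m + 1) * math.pi * 1j / 2 - th2) / w2
    return (cmath.atanh(c) - th1) / w1

distances = [abs(second_layer_pole(m) - t0) for m in (1, 10, 100, 1000)]
```

The distances decay roughly like $1/m$, which is the behaviour predicted by (20) when $g$ is bounded near $t_0$.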
In view of (14), (15), it is natural to make the following definitions. The natural
domain of a neural net is the largest open subset of the complex plane to which the
output map $t \mapsto x^L_1(t)$ can be analytically continued. For $\ell \ge 0$ we define the $\ell$th
singular set $\mathrm{Sing}(\ell)$ by setting
$\mathrm{Sing}(0)$ = complement of the natural domain in $\mathbb{C}$, and
$\mathrm{Sing}(\ell + 1)$ = the set of all accumulation points of $\mathrm{Sing}(\ell)$.
These definitions are made entirely in terms of the output map, without reference
to the structure of the given neural net. On the other hand, the sets $\mathrm{Sing}(\ell)$ contain
nearly complete information on the architecture, weights and thresholds of the net.
This will allow us to read off the structure of a neural net from the analytic
continuation of its output map. To see how the sets $\mathrm{Sing}(\ell)$ reflect the structure of the
net, we reason as follows.
From (14) and (15) we expect that
(21) For $1 \le \ell \le L$, $\mathrm{Sing}(L - \ell)$ is the union over $j = 1, \ldots, D_\ell$ of the set of poles of
$x^\ell_j(t)$, together with their accumulation points (which we ignore here), and
(22) For $\ell \ge L$, $\mathrm{Sing}(\ell)$ is empty.
Immediately, then, we can read off the "depth" $L$ of the neural net; it is simply the
smallest $\ell$ for which $\mathrm{Sing}(\ell)$ is empty.
We need to solve for $D_\ell$, $\omega^\ell_{jk}$, $\theta^\ell_j$. We proceed by induction on $\ell$.
When $\ell = 1$, (14) and (21) show that $\mathrm{Sing}(L - 1)$ is the union of arithmetic
progressions $\Pi^1_j$, $j = 1, \ldots, D_1$. Therefore, from $\mathrm{Sing}(L - 1)$ we can read off $D_1$ and
the $\Pi^1_j$. (We will return to this point later in the introduction.) In view of (16),
$\Pi^1_j$ determines the weights and thresholds at layer 1, modulo signs. Thus, we have
found $D_1$, $\omega^1_{jk}$, $\theta^1_j$.
When $\ell > 1$, we may assume that
(23) The $D_{\ell'}$, $\omega^{\ell'}_{jk}$, $\theta^{\ell'}_j$ are already known, for $1 \le \ell' < \ell$.
Our task is to find $D_\ell$, $\omega^\ell_{jk}$, $\theta^\ell_j$. In view of (23), we can find a pole $t_0$ of $x^{\ell-1}_{\bar k}(t)$ for
our favorite $\bar k$. Assume for simplicity that $t_0$ is a simple pole of $x^{\ell-1}_{\bar k}(t)$, and that
the $x^{\ell-1}_k(t)$ ($k \ne \bar k$) are analytic in a neighborhood of $t_0$. Then $x^{\ell-1}_{\bar k}(t)$ is given by
(17) in a neighborhood of $t_0$, with $\lambda$ already known by virtue of (23). Let $U$ be a
small neighborhood of $t_0$.
We will look at the image $Y$ of $U \cap \mathrm{Sing}(L - \ell)$ under the map $t \mapsto \dfrac{\lambda}{t - t_0}$. Since $\lambda$,
$t_0$ and $\mathrm{Sing}(L - \ell)$ are already known, so is $Y$. On the other hand, we can relate $Y$
to $D_\ell$, $\omega^\ell_{jk}$, $\theta^\ell_j$ as follows. From (21) we see that $Y$ is the union over $j = 1, \ldots, D_\ell$
of
(24) $Y_j$ = image of $U \cap \{\text{poles of } x^\ell_j(t)\}$ under $t \mapsto \dfrac{\lambda}{t - t_0}$.
For fixed $j$, the poles of $x^\ell_j(t)$ in a neighborhood of $t_0$ are the $\hat t_m$ given by (20). We
write
(25) $\dfrac{\omega^\ell_{j\bar k}\,\lambda}{\hat t_m - t_0} = \left[ \dfrac{\omega^\ell_{j\bar k}\,\lambda}{\hat t_m - t_0} + g(\hat t_m) \right] - g(t_0) + \big[ g(t_0) - g(\hat t_m) \big]$.
Equation (20) shows that the first expression in brackets in (25) is equal to
$(2m+1)\pi i/2$. Also, since $\hat t_m \to t_0$ as $|m| \to \infty$ and $g$ is analytic in a neighborhood of $t_0$,
the second expression in brackets in (25) tends to zero. Hence,
$$\frac{\omega^\ell_{j\bar k}\,\lambda}{\hat t_m - t_0} = \frac{(2m+1)\pi i}{2} - g(t_0) + o(1) \quad \text{for large } m.$$
Comparing this with the definition (24), we see that $Y_j$ is asymptotic to the
arithmetic progression
(26) $\Pi^\ell_j = \left\{ \dfrac{(2m+1)\pi i/2 - g(t_0)}{\omega^\ell_{j\bar k}} : m \in \mathbb{Z} \right\}$.
Thus, the known set $Y$ is the union over $j = 1, \ldots, D_\ell$ of sets $Y_j$, with $Y_j$ asymptotic
to the arithmetic progression $\Pi^\ell_j$. From $Y$, we can therefore read off $D_\ell$ and the $\Pi^\ell_j$.
(We will return to this point in a moment.) We see at once from (26) that $\omega^\ell_{j\bar k}$ is
determined up to sign by $\Pi^\ell_j$. Thus, we have found $D_\ell$ and $\omega^\ell_{jk}$. With more work,
we can also find the $\theta^\ell_j$, completing the induction on $\ell$.
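The inductive step can be illustrated on a net with one neuron per layer: map the second-layer poles near $t_0$ through $t \mapsto \lambda/(t - t_0)$ and read the second-layer weight off the spacing of the image points. All names and values below are hypothetical. The residue $\lambda = 1/\omega_1$ uses the fact that tanh has residue 1 at each of its poles.

```python
import cmath
import math

w1, th1 = 1.3, 0.4        # first layer, assumed already recovered
w2, th2 = 0.7, -0.2       # second layer, to be recovered (hypothetical)

t0 = (math.pi * 1j / 2 - th1) / w1    # pole of x1(t) = tanh(w1*t + th1)
lam = 1 / w1                          # residue of x1 at t0

def y_point(m):
    """Image under t -> lam/(t - t0) of the m-th second-layer pole near t0."""
    c = ((2 * m + 1) * math.pi * 1j / 2 - th2) / w2
    t_m = (cmath.atanh(c) - th1) / w1
    return lam / (t_m - t0)

# For large m the image points are nearly equally spaced, with step pi*i/w2.
step = y_point(201) - y_point(200)
w2_estimate = math.pi / abs(step)
```

The estimate only gives the absolute value of the weight, in line with the statement that the weight is determined up to sign.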
The above induction shows that the structure of a neural net may be read off 
from the analytic continuation of its output map. We believe that the analytic 
continuation of the output map will lead to further consequences in the study of 
neural nets. 
Let us touch briefly on a few points which we glossed over above. First of all, suppose
we are given a set $Y \subset \mathbb{C}$, and we know that $Y$ is the union of sets $Y_1, \ldots, Y_D$, with
$Y_j$ asymptotic to an arithmetic progression $\Pi_j$. We assumed above that $\Pi_1, \ldots, \Pi_D$
are uniquely determined by $Y$. In fact, without some further hypothesis on the
$\Pi_j$, this need not be true. For instance, we cannot distinguish $\Pi_1 \cup \Pi_2$ from $\Pi_3$
if $\Pi_1 = \{\text{odd integers}\}$, $\Pi_2 = \{\text{even integers}\}$, $\Pi_3 = \{\text{all integers}\}$. On the other
hand, we can clearly recognize $\Pi_1 = \{\text{all integers}\}$ and $\Pi_2 = \{m\sqrt{2} : m \text{ an integer}\}$
from their union $\Pi_1 \cup \Pi_2$. Thus, irrational numbers enter the picture. The rôle of
our generic hypothesis (13) is to control the arithmetic progressions that arise in
our proof.
Secondly, suppose $x^\ell_{\bar k}(t)$ has a pole at $t_0$. We assumed for simplicity that $x^\ell_k(t)$ is
analytic in a neighborhood of $t_0$ for $k \ne \bar k$. However, one of the $x^\ell_k(t)$ ($k \ne \bar k$) may also
have a pole at $t_0$. In that case, the $x^{\ell+1}_j(t)$ may all be analytic in a neighborhood of
$t_0$, because the contributions of the singularities of the $x^\ell_k$ in $\sum_k \omega^{\ell+1}_{jk} x^\ell_k + \theta^{\ell+1}_j$
may cancel. Thus, the singularity at $t_0$ may disappear from the output map. While
this circumstance is hardly generic, it is not ruled out by our hypotheses (12), (13).
Because singularities can disappear, we have to make technical changes in our
description of $\mathrm{Sing}(\ell)$. For example, in the discussion following (23), $Y$ need not be
the union of the sets $Y_j$. Rather, $Y$ is their "approximate union". (See [F].)
Next, we should point out that the signs of the weights and thresholds require 
some attention, even though we have some freedom to change signs by applying 
isomorphisms. (See (9).) 
Finally, in the definition of the natural domain, we have assumed that there is a
unique maximal open set to which the output map continues analytically. This
need not be true of a general real-analytic function on the line; for instance, take
$f(t) = (1 + t^2)^{1/2}$. Fortunately, the natural domain is well-defined for any function
that continues analytically to the complement of a countable set. The defining
formula (5) lets us check easily that the output map continues to the complement
of a countable set, so the natural domain makes sense. This concludes our overview
of the proof of our main theorem. The full proof of our results will appear in [F].
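To see concretely why the square-root example has no unique maximal domain, one can compare two different systems of branch cuts. The following worked sketch is an illustration added here; the particular sets $A$ and $B$ are choices made for the example and do not come from the paper.

```latex
% f(t) = (1 + t^2)^{1/2} has branch points at t = i and t = -i.
\[
A = \mathbb{C} \setminus \{\, iy : y \in \mathbb{R},\ |y| \ge 1 \,\}, \qquad
B = \mathbb{C} \setminus \bigl( \{\, i + s : s \ge 0 \,\} \cup \{\, -i + s : s \ge 0 \,\} \bigr).
\]
% A and B are simply connected open sets containing the real line, and
% 1 + t^2 has no zeros in either, so f continues analytically to A and to B.
\[
A \cup B = \mathbb{C} \setminus \{\, i,\ -i \,\}.
\]
% Continuing f once around a small loop about t = i changes its sign, so f
% has no single-valued analytic continuation to A \cup B. Hence neither A
% nor B is contained in a largest open set carrying a continuation of f.
```

By contrast, when the singular set is countable, as for the output map, any two continuations can be patched together, which is why the natural domain is well-defined there.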
Both the uniqueness problem and the use of analytic continuation have already
appeared in the neural net literature. In particular, it was R. Hecht-Nielsen who
pointed out the rôle of isomorphisms and posed the uniqueness problem. His paper
with Chen and Lu [CLH] on "equioutput transformations" on the space of all
neural nets influenced our work. E. Sontag [So] and H. Sussmann [Su] proved sharp
uniqueness theorems for one hidden layer. The proof in [So] uses complex variables.
Acknowledgements 
Fefferman is grateful to R. Crane, S. Markel, J. Pearson, E. Sontag, R. Sverdlove, 
and N. Winarsky for introducing him to the study of neural nets. 
This research was supported by the Advanced Research Projects Agency of the De- 
partment of Defense and was monitored by the Air Force Office of Scientific Research 
under Contract F49620-92-C-0072. The United States Government is authorized to 
reproduce and distribute reprints for governmental purposes notwithstanding any 
copyright notation hereon. This work was also supported by the National Science 
Foundation. 
The following posters, presented at NIPS 93, may clarify our uniqueness theorem. 
References
[CLH] R. Hecht-Nielsen, et al., On the geometry of feedforward neural network error
surfaces. (to appear).
[F] C. Fefferman, Reconstructing a neural network from its output, Revista
Matemática Iberoamericana. (to appear).
[So] F. Albertini and E. Sontag, Uniqueness of weights for neural networks. (to
appear).
[Su] H. Sussmann, Uniqueness of the weights for minimal feedforward nets with a
given input-output map, Neural Networks 5 (1992), pp. 589-593.
Recovering a Feed-Forward 
Net from Its Output 
Charles Feferman 
David Samoff Research Center and PrinceIon Universily 
Princeton, New Jersey 
Scott A. Markel 
David Samoff Research Center 
Princeton, New Jersey 
Take-Home Message
Suppose an unknown neural network is placed in a black box.
You aren't allowed to look in the box, but you are
allowed to observe the outputs produced by the
network for arbitrary inputs.
Then, in principle, you have enough information to
determine the network architecture (number of layers
and number of nodes in each layer) and the unique
values for all the weights.
The Output Map of a Neural Network
Fix a feed-forward neural network with the standard
sigmoid $\sigma(x) = \tanh x$.
The map that carries input vectors $(x_1, \ldots, x_n)$
to output vectors $(y_1, \ldots, y_m)$
is called the OUTPUT MAP of the neural network.
The Key Question 
When can two neural networks 
have the same output map? 
Obvious Example of Two Neural
Networks with the Same Output Map
Start with a neural network N.
Then either
1. permute the nodes in a hidden layer, or
2. fix a hidden node, and change the sign of
every weight (including the bias weight) that
involves that node
This yields a new neural network with the same
output map as N.
Unlquene Theorem 
Let N and N' be neural networks that satisfy generic 
conditions described below. 
If N and N' have the same output map, then they differ 
only by sign changes and permutations of hidden nodes. 
Generic Conditions
We assume that
• all weights are non-zero
• bias weights within each layer have distinct
  absolute values
• the ratio of weights from node i in layer l to nodes j
  and k in layer (l+1) is not equal to any fraction of the
  form p/q with p, q integers and 1 ≤ q ≤ 100*(number of
  nodes in layer l)
Some such assumptions are needed to avoid obvious
counterexamples.
Outline of the Proof
• it's enough to consider networks with one input node
  and one output node (see below)
• all node outputs are now functions of a single, real
  variable t (the network input)
• analytically continue the network output to a function f
  of a single, complex variable t
• the qualitative geometry of the poles of the function f
  determines the network architecture (see below)
• the asymptotics of the function f near its singularities
  determine the weights
Reduction to a Network with
Single Input and Output Nodes
• focus attention on a single output node, ignoring the
  others
• study only input data with a single non-zero entry
Geometric Description of the Poles
• poles (small dots) accumulate at essential singularities
  (small squares)
• essential singularities (small squares) accumulate at
  more complicated essential singularities (large dots)
Determining the Network Architecture from the Picture
• three kinds of singularities (small dots, small squares,
  large dots)
  ⇒ three layers of sigmoids, i.e. two hidden
  layers and an output layer
• three "spiral arms" of small squares accumulate at
  each large dot
  ⇒ three nodes in the second hidden layer
• two "spiral arms" of small dots accumulate at each
  small square
  ⇒ two nodes in the first hidden layer
Determining the Network Architecture from the Picture
(cont'd)
• from the network reduction we know that there is
  one input node and one output node
• therefore, the network architecture is as pictured
