Discrete Affine Wavelet Transforms For Analysis 
And Synthesis Of Feedforward Neural Networks 
Y. C. Pati and P. S. Krishnaprasad 
Systems Research Center and Department of Electrical Engineering 
University of Maryland, College Park, MD 20742 
Abstract 
In this paper we show that discrete affine wavelet transforms can provide 
a tool for the analysis and synthesis of standard feedforward neural net- 
works. It is shown that wavelet frames for L2(I) can be constructed based 
upon sigmoids. The spatio-spectral localization property of wavelets can 
be exploited in defining the topology and determining the weights of a 
feedforward network. Training a network constructed using the synthe- 
sis procedure described here involves minimization of a convex cost func- 
tional and therefore avoids pitfalls inherent in standard backpropagation 
algorithms. Extension of these methods to L2(I N) is also discussed. 
I INTRODUCTION 
Feedforward type neural network models constructed from empirical data have been 
found to display significant predictive power [6]. Mathematical justification in sup- 
port of such predictive power may be drawn from various density and approximation 
theorems [1, 2, 5]. Typically this latter work doesn't take into account the spec- 
tral features apparent in the data. In the present paper, we note that the discrete 
affine wavelet transform provides a natural framework for the analysis and synthe- 
sis of feedforward networks. This new tool takes account of spatial and spectral 
localization properties present in the data. 
Throughout most of this paper we restrict discussion to networks designed to ap- 
proximate mappings in L2(I). Extensions to L2(I N) are briefly discussed in 
Section 4 and will be further developed in [10]. 
743 
744 Pati and Krishnaprasad 
2 WAVELETS AND FRAMES 
Consider a function f of one real variable as a static feedforward input-output map 
y = f(x) 
For simplicity assume f 6 L2(I) the space of square integrable functions on the real 
line. Suppose a sequence {f,} C L2(IR,) is given such that, for suitable constants 
A>O, B < cx>, 
Al]f[] 2 < -1< f,f. > 12 < Bllf]] 2 (1) 
for all f 6 L2(lR) . Such a sequence is said to be a frame. In particular orthonormal 
bases are frames. The above definition (1) also applies in the general Hilbert space 
setting with the appropriate inner product. Let T denote the bounded operator 
from L2(IR,) to I2(z), the space of square summable sequences, defined by 
(Tf) = {< f,f >}.Z' 
In terms of the frame operator T, it is possible to give series expansions, 
f 
= 
= < >, 
(e) 
where {]', = (T*T)-if,) is the dual frame. 
A particular class of ames leads to affine wavelet expansions. 
of functions {;b,, } of the form, 
Consider a family 
;bran(x) = a-m/2;b(a-mx- rib) 
(3) 
where, the function ;b satisfies appropriate admissibility conditions [3, 4] (e.g. f ;b = 
0). Then for suitable choices of a > 1, b > 0, the family {;bin, } is a frame for L2(IR). 
Hence there exists a convergent series representation, 
f(x) 
m 
= cmna-m/2;b(a-mx- nb) 
m 
(4) 
The frame condition (1) guarantees that the operator (T'T) is boundedly invertible. 
Also since [lI- (2(A + B)-iT*T)[I < 1, (T'T) - is given by a Neumann series [3]. 
Hence, given f, the expansion coefficients Cm, can be computed. 
The representation (4) of f above as a series in dilations and translations of a single 
function ;b is called a wavelet expansion and the function ;b is known as the analyzing 
or mother wavelet for the expansion. 
Discrete Affine Wavelet Transforms 745 
3 
FEEDFORWARD NETWORKS AND WAVELET 
EXPANSIONS 
Consider the input-output relationship of a feedforward network with one input, 
one output, and a single hidden layer, 
f'x) = Z cng(anx - bn) (5) 
where a, are the weights from the the input node to the hidden layer, b, are the 
biases on the hidden layer nodes, c, are the weights from the hidden layer to the 
output layer and g defines the activation function of the hidden layer nodes. It is 
clear from (5) that the output of such a network is given in terms of dilations and 
translations of a single function g. 
3.1 WAVELET ANALYSIS OF FEEDFORWARD NETWORKS 
1 and let b be defined as 
Let g be a 'sigmoidal' function e.g. g(x) = l+e-x 
%b(x) = g(x + 2) + g(x - 2) - 2g(x). (6) 
Then it is possible (see [9] for details) to determine a translation stepsize b and 
0.5 
-0 .5 
2.5 
1.5 
0.5 
-1 ' ' 0 
-4 -2 0 2 4 -4 2 
time (seconds) Log Frequency (Hz) 
Figure 1' Mother Wavelet %3 (Left) And Magnitude Of Fourier Transform 
a dilation stepsize a for which the family of functions b,, as defined by (3) is a 
frame for L2(I) . Note that wavelet frames for L2(IR) can be constructed based 
upon other combinations of sigmoids (e.g b(x) = g(x+p)+g(z-p)-2g(z), p > 0) 
and that we use the mother wavelet of (6) only to illustrate some properties which 
are common to many such combinations. 
It follows from the above discussion that a feedforward network having one hidden 
layer with sigmoidal activation functions can represent any function in L 2(I) . In 
such a network (6) says that the sigmoidal nodes should be grouped together in sets 
of three so as to form the mother wavelet b. 
746 Pati and Krishnaprasad 
3.2 
WAVELETS AND SYNTHESIS OF FEEDFORWARD 
NETWORKS 
In defining the topology of a feedforward network we make use of the fact that the 
function b is well concentrated in both spatial and spectral domains (see Figure 
1). Dilating b corresponds to shifting the spectral concentration and translating b 
corresponds to shifting the spatial concentration. 
The synthesis procedure we describe here is based upon estimates of the spatial 
and spectral localization of the unknown mapping as determined from samples 
provided by the training data. Spatial locality of interest can easily be determined 
by examination of the training data or by introducing a priori assumptions as to the 
region over which it is desired to approximate the unknown mapping. Estimates of 
the appropriate spectral locality are also possible via preprocessing of the training 
data. 
Let Q,, and Q! respectively denote the spatio-spectral concentrations of the 
wavelet b,, and of f. Thus Q,m and Q! are rectangular regions in the spario- 
spectral plane (see Figure 2) which contain 'most' of the energy in the functions 
b,m and f. More precise definitions of these concentrations can be found in [9]. 
Assuming that Q! has been estimated from the training data. We choose only those 
(ore fix  
      
_mml I    t  
-(Omflx 
Xmln Xmax 
Figure 2: Spatio-Spectral Concentrations Q,m And Q! Of Wavelets b.. And 
Unknown Map f. 
elements of the frame {lPrnn } which contribute 'significantly' to the region ! by 
defining an index set 27! C_ Z 2 in the following manner, 
= e z. > o) 
where,/t is the Lesbegue measure on IR 2. Since f is concentrated in Q/, by choosing 
27! as above, a 'good' approximation of f can be obtained in terms of the finite set 
of frame elements with indices in 27/. That is f can be approximated by ]'where, 
Discrete Afflne Wavelet Transforms 747 
for some coefficients {Cmn(m,nE.t. 
Having determined Z,, a network is constructed to implement the appropriate 
wavelets ;b,m. This is easily accomplished by choosing the number of sigmoidal 
hidden layer nodes to be M = 3 x Z! and then grouping them together in sets of 
three to implement ;b as in (6). Weights from the input to the hidden layer are set 
to provide the required dilations of ;b and biases on the hidden layer nodes are set 
to provide the required translations. 
3.2.1 Computation of Coefficients 
By the above construction, all weights in the network have been fixed except for the 
weights from the hidden layer to the output which specify the coefficients Cm, in 
(7). These coefficients can be computed using a simple gradient descent algorithm 
on the standard cost function of backpropagation. Since the cost function is convex 
in the remaining weights, only globally minimizing solutions exist. 
3.2.2 Simulations 
Figure 3 shows the results of a simple simulation example. The solid line in Figure 
3 indicates the original mapping f which was defined via the inverse Fourier trans- 
form of a randomly generated approximately bandlimited spectrum. Using a single 
dilation of p which covered the frequency band sufficiently well and the required 
translations, the dashed curve shows the learned network approximation. 
-.2 
-.4 
0 05 .i .15 
"Tltne {seconds)" 
Figure 3: Simulation Using Network Synthesis Procedure. Solid Curve: Original 
Function, Dashed Curve: Network Peconstruction. 
4 DISCUSSION AND CONCLUSIONS 
It has been demonstrated here that affine wavelet expansions provide a framework 
within which feedforward networks designed to approximate mappings in L2(IFL) can 
be understood. In the case when the mapping is known, the expansion coefficients, 
and therefore all weights in the network can be computed. Hence the wavelet 
748 Pati and Krishnaprasad 
transform method (and in general any transform method) not only gives us rep- 
resentability of certain classes of mappings by feedforward networks, but also tells 
us what the representation should be. Herein lies an essential difference between 
the wavelet methods discussed here and arguments based upon density in function 
spaces. 
In addition to providing arguments in support of the approximating power of feed- 
forward networks, the wavelet framework also suggests one method of choosing 
network topology (in this case the number of hidden layer nodes) and reducing 
the training problem to a convex optimization problem. The synthesis technique 
suggested is based upon spatial and spectral localization which is provided by the 
wavelet transform. 
Most useful applications of feedforward networks involve the approximation of map- 
pings with higher dimensional domains e.g. mappings in L2(IN). Discrete affine 
wavelet transforms can be applied in higher dimensions as well (see e.g. [7] and [8]). 
Wavelet transforms in L2(I N) can also be defined with respect to mother wavelets 
constructed from sigmoids combined in a manner which doesn't deviate from stan- 
dard feedforward network architectures [10]. Figure 4 shows a mother wavelet for 
L2(I 2) constructed from sigmoids. In higher dimensions it is possible to use more 
than one analyzing wavelet [7], each having certain orientation selectivity in addi- 
tion to spatial and spectral localization. If orientation selectivity is not essential, 
an isotropic wavelet such as that in Figure 4 can be used. 
Figure 4: Two-Dimensional Isotropic Wavelet From Sigmoids 
The wavelet formulation of this paper can also be used to generate an orthonormal 
basis of compactly supported wavelets within a standard feedforward network ar- 
chitecture. If the sigmoidal function g in Equation (6) is chosen as a discontinuous 
threshold function, the resulting wavelet p is the Haar function which thereby re- 
sults in the Haar transform. Dilations of the Haar function in powers of 2 (a = 2) 
together with integer translations (b = 1), generate an orthonormal basis for L(I). 
Multidimensional Haar functions are defined similarly. The Haar transform is the 
earliest known example of a wavelet transform which however suffers due to the 
discontinuous nature of the mother wavelet. 
Discrete Affine Wavelet Transforms 749 
Acknowledgement s 
The authors wish to thank Professor Hans Feichtinger of the University of Vienna, 
and Professor John Benedetto of the University of Maryland for many valuable dis- 
cussions. This research was supported in part by the National Science Foundation 's 
Engineering Research Centers Program: NSFD CDR 8803012, the Air Force Office 
of Scientific Research under contract AFOSR-88-0204 and by the Naval Research 
Laboratory. 
References 
[1] G. Cybenko. Approximations by Superpositions of a Sigmoidal Function. Tech- 
nical Report CSRD 856, Center for Supercomputing Research and Develop- 
ment, University of Illinois, Urbana, February 1989. 
[2] G. Cybenko. Continuous Valued Neural Networks with Two Hidden Layers are 
Sufficient. Technical Report, Department of Computer Science, Tufts Univer- 
sity, Medford, MA, March 1988. 
[3] I. Daubechies. The Wavelet Transform, Time-Frequency Localization and 
Signal Analysis. IEEE Transactions on Information Theory, 36(5):961- 
1005,September 1990. 
[4] C. E. Heil and D. F. Walnut. Continuous and Discrete Wavelet Transforms. 
SIAM Review, 31(4):628-666, December 1989. 
[5] K. Hornik, M. Stinchcombe, and H. White. Multilayer Feedforward Networks 
are Universal Approximators. Neural Networks, 2:359-366, 1989. 
[6] A. Lapedes, and R. Farber. Nonlinear Signal Processing Using Neural Net- 
works: Prediction and System Modeling. Technical Report LA-UR-87-2662, 
Los Alamos National Laboratory, 1987. 
[7] S. G. Mallat. Multifrequency Channel Decompositions of Images and Wavelet 
Models. IEEE Transactions On Acoustics Speech and Signal Processing, 
37(12):2091-2110, December 1989. 
[8] R. Murenzi, "Wavelet Transforms Associated To The n-Dimensional Euclidean 
Group With Dilations: Signals In More Than One Dimension," in Wavelets 
Time-Frequency Methods And Phase Space (J. M. Combes, A. Grossman and 
Ph. Tchamitchian, eds.), pp. 239-246, Springer-Verlag, 1989. 
[9] Y. C. Pati and P.S. Krishnaprasad, "Analysis and Synthesis of Feedforward 
Neural Networks Using Discrete Affine Wavelet Transforms," Technical Report 
SRC TR 90-44, University of Maryland, Systems Research Center, 1990. 
[10] Y. C. Pati and P.S. Krishnaprasad, In preparation. 
