Incorporating Contextual Information in White 
Blood Cell Identification 
Xubo Song* 
Department of Electrical Engineering 
California Institute of Technology 
Pasadena, CA 91125 
xubosong @ fire. work. caltec h. edu 
Yaser Abu-Mostafa 
Dept. of Electrical Engineering 
and Dept. of Computer Science 
California Institute of Technology 
Pasadena, CA 91125 
Yaser @ over. work. c altech. edu 
Joseph Sill 
Computation and Neural Systems Program 
California Institute of Technology 
Pasadena, CA 91125 
joe @busy. work.caltech.edu 
Harvey Kasdan 
International Remote Imaging Systems 
9162 Eton Ave., 
Chatsworth, CA 91311 
Abstract 
In this paper we propose a technique to incorporate contextual informa- 
tion into object classification. In the real word there are cases where the 
identity of an object is ambiguous due to the noise in the measurements 
based on which the classification should be made. It is helpful to re- 
duce the ambiguity by utilizing extra information referred to as context, 
which in our case is the identities of the accompanying objects. This 
technique is applied to white blood cell classification. Comparisons are 
made against "no context" approach, which demonstrates the superior 
classification performance achieved by using context. In our particular 
application, it significantly reduces false alarm rate and thus greatly re- 
duces the cost due to expensive clinical tests. 
*Author for correspondence. 
Incorporating Contextual Information in White Blood Cell Identification 951 
1 INTRODUCTION 
One of the most common assumptions made in the study of machine learning is that the 
examples are drawn independently from some joint input-output distribution. There are 
cases, however, where this assumption is not valid. One application where the indepen- 
dence assumption does not hold is the identification of white blood cell images. Abnormal 
cells are much more likely to appear in bunches than in isolation. Specifically, in a sample 
of several hundred cells, it is more likely to find either no abnormal cells or many abnormal 
cells than it is to find just a few. 
In this paper, we present a framework for pattern classification in situations where the 
independence assumption is not satisfied. In our case, the identity of an object is dependent 
of the identities of the accompanying objects, which provides the contextual information. 
Our method takes into consideration the joint distribution of all the classes, and uses it to 
adjust the object-by-object classification. 
In section 2, the framework for incorporating contextual information is presented, and an 
efficient algorithm is developed. In section 3 we discuss the application area of white 
blood cell classification, and address the importance of using context for this application. 
Empirical testing results are shown in Section 4, followed by conclusions in Section 5. 
2 
INCORPORATING CONTEXTUAL INFORMATION INTO 
CLASSIFICATION 
2.1 THE FRAMEWORK 
Let xi be the feature vector of an object, and ci - c(xi) be the classification for xi, i - 
1, ...N, where N is the total number of objects. ci  {1, ..., D}, where D is the number of 
total classes. 
According to Bayes rule, 
p(clx) = p(xIc)p() 
p(x) 
It follows that the "with context" a posteriori probability of the class labels of all the objects 
assuming values c, c2,..., cv, given all the feature vectors, is 
p(c, c2, ..., CN{X, x2, ..., XN) 
p(x,x2, ..., x;,) 
(1) 
It is reasonable to assume that the feature distribution given a class is independent of the 
feature distributions of other classes, i.e., 
p(xx, x2, ..., XN ]Cx, C2, ..., CN) = p(x [Cx)...p(XN Ic/v) 
Then Equation (1) can be rewritten as 
p(cl, c2, ..., CNlXl, x2, ..., XN) = p(xllcl)'"P(Xivlciv)p(cx' c2, ..., cv) 
p(xl, X2, ...,XN) 
p(c Ix )...pen IxN )p(x )...p(x;, )p, 2 ..., ;,) 
- p(c )...p( ;, )p(x , x2,..., x ;, ) 
(2) 
952 X Song, Y. Abu-Mostafa, J. Sill and H. Kasdan 
where p(ci [xi) is the "no context" object-by-object Bayesian a posteriori probability, and 
p(ci) is the a priori probability of the classes, p(xi) is the marginal probability of the 
features, and p(xx, x2, ..., xv) is the joint distribution of all the feature vectors. 
Since the features (xx, x2, ..., x jr) are given, p(xx, x2, ..., x/v) and p(xi) are constant, 
p(Cl,...,CN) 
p(C, ..., CJVlX, 
= p(q Ix )...p(CN[XN)p(q, C2, ..., CN) 
where 
p(Cl,C2,...,CN)  
p(c, ..., 
p(C)...p(CN) 
(3) 
The quantity p(cx, c2, ..., cv), which we call context ratio and through which the context 
plays its role, captures the dependence among the objects. In the case where all the objects 
are independent, p(q, c2, ..., civ) equals one- there will be no context. In the dependent 
case, p(c, c2,..., cs) will not equal one, and the context has an effect on the classifications. 
We deal with the application of object classification where it is the count in each class, 
rather than the particular ordering or numbering of the objects, that matters. As a result, 
p(c, c2, ..., c2v) is only a function of the count in each class. Let Na be the count in class 
d, andva- /va d=l...,D, 
N' 
p(Cl C2 ... CN) 
p(q,C2,...,CN): p(C)...p(CN) 
NI...ND] p(v, v2, ..., VD) 
(4) 
= p(x, ..., vD) 
where Pa is the prior distribution of class d, for d = 1, ...D. 
D 
a= v = 1. 
The decision rule is to choose class labels a, a2,   , Ov such that 
D 
a:xNa = N and 
(,a2,...,Ov)= argmax p(Cl,C2,...,CNIXi,X2,...,XN) 
(c,2,...,N) 
(5) 
When implementing the decision rule, we need to compute and compare D v cases for 
Equation 5. In the case of white blood cell recognition, D - 14 and N is typically around 
600, which makes it virtually impossible to implement. 
In many cases, additional constraints can be used to reduce computation, as is the case in 
white blood cell identification, which will be demonstrated in the following section. 
3 WHITE BLOOD CELL RECOGNITION 
Leukocyte analysis is one of the major routine laboratory examinations. The utility of 
leukocyte classification in clinical diagnosis relates to the fact that in various physiological 
and pathological conditions the relative percentage composition of the blood leukocytes 
Incorporating Contextual Information in White Blood Ceil Identification 953 
changes. An estimate of the percentage of each class present in a blood sample conveys 
information which is pertinent to the hematological diagnosis. Typical commercial differ- 
ential WBC counting systems are designed to identify five major mature cell types. But 
blood samples may also contain other types of cells, i.e. immature cells. These cells occur 
infrequently in normal specimen, and most commercial systems will simply indicate the 
presence of these cells because they can't be individually identified by the systems. But it 
is precisely these cell types that relate to the production rate and maturation of new cells 
and thus are important indicators of hematological disorders. Our system is designed to 
differentiate fourteen WBC types which includes the five major mature types: segmented 
neutrophils, lymphocytes, monocytes, eosinophils, and basophils; and the immature types: 
bands (unsegmented neutrophils), metamyelocytes, myelocytes, promyelocytes, blasts, and 
variant lymphocytes; as well as nucleated red blood cells and artifacts. Differential counts 
are made based on the cell classifications, which further leads to diagnosis or prognosis. 
The data was provided by IRIS, Inc. Blood specimens are collected at Harbor UCLA 
Medical Center from local patients, then dyed with Basic Orange 21 metachromatic dye 
supravital stain. The specimen is then passed through a flow microscopic imaging and im- 
age processing instrument, where the blood cell images are captured. Each image contains 
a single cell with full color. There are typically 600 images from each specimen. The task 
of the cell recognition system is to categorize the cells based on the images. 
3.1 PREPROCESSING AND FEATURE EXTRACTION 
The size of cell images are automatically tailored according to the size of the cell in the 
images. Images containing larger cells have bigger sizes than those with small cells. The 
range varies from 20x20 to 40x40 pixels. The average size is around 25x25. See Figure 
3.1. At the preprocessing stage, the images are segmented to set the cell interior apart from 
the background. Features based on the interior of the cells are extracted from the images. 
The features include size, shape, color  and texture. See Table I for the list of features. 2 
Figure 1: Example of some of the cell images. 
3.2 CELL-BY-CELL CLASSIFICATION 
The features are fed into a nonlinear feed-forward neural network with 20 inputs, 15 hidden 
units with sigmoid transfer functions, and 14 sigmoid output units. A cross-entropy error 
A color image is decomposed into three intensity images - red, green and blue respectively 
2The red-blue distribution is the pixel-by-pixel log(red)- log(blue) distribution for pixels in cell 
interior. The red distribution is the distribution of the red intensity in cell interior. 
954 J Song, Y. Abu-Mostafa, J. Sill and H. Kasdan 
feature number 
feature description 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
cell area 
number of pixels on cell edge 
the 4th quantile of red-blue distribution 
the 4th quantile of green-red distribution 
the median of red-blue distribution 
the median of green-red distribution 
the median of blue-green distribution 
the standard deviation of red-blue distribution 
the standard deviation of green-red distribution 
the standard deviation of blue-green distribution 
the 4th quantile of red distribution 
the 4th quantile of green distribution 
the 4th quantile of blue distribution 
the median of red distribution 
the median of green distribution 
the median of blue distribution 
the standard deviation of red distribution 
the standard deviation of green distribution 
the standard deviation of blue distribution 
the standard deviation of the distance from the edge to the mass center 
function is used in order to give the output a probability interpretation. Denote the input 
feature vector as x, the network outputs a D dimensional vector ( D = 14 in our case) 
p = {p(dlx )), d = 1, ..., D, where p(dlx ) is 
p(dlx ) = Prob( a cell belongs to class d I feature x) 
The decision made at this stage is 
d(x)- argmax p(dlx) 
d 
3.3 COMBINING CONTEXTUAL INFORMATION 
The "no-context" cell-by-cell decision is only based on the features presented by a cell, 
without looking at any other cells. When human experts make decisions, they always look 
at the whole specimen, taking into consideration the identities of other cells and adjusting 
the cell-by-cell decision on a single cell according to the company it keeps. On top of the 
visual perception of the cell patterns, such as shape, color, size, texture, etc., comparisons 
and associations, either mental or visual, with other cells in the same specimen are made to 
infer the final decision. A cell is assigned a certain identity if the company it keeps supports 
that identity. For instance, the difference between lymphocyte and blast can be very subtle 
sometimes, especially when the cell is large. A large unusual mononuclear cell with the 
characteristics of both blast and lymphocyte is more likely to be a blast if surrounded by or 
accompanied by other abnormal cells or abnormal distribution of the cells. 
Incorpo rating Contextual Info rmation in White Blood Ceil Identification 955 
This scenario fits in the framework we described in section 2. The Combining Contextual 
Information algorithm was used as the post-precessing of the cell-by-cell decisions. 
3.4 OBSERVATIONS AND SIMPLIFICATIONS 
Direct implementation of the proposed algorithm is difficult due to the computational 
complexity. In the application of WBC identification, simplification is possible. We ob- 
served the following: First, we are primarily concerned with one class blast, the presence 
of which has clinical significance. Secondly, we only confuse blast with another class 
lymphocyte. In other words, for a potential blast, p(blast}x) 
0, p(any other classIx )  0. Finally, we are fairly certain about the classification of all 
other classes, i.e. p(a certain classIx )  1, p(any other class}x) m 0. Based on the above 
observations, we can simplify the algorithm, instead of doing an exhaustive search. 
Let p/a = p(ci = d}xi),i = 1,...,N. More specifically, let p/B = p(blast}xi), p/ = 
p(lymphocytelxi ) and p. -- p(class  Ix/) where * is neither a blast nor a lymphocyte. 
Suppose there are K potential blasts. Order the p, p2 
...,p: s in a descending manner 
over i, such that 
then the probability that there are k blasts is 
 K--k 
B B L L   I/L -- YL q- N 
PB(k) = Px ...P P+x."PK PK+X'"PJv p(vS -- , 
where v is the proportion of unambiguous lymphocytes and v3,..., VD are the proportions 
of the other cell types. 
We can compute the P (k)'s recursively. 
, K 
L L   
= Px ."PK PK+I'"PN p(VB -- O, FL -- VL q- ,/'3, "., VD) 
' K--k-1 . . \ 
P(k + = P.k) p(" = = + 
L ; ........... 
_- + 
for k - 1, ..., K-1, and 
PB(K) = Pf...PK PJC+S...Pv P(" = ," = ", Y3,.-.,"D) 
This way we only need to compute K terms to get P (k)'s. Pick the optimal number of 
blasts k* that maximizes P (k), k = 1, ..., K. 
An important step is to calculate p(vx,..., VD) which can be estimated from the database. 
3.5 THE ALGORITHM 
Step 1 Estimate p(vx, ..., UD) from the database, for d = 1, ..., D. 
Step 2 Compute the object-by-object "no context" a posteriori probability p(i}Xi), i = 
1, ..., N, and ci  {1, ..., D}. 
Step 3 Compute P (k) and find k* for k = 1, ..., K, and relabel the cells accordingly. 
956 35 Song, Y. Abu-Mostafa, J. Sill and H. Kasdan 
4 EMPIRICAL TESTING 
The algorithm has been intensively tested at IRIS, Inc. on the specimens obtained at Harbor 
UCLA medical center. We compared the performances with or without using contextual 
information on blood samples from 220 specimens (consisting of 13,200 cells). In about 
50% of the cases, a false alarm would have occurred had context not been used. Most cells 
are correctly classified, but a few are incorrectly labelled as immature cells, which raises a 
flag for the doctors. Change of the classification of the specimen to abnormal requires ex- 
pert intervention before the false alarm is eliminated, and it may cause unnecessary worry. 
When context is applied, the false alarms for most of the specimens were eliminated, and 
no false negative was introduced. 
methods cell normality false false 
classification identification positive negative 
no context 88%  50% 50% 0% 
with context 89%  90%  10% 0% 
Table 2: Comparison of with and without using contextual information 
5 CONCLUSIONS 
In this paper we presented a novel framework for incorporating contextual information into 
object identification, developed an algorithm to implement it efficiently, and applied it to 
white blood cell recognition. Empirical tests showed that the "with context" approach is 
significantly superior than the "no context" approach. The technique described could be 
generalized to a number of domains where contextual information plays an essential role, 
such a speech recognition, character recognition and other medical diagnosis regimes. 
Acknowledgments 
The authors would like to thank the members of Learning Systems Group at Caltech for 
helpful suggestions and advice: Dr. Amir Atiya, Zehra Cataltepe, Malik Magdon-Ismail, 
and Alexander Nicholson. 
References 
Richard, M.D., & Lippmann, R.P., (1991)Neural network classifiers estimate Bayesian a 
posteriori probabilities. Neural Computation 3. pp.461-483. Cambridge, MA: MIT Press. 
Kasdan, H.K., Pelmulder, J.P., Spolter, L., Levitt, G.B., Lincir, M.R., Coward, G.N., Haiby, 
S. I., Lives, J., Sun, N.C.J., & Deindoerfer, F.H., (1994) The WhiteltUS TM Leukocyte 
differential analyzer for rapid high-precision differentials based on images of cytoprobe- 
reacted cells. Clinical Chemistry. Vol. 40, No. 9, pp.1850-1861. 
Haralick, R.M., & Shapiro, L.G.,(1992),Computer and Robot Vision, Vol. I, Addison- 
Welsley. 
Aus, H. A., Harms, H., ter Meulen, V., & Gunzer, U. (1987) Statistical evaluation of com- 
puter extracted blood cell features for screening population to detect leukemias. In Pierre 
A. Devijver and Josef Kittler (eds.) Pattern Recognition Theory and Applications, pp. 509- 
518. Springer-Verlag. 
Kittler, J., (1987) Relaxation labelling. In Pierre A. Devijver and Josef Kittler (eds.) Pattern 
Recognition Theory and Applications, pp. 99-108. Springer-Verlag. 
