An Efficient CBIR Approach for Diagnosing the Stages of Breast Cancer Using KNN Classifier

Bonfring International Journal of Advances in Image Processing, Volume 2, Issue 1, 2012
Bonfring International Journal of Advances in Image Processing, Vol. 2, No. 1, March 2012. ISSN 2277-503X. © 2012 Bonfring.

Abstract--- This paper proposes a mammogram image retrieval technique based on a pattern similarity scheme. Doctors compare previous and current mammogram images associated with pathologic conditions to diagnose the actual stage of breast cancer. Lack of awareness and of screening programs contributes to breast cancer deaths, and early detection is the best way to reduce the deaths-per-incidence ratio. Mammography is the best of the techniques currently used for diagnosing breast cancer. In this paper, the retrieval process is divided into four distinct parts: feature extraction, kNN classification, pattern instantiation, and computation of pattern similarity. In the feature extraction step, low-level texture features such as entropy, homogeneity, contrast, energy, correlation and run length matrix features are extracted. The extracted features are classified using a K-Nearest Neighbor classifier to differentiate normal tissue from abnormal tissue. Each resulting group is considered a pattern. Finally, pattern similarity is estimated in order to retrieve images based on their similarity to the query image. This scheme can be effectively applied in Content Based Image Retrieval systems to retrieve images from large databases and identify the actual stage of breast cancer. If cancer is found in its early stages, it can be cured.

Keywords--- Cancer Stage, Content Based Image Retrieval (CBIR), KNN Classifier, Pattern

I. INTRODUCTION

The increasing trend in cancer-related deaths has forced humanity to work harder on cancer detection and treatment. Cancer is a leading health problem in India, with approximately 1 million cases occurring each year. Breast cancer is one of the most common cancers and the second most frequent cause of cancer-related deaths among women [1].
Breast cancer is a malignant tumor that develops from the ductal and lobular cells of the breast. A malignant tumor is a group of cancer cells that invades surrounding tissues and can also spread to other parts of the body. The main risk factors for breast cancer are later age at childbirth, fewer children, shorter duration of breastfeeding, fear of self-examination, fear of chemotherapy, and a substantially increased consumption of fatty foods. Women aged 40-69 years are at higher risk of breast cancer [9]. The World Health Organization reports that more than 1.2 million people are diagnosed with breast cancer worldwide every year [16].

For the past 20 years, breast cancer death rates have remained steady even though the number of new cases has grown, because of earlier detection and better treatments [2]. Breast cancer is the most frequently diagnosed cancer among women in the US and other developed countries. The deaths-per-incidence ratio is higher in India, at almost 50%, compared with 30% in China and 18% in the US [17]. This suggests that breast cancer is not detected early enough in India, which could be due to a lack of screening programs and the lack of a culture of frequent self-examination or breast awareness. Therefore, even with the best treatments, the breast cancer death rate is high in India. Breast cancer can be diagnosed at an early stage with the help of a mammogram, which is a low-dose X-ray of the breast. Early detection is needed to cure breast cancer. The early detection technique used in [3] detects tumors from mammogram images.

Jini.R. Marsilin, Department of Computer Science & Engineering, Dr.Sivanthi Aditanar College of Engineering, Tiruchendur, India.
Dr.G. Wiselin Jiji, Professor & HOD, Department of Computer Science & Engineering, Dr.Sivanthi Aditanar College of Engineering, Tiruchendur, India.
The proposed scheme aims to reduce mortality among women due to breast cancer by identifying the tumor at its initial stage using content based image retrieval (CBIR), so that treatment can begin at the appropriate time. There is increasing interest in the use of CBIR techniques to diagnose the stage of breast cancer by identifying similar past cases. CBIR is the process of retrieving desired images from a large collection based on features (such as color, texture and shape) that are automatically extracted from the images themselves. SIMPLIcity [4] and FIRE [5] are widely used CBIR systems. Hierarchical clustering and K-Means clustering [6] work faster and retrieve better-favored image results. The CBIR approach used in [8] for retrieving radiographic images does not achieve satisfactory efficiency on large image retrieval tasks. The similarity learning approach to content based image retrieval applied to digital mammography [10] requires prior knowledge about the dataset. These approaches also introduce constraints on the semantics required for the image retrieval task. The main purpose of the proposed scheme is to develop a dedicated CBIR system that predicts the actual stage of breast cancer by comparing the query image with the database images, and thereby to reduce the death rate.

II. BREAST CANCER STAGING

Tumor size, lymph node involvement, tumor grade, and whether the cancer has spread to other parts of the body beyond the breast are used to categorize the stage of breast cancer. Staging information helps the doctor to understand the disease and make decisions about treatment.
Stage 0 describes noninvasive breast cancers, for which there is no evidence of cancer cells breaking out of the part of the breast in which they started. Stage I describes invasive breast cancer in which the tumor measures up to 2 cm and no lymph nodes are involved. Stage II (A or B) describes invasive breast cancer in which the tumor measures 2 to 5 cm; the cancer may or may not have spread to the lymph nodes. Stage III (A, B, or C) is the advanced stage: the cancer may be of any size and has spread to lymph nodes within the breast itself and to the chest wall and/or skin of the breast. In Stage IV, the cancer has spread to the lymph nodes and also to other parts of the body, most often the bones, lungs, liver, or brain. Stages I, IIA, IIB, and IIIA are considered early-stage breast cancer.

A study of 117 breast cancer patients found that 51.3% of cases were in clinical stage II, 21.4% in clinical stage III and 11.1% in clinical stage IV [7]. This reinforces the need for improved screening techniques and for greater awareness among women of the potential risk of breast cancer, so that it can be detected early. The proposed scheme allows the development of content based image retrieval systems that retrieve images based on their similarity to the query image and identify the correct stage of breast cancer.

III. METHODOLOGY

We propose a CBIR approach using a pattern similarity scheme to diagnose the actual stage of breast cancer from mammographic images. The retrieval process is illustrated by the flowchart in Figure 1.

Figure 1: Block Diagram of Proposed Image Retrieval System (blocks: query mammogram image, feature extraction, KNN classification, pattern instantiation, pattern similarity, pattern base, medical image database, retrieved image)

Low-level features are extracted from the mammogram image. The extracted features are then classified using k-Nearest Neighbor (KNN) classification. Each resulting group is considered a pattern. The distances between the structure components and between the measure components of two patterns are computed, and the similarity between the patterns is estimated using the distance measures of both components.
Using these similarity measures, the images most similar to the query image are retrieved. The retrieved image is used to identify the actual stage of breast cancer.

A. Pattern Base (PB)

The pattern base keeps the information on the patterns extracted from the images. It consists of 3 basic layers. A pattern type defines the description of the pattern structure. A pattern is an instance of the corresponding pattern type. A class is a collection of patterns of the same pattern type. A pattern type PT is defined as a pair PT = {SS, MS}, or p = {s, m}, where SS is the structure schema, which describes the pattern space, and MS is the measure schema, which quantifies the quality of the source data representation. A pattern type PT is called complex if its structure schema SS includes another pattern type; otherwise PT is called simple [11].

B. Low-Level Image Feature Extraction

Feature extraction is the process of determining the relevant content of the images. Color, shape, and texture [12] are the important features commonly used in CBIR. Texture features play an important role in medical image interpretation. Image texture is a function of the spatial variation in pixel intensities (gray values) in a spatial neighborhood. A textured region is a connected set of pixels satisfying a given gray level property. Texture analysis is used in applications such as remote sensing, automated inspection, and medical image processing. The co-occurrence matrix and the run length matrix are used as texture analysis tools. In this paper, entropy, contrast, energy, correlation, homogeneity and run length matrix texture features are used. The entropy, contrast, energy, correlation and homogeneity features are calculated from the gray level co-occurrence matrix. The run length features are computed from the run length matrix.

Figure 2: Co-Occurrence Matrix Directions

The spatial gray level co-occurrence matrix estimates image properties related to second-order statistics.
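As a minimal sketch of the co-occurrence features named above (not the authors' implementation), the following Python code builds a normalized co-occurrence matrix for one of the four directions of Figure 2 and derives contrast, energy, correlation, homogeneity and entropy from it using their standard definitions. The function names, the unsymmetrized matrix, the natural logarithm and the (row step, column step) offset convention are assumptions made for illustration.

```python
import math

# (row step, column step) of the neighbor for each direction in degrees.
# This convention (image rows growing downward) is an assumption.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, levels, angle=0):
    """Normalized co-occurrence matrix of a 2-D list of gray levels in [0, levels)."""
    dr, dc = OFFSETS[angle]
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts)) or 1  # avoid dividing by zero on tiny images
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Contrast, energy, correlation, homogeneity and entropy of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    if sd_i > 0 and sd_j > 0:
        correlation = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells) / (sd_i * sd_j)
    else:
        correlation = float("nan")  # undefined when a marginal has zero variance
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "correlation": correlation,
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
    }


# Toy 2x2 image with two gray levels, neighbor taken in the 90 degree direction.
features = glcm_features(glcm([[0, 1], [0, 1]], levels=2, angle=90))
```

In practice the features from the four directions are often averaged or concatenated into one feature vector; which of these the paper uses is not stated, so the sketch leaves that choice open.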
The gray level co-occurrence matrix (GLCM) [13] records how often different combinations of pixel intensity values (gray levels) occur in an image, and so captures the spatial dependence of gray level values within the image. It is also known as the gray-level spatial dependence matrix. The co-occurrence matrix is often formed using a set of offsets at 0, 45, 90, and 135 degrees. The offset, often expressed as an angle, specifies the spatial relationship between the pixel of interest and its neighbor.

In the following, $C_d(i, j)$ is the entry of the co-occurrence matrix for gray levels $i$ and $j$ at offset $d$, and $\mu_i$, $\mu_j$, $\sigma_i$, $\sigma_j$ are the means and standard deviations of its row and column marginal distributions.

Contrast is related to the dynamic range of gray levels in an image. It measures the intensity contrast between the pixel of interest and its neighbor.

$$\mathrm{Contrast} = \sum_i \sum_j (i - j)^2 \, C_d(i, j) \tag{1}$$

Energy, also known as the angular second moment, is the sum of squared elements in the GLCM.

$$\mathrm{Energy} = \sum_i \sum_j C_d(i, j)^2 \tag{2}$$

Correlation estimates how correlated a pixel is to its neighbor over the whole image.

$$\mathrm{Correlation} = \sum_i \sum_j \frac{(i - \mu_i)(j - \mu_j) \, C_d(i, j)}{\sigma_i \sigma_j} \tag{3}$$

Homogeneity measures the closeness of the distribution of elements in the GLCM to the GLCM diagonal.

$$\mathrm{Homogeneity} = \sum_i \sum_j \frac{C_d(i, j)}{1 + |i - j|} \tag{4}$$

Entropy is an inverse measure of homogeneity.

$$\mathrm{Entropy} = -\sum_i \sum_j C_d(i, j) \log C_d(i, j) \tag{5}$$

In the run length matrix, each element $p(i, j)$ represents the number of runs with pixels of gray level intensity $i$ and run length $j$ along a specific orientation. For a given image, a gray level run is a set of consecutive, collinear pixels having the same gray level. The length of the run is the number of pixels in the run [14].

Short Run Emphasis (SRE) measures the distribution of short runs.

$$\mathrm{SRE} = \frac{1}{n_r} \sum_{i=1}^{M} \sum_{j=1}^{N} \frac{p(i, j)}{j^2} \tag{6}$$

Long Run Emphasis (LRE) measures the distribution of long runs.
$$\mathrm{LRE} = \frac{1}{n_r} \sum_{i=1}^{M} \sum_{j=1}^{N} p(i, j) \, j^2 \tag{7}$$

Gray-Level Nonuniformity (GLN) measures the similarity of gray level values throughout the image.

$$\mathrm{GLN} = \frac{1}{n_r} \sum_{i=1}^{M} \left( \sum_{j=1}^{N} p(i, j) \right)^2 \tag{8}$$

Run Length Nonuniformity (RLN) measures the similarity of the lengths of runs throughout the image.

$$\mathrm{RLN} = \frac{1}{n_r} \sum_{j=1}^{N} \left( \sum_{i=1}^{M} p(i, j) \right)^2 \tag{9}$$

Low Gray-Level Run Emphasis (LGRE) measures the distribution of low gray level values.

$$\mathrm{LGRE} = \frac{1}{n_r} \sum_{i=1}^{M} \sum_{j=1}^{N} \frac{p(i, j)}{i^2} \tag{10}$$

High Gray-Level Run Emphasis (HGRE) measures the distribution of high gray level values.

$$\mathrm{HGRE} = \frac{1}{n_r} \sum_{i=1}^{M} \sum_{j=1}^{N} p(i, j) \, i^2 \tag{11}$$

Short Run Low Gray-Level Emphasis (SRLGE) measures the joint distribution of short runs and low gray level values.

$$\mathrm{SRLGE} = \frac{1}{n_r} \sum_{i=1}^{M} \sum_{j=1}^{N} \frac{p(i, j)}{i^2 \, j^2} \tag{12}$$

Short Run High Gray-Level Emphasis (SRHGE) measures the joint distribution of short runs and high gray level values.

$$\mathrm{SRHGE} = \frac{1}{n_r} \sum_{i=1}^{M} \sum_{j=1}^{N} \frac{p(i, j) \, i^2}{j^2} \tag{13}$$

Long Run Low Gray-Level Emphasis (LRLGE) measures the joint distribution of long runs and low gray level values.

$$\mathrm{LRLGE} = \frac{1}{n_r} \sum_{i=1}^{M} \sum_{j=1}^{N} \frac{p(i, j) \, j^2}{i^2} \tag{14}$$

Long Run High Gray-Level Emphasis (LRHGE) measures the joint distribution of long runs and high gray level values.

$$\mathrm{LRHGE} = \frac{1}{n_r} \sum_{i=1}^{M} \sum_{j=1}^{N} p(i, j) \, i^2 \, j^2 \tag{15}$$

Here $M$ is the number of gray levels, $N$ is the maximum run length, $n_r$ is the total number of runs, and $n_p$ is the number of pixels in the image.

C. KNN Classifier

K-nearest neighbor (kNN) classification [15] finds the group of k training tuples (the k nearest neighbors) in the training set that are closest to the unknown tuple. To classify an unlabeled tuple, its distance to each labeled tuple is computed to identify its k nearest neighbors, and the most common class label among these neighbors is assigned to the unknown tuple. The k-nearest neighbor algorithm (k-NN) is a lazy learning method: unknown tuples are classified by their closeness to known tuples according to some distance/similarity function.
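The classification rule described in this subsection can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: it uses Euclidean distance as the distance function, and the function names, the toy two-dimensional feature vectors and the "normal"/"abnormal" labels are hypothetical.

```python
import math
from collections import Counter


def euclidean(x1, x2):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))


def knn_classify(train, query, k=3):
    """Label `query` with the majority class of its k nearest training tuples.

    `train` is a list of (feature_vector, label) pairs.
    """
    # sorted() is stable, so equally distant tuples keep their training order.
    neighbors = sorted(train, key=lambda t: euclidean(t[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # On a tie, most_common() returns the label that was counted first,
    # i.e. the tied label whose first neighbor is closest to the query.
    return votes.most_common(1)[0][0]


# Hypothetical texture feature vectors, for illustration only.
train = [
    ([0.10, 0.20], "normal"),
    ([0.20, 0.10], "normal"),
    ([0.90, 0.80], "abnormal"),
    ([0.80, 0.90], "abnormal"),
    ([0.85, 0.85], "abnormal"),
]
label = knn_classify(train, [0.15, 0.15], k=3)
```

Because kNN is a lazy learner, there is no training step: all of the work happens at query time, and the cost of one classification grows linearly with the size of the training set in this naive form.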
Euclidean distance is used as the distance metric. The Euclidean distance between two tuples $X_1$ and $X_2$ is:

$$dis(X_1, X_2) = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2} \tag{16}$$

Once the nearest-neighbor list is obtained, the test tuple is classified based on the majority class of its nearest neighbors. If k = 1, the unknown tuple is simply assigned the class of its nearest neighbor.

D. KNN Algorithm

Input: the set of training tuples and an unlabeled test tuple.
Process: Compute the distance between the unlabeled test tuple and each training tuple. Select the k training tuples closest to the unlabeled tuple (its k nearest neighbors).
Output: the test tuple labeled with the majority class of its nearest neighbors.

E. Pattern Instantiation

After classification, each group is considered a pattern. $\mathrm{Specimen}_i$ is instantiated for each pattern $P_i$, representing a physical anatomic specimen in a medical image:

$$\mathrm{Specimen}_i = \left( SS: \left( D: [\mu: [\mathrm{Real}],\ \sigma: [\mathrm{Real}]]_1^N \right),\ MS: \left( pp: [\mathrm{Real}],\ SV: [\mathrm{Real}] \right) \right) \tag{17}$$

where the structure schema SS is represented by the pair $(\mu, \sigma)$, the mean and standard deviation of each of the $N$ distributions $D$, and the measure schema MS is represented by two values, the prior probability (pp) and the scatter value (SV) of $P_i$. The prior probability pp is defined as the fraction of the feature vectors of the image that belong to pattern $P_i$. SV is a measure of the cohesiveness of the data items in a group with respect to the centroid of that group. A low SV indicates good scatter quality.

F. Pattern Similarity

The pattern similarity is computed using the distances between the structure components and between the measure components of two simple patterns $P_1$ and $P_2$. When comparing two medical images, MI1 and MI2, the component patterns of MI1 must be associated with the component patterns of MI2.
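The pattern instantiation step of subsection E can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function name and the dictionary layout are hypothetical, and because the text does not give a formula for the scatter value, the sketch assumes SV is the mean squared Euclidean distance of a group's feature vectors to the group centroid.

```python
import math
from collections import defaultdict


def instantiate_patterns(vectors, labels):
    """Build one pattern per class label from classified feature vectors.

    Each pattern holds the structure part (per-feature mean `mu` and standard
    deviation `sigma`) and the measure part (prior probability `pp` and
    scatter value `SV`).
    """
    groups = defaultdict(list)
    for vector, label in zip(vectors, labels):
        groups[label].append(vector)

    patterns = {}
    for label, members in groups.items():
        n, dims = len(members), len(members[0])
        mu = [sum(m[d] for m in members) / n for d in range(dims)]
        sigma = [
            math.sqrt(sum((m[d] - mu[d]) ** 2 for m in members) / n)
            for d in range(dims)
        ]
        # Assumed definition of SV: mean squared distance to the centroid.
        sv = sum(sum((m[d] - mu[d]) ** 2 for d in range(dims)) for m in members) / n
        patterns[label] = {
            "mu": mu,
            "sigma": sigma,
            "pp": n / len(vectors),  # fraction of all vectors in this pattern
            "SV": sv,
        }
    return patterns


# Hypothetical feature vectors and class labels, for illustration only.
patterns = instantiate_patterns(
    [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]],
    ["abnormal", "abnormal", "normal"],
)
```

With this definition a tight group yields a small SV, which matches the statement that a low SV indicates good scatter quality.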
The distance between the measure components, $dis_{meas}(P_1, P_2)$, is computed from the products of prior probability and scatter value of the two patterns, $P_1.pp \cdot P_1.SV$ and $P_2.pp \cdot P_2.SV$, together with the scatter values $P_1.SV$ and $P_2.SV$ (Equation 18).

To find the structural similarity between $P_1$ and $P_2$, first compute the standardized difference $d$ between two distributions using Cohen's distance metric:

$$d(D_1, D_2) = \begin{cases} \dfrac{|D_1.\mu - D_2.\mu|}{\sqrt{(D_1.\sigma^2 + D_2.\sigma^2)/2}}, & \text{if } D_1.\sigma \neq 0 \text{ or } D_2.\sigma \neq 0 \\[2ex] |D_1.\mu - D_2.\mu|, & \text{otherwise} \end{cases} \tag{19}$$

If $d = 0$, the distributions are identical. A low value of $d$ indicates quite similar distributions and a high value indicates quite dissimilar distributions. The structural distance between two sets of distributions is the result of an aggregate function over the per-distribution differences:

$$dis_{struct}(P_1, P_2) = g_{aggr}\left\{ d\left(D_j^1, D_j^2\right) \right\}, \quad j = 1, 2, \ldots, N \tag{20}$$

The distance between two patterns, $dis(P_1, P_2)$, combines the structural distance $dis_{struct}(P_1, P_2)$ with the term $(1 - dis_{struct}(P_1, P_2)) \cdot dis_{meas}(P_1, P_2)$ (Equation 21).

To compare two medical images MI1 and MI2, a coupling methodology between the different patterns of each image is adopted:

$$dis(MI_1, MI_2) = \frac{1}{M \cdot K} \sum_{i=1}^{M} \sum_{j=1}^{K} dis\left(P_i^{MI_1}, P_j^{MI_2}\right) \tag{22}$$

$M$ and $K$ are the numbers of constituent simple patterns of each image. The final outcome is the average over all possible matchings.

IV. RESULTS

The experiments are performed with mammogram images taken randomly from patients of different ages with pathologic conditions. A mammogram image shows the internal structure of the breast. Texture features are commonly used for differentiating normal and abnormal tissues. Abnormal tissues in the breast (tumors) had higher contrast than normal tissues.

Figure 3: (a) Query Image, (b) After Classification
Figure 4: Detected Tumor in Stage I
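The structural part of the pattern similarity computation in subsection F can be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: a pattern is represented as a hypothetical list of (mean, standard deviation) pairs, the aggregate function is assumed to be the arithmetic mean, the fallback used when both standard deviations are zero is an assumption, and the measure distance is left out, so the pattern distance passed to `image_distance` defaults to the structural distance alone.

```python
import math


def cohen_d(mu1, sigma1, mu2, sigma2):
    """Standardized difference between two distributions (Cohen's d)."""
    if sigma1 == 0 and sigma2 == 0:
        # Assumed fallback: plain absolute difference of the means.
        return abs(mu1 - mu2)
    return abs(mu1 - mu2) / math.sqrt((sigma1 ** 2 + sigma2 ** 2) / 2)


def structural_distance(p1, p2):
    """Aggregate (assumed here: the mean) of the per-feature Cohen's d values.

    Each pattern is a list of (mean, standard deviation) pairs, one per feature.
    """
    ds = [cohen_d(m1, s1, m2, s2) for (m1, s1), (m2, s2) in zip(p1, p2)]
    return sum(ds) / len(ds)


def image_distance(patterns1, patterns2, pattern_dis=structural_distance):
    """Average pattern distance over all M * K pairs of patterns of two images."""
    total = sum(pattern_dis(a, b) for a in patterns1 for b in patterns2)
    return total / (len(patterns1) * len(patterns2))


# Hypothetical images: one pattern in the first, two patterns in the second.
query_image = [[(0.0, 1.0)]]
database_image = [[(1.0, 1.0)], [(3.0, 1.0)]]
distance = image_distance(query_image, database_image)
```

Retrieval then amounts to computing this distance between the query image and every database image and returning the images with the smallest values. Passing a different `pattern_dis` function is one way to plug in a distance that also accounts for the measure components.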