Infected Fruit Part Detection using K-Means Clustering Segmentation Technique

Abstract: Nowadays, overseas commerce has increased drastically in many countries, and many fruits, such as oranges and apples, are imported from other nations. Manual identification of defective fruit is very time consuming. This work presents a novel defect segmentation of fruits based on color features with the unsupervised K-means clustering algorithm, applied to color images of fruits. Defect segmentation is carried out in two stages. First, the pixels are clustered based on their color and spatial features. Then the clustered blocks are merged into a specific number of regions. This two-step procedure increases computational efficiency by avoiding feature extraction for every pixel in the fruit image. Although color is not commonly used for defect segmentation, it provides high discriminative power between different regions of the image. This approach thus provides a feasible, robust solution for the defect segmentation of fruits. We have taken the apple as a case study and evaluated the proposed approach on defective apples. The experimental results demonstrate the effectiveness of the proposed approach in improving defect segmentation quality in terms of precision and computational time, and show that the approach is promising.


I. INTRODUCTION
Digital images are one of the most important media for conveying information. Extracting information from images and understanding it, so that the extracted information can be used for various tasks, is an important aspect of machine learning. Using images for robot navigation is one example. Other applications, such as extracting malignant tissue from body scans, form an integral part of medical diagnosis. Image segmentation is one of the first steps toward understanding images and finding the different objects in them.
Modern agricultural science and technology are highly advanced. The value of fruit depends on its quality, so how to assess fruit quality is an important issue in agricultural science and technology. The classical approach to fruit quality assessment relies on experts and is very time consuming. Defect segmentation of fruits can be seen as an instance of image segmentation in which we are interested only in the defective portion of the image.
Image segmentation is the division of an image into regions with similar attributes; in other words, it is a form of pixel classification. How difficult the segmentation is depends mostly on the particular problem being solved. Segmentation is an important operation for the meaningful interpretation and analysis of acquired images. It is one of the most crucial components of image analysis and pattern recognition, and it is still considered one of the most challenging tasks in image processing and image analysis. It has applications in several areas, such as the analysis of remotely sensed images, medical science, traffic system monitoring, and fingerprint recognition.
Image segmentation methods are generally based on one of two fundamental properties of the intensity values of image pixels: similarity and discontinuity. In the first category, the image is partitioned into regions such that the pixels belonging to a region are similar according to a set of predefined criteria. In the second category, the image is partitioned on the basis of abrupt changes in intensity values. Edge detection, which is similar to boundary extraction, is an example of this category. Researchers have worked on these two approaches for years and have proposed various methods based on these properties, but there is still no single standard approach to image segmentation. The many segmentation methods introduced on the basis of the discontinuity or similarity criteria can be broadly classified into six categories: (1) histogram-based methods, (2) edge detection, (3) neural network based methods, (4) physical model based approaches, (5) region-based methods (region splitting, region growing and merging), and (6) clustering (fuzzy C-means and K-means clustering).
Histogram-based image segmentation techniques are computationally very efficient compared with other image segmentation techniques because they usually require only a single pass through the image pixels. In this technique, a histogram is computed from all of the image pixels, and the peaks and valleys of the histogram are detected. The pixels whose values lie between two consecutive peaks can then be assigned to a single cluster. One disadvantage of this method is that it performs poorly when the gray-level histogram of the image has no clear peaks. Another disadvantage is that the continuity of the segmented regions cannot be ensured. For histogram-based segmentation to be effective, it should focus on global peaks, which are likely to correspond to the dominant image regions.
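The peak-and-valley procedure can be sketched in a few lines of Python. This is a minimal illustration, not a method from this paper: the name `histogram_segment`, the rule that a peak is a local maximum of the raw histogram, and the choice of the least-populated bin between two consecutive peaks as the cut point are all assumptions. Because no smoothing is applied, a noisy histogram produces spurious peaks, which is why the method should concentrate on global peaks.

```python
from collections import Counter

def histogram_segment(image, levels=256):
    """Label pixels by the histogram interval (between valleys) they fall into.

    `image` is a list of rows of integer gray levels in [0, levels).
    Peaks are local maxima of the raw histogram (an illustrative rule); the
    cut between two consecutive peaks is the lowest bin between them.
    """
    counts = Counter(v for row in image for v in row)
    hist = [counts.get(g, 0) for g in range(levels)]
    # A peak is a non-empty bin that is >= its left neighbour and > its right one.
    peaks = [
        g for g in range(levels)
        if hist[g] > 0
        and (g == 0 or hist[g] >= hist[g - 1])
        and (g == levels - 1 or hist[g] > hist[g + 1])
    ]
    # Valley between consecutive peaks: the least-populated bin (first one on ties).
    valleys = [min(range(p, q + 1), key=lambda g: hist[g])
               for p, q in zip(peaks, peaks[1:])]
    # Cluster index = number of valleys lying strictly below the pixel value.
    return [[sum(v > t for t in valleys) for v in row] for row in image]
```

For example, `histogram_segment([[10, 10, 10], [200, 200, 200]])` finds two peaks and returns one cluster per row.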
Edge detection is one of the most widely used approaches to image segmentation. It works by detecting points at which the gray level changes abruptly. A disadvantage of edge detection is that it does not work well when there are many edges in the image, because the technique then produces an over-segmented output and cannot easily identify a boundary or closed curve. For an edge-based segmentation method to be effective, it should identify the global edges, and these edges must be continuous.
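As a minimal sketch of the discontinuity approach, the function below marks pixels whose gradient magnitude, estimated with the standard 3x3 Sobel kernels, exceeds a threshold. The Sobel operator, the threshold value, and the name `sobel_edges` are illustrative choices, not part of the proposed method. The output is a set of edge pixels rather than closed boundaries; linking them into closed curves is a separate step, and that is where edge-based segmentation tends to struggle.

```python
import math

def sobel_edges(image, threshold):
    """Return a 0/1 edge map: 1 where the Sobel gradient magnitude > threshold.

    `image` is a list of rows of gray levels; border pixels are left as 0.
    """
    h, w = len(image), len(image[0])
    gx_k = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal derivative
    gy_k = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical derivative
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(gx_k[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(gy_k[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            if math.hypot(gx, gy) > threshold:
                edges[y][x] = 1
    return edges
```

On an image whose left two columns are 0 and right three columns are 100, the interior pixels on either side of the step are marked as edges, while flat areas are not.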
Neural network based image segmentation relies on processing small regions of an image using an artificial neural network or a set of different networks. A decision-making step then labels the regions of the image according to the category recognized by the network. The Kohonen self-organizing map is a type of network well suited to this kind of problem.
The physical model based image segmentation technique assumes that the individual regions of an image follow a recurring geometrical structure. These methods use texture features.
Region-based image segmentation uses the similarity of pixels within a region of an image. Hybrid methods that combine region-based and edge-based techniques have proved very useful for some applications. Seeded region growing is one of the best-known region growing methods.
Clustering-based image segmentation methods are also used by many researchers [1] [2]. Segmentation methods based on clustering face great difficulty in determining the number of clusters present in the feature space and in extracting appropriate features. Nevertheless, this type of image segmentation is widely used because it is simple to understand and often gives accurate results.
This paper presents an efficient image segmentation approach that applies the K-means clustering technique to color features of the images. Defect segmentation is carried out in two stages. First, the pixels are clustered based on their color and spatial features. Then the clustered blocks are merged into a specific number of regions. This two-step procedure increases computational efficiency by avoiding feature extraction for every pixel in the fruit image. Although color is not commonly used for defect segmentation, it provides high discriminative power between different regions of the image.
The rest of the paper is organized as follows: Section 2 presents a brief overview of related work. Section 3 describes the K-means clustering method. Section 4 presents and discusses the proposed method for color-based defect segmentation of fruits using K-means clustering. Section 5 demonstrates the experimental results obtained with the apple as a case study. Finally, Section 6 concludes with some final remarks.

II. A BRIEF OVERVIEW OF RELATED WORK
Color image segmentation has been a difficult task for researchers over the past two decades. It is an essential operation in image processing and in many computer vision, pattern recognition, and image interpretation systems, with applications in industrial and scientific fields such as remote sensing, microscopy, medicine, content-based image and video retrieval, industrial automation, document analysis and quality control [3]. The quality of color image segmentation can significantly influence the quality of an image understanding system [4]. A detailed review of various image segmentation techniques is provided by Pal and Pal [5].
Among the many existing segmentation techniques, a large number use unsupervised clustering. For example, image segmentation based on region merging is analogous to agglomerative clustering [6]. Graph cut methods, such as normalized cut and minimum cut, formulate clustering in graph-theoretic terms [7]. A major problem for such methods, known as the validity problem, is deciding the number of clusters in an image. Since this problem is essentially unresolved, most techniques require the user to provide a termination criterion.
Sowmya and Sheelarani [8] used soft computing techniques for color image segmentation, namely a competitive neural network and the possibilistic C-means (PCM) algorithm. Researchers have also used fuzzy set and fuzzy logic techniques to solve the segmentation problem. Borji et al. presented CLPSO-based fuzzy color image segmentation [9], and Cheng et al. used a fuzzy homogeneity approach for color image segmentation [10]. In addition, genetic algorithms (GA) and artificial neural networks (ANN) have been used for image segmentation [11].
Medical imaging uses various segmentation techniques, depending on the region of interest in the image, including region growing methods and atlas-guided techniques. Some of them are semi-automatic and still need some operator interaction; others are fully automatic, and the operator has only a verification role.
Fan et al. [12] proposed automatic image segmentation by integrating seeded region growing and color edge detection. They used fast entropy thresholding to extract the edges. After obtaining the color edges, which provide the major geometric structures in an image, they took the centroids between adjacent edge regions as the initial seeds. These seeds were then replaced by the centroids of the generated homogeneous edge regions as additional pixels were incorporated step by step.
Another method using seeded region growing was proposed by Adams and Bischof [13]. Shih and Cheng [14] proposed a region-based segmentation method in which initial seeds are selected based on the standard deviation in a pixel's neighborhood: a pixel is selected as a seed if this value is below a threshold. After seed selection, they applied region growing and region merging. As discussed above, color image segmentation has been widely studied by researchers.
The authors of [17], [18] used k-means clustering for background subtraction. They separated the region of interest (i.e. the foreground) from the background by forming two clusters, one for the foreground and one for the background. In the case of fruit diseases, more than one disease may be present at a time, so more than two clusters are needed to separate the infected part from the healthy fruit and the background.

III. K-MEANS CLUSTERING ALGORITHM
Clustering is an efficient method for food image processing. A clustering technique classifies objects into different groups; more specifically, it partitions a data set into clusters (subsets) so that the data in each cluster (ideally) share some common trait, often according to a defined distance measure. Data partitioning is a common technique in statistical data analysis and is used in many areas, including machine learning, image analysis, pattern recognition, bioinformatics and data mining. The computational task of partitioning a data set into k subsets is often referred to as unsupervised learning.
There are many approaches to clustering, designed for a wide variety of purposes. K-means is a typical clustering algorithm (MacQueen, 1967) [15]. It is generally used to determine the natural groupings of the pixels in an image. It is attractive in practice because it is straightforward and generally very fast. It partitions the input data set into k clusters. Each cluster is represented by an adaptively changing center (the cluster center), starting from initial values called seed points. K-means computes the distances between the input data points and the centers, and assigns each input to its nearest center.
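One iteration of the loop just described, assigning each input to its nearest center and then moving every center to the mean of its inputs, can be sketched as follows. The helper names are hypothetical, the seed points are passed in explicitly, and keeping an empty cluster's center in place is only one of several possible conventions.

```python
import math

def assign_to_nearest(points, centers):
    """Assignment step: index of the nearest center (Euclidean) for each point."""
    labels = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        labels.append(dists.index(min(dists)))
    return labels

def update_centers(points, labels, centers):
    """Update step: move each center to the mean of its assigned points.

    A center with no assigned points keeps its previous position.
    """
    new_centers = []
    for i, c in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == i]
        if members:
            new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new_centers.append(tuple(c))
    return new_centers

# Usage: two seed points, one assignment/update iteration.
points = [(0, 0), (1, 0), (10, 10), (11, 10)]
seeds = [(0, 0), (10, 10)]
labels = assign_to_nearest(points, seeds)          # -> [0, 0, 1, 1]
centers = update_centers(points, labels, seeds)    # -> [(0.5, 0.0), (10.5, 10.0)]
```

Repeating these two steps until the labels stop changing gives the full algorithm.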
K-means is an unsupervised clustering method that classifies the input data objects into multiple classes on the basis of their inherent distance from each other [16]. The algorithm assumes that the data features form a vector space and tries to identify natural clusters in it. The objects are clustered around the centroids $\mu_i$, $i = 1, \dots, k$, which are computed by minimizing the following objective:
$$V = \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2$$
where k is the number of clusters $S_i$, $i = 1, 2, \dots, k$, and $\mu_i$ is the mean point or centroid of all the points $x_j \in S_i$.
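The within-cluster sum of squared distances minimized by K-means, $V = \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2$, can be evaluated directly for a given partition, which is a simple way to check that successive iterations do not increase it. The sketch below assumes points and centroids are equal-length tuples and that `labels[j]` is the cluster index of point j; `kmeans_objective` is an illustrative name.

```python
def kmeans_objective(points, labels, centroids):
    """V = sum over clusters i of sum over x_j in S_i of ||x_j - mu_i||^2."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(x, centroids[lab]))
        for x, lab in zip(points, labels)
    )

pts = [(0, 0), (2, 0), (10, 10)]
print(kmeans_objective(pts, [0, 0, 1], [(1, 0), (10, 10)]))  # centroid at the mean -> 2
print(kmeans_objective(pts, [0, 0, 1], [(0, 0), (10, 10)]))  # centroid off the mean -> 4
```

For a fixed partition, V is smallest when each centroid is the mean of its cluster, which is why the update step uses the mean.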
As part of this work, we implemented an iterative version of the K-means algorithm. The algorithm requires a color image as input. The K-means clustering algorithm is as follows:
Step 1. Compute the distribution of the intensity values.
Step 2. Initialize the centroids with k random intensities.
Step 3. Repeat Steps 4 and 5 until the cluster labels no longer change.
Step 4. Cluster the image points based on the distance of their intensity values from the centroid intensity values:
$$c^{(i)} = \arg\min_{j} \lVert x^{(i)} - \mu_j \rVert^2$$
Step 5. Compute the new centroid of each cluster:
$$\mu_j = \frac{\sum_{i} 1\{c^{(i)} = j\}\, x^{(i)}}{\sum_{i} 1\{c^{(i)} = j\}}$$
Here k is the number of clusters, i iterates over all the intensity values $x^{(i)}$, $c^{(i)}$ is the cluster assigned to $x^{(i)}$, j iterates over all the centroids (one per cluster), $1\{\cdot\}$ is the indicator function, and $\mu_j$ are the centroid intensities.
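Steps 1 to 5 can be sketched for a gray-level image as below. This is our reading of the steps, not the authors' code: `kmeans_intensity` is a hypothetical name, the image is given as a flat list of integer intensities, and the Step 1 distribution is used so that each distinct intensity is processed once and weighted by its pixel count. The random initialization uses a fixed seed so that runs are reproducible.

```python
import random
from collections import Counter

def kmeans_intensity(pixels, k, max_iter=100, seed=0):
    """K-means on gray-level intensities, following Steps 1-5.

    `pixels` is a flat list of integer intensities; returns (centroids, labels)
    with one label per input pixel.
    """
    # Step 1: distribution of intensity values (intensity -> pixel count).
    hist = Counter(pixels)
    levels = sorted(hist)
    # Step 2: k distinct random intensities as the initial centroids.
    rng = random.Random(seed)
    centroids = [float(v) for v in rng.sample(levels, k)]
    assignment = None
    for _ in range(max_iter):
        # Step 4: assign each intensity level to its nearest centroid.
        new_assignment = {
            v: min(range(k), key=lambda j: abs(v - centroids[j])) for v in levels
        }
        # Step 3: stop once the labels no longer change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 5: each centroid becomes the count-weighted mean of its levels.
        for j in range(k):
            members = [v for v in levels if assignment[v] == j]
            n = sum(hist[v] for v in members)
            if n:  # an empty cluster keeps its previous centroid
                centroids[j] = sum(v * hist[v] for v in members) / n
    return centroids, [assignment[p] for p in pixels]
```

Working on the histogram rather than on individual pixels makes each iteration cost proportional to the number of distinct gray levels, at most 256 for 8-bit images, instead of the number of pixels.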

IV. DEFECT SEGMENTATION
Image segmentation using the k-means algorithm is quite useful for image analysis. An important goal of image segmentation is to separate the object from the background clearly, even when the image has blurred boundaries. Defect segmentation of fruits can be seen as an instance of image segmentation in which the number of segments is not known in advance. Figure 1 shows the framework for fruit defect segmentation.
The basic aim of the proposed approach is to segment colors automatically using the K-means clustering technique and the L*a*b* color space. The introduced defect segmentation framework operates in six steps, as follows:
Step 1. Read the input image of the defective fruit.
Step 2. Transform the image from the RGB to the L*a*b* color space. We used the L*a*b* color space because it consists of a luminosity layer, 'L*', and two chromaticity layers, 'a*' and 'b*'. Using the L*a*b* color space is computationally efficient because all of the color information is in the 'a*' and 'b*' layers, so colors can be clustered in a two-dimensional space rather than in the three dimensions of RGB.
Step 3. Classify the colors using K-means clustering in the 'a*b*' space. The Euclidean distance metric is used to measure the difference between two colors.
Step 4. Label each pixel in the image using the results of K-means. For every pixel in the input, K-means computes an index corresponding to a cluster, and every pixel of the image is labeled with its cluster index.
Step 5. Generate images that segment the input image by color. Using the pixel labels, we separate the pixels of the image by color, which results in one image per cluster. Because K-means does not return the same cluster index values on every run, the index of the cluster containing the defective part of the fruit must be determined programmatically. This can be done using the cluster centers, which contain the mean 'a*' and 'b*' values of each cluster.
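The per-pixel pieces of Steps 2 to 5 can be sketched in plain Python as below; the K-means clustering of Step 3 itself can be any standard implementation run on the (a*, b*) pairs. This is a minimal sketch under stated assumptions, not the authors' implementation: the paper does not say which RGB space or white point was used, so 8-bit sRGB input with a D65 reference white is assumed; all function names are hypothetical; and sorting clusters by a* is only one arbitrary way to obtain repeatable cluster indices. Deciding which of the (now stable) indices holds the defect still depends on the colors of the fruit and the disease.

```python
import math

def rgb_to_lab(rgb):
    """Step 2 (assumes sRGB input, D65 white): 8-bit (R, G, B) -> (L*, a*, b*)."""
    def to_linear(c8):
        # Undo the sRGB transfer curve (gamma expansion); input range 0..255.
        c = c8 / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (to_linear(c) for c in rgb)
    # Linear sRGB -> CIE XYZ using the standard D65 matrix.
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white

    def f(t):
        # CIELAB companding: cube root above a small threshold, linear below it.
        d = 6 / 29
        return t ** (1 / 3) if t > d ** 3 else t / (3 * d ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def ab_distance(lab1, lab2):
    """Step 3 metric: Euclidean distance in the a*b* plane (L* is ignored)."""
    return math.dist(lab1[1:], lab2[1:])

def relabel_by_center(labels, centers, key=lambda c: c[0]):
    """Renumber clusters in sorted order of their (a*, b*) centers.

    K-means can number the same clusters differently on each run; sorting the
    centers (here by a*, an arbitrary choice) makes the indices repeatable.
    """
    order = sorted(range(len(centers)), key=lambda j: key(centers[j]))
    new_index = {old: new for new, old in enumerate(order)}
    return [new_index[lab] for lab in labels], [centers[j] for j in order]

def split_by_cluster(pixels, labels, k, background=(0, 0, 0)):
    """Step 5: one image per cluster, keeping only that cluster's pixels.

    `pixels` is a flat list of RGB tuples aligned with `labels`.
    """
    return [[px if lab == i else background for px, lab in zip(pixels, labels)]
            for i in range(k)]
```

Because `ab_distance` ignores L*, two pixels that differ only in lightness, for example the same skin color in light and in shadow, are treated as the same color; this is the practical effect of clustering in the a*b* plane.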

V. EXPERIMENTAL RESULTS
To demonstrate the performance of the proposed approach, we have taken apples as a case study and evaluated the method on defective apples. We considered several apple diseases, such as apple scab, apple rot and apple blotch, for defect segmentation. Figure 2 shows some images from the data set infected with various diseases; the large variation within the data set makes it more realistic. Figure 3 shows the defect segmentation result obtained with the K-means clustering technique for an apple infected with apple scab. We segmented the input image into four clusters in Figure 3, and the fourth cluster clearly segments the defective portion of the image. Empirically, we found that using 3 or 4 clusters yields good segmentation results, so in this experiment the input images are partitioned into three or four segments as required.
Figure 4 shows the detection result on an image infected with apple rot for different numbers of K-means clusters. When the number of clusters is set to 2, one cluster contains the fruit part, while the other contains the defective part together with the background. If we increase the number of clusters to 3, the defective part is separated from the background. If we increase the number of clusters further, to 4 or 5, the entire defective portion can no longer be captured in a single cluster (i.e. a single segment), as shown in Figure 4. In some cases, however, 3 clusters are not sufficient, as in the example in Figure 5, which shows an image of an apple infected with apple scab.
In Figure 5, the segmentation result is better with 4 clusters than with 3, because the infected area in apple scab is smaller than in apple rot and the color of the infected part is quite similar to the color of the healthy fruit. Three clusters are sufficient in the case of Figure 4 because that apple is infected with apple rot, which generally produces a larger infected area. Thus, if the defective area is larger, fewer clusters are required, while if it is smaller, more clusters are needed; in other words, the number of clusters required to detect the defect in an infected image is inversely related to the size of the defective area. Figure 6 shows the defect segmentation results for two defective apples using K-means clustering with only three clusters, whereas in Figure 3 we used four clusters because three were not sufficient there, owing to the natural variability of skin color in that apple image. In the first case of Figure 7, the stem/calyx is present in the input image of the defective apple, and with only three clusters the proposed approach is able to separate the defective portion from the stem/calyx. Figure 7 shows further segmentation results obtained with the proposed K-means based defect segmentation approach. The experimental results suggest that the proposed method is robust, because it can accurately separate the defective part from the fruit region, the background and the stem/calyx.

VI. CONCLUSION
A framework for the defect segmentation of fruits using images is proposed and evaluated in this paper. The proposed approach uses the K-means clustering technique to segment defects with three or four clusters. We evaluated the method on defective apples as a case study. Experimental results suggest that the proposed approach is able to accurately segment the defective area of fruits in the image. The K-means based defect segmentation approach also separates the defective area from the stem and calyx of the fruit. Future work includes automatically determining the number of clusters required to segment the defects more accurately.