Feature Selection for Image Retrieval based on Genetic Algorithm

Abstract: This paper describes the development and implementation of feature selection for content-based image retrieval (CBIR). The proposed CBIR system extracts multiple features covering colour, texture and shape, using three techniques: colour moments, the gray level co-occurrence matrix and the edge histogram descriptor. To reduce the curse of dimensionality and find the best features in the feature set, feature selection based on a genetic algorithm (GA) is applied. The selected features are grouped into classes of similar images by k-means clustering, which speeds up retrieval and reduces execution time. The experimental results show that feature selection using the GA reduces retrieval time and increases retrieval precision, giving better and faster results than a normal image retrieval system. The results also report the precision and recall of the proposed approach against a previous approach for each image class. Overall, the CBIR system is more efficient and performs better with feature selection based on the genetic algorithm.

I. Introduction
In recent years, with the continuous development of multimedia technologies, collections of high-quality digital images have been growing rapidly. Social networking and web services (such as Facebook, Yahoo and Google), storage technologies, academic websites, and research and development all provide means of uploading images and videos. With the use of the internet, image capturing devices (scanners, cameras, etc.) and handheld devices, a tremendous collection of digital images is generated every day. General-purpose image retrieval systems therefore require efficient searching, browsing and retrieval tools.
Current technologies allow the acquisition, transmission, storage and handling of large image databases. With the increasing use of networks and multimedia, users are no longer satisfied with traditional image retrieval. In recent years, CBIR has therefore become an area of wide interest as a means of fast and accurate retrieval.
Content-based image retrieval is an image search technique that automatically extracts features (colour, texture, shape, etc.) so that images relevant to a given query image can be found in a large image database. Relevance is decided by comparing the features of the query image with the features stored for the database images. Feature extraction is therefore central to CBIR. There are three kinds of feature extraction: colour, texture and shape, each with several techniques. Colour is a low-level visual feature; techniques used for colour extraction include colour moments, vector quantization and the co-occurrence matrix. Texture is also a low-level visual feature. It captures visual patterns in an image and describes the distribution of image intensity; techniques used for texture include Gabor filters and Tamura features. Shape is a high-level visual feature. It describes the surface of an object within an image or a particular region, and can be represented in two ways: by the external boundary of the shape, or by the shape of the whole region. Techniques used for shape include edge detection (Sobel, Prewitt, Canny, etc.) and moment invariants.
The problem in CBIR is deciding which features are relevant to the retrieval process, because a large number of the extracted features are irrelevant. Feature selection addresses this problem by removing extraneous, redundant and noisy data. Its main objective is to find the best features in a large feature set, so that the CBIR system works with an optimal feature subset and its performance improves.
A genetic algorithm is one approach to feature selection that finds the best feature subset within a large feature set. Genetic algorithms are founded on the principles of evolution, natural selection and biological inheritance. They are used to find optimal or near-optimal solutions to computational problems that minimize or maximize a particular function, and they work iteratively using genetic operators such as selection, recombination and mutation. In this work, the genetic algorithm searches for the feature subset that gives the best image retrieval result.
Clustering groups items so that items in the same cluster are similar to each other while dissimilar items belong to other clusters. Using clustering in a CBIR system reduces the elapsed time of the system. Clustering techniques used in CBIR include k-means clustering and hierarchical clustering.
The rest of the paper is organized as follows. Section 2 describes related work on content-based image retrieval using various techniques. Section 3 presents the methodology. Section 4 introduces the framework of the proposed approach. Section 5 presents the experimental results, and Section 6 concludes the paper.

II. Related Work
Many techniques for improving the performance of content-based image retrieval systems have been reported in the literature. Lakshmi P.S. et al. [4] retrieved images from a database for different query images based on texture features. The texture features were extracted using the gray level co-occurrence matrix (GLCM), and feature selection with a genetic algorithm (GA) was used to improve the accuracy of retrieval. Measured by precision and recall, their results showed that higher retrieval accuracy can be obtained in less computation time. P.K. Bhargavi et al. [9] proposed a content-based image retrieval system built on relevant features. They used the color coherence vector and Gabor wavelets for feature extraction. For feature discretization, they used the maximum entropy method to transform numerical features into nominal ones using the Class-Attribute Interdependence Maximization (CAIM) algorithm. They also optimized their approach with feature selection using the Particle Swarm Optimization (PSO) algorithm to extract the most relevant features. The effectiveness and efficiency of their model were compared with other models using precision and recall. C.V. Rashmi et al. [7] presented an image retrieval method using Ant Colony Optimization (ACO) and relevance feedback. In their system, the feature vector of an image is built from the color correlogram, Gabor filter and edge histogram descriptors. Feature selection with ACO optimizes the features to speed up retrieval and similarity computation, and a support vector machine (SVM) is used with relevance feedback to improve the efficiency of the system.
Clustering has also been used in CBIR systems. Mit Patel et al. [7] describe how a collection of features, or a dataset, is divided into classes of similar images using clustering and classification. Clustering is done with k-means, and classification with fuzzy rule-based classification; both are based on texture and color information. Compared with a normal model, their proposed model increased accuracy and decreased retrieval time.

III. Methodology
In this section, we introduce the methods used in the proposed system. The system is built on three techniques:
• Feature extraction for colour, shape and texture, using three techniques.
• Feature selection using a genetic algorithm, with a new fitness function.
• Clustering.
These approaches are described below.

A. Feature Extraction
Feature extraction is the most valuable operation in a CBIR system. It translates the input data into a set of features. In this section, we describe the three feature extraction techniques used in the proposed CBIR system.
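1) Color Feature Extraction
Color moments characterize the color distribution of an image. Three color moments are used: the first-order moment is the mean, the second-order moment is the standard deviation, and the third-order moment is the skewness of the color. They are extracted from each channel of the RGB and HSI color spaces to form an 18-dimensional feature vector (3 moments × 3 channels × 2 color spaces), using the following mathematical formulation:
\[ E_i = \frac{1}{N}\sum_{j=1}^{N} p_{ij}, \qquad \sigma_i = \left(\frac{1}{N}\sum_{j=1}^{N}\left(p_{ij}-E_i\right)^{2}\right)^{1/2}, \qquad s_i = \left(\frac{1}{N}\sum_{j=1}^{N}\left(p_{ij}-E_i\right)^{3}\right)^{1/3} \qquad (1) \]
where p_ij is the value of the i-th color channel at the j-th image pixel and N is the number of pixels in the image.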
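As an illustration of the color moments described above, the following minimal sketch computes the mean, standard deviation and skewness per channel with NumPy. The function name `color_moments` is hypothetical, and the sketch assumes the caller supplies the image already converted to the desired color space; applying it to both the RGB and the HSI version of an image and concatenating the two outputs would give the 18 values mentioned in the text.

```python
import numpy as np

def color_moments(image):
    """Return [means, standard deviations, skewness values] for an H x W x C image."""
    # Flatten the spatial dimensions so that each row is one pixel.
    pixels = image.reshape(-1, image.shape[-1]).astype(np.float64)
    mean = pixels.mean(axis=0)                      # first-order moment
    centered = pixels - mean
    std = np.sqrt((centered ** 2).mean(axis=0))     # second-order moment
    skew = np.cbrt((centered ** 3).mean(axis=0))    # third-order moment (signed cube root)
    return np.concatenate([mean, std, skew])
```

For a three-channel image the result has nine entries: three means, three standard deviations and three skewness values.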

2) Texture Feature Extraction
Texture features describe the distribution of image intensities. The gray level co-occurrence matrix (GLCM) is one of the simplest and most extensively used approaches for extracting texture features from an original image. Four GLCM components are used to characterize the texture: entropy, contrast, energy and homogeneity. The formulas are given below, where (m, n) represents the number of rows and the number of columns of the image x.
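The four GLCM components named above can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the paper's exact formulation: the function names, the single pixel offset, the number of gray levels, the base-2 logarithm for entropy and the 1/(1+|i-j|) form of homogeneity are all choices made for this example.

```python
import numpy as np

def glcm(gray, levels=8, dx=1, dy=0):
    """Normalized co-occurrence matrix for a non-negative offset (dy, dx).

    `gray` must hold integer gray levels in the range [0, levels).
    """
    gray = np.asarray(gray)
    rows, cols = gray.shape
    first = gray[:rows - dy, :cols - dx]    # reference pixels
    second = gray[dy:, dx:]                 # neighbours at the given offset
    counts = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(counts, (first.ravel(), second.ravel()), 1.0)
    return counts / counts.sum()

def glcm_features(p):
    """Entropy, contrast, energy and homogeneity of a normalized GLCM."""
    i, j = np.indices(p.shape)
    nonzero = p[p > 0]                      # skip zero entries to avoid log(0)
    return {
        "entropy": float(-np.sum(nonzero * np.log2(nonzero))),
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "energy": float(np.sum(p ** 2)),
        "homogeneity": float(np.sum(p / (1.0 + np.abs(i - j)))),
    }
```

In practice the image would first be quantized to `levels` gray levels, and the features from several offsets may be averaged.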

3) Shape Feature Extraction
Shape describes the surface of an object within an image or a particular region. The edge histogram represents four directional edges. The image is subdivided into 4 × 4 sub-images, i.e. 16 sub-blocks. For each sub-image, a histogram is computed over 4 edge types: vertical, horizontal, 45° and 135°.
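A simplified sketch of such an edge histogram is shown below. It is not the paper's implementation: it assumes that each sub-image is cut into non-overlapping 2 × 2 pixel blocks, that the 2 × 2 filter coefficients are those commonly quoted for the MPEG-7 edge histogram descriptor, and that a block counts as an edge only when its strongest response exceeds a hypothetical `threshold`.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)
# 2 x 2 directional filters (assumed coefficients), in the order used in the text.
EDGE_FILTERS = np.array([
    [[1.0, -1.0], [1.0, -1.0]],       # vertical
    [[1.0, 1.0], [-1.0, -1.0]],       # horizontal
    [[SQRT2, 0.0], [0.0, -SQRT2]],    # 45 degrees
    [[0.0, SQRT2], [-SQRT2, 0.0]],    # 135 degrees
])

def edge_histogram(gray, grid=4, threshold=10.0):
    """Return grid * grid * 4 bins: one 4-bin edge histogram per sub-image."""
    gray = np.asarray(gray, dtype=np.float64)
    sub_h, sub_w = gray.shape[0] // grid, gray.shape[1] // grid
    hist = np.zeros((grid, grid, len(EDGE_FILTERS)))
    for r in range(grid):
        for c in range(grid):
            sub = gray[r * sub_h:(r + 1) * sub_h, c * sub_w:(c + 1) * sub_w]
            bh, bw = sub.shape[0] // 2, sub.shape[1] // 2
            # Rearrange the sub-image into an array of 2 x 2 blocks: (bh, bw, 2, 2).
            blocks = sub[:bh * 2, :bw * 2].reshape(bh, 2, bw, 2).swapaxes(1, 2)
            strength = np.abs(np.einsum("abij,kij->abk", blocks, EDGE_FILTERS))
            best = strength.argmax(axis=-1)
            is_edge = strength.max(axis=-1) > threshold
            counts = np.bincount(best[is_edge], minlength=len(EDGE_FILTERS))
            hist[r, c] = counts / max(bh * bw, 1)   # fraction of blocks per edge type
    return hist.ravel()
```

With the default 4 × 4 grid the descriptor has 64 entries, four per sub-image.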

B. Feature Selection using Genetic Algorithm
A genetic algorithm is a computational method for finding solutions to search and optimization problems, that is, optimal or best solutions that minimize or maximize a particular function. It simulates the biological processes of natural selection and reproduction to arrive at the 'fittest' solutions, a principle known as 'survival of the fittest'. The basic components of a genetic algorithm are:
1. Initial population of chromosomes: Let m be the number of features and N the size of the population. A random population P of N chromosomes is created.
2. Fitness function: The standard deviation is used to evaluate the fitness of each individual in the population.
3. Selection: Two parents are selected from the population according to their fitness to generate new offspring. This ensures that only the fittest solutions generate offspring.
4. Recombination or crossover: The two parent strings are recombined to form new offspring by copying selected bits from each parent.
5. Mutation: After crossover, each new offspring is mutated individually. Mutation reduces the risk of becoming trapped in a local optimum.
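The five components listed above can be sketched as a small feature-selection loop. This is a minimal illustration under stated assumptions: chromosomes are boolean masks over the m features, the fitness shown (mean standard deviation of the selected feature columns) is only a stand-in because the paper's new fitness function is not given here, crossover is single-point, and the names `fitness` and `ga_select` and all parameter values are hypothetical.

```python
import numpy as np

def fitness(mask, features):
    # Stand-in fitness (assumption): mean standard deviation of the selected columns.
    if not mask.any():
        return 0.0
    return float(features[:, mask].std(axis=0).mean())

def ga_select(features, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Return a boolean mask over the feature columns (needs at least 2 features)."""
    rng = np.random.default_rng(seed)
    m = features.shape[1]
    # 1. Initial population: N random chromosomes of m bits each.
    population = rng.random((pop_size, m)) < 0.5
    for _generation in range(generations):
        # 2. Fitness of every chromosome.
        scores = np.array([fitness(c, features) for c in population])
        # 3. Selection: keep the two fittest chromosomes as parents.
        parents = population[np.argsort(scores)[::-1][:2]]
        children = []
        for _child in range(pop_size - 2):
            # 4. Single-point crossover between the two parents.
            point = rng.integers(1, m)
            child = np.concatenate([parents[0, :point], parents[1, point:]])
            # 5. Mutation: flip each bit with a small probability.
            children.append(child ^ (rng.random(m) < mutation_rate))
        population = np.vstack([parents, np.array(children)])
    scores = np.array([fitness(c, features) for c in population])
    return population[int(np.argmax(scores))]
```

Because the two parents are carried over unchanged into the next generation, the best fitness found never decreases from one generation to the next.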
The block diagram of the genetic algorithm is shown below:

C. Clustering
Clustering groups items so that items in the same cluster are similar to each other while dissimilar items belong to other clusters. In the proposed system, we use the k-means algorithm because it can handle a large number of images per cluster. Clustering reduces the elapsed time of the CBIR system and gives fast retrieval. With k-means, the quality of the result is measured by the sum of the distances between every vector and the centroid of its cluster. The centroid of each cluster is calculated using the sum of squared error, given by the formula below:
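A minimal k-means sketch that also reports the sum of squared error is given below. It assumes Euclidean distance and random initial centroids drawn from the data; the function name `kmeans` and its parameters are illustrative and not taken from the paper.

```python
import numpy as np

def kmeans(vectors, k, iterations=50, seed=0):
    """Return (centroids, labels, sse) for the given feature vectors."""
    rng = np.random.default_rng(seed)
    vectors = np.asarray(vectors, dtype=np.float64)
    # Start from k distinct vectors chosen at random.
    centroids = vectors[rng.choice(len(vectors), size=k, replace=False)]
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        distances = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        updated = np.array([
            vectors[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # Final assignment and sum of squared error against the final centroids.
    distances = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    sse = float(((vectors - centroids[labels]) ** 2).sum())
    return centroids, labels, sse
```

A lower sum of squared error means the vectors lie closer to their cluster centroids.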

IV. Proposed System
Here, we propose a new CBIR system using the three approaches described in the previous section. The proposed architecture of the CBIR system is shown below. The system works step by step as follows:
• The system automatically extracts color, texture and shape features from the images.
• The extracted features are stored in a feature vector database.
• Feature selection using the genetic algorithm searches for the best feature subset in the feature vector database.
• Clustering groups the images into classes of similar images. The system then computes the distance between the query image and each cluster centroid to find the smallest distance.
• The most similar images are retrieved and shown to the user.
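The last two steps can be sketched as a lookup against precomputed clusters. This is an illustrative sketch only: it assumes Euclidean distance, assumes that images inside the nearest cluster are then ranked by their distance to the query, and uses hypothetical names (`retrieve`, `top_n`).

```python
import numpy as np

def retrieve(query, database, labels, centroids, top_n=5):
    """Return indices of the database vectors most similar to the query."""
    query = np.asarray(query, dtype=np.float64)
    database = np.asarray(database, dtype=np.float64)
    centroids = np.asarray(centroids, dtype=np.float64)
    labels = np.asarray(labels)
    # Find the cluster whose centroid is closest to the query.
    nearest = int(np.linalg.norm(centroids - query, axis=1).argmin())
    members = np.flatnonzero(labels == nearest)
    # Rank only the members of that cluster by distance to the query.
    distances = np.linalg.norm(database[members] - query, axis=1)
    return members[np.argsort(distances)[:top_n]]
```

Restricting the comparison to one cluster is what saves time: the query is compared with k centroids and the members of a single cluster instead of with every image in the database.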

V. Experimental Result and Discussion
In this section, we evaluate the proposed CBIR system introduced in the previous section, focusing on the new approaches in which the enhancement is made. We first introduce the image database chosen to test the system and then present the first part of the expected results.

A. Image Database
The image database used in our evaluation is the WANG database, which is a subset of the COREL database. It contains 1000 images in JPEG format. In this paper, 6 classes were used and each class contains 15 images. The classes are Flowers, Bus, Architecture, Food, Elephant and Dinosaurs.

B. Performance Measurements
The performance of the CBIR system is evaluated in terms of precision and recall. The computational time for each class is also measured.

1. Precision: It is defined as the ratio of the number of relevant images retrieved to the total number of images retrieved.
A precision value of 1.0 means that every image retrieved by the search was relevant.

2. Recall: It is defined as the ratio of the number of relevant images retrieved to the total number of relevant images in the database.
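As a small worked example of the two measures, the sketch below computes precision (relevant retrieved over all retrieved) and recall (relevant retrieved over all relevant images in the database) from sets of image identifiers. The function name and the identifiers are illustrative.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for collections of image identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)          # relevant images that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For instance, if 4 images are retrieved, 2 of them are relevant, and the database holds 5 relevant images in total, the precision is 2/4 = 0.5 and the recall is 2/5 = 0.4.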

C. Expected outcome
This section presents the expected outcome for each class in terms of precision, recall and computational time (s). Fig. 3 shows the images retrieved by the CBIR system without feature selection, which are based on both relevant and irrelevant features. Fig. 4 shows the more relevant images retrieved using feature selection based on the genetic algorithm.
Table 1 compares feature selection using the genetic algorithm with no feature selection in terms of precision, recall and computational time. The computational time with GA-based feature selection is lower than without feature selection. The performance analysis is done for each image class. The highest precision is obtained for the Dinosaurs class: 85% without feature selection, improving to 100% with GA-based feature selection. Recall values lie in the range 22-82% without feature selection and 23-94% with GA-based feature selection.
In the computational-time bar chart, blue indicates the computational time of the system with GA-based feature selection and red indicates the computational time without feature selection. The bar chart shows that GA-based feature selection takes less time and gives faster retrieval, so the system is more efficient and performs better.
Figure 5.c shows the precision and recall of each image class in the database. Blue indicates the precision and recall of each image class without feature selection, and red indicates the precision and recall of each image class with feature selection based on the genetic algorithm. The chart shows that the precision and recall of each image class are higher with GA-based feature selection than without feature selection.
Table 2 and Table 3 show the precision and recall of the previous approach and the proposed approach for each image class. The previous approach uses the texture feature only, with feature extraction and feature selection using the genetic algorithm; clustering is not used in this approach.
The proposed approach uses multiple features (color, texture and shape), with feature extraction and feature selection using the genetic algorithm; clustering is used in this approach.

VI. Conclusion
In this paper, we presented a CBIR system that combines several techniques. Different feature extraction techniques for colour, texture and shape are used for better image retrieval. The proposed system uses feature selection based on a genetic algorithm to find the best features in the feature set, and clustering to reduce the elapsed time of the system. The experimental results show that feature selection using the genetic algorithm reduces retrieval time and increases retrieval precision and recall, giving better and faster results than a normal image retrieval system, i.e. one without feature selection. The searching time in seconds is lower, and the precision and recall of each image class are higher, when feature selection based on the genetic algorithm is used.
In future work, we will try feature selection based on other optimization algorithms to further reduce computational time and improve the accuracy of the CBIR system. It is also possible to improve the performance of the retrieval system with relevance feedback.