A fuzzy c-means bi-sonar-based Metaheuristic Optimization Algorithm

Abstract: Fuzzy clustering is an important problem and the subject of active research in several real-world applications. The fuzzy c-means (FCM) algorithm is one of the most popular fuzzy clustering techniques because it is efficient, straightforward, and easy to implement. Fuzzy clustering methods allow objects to belong to several clusters simultaneously, with different degrees of membership. Objects on the boundaries between several classes are not forced to belong fully to one of the classes, but are instead assigned membership degrees between 0 and 1 that indicate their partial membership. However, FCM is sensitive to initialization and is easily trapped in local optima. Bi-sonar optimization (BSO) is a relatively new stochastic global metaheuristic optimization tool. In this paper, a hybrid fuzzy clustering method, FCB, based on FCM and BSO is proposed, which makes use of the merits of both algorithms. Experimental results show that the proposed method is efficient and gives encouraging results.


I. INTRODUCTION
Clustering is the process of assigning data objects to a set of disjoint groups called clusters, so that objects in each cluster are more similar to each other than to objects from different clusters. Let $\{x^{(q)}: q = 1, \ldots, Q\}$ be a set of Q feature vectors. Each feature vector $x^{(q)} = (x_1^{(q)}, \ldots, x_N^{(q)})$ has N components with weights $w^{(q)} = (w_1^{(q)}, \ldots, w_N^{(q)})$ and distance metrics $D^{(q)} = (d_1^{(q)}, \ldots, d_N^{(q)})$. The process of clustering is to assign the Q feature vectors to K clusters $\{c^{(k)}: k = 1, \ldots, K\}$, usually by the minimum distance assignment principle. Choosing the representation of cluster centers (or prototypes) is crucial to the clustering. Feature vectors that are farther away from the cluster center should not have as much weight as those that are close. These more distant feature vectors are outliers, usually caused by errors in one or more measurements or by a deviation in the processes that formed the object.
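The minimum distance assignment principle mentioned above can be sketched as follows. This is a minimal illustration that assumes the Euclidean distance; the function name `assign_min_distance` and the sample data are hypothetical and not taken from the paper.

```python
import math

def assign_min_distance(vectors, centers):
    """Assign each feature vector to the index of its nearest center (Euclidean)."""
    labels = []
    for x in vectors:
        dists = [math.dist(x, c) for c in centers]
        labels.append(dists.index(min(dists)))
    return labels

# Hypothetical data: two tight groups and two prototypes.
vectors = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (4.9, 5.0)]
centers = [(0.0, 0.0), (5.0, 5.0)]
labels = assign_min_distance(vectors, centers)
```

Each vector receives exactly one label here, which is the crisp assignment that the fuzzy methods discussed later relax.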
The simplest weighting method is arithmetic averaging. It adds all feature vectors in a cluster and takes the average as the prototype. Because of its simplicity, it is still widely used in clustering initialization. Arithmetic averaging gives centrally located feature vectors the same weight as outliers. To lower the influence of the outliers, median vectors are used in some proposed algorithms. To be more immune to outliers and more representative, the fuzzy weighted average is introduced to represent prototypes:

$Z_n^{(k)} = \sum_{\{q:\, q \in k\}} w_{qk}\, x_n^{(q)}$ (1)

Rather than a Boolean value of 1 (true, meaning the vector belongs to the cluster) or 0 (false, meaning it does not), the weight $w_{qk}$ in equation (1) represents partial membership in a cluster. It is called a fuzzy weight. There are different ways to generate fuzzy weights. One way is to use the reciprocal of the distance:

$w_{qk} = 1 / D_{qk}$, with $w_{qk} = 1$ if $D_{qk} = 0$ (2)

When the distance between the feature vector and the prototype is large, the weight is small; when the distance is small, the weight is large. Using Gaussian functions to generate fuzzy weights is the most natural way for clustering. It is not only immune to outliers but also provides appropriate weighting for more centrally and densely located vectors. It is used in the fuzzy c-means (FCM) algorithm.
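The effect of fuzzy weighting on a prototype can be illustrated with a small sketch. It assumes reciprocal-distance weights in the sense of equation (2), reads the zero-distance case as a weight of 1, and normalizes the weights so that they sum to one. The normalization and the names `reciprocal_weights` and `fuzzy_weighted_average` are assumptions made for illustration.

```python
import math

def reciprocal_weights(vectors, prototype):
    """Weights w_q = 1 / D_q, with w_q = 1 when the distance is zero (assumed reading of eq. (2))."""
    weights = []
    for x in vectors:
        d = math.dist(x, prototype)
        weights.append(1.0 if d == 0.0 else 1.0 / d)
    return weights

def fuzzy_weighted_average(vectors, weights):
    """Weighted average of the vectors; weights are normalized to sum to one (assumption)."""
    total = sum(weights)
    dim = len(vectors[0])
    return tuple(sum(w * x[n] for w, x in zip(weights, vectors)) / total
                 for n in range(dim))

# Hypothetical cluster whose last point is an outlier.
cluster = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (6.0, 6.0)]
mean = fuzzy_weighted_average(cluster, [1.0] * len(cluster))  # arithmetic average
proto = fuzzy_weighted_average(cluster, reciprocal_weights(cluster, mean))
```

With equal weights the outlier pulls the prototype toward itself; with reciprocal-distance weights measured from the arithmetic mean, the prototype moves back toward the dense group.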
Clustering techniques are applied in many application areas such as pattern recognition [13], data mining [12], and machine learning [1]. Clustering algorithms can be broadly classified as hard, fuzzy, possibilistic, and probabilistic [6]. K-means [15] is one of the most popular hard clustering algorithms; it partitions data objects into k clusters, where the number of clusters, k, is decided in advance according to the purpose of the application. This model is inappropriate for real data sets in which there are no definite boundaries between the clusters. After Lotfi Zadeh introduced fuzzy set theory, researchers applied it to clustering. Fuzzy algorithms can assign a data object partially to multiple clusters. The degree of membership in the fuzzy clusters depends on the closeness of the data object to the cluster centers. The most popular fuzzy clustering algorithm is fuzzy c-means (FCM), which was introduced by Bezdek [8] in 1974 and is now widely used.
Fuzzy clustering [9] is an important problem and the subject of active research in several real-world applications. The fuzzy c-means (FCM) algorithm is one of the most popular fuzzy clustering techniques because it is efficient, straightforward, and easy to implement. However, FCM is sensitive to initialization and is easily trapped in local optima because of the random selection of the center points. FCM generalizes c-means (also known as k-means). While c-means builds a crisp partition with c clusters, fuzzy c-means builds a fuzzy one (also with c clusters). $u_{ik}$ is used to formalize the membership of element $x_k$ in the i-th cluster. In the crisp case $u_{ik}$ is either 0 or 1 (Boolean membership), while in the fuzzy case $u_{ik}$ lies in [0, 1]. In the fuzzy case, $u_{ik} = 0$ corresponds to non-membership and $u_{ik} = 1$ corresponds to full membership in cluster i. Values in between correspond to partial membership (the larger the value, the greater the membership). Because of this fuzzy nature, elements are allowed to belong to more than one cluster.
In the 1970s, a new kind of approximate algorithm emerged which tries to combine basic heuristic methods in higher-level frameworks aimed at efficiently and effectively exploring a search space. It is defined in Stützle, T., Local Search Algorithms for Combinatorial Problems - Analysis, Algorithms and New Applications. DISKI - Dissertationen zur Künstlichen Intelligenz. infix, Sankt Augustin, Germany, 1999, as "Metaheuristics are typically high-level strategies which guide an underlying, more problem specific heuristic, to increase their performance. The main goal is to avoid the disadvantages of iterative improvement and, in particular, multiple descent by allowing the local search to escape from local minima. This is achieved by either allowing worsening moves or generating new starting solutions for the local search in a more "intelligent" way than just providing random initial solutions. Many of the methods can be interpreted as introducing a bias such that high quality solutions are produced quickly. This bias can be of various forms and can be cast as descent bias (based on the objective function), memory bias (based on previously made decisions) or experience bias (based on prior performance). Many of the metaheuristic approaches rely on probabilistic decisions made during the search. But, the main difference to pure random search is that in metaheuristic algorithms randomness is not used blindly but in an intelligent, biased form." The performance of simple iterative improvement local search procedures is in general unsatisfactory; for example, in Figure 1 the final solution, Trial, is still not the optimal or best one for this arbitrary objective function. The quality of the obtained local minimum depends heavily on the starting point of the local search process.
As the basin of attraction of a global minimum is generally not known, iterative improvement local search might end up in a poor quality local minimum.
There are different ways to classify and describe metaheuristic algorithms, each of them being the result of a specific viewpoint. For example, we might classify metaheuristics as nature-inspired metaheuristics vs. non-nature inspired metaheuristics. This classification is based on the origins of the different algorithms. There are nature-inspired algorithms, such as evolutionary computation and ant colony optimization, and non nature-inspired ones such as tabu search and iterated local search. We might also classify metaheuristics as memory-based vs. memory-less methods. This classification scheme refers to the use metaheuristics make of the search history, that is, whether they use memory or not. Memory-less algorithms, for example, perform a Markov process, as the information they exclusively use to determine the next action is the current state of the search process. The use of memory is nowadays recognized as one of the fundamental elements of a powerful metaheuristic. Finally, metaheuristics may also be classified into methods that perform a single point vs. population-based search. This classification refers to the number of solutions used by a metaheuristic at any time. Generally, algorithms that work on a single solution at any time are referred to as trajectory methods. They comprise all metaheuristics that are based on local search, such as tabu search, iterated local search and variable neighborhood search. They all share the property that the search process describes a trajectory in the search space.
Population-based metaheuristics, on the contrary, either perform search processes which can be described as the evolution of a set of points in the search space (as for example in evolutionary computation), or they perform search processes which can be described as the evolution of a probability distribution over the search space (as for example in ant colony optimization).
To address this problem, evolutionary metaheuristic algorithms such as the genetic algorithm (GA) (Vas, P., Artificial-intelligence-based Electrical Machines and Drives: Application of Fuzzy, Neural, Fuzzy-neural, and Genetic-algorithm-based Techniques, Monographs in Electrical and Electronic Engineering, Oxford University Press, 1999), simulated annealing (SA) (Wang, J.X., Garibaldi, J., Simulated Annealing Fuzzy Clustering in Cancer Diagnosis, Informatica, 29:61-70, 2005), ant colony optimization (ACO) (Ganji, M.F., Using fuzzy ant colony optimization for diagnosis of diabetes disease, IEEE, 18th Iranian Electrical Engineering (ICEE) Conference, pp. 501-505, 2010), particle swarm optimization (PSO) [14], [16] and bi-sonar optimization (BSO) [11] have recently been applied successfully. BSO is a population-based optimization tool that can be implemented and applied easily to solve various function optimization problems, or problems that can be transformed into function optimization problems in which a fitness value guides the search [8]. In this paper, a hybrid fuzzy clustering algorithm based on FCM and BSO, called FCB, is proposed. The experimental results over three real-life data sets indicate that the FCB algorithm is superior to both the FCM algorithm and the BSO algorithm.
The rest of the paper is organized as follows. Section II introduces FCM, BSO and FCB. Section III presents the parameter settings of the FCB algorithm for clustering together with the experimental results. Finally, Section IV concludes this work.

II. METHODS
Different algorithms have been developed using different approaches and different underlying assumptions about the data and the final set of clusters. c-means, fuzzy c-means and self-organizing maps are some of the well-known clustering algorithms. Existing algorithms can be classified along several dimensions, some of which are described below. One such dimension is the direction of the clustering process. In this case, methods are divided into agglomerative and partitive ones. Agglomerative algorithms build clusters by gathering together records that are similar. This corresponds to a bottom-up strategy (or bottom-up direction), i.e. from individual records to the set that contains all records. Partitive algorithms, instead, follow a top-down strategy. That is, clusters are defined by partitioning larger sets of records.
Another dimension corresponds to the membership of records in clusters. In this case, we can distinguish among crisp, fuzzy and probabilistic clusters. In crisp clusters, the membership of a record in a cluster is Boolean. That is, the record either belongs to the cluster or does not. In the case of fuzzy clusters, instead, membership is a matter of degree (in [0, 1]), and individual records can belong to several clusters at the same time. In the case of probabilistic clusters, membership is Boolean but there is a probability distribution over the clusters a record may belong to.
A third dimension is the structure of the clusters. In short, this is whether the clusters themselves define a structure and, if so, which is the structure they define. The simplest case is when no structure is defined. Each cluster is understood as an independent object. Alternatively, clusters can define hierarchies or other complex structures. Such dimensions can be used to classify clustering methods. For example, agglomerative clustering methods are bottom-up (agglomerative) crisp methods that naturally lead to hierarchical cluster structures. c-means is a top-down (partitive) crisp method where clusters do not have any particular relation. Fuzzy c-means is also a top-down (partitive) algorithm that leads to fuzzy clusters (fuzzy memberships of elements to clusters). Self-Organizing Maps (SOM) is also a partitive crisp algorithm but in this case, a grid structure is established among clusters.

A. Fuzzy c-means (FCM)
The fuzzy c-means (FCM) clustering algorithm [3] generates fuzzy partitions for any set of numerical data, allowing one piece of data to belong to two or more clusters. FCM partitions a set of patterns $X_i = \{x_1, x_2, \ldots, x_n\}$ with n features [2] into c ($1 < c < n$) fuzzy clusters with a set of cluster centers $Z_j = \{z_1, z_2, \ldots, z_c\}$, each being initialized.
Here, the membership degree $\mu_{ij} \in [0, 1]$ quantifies the grade of membership of the ith pattern in the jth cluster. The aim of FCM is to minimize the objective function, with $d_{ij}$ being the Euclidean distance [5], [4] from the pattern feature data point $x_i$ to the cluster center $z_j$. The scalar m ($m > 1$) controls the fuzziness of the resulting clusters.
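A standard textbook form of the FCM objective that is consistent with the symbols defined above is given here for reference. It is not necessarily the exact expression used by the authors; the symbol $J_m$ and the normalization constraint are assumptions taken from the common formulation.

```latex
J_m = \sum_{i=1}^{n} \sum_{j=1}^{c} \mu_{ij}^{\,m} \, d_{ij}^{\,2},
\qquad d_{ij} = \lVert x_i - z_j \rVert,
\qquad \text{subject to } \sum_{j=1}^{c} \mu_{ij} = 1 \ \text{ for every } i .
```

Each pattern contributes to every cluster in proportion to its membership raised to the power m, so memberships near 0 contribute little even when the corresponding distance is large.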
In this formulation, $z_j$ corresponds to the centroid (cluster center or cluster representative) of the j-th cluster and m is a parameter (m ≥ 1) that plays a central role. With values of m near 1, solutions tend to be crisp (the particular case m = 1 corresponds to crisp c-means). Larger values of m instead yield clusters with increasingly fuzzy boundaries. To solve this problem, an iterative process is applied. The method interleaves two steps: one estimates the optimal memberships of elements in clusters (with the centroids fixed), and the other estimates the centroid of each cluster (with the memberships fixed).
The membership degree is denoted by μ. This method is not guaranteed to find the optimal solution of the minimization problem given above, only a local optimum. Different starting points can lead to different solutions.
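The two interleaved steps can be sketched in plain Python as follows. This is a minimal illustration that assumes the standard FCM update rules (membership-weighted centroids and inverse-distance memberships with exponent 2/(m-1)), random initial memberships and m > 1. The function name `fcm`, its parameters and the sample data are hypothetical.

```python
import math
import random

def fcm(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Minimal fuzzy c-means sketch; returns (centers, memberships, objective)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to one.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(max_iter):
        # Step 1: centroids with memberships fixed.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][k] for i in range(n)) / total
                for k in range(dim)))
        # Step 2: memberships with centroids fixed.
        max_change = 0.0
        for i in range(n):
            d = [math.dist(points[i], z) for z in centers]
            if min(d) == 0.0:
                # The point coincides with a center: full membership there.
                zero = d.index(0.0)
                new_row = [1.0 if j == zero else 0.0 for j in range(c)]
            else:
                inv = [dj ** (-2.0 / (m - 1.0)) for dj in d]
                s = sum(inv)
                new_row = [v / s for v in inv]
            change = max(abs(a - b) for a, b in zip(new_row, u[i]))
            max_change = max(max_change, change)
            u[i] = new_row
        if max_change < tol:
            break
    objective = sum((u[i][j] ** m) * math.dist(points[i], centers[j]) ** 2
                    for i in range(n) for j in range(c))
    return centers, u, objective

# Hypothetical data: two well-separated groups of three points.
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, u, j_fcm = fcm(data, c=2)
```

Changing `seed` changes the starting memberships, which is the initialization sensitivity described above: on harder data, different seeds can end in different local optima.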

B. Bi-sonar optimization (BSO)
Global optimization algorithms are often classified as either deterministic or stochastic. A stochastic method usually refers to an algorithm that uses some kind of randomness (typically a pseudo-random number generator), and may be called a Monte Carlo method. Examples include pure random search, simulated annealing, and genetic algorithms. Random search methods have been shown to have a potential to solve large problems efficiently in a way that is not possible for deterministic algorithms. An advantage to stochastic methods is that they are relatively easy to implement on complex problems.
A common experience is that stochastic algorithms perform well and are "robust" in the sense that they give useful information quickly for ill-structured global optimization problems. The bat-inspired algorithm is a metaheuristic optimization algorithm developed by Xin-She Yang [14]. The bat algorithm is based on the bi-sonar (echolocation) behaviour of microbats with varying pulse emission rates and loudness. The idealization of the echolocation of microbats can be summarized as follows: each virtual bat flies randomly with a velocity $v_i$ at position (solution) $x_i$ with a varying frequency or wavelength and loudness $A_i$. As it searches for and finds its prey, it changes its frequency, loudness and pulse emission rate r.
Search is intensified by a local random walk, and selection of the best solution continues until the stopping criteria are met. The method essentially uses a frequency-tuning technique to control the dynamic behaviour of a swarm of bats, and the balance between exploration and exploitation can be controlled by tuning the algorithm-dependent parameters of the bat algorithm. We have to define the rules by which the bats' frequencies $f_i$, positions $x_i$ and velocities $v_i$ in a d-dimensional search space are updated. In the update rules that give the new solutions $x_i(t)$ and velocities $v_i(t)$ at time step t, $\delta \in [0, 1]$ is a random vector drawn from a uniform distribution. Here $x(t_{gbest})$ is the current global best location (hunting space, or solution), which is found by comparing the solutions of all n bats. As the product $\lambda_i f_i$ is the velocity increment, we can use either $f_i$ (or $\lambda_i$) to adjust the velocity change while fixing the other factor $\lambda_i$ (or $f_i$), depending on the type of problem of interest. The domain size of the problem in context determines the values of $f_{min}$ and $f_{max}$. Initially, each bat is randomly assigned a frequency drawn uniformly from $[f_{min}, f_{max}]$.
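A frequency-tuned update step of this kind can be sketched as follows. The sketch assumes the commonly published form of the bat algorithm update (a frequency drawn in $[f_{min}, f_{max}]$, a velocity increment proportional to the distance from the global best, then a position shift by the new velocity). It uses one scalar random number per bat rather than a random vector, and it omits loudness and pulse rate. The function name `bat_step` and all values are hypothetical.

```python
import random

def bat_step(positions, velocities, gbest, f_min, f_max, rng):
    """One frequency-tuned update of all bats (standard bat-algorithm form, assumed)."""
    new_pos, new_vel = [], []
    for x, v in zip(positions, velocities):
        delta = rng.random()                  # uniform random number in [0, 1)
        f = f_min + (f_max - f_min) * delta   # frequency in [f_min, f_max]
        # Velocity increment grows with the distance from the global best.
        v_new = [vk + (xk - gk) * f for vk, xk, gk in zip(v, x, gbest)]
        # The position moves by the updated velocity.
        x_new = [xk + vk for xk, vk in zip(x, v_new)]
        new_vel.append(v_new)
        new_pos.append(x_new)
    return new_pos, new_vel

# Hypothetical swarm of two bats in two dimensions; the first sits at gbest.
rng = random.Random(1)
positions = [[1.0, 2.0], [0.0, 0.0]]
velocities = [[0.0, 0.0], [0.5, -0.5]]
gbest = [1.0, 2.0]
pos, vel = bat_step(positions, velocities, gbest, 0.0, 2.0, rng)
```

A bat that already sits at the global best with zero velocity stays in place, because its velocity increment is zero for any frequency.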
Bat algorithm has been used for engineering Yang, X. S. and Gandomi, A. H., Bat algorithm: a novel approach for global engineering optimization, Engineering Computations, Vol. 29

C. Fuzzy c-means bi-sonar (FCB) optimization for clustering
Stochastic methods, such as simulated annealing and genetic algorithms, are gaining in popularity among practitioners and engineers because they are relatively easy to program and may be applied to a broad class of global optimization problems. However, the theoretical performance of these stochastic methods is not well understood. Stochastic and fuzzy set theories cannot be considered an omnipotent means of solving all problems automatically. They have to be understood as appropriate instruments for modeling indeterminacy. Since the main objective of fuzzy sets is to model the semantics of natural language, there exist numerous specializations in which fuzzy sets can be applied.
Besides the most commonly used probabilistic models and stochastic analysis techniques, newer uncertainty models have been developed that make it possible to take account of the non-stochastic uncertainty that frequently appears in real-world problems. The quantified uncertain parameters are introduced in the respective analysis algorithms: fuzzy c-means and the bi-sonar optimization algorithm. A modified bat algorithm for cluster analysis is proposed. The velocities (cf. equation (7)) of the bats are redefined to update the fuzzy relation between variables.
In equation (11), the variable $x(t_{pbest})$ is the personal best hunting space of a bat. Including it in the algorithm should enhance clustering by increasing the exploitation of the algorithm towards favorable cluster centers. To evaluate candidate solutions, the fitness function f(x) of the FCB algorithm is defined in equation (12) from the objective function $J_{fcm}$ of the FCM algorithm, where K is a constant. The smaller $J_{fcm}$ is, the better the clustering effect and the higher the individual fitness.
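The relation between the FCM objective and the fitness can be sketched as follows. The sketch assumes the inverse form f = K / J_fcm, which agrees with the statement that a smaller $J_{fcm}$ gives a higher fitness but is not confirmed by the text. The function names, the default K = 1 and the sample values are hypothetical.

```python
import math

def fcm_objective(points, centers, u, m=2.0):
    """Standard FCM objective: sum over i, j of u_ij^m times squared Euclidean distance."""
    return sum((u[i][j] ** m) * math.dist(points[i], centers[j]) ** 2
               for i in range(len(points)) for j in range(len(centers)))

def fitness(points, centers, u, K=1.0, m=2.0):
    """Fitness assumed to be K / J_fcm, so a lower objective means a higher fitness."""
    j = fcm_objective(points, centers, u, m)
    return float("inf") if j == 0.0 else K / j

# Hypothetical example: two points that sit exactly on the two centers.
points = [(0.0, 0.0), (2.0, 0.0)]
centers = [(0.0, 0.0), (2.0, 0.0)]
u_sharp = [[0.9, 0.1], [0.1, 0.9]]   # memberships that favor the nearest center
u_flat = [[0.5, 0.5], [0.5, 0.5]]    # uninformative memberships
fit_sharp = fitness(points, centers, u_sharp)
fit_flat = fitness(points, centers, u_flat)
```

The memberships that favor the nearest center give the smaller objective and therefore the larger fitness, which is the ordering a population-based search needs in order to rank candidate solutions.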

III. EXPERIMENTAL RESULTS

A. Parameter settings
Optimization techniques traditionally depend on the setting of one or more parameters. Depending on the problem and the techniques the number of parameters can be one, two or even dozens of them. One of the main difficulties of applying an evolutionary algorithm (or, as a matter of fact, any heuristic method) to a given problem is to decide on an appropriate set of parameter values. The tuning process, when dealing with several parameters, is a time consuming and critical step. Typically these are specified before the algorithm is run and include population size, selection rate, operator probabilities, not to mention the representation and the operators themselves.
In order to optimize the performance of FCB, fine tuning was performed and the best values for its parameters were selected. The parameters were tuned (meta-optimized) to perform well on the problem sets. Based on the experimental results, the algorithm performs best under the following settings: α = γ = 0.9, initial loudness $A_i$ = 1.35 and initial emission rate $r_i$ = 0.001. The FCB terminating condition is a maximum of 3000 iterations or no change in $g_{best}$ over 400 iterations. In all algorithms, the weighting exponent m is set to 2. Parameter settings for FCM and BSO are given in [7].
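The settings and the terminating condition above can be written down as a small sketch. The dictionary keys, the function name `should_stop` and the use of a per-iteration history of the best objective value are hypothetical. The sketch also assumes that the best value never worsens, so that equal values at both ends of the window mean no change inside it.

```python
# Parameter values as reported above; the key names are hypothetical.
FCB_PARAMS = {
    "alpha": 0.9,
    "gamma": 0.9,
    "initial_loudness": 1.35,
    "initial_emission_rate": 0.001,
    "m": 2.0,
    "max_iter": 3000,
    "patience": 400,
}

def should_stop(iteration, best_history, max_iter=3000, patience=400):
    """Stop at max_iter, or when the best value is unchanged over `patience` iterations."""
    if iteration >= max_iter:
        return True
    if len(best_history) > patience:
        # Assumes best_history never worsens, so equal endpoints mean no change.
        return best_history[-1] == best_history[-1 - patience]
    return False

# Hypothetical histories of the best objective value per iteration.
stalled = [1.0] * 401
improving = [1.0 / (k + 1) for k in range(500)]
stop_stalled = should_stop(401, stalled)
stop_improving = should_stop(500, improving)
```

Keeping the two limits as arguments makes it easy to shorten runs on small data sets without touching the search loop.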

B. Findings
To evaluate FCB, three well-known real-world data sets from the UCI Machine Learning Repository (Center for Machine Learning and Intelligent Systems, 2012; available online: http://archive.ics.uci.edu/ml/) have been considered: 1. Glass, which consists of 214 objects and 6 different types of glass, each object being described by 9 features; 2. Vowel, which consists of 871 Indian Telugu vowel sounds and has three features and six overlapping clusters; 3. Contraceptive Method Choice (CMC), which consists of 1473 objects and 3 different types characterized by 9 features. FCB obtained better results than the other methods on all three data sets, and it can escape from local optima (cf. (2)).
The experimental results show that FCB surpasses FCM when the data set is small in terms of the number of objects or clusters (Glass and Vowel), and that it still obtains better results than FCM as the size of the data set increases (CMC). It also performs better than fuzzy BSO (Fuzzy Bat Swarm Optimization) [10] in all test cases. The computation time of the FCB algorithm is only about 51 seconds per instance on average, with a maximum of 192 seconds for some of the largest instances. Here, running time was used as a metric for the performance analysis of the clustering algorithms (Zhao).

[Results table not recovered; surviving row fragment: "3,9) 3533.8 3329.8 3413.7"]

IV. CONCLUSION
This paper presented a derivation of a swarm family of stochastic algorithms. The fuzzy c-means algorithm is sensitive to initialization and is easily trapped in local optima. The bi-sonar optimization algorithm, on the other hand, is a stochastic tool that can be implemented and applied easily to solve various function optimization problems. In this paper, in order to overcome the shortcomings of fuzzy c-means, we integrate it with the bi-sonar optimization algorithm to produce the FCB algorithm. Experimental results over three well-known data sets, Glass, Vowel and CMC, show that the proposed hybrid method is efficient and gives very encouraging results in terms of the quality of the solutions found. Interpretation of the reformulated functional underlying the FCM model as a generalized mean of order might lead to new results for other families of metaheuristic swarm-based fuzzy models, for example, using cuckoo search (Yang, X.). Apart from the disambiguation of the assignment of objects to clusters, this approach is more robust in terms of finding the local minima of the given objective function. The conjecture that this method is more robust than deterministic (crisp) clustering is supported by the experimental results. FCB is a global stochastic tool that can be implemented and applied easily to solve various function optimization problems, or problems that can be transformed into other function-based optimization problems.
The following properties are important research directions for increasing the efficiency and effectiveness of the FCB algorithm. The FCB algorithm should be able to generate clusters of arbitrary shape rather than be confined to some particular shape; handle large volumes of data as well as high-dimensional features with acceptable time and storage complexity; detect and remove possible outliers and noise; reduce its reliance on user-dependent parameters; deal with newly occurring data without relearning from scratch; be immune to the effects of the order of the input patterns; provide some insight into the number of potential clusters without prior knowledge; show good data visualization and provide users with results that simplify further analysis; and be capable of handling both numerical and nominal data, or be easily adaptable to other data types.
However, it is important to emphasize that, ultimately, the tradeoff among different criteria and methods still depends on the applications themselves. Further work can be done on multi-criteria analysis of the algorithm's performance, for example with respect to space and data size. The advantages of this approach can be exploited in many areas, including medical image segmentation, classification and soil-landform interrelationships, estimation and segmentation of magnetic resonance imaging (MRI) data, clustering of microarray data, image segmentation, color image segmentation, non-linear mapping of geochemical datasets, analysis of metabolomics, web document and snippet clustering, classification of remotely sensed images, eigenspace projections and pixel classification.