Face Detection for Augmented Reality Application Using Boosting-based Techniques

Abstract: Augmented reality has attracted increasing research interest over the last few years. As customer requirements have become more demanding, industries need to adapt their products and enhance them with recent advances in computer vision and artificial intelligence. In this work we present a marker-less augmented reality application that can be used and extended in the e-commerce industry. We take advantage of well-known boosting techniques to train and evaluate different face detectors using multi-block local binary pattern (MB-LBP) features. The purpose of this work is to select the most relevant training parameters in order to maximize classification accuracy. The position of the face returned by the resulting detector serves as the marker in the proposed augmented reality application.

Many applications have had an important impact on daily life. For example, to reduce the number of road accidents, vehicle manufacturers have developed systems that detect driver fatigue and warn the driver [1] [2] [3]. A small camera is placed in the vehicle cab, and the images it captures are used to measure the position and rotation of the head and to check whether the eyes are open or closed. By introducing the Kinect sensor to the game industry, Microsoft was able to position the controller-free gaming device as an entirely new way to experience entertainment in the living room. With Kinect, players no longer need to memorize different commands for a hand-held controller; they are the controllers themselves [4] [5].

For a visual tracking algorithm to be useful in real-world scenarios, it should be designed to handle cases where the target's appearance changes from frame to frame. Significant and rapid appearance variations due to noise, occlusions, background clutter, pose, scale and illumination changes are the major challenges that a detector needs to overcome. Many novel methods have been proposed to address each of these variations [6] [7]. The accuracy of a trained face detector depends heavily on the data and the algorithm used for training. In this paper we highlight the importance of choosing the training parameter values and show how this choice affects the accuracy of the resulting detector.

A. Face detection problem
Human emotions such as sadness, happiness and anger are often expressed through the face, and these facial expressions make the human face a very dynamic body part. This high degree of variation, combined with pose, scale and illumination changes, makes face detection a difficult problem. Over the past two decades, face detection has been an attractive research area for the computer vision community. Real-time face detection became possible with the seminal approach of Viola and Jones [8], who used a cascade of classifiers of increasing complexity to detect upright faces. The accuracy of a face detector depends not only on the features used to represent faces, but also on the training data and parameters.

B. Objective and Contribution outline
Training an accurate boosting model requires a data set with a high degree of variation and fine tuning of the training parameters. In this work we revisit the face detection problem to find the training parameters that lead to an accurate face detector; the impact of each parameter is examined by training classifiers with different values of that parameter. To overcome the drawback of using markers in augmented reality applications, we integrate our face detector into a 3D augmented reality application in which the position of the face serves as a marker-less anchor for placing 3D models.
The contributions of this paper are:

II. Related Work
Classifiers are built by taking a set of labeled examples and using them to derive a rule that assigns a label to any new example. In the general problem, we have a training data set $\{(x_i, y_i)\}_{i=1}^{N}$, where each $x_i$ consists of measurements of the properties of an object and $y_i$ is the label giving the type of object that generated the example. In this paper we use learning-based techniques such as decision tree learning [9], one of the most widely used and practical methods for inductive inference. Boosting [10] is one of the most popular learning techniques for object detection; it combines many weak learners into a strong classifier. In this study we experiment with the Gentle AdaBoost, Real AdaBoost and LogitBoost learning techniques. These algorithms train models sequentially, with a new model trained at each round. At the end of each round, misclassified examples are identified and their weights are increased, so that the next round concentrates on them when training a new model. Viola and Jones [8] introduced the cascade of boosted classifiers. Random forests [11] are collections of random trees (RT). Random trees are structurally identical to classical decision trees but are trained differently: instead of an exhaustive search over all possible test candidates, only a randomized subset is considered at each node, which makes it possible to build several different and largely independent trees.
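The sketch below is a minimal NumPy implementation of Gentle AdaBoost with regression stumps (depth-one trees) as weak learners. It shows the sequential reweighting described above. It implements the textbook algorithm and is not the training code used for the detectors in this paper; the function names are our own.

Each round does three things:
- it fits a stump by weighted least squares;
- it adds the stump to the ensemble;
- it multiplies each example's weight by $e^{-y_i f(x_i)}$, which increases the weight of the examples the stump got wrong.

```python
import numpy as np


def fit_regression_stump(X, y, w):
    """Weighted least-squares stump f(x) = a if x[j] > thr else b (a single split)."""
    n, d = X.shape
    total_w = w.sum()
    # Fallback: a constant predictor, used only if no feature can be split.
    mean = float(np.dot(w, y) / total_w)
    best = (np.inf, 0, np.inf, mean, mean)  # (error, feature, threshold, a, b)
    for j in range(d):
        order = np.argsort(X[:, j], kind="stable")
        xs, ys, ws = X[order, j], y[order], w[order]
        cw = np.cumsum(ws)          # weight on the left side of each split position
        cwy = np.cumsum(ws * ys)    # weighted label sum on the left side
        for k in range(n - 1):
            if xs[k] == xs[k + 1]:
                continue  # no threshold separates equal values
            wl, wyl = cw[k], cwy[k]
            wr, wyr = total_w - wl, cwy[-1] - wyl
            b, a = wyl / wl, wyr / wr  # weighted mean label on each side
            # Weighted squared error with y in {-1, +1}: sum(w) - wl*b^2 - wr*a^2
            err = total_w - wl * b * b - wr * a * a
            if err < best[0]:
                best = (err, j, (xs[k] + xs[k + 1]) / 2.0, a, b)
    return best[1:]


def gentle_adaboost(X, y, n_rounds=50):
    """Gentle AdaBoost with labels in {-1, +1}; returns the list of fitted stumps."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.full(len(y), 1.0 / len(y))
    stumps = []
    for _ in range(n_rounds):
        j, thr, a, b = fit_regression_stump(X, y, w)
        f = np.where(X[:, j] > thr, a, b)
        w *= np.exp(-y * f)  # examples with y*f < 0 (misclassified) gain weight
        w /= w.sum()
        stumps.append((j, thr, a, b))
    return stumps


def decision_function(stumps, X):
    """Strong classifier score F(x) = sum of stump outputs; predict sign(F)."""
    X = np.asarray(X, dtype=float)
    F = np.zeros(len(X))
    for j, thr, a, b in stumps:
        F += np.where(X[:, j] > thr, a, b)
    return F
```

Unlike Discrete AdaBoost, each stump outputs a real-valued confidence (the weighted mean label on its side of the split) rather than a hard ±1 vote. This is what "gentle" refers to.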

A. The original LBP
The local binary pattern (LBP) [12] [13] is a gray-scale invariant texture measure, derived from a general definition of texture in a local neighborhood. The original LBP operator labels the pixels of an image by thresholding the 3-by-3 neighborhood of each pixel against the center pixel value and reading the result as a binary number. The decimal code is the sum of the thresholded values multiplied by their weights, as shown in Fig. 1. In other words, given a pixel position $(x_c, y_c)$, LBP is defined as an ordered set of binary comparisons of pixel intensities between the central pixel and its surrounding pixels.
The resulting label value of the 8-bit word can be expressed as follows:

$$LBP(x_c, y_c) = \sum_{n=0}^{7} t(l_n - l_c)\, 2^n \tag{1}$$

where $l_c$ is the gray value of the central pixel, $l_n$ is the gray value of neighbor pixel $n$, and the function $t(k)$ is defined as follows:

$$t(k) = \begin{cases} 1 & \text{if } k \ge 0 \\ 0 & \text{if } k < 0 \end{cases} \tag{2}$$

According to (2), each bit depends only on the sign of $l_n - l_c$. The LBP code is therefore invariant to monotonic gray-scale transformations, so the LBP representation may be less sensitive to illumination changes.
The 256-bin histogram of the labels computed over an image can be used as a texture descriptor. Each histogram bin (LBP code) can be regarded as a micro-texton, and the histogram characterizes the occurrence statistics of simple texture primitives. The histogram of the labeled image $f_l(x, y)$ can be defined as:

$$H_i = \sum_{x,y} I\{f_l(x, y) = i\}, \quad i = 0, \ldots, L-1 \tag{4}$$

where $L$ is the number of different labels produced by the LBP operator and $I\{A\}$ is 1 if $A$ is true and 0 otherwise.
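Equations (1), (2) and (4) translate directly into array operations. The following NumPy sketch computes the 3-by-3 LBP code of every interior pixel and then the 256-bin label histogram. Border pixels are skipped.

The neighbor ordering is an assumption: the sketch goes clockwise, starting at the top-left pixel. Any fixed order yields a valid LBP, but codes are only comparable under the same order.

```python
import numpy as np

# Neighbor offsets (dy, dx), clockwise from the top-left pixel; neighbor n has weight 2**n.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_image(img):
    """Original 3x3 LBP code, Eqs. (1)-(2), for every interior pixel."""
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    codes = np.zeros_like(center)
    for n, (dy, dx) in enumerate(OFFSETS):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # t(l_n - l_c) is 1 when the neighbor is at least as bright as the center
        codes += (neighbor >= center).astype(np.int32) << n
    return codes


def lbp_histogram(codes, n_labels=256):
    """Eq. (4): H_i = number of pixels whose label equals i."""
    return np.bincount(codes.ravel(), minlength=n_labels)
```

Each bit only compares a neighbor with the center. As a result, `lbp_image(2 * img + 3)` returns the same codes as `lbp_image(img)`, which is the monotonic invariance stated above.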

B. Multi-block LBP
The LBP operator has been extended to consider differences between blocks. The MB-LBP [14] operator compares the average intensity $g_c$ of the central rectangle with the average intensities $g_0, \ldots, g_7$ of its neighboring rectangles, which yields a binary sequence. The output value of the MB-LBP operator is obtained as follows:

$$MB\text{-}LBP = \sum_{i=0}^{7} t(g_i - g_c)\, 2^i \tag{5}$$

where $g_c$ is the average intensity of the center rectangle, $g_i$ ($i = 0, \ldots, 7$) are those of its neighboring rectangles, and $t$ is the thresholding function of (2). Fig. 2 illustrates how MB-LBP features are calculated.
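MB-LBP features are usually computed with an integral image, so that each block sum costs four array lookups regardless of block size. The NumPy sketch below evaluates Eq. (5) for one 3-by-3 grid of w-by-h blocks; the helper names are ours.

Two points about the sketch:
- All nine blocks have the same area, so comparing block sums is equivalent to comparing the averages $g_i$ and $g_c$. The division is therefore skipped.
- The clockwise neighbor order, starting at the top-left block, is an assumption.

```python
import numpy as np

# (row, col) of the eight neighbor blocks in the 3x3 grid, clockwise from the
# top-left block; neighbor i has weight 2**i (the ordering is an assumption).
NEIGHBOR_BLOCKS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def integral_image(img):
    """Summed-area table with a leading row and column of zeros."""
    img = np.asarray(img, dtype=np.int64)
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii


def block_sum(ii, x, y, w, h):
    """Sum of the w-by-h block whose top-left pixel is (x, y), in four lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]


def mb_lbp(ii, x, y, w, h):
    """MB-LBP code, Eq. (5), of the 3x3 grid of w-by-h blocks starting at (x, y)."""
    g_c = block_sum(ii, x + w, y + h, w, h)  # center block
    code = 0
    for i, (r, c) in enumerate(NEIGHBOR_BLOCKS):
        g_i = block_sum(ii, x + c * w, y + r * h, w, h)
        if g_i >= g_c:  # t(g_i - g_c) = 1; equal areas, so sums compare like means
            code |= 1 << i
    return code
```

With 1-by-1 blocks, MB-LBP reduces to the original LBP operator. Larger blocks capture coarser structure at the same cost per feature.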

A. Data set preparation
To train the classifiers we use a subset of the FERET [15] face database. For testing we use the BioID Face database [16], which consists of 1521 gray-level images with a resolution of 384×286 pixels. Each image shows a frontal view of the face of one of 23 different test persons, and 20 points have been manually placed on each image. The markup scheme is shown in Fig. 5, and the label indexes are listed in Table I.
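The ground-truth rule in the Fig. 5 caption can be written as a small helper that turns BioID landmarks into a face rectangle. This is a sketch: the function name and argument layout are hypothetical, and it takes the landmark coordinates directly rather than parsing BioID files.

One interpretation is ours. Image y coordinates grow downward, so we read "adding 15 pixels to the highest eyebrow point" as placing the top edge 15 pixels above that point. Flip the sign of `top_margin` if the intended convention is the opposite.

```python
def face_box_from_landmarks(left_temple, right_temple, eyebrow_points, chin,
                            side_margin=10, top_margin=15):
    """Ground-truth face box (x0, y0, x1, y1) from landmarks given as (x, y) pixels.

    Image y grows downward, so the highest eyebrow point has the smallest y.
    Assumption: the top edge is placed top_margin pixels above that point.
    """
    brow_top_y = min(p[1] for p in eyebrow_points)
    x0 = left_temple[0] - side_margin   # 10 px to the left of the left temple
    y0 = brow_top_y - top_margin        # above the highest eyebrow point (our reading)
    x1 = right_temple[0] + side_margin  # 10 px to the right of the right temple
    y1 = chin[1]                        # bottom edge at the tip of the chin
    return x0, y0, x1, y1
```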

B. Experiments & Results
The choice of training parameter values has an important impact on the accuracy of the trained classifiers. For the multi-block local binary pattern (MB-LBP) features, we choose a range of values for each parameter and plot the ROC curve of each resulting classifier to highlight the influence of that parameter.

1) Minimum hit rate parameter:
The first set of experiments consists of varying the value of the parameter minHitRate, the minimum desired hit rate for each stage of the classifier. The overall hit rate can be estimated as $\text{minHitRate}^{N_{stages}}$, where $N_{stages}$ is the number of stages. Table II lists the values used for training.
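As a worked example of this estimate, the snippet evaluates $\text{minHitRate}^{N_{stages}}$ for a few per-stage hit rates. The 20-stage cascade is an illustrative assumption; the number of stages is not stated in this section.

```python
def overall_rate(per_stage_rate: float, n_stages: int) -> float:
    """Estimated cascade-level rate when every stage achieves per_stage_rate."""
    return per_stage_rate ** n_stages


for min_hit_rate in (0.995, 0.9, 0.8):
    estimate = overall_rate(min_hit_rate, 20)
    print(f"minHitRate={min_hit_rate}: overall hit rate ~ {estimate:.4f}")
```

With 20 stages the estimates are roughly 0.905, 0.122 and 0.012. Small per-stage losses compound quickly as stages are added.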

2) Maximum false alarm parameter:
In the second set of experiments, we vary the value of the parameter MaxFalseAlarm, the maximum desired false alarm rate for each stage of the classifier. The overall false alarm rate is estimated as $\text{MaxFalseAlarm}^{N_{stages}}$. Table III lists the different values used for this parameter.
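The same geometric compounding works in the trainer's favor for false alarms. The helper below (a hypothetical name, plain Python) counts how many stages are needed before $\text{MaxFalseAlarm}^{N_{stages}}$ drops to a target overall false alarm rate. The target of $10^{-6}$ is an illustrative assumption.

```python
def stages_needed(max_false_alarm: float, target: float) -> int:
    """Smallest N such that max_false_alarm ** N <= target."""
    if not 0.0 < max_false_alarm < 1.0:
        raise ValueError("max_false_alarm must be in (0, 1)")
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    n, rate = 0, 1.0
    while rate > target:
        rate *= max_false_alarm  # each extra stage multiplies the overall false alarm rate
        n += 1
    return n


for fa in (0.3, 0.5, 0.7):
    print(f"MaxFalseAlarm={fa}: {stages_needed(fa, 1e-6)} stages to reach 1e-6")
```

A looser per-stage limit such as 0.7 needs roughly twice as many stages as 0.5 (39 versus 20) to reach the same overall rate. Every added stage also multiplies the overall hit rate by minHitRate.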

3) Max depth parameter:
In the third set of experiments, we vary the value of the parameter maxDepth, the maximum depth of each weak classifier. Table IV lists the different values used for this parameter.

4) Boosting variant parameter:
In the last set of experiments, we vary the boosting type parameter Boost Type. Table V lists the different values used for this parameter.
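The parameter names in these experiments (minHitRate, maxFalseAlarmRate, maxDepth and the boosting type) match the options of OpenCV's `opencv_traincascade` tool. To our knowledge, that tool's `-featureType LBP` option implements multi-block LBP. Assuming this tool was used, a one-factor-at-a-time sweep could be scripted as below.

All of the following are placeholders, not the values in Tables II to V:
- the file names;
- the sample counts and window size;
- the baseline and swept values.

```python
# Baseline configuration; each sweep changes one parameter and keeps the others here.
BASELINE = {"minHitRate": 0.995, "maxFalseAlarmRate": 0.5, "maxDepth": 1, "bt": "GAB"}

# Placeholder sweep values (hypothetical, for illustration only).
SWEEPS = {
    "minHitRate": [0.7, 0.8, 0.9, 0.995],
    "maxFalseAlarmRate": [0.3, 0.5, 0.7],
    "maxDepth": [1, 2, 3],
    "bt": ["DAB", "RAB", "LB", "GAB"],  # Discrete, Real, Logit and Gentle AdaBoost
}


def build_commands(baseline, sweeps, vec="faces.vec", bg="negatives.txt",
                   num_pos=1000, num_neg=2000, num_stages=20, size=24):
    """Return one opencv_traincascade argument list per (parameter, value) pair."""
    commands = []
    for param, values in sweeps.items():
        for value in values:
            config = {**baseline, param: value}
            cmd = ["opencv_traincascade", "-data", f"cascade_{param}_{value}",
                   "-vec", vec, "-bg", bg,
                   "-numPos", str(num_pos), "-numNeg", str(num_neg),
                   "-numStages", str(num_stages), "-featureType", "LBP",
                   "-w", str(size), "-h", str(size)]
            for key, val in config.items():
                cmd += [f"-{key}", str(val)]
            commands.append(cmd)
    return commands
```

Each command can be launched with `subprocess.run(cmd, check=True)`. The `-data` output directory may need to exist before training starts.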

C. Performance evaluation
The resulting classifiers were applied to the BioID face database. We present their detection performance as Receiver Operating Characteristic (ROC) curves [17].

The ROC curves shown in Figs. 6 to 9 are obtained by scoring the detected windows on each test image and applying a threshold to decide whether each detected window is a face. To represent the degree of match between a detection $d_i$ and an annotated region $l_j$, we employ the commonly used ratio of the intersected area to the joined area:

$$S(d_i, l_j) = \frac{\operatorname{area}(d_i \cap l_j)}{\operatorname{area}(d_i \cup l_j)} \tag{6}$$

We use a slightly modified version of the evaluation tool provided by [18], in which rectangular face annotations are used instead of elliptical ones.
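Eq. (6) and the thresholding step can be sketched in plain Python as follows, with boxes given as (x0, y0, x1, y1) corners. We assume greedy one-to-one matching and a 0.5 overlap threshold, both common conventions; the modified tool from [18] may differ in its details.

To build the ROC curve, sweep `score_threshold` over the detection scores and record the true-positive and false-positive counts. Each threshold gives one ROC point.

```python
def iou(a, b):
    """Eq. (6): intersection area over union area for boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_matches(detections, annotations, score_threshold, iou_threshold=0.5):
    """Count (true positives, false positives) for one image.

    detections: list of (box, score). Detections above the score threshold are
    matched greedily, highest score first; each annotation matches at most once.
    """
    kept = sorted((d for d in detections if d[1] >= score_threshold),
                  key=lambda d: d[1], reverse=True)
    unmatched = list(annotations)
    tp = 0
    for box, _ in kept:
        best = max(unmatched, key=lambda g: iou(box, g), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            unmatched.remove(best)
            tp += 1
    return tp, len(kept) - tp
```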

D. Results discussion
Figure 6 shows that the values 0.9 and 0.8 of the minimum hit rate parameter give the best results for the multi-block local binary pattern features. In the second set of experiments, choosing the value 0.7 for the maximum false alarm parameter gives the best result, as shown in Figure 7.
Figure 9 shows that the Gentle AdaBoost variant gives the most accurate results among the boosting variants. Finally, Figure 8 shows that a classifier whose weak learners have a depth of only one performs more accurately than classifiers with deeper weak learners.
These experiments show how a single parameter value can influence the accuracy of a face classifier. For each type of object detection, users may need to try multiple combinations of parameter values to find an accurate classifier.

V. Application to Augmented Reality
Augmented Reality (AR) employs computer vision, image processing and computer graphics techniques to merge digital content into the real world. It enables real-time interaction between the user, real objects and virtual objects. AR can, for example, be used to embed 3D graphics into a video so that the virtual elements appear to be part of the real environment. This technology has attracted increasing interest in many fields. It has been explored in e-commerce applications [19], where clients can try on clothes online and discover and test items they are interested in without going to a store. It has also been applied in e-learning [20], where classic teaching can be mixed with augmented reality so that students have fun while learning in a new way.
For augmented reality, marker-less model-based tracking approaches [21] appear to be the most promising of the standard vision techniques currently applied in AR applications. Marker-based approaches such as ARToolkit [22], or commercial tracking systems such as ART, provide a robust and stable solution for controlled environments. However, it is not feasible to equip a larger outdoor space with fiducial markers. Such systems therefore have to rely on models of natural features, such as architectural lines or feature points extracted from reference images.

The proposed augmented reality application uses the open-source real-time 3D rendering engine OGRE [23] and the 3D modeling tool Blender. OGRE lets a programmer handle the three-dimensional graphical presentation of an application in an object-oriented manner, which is exactly what its name expresses: Object-Oriented Graphics Rendering Engine. It acts as a wrapper around the rendering subsystem (OpenGL or DirectX), allowing us to focus on the application rather than on rendering details.
Fig. 10 shows the workflow of an OGRE-based application. A standard web camera captures the video stream. The trained face detector is loaded using the Open Source Computer Vision Library (OpenCV) [24] and applied to each frame to detect the faces present. Finally, Ogre places a 3D object over the face region and renders the video stream to the user.
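The capture, detect and render loop of Fig. 10 can be prototyped with OpenCV's Python bindings before the Ogre layer is added. In this sketch:
- `cv2.CascadeClassifier` loads a trained cascade (the path is hypothetical);
- `detectMultiScale` runs on each grayscale frame;
- the largest detected face drives a placement box for the hat.

The box proportions in `hat_region` are a hypothetical rule. The green rectangle is only a stand-in for the Ogre rendering step.

```python
def largest_face(faces):
    """Return the (x, y, w, h) detection with the largest area, or None if there is none."""
    return max(faces, key=lambda f: f[2] * f[3], default=None)


def hat_region(face, width_scale=1.2, height_ratio=0.8):
    """Hypothetical placement rule: a box centered on the face, resting on its top edge."""
    x, y, w, _ = (int(v) for v in face)
    hat_w = int(round(w * width_scale))
    hat_h = int(round(w * height_ratio))
    return x + w // 2 - hat_w // 2, y - hat_h, hat_w, hat_h


def run(cascade_path, camera_index=0):
    """Capture frames, detect faces and mark where the 3D hat would be anchored."""
    import cv2  # imported here so the geometry helpers above do not require OpenCV

    detector = cv2.CascadeClassifier(cascade_path)
    if detector.empty():
        raise IOError(f"could not load cascade: {cascade_path}")
    capture = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            face = largest_face(faces)
            if face is not None:
                hx, hy, hw, hh = hat_region(face)
                # Stand-in for the Ogre step: outline the hat anchor region.
                cv2.rectangle(frame, (hx, hy), (hx + hw, hy + hh), (0, 255, 0), 2)
            cv2.imshow("AR preview", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        capture.release()
        cv2.destroyAllWindows()
```

Calling `run("cascade.xml")` opens the default camera; press q to quit. `cv2` is imported inside `run`, so the geometry helpers can be reused without an OpenCV dependency.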

A. Modeling a 3D Hat
The 3D model used in our application is a hat (Fig. 12).
To model the hat we use the modeling tool Blender. The Ogre rendering engine uses meshes and skeletons for movable objects, so to use the modeled hat we need to convert it from the Blender format to resources managed by Ogre. Figure 11 shows the resources generated by converting the Blender 3D model to the Ogre format.

B. Adding the 3D model to an Ogre scene
Every 3D rendering library uses a scene graph to organize its renderable items. This scene graph is typically optimized for fast searching and querying. It lets the user find items in the vicinity of other items, and it lets the library find, sort and cull polygons as needed to render as efficiently as possible. In the proposed augmented reality application, the background of the scene is the stream captured by the camera. To add the 3D model to the scene, we process the camera stream frame by frame, use the trained face detector to find the face position, and place the 3D model accordingly. Fig. 13 shows the resulting scene, in which the user is wearing a virtual 3D hat.
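Placing the hat in the Ogre scene requires turning the 2D face box into a 3D position. One simple way, not necessarily the one used in this work, is pinhole back-projection. The apparent face width gives the depth by similar triangles, and the box center gives the lateral offsets.

The sketch relies on the following assumptions, all of which should be calibrated:
- a known vertical field of view;
- the principal point at the image center;
- an average face width of 16 cm;
- Ogre's default camera convention: x right, y up, looking down the negative z axis.

```python
import math


def face_to_camera_space(face, image_size, fov_y_deg, face_width_m=0.16):
    """Back-project the center of a face box (x, y, w, h) to camera space (X, Y, Z)."""
    x, y, w, h = face
    img_w, img_h = image_size
    # Focal length in pixels from the vertical field of view (pinhole model).
    f = (img_h / 2.0) / math.tan(math.radians(fov_y_deg) / 2.0)
    # Similar triangles: a face of real width W at depth d spans w = f * W / d pixels.
    depth = f * face_width_m / w
    u, v = x + w / 2.0, y + h / 2.0     # face center in pixels
    X = (u - img_w / 2.0) * depth / f   # right of the optical axis is +x
    Y = -(v - img_h / 2.0) * depth / f  # image y points down, camera y points up
    return X, Y, -depth                 # camera looks down -z (assumed Ogre convention)
```

The returned point could be passed to the position of the hat's scene node in Ogre. An extra upward offset would make the hat sit above the face center rather than on it.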

VI. Conclusion
In this paper, we presented an augmented reality application built on the 3D rendering engine Ogre. We experimented with different boosting techniques using multi-block local binary pattern features, training classifiers with multiple values of each training parameter. Combining boosting-based face detection with a real-time augmented reality system gave satisfying results, and it spares the end user the burden of wearing any form of marker to interact with the computer.
Future work includes using deep learning for hand gesture recognition and applying the recognized gestures in augmented reality applications for education. Institutions in rural areas suffer from a lack of funding for laboratory equipment. Virtual experiments can help reduce the gap between theory and practice in subjects such as physics and chemistry.

Fig. 1. The original local binary pattern calculation. The central pixel is compared with its neighbors, each thresholded value is multiplied by a power of two $2^i$, where $i$ is the neighbor index, and the sum of all values gives the LBP code.

Fig. 5. BioID labeled face image sample. The top-left corner of the ground-truth face region (the region of interest) is obtained in two steps:
- subtract 10 pixels from the x coordinate of the left temple (index 8 in the figure);
- add 15 pixels to the highest eyebrow point (index 7 in this example).

The bottom-right corner of the region of interest is also obtained in two steps:
- add 10 pixels to the x coordinate of the right temple;
- take the y coordinate of the tip of the chin (index 19) as its y coordinate.

Fig. 6. Variation of the minimum hit rate parameter values for the MB-LBP feature-based classifiers.

TABLE I. List of label indexes and their description
Label index | Description
... | ...
17 | Centre point on outer edge of upper lip
18 | Centre point on outer edge of lower lip
19 | Tip of chin

TABLE II. Variation of the minimum hit rate parameter to train MB-LBP based classifiers

TABLE III. Variation of the maximum false alarm parameter to train MB-LBP classifiers