A Convolution Neural Network Engine for Sclera Recognition

The world is shifting to the digital era at an enormous pace. This rise of digital technology has created many applications in the digital space, which demand a secure environment for transacting and for authenticating the genuineness of end users. Biometric systems and their applications have shown great potential for use in the technology industry. Among the various biometric traits, the sclera is attracting researchers to experiment with and explore its characteristics for recognition systems. This paper, which is the first of its kind, explores the power of the Convolution Neural Network (CNN) for sclera recognition by developing a neural model that trains its neural engine for a recognition system. To do so, the proposed work uses the standard benchmark dataset of the Sclera Segmentation and Recognition Benchmarking Competition (SSRBC 2015), which comprises 734 images captured at different viewing angles from 30 different classes. The results of the proposed methodology showcase the potential of neural learning for sclera recognition systems.


I. Introduction
There is exponential growth in digital technology, associated with its increased application in commerce and telecommunication throughout the world, which demands a strong authentication system. Consequently, there is a constant search for novel and secure authentication systems. In this regard, biometric identification and authentication are becoming more popular because of their reliability and uniqueness. Biometric authentication rests on the premise that every individual is unique in their physical or behavioral traits, which include face [1], [2], iris [3], [4], fingerprints [5], finger vein patterns [6], palm prints [7], etc. The sclera is one such trait that is gaining a lot of attention in the research community. The sclera is the white outer layer of the eyeball, as shown in Fig. 1. The sclera has emerged as one of the most promising traits to complement the traditional ones, because it is a highly protected portion of the eye that is very difficult to forge [8]. The vessel pattern of the sclera is observed never to be the same for any two individuals, including identical twins, and the pattern even differs between the left eye and the right eye of the same individual. Authenticating a person by the vessel pattern of the sclera is possible because of its high degree of randomness and uniqueness. Hence, it is necessary to develop an efficient and robust scleral pattern recognition system for person authentication.
This can be tackled by applying deep learning techniques such as the Convolution Neural Network (CNN), the Recurrent Neural Network (RNN), Recursive Neural Networks, etc. The CNN and its variants have become one of the most talked-about topics in the field of deep learning. Neural learning models are now more successful than ever before, letting the computer learn the model and yield effective results, especially for image segmentation [29] and detection [30]. This has led to breakthroughs, with state-of-the-art results in many applications such as face recognition [9], object detection [10], driver monitoring systems [11], etc.
The rest of the paper is structured as follows. Section II reviews the background work, Section III presents the proposed model engine for recognition, Section IV discusses the dataset and the result analysis, and Section V concludes the paper with the future scope.

II. Background
Biometric traits have been integrated into a wide spectrum of applications and services. The sclera is emerging as a promising trait to complement the traditional ones because of the uniqueness and randomness of its pattern. This calls for an efficient and robust scleral pattern recognition system for person authentication. Over the past decade, researchers have proposed different algorithms for segmentation and recognition. However, many challenges remain unsolved, which leaves this area open to research. A sclera authentication system rests on segmentation and recognition. Typically, when an image is taken as input, it is preprocessed to remove the occluded portions of the eye, and the segmentation process extracts the relevant data objects from the image [12]. The selected region of interest is then subjected to feature extraction and matched against the knowledge base for recognition. Fig. 2 depicts the overall process of sclera segmentation and recognition. Much importance has been given to the segmentation stage, where methods such as the TASOM (Time Adaptive Self-Organizing Map)-based active contour [13], [14] have been proposed to obtain the inner boundary of the sclera. For non-ideal conditions, Crihalmeanu et al. [15] used K-means clustering to segment the sclera by matching the conjunctival vasculature of an eye. For colour images [16], the HSV model was used to segment the eye images automatically. The blood veins in the sclera region are not prominent, so image enhancement plays a vital role in a recognition system. In [17], the authors used Histogram Equalization (HE) and Contrast Limited Adaptive Histogram Equalization (CLAHE) to enhance the images and later combined these methods with K-means and FCM; HE with K-means was found to have a better precision rate than CLAHE with K-means and FCM.
Adaptive histogram equalization was applied to the green layer of the colour image to enhance the sclera vein pattern in [18] - [21]. The features of interest are then used to build a reliable mathematical model for authentication and identification. In [18], the Discrete Cosine Transform (DCT) and wavelets were used for feature representation. In [22], a Local Binary Pattern (LBP) feature was used to extract texture-based features, and the effect of feature fusion in sclera biometrics was studied. A template matching algorithm was proposed for classification in [16]. Template-based matching using the Hamming distance was introduced for classification in [23], [24]. For recognition, a benchmarking competition was organized to record recent advances in recognition techniques, in which the winning team achieved 72.56% accuracy in eye recognition [25]. The literature shows that researchers are addressing many challenges to improve the accuracy of recognition systems. The Convolution Neural Network (CNN) has recently set benchmark results on various computer vision tasks and applications, including biometrics [26], [27]. Advanced deep learning architectures such as GoogLeNet, AlexNet and ResNet have been applied to various classification problems. The authors in [32] present a simple unsupervised convolutional deep learning network called PCANet for image classification, which learns multistage filter banks in a cascaded way, followed by binary hashing and block histograms for indexing and pooling. For hyperspectral image (HSI) classification, a new model, R-VCANet, was introduced in [33], which yields better results even when few training samples are available. This method uses inherent properties of HSI data, such as spatial information and spectral characteristics, to improve the feature expression of the network.
The authors in [34] present a CNN model to embed visual words in images. This method treats images as textual documents, builds visual words and embeds them to capture the spatial context surrounding them. It performed better than using the original images and showed how embedding techniques for text data can be adapted to visual data in a wide range of generic image and video applications. All the works mentioned above have used deep learning networks for image classification. For sclera recognition, however, no use of deep learning in general, or of convolutional neural networks in particular, has been reported. This motivates us to take the neural model to the next level by proposing the Convolutional Neural Network Sclera Recognition Engine (CNNSRE) for sclera recognition systems. The proposed method is a deep learning model in its simplest form; hence, many potential avenues remain to be explored in using neural models to build sclera recognition systems.

III. Proposed Model
The proposed model is the first of its kind to take a CNN-based approach to sclera recognition. Although it sits at the initial level of deep learning architectures, unlike the traditional approaches it gives the opportunity to test the power of deep learning for sclera biometric systems. As no attempts have been made to apply deep learning models to the recognition process of sclera biometrics, we take the initiative to explore and learn how the proposed model works for a sclera recognition system. The model is implemented using the Keras [31] open-source package in Python.

A. First Convolutional Unit
The CNN model requires all the sample images to be of the same size. We therefore rescaled the images to 373 X 921 X 3, taking the minimum height (373) and the minimum width (921) over all the images. The first convolutional unit consists of 10 filters of size 3 X 3 X 3. The number of filters is limited to 10 by the available computational resources. The depth of each filter is 3 because the input is an RGB image, whose depth is 3. To retain the same spatial dimensions in the next layer, we zero-pad the input image of size 373 X 921 X 3 before convolving the filters over it. The stride is kept at 1 because a larger stride could miss important features in the input image. The output after convolving the filters over the image is of size 373 X 921 X 10: the height and width are the same as those of the input because the image is zero-padded, and the depth is 10 because 10 filters are used. Equation (1) shows the dot product that is computed when the filters are convolved over the image.
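The size bookkeeping for this unit can be checked with the usual convolution output-size rule. The following is a minimal sketch, not the authors' code; the helper name `conv_output_size` and the one-pixel padding derived from the 3 X 3 kernel are assumptions made for illustration.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output length of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


# Input size and filter settings as stated for the first convolutional unit.
height, width, depth = 373, 921, 3
kernel, filters = 3, 10

# Assumption: "zero padding" means one pixel of zeros on each side,
# which is what keeps the size unchanged for a 3 x 3 kernel at stride 1.
pad = (kernel - 1) // 2

out_shape = (
    conv_output_size(height, kernel, stride=1, padding=pad),
    conv_output_size(width, kernel, stride=1, padding=pad),
    filters,  # one feature map per filter
)
print(out_shape)  # (373, 921, 10)
```

With a stride of 2 the same rule would roughly halve each spatial dimension, which is consistent with the stated reason for keeping the stride at 1 in the convolutional layers.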

$$ W \cdot x + b \tag{1} $$
where W is the weight matrix of the filters, x is the pixel value of the image, and b is the bias.
After convolving the filters, the output of dimension 373 X 921 X 10 is fed to the Rectified Linear Unit (ReLU) activation unit. ReLU is chosen over sigmoid and tanh because sigmoid suffers from the vanishing gradient problem: the gradient can approach zero, so that no weight update takes place and nothing is learned. Sigmoid also contains an exponential term, which increases the computational cost. The tanh activation function is zero-centered, unlike sigmoid, but it does not solve the more important vanishing gradient problem. ReLU addresses these problems and is therefore used in our proposed model.

$$ f(x) = \max(0, x) \tag{2} $$
The output of the ReLU activation function lies in the range [0, ∞), and its equation is given in (2). It outputs zero for negative inputs and leaves all other inputs unchanged. After the ReLU activation function is applied, the result is fed to the second convolutional unit.
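The dot product computed by a filter at one position and the ReLU activation that follows it can be sketched in a few lines. This is an illustrative toy, assuming a flattened patch of three values rather than the 27 values of a real 3 X 3 X 3 filter; the numbers and the function names are hypothetical.

```python
def filter_response(patch, weights, bias):
    """Dot product of one filter with the image patch under it, plus the bias."""
    return sum(w * x for w, x in zip(weights, patch)) + bias


def relu(value):
    """Zero for negative inputs, identity otherwise."""
    return max(0, value)


# Hypothetical pixel values and filter weights, for illustration only.
patch = [2, 5, 1]
bias = 1

negative_case = filter_response(patch, [1, -2, 3], bias)  # 2 - 10 + 3 + 1
positive_case = filter_response(patch, [1, 2, 3], bias)   # 2 + 10 + 3 + 1

print(negative_case, relu(negative_case))  # -4 0
print(positive_case, relu(positive_case))  # 16 16
```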

B. Second Convolutional Unit
The second convolutional unit is similar to the first. The only changes are the number of filters used and the addition of a max pooling layer. As before, we first convolve the filters over the output of the previous convolutional layer, which is of dimension 373 X 921 X 10. In this unit, the filter count is increased to 20. The depth of each of the 20 filters is set to 10 because the depth of the output of the previous layer is 10. The stride for the convolutional filters is kept at 1, and the input is zero-padded so that the output of each filter is of dimension 373 X 921. The output after convolution is therefore 373 X 921 X 20, as there are 20 filters. This output is then fed to the ReLU activation unit, as in the previous unit. The output of the ReLU activation unit is passed to a 2 X 2 max pooling layer with stride 2; with a stride of 1, neighbouring pooling windows would overlap. The output of the second convolutional unit therefore becomes 187 X 461 X 20. This change in dimension is due to the max pooling layer, as shown in Fig. 4, which depicts a small max pooling operation on 2 X 2 grids. In the 2 X 2 max pooling operation, the whole image is divided into 2 X 2 grids, because a 1 X 1 grid would keep the same dimension and a 3 X 3 grid would downsample the image too aggressively. The maximum pixel value is taken from each grid, i.e., 158 from the first grid, 167 from the second, 137 from the third and 152 from the last, and these values form the new image after 2 X 2 max pooling.
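The pooling step can be illustrated on a small grid. The sketch below is not taken from the paper's implementation: the 4 X 4 pixel values are hypothetical and chosen only so that the four window maxima equal the values quoted from Fig. 4. The size calculation at the end assumes that odd dimensions are rounded up (as when the odd edge is padded rather than dropped), which is what reproduces the reported 187 X 461.

```python
import math


def max_pool_2x2(image):
    """2 x 2 max pooling with stride 2 on a list-of-rows image with even sides."""
    rows, cols = len(image), len(image[0])
    return [
        [
            max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# Hypothetical 4 x 4 patch; only the four maxima come from the text.
grid = [
    [158, 120, 167, 101],
    [90, 143, 110, 154],
    [137, 64, 152, 99],
    [85, 121, 77, 140],
]
pooled = max_pool_2x2(grid)
print(pooled)  # [[158, 167], [137, 152]]

# Assumption: odd sizes are rounded up when halved by the pooling layer.
pooled_size = (math.ceil(373 / 2), math.ceil(921 / 2))
print(pooled_size)  # (187, 461)
```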

C. Third Convolutional Unit
The third convolutional unit is similar to the first. As before, we first convolve the filters over the output of the previous unit; the output of the second convolutional unit is of dimension 187 X 461 X 20. In this unit, the filter count is 10. The depth of each of the 10 filters is set to 20 because the depth of the output of the previous layer is 20. The stride for the convolutional filters is kept at 1, and the input is zero-padded so that the output of each filter is of dimension 187 X 461. The output after convolution is therefore 187 X 461 X 10, as there are 10 filters. This output is then fed to the ReLU activation unit, as in the previous unit, and the result, of dimension 187 X 461 X 10, is fed to the fourth convolutional unit.

D. Fourth Convolutional Unit
The fourth convolutional unit is similar to the second. We first convolve the filters over the output of the previous unit; the output of the third convolutional unit is of dimension 187 X 461 X 10. In this unit, the filter count is 20. The depth of each of the 20 filters is set to 10 because the depth of the output of the previous layer is 10. The stride for the convolutional filters is kept at 1, and the input is zero-padded so that the output of each filter is of dimension 187 X 461. The output after convolution is therefore 187 X 461 X 20, as there are 20 filters. This output is then fed to the ReLU activation unit, as in the previous unit. The output of the ReLU activation unit is passed to a 2 X 2 max pooling layer with stride 2; with a stride of 1, neighbouring pooling windows would overlap. The output of the fourth convolutional unit therefore becomes 94 X 231 X 20. After this output is flattened to a 434,280-dimensional vector, it is fed to the fully connected unit.
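The dimensions quoted for the four convolutional units can be traced end to end with a short script. This is a sketch of the arithmetic only, assuming that the zero-padded stride-1 convolutions preserve height and width and that each 2 X 2, stride-2 pooling layer rounds odd sizes up; the function name `trace_shapes` is hypothetical.

```python
import math


def trace_shapes(height, width):
    """Output shape of each convolutional unit for a given input size."""
    # (number of filters, followed by 2 x 2 max pooling?) for units one to four.
    units = [(10, False), (20, True), (10, False), (20, True)]
    shapes = []
    for filters, pooled in units:
        # The zero-padded, stride-1 convolution keeps height and width.
        if pooled:
            # Assumption: odd sizes are rounded up by the pooling layer.
            height, width = math.ceil(height / 2), math.ceil(width / 2)
        shapes.append((height, width, filters))
    return shapes


shapes = trace_shapes(373, 921)
for unit, shape in enumerate(shapes, start=1):
    print(unit, shape)

# Length of the vector handed to the fully connected unit.
h, w, d = shapes[-1]
flattened = h * w * d
print(flattened)  # 434280
```

The final value matches the 434,280-dimensional vector stated in the text, which supports the rounding-up assumption for the pooling layers.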

E. Fully Connected Unit
This is the fifth unit of the proposed model. Here the output of the previous unit is taken as input to the Fully Connected Layer (FCL) of 434,280 neurons, a count that results from converting the matrix of dimension 94 X 231 X 20 into a vector. The output of the fully connected layer is fed to a softmax layer, which outputs 30 scores, one for each of the 30 classes, i.e., the data of 30 different individuals. The softmax function, on whose output the loss is computed, converts the linear inputs into probabilities, i.e., it scales the scores to lie between 0 and 1, which makes them easier to compute with.
The softmax function is given in (3):

$$ \mathrm{softmax}(y_i) = \frac{\exp(y_i)}{\sum_{j} \exp(y_j)} \tag{3} $$

where exp(y_i) is the exponential of the score of the i-th class, and the denominator is the sum of the exponential scores over all classes j. After the loss is calculated, the model backpropagates the errors and updates the weights. The number of epochs was set to 100 to train the CNNSRE, which achieved its highest accuracy of 87.65%. When the number of epochs was set above 100, the model reached the limit of the available computational power, and when it was set below 100, the results degraded. Table I presents the characteristics of the CNNSRE, describing the sclera recognition engine in terms of its number of layers, type of layers, number of filters, dimension of feature maps, kernel size, stride and padding.
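The softmax computation in (3) can be sketched as follows. This is an illustrative version, not the Keras implementation used by the authors; subtracting the maximum score before exponentiating is a common numerical-stability step added here as an assumption, and it does not change the result. The input scores are hypothetical.

```python
import math


def softmax(scores):
    """Convert a list of class scores into probabilities that sum to 1."""
    shift = max(scores)  # stability shift; cancels out in the ratio
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
print(probs)

# With 30 equal scores, every class receives the same probability.
uniform = softmax([0.0] * 30)
print(len(uniform), uniform[0])
```

The largest score always receives the largest probability, so the predicted class is the index of the maximum output.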

IV. Dataset and Result Analysis
To validate the strength of the proposed method, experiments are carried out on the Sclera Segmentation and Recognition Benchmarking Competition (SSRBC 2015) dataset [28]. This dataset contains eye images of 30 individuals, including difficult cases such as blurred, blinking and closed eyes. For each individual eye, images are captured from multiple angles, in views such as center, left, right and up. In total, the dataset consists of 734 segmented eye images from all 30 classes. The computational complexity is cubic for training and linear for testing. The complexity of the training time is given in (4) and that of the testing time in (5), where n is the number of test images. Table II presents the precision, recall, False Rejection Rate (FRR), Genuine Acceptance Rate (GAR) and False Acceptance Rate (FAR) for the individual classes of the dataset. Because of computational constraints, the model was run for 100 epochs, achieving a maximum accuracy of 87.65%, an average of 83.76% and a minimum of 81.17%. The variation in accuracy is due to the random initialization of the weights in the model. The model was also evaluated in terms of precision, recall, FRR, GAR and FAR. Notably, the precision score for sclera recognition is 0.86. On average, GAR is 0.85, FRR is 0.13 and FAR is 0.015. This shows that deep learning models can perform better, with a lower error rate, for sclera recognition. Table III compares the results of the proposed model with those of the existing models presented by different authors in the SSRBC 2016 competition. The table presents only two works because very little work has been done on recognition. Of the two, the participating team Sl. No. 1 in Table III proposed a KNN-based sclera recognition system with two simple steps, feature extraction and matching.
For feature extraction, they used the Histogram of Oriented Gradient (HOG) descriptor and matched a given test tuple against the training tuples that are similar to it. The other team, Sl. No. 2, used a multiclass feature-based classifier for recognition. They employed a 2D Gabor filter, with the feature vector downsampled to size 720, and KNN for classification, which resulted in a gain of 1.56% over the first team. The proposed CNN model exploits self-learning by training its network architecture, i.e., at every layer the images are convolved and the output is fed to the next layer. In doing so, the model learns to correct its errors by updating the weights through backpropagation. This repetition helps the model train effectively, correcting its errors and yielding better results. This kind of approach is possible through deep learning architectures.
In a traditional approach such as KNN, testing is more computationally expensive than training. With a CNN in biometrics, however, testing should be faster than training, which eases its use in applications.
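The per-class measures listed in Table II can be computed from one-versus-rest confusion counts. The sketch below uses the commonly used definitions (FRR as the fraction of genuine samples rejected, FAR as the fraction of impostor samples accepted, GAR as 1 - FRR); these definitions are an assumption, since the paper does not spell out its formulas, and the counts are hypothetical rather than taken from the experiments.

```python
def class_metrics(tp, fp, fn, tn):
    """Per-class measures from one-vs-rest counts (denominators assumed non-zero)."""
    frr = fn / (tp + fn)  # genuine samples wrongly rejected
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "FRR": frr,
        "GAR": 1.0 - frr,      # genuine samples correctly accepted
        "FAR": fp / (fp + tn),  # impostor samples wrongly accepted
    }


# Hypothetical counts for a single class, for illustration only.
metrics = class_metrics(tp=22, fp=3, fn=3, tn=706)
for name, value in metrics.items():
    print(name, round(value, 4))
```

Averaging each measure over the 30 classes would give dataset-level figures of the kind quoted in the text.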

V. Conclusion
This work is the first of its kind to apply a convolutional neural network to sclera recognition, and it showcases the potential of neural learning for biometric applications. A good amount of work has been done on segmentation, with competition results, whereas contributions to sclera recognition systems are limited. Hence, in this work we presented a CNN model with four convolution layers; with limited computational resources, the results show a maximum accuracy of 87.65%. The CNN model could be trained further by providing more learning data with more classes, and the results could be improved by adding more layers to the model and randomizing the weights, given more computational power. The success of sclera recognition with deep learning concepts in this work should open a plethora of opportunities for researchers to work on recognition models in the near future. Further, the results can be made more accurate by exploring, training and developing complex neural models for both segmentation and recognition. In future, we also intend to combine CNNs with neuromorphic models. This hybridization is expected to capture a more realistic sense of the data, which in turn should improve the results.