A Low Cost and Computationally Efficient Approach for Occlusion Handling in Video Surveillance Systems

In the development of intelligent video surveillance systems for vehicle tracking, occlusion is one of the major challenges. It is difficult to retain features during occlusion, especially complete occlusion. In this paper, a target vehicle tracking algorithm for Smart Video Surveillance (SVS) is proposed that tracks an unidentified target vehicle even through occlusions. The paper proposes a computationally efficient approach for handling occlusions, named the Kalman Filter Assisted Occlusion Handling (KFAOH) technique. The algorithm is hybrid in nature and works through two periods: a tracking period, when no occlusion is seen, and a detection period, when occlusion occurs. A Kanade-Lucas-Tomasi (KLT) feature tracker governs the algorithm during the tracking period, whereas a Cascaded Object Detector (COD) of weak classifiers, specially trained on a large database of cars, governs it during the detection period (occlusion) with the assistance of a Kalman Filter (KF). The algorithm’s tracking efficiency has been tested in real time on six tracking scenarios of increasing complexity. Performance evaluation under different noise variances and illumination levels shows that the tracking algorithm is robust against high noise and low illumination. All tests were conducted on the MATLAB platform. The validity and practicality of the algorithm are also verified by success plots and precision plots for the test cases.


I. Introduction
In recent times, the development of intelligent security systems has become essential for making residential and office premises safer. With the increasing threat from activities that lead to breaches of security, it has become impossible for conventional security systems to detect such activities and raise an alert in advance.
Thus, an increasing reliance on surveillance systems has created a need for better target detection and tracking techniques. Methods such as Radio Frequency Identification (RFID) tracking are not useful in preventing the above-mentioned situations; hence there is a need for wide area surveillance. Target tracking via image processing offers an attractive solution for a video surveillance system: it can efficiently track a specific target, record its position throughout the video stream and analyze its motion pattern.
Video tracking is the process of locating a moving object (or multiple objects) over time using a camera. It has a variety of uses, including human-computer interaction [1], security and surveillance [2] [3], video communication and compression [4], augmented reality [5], traffic control [6] [7], medical imaging [8] [9], video editing [10] [11], multimedia contexts [12] [13], complex object movements [14], video streaming [15], healthcare systems and smart indoor security systems. Video tracking is a time-consuming process because of the amount of data that is captured and must be processed.
Further, the algorithm complexity increases if object recognition is also involved in tracking. The objective of video tracking is to maintain detectability of the target object in consecutive video frames. To perform video tracking, an algorithm analyses sequential video frames and outputs the movement of targets between the frames. There are a variety of algorithms, each with its own strengths and weaknesses, so it is important to choose the algorithm best suited to the intended use. Traditional tracking algorithms extract the moving target as foreground from a static background and then track the coherent blobs of the target. Although these algorithms are computationally efficient, they track all the vehicles that exhibit motion in the stream. Similarly, other tracking algorithms, such as the optical flow techniques discussed in [16] and the wavelet based vehicle tracking illustrated in [17], also track all the vehicles that exhibit motion or that are similar in appearance.
Lighting conditions vary throughout the day (depending on the weather outdoors and on the lights indoors). Low light results in poor discrimination of objects from their background, and lighting can also cause shadows or a white-out effect. Most background subtraction techniques are sensitive to illumination change, and it is difficult for them to handle the shade and shadow it causes. Most algorithms that are able to handle these situations need time on the order of several frames to estimate and train the background model [18]. Less work has been done on tracking a specific vehicle throughout the video stream in the presence of other moving vehicles of similar appearance, which is essentially what tracking an unidentified target requires.
The algorithm proposed in this paper has been developed with the problem of tracking an unidentified vehicle entering a secured premise, such as a college campus, in mind. A basic complexity in such a situation is the possible presence of other vehicles. The requirement thus becomes to track only the vehicle classified as unidentified and not the other vehicles, irrespective of whether they are in motion. It is also required to track the target vehicle through any possible occlusion and to detect it after the occlusion. The above-mentioned problem can be divided into four image processing procedures: classification of the vehicle as unidentified or not, target vehicle detection, target vehicle tracking, and occlusion handling (if any). A vehicle can be classified as unidentified by a license plate reader as discussed in [19], which is an established and widely used technology. The proposed algorithm offers an approach to detect, track and handle occlusion irrespective of the presence of other vehicles in the video stream. The problem at hand has been solved by combining a Kanade-Lucas-Tomasi (KLT) feature tracker with a Cascaded Object Detector (COD) assisted by a Kalman Filter (KF), with control of tracking switching between them according to the situation encountered. The KLT tracker controls the tracking when there is no occlusion, whereas the COD, assisted by the KF, takes over when the target vehicle is under occlusion. This approach thus facilitates the realization of a smart vehicle tracking system that also handles occlusion. The algorithm has been tested successfully for tracking a single vehicle in 6 different cases of increasing complexity to test its detection, tracking and occlusion handling capability. Noise and illumination variations have also been considered for all cases to test the overall robustness of the algorithm.
The remainder of this paper is organized as follows. Section II describes related work in the area of object tracking and classifies the different approaches to vehicle tracking. Section III introduces our proposed system. Experimental results are discussed in Section IV. Finally, Section V offers our conclusions.

II. Related Work
In [20], Borisova et al. proposed a target tracking method based on the object's shape for targets that have low contrast against the background. Maresca et al. in [21] proposed the Matrioska tracking framework using Oriented FAST and Rotated BRIEF (ORB) features, thus reducing the computational cost compared with Speeded Up Robust Features (SURF). Vehicle tracking using a Fractional Feedback Kalman Filter was proposed by Kaur et al. in [22] to improve the Kalman gain over the traditional Kalman Filter. Feature tracking has been discussed by Ali et al. in [23] to achieve multi-object tracking of humans. In [24], Baheti et al. discussed automatic object tracking by a combination of Scale Invariant Feature Transform (SIFT) and Random Sample Consensus (RANSAC), which makes tracking invariant to translation and geometric transformations. Xia et al. in [25] reliably track a target object in the far field using SIFT features and RANSAC.
Real-time tracking with high accuracy in a surveillance system is a challenging task. Most methods, such as SANet [22], HCF [26], MDNet [27], SRDCF [28], MEEM [29] and SINT [30], give high distance precision, but because of their computational load they cannot be used in real-time tracking scenarios. In [31], a tracking method based on object matching in every frame is proposed. It gives highly accurate results, but its tracking speed is limited by the computational load of deep feature extraction in each frame. In contrast, the Kernelized Correlation Filter (KCF) [32] tracks fast but is unable to give very accurate results in a short period of time [33]. Similarly, the tracker in [34] requires many frames (normally 6) for initialization and tracks at a low speed of 2 fps [35].
Fan and Ling [33] presented a parallel tracking and verifying framework, which consists of two major components, a tracker and a verifier. The tracker runs in real time, while the verifier runs at a specific frame interval instead of on every frame, checks the tracking results and corrects the tracker if needed. The tracker adjusts its results according to the feedback provided by the verifier. However, this method is highly dependent on the frame interval chosen for verification. If the interval is small, the computational load increases; if it is large, the chance of handling an occlusion is reduced. Our proposed algorithm, in contrast, takes the necessary steps at the instant the object goes into occlusion. Kalal et al. [35] divided the tracking problem into tracking, learning and detection. The tracker provides training data to the learning component, which estimates the detection error and is used to update the detector, and the detector re-initializes the tracker when the tracker fails.
A correlation filter requires information about the object as well as about the background, i.e. negative training data. In [36], to start the tracking process and gather more information, a sample image patch is first cropped from the first frame using the initial object position and size. There, the patch size is set to 2.5 times the object size to provide context information about the foreground and background.
Li et al. [37] proposed a correlation filter tracker which also considers background hard-negative patches. A fully adaptive clustering method, Affinity Propagation (AP), is applied as the background patch selection strategy. In the AP method, clustering is executed with the help of a real-valued similarity matrix, and the iteration process yields the number of clusters and the cluster centers. The tracker uses one layer of a Convolutional Neural Network (CNN). Although the accuracy is high in terms of success and precision rate, the computational load of the training process is very high.
A tracking system has been discussed in [38] in which a mixture of particle filter and background modeling is used to track a group of people. Background modeling is helpful when the application requires multiple targets to be tracked. Occlusion handling has been discussed in [39] using Gaussian background estimation, but that method does not track a single vehicle under occlusion. Single vehicle tracking under occlusion was also attempted in [40], where single-target tracking was achieved using color segmentation of a particular vehicle and a locking mechanism. Color segmentation, however, carries a high possibility of false matches, as there can be multiple targets of the same color.
Object tracking and data association comprise a set of computer vision techniques that generate the path and trajectory of an object in the image plane by locating its position in every frame of the video sequence. Considerable research has been done in both transportation and non-transportation applications. The different vehicle tracking approaches can be classified as follows.

A. Model Based Tracking
Three-dimensional model-based vehicle tracking algorithms have been discussed extensively in [41] and [42]. The techniques rely on recovering trajectories and models with high accuracy for a small number of vehicles. A major disadvantage of these approaches is their reliance on mathematically detailed geometric object models. It is not feasible to expect detailed models for all vehicles that could be found on the roadway, as their geometry is so diverse. A more practical approach is template matching, discussed later, which relies on correlation and relative scores.

B. Region Based Tracking
In this approach, the tracking system identifies a coherently moving connected region in the image, a 'blob', associated with each vehicle and then tracks it over time [43]. Typically, the process is facilitated by the popular background subtraction technique. Foreground vehicles are detected by subtracting the input image frame from the current background estimate, looking for pixels where the difference image is above some threshold and then finding connected components. This technique is widely used to track targets because of its low complexity.
However, this technique is more suited for the applications where all the moving targets are to be tracked, for example in studying flow of traffic. The problem discussed in this paper requires a single vehicle to be tracked even if other vehicles exhibit motion in the frame.

C. Active Contour Based Tracking
This tracking technique is based on active contour models, also known as snakes, and is similar to region based tracking. The underlying principle is to retrieve a bounding contour of the object and update it dynamically [44] [45]. The object is tracked according to the contour position in the previous and current frames, which requires some overlap of the object region between the two frames. This technique, however, generates significant measurement errors when the target undergoes an occlusion. Also, the problem of tracking the targeted vehicle in the presence of other vehicles remains unresolved, since the active contour method extracts fine details about the boundaries of objects together with all the background disturbances.

D. Kernel Based Tracking
Kernel by definition means "the central part of something". In object tracking, the kernel is the central component of the target being tracked. A kernel based tracking algorithm is discussed in [46], in which an isotropic kernel is spatially masked onto the target to generate a similarity function, according to which the object is localized in an image. Recognition and classification based methods such as template matching, cascade object detection and support vector machines are among the kernel based tracking methods. All of these methods provide the center point of the rigid body to be tracked.
These methods are an attractive option for tracking a single vehicle because they return a single best match; however, this best match is heavily based on the appearance of the target. Since vehicles such as cars have similar appearance, the possibility of a false match increases. Also, searching every frame over all the pixels increases the computational cost unless the search for the target can somehow be made local.

E. Feature Based Tracking
A different approach is feature based tracking, which involves finding distinguishable lines, points and corners, considered as features of the target object, in the current frame and then matching these features in the next frame. This matching allows the computation of a motion model of the vehicle, which can be used for tracking it [47] [48]. This method has several advantages. It stabilizes the computational cost of the system. In the presence of partial occlusion, some features of the moving vehicle remain visible. Feature tracking is also possible in low illumination.
The review of related work shows that it is difficult to achieve high precision in real-time tracking applications, and that post-occlusion detection accuracy is affected even further in such situations. From the foregoing discussion we also conclude that several algorithms must operate together to track a single targeted vehicle. Among the tracking approaches compared, feature based tracking is a promising choice because it stabilizes the computational cost and requires neither foreground extraction nor detailed geometric models.

III. Proposed System
The paper presents an approach to track a vehicle that has already been classified as unidentified and to handle its occlusion. We propose using an IP camera on a local dedicated network of wireless routers and mobile hotspots to capture the scene, unlike in [42], where frames are captured via an IP camera over an internet connection, which makes the system dependent on network speed and connectivity.
We propose to detect the target using a spatially local template matching technique, discussed later, which returns only one detected target. Its features are then tracked without any locking mechanism, thus decreasing the computational cost of the system. Another tracking technique, using the KLT feature tracker over SURF features, has been discussed in [49]. Although SURF features are reliable, single-target tracking requires as many features as possible to be extracted. As discussed in [50], the minimum eigenvalue criterion for interest points used in our algorithm extracts a much higher number of features than the SURF algorithm.
The flow chart in Fig. 1 elaborates the operation of the algorithm comprehensively. In the last condition of the flow chart, STOP signifies that the bounding box around the targeted vehicle has gone out of the frame. The system is then reset and all the previous values are deleted.
The unidentified vehicle tracking problem is divided into the different situations that must be handled.

A. Detection In First Frame
Once a vehicle is classified as unidentified, a suitably positioned camera takes over its tracking. For the vehicle to be tracked, it is essential to identify its location in the first frame correctly. The detection is done using the template matching method, by dividing the frame into a number of square windows and obtaining a score for each window, as suggested in [51]. However, searching the whole frame for the template can result in false detection because of the possible presence of other vehicles. This issue can be resolved by searching for the template locally instead of globally. Thus the template search is applied only in the region of pixels that corresponds to the geographical location of the entrance in the frame, as it can be safely assumed that the target car in the first frame will be in the vicinity of the entrance. The template matching algorithm used in our study finds the best match based on the sum of absolute differences (SAD), given by (1).
$$SAD(x, y) = \sum_{i}\sum_{j} \left| I(x+i,\, y+j) - T(i, j) \right| \qquad (1)$$
where $I$ = input frame, $T$ = template, $(x, y)$ = coordinates of the pixel, and the sums run over the template pixels $(i, j)$.
The tracking bounding box is constructed around the kernel for the window where the minimum, $SAD(x, y)_{\min}$, is recorded.
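The local SAD search described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the paper: the function name `sad_match`, the `roi` argument (a hypothetical stand-in for the entrance region) and the synthetic frame are all assumptions made for the example.

```python
import numpy as np

def sad_match(frame, template, roi):
    """Return (x, y, score) of the window with the minimum SAD inside roi.

    roi = (x0, y0, x1, y1) bounds the top-left corners that are searched;
    it is a hypothetical stand-in for the entrance region of the frame.
    """
    th, tw = template.shape
    x0, y0, x1, y1 = roi
    frame = frame.astype(np.float64)
    template = template.astype(np.float64)
    best = (None, None, np.inf)
    # Only windows whose top-left corner lies in the ROI are scored.
    for y in range(y0, min(y1, frame.shape[0] - th) + 1):
        for x in range(x0, min(x1, frame.shape[1] - tw) + 1):
            score = np.abs(frame[y:y + th, x:x + tw] - template).sum()
            if score < best[2]:
                best = (x, y, score)
    return best

# Synthetic example: plant the template at x=35, y=20 in a random frame.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(60, 80)).astype(np.uint8)
template = frame[20:30, 35:47].copy()
x, y, score = sad_match(frame, template, roi=(25, 10, 50, 35))
```

Restricting the loops to the ROI is what keeps both the cost and the chance of matching another vehicle low; a whole-frame search would use the same scoring with the ROI set to the full image.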

B. Tracking in Subsequent Frames
Once the position in the first frame is detected successfully, a bounding box is constructed on the detected location to indicate the car in the output frame. Feature tracking is recognized as a very reliable method for tracking targets in computer vision. It relies on finding distinguishable points, corners and lines of the target, called features. These features are then matched in every subsequent frame and a geometric transformation motion model is estimated, which facilitates tracking of the target.
The proposed algorithm also detects features inside the bounding box, i.e. on the car, by finding distinguished points and corners. Because the bounding box is constructed with the help of the template matching approach, it works better than methods such as TLD [35], LOT [52], DFT [53] and CXT [54], which rely on an exact boundary around the target in the first frame or on fixed-size patches. The performance of these methods degrades as the scaling of the patch at initialization increases, and they are more sensitive to background clutter [55]. Features can be detected by various methods, as in [56] [57] [58] [59]. Good features are located by examining the minimum eigenvalue of the 2 x 2 gradient matrix of every pixel in the region of interest, as proposed in [60]. This set of feature points is then tracked in each subsequent frame by the pyramidal implementation of the KLT feature tracking algorithm [61], which minimizes the residual function $\epsilon$ throughout the down-sampled image pyramids, defined in (2) as follows
$$\epsilon(\mathbf{d}) = \sum_{\mathbf{x} \in W(\mathbf{u})} \left[ I(\mathbf{x}) - J(\mathbf{x} + \mathbf{d}) \right]^2 \qquad (2)$$
where $\mathbf{u}$ is a point on image $I$ whose location $\mathbf{v} = \mathbf{u} + \mathbf{d}$ in the subsequent frame $J$ is to be found, and $W(\mathbf{u})$ is the integration window centered on $\mathbf{u}$.
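The minimum eigenvalue criterion for selecting good features can be illustrated with a short sketch. It assumes central-difference gradients and a small square summation window; the function name `min_eigen_response` and the `half_win` parameter are hypothetical, and the synthetic image is only there to show that a corner scores higher than a straight edge.

```python
import numpy as np

def min_eigen_response(img, half_win=1):
    """Smaller eigenvalue of the 2 x 2 gradient matrix at every pixel."""
    img = img.astype(np.float64)
    gy, gx = np.gradient(img)          # derivatives along rows and columns
    ixx, iyy, ixy = gx * gx, gy * gy, gx * gy

    def box_sum(a):
        # Sum each entry over a (2*half_win+1)^2 neighbourhood.
        p = np.pad(a, half_win, mode="constant")
        out = np.zeros_like(a)
        k = 2 * half_win + 1
        for dy in range(k):
            for dx in range(k):
                out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
        return out

    sxx, syy, sxy = box_sum(ixx), box_sum(iyy), box_sum(ixy)
    # Closed-form smaller eigenvalue of [[sxx, sxy], [sxy, syy]].
    half_trace = 0.5 * (sxx + syy)
    root = np.sqrt(0.25 * (sxx - syy) ** 2 + sxy ** 2)
    return half_trace - root

# Synthetic image: a bright quadrant whose corner sits at row 8, column 8.
img = np.zeros((20, 20))
img[8:, 8:] = 1.0
R = min_eigen_response(img)
corner = np.unravel_index(np.argmax(R), R.shape)
```

Along a straight edge one eigenvalue is zero, so the response vanishes there; only points with gradient energy in two directions, such as corners, are retained as features.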
The KLT tracker is often used for short-term tracking as part of a larger tracking framework. The number of image pyramid levels formed by down-sampling the previous levels is three by default. The tracker can also mark each feature as either valid or invalid in each frame. Once the feature tracker maps the features from one frame to the next, the geometric transformation is computed using the similarity projection given by (3) [62].
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = s \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix} \qquad (3)$$
Here, $[x \; y]$ are the coordinates of the feature tracked in the first frame, $[x' \; y']$ are the coordinates of the same feature in the next frame, $t_x$ and $t_y$ are the translations along the x and y axes respectively, $\theta$ is the rotation angle and $s$ is the scaling factor. For a similarity transformation, at least 2 pairs of matched features are required to compute all 4 degrees of freedom: translation ($t_x$, $t_y$), scaling ($s$) and rotation ($\theta$). An affine transformation adds two more degrees of freedom, namely aspect ratio and shear. The tracking marker (bounding box) is transformed forward on the basis of the transformation model thus computed. Only the valid feature points are considered when estimating the geometric transform of the target.
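As an illustration of how the four similarity parameters can be recovered from matched feature points, the sketch below solves the model in (3) by linear least squares after substituting $a = s\cos\theta$ and $b = s\sin\theta$. This is one common way to fit such a model and is not claimed to be the estimator used in the paper; the function names and the sample points are assumptions.

```python
import numpy as np

def estimate_similarity(src, dst):
    """Least-squares fit of scale, rotation and translation (needs >= 2 pairs)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    A = np.zeros((2 * n, 4))
    # x' = a*x - b*y + tx   and   y' = b*x + a*y + ty
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([src[:, 1], src[:, 0], np.zeros(n), np.ones(n)])
    rhs = dst.reshape(-1)              # interleaved [x1', y1', x2', y2', ...]
    a, b, tx, ty = np.linalg.lstsq(A, rhs, rcond=None)[0]
    return np.hypot(a, b), np.arctan2(b, a), tx, ty

def apply_similarity(points, s, theta, tx, ty):
    """Apply equation (3) to an array of (x, y) points."""
    points = np.asarray(points, dtype=float)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return s * points @ rot.T + np.array([tx, ty])

# Synthetic check: transform four points and recover the parameters.
src = np.array([[10.0, 20.0], [60.0, 25.0], [55.0, 70.0], [12.0, 64.0]])
dst = apply_similarity(src, s=1.2, theta=0.1, tx=5.0, ty=-3.0)
s_est, theta_est, tx_est, ty_est = estimate_similarity(src, dst)
```

The same `apply_similarity` call can be used on the four corners of the bounding box to move and rescale it from one frame to the next.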
In the first frame, the size of the bounding box is directly correlated with the size of the template, which in turn depends on the geographical position where the camera is installed; that is, the template size for the first frame is decided case by case during installation of the system. In subsequent frames, the size of the bounding box is adjusted by finding the geometric transform between the feature points using the similarity projection (which encompasses scaling). Hence, even if the apparent size of the vehicle changes during tracking, the bounding box is scaled accordingly. As the tracking progresses over time, points can be lost because of occlusions. If the number of valid points falls below a threshold, the points need to be reacquired to track the object further. This is the condition of occlusion, and it needs a different technique to be handled.

C. Occlusion Handling
In computer vision, occlusion is the condition in which the target being tracked is hidden by another object in the frame. The problem becomes more complicated when the target disappears, reappears after a brief occlusion and has to be tracked again. Partial occlusion (target size > occluding object's size) and complete occlusion (target size < occluding object's size) are shown in Fig. 2. In a feature tracking based algorithm like ours, occlusion is characterized by an extensive loss of features as the target hides. To solve the problem of occlusion, it is important to redetect the target and specify new features when it reappears. Redetection in our algorithm takes the form of a degenerate decision tree of classifiers, shown in Fig. 3, cascaded together and applied to every sub-window of the frame to classify it as a positive or a negative result, as proposed in [63]. The classifiers are trained with the GRAZ-02 dataset of cars developed in [64]. However, just as with first frame detection, the probability of a false match caused by other cars is high when the detector is applied to the frame globally. Again, to eliminate the possibility of a false match, it is important to search for the target after occlusion locally, only at its expected location in the next frame, instead of globally in the whole frame.
However, to predict this expected location in the next frame, it is necessary to compute a motion model of the target from the history of its motion. The position of the car is a time dependent state vector. The Kalman filter [65] is an efficient recursive computational solution for tracking a time dependent state vector, such as the position vector of a car, with equations of motion using the least-squares method. The Kalman filter computes a state transition model to predict the dynamic position of the tracked target. For each frame, the state $x_k$ is estimated from the previous state $x_{k-1}$. The model is then corrected using the measurement $z_k$. This process is performed recursively to enhance the model's accuracy.
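For reference, the prediction and correction steps of a linear Kalman filter are commonly written in the textbook form below. The symbols $A$ (state transition matrix), $H$ (measurement matrix), $Q$ and $R$ (process and measurement noise covariances), $P$ (error covariance) and $K$ (Kalman gain) are standard notation assumed here for illustration and are not necessarily the notation or the exact model used in this paper.

```latex
% Prediction: propagate the previous estimate and its covariance
\hat{x}_k^{-} = A\,\hat{x}_{k-1}, \qquad
P_k^{-} = A\,P_{k-1}\,A^{T} + Q

% Correction: blend the prediction with the measurement z_k
K_k = P_k^{-} H^{T} \left( H P_k^{-} H^{T} + R \right)^{-1}
\hat{x}_k = \hat{x}_k^{-} + K_k \left( z_k - H\,\hat{x}_k^{-} \right)
P_k = \left( I - K_k H \right) P_k^{-}
```

When no measurement is available, only the prediction step is applied, so the estimate continues along the motion learned before the measurement was lost.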
Occlusion is characterized by the absence of the measurement $z_k$. In this condition, the prediction $x_k$ is computed for the next frame and the classifier searches for a positive result only in an enlarged region of interest in the local neighborhood of $x_k$. This makes our algorithm capable of detecting and tracking an object that moves slower or faster than the predicted speed, and it can also handle drift or movement of the upper and lower coordinate points, as shown in Fig. 4. The region of interest (ROI) is larger than the maximum size of any vehicle that may be viewed by the camera and is just large enough to recognize the vehicle coming out of the occlusion. The ROI constructed in our proposed algorithm is 2.5 times the size of the bounding box. If a positive result is retrieved, new features are detected in the returned bounding box and tracking continues. Since the positive result is obtained for the vehicle, most of the features belong to the vehicle. In some critical cases, features of the occluding object may be included inside the bounding box at the end of the occlusion. These features are eliminated as the KLT feature set is updated, so that eventually only the target's features are tracked and the features of static or occluding objects are lost. If a positive result is not retrieved, the prediction for the next frame is computed and the classifier searches again in the new region of interest. The process repeats until the vehicle is detected again after the occlusion. If the COD is not applied during occlusion, the algorithm can accurately track only those vehicles that reappear exactly at the predicted point $x_k$. Such a case cannot always be expected, and the extended ROI concept would then not be applicable either. One alternative is to buffer cropped images around the bounding box at regular intervals and, at the instant occlusion occurs, match that patch around the predicted point or over the whole frame. However, this would also increase the computational load.
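The interplay between prediction during occlusion and the enlarged search region can be sketched as below. The constant-velocity state model, the noise settings, the interpretation of "2.5 times" as scaling each side of the bounding box, and all names (`ConstantVelocityKF`, `search_roi`) are assumptions made for this illustration; the cascade detector itself is not modeled.

```python
import numpy as np

class ConstantVelocityKF:
    """Kalman filter over the state [x, y, vx, vy] (assumed motion model)."""

    def __init__(self, x, y, dt=1.0, q=1e-2, r=1.0):
        self.state = np.array([x, y, 0.0, 0.0], dtype=float)
        self.P = np.eye(4) * 10.0
        self.A = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * q
        self.R = np.eye(2) * r

    def predict(self):
        self.state = self.A @ self.state
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.state[:2].copy()

    def correct(self, z):
        innovation = np.asarray(z, dtype=float) - self.H @ self.state
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.state = self.state + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P

def search_roi(center, box_wh, frame_wh, scale=2.5):
    """ROI of scale x the box size around the predicted center, clipped to the frame."""
    cx, cy = center
    w, h = box_wh[0] * scale, box_wh[1] * scale
    x0, y0 = max(0.0, cx - w / 2), max(0.0, cy - h / 2)
    x1 = min(float(frame_wh[0]), cx + w / 2)
    y1 = min(float(frame_wh[1]), cy + h / 2)
    return x0, y0, x1, y1

# 20 visible frames (target moves 5 px/frame in x), then 5 occluded frames.
kf = ConstantVelocityKF(100.0, 200.0)
for k in range(1, 21):
    kf.predict()
    kf.correct((100.0 + 5.0 * k, 200.0))    # measurement available
for k in range(21, 26):
    predicted = kf.predict()                # occlusion: prediction only
true_pos = (100.0 + 5.0 * 25, 200.0)
roi = search_roi(predicted, box_wh=(150, 80), frame_wh=(1280, 720))
```

In a full system the detector would be run only inside `roi` on each occluded frame, and a positive detection would be fed back through `correct` before feature tracking resumes.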
The vehicle can also be tracked as usual as long as its features are available, even when it exhibits non-linear motion or when linear motion of the vehicle in the ground plane is projected as non-linear in the image plane because of the distance between the vehicle and the image plane.
Fig. 5 shows the working of the KLT tracker, KF and COD in two cases, i.e. with and without occlusion.

IV. Results
The proposed algorithm has been tested on six different cases of increasing complexity. The video was captured with an Android mobile phone camera (13 megapixels) running an IP camera application, connected via mobile hotspot to a combined server and processing system with a 1.5 GHz AMD processor, 4 GB RAM and a 64-bit operating system. Image processing was done in MATLAB 2016. The frame size of the video is 1280×720 or larger.
Parametric studies over different Gaussian noise variances and illumination levels have also been performed for the no occlusion, partial occlusion and complete occlusion cases (in all of which other vehicles are also present). In each case the noise variance and the illumination level were varied, and the error between the coordinates of the tracked vehicle in the case under study and the coordinates obtained under the reference conditions, i.e. zero noise and normal daylight illumination (approximately 111,000 lux), was plotted for each frame. The variance was increased from 0.01 to 0.125, whereas the illumination was varied synthetically with respect to the reference condition by reducing it from 100% to 10%. The closer the recorded tracking error is to zero, the better the tracking for that frame.
To make the cases more complex and check the robustness of the algorithm, we chose cases with few frames (50 on average) before occlusion. Fig. 6 shows images with Gaussian noise variances of 0.01 (default value), 0.075 and 0.125 alongside the original scene.
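The synthetic degradations used in these studies can be reproduced along the following lines. The sketch assumes that frames are normalized to the range [0, 1], that the noise is zero-mean, and that illumination is reduced by a simple multiplicative factor; these are assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np

def add_gaussian_noise(img, variance, rng):
    """Add zero-mean Gaussian noise of the given variance to a [0, 1] image."""
    noisy = img + rng.normal(0.0, np.sqrt(variance), size=img.shape)
    return np.clip(noisy, 0.0, 1.0)

def scale_illumination(img, fraction):
    """Darken a [0, 1] image to the given fraction of its reference level."""
    return np.clip(img * fraction, 0.0, 1.0)

rng = np.random.default_rng(1)
gray = np.full((200, 200), 0.5)                  # synthetic mid-gray frame
noisy = add_gaussian_noise(gray, 0.01, rng)      # lowest variance in the study
dark = scale_illumination(gray, 0.10)            # 10% of reference illumination
```

Sweeping `variance` over 0.01 to 0.125 and `fraction` over 1.0 to 0.1 yields the family of test sequences against which the per-frame tracking error can be measured.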
Although the bounding box size (width) will vary in some cases due to the distance of the camera and the vehicle, in our case it was observed that an average size of 150 pixels is the size of the bounding box during tracking. In all the cases, the deviation (error) from ideal case was not more than 14% and therefore a tolerance level has been set at (±) 20 pixels. Even in the case when the bounding box size was as large as 300 pixels, the tracking error was found to be less than (±) 20 pixels which shows the high accuracy of our algorithm.    In the case of Vehicle tracking without occlusion for which error plots are depicted in Fig. 9, system worked perfectly fine when Gaussian noise variance was less than 0.125.With the noise variance crossing 0.125 tracking error increased beyond 20 pixels. However, tracking of vehicle under different illumination levels was found largely illumination invariant under the no occlusion condition as seen in Fig. 9(b).    In the case of Vehicle tracking with partial occlusion, system worked well between the ±20 pixels limit until the noise variance was below 0.075 as illustrated in Fig. 12(a), at 0.1 variance; the system tracks the vehicle accurately before the occlusion but the tracking error increases beyond the ±20 pixels limit after occlusion which indicates that algorithm is more susceptible to errors after occlusion due to classification by cascade classifiers. At 0.125 variance, the system fails to track after occlusion, which is an expected result. As in the previous case, system again proved to be illumination invariant, shown in Fig.  12(b) with a slight error after occlusion but still inside the ±20 pixels limit even when illumination is as low as 10% or 0.1 times the reference illumination. The scene is very complex as it consists of a tree as an occluding object instead of a simple pole. In this scene features not reached threshold instantly, as features not wiped out on a vertical line. 
Hence, command transfer to the COD is delayed by few frames. In this case, our expanded ROI approach is able to handle these type of cases.  In the case of Vehicle tracking with complete occlusion, our algorithm is capable to handle the occlusion even after short duration of learning and the system is robust to noise until variance value of 0.075. Although the tracking error increases after occlusion, it is in between the ±20 pixels limit for variance value of 0.075. For variance equal to 0.1, the system loses tracking earlier and shows an erratic behavior after occlusion, i.e. obtained position of the target leads the reference position and then after some frames, obtained position starts lagging the reference position of the target, for 0.125, system fails after occlusion due to very high noise as depicted in Fig. 15(a). On varying the Illumination, the system worked fine with a slight tracking error after occlusion until 17% illumination, at 15% illumination level; there was a tracking error of approximately (-)18 pixels from the first frame, which was reduced by Kalman filter during complete occlusion and was reduced to (+)7 pixels after occlusion, as shown in Fig 15(b). At 10% illumination or 0.1 times reference illumination level, tracking error was slightly higher than (-)20 pixels and during occlusion; Kalman filter reduced the tracking error to approximately (-)6 pixels. The tracking error recorded can be attributed to excessively low illumination level. The tracking is judged basically on two parameters-Success and Precision rate. Precision plot represents the percentage of frames which are within the estimated ground truth threshold varying from 0 to 50. Precision score is calculated where difference of tracked object location and ground truth location is within the distance of 20 pixels. Success plot represents the percentage of frames where overlap score is greater than the threshold. 
The overlap score S is calculated as
$$S = \frac{|r_t \cap r_a|}{|r_t \cup r_a|} \tag{6}$$
where $r_t$ and $r_a$ are the tracked and ground truth bounding boxes respectively, $\cap$ and $\cup$ denote the intersection and union of two regions, and $|\cdot|$ represents the number of pixels in a region [55] [66]. The success score is calculated at an overlap threshold of 0.5. Precision plots and success plots are shown in Fig. 16 and Fig. 17 respectively.
From an overall analysis, it can be deduced that the proposed algorithm works best when the Gaussian noise variance is lower than or equal to 0.075 and the illumination is greater than 10%, or 0.1 times the reference illumination level.
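As a rough illustration of the two scores described above, the following Python sketch computes the overlap score, the precision score at a 20-pixel threshold and the success score at a 0.5 overlap threshold. It is not the MATLAB code used for the experiments; the (x, y, width, height) box format, the function names and the sample boxes are assumptions made only for this example.

```python
from typing import Sequence, Tuple

# Assumed box format: (x, y, width, height) with (x, y) the top-left corner.
Box = Tuple[float, float, float, float]


def overlap_score(tracked: Box, truth: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    tx, ty, tw, th = tracked
    gx, gy, gw, gh = truth
    inter_w = max(0.0, min(tx + tw, gx + gw) - max(tx, gx))
    inter_h = max(0.0, min(ty + th, gy + gh) - max(ty, gy))
    inter = inter_w * inter_h
    union = tw * th + gw * gh - inter
    return inter / union if union > 0 else 0.0


def center_error(tracked: Box, truth: Box) -> float:
    """Euclidean distance in pixels between the centers of two boxes."""
    tcx, tcy = tracked[0] + tracked[2] / 2, tracked[1] + tracked[3] / 2
    gcx, gcy = truth[0] + truth[2] / 2, truth[1] + truth[3] / 2
    return ((tcx - gcx) ** 2 + (tcy - gcy) ** 2) ** 0.5


def precision_score(tracked: Sequence[Box], truth: Sequence[Box],
                    threshold: float = 20.0) -> float:
    """Fraction of frames whose center error is within `threshold` pixels."""
    if not truth:
        return 0.0
    hits = sum(center_error(t, g) <= threshold for t, g in zip(tracked, truth))
    return hits / len(truth)


def success_score(tracked: Sequence[Box], truth: Sequence[Box],
                  threshold: float = 0.5) -> float:
    """Fraction of frames whose overlap score exceeds `threshold`."""
    if not truth:
        return 0.0
    hits = sum(overlap_score(t, g) > threshold for t, g in zip(tracked, truth))
    return hits / len(truth)


# Hypothetical four-frame sequence with 150-pixel-wide boxes.
truth_boxes = [(100, 100, 150, 80), (110, 100, 150, 80),
               (120, 100, 150, 80), (130, 100, 150, 80)]
tracked_boxes = [(102, 101, 150, 80), (115, 100, 150, 80),
                 (150, 100, 150, 80), (230, 100, 150, 80)]

print(precision_score(tracked_boxes, truth_boxes))  # 0.5
print(success_score(tracked_boxes, truth_boxes))    # 0.75
```

Sweeping `threshold` from 0 to 50 pixels in `precision_score`, or from 0 to 1 in `success_score`, would give the points of a precision plot or a success plot respectively.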

V. Conclusion
To commercially develop a smart surveillance system that can track an unidentified vehicle and also detect when it passes through an occlusion, it is important to design an algorithm capable of tracking and analyzing a single moving vehicle through a video stream even when other vehicles are present in the stream. This paper proposed an algorithm that solves this problem through the coherent operation of different tracking and recognition techniques, namely feature tracking, template matching, cascade object detection and motion estimation using a Kalman filter. The algorithm was tested successfully on 6 cases of differing complexity in real time. Its performance was analyzed under high noise and low illumination, and it was found to perform well even at Gaussian noise with a variance of 0.075 and at an illumination level as low as 10%. It is able to handle occlusion after a short interval of learning. The precision and success scores are 0.9861 and 1.0 respectively, which signifies very good tracking.
Our proposed system is completely wireless and works in real time; even more frames can be handled if the frame size is reduced. It is easy to install and cost effective, as it does not rely on additional hardware such as FPGA boards or a server-based system. The system is robust to motion blur and capable of handling a high frame rate (30 fps or more), depending on the connectivity between the IP camera and the router/hotspot. It can be used with or without the internet: for local surveillance, a Wi-Fi network or mobile hotspot without internet access is sufficient, whereas for surveillance from a distant location anywhere in the world, a network with internet access can be used. The surveillance system is secure, as the network formed through the mobile hotspot or Wi-Fi is password protected (WPA security).

VI. Future Scope
A series of modifications to the above algorithm are needed to make it efficient enough to be deployed in a surveillance system and to solve other problems. These include solving the problem of inter-vehicle occlusion; introducing a PTZ (Pan-Tilt-Zoom) camera for more efficient tracking, so that fewer cameras are required; and communicating with other cameras when the tracked object moves out of the field of view of one camera and into that of another. The developed algorithm can also be utilized to track humans and other objects during occlusion.

Rakesh Chandra Joshi
Rakesh Chandra Joshi was born in Champawat, India in 1994. He received the B.Tech. degree from Invertis University, Bareilly, India in 2015 and the M.Tech. degree from Govind Ballabh Pant University of Agriculture and Technology, Pantnagar, India in 2018. Currently, he is pursuing independent research in the area of image processing. His research interests include Digital Image Processing, Machine Learning and Surveillance Systems.

Adithya Gaurav Singh
Adithya Gaurav Singh has completed the B.Tech. degree from Govind Ballabh Pant University of Agriculture and Technology, Pantnagar, India in 2018. Currently, he is associated with BaseApp Systems & Software Pvt. Ltd. His research interests include Computer Vision, Robotics and Image processing.

Mayank Joshi
Mayank Joshi has completed the B.Tech. degree from Govind Ballabh Pant University of Agriculture and Technology, Pantnagar, India in 2018 and is currently in the process of enrolling for higher studies. His research interests include Robotics, Computer Vision and Embedded Systems.