Modified Three-Step Search Block Matching Motion Estimation and Weighted Finite Automata based Fractal Video Compression

The major challenge of fractal image and video coding is its long encoding time, and reducing that time remains an open research problem. Block matching motion estimation algorithms are used to reduce the computations performed during encoding. The objective of the proposed work is to develop an approach for video coding using a modified three-step search (MTSS) block matching algorithm and weighted finite automata (WFA) coding, with a specific focus on reducing the encoding time. The MTSS block matching algorithm is used to compute motion vectors between two frames, i.e. the displacement of pixels, and WFA is used for coding because it behaves like fractal coding (FC). WFA represents an image (a frame or a motion compensated prediction error) based on the fractal idea that the image is self-similar. The self-similarity is sought from the symmetry of an image, so the encoding algorithm divides an image into multiple levels of quad-tree segmentation and creates an automaton from the sub-images. The proposed MTSS block matching algorithm combines rectangular and hexagonal search patterns and is compared with the existing New Three-Step Search (NTSS), Three-Step Search (TSS), and Efficient Three-Step Search (ETSS) block matching algorithms. Its performance is evaluated in terms of the mean absolute difference (MAD) and the average number of search points required per frame, with the MAD distortion function used as the block distortion measure (BDM). Finally, the developed approaches, namely MTSS and WFA, MTSS and FC, and plain FC (applied to every frame), are compared with each other. The experiments are carried out on standard uncompressed video sequences, namely akiyo, bus, mobile, suzie, traffic, football, soccer, ice, etc. The approaches are compared in terms of encoding time, decoding time, compression ratio, and Peak Signal to Noise Ratio (PSNR). Video compression using MTSS and WFA coding performs better than MTSS and fractal coding, and better than frame-by-frame fractal coding, in terms of reduced encoding time and better video quality.


I. Introduction
Video compression techniques deal with lossy or lossless data compression of image sequences. Gray and color image sequences are the inputs to video compression techniques. From the preprocessing point of view, different color spaces [1] can be used for the image sequences. With recent developments in internet and multimedia technology involving moving video, color image sequence processing plays an important role. Each color plane is represented by 8 bits/pixel in the RGB color space; therefore, an RGB color image is represented by 24 bits/pixel. Intra-frame and inter-frame coding are used to reduce the spatial and temporal redundancy present in the image sequences.
Block matching motion estimation plays an important role in inter-frame coding by reducing the temporal redundancy present in image sequences. It is therefore one of the most popular and efficient techniques for computing motion vectors and has been adopted by various coding standards, such as ITU-T Rec. H.261 and H.263, ISO/IEC MPEG-1, MPEG-2, and MPEG-4, and more recently H.264/AVC. Different block searching approaches are available for motion estimation [2][3][4][5][6][7] between successive frames. A block-matching algorithm [5][6] can be used to improve the quality of video and the performance of the coding process [7].
Block-matching algorithms are the most successful approaches for motion estimation in video compression because they are easy to understand and can be implemented with modest effort. The simplest one, the full search (FS) block matching algorithm, performs an exhaustive search within the search window to find the optimal solution. To reduce the computational cost of FS, many block matching motion estimation algorithms have been proposed, such as three-step search [8], 2D logarithmic search [9], orthogonal search [10], cross search [11], binary search [12], new three-step search [13], four-step search [14], block-based gradient descent search [15], diamond search [16], cross-diamond search [17], efficient three-step search [18], etc. All these search algorithms employ rectangular or diamond search patterns that exploit the center-biased motion vector distribution characteristics [19][20]. A hexagon-based search algorithm, which employs two hexagon-based patterns (large and small) and results in fewer search points, is proposed in [21]. The novel cross-hexagon based search algorithm [6] consists of two cross-shaped patterns and two hexagon-based patterns. The modified partial distortion criterion (MPDC) uses a certain block of pixels, which reduces the computations [22]. The new cross-hexagon search (NHEXS) algorithm [23][24] consists of two cross search patterns and hexagon search patterns, similar to the fast block-matching motion estimation of [6].
Block matching motion estimation is widely used in video applications such as rain pixel recovery for videos, video compression, medical video processing [25][26], object tracking and surveillance [27], etc. Motion estimation determines the displacement of pixel positions from one frame to another, which gives the best motion vector (MV) [28]. In a block matching algorithm, each frame is divided into fixed-size macro blocks, and each macro block in the current frame is compared with the neighboring macro blocks in the previous frame to estimate a motion vector. The estimated MV specifies the displacement of a macro block from one position to another in the previous frame, as shown in Fig. 1. The motion vector for the block Bf(x, y) is computed as (+1,-1), i.e. MV-Bf(x,y) = (+1,-1). There is still scope for improvement in this field through fast block matching motion estimation algorithms.
Weighted Finite Automata (WFA), in which weights are assigned to state transitions, were proposed in [29][30]. WFA provide a powerful tool for representing an image and compressing it with a good compression ratio. The inference algorithm for WFA subdivides an image into a set of non-overlapping range blocks and then separately approximates each range block with a linear combination of the domain blocks. WFA coding is based on the idea of fractal coding that an image has self-similarity in itself [31][32][33][34], i.e. WFA coding is similar to the fractal coding approach.
This paper discusses a modified three-step search algorithm for block matching motion estimation and a WFA coding approach for color video compression. The paper is organized as follows. Section 2 reviews the Three-Step Search algorithm. The new cross hexagonal search algorithm is discussed in Section 3. Section 4 presents the proposed MTSS and explains the video compression process using MTSS and WFA coding. Section 5 presents experimental results of the proposed MTSS approach in comparison with the TSS, ETSS, and NTSS block matching algorithms, and further compares the results of the proposed MTSS and WFA coding with MTSS and fractal coding. Finally, Section 6 presents the conclusion and future scope of the proposed work.

II. Three Step Search Algorithm
Three-step search (TSS), proposed by Koga et al. [8], is one of the earliest attempts to implement a fast motion estimation algorithm. Its computational cost, in terms of the MAD matching criterion, average search points, and computations required, is lower than that of the full search algorithm. The algorithm first defines a fixed macro block size of 16×16, a search parameter of p pixels in all four directions, and a 9×9 search window in the central part of the 16×16 macro block in which the best match is sought. The TSS algorithm is summarized as follows:
Step 1: Place 9 points in the 9×9 search window at an equal distance of step size s=4 and check all 9 points.
Step 2: Halve the step size, i.e. s=2, and check 8 points forming a 5×5 square pattern. The minimum block distortion measure (BDM) point among the nine points of this pattern becomes the center point for step 3.
Step 3: Halve the step size again, i.e. s=1, check 8 points forming a 3×3 square pattern, and terminate the search. The minimum BDM point found in the 3×3 square pattern gives the final motion vector.
Fig. 2 shows two different search paths of TSS for estimating a motion vector within the search window. TSS can easily be extended to an n-step search for a larger search window. The number of checking points required for TSS is 25.
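The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the function names `mad` and `three_step_search`, the argument layout, and the choice to skip candidates that fall outside the frame are all hypothetical. MAD is used as the BDM, as in the paper.

```python
import numpy as np

def mad(block_a, block_b):
    """Mean absolute difference between two equally sized blocks."""
    return np.mean(np.abs(block_a.astype(np.float64) - block_b.astype(np.float64)))

def three_step_search(cur, ref, top, left, block=16, step=4):
    """Return ((dy, dx), points_checked) for the block at (top, left) in `cur`.

    `ref` is the previous frame; `step` starts at 4 and is halved to 2 and 1.
    """
    h, w = ref.shape
    target = cur[top:top + block, left:left + block]
    best_dy, best_dx = 0, 0
    best_cost = mad(target, ref[top:top + block, left:left + block])
    checked = 1  # the centre point of step 1
    while step >= 1:
        cy, cx = best_dy, best_dx  # centre of this step's square pattern
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if dy == 0 and dx == 0:
                    continue  # centre was already evaluated
                y, x = top + cy + dy, left + cx + dx
                if y < 0 or x < 0 or y + block > h or x + block > w:
                    continue  # assumption: skip candidates outside the frame
                checked += 1
                cost = mad(target, ref[y:y + block, x:x + block])
                if cost < best_cost:
                    best_cost, best_dy, best_dx = cost, cy + dy, cx + dx
        step //= 2
    return (best_dy, best_dx), checked
```

For a block far enough from the frame border, the sketch evaluates 9 + 8 + 8 = 25 points, which agrees with the count stated above.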

III. New Cross Hexagonal Search Algorithm
Kamel Belloulata et al. [24] proposed a novel fast block matching algorithm called new cross hexagon search (NCHEXS). It uses small and large cross-shaped patterns (SCSP/LCSP) in the first three steps and small and large cross-hexagon-shaped patterns (SHSP/LHSP) in the subsequent steps of the search. The algorithm begins by initializing the SCSP, placing 5 points at the center of the 16×16 macro block [28]. The algorithm is summarized as follows:
Step 1: If the minimum BDM point is found at the center of the small cross-shaped pattern, stop searching; otherwise go to step 2.
Step 2: If the minimum BDM point is found at the center of the newly formed small cross-shaped pattern, stop searching; otherwise go to step 3.
Step 3: Check the other 3 unchecked points of the large cross pattern and the 2 unchecked points of the center-biased square to determine the best possible direction for the hexagonal search.
Step 4: Form a new large hexagonal pattern centered on the minimum BDM point found in the previous step. If the minimum BDM point is found at the center of the new large hexagonal pattern, go to step 5; otherwise form a new large hexagonal pattern again, i.e. repeat step 4.
Step 5: When the minimum BDM point is at the center of the large hexagonal pattern, switch from the large to the small hexagonal pattern and find the best motion vector in the small hexagonal pattern.
Fig. 3 shows the search path of NCHEXS. In the best case, NCHEXS requires only 5 to 8 checking points, which is fewer than the other techniques considered here for estimating a motion vector.

IV. Proposed Approach
The main objective of the proposed work is to develop a mechanism for fractal-coding-based color video compression using the modified three-step search (MTSS) block matching algorithm and weighted finite automata (WFA) coding.
In color images, each of the R, G, and B components contains 8-bit data, and every color image contains a lot of redundancy. In this approach, each frame is converted into the YCbCr color space, which is the most suitable color space for video processing, and the first component of the frame in the YCbCr color space is treated as a gray scale image. This first component, the luminance, is the one to which the human eye is sensitive and resembles a gray scale image; the remaining two chrominance components carry the color information, to which the human eye is much less sensitive. Therefore, the focus is on the compression of the luminance plane. Each gray scale frame is divided into fixed, equal-sized macro blocks. The MTSS approach is used to estimate the motion vectors of the current frame with respect to the previous frame. The predicted frame for the current frame is then created from the previous frame using the assigned motion vectors. Encoding and decoding of the previous frame is performed using WFA or FC. Encoding and decoding of the difference between the predicted frame and the current frame is also performed using WFA or FC. The first frame is encoded and decoded using WFA/FC, and the predicted frame for the second frame is then formed from the decoded first frame using MTSS. Then, the predicted frame is encoded using WFA/FC and decoded to form the predicted frame for the third frame. This process is repeated for all frames of the sequence. The process flow of the proposed MTSS and WFA/FC coding approach for video compression is shown in Fig. 4.

A. Modified Three Step Search (MTSS) Algorithm
The proposed MTSS approach is used to calculate the motion compensation prediction error (MCPE) between two consecutive frames. In MTSS, two cross search patterns (small and large) and two cross-hexagon search patterns (large and small) are used in the central part of the search window to exploit the center-biased characteristic of motion vectors (MV) in video sequences. The proposed MTSS approach combines the TSS and NCHEXS approaches, which employ square and hexagon-based patterns of different sizes, respectively. If the minimum BDM point is found at an outer point of the 9×9 search window, TSS is executed; otherwise NCHEXS is executed in the central part of the search window. Fig. 5 shows the search pattern used in the first step of the proposed MTSS approach. MTSS can be summarized as follows:
Step 1: A total of 9+4 points are checked. If the minimum BDM point is found at the center of the 9+4 points, stop searching; otherwise go to step 2.
Step 2: If the minimum BDM point is found in the outer part of the search window, the search proceeds as in TSS, discussed in Section 2; otherwise go to step 3.
Step 3: If the minimum BDM point is found at one of the 4 outer points of the small cross search pattern (SCSP), the search proceeds as in the new cross hexagon search (NCHEXS), discussed in Section 3.
Intra-frame and inter-frame coding are used to exploit the self-similarity present in the frames. Intra-frame coding amounts to frame-by-frame coding, i.e. individual frames are coded independently. The first frame in a video sequence is always an intra-frame because there is no previous data related to it. Intra-frames are not limited to the first frame: an intra-frame can occur anywhere in a video sequence where the scene changes completely. In inter-frame coding, on the other hand, the previous frame and the current frame are coded using the proposed MTSS block matching motion estimation algorithm and WFA coding. The previous and current frames are extracted from the video sequence, and the proposed MTSS algorithm is applied to each fixed-size block. The obtained motion vectors (MV) are assigned to the current frame to form the predicted frame. The motion compensated prediction error (MCPE) frame is then calculated as the difference between the current frame and the newly formed predicted frame. Finally, WFA encoding and decoding are applied to it, as shown in the block schematic of Fig. 4. The WFA-decoded frame is considered the previous frame for the next current frame, and the same process is repeated.
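The inter-frame step described above (forming the predicted frame from the motion vectors and then taking the prediction error) can be illustrated with a short Python sketch. The names `predict_frame` and `mcpe`, the motion-vector array layout, the clamping at frame borders, and the sign convention (current minus predicted) are assumptions made for illustration; the paper does not specify them.

```python
import numpy as np

def predict_frame(prev, motion_vectors, block=16):
    """Build the predicted frame by copying displaced blocks from `prev`.

    motion_vectors[i, j] holds (dy, dx) for the block in block-row i and
    block-column j. Frame dimensions are assumed to be multiples of `block`.
    """
    h, w = prev.shape
    pred = np.zeros_like(prev)
    for i in range(h // block):
        for j in range(w // block):
            top, left = i * block, j * block
            dy, dx = motion_vectors[i, j]
            # Assumption: clamp so the displaced block stays inside the frame.
            y = min(max(top + dy, 0), h - block)
            x = min(max(left + dx, 0), w - block)
            pred[top:top + block, left:left + block] = prev[y:y + block, x:x + block]
    return pred

def mcpe(cur, pred):
    """Motion compensated prediction error, kept signed (int16)."""
    return cur.astype(np.int16) - pred.astype(np.int16)
```

The error frame is stored as a signed type because differences of 8-bit pixels range from -255 to 255.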

B. Proof-Outlines of the Parameters used in Modified Three-Step Search (MTSS) algorithm
In general, videos consist of different numbers of frames, and consecutive frames have a large amount of temporal redundancy. The change from one frame to the next is small: most of the next frame is similar to the previous frame. Therefore, the change generally occurs within a 3×3, 5×5, 7×7, or 9×9 neighborhood of pixels. Hence, to measure the similarity between two consecutive frames, it is better to use a larger search window that covers most of the macro block area (i.e. 16×16). So, initially, a 9×9 (center-biased) search window is selected. This leaves 3 or 4 pixels on every side of the 9×9 search window. As the unexplored area consists of 3 or 4 pixels, the next search window can be of size (3×2+1)×(3×2+1), i.e. 7×7, or (2×2+1)×(2×2+1), i.e. 5×5. Here, we have used a 5×5 (center-biased) search window. The unexplored area now consists of 1 pixel, so the next search window is of size (1×2+1)×(1×2+1), i.e. 3×3. Hence, we have used a 3×3 (center-biased) search window.

C. Weighted Finite Automata Representation
In the WFA representation, with Q the set of states and ∑ the set of input symbols:
W is the weight function between two states, i.e. W: Q×Q → R with W(q0, q1) = r, where q0, q1 ∈ Q and r is a real number, i.e. the weight between states q0 and q1;
I is the initial configuration of states, I: Q → R, and indicates which states correspond to the entire image: I(q0) = 1 and I(qi) = 0 for qi ≠ q0, where q0, qi ∈ Q and i = 1, 2, 3, ..., n;
F is the final configuration of states, F: Q → R, e.g. F(q0) = f(ε), where q0 ∈ Q and f(ε) is the average intensity (greyness) of the entire image;
q0 is the initial state of the WFA, i.e. the entire original image, q0 ∈ Q.
A transition (q0, 1) → q1, a mapping Q×∑ → Q, exists when W(q0, q1) ≠ 0, which implies that there is a transition from q0 to q1 on the input symbol labelled 1. Further, Wi(q0, q1) denotes a weighted transition from q0 to q1 on the input symbol i, i.e. the state or sub-image i.
The basic principles of the WFA approach are somewhat similar to those of the fractal image compression approach based on PIFS (partitioned iterated function systems). Both approaches exploit the fact that images used in practice contain a certain amount of self-similarity to achieve compression. In other words, a sub-image of the image to be compressed may be similar to another sub-image of the same image, except perhaps for size, contrast, or brightness. The main difference between the two approaches is that PIFS-based fractal compression uses affine transformations to find the self-similarity of sub-images, whereas WFA expresses a sub-image as a weighted linear combination of other states/sub-images. The real value (weight) and quadrant address assigned to each transition along an edge in the WFA indicate how each state in the WFA is expressed as a linear combination of the other states:
$$(s_i)_q = \sum_{j} W_q(s_i, s_j)\, s_j \tag{1}$$
where s_i is the image associated with state s_i, (s_i)_q indicates the quadrant of state s_i with address q, and W_q(s_i, s_j) is the weight of the transition from s_i to s_j labelled with quadrant address q.
WFA provide a powerful tool for image generation and compression. First, the image is subdivided into non-overlapping sub-images through a quadtree partitioning scheme. These sub-images are identical to the range blocks used in the PIFS-based fractal image compression approach. Next, for the original image and for each range block/sub-image to be encoded, one or more very similar or identical sub-images are obtained from the domain blocks/sub-images present in the domain pool, and a transition graph is constructed to describe the relationship between these sub-images and, finally, the image to be encoded using the WFA approach. The domain pool may contain all states or sub-images that can be generated from the given partitioning scheme. In general, the WFA uses an inference algorithm to construct a transition graph that is very similar to the graphs used to represent finite state automata. The various states/sub-images of the finite state automaton are then compressed to become the compressed image.
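The quadtree subdivision mentioned at the start of this paragraph can be illustrated with a small Python sketch. The splitting criterion is an assumption: the paper does not state when a block stops being subdivided, so the sketch stops when the intensity range inside a block is at most a hypothetical tolerance `tol` or when a hypothetical minimum size `min_size` is reached.

```python
import numpy as np

def quadtree_blocks(img, top=0, left=0, size=None, min_size=2, tol=0.0):
    """Return a list of (top, left, size) leaves covering a 2^k x 2^k image."""
    if size is None:
        size = img.shape[0]
    block = img[top:top + size, left:left + size]
    # Assumed stopping rule: block is (nearly) uniform or already minimal.
    if size <= min_size or float(block.max()) - float(block.min()) <= tol:
        return [(top, left, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_blocks(img, top + dy, left + dx, half, min_size, tol)
    return leaves
```

The leaves are non-overlapping and together cover the whole image, which is the property the range blocks need.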
The process of decoding an encoded image is discussed below with a suitable example. The image to be encoded, at resolution 4×4, i.e. 2^k×2^k with k = 2, is given in Fig. 8(a). The generated quadtree, the WFA representation, and its transition diagram representation are given in Fig. 8.

WFA Encoding Process
The WFA encoding algorithm takes as input a grayscale image of size 2^k×2^k and gives as output the WFA representation of that image, i.e. the initial configuration I, the final configuration F, and the weights W. In WFA, all four quadrants/sub-images of an image are processed by approximating each quadrant with a linear combination of the existing states. If a quadrant of a state/image cannot be expressed as a linear combination of the existing states, the quadrant is chosen as a new state and placed in the unprocessed state list; otherwise, a transition is added and its coefficients are stored in the list. All four quadrants/sub-images of a state/image are processed before moving to the next unprocessed state in the list. Once all the unprocessed states in the list have been processed, the algorithm terminates. The recursive WFA encoding algorithm is shown below. The decoding algorithm takes as input an encoded WFA represented by n + n0 states, together with F and W created during the encoding process and the resolution level k, and returns as output the decoded image of size 2^k×2^k. A WFA decoding approach is discussed below.
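As an illustration only (this is not the paper's own listing), the decoding idea can be sketched in Python: starting from the final configuration F at resolution 1×1, each refinement level rebuilds every state's image quadrant by quadrant as a weighted sum of the coarser state images, and the initial configuration I selects the output. The function name `wfa_decode`, the array layout of W, and the quadrant addressing (0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right) are assumptions.

```python
import numpy as np

# Assumed quadrant addressing: 0 = top-left, 1 = top-right,
# 2 = bottom-left, 3 = bottom-right, stored as (row, column) offsets.
QUADRANT_OFFSETS = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}

def wfa_decode(I, W, F, k):
    """Decode a WFA into a 2^k x 2^k image.

    I: (n,) initial configuration; F: (n,) final configuration;
    W: (4, n, n) array, W[q, i, j] = weight of the transition i -> j labelled q.
    """
    I, W, F = (np.asarray(a, dtype=np.float64) for a in (I, W, F))
    n = F.shape[0]
    # At resolution 1x1 each state's image is its average intensity.
    images = [np.full((1, 1), F[j]) for j in range(n)]
    for _ in range(k):
        size = images[0].shape[0]
        finer = []
        for i in range(n):
            img = np.zeros((2 * size, 2 * size))
            for q, (r, c) in QUADRANT_OFFSETS.items():
                # Quadrant q of state i is a weighted sum of coarser state images.
                quad = sum(W[q, i, j] * images[j] for j in range(n))
                img[r * size:(r + 1) * size, c * size:(c + 1) * size] = quad
            finer.append(img)
        images = finer
    return sum(I[i] * images[i] for i in range(n))
```

For example, a two-state automaton in which state 1 is a constant image (all four of its quadrants map back to itself with weight 1) and state 0 has a single transition to state 1 on quadrant 0 decodes, at k = 2, to a 4×4 image whose top-left 2×2 quadrant is 1 and whose other pixels are 0.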

V. Experimental Results and Discussion
The proposed approach is implemented using MATLAB® 2013a (8.1.0.604). The experiments were carried out on a system with an Intel® Core™ i3 CPU (2.40 GHz). The experiments were carried out on color videos (i.e. Soccer, Suzie, Bus, Football, Xylophone, Paris, Traffic, Akiyo, Ice, and Mobile sequences) obtained from online resources i.e., http://media.xiph.org/video/derf/ (see Fig. 9). Standard input streams with different frame rates, lengths of sequences, and frame widths/heights were considered to demonstrate the performance of the proposed approach (see Table I).

A. Quality Measures
Inter-frame and intra-frame coding are used to eliminate the large amount of temporal and spatial redundancy that exists in video sequences and therefore help in compressing them. The matching of a current-frame macro block with a previous-frame macro block is based on the output of a matching criterion. The previous-frame macro block that yields the minimum value is the one that most closely matches the current block. The popular matching criteria used for block matching motion estimation are the mean of absolute difference (MAD), the mean squared error (MSE), and the sum of absolute difference (SAD), given by equations 2, 3, and 4, respectively.
$$\mathrm{MAD} = \frac{1}{N^2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left| C_{ij} - R_{ij} \right| \tag{2}$$
$$\mathrm{MSE} = \frac{1}{N^2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left( C_{ij} - R_{ij} \right)^2 \tag{3}$$
$$\mathrm{SAD} = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left| C_{ij} - R_{ij} \right| \tag{4}$$
where N×N is the size (rows and columns) of the macro block, and C_ij and R_ij are the pixel values being compared in the current macro block and the previous macro block, respectively.
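The three matching criteria can be computed directly with NumPy. The sketch below is illustrative; the function names are hypothetical, and the blocks are cast to floating point first so that differences of unsigned 8-bit pixels do not wrap around.

```python
import numpy as np

def _diff(cur_block, ref_block):
    """Signed pixel differences C_ij - R_ij as floats."""
    return cur_block.astype(np.float64) - ref_block.astype(np.float64)

def sad(cur_block, ref_block):
    """Sum of absolute differences."""
    return float(np.sum(np.abs(_diff(cur_block, ref_block))))

def mad(cur_block, ref_block):
    """Mean of absolute differences."""
    return float(np.mean(np.abs(_diff(cur_block, ref_block))))

def mse(cur_block, ref_block):
    """Mean squared error."""
    return float(np.mean(_diff(cur_block, ref_block) ** 2))
```

For the 2×2 blocks C = [[1, 2], [3, 4]] and R = [[1, 0], [0, 0]], the absolute differences are 0, 2, 3, and 4, so SAD = 9, MAD = 2.25, and MSE = 7.25.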
In block matching algorithms, the macro block size is an important parameter for motion estimation. A smaller macro block size results in more macro blocks, and hence more motion vectors, per frame, which improves the quality of the motion compensated prediction error (MCPE). Most video coding standards use macro blocks of size 16×16 and 8×8. A single best motion vector is computed for each macro block in the reference frame. In addition, the total number of search points needed to find the motion vectors per frame is one of the key parameters of a block matching motion estimation algorithm. The performance of video coding is measured in terms of compression ratio (CR), quality of video (i.e. PSNR), encoding time, and decoding time. The compression ratio (CR) is given by equation 5.

The compression ratio in percentage is then computed from equation 5 and is given by equation 6.
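When measuring the quality of compressed video, the peak signal-to-noise ratio (PSNR) is used. Sometimes the mean squared error (MSE) is also used, which is given by equation 7.
$$\mathrm{MSE} = \frac{1}{MN} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \left( X_{ij} - \hat{X}_{ij} \right)^2 \tag{7}$$
where X and \hat{X} are the original and decoded frames, each of size M×N. From this, the PSNR for an 8-bit grayscale image is defined by equation 8, where 255 is the maximum value an 8-bit pixel can assume.
$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{255^2}{\mathrm{MSE}} \right) \tag{8}$$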
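A short Python sketch of the PSNR computation for 8-bit frames follows. The function names and the choice to return infinity for identical frames (where the MSE is zero) are assumptions made for illustration.

```python
import numpy as np

def frame_mse(original, decoded):
    """Mean squared error between two frames of the same size."""
    diff = original.astype(np.float64) - decoded.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB; `peak` is 255 for 8-bit pixels."""
    err = frame_mse(original, decoded)
    if err == 0:
        return float("inf")  # assumption: identical frames give infinite PSNR
    return float(10.0 * np.log10(peak ** 2 / err))
```

As a check on the formula, a decoded frame that differs from the original by exactly 1 at every pixel has MSE = 1 and therefore PSNR = 20·log10(255), roughly 48.13 dB.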

B. Stepwise Results Obtained for Color Space Conversion and the Quad-tree Partitioning Scheme
The intermediate results of color space conversion and quadtree partitioning for the first ten frames of the "soccer" video are shown in Fig. 10. The input video "Soccer" was obtained from an uncompressed video database, i.e. http://media.xiph.org/video/derf/. The first 10 frames are shown in Fig. 10(a). The YCbCr color space frames, i.e. the Y-luminance component sequences, of the first 10 frames are shown in Fig. 10(b). The Y-luminance component sequences of the first 10 frames are then converted into gray frames, as shown in Fig. 10(c). Fig. 10(d) shows the quadtree partitioning of the first 10 gray frames. Note that, for the experiments, the standard input frame size must be converted into a square of size 2^n×2^n. This conversion involves shrinking and replicating pixels to obtain a frame of size 2^n×2^n. The presented approach first converts the RGB sequence into the YCbCr color space and then converts the Y-luminance component to gray scale before applying the quadtree partitioning.
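The preprocessing described above can be sketched in Python as follows. Two assumptions are made, since the paper does not give these details: the luminance is computed with the ITU-R BT.601 weights (0.299, 0.587, 0.114) in full range, and the conversion to 2^n×2^n uses nearest-neighbour resampling, which shrinks by dropping pixels and enlarges by replicating them. The function names are hypothetical.

```python
import numpy as np

def rgb_to_luma(rgb):
    """Y component of an (H, W, 3) uint8 RGB frame (assumed BT.601 weights)."""
    weights = np.array([0.299, 0.587, 0.114])
    y = rgb.astype(np.float64) @ weights
    return np.clip(np.rint(y), 0, 255).astype(np.uint8)

def resize_to_square_pow2(gray, n):
    """Nearest-neighbour resample of a 2-D frame to 2^n x 2^n."""
    side = 2 ** n
    h, w = gray.shape
    rows = (np.arange(side) * h) // side  # source row for each output row
    cols = (np.arange(side) * w) // side  # source column for each output column
    return gray[rows[:, None], cols[None, :]]
```

With n = 7, for instance, a 144×176 luminance frame becomes a 128×128 frame suitable for quadtree partitioning.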

C. Evaluation and Comparison of Modified Three Step Search (MTSS) Block Matching Estimation
The mean absolute difference (MAD) minimum cost function, given by equation 2, is used as the block distortion measure (BDM). The performance of the MTSS block matching motion estimation algorithm is compared with the existing block matching algorithms, i.e. TSS, NTSS, and ETSS. The performance evaluation parameters used for comparison are the MAD and the total number of search points/check points required per frame; the results are shown in Fig. 11 and Fig. 12, respectively.
Fig. 10. Intermediate results for the first ten frames of the "soccer" video: (a) first ten color frames; (b) first ten YCbCr frames; (c) first ten gray frames; (d) first ten frames of quadtree partitioning.

D. Evaluation and Comparison of MTSS Block Matching Estimation and Weighted Finite Automata Coding
The performance of the proposed approach is measured in terms of the quality of the decoded video (i.e. PSNR and MSE), compression ratio, and decoding and encoding time at search parameter p=3, as shown in Table II.

E. Evaluation and Comparison of MTSS Block Matching Estimation and Fractal Coding
Every image has scope for self-similarity, and fractal coding exploits the self-similarity existing in images. For the fractal encoding and decoding process, range blocks of size 8×8 and domain blocks of size 16×16 are used. Tables II and III give the statistical performance comparison for the first 15 frames of the standard sequences. The proposed MTSS block matching algorithm and WFA coding approach is compared with the MTSS and fractal coding approach in Table III. Both approaches are also compared with the frame-by-frame fractal coding approach. The fractal image compression approach is further implemented on image/video sequences based on intra-frame coding for comparison in terms of encoding time, compression ratio, and quality of video (i.e. PSNR), as given in Table IV. Fig. 14 shows the performance in terms of MSE, PSNR, and encoding time for the first 15 frames of the bus sequence using the MTSS and fractal coding approach and the frame-by-frame fractal coding approach, respectively. Fig. 15 shows the 15th decoded frame of the bus sequence using MTSS and WFA coding.

F. Evaluation and Comparison of Fractal Coding
From Table II to Table IV, it can be seen that the proposed video compression using MTSS and WFA coding performs better than MTSS and fractal coding, and better than frame-by-frame fractal coding, in terms of reduced encoding time and better video quality. The performance of simple fractal coding (individual frame coding) for the first 15 frames of the video sequences in terms of MSE, PSNR, encoding time, and decoding time is given in Table IV.

VI. Conclusion and Future Scope
In this paper, a modified three-step search (MTSS) algorithm and WFA coding approach is proposed to reduce the encoding time. The MTSS block matching motion estimation approach performs better than TSS, NTSS, and ETSS in terms of smaller MAD and fewer average search points/check points at search parameter p=3 pixels. MTSS performs efficiently for frames with both slow and fast motion. Hence, the MTSS algorithm is suitable for video applications. The proposed MTSS and WFA coding approach performs better than MTSS and FC, as well as frame-by-frame FC, in terms of encoding time. With MTSS and WFA, the encoding time is reduced by 70% to 80% in comparison with MTSS and FC.
The developed block matching algorithm has scope for improvement through optimization of the search process by exploring the nearest neighborhood of pixels. It can also be combined with different coding mechanisms for video compression.