Detecting Image Brush Editing Using the Discarded Coefficients and Intentions

This paper describes a quick and simple method to detect brush editing in JPEG images. The novelty of the proposed method is based on detecting the discarded coefficients during the quantization of the image. Another novelty of this paper is the development of a subjective metric named intentions. The method directly analyzes the allegedly tampered image and generates a forgery mask indicating forgery evidence for each image block. The experiments show that our method works especially well in detecting brush strokes, and it works reasonably well with added captions and image splicing. However, the method is less effective at detecting copy-moved and blurred regions. This means that our method can effectively contribute to implementing a complete image-tampering detection tool, provided that the editing operations for which it is less effective are covered by complementary methods better suited to detecting them.

4. Image painting. This category includes image tampering by painting and drawing [20]. Cutzu et al. [21] have proposed a method to discriminate between drawn images and genuine photo images by detecting changes in the hue, edge and texture features. Elgammal et al. [22] have developed a method to analyze forged strokes in paintings by characterizing personal strokes in drawings. Farid [23] has modeled brush detection as a segmentation problem, using a graph-cut algorithm to detect changes in intensity or texture. Lin and Huang [24] have detected air-brush and brush strokes by: (1) using the expectation-maximization (EM) algorithm in the JPEG coefficients, (2) generating a probability map in the frequency domain and (3) segmenting the periodicity in the probability map.
5. Image retouching. This category groups more subtle changes in the image that enhance or reduce certain features. For instance, Sutthiwan et al. [25] have proposed a method to detect changes in clarity or color of the texture. Mahalakshmi et al. [26] have proposed a method to detect affine transformations (rotation, scaling, etc.) by analyzing changes in the texture of the transformed region.

B. Current Approaches, Strategies and Features
Forged image detection techniques have been broadly divided into active and passive (or blind) approaches [4][5] [6][27] [28]. Active approaches usually watermark or sign the image in order to detect future changes. Passive approaches use only the received image to assess if the image has suffered some kind of post-processing. The rest of this paper focuses on passive approaches.
H. Farid [27] (2009) classified forensic strategies into five categories. Ali Qureshi and Deriche [1] (2015) propose similar categories, but refer to these strategies as tools. In particular, these categories are:
1. Pixel-based strategies detect spatial irregularities in the pixel distribution properties. These strategies include, for instance, changes in noise level [12] or inter-block correlation [29]. These strategies have proved to be especially effective in identifying edited regions.

2. Compression-based strategies detect traces of forgery in the transformed domain, i.e., they are mainly designed for forensic analysis of JPEG images. These techniques can detect effects such as compression with a specific JPEG quantization table [30] [31] or the quantization with two different quantization tables [32].
3. Camera-based strategies detect alterations in the characteristic artifacts that a specific camera model introduces. Examples of these artifacts are the characteristic camera noise [33] and the color traces that remain after sensor interpolation (demosaicing) [34]. This means that they cannot be applied to analyze any image, since they only apply to certain camera models.

4. Lighting-based strategies detect inconsistencies in the 3D real world lighting effects, specular lighting or highlights in the surface geometry [35]. These techniques often require manual intervention to identify and analyze possible inconsistencies.

5. Perspective-based strategies detect when constraints are not met in the perspective of objects with respect to the camera, because the object has undergone a geometric transformation [36] [37]. Although these strategies are named geometric-based in [1][27], we call them perspective-based, to distinguish them from the detection of geometric transformations in spliced regions (e.g. [7]).
Regardless of the strategy, most forgery detection methods are based on the general concept of features: the information extracted from the image to detect forgeries. These methods usually have two stages: 1) Feature extraction measures relevant characteristics of the image, and 2) Feature matching searches regions of the image with similar features. The existence of regions with similar features is an indicator that one region may have been cloned from the other.
The extracted features can in turn be divided into three main types:
1. Block-based features are extracted from (overlapping or non-overlapping) rectangular blocks. The most typical features are the frequency representation, such as the histogram (e.g. [38]), or the Discrete Cosine Transform (DCT) (e.g. [39][40]) of the blocks. Other features are the texture of the blocks (e.g. [41]) or the moment invariant features, which are block features invariant to rotation and scaling [42].
2. Keypoint-based features are extracted from distinctive parts of the image such as corners, edges, or textures [43]. With these features, [44] identifies three issues to address: the non-uniform distribution of the keypoints, the threshold to select keypoints with low contrast, and how to cluster forged areas. For instance, [17] uses a Gabor filter for keypoint texture retrieval. Most of these features tend to be more robust to affine transformations. SIFT (Scale Invariant Feature Transform) is the most popular affine transform invariant keypoint feature. SURF [45] is an improvement on SIFT to reduce the dimension of the features and the computational time. The authors of [46] combine a point of interest detector with SIFT to extract more feature points.
3. Multi-scale features allow for analyzing the image at different levels to achieve better detection results. The authors of [47] analyze textures at different levels to find copy-moved regions. The authors of [48] use multi-scale representation to cluster regions based on geometric constraints. The authors of [14] use multi-scale variation in noise to detect spliced regions.
As indicated above, feature matching searches for similarities (copy-move) or dissimilarities (spliced regions, blurring, retouching) between image features. An example of an effective matching method is clustering: the search space is divided into regions with similar features-vector distributions (e.g. [38]). Another popular feature matching is sorting. For instance, the authors of [49] use as features a histogram of oriented gradients that are lexicographically sorted to find duplicated blocks.
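The sorting-based matching described above can be illustrated with a minimal sketch. It is not the method of [49]: as a simplifying assumption, the feature of each block is just its raw pixel values rather than a histogram of oriented gradients, and the function name `find_duplicate_blocks` is hypothetical. After a lexicographic sort, identical features become neighbors, so duplicated blocks can be found by comparing adjacent entries only.

```python
def find_duplicate_blocks(image, block=2):
    """Return pairs of top-left corners whose blocks have identical features.

    image is a list of rows of pixel values. The feature of a block is the
    tuple of its pixels (an assumption made for brevity, not a HOG feature).
    """
    height, width = len(image), len(image[0])
    entries = []
    for r in range(height - block + 1):
        for c in range(width - block + 1):
            feature = tuple(image[r + i][c + j]
                            for i in range(block) for j in range(block))
            entries.append((feature, (r, c)))

    # Lexicographic sort: equal feature vectors end up next to each other.
    entries.sort()

    pairs = []
    for (feat_a, pos_a), (feat_b, pos_b) in zip(entries, entries[1:]):
        if feat_a == feat_b:
            pairs.append((pos_a, pos_b))
    return pairs
```

Sorting reduces the search from comparing every pair of blocks to a single pass over neighbors. A practical detector would use a tolerance on the feature distance instead of strict equality.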

III. Proposed Approach
As we have described in Section II.A, there is an extensive bibliography addressing the detection of forgery techniques such as copy-move (cloning), image splicing (composites), blurring and sharpening. However, although graphic designers use the brush on a daily basis, its detection has not received the same level of attention. In the fourth category in Section II.A we have described, to the best of our knowledge, the current research in brush painting forgeries.
Disturbances in the JPEG compression coefficients have already been successfully used to detect spliced regions [15] [50] or double-compression [29] [51]. In this work, we have hypothesized that brush editing also alters the distribution of these coefficients. However, the metric we use to detect these disturbances is different. Specifically, we first cancel the effect of lighting, and then assess the number of normalized coefficients that the JPEG compressor has discarded.

A. Forgery Localization and Forgery Mask
There are two granularity levels to represent image forgery localization:
1. Image-based localization classifies the entire image. Binary classification determines whether the image is forged or not. In this case, the true / false and positive / negative rates of the classifier are evaluated (see, for instance [40]). One way to represent the fuzziness in the classification decision is to assign a forgery probability to the image. To evaluate this probability, it may be useful to have a confidence interval, rather than a single point estimate. For this reason, authors such as [52] study the confidence intervals of the probabilistic classification.

2. Region-based localization is used when the application requires identifying the parts of the scene that have been modified. This occurs when a change in the image modifies the semantics of the scene. For example, a change in the light of a traffic light might eliminate the traffic offense of the scene. The forgery mask is a tool to represent these areas of the image with high probability of falsification. Our experiments use this forgery mask to highlight tampering. For example, Fig. 1(c) shows the forgery mask for an unedited image, and Fig. 1(d) shows the forgery mask after sharpening the monkey's body. In particular, in our forgery mask dark pixels indicate high probability of alteration.

B. Intentions and Interactive Forgery Mask
Typically, research in image forensics evaluates classification performance at either image or region level using an objective metric. For binary classification, they frequently use two metrics: sensitivity, i.e., the percentage of forgery correctly identified, and specificity, i.e., the percentage of unedited image correctly identified. Other alternative metrics, from the field of information retrieval, are precision, recall or the F-score (e.g., in [9][40]).
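As an illustration of the objective metrics just listed, the following sketch computes them from a predicted binary mask and a ground-truth mask (1 = forged, 0 = unedited). The function names and the dictionary layout are hypothetical. Undefined ratios such as 0/0, which occur for an unedited image with no forged pixels, are returned as `None`.

```python
def confusion_counts(predicted, truth):
    """Count TP, FP, TN, FN over two same-sized binary masks (1 = forged)."""
    tp = fp = tn = fn = 0
    for pred_row, true_row in zip(predicted, truth):
        for pred, true in zip(pred_row, true_row):
            if true and pred:
                tp += 1
            elif true:
                fn += 1
            elif pred:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is 0/0."""
    return numerator / denominator if denominator else None


def objective_metrics(predicted, truth):
    """Sensitivity, specificity, precision and F-score of a forgery mask."""
    tp, fp, tn, fn = confusion_counts(predicted, truth)
    se = ratio(tp, tp + fn)          # sensitivity (also called recall)
    sp = ratio(tn, tn + fp)          # specificity
    precision = ratio(tp, tp + fp)
    f_score = None
    if se is not None and precision is not None and se + precision > 0:
        f_score = 2 * precision * se / (precision + se)
    return {"Se": se, "Sp": sp, "precision": precision, "F": f_score}
```

For a mask that flags nothing on an image with no forged pixels, this returns `Se` as `None` and `Sp` as 1.0.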
A drawback of these objective metrics is that a result like the one shown in our experiment in Fig. 2(d) has a relatively low sensitivity rate Se = 0.5622 (percentage of tampered pixels correctly detected). However, a visual inspection allows concluding that the image is forged. This is because a human is able to detect the intention of the forger without having to resort to soft computing techniques [53].
A second drawback of the objective metrics is their dependency on a threshold parameter. There is no general guideline to obtain this threshold, because usually each image has its own threshold for maximum detection performance [54]. A human operator can use semantics to address this problem by means of an interactive gauge that allows the operator to visualize the forgery mask with different thresholds. We will refer to this gauge as the interactive forgery mask. Table I shows the sensitivity Se and specificity Sp for different threshold values with Fig. 2(b). Fig. 3 shows the corresponding forgery masks. Note that for a human operator the interactive mask is more helpful than the objective metrics.
A third drawback of the objective metrics is that editing does not prove tampering. For instance, the experiments in [55] reported that frequently the double-compression artifact was merely due to an image resaved with a different quality level. A human operator can semantically interpret the image to effectively decide if the marked area corresponds to an intentional forgery. This operator can also use an interactive forgery mask to best "focus" each image at its appropriate threshold level.
In spite of this, in order to facilitate the comparison, we have added to the figures of the reported experiments the objective sensitivity Se and specificity Sp.
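The threshold dependency can be made concrete with a small sketch of what an interactive gauge computes behind the scenes. It assumes that each block already has a count of discarded coefficients and that a block is flagged as edited when that count falls below the threshold (a binary simplification of the rule in Section IV.D). The names `binary_mask`, `se_sp` and `sweep_thresholds` are hypothetical.

```python
def binary_mask(discarded_counts, threshold):
    """Flag a block (1) when its discarded-coefficient count is below threshold."""
    return [[1 if count < threshold else 0 for count in row]
            for row in discarded_counts]


def se_sp(mask, truth):
    """Sensitivity and specificity of a binary mask; None when undefined."""
    pairs = [(m, t) for m_row, t_row in zip(mask, truth)
             for m, t in zip(m_row, t_row)]
    tp = sum(1 for m, t in pairs if m and t)
    tn = sum(1 for m, t in pairs if not m and not t)
    positives = sum(1 for _, t in pairs if t)
    negatives = len(pairs) - positives
    se = tp / positives if positives else None
    sp = tn / negatives if negatives else None
    return se, sp


def sweep_thresholds(discarded_counts, truth, thresholds):
    """Return {threshold: (Se, Sp)}, i.e. one table row per gauge position."""
    return {t: se_sp(binary_mask(discarded_counts, t), truth)
            for t in thresholds}
```

Raising the threshold flags more blocks, which typically raises Se and lowers Sp. The operator looks at the resulting masks rather than at these numbers alone.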

IV. Method
The method does not require the original image (untampered image). It requires only an analyzed image (potentially tampered image) in JPEG format in order to calculate the editing evidence of each block. There is a twofold output:
1. An objective metric with the sensitivity Se and specificity Sp of the classification.

2. A subjective metric with the interactive forgery mask indicating the probability of editing of each block (see Fig. 3).

A. Detected Effect
JPEG image compression uses the DCT to concentrate each block's energy in the low frequency coefficients, and high frequency coefficients are often reduced to zero. As described in [56], these high frequencies correspond to excessively sharp changes, which are the least noticeable for the human eye. The DCT tends to assign a low magnitude to these coefficients, and subsequently the JPEG compressor tends to round them to zero. In the rest of this paper we will refer to these zeroed coefficients as discarded coefficients.
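The following sketch illustrates how discarded coefficients arise. It is a simplification, not a JPEG encoder: it assumes a single uniform quantization step of 16 for every coefficient, whereas a real encoder uses a per-frequency quantization table, and the names `dct2` and `count_discarded` are hypothetical.

```python
import math

N = 8  # JPEG block size


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    def basis(i, k):
        return math.cos((2 * i + 1) * k * math.pi / (2 * N))

    return [[alpha(p) * alpha(q) * sum(block[m][n] * basis(m, p) * basis(n, q)
                                       for m in range(N) for n in range(N))
             for q in range(N)]
            for p in range(N)]


def count_discarded(block, step=16):
    """Count coefficients that a uniform quantizer with this step rounds to zero.

    The uniform step is an assumption for illustration only.
    """
    shifted = [[value - 128 for value in row] for row in block]  # level shift
    coefficients = dct2(shifted)
    return sum(1 for row in coefficients for c in row if round(c / step) == 0)


# A smooth block versus a block full of sharp pixel-to-pixel changes.
flat = [[100] * N for _ in range(N)]
checker = [[255 if (m + n) % 2 == 0 else 0 for n in range(N)] for m in range(N)]
```

With these inputs, `count_discarded(flat)` discards all 63 AC coefficients, while `count_discarded(checker)` discards fewer, because the sharp changes put energy into high-frequency coefficients that survive quantization.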
Our working hypothesis is that brush strokes regenerate these unnoticeable sharp changes in the edited blocks. Consequently, 1) edited blocks will concentrate these high frequency coefficients, and 2) as a whole, the unedited blocks will possess fewer of them than the edited blocks. Therefore, during recompression the JPEG compressor will need to discard fewer coefficients to achieve the same compression ratio as the original image (as indicated in Section V, in our experiment we have used Adobe Photoshop default JPEG quality level: Q=12). This effect would be even more prominent when the forger saves the image in a lossless format with the intention of preventing detection by other methods (e.g. double-compression [55][57] [58]). As a consequence, edited blocks will have fewer discarded coefficients.
Note that we are not claiming that brush painting necessarily produces more high frequency coefficients than there are in a natural photo (e.g. there are blurring brushes). What we are hypothesizing is that more coefficients will remain in the tampered area, because during recompression the compressor discards fewer of them.

B. Tools
The tools to search for the abovementioned effect are the following:
1. Counting discarded coefficients. In our preliminary experiments we have observed that brush edited and recompressed blocks (especially those in the borders) keep a larger number of high frequency coefficients, i.e., have fewer discarded coefficients. Therefore, we use this count to gauge the editing probability of each block.
2. Normalized energy. In our preliminary experiments we found that lighting also influences the count of discarded coefficients. In particular, lighted areas (original or tampered) yield a lower count of discarded coefficients. So, before counting discarded coefficients, we need to normalize the energy of the analyzed image to eliminate the bias in the coefficients due to the effect produced by lighting. After normalization, the count of discarded coefficients will not depend on the lighting of the blocks. The following section describes this normalization in more detail.

C. Canceling the Effect of Differences in Lighting
The magnitudes of the DCT coefficients indicate the energy of the block: lighter blocks will have larger DCT coefficients, and so fewer of them will be discarded. We cancel the effect of lighting in the magnitude of the coefficients by means of normalization.
To normalize the energy of the coefficients faster, we avoid converting them to their spatial representation, using the Parseval relationship. This relationship states that the mean energy of the spatial signal $x[m,n]$ is equal to the mean energy of the coefficients in the frequency domain $X[p,q]$. In particular, given an $N \times N$ block, the Parseval relationship states that:

$$\frac{1}{N^2}\sum_{m=0}^{N-1}\sum_{n=0}^{N-1} \left|x[m,n]\right|^2 = \frac{1}{N^2}\sum_{p=0}^{N-1}\sum_{q=0}^{N-1} \left|X[p,q]\right|^2 \tag{1}$$

where $m,n$ index the spatial block, $p,q$ index the coefficients of the corresponding block, and $|\cdot|$ refers to the absolute value of the samples. Note that the spatial samples $x[m,n]$ are integers in the range 0..255, while the coefficients $X[p,q]$ are, in general, complex numbers.
Therefore we accomplish normalization in two steps:
1. Squaring the coefficients to measure energy and eliminate negative values:

$$E[p,q] = \left|X[p,q]\right|^2$$

2. Scaling the $E[p,q]$ values to the JPEG compressor storage range (i.e., 0..255). We can obtain these values using the following formula:

$$\hat{E}[p,q] = 255 \cdot \frac{E[p,q] - \min\{E\}}{\max\{E\} - \min\{E\}}$$

where min{} and max{} refer to the minimum and maximum value in the block.
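A minimal sketch of the two normalization steps follows. It assumes that the scaling is a standard min-max rescaling of the squared coefficients to 0..255, as the reference to min{} and max{} suggests. The function name `normalize_energy` and the handling of a constant block are assumptions of this sketch.

```python
def normalize_energy(coefficients, top=255.0):
    """Square the coefficients, then min-max scale the energies to 0..top.

    coefficients is a list of rows of real DCT coefficients for one block.
    """
    # Step 1: squaring measures energy and removes negative values.
    energy = [[c * c for c in row] for row in coefficients]

    # Step 2: rescale so the smallest energy maps to 0 and the largest to top.
    lowest = min(min(row) for row in energy)
    highest = max(max(row) for row in energy)
    if highest == lowest:
        # Constant block: there is no spread to rescale (assumed convention).
        return [[0.0 for _ in row] for row in energy]
    scale = top / (highest - lowest)
    return [[(e - lowest) * scale for e in row] for row in energy]
```

Because the result depends only on the ratios between energies, a block and a uniformly dimmer copy of it (every coefficient multiplied by the same factor) normalize to the same values. This is the sense in which the normalization cancels a lighting gain.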

D. Filter Algorithm
The proposed filter algorithm is as follows:
1. Divide the image into non-overlapping blocks of 8x8 pixels each. We propose using N=8, as this is the block size that the JPEG encoder typically uses.
2. Calculate the DCT coefficients of each block.
3. Normalize the energy of the blocks as described in Section IV.C.
4. Calculate the forgery evidence for each block from the number of discarded coefficients in the block (i.e., zeroed coefficients). In our implementation, for a given threshold $T$, we calculate the forgery evidence $e$ of each block from its count $c$ of discarded coefficients with the following rule:

$$e = \begin{cases} 1.0 & \text{if } c < T \\ 0.5 & \text{else if } c < 2T \\ 0.0 & \text{otherwise} \end{cases}$$

5. Create the forgery mask representing forgery evidence by assigning a grayscale level to each block. In our implementation a black pixel means definitely edited ($e$ = 1.0), a white pixel unedited ($e$ = 0.0), and a gray pixel that there is doubt ($e$ = 0.5). Fig. 1(d) and Fig. 3 are examples of the result of applying this algorithm. Note that in our implementation the forgery mask uses 3 gray levels to visually show the forgery evidence for each block. It is always possible to increase the number of gray levels, but we believe that, in general, it is difficult for the user to visually interpret more than 3 levels.
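The five steps above can be sketched end to end as follows. This is an illustrative reading of the algorithm, not the authors' implementation. It assumes that the DCT is recomputed from decoded grayscale pixels, that normalization is a min-max rescaling of squared coefficients to 0..255, and that a coefficient counts as discarded when its normalized value rounds to zero. All function names are hypothetical.

```python
import math

N = 8  # block size used by the JPEG encoder


def dct2(block):
    """Step 2: orthonormal 2-D DCT-II of an N x N block (list of rows)."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    def basis(i, k):
        return math.cos((2 * i + 1) * k * math.pi / (2 * N))

    return [[alpha(p) * alpha(q) * sum(block[m][n] * basis(m, p) * basis(n, q)
                                       for m in range(N) for n in range(N))
             for q in range(N)]
            for p in range(N)]


def discarded_count(block):
    """Steps 3 and 4: count normalized coefficients that round to zero."""
    energy = [c * c for row in dct2(block) for c in row]
    lowest, highest = min(energy), max(energy)
    if highest == lowest:
        return len(energy)  # degenerate block (e.g. all black): assumed all discarded
    scale = 255.0 / (highest - lowest)
    return sum(1 for e in energy if round((e - lowest) * scale) == 0)


def evidence(count, threshold):
    """Three-level rule: fewer discarded coefficients means stronger evidence."""
    if count < threshold:
        return 1.0
    if count < 2 * threshold:
        return 0.5
    return 0.0


def forgery_mask(image, threshold):
    """Steps 1 and 5: one evidence value per non-overlapping N x N block."""
    block_rows, block_cols = len(image) // N, len(image[0]) // N
    mask = []
    for br in range(block_rows):
        mask_row = []
        for bc in range(block_cols):
            block = [image[br * N + m][bc * N:(bc + 1) * N] for m in range(N)]
            mask_row.append(evidence(discarded_count(block), threshold))
        mask.append(mask_row)
    return mask
```

To render the mask with the convention of the paper, each evidence value can be mapped to a gray level, for example `int(255 * (1 - e))`, so that 1.0 becomes black and 0.0 becomes white. Partial blocks at the right and bottom borders are ignored in this sketch.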

V. Validation Methodology
This section demonstrates and assesses the proposed filter with different forgery techniques. For this purpose, we have surveyed the ability of the tool to detect intentions according to the purposes described in Section III.B. In addition, we report the sensitivity Se and specificity Sp in the figure of each experiment. Due to space limitations, in this section we only show a representative experiment of each type of analyzed forgery. All reported experiments have been performed with grayscale images. In the case of RGB images, the described procedure can be repeated in each channel of the JPEG image.

A. Detection of Brush Editing
For the reported experiments we have used Adobe Photoshop and done our best to create semantically realistic forgeries without sharp borders or any other forgery sign. We have saved the forged images with the default quality level of Adobe Photoshop (Q=12), assuming that this default value is the most likely to be used by a forger. The figure of each experiment indicates a threshold T manually chosen for the interactive forgery mask.

1) Experiment 1: Added Caption
The first experiment was made by adding a caption with perspective and blended border to the original image in Fig. 2(a). The forged image is shown in Fig. 2(b). Fig. 2(c) shows the forgery mask that the filter produces with threshold T=20 on the original image. The forgery mask in Fig. 2(c) does not indicate signs of forgery in any block (Se = 0/0, Sp = 1.0). Fig. 2(d) shows the forgery mask with T = 20 on the forged image. The signs of forgery are evident, and a human can easily detect the semantic intention. However, the objective metric reports a relatively low sensitivity Se = 0.5622, i.e., the proportion of forged blocks that are correctly identified as such is 56.22%. Fig. 3 shows the result of the analysis of the same image with three different threshold values.

2) Experiment 2: Brush Painting
For the second experiment in Fig. 4 we have used an 80% solid brush to turn the broken lines into a solid line. Note that the forgery mask in Fig. 4 correctly detects the edited blocks without leaving doubt about the forger's intention. In addition, the forgery mask reaches a high objective detection score: Se = 0.7794, Sp = 0.9997.

B. Detection of Other Forgeries
Our experiments have revealed promising results in the detection of other types of forgeries, although without reaching the same level of precision. We present below the results obtained with these other types of forgeries.

1) Experiment 3: Copy-move (Cloning)
The third experiment is for copy-move forgery. Fig. 5(b) shows a copy-moved cat from the original photo in Fig. 5(a). The forgery mask gives some evidence of forgery, but mainly detects forgery in the edges of the forged region. Note that while the forgery mask enables us to perceive the forgery, the objective metric indicates a very low detection rate Se = 0.0319.

2) Experiment 4: Splicing (Composite)
The fourth experiment is for image splicing. In Fig. 6(b) a duck has been added to the lake. The forgery mask in Fig. 6(c) shows some sign of editing in the original image. We have downloaded the original lake image from the Internet, so we do not have access to the original photo. However, we think that the vegetation of the lower right corner has been edited (possibly with a contrast enhancement filter). Also the shadow over the water seems to have been artificially generated.
Regarding the forgery mask in Fig. 6(d), it gives some evidence of forgery, mainly in the edges of the spliced region.

3) Experiment 5: Blurring
For the fifth experiment we have spliced a bottle image in Fig. 7(b), obtaining the forgery mask in Fig. 7(c), which has traces of forgery on the edges. Then we have applied a Gaussian Blur filter with radius 3 to the tampered image in Fig. 7(b), and recalculated the forgery mask in Fig. 7(d). These results show that our method loses its effectiveness when the forger applies a blurring filter to the edited image.

VI. Conclusions and Future Work
We have observed that the recompression of an edited image block leaves a significant amount of undiscarded high frequency coefficients, and we have identified that the compressor is responsible for it. In particular, this effect occurs because the first compression of the original JPEG image removes a large portion of these coefficients, and so the compressor is not as greedy for high frequency coefficients when recompressing. The effect is more noticeable when the forger saves the image in a lossless format, but it is enough if the forger saves the image in a lossy format, as is the case in the reported experiments. The experiments also show that this effect is more prominent with brush-edited images, but that it also allows detecting other forgeries. The experiments also reveal that the clustering of potentially modified blocks in semantically noticeable areas (intentions) has two benefits. 1) It allows the subjective human evaluation to determine forged areas better than a classical objective classification rate does. For example, the forgery mask in the traffic infraction photo in Fig. 4 provides enough evidence to identify an intentionally tampered image. 2) The interactive forgery mask (see Fig. 3) eases the determination of a suitable threshold with the help of a human operator, compared to using an optimal automatic threshold search algorithm (e.g., [54]).
The major limitation of our method is that the forger can easily erase the effect that we search for by blurring the edited region. This means that to detect blurred edited areas, our method should be combined with other methods, such as [18] [59].

A. Future Work
The first item of future work is to assess and compare intention recognition in alternative state-of-the-art methods. The second is to run the implemented method on a standard forgery image database, such as [60] [61].