Image tampering has become a major concern in the digital age, with serious implications for fields such as journalism, forensics and photography. Detecting manipulated images accurately is important for ensuring the authenticity and credibility of visual content. In this paper, we propose an approach to image tampering detection that combines Error Level Analysis with a concatenated ResNet and XceptionNet model, which achieves an accuracy of 98.58%.
I. INTRODUCTION
Image tampering detection requires heavy data processing because image data is large: each image consists of numerous pixels, in contrast to simple numerical or text data. Training conventional models on image data is therefore less promising than deep learning methods. Convolutional neural networks (CNNs), a type of neural network, reduce complexity compared with dense neural networks without compromising accuracy, and both points matter when training a model. When dealing with such heavy data, it is essential to apply image processing techniques that highlight the features most important for training.
Many image processing techniques, such as edge detection, have been used when training deep learning models for tasks like object detection. One such technique is error level analysis (ELA), which has shown considerable promise for detecting image tampering, a task that is becoming increasingly difficult as image editing tools become more widely used. Generally, an unedited image should have a consistent error level across all of its regions; regions whose error level differs can be flagged as probable anomalies. Finding the best combination of deep learning models requires experimentation with different methods. One such method, concatenating different models, has shown the most promising results.
Threats from image tampering are growing as it has become extremely easy to tamper with images in crucial areas such as journalism, where such images can spread rapidly over social media and mislead a mass audience with false information. The main aim of this research is to provide a possible solution for detecting tampered images and thereby mitigating the harm they cause.
II. LITERATURE SURVEY
“Image Processing based on Deep Neural Networks for Detecting Quality Problems in Paper Bag Production” (Syberfeldt et al. 2020) [1]. This paper proposes a deep neural network for detecting quality issues during paper bag production. It highlights certain features of image processing which can be applied before training the model. The trained model can be used for real-time defect detection, providing a powerful tool for quality control and ensuring the consistent production of high-quality paper bags.
“Image Classification Using Deep Neural Network” (Tiwari et al. 2020) [2]. This work uses the VGG16 model to classify living and non-living things and reports an accuracy of 99.89% on the selected dataset.
“Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks” (Ren et al. 2015) [3]. This paper introduces Faster R-CNN, which performs object detection with high accuracy and real-time processing using deep learning. The authors propose Region Proposal Networks (RPNs) for generating region proposals. The RPN and a Fast R-CNN network are combined to form Faster R-CNN, which provides better performance.
“Development of Photo Forensics Algorithm by Detecting Photoshop Manipulation Using Error Level Analysis” (Gunawan et al. 2017) [4]. This paper introduces a photo forensics algorithm that uses Error Level Analysis (ELA) to identify manipulations in images. The algorithm detects tampering by observing how compression levels vary across the image.
“ImageNet Classification with Deep Convolutional Neural Networks” (Krizhevsky et al. 2012) [5]. This paper introduces the AlexNet architecture and demonstrates the significant improvement it achieves on the ImageNet dataset.
“Very Deep Convolutional Networks for Large-Scale Image Recognition” (Simonyan et al. 2014) [6]. This paper introduces the VGG (Visual Geometry Group) architecture and demonstrates the importance of depth in CNNs for image classification.
“Deep Residual Learning for Image Recognition” (He et al. 2016) [7]. This paper introduces the ResNet architecture and demonstrates how residual connections enable the successful training of extremely deep CNNs.
“A Survey on Image Data Augmentation for Deep Learning” (Shorten et al. 2019) [8]. This survey covers methods such as flipping, rotation and scaling that are applied to images to enrich the data fed to models during training, and presents the importance of data augmentation.
“Data Augmentation Generative Adversarial Networks” (Antoniou et al. 2017) [9]. This paper presents a new approach that leverages Generative Adversarial Networks (GANs) for data augmentation. The DAGAN framework generates synthetic images, and training on the augmented data is reported to make models more robust.
III. DATASET
The CASIA dataset is used in this paper. It consists of two sets of images, tampered and non-tampered: 7492 authentic images and 5124 tampered images. To give the models more information during training, the dataset is enlarged through data augmentation with an image data generator, which adds copies of existing images with slight modifications such as rotations and flips.
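The augmentation step described above can be sketched as follows. This is a minimal illustration in NumPy, not the generator used in this work: the function names `augment` and `expand_dataset` are hypothetical, and rotations are restricted to multiples of 90 degrees (on square images) purely to keep the example short.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped and rotated copy of a square H x W x C image."""
    out = image
    if rng.random() < 0.5:
        out = np.flip(out, axis=1)            # horizontal flip
    if rng.random() < 0.5:
        out = np.flip(out, axis=0)            # vertical flip
    k = int(rng.integers(0, 4))               # number of 90-degree rotations
    out = np.rot90(out, k=k, axes=(0, 1))
    return np.ascontiguousarray(out)

def expand_dataset(images, copies_per_image, seed=0):
    """Keep the originals and append modified copies of each image."""
    rng = np.random.default_rng(seed)
    augmented = list(images)
    for image in images:
        for _ in range(copies_per_image):
            augmented.append(augment(image, rng))
    return augmented

# Hypothetical usage: one 4x4 RGB image expanded with three modified copies.
images = [np.arange(48).reshape(4, 4, 3)]
expanded = expand_dataset(images, copies_per_image=3)
```

Each copy contains the same pixels as its source image in a different spatial arrangement, so the class label (tampered or non-tampered) carries over unchanged.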
IV. SYSTEM ARCHITECTURE
In the proposed work, two parallel pipelines were tried that differ in their data preprocessing: Error Level Analysis is applied to the data in one pipeline and not in the other. The next step is data augmentation, which generates additional images to make the models more robust. A series of models was trained on both versions of the dataset: a simple CNN, ResNet, XceptionNet, and a concatenated version of ResNet and XceptionNet. Finally, the results of all configurations were compared.
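The Error Level Analysis preprocessing can be sketched as below, assuming the Pillow library. This is one common formulation of ELA rather than the exact procedure used here: the function name `error_level_analysis` and the recompression quality of 90 are assumptions. The image is re-saved as JPEG, and the per-pixel difference from the original is brightened so that regions with an inconsistent error level stand out.

```python
import io
from PIL import Image, ImageChops, ImageEnhance

def error_level_analysis(image, quality=90):
    """Return an ELA map: the amplified difference between an image
    and a JPEG-recompressed copy of itself."""
    original = image.convert("RGB")

    # Recompress in memory at a known JPEG quality (assumed value).
    buffer = io.BytesIO()
    original.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    resaved = Image.open(buffer).convert("RGB")

    # Absolute per-pixel difference between the two versions.
    diff = ImageChops.difference(original, resaved)

    # Rescale so the largest difference maps to full brightness.
    max_diff = max(channel_max for _, channel_max in diff.getextrema()) or 1
    return ImageEnhance.Brightness(diff).enhance(255.0 / max_diff)

# Hypothetical usage on a synthetic image; a real pipeline would load a file.
sample = Image.new("RGB", (64, 64), (120, 60, 200))
ela_map = error_level_analysis(sample)
```

The resulting map has the same size as the input, so it can be fed to the models in place of the raw image.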
C. Model Training
Several models, both pretrained and non-pretrained, were used in the training process: a simple CNN, ResNet, XceptionNet, and a concatenated ResNet and XceptionNet.
1. Convolutional Neural Network: Convolutional neural networks are well suited to learning spatially invariant, hierarchical features from raw data. A CNN consists of convolutional, pooling and fully connected layers. The convolution operation slides learnable filters, known as kernels, over the input data. When an image is convolved with a filter, the resulting feature map is considerably smaller than the input, keeping the details essential for training while discarding redundant data that would otherwise increase training time. In this research, we used a convolutional neural network with 2 convolutional layers followed by 2 dense (fully connected) layers for classification. We also used max pooling to downsample the feature maps and dropout as regularization to improve model performance.
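The shrinking effect of convolution and pooling can be illustrated with a short NumPy sketch. It is not the network trained in this work: the helper names, the 8x8 input and the fixed edge filter are assumptions chosen for illustration, and the "convolution" is the unflipped sliding-window product that deep learning frameworks use.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one kernel over a 2-D image with stride 1 and no padding."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

image = np.arange(64, dtype=float).reshape(8, 8)   # toy 8x8 "image"
kernel = np.array([[1.0, 0.0, -1.0]] * 3)          # simple vertical-edge filter
features = conv2d_valid(image, kernel)             # 8x8 -> 6x6
pooled = max_pool(features)                        # 6x6 -> 3x3
```

In a trained CNN the kernel values are learned rather than fixed, and many kernels are applied in parallel to produce multiple feature maps.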
2. ResNet: ResNet, short for Residual Neural Network, addresses the challenges of training very deep neural networks. Traditional deep CNNs suffer from the vanishing gradient problem: gradients can diminish exponentially as depth increases, hampering convergence and overall learning. ResNet introduced residual blocks, which use skip connections to mitigate this problem. Let x represent the input to a residual block. The output y of the block is obtained by adding the learned residual mapping F(x) to the input x and then applying a non-linear activation function σ. The equation can be expressed as:

$$y = \sigma(F(x) + x)$$
Because the block only has to learn the residual mapping, the gradient signal can flow directly through the skip connection, which facilitates the training of significantly deeper models. This allows ResNet to be trained much deeper than traditional CNNs, leading to a marked improvement in performance and making residual blocks a fundamental building block of modern deep learning architectures for computer vision and other complex tasks. ResNet50, a 50-layer residual network, was trained on the dataset for comparison purposes. It comprises 48 convolutional layers, 1 max-pooling layer and 1 average-pooling layer.
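A residual block of the form y = σ(F(x) + x) can be sketched in NumPy as follows. For brevity, F is modeled here as two dense layers with a ReLU in between, which is an assumption made for illustration; the blocks in ResNet50 use convolutions and batch normalization instead.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    """Compute y = relu(F(x) + x) with F(x) = relu(x @ w1) @ w2."""
    residual = relu(x @ w1) @ w2      # learned residual mapping F(x)
    return relu(residual + x)         # skip connection adds the input back

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))                  # batch of 4 feature vectors
w1 = rng.normal(scale=0.1, size=(16, 16))
w2 = np.zeros((16, 16))                       # F(x) = 0 for every input
y = residual_block(x, w1, w2)
```

With `w2` set to zero the residual mapping vanishes and the block passes its input straight through the activation. This shows why stacking such blocks does not have to degrade the signal: a block can fall back to a near-identity mapping when that is the best it can do.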
3. XceptionNet: XceptionNet is an extension of the Inception model that introduces depthwise separable convolutions. It splits the standard convolution operation into two steps: depth-wise convolution and point-wise convolution. Depth-wise convolution applies a separate filter to each channel of the input, which reduces the computational cost, whereas point-wise convolution linearly combines the channels produced by the previous step. It performs well in areas including object detection, image generation and semantic segmentation.
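The two steps can be sketched in NumPy as below, together with a parameter count that shows where the savings come from. The function names and the layer sizes (3x3 kernels, 32 input channels, 64 output channels) are assumptions chosen for illustration, and biases are omitted.

```python
import numpy as np

def depthwise_conv(x, depth_kernels):
    """x: (H, W, C); depth_kernels: (k, k, C). One k x k filter per channel."""
    h, w, c = x.shape
    k = depth_kernels.shape[0]
    out = np.empty((h - k + 1, w - k + 1, c))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + k, j:j + k, :]                       # (k, k, C)
            out[i, j, :] = np.sum(patch * depth_kernels, axis=(0, 1))
    return out

def pointwise_conv(x, point_weights):
    """x: (H, W, C_in); point_weights: (C_in, C_out). A 1x1 convolution."""
    return x @ point_weights

def separable_conv(x, depth_kernels, point_weights):
    return pointwise_conv(depthwise_conv(x, depth_kernels), point_weights)

k, c_in, c_out = 3, 32, 64
standard_params = k * k * c_in * c_out            # 18432 weights
separable_params = k * k * c_in + c_in * c_out    # 288 + 2048 = 2336 weights

x = np.ones((8, 8, c_in))
out = separable_conv(x, np.ones((k, k, c_in)), np.ones((c_in, c_out)))
```

For these sizes the separable version needs roughly an eighth of the weights of a standard convolution while producing an output of the same shape.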
4. Concatenated ResNet and XceptionNet: In a concatenated model, each individual model processes the input data independently and produces its own output. These outputs are then combined into a single tensor through concatenation along a specified axis. The idea behind concatenated models is to exploit complementary information captured by different models, which can lead to enhanced feature representation and better generalization. This approach is especially useful in transfer learning scenarios, where pre-trained models specialized in specific tasks can be combined to tackle new, related tasks effectively.
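The concatenation step can be sketched in NumPy as follows. The two branches here are deliberately simple stand-ins (global average pooling followed by a linear projection), not ResNet or XceptionNet, and all names and sizes are hypothetical. The point is only to show how two independently computed feature vectors are joined along the feature axis before a shared classification head.

```python
import numpy as np

def branch_features(images, weights):
    """Stand-in backbone: global-average-pool each image, then project."""
    pooled = images.mean(axis=(1, 2))              # (N, C)
    return np.maximum(pooled @ weights, 0.0)       # (N, D)

def concatenated_forward(images, w_a, w_b, w_head, b_head):
    feats_a = branch_features(images, w_a)         # features from branch A
    feats_b = branch_features(images, w_b)         # features from branch B
    merged = np.concatenate([feats_a, feats_b], axis=-1)   # (N, D_a + D_b)
    logits = merged @ w_head + b_head
    return 1.0 / (1.0 + np.exp(-logits))           # probability of "tampered"

rng = np.random.default_rng(0)
images = rng.random((5, 32, 32, 3))                # batch of 5 RGB images
w_a = rng.normal(size=(3, 8))                      # branch A yields 8 features
w_b = rng.normal(size=(3, 12))                     # branch B yields 12 features
w_head = rng.normal(size=(20, 1))                  # head sees 8 + 12 features
probs = concatenated_forward(images, w_a, w_b, w_head, 0.0)
```

The head's input width equals the sum of the two branch widths, so either branch can be swapped for a different backbone as long as the head is resized to match.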
Conclusion
The combination of Error Level Analysis (ELA) and a concatenated ResNet and XceptionNet architecture shows promising results for image tampering detection. ELA serves as an efficient tool for detecting certain image manipulations by exposing the compression differences that can arise during editing. The concatenated model leverages the features learned by both models, which enhances its ability to detect tampering in images. The highest accuracy achieved is 98.58%.
References
[1] Syberfeldt, A. and Vuoluterä, F., 2020. Image processing based on deep neural networks for detecting quality problems in paper bag production. Procedia CIRP, 93, pp.1224-1229.
[2] Tiwari, V., Pandey, C., Dwivedi, A. and Yadav, V., 2020, December. Image classification using deep neural network. In 2020 2nd International Conference on Advances in Computing, Communication Control and Networking (ICACCCN) (pp. 730-733). IEEE.
[3] Ren, S., He, K., Girshick, R. and Sun, J., 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems, 28.
[4] Gunawan, T.S., Hanafiah, S.A.M., Kartiwi, M., Ismail, N., Za’bah, N.F. and Nordin, A.N., 2017. Development of photo forensics algorithm by detecting photoshop manipulation using error level analysis. Indonesian Journal of Electrical Engineering and Computer Science, 7(1), pp.131-137.
[5] Krizhevsky, A., Sutskever, I. and Hinton, G.E., 2012. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25.
[6] Simonyan, K. and Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
[7] He, K., Zhang, X., Ren, S. and Sun, J., 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
[8] Shorten, C. and Khoshgoftaar, T.M., 2019. A survey on image data augmentation for deep learning. Journal of big data, 6(1), pp.1-48.
[9] Antoniou, A., Storkey, A. and Edwards, H., 2017. Data augmentation generative adversarial networks. arXiv preprint arXiv:1711.04340.