I. Introduction
In recent years, the rapid development of deep neural networks and the widespread availability of large-scale datasets have driven the emergence of many new applications. For example, Generative Adversarial Networks (GANs) [1] can generate fake content that is almost indistinguishable from real content. In particular, because current deep learning-based facial forgery techniques are open source and easy to use, fake videos have surged across the internet, causing significant negative societal impact. To curb the abuse and malicious use of these deepfake technologies, researchers have studied fake video detection extensively and proposed a series of methods to identify such forged content. Afchar et al. designed several small convolutional modules to capture the micro-features of tampered images. Rossler et al. trained the Xception architecture on both full video frames and cropped faces. Nguyen et al. extracted facial features with VGG19 and classified them with a capsule network to distinguish fake images and videos. Mo et al. [2] used high-pass-filtered images together with background information as network inputs. Ding et al. [3] employed transfer learning and fine-tuned ResNet18. Nguyen et al. [4] designed a Y-shaped decoder that combines segmentation and reconstruction losses with classification, using segmentation to assist classification. These detection methods perform well on benchmark datasets. However, existing research indicates that deep learning methods are highly sensitive to the perturbations and processing operations found in real-world scenarios [5]. In particular, such models face several challenges in practical applications:
Strong Contrast Interference: In real-world scenarios, strong contrast between the foreground and background can cause the model to over-attend to background information and neglect forgery details in the foreground. For example, in scenes with complex, high-contrast backgrounds, the model may misclassify forged content as genuine because it relies too heavily on background features and overlooks subtle artifacts in the foreground.
Geometric Distortion: Geometric distortions in images, such as deformation or warping, can impair the model's recognition ability. For instance, when an image is warped by the camera angle or lens distortion, the model may fail to accurately identify forged content, reducing classification accuracy.
Weather Condition Changes: Changes in weather conditions, such as rain or haze, reduce the clarity and contrast of images and videos. Under such low-contrast or blurry conditions, the model may struggle to extract features of forged content, and classification accuracy drops markedly. For example, a fake video captured in the rain may be too blurry for the model to detect forgery traces effectively. A minimal sketch of how these three perturbation types can be simulated is given below.
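To make these robustness concerns concrete, the following PyTorch/torchvision sketch simulates the three perturbation types with standard image transforms. This is our illustrative approximation, not an evaluation protocol from the cited works: ColorJitter, RandomPerspective, and GaussianBlur serve as stand-ins for contrast interference, geometric distortion, and weather degradation, and all parameter values are assumed for demonstration only.

```python
import torch
import torchvision.transforms as T

# Placeholder for a decoded video frame (3 x H x W, values in [0, 1]).
frame = torch.rand(3, 224, 224)

# 1) Strong contrast interference: exaggerate global contrast (fixed 2x factor).
contrast = T.ColorJitter(contrast=(2.0, 2.0))

# 2) Geometric distortion: a perspective warp mimicking camera angle or lens effects.
geometric = T.RandomPerspective(distortion_scale=0.4, p=1.0)

# 3) Weather-like degradation: Gaussian blur plus lowered contrast as a crude
#    stand-in for rain or haze (real weather simulation is more involved).
weather = T.Compose([
    T.GaussianBlur(kernel_size=9, sigma=3.0),
    T.ColorJitter(contrast=(0.5, 0.5)),
])

perturbations = {"contrast": contrast, "geometric": geometric, "weather": weather}
perturbed = {name: tf(frame) for name, tf in perturbations.items()}
# A detector's robustness can then be probed by comparing its predictions on
# each perturbed frame against its prediction on the clean frame.
```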