42,356 results for "image manipulation"
Search Results
2. FastForensics: Efficient Two-Stream Design for Real-Time Image Manipulation Detection
- Author
-
Zhang, Yangxiang, Li, Yuezun, Luo, Ao, Zhou, Jiaran, and Dong, Junyu
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Cryptography and Security - Abstract
With the rise in popularity of portable devices, the spread of falsified media on social platforms has become rampant. This necessitates the timely identification of authentic content. However, most advanced detection methods are computationally heavy, hindering their real-time application. In this paper, we describe an efficient two-stream architecture for real-time image manipulation detection. Our method consists of two-stream branches targeting the cognitive and inspective perspectives. In the cognitive branch, we propose efficient wavelet-guided Transformer blocks to capture the global manipulation traces related to frequency. This block contains an interactive wavelet-guided self-attention module that integrates wavelet transformation with efficient attention design, interacting with the knowledge from the inspective branch. The inspective branch consists of simple convolutions that capture fine-grained traces and interact bidirectionally with Transformer blocks to provide mutual support. Our method is lightweight ($\sim$ 8M) but achieves competitive performance compared to many other counterparts, demonstrating its efficacy in image manipulation detection and its potential for portable integration., Comment: BMVC 2024
- Published
- 2024
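The interaction the abstract describes — a frequency-aware cognitive stream and a convolutional inspective stream exchanging features in both directions — can be sketched in miniature. This is a toy with hypothetical helper names, not the authors' code; the real blocks are wavelet-guided Transformer and convolution modules:

```python
def haar_step(x):
    """One level of a 1-D Haar wavelet transform: (approximation, detail).
    The cognitive branch uses wavelet coefficients like these to expose
    frequency-domain manipulation traces."""
    approx = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]
    return approx, detail

def exchange(cognitive, inspective, alpha=0.5):
    """Bidirectional interaction: each stream mixes in the other's features,
    so global (cognitive) and fine-grained (inspective) cues support each
    other, as in the paper's mutual-support design."""
    new_cog = [c + alpha * i for c, i in zip(cognitive, inspective)]
    new_ins = [i + alpha * c for c, i in zip(cognitive, inspective)]
    return new_cog, new_ins
```

A flat region yields zero Haar detail coefficients, while a splicing boundary shows up as a large detail value — the kind of frequency trace the wavelet-guided attention is designed to pick up.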
3. ForgeryTTT: Zero-Shot Image Manipulation Localization with Test-Time Training
- Author
-
Liu, Weihuang, Shen, Xi, Pun, Chi-Man, and Cun, Xiaodong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Social media is increasingly plagued by realistic fake images, making it hard to trust content. Previous algorithms to detect these fakes often fail in new, real-world scenarios because they are trained on specific datasets. To address the problem, we introduce ForgeryTTT, the first method leveraging test-time training (TTT) to identify manipulated regions in images. The proposed approach fine-tunes the model for each individual test sample, improving its performance. ForgeryTTT first employs vision transformers as a shared image encoder to learn both classification and localization tasks simultaneously during the training-time training using a large synthetic dataset. Precisely, the localization head predicts a mask to highlight manipulated areas. Given such a mask, the input tokens can be divided into manipulated and genuine groups, which are then fed into the classification head to distinguish between manipulated and genuine parts. During test-time training, the predicted mask from the localization head is used for the classification head to update the image encoder for better adaptation. Additionally, using the classical dropout strategy in each token group significantly improves performance and efficiency. We test ForgeryTTT on five standard benchmarks. Despite its simplicity, ForgeryTTT achieves a 20.1% improvement in localization accuracy compared to other zero-shot methods and a 4.3% improvement over non-zero-shot techniques. Our code and data will be released upon publication., Comment: Technical Report
- Published
- 2024
4. Image manipulation localization using reconstruction attention
- Author
-
Meng, Sijiang, Wang, Hongxia, Zhou, Yang, Zeng, Qiang, and Zhang, Rui
- Published
- 2024
- Full Text
- View/download PDF
5. Deformation-Recovery Diffusion Model (DRDM): Instance Deformation for Image Manipulation and Synthesis
- Author
-
Zheng, Jian-Qing, Mo, Yuanhan, Sun, Yang, Li, Jiahua, Wu, Fuping, Wang, Ziyang, Vincent, Tonia, and Papież, Bartłomiej W.
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computational Engineering, Finance, and Science ,Computer Science - Computer Vision and Pattern Recognition - Abstract
In medical imaging, diffusion models have shown great potential in synthetic image generation tasks. However, these models often struggle with the interpretable connections between the generated and existing images and could create illusions. To address these challenges, our research proposes a novel diffusion-based generative model based on deformation diffusion and recovery. This model, named Deformation-Recovery Diffusion Model (DRDM), diverges from traditional score/intensity and latent feature-based approaches, emphasizing morphological changes through deformation fields rather than direct image synthesis. This is achieved by introducing a topology-preserving deformation field generation method, which randomly samples and integrates a set of multi-scale Deformation Vector Fields (DVF). DRDM is trained to recover unreasonable deformation components, thereby restoring each randomly deformed image to a realistic distribution. These innovations facilitate the generation of diverse and anatomically plausible deformations, enhancing data augmentation and synthesis for downstream tasks such as few-shot learning and image registration. Experimental results on cardiac MRI and pulmonary CT show that DRDM is capable of creating diverse, large (over 10\% image size deformation scale), and high-quality (negative rate of the Jacobian matrix's determinant is lower than 1\%) deformation fields. Further experimental results on downstream tasks, 2D image segmentation and 3D image registration, indicate significant improvements resulting from DRDM, showcasing the potential of our model to advance image manipulation and synthesis in medical imaging and beyond. Project page: https://jianqingzheng.github.io/def_diff_rec/
- Published
- 2024
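The "randomly sample and integrate multi-scale Deformation Vector Fields" step has a simple 1-D analogue. This is a toy sketch: the actual DVFs are dense 2-D/3-D fields and the paper's integration preserves topology, which this sum does not guarantee:

```python
import random

def upsample_nearest(field, factor):
    """Nearest-neighbour upsampling of a coarse 1-D vector field to full
    resolution."""
    return [v for v in field for _ in range(factor)]

def sample_multiscale_dvf(length, scales, rng):
    """Sum randomly sampled deformation vector fields across scales: coarser
    scales contribute smooth, large-extent displacements, finer scales add
    local detail."""
    total = [0.0] * length
    for s in scales:
        coarse = [rng.uniform(-1, 1) for _ in range(length // s)]
        total = [a + b for a, b in zip(total, upsample_nearest(coarse, s))]
    return total
```

Summing independent fields at several scales is what gives the sampled deformations both global shape change and local variability.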
6. GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization
- Author
-
Chen, Yirui, Huang, Xudong, Zhang, Quan, Li, Wei, Zhu, Mingjian, Yan, Qiangyu, Li, Simiao, Chen, Hanting, Hu, Hailin, Yang, Jie, Liu, Wei, and Hu, Jie
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The extraordinary ability of generative models has emerged as a new trend in image editing and realistic image generation, posing a serious threat to the trustworthiness of multimedia data and driving research on image manipulation detection and localization (IMDL). However, the lack of a large-scale data foundation makes the IMDL task unattainable. In this paper, a local manipulation pipeline is designed, incorporating the powerful SAM, ChatGPT, and generative models. On this basis, we propose the GIM dataset, which has the following advantages: 1) Large scale, including over one million pairs of AI-manipulated images and real images. 2) Rich image content, encompassing a broad range of image classes. 3) Diverse generative manipulation, with images manipulated by state-of-the-art generators across various manipulation tasks. These advantages allow for a more comprehensive evaluation of IMDL methods, extending their applicability to diverse images. We introduce two benchmark settings to evaluate the generalization capability and comprehensive performance of baseline methods. In addition, we propose a novel IMDL framework, termed GIMFormer, which consists of a ShadowTracer, a Frequency-Spatial Block (FSB), and a Multi-window Anomalous Modelling (MWAM) module. Extensive experiments on GIM demonstrate that GIMFormer significantly surpasses previous state-of-the-art works on both benchmarks., Comment: Code page: https://github.com/chenyirui/GIM
- Published
- 2024
7. IMDL-BenCo: A Comprehensive Benchmark and Codebase for Image Manipulation Detection & Localization
- Author
-
Ma, Xiaochen, Zhu, Xuekang, Su, Lei, Du, Bo, Jiang, Zhuohang, Tong, Bingkui, Lei, Zeyu, Yang, Xinyu, Pun, Chi-Man, Lv, Jiancheng, and Zhou, Jizhe
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
A comprehensive benchmark is yet to be established in the Image Manipulation Detection \& Localization (IMDL) field. The absence of such a benchmark leads to insufficient and misleading model evaluations, severely undermining the development of this field. However, the scarcity of open-sourced baseline models and inconsistent training and evaluation protocols make conducting rigorous experiments and faithful comparisons among IMDL models challenging. To address these challenges, we introduce IMDL-BenCo, the first comprehensive IMDL benchmark and modular codebase. IMDL-BenCo:~\textbf{i)} decomposes the IMDL framework into standardized, reusable components and revises the model construction pipeline, improving coding efficiency and customization flexibility;~\textbf{ii)} fully implements or incorporates training code for state-of-the-art models to establish a comprehensive IMDL benchmark; and~\textbf{iii)} conducts deep analysis based on the established benchmark and codebase, offering new insights into IMDL model architecture, dataset characteristics, and evaluation standards. Specifically, IMDL-BenCo includes common processing algorithms, 8 state-of-the-art IMDL models (1 of which is reproduced from scratch), 2 sets of standard training and evaluation protocols, 15 GPU-accelerated evaluation metrics, and 3 kinds of robustness evaluation. This benchmark and codebase represent a significant leap forward in calibrating the current progress in the IMDL field and inspiring future breakthroughs. Code is available at: https://github.com/scu-zjz/IMDLBenCo, Comment: Technical report
- Published
- 2024
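The "standardized, reusable components" idea usually boils down to a registry that decouples model construction from the training/evaluation loop. A minimal sketch with hypothetical names — this is the common pattern, not the actual IMDL-BenCo API:

```python
MODEL_REGISTRY = {}

def register_model(name):
    """Decorator that registers an IMDL model class under a string key, so
    configs can name models without importing them directly."""
    def wrap(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return wrap

@register_model("toy_detector")
class ToyDetector:
    def predict(self, pixels):
        # Toy heuristic: flag any pixel that differs from its right neighbour.
        return [int(a != b) for a, b in zip(pixels, pixels[1:])] + [0]
```

A benchmark runner can then iterate over `MODEL_REGISTRY.items()` and apply the same metrics and robustness perturbations to every registered model, which is what makes comparisons across models faithful.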
8. EmoEdit: Evoking Emotions through Image Manipulation
- Author
-
Yang, Jingyuan, Feng, Jiawei, Luo, Weibin, Lischinski, Dani, Cohen-Or, Daniel, and Huang, Hui
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Affective Image Manipulation (AIM) seeks to modify user-provided images to evoke specific emotional responses. This task is inherently complex due to its twofold objective: significantly evoking the intended emotion, while preserving the original image composition. Existing AIM methods primarily adjust color and style, often failing to elicit precise and profound emotional shifts. Drawing on psychological insights, we extend AIM by incorporating content modifications to enhance emotional impact. We introduce EmoEdit, a novel two-stage framework comprising emotion attribution and image editing. In the emotion attribution stage, we leverage a Vision-Language Model (VLM) to create hierarchies of semantic factors that represent abstract emotions. In the image editing stage, the VLM identifies the most relevant factors for the provided image, and guides a generative editing model to perform affective modifications. A ranking technique that we developed selects the best edit, balancing between emotion fidelity and structure integrity. To validate EmoEdit, we assembled a dataset of 416 images, categorized into positive, negative, and neutral classes. Our method is evaluated both qualitatively and quantitatively, demonstrating superior performance compared to existing state-of-the-art techniques. Additionally, we showcase EmoEdit's potential in various manipulation tasks, including emotion-oriented and semantics-oriented editing.
- Published
- 2024
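The final ranking step — "balancing between emotion fidelity and structure integrity" — can be sketched as a weighted scoring of candidate edits. The paper's ranker is learned; the scores and weight here are illustrative stand-ins:

```python
def rank_edits(candidates, weight=0.5):
    """Order candidate edits by a weighted sum of an emotion-fidelity score
    and a structure-integrity score (both assumed to lie in [0, 1])."""
    scored = [(weight * c["emotion"] + (1 - weight) * c["structure"], c["name"])
              for c in candidates]
    return [name for _, name in sorted(scored, reverse=True)]
```

Raising `weight` favors edits that strongly evoke the target emotion even at some cost to the original composition; lowering it favors conservative edits.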
9. Generalized Consistency Trajectory Models for Image Manipulation
- Author
-
Kim, Beomsu, Kim, Jaemin, Kim, Jeongsol, and Ye, Jong Chul
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Diffusion models (DMs) excel in unconditional generation, as well as on applications such as image editing and restoration. The success of DMs lies in the iterative nature of diffusion: diffusion breaks down the complex process of mapping noise to data into a sequence of simple denoising tasks. Moreover, we are able to exert fine-grained control over the generation process by injecting guidance terms into each denoising step. However, the iterative process is also computationally intensive, often taking from tens up to thousands of function evaluations. Although consistency trajectory models (CTMs) enable traversal between any time points along the probability flow ODE (PFODE) and score inference with a single function evaluation, CTMs only allow translation from Gaussian noise to data. This work aims to unlock the full potential of CTMs by proposing generalized CTMs (GCTMs), which translate between arbitrary distributions via ODEs. We discuss the design space of GCTMs and demonstrate their efficacy in various image manipulation tasks such as image-to-image translation, restoration, and editing.
- Published
- 2024
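"Translate between arbitrary distributions via ODEs" reduces, in the simplest case, to integrating a velocity field from t=0 to t=1; a consistency model then amortizes this whole loop into a single function evaluation. A scalar Euler sketch — a toy, not the paper's sampler:

```python
def euler_ode(x0, velocity, steps=10):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with Euler steps,
    carrying a sample from the source distribution to the target."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity(x, i * dt)
    return x
```

For the straight-line flow between a source point a and a target point b, the velocity is the constant b − a, and Euler integration lands exactly on b regardless of step count — the many-step loop is what GCTMs collapse into one evaluation.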
10. DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation
- Author
-
Kim, Jeongsol, Park, Geon Yeong, and Ye, Jong Chul
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Reverse sampling and score-distillation have emerged as main workhorses in recent years for image manipulation using latent diffusion models (LDMs). While reverse diffusion sampling often requires adjustments of LDM architecture or feature engineering, score distillation offers a simple yet powerful model-agnostic approach, but it is often prone to mode collapse. To address these limitations and leverage the strengths of both approaches, here we introduce a novel framework called {\em DreamSampler}, which seamlessly integrates these two distinct approaches through the lens of regularized latent optimization. Similar to score-distillation, DreamSampler is a model-agnostic approach applicable to any LDM architecture, but it allows both distillation and reverse sampling with additional guidance for image editing and reconstruction. Through experiments involving image editing, SVG reconstruction, and more, we demonstrate the competitive performance of DreamSampler compared to existing approaches, while providing new applications. Code: https://github.com/DreamSampler/dream-sampler, Comment: ECCV 2024
- Published
- 2024
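"Regularized latent optimization" suggests an update that follows a distillation direction while a penalty tethers the latent to a reverse-sampling anchor. A scalar sketch of that hypothetical form — not the paper's exact objective:

```python
def dream_step(z, score, z_anchor, lam=0.1, lr=0.05):
    """One gradient-descent step on L(z) = -S(z) + lam * (z - z_anchor)^2,
    where score(z) returns dS/dz, the distillation ascent direction, and the
    quadratic term keeps z near the reverse-sampling trajectory."""
    grad = -score(z) + 2 * lam * (z - z_anchor)
    return z - lr * grad
```

With `lam = 0` this is pure score distillation; with a large `lam` it collapses toward plain reverse sampling, which is the trade-off a regularized formulation exposes.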
11. A New Benchmark and Model for Challenging Image Manipulation Detection
- Author
-
Zhang, Zhenfei, Li, Mingyang, and Chang, Ming-Ching
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The ability to detect manipulation in multimedia data is vital in digital forensics. Existing Image Manipulation Detection (IMD) methods are mainly based on detecting anomalous features arising from image editing or double compression artifacts. All existing IMD techniques encounter challenges when it comes to detecting small tampered regions in a large image. Moreover, compression-based IMD approaches face difficulties in cases of double compression with identical quality factors. To investigate the State-of-The-Art (SoTA) IMD methods under those challenging conditions, we introduce a new Challenging Image Manipulation Detection (CIMD) benchmark dataset, which consists of two subsets for evaluating editing-based and compression-based IMD methods, respectively. The dataset images were manually captured and tampered with, and come with high-quality annotations. In addition, we propose a new two-branch network model based on HRNet that can better detect both the image-editing and compression artifacts under those challenging conditions. Extensive experiments on the CIMD benchmark show that our model significantly outperforms SoTA IMD methods on CIMD., Comment: 9 pages, 6 figures, 3 tables. AAAI-24
- Published
- 2023
- Full Text
- View/download PDF
12. PRISM: Progressive Restoration for Scene Graph-based Image Manipulation
- Author
-
Jahoda, Pavel, Farshad, Azade, Yeganeh, Yousef, Adeli, Ehsan, and Navab, Nassir
- Subjects
Computer Science - Machine Learning - Abstract
Scene graphs have emerged as accurate descriptive priors for image generation and manipulation tasks; however, the complexity and diversity of object shapes and relations make it challenging to incorporate scene graphs into models and generate high-quality results. To address these challenges, we propose PRISM, a novel progressive multi-head image manipulation approach to improve the accuracy and quality of the manipulated regions in the scene. Our image manipulation framework is trained using an end-to-end denoising masked reconstruction proxy task, where the masked regions are progressively unmasked from the outer regions to the inner part. We take advantage of the outer part of the masked area as it has a direct correlation with the context of the scene. Moreover, our multi-head architecture simultaneously generates detailed object-specific regions in addition to the entire image to produce higher-quality images. Our model outperforms the state-of-the-art methods in the semantic image manipulation task on the CLEVR and Visual Genome datasets. Our results demonstrate the potential of our approach for enhancing the quality and precision of scene graph-based image manipulation.
- Published
- 2023
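The "progressively unmasked from the outer regions to the inner part" schedule can be sketched in 1-D (a toy; the paper operates on 2-D masked regions):

```python
def unmask_schedule(width, steps):
    """Reveal a masked span of `width` positions outer-ring first: each round
    uncovers the outermost still-masked positions, so context from the
    surrounding scene flows inward toward the center."""
    masked = list(range(width))
    revealed = []
    for _ in range(steps):
        if not masked:
            break
        ring = sorted({masked[0], masked[-1]})
        revealed.append(ring)
        masked = [i for i in masked if i not in ring]
    return revealed
```

Each round conditions the next on freshly revealed context, which is why the outer positions — the ones bordering known scene content — go first.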
13. Frequency-constrained transferable adversarial attack on image manipulation detection and localization
- Author
-
Zeng, Yijia and Pun, Chi-Man
- Published
- 2024
- Full Text
- View/download PDF
14. Shallowfake and deepfake image manipulation localization using noise and RGB-based dual branch method
- Author
-
Dagar, Deepak and Vishwakarma, Dinesh Kumar
- Published
- 2024
- Full Text
- View/download PDF
15. PROMPT-IML: Image Manipulation Localization with Pre-trained Foundation Models Through Prompt Tuning
- Author
-
Liu, Xuntao, Yang, Yuzhou, Ying, Qichao, Qian, Zhenxing, Zhang, Xinpeng, and Li, Sheng
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Deceptive images can be shared in seconds with social networking services, posing substantial risks. Tampering traces, such as boundary artifacts and high-frequency information, have been significantly emphasized by massive networks in the Image Manipulation Localization (IML) field. However, these traces are vulnerable to image post-processing operations, which limits the generalization and robustness of existing methods. We present a novel Prompt-IML framework. We observe that humans tend to discern the authenticity of an image based on both semantic and high-frequency information; inspired by this, the proposed framework leverages rich semantic knowledge from pre-trained visual foundation models to assist IML. We are the first to design a framework that utilizes visual foundation models specifically for the IML task. Moreover, we design a Feature Alignment and Fusion module to align and fuse semantic features with high-frequency features, aiming to locate tampered regions from multiple perspectives. Experimental results demonstrate that our model achieves better performance on eight typical fake image datasets, along with outstanding robustness., Comment: Under Review
- Published
- 2023
16. RB-Net: integrating region and boundary features for image manipulation localization
- Author
-
Xu, Dengyun, Shen, Xuanjing, Huang, Yongping, and Shi, Zenan
- Published
- 2023
- Full Text
- View/download PDF
17. CIMGEN: Controlled Image Manipulation by Finetuning Pretrained Generative Models on Limited Data
- Author
-
Gudavalli, Chandrakanth, Rosten, Erik, Nataraj, Lakshmanan, Chandrasekaran, Shivkumar, and Manjunath, B. S.
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
Content creation and image editing can benefit from flexible user controls. A common intermediate representation for conditional image generation is a semantic map, which carries information about the objects present in the image. Compared to raw RGB pixels, modifying the semantic map is much easier: one can take a semantic map and selectively insert, remove, or replace objects in it. The method proposed in this paper takes in the modified semantic map and alters the original image in accordance with it. The method leverages traditional pre-trained image-to-image translation GANs, such as CycleGAN or Pix2Pix GAN, that are fine-tuned on a limited dataset of reference images associated with the semantic maps. We discuss the qualitative and quantitative performance of our technique to illustrate its capacity and possible applications in the fields of image forgery and image editing. We also demonstrate the effectiveness of the proposed image forgery technique in thwarting numerous deep learning-based image forensic techniques, highlighting the urgent need to develop robust and generalizable image forensic tools in the fight against the spread of fake media.
- Published
- 2024
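The key convenience the abstract points to — editing the semantic map instead of RGB pixels — is just a label substitution on a 2-D grid; the fine-tuned image-to-image GAN then renders the edited map back to an image. A minimal sketch:

```python
def edit_semantic_map(sem_map, old_label, new_label):
    """Selectively replace one object class in a semantic map (a grid of
    integer class labels), e.g. relabeling every 'car' pixel as 'road' to
    remove the car before re-rendering the image."""
    return [[new_label if v == old_label else v for v in row] for row in sem_map]
```

Insertion and removal work the same way: paint a region of the grid with the desired class id, then let the translation GAN synthesize consistent pixels.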
18. Key-point Guided Deformable Image Manipulation Using Diffusion Model
- Author
-
Oh, Seok-Hwan, Jung, Guil, Kim, Myeong-Gee, Kim, Sang-Yun, Kim, Young-Min, Lee, Hyeon-Jik, Kwon, Hyuk-Sool, and Bae, Hyeon-Min
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In this paper, we introduce a Key-point-guided Diffusion probabilistic Model (KDM) that gains precise control over images by manipulating the object's key-points. We propose a two-stage generative model incorporating an optical flow map as an intermediate output. By doing so, a dense pixel-wise understanding of the semantic relation between the image and sparse key points is configured, leading to more realistic image generation. Additionally, the integration of optical flow helps regulate the inter-frame variance of sequential images, enabling authentic sequential image generation. The KDM is evaluated on diverse key-point-conditioned image synthesis tasks, including facial image generation, human pose synthesis, and echocardiography video prediction, demonstrating that the KDM produces consistency-enhanced and photo-realistic images compared with state-of-the-art models., Comment: 24 pages
- Published
- 2024
19. MoEController: Instruction-based Arbitrary Image Manipulation with Mixture-of-Expert Controllers
- Author
-
Li, Sijia, Chen, Chen, and Lu, Haonan
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Computation and Language - Abstract
Diffusion-model-based text-guided image generation has recently made astounding progress, producing fascinating results in open-domain image manipulation tasks. Few models, however, currently have complete zero-shot capabilities for both global and local image editing due to the complexity and diversity of image manipulation tasks. In this work, we propose a method with mixture-of-expert (MOE) controllers to align the text-guided capacity of diffusion models with different kinds of human instructions, enabling our model to handle various open-domain image manipulation tasks with natural language instructions. First, we use large language models (ChatGPT) and conditional image synthesis models (ControlNet) to generate a large-scale global image transfer dataset in addition to the instruction-based local image editing dataset. Then, using an MOE technique and task-specific adaptation training on a large-scale dataset, our conditional diffusion model can edit images both globally and locally. Extensive experiments demonstrate that our approach performs surprisingly well on various image manipulation tasks when dealing with open-domain images and arbitrary human instructions. Please refer to our project page: [https://oppo-mente-lab.github.io/moe_controller/], Comment: 6 pages,6 figures
- Published
- 2023
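The mixture-of-expert routing can be sketched as a gate producing per-expert weights and a weighted combination of the experts' outputs. The interface below is illustrative — the paper's gate and expert controllers are learned networks:

```python
def moe_route(instruction, experts, gate):
    """Blend expert controllers: `gate` maps the instruction to one weight per
    expert (e.g. a global-transfer specialist vs. a local-edit specialist),
    and the result is the weighted sum of the experts' outputs."""
    weights = gate(instruction)
    return sum(w * e(instruction) for w, e in zip(weights, experts))
```

A learned gate that puts most weight on the matching specialist is what lets one model serve both global style transfer and local object edits.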
20. Towards Generic Image Manipulation Detection with Weakly-Supervised Self-Consistency Learning
- Author
-
Zhai, Yuanhao, Luan, Tianyu, Doermann, David, and Yuan, Junsong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
As advanced image manipulation techniques emerge, detecting manipulation becomes increasingly important. Despite the success of recent learning-based approaches for image manipulation detection, they typically require expensive pixel-level annotations to train, while exhibiting degraded performance when tested on images manipulated differently from the training images. To address these limitations, we propose weakly-supervised image manipulation detection, such that only binary image-level labels (authentic or tampered with) are required for training. Such a weakly-supervised setting can leverage more training images and has the potential to adapt quickly to new manipulation techniques. To improve the generalization ability, we propose weakly-supervised self-consistency learning (WSCL) to leverage the weakly annotated images. Specifically, two consistency properties are learned: multi-source consistency (MSC) and inter-patch consistency (IPC). MSC exploits different content-agnostic information and enables cross-source learning via an online pseudo-label generation and refinement process. IPC performs global pair-wise patch-patch relationship reasoning to discover a complete region of manipulation. Extensive experiments validate that our WSCL, even though it is weakly supervised, exhibits competitive performance compared with its fully-supervised counterpart under both in-distribution and out-of-distribution evaluations, as well as reasonable manipulation localization ability., Comment: Accepted to ICCV 2023, code: https://github.com/yhZhai/WSCL
- Published
- 2023
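Inter-patch consistency (IPC) reasons over pairwise patch-to-patch agreement to grow a complete manipulated region from a weak image-level label. A toy similarity matrix over scalar patch features — illustrative only, the paper's features are learned:

```python
def ipc_matrix(patch_feats):
    """Pairwise consistency between patches: 1.0 for identical features,
    decaying toward 0 as features diverge. A row with low consistency against
    most others hints at a patch drawn from a different source image."""
    return [[1.0 / (1.0 + abs(a - b)) for b in patch_feats] for a in patch_feats]
```

Thresholding such a matrix groups mutually consistent patches, which is how a coarse, fully connected view can recover a complete tampered region rather than scattered fragments.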
21. ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation
- Author
-
Sun, Yasheng, Yang, Yifan, Peng, Houwen, Shen, Yifei, Yang, Yuqing, Hu, Han, Qiu, Lili, and Koike, Hideki
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
While language-guided image manipulation has made remarkable progress, the challenge of how to instruct the manipulation process faithfully reflecting human intentions persists. An accurate and comprehensive description of a manipulation task using natural language is laborious and sometimes even impossible, primarily due to the inherent uncertainty and ambiguity present in linguistic expressions. Is it feasible to accomplish image manipulation without resorting to external cross-modal language information? If this possibility exists, the inherent modality gap would be effortlessly eliminated. In this paper, we propose a novel manipulation methodology, dubbed ImageBrush, that learns visual instructions for more accurate image editing. Our key idea is to employ a pair of transformation images as visual instructions, which not only precisely captures human intention but also facilitates accessibility in real-world scenarios. Capturing visual instructions is particularly challenging because it involves extracting the underlying intentions solely from visual demonstrations and then applying this operation to a new image. To address this challenge, we formulate visual instruction learning as a diffusion-based inpainting problem, where the contextual information is fully exploited through an iterative process of generation. A visual prompting encoder is carefully devised to enhance the model's capacity in uncovering human intent behind the visual instructions. Extensive experiments show that our method generates engaging manipulation results conforming to the transformations entailed in demonstrations. Moreover, our model exhibits robust generalization capabilities on various downstream tasks such as pose transfer, image translation and video inpainting.
- Published
- 2023
22. CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation
- Author
-
Xu, Sihan, Ma, Ziqiao, Huang, Yidong, Lee, Honglak, and Chai, Joyce
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Diffusion models (DMs) have enabled breakthroughs in image synthesis tasks but lack an intuitive interface for consistent image-to-image (I2I) translation. Various methods have been explored to address this issue, including mask-based methods, attention-based methods, and image-conditioning. However, it remains a critical challenge to enable unpaired I2I translation with pre-trained DMs while maintaining satisfying consistency. This paper introduces Cyclenet, a novel but simple method that incorporates cycle consistency into DMs to regularize image manipulation. We validate Cyclenet on unpaired I2I tasks of different granularities. Besides the scene and object level translation, we additionally contribute a multi-domain I2I translation dataset to study the physical state changes of objects. Our empirical studies show that Cyclenet is superior in translation consistency and quality, and can generate high-quality images for out-of-domain distributions with a simple change of the textual prompt. Cyclenet is a practical framework, which is robust even with very limited training data (around 2k) and requires minimal computational resources (1 GPU) to train. Project homepage: https://cyclenetweb.github.io/, Comment: NeurIPS 2023
- Published
- 2023
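The cycle-consistency regularizer at the heart of the method penalizes the round trip A→B→A for drifting from the input. A scalar sketch with toy translators standing in for the diffusion model:

```python
def cycle_loss(x, a2b, b2a):
    """Cycle-consistency penalty: translate a sample to the other domain and
    back, and measure the drift from the original."""
    return abs(b2a(a2b(x)) - x)
```

A pair of translators that are exact inverses incurs zero penalty; any drift in the round trip is what the regularizer pushes the diffusion model to remove.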
23. Perceptual MAE for Image Manipulation Localization: A High-level Vision Learner Focusing on Low-level Features
- Author
-
Ma, Xiaochen, Zhou, Jizhe, Xu, Xiong, Jiang, Zhuohang, and Pun, Chi-Man
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Nowadays, multimedia forensics faces unprecedented challenges due to the rapid advancement of multimedia generation technology, thereby making Image Manipulation Localization (IML) crucial in the pursuit of truth. The key to IML lies in revealing the artifacts or inconsistencies between the tampered and authentic areas, which are evident under pixel-level features. Consequently, existing studies treat IML as a low-level vision task, focusing on allocating tampered masks by crafting pixel-level features such as image RGB noises, edge signals, or high-frequency features. However, in practice, tampering commonly occurs at the object level, and different classes of objects have varying likelihoods of becoming targets of tampering. Therefore, object semantics are also vital in identifying the tampered areas in addition to pixel-level features. This necessitates IML models to carry out a semantic understanding of the entire image. In this paper, we reformulate the IML task as a high-level vision task that greatly benefits from low-level features. Based on such an interpretation, we propose a method to enhance the Masked Autoencoder (MAE) by incorporating high-resolution inputs and a perceptual loss supervision module, which is termed Perceptual MAE (PMAE). While MAE has demonstrated an impressive understanding of object semantics, PMAE can also compensate for low-level semantics with our proposed enhancements. Evidenced by extensive experiments, this paradigm effectively unites the low-level and high-level features of the IML task and outperforms state-of-the-art tampering localization methods on all five publicly available datasets.
- Published
- 2023
24. Pixel-Inconsistency Modeling for Image Manipulation Localization
- Author
-
Kong, Chenqi, Luo, Anwei, Wang, Shiqi, Li, Haoliang, Rocha, Anderson, and Kot, Alex C.
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Computer Vision and Pattern Recognition ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
Digital image forensics plays a crucial role in image authentication and manipulation localization. Despite the progress powered by deep neural networks, existing forgery localization methodologies exhibit limitations when deployed to unseen datasets and perturbed images (i.e., lack of generalization and robustness to real-world applications). To circumvent these problems and aid image integrity, this paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts. The rationale is grounded on the observation that most image signal processors (ISP) involve the demosaicing process, which introduces pixel correlations in pristine images. Moreover, manipulation operations, including splicing, copy-move, and inpainting, directly affect such pixel regularity. We, therefore, first split the input image into several blocks and design masked self-attention mechanisms to model the global pixel dependency in input images. Simultaneously, we optimize another local pixel dependency stream to mine local manipulation clues within input forgery images. In addition, we design novel Learning-to-Weight Modules (LWM) to combine features from the two streams, thereby enhancing the final forgery localization performance. To improve the training process, we propose a novel Pixel-Inconsistency Data Augmentation (PIDA) strategy, driving the model to focus on capturing inherent pixel-level artifacts instead of mining semantic forgery traces. This work establishes a comprehensive benchmark integrating 15 representative detection models across 12 datasets. Extensive experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints and achieves state-of-the-art generalization and robustness performance in image manipulation localization.
- Published
- 2023
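The demosaicing correlation the abstract relies on can be made concrete with a toy sketch: in a pristine image each pixel is well predicted by its neighbours, while spliced-in content breaks that regularity. This is only an illustrative stand-in (a hand-written linear predictor and block scores, not the paper's masked self-attention model); the function names and test image are invented for the demo.

```python
import numpy as np

def neighbor_residual(img):
    """Residual between each pixel and the mean of its 4 neighbours.

    Demosaicing interpolates missing colour samples from neighbours, so
    pristine regions are well predicted and leave a small residual, while
    spliced-in content from a different source tends to break that
    regularity. (Hand-written stand-in for the paper's learned model.)
    """
    pred = 0.25 * (np.roll(img, 1, 0) + np.roll(img, -1, 0)
                   + np.roll(img, 1, 1) + np.roll(img, -1, 1))
    return np.abs(img - pred)

def blockwise_score(residual, block=8):
    """Mean residual per block: a crude per-block inconsistency score."""
    h, w = residual.shape
    h, w = h - h % block, w - w % block
    r = residual[:h, :w].reshape(h // block, block, w // block, block)
    return r.mean(axis=(1, 3))

# Smooth, neighbour-predictable "pristine" background with a pasted patch.
rng = np.random.default_rng(0)
xx, yy = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 64))
img = 0.5 * xx + 0.5 * yy                  # linear ramp: perfectly predictable
img[16:32, 16:32] = rng.random((16, 16))   # "spliced" region breaks correlation
score = blockwise_score(neighbor_residual(img))
print(score[2:4, 2:4].mean() > 100 * score[5, 5])  # spliced blocks stand out: True
```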
25. Pre-training-free Image Manipulation Localization through Non-Mutually Exclusive Contrastive Learning
- Author
-
Zhou, Jizhe, Ma, Xiaochen, Du, Xia, Alhammadi, Ahmed Y., and Feng, Wentao
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Deep Image Manipulation Localization (IML) models suffer from training data insufficiency and thus heavily rely on pre-training. We argue that contrastive learning is better suited to tackling the data insufficiency problem for IML. Crafting mutually exclusive positives and negatives is the prerequisite for contrastive learning. However, when adopting contrastive learning in IML, we encounter three categories of image patches: tampered, authentic, and contour patches. Tampered and authentic patches are naturally mutually exclusive, but contour patches, which contain both tampered and authentic pixels, are non-mutually exclusive to them. Simply discarding these contour patches results in a drastic performance loss, since contour patches are decisive for the learning outcome. Hence, we propose the Non-mutually exclusive Contrastive Learning (NCL) framework to rescue conventional contrastive learning from the above dilemma. In NCL, to cope with this non-mutual exclusivity, we first establish a pivot structure with dual branches to constantly switch the role of contour patches between positives and negatives during training. Then, we devise a pivot-consistent loss to avoid spatial corruption caused by the role-switching process. In this manner, NCL both inherits the self-supervised merits to address the data insufficiency and retains high manipulation localization accuracy. Extensive experiments verify that our NCL achieves state-of-the-art performance on all five benchmarks without any pre-training and is more robust on unseen real-life samples. The code is available at: https://github.com/Knightzjz/NCL-IML., Comment: Tech report. ICCV2023 paper
- Published
- 2023
26. TrainFors: A Large Benchmark Training Dataset for Image Manipulation Detection and Localization
- Author
-
Nandi, Soumyaroop, Natarajan, Prem, and Abd-Almageed, Wael
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The evaluation datasets and metrics for image manipulation detection and localization (IMDL) research have been standardized, but the training dataset for the task remains nonstandard. Previous researchers have used unconventional and divergent datasets to train neural networks for detecting image forgeries and localizing pixel maps of manipulated regions. For a fair comparison, the training set, test set, and evaluation metrics should be consistent. Hence, comparisons among existing methods may be unfair, as the results depend heavily on the training datasets as well as the model architectures. Moreover, none of the previous works releases the synthetic training dataset used for the IMDL task. We propose a standardized benchmark training dataset for image splicing, copy-move forgery, removal forgery, and image enhancement forgery. Furthermore, we identify the problems with the existing IMDL datasets and propose the required modifications. We also train the state-of-the-art IMDL methods on our proposed TrainFors dataset for a fair evaluation and report the actual performance of these methods under similar conditions.
- Published
- 2023
27. TolerantGAN: Text-Guided Image Manipulation Tolerant to Real-World Image
- Author
-
Yuto Watanabe, Ren Togo, Keisuke Maeda, Takahiro Ogawa, and Miki Haseyama
- Subjects
Text-guided image manipulation ,generative adversarial network ,manipulation direction ,out-of-domain data ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Although text-guided image manipulation approaches have demonstrated highly accurate performance for editing the appearance of images in virtual or simple scenarios, their real-world applications face significant challenges. The primary cause of these challenges is the misalignment between the distributions of training and real-world data, which leads to unstable text-guided image manipulation. In this work, we propose a novel framework called TolerantGAN and tackle the new task of real-world text-guided image manipulation independent of the training data. To achieve this, we introduce two key components: a border smoothly connection module (BSCM) and a manipulation direction-based attention module (MDAM). BSCM smooths over the misalignment between the distributions of training and real-world data. MDAM extracts only the regions highly relevant to image manipulation and assists in reconstructing objects unobserved in the training data. For in-the-wild input images of various classes, TolerantGAN robustly outperforms the state-of-the-art methods.
- Published
- 2024
- Full Text
- View/download PDF
28. Manipulation Mask Generator: High-Quality Image Manipulation Mask Generation Method Based on Modified Total Variation Noise Reduction
- Author
-
Yang, Xinyu and Zhou, Jizhe
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In artificial intelligence, any model that aims to achieve good results depends on a large amount of high-quality data. This is especially true in the field of tamper detection. This paper proposes a modified total variation noise reduction method to acquire high-quality tampered images. We automatically crawl original and tampered images from the Baidu PS Bar, a website where users post countless tampered images. Subtracting the tampered image from the original highlights the tampered area. However, the resulting difference image also contains substantial noise, so these images cannot be used directly in a deep learning model. Our modified total variation noise reduction method is aimed at solving this problem. Because text strokes are thin, text information is easily lost during morphological opening and closing operations, so we use MSER (Maximally Stable Extremal Regions) and NMS (Non-Maximum Suppression) to extract text information. We then apply the modified total variation noise reduction to the difference image. Finally, by adding the denoised image and the extracted text information, we obtain an image with little noise that largely retains the text information. Datasets generated in this way can be used in deep learning models and help them achieve better results.
- Published
- 2023
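The core smoothing step can be sketched with plain gradient-descent total-variation (ROF-style) denoising on a synthetic difference image. This is a minimal sketch, not the paper's modified variant, and the MSER/NMS text-recovery steps are omitted (OpenCV's `cv2.MSER_create()` would be one way to add them):

```python
import numpy as np

def tv_denoise(x, weight=0.15, iters=100, step=0.1):
    """Gradient-descent total-variation denoising (plain ROF sketch).

    Minimises weight * TV(u) + 0.5 * ||u - x||^2 by explicit gradient
    descent. This is the textbook formulation, not the paper's modified
    variant.
    """
    u = x.copy()
    for _ in range(iters):
        # forward differences of u
        ux = np.roll(u, -1, 1) - u
        uy = np.roll(u, -1, 0) - u
        norm = np.sqrt(ux ** 2 + uy ** 2 + 1e-8)
        # divergence of the normalised gradient field
        div = (ux / norm - np.roll(ux / norm, 1, 1)
               + uy / norm - np.roll(uy / norm, 1, 0))
        u += step * (weight * div - (u - x))
    return u

# Synthetic "difference image": a tampered-area print plus subtraction noise.
rng = np.random.default_rng(1)
clean = np.zeros((48, 48))
clean[12:36, 12:36] = 1.0
noisy = clean + 0.3 * rng.standard_normal(clean.shape)
den = tv_denoise(noisy)
err_before = np.abs(noisy - clean).mean()
err_after = np.abs(den - clean).mean()
print(err_after < err_before)  # TV smoothing suppresses the noise: True
```

TV denoising is a natural fit here because it flattens noise in smooth regions while keeping the sharp boundary of the tampered area, which is exactly what a mask needs.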
29. Patternshop: Editing Point Patterns by Image Manipulation
- Author
-
Huang, Xingchang, Ritschel, Tobias, Seidel, Hans-Peter, Memari, Pooran, and Singh, Gurprit
- Subjects
Computer Science - Graphics - Abstract
Point patterns are characterized by their density and correlation. While spatial variation of density is well understood, analysis and synthesis of spatially-varying correlation remain an open challenge. No tools are available to intuitively edit such point patterns, primarily due to the lack of a compact representation for spatially-varying correlation. We propose a low-dimensional perceptual embedding for point correlations. This embedding can map point patterns to common three-channel raster images, enabling manipulation with off-the-shelf image editing software. To synthesize point patterns back from the edited images, we propose a novel edge-aware objective that carefully handles sharp variations in density and correlation. The resulting framework allows intuitive and backward-compatible manipulation of point patterns, such as recoloring and relighting, and even texture synthesis, which have not previously been available to 2D point pattern design. The effectiveness of our approach is validated in several user experiments.
- Published
- 2023
- Full Text
- View/download PDF
30. Not All Steps are Created Equal: Selective Diffusion Distillation for Image Manipulation
- Author
-
Wang, Luozhou, Yang, Shuai, Liu, Shu, and Chen, Ying-cong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Conditional diffusion models have demonstrated impressive performance in image manipulation tasks. The general pipeline involves adding noise to the image and then denoising it. However, this method faces a trade-off problem: adding too much noise affects the fidelity of the image while adding too little affects its editability. This largely limits their practical applicability. In this paper, we propose a novel framework, Selective Diffusion Distillation (SDD), that ensures both the fidelity and editability of images. Instead of directly editing images with a diffusion model, we train a feedforward image manipulation network under the guidance of the diffusion model. Besides, we propose an effective indicator to select the semantic-related timestep to obtain the correct semantic guidance from the diffusion model. This approach successfully avoids the dilemma caused by the diffusion process. Our extensive experiments demonstrate the advantages of our framework. Code is released at https://github.com/AndysonYs/Selective-Diffusion-Distillation.
- Published
- 2023
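The fidelity-editability trade-off comes from how much forward-diffusion noise is added before editing. A small sketch of q(x_t | x_0) under the standard DDPM linear beta schedule (conventional defaults, not values from the paper; the paper's timestep-selection indicator is not reproduced) shows how larger timesteps destroy more of the original image:

```python
import numpy as np

# Standard DDPM linear beta schedule (conventional defaults, not the
# paper's values). alpha_bar[t] is the surviving fraction of x_0 in x_t.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def noise_image(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) x_0, (1 - a_bar_t) I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def snr(t):
    """Signal-to-noise ratio of x_t; monotonically decreasing in t."""
    return alpha_bar[t] / (1.0 - alpha_bar[t])

rng = np.random.default_rng(0)
x0 = rng.standard_normal((16, 16))
for t in (50, 500, 950):
    xt = noise_image(x0, t, rng)
    corr = np.corrcoef(x0.ravel(), xt.ravel())[0, 1]
    print(f"t={t:4d}  snr={snr(t):10.4f}  corr(x0, x_t)={corr:+.2f}")
```

Small t keeps x_t close to x_0 (high fidelity, little room to edit); large t leaves mostly noise (editable, but the original content is gone). Per the abstract, SDD's indicator selects a semantically relevant timestep instead of sweeping this whole range.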
31. Image Manipulation via Multi-Hop Instructions -- A New Dataset and Weakly-Supervised Neuro-Symbolic Approach
- Author
-
Singh, Harman, Garg, Poorva, Gupta, Mohit, Shah, Kevin, Goswami, Ashish, Modi, Satyam, Mondal, Arnab Kumar, Khandelwal, Dinesh, Garg, Dinesh, and Singla, Parag
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
We are interested in image manipulation via natural language text -- a task that is useful for multiple AI applications but requires complex reasoning over multi-modal spaces. We extend the recently proposed Neuro-Symbolic Concept Learning (NSCL) framework, which has been quite effective for Visual Question Answering (VQA), to the task of image manipulation. Our system, referred to as NeuroSIM, can perform complex multi-hop reasoning over multi-object scenes and requires only weak supervision in the form of annotated data for VQA. NeuroSIM parses an instruction into a symbolic program, based on a Domain Specific Language (DSL) comprising object attributes and manipulation operations, that guides its execution. We create a new dataset for the task, and extensive experiments demonstrate that NeuroSIM is highly competitive with or beats SOTA baselines that make use of supervised data for manipulation., Comment: EMNLP 2023 (long paper, main conference)
- Published
- 2023
32. High-Precision Heterogeneous Satellite Image Manipulation Localization: Feature Point Rules and Semantic Similarity Measurement
- Author
-
Ruijie Wu, Wei Guo, Yi Liu, and Chenhao Sun
- Subjects
image manipulation localization ,change detection ,heterogeneous satellite images ,feature point ,Science - Abstract
Misusing image tampering software makes it easier to manipulate satellite images, leading to a crisis of trust and security concerns in society. This study compares the inconsistencies between heterogeneous images to locate tampered areas and proposes a high-precision heterogeneous satellite image manipulation localization (HSIML) framework to distinguish tampering from both real land-cover changes, such as artificial construction, and pseudo-changes, such as seasonal variation. The model operates at the patch level and comprises three modules: the heterogeneous image preprocessing module aligns heterogeneous images and filters noisy data; the feature point constraint module mitigates the effects of lighting and seasonal variations by performing feature point matching and applies filtering rules as an initial screening for candidate tampered patches; and the semantic similarity measurement module designs a classification network to assess remote sensing (RS) image feature saliency, determines image consistency based on the similarity of semantic features, and implements image manipulation localization (IML) using predefined classification rules. Additionally, a dataset for IML is constructed from satellite images. Extensive experiments against existing SOTA models demonstrate that our method achieves the highest F1 score in both localization accuracy and robustness tests and demonstrates the capability to handle large-scale areas.
- Published
- 2024
- Full Text
- View/download PDF
33. DRAW: Defending Camera-shooted RAW against Image Manipulation
- Author
-
Hu, Xiaoxiao, Ying, Qichao, Qian, Zhenxing, Li, Sheng, and Zhang, Xinpeng
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Multimedia ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
RAW files are the initial measurement of scene radiance widely used in most cameras, and the ubiquitous RGB images are converted from RAW data through Image Signal Processing (ISP) pipelines. Nowadays, digital images are at risk of being nefariously manipulated. Inspired by the fact that innate immunity is the first line of body defense, we propose DRAW, a novel scheme that defends images against manipulation by protecting their sources, i.e., camera-shot RAWs. Specifically, we design a lightweight Multi-frequency Partial Fusion Network (MPF-Net), friendly to devices with limited computing resources, through frequency learning and partial feature fusion. It introduces invisible watermarks as a protective signal into the RAW data. The protection not only transfers to the rendered RGB images regardless of the applied ISP pipeline, but is also resilient to post-processing operations such as blurring or compression. Once the image is manipulated, we can accurately identify the forged areas with a localization network. Extensive experiments on several well-known RAW datasets, e.g., RAISE, FiveK and SIDD, indicate the effectiveness of our method. We hope that this technique can be used in future cameras as an option for image protection, which could effectively restrict image manipulation at the source., Comment: To appear in ICCV 2023. The leading two authors contribute equally
- Published
- 2023
34. IML-ViT: Benchmarking Image Manipulation Localization by Vision Transformer
- Author
-
Ma, Xiaochen, Du, Bo, Jiang, Zhuohang, Hammadi, Ahmed Y. Al, and Zhou, Jizhe
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Advanced image tampering techniques are increasingly challenging the trustworthiness of multimedia, leading to the development of Image Manipulation Localization (IML). But what makes a good IML model? The answer lies in the way artifacts are captured. Exploiting artifacts requires the model to extract non-semantic discrepancies between manipulated and authentic regions, necessitating explicit comparisons between the two areas. With its self-attention mechanism, the Transformer is naturally a better candidate for capturing artifacts. However, due to limited datasets, there is currently no pure ViT-based approach for IML to serve as a benchmark, and CNNs dominate the entire task. Nevertheless, CNNs suffer from weak long-range and non-semantic modeling. To bridge this gap, based on the fact that artifacts are sensitive to image resolution, amplified under multi-scale features, and massive at the manipulation border, we formulate the answer to the former question as building a ViT with high-resolution capacity, multi-scale feature extraction capability, and manipulation edge supervision that can converge with a small amount of data. We term this simple but effective ViT paradigm IML-ViT, which has significant potential to become a new benchmark for IML. Extensive experiments on five benchmark datasets verify that our model outperforms the state-of-the-art manipulation localization methods. Code and models are available at \url{https://github.com/SunnyHaze/IML-ViT}.
- Published
- 2023
35. DiffuseGAE: Controllable and High-fidelity Image Manipulation from Disentangled Representation
- Author
-
Leng, Yipeng, Huang, Qiangjuan, Wang, Zhiyuan, Liu, Yangyang, and Zhang, Haoyu
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Diffusion probabilistic models (DPMs) have shown remarkable results on various image synthesis tasks such as text-to-image generation and image inpainting. However, compared to other generative methods like VAEs and GANs, DPMs lack a low-dimensional, interpretable, and well-decoupled latent code. Recently, diffusion autoencoders (Diff-AE) were proposed to explore the potential of DPMs for representation learning via autoencoding. Diff-AE provides an accessible latent space with remarkable interpretability, allowing us to manipulate image attributes based on latent codes from that space. However, previous works are not generic, as they only operate on a few limited attributes. To further explore the latent space of Diff-AE and achieve a generic editing pipeline, we propose a module called the Group-supervised AutoEncoder (GAE) for Diff-AE to achieve better disentanglement of the latent code. Our proposed GAE is trained via an attribute-swap strategy to acquire the latent codes for multi-attribute image manipulation based on examples. We empirically demonstrate that our method enables multi-attribute manipulation and achieves convincing sample quality and attribute alignment, while significantly reducing computational requirements compared to pixel-based approaches for representational decoupling. Code will be released soon.
- Published
- 2023
36. Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation
- Author
-
Lu, Zeyu, Wu, Chengyue, Chen, Xinyuan, Wang, Yaohui, Bai, Lei, Qiao, Yu, and Liu, Xihui
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Diffusion models have attained impressive visual quality for image synthesis. However, how to interpret and manipulate the latent space of diffusion models has not been extensively explored. Prior work on diffusion autoencoders encodes semantic representations into a single semantic latent code, which fails to reflect the rich information of details and the intrinsic feature hierarchy. To mitigate these limitations, we propose Hierarchical Diffusion Autoencoders (HDAE), which exploit the fine-grained-to-abstract and low-level-to-high-level feature hierarchy for the latent space of diffusion models. The hierarchical latent space of HDAE inherently encodes different abstraction levels of semantics and provides more comprehensive semantic representations. In addition, we propose a truncated-feature-based approach for disentangled image manipulation. We demonstrate the effectiveness of our proposed approach with extensive experiments and applications on image reconstruction, style mixing, controllable interpolation, detail-preserving and disentangled image manipulation, and multi-modal semantic image synthesis.
- Published
- 2023
37. Entity-Level Text-Guided Image Manipulation
- Author
-
Wang, Yikai, Wang, Jianan, Lu, Guansong, Xu, Hang, Li, Zhenguo, Zhang, Wei, and Fu, Yanwei
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Existing text-guided image manipulation methods aim to modify the appearance of the image or to edit a few objects in a virtual or simple scenario, which is far from practical applications. In this work, we study a novel task of text-guided image manipulation at the entity level in the real world (eL-TGIM). The task imposes three basic requirements: (1) to edit the entity consistently with the text descriptions, (2) to preserve the entity-irrelevant regions, and (3) to merge the manipulated entity into the image naturally. To this end, we propose an elegant framework, dubbed SeMani, for Semantic Manipulation of real-world images, which can not only edit the appearance of entities but also generate new entities corresponding to the text guidance. To solve eL-TGIM, SeMani decomposes the task into two phases: the semantic alignment phase and the image manipulation phase. In the semantic alignment phase, SeMani incorporates a semantic alignment module to locate the entity-relevant region to be manipulated. In the image manipulation phase, SeMani adopts a generative model to synthesize new images conditioned on the entity-irrelevant regions and target text descriptions. We discuss and propose two popular generation processes that can be utilized in SeMani: discrete auto-regressive generation with transformers and continuous denoising generation with diffusion models, yielding SeMani-Trans and SeMani-Diff, respectively. We conduct extensive experiments on the real-world CUB, Oxford, and COCO datasets to verify that SeMani can distinguish entity-relevant and -irrelevant regions and achieve more precise and flexible manipulation in a zero-shot manner compared with baseline methods. Our codes and models will be released at https://github.com/Yikai-Wang/SeMani., Comment: Extension of our CVPR 2022 oral paper: 2204.04428. Yikai Wang and Jianan Wang contribute equally. The arXiv version uses small-size figures for fast preview; the full-size PDF can be found on our project page: https://yikai-wang.github.io/semani/. arXiv admin note: substantial text overlap with arXiv:2204.04428
- Published
- 2023
38. HRFNet: High-Resolution Forgery Network for Localizing Satellite Image Manipulation
- Author
-
Niloy, Fahim Faisal, Bhaumik, Kishor Kumar, and Woo, Simon S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Existing high-resolution satellite image forgery localization methods rely on patch-based or downsampling-based training. Both of these training methods have major drawbacks, such as inaccurate boundaries between pristine and forged regions and the generation of unwanted artifacts. To tackle these challenges, inspired by the high-resolution image segmentation literature, we propose a novel model called HRFNet for effective satellite image forgery localization. Specifically, equipped with shallow and deep branches, our model can successfully integrate RGB and resampling features in both global and local manners to localize forgery more accurately. We perform various experiments to demonstrate that our method achieves the best performance, while memory requirements and processing speed are not compromised compared to existing methods., Comment: ICIP 2023
- Published
- 2023
39. Towards Few-shot Entity Recognition in Document Images: A Graph Neural Network Approach Robust to Image Manipulation
- Author
-
Krishnan, Prashant, Wang, Zilong, Wang, Yangkun, and Shang, Jingbo
- Subjects
Computer Science - Computation and Language ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Recent advances in incorporating layout information, typically bounding box coordinates, into pre-trained language models have achieved significant performance gains in entity recognition from document images. Coordinates can easily model the absolute position of each token, but they might be sensitive to manipulations of document images (e.g., shifting, rotation, or scaling), especially when the training data is limited in few-shot settings. In this paper, we propose to further introduce the topological adjacency relationships among the tokens, emphasizing their relative position information. Specifically, we consider the tokens in the documents as nodes and formulate the edges based on the topological heuristic of the k-nearest bounding boxes. Such adjacency graphs are invariant to affine transformations, including shifting, rotation, and scaling. We incorporate these graphs into the pre-trained language model by adding graph neural network layers on top of the language model embeddings, leading to a novel model, LAGER. Extensive experiments on two benchmark datasets show that LAGER significantly outperforms strong baselines under different few-shot settings and also demonstrates better robustness to manipulations.
- Published
- 2023
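The k-nearest-bounding-box graph construction is easy to sketch. The toy below (synthetic stand-ins for token bounding-box centres) checks that the edge set is unchanged under the shifting, rotation, and uniform scaling the abstract mentions:

```python
import numpy as np

def knn_edges(centers, k=3):
    """Directed edge set linking each node to its k nearest centres."""
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # no self-loops
    nn = np.argsort(d, axis=1)[:, :k]
    return {(i, int(j)) for i in range(len(centers)) for j in nn[i]}

# Synthetic token bounding-box centres.
rng = np.random.default_rng(2)
centers = rng.random((30, 2))

# Shift + rotate + uniformly scale the whole document.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
transformed = 2.5 * centers @ R.T + np.array([5.0, -3.0])

# Distances all scale by the same factor, so the ordering of neighbours
# (and hence the graph) is preserved.
print(knn_edges(centers) == knn_edges(transformed))
```

This is why the relative-position graph is more robust than absolute coordinates: the same transforms that move every bounding box leave the k-nearest-neighbour structure untouched.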
40. On the Effectiveness of Image Manipulation Detection in the Age of Social Media
- Author
-
VidalMata, Rosaura G., Saboia, Priscila, Moreira, Daniel, Jensen, Grant, Schlessman, Jason, and Scheirer, Walter J.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Image manipulation detection algorithms designed to identify local anomalies often rely on the manipulated regions being "sufficiently" different from the rest of the non-tampered regions in the image. However, such anomalies might not be easily identifiable in high-quality manipulations, and their use is often based on the assumption that certain image phenomena are associated with the use of specific editing tools. This makes the task of manipulation detection hard in and of itself, with state-of-the-art detectors only being able to detect a limited number of manipulation types. More importantly, in cases where the anomaly assumption does not hold, the detection of false positives in otherwise non-manipulated images becomes a serious problem. To understand the current state of manipulation detection, we present an in-depth analysis of deep learning-based and learning-free methods, assessing their performance on different benchmark datasets containing tampered and non-tampered samples. We provide a comprehensive study of their suitability for detecting different manipulations as well as their robustness when presented with non-tampered data. Furthermore, we propose a novel deep learning-based pre-processing technique that accentuates the anomalies present in manipulated regions to make them more identifiable by a variety of manipulation detection methods. To this end, we introduce an anomaly enhancement loss that, when used with a residual architecture, improves the performance of different detection algorithms with a minimal introduction of false positives on the non-manipulated data. Lastly, we introduce an open-source manipulation detection toolkit comprising a number of standard detection algorithms.
- Published
- 2023
41. Towards Real-time Text-driven Image Manipulation with Unconditional Diffusion Models
- Author
-
Starodubcev, Nikita, Baranchuk, Dmitry, Khrulkov, Valentin, and Babenko, Artem
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Recent advances in diffusion models enable many powerful instruments for image editing. One of these instruments is text-driven image manipulation: editing semantic attributes of an image according to a provided text description. Existing diffusion-based methods already achieve high-quality image manipulations for a broad range of text prompts. However, in practice, these methods incur high computation costs even on a high-end GPU. This greatly limits potential real-world applications of diffusion-based image editing, especially when running on user devices. In this paper, we address the efficiency of recent text-driven editing methods based on unconditional diffusion models and develop a novel algorithm that learns image manipulations 4.5-10 times faster and applies them 8 times faster. We carefully evaluate the visual quality and expressiveness of our approach on multiple datasets using human annotators. Our experiments demonstrate that our algorithm achieves the quality of much more expensive methods. Finally, we show that our approach can adapt the pretrained model to the user-specified image and text description on the fly in just 4 seconds. In this setting, we notice that more compact unconditional diffusion models can be considered a reasonable alternative to the popular text-conditional counterparts.
- Published
- 2023
42. DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation
- Author
-
Lyu, Yueming, Lin, Tianwei, Li, Fu, He, Dongliang, Dong, Jing, and Tan, Tieniu
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Text-driven image manipulation remains challenging in terms of training and inference flexibility. Conditional generative models depend heavily on expensive annotated training data, while recent frameworks that leverage pre-trained vision-language models are limited by either per-text-prompt optimization or inference-time hyper-parameter tuning. In this work, we propose a novel framework named \textit{DeltaEdit} to address these problems. Our key idea is to investigate and identify a space, namely the delta image-and-text space, in which the CLIP visual feature differences of two images and the CLIP textual embedding differences of source and target texts are well aligned in distribution. Based on this CLIP delta space, the DeltaEdit network is designed to map CLIP visual feature differences to the editing directions of StyleGAN during the training phase. Then, in the inference phase, DeltaEdit predicts the StyleGAN editing directions from the differences of the CLIP textual features. In this way, DeltaEdit is trained in a text-free manner. Once trained, it generalizes well to various text prompts for zero-shot inference without bells and whistles. Code is available at https://github.com/Yueming6568/DeltaEdit., Comment: Accepted by CVPR2023. Code is available at https://github.com/Yueming6568/DeltaEdit
- Published
- 2023
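The delta-space idea (train on CLIP image-feature differences, substitute CLIP text-feature differences at inference) can be sketched with synthetic stand-ins for CLIP embeddings. Everything below is hypothetical: a shared attribute direction is assumed by construction, whereas in DeltaEdit this alignment is a property identified in actual CLIP space.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

# Synthetic stand-ins for 512-d CLIP features. Source and target pairs
# are *constructed* to differ along one shared attribute direction;
# real features would come from the actual CLIP encoders.
rng = np.random.default_rng(4)
dim = 512
attr = unit(rng.standard_normal(dim))        # e.g. a "no smile -> smile" axis

img_src = rng.standard_normal(dim)
img_tgt = img_src + 2.0 * attr + 0.05 * rng.standard_normal(dim)
txt_src = rng.standard_normal(dim)
txt_tgt = txt_src + 2.0 * attr + 0.05 * rng.standard_normal(dim)

delta_img = unit(img_tgt - img_src)   # what the mapper sees at training time
delta_txt = unit(txt_tgt - txt_src)   # what is substituted at inference time
cos = float(delta_img @ delta_txt)
# Far above the ~0 cosine expected for unrelated 512-d directions.
print(f"cosine(delta_img, delta_txt) = {cos:.3f}")
```

Because the two deltas point in (nearly) the same direction, a network trained only on image-feature deltas can accept text-feature deltas at inference, which is what makes the text-free training described in the abstract possible.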
43. Semi-supervised image manipulation localization with residual enhancement
- Author
-
Zeng, Qiang, Wang, Hongxia, Zhou, Yang, Zhang, Rui, and Meng, Sijiang
- Published
- 2024
- Full Text
- View/download PDF
44. Image manipulation detection and localization using multi-scale contrastive learning
- Author
-
Bai, Ruyi
- Published
- 2024
- Full Text
- View/download PDF
45. Exploring weakly-supervised image manipulation localization with tampering Edge-based class activation map
- Author
-
Zhou, Yang, Wang, Hongxia, Zeng, Qiang, Zhang, Rui, and Meng, Sijiang
- Published
- 2024
- Full Text
- View/download PDF
46. A Contribution-Aware Noise Feature representation model for image manipulation localization
- Author
-
Zhou, Yang, Wang, Hongxia, Zeng, Qiang, Zhang, Rui, and Meng, Sijiang
- Published
- 2024
- Full Text
- View/download PDF
47. Towards Arbitrary Text-driven Image Manipulation via Space Alignment
- Author
-
Bai, Yunpeng, Zhong, Zihan, Dong, Chao, Zhang, Weichen, Xu, Guowei, and Yuan, Chun
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Recent GAN inversion methods have been able to successfully invert a real input image to its corresponding editable latent code in StyleGAN. By combining inversion with the vision-language model CLIP, several text-driven image manipulation methods have been proposed. However, these methods require extra optimization cost for each image or each new attribute-editing mode. To achieve a more efficient editing method, we propose a new Text-driven image Manipulation framework via Space Alignment (TMSA). The Space Alignment module aims to align the same semantic regions in the CLIP and StyleGAN spaces. The text input can then be directly mapped into the StyleGAN space and used to find the semantic shift according to the text description. The framework supports arbitrary image editing modes without additional cost. Our work provides the user with an interface to control the attributes of a given image according to text input and obtain the result in real time. Extensive experiments demonstrate our superior performance over prior works., Comment: 8 pages, 12 figures
- Published
- 2023
48. TriPINet: Tripartite Progressive Integration Network for Image Manipulation Localization
- Author
-
Liang, Wei-Yun, Xu, Jing, and Jin, Xiao
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Image manipulation localization aims at distinguishing forged regions from the rest of a test image. Although many outstanding prior works have been proposed for this task, two issues still need further study: 1) how to fuse diverse types of features that carry forgery clues; 2) how to progressively integrate multistage features for better localization performance. In this paper, we propose a tripartite progressive integration network (TriPINet) for end-to-end image manipulation localization. First, we extract both visual perception information, e.g., RGB input images, and visually imperceptible features, e.g., frequency and noise traces, for forensic feature learning. Second, we develop a guided cross-modality dual-attention (gCMDA) module to fuse different types of forgery clues. Third, we design a set of progressive integration squeeze-and-excitation (PI-SE) modules to improve localization performance by appropriately incorporating multiscale features in the decoder. Extensive experiments are conducted to compare our method with state-of-the-art image forensics approaches. The proposed TriPINet obtains competitive results on several benchmark datasets.
- Published
- 2022
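The PI-SE modules in the TriPINet abstract build on the standard squeeze-and-excitation mechanism: pool each channel to a scalar, pass it through a small bottleneck, and gate the channels. The sketch below shows only that generic SE step with made-up shapes and random weights; the paper's progressive multiscale integration is not reproduced here.

```python
import numpy as np

def squeeze_excite(feat, w1, w2):
    """Channel attention in the spirit of an SE block (sketch only).

    feat: (C, H, W) feature map; w1, w2: weights of the bottleneck MLP.
    """
    z = feat.mean(axis=(1, 2))              # squeeze: global average pool
    s = np.maximum(z @ w1, 0.0)             # excitation: FC + ReLU
    s = 1.0 / (1.0 + np.exp(-(s @ w2)))     # FC + sigmoid gate in (0, 1)
    return feat * s[:, None, None]          # rescale each channel

rng = np.random.default_rng(1)
C, H, W, R = 64, 16, 16, 4                  # R is the bottleneck ratio
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C, C // R)) * 0.1
w2 = rng.standard_normal((C // R, C)) * 0.1
out = squeeze_excite(feat, w1, w2)
print(out.shape)  # (64, 16, 16)
```

In a decoder, gating channels this way lets the network emphasize whichever of the fused RGB/frequency/noise channels carry the strongest forgery evidence at each scale.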
49. CLIPVG: Text-Guided Image Manipulation Using Differentiable Vector Graphics
- Author
-
Song, Yiren, Shao, Xuning, Chen, Kang, Zhang, Weidong, Li, Minzhe, and Jing, Zhongliang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Considerable progress has recently been made in leveraging CLIP (Contrastive Language-Image Pre-Training) models for text-guided image manipulation. However, all existing works rely on additional generative models to ensure the quality of results, because CLIP alone cannot provide enough guidance information for fine-scale pixel-level changes. In this paper, we introduce CLIPVG, a text-guided image manipulation framework using differentiable vector graphics, which is also the first CLIP-based general image manipulation framework that does not require any additional generative models. We demonstrate that CLIPVG can not only achieve state-of-the-art performance in both semantic correctness and synthesis quality, but is also flexible enough to support various applications far beyond the capability of all existing methods., Comment: 8 pages, 10 figures, AAAI2023
- Published
- 2022
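The core loop the CLIPVG abstract describes is direct gradient descent on vector-graphic parameters against a CLIP objective, with no generator in between. The toy below keeps only that loop structure: the "renderer" and the quadratic loss are stand-ins (real CLIPVG would rasterize SVG paths differentiably and score them with CLIP), so every function and dimension here is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
target = rng.standard_normal(64)        # stand-in for a CLIP text embedding

def render(params):
    # Identity stand-in for "differentiable rasterizer + CLIP image encoder".
    return params

def clip_loss(params):
    # Toy stand-in for the CLIP guidance loss.
    return float(np.sum((render(params) - target) ** 2))

def grad(params):
    # Analytic gradient of the toy loss above.
    return 2.0 * (render(params) - target)

# The CLIPVG-style loop: optimize vector-graphic parameters
# (control points, colors) directly against the objective.
params = rng.standard_normal(64)
lr = 0.1
for _ in range(200):
    params -= lr * grad(params)

print(round(clip_loss(params), 6))  # converges toward 0.0
```

The point of the sketch is structural: because the whole pipeline from parameters to loss is differentiable, no auxiliary generative model is needed to keep the result well-formed; the vector-graphic parameterization itself constrains the output.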
50. Interactive Image Manipulation with Complex Text Instructions
- Author
-
Morita, Ryugo, Zhang, Zhiqiang, Ho, Man M., and Zhou, Jinjia
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Recently, text-guided image manipulation has received increasing attention in multimedia processing and computer vision research due to its high flexibility and controllability. Its goal is to semantically manipulate parts of an input reference image according to text descriptions. However, most existing works have the following problems: (1) text-irrelevant content is not always preserved and may be changed at random; (2) manipulation quality still needs further improvement; (3) they can only manipulate descriptive attributes. To solve these problems, we propose a novel image manipulation method that interactively edits an image using complex text instructions. It allows users not only to improve the accuracy of image manipulation but also to achieve complex tasks such as enlarging, shrinking, or removing objects and replacing the background of the input image. To make these tasks possible, we apply three strategies. First, the given image is divided into text-relevant and text-irrelevant content; only the text-relevant content is manipulated, while the text-irrelevant content is preserved. Second, a super-resolution method is used to enlarge the manipulation region, further improving operability and helping manipulate the object itself. Third, a user interface is introduced for interactively editing the segmentation map to re-modify the generated image according to the user's desires. Extensive experiments on the Caltech-UCSD Birds-200-2011 (CUB) and Microsoft Common Objects in Context (MS COCO) datasets demonstrate that our proposed method enables interactive, flexible, and accurate image manipulation in real time. Through qualitative and quantitative evaluations, we show that the proposed model outperforms other state-of-the-art methods., Comment: Accepted to WACV2023
- Published
- 2022
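The first strategy in the abstract above, splitting an image into text-relevant and text-irrelevant content, reduces at composition time to masked blending: take edited pixels where the segmentation mask is on, original pixels elsewhere. The snippet is a minimal sketch of that blend; the images, mask region, and shapes are all hypothetical stand-ins.

```python
import numpy as np

def composite(original, generated, mask):
    """Keep text-irrelevant pixels from the original image and take
    text-relevant pixels from the generated result (mask values in [0, 1])."""
    return mask[..., None] * generated + (1.0 - mask[..., None]) * original

rng = np.random.default_rng(3)
H, W = 32, 32
original = rng.random((H, W, 3))
generated = rng.random((H, W, 3))
mask = np.zeros((H, W))
mask[8:24, 8:24] = 1.0   # hypothetical text-relevant region from a segmentation map

out = composite(original, generated, mask)
print(out.shape)  # (32, 32, 3)
```

Interactively editing the segmentation map, as the paper's third strategy describes, amounts to letting the user redraw `mask` and re-running the composite, which is why re-modification can happen in real time.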