Author: "Han, Junlin" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Han, Junlin"' showing total 34 results

1. Semantic Score Distillation Sampling for Compositional Text-to-3D Generation

Author: Yang, Ling, Zhang, Zixiang, Han, Junlin, Zeng, Bohan, Li, Runjia, Torr, Philip, and Zhang, Wentao
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Generating high-quality 3D assets from textual descriptions remains a pivotal challenge in computer graphics and vision research. Due to the scarcity of 3D data, state-of-the-art approaches utilize pre-trained 2D diffusion priors, optimized through Score Distillation Sampling (SDS). Despite progress, crafting complex 3D scenes featuring multiple objects or intricate interactions is still difficult. To tackle this, recent methods have incorporated box or layout guidance. However, these layout-guided compositional methods often struggle to provide fine-grained control, as they are generally coarse and lack expressiveness. To overcome these challenges, we introduce a novel SDS approach, Semantic Score Distillation Sampling (SemanticSDS), designed to effectively improve the expressiveness and accuracy of compositional text-to-3D generation. Our approach integrates new semantic embeddings that maintain consistency across different rendering views and clearly differentiate between various objects and parts. These embeddings are transformed into a semantic map, which directs a region-specific SDS process, enabling precise optimization and compositional generation. By leveraging explicit semantic guidance, our method unlocks the compositional capabilities of existing pre-trained diffusion models, thereby achieving superior quality in 3D content generation, particularly for complex objects and scenes. Experimental results demonstrate that our SemanticSDS framework is highly effective for generating state-of-the-art complex 3D content. Code: https://github.com/YangLing0818/SemanticSDS-3D, Comment: Project: https://github.com/YangLing0818/SemanticSDS-3D
Published: 2024

2. Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation

Author: Han, Junlin, Wang, Jianyuan, Vedaldi, Andrea, Torr, Philip, and Kokkinos, Filippos
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Generating high-quality 3D content from text, single images, or sparse view images remains a challenging task with broad applications. Existing methods typically employ multi-view diffusion models to synthesize multi-view images, followed by a feed-forward process for 3D reconstruction. However, these approaches are often constrained by a small and fixed number of input views, limiting their ability to capture diverse viewpoints and, even worse, leading to suboptimal generation results if the synthesized views are of poor quality. To address these limitations, we propose Flex3D, a novel two-stage framework capable of leveraging an arbitrary number of high-quality input views. The first stage consists of a candidate view generation and curation pipeline. We employ a fine-tuned multi-view image diffusion model and a video diffusion model to generate a pool of candidate views, enabling a rich representation of the target 3D object. Subsequently, a view selection pipeline filters these views based on quality and consistency, ensuring that only the high-quality and reliable views are used for reconstruction. In the second stage, the curated views are fed into a Flexible Reconstruction Model (FlexRM), built upon a transformer architecture that can effectively process an arbitrary number of inputs. FlexRM directly outputs 3D Gaussian points leveraging a tri-plane representation, enabling efficient and detailed 3D generation. Through extensive exploration of design and training strategies, we optimize FlexRM to achieve superior performance in both reconstruction and generation tasks. Our results demonstrate that Flex3D achieves state-of-the-art performance, with a user study winning rate of over 92% in 3D generation tasks when compared to several of the latest feed-forward 3D generative models., Comment: Project page: https://junlinhan.github.io/projects/flex3d/
Published: 2024

3. DreamBeast: Distilling 3D Fantastical Animals with Part-Aware Knowledge Transfer

Author: Li, Runjia, Han, Junlin, Melas-Kyriazi, Luke, Sun, Chunyi, An, Zhaochong, Gui, Zhongrui, Sun, Shuyang, Torr, Philip, and Jakab, Tomas
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: We present DreamBeast, a novel method based on score distillation sampling (SDS) for generating fantastical 3D animal assets composed of distinct parts. Existing SDS methods often struggle with this generation task due to a limited understanding of part-level semantics in text-to-image diffusion models. While recent diffusion models, such as Stable Diffusion 3, demonstrate a better part-level understanding, they are prohibitively slow and exhibit other common problems associated with single-view diffusion models. DreamBeast overcomes this limitation through a novel part-aware knowledge transfer mechanism. For each generated asset, we efficiently extract part-level knowledge from the Stable Diffusion 3 model into a 3D Part-Affinity implicit representation. This enables us to instantly generate Part-Affinity maps from arbitrary camera views, which we then use to modulate the guidance of a multi-view diffusion model during SDS to create 3D assets of fantastical animals. DreamBeast significantly enhances the quality of generated 3D creatures with user-specified part compositions while reducing computational overhead, as demonstrated by extensive quantitative and qualitative evaluations., Comment: Project page: https://dreambeast3d.github.io/, code: https://github.com/runjiali-rl/threestudio-dreambeast
Published: 2024

4. Learning-based Multi-View Stereo: A Survey

Author: Wang, Fangjinhua, Zhu, Qingtian, Chang, Di, Gao, Quankai, Han, Junlin, Zhang, Tong, Hartley, Richard, and Pollefeys, Marc
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: 3D reconstruction aims to recover the dense 3D structure of a scene. It plays an essential role in various applications such as Augmented/Virtual Reality (AR/VR), autonomous driving and robotics. Leveraging multiple views of a scene captured from different viewpoints, Multi-View Stereo (MVS) algorithms synthesize a comprehensive 3D representation, enabling precise reconstruction in complex environments. Due to its efficiency and effectiveness, MVS has become a pivotal method for image-based 3D reconstruction. Recently, with the success of deep learning, many learning-based MVS methods have been proposed, achieving impressive performance against traditional methods. We categorize these learning-based methods as: depth map-based, voxel-based, NeRF-based, 3D Gaussian Splatting-based, and large feed-forward methods. Among these, we focus significantly on depth map-based methods, which are the main family of MVS due to their conciseness, flexibility and scalability. In this survey, we provide a comprehensive review of the literature at the time of this writing. We investigate these learning-based methods, summarize their performances on popular benchmarks, and discuss promising future research directions in this area.
Published: 2024

5. VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models

Author: Han, Junlin, Kokkinos, Filippos, and Torr, Philip
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics, Computer Science - Machine Learning
Abstract: This paper presents a novel method for building scalable 3D generative models utilizing pre-trained video diffusion models. The primary obstacle in developing foundation 3D generative models is the limited availability of 3D data. Unlike images, texts, or videos, 3D data are not readily accessible and are difficult to acquire. This results in a significant disparity in scale compared to the vast quantities of other types of data. To address this issue, we propose using a video diffusion model, trained with extensive volumes of text, images, and videos, as a knowledge source for 3D data. By unlocking its multi-view generative capabilities through fine-tuning, we generate a large-scale synthetic multi-view dataset to train a feed-forward 3D generative model. The proposed model, VFusion3D, trained on nearly 3M synthetic multi-view data, can generate a 3D asset from a single image in seconds and achieves superior performance when compared to current SOTA feed-forward 3D generative models, with users preferring our results over 90% of the time., Comment: ECCV 2024. Project page: https://junlinhan.github.io/projects/vfusion3d.html
Published: 2024

6. Strong and Controllable Blind Image Decomposition

Author: Zhang, Zeyu, Han, Junlin, Gou, Chenhui, Li, Hongdong, and Zheng, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Blind image decomposition aims to decompose all components present in an image, typically used to restore a multi-degraded input image. While fully recovering the clean image is appealing, in some scenarios, users might want to retain certain degradations, such as watermarks, for copyright protection. To address this need, we add controllability to the blind image decomposition process, allowing users to enter which types of degradation to remove or retain. We design an architecture named controllable blind image decomposition network. Inserted in the middle of U-Net structure, our method first decomposes the input feature maps and then recombines them according to user instructions. Advantageously, this functionality is implemented at minimal computational cost: decomposition and recombination are all parameter-free. Experimentally, our system excels in blind image decomposition tasks and can output partially or fully restored images that well reflect user intentions. Furthermore, we evaluate and configure different options for the network structure and loss functions. This, combined with the proposed decomposition-and-recombination method, yields an efficient and competitive system for blind image decomposition, compared with current state-of-the-art methods., Comment: Code: https://github.com/Zhangzeyu97/CBD.git
Published: 2024

7. How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

Author: Tu, Haoqin, Cui, Chenhang, Wang, Zijun, Zhou, Yiyang, Zhao, Bingchen, Han, Junlin, Zhou, Wangchunshu, Yao, Huaxiu, Xie, Cihang, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
Published: 2025
Full Text: View/download PDF

8. VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models

Author: Han, Junlin, Kokkinos, Filippos, Torr, Philip, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
Published: 2025
Full Text: View/download PDF

9. 3D-GPT: Procedural 3D Modeling with Large Language Models

Author: Sun, Chunyi, Han, Junlin, Deng, Weijian, Wang, Xinlong, Qin, Zishan, and Gould, Stephen
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics, Computer Science - Machine Learning
Abstract: In the pursuit of efficient automated content creation, procedural generation, leveraging modifiable parameters and rule-based systems, emerges as a promising approach. Nonetheless, it could be a demanding endeavor, given its intricate nature necessitating a deep understanding of rules, algorithms, and parameters. To reduce workload, we introduce 3D-GPT, a framework utilizing large language models (LLMs) for instruction-driven 3D modeling. 3D-GPT positions LLMs as proficient problem solvers, dissecting the procedural 3D modeling tasks into accessible segments and appointing the apt agent for each task. 3D-GPT integrates three core agents: the task dispatch agent, the conceptualization agent, and the modeling agent. They collaboratively achieve two objectives. First, it enhances concise initial scene descriptions, evolving them into detailed forms while dynamically adapting the text based on subsequent instructions. Second, it integrates procedural generation, extracting parameter values from enriched text to effortlessly interface with 3D software for asset creation. Our empirical investigations confirm that 3D-GPT not only interprets and executes instructions, delivering reliable results but also collaborates effectively with human designers. Furthermore, it seamlessly integrates with Blender, unlocking expanded manipulation possibilities. Our work highlights the potential of LLMs in 3D modeling, offering a basic framework for future advancements in scene generation and animation., Comment: Project page: https://chuny1.github.io/3DGPT/3dgpt.html
Published: 2023

10. Hyperbolic Audio-visual Zero-shot Learning

Author: Hong, Jie, Hayder, Zeeshan, Han, Junlin, Fang, Pengfei, Harandi, Mehrtash, and Petersson, Lars
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Audio-visual zero-shot learning aims to classify samples consisting of a pair of corresponding audio and video sequences from classes that are not present during training. An analysis of the audio-visual data reveals a large degree of hyperbolicity, indicating the potential benefit of using a hyperbolic transformation to achieve curvature-aware geometric learning, with the aim of exploring more complex hierarchical data structures for this task. The proposed approach employs a novel loss function that incorporates cross-modality alignment between video and audio features in the hyperbolic space. Additionally, we explore the use of multiple adaptive curvatures for hyperbolic projections. The experimental results on this very challenging task demonstrate that our proposed hyperbolic approach for zero-shot learning outperforms the SOTA method on three datasets: VGGSound-GZSL, UCF-GZSL, and ActivityNet-GZSL, achieving a harmonic mean (HM) improvement of around 3.0%, 7.0%, and 5.3%, respectively., Comment: ICCV 2023
Published: 2023

11. GOSS: towards generalized open-set semantic segmentation

Author: Hong, Jie, Li, Weihao, Han, Junlin, Zheng, Jiyang, Fang, Pengfei, Harandi, Mehrtash, and Petersson, Lars
Published: 2024
Full Text: View/download PDF

12. NeRFEditor: Differentiable Style Decomposition for Full 3D Scene Editing

Author: Sun, Chunyi, Liu, Yanbin, Han, Junlin, and Gould, Stephen
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: We present NeRFEditor, an efficient learning framework for 3D scene editing, which takes a video captured over 360° as input and outputs a high-quality, identity-preserving stylized 3D scene. Our method supports diverse types of editing such as guided by reference images, text prompts, and user interactions. We achieve this by encouraging a pre-trained StyleGAN model and a NeRF model to learn from each other mutually. Specifically, we use a NeRF model to generate numerous image-angle pairs to train an adjustor, which can adjust the StyleGAN latent code to generate high-fidelity stylized images for any given angle. To extrapolate editing to GAN out-of-domain views, we devise another module that is trained in a self-supervised learning manner. This module maps novel-view images to the hidden space of StyleGAN that allows StyleGAN to generate stylized images on novel views. These two modules together produce guided images in 360° views to finetune a NeRF to make stylization effects, where a stable fine-tuning strategy is proposed to achieve this. Experiments show that NeRFEditor outperforms prior work on benchmark and real-world scenes with better editability, fidelity, and identity preservation., Comment: Project page: https://chuny1.github.io/NeRFEditor/nerfeditor.html
Published: 2022

13. Publisher Correction: GOSS: towards generalized open-set semantic segmentation

Author: Hong, Jie, Li, Weihao, Han, Junlin, Zheng, Jiyang, Fang, Pengfei, Harandi, Mehrtash, and Petersson, Lars
Published: 2024
Full Text: View/download PDF

14. What Images are More Memorable to Machines?

Author: Han, Junlin, Zhan, Huangying, Hong, Jie, Fang, Pengfei, Li, Hongdong, Petersson, Lars, and Reid, Ian
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: This paper studies the problem of measuring and predicting how memorable an image is to pattern recognition machines, as a path to explore machine intelligence. Firstly, we propose a self-supervised machine memory quantification pipeline, dubbed "MachineMem measurer", to collect machine memorability scores of images. Similar to humans, machines also tend to memorize certain kinds of images, whereas the types of images that machines and humans memorize are different. Through in-depth analysis and comprehensive visualizations, we gradually unveil that "complex" images are usually more memorable to machines. We further conduct extensive experiments across 11 different machines (from linear classifiers to modern ViTs) and 9 pre-training methods to analyze and understand machine memory. This work proposes the concept of machine memorability and opens a new research direction at the interface between machine memory and visual data., Comment: Code: https://github.com/JunlinHan/MachineMem Project page: https://junlinhan.github.io/projects/machinemem.html
Published: 2022

15. Curved Geometric Networks for Visual Anomaly Recognition

Author: Hong, Jie, Fang, Pengfei, Li, Weihao, Han, Junlin, Petersson, Lars, and Harandi, Mehrtash
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Learning a latent embedding to understand the underlying nature of data distribution is often formulated in Euclidean spaces with zero curvature. However, the success of the geometry constraints, posed in the embedding space, indicates that curved spaces might encode more structural information, leading to better discriminative power and hence richer representations. In this work, we investigate benefits of the curved space for analyzing anomalies or out-of-distribution objects in data. This is achieved by considering embeddings via three geometry constraints, namely, spherical geometry (with positive curvature), hyperbolic geometry (with negative curvature) or mixed geometry (with both positive and negative curvatures). Three geometric constraints can be chosen interchangeably in a unified design given the task at hand. Tailored for the embeddings in the curved space, we also formulate functions to compute the anomaly score. Two types of geometric modules (i.e., Geometric-in-One and Geometric-in-Two models) are proposed to plug in the original Euclidean classifier, and anomaly scores are computed from the curved embeddings. We evaluate the resulting designs under a diverse set of visual recognition scenarios, including image detection (multi-class OOD detection and one-class anomaly detection) and segmentation (multi-class anomaly segmentation and one-class anomaly segmentation). The empirical results show the effectiveness of our proposal through the consistent improvement over various scenarios.
Published: 2022

16. CropMix: Sampling a Rich Input Distribution via Multi-Scale Cropping

Author: Han, Junlin, Petersson, Lars, Li, Hongdong, and Reid, Ian
Subjects: Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: We present a simple method, CropMix, for the purpose of producing a rich input distribution from the original dataset distribution. Unlike single random cropping, which may inadvertently capture only limited information, or irrelevant information, like pure background, unrelated objects, etc, we crop an image multiple times using distinct crop scales, thereby ensuring that multi-scale information is captured. The new input distribution, serving as training data, useful for a number of vision tasks, is then formed by simply mixing multiple cropped views. We first demonstrate that CropMix can be seamlessly applied to virtually any training recipe and neural network architecture performing classification tasks. CropMix is shown to improve the performance of image classifiers on several benchmark tasks across-the-board without sacrificing computational simplicity and efficiency. Moreover, we show that CropMix is of benefit to both contrastive learning and masked image modeling towards more powerful representations, where preferable results are achieved when learned representations are transferred to downstream tasks. Code is available at GitHub., Comment: Code: https://github.com/JunlinHan/CropMix
Published: 2022

17. GOSS: Towards Generalized Open-set Semantic Segmentation

Author: Hong, Jie, Li, Weihao, Han, Junlin, Zheng, Jiyang, Fang, Pengfei, Harandi, Mehrtash, and Petersson, Lars
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: In this paper, we present and study a new image segmentation task, called Generalized Open-set Semantic Segmentation (GOSS). Previously, with the well-known open-set semantic segmentation (OSS), the intelligent agent only detects the unknown regions without further processing, limiting their perception of the environment. It stands to reason that a further analysis of the detected unknown pixels would be beneficial. Therefore, we propose GOSS, which unifies the abilities of two well-defined segmentation tasks, OSS and generic segmentation (GS), in a holistic way. Specifically, GOSS classifies pixels as belonging to known classes, and clusters (or groups) of pixels of unknown class are labelled as such. To evaluate this new expanded task, we further propose a metric which balances the pixel classification and clustering aspects. Moreover, we build benchmark tests on top of existing datasets and propose a simple neural architecture as a baseline, which jointly predicts pixel classification and clustering under open-set settings. Our experiments on multiple benchmarks demonstrate the effectiveness of our baseline. We believe our new GOSS task can produce an expressive image understanding for future research. Code will be made available.
Published: 2022

18. You Only Cut Once: Boosting Data Augmentation with a Single Cut

Author: Han, Junlin, Fang, Pengfei, Li, Weihao, Hong, Jie, Armin, Mohammad Ali, Reid, Ian, Petersson, Lars, and Li, Hongdong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: We present You Only Cut Once (YOCO) for performing data augmentations. YOCO cuts one image into two pieces and performs data augmentations individually within each piece. Applying YOCO improves the diversity of the augmentation per sample and encourages neural networks to recognize objects from partial information. YOCO is parameter-free, easy to use, and boosts almost all augmentations for free. Thorough experiments are conducted to evaluate its effectiveness. We first demonstrate that YOCO can be seamlessly applied to varying data augmentations and neural network architectures, and brings performance gains on CIFAR and ImageNet classification tasks, sometimes surpassing conventional image-level augmentation by large margins. Moreover, we show YOCO benefits contrastive pre-training toward a more powerful representation that can be better transferred to multiple downstream tasks. Finally, we study a number of variants of YOCO and empirically analyze the performance for respective settings. Code is available at GitHub., Comment: ICML 2022, Code: https://github.com/JunlinHan/YOCO
Published: 2022

19. Blind Image Decomposition

Author: Han, Junlin, Li, Weihao, Fang, Pengfei, Sun, Chunyi, Hong, Jie, Armin, Mohammad Ali, Petersson, Lars, and Li, Hongdong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: We propose and study a novel task named Blind Image Decomposition (BID), which requires separating a superimposed image into constituent underlying images in a blind setting, that is, both the source components involved in mixing as well as the mixing mechanism are unknown. For example, rain may consist of multiple components, such as rain streaks, raindrops, snow, and haze. Rainy images can be treated as an arbitrary combination of these components, some of them or all of them. How to decompose superimposed images, like rainy images, into distinct source components is a crucial step toward real-world vision systems. To facilitate research on this new task, we construct multiple benchmark datasets, including mixed image decomposition across multiple domains, real-scenario deraining, and joint shadow/reflection/watermark removal. Moreover, we propose a simple yet general Blind Image Decomposition Network (BIDeN) to serve as a strong baseline for future work. Experimental results demonstrate the tenability of our benchmarks and the effectiveness of BIDeN., Comment: ECCV 2022. Project page: https://junlinhan.github.io/projects/BID.html. Code: https://github.com/JunlinHan/BID
Published: 2021

20. Underwater Image Restoration via Contrastive Learning and a Real-world Dataset

Author: Han, Junlin, Shoeiby, Mehrdad, Malthus, Tim, Botha, Elizabeth, Anstee, Janet, Anwar, Saeed, Wei, Ran, Armin, Mohammad Ali, Li, Hongdong, and Petersson, Lars
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Underwater image restoration is of significant importance in unveiling the underwater world. Numerous techniques and algorithms have been developed in the past decades. However, due to fundamental difficulties associated with imaging/sensing, lighting, and refractive geometric distortions in capturing clear underwater images, no comprehensive evaluations have been conducted of underwater image restoration. To address this gap, we have constructed a large-scale real underwater image dataset, dubbed 'HICRD' (Heron Island Coral Reef Dataset), for the purpose of benchmarking existing methods and supporting the development of new deep-learning based methods. We employ an accurate water parameter (diffuse attenuation coefficient) in generating reference images. There are 2000 reference restored images and 6003 original underwater images in the unpaired training set. Further, we present a novel method for underwater image restoration based on an unsupervised image-to-image translation framework. Our proposed method leverages contrastive learning and generative adversarial networks to maximize the mutual information between raw and restored images. Extensive experiments with comparisons to recent approaches further demonstrate the superiority of our proposed method. Our code and dataset are publicly available at GitHub., Comment: In submission, code/dataset are at https://github.com/JunlinHan/CWR. arXiv admin note: text overlap with arXiv:2103.09697
Published: 2021

21. Dual Contrastive Learning for Unsupervised Image-to-Image Translation

Author: Han, Junlin, Shoeiby, Mehrdad, Petersson, Lars, and Armin, Mohammad Ali
Subjects: Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Unsupervised image-to-image translation tasks aim to find a mapping between a source domain X and a target domain Y from unpaired training data. Contrastive learning for Unpaired image-to-image Translation (CUT) yields state-of-the-art results in modeling unsupervised image-to-image translation by maximizing mutual information between input and output patches using only one encoder for both domains. In this paper, we propose a novel method based on contrastive learning and a dual learning setting (exploiting two encoders) to infer an efficient mapping between unpaired data. Additionally, while CUT suffers from mode collapse, a variant of our method efficiently addresses this issue. We further demonstrate the advantage of our approach through extensive ablation studies, which show superior performance compared to recent approaches in multiple challenging image translation tasks. Lastly, we demonstrate that the gap between unsupervised methods and supervised methods can be efficiently closed., Comment: Accepted to NTIRE, CVPRW 2021. Code is available at https://github.com/JunlinHan/DCLGAN
Published: 2021

22. Single Underwater Image Restoration by Contrastive Learning

Author: Han, Junlin, Shoeiby, Mehrdad, Malthus, Tim, Botha, Elizabeth, Anstee, Janet, Anwar, Saeed, Wei, Ran, Petersson, Lars, and Armin, Mohammad Ali
Subjects: Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Underwater image restoration attracts significant attention due to its importance in unveiling the underwater world. This paper elaborates on a novel method that achieves state-of-the-art results for underwater image restoration based on the unsupervised image-to-image translation framework. We design our method by leveraging contrastive learning and generative adversarial networks to maximize mutual information between raw and restored images. Additionally, we release a large-scale real underwater image dataset to support both paired and unpaired training modules. Extensive experiments with comparisons to recent approaches further demonstrate the superiority of our proposed method., Comment: Accepted to IGARSS 2021 as oral presentation. Code is available at https://github.com/JunlinHan/CWR
Published: 2021

23. Blind Image Decomposition

Author: Han, Junlin, Li, Weihao, Fang, Pengfei, Sun, Chunyi, Hong, Jie, Armin, Mohammad Ali, Petersson, Lars, Li, Hongdong, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Avidan, Shai, editor, Brostow, Gabriel, editor, Cissé, Moustapha, editor, Farinella, Giovanni Maria, editor, and Hassner, Tal, editor
Published: 2022
Full Text: View/download PDF

24. NeRFEditor: Differentiable Style Decomposition for 3D Scene Editing

Author: Sun, Chunyi, primary, Liu, Yanbin, additional, Han, Junlin, additional, and Gould, Stephen, additional
Published: 2024
Full Text: View/download PDF

25. Publisher Correction: GOSS: towards generalized open-set semantic segmentation

Author: Hong, Jie, primary, Li, Weihao, additional, Han, Junlin, additional, Zheng, Jiyang, additional, Fang, Pengfei, additional, Harandi, Mehrtash, additional, and Petersson, Lars, additional
Published: 2023
Full Text: View/download PDF

26. GOSS: towards generalized open-set semantic segmentation

Author: Hong, Jie, primary, Li, Weihao, additional, Han, Junlin, additional, Zheng, Jiyang, additional, Fang, Pengfei, additional, Harandi, Mehrtash, additional, and Petersson, Lars, additional
Published: 2023
Full Text: View/download PDF

27. How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

Author: Tu, Haoqin, Cui, Chenhang, Wang, Zijun, Zhou, Yiyang, Zhao, Bingchen, Han, Junlin, Zhou, Wangchunshu, Yao, Huaxiu, and Xie, Cihang
Abstract: This work focuses on the potential of Vision LLMs (VLLMs) in visual reasoning. Different from prior studies, we shift our focus from evaluating standard performance to introducing a comprehensive safety evaluation suite, covering both out-of-distribution (OOD) generalization and adversarial robustness. For the OOD evaluation, we present two novel VQA datasets, each with one variant, designed to test model performance under challenging conditions. In exploring adversarial robustness, we propose a straightforward attack strategy for misleading VLLMs to produce visual-unrelated responses. Moreover, we assess the efficacy of two jailbreaking strategies, targeting either the vision or language component of VLLMs. Our evaluation of 21 diverse models, ranging from open-source VLLMs to GPT-4V, yields interesting observations: 1) Current VLLMs struggle with OOD texts but not images, unless the visual information is limited; and 2) These VLLMs can be easily misled by deceiving vision encoders only, and their vision-language training often compromises safety protocols. We release this safety evaluation suite at https://github.com/UCSC-VLAA/vllm-safety-benchmark. Comment: H.T., C.C., and Z.W. contribute equally. Work done during H.T. and Z.W.'s internship at UCSC, and C.C. and Y.Z.'s internship at UNC
Published: 2023

28. Underwater Image Restoration via Contrastive Learning and a Real-World Dataset

Author: Han, Junlin, primary, Shoeiby, Mehrdad, additional, Malthus, Tim, additional, Botha, Elizabeth, additional, Anstee, Janet, additional, Anwar, Saeed, additional, Wei, Ran, additional, Armin, Mohammad Ali, additional, Li, Hongdong, additional, and Petersson, Lars, additional
Published: 2022
Full Text: View/download PDF

29. Lack-of-fit tests based on weighted ratio of residuals and variances

Author: Tian, Maozai, Luo, Youxi, Su, Yunan, Fan, Yan, and Han, Junlin
Published: 2012
Full Text: View/download PDF

30. Single Underwater Image Restoration by Contrastive Learning

Author: Han, Junlin, primary, Shoeiby, Mehrdad, additional, Malthus, Tim, additional, Botha, Elizabeth, additional, Anstee, Janet, additional, Anwar, Saeed, additional, Wei, Ran, additional, Petersson, Lars, additional, and Armin, Mohammad Ali, additional
Published: 2021
Full Text: View/download PDF

31. Dual Contrastive Learning for Unsupervised Image-to-Image Translation

Author: Han, Junlin, primary, Shoeiby, Mehrdad, additional, Petersson, Lars, additional, and Armin, Mohammad Ali, additional
Published: 2021
Full Text: View/download PDF

32. Genetic variants of TSPAN12 gene in patients with retinopathy of prematurity

Author: Zhang, Tongmei, primary, Sun, Xiaoli, additional, Han, Junlin, additional, and Han, Mei, additional
Published: 2019
Full Text: View/download PDF

33. Load disturbance observer-based control method for sensorless PMSM drive.

Author: Lu Xiaoquan, Lin Heyun, and Han Junlin
Subjects: ELECTRICAL load, SENSORLESS control systems, PERMANENT magnet motors, SYNCHRONOUS electric motors, ROBUST statistics
Abstract: This study proposes a sensorless control method for the permanent magnet synchronous machine (PMSM) drive that is robust against load torque variations. The proposed method is based on the disturbance observer-based control (DOBC) method, and involves the use of a back electromotive force observer and a torque observer to estimate rotor position and compensate for load torque disturbance, respectively. This mechanism is simple to implement and requires no state vector derivation. The gains of the two observers are carefully selected to improve estimation accuracy and dynamic performance. The performance of the proposed load DOBC for sensorless PMSM drives was tested through a simulation, where the sensorless estimation error was considered to determine the adjustment laws for the proposed observer gains. Furthermore, results of evaluative experiments verified the effectiveness of the proposed method for high-performance sensorless control and torque disturbance rejection for the PMSM drive.
Published: 2016
Full Text: View/download PDF

34. Curved Geometric Networks for Visual Anomaly Recognition.

Author: Hong J, Fang P, Li W, Han J, Petersson L, and Harandi M
Abstract: Learning a latent embedding to understand the underlying nature of data distribution is often formulated in Euclidean spaces with zero curvature. However, the success of the geometry constraints, posed in the embedding space, indicates that curved spaces might encode more structural information, leading to better discriminative power and hence richer representations. In this work, we investigate the benefits of the curved space for analyzing anomalous, open-set, or out-of-distribution (OOD) objects in data. This is achieved by considering embeddings via three geometry constraints, namely, spherical geometry (with positive curvature), hyperbolic geometry (with negative curvature), or mixed geometry (with both positive and negative curvatures). Three geometric constraints can be chosen interchangeably in a unified design, given the task at hand. Tailored for the embeddings in the curved space, we also formulate functions to compute the anomaly score. Two types of geometric modules (i.e., geometric-in-one (GiO) and geometric-in-two (GiT) models) are proposed to plug in the original Euclidean classifier, and anomaly scores are computed from the curved embeddings. We evaluate the resulting designs under a diverse set of visual recognition scenarios, including image detection (multiclass OOD detection and one-class anomaly detection) and segmentation (multiclass anomaly segmentation and one-class anomaly segmentation). The empirical results show the effectiveness of our proposal through consistent improvement over various scenarios. The code is made available at https://github.com/JHome1/GiO-GiT.
Published: 2023
Full Text: View/download PDF
