Author: "Lu, Qinglin" / Search Limiters: Available in Library Collection - Searchworks@Jio Institute Digital Library Search Results

Your search for author "Lu, Qinglin" returned 21 results.

1. Searching Priors Makes Text-to-Video Synthesis Better

Author: Cheng, Haoran, Peng, Liang, Xia, Linxuan, Hu, Yuepeng, Li, Hengjia, Lu, Qinglin, He, Xiaofei, and Wu, Boxi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Significant advancements in video diffusion models have brought substantial progress to the field of text-to-video (T2V) synthesis. However, existing T2V synthesis models struggle to accurately generate complex motion dynamics, which reduces video realism. One possible solution is to collect massive amounts of data and train the model on it, but this would be extremely expensive. To alleviate this problem, in this paper, we reformulate the typical T2V generation process as a search-based generation pipeline. Instead of scaling up model training, we employ existing videos as a motion prior database. Specifically, we divide the T2V generation process into two steps: (i) for a given input prompt, we search existing text-video datasets to find videos whose text labels closely match the prompt motions, using a tailored search algorithm that emphasizes object motion features; (ii) the retrieved videos are processed and distilled into motion priors to fine-tune a pre-trained base T2V model, which then generates the desired videos from the input prompt. By utilizing the priors gleaned from the searched videos, we enhance the realism of the generated videos' motion. All operations can be performed on a single NVIDIA RTX 4090 GPU. We validate our method against state-of-the-art T2V models across diverse prompt inputs. The code will be made public.
Published: 2024

2. Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Author: Li, Zhimin, Zhang, Jianwei, Lin, Qin, Xiong, Jiangfeng, Long, Yanxin, Deng, Xinchi, Zhang, Yingfang, Liu, Xingchao, Huang, Minbin, Xiao, Zedong, Chen, Dayou, He, Jiajun, Li, Jiahao, Li, Wenyue, Zhang, Chen, Quan, Rongwei, Lu, Jianxiang, Huang, Jiabin, Yuan, Xiaoyan, Zheng, Xiaoxiao, Li, Yixuan, Zhang, Jihong, Zhang, Chao, Chen, Meng, Liu, Jie, Fang, Zheng, Wang, Weiyan, Xue, Jinbao, Tao, Yangyu, Zhu, Jianchen, Liu, Kai, Lin, Sihuan, Sun, Yifu, Li, Yun, Wang, Dongdong, Chen, Mingtao, Hu, Zhichao, Xiao, Xiao, Chen, Yan, Liu, Yuhong, Liu, Wei, Wang, Di, Yang, Yong, Jiang, Jie, and Lu, Qinglin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present Hunyuan-DiT, a text-to-image diffusion transformer with fine-grained understanding of both English and Chinese. To construct Hunyuan-DiT, we carefully design the transformer structure, text encoder, and positional encoding. We also build a complete data pipeline from scratch to update and evaluate data for iterative model optimization. For fine-grained language understanding, we train a Multimodal Large Language Model to refine the captions of the images. Finally, Hunyuan-DiT can perform multi-turn multimodal dialogue with users, generating and refining images according to the context. Through our holistic human evaluation protocol with more than 50 professional human evaluators, Hunyuan-DiT sets a new state of the art in Chinese-to-image generation compared with other open-source models. Code and pretrained models are publicly available at github.com/Tencent/HunyuanDiT.
Comment: Project page: https://dit.hunyuan.tencent.com/
Published: 2024

3. LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models

Author: Yang, Yang, Wang, Wen, Peng, Liang, Song, Chaotian, Chen, Yao, Li, Hengjia, Yang, Xiaolong, Lu, Qinglin, Cai, Deng, Wu, Boxi, and Liu, Wei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Customization generation techniques have significantly advanced the synthesis of specific concepts across varied contexts. Multi-concept customization has emerged as a particularly challenging task within this domain. Existing approaches often rely on training a fusion matrix of multiple Low-Rank Adaptations (LoRAs) to merge various concepts into a single image. However, we identify two major challenges facing this straightforward method: 1) concept confusion, where the model struggles to preserve distinct individual characteristics, and 2) concept vanishing, where the model fails to generate the intended subjects. To address these issues, we introduce LoRA-Composer, a training-free framework designed to seamlessly integrate multiple LoRAs, thereby enhancing the harmony among different concepts within generated images. LoRA-Composer addresses concept vanishing through concept injection constraints, which enhance concept visibility via an expanded cross-attention mechanism. To combat concept confusion, it introduces concept isolation constraints that refine the self-attention computation. Furthermore, latent re-initialization is proposed to effectively stimulate concept-specific latents within designated regions. Our extensive testing shows a notable improvement in LoRA-Composer's performance over standard baselines, especially when image-based conditions such as Canny edges or pose estimations are removed. Code is released at https://github.com/Young98CN/LoRA_Composer.
Comment: Project page: https://github.com/Young98CN/LoRA_Composer
Published: 2024

4. DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation

Author: Huang, Minbin, Long, Yanxin, Deng, Xinchi, Chu, Ruihang, Xiong, Jiangfeng, Liang, Xiaodan, Cheng, Hong, Lu, Qinglin, and Liu, Wei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Text-to-image (T2I) generation models have significantly advanced in recent years. However, effective interaction with these models is challenging for average users due to the need for specialized prompt engineering knowledge and the inability to perform multi-turn image generation, hindering a dynamic and iterative creation process. Recent attempts have tried to equip Multi-modal Large Language Models (MLLMs) with T2I models to bring the user's natural language instructions into reality. Hence, the output modality of MLLMs is extended, and the multi-turn generation quality of T2I models is enhanced thanks to the strong multi-modal comprehension ability of MLLMs. However, many of these works face challenges in identifying correct output modalities and generating coherent images accordingly as the number of output modalities increases and the conversations go deeper. Therefore, we propose DialogGen, an effective pipeline to align off-the-shelf MLLMs and T2I models to build a Multi-modal Interactive Dialogue System (MIDS) for multi-turn Text-to-Image generation. It is composed of drawing prompt alignment, careful training data curation, and error correction. Moreover, as the field of MIDS flourishes, comprehensive benchmarks are urgently needed to evaluate MIDS fairly in terms of output modality correctness and multi-modal output coherence. To address this issue, we introduce the Multi-modal Dialogue Benchmark (DialogBen), a comprehensive bilingual benchmark designed to assess the ability of MLLMs to generate accurate and coherent multi-modal content that supports image editing. It contains two evaluation metrics to measure the model's ability to switch modalities and the coherence of the output images. Our extensive experiments on DialogBen and user study demonstrate the effectiveness of DialogGen compared with other state-of-the-art models.
Comment: Project page: https://hunyuan-dialoggen.github.io/
Published: 2024

5. Local Conditional Controlling for Text-to-Image Diffusion Models

Author: Zhao, Yibo, Peng, Liang, Yang, Yang, Luo, Zekai, Li, Hengjia, Chen, Yao, Yang, Zheng, He, Xiaofei, Zhao, Wei, Lu, Qinglin, Wu, Boxi, and Liu, Wei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Diffusion models have exhibited impressive prowess in the text-to-image task. Recent methods add image-level structure controls, e.g., edge and depth maps, to manipulate the generation process together with text prompts to obtain desired images. This control is applied globally to the entire image, which limits flexibility in choosing control regions. In this paper, we explore a novel and practical task setting: local control. It focuses on controlling a specific local region according to user-defined image conditions, while the remaining regions are conditioned only by the original text prompt. However, achieving local conditional control is non-trivial. Naively adding local conditions directly may lead to the local control dominance problem, which forces the model to focus on the controlled region and neglect object generation in other regions. To mitigate this problem, we propose a Regional Discriminate Loss to update the noised latents, aiming to enhance object generation in non-control regions. Furthermore, the proposed Focused Token Response suppresses weaker attention scores that lack the strongest response, enhancing object distinction and reducing duplication. Lastly, we adopt a Feature Mask Constraint to reduce image quality degradation caused by information differences across the local control region. All proposed strategies operate at the inference stage. Extensive experiments demonstrate that our method can synthesize high-quality images aligned with the text prompt under local control conditions.
Published: 2023

6. SmoothVideo: Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning

Author: Peng, Liang, Cheng, Haoran, Yang, Zheng, Zhao, Ruisi, Xia, Linxuan, Song, Chaotian, Lu, Qinglin, Wu, Boxi, and Liu, Wei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent one-shot video tuning methods, which fine-tune the network on a specific video based on pre-trained text-to-image models (e.g., Stable Diffusion), are popular in the community because of their flexibility. However, these methods often produce videos marred by incoherence and inconsistency. To address these limitations, this paper introduces a simple yet effective noise constraint across video frames. This constraint aims to regulate noise predictions across temporal neighbors, resulting in smooth latents, and can simply be included as a loss term during the training phase. By applying the loss to existing one-shot video tuning methods, we significantly improve the overall consistency and smoothness of the generated videos. Furthermore, we argue that current video evaluation metrics inadequately capture smoothness. To address this, we introduce a novel metric that considers detailed features and their temporal dynamics. Experimental results validate the effectiveness of our approach in producing smoother videos on various one-shot video tuning baselines. The source code and video demos are available at https://github.com/SPengLiang/SmoothVideo.
Published: 2023

7. Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene Segmentation

Author: Jiang, Jie, Li, Zhimin, Xiong, Jiangfeng, Quan, Rongwei, Lu, Qinglin, and Liu, Wei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Temporal video segmentation and classification have advanced greatly thanks to public benchmarks in recent years. However, such research still mainly focuses on human actions and fails to describe videos holistically. In addition, previous research tends to pay much attention to visual information while ignoring the multi-modal nature of videos. To fill this gap, we construct the Tencent 'Ads Video Segmentation' (TAVS) dataset in the ads domain to take multi-modal video analysis to a new level. TAVS describes videos from three independent perspectives, 'presentation form', 'place', and 'style', and contains rich multi-modal information such as video, audio, and text. TAVS is organized hierarchically in semantic aspects for comprehensive temporal video segmentation, with three levels of categories for multi-label classification, e.g., 'place' > 'working place' > 'office'. TAVS is thus distinguished from previous temporal segmentation datasets by its multi-modal information, holistic view of categories, and hierarchical granularities. It includes 12,000 videos, 82 classes, 33,900 segments, 121,100 shots, and 168,500 labels. Along with TAVS, we also present a strong multi-modal video segmentation baseline coupled with multi-label class prediction. Extensive experiments are conducted to evaluate our proposed method as well as existing representative methods to reveal the key challenges of the TAVS dataset.
Published: 2022

8. Multi-modal Segment Assemblage Network for Ad Video Editing with Importance-Coherence Reward

Author: Tang, Yunlong, Xu, Siting, Wang, Teng, Lin, Qin, Lu, Qinglin, and Zheng, Feng
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Multimedia
Abstract: Advertisement video editing aims to automatically edit advertising videos into shorter videos while retaining coherent content and the crucial information conveyed by advertisers. It mainly consists of two stages: video segmentation and segment assemblage. The existing method performs well at the video segmentation stage but depends on extra, cumbersome models and performs poorly at the segment assemblage stage. To address these problems, we propose M-SAN (Multi-modal Segment Assemblage Network), which performs the segment assemblage task efficiently, coherently, and end-to-end. It utilizes multi-modal representations extracted from the segments and follows the Encoder-Decoder Ptr-Net framework with an attention mechanism. An importance-coherence reward is designed for training M-SAN. We experiment on the Ads-1k dataset, which contains 1000+ videos covering rich ad scenarios collected from advertisers. To evaluate the methods, we propose a unified metric, Imp-Coh@Time, which jointly assesses the importance, coherence, and duration of the outputs. Experimental results show that our method outperforms random selection and the previous method on this metric. Ablation experiments further verify that the multi-modal representation and the importance-coherence reward significantly improve performance. The Ads-1k dataset is available at: https://github.com/yunlong10/Ads-1k.
Comment: Accepted by ACCV 2022
Published: 2022

9. Neuroprotective effect of triptolide on neuronal inflammation in rats with mild brain injury

Author: Fang, Zhanglu, Shen, Guanghong, Lou, Chengjian, Botchway, Benson O.A., Lu, Qinglin, Yang, Qining, and Amin, Nashwa
Published: 2024

10. Overview of Tencent Multi-modal Ads Video Understanding Challenge

Author: Wang, Zhenzhi, Wu, Liyu, Li, Zhimin, Xiong, Jiangfeng, and Lu, Qinglin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The Multi-modal Ads Video Understanding Challenge is the first grand challenge aiming to comprehensively understand ads videos. The challenge includes two tasks: video structuring in the temporal dimension and multi-modal video classification. It asks participants to accurately predict both the scene boundaries and the multi-label categories of each scene based on a fine-grained, ads-related category hierarchy. Our task therefore has four features that distinguish it from previous ones: the ads domain, multi-modal information, temporal segmentation, and multi-label classification. It will advance the foundations of ads video understanding and have a significant impact on many ads applications, such as video recommendation. This paper presents an overview of our challenge, including the background of ads videos, a detailed description of the task and dataset, the evaluation protocol, and our proposed baseline. By ablating the key components of our baseline, we aim to reveal the main challenges of this task and provide useful guidance for future research in this area. In this paper, we give an extended version of our challenge overview. The dataset will be publicly available at https://algo.qq.com/.
Comment: 8-page extended version of our challenge paper in ACM MM 2021. It presents the overview of the grand challenge "Multi-modal Ads Video Understanding" in ACM MM 2021. Our grand challenge is also the Tencent Advertising Algorithm Competition (TAAC) 2021.
Published: 2021

11. BBOF1 is required for sperm motility and male fertility by stabilizing the flagellar axoneme in mice

Author: Cao, Huiwen, Xu, Haomang, Zhou, Yiqing, Xu, Wei, Lu, Qinglin, Jiang, Lingying, Rong, Yan, Zhang, Qianting, and Yu, Chao
Published: 2023

12. Unimodality of the independence polynomials of some composite graphs

Author: Zhu, Bao-Xuan and Lu, Qinglin
Subjects: Mathematics - Combinatorics, 05A20, 05A15, 05C31
Abstract: Let $I(G;x)$ denote the independence polynomial of a graph $G$. In this paper we study the unimodality properties of $I(G;x)$ for some composite graphs $G$. Given two graphs $G_1$ and $G_2$, let $G_1[G_2]$ denote the lexicographic product of $G_1$ and $G_2$. Assume $I(G_1;x)=\sum_{i\geq0}a_ix^i$ and $I(G_2;x)=\sum_{i\geq0}b_ix^i$, where $I(G_2;x)$ is log-concave. Then we prove that (i) if $I(G_1;x)$ is log-concave and $(a^2_i-a_{i-1}a_{i+1})b^2_1\geq a_ia_{i-1}b_2$ for all $1\leq i \leq \alpha(G_1)$, then $I(G_1[G_2];x)$ is log-concave; (ii) if $a_{i-1}\leq b_1a_i$ for $1\leq i\leq \alpha(G_1)$, then $I(G_1[G_2];x)$ is unimodal. In particular, if $a_i$ is increasing in $i$, then $I(G_1[G_2];x)$ is unimodal. We also give two sufficient conditions for the independence polynomial of a complete multipartite graph to be unimodal or log-concave. Finally, for every odd positive integer $\alpha > 3$, we find a connected graph $G$ that is not a tree such that $\alpha(G) =\alpha$ and $I(G; x)$ is symmetric and has only real zeros. This answers a problem of Mandrescu and Mirică.
Comment: To appear in Filomat
Published: 2015
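To make the definitions in the abstract concrete (this is an illustration, not the authors' proof technique), the following Python sketch computes the coefficients $a_k$ of $I(G;x)$ by brute-force enumeration of independent sets, builds the lexicographic product $G_1[G_2]$ (where $(u_1,v_1)\sim(u_2,v_2)$ iff $u_1\sim u_2$ in $G_1$, or $u_1=u_2$ and $v_1\sim v_2$ in $G_2$), and checks log-concavity and unimodality of a coefficient sequence. The function names are hypothetical, and brute force is only practical for very small graphs.

```python
from itertools import combinations


def independence_poly(n, edges):
    """Return [a_0, a_1, ...], where a_k counts independent sets of size k
    in the graph on vertices 0..n-1 (brute force; small graphs only)."""
    adj = {frozenset(e) for e in edges}
    coeffs = [0] * (n + 1)
    for k in range(n + 1):
        for s in combinations(range(n), k):
            # A set is independent if no pair of its vertices is an edge.
            if all(frozenset((u, v)) not in adj for u, v in combinations(s, 2)):
                coeffs[k] += 1
    # Trim trailing zeros so the degree equals the independence number alpha(G).
    while len(coeffs) > 1 and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs


def lexicographic_product(n1, e1, n2, e2):
    """Hypothetical helper: build G1[G2] and return (vertex count, edge list)."""
    adj1 = {frozenset(e) for e in e1}
    adj2 = {frozenset(e) for e in e2}
    verts = [(u, v) for u in range(n1) for v in range(n2)]
    index = {p: i for i, p in enumerate(verts)}
    edges = []
    for a, b in combinations(verts, 2):
        (u1, v1), (u2, v2) = a, b
        if frozenset((u1, u2)) in adj1 or (u1 == u2 and frozenset((v1, v2)) in adj2):
            edges.append((index[a], index[b]))
    return len(verts), edges


def is_log_concave(a):
    """a_i^2 >= a_{i-1} a_{i+1} for every interior index i."""
    return all(a[i] * a[i] >= a[i - 1] * a[i + 1] for i in range(1, len(a) - 1))


def is_unimodal(a):
    """Non-decreasing up to some index, then non-increasing."""
    i = 0
    while i + 1 < len(a) and a[i] <= a[i + 1]:
        i += 1
    while i + 1 < len(a) and a[i] >= a[i + 1]:
        i += 1
    return i == len(a) - 1


if __name__ == "__main__":
    # G1 = path P3 (0-1-2), G2 = two isolated vertices.
    n, e = lexicographic_product(3, [(0, 1), (1, 2)], 2, [])
    coeffs = independence_poly(n, e)
    print(coeffs, is_log_concave(coeffs), is_unimodal(coeffs))
```

For $G_1 = P_3$ and $G_2$ consisting of two isolated vertices, the sketch prints `[1, 6, 7, 4, 1] True True`; for the path $P_4$ alone it gives `[1, 4, 3]`.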

13. Identification of wheat seedling varieties based on MssiapNet

Author: Feng, Yongqiang, Liu, Chengzhong, Han, Junying, Lu, Qinglin, and Xing, Xue
Published: 2024

14. Context-free grammars, generating functions and combinatorial arrays

Author: Zhu, Bao-Xuan, Yeh, Yeong-Nan, and Lu, Qinglin
Published: 2019

15. Wheat-Seed Variety Recognition Based on the GC_DRNet Model

Author: Xing, Xue, Liu, Chengzhong, Han, Junying, Feng, Quan, Lu, Qinglin, and Feng, Yongqiang
Published: 2023

16. Unimodality of the Independence Polynomials of Some Composite Graphs

Author: Zhu, Bao-Xuan and Lu, Qinglin
Published: 2017

17. IRB-5-CA Net: A Lightweight, Deep Learning-based Approach to Wheat Seed Identification

Author: Feng, Yongqiang, Liu, Chengzhong, Han, Junying, Lu, Qinglin, and Xing, Xue
Published: 2023

18. Tencent AVS: A Holistic Ads Video Dataset for Multi-Modal Scene Segmentation

Author: Jiang, Jie, Li, Zhimin, Xiong, Jiangfeng, Quan, Rongwei, Lu, Qinglin, and Liu, Wei
Published: 2022

19. Identification Method of Wheat Cultivars by Using a Convolutional Neural Network Combined with Images of Multiple Growth Periods of Wheat

Author: Gao, Jiameng, Liu, Chengzhong, Han, Junying, Lu, Qinglin, Wang, Hengxing, Zhang, Jianhua, Bai, Xuguang, and Luo, Jiake
Published: 2021

20. Identification of Genomic Regions Controlling Adult-Plant Stripe Rust Resistance in Chinese Landrace Pingyuan 50 Through Bulked Segregant Analysis

Author: Lan, Caixia, Liang, Shanshan, Zhou, Xiangchun, Zhou, Gang, Lu, Qinglin, Xia, Xianchun, and He, Zhonghu
Published: 2010

21. Wideband Noise Interference Suppression for Sparsity-Based SAR Imaging Based on Dechirping and Double Subspace Extraction.

Author: Li, Guojing, Lu, Qinglin, Lao, Guochao, and Ye, Wei
Subjects: Interference suppression, Signal reconstruction, Orthogonal matching pursuit, Noise, Synthetic aperture radar
Abstract: Sparsity-based synthetic aperture radar (SAR) imaging has attracted much attention because of its potential advantages in improving image quality and reducing the sampling rate. However, it is vulnerable to deliberate blanket disturbance, especially wideband noise interference (WBNI), which severely degrades imaging quality. This paper focuses on WBNI suppression for SAR imaging from a new perspective: sparse recovery. We first analyze the impact of WBNI on signal reconstruction by deriving the interference energy projected onto the true support set of the signal under different observation parameters. Based on the derived results, we propose a novel WBNI suppression algorithm based on dechirping and double subspace extraction (DDSE), in which the signal of interest (SOI) is reconstructed by exploiting the known geometric prior and waveform prior, respectively. Experimental results show that the DDSE-based WBNI suppression algorithm for sparsity-based SAR imaging is effective and outperforms the other algorithms.
Published: 2019
