Author: "Qi, Zekun" - Searchworks@Jio Institute Digital Library Search Results

Your search for "Qi, Zekun" returned 11 results

1. Positional Prompt Tuning for Efficient 3D Representation Learning

Author: Zhang, Shaochen, Qi, Zekun, Dong, Runpei, Bai, Xiuxiu, and Wei, Xing
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Point cloud analysis has achieved significant development and is well-performed in multiple downstream tasks like point cloud classification and segmentation, etc. Being conscious of the simplicity of the position encoding structure in Transformer-based architectures, we attach importance to the position encoding as a high-dimensional part and the patch encoder to offer multi-scale information. Together with the sequential Transformer, the whole module with position encoding comprehensively constructs a multi-scale feature abstraction module that considers both the local parts from the patch and the global parts from center points as position encoding. With only a few parameters, the position embedding module fits the setting of PEFT (Parameter-Efficient Fine-Tuning) tasks pretty well. Thus we unfreeze these parameters as a fine-tuning part. At the same time, we review the existing prompt and adapter tuning methods, proposing a fresh way of prompts and synthesizing them with adapters as dynamic adjustments. Our proposed method of PEFT tasks, namely PPT, with only 1.05% of parameters for training, gets state-of-the-art results in several mainstream datasets, such as 95.01% accuracy in the ScanObjectNN OBJ_BG dataset. Codes will be released at https://github.com/zsc000722/PPT.
Comment: tech report
Published: 2024

2. DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation

Author: Peng, Yuang, Cui, Yuxin, Tang, Haomiao, Qi, Zekun, Dong, Runpei, Bai, Jing, Han, Chunrui, Ge, Zheng, Zhang, Xiangyu, and Xia, Shu-Tao
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Personalized image generation holds great promise in assisting humans in everyday work and life due to its impressive function in creatively generating personalized content. However, current evaluations either are automated but misalign with humans or require human evaluations that are time-consuming and expensive. In this work, we present DreamBench++, a human-aligned benchmark automated by advanced multimodal GPT models. Specifically, we systematically design the prompts to let GPT be both human-aligned and self-aligned, empowered with task reinforcement. Further, we construct a comprehensive dataset comprising diverse images and prompts. By benchmarking 7 modern generative models, we demonstrate that DreamBench++ results in significantly more human-aligned evaluation, helping boost the community with innovative findings.
Comment: Project page: https://dreambenchplus.github.io/
Published: 2024

3. ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

Author: Qi, Zekun, Dong, Runpei, Zhang, Shaochen, Geng, Haoran, Han, Chunrui, Ge, Zheng, Yi, Li, and Ma, Kaisheng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction, exploring a universal 3D object understanding with 3D point clouds and languages. ShapeLLM is built upon an improved 3D encoder by extending ReCon to ReCon++ that benefits from multi-view image distillation for enhanced geometry understanding. By utilizing ReCon++ as the 3D point cloud input encoder for LLMs, ShapeLLM is trained on constructed instruction-following data and tested on our newly human-curated benchmark, 3D MM-Vet. ReCon++ and ShapeLLM achieve state-of-the-art performance in 3D geometry understanding and language-unified 3D interaction tasks, such as embodied visual grounding. Project page: https://qizekun.github.io/shapellm/
Comment: Accepted at ECCV 2024
Published: 2024

4. ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

Author: Qi, Zekun, Dong, Runpei, Zhang, Shaochen, Geng, Haoran, Han, Chunrui, Ge, Zheng, Yi, Li, Ma, Kaisheng, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
Published: 2025
Full Text: View/download PDF

5. DreamLLM: Synergistic Multimodal Comprehension and Creation

Author: Dong, Runpei, Han, Chunrui, Peng, Yuang, Qi, Zekun, Ge, Zheng, Yang, Jinrong, Zhao, Liang, Sun, Jianjian, Zhou, Hongyu, Wei, Haoran, Kong, Xiangwen, Zhang, Xiangyu, Ma, Kaisheng, and Yi, Li
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: This paper presents DreamLLM, a learning framework that first achieves versatile Multimodal Large Language Models (MLLMs) empowered with frequently overlooked synergy between multimodal comprehension and creation. DreamLLM operates on two fundamental principles. The first focuses on the generative modeling of both language and image posteriors by direct sampling in the raw multimodal space. This approach circumvents the limitations and information loss inherent to external feature extractors like CLIP, and a more thorough multimodal understanding is obtained. Second, DreamLLM fosters the generation of raw, interleaved documents, modeling both text and image contents, along with unstructured layouts. This allows DreamLLM to learn all conditional, marginal, and joint multimodal distributions effectively. As a result, DreamLLM is the first MLLM capable of generating free-form interleaved content. Comprehensive experiments highlight DreamLLM's superior performance as a zero-shot multimodal generalist, reaping from the enhanced learning synergy. Project page: https://dreamllm.github.io
Comment: ICLR 2024 (Spotlight)
Published: 2023

6. VPP: Efficient Conditional 3D Generation via Voxel-Point Progressive Representation

Author: Qi, Zekun, Yu, Muzhou, Dong, Runpei, and Ma, Kaisheng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Conditional 3D generation is undergoing a significant advancement, enabling the free creation of 3D content from inputs such as text or 2D images. However, previous approaches have suffered from low inference efficiency, limited generation categories, and restricted downstream applications. In this work, we revisit the impact of different 3D representations on generation quality and efficiency. We propose a progressive generation method through Voxel-Point Progressive Representation (VPP). VPP leverages structured voxel representation in the proposed Voxel Semantic Generator and the sparsity of unstructured point representation in the Point Upsampler, enabling efficient generation of multi-category objects. VPP can generate high-quality 8K point clouds within 0.2 seconds. Additionally, the masked generation Transformer allows for various 3D downstream tasks, such as generation, editing, completion, and pre-training. Extensive experiments demonstrate that VPP efficiently generates high-fidelity and diverse 3D shapes across different categories, while also exhibiting excellent representation transfer performance. Codes will be released at https://github.com/qizekun/VPP.
Comment: Accepted at NeurIPS 2023
Published: 2023

7. Point-GCC: Universal Self-supervised 3D Scene Pre-training via Geometry-Color Contrast

Author: Fan, Guofan, Qi, Zekun, Shi, Wenkai, and Ma, Kaisheng
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Geometry and color information provided by the point clouds are both crucial for 3D scene understanding. Two pieces of information characterize the different aspects of point clouds, but existing methods lack an elaborate design for the discrimination and relevance. Hence we explore a 3D self-supervised paradigm that can better utilize the relations of point cloud information. Specifically, we propose a universal 3D scene pre-training framework via Geometry-Color Contrast (Point-GCC), which aligns geometry and color information using a Siamese network. To take care of actual application tasks, we design (i) hierarchical supervision with point-level contrast and reconstruct and object-level contrast based on the novel deep clustering module to close the gap between pre-training and downstream tasks; (ii) architecture-agnostic backbone to adapt for various downstream models. Benefiting from the object-level representation associated with downstream tasks, Point-GCC can directly evaluate model performance and the result demonstrates the effectiveness of our methods. Transfer learning results on a wide range of tasks also show consistent improvements across all datasets, e.g., new state-of-the-art object detection results on SUN RGB-D and S3DIS datasets. Codes will be released at https://github.com/Asterisci/Point-GCC.
Published: 2023

8. Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining

Author: Qi, Zekun, Dong, Runpei, Fan, Guofan, Ge, Zheng, Zhang, Xiangyu, Ma, Kaisheng, and Yi, Li
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Mainstream 3D representation learning approaches are built upon contrastive or generative modeling pretext tasks, where great improvements in performance on various downstream tasks have been achieved. However, we find these two paradigms have different characteristics: (i) contrastive models are data-hungry that suffer from a representation over-fitting issue; (ii) generative models have a data filling issue that shows inferior data scaling capacity compared to contrastive models. This motivates us to learn 3D representations by sharing the merits of both paradigms, which is non-trivial due to the pattern difference between the two paradigms. In this paper, we propose Contrast with Reconstruct (ReCon) that unifies these two paradigms. ReCon is trained to learn from both generative modeling teachers and single/cross-modal contrastive teachers through ensemble distillation, where the generative student guides the contrastive student. An encoder-decoder style ReCon-block is proposed that transfers knowledge through cross attention with stop-gradient, which avoids pretraining over-fitting and pattern difference issues. ReCon achieves a new state-of-the-art in 3D representation learning, e.g., 91.26% accuracy on ScanObjectNN. Codes have been released at https://github.com/qizekun/ReCon.
Comment: Accepted at ICML 2023
Published: 2023

9. Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning?

Author: Dong, Runpei, Qi, Zekun, Zhang, Linfeng, Zhang, Junbo, Sun, Jianjian, Ge, Zheng, Yi, Li, and Ma, Kaisheng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The success of deep learning heavily relies on large-scale data with comprehensive labels, which is more expensive and time-consuming to fetch in 3D compared to 2D images or natural languages. This promotes the potential of utilizing models pretrained with data more than 3D as teachers for cross-modal knowledge transferring. In this paper, we revisit masked modeling in a unified fashion of knowledge distillation, and we show that foundational Transformers pretrained with 2D images or natural languages can help self-supervised 3D representation learning through training Autoencoders as Cross-Modal Teachers (ACT). The pretrained Transformers are transferred as cross-modal 3D teachers using discrete variational autoencoding self-supervision, during which the Transformers are frozen with prompt tuning for better knowledge inheritance. The latent features encoded by the 3D teachers are used as the target of masked point modeling, wherein the dark knowledge is distilled to the 3D Transformer students as foundational geometry understanding. Our ACT pretrained 3D learner achieves state-of-the-art generalization capacity across various downstream benchmarks, e.g., 88.21% overall accuracy on ScanObjectNN. Codes have been released at https://github.com/RunpeiDong/ACT.
Comment: Accepted at ICLR 2023
Published: 2022

10. Rapid start-up of a nitritation granular reactor using activated sludge as inoculum at the influent organics/ammonium mass ratio of 2/1

Author: Wang, Jianfang, Zhang, Zeyu, Qian, Feiyue, Shen, Yaoliang, Qi, Zekun, Ji, Xiaoqing, and Kajamisso, Emma Marcello Lagu
Published: 2018
Full Text: View/download PDF

11. Bidirectional transformation between BPMN and BPEL with graph grammar

Author: Shi, Zhan, Zeng, Xiaoqin, Zhang, Tingting, Huang, Song, Qi, Zekun, Li, Hui, Hu, Bin, Yao, Yi, and Zhong, Shuiming
Published: 2016
Full Text: View/download PDF
