Author: "Ren, Jian" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Ren, Jian"' showing total 5,508 results

1. Scalable Ranked Preference Optimization for Text-to-Image Generation

Author: Karthik, Shyamgopal, Coskun, Huseyin, Akata, Zeynep, Tulyakov, Sergey, Ren, Jian, and Kag, Anil
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Direct Preference Optimization (DPO) has emerged as a powerful approach to align text-to-image (T2I) models with human feedback. Unfortunately, successful application of DPO to T2I models requires a huge amount of resources to collect and label large-scale datasets, e.g., millions of generated paired images annotated with human preferences. In addition, these human preference datasets can get outdated quickly as the rapid improvements of T2I models lead to higher quality images. In this work, we investigate a scalable approach for collecting large-scale and fully synthetic datasets for DPO training. Specifically, the preferences for paired images are generated using a pre-trained reward function, eliminating the need for involving humans in the annotation process, greatly improving the dataset collection efficiency. Moreover, we demonstrate that such datasets allow averaging predictions across multiple models and collecting ranked preferences as opposed to pairwise preferences. Furthermore, we introduce RankDPO to enhance DPO-based methods using the ranking feedback. Applying RankDPO on SDXL and SD3-Medium models with our synthetically generated preference dataset "Syn-Pic" improves both prompt-following (on benchmarks like T2I-Compbench, GenEval, and DPG-Bench) and visual quality (through user studies). This pipeline presents a practical and scalable solution to develop better preference datasets to enhance the performance of text-to-image models.
Comment: Project Page: https://snap-research.github.io/RankDPO/
Published: 2024

2. ControlMM: Controllable Masked Motion Generation

Author: Pinyoanuntapong, Ekkasit, Saleem, Muhammad Usama, Karunratanakul, Korrawe, Wang, Pu, Xue, Hongfei, Chen, Chen, Guo, Chuan, Cao, Junli, Ren, Jian, and Tulyakov, Sergey
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent advances in motion diffusion models have enabled spatially controllable text-to-motion generation. However, despite achieving acceptable control precision, these models suffer from generation speed and fidelity limitations. To address these challenges, we propose ControlMM, a novel approach incorporating spatial control signals into the generative masked motion model. ControlMM achieves real-time, high-fidelity, and high-precision controllable motion generation simultaneously. Our approach introduces two key innovations. First, we propose masked consistency modeling, which ensures high-fidelity motion generation via random masking and reconstruction, while minimizing the inconsistency between the input control signals and the extracted control signals from the generated motion. To further enhance control precision, we introduce inference-time logit editing, which manipulates the predicted conditional motion distribution so that the generated motion, sampled from the adjusted distribution, closely adheres to the input control signals. During inference, ControlMM enables parallel and iterative decoding of multiple motion tokens, allowing for high-speed motion generation. Extensive experiments show that, compared to the state of the art, ControlMM delivers superior results in motion quality, with better FID scores (0.061 vs 0.271), and higher control precision (average error 0.0091 vs 0.0108). ControlMM generates motions 20 times faster than diffusion-based methods. Additionally, ControlMM unlocks diverse applications such as any joint any frame control, body part timeline control, and obstacle avoidance. Video visualization can be found at https://exitudio.github.io/ControlMM-page.
Comment: Project page: https://exitudio.github.io/ControlMM-page
Published: 2024

3. Efficient Training with Denoised Neural Weights

Author: Gong, Yifan, Zhan, Zheng, Li, Yanyu, Idelbayev, Yerlan, Zharkov, Andrey, Aberman, Kfir, Tulyakov, Sergey, Wang, Yanzhi, and Ren, Jian
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Good weight initialization serves as an effective measure to reduce the training cost of a deep neural network (DNN) model. The choice of how to initialize parameters is challenging and may require manual tuning, which can be time-consuming and prone to human error. To overcome such limitations, this work takes a novel step towards building a weight generator to synthesize the neural weights for initialization. We use the image-to-image translation task with generative adversarial networks (GANs) as an example due to the ease of collecting model weights spanning a wide range. Specifically, we first collect a dataset with various image editing concepts and their corresponding trained weights, which are later used for the training of the weight generator. To address the different characteristics among layers and the substantial number of weights to be predicted, we divide the weights into equal-sized blocks and assign each block an index. Subsequently, a diffusion model is trained with such a dataset using both text conditions of the concept and the block indexes. By initializing the image translation model with the denoised weights predicted by our diffusion model, the training requires only 43.3 seconds. Compared to training from scratch (i.e., Pix2pix), we achieve a 15x training time acceleration for a new concept while obtaining even better image generation quality.
Comment: ECCV 2024. Project Page: https://yifanfanfanfan.github.io/denoised-weights/
Published: 2024

4. Lightweight Predictive 3D Gaussian Splats

Author: Cao, Junli, Goel, Vidit, Wang, Chaoyang, Kag, Anil, Hu, Ju, Korolev, Sergei, Jiang, Chenfanfu, Tulyakov, Sergey, and Ren, Jian
Subjects: Computer Science - Graphics, Computer Science - Artificial Intelligence
Abstract: Recent approaches representing 3D objects and scenes using Gaussian splats show increased rendering speed across a variety of platforms and devices. While rendering such representations is indeed extremely efficient, storing and transmitting them is often prohibitively expensive. To represent large-scale scenes, one often needs to store millions of 3D Gaussians, occupying gigabytes of disk space. This poses a very practical limitation, prohibiting widespread adoption. Several solutions have been proposed to strike a balance between disk size and rendering quality, noticeably reducing the visual quality. In this work, we propose a new representation that dramatically reduces the hard drive footprint while featuring similar or improved quality when compared to the standard 3D Gaussian splats. When compared to other compact solutions, ours offers higher quality renderings with significantly reduced storage, being able to efficiently run on a mobile device in real-time. Our key observation is that nearby points in the scene can share similar representations. Hence, only a small ratio of 3D points needs to be stored. We introduce an approach to identify such points, which are called parent points. The discarded points, called children points, along with their attributes can be efficiently predicted by tiny MLPs.
Comment: Project Page: https://plumpuddings.github.io/LPGS//
Published: 2024

5. BitsFusion: 1.99 bits Weight Quantization of Diffusion Model

Author: Sui, Yang, Li, Yanyu, Kag, Anil, Idelbayev, Yerlan, Cao, Junli, Hu, Ju, Sagar, Dhritiman, Yuan, Bo, Tulyakov, Sergey, and Ren, Jian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Diffusion-based image generation models have achieved great success in recent years by showing the capability of synthesizing high-quality content. However, these models contain a huge number of parameters, resulting in a significantly large model size. Saving and transferring them is a major bottleneck for various applications, especially those running on resource-constrained devices. In this work, we develop a novel weight quantization method that quantizes the UNet from Stable Diffusion v1.5 to 1.99 bits, achieving a model with 7.9X smaller size while exhibiting even better generation quality than the original one. Our approach includes several novel techniques, such as assigning optimal bits to each layer, initializing the quantized model for better performance, and improving the training strategy to dramatically reduce quantization error. Furthermore, we extensively evaluate our quantized model across various benchmark datasets and through human evaluation to demonstrate its superior generation quality.
Comment: NeurIPS 2024. Project Page: https://snap-research.github.io/BitsFusion
Published: 2024

6. SF-V: Single Forward Video Generation Model

Author: Zhang, Zhixing, Li, Yanyu, Wu, Yushu, Xu, Yanwu, Kag, Anil, Skorokhodov, Ivan, Menapace, Willi, Siarohin, Aliaksandr, Cao, Junli, Metaxas, Dimitris, Tulyakov, Sergey, and Ren, Jian
Subjects: Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computational costs. In this work, we propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune pre-trained video diffusion models. We show that, through the adversarial training, the multi-step video diffusion model, i.e., Stable Video Diffusion (SVD), can be trained to perform a single forward pass to synthesize high-quality videos, capturing both temporal and spatial dependencies in the video data. Extensive experiments demonstrate that our method achieves competitive generation quality of synthesized videos with significantly reduced computational overhead for the denoising process (i.e., around $23\times$ speedup compared with SVD and $6\times$ speedup compared with existing works, with even better generation quality), paving the way for real-time video synthesis and editing. More visualization results are made publicly available at https://snap-research.github.io/SF-V.
Comment: Project Page: https://snap-research.github.io/SF-V
Published: 2024

7. Quality-Diversity with Limited Resources

Author: Wang, Ren-Jian, Xue, Ke, Guan, Cong, and Qian, Chao
Subjects: Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing
Abstract: Quality-Diversity (QD) algorithms have emerged as a powerful optimization paradigm with the aim of generating a set of high-quality and diverse solutions. To achieve such a challenging goal, QD algorithms require maintaining a large archive and a large population in each iteration, which brings two main issues, sample and resource efficiency. Most advanced QD algorithms focus on improving the sample efficiency, while the resource efficiency is overlooked to some extent. Particularly, the resource overhead during the training process has not been touched yet, hindering the wider application of QD algorithms. In this paper, we highlight this important research question, i.e., how to efficiently train QD algorithms with limited resources, and propose a novel and effective method called RefQD to address it. RefQD decomposes a neural network into representation and decision parts, and shares the representation part with all decision parts in the archive to reduce the resource overhead. It also employs a series of strategies to address the mismatch issue between the old decision parts and the newly updated representation part. Experiments on different types of tasks from small to large resource consumption demonstrate the excellent performance of RefQD: it not only uses significantly fewer resources (e.g., 16\% GPU memories on QDax and 3.7\% on Atari) but also achieves comparable or better performance compared to sample-efficient QD algorithms. Our code is available at \url{https://github.com/lamda-bbo/RefQD}.
Comment: ICML 2024
Published: 2024

8. Calibrating non-parametric morphological indicators from {\it JWST} images for galaxies over $0.5<z<3$

Author: Ren, Jian, Liu, F. S., Li, Nan, Cui, Qifan, Zhao, Pinsong, Li, Yubin, Song, Qi, Yesuf, Hassen M., and Zheng, Xian Zhong
Subjects: Astrophysics - Astrophysics of Galaxies
Abstract: The measurements of morphological indicators of galaxies are often influenced by a series of observational effects. In this study, we utilize a sample of over 800 TNG50 simulated galaxies with log($M_*$/M$_\odot$)$>9$ at $0.51$\,$\mu$m. The morphological indicators of star-forming galaxies (SFGs) and quiescent galaxies (QGs) are significantly different. The morphologies of QGs exhibit a higher sensitivity to rest-frame wavelength than SFGs. After analyzing the evolution of morphological indicators in the rest-frame V-band (0.5-0.7\,$\mu$m) and rest-frame J-band (1.1-1.4\,$\mu$m), we find that the morphologies of QGs evolve substantially with both redshift and stellar mass. For SFGs, the $C$, $Gini$ and $M_{\rm 20}$ show a rapid evolution with stellar mass at log($M_*$/M$_\odot$)$\geq10.5$, while the $A_{\rm O}$, $D_{\rm O}$ and $A$ evolve with both redshift and stellar mass. Our comparison shows that TNG50 simulations effectively reproduce the morphological indicators we measured from {\it JWST} observations when the impact of dust attenuation is considered.
Comment: 21 pages, 14 figures, 1 table. Accepted for publication in ApJ
Published: 2024

9. TextCraftor: Your Text Encoder Can be Image Quality Controller

Author: Li, Yanyu, Liu, Xian, Kag, Anil, Hu, Ju, Idelbayev, Yerlan, Sagar, Dhritiman, Wang, Yanzhi, Tulyakov, Sergey, and Ren, Jian
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Diffusion-based text-to-image generative models, e.g., Stable Diffusion, have revolutionized the field of content generation, enabling significant advancements in areas like image editing and video synthesis. Despite their formidable capabilities, these models are not without their limitations. It is still challenging to synthesize an image that aligns well with the input text, and multiple runs with carefully crafted prompts are required to achieve satisfactory results. To mitigate these limitations, numerous studies have endeavored to fine-tune the pre-trained diffusion models, i.e., UNet, utilizing various technologies. Yet, amidst these efforts, a pivotal question of text-to-image diffusion model training has remained largely unexplored: Is it possible and feasible to fine-tune the text encoder to improve the performance of text-to-image diffusion models? Our findings reveal that, instead of replacing the CLIP text encoder used in Stable Diffusion with other large language models, we can enhance it through our proposed fine-tuning approach, TextCraftor, leading to substantial improvements in quantitative benchmarks and human assessments. Interestingly, our technique also empowers controllable image generation through the interpolation of different text encoders fine-tuned with various rewards. We also demonstrate that TextCraftor is orthogonal to UNet finetuning, and can be combined to further improve generative quality.
Published: 2024

10. Tris-buffered efficacy: enhancing stability and reversibility of Zn anode by efficient modulation at Zn/electrolyte interface

Author: Wang, Yong-Jian, Li, Su-Hong, Li, Lin, Ren, Jian-Yong, Shen, Ling-Di, and Lai, Chao
Published: 2024

11. Heparan sulfate acts as an activator of the NLRP3 inflammasome promoting inflammatory response in the development of acute pancreatitis

Author: Zhao, Li-Jun, Chen, Peng, Huang, Ling, He, Wen-Qi, Tang, Ying-Rui, Wang, Rui, Luo, Zhu-Lin, and Ren, Jian-Dong
Published: 2024

12. Convenient room-temperature synthesis of sulfur and nitrogen co-doped NiCo-LDH coupled with CNTs on NiCo foam as battery-type electrode for high performance hybrid supercapacitor

Author: Song, Li, Zhong, Xiao-Hong, Wang, Fang-Lin, Huang, Zhi-Hui, Hong, Zhe, Gao, Yun-Fang, Wang, Hai-Dong, Ren, Jian-Wei, Peng, Sheng-Jie, and Li, Lei
Published: 2024

13. Experimental study on the frictional capacity of square pile–cemented soil interface with different surface roughness

Author: Zhou, Jia-jin, Zhou, Shi-le, Yu, Jian-lin, Ma, Jun-jie, Zhang, Ri-hong, Gong, Xiao-nan, and Ren, Jian-fei
Published: 2024

14. Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Author: Chen, Tsai-Shien, Siarohin, Aliaksandr, Menapace, Willi, Deyneka, Ekaterina, Chao, Hsiang-wei, Jeon, Byung Eun, Fang, Yuwei, Lee, Hsin-Ying, Ren, Jian, Yang, Ming-Hsuan, and Tulyakov, Sergey
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The quality of the data and annotation upper-bounds the quality of a downstream model. While there exist large text corpora and image-text pairs, high-quality video-text data is much harder to collect. First of all, manual labeling is more time-consuming, as it requires an annotator to watch an entire video. Second, videos have a temporal dimension, consisting of several scenes stacked together, and showing multiple actions. Accordingly, to establish a video dataset with high-quality captions, we propose an automatic approach leveraging multimodal inputs, such as textual video description, subtitles, and individual video frames. Specifically, we curate 3.8M high-resolution videos from the publicly available HD-VILA-100M dataset. We then split them into semantically consistent video clips, and apply multiple cross-modality teacher models to obtain captions for each video. Next, we finetune a retrieval model on a small subset where the best caption of each video is manually selected and then employ the model in the whole dataset to select the best caption as the annotation. In this way, we get 70M videos paired with high-quality text captions. We dub the dataset as Panda-70M. We show the value of the proposed dataset on three downstream tasks: video captioning, video and text retrieval, and text-driven video generation. The models trained on the proposed data score substantially better on the majority of metrics across all the tasks.
Comment: CVPR 2024. Project Page: https://snap-research.github.io/Panda-70M
Published: 2024

15. Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

Author: Menapace, Willi, Siarohin, Aliaksandr, Skorokhodov, Ivan, Deyneka, Ekaterina, Chen, Tsai-Shien, Kag, Anil, Fang, Yuwei, Stoliar, Aleksei, Ricci, Elisa, Ren, Jian, and Tulyakov, Sergey
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Contemporary models for generating images show remarkable quality and versatility. Swayed by these advantages, the research community repurposes them to generate videos. Since video content is highly redundant, we argue that naively bringing advances of image models to the video generation domain reduces motion fidelity and visual quality, and impairs scalability. In this work, we build Snap Video, a video-first model that systematically addresses these challenges. To do that, we first extend the EDM framework to take into account spatially and temporally redundant pixels and naturally support video generation. Second, we show that a U-Net, a workhorse behind image generation, scales poorly when generating videos, requiring significant computational overhead. Hence, we propose a new transformer-based architecture that trains 3.31 times faster than U-Nets (and is ~4.5 times faster at inference). This allows us to efficiently train a text-to-video model with billions of parameters for the first time, reach state-of-the-art results on a number of benchmarks, and generate videos with substantially higher quality, temporal consistency, and motion complexity. The user studies showed that our model was favored by a large margin over the most recent methods. See our website at https://snap-research.github.io/snapvideo/.
Published: 2024

16. Visual Concept-driven Image Generation with Text-to-Image Diffusion Model

Author: Rahman, Tanzila, Mahajan, Shweta, Lee, Hsin-Ying, Ren, Jian, Tulyakov, Sergey, and Sigal, Leonid
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Text-to-image (TTI) diffusion models have demonstrated impressive results in generating high-resolution images of complex and imaginative scenes. Recent approaches have further extended these methods with personalization techniques that allow them to integrate user-illustrated concepts (e.g., the user him/herself) using a few sample image illustrations. However, the ability to generate images with multiple interacting concepts, such as human subjects, as well as concepts that may be entangled in one, or across multiple, image illustrations remains elusive. In this work, we propose a concept-driven TTI personalization framework that addresses these core challenges. We build on existing works that learn custom tokens for user-illustrated concepts, allowing those to interact with existing text tokens in the TTI model. However, importantly, to disentangle and better learn the concepts in question, we jointly learn (latent) segmentation masks that disentangle these concepts in user-provided image illustrations. We do so by introducing an Expectation Maximization (EM)-like optimization procedure where we alternate between learning the custom tokens and estimating (latent) masks encompassing corresponding concepts in user-supplied images. We obtain these masks based on cross-attention, from within the U-Net parameterized latent diffusion model and subsequent DenseCRF optimization. We illustrate that such joint alternating refinement leads to the learning of better tokens for concepts and, as a by-product, latent masks. We illustrate the benefits of the proposed approach qualitatively and quantitatively with several examples and use cases that can combine three or more entangled concepts.
Comment: 11 Figures, 14 Pages, 2 tables
Published: 2024

17. SPAD: Spatially Aware Multiview Diffusers

Author: Kant, Yash, Wu, Ziyi, Vasilkovsky, Michael, Qian, Guocheng, Ren, Jian, Guler, Riza Alp, Ghanem, Bernard, Tulyakov, Sergey, Gilitschenski, Igor, and Siarohin, Aliaksandr
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present SPAD, a novel approach for creating consistent multi-view images from text prompts or single images. To enable multi-view generation, we repurpose a pretrained 2D diffusion model by extending its self-attention layers with cross-view interactions, and fine-tune it on a high quality subset of Objaverse. We find that a naive extension of the self-attention proposed in prior work (e.g. MVDream) leads to content copying between views. Therefore, we explicitly constrain the cross-view attention based on epipolar geometry. To further enhance 3D consistency, we utilize Plucker coordinates derived from camera rays and inject them as positional encoding. This enables SPAD to reason over spatial proximity in 3D well. In contrast to recent works that can only generate views at fixed azimuth and elevation, SPAD offers full camera control and achieves state-of-the-art results in novel view synthesis on unseen objects from the Objaverse and Google Scanned Objects datasets. Finally, we demonstrate that text-to-3D generation using SPAD prevents the multi-face Janus issue. See more details at our webpage: https://yashkant.github.io/spad.
Comment: Webpage: https://yashkant.github.io/spad
Published: 2024

18. AToM: Amortized Text-to-Mesh using 2D Diffusion

Author: Qian, Guocheng, Cao, Junli, Siarohin, Aliaksandr, Kant, Yash, Wang, Chaoyang, Vasilkovsky, Michael, Lee, Hsin-Ying, Fang, Yuwei, Skorokhodov, Ivan, Zhuang, Peiye, Gilitschenski, Igor, Ren, Jian, Ghanem, Bernard, Aberman, Kfir, and Tulyakov, Sergey
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We introduce Amortized Text-to-Mesh (AToM), a feed-forward text-to-mesh framework optimized across multiple text prompts simultaneously. In contrast to existing text-to-3D methods that often entail time-consuming per-prompt optimization and commonly output representations other than polygonal meshes, AToM directly generates high-quality textured meshes in less than 1 second with around 10 times reduction in the training cost, and generalizes to unseen prompts. Our key idea is a novel triplane-based text-to-mesh architecture with a two-stage amortized optimization strategy that ensures stable training and enables scalability. Through extensive experiments on various prompt benchmarks, AToM significantly outperforms state-of-the-art amortized approaches with over 4 times higher accuracy (on the DF415 dataset) and produces more distinguishable and higher-quality 3D outputs. AToM demonstrates strong generalizability, offering fine-grained 3D assets for unseen interpolated prompts without further optimization during inference, unlike per-prompt solutions.
Comment: 19 pages with appendix and references. Webpage: https://snap-research.github.io/AToM/
Published: 2024

19. Quality-Diversity Algorithms Can Provably Be Helpful for Optimization

Author: Qian, Chao, Xue, Ke, and Wang, Ren-Jian
Subjects: Computer Science - Neural and Evolutionary Computing
Abstract: Quality-Diversity (QD) algorithms are a new type of Evolutionary Algorithms (EAs), aiming to find a set of high-performing, yet diverse solutions. They have found many successful applications in reinforcement learning and robotics, helping improve the robustness in complex environments. Furthermore, they often empirically find a better overall solution than traditional search algorithms which explicitly search for a single highest-performing solution. However, their theoretical analysis is far behind, leaving many fundamental questions unexplored. In this paper, we try to shed some light on the optimization ability of QD algorithms via rigorous running time analysis. By comparing the popular QD algorithm MAP-Elites with $(\mu+1)$-EA (a typical EA focusing on finding better objective values only), we prove that on two NP-hard problem classes with wide applications, i.e., monotone approximately submodular maximization with a size constraint, and set cover, MAP-Elites can achieve the (asymptotically) optimal polynomial-time approximation ratio, while $(\mu+1)$-EA requires exponential expected time on some instances. This provides theoretical justification that QD algorithms can be helpful for optimization, and discloses that the simultaneous search for high-performing solutions with diverse behaviors can provide stepping stones to good overall solutions and help avoid local optima.
Comment: The conference version of this paper has appeared at IJCAI'24. This version contains all the proof details
Published: 2024

20. E$^{2}$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation

Author: Gong, Yifan, Zhan, Zheng, Jin, Qing, Li, Yanyu, Idelbayev, Yerlan, Liu, Xian, Zharkov, Andrey, Aberman, Kfir, Tulyakov, Sergey, Wang, Yanzhi, and Ren, Jian
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: One highly promising direction for enabling flexible real-time on-device image editing is utilizing data distillation by leveraging large-scale text-to-image diffusion models to generate paired datasets used for training generative adversarial networks (GANs). This approach notably alleviates the stringent requirements typically imposed by high-end commercial GPUs for performing image editing with diffusion models. However, unlike text-to-image diffusion models, each distilled GAN is specialized for a specific image editing task, necessitating costly training efforts to obtain models for various concepts. In this work, we introduce and address a novel research direction: can the process of distilling GANs from diffusion models be made significantly more efficient? To achieve this goal, we propose a series of innovative techniques. First, we construct a base GAN model with generalized features, adaptable to different concepts through fine-tuning, eliminating the need for training from scratch. Second, we identify crucial layers within the base GAN model and employ Low-Rank Adaptation (LoRA) with a simple yet effective rank search process, rather than fine-tuning the entire base model. Third, we investigate the minimal amount of data necessary for fine-tuning, further reducing the overall training time. Extensive experiments show that we can efficiently empower GANs with the ability to perform real-time high-quality image editing on mobile devices with remarkably reduced training and storage costs for each concept.
Comment: ICML 2024. Project Page: https://yifanfanfanfan.github.io/e2gan/
Published: 2024

21. Somatic BrafV600E mutation in the cerebral endothelium induces brain arteriovenous malformations

Author: Tu, Tianqi, Yu, Jiaxing, Jiang, Chendan, Zhang, Shikun, Li, Jingwei, Ren, Jian, Zhang, Shiju, Zhou, Yuan, Cui, Ziwei, Lu, Haohan, Meng, Xiaosheng, Wang, Zhanjing, Xing, Dong, Zhang, Hongqi, and Hong, Tao
Published: 2024

22. Empirical likelihood MLE for joint modeling right censored survival data with longitudinal covariates

Author: Ren, Jian-Jian and Shi, Yuyin
Published: 2024

23. Hydrogen production from hydrolysis of NaBH4 solution over Co–Fe–B@g-C3N4/NF thin film catalyst

Author: Wang, Yan, Ma, Jia-Xin, Ren, Jian, Zhang, Di, Xu, Feng-Yan, Zhang, Ke, Cao, Zhong-Qiu, Sun, Qiu-Ju, Li, Guo-De, Wu, Shi-Wei, and Chen, Hong-Hui
Published: 2024

24. Attention-based multiple siamese networks with primary representation guiding for offline signature verification

Author: Xiong, Yu-Jie, Cheng, Song-Yang, Ren, Jian-Xin, and Zhang, Yu-Jin
Published: 2024

25. LightSpeed: Light and Fast Neural Light Fields on Mobile Devices

Author: Gupta, Aarush, Cao, Junli, Wang, Chaoyang, Hu, Ju, Tulyakov, Sergey, Ren, Jian, and Jeni, László A
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Real-time novel-view image synthesis on mobile devices is prohibitive due to the limited computational power and storage. Using volumetric rendering methods, such as NeRF and its derivatives, on mobile devices is not suitable due to the high computational cost of volumetric rendering. On the other hand, recent advances in neural light field representations have shown promising real-time view synthesis results on mobile devices. Neural light field methods learn a direct mapping from a ray representation to the pixel color. The current choice of ray representation is either stratified ray sampling or Plucker coordinates, overlooking the classic light slab (two-plane) representation, the preferred representation to interpolate between light field views. In this work, we find that the light slab representation is an efficient representation for learning a neural light field. More importantly, it is a lower-dimensional ray representation, enabling us to learn the 4D ray space using feature grids, which are significantly faster to train and render. Although mostly designed for frontal views, we show that the light-slab representation can be further extended to non-frontal scenes using a divide-and-conquer strategy. Our method offers superior rendering quality compared to previous light field methods and achieves a significantly improved trade-off between rendering quality and speed.
Comment: Project Page: http://lightspeed-r2l.github.io/. Camera-ready version added
Published: 2023

26. iNVS: Repurposing Diffusion Inpainters for Novel View Synthesis

Author: Kant, Yash, Siarohin, Aliaksandr, Vasilkovsky, Michael, Guler, Riza Alp, Ren, Jian, Tulyakov, Sergey, and Gilitschenski, Igor
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present a method for generating consistent novel views from a single source image. Our approach focuses on maximizing the reuse of visible pixels from the source image. To achieve this, we use a monocular depth estimator that transfers visible pixels from the source view to the target view. Starting from a pre-trained 2D inpainting diffusion model, we train our method on the large-scale Objaverse dataset to learn 3D object priors. During training, we use a novel masking mechanism based on epipolar lines to further improve the quality of our approach. This allows our framework to perform zero-shot novel view synthesis on a variety of objects. We evaluate the zero-shot abilities of our framework on three challenging datasets: Google Scanned Objects, Ray Traced Multiview, and Common Objects in 3D. See our webpage for more details: https://yashkant.github.io/invs/ Comment: Accepted to SIGGRAPH Asia, 2023 (Conference Papers)
Published: 2023

27. HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion

Author: Liu, Xian, Ren, Jian, Siarohin, Aliaksandr, Skorokhodov, Ivan, Li, Yanyu, Lin, Dahua, Liu, Xihui, Liu, Ziwei, and Tulyakov, Sergey
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Despite significant advances in large-scale text-to-image models, achieving hyper-realistic human image generation remains a desirable yet unsolved task. Existing models like Stable Diffusion and DALL-E 2 tend to generate human images with incoherent parts or unnatural poses. To tackle these challenges, our key insight is that human images are inherently structural over multiple granularities, from the coarse-level body skeleton to fine-grained spatial geometry. Therefore, capturing such correlations between the explicit appearance and latent structure in one model is essential to generate coherent and natural human images. To this end, we propose a unified framework, HyperHuman, that generates in-the-wild human images of high realism and diverse layouts. Specifically, 1) we first build a large-scale human-centric dataset, named HumanVerse, which consists of 340M images with comprehensive annotations like human pose, depth, and surface normal. 2) Next, we propose a Latent Structural Diffusion Model that simultaneously denoises the depth and surface normal along with the synthesized RGB image. Our model enforces the joint learning of image appearance, spatial relationship, and geometry in a unified network, where each branch in the model complements the others with both structural awareness and textural richness. 3) Finally, to further boost the visual quality, we propose a Structure-Guided Refiner to compose the predicted conditions for more detailed generation of higher resolution. Extensive experiments demonstrate that our framework yields state-of-the-art performance, generating hyper-realistic human images under diverse scenarios. Project Page: https://snap-research.github.io/HyperHuman/ Comment: Accepted by ICLR 2024, camera-ready version.
Published: 2023

28. Diversity from Human Feedback

Author: Wang, Ren-Jian, Xue, Ke, Wang, Yutong, Yang, Peng, Fu, Haobo, Fu, Qiang, and Qian, Chao
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Neural and Evolutionary Computing
Abstract: Diversity plays a significant role in many problems, such as ensemble learning, reinforcement learning, and combinatorial optimization. How to define the diversity measure is a longstanding problem. Many methods rely on expert experience to define a proper behavior space and then obtain the diversity measure, which is, however, challenging in many scenarios. In this paper, we propose the problem of learning a behavior space from human feedback and present a general method called Diversity from Human Feedback (DivHF) to solve it. DivHF learns a behavior descriptor consistent with human preference by querying human feedback. The learned behavior descriptor can be combined with any distance measure to define a diversity measure. We demonstrate the effectiveness of DivHF by integrating it with the Quality-Diversity optimization algorithm MAP-Elites and conducting experiments on the QDax suite. The results show that DivHF learns a behavior space that aligns better with human requirements compared to direct data-driven approaches and leads to more diverse solutions under human preference. Our contributions include formulating the problem, proposing the DivHF method, and demonstrating its effectiveness through experiments.
Published: 2023

29. Revisiting Galaxy Evolution in Morphology in the COSMOS field (COSMOS-ReGEM): I. Merging Galaxies

Author: Ren, Jian, Li, Nan, Liu, F. S., Cui, Qifan, Fu, Mingxiang, and Zheng, Xian Zhong
Subjects: Astrophysics - Astrophysics of Galaxies
Abstract: We revisit the evolution of galaxy morphology in the COSMOS field over the redshift range $0.2\leq z \leq 1$, using a large and complete sample of 33,605 galaxies with a stellar mass of $\log(M_{\ast}/M_{\odot})>9.5$ with significantly improved redshifts and comprehensive non-parametric morphological parameters. Our sample has 13,881 ($\sim41.3\%$) galaxies with reliable spectroscopic redshifts and has more accurate photometric redshifts with a $\sigma_{\rm NMAD} \sim 0.005$. This paper is the first in a series that investigates merging galaxies and their properties. We identify 3,594 major merging galaxies through visual inspection and find 1,737 massive galaxy pairs with $\log(M_{\ast}/M_{\odot})>10.1$. Among the family of non-parametric morphological parameters including $C$, $A$, $S$, $Gini$, $M_{\rm 20}$, $A_{\rm O}$, and $D_{\rm O}$, we find that the outer asymmetry parameter $A_{\rm O}$ and the second-order momentum parameter $M_{\rm 20}$ are better tracers of merging features than other combinations. Hence, we propose a criterion for selecting candidates of violently star-forming mergers: $M_{\rm 20}> -3A_{\rm O}+3$ at $0.2<z<0.6$ and $M_{\rm 20}> -6A_{\rm O}+3.7$ at $0.6<z<1$.
Published: 2023

30. Black-Box Attacks against Signed Graph Analysis via Balance Poisoning

Author: Zhou, Jialong, Lai, Yuni, Ren, Jian, and Zhou, Kai
Subjects: Computer Science - Cryptography and Security, Computer Science - Social and Information Networks
Abstract: Signed graphs are well-suited for modeling social networks as they capture both positive and negative relationships. Signed graph neural networks (SGNNs) are commonly employed to predict link signs (i.e., positive and negative) in such graphs due to their ability to handle the unique structure of signed graphs. However, real-world signed graphs are vulnerable to malicious attacks by manipulating edge relationships, and existing adversarial graph attack methods do not consider the specific structure of signed graphs. SGNNs often incorporate balance theory to effectively model the positive and negative links. Surprisingly, we find that the balance theory that they rely on can ironically be exploited as a black-box attack. In this paper, we propose a novel black-box attack called balance-attack that aims to decrease the balance degree of the signed graphs. We present an efficient heuristic algorithm to solve this NP-hard optimization problem. We conduct extensive experiments on five popular SGNN models and four real-world datasets to demonstrate the effectiveness and wide applicability of our proposed attack method. By addressing these challenges, our research contributes to a better understanding of the limitations and resilience of robust models when facing attacks on SGNNs. This work contributes to enhancing the security and reliability of signed graph analysis in social network modeling. Our PyTorch implementation of the attack is publicly available on GitHub: https://github.com/JialongZhou666/Balance-Attack.git.
Published: 2023

31. DSCI: a database of synthetic biology components for innate immunity and cell engineering decision-making processes

Author: Zhang, Chenqiu, Chen, Tianjian, Li, Zhiyu, Lu, Qing, Luo, Xiaotong, Cai, Sihui, Zhou, Jie, Ren, Jian, and Cui, Jun
Published: 2024
Full Text: View/download PDF

32. Intraoperative changes in electrophysiological monitoring can be used to predict clinical outcomes in patients with spinal cavernous malformation

Author: Li Xiaoyu, Zhang Hongqi, and Ren Jian
Subjects: motor evoked potentials, electrophysiological monitoring, spinal cavernous malformations, somatosensory evoked potentials, intraoperative neuromonitoring, Medicine
Abstract: The study aimed to evaluate the sensitivity and specificity of these monitoring parameters in predicting postoperative neurological dysfunction.
Published: 2024
Full Text: View/download PDF

33. Effect of Production Processes on the Quality Characteristics and Flavor of Corn Germ Oil

Author: SUN Yang, REN Jian, SONG Chunli, ZHAO Yue
Subjects: corn germ, oil preparation processes, physicochemical properties, spectral analysis, flavor substances, Food processing and manufacture, TP368-456
Abstract: This study was undertaken to evaluate the effects of different processing techniques, namely enzyme-assisted cold pressing, Soxhlet extraction, and aqueous enzymatic extraction, on the basic physicochemical properties, antioxidant activity, fatty acid composition, spectral characteristics, and volatile flavor substances of corn germ oil. The results showed that the acid value and peroxide value of corn germ oil obtained by enzyme-assisted cold pressing were higher than those obtained by the other methods. The hydroxyl radical scavenging ability of the oil obtained by enzyme-assisted cold pressing was the highest, and there was no significant difference in the contents of linoleic acid, linolenic acid, or unsaturated fatty acids among the three oil samples. The full spectrum analysis showed that the highest content of linolenic acid (conjugated trienoic acid) was found in the oil obtained by aqueous enzymatic extraction. The composition and relative content of volatile compounds in corn germ oil varied significantly with the different production processes. In summary, the processing methods can affect the physicochemical properties, antioxidant properties, and flavor substances of corn germ oil, which provides a theoretical reference for the utilization of corn processing by-products and the moderate processing of corn germ oil.
Published: 2024
Full Text: View/download PDF

34. CLGT: A Graph Transformer for Student Performance Prediction in Collaborative Learning

Author: Peng, Tianhao, Liang, Yu, Wu, Wenjun, Ren, Jian, Pengrui, Zhao, and Pu, Yanjun
Subjects: Computer Science - Computers and Society, Computer Science - Artificial Intelligence
Abstract: Modeling and predicting the performance of students in collaborative learning paradigms is an important task. Most of the research presented in the literature regarding collaborative learning focuses on discussion forums and social learning networks. There are only a few works that investigate how students interact with each other in team projects and how such interactions affect their academic performance. In order to bridge this gap, we choose a software engineering course as the study subject. The students who participate in a software engineering course are required to team up and complete a software project together. In this work, we construct an interaction graph based on the activities of students grouped in various teams. Based on this student interaction graph, we present an extended graph transformer framework for collaborative learning (CLGT) for evaluating and predicting the performance of students. Moreover, the proposed CLGT contains an interpretation module that explains the prediction results and visualizes the student interaction patterns. The experimental results confirm that the proposed CLGT outperforms the baseline models in terms of performing predictions based on the real-world datasets. Moreover, the proposed CLGT differentiates the students with poor performance in the collaborative learning paradigm and gives teachers early warnings, so that appropriate assistance can be provided. Comment: 8 pages, 5 figures, conference: AAAI
Published: 2023

35. Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors

Author: Qian, Guocheng, Mai, Jinjie, Hamdi, Abdullah, Ren, Jian, Siarohin, Aliaksandr, Li, Bing, Lee, Hsin-Ying, Skorokhodov, Ivan, Wonka, Peter, Tulyakov, Sergey, and Ghanem, Bernard
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present Magic123, a two-stage coarse-to-fine approach for high-quality, textured 3D mesh generation from a single unposed image in the wild using both 2D and 3D priors. In the first stage, we optimize a neural radiance field to produce a coarse geometry. In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing texture. In both stages, the 3D content is learned through reference view supervision and novel views guided by a combination of 2D and 3D diffusion priors. We introduce a single trade-off parameter between the 2D and 3D priors to control exploration (more imaginative) and exploitation (more precise) of the generated geometry. Additionally, we employ textual inversion and monocular depth regularization to encourage consistent appearances across views and to prevent degenerate solutions, respectively. Magic123 demonstrates a significant improvement over previous image-to-3D techniques, as validated through extensive experiments on synthetic benchmarks and diverse real-world images. Our code, models, and generated 3D assets are available at https://github.com/guochengqian/Magic123. Comment: webpage: https://guochengqian.github.io/project/magic123/
Published: 2023

36. Fast and Automatic 3D Modeling of Antenna Structure Using CNN-LSTM Network for Efficient Data Generation

Author: Wei, Zhaohui, Zhou, Zhao, Wang, Peng, Ren, Jian, Yin, Yingzeng, Pedersen, Gert Frølund, and Shen, Ming
Subjects: Electrical Engineering and Systems Science - Systems and Control
Abstract: Deep learning-assisted antenna design methods such as surrogate models have gained significant popularity in recent years due to their potential to greatly increase design efficiency by replacing time-consuming full-wave electromagnetic (EM) simulations. However, a large amount of training data with sufficiently diverse and representative samples (antenna structure parameters, scattering properties, etc.) is mandatory for these methods to ensure good performance. Traditional antenna modeling methods relying on manual model construction and modification are time-consuming and cannot meet the requirement of efficient training data acquisition. In this study, we propose a deep learning-assisted and image-based intelligent modeling approach for accelerating the data acquisition of antenna samples with different physical structures. Specifically, our method only needs an image of the antenna structure, usually available in scientific publications, as the input, while the corresponding modeling codes (VBA language) are generated automatically. The proposed model mainly consists of two parts: a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. The former is used for capturing features of antenna structure images and the latter is employed to generate the modeling codes. Through training, the proposed model can achieve fast and automatic data acquisition of antenna physical structures based on antenna images. Experimental results show that the proposed method achieves a significant speed-up over the manual modeling approach. This approach lays the foundation for efficient data acquisition needed to build robust surrogate models in the future.
Published: 2023

37. Robust and Efficient Fault Diagnosis of mm-Wave Active Phased Arrays using Baseband Signal

Author: Nielsen, Martin H., Zhang, Yufeng, Xue, Changbin, Ren, Jian, Yin, Yingzeng, Shen, Ming, and Pedersen, Gert F.
Subjects: Electrical Engineering and Systems Science - Signal Processing, Computer Science - Artificial Intelligence
Abstract: One key communication block in 5G and 6G radios is the active phased array (APA). To ensure reliable operation, efficient and timely fault diagnosis of APAs on-site is crucial. To date, fault diagnosis has relied on measurement of frequency domain radiation patterns using costly equipment and multiple strictly controlled measurement probes, which are time-consuming, complex, and therefore infeasible for on-site deployment. This paper proposes a novel method exploiting a Deep Neural Network (DNN) tailored to extract the features hidden in the baseband in-phase and quadrature signals for classifying the different faults. It requires only a single probe in one measurement point for fast and accurate diagnosis of the faulty elements and components in APAs. Validation of the proposed method is done using a commercial 28 GHz APA. Accuracies of 99% and 80% have been demonstrated for single- and multi-element failure detection, respectively. Three different test scenarios are investigated: on-off antenna elements, phase variations, and magnitude attenuation variations. At a low signal-to-noise ratio of 4 dB, stable fault detection accuracy above 90% is maintained. This is all achieved with a detection time of milliseconds (e.g., 6 ms), showing a high potential for on-site deployment. Comment: 10 pages
Published: 2023
Full Text: View/download PDF

38. SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds

Author: Li, Yanyu, Wang, Huan, Jin, Qing, Hu, Ju, Chemerys, Pavlo, Fu, Yun, Wang, Yanzhi, Tulyakov, Sergey, and Ren, Jian
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Text-to-image diffusion models can create stunning images from natural language descriptions that rival the work of professional artists and photographers. However, these models are large, with complex network architectures and tens of denoising iterations, making them computationally expensive and slow to run. As a result, high-end GPUs and cloud-based inference are required to run diffusion models at scale. This is costly and has privacy implications, especially when user data is sent to a third party. To overcome these challenges, we present a generic approach that, for the first time, unlocks running text-to-image diffusion models on mobile devices in less than $2$ seconds. We achieve this by introducing an efficient network architecture and improving step distillation. Specifically, we propose an efficient UNet by identifying the redundancy of the original model and reducing the computation of the image decoder via data distillation. Further, we enhance the step distillation by exploring training strategies and introducing regularization from classifier-free guidance. Our extensive experiments on MS-COCO show that our model with $8$ denoising steps achieves better FID and CLIP scores than Stable Diffusion v$1.5$ with $50$ steps. Our work democratizes content creation by bringing powerful text-to-image diffusion models to the hands of users. Comment: Our project webpage: https://snap-research.github.io/SnapFusion/
Published: 2023

39. Effect of onion skin powder on color, lipid, and protein oxidative stability of premade beef patty during cold storage

Author: Wang, Cuntang, Wang, Yuqing, Song, Yang, Ren, Manni, Gao, Zengming, and Ren, Jian
Published: 2024
Full Text: View/download PDF

40. Targeting AGTPBP1 inhibits pancreatic cancer progression via regulating microtubules and ERK signaling pathway

Author: Li, Ding-zhong, Yang, Zhe-yu, Leng, Asi, Zhang, Qian, Zhang, Xiao-dong, Bian, Yan-chao, Xiao, Rui, and Ren, Jian-jun
Published: 2024
Full Text: View/download PDF

41. Multisensory gamma stimulation mitigates the effects of demyelination induced by cuprizone in male mice

Author: Rodrigues-Amorim, Daniela, Bozzelli, P. Lorenzo, Kim, TaeHyun, Liu, Liwang, Gibson, Oliver, Yang, Cheng-Yi, Murdock, Mitchell H., Galiana-Melendez, Fabiola, Schatz, Brooke, Davison, Alexis, Islam, Md Rezaul, Shin Park, Dong, Raju, Ravikiran M., Abdurrob, Fatema, Nelson, Alissa J., Min Ren, Jian, Yang, Vicky, Stokes, Matthew P., and Tsai, Li-Huei
Published: 2024
Full Text: View/download PDF

42. Recommendations for managing adult acne and adolescent acne based on an epidemiological study conducted in China

Author: Liu, Yan-ting, Wang, Ya-wen, Tu, Chen, Ren, Jian-wen, Huo, Jia, Nan, Xiao-juan, Dou, Jia-hao, Peng, Zi-he, and Zeng, Wei-hui
Published: 2024
Full Text: View/download PDF

43. A generalized interval-valued p,q Rung orthopair fuzzy Maclaurin symmetric mean and modified regret theory based sustainable supplier selection method

Author: Chen, Shuang, Ren, Jian, Ye, KeTing, and Li, FeiYan
Published: 2024
Full Text: View/download PDF

44. A novel method for extraction of high purity and high production Phytophthora sojae oospores

Author: Chu, Xiaomeng, Yin, Ziyi, Yue, Pengjie, Wang, Xinyu, Yang, Yue, Sun, Jiayi, Kong, Ziying, Ren, Jian, Liu, Xiaohan, Lu, Chongchong, Zhao, Haipeng, Li, Yang, and Ding, Xinhua
Published: 2024
Full Text: View/download PDF

45. AStruct: detection of allele-specific RNA secondary structure in structuromic probing data

Author: Xu, Qingru, Bao, Xiaoqiong, Lin, Zhuobin, Tang, Lin, He, Li-na, Ren, Jian, Zuo, Zhixiang, and Hu, Kunhua
Published: 2024
Full Text: View/download PDF

46. Construction and evaluation of a cloud follow-up platform for gynecological patients receiving chemotherapy

Author: Dan, Xin, He, Ya-Lin, Huang, Yan, Ren, Jian-Hua, Wang, Dan-Qing, Yin, Ru-Tie, and Tian, Ya-Lin
Published: 2024
Full Text: View/download PDF

47. Oral cancer cell to endothelial cell communication via exosomal miR-21/RMND5A pathway

Author: Sun, Yu-qi, Wang, Bing, Zheng, Lin-wei, Zhao, Ji-hong, and Ren, Jian-gang
Published: 2024
Full Text: View/download PDF

48. Sodium Formate as a Highly Efficient Sodium Compensation Additive for Sodium-Ion Batteries with a P2-Type Layered Oxide Cathode

Author: Zhao, Binyu, Zhang, Fengping, Li, Weiliang, Wu, Wenwei, Qiu, Shiming, Ren, Jian, Wei, Linyuan, Xu, Lin, and Wu, Xuehang
Published: 2024
Full Text: View/download PDF

49. COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models

Author: Xiao, Jinqi, Yin, Miao, Gong, Yu, Zang, Xiao, Ren, Jian, and Yuan, Bo
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Attention-based vision models, such as Vision Transformer (ViT) and its variants, have shown promising performance in various computer vision tasks. However, these emerging architectures suffer from large model sizes and high computational costs, calling for efficient model compression solutions. To date, pruning ViTs has been well studied, while other compression strategies that have been widely applied in CNN compression, e.g., model factorization, are little explored in the context of ViT compression. This paper explores an efficient method for compressing vision transformers to enrich the toolset for obtaining compact attention-based vision models. Based on a new insight into the multi-head attention layer, we develop a highly efficient ViT compression solution, which outperforms the state-of-the-art pruning methods. For compressing DeiT-small and DeiT-base models on ImageNet, our proposed approach can achieve 0.45% and 0.76% higher top-1 accuracy even with fewer parameters. Our finding can also be applied to improve the customization efficiency of text-to-image diffusion models, with much faster training (up to $2.6\times$ speedup) and lower extra storage cost (up to $1927.5\times$ reduction) than the existing works. Comment: ICML 2023 Poster
Published: 2023

50. Two-Bit RIS-Aided Communications at 3.5GHz: Some Insights from the Measurement Results Under Multiple Practical Scenes

Author: Zhang, Shun, Sun, Haoran, Yu, Runze, Cui, Hongshenyuan, Ren, Jian, Gao, Feifei, Jin, Shi, Xie, Hongxiang, and Wang, Hao
Subjects: Electrical Engineering and Systems Science - Signal Processing
Abstract: In this paper, we propose a two-bit reconfigurable intelligent surface (RIS)-aided communication system, which mainly consists of a two-bit RIS, a transmitter and a receiver. A corresponding prototype verification system is designed to perform experimental tests in practical environments. The carrier frequency is set as 3.5 GHz, and the RIS array possesses 256 units, each of which adopts two-bit phase quantization. In particular, we adopt a self-developed broadband intelligent communication system 40MHz-Net (BICT-40N) terminal in order to fully acquire the channel information. The terminal mainly includes a baseband board and a radio frequency (RF) front-end board, where the latter can achieve 26 dB transmitting link gain and 33 dB receiving link gain. The orthogonal frequency division multiplexing (OFDM) signal is used for the terminal, where the bandwidth is 40 MHz and the subcarrier spacing is 625 kHz. Also, the terminal supports a series of modulation modes, including QPSK, QAM, etc. Through experimental tests, we validate a few functions and properties of the RIS as follows. First, we validate a novel RIS power consumption model, which considers both the static and the dynamic power consumption. Besides, we demonstrate the existence of the imaging interference and find that the two-bit RIS can lower the imaging interference by about 10 dBm. Moreover, we verify that the RIS can outperform the metal plate in terms of the beam focusing performance. In addition, we find that the RIS has the ability to improve the channel stationarity. Then, we realize the multi-beam reflection of the RIS utilizing the pattern addition (PA) algorithm. Lastly, we validate the existence of the mutual coupling between different RIS units.
Published: 2023
