Author: "Xiong AS" - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword "Xiong AS" returned 710,335 results

1. GroundingBooth: Grounding Text-to-Image Customization

Author: Xiong, Zhexiao, Xiong, Wei, Shi, Jing, Zhang, He, Song, Yizhi, and Jacobs, Nathan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent studies in text-to-image customization show great success in generating personalized object variants given several images of a subject. While existing methods focus more on preserving the identity of the subject, they often fall short of controlling the spatial relationship between objects. In this work, we introduce GroundingBooth, a framework that achieves zero-shot instance-level spatial grounding on both foreground subjects and background objects in the text-to-image customization task. Our proposed text-image grounding module and masked cross-attention layer allow us to generate personalized images with both accurate layout alignment and identity preservation while maintaining text-image coherence. With such layout control, our model inherently enables the customization of multiple subjects at once. Our model is evaluated on both layout-guided image synthesis and reference-based customization tasks, showing strong results compared to existing methods. Our work is the first work to achieve a joint grounding of both subject-driven foreground generation and text-driven background generation.
Published: 2024

2. ProCom: A Few-shot Targeted Community Detection Algorithm

Author: Wu, Xixi, Xiong, Kaiyu, Xiong, Yun, He, Xiaoxin, Zhang, Yao, Jiao, Yizhu, and Zhang, Jiawei
Subjects: Computer Science - Social and Information Networks
Abstract: Targeted community detection aims to distinguish a particular type of community in the network. This is an important task with many real-world applications, e.g., identifying fraud groups in transaction networks. Traditional community detection methods fail to capture the specific features of the targeted community and detect all types of communities indiscriminately. Semi-supervised community detection algorithms, which have emerged as a feasible alternative, are inherently constrained by their limited adaptability and substantial reliance on a large amount of labeled data, which demands extensive domain knowledge and manual effort. In this paper, we address the aforementioned weaknesses in targeted community detection by focusing on few-shot scenarios. We propose ProCom, a novel framework that extends the "pre-train, prompt" paradigm, offering a low-resource, high-efficiency, and transferable solution. Within the framework, we devise a dual-level context-aware pre-training method that fosters a deep understanding of latent communities in the network, establishing a rich knowledge foundation for downstream tasks. In the prompt learning stage, we reformulate the targeted community detection task into pre-training objectives, allowing the extraction of specific knowledge relevant to the targeted community to facilitate effective and efficient inference. By leveraging both the general community knowledge acquired during pre-training and the specific insights gained from the prompt communities, ProCom exhibits remarkable adaptability across different datasets. We conduct extensive experiments on five benchmarks to evaluate the ProCom framework, demonstrating its SOTA performance under few-shot scenarios, strong efficiency, and transferability across diverse datasets.
Comment: Accepted by SIGKDD'2024
Published: 2024

3. iWalker: Imperative Visual Planning for Walking Humanoid Robot

Author: Lin, Xiao, Huang, Yuhao, Fu, Taimeng, Xiong, Xiaobin, and Wang, Chen
Subjects: Computer Science - Robotics, Electrical Engineering and Systems Science - Systems and Control
Abstract: Humanoid robots, with the potential to perform a broad range of tasks in environments designed for humans, have been deemed crucial for the basis of general AI agents. In planning and control, although traditional models and task-specific methods have been extensively studied over the past few decades, they are inadequate for achieving the flexibility and versatility needed for general autonomy. Learning approaches, especially reinforcement learning, are powerful and popular nowadays, but they are inherently "blind" during training, relying heavily on trials in simulation without proper guidance from physical principles or underlying dynamics. In response, we propose a novel end-to-end pipeline that seamlessly integrates perception, planning, and model-based control for humanoid robot walking. We refer to our method as iWalker, which is driven by imperative learning (IL), a self-supervising neuro-symbolic learning framework. This enables the robot to learn from arbitrary unlabeled data, significantly improving its adaptability and generalization capabilities. In experiments, iWalker demonstrates effectiveness in both simulated and real-world environments, representing a significant advancement toward versatile and autonomous humanoid robots.
Published: 2024

4. GRB 240529A: A Tale of Two Shocks

Author: Sun, Tian-Rui, Geng, Jin-Jun, Yan, Jing-Zhi, Hu, You-Dong, Wu, Xue-Feng, Castro-Tirado, Alberto J., Yang, Chao, Ping, Yi-Ding, Hu, Chen-Ran, Xu, Fan, Gao, Hao-Xuan, Jiang, Ji-An, Zhu, Yan-Tian, Xue, Yongquan, Pérez-García, Ignacio, Wu, Si-Yu, Fernández-García, Emilio, Caballero-García, María D., Sánchez-Ramírez, Rubén, Guziy, Sergiy, Olivares, Ignacio, del Pulgar, Carlos Jesus Pérez, Castellón, A., Castillo, Sebastián, Xiong, Ding-Rong, Pandey, Shashi B., Hiriart, David, García-Segura, Guillermo, Lee, William H., Carrasco-García, I. M., Park, Il H., Meintjes, Petrus J., van Heerden, Hendrik J., Martín-Carrillo, Antonio, Hanlon, Lorraine, Zhang, Bin-Bin, Maury, Alain, Hernández-García, L., Gritsevich, Maria, Rossi, Andrea, Maiorano, Elisabetta, Cusano, Felice, D'Avanzo, Paolo, Ferro, Matteo, Melandri, Andrea, De Pasquale, Massimiliano, Brivio, Riccardo, Fang, Min, Fan, Lu-Lu, Hu, Wei-Da, Wan, Zhen, Hu, Lei, Zuo, Ying-Xi, Tang, Jin-Long, Zhang, Xiao-Ling, Zheng, Xian-Zhong, Li, Bin, Luo, Wen-Tao, Liu, Wei, Wang, Jian, Zhang, Hong-Fei, Liu, Hao, Gao, Jie, Liang, Ming, Wang, Hai-Ren, Yao, Da-Zhi, Cheng, Jing-Quan, Zhao, Wen, and Dai, Zi-Gao
Subjects: Astrophysics - High Energy Astrophysical Phenomena
Abstract: Thanks to the rapidly increasing number of time-domain facilities, we are entering a golden era of research on gamma-ray bursts (GRBs). In this Letter, we report our observations of GRB 240529A with the Burst Optical Observer and Transient Exploring System, the 1.5-meter telescope at Observatorio Sierra Nevada, the 2.5-meter Wide Field Survey Telescope of China, the Large Binocular Telescope, and the Telescopio Nazionale Galileo. The prompt emission of GRB 240529A shows two comparably energetic episodes separated by a quiescence time of roughly 400 s. Combining all available data on the GRB Coordinates Network, we reveal the simultaneous apparent X-ray plateau and optical re-brightening around $10^3-10^4$ s after the burst. Rather than the energy injection from the magnetar as widely invoked for similar GRBs, the multi-wavelength emissions could be better explained as two shocks launched from the central engine separately. The optical peak time and our numerical modeling suggest that the initial bulk Lorentz factor of the later shock is roughly 50, which indicates that the later jet should be accretion-driven and have a higher mass loading than a typical one. The quiescence time between the two prompt emission episodes may be caused by the transition between different accretion states of a central magnetar or black hole, or the fall-back accretion process. A sample of similar bursts with multiple emission episodes in the prompt phase and sufficient follow-up could help to probe the underlying physics of GRB central engines.
Comment: Resubmitted to ApJL after addressing the referee's comments; comments are welcome
Published: 2024

5. Self-Supervised Learning of Deviation in Latent Representation for Co-speech Gesture Video Generation

Author: Yang, Huan, Chen, Jiahui, Ding, Chaofan, Shi, Runhua, Xiong, Siyu, Hong, Qingqi, Mo, Xiaoqi, and Di, Xinhan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Gestures are pivotal in enhancing co-speech communication. While recent works have mostly focused on point-level motion transformation or fully supervised motion representations through data-driven approaches, we explore the representation of gestures in co-speech, with a focus on self-supervised representation and pixel-level motion deviation, utilizing a diffusion model which incorporates latent motion features. Our approach leverages self-supervised deviation in latent representation to facilitate hand gesture generation, which is crucial for generating realistic gesture videos. Results of our first experiment demonstrate that our method enhances the quality of generated videos, with improvements of 2.7% to 4.5% for FGD, DIV, and FVD, 8.1% for PSNR, and 2.5% for SSIM over the current state-of-the-art methods.
Comment: 5 pages, 5 figures, conference
Published: 2024

6. Deep CLAS: Deep Contextual Listen, Attend and Spell

Author: Xiong, Shifu, Wang, Mengzhi, Wan, Genshun, Chen, Hang, Gao, Jianqing, and Dai, Lirong
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Contextual-LAS (CLAS) has been shown to be effective in improving Automatic Speech Recognition (ASR) of rare words. It relies on phrase-level contextual modeling and attention-based relevance scoring without explicit contextual constraints, which leads to insufficient use of contextual information. In this work, we propose deep CLAS to make better use of contextual information. We introduce a bias loss that forces the model to focus on contextual information. The query of bias attention is also enriched to improve the accuracy of the bias attention score. To get fine-grained contextual information, we replace phrase-level encoding with character-level encoding and encode contextual information with a conformer rather than an LSTM. Moreover, we directly use the bias attention score to correct the output probability distribution of the model. Experiments are conducted on the public AISHELL-1 and AISHELL-NER datasets. On AISHELL-1, compared to CLAS baselines, deep CLAS obtains a 65.78% relative recall and a 53.49% relative F1-score increase in the named entity recognition scene.
Comment: Accepted by NCMMSC 2022
Published: 2024

7. ID$^3$: Identity-Preserving-yet-Diversified Diffusion Models for Synthetic Face Recognition

Author: Li, Shen, Xu, Jianqing, Wu, Jiaying, Xiong, Miao, Deng, Ailin, Ji, Jiazhen, Huang, Yuge, Feng, Wenjie, Ding, Shouhong, and Hooi, Bryan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Synthetic face recognition (SFR) aims to generate synthetic face datasets that mimic the distribution of real face data, which allows for training face recognition models in a privacy-preserving manner. Despite the remarkable potential of diffusion models in image generation, current diffusion-based SFR models struggle with generalization to real-world faces. To address this limitation, we outline three key objectives for SFR: (1) promoting diversity across identities (inter-class diversity), (2) ensuring diversity within each identity by injecting various facial attributes (intra-class diversity), and (3) maintaining identity consistency within each identity group (intra-class identity preservation). Inspired by these goals, we introduce a diffusion-fueled SFR model termed $\text{ID}^3$. $\text{ID}^3$ employs an ID-preserving loss to generate diverse yet identity-consistent facial appearances. Theoretically, we show that minimizing this loss is equivalent to maximizing the lower bound of an adjusted conditional log-likelihood over ID-preserving data. This equivalence motivates an ID-preserving sampling algorithm, which operates over an adjusted gradient vector field, enabling the generation of fake face recognition datasets that approximate the distribution of real-world faces. Extensive experiments across five challenging benchmarks validate the advantages of $\text{ID}^3$.
Comment: Accepted to NeurIPS 2024
Published: 2024

8. Microscopic Geared Mechanisms

Author: Wang, Gan, Rey, Marcel, Ciarlo, Antonio, Shanei, Mahdi, Xiong, Kunli, Pesce, Giuseppe, Käll, Mikael, and Volpe, Giovanni
Subjects: Physics - Optics, Condensed Matter - Soft Condensed Matter
Abstract: The miniaturization of mechanical machines is critical for advancing nanotechnology and reducing device footprints. Traditional efforts to downsize gears and micromotors have faced limitations at around 0.1 mm for over thirty years due to the complexities of constructing drives and coupling systems at such scales. Here, we present an alternative approach utilizing optical metasurfaces to locally drive microscopic machines, which can then be fabricated using standard lithography techniques and seamlessly integrated on the chip, achieving sizes down to tens of micrometers with movements precise to the sub-micrometer scale. As a proof of principle, we demonstrate the construction of microscopic gear trains powered by a single driving gear with a metasurface activated by a plane light wave. Additionally, we develop a versatile pinion and rack micromachine capable of transducing rotational motion, performing periodic motion, and controlling microscopic mirrors for light deflection. Our on-chip fabrication process allows for straightforward parallelization and integration. Using light as a widely available and easily controllable energy source, these miniaturized metamachines offer precise control and movement, unlocking new possibilities for micro- and nanoscale systems.
Published: 2024

9. Weak Closed-loop Solvability of Linear Quadratic Stochastic Optimal Control Problems with Partial Information

Author: Li, Xun, Wang, Guangchen, Xiong, Jie, and Zhang, Heng
Subjects: Mathematics - Optimization and Control
Abstract: This paper investigates a linear quadratic stochastic optimal control (LQSOC) problem with partial information. Firstly, by introducing two Riccati equations and a backward stochastic differential equation (BSDE), we solve this LQSOC problem under standard positive semidefinite assumptions. Secondly, by means of a perturbation approach, we study open-loop solvability of this problem when the weighting matrices in the cost functional are indefinite. Thirdly, we investigate weak closed-loop solvability of this problem and prove the equivalence between open-loop and weak closed-loop solvabilities. Finally, we give an example to illustrate the way for obtaining a weak closed-loop optimal strategy.
Published: 2024

10. Towards General Text-guided Image Synthesis for Customized Multimodal Brain MRI Generation

Author: Wang, Yulin, Xiong, Honglin, Sun, Kaicong, Bai, Shuwei, Dai, Ling, Ding, Zhongxiang, Liu, Jiameng, Wang, Qian, Liu, Qian, and Shen, Dinggang
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Multimodal brain magnetic resonance (MR) imaging is indispensable in neuroscience and neurology. However, due to the accessibility of MRI scanners and their lengthy acquisition time, multimodal MR images are not commonly available. Current MR image synthesis approaches are typically trained on independent datasets for specific tasks, leading to suboptimal performance when applied to novel datasets and tasks. Here, we present TUMSyn, a Text-guided Universal MR image Synthesis generalist model, which can flexibly generate brain MR images with demanded imaging metadata from routinely acquired scans guided by text prompts. To ensure TUMSyn's image synthesis precision, versatility, and generalizability, we first construct a brain MR database comprising 31,407 3D images with 7 MRI modalities from 13 centers. We then pre-train an MRI-specific text encoder using contrastive learning to effectively control MR image synthesis based on text prompts. Extensive experiments on diverse datasets and physician assessments indicate that TUMSyn can generate clinically meaningful MR images with specified imaging metadata in supervised and zero-shot scenarios. Therefore, TUMSyn can be utilized along with acquired MR scan(s) to facilitate large-scale MRI-based screening and diagnosis of brain diseases.
Comment: 23 pages, 9 figures
Published: 2024

11. Post-$GW$ theory and its application to pseudogap in strongly correlated system

Author: Li, Hui, Su, Yingze, Xiong, Junnian, Lin, Haiqing, Huang, Huaqing, and Li, Dingping
Subjects: Condensed Matter - Strongly Correlated Electrons
Abstract: The $GW$ approximation is a widely used framework for studying correlated materials, but it struggles with certain limitations, such as its inability to explain pseudogap phenomena. To overcome these problems, we propose a systematic theoretical framework for Green's function corrections and apply it specifically to the $GW$ approximation. In this new theory, the screened potential is reconnected to the physical response function, i.e. the covariant response function proposed in \cite{cGW_2023}, rather than using the RPA formula. We apply our scheme to calculate Green's function, the spectral function, and the charge compressibility in the two-dimensional Hubbard model. Our scheme yields significant qualitative and quantitative improvements over the standard $GW$ method and successfully captures the pseudogap behavior.
Comment: 13 pages, 5 figures
Published: 2024

12. Compressed Depth Map Super-Resolution and Restoration: AIM 2024 Challenge Results

Author: Conde, Marcos V., Vasluianu, Florin-Alexandru, Xiong, Jinhui, Ye, Wei, Ranjan, Rakesh, and Timofte, Radu
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: The increasing demand for augmented reality (AR) and virtual reality (VR) applications highlights the need for efficient depth information processing. Depth maps, essential for rendering realistic scenes and supporting advanced functionalities, are typically large and challenging to stream efficiently due to their size. This challenge introduces a focus on developing innovative depth upsampling techniques to reconstruct high-quality depth maps from compressed data. These techniques are crucial for overcoming the limitations posed by depth compression, which often degrades quality, loses scene details and introduces artifacts. By enhancing depth upsampling methods, this challenge aims to improve the efficiency and quality of depth map reconstruction. Our goal is to advance the state-of-the-art in depth processing technologies, thereby enhancing the overall user experience in AR and VR applications.
Comment: ECCV 2024 - Advances in Image Manipulation (AIM)
Published: 2024

13. Bound-preserving OEDG schemes for Aw-Rascle-Zhang traffic models on networks

Author: Chen, Wei, Cui, Shumo, Wu, Kailiang, and Xiong, Tao
Subjects: Mathematics - Numerical Analysis
Abstract: Physical solutions to the widely used Aw-Rascle-Zhang (ARZ) traffic model and the adapted pressure (AP) ARZ model should satisfy the positivity of density, the minimum and maximum principles with respect to the velocity $v$ and other Riemann invariants. Many numerical schemes suffer from instabilities caused by violating these bounds, and the only existing bound-preserving (BP) numerical scheme (for ARZ model) is random, only first-order accurate, and not strictly conservative. This paper introduces arbitrarily high-order provably BP DG schemes for these two models, preserving all the aforementioned bounds except the maximum principle of $v$, which has been rigorously proven to conflict with the consistency and conservation of numerical schemes. Although the maximum principle of $v$ is not directly enforced, we find that the strictly preserved maximum principle of another Riemann invariant $w$ actually enforces an alternative upper bound on $v$. At the core of this work, analyzing and rigorously proving the BP property is a particularly nontrivial task: the Lax-Friedrichs (LF) splitting property, usually expected for hyperbolic conservation laws and employed to construct BP schemes, does not hold for these two models. To overcome this challenge, we formulate a generalized version of the LF splitting property, and prove it via the geometric quasilinearization (GQL) approach [Kailiang Wu and Chi-Wang Shu, SIAM Review, 65: 1031-1073, 2023]. To suppress spurious oscillations in the DG solutions, we employ the oscillation-eliminating (OE) technique, recently proposed in [Manting Peng, Zheng Sun, and Kailiang Wu, Mathematics of Computation, in press], which is based on the solution operator of a novel damping equation. Several numerical examples are included to demonstrate the effectiveness, accuracy, and BP properties of our schemes, with applications to traffic simulations on road networks.
Published: 2024

14. RIS-aided Trajectory Optimization in Layered Urban Air Mobility

Author: Xiong, Kai, Leng, Supeng, Chen, Liyuan, Zhang, Dapei, Huang, Chongwen, and Yuen, Chau
Subjects: Computer Science - Computational Engineering, Finance, and Science
Abstract: Urban Air Mobility (UAM) relies on developing aerospace industries, where safe aviation and efficient communication are critical features of aircraft. However, it is challenging for aircraft to sustain efficient air-ground communication in urban environments. Without continuous air-ground communication, aircraft may experience course deviation and safety accidents. To address these problems, a reconfigurable intelligent surface (RIS)-aided trajectory optimization scheme is proposed, enabling efficient air-ground communication and safe aviation in UAM with a layered airspace structure. This paper first devises a dual-plane RIS communication scheme for layered airspace. It fully engages the omnidirectional and directional signal attributes to reduce the transmission delay of the air-ground communication. Based on the dual-plane RIS configuration, we jointly develop the intra- and inter-layer trajectory scheme to optimize communication and safe aviation. In the intra-layer trajectory optimization, we propose a dual-time-scale flight scheme to improve communication capacity and horizontal flight safety. Meanwhile, we propose a safe layer-switching method to ensure collision avoidance during vertical flight in the inter-layer trajectory optimization. Compared with the benchmarks in the layered airspace, the proposed scheme improves the communication load by 40% and reduces the time of safe separation restoration by 66%.
Comment: 15 pages, 13 figures
Published: 2024

15. MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios

Author: Ruan, Jiacheng, Yuan, Wenzhen, Lin, Zehao, Liao, Ning, Li, Zhiyu, Xiong, Feiyu, Liu, Ting, and Fu, Yuzhuo
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Large visual-language models (LVLMs) have achieved great success in multiple applications. However, they still encounter challenges in complex scenes, especially those involving camouflaged objects. This is primarily due to the lack of samples related to camouflaged scenes in the training dataset. To mitigate this issue, we construct the MM-CamObj dataset for the first time, comprising two subsets: CamObj-Align and CamObj-Instruct. Specifically, CamObj-Align contains 11,363 image-text pairs, and it is designed for VL alignment and injecting rich knowledge of camouflaged scenes into LVLMs. CamObj-Instruct is collected for fine-tuning the LVLMs with improved instruction-following capabilities, and it includes 11,363 images and 68,849 conversations with diverse instructions. Based on the MM-CamObj dataset, we propose CamObj-Llava, an LVLM specifically designed for addressing tasks in camouflaged scenes. To facilitate our model's effective acquisition of knowledge about camouflaged objects and scenes, we introduce a curriculum learning strategy with six distinct modes. Additionally, we construct the CamObj-Bench to evaluate the existing LVLMs' capabilities of understanding, recognition, localization and counting in camouflaged scenes. This benchmark includes 600 images and 7 tasks, with a total of 9,449 questions. Extensive experiments are conducted on the CamObj-Bench with CamObj-Llava, 8 existing open-source and 3 closed-source LVLMs. Surprisingly, the results indicate that our model achieves a 25.84% improvement in 4 out of 7 tasks compared to GPT-4o. Code and datasets will be available at https://github.com/JCruan519/MM-CamObj.
Comment: 9 pages, 5 figures. Work in progress
Published: 2024

16. Preference-Guided Refactored Tuning for Retrieval Augmented Code Generation

Author: Gao, Xinyu, Xiong, Yun, Wang, Deze, Guan, Zhenhan, Shi, Zejian, Wang, Haofen, and Li, Shanshan
Subjects: Computer Science - Software Engineering
Abstract: Retrieval-augmented code generation utilizes Large Language Models as the generator and significantly expands their code generation capabilities by providing relevant code, documentation, and more via the retriever. The current approach suffers from two primary limitations: 1) information redundancy. The indiscriminate inclusion of redundant information can result in resource wastage and may misguide generators, affecting their effectiveness and efficiency. 2) preference gap. Due to different optimization objectives, the retriever strives to procure code with higher ground truth similarity, yet this effort does not substantially benefit the generator. The retriever and the generator may prefer different golden code, and this gap in preference results in a suboptimal design. Additionally, differences in parameterization knowledge acquired during pre-training result in varying preferences among different generators. To address these limitations, in this paper, we propose RRG (Retrieve, Refactor, Generate), a novel framework for effective and efficient code generation. This framework introduces a code refactorer module between the retriever and the generator to bridge them. The refactoring process transforms the raw retrieved code into a more concise, efficient, and model-friendly version. It eliminates redundant information and noise, reducing the input length. Consequently, the generator receives higher-quality context, enabling it to produce more accurate results with lower inference costs. We conducted comprehensive experiments on multiple datasets. In the experiments, we confirmed the existence of a preference gap between the retriever and the generator, and RRG effectively bridges this gap. Specifically, RRG achieved significant performance improvements, with increases of up to 28% on EM, 13% on BLEU, and 6.8% on CodeBLEU.
Comment: ASE2024
Published: 2024

17. Supervised Fine-Tuning: An Activation Pattern Optimization Process for Attention Heads

Author: Zhao, Yang, Du, Li, Ding, Xiao, Xiong, Kai, Liu, Ting, and Qin, Bing
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Despite their promising potential, LLMs' performance on complex tasks, such as advanced mathematics and complex disease diagnosis, is still unsatisfactory. A key issue is that present LLMs learn in a data-driven manner, while instruction data for these complex tasks is both scarce and hard to collect or construct. In contrast, LLMs can learn rather quickly on simpler tasks for which adequate prior knowledge was captured during the pretraining stage. If the prerequisites and mechanism of such rapid generalization could be elucidated, it could be highly beneficial in enhancing the efficiency and effectiveness of the LLM's ability to learn complex tasks. In this paper, we therefore employ a gradient-based method to dissect how the SFT process adapts LLMs to downstream tasks from the perspective of attention patterns. We find that: (1) LLMs selectively activate task-specific attention heads during SFT; (2) activation patterns for complex tasks are combinations of basic task patterns; and (3) changes in a few parameters can significantly impact activation patterns after SFT on a small number of samples. Based on these insights, we conduct experiments to examine whether these conclusions could effectively enhance the efficiency and effectiveness of SFT, particularly in handling complex tasks and when instructional resources are scarce. Our research not only uncovers the underlying reasons behind LLMs' rapid learning and generalization mechanisms but also provides practical solutions for addressing data challenges in complex and specialized tasks.
Comment: in review
Published: 2024

18. Disentangled Generation and Aggregation for Robust Radiance Fields

Author: Shen, Shihe, Gao, Huachen, Xu, Wangze, Peng, Rui, Tang, Luyang, Xiong, Kaiqiang, Jiao, Jianbo, and Wang, Ronggang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
Abstract: The use of triplane-based radiance fields has gained attention in recent years due to their ability to effectively disentangle 3D scenes with a high-quality representation and low computation cost. A key requirement of this method is the precise input of camera poses. However, due to the local update property of the triplane, joint estimation in the style of previous joint pose-NeRF optimization works easily falls into local minima. To this end, we propose the Disentangled Triplane Generation module to introduce global feature context and smoothness into triplane learning, which mitigates errors caused by local updating. Then, we propose the Disentangled Plane Aggregation to mitigate the entanglement caused by the common triplane feature aggregation during camera pose updating. In addition, we introduce a two-stage warm-start training strategy to reduce the implicit constraints caused by the triplane generator. Quantitative and qualitative results demonstrate that our proposed method achieves state-of-the-art performance in novel view synthesis with noisy or unknown camera poses, as well as efficient convergence of optimization. Project page: https://gaohchen.github.io/DiGARR/.
Comment: 27 pages, 11 figures, Accepted by ECCV'2024
Published: 2024

19. Generative Pre-trained Ranking Model with Over-parameterization at Web-Scale (Extended Abstract)

Author: Li, Yuchen, Xiong, Haoyi, Kong, Linghe, Bian, Jiang, Wang, Shuaiqiang, Chen, Guihai, and Yin, Dawei
Subjects: Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Learning to rank (LTR) is widely employed in web searches to prioritize pertinent webpages from retrieved content based on input queries. However, traditional LTR models encounter two principal obstacles that lead to suboptimal performance: (1) the lack of well-annotated query-webpage pairs with ranking scores covering a diverse range of search query popularities, which hampers their ability to address queries across the popularity spectrum, and (2) inadequately trained models that fail to induce generalized representations for LTR, resulting in overfitting. To address these challenges, we propose a Generative Semi-Supervised Pre-trained (GS2P) LTR model. We conduct extensive offline experiments on both a publicly available dataset and a real-world dataset collected from a large-scale search engine. Furthermore, we deploy GS2P in a large-scale web search engine with realistic traffic, where we observe significant improvements in the real-world application.
Published: 2024

20. Pre-trained Graphformer-based Ranking at Web-scale Search (Extended Abstract)

Author: Li, Yuchen, Xiong, Haoyi, Kong, Linghe, Sun, Zeyi, Chen, Hongyang, Wang, Shuaiqiang, and Yin, Dawei
Subjects: Computer Science - Machine Learning, Computer Science - Information Retrieval
Abstract: Both Transformer and Graph Neural Networks (GNNs) have been employed in the domain of learning to rank (LTR). However, these approaches adhere to two distinct yet complementary problem formulations: ranking score regression based on query-webpage pairs, and link prediction within query-webpage bipartite graphs, respectively. While it is possible to pre-train GNNs or Transformers on source datasets and subsequently fine-tune them on sparsely annotated LTR datasets, the distributional shifts between the pair-based and bipartite graph domains present significant challenges in integrating these heterogeneous models into a unified LTR framework at web scale. To address this, we introduce the novel MPGraf model, which leverages a modular and capsule-based pre-training strategy, aiming to cohesively integrate the regression capabilities of Transformers with the link prediction strengths of GNNs. We conduct extensive offline and online experiments to rigorously evaluate the performance of MPGraf.
Published: 2024

21. Adaptive Learning via a Negative Selection Strategy for Few-Shot Bioacoustic Event Detection

Author: Chen, Yaxiong, Zhang, Xueping, Zi, Yunfei, and Xiong, Shengwu
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Although the Prototypical Network (ProtoNet) has demonstrated effectiveness in few-shot biological event detection, two persistent issues remain. Firstly, there is difficulty in constructing a representative negative prototype due to the absence of explicitly annotated negative samples. Secondly, the durations of the target biological vocalisations vary across tasks, making it challenging for the model to consistently yield optimal results across all tasks. To address these issues, we propose a novel adaptive learning framework with an adaptive learning loss to guide classifier updates. Additionally, we propose a negative selection strategy to construct a more representative negative prototype for ProtoNet. All experiments were performed on the DCASE 2023 TASK5 few-shot bioacoustic event detection dataset. The results show that our proposed method achieves an F-measure of 0.703, an improvement of 12.84%.
Published: 2024

22. Toward Mixture-of-Experts Enabled Trustworthy Semantic Communication for 6G Networks

Author: He, Jiayi, Luo, Xiaofeng, Kang, Jiawen, Du, Hongyang, Xiong, Zehui, Chen, Ci, Niyato, Dusit, and Shen, Xuemin
Subjects: Computer Science - Networking and Internet Architecture, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security
Abstract: Semantic Communication (SemCom) plays a pivotal role in 6G networks, offering a viable solution for future efficient communication. Deep Learning (DL)-based semantic codecs further enhance this efficiency. However, the vulnerability of DL models to security threats, such as adversarial attacks, poses significant challenges for practical applications of SemCom systems. These vulnerabilities enable attackers to tamper with messages and eavesdrop on private information, especially in wireless communication scenarios. Although existing defenses attempt to address specific threats, they often fail to simultaneously handle multiple heterogeneous attacks. To overcome this limitation, we introduce a novel Mixture-of-Experts (MoE)-based SemCom system. This system comprises a gating network and multiple experts, each specializing in different security challenges. The gating network adaptively selects suitable experts to counter heterogeneous attacks based on user-defined security requirements. Multiple experts collaborate to accomplish semantic communication tasks while meeting the security requirements of users. A case study in vehicular networks demonstrates the efficacy of the MoE-based SemCom system. Simulation results show that the proposed MoE-based SemCom system effectively mitigates concurrent heterogeneous attacks, with minimal impact on downstream task accuracy., Comment: 8 pages, 3 figures
Published: 2024

23. Safe Navigation for Robotic Digestive Endoscopy via Human Intervention-based Reinforcement Learning

Author: Tan, Min, Tao, Yushun, Zheng, Boyun, Xie, GaoSheng, Feng, Lijuan, Xia, Zeyang, and Xiong, Jing
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: With the increasing application of automated robotic digestive endoscopy (RDE), ensuring safe and efficient navigation in the unstructured and narrow digestive tract has become a critical challenge. Existing automated reinforcement learning navigation algorithms often result in potentially risky collisions due to the absence of essential human intervention, which significantly limits the safety and effectiveness of RDE in actual clinical practice. To address this limitation, we propose a Human Intervention (HI)-based Proximal Policy Optimization (PPO) framework, dubbed HI-PPO, which incorporates expert knowledge to enhance RDE's safety. Specifically, we introduce an Enhanced Exploration Mechanism (EEM) to address the low exploration efficiency of the standard PPO. Additionally, a reward-penalty adjustment (RPA) is implemented to penalize unsafe actions during initial interventions. Furthermore, Behavior Cloning Similarity (BCS) is included as an auxiliary objective to ensure the agent emulates expert actions. Comparative experiments conducted in a simulated platform across various anatomical colon segments demonstrate that our model effectively and safely guides RDE.
Published: 2024

24. Generative AI-Enhanced Multi-Modal Semantic Communication in Internet of Vehicles: System Design and Methodologies

Author: Lu, Jiayi, Yang, Wanting, Xiong, Zehui, Xing, Chengwen, Tafazolli, Rahim, Quek, Tony Q. S., and Debbah, Merouane
Subjects: Computer Science - Networking and Internet Architecture
Abstract: Vehicle-to-everything (V2X) communication supports numerous tasks, from driving safety to entertainment services. To achieve a holistic view, vehicles are typically equipped with multiple sensors to compensate for undetectable blind spots. However, processing large volumes of multi-modal data increases transmission load, while the dynamic nature of vehicular networks adds to transmission instability. To address these challenges, we propose a novel framework, Generative Artificial intelligence (GAI)-enhanced multi-modal semantic communication (SemCom), referred to as G-MSC, designed to handle various vehicular network tasks by employing suitable analog or digital transmission. GAI presents a promising opportunity to transform the SemCom framework by significantly enhancing semantic encoding to facilitate the optimized integration of multi-modal information, enhancing channel robustness, and fortifying semantic decoding against noise interference. To validate the effectiveness of the G-MSC framework, we conduct a case study showcasing its performance in vehicular communication networks for predictive tasks. The experimental results show that the design achieves reliable and efficient communication in V2X networks. In the end, we present future research directions on G-MSC.
Published: 2024

25. Direct Judgement Preference Optimization

Author: Wang, Peifeng, Xu, Austin, Zhou, Yilun, Xiong, Caiming, and Joty, Shafiq
Subjects: Computer Science - Computation and Language
Abstract: Auto-evaluation is crucial for assessing response quality and offering feedback for model development. Recent studies have explored training large language models (LLMs) as generative judges to evaluate and critique other models' outputs. In this work, we investigate the idea of learning from both positive and negative data with preference optimization to enhance the evaluation capabilities of LLM judges across an array of different use cases. We achieve this by employing three approaches to collect the preference pairs for different use cases, each aimed at improving our generative judge from a different perspective. Our comprehensive study over a wide range of benchmarks demonstrates the effectiveness of our method. In particular, our generative judge achieves the best performance on 10 out of 13 benchmarks, outperforming strong baselines like GPT-4o and specialized judge models. Further analysis shows that our judge model robustly counters inherent biases such as position and length bias, flexibly adapts to any evaluation protocol specified by practitioners, and provides helpful language feedback for improving downstream generator models., Comment: Preprint
Published: 2024

26. Accelerated Multi-Contrast MRI Reconstruction via Frequency and Spatial Mutual Learning

Author: Chen, Qi, Xing, Xiaohan, Chen, Zhen, and Xiong, Zhiwei
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: To accelerate Magnetic Resonance (MR) imaging procedures, Multi-Contrast MR Reconstruction (MCMR) has become a prevalent trend that utilizes an easily obtainable modality as an auxiliary to support high-quality reconstruction of the target modality with under-sampled k-space measurements. The exploration of global dependency and complementary information across different modalities is essential for MCMR. However, existing methods either struggle to capture global dependency due to the limited receptive field or suffer from quadratic computational complexity. To tackle this dilemma, we propose a novel Frequency and Spatial Mutual Learning Network (FSMNet), which efficiently explores global dependencies across different modalities. Specifically, the features for each modality are extracted by the Frequency-Spatial Feature Extraction (FSFE) module, featuring a frequency branch and a spatial branch. Benefiting from the global property of the Fourier transform, the frequency branch can efficiently capture global dependency with an image-size receptive field, while the spatial branch can extract local features. To exploit complementary information from the auxiliary modality, we propose a Cross-Modal Selective fusion (CMS-fusion) module that selectively incorporates the frequency and spatial features from the auxiliary modality to enhance the corresponding branch of the target modality. To further integrate the enhanced global features from the frequency branch and the enhanced local features from the spatial branch, we develop a Frequency-Spatial fusion (FS-fusion) module, resulting in a comprehensive feature representation for the target modality. Extensive experiments on the BraTS and fastMRI datasets demonstrate that the proposed FSMNet achieves state-of-the-art performance for the MCMR task with different acceleration factors. The code is available at: https://github.com/qic999/FSMNet., Comment: Accepted as a poster by Medical Image Computing and Computer Assisted Intervention (MICCAI) 2024
Published: 2024

27. IMOST: Incremental Memory Mechanism with Online Self-Supervision for Continual Traversability Learning

Author: Ma, Kehui, Sun, Zhen, Xiong, Chaoran, Zhu, Qiumin, Wang, Kewei, and Pei, Ling
Subjects: Computer Science - Robotics
Abstract: Traversability estimation is the foundation of path planning for a general navigation system. However, complex and dynamic environments pose challenges for the latest methods using self-supervised learning (SSL) technique. Firstly, existing SSL-based methods generate sparse annotations lacking detailed boundary information. Secondly, their strategies focus on hard samples for rapid adaptation, leading to forgetting and biased predictions. In this work, we propose IMOST, a continual traversability learning framework composed of two key modules: incremental dynamic memory (IDM) and self-supervised annotation (SSA). By mimicking human memory mechanisms, IDM allocates novel data samples to new clusters according to information expansion criterion. It also updates clusters based on diversity rule, ensuring a representative characterization of new scene. This mechanism enhances scene-aware knowledge diversity while maintaining a compact memory capacity. The SSA module, integrating FastSAM, utilizes point prompts to generate complete annotations in real time which reduces training complexity. Furthermore, IMOST has been successfully deployed on the quadruped robot, with performance evaluated during the online learning process. Experimental results on both public and self-collected datasets demonstrate that our IMOST outperforms current state-of-the-art method, maintains robust recognition capabilities and adaptability across various scenarios. The code is available at https://github.com/SJTU-MKH/OCLTrav.
Published: 2024

28. Generalizable Non-Line-of-Sight Imaging with Learnable Physical Priors

Author: Sun, Shida, Li, Yue, Zhang, Yueyi, and Xiong, Zhiwei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Non-line-of-sight (NLOS) imaging, recovering the hidden volume from indirect reflections, has attracted increasing attention due to its potential applications. Despite promising results, existing NLOS reconstruction approaches are constrained by the reliance on empirical physical priors, e.g., single fixed path compensation. Moreover, these approaches still possess limited generalization ability, particularly when dealing with scenes at a low signal-to-noise ratio (SNR). To overcome the above problems, we introduce a novel learning-based solution, comprising two key designs: Learnable Path Compensation (LPC) and Adaptive Phasor Field (APF). The LPC applies tailored path compensation coefficients to adapt to different objects in the scene, effectively reducing light wave attenuation, especially in distant regions. Meanwhile, the APF learns the precise Gaussian window of the illumination function for the phasor field, dynamically selecting the relevant spectrum band of the transient measurement. Experimental validations demonstrate that our proposed approach, only trained on synthetic data, exhibits the capability to seamlessly generalize across various real-world datasets captured by different imaging systems and characterized by low SNRs.
Published: 2024

29. Superconvergence of the local discontinuous Galerkin method with generalized numerical fluxes for one-dimensional linear time-dependent fourth-order equations

Author: Li, Linhui, Meng, Xiong, and Wu, Boying
Subjects: Mathematics - Numerical Analysis, 65M60
Abstract: In this paper, we concentrate on the superconvergence of the local discontinuous Galerkin method with generalized numerical fluxes for one-dimensional linear time-dependent fourth-order equations. The adjustable numerical viscosity of the generalized numerical fluxes is beneficial for long time simulations with a slower error growth. By using generalized Gauss--Radau projections and correction functions together with a suitable numerical initial condition, we derive, for polynomials of degree $k$, $(2k+1)$th order superconvergence for the numerical flux and cell averages, $(k+2)$th order superconvergence at generalized Radau points, and $(k+1)$th order for error derivative at generalized Radau points. Moreover, a supercloseness result of order $(k+2)$ is established between the generalized Gauss--Radau projection and the numerical solution. Superconvergence analysis of mixed boundary conditions is also given. Equations with discontinuous initial condition and nonlinear convection term are numerically investigated, illustrating that the conclusions are valid for more general cases., Comment: 22 pages, 1 figure
Published: 2024

30. Enhancing Large Language Models with Domain-specific Retrieval Augment Generation: A Case Study on Long-form Consumer Health Question Answering in Ophthalmology

Author: Gilson, Aidan, Ai, Xuguang, Arunachalam, Thilaka, Chen, Ziyou, Cheong, Ki Xiong, Dave, Amisha, Duic, Cameron, Kibe, Mercy, Kaminaka, Annette, Prasad, Minali, Siddig, Fares, Singer, Maxwell, Wong, Wendy, Jin, Qiao, Keenan, Tiarnan D. L., Hu, Xia, Chew, Emily Y., Lu, Zhiyong, Xu, Hua, Adelman, Ron A., Tham, Yih-Chung, and Chen, Qingyu
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Despite the potential of Large Language Models (LLMs) in medicine, they may generate responses lacking supporting evidence or based on hallucinated evidence. While Retrieval Augment Generation (RAG) is popular to address this issue, few studies implemented and evaluated RAG in downstream domain-specific applications. We developed a RAG pipeline with 70,000 ophthalmology-specific documents that retrieve relevant documents to augment LLMs during inference time. In a case study on long-form consumer health questions, we systematically evaluated the responses including over 500 references of LLMs with and without RAG on 100 questions with 10 healthcare professionals. The evaluation focuses on factuality of evidence, selection and ranking of evidence, attribution of evidence, and answer accuracy and completeness. LLMs without RAG provided 252 references in total. Of which, 45.3% hallucinated, 34.1% consisted of minor errors, and 20.6% were correct. In contrast, LLMs with RAG significantly improved accuracy (54.5% being correct) and reduced error rates (18.8% with minor hallucinations and 26.7% with errors). 62.5% of the top 10 documents retrieved by RAG were selected as the top references in the LLM response, with an average ranking of 4.9. The use of RAG also improved evidence attribution (increasing from 1.85 to 2.49 on a 5-point scale, P<0.001), albeit with slight decreases in accuracy (from 3.52 to 3.23, P=0.03) and completeness (from 3.47 to 3.27, P=0.17). The results demonstrate that LLMs frequently exhibited hallucinated and erroneous evidence in the responses, raising concerns for downstream applications in the medical domain. RAG substantially reduced the proportion of such evidence but encountered challenges.
Published: 2024

31. Generation of strong mechanical squeezing through the joint effect of two-tone driving and parametric pumping

Author: Wu, Xiao-Jie, Cheng, Huan-Huan, Wu, Qiannan, Bai, Cheng-Hua, and Wu, Shao-Xiong
Subjects: Quantum Physics
Abstract: We propose an innovative scheme to efficiently prepare strong mechanical squeezing through utilizing the synergistic mechanism of two-tone driving and parametric pumping in an optomechanical system. By reasonably choosing the system parameters, the proposal highlights the following prominent advantages: the squeezing effect of the cavity field induced by the optical parametric amplifier can be transferred to the mechanical oscillator, which has been squeezed by the two-tone driving, and the degree of squeezing of the mechanical oscillator will surpass that obtained by any single mechanism; the joint mechanism can enhance the degree of squeezing significantly and break the 3 dB mechanical squeezing limit, which is particularly evident in the range where the red/blue-detuned ratio is sub-optimal; the mechanical squeezing achieved through this distinctive joint mechanism exhibits notable robustness against both thermal noise and decay of the mechanical oscillator. Our project offers a versatile and efficient approach for generating strong mechanical squeezing across a wide range of conditions.
Published: 2024

32. Double-Helix Singularity and Vortex-Antivortex Annihilation in Space-Time Helical Pulses

Author: Shi, Shuai, Wang, Ren, Xiong, Minhui, Zhou, Qinyu, Wang, Bing-Zhong, and Shen, Yijie
Subjects: Physics - Optics
Abstract: Topological structures reveal the hidden secrets and beauty in nature, such as the double helix in DNA, whilst, the manipulation of which in physical fields, especially in ultrafast structured light, draw booming attention. Here we introduce a new family of spatiotemporal light fields, i.e. helical pulses, carrying sophisticated double-helix singularities in its electromagnetic topological structures. The helical pulses were solved from Maxwell's equation as chiral extensions of toroidal light pulses but with controlled angular momentum dependence. We unveil that the double helix singularities can maintain their topological invariance during propagation and the field exhibits paired generation and annihilation of vortices and antivortices in ultrafast space-time, so as to be potential information carriers beating previous conventional vortex structured light.
Published: 2024

33. Generative Learning Powered Probing Beam Optimization for Cell-Free Hybrid Beamforming

Author: Zhang, Cheng, Xiong, Shuangbo, He, Mengqing, Wei, Lan, Huang, Yongming, and Zhang, Wei
Subjects: Computer Science - Information Theory, Electrical Engineering and Systems Science - Signal Processing
Abstract: Probing beam measurement (PBM)-based hybrid beamforming provides a feasible solution for cell-free MIMO. In this letter, we propose a novel probing beam optimization framework where three collaborative modules respectively realize PBM augmentation, sum-rate prediction and probing beam optimization. Specifically, the PBM augmentation model integrates the conditional variational auto-encoder (CVAE) and mixture density networks and adopts correlated PBM distribution with full-covariance, for which a Cholesky-decomposition based training is introduced to address the issues of covariance legality and numerical stability. Simulations verify the better performance of the proposed augmentation model compared to the traditional CVAE and the efficiency of proposed optimization framework.
Published: 2024

34. Four-fold truncated double-nested anti-resonant hollow-core fibers with ultralow loss and ultrahigh mode purity

Author: Gao, Shoufei, Chen, Hao, Sun, Yizhi, Xiong, Yifan, Yang, Zijie, Zhao, Rui, Ding, Wei, and Wang, Yingying
Subjects: Physics - Optics
Abstract: Hollow-core fibers are inherently multimode, making it crucial to filter out higher-order modes within the shortest possible fiber length for applications such as high speed coherent communications and fiber optic gyroscopes. However, current HCF designs face the challenges of simultaneously achieving ultralow fundamental mode loss and ultrahigh HOM suppression. In this study, we present a novel four fold truncated double nested anti resonant hollow core fiber structure that addresses this challenge. Our 4T-DNANF enables greater control over phase-matching between core modes and air modes in the cladding, allowing for minimized FM loss and substantially increased HOM loss. Experimentally, we fabricated several HCFs: one with an FM loss of 0.1 dB/km and an HOM loss of 430 dB/km, and another with an FM loss of 0.13 dB/km with a HOM loss of 6500 dB/km, resulting in a higher-order mode extinction ratio of 50,000., Comment: 8 pages, 2 figures
Published: 2024

35. RRM: Robust Reward Model Training Mitigates Reward Hacking

Author: Liu, Tianqi, Xiong, Wei, Ren, Jie, Chen, Lichang, Wu, Junru, Joshi, Rishabh, Gao, Yang, Shen, Jiaming, Qin, Zhen, Yu, Tianhe, Sohn, Daniel, Makarova, Anastasiia, Liu, Jeremiah, Liu, Yuan, Piot, Bilal, Ittycheriah, Abe, Kumar, Aviral, and Saleh, Mohammad
Subjects: Computer Science - Computation and Language
Abstract: Reward models (RMs) play a pivotal role in aligning large language models (LLMs) with human preferences. However, traditional RM training, which relies on response pairs tied to specific prompts, struggles to disentangle prompt-driven preferences from prompt-independent artifacts, such as response length and format. In this work, we expose a fundamental limitation of current RM training methods, where RMs fail to effectively distinguish between contextual signals and irrelevant artifacts when determining preferences. To address this, we introduce a causal framework that learns preferences independent of these artifacts and propose a novel data augmentation technique designed to eliminate them. Extensive experiments show that our approach successfully filters out undesirable artifacts, yielding a more robust reward model (RRM). Our RRM improves the performance of a pairwise reward model trained on Gemma-2-9b-it, on RewardBench, increasing accuracy from 80.61% to 84.15%. Additionally, we train two DPO policies using both the RM and RRM, demonstrating that the RRM significantly enhances DPO-aligned policies, improving MT-Bench scores from 7.27 to 8.31 and length-controlled win-rates in AlpacaEval-2 from 33.46% to 52.49%.
Published: 2024

36. Breaking the Barriers of One-to-One Usage of Implicit Neural Representation in Image Compression: A Linear Combination Approach with Performance Guarantees

Author: Sanjeet, Sai, Hosseinalipour, Seyyedali, Xiong, Jinjun, Fujita, Masahiro, and Sahoo, Bibhu Datta
Subjects: Electrical Engineering and Systems Science - Image and Video Processing
Abstract: In an era where the exponential growth of image data driven by the Internet of Things (IoT) is outpacing traditional storage solutions, this work explores and advances the potential of Implicit Neural Representation (INR) as a transformative approach to image compression. INR leverages the function approximation capabilities of neural networks to represent various types of data. While previous research has employed INR to achieve compression by training small networks to reconstruct large images, this work proposes a novel advancement: representing multiple images with a single network. By modifying the loss function during training, the proposed approach allows a small number of weights to represent a large number of images, even those significantly different from each other. A thorough analytical study of the convergence of this new training method is also carried out, establishing upper bounds that not only confirm the validity of the method but also offer insights into optimal hyperparameter design. The proposed method is evaluated on the Kodak, ImageNet, and CIFAR-10 datasets. Experimental results demonstrate that all 24 images in the Kodak dataset can be represented by linear combinations of two sets of weights, achieving a peak signal-to-noise ratio (PSNR) of 26.5 dB with as low as 0.2 bits per pixel (BPP). The proposed method matches the rate-distortion performance of state-of-the-art image codecs, such as BPG, on the CIFAR-10 dataset. Additionally, the proposed method maintains the fundamental properties of INR, such as arbitrary resolution reconstruction of images., Comment: 10 pages, 13 figures
Published: 2024

37. LLM Surgery: Efficient Knowledge Unlearning and Editing in Large Language Models

Author: Veldanda, Akshaj Kumar, Zhang, Shi-Xiong, Das, Anirban, Chakraborty, Supriyo, Rawls, Stephen, Sahu, Sambit, and Naphade, Milind
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Large language models (LLMs) have revolutionized various domains, yet their utility comes with significant challenges related to outdated or problematic knowledge embedded during pretraining. This paper addresses the challenge of modifying LLMs to unlearn problematic and outdated information while efficiently integrating new knowledge without retraining from scratch. Here, we propose LLM Surgery, a framework to efficiently modify LLM behaviour by optimizing a three component objective function that: (1) Performs reverse gradient on unlearning dataset (problematic and outdated information), (2) Performs gradient descent on the update dataset (new and updated information), and (3) Minimizes the KL divergence on the retain dataset (small subset of unchanged text), ensuring alignment between pretrained and modified model outputs. Due to the lack of publicly available datasets specifically tailored for our novel task, we compiled a new dataset and an evaluation benchmark. Using Llama2-7B, we demonstrate that LLM Surgery can achieve significant forgetting on the unlearn set, a 20% increase in accuracy on the update set, and maintain performance on the retain set.
Published: 2024

38. LI-GS: Gaussian Splatting with LiDAR Incorporated for Accurate Large-Scale Reconstruction

Author: Jiang, Changjian, Gao, Ruilan, Shao, Kele, Wang, Yue, Xiong, Rong, and Zhang, Yu
Subjects: Computer Science - Robotics
Abstract: Large-scale 3D reconstruction is critical in the field of robotics, and the potential of 3D Gaussian Splatting (3DGS) for achieving accurate object-level reconstruction has been demonstrated. However, ensuring geometric accuracy in outdoor and unbounded scenes remains a significant challenge. This study introduces LI-GS, a reconstruction system that incorporates LiDAR and Gaussian Splatting to enhance geometric accuracy in large-scale scenes. 2D Gaussian surfels are employed as the map representation to enhance surface alignment. Additionally, a novel modeling method is proposed to convert LiDAR point clouds to plane-constrained multimodal Gaussian Mixture Models (GMMs). The GMMs are utilized during both initialization and optimization stages to ensure sufficient and continuous supervision over the entire scene while mitigating the risk of over-fitting. Furthermore, GMMs are employed in mesh extraction to eliminate artifacts and improve the overall geometric quality. Experiments demonstrate that our method outperforms state-of-the-art methods in large-scale 3D reconstruction, achieving higher accuracy compared to both LiDAR-based methods and Gaussian-based methods with improvements of 52.6% and 68.7%, respectively.
Published: 2024

39. Bridging the Gap: GRB 230812B -- A Three-Second Supernova-Associated Burst Detected by the GRID Mission

Author: Wang, Chen-Yu, Yin, Yi-Han Iris, Zhang, Bin-Bin, Feng, Hua, Zeng, Ming, Xiong, Shao-Lin, Pan, Xiao-Fan, Yang, Jun, Zhang, Yan-Qiu, Li, Chen, Yan, Zhen-Yu, Wang, Chen-Wei, Zheng, Xu-Tao, Liu, Jia-Cong, Wang, Qi-Dong, Yang, Zi-Rui, Li, Long-Hao, Liu, Qi-Ze, Zhao, Zheng-Yang, Hu, Bo, Liu, Yi-Qi, Lu, Si-Yuan, Luo, Zi-You, Cang, Ji-Rong, Cao, De-Zhi, Han, Wen-Tao, Jia, Li-Ping, Pan, Xing-Yu, Tian, Yang, Xu, Ben-Da, Yang, Xiao, and Zeng, Zhi
Subjects: Astrophysics - High Energy Astrophysical Phenomena
Abstract: GRB 230812B, detected by the Gamma-Ray Integrated Detectors (GRID) constellation mission, is an exceptionally bright gamma-ray burst (GRB) with a duration of only 3 seconds. Sitting near the traditional boundary ($\sim$ 2 s) between long and short GRBs, GRB 230812B is notably associated with a supernova (SN), indicating a massive star progenitor. This makes it a rare example of a short-duration GRB resulting from stellar collapse. Our analysis, using a time-evolving synchrotron model, suggests that the burst has an emission radius of approximately $10^{14.5}$~cm. We propose that the short duration of GRB 230812B is due to the combined effects of the central engine's activity time and the time required for the jet to break through the stellar envelope. Our findings provide another case that challenges the conventional view that short-duration GRBs originate exclusively from compact object mergers, demonstrating that a broader range of durations exists for GRBs arising from the collapse of massive stars., Comment: 10 pages, 3 tables, 11 figures
Published: 2024

40. Bayer-type Vis-NIR Routing via Inverse Design for Submicron-pixel Image Sensing Chip

Author: Yang, Xianguang, Xiong, Shijie, Tan, Fangchang, Lin, Zhitao, Bao, Yanjun, Wen, Long, Chen, Qin, and Li, Baojun
Subjects: Physics - Optics, Physics - Applied Physics
Abstract: With the advent of high-precision nanoscale lithography technology, high-resolution image sensing has experienced rapid development in recent years. Currently, mainstream commercial image sensors predominantly utilize Bayer array color filters to implement RGB colorful imaging strategies. However, as pixel sizes transition into the submicron dimensions, traditional dye filters used in image sensors have long been hampered by limited optical efficiency, suboptimal signal-to-noise ratios, and significant difficulties in miniaturization. In this work, a novel 4-channel RGB-IR color router for image sensing, distinct from the traditional absorption-transmission mechanisms, was proposed through inverse design methodologies. Utilizing genetic algorithms and DCGAN models, approximately 20,000 random color routing structures were generated and trained. From these, an optimized spectral splitting structure with a minimal periodic size of 1.6 um * 1.6 um was identified. This structure achieves peak optical efficiencies 1.7 times greater than those of dye filters, while also offering superior color imaging quality and signal intensity. This innovative design approach, leveraging deep learning integration, demonstrates an on-chip strategy for color realization in 4-channel image sensors, and holds significant promise for enhancing the development of next-generation high-performance image sensing chip systems., Comment: 19 pages, 5 figures
Published: 2024

41. Geometric Relational Embeddings

Author: Xiong, Bo
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Social and Information Networks
Abstract: Relational representation learning transforms relational data into continuous and low-dimensional vector representations. However, vector-based representations fall short in capturing crucial properties of relational data that are complex and symbolic. We propose geometric relational embeddings, a paradigm of relational embeddings that respect the underlying symbolic structures. Specifically, this dissertation introduces various geometric relational embedding models capable of capturing: 1) complex structured patterns like hierarchies and cycles in networks and knowledge graphs; 2) logical structures in ontologies and logical constraints applicable for constraining machine learning model outputs; and 3) high-order structures between entities and relations. Our results obtained from benchmark and real-world datasets demonstrate the efficacy of geometric relational embeddings in adeptly capturing these discrete, symbolic, and structured properties inherent in relational data., Comment: Doctoral Dissertation, 177 pages
Published: 2024

42. UniMSF: A Unified Multi-Sensor Fusion Framework for Intelligent Transportation System Global Localization

Author: Liu, Wei, Zhu, Jiaqi, Zhuo, Guirong, Fu, Wufei, Meng, Zonglin, Lu, Yishi, Hua, Min, Qiao, Feng, Li, You, He, Yi, and Xiong, Lu
Subjects: Computer Science - Robotics
Abstract: Intelligent transportation systems (ITS) localization is of significant importance as it provides fundamental position and orientation for autonomous operations like intelligent vehicles. Integrating diverse and complementary sensors such as global navigation satellite system (GNSS) and 4D-radar can provide scalable and reliable global localization. Nevertheless, multi-sensor fusion encounters challenges including heterogeneity and time-varying uncertainty in measurements. Consequently, developing a reliable and unified multi-sensor framework remains challenging. In this paper, we introduce UniMSF, a comprehensive multi-sensor fusion localization framework for ITS, utilizing factor graphs. By integrating a multi-sensor fusion front-end, alongside outlier detection & noise model estimation, and a factor graph optimization back-end, this framework accomplishes efficient fusion and ensures accurate localization for ITS. Specifically, in the multi-sensor fusion front-end module, we tackle the measurement heterogeneity among different modality sensors and establish effective measurement models. Reliable outlier detection and data-driven online noise estimation methods ensure that back-end optimization is immune to interference from outlier measurements. In addition, integrating multi-sensor observations via factor graph optimization offers the advantage of "plug and play". Notably, our framework features high modularity and is seamlessly adapted to various sensor configurations. We demonstrate the effectiveness of the proposed framework through real vehicle tests by tightly integrating GNSS pseudorange and carrier phase information with IMU, and 4D-radar.
Published: 2024

43. From Lists to Emojis: How Format Bias Affects Model Alignment

Author: Zhang, Xuanchang, Xiong, Wei, Chen, Lichang, Zhou, Tianyi, Huang, Heng, and Zhang, Tong
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: In this paper, we study format biases in reinforcement learning from human feedback (RLHF). We observe that many widely-used preference models, including human evaluators, GPT-4, and top-ranking models on the RewardBench benchmark, exhibit strong biases towards specific format patterns, such as lists, links, bold text, and emojis. Furthermore, large language models (LLMs) can exploit these biases to achieve higher rankings on popular benchmarks like AlpacaEval and LMSYS Chatbot Arena. One notable example of this is verbosity bias, where current preference models favor longer responses that appear more comprehensive, even when their quality is equal to or lower than shorter, competing responses. However, format biases beyond verbosity remain largely underexplored in the literature. In this work, we extend the study of biases in preference learning beyond the commonly recognized length bias, offering a comprehensive analysis of a wider range of format biases. Additionally, we show that with a small amount of biased data (less than 1%), we can inject significant bias into the reward model. Moreover, these format biases can also be easily exploited by downstream alignment algorithms, such as best-of-n sampling and online iterative DPO, as it is usually easier to manipulate the format than to improve the quality of responses. Our findings emphasize the need to disentangle format and content both for designing alignment algorithms and evaluating models., Comment: Work in progress
Published: 2024

44. Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey

Author: Winata, Genta Indra, Zhao, Hanyang, Das, Anirban, Tang, Wenpin, Yao, David D., Zhang, Shi-Xiong, and Sahu, Sambit
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Preference tuning is a crucial process for aligning deep generative models with human preferences. This survey offers a thorough overview of recent advancements in preference tuning and the integration of human feedback. The paper is organized into three main sections: 1) introduction and preliminaries: an introduction to reinforcement learning frameworks, preference tuning tasks, models, and datasets across various modalities: language, speech, and vision, as well as different policy approaches, 2) in-depth examination of each preference tuning approach: a detailed analysis of the methods used in preference tuning, and 3) applications, discussion, and future directions: an exploration of the applications of preference tuning in downstream tasks, including evaluation methods for different modalities, and an outlook on future research directions. Our objective is to present the latest methodologies in preference tuning and model alignment, enhancing the understanding of this field for researchers and practitioners. We hope to encourage further engagement and innovation in this area., Comment: Survey paper
Published: 2024

45. CUNSB-RFIE: Context-aware Unpaired Neural Schrödinger Bridge in Retinal Fundus Image Enhancement

Author: Dong, Xuanzhao, Vasa, Vamsi Krishna, Zhu, Wenhui, Qiu, Peijie, Chen, Xiwen, Su, Yi, Xiong, Yujian, Yang, Zhangsihao, Chen, Yanxi, and Wang, Yalin
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Retinal fundus photography is significant in diagnosing and monitoring retinal diseases. However, systemic imperfections and operator/patient-related factors can hinder the acquisition of high-quality retinal images. Previous efforts in retinal image enhancement primarily relied on GANs, which are limited by the trade-off between training stability and output diversity. In contrast, the Schrödinger Bridge (SB) offers a more stable solution by utilizing Optimal Transport (OT) theory to model a stochastic differential equation (SDE) between two arbitrary distributions. This allows SB to effectively transform low-quality retinal images into their high-quality counterparts. In this work, we leverage the SB framework to propose an image-to-image translation pipeline for retinal image enhancement. Additionally, previous methods often fail to capture fine structural details, such as blood vessels. To address this, we enhance our pipeline by introducing Dynamic Snake Convolution, whose tortuous receptive field can better preserve tubular structures. We name the resulting retinal fundus image enhancement framework the Context-aware Unpaired Neural Schrödinger Bridge (CUNSB-RFIE). To the best of our knowledge, this is the first endeavor to use the SB approach for retinal image enhancement. Experimental results on a large-scale dataset demonstrate the advantage of the proposed method compared to several state-of-the-art supervised and unsupervised methods in terms of image quality and performance on downstream tasks. The code is available at https://github.com/Retinal-Research/CUNSB-RFIE .
Published: 2024

46. P2 Explore: Efficient Exploration in Unknown Clustered Environment with Floor Plan Prediction

Author: Song, Kun, Chen, Gaoming, Tomizuka, Masayoshi, Zhan, Wei, Xiong, Zhenhua, and Ding, Mingyu
Subjects: Computer Science - Robotics
Abstract: Robot exploration aims at constructing unknown environments and it is important to achieve it with shorter paths. Traditional methods focus on optimizing the visiting order based on current observations, which may lead to local-minimal results. Recently, by predicting the structure of the unseen environment, the exploration efficiency can be further improved. However, in a cluttered environment, due to the randomness of obstacles, the ability for prediction is limited. Therefore, to solve this problem, we propose a map prediction algorithm that can be efficient in predicting the layout of noisy indoor environments. We focus on the scenario of 2D exploration. First, we perform floor plan extraction by denoising the cluttered map using deep learning. Then, we use a floor plan-based algorithm to improve the prediction accuracy. Additionally, we extract the segmentation of rooms and construct their connectivity based on the predicted map, which can be used for downstream tasks. To validate the effectiveness of the proposed method, it is applied to exploration tasks. Extensive experiments show that even in cluttered scenes, our proposed method can benefit efficiency., Comment: 7 pages, submitted to ICRA 2025
Published: 2024

47. Allium Vegetables Intake and Digestive System Cancer Risk: A Study Based on Mendelian Randomization, Network Pharmacology and Molecular Docking

Author: Li, Shuhao, Lou, Jingwen, Mulatihan, Yelina, Xiong, Yuhang, Li, Yao, and Xu, Qi
Subjects: Quantitative Biology - Genomics
Abstract: Background: Allium vegetables (garlic and onion) are one of the flavorings in people's daily diets. Observational studies suggest that intake of allium vegetables may be correlated with a lower incidence of digestive system cancers. However, the existence of a causal relationship is still controversial due to confounding factors and reverse causation. Therefore, we explored the causal relationship between intake of allium vegetables and digestive system cancers using Mendelian randomization approach. Methods: First, we performed Mendelian randomization analyses using inverse variance weighting (IVW), weighted median, and MR-Egger approaches, and demonstrated the reliability of the results in the sensitivity step. Second, Multivariable Mendelian randomization was applied to adjust for smoking and alcohol consumption. Third, we explored the molecular mechanisms behind the positive results through network pharmacology and molecular docking methods. Results: The study suggests that increased intake of garlic reduced gastric cancer risk. However, onion intake was not statistically associated with digestive system cancer. Conclusion: Garlic may have a protective effect against gastric cancer.
Published: 2024

48. DIGIMON: Diagnosis and Mitigation of Sampling Skew for Reinforcement Learning based Meta-Planner in Robot Navigation

Author: Feng, Shiwei, Chen, Xuan, Cheng, Zhiyuan, Xiong, Zikang, Gao, Yifei, Cheng, Siyuan, Kate, Sayali, and Zhang, Xiangyu
Subjects: Computer Science - Robotics
Abstract: Robot navigation is increasingly crucial across applications like delivery services and warehouse management. The integration of Reinforcement Learning (RL) with classical planning has given rise to meta-planners that combine the adaptability of RL with the explainable decision-making of classical planners. However, the exploration capabilities of RL-based meta-planners during training are often constrained by the capabilities of the underlying classical planners. This constraint can result in limited exploration, thereby leading to sampling skew issues. To address these issues, our paper introduces a novel framework, DIGIMON, which begins with behavior-guided diagnosis for exploration bottlenecks within the meta-planner and follows up with a mitigation strategy that conducts up-sampling from diagnosed bottleneck data. Our evaluation shows 13.5%+ improvement in navigation performance, greater robustness in out-of-distribution environments, and a 4x boost in training efficiency. DIGIMON is designed as a versatile, plug-and-play solution, allowing seamless integration into various RL-based meta-planners.
Published: 2024

49. Semantics Preserving Emoji Recommendation with Large Language Models

Author: Qiu, Zhongyi, Qiu, Kangyi, Lyu, Hanjia, Xiong, Wei, and Luo, Jiebo
Subjects: Computer Science - Computation and Language, Computer Science - Social and Information Networks
Abstract: Emojis have become an integral part of digital communication, enriching text by conveying emotions, tone, and intent. Existing emoji recommendation methods are primarily evaluated based on their ability to match the exact emoji a user chooses in the original text. However, they ignore the essence of users' behavior on social media in that each text can correspond to multiple reasonable emojis. To better assess a model's ability to align with such real-world emoji usage, we propose a new semantics preserving evaluation framework for emoji recommendation, which measures a model's ability to recommend emojis that maintain the semantic consistency with the user's text. To evaluate how well a model preserves semantics, we assess whether the predicted affective state, demographic profile, and attitudinal stance of the user remain unchanged. If these attributes are preserved, we consider the recommended emojis to have maintained the original semantics. The advanced abilities of Large Language Models (LLMs) in understanding and generating nuanced, contextually relevant output make them well-suited for handling the complexities of semantics preserving emoji recommendation. To this end, we construct a comprehensive benchmark to systematically assess the performance of six proprietary and open-source LLMs using different prompting techniques on our task. Our experiments demonstrate that GPT-4o outperforms other LLMs, achieving a semantics preservation score of 79.23%. Additionally, we conduct case studies to analyze model biases in downstream classification tasks and evaluate the diversity of the recommended emojis.
Published: 2024

50. Uncovering the Secrets of Human-Like Movement: A Fresh Perspective on Motion Planning

Author: Shi, Lei, Liu, Qichao, Zhou, Cheng, Gao, Wentao, Wu, Haotian, Zheng, Yu, and Li, Xiong
Subjects: Computer Science - Robotics
Abstract: This article explores human-like movement from a fresh perspective on motion planning. We analyze the coordinated and compliant movement mechanisms of the human body from the perspective of biomechanics. Based on these mechanisms, we propose an optimal control framework that integrates compliant control dynamics, optimizing robotic arm motion through a response time matrix. This matrix sets the timing parameters for joint movements, turning the system into a time-parameterized optimal control problem. The model focuses on the interaction between active and passive joints under external disturbances, improving adaptability and compliance. This method achieves optimal trajectory generation and balances precision and compliance. Experimental results on both a manipulator and a humanoid robot validate the approach., Comment: 7 pages
Published: 2024
