Author: "Xiong A" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Xiong A"' showing total 702,576 results

251. Towards General Text-guided Image Synthesis for Customized Multimodal Brain MRI Generation

Author: Wang, Yulin, Xiong, Honglin, Sun, Kaicong, Bai, Shuwei, Dai, Ling, Ding, Zhongxiang, Liu, Jiameng, Wang, Qian, Liu, Qian, and Shen, Dinggang
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Multimodal brain magnetic resonance (MR) imaging is indispensable in neuroscience and neurology. However, due to the accessibility of MRI scanners and their lengthy acquisition time, multimodal MR images are not commonly available. Current MR image synthesis approaches are typically trained on independent datasets for specific tasks, leading to suboptimal performance when applied to novel datasets and tasks. Here, we present TUMSyn, a Text-guided Universal MR image Synthesis generalist model, which can flexibly generate brain MR images with demanded imaging metadata from routinely acquired scans guided by text prompts. To ensure TUMSyn's image synthesis precision, versatility, and generalizability, we first construct a brain MR database comprising 31,407 3D images with 7 MRI modalities from 13 centers. We then pre-train an MRI-specific text encoder using contrastive learning to effectively control MR image synthesis based on text prompts. Extensive experiments on diverse datasets and physician assessments indicate that TUMSyn can generate clinically meaningful MR images with specified imaging metadata in supervised and zero-shot scenarios. Therefore, TUMSyn can be utilized along with acquired MR scan(s) to facilitate large-scale MRI-based screening and diagnosis of brain diseases., Comment: 23 pages, 9 figures
Published: 2024

252. Post-$GW$ theory and its application to pseudogap in strongly correlated system

Author: Li, Hui, Su, Yingze, Xiong, Junnian, Lin, Haiqing, Huang, Huaqing, and Li, Dingping
Subjects: Condensed Matter - Strongly Correlated Electrons
Abstract: The $GW$ approximation is a widely used framework for studying correlated materials, but it struggles with certain limitations, such as its inability to explain pseudogap phenomena. To overcome these problems, we propose a systematic theoretical framework for Green's function corrections and apply it specifically to the $GW$ approximation. In this new theory, the screened potential is reconnected to the physical response function, i.e. the covariant response function proposed in \cite{cGW_2023}, rather than using the RPA formula. We apply our scheme to calculate Green's function, the spectral function, and the charge compressibility in the two-dimensional Hubbard model. Our scheme yields significant qualitative and quantitative improvements over the standard $GW$ method and successfully captures the pseudogap behavior., Comment: 13 pages, 5 figures
Published: 2024

253. Compressed Depth Map Super-Resolution and Restoration: AIM 2024 Challenge Results

Author: Conde, Marcos V., Vasluianu, Florin-Alexandru, Xiong, Jinhui, Ye, Wei, Ranjan, Rakesh, and Timofte, Radu
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: The increasing demand for augmented reality (AR) and virtual reality (VR) applications highlights the need for efficient depth information processing. Depth maps, essential for rendering realistic scenes and supporting advanced functionalities, are typically large and challenging to stream efficiently due to their size. This challenge introduces a focus on developing innovative depth upsampling techniques to reconstruct high-quality depth maps from compressed data. These techniques are crucial for overcoming the limitations posed by depth compression, which often degrades quality, loses scene details and introduces artifacts. By enhancing depth upsampling methods, this challenge aims to improve the efficiency and quality of depth map reconstruction. Our goal is to advance the state-of-the-art in depth processing technologies, thereby enhancing the overall user experience in AR and VR applications., Comment: ECCV 2024 - Advances in Image Manipulation (AIM)
Published: 2024

254. Bound-preserving OEDG schemes for Aw-Rascle-Zhang traffic models on networks

Author: Chen, Wei, Cui, Shumo, Wu, Kailiang, and Xiong, Tao
Subjects: Mathematics - Numerical Analysis
Abstract: Physical solutions to the widely used Aw-Rascle-Zhang (ARZ) traffic model and the adapted pressure (AP) ARZ model should satisfy the positivity of density, the minimum and maximum principles with respect to the velocity $v$ and other Riemann invariants. Many numerical schemes suffer from instabilities caused by violating these bounds, and the only existing bound-preserving (BP) numerical scheme (for ARZ model) is random, only first-order accurate, and not strictly conservative. This paper introduces arbitrarily high-order provably BP DG schemes for these two models, preserving all the aforementioned bounds except the maximum principle of $v$, which has been rigorously proven to conflict with the consistency and conservation of numerical schemes. Although the maximum principle of $v$ is not directly enforced, we find that the strictly preserved maximum principle of another Riemann invariant $w$ actually enforces an alternative upper bound on $v$. At the core of this work, analyzing and rigorously proving the BP property is a particularly nontrivial task: the Lax-Friedrichs (LF) splitting property, usually expected for hyperbolic conservation laws and employed to construct BP schemes, does not hold for these two models. To overcome this challenge, we formulate a generalized version of the LF splitting property, and prove it via the geometric quasilinearization (GQL) approach [Kailiang Wu and Chi-Wang Shu, SIAM Review, 65: 1031-1073, 2023]. To suppress spurious oscillations in the DG solutions, we employ the oscillation-eliminating (OE) technique, recently proposed in [Manting Peng, Zheng Sun, and Kailiang Wu, Mathematics of Computation, in press], which is based on the solution operator of a novel damping equation. Several numerical examples are included to demonstrate the effectiveness, accuracy, and BP properties of our schemes, with applications to traffic simulations on road networks.
Published: 2024

255. RIS-aided Trajectory Optimization in Layered Urban Air Mobility

Author: Xiong, Kai, Leng, Supeng, Chen, Liyuan, Zhang, Dapei, Huang, Chongwen, and Yuen, Chau
Subjects: Computer Science - Computational Engineering, Finance, and Science
Abstract: Urban Air Mobility (UAM) relies on developing aerospace industries, where safe aviation and efficient communication are critical features of aircraft. However, it is challenging for aircraft to sustain efficient air-ground communication in urban circumstances. Without continuous air-ground communication, aircraft may experience course deviation and safety accidents. To address these problems, a reconfigurable intelligent surface (RIS)-aided trajectory optimization scheme is proposed, enabling efficient air-ground communication and safe aviation in UAM with a layered airspace structure. This paper first devises a dual-plane RIS communication scheme for layered airspace. It fully engages the omnidirectional and directional signal attributes to reduce the transmission delay of the air-ground communication. Based on the dual-plane RIS configuration, we jointly develop the intra- and inter-layer trajectory scheme to optimize communication and safe aviation. In the intra-layer trajectory optimization, we propose a dual-time-scale flight scheme to improve communication capacity and horizontal flight safety. Meanwhile, we propose a safe layer-switching method to ensure collision avoidance during vertical flight in the inter-layer trajectory optimization. The communication load of the proposed scheme can be improved by 40% and the time of safe separation restoration can be reduced by 66% compared with the benchmarks in the layered airspace., Comment: 15 pages, 13 figures
Published: 2024

256. MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios

Author: Ruan, Jiacheng, Yuan, Wenzhen, Lin, Zehao, Liao, Ning, Li, Zhiyu, Xiong, Feiyu, Liu, Ting, and Fu, Yuzhuo
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Large visual-language models (LVLMs) have achieved great success in multiple applications. However, they still encounter challenges in complex scenes, especially those involving camouflaged objects. This is primarily due to the lack of samples related to camouflaged scenes in the training dataset. To mitigate this issue, we construct the MM-CamObj dataset for the first time, comprising two subsets: CamObj-Align and CamObj-Instruct. Specifically, CamObj-Align contains 11,363 image-text pairs, and it is designed for VL alignment and injecting rich knowledge of camouflaged scenes into LVLMs. CamObj-Instruct is collected for fine-tuning the LVLMs with improved instruction-following capabilities, and it includes 11,363 images and 68,849 conversations with diverse instructions. Based on the MM-CamObj dataset, we propose the CamObj-Llava, an LVLM specifically designed for addressing tasks in camouflaged scenes. To facilitate our model's effective acquisition of knowledge about camouflaged objects and scenes, we introduce a curriculum learning strategy with six distinct modes. Additionally, we construct the CamObj-Bench to evaluate the existing LVLMs' capabilities of understanding, recognition, localization and count in camouflage scenes. This benchmark includes 600 images and 7 tasks, with a total of 9,449 questions. Extensive experiments are conducted on the CamObj-Bench with CamObj-Llava, 8 existing open-source and 3 closed-source LVLMs. Surprisingly, the results indicate that our model achieves a 25.84% improvement in 4 out of 7 tasks compared to GPT-4o. Code and datasets will be available at https://github.com/JCruan519/MM-CamObj., Comment: 9 pages, 5 figures. Work in progress
Published: 2024

257. Preference-Guided Refactored Tuning for Retrieval Augmented Code Generation

Author: Gao, Xinyu, Xiong, Yun, Wang, Deze, Guan, Zhenhan, Shi, Zejian, Wang, Haofen, and Li, Shanshan
Subjects: Computer Science - Software Engineering
Abstract: Retrieval-augmented code generation utilizes Large Language Models as the generator and significantly expands their code generation capabilities by providing relevant code, documentation, and more via the retriever. The current approach suffers from two primary limitations: 1) information redundancy. The indiscriminate inclusion of redundant information can result in resource wastage and may misguide generators, affecting their effectiveness and efficiency. 2) preference gap. Due to different optimization objectives, the retriever strives to procure code with higher ground truth similarity, yet this effort does not substantially benefit the generator. The retriever and the generator may prefer different golden code, and this gap in preference results in a suboptimal design. Additionally, differences in parameterization knowledge acquired during pre-training result in varying preferences among different generators. To address these limitations, in this paper, we propose RRG (Retrieve, Refactor, Generate), a novel framework for effective and efficient code generation. This framework introduces a code refactorer module between the retriever and the generator to bridge them. The refactoring process transforms the raw retrieved code into a more concise, efficient, and model-friendly version. It eliminates redundant information and noise, reducing the input length. Consequently, the generator receives higher-quality context, enabling it to produce more accurate results with lower inference costs. We conducted comprehensive experiments on multiple datasets. In the experiments, we confirmed the existence of a preference gap between the retriever and the generator, and RRG effectively bridges this gap. Specifically, RRG achieved significant performance improvements, with increases of up to 28% on EM, 13% on BLEU, and 6.8% on CodeBLEU., Comment: ASE2024
Published: 2024

258. Supervised Fine-Tuning Achieve Rapid Task Adaption Via Alternating Attention Head Activation Patterns

Author: Zhao, Yang, Du, Li, Ding, Xiao, Xiong, Kai, Liu, Ting, and Qin, Bing
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: LLMs' performance on complex tasks is still unsatisfactory. A key issue is that presently LLMs learn in a data-driven schema, while the instructions about these complex tasks are both scarce and hard to collect or construct. On the contrary, a prominent phenomenon is that LLMs can learn rather fast on simpler tasks with adequate prior knowledge captured during the pretraining stage. Thus, if the prerequisite and mechanism of such rapid generalization could be elucidated, it could enhance the efficiency and effectiveness of the LLM's ability to learn complex tasks. Thus, in this paper, we employ a gradient-based method to dissect how the SFT process adapts LLMs to downstream tasks from the perspective of attention patterns. We find that: (1) LLMs selectively activate task-specific attention heads during SFT; (2) activation patterns for complex tasks are combinations of basic task patterns; and (3) changes in a few parameters can significantly impact activation patterns after SFT on a small number of samples. Based on these insights, experiments are conducted to actually enhance the efficiency and effectiveness of SFT., Comment: in review
Published: 2024

259. Disentangled Generation and Aggregation for Robust Radiance Fields

Author: Shen, Shihe, Gao, Huachen, Xu, Wangze, Peng, Rui, Tang, Luyang, Xiong, Kaiqiang, Jiao, Jianbo, and Wang, Ronggang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
Abstract: The utilization of the triplane-based radiance fields has gained attention in recent years due to its ability to effectively disentangle 3D scenes with a high-quality representation and low computation cost. A key requirement of this method is the precise input of camera poses. However, due to the local update property of the triplane, a similar joint estimation as previous joint pose-NeRF optimization works easily results in local minima. To this end, we propose the Disentangled Triplane Generation module to introduce global feature context and smoothness into triplane learning, which mitigates errors caused by local updating. Then, we propose the Disentangled Plane Aggregation to mitigate the entanglement caused by the common triplane feature aggregation during camera pose updating. In addition, we introduce a two-stage warm-start training strategy to reduce the implicit constraints caused by the triplane generator. Quantitative and qualitative results demonstrate that our proposed method achieves state-of-the-art performance in novel view synthesis with noisy or unknown camera poses, as well as efficient convergence of optimization. Project page: https://gaohchen.github.io/DiGARR/., Comment: 27 pages, 11 figures, Accepted by ECCV'2024
Published: 2024

260. Generative Pre-trained Ranking Model with Over-parameterization at Web-Scale (Extended Abstract)

Author: Li, Yuchen, Xiong, Haoyi, Kong, Linghe, Bian, Jiang, Wang, Shuaiqiang, Chen, Guihai, and Yin, Dawei
Subjects: Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Learning to rank (LTR) is widely employed in web searches to prioritize pertinent webpages from retrieved content based on input queries. However, traditional LTR models encounter two principal obstacles that lead to suboptimal performance: (1) the lack of well-annotated query-webpage pairs with ranking scores covering a diverse range of search query popularities, which hampers their ability to address queries across the popularity spectrum, and (2) inadequately trained models that fail to induce generalized representations for LTR, resulting in overfitting. To address these challenges, we propose a Generative Semi-Supervised Pre-trained (GS2P) LTR model. We conduct extensive offline experiments on both a publicly available dataset and a real-world dataset collected from a large-scale search engine. Furthermore, we deploy GS2P in a large-scale web search engine with realistic traffic, where we observe significant improvements in the real-world application.
Published: 2024

261. Pre-trained Graphformer-based Ranking at Web-scale Search (Extended Abstract)

Author: Li, Yuchen, Xiong, Haoyi, Kong, Linghe, Sun, Zeyi, Chen, Hongyang, Wang, Shuaiqiang, and Yin, Dawei
Subjects: Computer Science - Machine Learning, Computer Science - Information Retrieval
Abstract: Both Transformer and Graph Neural Networks (GNNs) have been employed in the domain of learning to rank (LTR). However, these approaches adhere to two distinct yet complementary problem formulations: ranking score regression based on query-webpage pairs, and link prediction within query-webpage bipartite graphs, respectively. While it is possible to pre-train GNNs or Transformers on source datasets and subsequently fine-tune them on sparsely annotated LTR datasets, the distributional shifts between the pair-based and bipartite graph domains present significant challenges in integrating these heterogeneous models into a unified LTR framework at web scale. To address this, we introduce the novel MPGraf model, which leverages a modular and capsule-based pre-training strategy, aiming to cohesively integrate the regression capabilities of Transformers with the link prediction strengths of GNNs. We conduct extensive offline and online experiments to rigorously evaluate the performance of MPGraf.
Published: 2024

262. Adaptive Learning via a Negative Selection Strategy for Few-Shot Bioacoustic Event Detection

Author: Chen, Yaxiong, Zhang, Xueping, Zi, Yunfei, and Xiong, Shengwu
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Although the Prototypical Network (ProtoNet) has demonstrated effectiveness in few-shot biological event detection, two persistent issues remain. Firstly, there is difficulty in constructing a representative negative prototype due to the absence of explicitly annotated negative samples. Secondly, the durations of the target biological vocalisations vary across tasks, making it challenging for the model to consistently yield optimal results across all tasks. To address these issues, we propose a novel adaptive learning framework with an adaptive learning loss to guide classifier updates. Additionally, we propose a negative selection strategy to construct a more representative negative prototype for ProtoNet. All experiments were performed on the DCASE 2023 TASK5 few-shot bioacoustic event detection dataset. The results show that our proposed method achieves an F-measure of 0.703, an improvement of 12.84%.
Published: 2024

263. Toward Mixture-of-Experts Enabled Trustworthy Semantic Communication for 6G Networks

Author: He, Jiayi, Luo, Xiaofeng, Kang, Jiawen, Du, Hongyang, Xiong, Zehui, Chen, Ci, Niyato, Dusit, and Shen, Xuemin
Subjects: Computer Science - Networking and Internet Architecture, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security
Abstract: Semantic Communication (SemCom) plays a pivotal role in 6G networks, offering a viable solution for future efficient communication. Deep Learning (DL)-based semantic codecs further enhance this efficiency. However, the vulnerability of DL models to security threats, such as adversarial attacks, poses significant challenges for practical applications of SemCom systems. These vulnerabilities enable attackers to tamper with messages and eavesdrop on private information, especially in wireless communication scenarios. Although existing defenses attempt to address specific threats, they often fail to simultaneously handle multiple heterogeneous attacks. To overcome this limitation, we introduce a novel Mixture-of-Experts (MoE)-based SemCom system. This system comprises a gating network and multiple experts, each specializing in different security challenges. The gating network adaptively selects suitable experts to counter heterogeneous attacks based on user-defined security requirements. Multiple experts collaborate to accomplish semantic communication tasks while meeting the security requirements of users. A case study in vehicular networks demonstrates the efficacy of the MoE-based SemCom system. Simulation results show that the proposed MoE-based SemCom system effectively mitigates concurrent heterogeneous attacks, with minimal impact on downstream task accuracy., Comment: 8 pages, 3 figures
Published: 2024

264. Safe Navigation for Robotic Digestive Endoscopy via Human Intervention-based Reinforcement Learning

Author: Tan, Min, Tao, Yushun, Zheng, Boyun, Xie, GaoSheng, Feng, Lijuan, Xia, Zeyang, and Xiong, Jing
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: With the increasing application of automated robotic digestive endoscopy (RDE), ensuring safe and efficient navigation in the unstructured and narrow digestive tract has become a critical challenge. Existing automated reinforcement learning navigation algorithms often result in potentially risky collisions due to the absence of essential human intervention, which significantly limits the safety and effectiveness of RDE in actual clinical practice. To address this limitation, we propose a Human Intervention (HI)-based Proximal Policy Optimization (PPO) framework, dubbed HI-PPO, which incorporates expert knowledge to enhance RDE's safety. Specifically, we introduce an Enhanced Exploration Mechanism (EEM) to address the low exploration efficiency of the standard PPO. Additionally, a reward-penalty adjustment (RPA) is implemented to penalize unsafe actions during initial interventions. Furthermore, Behavior Cloning Similarity (BCS) is included as an auxiliary objective to ensure the agent emulates expert actions. Comparative experiments conducted in a simulated platform across various anatomical colon segments demonstrate that our model effectively and safely guides RDE.
Published: 2024

265. Generative AI-Enhanced Multi-Modal Semantic Communication in Internet of Vehicles: System Design and Methodologies

Author: Lu, Jiayi, Yang, Wanting, Xiong, Zehui, Xing, Chengwen, Tafazolli, Rahim, Quek, Tony Q. S., and Debbah, Merouane
Subjects: Computer Science - Networking and Internet Architecture
Abstract: Vehicle-to-everything (V2X) communication supports numerous tasks, from driving safety to entertainment services. To achieve a holistic view, vehicles are typically equipped with multiple sensors to compensate for undetectable blind spots. However, processing large volumes of multi-modal data increases transmission load, while the dynamic nature of vehicular networks adds to transmission instability. To address these challenges, we propose a novel framework, Generative Artificial intelligence (GAI)-enhanced multi-modal semantic communication (SemCom), referred to as G-MSC, designed to handle various vehicular network tasks by employing suitable analog or digital transmission. GAI presents a promising opportunity to transform the SemCom framework by significantly enhancing semantic encoding to facilitate the optimized integration of multi-modal information, enhancing channel robustness, and fortifying semantic decoding against noise interference. To validate the effectiveness of the G-MSC framework, we conduct a case study showcasing its performance in vehicular communication networks for predictive tasks. The experimental results show that the design achieves reliable and efficient communication in V2X networks. In the end, we present future research directions on G-MSC.
Published: 2024

266. Direct Judgement Preference Optimization

Author: Wang, Peifeng, Xu, Austin, Zhou, Yilun, Xiong, Caiming, and Joty, Shafiq
Subjects: Computer Science - Computation and Language
Abstract: Auto-evaluation is crucial for assessing response quality and offering feedback for model development. Recent studies have explored training large language models (LLMs) as generative judges to evaluate and critique other models' outputs. In this work, we investigate the idea of learning from both positive and negative data with preference optimization to enhance the evaluation capabilities of LLM judges across an array of different use cases. We achieve this by employing three approaches to collect the preference pairs for different use cases, each aimed at improving our generative judge from a different perspective. Our comprehensive study over a wide range of benchmarks demonstrates the effectiveness of our method. In particular, our generative judge achieves the best performance on 10 out of 13 benchmarks, outperforming strong baselines like GPT-4o and specialized judge models. Further analysis shows that our judge model robustly counters inherent biases such as position and length bias, flexibly adapts to any evaluation protocol specified by practitioners, and provides helpful language feedback for improving downstream generator models., Comment: Preprint
Published: 2024

267. Accelerated Multi-Contrast MRI Reconstruction via Frequency and Spatial Mutual Learning

Author: Chen, Qi, Xing, Xiaohan, Chen, Zhen, and Xiong, Zhiwei
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: To accelerate Magnetic Resonance (MR) imaging procedures, Multi-Contrast MR Reconstruction (MCMR) has become a prevalent trend that utilizes an easily obtainable modality as an auxiliary to support high-quality reconstruction of the target modality with under-sampled k-space measurements. The exploration of global dependency and complementary information across different modalities is essential for MCMR. However, existing methods either struggle to capture global dependency due to the limited receptive field or suffer from quadratic computational complexity. To tackle this dilemma, we propose a novel Frequency and Spatial Mutual Learning Network (FSMNet), which efficiently explores global dependencies across different modalities. Specifically, the features for each modality are extracted by the Frequency-Spatial Feature Extraction (FSFE) module, featuring a frequency branch and a spatial branch. Benefiting from the global property of the Fourier transform, the frequency branch can efficiently capture global dependency with an image-size receptive field, while the spatial branch can extract local features. To exploit complementary information from the auxiliary modality, we propose a Cross-Modal Selective fusion (CMS-fusion) module that selectively incorporates the frequency and spatial features from the auxiliary modality to enhance the corresponding branch of the target modality. To further integrate the enhanced global features from the frequency branch and the enhanced local features from the spatial branch, we develop a Frequency-Spatial fusion (FS-fusion) module, resulting in a comprehensive feature representation for the target modality. Extensive experiments on the BraTS and fastMRI datasets demonstrate that the proposed FSMNet achieves state-of-the-art performance for the MCMR task with different acceleration factors. The code is available at: https://github.com/qic999/FSMNet., Comment: Accepted as a poster by Medical Image Computing and Computer Assisted Intervention (MICCAI) 2024
Published: 2024

268. IMOST: Incremental Memory Mechanism with Online Self-Supervision for Continual Traversability Learning

Author: Ma, Kehui, Sun, Zhen, Xiong, Chaoran, Zhu, Qiumin, Wang, Kewei, and Pei, Ling
Subjects: Computer Science - Robotics
Abstract: Traversability estimation is the foundation of path planning for a general navigation system. However, complex and dynamic environments pose challenges for the latest methods using self-supervised learning (SSL) techniques. Firstly, existing SSL-based methods generate sparse annotations lacking detailed boundary information. Secondly, their strategies focus on hard samples for rapid adaptation, leading to forgetting and biased predictions. In this work, we propose IMOST, a continual traversability learning framework composed of two key modules: incremental dynamic memory (IDM) and self-supervised annotation (SSA). By mimicking human memory mechanisms, IDM allocates novel data samples to new clusters according to an information expansion criterion. It also updates clusters based on a diversity rule, ensuring a representative characterization of each new scene. This mechanism enhances scene-aware knowledge diversity while maintaining a compact memory capacity. The SSA module, integrating FastSAM, utilizes point prompts to generate complete annotations in real time, which reduces training complexity. Furthermore, IMOST has been successfully deployed on a quadruped robot, with performance evaluated during the online learning process. Experimental results on both public and self-collected datasets demonstrate that IMOST outperforms current state-of-the-art methods and maintains robust recognition capabilities and adaptability across various scenarios. The code is available at https://github.com/SJTU-MKH/OCLTrav.
Published: 2024

269. Generalizable Non-Line-of-Sight Imaging with Learnable Physical Priors

Author: Sun, Shida, Li, Yue, Zhang, Yueyi, and Xiong, Zhiwei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Non-line-of-sight (NLOS) imaging, recovering the hidden volume from indirect reflections, has attracted increasing attention due to its potential applications. Despite promising results, existing NLOS reconstruction approaches are constrained by the reliance on empirical physical priors, e.g., single fixed path compensation. Moreover, these approaches still possess limited generalization ability, particularly when dealing with scenes at a low signal-to-noise ratio (SNR). To overcome the above problems, we introduce a novel learning-based solution, comprising two key designs: Learnable Path Compensation (LPC) and Adaptive Phasor Field (APF). The LPC applies tailored path compensation coefficients to adapt to different objects in the scene, effectively reducing light wave attenuation, especially in distant regions. Meanwhile, the APF learns the precise Gaussian window of the illumination function for the phasor field, dynamically selecting the relevant spectrum band of the transient measurement. Experimental validations demonstrate that our proposed approach, only trained on synthetic data, exhibits the capability to seamlessly generalize across various real-world datasets captured by different imaging systems and characterized by low SNRs.
Published: 2024

270. Superconvergence of the local discontinuous Galerkin method with generalized numerical fluxes for one-dimensional linear time-dependent fourth-order equations

Author: Li, Linhui, Meng, Xiong, and Wu, Boying
Subjects: Mathematics - Numerical Analysis, 65M60
Abstract: In this paper, we concentrate on the superconvergence of the local discontinuous Galerkin method with generalized numerical fluxes for one-dimensional linear time-dependent fourth-order equations. The adjustable numerical viscosity of the generalized numerical fluxes is beneficial for long time simulations with a slower error growth. By using generalized Gauss--Radau projections and correction functions together with a suitable numerical initial condition, we derive, for polynomials of degree $k$, $(2k+1)$th order superconvergence for the numerical flux and cell averages, $(k+2)$th order superconvergence at generalized Radau points, and $(k+1)$th order for error derivative at generalized Radau points. Moreover, a supercloseness result of order $(k+2)$ is established between the generalized Gauss--Radau projection and the numerical solution. Superconvergence analysis of mixed boundary conditions is also given. Equations with discontinuous initial condition and nonlinear convection term are numerically investigated, illustrating that the conclusions are valid for more general cases., Comment: 22 pages, 1 figure
Published: 2024

271. Enhancing Large Language Models with Domain-specific Retrieval Augment Generation: A Case Study on Long-form Consumer Health Question Answering in Ophthalmology

Author: Gilson, Aidan, Ai, Xuguang, Arunachalam, Thilaka, Chen, Ziyou, Cheong, Ki Xiong, Dave, Amisha, Duic, Cameron, Kibe, Mercy, Kaminaka, Annette, Prasad, Minali, Siddig, Fares, Singer, Maxwell, Wong, Wendy, Jin, Qiao, Keenan, Tiarnan D. L., Hu, Xia, Chew, Emily Y., Lu, Zhiyong, Xu, Hua, Adelman, Ron A., Tham, Yih-Chung, and Chen, Qingyu
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Despite the potential of Large Language Models (LLMs) in medicine, they may generate responses lacking supporting evidence or based on hallucinated evidence. While Retrieval Augment Generation (RAG) is a popular way to address this issue, few studies have implemented and evaluated RAG in downstream domain-specific applications. We developed a RAG pipeline with 70,000 ophthalmology-specific documents that retrieves relevant documents to augment LLMs at inference time. In a case study on long-form consumer health questions, we systematically evaluated the responses, including over 500 references, of LLMs with and without RAG on 100 questions with 10 healthcare professionals. The evaluation focuses on factuality of evidence, selection and ranking of evidence, attribution of evidence, and answer accuracy and completeness. LLMs without RAG provided 252 references in total. Of these, 45.3% were hallucinated, 34.1% contained minor errors, and 20.6% were correct. In contrast, LLMs with RAG significantly improved accuracy (54.5% being correct) and reduced error rates (18.8% with minor hallucinations and 26.7% with errors). 62.5% of the top 10 documents retrieved by RAG were selected as the top references in the LLM response, with an average ranking of 4.9. The use of RAG also improved evidence attribution (increasing from 1.85 to 2.49 on a 5-point scale, P<0.001), albeit with slight decreases in accuracy (from 3.52 to 3.23, P=0.03) and completeness (from 3.47 to 3.27, P=0.17). The results demonstrate that LLMs frequently exhibited hallucinated and erroneous evidence in the responses, raising concerns for downstream applications in the medical domain. RAG substantially reduced the proportion of such evidence but encountered challenges.
Published: 2024

272. Generation of strong mechanical squeezing through the joint effect of two-tone driving and parametric pumping

Author: Wu, Xiao-Jie, Cheng, Huan-Huan, Wu, Qiannan, Bai, Cheng-Hua, and Wu, Shao-Xiong
Subjects: Quantum Physics
Abstract: We propose an innovative scheme to efficiently prepare strong mechanical squeezing by utilizing the synergistic mechanism of two-tone driving and parametric pumping in an optomechanical system. By reasonably choosing the system parameters, the proposal highlights the following prominent advantages: the squeezing effect of the cavity field induced by the optical parametric amplifier can be transferred to the mechanical oscillator, which has been squeezed by the two-tone driving, and the degree of squeezing of the mechanical oscillator will surpass that obtained by any single mechanism; the joint mechanism can enhance the degree of squeezing significantly and break the 3 dB mechanical squeezing limit, which is particularly evident in the range where the red/blue-detuned ratio is sub-optimal; the mechanical squeezing achieved through this distinctive joint mechanism exhibits notable robustness against both thermal noise and decay of the mechanical oscillator. Our project offers a versatile and efficient approach for generating strong mechanical squeezing across a wide range of conditions.
Published: 2024

273. Double-Helix Singularity and Vortex-Antivortex Annihilation in Space-Time Helical Pulses

Author: Shi, Shuai, Wang, Ren, Xiong, Minhui, Zhou, Qinyu, Wang, Bing-Zhong, and Shen, Yijie
Subjects: Physics - Optics
Abstract: Topological structures reveal the hidden secrets and beauty in nature, such as the double helix in DNA, whilst the manipulation of such structures in physical fields, especially in ultrafast structured light, draws booming attention. Here we introduce a new family of spatiotemporal light fields, i.e. helical pulses, carrying sophisticated double-helix singularities in their electromagnetic topological structures. The helical pulses were solved from Maxwell's equations as chiral extensions of toroidal light pulses but with controlled angular momentum dependence. We unveil that the double-helix singularities can maintain their topological invariance during propagation and that the field exhibits paired generation and annihilation of vortices and antivortices in ultrafast space-time, making them potential information carriers beating previous conventional vortex structured light.
Published: 2024

274. Generative Learning Powered Probing Beam Optimization for Cell-Free Hybrid Beamforming

Author: Zhang, Cheng, Xiong, Shuangbo, He, Mengqing, Wei, Lan, Huang, Yongming, and Zhang, Wei
Subjects: Computer Science - Information Theory, Electrical Engineering and Systems Science - Signal Processing
Abstract: Probing beam measurement (PBM)-based hybrid beamforming provides a feasible solution for cell-free MIMO. In this letter, we propose a novel probing beam optimization framework where three collaborative modules respectively realize PBM augmentation, sum-rate prediction and probing beam optimization. Specifically, the PBM augmentation model integrates the conditional variational auto-encoder (CVAE) and mixture density networks and adopts correlated PBM distribution with full-covariance, for which a Cholesky-decomposition based training is introduced to address the issues of covariance legality and numerical stability. Simulations verify the better performance of the proposed augmentation model compared to the traditional CVAE and the efficiency of proposed optimization framework.
Published: 2024
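The abstract of entry 274 mentions Cholesky-decomposition based training as a way to keep a full covariance matrix legal and numerically stable. The paper's own formulation is not given in this listing, so the following is only a minimal sketch of the generic technique: unconstrained numbers are mapped to a lower-triangular factor with a positive diagonal, and the covariance is formed as the factor times its transpose. The function names, the use of `exp` on the diagonal, and the sample parameter values are all assumptions made for illustration.

```python
import math


def build_cholesky_factor(raw, dim):
    """Map dim*(dim+1)//2 unconstrained numbers to a lower-triangular
    matrix whose diagonal is strictly positive (hypothetical helper)."""
    if len(raw) != dim * (dim + 1) // 2:
        raise ValueError("expected dim*(dim+1)/2 parameters")
    factor = [[0.0] * dim for _ in range(dim)]
    idx = 0
    for i in range(dim):
        for j in range(i + 1):
            value = raw[idx]
            idx += 1
            # exp() keeps diagonal entries positive; off-diagonals stay free.
            factor[i][j] = math.exp(value) if i == j else value
    return factor


def covariance_from_factor(factor):
    """Return Sigma = L L^T as a nested list; symmetric and positive
    definite by construction when the diagonal of L is positive."""
    dim = len(factor)
    return [
        [sum(factor[i][k] * factor[j][k] for k in range(dim)) for j in range(dim)]
        for i in range(dim)
    ]


def log_det_covariance(factor):
    """log det(Sigma) = 2 * sum(log L_ii), computed without forming det."""
    return 2.0 * sum(math.log(factor[i][i]) for i in range(len(factor)))


# Hypothetical raw outputs of a network head for a 3-dimensional covariance.
raw_params = [0.1, -0.5, 0.3, 0.2, 0.7, -0.4]
chol = build_cholesky_factor(raw_params, 3)
sigma = covariance_from_factor(chol)
```

Because the log-determinant is read off the diagonal of the factor, a Gaussian log-likelihood can be evaluated without inverting or factorizing the covariance at every step, which is one common reason this parametrization is chosen.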

275. Four-fold truncated double-nested anti-resonant hollow-core fibers with ultralow loss and ultrahigh mode purity

Author: Gao, Shoufei, Chen, Hao, Sun, Yizhi, Xiong, Yifan, Yang, Zijie, Zhao, Rui, Ding, Wei, and Wang, Yingying
Subjects: Physics - Optics
Abstract: Hollow-core fibers (HCFs) are inherently multimode, making it crucial to filter out higher-order modes (HOMs) within the shortest possible fiber length for applications such as high-speed coherent communications and fiber-optic gyroscopes. However, current HCF designs face the challenge of simultaneously achieving ultralow fundamental mode (FM) loss and ultrahigh HOM suppression. In this study, we present a novel four-fold truncated double-nested anti-resonant hollow-core fiber (4T-DNANF) structure that addresses this challenge. Our 4T-DNANF enables greater control over phase-matching between core modes and air modes in the cladding, allowing for minimized FM loss and substantially increased HOM loss. Experimentally, we fabricated several HCFs: one with an FM loss of 0.1 dB/km and an HOM loss of 430 dB/km, and another with an FM loss of 0.13 dB/km and an HOM loss of 6500 dB/km, resulting in a higher-order mode extinction ratio of 50,000., Comment: 8 pages, 2 figures
Published: 2024
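The loss figures quoted in the abstract of entry 275 can be checked with a short calculation. The sketch below assumes that the higher-order mode extinction ratio is simply the HOM loss divided by the FM loss (both in dB/km). The listing does not state this definition, but it reproduces the quoted value of 50,000 for the second fiber. The labels "fiber A" and "fiber B" are hypothetical names for the two fabricated fibers.

```python
def extinction_ratio(hom_loss_db_per_km, fm_loss_db_per_km):
    """Ratio of higher-order-mode loss to fundamental-mode loss
    (assumed definition, not confirmed by the listing)."""
    return hom_loss_db_per_km / fm_loss_db_per_km


# (FM loss, HOM loss) in dB/km, taken from the abstract.
fibers = {
    "fiber A": (0.1, 430.0),
    "fiber B": (0.13, 6500.0),
}

ratios = {
    name: round(extinction_ratio(hom, fm)) for name, (fm, hom) in fibers.items()
}

for name, ratio in ratios.items():
    print(f"{name}: HOM/FM loss ratio = {ratio}")
```

Under this assumed definition the first fiber has a ratio of 4300 and the second a ratio of 50000, so the 50,000 figure in the abstract appears to refer to the second fiber.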

276. RRM: Robust Reward Model Training Mitigates Reward Hacking

Author: Liu, Tianqi, Xiong, Wei, Ren, Jie, Chen, Lichang, Wu, Junru, Joshi, Rishabh, Gao, Yang, Shen, Jiaming, Qin, Zhen, Yu, Tianhe, Sohn, Daniel, Makarova, Anastasiia, Liu, Jeremiah, Liu, Yuan, Piot, Bilal, Ittycheriah, Abe, Kumar, Aviral, and Saleh, Mohammad
Subjects: Computer Science - Computation and Language
Abstract: Reward models (RMs) play a pivotal role in aligning large language models (LLMs) with human preferences. However, traditional RM training, which relies on response pairs tied to specific prompts, struggles to disentangle prompt-driven preferences from prompt-independent artifacts, such as response length and format. In this work, we expose a fundamental limitation of current RM training methods, where RMs fail to effectively distinguish between contextual signals and irrelevant artifacts when determining preferences. To address this, we introduce a causal framework that learns preferences independent of these artifacts and propose a novel data augmentation technique designed to eliminate them. Extensive experiments show that our approach successfully filters out undesirable artifacts, yielding a more robust reward model (RRM). Our RRM improves the performance of a pairwise reward model trained on Gemma-2-9b-it, on RewardBench, increasing accuracy from 80.61% to 84.15%. Additionally, we train two DPO policies using both the RM and RRM, demonstrating that the RRM significantly enhances DPO-aligned policies, improving MT-Bench scores from 7.27 to 8.31 and length-controlled win-rates in AlpacaEval-2 from 33.46% to 52.49%.
Published: 2024

277. Breaking the Barriers of One-to-One Usage of Implicit Neural Representation in Image Compression: A Linear Combination Approach with Performance Guarantees

Author: Sanjeet, Sai, Hosseinalipour, Seyyedali, Xiong, Jinjun, Fujita, Masahiro, and Sahoo, Bibhu Datta
Subjects: Electrical Engineering and Systems Science - Image and Video Processing
Abstract: In an era where the exponential growth of image data driven by the Internet of Things (IoT) is outpacing traditional storage solutions, this work explores and advances the potential of Implicit Neural Representation (INR) as a transformative approach to image compression. INR leverages the function approximation capabilities of neural networks to represent various types of data. While previous research has employed INR to achieve compression by training small networks to reconstruct large images, this work proposes a novel advancement: representing multiple images with a single network. By modifying the loss function during training, the proposed approach allows a small number of weights to represent a large number of images, even those significantly different from each other. A thorough analytical study of the convergence of this new training method is also carried out, establishing upper bounds that not only confirm the validity of the method but also offer insights into optimal hyperparameter design. The proposed method is evaluated on the Kodak, ImageNet, and CIFAR-10 datasets. Experimental results demonstrate that all 24 images in the Kodak dataset can be represented by linear combinations of two sets of weights, achieving a peak signal-to-noise ratio (PSNR) of 26.5 dB with as low as 0.2 bits per pixel (BPP). The proposed method matches the rate-distortion performance of state-of-the-art image codecs, such as BPG, on the CIFAR-10 dataset. Additionally, the proposed method maintains the fundamental properties of INR, such as arbitrary resolution reconstruction of images., Comment: 10 pages, 13 figures
Published: 2024

278. LLM Surgery: Efficient Knowledge Unlearning and Editing in Large Language Models

Author: Veldanda, Akshaj Kumar, Zhang, Shi-Xiong, Das, Anirban, Chakraborty, Supriyo, Rawls, Stephen, Sahu, Sambit, and Naphade, Milind
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Large language models (LLMs) have revolutionized various domains, yet their utility comes with significant challenges related to outdated or problematic knowledge embedded during pretraining. This paper addresses the challenge of modifying LLMs to unlearn problematic and outdated information while efficiently integrating new knowledge without retraining from scratch. Here, we propose LLM Surgery, a framework to efficiently modify LLM behaviour by optimizing a three-component objective function that: (1) Performs reverse gradient on the unlearning dataset (problematic and outdated information), (2) Performs gradient descent on the update dataset (new and updated information), and (3) Minimizes the KL divergence on the retain dataset (small subset of unchanged text), ensuring alignment between pretrained and modified model outputs. Due to the lack of publicly available datasets specifically tailored for our novel task, we compiled a new dataset and an evaluation benchmark. Using Llama2-7B, we demonstrate that LLM Surgery can achieve significant forgetting on the unlearn set, a 20% increase in accuracy on the update set, and maintain performance on the retain set.
Published: 2024
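The three-component objective described in the abstract of entry 278 can be illustrated on toy probability vectors. The sketch below is not the authors' implementation: it assumes that "reverse gradient" on the unlearning data can be expressed as minimizing the negative of the usual cross-entropy loss, and the weights `w_unlearn`, `w_update`, and `w_retain`, the function names, and all sample probabilities are hypothetical.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the target token."""
    return -math.log(probs[target_index])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def surgery_objective(unlearn_ce, update_ce, retain_kl,
                      w_unlearn=1.0, w_update=1.0, w_retain=1.0):
    """Combine the three terms into one scalar to minimize.

    The unlearn term enters with a minus sign, so minimizing the total
    pushes the loss on the unlearning data upward (assumed reading of
    "reverse gradient"); the weights are illustrative assumptions.
    """
    return -w_unlearn * unlearn_ce + w_update * update_ce + w_retain * retain_kl


# Toy next-token distributions over a 3-token vocabulary (made up).
modified_on_unlearn = [0.7, 0.2, 0.1]     # model still predicts the old fact
modified_on_update = [0.2, 0.6, 0.2]      # model partly learned the new fact
pretrained_on_retain = [0.5, 0.3, 0.2]
modified_on_retain = [0.45, 0.35, 0.2]

total = surgery_objective(
    unlearn_ce=cross_entropy(modified_on_unlearn, 0),
    update_ce=cross_entropy(modified_on_update, 1),
    retain_kl=kl_divergence(pretrained_on_retain, modified_on_retain),
)
print(f"toy objective value: {total:.4f}")
```

In a real training loop each term would be averaged over a batch from the corresponding dataset, and an unbounded negative term is typically something to watch for, which is one reason the relative weights matter.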

279. LI-GS: Gaussian Splatting with LiDAR Incorporated for Accurate Large-Scale Reconstruction

Author: Jiang, Changjian, Gao, Ruilan, Shao, Kele, Wang, Yue, Xiong, Rong, and Zhang, Yu
Subjects: Computer Science - Robotics
Abstract: Large-scale 3D reconstruction is critical in the field of robotics, and the potential of 3D Gaussian Splatting (3DGS) for achieving accurate object-level reconstruction has been demonstrated. However, ensuring geometric accuracy in outdoor and unbounded scenes remains a significant challenge. This study introduces LI-GS, a reconstruction system that incorporates LiDAR and Gaussian Splatting to enhance geometric accuracy in large-scale scenes. 2D Gaussian surfels are employed as the map representation to enhance surface alignment. Additionally, a novel modeling method is proposed to convert LiDAR point clouds to plane-constrained multimodal Gaussian Mixture Models (GMMs). The GMMs are utilized during both initialization and optimization stages to ensure sufficient and continuous supervision over the entire scene while mitigating the risk of over-fitting. Furthermore, GMMs are employed in mesh extraction to eliminate artifacts and improve the overall geometric quality. Experiments demonstrate that our method outperforms state-of-the-art methods in large-scale 3D reconstruction, achieving higher accuracy compared to both LiDAR-based methods and Gaussian-based methods with improvements of 52.6% and 68.7%, respectively.
Published: 2024

280. Bridging the Gap: GRB 230812B -- A Three-Second Supernova-Associated Burst Detected by the GRID Mission

Author: Wang, Chen-Yu, Yin, Yi-Han Iris, Zhang, Bin-Bin, Feng, Hua, Zeng, Ming, Xiong, Shao-Lin, Pan, Xiao-Fan, Yang, Jun, Zhang, Yan-Qiu, Li, Chen, Yan, Zhen-Yu, Wang, Chen-Wei, Zheng, Xu-Tao, Liu, Jia-Cong, Wang, Qi-Dong, Yang, Zi-Rui, Li, Long-Hao, Liu, Qi-Ze, Zhao, Zheng-Yang, Hu, Bo, Liu, Yi-Qi, Lu, Si-Yuan, Luo, Zi-You, Cang, Ji-Rong, Cao, De-Zhi, Han, Wen-Tao, Jia, Li-Ping, Pan, Xing-Yu, Tian, Yang, Xu, Ben-Da, Yang, Xiao, and Zeng, Zhi
Subjects: Astrophysics - High Energy Astrophysical Phenomena
Abstract: GRB 230812B, detected by the Gamma-Ray Integrated Detectors (GRID) constellation mission, is an exceptionally bright gamma-ray burst (GRB) with a duration of only 3 seconds. Sitting near the traditional boundary ($\sim$2 s) between long and short GRBs, GRB 230812B is notably associated with a supernova (SN), indicating a massive star progenitor. This makes it a rare example of a short-duration GRB resulting from stellar collapse. Our analysis, using a time-evolving synchrotron model, suggests that the burst has an emission radius of approximately $10^{14.5}$ cm. We propose that the short duration of GRB 230812B is due to the combined effects of the central engine's activity time and the time required for the jet to break through the stellar envelope. Our findings provide another case that challenges the conventional view that short-duration GRBs originate exclusively from compact object mergers, demonstrating that a broader range of durations exists for GRBs arising from the collapse of massive stars., Comment: 10 pages, 3 tables, 11 figures
Published: 2024

281. Bayer-type Vis-NIR Routing via Inverse Design for Submicron-pixel Image Sensing Chip

Author: Yang, Xianguang, Xiong, Shijie, Tan, Fangchang, Lin, Zhitao, Bao, Yanjun, Wen, Long, Chen, Qin, and Li, Baojun
Subjects: Physics - Optics, Physics - Applied Physics
Abstract: With the advent of high-precision nanoscale lithography technology, high-resolution image sensing has experienced rapid development in recent years. Currently, mainstream commercial image sensors predominantly utilize Bayer array color filters to implement RGB colorful imaging strategies. However, as pixel sizes transition into the submicron dimensions, traditional dye filters used in image sensors have long been hampered by limited optical efficiency, suboptimal signal-to-noise ratios, and significant difficulties in miniaturization. In this work, a novel 4-channel RGB-IR color router for image sensing, distinct from the traditional absorption-transmission mechanisms, was proposed through inverse design methodologies. Utilizing genetic algorithms and DCGAN models, approximately 20,000 random color routing structures were generated and trained. From these, an optimized spectral splitting structure with a minimal periodic size of 1.6 µm × 1.6 µm was identified. This structure achieves peak optical efficiencies 1.7 times greater than those of dye filters, while also offering superior color imaging quality and signal intensity. This innovative design approach, leveraging deep learning integration, demonstrates an on-chip strategy for color realization in 4-channel image sensors, and holds significant promise for enhancing the development of next-generation high-performance image sensing chip systems., Comment: 19 pages, 5 figures
Published: 2024

282. Geometric Relational Embeddings

Author: Xiong, Bo
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Social and Information Networks
Abstract: Relational representation learning transforms relational data into continuous and low-dimensional vector representations. However, vector-based representations fall short in capturing crucial properties of relational data that are complex and symbolic. We propose geometric relational embeddings, a paradigm of relational embeddings that respect the underlying symbolic structures. Specifically, this dissertation introduces various geometric relational embedding models capable of capturing: 1) complex structured patterns like hierarchies and cycles in networks and knowledge graphs; 2) logical structures in ontologies and logical constraints applicable for constraining machine learning model outputs; and 3) high-order structures between entities and relations. Our results obtained from benchmark and real-world datasets demonstrate the efficacy of geometric relational embeddings in adeptly capturing these discrete, symbolic, and structured properties inherent in relational data., Comment: Doctoral Dissertation, 177 pages
Published: 2024

283. UniMSF: A Unified Multi-Sensor Fusion Framework for Intelligent Transportation System Global Localization

Author: Liu, Wei, Zhu, Jiaqi, Zhuo, Guirong, Fu, Wufei, Meng, Zonglin, Lu, Yishi, Hua, Min, Qiao, Feng, Li, You, He, Yi, and Xiong, Lu
Subjects: Computer Science - Robotics
Abstract: Intelligent transportation systems (ITS) localization is of significant importance as it provides fundamental position and orientation for autonomous operations like intelligent vehicles. Integrating diverse and complementary sensors such as global navigation satellite system (GNSS) and 4D-radar can provide scalable and reliable global localization. Nevertheless, multi-sensor fusion encounters challenges including heterogeneity and time-varying uncertainty in measurements. Consequently, developing a reliable and unified multi-sensor framework remains challenging. In this paper, we introduce UniMSF, a comprehensive multi-sensor fusion localization framework for ITS, utilizing factor graphs. By integrating a multi-sensor fusion front-end, alongside outlier detection & noise model estimation, and a factor graph optimization back-end, this framework accomplishes efficient fusion and ensures accurate localization for ITS. Specifically, in the multi-sensor fusion front-end module, we tackle the measurement heterogeneity among different modality sensors and establish effective measurement models. Reliable outlier detection and data-driven online noise estimation methods ensure that back-end optimization is immune to interference from outlier measurements. In addition, integrating multi-sensor observations via factor graph optimization offers the advantage of "plug and play". Notably, our framework features high modularity and is seamlessly adapted to various sensor configurations. We demonstrate the effectiveness of the proposed framework through real vehicle tests by tightly integrating GNSS pseudorange and carrier phase information with IMU and 4D-radar.
Published: 2024

284. From Lists to Emojis: How Format Bias Affects Model Alignment

Author: Zhang, Xuanchang, Xiong, Wei, Chen, Lichang, Zhou, Tianyi, Huang, Heng, and Zhang, Tong
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: In this paper, we study format biases in reinforcement learning from human feedback (RLHF). We observe that many widely-used preference models, including human evaluators, GPT-4, and top-ranking models on the RewardBench benchmark, exhibit strong biases towards specific format patterns, such as lists, links, bold text, and emojis. Furthermore, large language models (LLMs) can exploit these biases to achieve higher rankings on popular benchmarks like AlpacaEval and LMSYS Chatbot Arena. One notable example of this is verbosity bias, where current preference models favor longer responses that appear more comprehensive, even when their quality is equal to or lower than shorter, competing responses. However, format biases beyond verbosity remain largely underexplored in the literature. In this work, we extend the study of biases in preference learning beyond the commonly recognized length bias, offering a comprehensive analysis of a wider range of format biases. Additionally, we show that with a small amount of biased data (less than 1%), we can inject significant bias into the reward model. Moreover, these format biases can also be easily exploited by downstream alignment algorithms, such as best-of-n sampling and online iterative DPO, as it is usually easier to manipulate the format than to improve the quality of responses. Our findings emphasize the need to disentangle format and content both for designing alignment algorithms and evaluating models., Comment: Working in progress
Published: 2024

285. Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey

Author: Winata, Genta Indra, Zhao, Hanyang, Das, Anirban, Tang, Wenpin, Yao, David D., Zhang, Shi-Xiong, and Sahu, Sambit
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Preference tuning is a crucial process for aligning deep generative models with human preferences. This survey offers a thorough overview of recent advancements in preference tuning and the integration of human feedback. The paper is organized into three main sections: 1) introduction and preliminaries: an introduction to reinforcement learning frameworks, preference tuning tasks, models, and datasets across various modalities: language, speech, and vision, as well as different policy approaches, 2) in-depth exploration of each preference tuning approach: a detailed analysis of the methods used in preference tuning, and 3) applications, discussion, and future directions: an exploration of the applications of preference tuning in downstream tasks, including evaluation methods for different modalities, and an outlook on future research directions. Our objective is to present the latest methodologies in preference tuning and model alignment, enhancing the understanding of this field for researchers and practitioners. We hope to encourage further engagement and innovation in this area., Comment: Survey paper
Published: 2024

286. CUNSB-RFIE: Context-aware Unpaired Neural Schrödinger Bridge in Retinal Fundus Image Enhancement

Author: Dong, Xuanzhao, Vasa, Vamsi Krishna, Zhu, Wenhui, Qiu, Peijie, Chen, Xiwen, Su, Yi, Xiong, Yujian, Yang, Zhangsihao, Chen, Yanxi, and Wang, Yalin
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Retinal fundus photography is significant in diagnosing and monitoring retinal diseases. However, systemic imperfections and operator/patient-related factors can hinder the acquisition of high-quality retinal images. Previous efforts in retinal image enhancement primarily relied on GANs, which are limited by the trade-off between training stability and output diversity. In contrast, the Schrödinger Bridge (SB) offers a more stable solution by utilizing Optimal Transport (OT) theory to model a stochastic differential equation (SDE) between two arbitrary distributions. This allows SB to effectively transform low-quality retinal images into their high-quality counterparts. In this work, we leverage the SB framework to propose an image-to-image translation pipeline for retinal image enhancement. Additionally, previous methods often fail to capture fine structural details, such as blood vessels. To address this, we enhance our pipeline by introducing Dynamic Snake Convolution, whose tortuous receptive field can better preserve tubular structures. We name the resulting retinal fundus image enhancement framework the Context-aware Unpaired Neural Schrödinger Bridge (CUNSB-RFIE). To the best of our knowledge, this is the first endeavor to use the SB approach for retinal image enhancement. Experimental results on a large-scale dataset demonstrate the advantage of the proposed method compared to several state-of-the-art supervised and unsupervised methods in terms of image quality and performance on downstream tasks. The code is available at https://github.com/Retinal-Research/CUNSB-RFIE.
Published: 2024

287. P2 Explore: Efficient Exploration in Unknown Clustered Environment with Floor Plan Prediction

Author: Song, Kun, Chen, Gaoming, Tomizuka, Masayoshi, Zhan, Wei, Xiong, Zhenhua, and Ding, Mingyu
Subjects: Computer Science - Robotics
Abstract: Robot exploration aims at constructing unknown environments and it is important to achieve it with shorter paths. Traditional methods focus on optimizing the visiting order based on current observations, which may lead to local-minimal results. Recently, by predicting the structure of the unseen environment, the exploration efficiency can be further improved. However, in a cluttered environment, due to the randomness of obstacles, the ability for prediction is limited. Therefore, to solve this problem, we propose a map prediction algorithm that can be efficient in predicting the layout of noisy indoor environments. We focus on the scenario of 2D exploration. First, we perform floor plan extraction by denoising the cluttered map using deep learning. Then, we use a floor plan-based algorithm to improve the prediction accuracy. Additionally, we extract the segmentation of rooms and construct their connectivity based on the predicted map, which can be used for downstream tasks. To validate the effectiveness of the proposed method, it is applied to exploration tasks. Extensive experiments show that even in cluttered scenes, our proposed method can benefit efficiency., Comment: 7 pages, submitted to ICRA 2025
Published: 2024

288. Allium Vegetables Intake and Digestive System Cancer Risk: A Study Based on Mendelian Randomization, Network Pharmacology and Molecular Docking

Author: Li, Shuhao, Lou, Jingwen, Mulatihan, Yelina, Xiong, Yuhang, Li, Yao, and Xu, Qi
Subjects: Quantitative Biology - Genomics
Abstract: Background: Allium vegetables (garlic and onion) are one of the flavorings in people's daily diets. Observational studies suggest that intake of allium vegetables may be correlated with a lower incidence of digestive system cancers. However, the existence of a causal relationship is still controversial due to confounding factors and reverse causation. Therefore, we explored the causal relationship between intake of allium vegetables and digestive system cancers using Mendelian randomization approach. Methods: First, we performed Mendelian randomization analyses using inverse variance weighting (IVW), weighted median, and MR-Egger approaches, and demonstrated the reliability of the results in the sensitivity step. Second, Multivariable Mendelian randomization was applied to adjust for smoking and alcohol consumption. Third, we explored the molecular mechanisms behind the positive results through network pharmacology and molecular docking methods. Results: The study suggests that increased intake of garlic reduced gastric cancer risk. However, onion intake was not statistically associated with digestive system cancer. Conclusion: Garlic may have a protective effect against gastric cancer.
Published: 2024
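The inverse variance weighting (IVW) approach named in the Methods of entry 288 combines per-variant effect estimates into a single causal estimate. The sketch below shows the standard fixed-effect IVW formula only; it is not the study's analysis code, and the per-variant numbers are invented purely to exercise the function (they are not data from the paper). The function and variable names are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted Mendelian randomization estimate.

    Each genetic variant j contributes a ratio estimate
    beta_outcome[j] / beta_exposure[j], weighted by
    beta_exposure[j]**2 / se_outcome[j]**2.
    Returns (estimate, standard_error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("inputs must have the same length")
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Invented illustrative inputs: three variants whose outcome effects are
# exactly -0.5 times their exposure effects.
toy_beta_exposure = [0.1, 0.2, 0.4]
toy_beta_outcome = [-0.05, -0.1, -0.2]
toy_se_outcome = [0.02, 0.03, 0.05]

estimate, std_error = ivw_estimate(
    toy_beta_exposure, toy_beta_outcome, toy_se_outcome
)
print(f"IVW estimate: {estimate:.3f} (SE {std_error:.3f})")
```

With these toy inputs every variant implies the same ratio, so the weighted estimate equals that ratio. In real data the variants disagree, which is why the abstract also lists weighted median and MR-Egger as complementary estimators.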

289. DIGIMON: Diagnosis and Mitigation of Sampling Skew for Reinforcement Learning based Meta-Planner in Robot Navigation

Author: Feng, Shiwei, Chen, Xuan, Cheng, Zhiyuan, Xiong, Zikang, Gao, Yifei, Cheng, Siyuan, Kate, Sayali, and Zhang, Xiangyu
Subjects: Computer Science - Robotics
Abstract: Robot navigation is increasingly crucial across applications like delivery services and warehouse management. The integration of Reinforcement Learning (RL) with classical planning has given rise to meta-planners that combine the adaptability of RL with the explainable decision-making of classical planners. However, the exploration capabilities of RL-based meta-planners during training are often constrained by the capabilities of the underlying classical planners. This constraint can result in limited exploration, thereby leading to sampling skew issues. To address these issues, our paper introduces a novel framework, DIGIMON, which begins with behavior-guided diagnosis for exploration bottlenecks within the meta-planner and follows up with a mitigation strategy that conducts up-sampling from diagnosed bottleneck data. Our evaluation shows 13.5%+ improvement in navigation performance, greater robustness in out-of-distribution environments, and a 4x boost in training efficiency. DIGIMON is designed as a versatile, plug-and-play solution, allowing seamless integration into various RL-based meta-planners.
Published: 2024

290. Semantics Preserving Emoji Recommendation with Large Language Models

Author: Qiu, Zhongyi, Qiu, Kangyi, Lyu, Hanjia, Xiong, Wei, and Luo, Jiebo
Subjects: Computer Science - Computation and Language, Computer Science - Social and Information Networks
Abstract: Emojis have become an integral part of digital communication, enriching text by conveying emotions, tone, and intent. Existing emoji recommendation methods are primarily evaluated based on their ability to match the exact emoji a user chooses in the original text. However, they ignore the essence of users' behavior on social media in that each text can correspond to multiple reasonable emojis. To better assess a model's ability to align with such real-world emoji usage, we propose a new semantics preserving evaluation framework for emoji recommendation, which measures a model's ability to recommend emojis that maintain the semantic consistency with the user's text. To evaluate how well a model preserves semantics, we assess whether the predicted affective state, demographic profile, and attitudinal stance of the user remain unchanged. If these attributes are preserved, we consider the recommended emojis to have maintained the original semantics. The advanced abilities of Large Language Models (LLMs) in understanding and generating nuanced, contextually relevant output make them well-suited for handling the complexities of semantics preserving emoji recommendation. To this end, we construct a comprehensive benchmark to systematically assess the performance of six proprietary and open-source LLMs using different prompting techniques on our task. Our experiments demonstrate that GPT-4o outperforms other LLMs, achieving a semantics preservation score of 79.23%. Additionally, we conduct case studies to analyze model biases in downstream classification tasks and evaluate the diversity of the recommended emojis.
Published: 2024

291. Uncovering the Secrets of Human-Like Movement: A Fresh Perspective on Motion Planning

Author: Shi, Lei, Liu, Qichao, Zhou, Cheng, Gao, Wentao, Wu, Haotian, Zheng, Yu, and Li, Xiong
Subjects: Computer Science - Robotics
Abstract: This article explores human-like movement from a fresh perspective on motion planning. We analyze the coordinated and compliant movement mechanisms of the human body from the perspective of biomechanics. Based on these mechanisms, we propose an optimal control framework that integrates compliant control dynamics, optimizing robotic arm motion through a response time matrix. This matrix sets the timing parameters for joint movements, turning the system into a time-parameterized optimal control problem. The model focuses on the interaction between active and passive joints under external disturbances, improving adaptability and compliance. This method achieves optimal trajectory generation and balances precision and compliance. Experimental results on both a manipulator and a humanoid robot validate the approach. Comment: 7 pages
Published: 2024

292. A Fairness-Oriented Control Framework for Safety-Critical Multi-Robot Systems: Alternative Authority Control

Author: Shi, Lei, Liu, Qichao, Zhou, Cheng, and Li, Xiong
Subjects: Computer Science - Robotics
Abstract: This paper proposes a fair control framework for multi-robot systems, which integrates the newly introduced Alternative Authority Control (AAC) and Flexible Control Barrier Function (F-CBF). Control authority refers to a single robot which can plan its trajectory while considering others as moving obstacles, meaning the other robots do not have authority to plan their own paths. The AAC method dynamically distributes the control authority, enabling fair and coordinated movement across the system. This approach significantly improves computational efficiency, scalability, and robustness in complex environments. The proposed F-CBF extends traditional CBFs by incorporating obstacle shape, velocity, and orientation. F-CBF enhances safety through accurate dynamic obstacle avoidance. The framework is validated through simulations in multi-robot scenarios, demonstrating its safety, robustness, and computational efficiency.
Published: 2024

293. Efficiently Crowdsourcing Visual Importance with Punch-Hole Annotation

Author: Chang, Minsuk, Lee, Soohyun, Cho, Aeri, Jeon, Hyeon, Park, Seokhyeon, Bearfield, Cindy Xiong, and Seo, Jinwook
Subjects: Computer Science - Human-Computer Interaction
Abstract: We introduce a novel crowdsourcing method for identifying important areas in graphical images through punch-hole labeling. Traditional methods, such as gaze trackers and mouse-based annotations, which generate continuous data, can be impractical in crowdsourcing scenarios. They require many participants, and the outcome data can be noisy. In contrast, our method first segments the graphical image with a grid and drops a portion of the patches (punch holes). Then, we iteratively ask the labeler to validate each annotation with holes, narrowing the annotation down to only the most important area. This approach aims to reduce annotation noise in crowdsourcing by standardizing the annotations while enhancing labeling efficiency and reliability. Preliminary findings from fundamental charts demonstrate that punch-hole labeling can effectively pinpoint critical regions. This also highlights its potential for broader application in visualization research, particularly in studying large-scale users' graphical perception. Our future work aims to enhance the algorithm to achieve faster labeling speed and prove its utility through large-scale experiments. Comment: 2 pages, 1 figure, presented at IEEE VIS 2024 poster session
Published: 2024

294. The central limit theorem for entries of random matrices with specific rank over finite fields

Author: Chan, Chin Hei and Xiong, Maosheng
Subjects: Mathematics - Number Theory, Mathematics - Combinatorics, 15B52, 11T99, 05C50, 60F05
Abstract: Let $\mathbb{F}_q$ be the finite field of order $q$, and $\mathcal{A}$ a non-empty proper subset of $\mathbb{F}_q$. Let $\mathbf{M}$ be a random $m \times n$ matrix of rank $r$ over $\mathbb{F}_q$ taken with uniform distribution. It was proved recently by Sanna that as $m,n \to \infty$ and $r,q,\mathcal{A}$ are fixed, the number of entries of $\mathbf{M}$ in $\mathcal{A}$ approaches a normal distribution. The question was raised as to whether or not one can still obtain a central limit theorem of some sort when $r$ goes to infinity in a way controlled by $m$ and $n$. In this paper we answer this question affirmatively.
Published: 2024

295. TV Mon -- post mass transfer Algol type binary with $\delta$ Scuti pulsations in primary component

Author: Kovalev, Mikhail, Li, Zhenwei, Xiong, Jianping, Matekov, Azizbek, Bo, Zhang, Chen, Xuefei, and Han, Zhanwen
Subjects: Astrophysics - Solar and Stellar Astrophysics, Astrophysics - Instrumentation and Methods for Astrophysics
Abstract: We present a study of the detached eclipsing binary TV Mon using spectra from the LAMOST medium-resolution survey and photometry from ASAS-SN and CoRoT. We apply multiple-epoch spectral fitting to derive RV and spectral parameters. The analysis of eclipses in CoRoT data shows the relative sizes of the stellar components and an almost edge-on circular orbit. Combining the spectral and photometric solutions, we estimate masses and radii of the components: $M_{A,B}=2.063\pm0.033({\rm stat.})\pm0.095({\rm syst.}),~0.218\pm0.004({\rm stat.})\pm0.018({\rm syst.})~M_\odot$, $R_{A,B}=2.394\pm0.014,~2.860\pm0.016~R_\odot$. SED analysis and Gaia parallax allow us to estimate the temperatures ${T_{\rm eff}}_{A,B}=7624^{+194}_{-174},~5184^{+130}_{-123}$ K and distance $d=907\pm11$ pc. We identify three $\delta$ Scuti type pulsation frequencies in the primary component, while we also suspect that TV Mon has spot activity in the secondary component. This system experienced intensive mass transfer and mass ratio reversal in the past, but currently shows no signs of mass transfer in the spectra. The low-mass component will lose its outer envelope and shrink to a helium white dwarf, whose mass and orbital period are in good agreement with evolutionary model predictions. Comment: accepted in MNRAS 31.10.2024
Published: 2024

296. Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation

Author: Xiong, Chenxu, Fu, Ruibo, Shi, Shuchen, Wen, Zhengqi, Tao, Jianhua, Wang, Tao, Li, Chenxing, Qiang, Chunyu, Xie, Yuankun, Qi, Xin, Li, Guanjun, and Yang, Zizheng
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence, Computer Science - Sound
Abstract: Current mainstream audio generation methods primarily rely on simple text prompts, often failing to capture the nuanced details necessary for multi-style audio generation. To address this limitation, the Sound Event Enhanced Prompt Adapter is proposed. Unlike traditional static global style transfer, this method extracts style embedding through cross-attention between text and reference audio for adaptive style control. Adaptive layer normalization is then utilized to enhance the model's capacity to express multiple styles. Additionally, the Sound Event Reference Style Transfer Dataset (SERST) is introduced for the proposed target style audio generation task, enabling dual-prompt audio generation using both text and audio references. Experimental results demonstrate the robustness of the model, achieving state-of-the-art Fréchet Distance of 26.94 and KL Divergence of 1.82, surpassing Tango, AudioLDM, and AudioGen. Furthermore, the generated audio shows high similarity to its corresponding audio reference. The demo, code, and dataset are publicly available. Comment: 5 pages, 2 figures, submitted to ICASSP 2025
Published: 2024

297. Gravitational Wave Birefringence in Symmetron Cosmology

Author: Xiong, Ze-Xuan and Huang, Da
Subjects: General Relativity and Quantum Cosmology, Astrophysics - Cosmology and Nongalactic Astrophysics, High Energy Physics - Phenomenology
Abstract: The symmetron is a light scalar which provides a screening mechanism so as to evade the strong constraints from local gravity tests. In order to achieve this goal, a $Z_2$ symmetry is imposed on the symmetron model. In this paper, we introduce a new symmetron Chern-Simons-like gravitational interaction which is $Z_2$ invariant but breaks the parity symmetry explicitly. As a result, it is found that this coupling can generate gravitational wave (GW) amplitude birefringence when GWs propagate over the symmetron backgrounds. Due to the matter density difference, the symmetron profile changes significantly when entering the galaxy, so that we need to discuss the extra-galactic and galactic situations separately. On the one hand, the cosmological symmetron field follows the adiabatic solution, which induces a parity-violating GW amplitude correction with its exponent proportional to the GW frequency and the traveling distance. On the other hand, the symmetron takes the screening solution within the Milky Way, and the generated GW birefringence is only a function of the GW frequency. By further comparing these two contributions, we find that the extra-galactic symmetron field produces the dominant birefringence effects. Finally, with the latest GW data from LIGO-Virgo-KAGRA, we place a reasonable constraint on the parity-violating coupling parameter in this symmetron model. Comment: 22 pages
Published: 2024

298. Generative AI in Data Center Networking: Fundamentals, Perspectives, and Case Study

Author: Liu, Yinqiu, Du, Hongyang, Niyato, Dusit, Kang, Jiawen, Xiong, Zehui, Wen, Yonggang, and Kim, Dong In
Subjects: Computer Science - Networking and Internet Architecture
Abstract: Generative AI (GenAI), exemplified by Large Language Models (LLMs) such as OpenAI's ChatGPT, is revolutionizing various fields. Central to this transformation is Data Center Networking (DCN), which not only provides the computational power necessary for GenAI training and inference but also delivers GenAI-driven services to users. This article examines the interplay between GenAI and DCNs, highlighting their symbiotic relationship and mutual advancements. We begin by reviewing current challenges within DCNs and discussing how GenAI contributes to enhancing DCN capabilities through innovations such as data augmentation, process automation, and domain transfer. We then focus on analyzing the distinctive characteristics of GenAI workloads on DCNs, gaining insights that catalyze the evolution of DCNs to more effectively support GenAI and LLMs. Moreover, to illustrate the seamless integration of GenAI with DCNs, we present a case study on full-lifecycle DCN digital twins. In this study, we employ LLMs equipped with Retrieval Augmented Generation (RAG) to formulate optimization problems for DCNs and adopt Diffusion-Deep Reinforcement Learning (DRL) for optimizing the RAG knowledge placement strategy. This approach not only demonstrates the application of advanced GenAI methods within DCNs but also positions the digital twin as a pivotal GenAI service operating on DCNs. We anticipate that this article can promote further research into enhancing the virtuous interaction between GenAI and DCNs. Comment: 9 pages
Published: 2024

299. Towards Precision Characterization of Communication Disorders using Models of Perceived Pragmatic Similarity

Author: Ward, Nigel G., Segura, Andres, Bugarini, Georgina, Lehnert-LeHouillier, Heike, Liu, Dancheng, Xiong, Jinjun, and Fuentes, Olac
Subjects: Computer Science - Computation and Language
Abstract: The diagnosis and treatment of individuals with communication disorders offers many opportunities for the application of speech technology, but research so far has not adequately considered: the diversity of conditions, the role of pragmatic deficits, and the challenges of limited data. This paper explores how a general-purpose model of perceived pragmatic similarity may overcome these limitations. It explains how it might support several use cases for clinicians and clients, and presents evidence that a simple model can provide value, and in particular can capture utterance aspects that are relevant to diagnoses of autism and specific language impairment. Comment: submitted to IEEE ICASSP 2025
Published: 2024

300. Evolution and the quasistationary state of collective fast neutrino flavor conversion in three dimensions without axisymmetry

Author: George, Manu, Xiong, Zewei, Wu, Meng-Ru, and Lin, Chun-Yu
Subjects: Astrophysics - High Energy Astrophysical Phenomena, High Energy Physics - Phenomenology
Abstract: We investigate in this work the evolution of the collective fast neutrino flavor conversion (FFC) in a three-dimensional (3D) cubic box with periodic boundary conditions for three different neutrino angular distributions that are axially asymmetric. We find that the system evolves toward a quasistationary state where the angular distribution of the spatially averaged neutrino electron-minus-muon lepton number (ELN) does not contain any crossings. In the quasistationary state, near flavor equilibration is achieved in one angular domain enclosed by the initial ELN angular crossing contour, similar to the conclusion derived based on a simplified one-dimensional (1D) system with axially symmetric neutrino angular distributions. We have also performed additional simulations in coordinates where the initial first ELN angular moment has only one nonvanishing spatial component by using the original axially asymmetric ELN angular distributions as well as the corresponding axisymmetric ELN distributions, and find an interesting similarity between these two sets. Finally, we propose three different analytical prescriptions generalized from earlier 1D models to 3D models, and evaluate their performances in predicting the post-FFC moments. Our findings suggest that further development of an effective classical transport model in multidimensions to capture the effect of FFC is promising. Comment: 10 pages, 8 figures
Published: 2024
