Author: "A. A. Mao" - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword '"A. A. Mao"' returned 602,105 results.



251. Infer Human's Intentions Before Following Natural Language Instructions

Author: Wan, Yanming, Wu, Yue, Wang, Yiping, Mao, Jiayuan, and Jaques, Natasha
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: For AI agents to be helpful to humans, they should be able to follow natural language instructions to complete everyday cooperative tasks in human environments. However, real human instructions inherently possess ambiguity, because the human speakers assume sufficient prior knowledge about their hidden goals and intentions. Standard language grounding and planning methods fail to address such ambiguities because they do not model human internal goals as additional partially observable factors in the environment. We propose a new framework, Follow Instructions with Social and Embodied Reasoning (FISER), aiming for better natural language instruction following in collaborative embodied tasks. Our framework makes explicit inferences about human goals and intentions as intermediate reasoning steps. We implement a set of Transformer-based models and evaluate them over a challenging benchmark, HandMeThat. We empirically demonstrate that using social reasoning to explicitly infer human intentions before making action plans surpasses purely end-to-end approaches. We also compare our implementation with strong baselines, including Chain of Thought prompting on the largest available pre-trained language models, and find that FISER provides better performance on the embodied social reasoning tasks under investigation, reaching the state-of-the-art on HandMeThat.
Published: 2024

252. Properties of the QCD Matter: A Review of Selected Results from the ALICE Experiment

Author: Shou, Qi-Ye, Ma, Yu-Gang, Zhang, Song, Zhu, Jian-Hui, Mao, Ya-Xian, Pei, Hua, Yin, Zhong-Bao, Zhang, Xiao-Ming, Zhou, Dai-Cui, Peng, Xin-Ye, Bai, Xiao-Zhi, Tang, Ze-Bo, Zhang, Yi-Fei, and Li, Xiao-Mei
Subjects: Nuclear Experiment, High Energy Physics - Experiment
Abstract: The Large Hadron Collider (LHC), the world's largest and most powerful particle accelerator, has been a pivotal tool in advancing our understanding of fundamental physics. By colliding heavy ions (such as lead ions), the LHC recreates conditions similar to those just after the Big Bang. This allows scientists to study the Quark-Gluon Plasma (QGP), a state of matter where quarks and gluons are not confined within protons and neutrons. These studies provide insights into the strong force and the early universe's behavior. In this paper, we provide a comprehensive overview of recent significant findings from A Large Ion Collider Experiment (ALICE) at the LHC. The topics encompass measurements of QGP properties, particle production, flow and correlations, dileptons, quarkonia and electromagnetic probes, heavy flavor, and jets. Additionally, we introduce future plans for detector upgrades of the ALICE experiment., Comment: 29 pages, 32 figures. This review is dedicated to Professor Wenqing Shen in honor of his leadership and significant impact on the Chinese heavy-ion physics community. All authors contributed equally to this work
Published: 2024

253. AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environment

Author: Sun, Nan, Mao, Bo, Li, Yongchang, Ma, Lumeng, Guo, Di, and Liu, Huaping
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Multiagent Systems
Abstract: The increasing demand for intelligent assistants in human-populated environments has motivated significant research in autonomous robotic systems. Traditional service robots and virtual assistants, however, struggle with real-world task execution due to their limited capacity for dynamic reasoning and interaction, particularly when human collaboration is required. Recent developments in Large Language Models have opened new avenues for improving these systems, enabling more sophisticated reasoning and natural interaction capabilities. In this paper, we introduce AssistantX, an LLM-powered proactive assistant designed to operate autonomously in a physical office environment. Unlike conventional service robots, AssistantX leverages a novel multi-agent architecture, PPDR4X, which provides advanced inference capabilities and comprehensive collaboration awareness. By effectively bridging the gap between virtual operations and physical interactions, AssistantX demonstrates robust performance in managing complex real-world scenarios. Our evaluation highlights the architecture's effectiveness, showing that AssistantX can respond to clear instructions, actively retrieve supplementary information from memory, and proactively seek collaboration from team members to ensure successful task completion. More details and videos can be found at https://assistantx-agent.github.io/AssistantX/., Comment: 6 pages, 8 figures, 4 tables
Published: 2024

254. Zak Phase Induced Topological Nonreciprocity

Author: Liu, Xiao, Wang, Jiefei, Mao, Ruosong, Hu, Huizhu, Zhu, Shi-Yao, Xu, Xingqi, Cai, Han, and Wang, Da-Wei
Subjects: Physics - Optics, Quantum Physics
Abstract: Topological physics provides novel insights for designing functional photonic devices, such as magnetic-free optical diodes, which are important in optical engineering and quantum information processing. Past efforts have mostly focused on the topological edge modes in two-dimensional (2D) photonic Chern lattices, which, however, require delicate fabrication and temporal modulation. In particular, the 1D nonreciprocal edge mode needs to be embedded in a 2D lattice, which conflicts with the compactness required by integrated photonics. To address these challenges, we investigate the optical nonreciprocity of 1D Su-Schrieffer-Heeger (SSH) superradiance lattices in room-temperature atoms. Probe fields propagating in opposite directions perceive two different SSH topological phases, which have different absorption spectra due to the interplay between the Zak phase and the thermal motion of the atoms, resulting in optical nonreciprocity. Our findings reveal the relationship between 1D topological matter and optical nonreciprocity, simplifying the design of topologically resilient nonreciprocal devices., Comment: 4 figures
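For readers unfamiliar with the Zak phase, the following minimal Python sketch computes it for the textbook SSH chain. It assumes the standard Bloch Hamiltonian H(k) = [[0, h*], [h, 0]] with h(k) = v + w e^{ik}, where v and w are the hypothetical intra- and inter-cell hoppings; it is not the room-temperature superradiance-lattice model studied in the paper. The discrete Wilson-loop product of overlaps between neighboring lower-band eigenstates gives a gauge-invariant phase of about π when w > v and about 0 when w < v (for this unit-cell convention).

```python
import cmath
import math
from typing import Tuple


def zak_phase_ssh(v: float, w: float, n_k: int = 400) -> float:
    """Discrete Zak (Berry) phase of the lower band of a textbook SSH chain.

    Assumption: H(k) = [[0, conj(h)], [h, 0]] with h(k) = v + w * exp(i k).
    Illustrative only; v == w (gap closing) is not supported.
    """

    def lower_band(k: float) -> Tuple[complex, complex]:
        h = v + w * cmath.exp(1j * k)
        phi = cmath.phase(h)
        s = 1 / math.sqrt(2)
        # Normalized eigenvector for eigenvalue -|h|: (1, -e^{i phi}) / sqrt(2)
        return (complex(s, 0.0), -s * cmath.exp(1j * phi))

    ks = [2 * math.pi * j / n_k for j in range(n_k)]
    states = [lower_band(k) for k in ks]
    states.append(states[0])  # close the Brillouin-zone loop with the same state

    product = complex(1.0, 0.0)
    for (a0, b0), (a1, b1) in zip(states, states[1:]):
        # Overlap <u_j | u_{j+1}> between neighboring k points
        product *= a0.conjugate() * a1 + b0.conjugate() * b1
    # Gauge-invariant Wilson-loop phase, wrapped to (-pi, pi]
    return -cmath.phase(product)
```

Because the loop is closed with the identical state, arbitrary eigenvector phases cancel, so the result depends only on the winding of h(k) around the origin; a value of ±π is the same Zak phase modulo 2π.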
Published: 2024

255. SCOMatch: Alleviating Overtrusting in Open-set Semi-supervised Learning

Author: Wang, Zerun, Xiang, Liuyu, Huang, Lang, Mao, Jiafeng, Xiao, Ling, and Yamasaki, Toshihiko
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Open-set semi-supervised learning (OSSL) leverages practical open-set unlabeled data, comprising both in-distribution (ID) samples from seen classes and out-of-distribution (OOD) samples from unseen classes, for semi-supervised learning (SSL). Prior OSSL methods initially learned the decision boundary between ID and OOD with labeled ID data, subsequently employing self-training to refine this boundary. These methods, however, tend to overtrust the labeled ID data: the scarcity of labeled data causes a distribution bias between the labeled samples and the full ID data, which leads the decision boundary to overfit. The subsequent self-training process, built on this overfitted boundary, fails to rectify the problem. In this paper, we address the overtrusting issue by treating OOD samples as an additional class, forming a new SSL process. Specifically, we propose SCOMatch, a novel OSSL method that 1) selects reliable OOD samples as new labeled data with an OOD memory queue and a corresponding update strategy and 2) integrates the new SSL process into the original task through our Simultaneous Close-set and Open-set self-training. SCOMatch refines the decision boundary of ID and OOD classes across the entire dataset, thereby leading to improved results. Extensive experimental results show that SCOMatch significantly outperforms the state-of-the-art methods on various benchmarks. The effectiveness is further verified through ablation studies and visualization., Comment: ECCV 2024 accepted
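The idea of treating OOD samples as an extra class fed by a memory queue can be illustrated with a small Python sketch. Everything here is a hypothetical stand-in: the class name, the confidence threshold, and the "keep the most confident OOD samples" eviction rule are assumptions, not SCOMatch's published update strategy. With K seen classes, queued samples receive label K.

```python
import heapq
from typing import List, Tuple


class OODMemoryQueue:
    """Hypothetical fixed-capacity store of the unlabeled samples judged most
    confidently out-of-distribution; each is pseudo-labeled with an extra
    class index equal to the number of seen (ID) classes."""

    def __init__(self, capacity: int, num_id_classes: int, threshold: float):
        self.capacity = capacity
        self.ood_label = num_id_classes  # classes 0..K-1 are ID, K is OOD
        self.threshold = threshold
        self._heap: List[Tuple[float, int]] = []  # min-heap of (ood_score, sample_id)

    def update(self, sample_id: int, ood_score: float) -> None:
        if ood_score < self.threshold:
            return  # not reliable enough to use as a labeled OOD example
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, (ood_score, sample_id))
        elif ood_score > self._heap[0][0]:
            # Evict the least confident stored sample
            heapq.heapreplace(self._heap, (ood_score, sample_id))

    def labeled_batch(self) -> List[Tuple[int, int]]:
        """(sample_id, label) pairs, all labeled with the extra OOD class."""
        return [(sid, self.ood_label) for _, sid in self._heap]
```

A min-heap keeps eviction of the weakest entry at O(log n), which suits a queue refreshed on every training batch.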
Published: 2024

256. AgMTR: Agent Mining Transformer for Few-shot Segmentation in Remote Sensing

Author: Bi, Hanbo, Feng, Yingchao, Mao, Yongqiang, Pei, Jianning, Diao, Wenhui, Wang, Hongqi, and Sun, Xian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Few-shot Segmentation (FSS) aims to segment objects of interest in the query image using only a handful of labeled samples (i.e., support images). Previous schemes leverage the similarity between support-query pixel pairs to construct pixel-level semantic correlations. However, in remote sensing scenarios with extreme intra-class variations and cluttered backgrounds, such pixel-level correlations may produce tremendous mismatches, resulting in semantic ambiguity between the query foreground (FG) and background (BG) pixels. To tackle this problem, we propose a novel Agent Mining Transformer (AgMTR), which adaptively mines a set of local-aware agents to construct agent-level semantic correlation. Compared with pixel-level semantics, the given agents are equipped with local-contextual information and possess a broader receptive field. Different query pixels can then selectively aggregate the fine-grained local semantics of different agents, thereby enhancing the semantic clarity between query FG and BG pixels. Concretely, the Agent Learning Encoder (ALE) is first proposed to construct the optimal transport plan that arranges different agents to aggregate support semantics under different local regions. Then, to further optimize the agents, the Agent Aggregation Decoder (AAD) and the Semantic Alignment Decoder (SAD) are constructed to break through the limited support set, mining valuable class-specific semantics from unlabeled data sources and from the query image itself, respectively. Extensive experiments on the remote sensing benchmark iSAID indicate that the proposed method achieves state-of-the-art performance. Surprisingly, our method remains quite competitive when extended to more common natural scenarios, i.e., PASCAL-5i and COCO-20i., Comment: accepted to IJCV
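The abstract does not say how the optimal transport plan is computed. A common choice is entropy-regularized OT solved with Sinkhorn iterations, sketched below in plain Python under that assumption; the cost matrix (for example, rows as pixels and columns as agents), uniform marginals, and `epsilon` are hypothetical illustration choices.

```python
import math
from typing import List


def sinkhorn(cost: List[List[float]], epsilon: float = 0.1,
             n_iters: int = 200) -> List[List[float]]:
    """Entropy-regularized optimal transport plan between uniform marginals.

    Rows and columns could stand for pixels and agents; both the cost and
    the uniform marginals are illustrative assumptions.
    """
    n, m = len(cost), len(cost[0])
    r = [1.0 / n] * n  # mass to move out of each row
    c = [1.0 / m] * m  # mass each column must receive
    kernel = [[math.exp(-cost[i][j] / epsilon) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale so row sums match r, then column sums match c
        u = [r[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [c[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
```

Smaller `epsilon` gives a sharper, closer-to-exact assignment but needs more iterations to converge.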
Published: 2024

257. LensWatch: II. Improved Photometry and Time Delay Constraints on the Strongly-Lensed Type Ia Supernova 2022qmx ('SN Zwicky') with HST Template Observations

Author: Larison, Conor, Pierel, Justin D. R., Newman, Max J. B., Jha, Saurabh W., Gilman, Daniel, Hayes, Erin E., Agrawal, Aadya, Arendse, Nikki, Birrer, Simon, Bronikowski, Mateusz, Della Costa, John M., Coulter, David A., Courbin, Frédéric, Chakrabarti, Sukanya, Diego, Jose M., Dhawan, Suhail, Goobar, Ariel, Gall, Christa, Hjorth, Jens, Huang, Xiaosheng, Mao, Shude, Marques-Chaves, Rui, Mazzali, Paolo A., More, Anupreeta, Moustakas, Leonidas A., Pérez-Fournon, Ismael, Petrushevska, Tanja, Poidevin, Frédérick, Rest, Armin, Shajib, Anowar J., Shirley, Raphael, Sheu, William, Strolger, Louis-Gregory, Suyu, Sherry H., Treu, Tommaso, and Zenati, Yossef
Subjects: Astrophysics - High Energy Astrophysical Phenomena, Astrophysics - Cosmology and Nongalactic Astrophysics, Astrophysics - Astrophysics of Galaxies
Abstract: Strongly lensed supernovae (SNe) are a rare class of transient that can offer tight cosmological constraints that are complementary to methods from other astronomical events. We present a follow-up study of one recently-discovered strongly lensed SN, the quadruply-imaged Type Ia SN 2022qmx (aka, "SN Zwicky") at z = 0.3544. We measure updated, template-subtracted photometry for SN Zwicky and derive improved time delays and magnifications. This is possible because SNe are transient, fading away after reaching their peak brightness. Specifically, we measure point spread function (PSF) photometry for all four images of SN Zwicky in three Hubble Space Telescope WFC3/UVIS passbands (F475W, F625W, F814W) and one WFC3/IR passband (F160W), with template images taken $\sim 11$ months after the epoch in which the SN images appear. We find consistency to within $2\sigma$ between lens model predicted time delays ($\lesssim1$ day), and measured time delays with HST colors ($\lesssim2$ days), including the uncertainty from chromatic microlensing that may arise from stars in the lensing galaxy. The standardizable nature of SNe Ia allows us to estimate absolute magnifications for the four images, with images A and C being elevated in magnification compared to lens model predictions by about $6\sigma$ and $3\sigma$ respectively, confirming previous work. We show that millilensing or differential dust extinction is unable to explain these discrepancies and find evidence for the existence of microlensing in images A, C, and potentially D, that may contribute to the anomalous magnification., Comment: Submitted to ApJ
Published: 2024

258. VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models

Author: Liu, Yifei, Wen, Jicheng, Wang, Yang, Ye, Shengyu, Zhang, Li Lyna, Cao, Ting, Li, Cheng, and Yang, Mao
Subjects: Computer Science - Artificial Intelligence
Abstract: Scaling model size significantly challenges the deployment and inference of Large Language Models (LLMs). Due to the redundancy in LLM weights, recent research has focused on pushing weight-only quantization to extremely low bit widths (even down to 2 bits). This reduces memory requirements, optimizes storage costs, and decreases memory bandwidth needs during inference. However, due to numerical representation limitations, traditional scalar-based weight quantization struggles to reach such extremely low bit widths. Recent research on Vector Quantization (VQ) for LLMs has demonstrated the potential for extremely low-bit model quantization by compressing vectors into indices using lookup tables. In this paper, we introduce Vector Post-Training Quantization (VPTQ) for extremely low-bit quantization of LLMs. We use Second-Order Optimization to formulate the LLM VQ problem and guide our quantization algorithm design by solving the optimization. We further refine the weights using Channel-Independent Second-Order Optimization for a granular VQ. In addition, by decomposing the optimization problem, we propose a brief and effective codebook initialization algorithm. We also extend VPTQ to support residual and outlier quantization, which enhances model accuracy and further compresses the model. Our experimental results show that VPTQ reduces model quantization perplexity by $0.01$-$0.34$ on LLaMA-2, $0.38$-$0.68$ on Mistral-7B, and $4.41$-$7.34$ on LLaMA-3 over SOTA at 2-bit, with an average accuracy improvement on QA tasks of $0.79$-$1.5\%$ on LLaMA-2, $1\%$ on Mistral-7B, and $11$-$22\%$ on LLaMA-3. We only utilize $10.4$-$18.6\%$ of the quantization algorithm execution time, resulting in a $1.6$-$1.8\times$ increase in inference throughput compared to SOTA., Comment: EMNLP 2024, Main, Poster
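The core "compress vectors into indices using lookup tables" step of vector quantization can be shown with a small Python sketch. This is plain nearest-codeword VQ with squared Euclidean distance and a hand-picked codebook; VPTQ itself uses second-order (Hessian-aware) optimization and its own codebook initialization, which this sketch does not attempt.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def quantize(weights: Sequence[float], codebook: Sequence[Vector]) -> List[int]:
    """Split `weights` into chunks of the codeword length and map each chunk
    to the index of its nearest codeword (plain squared Euclidean distance)."""
    d = len(codebook[0])
    if len(weights) % d != 0:
        raise ValueError("weight length must be a multiple of the vector length")
    indices = []
    for start in range(0, len(weights), d):
        chunk = weights[start:start + d]
        best = min(
            range(len(codebook)),
            key=lambda i: sum((x - c) ** 2 for x, c in zip(chunk, codebook[i])),
        )
        indices.append(best)
    return indices


def dequantize(indices: Sequence[int], codebook: Sequence[Vector]) -> List[float]:
    """Reconstruct approximate weights by table lookup."""
    out: List[float] = []
    for i in indices:
        out.extend(codebook[i])
    return out
```

With 4 codewords of length 2, each index takes 2 bits and covers 2 weights, i.e. 1 bit per weight before counting the codebook's own storage; this is how VQ reaches bit widths that scalar quantization struggles with.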
Published: 2024

259. Automated Surgical Skill Assessment in Endoscopic Pituitary Surgery using Real-time Instrument Tracking on a High-fidelity Bench-top Phantom

Author: Das, Adrito, Sidiqi, Bilal, Mennillo, Laurent, Mao, Zhehua, Brudfors, Mikael, Xochicale, Miguel, Khan, Danyal Z., Newall, Nicola, Hanrahan, John G., Clarkson, Matthew J., Stoyanov, Danail, Marcus, Hani J., and Bano, Sophia
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Improved surgical skill is generally associated with improved patient outcomes, although assessment is subjective, labour-intensive, and requires domain-specific expertise. Automated data-driven metrics can alleviate these difficulties, as demonstrated by existing machine learning instrument tracking models in minimally invasive surgery. However, these models have been tested on limited datasets of laparoscopic surgery, with a focus on isolated tasks and robotic surgery. In this paper, a new public dataset is introduced, focusing on simulated surgery, using the nasal phase of endoscopic pituitary surgery as an exemplar. Simulated surgery allows for a realistic yet repeatable environment, meaning the insights gained from automated assessment can be used by novice surgeons to hone their skills on the simulator before moving to real surgery. PRINTNet (Pituitary Real-time INstrument Tracking Network) has been created as a baseline model for this automated assessment. Consisting of DeepLabV3 for classification and segmentation, StrongSORT for tracking, and the NVIDIA Holoscan SDK for real-time performance, PRINTNet achieved 71.9% Multiple Object Tracking Precision running at 22 Frames Per Second. Using this tracking output, a Multilayer Perceptron achieved 87% accuracy in predicting surgical skill level (novice or expert), with the "ratio of total procedure time to instrument visible time" correlated with higher surgical skill. This demonstrates the feasibility of automated surgical skill assessment in simulated endoscopic pituitary surgery. The new publicly available dataset can be found here: https://doi.org/10.5522/04/26511049., Comment: 7 pages, 6 figures
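Multiple Object Tracking Precision (MOTP) averages a localization score over all matched prediction/ground-truth pairs. The Python sketch below assumes the IoU-based variant (consistent with reporting it as a percentage) and assumes the per-frame matching (e.g. by a tracker's association step) has already been done; the box format and function names are illustrative, not PRINTNet's evaluation code.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def motp(matched_pairs_per_frame: List[List[Tuple[Box, Box]]]) -> float:
    """Mean IoU over every matched (prediction, ground truth) pair in all frames."""
    total, count = 0.0, 0
    for frame in matched_pairs_per_frame:
        for pred, gt in frame:
            total += iou(pred, gt)
            count += 1
    return total / count if count else 0.0
```

For example, one perfect match (IoU 1) and one half-shifted box (IoU 1/3) give an MOTP of 2/3, about 66.7%.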
Published: 2024

260. Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation

Author: Liu, Xiaohong, Yang, Guoxing, Luo, Yulin, Mao, Jiaji, Zhang, Xiang, Gao, Ming, Zhang, Shanghang, Shen, Jun, and Wang, Guangyu
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Radiology is a vital and complex component of modern clinical workflow and covers many tasks. Recently, vision-language (VL) foundation models in medicine have shown potential in processing multimodal information, offering a unified solution for various radiology tasks. However, existing studies either pre-trained VL models on natural data or did not fully integrate vision-language architecture and pretraining, often neglecting the unique multimodal complexity in radiology images and their textual contexts. Additionally, their practical applicability in real-world scenarios remains underexplored. Here, we present RadFound, a large and open-source vision-language foundation model tailored for radiology, that is trained on the most extensive dataset of over 8.1 million images and 250,000 image-text pairs, covering 19 major organ systems and 10 imaging modalities. To establish expert-level multimodal perception and generation capabilities, RadFound introduces an enhanced vision encoder to capture intra-image local features and inter-image contextual information, and a unified cross-modal learning design tailored to radiology. To fully assess the model's capability, we construct a benchmark, RadVLBench, including radiology interpretation tasks like medical vision-language question-answering, as well as text generation tasks ranging from captioning to report generation. We also propose a human evaluation framework. When evaluated on the real-world benchmark involving three representative modalities, 2D images (chest X-rays), multi-view images (mammograms), and 3D images (thyroid CT scans), RadFound significantly outperforms other VL foundation models on both quantitative metrics and human evaluation. In summary, the development of RadFound represents an advance toward generalist radiology models, demonstrating broad potential for integration into clinical workflows.
Published: 2024

261. Eagle: Efficient Training-Free Router for Multi-LLM Inference

Author: Zhao, Zesen, Jin, Shuowei, and Mao, Z. Morley
Subjects: Computer Science - Machine Learning
Abstract: The proliferation of Large Language Models (LLMs) with varying capabilities and costs has created a need for efficient model selection in AI systems. LLM routers address this need by dynamically choosing the most suitable model for a given query based on task requirements and budget constraints. However, existing routers face challenges in scalability and real-time adaptation, particularly in high-volume online environments. We present Eagle, a novel LLM routing approach that combines global and local Elo ranking modules to overcome these limitations. By evaluating both general and specialized LLM abilities, Eagle provides a scalable, training-free solution that enhances model selection quality while reducing computational overhead. Our experiments across multiple datasets show Eagle consistently outperforms baseline methods, with improvements of up to 23.52 percent in Area Under Curve (AUC) scores. Moreover, Eagle demonstrates remarkable efficiency, requiring only 1/20 of baseline methods' time for initialization and providing 100 to 200 times faster incremental updates in online scenarios, making it well-suited for dynamic, high-volume online serving environments.
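Elo ranking is what makes a training-free, incrementally updatable router plausible: ratings change only by replaying new pairwise outcomes. Below is a minimal Python sketch of the standard Elo update applied to pairwise LLM comparisons; the initial rating of 1000, K-factor of 32, and input format are assumptions, and how Eagle weights its global and local modules is not shown. A "local" ranking could, under the same assumption, be the same function run only on comparisons from queries similar to the incoming one.

```python
from typing import Dict, Iterable, Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_ratings(
    comparisons: Iterable[Tuple[str, str, float]],
    initial: float = 1000.0,
    k: float = 32.0,
) -> Dict[str, float]:
    """Replay pairwise outcomes (model_a, model_b, score_a), where score_a is
    1.0 if A's answer was preferred, 0.0 if B's was, and 0.5 for a tie."""
    ratings: Dict[str, float] = {}
    for a, b, score_a in comparisons:
        r_a = ratings.setdefault(a, initial)
        r_b = ratings.setdefault(b, initial)
        e_a = expected_score(r_a, r_b)
        ratings[a] = r_a + k * (score_a - e_a)
        # B's actual and expected scores are the complements of A's
        ratings[b] = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return ratings
```

Each update is O(1) per comparison, so new feedback can be folded in without retraining; routing would then pick the highest-rated model that fits the budget.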
Published: 2024

262. VLMine: Long-Tail Data Mining with Vision Language Models

Author: Ye, Mao, Meyer, Gregory P., Zhang, Zaiwei, Park, Dennis, Mustikovela, Siva Karthik, Chai, Yuning, and Wolff, Eric M
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Ensuring robust performance on long-tail examples is an important problem for many real-world applications of machine learning, such as autonomous driving. This work focuses on the problem of identifying rare examples within a corpus of unlabeled data. We propose a simple and scalable data mining approach that leverages the knowledge contained within a large vision language model (VLM). Our approach utilizes a VLM to summarize the content of an image into a set of keywords, and we identify rare examples based on keyword frequency. We find that the VLM offers a distinct signal for identifying long-tail examples when compared to conventional methods based on model uncertainty. Therefore, we propose a simple and general approach for integrating signals from multiple mining algorithms. We evaluate the proposed method on two diverse tasks: 2D image classification, in which inter-class variation is the primary source of data diversity, and on 3D object detection, where intra-class variation is the main concern. Furthermore, through the detection task, we demonstrate that the knowledge extracted from 2D images is transferable to the 3D domain. Our experiments consistently show large improvements (between 10% and 50%) over the baseline techniques on several representative benchmarks: ImageNet-LT, Places-LT, and the Waymo Open Dataset.
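Identifying rare examples "based on keyword frequency" can be sketched in a few lines of Python. This assumes the VLM keywords have already been extracted per image, and the scoring rule (negative log document frequency of an image's rarest keyword) is a hypothetical choice for illustration, not necessarily VLMine's exact formula.

```python
import math
from collections import Counter
from typing import Dict, List


def rarity_scores(keywords_per_image: Dict[str, List[str]]) -> Dict[str, float]:
    """Score each image by its rarest keyword: -log of that keyword's
    document frequency across the corpus (higher score = rarer image)."""
    n_images = len(keywords_per_image)
    doc_freq: Counter = Counter()
    for kws in keywords_per_image.values():
        doc_freq.update(set(kws))  # count each keyword at most once per image
    scores: Dict[str, float] = {}
    for image_id, kws in keywords_per_image.items():
        if not kws:
            scores[image_id] = 0.0
            continue
        rarest = min(doc_freq[k] for k in set(kws))
        scores[image_id] = -math.log(rarest / n_images)
    return scores
```

Sorting images by descending score yields a mining order; an image whose summary contains an unusual keyword (say "horse" in a driving corpus) rises to the top even if its other keywords are common.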
Published: 2024

263. S$^2$AG-Vid: Enhancing Multi-Motion Alignment in Video Diffusion Models via Spatial and Syntactic Attention-Based Guidance

Author: Li, Yuanhang, Mao, Qi, Chen, Lan, Fang, Zhen, Tian, Lei, Xiao, Xinyan, Jin, Libiao, and Wu, Hua
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Recent advancements in text-to-video (T2V) generation using diffusion models have garnered significant attention. However, existing T2V models primarily focus on simple scenes featuring a single object performing a single motion. Challenges arise in scenarios involving multiple objects with distinct motions, often leading to incorrect video-text alignment between subjects and their corresponding motions. To address this challenge, we propose S$^2$AG-Vid, a training-free inference-stage optimization method that improves the alignment of multiple objects with their corresponding motions in T2V models. S$^2$AG-Vid first applies a spatial position-based cross-attention (CA) constraint in the early stages of the denoising process, helping each noun attend distinctly to its correct subject region. To enhance the motion-subject binding, we implement a syntax-guided contrastive constraint in the subsequent denoising phase, aimed at improving the correlations between the CA maps of verbs and their corresponding nouns. Both qualitative and quantitative evaluations demonstrate that the proposed framework significantly outperforms baseline approaches, producing higher-quality videos with improved subject-motion consistency.
Published: 2024

264. M$^2$PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning

Author: Wang, Taowen, Liu, Yiyang, Liang, James Chenhao, Zhao, Junhan, Cui, Yiming, Mao, Yuning, Nie, Shaoliang, Liu, Jiahao, Feng, Fuli, Xu, Zenglin, Han, Cheng, Huang, Lifu, Wang, Qifan, and Liu, Dongfang
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Multimodal Large Language Models (MLLMs) demonstrate remarkable performance across a wide range of domains, with increasing emphasis on enhancing their zero-shot generalization capabilities for unseen tasks across various modalities. Instruction tuning has emerged as an effective strategy for achieving zero-shot generalization by finetuning pretrained models on diverse multimodal tasks. As the scale of MLLMs continues to grow, parameter-efficient finetuning becomes increasingly critical. However, most existing parameter-efficient approaches focus only on single modalities and often overlook the multimodal characteristics during finetuning. In this work, we introduce a novel Multimodal Prompt Tuning (M$^2$PT) approach for efficient instruction tuning of MLLMs. M$^2$PT effectively integrates visual and textual prompts into the vision encoder and language processor respectively during finetuning, facilitating the extraction and alignment of features across modalities. Empirical results on various multimodal evaluation datasets demonstrate the superior performance of our approach compared to several state-of-the-art baselines. A comprehensive set of ablation studies validates the effectiveness of our prompt design and the efficiency of our approach., Comment: EMNLP 2024
Published: 2024

265. Linear Perturbations and Stability Analysis in $f(T)$ Braneworld Scenario

Author: Zhao, Ju-Ying, Liu, Mao-Jiang, and Yang, Ke
Subjects: High Energy Physics - Theory, General Relativity and Quantum Cosmology
Abstract: We conduct a detailed analysis of the full linear perturbations in the braneworld scenario within $f(T)$ gravity. By decomposing the perturbations of the theory into the scalar, transverse vector, antisymmetric pseudotensor, and symmetric transverse-traceless tensor modes, we derive the quadratic action for each mode. The results indicate that there is a total of one scalar, one massless vector, and one tensor propagating degrees of freedom. Consequently, in comparison to general relativity, no additional degrees of freedom appear under the flat braneworld background in the linearized theory. For a thick brane model with $f(T)=T+ \alpha T^2$, we find that it exhibits stability against linear perturbations., Comment: 14 pages and 2 figures, published version
Published: 2024

266. Vision-Language Models Assisted Unsupervised Video Anomaly Detection

Author: Jiang, Yalong and Mao, Liquan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Video anomaly detection is a subject of great interest across industrial and academic domains due to its crucial role in computer vision applications. However, the inherent unpredictability of anomalies and the scarcity of anomaly samples present significant challenges for unsupervised learning methods. To overcome the limitations of unsupervised learning, which stem from a lack of comprehensive prior knowledge about anomalies, we propose VLAVAD (Video-Language Models Assisted Anomaly Detection). Our method employs a cross-modal pre-trained model that leverages the inferential capabilities of large language models (LLMs) in conjunction with a Selective-Prompt Adapter (SPA) for selecting semantic space. Additionally, we introduce a Sequence State Space Module (S3M) that detects temporal inconsistencies in semantic features. By mapping high-dimensional visual features to low-dimensional semantic ones, our method significantly enhances the interpretability of unsupervised anomaly detection. Our proposed approach effectively tackles the challenge of detecting elusive anomalies that are hard to discern over time, achieving SOTA on the challenging ShanghaiTech dataset.
Published: 2024

267. MIMO Precoding Exploiting Extra Degrees of Freedom (DoF) in the Wavenumber Domain

Author: Chen, Yuanbin, Guo, Xufeng, Mao, Tianqi, Wu, Qingqing, Wang, Zhaocheng, and Yuen, Chau
Subjects: Electrical Engineering and Systems Science - Signal Processing, Computer Science - Information Theory
Abstract: In this paper, we propose a wavenumber-domain precoding scheme that breaks the limitation of rank-1 channels, which support only single-stream transmission, enabling simultaneous transmission of multiple data streams. The proposed wavenumber-domain precoding scheme also breaks the Rayleigh distance demarcation, applying regardless of far-field or near-field conditions. Specifically, by characterizing the channel response as the superposition of a series of Fourier harmonics specified by different wavenumbers, the degree of freedom (DoF) depends on the cardinality of the wavenumber support, on the basis of which the extra DoF is presented. This representation is applicable to both the far field and the near field. Different wavenumber atoms, determined within this support, constitute the codebook for MIMO precoding, in which each atom allows for the transmission of one data stream. Then, to maximize capacity, the wavenumbers associated with the optimal transmission directions must be selected and their power allocation optimized. Finally, our simulation results demonstrate significant gains over conventional spatial division schemes, with the potential to approach the theoretical performance upper bound achieved by singular value decomposition (SVD)., Comment: This paper has been accepted in 2024 IEEE Globecom Workshop
Published: 2024

268. Absence of altermagnetic spin splitting character in rutile oxide RuO$_2$

Author: Liu, Jiayu, Zhan, Jie, Li, Tongrui, Liu, Jishan, Cheng, Shufan, Shi, Yuming, Deng, Liwei, Zhang, Meng, Li, Chihao, Ding, Jianyang, Jiang, Qi, Ye, Mao, Liu, Zhengtai, Jiang, Zhicheng, Wang, Siyu, Li, Qian, Xie, Yanwu, Wang, Yilin, Qiao, Shan, Wen, Jinsheng, Sun, Yan, and Shen, Dawei
Subjects: Condensed Matter - Materials Science
Abstract: Rutile RuO$_2$ has been posited as a potential $d$-wave altermagnetism candidate, with a predicted significant spin splitting of up to 1.4 eV. Despite accumulating theoretical predictions and transport measurements, direct spectroscopic observation of spin splitting has remained elusive. Here, we employ spin- and angle-resolved photoemission spectroscopy to investigate the band structures and spin polarization of thin-film and single-crystal RuO$_2$. Contrary to expectations of altermagnetism, our analysis indicates that RuO$_2$'s electronic structure aligns with that predicted under non-magnetic conditions, exhibiting no evidence of the hypothesized spin splitting. Additionally, we observe significant in-plane spin polarization of the low-lying bulk bands, which is antisymmetric about the high-symmetry plane, contrary to the $d$-wave spin texture expected from time-reversal symmetry breaking in altermagnetism. These findings definitively challenge the altermagnetic order previously proposed for rutile RuO$_2$, prompting a reevaluation of its magnetic properties., Comment: 7 pages, 4 figures. Published in Physical Review Letters
Published: 2024

269. RingMo-Aerial: An Aerial Remote Sensing Foundation Model With A Affine Transformation Contrastive Learning

Author: Diao, Wenhui, Yu, Haichen, Kang, Kaiyue, Ling, Tong, Liu, Di, Feng, Yingchao, Bi, Hanbo, Ren, Libo, Li, Xuexue, Mao, Yongqiang, and Sun, Xian
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Aerial Remote Sensing (ARS) vision tasks pose significant challenges due to the unique characteristics of their viewing angles. Existing research has primarily focused on algorithms for specific tasks, which have limited applicability in a broad range of ARS vision applications. This paper proposes the RingMo-Aerial model, aiming to fill the gap in foundation model research in the field of ARS vision. By introducing the Frequency-Enhanced Multi-Head Self-Attention (FE-MSA) mechanism and an affine transformation-based contrastive learning pre-training method, the model's detection capability for small targets is enhanced and optimized for the tilted viewing angles characteristic of ARS. Furthermore, the ARS-Adapter, an efficient parameter fine-tuning method, is proposed to improve the model's adaptability and effectiveness in various ARS vision tasks. Experimental results demonstrate that RingMo-Aerial achieves SOTA performance on multiple downstream tasks. This indicates the practicality and effectiveness of RingMo-Aerial in enhancing the performance of ARS vision tasks.
Published: 2024

270. How flagellated bacteria wobble

Author: Hu, Jinglei, Gui, Chen, Mao, Mingxin, Feng, Pu, Liu, Yurui, Gong, Xiangjun, and Gompper, Gerhard
Subjects: Condensed Matter - Soft Condensed Matter, Physics - Biological Physics
Abstract: A flagellated bacterium navigates fluid environments by rotating its helical flagellar bundle. The wobbling of the bacterial body significantly influences its swimming behavior. To quantify the three underlying motions--precession, nutation, and spin, we extract the Euler angles from trajectories generated by mesoscale hydrodynamics simulations, which is experimentally unattainable. In contrast to the common assumption, the cell body does not undergo complete cycles of spin, a general result for multiflagellated bacteria. Our simulations produce apparent wobbling periods that closely match the results of E. coli obtained from experiments and reveal the presence of two kinds of precession modes, consistent with theoretical analysis. Small-amplitude yet periodic nutation is also observed in the simulations., Comment: 3 figures
Published: 2024

271. KnowFormer: Revisiting Transformers for Knowledge Graph Reasoning

Author: Liu, Junnan, Mao, Qianren, Jiang, Weifeng, and Li, Jianxin
Subjects: Computer Science - Artificial Intelligence
Abstract: Knowledge graph reasoning plays a vital role in various applications and has garnered considerable attention. Recently, path-based methods have achieved impressive performance. However, they may face limitations stemming from constraints in message-passing neural networks, such as missing paths and information over-squashing. In this paper, we revisit the application of transformers for knowledge graph reasoning to address the constraints faced by path-based methods and propose a novel method KnowFormer. KnowFormer utilizes a transformer architecture to perform reasoning on knowledge graphs from the message-passing perspective, rather than reasoning by textual information like previous pretrained language model based methods. Specifically, we define the attention computation based on the query prototype of knowledge graph reasoning, facilitating convenient construction and efficient optimization. To incorporate structural information into the self-attention mechanism, we introduce structure-aware modules to calculate query, key, and value respectively. Additionally, we present an efficient attention computation method for better scalability. Experimental results demonstrate the superior performance of KnowFormer compared to prominent baseline methods on both transductive and inductive benchmarks., Comment: Accepted by ICML2024
Published: 2024

272. Cramér-Rao Bound Based Waveform Optimization for MIMO Radar: An Efficient Linear-Proximal Method

Author: Zhou, Xiaohua, Du, Xu, and Mao, Yijie
Subjects: Electrical Engineering and Systems Science - Signal Processing
Abstract: This paper focuses on radar waveform optimization for minimizing the Cramér-Rao bound (CRB) in a multiple-input multiple-output (MIMO) radar system. In contrast to conventional approaches relying on semi-definite programming (SDP) and optimization toolboxes like CVX, we introduce a pioneering and efficient waveform optimization approach in this paper. Our proposed algorithm first applies sequential linear approximation to transform the original CRB-based problem with the transmit power constraint into a sequence of convex subproblems. By introducing a proximal term and further leveraging the Karush-Kuhn-Tucker (KKT) conditions, we derive the optimal closed-form solution for each subproblem. The convergence of the proposed algorithm is then proved rigorously. Numerical results demonstrate that the proposed approach significantly reduces computational complexity -- at least two orders of magnitude lower than the baseline algorithms while maintaining the same radar sensing accuracy.
Published: 2024

273. Precise structure and polarization determination of Hf0.5Zr0.5O2 with electron ptychography

Author: Gao, Xiaoyue, Liu, Zhuohui, Han, Bo, Zhang, Xiaowen, Mao, Ruilin, Shi, Ruochen, Zhu, Ruixue, Lu, Jiangbo, Wang, Tao, Ge, Chen, and Gao, Peng
Subjects: Condensed Matter - Materials Science, Condensed Matter - Mesoscale and Nanoscale Physics
Abstract: Hf0.5Zr0.5O2 (HZO) is a promising candidate for next-generation ferroelectric memories and transistors. However, the origin of its ferroelectricity is still under debate due to the complexity of its phase and microstructure in practical samples. In this study, we investigate the atomic structure of a substrate-free HZO freestanding film with multislice electron ptychography, for which the ultra-high spatial resolution (up to ~25 pm) and the capability to simultaneously image the cation and oxygen allow us to precisely determine the intrinsic atomic structures of different phases and reveal subtle changes among them. We clarify that the orthorhombic phase is ferroelectric with spontaneous polarization ~34±4 μC/cm² (corresponding to 56±6 pm in displacement) that is accurately measured through statistical analysis. Significant polarization suppression is observed near the grain boundary, while no distinguishable structural changes are detected near the 180° ferroelectric domain walls. Through the direct oxygen imaging of the orthorhombic phase from the [111] zone axis, we quantify a substantial number of oxygen vacancies with a preferential distribution, which influences the polarization direction and strength. These findings provide fundamentals for HZO research, and thus lay a foundation for the design of high-performance ferroelectric devices.
Published: 2024

274. Narrowing band gap chemically and physically: Conductive dense hydrocarbon

Author: Nakagawa, Takeshi, Zhang, Caoshun, Bu, Kejun, Dalladay-Simpson, Philip, Vrankić, Martina, Bolton, Sarah, Laniel, Dominique, Wang, Dong, Liang, Akun, Ishii, Hirofumi, Hiraoka, Nozomu, Garbarino, Gaston, Rosa, Angelika D., Hu, Qingyang, Lü, Xujie, Mao, Ho-kwang, and Ding, Yang
Subjects: Condensed Matter - Materials Science, Physics - Chemical Physics
Abstract: Band gap energy of an organic molecule can be reduced by intermolecular interaction enhancement, and thus, certain polycyclic aromatic hydrocarbons (PAHs), which are insulators with wide band gaps, are expected to undergo insulator-metal transitions by simple compression. Such a pressure-induced electronic transition can be exploited to transform non-metallic organic materials into states featuring intriguing electronic characteristics such as high-temperature superconductivity. Numerous attempts have been made to metalize various small PAHs, but so far only pressure-induced amorphization well below the megabar region was observed. The wide band gap energy of the small PAHs and low chemical stability under simple compression are the bottlenecks. We have investigated the band gap energy evolution and the crystal structural compression of the large PAH molecules, where the band gap energy is significantly reduced by increasing the number of π-electrons and improved chemical stability with fully benzenoid molecular structure. Herein, we present a pressure-induced transition in dicoronylene, C48H20, an insulator at ambient conditions that transforms into a semi-metallic state above 23.0 GPa with a three-order-of-magnitude reduction in resistivity. In-situ UV-visible absorption, transport property measurement, Raman spectroscopy, X-ray diffraction and density functional theory calculations were performed to provide tentative explanations to the alterations in its electronic structure at high pressure. The discovery of an electronic transition at pressures well below the megabar is a promising step towards realization of a single component purely hydrocarbon molecular metal in the near future., Comment: 22 pages, 5 figures
Published: 2024

275. SAGAbg II: the Low-Mass Star-Forming Sequence Evolves Significantly Between 0.05<z<0.21

Author: Kado-Fong, Erin, Geha, Marla, Mao, Yao-Yuan, Reyes, Mithi A. C. de los, Wechsler, Risa H., Weiner, Benjamin, Asali, Yasmeen, Kallivayalil, Nitya, Nadler, Ethan O., Tollerud, Erik J., and Wang, Yunchong
Subjects: Astrophysics - Astrophysics of Galaxies
Abstract: The redshift-dependent relation between galaxy stellar mass and star formation rate (the Star-Forming Sequence, or SFS) is a key observational yardstick for galaxy assembly. We use the SAGAbg-A sample of background galaxies from the Satellites Around Galactic Analogs (SAGA) Survey to model the low-redshift evolution of the low-mass SFS. The sample is comprised of 23258 galaxies with H$\alpha$-based star formation rates (SFRs) spanning $6<\log_{10}(\rm M_\star/[M_\odot])<10$ and $z<0.21$ ($t<2.5$ Gyr). Although it is common to bin or stack galaxies at $z \lesssim 0.2$ for galaxy population studies, the difference in lookback time between $z=0$ and $z=0.21$ is comparable to the time between $z=1$ to $z=2$. We develop a model to account for both the physical evolution of low-mass SFS and the selection function of the SAGA survey, allowing us to disentangle redshift evolution from redshift-dependent selection effects across the SAGAbg-A redshift range. Our findings indicate significant evolution in the SFS over the last 2.5 Gyr, with a rising normalization: $\langle {\rm SFR}({\rm M_\star=10^{8.5} M_\odot)}\rangle(z)=1.24^{+0.25}_{-0.23}\ {\rm z} -1.47^{+0.03}_{-0.03}$. We also identify the redshift limit at which a static SFS is ruled out at the 95% confidence level, which is $z=0.05$ based on the precision of the SAGAbg-A sample. Comparison with cosmological hydrodynamic simulations reveals that some contemporary simulations under-predict the recent evolution of the low-mass SFS. This demonstrates that the recent evolution of the low-mass SFS can provide new constraints on the assembly of the low-mass Universe and highlights the need for improved models in this regime., Comment: 20 pages, 7 figures. Submitted to ApJ
Published: 2024

276. SRIF: Semantic Shape Registration Empowered by Diffusion-based Image Morphing and Flow Estimation

Author: Sun, Mingze, Guo, Chen, Jiang, Puhua, Mao, Shiwei, Chen, Yurun, and Huang, Ruqi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we propose SRIF, a novel Semantic shape Registration framework based on diffusion-based Image morphing and Flow estimation. More concretely, given a pair of extrinsically aligned shapes, we first render them from multi-views, and then utilize an image interpolation framework based on diffusion models to generate sequences of intermediate images between them. The images are later fed into a dynamic 3D Gaussian splatting framework, with which we reconstruct and post-process for intermediate point clouds respecting the image morphing processing. In the end, tailored for the above, we propose a novel registration module to estimate continuous normalizing flow, which deforms source shape consistently towards the target, with intermediate point clouds as weak guidance. Our key insight is to leverage large vision models (LVMs) to associate shapes and therefore obtain much richer semantic information on the relationship between shapes than the ad-hoc feature extraction and alignment. As a consequence, SRIF not only achieves high-quality dense correspondences on challenging shape pairs but also delivers smooth, semantically meaningful interpolation in between. Empirical evidence justifies the effectiveness and superiority of our method as well as specific design choices. The code is released at https://github.com/rqhuang88/SRIF., Comment: Accepted as a conference paper of SIGGRAPH Asia 2024
Published: 2024

277. Reionization relics in the cross-correlation between the Lyα forest and 21 cm intensity mapping in the post-reionization era

Author: Montero-Camacho, Paulo, Morales-Gutiérrez, Catalina, Zhang, Yao, Long, Heyang, and Mao, Yi
Subjects: Astrophysics - Cosmology and Nongalactic Astrophysics
Abstract: The tumultuous effects of ultraviolet photons that source cosmic reionization, the subsequent compression and shock-heating of low-density regions, and the modulation of baryons in shallow potential wells induced by the passage of ionization fronts, collectively introduce perturbations to the evolution of the intergalactic medium in the post-reionization era. These enduring fluctuations persist deep into the post-reionization era, casting a challenge upon precision cosmology endeavors targeting tracers in this cosmic era. Simultaneously, these relics from reionization also present a unique opportunity to glean insights into the astrophysics that govern the epoch of reionization. In this work, we propose a first study of the cross-correlation of the Lyα forest and 21 cm intensity mapping, accounting for the repercussions of inhomogeneous reionization in the post-reionization era. We investigate the ability of SKA $\times$ DESI-like, SKA $\times$ MUST-like, and PUMA $\times$ MUST-like instrumental setups to achieve a high signal-to-noise ratio (SNR) in the redshift range $3.5 \leq z \leq 4$. Moreover, we assess how alterations in integration time, survey area, and reionization scenarios impact the SNR. Furthermore, we forecast the cross-correlation's potential to constrain cosmological parameters under varying assumptions: considering or disregarding reionization relics, marginalizing over reionization astrophysics, and assuming perfect knowledge of reionization. Notably, our findings underscore the remarkable capability of a futuristic PUMA $\times$ MUST-like setup, with a modest 100-hour integration time over a 100 sq. deg. survey, to constrain the ionization efficiency error to $\sigma_\zeta = 3.42 $., Comment: Comments welcome! (16 pages, 10 figures)
Published: 2024

278. AutoSpec: Automated Generation of Neural Network Specifications

Author: Jin, Shuowei, Yan, Francis Y., Tan, Cheng, Kalia, Anuj, Foukas, Xenofon, and Mao, Z. Morley
Subjects: Computer Science - Machine Learning, Computer Science - Software Engineering
Abstract: The increasing adoption of neural networks in learning-augmented systems highlights the importance of model safety and robustness, particularly in safety-critical domains. Despite progress in the formal verification of neural networks, current practices require users to manually define model specifications -- properties that dictate expected model behavior in various scenarios. This manual process, however, is prone to human error, limited in scope, and time-consuming. In this paper, we introduce AutoSpec, the first framework to automatically generate comprehensive and accurate specifications for neural networks in learning-augmented systems. We also propose the first set of metrics for assessing the accuracy and coverage of model specifications, establishing a benchmark for future comparisons. Our evaluation across four distinct applications shows that AutoSpec outperforms human-defined specifications as well as two baseline approaches introduced in this study.
Published: 2024

279. Prompt-and-Transfer: Dynamic Class-aware Enhancement for Few-shot Segmentation

Author: Bi, Hanbo, Feng, Yingchao, Diao, Wenhui, Wang, Peijin, Mao, Yongqiang, Fu, Kun, Wang, Hongqi, and Sun, Xian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: For more efficient generalization to unseen domains (classes), most Few-shot Segmentation (FSS) would directly exploit pre-trained encoders and only fine-tune the decoder, especially in the current era of large models. However, such fixed feature encoders tend to be class-agnostic, inevitably activating objects that are irrelevant to the target class. In contrast, humans can effortlessly focus on specific objects in the line of sight. This paper mimics the visual perception pattern of human beings and proposes a novel and powerful prompt-driven scheme, called "Prompt and Transfer" (PAT), which constructs a dynamic class-aware prompting paradigm to tune the encoder for focusing on the interested object (target class) in the current task. Three key points are elaborated to enhance the prompting: 1) Cross-modal linguistic information is introduced to initialize prompts for each task. 2) Semantic Prompt Transfer (SPT) that precisely transfers the class-specific semantics within the images to prompts. 3) Part Mask Generator (PMG) that works in conjunction with SPT to adaptively generate different but complementary part prompts for different individuals. Surprisingly, PAT achieves competitive performance on 4 different tasks including standard FSS, Cross-domain FSS (e.g., CV, medical, and remote sensing domains), Weak-label FSS, and Zero-shot Segmentation, setting new state-of-the-arts on 11 benchmarks.
Published: 2024

280. Tracking the variation of entanglement Rényi negativity: an efficient quantum Monte Carlo method

Author: Ding, Yi-Ming, Tang, Yin, Wang, Zhe, Wang, Zhiyan, Mao, Bin-Bin, and Yan, Zheng
Subjects: Condensed Matter - Strongly Correlated Electrons, Condensed Matter - Statistical Mechanics, Quantum Physics
Abstract: Although the entanglement entropy probing novel phases and phase transitions numerically via quantum Monte Carlo (QMC) has achieved huge success in pure ground states of quantum many-body systems, numerical explorations on mixed states remain limited, despite the fact that most real-world systems are non-isolated. Meanwhile, entanglement negativity, as a rarely computable entanglement monotone for mixed states, is significant in characterizing mixed-state entanglement, such as in systems with two disconnected regions, dissipation or at finite temperature. However, efficient numerical approaches are scarce to calculate this quantity in large-scale and high-dimensional systems, especially when we need to access how it varies with certain parameters to study critical behaviors. Within the reweight-annealing frame, we present an accessible and efficient QMC algorithm, which is able to achieve the values as well as tracking the variation of the Rényi version of entanglement negativity on some specified parameter path. Our algorithm makes it feasible to directly study the role that entanglement plays at the critical point and in different phases for mixed states in high dimensions numerically. In addition, this method is accessible and easy to parallelize on computers. Through this method, different intrinsic mechanisms in quantum and thermal criticalities with the same universal class have been revealed clearly through the numerical calculations on Rényi negativity., Comment: 10 pages, 4 figures
Published: 2024

281. Joint Beamforming and Illumination Pattern Design for Beam-Hopping LEO Satellite Communications

Author: Wang, Jing, Qi, Chenhao, Yu, Shui, and Mao, Shiwen
Subjects: Computer Science - Information Theory, Electrical Engineering and Systems Science - Signal Processing
Abstract: Since hybrid beamforming (HBF) can approach the performance of fully-digital beamforming (FDBF) with much lower hardware complexity, we investigate the HBF design for beam-hopping (BH) low earth orbit (LEO) satellite communications (SatComs). Aiming at maximizing the sum-rate of totally illuminated beam positions during the whole BH period, we consider joint beamforming and illumination pattern design subject to the HBF constraints and sum-rate requirements. To address the non-convexity of the HBF constraints, we temporarily replace the HBF constraints with the FDBF constraints. Then we propose an FDBF and illumination pattern random search (FDBF-IPRS) scheme to optimize illumination patterns and fully-digital beamformers using constrained random search and fractional programming methods. To further reduce the computational complexity, we propose an FDBF and illumination pattern alternating optimization (FDBF-IPAO) scheme, where we relax the integer illumination pattern to continuous variables and after finishing all the iterations we quantize the continuous variables into integer ones. Based on the fully-digital beamformers designed by the FDBF-IPRS or FDBF-IPAO scheme, we propose an HBF alternating minimization algorithm to design the hybrid beamformers. Simulation results show that the proposed schemes can achieve satisfactory sum-rate performance for BH LEO SatComs.
Published: 2024

282. ANNZ+: an enhanced photometric redshift estimation algorithm with applications on the PAU Survey

Author: Pathi, Imdad Mahmud, Soo, John Y. H., Wee, Mao Jie, Zakaria, Sazatul Nadhilah, Ismail, Nur Azwin, Baugh, Carlton M., Manzoni, Giorgio, Gaztanaga, Enrique, Castander, Francisco J., Eriksen, Martin, Carretero, Jorge, Fernandez, Enrique, Garcia-Bellido, Juan, Miquel, Ramon, Padilla, Cristobal, Renard, Pablo, Sanchez, Eusebio, Sevilla-Noarbe, Ignacio, and Tallada-Crespí, Pau
Subjects: Astrophysics - Instrumentation and Methods for Astrophysics, Astrophysics - Cosmology and Nongalactic Astrophysics
Abstract: ANNZ is a fast and simple algorithm which utilises artificial neural networks (ANNs); it was known as one of the pioneers of machine learning approaches to photometric redshift estimation decades ago. We enhanced the algorithm by introducing new activation functions like tanh, softplus, SiLU, Mish and ReLU variants; its new performance is then rigorously tested on legacy samples like the Luminous Red Galaxy (LRG) and Stripe-82 samples from SDSS, as well as modern galaxy samples like the Physics of the Accelerating Universe Survey (PAUS). This work focuses on testing the robustness of activation functions with respect to the choice of ANN architectures, particularly on its depth and width, in the context of galaxy photometric redshift estimation. Our upgraded algorithm, which we named ANNZ+, shows that the tanh and Leaky ReLU activation functions provide more consistent and stable results across deeper and wider architectures with > 1 per cent improvement in root-mean-square error ($\sigma_{\textrm{RMS}}$) and 68th percentile error ($\sigma_{68}$) when tested on SDSS data sets. While assessing its capabilities in handling high dimensional inputs, we achieved an improvement of 11 per cent in $\sigma_{\textrm{RMS}}$ and 6 per cent in $\sigma_{68}$ with the tanh activation function when tested on the 40-narrowband PAUS dataset; it even outperformed ANNZ2, its supposed successor, by 44 per cent in $\sigma_{\textrm{RMS}}$. This justifies the effort to upgrade the 20-year-old ANNZ, allowing it to remain viable and competitive within the photo-z community today. The updated algorithm ANNZ+ is publicly available at https://github.com/imdadmpt/ANNzPlus., Comment: 38 pages, 9 figures, revised version submitted to JCAP
Published: 2024

283. Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology

Author: Liu, Pei, Ji, Luping, Gou, Jiaxiang, Fu, Bo, and Ye, Mao
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Histopathology Whole-Slide Images (WSIs) provide an important tool to assess cancer prognosis in computational pathology (CPATH). While existing survival analysis (SA) approaches have made exciting progress, they are generally limited to adopting highly-expressive architectures and only coarse-grained patient-level labels to learn prognostic visual representations from gigapixel WSIs. Such learning paradigm suffers from important performance bottlenecks, when facing present scarce training data and standard multi-instance learning (MIL) framework in CPATH. To overcome it, this paper, for the first time, proposes a new Vision-Language-based SA (VLSA) paradigm. Concretely, (1) VLSA is driven by pathology VL foundation models. It no longer relies on high-capability networks and shows the advantage of data efficiency. (2) In vision-end, VLSA encodes prognostic language prior and then employs it as auxiliary signals to guide the aggregating of prognostic visual features at instance level, thereby compensating for the weak supervision in MIL. Moreover, given the characteristics of SA, we propose i) ordinal survival prompt learning to transform continuous survival labels into textual prompts; and ii) ordinal incidence function as prediction target to make SA compatible with VL-based prediction. Notably, VLSA's predictions can be interpreted intuitively by our Shapley values-based method. The extensive experiments on five datasets confirm the effectiveness of our scheme. Our VLSA could pave a new way for SA in CPATH by offering weakly-supervised MIL an effective means to learn valuable prognostic clues from gigapixel WSIs. Our source code is available at https://github.com/liupei101/VLSA., Comment: 24 pages, 11 tables, 6 figures
Published: 2024

284. VI3DRM: Towards meticulous 3D Reconstruction from Sparse Views via Photo-Realistic Novel View Synthesis

Author: Chen, Hao, Wu, Jiafu, Jin, Ying, Peng, Jinlong, Mao, Xiaofeng, Chi, Mingmin, Yao, Mufeng, Peng, Bo, Li, Jian, and Cao, Yun
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recently, methods like Zero-1-2-3 have focused on single-view based 3D reconstruction and have achieved remarkable success. However, their predictions for unseen areas heavily rely on the inductive bias of large-scale pretrained diffusion models. Although subsequent work, such as DreamComposer, attempts to make predictions more controllable by incorporating additional views, the results remain unrealistic due to feature entanglement in the vanilla latent space, including factors such as lighting, material, and structure. To address these issues, we introduce the Visual Isotropy 3D Reconstruction Model (VI3DRM), a diffusion-based sparse views 3D reconstruction model that operates within an ID consistent and perspective-disentangled 3D latent space. By facilitating the disentanglement of semantic information, color, material properties and lighting, VI3DRM is capable of generating highly realistic images that are indistinguishable from real photographs. By leveraging both real and synthesized images, our approach enables the accurate construction of pointmaps, ultimately producing finely textured meshes or point clouds. On the NVS task, tested on the GSO dataset, VI3DRM significantly outperforms state-of-the-art method DreamComposer, achieving a PSNR of 38.61, an SSIM of 0.929, and an LPIPS of 0.027. Code will be made available upon publication.
Published: 2024

285. What Makes a Maze Look Like a Maze?

Author: Hsu, Joy, Mao, Jiayuan, Tenenbaum, Joshua B., Goodman, Noah D., and Wu, Jiajun
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: A unique aspect of human visual understanding is the ability to flexibly interpret abstract concepts: acquiring lifted rules explaining what they symbolize, grounding them across familiar and unfamiliar contexts, and making predictions or reasoning about them. While off-the-shelf vision-language models excel at making literal interpretations of images (e.g., recognizing object categories such as tree branches), they still struggle to make sense of such visual abstractions (e.g., how an arrangement of tree branches may form the walls of a maze). To address this challenge, we introduce Deep Schema Grounding (DSG), a framework that leverages explicit structured representations of visual abstractions for grounding and reasoning. At the core of DSG are schemas--dependency graph descriptions of abstract concepts that decompose them into more primitive-level symbols. DSG uses large language models to extract schemas, then hierarchically grounds concrete to abstract components of the schema onto images with vision-language models. The grounded schema is used to augment visual abstraction understanding. We systematically evaluate DSG and different methods in reasoning on our new Visual Abstractions Dataset, which consists of diverse, real-world images of abstract concepts and corresponding question-answer pairs labeled by humans. We show that DSG significantly improves the abstract visual reasoning performance of vision-language models, and is a step toward human-aligned understanding of visual abstractions.
Published: 2024

286. Mapping the nanoscale optical topological textures with a fiber-integrated plasmonic probe

Author: Wu, Yunkun, Wang, Shu, Lei, Xinrui, Mao, Jiahui, Lu, Liu, Liu, Yue, Qu, Guangyuan, Guo, Guangcan, Zhan, Qiwen, and Ren, Xifeng
Subjects: Physics - Optics
Abstract: Topologically protected quasiparticles in optics have received increasing research attention recently, as they provide a novel degree of freedom to manipulate light-matter interactions and exhibit excellent potential in nanometrology and ultrafast vector imaging. However, the characterization of the full three-dimensional vectorial structures of the topological textures at the nanoscale has remained a challenge. Here, we propose a novel probe based on the fiber taper-silver nanowire waveguide structure to achieve super-resolution mapping of the topological textures. Based on the mode selection rules, the three-dimensional decomposed electric fields in both the far-field and near-field are directly collected and reconstructed without postprocessing algorithms, clearly visualizing the topological textures formed in free space and evanescent waves respectively. The fiber-integrated probe is further demonstrated to be robust and broadband. This approach holds promise for the characterization of more sophisticated topology in optical fields, which may allow for advanced applications in optical information processing and data storage., Comment: 13 pages, 4 figures
Published: 2024

287. Agent Workflow Memory

Author: Wang, Zora Zhiruo, Mao, Jiayuan, Fried, Daniel, and Neubig, Graham
Subjects: Computer Science - Computation and Language
Abstract: Despite the potential of language model-based agents to solve real-world tasks such as web navigation, current methods still struggle with long-horizon tasks with complex action trajectories. In contrast, humans can flexibly solve complex tasks by learning reusable task workflows from past experiences and using them to guide future actions. To build agents that can similarly benefit from this process, we introduce Agent Workflow Memory (AWM), a method for inducing commonly reused routines, i.e., workflows, and selectively providing workflows to the agent to guide subsequent generations. AWM flexibly applies to both offline and online scenarios, where agents induce workflows from training examples beforehand or from test queries on the fly. We experiment on two major web navigation benchmarks -- Mind2Web and WebArena -- that collectively cover 1000+ tasks from 200+ domains across travel, shopping, and social media, among others. AWM substantially improves the baseline results by 24.6% and 51.1% relative success rate on Mind2Web and WebArena while reducing the number of steps taken to solve WebArena tasks successfully. Furthermore, online AWM robustly generalizes in cross-task, website, and domain evaluations, surpassing baselines from 8.9 to 14.0 absolute points as train-test task distribution gaps widen.
Published: 2024

288. Sound Wave Manipulation Based on Valley Acoustic Interferometers

Author: Zhao, Wei, Chen, Jia-He, Cheng, Shu-Guang, Mao, Yong, Zhang, Xiaojun, Tao, Zhi, Jiang, Hua, and Hang, Zhi Hong
Subjects: Physics - Applied Physics, Condensed Matter - Materials Science
Abstract: Topological acoustics provides new opportunities for materials with unprecedented functions. In this work, we report a design of topological valley acoustic interferometers built from Y-shaped valley sonic crystals. Through tight-binding calculations and experimental demonstration, we successfully tune the acoustic energy partition rate by configuring the channel. An analytical theory proposed to explain the transmission property matches well with experimental observations. An additional $\pi$ Berry phase is verified to accumulate when circling along the shape-independent topological valley acoustic interferometer, a feature unique to pseudospin-half systems. Based on the spectral oscillation originating from the accumulated dynamic phase and the $\pi$ Berry phase, a simplified method to measure the acoustic valley interface dispersion is explored; it overcomes the shortcomings of the traditional fast Fourier transform method and improves measuring efficiency by simply analyzing the peaks and dips of the measured transmission spectrum. Moreover, an effective approach to tuning the transmission and the spectral line shapes is proposed and realized through the local geometry design of the interferometer, exhibiting strong tunability under an unchanged physical mechanism. Our work opens an avenue to design future acoustic devices with the function of sound wave manipulation based on the physical mechanism of interference and Berry phase.
Published: 2024

289. DV-FSR: A Dual-View Target Attack Framework for Federated Sequential Recommendation

Author: Qin, Qitao, Luo, Yucong, Cheng, Mingyue, Mao, Qingyang, and Lei, Chenyi
Subjects: Computer Science - Cryptography and Security, Computer Science - Information Retrieval
Abstract: Federated recommendation (FedRec) preserves user privacy by enabling decentralized training of personalized models, but this architecture is inherently vulnerable to adversarial attacks. Significant research has been conducted on targeted attacks in FedRec systems, motivated by commercial and social influence considerations. However, this work has largely overlooked the differential robustness of recommendation models. Moreover, our empirical findings indicate that existing targeted attack methods achieve only limited effectiveness in Federated Sequential Recommendation (FSR) tasks. Driven by these observations, we focus on investigating targeted attacks in FSR and propose a novel dual-view attack framework, named DV-FSR. This attack method uniquely combines a sampling-based explicit strategy with a contrastive learning-based implicit gradient strategy to orchestrate a coordinated attack. Additionally, we introduce a specific defense mechanism tailored for targeted attacks in FSR, aiming to evaluate how well the proposed attack can be mitigated. Extensive experiments validate the effectiveness of our proposed approach on representative sequential models.
Published: 2024

290. Refracting Reconfigurable Intelligent Surface Assisted URLLC for Millimeter Wave High-Speed Train Communication Coverage Enhancement

Author: Liu, Changzhu, He, Ruisi, Niu, Yong, Mao, Shiwen, Ai, Bo, and Chen, Ruifeng
Subjects: Computer Science - Information Theory, Electrical Engineering and Systems Science - Signal Processing
Abstract: High-speed trains (HSTs) have garnered significant attention from both academia and industry due to the rapid development of railways worldwide. Millimeter wave (mmWave) communication, known for its large bandwidth, is an effective way to address performance bottlenecks in cellular-network-based HST wireless communication systems. However, mmWave signals suffer from significant path loss when traversing carriages, posing substantial challenges to cellular networks. To address this issue, reconfigurable intelligent surfaces (RIS) have gained considerable interest for their ability to enhance cell coverage by reflecting signals toward receivers. Ensuring communication reliability, a core performance indicator of ultra-reliable and low-latency communications (URLLC) in fifth-generation systems, is crucial for providing steady and reliable data transmission along railways, particularly for delivering safety and control messages and monitoring HST signaling information. In this paper, we investigate a refracting RIS-assisted multi-user multiple-input single-output URLLC system in mmWave HST communications. We formulate a sum rate maximization problem subject to base station beamforming constraints, as well as refracting RIS discrete phase shift and reliability constraints. To solve this optimization problem, we design a joint optimization algorithm based on the alternating optimization method. This involves decoupling the original optimization problem into an active beamforming design and packet error probability optimization subproblem and a discrete phase shift design subproblem. These subproblems are addressed by exploiting the Lagrangian dual method and the local search method, respectively. Simulation results demonstrate the fast convergence of the proposed algorithm and highlight the benefits of adopting refracting RIS for sum rate improvement in mmWave HST networks., Comment: 11 figures, accepted by IEEE Transactions on Vehicular Technology
Published: 2024

291. Five dimensional Weyl double copy

Author: Zhao, Weicheng, Mao, Pu-Jian, and Wu, Jun-Bao
Subjects: High Energy Physics - Theory, General Relativity and Quantum Cosmology
Abstract: In this paper, we generalize the Weyl double copy to five-dimensional spacetime. We show that a special class of five-dimensional type N vacuum solutions admits a special class of degenerate Maxwell field that squares to give the Weyl tensor. The five-dimensional Weyl double copy relation defines a scalar field that satisfies the source-free Klein-Gordon equation on the curved background., Comment: 11 pages
Published: 2024

292. Wave effect of gravitational waves intersected with a microlens field II: an adaptive hierarchical tree algorithm and population study

Author: Shan, Xikai, Li, Guoliang, Chen, Xuechun, Zhao, Wen, Hu, Bin, and Mao, Shude
Subjects: Astrophysics - Instrumentation and Methods for Astrophysics, Astrophysics - Cosmology and Nongalactic Astrophysics
Abstract: The gravitational lensing wave effect generated by a microlensing field embedded in a lens galaxy is an inevitable phenomenon in strongly lensed gravitational waves (SLGWs). This effect presents both challenges and opportunities for the detection and application of SLGWs. However, investigating this wave effect requires computing a complete diffraction integral over each microlens in the field. This is extremely time-consuming due to the large number of microlenses, so simply adding all the microlenses is impractical. Additionally, the complexity of the time delay surface makes the lens plane resolution a crucial factor in controlling numerical errors. In this paper, we propose a trapezoid approximation-based adaptive hierarchical tree algorithm to meet the challenges of calculation speed and precision. We find that this algorithm accelerates the calculation by four orders of magnitude compared to the simple adding method and is one order of magnitude faster than the fixed hierarchical tree algorithm proposed for electromagnetic microlensing. More importantly, our algorithm ensures controllable numerical errors, increasing confidence in the results. Together with our previous work, this paper addresses all numerical issues, including integral convergence, precision, and computational time. Finally, we conduct a population study on the microlensing wave effect of SLGWs using this algorithm and find that the microlensing wave effect cannot be ignored, especially for Type II SLGWs, due to their intrinsic geometric structures and their typical intersection with a denser microlensing field. Statistically, more than 33% (11%) of SLGWs have a mismatch larger than 1% (3%) compared to the unlensed waveform. Additionally, we find that the mismatch between signal pairs in a doubly imaged GW is generally larger than 10^{-3}, and 61% (25%) of signal pairs have a mismatch larger than 1% (3%)., Comment: 19 pages, 11 figures, minor revision before publication
Published: 2024

293. Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis

Author: Yang, Qi, Mao, Binjie, Wang, Zili, Nie, Xing, Gao, Pengfei, Guo, Ying, Zhen, Cheng, Yan, Pengfei, and Xiang, Shiming
Subjects: Computer Science - Sound, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Foley is a term commonly used in filmmaking, referring to the addition of everyday sound effects to silent films or videos to enhance the auditory experience. Video-to-Audio (V2A), as a particular type of automatic Foley task, presents inherent challenges related to audio-visual synchronization. These challenges encompass maintaining content consistency between the input video and the generated audio, as well as aligning temporal and loudness properties with the video. To address these issues, we construct a controllable video-to-audio synthesis model, termed Draw an Audio, which supports multiple input instructions through drawn masks and loudness signals. To ensure content consistency between the synthesized audio and the target video, we introduce the Mask-Attention Module (MAM), which employs masked video instructions to enable the model to focus on regions of interest. Additionally, we implement the Time-Loudness Module (TLM), which uses an auxiliary loudness signal to ensure the synthesis of sound that aligns with the video in both loudness and temporal dimensions. Furthermore, we have extended a large-scale V2A dataset, named VGGSound-Caption, by annotating caption prompts. Extensive experiments on challenging benchmarks across two large-scale V2A datasets verify that Draw an Audio achieves state-of-the-art performance. Project page: https://yannqi.github.io/Draw-an-Audio/., Comment: 14 pages, 11 figures
Published: 2024

294. Quantum Oscillations Evidence for Topological Bands in Kagome Metal ScV6Sn6

Author: Zheng, Guoxin, Zhu, Yuan, Mozaffari, Shirin, Mao, Ning, Chen, Kuan-Wen, Jenkins, Kaila, Zhang, Dechen, Chan, Aaron, Arachchige, Hasitha W. Suriya, Madhogaria, Richa P., Cothrine, Matthew, Meier, William R., Zhang, Yang, Mandrus, David, and Li, Lu
Subjects: Condensed Matter - Strongly Correlated Electrons
Abstract: Metals with a kagome lattice provide bulk materials that host both flat-band and Dirac electronic dispersions. A new family of kagome metals, AV6Sn6, has recently been discovered. The Dirac electronic structures of this material require further experimental confirmation. In this manuscript, we investigate this problem by resolving the quantum oscillations in both electrical transport and magnetization in ScV6Sn6. The revealed orbits are consistent with the electronic band structure models. Furthermore, the Berry phase of a dominating orbit is revealed to be around $\pi$, providing direct evidence for the topological band structure, which is consistent with calculations. Our results demonstrate rich physics and shed light on the correlated topological ground state of this kagome metal., Comment: 5 figures, accepted version
Published: 2024

295. CustomContrast: A Multilevel Contrastive Perspective For Subject-Driven Text-to-Image Customization

Author: Chen, Nan, Huang, Mengqi, Chen, Zhuowei, Zheng, Yang, Zhang, Lei, and Mao, Zhendong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
Abstract: Subject-driven text-to-image (T2I) customization has drawn significant interest in academia and industry. This task enables pre-trained models to generate novel images based on unique subjects. Existing studies adopt a self-reconstructive perspective, focusing on capturing all details of a single image, which leads them to misconstrue the specific image's irrelevant attributes (e.g., view, pose, and background) as the subject's intrinsic attributes. This misconstruction leads to both overfitting and underfitting of the irrelevant and intrinsic attributes of the subject, i.e., these attributes are over-represented or under-represented simultaneously, causing a trade-off between similarity and controllability. In this study, we argue that an ideal subject representation can be achieved from a cross-differential perspective, i.e., decoupling subject intrinsic attributes from irrelevant attributes via contrastive learning, which allows the model to focus more on intrinsic attributes through intra-consistency (features of the same subject are spatially closer) and inter-distinctiveness (features of different subjects are clearly distinguished). Specifically, we propose CustomContrast, a novel framework that includes a Multilevel Contrastive Learning (MCL) paradigm and a Multimodal Feature Injection (MFI) Encoder. The MCL paradigm is used to extract intrinsic features of subjects, from high-level semantics to low-level appearance, through cross-modal semantic contrastive learning and multiscale appearance contrastive learning. To facilitate contrastive learning, we introduce the MFI encoder to capture cross-modal representations. Extensive experiments show the effectiveness of CustomContrast in subject similarity and text controllability.
Published: 2024

296. MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery

Author: Qian, Hongjin, Zhang, Peitian, Liu, Zheng, Mao, Kelong, and Dou, Zhicheng
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Retrieval-Augmented Generation (RAG) leverages retrieval tools to access external databases, thereby enhancing the generation quality of large language models (LLMs) through optimized context. However, existing retrieval methods are inherently constrained: they can only perform relevance matching between explicitly stated queries and well-formed knowledge, and are unable to handle tasks involving ambiguous information needs or unstructured knowledge. Consequently, existing RAG systems are primarily effective for straightforward question-answering tasks. In this work, we propose MemoRAG, a novel retrieval-augmented generation paradigm empowered by long-term memory. MemoRAG adopts a dual-system architecture. On the one hand, it employs a light but long-range LLM to form the global memory of the database. Once a task is presented, it generates draft answers, cluing the retrieval tools to locate useful information within the database. On the other hand, it leverages an expensive but expressive LLM, which generates the ultimate answer based on the retrieved information. Building on this general framework, we further optimize MemoRAG's performance by enhancing its cluing mechanism and memorization capacity. In our experiments, MemoRAG achieves superior performance across a variety of evaluation tasks, including both complex ones where conventional RAG fails and straightforward ones where RAG is commonly applied., Comment: Technical Report. Codes and models are in https://github.com/qhjqhj00/MemoRAG
Published: 2024

297. Elsevier Arena: Human Evaluation of Chemistry/Biology/Health Foundational Large Language Models

Author: Thorne, Camilo, Druckenbrodt, Christian, Szarkowska, Kinga, Goyal, Deepika, Marajan, Pranita, Somanath, Vijay, Harper, Corey, Yan, Mao, and Scerri, Tony
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: arXiv admin comment: This version has been removed by arXiv administrators as the submitter did not have the rights to agree to the license at the time of submission, Comment: This document was submitted without obtaining all necessary permissions and therefore needs to be withdrawn. The corresponding author apologizes for any inconvenience this might cause
Published: 2024

298. DiVA-DocRE: A Discriminative and Voice-Aware Paradigm for Document-Level Relation Extraction

Author: Wu, Yiheng, Yangarber, Roman, and Mao, Xian
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval
Abstract: The remarkable capabilities of Large Language Models (LLMs) in text comprehension and generation have revolutionized Information Extraction (IE). One such advancement is in Document-level Relation Triplet Extraction (DocRTE), a critical task in information systems that aims to extract entities and their semantic relationships from documents. However, existing methods are primarily designed for Sentence-level Relation Triplet Extraction (SentRTE), which typically handles a limited set of relations and triplet facts within a single sentence. Additionally, some approaches treat relations as candidate choices integrated into prompt templates, resulting in inefficient processing and suboptimal performance when determining the relation elements in triplets. To address these limitations, we introduce a Discriminative and Voice-Aware Paradigm (DiVA). DiVA involves only two steps: performing document-level relation extraction (DocRE) and then identifying the subject and object entities based on the relation. No additional processing is required: simply input the document to directly obtain the triplets. This streamlined process more accurately reflects real-world scenarios for triplet extraction. Our innovation lies in transforming DocRE into a discriminative task, where the model pays attention to each relation and to the often overlooked issue of active vs. passive voice within the triplet. Our experiments on the Re-DocRED and DocRED datasets demonstrate state-of-the-art results for the DocRTE task.
Published: 2024

299. Hierarchical Sparse Representation Clustering for High-Dimensional Data Streams

Author: Chen, Jie, Mao, Hua, Gou, Yuanbiao, and Peng, Xi
Subjects: Computer Science - Machine Learning
Abstract: Data stream clustering reveals patterns within continuously arriving, potentially unbounded data sequences. Numerous data stream algorithms have been proposed to cluster data streams. The existing data stream clustering algorithms still face significant challenges when addressing high-dimensional data streams. First, it is intractable to measure the similarities among high-dimensional data objects via Euclidean distances when constructing and merging microclusters. Second, these algorithms are highly sensitive to the noise contained in high-dimensional data streams. In this paper, we propose a hierarchical sparse representation clustering (HSRC) method for clustering high-dimensional data streams. HSRC first employs an $l_1$-minimization technique to learn an affinity matrix for data objects in individual landmark windows with fixed sizes, where the number of neighboring data objects is automatically selected. This approach ensures that highly correlated data samples within clusters are grouped together. Then, HSRC applies a spectral clustering technique to the affinity matrix to generate microclusters. These microclusters are subsequently merged into macroclusters based on their sparse similarity degrees (SSDs). Additionally, HSRC introduces sparsity residual values (SRVs) to adaptively select representative data objects from the current landmark window. These representatives serve as dictionary samples for the next landmark window. Finally, HSRC refines each macrocluster through fine-tuning. In particular, HSRC enables the detection of outliers in high-dimensional data streams via the associated SRVs. The experimental results obtained on several benchmark datasets demonstrate the effectiveness and robustness of HSRC., Comment: 11 pages, 6 figures
Published: 2024

300. Simplex-enabled Safe Continual Learning Machine

Author: Cao, Hongpeng, Mao, Yanbing, Cai, Yihao, Sha, Lui, and Caccamo, Marco
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Robotics
Abstract: This paper proposes the SeC-Learning Machine: Simplex-enabled safe continual learning for safety-critical autonomous systems. The SeC-learning machine is built on Simplex logic (that is, "using simplicity to control complexity") and physics-regulated deep reinforcement learning (Phy-DRL). The SeC-learning machine thus comprises an HP (high performance)-Student, an HA (high assurance)-Teacher, and a Coordinator. Specifically, the HP-Student is a pre-trained, high-performance but not fully verified Phy-DRL, which continues to learn in a real plant to tune the action policy to be safe. In contrast, the HA-Teacher is a mission-reduced, physics-model-based, and verified design. As a complement, the HA-Teacher has two missions: backing up safety and correcting unsafe learning. The Coordinator triggers the interaction and the switch between the HP-Student and the HA-Teacher. Powered by these three interactive components, the SeC-learning machine can i) assure lifetime safety (i.e., a safety guarantee in any continual-learning stage, regardless of the HP-Student's success or convergence), ii) address the Sim2Real gap, and iii) learn to tolerate unknown unknowns in real plants. Experiments on a cart-pole system and a real quadruped robot demonstrate the distinctive features of the SeC-learning machine, compared with continual learning built on state-of-the-art safe DRL frameworks with approaches to addressing the Sim2Real gap.
Published: 2024
