2,747,533 results on '"Xiao An"'
Search Results
152. A large language model-type architecture for high-dimensional molecular potential energy surfaces
- Author
-
Zhu, Xiao and Iyengar, Srinivasan S.
- Subjects
Computer Science - Machine Learning; Physics - Atomic and Molecular Clusters; Physics - Chemical Physics; Physics - Computational Physics
- Abstract
Computing high dimensional potential surfaces for molecular and materials systems is considered to be a great challenge in computational chemistry with potential impact in a range of areas including fundamental prediction of reaction rates. In this paper we design and discuss an algorithm that has similarities to large language models in generative AI and natural language processing. Specifically, we represent a molecular system as a graph which contains a set of nodes, edges, faces etc. Interactions between these sets, which represent molecular subsystems in our case, are used to construct the potential energy surface for a reasonably sized chemical system with 51 dimensions. Essentially, a family of neural networks that pertain to the graph-based subsystems gets the job done for this 51 dimensional system. We then ask if this same family of lower-dimensional neural networks can be transformed to provide accurate predictions for a 186 dimensional potential surface. We find that our algorithm does provide reasonably accurate results for this larger dimensional problem, with sub-kcal/mol accuracy for the higher dimensional potential surface.
- Published
- 2024
153. Safe Adaptive Cruise Control Under Perception Uncertainty: A Deep Ensemble and Conformal Tube Model Predictive Control Approach
- Author
-
Li, Xiao, Girard, Anouck, and Kolmanovsky, Ilya
- Subjects
Computer Science - Robotics; Computer Science - Artificial Intelligence; Electrical Engineering and Systems Science - Systems and Control
- Abstract
Autonomous driving heavily relies on perception systems to interpret the environment for decision-making. To enhance robustness in these safety critical applications, this paper considers a Deep Ensemble of Deep Neural Network regressors integrated with Conformal Prediction to predict and quantify uncertainties. In the Adaptive Cruise Control setting, the proposed method performs state and uncertainty estimation from RGB images, informing the downstream controller of the DNN perception uncertainties. An adaptive cruise controller using Conformal Tube Model Predictive Control is designed to ensure probabilistic safety. Evaluations with a high-fidelity simulator demonstrate the algorithm's effectiveness in speed tracking and safe distance maintaining, including in Out-Of-Distribution scenarios.
- Published
- 2024
154. The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control
- Author
-
Feng, Ruili, Zhang, Han, Yang, Zhantao, Xiao, Jie, Shu, Zhilei, Liu, Zhiheng, Zheng, Andy, Huang, Yukun, Liu, Yu, and Zhang, Hongyang
- Subjects
Computer Science - Artificial Intelligence
- Abstract
We present The Matrix, the first foundational realistic world simulator capable of generating continuous 720p high-fidelity real-scene video streams with real-time, responsive control in both first- and third-person perspectives, enabling immersive exploration of richly dynamic environments. Trained on limited supervised data from AAA games like Forza Horizon 5 and Cyberpunk 2077, complemented by large-scale unsupervised footage from real-world settings like Tokyo streets, The Matrix allows users to traverse diverse terrains -- deserts, grasslands, water bodies, and urban landscapes -- in continuous, uncut hour-long sequences. Operating at 16 FPS, the system supports real-time interactivity and demonstrates zero-shot generalization, translating virtual game environments to real-world contexts where collecting continuous movement data is often infeasible. For example, The Matrix can simulate a BMW X3 driving through an office setting--an environment present in neither gaming data nor real-world sources. This approach showcases the potential of AAA game data to advance robust world models, bridging the gap between simulations and real-world applications in scenarios with limited data.
- Published
- 2024
155. FLAIR: VLM with Fine-grained Language-informed Image Representations
- Author
-
Xiao, Rui, Kim, Sanghwan, Georgescu, Mariana-Iuliana, Akata, Zeynep, and Alaniz, Stephan
- Subjects
Computer Science - Computer Vision and Pattern Recognition; Computer Science - Artificial Intelligence
- Abstract
CLIP has shown impressive results in aligning images and texts at scale. However, its ability to capture detailed visual features remains limited because CLIP matches images and texts at a global level. To address this issue, we propose FLAIR, Fine-grained Language-informed Image Representations, an approach that utilizes long and detailed image descriptions to learn localized image embeddings. By sampling diverse sub-captions that describe fine-grained details about an image, we train our vision-language model to produce not only global embeddings but also text-specific image representations. Our model introduces text-conditioned attention pooling on top of local image tokens to produce fine-grained image representations that excel at retrieving detailed image content. We achieve state-of-the-art performance on both existing multimodal retrieval benchmarks and our newly introduced fine-grained retrieval task, which evaluates vision-language models' ability to retrieve partial image content. Furthermore, our experiments demonstrate the effectiveness of FLAIR trained on 30M image-text pairs in capturing fine-grained visual information, including zero-shot semantic segmentation, outperforming models trained on billions of pairs. Code is available at https://github.com/ExplainableML/flair .
- Published
- 2024
156. PaliGemma 2: A Family of Versatile VLMs for Transfer
- Author
-
Steiner, Andreas, Pinto, André Susano, Tschannen, Michael, Keysers, Daniel, Wang, Xiao, Bitton, Yonatan, Gritsenko, Alexey, Minderer, Matthias, Sherbondy, Anthony, Long, Shangbang, Qin, Siyang, Ingle, Reeve, Bugliarello, Emanuele, Kazemzadeh, Sahar, Mesnard, Thomas, Alabdulmohsin, Ibrahim, Beyer, Lucas, and Zhai, Xiaohua
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
PaliGemma 2 is an upgrade of the PaliGemma open Vision-Language Model (VLM) based on the Gemma 2 family of language models. We combine the SigLIP-So400m vision encoder that was also used by PaliGemma with the whole range of Gemma 2 models, from the 2B one all the way up to the 27B model. We train these models at three resolutions (224px, 448px, and 896px) in multiple stages to equip them with broad knowledge for transfer via fine-tuning. The resulting family of base models covering different model sizes and resolutions allows us to investigate factors impacting transfer performance (such as learning rate) and to analyze the interplay between the type of task, model size, and resolution. We further increase the number and breadth of transfer tasks beyond the scope of PaliGemma including different OCR-related tasks such as table structure recognition, molecular structure recognition, music score recognition, as well as long fine-grained captioning and radiography report generation, on which PaliGemma 2 obtains state-of-the-art results.
- Published
- 2024
157. A new fidelity of quantum channel evolution and its geometric interpretation
- Author
-
Yan, Xiaojing, Sun, Xiao, Du, Mingming, and Tang, Jiashan
- Subjects
Quantum Physics; 81P47
- Abstract
Fidelity is crucial for characterizing transformations of quantum states under various quantum channels, and it can serve as a fundamental tool in resource theories. Firstly, we define an $\alpha$-$z$-fidelity as a significant quantity in quantum information theory and give the properties of the fidelity with orders $\alpha$ and $z$. Secondly, by analyzing the $\alpha$-$z$-fidelity under the evolution of different types of quantum channels (single orbit, all quantum channels, unitary quantum channels, and mixed unitary quantum channels), we propose a limit formula for the maximum and the minimum of the $\alpha$-$z$-fidelity. In addition, we have extended the $\alpha$-$z$-Rényi relative entropy, providing new insights into its relevance for resource quantification. Finally, we offer a geometric interpretation for measuring the distance between quantum states, contributing to the broader understanding of the operational and transformative power of dynamical quantum resources across various physical settings., Comment: 13 pages
- Published
- 2024
158. Generating Synthetic Genotypes using Diffusion Models
- Author
-
Kenneweg, Philip, Dandinasivara, Raghuram, Luo, Xiao, Hammer, Barbara, and Schönhuth, Alexander
- Subjects
Computer Science - Computational Engineering, Finance, and Science
- Abstract
In this paper, we introduce the first diffusion model designed to generate complete synthetic human genotypes, which, by standard protocols, one can straightforwardly expand into full-length, DNA-level genomes. The synthetic genotypes mimic real human genotypes without just reproducing known genotypes, in terms of approved metrics. When training biomedically relevant classifiers with synthetic genotypes, accuracy is near-identical to the accuracy achieved when training classifiers with real data. We further demonstrate that augmenting small amounts of real with synthetically generated genotypes drastically improves performance rates. This addresses a significant challenge in translational human genetics: real human genotypes, although emerging in large volumes from genome wide association studies, are sensitive private data, which limits their public availability. Therefore, the integration of additional, insensitive data when striving for rapid sharing of biomedical knowledge of public interest appears imperative.
- Published
- 2024
159. ChatTS: Aligning Time Series with LLMs via Synthetic Data for Enhanced Understanding and Reasoning
- Author
-
Xie, Zhe, Li, Zeyan, He, Xiao, Xu, Longlong, Wen, Xidao, Zhang, Tieying, Chen, Jianjun, Shi, Rui, and Pei, Dan
- Subjects
Computer Science - Artificial Intelligence
- Abstract
Understanding time series is crucial for its application in real-world scenarios. Recently, large language models (LLMs) have been increasingly applied to time series tasks, leveraging their strong language capabilities to enhance various applications. However, research on multimodal LLMs (MLLMs) for time series understanding and reasoning remains limited, primarily due to the scarcity of high-quality datasets that align time series with textual information. This paper introduces ChatTS, a novel MLLM designed for time series analysis. ChatTS treats time series as a modality, similar to how vision MLLMs process images, enabling it to perform both understanding and reasoning with time series. To address the scarcity of training data, we propose an attribute-based method for generating synthetic time series with detailed attribute descriptions. We further introduce Time Series Evol-Instruct, a novel approach that generates diverse time series Q&As, enhancing the model's reasoning capabilities. To the best of our knowledge, ChatTS is the first MLLM that takes multivariate time series as input, which is fine-tuned exclusively on synthetic datasets. We evaluate its performance using benchmark datasets with real-world data, including six alignment tasks and four reasoning tasks. Our results show that ChatTS significantly outperforms existing vision-based MLLMs (e.g., GPT-4o) and text/agent-based LLMs, achieving a 46.0% improvement in alignment tasks and a 25.8% improvement in reasoning tasks., Comment: 14 pages, 14 figures
- Published
- 2024
160. Taurus Database: How to be Fast, Available, and Frugal in the Cloud
- Author
-
Depoutovitch, Alex, Chen, Chong, Chen, Jin, Larson, Paul, Lin, Shu, Ng, Jack, Cui, Wenlin, Liu, Qiang, Huang, Wei, Xiao, Yong, and He, Yongjun
- Subjects
Computer Science - Databases; Computer Science - Distributed, Parallel, and Cluster Computing
- Abstract
Using cloud Database as a Service (DBaaS) offerings instead of on-premise deployments is increasingly common. Key advantages include improved availability and scalability at a lower cost than on-premise alternatives. In this paper, we describe the design of Taurus, a new multi-tenant cloud database system. Taurus separates the compute and storage layers in a similar manner to Amazon Aurora and Microsoft Socrates and provides similar benefits, such as read replica support, low network utilization, hardware sharing and scalability. However, the Taurus architecture has several unique advantages. Taurus offers novel replication and recovery algorithms providing better availability than existing approaches using the same or fewer replicas. Also, Taurus is highly optimized for performance, using no more than one network hop on critical paths and exclusively using append-only storage, delivering faster writes, reduced device wear, and constant-time snapshots. This paper describes Taurus and provides a detailed description and analysis of the storage node architecture, which has not been previously available from the published literature.
- Published
- 2024
161. LLM-Enhanced Path Planning: Safe and Efficient Autonomous Navigation with Instructional Inputs
- Author
-
Doma, Pranav, Arab, Aliasghar, and Xiao, Xuesu
- Subjects
Computer Science - Robotics; Electrical Engineering and Systems Science - Systems and Control
- Abstract
Autonomous navigation guided by natural language instructions is essential for improving human-robot interaction and enabling complex operations in dynamic environments. While large language models (LLMs) are not inherently designed for planning, they can significantly enhance planning efficiency by providing guidance and informing constraints to ensure safety. This paper introduces a planning framework that integrates LLMs with 2D occupancy grid maps and natural language commands to improve spatial reasoning and task execution in resource-limited settings. By decomposing high-level commands and real-time environmental data, the system generates structured navigation plans for pick-and-place tasks, including obstacle avoidance, goal prioritization, and adaptive behaviors. The framework dynamically recalculates paths to address environmental changes and aligns with implicit social norms for seamless human-robot interaction. Our results demonstrate the potential of LLMs to design context-aware systems that enhance navigation efficiency and safety in industrial and dynamic environments.
- Published
- 2024
162. PrefixLLM: LLM-aided Prefix Circuit Design
- Author
-
Xiao, Weihua, Putrevu, Venkata Sai Charan, Hemadri, Raghu Vamshi, Garg, Siddharth, and Karri, Ramesh
- Subjects
Computer Science - Hardware Architecture; Computer Science - Artificial Intelligence
- Abstract
Prefix circuits are fundamental components in digital adders, widely used in digital systems due to their efficiency in calculating carry signals. Synthesizing prefix circuits with minimized area and delay is crucial for enhancing the performance of modern computing systems. Recently, large language models (LLMs) have demonstrated a surprising ability to perform text generation tasks. We propose PrefixLLM, which leverages LLMs for prefix circuit synthesis. PrefixLLM transforms the prefix circuit synthesis task into a structured text generation problem, termed the Structured Prefix Circuit Representation (SPCR), and introduces an iterative framework to automatically and accurately generate valid SPCRs. We further present a design space exploration (DSE) framework that uses LLMs to iteratively search for area and delay optimized prefix circuits. Compared to the state of the art, PrefixLLM can reduce the area by 3.70% under the same delay constraint. This work highlights the use of LLMs in the synthesis of arithmetic circuits, which can be transformed into a structured text generation problem.
- Published
- 2024
163. F-SE-LSTM: A Time Series Anomaly Detection Method with Frequency Domain Information
- Author
-
Lu, Yi-Xiang, Jin, Xiao-Bo, Chen, Jian, Liu, Dong-Jie, and Geng, Guang-Gang
- Subjects
Computer Science - Artificial Intelligence
- Abstract
With the development of society, time series anomaly detection plays an important role in network and IoT services. However, most existing anomaly detection methods directly analyze time series in the time domain and cannot distinguish some relatively hidden anomaly sequences. We attempt to analyze the impact of frequency on time series from a frequency domain perspective, thus proposing a new time series anomaly detection method called F-SE-LSTM. This method utilizes two sliding windows and fast Fourier transform (FFT) to construct a frequency matrix. Simultaneously, Squeeze-and-Excitation Networks (SENet) and Long Short-Term Memory (LSTM) are employed to extract frequency-related features within and between periods. Through comparative experiments on multiple datasets such as Yahoo Webscope S5 and Numenta Anomaly Benchmark, the results demonstrate that the frequency matrix constructed by F-SE-LSTM exhibits better discriminative ability than ordinary time domain and frequency domain data. Furthermore, F-SE-LSTM outperforms existing state-of-the-art deep learning anomaly detection methods in terms of anomaly detection capability and execution efficiency., Comment: 14 pages, 7 figures
- Published
- 2024
164. Detection of a new GeV source in the outer region of the Coma cluster: a signature of external accretion shock ?
- Author
-
Chen, Xiao-Bin, Wang, Kai, Huang, Yi-Yun, Zhang, Hai-Ming, Xi, Shao-Qiang, Liu, Ruo-Yu, and Wang, Xiang-Yu
- Subjects
Astrophysics - High Energy Astrophysical Phenomena
- Abstract
The supersonic flow motions associated with infall of baryonic gas toward sheets and filaments, as well as cluster mergers, produce large-scale shock waves. The shocks associated with galaxy clusters can be classified mainly into two categories: internal shocks appear in the hot intracluster medium within the virial radius, and external accretion shocks form in the outer cold region well outside of the virial radius. Cosmic-ray (CR) electrons and/or protons accelerated by these shocks are expected to produce gamma-rays through inverse-Compton scatterings (ICS) or inelastic $pp$ collisions respectively. Recent studies have found a spatially extended GeV source within the virial radius, consistent with the internal shock origin. Here we report the detection of a new GeV source at a distance of about 2.8$^\circ$ from the center of the Coma cluster through the analysis of 16.2 years of Fermi-LAT data. The hard spectrum of the source, in agreement with the ICS origin, and its location in a large-scale filament of galaxies point to the external accretion shock origin. The gamma-ray ($0.1-10^3$ GeV) luminosity of the source, $1.4\times 10^{42}~ {\rm erg~s^{-1}}$, suggests that a fraction $\sim 10^{-3}$ of the kinetic energy flux through the shock-surface is transferred to relativistic CR electrons.
- Published
- 2024
165. First Pulsar Polarization Array Limits on Ultralight Axion-like Dark Matter
- Author
-
Xue, Xiao, Dai, Shi, Luu, Hoang Nhan, Liu, Tao, Ren, Jing, Shu, Jing, Zhao, Yue, Zic, Andrew, Bhat, N. D. Ramesh, Chen, Zu-Cheng, Feng, Yi, Hobbs, George, Kapur, Agastya, Manchester, Richard N., Mandow, Rami, Mishra, Saurav, Reardon, Daniel J., Russell, Christopher J., Shannon, Ryan M., Wang, Shuangqiang, Zhang, Lei, Zhang, Songbo, and Zhu, Xingjiang
- Subjects
Astrophysics - High Energy Astrophysical Phenomena; Astrophysics - Cosmology and Nongalactic Astrophysics; Astrophysics - Instrumentation and Methods for Astrophysics; High Energy Physics - Phenomenology
- Abstract
We conduct the first-ever Pulsar Polarization Array (PPA) analysis to detect the ultralight Axion-Like Dark Matter (ALDM) using the polarization data of 22 millisecond pulsars from the third data release of Parkes Pulsar Timing Array. As one of the major dark matter candidates, the ultralight ALDM exhibits a pronounced wave nature on astronomical scales and offers a promising solution to small-scale structure issues within local galaxies. While the linearly polarized pulsar light travels through the ALDM galactic halo, its position angle (PA) can be subject to an oscillation induced by the ALDM Chern-Simons coupling with electromagnetic field. The PPA is thus especially suited for detecting the ultralight ALDM by correlating polarization data across the arrayed pulsars. To accomplish this task, we develop an advanced Bayesian analysis framework that allows us to construct pulsar PA residual time series, model noise contributions properly and search for pulsar cross-correlations. We find that for an ALDM density of $\rho_0=0.4\,\textrm{GeV}/\textrm{cm}^3$, the Parkes PPA offers the best global limits on the ALDM Chern-Simons coupling, namely $\lesssim 10^{-13.5}-10^{-12.2}~{\rm GeV}^{-1}$, for the mass range of $10^{-22} - 10^{-21}~{\rm eV}$. The crucial role of pulsar cross-correlation in recognizing the nature of the derived limits is also highlighted., Comment: 6+15 pages, 10 figures, 2 tables, submitted to the journal
- Published
- 2024
166. Entangling independent particles by path identity
- Author
-
Wang, Kai, Hou, Zhaohua, Qian, Kaiyi, Chen, Leizhen, Krenn, Mario, Zhu, Shining, and Ma, Xiao-song
- Subjects
Quantum Physics
- Abstract
Quantum entanglement -- correlations of particles that are stronger than any classical analogue -- is the basis for research on the foundations of quantum mechanics and for practical applications such as quantum networks. Traditionally, entanglement is achieved through local interactions or via entanglement swapping, where entanglement at a distance is generated through previously established entanglement and Bell-state measurements. However, the precise requirements enabling the generation of quantum entanglement without traditional local interactions remain less explored. Here we demonstrate that independent particles can be entangled without the need for direct interaction, prior established entanglement, or Bell-state measurements, by exploiting the indistinguishability of the origins of photon pairs. Our demonstrations challenge the long-standing belief that the prior generation and measurement of entanglement are necessary prerequisites for generating entanglement between independent particles that do not share a common past. In addition to its foundational interest, we show that this technique might lower the resource requirements in quantum networks, by reducing the complexity of photon sources and the overhead photon numbers.
- Published
- 2024
167. Tensor renormalization group study of (1+1)-dimensional O(3) nonlinear sigma model with and without finite chemical potential
- Author
-
Luo, Xiao and Kuramashi, Yoshinobu
- Subjects
High Energy Physics - Lattice
- Abstract
We study (1+1)-dimensional O(3) nonlinear sigma model using the tensor renormalization group method with the infinite limit of the bond dimension $D_{\rm cut}\rightarrow \infty$. At the vanishing chemical potential $\mu=0$, we investigate the von Neumann and Rényi types of entanglement entropies. The central charge is determined to be $c=1.97(9)$ by using the asymptotic scaling properties of the entropies. We also examine the consistency between two entropies. In the finite density region with $\mu\ne 0$, where this model suffers from the sign problem in the standard Monte Carlo approach, we investigate the properties of the quantum phase transition. We determine the transition point $\mu_{\rm c}$ and the critical exponent of the correlation length $\nu$ from the $\mu$ dependence of the number density in the thermodynamic limit. The dynamical critical exponent $z$ is also extracted from the scaling behavior of the temporal correlation length as a function of $\mu$. This is the first successful calculation of the dynamical critical exponent with the TRG method., Comment: 11 pages in total, to appear in the proceedings of the 41st International Symposium on Lattice Field Theory (LATTICE2024), 28 July - 3 August 2024, Liverpool, UK
- Published
- 2024
168. Synthesis of metalloborophene nanoribbons on Cu(110)
- Author
-
Weng, Xiao-Ji, Zhu, Yi, Xu, Ying, Bai, Jie, Zhang, Zhuhua, Xu, Bo, Zhou, Xiang-Feng, and Tian, Yongjun
- Subjects
Condensed Matter - Materials Science; Physics - Chemical Physics
- Abstract
Metalloborophene, characterized by the presence of metal-centered boron wheels denoted as M©Bn, has garnered considerable attention in recent years due to its versatile properties and potential applications in fields such as electronics, spintronics, and catalysis. However, the experimental verification of metalloborophene has been challenging, mainly due to the unconventional two-dimensional (2D) boron networks. In this study, we employ scanning tunneling microscopy, X-ray photoelectron spectroscopy, low energy electron diffraction, and first-principles calculations to unveil Cu©B8 metalloborophene nanoribbons formed via spontaneous alloying after the deposition of boron on a heated Cu(110) substrate under ultrahigh vacuum condition. The thermodynamically preferred precursor, the anchoring of boron network to metal atoms, and anisotropic lattice mismatch are identified as pivotal factors in the formation of these metalloborophene nanoribbons. This discovery expands the repertoire of 2D materials and offers a potential pathway for the synthesis of other metalloborophenes., Comment: 4 figures
- Published
- 2024
169. Birthmarks: Ergodicity Breaking Beyond Quantum Scars
- Author
-
Graf, Anton M., Keski-Rahkonen, Joonas, Xiao, Mingxuan, Atwood, Saul, Lu, Zhongling, Chen, Siyuan, and Heller, Eric J.
- Subjects
Quantum Physics
- Abstract
One manifestation of classical ergodicity is a complete loss of memory of the initial conditions due to the eventual uniform exploration of phase space. In quantum versions of the same systems, classical ergodic traits can be broken. Here, we extend the concept of quantum scars in new directions, more focused on ergodicity and infinite time averages than individual eigenstates. We specifically establish a union of short and long-term enhancements in terms of a quantum birthmark (QB). Subsequently, we show (1) that the birth and early evolution of a nonstationary state is remembered forever in infinite time averages, and (2) that early recurrences in the autocorrelation function inevitably lead to nonergodic flow over infinite times. We recount here that phase space cannot be explored ergodically if there are early recurrences (well before the Heisenberg time) in the autocorrelation of the initial nonstationary quantum state. Employing random matrix theory, we show that QB extends beyond individual states to entire subspaces or "birthplaces" in Hilbert space. Finally, we visualize scar-amplified QBs unveiled within the time-averaged probability density of a wavepacket in a stadium system. By transcending the quantum scarring, QB delivers a new paradigm for understanding the elusive quantum nature of ergodicity.
- Published
- 2024
170. Stain-aware Domain Alignment for Imbalance Blood Cell Classification
- Author
-
Li, Yongcheng, Cai, Lingcong, Lu, Ying, Fu, Xianghua, Han, Xiao, Li, Ma, Lai, Wenxing, Zhang, Xiangzhong, and Fan, Xiaomao
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Blood cell identification is critical for hematological analysis as it aids physicians in diagnosing various blood-related diseases. In real-world scenarios, blood cell image datasets often present the issues of domain shift and data imbalance, posing challenges for accurate blood cell identification. To address these issues, we propose a novel blood cell classification method termed SADA via stain-aware domain alignment. The primary objective of this work is to mine domain-invariant features in the presence of domain shifts and data imbalances. To accomplish this objective, we propose a stain-based augmentation approach and a local alignment constraint to learn domain-invariant features. Furthermore, we propose a domain-invariant supervised contrastive learning strategy to capture discriminative features. We decouple the training process into two stages of domain-invariant feature learning and classification training, alleviating the problem of data imbalance. Experimental results on four public blood cell datasets and a private real dataset collected from the Third Affiliated Hospital of Sun Yat-sen University demonstrate that SADA can achieve a new state-of-the-art baseline, which is superior to the existing cutting-edge methods by a large margin. The source code is available at https://github.com/AnoK3111/SADA .
- Published
- 2024
171. Semantic Segmentation Prior for Diffusion-Based Real-World Super-Resolution
- Author
-
Xiao, Jiahua, Zhang, Jiawei, Zou, Dongqing, Zhang, Xiaodan, Ren, Jimmy, and Wei, Xing
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Real-world image super-resolution (Real-ISR) has achieved a remarkable leap by leveraging large-scale text-to-image models, enabling realistic image restoration from given recognition textual prompts. However, these methods sometimes fail to recognize some salient objects, resulting in inaccurate semantic restoration in these regions. Additionally, the same region may have a strong response to more than one prompt and it will lead to semantic ambiguity for image super-resolution. To alleviate the above two issues, in this paper, we propose to consider semantic segmentation as an additional control condition into diffusion-based image super-resolution. Compared to textual prompt conditions, semantic segmentation enables a more comprehensive perception of salient objects within an image by assigning class labels to each pixel. It also mitigates the risks of semantic ambiguities by explicitly allocating objects to their respective spatial regions. In practice, inspired by the fact that image super-resolution and segmentation can benefit each other, we propose SegSR which introduces a dual-diffusion framework to facilitate interaction between the image super-resolution and segmentation diffusion models. Specifically, we develop a Dual-Modality Bridge module to enable updated information flow between these two diffusion models, achieving mutual benefit during the reverse diffusion process. Extensive experiments show that SegSR can generate realistic images while preserving semantic structures more effectively.
- Published
- 2024
172. COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
- Author
-
Kim, Sanghwan, Xiao, Rui, Georgescu, Mariana-Iuliana, Alaniz, Stephan, and Akata, Zeynep
- Subjects
Computer Science - Computer Vision and Pattern Recognition; Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Machine Learning
- Abstract
Vision-Language Models (VLMs) trained with contrastive loss have achieved significant advancements in various vision and language tasks. However, the global nature of contrastive loss makes VLMs focus predominantly on foreground objects, neglecting other crucial information in the image, which limits their effectiveness in downstream tasks. To address these challenges, we propose COSMOS: CrOSs-MOdality Self-distillation for vision-language pre-training that integrates a novel text-cropping strategy and cross-attention module into a self-supervised learning framework. We create global and local views of images and texts (i.e., multi-modal augmentations), which are essential for self-distillation in VLMs. We further introduce a cross-attention module, enabling COSMOS to learn comprehensive cross-modal representations optimized via a cross-modality self-distillation loss. COSMOS consistently outperforms previous strong baselines on various zero-shot downstream tasks, including retrieval, classification, and semantic segmentation. Additionally, it surpasses CLIP-based models trained on larger datasets in visual perception and contextual understanding tasks.
- Published
- 2024
173. Modeling High Mass X-ray Binaries to Double Neutron Stars through Common Envelope Evolution
- Author
-
Nie, Yu-Dong, Shao, Yong, He, Jian-Guo, Wei, Ze-Lin, Xu, Xiao-Jie, and Li, Xiang-Dong
- Subjects
Astrophysics - Solar and Stellar Astrophysics ,Astrophysics - High Energy Astrophysical Phenomena - Abstract
We present detailed evolutionary simulations of wide binary systems with high-mass ($8-20\,M_{\odot}$) donor stars and a $1.4\,M_{\odot}$ neutron star. Mass transfer in such binaries is dynamically unstable and common envelope (CE) evolution is followed. We use a recently developed prescription to deal with CE evolution and consider various CE ejection efficiencies varying in the range of $0.1-3.0$. We focus on the evolutionary consequences of the binaries that survived CE evolution. We demonstrate that it is possible for the binaries to enter a CE decoupling phase (CEDP) when the donor stars are partially stripped leaving a hydrogen envelope of $\lesssim1.0-4.0\,M_\odot$ after CE evolution. This phase is expected to last $\sim 10^4-10^5\,\rm yr$, during which mass transfer occurs stably via Roche lobe overflow with super-Eddington rates. Identification of some X-ray binaries in a CEDP is important for the understanding of the physics of CE evolution itself, the origin of ultraluminous X-ray sources, and the recycling process of accreting pulsars. Also, we discuss the formation of double neutron stars and the occurrence of ultra-stripped supernovae according to the results from our simulations. On the whole, the properties of post-CE binaries are sensitive to the choice of CE ejection efficiency., Comment: 22 pages, 12+2 figures, 1 table, accepted by ApJ
- Published
- 2024
174. Data-Centric and Heterogeneity-Adaptive Sequence Parallelism for Efficient LLM Training
- Author
-
Wang, Yujie, Wang, Shiju, Zhu, Shenhan, Fu, Fangcheng, Liu, Xinyi, Xiao, Xuefeng, Li, Huixia, Li, Jiashi, Wu, Faming, and Cui, Bin
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing ,Computer Science - Machine Learning - Abstract
Extending the context length (i.e., the maximum supported sequence length) of LLMs is of paramount significance. To facilitate long context training of LLMs, sequence parallelism has emerged as an essential technique, which scatters each input sequence across multiple devices and necessitates communication to process the sequence. In essence, existing sequence parallelism methods assume homogeneous sequence lengths (i.e., all input sequences are equal in length) and therefore leverage a single, static scattering strategy for all input sequences. However, in reality, the sequence lengths in LLM training corpora exhibit substantial variability, often following a long-tail distribution, which leads to workload heterogeneity. In this paper, we show that employing a single, static strategy results in inefficiency and resource under-utilization, highlighting the need for adaptive approaches to handle the heterogeneous workloads across sequences. To address this, we propose a heterogeneity-adaptive sequence parallelism method. For each training step, our approach captures the variability in sequence lengths and assigns the optimal combination of scattering strategies based on workload characteristics. We model this problem as a linear programming optimization and design an efficient and effective solver to find the optimal solution. Furthermore, we implement our method in a high-performance system that supports adaptive parallelization in distributed LLM training. Experimental results demonstrate that our system outperforms state-of-the-art training frameworks by up to 1.98x.
- Published
- 2024
175. Early Exit Is a Natural Capability in Transformer-based Models: An Empirical Study on Early Exit without Joint Optimization
- Author
-
Shan, Weiqiao, Meng, Long, Zheng, Tong, Luo, Yingfeng, Li, Bei, Wang, Junxin, Xiao, Tong, and Zhu, Jingbo
- Subjects
Computer Science - Computation and Language - Abstract
Large language models (LLMs) exhibit exceptional performance across various downstream tasks. However, they encounter limitations due to slow inference speeds stemming from their extensive parameters. Early exit (EE) is an approach that aims to accelerate auto-regressive decoding. EE generates outputs from intermediate layers instead of using the whole model, which offers a promising solution to this challenge. However, the additional output layers and joint optimization used in conventional EE hinder the application of EE in LLMs. In this paper, we explore the possibility of EE in LLMs without additional output layers and joint optimization. Our findings indicate that EE is a natural capability within transformer-based models. While joint optimization does not give a model EE capability, it must be employed to address challenges by improving the accuracy of locating the optimal EE layer through gating functions. Additionally, our study reveals patterns in EE behavior from a sub-word perspective based on the LLaMA model, as well as the potential for EE based on sub-layers.
- Published
- 2024
176. A deformation-based framework for learning solution mappings of PDEs defined on varying domains
- Author
-
Xiao, Shanshan, Jin, Pengzhan, and Tang, Yifa
- Subjects
Mathematics - Numerical Analysis ,Computer Science - Machine Learning - Abstract
In this work, we establish a deformation-based framework for learning solution mappings of PDEs defined on varying domains. The union of functions defined on varying domains can be identified as a metric space according to the deformation, then the solution mapping is regarded as a continuous metric-to-metric mapping, and subsequently can be represented by another continuous metric-to-Banach mapping using two different strategies, referred to as the D2D framework and the D2E framework, respectively. We point out that such a metric-to-Banach mapping can be learned by neural networks, hence the solution mapping is accordingly learned. With this framework, a rigorous convergence analysis is built for the problem of learning solution mappings of PDEs on varying domains. As the theoretical framework rests on several pivotal assumptions which need to be verified for a given specific problem, we study the star domains as a typical example, and other situations could be similarly verified. There are three important features of this framework: (1) The domains under consideration are not required to be diffeomorphic, therefore a wide range of regions can be covered by one model provided they are homeomorphic. (2) The deformation mapping need not be continuous, thus it can be flexibly established by combining a primary identity mapping and a local deformation mapping. This capability facilitates the resolution of large systems where only local parts of the geometry undergo change. (3) If a linearity-preserving neural operator such as MIONet is adopted, this framework still preserves the linearity of the surrogate solution mapping on its source term for linear PDEs, thus it can be applied to the hybrid iterative method. We finally present several numerical experiments to validate our theoretical results.
- Published
- 2024
177. Research on Cervical Cancer p16/Ki-67 Immunohistochemical Dual-Staining Image Recognition Algorithm Based on YOLO
- Author
-
Wu, Xiao-Jun, Zhao, Cai-Jun, Meng, Chun, and Wang, Hang
- Subjects
Computer Science - Artificial Intelligence - Abstract
The p16/Ki-67 dual staining method is a new approach for cervical cancer screening with high sensitivity and specificity. However, there are issues of mis-detection and inaccurate recognition when the YOLOv5s algorithm is directly applied to dual-stained cell images. This paper proposes a novel cervical cancer dual-stained image recognition (DSIR-YOLO) model based on YOLOv5. By fusing the Swin-Transformer module, GAM attention mechanism, multi-scale feature fusion, and EIoU loss function, the detection performance is significantly improved, with mAP@0.5 and mAP@0.5:0.95 reaching 92.6% and 70.5%, respectively. Compared with YOLOv5s in five-fold cross-validation, the accuracy, recall, mAP@0.5, and mAP@0.5:0.95 of the improved algorithm are increased by 2.3%, 4.1%, 4.3%, and 8.0%, respectively, with smaller variances and higher stability. Compared with other detection algorithms, DSIR-YOLO in this paper sacrifices some performance requirements to improve the network recognition effect. In addition, the influence of dataset quality on the detection results is studied. By controlling the sealing property of pixels, scale difference, unlabelled cells, and diagonal annotation, the model detection accuracy, recall, mAP@0.5, and mAP@0.5:0.95 are improved by 13.3%, 15.3%, 18.3%, and 30.5%, respectively.
- Published
- 2024
178. Remarks on strong phase shifts in weak nonleptonic baryon decays
- Author
-
Wang, Hong-Jian, Li, Pei-Rong, Lyu, Xiao-Rui, Tandean, Jusak, and Li, Hai-Bo
- Subjects
High Energy Physics - Phenomenology ,High Energy Physics - Experiment - Abstract
A sizable strong-interaction phase shift in weak two-body nonleptonic baryon decay would enhance the possibility of discovering charge-conjugation parity ($CP$) violation in the baryon sector, which might help in the quest for understanding the matter-antimatter asymmetry in the universe. Over the past 60 years, empirical analyses involving different types of instruments, including fixed-target experiments and $e^+e^-$ colliders, have indicated that the phase shifts in nonleptonic hyperon decays are relatively small, below order ten degrees in size. A large phase shift, however, has been observed by BESIII in the decay of a charmed baryon into a hyperon and kaon, $\Lambda_c^+\to \Xi^0K^+$. In various experimental and theoretical studies on hyperon, charmed-baryon, and bottomed-baryon decays, different conventions have been adopted for defining the strong phases. It is important to be aware of this situation when obtaining global averages from different measurements and applying the results to future investigations on $CP$ violation among baryons. This paper gives an overview of the conventions employed in the literature for the strong phases and suggests a unified parameterization form applicable to the different alternatives. Numerical results under the unified parameterization form are also provided, which can serve as useful inputs to further pursuits of baryon $CP$ violation.
- Published
- 2024
179. Interpreting the extremely diffuse stellar distribution of Nube galaxy through fuzzy dark matter
- Author
-
Yang, Yu-Ming, Zhang, Zhao-Chen, Bi, Xiao-Jun, and Yin, Peng-Fei
- Subjects
Astrophysics - Astrophysics of Galaxies ,High Energy Physics - Phenomenology - Abstract
Recent observations have revealed an unusual stellar distribution within the almost dark dwarf galaxy Nube. The galaxy exhibits a remarkably flat stellar distribution, with an effective radius of approximately 6.9 kpc, exceeding the typical size of dwarf galaxies and even surpassing that of ultra-diffuse galaxies (UDGs) with similar stellar masses. The dynamical heating effect of fuzzy dark matter (FDM) may offer an explanation for this extremely diffuse stellar distribution in Nube. In this research, we utilize simulation techniques to investigate this issue and find that a particle mass $\mathcal{O} (1)\times 10^{-23}$ eV offers a plausible explanation for this peculiar stellar distribution anomaly., Comment: 8 pages, 4 figures
- Published
- 2024
180. Self-embedding similitudes of Bedford-McMullen carpets with dependent ratios
- Author
-
Xiao, Jian-Ci
- Subjects
Mathematics - Classical Analysis and ODEs ,Mathematics - Metric Geometry ,Primary 28A80, Secondary 28A78 - Abstract
We prove that any non-degenerate Bedford-McMullen carpet does not allow oblique self-embedding similitudes; that is, if $f$ is a similitude sending the carpet into itself, then the image of the $x$-axis under $f$ must be parallel to one of the principal axes. We also establish a logarithmic commensurability result on the contraction ratios of such embeddings. This completes a previous study of Algom and Hochman [Ergod. Th. & Dynam. Sys. 39 (2019), 577--603] on Bedford-McMullen carpets generated by multiplicatively independent exponents, together with a new proof on their non-obliqueness statement. For the self-similar case, however, we construct a generalized Sierpinski carpet that is symmetric with respect to an appropriate oblique line and hence allows a reflectional oblique self-embedding. As a complement, we prove that if a generalized Sierpinski carpet satisfies the strong separation condition and permits an oblique rotational self-embedding similitude, then the tangent of the rotation angle takes values $\pm 1$., Comment: 25 pages, 6 figures
- Published
- 2024
181. Numerical approximation of slowly varying envelope in finite element electromagnetism: a ray-wave method of modeling multi-scale devices
- Author
-
Xiao, Fan, Wang, Jingwei, Xiong, Zhongfei, and Chen, Yuntian
- Subjects
Physics - Optics - Abstract
In this work we propose an efficient and accurate multi-scale optical simulation algorithm by applying a numerical version of slowly varying envelope approximation in FEM. Specifically, we employ the fast iterative method to quickly compute the phase distribution of the electric field within the computational domain and construct a novel multi-scale basis function that combines the conventional polynomial basis function with numerically resolved phase information of optical waves. Utilizing this multi-scale basis function, the finite element method can significantly reduce the degrees of freedom required for the solution while maintaining computational accuracy, thereby improving computational efficiency. Without loss of generality, we illustrate our approach by simulating lens groups and gradient-index lenses, accompanied by a performance benchmark against the standard finite element method. The results demonstrate that the proposed method achieves results consistent with the standard finite element method but with a computational speed improved by an order of magnitude., Comment: 12 pages, 4 figures and 2 tables
- Published
- 2024
182. Estimating the gravitational wave background anisotropy: a Bayesian approach boosted by cross-correlation angular power spectrum
- Author
-
Tian, Chi, Ding, Ran, and Kou, Xiao-Xiao
- Subjects
Astrophysics - Cosmology and Nongalactic Astrophysics ,General Relativity and Quantum Cosmology ,High Energy Physics - Phenomenology - Abstract
We introduce a new method designed for Bayesian inference of the angular power spectrum of the Gravitational Wave Background (GWB) anisotropy. This scheme works with time-series data and can optionally incorporate the cross-correlations between the GWB anisotropy and other cosmological tracers, enhancing the significance of Bayesian inference. We employ the realistic LISA response and noise model to demonstrate the validity of this approach. The findings indicate that, without considering any cross-correlations, the 4-year LISA data is insufficient to achieve a significant detection of multipoles. However, if the anisotropies in the GWB are strongly correlated with the Cosmic Microwave Background (CMB), the 4-year data can provide unbiased estimates of the quadrupole moment ($\ell = 2$). This reconstruction process is generic and not restricted to any specific detector, offering a new framework for extracting anisotropies in the GWB data from various current and future gravitational wave observatories., Comment: 11 pages, 4 figures
- Published
- 2024
183. AVA: Fault-tolerant Reconfigurable Geo-Replication on Heterogeneous Clusters
- Author
-
Mane, Tejas, Li, Xiao, Sadoghi, Mohammad, and Lesani, Mohsen
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
Fault-tolerant replicated database systems consume less energy than the compute-intensive proof-of-work blockchain. Thus, they are promising technologies for the building blocks that assemble global financial infrastructure. To facilitate global scaling, clustered replication protocols are essential in orchestrating nodes into clusters based on proximity. However, the existing approaches often assume a homogeneous and fixed model in which the number of nodes across clusters is the same and fixed, and are often limited to a fail-stop fault model. This paper presents heterogeneous and reconfigurable clustered replication for the general environment with arbitrary failures. In particular, we present AVA, a fault-tolerant reconfigurable geo-replication that allows dynamic membership: replicas are allowed to join and leave clusters. We formally state and prove the safety and liveness properties of the protocol. Furthermore, our replication protocol is consensus-agnostic, meaning each cluster can utilize any local replication mechanism. In our comprehensive evaluation, we instantiate our replication with both HotStuff and BFT-SMaRt. Experiments on geo-distributed deployments on Google Cloud demonstrate that members of clusters can be reconfigured without considerably affecting transaction processing, and that heterogeneity of clusters may significantly improve throughput.
- Published
- 2024
184. MPBD-LSTM: A Predictive Model for Colorectal Liver Metastases Using Time Series Multi-phase Contrast-Enhanced CT Scans
- Author
-
Li, Xueyang, Xiao, Han, Weng, Weixiang, Xu, Xiaowei, and Shi, Yiyu
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Colorectal cancer is a prevalent form of cancer, and many patients develop colorectal cancer liver metastasis (CRLM) as a result. Early detection of CRLM is critical for improving survival rates. Radiologists usually rely on a series of multi-phase contrast-enhanced computed tomography (CECT) scans done during follow-up visits to perform early detection of the potential CRLM. These scans form unique five-dimensional data (time, phase, and axial, sagittal, and coronal planes in 3D CT). Most of the existing deep learning models can readily handle four-dimensional data (e.g., time-series 3D CT images) and it is not clear how well they can be extended to handle the additional dimension of phase. In this paper, we build a dataset of time-series CECT scans to aid in the early diagnosis of CRLM, and build upon state-of-the-art deep learning techniques to evaluate how to best predict CRLM. Our experimental results show that a multi-plane architecture based on 3D bi-directional LSTM, which we call MPBD-LSTM, works best, achieving an area under curve (AUC) of 0.79. On the other hand, analysis of the results shows that there is still great room for further improvement.
- Published
- 2024
185. PROFIT: A Specialized Optimizer for Deep Fine Tuning
- Author
-
Chakravarthy, Anirudh S, Zheng, Shuai Kyle, Huang, Xin, Hemachandra, Sachithra, Zhang, Xiao, Chai, Yuning, and Chen, Zhao
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Fine-tuning pre-trained models has become invaluable in computer vision and robotics. Recent fine-tuning approaches focus on improving efficiency rather than accuracy by using a mixture of smaller learning rates or frozen backbones. To return the spotlight to model accuracy, we present PROFIT (Proximally Restricted Optimizer For Iterative Training), one of the first optimizers specifically designed for incrementally fine-tuning converged models on new tasks or datasets. Unlike traditional optimizers such as SGD or Adam, which make minimal assumptions due to random initialization, PROFIT leverages the structure of a converged model to regularize the optimization process, leading to improved results. By employing a simple temporal gradient orthogonalization process, PROFIT outperforms traditional fine-tuning methods across various tasks: image classification, representation learning, and large-scale motion prediction. Moreover, PROFIT is encapsulated within the optimizer logic, making it easily integrated into any training pipeline with minimal engineering effort. A new class of fine-tuning optimizers like PROFIT can drive advancements as fine-tuning and incremental training become increasingly prevalent, reducing reliance on costly model training from scratch., Comment: technical report
- Published
- 2024
186. Clustering and Runaway Merging in a Primordial Black Hole Dominated Universe
- Author
-
Holst, Ian, Krnjaic, Gordan, and Xiao, Huangyu
- Subjects
Astrophysics - Cosmology and Nongalactic Astrophysics ,High Energy Physics - Phenomenology - Abstract
If primordial black holes (PBH) are present in the early universe, their contribution to the energy budget grows relative to that of radiation and generically becomes dominant unless the initial abundance is exponentially small. This black hole domination scenario is largely unconstrained for PBHs with masses $\lesssim 10^9\,\mathrm{g}$, which evaporate prior to Big Bang nucleosynthesis. However, if the era of PBH domination is sufficiently long, the PBHs form clusters and can merge appreciably within these objects. We calculate the population statistics of these clusters within the Press-Schechter formalism and find that, for a wide range of PBH masses and Hubble rates at the onset of PBH domination, the mergers within PBH clusters can exhibit runaway behavior, where the majority of the cluster will eventually form a single black hole with a mass much greater than the original PBH mass. These mergers can dramatically alter the PBH mass distribution and leave behind merged relic black holes that evaporate after Big Bang nucleosynthesis and yield various observational signatures, excluding parameter choices previously thought to be viable., Comment: 16 pages, 5 figures, 1 appendix
- Published
- 2024
187. The Structure, Populations and Kinematics of the Milky Way central and inner Bulge with OGLE, APOGEE and Gaia data
- Author
-
Han, Xiao, Wang, Hai-Feng, Carraro, Giovanni, López-Corredoira, Martín, Ting, Yuan-Sen, Luo, Yang-Ping, and Wang, Guan-Yu
- Subjects
Astrophysics - Astrophysics of Galaxies - Abstract
We present an analysis of the structure, kinematics, and chemo-dynamical properties of the Milky Way bulge using data from OGLE, APOGEE, and Gaia. Firstly, we identified 2,156 ab-type RR Lyrae stars (RRabs) from OGLE-IV, then through their apocenters derived from orbital integration, we distinguished three populations: the central bulge RRabs, the inner bulge RRabs and halo interlopers. Inner bulge RRabs kinematically align with the Galactic bar, while central bulge RRabs show slower rotation with lower velocity dispersion, which do not trace the bar. Higher velocity dispersion stars were identified as halo interlopers. Then, orbital analysis of 28,188 APOGEE Red Clump and Red Giant Branch stars revealed kinematic properties consistent with RRabs, and chemical abundance distribution displayed a bimodal stellar density pattern, suggesting complex star evolution histories and also slightly different star formation histories for the inner bulge and central bulge. The differences in the density distribution on the $|\mathrm{Z}|_{\text{max}}$-eccentricity plane for the central bulge, inner bulge, and halo regions are clearly detected. Finally, the chemodynamical analysis of 301,485 Gaia DR3 red giants without orbital integration indicated that metal-rich bulge stars form a bar-like structure, while metal-poor bulge stars are dominated by velocity dispersion. It is found that the classification of bulge stars based on orbital parameters, rather than solely on metallicity, provides a more accurate population separation. Our results also support that secular evolution of the Galactic disk is the primary origin of the bulge, and boxy/peanut (B-P) bulge population might be more dominant than X-shape bulge population., Comment: 24 pages, 20 figures
- Published
- 2024
188. BDefects4NN: A Backdoor Defect Database for Controlled Localization Studies in Neural Networks
- Author
-
Xiao, Yisong, Liu, Aishan, Zhang, Xinwei, Zhang, Tianyuan, Li, Tianlin, Liang, Siyuan, Liu, Xianglong, Liu, Yang, and Tao, Dacheng
- Subjects
Computer Science - Software Engineering - Abstract
Pre-trained large deep learning models are now serving as the dominant component for downstream middleware users and have revolutionized the learning paradigm, replacing the traditional approach of training from scratch locally. To reduce development costs, developers often integrate third-party pre-trained deep neural networks (DNNs) into their intelligent software systems. However, utilizing untrusted DNNs presents significant security risks, as these models may contain intentional backdoor defects resulting from the black-box training process. These backdoor defects can be activated by hidden triggers, allowing attackers to maliciously control the model and compromise the overall reliability of the intelligent software. To ensure the safe adoption of DNNs in critical software systems, it is crucial to establish a backdoor defect database for localization studies. This paper addresses this research gap by introducing BDefects4NN, the first backdoor defect database, which provides labeled backdoor-defected DNNs at the neuron granularity and enables controlled localization studies of defect root causes. In BDefects4NN, we define three defect injection rules and employ four representative backdoor attacks across four popular network architectures and three widely adopted datasets, yielding a comprehensive database of 1,654 backdoor-defected DNNs with four defect quantities and varying infected neurons. Based on BDefects4NN, we conduct extensive experiments on evaluating six fault localization criteria and two defect repair techniques, which show limited effectiveness for backdoor defects. Additionally, we investigate backdoor-defected models in practical scenarios, specifically in lane detection for autonomous driving and large language models (LLMs), revealing potential threats and highlighting current limitations in precise defect localization., Comment: 11 pages, accepted by ICSE 2025
- Published
- 2024
189. Integrated simulation of cavity design and radiation transport codes (ACE3P + Geant4)
- Author
-
Ge, Lixin, Li, Zenghai, Ng, Cho-Kuen, Xiao, Liling, Ego, Hiroyasu, Enomoto, Yoshinori, Iwase, Hiroshi, Morikawa, Yu, and Yoshimoto, Takashi
- Subjects
Physics - Computational Physics ,Physics - Accelerator Physics - Abstract
A simulation workflow has been developed to study dark current (DC) radiation effects using ACE3P and Geant4. The integrated workflow interfaces particle data transfer and geometry between the electromagnetic (EM) cavity simulation code ACE3P and the radiation code Geant4, targeting large-scale problems using high-performance computing. The process begins by calculating the operating mode in the vacuum region of an accelerator structure and tracking field-emitted electrons influenced by the EM fields of the mode calculated by ACE3P. It then transfers particle data at the vacuum-wall interface for subsequent radiation calculations within the wall enclosure materials through Geant4 calculation. The whole integrated simulation workflow will be demonstrated through large-scale dark current radiation calculations for the KEK 56-cell traveling-wave structure, and the efficiency of performing these simulations on the NERSC supercomputer Perlmutter will be presented., Comment: 11 pages, 13 figures. Appears in the proceedings of the 14th International Computational Accelerator Physics Conference, 2-5 October 2024, Germany
- Published
- 2024
190. DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding
- Author
-
Wu, Hao, Zhong, Zhihang, and Sun, Xiao
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Image captioning models often suffer from performance degradation when applied to novel datasets, as they are typically trained on domain-specific data. To enhance generalization in out-of-domain scenarios, retrieval-augmented approaches have garnered increasing attention. However, current methods face two key challenges: (1) image features used for retrieval are often optimized based on ground-truth (GT) captions, which represent the image from a specific perspective and are influenced by annotator biases, and (2) they underutilize the full potential of retrieved text, typically relying on raw captions or parsed objects, which fail to capture the full semantic richness of the data. In this paper, we propose Dive Into Retrieval (DIR), a method designed to enhance both the image-to-text retrieval process and the utilization of retrieved text to achieve a more comprehensive understanding of the visual content. Our approach introduces two key innovations: (1) diffusion-guided retrieval enhancement, where a pretrained diffusion model guides image feature learning by reconstructing noisy images, allowing the model to capture more comprehensive and fine-grained visual information beyond standard annotated captions; and (2) a high-quality retrieval database, which provides comprehensive semantic information to enhance caption generation, especially in out-of-domain scenarios. Extensive experiments demonstrate that DIR not only maintains competitive in-domain performance but also significantly improves out-of-domain generalization, all without increasing inference costs.
- Published
- 2024
191. RoboHanger: Learning Generalizable Robotic Hanger Insertion for Diverse Garments
- Author
-
Chen, Yuxing, Wei, Songlin, Xiao, Bowen, Lyu, Jiangran, Chen, Jiayi, Zhu, Feng, and Wang, He
- Subjects
Computer Science - Robotics - Abstract
For the task of hanging clothes, learning how to insert a hanger into a garment is crucial but has been seldom explored in robotics. In this work, we address the problem of inserting a hanger into various unseen garments that are initially laid out flat on a table. This task is challenging due to its long-horizon nature, the high degrees of freedom of the garments, and the lack of data. To simplify the learning process, we first propose breaking the task into several stages. Then, we formulate each stage as a policy learning problem and propose low-dimensional action parameterization. To overcome the challenge of limited data, we build our own simulator and create 144 synthetic clothing assets to effectively collect high-quality training data. Our approach uses single-view depth images and object masks as input, which mitigates the Sim2Real appearance gap and achieves high generalization capabilities for new garments. Extensive experiments in both simulation and the real world validate our proposed method. By training on various garments in the simulator, our method achieves a 75\% success rate with 8 different unseen garments in the real world., Comment: Project website: https://chen01yx.github.io/Robohanger_Index/
- Published
- 2024
192. Practitioners' Expectations on Log Anomaly Detection
- Author
-
Ma, Xiaoxue, Li, Yishu, Keung, Jacky, Yu, Xiao, Zou, Huiqi, Yang, Zhen, Sarro, Federica, and Barr, Earl T.
- Subjects
Computer Science - Software Engineering - Abstract
Log anomaly detection has become a common practice for software engineers to analyze software system behavior. Despite significant research efforts in log anomaly detection over the past decade, it remains unclear what practitioners' expectations of log anomaly detection are and whether current research meets their needs. To fill this gap, we conduct an empirical study, surveying 312 practitioners from 36 countries about their expectations of log anomaly detection. In particular, we investigate various factors influencing practitioners' willingness to adopt log anomaly detection tools. We then perform a literature review on log anomaly detection, focusing on publications in premier venues from 2014 to 2024, to compare practitioners' needs with the current state of research. Based on this comparison, we highlight the directions for researchers to focus on to develop log anomaly detection techniques that better meet practitioners' expectations.
- Published
- 2024
193. FreeCodec: A disentangled neural speech codec with fewer tokens
- Author
-
Zheng, Youqiang, Tu, Weiping, Kang, Yueteng, Chen, Jie, Zhang, Yike, Xiao, Li, Yang, Yuhong, and Ma, Long
- Subjects
Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Neural speech codecs have gained great attention for their outstanding reconstruction with discrete token representations. They are a crucial component in generative tasks such as speech coding and large language models (LLMs). However, most works based on residual vector quantization perform worse with fewer tokens due to low coding efficiency for modeling complex coupled information. In this paper, we propose a neural speech codec named FreeCodec which employs a more effective encoding framework by decomposing intrinsic properties of speech into different components: 1) a global vector is extracted as the timbre information, 2) a prosody encoder with a long stride level is used to model the prosody information, 3) the content information is from a content encoder. Using different training strategies, FreeCodec achieves state-of-the-art performance in reconstruction and disentanglement scenarios. Results from subjective and objective experiments demonstrate that our framework outperforms existing methods. Comment: Submitted to ICASSP 2025. Code and demo page: https://github.com/exercise-book-yq/FreeCodec
- Published
- 2024
194. Deep Learning Based Near-Field User Localization with Beam Squint in Wideband XL-MIMO Systems
- Author
-
Lei, Hao, Zhang, Jiayi, Xiao, Huahua, Ng, Derrick Wing Kwan, and Ai, Bo
- Subjects
Electrical Engineering and Systems Science - Signal Processing ,Computer Science - Information Theory - Abstract
Extremely large-scale multiple-input multiple-output (XL-MIMO) is gaining attention as a prominent technology for enabling the sixth-generation (6G) wireless networks. However, the vast antenna array and the huge bandwidth introduce a non-negligible beam squint effect, causing beams of different frequencies to focus at different locations. One approach to cope with this is to employ true-time-delay lines (TTDs)-based beamforming to control the range and trajectory of near-field beam squint, known as the near-field controllable beam squint (CBS) effect. In this paper, we investigate the user localization in near-field wideband XL-MIMO systems under the beam squint effect and spatial non-stationary properties. Firstly, we derive the expressions for Cramér-Rao Bounds (CRBs) for characterizing the performance of estimating both angle and distance. This analysis aims to assess the potential of leveraging CBS for precise user localization. Secondly, a user localization scheme combining CBS and beam training is proposed. Specifically, we organize multiple subcarriers into groups, directing beams from different groups to distinct angles or distances through the CBS to obtain the estimates of users' angles and distances. Furthermore, we design a user localization scheme based on a convolutional neural network model, namely ConvNeXt. This scheme utilizes the inputs and outputs of the CBS-based scheme to generate high-precision estimates of angle and distance. More importantly, our proposed ConvNeXt-based user localization scheme achieves centimeter-level accuracy in localization estimates.
- Published
- 2024
195. Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation
- Author
-
Lai, Bolin, Juefei-Xu, Felix, Liu, Miao, Dai, Xiaoliang, Mehta, Nikhil, Zhu, Chenguang, Huang, Zeyi, Rehg, James M., Lee, Sangmin, Zhang, Ning, and Xiao, Tong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Text-guided image manipulation has experienced notable advancement in recent years. In order to mitigate linguistic ambiguity, few-shot learning with visual examples has been applied for instructions that are underrepresented in the training set, or difficult to describe purely in language. However, learning from visual prompts requires strong reasoning capability, which diffusion models are struggling with. To address this issue, we introduce a novel multi-modal autoregressive model, dubbed $\textbf{InstaManip}$, that can $\textbf{insta}$ntly learn a new image $\textbf{manip}$ulation operation from textual and visual guidance via in-context learning, and apply it to new query images. Specifically, we propose an innovative group self-attention mechanism to break down the in-context learning process into two separate stages -- learning and applying, which simplifies the complex problem into two easier tasks. We also introduce a relation regularization method to further disentangle image transformation features from irrelevant contents in exemplar images. Extensive experiments suggest that our method surpasses previous few-shot image manipulation models by a notable margin ($\geq$19% in human evaluation). We also find our model can be further boosted by increasing the number or diversity of exemplar images. Comment: 18 pages, 16 figures, 5 tables
- Published
- 2024
196. Generating AI Literacy MCQs: A Multi-Agent LLM Approach
- Author
-
Wang, Jiayi, Xiao, Ruiwei, and Tseng, Ying-Jui
- Subjects
Computer Science - Human-Computer Interaction - Abstract
Artificial intelligence (AI) is transforming society, making it crucial to prepare the next generation through AI literacy in K-12 education. However, scalable and reliable AI literacy materials and assessment resources are lacking. To address this gap, our study presents a novel approach to generating multiple-choice questions (MCQs) for AI literacy assessments. Our method utilizes large language models (LLMs) to automatically generate scalable, high-quality assessment questions. These questions align with user-provided learning objectives, grade levels, and Bloom's Taxonomy levels. We introduce an iterative workflow incorporating LLM-powered critique agents to ensure the generated questions meet pedagogical standards. In the preliminary evaluation, experts expressed strong interest in using the LLM-generated MCQs, indicating that this system could enrich existing AI literacy materials and provide a valuable addition to the toolkit of K-12 educators.
- Published
- 2024
197. Brownian spin-locking effect
- Author
-
Zhang, Xiao, Chen, Peiyang, Li, Mei, Shi, Yuzhi, Hasman, Erez, Wang, Bo, and Chen, Xianfeng
- Subjects
Physics - Optics ,Condensed Matter - Disordered Systems and Neural Networks ,Physics - Applied Physics ,Physics - Biological Physics - Abstract
Brownian systems are characterized by spatiotemporal disorder, which arises from the erratic motion of particles driven by thermal fluctuations. When light interacts with such systems, it typically produces unpolarized and uncorrelated fields. Here, we report the observation of a large-scale spin-locking effect of light within a Brownian medium. In an observation direction perpendicular to the incident wave momentum, scattering naturally divides into two diffusion regions, each associated with an opposite spin from the Brownian nanoparticles. This effect arises from the intrinsic spin-orbit interactions of scattering from individual nanoparticles, which ubiquitously generate radiative spin fields that propagate through the Brownian medium with multiple incoherent scattering. It offers a novel experimental platform for exploring macroscale spin behaviors of diffused light, with potential applications in precision metrology for measuring various nanoparticle properties. Our findings may inspire the study of analogous phenomena for different waves from novel spin-orbit interactions in complex disordered systems. Comment: 48 pages, 20 figures
- Published
- 2024
198. Operator learning regularization for macroscopic permeability prediction in dual-scale flow problem
- Author
-
Runkel, Christina, Xiao, Sinan, Boullé, Nicolas, and Chen, Yang
- Subjects
Physics - Fluid Dynamics ,Computer Science - Machine Learning ,Mathematics - Numerical Analysis ,Physics - Computational Physics - Abstract
Liquid composites moulding is an important manufacturing technology for fibre reinforced composites, due to its cost-effectiveness. Challenges lie in the optimisation of the process due to the lack of understanding of a key characteristic of textile fabrics: permeability. The problem of computing the permeability coefficient can be modelled as the well-known Stokes-Brinkman equation, which introduces a heterogeneous parameter $\beta$ distinguishing macropore regions and fibre-bundle regions. In the present work, we train a Fourier neural operator to learn the nonlinear map from the heterogeneous coefficient $\beta$ to the velocity field $u$, and recover the corresponding macroscopic permeability $K$. This is a challenging inverse problem since both the input and output fields span several orders of magnitude, so we introduce different regularization techniques for the loss function and perform a quantitative comparison between them. Comment: 23 pages, 7 figures
- Published
- 2024
199. ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models
- Author
-
Ye, Xubing, Gan, Yukang, Ge, Yixiao, Zhang, Xiao-Ping, and Tang, Yansong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Large Vision Language Models (LVLMs) have achieved significant success across multi-modal tasks. However, the computational cost of processing long visual tokens can be prohibitively expensive on resource-limited devices. Previous methods have identified redundancy in visual tokens within the Large Language Model (LLM) decoder layers and have mitigated this by pruning tokens using a pre-defined or fixed ratio, thereby reducing computational overhead. Nonetheless, we observe that the impact of pruning ratio varies across different LLM layers and instances (image-prompt pairs). Therefore, it is essential to develop a layer-wise and instance-wise vision token pruning strategy to balance computational cost and model performance effectively. We propose ATP-LLaVA, a novel approach that adaptively determines instance-specific token pruning ratios for each LLM layer. Specifically, we introduce an Adaptive Token Pruning (ATP) module, which computes the importance score and pruning threshold based on input instance adaptively. The ATP module can be seamlessly integrated between any two LLM layers with negligible computational overhead. Additionally, we develop a Spatial Augmented Pruning (SAP) strategy that prunes visual tokens with both token redundancy and spatial modeling perspectives. Our approach reduces the average token count by 75% while maintaining performance, with only a minimal 1.9% degradation across seven widely used benchmarks. The project page can be accessed via https://yxxxb.github.io/ATP-LLaVA-page/. Comment: 11 pages, 4 figures
- Published
- 2024
200. Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training
- Author
-
Wang, Haicheng, Ju, Chen, Lin, Weixiong, Xiao, Shuai, Chen, Mengting, Huang, Yixuan, Liu, Chang, Yao, Mingshuai, Lan, Jinsong, Chen, Ying, Liu, Qingwen, and Wang, Yanfeng
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In the rapidly evolving field of vision-language models (VLMs), contrastive language-image pre-training (CLIP) has made significant strides, becoming the foundation for various downstream tasks. However, relying on a one-to-one (image, text) contrastive paradigm to learn alignment from large-scale messy web data, CLIP faces a serious myopic dilemma, resulting in biases towards monotonous short texts and shallow visual expressivity. To overcome these issues, this paper advances CLIP into a novel holistic paradigm, by updating both diverse data and alignment optimization. To obtain colorful data with low cost, we use image-to-text captioning to generate multi-texts for each image, from multiple perspectives, granularities, and hierarchies. Two gadgets are proposed to encourage textual diversity. To match such (image, multi-texts) pairs, we modify the CLIP image encoder into multi-branch, and propose multi-to-multi contrastive optimization for image-text part-to-part matching. As a result, diverse visual embeddings are learned for each image, bringing good interpretability and generalization. Extensive experiments and ablations across over ten benchmarks indicate that our holistic CLIP significantly outperforms existing myopic CLIP, including image-text retrieval, open-vocabulary classification, and dense visual tasks.
- Published
- 2024
Discovery Service for Jio Institute Digital Library