14,408 results on '"Li Ang"'
Search Results
52. Effects of cutting head load on fatigue life of bolter miner cutting arm
- Author
- Li Ang, Wang Bukang, Wang Teng, Guo Zhifu, and Yan Zhaokun
- Subjects
fatigue analysis ,bolter miner ,cutting arm ,rigid flexible coupling simulation ,Materials of engineering and construction. Mechanics of materials ,TA401-492
- Abstract
In the operation of bolter miners, the cutting arm is an essential and weak part and its fatigue life directly affects its performance. This study aimed to investigate the influence of the cutting head load on the fatigue life of a cutting arm using the DEM-MFBD (Discrete Element Method-Multi Flexible Body Dynamics) bi-directional coupling technique. The EJM340 bolter miner was chosen as the research object, and a three-dimensional solid model of the bolter miner was built using the RecurDyn software. The cutting arm was flexibly modelled, and the tunnel model was built using the EDEM software. The motion parameters of the bolter miner and cutting head load were transferred through the bi-directional coupling interface to obtain the loads and stress parameters during the entire tunnel cutting process. Based on the stress-time variation, the fatigue life of the cutting arm was calculated, the overall damage and crack initiation locations were obtained, and the minimum number of cutting arm cycles was determined. The accuracy of the virtual model is verified through field experiments. The analysis results indicated that the crack emergence location and fatigue life obtained from the simulation were in agreement with the experimental results.
- Published
- 2023
- Full Text
- View/download PDF
53. The complete chloroplast genome sequence of Davidia involucrata
- Author
- Li Ang and Hou Zhe
- Subjects
davidia involucrata ,chloroplast genome ,phylogenetic analysis ,Genetics ,QH426-470
- Abstract
Davidia involucrata Baill. is a Tertiary paleotropical relic species endemic to China. This rare plant is disappearing due to poor adaptability and serious poaching. Davidia involucrata has been listed as a national first-level protected wild plant; it belongs to a genus unique to China and is a well-known ornamental plant worldwide. In this study, the complete chloroplast genome sequence of D. involucrata was characterized from Illumina paired-end sequencing. The chloroplast genome of D. involucrata was 158,118 bp in length, containing a large single-copy (LSC) region of 87,329 bp, a small single-copy (SSC) region of 18,869 bp, and two inverted repeat (IR) regions of 25,960 bp each. The overall GC content is 37.80%, while the corresponding values of the LSC, SSC, and IR regions are 36.0%, 31.6%, and 43.1%, respectively. The genome contains 132 complete genes, including 86 protein-coding genes, 38 tRNA genes, and eight rRNA genes. Phylogenetic analysis based on complete chloroplast genomes showed that D. involucrata and Camptotheca acuminata clustered together as sisters to other related species.
- Published
- 2021
- Full Text
- View/download PDF
54. Numerical and Experimental Study on Seakeeping Performance of a High-Speed Trimaran with T-foil in Head Waves
- Author
- Li Ang and Li Yunbo
- Subjects
trimaran ,t-foil ,urans ,longitudinal motion ,model test ,Naval architecture. Shipbuilding. Marine engineering ,VM1-989
- Abstract
The longitudinal motion characteristics of a slender trimaran with and without a T-foil near the bow are investigated by experimental and numerical methods. A computational fluid dynamics (CFD) method is used in this study. Seakeeping characteristics such as heave, pitch and vertical acceleration in regular head waves are analyzed under various wave conditions. Numerical simulations have been validated by comparisons with experimental tests. The influence of large wave amplitudes and the size of the T-foil on the longitudinal motion of the trimaran is analyzed. The present systematic study demonstrates that the numerical results are in reasonable agreement with the experimental data. The research implied that the longitudinal motion response values are greatly reduced with the use of the T-foil.
- Published
- 2019
- Full Text
- View/download PDF
55. Key factors and developmental directions with regard to metal additive manufacturing
- Author
- LI Ang, LIU Xue-feng, YU Bo, and YIN Bao-qiang
- Subjects
metal additive manufacturing technology ,metal additive manufacturing equipment ,metal additive manufacturing material ,metal additive manufacturing process ,critical factor ,development direction ,Mining engineering. Metallurgy ,TN1-997 ,Environmental engineering ,TA170-171
- Abstract
Metal additive manufacturing is a new type of material-forming technology characterized by its short process and near net shape. Equipment, material and process are critical factors which serve as the supporter, key, and foundation respectively in terms of the development of this technology. In this paper, the characteristics of the equipment, material, and process of the different representative technologies were summarized. The relations among metal additive manufacturing equipment, manufacturing material, and manufacturing process as well as their roles in the metal additive manufacturing technology were analyzed. The research status of the raw material supply system, forming system, and control system were reviewed. The typical microstructure and mechanical properties of metal additive manufacturing materials, such as titanium alloy, nickel alloy, aluminum alloy, and steel, were summarized. The effects of the manufacturing process parameters on residual stress, porosity, accuracy, and microstructure were discussed. Problems associated with the manufacturing equipment, such as high cost, limited forming size, and low forming efficiency were discussed along with the problems associated with the material, such as high production cost and poor applicability. Furthermore, problems associated with the metal additive manufacturing process, such as difficult matching of process parameters and severe thermal accumulation, were elaborated as well. 
Future developmental goals in metal additive manufacturing include: (a) reducing the cost of manufacturing equipment and material, (b) expanding the range of product forming size, (c) improving the product printing accuracy and forming efficiency, (d) expanding the types and application scope of metal additive manufacturing material, (e) reducing the difficulty in the matching of process parameters, (f) improving product quality and comprehensive performance, and (g) developing new types of metal additive manufacturing technologies.
- Published
- 2019
- Full Text
- View/download PDF
56. On Scaling Up 3D Gaussian Splatting Training
- Author
- Zhao, Hexu; Weng, Haoyang; Lu, Daohan; Li, Ang; Li, Jinyang; Panda, Aurojit; and Xie, Saining
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,I.4.5
- Abstract
3D Gaussian Splatting (3DGS) is increasingly popular for 3D reconstruction due to its superior visual quality and rendering speed. However, 3DGS training currently occurs on a single GPU, limiting its ability to handle high-resolution and large-scale 3D reconstruction tasks due to memory constraints. We introduce Grendel, a distributed system designed to partition 3DGS parameters and parallelize computation across multiple GPUs. As each Gaussian affects a small, dynamic subset of rendered pixels, Grendel employs sparse all-to-all communication to transfer the necessary Gaussians to pixel partitions and performs dynamic load balancing. Unlike existing 3DGS systems that train using one camera view image at a time, Grendel supports batched training with multiple views. We explore various optimization hyperparameter scaling strategies and find that a simple sqrt(batch size) scaling rule is highly effective. Evaluations using large-scale, high-resolution scenes show that Grendel enhances rendering quality by scaling up 3DGS parameters across multiple GPUs. On the Rubble dataset, we achieve a test PSNR of 27.28 by distributing 40.4 million Gaussians across 16 GPUs, compared to a PSNR of 26.28 using 11.2 million Gaussians on a single GPU. Grendel is an open-source project available at: https://github.com/nyu-systems/Grendel-GS
- Comment
- Code: https://github.com/nyu-systems/Grendel-GS; Project page: https://daohanlu.github.io/scaling-up-3dgs
- Published
- 2024
57. Forget but Recall: Incremental Latent Rectification in Continual Learning
- Author
- Nguyen, Nghia D.; Nguyen, Hieu Trung; Li, Ang; Pham, Hoang; Nguyen, Viet Anh; and Doan, Khoa D.
- Subjects
Computer Science - Machine Learning ,Computer Science - Computer Vision and Pattern Recognition
- Abstract
Intrinsic capability to continuously learn a changing data stream is a desideratum of deep neural networks (DNNs). However, current DNNs suffer from catastrophic forgetting, which hinders remembering past knowledge. To mitigate this issue, existing Continual Learning (CL) approaches either retain exemplars for replay, regularize learning, or allocate dedicated capacity for new tasks. This paper investigates an unexplored CL direction for incremental learning called Incremental Latent Rectification or ILR. In a nutshell, ILR learns to propagate with correction (or rectify) the representation from the current trained DNN backward to the representation space of the old task, where performing predictive decisions is easier. This rectification process only employs a chain of small representation mapping networks, called rectifier units. Empirical experiments on several continual learning benchmarks, including CIFAR10, CIFAR100, and Tiny ImageNet, demonstrate the effectiveness and potential of this novel CL direction compared to existing representative CL methods.
- Published
- 2024
58. What Matters in Transformers? Not All Attention is Needed
- Author
- He, Shwai; Sun, Guoheng; Shen, Zheyu; and Li, Ang
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language
- Abstract
While scaling Transformer-based large language models (LLMs) has demonstrated promising performance across various tasks, it also introduces redundant architectures, posing efficiency challenges for real-world deployment. Despite some recognition of redundancy in LLMs, the variability of redundancy across different architectures in transformers, such as MLP and Attention layers, is under-explored. In this work, we investigate redundancy across different modules within Transformers, including Blocks, MLP, and Attention layers, using a similarity-based metric. Surprisingly, despite the critical role of attention layers in distinguishing transformers from other architectures, we found that a large portion of these layers exhibit excessively high similarity and can be pruned without degrading performance. For instance, Llama-2-70B achieved a 48.4% speedup with only a 2.4% performance drop by pruning half of the attention layers. Furthermore, by tracing model checkpoints throughout the training process, we observed that attention layer redundancy is inherent and consistent across training stages. Additionally, we further propose a method that jointly drops Attention and MLP layers, allowing us to more aggressively drop additional layers. For instance, when dropping 31 layers (Attention + MLP), Llama-2-13B still retains 90% of the performance on the MMLU task. Our work provides valuable insights for future network architecture design. The code is released at: https://github.com/Shwai-He/LLM-Drop.
- Comment
- 15 pages, 13 figures, 6 tables
- Published
- 2024
59. PID: Prompt-Independent Data Protection Against Latent Diffusion Models
- Author
- Li, Ang; Mo, Yichuan; Li, Mingjie; and Wang, Yisen
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Artificial Intelligence
- Abstract
The few-shot fine-tuning of Latent Diffusion Models (LDMs) has enabled them to grasp new concepts from a limited number of images. However, given the vast amount of personal images accessible online, this capability raises critical concerns about civil privacy. While several previous defense methods have been developed to prevent such misuse of LDMs, they typically assume that the textual prompts used by data protectors exactly match those employed by data exploiters. In this paper, we first empirically demonstrate that breaking this assumption, i.e., in cases where discrepancies exist between the textual conditions used by protectors and exploiters, could substantially reduce the effectiveness of these defenses. Furthermore, considering the visual encoder's independence from textual prompts, we delve into the visual encoder and thoroughly investigate how manipulating the visual encoder affects the few-shot fine-tuning process of LDMs. Drawing on these insights, we propose a simple yet effective method called Prompt-Independent Defense (PID) to safeguard privacy against LDMs. We show that PID can act as a strong privacy shield on its own while requiring significantly less computational power. We believe our studies, along with the comprehensive understanding and new defense method, provide a notable advance toward reliable data protection against LDMs.
- Comment
- 27 pages, ICML 2024 poster
- Published
- 2024
60. Demystifying the Compression of Mixture-of-Experts Through a Unified Framework
- Author
- He, Shwai; Dong, Daize; Ding, Liang; and Li, Ang
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence
- Abstract
Scaling large language models has revolutionized the performance across diverse domains, yet the continual growth in model size poses significant challenges for real-world deployment. The Mixture of Experts (MoE) approach addresses this by dynamically selecting and activating only a subset of experts, significantly reducing computational costs while maintaining high performance. However, MoE introduces potential redundancy (e.g., parameters) and extra costs (e.g., communication overhead). Despite numerous compression techniques developed for mitigating the redundancy in dense models, the compression of MoE remains under-explored. We first bridge this gap with a cutting-edge unified framework that not only seamlessly integrates mainstream compression methods but also helps systematically understand MoE compression. This framework approaches compression from two perspectives: Expert Slimming, which compresses individual experts, and Expert Trimming, which removes structured modules. Within this framework, we explore the optimization space unexplored by existing methods, and further introduce aggressive Expert Trimming techniques, i.e., Layer Drop and Block Drop, to eliminate redundancy at larger scales. Based on these insights, we present a comprehensive recipe to guide practitioners in compressing MoE effectively. Extensive experimental results demonstrate the effectiveness of the compression methods under our framework and the proposed recipe, achieving a 6.05x speedup and only 20.0GB memory usage while maintaining over 92% of performance on Mixtral-8x7B. Code is released at https://github.com/DaizeDong/Unified-MoE-Compression.
- Comment
- 20 pages, 15 figures, 5 tables
- Published
- 2024
61. A New Solution for MU-MISO Symbol-Level Precoding: Extrapolation and Deep Unfolding
- Author
- Liang, Mu; Li, Ang; Hu, Xiaoyan; and Masouros, Christos
- Subjects
Electrical Engineering and Systems Science - Signal Processing
- Abstract
Constructive interference (CI) precoding, which converts the harmful multi-user interference into beneficial signals, is a promising and efficient interference management scheme in multi-antenna communication systems. However, CI-based symbol-level precoding (SLP) experiences high computational complexity as the number of symbol slots increases within a transmission block, rendering it unaffordable in practical communication systems. In this paper, we propose a symbol-level extrapolation (SLE) strategy to extrapolate the precoding matrix by leveraging the relationship between different symbol slots within a transmission block, during which the channel state information (CSI) remains constant, where we design a closed-form iterative algorithm based on SLE for both PSK and QAM modulation. In order to further reduce the computational complexity, a sub-optimal closed-form solution based on SLE is further developed for PSK and QAM, respectively. Moreover, we design an unsupervised SLE-based neural network (SLE-Net) to unfold the proposed iterative algorithm, which helps enhance the interpretability of the neural network. By carefully designing the loss function of the SLE-Net, the time-complexity of the network can be reduced effectively. Extensive simulation results illustrate that the proposed algorithms can dramatically reduce the computational complexity and time complexity with only marginal performance loss, compared with the conventional SLP design methods.
- Published
- 2024
62. Causality in the Can: Diet Coke's Impact on Fatness
- Author
- Qi, Yicheng and Li, Ang
- Subjects
Computer Science - Computers and Society ,Computer Science - Artificial Intelligence
- Abstract
Artificially sweetened beverages like Diet Coke are often considered better alternatives to sugary drinks, but the debate over their impact on health, particularly in relation to obesity, continues. Previous research has predominantly used association-based methods with observational or Randomized Controlled Trial (RCT) data, which may not accurately capture the causal relationship between Diet Coke consumption and obesity, leading to potentially limited conclusions. In contrast, we employed causal inference methods using structural causal models, integrating both observational and RCT data. Specifically, we utilized data from the National Health and Nutrition Examination Survey (NHANES), which includes diverse demographic information, as our observational data source. This data was then used to construct a causal graph, and the back-door criterion, along with its adjustment formula, was applied to estimate the RCT data. We then calculated the counterfactual quantity, the Probability of Necessity and Sufficiency (PNS), using both NHANES data and estimated RCT data. We propose that PNS is the essential metric for assessing the impact of Diet Coke on obesity. Our results indicate that between 20 and 50 percent of individuals, especially those with poor dietary habits, are more likely to gain weight from Diet Coke. Conversely, in groups like young females with healthier diets, only a small proportion experience weight gain due to Diet Coke. These findings highlight the influence of individual lifestyle and potential hormonal factors on the varied effects of Diet Coke, providing a new framework for understanding its nutritional impacts on public health.
- Published
- 2024
63. Surf-Deformer: Mitigating Dynamic Defects on Surface Code via Adaptive Deformation
- Author
- Yin, Keyi; Fang, Xiang; Shi, Yunong; Humble, Travis; Li, Ang; and Ding, Yufei
- Subjects
Quantum Physics
- Abstract
In this paper, we introduce Surf-Deformer, a code deformation framework that seamlessly integrates adaptive defect mitigation functionality into the current surface code workflow. It crafts several basic deformation instructions based on fundamental gauge transformations, which can be combined to explore a larger design space than previous methods. This enables more optimized deformation processes tailored to specific defect situations, restoring the QEC capability of deformed codes more efficiently with minimal qubit resources. Additionally, we design an adaptive code layout that accommodates our defect mitigation strategy while ensuring efficient execution of logical operations. Our evaluation shows that Surf-Deformer outperforms previous methods by significantly reducing the end-to-end failure rate of various quantum programs by 35x to 70x, while requiring only about 50% of the qubit resources compared to the previous method to achieve the same level of failure rate. Ablation studies show that Surf-Deformer surpasses previous defect removal methods in preserving QEC capability and facilitates surface code communication by achieving nearly optimal throughput.
- Published
- 2024
64. Scalable Circuit Cutting and Scheduling in a Resource-constrained and Distributed Quantum System
- Author
- Kan, Shuwen; Du, Zefan; Palma, Miguel; Stein, Samuel A; Liu, Chenxu; Wei, Wenqi; Chen, Juntao; Li, Ang; and Mao, Ying
- Subjects
Quantum Physics ,Computer Science - Distributed, Parallel, and Cluster Computing
- Abstract
Despite quantum computing's rapid development, current systems remain limited in practical applications due to their limited qubit count and quality. Various technologies, such as superconducting, trapped-ion, and neutral-atom quantum computing, are progressing towards a fault-tolerant era; however, they all face a diverse set of challenges in scalability and control. Recent efforts have focused on multi-node quantum systems that connect multiple smaller quantum devices to execute larger circuits. Future demonstrations hope to use quantum channels to couple systems; however, current demonstrations can leverage classical communication with circuit cutting techniques. This involves cutting large circuits into smaller subcircuits and reconstructing them post-execution. However, existing cutting methods are hindered by lengthy search times as the number of qubits and gates increases. Additionally, they often fail to effectively utilize the resources of various worker configurations in a multi-node system. To address these challenges, we introduce FitCut, a novel approach that transforms quantum circuits into weighted graphs and utilizes a community-based, bottom-up approach to cut circuits according to resource constraints, e.g., qubit counts, on each worker. FitCut also includes a scheduling algorithm that optimizes resource utilization across workers. Implemented with Qiskit and evaluated extensively, FitCut significantly outperforms the Qiskit Circuit Knitting Toolbox, reducing time costs by factors ranging from 3 to 2000 and improving resource utilization rates by up to 3.88 times on the worker side, achieving a system-wide improvement of 2.86 times.
- Published
- 2024
65. Benchmarking Optimizers for Qumode State Preparation with Variational Quantum Algorithms
- Author
- Kan, Shuwen; Palma, Miguel; Du, Zefan; Stein, Samuel A; Liu, Chenxu; Chen, Juntao; Li, Ang; and Mao, Ying
- Subjects
Quantum Physics
- Abstract
Quantum state preparation involves preparing a target state from an initial system, a process integral to applications such as quantum machine learning and solving systems of linear equations. Recently, there has been a growing interest in qumodes due to advancements in the field and their potential applications. However, there is a notable gap in the literature specifically addressing this area. This paper aims to bridge this gap by providing performance benchmarks of various optimizers used in state preparation with Variational Quantum Algorithms. We conducted extensive testing across multiple scenarios, including different target states, both ideal and sampling simulations, and varying numbers of basis gate layers. Our evaluations offer insights into the complexity of learning each type of target state and demonstrate that some optimizers perform better than others in this context. Notably, the Powell optimizer was found to be exceptionally robust against sampling errors, making it a preferred choice in scenarios prone to such inaccuracies. Additionally, the Simultaneous Perturbation Stochastic Approximation optimizer was distinguished for its efficiency and ability to handle increased parameter dimensionality effectively.
- Published
- 2024
66. Design of an entanglement purification protocol selection module
- Author
- Shi, Yue; Liu, Chenxu; Stein, Samuel; Wang, Meng; Zheng, Muqing; and Li, Ang
- Subjects
Quantum Physics
- Abstract
Entanglement purification protocols, designed to improve the fidelity of Bell states over quantum networks for inter-node communications, have attracted significant attention over the last few decades. These protocols have great potential to resolve a core challenge in quantum networking of generating high-fidelity Bell states. However, previous studies focused on the theoretical discussion with limited consideration of realistic errors. Studies of dynamically selecting the right purification protocol under various realistic errors that populate in practice have yet to be performed. In this work, we study the performance of various purification protocols under realistic errors by conducting density matrix simulations over a large suite of error models. Based on our findings of how specific error channels affect the performance of purification protocols, we propose a module that can be embedded in the quantum network. This module determines and selects the appropriate purification protocol, considering not only expected specifications from the network layer but also the capabilities of the physical layer. Finally, the performance of our proposed module is verified using two benchmark categories. Compared with the default approach and exhaustive search approach, we show a success rate approaching 90% in identifying the optimal purification protocol for our target applications.
- Published
- 2024
67. An Early Investigation of the HHL Quantum Linear Solver for Scientific Applications
- Author
- Zheng, Muqing; Liu, Chenxu; Stein, Samuel; Li, Xiangyu; Mülmenstädt, Johannes; Chen, Yousu; and Li, Ang
- Subjects
Quantum Physics
- Abstract
In this paper, we explore using the Harrow-Hassidim-Lloyd (HHL) algorithm to address scientific and engineering problems through quantum computing utilizing the NWQSim simulation package on high-performance computing. Focusing on domains such as power-grid management and heat transfer problems, we demonstrate the correlations of the precision of quantum phase estimation, along with various properties of coefficient matrices, on the final solution and quantum resource cost in iterative and non-iterative numerical methods such as Newton-Raphson method and finite difference method, as well as their impacts on quantum error correction costs using Microsoft Azure Quantum resource estimator. We conclude the exponential resource cost from quantum phase estimation before and after quantum error correction and illustrate a potential way to reduce the demands on physical qubits. This work lays down a preliminary step for future investigations, urging a closer examination of quantum algorithms' scalability and efficiency in domain applications.
- Comment
- 21 pages, 8 figures
- Published
- 2024
68. SHED: Shapley-Based Automated Dataset Refinement for Instruction Fine-Tuning
- Author
- He, Yexiao; Wang, Ziyao; Shen, Zheyu; Sun, Guoheng; Dai, Yucong; Wu, Yongkai; Wang, Hongyi; and Li, Ang
- Subjects
Computer Science - Computation and Language ,Computer Science - Machine Learning
- Abstract
The pre-trained Large Language Models (LLMs) can be adapted for many downstream tasks and tailored to align with human preferences through fine-tuning. Recent studies have discovered that LLMs can achieve desirable performance with only a small amount of high-quality data, suggesting that a large amount of the data in these extensive datasets is redundant or even harmful. Identifying high-quality data from vast datasets to curate small yet effective datasets has emerged as a critical challenge. In this paper, we introduce SHED, an automated dataset refinement framework based on Shapley value for instruction fine-tuning. SHED eliminates the need for human intervention or the use of commercial LLMs. Moreover, the datasets curated through SHED exhibit transferability, indicating they can be reused across different LLMs with consistently high performance. We conduct extensive experiments to evaluate the datasets curated by SHED. The results demonstrate SHED's superiority over state-of-the-art methods across various tasks and LLMs; notably, datasets comprising only 10% of the original data selected by SHED achieve performance comparable to or surpassing that of the full datasets.
- Comment
- NeurIPS 2024
- Published
- 2024
69. Numerical study of transitions in lid-driven flows in semicircular cavities
- Author
- Pan, Tsorng-Whay; Li, Ang; and Chiu, Shang-Huan
- Subjects
Physics - Fluid Dynamics
- Abstract
In this article, three-dimensional (3D) lid-driven flows in semicircular cavities are studied. The numerical solution of the Navier-Stokes equations modeling incompressible viscous fluid flow in cavities is obtained via a methodology combining a first-order accurate operator-splitting scheme, a fictitious domain formulation, and finite element space approximations. The critical Reynolds numbers (Re_{cr}) for having oscillatory flow (a Hopf bifurcation) are obtained. The associated oscillating motion in a semicircular cavity with length equal to width has been studied in detail. Based on the averaged velocity field in one period of oscillating motion, the flow difference (called oscillation mode) between the velocity field and averaged one at several time instances in such period shows almost the same flow pattern for the Reynolds numbers close to Re_{cr}. This oscillation mode in a semicircular cavity shows a close similarity to the one obtained in a shallow cavity, but with some difference in a shallow cavity which is triggered by the presence of two vertical side walls and downstream wall.
- Published
- 2024
70. TANQ-Sim: Tensorcore Accelerated Noisy Quantum System Simulation via QIR on Perlmutter HPC
- Author
- Li, Ang; Liu, Chenxu; Stein, Samuel; Suh, In-Saeng; Zheng, Muqing; Wang, Meng; Shi, Yue; Fang, Bo; Roetteler, Martin; and Humble, Travis
- Subjects
Quantum Physics
- Abstract
Although there have been remarkable advances in quantum computing (QC), it remains crucial to simulate quantum programs using classical large-scale parallel computing systems to validate quantum algorithms, comprehend the impact of noise, and develop resilient quantum applications. This is particularly important for bridging the gap between near-term noisy-intermediate-scale-quantum (NISQ) computing and future fault-tolerant quantum computing (FTQC). Nevertheless, current simulation methods either lack the capability to simulate noise, or simulate with excessive computational costs, or do not scale out effectively. In this paper, we propose TANQ-Sim, a full-scale density matrix based simulator designed to simulate practical deep circuits with both coherent and non-coherent noise. To address the significant computational cost associated with such simulations, we propose a new density-matrix simulation approach that enables TANQ-Sim to leverage the latest double-precision tensorcores (DPTCs) in NVIDIA Ampere and Hopper GPUs. To the best of our knowledge, this is the first application of double-precision tensorcores for non-AI/ML workloads. To optimize performance, we also propose specific gate fusion techniques for density matrix simulation. For scaling, we rely on the advanced GPU-side communication library NVSHMEM and propose effective optimization methods for enhancing communication efficiency. Evaluations on the NERSC Perlmutter supercomputer demonstrate the functionality, performance, and scalability of the simulator. We also present three case studies to showcase the practical usage of TANQ-Sim, including teleportation, entanglement distillation, and Ising simulation. TANQ-Sim will be released on GitHub.
- Comment
- 14 pages, 12 figures, 4 tables
- Published
- 2024
71. MP-DPD: Low-Complexity Mixed-Precision Neural Networks for Energy-Efficient Digital Predistortion of Wideband Power Amplifiers
- Author
-
Wu, Yizhuo, Li, Ang, Beikmirza, Mohammadreza, Singh, Gagan Deep, Chen, Qinyu, de Vreede, Leo C. N., Alavi, Morteza, and Gao, Chang
- Subjects
Electrical Engineering and Systems Science - Signal Processing ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Digital Pre-Distortion (DPD) enhances signal quality in wideband RF power amplifiers (PAs). As signal bandwidths expand in modern radio systems, DPD's energy consumption increasingly impacts overall system efficiency. Deep Neural Networks (DNNs) offer promising advancements in DPD, yet their high complexity hinders their practical deployment. This paper introduces open-source mixed-precision (MP) neural networks that employ quantized low-precision fixed-point parameters for energy-efficient DPD. This approach reduces computational complexity and memory footprint, thereby lowering power consumption without compromising linearization efficacy. Applied to a 160MHz-BW 1024-QAM OFDM signal from a digital RF PA, MP-DPD gives no performance loss against 32-bit floating-point precision DPDs, while achieving -43.75 (L)/-45.27 (R) dBc in Adjacent Channel Power Ratio (ACPR) and -38.72 dB in Error Vector Magnitude (EVM). A 16-bit fixed-point-precision MP-DPD enables a 2.8X reduction in estimated inference power. The PyTorch learning and testing code is publicly available at https://github.com/lab-emi/OpenDPD., Comment: Accepted to IEEE Microwave and Wireless Technology Letters (MWTL)
- Published
- 2024
- Full Text
- View/download PDF
72. Stereo-LiDAR Depth Estimation with Deformable Propagation and Learned Disparity-Depth Conversion
- Author
-
Li, Ang, Hu, Anning, Xi, Wei, Yu, Wenxian, and Zou, Danping
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Accurate and dense depth estimation with stereo cameras and LiDAR is an important task for automatic driving and robotic perception. While sparse hints from LiDAR points have improved cost aggregation in stereo matching, their effectiveness is limited by the low density and non-uniform distribution. To address this issue, we propose a novel stereo-LiDAR depth estimation network with Semi-Dense hint Guidance, named SDG-Depth. Our network includes a deformable propagation module for generating a semi-dense hint map and a confidence map by propagating sparse hints using a learned deformable window. These maps then guide cost aggregation in stereo matching. To reduce the triangulation error in depth recovery from disparity, especially in distant regions, we introduce a disparity-depth conversion module. Our method is both accurate and efficient. The experimental results on benchmark tests show its superior performance. Our code is available at https://github.com/SJTU-ViSYS/SDG-Depth., Comment: Accepted in ICRA 2024. 8 pages, 6 figures
- Published
- 2024
73. Rethinking Pruning for Vision-Language Models: Strategies for Effective Sparsity and Performance Restoration
- Author
-
He, Shwai, Li, Ang, and Chen, Tianlong
- Subjects
Computer Science - Machine Learning ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Vision-Language Models (VLMs) integrate information from multiple modalities and have shown remarkable success across various tasks. However, deploying large-scale VLMs in resource-constrained scenarios is challenging. Pruning followed by finetuning offers a potential solution but remains underexplored for VLMs. This study addresses two key questions: how to distribute sparsity across different modality-specific models, and how to restore the performance of pruned sparse VLMs. Our preliminary studies identified two effective pruning settings: applying the same sparsity to both vision and language models, and pruning only the language models. While LoRA finetuning aims to restore sparse models, it faces challenges due to incompatibility with sparse models, disrupting the pruned sparsity. To overcome these issues, we propose SparseLoRA, which applies sparsity directly to LoRA weights. Our experimental results demonstrate significant improvements, including an 11.3% boost under 2:4 sparsity and a 47.6% enhancement under unstructured 70% sparsity. Code is released at: https://github.com/Shwai-He/VLM-Compression.
- Published
- 2024
74. Switching from Premixed Insulin to Insulin Degludec/Insulin Aspart for the Management of Type 2 Diabetes Mellitus: Implications of a Real-World Study on Insulin Degludec Dosing
- Author
-
Wu, Yiming, Zhang, Junqing, and Li, Ang
- Published
- 2024
- Full Text
- View/download PDF
75. Study and application of wide-azimuth seismic anisotropy analysis and correction in shale reservoir in Gulong Sag, Songliao Basin, China
- Author
-
Zhang, Liyan and Li, Ang
- Published
- 2024
- Full Text
- View/download PDF
76. Integrating automatic order determination with response prediction error minimization for nonlinear subspace identification in structural dynamics
- Author
-
Jiang, Dong, Li, Ang, Wang, Yusheng, Xie, Shitao, Cao, Zhifu, and Zhu, Rui
- Published
- 2024
- Full Text
- View/download PDF
77. MEIKIN expression and its C-terminal phosphorylation by PLK1 is closely related the metaphase–anaphase transition by affecting cyclin B1 and Securin stabilization in meiotic oocyte
- Author
-
Fan, Li-Hua, Qi, Shu-Tao, Wang, Zhen-Bo, Ouyang, Ying-Chun, Lei, Wen-Long, Wang, Yue, Li, Ang, Wang, Feng, Li, Jian, Li, Li, Li, Yuan-Yuan, Hou, Yi, Schatten, Heide, Wang, Wei-Hua, Sun, Qing-Yuan, and Ou, Xiang-Hong
- Published
- 2024
- Full Text
- View/download PDF
78. Conditional minimum density power divergence estimator for self-exciting integer-valued threshold autoregressive models
- Author
-
Sun, Mingyu, Yang, Kai, and Li, Ang
- Published
- 2024
- Full Text
- View/download PDF
79. Recent Progress on Thermal Energy Storage for Coal-Fired Power Plant
- Author
-
Wang, Wei, Zhang, Jianyuan, Gu, Yi, Luo, Qing, Zhou, Guiqing, Li, Ang, Lu, Guozhong, Ma, Tingshan, Zhao, Yuanzhu, Chang, Yiming, and Xue, Zhaonan
- Published
- 2024
- Full Text
- View/download PDF
80. Nitrogen-doped amorphous monolayer carbon
- Author
-
Bai, Xiuhui, Hu, Pengfei, Li, Ang, Zhang, Youwei, Li, Aowen, Zhang, Guangjie, Xue, Yufeng, Jiang, Tianxing, Wang, Zezhou, Cui, Hanke, Kang, Jianxin, Zhao, Hewei, Gu, Lin, Zhou, Wu, Liu, Li-Min, Qiu, Xiaohui, and Guo, Lin
- Published
- 2024
- Full Text
- View/download PDF
81. An evolved artificial radical cyclase enables the construction of bicyclic terpenoid scaffolds via an H-atom transfer pathway
- Author
-
Chen, Dongping, Zhang, Xiang, Vorobieva, Anastassia Andreevna, Tachibana, Ryo, Stein, Alina, Jakob, Roman P., Zou, Zhi, Graf, Damian Alexander, Li, Ang, Maier, Timm, Correia, Bruno E., and Ward, Thomas R.
- Published
- 2024
- Full Text
- View/download PDF
82. Dronedarone inhibits the proliferation of esophageal squamous cell carcinoma through the CDK4/CDK6-RB1 axis in vitro and in vivo
- Author
-
Li, Bo, Zhang, Jing, Yu, Yin, Li, Yinhua, Chen, Yingying, Zhao, Xiaokun, Li, Ang, Zhao, Lili, Li, Mingzhu, Wang, Zitong, Lu, Xuebo, Wu, Wenjie, Zhang, Yueteng, Dong, Zigang, Liu, Kangdong, and Jiang, Yanan
- Published
- 2024
- Full Text
- View/download PDF
83. Concept drift detection methods based on different weighting strategies
- Author
-
Han, Meng, Mu, Dongliang, Li, Ang, Liu, Shujuan, and Gao, Zhihui
- Published
- 2024
- Full Text
- View/download PDF
84. Association Between Metal Exposure and Mitochondrial DNA Parameters and the Potential Mediation Effect of Oxidative Stress
- Author
-
Xu, Jing, Zhao, Meiduo, Ge, Xiaoyu, Liu, Xiaolin, Wei, Lanping, Li, Ang, Mei, Yayuan, Yin, Guohuan, Wu, Jingtao, and Xu, Qun
- Published
- 2024
- Full Text
- View/download PDF
85. Can we distinguish quark stars from neutron stars with measurements of global properties?
- Author
-
Li Ang
- Subjects
Physics ,QC1-999 - Abstract
The phase state of the dense stellar matter is an exciting topic in the area of nuclear astrophysics. It may be probed by observed properties of neutron stars from, for example, the currently operating satellites (NICER, Neutron star Interior Composition Explorer) and the gravitational-wave laser interferometers (Advanced LIGO, Virgo, and KAGRA). Based on our recent constrained parameter spaces of the equation of states of neutron stars and quark stars from LIGO/Virgo and NICER, we discuss the important role of an even-accurate determination of the stellar radius for distinguishing possible quark stars from neutron stars and our understanding of the QCD phase transition at finite density.
- Published
- 2022
- Full Text
- View/download PDF
86. Variation of Indoor Airflow Patterns under Dynamic Outdoor Wind Conditions in Large Space Naturally Ventilated Buildings
- Author
-
Lv Yuling, Yao Huimin, Li Ang, and Shen Xiong
- Subjects
Environmental sciences ,GE1-350 - Abstract
Understanding Indoor Airflow Patterns (IAPs) helps control air contaminants in large naturally ventilated buildings (NVBs). However, the effect of random and unpredictable changes in outdoor wind conditions (OWC) is a major contributor to the variation in IAPs in the NVBs, making the IAP uncontrollable. This study presents the results of field measurements and numerical simulation in an NVB to study the IAP variation characteristics under dynamic OWC. The OWC data monitored in real time was processed with Kalman Filtering (KF) and the gradient method to decompose the data prior to being entered into the CFD solver. The trend was similar between the simulated and measured data of a full-size NVB. In addition, the distribution of internal turbulence intensity varied widely depending on the spatial locations and time intervals. The variation in speeds in the vicinity of windbreaks greatly affected the variation in IAP on a certain frequency scale. The results not only prove CFD simulation to be an efficient tool for the prediction of time-averaged IAP, but also initialize efficient measures for the control of IAQ in dynamic OWC of large space NVBs.
- Published
- 2022
- Full Text
- View/download PDF
87. A novel design of dehumidifier system in underground pumped storage power station
- Author
-
Yao Huimin, Li Ang, Lv Yuling, and Shen Xiong
- Subjects
pumped storage power station ,underground ventilation tunnel ,dehumidification system ,heat and moisture transfer ,cfd ,Environmental sciences ,GE1-350 - Abstract
The heavy water condensation problem has led to safety concerns for the workers and electrical facilities in underground pumped storage power stations. In the current dehumidification scheme, the heat and humidity loads are all removed through the air conditioning unit, which has high energy consumption and low efficiency. As a large amount of moisture comes from the external fresh air, in this study, we set the dehumidification system in the underground ventilation tunnel. By using the air inlet channel to lighten the wet load, the heat and humidity parameters of the fresh air are processed as far as possible ahead of the air supply, so as to achieve energy-saving operation. In this study, numerical modelling was used to simulate the control scheme of the heat and humidity environment. Compared with the traditional method, the two new dehumidification schemes can take about 1/3 of the wet load of the main workshop, and the dehumidification efficiency is improved. Therefore, this study is expected to be a promising tool for reducing air humidity in underground workshops.
- Published
- 2022
- Full Text
- View/download PDF
88. Bounding the $K(p-1)$-local exotic Picard group at $p>3$
- Author
-
Bobkova, Irina, Lachmann, Andrea, Li, Ang, Lima, Alicia, Stojanoska, Vesna, and Zhang, Adela YiYu
- Subjects
Mathematics - Algebraic Topology - Abstract
In this paper, we bound the descent filtration of the exotic Picard group $\kappa_n$, for a prime number p>3 and n=p-1. Our method involves a detailed comparison of the Picard spectral sequence, the homotopy fixed point spectral sequence, and an auxiliary $\beta$-inverted homotopy fixed point spectral sequence whose input is the Farrell-Tate cohomology of the Morava stabilizer group. Along the way, we deduce that the K(n)-local Adams-Novikov spectral sequence for the sphere has a horizontal vanishing line at $3n^2+1$ on the $E_{2n^2+2}$-page. The same analysis also allows us to express the exotic Picard group of $K(n)$-local modules over the homotopy fixed points spectrum $\mathrm{E}_n^{hN}$, where N is the normalizer in $\mathbb{G}_n$ of a finite cyclic subgroup of order p, as a subquotient of a single continuous cohomology group $H^{2n+1}(N,\pi_{2n}\mathrm{E}_n)$.
- Published
- 2024
89. A Challenge Dataset and Effective Models for Conversational Stance Detection
- Author
-
Niu, Fuqiang, Yang, Min, Li, Ang, Zhang, Baoquan, Peng, Xiaojiang, and Zhang, Bowen
- Subjects
Computer Science - Computation and Language - Abstract
Previous stance detection studies typically concentrate on evaluating stances within individual instances, thereby exhibiting limitations in effectively modeling multi-party discussions concerning the same specific topic, as naturally transpire in authentic social media interactions. This constraint arises primarily due to the scarcity of datasets that authentically replicate real social media contexts, hindering the research progress of conversational stance detection. In this paper, we introduce a new multi-turn conversation stance detection dataset (called MT-CSD), which encompasses multiple targets for conversational stance detection. To derive stances from this challenging dataset, we propose a global-local attention network (GLAN) to address both long and short-range dependencies inherent in conversational data. Notably, even state-of-the-art stance detection methods, exemplified by GLAN, exhibit an accuracy of only 50.47%, highlighting the persistent challenges in conversational stance detection. Furthermore, our MT-CSD dataset serves as a valuable resource to catalyze advancements in cross-domain stance detection, where a classifier is adapted from a different yet related target. We believe that MT-CSD will contribute to advancing real-world applications of stance detection research. Our source code, data, and models are available at https://github.com/nfq729/MT-CSD.
- Published
- 2024
90. AQM: A Refresh of the Abstract Qubit Model for Quantum Computing Co-design
- Author
-
Liu, Chenxu, Stein, Samuel A., Zheng, Muqing, Ang, James, and Li, Ang
- Subjects
Quantum Physics - Abstract
Qubits are the fundamental building blocks of quantum information science and applications, whose concept is widely utilized in both quantum physics and quantum computation. While the significance of qubits and their implementation in physical devices have been extensively examined, now is the right time to revisit this understanding. In this paper, we introduce an abstract qubit model (AQM), offering a mathematical framework for higher-level algorithms and applications, and setting forth criteria for lower-level physical devices to enable quantum computation. We first provide a comprehensive definition of "qubits", regarded as the foundational principle for quantum computing algorithms (bottom-up support), and examine their requisites for devices (top-down demand). We then investigate the feasibility of relaxing specific requirements, thereby broadening device support while considering techniques that tradeoff extra costs to counterbalance this relaxation. Lastly, we delve into the quantum applications that only require partial support of "qubits", and discuss the physical systems with limited support of the AQM but remain valuable in quantum applications. AQM may serve as an intermediate interface between quantum algorithms and devices, facilitating quantum algorithm-device co-design., Comment: 36 pages, 3 figures, 2 tables
- Published
- 2024
91. Accurate and Data-Efficient Micro-XRD Phase Identification Using Multi-Task Learning: Application to Hydrothermal Fluids
- Author
-
Li, Yanfei, Liu, Juejing, Zhao, Xiaodong, Liu, Wenjun, Geng, Tong, Li, Ang, and Zhang, Xin
- Subjects
Condensed Matter - Materials Science ,Computer Science - Machine Learning - Abstract
Traditional analysis of highly distorted micro-X-ray diffraction (μ-XRD) patterns from hydrothermal fluid environments is a time-consuming process, often requiring substantial data preprocessing and labeled experimental data. This study demonstrates the potential of deep learning with a multitask learning (MTL) architecture to overcome these limitations. We trained MTL models to identify phase information in μ-XRD patterns, minimizing the need for labeled experimental data and masking preprocessing steps. Notably, MTL models showed superior accuracy compared to binary classification CNNs. Additionally, introducing a tailored cross-entropy loss function improved MTL model performance. Most significantly, MTL models tuned to analyze raw and unmasked XRD patterns achieved close performance to models analyzing preprocessed data, with minimal accuracy differences. This work indicates that advanced deep learning architectures like MTL can automate arduous data handling tasks, streamline the analysis of distorted XRD patterns, and reduce the reliance on labor-intensive experimental datasets.
- Published
- 2024
92. HRLAIF: Improvements in Helpfulness and Harmlessness in Open-domain Reinforcement Learning From AI Feedback
- Author
-
Li, Ang, Xiao, Qiugen, Cao, Peng, Tang, Jian, Yuan, Yi, Zhao, Zijie, Chen, Xiaoyuan, Zhang, Liang, Li, Xiangyang, Yang, Kaitong, Guo, Weidong, Gan, Yukang, Yu, Xu, Wang, Daniell, and Shan, Ying
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Reinforcement Learning from AI Feedback (RLAIF) has the advantages of shorter annotation cycles and lower costs over Reinforcement Learning from Human Feedback (RLHF), making it highly efficient during the rapid strategy iteration periods of large language model (LLM) training. Using ChatGPT as a labeler to provide feedback on open-domain prompts in RLAIF training, we observe an increase in human evaluators' preference win ratio for model responses, but a decrease in evaluators' satisfaction rate. Analysis suggests that the decrease in satisfaction rate is mainly due to some responses becoming less helpful, particularly in terms of correctness and truthfulness, highlighting practical limitations of basic RLAIF. In this paper, we propose Hybrid Reinforcement Learning from AI Feedback (HRLAIF). This method enhances the accuracy of AI annotations for responses, making the model's helpfulness more robust in the training process. Additionally, it employs AI for Red Teaming, further improving the model's harmlessness. Human evaluation results show that HRLAIF inherits the ability of RLAIF to enhance human preference for outcomes at a low cost while also improving the satisfaction rate of responses. Compared to the policy model before Reinforcement Learning (RL), it achieves an increase of 2.08% in satisfaction rate, effectively addressing the issue of a decrease of 4.58% in satisfaction rate after basic RLAIF., Comment: 18 pages, 7 figures
- Published
- 2024
93. Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks
- Author
-
De Nadai, Marco, Fabbri, Francesco, Gigioli, Paul, Wang, Alice, Li, Ang, Silvestri, Fabrizio, Kim, Laura, Lin, Shawn, Radosavljevic, Vladan, Ghael, Sandeep, Nyhan, David, Bouchard, Hugues, Lalmas-Roelleke, Mounia, and Damianou, Andreas
- Subjects
Computer Science - Information Retrieval ,Computer Science - Machine Learning - Abstract
In the ever-evolving digital audio landscape, Spotify, well-known for its music and talk content, has recently introduced audiobooks to its vast user base. While promising, this move presents significant challenges for personalized recommendations. Unlike music and podcasts, audiobooks, initially available for a fee, cannot be easily skimmed before purchase, posing higher stakes for the relevance of recommendations. Furthermore, introducing a new content type into an existing platform confronts extreme data sparsity, as most users are unfamiliar with this new content type. Lastly, recommending content to millions of users requires the model to react fast and be scalable. To address these challenges, we leverage podcast and music user preferences and introduce 2T-HGNN, a scalable recommendation system comprising Heterogeneous Graph Neural Networks (HGNNs) and a Two Tower (2T) model. This novel approach uncovers nuanced item relationships while ensuring low latency and complexity. We decouple users from the HGNN graph and propose an innovative multi-link neighbor sampler. These choices, together with the 2T component, significantly reduce the complexity of the HGNN model. Empirical evaluations involving millions of users show significant improvement in the quality of personalized recommendations, resulting in a +46% increase in new audiobooks start rate and a +23% boost in streaming rates. Intriguingly, our model's impact extends beyond audiobooks, benefiting established products like podcasts., Comment: To appear in The Web Conference 2024 proceedings
- Published
- 2024
94. From Graph to Word Bag: Introducing Domain Knowledge to Confusing Charge Prediction
- Author
-
Li, Ang, Chen, Qiangchao, Wu, Yiquan, Cai, Ming, Zhou, Xiang, Wu, Fei, and Kuang, Kun
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
Confusing charge prediction is a challenging task in legal AI, which involves predicting confusing charges based on fact descriptions. While existing charge prediction methods have shown impressive performance, they face significant challenges when dealing with confusing charges, such as Snatch and Robbery. In the legal domain, constituent elements play a pivotal role in distinguishing confusing charges. Constituent elements are fundamental behaviors underlying criminal punishment and have subtle distinctions among charges. In this paper, we introduce a novel From Graph to Word Bag (FWGB) approach, which introduces domain knowledge regarding constituent elements to guide the model in making judgments on confusing charges, much like a judge's reasoning process. Specifically, we first construct a legal knowledge graph containing constituent elements to help select keywords for each charge, forming a word bag. Subsequently, to guide the model's attention towards the differentiating information for each charge within the context, we expand the attention mechanism and introduce a new loss function with attention supervision through words in the word bag. We construct the confusing charges dataset from real-world judicial documents. Experiments demonstrate the effectiveness of our method, especially in maintaining exceptional performance in imbalanced label distributions.
- Published
- 2024
95. Enhancing Court View Generation with Knowledge Injection and Guidance
- Author
-
Li, Ang, Wu, Yiquan, Liu, Yifei, Wu, Fei, Cai, Ming, and Kuang, Kun
- Subjects
Computer Science - Artificial Intelligence - Abstract
Court View Generation (CVG) is a challenging task in the field of Legal Artificial Intelligence (LegalAI), which aims to generate court views based on the plaintiff claims and the fact descriptions. While Pretrained Language Models (PLMs) have showcased their prowess in natural language generation, their application to the complex, knowledge-intensive domain of CVG often reveals inherent limitations. In this paper, we present a novel approach, named Knowledge Injection and Guidance (KIG), designed to bolster CVG using PLMs. To efficiently incorporate domain knowledge during the training stage, we introduce a knowledge-injected prompt encoder for prompt tuning, thereby reducing computational overhead. Moreover, to further enhance the model's ability to utilize domain knowledge, we employ a generating navigator, which dynamically guides the text generation process in the inference stage without altering the model's architecture, making it readily transferable. Comprehensive experiments on real-world data demonstrate the effectiveness of our approach compared to several established baselines, especially in the responsivity of claims, where it outperforms the best baseline by 11.87%.
- Published
- 2024
96. FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators
- Author
-
Li, Xinyi, Li, Ang, Fang, Bo, Swirydowicz, Katarzyna, Laguna, Ignacio, and Gopalakrishnan, Ganesh
- Subjects
Computer Science - Hardware Architecture - Abstract
NVIDIA Tensor Cores and AMD Matrix Cores (together called Matrix Accelerators) are of growing interest in high-performance computing and machine learning owing to their high performance. Unfortunately, their numerical behaviors are not publicly documented, including the number of extra precision bits maintained, the accumulation order of addition, and predictable subnormal number handling during computations. This makes it impossible to reliably port codes across these differing accelerators. This paper contributes a collection of Feature Targeted Tests for Numerical Properties that help determine these features across five floating-point formats, four rounding modes and additional that highlight the rounding behaviors and preservation of extra precision bits. To show the practical relevance of FTTN, we design a simple matrix-multiplication test designed with insights gathered from our feature-tests. We executed this very simple test on five platforms, producing different answers: V100, A100, and MI250X produced 0, MI100 produced 255.875, and Hopper H100 produced 191.875. Our matrix multiplication tests employ patterns found in iterative refinement-based algorithms, highlighting the need to check for significant result variability when porting code across GPUs.
- Published
- 2024
97. A Quantum-Classical Collaborative Training Architecture Based on Quantum State Fidelity
- Author
-
L'Abbate, Ryan, D'Onofrio Jr., Anthony, Stein, Samuel, Chen, Samuel Yen-Chi, Li, Ang, Chen, Pin-Yu, Chen, Juntao, and Mao, Ying
- Subjects
Quantum Physics ,Computer Science - Artificial Intelligence - Abstract
Recent advancements have highlighted the limitations of current quantum systems, particularly the restricted number of qubits available on near-term quantum devices. This constraint greatly inhibits the range of applications that can leverage quantum computers. Moreover, as the available qubits increase, the computational complexity grows exponentially, posing additional challenges. Consequently, there is an urgent need to use qubits efficiently and mitigate both present limitations and future complexities. To address this, existing quantum applications attempt to integrate classical and quantum systems in a hybrid framework. In this study, we concentrate on quantum deep learning and introduce a collaborative classical-quantum architecture called co-TenQu. The classical component employs a tensor network for compression and feature extraction, enabling higher-dimensional data to be encoded onto logical quantum circuits with limited qubits. On the quantum side, we propose a quantum-state-fidelity-based evaluation function to iteratively train the network through a feedback loop between the two sides. co-TenQu has been implemented and evaluated with both simulators and the IBM-Q platform. Compared to state-of-the-art approaches, co-TenQu enhances a classical deep neural network by up to 41.72% in a fair setting. Additionally, it outperforms other quantum-based methods by up to 1.9 times and achieves similar accuracy while utilizing 70.59% fewer qubits., Comment: IEEE Transactions on Quantum Engineering
- Published
- 2024
- Full Text
- View/download PDF
98. Enhancing One-Shot Federated Learning Through Data and Ensemble Co-Boosting
- Author
-
Dai, Rong, Zhang, Yonggang, Li, Ang, Liu, Tongliang, Yang, Xun, and Han, Bo
- Subjects
Computer Science - Machine Learning - Abstract
One-shot Federated Learning (OFL) has become a promising learning paradigm, enabling the training of a global server model via a single communication round. In OFL, the server model is aggregated by distilling knowledge from all client models (the ensemble), which are also responsible for synthesizing samples for distillation. In this regard, advanced works show that the performance of the server model is intrinsically related to the quality of the synthesized data and the ensemble model. To promote OFL, we introduce a novel framework, Co-Boosting, in which synthesized data and the ensemble model mutually enhance each other progressively. Specifically, Co-Boosting leverages the current ensemble model to synthesize higher-quality samples in an adversarial manner. These hard samples are then employed to promote the quality of the ensemble model by adjusting the ensembling weights for each client model. Consequently, Co-Boosting periodically achieves high-quality data and ensemble models. Extensive experiments demonstrate that Co-Boosting can substantially outperform existing baselines under various settings. Moreover, Co-Boosting eliminates the need for adjustments to the client's local training, requires no additional data or model transmission, and allows client models to have heterogeneous architectures., Comment: To be published in ICLR2024
- Published
- 2024
99. Ground-Fusion: A Low-cost Ground SLAM System Robust to Corner Cases
- Author
-
Yin, Jie, Li, Ang, Xi, Wei, Yu, Wenxian, and Zou, Danping
- Subjects
Computer Science - Robotics - Abstract
We introduce Ground-Fusion, a low-cost sensor fusion simultaneous localization and mapping (SLAM) system for ground vehicles. Our system features efficient initialization, effective sensor anomaly detection and handling, real-time dense color mapping, and robust localization in diverse environments. We tightly integrate RGB-D images, inertial measurements, wheel odometer and GNSS signals within a factor graph to achieve accurate and reliable localization both indoors and outdoors. To ensure successful initialization, we propose an efficient strategy that comprises three different methods: stationary, visual, and dynamic, tailored to handle diverse cases. Furthermore, we develop mechanisms to detect sensor anomalies and degradation, handling them adeptly to maintain system accuracy. Our experimental results on both public and self-collected datasets demonstrate that Ground-Fusion outperforms existing low-cost SLAM systems in corner cases. We release the code and datasets at https://github.com/SJTU-ViSYS/Ground-Fusion.
- Published
- 2024
100. Multi-modal Stance Detection: New Datasets and Model
- Author
-
Liang, Bin, Li, Ang, Zhao, Jingqian, Gui, Lin, Yang, Min, Yu, Yue, Wong, Kam-Fai, and Xu, Ruifeng
- Subjects
Computer Science - Computation and Language - Abstract
Stance detection is a challenging task that aims to identify public opinion from social media platforms with respect to specific targets. Previous work on stance detection largely focused on pure texts. In this paper, we study multi-modal stance detection for tweets consisting of texts and images, which are prevalent in today's fast-growing social media platforms where people often post multi-modal messages. To this end, we create five new multi-modal stance detection datasets of different domains based on Twitter, in which each example consists of a text and an image. In addition, we propose a simple yet effective Targeted Multi-modal Prompt Tuning framework (TMPT), where target information is leveraged to learn multi-modal stance features from textual and visual modalities. Experimental results on our five benchmark datasets show that the proposed TMPT achieves state-of-the-art performance in multi-modal stance detection., Comment: ACL'24 Findings
- Published
- 2024