68,925 results on '"HUANG, WEI"'
Search Results
2. International comparison of optical frequencies with transportable optical lattice clocks
- Author
-
International Clock and Oscillator Networking Collaboration: Amy-Klein, Anne, Benkler, Erik, Blondé, Pascal, Bongs, Kai, Cantin, Etienne, Chardonnet, Christian, Denker, Heiner, Dörscher, Sören, Feng, Chen-Hao, Gaudron, Jacques-Olivier, Gill, Patrick, Hill, Ian R, Huang, Wei, Johnson, Matthew Y H, Kale, Yogeshwar B, Katori, Hidetoshi, Klose, Joshua, Kronjäger, Jochen, Kuhl, Alexander, Le Targat, Rodolphe, Lisdat, Christian, Lopez, Olivier, Lücke, Tim, Mazouth, Maxime, Mukherjee, Shambo, Nosske, Ingo, Pointard, Benjamin, Pottie, Paul-Eric, Schioppo, Marco, Singh, Yeshpal, Stahl, Kilian, Takamoto, Masao, Tønnes, Mads, Tunesi, Jacob, Ushijima, Ichiro, and Vishwakarma, Chetan
- Subjects
Physics - Atomic Physics - Abstract
Optical clocks have improved their frequency stability and estimated accuracy by more than two orders of magnitude over the best caesium microwave clocks that realise the SI second. Accordingly, an optical redefinition of the second has been widely discussed, prompting a need for the consistency of optical clocks to be verified worldwide. While satellite frequency links are sufficient to compare microwave clocks, a suitable method for comparing high-performance optical clocks over intercontinental distances is missing. Furthermore, remote comparisons over frequency links face fractional uncertainties of a few $10^{-18}$ due to imprecise knowledge of each clock's relativistic redshift, which stems from uncertainty in the geopotential determined at each distant location. Here, we report a landmark campaign towards the era of optical clocks, where, for the first time, state-of-the-art transportable optical clocks from Japan and Europe are brought together to demonstrate international comparisons that require neither a high-performance frequency link nor information on the geopotential difference between remote sites. Conversely, the reproducibility of the clocks after being transported between countries was sufficient to determine geopotential height offsets at the level of 4 cm. Our campaign paves the way for redefining the SI second and has a significant impact on various applications, including tests of general relativity, geodetic sensing for geosciences, precise navigation, and future timing networks., Comment: 29 pages, 5 figures
- Published
- 2024
3. Identifiability Analysis of Linear ODE Systems with Hidden Confounders
- Author
-
Wang, Yuanyuan, Huang, Biwei, Huang, Wei, Geng, Xi, and Gong, Mingming
- Subjects
Statistics - Machine Learning, Computer Science - Machine Learning - Abstract
The identifiability analysis of linear Ordinary Differential Equation (ODE) systems is a necessary prerequisite for making reliable causal inferences about these systems. While identifiability has been well studied in scenarios where the system is fully observable, the conditions for identifiability remain unexplored when latent variables interact with the system. This paper aims to address this gap by presenting a systematic analysis of identifiability in linear ODE systems incorporating hidden confounders. Specifically, we investigate two cases of such systems. In the first case, latent confounders exhibit no causal relationships, yet their evolution adheres to specific functional forms, such as polynomial functions of time $t$. Subsequently, we extend this analysis to encompass scenarios where hidden confounders exhibit causal dependencies, with the causal structure of latent variables described by a Directed Acyclic Graph (DAG). The second case represents a more intricate variation of the first case, prompting a more comprehensive identifiability analysis. Accordingly, we conduct detailed identifiability analyses of the second system under various observation conditions, including both continuous and discrete observations from single or multiple trajectories. To validate our theoretical results, we perform a series of simulations, which support and substantiate our findings., Comment: 38th Conference on Neural Information Processing Systems (NeurIPS 2024)
- Published
- 2024
4. Narrow [O III] emission lines as a potential proxy for the evolutionary stage of quasars
- Author
-
Chen, Zhi-fu, Chen, Zhe-Geng, Peng, Xing-long, and Huang, Wei-rong
- Subjects
Astrophysics - Astrophysics of Galaxies - Abstract
The radio spectral shape of quasars can provide insight into their ages. We have compiled data for 1804 quasars with $z\lesssim1$ from the Sloan Digital Sky Survey (SDSS). These quasars were also mapped by the Low-Frequency Array at 144 MHz and the Very Large Array Sky Survey at 3000 MHz. The radio spectral index, designated as $\alpha^{\rm 144}_{\rm 3000}$ (with $S(\nu)\propto\nu^\alpha$), is analyzed between 144 MHz and 3000 MHz as a proxy for the ages of quasars. We measure the [O III] $\lambda$5007 emission line in the SDSS spectra. A strong correlation is found between the equivalent width of the core component of the [O III] $\lambda$5007 emission line and $\alpha^{\rm 144}_{\rm 3000}$. This relationship suggests that the core component of the [O III] $\lambda$5007 emission line could serve as a surrogate for the evolutionary stage of a quasar: quasars at an early stage of evolution tend to show weaker [O III] $\lambda$5007 emission, while older quasars exhibit stronger [O III] $\lambda$5007 emission.
- Published
- 2024
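The definition quoted in the abstract above, $S(\nu)\propto\nu^\alpha$, determines the spectral index $\alpha^{\rm 144}_{\rm 3000}$ directly from two flux-density measurements. A minimal sketch of that arithmetic (the function name and sample flux values are invented for illustration, not taken from the paper):

```python
import math

def spectral_index(s_low, s_high, nu_low=144.0, nu_high=3000.0):
    """Spectral index alpha from S(nu) ∝ nu**alpha, given flux densities
    at two frequencies (same flux units; frequencies in MHz)."""
    return math.log(s_high / s_low) / math.log(nu_high / nu_low)

flat = spectral_index(10.0, 10.0)   # equal flux at both frequencies -> alpha = 0
steep = spectral_index(20.0, 5.0)   # flux falling with frequency -> alpha < 0
```

A flat (young, compact) source gives $\alpha \approx 0$, while an extended steep-spectrum source gives negative $\alpha$, which is the sense in which the index traces age.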
5. A Novel Investigation of the ATOMKI Anomaly
- Author
-
Dutta, Bhaskar, Huang, Wei-Chih, Hu, Bai-Shan, and Van de Water, Richard G.
- Subjects
High Energy Physics - Phenomenology, Nuclear Theory - Abstract
The ATOMKI nuclear anomaly has suggested a new BSM (Beyond the Standard Model) boson with mass $\sim17$ MeV that is emitted from excited nuclei and quickly decays into an $e^+e^-$ pair. To search for the new particle, we propose a new approach that utilizes the ongoing Coherent CAPTAIN-Mills (CCM) 10-ton LAr (liquid argon) detector. Neutrons from the Lujan target can scatter inelastically off the PMT glass in the CCM detector, producing the new boson that would resolve the ATOMKI anomaly. The new boson can then be detected through its decay into an $e^+e^-$ pair. We find that CCM can probe a large area of the anomaly-allowed parameter space. We also show the prediction for a 100-ton LAr detector., Comment: 6 pages, 3 figures
- Published
- 2024
6. Efficient Function Placement in Virtual Networks: An Online Learning Approach
- Author
-
Huang, Wei, Combes, Richard, Castel-Taleb, Hind, and Jouaber, Badii
- Subjects
Computer Science - Machine Learning, Computer Science - Networking and Internet Architecture - Abstract
We propose a model for the virtual function placement problem and several novel algorithms using ideas based on multi-armed bandits. We prove that these algorithms learn the optimal placement policy rapidly, and their regret grows at a rate at most $O( N M \sqrt{T\ln T} )$ while respecting the feasibility constraints with high probability. We show through numerical experiments that those algorithms both have good practical performance and modest computational complexity. Using the proposed acceleration technique, they can be used to learn in large networks where computational power is limited. Our experiments are fully reproducible, and the code is publicly available., Comment: 19 pages
- Published
- 2024
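The abstract above does not spell out the proposed algorithms, so as a hedged illustration of the multi-armed-bandit machinery they build on, here is a textbook UCB1 loop over candidate placements. All names, the Bernoulli reward model, and the arm means are hypothetical; the paper's algorithms additionally handle feasibility constraints and acceleration, which this sketch omits:

```python
import math
import random

def ucb1(n_arms, pull, horizon):
    """Minimal UCB1: play each arm once, then repeatedly pick the arm
    maximizing empirical mean + sqrt(2 ln t / n_pulls)."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: play each arm once
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        sums[arm] += pull(arm)
        counts[arm] += 1
    return counts

random.seed(0)
means = [0.2, 0.5, 0.8]  # hypothetical per-placement reward probabilities
counts = ucb1(3, lambda a: float(random.random() < means[a]), 5000)
```

Because suboptimal arms are pulled only $O(\ln T)$ times, the best placement accumulates almost all pulls, which is the mechanism behind the $O(NM\sqrt{T\ln T})$ regret bound quoted in the abstract.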
7. MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More
- Author
-
Huang, Wei, Liao, Yue, Liu, Jianhui, He, Ruifei, Tan, Haoru, Zhang, Shiming, Li, Hongsheng, Liu, Si, and Qi, Xiaojuan
- Subjects
Computer Science - Machine Learning, Computer Science - Computation and Language - Abstract
Mixture-of-Experts large language models (MoE-LLMs) mark a significant step forward for language models; however, they encounter two critical challenges in practice: 1) expert parameters lead to considerable memory consumption and loading latency; and 2) the current activated experts are redundant, as many tokens may only require a single expert. Motivated by these issues, we investigate the MoE-LLMs and make two key observations: a) different experts exhibit varying behaviors on activation reconstruction error, routing scores, and activated frequencies, highlighting their differing importance, and b) not all tokens are equally important -- only a small subset is critical. Building on these insights, we propose MC-MoE, a training-free Mixture-Compressor for MoE-LLMs, which leverages the significance of both experts and tokens to achieve extreme compression. First, to mitigate storage and loading overheads, we introduce Pre-Loading Mixed-Precision Quantization, which formulates the adaptive bit-width allocation as a Linear Programming problem, where the objective function balances multiple factors reflecting the importance of each expert. Additionally, we develop Online Dynamic Pruning, which identifies important tokens to retain and dynamically selects activated experts for other tokens during inference to optimize efficiency while maintaining performance. Our MC-MoE integrates static quantization and dynamic pruning to collaboratively achieve extreme compression for MoE-LLMs with little accuracy loss, ensuring an optimal trade-off between performance and efficiency. Extensive experiments confirm the effectiveness of our approach. For instance, at 2.54 bits, MC-MoE compresses 76.6% of the model, with only a 3.8% average accuracy loss. During dynamic inference, we further reduce activated parameters by 15%, with a performance drop of less than 0.6%., Comment: 18 pages
- Published
- 2024
8. On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
- Author
-
Li, Bingrui, Huang, Wei, Han, Andi, Zhou, Zhanpeng, Suzuki, Taiji, Zhu, Jun, and Chen, Jianfei
- Subjects
Computer Science - Machine Learning, Statistics - Machine Learning - Abstract
The Adam optimizer is widely used for transformer optimization in practice, which makes understanding the underlying optimization mechanisms an important problem. However, due to Adam's complexity, theoretical analysis of how it optimizes transformers remains a challenging task. Fortunately, Sign Gradient Descent (SignGD) serves as an effective surrogate for Adam. Despite its simplicity, theoretical understanding of how SignGD optimizes transformers still lags behind. In this work, we study how SignGD optimizes a two-layer transformer -- consisting of a softmax attention layer with trainable query-key parameterization followed by a linear layer -- on a linearly separable noisy dataset. We identify four stages in the training dynamics, each exhibiting intriguing behaviors. Based on the training dynamics, we prove the fast convergence but poor generalization of the learned transformer on the noisy dataset. We also show that Adam behaves similarly to SignGD in terms of both optimization and generalization in this setting. Additionally, we find that the poor generalization of SignGD is not solely due to data noise, suggesting that both SignGD and Adam require high-quality data for real-world tasks. Finally, experiments on synthetic and real-world datasets empirically support our theoretical results., Comment: preprint
- Published
- 2024
9. Federated Learning from Vision-Language Foundation Models: Theoretical Analysis and Method
- Author
-
Pan, Bikang, Huang, Wei, and Shi, Ye
- Subjects
Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition - Abstract
Integrating pretrained vision-language foundation models like CLIP into federated learning has attracted significant attention for enhancing generalization across diverse tasks. Typically, federated learning of vision-language models employs prompt learning to reduce communication and computational costs, i.e., prompt-based federated learning. However, there is limited theoretical analysis to understand the performance of prompt-based federated learning. In this work, we construct a theoretical analysis framework for prompt-based federated learning via feature learning theory. Specifically, we monitor the evolution of signal learning and noise memorization in prompt-based federated learning, demonstrating that performance can be assessed by the ratio of task-relevant to task-irrelevant coefficients. Furthermore, we draw an analogy between income and risk in portfolio optimization and the task-relevant and task-irrelevant terms in feature learning. Leveraging the insight from portfolio optimization that combining two independent assets maintains income while reducing risk, we introduce two prompts, a global prompt and a local prompt, to construct a prompt portfolio that balances generalization and personalization. Consequently, we show the performance advantage of the prompt portfolio and derive the optimal mixing coefficient. These theoretical claims are further supported by empirical experiments.
- Published
- 2024
10. Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization
- Author
-
Jiang, Jiarui, Huang, Wei, Zhang, Miao, Suzuki, Taiji, and Nie, Liqiang
- Subjects
Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning - Abstract
Transformers have demonstrated great power in the recent development of large foundational models. In particular, the Vision Transformer (ViT) has brought revolutionary changes to the field of vision, achieving significant accomplishments on the experimental side. However, their theoretical capabilities, particularly in terms of generalization when trained to overfit training data, are still not fully understood. To address this gap, this work delves deeply into the benign overfitting perspective of transformers in vision. To this end, we study the optimization of a Transformer composed of a self-attention layer with softmax followed by a fully connected layer under gradient descent on a certain data distribution model. By developing techniques that address the challenges posed by softmax and the interdependent nature of multiple weights in transformer optimization, we successfully characterized the training dynamics and achieved generalization in post-training. Our results establish a sharp condition that can distinguish between the small test error phase and the large test error regime, based on the signal-to-noise ratio in the data model. The theoretical results are further verified by experimental simulation.
- Published
- 2024
11. Initial release styles have limited effects on the hydrodynamic dynamics of a self-propelled fin in the unsteady wakes
- Author
-
Han, Peng, Zhang, Dong, Zhang, Jun-Duo, and Huang, Wei-Xi
- Subjects
Physics - Fluid Dynamics - Abstract
Living fish may suddenly encounter upstream obstacles, join the queue of a fish school, or detect upstream flow in advance, resulting in interactions with environmental vortices that can be abrupt or develop gradually from an initial state. The impact of initial conditions on fish swimming behavior in unsteady upstream vortices remains an open question. This study employs a self-propelled flexible fin model, the immersed boundary method, and direct simulation to analyze the hydrodynamics and locomotion of fish swimming behind a bluff cylinder and within a school, under different initial gaps and release styles. Additionally, the above tests were conducted with both quiescent flow fields and fully developed unsteady flows as initial conditions. The results indicate that schooling self-propelled fins are more resilient to initial perturbations than fins swimming behind a bluff body. More importantly, when simulations begin with a fully developed wake pattern, which better reflects natural environments, the characteristics of the self-propelled fins remain consistent regardless of the initial release styles. Therefore, from a hydrodynamic perspective, we conclude that initial release styles have limited effects on living fish in unsteady wakes.
- Published
- 2024
12. Privacy Evaluation Benchmarks for NLP Models
- Author
-
Huang, Wei, Wang, Yinggui, and Chen, Cen
- Subjects
Computer Science - Computation and Language, Computer Science - Machine Learning - Abstract
By inducing privacy attacks on NLP models, attackers can obtain sensitive information such as training data and model parameters. Although researchers have studied several kinds of attacks on NLP models in depth, these analyses have been non-systematic, and a comprehensive understanding of the impact of the attacks is lacking. For example, we must consider which scenarios apply to which attacks, what common factors affect the performance of different attacks, how different attacks relate to each other, and how various datasets and models influence the effectiveness of the attacks. Therefore, we need a benchmark to holistically assess the privacy risks faced by NLP models. In this paper, we present a privacy attack and defense evaluation benchmark for NLP that covers both conventional/small models and large language models (LLMs). This benchmark supports a variety of models, datasets, and protocols, along with standardized modules for comprehensive evaluation of attacks and defense strategies. Based on this framework, we present a study on the association between auxiliary data from different domains and the strength of privacy attacks, and we provide an improved attack method for this scenario with the help of Knowledge Distillation (KD). Furthermore, we propose a chained framework for privacy attacks, allowing a practitioner to chain multiple attacks to achieve a higher-level attack objective. Based on this, we provide some defense and enhanced attack strategies. The code for reproducing the results can be found at https://github.com/user2311717757/nlp_doctor., Comment: Findings of EMNLP 2024
- Published
- 2024
13. LAMP: Learnable Meta-Path Guided Adversarial Contrastive Learning for Heterogeneous Graphs
- Author
-
Li, Siqing, Park, Jin-Duk, Huang, Wei, Cao, Xin, Shin, Won-Yong, and Xu, Zhiqiang
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Social and Information Networks - Abstract
Heterogeneous graph neural networks (HGNNs) have significantly propelled the information retrieval (IR) field. Still, the effectiveness of HGNNs heavily relies on high-quality labels, which are often expensive to acquire. This challenge has shifted attention towards Heterogeneous Graph Contrastive Learning (HGCL), which usually requires pre-defined meta-paths. However, our findings reveal that meta-path combinations significantly affect performance in unsupervised settings, an aspect often overlooked in current literature. Existing HGCL methods have considerable variability in outcomes across different meta-path combinations, thereby challenging the optimization process to achieve consistent and high performance. In response, we introduce \textsf{LAMP} (\underline{\textbf{L}}earn\underline{\textbf{A}}ble \underline{\textbf{M}}eta-\underline{\textbf{P}}ath), a novel adversarial contrastive learning approach that integrates various meta-path sub-graphs into a unified and stable structure, leveraging the overlap among these sub-graphs. To address the denseness of this integrated sub-graph, we propose an adversarial training strategy for edge pruning, maintaining sparsity to enhance model performance and robustness. \textsf{LAMP} aims to maximize the difference between meta-path and network schema views for guiding contrastive learning to capture the most meaningful information. Our extensive experimental study conducted on four diverse datasets from the Heterogeneous Graph Benchmark (HGB) demonstrates that \textsf{LAMP} significantly outperforms existing state-of-the-art unsupervised models in terms of accuracy and robustness., Comment: 19 pages, 7 figures
- Published
- 2024
14. Hardware Acceleration of Kolmogorov-Arnold Network (KAN) for Lightweight Edge Inference
- Author
-
Huang, Wei-Hsing, Jia, Jianwei, Kong, Yuyao, Waqar, Faaiq, Wen, Tai-Hao, Chang, Meng-Fan, and Yu, Shimeng
- Subjects
Computer Science - Hardware Architecture - Abstract
Recently, a novel model named Kolmogorov-Arnold Networks (KAN) has been proposed with the potential to achieve the functionality of traditional deep neural networks (DNNs) using orders of magnitude fewer parameters, by using parameterized B-spline functions with trainable coefficients. However, the B-spline functions in KAN present new challenges for hardware acceleration. Evaluating the B-spline functions can be performed by using look-up tables (LUTs) to directly map the B-spline functions, thereby reducing computational resource requirements. However, this method still requires substantial circuit resources (LUTs, MUXs, decoders, etc.). For the first time, this paper employs an algorithm-hardware co-design methodology to accelerate KAN. The proposed algorithm-level techniques include Alignment-Symmetry and PowerGap KAN hardware-aware quantization and a KAN sparsity-aware mapping strategy, and the circuit-level techniques include an N:1 Time Modulation Dynamic Voltage input generator with analog-CIM (ACIM) circuits. The impact of non-ideal effects, such as partial sum errors caused by process variations, has been evaluated with statistics measured from the TSMC 22nm RRAM-ACIM prototype chips. With the best searched hyperparameters of KAN and the optimized circuits implemented in the 22 nm node, we can reduce hardware area by 41.78x and energy by 77.97x, with a 3.03% accuracy boost compared to the traditional DNN hardware., Comment: Accepted at ASP-DAC (Asia and South Pacific Design Automation Conference)
- Published
- 2024
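As a toy software illustration of the LUT idea described in the abstract above (this is not the paper's circuit; the grid size, target function, and names are all invented), per-input function evaluation can be replaced by a precomputed nearest-neighbor table, trading memory for arithmetic:

```python
import math

def build_lut(fn, lo, hi, n):
    """Precompute fn on a uniform grid so each later evaluation is a lookup."""
    step = (hi - lo) / (n - 1)
    return [fn(lo + i * step) for i in range(n)], lo, step

def lut_eval(lut, x):
    """Approximate fn(x) by the nearest precomputed grid point."""
    table, lo, step = lut
    i = min(max(int(round((x - lo) / step)), 0), len(table) - 1)
    return table[i]

# Example: tabulate tanh on [-3, 3] with 4097 entries and check the
# worst-case nearest-neighbor error over a fine sweep of inputs.
lut = build_lut(math.tanh, -3.0, 3.0, 4097)
err = max(abs(lut_eval(lut, x / 100.0) - math.tanh(x / 100.0))
          for x in range(-300, 301))
```

The hardware trade-off the paper targets is analogous: a finer grid shrinks the approximation error but costs more LUT/MUX/decoder resources, which motivates their quantization and co-design techniques.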
15. On the Convergence Analysis of Over-Parameterized Variational Autoencoders: A Neural Tangent Kernel Perspective
- Author
-
Wang, Li and Huang, Wei
- Subjects
Computer Science - Machine Learning - Abstract
Variational Auto-Encoders (VAEs) have emerged as powerful probabilistic models for generative tasks. However, their convergence properties have not been rigorously proven. The challenge of proving convergence is inherently difficult due to the highly non-convex nature of the training objective and the implementation of a Stochastic Neural Network (SNN) within VAE architectures. This paper addresses these challenges by characterizing the optimization trajectory of SNNs utilized in VAEs through the lens of Neural Tangent Kernel (NTK) techniques. These techniques govern the optimization and generalization behaviors of ultra-wide neural networks. We provide a mathematical proof of VAE convergence under mild assumptions, thus advancing the theoretical understanding of VAE optimization dynamics. Furthermore, we establish a novel connection between the optimization problem faced by over-parameterized SNNs and the Kernel Ridge Regression (KRR) problem. Our findings not only contribute to the theoretical foundation of VAEs but also open new avenues for investigating the optimization of generative models using advanced kernel methods. Our theoretical claims are verified by experimental simulations., Comment: Accepted by Machine Learning journal
- Published
- 2024
16. MultiCounter: Multiple Action Agnostic Repetition Counting in Untrimmed Videos
- Author
-
Tang, Yin, Luo, Wei, Zhang, Jinrui, Huang, Wei, Jing, Ruihai, and Zhang, Deyu
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Multi-instance Repetitive Action Counting (MRAC) aims to estimate the number of repetitive actions performed by multiple instances in untrimmed videos, commonly found in human-centric domains like sports and exercise. In this paper, we propose MultiCounter, a fully end-to-end deep learning framework that enables simultaneous detection, tracking, and counting of repetitive actions of multiple human instances. Specifically, MultiCounter incorporates two novel modules: 1) mixed spatiotemporal interaction for efficient context correlation across consecutive frames, and 2) task-specific heads for accurate perception of periodic boundaries and generalization for action-agnostic human instances. We train MultiCounter on a synthetic dataset called MultiRep generated from annotated real-world videos. Experiments on the MultiRep dataset validate the fundamental challenge of MRAC tasks and showcase the superiority of our proposed model. Compared to ByteTrack+RepNet, a solution that combines an advanced tracker with a single repetition counter, MultiCounter substantially improves Period-mAP by 41.0%, reduces AvgMAE by 58.6%, and increases AvgOBO by 1.48 times. This sets a new benchmark in the field of MRAC. Moreover, MultiCounter runs in real-time on a commodity GPU server and is insensitive to the number of human instances in a video., Comment: Accepted by ECAI 2024
- Published
- 2024
17. Spatio-Temporal Context Prompting for Zero-Shot Action Detection
- Author
-
Huang, Wei-Jhe, Chen, Min-Hung, and Lai, Shang-Hong
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence - Abstract
Spatio-temporal action detection encompasses the tasks of localizing and classifying individual actions within a video. Recent works aim to enhance this process by incorporating interaction modeling, which captures the relationship between people and their surrounding context. However, these approaches have primarily focused on fully-supervised learning, and the current limitation lies in the lack of generalization capability to recognize unseen action categories. In this paper, we aim to adapt the pretrained image-language models to detect unseen actions. To this end, we propose a method which can effectively leverage the rich knowledge of visual-language models to perform Person-Context Interaction. Meanwhile, our Context Prompting module will utilize contextual information to prompt labels, thereby enhancing the generation of more representative text features. Moreover, to address the challenge of recognizing distinct actions by multiple people at the same timestamp, we design the Interest Token Spotting mechanism which employs pretrained visual knowledge to find each person's interest context tokens, and then these tokens will be used for prompting to generate text features tailored to each individual. To evaluate the ability to detect unseen actions, we propose a comprehensive benchmark on J-HMDB, UCF101-24, and AVA datasets. The experiments show that our method achieves superior results compared to previous approaches and can be further extended to multi-action videos, bringing it closer to real-world applications. The code and data can be found in https://webber2933.github.io/ST-CLIP-project-page., Comment: Project page: https://webber2933.github.io/ST-CLIP-project-page
- Published
- 2024
18. Doping-free Janus homojunction solar cell with efficiency exceeding 23%
- Author
-
Li, Lei, Yang, Zi-Xuan, Huang, Tao, Wan, Hui, Chen, Wu-Yu, Zhang, Tao, Huang, Gui-Fang, Hu, Wangyu, and Huang, Wei-Qing
- Subjects
Physics - Applied Physics - Abstract
Photovoltaic solar cells are among the main renewable energy sources, and their power conversion efficiency (PCE) is improved by employing doping or heterojunctions to reduce photogenerated carrier recombination. Here, we propose a doping-free homojunction solar cell utilizing two-dimensional Janus semiconductors to achieve high PCE. Thanks to the intrinsic dipole of the Janus structure, the doping-free Janus homojunction naturally has not only a type-II band alignment to promote photoexciton dissociation, but also a smaller effective bandgap to enhance light absorption. More importantly, the intrinsic electric field across the Janus structure drives photoinduced electron and hole transfer from the interface to the opposite transport layers, significantly enhancing the efficiency of carrier separation and transport. We illustrate the concept with a titanium-based Janus monolayer homojunction, where the theoretically predicted PCE of the TiSSe homojunction reaches 23.22%. Our work opens a novel avenue to design low-cost, high-efficiency solar cells., Comment: 16 pages, 5 figures
- Published
- 2024
19. Improving Typhoon Predictions by Integrating Data-Driven Machine Learning Models with Physics Models Based on the Spectral Nudging and Data Assimilation
- Author
-
Niu, Zeyi, Huang, Wei, Zhang, Lei, Deng, Lin, Wang, Haibo, Yang, Yuhua, Wang, Dongliang, and Li, Hong
- Subjects
Physics - Atmospheric and Oceanic Physics - Abstract
With the rapid development of data-driven machine learning (ML) models in meteorology, typhoon track forecasts have become increasingly accurate. However, current ML models still face challenges, such as underestimating typhoon intensity and lacking interpretability. To address these issues, this study establishes an ML-driven hybrid typhoon model, in which forecast fields from the Pangu-Weather model are used to constrain the large-scale forecasts of the Weather Research and Forecasting model based on the spectral nudging method (Pangu_SP). The results show that forecasts from the Pangu_SP experiment clearly outperform those using the Global Forecast System as the initial field (GFS_INIT) and those from the Integrated Forecasting System of the European Centre for Medium-Range Weather Forecasts (ECMWF IFS) for the track forecast of Typhoon Doksuri (2023). The predicted typhoon cloud patterns from Pangu_SP are also more consistent with satellite observations. Additionally, the typhoon intensity forecasts from Pangu_SP are notably more accurate than those from the ECMWF IFS, demonstrating that the hybrid model effectively leverages the strengths of both ML and physical models. Furthermore, this study is the first to explore the significance of data assimilation in ML-driven hybrid dynamical systems. The findings reveal that after assimilating water vapor channels from the Advanced Geostationary Radiation Imager onboard Fengyun-4B, the errors in typhoon intensity forecasts are reduced., Comment: 12 pages, 4 figures
- Published
- 2024
20. Cross-View Geolocalization and Disaster Mapping with Street-View and VHR Satellite Imagery: A Case Study of Hurricane IAN
- Author
-
Li, Hao, Deuser, Fabian, Yina, Wenping, Luo, Xuanshu, Walther, Paul, Mai, Gengchen, Huang, Wei, and Werner, Martin
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence - Abstract
Natural disasters play a key role in shaping human-urban infrastructure interactions. An effective and efficient response to natural disasters is essential for building resilience and a sustainable urban environment. Two types of information are usually the most necessary and the most difficult to gather in disaster response. The first is disaster damage perception, which captures how badly people believe urban infrastructure has been damaged. The second is geolocation awareness, which concerns how people's whereabouts are made available. In this paper, we propose a novel disaster mapping framework, namely CVDisaster, aiming at simultaneously addressing geolocalization and damage perception estimation using cross-view Street-View Imagery (SVI) and Very High-Resolution satellite imagery. CVDisaster consists of two cross-view models, where CVDisaster-Geoloc refers to a cross-view geolocalization model based on a contrastive learning objective with a Siamese ConvNeXt image encoder, and CVDisaster-Est is a cross-view classification model based on a Couple Global Context Vision Transformer (CGCViT). Taking Hurricane IAN as a case study, we evaluate the CVDisaster framework by creating a novel cross-view dataset (CVIAN) and conducting extensive experiments. As a result, we show that CVDisaster can achieve highly competitive performance (over 80% for geolocalization and 75% for damage perception estimation) with even limited fine-tuning efforts, which largely motivates future cross-view models and applications within a broader GeoAI research community. The data and code are publicly available at: https://github.com/tum-bgd/CVDisaster.
- Published
- 2024
21. Fair Resource Allocation For Hierarchical Federated Edge Learning in Space-Air-Ground Integrated Networks via Deep Reinforcement Learning with Hybrid Control
- Author
-
Huang, Chong, Chen, Gaojie, Xiao, Pei, Chambers, Jonathon A., and Huang, Wei
- Subjects
Computer Science - Information Theory, Electrical Engineering and Systems Science - Signal Processing - Abstract
The space-air-ground integrated network (SAGIN) has become a crucial research direction in future wireless communications due to its ubiquitous coverage, rapid and flexible deployment, and multi-layer cooperation capabilities. However, integrating hierarchical federated learning (HFL) with edge computing and SAGINs remains a complex open issue to be resolved. This paper proposes a novel framework for applying HFL in SAGINs, utilizing aerial platforms and low Earth orbit (LEO) satellites as edge servers and cloud servers, respectively, to provide multi-layer aggregation capabilities for HFL. The proposed system also considers the presence of inter-satellite links (ISLs), enabling satellites to exchange federated learning models with each other. Furthermore, we consider multiple different computational tasks that need to be completed within a limited satellite service time. To maximize the convergence performance of all tasks while ensuring fairness, we propose the use of the distributional soft-actor-critic (DSAC) algorithm to optimize resource allocation in the SAGIN and aggregation weights in HFL. Moreover, we address the efficiency issue of hybrid action spaces in deep reinforcement learning (DRL) through a decoupling and recoupling approach, and design a new dynamic adjusting reward function to ensure fairness among multiple tasks in federated learning. Simulation results demonstrate the superiority of our proposed algorithm, consistently outperforming baseline approaches and offering a promising solution for addressing highly complex optimization problems in SAGINs., Comment: Accepted for publication in IEEE Journal on Selected Areas in Communications
- Published
- 2024
22. Integrated high-performance error correction for continuous-variable quantum key distribution
- Author
-
Zhou, Chuang, Li, Yang, Ma, Li, Yang, Jie, Huang, Wei, Sun, Ao, Wang, Heng, Luo, Yujie, Li, Yong, Chen, Ziyang, Lau, Francis C. M., Zhang, Yichen, Yu, Song, Guo, Hong, and Xu, Bingjie
- Subjects
Quantum Physics - Abstract
An integrated error-correction scheme with high throughput, low frame error rate (FER) and high reconciliation efficiency under low signal-to-noise ratio (SNR) is one of the major bottlenecks to realizing high-performance and low-cost continuous-variable quantum key distribution (CV-QKD). To solve this long-standing problem, a novel two-stage error-correction method with limited precision, suitable for integration given limited on-chip hardware resources while maintaining excellent decoding performance, is proposed and experimentally verified on a commercial FPGA. Compared to state-of-the-art results, the error-correction throughput can be improved by more than one order of magnitude given FER < 0.1 based on the proposed method, where 544.03 Mbps and 393.33 Mbps real-time error correction is achieved for typical code rates of 0.2 and 0.1, respectively. Besides, compared with the traditional decoding method, the secure key rate (SKR) for CV-QKD under the composable security framework can be improved by 140.09% and 122.03% by using the proposed two-stage decoding method for code rates of 0.2 and 0.1, which can support 32.70 Mbps and 5.66 Mbps real-time SKR under typical transmission distances of 25 km and 50 km, respectively. The record-breaking results pave the way for large-scale deployment of high-rate integrated CV-QKD systems in metropolitan quantum-secure networks.
- Published
- 2024
23. SNNGX: Securing Spiking Neural Networks with Genetic XOR Encryption on RRAM-based Neuromorphic Accelerator
- Author
-
Wong, Kwunhang, Wang, Songqi, Huang, Wei, Zhang, Xinyuan, He, Yangu, Lai, Karl M. H., Jiao, Yuzhong, Lin, Ning, Qi, Xiaojuan, Chen, Xiaoming, and Wang, Zhongrui
- Subjects
Computer Science - Cryptography and Security - Abstract
Biologically plausible Spiking Neural Networks (SNNs), characterized by spike sparsity, are attracting tremendous attention for intelligent edge devices and critical biomedical applications, as compared to artificial neural networks (ANNs). However, there is a considerable risk of malicious attempts to extract white-box information (i.e., weights) from SNNs, as attackers could exploit well-trained SNNs for profit and raise white-box adversarial concerns. There is a dire need for intellectual property (IP) protective measures. In this paper, we present a novel secure software-hardware co-designed RRAM-based neuromorphic accelerator for protecting the IP of SNNs. Software-wise, we design a tailored genetic algorithm with classic XOR encryption to target the least number of weights that need encryption. From a hardware perspective, we develop a low-energy decryption module, meticulously designed to provide zero decryption latency. Extensive results on various datasets, including NMNIST, DVSGesture, EEGMMIDB, Braille Letter, and SHD, demonstrate that our proposed method effectively secures SNNs by encrypting a minimal fraction of stealthy weights, only 0.00005% to 0.016% of the weight bits. Additionally, it achieves a substantial reduction in energy consumption, ranging from 59x to 6780x, and significantly lowers decryption latency, ranging from 175x to 4250x. Moreover, our method requires as little as one sample per class in the dataset for encryption and addresses problems that are insensitive to Hessian/gradient-based search. This strategy offers a highly efficient and flexible solution for securing SNNs in diverse applications., Comment: International Conference on Computer-Aided Design 2024
- Published
- 2024
- Full Text
- View/download PDF
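The XOR step at the heart of the scheme above is simple to illustrate. The sketch below is a toy, assumption-laden Python illustration: the genetic search for which weights to encrypt is replaced by a hand-picked index set (`stealthy`), and the 8-bit weight values and key are made up; only the self-inverse XOR mechanics match what the abstract describes.

```python
# Toy sketch of XOR weight encryption: a small chosen subset of weight
# words is XOR-ed with a key, destroying model utility until the same
# XOR is applied again. The index set and values are illustrative, not
# the output of the paper's genetic search.

KEY = 0b10110100

def xor_weights(weights, indices, key=KEY):
    """Apply XOR to selected 8-bit quantized weights; self-inverse."""
    out = list(weights)
    for i in indices:
        out[i] ^= key
    return out

weights = [12, 200, 7, 99, 150, 33]   # hypothetical 8-bit quantized weights
stealthy = [1, 4]                     # indices a genetic search might pick

encrypted = xor_weights(weights, stealthy)
decrypted = xor_weights(encrypted, stealthy)  # same operation restores them
```

Because XOR is its own inverse, decryption is the identical operation, which is consistent with a hardware decryption path that adds essentially no latency.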
24. A Novel Method to Improve Quality Surface Coverage in Multi-View Capture
- Author
-
Huang, Wei-Lun, Tashayyod, Davood, Gandjbakhche, Amir, Kazhdan, Michael, and Armand, Mehran
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The depth of field of a camera is a limiting factor for applications that require taking images at a short subject-to-camera distance or using a large focal length, such as total body photography, archaeology, and other close-range photogrammetry applications. Furthermore, in multi-view capture, where the target is larger than the camera's field of view, an efficient way to optimize surface coverage captured with quality remains a challenge. Given the 3D mesh of the target object and camera poses, we propose a novel method to derive a focus distance for each camera that optimizes the quality of the covered surface area. We first design an Expectation-Minimization (EM) algorithm to assign points on the mesh uniquely to cameras and then solve for a focus distance for each camera given the associated point set. We further improve the quality surface coverage by proposing a $k$-view algorithm that solves for the points assignment and focus distances by considering multiple views simultaneously. We demonstrate the effectiveness of the proposed method under various simulations for total body photography. The EM and $k$-view algorithms improve the relative cost of the baseline single-view methods by at least $24$% and $28$% respectively, corresponding to increasing the in-focus surface area by roughly $1550$ cm$^2$ and $1780$ cm$^2$. We believe the algorithms can be useful in a number of vision applications that require photogrammetric details but are limited by the depth of field., Comment: submitted version 1
- Published
- 2024
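The alternating assignment/focus scheme described above can be sketched as a tiny EM-style loop. Everything below is an illustrative assumption: the quadratic defocus cost, the nearest-camera initialization, and the toy depth table are not taken from the paper.

```python
# Hedged sketch of the alternating assign/solve scheme: surface points
# are assigned to cameras, then each camera's focus distance is
# re-solved for its assigned points. All numbers are made up.

# depths[c][j]: subject-to-camera distance of surface point j from camera c
depths = [
    [1.0, 1.2, 4.0, 4.2],   # camera 0 is close to points 0 and 1
    [4.0, 4.2, 1.0, 1.2],   # camera 1 is close to points 2 and 3
]

def em_focus(depths, iters=10):
    n_cams, n_pts = len(depths), len(depths[0])
    # Initialization: each point starts with its nearest camera.
    assign = [min(range(n_cams), key=lambda c: depths[c][j])
              for j in range(n_pts)]
    focus = [0.0] * n_cams
    for _ in range(iters):
        # "M-step": under a quadratic defocus cost, the best single focus
        # distance is the mean depth of the camera's assigned points.
        for c in range(n_cams):
            mine = [depths[c][j] for j in range(n_pts) if assign[j] == c]
            if mine:
                focus[c] = sum(mine) / len(mine)
        # "E-step": each point moves to the camera where it is most in focus.
        assign = [min(range(n_cams),
                      key=lambda c: (depths[c][j] - focus[c]) ** 2)
                  for j in range(n_pts)]
    return focus, assign

focus, assign = em_focus(depths)
```

On this toy instance the loop settles with each camera focused at the mean depth of its own nearby points, which is the behavior the alternating scheme is meant to produce.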
25. An Agile Adaptation Method for Multi-mode Vehicle Communication Networks
- Author
-
He, Shiwen, Chen, Kanghong, Huang, Shiyue, Huang, Wei, and An, Zhenyu
- Subjects
Computer Science - Networking and Internet Architecture, Computer Science - Artificial Intelligence, Computer Science - Machine Learning - Abstract
This paper focuses on discovering the impact of communication mode allocation on communication efficiency in vehicle communication networks. To be specific, a Markov decision process and reinforcement learning are applied to establish an agile adaptation mechanism for multi-mode communication devices according to the driving scenarios and business requirements. Q-learning is then used to train the agile adaptation reinforcement learning model and output the trained model. By learning the best actions to take in different states to maximize the cumulative reward, the model avoids the poor adaptation caused by inaccurate delay measurements in unstable communication scenarios. The experiments show that the proposed scheme can quickly adapt to dynamic vehicular networking environments while achieving high concurrency and communication efficiency.
- Published
- 2024
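The tabular Q-learning loop the abstract relies on can be sketched in a few lines. The states, actions, reward function, and toy environment below are all illustrative assumptions, not the paper's actual vehicular model.

```python
import random

# Illustrative tabular Q-learning for communication-mode selection.
# States stand in for driving scenarios, actions for communication modes.

STATES = ["urban", "highway"]          # simplified driving scenarios
ACTIONS = ["cellular", "dsrc"]         # candidate communication modes
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

def toy_reward(state, action):
    # Assumed reward model: DSRC suits dense urban traffic,
    # cellular suits highways. Purely hypothetical.
    good = {"urban": "dsrc", "highway": "cellular"}
    return 1.0 if good[state] == action else 0.0

def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = "urban"
    for _ in range(episodes):
        # epsilon-greedy action selection
        if rng.random() < EPSILON:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        reward = toy_reward(state, action)
        next_state = rng.choice(STATES)  # scenario changes as the vehicle moves
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        # standard Q-learning temporal-difference update
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                       - q[(state, action)])
        state = next_state
    return q

q = train()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
```

After training, the greedy policy reads the learned mode directly out of the Q-table for each scenario.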
26. LIDIA: Precise Liver Tumor Diagnosis on Multi-Phase Contrast-Enhanced CT via Iterative Fusion and Asymmetric Contrastive Learning
- Author
-
Huang, Wei, Liu, Wei, Zhang, Xiaoming, Yin, Xiaoli, Han, Xu, Li, Chunli, Gao, Yuan, Shi, Yu, Lu, Le, Zhang, Ling, Zhang, Lei, and Yan, Ke
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The early detection and precise diagnosis of liver tumors are tasks of critical clinical value, yet they pose significant challenges due to the high heterogeneity and variability of liver tumors. In this work, a precise LIver tumor DIAgnosis network on multi-phase contrast-enhanced CT, named LIDIA, is proposed for real-world scenarios. To fully utilize all available phases in contrast-enhanced CT, LIDIA first employs an iterative fusion module to aggregate variable numbers of image phases, thereby capturing the features of lesions at different phases for better tumor diagnosis. To effectively mitigate the high heterogeneity problem of liver tumors, LIDIA incorporates asymmetric contrastive learning to enhance the discriminability between different classes. To evaluate our method, we constructed a large-scale dataset comprising 1,921 patients and 8,138 lesions. LIDIA achieved an average AUC of 93.6% across eight different types of lesions, demonstrating its effectiveness. Besides, LIDIA also demonstrated strong generalizability with an average AUC of 89.3% when tested on an external cohort of 828 patients., Comment: Accepted to MICCAI 2024
- Published
- 2024
27. Quantum Local Search for Traveling Salesman Problem with Path-Slicing Strategy
- Author
-
Liu, Chen-Yu, Matsuyama, Hiromichi, Huang, Wei-hao, and Yamashiro, Yu
- Subjects
Quantum Physics - Abstract
We present novel path-slicing strategies integrated with quantum local search to optimize solutions for the Traveling Salesman Problem (TSP), addressing the limitations of current Noisy Intermediate-Scale Quantum (NISQ) technologies. Our hybrid quantum-classical approach leverages classical path initialization and quantum optimization to effectively manage the computational challenges posed by the TSP. We explore various path slicing methods, including k-means and anti-k-means clustering, to divide the TSP into manageable subproblems. These are then solved using quantum or classical solvers. Our analysis, performed on multiple TSP instances from the TSPlib, demonstrates the ability of our strategies to achieve near-optimal solutions efficiently, highlighting significant improvements in solving efficiency and resource utilization. This approach paves the way for future applications in larger combinatorial optimization scenarios, advancing the field of quantum optimization., Comment: 5 pages, 4 figures
- Published
- 2024
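The path-slicing idea can be illustrated classically: cut an initial tour into short slices and re-optimize each slice with fixed endpoints so the pieces stitch back together. The brute-force slice solver below stands in for the paper's quantum solver, and the six-city instance is made up.

```python
import itertools
import math

# Illustrative path-slicing local search for the TSP: re-optimize short
# contiguous slices of a tour, keeping slice endpoints fixed so slices
# can be stitched back into one tour. A classical exhaustive solver
# replaces the quantum subproblem solver of the paper.

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def path_length(points, order):
    return sum(dist(points[order[i]], points[order[i + 1]])
               for i in range(len(order) - 1))

def optimize_slice(points, segment):
    # Re-order only the interior cities of one slice; the endpoints stay
    # fixed so the slice still connects to its neighbors.
    first, last, interior = segment[0], segment[-1], segment[1:-1]
    best = min(itertools.permutations(interior),
               key=lambda perm: path_length(points, (first, *perm, last)))
    return [first, *best, last]

def sliced_local_search(points, tour, slice_len=5):
    improved = tour[:]
    # Overlap slices by one city so endpoints are shared between slices.
    for start in range(0, len(tour) - 1, slice_len - 1):
        seg = improved[start:start + slice_len]
        if len(seg) >= 3:
            improved[start:start + slice_len] = optimize_slice(points, seg)
    return improved

points = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
initial = [0, 2, 1, 3, 5, 4, 0]   # deliberately poor closed tour
better = sliced_local_search(points, initial)
```

Each slice is solved independently, which is exactly what makes the decomposition attractive for size-limited NISQ solvers.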
28. Application of Prompt Learning Models in Identifying the Collaborative Problem Solving Skills in an Online Task
- Author
-
Zhu, Mengxiao, Wang, Xin, Wang, Xiantao, Chen, Zihang, and Huang, Wei
- Subjects
Computer Science - Human-Computer Interaction - Abstract
Collaborative problem solving (CPS) competence is considered one of the essential 21st-century skills. To facilitate the assessment and learning of CPS competence, researchers have proposed a series of frameworks to conceptualize CPS and explored ways to make sense of the complex processes involved in collaborative problem solving. However, encoding explicit behaviors into subskills within the frameworks of CPS skills is still a challenging task. Traditional studies have relied on manual coding to decipher behavioral data for CPS, but such coding methods can be very time-consuming and cannot support real-time analyses. Scholars have begun to explore approaches for constructing automatic coding models. Nevertheless, the existing models built using machine learning or deep learning techniques depend on a large amount of training data and have relatively low accuracy. To address these problems, this paper proposes a prompt-based learning pre-trained model. The model can achieve high performance even with limited training data. In this study, three experiments were conducted, and the results showed that our model not only produced the highest accuracy, macro F1 score, and kappa values on large training sets, but also performed the best on small training sets of the CPS behavioral data. The application of the proposed prompt-based learning pre-trained model contributes to the CPS skills coding task and can also be used for other CSCW coding tasks to replace manual coding.
- Published
- 2024
29. Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
- Author
-
Kuan, Chun-Yi, Yang, Chih-Kai, Huang, Wei-Ping, Lu, Ke-Han, and Lee, Hung-yi
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Sound - Abstract
In this work, we introduce Speech-Copilot, a modular framework for instruction-oriented speech-processing tasks that minimizes human effort in toolset construction. Unlike end-to-end methods using large audio-language models, Speech-Copilot builds speech processing-specific toolsets by analyzing pre-collected task instructions and breaking tasks into manageable sub-tasks. It features a flexible agent based on large language models that performs tasks through program generation. Our approach achieves state-of-the-art performance on the Dynamic-SUPERB benchmark, demonstrating its effectiveness across diverse speech-processing tasks. Key contributions include: 1) developing an innovative framework for speech processing-specific toolset construction, 2) establishing a high-performing agent based on large language models, and 3) offering a new perspective on addressing challenging instruction-oriented speech-processing tasks. Without additional training processes required by end-to-end approaches, our method provides a flexible and extendable solution for a wide range of speech-processing applications., Comment: Accepted to SLT 2024
- Published
- 2024
30. Establishing Rigorous and Cost-effective Clinical Trials for Artificial Intelligence Models
- Author
-
Gao, Wanling, Huang, Yunyou, Cui, Dandan, Yu, Zhuoming, Liu, Wenjing, Liang, Xiaoshuang, Zhao, Jiahui, Xie, Jiyue, Li, Hao, Ma, Li, Ye, Ning, Kang, Yumiao, Luo, Dingfeng, Pan, Peng, Huang, Wei, Liu, Zhongmou, Hu, Jizhong, Zhao, Gangyuan, Jiang, Chongrong, Huang, Fan, Wei, Tianyi, Tang, Suqin, Xia, Bingjie, Zhang, Zhifei, and Zhan, Jianfeng
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction - Abstract
A profound gap persists between artificial intelligence (AI) and clinical practice in medicine, primarily due to the lack of rigorous and cost-effective evaluation methodologies. State-of-the-art and state-of-the-practice AI model evaluations are limited to laboratory studies on medical datasets or direct clinical trials with no or solely patient-centered controls. Moreover, the crucial role of clinicians in collaborating with AI, pivotal for determining its impact on clinical practice, is often overlooked. For the first time, we emphasize the critical necessity for rigorous and cost-effective evaluation methodologies for AI models in clinical practice, featuring patient/clinician-centered (dual-centered) AI randomized controlled trials (DC-AI RCTs) and virtual clinician-based in-silico trials (VC-MedAI) as an effective proxy for DC-AI RCTs. Leveraging 7500 diagnosis records from two-step inaugural DC-AI RCTs across 14 medical centers with 125 clinicians, our results demonstrate the necessity of DC-AI RCTs and the effectiveness of VC-MedAI. Notably, VC-MedAI performs comparably to human clinicians, replicating insights and conclusions from prospective DC-AI RCTs. We envision DC-AI RCTs and VC-MedAI as pivotal advancements, presenting innovative and transformative evaluation methodologies for AI models in clinical practice, offering a preclinical-like setting mirroring conventional medicine, and reshaping development paradigms in a cost-effective and fast-iterative manner. Chinese Clinical Trial Registration: ChiCTR2400086816., Comment: 24 pages
- Published
- 2024
31. Secure Combination of Untrusted Time information Based on Optimized Dempster-Shafer Theory
- Author
-
Li, Yang, Luo, Yujie, Zhang, Yichen, Sun, Ao, Huang, Wei, Zhang, Shuai, Zhang, Tao, Zhou, Chuang, Ma, Li, Yang, Jie, Wu, Mei, Wang, Heng, Pan, Yan, Shao, Yun, Chen, Xing, Chen, Ziyang, Yu, Song, Guo, Hong, and Xu, Bingjie
- Subjects
Computer Science - Cryptography and Security - Abstract
Secure precision time synchronization is important for applications of Cyber-Physical Systems. However, several attacks, especially the Time Delay Attack (TDA), seriously degrade the performance of time synchronization systems. A multiple-path scheme is considered an effective security countermeasure for decreasing the influence of TDA. However, an effective secure combination algorithm is still missing for precision time synchronization. In this paper, a secure combination algorithm based on Dempster-Shafer theory is proposed for the multiple-path method. Special optimizations are applied to the combination algorithm to solve the potential problems caused by untrusted evidence. Theoretical simulation shows that the proposed algorithm works much better than the Fault Tolerant Algorithm (FTA) and the attack detection method based on a single path. Experimental demonstration proves the feasibility and superiority of the proposed algorithm, where time stability of 27.97 ps, 1.57 ps, and 1.12 ps at averaging times of 1 s, 10 s, and 100 s is achieved under TDA and local clock jump. The proposed algorithm can be used to improve the security and resilience of many important synchronization protocols, such as NTP, PTP, and TWFTT.
- Published
- 2024
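The core operation behind such a combination algorithm, Dempster's rule of combination, is compact enough to sketch. The two-hypothesis frame ("ok" vs. "tda") and the mass values below are illustrative; the paper's optimizations for untrusted evidence are not reproduced.

```python
# Minimal Dempster's rule of combination over frozenset hypotheses.
# Mass on conflicting (disjoint) hypotheses is discarded and the rest
# is renormalized.

def dempster_combine(m1, m2):
    """Combine two mass functions whose keys are frozenset hypotheses."""
    combined = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb  # mass on contradictory evidence
    if conflict >= 1.0:
        raise ValueError("total conflict: evidence cannot be combined")
    return {h: m / (1.0 - conflict) for h, m in combined.items()}

# Two paths reporting evidence on whether the link is clean or under attack.
OK, TDA = frozenset(["ok"]), frozenset(["tda"])
EITHER = OK | TDA  # ignorance: mass assigned to the whole frame

path1 = {OK: 0.7, EITHER: 0.3}
path2 = {OK: 0.6, TDA: 0.2, EITHER: 0.2}
fused = dempster_combine(path1, path2)
```

Concordant evidence from independent paths reinforces the "ok" hypothesis, while the small conflicting mass is normalized away; this is the behavior a multi-path TDA defense builds on.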
32. The Heterophilic Snowflake Hypothesis: Training and Empowering GNNs for Heterophilic Graphs
- Author
-
Wang, Kun, Zhang, Guibin, Zhang, Xinnan, Fang, Junfeng, Wu, Xun, Li, Guohao, Pan, Shirui, Huang, Wei, and Liang, Yuxuan
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence - Abstract
Graph Neural Networks (GNNs) have become pivotal tools for a range of graph-based learning tasks. Notably, most current GNN architectures operate under the assumption of homophily, whether explicitly or implicitly. While this underlying assumption is frequently adopted, it is not universally applicable, which can result in potential shortcomings in learning effectiveness. In this paper, for the first time, we transfer the prevailing concept of "one node one receptive field" to the heterophilic graph. By constructing a proxy label predictor, we enable each node to possess a latent prediction distribution, which assists connected nodes in determining whether they should aggregate their associated neighbors. Ultimately, every node can have its own unique aggregation hop and pattern, much like each snowflake is unique and possesses its own characteristics. Based on observations, we innovatively introduce the Heterophily Snowflake Hypothesis and provide an effective solution to guide and facilitate research on heterophilic graphs and beyond. We conduct comprehensive experiments including (1) main results on 10 graphs with varying heterophily ratios across 10 backbones; (2) scalability on various deep GNN backbones (SGC, JKNet, etc.) across a wide range of depths (2, 4, 6, 8, 16, and 32 layers); (3) comparison with the conventional snowflake hypothesis; (4) efficiency comparison with existing graph pruning algorithms. Our observations show that our framework acts as a versatile operator for diverse tasks. It can be integrated into various GNN frameworks, boosting performance in depth and offering an explainable approach to choosing the optimal network depth. The source code is available at https://github.com/bingreeky/HeteroSnoH., Comment: KDD 2024
- Published
- 2024
33. Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech
- Author
-
Lin, Guan-Ting, Huang, Wei-Ping, and Lee, Hung-yi
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound - Abstract
Deep Learning-based end-to-end Automatic Speech Recognition (ASR) has made significant strides but still struggles with performance on out-of-domain samples due to domain shifts in real-world scenarios. Test-Time Adaptation (TTA) methods address this issue by adapting models using test samples at inference time. However, current ASR TTA methods have largely focused on non-continual TTA, which limits cross-sample knowledge learning compared to continual TTA. In this work, we first propose a Fast-slow TTA framework for ASR that leverages the advantage of continual and non-continual TTA. Following this framework, we introduce Dynamic SUTA (DSUTA), an entropy-minimization-based continual TTA method for ASR. To enhance DSUTA robustness for time-varying data, we design a dynamic reset strategy to automatically detect domain shifts and reset the model, making it more effective at handling multi-domain data. Our method demonstrates superior performance on various noisy ASR datasets, outperforming both non-continual and continual TTA baselines while maintaining robustness to domain changes without requiring domain boundary information., Comment: Accepted by EMNLP 2024
- Published
- 2024
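The entropy-minimization objective underlying SUTA-style test-time adaptation can be sketched with a single adaptable parameter. Reducing the adapted parameters to one scalar logit scale and using a numeric gradient are simplifying assumptions made purely for illustration; the paper adapts actual model parameters with backpropagation.

```python
import math

# Toy entropy-minimization test-time adaptation: adapt one scalar
# parameter (a logit scale) to sharpen the model's own predictive
# distribution on an unlabeled test input.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def adapt(logits, steps=50, lr=0.5, eps=1e-4):
    scale = 1.0  # adaptable parameter; larger scale = sharper predictions
    for _ in range(steps):
        # numeric (central-difference) gradient of entropy w.r.t. scale
        h_plus = entropy(softmax([(scale + eps) * x for x in logits]))
        h_minus = entropy(softmax([(scale - eps) * x for x in logits]))
        grad = (h_plus - h_minus) / (2 * eps)
        scale -= lr * grad  # gradient step minimizing prediction entropy
    return scale

logits = [1.2, 0.9, 0.3]          # uncertain frame-level prediction
before = entropy(softmax(logits))
scale = adapt(logits)
after = entropy(softmax([scale * x for x in logits]))
```

Minimizing entropy sharpens the distribution without labels, which is the self-supervised signal that continual TTA methods like DSUTA repeatedly apply, with reset strategies to survive domain shifts.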
34. EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding
- Author
-
Li, Yuan-Ming, Huang, Wei-Jin, Wang, An-Lan, Zeng, Ling-An, Meng, Jing-Ke, and Zheng, Wei-Shi
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence - Abstract
We present EgoExo-Fitness, a new full-body action understanding dataset, featuring fitness sequence videos recorded from synchronized egocentric and fixed exocentric (third-person) cameras. Compared with existing full-body action understanding datasets, EgoExo-Fitness not only contains videos from first-person perspectives, but also provides rich annotations. Specifically, two-level temporal boundaries are provided to localize single action videos along with sub-steps of each action. More importantly, EgoExo-Fitness introduces innovative annotations for interpretable action judgement--including technical keypoint verification, natural language comments on action execution, and action quality scores. Combining all of these, EgoExo-Fitness provides new resources to study egocentric and exocentric full-body action understanding across dimensions of "what", "when", and "how well". To facilitate research on egocentric and exocentric full-body action understanding, we construct benchmarks on a suite of tasks (i.e., action classification, action localization, cross-view sequence verification, cross-view skill determination, and a newly proposed task of guidance-based execution verification), together with detailed analysis. Code and data will be available at https://github.com/iSEE-Laboratory/EgoExo-Fitness/tree/main., Comment: Accepted by ECCV2024
- Published
- 2024
35. A Multi-Resolution Mutual Learning Network for Multi-Label ECG Classification
- Author
-
Huang, Wei, Wang, Ning, Feng, Panpan, Wang, Haiyan, Wang, Zongmin, and Zhou, Bing
- Subjects
Electrical Engineering and Systems Science - Signal Processing, Computer Science - Machine Learning - Abstract
Electrocardiograms (ECG), which record the electrophysiological activity of the heart, have become a crucial tool for diagnosing cardiovascular diseases. In recent years, the application of deep learning techniques has significantly improved the performance of ECG signal classification. Multi-resolution feature analysis, which captures and processes information at different time scales, can extract subtle changes and overall trends in ECG signals, showing unique advantages. However, common multi-resolution analysis methods based on simple feature addition or concatenation may lead to the neglect of low-resolution features, affecting model performance. To address this issue, this paper proposes the Multi-Resolution Mutual Learning Network (MRM-Net). MRM-Net includes a dual-resolution attention architecture and a feature complementary mechanism. The dual-resolution attention architecture processes high-resolution and low-resolution features in parallel. Through the attention mechanism, the high-resolution and low-resolution branches can focus on subtle waveform changes and overall rhythm patterns, enhancing the ability to capture critical features in ECG signals. Meanwhile, the feature complementary mechanism introduces mutual feature learning after each layer of the feature extractor. This allows features at different resolutions to reinforce each other, thereby reducing information loss and improving model performance and robustness. Experiments on the PTB-XL and CPSC2018 datasets demonstrate that MRM-Net significantly outperforms existing methods in multi-label ECG classification performance. The code for our framework will be publicly available at https://github.com/wxhdf/MRM.
- Published
- 2024
36. Impacts of Backside Insulation on the Dynamic On-Resistance of Lateral p-GaN HEMTs-on-Si
- Author
-
Wang, Yu-Xuan, Tai, Mao-Chou, Chang, Ting-Chang, Huang, Wei-Chen, Wan, Zeyu, Li, Simon, Sze, Simon, and Xia, Guangrui
- Subjects
Physics - Applied Physics - Abstract
We examined the effect of backside insulation on the dynamic on-resistance of lateral p-GaN HEMTs. To gain a comprehensive understanding of the dynamic on-resistance difference between substrate-grounded and substrate-floating p-GaN HEMTs, we conducted in-circuit double pulse testing and long-term direct current (DC) bias stress. We found that while backside insulation can enhance the breakdown voltage of lateral p-GaN HEMTs, it also comes with a tradeoff in device reliability. Results from Sentaurus TCAD simulation suggest that the use of backside insulation in devices gradually disperses potential to the buffer barrier. As a result, the potential barrier at the buffer edge of the 2DEG channel decreases significantly, leading to considerable electron trapping at buffer traps. This breakdown-voltage and reliability tradeoff also applies to HEMT technologies using insulating substrates.
- Published
- 2024
37. Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
- Author
-
Kuan, Chun-Yi, Huang, Wei-Ping, and Lee, Hung-yi
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Sound - Abstract
Large audio-language models (LALMs) enhance traditional large language models by integrating audio perception capabilities, allowing them to tackle audio-related tasks. Previous research has primarily focused on assessing the performance of LALMs across various tasks, while overlooking their reliability, particularly concerning issues like object hallucination. In our study, we introduce methods to assess the extent of object hallucination of publicly available LALMs. Our findings reveal that LALMs are comparable to specialized audio captioning models in their understanding of audio content, but struggle to answer discriminative questions, specifically those requiring the identification of the presence of particular object sounds within an audio clip. This limitation highlights a critical weakness in current LALMs: their inadequate understanding of discriminative queries. Moreover, we explore the potential of prompt engineering to enhance LALMs' performance on discriminative questions., Comment: Accepted to Interspeech 2024
- Published
- 2024
38. AI.vs.Clinician: Unveiling Intricate Interactions Between AI and Clinicians through an Open-Access Database
- Author
-
Gao, Wanling, Liu, Yuan, Yu, Zhuoming, Cui, Dandan, Liu, Wenjing, Liang, Xiaoshuang, Zhao, Jiahui, Xie, Jiyue, Li, Hao, Ma, Li, Ye, Ning, Kang, Yumiao, Luo, Dingfeng, Pan, Peng, Huang, Wei, Liu, Zhongmou, Hu, Jizhong, Huang, Fan, Zhao, Gangyuan, Jiang, Chongrong, Wei, Tianyi, Zhang, Zhifei, Huang, Yunyou, and Zhan, Jianfeng
- Subjects
Computer Science - Human-Computer Interaction - Abstract
Artificial Intelligence (AI) plays a crucial role in the medical field and has the potential to revolutionize healthcare practices. However, the success of AI models and their impact hinges on the synergy between AI and medical specialists, with clinicians assuming a dominant role. Unfortunately, the intricate dynamics and interactions between AI and clinicians remain undiscovered and thus hinder AI from being translated into medical practice. To address this gap, we have curated a groundbreaking database called AI.vs.Clinician. This database is the first of its kind for studying the interactions between AI and clinicians. It derives from 7,500 collaborative diagnosis records on a life-threatening medical emergency -- Sepsis -- from 14 medical centers across China. For the patient cohorts well-chosen from MIMIC databases, the AI-related information comprises the model property, feature input, diagnosis decision, and inferred probabilities of sepsis onset presently and within the next three hours. The clinician-related information includes the viewed examination data and sequence, viewed time, preliminary and final diagnosis decisions with or without AI assistance, and recommended treatment., Comment: 12 pages
- Published
- 2024
39. Provably Neural Active Learning Succeeds via Prioritizing Perplexing Samples
- Author
-
Bu, Dake, Huang, Wei, Suzuki, Taiji, Cheng, Ji, Zhang, Qingfu, Xu, Zhiqiang, and Wong, Hau-San
- Subjects
Computer Science - Machine Learning - Abstract
Neural Network-based active learning (NAL) is a cost-effective data selection technique that utilizes neural networks to select and train on a small subset of samples. While existing work successfully develops various effective or theory-justified NAL algorithms, the understanding of the two commonly used query criteria of NAL, uncertainty-based and diversity-based, remains in its infancy. In this work, we try to move one step forward by offering a unified explanation for the success of both query criteria-based NAL from a feature learning view. Specifically, we consider a feature-noise data model comprising easy-to-learn or hard-to-learn features disrupted by noise, and conduct analysis over 2-layer NN-based NALs in the pool-based scenario. We provably show that both uncertainty-based and diversity-based NAL are inherently amenable to one and the same principle, i.e., striving to prioritize samples that contain yet-to-be-learned features. We further prove that this shared principle is the key to their success: achieving small test error within a small labeled set. Contrastingly, strategy-free passive learning exhibits a large test error due to inadequate learning of yet-to-be-learned features, necessitating resort to a significantly larger label complexity for a sufficient test error reduction. Experimental results validate our findings., Comment: Accepted by the 41st International Conference on Machine Learning (ICML 2024)
- Published
- 2024
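The shared query principle described in the abstract of entry 39, namely prioritizing samples whose features the network has not yet learned, can be sketched with an uncertainty-based criterion in a few lines of numpy (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def uncertainty_query(probs, k):
    """Pick the k pool samples with the highest predictive entropy.

    probs: (n, c) array of model class probabilities over the unlabeled pool.
    The samples the model is least sure about are, per the shared principle,
    exactly those still containing yet-to-be-learned features.
    """
    eps = 1e-12
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    return np.argsort(entropy)[-k:][::-1]  # most uncertain first

probs = np.array([
    [0.98, 0.02],   # confidently classified sample
    [0.55, 0.45],   # near the decision boundary
    [0.50, 0.50],   # maximally uncertain
])
picked = uncertainty_query(probs, 2)
```

A diversity-based criterion would select a different-looking batch, but by the paper's analysis it ends up favoring the same yet-to-be-learned samples.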
40. Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach
- Author
-
Han, Haoyu, Li, Juanhui, Huang, Wei, Tang, Xianfeng, Lu, Hanqing, Luo, Chen, Liu, Hui, and Tang, Jiliang
- Subjects
Computer Science - Machine Learning - Abstract
Graph Neural Networks (GNNs) have proven to be highly effective for node classification tasks across diverse graph structural patterns. Traditionally, GNNs employ a uniform global filter, typically a low-pass filter for homophilic graphs and a high-pass filter for heterophilic graphs. However, real-world graphs often exhibit a complex mix of homophilic and heterophilic patterns, rendering a single global filter approach suboptimal. In this work, we theoretically demonstrate that a global filter optimized for one pattern can adversely affect performance on nodes with differing patterns. To address this, we introduce Node-MoE, a novel GNN framework that utilizes a mixture of experts to adaptively select the appropriate filters for different nodes. Extensive experiments demonstrate the effectiveness of Node-MoE on both homophilic and heterophilic graphs.
- Published
- 2024
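A toy numpy sketch of the node-wise filtering idea from entry 40, assuming a row-normalized neighbor average as the low-pass filter, its complement as the high-pass filter, and a linear softmax gate per node (the paper's actual experts and gating network are richer):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def node_wise_filtering(A, X, W_gate):
    """Blend a low-pass and a high-pass graph filter per node.

    A: (n, n) adjacency, X: (n, d) features, W_gate: (d, 2) gating weights.
    Each node receives its own convex combination of the two filter
    outputs, chosen by a softmax gate over its own features.
    """
    deg = A.sum(axis=1)
    A_hat = A / np.maximum(deg, 1)[:, None]  # row-normalized adjacency
    low = A_hat @ X                          # low-pass: neighbor averaging
    high = X - A_hat @ X                     # high-pass: difference signal
    gate = softmax(X @ W_gate)               # (n, 2) per-node expert weights
    return gate[:, :1] * low + gate[:, 1:] * high
```

With a zero gate every node weighs both experts equally; a trained gate would push homophilic nodes toward the low-pass expert and heterophilic nodes toward the high-pass one.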
41. SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining
- Author
-
Han, Andi, Li, Jiaxiang, Huang, Wei, Hong, Mingyi, Takeda, Akiko, Jawanpuria, Pratik, and Mishra, Bamdev
- Subjects
Computer Science - Machine Learning - Abstract
Large language models (LLMs) have shown impressive capabilities across various tasks. However, training LLMs from scratch requires significant computational power and extensive memory capacity. Recent studies have explored low-rank structures on weights for efficient fine-tuning in terms of parameters and memory, either through low-rank adaptation or factorization. While effective for fine-tuning, low-rank structures are generally less suitable for pretraining because they restrict parameters to a low-dimensional subspace. In this work, we propose to parameterize the weights as a sum of low-rank and sparse matrices for pretraining, which we call SLTrain. The low-rank component is learned via matrix factorization, while for the sparse component, we employ a simple strategy of uniformly selecting the sparsity support at random and learning only the non-zero entries with the fixed support. Despite its simplicity, the random fixed-support sparse learning strategy significantly enhances pretraining when combined with low-rank learning. Our results show that SLTrain adds minimal extra parameters and memory costs compared to pretraining with low-rank parameterization, yet achieves substantially better performance, which is comparable to full-rank training. Remarkably, when combined with quantization and per-layer updates, SLTrain can reduce memory requirements by up to 73% when pretraining the LLaMA 7B model.
- Published
- 2024
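The sparse-plus-low-rank parameterization described in entry 41 can be sketched as follows; the dimensions, rank, and density here are illustrative, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, density = 256, 8, 0.03   # illustrative sizes

# Low-rank factors, learned by ordinary training.
B = rng.standard_normal((d, r)) * 0.02
A = rng.standard_normal((r, d)) * 0.02

# Sparse part: the support is chosen uniformly at random ONCE and then
# fixed; only the values living on the support are trainable.
n_nz = int(density * d * d)
support = rng.choice(d * d, size=n_nz, replace=False)
values = np.zeros(n_nz)  # trainable entries on the fixed support

def materialize(B, A, support, values, d):
    """W = BA + S, the SLTrain-style sparse-plus-low-rank weight."""
    S = np.zeros(d * d)
    S[support] = values
    return B @ A + S.reshape(d, d)

W = materialize(B, A, support, values, d)
dense_params = d * d
sltrain_params = 2 * d * r + n_nz   # factors plus sparse values
```

Counting parameters makes the memory argument concrete: 2dr + nnz trainable entries versus d^2 for a dense layer, while the sum BA + S can still leave the low-dimensional subspace that a purely low-rank weight is confined to.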
42. Effects of Humic Acid-Copper Interactions on Growth, Nutrient Absorption, and Photosynthetic Performance of Citrus sinensis Seedlings in Sand Culture
- Author
-
Huang, Wei-Tao, Shen, Qian, Yang, Hui, Chen, Xu-Feng, Huang, Wei-Lin, Wu, Han-Xue, Lai, Ning-Wei, Yang, Lin-Tong, Huang, Zeng-Rong, and Chen, Li-Song
- Published
- 2024
43. A Rusty but Provocative Knife? The Rationale behind China's Sanction Usage
- Author
-
Huang, Wei-Hao
- Subjects
economic sanctions from china ,south korea ,taiwan ,bureaucratic competition ,Social sciences and state - Asia (Asian studies only) ,H53 - Abstract
China has initiated a series of "economic sanctions" against South Korea, affecting Korean pop stars visiting China and Korean investments in China. Sanctions were imposed on South Korea in response to its decision to deploy Terminal High Altitude Area Defense (THAAD) in 2016. Furthermore, the Global Daily mobilized the local population to boycott Korean products and investments in China. However, the Chinese Foreign Ministry has never positively confirmed these activities as economic sanctions against South Korea related to the THAAD installation. In other words, the Chinese government signaled a relatively weak message via these sanctions to South Korea. As a result, the THAAD deployment continued in South Korea. In this paper, I interpret China's rationale for imposing these puzzling economic sanctions, which display weak resolve, on South Korea and Taiwan. As signaling theory argues, economic sanctions backed by insufficient resolve, which are more likely to fail, are a more provocative foreign policy. By reviewing China's use of sanctions against South Korea and Taiwan, I propose a bureaucratic-competition argument to answer why China launched such sanctions against other countries: they are driven by domestic institutions seeking reward from the Communist Party of China. By comparing shifts of leadership between domestic agencies, the paper provides evidence to support the proposed argument. I also consider two alternative explanations to strengthen the proposed argument, thereby connecting the paper with two other larger streams of research: analyses of China's aggressive foreign policies and the domestic politics of economic sanctions.
- Published
- 2019
44. Single-stranded pre-methylated 5mC adapters uncover the methylation profile of plasma ultrashort single-stranded cell-free DNA
- Author
-
Cheng, Jordan C, Swarup, Neeti, Morselli, Marco, Huang, Wei-Lun, Aziz, Mohammad, Caggiano, Christa, Kordi, Misagh, Patel, Abhijit A, Chia, David, Kim, Yong, Li, Feng, Wei, Fang, Zaitlen, Noah, Krysan, Kostyantyn, Dubinett, Steve, Pellegrini, Matteo, and Wong, David TW
- Subjects
Biological Sciences ,Bioinformatics and Computational Biology ,Genetics ,Human Genome ,Cancer ,Genetic Testing ,Cancer Genomics ,DNA Methylation ,Humans ,Cell-Free Nucleic Acids ,CpG Islands ,DNA ,Single-Stranded ,5-Methylcytosine ,Lung Neoplasms ,Sulfites ,Promoter Regions ,Genetic ,Sequence Analysis ,DNA ,Whole Genome Sequencing ,Environmental Sciences ,Information and Computing Sciences ,Developmental Biology ,Biological sciences ,Chemical sciences ,Environmental sciences - Abstract
Whole-genome bisulfite sequencing (BS-Seq) measures cytosine methylation changes at single-base resolution and can be used to profile cell-free DNA (cfDNA). In plasma, ultrashort single-stranded cfDNA (uscfDNA, ∼50 nt) has been identified together with 167 bp double-stranded mononucleosomal cell-free DNA (mncfDNA). However, the methylation profile of uscfDNA has not been described. Conventional BS-Seq workflows may not be helpful because bisulfite conversion degrades larger DNA into smaller fragments, leading to erroneous categorization as uscfDNA. We describe the '5mCAdpBS-Seq' workflow in which pre-methylated 5mC (5-methylcytosine) single-stranded adapters are ligated to heat-denatured cfDNA before bisulfite conversion. This method retains only DNA fragments that are unaltered by bisulfite treatment, resulting in less biased uscfDNA methylation analysis. Using 5mCAdpBS-Seq, uscfDNA had lower levels of DNA methylation (∼15%) compared to mncfDNA and was enriched in promoters and CpG islands. Hypomethylated uscfDNA fragments were enriched in upstream transcription start sites (TSSs), and the intensity of enrichment was correlated with expressed genes of hemopoietic cells. Using tissue-of-origin deconvolution, we inferred that uscfDNA is derived primarily from eosinophils, neutrophils, and monocytes. As proof-of-principle, we show that characteristics of the methylation profile of uscfDNA can distinguish non-small cell lung carcinoma from non-cancer samples. The 5mCAdpBS-Seq workflow is recommended for any cfDNA methylation-based investigations.
- Published
- 2024
45. On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability
- Author
-
Zheng, Chenyu, Huang, Wei, Wang, Rongzhen, Wu, Guoqiang, Zhu, Jun, and Li, Chongxuan
- Subjects
Computer Science - Machine Learning ,Computer Science - Computation and Language ,Statistics - Machine Learning - Abstract
Autoregressively trained transformers have brought a profound revolution to the world, especially with their in-context learning (ICL) ability to address downstream tasks. Recently, several studies suggest that transformers learn a mesa-optimizer during autoregressive (AR) pretraining to implement ICL. Namely, the forward pass of the trained transformer is equivalent to optimizing an inner objective function in-context. However, whether the practical non-convex training dynamics will converge to the ideal mesa-optimizer is still unclear. Towards filling this gap, we investigate the non-convex dynamics of a one-layer linear causal self-attention model autoregressively trained by gradient flow, where the sequences are generated by an AR process $x_{t+1} = W x_t$. First, under a certain condition of data distribution, we prove that an autoregressively trained transformer learns $W$ by implementing one step of gradient descent to minimize an ordinary least squares (OLS) problem in-context. It then applies the learned $\widehat{W}$ for next-token prediction, thereby verifying the mesa-optimization hypothesis. Next, under the same data conditions, we explore the capability limitations of the obtained mesa-optimizer. We show that a stronger assumption related to the moments of data is the necessary and sufficient condition for the learned mesa-optimizer to recover the distribution. Besides, we conduct exploratory analyses beyond the first data condition and prove that generally, the trained transformer will not perform vanilla gradient descent for the OLS problem. Finally, our simulation results verify the theoretical results., Comment: Accepted by NeurIPS2024, 45 pages. The final version of the previous preprint
- Published
- 2024
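The one-step-of-gradient-descent mesa-optimizer described in entry 45 admits a compact numpy sketch (sequence length, dimension, and learning rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d, T = 4, 200

# A stable AR matrix: 0.9 times a random orthogonal matrix.
W = 0.9 * np.linalg.qr(rng.standard_normal((d, d)))[0]

# One context sequence generated by x_{t+1} = W x_t.
xs = [rng.standard_normal(d)]
for _ in range(T):
    xs.append(W @ xs[-1])
xs = np.asarray(xs)                      # (T + 1, d)

# One gradient-descent step, from zero, on the in-context OLS objective
#   L(M) = 1/2 * sum_t ||x_{t+1} - M x_t||^2,
# whose gradient at M = 0 is -sum_t x_{t+1} x_t^T, so the step is:
lr = 1e-3
M = lr * (xs[1:].T @ xs[:-1])            # mesa-learned estimate of W

pred = M @ xs[-1]                        # next-token prediction with W-hat

def ols_loss(M):
    """The inner OLS objective the forward pass implicitly optimizes."""
    resid = xs[1:] - xs[:-1] @ M.T
    return 0.5 * np.sum(resid ** 2)
```

A single small step already reduces the inner OLS objective relative to the zero initialization, which is the behaviour the abstract attributes to the trained one-layer linear attention model.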
46. Optimal Reference Nodes Deployment for Positioning Seafloor Anchor Nodes
- Author
-
Huang, Wei, Wu, Pengfei, Xu, Tianhe, Zhang, Hao, and Meng, Kaitao
- Subjects
Electrical Engineering and Systems Science - Signal Processing - Abstract
Seafloor anchor nodes, which form a geodetic network, are designed to provide surface and underwater users with positioning, navigation and timing (PNT) services. Due to the non-uniform distribution of underwater sound speed, accurately positioning underwater anchor nodes is a challenging task. Traditional anchor node positioning typically uses cross or circular shapes; however, how to optimize the deployment of reference nodes for positioning underwater anchor nodes, considering the variability of sound speed, has not yet been studied. This paper focuses on optimal reference node deployment strategies for time-of-arrival (TOA) localization in three-dimensional (3D) underwater space. We adopt the criterion of minimizing the trace of the inverse Fisher information matrix (FIM) to determine the optimal reference node deployment under Gaussian measurement noise, which is positively related to the signal propagation path. A comprehensive analysis of optimal reference-target geometries is provided in the general circumstance with no restriction on the number of reference nodes, elevation angle, or reference-target range. A new semi-closed-form solution is found to determine the optimal geometries. To demonstrate the findings of this paper, we conducted both simulations and sea trials on underwater anchor node positioning. Both the simulation and experiment results are consistent with the theoretical analysis.
- Published
- 2024
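A minimal numpy sketch of the FIM-trace criterion from entry 46, assuming straight-ray propagation at constant sound speed with i.i.d. Gaussian range noise (the paper additionally models sound-speed variability, which this sketch ignores):

```python
import numpy as np

def crlb_trace(target, refs, sigma=1.0):
    """Trace of the inverse FIM for 3D TOA positioning.

    For Gaussian range noise, J = (1/sigma^2) * sum_i u_i u_i^T with u_i
    the unit line-of-sight vector from reference i to the target; the
    trace of J^{-1} lower-bounds the total position error variance.
    """
    J = np.zeros((3, 3))
    for r in refs:
        u = (target - r) / np.linalg.norm(target - r)
        J += np.outer(u, u) / sigma**2
    return np.trace(np.linalg.inv(J))

anchor = np.array([0.0, 0.0, -1000.0])   # seafloor node, 1 km deep

# Surface reference nodes: a tight cross vs. a wider circle.
cross = [np.array([x, y, 0.0])
         for x, y in [(100, 0), (-100, 0), (0, 100), (0, -100)]]
circle = [np.array([800 * np.cos(a), 800 * np.sin(a), 0.0])
          for a in np.linspace(0, 2 * np.pi, 4, endpoint=False)]
```

Evaluating both geometries shows why the deployment matters: the wider circle spreads the line-of-sight directions and yields a much smaller CRLB trace than the tight cross directly above the anchor.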
47. SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
- Author
-
Huang, Wei, Qin, Haotong, Liu, Yangdong, Li, Yawei, Liu, Xianglong, Benini, Luca, Magno, Michele, and Qi, Xiaojuan
- Subjects
Computer Science - Machine Learning ,Computer Science - Computation and Language - Abstract
Large language models (LLMs) achieve remarkable performance in natural language understanding but require substantial computation and memory resources. Post-training quantization (PTQ) is a powerful compression technique extensively investigated in LLMs. However, existing PTQ methods are still not ideal in terms of accuracy and efficiency, especially below 4 bit-widths. Standard PTQ methods using group-wise quantization struggle to quantize LLMs accurately at such low bit-widths, while advanced methods that retain high-precision weights element-wise find it hard to realize their theoretical hardware efficiency. This paper presents a Salience-Driven Mixed-Precision Quantization scheme for LLMs, namely SliM-LLM. The scheme exploits the salience distribution of weights to determine optimal bit-widths and quantizers for accurate LLM quantization, while aligning the bit-width partition to groups for compact memory usage and fast integer inference. Specifically, the proposed SliM-LLM mainly relies on two novel techniques: (1) Salience-Determined Bit Allocation utilizes the clustering characteristics of the salience distribution to allocate the bit-widths of each group, increasing the accuracy of quantized LLMs and maintaining inference efficiency; (2) Salience-Weighted Quantizer Calibration optimizes the parameters of the quantizer by considering the element-wise salience within the group, balancing the maintenance of salient information and the minimization of errors. Comprehensive experiments show that SliM-LLM significantly improves the accuracy of LLMs at ultra-low bits, e.g., 2-bit LLaMA-7B achieves a 5.5-times memory saving over the original model on NVIDIA A800 GPUs, and a 48% decrease in perplexity compared to the state-of-the-art gradient-free PTQ method. Moreover, SliM-LLM+, which extends SliM-LLM with gradient-based quantizers, further reduces perplexity by 35.1%., Comment: 22 pages
- Published
- 2024
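A toy sketch of salience-driven, group-aligned bit allocation as described in entry 47, assuming mean absolute weight as the salience proxy and a simple quarter-up, quarter-down allocation rule (the paper's salience measure and allocation scheme are more sophisticated):

```python
import numpy as np

def allocate_bits(weights, group_size=4, budget_bits=3):
    """Group-wise mixed-precision allocation driven by salience.

    Salience is approximated here by the mean |w| of each group. The
    most salient quarter of groups gets budget+1 bits, the least
    salient quarter budget-1, keeping the average at budget_bits and
    the partition aligned to whole groups.
    """
    groups = weights.reshape(-1, group_size)
    salience = np.abs(groups).mean(axis=1)
    order = np.argsort(salience)
    n = len(order)
    bits = np.full(n, budget_bits)
    k = n // 4
    bits[order[:k]] = budget_bits - 1    # least salient: fewer bits
    bits[order[-k:]] = budget_bits + 1   # most salient: more bits
    return bits

def quantize_group(g, n_bits):
    """Uniform symmetric quantization of one weight group."""
    scale = np.abs(g).max() / (2 ** (n_bits - 1) - 1 + 1e-12)
    return np.round(g / scale) * scale
```

Keeping the bit-width uniform within each group is what makes the memory layout compact and integer inference fast, in contrast to element-wise mixed precision.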
48. Eidos: Efficient, Imperceptible Adversarial 3D Point Clouds
- Author
-
Zhang, Hanwei, Cheng, Luo, He, Qisong, Huang, Wei, Li, Renjue, Sicre, Ronan, Huang, Xiaowei, Hermanns, Holger, and Zhang, Lijun
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
Classification of 3D point clouds is a challenging machine learning (ML) task with important real-world applications in a spectrum from autonomous driving and robot-assisted surgery to earth observation from low orbit. As with other ML tasks, classification models are notoriously brittle in the presence of adversarial attacks. These are rooted in imperceptible changes to inputs with the effect that a seemingly well-trained model ends up misclassifying the input. This paper adds to the understanding of adversarial attacks by presenting Eidos, a framework providing Efficient Imperceptible aDversarial attacks on 3D pOint cloudS. Eidos supports a diverse set of imperceptibility metrics. It employs an iterative, two-step procedure to identify optimal adversarial examples, thereby enabling a runtime-imperceptibility trade-off. We provide empirical evidence relative to several popular 3D point cloud classification models and several established 3D attack methods, showing Eidos' superiority with respect to efficiency as well as imperceptibility., Comment: Preprint
- Published
- 2024
49. Embedding Generalized Semantic Knowledge into Few-Shot Remote Sensing Segmentation
- Author
-
Jia, Yuyu, Huang, Wei, Gao, Junyu, Wang, Qi, and Li, Qiang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Few-shot segmentation (FSS) for remote sensing (RS) imagery leverages supporting information from limited annotated samples to achieve query segmentation of novel classes. Previous efforts are dedicated to mining segmentation-guiding visual cues from a constrained set of support samples. However, they still struggle to address the pronounced intra-class differences in RS images, as sparse visual cues make it challenging to establish robust class-specific representations. In this paper, we propose a holistic semantic embedding (HSE) approach that effectively harnesses general semantic knowledge, i.e., class description (CD) embeddings. Instead of the naive combination of CD embeddings and visual features for segmentation decoding, we investigate embedding the general semantic knowledge during the feature extraction stage. Specifically, in HSE, a spatial dense interaction module allows the interaction of visual support features with CD embeddings along the spatial dimension via self-attention. Furthermore, a global content modulation module efficiently augments the global information of the target category in both support and query features, thanks to the transformative fusion of visual features and CD embeddings. These two components holistically synergize general CD embeddings and visual cues, constructing a robust class-specific representation. Through extensive experiments on the standard FSS benchmark, the proposed HSE approach demonstrates superior performance compared to peer work, setting a new state-of-the-art.
- Published
- 2024
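The spatial dense interaction between visual features and class-description (CD) embeddings in entry 49 can be sketched as scaled dot-product attention with a residual connection; this is illustrative of the module's structure, not the paper's implementation:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_dense_interaction(visual, cd_emb):
    """Attend flattened spatial features to semantic CD embeddings.

    visual: (hw, d) flattened spatial features; cd_emb: (k, d) class
    description embeddings. Each spatial location queries the semantic
    embeddings and receives a semantically enriched residual, injecting
    general knowledge during feature extraction rather than decoding.
    """
    d = visual.shape[1]
    attn = softmax(visual @ cd_emb.T / np.sqrt(d))   # (hw, k)
    return visual + attn @ cd_emb                    # residual injection
```

The residual form means that uninformative CD embeddings leave the visual features unchanged, while informative ones shift each location toward its class semantics.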
50. Epanechnikov Variational Autoencoder
- Author
-
Qin, Tian and Huang, Wei-Min
- Subjects
Statistics - Machine Learning ,Computer Science - Machine Learning - Abstract
In this paper, we bridge Variational Autoencoders (VAEs) [17] and kernel density estimations (KDEs) [25],[23] by approximating the posterior by KDEs and deriving an upper bound of the Kullback-Leibler (KL) divergence in the evidence lower bound (ELBO). The flexibility of KDEs makes the optimization of posteriors in VAEs possible, which not only addresses the limitations of the Gaussian latent space in the vanilla VAE but also provides a new perspective on estimating the KL-divergence in the ELBO. Under appropriate conditions [9],[3], we show that the Epanechnikov kernel is the optimal choice in minimizing the derived upper bound of the KL-divergence asymptotically. Compared with the Gaussian kernel, the Epanechnikov kernel has compact support, which should make the generated samples less noisy and blurry. The implementation of the Epanechnikov kernel in the ELBO is straightforward as it lies in the "location-scale" family of distributions, where the reparametrization trick can be directly employed. A series of experiments on benchmark datasets such as MNIST, Fashion-MNIST, CIFAR-10 and CelebA further demonstrate the superiority of the Epanechnikov Variational Autoencoder (EVAE) over the vanilla VAE in the quality of reconstructed images, as measured by the FID score and Sharpness [27].
- Published
- 2024
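Because the Epanechnikov distribution lies in the location-scale family, the reparametrization trick mentioned in entry 50 applies directly. One convenient sampler uses the classical fact that the median of three independent Uniform(-1, 1) draws is Epanechnikov-distributed (a sketch, not the authors' code):

```python
import numpy as np

def sample_epanechnikov(shape, rng):
    """Draw eps ~ Epanechnikov on [-1, 1], density 3/4 * (1 - x^2).

    The median of three i.i.d. Uniform(-1, 1) variables has exactly
    this density: 6 F(x)(1 - F(x)) f(x) = 3/4 * (1 - x^2).
    """
    u = rng.uniform(-1.0, 1.0, size=(3,) + tuple(shape))
    return np.median(u, axis=0)

def reparameterize(mu, sigma, rng):
    """Location-scale reparametrization trick, z = mu + sigma * eps."""
    eps = sample_epanechnikov(mu.shape, rng)
    return mu + sigma * eps

rng = np.random.default_rng(0)
z = reparameterize(np.zeros(100000), np.ones(100000), rng)
```

The compact support is visible immediately: with mu = 0 and sigma = 1 every sample lies in [-1, 1], unlike the unbounded Gaussian latent draws of the vanilla VAE.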