Author: "Zhang, Wei" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Zhang, Wei"' showing total 172,331 results


1. Thus Speaks Mr. Nobody: Brecht's Stories of Mr. Keuner through the Lens of Classical Chinese Dialectics

Author: Zhang, Wei
Published: 2023
Full Text: View/download PDF

2. Culture Change in Long-Term Care-Post COVID-19: Adapting to a New Reality Using Established Ideas and Systems

Author: Iyamu, Ihoghosa, Plottel, Louis, Snow, M. Elizabeth, Zhang, Wei, Havaei, Farinaz, Puyat, Joseph, Sawatzky, Richard, and Salmon, Amy
Published: 2023

3. Deforestation and Smallholder Income: Evidence from Remittances to Nepal

Author: Li, Man, Zhang, Wei, Guo, Zhe, and Bhandary, Prapti
Published: 2022

4. The Poet as Educator in the Works and Days

Author: Zhang, Wei
Published: 2021
Full Text: View/download PDF

5. The Kindness of Strangers: Tennessee Williams's A Streetcar Named Desire on the Chinese Stage

Author: Zhang, Wei and Qi, Shouha
Published: 2021
Full Text: View/download PDF

6. Generative Learning Powered Probing Beam Optimization for Cell-Free Hybrid Beamforming

Author: Zhang, Cheng, Xiong, Shuangbo, He, Mengqing, Wei, Lan, Huang, Yongming, and Zhang, Wei
Subjects: Computer Science - Information Theory, Electrical Engineering and Systems Science - Signal Processing
Abstract: Probing beam measurement (PBM)-based hybrid beamforming provides a feasible solution for cell-free MIMO. In this letter, we propose a novel probing beam optimization framework where three collaborative modules respectively realize PBM augmentation, sum-rate prediction and probing beam optimization. Specifically, the PBM augmentation model integrates the conditional variational auto-encoder (CVAE) and mixture density networks and adopts correlated PBM distribution with full-covariance, for which a Cholesky-decomposition based training is introduced to address the issues of covariance legality and numerical stability. Simulations verify the better performance of the proposed augmentation model compared to the traditional CVAE and the efficiency of proposed optimization framework.
Published: 2024

7. FreeAvatar: Robust 3D Facial Animation Transfer by Learning an Expression Foundation Model

Author: Qiu, Feng, Zhang, Wei, Liu, Chen, An, Rudong, Li, Lincheng, Ding, Yu, Fan, Changjie, Hu, Zhipeng, and Yu, Xin
Subjects: Computer Science - Graphics, Computer Science - Artificial Intelligence
Abstract: Video-driven 3D facial animation transfer aims to drive avatars to reproduce the expressions of actors. Existing methods have achieved remarkable results by constraining both geometric and perceptual consistency. However, geometric constraints (like those designed on facial landmarks) are insufficient to capture subtle emotions, while expression features trained on classification tasks lack fine granularity for complex emotions. To address this, we propose FreeAvatar, a robust facial animation transfer method that relies solely on our learned expression representation. Specifically, FreeAvatar consists of two main components: the expression foundation model and the facial animation transfer model. In the first component, we initially construct a facial feature space through a face reconstruction task and then optimize the expression feature space by exploring the similarities among different expressions. Benefiting from training on the amounts of unlabeled facial images and re-collected expression comparison dataset, our model adapts freely and effectively to any in-the-wild input facial images. In the facial animation transfer component, we propose a novel Expression-driven Multi-avatar Animator, which first maps expressive semantics to the facial control parameters of 3D avatars and then imposes perceptual constraints between the input and output images to maintain expression consistency. To make the entire process differentiable, we employ a trained neural renderer to translate rig parameters into corresponding images. Furthermore, unlike previous methods that require separate decoders for each avatar, we propose a dynamic identity injection module that allows for the joint training of multiple avatars within a single network.
Comment: 11 pages, 11 figures
Published: 2024
Full Text: View/download PDF

8. Atmospheric Turbulence-Immune Free Space Optical Communication System based on Discrete-Time Analog Transmission

Author: Huang, Hongyu, Yu, Zhenming, Lei, Yi, Zhang, Wei, Zhao, Yongli, Huang, Shanguo, and Xu, Kun
Subjects: Electrical Engineering and Systems Science - Signal Processing
Abstract: To effectively mitigate the influence of atmospheric turbulence, a novel discrete-time analog transmission free-space optical (DTAT-FSO) communication scheme is proposed. It directly maps information sources to discrete-time analog symbols via joint source-channel coding and modulation. Differently from traditional digital free space optical (TD-FSO) schemes, the proposed DTAT-FSO approach can automatically adapt to the variation of the channel state, with no need to adjust the specific modulation and coding scheme. The performance of the DTAT-FSO system was evaluated in both intensity modulation/direct detection (IM/DD) and coherent FSO systems for high-resolution image transmission. The results show that the DTAT-FSO reliably transmits images at low received optical powers (ROPs) and automatically enhances quality at high ROPs, while the TD-FSO experiences cliff and leveling effects when the channel state varies. With respect to the TD-FSO scheme, the DTAT-FSO scheme improved receiver sensitivity by 2.5 dB in the IM/DD FSO system and 0.8 dB in the coherent FSO system, and it achieved superior image fidelity under the same ROP. The automatic adaptation feature and improved performance of the DTAT-FSO suggest its potential for terrestrial, airborne, and satellite optical networks, addressing challenges posed by atmospheric turbulence.
Published: 2024

9. Matrix Profile for Anomaly Detection on Multidimensional Time Series

Author: Yeh, Chin-Chia Michael, Der, Audrey, Saini, Uday Singh, Lai, Vivian, Zheng, Yan, Wang, Junpeng, Dai, Xin, Zhuang, Zhongfang, Fan, Yujie, Chen, Huiyuan, Aboagye, Prince Osei, Wang, Liang, Zhang, Wei, and Keogh, Eamonn
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Databases
Abstract: The Matrix Profile (MP), a versatile tool for time series data mining, has been shown effective in time series anomaly detection (TSAD). This paper delves into the problem of anomaly detection in multidimensional time series, a common occurrence in real-world applications. For instance, in a manufacturing factory, multiple sensors installed across the site collect time-varying data for analysis. The Matrix Profile, named for its role in profiling the matrix storing pairwise distance between subsequences of univariate time series, becomes complex in multidimensional scenarios. If the input univariate time series has n subsequences, the pairwise distance matrix is an n x n matrix. In a multidimensional time series with d dimensions, the pairwise distance information must be stored in an n x n x d tensor. In this paper, we first analyze different strategies for condensing this tensor into a profile vector. We then investigate the potential of extending the MP to efficiently find k-nearest neighbors for anomaly detection. Finally, we benchmark the multidimensional MP against 19 baseline methods on 119 multidimensional TSAD datasets. The experiments cover three learning setups: unsupervised, supervised, and semi-supervised. MP is the only method that consistently delivers high performance across all setups.
Published: 2024

10. Bayesian Dynamic Factor Models for High-dimensional Matrix-valued Time Series

Author: Zhang, Wei
Subjects: Economics - Econometrics
Abstract: High-dimensional matrix-valued time series are of significant interest in economics and finance, with prominent examples including cross region macroeconomic panels and firms' financial data panels. We introduce a class of Bayesian matrix dynamic factor models that utilize matrix structures to identify more interpretable factor patterns and factor impacts. Our model accommodates time-varying volatility, adjusts for outliers, and allows cross-sectional correlations in the idiosyncratic components. To determine the dimension of the factor matrix, we employ an importance-sampling estimator based on the cross-entropy method to estimate marginal likelihoods. Through a series of Monte Carlo experiments, we show the properties of the factor estimators and the performance of the marginal likelihood estimator in correctly identifying the true dimensions of the factor matrices. Applying our model to a macroeconomic dataset and a financial dataset, we demonstrate its ability in unveiling interesting features within matrix-valued time series.
Published: 2024

11. Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models

Author: Zheng, Xinhu, Jiang, Anbai, Han, Bing, Qian, Yanmin, Fan, Pingyi, Liu, Jia, and Zhang, Wei-Qiang
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Anomalous Sound Detection (ASD) has gained significant interest through the application of various Artificial Intelligence (AI) technologies in industrial settings. Though possessing great potential, ASD systems can hardly be readily deployed in real production sites due to the generalization problem, which is primarily caused by the difficulty of data collection and the complexity of environmental factors. This paper introduces a robust ASD model that leverages audio pre-trained models. Specifically, we fine-tune these models using machine operation data, employing SpecAug as a data augmentation strategy. Additionally, we investigate the impact of utilizing Low-Rank Adaptation (LoRA) tuning instead of full fine-tuning to address the problem of limited data for fine-tuning. Our experiments on the DCASE2023 Task 2 dataset establish a new benchmark of 77.75% on the evaluation set, with a significant improvement of 6.48% compared with previous state-of-the-art (SOTA) models, including top-tier traditional convolutional networks and speech pre-trained models, which demonstrates the effectiveness of audio pre-trained models with LoRA tuning. Ablation studies are also conducted to showcase the efficacy of the proposed scheme.
Published: 2024

12. E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning

Author: Liao, Zihan, Wang, Jun, Yu, Hang, Wei, Lingxiao, Li, Jianguo, and Zhang, Wei
Subjects: Computer Science - Computation and Language
Abstract: In the realm of Large Language Models (LLMs), the ability to process long contexts is increasingly crucial for tasks such as multi-round dialogues, code generation, and document summarization. This paper addresses the challenges of enhancing the long-context performance, reducing computational complexity, and leveraging pretrained models, collectively termed the "impossible triangle." We introduce E2LLM (Encoder Elongated Large Language Models), a novel approach that effectively navigates this paradox. The method involves splitting long contexts into chunks, compressing each into embedding vectors via a pretrained text encoder, and utilizing an adapter to align these representations with a decoder-only LLM. Two training objectives, focusing on reconstruction of the encoder output and long-context instruction fine-tuning, are employed to facilitate the understanding of soft prompts by the LLM. Experimental results demonstrate that E2LLM achieves superior performance in long-context scenarios while balancing efficiency, performance, and compatibility with pretrained models. Our framework thus represents a significant advancement in the field, contributing to effective long-text modeling.
Comment: 12 pages, 4 figures
Published: 2024

13. LSVOS Challenge Report: Large-scale Complex and Long Video Object Segmentation

Author: Ding, Henghui, Hong, Lingyi, Liu, Chang, Xu, Ning, Yang, Linjie, Fan, Yuchen, Miao, Deshui, Gu, Yameng, Li, Xin, He, Zhenyu, Wang, Yaowei, Yang, Ming-Hsuan, Chai, Jinming, Ma, Qin, Zhang, Junpei, Jiao, Licheng, Liu, Fang, Liu, Xinyu, Zhang, Jing, Zhang, Kexin, Liu, Xu, Li, LingLing, Fang, Hao, Pan, Feiyu, Lu, Xiankai, Zhang, Wei, Cong, Runmin, Tran, Tuyen, Cao, Bin, Zhang, Yisi, Wang, Hanyi, He, Xingjian, and Liu, Jing
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Despite the promising performance of current video segmentation models on existing benchmarks, these models still struggle with complex scenes. In this paper, we introduce the 6th Large-scale Video Object Segmentation (LSVOS) challenge in conjunction with ECCV 2024 workshop. This year's challenge includes two tasks: Video Object Segmentation (VOS) and Referring Video Object Segmentation (RVOS). In this year, we replace the classic YouTube-VOS and YouTube-RVOS benchmark with latest datasets MOSE, LVOS, and MeViS to assess VOS under more challenging complex environments. This year's challenge attracted 129 registered teams from more than 20 institutes across over 8 countries. This report includes the challenge and dataset introduction, and the methods used by top 7 teams in two tracks. More details can be found in our homepage https://lsvos.github.io/.
Comment: ECCV 2024 LSVOS Challenge Report: https://lsvos.github.io/
Published: 2024

14. Spatial Interference Detection in Treatment Effect Model

Author: Zhang, Wei, Yao, Fang, and Yang, Ying
Subjects: Statistics - Methodology
Abstract: Modeling the interference effect is an important issue in the field of causal inference. Existing studies rely on explicit and often homogeneous assumptions regarding interference structures. In this paper, we introduce a low-rank and sparse treatment effect model that leverages data-driven techniques to identify the locations of interference effects. A profiling algorithm is proposed to estimate the model coefficients, and based on these estimates, global test and local detection methods are established to detect the existence of interference and the interference neighbor locations for each unit. We derive the non-asymptotic bound of the estimation error, and establish theoretical guarantees for the global test and the accuracy of the detection method in terms of Jaccard index. Simulations and real data examples are provided to demonstrate the usefulness of the proposed method.
Published: 2024

15. Preserving Individuality while Following the Crowd: Understanding the Role of User Taste and Crowd Wisdom in Online Product Rating Prediction

Author: Wang, Liang, Jain, Shubham, Dou, Yingtong, Wang, Junpeng, Yeh, Chin-Chia Michael, Fan, Yujie, Aboagye, Prince, Zheng, Yan, Dai, Xin, Zhuang, Zhongfang, Saini, Uday Singh, and Zhang, Wei
Subjects: Computer Science - Social and Information Networks, Computer Science - Information Retrieval
Abstract: Numerous algorithms have been developed for online product rating prediction, but the specific influence of user and product information in determining the final prediction score remains largely unexplored. Existing research often relies on narrowly defined data settings, which overlooks real-world challenges such as the cold-start problem, cross-category information utilization, and scalability and deployment issues. To delve deeper into these aspects, and particularly to uncover the roles of individual user taste and collective wisdom, we propose a unique and practical approach that emphasizes historical ratings at both the user and product levels, encapsulated using a continuously updated dynamic tree representation. This representation effectively captures the temporal dynamics of users and products, leverages user information across product categories, and provides a natural solution to the cold-start problem. Furthermore, we have developed an efficient data processing strategy that makes this approach highly scalable and easily deployable. Comprehensive experiments in real industry settings demonstrate the effectiveness of our approach. Notably, our findings reveal that individual taste dominates over collective wisdom in online product rating prediction, a perspective that contrasts with the commonly observed wisdom of the crowd phenomenon in other domains. This dominance of individual user taste is consistent across various model types, including the boosting tree model, recurrent neural network (RNN), and transformer-based architectures. This observation holds true across the overall population, within individual product categories, and in cold-start scenarios. Our findings underscore the significance of individual user tastes in the context of online product rating prediction and the robustness of our approach across different model architectures.
Comment: Preprint
Published: 2024

16. Cycle Pixel Difference Network for Crisp Edge Detection

Author: Liu, Changsong, Zhang, Wei, Liu, Yanyan, Li, Mingyang, Li, Wenlin, Fan, Yimeng, Bai, Xiangnan, and Zhang, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Edge detection, as a fundamental task in computer vision, has garnered increasing attention. The advent of deep learning has significantly advanced this field. However, recent deep learning-based methods which rely on large-scale pre-trained weights cannot be trained from scratch, with very limited research addressing this issue. This paper proposes a novel cycle pixel difference convolution (CPDC), which effectively integrates image gradient information with modern convolution operations. Based on the CPDC, we develop a U-shape encoder-decoder model named CPD-Net, which is a purely end-to-end network. Additionally, to address the issue of edge thickness produced by most existing methods, we construct a multi-scale information enhancement module (MSEM) to enhance the discriminative ability of the model, thereby generating crisp and clean contour maps. Comprehensive experiments conducted on three standard benchmarks demonstrate that our method achieves competitive performance on the BSDS500 dataset (ODS=0.813), NYUD-V2 (ODS=0.760), and BIPED dataset (ODS=0.898). Our approach provides a novel perspective for addressing these challenges in edge detection.
Published: 2024

17. The pathway to chirality in elemental tellurium

Author: Zhou, Yuxing, Elliott, Stephen R., Toit, Daniel F. Thomas du, Zhang, Wei, and Deringer, Volker L.
Subjects: Condensed Matter - Materials Science, Condensed Matter - Mesoscale and Nanoscale Physics
Abstract: Chiral crystals, like chiral molecules, cannot be superimposed onto their mirror images -- a fundamental property that has been linked to interesting physical behavior and exploited in functional devices. Among the simplest inorganic systems with crystallographic chirality, elemental tellurium adopts structures with right- or left-handed chains. However, understanding the formation mechanisms of those structures has been difficult due to the rapid crystallization of Te, which reaches the spatial and temporal resolution limits of even the most advanced experiments. Here, we report ultra-large-scale, quantum-mechanically accurate simulations that reveal mechanisms of crystallization and the origin of crystallographic chirality in solid Te. We identify a characteristic, disordered cube-like structural motif -- a transient bonding environment with only nanosecond lifetime -- that enables the rapid crystallization of Te and mediates chirality transfer. Based on the resulting microscopic understanding, we are able to explain the switching mechanism of Te-based electrical devices.
Published: 2024

18. Boosting Certified Robustness for Time Series Classification with Efficient Self-Ensemble

Author: Dong, Chang, Li, Zhengyang, Zheng, Liangwei, Chen, Weitong, and Zhang, Wei Emma
Subjects: Computer Science - Machine Learning, Computer Science - Cryptography and Security, Statistics - Machine Learning, H.3.3
Abstract: Recently, the issue of adversarial robustness in the time series domain has garnered significant attention. However, the available defense mechanisms remain limited, with adversarial training being the predominant approach, though it does not provide theoretical guarantees. Randomized Smoothing has emerged as a standout method due to its ability to certify a provable lower bound on robustness radius under $\ell_p$-ball attacks. Recognizing its success, research in the time series domain has started focusing on these aspects. However, existing research predominantly focuses on time series forecasting, or under the non-$\ell_p$ robustness in statistic feature augmentation for time series classification (TSC). Our review found that Randomized Smoothing performs modestly in TSC, struggling to provide effective assurances on datasets with poor robustness. Therefore, we propose a self-ensemble method to enhance the lower bound of the probability confidence of predicted labels by reducing the variance of classification margins, thereby certifying a larger radius. This approach also addresses the computational overhead issue of Deep Ensemble (DE) while remaining competitive and, in some cases, outperforming it in terms of robustness. Both theoretical analysis and experimental results validate the effectiveness of our method, demonstrating superior performance in robustness testing compared to baseline approaches.
Comment: 6 figures, 4 tables, 10 pages
Published: 2024
Full Text: View/download PDF

19. Interpreting and Improving Large Language Models in Arithmetic Calculation

Author: Zhang, Wei, Wan, Chaoqun, Zhang, Yonggang, Cheung, Yiu-ming, Tian, Xinmei, Shen, Xu, and Ye, Jieping
Subjects: Computer Science - Computation and Language
Abstract: Large language models (LLMs) have demonstrated remarkable potential across numerous applications and have shown an emergent ability to tackle complex reasoning tasks, such as mathematical computations. However, even for the simplest arithmetic calculations, the intrinsic mechanisms behind LLMs remain mysterious, making it challenging to ensure reliability. In this work, we delve into uncovering a specific mechanism by which LLMs execute calculations. Through comprehensive experiments, we find that LLMs frequently involve a small fraction (< 5%) of attention heads, which play a pivotal role in focusing on operands and operators during calculation processes. Subsequently, the information from these operands is processed through multi-layer perceptrons (MLPs), progressively leading to the final solution. These pivotal heads/MLPs, though identified on a specific dataset, exhibit transferability across different datasets and even distinct tasks. This insight prompted us to investigate the potential benefits of selectively fine-tuning these essential heads/MLPs to boost the LLMs' computational performance. We empirically find that such precise tuning can yield notable enhancements on mathematical prowess, without compromising the performance on non-mathematical tasks. Our work serves as a preliminary exploration into the arithmetic calculation abilities inherent in LLMs, laying a solid foundation to reveal more intricate mathematical tasks.
Comment: Accepted by ICML 2024 (oral)
Published: 2024

20. Bayesian Dynamic Generalized Additive Model for Mortality during COVID-19 Pandemic

Author: Zhang, Wei, Mira, Antonietta, and Wit, Ernst C.
Subjects: Statistics - Applications
Abstract: While COVID-19 has resulted in a significant increase in global mortality rates, the impact of the pandemic on mortality from other causes remains uncertain. To gain insight into the broader effects of COVID-19 on various causes of death, we analyze an Italian dataset that includes monthly mortality counts for different causes from January 2015 to December 2020. Our approach involves a generalized additive model enhanced with correlated random effects. The generalized additive model component effectively captures non-linear relationships between various covariates and mortality rates, while the random effects are multivariate time series observations recorded in various locations, and they embody information on the dependence structure present among geographical locations and different causes of mortality. Adopting a Bayesian framework, we impose suitable priors on the model parameters. For efficient posterior computation, we employ variational inference, specifically for fixed effect coefficients and random effects, Gaussian variational approximation is assumed, which streamlines the analysis process. The optimisation is performed using a coordinate ascent variational inference algorithm and several computational strategies are implemented along the way to address the issues arising from the high dimensional nature of the data, providing accelerated and stabilised parameter estimation and statistical inference.
Published: 2024

21. MOOSS: Mask-Enhanced Temporal Contrastive Learning for Smooth State Evolution in Visual Reinforcement Learning

Author: Sun, Jiarui, Akcal, M. Ugur, Zhang, Wei, and Chowdhary, Girish
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: In visual Reinforcement Learning (RL), learning from pixel-based observations poses significant challenges on sample efficiency, primarily due to the complexity of extracting informative state representations from high-dimensional data. Previous methods such as contrastive-based approaches have made strides in improving sample efficiency but fall short in modeling the nuanced evolution of states. To address this, we introduce MOOSS, a novel framework that leverages a temporal contrastive objective with the help of graph-based spatial-temporal masking to explicitly model state evolution in visual RL. Specifically, we propose a self-supervised dual-component strategy that integrates (1) a graph construction of pixel-based observations for spatial-temporal masking, coupled with (2) a multi-level contrastive learning mechanism that enriches state representations by emphasizing temporal continuity and change of states. MOOSS advances the understanding of state dynamics by disrupting and learning from spatial-temporal correlations, which facilitates policy learning. Our comprehensive evaluation on multiple continuous and discrete control benchmarks shows that MOOSS outperforms previous state-of-the-art visual RL methods in terms of sample efficiency, demonstrating the effectiveness of our method. Our code is released at https://github.com/jsun57/MOOSS.
Comment: WACV 2025
Published: 2024

22. Searching for MeV-scale Axion-like Particles and Dark Photons with PandaX-4T

Author: PandaX Collaboration, Li, Tao, Bo, Zihao, Chen, Wei, Chen, Xun, Chen, Yunhua, Cheng, Zhaokan, Cui, Xiangyi, Fan, Yingjie, Fang, Deqing, Gao, Zhixing, Geng, Lisheng, Giboni, Karl, Guo, Xunan, Guo, Xuyuan, Guo, Zichao, Han, Chencheng, Han, Ke, He, Changda, He, Jinrong, Huang, Di, Huang, Houqi, Huang, Junting, Hou, Ruquan, Hou, Yu, Ji, Xiangdong, Ji, Xiangpan, Ju, Yonglin, Li, Chenxiang, Li, Jiafu, Li, Mingchuan, Li, Shuaijie, Li, Zhiyuan, Lin, Qing, Liu, Jianglai, Lu, Congcong, Lu, Xiaoying, Luo, Lingyin, Luo, Yunyang, Ma, Wenbo, Ma, Yugang, Mao, Yajun, Meng, Yue, Ning, Xuyang, Pang, Binyu, Qi, Ningchun, Qian, Zhicheng, Ren, Xiangxiang, Shan, Dong, Shang, Xiaofeng, Shao, Xiyuan, Shen, Guofang, Shen, Manbin, Sun, Wenliang, Tao, Yi, Wang, Anqing, Wang, Guanbo, Wang, Hao, Wang, Jiamin, Wang, Lei, Wang, Meng, Wang, Qiuhong, Wang, Shaobo, Wang, Siguang, Wang, Wei, Wang, Xiuli, Wang, Xu, Wang, Zhou, Wei, Yuehuan, Wu, Weihao, Wu, Yuan, Xiao, Mengjiao, Xiao, Xiang, Xiong, Kaizhi, Xu, Yifan, Yao, Shunyu, Yan, Binbin, Yan, Xiyu, Yang, Yong, Ye, Peihua, Yu, Chunxu, Yuan, Ying, Yuan, Zhe, Yun, Youhui, Zeng, Xinning, Zhang, Minzhen, Zhang, Peng, Zhang, Shibo, Zhang, Shu, Zhang, Tao, Zhang, Wei, Zhang, Yang, Zhang, Yingxin, Zhang, Yuanyuan, Zhao, Li, Zhou, Jifang, Zhou, Jiaxu, Zhou, Jiayi, Zhou, Ning, Zhou, Xiaopeng, Zhou, Yubo, and Zhou, Zhizhen
Subjects: High Energy Physics - Experiment
Abstract: Axion-like particles (ALPs) and dark photons (DPs) are viable dark matter particle candidates. We have searched for possible ALP/DP signals in the PandaX-4T liquid xenon detector using 94.8 days of data. A binned likelihood fit is constructed to search for possible mono-energetic peaks induced by the absorption processes between ALPs/DPs and atomic electrons of xenon. A detailed temporal model of decays associated with xenon isotopes is introduced to constrain the number of background events. No signal excess over background expectations is observed, and we have established the most stringent exclusion limits for most ALP/DP masses ranging from 150 keV/$c^2$ to 1 MeV/$c^2$.
Published: 2024

23. Are LLM-based Recommenders Already the Best? Simple Scaled Cross-entropy Unleashes the Potential of Traditional Sequential Recommenders

Author: Xu, Cong, Zhu, Zhangchi, Yu, Mo, Wang, Jun, Wang, Jianyong, and Zhang, Wei
Subjects: Computer Science - Information Retrieval
Abstract: Large language models (LLMs) have been garnering increasing attention in the recommendation community. Some studies have observed that LLMs, when fine-tuned by the cross-entropy (CE) loss with a full softmax, could achieve 'state-of-the-art' performance in sequential recommendation. However, most of the baselines used for comparison are trained using a pointwise/pairwise loss function. This inconsistent experimental setting leads to the underestimation of traditional methods and further fosters over-confidence in the ranking capability of LLMs. In this study, we provide theoretical justification for the superiority of the cross-entropy loss by demonstrating its two desirable properties: tightness and coverage. Furthermore, this study sheds light on additional novel insights: 1) Taking into account only the recommendation performance, CE is not yet optimal as it is not a quite tight bound in terms of some ranking metrics. 2) In scenarios that full softmax cannot be performed, an effective alternative is to scale up the sampled normalizing term. These findings then help unleash the potential of traditional recommendation models, allowing them to surpass LLM-based counterparts. Given the substantial computational burden, existing LLM-based methods are not as effective as claimed for sequential recommendation. We hope that these theoretical understandings in conjunction with the empirical results will facilitate an objective evaluation of LLM-based recommendation in the future.
Comment: 18 pages. arXiv admin note: substantial text overlap with arXiv:2402.06216
Published: 2024

24. CoopASD: Cooperative Machine Anomalous Sound Detection with Privacy Concerns

Author: Jiang, Anbai, Shi, Yuchen, Fan, Pingyi, Zhang, Wei-Qiang, and Liu, Jia
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Computer Science - Distributed, Parallel, and Cluster Computing, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Machine anomalous sound detection (ASD) has emerged as one of the most promising applications in the Industrial Internet of Things (IIoT) due to its unprecedented efficacy in mitigating risks of malfunctions and promoting production efficiency. Previous works mainly investigated the machine ASD task under centralized settings. However, developing the ASD system under decentralized settings is crucial in practice, since the machine data are dispersed in various factories and the data should not be explicitly shared due to privacy concerns. To enable these factories to cooperatively develop a scalable ASD model while preserving their privacy, we propose a novel framework named CoopASD, where each factory trains an ASD model on its local dataset, and a central server aggregates these local models periodically. We employ a pre-trained model as the backbone of the ASD model to improve its robustness and develop specialized techniques to stabilize the model under a completely non-iid and domain shift setting. Compared with previous state-of-the-art (SOTA) models trained in centralized settings, CoopASD showcases competitive results with negligible degradation of 0.08%. We also conduct extensive ablation studies to demonstrate the effectiveness of CoopASD.
Comment: Accepted by GLOBECOM 2024
Published: 2024
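
Note: the abstract above describes factories training local models that a central server aggregates periodically. The sketch below shows one common aggregation rule, a sample-weighted parameter average; this is an assumption for illustration, since the abstract does not state which rule CoopASD uses, and `aggregate` is a hypothetical name.

```python
def aggregate(local_models, num_samples):
    """Average parameters across clients, weighted by local dataset size.

    local_models: list of dicts mapping a parameter name to a list of floats.
    num_samples:  list of local dataset sizes, one per client.
    """
    total = sum(num_samples)
    merged = {}
    for name in local_models[0]:
        length = len(local_models[0][name])
        merged[name] = [
            sum(model[name][i] * n for model, n in zip(local_models, num_samples)) / total
            for i in range(length)
        ]
    return merged


if __name__ == "__main__":
    factory_a = {"w": [1.0, 2.0]}
    factory_b = {"w": [3.0, 6.0]}
    # Factory B holds three times as much data, so it dominates the average.
    print(aggregate([factory_a, factory_b], [1, 3]))
```

Only parameters leave each factory in this scheme; the raw audio stays local, which matches the privacy constraint the abstract describes.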

25. When Diffusion MRI Meets Diffusion Model: A Novel Deep Generative Model for Diffusion MRI Generation

Author: Zhu, Xi, Zhang, Wei, Li, Yijie, O'Donnell, Lauren J., and Zhang, Fan
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Diffusion MRI (dMRI) is an advanced imaging technique characterizing tissue microstructure and white matter structural connectivity of the human brain. The demand for high-quality dMRI data is growing, driven by the need for better resolution and improved tissue contrast. However, acquiring high-quality dMRI data is expensive and time-consuming. In this context, deep generative modeling emerges as a promising solution to enhance image quality while minimizing acquisition costs and scanning time. In this study, we propose a novel generative approach to perform dMRI generation using deep diffusion models. It can generate high dimension (4D) and high resolution data preserving the gradients information and brain structure. We demonstrated our method through an image mapping task aimed at enhancing the quality of dMRI images from 3T to 7T. Our approach demonstrates highly enhanced performance in generating dMRI images when compared to the current state-of-the-art (SOTA) methods. This achievement underscores a substantial progression in enhancing dMRI quality, highlighting the potential of our novel generative approach to revolutionize dMRI imaging standards., Comment: 11 pages, 3 figures
Published: 2024

26. Where to Fetch: Extracting Visual Scene Representation from Large Pre-Trained Models for Robotic Goal Navigation

Author: Li, Yu, Li, Dayou, Zhao, Chenkun, Wang, Ruifeng, Song, Ran, and Zhang, Wei
Subjects: Computer Science - Robotics
Abstract: To complete a complex task where a robot navigates to a goal object and fetches it, the robot needs to have a good understanding of the instructions and the surrounding environment. Large pre-trained models have shown capabilities to interpret tasks defined via language descriptions. However, previous methods attempting to integrate large pre-trained models with daily tasks are not competent in many robotic goal navigation tasks due to poor understanding of the environment. In this work, we present a visual scene representation built with large-scale visual language models to form a feature representation of the environment capable of handling natural language queries. Combined with large language models, this method can parse language instructions into action sequences for a robot to follow, and accomplish goal navigation with querying the scene representation. Experiments demonstrate that our method enables the robot to follow a wide range of instructions and complete complex goal navigation tasks.
Published: 2024

27. MPGNet: Learning Move-Push-Grasping Synergy for Target-Oriented Grasping in Occluded Scenes

Author: Li, Dayou, Zhao, Chenkun, Yang, Shuo, Song, Ran, Li, Xiaolei, and Zhang, Wei
Subjects: Computer Science - Robotics
Abstract: This paper focuses on target-oriented grasping in occluded scenes, where the target object is specified by a binary mask and the goal is to grasp the target object with as few robotic manipulations as possible. Most existing methods rely on a push-grasping synergy to complete this task. To deliver a more powerful target-oriented grasping pipeline, we present MPGNet, a three-branch network for learning a synergy between moving, pushing, and grasping actions. We also propose a multi-stage training strategy to train the MPGNet which contains three policy networks corresponding to the three actions. The effectiveness of our method is demonstrated via both simulated and real-world experiments., Comment: Accepted to IROS 2024
Published: 2024

28. Coarse-to-Fine Detection of Multiple Seams for Robotic Welding

Author: Wei, Pengkun, Cheng, Shuo, Li, Dayou, Song, Ran, Zhang, Yipeng, and Zhang, Wei
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Efficiently detecting target weld seams while ensuring sub-millimeter accuracy has always been an important challenge in autonomous welding, which has significant application in industrial practice. Previous works mostly focused on recognizing and localizing welding seams one by one, leading to inferior efficiency in modeling the workpiece. This paper proposes a novel framework capable of multiple weld seams extraction using both RGB images and 3D point clouds. The RGB image is used to obtain the region of interest by approximately localizing the weld seams, and the point cloud is used to achieve the fine-edge extraction of the weld seams within the region of interest using region growth. Our method is further accelerated by using a pre-trained deep learning model to ensure both efficiency and generalization ability. The performance of the proposed method has been comprehensively tested on various workpieces featuring both linear and curved weld seams and in physical experiment systems. The results showcase considerable potential for real-world industrial applications, emphasizing the method's efficiency and effectiveness. Videos of the real-world experiments can be found at https://youtu.be/pq162HSP2D4.
Published: 2024

29. Learning Instruction-Guided Manipulation Affordance via Large Models for Embodied Robotic Tasks

Author: Li, Dayou, Zhao, Chenkun, Yang, Shuo, Ma, Lin, Li, Yibin, and Zhang, Wei
Subjects: Computer Science - Robotics
Abstract: We study the task of language instruction-guided robotic manipulation, in which an embodied robot is supposed to manipulate the target objects based on the language instructions. In previous studies, the predicted manipulation regions of the target object typically do not change with specification from the language instructions, which means that the language perception and manipulation prediction are separate. However, in human behavioral patterns, the manipulation regions of the same object will change for different language instructions. In this paper, we propose Instruction-Guided Affordance Net (IGANet) for predicting affordance maps of instruction-guided robotic manipulation tasks by utilizing powerful priors from vision and language encoders pre-trained on large-scale datasets. We develop a Vision-Language-Models (VLMs)-based data augmentation pipeline, which can generate a large amount of data automatically for model training. Besides, with the help of Large-Language-Models (LLMs), actions can be effectively executed to finish the tasks defined by instructions. A series of real-world experiments revealed that our method can achieve better performance with generated data. Moreover, our model can generalize better to scenarios with unseen objects and language instructions., Comment: Accepted to ICARM 2024
Published: 2024

30. Global well-posedness of the free boundary problem for incompressible viscous resistive MHD in critical Besov spaces

Author: Zhang, Wei, Fu, Jie, Hao, Chengchun, and Yang, Siqi
Subjects: Mathematics - Analysis of PDEs, 35R35, 76D03, 76W05
Abstract: This paper aims to establish the global well-posedness of the free boundary problem for the incompressible viscous resistive magnetohydrodynamic (MHD) equations. Under the framework of Lagrangian coordinates, a unique global solution exists in the half-space provided that the norm of the initial data in the critical homogeneous Besov space $\dot{B}_{p, 1}^{-1+N/p}(\mathbb{R}_{+}^N)$ is sufficiently small, where $p \in [N, 2N-1)$. Building upon prior work such as (Danchin and Mucha, J. Funct. Anal. 256 (2009) 881--927) and (Ogawa and Shimizu, J. Differ. Equations 274 (2021) 613--651) in the half-space setting, we establish maximal $L^{1}$-regularity for both the Stokes equations without surface stress and the linearized equations of the magnetic field with zero boundary condition. The existence and uniqueness of solutions to the nonlinear problems are proven using the Banach contraction mapping principle., Comment: 28 pages
Published: 2024

31. UNINEXT-Cutie: The 1st Solution for LSVOS Challenge RVOS Track

Author: Fang, Hao, Pan, Feiyu, Lu, Xiankai, Zhang, Wei, and Cong, Runmin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Referring video object segmentation (RVOS) relies on natural language expressions to segment target objects in video. This year, the LSVOS Challenge RVOS Track replaced the original YouTube-RVOS benchmark with MeViS. MeViS focuses on referring to the target object in a video through its motion descriptions instead of static attributes, posing a greater challenge to the RVOS task. In this work, we integrate the strengths of leading RVOS and VOS models to build a simple and effective pipeline for RVOS. First, we fine-tune the state-of-the-art RVOS model to obtain mask sequences that are correlated with language descriptions. Second, based on reliable and high-quality key frames, we leverage a VOS model to enhance the quality and temporal consistency of the mask results. Finally, we further improve the performance of the RVOS model using semi-supervised learning. Our solution achieved 62.57 J&F on the MeViS test set and ranked 1st in the 6th LSVOS Challenge RVOS Track.
Published: 2024

32. Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track

Author: Pan, Feiyu, Fang, Hao, Cong, Runmin, Zhang, Wei, and Lu, Xiankai
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The Video Object Segmentation (VOS) task aims to segment a particular object instance throughout the entire video sequence given only the object mask of the first frame. Recently, Segment Anything Model 2 (SAM 2) was proposed, which is a foundation model towards solving promptable visual segmentation in images and videos. SAM 2 builds a data engine, which improves model and data via user interaction, to collect the largest video segmentation dataset to date. SAM 2 is a simple transformer architecture with streaming memory for real-time video processing, which, trained on this data, provides strong performance across a wide range of tasks. In this work, we evaluate the zero-shot performance of SAM 2 on the more challenging VOS datasets MOSE and LVOS. Without fine-tuning on the training set, SAM 2 achieved 75.79 J&F on the test set and ranked 4th in the 6th LSVOS Challenge VOS Track., Comment: arXiv admin note: substantial text overlap with arXiv:2408.00714
Published: 2024

33. Towards Boosting LLMs-driven Relevance Modeling with Progressive Retrieved Behavior-augmented Prompting

Author: Chen, Zeyuan, Wu, Haiyan, Wu, Kaixin, Chen, Wei, Zhong, Mingjie, Xu, Jia, Liu, Zhongyi, and Zhang, Wei
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence
Abstract: Relevance modeling is a critical component for enhancing user experience in search engines, with the primary objective of identifying items that align with users' queries. Traditional models only rely on the semantic congruence between queries and items to ascertain relevance. However, this approach represents merely one aspect of the relevance judgement, and is insufficient in isolation. Even powerful Large Language Models (LLMs) still cannot accurately judge the relevance of a query and an item from a semantic perspective. To augment LLMs-driven relevance modeling, this study proposes leveraging user interactions recorded in search logs to yield insights into users' implicit search intentions. The challenge lies in the effective prompting of LLMs to capture dynamic search intentions, which poses several obstacles in real-world relevance scenarios, i.e., the absence of domain-specific knowledge, the inadequacy of an isolated prompt, and the prohibitive costs associated with deploying LLMs. In response, we propose ProRBP, a novel Progressive Retrieved Behavior-augmented Prompting framework for integrating search scenario-oriented knowledge with LLMs effectively. Specifically, we perform the user-driven behavior neighbors retrieval from the daily search logs to obtain domain-specific knowledge in time, retrieving candidates that users consider to meet their expectations. Then, we guide LLMs for relevance modeling by employing advanced prompting techniques that progressively improve the outputs of the LLMs, followed by a progressive aggregation with comprehensive consideration of diverse aspects. For online serving, we have developed an industrial application framework tailored for the deployment of LLMs in relevance modeling. Experiments on real-world industry data and online A/B testing demonstrate our proposal achieves promising performance.
Published: 2024

34. Optimal UCA Design for OAM Based Wireless Backhaul Transmission

Author: Jing, Haiyue, Cheng, Wenchi, Zhang, Wei, and Zhang, Hailin
Subjects: Computer Science - Information Theory, Electrical Engineering and Systems Science - Signal Processing
Abstract: Orbital angular momentum (OAM), which is considered a novel way to achieve high capacity, has attracted much attention recently. OAM signals emitted by a uniform circular array (UCA) are widely regarded to go through Bessel-form channels. However, the channel gains corresponding to the Bessel-form channels result in low signal-to-noise ratio (SNR) on OAM-modes, and it is difficult to achieve high capacity using all OAM modes. To achieve the maximum capacity offered by OAM multiplexing for wireless backhaul communications, in this paper we propose the optimal UCA design, which selects the optimal OAM-modes and radius of the receive UCA. We formulate the capacity maximization problem and divide it into two subproblems for obtaining the corresponding optimal UCA design for OAM multiplexing based wireless backhaul communications. In particular, the optimal radius of the receive UCA is first derived. Then, we propose a mode selection scheme to choose appropriate OAM-modes for data transmission to maximize the capacity. Extensive simulations validate that the capacity of OAM multiplexing can be significantly increased with our developed scheme.
Published: 2024

35. A Systematic Evaluation of Generated Time Series and Their Effects in Self-Supervised Pretraining

Author: Der, Audrey, Yeh, Chin-Chia Michael, Dai, Xin, Chen, Huiyuan, Zheng, Yan, Fan, Yujie, Zhuang, Zhongfang, Lai, Vivian, Wang, Junpeng, Wang, Liang, Zhang, Wei, and Keogh, Eamonn
Subjects: Computer Science - Machine Learning
Abstract: Self-supervised Pretrained Models (PTMs) have demonstrated remarkable performance in computer vision and natural language processing tasks. These successes have prompted researchers to design PTMs for time series data. In our experiments, most self-supervised time series PTMs were surpassed by simple supervised models. We hypothesize this undesired phenomenon may be caused by data scarcity. In response, we test six time series generation methods, use the generated data in pretraining in lieu of the real data, and examine the effects on classification performance. Our results indicate that replacing a real-data pretraining set with a greater volume of only generated samples produces noticeable improvement., Comment: To appear in CIKM 2024 as a short paper; the version here is the self-contained version that includes the non-mandatory supplementary material available on the paper's companion website
Published: 2024

36. Alignment-Enhanced Decoding: Defending via Token-Level Adaptive Refining of Probability Distributions

Author: Liu, Quan, Zhou, Zhenhong, He, Longzhu, Liu, Yi, Zhang, Wei, and Su, Sen
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large language models are susceptible to jailbreak attacks, which can result in the generation of harmful content. While prior defenses mitigate these risks by perturbing or inspecting inputs, they ignore competing objectives, the underlying cause of alignment failures. In this paper, we propose Alignment-Enhanced Decoding (AED), a novel defense that employs adaptive decoding to address the root causes of jailbreak issues. We first define the Competitive Index to quantify alignment failures and utilize feedback from self-evaluation to compute post-alignment logits. Then, AED adaptively combines AED and post-alignment logits with the original logits to obtain harmless and helpful distributions. Consequently, our method enhances safety alignment while maintaining helpfulness. We conduct experiments across five models and four common jailbreaks, with the results validating the effectiveness of our approach. Code is available at https://github.com/GIGABaozi/AED.git., Comment: 15 pages, 5 figures
Published: 2024

37. Exploring New Physics with PandaX-4T Low Energy Electronic Recoil Data

Author: PandaX Collaboration, Zeng, Xinning, Bo, Zihao, Chen, Wei, Chen, Xun, Chen, Yunhua, Cheng, Zhaokan, Cui, Xiangyi, Fan, Yingjie, Fang, Deqing, Gao, Zhixing, Geng, Lisheng, Giboni, Karl, Guo, Xunan, Guo, Xuyuan, Guo, Zichao, Han, Chencheng, Han, Ke, He, Changda, He, Jinrong, Huang, Di, Huang, Houqi, Huang, Junting, Hou, Ruquan, Hou, Yu, Ji, Xiangdong, Ji, Xiangpan, Ju, Yonglin, Li, Chenxiang, Li, Jiafu, Li, Mingchuan, Li, Shuaijie, Li, Tao, Li, Zhiyuan, Lin, Qing, Liu, Jianglai, Lu, Congcong, Lu, Xiaoying, Luo, Lingyin, Luo, Yunyang, Ma, Wenbo, Ma, Yugang, Mao, Yajun, Meng, Yue, Ning, Xuyang, Pang, Binyu, Qi, Ningchun, Qian, Zhicheng, Ren, Xiangxiang, Shan, Dong, Shang, Xiaofeng, Shao, Xiyuan, Shen, Guofang, Shen, Manbin, Sun, Wenliang, Tao, Yi, Wang, Anqing, Wang, Guanbo, Wang, Hao, Wang, Jiamin, Wang, Lei, Wang, Meng, Wang, Qiuhong, Wang, Shaobo, Wang, Siguang, Wang, Wei, Wang, Xiuli, Wang, Xu, Wang, Zhou, Wei, Yuehuan, Wu, Weihao, Wu, Yuan, Xiao, Mengjiao, Xiao, Xiang, Xiong, Kaizhi, Xu, Yifan, Yao, Shunyu, Yan, Binbin, Yan, Xiyu, Yang, Yong, Ye, Peihua, Yu, Chunxu, Yuan, Ying, Yuan, Zhe, Yun, Youhui, Zhang, Minzhen, Zhang, Peng, Zhang, Shibo, Zhang, Shu, Zhang, Tao, Zhang, Wei, Zhang, Yang, Zhang, Yingxin, Zhang, Yuanyuan, Zhao, Li, Zhou, Jifang, Zhou, Jiaxu, Zhou, Jiayi, Zhou, Ning, Zhou, Xiaopeng, Zhou, Yubo, and Zhou, Zhizhen
Subjects: High Energy Physics - Experiment
Abstract: New particles beyond the Standard Model of particle physics, such as axions, can be effectively searched through their interactions with electrons. We use the large liquid xenon detector PandaX-4T to search for novel electronic recoil signals induced by solar axions, neutrinos with anomalous magnetic moment, axion-like particles, dark photons, and light fermionic dark matter. A detailed background model is established with the latest datasets with 1.54 $\rm tonne \cdot year$ exposure. No significant excess above the background has been observed, and we have obtained competitive constraints for axion couplings, neutrino magnetic moment, and fermionic dark matter interactions.
Published: 2024

38. Multiboson Hanbury Brown-Twiss correlations for partially coherent sources in relativistic heavy-ion collisions in a multiphase transport model

Author: Wang, Shi-Yao, Ye, Jun-Ting, and Zhang, Wei-Ning
Subjects: Nuclear Theory
Abstract: We use a multi-phase transport (AMPT) model to study multi-pion and multi-kaon Hanbury Brown-Twiss (HBT) correlations for the partially coherent particle-emitting sources in relativistic heavy-ion collisions. A density-dependent longitudinal coherent emission length and density-dependent transverse coherent emission length are introduced in calculating the multi-boson HBT correlation functions of the partially coherent sources. We compare the model results of three- and four-pion HBT correlation functions with experimental data in Pb-Pb collisions at center-of-mass energy $\sqrt{s_{NN}}=$2.76 TeV, and investigate the influences of boson coherent emissions on the multi-pion and multi-kaon correlation functions, respectively. We find that all of the three- and four-pion correlation functions of the partially coherent sources are consistent with experimental data. Coherent emission leads to the intercept decreases of the multi-boson correlation functions. The intercepts of the multi-kaon correlation functions of the partially coherent source are higher than those of the multi-pion correlation functions, because low kaon densities lead to smaller kaon coherent emission lengths than pion emission lengths. The intercepts of multi-boson correlation functions of partially coherent sources in high transverse momentum intervals are higher than those in low transverse momentum intervals because particle de Broglie wavelengths are small at high momenta., Comment: 19 pages, 12 figures
Published: 2024

39. Hybrid Magnonics with Localized Spoof Surface Plasmon Polaritons

Author: Xiong, Yuzan, Christy, Andrew, Yan, Zixin, Pishehvar, Amin, Mahdi, Muntasir, Wu, Junming, Cahoon, James F., Yang, Binbin, Hamilton, Michael C., Zhang, Xufeng, and Zhang, Wei
Subjects: Condensed Matter - Materials Science, Physics - Optics
Abstract: Hybrid magnonic systems have emerged as a promising direction for information propagation with preserved coherence. Due to high tunability of magnons, their interactions with microwave photons can be engineered to probe novel phenomena based on strong photon-magnon coupling. Improving the photon-magnon coupling strength can be done by tuning the structure of microwave resonators to better interact with the magnon counterpart. Planar resonators have been explored due to their potential for on-chip integration, but only common modes from stripline-based resonators have been used. Here, we present a microwave spiral resonator supporting the spoof localized surface plasmons (LSPs) and implement it to the investigation of photon-magnon coupling for hybrid magnonic applications. We showcase strong magnon-LSP photon coupling using a ferrimagnetic yttrium iron garnet sphere. We discuss the dependence of the spiral resonator design to the engineering capacity of the photon mode frequency and spatial field distributions, via both experiment and simulation. By the localized photon mode profiles, the resulting magnetic field concentrates near the surface dielectrics, giving rise to an enhanced magnetic filling factor. The strong coupling and large engineering space render the spoof LSPs an interesting contender in developing novel hybrid magnonic systems and functionalities., Comment: 13 pages, 13 figures
Published: 2024

40. Achieving Practical OAM Based Wireless Communications With Misaligned Transceiver

Author: Cheng, Wenchi, Jing, Haiyue, Zhang, Wei, Li, Zan, and Zhang, Hailin
Subjects: Electrical Engineering and Systems Science - Signal Processing
Abstract: Orbital angular momentum (OAM) has attracted much attention for radio vortex wireless communications due to the orthogonality among different OAM-modes. To maintain the orthogonality among different OAM modes at the receiver, strict alignment between transmit and receive antennas is required. However, it is not practical to guarantee transceiver alignment in wireless communications. The phase turbulence resulting from misaligned transceivers leads to serious inter-mode interference among different OAM modes and therefore causes signal detection of multiple OAM modes to fail at the receiver. To achieve practical OAM based wireless communications, in this paper we investigate radio vortex wireless communications with misaligned transmit and receive antennas. We propose a joint Beamforming and Pre-detection (BePre) scheme, which uses two unitary matrices to convert the channel matrix into the equivalent circulant matrix for keeping the orthogonality among OAM-modes at the receiver. Then, the OAM signals can be detected with the mode-decomposition scheme at the misaligned receiver. Extensive simulations validate that our developed joint BePre scheme can efficiently detect the signals of multiple OAM-modes for the misaligned transceiver and can significantly increase the spectrum efficiency.
Published: 2024

41. Breaking Limits of Line-of-Sight MIMO Capacity in 6G Wireless Communications

Author: Jing, Haiyue, Cheng, Wenchi, and Zhang, Wei
Subjects: Electrical Engineering and Systems Science - Signal Processing
Abstract: Multiple-input-multiple-output (MIMO) has proved its success in the fourth generation (4G) long term evolution (LTE) and is one of the key technical enablers for enhanced mobile broadband (eMBB) in the fifth generation (5G) wireless communications. However, as the number of antennas eventually becomes extremely large and the one-hop communication distance gradually shrinks, how to significantly increase the capacity for line-of-sight (LOS) MIMO becomes more and more urgent. In this article, we introduce the quasi-fractal uniform circular array (QF-UCA) antenna structure based MIMO wireless communications, which can adequately exploit the potential of MIMO in the LOS channel and greatly increase the capacity with low complexity demodulation schemes. Specifically, three advantages regarding QF-UCA based LOS MIMO are reviewed. Then, research challenges on transceiver alignment, low-rank channel matrix, extended dimensions of QF-UCA, maximum number of orthogonal streams, and the corresponding potential solutions are discussed. Compared with traditional scattering-dependent MIMO communications, the QF-UCA based LOS MIMO wireless communication can achieve highly efficient transmission in the LOS channel.
Published: 2024

42. Improving Whisper's Recognition Performance for Under-Represented Language Kazakh Leveraging Unpaired Speech and Text

Author: Li, Jinpeng, Pu, Yu, Sun, Qi, and Zhang, Wei-Qiang
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Sound
Abstract: Whisper and other large-scale automatic speech recognition models have made significant progress in performance. However, their performance on many low-resource languages, such as Kazakh, is not satisfactory. It is worth researching how to utilize low-cost data to improve the performance of Whisper on under-represented languages. In this study, we utilized easily accessible unpaired speech and text data and combined the language model GPT with Whisper on Kazakh. We implemented end of transcript (EOT) judgment modification and hallucination penalty to improve the performance of speech recognition. Further, we employed the decoding average token log probability as a criterion to select samples from unlabeled speech data and used pseudo-labeled data to fine-tune the model to further improve its performance. Ultimately, we achieved more than 10% absolute WER reduction in multiple experiments, and the whole process has the potential to be generalized to other under-represented languages., Comment: Accepted by INTERSPEECH 2024; Minor typo correction
Published: 2024
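
Note: the abstract above uses the average token log probability of a decoded transcript as the criterion for picking pseudo-labeled samples. The sketch below illustrates that filtering step only. The data layout, the function names, and the threshold value are hypothetical; the abstract does not give the threshold or any decoding details.

```python
def average_log_prob(token_log_probs):
    """Mean of the per-token log probabilities of one decoded transcript."""
    return sum(token_log_probs) / len(token_log_probs)


def select_pseudo_labels(hypotheses, threshold):
    """Keep transcripts whose average token log probability meets the threshold.

    hypotheses: iterable of (utterance_id, text, token_log_probs) tuples.
    threshold:  hypothetical cut-off; higher (closer to 0) means stricter.
    """
    kept = []
    for utt_id, text, token_log_probs in hypotheses:
        if not token_log_probs:
            continue  # an empty decode carries no confidence information
        if average_log_prob(token_log_probs) >= threshold:
            kept.append((utt_id, text))
    return kept


if __name__ == "__main__":
    decoded = [
        ("utt1", "confident transcript", [-0.1, -0.2, -0.1]),
        ("utt2", "uncertain transcript", [-1.5, -2.0, -0.9]),
        ("utt3", "", []),
    ]
    print(select_pseudo_labels(decoded, threshold=-0.5))
```

Averaging over tokens rather than summing keeps the score comparable across transcripts of different lengths.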

43. Quasi-Fractal UCA Based OAM for Highly Efficient Orthogonal Transmission

Author: Cheng, Wenchi, Jing, Haiyue, Zhang, Wei, Zhang, Keyi, and Zhang, Hailin
Subjects: Electrical Engineering and Systems Science - Signal Processing
Abstract: The development of orbital angular momentum (OAM)-based radio vortex transmission presents a promising opportunity for increasing the capacity of wireless communication in correlated channels due to its inherent orthogonality among different OAM modes. One of the most popular schemes for highly efficient OAM transmission is the digital baseband associated with a uniform circular array (UCA) based transceiver. However, the periodicity of the complex-exponential feed generally restricts the maximum number of orthogonal signals carried by multiple OAM modes to the array-element number of the UCA antenna, which poses an open question of how to employ more OAM modes given a fixed number of array elements. Furthermore, signals modulated with high-order OAM modes are difficult for the receiver to capture because they diverge severely when propagating in free space, thus severely limiting the capacity of radio vortex communications. To overcome the above challenges, in this paper, based on the partly element-overlapped fractal geometry layout and effectively using low-order OAM modes, we propose the quasi-fractal UCA (QF-UCA) antenna based OAM multiplexing transmission. We perform the two-dimensional OAM modulation (TOM) and demodulation (TOD) schemes with the orthogonal OAM mode number exceeding the array-element number, which is beyond the traditional concept of multiple antennas based wireless communications. Simulation results show that our proposed scheme can achieve more orthogonal multiplexing streams than the maximum number of orthogonal multiplexing streams corresponding to traditional multiple antenna systems.
Published: 2024
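
Note: the abstract above states that the periodicity of the complex-exponential feed limits a UCA to as many orthogonal OAM modes as it has elements. The sketch below checks this numerically for a plain N-element UCA, using the standard phase feed exp(j·2π·l·n/N) for mode l on element n. It does not model the QF-UCA layout proposed in the paper, and the function names are hypothetical.

```python
import cmath


def oam_feed(mode, n_elements):
    """Phase weights applied to each UCA element to excite one OAM mode."""
    return [
        cmath.exp(2j * cmath.pi * mode * n / n_elements)
        for n in range(n_elements)
    ]


def correlation(a, b):
    """Normalized inner product of two feed vectors (1 = identical, 0 = orthogonal)."""
    return sum(x * y.conjugate() for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    n = 8
    print(abs(correlation(oam_feed(1, n), oam_feed(2, n))))      # distinct modes
    print(abs(correlation(oam_feed(1, n), oam_feed(1 + n, n))))  # mode l and l + N
```

Modes whose indices differ by a multiple of N produce the same feed vector, so an N-element UCA can separate at most N modes; that is the limit the QF-UCA design in the abstract aims to go beyond.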

44. Towards Effective and Interpretable Semantic Communications

Author: Wu, Youlong, Shi, Yuanmin, Ma, Shuai, Jiang, Chunxiao, Zhang, Wei, and Letaief, Khaled B.
Subjects: Computer Science - Information Theory
Abstract: With the exponential surge in traffic data and the pressing need for ultra-low latency in emerging intelligence applications, it is envisioned that 6G networks will demand disruptive communication technologies to foster ubiquitous intelligence and succinctness within the human society. Semantic communication, a novel paradigm, holds the promise of significantly curtailing communication overhead and latency by transmitting only task-relevant information. Despite numerous efforts in both theoretical frameworks and practical implementations of semantic communications, a substantial theory-practice gap complicates the theoretical analysis and interpretation, particularly when employing black-box machine learning techniques. This article initially delves into information-theoretic metrics such as semantic entropy, semantic distortions, and semantic communication rate to characterize the information flow in semantic communications. Subsequently, it provides a guideline for implementing semantic communications to ensure both theoretical interpretability and communication effectiveness., Comment: This paper has been accepted by IEEE Network Magazine
Published: 2024
Full Text: View/download PDF

45. Enhanced Traffic Flow Prediction with Multi-Segment Fusion Tensor Graph Convolutional Networks

Author: Zhang, Wei and Tang, Peng
Subjects: Computer Science - Machine Learning, Computer Science - Information Retrieval
Abstract: Accurate traffic flow prediction can assist in traffic management, route planning, and congestion mitigation, which holds significant importance in enhancing the efficiency and reliability of intelligent transportation systems (ITS). However, existing traffic flow prediction models suffer from limitations in capturing the complex spatial-temporal dependencies within traffic networks. In order to address this issue, this study proposes a multi-segment fusion tensor graph convolutional network (MS-FTGCN) for traffic flow prediction with the following three-fold ideas: a) building a unified spatial-temporal graph convolutional framework based on Tensor M-product, which captures the spatial-temporal patterns simultaneously; b) incorporating hourly, daily, and weekly components to model the multi-temporal properties of traffic flows, respectively; c) fusing the outputs of the three components by an attention mechanism to obtain the final traffic flow prediction results. The results of experiments conducted on two traffic flow datasets demonstrate that the proposed MS-FTGCN outperforms the state-of-the-art models.
Published: 2024

46. MS-Mapping: An Uncertainty-Aware Large-Scale Multi-Session LiDAR Mapping System

Author: Hu, Xiangcheng, Wu, Jin, Jiao, Jianhao, Jiang, Binqian, Zhang, Wei, Wang, Wenshuo, and Tan, Ping
Subjects: Computer Science - Robotics
Abstract: Large-scale multi-session LiDAR mapping is essential for a wide range of applications, including surveying, autonomous driving, crowdsourced mapping, and multi-agent navigation. However, existing approaches often struggle with data redundancy, robustness, and accuracy in complex environments. To address these challenges, we present MS-Mapping, a novel multi-session LiDAR mapping system that employs an incremental mapping scheme for robust and accurate map assembly in large-scale environments. Our approach introduces three key innovations: 1) A distribution-aware keyframe selection method that captures the subtle contributions of each point cloud frame to the map by analyzing the similarity of map distributions. This method effectively reduces data redundancy and pose graph size, while enhancing graph optimization speed; 2) An uncertainty model that automatically performs least-squares adjustments according to the covariance matrix during graph optimization, improving mapping precision, robustness, and flexibility without the need for scene-specific parameter tuning. This uncertainty model enables our system to monitor pose uncertainty and avoid ill-posed optimizations, thereby increasing adaptability to diverse and challenging environments. 3) To ensure fair evaluation, we redesign baseline comparisons and the evaluation benchmark. Direct assessment of map accuracy demonstrates the superiority of the proposed MS-Mapping algorithm compared to state-of-the-art methods. In addition to employing public datasets such as Urban-Nav, FusionPortable, and Newer College, we conducted extensive experiments on a large 855 m × 636 m ground truth map, collecting over 20 km of indoor and outdoor data across more than ten sequences..., Comment: 18 pages, 22 figures
Published: 2024

47. UpLIF: An Updatable Self-Tuning Learned Index Framework

Author: Heidari, Alireza, Ahmadi, Amirhossein, and Zhang, Wei
Subjects: Computer Science - Databases, Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: The emergence of learned indexes has caused a paradigm shift in our perception of indexing by considering indexes as predictive models that estimate keys' positions within a data set, resulting in notable improvements in key search efficiency and index size reduction; however, a significant challenge inherent in learned index modeling is its constrained support for update operations, necessitated by the requirement for a fixed distribution of records. Previous studies have proposed various approaches to address this issue with the drawback of high overhead due to multiple model retraining. In this paper, we present UpLIF, an adaptive self-tuning learned index that adjusts the model to accommodate incoming updates, predicts the distribution of updates for performance improvement, and optimizes its index structure using reinforcement learning. We also introduce the concept of balanced model adjustment, which determines the model's inherent properties (i.e. bias and variance), enabling the integration of these factors into the existing index model without the need for retraining with new data. Our comprehensive experiments show that the system surpasses state-of-the-art indexing solutions (both traditional and ML-based), achieving an increase in throughput of up to 3.12 times with 1000 times less memory usage., Comment: 20 pages, ACM IDEAS 2024
Published: 2024

48. Gaussian Approximations for the $k$th coordinate of sums of random vectors

Author: Ding, Yixi, Li, Qizhai, Shi, Yuke, and Zhang, Wei
Subjects: Mathematics - Statistics Theory
Abstract: We consider the problem of Gaussian approximation for the $\kappa$th coordinate of a sum of high-dimensional random vectors. Such a problem has been studied previously for $\kappa=1$ (i.e., maxima). However, in many applications, a general $\kappa\geq1$ is of great interest, which is addressed in this paper. We make four contributions: 1) we first show that the distribution of the $\kappa$th coordinate of a sum of random vectors, $\boldsymbol{X}= (X_{1},\cdots,X_{p})^{\sf T}= n^{-1/2}\sum_{i=1}^n \boldsymbol{x}_{i}$, can be approximated by that of Gaussian random vectors and derive their Kolmogorov's distributional difference bound; 2) we provide the theoretical justification for estimating the distribution of the $\kappa$th coordinate of a sum of random vectors using a Gaussian multiplier procedure, which multiplies the original vectors with i.i.d. standard Gaussian random variables; 3) we extend the Gaussian approximation result and Gaussian multiplier bootstrap procedure to a more general case where $\kappa$ diverges; 4) we further consider the Gaussian approximation for a square sum of the first $d$ largest coordinates of $\boldsymbol{X}$. All these results allow the dimension $p$ of random vectors to be as large as or much larger than the sample size $n$.
Published: 2024

49. Integrating Controllable Motion Skills from Demonstrations

Author: Liao, Honghao, Li, Zhiheng, Meng, Ziyu, Song, Ran, Li, Yibin, and Zhang, Wei
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: The expanding applications of legged robots require their mastery of versatile motion skills. Correspondingly, researchers must address the challenge of integrating multiple diverse motion skills into controllers. While existing reinforcement learning (RL)-based approaches have achieved notable success in multi-skill integration for legged robots, these methods often require intricate reward engineering or are restricted to integrating a predefined set of motion skills constrained by specific task objectives, resulting in limited flexibility. In this work, we introduce a flexible multi-skill integration framework named Controllable Skills Integration (CSI). CSI enables the integration of a diverse set of motion skills with varying styles into a single policy without the need for complex reward tuning. Furthermore, in a hierarchical control manner, the trained low-level policy can be coupled with a high-level Natural Language Inference (NLI) module to enable preliminary language-directed skill control. Our experiments demonstrate that CSI can flexibly integrate a diverse array of motion skills more comprehensively and facilitate the transitions between different skills. Additionally, CSI exhibits good scalability as the number of motion skills to be integrated increases significantly.
Published: 2024

50. DRFormer: Multi-Scale Transformer Utilizing Diverse Receptive Fields for Long Time-Series Forecasting

Author: Ding, Ruixin, Chen, Yuqi, Lan, Yu-Ting, and Zhang, Wei
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning, I.2.6
Abstract: Long-term time series forecasting (LTSF) has been widely applied in finance, traffic prediction, and other domains. Recently, patch-based transformers have emerged as a promising approach, segmenting data into sub-level patches that serve as input tokens. However, existing methods mostly rely on predetermined patch lengths, necessitating expert knowledge and posing challenges in capturing diverse characteristics across various scales. Moreover, time series data exhibit diverse variations and fluctuations across different temporal scales, which traditional approaches struggle to model effectively. In this paper, we propose a dynamic tokenizer with a dynamic sparse learning algorithm to capture diverse receptive fields and sparse patterns of time series data. In order to build hierarchical receptive fields, we develop a multi-scale Transformer model, coupled with multi-scale sequence extraction, capable of capturing multi-resolution features. Additionally, we introduce a group-aware rotary position encoding technique to enhance intra- and inter-group position awareness among representations across different temporal scales. Our proposed model, named DRFormer, is evaluated on various real-world datasets, and experimental results demonstrate its superiority compared to existing methods. Our code is available at: https://github.com/ruixindingECNU/DRFormer.
Published: 2024
Full Text: View/download PDF
