103,426 results on '"Li, Ming"'
Search Results
2. Antecedents of Spot and Contract Freight Mix in the Truckload Sector
- Author
-
Li, Ming, Bolumole, Yemisi A., and Miller, Jason W.
- Published
- 2022
3. A Dual-Path Framework with Frequency-and-Time Excited Network for Anomalous Sound Detection
- Author
-
Zhang, Yucong, Liu, Juan, Tian, Yao, Liu, Haifeng, and Li, Ming
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
In contrast to human speech, machine-generated sounds of the same type often exhibit consistent frequency characteristics and discernible temporal periodicity. However, leveraging these dual attributes in anomaly detection remains relatively under-explored. In this paper, we propose an automated dual-path framework that learns prominent frequency and temporal patterns for diverse machine types. One pathway uses a novel Frequency-and-Time Excited Network (FTE-Net) to learn the salient features across frequency and time axes of the spectrogram. It incorporates a Frequency-and-Time Chunkwise Encoder (FTC-Encoder) and an excitation network. The other pathway uses a 1D convolutional network for utterance-level spectrum. Experimental results on the DCASE 2023 task 2 dataset show the state-of-the-art performance of our proposed method. Moreover, visualizations of the intermediate feature maps in the excitation network are provided to illustrate the effectiveness of our method., Comment: This Paper has been accepted to ICASSP 2024
- Published
- 2024
4. Multimodal Laryngoscopic Video Analysis for Assisted Diagnosis of Vocal Cord Paralysis
- Author
-
Zhang, Yucong, Zou, Xin, Yang, Jinshan, Chen, Wenjun, Liang, Faya, and Li, Ming
- Subjects
Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
This paper presents the Multimodal Analyzing System for Laryngoscope (MASL), a system that combines audio and video data to automatically extract key segments and metrics from laryngeal videostroboscopic videos for clinical assessment. MASL integrates glottis detection with keyword spotting to analyze patient vocalizations and refine video highlights for better inspection of vocal cord movements. The system includes a strobing video extraction module that identifies frames by analyzing hue, saturation, and value fluctuations. MASL also provides effective metrics for vocal cord paralysis detection, employing a two-stage glottis segmentation process using U-Net followed by diffusion-based refinement to reduce false positives. Instead of glottal area waveforms, MASL estimates anterior glottic angle waveforms (AGAW) from glottis masks, evaluating both left and right vocal cords to detect unilateral vocal cord paralysis (UVFP). By comparing AGAW variances, MASL distinguishes between left and right paralysis. Ablation studies and experiments on public and real-world datasets validate MASL's segmentation module and demonstrate its ability to provide reliable metrics for UVFP diagnosis.
- Published
- 2024
5. Dynamic Hybrid Beamforming Designs for ELAA Near-Field Communications
- Author
-
Liu, Mengzhen, Li, Ming, Liu, Rang, and Liu, Qian
- Subjects
Electrical Engineering and Systems Science - Signal Processing
- Abstract
Extremely large-scale antenna array (ELAA) is a key candidate technology for the sixth generation (6G) mobile networks. Nevertheless, using substantial numbers of antennas to transmit high-frequency signals in ELAA systems significantly exacerbates the near-field effect. Unfortunately, traditional hybrid beamforming schemes are highly vulnerable to ELAA near-field communications. To effectively mitigate severe near-field effect, we propose a novel dynamic hybrid beamforming architecture for ELAA systems, in which each antenna is either adaptively connected to one radio frequency (RF) chain for signal transmission or deactivated for power saving. For the case that instantaneous channel state information (CSI) is available during each channel coherence time, a real-time dynamic hybrid beamforming design is developed to maximize the achievable sum rate under the constraints of the constant modulus of phase-shifters (PSs), non-overlapping dynamic connection network and total transmit power. When instantaneous CSI cannot be easily obtained in real-time, we propose a two-timescale dynamic hybrid beamforming design, which optimizes analog beamformer in long-timescale and digital beamformer in short-timescale, with the goal of maximizing ergodic sum-rate under the same constraints. Simulation results demonstrate the advantages of the proposed dynamic hybrid beamforming architecture and the effectiveness of the developed algorithms for ELAA near-field communications., Comment: 14 pages, 10 figures
- Published
- 2024
6. Classification of spin-$1/2$ fermionic quantum spin liquids on the trillium lattice
- Author
-
Li, Ming-Hao, Biswas, Sounak, and Parameswaran, S. A.
- Subjects
Condensed Matter - Strongly Correlated Electrons
- Abstract
We study fermionic quantum spin liquids (QSLs) on the three-dimensional trillium lattice of corner-sharing triangles. We are motivated by recent experimental and theoretical investigations that have explored various classical and quantum spin liquid states on similar networks of triangular motifs with strong geometric frustration. Using the framework of Projective Symmetry Groups (PSG), we obtain a classification of all symmetric $\mathsf{Z}_2$ and $\mathsf{U}(1)$ QSLs on the trillium lattice. We find 2 $\mathsf{Z}_2$ spin-liquids, and a single $\mathsf{U}(1)$ spin-liquid which is proximate to one of the $\mathsf{Z}_2$ states. The small number of solutions reflects the constraints imposed by the two non-symmorphic symmetries in the space group of trillium. Using self-consistency conditions of the mean-field equations, we obtain the spinon band-structure and spin structure factors corresponding to these states. All three of our spin liquids are gapless at their saddle points: the $\mathsf{Z}_2$ QSLs are both nodal, while the $\mathsf{U}(1)$ case hosts a spinon Fermi surface. One of our $\mathsf{Z}_2$ spin liquids hosts a stable gapless nodal star, that is protected by projective symmetries against additions of further neighbour terms in the mean field ansatz. We comment on directions for further work.
- Published
- 2024
7. USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction
- Author
-
Zeng, Bang and Li, Ming
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
- Abstract
Target speaker extraction aims to isolate the voice of a specific speaker from mixed speech. Traditionally, this process has relied on extracting a speaker embedding from a reference speech, necessitating a speaker recognition model. However, identifying an appropriate speaker recognition model can be challenging, and using the target speaker embedding as reference information may not be optimal for target speaker extraction tasks. This paper introduces a Universal Speaker Embedding-Free Target Speaker Extraction (USEF-TSE) framework that operates without relying on speaker embeddings. USEF-TSE utilizes a multi-head cross-attention mechanism as a frame-level target speaker feature extractor. This innovative approach allows mainstream speaker extraction solutions to bypass the dependency on speaker recognition models and to fully leverage the information available in the enrollment speech, including speaker characteristics and contextual details. Additionally, USEF-TSE can seamlessly integrate with any time-domain or time-frequency domain speech separation model to achieve effective speaker extraction. Experimental results show that our proposed method achieves state-of-the-art (SOTA) performance in terms of Scale-Invariant Signal-to-Distortion Ratio (SI-SDR) on the WSJ0-2mix, WHAM!, and WHAMR! datasets, which are standard benchmarks for monaural anechoic, noisy and noisy-reverberant two-speaker speech separation and speaker extraction., Comment: 13 pages, 6 figures
- Published
- 2024
8. Integrated photonic nonreciprocal devices based on susceptibility-programmable medium
- Author
-
Zhang, Yan-Lei, Li, Ming, Xu, Xin-Biao, Wang, Zhu-Bo, Dong, Chun-Hua, Guo, Guang-Can, Zou, Chang-Ling, and Zou, Xu-Bo
- Subjects
Physics - Optics
- Abstract
The switching and control of optical fields based on nonlinear optical effects are often limited to relatively weak nonlinear susceptibility and strong optical pump fields. Here, an optical medium with programmable susceptibility tensor based on polarizable atoms is proposed. Under a structured optical pump, the ground state population of atoms could be efficiently controlled by tuning the chirality and intensity of optical fields, and thus the optical response of the medium is programmable in both space and time. We demonstrate the potential of this approach by engineering the spatial distribution of the complex susceptibility tensor of the medium in photonic structures to realize nonreciprocal optical effects. Specifically, we investigate the advantages of chiral interaction between atoms and photons in an atom-cladded waveguide, theoretically showing that reconfigurable, strong, and rapidly switchable isolation of optical signals in a selected optical mode is possible. The susceptibility-programmable medium provides a promising way to efficiently control the optical field, opening up a wide range of applications for integrated photonic devices and structured optics., Comment: 7 pages, 4 figures
- Published
- 2024
9. One-Index Vector Quantization Based Adversarial Attack on Image Classification
- Author
-
Fan, Haiju, Qin, Xiaona, Chen, Shuang, Shum, Hubert P. H., and Li, Ming
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Cryptography and Security, Computer Science - Machine Learning
- Abstract
To improve storage and transmission, images are generally compressed. Vector quantization (VQ) is a popular compression method as it has a high compression ratio that surpasses other compression techniques. Despite this, existing adversarial attack methods on image classification are mostly performed in the pixel domain with few exceptions in the compressed domain, making them less applicable in real-world scenarios. In this paper, we propose a novel one-index attack method in the VQ domain to generate adversarial images by a differential evolution algorithm, successfully resulting in image misclassification in victim models. The one-index attack method modifies a single index in the compressed data stream so that the decompressed image is misclassified. It only needs to modify a single VQ index to realize an attack, which limits the number of perturbed indexes. The proposed method belongs to a semi-black-box attack, which is more in line with the actual attack scenario. We apply our method to attack three popular image classification models, i.e., Resnet, NIN, and VGG16. On average, 55.9% and 77.4% of the images in CIFAR-10 and Fashion MNIST, respectively, are successfully attacked, with a high level of misclassification confidence and a low level of image perturbation.
- Published
- 2024
10. Kasner eons in Lovelock black holes
- Author
-
Bueno, Pablo, Cano, Pablo A., Hennigar, Robie A., and Li, Ming-Da
- Subjects
High Energy Physics - Theory, General Relativity and Quantum Cosmology
- Abstract
In the vicinity of space-like singularities, general relativity predicts that the metric behaves, at each point, as a Kasner space which undergoes a series of "Kasner epochs" and "eras" characterized by certain transition rules. The period during which this process takes place defines a "Kasner eon", which comes to an end when higher-curvature or quantum effects become relevant. When higher-curvature densities are included in the action, spacetime can undergo transitions into additional Kasner eons. During each eon, the metric behaves locally as a Kasner solution to the higher-curvature density controlling the dynamics. In this paper we identify the presence of Kasner eons in the interior of static and spherically symmetric Lovelock gravity black holes. We determine the conditions under which eons occur and study the Kasner metrics which characterize them, as well as the transitions between them. We show that the null energy condition implies a monotonicity property for the effective Kasner exponent at the end of the Einsteinian eon. We also characterize the Kasner solutions of more general higher-curvature theories of gravity. In particular, we observe that the Einstein gravity condition that the sum of the Kasner exponents adds up to one, $\sum_{i=1}^{D-1}p_i=1$, admits a universal generalization in the form of a family of Kasner metrics satisfying $\sum_{i=1}^{D-1}p_i=2n -1$ which exists for any order-$n$ higher-curvature density and in general dimensions., Comment: 27 pages, 5 figures
- Published
- 2024
11. Dynamic compensation for pump-induced frequency shift in Kerr-cat qubit initialization
- Author
-
Xu, Yifang, Hua, Ziyue, Wang, Weiting, Ma, Yuwei, Li, Ming, Chen, Jiajun, Zhou, Jie, Pan, Xiaoxuan, Xiao, Lintao, Huang, Hongwei, Cai, Weizhou, Ai, Hao, Liu, Yu-xi, Zou, Chang-Ling, and Sun, Luyan
- Subjects
Quantum Physics
- Abstract
The noise-biased Kerr-cat qubit is an attractive candidate for fault-tolerant quantum computation; however, its initialization faces challenges due to the squeezing pump-induced frequency shift (PIFS). Here, we propose and demonstrate a dynamic compensation method to mitigate the effect of PIFS during the Kerr-cat qubit initialization. Utilizing a novel nonlinearity-engineered triple-loop SQUID device, we realize a stabilized Kerr-cat qubit and validate the advantages of the dynamic compensation method by improving the initialization fidelity from 57% to 78%, with a projected fidelity of 91% after excluding state preparation and measurement errors. Our results not only advance the practical implementation of Kerr-cat qubits, but also provide valuable insights into the fundamental adiabatic dynamics of these systems. This work paves the way for scalable quantum processors that leverage the bias-preserving properties of Kerr-cat qubits.
- Published
- 2024
12. Quantum state transfer between superconducting cavities via exchange-free interactions
- Author
-
Zhou, Jie, Li, Ming, Wang, Weiting, Cai, Weizhou, Hua, Ziyue, Xu, Yifang, Pan, Xiaoxuan, Xue, Guangming, Zhang, Hongyi, Song, Yipu, Yu, Haifeng, Zou, Chang-Ling, and Sun, Luyan
- Subjects
Quantum Physics
- Abstract
We propose and experimentally demonstrate a novel protocol for transferring quantum states between superconducting cavities using only continuous two-mode squeezing interactions, without exchange of photonic excitations between cavities. This approach conceptually resembles quantum teleportation, where quantum information is transferred between different nodes without directly transmitting carrier photons. In contrast to the discrete operations of entanglement and Bell-state measurement in teleportation, our scheme is symmetric and continuous. We experimentally realize coherent and bidirectional transfer of arbitrary quantum states, including bosonic quantum error correction codes. Our results offer new insights into the quantum state transfer and quantum teleportation. In particular, our demonstration validates a new approach to realize quantum transducers, and might find applications in a wide range of physical platforms.
- Published
- 2024
13. A Joint Learning Model with Variational Interaction for Multilingual Program Translation
- Author
-
Du, Yali, Sun, Hui, and Li, Ming
- Subjects
Computer Science - Software Engineering, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Programming Languages
- Abstract
Programs implemented in various programming languages form the foundation of software applications. To alleviate the burden of program migration and facilitate the development of software systems, automated program translation across languages has garnered significant attention. Previous approaches primarily focus on pairwise translation paradigms, learning translation between pairs of languages using bilingual parallel data. However, parallel data is difficult to collect for some language pairs, and the distribution of program semantics across languages can shift, posing challenges for pairwise program translation. In this paper, we argue that jointly learning a unified model to translate code across multiple programming languages is superior to separately learning from bilingual parallel data. We propose Variational Interaction for Multilingual Program Translation~(VIM-PT), a disentanglement-based generative approach that jointly trains a unified model for multilingual program translation across multiple languages. VIM-PT disentangles code into language-shared and language-specific features, using variational inference and interaction information with a novel lower bound, then achieves program translation through conditional generation. VIM-PT demonstrates four advantages: 1) captures language-shared information more accurately from various implementations and improves the quality of multilingual program translation, 2) mines and leverages the capability of non-parallel data, 3) addresses the distribution shift of program semantics across languages, 4) and serves as a unified model, reducing deployment complexity., Comment: Accepted by the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE 2024)
- Published
- 2024
14. Photonic time-delayed reservoir computing based on lithium niobate microring resonators
- Author
-
Wang, Yuan, Li, Ming, Gao, Mingyi, Zou, Chang-Ling, Dong, Chun-Hua, Yang, Xiaoniu, Xuan, Qi, and Ren, HongLiang
- Subjects
Physics - Optics
- Abstract
On-chip micro-ring resonators (MRRs) have been proposed for constructing delay reservoir computing (RC) systems, offering a highly scalable, high-density computational architecture that is easy to manufacture. However, most proposed RC schemes have utilized passive integrated optical components based on silicon-on-insulator (SOI), and RC systems based on lithium niobate on insulator (LNOI) have not yet been reported. The nonlinear optical effects exhibited by lithium niobate microphotonic devices introduce new possibilities for RC design. In this work, we design an RC scheme based on a series-coupled MRR array, leveraging the unique interplay between thermo-optic nonlinearity and photorefractive effects in lithium niobate. We first demonstrate the existence of three regions defined by wavelength detuning between the primary LNOI micro-ring resonator and the coupled micro-ring array, where one region achieves an optimal balance between nonlinearity and high memory capacity at extremely low input energy, leading to superior computational performance. We then discuss in detail the impact of each ring's nonlinearity and the system's symbol duration on performance. Finally, we design a wavelength-division multiplexing (WDM) based multi-task parallel computing scheme, showing that the computational performance for multiple tasks matches that of single-task computations., Comment: 17 pages, 6 figures
- Published
- 2024
15. Continual Dialogue State Tracking via Reason-of-Select Distillation
- Author
-
Feng, Yujie, Liu, Bo, Dong, Xiaoyu, Lu, Zexin, Zhan, Li-Ming, Wu, Xiao-Ming, and Lam, Albert Y. S.
- Subjects
Computer Science - Computation and Language
- Abstract
An ideal dialogue system requires continuous skill acquisition and adaptation to new tasks while retaining prior knowledge. Dialogue State Tracking (DST), vital in these systems, often involves learning new services and confronting catastrophic forgetting, along with a critical capability loss termed the "Value Selection Quandary." To address these challenges, we introduce the Reason-of-Select (RoS) distillation method by enhancing smaller models with a novel 'meta-reasoning' capability. Meta-reasoning employs an enhanced multi-domain perspective, combining fragments of meta-knowledge from domain-specific dialogues during continual learning. This transcends traditional single-perspective reasoning. The domain bootstrapping process enhances the model's ability to dissect intricate dialogues from multiple possible values. Its domain-agnostic property aligns data distribution across different domains, effectively mitigating forgetting. Additionally, two novel improvements, "multi-value resolution" strategy and Semantic Contrastive Reasoning Selection method, significantly enhance RoS by generating DST-specific selection chains and mitigating hallucinations in teachers' reasoning, ensuring effective and reliable knowledge transfer. Extensive experiments validate the exceptional performance and robust generalization capabilities of our method. The source code is provided for reproducibility., Comment: Accepted to ACL 2024 Findings
- Published
- 2024
16. Cosmological Prediction of the Void and Galaxy Clustering Measurements in the CSST Spectroscopic Survey
- Author
-
Song, Yingxiao, Xiong, Qi, Gong, Yan, Deng, Furen, Chan, Kwan Chuen, Chen, Xuelei, Guo, Qi, Li, Guoliang, Li, Ming, Liu, Yun, Luo, Yu, Pei, Wenxiang, and Wei, Chengliang
- Subjects
Astrophysics - Cosmology and Nongalactic Astrophysics
- Abstract
The void power spectrum is related to the clustering of low-density regions in the large-scale structure (LSS) of the Universe, and can be used as an effective cosmological probe to extract the information of the LSS. We generate the galaxy mock catalogs from Jiutian simulation, and identify voids using the watershed algorithm for studying the cosmological constraint strength of the China Space Station Telescope (CSST) spectroscopic survey. The galaxy and void auto power spectra and void-galaxy cross power spectra at $z=0.3$, 0.6, and 0.9 are derived from the mock catalogs. To fit the full power spectra, we propose to use the void average effective radius at a given redshift to simplify the theoretical model, and adopt the Markov Chain Monte Carlo (MCMC) technique to implement the constraints on the cosmological and void parameters. The systematical parameters, such as galaxy and void biases, and noise terms in the power spectra are also included in the fitting process. We find that our theoretical model can correctly extract the cosmological information from the galaxy and void power spectra, which demonstrates its feasibility and effectivity. The joint constraint accuracy of the cosmological parameters can be improved by $\sim20\%$ compared to that from the galaxy power spectrum only. The fitting results of the void density profile and systematical parameters are also well constrained and consistent with the expectation. This indicates that the void clustering measurement can be an effective complement to the galaxy clustering probe, especially for the next generation galaxy surveys., Comment: 11 pages, 5 figures, 2 tables
- Published
- 2024
17. Motif analysis and passing behavior in football passing networks
- Author
-
Li, Ming-Xia, Xu, Li-Gong, and Zhou, Wei-Xing
- Subjects
Physics - Physics and Society
- Abstract
The strategic orchestration of football matchplays profoundly influences game outcomes, motivating a surge in research aimed at uncovering tactical nuances through social network analysis. In this paper, we delve into the microscopic intricacies of cooperative player interactions by focusing on triadic motifs within passing networks. Employing a dataset compiled from 3,199 matches across 18 premier football competitions, we identify successful passing activities and construct passing networks for both home and away teams. Our findings highlight a pronounced disparity in passing efficiency, with home teams demonstrating superior performance relative to away teams. Through the identification and analysis of 3-motifs, we find that the motifs with more bidirectional links are more significant. It reveals that footballers exhibit a strong tendency towards backward passes rather than direct forward attacks. Comparing the results of games, we find that some motifs are related to the goal difference. It indicates that direct and effective forward passing significantly amplifies a team's offensive capabilities, whereas an abundance of passbacks portends an elevated risk of offensive futility. These revelations affirm the efficacy of network motif analysis as a potent analytical tool for unveiling the foundational components of passing dynamics among footballers and for decoding the complex tactical behaviors and interaction modalities that underpin team performance.
- Published
- 2024
18. Could the newly reported $X(2600)$ be the $\eta_2(4D)$ meson?
- Author
-
Wang, Li-Ming, Tian, Wen-Xin, and Liu, Xiang
- Subjects
High Energy Physics - Phenomenology, High Energy Physics - Experiment
- Abstract
The BESIII Collaboration recently reported the observation of the $X(2600)$ state in the $\eta^\prime \pi^+\pi^-$ invariant mass spectrum of $J/\psi \to \gamma \eta^\prime \pi^+\pi^-$, with a significance exceeding $20\sigma$. Its $J^{PC}$ quantum numbers could be either $0^{-+}$ or $2^{-+}$. We explore the possibility of the $X(2600)$ being a higher state of the $\eta_2$ meson family. Through ($n,M^2$) trajectory analysis and the Quark Pair Creation model, we propose that the $X(2600)$ could be the third radial excitation of the $\eta_2(1870)$. However, the theoretical decay width of the $\eta_2(4D)$ is smaller than the experimental width of the $X(2600)$, and branching ratio calculations suggest inconsistencies, leading us to exclude the $X(2600)$ as the $\eta_2(4D)$. Our findings contribute to the understanding of the $X(2600)$ and provide insights for future experimental searches for excited $\eta_2$ states., Comment: 8 pages, 6 figures, 4 tables
- Published
- 2024
19. Top Pass: Improve Code Generation by Pass@k-Maximized Code Ranking
- Author
-
Lyu, Zhi-Cun, Li, Xin-Ye, Xie, Zheng, and Li, Ming
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Software Engineering
- Abstract
Code generation has been greatly enhanced by the profound advancements in Large Language Models (LLMs) recently. Nevertheless, such LLM-based code generation approaches still struggle to generate error-free code in a few tries when faced with complex problems. To address this, the prevailing strategy is to sample a huge number of candidate programs, in the hope that at least one of them works. However, users of code generation systems usually expect to find a correct program by reviewing or testing only a small number of code candidates. Otherwise, the system would be unhelpful. In this paper, we propose Top Pass, a code ranking approach that identifies potential correct solutions from a large number of candidates. Top Pass directly optimizes the pass@k loss function, enhancing the quality at the top of the candidate list. This enables the user to find the correct solution within as few tries as possible. Experimental results on four benchmarks indicate that our Top Pass method enhances the usability of code generation models by producing better ranking results, particularly achieving a 32.9\% relative improvement in pass@1 on CodeContests when compared to the state-of-the-art ranking method., Comment: Accepted by Frontier of Computer Science
- Published
- 2024
20. Engineering Rydberg-pair interactions in divalent atoms with hyperfine-split ionization thresholds
- Author
-
Hummel, Frederic, Weber, Sebastian, Moegerle, Johannes, Menke, Henri, King, Jonathan, Bloom, Benjamin, Hofferberth, Sebastian, and Li, Ming
- Subjects
Physics - Atomic Physics, Quantum Physics
- Abstract
Quantum information processing with neutral atoms relies on Rydberg excitation for entanglement generation. While the use of heavy divalent or open-shell elements, such as strontium or ytterbium, has benefits due to their optically active core and a variety of possible qubit encodings, their Rydberg structure is generally complex. For some isotopes in particular, hyperfine interactions are relevant even for highly excited electronic states. We employ multi-channel quantum defect theory to infer the Rydberg structure of isotopes with non-zero nuclear spin and perform non-perturbative Rydberg-pair interaction calculations. We find that due to the high level density and sensitivities to external fields, experimental parameters must be precisely controlled. Specifically in ${}^{87}$Sr, we study an intrinsic F\"orster resonance, unique to divalent atoms with hyperfine-split thresholds, which simultaneously provides line stability with respect to external field fluctuations and enhanced long-range interactions. Additionally, we provide parameters for pair states that can be effectively described by single-channel Rydberg series. The explored pair states provide exciting opportunities for applications in the blockade regime as well as for more exotic long-range interactions such as largely flat, distance-independent potentials., Comment: 12 pages, 7 figures
- Published
- 2024
21. Enhancing Anti-spoofing Countermeasures Robustness through Joint Optimization and Transfer Learning
- Author
-
Wang, Yikang, Wang, Xingming, Nishizaki, Hiromitsu, and Li, Ming
- Subjects
Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, Electrical Engineering and Systems Science - Signal Processing
- Abstract
Current research in synthesized speech detection primarily focuses on the generalization of detection systems to unknown spoofing methods of noise-free speech. However, anti-spoofing countermeasure (CM) systems often do not perform as well in more challenging scenarios, such as those involving noise and reverberation. To address the problem of enhancing the robustness of CM systems, we propose a transfer learning-based speech enhancement front-end joint optimization (TL-SEJ) method, investigating its effectiveness in improving robustness against noise and reverberation. We evaluated the proposed method's performance through a series of comparative and ablation experiments. The experimental results show that, across different signal-to-noise ratio test conditions, the proposed TL-SEJ method improves recognition accuracy by 2.7% to 15.8% compared to the baseline. Compared to conventional data augmentation methods, our system achieves an accuracy improvement ranging from 0.7% to 5.8% in various noisy conditions and from 1.7% to 2.8% under different RT60 reverberation scenarios. These experiments demonstrate that the proposed method effectively enhances system robustness in noisy and reverberant conditions., Comment: 29 pages, 4 figures, Journal Papers
- Published
- 2024
22. GenRC: Generative 3D Room Completion from Sparse Image Collections
- Author
-
Li, Ming-Feng, Ku, Yueh-Feng, Yen, Hong-Xuan, Liu, Chi, Liu, Yu-Lun, Chen, Albert Y. C., Kuo, Cheng-Hao, and Sun, Min
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Sparse RGBD scene completion is a challenging task especially when considering consistent textures and geometries throughout the entire scene. Different from existing solutions that rely on human-designed text prompts or predefined camera trajectories, we propose GenRC, an automated training-free pipeline to complete a room-scale 3D mesh with high-fidelity textures. To achieve this, we first project the sparse RGBD images to a highly incomplete 3D mesh. Instead of iteratively generating novel views to fill in the void, we utilized our proposed E-Diffusion to generate a view-consistent panoramic RGBD image which ensures global geometry and appearance consistency. Furthermore, we maintain the input-output scene stylistic consistency through textual inversion to replace human-designed text prompts. To bridge the domain gap among datasets, E-Diffusion leverages models trained on large-scale datasets to generate diverse appearances. GenRC outperforms state-of-the-art methods under most appearance and geometric metrics on ScanNet and ARKitScenes datasets, even though GenRC is not trained on these datasets nor using predefined camera trajectories. Project page: https://minfenli.github.io/GenRC, Comment: ECCV 2024
- Published
- 2024
23. VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark
- Author
-
Lin, Yuke, Cheng, Ming, Zhang, Fulin, Gao, Yingying, Zhang, Shilei, and Li, Ming
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
In this paper, we provide a large audio-visual speaker recognition dataset, VoxBlink2, which includes approximately 10M utterances with videos from 110K+ speakers in the wild. This dataset represents a significant expansion over the VoxBlink dataset, encompassing a broader diversity of speakers and scenarios thanks to an optimized data collection pipeline. Afterward, we explore the impact of training strategies, data scale, and model complexity on speaker verification and finally establish a new single-model state-of-the-art EER at 0.170% and minDCF at 0.006% on the VoxCeleb1-O test set. Such remarkable results motivate us to explore speaker recognition from a new challenging perspective. We introduce the Open-Set Speaker-Identification task, which is designed to either match a probe utterance with a known gallery speaker or categorize it as an unknown query. Associated with this task, we design concrete benchmark and evaluation protocols. The data and model resources can be found at http://voxblink2.github.io., Comment: Accepted By InterSpeech2024
- Published
- 2024
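The open-set decision described in the VoxBlink2 abstract (match a probe to a known gallery speaker, or reject it as unknown) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not state the scoring function or threshold, so cosine similarity over fixed embeddings, the function names, and the threshold value are hypothetical and are not the benchmark's actual protocol.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_open_set(probe, gallery, threshold):
    """Return (speaker_id, score); speaker_id is None when the probe is rejected.

    gallery maps a speaker id to one enrollment embedding (hypothetical layout).
    threshold is an assumed operating point, not a value from the paper.
    """
    best_id, best_score = None, -1.0
    for speaker_id, embedding in gallery.items():
        score = cosine(probe, embedding)
        if score > best_score:
            best_id, best_score = speaker_id, score
    if best_score < threshold:
        return None, best_score  # unknown query: no gallery speaker is close enough
    return best_id, best_score


# Toy two-dimensional embeddings, for illustration only.
gallery = {"spk_a": [1.0, 0.0], "spk_b": [0.0, 1.0]}
known = identify_open_set([0.9, 0.1], gallery, threshold=0.8)
unknown = identify_open_set([1.0, 1.0], gallery, threshold=0.8)
```

In this toy setup the first probe is assigned to `spk_a`, while the second sits equally far from both enrolled speakers and falls below the threshold, so it is returned as unknown.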
24. Explosive percolation in finite dimensions
- Author
-
Li, Ming, Wang, Junfeng, and Deng, Youjin
- Subjects
Condensed Matter - Statistical Mechanics - Abstract
Explosive percolation (EP) has received significant research attention due to its rich and anomalous phenomena near criticality. In our recent study [Phys. Rev. Lett. 130, 147101 (2023)], we demonstrated that the correct critical behaviors of the EP in infinite dimensions (complete graph) can be accurately extracted using the event-based method, with finite-size scaling behaviors still described by the standard finite-size scaling theory. We perform an extensive simulation of the EPs on hypercubic lattices ranging from dimensions $d=2$ to $6$, and find that the critical behaviors consistently obey the standard finite-size scaling theory. Consequently, we obtain a high-precision determination of the percolation thresholds and critical exponents, revealing that the EPs governed by the product and sum rules belong to different universality classes. Remarkably, despite the mean of the dynamic pseudo-critical point $\mathcal{T}_L$ deviating from the infinite-lattice criticality by a distance determined by the $d$-dependent correlation-length exponent, $\mathcal{T}_L$ follows a normal (Gaussian) distribution across all dimensions, with a standard deviation proportional to $1/\sqrt{V}$, where $V$ denotes the system volume. A theoretical argument associated with the central-limit theorem is further proposed to understand the probability distribution of $\mathcal{T}_L$. These findings offer a comprehensive understanding of critical behaviors in EPs across various dimensions., Comment: 10 pages, 9 figures
- Published
- 2024
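The product and sum rules mentioned in the explosive percolation abstract can be sketched as an Achlioptas-type process: at each step two candidate edges are drawn, and the one whose end clusters have the smaller product (or sum) of sizes is added. The sketch below makes several assumptions that are not from the paper: it runs on a complete graph rather than hypercubic lattices, it tracks only the largest cluster, and names such as `achlioptas_process` are hypothetical. It does not implement the event-based analysis described in the abstract.

```python
import random


class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the clusters of a and b and return the resulting root."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return ra


def achlioptas_process(n, steps, rule="product", seed=0):
    """Return the largest-cluster size after each of `steps` edge insertions."""
    if rule not in ("product", "sum"):
        raise ValueError("rule must be 'product' or 'sum'")
    rng = random.Random(seed)
    clusters = DisjointSet(n)

    def score(edge):
        u, v = edge
        su = clusters.size[clusters.find(u)]
        sv = clusters.size[clusters.find(v)]
        return su * sv if rule == "product" else su + sv

    largest = 1
    history = []
    for _ in range(steps):
        # Two candidate edges, each joining two distinct random sites.
        candidates = [tuple(rng.sample(range(n), 2)) for _ in range(2)]
        u, v = min(candidates, key=score)  # keep the edge with the smaller score
        root = clusters.union(u, v)
        largest = max(largest, clusters.size[root])
        history.append(largest)
    return history


history = achlioptas_process(n=200, steps=300, rule="product", seed=1)
```

How intra-cluster candidate edges are scored varies between studies; here they are scored like any other edge, which is a simplification.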
25. Exploring Generative AI Policies in Higher Education: A Comparative Perspective from China, Japan, Mongolia, and the USA
- Author
-
Xie, Qin, Li, Ming, and Enkhtur, Ariunaa
- Subjects
Computer Science - Computers and Society - Abstract
This study conducts a comparative analysis of national policies on Generative AI across four countries: China, Japan, Mongolia, and the USA. Employing the Qualitative Comparative Analysis (QCA) method, it examines the responses of these nations to Generative AI in higher education settings, scrutinizing the diversity in their approaches within this group. While all four countries exhibit a positive attitude toward Generative AI in higher education, Japan and the USA prioritize a human-centered approach and provide direct guidance in teaching and learning. In contrast, China and Mongolia prioritize national security concerns, with their guidelines focusing more on the societal level rather than being specifically tailored to education. Additionally, despite all four countries emphasizing diversity, equity, and inclusion, they consistently fail to clearly discuss or implement measures to address the digital divide. By offering a comprehensive comparative analysis of attitudes and policies regarding Generative AI in higher education across these countries, this study enriches existing literature and provides policymakers with a global perspective, ensuring that policies in this domain promote inclusion rather than exclusion., Comment: 14 pages, 1 table
- Published
- 2024
26. A timing view of the additional high-energy spectral component discovered in the black hole candidate Swift J1727.8-1613
- Author
-
Yang, Zi-Xu, Zhang, Liang, Zhang, Shuang-Nan, Tao, L., Zhang, Shu, Ma, Ruican, Bu, Qingcui, Huang, Yue, Liu, He-Xin, Yu, Wei, Xiao, Guang C., Wang, Peng-Ju, Feng, Hua, Song, Li-Ming, Ma, Xiang, Ge, Mingyu, Zhao, QingChang, and Qu, J. L.
- Subjects
Astrophysics - High Energy Astrophysical Phenomena - Abstract
We present an energy-dependent analysis for the type-C quasi-periodic oscillations (QPOs) observed in the black hole X-ray binary Swift J1727.8-1613 using Insight-HXMT observations. We find that the QPO fractional rms at energies above 40 keV is significantly higher than that below 20 keV. This is the first report of a high energy (HE)-rms excess in the rms spectrum of a black hole X-ray binary. In the high energy band, an extra hard component is observed in addition to the standard thermal Comptonization component in a similar energy band. The value of the QPO HE-rms excess is not only correlated with the disk parameters and the photon index of the standard Comptonization component, but also exhibits a moderate positive correlation with the flux of the additional hard spectral component. No features in the QPO phase-lag spectra are seen corresponding to the additional hard component. We propose that the additional hard component in the spectrum may originate from jet emission and the associated QPO HE-rms excess can be explained by the precession of the jet base.
- Published
- 2024
27. Topological edge states in photonic Floquet insulator with unpaired Dirac cones
- Author
-
Zhong, Hua, Kartashov, Yaroslav V., Li, Yongdong, Li, Ming, and Zhang, Yiqi
- Subjects
Physics - Optics - Abstract
Topological insulators are most frequently constructed using lattices with specific degeneracies in their linear spectra, such as Dirac points. For a broad class of lattices, such as honeycomb ones, these points and associated Dirac cones generally appear in non-equivalent pairs. Simultaneous breakup of the time-reversal and inversion symmetry in systems based on such lattices may result in the formation of unpaired Dirac cones in the bulk spectrum, but the existence of topologically protected edge states in such structures remains an open problem. Here, a photonic Floquet insulator on a honeycomb lattice with unpaired Dirac cones in its spectrum is introduced that can support unidirectional edge states appearing at the edge between two regions with opposite sublattice detuning. Topological properties of this system are characterized by the nonzero valley Chern number. Remarkably, edge states in this system can circumvent sharp corners without inter-valley scattering even though there is no total forbidden gap in the spectrum. Our results reveal an unusual interplay between two different physical mechanisms of creation of topological edge states based on simultaneous breakup of different symmetries of the system., Comment: 9 pages, 7 figures. To appear in Photonics Research. Comments are welcome
- Published
- 2024
28. A versatile quantum microwave photonic signal processing platform based on coincidence window selection technique
- Author
-
Li, Xinghua, Guo, Yifan, Xiang, Xiao, Quan, Runai, Cao, Mingtao, Dong, Ruifang, Liu, Tao, Li, Ming, and Zhang, Shougang
- Subjects
Physics - Optics ,Quantum Physics - Abstract
Quantum microwave photonics (QMWP) is an innovative approach that combines energy-time entangled biphoton sources as the optical carrier with time-correlated single-photon detection for high-speed RF signal recovery. This groundbreaking method offers unique advantages such as nonlocal RF signal encoding and robust resistance to dispersion-induced frequency fading. This paper explores the versatility of processing the quantum microwave photonic signal by utilizing coincidence window selection on the biphoton coincidence distribution. The demonstration includes finely-tunable RF phase shifting, flexible multi-tap transversal filtering (with up to 15 taps), and photonically implemented RF mixing, leveraging the nonlocal RF mapping characteristic of QMWP. These accomplishments significantly enhance the capability of microwave photonic systems in processing ultra-weak signals, opening up new possibilities for various applications.
- Published
- 2024
29. Quantum microwave photonic mixer with a large spurious-free dynamic range
- Author
-
Li, Xinghua, Guo, Yifan, Xiang, Xiao, Quan, Runai, Cao, Mingtao, Dong, Ruifang, Liu, Tao, Li, Ming, and Zhang, Shougang
- Subjects
Physics - Optics ,Quantum Physics - Abstract
As one of the most fundamental functionalities of microwave photonics, microwave frequency mixing plays an essential role in modern radars and wireless communication systems. However, the commonly utilized intensity modulation in the systems often leads to inadequate spurious-free dynamic range (SFDR) for many sought-after applications. Quantum microwave photonics technique offers a promising solution for improving SFDR in terms of higher-order harmonic distortion. In this paper, we demonstrate two types of quantum microwave photonic mixers based on the configuration of the intensity modulators: cascade-type and parallel-type. Leveraging the nonlocal RF signal encoding capability, both types of quantum microwave photonic mixers not only exhibit the advantage of dual-channel output but also present significant improvement in SFDR. Specifically, the parallel-type quantum microwave photonic mixer achieves a remarkable SFDR value of 113.6 dB·Hz$^{1/2}$, which is 30 dB better than that of the cascade-type quantum microwave photonic mixer. When compared to the classical microwave photonic mixer, this enhancement reaches a notable 53.6 dB at the expense of 8 dB conversion loss. These results highlight the superiority of quantum microwave photonic mixers in the fields of microwave and millimeter-wave systems. Further applying multi-photon frequency entangled sources as optical carriers, the dual-channel microwave frequency conversion capability endowed by the quantum microwave photonic mixer can be extended to enhance the performance of multiple-paths microwave mixing which is essential for radar net systems.
- Published
- 2024
30. Bridging Smoothness and Approximation: Theoretical Insights into Over-Smoothing in Graph Neural Networks
- Author
-
Yang, Guangrui, Li, Jianfei, Li, Ming, Feng, Han, and Zhou, Ding-Xuan
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Mathematics - Functional Analysis - Abstract
In this paper, we explore the approximation theory of functions defined on graphs. Our study builds upon the approximation results derived from the $K$-functional. We establish a theoretical framework to assess the lower bounds of approximation for target functions using Graph Convolutional Networks (GCNs) and examine the over-smoothing phenomenon commonly observed in these networks. Initially, we introduce the concept of a $K$-functional on graphs, establishing its equivalence to the modulus of smoothness. We then analyze a typical type of GCN to demonstrate how the high-frequency energy of the output decays, an indicator of over-smoothing. This analysis provides theoretical insights into the nature of over-smoothing within GCNs. Furthermore, we establish a lower bound for the approximation of target functions by GCNs, which is governed by the modulus of smoothness of these functions. This finding offers a new perspective on the approximation capabilities of GCNs. In our numerical experiments, we analyze several widely applied GCNs and observe the phenomenon of energy decay. These observations corroborate our theoretical results on exponential decay order.
- Published
- 2024
31. Four Parallel Pathways in T4 Ligase-Catalyzed Repair of Nicked DNA with Diverse Bending Angles.
- Author
-
Li, Na, Ma, Jianbing, Fu, Hang, Yang, Zhiwei, Xu, Chunhua, Li, Haihong, Zhao, Yimin, Zhao, Yizhen, Chen, Shuyu, Gou, Lu, Zhang, Xinghua, Zhang, Shengli, Li, Ming, Hou, Ximiao, Zhang, Lei, and Lu, Ying
- Subjects
T4 DNA ligase ,conformational dynamics ,parallel enzymatic pathways ,protein machines ,single molecules ,DNA Ligases ,DNA ,DNA Repair ,Fluorescence Resonance Energy Transfer ,Nucleic Acid Conformation ,Bacteriophage T4 ,Microscopy ,Electron - Abstract
The structural diversity of biological macromolecules in different environments contributes complexity to enzymological processes vital for cellular functions. Fluorescence resonance energy transfer and electron microscopy are used to investigate the enzymatic reaction of T4 DNA ligase catalyzing the ligation of nicked DNA. The data show that both the ligase-AMP complex and the ligase-AMP-DNA complex can have four conformations. This finding suggests the parallel occurrence of four ligation reaction pathways, each characterized by specific conformations of the ligase-AMP complex that persist in the ligase-AMP-DNA complex. Notably, these complexes have DNA bending angles of ≈0°, 20°, 60°, or 100°. The mechanism of parallel reactions challenges the conventional notion of simple sequential reaction steps occurring among multiple conformations. The results provide insights into the dynamic conformational changes and the versatile attributes of T4 DNA ligase and suggest that the parallel multiple reaction pathways may correspond to diverse T4 DNA ligase functions. This mechanism may potentially have evolved as an adaptive strategy across evolutionary history to navigate complex environments.
- Published
- 2024
32. Approximate DCT and Quantization Techniques for Energy-Constrained Image Sensors
- Author
-
Li, Ming-Che, Ghosh, Archisman, and Sen, Shreyas
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing - Abstract
Recent expansions in multimedia devices gather enormous amounts of real-time images for processing and inference. The images are first compressed using compression schemes, like JPEG, to reduce storage costs and power for transmitting the captured data. Due to inherent error resilience and imperceptibility in images, JPEG can be approximated to reduce the required computation power and area. This work demonstrates the first end-to-end approximation computing-based optimization of JPEG hardware using i) an approximate division realized using bit-shift operators to reduce the complexity of the quantization block, ii) loop perforation, and iii) precision scaling on top of a multiplier-less fast DCT architecture to achieve an extremely energy-efficient JPEG compression unit which will be a perfect fit for power/bandwidth-limited scenarios. Furthermore, a gradient descent-based heuristic composed of two conventional approximation strategies, i.e., Precision Scaling and Loop Perforation, is implemented for tuning the degree of approximation to trade off energy consumption with the quality degradation of the decoded image. The entire RTL design is coded in Verilog HDL, synthesized, mapped to TSMC 65nm CMOS technology, and simulated using Cadence Spectre Simulator under 25 °C, TT corner. The approximate division approach achieved around 28% reduction in the active design area. The heuristic-based approximation technique combined with accelerator optimization achieves a significant energy reduction of 36% for a minimal image quality degradation of 2% SAD. Simulation results also show that the proposed architecture consumes 15 µW at the DCT and quantization stages to compress a colored 480p image at 6fps.
- Published
- 2024
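The idea of replacing the quantization divider with bit shifts, as mentioned in the abstract above, can be sketched in software. This is a minimal illustration under stated assumptions, not the paper's RTL design: each quantization step is rounded to the nearest power of two so that division becomes a right shift, and the function names and example coefficients are hypothetical.

```python
def nearest_power_of_two_shift(q):
    """Return s such that 2**s is the power of two closest to q (ties go to the smaller one)."""
    if q < 1:
        raise ValueError("quantization step must be a positive integer")
    low = q.bit_length() - 1  # floor(log2(q))
    distance_down = q - (1 << low)
    distance_up = (1 << (low + 1)) - q
    return low if distance_down <= distance_up else low + 1


def quantize_exact(coeff, q):
    """Reference quantizer: round-half-up division on the magnitude, sign restored afterwards."""
    sign = -1 if coeff < 0 else 1
    return sign * ((abs(coeff) + q // 2) // q)


def quantize_shift(coeff, q):
    """Approximate quantizer: divide by the nearest power of two using a right shift."""
    shift = nearest_power_of_two_shift(q)
    half = (1 << shift) >> 1  # rounding offset; 0 when shift == 0
    sign = -1 if coeff < 0 else 1
    return sign * ((abs(coeff) + half) >> shift)


# Hypothetical DCT coefficients and quantization steps, for illustration only.
exact_16 = quantize_exact(100, 16)    # step is already a power of two
approx_16 = quantize_shift(100, 16)
exact_10 = quantize_exact(100, 10)    # step 10 is approximated by 8
approx_10 = quantize_shift(100, 10)
```

When the step is already a power of two the two quantizers agree. For other steps the shift version trades accuracy for a much simpler datapath, which is the kind of quality-versus-energy trade-off the abstract describes.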
33. MIMO-OFDM ISAC Waveform Design for Range-Doppler Sidelobe Suppression
- Author
-
Li, Peishi, Li, Ming, Liu, Rang, Liu, Qian, and Swindlehurst, A. Lee
- Subjects
Electrical Engineering and Systems Science - Signal Processing - Abstract
Integrated sensing and communication (ISAC) is a key enabling technique for future wireless networks owing to its efficient hardware and spectrum utilization. In this paper, we focus on dual-functional waveform design for a multi-input multi-output (MIMO) orthogonal frequency division multiplexing (OFDM) ISAC system, which is considered to be a promising solution for practical deployment. Since the dual-functional waveform carries communication information, its random nature leads to high range-Doppler sidelobes in the ambiguity function, which in turn degrades radar sensing performance. To suppress range-Doppler sidelobes, we propose a novel symbol-level precoding (SLP) based waveform design for MIMO-OFDM ISAC systems by fully exploiting the temporal degrees of freedom (DoFs). Our goal is to minimize the range-Doppler integrated sidelobe level (ISL) while satisfying the constraints of target illumination power, multi-user communication quality of service (QoS), and constant-modulus transmission. To solve the resulting non-convex waveform design problem, we develop an efficient algorithm using the majorization-minimization (MM) and alternating direction method of multipliers (ADMM) methods. Simulation results show that the proposed waveform has significantly reduced range-Doppler sidelobes compared with signals designed only for communications and other baselines. In addition, the proposed waveform design achieves target detection and estimation performance close to that achievable by waveforms designed only for radar, which demonstrates the superiority of the proposed SLP-based ISAC approach., Comment: 13 pages, 9 figures, submitted to IEEE TWC
- Published
- 2024
34. Understanding is Compression
- Author
-
Li, Ziguang, Huang, Chao, Wang, Xuliang, Hu, Haibo, Wyeth, Cole, Bu, Dongbo, Yu, Quan, Gao, Wen, Liu, Xingwu, and Li, Ming
- Subjects
Computer Science - Information Theory ,Computer Science - Artificial Intelligence - Abstract
Modern data compression methods are slowly reaching their limits after 80 years of research, millions of papers, and a wide range of applications. Yet, the extravagant 6G communication speed requirement raises a major open question for revolutionary new ideas of data compression. We have previously shown that all understanding or learning is compression, under reasonable assumptions. Large language models (LLMs) understand data better than ever before. Can they help us to compress data? The LLMs may be seen to approximate the uncomputable Solomonoff induction. Therefore, under this new uncomputable paradigm, we present LMCompress. LMCompress shatters all previous lossless compression algorithms, doubling the lossless compression ratios of JPEG-XL for images, FLAC for audio, and H.264 for video, and quadrupling the compression ratio of bz2 for text. The better a large model understands the data, the better LMCompress compresses.
- Published
- 2024
35. Feature-prompting GBMSeg: One-Shot Reference Guided Training-Free Prompt Engineering for Glomerular Basement Membrane Segmentation
- Author
-
Liu, Xueyu, Shi, Guangze, Wang, Rui, Lai, Yexin, Zhang, Jianan, Sun, Lele, Yang, Quan, Wu, Yongfei, Li, Ming, Han, Weixia, and Zheng, Wen
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Assessment of the glomerular basement membrane (GBM) in transmission electron microscopy (TEM) is crucial for diagnosing chronic kidney disease (CKD). The lack of domain-independent automatic segmentation tools for the GBM necessitates an AI-based solution to automate the process. In this study, we introduce GBMSeg, a training-free framework designed to automatically segment the GBM in TEM images guided only by a one-shot annotated reference. Specifically, GBMSeg first exploits the robust feature matching capabilities of the pretrained foundation model to generate initial prompt points, then introduces a series of novel automatic prompt engineering techniques across the feature and physical space to optimize the prompt scheme. Finally, GBMSeg employs a class-agnostic foundation segmentation model with the generated prompt scheme to obtain accurate segmentation results. Experimental results on our collected 2538 TEM images confirm that GBMSeg achieves superior segmentation performance with a Dice similarity coefficient (DSC) of 87.27% using only one labeled reference image in a training-free manner, outperforming recently proposed one-shot or few-shot methods. In summary, GBMSeg introduces a distinctive automatic prompt framework that facilitates robust domain-independent segmentation performance without training, particularly advancing the automatic prompting of foundation segmentation models for medical images. Future work involves automating the thickness measurement of segmented GBM and quantifying pathological indicators, holding significant potential for advancing pathology assessments in clinical applications. The source code is available on https://github.com/SnowRain510/GBMSeg, Comment: Accepted for MICCAI2024
- Published
- 2024
36. RuleR: Improving LLM Controllability by Rule-based Data Recycling
- Author
-
Li, Ming, Chen, Han, Wang, Chenguang, Nguyen, Dang, Li, Dianqi, and Zhou, Tianyi
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Large language models (LLMs) still lack delicate controllability over their responses, which is critical to enhancing their performance and the user experience. However, curating supervised fine-tuning (SFT) datasets to improve LLM controllability usually relies on human experts or proprietary LLMs, which requires additional costs. To bridge this gap, we propose Rule-based Data Recycling (RuleR), a data augmentation method incorporating multiple constraints into the original data samples according to predefined rules, which creates new training tasks to consolidate the controllability of LLMs. Instead of creating new data from scratch, RuleR "recycles" existing data by simply applying rule-based edits to their responses and appending the rule-instructions in their original instructions. Experimental results demonstrate RuleR's effectiveness in improving LLM controllability while maintaining general instruction-following capabilities. The code will be released on https://github.com/MingLiiii/RuleR.
- Published
- 2024
37. Ranking LLMs by compression
- Author
-
Guo, Peijia, Li, Ziguang, Hu, Haibo, Huang, Chao, Li, Ming, and Zhang, Rui
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
We conceptualize the process of understanding as information compression, and propose a method for ranking large language models (LLMs) based on lossless data compression. We demonstrate the equivalence of compression length under arithmetic coding with cumulative negative log probabilities when using a large language model as a prior, that is, the pre-training phase of the model is essentially the process of learning the optimal coding length. At the same time, the evaluation metric compression ratio can be obtained without actual compression, which greatly saves overhead. In this paper, we use five large language models as priors for compression, then compare their performance on challenging natural language processing tasks, including sentence completion, question answering, and coreference resolution. Experimental results show that compression ratio and model performance are positively correlated, so it can be used as a general metric to evaluate large language models., Comment: 7 pages, 4 tables
- Published
- 2024
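The equivalence stated in the abstract above, between arithmetic-coding length and cumulative negative log probability, means a compression ratio can be estimated from a model's per-token probabilities without running a compressor. The sketch below is illustrative only: the probability values are made up, the function names are hypothetical, and the ratio is defined here as raw bits over coded bits, which may differ from the paper's convention.

```python
import math


def ideal_code_length_bits(token_probs):
    """Sum of -log2(p) over the tokens.

    This is the length an arithmetic coder approaches, up to a small
    constant overhead, when driven by these probabilities.
    """
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("each probability must lie in (0, 1]")
    return sum(-math.log2(p) for p in token_probs)


def compression_ratio(token_probs, raw_bits):
    """Raw size divided by ideal coded size (assumed convention: larger is better)."""
    return raw_bits / ideal_code_length_bits(token_probs)


# Hypothetical probabilities a language model assigned to the observed tokens.
probs = [0.5, 0.25, 0.25, 0.5]
coded_bits = ideal_code_length_bits(probs)          # 1 + 2 + 2 + 1 bits
ratio = compression_ratio(probs, raw_bits=4 * 8)    # four bytes of raw text
```

A model that assigns higher probability to the observed tokens yields a shorter ideal code length and therefore a higher ratio under this convention.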
38. PFID: Privacy First Inference Delegation Framework for LLMs
- Author
-
Yang, Haoyan, Li, Zhitao, Zhang, Yong, Wang, Jianzong, Cheng, Ning, Li, Ming, and Xiao, Jing
- Subjects
Computer Science - Computation and Language - Abstract
This paper introduces a novel privacy-preservation framework for LLMs, named PFID, that addresses critical privacy concerns by localizing user data through model sharding and singular value decomposition. When users interact with LLM systems, their prompts may be exposed to eavesdroppers, inside or outside the LLM system provider, who are interested in collecting users' input. In this work, we propose a framework to camouflage user input so as to alleviate privacy issues. Our framework places model shards on the client and the public server, and sends compressed hidden states instead of prompts to and from servers. Clients hold back information that can re-privatize the hidden states, so that overall system performance is comparable to that of traditional LLM services. Our framework is designed to be communication-efficient: computation can be delegated to the local client so that the server's computation burden is lightened. We conduct extensive experiments on machine translation tasks to verify our framework's performance., Comment: Submitted to EMNLP2024
- Published
- 2024
39. Helicity Evolution at Small $x$: Quark to Gluon and Gluon to Quark Transition Operators
- Author
-
Borden, Jeremy, Kovchegov, Yuri V., and Li, Ming
- Subjects
High Energy Physics - Phenomenology ,Nuclear Experiment ,Nuclear Theory - Abstract
We include the quark to gluon and gluon to quark shock-wave transition operators into the small Bjorken-$x$ evolution equations for helicity in the flavor-singlet channel derived earlier. While such transitions do not affect the large-$N_c$ version of the evolution equations for helicity, the large-$N_c \& N_f$ equations are affected. ($N_c$ and $N_f$ are the numbers of quark colors and flavors, respectively.) We derive the corresponding corrected large-$N_c \& N_f$ equations for the polarized dipole amplitudes contributing to the flavor-singlet quark and gluon helicity distributions in the double-logarithmic approximation (DLA), resumming powers of $\alpha_s \, \ln^2 (1/x)$ with $\alpha_s$ the strong coupling constant. We solve these equations iteratively and extract the polarized splitting functions up to four loops. We show that our splitting functions agree with the fixed-order perturbative calculations up to and including the existing three-loops results. Similar to the large-$N_c$ helicity evolution in the shock-wave approach, our large-$N_c \& N_f$ small-$x$ splitting functions agree with those obtained in the infrared evolution equations framework up to three loops, but appear to slightly disagree at four loops., Comment: 38 pages, 7 figures
- Published
- 2024
40. Enhancing Voice Wake-Up for Dysarthria: Mandarin Dysarthria Speech Corpus Release and Customized System Design
- Author
-
Gao, Ming, Chen, Hang, Du, Jun, Xu, Xin, Guo, Hongxiao, Bu, Hui, Yang, Jianxing, Li, Ming, and Lee, Chin-Hui
- Subjects
Computer Science - Computation and Language - Abstract
Smart home technology has gained widespread adoption, facilitating effortless control of devices through voice commands. However, individuals with dysarthria, a motor speech disorder, face challenges due to the variability of their speech. This paper addresses the wake-up word spotting (WWS) task for dysarthric individuals, aiming to integrate them into real-world applications. To support this, we release the open-source Mandarin Dysarthria Speech Corpus (MDSC), a dataset designed for dysarthric individuals in home environments. MDSC encompasses information on age, gender, disease types, and intelligibility evaluations. Furthermore, we perform comprehensive experimental analysis on MDSC, highlighting the challenges encountered. We also develop a customized dysarthria WWS system that showcases robustness in handling intelligibility and achieving exceptional performance. MDSC will be released on https://www.aishelltech.com/AISHELL_6B., Comment: to be published in Interspeech 2024
- Published
- 2024
41. AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection
- Author
-
Gong, Rong, Xue, Hongfei, Wang, Lezhi, Xu, Xin, Li, Qisheng, Xie, Lei, Bu, Hui, Wu, Shaomei, Zhou, Jiaming, Qin, Yong, Zhang, Binbin, Du, Jun, Bin, Jia, and Li, Ming
- Subjects
Computer Science - Sound ,Computer Science - Artificial Intelligence ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied to atypical speech, such as stuttering. This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset, which stands out as the largest dataset in its category. Encompassing conversational and voice command reading speech, AS-70 includes verbatim manual transcription, rendering it suitable for various speech-related tasks. Furthermore, baseline systems are established, and experimental results are presented for ASR and stuttering event detection (SED) tasks. By incorporating this dataset into the model fine-tuning, significant improvements in the state-of-the-art ASR models, e.g., Whisper and Hubert, are observed, enhancing their inclusivity in addressing stuttered speech., Comment: Accepted by Interspeech 2024
- Published
- 2024
42. The Database and Benchmark for Source Speaker Verification Against Voice Conversion
- Author
-
Li, Ze, Lin, Yuke, Yao, Tian, Suo, Hongbin, and Li, Ming
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Voice conversion systems can transform audio to mimic another speaker's voice, thereby attacking speaker verification systems. However, ongoing studies on source speaker verification are hindered by limited data availability and methodological constraints. In this paper, we generate a large-scale converted speech database and train a batch of baseline systems based on the MFA-Conformer architecture to promote the source speaker verification task. In addition, we introduce a related task called conversion method recognition. An adapter-based multi-task learning approach is employed to achieve effective conversion method recognition without compromising source speaker verification performance. Additionally, we investigate and effectively address the open-set conversion method recognition problem through the implementation of an open-set nearest neighbor approach.
- Published
- 2024
43. The Broadband X-ray Spectral Properties during the Rising Phases of the Outburst of the New Black Hole X-ray Binary Candidate Swift J1727.8-1613
- Author
-
Liu, He-Xin, Xu, Yan-Jun, Zhang, Shuang-Nan, Yu, Wei, Huang, Yue, Tao, Lian, Zhang, Liang, Yang, Zi-Xu, Zhao, Qing-Chang, Qu, Jin-Lu, and Song, Li-Ming
- Subjects
Astrophysics - High Energy Astrophysical Phenomena - Abstract
We report data analysis results about the outburst evolution and spectral properties during the hard state of the recently discovered X-ray transient Swift J1727.8-1613 as observed by Insight-HXMT and NuSTAR. We find that the broadband X-ray spectrum of Swift J1727.8-1613 is more complex than the most typical spectral patterns of black hole X-ray binary systems, with not only a comparatively weaker reflection component but also an additional spectral continuum component, manifesting itself as a hard X-ray tail beyond the thermal Comptonization description detectable below 100 keV. This additional component can be phenomenologically well fitted by adding an extra power-law model with high energy exponential cutoff in the 2-120 keV energy band. We made an attempt to explain the broadband X-ray spectral continuum with a thermal/non-thermal hybrid plasma corona scenario, and find an ultra high compactness parameter ($l_{\rm s}\sim2000$) and a steep non-thermal electron distribution ($\Gamma_{\rm inj}>4$), suggesting the source was accreting with high Eddington rates and that the electron acceleration mechanism is not very efficient. We also present a detailed multi-epoch analysis of spectral properties using Insight-HXMT data to investigate the evolution of the key physical properties regarding the disk and corona during the hard states. No significant variation is found with the inner disk radius and the coronal temperature during this time period, and the weak reflection and hard X-ray tail features are persistent. We discuss the physical implications of our spectral analysis results in the context of disk-corona relation, particle acceleration, and jet contribution, during the rise of a black hole X-ray binary in outburst., Comment: 16 pages, 6 figures
- Published
- 2024
44. Multipath Exploitation for Fluctuating Target Detection in RIS-Assisted ISAC Systems
- Author
-
Zhang, Shoushuo, Xiao, Zichao, Liu, Rang, Li, Ming, Wang, Wei, and Liu, Qian
- Subjects
Electrical Engineering and Systems Science - Signal Processing - Abstract
Integrated sensing and communication (ISAC) systems are typically deployed in multipath environments, which are usually regarded as a challenge for wireless communications. However, the multipath propagation can also provide extra illumination and observation perspectives for radar sensing, which offers spatial diversity gain for detecting targets with spatial radar cross-section (RCS) fluctuations. In this letter, we propose to utilize reconfigurable intelligent surfaces (RIS) in ISAC systems to provide high-quality and controllable multipath propagation for improving the performance of fluctuating target detection and simultaneously enhancing the quality of communication services. To effectively exploit the spatial diversity offered by RIS-empowered multipath, the dual-functional transmit beamforming and the RIS reflection beamforming are jointly designed to maximize the expectation of radar signal-to-noise ratio (SNR). To solve the resulting complex non-convex optimization problem, we develop an efficient alternating optimization algorithm that utilizes majorization-minimization (MM) and alternating direction method of multipliers (ADMM) algorithms. Simulation results illustrate the advantages of multipath exploitation and the proposed beamforming design algorithm for fluctuating target detection in RIS-assisted ISAC systems., Comment: submitted to IEEE WCL
- Published
- 2024
45. Self-locked broadband Raman-electro-optic microcomb
- Author
-
Wan, Shuai, Wang, Pi-Yu, Li, Ming, Ma, Rui, Niu, Rui, Sun, Fang-Wen, Bo, Fang, Guo, Guang-Can, and Dong, Chun-Hua
- Subjects
Physics - Optics - Abstract
Optical frequency combs (OFCs), composed of equally spaced frequency tones, have spurred advancements in communications, spectroscopy, precision measurement and fundamental physics research. A prevalent method for generating OFCs involves the electro-optic (EO) effect, i.e., EO comb, renowned for its rapid tunability via precise microwave field control. Recent advances in integrated lithium niobate (LN) photonics have greatly enhanced the efficiency of EO effect, enabling the generation of broadband combs with reduced microwave power. However, parasitic nonlinear effects, such as Raman scattering and four-wave mixing, often emerge in high quality nonlinear devices, impeding the expansion of comb bandwidth and the minimization of frequency noise. Here, we tame these nonlinear effects and present a novel type of OFC, i.e., the self-locked Raman-electro-optic (REO) microcomb by leveraging the collaboration of EO, Kerr and Raman scattering processes. The spectral width of the REO microcomb benefits from the Raman gain and Kerr effect, encompassing nearly 1400 comb lines spanning over 300 nm with a fine repetition rate of 26.03 GHz, much larger than the pure EO combs. Remarkably, the system can maintain a self-locked low-noise state in the presence of multiple nonlinearities without the need for external active feedback. Our approach points to a direction for improving the performance of microcombs and paves the way for exploring new nonlinear physics, such as new laser locking techniques, through the collaboration of inevitable multiple nonlinear effects in integrated photonics.
- Published
- 2024
46. Large Brain Model for Learning Generic Representations with Tremendous EEG Data in BCI
- Author
-
Jiang, Wei-Bang, Zhao, Li-Ming, and Lu, Bao-Liang
- Subjects
Computer Science - Machine Learning - Abstract
The current electroencephalogram (EEG) based deep learning models are typically designed for specific datasets and applications in brain-computer interaction (BCI), limiting the scale of the models and thus diminishing their perceptual capabilities and generalizability. Recently, Large Language Models (LLMs) have achieved unprecedented success in text processing, prompting us to explore the capabilities of Large EEG Models (LEMs). We hope that LEMs can break through the limitations of different task types of EEG datasets, and obtain universal perceptual capabilities of EEG signals through unsupervised pre-training. Then the models can be fine-tuned for different downstream tasks. However, compared to text data, the volume of EEG datasets is generally small and the format varies widely. For example, there can be mismatched numbers of electrodes, unequal length data samples, varied task designs, and low signal-to-noise ratio. To overcome these challenges, we propose a unified foundation model for EEG called Large Brain Model (LaBraM). LaBraM enables cross-dataset learning by segmenting the EEG signals into EEG channel patches. Vector-quantized neural spectrum prediction is used to train a semantically rich neural tokenizer that encodes continuous raw EEG channel patches into compact neural codes. We then pre-train neural Transformers by predicting the original neural codes for the masked EEG channel patches. The LaBraMs were pre-trained on about 2,500 hours of various types of EEG signals from around 20 datasets and validated on multiple different types of downstream tasks. Experiments on abnormal detection, event type classification, emotion recognition, and gait prediction show that our LaBraM outperforms all compared SOTA methods in their respective fields. Our code is available at https://github.com/935963004/LaBraM., Comment: The Twelfth International Conference on Learning Representations
- Published
- 2024
47. DINO-SD: Champion Solution for ICRA 2024 RoboDepth Challenge
- Author
-
Mao, Yifan, Li, Ming, Liu, Jian, Liu, Jiayang, Qin, Zihan, Chu, Chunxi, Xu, Jialei, Zhao, Wenbo, Jiang, Junjun, and Liu, Xianming
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Robotics - Abstract
Surround-view depth estimation is a crucial task that aims to acquire depth maps of the surrounding views. It has many applications in real-world scenarios such as autonomous driving, AR/VR, and 3D reconstruction. However, because most of the data in autonomous driving datasets is collected in daytime scenarios, depth models perform poorly on out-of-distribution (OoD) data. While some works try to improve the robustness of depth models under OoD data, these methods either require additional training data or lack generalizability. In this report, we introduce DINO-SD, a novel surround-view depth estimation model. DINO-SD does not need additional data and has strong robustness. DINO-SD achieved the best performance in track 4 of the ICRA 2024 RoboDepth Challenge., Comment: Outstanding Champion in the RoboDepth Challenge (ICRA24) https://robodrive-24.github.io/
- Published
- 2024
48. Accurate Measurement of the Lensing Magnification by BOSS CMASS Galaxies and Its Implications for Cosmology and Dark Matter
- Author
-
Xu, Kun, Jing, Y. P., Gao, Hongyu, Luo, Xiaolin, and Li, Ming
- Subjects
Astrophysics - Cosmology and Nongalactic Astrophysics ,Astrophysics - Astrophysics of Galaxies - Abstract
Magnification serves as an independent and complementary gravitational lensing measurement to shear. We develop a novel method to achieve an accurate and robust magnification measurement around BOSS CMASS galaxies across physical scales of $0.016h^{-1}{\rm Mpc} < r_{\rm p} < 10h^{-1}{\rm Mpc}$. We first measure the excess total flux density $\delta M$ of the source galaxies in the deep DECaLS photometric catalog that are lensed by CMASS galaxies. We convert $\delta M$ to magnification $\mu$ by establishing the $\delta \mu-\delta M$ relation using a deeper photometric sample. By comparing magnification measurements in three optical bands ($grz$), we constrain the dust attenuation curve and its radial distribution, discovering a steep attenuation curve in the circumgalactic medium of CMASS galaxies. We further compare dust-corrected magnification measurements to model predictions from high-resolution dark matter-only (DMO) simulations in WMAP and Planck cosmologies, as well as the hydrodynamic simulation \texttt{TNG300-1}, using precise galaxy-halo connections from the Photometric objects Around Cosmic webs method and the accurate ray-tracing algorithm \texttt{P3MLens}. For $r_{\rm p} > 70h^{-1}$ kpc, our magnification measurements are in good agreement with both WMAP and Planck cosmologies. However, at $r_{\rm p} < 70h^{-1}$ kpc, we observe an excess magnification signal, which is higher than the DMO model in Planck cosmology at $2.8\sigma$ and would be exacerbated if significant baryon feedback is included. Implications of the potential small-scale discrepancy for the nature of dark matter and for the processes governing galaxy formation are discussed., Comment: 25 pages, 19 figures. Main results in Figure 9 (dust) and Figure 18 (matter). Accepted for publication in ApJ
- Published
- 2024
49. Scaling Laws for Discriminative Classification in Large Language Models
- Author
-
Wyatte, Dean, Tahmasbi, Fatemeh, Li, Ming, and Markovich, Thomas
- Subjects
Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
Modern large language models (LLMs) represent a paradigm shift in what can plausibly be expected of machine learning models. The fact that LLMs can effectively generate sensible answers to a diverse range of queries suggests that they would be useful in customer support applications. While powerful, LLMs have been observed to be prone to hallucination, which unfortunately makes their near-term use in customer support applications challenging. To address this issue, we present a system that allows us to use an LLM to augment our customer support advocates by re-framing the language modeling task as a discriminative classification task. In this framing, we seek to present the top-K best template responses for a customer support advocate to use when responding to a customer. We present the results of both offline and online experiments, where we observed offline gains and statistically significant online lifts for our experimental system. Along the way, we present observed scaling curves for validation loss and top-K accuracy, resulting from model parameter ablation studies. We close by discussing the space of trade-offs with respect to model size, latency, and accuracy, as well as suggesting future applications to explore.
- Published
- 2024
50. HC-GAE: The Hierarchical Cluster-based Graph Auto-Encoder for Graph Representation Learning
- Author
-
Xu, Zhuo, Bai, Lu, Cui, Lixin, Li, Ming, Wang, Yue, and Hancock, Edwin R.
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Graph Auto-Encoders (GAEs) are powerful tools for graph representation learning. In this paper, we develop a novel Hierarchical Cluster-based GAE (HC-GAE) that can learn effective structural characteristics for graph data analysis. To this end, during the encoding process, we commence by utilizing the hard node assignment to decompose a sample graph into a family of separated subgraphs. We compress each subgraph into a coarsened node, transforming the original graph into a coarsened graph. On the other hand, during the decoding process, we adopt the soft node assignment to reconstruct the original graph structure by expanding the coarsened nodes. By hierarchically performing the above compressing procedure during the encoding process as well as the expanding procedure during the decoding process, the proposed HC-GAE can effectively extract bidirectionally hierarchical structural features of the original sample graph. Furthermore, we re-design the loss function so that it can integrate the information from either the encoder or the decoder. Since the associated graph convolution operation of the proposed HC-GAE is restricted to each individual separated subgraph and cannot propagate the node information between different subgraphs, the proposed HC-GAE can significantly reduce the over-smoothing problem arising in the classical convolution-based GAEs. The proposed HC-GAE can generate effective representations for either node classification or graph classification, and the experiments demonstrate the effectiveness on real-world datasets.
- Published
- 2024