Author: "Li, Ming" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Li, Ming"' showing total 103,426 results

1. Landscape Architecture Chairs’ Retrospect and Prospect of Academic Leadership Disrupted by COVID-19

Author: Li, Ming-Han, Artunç, Sadik, Clements, Terry, and Allen, Diane Jones
Published: 2023

2. Antecedents of Spot and Contract Freight Mix in the Truckload Sector

Author: Li, Ming, Bolumole, Yemisi A., and Miller, Jason W.
Published: 2022

3. A Dual-Path Framework with Frequency-and-Time Excited Network for Anomalous Sound Detection

Author: Zhang, Yucong, Liu, Juan, Tian, Yao, Liu, Haifeng, and Li, Ming
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In contrast to human speech, machine-generated sounds of the same type often exhibit consistent frequency characteristics and discernible temporal periodicity. However, leveraging these dual attributes in anomaly detection remains relatively under-explored. In this paper, we propose an automated dual-path framework that learns prominent frequency and temporal patterns for diverse machine types. One pathway uses a novel Frequency-and-Time Excited Network (FTE-Net) to learn the salient features across frequency and time axes of the spectrogram. It incorporates a Frequency-and-Time Chunkwise Encoder (FTC-Encoder) and an excitation network. The other pathway uses a 1D convolutional network for utterance-level spectrum. Experimental results on the DCASE 2023 task 2 dataset show the state-of-the-art performance of our proposed method. Moreover, visualizations of the intermediate feature maps in the excitation network are provided to illustrate the effectiveness of our method., Comment: This Paper has been accepted to ICASSP 2024
Published: 2024

4. Multimodal Laryngoscopic Video Analysis for Assisted Diagnosis of Vocal Cord Paralysis

Author: Zhang, Yucong, Zou, Xin, Yang, Jinshan, Chen, Wenjun, Liang, Faya, and Li, Ming
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: This paper presents the Multimodal Analyzing System for Laryngoscope (MASL), a system that combines audio and video data to automatically extract key segments and metrics from laryngeal videostroboscopic videos for clinical assessment. MASL integrates glottis detection with keyword spotting to analyze patient vocalizations and refine video highlights for better inspection of vocal cord movements. The system includes a strobing video extraction module that identifies frames by analyzing hue, saturation, and value fluctuations. MASL also provides effective metrics for vocal cord paralysis detection, employing a two-stage glottis segmentation process using U-Net followed by diffusion-based refinement to reduce false positives. Instead of glottal area waveforms, MASL estimates anterior glottic angle waveforms (AGAW) from glottis masks, evaluating both left and right vocal cords to detect unilateral vocal cord paralysis (UVFP). By comparing AGAW variances, MASL distinguishes between left and right paralysis. Ablation studies and experiments on public and real-world datasets validate MASL's segmentation module and demonstrate its ability to provide reliable metrics for UVFP diagnosis.
Published: 2024

5. Dynamic Hybrid Beamforming Designs for ELAA Near-Field Communications

Author: Liu, Mengzhen, Li, Ming, Liu, Rang, and Liu, Qian
Subjects: Electrical Engineering and Systems Science - Signal Processing
Abstract: Extremely large-scale antenna array (ELAA) is a key candidate technology for the sixth generation (6G) mobile networks. Nevertheless, using substantial numbers of antennas to transmit high-frequency signals in ELAA systems significantly exacerbates the near-field effect. Unfortunately, traditional hybrid beamforming schemes are highly vulnerable to ELAA near-field communications. To effectively mitigate severe near-field effect, we propose a novel dynamic hybrid beamforming architecture for ELAA systems, in which each antenna is either adaptively connected to one radio frequency (RF) chain for signal transmission or deactivated for power saving. For the case that instantaneous channel state information (CSI) is available during each channel coherence time, a real-time dynamic hybrid beamforming design is developed to maximize the achievable sum rate under the constraints of the constant modulus of phase-shifters (PSs), non-overlapping dynamic connection network and total transmit power. When instantaneous CSI cannot be easily obtained in real-time, we propose a two-timescale dynamic hybrid beamforming design, which optimizes analog beamformer in long-timescale and digital beamformer in short-timescale, with the goal of maximizing ergodic sum-rate under the same constraints. Simulation results demonstrate the advantages of the proposed dynamic hybrid beamforming architecture and the effectiveness of the developed algorithms for ELAA near-field communications., Comment: 14 pages, 10 figures
Published: 2024

6. Classification of spin-$1/2$ fermionic quantum spin liquids on the trillium lattice

Author: Li, Ming-Hao, Biswas, Sounak, and Parameswaran, S. A.
Subjects: Condensed Matter - Strongly Correlated Electrons
Abstract: We study fermionic quantum spin liquids (QSLs) on the three-dimensional trillium lattice of corner-sharing triangles. We are motivated by recent experimental and theoretical investigations that have explored various classical and quantum spin liquid states on similar networks of triangular motifs with strong geometric frustration. Using the framework of Projective Symmetry Groups (PSG), we obtain a classification of all symmetric $\mathsf{Z}_2$ and $\mathsf{U}(1)$ QSLs on the trillium lattice. We find 2 $\mathsf{Z}_2$ spin-liquids, and a single $\mathsf{U}(1)$ spin-liquid which is proximate to one of the $\mathsf{Z}_2$ states. The small number of solutions reflects the constraints imposed by the two non-symmorphic symmetries in the space group of trillium. Using self-consistency conditions of the mean-field equations, we obtain the spinon band-structure and spin structure factors corresponding to these states. All three of our spin liquids are gapless at their saddle points: the $\mathsf{Z}_2$ QSLs are both nodal, while the $\mathsf{U}(1)$ case hosts a spinon Fermi surface. One of our $\mathsf{Z}_2$ spin liquids hosts a stable gapless nodal star that is protected by projective symmetries against additions of further neighbour terms in the mean field ansatz. We comment on directions for further work.
Published: 2024

7. USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction

Author: Zeng, Bang and Li, Ming
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: Target speaker extraction aims to isolate the voice of a specific speaker from mixed speech. Traditionally, this process has relied on extracting a speaker embedding from a reference speech, necessitating a speaker recognition model. However, identifying an appropriate speaker recognition model can be challenging, and using the target speaker embedding as reference information may not be optimal for target speaker extraction tasks. This paper introduces a Universal Speaker Embedding-Free Target Speaker Extraction (USEF-TSE) framework that operates without relying on speaker embeddings. USEF-TSE utilizes a multi-head cross-attention mechanism as a frame-level target speaker feature extractor. This innovative approach allows mainstream speaker extraction solutions to bypass the dependency on speaker recognition models and to fully leverage the information available in the enrollment speech, including speaker characteristics and contextual details. Additionally, USEF-TSE can seamlessly integrate with any time-domain or time-frequency domain speech separation model to achieve effective speaker extraction. Experimental results show that our proposed method achieves state-of-the-art (SOTA) performance in terms of Scale-Invariant Signal-to-Distortion Ratio (SI-SDR) on the WSJ0-2mix, WHAM!, and WHAMR! datasets, which are standard benchmarks for monaural anechoic, noisy and noisy-reverberant two-speaker speech separation and speaker extraction., Comment: 13 pages, 6 figures
Published: 2024

8. Integrated photonic nonreciprocal devices based on susceptibility-programmable medium

Author: Zhang, Yan-Lei, Li, Ming, Xu, Xin-Biao, Wang, Zhu-Bo, Dong, Chun-Hua, Guo, Guang-Can, Zou, Chang-Ling, and Zou, Xu-Bo
Subjects: Physics - Optics
Abstract: The switching and control of optical fields based on nonlinear optical effects are often limited to relatively weak nonlinear susceptibility and strong optical pump fields. Here, an optical medium with programmable susceptibility tensor based on polarizable atoms is proposed. Under a structured optical pump, the ground state population of atoms could be efficiently controlled by tuning the chirality and intensity of optical fields, and thus the optical response of the medium is programmable in both space and time. We demonstrate the potential of this approach by engineering the spatial distribution of the complex susceptibility tensor of the medium in photonic structures to realize nonreciprocal optical effects. Specifically, we investigate the advantages of chiral interaction between atoms and photons in an atom-cladded waveguide, theoretically showing that reconfigurable, strong, and rapidly switchable isolation of optical signals in a selected optical mode is possible. The susceptibility-programmable medium provides a promising way to efficiently control the optical field, opening up a wide range of applications for integrated photonic devices and structured optics., Comment: 7 pages, 4 figures
Published: 2024

9. One-Index Vector Quantization Based Adversarial Attack on Image Classification

Author: Fan, Haiju, Qin, Xiaona, Chen, Shuang, Shum, Hubert P. H., and Li, Ming
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Cryptography and Security, Computer Science - Machine Learning
Abstract: To improve storage and transmission, images are generally compressed. Vector quantization (VQ) is a popular compression method as it has a high compression ratio that surpasses other compression techniques. Despite this, existing adversarial attack methods on image classification are mostly performed in the pixel domain with few exceptions in the compressed domain, making them less applicable in real-world scenarios. In this paper, we propose a novel one-index attack method in the VQ domain to generate adversarial images by a differential evolution algorithm, successfully resulting in image misclassification in victim models. The one-index attack method modifies a single index in the compressed data stream so that the decompressed image is misclassified. It only needs to modify a single VQ index to realize an attack, which limits the number of perturbed indexes. The proposed method belongs to a semi-black-box attack, which is more in line with the actual attack scenario. We apply our method to attack three popular image classification models, i.e., Resnet, NIN, and VGG16. On average, 55.9% and 77.4% of the images in CIFAR-10 and Fashion MNIST, respectively, are successfully attacked, with a high level of misclassification confidence and a low level of image perturbation.
Published: 2024

10. Kasner eons in Lovelock black holes

Author: Bueno, Pablo, Cano, Pablo A., Hennigar, Robie A., and Li, Ming-Da
Subjects: High Energy Physics - Theory, General Relativity and Quantum Cosmology
Abstract: In the vicinity of space-like singularities, general relativity predicts that the metric behaves, at each point, as a Kasner space which undergoes a series of "Kasner epochs" and "eras" characterized by certain transition rules. The period during which this process takes place defines a "Kasner eon", which comes to an end when higher-curvature or quantum effects become relevant. When higher-curvature densities are included in the action, spacetime can undergo transitions into additional Kasner eons. During each eon, the metric behaves locally as a Kasner solution to the higher-curvature density controlling the dynamics. In this paper we identify the presence of Kasner eons in the interior of static and spherically symmetric Lovelock gravity black holes. We determine the conditions under which eons occur and study the Kasner metrics which characterize them, as well as the transitions between them. We show that the null energy condition implies a monotonicity property for the effective Kasner exponent at the end of the Einsteinian eon. We also characterize the Kasner solutions of more general higher-curvature theories of gravity. In particular, we observe that the Einstein gravity condition that the sum of the Kasner exponents adds up to one, $\sum_{i=1}^{D-1}p_i=1$, admits a universal generalization in the form of a family of Kasner metrics satisfying $\sum_{i=1}^{D-1}p_i=2n -1$ which exists for any order-$n$ higher-curvature density and in general dimensions., Comment: 27 pages, 5 figures
Published: 2024

11. Dynamic compensation for pump-induced frequency shift in Kerr-cat qubit initialization

Author: Xu, Yifang, Hua, Ziyue, Wang, Weiting, Ma, Yuwei, Li, Ming, Chen, Jiajun, Zhou, Jie, Pan, Xiaoxuan, Xiao, Lintao, Huang, Hongwei, Cai, Weizhou, Ai, Hao, Liu, Yu-xi, Zou, Chang-Ling, and Sun, Luyan
Subjects: Quantum Physics
Abstract: The noise-biased Kerr-cat qubit is an attractive candidate for fault-tolerant quantum computation; however, its initialization faces challenges due to the squeezing pump-induced frequency shift (PIFS). Here, we propose and demonstrate a dynamic compensation method to mitigate the effect of PIFS during the Kerr-cat qubit initialization. Utilizing a novel nonlinearity-engineered triple-loop SQUID device, we realize a stabilized Kerr-cat qubit and validate the advantages of the dynamic compensation method by improving the initialization fidelity from 57% to 78%, with a projected fidelity of 91% after excluding state preparation and measurement errors. Our results not only advance the practical implementation of Kerr-cat qubits, but also provide valuable insights into the fundamental adiabatic dynamics of these systems. This work paves the way for scalable quantum processors that leverage the bias-preserving properties of Kerr-cat qubits.
Published: 2024

12. Quantum state transfer between superconducting cavities via exchange-free interactions

Author: Zhou, Jie, Li, Ming, Wang, Weiting, Cai, Weizhou, Hua, Ziyue, Xu, Yifang, Pan, Xiaoxuan, Xue, Guangming, Zhang, Hongyi, Song, Yipu, Yu, Haifeng, Zou, Chang-Ling, and Sun, Luyan
Subjects: Quantum Physics
Abstract: We propose and experimentally demonstrate a novel protocol for transferring quantum states between superconducting cavities using only continuous two-mode squeezing interactions, without exchange of photonic excitations between cavities. This approach conceptually resembles quantum teleportation, where quantum information is transferred between different nodes without directly transmitting carrier photons. In contrast to the discrete operations of entanglement and Bell-state measurement in teleportation, our scheme is symmetric and continuous. We experimentally realize coherent and bidirectional transfer of arbitrary quantum states, including bosonic quantum error correction codes. Our results offer new insights into the quantum state transfer and quantum teleportation. In particular, our demonstration validates a new approach to realize quantum transducers, and might find applications in a wide range of physical platforms.
Published: 2024

13. A Joint Learning Model with Variational Interaction for Multilingual Program Translation

Author: Du, Yali, Sun, Hui, and Li, Ming
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Programming Languages
Abstract: Programs implemented in various programming languages form the foundation of software applications. To alleviate the burden of program migration and facilitate the development of software systems, automated program translation across languages has garnered significant attention. Previous approaches primarily focus on pairwise translation paradigms, learning translation between pairs of languages using bilingual parallel data. However, parallel data is difficult to collect for some language pairs, and the distribution of program semantics across languages can shift, posing challenges for pairwise program translation. In this paper, we argue that jointly learning a unified model to translate code across multiple programming languages is superior to separately learning from bilingual parallel data. We propose Variational Interaction for Multilingual Program Translation (VIM-PT), a disentanglement-based generative approach that jointly trains a unified model for multilingual program translation across multiple languages. VIM-PT disentangles code into language-shared and language-specific features, using variational inference and interaction information with a novel lower bound, then achieves program translation through conditional generation. VIM-PT demonstrates four advantages: 1) captures language-shared information more accurately from various implementations and improves the quality of multilingual program translation, 2) mines and leverages the capability of non-parallel data, 3) addresses the distribution shift of program semantics across languages, and 4) serves as a unified model, reducing deployment complexity., Comment: Accepted by the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE 2024)
Published: 2024

14. Photonic time-delayed reservoir computing based on lithium niobate microring resonators

Author: Wang, Yuan, Li, Ming, Gao, Mingyi, Zou, Chang-Ling, Dong, Chun-Hua, Yang, Xiaoniu, Xuan, Qi, and Ren, HongLiang
Subjects: Physics - Optics
Abstract: On-chip micro-ring resonators (MRRs) have been proposed for constructing delay reservoir computing (RC) systems, offering a highly scalable, high-density computational architecture that is easy to manufacture. However, most proposed RC schemes have utilized passive integrated optical components based on silicon-on-insulator (SOI), and RC systems based on lithium niobate on insulator (LNOI) have not yet been reported. The nonlinear optical effects exhibited by lithium niobate microphotonic devices introduce new possibilities for RC design. In this work, we design an RC scheme based on a series-coupled MRR array, leveraging the unique interplay between thermo-optic nonlinearity and photorefractive effects in lithium niobate. We first demonstrate the existence of three regions defined by wavelength detuning between the primary LNOI micro-ring resonator and the coupled micro-ring array, where one region achieves an optimal balance between nonlinearity and high memory capacity at extremely low input energy, leading to superior computational performance. We then discuss in detail the impact of each ring's nonlinearity and the system's symbol duration on performance. Finally, we design a wavelength-division multiplexing (WDM) based multi-task parallel computing scheme, showing that the computational performance for multiple tasks matches that of single-task computations., Comment: 17 pages, 6 figures
Published: 2024

15. Continual Dialogue State Tracking via Reason-of-Select Distillation

Author: Feng, Yujie, Liu, Bo, Dong, Xiaoyu, Lu, Zexin, Zhan, Li-Ming, Wu, Xiao-Ming, and Lam, Albert Y. S.
Subjects: Computer Science - Computation and Language
Abstract: An ideal dialogue system requires continuous skill acquisition and adaptation to new tasks while retaining prior knowledge. Dialogue State Tracking (DST), vital in these systems, often involves learning new services and confronting catastrophic forgetting, along with a critical capability loss termed the "Value Selection Quandary." To address these challenges, we introduce the Reason-of-Select (RoS) distillation method by enhancing smaller models with a novel 'meta-reasoning' capability. Meta-reasoning employs an enhanced multi-domain perspective, combining fragments of meta-knowledge from domain-specific dialogues during continual learning. This transcends traditional single-perspective reasoning. The domain bootstrapping process enhances the model's ability to dissect intricate dialogues from multiple possible values. Its domain-agnostic property aligns data distribution across different domains, effectively mitigating forgetting. Additionally, two novel improvements, "multi-value resolution" strategy and Semantic Contrastive Reasoning Selection method, significantly enhance RoS by generating DST-specific selection chains and mitigating hallucinations in teachers' reasoning, ensuring effective and reliable knowledge transfer. Extensive experiments validate the exceptional performance and robust generalization capabilities of our method. The source code is provided for reproducibility., Comment: Accepted to ACL 2024 Findings
Published: 2024

16. Cosmological Prediction of the Void and Galaxy Clustering Measurements in the CSST Spectroscopic Survey

Author: Song, Yingxiao, Xiong, Qi, Gong, Yan, Deng, Furen, Chan, Kwan Chuen, Chen, Xuelei, Guo, Qi, Li, Guoliang, Li, Ming, Liu, Yun, Luo, Yu, Pei, Wenxiang, and Wei, Chengliang
Subjects: Astrophysics - Cosmology and Nongalactic Astrophysics
Abstract: The void power spectrum is related to the clustering of low-density regions in the large-scale structure (LSS) of the Universe, and can be used as an effective cosmological probe to extract the information of the LSS. We generate the galaxy mock catalogs from the Jiutian simulation, and identify voids using the watershed algorithm for studying the cosmological constraint strength of the China Space Station Telescope (CSST) spectroscopic survey. The galaxy and void auto power spectra and void-galaxy cross power spectra at $z=0.3$, 0.6, and 0.9 are derived from the mock catalogs. To fit the full power spectra, we propose to use the void average effective radius at a given redshift to simplify the theoretical model, and adopt the Markov Chain Monte Carlo (MCMC) technique to implement the constraints on the cosmological and void parameters. The systematic parameters, such as galaxy and void biases, and noise terms in the power spectra are also included in the fitting process. We find that our theoretical model can correctly extract the cosmological information from the galaxy and void power spectra, which demonstrates its feasibility and effectiveness. The joint constraint accuracy of the cosmological parameters can be improved by $\sim20\%$ compared to that from the galaxy power spectrum only. The fitting results of the void density profile and systematic parameters are also well constrained and consistent with the expectation. This indicates that the void clustering measurement can be an effective complement to the galaxy clustering probe, especially for the next generation galaxy surveys., Comment: 11 pages, 5 figures, 2 tables
Published: 2024

17. Motif analysis and passing behavior in football passing networks

Author: Li, Ming-Xia, Xu, Li-Gong, and Zhou, Wei-Xing
Subjects: Physics - Physics and Society
Abstract: The strategic orchestration of football matchplays profoundly influences game outcomes, motivating a surge in research aimed at uncovering tactical nuances through social network analysis. In this paper, we delve into the microscopic intricacies of cooperative player interactions by focusing on triadic motifs within passing networks. Employing a dataset compiled from 3,199 matches across 18 premier football competitions, we identify successful passing activities and construct passing networks for both home and away teams. Our findings highlight a pronounced disparity in passing efficiency, with home teams demonstrating superior performance relative to away teams. Through the identification and analysis of 3-motifs, we find that the motifs with more bidirectional links are more significant. It reveals that footballers exhibit a strong tendency towards backward passes rather than direct forward attacks. Comparing the results of games, we find that some motifs are related to the goal difference. It indicates that direct and effective forward passing significantly amplifies a team's offensive capabilities, whereas an abundance of passbacks portends an elevated risk of offensive futility. These revelations affirm the efficacy of network motif analysis as a potent analytical tool for unveiling the foundational components of passing dynamics among footballers and for decoding the complex tactical behaviors and interaction modalities that underpin team performance.
Published: 2024

18. Could the newly reported $X(2600)$ be the $\eta_2(4D)$ meson?

Author: Wang, Li-Ming, Tian, Wen-Xin, and Liu, Xiang
Subjects: High Energy Physics - Phenomenology, High Energy Physics - Experiment
Abstract: The BESIII Collaboration recently reported the observation of the $X(2600)$ state in the $\eta^\prime \pi^+\pi^-$ invariant mass spectrum of $J/\psi \to \gamma \eta^\prime \pi^+\pi^-$, with a significance exceeding $20\sigma$. Its $J^{PC}$ quantum numbers could be either $0^{-+}$ or $2^{-+}$. We explore the possibility of the $X(2600)$ being a higher state of the $\eta_2$ meson family. Through ($n,M^2$) trajectory analysis and the Quark Pair Creation model, we propose that the $X(2600)$ could be the third radial excitation of the $\eta_2(1870)$. However, the theoretical decay width of the $\eta_2(4D)$ is smaller than the experimental width of the $X(2600)$, and branching ratio calculations suggest inconsistencies, leading us to exclude the $X(2600)$ as the $\eta_2(4D)$. Our findings contribute to the understanding of the $X(2600)$ and provide insights for future experimental searches for the excited $\eta_2$ states., Comment: 8 pages, 6 figures, 4 tables
Published: 2024

19. Top Pass: Improve Code Generation by Pass@k-Maximized Code Ranking

Author: Lyu, Zhi-Cun, Li, Xin-Ye, Xie, Zheng, and Li, Ming
Subjects: Computer Science - Artificial Intelligence, Computer Science - Software Engineering
Abstract: Code generation has been greatly enhanced by the profound advancements in Large Language Models (LLMs) recently. Nevertheless, such LLM-based code generation approaches still struggle to generate error-free code in a few tries when faced with complex problems. To address this, the prevailing strategy is to sample a huge number of candidate programs, in the hope that at least one of them works. However, users of code generation systems usually expect to find a correct program by reviewing or testing only a small number of code candidates. Otherwise, the system would be unhelpful. In this paper, we propose Top Pass, a code ranking approach that identifies potential correct solutions from a large number of candidates. Top Pass directly optimizes the pass@k loss function, enhancing the quality at the top of the candidate list. This enables the user to find the correct solution within as few tries as possible. Experimental results on four benchmarks indicate that our Top Pass method enhances the usability of code generation models by producing better ranking results, particularly achieving a 32.9% relative improvement in pass@1 on CodeContests when compared to the state-of-the-art ranking method., Comment: Accepted by Frontier of Computer Science
Published: 2024

20. Engineering Rydberg-pair interactions in divalent atoms with hyperfine-split ionization thresholds

Author: Hummel, Frederic, Weber, Sebastian, Moegerle, Johannes, Menke, Henri, King, Jonathan, Bloom, Benjamin, Hofferberth, Sebastian, and Li, Ming
Subjects: Physics - Atomic Physics, Quantum Physics
Abstract: Quantum information processing with neutral atoms relies on Rydberg excitation for entanglement generation. While the use of heavy divalent or open-shell elements, such as strontium or ytterbium, has benefits due to their optically active core and a variety of possible qubit encodings, their Rydberg structure is generally complex. For some isotopes in particular, hyperfine interactions are relevant even for highly excited electronic states. We employ multi-channel quantum defect theory to infer the Rydberg structure of isotopes with non-zero nuclear spin and perform non-perturbative Rydberg-pair interaction calculations. We find that due to the high level density and sensitivities to external fields, experimental parameters must be precisely controlled. Specifically in ${}^{87}$Sr, we study an intrinsic Förster resonance, unique to divalent atoms with hyperfine-split thresholds, which simultaneously provides line stability with respect to external field fluctuations and enhanced long-range interactions. Additionally, we provide parameters for pair states that can be effectively described by single-channel Rydberg series. The explored pair states provide exciting opportunities for applications in the blockade regime as well as for more exotic long-range interactions such as largely flat, distance-independent potentials., Comment: 12 pages, 7 figures
Published: 2024

21. Enhancing Anti-spoofing Countermeasures Robustness through Joint Optimization and Transfer Learning

Author: Wang, Yikang, Wang, Xingming, Nishizaki, Hiromitsu, and Li, Ming
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, Electrical Engineering and Systems Science - Signal Processing
Abstract: Current research in synthesized speech detection primarily focuses on the generalization of detection systems to unknown spoofing methods of noise-free speech. However, anti-spoofing countermeasure (CM) systems often do not perform as well in more challenging scenarios, such as those involving noise and reverberation. To address the problem of enhancing the robustness of CM systems, we propose a transfer learning-based speech enhancement front-end joint optimization (TL-SEJ) method, investigating its effectiveness in improving robustness against noise and reverberation. We evaluated the proposed method's performance through a series of comparative and ablation experiments. The experimental results show that, across different signal-to-noise ratio test conditions, the proposed TL-SEJ method improves recognition accuracy by 2.7% to 15.8% compared to the baseline. Compared to conventional data augmentation methods, our system achieves an accuracy improvement ranging from 0.7% to 5.8% in various noisy conditions and from 1.7% to 2.8% under different RT60 reverberation scenarios. These experiments demonstrate that the proposed method effectively enhances system robustness in noisy and reverberant conditions., Comment: 29 pages, 4 figures, Journal Papers
Published: 2024

22. GenRC: Generative 3D Room Completion from Sparse Image Collections

Author: Li, Ming-Feng, Ku, Yueh-Feng, Yen, Hong-Xuan, Liu, Chi, Liu, Yu-Lun, Chen, Albert Y. C., Kuo, Cheng-Hao, and Sun, Min
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Sparse RGBD scene completion is a challenging task, especially when considering consistent textures and geometries throughout the entire scene. Different from existing solutions that rely on human-designed text prompts or predefined camera trajectories, we propose GenRC, an automated training-free pipeline to complete a room-scale 3D mesh with high-fidelity textures. To achieve this, we first project the sparse RGBD images to a highly incomplete 3D mesh. Instead of iteratively generating novel views to fill in the void, we utilize our proposed E-Diffusion to generate a view-consistent panoramic RGBD image which ensures global geometry and appearance consistency. Furthermore, we maintain the input-output scene stylistic consistency through textual inversion to replace human-designed text prompts. To bridge the domain gap among datasets, E-Diffusion leverages models trained on large-scale datasets to generate diverse appearances. GenRC outperforms state-of-the-art methods under most appearance and geometric metrics on ScanNet and ARKitScenes datasets, even though GenRC is not trained on these datasets and does not use predefined camera trajectories. Project page: https://minfenli.github.io/GenRC, Comment: ECCV 2024
Published: 2024

23. VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark

Author: Lin, Yuke, Cheng, Ming, Zhang, Fulin, Gao, Yingying, Zhang, Shilei, and Li, Ming
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In this paper, we provide a large audio-visual speaker recognition dataset, VoxBlink2, which includes approximately 10M utterances with videos from 110K+ speakers in the wild. This dataset represents a significant expansion over the VoxBlink dataset, encompassing a broader diversity of speakers and scenarios by the grace of an optimized data collection pipeline. Afterward, we explore the impact of training strategies, data scale, and model complexity on speaker verification and finally establish a new single-model state-of-the-art EER at 0.170% and minDCF at 0.006% on the VoxCeleb1-O test set. Such remarkable results motivate us to explore speaker recognition from a new challenging perspective. We raise the Open-Set Speaker-Identification task, which is designed to either match a probe utterance with a known gallery speaker or categorize it as an unknown query. Associated with this task, we design concrete benchmark and evaluation protocols. The data and model resources can be found in http://voxblink2.github.io., Comment: Accepted By InterSpeech2024
Published: 2024

24. Explosive percolation in finite dimensions

Author: Li, Ming, Wang, Junfeng, and Deng, Youjin
Subjects: Condensed Matter - Statistical Mechanics
Abstract: Explosive percolation (EP) has received significant research attention due to its rich and anomalous phenomena near criticality. In our recent study [Phys. Rev. Lett. 130, 147101 (2023)], we demonstrated that the correct critical behaviors of the EP in infinite dimensions (complete graph) can be accurately extracted using the event-based method, with finite-size scaling behaviors still described by the standard finite-size scaling theory. We perform an extensive simulation of the EPs on hypercubic lattices ranging from dimensions $d=2$ to $6$, and find that the critical behaviors consistently obey the standard finite-size scaling theory. Consequently, we obtain a high-precision determination of the percolation thresholds and critical exponents, revealing that the EPs governed by the product and sum rules belong to different universality classes. Remarkably, despite the mean of the dynamic pseudo-critical point $\mathcal{T}_L$ deviating from the infinite-lattice criticality by a distance determined by the $d$-dependent correlation-length exponent, $\mathcal{T}_L$ follows a normal (Gaussian) distribution across all dimensions, with a standard deviation proportional to $1/\sqrt{V}$, where $V$ denotes the system volume. A theoretical argument associated with the central-limit theorem is further proposed to understand the probability distribution of $\mathcal{T}_L$. These findings offer a comprehensive understanding of critical behaviors in EPs across various dimensions., Comment: 10 pages, 9 figures
Published: 2024

25. Exploring Generative AI Policies in Higher Education: A Comparative Perspective from China, Japan, Mongolia, and the USA

Author: Xie, Qin, Li, Ming, and Enkhtur, Ariunaa
Subjects: Computer Science - Computers and Society
Abstract: This study conducts a comparative analysis of national policies on Generative AI across four countries: China, Japan, Mongolia, and the USA. Employing the Qualitative Comparative Analysis (QCA) method, it examines the responses of these nations to Generative AI in higher education settings, scrutinizing the diversity in their approaches within this group. While all four countries exhibit a positive attitude toward Generative AI in higher education, Japan and the USA prioritize a human-centered approach and provide direct guidance in teaching and learning. In contrast, China and Mongolia prioritize national security concerns, with their guidelines focusing more on the societal level rather than being specifically tailored to education. Additionally, despite all four countries emphasizing diversity, equity, and inclusion, they consistently fail to clearly discuss or implement measures to address the digital divide. By offering a comprehensive comparative analysis of attitudes and policies regarding Generative AI in higher education across these countries, this study enriches existing literature and provides policymakers with a global perspective, ensuring that policies in this domain promote inclusion rather than exclusion., Comment: 14 pages, 1 table
Published: 2024

26. A timing view of the additional high-energy spectral component discovered in the black hole candidate Swift J1727.8-1613

Author: Yang, Zi-Xu, Zhang, Liang, Zhang, Shuang-Nan, Tao, L., Zhang, Shu, Ma, Ruican, Bu, Qingcui, Huang, Yue, Liu, He-Xin, Yu, Wei, Xiao, Guang C., Wang, Peng-Ju, Feng, Hua, Song, Li-Ming, Ma, Xiang, Ge, Mingyu, Zhao, QingChang, and Qu, J. L.
Subjects: Astrophysics - High Energy Astrophysical Phenomena
Abstract: We present an energy-dependent analysis for the type-C quasi-periodic oscillations (QPOs) observed in the black hole X-ray binary Swift J1727.8-1613 using Insight-HXMT observations. We find that the QPO fractional rms at energies above 40 keV is significantly higher than that below 20 keV. This is the first report of a high energy (HE)-rms excess in the rms spectrum of a black hole X-ray binary. In the high energy band, an extra hard component is observed in addition to the standard thermal Comptonization component in a similar energy band. The value of the QPO HE-rms excess is not only correlated with the disk parameters and the photon index of the standard Comptonization component, but also exhibits a moderate positive correlation with the flux of the additional hard spectral component. No features in the QPO phase-lag spectra are seen corresponding to the additional hard component. We propose that the additional hard component in the spectrum may originate from jet emission and the associated QPO HE-rms excess can be explained by the precession of the jet base.
Published: 2024

27. Topological edge states in photonic Floquet insulator with unpaired Dirac cones

Author: Zhong, Hua, Kartashov, Yaroslav V., Li, Yongdong, Li, Ming, and Zhang, Yiqi
Subjects: Physics - Optics
Abstract: Topological insulators are most frequently constructed using lattices with specific degeneracies in their linear spectra, such as Dirac points. For a broad class of lattices, such as honeycomb ones, these points and associated Dirac cones generally appear in non-equivalent pairs. Simultaneous breakup of the time-reversal and inversion symmetry in systems based on such lattices may result in the formation of the unpaired Dirac cones in bulk spectrum, but the existence of topologically protected edge states in such structures remains an open problem. Here photonic Floquet insulator on honeycomb lattice with unpaired Dirac cones in its spectrum is introduced that can support unidirectional edge states appearing at the edge between two regions with opposite sublattice detuning. Topological properties of this system are characterized by the nonzero valley Chern number. Remarkably, edge states in this system can circumvent sharp corners without inter-valley scattering even though there is no total forbidden gap in the spectrum. Our results reveal unusual interplay between two different physical mechanisms of creation of topological edge states based on simultaneous breakup of different symmetries of the system., Comment: 9 pages, 7 figures. To appear in Photonics Research. Comments are welcome
Published: 2024

28. A versatile quantum microwave photonic signal processing platform based on coincidence window selection technique

Author: Li, Xinghua, Guo, Yifan, Xiang, Xiao, Quan, Runai, Cao, Mingtao, Dong, Ruifang, Liu, Tao, Li, Ming, and Zhang, Shougang
Subjects: Physics - Optics, Quantum Physics
Abstract: Quantum microwave photonics (QMWP) is an innovative approach that combines energy-time entangled biphoton sources as the optical carrier with time-correlated single-photon detection for high-speed RF signal recovery. This groundbreaking method offers unique advantages such as nonlocal RF signal encoding and robust resistance to dispersion-induced frequency fading. This paper explores the versatility of processing the quantum microwave photonic signal by utilizing coincidence window selection on the biphoton coincidence distribution. The demonstration includes finely-tunable RF phase shifting, flexible multi-tap transversal filtering (with up to 15 taps), and photonically implemented RF mixing, leveraging the nonlocal RF mapping characteristic of QMWP. These accomplishments significantly enhance the capability of microwave photonic systems in processing ultra-weak signals, opening up new possibilities for various applications.
Published: 2024

29. Quantum microwave photonic mixer with a large spurious-free dynamic range

Author: Li, Xinghua, Guo, Yifan, Xiang, Xiao, Quan, Runai, Cao, Mingtao, Dong, Ruifang, Liu, Tao, Li, Ming, and Zhang, Shougang
Subjects: Physics - Optics, Quantum Physics
Abstract: As one of the most fundamental functionalities of microwave photonics, microwave frequency mixing plays an essential role in modern radars and wireless communication systems. However, the commonly utilized intensity modulation in the systems often leads to inadequate spurious-free dynamic range (SFDR) for many sought-after applications. Quantum microwave photonics technique offers a promising solution for improving SFDR in terms of higher-order harmonic distortion. In this paper, we demonstrate two types of quantum microwave photonic mixers based on the configuration of the intensity modulators: cascade-type and parallel-type. Leveraging the nonlocal RF signal encoding capability, both types of quantum microwave photonic mixers not only exhibit the advantage of dual-channel output but also present significant improvement in SFDR. Specifically, the parallel-type quantum microwave photonic mixer achieves a remarkable SFDR value of 113.6 dB·Hz$^{1/2}$, which is 30 dB better than that of the cascade-type quantum microwave photonic mixer. When compared to the classical microwave photonic mixer, this enhancement reaches a notable 53.6 dB at the expense of 8 dB conversion loss. These results highlight the superiority of quantum microwave photonic mixers in the fields of microwave and millimeter-wave systems. Further applying multi-photon frequency entangled sources as optical carriers, the dual-channel microwave frequency conversion capability endowed by the quantum microwave photonic mixer can be extended to enhance the performance of multiple-paths microwave mixing which is essential for radar net systems.
Published: 2024

30. Bridging Smoothness and Approximation: Theoretical Insights into Over-Smoothing in Graph Neural Networks

Author: Yang, Guangrui, Li, Jianfei, Li, Ming, Feng, Han, and Zhou, Ding-Xuan
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Mathematics - Functional Analysis
Abstract: In this paper, we explore the approximation theory of functions defined on graphs. Our study builds upon the approximation results derived from the $K$-functional. We establish a theoretical framework to assess the lower bounds of approximation for target functions using Graph Convolutional Networks (GCNs) and examine the over-smoothing phenomenon commonly observed in these networks. Initially, we introduce the concept of a $K$-functional on graphs, establishing its equivalence to the modulus of smoothness. We then analyze a typical type of GCN to demonstrate how the high-frequency energy of the output decays, an indicator of over-smoothing. This analysis provides theoretical insights into the nature of over-smoothing within GCNs. Furthermore, we establish a lower bound for the approximation of target functions by GCNs, which is governed by the modulus of smoothness of these functions. This finding offers a new perspective on the approximation capabilities of GCNs. In our numerical experiments, we analyze several widely applied GCNs and observe the phenomenon of energy decay. These observations corroborate our theoretical results on exponential decay order.
Published: 2024

31. Four Parallel Pathways in T4 Ligase-Catalyzed Repair of Nicked DNA with Diverse Bending Angles.

Author: Li, Na, Ma, Jianbing, Fu, Hang, Yang, Zhiwei, Xu, Chunhua, Li, Haihong, Zhao, Yimin, Zhao, Yizhen, Chen, Shuyu, Gou, Lu, Zhang, Xinghua, Zhang, Shengli, Li, Ming, Hou, Ximiao, Zhang, Lei, and Lu, Ying
Subjects: T4 DNA ligase, conformational dynamics, parallel enzymatic pathways, protein machines, single molecules, DNA Ligases, DNA, DNA Repair, Fluorescence Resonance Energy Transfer, Nucleic Acid Conformation, Bacteriophage T4, Microscopy, Electron
Abstract: The structural diversity of biological macromolecules in different environments contributes complexity to enzymological processes vital for cellular functions. Fluorescence resonance energy transfer and electron microscopy are used to investigate the enzymatic reaction of T4 DNA ligase catalyzing the ligation of nicked DNA. The data show that both the ligase-AMP complex and the ligase-AMP-DNA complex can have four conformations. This finding suggests the parallel occurrence of four ligation reaction pathways, each characterized by specific conformations of the ligase-AMP complex that persist in the ligase-AMP-DNA complex. Notably, these complexes have DNA bending angles of ≈0°, 20°, 60°, or 100°. The mechanism of parallel reactions challenges the conventional notion of simple sequential reaction steps occurring among multiple conformations. The results provide insights into the dynamic conformational changes and the versatile attributes of T4 DNA ligase and suggest that the parallel multiple reaction pathways may correspond to diverse T4 DNA ligase functions. This mechanism may potentially have evolved as an adaptive strategy across evolutionary history to navigate complex environments.
Published: 2024

32. Approximate DCT and Quantization Techniques for Energy-Constrained Image Sensors

Author: Li, Ming-Che, Ghosh, Archisman, and Sen, Shreyas
Subjects: Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Recent expansions in multimedia devices gather enormous amounts of real-time images for processing and inference. The images are first compressed using compression schemes, like JPEG, to reduce storage costs and power for transmitting the captured data. Due to inherent error resilience and imperceptibility in images, JPEG can be approximated to reduce the required computation power and area. This work demonstrates the first end-to-end approximation computing-based optimization of JPEG hardware using i) an approximate division realized using bit-shift operators to reduce the complexity of the quantization block, ii) loop perforation, and iii) precision scaling on top of a multiplier-less fast DCT architecture to achieve an extremely energy-efficient JPEG compression unit which will be a perfect fit for power/bandwidth-limited scenario. Furthermore, a gradient descent-based heuristic composed of two conventional approximation strategies, i.e., Precision Scaling and Loop Perforation, is implemented for tuning the degree of approximation to trade off energy consumption with the quality degradation of the decoded image. The entire RTL design is coded in Verilog HDL, synthesized, mapped to TSMC 65nm CMOS technology, and simulated using Cadence Spectre Simulator under 25$^{\circ}$\textbf{C}, TT corner. The approximate division approach achieved around $\textbf{28\%}$ reduction in the active design area. The heuristic-based approximation technique combined with accelerator optimization achieves a significant energy reduction of $\textbf{36\%}$ for a minimal image quality degradation of $\textbf{2\%}$ SAD. Simulation results also show that the proposed architecture consumes 15uW at the DCT and quantization stages to compress a colored 480p image at 6fps.
Published: 2024
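
The bit-shift division mentioned in the abstract above can be illustrated with a minimal sketch. It assumes one common approach, rounding each quantization step to the nearest power of two so that the division becomes a right shift; the abstract does not spell out the authors' exact scheme, and the helper names `shift_for` and `approx_quantize` are hypothetical.

```python
def shift_for(q):
    """Exponent of the power of two nearest to q (assumes integer q >= 1)."""
    k = q.bit_length() - 1  # floor(log2(q))
    # Round up when q is strictly closer to 2**(k+1) than to 2**k.
    return k + 1 if q - (1 << k) > (1 << (k + 1)) - q else k


def approx_quantize(coeff, q):
    """Approximate coeff / q with an arithmetic right shift (no divider needed)."""
    return coeff >> shift_for(q)


# A step of 14 is treated as 16, so 100 / 14 (about 7.14) is approximated as 100 >> 4.
print(approx_quantize(100, 14))
```

The shift floors toward negative infinity for negative coefficients, which differs from truncating division; whether that matters depends on the decoder and is not addressed here.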

33. MIMO-OFDM ISAC Waveform Design for Range-Doppler Sidelobe Suppression

Author: Li, Peishi, Li, Ming, Liu, Rang, Liu, Qian, and Swindlehurst, A. Lee
Subjects: Electrical Engineering and Systems Science - Signal Processing
Abstract: Integrated sensing and communication (ISAC) is a key enabling technique for future wireless networks owing to its efficient hardware and spectrum utilization. In this paper, we focus on dual-functional waveform design for a multi-input multi-output (MIMO) orthogonal frequency division multiplexing (OFDM) ISAC system, which is considered to be a promising solution for practical deployment. Since the dual-functional waveform carries communication information, its random nature leads to high range-Doppler sidelobes in the ambiguity function, which in turn degrades radar sensing performance. To suppress range-Doppler sidelobes, we propose a novel symbol-level precoding (SLP) based waveform design for MIMO-OFDM ISAC systems by fully exploiting the temporal degrees of freedom (DoFs). Our goal is to minimize the range-Doppler integrated sidelobe level (ISL) while satisfying the constraints of target illumination power, multi-user communication quality of service (QoS), and constant-modulus transmission. To solve the resulting non-convex waveform design problem, we develop an efficient algorithm using the majorization-minimization (MM) and alternating direction method of multipliers (ADMM) methods. Simulation results show that the proposed waveform has significantly reduced range-Doppler sidelobes compared with signals designed only for communications and other baselines. In addition, the proposed waveform design achieves target detection and estimation performance close to that achievable by waveforms designed only for radar, which demonstrates the superiority of the proposed SLP-based ISAC approach., Comment: 13 pages, 9 figures, submitted to IEEE TWC
Published: 2024

34. Understanding is Compression

Author: Li, Ziguang, Huang, Chao, Wang, Xuliang, Hu, Haibo, Wyeth, Cole, Bu, Dongbo, Yu, Quan, Gao, Wen, Liu, Xingwu, and Li, Ming
Subjects: Computer Science - Information Theory, Computer Science - Artificial Intelligence
Abstract: Modern data compression methods are slowly reaching their limits after 80 years of research, millions of papers, and a wide range of applications. Yet, the extravagant 6G communication speed requirement raises a major open question for revolutionary new ideas of data compression. We have previously shown that all understanding or learning is compression, under reasonable assumptions. Large language models (LLMs) understand data better than ever before. Can they help us to compress data? The LLMs may be seen to approximate the uncomputable Solomonoff induction. Therefore, under this new uncomputable paradigm, we present LMCompress. LMCompress shatters all previous lossless compression algorithms, doubling the lossless compression ratios of JPEG-XL for images, FLAC for audio, and H.264 for video, and quadrupling the compression ratio of bz2 for text. The better a large model understands the data, the better LMCompress compresses.
Published: 2024

35. Feature-prompting GBMSeg: One-Shot Reference Guided Training-Free Prompt Engineering for Glomerular Basement Membrane Segmentation

Author: Liu, Xueyu, Shi, Guangze, Wang, Rui, Lai, Yexin, Zhang, Jianan, Sun, Lele, Yang, Quan, Wu, Yongfei, Li, Ming, Han, Weixia, and Zheng, Wen
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Assessment of the glomerular basement membrane (GBM) in transmission electron microscopy (TEM) is crucial for diagnosing chronic kidney disease (CKD). The lack of domain-independent automatic segmentation tools for the GBM necessitates an AI-based solution to automate the process. In this study, we introduce GBMSeg, a training-free framework designed to automatically segment the GBM in TEM images guided only by a one-shot annotated reference. Specifically, GBMSeg first exploits the robust feature matching capabilities of the pretrained foundation model to generate initial prompt points, then introduces a series of novel automatic prompt engineering techniques across the feature and physical space to optimize the prompt scheme. Finally, GBMSeg employs a class-agnostic foundation segmentation model with the generated prompt scheme to obtain accurate segmentation results. Experimental results on our collected 2538 TEM images confirm that GBMSeg achieves superior segmentation performance with a Dice similarity coefficient (DSC) of 87.27% using only one labeled reference image in a training-free manner, outperforming recently proposed one-shot or few-shot methods. In summary, GBMSeg introduces a distinctive automatic prompt framework that facilitates robust domain-independent segmentation performance without training, particularly advancing the automatic prompting of foundation segmentation models for medical images. Future work involves automating the thickness measurement of segmented GBM and quantifying pathological indicators, holding significant potential for advancing pathology assessments in clinical applications. The source code is available on https://github.com/SnowRain510/GBMSeg, Comment: Accepted for MICCAI2024
Published: 2024

36. RuleR: Improving LLM Controllability by Rule-based Data Recycling

Author: Li, Ming, Chen, Han, Wang, Chenguang, Nguyen, Dang, Li, Dianqi, and Zhou, Tianyi
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Large language models (LLMs) still lack delicate controllability over their responses, which is critical to enhancing their performance and the user experience. However, curating supervised fine-tuning (SFT) datasets to improve LLM controllability usually relies on human experts or proprietary LLMs, which requires additional costs. To bridge this gap, we propose Rule-based Data Recycling (RuleR), a data augmentation method incorporating multiple constraints into the original data samples according to predefined rules, which creates new training tasks to consolidate the controllability of LLMs. Instead of creating new data from scratch, RuleR ``recycles'' existing data by simply applying rule-based edits to their responses and appending the rule-instructions in their original instructions. Experimental results demonstrate RuleR's effectiveness in improving LLM controllability while maintaining general instruction-following capabilities. The code will be released on https://github.com/MingLiiii/RuleR.
Published: 2024

37. Ranking LLMs by compression

Author: Guo, Peijia, Li, Ziguang, Hu, Haibo, Huang, Chao, Li, Ming, and Zhang, Rui
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: We conceptualize the process of understanding as information compression, and propose a method for ranking large language models (LLMs) based on lossless data compression. We demonstrate the equivalence of compression length under arithmetic coding with cumulative negative log probabilities when using a large language model as a prior, that is, the pre-training phase of the model is essentially the process of learning the optimal coding length. At the same time, the evaluation metric compression ratio can be obtained without actual compression, which greatly saves overhead. In this paper, we use five large language models as priors for compression, then compare their performance on challenging natural language processing tasks, including sentence completion, question answering, and coreference resolution. Experimental results show that compression ratio and model performance are positively correlated, so it can be used as a general metric to evaluate large language models., Comment: 7 pages, 4 tables
Published: 2024
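
The equivalence described in the abstract above, between coded length and cumulative negative log probability, can be sketched as follows. This is a minimal illustration and not the authors' code: the per-token probabilities are made-up numbers standing in for a language model's predictions, and the function names are hypothetical. An actual arithmetic coder comes within a few bits of this ideal length.

```python
import math


def code_length_bits(token_probs):
    """Ideal code length in bits: the sum of -log2(p) over the tokens."""
    return sum(-math.log2(p) for p in token_probs)


def compression_ratio(raw_bytes, token_probs):
    """Raw size in bits divided by the ideal coded size in bits."""
    return (8 * raw_bytes) / code_length_bits(token_probs)


# Hypothetical probabilities two models assign to the same 4-byte text.
strong_model = [0.5, 0.5, 0.25, 0.25]      # 1 + 1 + 2 + 2 = 6 bits
weak_model = [0.25, 0.25, 0.125, 0.125]    # 2 + 2 + 3 + 3 = 10 bits

print(compression_ratio(4, strong_model), compression_ratio(4, weak_model))
```

Because the ratio depends only on the model's probabilities, it can be computed without running a compressor, which is the overhead saving the abstract mentions.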

38. PFID: Privacy First Inference Delegation Framework for LLMs

Author: Yang, Haoyan, Li, Zhitao, Zhang, Yong, Wang, Jianzong, Cheng, Ning, Li, Ming, and Xiao, Jing
Subjects: Computer Science - Computation and Language
Abstract: This paper introduces a novel privacy-preservation framework named PFID for LLMs that addresses critical privacy concerns by localizing user data through model sharding and singular value decomposition. When users interact with LLM systems, their prompts could be exposed to eavesdroppers within or outside LLM system providers who are interested in collecting users' input. In this work, we propose a framework to camouflage user input so as to alleviate privacy issues. Our framework places model shards on the client and the public server, and sends compressed hidden states instead of prompts to and from servers. Clients hold back information that can re-privatize the hidden states, so that overall system performance is comparable to traditional LLM services. Our framework is designed to be communication efficient: computation can be delegated to the local client so that the server's computation burden is lightened. We conduct extensive experiments on machine translation tasks to verify our framework's performance., Comment: Submitted to EMNLP2024
Published: 2024

39. Helicity Evolution at Small $x$: Quark to Gluon and Gluon to Quark Transition Operators

Author: Borden, Jeremy, Kovchegov, Yuri V., and Li, Ming
Subjects: High Energy Physics - Phenomenology, Nuclear Experiment, Nuclear Theory
Abstract: We include the quark to gluon and gluon to quark shock-wave transition operators into the small Bjorken-$x$ evolution equations for helicity in the flavor-singlet channel derived earlier. While such transitions do not affect the large-$N_c$ version of the evolution equations for helicity, the large-$N_c \& N_f$ equations are affected. ($N_c$ and $N_f$ are the numbers of quark colors and flavors, respectively.) We derive the corresponding corrected large-$N_c \& N_f$ equations for the polarized dipole amplitudes contributing to the flavor-singlet quark and gluon helicity distributions in the double-logarithmic approximation (DLA), resumming powers of $\alpha_s \, \ln^2 (1/x)$ with $\alpha_s$ the strong coupling constant. We solve these equations iteratively and extract the polarized splitting functions up to four loops. We show that our splitting functions agree with the fixed-order perturbative calculations up to and including the existing three-loops results. Similar to the large-$N_c$ helicity evolution in the shock-wave approach, our large-$N_c \& N_f$ small-$x$ splitting functions agree with those obtained in the infrared evolution equations framework up to three loops, but appear to slightly disagree at four loops., Comment: 38 pages, 7 figures
Published: 2024

40. Enhancing Voice Wake-Up for Dysarthria: Mandarin Dysarthria Speech Corpus Release and Customized System Design

Author: Gao, Ming, Chen, Hang, Du, Jun, Xu, Xin, Guo, Hongxiao, Bu, Hui, Yang, Jianxing, Li, Ming, and Lee, Chin-Hui
Subjects: Computer Science - Computation and Language
Abstract: Smart home technology has gained widespread adoption, facilitating effortless control of devices through voice commands. However, individuals with dysarthria, a motor speech disorder, face challenges due to the variability of their speech. This paper addresses the wake-up word spotting (WWS) task for dysarthric individuals, aiming to integrate them into real-world applications. To support this, we release the open-source Mandarin Dysarthria Speech Corpus (MDSC), a dataset designed for dysarthric individuals in home environments. MDSC encompasses information on age, gender, disease types, and intelligibility evaluations. Furthermore, we perform comprehensive experimental analysis on MDSC, highlighting the challenges encountered. We also develop a customized dysarthria WWS system that showcases robustness in handling intelligibility and achieving exceptional performance. MDSC will be released on https://www.aishelltech.com/AISHELL_6B., Comment: to be published in Interspeech 2024
Published: 2024

41. AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection

Author: Gong, Rong, Xue, Hongfei, Wang, Lezhi, Xu, Xin, Li, Qisheng, Xie, Lei, Bu, Hui, Wu, Shaomei, Zhou, Jiaming, Qin, Yong, Zhang, Binbin, Du, Jun, Bin, Jia, and Li, Ming
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied to atypical speech, such as stuttering. This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset, which stands out as the largest dataset in its category. Encompassing conversational and voice command reading speech, AS-70 includes verbatim manual transcription, rendering it suitable for various speech-related tasks. Furthermore, baseline systems are established, and experimental results are presented for ASR and stuttering event detection (SED) tasks. By incorporating this dataset into the model fine-tuning, significant improvements in the state-of-the-art ASR models, e.g., Whisper and Hubert, are observed, enhancing their inclusivity in addressing stuttered speech., Comment: Accepted by Interspeech 2024
Published: 2024

42. The Database and Benchmark for Source Speaker Verification Against Voice Conversion

Author: Li, Ze, Lin, Yuke, Yao, Tian, Suo, Hongbin, and Li, Ming
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Voice conversion systems can transform audio to mimic another speaker's voice, thereby attacking speaker verification systems. However, ongoing studies on source speaker verification are hindered by limited data availability and methodological constraints. In this paper, we generate a large-scale converted speech database and train a batch of baseline systems based on the MFA-Conformer architecture to promote the source speaker verification task. In addition, we introduce a related task called conversion method recognition. An adapter-based multi-task learning approach is employed to achieve effective conversion method recognition without compromising source speaker verification performance. Additionally, we investigate and effectively address the open-set conversion method recognition problem through the implementation of an open-set nearest neighbor approach.
Published: 2024

43. The Broadband X-ray Spectral Properties during the Rising Phases of the Outburst of the New Black Hole X-ray Binary Candidate Swift J1727.8-1613

Author: Liu, He-Xin, Xu, Yan-Jun, Zhang, Shuang-Nan, Yu, Wei, Huang, Yue, Tao, Lian, Zhang, Liang, Yang, Zi-Xu, Zhao, Qing-Chang, Qu, Jin-Lu, and Song, Li-Ming
Subjects: Astrophysics - High Energy Astrophysical Phenomena
Abstract: We report data analysis results about the outburst evolution and spectral properties during the hard state of the recently discovered X-ray transient Swift J1727.8-1613 as observed by \emph{Insight}-HXMT and NuSTAR. We find that the broadband X-ray spectrum of Swift J1727.8-1613 is more complex than the most typical spectral patterns of black hole X-ray binary systems, with not only a comparatively weaker reflection component but also an additional spectral continuum component, manifesting itself as a hard X-ray tail beyond the thermal Comptonization description detectable below 100 keV. This additional component can be phenomenologically well fitted by adding an extra power-law model with high energy exponential cutoff in the 2-120 keV energy band. We attempt to explain the broadband X-ray spectral continuum with a thermal/non-thermal hybrid plasma corona scenario, and find an ultra high compactness parameter ($l_{\rm s}\sim2000$) and a steep non-thermal electron distribution ($\Gamma_{\rm inj}>4$), suggesting the source was accreting with high Eddington rates and that the electron acceleration mechanism is not very efficient. We also present a detailed multi-epoch analysis of spectral properties using \emph{Insight}-HXMT data to investigate the evolution of the key physical properties regarding the disk and corona during the hard states. No significant variation is found with the inner disk radius and the coronal temperature during this time period, and the weak reflection and hard X-ray tail features are persistent. We discuss the physical implications of our spectral analysis results in the context of disk-corona relation, particle acceleration, and jet contribution, during the rise of a black hole X-ray binary in outburst., Comment: 16 pages, 6 figures
Published: 2024

44. Multipath Exploitation for Fluctuating Target Detection in RIS-Assisted ISAC Systems

Author: Zhang, Shoushuo, Xiao, Zichao, Liu, Rang, Li, Ming, Wang, Wei, and Liu, Qian
Subjects: Electrical Engineering and Systems Science - Signal Processing
Abstract: Integrated sensing and communication (ISAC) systems are typically deployed in multipath environments, which are usually deemed a challenging issue for wireless communications. However, the multipath propagation can also provide extra illumination and observation perspectives for radar sensing, which offers spatial diversity gain for detecting targets with spatial radar cross-section (RCS) fluctuations. In this letter, we propose to utilize reconfigurable intelligent surfaces (RIS) in ISAC systems to provide high-quality and controllable multipath propagation for improving the performance of fluctuating target detection and simultaneously enhancing the quality of communication services. To effectively exploit the spatial diversity offered by RIS-empowered multipath, the dual-functional transmit beamforming and the RIS reflection beamforming are jointly designed to maximize the expectation of radar signal-to-noise ratio (SNR). To solve the resulting complex non-convex optimization problem, we develop an efficient alternating optimization algorithm that utilizes majorization-minimization (MM) and alternating direction method of multipliers (ADMM) algorithms. Simulation results illustrate the advantages of multipath exploitation and the proposed beamforming design algorithm for fluctuating target detection in RIS-assisted ISAC systems.
Comment: submitted to IEEE WCL
Published: 2024

45. Self-locked broadband Raman-electro-optic microcomb

Author: Wan, Shuai, Wang, Pi-Yu, Li, Ming, Ma, Rui, Niu, Rui, Sun, Fang-Wen, Bo, Fang, Guo, Guang-Can, and Dong, Chun-Hua
Subjects: Physics - Optics
Abstract: Optical frequency combs (OFCs), composed of equally spaced frequency tones, have spurred advancements in communications, spectroscopy, precision measurement and fundamental physics research. A prevalent method for generating OFCs involves the electro-optic (EO) effect, i.e., EO comb, renowned for its rapid tunability via precise microwave field control. Recent advances in integrated lithium niobate (LN) photonics have greatly enhanced the efficiency of EO effect, enabling the generation of broadband combs with reduced microwave power. However, parasitic nonlinear effects, such as Raman scattering and four-wave mixing, often emerge in high quality nonlinear devices, impeding the expansion of comb bandwidth and the minimization of frequency noise. Here, we tame these nonlinear effects and present a novel type of OFC, i.e., the self-locked Raman-electro-optic (REO) microcomb by leveraging the collaboration of EO, Kerr and Raman scattering processes. The spectral width of the REO microcomb benefits from the Raman gain and Kerr effect, encompassing nearly 1400 comb lines spanning over 300 nm with a fine repetition rate of 26.03 GHz, much larger than the pure EO combs. Remarkably, the system can maintain a self-locked low-noise state in the presence of multiple nonlinearities without the need for external active feedback. Our approach points to a direction for improving the performance of microcombs and paves the way for exploring new nonlinear physics, such as new laser locking techniques, through the collaboration of inevitable multiple nonlinear effects in integrated photonics.
Published: 2024

46. Large Brain Model for Learning Generic Representations with Tremendous EEG Data in BCI

Author: Jiang, Wei-Bang, Zhao, Li-Ming, and Lu, Bao-Liang
Subjects: Computer Science - Machine Learning
Abstract: The current electroencephalogram (EEG) based deep learning models are typically designed for specific datasets and applications in brain-computer interaction (BCI), limiting the scale of the models and thus diminishing their perceptual capabilities and generalizability. Recently, Large Language Models (LLMs) have achieved unprecedented success in text processing, prompting us to explore the capabilities of Large EEG Models (LEMs). We hope that LEMs can break through the limitations of different task types of EEG datasets, and obtain universal perceptual capabilities of EEG signals through unsupervised pre-training. Then the models can be fine-tuned for different downstream tasks. However, compared to text data, the volume of EEG datasets is generally small and the format varies widely. For example, there can be mismatched numbers of electrodes, unequal length data samples, varied task designs, and low signal-to-noise ratio. To overcome these challenges, we propose a unified foundation model for EEG called Large Brain Model (LaBraM). LaBraM enables cross-dataset learning by segmenting the EEG signals into EEG channel patches. Vector-quantized neural spectrum prediction is used to train a semantically rich neural tokenizer that encodes continuous raw EEG channel patches into compact neural codes. We then pre-train neural Transformers by predicting the original neural codes for the masked EEG channel patches. The LaBraMs were pre-trained on about 2,500 hours of various types of EEG signals from around 20 datasets and validated on multiple different types of downstream tasks. Experiments on abnormal detection, event type classification, emotion recognition, and gait prediction show that our LaBraM outperforms all compared SOTA methods in their respective fields. Our code is available at https://github.com/935963004/LaBraM.
Comment: The Twelfth International Conference on Learning Representations
Published: 2024

47. DINO-SD: Champion Solution for ICRA 2024 RoboDepth Challenge

Author: Mao, Yifan, Li, Ming, Liu, Jian, Liu, Jiayang, Qin, Zihan, Chu, Chunxi, Xu, Jialei, Zhao, Wenbo, Jiang, Junjun, and Liu, Xianming
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: Surround-view depth estimation is a crucial task that aims to acquire the depth maps of the surrounding views. It has many applications in real-world scenarios such as autonomous driving, AR/VR, and 3D reconstruction. However, because most of the data in autonomous driving datasets is collected in daytime scenarios, depth models perform poorly on out-of-distribution (OoD) data. While some works try to improve the robustness of depth models under OoD data, these methods either require additional training data or lack generalizability. In this report, we introduce DINO-SD, a novel surround-view depth estimation model. DINO-SD does not need additional data and has strong robustness. DINO-SD achieved the best performance in track 4 of the ICRA 2024 RoboDepth Challenge.
Comment: Outstanding Champion in the RoboDepth Challenge (ICRA24) https://robodrive-24.github.io/
Published: 2024

48. Accurate Measurement of the Lensing Magnification by BOSS CMASS Galaxies and Its Implications for Cosmology and Dark Matter

Author: Xu, Kun, Jing, Y. P., Gao, Hongyu, Luo, Xiaolin, and Li, Ming
Subjects: Astrophysics - Cosmology and Nongalactic Astrophysics, Astrophysics - Astrophysics of Galaxies
Abstract: Magnification serves as an independent and complementary gravitational lensing measurement to shear. We develop a novel method to achieve an accurate and robust magnification measurement around BOSS CMASS galaxies across physical scales of $0.016h^{-1}{\rm Mpc} < r_{\rm p} < 10h^{-1}{\rm Mpc}$. We first measure the excess total flux density $\delta M$ of the source galaxies in deep DECaLS photometric catalog that are lensed by CMASS galaxies. We convert $\delta M$ to magnification $\mu$ by establishing the $\delta \mu-\delta M$ relation using a deeper photometric sample. By comparing magnification measurements in three optical bands ($grz$), we constrain the dust attenuation curve and its radial distribution, discovering a steep attenuation curve in the circumgalactic medium of CMASS galaxies. We further compare dust-corrected magnification measurements to model predictions from high-resolution dark matter-only (DMO) simulations in WMAP and Planck cosmologies, as well as the hydrodynamic simulation TNG300-1, using precise galaxy-halo connections from the Photometric objects Around Cosmic webs method and the accurate ray-tracing algorithm P3MLens. For $r_{\rm p} > 70h^{-1}$ kpc, our magnification measurements are in good agreement with both WMAP and Planck cosmologies. However, at $r_{\rm p} < 70h^{-1}$ kpc, we observe an excess magnification signal, which is higher than the DMO model in Planck cosmology at $2.8\sigma$ and would be exacerbated if significant baryon feedback is included. Implications of the potential small scale discrepancy for the nature of dark matter and for the processes governing galaxy formation are discussed.
Comment: 25 pages, 19 figures. Main results in Figure 9 (dust) and Figure 18 (matter). Accepted for publication in ApJ
Published: 2024

49. Scaling Laws for Discriminative Classification in Large Language Models

Author: Wyatte, Dean, Tahmasbi, Fatemeh, Li, Ming, and Markovich, Thomas
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Modern large language models (LLMs) represent a paradigm shift in what can plausibly be expected of machine learning models. The fact that LLMs can effectively generate sensible answers to a diverse range of queries suggests that they would be useful in customer support applications. While powerful, LLMs have been observed to be prone to hallucination, which unfortunately makes their near-term use in customer support applications challenging. To address this issue, we present a system that allows us to use an LLM to augment our customer support advocates by re-framing the language modeling task as a discriminative classification task. In this framing, we seek to present the top-K best template responses for a customer support advocate to use when responding to a customer. We present the results of both offline and online experiments where we observed offline gains and statistically significant online lifts for our experimental system. Along the way, we present observed scaling curves for validation loss and top-K accuracy, resulting from model parameter ablation studies. We close by discussing the space of trade-offs with respect to model size, latency, and accuracy, and suggesting future applications to explore.
Published: 2024

50. HC-GAE: The Hierarchical Cluster-based Graph Auto-Encoder for Graph Representation Learning

Author: Xu, Zhuo, Bai, Lu, Cui, Lixin, Li, Ming, Wang, Yue, and Hancock, Edwin R.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Graph Auto-Encoders (GAEs) are powerful tools for graph representation learning. In this paper, we develop a novel Hierarchical Cluster-based GAE (HC-GAE) that can learn effective structural characteristics for graph data analysis. To this end, during the encoding process, we commence by utilizing the hard node assignment to decompose a sample graph into a family of separated subgraphs. We compress each subgraph into a coarsened node, transforming the original graph into a coarsened graph. On the other hand, during the decoding process, we adopt the soft node assignment to reconstruct the original graph structure by expanding the coarsened nodes. By hierarchically performing the above compressing procedure during the encoding process as well as the expanding procedure during the decoding process, the proposed HC-GAE can effectively extract bidirectionally hierarchical structural features of the original sample graph. Furthermore, we re-design the loss function so that it can integrate the information from either the encoder or the decoder. Since the associated graph convolution operation of the proposed HC-GAE is restricted to each individual separated subgraph and cannot propagate the node information between different subgraphs, the proposed HC-GAE can significantly reduce the over-smoothing problem arising in the classical convolution-based GAEs. The proposed HC-GAE can generate effective representations for either node classification or graph classification, and the experiments demonstrate the effectiveness on real-world datasets.
Published: 2024
