67,446 results on '"LI, Chen"'
Search Results
2. BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation -- Challenges and Insights
- Author
-
Hsu, Chan-Jan, Lin, Yi-Cheng, Lin, Chia-Chun, Chen, Wei-Chih, Chung, Ho Lam, Li, Chen-An, Chen, Yi-Chang, Yu, Chien-Yu, Lee, Ming-Ji, Chen, Chien-Cheng, Huang, Ru-Heng, Lee, Hung-yi, and Shiu, Da-Shan
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
We present BreezyVoice, a Text-to-Speech (TTS) system specifically adapted for Taiwanese Mandarin, highlighting phonetic control abilities to address the unique challenges of polyphone disambiguation in the language. Building upon CosyVoice, we incorporate a $S^{3}$ tokenizer, a large language model (LLM), an optimal-transport conditional flow matching model (OT-CFM), and a grapheme to phoneme prediction model, to generate realistic speech that closely mimics human utterances. Our evaluation demonstrates BreezyVoice's superior performance in both general and code-switching contexts, highlighting its robustness and effectiveness in generating high-fidelity speech. Additionally, we address the challenges of generalizability in modeling long-tail speakers and polyphone disambiguation. Our approach significantly enhances performance and offers valuable insights into the workings of neural codec TTS systems.
- Published
- 2025
3. ContourFormer: Real-Time Contour-Based End-to-End Instance Segmentation Transformer
- Author
-
Yao, Weiwei, Li, Chen, Xiong, Minjun, Dong, Wenbo, Chen, Hao, and Xiao, Xiong
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
This paper presents Contourformer, a real-time contour-based instance segmentation algorithm. The method is fully based on the DETR paradigm and achieves end-to-end inference through iterative and progressive mechanisms to optimize contours. To improve efficiency and accuracy, we develop two novel techniques: sub-contour decoupling mechanisms and contour fine-grained distribution refinement. In the sub-contour decoupling mechanism, we propose a deformable attention-based module that adaptively selects sampling regions based on the current predicted contour, enabling more effective capturing of object boundary information. Additionally, we design a multi-stage optimization process to enhance segmentation precision by progressively refining sub-contours. The contour fine-grained distribution refinement technique aims to further improve the ability to express fine details of contours. These innovations enable Contourformer to achieve stable and precise segmentation for each instance while maintaining real-time performance. Extensive experiments demonstrate the superior performance of Contourformer on multiple benchmark datasets, including SBD, COCO, and KINS. We conduct comprehensive evaluations and comparisons with existing state-of-the-art methods, showing significant improvements in both accuracy and inference speed. This work provides a new solution for contour-based instance segmentation tasks and lays a foundation for future research, with the potential to become a strong baseline method in this field.
- Published
- 2025
4. A failed wind candidate in NGC 3783 from the 2001 year campaign with Chandra/HETGS
- Author
-
Li, Chen, Kaastra, Jelle S., Gu, Liyi, Rogantini, Daniele, Juráňová, Anna, Mehdipour, Missagh, and de Plaa, Jelle
- Subjects
Astrophysics - High Energy Astrophysical Phenomena, Astrophysics - Astrophysics of Galaxies
- Abstract
We reanalyze the Chandra/HETGS observations of NGC 3783 from the campaign in the year 2001, identifying significant spectral variations in the Fe unresolved transition array (UTA) over timescales of weeks to months. These changes correlate with a $1.4-2$ fold increase in the ionizing continuum and exceed $10 \, \sigma$ significance. The variations primarily originate from a low-ionization state ($\rm log \xi = 1.65$) component of the warm absorber. Time-dependent photoionization modelling confirms the sensitivity of this low-ionization component to continuum variations within the Fe UTA band. Local fitting indicates a lower density limit of $>10^{12.3} \, \rm m^{-3}$ at $3 \, \sigma$ statistical uncertainty, with the component located within $0.27 \, \rm pc$. Our findings suggest that this low-ionization component is a potential failed wind candidate.
Comment: Accepted for publication in A&A, 10 pages, 12 figures
- Published
- 2025
5. Self-Adapted Josephson Oscillation of Dark-Bright Solitons under Constant Forces
- Author
-
Meng, Ling-Zheng, Luo, Xi-Wang, and Zhao, Li-Chen
- Subjects
Condensed Matter - Quantum Gases, Nonlinear Sciences - Pattern Formation and Solitons
- Abstract
We study the propagation of dark-bright solitons in two-component Bose-Einstein condensates (BECs) with general nonlinear parameters, and explore how nonlinear interactions enrich the soliton dynamics giving rise to nonsinusoidal oscillations under constant forces. Treating the bright soliton as an effective barrier, we reveal that such oscillations are characterized by the Josephson equations with self-adapted critical current and bias voltage, whose explicit analytic expressions are derived using the Lagrangian variational method. The dynamical phase diagram in nonlinear parameter space is presented, identifying oscillation regions with different skewed sinusoidal dependence, and diffusion regions with irreversible soliton spreading due to instability of the barrier. Furthermore, we obtain periodic dispersion relations of the solitons, indicating a switch between positive and negative inertial masses, consistent with the oscillation behaviors. Our results provide a general and comprehensive theoretical framework for soliton oscillation dynamics and pave the way for investigating various nonlinear transports and their potential applications.
Comment: 9 pages, 5 figures
- Published
- 2025
6. Boundary Value Test Input Generation Using Prompt Engineering with LLMs: Fault Detection and Coverage Analysis
- Author
-
Guo, Xiujing, Li, Chen, and Tsuchiya, Tatsuhiro
- Subjects
Computer Science - Software Engineering
- Abstract
As software systems grow more complex, automated testing has become essential to ensuring reliability and performance. Traditional methods for boundary value test input generation can be time-consuming and may struggle to address all potential error cases effectively, especially in systems with intricate or highly variable boundaries. This paper presents a framework for assessing the effectiveness of large language models (LLMs) in generating boundary value test inputs for white-box software testing by examining their potential through prompt engineering. Specifically, we evaluate the effectiveness of LLM-based test input generation by analyzing fault detection rates and test coverage, comparing these LLM-generated test sets with those produced using traditional boundary value analysis methods. Our analysis shows the strengths and limitations of LLMs in boundary value generation, particularly in detecting common boundary-related issues. However, they still face challenges in certain areas, especially when handling complex or less common test inputs. This research provides insights into the role of LLMs in boundary value testing, underscoring both their potential and areas for improvement in automated testing methods.
- Published
- 2025
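The "traditional boundary value analysis" baseline mentioned in entry 6 can be illustrated with a minimal sketch. This is not the paper's framework: the function names `boundary_values` and `is_valid_age`, and the 0 to 120 range, are hypothetical and chosen only to show the classic technique of testing just below, on, and just above each edge of a valid range.

```python
def boundary_values(lo: int, hi: int) -> list[int]:
    """Return classic boundary-value test inputs for the closed integer range [lo, hi].

    Hypothetical helper for illustration: it probes one step below, on,
    and one step above each boundary.
    """
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    candidates = [lo - 1, lo, lo + 1, hi - 1, hi, hi + 1]
    # A set removes duplicates that appear when the range is very narrow.
    return sorted(set(candidates))


def is_valid_age(age: int) -> bool:
    """Toy function under test (an assumption, not from the paper)."""
    return 0 <= age <= 120


if __name__ == "__main__":
    for value in boundary_values(0, 120):
        print(value, is_valid_age(value))
```

An off-by-one fault such as writing `0 < age` instead of `0 <= age` would be exposed by the input `0`, which is the kind of boundary-related issue the abstract refers to.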
7. UI-TARS: Pioneering Automated GUI Interaction with Native Agents
- Author
-
Qin, Yujia, Ye, Yining, Fang, Junjie, Wang, Haoming, Liang, Shihao, Tian, Shizuo, Zhang, Junda, Li, Jiahao, Li, Yunxin, Huang, Shijue, Zhong, Wanjun, Li, Kuanye, Yang, Jiale, Miao, Yu, Lin, Woyu, Liu, Longxiang, Jiang, Xu, Ma, Qianli, Li, Jingyu, Xiao, Xiaojun, Cai, Kai, Li, Chuang, Zheng, Yaowei, Jin, Chaolin, Li, Chen, Zhou, Xiao, Wang, Minchao, Chen, Haoli, Li, Zhaojian, Yang, Haihua, Liu, Haifeng, Lin, Feng, Peng, Tao, Liu, Xin, and Shi, Guang
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Human-Computer Interaction
- Abstract
This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations). Unlike prevailing agent frameworks that depend on heavily wrapped commercial models (e.g., GPT-4o) with expert-crafted prompts and workflows, UI-TARS is an end-to-end model that outperforms these sophisticated frameworks. Experiments demonstrate its superior performance: UI-TARS achieves SOTA performance in 10+ GUI agent benchmarks evaluating perception, grounding, and GUI task execution. Notably, in the OSWorld benchmark, UI-TARS achieves scores of 24.6 with 50 steps and 22.7 with 15 steps, outperforming Claude (22.0 and 14.9 respectively). In AndroidWorld, UI-TARS achieves 46.6, surpassing GPT-4o (34.5). UI-TARS incorporates several key innovations: (1) Enhanced Perception: leveraging a large-scale dataset of GUI screenshots for context-aware understanding of UI elements and precise captioning; (2) Unified Action Modeling, which standardizes actions into a unified space across platforms and achieves precise grounding and interaction through large-scale action traces; (3) System-2 Reasoning, which incorporates deliberate reasoning into multi-step decision making, involving multiple reasoning patterns such as task decomposition, reflection thinking, milestone recognition, etc. (4) Iterative Training with Reflective Online Traces, which addresses the data bottleneck by automatically collecting, filtering, and reflectively refining new interaction traces on hundreds of virtual machines. Through iterative training and reflection tuning, UI-TARS continuously learns from its mistakes and adapts to unforeseen situations with minimal human intervention. We also analyze the evolution path of GUI agents to guide the further development of this domain.
- Published
- 2025
8. TAD-Bench: A Comprehensive Benchmark for Embedding-Based Text Anomaly Detection
- Author
-
Cao, Yang, Yang, Sikun, Li, Chen, Xiang, Haolong, Qi, Lianyong, Liu, Bo, Li, Rongsheng, and Liu, Ming
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Text anomaly detection is crucial for identifying spam, misinformation, and offensive language in natural language processing tasks. Despite the growing adoption of embedding-based methods, their effectiveness and generalizability across diverse application scenarios remain under-explored. To address this, we present TAD-Bench, a comprehensive benchmark designed to systematically evaluate embedding-based approaches for text anomaly detection. TAD-Bench integrates multiple datasets spanning different domains, combining state-of-the-art embeddings from large language models with a variety of anomaly detection algorithms. Through extensive experiments, we analyze the interplay between embeddings and detection methods, uncovering their strengths, weaknesses, and applicability to different tasks. These findings offer new perspectives on building more robust, efficient, and generalizable anomaly detection systems for real-world applications.
- Published
- 2025
9. Mining Intraday Risk Factor Collections via Hierarchical Reinforcement Learning based on Transferred Options
- Author
-
Xu, Wenyan, Chen, Jiayu, Li, Chen, Hu, Yonghong, and Lu, Zhonghua
- Subjects
Computer Science - Computational Engineering, Finance, and Science
- Abstract
Traditional risk factors like beta, size/value, and momentum often lag behind market dynamics in measuring and predicting stock return volatility. Statistical models like PCA and factor analysis fail to capture hidden nonlinear relationships. Genetic programming (GP) can identify nonlinear factors but often lacks mechanisms for evaluating factor quality, and the resulting formulas are complex. To address these challenges, we propose a Hierarchical Proximal Policy Optimization (HPPO) framework for automated factor generation and evaluation. HPPO uses two PPO models: a high-level policy assigns weights to stock features, and a low-level policy identifies latent nonlinear relationships. The Pearson correlation between generated factors and return volatility serves as the reward signal. Transfer learning pre-trains the high-level policy on large-scale historical data, fine-tuning it with the latest data to adapt to new features and shifts. Experiments show the HPPO-TO algorithm achieves a 25% excess return in HFT markets across China (CSI 300/800), India (Nifty 100), and the US (S&P 500). Code and data are available at https://github.com/wencyxu/HRL-HF_risk_factor_set.
Comment: accepted by AAAI25 workshop Full research papers
- Published
- 2025
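The reward signal described in entry 9, the Pearson correlation between a generated factor and return volatility, can be sketched as follows. This is only an illustration of the correlation itself, not the authors' HPPO code: the name `pearson_reward` and the choice to return 0.0 for a constant series are assumptions.

```python
import math
from typing import Sequence


def pearson_reward(factor: Sequence[float], volatility: Sequence[float]) -> float:
    """Pearson correlation between a candidate factor and return volatility.

    Hypothetical reward function: returns a value in [-1, 1], or 0.0 when
    either series is constant (an assumed convention, since the correlation
    is undefined in that case).
    """
    n = len(factor)
    if n != len(volatility) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_f = sum(factor) / n
    mean_v = sum(volatility) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((f - mean_f) * (v - mean_v) for f, v in zip(factor, volatility))
    var_f = sum((f - mean_f) ** 2 for f in factor)
    var_v = sum((v - mean_v) ** 2 for v in volatility)
    if var_f == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_f * var_v)
```

A factor that moves in lockstep with volatility earns a reward near 1, while an uninformative factor earns a reward near 0.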
10. Exact Constraint of Density Functional Approximations at the Semiclassical Limit
- Author
-
Li, Yunzhi and Li, Chen
- Subjects
Physics - Computational Physics
- Abstract
We introduce the semiclassical limit to electronic systems by taking the limit $\hbar\rightarrow 0$ in the solution of Schrödinger equations. We show that this limit is closely related to one type of strong correlation that is particularly challenging from a conventional multi-configurational perspective but can be readily described through semiclassical analysis. Furthermore, by studying the performance of density functional approximations (DFAs) in the semiclassical limit, we find that mainstream DFAs have erroneous divergent energy behaviors as $\hbar \rightarrow 0$, violating the exact constraint of finite energy. Importantly, by connecting the significantly underestimated DFA energies of many strongly correlated transition-metal diatomic molecules to their rather small estimated $\hbar_{\text{eff}}$, we demonstrate the usefulness of our semiclassical analysis and its promise for inspiring better DFAs.
- Published
- 2025
11. HUNYUANPROVER: A Scalable Data Synthesis Framework and Guided Tree Search for Automated Theorem Proving
- Author
-
Li, Yang, Du, Dong, Song, Linfeng, Li, Chen, Wang, Weikang, Yang, Tao, and Mi, Haitao
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computation and Language
- Abstract
We introduce HunyuanProver, a language model finetuned from the Hunyuan 7B for interactive automatic theorem proving with LEAN4. To alleviate the data sparsity issue, we design a scalable framework to iteratively synthesize data with low cost. Besides, guided tree search algorithms are designed to enable effective "system 2 thinking" of the prover. HunyuanProver achieves state-of-the-art (SOTA) performances on major benchmarks. Specifically, it achieves a pass rate of 68.4% on the miniF2F-test compared to 65.9%, the current SOTA results. It proves 4 IMO statements (imo_1960_p2, imo_1962_p2, imo_1964_p2 and imo_1983_p6) in miniF2F-test. To benefit the community, we will open-source a dataset of 30k synthesized instances, where each instance contains the original question in natural language, the converted statement by autoformalization, and the proof by HunyuanProver.
- Published
- 2024
12. Gx2Mol: De Novo Generation of Hit-like Molecules from Gene Expression Profiles via Deep Learning
- Author
-
Li, Chen, Matsukiyo, Yuki, and Yamanishi, Yoshihiro
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Quantitative Biology - Quantitative Methods
- Abstract
De novo generation of hit-like molecules is a challenging task in the drug discovery process. Most methods in previous studies learn the semantics and syntax of molecular structures by analyzing molecular graphs or simplified molecular input line entry system (SMILES) strings; however, they do not take into account the drug responses of the biological systems consisting of genes and proteins. In this study we propose a deep generative model, Gx2Mol, which utilizes gene expression profiles to generate molecular structures with desirable phenotypes for arbitrary target proteins. In the algorithm, a variational autoencoder is employed as a feature extractor to learn the latent feature distribution of the gene expression profiles. Then, a long short-term memory is leveraged as the chemical generator to produce syntactically valid SMILES strings that satisfy the feature conditions of the gene expression profile extracted by the feature extractor. Experimental results and case studies demonstrate that the proposed Gx2Mol model can produce new molecules with potential bioactivities and drug-like properties.
- Published
- 2024
13. DISC: Plug-and-Play Decoding Intervention with Similarity of Characters for Chinese Spelling Check
- Author
-
Qiao, Ziheng, Zhou, Houquan, Liu, Yumeng, Li, Zhenghua, Zhang, Min, Zhang, Bo, Li, Chen, Zhang, Ji, and Huang, Fei
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
One key characteristic of the Chinese spelling check (CSC) task is that incorrect characters are usually similar to the correct ones in either phonetics or glyph. To accommodate this, previous works usually leverage confusion sets, which suffer from two problems, i.e., difficulty in determining which character pairs to include and lack of probabilities to distinguish items in the set. In this paper, we propose a light-weight plug-and-play DISC (i.e., decoding intervention with similarity of characters) module for CSC models. DISC measures phonetic and glyph similarities between characters and incorporates this similarity information only during the inference phase. This method can be easily integrated into various existing CSC models, such as ReaLiSe, SCOPE, and ReLM, without additional training costs. Experiments on three CSC benchmarks demonstrate that our proposed method significantly improves model performance, approaching and even surpassing the current state-of-the-art models.
- Published
- 2024
14. RemDet: Rethinking Efficient Model Design for UAV Object Detection
- Author
-
Li, Chen, Zhao, Rui, Wang, Zeyu, Xu, Huiying, and Zhu, Xinzhong
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Object detection in Unmanned Aerial Vehicle (UAV) images has emerged as a focal area of research, which presents two significant challenges: i) objects are typically small and dense within vast images; ii) computational resource constraints render most models unsuitable for real-time deployment. Current real-time object detectors are not optimized for UAV images, and complex methods designed for small object detection often lack real-time capabilities. To address these challenges, we propose a novel detector, RemDet (Reparameter efficient multiplication Detector). Our contributions are as follows: 1) We rethink the challenges of existing detectors for small and dense UAV images, and propose information loss as a design guideline for efficient models. 2) We introduce the ChannelC2f module to enhance small object detection performance, demonstrating that high-dimensional representations can effectively mitigate information loss. 3) We design the GatedFFN module to provide not only strong performance but also low latency, effectively addressing the challenges of real-time detection. Our research reveals that GatedFFN, through the use of multiplication, is more cost-effective than feed-forward networks for high-dimensional representation. 4) We propose the CED module, which combines the advantages of ViT and CNN downsampling to effectively reduce information loss. It specifically enhances context information for small and dense objects. Extensive experiments on large UAV datasets, VisDrone and UAVDT, validate the real-time efficiency and superior performance of our methods. On the challenging UAV dataset VisDrone, our methods not only provide state-of-the-art results, improving detection by more than 3.4%, but also achieve 110 FPS on a single 4090.
Comment: Accepted to AAAI25
- Published
- 2024
15. Reinforcement Learning Within the Classical Robotics Stack: A Case Study in Robot Soccer
- Author
-
Labiosa, Adam, Wang, Zhihan, Agarwal, Siddhant, Cong, William, Hemkumar, Geethika, Harish, Abhinav Narayan, Hong, Benjamin, Kelle, Josh, Li, Chen, Li, Yuhao, Shao, Zisen, Stone, Peter, and Hanna, Josiah P.
- Subjects
Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Robot decision-making in partially observable, real-time, dynamic, and multi-agent environments remains a difficult and unsolved challenge. Model-free reinforcement learning (RL) is a promising approach to learning decision-making in such domains; however, end-to-end RL in complex environments is often intractable. To address this challenge in the RoboCup Standard Platform League (SPL) domain, we developed a novel architecture integrating RL within a classical robotics stack, while employing a multi-fidelity sim2real approach and decomposing behavior into learned sub-behaviors with heuristic selection. Our architecture led to victory in the 2024 RoboCup SPL Challenge Shield Division. In this work, we fully describe our system's architecture and empirically analyze key design decisions that contributed to its success. Our approach demonstrates how RL-based behaviors can be integrated into complete robot behavior architectures.
Comment: Submitted to ICRA 2025
- Published
- 2024
16. Collisional scattering of strongly interacting D-band Feshbach molecules in optical lattices
- Author
-
Wei, Fansu, Lai, Chi-Kin, Chen, Yuying, Zhang, Zhengxi, Liang, Yun, Shui, Hongmian, Li, Chen, and Zhou, Xiaoji
- Subjects
Condensed Matter - Quantum Gases, Quantum Physics
- Abstract
The excited bands in optical lattices manifest an important tool for studying quantum simulation and many-body physics, making it crucial to measure high-band scattering dynamics under strong interactions. This work investigates both experimentally and theoretically the collisional scattering of $^{6}\rm Li_2$ molecular Bose-Einstein condensate in the $D$ band of a one-dimensional optical lattice, with interaction strength directly tunable via magnetic Feshbach resonance. We find a clear dependence of the $D$-band lifetimes on the interaction strength within the strongly interacting regime, which arises from the fact that the scattering cross-section is proportional to the square of the scattering length. The maximum lifetime versus lattice depth is measured to reveal the effects of interactions. We also investigate the scattering channels of $D$-band molecules under different interaction levels and develop a reliable two-body scattering rate equation. This work provides insight into the interplay between interaction and the collisional scattering of high-band bosons in optical lattices, paving the way for research into strong correlation effects in high-band lattice systems.
- Published
- 2024
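Entry 16 notes that the scattering cross-section is proportional to the square of the scattering length. As a reminder of where that scaling comes from, the standard textbook s-wave result for identical bosons is given below. It is a general relation, not the paper's own two-body rate equation. Here $a$ is the scattering length and $k$ the relative wave number, both introduced only for this illustration.

```latex
\sigma(k) \;=\; \frac{8\pi a^{2}}{1 + k^{2}a^{2}}
\qquad\Longrightarrow\qquad
\sigma \;\approx\; 8\pi a^{2} \quad \text{for } k|a| \ll 1 .
```

In the opposite limit, $k|a| \gg 1$, the same expression saturates at $8\pi/k^{2}$, so the $a^{2}$ scaling holds only while $k|a|$ remains small.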
17. Manipulating topological charges via engineering zeros of wave functions
- Author
-
Li, Xiao-Lin, Gong, Ming, Wang, Yu-Hao, and Zhao, Li-Chen
- Subjects
Condensed Matter - Quantum Gases, Nonlinear Sciences - Pattern Formation and Solitons, Quantum Physics
- Abstract
Topological charges are typically manipulated by managing their energy bands in quantum systems. In this work, we propose a new approach to manipulate the topological charges of systems by engineering density zeros of localized wave excitations in them. We demonstrate via numerical simulation and analytical analysis that the winding number of a toroidal Bose condensate can be well manipulated by engineering the relative velocities between the dark solitons and their backgrounds. The crossing of relative velocities through zero makes a change in winding number by inducing density zeros during acceleration, with the direction of crossing determining whether charge increases or decreases. Possibilities of observing such winding number manipulation are discussed for current experimental settings. This idea may also be extended to higher dimensions. These results will inspire new pathways in designing topological materials using quantum simulation platforms.
Comment: 12 pages, 10 figures
- Published
- 2024
18. VidMusician: Video-to-Music Generation with Semantic-Rhythmic Alignment via Hierarchical Visual Features
- Author
-
Li, Sifei, Yang, Binxin, Yin, Chunji, Sun, Chong, Zhang, Yuxin, Dong, Weiming, and Li, Chen
- Subjects
Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Video-to-music generation presents significant potential in video production, requiring the generated music to be both semantically and rhythmically aligned with the video. Achieving this alignment demands advanced music generation capabilities, sophisticated video understanding, and an efficient mechanism to learn the correspondence between the two modalities. In this paper, we propose VidMusician, a parameter-efficient video-to-music generation framework built upon text-to-music models. VidMusician leverages hierarchical visual features to ensure semantic and rhythmic alignment between video and music. Specifically, our approach utilizes global visual features as semantic conditions and local visual features as rhythmic cues. These features are integrated into the generative backbone via cross-attention and in-attention mechanisms, respectively. Through a two-stage training process, we incrementally incorporate semantic and rhythmic features, utilizing zero initialization and identity initialization to maintain the inherent music-generative capabilities of the backbone. Additionally, we construct a diverse video-music dataset, DVMSet, encompassing various scenarios, such as promo videos, commercials, and compilations. Experiments demonstrate that VidMusician outperforms state-of-the-art methods across multiple evaluation metrics and exhibits robust performance on AI-generated videos. Samples are available at https://youtu.be/EPOSXwtl1jw.
- Published
- 2024
19. TopoCellGen: Generating Histopathology Cell Topology with a Diffusion Model
- Author
-
Xu, Meilong, Gupta, Saumya, Hu, Xiaoling, Li, Chen, Abousamra, Shahira, Samaras, Dimitris, Prasanna, Prateek, and Chen, Chao
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Accurately modeling multi-class cell topology is crucial in digital pathology, as it provides critical insights into tissue structure and pathology. The synthetic generation of cell topology enables realistic simulations of complex tissue environments, enhances downstream tasks by augmenting training data, aligns more closely with pathologists' domain knowledge, and offers new opportunities for controlling and generalizing the tumor microenvironment. In this paper, we propose a novel approach that integrates topological constraints into a diffusion model to improve the generation of realistic, contextually accurate cell topologies. Our method refines the simulation of cell distributions and interactions, increasing the precision and interpretability of results in downstream tasks such as cell detection and classification. To assess the topological fidelity of generated layouts, we introduce a new metric, Topological Frechet Distance (TopoFD), which overcomes the limitations of traditional metrics like FID in evaluating topological structure. Experimental results demonstrate the effectiveness of our approach in generating multi-class cell layouts that capture intricate topological relationships.
Comment: 14 pages, 7 figures
- Published
- 2024
20. Noise Adaptor: Enhancing Low-Latency Spiking Neural Networks through Noise-Injected Low-Bit ANN Conversion
- Author
-
Li, Chen and Rajendran, Bipin
- Subjects
Computer Science - Neural and Evolutionary Computing
- Abstract
We present Noise Adaptor, a novel method for constructing competitive low-latency spiking neural networks (SNNs) by converting noise-injected, low-bit artificial neural networks (ANNs). This approach builds on existing ANN-to-SNN conversion techniques but offers several key improvements: (1) By injecting noise during quantized ANN training, Noise Adaptor better accounts for the dynamic differences between ANNs and SNNs, significantly enhancing SNN accuracy. (2) Unlike previous methods, Noise Adaptor does not require the application of run-time noise correction techniques in SNNs, thereby avoiding modifications to the spiking neuron model and control flow during inference. (3) Our method extends the capability of handling deeper architectures, achieving successful conversions of activation-quantized ResNet-101 and ResNet-152 to SNNs. We demonstrate the effectiveness of our method on CIFAR-10 and ImageNet, achieving competitive performance. The code will be made available as open-source.
- Published
- 2024
21. Efficient Deployment of Transformer Models in Analog In-Memory Computing Hardware
- Author
-
Li, Chen, Lammie, Corey, Gallo, Manuel Le, and Rajendran, Bipin
- Subjects
Computer Science - Hardware Architecture, Computer Science - Machine Learning
- Abstract
Analog in-memory computing (AIMC) has emerged as a promising solution to overcome the von Neumann bottleneck, accelerating neural network computations and improving computational efficiency. While AIMC has demonstrated success with architectures such as CNNs, MLPs, and RNNs, deploying transformer-based models using AIMC presents unique challenges. Transformers are expected to handle diverse downstream tasks and adapt to new user data or instructions after deployment, which requires more flexible approaches to suit AIMC constraints. In this paper, we propose a novel method for deploying pre-trained transformer models onto AIMC hardware. Unlike traditional approaches requiring hardware-aware training, our technique allows direct deployment without the need for retraining the original model. Instead, we utilize lightweight, low-rank adapters -- compact modules stored in digital cores -- to adapt the model to hardware constraints. We validate our approach on MobileBERT, demonstrating accuracy on par with, or even exceeding, a traditional hardware-aware training approach. Our method is particularly appealing in multi-task scenarios, as it enables a single analog model to be reused across multiple tasks. Moreover, it supports on-chip adaptation to new hardware constraints and tasks without updating analog weights, providing a flexible and versatile solution for real-world AI applications. Code is available.
- Published
- 2024
22. Cavity-Quantum Electrodynamics with Moiré Flatband Photonic Crystals
- Author
-
Wang, Yu-Tong, Ye, Qi-Hang, Yan, Jun-Yong, Qiao, Yufei, Chen, Chen, Cheng, Xiao-Tian, Li, Chen-Hui, Zhang, Zi-Jian, Huang, Cheng-Nian, Meng, Yun, Zou, Kai, Zhan, Wen-Kang, Zhao, Chao, Hu, Xiaolong, Tee, Clarence Augustine T H, Sha, Wei E. I., Huang, Zhixiang, Liu, Huiyun, Jin, Chao-Yuan, Ying, Lei, and Liu, Feng
- Subjects
Physics - Optics, Condensed Matter - Mesoscale and Nanoscale Physics, Quantum Physics
- Abstract
Quantum emitters are a key component in photonic quantum technologies. Enhancing their single-photon emission by engineering the photonic environment using cavities can significantly improve the overall efficiency in quantum information processing. However, this enhancement is often constrained by the need for precise nanoscale control over the emitter's position within micro- or nano-cavities. Inspired by the fascinating physics of moiré patterns, we present an approach to strongly modify the spontaneous emission rate of a quantum emitter using a finely designed multilayer moiré photonic crystal with a robust isolated-flatband dispersion. Theoretical analysis reveals that, due to its nearly infinite photonic density of states, the moiré cavity can simultaneously achieve a high Purcell factor and exhibit large tolerance over the emitter's position. We experimentally demonstrate the coupling between this moiré cavity and a quantum dot through the cavity-determined polarization of the dot's emission. The radiative lifetime of the quantum dot can be tuned by a factor of 40, ranging from 42 ps to 1692 ps, which is attributed to strong Purcell enhancement and Purcell inhibition effects. Our findings pave the way for moiré flatband cavity-enhanced quantum light sources, quantum optical switches, and quantum nodes for quantum internet applications.
- Published
- 2024
23. Morph: A Motion-free Physics Optimization Framework for Human Motion Generation
- Author
-
Li, Zhuo, Luo, Mingshuang, Hou, Ruibing, Zhao, Xin, Liu, Hao, Chang, Hong, Liu, Zimo, and Li, Chen
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Human motion generation plays a vital role in applications such as digital humans and humanoid robot control. However, most existing approaches disregard physics constraints, leading to the frequent production of physically implausible motions with pronounced artifacts such as floating and foot sliding. In this paper, we propose Morph, a Motion-free physics optimization framework, comprising a Motion Generator and a Motion Physics Refinement module, for enhancing physical plausibility without relying on costly real-world motion data. Specifically, the Motion Generator is responsible for providing large-scale synthetic motion data, while the Motion Physics Refinement Module utilizes these synthetic data to train a motion imitator within a physics simulator, enforcing physical constraints to project the noisy motions into a physically-plausible space. These physically refined motions, in turn, are used to fine-tune the Motion Generator, further enhancing its capability. Experiments on both text-to-motion and music-to-dance generation tasks demonstrate that our framework achieves state-of-the-art motion generation quality while improving physical plausibility drastically. Comment: 15 pages, 6 figures
- Published
- 2024
24. Stacking Brick by Brick: Aligned Feature Isolation for Incremental Face Forgery Detection
- Author
-
Cheng, Jikang, Yan, Zhiyuan, Zhang, Ying, Hao, Li, Ai, Jiaxin, Zou, Qin, Li, Chen, and Wang, Zhongyuan
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The rapid advancement of face forgery techniques has introduced a growing variety of forgeries. Incremental Face Forgery Detection (IFFD), involving gradually adding new forgery data to fine-tune the previously trained model, has been introduced as a promising strategy to deal with evolving forgery methods. However, a naively trained IFFD model is prone to catastrophic forgetting when new forgeries are integrated, as treating all forgeries as a single "Fake" class in the Real/Fake classification can cause different forgery types to override one another, thereby resulting in the forgetting of unique characteristics from earlier tasks and limiting the model's effectiveness in learning forgery specificity and generality. In this paper, we propose to stack the latent feature distributions of previous and new tasks brick by brick, i.e., achieving aligned feature isolation. In this manner, we aim to preserve learned forgery information and accumulate new knowledge by minimizing distribution overriding, thereby mitigating catastrophic forgetting. To achieve this, we first introduce Sparse Uniform Replay (SUR) to obtain the representative subsets that could be treated as the uniformly sparse versions of the previous global distributions. We then propose a Latent-space Incremental Detector (LID) that leverages SUR data to isolate and align distributions. For evaluation, we construct a more advanced and comprehensive benchmark tailored for IFFD. The leading experimental results validate the superiority of our method.
- Published
- 2024
25. A Monocular SLAM-based Multi-User Positioning System with Image Occlusion in Augmented Reality
- Author
-
Lien, Wei-Hsiang, Chandra, Benedictus Kent, Fischer, Robin, Tang, Ya-Hui, Wang, Shiann-Jang, Hsu, Wei-En, and Fu, Li-Chen
- Subjects
Computer Science - Human-Computer Interaction ,Computer Science - Computer Vision and Pattern Recognition - Abstract
In recent years, with the rapid development of augmented reality (AR) technology, there is an increasing demand for multi-user collaborative experiences. Unlike for single-user experiences, ensuring the spatial localization of every user and maintaining synchronization and consistency of positioning and orientation across multiple users is a significant challenge. In this paper, we propose a multi-user localization system based on ORB-SLAM2 using monocular RGB images as a development platform based on the Unity 3D game engine. This system not only performs user localization but also places a common virtual object on a planar surface (such as a table) in the environment so that every user holds a proper perspective view of the object. These generated virtual objects serve as reference points for multi-user position synchronization. The positioning information is passed among every user's AR devices via a central server, based on which the relative position and movement of other users in the space of a specific user are presented via virtual avatars, all with respect to these virtual objects. In addition, we use deep learning techniques to estimate the depth map of an image from a single RGB image to solve occlusion problems in AR applications, making virtual objects appear more natural in AR scenes.
- Published
- 2024
26. DiHuR: Diffusion-Guided Generalizable Human Reconstruction
- Author
-
Chen, Jinnan, Li, Chen, and Lee, Gim Hee
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
We introduce DiHuR, a novel Diffusion-guided model for generalizable Human 3D Reconstruction and view synthesis from sparse, minimally overlapping images. While existing generalizable human radiance fields excel at novel view synthesis, they often struggle with comprehensive 3D reconstruction. Similarly, directly optimizing implicit Signed Distance Function (SDF) fields from sparse-view images typically yields poor results due to limited overlap. To enhance 3D reconstruction quality, we propose using learnable tokens associated with SMPL vertices to aggregate sparse view features and then to guide SDF prediction. These tokens learn a generalizable prior across different identities in training datasets, leveraging the consistent projection of SMPL vertices onto similar semantic areas across various human identities. This consistency enables effective knowledge transfer to unseen identities during inference. Recognizing SMPL's limitations in capturing clothing details, we incorporate a diffusion model as an additional prior to fill in missing information, particularly for complex clothing geometries. Our method integrates two key priors in a coherent manner: the prior from generalizable feed-forward models and the 2D diffusion prior, and it requires only multi-view image training, without 3D supervision. DiHuR demonstrates superior performance in both within-dataset and cross-dataset generalization settings, as validated on THuman, ZJU-MoCap, and HuMMan datasets compared to existing methods. Comment: Accepted to WACV 2025
- Published
- 2024
27. Building a Taiwanese Mandarin Spoken Language Model: A First Attempt
- Author
-
Yang, Chih-Kai, Fu, Yu-Kuan, Li, Chen-An, Lin, Yi-Cheng, Lin, Yu-Xiang, Chen, Wei-Chih, Chung, Ho Lam, Kuan, Chun-Yi, Huang, Wei-Ping, Lu, Ke-Han, Lin, Tzu-Quan, Wang, Hsiu-Hsuan, Hu, En-Pei, Hsu, Chan-Jan, Tseng, Liang-Hsuan, Chiu, I-Hsiang, Sanga, Ulin, Chen, Xuanjun, Hsu, Po-chun, Yang, Shu-wen, and Lee, Hung-yi
- Subjects
Computer Science - Computation and Language ,Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
This technical report presents our initial attempt to build a spoken large language model (LLM) for Taiwanese Mandarin, specifically tailored to enable real-time, speech-to-speech interaction in multi-turn conversations. Our end-to-end model incorporates a decoder-only transformer architecture and aims to achieve seamless interaction while preserving the conversational flow, including full-duplex capabilities allowing simultaneous speaking and listening. The paper also details the training process, including data preparation with synthesized dialogues and adjustments for real-time interaction. We also developed a platform to evaluate conversational fluency and response coherence in multi-turn dialogues. We hope the release of the report can contribute to the future development of spoken LLMs in Taiwanese Mandarin. Comment: Work in progress
- Published
- 2024
28. Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
- Author
-
Huang, Chien-yu, Chen, Wei-Chih, Yang, Shu-wen, Liu, Andy T., Li, Chen-An, Lin, Yu-Xiang, Tseng, Wei-Cheng, Diwan, Anuj, Shih, Yi-Jen, Shi, Jiatong, Chen, William, Chen, Xuanjun, Hsiao, Chi-Yuan, Peng, Puyuan, Wang, Shih-Heng, Kuan, Chun-Yi, Lu, Ke-Han, Chang, Kai-Wei, Yang, Chih-Kai, Ritter-Gutierrez, Fabian, Chuang, Ming To, Huang, Kuan-Po, Arora, Siddhant, Lin, You-Kuan, Yeo, Eunjung, Chang, Kalvin, Chien, Chung-Ming, Choi, Kwanghee, Hsieh, Cheng-Hsiu, Lin, Yi-Cheng, Yu, Chee-En, Chiu, I-Hsiang, Guimarães, Heitor R., Han, Jionghao, Lin, Tzu-Quan, Lin, Tzu-Yuan, Chang, Homu, Chang, Ting-Wu, Chen, Chun Wei, Chen, Shou-Jen, Chen, Yu-Hua, Cheng, Hsi-Chun, Dhawan, Kunal, Fang, Jia-Lin, Fang, Shi-Xin, Chiang, Kuan-Yu Fang, Fu, Chi An, Hsiao, Hsien-Fu, Hsu, Ching Yu, Huang, Shao-Syuan, Wei, Lee Chen, Lin, Hsi-Che, Lin, Hsuan-Hao, Lin, Hsuan-Ting, Lin, Jian-Ren, Liu, Ting-Chun, Lu, Li-Chun, Pai, Tsung-Min, Pasad, Ankita, Kuan, Shih-Yun Shan, Shon, Suwon, Tang, Yuxun, Tsai, Yun-Shao, Wei, Jui-Chiang, Wei, Tzu-Chieh, Wu, Chengxi, Wu, Dien-Ruei, Yang, Chao-Han Huck, Yang, Chieh-Chi, Yip, Jia Qi, Yuan, Shao-Xiang, Noroozi, Vahid, Chen, Zhehuai, Wu, Haibin, Livescu, Karen, Harwath, David, Watanabe, Shinji, and Lee, Hung-yi
- Subjects
Computer Science - Computation and Language ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Multimodal foundation models, such as Gemini and ChatGPT, have revolutionized human-machine interactions by seamlessly integrating various forms of data. Developing a universal spoken language model that comprehends a wide range of natural language instructions is critical for bridging communication gaps and facilitating more intuitive interactions. However, the absence of a comprehensive evaluation benchmark poses a significant challenge. We present Dynamic-SUPERB Phase-2, an open and evolving benchmark for the comprehensive evaluation of instruction-based universal speech models. Building upon the first generation, this second version incorporates 125 new tasks contributed collaboratively by the global research community, expanding the benchmark to a total of 180 tasks, making it the largest benchmark for speech and audio evaluation. While the first generation of Dynamic-SUPERB was limited to classification tasks, Dynamic-SUPERB Phase-2 broadens its evaluation capabilities by introducing a wide array of novel and diverse tasks, including regression and sequence generation, across speech, music, and environmental audio. Evaluation results indicate that none of the models performed well universally. SALMONN-13B excelled in English ASR, while WavLLM demonstrated high accuracy in emotion recognition, but current models still require further innovations to handle a broader range of tasks. We will soon open-source all task data and the evaluation pipeline.
- Published
- 2024
29. PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance
- Author
-
Liu, Ruyang, Tang, Haoran, Liu, Haibo, Ge, Yixiao, Shan, Ying, Li, Chen, and Yang, Jiankun
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The past year has witnessed the significant advancement of video-based large language models. However, the challenge of developing a unified model for both short and long video understanding remains unresolved. Most existing video LLMs cannot handle hour-long videos, while methods customized for long videos tend to be ineffective for shorter videos and images. In this paper, we identify the key issue as the redundant content in videos. To address this, we propose a novel pooling strategy that simultaneously achieves token compression and instruction-aware visual feature aggregation. Our model is termed Prompt-guided Pooling LLaVA, or PPLLaVA for short. Specifically, PPLLaVA consists of three core components: the CLIP-based visual-prompt alignment that extracts visual information relevant to the user's instructions, the prompt-guided pooling that compresses the visual sequence to arbitrary scales using convolution-style pooling, and the clip context extension designed for the lengthy prompts common in visual dialogue. Moreover, our codebase also integrates the most advanced video Direct Preference Optimization (DPO) and visual interleave training. Extensive experiments have validated the performance of our model. With superior throughput and only 1024 visual context, PPLLaVA achieves better results on image benchmarks as a video LLM, while achieving state-of-the-art performance across various video benchmarks, excelling in tasks ranging from caption generation to multiple-choice questions, and handling video lengths from seconds to hours. Code is available at https://github.com/farewellthree/PPLLaVA.
- Published
- 2024
30. SUANPAN: Scalable Photonic Linear Vector Machine
- Author
-
Yang, Ziyue, Li, Chen, Ran, Yuqia, Li, Yongzhuo, Feng, Xue, Cui, Kaiyu, Liu, Fang, Sun, Hao, Zhang, Wei, Ye, Yu, Qiao, Fei, Ning, Cun-Zheng, Wang, Jiaxing, Chang-Hasnain, Connie J., and Huang, Yidong
- Subjects
Physics - Optics - Abstract
Photonic linear operation is a promising approach to handle the extensive vector multiplications in artificial intelligence techniques due to the natural bosonic parallelism and high-speed information transmission of photonics. Although it is believed that maximizing the interaction of the light beams is necessary to fully utilize the parallelism, and tremendous efforts have been made in past decades, the achieved dimensionality of vector-matrix multiplication is very limited due to the difficulty of scaling up a tightly interconnected or highly coupled optical system. Additionally, there is still a lack of a universal photonic computing architecture that can be readily merged with existing computing systems to meet the computing power demand of AI techniques. Here, we propose a programmable and reconfigurable photonic linear vector machine to perform only the inner product of two vectors, formed by a series of independent basic computing units, where each unit is just one pair of light-emitter and photodetector. Since there is no interaction among light beams inside, extreme scalability could be achieved by simply duplicating the independent basic computing unit, with no requirement for large-scale analog-to-digital converter and digital-to-analog converter arrays. Our architecture is inspired by the traditional Chinese Suanpan or abacus and thus is denoted as photonic SUANPAN. As a proof of principle, the SUANPAN architecture is implemented with an 8×8 vertical cavity surface emission laser array and an 8×8 MoTe2 two-dimensional material photodetector array. We believe that our proposed photonic SUANPAN is capable of serving as a fundamental linear vector machine that can be readily merged with existing electronic digital computing systems and has the potential to enhance the computing power for various future AI applications.
- Published
- 2024
31. Situational Scene Graph for Structured Human-centric Situation Understanding
- Author
-
Sugandhika, Chinthani, Li, Chen, Rajan, Deepu, and Fernando, Basura
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Graph-based representation has been widely used in modelling spatio-temporal relationships in video understanding. Although effective, existing graph-based approaches focus on capturing the human-object relationships while ignoring fine-grained semantic properties of the action components. These semantic properties are crucial for understanding the current situation, such as where the action takes place, what tools are used, and the functional properties of the objects. In this work, we propose a graph-based representation called Situational Scene Graph (SSG) to encode both human-object relationships and the corresponding semantic properties. The semantic details are represented as predefined roles and values inspired by the situation frame, which was originally designed to represent a single action. Based on our proposed representation, we introduce the task of situational scene graph generation and propose a multi-stage pipeline, Interactive and Complementary Network (InComNet), to address the task. Given that the existing datasets are not applicable to the task, we further introduce an SSG dataset whose annotations consist of semantic role-value frames for humans, objects, and verb predicates of human-object relations. Finally, we demonstrate the effectiveness of our proposed SSG representation by testing on different downstream tasks. Experimental results show that the unified representation can not only benefit predicate classification and semantic role-value classification, but also benefit reasoning tasks on human-centric situation understanding. We will release the code and the dataset soon. Comment: Accepted for WACV 2025
- Published
- 2024
32. MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps
- Author
-
Xu, Yating, Li, Chen, and Lee, Gim Hee
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The key challenge of multi-view indoor 3D object detection is to infer accurate geometry information from images for precise 3D detection. Prior work relies on NeRF for geometry reasoning. However, the geometry extracted from NeRF is generally inaccurate, which leads to sub-optimal detection performance. In this paper, we propose MVSDet, which utilizes plane sweep for geometry-aware 3D object detection. To circumvent the requirement for a large number of depth planes for accurate depth prediction, we design a probabilistic sampling and soft weighting mechanism to decide the placement of pixel features on the 3D volume. We select multiple locations that score top in the probability volume for each pixel and use their probability score to indicate the confidence. We further apply recent pixel-aligned Gaussian Splatting to regularize depth prediction and improve detection performance with little computation overhead. Extensive experiments on ScanNet and ARKitScenes datasets are conducted to show the superiority of our model. Our code is available at https://github.com/Pixie8888/MVSDet. Comment: Accepted by NeurIPS 2024
- Published
- 2024
33. Evaluating AI-Generated Essays with GRE Analytical Writing Assessment
- Author
-
Zhong, Yang, Hao, Jiangang, Fauss, Michael, Li, Chen, and Wang, Yuan
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
The recent revolutionary advance in generative AI enables the generation of realistic and coherent texts by large language models (LLMs). Despite many existing evaluation metrics on the quality of the generated texts, there is still a lack of rigorous assessment of how well LLMs perform in complex and demanding writing assessments. This study examines essays generated by ten leading LLMs for the analytical writing assessment of the Graduate Record Exam (GRE). We assessed these essays using both human raters and the e-rater automated scoring engine as used in the GRE scoring pipeline. Notably, the top-performing Gemini and GPT-4o received an average score of 4.78 and 4.67, respectively, falling between "generally thoughtful, well-developed analysis of the issue and conveys meaning clearly" and "presents a competent analysis of the issue and conveys meaning with acceptable clarity" according to the GRE scoring guideline. We also evaluated the detection accuracy of these essays, with detectors trained on essays generated by the same and different LLMs. Comment: 20 pages, 6 figures
- Published
- 2024
34. Workflows Community Summit 2024: Future Trends and Challenges in Scientific Workflows
- Author
-
da Silva, Rafael Ferreira, Bard, Deborah, Chard, Kyle, de Witt, Shaun, Foster, Ian T., Gibbs, Tom, Goble, Carole, Godoy, William, Gustafsson, Johan, Haus, Utz-Uwe, Hudson, Stephen, Jha, Shantenu, Los, Laila, Paine, Drew, Suter, Frédéric, Ward, Logan, Wilkinson, Sean, Amaris, Marcos, Babuji, Yadu, Bader, Jonathan, Balin, Riccardo, Balouek, Daniel, Beecroft, Sarah, Belhajjame, Khalid, Bhattarai, Rajat, Brewer, Wes, Brunk, Paul, Caino-Lores, Silvina, Casanova, Henri, Cassol, Daniela, Coleman, Jared, Coleman, Taina, Colonnelli, Iacopo, Da Silva, Anderson Andrei, de Oliveira, Daniel, Elahi, Pascal, Elfaramawy, Nour, Elwasif, Wael, Etz, Brian, Fahringer, Thomas, Ferreira, Wesley, Filgueira, Rosa, Tande, Jacob Fosso, Gadelha, Luiz, Gallo, Andy, Garijo, Daniel, Georgiou, Yiannis, Gritsch, Philipp, Grubel, Patricia, Gueroudji, Amal, Guilloteau, Quentin, Hamalainen, Carlo, Enriquez, Rolando Hong, Huet, Lauren, Kesling, Kevin Hunter, Iborra, Paula, Jahangiri, Shiva, Janssen, Jan, Jordan, Joe, Kanwal, Sehrish, Kunstmann, Liliane, Lehmann, Fabian, Leser, Ulf, Li, Chen, Liu, Peini, Luettgau, Jakob, Lupat, Richard, Fernandez, Jose M., Maheshwari, Ketan, Malik, Tanu, Marquez, Jack, Matsuda, Motohiko, Medic, Doriana, Mohammadi, Somayeh, Mulone, Alberto, Navarro, John-Luke, Ng, Kin Wai, Noelp, Klaus, Kinoshita, Bruno P., Prout, Ryan, Crusoe, Michael R., Ristov, Sashko, Robila, Stefan, Rosendo, Daniel, Rowell, Billy, Rybicki, Jedrzej, Sanchez, Hector, Saurabh, Nishant, Saurav, Sumit Kumar, Scogland, Tom, Senanayake, Dinindu, Shin, Woong, Sirvent, Raul, Skluzacek, Tyler, Sly-Delgado, Barry, Soiland-Reyes, Stian, Souza, Abel, Souza, Renan, Talia, Domenico, Tallent, Nathan, Thamsen, Lauritz, Titov, Mikhail, Tovar, Benjamin, Vahi, Karan, Vardar-Irrgang, Eric, Vartina, Edite, Wang, Yuandou, Wouters, Merridee, Yu, Qi, Bkhetan, Ziad Al, and Zulfiqar, Mahnoor
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
The Workflows Community Summit gathered 111 participants from 18 countries to discuss emerging trends and challenges in scientific workflows, focusing on six key areas: time-sensitive workflows, AI-HPC convergence, multi-facility workflows, heterogeneous HPC environments, user experience, and FAIR computational workflows. The integration of AI and exascale computing has revolutionized scientific workflows, enabling higher-fidelity models and complex, time-sensitive processes, while introducing challenges in managing heterogeneous environments and multi-facility data dependencies. The rise of large language models is driving computational demands to zettaflop scales, necessitating modular, adaptable systems and cloud-service models to optimize resource utilization and ensure reproducibility. Multi-facility workflows present challenges in data movement, curation, and overcoming institutional silos, while diverse hardware architectures require integrating workflow considerations into early system design and developing standardized resource management tools. The summit emphasized improving user experience in workflow systems and ensuring FAIR workflows to enhance collaboration and accelerate scientific discovery. Key recommendations include developing standardized metrics for time-sensitive workflows, creating frameworks for cloud-HPC integration, implementing distributed-by-design workflow modeling, establishing multi-facility authentication protocols, and accelerating AI integration in HPC workflow management. The summit also called for comprehensive workflow benchmarks, workflow-specific UX principles, and a FAIR workflow maturity model, highlighting the need for continued collaboration in addressing the complex challenges posed by the convergence of AI, HPC, and multi-facility research environments.
- Published
- 2024
35. Cooperation in Public Goods Games: Leveraging Other-Regarding Reinforcement Learning on Hypergraphs
- Author
-
Li, Bo-Ying, Zhang, Zhen-Na, Zheng, Guo-Zhong, Cai, Chao-Ran, Zhang, Ji-Qiang, and Li, Chen
- Subjects
Physics - Physics and Society ,Nonlinear Sciences - Adaptation and Self-Organizing Systems - Abstract
Cooperation as a self-organized collective behavior plays a significant role in the evolution of ecosystems and human society. Reinforcement learning (RL) offers a new perspective, distinct from imitation learning in evolutionary games, for exploring the mechanisms underlying its emergence. However, most existing studies with the public goods game (PGG) employ a self-regarding setup or are on pairwise interaction networks. Players in the real world, however, optimize their policies based not only on their histories but also on the histories of their co-players, and the game is played in a group manner. In this work, we investigate the evolution of cooperation in the PGG under the other-regarding reinforcement learning evolutionary game (OR-RLEG) on hypergraphs by combining the Q-learning algorithm and evolutionary game framework, where other players' action history is incorporated and the game is played on hypergraphs. Our results show that as the synergy factor increases, the parameter interval is divided into three distinct regions, the absence of cooperation (AC), medium cooperation (MC), and high cooperation (HC), accompanied by two abrupt transitions in the cooperation level near two transition points, respectively. Interestingly, we identify regular and anti-coordinated chessboard structures in the spatial pattern that positively contribute to the first cooperation transition but adversely affect the second. Furthermore, we provide a theoretical treatment for the first transition with an approximated first transition point and reveal that players with a long-sighted perspective and low exploration rate are more likely to reciprocate kindness with each other, thus facilitating the emergence of cooperation. Our findings contribute to understanding the evolution of human cooperation, where other-regarding information and group interactions are commonplace.
- Published
- 2024
36. BELM: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models
- Author
-
Wang, Fangyikang, Yin, Hubery, Dong, Yuejiang, Zhu, Huminhao, Zhang, Chao, Zhao, Hanbin, Qian, Hui, and Li, Chen
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
The inversion of diffusion model sampling, which aims to find the corresponding initial noise of a sample, plays a critical role in various tasks. Recently, several heuristic exact inversion samplers have been proposed to address the inexact inversion issue in a training-free manner. However, the theoretical properties of these heuristic samplers remain unknown and they often exhibit mediocre sampling quality. In this paper, we introduce a generic formulation, Bidirectional Explicit Linear Multi-step (BELM) samplers, of the exact inversion samplers, which includes all previously proposed heuristic exact inversion samplers as special cases. The BELM formulation is derived from the variable-stepsize-variable-formula linear multi-step method via integrating a bidirectional explicit constraint. We highlight that this bidirectional explicit constraint is the key to mathematically exact inversion. We systematically investigate the Local Truncation Error (LTE) within the BELM framework and show that the existing heuristic designs of exact inversion samplers yield sub-optimal LTE. Consequently, we propose the Optimal BELM (O-BELM) sampler through the LTE minimization approach. We conduct additional analysis to substantiate the theoretical stability and global convergence property of the proposed optimal sampler. Comprehensive experiments demonstrate that our O-BELM sampler establishes the exact inversion property while achieving high-quality sampling. Additional experiments in image editing and image interpolation highlight the extensive potential of applying O-BELM in varying applications. Comment: accepted paper by NeurIPS
- Published
- 2024
37. Hardware-Software Co-optimised Fast and Accurate Deep Reconfigurable Spiking Inference Accelerator Architecture Design Methodology
- Author
-
Nimbekar, Anagha, Katti, Prabodh, Li, Chen, Al-Hashimi, Bashir M., Acharyya, Amit, and Rajendran, Bipin
- Subjects
Computer Science - Neural and Evolutionary Computing - Abstract
Spiking Neural Networks (SNNs) have emerged as a promising approach to improve the energy efficiency of machine learning models, as they naturally implement event-driven computations while avoiding expensive multiplication operations. In this paper, we develop a hardware-software co-optimisation strategy to port software-trained deep neural networks (DNN) to reduced-precision spiking models demonstrating fast and accurate inference in a novel event-driven CMOS reconfigurable spiking inference accelerator. Experimental results show that reduced-precision ResNet-18 and VGG-11 SNN models achieve classification accuracy within 1% of the baseline full-precision DNN model within 8 spike timesteps. We also demonstrate an FPGA prototype implementation of the spiking inference accelerator with a throughput of 38.4 giga operations per second (GOPS) consuming 1.54 Watts on PYNQ-Z2 FPGA. This corresponds to 0.6 GOPS per processing element and 2.25 GOPS/DSP slice, which is 2x and 4.5x higher utilisation efficiency respectively compared to the state-of-the-art. Our co-optimisation strategy can be employed to develop deep reduced precision SNN models and port them to resource-efficient event-driven hardware accelerators for edge applications.
- Published
- 2024
38. A Simple yet Effective Training-free Prompt-free Approach to Chinese Spelling Correction Based on Large Language Models
- Author
-
Zhou, Houquan, Li, Zhenghua, Zhang, Bo, Li, Chen, Lai, Shaopeng, Zhang, Ji, Huang, Fei, and Zhang, Min
- Subjects
Computer Science - Computation and Language - Abstract
This work proposes a simple training-free prompt-free approach to leverage large language models (LLMs) for the Chinese spelling correction (CSC) task, which is totally different from all previous CSC approaches. The key idea is to use an LLM as a pure language model in a conventional manner. The LLM goes through the input sentence from the beginning, and at each inference step, produces a distribution over its vocabulary for deciding the next token, given a partial sentence. To ensure that the output sentence remains faithful to the input sentence, we design a minimal distortion model that utilizes pronunciation or shape similarities between the original and replaced characters. Furthermore, we propose two useful reward strategies to address practical challenges specific to the CSC task. Experiments on five public datasets demonstrate that our approach significantly improves LLM performance, enabling them to compete with state-of-the-art domain-general CSC models. Comment: Accepted at Main Conference of EMNLP 2024
- Published
- 2024
39. Tailored Federated Learning: Leveraging Direction Regulation & Knowledge Distillation
- Author
-
Tang, Huidong, Li, Chen, Yu, Huachong, Kamei, Sayaka, and Morimoto, Yasuhiko
- Subjects
Computer Science - Machine Learning - Abstract
Federated learning (FL) has emerged as a transformative training paradigm, particularly invaluable in privacy-sensitive domains like healthcare. However, client heterogeneity in data, computing power, and tasks poses a significant challenge. To address such a challenge, we propose an FL optimization algorithm that integrates model delta regularization, personalized models, federated knowledge distillation, and mix-pooling. Model delta regularization optimizes model updates centrally on the server, efficiently updating clients with minimal communication costs. Personalized models and federated knowledge distillation strategies are employed to tackle task heterogeneity effectively. Additionally, mix-pooling is introduced to accommodate variations in the sensitivity of readout operations. Experimental results demonstrate the remarkable accuracy and rapid convergence achieved by model delta regularization. Additionally, the federated knowledge distillation algorithm notably improves FL performance, especially in scenarios with diverse data. Moreover, mix-pooling readout operations provide tangible benefits for clients, showing the effectiveness of our proposed methods.
- Published
- 2024
40. When Molecular GAN Meets Byte-Pair Encoding
- Author
-
Tang, Huidong, Li, Chen, and Morimoto, Yasuhiko
- Subjects
Computer Science - Machine Learning, Quantitative Biology - Quantitative Methods
- Abstract
Deep generative models, such as generative adversarial networks (GANs), are pivotal in discovering novel drug-like candidates via de novo molecular generation. However, traditional character-wise tokenizers often struggle with identifying novel and complex sub-structures in molecular data. In contrast, alternative tokenization methods have demonstrated superior performance. This study introduces a molecular GAN that integrates a byte-level byte-pair encoding tokenizer and employs reinforcement learning to enhance de novo molecular generation. Specifically, the generator functions as an actor, producing SMILES strings, while the discriminator acts as a critic, evaluating their quality. Our molecular GAN also integrates innovative reward mechanisms aimed at improving computational efficiency. Experimental results assessing validity, uniqueness, novelty, and diversity, complemented by detailed visualization analysis, robustly demonstrate the effectiveness of our GAN.
- Published
- 2024
41. Spatial Visibility and Temporal Dynamics: Revolutionizing Field of View Prediction in Adaptive Point Cloud Video Streaming
- Author
-
Li, Chen, Zong, Tongyu, Hu, Yueyu, Wang, Yao, and Liu, Yong
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Multimedia, Electrical Engineering and Systems Science - Image and Video Processing
- Abstract
Field-of-View (FoV) adaptive streaming significantly reduces the bandwidth requirement of immersive point cloud video (PCV) by only transmitting visible points in a viewer's FoV. Traditional approaches often focus on trajectory-based 6-degree-of-freedom (6DoF) FoV predictions. The predicted FoV is then used to calculate point visibility. Such approaches do not explicitly consider video content's impact on viewer attention, and the conversion from FoV to point visibility is often error-prone and time-consuming. We reformulate the PCV FoV prediction problem from the cell visibility perspective, allowing for precise decision-making regarding the transmission of 3D data at the cell level based on the predicted visibility distribution. We develop a novel spatial visibility and object-aware graph model that leverages the historical 3D visibility data and incorporates spatial perception, neighboring cell correlation, and occlusion information to predict the cell visibility in the future. Our model significantly improves the long-term cell visibility prediction, reducing the prediction MSE loss by up to 50% compared to the state-of-the-art models while maintaining real-time performance (more than 30 fps) for point cloud videos with over 1 million points.
- Published
- 2024
42. Leveraging Estimated Transferability Over Human Intuition for Model Selection in Text Ranking
- Author
-
Bai, Jun, Chen, Zhuofan, Li, Zhenzi, Hong, Hanhua, Zhang, Jianfei, Li, Chen, Lin, Chenghua, and Rong, Wenge
- Subjects
Computer Science - Artificial Intelligence
- Abstract
Text ranking has witnessed significant advancements, attributed to the utilization of dual-encoders enhanced by Pre-trained Language Models (PLMs). Given the proliferation of available PLMs, selecting the most effective one for a given dataset has become a non-trivial challenge. As a promising alternative to human intuition and brute-force fine-tuning, Transferability Estimation (TE) has emerged as an effective approach to model selection. However, current TE methods are primarily designed for classification tasks, and their estimated transferability may not align well with the objectives of text ranking. To address this challenge, we propose to compute the expected rank as transferability, explicitly reflecting the model's ranking capability. Furthermore, to mitigate anisotropy and incorporate training dynamics, we adaptively scale isotropic sentence embeddings to yield an accurate expected rank score. Our resulting method, Adaptive Ranking Transferability (AiRTran), can effectively capture subtle differences between models. On challenging model selection scenarios across various text ranking datasets, it demonstrates significant improvements over previous classification-oriented TE methods, human intuition, and ChatGPT with minor time consumption. Comment: Accepted by EMNLP 2024 main conference
- Published
- 2024
43. Bridging the Gap: GRB 230812B -- A Three-Second Supernova-Associated Burst Detected by the GRID Mission
- Author
-
Wang, Chen-Yu, Yin, Yi-Han Iris, Zhang, Bin-Bin, Feng, Hua, Zeng, Ming, Xiong, Shao-Lin, Pan, Xiao-Fan, Yang, Jun, Zhang, Yan-Qiu, Li, Chen, Yan, Zhen-Yu, Wang, Chen-Wei, Zheng, Xu-Tao, Liu, Jia-Cong, Wang, Qi-Dong, Yang, Zi-Rui, Li, Long-Hao, Liu, Qi-Ze, Zhao, Zheng-Yang, Hu, Bo, Liu, Yi-Qi, Lu, Si-Yuan, Luo, Zi-You, Cang, Ji-Rong, Cao, De-Zhi, Han, Wen-Tao, Jia, Li-Ping, Pan, Xing-Yu, Tian, Yang, Xu, Ben-Da, Yang, Xiao, and Zeng, Zhi
- Subjects
Astrophysics - High Energy Astrophysical Phenomena
- Abstract
GRB 230812B, detected by the Gamma-Ray Integrated Detectors (GRID) constellation mission, is an exceptionally bright gamma-ray burst (GRB) with a duration of only 3 seconds. Sitting near the traditional boundary ($\sim 2$ s) between long and short GRBs, GRB 230812B is notably associated with a supernova (SN), indicating a massive star progenitor. This makes it a rare example of a short-duration GRB resulting from stellar collapse. Our analysis, using a time-evolving synchrotron model, suggests that the burst has an emission radius of approximately $10^{14.5}$ cm. We propose that the short duration of GRB 230812B is due to the combined effects of the central engine's activity time and the time required for the jet to break through the stellar envelope. Our findings provide another case that challenges the conventional view that short-duration GRBs originate exclusively from compact object mergers, demonstrating that a broader range of durations exists for GRBs arising from the collapse of massive stars. Comment: 10 pages, 3 tables, 11 figures
- Published
- 2024
44. Development and Testing of a Vine Robot for Urban Search and Rescue in Confined Rubble Environments
- Author
-
Zhou, Zheyu, Wang, Yaqing, Hawkes, Elliot W., and Li, Chen
- Subjects
Computer Science - Robotics
- Abstract
The demand for fast response and safe operation after natural and man-made disasters in urban environments has spurred the development of robotic systems designed to assist in search and rescue operations within complex rubble sites. Traditional Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) face significant limitations in such confined and obstructed environments. This paper introduces a novel vine robot designed to navigate dense rubble, drawing inspiration from natural growth mechanisms found in plants. Unlike conventional robots, vine robots are soft robots that can grow by everting their material, allowing them to navigate through narrow spaces and obstacles. The prototype presented in this study incorporates pneumatic muscles for steering and oscillation, and an equation-based robot length control plus a feedback pressure-regulating system for extending and retracting the robot body. We conducted a series of controlled experiments in an artificial rubble testbed to assess the robot's performance under varying environmental conditions and robot parameters, including volume ratio, environmental weight, oscillation, and steering. The results show that the vine robot can achieve significant penetration depths in cluttered environments with mixed obstacle sizes and weights, and can maintain repeated trajectories, demonstrating potential for mapping and navigating complex underground paths. Our findings highlight the suitability of the vine robot for urban search and rescue missions, with further research planned to enhance its robustness and deployability in real-world scenarios. Comment: Upon further review, this research was conducted as part of a short-term project, and in hindsight, it does not offer the level of depth and exhaustiveness necessary for a complete study. It would be in the best interest of the academic community to withdraw the paper at this time.
- Published
- 2024
45. Explorations in Designing Virtual Environments for Remote Counselling
- Author
-
Cao, Jiashuo, Gao, Wujie, Pai, Yun Suen, Hoermann, Simon, Li, Chen, Baghaei, Nilufar, and Billinghurst, Mark
- Subjects
Computer Science - Human-Computer Interaction
- Abstract
The advent of technology-enhanced interventions has significantly transformed mental health services, offering new opportunities for delivering psychotherapy, particularly in remote settings. This paper reports on a pilot study exploring the use of Virtual Reality (VR) as a medium for remote counselling. The study involved four experienced psychotherapists who evaluated three different virtual environments designed to support remote counselling. Through thematic analysis of interviews and feedback, we identified key factors that could be critical for designing effective virtual environments for counselling. These include the creation of clear boundaries, customization to meet specific therapeutic needs, and the importance of aligning the environment with various therapeutic approaches. Our findings suggest that VR can enhance the sense of presence and engagement in remote therapy, potentially improving the therapeutic relationship. In the paper we also outline areas for future research based on these pilot study results.
- Published
- 2024
46. Spatial Diffusion for Cell Layout Generation
- Author
-
Li, Chen, Hu, Xiaoling, Abousamra, Shahira, Xu, Meilong, and Chen, Chao
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Generative models, such as GANs and diffusion models, have been used to augment training sets and boost performance in different tasks. We focus on generative models for cell detection instead, i.e., locating and classifying cells in given pathology images. One important piece of information that has been largely overlooked is the spatial pattern of the cells. In this paper, we propose a spatial-pattern-guided generative model for cell layout generation. Specifically, we propose a novel diffusion model that is guided by spatial features and generates realistic cell layouts. We explore different density models as spatial features for the diffusion model. In downstream tasks, we show that the generated cell layouts can be used to guide the generation of high-quality pathology images. Augmenting with these images can significantly boost the performance of SOTA cell detection methods. The code is available at https://github.com/superlc1995/Diffusion-cell. Comment: 12 pages, 4 figures, accepted by MICCAI 2024
- Published
- 2024
47. Note on Dirac monopole theory and Berry geometric phase
- Author
-
Zhao, Li-Chen
- Subjects
Quantum Physics, Mathematical Physics
- Abstract
We discuss the intrinsic relations between Dirac monopole theory and Berry geometric phases. We demonstrate that the existence of Dirac strings with endpoints brings non-integrable phase factors in the parameter space. We choose one of the simplest two-mode Hamiltonian models to visualize the Dirac string and its endpoint for a wave function, based on its eigenstates. The geometric phase variation around an arbitrary circle can be calculated explicitly according to Dirac's picture, where the well-known Berry connection and curvature can be derived directly by applying Dirac monopole theory in the parameter space. The correspondence between the endpoints of Dirac strings and the accidental degeneracy points of the eigenvalues is clearly shown for Hermitian systems. These results suggest that the Berry phase can be seen as the non-integrable phase factor induced by Dirac strings with endpoints in the parameter space, and would motivate more studies on geometric phase by applying or extending Dirac monopole theory.
- Published
- 2024
48. Novice Writers and Scholarly Publication: Authors, Mentors, Gatekeepers ed. by Pejman Habibie and Ken Hyland (review)
- Author
-
Mao, Zhicheng and Li, Chen
- Published
- 2019
49. Intestinal Symptoms among Children Aged 2-7 Years with Autism Spectrum Disorder in 13 Cities of China
- Author
-
Ting Yang, Qian Zhang, Li Chen, Ying Dai, Fei-Yong Jia, Yan Hao, Ling Li, Jie Zhang, Li-Jie Wu, Xiao-Yan Ke, Ming-Ji Yi, Qi Hong, Jin-Jin Chen, Shuan-Feng Fang, Yi-Chao Wang, Qi Wang, Chun-Hua Jin, Jie Chen, and Ting-Yu Li
- Abstract
Background: Autism spectrum disorder (ASD) is a multifactorial, pervasive, neurodevelopmental disorder, of which intestinal symptoms collectively represent one of the most common comorbidities. Methods: In this study, 1,222 children with ASD and 1,206 typically developing (TD) children aged 2-7 years were enrolled from 13 cities in China. Physical measurement and basic information questionnaires were conducted in ASD and TD children. The Childhood Autism Rating Scale (CARS), Social Responsiveness Scale (SRS), and Autism Behavior Checklist (ABC) were used to evaluate the clinical symptoms of children with ASD. The six-item Gastrointestinal Severity Index (6-GSI) was used to evaluate the prevalence of intestinal symptoms in the two groups. Results: The detection rates of constipation, stool odor, and total intestinal symptoms in ASD children were significantly higher than those in TD children (40.098% vs. 25.622%, 17.021% vs. 9.287%, and 53.601% vs. 41.294%, respectively). Autistic children presenting with intestinal comorbidity had significantly higher scores on the ABC, SRS, CARS, and multiple subscales than autistic children without intestinal symptoms, suggesting that intestinal comorbidity may exacerbate the core symptoms of ASD children. Conclusion: Intestinal dysfunction was significantly more common in autistic than in TD children. This dysfunction may aggravate the core symptoms of children with ASD.
- Published
- 2024
50. Irish Literature in China
- Author
-
Li, Chen
- Published
- 2018