377,151 results on '"Pang, A"'
Search Results
2. Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching
- Author
-
Pang, Bowen, Li, Kai, and Wang, Feifan
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
The increasing adoption of large language models (LLMs) necessitates inference serving systems that can deliver both high throughput and low latency. Deploying LLMs with hundreds of billions of parameters on memory-constrained GPUs exposes significant limitations in static batching methods. Current inference serving systems often treat batch sizes as fixed hyper-parameters, hindering real-time adaptation to varying system conditions. In this paper, we propose a dynamic batching method that continuously monitors memory utilization and adheres to service-level agreements (SLAs) to enable real-time adjustment of the batch size configuration. The method comprises two core components: a memory-aware batch scheduler that dynamically allocates GPU resources and a latency feedback mechanism that optimizes decoding processes under SLA constraints. Numerical experiments demonstrate throughput gains of 8% to 28% and capacity improvements of 22% compared to traditional static batching methods, while maintaining full compatibility with existing inference infrastructure. These results highlight the effectiveness of dynamic batching in balancing computational efficiency and quality-of-service requirements for contemporary LLM deployment scenarios. The source code of this work is publicly available at https://github.com/KevinLee1110/dynamic-batching. (An illustrative controller sketch follows this entry.)
- Published
- 2025
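The dynamic batching idea summarized in item 2 can be illustrated with a minimal control loop: grow the batch while GPU memory headroom and the SLA latency budget allow, and back off otherwise. This is a hypothetical sketch, not the authors' implementation; the thresholds and the `gpu_mem_utilization` / `p99_decode_latency_ms` inputs are assumed placeholders.

```python
# Hypothetical sketch of memory-aware, SLA-constrained dynamic batch sizing.
# Not the paper's implementation; thresholds and probe values are assumptions.

def adjust_batch_size(batch_size: int,
                      gpu_mem_utilization: float,    # fraction of GPU memory in use, 0..1
                      p99_decode_latency_ms: float,  # observed tail decode latency
                      sla_latency_ms: float = 200.0,
                      mem_high: float = 0.90,
                      mem_low: float = 0.70,
                      min_bs: int = 1,
                      max_bs: int = 256) -> int:
    """Return the batch size to use for the next scheduling step."""
    if gpu_mem_utilization > mem_high or p99_decode_latency_ms > sla_latency_ms:
        # Memory pressure or SLA violation: back off multiplicatively.
        return max(min_bs, batch_size // 2)
    if gpu_mem_utilization < mem_low and p99_decode_latency_ms < 0.8 * sla_latency_ms:
        # Headroom on both constraints: grow additively.
        return min(max_bs, batch_size + 8)
    return batch_size

# One control step with mock measurements.
print(adjust_batch_size(batch_size=32, gpu_mem_utilization=0.55, p99_decode_latency_ms=120.0))
```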
3. From Voice to Safety: Language AI Powered Pilot-ATC Communication Understanding for Airport Surface Movement Collision Risk Assessment
- Author
-
Pang, Yutian, Kendall, Andrew Paul, Porcayo, Alex, Barsotti, Mariah, Jain, Anahita, and Clarke, John-Paul
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing ,Computer Science - Sound - Abstract
This work integrates language AI-based voice communication understanding with collision risk assessment. The proposed framework consists of two major parts: (a) Automatic Speech Recognition (ASR); (b) surface collision risk modeling. The ASR module generates information tables by processing voice communication transcripts, which serve as references for producing potential taxi plans and calculating the surface movement collision risk. For ASR, we collect and annotate our own Named Entity Recognition (NER) dataset based on open-sourced video recordings and safety investigation reports. Additionally, we refer to FAA Order JO 7110.65W and FAA Order JO 7340.2N to get the list of heuristic rules and phraseology contractions of communication between the pilot and the Air Traffic Controller (ATCo) used in daily aviation operations. Then, we propose the novel ATC Rule-Enhanced NER method, which integrates the heuristic rules into the model training and inference stages, resulting in a hybrid rule-based NER model. We show the effectiveness of this hybrid approach by comparing different setups with different token-level embedding models. For the risk modeling, we adopt the node-link airport layout graph from NASA FACET, model the aircraft taxi speed at each link as a log-normal distribution, and derive the total taxi time distribution. Then, we propose a spatiotemporal formulation of the risk probability of two aircraft moving across potential collision nodes during ground movement. We show the effectiveness of our approach by simulating two case studies: (a) the Haneda Airport runway collision accident of January 2024; (b) the KATL taxiway collision of September 2024. We show that, by understanding the pilot-ATC communication transcripts and analyzing surface movement patterns, the proposed model improves airport safety by providing timely risk assessment. (A small Monte Carlo sketch of the taxi-time risk idea follows this entry.)
- Published
- 2025
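The risk model in item 3 treats per-link taxi speeds as log-normal and asks for the probability that two aircraft reach a shared node close together in time. A minimal Monte Carlo sketch of that idea is below; the link lengths, log-normal parameters, departure offset, and the "both aircraft within a 15 s window at the conflict node" criterion are all illustrative assumptions, not the paper's spatiotemporal formulation.

```python
# Hypothetical Monte Carlo sketch of a surface-movement conflict probability.
# All numbers (link lengths, speed parameters, conflict window) are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def arrival_time_at_node(link_lengths_m, mu_logspeed, sigma_logspeed, n_samples):
    """Sample total taxi time (s) to reach the conflict node over a sequence of links."""
    times = np.zeros(n_samples)
    for length in link_lengths_m:
        speeds = rng.lognormal(mean=mu_logspeed, sigma=sigma_logspeed, size=n_samples)  # m/s
        times += length / speeds
    return times

# Two aircraft approach the same node along different taxi routes; the second starts 30 s later.
t_a = arrival_time_at_node([400.0, 250.0, 300.0], mu_logspeed=2.0, sigma_logspeed=0.3, n_samples=100_000)
t_b = 30.0 + arrival_time_at_node([600.0, 350.0], mu_logspeed=2.1, sigma_logspeed=0.3, n_samples=100_000)

conflict_window_s = 15.0  # both aircraft near the node within this window counts as a conflict
p_conflict = np.mean(np.abs(t_a - t_b) < conflict_window_s)
print(f"Estimated conflict probability: {p_conflict:.3f}")
```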
4. Adversarial Training for Multimodal Large Language Models against Jailbreak Attacks
- Author
-
Lu, Liming, Pang, Shuchao, Liang, Siyuan, Zhu, Haotian, Zeng, Xiyu, Liu, Aishan, Liu, Yunhuai, and Zhou, Yongbin
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
Multimodal large language models (MLLMs) have made remarkable strides in cross-modal comprehension and generation tasks. However, they remain vulnerable to jailbreak attacks, where crafted perturbations bypass security guardrails and elicit harmful outputs. In this paper, we present the first adversarial training (AT) paradigm tailored to defend against jailbreak attacks during the MLLM training phase. Extending traditional AT to this domain poses two critical challenges: efficiently tuning massive parameters and ensuring robustness against attacks across multiple modalities. To address these challenges, we introduce Projection Layer Against Adversarial Training (ProEAT), an end-to-end AT framework. ProEAT incorporates a projector-based adversarial training architecture that efficiently handles large-scale parameters while maintaining computational feasibility by focusing adversarial training on a lightweight projector layer instead of the entire model; additionally, we design a dynamic weight adjustment mechanism that optimizes the loss function's weight allocation based on task demands, streamlining the tuning process. To enhance defense performance, we propose a joint optimization strategy across visual and textual modalities, ensuring robust resistance to jailbreak attacks originating from either modality. Extensive experiments conducted on five major jailbreak attack methods across three mainstream MLLMs demonstrate the effectiveness of our approach. ProEAT achieves state-of-the-art defense performance, outperforming existing baselines by an average margin of +34% across text and image modalities, while incurring only a 1% reduction in clean accuracy. Furthermore, evaluations on real-world embodied intelligent systems highlight the practical applicability of our framework, paving the way for the development of more secure and reliable multimodal systems.
- Published
- 2025
5. Underlying Semantic Diffusion for Effective and Efficient In-Context Learning
- Author
-
Ji, Zhong, Cao, Weilong, Zhang, Yan, Pang, Yanwei, Han, Jungong, and Li, Xuelong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Diffusion models have emerged as a powerful framework for tasks like controllable image generation and dense prediction. However, existing models often struggle to capture underlying semantics (e.g., edges, textures, shapes) and effectively utilize in-context learning, limiting their contextual understanding and image generation quality. Additionally, high computational costs and slow inference speeds hinder their real-time applicability. To address these challenges, we propose Underlying Semantic Diffusion (US-Diffusion), an enhanced diffusion model that boosts underlying semantics learning, computational efficiency, and in-context learning capabilities in multi-task scenarios. We introduce the Separate & Gather Adapter (SGA), which decouples input conditions for different tasks while sharing the architecture, enabling better in-context learning and generalization across diverse visual domains. We also present a Feedback-Aided Learning (FAL) framework, which leverages feedback signals to guide the model in capturing semantic details and dynamically adapting to task-specific contextual cues. Furthermore, we propose a plug-and-play Efficient Sampling Strategy (ESS) for dense sampling at time steps with high noise levels, which aims at optimizing training and inference efficiency while maintaining strong in-context learning performance. Experimental results demonstrate that US-Diffusion outperforms the state-of-the-art method, achieving an average reduction of 7.47 in FID on Map2Image tasks and an average reduction of 0.026 in RMSE on Image2Map tasks, while achieving approximately 9.45 times faster inference speed. Our method also demonstrates superior training efficiency and in-context learning capabilities, excelling on new datasets and tasks, highlighting its robustness and adaptability across diverse visual domains.
- Published
- 2025
6. Video Super-Resolution: All You Need is a Video Diffusion Model
- Author
-
Zhan, Zhihao, Pang, Wang, Zhu, Xiang, and Bai, Yechao
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
We present a generic video super-resolution algorithm in this paper, based on the Diffusion Posterior Sampling framework with an unconditional video generation model in latent space. The video generation model, a diffusion transformer, functions as a space-time model. We argue that a powerful model, which learns the physics of the real world, can easily handle various kinds of motion patterns as prior knowledge, thus eliminating the need for explicit estimation of optical flow or motion parameters for pixel alignment. Furthermore, a single instance of the proposed video diffusion transformer model can adapt to different sampling conditions without re-training. Owing to limited computational resources and training data, our experiments are restricted to synthetic data, which nonetheless provide empirical evidence of the algorithm's strong super-resolution capabilities. (The general DPS update is recalled after this entry for reference.)
- Published
- 2025
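Item 6 builds on Diffusion Posterior Sampling (DPS). As a reference point, the generic DPS update from the literature (not this paper's latent-space, video-specific variant) augments each unconditional reverse step $x'_{t-1}$ with a measurement-likelihood gradient evaluated at the posterior-mean estimate $\hat{x}_0(x_t)$:

$$
x_{t-1} = x'_{t-1} - \zeta_t \, \nabla_{x_t} \bigl\lVert y - \mathcal{A}\bigl(\hat{x}_0(x_t)\bigr)\bigr\rVert_2^2,
\qquad
\hat{x}_0(x_t) = \frac{1}{\sqrt{\bar{\alpha}_t}}\Bigl(x_t + (1-\bar{\alpha}_t)\,\nabla_{x_t}\log p_t(x_t)\Bigr),
$$

where $y$ is the low-resolution observation, $\mathcal{A}$ the degradation (downsampling) operator, $\zeta_t$ a step size, and the score $\nabla_{x_t}\log p_t(x_t)$ is supplied by the video diffusion transformer.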
7. Large-Scale Data Selection for Instruction Tuning
- Author
-
Ivison, Hamish, Zhang, Muru, Brahman, Faeze, Koh, Pang Wei, and Dasigi, Pradeep
- Subjects
Computer Science - Computation and Language - Abstract
Selecting high-quality training data from a larger pool is a crucial step when instruction-tuning language models, as carefully curated datasets often produce models that outperform those trained on much larger, noisier datasets. Automated data selection approaches for instruction-tuning are typically tested by selecting small datasets (roughly 10k samples) from small pools (100-200k samples). However, popular deployed instruction-tuned models often train on hundreds of thousands to millions of samples, subsampled from even larger data pools. We present a systematic study of how well data selection methods scale to these settings, selecting up to 2.5M samples from pools of up to 5.8M samples and evaluating across 7 diverse tasks. We show that many recently proposed methods fall short of random selection in this setting (while using more compute), and even decline in performance when given access to larger pools of data to select over. However, we find that a variant of representation-based data selection (RDS+), which uses weighted mean pooling of pretrained LM hidden states, consistently outperforms more complex methods across all settings tested -- all whilst being more compute-efficient. Our findings highlight that the scaling properties of proposed automated selection methods should be more closely examined. We release our code, data, and models at https://github.com/hamishivi/automated-instruction-selection. (A toy sketch of the weighted-mean-pooling selection pattern follows this entry.)
- Published
- 2025
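The RDS+ variant highlighted in item 7 scores candidate instructions by embedding them with a weighted mean pool over pretrained LM hidden states and keeping the examples most similar to a small target set. The rough, self-contained sketch below captures that selection pattern; random vectors stand in for real LM hidden states, and the position-based weighting and max-similarity scoring are assumptions rather than the paper's exact recipe.

```python
# Hypothetical sketch of representation-based data selection with weighted mean pooling.
# Random arrays stand in for LM hidden states; the weighting scheme is an assumption.
import numpy as np

rng = np.random.default_rng(0)

def weighted_mean_pool(hidden_states: np.ndarray) -> np.ndarray:
    """Pool a (seq_len, dim) array of hidden states with weights increasing by position."""
    seq_len = hidden_states.shape[0]
    w = np.arange(1, seq_len + 1, dtype=float)
    w /= w.sum()
    v = w @ hidden_states
    return v / (np.linalg.norm(v) + 1e-8)

# Mock pool of candidate instructions and a small target ("task") set.
pool_embs = np.stack([weighted_mean_pool(rng.normal(size=(int(rng.integers(5, 40)), 64)))
                      for _ in range(1000)])
task_embs = np.stack([weighted_mean_pool(rng.normal(size=(int(rng.integers(5, 40)), 64)))
                      for _ in range(16)])

# Score each candidate by its maximum cosine similarity to any target example, then take the top-k.
scores = (pool_embs @ task_embs.T).max(axis=1)
selected = np.argsort(-scores)[:100]
print("First selected candidate indices:", selected[:10])
```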
8. Automated Annotation of Evolving Corpora for Augmenting Longitudinal Network Data: A Framework Integrating Large Language Models and Expert Knowledge
- Author
-
Liu, Xiao, Wu, Zirui, Li, Jiayi, Shao, Zhicheng, Pang, Xun, and Feng, Yansong
- Subjects
Computer Science - Computation and Language ,Computer Science - Social and Information Networks - Abstract
Longitudinal network data are essential for analyzing political, economic, and social systems and processes. In political science, these datasets are often generated through human annotation or supervised machine learning applied to evolving corpora. However, as semantic contexts shift over time, inferring dynamic interaction types on emerging issues among a diverse set of entities poses significant challenges, particularly in maintaining timely and consistent annotations. This paper presents the Expert-Augmented LLM Annotation (EALA) approach, which leverages Large Language Models (LLMs) in combination with historically annotated data and expert-constructed codebooks to extrapolate and extend datasets into future periods. We evaluate the performance and reliability of EALA using a dataset of climate negotiations. Our findings demonstrate that EALA effectively predicts nuanced interactions between negotiation parties and captures the evolution of topics over time. At the same time, we identify several limitations inherent to LLM-based annotation, highlighting areas for further improvement. Given the wide availability of codebooks and annotated datasets, EALA holds substantial promise for advancing research in political science and beyond., Comment: Work in progress, presented at the 2025 Asian PolMeth Conference
- Published
- 2025
9. CAO-RONet: A Robust 4D Radar Odometry with Exploring More Information from Low-Quality Points
- Author
-
Li, Zhiheng, Cui, Yubo, Huang, Ningyuan, Pang, Chenglin, and Fang, Zheng
- Subjects
Computer Science - Robotics - Abstract
Recently, 4D millimetre-wave radar has exhibited more stable perception ability than LiDAR and camera under adverse conditions (e.g. rain and fog). However, low-quality radar points hinder its application, especially for the odometry task, which requires dense and accurate matching. To fully explore the potential of 4D radar, we introduce a learning-based odometry framework, enabling robust ego-motion estimation from finite and uncertain geometry information. First, for sparse radar points, we propose a local completion to supplement missing structures and provide a denser guideline for aligning two frames. Then, a context-aware association with a hierarchical structure flexibly matches points of different scales aided by feature similarity, and improves local matching consistency through correlation balancing. Finally, we present a window-based optimizer that uses historical priors to establish a coupled state estimation and correct errors of inter-frame matching. The superiority of our algorithm is confirmed on the View-of-Delft dataset, achieving around a 50% performance improvement over previous approaches and delivering accuracy on par with LiDAR odometry. Our code will be available., Comment: 7 pages, 7 figures
- Published
- 2025
10. Integrated Computation and Communication with Fiber-optic Transmissions
- Author
-
Zhang, Jiahao, Zhang, Lu, Pang, Xiaodan, Ozolins, Oskars, Zhang, Qun, and Yu, Xianbin
- Subjects
Physics - Optics ,Computer Science - Machine Learning - Abstract
Fiber-optic transmission systems are leveraged not only as high-speed communication channels but also as nonlinear kernel functions for machine learning computations, enabling the seamless integration of computational intelligence and communication.
- Published
- 2025
11. Semantic-ICP: Iterative Closest Point for Non-rigid Multi-Organ Point Cloud Registration
- Author
-
Chen, Wanwen, Studders, Carson, Kwon, Jamie J. Y., Pang, Emily H. T., Prisman, Eitan, and Salcudean, Septimiu E.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Point cloud registration is important in computer-aided interventions (CAI). While learning-based point cloud registration methods have been developed, their clinical application is hampered by issues of generalizability and explainability. Therefore, classical point cloud registration methods, such as Iterative Closest Point (ICP), are still widely applied in CAI. ICP methods fail to consider that: (1) the points have well-defined semantic meaning, in that each point can be related to a specific anatomical label; (2) the deformation needs to follow biomechanical energy constraints. In this paper, we present a novel semantic ICP (sem-ICP) method that handles multiple point labels and uses linear elastic energy regularization. We use semantic labels to improve the robustness of the closest-point matching and propose a new point cloud deformation representation to apply explicit biomechanical energy regularization. Our experiments on the Learn2Reg abdominal MR-CT registration dataset and a trans-oral robotic surgery ultrasound-CT registration dataset show that our method improves the Hausdorff distance compared with other state-of-the-art ICP-based registration methods. We also perform a sensitivity study to show that our rigid initialization achieves better convergence with different initializations and visible ratios. (A sketch of label-constrained closest-point matching follows this entry.), Comment: 10 pages, 3 figures, submitted to MICCAI 2025
- Published
- 2025
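One ingredient described in item 11 is restricting closest-point matching to points that share an anatomical label. The sketch below shows only that label-constrained correspondence step (using a per-label KD-tree); the paper's deformation representation and linear elastic energy regularization are not reproduced here.

```python
# Sketch of closest-point matching restricted to identical semantic labels.
# Only the correspondence step of an ICP iteration; the elastic-energy term is omitted.
import numpy as np
from scipy.spatial import cKDTree

def semantic_closest_points(src_pts, src_labels, tgt_pts, tgt_labels):
    """For each source point, find the nearest target point with the same label.

    Returns (indices into tgt_pts, distances); source points whose label is
    absent from the target get index -1 and distance inf.
    """
    matches = np.full(len(src_pts), -1, dtype=int)
    dists = np.full(len(src_pts), np.inf)
    for label in np.unique(src_labels):
        tgt_mask = tgt_labels == label
        if not tgt_mask.any():
            continue
        tree = cKDTree(tgt_pts[tgt_mask])
        src_mask = src_labels == label
        d, local_idx = tree.query(src_pts[src_mask])
        matches[src_mask] = np.where(tgt_mask)[0][local_idx]
        dists[src_mask] = d
    return matches, dists

# Toy example with two "organ" labels.
rng = np.random.default_rng(0)
src, tgt = rng.normal(size=(200, 3)), rng.normal(size=(300, 3))
src_lab, tgt_lab = rng.integers(0, 2, size=200), rng.integers(0, 2, size=300)
idx, d = semantic_closest_points(src, src_lab, tgt, tgt_lab)
print("mean matched distance:", d[idx >= 0].mean())
```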
12. Hybrid Fiber-Based Radio Frequency Distribution and Vibration Detection System Tailored for Large Radio Arrays
- Author
-
Dai, Hongfei, Li, Wenlin, Pang, Zhongwang, Li, Chunyi, Song, Dongqi, Wu, Tong, and Wang, Bo
- Subjects
Astrophysics - Instrumentation and Methods for Astrophysics ,Physics - Instrumentation and Detectors - Abstract
Radio telescope arrays, such as the Square Kilometre Array (SKA) and the next-generation Very Large Array (ngVLA), require highly precise synchronization of time-frequency references to ensure high-quality observational data. Fiber-based frequency distribution systems are highly effective. However, their proper functioning can be threatened by risk events. In this paper, we propose a hybrid fiber-based frequency-distribution and vibration detection system tailored for large radio arrays. The system ensures the performance of distributed frequency signals while allowing for the monitoring of potential threats to the optical fiber network. We design and implement a single-to-multiple hybrid system, conducting tests via a 55-km fiber link. Experimental results demonstrate its effectiveness, achieving relative frequency stabilities of 3E-14 at 1 s and 2.7E-17 at 1E5 s averaging times, along with vibration detection and localization capabilities., Comment: 12 pages, 6 figures
- Published
- 2025
13. Unnatural Languages Are Not Bugs but Features for LLMs
- Author
-
Duan, Keyu, Zhao, Yiran, Feng, Zhili, Ni, Jinjie, Pang, Tianyu, Liu, Qian, Cai, Tianle, Dou, Longxu, Kawaguchi, Kenji, Goyal, Anirudh, Kolter, J. Zico, and Shieh, Michael Qizhe
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
Large Language Models (LLMs) have been observed to process non-human-readable text sequences, such as jailbreak prompts, often viewed as a bug for aligned LLMs. In this work, we present a systematic investigation challenging this perception, demonstrating that unnatural languages - strings that appear incomprehensible to humans but maintain semantic meanings for LLMs - contain latent features usable by models. Notably, unnatural languages possess latent features that can be generalized across different models and tasks during inference. Furthermore, models fine-tuned on unnatural versions of instruction datasets perform on par with those trained on natural language, achieving an average win rate of 49.71 on Length-controlled AlpacaEval 2.0 across various base models. In addition, through comprehensive analysis, we demonstrate that LLMs process unnatural languages by filtering noise and inferring contextual meaning from filtered words.
- Published
- 2025
14. Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics
- Author
-
Arora, Siddhant, Lu, Zhiyun, Chiu, Chung-Cheng, Pang, Ruoming, and Watanabe, Shinji
- Subjects
Computer Science - Computation and Language ,Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
The recent wave of audio foundation models (FMs) could provide new capabilities for conversational modeling. However, there have been limited efforts to evaluate these audio FMs comprehensively on their ability to have natural and interactive conversations. To engage in meaningful conversation with the end user, we would want the FMs to additionally perform a fluent succession of turns without too much overlapping speech or long stretches of silence. Inspired by this, we ask whether the recently proposed audio FMs can understand, predict, and perform turn-taking events. To answer this, we propose a novel evaluation protocol that can assess a spoken dialogue system's turn-taking capabilities using a supervised model as a judge that has been trained to predict turn-taking events in human-human conversations. Using this protocol, we present the first comprehensive user study that evaluates existing spoken dialogue systems on their ability to perform turn-taking events and reveal many interesting insights, such as that they sometimes do not understand when to speak up, can interrupt too aggressively, and rarely backchannel. We further evaluate multiple open-source and proprietary audio FMs accessible through APIs on carefully curated test benchmarks from Switchboard to measure their ability to understand and predict turn-taking events and identify significant room for improvement. We will open-source our evaluation platform to promote the development of advanced conversational AI systems., Comment: Accepted at ICLR 2025
- Published
- 2025
15. DPR: Diffusion Preference-based Reward for Offline Reinforcement Learning
- Author
-
Pang, Teng, Wang, Bingzheng, Wu, Guoqiang, and Yin, Yilong
- Subjects
Computer Science - Machine Learning - Abstract
Offline preference-based reinforcement learning (PbRL) mitigates the need for reward definition, aligning with human preferences via preference-driven reward feedback without interacting with the environment. However, the effectiveness of preference-driven reward functions depends on the modeling ability of the learning model, which current MLP-based and Transformer-based methods may fail to adequately provide. To alleviate the failure of the reward function caused by insufficient modeling, we propose a novel preference-based reward acquisition method: Diffusion Preference-based Reward (DPR). Unlike previous methods that use Bradley-Terry models for trajectory preferences, we use diffusion models to directly model preference distributions for state-action pairs, allowing rewards to be discriminatively obtained from these distributions. In addition, considering the particularity of preference data, which only capture the internal relationships of paired trajectories, we further propose Conditional Diffusion Preference-based Reward (C-DPR), which leverages relative preference information to enhance the construction of the diffusion model. We apply the above methods to existing offline reinforcement learning algorithms, and a series of experimental results demonstrates that the diffusion-based reward acquisition approach outperforms previous MLP-based and Transformer-based methods.
- Published
- 2025
16. Effects of Initial Nucleon-Nucleon Correlations on Light Nuclei Production in Au+Au Collisions at $\sqrt{s_\mathrm{NN}} = 3\ $ GeV
- Author
-
Lin, Qian-Ru, Huang, Yu-Jing, Pang, Long-Gang, Luo, Xiao-Feng, and Wang, Xin-Nian
- Subjects
High Energy Physics - Phenomenology ,Nuclear Theory - Abstract
Light nuclei production in heavy-ion collisions serves as a sensitive probe of the QCD phase structure. In coalescence models, triton ($N_t$) and deuteron ($N_d$) yields depend on the spatial separation of nucleon pairs ($\Delta r$) in Wigner functions, yet the impact of initial two-nucleon correlations $\rho(\Delta r)$ remains underexplored. We develop a method to sample nucleons in $^{197}$Au nuclei that simultaneously satisfies both the single-particle distribution $f(r)$ and the two-nucleon correlation $\rho(\Delta r)$. Using these nuclei, we simulate Au+Au collisions at $\sqrt{s_\mathrm{NN}}=3$ GeV via the SMASH transport model (mean-field mode) to calculate proton, deuteron, and triton yields. Simulations reveal a 36% enhancement in mid-rapidity deuteron yields across all centrality ranges and a 33% rise in mid-rapidity triton production for 0-10% central collisions. The calculated transverse momenta of light nuclei align with STAR data. We further analyze the impacts of baryon conservation, spectator exclusion, and centrality determination via charged multiplicity. Notably, observed discrepancies in the double yield ratio suggest unaccounted-for physical mechanisms, such as critical fluctuations or inaccuracies in coalescence parameters or light nuclei cross-sections. This underscores the critical role of initial nucleon-nucleon correlations, linking microscopic nuclear structure to intermediate-energy collision dynamics. (A crude rejection-sampling sketch of correlated nucleon placement follows this entry.), Comment: 15 pages, 16 figures
- Published
- 2025
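Item 16 hinges on sampling nucleon positions that respect both a single-particle density $f(r)$ and a target two-nucleon correlation $\rho(\Delta r)$. One simple, and much cruder, way to impose a short-range pair correlation is sequential rejection: draw each nucleon from a Woods-Saxon profile and accept it with a probability built from a correlation factor against the nucleons already placed. The Woods-Saxon parameters and the correlation shape below are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch: place nucleons from a Woods-Saxon density with a short-range
# pair-correlation suppression. Parameters and the correlation shape are assumptions.
import numpy as np

rng = np.random.default_rng(0)
R, a = 6.38, 0.535   # approximate Woods-Saxon radius and diffuseness for Au-197 (fm)
d_core = 0.9         # illustrative correlation length scale (fm)

def sample_radius():
    """Rejection-sample r from f(r) proportional to r^2 / (1 + exp((r - R)/a))."""
    r_max = R + 10.0 * a
    f_max = r_max ** 2
    while True:
        r = rng.uniform(0.0, r_max)
        if rng.uniform(0.0, f_max) < r ** 2 / (1.0 + np.exp((r - R) / a)):
            return r

def pair_correlation(dr):
    """Target two-nucleon correlation: suppress very small separations."""
    return 1.0 - np.exp(-(dr / d_core) ** 2)

def sample_nucleus(n_nucleons=197):
    positions = []
    while len(positions) < n_nucleons:
        direction = rng.normal(size=3)
        pos = sample_radius() * direction / np.linalg.norm(direction)
        if positions:
            dr = np.linalg.norm(np.array(positions) - pos, axis=1)
            if rng.uniform() > np.prod(pair_correlation(dr)):
                continue  # reject: candidate sits too close to an existing nucleon
        positions.append(pos)
    return np.array(positions)

nucleus = sample_nucleus()
print("rms radius (fm):", np.sqrt((nucleus ** 2).sum(axis=1).mean()))
```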
17. Large Language Model-Based Benchmarking Experiment Settings for Evolutionary Multi-Objective Optimization
- Author
-
Pang, Lie Meng and Ishibuchi, Hisao
- Subjects
Computer Science - Neural and Evolutionary Computing - Abstract
When we manually design an evolutionary optimization algorithm, we implicitly or explicitly assume a set of target optimization problems. In the case of automated algorithm design, target optimization problems are usually explicitly shown. Recently, the use of large language models (LLMs) for the design of evolutionary multi-objective optimization (EMO) algorithms has been examined in some studies. In those studies, target multi-objective problems are not always explicitly shown. It is well known in the EMO community that the performance evaluation results of EMO algorithms depend not only on test problems but also on many other factors such as performance indicators, the reference point, the termination condition, and the population size. Thus, it is likely that the EMO algorithms designed by LLMs depend on those factors. In this paper, we examine the implicit assumptions LLMs make about the performance comparison of EMO algorithms. For this purpose, we ask LLMs to design a benchmarking experiment for EMO algorithms. Our experiments show that LLMs often suggest classical benchmark settings: performance examination of NSGA-II, MOEA/D, and NSGA-III on ZDT, DTLZ, and WFG using HV and IGD under standard parameter specifications.
- Published
- 2025
18. Universal electronic structure of layered nickelates via oxygen-centered planar orbitals
- Author
-
Au-Yeung, Christine C., Chen, X., Smit, S., Bluschke, M., Zimmermann, V., Michiardi, M., Moen, P., Kraan, J., Pang, C. S. B., Suen, C. T., Zhdanovich, S., Zonno, M., Gorovikov, S., Liu, Y., Levy, G., Elfimov, I. S., Berciu, M., Sawatzky, G. A., Mitchell, J. F., and Damascelli, A.
- Subjects
Condensed Matter - Superconductivity - Abstract
Superconductivity has recently been demonstrated in La$_3$Ni$_2$O$_7$ up to 91K under moderate pressure in bulk crystals, and up to 48K at ambient pressure in thin films grown under compressive strain. Key questions remain open regarding the crystal structure and low-energy electronic states that support superconductivity in these compounds. Here we take advantage of the natural polymorphism between bilayer (2222) or alternating monolayer-trilayer (1313) stacking sequences that arises in bulk La$_3$Ni$_2$O$_7$ crystals to identify universal features in this family of materials. Employing angle-resolved photoemission spectroscopy (ARPES) we observe the fingerprint of a spin-density wave (SDW) instability, strong and coherent enough to modify the electronic structure. We demonstrate that this feature is a `translated' $\beta$ Fermi surface associated with a scattering vector $Q_{t\beta}$ which matches the $Q_{SDW}$ detected by neutron and x-ray scattering experiments. This observation provides an important link between surface and bulk probes, and demonstrates a universal connection between magnetism and fermiology in La$_3$Ni$_2$O$_7$ as well as La$_4$Ni$_3$O$_{10}$. We simulate the spectral weight distribution observed in our ARPES dichroism experiments and establish that the low-energy electronic phenomenology is dominated by oxygen-centered planar orbitals, which -- upon moving along the Fermi surface away from the Ni-O-Ni bond directions -- evolve from the $d_{3x^2-r^2}$ and $d_{3y^2-r^2}$ symmetry characteristic of 3-spin polarons to the familiar $d_{x^2-y^2}$ Zhang-Rice singlets that support high-temperature superconductivity in cuprates. Despite the multiorbital nature of the nickelates, our work establishes an empirical correspondence between the low-energy electronic structure of cuprates and nickelates, thus suggesting a common origin for their unconventional superconductivity., Comment: 13 pages, 4 figures
- Published
- 2025
19. Physics-Driven Data Generation for Contact-Rich Manipulation via Trajectory Optimization
- Author
-
Yang, Lujie, Suh, H. J. Terry, Zhao, Tong, Graesdal, Bernhard Paus, Kelestemur, Tarik, Wang, Jiuguang, Pang, Tao, and Tedrake, Russ
- Subjects
Computer Science - Robotics ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Systems and Control - Abstract
We present a low-cost data generation pipeline that integrates physics-based simulation, human demonstrations, and model-based planning to efficiently generate large-scale, high-quality datasets for contact-rich robotic manipulation tasks. Starting with a small number of embodiment-flexible human demonstrations collected in a virtual reality simulation environment, the pipeline refines these demonstrations using optimization-based kinematic retargeting and trajectory optimization to adapt them across various robot embodiments and physical parameters. This process yields a diverse, physically consistent dataset that enables cross-embodiment data transfer, and offers the potential to reuse legacy datasets collected under different hardware configurations or physical parameters. We validate the pipeline's effectiveness by training diffusion policies from the generated datasets for challenging contact-rich manipulation tasks across multiple robot embodiments, including a floating Allegro hand and bimanual robot arms. The trained policies are deployed zero-shot on hardware for bimanual iiwa arms, achieving high success rates with minimal human input. Project website: https://lujieyang.github.io/physicsgen/.
- Published
- 2025
20. HVI: A New Color Space for Low-light Image Enhancement
- Author
-
Yan, Qingsen, Feng, Yixu, Zhang, Cheng, Pang, Guansong, Shi, Kangbiao, Wu, Peng, Dong, Wei, Sun, Jinqiu, and Zhang, Yanning
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Low-Light Image Enhancement (LLIE) is a crucial computer vision task that aims to restore detailed visual information from corrupted low-light images. Many existing LLIE methods are based on standard RGB (sRGB) space, which often produce color bias and brightness artifacts due to inherent high color sensitivity in sRGB. While converting the images using Hue, Saturation and Value (HSV) color space helps resolve the brightness issue, it introduces significant red and black noise artifacts. To address this issue, we propose a new color space for LLIE, namely Horizontal/Vertical-Intensity (HVI), defined by polarized HS maps and learnable intensity. The former enforces small distances for red coordinates to remove the red artifacts, while the latter compresses the low-light regions to remove the black artifacts. To fully leverage the chromatic and intensity information, a novel Color and Intensity Decoupling Network (CIDNet) is further introduced to learn accurate photometric mapping function under different lighting conditions in the HVI space. Comprehensive results from benchmark and ablation experiments show that the proposed HVI color space with CIDNet outperforms the state-of-the-art methods on 10 datasets. The code is available at https://github.com/Fediory/HVI-CIDNet., Comment: Qingsen Yan, Yixu Feng, and Cheng Zhang contributed equally to this work
- Published
- 2025
21. Accessibility for Whom? Perceptions of Sidewalk Barriers Across Disability Groups and Implications for Designing Personalized Maps
- Author
-
Li, Chu, Pang, Rock Yuren, Labbé, Delphine, Eisenberg, Yochai, Hosseini, Maryam, and Froehlich, Jon E.
- Subjects
Computer Science - Human-Computer Interaction - Abstract
Despite diverse mobility needs worldwide, existing mapping tools fail to address the varied experiences of different mobility device users. This paper presents a large-scale online survey exploring how five mobility groups -- users of canes, walkers, mobility scooters, manual wheelchairs, and motorized wheelchairs -- perceive sidewalk barriers. Using 52 sidewalk barrier images, respondents evaluated their confidence in navigating each scenario. Our findings (N=190) reveal variations in barrier perceptions across groups, while also identifying shared concerns. To further demonstrate the value of this data, we showcase its use in two custom prototypes: a visual analytics tool and a personalized routing tool. Our survey findings and open dataset advance work in accessibility-focused maps, routing algorithms, and urban planning., Comment: Manuscript accepted at CHI'25
- Published
- 2025
22. Assessing Autonomous Inspection Regimes: Active Versus Passive Satellite Inspection
- Author
-
Aurand, Joshua, Pang, Christopher, Mokhtar, Sina, Lei, Henry, Cutlip, Steven, and Phillips, Sean
- Subjects
Computer Science - Robotics ,Electrical Engineering and Systems Science - Systems and Control - Abstract
This paper addresses the problem of satellite inspection, where one or more satellites (inspectors) are tasked with imaging or inspecting a resident space object (RSO) due to potential malfunctions or anomalies. Inspection strategies are often reduced to a discretized action space with predefined waypoints, facilitating tractability in both classical optimization and machine learning based approaches. However, this discretization can lead to suboptimal guidance in certain scenarios. This study presents a comparative simulation to explore the tradeoffs of passive versus active strategies in multi-agent missions. Key factors considered include RSO dynamic mode, state uncertainty, unmodeled entrance criteria, and inspector motion types. The evaluation is conducted with a focus on fuel utilization and surface coverage. Building on a Monte-Carlo based evaluator of passive strategies and a reinforcement learning framework for training active inspection policies, this study investigates conditions under which passive strategies, such as Natural Motion Circumnavigation (NMC), may perform comparably to active strategies like Reinforcement Learning based waypoint transfers.
- Published
- 2025
23. Some permutation polynomials via linear translators
- Author
-
Pang, Xuan, Yuan, Pingzhi, and Li, Hongjian
- Subjects
Mathematics - Number Theory ,11C08, 12E10 - Abstract
Permutation polynomials with explicit constructions over finite fields have long been a topic of great interest in number theory. In recent years, many scholars have constructed classes of permutation polynomials by applying linear translators of functions from $\mathbb{F}_{q^n}$ to $\mathbb{F}_q$. Motivated by previous works, we first extend the notion of linear translators in a natural way and then construct several classes of permutation polynomials. (The classical linear-translator construction is recalled after this entry for reference.), Comment: Finite field; permutation polynomial; linear translator; additive polynomial
- Published
- 2025
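For context on item 23, the classical construction that motivates this line of work (Kyureghyan's theorem, stated here from the literature rather than as this paper's new result) reads: if $\gamma \in \mathbb{F}_{q^n}^{*}$ is a $b$-linear translator of $f : \mathbb{F}_{q^n} \to \mathbb{F}_q$, i.e.

$$
f(x + u\gamma) = f(x) + ub \quad \text{for all } x \in \mathbb{F}_{q^n},\ u \in \mathbb{F}_q,
$$

then

$$
F(x) = x + \gamma f(x)
$$

is a permutation polynomial of $\mathbb{F}_{q^n}$ if and only if $b \neq -1$.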
24. An Accurate Computational Approach for Partial Likelihood Using Poisson-Binomial Distributions
- Author
-
Cho, Youngjin, Hong, Yili, and Du, Pang
- Subjects
Statistics - Methodology - Abstract
In a Cox model, the partial likelihood, as the product of a series of conditional probabilities, is used to estimate the regression coefficients. In practice, those conditional probabilities are approximated by risk score ratios based on a continuous-time model, and thus result in parameter estimates from only an approximate partial likelihood. Through a revisit of the original partial likelihood idea, an accurate partial likelihood computing method for the Cox model is proposed, which calculates the exact conditional probability using the Poisson-binomial distribution. New estimation and inference procedures are developed, and theoretical results are established for the proposed computational procedure. Although ties are common in real studies, current theories for the Cox model mostly do not consider cases for tied data. In contrast, the new approach includes the theory for grouped data, which allows ties, and also includes the theory for continuous data without ties, providing a unified framework for computing the partial likelihood for data with or without ties. Numerical results show that the proposed method outperforms current methods in reducing bias and mean squared error, while achieving improved confidence interval coverage rates, especially when there are many ties or when the variability in risk scores is large. We also compare the methods in real applications. (A dynamic-programming evaluation of the Poisson-binomial PMF is sketched after this entry.)
- Published
- 2025
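The computational core of item 24 is evaluating probabilities under a Poisson-binomial distribution, i.e. the number of successes among independent Bernoulli trials with unequal probabilities. A standard dynamic-programming evaluation of its PMF, which such an exact conditional-probability computation could build on (illustrative, not the paper's code), is:

```python
# Standard O(n*k) dynamic program for the Poisson-binomial PMF:
# the probability that exactly k of n independent Bernoulli(p_i) trials succeed.
import numpy as np

def poisson_binomial_pmf(p):
    """Return pmf where pmf[k] = P(sum_i X_i = k) for X_i ~ Bernoulli(p[i])."""
    pmf = np.zeros(len(p) + 1)
    pmf[0] = 1.0
    for i, pi in enumerate(p):
        # Fold in trial i: the right-hand side is evaluated before assignment.
        pmf[1:i + 2] = pmf[1:i + 2] * (1.0 - pi) + pmf[0:i + 1] * pi
        pmf[0] *= (1.0 - pi)
    return pmf

# Example: with per-subject event probabilities over a risk set, pmf[d] is the
# probability of observing exactly d events -- a building block for exact
# conditional probabilities in the partial likelihood.
p = np.array([0.10, 0.25, 0.05, 0.40])
pmf = poisson_binomial_pmf(p)
print(pmf, pmf.sum())  # the PMF sums to 1
```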
25. First-principles Investigation of Exceptional Coarsening-resistant V-Sc(Al2Cu)4 Nanoprecipitates in Al-Cu-Mg-Ag-Sc Alloys
- Author
-
Bai, Junyuan, Xue, Hao, Li, Jiaming, Pang, Xueyong, Zhao, Zhihao, Liu, Gang, and Qin, Gaowu
- Subjects
Condensed Matter - Materials Science - Abstract
Aluminum-copper-magnesium-silver (Al-Cu-Mg-Ag) alloys are extensively utilized in aerospace industries due to the formation of Omega nano-plates. However, the rapid coarsening of these nano-plates above 475 K restricts their application at elevated temperatures. When introducing scandium (Sc) into these alloys, the service temperature of the resultant alloys can reach an unprecedented 675 K, attributed to the in situ formation of a coarsening-resistant V-Sc(Al2Cu)4 phase within the Omega nano-plates. However, the fundamental thermodynamic properties and mechanisms behind the remarkable coarsening resistance of V nano-plates remain unexplored. Here, we employ first-principles calculations to investigate the phase stability of the V-Sc(Al2Cu)4 phase, the basic kinetic features of V phase formation within Omega nano-plates, and the origins of the extremely high thermal stability of V nano-plates. Our results indicate that V-Sc(Al2Cu)4 is metastable and thermodynamically tends to evolve into a stable ScAl7Cu5 phase. We also demonstrate that kinetic factors are mainly responsible for the temperature dependence of V phase formation. Notably, the formation of V-Sc(Al2Cu)4 within Omega nano-plates modifies the Kagome lattice in the shell layer of the Omega nano-plates, inhibiting further thickening of V nano-plates through the thickening pathway of Omega nano-plates. This interface transition leads to the exceptional coarsening resistance of the V nano-plates. Moreover, we also screened 14 promising element substitutions for Sc. These findings are anticipated to accelerate the development of high-performance Al alloys with superior heat resistance.
- Published
- 2025
26. S4S: Solving for a Diffusion Model Solver
- Author
-
Frankel, Eric, Chen, Sitan, Li, Jerry, Koh, Pang Wei, Ratliff, Lillian J., and Oh, Sewoong
- Subjects
Computer Science - Machine Learning - Abstract
Diffusion models (DMs) create samples from a data distribution by starting from random noise and iteratively solving a reverse-time ordinary differential equation (ODE). Because each step in the iterative solution requires an expensive neural function evaluation (NFE), there has been significant interest in approximately solving these diffusion ODEs with only a few NFEs without modifying the underlying model. However, in the few NFE regime, we observe that tracking the true ODE evolution is fundamentally impossible using traditional ODE solvers. In this work, we propose a new method that learns a good solver for the DM, which we call Solving for the Solver (S4S). S4S directly optimizes a solver to obtain good generation quality by learning to match the output of a strong teacher solver. We evaluate S4S on six different pre-trained DMs, including pixel-space and latent-space DMs for both conditional and unconditional sampling. In all settings, S4S uniformly improves the sample quality relative to traditional ODE solvers. Moreover, our method is lightweight, data-free, and can be plugged in black-box on top of any discretization schedule or architecture to improve performance. Building on top of this, we also propose S4S-Alt, which optimizes both the solver and the discretization schedule. By exploiting the full design space of DM solvers, with 5 NFEs, we achieve an FID of 3.73 on CIFAR10 and 13.26 on MS-COCO, representing a $1.5\times$ improvement over previous training-free ODE methods.
- Published
- 2025
27. LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification
- Author
-
Yang, Penghui, Du, Cunxiao, Zhang, Fengzhuo, Wang, Haonan, Pang, Tianyu, Du, Chao, and An, Bo
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Speculative decoding has become a promising technique to mitigate the high inference latency of autoregressive decoding in Large Language Models (LLMs). Despite its promise, the effective application of speculative decoding in LLMs still confronts three key challenges: the increasing memory demands of the draft model, the distribution shift between short-context training corpora and long-context inference, and inefficiencies in attention implementation. In this work, we enhance the performance of speculative decoding in long-context settings by addressing these challenges. First, we propose a memory-efficient draft model with a constant-sized Key-Value (KV) cache. Second, we introduce novel position indices for short-context training data, enabling seamless adaptation from short-context training to long-context inference. Finally, we present an innovative attention aggregation method that combines fast implementations for prefix computation with standard attention for tree mask handling, effectively resolving the latency and memory inefficiencies of tree decoding. Our approach achieves strong results on various long-context tasks, including repository-level code completion, long-context summarization, and o1-like long reasoning tasks, demonstrating significant improvements in latency reduction. The code is available at https://github.com/sail-sg/LongSpec. (A toy draft-and-verify sketch of the base speculative decoding loop follows this entry.)
- Published
- 2025
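Item 27 extends speculative decoding to long contexts. For readers unfamiliar with the base technique, the toy sketch below shows the draft-then-verify loop with a simplified greedy acceptance rule; real systems, including this paper's, use a probabilistic acceptance criterion and verify all draft positions in a single batched target forward pass. `draft_model` and `target_model` are placeholder callables returning next-token distributions.

```python
# Toy sketch of the speculative decoding loop with greedy verification.
# draft_model / target_model are placeholders: token list -> next-token probabilities.
import numpy as np

def speculative_decode_step(tokens, draft_model, target_model, k=4):
    """Propose k draft tokens, verify them with the target model, and return
    the extended token list (accepted prefix plus one token from the target)."""
    # 1. Draft k tokens greedily with the cheap model.
    ctx_draft = list(tokens)
    proposed = []
    for _ in range(k):
        nxt = int(np.argmax(draft_model(ctx_draft)))
        proposed.append(nxt)
        ctx_draft.append(nxt)

    # 2. Verify: accept proposals while the target model agrees (greedy rule).
    accepted = []
    ctx = list(tokens)
    for tok in proposed:
        target_choice = int(np.argmax(target_model(ctx)))
        if target_choice == tok:
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(target_choice)  # take the target's token and stop
            break
    else:
        # All k proposals accepted: append one bonus token from the target model.
        accepted.append(int(np.argmax(target_model(ctx))))
    return list(tokens) + accepted

# Mock models over a 10-token vocabulary (random distributions, just to run the loop).
rng = np.random.default_rng(0)
draft_model = lambda ctx: rng.dirichlet(np.ones(10))
target_model = lambda ctx: rng.dirichlet(np.ones(10))
print(speculative_decode_step([1, 2, 3], draft_model, target_model))
```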
28. Applications of Large Models in Medicine
- Author
-
Su, YunHe, Lu, Zhengyang, Liu, Junhui, Pang, Ke, Dai, Haoran, Jia, Sa, Liu, Yuxin, Ge, Lujia, and Yang, Jing-min
- Subjects
Computer Science - Artificial Intelligence - Abstract
This paper explores the advancements and applications of large-scale models in the medical field, with a particular focus on Medical Large Models (MedLMs). These models, encompassing Large Language Models (LLMs), Vision Models, 3D Large Models, and Multimodal Models, are revolutionizing healthcare by enhancing disease prediction, diagnostic assistance, personalized treatment planning, and drug discovery. The integration of graph neural networks in medical knowledge graphs and drug discovery highlights the potential of Large Graph Models (LGMs) in understanding complex biomedical relationships. The study also emphasizes the transformative role of Vision-Language Models (VLMs) and 3D Large Models in medical image analysis, anatomical modeling, and prosthetic design. Despite the challenges, these technologies are setting new benchmarks in medical innovation, improving diagnostic accuracy, and paving the way for personalized healthcare solutions. This paper aims to provide a comprehensive overview of the current state and future directions of large models in medicine, underscoring their significance in advancing global health.
- Published
- 2025
29. A Split-Window Transformer for Multi-Model Sequence Spammer Detection using Multi-Model Variational Autoencoder
- Author
-
Yang, Zhou, Pang, Yucai, Yin, Hongbo, and Xiao, Yunpeng
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Multimedia ,Computer Science - Social and Information Networks - Abstract
This paper introduces a new Transformer, called MS$^2$Dformer, that can be used as a generalized backbone for multi-modal sequence spammer detection. Spammer detection is a complex multi-modal task, and thus the challenges of applying Transformers are two-fold. Firstly, complex multi-modal noisy information about users can interfere with feature mining. Secondly, the long sequence of users' historical behaviors also puts huge GPU memory pressure on the attention computation. To solve these problems, we first design a user behavior tokenization algorithm based on the multi-modal variational autoencoder (MVAE). Subsequently, a hierarchical split-window multi-head attention (SW/W-MHA) mechanism is proposed. The split-window strategy transforms the ultra-long sequences hierarchically into a combination of intra-window short-term and inter-window overall attention. Pre-trained on the public datasets, MS$^2$Dformer's performance far exceeds the previous state of the art. The experiments demonstrate MS$^2$Dformer's ability to act as a backbone. (A bare-bones windowed-attention sketch follows this entry.)
- Published
- 2025
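The split-window attention in item 29 tames memory by attending within windows of the long behavior sequence rather than across its full length. The bare-bones numpy sketch below shows only the intra-window (local) part; the paper's inter-window overall attention and hierarchical combination are not reproduced.

```python
# Minimal sketch of intra-window self-attention over a long sequence.
# Only the windowed (local) attention is shown; inter-window aggregation is omitted.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def windowed_attention(q, k, v, window=64):
    """q, k, v: (seq_len, dim). Attention is computed independently inside each
    window, so cost scales with seq_len * window instead of seq_len ** 2."""
    seq_len, dim = q.shape
    out = np.zeros_like(v)
    for start in range(0, seq_len, window):
        end = min(start + window, seq_len)
        scores = q[start:end] @ k[start:end].T / np.sqrt(dim)
        out[start:end] = softmax(scores) @ v[start:end]
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(1024, 32))
print(windowed_attention(x, x, x, window=64).shape)  # (1024, 32)
```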
30. DOEI: Dual Optimization of Embedding Information for Attention-Enhanced Class Activation Maps
- Author
-
Zhu, Hongjie, Zhang, Zeyu, Pang, Guansong, Wang, Xu, Wen, Shimin, Bai, Yu, Ergu, Daji, Cai, Ying, and Zhao, Yang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Weakly supervised semantic segmentation (WSSS) typically utilizes limited semantic annotations to obtain initial Class Activation Maps (CAMs). However, due to the inadequate coupling between class activation responses and semantic information in high-dimensional space, the CAM is prone to object co-occurrence or under-activation, resulting in inferior recognition accuracy. To tackle this issue, we propose DOEI, Dual Optimization of Embedding Information, a novel approach that reconstructs embedding representations through semantic-aware attention weight matrices to optimize the expression capability of embedding information. Specifically, DOEI amplifies tokens with high confidence and suppresses those with low confidence during the class-to-patch interaction. This alignment of activation responses with semantic information strengthens the propagation and decoupling of target features, enabling the generated embeddings to more accurately represent target features in high-level semantic space. In addition, we propose a hybrid-feature alignment module in DOEI that combines RGB values, embedding-guided features, and self-attention weights to increase the reliability of candidate tokens. Comprehensive experiments show that DOEI is an effective plug-and-play module that empowers state-of-the-art visual transformer-based WSSS models to significantly improve the quality of CAMs and segmentation performance on popular benchmarks, including PASCAL VOC (+3.6%, +1.5%, +1.2% mIoU) and MS COCO (+1.2%, +1.6% mIoU). Code will be available at https://github.com/AIGeeksGroup/DOEI.
- Published
- 2025
31. GNN-Coder: Boosting Semantic Code Retrieval with Combined GNNs and Transformer
- Author
-
Ye, Yufan, Pang, Pu, Zhang, Ting, and Huang, Hua
- Subjects
Computer Science - Information Retrieval ,Computer Science - Software Engineering - Abstract
Code retrieval is a crucial component in modern software development, particularly in large-scale projects. However, existing approaches relying on sequence-based models often fail to fully exploit the structural dependencies inherent in code, leading to suboptimal retrieval performance, particularly with structurally complex code fragments. In this paper, we introduce GNN-Coder, a novel framework based on Graph Neural Networks (GNNs) that utilizes the Abstract Syntax Tree (AST). We make the first attempt to study how a GNN-integrated Transformer can promote the development of semantic retrieval tasks by capturing the structural and semantic features of code. We further propose an innovative graph pooling method tailored for the AST, utilizing the number of child nodes as a key feature to highlight the intrinsic topological relationships within the AST. This design effectively integrates both sequential and hierarchical representations, enhancing the model's ability to capture code structure and semantics. Additionally, we introduce the Mean Angular Margin (MAM), a novel metric for quantifying the uniformity of code embedding distributions, providing a standardized measure of feature separability. The proposed method achieves a lower MAM, indicating a more discriminative feature representation. This underscores GNN-Coder's superior ability to distinguish between code snippets, thereby enhancing retrieval accuracy. Experimental results show that GNN-Coder significantly boosts retrieval performance, with a 1%-10% improvement in MRR on the CSN dataset, and a notable 20% gain in zero-shot performance on the CosQA dataset.
- Published
- 2025
32. VB-Com: Learning Vision-Blind Composite Humanoid Locomotion Against Deficient Perception
- Author
-
Ren, Junli, Huang, Tao, Wang, Huayi, Wang, Zirui, Ben, Qingwei, Pang, Jiangmiao, and Luo, Ping
- Subjects
Computer Science - Robotics - Abstract
The performance of legged locomotion is closely tied to the accuracy and comprehensiveness of state observations. Blind policies, which rely solely on proprioception, are considered highly robust due to the reliability of proprioceptive observations. However, these policies significantly limit locomotion speed and often require collisions with the terrain to adapt. In contrast, vision policies allow the robot to plan motions in advance and respond proactively to unstructured terrains with an online perception module. However, perception is often compromised by noisy real-world environments, potential sensor failures, and the limitations of current simulations in presenting dynamic or deformable terrains. Humanoid robots, with high degrees of freedom and inherently unstable morphology, are particularly susceptible to misguidance from deficient perception, which can result in falls or termination on challenging dynamic terrains. To leverage the advantages of both vision and blind policies, we propose VB-Com, a composite framework that enables humanoid robots to determine when to rely on the vision policy and when to switch to the blind policy under perceptual deficiency. We demonstrate that VB-Com effectively enables humanoid robots to traverse challenging terrains and obstacles despite perception deficiencies caused by dynamic terrains or perceptual noise.
- Published
- 2025
33. SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
- Author
-
Team, M-A-P, Du, Xinrun, Yao, Yifan, Ma, Kaijing, Wang, Bingli, Zheng, Tianyu, Zhu, Kang, Liu, Minghao, Liang, Yiming, Jin, Xiaolong, Wei, Zhenlin, Zheng, Chujie, Deng, Kaixin, Jia, Shian, Jiang, Sichao, Liao, Yiyan, Li, Rui, Li, Qinrui, Li, Sirun, Li, Yizhi, Li, Yunwen, Ma, Dehua, Ni, Yuansheng, Que, Haoran, Wang, Qiyao, Wen, Zhoufutu, Wu, Siwei, Xing, Tianshun, Xu, Ming, Yang, Zhenzhu, Wang, Zekun Moore, Zhou, Junting, Bai, Yuelin, Bu, Xingyuan, Cai, Chenglin, Chen, Liang, Chen, Yifan, Cheng, Chengtuo, Cheng, Tianhao, Ding, Keyi, Huang, Siming, Huang, Yun, Li, Yaoru, Li, Yizhe, Li, Zhaoqun, Liang, Tianhao, Lin, Chengdong, Lin, Hongquan, Ma, Yinghao, Pang, Tianyang, Peng, Zhongyuan, Peng, Zifan, Qi, Qige, Qiu, Shi, Qu, Xingwei, Quan, Shanghaoran, Tan, Yizhou, Wang, Zili, Wang, Chenqing, Wang, Hao, Wang, Yiya, Wang, Yubo, Xu, Jiajun, Yang, Kexin, Yuan, Ruibin, Yue, Yuanhao, Zhan, Tianyang, Zhang, Chun, Zhang, Jinyang, Zhang, Xiyue, Zhang, Xingjian, Zhang, Yue, Zhao, Yongchi, Zheng, Xiangyu, Zhong, Chenghua, Gao, Yang, Li, Zhoujun, Liu, Dayiheng, Liu, Qian, Liu, Tianyu, Ni, Shiwen, Peng, Junran, Qin, Yujia, Su, Wenbo, Wang, Guoyin, Wang, Shi, Yang, Jian, Yang, Min, Cao, Meng, Yue, Xiang, Zhang, Zhaoxiang, Zhou, Wangchunshu, Liu, Jiaheng, Lin, Qunshu, Huang, Wenhao, and Zhang, Ge
- Subjects
Computer Science - Computation and Language - Abstract
Large language models (LLMs) have demonstrated remarkable proficiency in mainstream academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 200 specialized disciplines, far exceeding the scope of existing benchmarks. The capabilities of LLMs in many of these specialized fields-particularly in light industry, agriculture, and service-oriented disciplines-remain inadequately evaluated. To address this gap, we present SuperGPQA, a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. Our benchmark employs a novel Human-LLM collaborative filtering mechanism to eliminate trivial or ambiguous questions through iterative refinement based on both LLM responses and expert feedback. Our experimental results reveal significant room for improvement in the performance of current state-of-the-art LLMs across diverse knowledge domains (e.g., the reasoning-focused model DeepSeek-R1 achieved the highest accuracy of 61.82% on SuperGPQA), highlighting the considerable gap between current model capabilities and artificial general intelligence. Additionally, we present comprehensive insights from our management of a large-scale annotation process, involving over 80 expert annotators and an interactive Human-LLM collaborative system, offering valuable methodological guidance for future research initiatives of comparable scope.
- Published
- 2025
34. A double-layer placement algorithm for integrated circuit-based modules on printed circuit board
- Author
-
Li, Hangyuan, Yang, Zhaoyang, Pang, Haotian, Xu, Ning, and Chen, Yu
- Subjects
Computer Science - Other Computer Science - Abstract
Considering that the physical design of printed circuit boards (PCBs) follows the principle of modularized design, this paper proposes an automatic placement algorithm for functional modules. We first model the placement problem as a mixed-variable optimization problem, and then develop tailored global placement and legalization algorithms for the top-layer centralized placement subproblem and the bottom-layer pin-oriented placement subproblem. Numerical comparisons demonstrate that the proposed mixed-variable optimization scheme achieves optimized total wirelength of placement. Meanwhile, experimental results on several industrial PCB cases show that the developed centralized strategies accommodate the requirements of top-layer placement well, and that the pin-oriented global placement based on bin clustering contributes to optimized placement results that meet the requirements of pin-oriented design.
- Published
- 2025
35. Millimeter-Wave ISAC Testbed Using Programmable Digital Coding Dynamic Metasurface Antenna: Practical Design and Implementation
- Author
-
Jabbar, Abdul, Elsayed, Mostafa, Kazim, Jalil Ur-Rehman, Pang, Zhibo, Kernec, Julien Le, Imran, Muhammad, Larijani, Hadi, Ur-Rehman, Masood, Abbasi, Qammer, and Usman, Muhammad
- Subjects
Electrical Engineering and Systems Science - Signal Processing - Abstract
Dynamic Metasurface Antennas (DMAs) are transforming reconfigurable antenna technology by enabling energy-efficient, cost-effective beamforming through programmable meta-elements, eliminating the need for traditional phase shifters and delay lines. This breakthrough technology is emerging to revolutionize beamforming for next-generation wireless communication and sensing networks. In this paper, we present the design and real-world implementation of a DMA-assisted wireless communication platform operating in the license-free 60 GHz millimeter-wave (mmWave) band. Our system employs high-speed binary-coded sequences generated via a field-programmable gate array (FPGA), enabling real-time beam steering for spatial multiplexing and independent data transmission. A proof-of-concept experiment successfully demonstrates high-definition quadrature phase-shift keying (QPSK) modulated video transmission at 62 GHz. Furthermore, leveraging the DMA's multi-beam capability, we simultaneously transmit video to two spatially separated receivers, achieving accurate demodulation. We envision the proposed mmWave testbed as a platform for enabling the seamless integration of sensing and communication by allowing video transmission to be replaced with sensing data or utilizing an auxiliary wireless channel to transmit sensing information to multiple receivers. This synergy paves the way for advancing integrated sensing and communication (ISAC) in beyond-5G and 6G networks. Additionally, our testbed demonstrates potential for real-world use cases, including mmWave backhaul links and massive multiple-input multiple-output (MIMO) mmWave base stations., Comment: 12 pages, 11 figures
- Published
- 2025
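The 1-bit (binary) phase coding behind the beam steering above can be illustrated by quantizing the ideal progressive phase of a uniform linear array to the two states {0, π}. This is a generic, textbook-style illustration, not the authors' element model or FPGA firmware; the element count and spacing are arbitrary.

```python
# Generic illustration of 1-bit phase coding for beam steering on a linear array.
import numpy as np

def binary_coding_pattern(n_elements, spacing_wl, steer_deg):
    # Ideal progressive phase for steering to steer_deg, quantized to {0, pi}.
    theta = np.deg2rad(steer_deg)
    ideal = -2 * np.pi * spacing_wl * np.arange(n_elements) * np.sin(theta)
    wrapped = np.mod(ideal, 2 * np.pi)
    return np.where((wrapped < np.pi / 2) | (wrapped >= 3 * np.pi / 2), 0.0, np.pi)

def array_factor(phases, spacing_wl, angles_deg):
    # Magnitude of the far-field array factor over a sweep of observation angles.
    angles = np.deg2rad(np.asarray(angles_deg, dtype=float))
    n = np.arange(len(phases))
    return np.array([abs(np.sum(np.exp(1j * (2 * np.pi * spacing_wl * n * np.sin(a) + phases))))
                     for a in angles])

phases = binary_coding_pattern(n_elements=32, spacing_wl=0.5, steer_deg=20)
sweep = np.linspace(-90, 90, 361)
af = array_factor(phases, 0.5, sweep)
# The two states {0, pi} give purely real weights, so 1-bit coding produces twin
# beams near +/- the commanded angle; the strongest lobe sits near |20| degrees.
print(abs(sweep[np.argmax(af)]))
```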
36. HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit
- Author
-
Ben, Qingwei, Jia, Feiyu, Zeng, Jia, Dong, Junting, Lin, Dahua, and Pang, Jiangmiao
- Subjects
Computer Science - Robotics ,Computer Science - Artificial Intelligence ,Computer Science - Human-Computer Interaction - Abstract
Current humanoid teleoperation systems either lack reliable low-level control policies or struggle to acquire accurate whole-body control commands, making it difficult to teleoperate humanoids for loco-manipulation tasks. To solve these issues, we propose HOMIE, a novel humanoid teleoperation cockpit that integrates a humanoid loco-manipulation policy and a low-cost exoskeleton-based hardware system. The policy enables humanoid robots to walk and squat to specific heights while accommodating arbitrary upper-body poses. This is achieved through our novel reinforcement learning-based training framework that incorporates an upper-body pose curriculum, a height-tracking reward, and symmetry utilization, without relying on any motion priors. Complementing the policy, the hardware system integrates isomorphic exoskeleton arms, a pair of motion-sensing gloves, and a pedal, allowing a single operator to achieve full control of the humanoid robot. Our experiments show that our cockpit facilitates more stable, rapid, and precise humanoid loco-manipulation teleoperation, accelerating task completion and eliminating retargeting errors compared to inverse kinematics-based methods. We also validate the effectiveness of the data collected by our cockpit for imitation learning. Our project is fully open-sourced; demos and code can be found at https://homietele.github.io/. (A hypothetical illustration of a height-tracking reward term follows this entry.)
- Published
- 2025
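The abstract names a height-tracking reward and symmetry utilization but does not give their exact form; the snippet below is a hypothetical illustration of what such terms could look like, not the authors' reward function.

```python
# Hypothetical height-tracking reward with a symmetry term (illustration only).
import numpy as np

def height_tracking_reward(base_height, commanded_height, sigma=0.05):
    # Gaussian-shaped term: close to 1.0 when the torso height matches the command.
    return float(np.exp(-((base_height - commanded_height) ** 2) / (2 * sigma ** 2)))

def symmetry_penalty(left_joint_pos, right_joint_pos, weight=0.1):
    # Penalize asymmetric left/right joint trajectories during locomotion.
    diff = np.abs(np.asarray(left_joint_pos) - np.asarray(right_joint_pos))
    return -weight * float(diff.mean())

r = height_tracking_reward(0.62, 0.60) + symmetry_penalty([0.1, -0.2], [0.12, -0.18])
print(round(r, 3))
```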
37. Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs
- Author
-
Dou, Longxu, Liu, Qian, Zhou, Fan, Chen, Changyu, Wang, Zili, Jin, Ziqi, Liu, Zichen, Zhu, Tongyao, Du, Cunxiao, Yang, Penghui, Wang, Haonan, Liu, Jiaheng, Zhao, Yongchi, Feng, Xiachong, Mao, Xin, Yeung, Man Tsung, Pipatanakul, Kunat, Koto, Fajri, Thu, Min Si, Kydlíček, Hynek, Liu, Zeyi, Lin, Qunshu, Sripaisarnmongkol, Sittipong, Sae-Khow, Kridtaphad, Thongchim, Nirattisai, Konkaew, Taechawat, Borijindargoon, Narong, Dao, Anh, Maneegard, Matichon, Artkaew, Phakphum, Yong, Zheng-Xin, Nguyen, Quan, Phatthiyaphaibun, Wannaphong, Tran, Hoang H., Zhang, Mike, Chen, Shiqi, Pang, Tianyu, Du, Chao, Wan, Xinyi, Lu, Wei, and Lin, Min
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Sailor2 is a family of cutting-edge multilingual language models for South-East Asian (SEA) languages, available in 1B, 8B, and 20B sizes to suit diverse applications. Building on Qwen2.5, Sailor2 undergoes continuous pre-training on 500B tokens (400B SEA-specific and 100B replay tokens) to support 13 SEA languages while retaining proficiency in Chinese and English. The Sailor2-20B model achieves a 50-50 win rate against GPT-4o across SEA languages. We also deliver a comprehensive cookbook on how to develop multilingual models in an efficient manner, covering five key aspects: data curation, pre-training, post-training, model customization, and evaluation. We hope that the Sailor2 model (Apache 2.0 license) will drive language development in the SEA region and that the Sailor2 cookbook will inspire researchers to build more inclusive LLMs for other under-served languages., Comment: 49 pages, 16 figures. Technical Report of Sailor2: https://sea-sailor.github.io/blog/sailor2/
- Published
- 2025
38. A Synergy Scoring Filter for Unsupervised Anomaly Detection with Noisy Data
- Author
-
Wang, Fengjie, Liu, Chengming, Pang, Haibo, and Shi, Lei
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing - Abstract
Noise-inclusive fully unsupervised anomaly detection (FUAD) holds significant practical relevance. Although various methods exist to address this problem, they are limited in both performance and scalability. Our work seeks to overcome these obstacles, enabling broader adaptability of unsupervised anomaly detection (UAD) models to FUAD. To achieve this, we introduce the Synergy Scoring Filter (SSFilter), the first fully unsupervised anomaly detection approach to leverage sample-level filtering. SSFilter facilitates end-to-end robust training and applies filtering to the complete training set post-training, offering a model-agnostic solution for FUAD. Specifically, SSFilter integrates a batch-level anomaly scoring mechanism based on mutual patch comparison and utilizes regression errors in anomalous regions, alongside prediction uncertainty, to estimate sample-level uncertainty scores that calibrate the anomaly scoring mechanism. This design produces a synergistic, robust filtering approach. Furthermore, we propose a realistic anomaly synthesis method and an integrity enhancement strategy to improve model training and mitigate the impact of missed noisy samples. Our method establishes state-of-the-art performance on the FUAD benchmark of the recent large-scale industrial anomaly detection dataset, Real-IAD. Additionally, dataset-level filtering enhances the performance of various UAD methods on the FUAD benchmark, and the high scalability of our approach significantly boosts its practical applicability. (A minimal sketch of sample-level filtering under assumed scoring interfaces follows this entry.)
- Published
- 2025
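Sample-level filtering of a noisy training set, as used above, can be sketched as scoring every sample and keeping the fraction that looks most normal. The helpers anomaly_score and uncertainty stand in for model outputs; the actual synergy scoring in SSFilter couples them differently from this simple calibration.

```python
# Minimal sketch of sample-level filtering for a noisy unsupervised training set.
import numpy as np

def filter_training_set(samples, anomaly_score, uncertainty, keep_ratio=0.9):
    # Calibrate raw anomaly scores by the model's own uncertainty, then keep
    # the samples whose calibrated score falls below the keep_ratio quantile.
    scores = np.array([anomaly_score(x) * (1.0 + uncertainty(x)) for x in samples])
    threshold = np.quantile(scores, keep_ratio)
    return [x for x, s in zip(samples, scores) if s <= threshold]
```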
39. Late-time cosmic acceleration from quantum gravity
- Author
-
Oriti, Daniele and Pang, Xiankai
- Subjects
General Relativity and Quantum Cosmology ,Astrophysics - Cosmology and Nongalactic Astrophysics ,High Energy Physics - Theory - Abstract
We deepen the analysis of the cosmological acceleration produced by quantum gravity dynamics in the formalism of group field theory condensate cosmology, treated at the coarse-grained level via a phenomenological model, in the language of hydrodynamics on minisuperspace. Specifically, we conduct a detailed analysis of the late-time evolution, which shows a phantom-like phase followed by an asymptotic De Sitter expansion. We argue that the model indicates a recent occurrence of the phantom crossing and we extract a more precise expression for the effective cosmological constant, linking its value to other parameters in the model and to the scale of the quantum bounce in the early universe evolution. Additionally, we show how the phantom phase produced by our quantum gravity dynamics increases the inferred value of the current Hubble parameter based on observed data, indicating a possible quantum gravity mechanism for alleviating the Hubble tension. Our results represent a concrete example of how quantum gravity can provide an explanation for large-scale cosmological puzzles, in an emergent spacetime scenario., Comment: 38 pages, 5 figures
- Published
- 2025
40. A versatile experimental method to measure the traction forces at interfaces
- Author
-
Hou, Yingwei, Liu, Tao, and Pang, Yong
- Subjects
Physics - Instrumentation and Detectors ,Condensed Matter - Materials Science ,Physics - Biological Physics ,Physics - Optics - Abstract
Measurement of surface forces, including cohesive forces and contact forces, is critical for understanding and controlling interactions at interfaces to optimize the interfacial performance of applications. The objective of this paper is to introduce a general in-situ method that enables the measurement of 3D micron-scale displacements and the corresponding force distribution at interfaces in dry or wet environments. Stereo digital image correlation was used to measure the 3D displacement of a soft and deformable substrate. The efficiency and accuracy of the technique were evaluated by applying compression to the substrate using a steel ball, with the measured 3D displacements aligning closely with finite element analysis simulations. To further assess the method's applicability, the wet adhesion between mussel plaques and the substrate was tested under aqueous conditions. The interfacial displacements and forces at different stages during the test were measured. The technique can be extended to varied circumstances regarding force range and substrate materials based on the Winkler spring model. (A compact statement of the Winkler elastic-foundation relation follows this entry.)
- Published
- 2025
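The Winkler (elastic-foundation) idealization mentioned above relates local traction to local displacement through a foundation modulus k, which in practice is a calibration constant of the substrate. Stated compactly below; this is an illustrative form, not the paper's full formulation.

```latex
% Winkler relation: traction is proportional to the measured normal displacement.
% p(x,y): traction, w(x,y): out-of-plane displacement, k: foundation modulus, A: contact area.
\[
  p(x, y) = k\, w(x, y), \qquad
  F = \int_{A} p(x, y)\,\mathrm{d}A = k \int_{A} w(x, y)\,\mathrm{d}A .
\]
```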
41. Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
- Author
-
Huang, Ailin, Wu, Boyong, Wang, Bruce, Yan, Chao, Hu, Chen, Feng, Chengli, Tian, Fei, Shen, Feiyu, Li, Jingbei, Chen, Mingrui, Liu, Peng, Miao, Ruihang, You, Wang, Chen, Xi, Yang, Xuerui, Huang, Yechang, Zhang, Yuxiang, Gong, Zheng, Zhang, Zixin, Zhou, Hongyu, Sun, Jianjian, Li, Brian, Feng, Chengting, Wan, Changyi, Hu, Hanpeng, Wu, Jianchang, Zhen, Jiangjie, Ming, Ranchen, Yuan, Song, Zhang, Xuelin, Zhou, Yu, Li, Bingxin, Ma, Buyun, Wang, Hongyuan, An, Kang, Ji, Wei, Li, Wen, Wen, Xuan, Kong, Xiangwen, Ma, Yuankai, Liang, Yuanwei, Mou, Yun, Ahmidi, Bahtiyar, Wang, Bin, Li, Bo, Miao, Changxin, Xu, Chen, Wang, Chenrun, Shi, Dapeng, Sun, Deshan, Hu, Dingyuan, Sai, Dula, Liu, Enle, Huang, Guanzhe, Yan, Gulin, Wang, Heng, Jia, Haonan, Zhang, Haoyang, Gong, Jiahao, Guo, Junjing, Liu, Jiashuai, Liu, Jiahong, Feng, Jie, Wu, Jie, Wu, Jiaoren, Yang, Jie, Wang, Jinguo, Zhang, Jingyang, Lin, Junzhe, Li, Kaixiang, Xia, Lei, Zhou, Li, Zhao, Liang, Gu, Longlong, Chen, Mei, Wu, Menglin, Li, Ming, Li, Mingxiao, Li, Mingliang, Liang, Mingyao, Wang, Na, Hao, Nie, Wu, Qiling, Tan, Qinyuan, Sun, Ran, Shuai, Shuai, Pang, Shaoliang, Yang, Shiliang, Gao, Shuli, Yuan, Shanshan, Liu, Siqi, Deng, Shihong, Jiang, Shilei, Liu, Sitong, Cao, Tiancheng, Wang, Tianyu, Deng, Wenjin, Xie, Wuxun, Ming, Weipeng, He, Wenqing, Sun, Wen, Han, Xin, Huang, Xin, Deng, Xiaomin, Liu, Xiaojia, Wu, Xin, Zhao, Xu, Wei, Yanan, Yu, Yanbo, Cao, Yang, Li, Yangguang, Ma, Yangzhen, Xu, Yanming, Wang, Yaoyu, Shi, Yaqiang, Wang, Yilei, Zhou, Yizhuang, Zhong, Yinmin, Zhang, Yang, Wei, Yaoben, Luo, Yu, Lu, Yuanwei, Yin, Yuhe, Luo, Yuchu, Ding, Yuanhao, Yan, Yuting, Dai, Yaqi, Yang, Yuxiang, Xie, Zhe, Ge, Zheng, Sun, Zheng, Huang, Zhewei, Chang, Zhichao, Guan, Zhisheng, Yang, Zidong, Zhang, Zili, Jiao, Binxing, Jiang, Daxin, Shum, Heung-Yeung, Chen, Jiansheng, Li, Jing, Zhou, Shuchang, Zhang, Xiangyu, Zhang, Xinhao, and Zhu, Yibo
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Human-Computer Interaction ,Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as the high cost of voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contributions include: 1) a 130B-parameter unified speech-text multi-modal model that achieves unified understanding and generation, with the Step-Audio-Chat version open-sourced; 2) a generative speech data engine that establishes an affordable voice cloning framework and produces the open-sourced lightweight Step-Audio-TTS-3B model through distillation; 3) an instruction-driven fine control system enabling dynamic adjustments across dialects, emotions, singing, and RAP; 4) an enhanced cognitive architecture augmented with tool calling and role-playing abilities to manage complex tasks effectively. Based on our new StepEval-Audio-360 evaluation benchmark, Step-Audio achieves state-of-the-art performance in human evaluations, especially in terms of instruction following. On open-source benchmarks such as LLaMA Question, Step-Audio shows a 9.3% average performance improvement, demonstrating our commitment to advancing the development of open-source multi-modal language technologies. Our code and models are available at https://github.com/stepfun-ai/Step-Audio.
- Published
- 2025
42. FedEAT: A Robustness Optimization Framework for Federated LLMs
- Author
-
Pang, Yahao, Wu, Xingyuan, Zhang, Xiaojin, Chen, Wei, and Jin, Hai
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Significant advancements have been made by Large Language Models (LLMs) in the domains of natural language understanding and automated content creation. However, they still face persistent problems, including substantial computational costs and inadequate availability of training data. The combination of Federated Learning (FL) and LLMs (federated LLMs) offers a solution by leveraging distributed data while protecting privacy, which positions it as an ideal choice for sensitive domains. However, federated LLMs still suffer from robustness challenges, including data heterogeneity, malicious clients, and adversarial attacks, which greatly hinder their applications. We first introduce the robustness problems in federated LLMs. To address these challenges, we propose FedEAT (Federated Embedding space Adversarial Training), a novel framework that applies adversarial training in the embedding space of client LLMs and employs a robust aggregation approach, specifically geometric median aggregation, to enhance the robustness of federated LLMs. Our experiments demonstrate that FedEAT effectively improves the robustness of federated LLMs with minimal performance loss. (A minimal sketch of geometric-median aggregation via Weiszfeld's iteration follows this entry.)
- Published
- 2025
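Geometric-median aggregation, named above as FedEAT's robust aggregation rule, is commonly computed with Weiszfeld's iteration. The sketch below operates on stacked client update vectors; the function and variable names are ours, not the authors' code.

```python
# Weiszfeld iteration for the geometric median of client update vectors.
import numpy as np

def geometric_median(updates, n_iter=100, eps=1e-8):
    points = np.asarray(updates, dtype=float)   # shape: (num_clients, dim)
    median = points.mean(axis=0)                # start from the ordinary average
    for _ in range(n_iter):
        dists = np.maximum(np.linalg.norm(points - median, axis=1), eps)
        weights = 1.0 / dists
        new_median = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_median - median) < eps:
            break
        median = new_median
    return median

# A single malicious or outlier client barely moves the geometric median.
clients = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [50.0, -40.0]]
print(geometric_median(clients))  # stays close to (1, 1)
```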
43. FineFilter: A Fine-grained Noise Filtering Mechanism for Retrieval-Augmented Large Language Models
- Author
-
Zhang, Qianchi, Zhang, Hainan, Pang, Liang, Zheng, Hongwei, Tong, Yongxin, and Zheng, Zhiming
- Subjects
Computer Science - Computation and Language - Abstract
Retrieved documents containing noise will hinder Retrieval-Augmented Generation (RAG) from detecting answer clues, necessitating noise filtering mechanisms to enhance accuracy. Existing methods use re-ranking or summarization to identify the most relevant sentences, but directly and accurately locating answer clues within these large-scale and complex documents remains challenging. Unlike these document-level operations, we treat noise filtering as a sentence-level MinMax optimization problem: first identifying potential clues from multiple documents using contextual information, then ranking them by relevance, and finally retaining the minimal set of clues through truncation. In this paper, we propose FineFilter, a novel fine-grained noise filtering mechanism for RAG consisting of a clue extractor, a re-ranker, and a truncator. We optimize each module to tackle complex reasoning challenges: (1) the clue extractor first uses sentences containing the answer, and similar ones, as fine-tuning targets, aiming to extract sufficient potential clues; (2) the re-ranker is trained to prioritize effective clues based on real feedback from the generation module, with clues capable of generating the correct answer as positive samples and others as negatives; (3) the truncator takes the minimum number of clues needed to answer the question (the truncation point) as its fine-tuning target and performs truncation on the re-ranked clues to achieve fine-grained noise filtering. Experiments on three QA datasets demonstrate that FineFilter significantly outperforms baselines in terms of performance and inference cost. Further analysis of each module shows the effectiveness of our optimizations for complex reasoning. (A sketch of this extract, re-rank, and truncate flow follows this entry.)
- Published
- 2025
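The extract, re-rank, and truncate flow described above composes three modules; the sketch below wires them together with placeholder callables (extract_clues, rerank, predict_truncation_point), whose fine-tuned implementations are not reproduced here.

```python
# Sketch of the FineFilter-style extract -> re-rank -> truncate pipeline.
def fine_filter(question, documents, extract_clues, rerank, predict_truncation_point):
    # 1) Pull candidate clue sentences out of every retrieved document.
    clues = [c for doc in documents for c in extract_clues(question, doc)]
    # 2) Order the clues by how useful they are for answering the question.
    ranked = rerank(question, clues)
    # 3) Keep only the minimal prefix judged sufficient to answer.
    k = predict_truncation_point(question, ranked)
    return ranked[:k]
```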
44. Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning
- Author
-
Pang, Yuqi, Yang, Bowen, Tu, Haoqin, Cao, Yun, and Zhang, Zeyu
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Although Large Language Models (LLMs) excel in reasoning and generation for language tasks, they are not specifically designed for multimodal challenges. Training Multimodal Large Language Models (MLLMs), however, is resource-intensive and constrained by various training limitations. In this paper, we propose the Modular-based Visual Contrastive Decoding (MVCD) framework to overcome this obstacle. Our framework leverages LLMs' In-Context Learning (ICL) capability and the proposed visual contrastive-example decoding (CED), specifically tailored for this framework, without requiring any additional training. By converting visual signals into text and focusing on contrastive output distributions during decoding, we can highlight the new information introduced by contextual examples, explore their connections, and avoid over-reliance on prior encoded knowledge. MVCD enhances LLMs' visual perception, making them see and reason over the input visuals. To demonstrate MVCD's effectiveness, we conduct experiments with four LLMs across five question answering datasets. Our results not only show consistent improvements in model accuracy but also explain the effective components of our decoding strategy. Our code will be available at https://github.com/Pbhgit/MVCD. A generic contrastive-decoding step is sketched after this entry., Comment: Accepted to ICASSP 2025
- Published
- 2025
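Contrastive-example decoding contrasts the model's next-token distribution with and without the contextual (visual) examples, so that tokens newly supported by the context are boosted. The step below is a generic contrastive-decoding rule; the exact MVCD scoring may differ.

```python
# Generic contrastive-decoding step over next-token logits (illustrative only).
import numpy as np

def contrastive_next_token(logits_with_context, logits_without_context, alpha=1.0):
    lw = np.asarray(logits_with_context, dtype=float)
    lo = np.asarray(logits_without_context, dtype=float)
    contrasted = (1.0 + alpha) * lw - alpha * lo   # amplify what the context adds
    probs = np.exp(contrasted - contrasted.max())  # numerically stabilized softmax
    probs /= probs.sum()
    return int(np.argmax(probs)), probs

# Without contrasting, token 0 wins (2.0 > 1.8); contrasting picks token 1,
# which the context newly supports relative to the context-free logits.
token_id, dist = contrastive_next_token([2.0, 1.8, 0.5], [2.0, 0.2, 0.5])
print(token_id)
```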
45. How to Divide: A Set Partitioning Strategy Balancing the Trade-off Between Intra-Subset Correlation and Inter-Subset Gain Mutual Influence in Distributed Attack Detection Scheduling Task
- Author
-
Suo, Yuhan, Chai, Runqi, Chai, Senchun, Pang, Zhong-Hua, Xu, Jiping, and Xia, Yuanqing
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing ,Electrical Engineering and Systems Science - Systems and Control - Abstract
The efficiency of attack detection in large-scale sensor networks remains a critical research challenge. Studies have shown that while distributed algorithms offer higher efficiency compared to centralized approaches, they often come at the cost of reduced performance. To strike a balance between detection efficiency and performance in large-scale sensor networks, this paper explores the feasibility of extending existing algorithms to a distributed framework. Starting from the perspective of set partitioning strategies, this study analyzes the key factor that contributes to the performance differences between distributed and centralized algorithms. By examining the gain mutual influence of sensor subsets, an optimal set partitioning strategy is designed to minimize inter-subset mutual influence while enhancing intra-subset correlation. To further reduce the computational cost of gain updates, a suboptimal partitioning strategy based on the Grassmann distance is proposed, improving the efficiency of selecting suspicious sensors. Theoretical analysis demonstrates that this approach effectively reduces the computational cost of gain updates while maintaining detection performance. Finally, simulation results validate the effectiveness of the proposed method in enhancing attack detection performance. A generic computation of the Grassmann distance between two subspaces is sketched after this entry., Comment: 14 pages, 5 figures. This work has been submitted to the IEEE for possible publication
- Published
- 2025
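The Grassmann distance between two subspaces, used above to cheapen gain updates, is computed from the principal angles between the subspaces. The snippet below shows the generic computation for two subspaces given by basis matrices; how the paper constructs each sensor subset's subspace is not reproduced here.

```python
# Grassmann distance between the column spans of A and B via principal angles.
import numpy as np

def grassmann_distance(A, B):
    Qa, _ = np.linalg.qr(np.asarray(A, dtype=float))   # orthonormal basis of span(A)
    Qb, _ = np.linalg.qr(np.asarray(B, dtype=float))   # orthonormal basis of span(B)
    cosines = np.clip(np.linalg.svd(Qa.T @ Qb, compute_uv=False), -1.0, 1.0)
    theta = np.arccos(cosines)                          # principal angles
    return float(np.linalg.norm(theta))                 # sqrt of the sum of squared angles

A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])      # span of e1, e2
B = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])      # span of e1, e3
print(grassmann_distance(A, B))  # angles are 0 and pi/2 -> about 1.5708
```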
46. ToolCoder: A Systematic Code-Empowered Tool Learning Framework for Large Language Models
- Author
-
Ding, Hanxing, Tao, Shuchang, Pang, Liang, Wei, Zihao, Gao, Jinyang, Ding, Bolin, Shen, Huawei, and Chen, Xueqi
- Subjects
Computer Science - Computation and Language - Abstract
Tool learning has emerged as a crucial capability for large language models (LLMs) to solve complex real-world tasks through interaction with external tools. Existing approaches face significant challenges, including reliance on hand-crafted prompts, difficulty in multi-step planning, and a lack of precise error diagnosis and reflection mechanisms. We propose ToolCoder, a novel framework that reformulates tool learning as a code generation task. Inspired by software engineering principles, ToolCoder transforms natural language queries into structured Python function scaffolds and systematically breaks down tasks with descriptive comments, enabling LLMs to leverage coding paradigms for complex reasoning and planning. It then generates and executes function implementations to obtain final responses. Additionally, ToolCoder stores successfully executed functions in a repository to promote code reuse, while leveraging error traceback mechanisms for systematic debugging, optimizing both execution efficiency and robustness. Experiments demonstrate that ToolCoder achieves superior performance in task completion accuracy and execution reliability compared to existing approaches, establishing the effectiveness of code-centric approaches in tool learning. (An invented example of such a function scaffold follows this entry.)
- Published
- 2025
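The "structured Python function scaffold" with descriptive step comments can be pictured as below. The query, the tools locate_user and get_weather, and the comments are invented for this illustration (with stubs so the snippet runs); they are not taken from the paper's prompts or tool set.

```python
# Invented example of a code-style task scaffold in the spirit of ToolCoder.
def locate_user():
    return "Lisbon"  # stub standing in for a real location tool

def get_weather(city):
    return {"hourly_temps": [18.0, 21.0, 24.0]}  # stub standing in for a real weather API

def solve_query():
    """Query: what is the average temperature in the city where the user is located?"""
    # Step 1: resolve the user's current city with a location tool.
    city = locate_user()
    # Step 2: fetch the current weather report for that city.
    report = get_weather(city)
    # Step 3: compute the quantity the query actually asks for.
    average_temp = sum(report["hourly_temps"]) / len(report["hourly_temps"])
    # Step 4: return the final, user-facing answer.
    return f"The average temperature in {city} is {average_temp:.1f} degrees."

print(solve_query())
```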
47. Revisiting Robust RAG: Do We Still Need Complex Robust Training in the Era of Powerful LLMs?
- Author
-
Ding, Hanxing, Tao, Shuchang, Pang, Liang, Wei, Zihao, Chen, Liwei, Xu, Kun, Shen, Huawei, and Cheng, Xueqi
- Subjects
Computer Science - Computation and Language - Abstract
Retrieval-augmented generation (RAG) systems often suffer from performance degradation when encountering noisy or irrelevant documents, driving researchers to develop sophisticated training strategies to enhance their robustness against such retrieval noise. However, as large language models (LLMs) continue to advance, the necessity of these complex training methods is increasingly questioned. In this paper, we systematically investigate whether complex robust training strategies remain necessary as model capacity grows. Through comprehensive experiments spanning multiple model architectures and parameter scales, we evaluate various document selection methods and adversarial training techniques across diverse datasets. Our extensive experiments consistently demonstrate that as models become more powerful, the performance gains brought by complex robust training methods drop off dramatically. We delve into the rationale and find that more powerful models inherently exhibit superior confidence calibration, better generalization across datasets (even when trained with randomly selected documents), and optimal attention mechanisms learned with simpler strategies. Our findings suggest that RAG systems can benefit from simpler architectures and training strategies as models become more powerful, enabling more scalable applications with minimal complexity.
- Published
- 2025
48. Following the Autoregressive Nature of LLM Embeddings via Compression and Alignment
- Author
-
Deng, Jingcheng, Jiang, Zhongtao, Pang, Liang, Chen, Liwei, Xu, Kun, Wei, Zihao, Shen, Huawei, and Cheng, Xueqi
- Subjects
Computer Science - Computation and Language - Abstract
A new trend uses LLMs as dense text encoders via contrastive learning. However, since LLM embeddings predict the probability distribution of the next token, they are inherently generative and distributive, conflicting with contrastive learning, which requires embeddings to capture full-text semantics and align via cosine similarity. This discrepancy hinders the full utilization of LLMs' pre-training capabilities, resulting in inefficient learning. In response to this issue, we propose AutoRegEmbed, a new contrastive learning method built on embedding conditional probability distributions, which integrates two core tasks: information compression and conditional distribution alignment. The information compression task encodes text into the embedding space, ensuring that the embedding vectors capture global semantics. The conditional distribution alignment task focuses on aligning text embeddings with positive sample embeddings by leveraging the conditional distribution of embeddings while simultaneously reducing the likelihood of generating negative samples from text embeddings, thereby achieving embedding alignment and uniformity. Experimental results demonstrate that our method significantly outperforms traditional contrastive learning approaches and achieves performance comparable to state-of-the-art models when using the same amount of data.
- Published
- 2025
49. High-pressure floating zone crystal growth of Sr$_2$IrO$_4$
- Author
-
Alvarado, S. J. Gomez, Pang, Y., Barrera, P. A., Rout, D., Robison, C., Porter, Z., Porter, H. Z., Lawrence, E. A., Bassey, E. N., and Wilson, S. D.
- Subjects
Condensed Matter - Strongly Correlated Electrons ,Condensed Matter - Materials Science - Abstract
Here we demonstrate the floating zone crystal growth of the $J_\mathrm{eff}=1/2$ Mott insulator Sr$_2$IrO$_4$. Historically, the growth of iridates from a ternary melt has been precluded by the extreme vapor pressure of the metal oxide species and the difficulty of maintaining the correct oxidation state of Ir at high temperatures. Here, we show that the application of a high-pressure oxygen growth environment stabilizes the Sr$_2$IrO$_4$ phase, leading to the first demonstration of cm$^{3}$-scale crystals. In contrast to the conventional SrCl$_2$ flux growth method, where poor control over disorder leads to strong sample dependence, the high-pressure floating zone growth enables active control over the homogeneity of the melt. Crystals grown via this technique possess qualitatively similar properties to those grown via flux, with a relatively sharp onset of antiferromagnetic order observed in temperature-dependent magnetization. Further, we demonstrate that by tuning the mixing rate of the melt, we are able to grow natively hole-doped Sr$_2$Ir$_{1-y}$O$_4$, which exhibits a strongly modified magnetic and electronic response., Comment: 8 pages, 4 figures
- Published
- 2025
50. Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization
- Author
-
Pang, Bowen, Li, Kai, She, Ruifeng, and Wang, Feifan
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing ,Computer Science - Artificial Intelligence ,Computer Science - Hardware Architecture ,Computer Science - Machine Learning - Abstract
With the development of large language models (LLMs), it has become increasingly important to optimize hardware usage and improve throughput. In this paper, we study the inference optimization of the serving system that deploys LLMs. To optimize system throughput and maximize hardware utilization, we formulate the inference optimization problem as a mixed-integer programming (MIP) model and propose a hybrid offline-online method as a solution. The offline method improves large-scale inference systems by introducing a Minimizing Makespan Bin Packing Problem. We further provide a theoretical lower bound computation method. Then, we propose an online sorting and preemptive scheduling method to better utilize hardware. In the online iteration scheduling process, a Lagrangian method is applied to evaluate the cost efficiency of inserting prefill stages versus decode stages at each iteration and to dynamically determine when to preempt decoding tasks and insert prefill tasks. Experiments using real-world data from the LLaMA-65B model and the GSM8K dataset demonstrate that system utilization improves from 80.2% to 89.1% and that the total inference time decreases from 201.00 to 190.58 seconds. A 100-case study shows that our method consistently outperforms the baseline and improves the utilization rate by 8.0% on average. Finally, we discuss potential future extensions, including stochastic modeling, reinforcement learning-based schedulers, and dynamic decision-making strategies for system throughput and hardware utilization. (An illustrative makespan heuristic and its standard lower bound are sketched after this entry.)
- Published
- 2025
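The offline step above is framed as a makespan-minimizing packing problem with a theoretical lower bound. The snippet below shows a textbook analogue, longest-processing-time-first assignment of jobs to workers together with the standard lower bound; it is an illustration, not the paper's MIP model or scheduling policy.

```python
# Illustrative makespan heuristic (LPT) and standard lower bound for job packing.
def lpt_schedule(job_times, num_workers):
    loads = [0.0] * num_workers
    for t in sorted(job_times, reverse=True):  # longest jobs first
        w = loads.index(min(loads))            # place on the least-loaded worker
        loads[w] += t
    return max(loads)                          # makespan of the greedy schedule

def makespan_lower_bound(job_times, num_workers):
    # No schedule can beat the largest single job or the perfectly balanced average.
    return max(max(job_times), sum(job_times) / num_workers)

jobs = [7.0, 5.0, 4.0, 3.0, 3.0, 2.0]
print(lpt_schedule(jobs, 2), makespan_lower_bound(jobs, 2))  # 12.0 12.0
```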