Author: "Chuang P" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Chuang P"' showing total 21,099 results

Start Over Author "Chuang P"

21,099 results on '"Chuang P"'

1. Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding

Author: Chuang, Yun-Shiuan, Harlalka, Nikunj, Narendran, Sameer, Cheung, Alexander, Gao, Sizhe, Suresh, Siddharth, Hu, Junjie, and Rogers, Timothy T.
Subjects: Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction
Abstract: Guesstimation, the task of making approximate quantity estimates, is a common real-world challenge. However, it has been largely overlooked in large language models (LLMs) and vision language models (VLMs) research. We introduce a novel guesstimation dataset, MARBLES. This dataset requires one to estimate how many items (e.g., marbles) can fit into containers (e.g., a one-cup measuring cup), both with and without accompanying images. Inspired by the social science concept of the ``Wisdom of Crowds'' (WOC) - taking the median from estimates from a crowd), which has proven effective in guesstimation, we propose ``WOC decoding'' strategy for LLM guesstimation. We show that LLMs/VLMs perform well on guesstimation, suggesting that they possess some level of a "world model" necessary for guesstimation. Moreover, similar to human performance, the WOC decoding method improves LLM/VLM guesstimation accuracy. Furthermore, the inclusion of images in the multimodal condition enhances model performance. These results highlight the value of WOC decoding strategy for LLMs/VLMs and position guesstimation as a probe for evaluating LLMs/VLMs' world model. As LLMs' world model is a fundamental prerequisite for many real-world tasks, e.g., human-AI teaming, our findings have broad implications for the AI community.
Published: 2025

2. Lyapunov exponents as probes for phase transitions of Kerr-AdS black holes

Author: Chen, Deyou and Yang, Chuang
Subjects: High Energy Physics - Theory
Abstract: In this paper, we study proper time and coordinate time Lyapunov exponents of chaos for massless and massive particles, and explore their relationships with phase transitions of four-dimensional and five-dimensional Kerr-AdS black holes. The results reveal that these Lyapunov exponents can reflect the occurrence of phase transitions. Specifically, when compared to the Lyapunov exponents of massive particles in chaotic states, those of massless particles exhibit a more comprehensive ability to describe the phase transitions. Furthermore, we conduct a study on the critical exponents associated with the Lyapunov exponents in four-dimensional Kerr AdS black holes, identifying a critical exponent value of 1/2., Comment: 23 pages, 10 figures
Published: 2025

3. Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks

Author: Guo, Hailong, Zeng, Bohan, Song, Yiren, Zhang, Wentao, Zhang, Chuang, and Liu, Jiaming
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Image-based virtual try-on (VTON) aims to generate a virtual try-on result by transferring an input garment onto a target person's image. However, the scarcity of paired garment-model data makes it challenging for existing methods to achieve high generalization and quality in VTON. Also, it limits the ability to generate mask-free try-ons. To tackle the data scarcity problem, approaches such as Stable Garment and MMTryon use a synthetic data strategy, effectively increasing the amount of paired data on the model side. However, existing methods are typically limited to performing specific try-on tasks and lack user-friendliness. To enhance the generalization and controllability of VTON generation, we propose Any2AnyTryon, which can generate try-on results based on different textual instructions and model garment images to meet various needs, eliminating the reliance on masks, poses, or other conditions. Specifically, we first construct the virtual try-on dataset LAION-Garment, the largest known open-source garment try-on dataset. Then, we introduce adaptive position embedding, which enables the model to generate satisfactory outfitted model images or garment images based on input images of different sizes and categories, significantly enhancing the generalization and controllability of VTON generation. In our experiments, we demonstrate the effectiveness of our Any2AnyTryon and compare it with existing methods. The results show that Any2AnyTryon enables flexible, controllable, and high-quality image-based virtual try-on generation.https://logn-2024.github.io/Any2anyTryonProjectPage/, Comment: 13 pages,13 figures
Published: 2025

4. Unveiling Discrete Clues: Superior Healthcare Predictions for Rare Diseases

Author: Zhao, Chuang, Tang, Hui, Zhang, Jiheng, and Li, Xiaomeng
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computational Engineering, Finance, and Science
Abstract: Accurate healthcare prediction is essential for improving patient outcomes. Existing work primarily leverages advanced frameworks like attention or graph networks to capture the intricate collaborative (CO) signals in electronic health records. However, prediction for rare diseases remains challenging due to limited co-occurrence and inadequately tailored approaches. To address this issue, this paper proposes UDC, a novel method that unveils discrete clues to bridge consistent textual knowledge and CO signals within a unified semantic space, thereby enriching the representation semantics of rare diseases. Specifically, we focus on addressing two key sub-problems: (1) acquiring distinguishable discrete encodings for precise disease representation and (2) achieving semantic alignment between textual knowledge and the CO signals at the code level. For the first sub-problem, we refine the standard vector quantized process to include condition awareness. Additionally, we develop an advanced contrastive approach in the decoding stage, leveraging synthetic and mixed-domain targets as hard negatives to enrich the perceptibility of the reconstructed representation for downstream tasks. For the second sub-problem, we introduce a novel codebook update strategy using co-teacher distillation. This approach facilitates bidirectional supervision between textual knowledge and CO signals, thereby aligning semantically equivalent information in a shared discrete latent space. Extensive experiments on three datasets demonstrate our superiority.
Published: 2025

5. CAND: Cross-Domain Ambiguity Inference for Early Detecting Nuanced Illness Deterioration

Author: Ting, Lo Pang-Yun, Tan, Zhen, Chen, Hong-Pei, Li, Cheng-Te, Chen, Po-Lin, Chuang, Kun-Ta, and Liu, Huan
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Early detection of patient deterioration is essential for timely treatment, with vital signs like heart rates being key health indicators. Existing methods tend to solely analyze vital sign waveforms, ignoring transition relationships of waveforms within each vital sign and the correlation strengths among various vital signs. Such studies often overlook nuanced illness deterioration, which is the early sign of worsening health but is difficult to detect. In this paper, we introduce CAND, a novel method that organizes the transition relationships and the correlations within and among vital signs as domain-specific and cross-domain knowledge. CAND jointly models these knowledge in a unified representation space, considerably enhancing the early detection of nuanced illness deterioration. In addition, CAND integrates a Bayesian inference method that utilizes augmented knowledge from domain-specific and cross-domain knowledge to address the ambiguities in correlation strengths. With this architecture, the correlation strengths can be effectively inferred to guide joint modeling and enhance representations of vital signs. This allows a more holistic and accurate interpretation of patient health. Our experiments on a real-world ICU dataset demonstrate that CAND significantly outperforms existing methods in both effectiveness and earliness in detecting nuanced illness deterioration. Moreover, we conduct a case study for the interpretable detection process to showcase the practicality of CAND.
Published: 2025

6. UI-TARS: Pioneering Automated GUI Interaction with Native Agents

Author: Qin, Yujia, Ye, Yining, Fang, Junjie, Wang, Haoming, Liang, Shihao, Tian, Shizuo, Zhang, Junda, Li, Jiahao, Li, Yunxin, Huang, Shijue, Zhong, Wanjun, Li, Kuanye, Yang, Jiale, Miao, Yu, Lin, Woyu, Liu, Longxiang, Jiang, Xu, Ma, Qianli, Li, Jingyu, Xiao, Xiaojun, Cai, Kai, Li, Chuang, Zheng, Yaowei, Jin, Chaolin, Li, Chen, Zhou, Xiao, Wang, Minchao, Chen, Haoli, Li, Zhaojian, Yang, Haihua, Liu, Haifeng, Lin, Feng, Peng, Tao, Liu, Xin, and Shi, Guang
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Human-Computer Interaction
Abstract: This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations). Unlike prevailing agent frameworks that depend on heavily wrapped commercial models (e.g., GPT-4o) with expert-crafted prompts and workflows, UI-TARS is an end-to-end model that outperforms these sophisticated frameworks. Experiments demonstrate its superior performance: UI-TARS achieves SOTA performance in 10+ GUI agent benchmarks evaluating perception, grounding, and GUI task execution. Notably, in the OSWorld benchmark, UI-TARS achieves scores of 24.6 with 50 steps and 22.7 with 15 steps, outperforming Claude (22.0 and 14.9 respectively). In AndroidWorld, UI-TARS achieves 46.6, surpassing GPT-4o (34.5). UI-TARS incorporates several key innovations: (1) Enhanced Perception: leveraging a large-scale dataset of GUI screenshots for context-aware understanding of UI elements and precise captioning; (2) Unified Action Modeling, which standardizes actions into a unified space across platforms and achieves precise grounding and interaction through large-scale action traces; (3) System-2 Reasoning, which incorporates deliberate reasoning into multi-step decision making, involving multiple reasoning patterns such as task decomposition, reflection thinking, milestone recognition, etc. (4) Iterative Training with Reflective Online Traces, which addresses the data bottleneck by automatically collecting, filtering, and reflectively refining new interaction traces on hundreds of virtual machines. Through iterative training and reflection tuning, UI-TARS continuously learns from its mistakes and adapts to unforeseen situations with minimal human intervention. We also analyze the evolution path of GUI agents to guide the further development of this domain.
Published: 2025

7. Each Graph is a New Language: Graph Learning with LLMs

Author: Zhou, Huachi, Du, Jiahe, Zhou, Chuang, Yang, Chang, Xiao, Yilin, Xie, Yuxuan, and Huang, Xiao
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Recent efforts leverage Large Language Models (LLMs) for modeling text-attributed graph structures in node classification tasks. These approaches describe graph structures for LLMs to understand or aggregate LLM-generated textual attribute embeddings through graph structure. However, these approaches face two main limitations in modeling graph structures with LLMs. (i) Graph descriptions become verbose in describing high-order graph structure. (ii) Textual attributes alone do not contain adequate graph structure information. It is challenging to model graph structure concisely and adequately with LLMs. LLMs lack built-in mechanisms to model graph structures directly. They also struggle with complex long-range dependencies between high-order nodes and target nodes. Inspired by the observation that LLMs pre-trained on one language can achieve exceptional performance on another with minimal additional training, we propose \textbf{G}raph-\textbf{D}efined \textbf{L}anguage for \textbf{L}arge \textbf{L}anguage \textbf{M}odel (GDL4LLM). This novel framework enables LLMs to transfer their powerful language understanding capabilities to graph-structured data. GDL4LLM translates graphs into a graph language corpus instead of graph descriptions and pre-trains LLMs on this corpus to adequately understand graph structures. During fine-tuning, this corpus describes the structural information of target nodes concisely with only a few tokens. By treating graphs as a new language, GDL4LLM enables LLMs to model graph structures adequately and concisely for node classification tasks. Extensive experiments on three real-world datasets demonstrate that GDL4LLM outperforms description-based and textual attribute embeddings-based baselines by efficiently modeling different orders of graph structure with LLMs.
Published: 2025

8. Grain boundary interstitial segregation in substitutional binary alloys

Author: Zhang, Zuoyong and Deng, Chuang
Subjects: Condensed Matter - Materials Science
Abstract: Grain boundary (GB) segregation is a powerful approach for optimizing the thermal and mechanical properties of metal alloys. In this study, we report significant GB interstitial segregation in a representative substitutional binary alloy system (Al-Ni) through atomistic simulations, challenging prevailing assumptions in the literature. Our findings show that Ni atoms preferentially segregate to interstitial sites within numerous kite-like GB structures in the Al bicrystals. An intriguing interplanar interstitial segregation pattern was also observed and analyzed. Additionally, interstitial segregation can induce unexpected GB transitions, such as kite transitions and nano-faceting, due to the existence of small interstitial sites. Building upon these observations, we developed a robust method to systematically identify the interstitial candidate sites for accommodating solutes at GBs. This approach combines site detection with structural filtering to produce distributions of interstitial sites that closely match atomistic simulation results. Applied to nanocrystalline alloys, this method enabled the calculation of interstitial segregation energies, significantly improving GB segregation predictions for the Al-Ni system. Furthermore, machine learning models using smooth overlap of atomic positions descriptors successfully predicted per-site interstitial segregation energy. This study highlights the critical role of GB interstitial segregation in advancing our understanding of solute behavior and provides valuable insights for designing next-generation alloys., Comment: 34 pages and 11 figures for the main manuscript, 10 pages and 11 figures for supplemental
Published: 2025

9. Studying the Baryon Acoustic Oscillations using photometric redshifts from the DESI Legacy Imaging survey DR9

Author: Saulder, Christoph, Song, Yong-Seon, Oh, Minji, Zheng, Yi, Ross, Ashley J., Zhou, Rongpu, Newman, Jeffrey A., Chuang, Chia-Hsun, Aguilar, Jessica Nicole, Ahlen, Steven, Blum, Robert, Brooks, David, Claybaugh, Todd, de la Macorra, Axel, Dey, Biprateep, Ding, Zhejie, Doel, Peter, Forero-Romero, Jaime E., Gaztañaga, Enrique, Gontcho, Satya Gontcho A, Gutierrez, Gaston, Juneau, Stephanie, Kirkby, David, Kisner, Theodore, Kremin, Anthony, Lambert, Andrew, Landriau, Martin, Guillou, Laurent Le, Levi, Michael, Meisner, Aaron, Mueller, Eva-Maria, Muñoz-Gutiérrez, Andrea, Niz, Gustavo, Prada, Francisco, Rezaie, Mehdi, Rossi, Graziano, Sanchez, Eusebio, Schubnell, Michael, Silber, Joseph Harry, Sprayberry, David, Tarlé, Gregory, Valdes, Francisco, Weaver, Benjamin Alan, and Zou, Hu
Subjects: Astrophysics - Cosmology and Nongalactic Astrophysics
Abstract: Context. The DESI Legacy Imaging Survey DR9, with its extensive dataset of galaxy locations and photometric redshifts, presents an opportunity to study Baryon Acoustic Oscillations (BAO) in the region covered by the ongoing DESI spectroscopic survey. Aims. We aim to investigate differences between different parts of the DR9 footprint. Furthermore, we want to measure the BAO scale for luminous red galaxies within them. Our selected redshift range of 0.6 to 0.8 corresponds to the bin in which a tension between DESI Y1 and eBOSS was found. Methods. We calculated the anisotropic two-point correlation function in a modified binning scheme to detect the BAO in DR9 data. We then use template fits based on simulations to measure the BAO scale in the imaging data. Results. Our analysis revealed the expected correlation function shape in most of the footprint areas, showing a BAO scale consistent with Planck's observations. Aside from identified mask-related data issues in the southern region of the South Galactic Cap, we also find a notable variance between the different footprints. Conclusions. We find that this variance is consistent with the difference between the DESI Y1 and eBOSS data and it supports the argument that that tension is caused by sample variance. Additionally, we also uncovered systematic biases not previously accounted for in photometric BAO studies. We emphasize the necessity of adjusting for the systematic shift in the BAO scale associated with typical photometric redshift uncertainties to ensure accurate measurements., Comment: 20 pages, 14 figures, 3 tables, accepted for publication in Astronomy and Astrophysics
Published: 2025

10. AIRCHITECT v2: Learning the Hardware Accelerator Design Space through Unified Representations

Author: Seo, Jamin, Ramachandran, Akshat, Chuang, Yu-Chuan, Itagi, Anirudh, and Krishna, Tushar
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Hardware Architecture
Abstract: Design space exploration (DSE) plays a crucial role in enabling custom hardware architectures, particularly for emerging applications like AI, where optimized and specialized designs are essential. With the growing complexity of deep neural networks (DNNs) and the introduction of advanced foundational models (FMs), the design space for DNN accelerators is expanding at an exponential rate. Additionally, this space is highly non-uniform and non-convex, making it increasingly difficult to navigate and optimize. Traditional DSE techniques rely on search-based methods, which involve iterative sampling of the design space to find the optimal solution. However, this process is both time-consuming and often fails to converge to the global optima for such design spaces. Recently, AIrchitect v1, the first attempt to address the limitations of search-based techniques, transformed DSE into a constant-time classification problem using recommendation networks. In this work, we propose AIrchitect v2, a more accurate and generalizable learning-based DSE technique applicable to large-scale design spaces that overcomes the shortcomings of earlier approaches. Specifically, we devise an encoder-decoder transformer model that (a) encodes the complex design space into a uniform intermediate representation using contrastive learning and (b) leverages a novel unified representation blending the advantages of classification and regression to effectively explore the large DSE space without sacrificing accuracy. Experimental results evaluated on 10^5 real DNN workloads demonstrate that, on average, AIrchitect v2 outperforms existing techniques by 15% in identifying optimal design points. Furthermore, to demonstrate the generalizability of our method, we evaluate performance on unseen model workloads (LLMs) and attain a 1.7x improvement in inference latency on the identified hardware architecture., Comment: Accepted to DATE 2025
Published: 2025

11. Bacterial proliferation pattern formation

Author: Chuang, John S., Rao, Riccardo, and Leibler, Stanislas
Subjects: Quantitative Biology - Populations and Evolution, Nonlinear Sciences - Adaptation and Self-Organizing Systems, Nonlinear Sciences - Pattern Formation and Solitons, Physics - Biological Physics, Quantitative Biology - Cell Behavior
Abstract: Bacteria can form a great variety of spatially heterogeneous cell density patterns, ranging from simple concentric rings to dynamical spiral waves appearing in growing colonies. These pattern formation phenomena are important as they reflect how cellular processes such as metabolism operate in heterogeneous chemical environments. In the laboratory, they can be studied in simplified set-ups, where spatial gradients of oxygen and nutrients are externally imposed, and cells are immobilized in a gel matrix. An intriguing example, observed in such set-ups over 80 years ago, is the sequential formation of narrow bands of high cell density, taking place even for a clonal population. However, key aspects of the dynamics of band formation remained obscure. Using time-lapse imaging of replicate transparent columns in simplified growth media, we first quantify the precision of the positioning and timing of band formation. We also show that the appearance and position of different bands can be modulated independently. This "modularity" is suggested by the observation that different bands differ in their gene expression, and it is reproduced by a theoretical model based on the existence of internal metabolic states and the induction of a pH gradient. Finally, we can also modify the observed pattern formation by introducing genetic modifications that impair selected metabolic pathways. In our opinion, the possibility of precise measurements and controls, together with the simplicity and richness of the "proliferation pattern formation" phenomenon, can make it a model system to study the response of cellular processes to heterogeneous environments., Comment: 12 pages (+ 20 SI text pages), 6 figures (+ 16 SI figures). To be published in PRX Life
Published: 2025

12. Similarity-Quantized Relative Difference Learning for Improved Molecular Activity Prediction

Author: Zadorozhny, Karina, Chuang, Kangway V., Sathappan, Bharath, Wallace, Ewan, Sresht, Vishnu, and Grambow, Colin A.
Subjects: Computer Science - Machine Learning
Abstract: Accurate prediction of molecular activities is crucial for efficient drug discovery, yet remains challenging due to limited and noisy datasets. We introduce Similarity-Quantized Relative Learning (SQRL), a learning framework that reformulates molecular activity prediction as relative difference learning between structurally similar pairs of compounds. SQRL uses precomputed molecular similarities to enhance training of graph neural networks and other architectures, and significantly improves accuracy and generalization in low-data regimes common in drug discovery. We demonstrate its broad applicability and real-world potential through benchmarking on public datasets as well as proprietary industry data. Our findings demonstrate that leveraging similarity-aware relative differences provides an effective paradigm for molecular activity prediction.
Published: 2025

13. Atomic Norm Soft Thresholding for Sparse Time-frequency Representation

Author: Yang, Zongyue, Ding, Baoqing, Wang, Shibin, Sun, Chuang, and Chen, Xuefeng
Subjects: Electrical Engineering and Systems Science - Signal Processing
Abstract: Time-frequency (TF) representation of non-stationary signals typically requires the effective concentration of energy distribution along the instantaneous frequency (IF) ridge, which exhibits intrinsic sparsity. Inspired by the sparse optimization over continuum via atomic norm, a novel atomic norm soft thresholding for sparse TF representation (AST-STF) method is proposed, which ensures accurate TF localization under the strong duality. Numerical experiments demonstrate that the performance of the proposed method surpasses that of conventional methods., Comment: 12 pages
Published: 2025

14. Quantum Computing Enhanced Sensing

Author: Allen, Richard R., Machado, Francisco, Chuang, Isaac L., Huang, Hsin-Yuan, and Choi, Soonwon
Subjects: Quantum Physics
Abstract: Quantum computing and quantum sensing represent two distinct frontiers of quantum information science. In this work, we harness quantum computing to solve a fundamental and practically important sensing problem: the detection of weak oscillating fields with unknown strength and frequency. We present a quantum computing enhanced sensing protocol that outperforms all existing approaches. Furthermore, we prove our approach is optimal by establishing the Grover-Heisenberg limit -- a fundamental lower bound on the minimum sensing time. The key idea is to robustly digitize the continuous, analog signal into a discrete operation, which is then integrated into a quantum algorithm. Our metrological gain originates from quantum computation, distinguishing our protocol from conventional sensing approaches. Indeed, we prove that broad classes of protocols based on quantum Fisher information, finite-lifetime quantum memory, or classical signal processing are strictly less powerful. Our protocol is compatible with multiple experimental platforms. We propose and analyze a proof-of-principle experiment using nitrogen-vacancy centers, where meaningful improvements are achievable using current technology. This work establishes quantum computation as a powerful new resource for advancing sensing capabilities., Comment: 9 pages, 4 figures + 44 pages, 3 figures
Published: 2025

15. An Efficiency Firmware Verification Framework for Public Key Infrastructure with Smart Grid and Energy Storage System

Author: Shih, Jhih-Zen, Chuang, Cheng-Che, Huang, Hong-Sheng, Chen, Hsuan-Tung, and Sun, Hung-Min
Subjects: Computer Science - Cryptography and Security
Abstract: As a critical component of electrical energy infrastructure, the smart grid system has become indispensable to the energy sector. However, the rapid evolution of smart grids has attracted numerous nation-state actors seeking to disrupt the power infrastructure of adversarial nations. This development underscores the urgent need to establish secure mechanisms for firmware updates, with firmware signing and verification serving as pivotal elements in safeguarding system integrity. In this work, we propose a digital signing and verification framework grounded in Public Key Infrastructure (PKI), specifically tailored for resource-constrained devices such as smart meters. The framework utilizes the Concise Binary Object Representation (CBOR) and Object Signing and Encryption (COSE) formats to achieve efficient da-ta encapsulation and robust security features. Our approach not only en-sures the secure deployment of firmware updates against the convergence of information technology (IT) and operational technology (OT) attacks but also addresses performance bottlenecks stemming from device limitations, thereby enhancing the overall reliability and stability of the smart grid sys-tem., Comment: 10pages, 5 figures
Published: 2025

16. Multiscale discrete Maxwell boundary condition for the discrete unified gas kinetic scheme for all Knudsen number flows

Author: Xin, Ziyang, Zhang, Yue, Zhang, Chuang, and Guo, Zhaoli
Subjects: Physics - Fluid Dynamics, Physics - Computational Physics
Abstract: In this paper, a multiscale boundary condition for the discrete unified gas kinetic scheme (DUGKS) is developed for gas flows in all flow regimes. Based on the discrete Maxwell boundary condition (DMBC), this study addresses the limitations of the original DMBC used in DUGKS. Specifically, it is found that the DMBC produces spurious velocity slip and temperature jump, which are proportional to the mesh size and the momentum accommodation coefficient. The proposed multiscale DMBC is implemented by ensuring that the reflected original distribution function excludes collision effects. Theoretical analyses and numerous numerical tests show that the multiscale DMBC can achieve exactly the non-slip and non-jump conditions in the continuum limit and accurately captures non-equilibrium phenomena across a wide range of Knudsen numbers. The results demonstrate that the DUGKS with the multiscale DMBC can work properly for wall boundary conditions in all flow regimes with a fixed discretization in both space and time, without limitations on the thickness of the Knudsen layer and relaxation time., Comment: 38 pages, 16 figures
Published: 2025

17. Production of linear alkanes via the solid-state hydrogenation of interstellar polyynes

Author: Fedoseev, G., Li, X., Baratta, G. A., Palumbo, M. E., and Chuang, K. -J.
Subjects: Astrophysics - Earth and Planetary Astrophysics, Astrophysics - Astrophysics of Galaxies, Astrophysics - Solar and Stellar Astrophysics, Physics - Chemical Physics
Abstract: Highly unsaturated carbon chains, including polyynes, have been detected in many astronomical regions and planetary systems. With the success of the QUIJOTE survey of the TMC-1, the community has seen a "boom" in the number of detected carbon chains. On the other hand, the Rosetta mission revealed the release of fully saturated hydrocarbons, C$_3$H$_8$, C$_4$H$_{10}$, C$_5$H$_{12}$, and (under specific conditions) C$_6$H$_{14}$ with C$_7$H$_{16}$, from the comet 67P/Churyumov-Gerasimenko. The detection of the latter two is attributed to dust-rich events. Similarly, the analysis of samples returned from asteroid Ryugu by Hayabusa2 mission indicates the presence of long saturated aliphatic chains in Ryugu's organic matter. The surface chemistry of unsaturated carbon chains under conditions resembling those of molecular clouds can provide a possible link among these independent observations. However, laboratory-based investigations to validate such a chemistry is still lacking. In the present study, we aim to experimentally verify the formation of fully saturated hydrocarbons by the surface hydrogenation of C$_{2n}$H$_2$ ($n>1$) polyynes under ultra-high vacuum conditions at 10 K. We undertook a two-step experimental technique. First, a thin layer of C$_2$H$_2$ ice was irradiated by UV-photons ($\geq$ 121 nm) to achieve a partial conversion of C$_2$H$_2$ into larger polyynes: C$_4$H$_2$ and C$_6$H$_2$. Afterwards, the obtained photoprocessed ice was exposed to H atoms to verify the formation of various saturated hydrocarbons. In addition to C$_2$H$_6$, which was investigated previously, the formation of larger alkanes, including C$_4$H$_{10}$ and (tentatively) C$_6$H$_{14}$, is confirmed by our study. A qualitative analysis of the obtained kinetic data indicates that hydrogenation of HCCH and HCCCCH triple bonds proceeds at comparable rates, given a surface temperature of 10 K.}
Published: 2025

18. Information-Maximized Soft Variable Discretization for Self-Supervised Image Representation Learning

Author: Niu, Chuang, Xia, Wenjun, Shan, Hongming, and Wang, Ge
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Self-supervised learning (SSL) has emerged as a crucial technique in image processing, encoding, and understanding, especially for developing today's vision foundation models that utilize large-scale datasets without annotations to enhance various downstream tasks. This study introduces a novel SSL approach, Information-Maximized Soft Variable Discretization (IMSVD), for image representation learning. Specifically, IMSVD softly discretizes each variable in the latent space, enabling the estimation of their probability distributions over training batches and allowing the learning process to be directly guided by information measures. Motivated by the MultiView assumption, we propose an information-theoretic objective function to learn transform-invariant, non-travail, and redundancy-minimized representation features. We then derive a joint-cross entropy loss function for self-supervised image representation learning, which theoretically enjoys superiority over the existing methods in reducing feature redundancy. Notably, our non-contrastive IMSVD method statistically performs contrastive learning. Extensive experimental results demonstrate the effectiveness of IMSVD on various downstream tasks in terms of both accuracy and efficiency. Thanks to our variable discretization, the embedding features optimized by IMSVD offer unique explainability at the variable level. IMSVD has the potential to be adapted to other learning paradigms. Our code is publicly available at https://github.com/niuchuangnn/IMSVD.
Published: 2025

19. V2X-DGPE: Addressing Domain Gaps and Pose Errors for Robust Collaborative 3D Object Detection

Author: Wang, Sichao, Yuan, Ming, Zhang, Chuang, Xu, Qing, He, Lei, and Wang, Jianqiang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multiagent Systems
Abstract: In V2X collaborative perception, the domain gaps between heterogeneous nodes pose a significant challenge for effective information fusion. Pose errors arising from latency and GPS localization noise further exacerbate the issue by leading to feature misalignment. To overcome these challenges, we propose V2X-DGPE, a high-accuracy and robust V2X feature-level collaborative perception framework. V2X-DGPE employs a Knowledge Distillation Framework and a Feature Compensation Module to learn domain-invariant representations from multi-source data, effectively reducing the feature distribution gap between vehicles and roadside infrastructure. Historical information is utilized to provide the model with a more comprehensive understanding of the current scene. Furthermore, a Collaborative Fusion Module leverages a heterogeneous self-attention mechanism to extract and integrate heterogeneous representations from vehicles and infrastructure. To address pose errors, V2X-DGPE introduces a deformable attention mechanism, enabling the model to adaptively focus on critical parts of the input features by dynamically offsetting sampling points. Extensive experiments on the real-world DAIR-V2X dataset demonstrate that the proposed method outperforms existing approaches, achieving state-of-the-art detection performance. The code is available at https://github.com/wangsch10/V2X-DGPE.
Published: 2025

20. Source-free Semantic Regularization Learning for Semi-supervised Domain Adaptation

Author: Huang, Xinyang, Zhu, Chuang, Ren, Ruiying, Liu, Shengjie, and Huang, Tiejun
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Semi-supervised domain adaptation (SSDA) has been extensively researched due to its ability to improve classification performance and generalization ability of models by using a small amount of labeled data on the target domain. However, existing methods cannot effectively adapt to the target domain due to difficulty in fully learning rich and complex target semantic information and relationships. In this paper, we propose a novel SSDA learning framework called semantic regularization learning (SERL), which captures the target semantic information from multiple perspectives of regularization learning to achieve adaptive fine-tuning of the source pre-trained model on the target domain. SERL includes three robust semantic regularization techniques. Firstly, semantic probability contrastive regularization (SPCR) helps the model learn more discriminative feature representations from a probabilistic perspective, using semantic information on the target domain to understand the similarities and differences between samples. Additionally, adaptive weights in SPCR can help the model learn the semantic distribution correctly through the probabilities of different samples. To further comprehensively understand the target semantic distribution, we introduce hard-sample mixup regularization (HMR), which uses easy samples as guidance to mine the latent target knowledge contained in hard samples, thereby learning more complete and complex target semantic knowledge. Finally, target prediction regularization (TPR) regularizes the target predictions of the model by maximizing the correlation between the current prediction and the past learned objective, thereby mitigating the misleading of semantic information caused by erroneous pseudo-labels. Extensive experiments on three benchmark datasets demonstrate that our SERL method achieves state-of-the-art performance.
Published: 2025

21. No Preference Left Behind: Group Distributional Preference Optimization

Author: Yao, Binwei, Cai, Zefan, Chuang, Yun-Shiuan, Yang, Shanglin, Jiang, Ming, Yang, Diyi, and Hu, Junjie
Subjects: Computer Science - Computation and Language
Abstract: Preferences within a group of people are not uniform but follow a distribution. While existing alignment methods like Direct Preference Optimization (DPO) attempt to steer models to reflect human preferences, they struggle to capture the distributional pluralistic preferences within a group. These methods often skew toward dominant preferences, overlooking the diversity of opinions, especially when conflicting preferences arise. To address this issue, we propose Group Distribution Preference Optimization (GDPO), a novel framework that aligns language models with the distribution of preferences within a group by incorporating the concept of beliefs that shape individual preferences. GDPO calibrates a language model using statistical estimation of the group's belief distribution and aligns the model with belief-conditioned preferences, offering a more inclusive alignment framework than traditional methods. In experiments using both synthetic controllable opinion generation and real-world movie review datasets, we show that DPO fails to align with the targeted belief distributions, while GDPO consistently reduces this alignment gap during training. Moreover, our evaluation metrics demonstrate that GDPO outperforms existing approaches in aligning with group distributional preferences, marking a significant advance in pluralistic alignment.
Published: 2024

22. Anisotropic and tunable vortex topology in multiband iron-based superconductors

Author: Yu, Si-Qi, Cheng, Wei, Li, Chuang, Pan, Xiao-Hong, Xu, Gang, Zhang, Fu-Chun, and Liu, Xin
Subjects: Condensed Matter - Superconductivity
Abstract: Building on the multiband nature of iron-based superconductors (FeSCs), we have uncovered pronounced anisotropy in Majorana vortex topology arising from the interaction between vortex orientation and multiple electronic topologies. This anisotropy manifests in two distinct vortex configurations: the z-vortex and x-vortex, oriented perpendicular and parallel to the Dirac axis (z-axis for FeSCs), respectively. The x-vortex exhibits a unique duality, displaying two distinct topological phase diagrams. One is strikingly simple, comprising only trivial and topological superconducting phases, and remains resilient to multiband entanglement. The other mirrors the z-vortex's complex diagram, featuring alternating trivial, topological crystalline and topological superconducting phases. Crucially, the former is exclusive to the x-vortex and supports unpaired Majorana vortices across a wide parameter range, even with Dirac nodes in electronic bands. Notably, uniaxial strain can modulate these x-vortex phases, enabling the x-vortex to support both stable Majorana vortices and rich exotic physics in a controllable manner. Moreover, we propose that the x-vortex offers promising advantages for developing iron-based superconducting quantum devices. Our findings introduce a novel paradigm in vortex topology within multiband superconducting systems, highlighting the x-vortex as a promising platform for exploring Majorana physics and advancing iron-based superconducting quantum technology., Comment: 7 pages + 4 figures
Published: 2024

23. Establishing Reality-Virtuality Interconnections in Urban Digital Twins for Superior Intelligent Road Inspection

Author: Zhang, Yikang, Liu, Chuang-Wei, Li, Jiahang, Chen, Yingbing, Cheng, Jie, and Fan, Rui
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Road inspection is essential for ensuring road maintenance and traffic safety, as road defects gradually emerge and compromise road functionality. Traditional methods, which rely on manual evaluations, are labor-intensive, costly, and time-consuming. Although data-driven approaches are gaining traction, the scarcity and spatial sparsity of road defects in the real world pose significant challenges in acquiring high-quality datasets. Existing simulators designed to generate detailed synthetic driving scenes, however, lack models for road defects. Furthermore, advanced driving tasks involving interactions with road surfaces, such as planning and control in defective areas, remain underexplored. To address these limitations, we propose a system based on Urban Digital Twin (UDT) technology for intelligent road inspection. First, hierarchical road models are constructed from real-world driving data, creating highly detailed representations of road defect structures and surface elevations. Next, digital road twins are generated to create simulation environments for comprehensive analysis and evaluation. These scenarios are subsequently imported into a simulator to enable both data acquisition and physical simulation. Experimental results demonstrate that driving tasks, including perception and decision-making, can be significantly improved using the high-fidelity road defect scenes generated by our system., Comment: 13 pages, 9 figures
Published: 2024

24. Large Language Model Safety: A Holistic Survey

Author: Shi, Dan, Shen, Tianhao, Huang, Yufei, Li, Zhigen, Leng, Yongqi, Jin, Renren, Liu, Chuang, Wu, Xinwei, Guo, Zishan, Yu, Linhao, Shi, Ling, Jiang, Bojian, and Xiong, Deyi
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: The rapid development and deployment of large language models (LLMs) have introduced a new frontier in artificial intelligence, marked by unprecedented capabilities in natural language understanding and generation. However, the increasing integration of these models into critical applications raises substantial safety concerns, necessitating a thorough examination of their potential risks and associated mitigation strategies. This survey provides a comprehensive overview of the current landscape of LLM safety, covering four major categories: value misalignment, robustness to adversarial attacks, misuse, and autonomous AI risks. In addition to the comprehensive review of the mitigation methodologies and evaluation resources on these four aspects, we further explore four topics related to LLM safety: the safety implications of LLM agents, the role of interpretability in enhancing LLM safety, the technology roadmaps proposed and abided by a list of AI companies and institutes for LLM safety, and AI governance aimed at LLM safety with discussions on international cooperation, policy proposals, and prospective regulatory directions. Our findings underscore the necessity for a proactive, multifaceted approach to LLM safety, emphasizing the integration of technical solutions, ethical considerations, and robust governance frameworks. This survey is intended to serve as a foundational resource for academy researchers, industry practitioners, and policymakers, offering insights into the challenges and opportunities associated with the safe integration of LLMs into society. Ultimately, it seeks to contribute to the safe and beneficial development of LLMs, aligning with the overarching goal of harnessing AI for societal advancement and well-being. A curated list of related papers has been publicly available at https://github.com/tjunlp-lab/Awesome-LLM-Safety-Papers., Comment: 158 pages, 18 figures
Published: 2024

25. Uncertainty-Aware Critic Augmentation for Hierarchical Multi-Agent EV Charging Control

Author: Ting, Lo Pang-Yun, Şenol, Ali, Wang, Huan-Yang, Lai, Hsu-Chao, Chuang, Kun-Ta, and Liu, Huan
Subjects: Electrical Engineering and Systems Science - Systems and Control, Computer Science - Artificial Intelligence
Abstract: The advanced bidirectional EV charging and discharging technology, aimed at supporting grid stability and emergency operations, has driven a growing interest in workplace applications. It not only effectively reduces electricity expenses but also enhances the resilience of handling practical issues, such as peak power limitation, fluctuating energy prices, and unpredictable EV departures. However, existing EV charging strategies have yet to fully consider these factors in a way that benefits both office buildings and EV users simultaneously. To address these issues, we propose HUCA, a novel real-time charging control for regulating energy demands for both the building and electric vehicles. HUCA employs hierarchical actor-critic networks to dynamically reduce electricity costs in buildings, accounting for the needs of EV charging in the dynamic pricing scenario. To tackle the uncertain EV departures, a new critic augmentation is introduced to account for departure uncertainties in evaluating the charging decisions, while maintaining the robustness of the charging control. Experiments on real-world electricity datasets under both simulated certain and uncertain departure scenarios demonstrate that HUCA outperforms baselines in terms of total electricity costs while maintaining competitive performance in fulfilling EV charging requirements. A case study also manifests that HUCA effectively balances energy supply between the building and EVs based on real-time information.
Published: 2024

26. Microscopic study of supercurrent diode effect in chiral nanotubes

Author: Li, Chuang and He, James Jun
Subjects: Condensed Matter - Superconductivity
Abstract: Nonreciprocity of supercurrents may exist when both spatial inversion and time-reversal symmetries are broken, leading to the supercurrent diode effect (SDE). The spatial inversion symmetry may be broken by chiral structures in nanotubes where the SDE is expected when a magnetic flux passes through the tube. While such an effect has been predicted based on a phenomenological theory, a microscopic and quantitative study with a concrete lattice model is missing. Here, we investigate the SDE in chiral nanotubes made of carbon and those made of transition metal dichalcogenides (TMD) with tight-binding models. We obtain the SDE efficiency as a function of the nanotube radius, the chiral angle, the magnetic flux, the temperature, the chemical potential, etc., and find that sign flipping happens in various parameter dependencies. In TMD nanotubes, the SDEs with and without the spin-orbit coupling are compared. We also simulate CNTs made from square lattice materials for comparison and discuss the effects of strains. Besides qualitative consistency with previous phenomenological theory, new features are found and the microscopic origins are clarified., Comment: 7 pages+ Appendix, 8 figures
Published: 2024

27. Drive-1-to-3: Enriching Diffusion Priors for Novel View Synthesis of Real Vehicles

Author: Lin, Chuang, Zhuang, Bingbing, Sun, Shanlin, Jiang, Ziyu, Cai, Jianfei, and Chandraker, Manmohan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The recent advent of large-scale 3D data, e.g. Objaverse, has led to impressive progress in training pose-conditioned diffusion models for novel view synthesis. However, due to the synthetic nature of such 3D data, their performance drops significantly when applied to real-world images. This paper consolidates a set of good practices to finetune large pretrained models for a real-world task -- harvesting vehicle assets for autonomous driving applications. To this end, we delve into the discrepancies between the synthetic data and real driving data, then develop several strategies to account for them properly. Specifically, we start with a virtual camera rotation of real images to ensure geometric alignment with synthetic data and consistency with the pose manifold defined by pretrained models. We also identify important design choices in object-centric data curation to account for varying object distances in real driving scenes -- learn across varying object scales with fixed camera focal length. Further, we perform occlusion-aware training in latent spaces to account for ubiquitous occlusions in real data, and handle large viewpoint changes by leveraging a symmetric prior. Our insights lead to effective finetuning that results in a $68.8\%$ reduction in FID for novel view synthesis over prior arts.
Published: 2024

28. MMO-IG: Multi-Class and Multi-Scale Object Image Generation for Remote Sensing

Author: Yang, Chuang, Zhao, Bingxuan, Zhou, Qing, and Wang, Qi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The rapid advancement of deep generative models (DGMs) has significantly advanced research in computer vision, providing a cost-effective alternative to acquiring vast quantities of expensive imagery. However, existing methods predominantly focus on synthesizing remote sensing (RS) images aligned with real images in a global layout view, which limits their applicability in RS image object detection (RSIOD) research. To address these challenges, we propose a multi-class and multi-scale object image generator based on DGMs, termed MMO-IG, designed to generate RS images with supervised object labels from global and local aspects simultaneously. Specifically, from the local view, MMO-IG encodes various RS instances using an iso-spacing instance map (ISIM). During the generation process, it decodes each instance region with iso-spacing value in ISIM-corresponding to both background and foreground instances-to produce RS images through the denoising process of diffusion models. Considering the complex interdependencies among MMOs, we construct a spatial-cross dependency knowledge graph (SCDKG). This ensures a realistic and reliable multidirectional distribution among MMOs for region embedding, thereby reducing the discrepancy between source and target domains. Besides, we propose a structured object distribution instruction (SODI) to guide the generation of synthesized RS image content from a global aspect with SCDKG-based ISIM together. Extensive experimental results demonstrate that our MMO-IG exhibits superior generation capabilities for RS images with dense MMO-supervised labels, and RS detectors pre-trained with MMO-IG show excellent performance on real-world datasets.
Published: 2024

29. Efficient Fine-Tuning of Single-Cell Foundation Models Enables Zero-Shot Molecular Perturbation Prediction

Author: Maleki, Sepideh, Huetter, Jan-Christian, Chuang, Kangway V., Scalia, Gabriele, and Biancalani, Tommaso
Subjects: Computer Science - Machine Learning, Quantitative Biology - Quantitative Methods
Abstract: Predicting transcriptional responses to novel drugs provides a unique opportunity to accelerate biomedical research and advance drug discovery efforts. However, the inherent complexity and high dimensionality of cellular responses, combined with the extremely limited available experimental data, makes the task challenging. In this study, we leverage single-cell foundation models (FMs) pre-trained on tens of millions of single cells, encompassing multiple cell types, states, and disease annotations, to address molecular perturbation prediction. We introduce a drug-conditional adapter that allows efficient fine-tuning by training less than 1% of the original foundation model, thus enabling molecular conditioning while preserving the rich biological representation learned during pre-training. The proposed strategy allows not only the prediction of cellular responses to novel drugs, but also the zero-shot generalization to unseen cell lines. We establish a robust evaluation framework to assess model performance across different generalization tasks, demonstrating state-of-the-art results across all settings, with significant improvements in the few-shot and zero-shot generalization to new cell lines compared to existing baselines.
Published: 2024

30. Why and How: Knowledge-Guided Learning for Cross-Spectral Image Patch Matching

Author: Yu, Chuang, Liu, Yunpeng, Zhao, Jinmiao, and Yue, Xiangyu
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recently, cross-spectral image patch matching based on feature relation learning has attracted extensive attention. However, performance bottleneck problems have gradually emerged in existing methods. To address this challenge, we make the first attempt to explore a stable and efficient bridge between descriptor learning and metric learning, and construct a knowledge-guided learning network (KGL-Net), which achieves amazing performance improvements while abandoning complex network structures. Specifically, we find that there is feature extraction consistency between metric learning based on feature difference learning and descriptor learning based on Euclidean distance. This provides the foundation for bridge building. To ensure the stability and efficiency of the constructed bridge, on the one hand, we conduct an in-depth exploration of 20 combined network architectures. On the other hand, a feature-guided loss is constructed to achieve mutual guidance of features. In addition, unlike existing methods, we consider that the feature mapping ability of the metric branch should receive more attention. Therefore, a hard negative sample mining for metric learning (HNSM-M) strategy is constructed. To the best of our knowledge, this is the first time that hard negative sample mining for metric networks has been implemented and brings significant performance gains. Extensive experimental results show that our KGL-Net achieves SOTA performance in three different cross-spectral image patch matching scenarios. Our code are available at https://github.com/YuChuang1205/KGL-Net.
Published: 2024

31. From Easy to Hard: Progressive Active Learning Framework for Infrared Small Target Detection with Single Point Supervision

Author: Yu, Chuang, Zhao, Jinmiao, Liu, Yunpeng, Zhao, Sicheng, and Yue, Xiangyu
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recently, single-frame infrared small target (SIRST) detection with single point supervision has drawn wide-spread attention. However, the latest label evolution with single point supervision (LESPS) framework suffers from instability, excessive label evolution, and difficulty in exerting embedded network performance. Therefore, we construct a Progressive Active Learning (PAL) framework. Specifically, inspired by organisms gradually adapting to their environment and continuously accumulating knowledge, we propose an innovative progressive active learning idea, which emphasizes that the network progressively and actively recognizes and learns more hard samples to achieve continuous performance enhancement. Based on this, on the one hand, we propose a model pre-start concept, which focuses on selecting a portion of easy samples and can help models have basic task-specific learning capabilities. On the other hand, we propose a refined dual-update strategy, which can promote reasonable learning of harder samples and continuous refinement of pseudo-labels. In addition, to alleviate the risk of excessive label evolution, a decay factor is reasonably introduced, which helps to achieve a dynamic balance between the expansion and contraction of target annotations. Extensive experiments show that convolutional neural networks (CNNs) equipped with our PAL framework have achieved state-of-the-art (SOTA) results on multiple public datasets. Furthermore, our PAL framework can build a efficient and stable bridge between full supervision and point supervision tasks. Our code are available at https://github.com/YuChuang1205/PAL.
Published: 2024

32. VCA: Video Curious Agent for Long Video Understanding

Author: Yang, Zeyuan, Chen, Delin, Yu, Xueyang, Shen, Maohao, and Gan, Chuang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Long video understanding poses unique challenges due to their temporal complexity and low information density. Recent works address this task by sampling numerous frames or incorporating auxiliary tools using LLMs, both of which result in high computational costs. In this work, we introduce a curiosity-driven video agent with self-exploration capability, dubbed as VCA. Built upon VLMs, VCA autonomously navigates video segments and efficiently builds a comprehensive understanding of complex video sequences. Instead of directly sampling frames, VCA employs a tree-search structure to explore video segments and collect frames. Rather than relying on external feedback or reward, VCA leverages VLM's self-generated intrinsic reward to guide its exploration, enabling it to capture the most crucial information for reasoning. Experimental results on multiple long video benchmarks demonstrate our approach's superior effectiveness and efficiency.
Published: 2024

33. Y-NQ: English-Yor\`ub\'a Evaluation dataset for Open-Book Reading Comprehension and Text Generation

Author: Costa-jussà, Marta R., Chen, Joy, Adebara, Ifeoluwanimi, Chuang, Joe, Ropers, Christophe, and Sánchez, Eduardo
Subjects: Computer Science - Computation and Language, I.2.7
Abstract: The purpose of this work is to share an English-Yor\`ub\'a evaluation dataset for open-book reading comprehension and text generation to assess the performance of models both in a high- and a low- resource language. The dataset contains 358 questions and answers on 338 English documents and 208 Yor\`ub\'a documents. The average document length is ~ 10k words for English and 430 words for Yor\`ub\'a. Experiments show a consistent disparity in performance between the two languages, with Yor\`ub\'a falling behind English for automatic metrics even if documents are much shorter for this language. For a small set of documents with comparable length, performance of Yor\`ub\'a drops by x2.5 times. When analyzing performance by length, we observe that Yor\`ub\'a decreases performance dramatically for documents that reach 1500 words while English performance is barely affected at that length. Our dataset opens the door to showcasing if English LLM reading comprehension capabilities extend to Yor\`ub\'a, which for the evaluated LLMs is not the case.
Published: 2024

34. 2M-BELEBELE: Highly Multilingual Speech and American Sign Language Comprehension Dataset

Author: Costa-jussà, Marta R., Yu, Bokai, Andrews, Pierre, Alastruey, Belen, Camgoz, Necati Cihan, Chuang, Joe, Maillard, Jean, Ropers, Christophe, Turkantenko, Arina, and Wood, Carleigh
Subjects: Computer Science - Computation and Language, I.2.7
Abstract: We introduce the first highly multilingual speech and American Sign Language (ASL) comprehension dataset by extending BELEBELE. Our dataset covers 74 spoken languages at the intersection of BELEBELE and FLEURS, and one sign language (ASL). We evaluate 2M-BELEBELE dataset for both 5-shot and zero-shot settings and across languages, the speech comprehension accuracy is ~ 2-3% average lower compared to reading comprehension.
Published: 2024

35. LCFO: Long Context and Long Form Output Dataset and Benchmarking

Author: Costa-jussà, Marta R., Andrews, Pierre, Meglioli, Mariano Coria, Chen, Joy, Chuang, Joe, Dale, David, Ropers, Christophe, Mourachko, Alexandre, Sánchez, Eduardo, Schwenk, Holger, Tran, Tuan, Turkatenko, Arina, and Wood, Carleigh
Subjects: Computer Science - Computation and Language, I.2.7
Abstract: This paper presents the Long Context and Form Output (LCFO) benchmark, a novel evaluation framework for assessing gradual summarization and summary expansion capabilities across diverse domains. LCFO consists of long input documents (5k words average length), each of which comes with three summaries of different lengths (20%, 10%, and 5% of the input text), as well as approximately 15 questions and answers (QA) related to the input content. Notably, LCFO also provides alignments between specific QA pairs and corresponding summaries in 7 domains. The primary motivation behind providing summaries of different lengths is to establish a controllable framework for generating long texts from shorter inputs, i.e. summary expansion. To establish an evaluation metric framework for summarization and summary expansion, we provide human evaluation scores for human-generated outputs, as well as results from various state-of-the-art large language models (LLMs). GPT-4o-mini achieves best human scores among automatic systems in both summarization and summary expansion tasks (~ +10% and +20%, respectively). It even surpasses human output quality in the case of short summaries (~ +7%). Overall automatic metrics achieve low correlations with human evaluation scores (~ 0.4) but moderate correlation on specific evaluation aspects such as fluency and attribution (~ 0.6). The LCFO benchmark offers a standardized platform for evaluating summarization and summary expansion performance, as well as corresponding automatic metrics, thereby providing an important evaluation framework to advance generative AI.
Published: 2024

36. CA-MoE: Channel-Adapted MoE for Incremental Weather Forecasting

Author: Chen, Hao, Tao, Han, Song, Guo, Zhang, Jie, Yu, Yunlong, Dong, Yonghan, Yang, Chuang, and Bai, Lei
Subjects: Computer Science - Machine Learning, Physics - Atmospheric and Oceanic Physics
Abstract: Atmospheric science is intricately connected with other fields, e.g., geography and aerospace. Most existing approaches involve training a joint atmospheric and geographic model from scratch, which incurs significant computational costs and overlooks the potential for incremental learning of weather variables across different domains. In this paper, we introduce incremental learning to weather forecasting and propose a novel structure that allows for the flexible expansion of variables within the model. Specifically, our method presents a Channel-Adapted MoE (CA-MoE) that employs a divide-and-conquer strategy. This strategy assigns variable training tasks to different experts by index embedding and reduces computational complexity through a channel-wise Top-K strategy. Experiments conducted on the widely utilized ERA5 dataset reveal that our method, utilizing only approximately 15\% of trainable parameters during the incremental stage, attains performance that is on par with state-of-the-art competitors. Notably, in the context of variable incremental experiments, our method demonstrates negligible issues with catastrophic forgetting.
Published: 2024

37. LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences

Author: Zhi, Hongyan, Chen, Peihao, Li, Junyan, Ma, Shuailei, Sun, Xinyu, Xiang, Tianhang, Lei, Yinjie, Tan, Mingkui, and Gan, Chuang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Research on 3D Vision-Language Models (3D-VLMs) is gaining increasing attention, which is crucial for developing embodied AI within 3D scenes, such as visual navigation and embodied question answering. Due to the high density of visual features, especially in large 3D scenes, accurately locating task-relevant visual information is challenging. Existing works attempt to segment all objects and consider their features as scene representations. However, these task-agnostic object features include much redundant information and missing details for the task-relevant area. To tackle these problems, we propose LSceneLLM, an adaptive framework that automatically identifies task-relevant areas by leveraging LLM's visual preference for different tasks, followed by a plug-and-play scene magnifier module to capture fine-grained details in focused areas. Specifically, a dense token selector examines the attention map of LLM to identify visual preferences for the instruction input. It then magnifies fine-grained details of the focusing area. An adaptive self-attention module is leveraged to fuse the coarse-grained and selected fine-grained visual information. To comprehensively evaluate the large scene understanding ability of 3D-VLMs, we further introduce a cross-room understanding benchmark, XR-Scene, which contains a series of large scene understanding tasks including XR-QA, XR-EmbodiedPlanning, and XR-SceneCaption. Experiments show that our method surpasses existing methods on both large scene understanding and existing scene understanding benchmarks. Plunging our scene magnifier module into the existing 3D-VLMs also brings significant improvement.
Published: 2024

38. 'Let Emotion Ring': An Autoethnographic Self-Study of an EFL Instructor in Wuhan during COVID-19

Author: Shuwen Liu, Rui Yuan, and Chuang Wang
Abstract: Teaching has always been an emotionally demanding profession, which involves tremendous emotional labour on the part of language teachers. This is particularly true for instructors of English as a foreign language (EFL) suddenly obliged to teach online during the COVID-19 pandemic. Drawing on the approach of autoethnographic self-study, this article reveals the first author's understanding and negotiation of complex emotions during her online teaching from February to May 2020 in Wuhan, the centre of the COVID-19 pandemic during that period. This study sheds light on a language teacher's complex emotional experiences in relation to technology, students and colleagues under the sudden shifts of teaching environment. It also explores how emotional regulation strategies are employed through undertaking emotional labour and observing/renegotiating emotional rules. The study exemplifies the importance of emotionally managed classrooms, and language teachers are advised to make deliberate and strategic efforts in channelling positive emotions into online teaching. The authors call for more attention to self-study as a viable and instrumental approach to facilitating language teachers' continuing development and enhancing their emotional reflexivity and well-being.
Published: 2025
Full Text: View/download PDF

39. Do Source Use Features Impact Raters' Judgment of Argumentation? An Experimental Study

Author: Ping-Lin Chuang
Abstract: This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were adapted from essays written by EPT test-takers. These responses were crafted to reflect different conditions of source use features, namely source use quantity and quality. Rater scores were analyzed using the many-facet Rasch model and mixed two-way analyses of variance (ANOVAs) to examine how they are affected by source use features and rater experience. Results show that source use features impacted the argumentation scores assigned by raters. Paragraphs with more source text ideas that are better incorporated received the highest argumentation scores, and vice versa for those with limited, poorly integrated source information. Rater experience impacted scores but did not influence rater performance meaningfully. The findings of this study connect specific source use features with raters' evaluation of argumentation, helping to further disentangle the relationships among examinee performance, rater decision, and task features of integrated argumentative writing tests. They also provide meaningful implications for writing assessment research and practices.
Published: 2025
Full Text: View/download PDF

40. Effects of a Self-Regulated-Based Gamified Interactive E-Books on Primary Students' Learning Performance and Affection in a Flipped Mathematics Classroom

Author: Chuang Chen, Nurullizam Jamiat, Siti Nazleen Abdul Rabu, and Yongchun Mao
Abstract: Gamified interactive e-books can make the learning process more interactive, enjoyable, and personalized by incorporating game elements into the educational content, thus increasing student engagement and retention in the flipped classroom. However, scholars have pointed out that learning with game elements that may lead to poor learning outcomes. Therefore, this study proposes a self-regulated-based gamified interactive e-book (S-GIEB) to stimulate students to monitor their learning process in a flipped mathematics classroom with the expectation of improving their learning performance and affection. A total of 120 sixth grade students were divided into three groups: students who used self-regulation-based interactive e-book in a flipped mathematics classroom (S-GIEB group), students who learned with gamified interactive e-book in a flipped mathematics classroom (GIEB group), and those who learned with instructional videos in a traditional flipped classroom (TFC group). The findings revealed that incorporating self-regulated strategies in gamified interactive e-books can significantly enhance primary students' learning achievement, self-regulation abilities, learning motivation, and meta-cognition tendency. This suggests that such an approach not only fosters better academic performance but also cultivates important skills like self-regulation abilities and meta-cognition. The study underlines the potential of technology-enhanced learning tools to revolutionize traditional teaching methods, offering a blueprint for future educational innovations. Recommendations for the use of S-GIEB and for future research were also discussed.
Published: 2024
Full Text: View/download PDF

41. Development and Evaluation of Collaboration Scripts for Long-Distance VR Team Collaboration and Co-Creation in Elementary STEM Learning

Author: Hui-Chun Hung, Min-Yu Chuang, and Cheng-Huan Chen
Abstract: Due to the pandemic, many students have been forced to study remotely. This study aims to investigate the impact of online collaboration scripts on learning outcomes in virtual reality (VR) co-creation learning activities during distance learning. The collaboration scripts were designed to foster students' remote teamwork. The participants consisted of 106 Taiwanese sixth-grade elementary students, divided equally into an experimental group and a comparison group. The experiment lasted for seven weeks, with approximately 40 minutes per week dedicated to VR co-creation activities. The experimental group followed online collaboration scripts, while the comparison group engaged in distance learning without scripts. Additionally, the researchers collected students' learning logs and analyzed behavioral sequences to enhance understanding of their learning behaviors. Epistemic network analysis was employed to analyze group interactions, focusing on Orientation, Planning, Executing, Monitoring, and Evaluation codes. Results revealed significant improvement in the experimental group's learning achievement and communication skills using online collaboration scripts in VR co-creation learning. The experimental group exhibited a stronger correlation in Planning--Executing, following the script stages, while the comparison group relied more on peer assistance. Overall, the online collaboration scripts could scaffold learning and support effective remote VR learning activities for content co-creation for elementary students.
Published: 2024
Full Text: View/download PDF

42. School Culture's Openness to Creative Solutions and Teachers' Inquiry-Based Teaching: A Multinational Study of Asian and European Countries

Author: Ju-Hui Wei, Hsueh-Hua Chuang, and Thomas J. Smith
Abstract: Previous research suggests that teachers' adoption of inquiry-based teaching is influenced by school culture's openness to creative solutions, with teachers' self-efficacy in inquiry-based teaching acting as a mediator. However, considering the potential impact of local educational and socio-cultural context on teachers' behavior, findings from one country may not readily generalize to another. For example, in regions with limited exposure and resources for inquiry-based teaching, self-efficacy may play a more prominent mediating role. Therefore, examining the comparative relationship between school culture's openness to creative solutions and teaching practices in Asian and European countries is worthwhile. This multinational study employed data from 23 Asian and European countries to scrutinize the connection between school culture's openness to creative solutions and inquiry-based teaching, with teachers' self-efficacy in inquiry-based teaching as the mediating factor. The results extended the findings of a previous single-country investigation and provided evidence supporting an indirect effect of school culture's openness to creative solutions on inquiry-based teaching as mediated by teacher self-efficacy across all 23 countries. Furthermore, it revealed stronger effects in Asian countries compared to their European counterparts.
Published: 2024
Full Text: View/download PDF

43. Mediating Roles of Motivational Beliefs and Vocabulary Learning Strategies for the Relationship between Self-Regulation and Vocabulary Proficiency

Author: Jiajing Li and Chuang Wang
Abstract: Successful vocabulary acquisition hinges on the harmonious interplay of various factors. Despite some studies that have been conducted to examine the direct effect of self-regulation on vocabulary learning, few of them tapped into the relationship among self-regulation, motivational beliefs, vocabulary learning strategies and vocabulary proficiency. This study extended previous research by examining whether motivation, self-efficacy and learning strategies mediate the relationship between self-regulation and vocabulary proficiency and investigating whether the relationship varied by gender. Data were extracted from 399 senior secondary school students. Results from structural equation modelling revealed that motivation and self-efficacy mediated the relationship between self-regulated learning capacity and vocabulary learning strategies. Vocabulary learning strategies further mediated the relationships between self-efficacy, motivation and vocabulary proficiency. Meanwhile, the relationship between self-regulation, motivational beliefs, vocabulary learning strategies and vocabulary proficiency held equivalent across genders. Theoretical and practical implications are discussed.
Published: 2024
Full Text: View/download PDF

44. Boosting Learners' Confidence in Learning English: Can Self-Efficacy-Based Intervention Make a Difference?

Author: Jiajing Li, Chuang Wang, Yao Zhao, and Yijie Li
Abstract: Self-efficacy is theorized as a malleable construct that can be influenced by teachers' instructions; however, limited effort has been devoted to evaluating the effectiveness of self-efficacy interventions. This study leverages a mixed methods design to evaluate the effectiveness of a self-efficacy intervention over 16 weeks among Chinese English as a foreign language learners. Data were collected from 102 secondary school students (52 and 50 students in the intervention and comparison groups, respectively). We also interviewed nine students from the intervention class. Repeated measures mixed analyses of variance were employed to check the changes in self-efficacy, intrinsic motivation, engagement, and academic achievement over time as well as to detect the differences in these aspects between the comparison and the intervention groups. Moreover, thematic analysis was adopted to handle the qualitative data. Results of the quantitative phase showed that the explicit self-efficacy supports worked effectively in promoting self-efficacy, intrinsic motivation, and language proficiency. Results of the qualitative phase triangulated the findings from the quantitative phase, illuminating the broad-spectrum effect of self-efficacy support. Implications and directions for future research are discussed.
Published: 2024
Full Text: View/download PDF

45. How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario

Author: Wang, Shih-Heng, Chen, Zih-Ching, Shi, Jiatong, Chuang, Ming-To, Lin, Guan-Ting, Huang, Kuan-Po, Harwath, David, Li, Shang-Wen, and Lee, Hung-yi
Subjects: Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: The utilization of speech Self-Supervised Learning (SSL) models achieves impressive performance on Automatic Speech Recognition (ASR). However, in low-resource language ASR, they encounter the domain mismatch problem between pre-trained and low-resource languages. Typical solutions like fine-tuning the SSL model suffer from high computation costs while using frozen SSL models as feature extractors comes with poor performance. To handle these issues, we extend a conventional efficient fine-tuning scheme based on the adapter. We add an extra intermediate adaptation to warm up the adapter and downstream model initialization. Remarkably, we update only 1-5% of the total model parameters to achieve the adaptation. Experimental results on the ML-SUPERB dataset show that our solution outperforms conventional efficient fine-tuning. It achieves up to a 28% relative improvement in the Character/Phoneme error rate when adapting to unseen languages.
Published: 2024

46. SVGDreamer++: Advancing Editability and Diversity in Text-Guided SVG Generation

Author: Xing, Ximing, Yu, Qian, Wang, Chuang, Zhou, Haitao, Zhang, Jing, and Xu, Dong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Recently, text-guided scalable vector graphics (SVG) synthesis has demonstrated significant potential in domains such as iconography and sketching. However, SVGs generated from existing Text-to-SVG methods often lack editability and exhibit deficiencies in visual quality and diversity. In this paper, we propose a novel text-guided vector graphics synthesis method to address these limitations. To enhance the editability of output SVGs, we introduce a Hierarchical Image VEctorization (HIVE) framework that operates at the semantic object level and supervises the optimization of components within the vector object. This approach facilitates the decoupling of vector graphics into distinct objects and component levels. Our proposed HIVE algorithm, informed by image segmentation priors, not only ensures a more precise representation of vector graphics but also enables fine-grained editing capabilities within vector objects. To improve the diversity of output SVGs, we present a Vectorized Particle-based Score Distillation (VPSD) approach. VPSD addresses over-saturation issues in existing methods and enhances sample diversity. A pre-trained reward model is incorporated to re-weight vector particles, improving aesthetic appeal and enabling faster convergence. Additionally, we design a novel adaptive vector primitives control strategy, which allows for the dynamic adjustment of the number of primitives, thereby enhancing the presentation of graphic details. Extensive experiments validate the effectiveness of the proposed method, demonstrating its superiority over baseline methods in terms of editability, visual quality, and diversity. We also show that our new method supports up to six distinct vector styles, capable of generating high-quality vector assets suitable for stylized vector design and poster design. Code and demo will be released at: http://ximinng.github.io/SVGDreamerV2Project/, Comment: 17 pages, 17 figures. Project Page: http://ximinng.github.io/SVGDreamerV2Project/. arXiv admin note: text overlap with arXiv:2312.16476
Published: 2024

47. 3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning

Author: Yang, Yuncong, Yang, Han, Zhou, Jiachen, Chen, Peihao, Zhang, Hongxin, Du, Yilun, and Gan, Chuang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: Constructing compact and informative 3D scene representations is essential for effective embodied exploration and reasoning, especially in complex environments over extended periods. Existing representations, such as object-centric 3D scene graphs, oversimplify spatial relationships by modeling scenes as isolated objects with restrictive textual relationships, making it difficult to address queries requiring nuanced spatial understanding. Moreover, these representations lack natural mechanisms for active exploration and memory management, hindering their application to lifelong autonomy. In this work, we propose 3D-Mem, a novel 3D scene memory framework for embodied agents. 3D-Mem employs informative multi-view images, termed Memory Snapshots, to represent the scene and capture rich visual information of explored regions. It further integrates frontier-based exploration by introducing Frontier Snapshots-glimpses of unexplored areas-enabling agents to make informed decisions by considering both known and potential new information. To support lifelong memory in active exploration settings, we present an incremental construction pipeline for 3D-Mem, as well as a memory retrieval technique for memory management. Experimental results on three benchmarks demonstrate that 3D-Mem significantly enhances agents' exploration and reasoning capabilities in 3D environments, highlighting its potential for advancing applications in embodied AI.
Published: 2024

48. Optimizing Social Media Annotation of HPV Vaccine Skepticism and Misinformation Using Large Language Models: An Experimental Evaluation of In-Context Learning and Fine-Tuning Stance Detection Across Multiple Models

Author: Sun, Luhang, Pendyala, Varsha, Chuang, Yun-Shiuan, Yang, Shanglin, Feldman, Jonathan, Zhao, Andrew, De Choudhury, Munmun, Yang, Sijia, and Shah, Dhavan
Subjects: Computer Science - Computation and Language
Abstract: This paper leverages large-language models (LLMs) to experimentally determine optimal strategies for scaling up social media content annotation for stance detection on HPV vaccine-related tweets. We examine both conventional fine-tuning and emergent in-context learning methods, systematically varying strategies of prompt engineering across widely used LLMs and their variants (e.g., GPT4, Mistral, and Llama3, etc.). Specifically, we varied prompt template design, shot sampling methods, and shot quantity to detect stance on HPV vaccination. Our findings reveal that 1) in general, in-context learning outperforms fine-tuning in stance detection for HPV vaccine social media content; 2) increasing shot quantity does not necessarily enhance performance across models; and 3) different LLMs and their variants present differing sensitivity to in-context learning conditions. We uncovered that the optimal in-context learning configuration for stance detection on HPV vaccine tweets involves six stratified shots paired with detailed contextual prompts. This study highlights the potential and provides an applicable approach for applying LLMs to research on social media stance and skepticism detection.
Published: 2024

49. UBSoft: A Simulation Platform for Robotic Skill Learning in Unbounded Soft Environments

Author: Lin, Chunru, Fan, Jugang, Wang, Yian, Yang, Zeyuan, Chen, Zhehuan, Fang, Lixing, Wang, Tsun-Hsuan, Xian, Zhou, and Gan, Chuang
Subjects: Computer Science - Robotics
Abstract: It is desired to equip robots with the capability of interacting with various soft materials as they are ubiquitous in the real world. While physics simulations are one of the predominant methods for data collection and robot training, simulating soft materials presents considerable challenges. Specifically, it is significantly more costly than simulating rigid objects in terms of simulation speed and storage requirements. These limitations typically restrict the scope of studies on soft materials to small and bounded areas, thereby hindering the learning of skills in broader spaces. To address this issue, we introduce UBSoft, a new simulation platform designed to support unbounded soft environments for robot skill acquisition. Our platform utilizes spatially adaptive resolution scales, where simulation resolution dynamically adjusts based on proximity to active robotic agents. Our framework markedly reduces the demand for extensive storage space and computation costs required for large-scale scenarios involving soft materials. We also establish a set of benchmark tasks in our platform, including both locomotion and manipulation tasks, and conduct experiments to evaluate the efficacy of various reinforcement learning algorithms and trajectory optimization techniques, both gradient-based and sampling-based. Preliminary results indicate that sampling-based trajectory optimization generally achieves better results for obtaining one trajectory to solve the task. Additionally, we conduct experiments in real-world environments to demonstrate that advancements made in our UBSoft simulator could translate to improved robot interactions with large-scale soft material. More videos can be found at https://vis-www.cs.umass.edu/ubsoft/., Comment: CoRL 2024. The first two authors contributed equally to this paper
Published: 2024

50. Mixtures and grain boundaries of P, D, and G triply periodic minimal surfaces

Author: Chuang, Chern and Jin, Bih-Yaw
Subjects: Condensed Matter - Materials Science
Abstract: We introduce a square tiling/tetragonal strip representation to the P, D, and G triply periodic minimal surfaces. This approach is useful in identifying mixtures and grain boundaries of these surfaces that might be useful for material sciences or advanced manufacturing purposes. Generalizations to more complicated strip topology (multi-strand) as well as other regular and semi-regular tilings are discussed. Examples of these include double diamond, double gyroid, and triangular P/D/G surfaces., Comment: 11 pages, 6 figures
Published: 2024

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Region

Database

Publisher

21,099 results on '"Chuang P"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources