14,392 results for "Chang, Yi"
Search Results
2. Adverse Weather Optical Flow: Cumulative Homogeneous-Heterogeneous Adaptation
- Authors: Zhou, Hanyu, Chang, Yi, Shi, Zhiwei, Yan, Wending, Chen, Gang, Tian, Yonghong, and Yan, Luxin
- Subjects: Computer Science - Computer Vision and Pattern Recognition
- Abstract:
Optical flow has made great progress in clean scenes, but suffers degradation under adverse weather due to the violation of its brightness-constancy and gradient-continuity assumptions. Existing methods mainly adopt domain adaptation to transfer motion knowledge from the clean to the degraded domain through one-stage adaptation. However, this direct adaptation is ineffective, since a large gap in weather and scene style separates the clean and real degraded domains. Moreover, even within the degraded domain itself, static weather (e.g., fog) and dynamic weather (e.g., rain) affect optical flow differently. To address these issues, we explore a synthetic degraded domain as an intermediate bridge between the clean and real degraded domains, and propose a cumulative homogeneous-heterogeneous adaptation framework for real adverse-weather optical flow. Specifically, for the clean-degraded transfer, our key insight is that static weather possesses a depth-associated homogeneous feature that does not change the intrinsic motion of the scene, while dynamic weather additionally introduces a heterogeneous feature that causes a significant boundary discrepancy in warp errors between the clean and degraded domains. For the synthetic-real transfer, we find that cost-volume correlation shares a similar statistical histogram between the synthetic and real degraded domains, which benefits holistic alignment of the homogeneous correlation distribution for synthetic-real knowledge distillation. Under this unified framework, the proposed method can progressively and explicitly transfer knowledge from clean scenes to real adverse weather. In addition, we collect a real adverse-weather dataset with manually annotated optical flow labels and perform extensive experiments to verify the superiority of the proposed method.
- Published: 2024
3. CHBench: A Chinese Dataset for Evaluating Health in Large Language Models
- Authors: Guo, Chenlu, Xu, Nuo, Chang, Yi, and Wu, Yuan
- Subjects: Computer Science - Computation and Language
- Abstract:
With the rapid development of large language models (LLMs), assessing their performance on health-related inquiries has become increasingly essential. It is critical that these models provide accurate and trustworthy health information, as their application in real-world contexts--where misinformation can have serious consequences for individuals seeking medical advice and support--depends on their reliability. In this work, we present CHBench, the first comprehensive Chinese Health-related Benchmark designed to evaluate LLMs' capabilities in understanding physical and mental health across diverse scenarios. CHBench includes 6,493 entries related to mental health and 2,999 entries focused on physical health, covering a broad spectrum of topics. This dataset serves as a foundation for evaluating Chinese LLMs' capacity to comprehend and generate accurate health-related information. Our extensive evaluations of four popular Chinese LLMs demonstrate that there remains considerable room for improvement in their understanding of health-related information. The code is available at https://github.com/TracyGuo2001/CHBench., Comment: 11 pages
- Published: 2024
4. XTRUST: On the Multilingual Trustworthiness of Large Language Models
- Authors: Li, Yahan, Wang, Yi, Chang, Yi, and Wu, Yuan
- Subjects: Computer Science - Computation and Language
- Abstract:
Large language models (LLMs) have demonstrated remarkable capabilities across a range of natural language processing (NLP) tasks, capturing the attention of both practitioners and the broader public. A key question that now preoccupies the AI community concerns the capabilities and limitations of these models, with trustworthiness emerging as a central issue, particularly as LLMs are increasingly applied in sensitive fields like healthcare and finance, where errors can have serious consequences. However, most previous studies on the trustworthiness of LLMs have been limited to a single language, typically the predominant one in the dataset, such as English. In response to the growing global deployment of LLMs, we introduce XTRUST, the first comprehensive multilingual trustworthiness benchmark. XTRUST encompasses a diverse range of topics, including illegal activities, hallucination, out-of-distribution (OOD) robustness, physical and mental health, toxicity, fairness, misinformation, privacy, and machine ethics, across 10 different languages. Using XTRUST, we conduct an empirical evaluation of the multilingual trustworthiness of five widely used LLMs, offering an in-depth analysis of their performance across languages and tasks. Our results indicate that many LLMs struggle with certain low-resource languages, such as Arabic and Russian, highlighting the considerable room for improvement in the multilingual trustworthiness of current language models. The code is available at https://github.com/LluckyYH/XTRUST., Comment: 21 pages
- Published: 2024
5. On the Generalizability of Foundation Models for Crop Type Mapping
- Authors: Chang, Yi-Chia, Stewart, Adam J., Bastani, Favyen, Wolters, Piper, Kannan, Shreya, Huber, George R., Wang, Jingtong, and Banerjee, Arindam
- Subjects: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Machine Learning
- Abstract:
Foundation models pre-trained using self-supervised and weakly-supervised learning have shown powerful transfer learning capabilities on various downstream tasks, including language understanding, text generation, and image recognition. Recently, the Earth observation (EO) field has produced several foundation models pre-trained directly on multispectral satellite imagery (e.g., Sentinel-2) for applications like precision agriculture, wildfire and drought monitoring, and natural disaster response. However, few studies have investigated the ability of these models to generalize to new geographic locations, and potential concerns of geospatial bias -- models trained on data-rich developed countries not transferring well to data-scarce developing countries -- remain. We investigate the ability of popular EO foundation models to transfer to new geographic regions in the agricultural domain, where differences in farming practices and class imbalance make transfer learning particularly challenging. We first select six crop classification datasets across five continents, normalizing for dataset size and harmonizing classes to focus on four major cereal grains: maize, soybean, rice, and wheat. We then compare three popular foundation models, pre-trained on SSL4EO-S12, SatlasPretrain, and ImageNet, using in-distribution (ID) and out-of-distribution (OOD) evaluation. Experiments show that pre-trained weights designed explicitly for Sentinel-2, such as SSL4EO-S12, outperform general pre-trained weights like ImageNet. Furthermore, the benefits of pre-training on OOD data are the most significant when only 10--100 ID training samples are used. Transfer learning and pre-training with OOD and limited ID data show promising applications, as many developing regions have scarce crop type labels. All harmonized datasets and experimental code are open-source and available for download.
- Published: 2024
6. Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal
- Authors: Hu, Jifeng, Shen, Li, Huang, Sili, Yang, Zhejian, Chen, Hechang, Sun, Lichao, Chang, Yi, and Tao, Dacheng
- Subjects: Computer Science - Machine Learning; Computer Science - Artificial Intelligence
- Abstract:
Artificial neural networks, especially recent diffusion-based models, have shown remarkable superiority in gaming, control, and QA systems, where the training datasets are usually static. However, in real-world applications such as robotic control via reinforcement learning (RL), tasks change and new tasks arise in sequential order. This situation poses the new challenge of a plasticity-stability trade-off: training an agent that can adapt to task changes while retaining acquired knowledge. In view of this, we propose a rehearsal-based continual diffusion model, called Continual Diffuser (CoD), to endow the diffuser with the capabilities of quick adaptation (plasticity) and lasting retention (stability). Specifically, we first construct an offline benchmark that contains 90 tasks from multiple domains. Then, we train CoD on each task with sequential modeling and conditional generation for decision making. Next, we preserve a small portion of previous datasets as a rehearsal buffer and replay it to retain the acquired knowledge. Extensive experiments on a series of tasks show that CoD achieves a promising plasticity-stability trade-off and outperforms existing diffusion-based methods and other representative baselines on most tasks.
- Published: 2024
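The rehearsal mechanism described in the CoD abstract — retain a small portion of each finished task's data and replay it during later training — can be sketched in a few lines. This is a generic illustration, not the paper's implementation; the class name, `keep_ratio`, and the batch-mixing scheme are our assumptions.

```python
import random

class RehearsalBuffer:
    """Keep a small fraction of each finished task's data for replay.

    A generic sketch of experience rehearsal; `keep_ratio` and the
    uniform mixing below are illustrative choices, not the paper's.
    """

    def __init__(self, keep_ratio=0.1, seed=0):
        self.keep_ratio = keep_ratio
        self.rng = random.Random(seed)
        self.buffer = []  # samples retained from all previous tasks

    def store_task(self, task_data):
        # Retain a random subset of the finished task's dataset.
        k = max(1, int(len(task_data) * self.keep_ratio))
        self.buffer.extend(self.rng.sample(task_data, k))

    def mixed_batch(self, current_batch, replay_frac=0.5):
        # Replace a fraction of the current batch with replayed samples,
        # so old tasks keep contributing training signal (stability)
        # while new data drives adaptation (plasticity).
        if not self.buffer:
            return list(current_batch)
        n_replay = int(len(current_batch) * replay_frac)
        replay = [self.rng.choice(self.buffer) for _ in range(n_replay)]
        return list(current_batch)[n_replay:] + replay
```

In use, `store_task` would be called once per completed task and `mixed_batch` on every gradient step of subsequent tasks.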
7. Bidirectional Gated Mamba for Sequential Recommendation
- Authors: Liu, Ziwei, Liu, Qidong, Wang, Yejing, Wang, Wanyu, Jia, Pengyue, Wang, Maolin, Liu, Zitao, Chang, Yi, and Zhao, Xiangyu
- Subjects: Computer Science - Artificial Intelligence
- Abstract:
In various domains, Sequential Recommender Systems (SRS) have become essential due to their superior capability to discern intricate user preferences. Typically, SRS utilize transformer-based architectures to forecast the subsequent item within a sequence. Nevertheless, the quadratic computational complexity inherent in these models often leads to inefficiencies, hindering the achievement of real-time recommendations. Mamba, a recent advancement, has exhibited exceptional performance in time series prediction, significantly enhancing both efficiency and accuracy. However, integrating Mamba directly into SRS poses several challenges. Its inherently unidirectional nature may constrain the model's capacity to capture the full context of user-item interactions, while its instability in state estimation can compromise its ability to detect short-term patterns within interaction sequences. To overcome these issues, we introduce a new framework named Selective Gated Mamba (SIGMA) for Sequential Recommendation. This framework leverages a Partially Flipped Mamba (PF-Mamba) to construct a bidirectional architecture specifically tailored to improve contextual modeling. Additionally, an input-sensitive Dense Selective Gate (DS Gate) is employed to optimize directional weights and enhance the processing of sequential information in PF-Mamba. For short sequence modeling, we have also developed a Feature Extract GRU (FE-GRU) to efficiently capture short-term dependencies. Empirical results indicate that SIGMA outperforms current models on five real-world datasets. Our implementation code is available at https://github.com/ziwliu-cityu/SIMGA to ease reproducibility.
- Published: 2024
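The bidirectional idea in the SIGMA abstract — one causal pass over the sequence, one over a flipped copy, combined per step by an input-sensitive gate — can be illustrated with a toy scan. The `scan` function is only a stand-in for a Mamba block, and the sigmoid gate is a guess at the paper's DS Gate; all names here are hypothetical.

```python
import numpy as np

def scan(x, decay=0.8):
    """Toy causal state-space scan: h_t = decay * h_{t-1} + x_t.
    A stand-in for a Mamba block (the real model uses selective SSMs)."""
    h = np.zeros_like(x, dtype=float)
    prev = np.zeros(x.shape[-1])
    for t in range(x.shape[0]):
        prev = decay * prev + x[t]
        h[t] = prev
    return h

def bidirectional_gated(x, w_gate):
    """Partially-flipped bidirectional scan with an input-sensitive gate.

    The forward pass sees the past, the flipped pass sees the future;
    a sigmoid gate mixes the two per timestep and channel.
    """
    fwd = scan(x)
    bwd = scan(x[::-1])[::-1]                    # flip, scan, flip back
    gate = 1.0 / (1.0 + np.exp(-(x @ w_gate)))   # in (0, 1)
    return gate * fwd + (1.0 - gate) * bwd
```

With a zero gate weight the sigmoid is 0.5 everywhere, so the output is the plain average of the forward and backward scans.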
8. An Empirical Examination of Balancing Strategy for Counterfactual Estimation on Time Series
- Authors: Huang, Qiang, Meng, Chuizheng, Cao, Defu, Huang, Biwei, Chang, Yi, and Liu, Yan
- Subjects: Computer Science - Machine Learning
- Abstract:
Counterfactual estimation from observations represents a critical endeavor in numerous application fields, such as healthcare and finance, with the primary challenge being the mitigation of treatment bias. The balancing strategy, which aims to reduce covariate disparities between different treatment groups, serves as a universal solution. However, for time series data the effectiveness of balancing strategies remains an open question, and a thorough analysis of their robustness and applicability is still lacking. This paper revisits counterfactual estimation in the temporal setting and provides a brief overview of recent advancements in balancing strategies. More importantly, we conduct a critical empirical examination of the effectiveness of balancing strategies for temporal counterfactual estimation in various settings on multiple datasets. Our findings could be of significant interest to researchers and practitioners and call for a reexamination of the balancing strategy in time series settings., Comment: ICML 2024 Camera Ready Version. 20 Pages, 12 Figures, 10 Tables
- Published: 2024
9. CoSEC: A Coaxial Stereo Event Camera Dataset for Autonomous Driving
- Authors: Peng, Shihan, Zhou, Hanyu, Dong, Hao, Shi, Zhiwei, Liu, Haoyue, Duan, Yuxing, Chang, Yi, and Yan, Luxin
- Subjects: Computer Science - Computer Vision and Pattern Recognition
- Abstract:
The conventional frame camera is the mainstream sensor for autonomous driving scene perception, but it is limited in adverse conditions such as low light. Event cameras, with their high dynamic range, have been applied to assist frame cameras in multimodal fusion, which relies heavily on pixel-level spatial alignment between the modalities. Existing multimodal datasets typically place the event and frame cameras in parallel and align them spatially via a warping operation. However, this parallel strategy is less effective for multimodal fusion, since the large event-frame baseline produces large disparities that exacerbate spatial misalignment. We argue that minimizing the baseline can reduce the alignment error between event and frame cameras. In this work, we introduce a hybrid coaxial event-frame device to build a multimodal system, and propose a coaxial stereo event camera (CoSEC) dataset for autonomous driving. For the multimodal system, we first use a microcontroller to achieve time synchronization and then spatially calibrate the different sensors, performing intra- and inter-calibration of the stereo coaxial devices. For the multimodal dataset, we filter LiDAR point clouds to generate depth and optical flow labels using reference depth, which is further improved by fusing aligned event and frame data in nighttime conditions. With the help of the coaxial device, the proposed dataset can promote all-day pixel-level multimodal fusion. Moreover, we conduct experiments demonstrating that the proposed dataset improves the performance and generalization of multimodal fusion., Comment: This work has been submitted to the IEEE for possible publication
- Published: 2024
10. Bias-Aware Low-Rank Adaptation: Mitigating Catastrophic Inheritance of Large Language Models
- Authors: Chang, Yupeng, Chang, Yi, and Wu, Yuan
- Subjects: Computer Science - Computation and Language; Computer Science - Machine Learning
- Abstract:
Large language models (LLMs) have exhibited remarkable proficiency across a diverse array of natural language processing (NLP) tasks. However, adapting LLMs to downstream applications typically necessitates computationally intensive and memory-demanding fine-tuning procedures. To mitigate these burdens, parameter-efficient fine-tuning (PEFT) techniques have emerged as a promising approach to tailor LLMs with minimal computational overhead. While PEFT methods offer substantial advantages, they do not fully address the pervasive issue of bias propagation from pre-training data. In this work, we introduce Bias-Aware Low-Rank Adaptation (BA-LoRA), a novel PEFT method designed to counteract bias inheritance. BA-LoRA incorporates three distinct regularization terms: (1) consistency regularizer, (2) diversity regularizer, and (3) singular vector decomposition regularizer. These regularizers collectively aim to improve the generative models' consistency, diversity, and generalization capabilities during the fine-tuning process. Through extensive experiments on a variety of natural language understanding (NLU) and natural language generation (NLG) tasks, employing prominent LLMs such as LLaMA, Mistral, and Gemma, we demonstrate that BA-LoRA surpasses the performance of LoRA and its state-of-the-art variants. Moreover, our method effectively mitigates the deleterious effects of pre-training bias, leading to more reliable and robust model outputs. The code is available at https://github.com/cyp-jlu-ai/BA-LoRA., Comment: Work in progress
- Published: 2024
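The BA-LoRA abstract names three regularizers on top of a standard low-rank update. Below is a rough sketch of how such penalties might look, based only on our reading of the abstract: the exact regularizer forms in the paper may differ, and every function name is illustrative.

```python
import numpy as np

def lora_forward(x, W0, A, B):
    """LoRA: frozen base weight W0 plus a low-rank update B @ A."""
    return x @ (W0 + B @ A).T

def ba_lora_regularizers(h_base, h_tuned, BA):
    """Sketch of three BA-LoRA-style penalties (assumed forms):
    - consistency: keep tuned features close to the pre-trained ones
    - diversity: discourage collapsed (highly correlated) features
    - svd: concentrate the low-rank update on its top singular direction
    """
    consistency = np.mean((h_tuned - h_base) ** 2)
    hc = h_tuned - h_tuned.mean(axis=0, keepdims=True)
    cov = hc.T @ hc / max(len(h_tuned) - 1, 1)
    off_diag = cov - np.diag(np.diag(cov))
    diversity = np.mean(off_diag ** 2)          # penalize cross-feature correlation
    s = np.linalg.svd(BA, compute_uv=False)
    svd_reg = 1.0 - s[0] / (s.sum() + 1e-8)     # 0 when the update is rank one
    return consistency, diversity, svd_reg
```

The total fine-tuning loss would then be the task loss plus a weighted sum of these three terms.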
11. PiCoGen2: Piano cover generation with transfer learning approach and weakly aligned data
- Authors: Tan, Chih-Pin, Ai, Hsin, Chang, Yi-Hsin, Guan, Shuen-Huei, and Yang, Yi-Hsuan
- Subjects: Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract:
Piano cover generation aims to create a piano cover from a pop song. Existing approaches mainly employ supervised learning and the training demands strongly-aligned and paired song-to-piano data, which is built by remapping piano notes to song audio. This would, however, result in the loss of piano information and accordingly cause inconsistencies between the original and remapped piano versions. To overcome this limitation, we propose a transfer learning approach that pre-trains our model on piano-only data and fine-tunes it on weakly-aligned paired data constructed without note remapping. During pre-training, to guide the model to learn piano composition concepts instead of merely transcribing audio, we use an existing lead sheet transcription model as the encoder to extract high-level features from the piano recordings. The pre-trained model is then fine-tuned on the paired song-piano data to transfer the learned composition knowledge to the pop song domain. Our evaluation shows that this training strategy enables our model, named PiCoGen2, to attain high-quality results, outperforming baselines on both objective and subjective metrics across five pop genres., Comment: Accepted at the 25th International Society for Music Information Retrieval Conference (ISMIR), 2024
- Published: 2024
12. Learning on Graphs with Large Language Models (LLMs): A Deep Dive into Model Robustness
- Authors: Guo, Kai, Liu, Zewen, Chen, Zhikai, Wen, Hongzhi, Jin, Wei, Tang, Jiliang, and Chang, Yi
- Subjects: Computer Science - Machine Learning; Computer Science - Artificial Intelligence
- Abstract:
Large Language Models (LLMs) have demonstrated remarkable performance across various natural language processing tasks. Recently, several LLM-based pipelines have been developed to enhance learning on graphs with text attributes, showcasing promising performance. However, graphs are well known to be susceptible to adversarial attacks, and it remains unclear whether LLMs exhibit robustness in learning on graphs. To address this gap, our work explores the potential of LLMs in the context of adversarial attacks on graphs. Specifically, we investigate the robustness against graph structural and textual perturbations along two dimensions: LLMs-as-Enhancers and LLMs-as-Predictors. Through extensive experiments, we find that, compared to shallow models, both LLMs-as-Enhancers and LLMs-as-Predictors offer superior robustness against structural and textual attacks. Based on these findings, we carried out additional analyses to investigate the underlying causes. Furthermore, we have made our benchmark library openly available to facilitate quick and fair evaluations, and to encourage ongoing innovative research in this field.
- Published: 2024
13. Long-range Turbulence Mitigation: A Large-scale Dataset and A Coarse-to-fine Framework
- Authors: Xu, Shengqi, Sun, Run, Chang, Yi, Cao, Shuning, Xiao, Xueyao, and Yan, Luxin
- Subjects: Computer Science - Computer Vision and Pattern Recognition
- Abstract:
Long-range imaging inevitably suffers from atmospheric turbulence, which introduces severe geometric distortions due to the random refraction of light; the greater the distance, the more severe the disturbance. Although existing research has made great progress in tackling short-range turbulence, less attention has been paid to long-range turbulence with significant distortions. To address this dilemma and advance the field, we construct a large-scale real long-range atmospheric turbulence dataset (RLR-AT), including 1500 turbulence sequences spanning distances from 1 km to 13 km. Compared to existing datasets, RLR-AT offers turbulence at longer distances with higher diversity, and scenes of greater variety and larger scale. Moreover, most existing work adopts either registration-based or decomposition-based methods that address distortions through one-step mitigation. However, these fail to handle long-range turbulence effectively because of its significant pixel displacements. In this work, we propose a coarse-to-fine framework that handles severe distortions by cooperating dynamic turbulence and static background priors (CDSP). On the one hand, we discover the pixel-motion statistical prior of turbulence and propose a frequency-aware reference frame for better large-scale distortion registration, greatly reducing the burden of refinement. On the other hand, we exploit the static prior of the background and propose a subspace-based low-rank tensor refinement model that eliminates the misalignments inevitably left by registration while preserving details. The dynamic and static priors complement each other, allowing us to progressively mitigate long-range turbulence with severe distortions. Extensive experiments demonstrate that the proposed method outperforms SOTA methods on different datasets., Comment: This paper is accepted by ECCV 2024
- Published: 2024
14. PTaRL: Prototype-based Tabular Representation Learning via Space Calibration
- Authors: Ye, Hangting, Fan, Wei, Song, Xiaozhuang, Zheng, Shun, Zhao, He, Guo, Dandan, and Chang, Yi
- Subjects: Computer Science - Machine Learning
- Abstract:
Tabular data play a crucial role in diverse real-world fields, such as healthcare, engineering, and finance. With the recent success of deep learning, many deep-network-based tabular machine learning (ML) methods (e.g., Transformer, ResNet) have achieved competitive performance on tabular benchmarks. However, existing deep tabular ML methods suffer from representation entanglement and localization, which largely hinder their prediction performance and lead to inconsistent performance across tabular tasks. To overcome these problems, we explore the novel direction of applying prototype learning to tabular ML and propose PTaRL, a prototype-based tabular representation learning framework for tabular prediction tasks. The core idea of PTaRL is to construct a prototype-based projection space (P-Space) and learn disentangled representations around global data prototypes. Specifically, PTaRL involves two stages: (i) Prototype Generation, which constructs global prototypes as the basis vectors of P-Space; and (ii) Prototype Projection, which projects data samples into P-Space while preserving the core global data information via Optimal Transport. Then, to further acquire disentangled representations, we constrain PTaRL with two strategies: (i) to diversify the coordinates of different representations toward the global prototypes within P-Space, we introduce a diversification constraint for representation calibration; and (ii) to avoid prototype entanglement in P-Space, we introduce a matrix orthogonalization constraint to ensure the independence of the global prototypes. Finally, we conduct extensive experiments coupling PTaRL with state-of-the-art deep tabular ML models on various tabular benchmarks, and the results show our consistent superiority., Comment: Accepted by ICLR 2024
- Published: 2024
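The two-stage structure in the PTaRL abstract — represent each sample by its coordinates over a set of global prototypes, while keeping the prototypes independent — can be sketched as follows. We substitute a simple least-squares projection for the paper's Optimal Transport step, so this illustrates the structure, not the method itself; all names are ours.

```python
import numpy as np

def project_to_pspace(h, prototypes):
    """Coordinates of representations h in the prototype space (P-Space).

    A least-squares sketch: solve coords @ prototypes ~= h, i.e. express
    each sample as a combination of the global prototype vectors.
    """
    coords, *_ = np.linalg.lstsq(prototypes.T, h.T, rcond=None)
    return coords.T

def orthogonality_penalty(prototypes):
    """Matrix-orthogonalization constraint: push the prototype Gram
    matrix toward the identity so prototypes stay independent."""
    P = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    G = P @ P.T
    return np.sum((G - np.eye(len(P))) ** 2)
```

During training, the penalty would be added to the task loss so gradient descent keeps the learned prototypes near-orthogonal.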
15. Reverse Engineering the Fly Brain Using FlyCircuit Database
- Authors: Ching, Yu-Tai, Cho, Chin-Ping, Tang, Fu-Kai, Chang, Yi-Chiun, Cheng, Chang-Chieh, He, Guan-Wei, Chang, Ann-Shyn, and Chuang, Chaochun
- Subjects: Quantitative Biology - Neurons and Cognition
- Abstract:
A method for reverse engineering a fly brain using the {\it FlyCircuit} database is presented. The method is based on the assumption that similar neurons could serve identical functions, so we cluster neurons by their pairwise similarity. The procedure partitions the neurons in the database into groups and then assembles the groups into potential modules. Some of the resulting modules correspond to known neuropils, including the medulla. The same clustering algorithm was applied to analyze the medulla's structure. Another possible application of the clustering result is to study the brain-wide neuron connectome by examining the connectivity between groups of neurons.
- Published: 2024
16. FANFOLD: Graph Normalizing Flows-driven Asymmetric Network for Unsupervised Graph-Level Anomaly Detection
- Authors: Cao, Rui, Xue, Shijie, Li, Jindong, Wang, Qi, and Chang, Yi
- Subjects: Computer Science - Machine Learning; Computer Science - Artificial Intelligence
- Abstract:
Unsupervised graph-level anomaly detection (UGAD) has attracted increasing interest due to its widespread application. In recent studies, knowledge distillation-based methods have been widely used in unsupervised anomaly detection to improve model efficiency and generalization. However, the inherent symmetry between the source (teacher) and target (student) networks typically results in consistent outputs across both architectures, making it difficult to distinguish abnormal graphs from normal graphs. Also, existing methods mainly rely on graph features to distinguish anomalies, which may be unstable with complex and diverse data and fail to capture the essence that differentiates normal graphs from abnormal ones. In this work, we propose a Graph Normalizing Flows-driven Asymmetric Network For Unsupervised Graph-Level Anomaly Detection (FANFOLD in short). We introduce normalizing flows to unsupervised graph-level anomaly detection due to their successful application and superior quality in learning the underlying distribution of samples. Specifically, we adopt the knowledge distillation technique and apply normalizing flows on the source network, achieving the asymmetric network. In the training stage, FANFOLD transforms the original distribution of normal graphs to a standard normal distribution. During inference, FANFOLD computes the anomaly score using the source-target loss to discriminate between normal and anomalous graphs. We conduct extensive experiments on 15 datasets of different fields with 9 baseline methods to validate the superiority of FANFOLD.
- Published: 2024
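The normalizing-flow scoring idea in the FANFOLD abstract — transform normal training samples toward a standard normal, then score test samples by how unlikely they are under that distribution — can be shown with a one-layer affine flow. This stands in for the paper's graph normalizing flows and teacher-student setup; the function names and the per-dimension affine form are our simplifications.

```python
import numpy as np

def fit_affine_flow(normal_feats):
    """Fit a per-dimension affine flow z = (x - mu) / sigma that maps
    normal training features to an (approximately) standard normal.
    A one-layer stand-in for a learned normalizing flow."""
    mu = normal_feats.mean(axis=0)
    sigma = normal_feats.std(axis=0) + 1e-8
    return mu, sigma

def anomaly_score(x, mu, sigma):
    """Negative log-density (up to a constant) under N(0, I) after the
    flow: normal samples land near the origin and score low, while
    anomalies map far out and score high."""
    z = (x - mu) / sigma
    log_det = np.sum(np.log(sigma))  # change-of-variables correction
    return 0.5 * np.sum(z ** 2, axis=-1) + log_det
```

At inference time, graphs whose score exceeds a threshold chosen on validation data would be flagged as anomalous.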
17. Double Momentum Method for Lower-Level Constrained Bilevel Optimization
- Authors: Shi, Wanli, Chang, Yi, and Gu, Bin
- Subjects: Mathematics - Optimization and Control; Computer Science - Artificial Intelligence; Computer Science - Machine Learning
- Abstract:
Bilevel optimization (BO) has recently gained prominence in many machine learning applications due to its ability to capture the nested structure inherent in these problems. Many hypergradient methods have been proposed as effective solutions for large-scale problems. However, current hypergradient methods for lower-level constrained bilevel optimization (LCBO) require very restrictive assumptions, namely that the optimality conditions satisfy differentiability and invertibility conditions, and they lack a solid analysis of the convergence rate. Worse, existing methods require double-loop updates, which are sometimes less efficient. To solve this problem, we propose a new hypergradient for LCBO that leverages the nonsmooth implicit function theorem instead of these restrictive assumptions. In addition, we propose a \textit{single-loop single-timescale} algorithm based on the double-momentum method and an adaptive step-size method, and prove that it returns a $(\delta, \epsilon)$-stationary point within $\tilde{\mathcal{O}}(d_2^2\epsilon^{-4})$ iterations. Experiments on two applications demonstrate the effectiveness of the proposed method., Comment: 27 pages, 9 figures
- Published: 2024
18. Speech Emotion Recognition under Resource Constraints with Data Distillation
- Authors: Chang, Yi, Ren, Zhao, Zhao, Zhonghao, Nguyen, Thanh Tam, Qian, Kun, Schultz, Tanja, and Schuller, Björn W.
- Subjects: Computer Science - Sound; Computer Science - Artificial Intelligence; Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract:
Speech emotion recognition (SER) plays a crucial role in human-computer interaction. The emergence of edge devices in the Internet of Things (IoT) presents challenges in constructing intricate deep learning models due to constraints in memory and computational resources. Moreover, emotional speech data often contains private information, raising concerns about privacy leakage during the deployment of SER models. To address these challenges, we propose a data distillation framework to facilitate efficient development of SER models in IoT applications using a synthesised, smaller, and distilled dataset. Our experiments demonstrate that the distilled dataset can be effectively utilised to train SER models with fixed initialisation, achieving performances comparable to those developed using the original full emotional speech dataset.
- Published: 2024
19. SocialNLP Fake-EmoReact 2021 Challenge Overview: Predicting Fake Tweets from Their Replies and GIFs
- Authors: Huang, Chien-Kun, Chang, Yi-Ting, Ku, Lun-Wei, Li, Cheng-Te, and Shuai, Hong-Han
- Subjects: Computer Science - Computation and Language; Computer Science - Artificial Intelligence; Computer Science - Computers and Society
- Abstract:
This paper provides an overview of the Fake-EmoReact 2021 Challenge, held at the 9th SocialNLP Workshop in conjunction with NAACL 2021. The challenge requires predicting the authenticity of tweets using reply context and augmented GIF categories from the EmotionGIF dataset. We offer the Fake-EmoReact dataset, with more than 453k tweets as the experimental material, where every tweet is labeled for authenticity. Twenty-four teams registered to participate in this challenge, and 5 successfully submitted their results in the evaluation phase. The best team achieves an F1 score of 93.9 on the Fake-EmoReact 2021 dataset. In addition, we describe the shared task definition, the data collection, and the participating teams' performance and approaches.
- Published: 2024
20. LED: A Large-scale Real-world Paired Dataset for Event Camera Denoising
- Authors: Duan, Yuxing, Peng, Shihan, Zhu, Lin, Zhang, Wei, Chang, Yi, Zhong, Sheng, and Yan, Luxin
- Subjects: Computer Science - Computer Vision and Pattern Recognition
- Abstract:
Event cameras have significant advantages in capturing dynamic scene information but are prone to noise interference, particularly in challenging conditions like low threshold and low illumination. However, most existing research focuses on gentle situations, hindering event camera applications in realistic complex scenarios. To tackle this limitation and advance the field, we construct a new paired real-world event denoising dataset (LED), including 3K sequences with 18K seconds of high-resolution (1200*680) event streams, showing three notable distinctions compared to others: diverse noise levels and scenes, larger scale with high resolution, and high-quality ground truth. Specifically, it contains stepped parameters and varying illumination across diverse scenarios. Moreover, based on the inconsistency of noise events and the consistency of signal events, we propose a novel, effective denoising framework (DED) that uses homogeneous dual events to generate ground truth that better separates noise from the raw stream. Furthermore, we design a bio-inspired baseline leveraging Leaky Integrate-and-Fire (LIF) neurons with dynamic thresholds to realize accurate denoising. The experimental results demonstrate the remarkable performance of the proposed approach on different datasets. The dataset and code are at https://github.com/Yee-Sing/led., Comment: Accepted by CVPR 2024
- Published
- 2024
21. Language Models can Evaluate Themselves via Probability Discrepancy
- Author
-
Xia, Tingyu, Yu, Bowen, Wu, Yuan, Chang, Yi, and Zhou, Chang
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
In this paper, we initiate our discussion by demonstrating how Large Language Models (LLMs), when tasked with responding to queries, display a more even probability distribution in their answers if they are more adept, as opposed to their less skilled counterparts. Expanding on this foundational insight, we propose a new self-evaluation method ProbDiff for assessing the efficacy of various LLMs. This approach obviates the necessity for an additional evaluation model or the dependence on external, proprietary models like GPT-4 for judgment. It uniquely utilizes the LLMs being tested to compute the probability discrepancy between the initial response and its revised versions. A higher discrepancy for a given query between two LLMs indicates a relatively weaker capability. Our findings reveal that ProbDiff achieves results on par with those obtained from evaluations based on GPT-4, spanning a range of scenarios that include natural language generation (NLG) tasks such as translation, summarization, and our proposed Xiaohongshu blog writing task, and benchmarks for LLM evaluation like AlignBench, MT-Bench, and AlpacaEval, across LLMs of varying magnitudes., Comment: ACL 2024 Findings
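The probability-discrepancy idea described in this abstract can be sketched in a few lines. Below is a toy illustration only: the token probabilities, the mean-log-probability scoring, and the absolute-difference discrepancy are illustrative assumptions, not the authors' actual ProbDiff implementation, which queries real LLMs.

```python
# Toy sketch of the ProbDiff self-evaluation idea: score a model by the
# probability discrepancy between its initial answer and a revised answer.
# The probability lists below are hypothetical stand-ins for the token
# probabilities an actual LLM would assign.

import math

def avg_log_prob(token_probs):
    """Mean log-probability a model assigns to a response's tokens."""
    return sum(math.log(p) for p in token_probs) / len(token_probs)

def prob_discrepancy(initial_probs, revised_probs):
    """ProbDiff-style score: how much the model's confidence shifts
    between its initial response and a revision of it.  A larger
    discrepancy on a query suggests a relatively weaker model."""
    return abs(avg_log_prob(initial_probs) - avg_log_prob(revised_probs))

# A stronger model stays confident across revisions...
strong = prob_discrepancy([0.9, 0.8, 0.85], [0.88, 0.82, 0.8])
# ...while a weaker model's confidence swings widely.
weak = prob_discrepancy([0.9, 0.2, 0.7], [0.3, 0.8, 0.1])
assert weak > strong
```

In this sketch, ranking models by `prob_discrepancy` plays the role that an external judge like GPT-4 would otherwise play.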
- Published
- 2024
22. Low-Distortion Clustering in Bounded Growth Graphs
- Author
-
Chang, Yi-Jun, Dani, Varsha, and Hayes, Thomas P.
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing ,Computer Science - Data Structures and Algorithms - Abstract
The well-known clustering algorithm of Miller, Peng, and Xu (SPAA 2013) is useful for many applications, including low-diameter decomposition and low-energy distributed algorithms. One nice property of their clustering, shown in previous work by Chang, Dani, Hayes, and Pettie (PODC 2020), is that distances in the cluster graph are rescaled versions of distances in the original graph, up to an $O(\log n)$ distortion factor and rounding issues. Minimizing this distortion factor is important for efficiency in computing the clustering, as well as in further applications, once the clustering has been constructed. We prove that there exist graphs for which an $\Omega((\log n)^{1/3})$ distortion factor is necessary for any clustering. We also consider a class of nice graphs which we call uniformly bounded independence graphs. These include, for example, paths, lattice graphs, and "dense" unit disk graphs. For these graphs, we prove that clusterings of constant distortion always exist, and moreover, we give an efficient distributed algorithm to construct them. Our clustering algorithm is based on Voronoi cells centered at the vertices of a maximal independent set in a suitable power graph. Applications of our new clustering include low-energy simulation of distributed algorithms in the LOCAL, CONGEST, and RADIO-CONGEST models, as well as efficient approximate solutions to distributed combinatorial optimization problems. We complement these results with matching or nearly matching lower bounds.
- Published
- 2024
23. Explainable Fake News Detection With Large Language Model via Defense Among Competing Wisdom
- Author
-
Wang, Bo, Ma, Jing, Lin, Hongzhan, Yang, Zhiwei, Yang, Ruichao, Tian, Yuan, and Chang, Yi
- Subjects
Computer Science - Computation and Language - Abstract
Most fake news detection methods learn latent feature representations based on neural networks, which makes them black boxes that classify a piece of news without giving any justification. Existing explainable systems generate veracity justifications from investigative journalism, which suffer from delayed debunking and low efficiency. Recent studies simply assume that the justification is equivalent to the majority opinions expressed in the wisdom of crowds. However, the opinions typically contain some inaccurate or biased information since the wisdom of crowds is uncensored. To detect fake news from a sea of diverse, crowded and even competing narratives, in this paper, we propose a novel defense-based explainable fake news detection framework. Specifically, we first propose an evidence extraction module to split the wisdom of crowds into two competing parties and respectively detect salient evidence. To gain concise insights from the evidence, we then design a prompt-based module that utilizes a large language model to generate justifications by inferring reasons towards two possible veracities. Finally, we propose a defense-based inference module to determine veracity via modeling the defense among these justifications. Extensive experiments conducted on two real-world benchmarks demonstrate that our proposed method outperforms state-of-the-art baselines in terms of fake news detection and provides high-quality justifications., Comment: 12 pages, WWW'2024
- Published
- 2024
24. Deterministic Expander Routing: Faster and More Versatile
- Author
-
Chang, Yi-Jun, Huang, Shang-En, and Su, Hsin-Hao
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing ,Computer Science - Data Structures and Algorithms - Abstract
We consider the expander routing problem formulated by Ghaffari, Kuhn, and Su (PODC 2017), where the goal is to route all the tokens to their destinations given that each vertex is the source and the destination of at most $\deg(v)$ tokens. They developed $\textit{randomized algorithms}$ that solve this problem in $\text{poly}(\phi^{-1}) \cdot 2^{O(\sqrt{\log n \log \log n})}$ rounds in the $\textsf{CONGEST}$ model, where $\phi$ is the conductance of the graph. Later, Ghaffari and Li (DISC 2018) gave an improved algorithm. However, both algorithms are randomized, which means that all the resulting applications are also randomized. Recently, Chang and Saranurak (FOCS 2020) gave a deterministic algorithm that solves an expander routing instance in $2^{O(\log^{2/3} n \cdot \log^{1/3} \log n)}$ rounds. The deterministic algorithm is less efficient and does not allow preprocessing/query tradeoffs, which precludes the de-randomization of algorithms that require this feature, such as the $k$-clique enumeration algorithm in general graphs. The main contribution of our work is a new deterministic expander routing algorithm that not only matches the randomized bound of [GKS 2017] but also allows preprocessing/query tradeoffs. Our algorithm solves a single instance of routing query in $2^{{O}(\sqrt{\log n \cdot \log \log n})}$ rounds. Our algorithm achieves the following preprocessing and query tradeoffs: For $0 < \epsilon < 1$, we can answer every routing query in $\log^{O(1/\epsilon)} n$ rounds at the cost of a $(n^{O(\epsilon)} + \log^{O(1/\epsilon)} n)$-round preprocessing procedure. Combining this with the approach of Censor-Hillel, Leitersdorf, and Vulakh (PODC 2022), we obtain a near-optimal $\tilde{O}(n^{1-2/k})$-round deterministic algorithm for $k$-clique enumeration in general graphs, improving the previous state-of-the-art $n^{1-2/k+o(1)}$., Comment: Accepted to PODC 2024
- Published
- 2024
25. NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional Stimuli
- Author
-
Wang, Xu, Li, Cheng, Chang, Yi, Wang, Jindong, and Wu, Yuan
- Subjects
Computer Science - Computation and Language - Abstract
Large Language Models (LLMs) have become integral to a wide spectrum of applications, ranging from traditional computing tasks to advanced artificial intelligence (AI) applications. This widespread adoption has spurred extensive research into LLMs across various disciplines, including the social sciences. Notably, studies have revealed that LLMs possess emotional intelligence, which can be further developed through positive emotional stimuli. This discovery raises an intriguing question: can negative emotions similarly influence LLMs, potentially enhancing their performance? In response to this question, we introduce NegativePrompt, a novel approach underpinned by psychological principles, involving ten specifically designed negative emotional stimuli. We embark on rigorous experimental evaluations of five LLMs including Flan-T5-Large, Vicuna, Llama 2, ChatGPT, and GPT-4, across a set of 45 tasks. The results are revealing: NegativePrompt markedly enhances the performance of LLMs, evidenced by relative improvements of 12.89% in Instruction Induction tasks and 46.25% in BIG-Bench tasks. Moreover, we conduct attention visualization experiments to decipher the underlying mechanisms of NegativePrompt's influence. Our research contributes significantly to the understanding of LLMs and emotion interaction, demonstrating the practical efficacy of NegativePrompt as an emotion-driven method and offering novel insights for the enhancement of LLMs in real-world applications. The code is available at https://github.com/wangxu0820/NegativePrompt., Comment: This paper has been accepted by IJCAI 2024
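The prompting mechanism described above is simple to illustrate: a negative emotional stimulus is attached to the task prompt before querying the LLM. The stimulus texts and helper function below are hypothetical placeholders for illustration only, not the paper's ten designed stimuli or its evaluation code.

```python
# Illustrative sketch of the NegativePrompt idea: attach a negative
# emotional stimulus to a task prompt before sending it to an LLM.
# These stimulus strings are invented examples, not the paper's stimuli.

NEGATIVE_STIMULI = [
    "If you fail at this task, the consequences will be serious.",
    "Previous attempts at this were disappointing; do not repeat them.",
]

def build_negative_prompt(task: str, stimulus_id: int = 0) -> str:
    """Append a negative emotional stimulus to the task prompt."""
    return f"{task}\n{NEGATIVE_STIMULI[stimulus_id]}"

prompt = build_negative_prompt("Translate 'bonjour' to English.")
assert prompt.startswith("Translate") and "consequences" in prompt
```

The resulting `prompt` would then be submitted to each evaluated model in place of the plain task prompt.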
- Published
- 2024
26. Improved All-Pairs Approximate Shortest Paths in Congested Clique
- Author
-
Bui, Hong Duc, Chandra, Shashwat, Chang, Yi-Jun, Dory, Michal, and Leitersdorf, Dean
- Subjects
Computer Science - Data Structures and Algorithms ,Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
In this paper, we present new algorithms for approximating All-Pairs Shortest Paths (APSP) in the Congested Clique model. We present randomized algorithms for weighted undirected graphs. Our first contribution is an $O(1)$-approximate APSP algorithm taking just $O(\log \log \log n)$ rounds. Prior to our work, the fastest algorithms that give an $O(1)$-approximation for APSP take $\operatorname{poly}(\log{n})$ rounds in weighted undirected graphs, and $\operatorname{poly}(\log \log n)$ rounds in unweighted undirected graphs. If we terminate the execution of the algorithm early, we obtain an $O(t)$-round algorithm that yields an $O \big( (\log n)^{1/2^t} \big) $ distance approximation for a parameter $t$. The trade-off between $t$ and the approximation quality provides flexibility for different scenarios, allowing the algorithm to adapt to specific requirements. In particular, we can get an $O \big( (\log n)^{1/2^t} \big) $-approximation for any constant $t$ in $O(1)$ rounds. Such a result was previously known only for the special case that $t=0$. A key ingredient in our algorithm is a lemma that allows us to improve an $O(a)$-approximation for APSP to an $O(\sqrt{a})$-approximation for APSP in $O(1)$ rounds. To prove the lemma, we develop several new tools, including $O(1)$-round algorithms for computing the $k$ closest nodes, a certain type of hopset, and skeleton graphs.
- Published
- 2024
27. CVTGAD: Simplified Transformer with Cross-View Attention for Unsupervised Graph-level Anomaly Detection
- Author
-
Li, Jindong, Xing, Qianli, Wang, Qi, and Chang, Yi
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Unsupervised graph-level anomaly detection (UGAD) has achieved remarkable performance in various critical disciplines, such as chemistry analysis and bioinformatics. Existing UGAD paradigms often adopt data augmentation techniques to construct multiple views, and then employ different strategies to obtain representations from different views for jointly conducting UGAD. However, most previous works only considered the relationship between nodes/graphs from a limited receptive field, resulting in some key structure patterns and feature information being neglected. In addition, most existing methods consider different views separately in a parallel manner, which is not able to explore the inter-relationship across different views directly. Thus, a method with a larger receptive field that can explore the inter-relationship across different views directly is needed. In this paper, we propose a novel Simplified Transformer with Cross-View Attention for Unsupervised Graph-level Anomaly Detection, namely, CVTGAD. To increase the receptive field, we construct a simplified transformer-based module, exploiting the relationship between nodes/graphs from both intra-graph and inter-graph perspectives. Furthermore, we design a cross-view attention mechanism to directly exploit the view co-occurrence between different views, bridging the inter-view gap at the node level and graph level. To the best of our knowledge, this is the first work to apply transformers and cross attention to UGAD, enabling graph neural networks and transformers to work collaboratively. Extensive experiments on 15 real-world datasets from 3 fields demonstrate the superiority of CVTGAD on the UGAD task. The code is available at \url{https://github.com/jindongli-Ai/CVTGAD}.
- Published
- 2024
28. Validation of human telomere length multi-ancestry meta-analysis association signals identifies POP5 and KBTBD6 as human telomere length regulation genes
- Author
-
Keener, Rebecca, Chhetri, Surya B, Connelly, Carla J, Taub, Margaret A, Conomos, Matthew P, Weinstock, Joshua, Ni, Bohan, Strober, Benjamin, Aslibekyan, Stella, Auer, Paul L, Barwick, Lucas, Becker, Lewis C, Blangero, John, Bleecker, Eugene R, Brody, Jennifer A, Cade, Brian E, Celedon, Juan C, Chang, Yi-Cheng, Cupples, L Adrienne, Custer, Brian, Freedman, Barry I, Gladwin, Mark T, Heckbert, Susan R, Hou, Lifang, Irvin, Marguerite R, Isasi, Carmen R, Johnsen, Jill M, Kenny, Eimear E, Kooperberg, Charles, Minster, Ryan L, Naseri, Take, Viali, Satupa’itea, Nekhai, Sergei, Pankratz, Nathan, Peyser, Patricia A, Taylor, Kent D, Telen, Marilyn J, Wu, Baojun, Yanek, Lisa R, Yang, Ivana V, Albert, Christine, Arnett, Donna K, Ashley-Koch, Allison E, Barnes, Kathleen C, Bis, Joshua C, Blackwell, Thomas W, Boerwinkle, Eric, Burchard, Esteban G, Carson, April P, Chen, Zhanghua, Chen, Yii-Der Ida, Darbar, Dawood, de Andrade, Mariza, Ellinor, Patrick T, Fornage, Myriam, Gelb, Bruce D, Gilliland, Frank D, He, Jiang, Islam, Talat, Kaab, Stefan, Kardia, Sharon LR, Kelly, Shannon, Konkle, Barbara A, Kumar, Rajesh, Loos, Ruth JF, Martinez, Fernando D, McGarvey, Stephen T, Meyers, Deborah A, Mitchell, Braxton D, Montgomery, Courtney G, North, Kari E, Palmer, Nicholette D, Peralta, Juan M, Raby, Benjamin A, Redline, Susan, Rich, Stephen S, Roden, Dan, Rotter, Jerome I, Ruczinski, Ingo, Schwartz, David, Sciurba, Frank, Shoemaker, M Benjamin, Silverman, Edwin K, Sinner, Moritz F, Smith, Nicholas L, Smith, Albert V, Tiwari, Hemant K, Vasan, Ramachandran S, Weiss, Scott T, Williams, L Keoki, Zhang, Yingze, Ziv, Elad, Raffield, Laura M, Reiner, Alexander P, Arvanitis, Marios, Greider, Carol W, Mathias, Rasika A, and Battle, Alexis
- Subjects
Biological Sciences ,Genetics ,Human Genome ,Clinical Research ,Underpinning research ,2.1 Biological and endogenous factors ,Aetiology ,1.1 Normal biological development and functioning ,NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium ,TOPMed Hematology and Hemostasis Working Group ,TOPMed Structural Variation Working Group ,K562 Cells ,Telomere ,Humans ,Gene Expression Regulation ,Polymorphism ,Single Nucleotide ,Genome-Wide Association Study ,Telomere Homeostasis ,CRISPR-Cas Systems - Abstract
Genome-wide association studies (GWAS) have become well-powered to detect loci associated with telomere length. However, no prior work has validated genes nominated by GWAS to examine their role in telomere length regulation. We conducted a multi-ancestry meta-analysis of 211,369 individuals and identified five novel association signals. Enrichment analyses of chromatin state and cell-type heritability suggested that blood/immune cells are the most relevant cell type to examine telomere length association signals. We validated specific GWAS associations by overexpressing KBTBD6 or POP5 and demonstrated that both lengthened telomeres. CRISPR/Cas9 deletion of the predicted causal regions in K562 blood cells reduced expression of these genes, demonstrating that these loci are related to transcriptional regulation of KBTBD6 and POP5. Our results demonstrate the utility of telomere length GWAS in the identification of telomere length regulation mechanisms and validate KBTBD6 and POP5 as genes affecting telomere length regulation.
- Published
- 2024
29. William Carlos Williams Bibliography 2015–16
- Author
-
Chang, Yi-Ting and Broome, Judith
- Published
- 2017
30. Alien Capital: Asian Racialization and the Logic of Settler Colonial Capitalism by Iyko Day (review)
- Author
-
Chang, Yi-Ting
- Published
- 2018
31. Fast Broadcast in Highly Connected Networks
- Author
-
Chandra, Shashwat, Chang, Yi-Jun, Dory, Michal, Ghaffari, Mohsen, and Leitersdorf, Dean
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing ,Computer Science - Data Structures and Algorithms - Abstract
We revisit the classic broadcast problem, wherein we have $k$ messages, each composed of $O(\log{n})$ bits, distributed arbitrarily across a network. The objective is to broadcast these messages to all nodes in the network. In the distributed CONGEST model, a textbook algorithm solves this problem in $O(D+k)$ rounds, where $D$ is the diameter of the graph. While the $O(D)$ term in the round complexity is unavoidable$\unicode{x2014}$given that $\Omega(D)$ rounds are necessary to solve broadcast in any graph$\unicode{x2014}$it remains unclear whether the $O(k)$ term is needed in all graphs. In cases where the minimum cut size is one, simply transmitting messages from one side of the cut to the other would require $\Omega(k)$ rounds. However, if the size of the minimum cut is larger, it may be possible to develop faster algorithms. This motivates the exploration of the broadcast problem in networks with high edge connectivity. In this work, we present a simple randomized distributed algorithm for performing $k$-message broadcast in $O(((n+k)/\lambda)\log n)$ rounds in any $n$-node simple graph with edge connectivity $\lambda$. When $k = \Omega(n)$, our algorithm is universally optimal, up to an $O(\log n)$ factor, as its complexity nearly matches an information-theoretic $\Omega(k/\lambda)$ lower bound that applies to all graphs, even when the network topology is known to the algorithm. The setting $k = \Omega(n)$ is particularly interesting because several fundamental problems can be reduced to broadcasting $\Omega(n)$ messages. Our broadcast algorithm finds several applications in distributed computing, enabling $O(1)$-approximation for all distances and $(1+\epsilon)$-approximation for all cut sizes in $\tilde{O}(n/\lambda)$ rounds.
- Published
- 2024
32. Seeing Motion at Nighttime with an Event Camera
- Author
-
Liu, Haoyue, Peng, Shihan, Zhu, Lin, Chang, Yi, Zhou, Hanyu, and Yan, Luxin
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
We focus on a very challenging task: imaging nighttime dynamic scenes. Most previous methods rely on the low-light enhancement of a conventional RGB camera. However, they would inevitably face a dilemma between the long exposure time required at nighttime and the motion blur of dynamic scenes. Event cameras react to dynamic changes with higher temporal resolution (microseconds) and higher dynamic range (120 dB), offering an alternative solution. In this work, we present a novel nighttime dynamic imaging method with an event camera. Specifically, we discover that events at nighttime exhibit temporal trailing characteristics and a spatially non-stationary distribution. Consequently, we propose a nighttime event reconstruction network (NER-Net) which mainly includes a learnable event timestamp calibration module (LETC) to align the temporally trailing events and a non-uniform illumination aware module (NIAM) to stabilize the spatiotemporal distribution of events. Moreover, we construct a paired real low-light event dataset (RLED) through a co-axial imaging system, including 64,200 spatially and temporally aligned image GTs and low-light events. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art methods in terms of visual quality and generalization ability on real-world nighttime datasets. The project is available at: https://github.com/Liu-haoyue/NER-Net., Comment: Accepted by CVPR 2024
- Published
- 2024
33. GT-Rain Single Image Deraining Challenge Report
- Author
-
Zhang, Howard, Ba, Yunhao, Yang, Ethan, Upadhyay, Rishi, Wong, Alex, Kadambi, Achuta, Guo, Yun, Xiao, Xueyao, Wang, Xiaoxiong, Li, Yi, Chang, Yi, Yan, Luxin, Zheng, Chaochao, Wang, Luping, Liu, Bin, Khowaja, Sunder Ali, Yoon, Jiseok, Lee, Ik-Hyun, Zhang, Zhao, Wei, Yanyan, Ren, Jiahuan, Zhao, Suiyi, and Zheng, Huan
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
This report reviews the results of the GT-Rain challenge on single image deraining at the UG2+ workshop at CVPR 2023. The aim of this competition is to study the rainy weather phenomenon in real-world scenarios, provide a novel real-world rainy image dataset, and spark innovative ideas that will further the development of single image deraining methods on real images. Submissions were trained on the GT-Rain dataset and evaluated on an extension of the dataset consisting of 15 additional scenes. Scenes in GT-Rain comprise a real rainy image and a ground-truth image captured moments after the rain had stopped. 275 participants registered in the challenge and 55 competed in the final testing phase.
- Published
- 2024
34. Bring Event into RGB and LiDAR: Hierarchical Visual-Motion Fusion for Scene Flow
- Author
-
Zhou, Hanyu, Chang, Yi, Shi, Zhiwei, and Yan, Luxin
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Single RGB or LiDAR is the mainstream sensor for the challenging scene flow task, which relies heavily on visual features to match motion features. Compared with a single modality, existing methods adopt a fusion strategy to directly fuse the cross-modal complementary knowledge in motion space. However, these direct fusion methods may suffer from the modality gap due to the intrinsic heterogeneous visual nature of RGB and LiDAR, thus deteriorating motion features. We discover that event data has a homogeneous nature with RGB and LiDAR in both visual and motion spaces. In this work, we bring in the event camera as a bridge between RGB and LiDAR, and propose a novel hierarchical visual-motion fusion framework for scene flow, which explores a homogeneous space to fuse the cross-modal complementary knowledge for physical interpretation. In visual fusion, we discover that event data has a complementarity (relative vs. absolute) in luminance space with RGB for high dynamic imaging, and a complementarity (local boundary vs. global shape) in scene structure space with LiDAR for structure integrity. In motion fusion, we figure out that RGB, event and LiDAR are complementary (spatial-dense and temporal-dense vs. spatiotemporal-sparse) to each other in correlation space, which motivates us to fuse their motion correlations for motion continuity. The proposed hierarchical fusion can explicitly fuse the multimodal knowledge to progressively improve scene flow from visual space to motion space. Extensive experiments have been performed to verify the superiority of the proposed method.
- Published
- 2024
35. JSTR: Joint Spatio-Temporal Reasoning for Event-based Moving Object Detection
- Author
-
Zhou, Hanyu, Shi, Zhiwei, Dong, Hao, Peng, Shihan, Chang, Yi, and Yan, Luxin
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Event-based moving object detection is a challenging task, where the static background and the moving object are mixed together. Typically, existing methods mainly align the background events to the same spatial coordinate system via motion compensation to distinguish the moving object. However, they neglect the potential spatial tailing effect of moving object events caused by excessive motion, which may affect the structural integrity of the extracted moving object. We discover that the moving object has a complete columnar structure in the point cloud composed of motion-compensated events along the timestamp. Motivated by this, we propose a novel joint spatio-temporal reasoning method for event-based moving object detection. Specifically, we first compensate the motion of background events using an inertial measurement unit. In the spatial reasoning stage, we project the compensated events into the same image coordinates, discretize the timestamps of events to obtain a time image that reflects motion confidence, and further segment the moving object through an adaptive threshold on the time image. In the temporal reasoning stage, we construct the events into a point cloud along the timestamp, and use the RANSAC algorithm to extract the columnar shape in the cloud for peeling off the background. Finally, we fuse the results from the two reasoning stages to extract the final moving object region. This joint spatio-temporal reasoning framework can effectively detect the moving object from motion confidence and geometric structure. Moreover, we conduct extensive experiments on various datasets to verify that the proposed method can improve moving object detection accuracy by 13\%.
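The spatial-reasoning step in this abstract (a per-pixel "time image" followed by thresholding) can be sketched compactly. This is a minimal toy, assuming a simple `(x, y, t)` event format and a fixed rather than adaptive threshold; it is not the paper's implementation.

```python
# Minimal sketch of the time-image idea: project (motion-compensated)
# events onto an image grid, keep the latest normalized timestamp per
# pixel, and segment pixels dominated by recent events.  The event
# format and the fixed threshold are illustrative assumptions.

import numpy as np

def time_image(events, height, width):
    """events: array of (x, y, t) rows; returns the per-pixel latest
    timestamp, normalized to [0, 1]."""
    img = np.zeros((height, width))
    t_min, t_max = events[:, 2].min(), events[:, 2].max()
    for x, y, t in events:
        norm = (t - t_min) / (t_max - t_min + 1e-9)
        img[int(y), int(x)] = max(img[int(y), int(x)], norm)
    return img

def segment_moving(img, thresh=0.5):
    """Binary mask of pixels with high motion confidence (recent events)."""
    return img > thresh

# Two early background events and two late events at pixel (2, 1).
events = np.array([[0, 0, 0.0], [1, 0, 0.2], [2, 1, 0.9], [2, 1, 1.0]])
mask = segment_moving(time_image(events, 2, 3))
assert bool(mask[1, 2]) and not bool(mask[0, 0])
```

A real system would replace the fixed `thresh` with the adaptive threshold the abstract mentions, and fuse this mask with the temporal (RANSAC-based) stage.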
- Published
- 2024
36. DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning
- Author
-
Guo, Siyuan, Deng, Cheng, Wen, Ying, Chen, Hechang, Chang, Yi, and Wang, Jun
- Subjects
Computer Science - Machine Learning - Abstract
In this work, we investigate the potential of large language model (LLM)-based agents to automate data science tasks, with the goal of comprehending task requirements, then building and training the best-fit machine learning models. Despite their widespread success, existing LLM agents are hindered by generating unreasonable experiment plans within this scenario. To this end, we present DS-Agent, a novel automatic framework that harnesses LLM agents and case-based reasoning (CBR). In the development stage, DS-Agent follows the CBR framework to structure an automatic iteration pipeline, which can flexibly capitalize on the expert knowledge from Kaggle, and facilitate consistent performance improvement through the feedback mechanism. Moreover, DS-Agent implements a low-resource deployment stage with a simplified CBR paradigm to adapt past successful solutions from the development stage for direct code generation, significantly reducing the demand on the foundational capabilities of LLMs. Empirically, DS-Agent with GPT-4 achieves a 100\% success rate in the development stage, while attaining a 36\% improvement in average one-pass rate across alternative LLMs in the deployment stage. In both stages, DS-Agent achieves the best rank in performance, costing \$1.60 and \$0.13 per run with GPT-4, respectively. Our data and code are open-sourced at https://github.com/guosyjlu/DS-Agent., Comment: Accepted by ICML 2024
- Published
- 2024
37. The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG)
- Author
-
Zeng, Shenglai, Zhang, Jiankun, He, Pengfei, Xing, Yue, Liu, Yiding, Xu, Han, Ren, Jie, Wang, Shuaiqiang, Yin, Dawei, Chang, Yi, and Tang, Jiliang
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
Retrieval-augmented generation (RAG) is a powerful technique for augmenting language models with proprietary and private data, where data privacy is a pivotal concern. Whereas extensive research has demonstrated the privacy risks of large language models (LLMs), the RAG technique could potentially reshape the inherent behaviors of LLM generation, posing new privacy issues that are currently under-explored. In this work, we conduct extensive empirical studies with novel attack methods, which demonstrate the vulnerability of RAG systems to leaking the private retrieval database. Despite the new risk brought by RAG on the retrieval data, we further reveal that RAG can mitigate the leakage of the LLMs' training data. Overall, we provide new insights in this paper for the privacy protection of retrieval-augmented LLMs, which benefit both LLM and RAG system builders. Our code is available at https://github.com/phycholosogy/RAG-privacy.
- Published
- 2024
38. Observation of electronic coherence created at conical intersections and its decoherence in aqueous solution
- Author
-
Chang, Yi-Ping, Balciunas, Tadas, Yin, Zhong, Sapunar, Marin, Tenorio, Bruno N. C., Paul, Alexander C., Tsuru, Shota, Koch, Henrik, Wolf, Jean Pierre, Coriani, Sonia, and Wörner, Hans Jakob
- Subjects
Physics - Chemical Physics ,Physics - Atomic and Molecular Clusters - Abstract
Electronic quantum coherence is a prerequisite for charge migration in molecules and emerging molecular quantum technologies. It also underlies new forms of spectroscopy and can enhance molecular function. Whether conical intersections can create electronic coherence and for how long electronic coherence can survive in aqueous environments has remained elusive and controversial, respectively. Here, we use X-ray spectroscopy to realize a breakthrough on both of these topics. We find that electronic relaxation from the $^1$B$_{2\rm{u}}$($\pi\pi^*$) state of pyrazine through conical intersections (CIs) induces previously unobserved electronic and vibrational quantum coherences between the $^1$B$_{3\rm{u}}$(n$\pi^*$) and $^1$A$_{\rm{u}}$(n$\pi^*$) states in the gas phase, which correspond to an electronic ring current. These coherences are entirely suppressed when pyrazine is dissolved in water. These observations, supported by the latest advances in multiconfigurational electronic-structure calculations and non-adiabatic dynamics simulations, confirm that CIs can create electronic coherences and that aqueous solvation can decohere them in less than 40 fs. This study opens the door to the investigation of the broad class of electronic coherences created during light-induced molecular dynamics and to quantify their susceptibility to aqueous solvation., Comment: 54 pages, 28 figures
- Published
- 2024
39. Investigating Out-of-Distribution Generalization of GNNs: An Architecture Perspective
- Author
-
Guo, Kai, Wen, Hongzhi, Jin, Wei, Guo, Yaming, Tang, Jiliang, and Chang, Yi
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Graph neural networks (GNNs) have exhibited remarkable performance under the assumption that test data comes from the same distribution of training data. However, in real-world scenarios, this assumption may not always be valid. Consequently, there is a growing focus on exploring the Out-of-Distribution (OOD) problem in the context of graphs. Most existing efforts have primarily concentrated on improving graph OOD generalization from two \textbf{model-agnostic} perspectives: data-driven methods and strategy-based learning. However, there has been limited attention dedicated to investigating the impact of well-known \textbf{GNN model architectures} on graph OOD generalization, which is orthogonal to existing research. In this work, we provide the first comprehensive investigation of OOD generalization on graphs from an architecture perspective, by examining the common building blocks of modern GNNs. Through extensive experiments, we reveal that both the graph self-attention mechanism and the decoupled architecture contribute positively to graph OOD generalization. In contrast, we observe that the linear classification layer tends to compromise graph OOD generalization capability. Furthermore, we provide in-depth theoretical insights and discussions to underpin these discoveries. These insights have empowered us to develop a novel GNN backbone model, DGAT, designed to harness the robust properties of both graph self-attention mechanism and the decoupled architecture. Extensive experimental results demonstrate the effectiveness of our model under graph OOD, exhibiting substantial and consistent enhancements across various training strategies.
- Published
- 2024
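The "decoupled architecture" highlighted in the abstract above separates parameter-free feature propagation from feature transformation, rather than interleaving them at every layer. A minimal NumPy sketch of that idea (the function names and the choice of symmetric normalization are illustrative assumptions, not details taken from the paper):

```python
import numpy as np

def normalize_adj(A):
    # Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def decoupled_gnn(A, X, W, k=3):
    # Decoupled design: k parameter-free propagation steps first,
    # then a single learnable feature transformation at the end.
    P = normalize_adj(A)
    H = X
    for _ in range(k):
        H = P @ H          # smooth features over the graph
    return H @ W           # transform once, after propagation
```

In a coupled GNN the `@ W` transformation would sit inside the loop; pulling it out is what the decoupling refers to.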
40. ScreenAgent: A Vision Language Model-driven Computer Control Agent
- Author
-
Niu, Runliang, Li, Jindong, Wang, Shiqi, Fu, Yali, Hu, Xiyu, Leng, Xueyuan, Kong, He, Chang, Yi, and Wang, Qi
- Subjects
Computer Science - Human-Computer Interaction ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Existing Large Language Models (LLMs) can invoke a variety of tools and APIs to complete complex tasks. The computer, as the most powerful and universal tool, could potentially be controlled directly by a trained LLM agent, and such an agent could assist humans in a wide range of daily digital work. In this paper, we construct an environment for a Vision Language Model (VLM) agent to interact with a real computer screen. Within this environment, the agent can observe screenshots and manipulate the Graphical User Interface (GUI) by outputting mouse and keyboard actions. We also design an automated control pipeline that includes planning, acting, and reflecting phases, guiding the agent to continuously interact with the environment and complete multi-step tasks. Additionally, we construct the ScreenAgent Dataset, which collects screenshots and action sequences from a variety of daily computer tasks. Finally, we train a model, ScreenAgent, which achieves computer control capabilities comparable to GPT-4V and demonstrates more precise UI positioning capabilities. Our attempts could inspire further research on building a generalist LLM agent. The code is available at \url{https://github.com/niuzaisheng/ScreenAgent}.
- Published
- 2024
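The planning-acting-reflecting pipeline described in the abstract above can be sketched as a simple control loop. This is a hedged illustration of the general pattern only; the function names (`observe`, `plan`, `act`, `reflect`) and the replanning policy are assumptions, not the paper's actual interface:

```python
def run_agent(task, observe, plan, act, reflect, max_steps=20):
    # Planning phase: derive a step list from the current screenshot.
    steps = plan(task, observe())
    executed = []
    while steps and len(executed) < max_steps:
        step = steps.pop(0)
        act(step)                  # acting phase: e.g. a click or key press
        executed.append(step)
        # Reflecting phase: judge progress from the new screenshot;
        # replan whenever reflection reports the plan went off-track.
        if not reflect(task, observe()):
            steps = plan(task, observe())
    return executed
```

The loop terminates when the plan is exhausted or a step budget is hit, mirroring how such agents bound multi-step interaction.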
41. Contrastive Diffuser: Planning Towards High Return States via Contrastive Learning
- Author
-
Shan, Yixiang, Zhu, Zhengbang, Long, Ting, Liang, Qifan, Chang, Yi, Zhang, Weinan, and Yin, Liang
- Subjects
Computer Science - Machine Learning - Abstract
The performance of offline reinforcement learning (RL) is sensitive to the proportion of high-return trajectories in the offline dataset. However, in many simulation environments and real-world scenarios, low-return trajectories far outnumber high-return ones, which makes learning an efficient policy challenging. In this paper, we propose a method called Contrastive Diffuser (CDiffuser) to make full use of low-return trajectories and improve the performance of offline RL algorithms. Specifically, CDiffuser groups the states of trajectories in the offline dataset into high-return states and low-return states and treats them as positive and negative samples correspondingly. Then, it designs a contrastive mechanism to pull the trajectory of an agent toward high-return states and push it away from low-return states. Through this contrastive mechanism, trajectories with low returns can serve as negative examples for policy learning, guiding the agent to avoid areas associated with low returns and achieve better performance. Experiments on 14 commonly used D4RL benchmarks demonstrate the effectiveness of our proposed method. Our code is publicly available at \url{https://anonymous.4open.science/r/CDiffuser}., Comment: 18 pages with appendix and references, 10 figures, 4 tables
- Published
- 2024
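The pull/push contrast over high-return (positive) and low-return (negative) states described in the abstract above has the shape of an InfoNCE-style objective. A minimal sketch, assuming cosine similarity and a temperature `tau` (both are illustrative choices, not necessarily the paper's):

```python
import numpy as np

def contrastive_pull_push(s, positives, negatives, tau=1.0):
    # Pull state s toward high-return (positive) states and push it
    # away from low-return (negative) states: minimizing this loss
    # raises similarity to positives relative to negatives.
    def sim(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    pos = np.exp([sim(s, p) / tau for p in positives])
    neg = np.exp([sim(s, n) / tau for n in negatives])
    return -np.log(pos.sum() / (pos.sum() + neg.sum()))
```

A state resembling the high-return group incurs a lower loss than one resembling the low-return group, which is the gradient signal that steers trajectories toward high-return regions.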
42. Transductive Reward Inference on Graph
- Author
-
Qu, Bohao, Cao, Xiaofeng, Guo, Qing, Chang, Yi, Tsang, Ivor W., and Zhang, Chengqi
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
In this study, we present a transductive inference approach on a reward information propagation graph, which enables the effective estimation of rewards for unlabelled data in offline reinforcement learning. Reward inference is the key to learning effective policies in practical scenarios where direct environmental interactions are either too costly or unethical and reward functions are rarely accessible, such as in healthcare and robotics. Our research focuses on developing a reward inference method based on the contextual properties of information propagation on graphs, which capitalizes on a constrained number of human reward annotations to infer rewards for unlabelled data. We leverage both the available data and limited reward annotations to construct a reward propagation graph, wherein the edge weights incorporate various influential factors pertaining to the rewards. Subsequently, we employ the constructed graph for transductive reward inference, thereby estimating rewards for unlabelled data. Furthermore, we establish the existence of a fixed point over several iterations of the transductive inference process and demonstrate that it converges at least to a local optimum. Empirical evaluations on locomotion and robotic manipulation tasks validate the effectiveness of our approach. The application of our inferred rewards improves performance in offline reinforcement learning tasks.
- Published
- 2024
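Iterating reward propagation over a weighted graph to a fixed point, as the abstract above describes, is closely related to classical label propagation. A minimal sketch under that reading (the update rule, the damping factor `alpha`, and the choice to clamp annotated nodes are assumptions for illustration, not the paper's exact algorithm):

```python
import numpy as np

def propagate_rewards(W, r_labeled, mask, alpha=0.9, iters=100):
    # W: row-normalized edge-weight matrix of the reward propagation graph.
    # mask: True for nodes with human reward annotations.
    # Iterate r <- alpha * W r + (1 - alpha) * r0, clamping annotated
    # nodes each step, until the iteration settles at a fixed point.
    r = np.where(mask, r_labeled, 0.0)
    r0 = r.copy()
    for _ in range(iters):
        r = alpha * (W @ r) + (1 - alpha) * r0
        r[mask] = r_labeled[mask]  # keep human annotations fixed
    return r
```

Unlabelled nodes end up with rewards interpolated from their annotated neighbours, weighted by edge strength.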
43. Copyright Protection in Generative AI: A Technical Perspective
- Author
-
Ren, Jie, Xu, Han, He, Pengfei, Cui, Yingqian, Zeng, Shenglai, Zhang, Jiankun, Wen, Hongzhi, Ding, Jiayuan, Huang, Pei, Lyu, Lingjuan, Liu, Hui, Chang, Yi, and Tang, Jiliang
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Generative AI has witnessed rapid advancement in recent years, expanding its capabilities to create synthesized content such as text, images, audio, and code. The high fidelity and authenticity of content generated by these Deep Generative Models (DGMs) have sparked significant copyright concerns, and there have been various legal debates on how to effectively safeguard copyrights in DGMs. This work delves into the issue by providing a comprehensive overview of copyright protection from a technical perspective. We examine it from two distinct viewpoints: the copyrights pertaining to the source data held by the data owners and those of the generative models maintained by the model builders. For data copyright, we delve into methods by which data owners can protect their content and by which DGMs can be utilized without infringing upon these rights. For model copyright, our discussion extends to strategies for preventing model theft and identifying outputs generated by specific models. Finally, we highlight the limitations of existing techniques and identify areas that remain unexplored. Furthermore, we discuss prospective directions for the future of copyright protection, underscoring its importance for the sustainable and ethical development of Generative AI., Comment: 26 pages
- Published
- 2024
44. STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition
- Author
-
Chang, Yi, Ren, Zhao, Zhang, Zixing, Jing, Xin, Qian, Kun, Shao, Xi, Hu, Bin, Schultz, Tanja, and Schuller, Björn W.
- Subjects
Computer Science - Sound ,Computer Science - Artificial Intelligence ,Computer Science - Human-Computer Interaction ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Speech contains rich information on the emotions of humans, and Speech Emotion Recognition (SER) has been an important topic in the area of human-computer interaction. The robustness of SER models is crucial, particularly in privacy-sensitive and reliability-demanding domains like private healthcare. Recently, the vulnerability of deep neural networks in the audio domain to adversarial attacks has become a popular area of research. However, prior works on adversarial attacks in the audio domain primarily rely on iterative gradient-based techniques, which are time-consuming and prone to overfitting the specific threat model. Furthermore, the exploration of sparse perturbations, which have the potential for better stealthiness, remains limited in the audio domain. To address these challenges, we propose a generator-based attack method to generate sparse and transferable adversarial examples to deceive SER models in an end-to-end and efficient manner. We evaluate our method on two widely-used SER datasets, Database of Elicited Mood in Speech (DEMoS) and Interactive Emotional dyadic MOtion CAPture (IEMOCAP), and demonstrate its ability to generate successful sparse adversarial examples in an efficient manner. Moreover, our generated adversarial examples exhibit model-agnostic transferability, enabling effective adversarial attacks on advanced victim models.
- Published
- 2024
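Sparse perturbations of the kind pursued in the abstract above are commonly enforced by keeping only the few largest-magnitude components of a perturbation and bounding their amplitude. A minimal sketch of that post-processing step (top-k masking and an L-infinity clip are generic techniques assumed here for illustration; they are not claimed to be STAA-Net's exact mechanism):

```python
import numpy as np

def sparsify_perturbation(delta, k, eps):
    # Keep only the k largest-magnitude samples of a generator's raw
    # perturbation and clip them to an L_inf budget eps, yielding a
    # sparse, amplitude-bounded adversarial perturbation.
    idx = np.argsort(np.abs(delta))[-k:]   # indices of the k largest entries
    sparse = np.zeros_like(delta)
    sparse[idx] = np.clip(delta[idx], -eps, eps)
    return sparse
```

Confining the attack to a handful of bounded samples is what gives sparse audio perturbations their stealthiness relative to dense ones.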
45. Study on Material Properties and Copper-to-Copper Wire Bonding of Cu-Pt-Au-Pd Fine Micro-Alloyed Wires
- Author
-
Chang, Yi-Tze, Hung, Fei-Yi, and Wu, Bo-Ding
- Published
- 2024
- Full Text
- View/download PDF
46. Bridging structural biology and clinical research through in-tissue cryo-electron tomography
- Author
-
Kixmoeller, Kathryn, Creekmore, Benjamin C, Lee, Edward B, and Chang, Yi-Wei
- Published
- 2024
- Full Text
- View/download PDF
47. From green HRM to SDG success: pathways through exploratory innovation and developmental culture
- Author
-
Chang, Yi-Ying, Chiang, Feng-Yi, Hu, Qilin, and Hughes, Mathew
- Published
- 2024
- Full Text
- View/download PDF
48. Biomechanical evaluation of various rigid internal fixation modalities for condylar-base-associated multiple mandibular fractures: A finite element analysis
- Author
-
Li, Jie, Xu, Chong-tao, Li, Ying, Liang, Yuan, Wu, Wei, and Li, Chang-yi
- Published
- 2024
- Full Text
- View/download PDF
49. Sustainable corporate entrepreneurship performance and social capital: a multi-level analysis
- Author
-
Chang, Yi-Ying, Lin, Yung-Ming, Chang, Tai-Wei, and Chang, Che-Yuan
- Published
- 2024
- Full Text
- View/download PDF
50. Cold War Friendships: Korea, Vietnam, and Asian American Literature by Josephine Nock-Hee Park (review)
- Author
-
Chang, Yi-Ting
- Published
- 2018
- Full Text
- View/download PDF