Author: "Zhang,Huan" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Zhang,Huan"' showing total 14,927 results

1. DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models

Author: Zou, Chengke, Guo, Xingang, Yang, Rui, Zhang, Junyu, Hu, Bin, and Zhang, Huan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: The rapid advancements in Vision-Language Models (VLMs) have shown great potential in tackling mathematical reasoning tasks that involve visual context. Unlike humans who can reliably apply solution steps to similar problems with minor modifications, we found that SOTA VLMs like GPT-4o can consistently fail in these scenarios, revealing limitations in their mathematical reasoning capabilities. In this paper, we investigate the mathematical reasoning robustness in VLMs and evaluate how well these models perform under different variants of the same question, such as changes in visual numerical values or function graphs. While several vision-based math benchmarks have been developed to assess VLMs' problem-solving capabilities, these benchmarks contain only static sets of problems and cannot easily evaluate mathematical reasoning robustness. To fill this gap, we introduce DynaMath, a dynamic visual math benchmark designed for in-depth assessment of VLMs. DynaMath includes 501 high-quality, multi-topic seed questions, each represented as a Python program. Those programs are carefully designed and annotated to enable the automatic generation of a much larger set of concrete questions, including many different types of visual and textual variations. DynaMath allows us to evaluate the generalization ability of VLMs, by assessing their performance under varying input conditions of a seed question. We evaluated 14 SOTA VLMs with 5,010 generated concrete questions. Our results show that the worst-case model accuracy, defined as the percentage of correctly answered seed questions in all 10 variants, is significantly lower than the average-case accuracy. Our analysis emphasizes the need to study the robustness of VLMs' reasoning abilities, and DynaMath provides valuable insights to guide the development of more reliable models for mathematical reasoning., Comment: 39 pages, 10 figures
Published: 2024

2. SVIP: Towards Verifiable Inference of Open-source Large Language Models

Author: Sun, Yifan, Li, Yuhang, Zhang, Yue, Jin, Yuchen, and Zhang, Huan
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Cryptography and Security
Abstract: Open-source Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language understanding and generation, leading to widespread adoption across various domains. However, their increasing model sizes render local deployment impractical for individual users, pushing many to rely on computing service providers for inference through a blackbox API. This reliance introduces a new risk: a computing provider may stealthily substitute the requested LLM with a smaller, less capable model without consent from users, thereby delivering inferior outputs while benefiting from cost savings. In this paper, we formalize the problem of verifiable inference for LLMs. Existing verifiable computing solutions based on cryptographic or game-theoretic techniques are either computationally uneconomical or rest on strong assumptions. We introduce SVIP, a secret-based verifiable LLM inference protocol that leverages intermediate outputs from LLM as unique model identifiers. By training a proxy task on these outputs and requiring the computing provider to return both the generated text and the processed intermediate outputs, users can reliably verify whether the computing provider is acting honestly. In addition, the integration of a secret mechanism further enhances the security of our protocol. We thoroughly analyze our protocol under multiple strong and adaptive adversarial scenarios. Our extensive experiments demonstrate that SVIP is accurate, generalizable, computationally efficient, and resistant to various attacks. Notably, SVIP achieves false negative rates below 5% and false positive rates below 3%, while requiring less than 0.01 seconds per query for verification., Comment: 20 pages
Published: 2024

3. ControlAgent: Automating Control System Design via Novel Integration of LLM Agents and Domain Expertise

Author: Guo, Xingang, Keivan, Darioush, Syed, Usman, Qin, Lianhui, Zhang, Huan, Dullerud, Geir, Seiler, Peter, and Hu, Bin
Subjects: Electrical Engineering and Systems Science - Systems and Control, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: Control system design is a crucial aspect of modern engineering with far-reaching applications across diverse sectors including aerospace, automotive systems, power grids, and robotics. Despite advances made by Large Language Models (LLMs) in various domains, their application in control system design remains limited due to the complexity and specificity of control theory. To bridge this gap, we introduce ControlAgent, a new paradigm that automates control system design via novel integration of LLM agents and control-oriented domain expertise. ControlAgent encodes expert control knowledge and emulates human iterative design processes by gradually tuning controller parameters to meet user-specified requirements for stability, performance, and robustness. ControlAgent integrates multiple collaborative LLM agents, including a central agent responsible for task distribution and task-specific agents dedicated to detailed controller design for various types of systems and requirements. ControlAgent also employs a Python computation agent that performs complex calculations and controller evaluations based on standard design information provided by task-specified LLM agents. Combined with a history and feedback module, the task-specific LLM agents iteratively refine controller parameters based on real-time feedback from prior designs. Overall, ControlAgent mimics the design processes used by (human) practicing engineers, but removes all the human efforts and can be run in a fully automated way to give end-to-end solutions for control system design with user-specified requirements. To validate ControlAgent's effectiveness, we develop ControlEval, an evaluation dataset that comprises 500 control tasks with various specific design goals. The effectiveness of ControlAgent is demonstrated via extensive comparative evaluations between LLM-based and traditional human-involved toolbox-based baselines.
Published: 2024

4. How does the teacher rate? Observations from the NeuroPiano dataset

Author: Zhang, Huan, Cheung, Vincent, Nishioka, Hayato, Dixon, Simon, and Furuya, Shinichi
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: This paper provides a detailed analysis of the NeuroPiano dataset, which comprises 104 audio recordings of student piano performances accompanied by 2255 textual feedback and ratings given by professional pianists. We offer a statistical overview of the dataset, focusing on the standardization of annotations and inter-annotator agreement across 12 evaluative questions concerning performance quality. We also explore the predictive relationship between audio features and teacher ratings via machine learning, as well as annotations provided for text analysis of the responses.
Published: 2024

5. LLaQo: Towards a Query-Based Coach in Expressive Music Performance Assessment

Author: Zhang, Huan, Cheung, Vincent, Nishioka, Hayato, Dixon, Simon, and Furuya, Shinichi
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Multimedia
Abstract: Research in music understanding has extensively explored composition-level attributes such as key, genre, and instrumentation through advanced representations, leading to cross-modal applications using large language models. However, aspects of musical performance such as stylistic expression and technique remain underexplored, along with the potential of using large language models to enhance educational outcomes with customized feedback. To bridge this gap, we introduce LLaQo, a Large Language Query-based music coach that leverages audio language modeling to provide detailed and formative assessments of music performances. We also introduce instruction-tuned query-response datasets that cover a variety of performance dimensions from pitch accuracy to articulation, as well as contextual performance understanding (such as difficulty and performance techniques). Utilizing an AudioMAE encoder and a Vicuna-7b LLM backend, our model achieved state-of-the-art (SOTA) results in predicting teachers' performance ratings, as well as in identifying piece difficulty and playing techniques. Textual responses from LLaQo were moreover rated significantly higher compared to other baseline models in a user study using audio-text matching. Our proposed model can thus provide informative answers to open-ended questions related to musical performance from audio data.
Published: 2024

6. Hierarchical Symbolic Pop Music Generation with Graph Neural Networks

Author: Lim, Wen Qing, Liang, Jinhua, and Zhang, Huan
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Music is inherently made up of complex structures, and representing them as graphs helps to capture multiple levels of relationships. While music generation has been explored using various deep generation techniques, research on graph-related music generation is sparse. Earlier graph-based music generation worked only on generating melodies, and recent works to generate polyphonic music do not account for longer-term structure. In this paper, we explore a multi-graph approach to represent both the rhythmic patterns and phrase structure of Chinese pop music. Consequently, we propose a two-step approach that aims to generate polyphonic music with coherent rhythm and long-term structure. We train two Variational Auto-Encoder networks - one on a MIDI dataset to generate 4-bar phrases, and another on song structure labels to generate full song structure. Our work shows that the models are able to learn most of the structural nuances in the training dataset, including chord and pitch frequency distributions, and phrase attributes.
Published: 2024

7. Bridging Discrete and Continuous: A Multimodal Strategy for Complex Emotion Detection

Author: Jia, Jiehui, Zhang, Huan, and Liang, Jinhua
Subjects: Computer Science - Multimedia
Abstract: In the domain of human-computer interaction, accurately recognizing and interpreting human emotions is crucial yet challenging due to the complexity and subtlety of emotional expressions. This study explores the potential for detecting a rich and flexible range of emotions through a multimodal approach which integrates facial expressions, voice tones, and transcripts from video clips. We propose a novel framework that maps a variety of emotions in a three-dimensional Valence-Arousal-Dominance (VAD) space, which could reflect the fluctuations and positivity/negativity of emotions to enable a more varied and comprehensive representation of emotional states. We employed K-means clustering to transition emotions from traditional discrete categorization to a continuous labeling system and built a classifier for emotion recognition upon this system. The effectiveness of the proposed model is evaluated using the MER2024 dataset, which contains culturally consistent video clips from Chinese movies and TV series, annotated with both discrete and open-vocabulary emotion labels. Our experiment successfully achieved the transformation between discrete and continuous models, and the proposed model generated a more diverse and comprehensive set of emotion vocabulary while maintaining strong accuracy.
Published: 2024

8. Bridging Paintings and Music -- Exploring Emotion based Music Generation through Paintings

Author: Hisariya, Tanisha, Zhang, Huan, and Liang, Jinhua
Subjects: Computer Science - Sound, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Rapid advancements in artificial intelligence have significantly enhanced generative tasks involving music and images, employing both unimodal and multimodal approaches. This research develops a model capable of generating music that resonates with the emotions depicted in visual arts, integrating emotion labeling, image captioning, and language models to transform visual inputs into musical compositions. Addressing the scarcity of aligned art and music data, we curated the Emotion Painting Music Dataset, pairing paintings with corresponding music for effective training and evaluation. Our dual-stage framework converts images to text descriptions of emotional content and then transforms these descriptions into music, facilitating efficient learning with minimal data. Performance is evaluated using metrics such as Fréchet Audio Distance (FAD), Total Harmonic Distortion (THD), Inception Score (IS), and KL divergence, with audio-emotion text similarity confirmed by the pre-trained CLAP model to demonstrate high alignment between generated music and text. This synthesis tool bridges visual art and music, enhancing accessibility for the visually impaired and opening avenues in educational and therapeutic applications by providing enriched multi-sensory experiences.
Published: 2024

9. A Pair Programming Framework for Code Generation via Multi-Plan Exploration and Feedback-Driven Refinement

Author: Zhang, Huan, Cheng, Wei, Wu, Yuhan, and Hu, Wei
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence
Abstract: Large language models (LLMs) have achieved impressive performance on code generation. Although prior studies enhanced LLMs with prompting techniques and code refinement, they still struggle with complex programming problems due to rigid solution plans. In this paper, we draw on pair programming practices to propose PairCoder, a novel LLM-based framework for code generation. PairCoder incorporates two collaborative LLM agents, namely a Navigator agent for high-level planning and a Driver agent for specific implementation. The Navigator is responsible for proposing promising solution plans, selecting the current optimal plan, and directing the next iteration round based on execution feedback. The Driver follows the guidance of Navigator to undertake initial code generation, code testing, and refinement. This interleaved and iterative workflow involves multi-plan exploration and feedback-based refinement, which mimics the collaboration of pair programmers. We evaluate PairCoder with both open-source and closed-source LLMs on various code generation benchmarks. Extensive experimental results demonstrate the superior accuracy of PairCoder, achieving relative pass@1 improvements of 12.00%-162.43% compared to prompting LLMs directly., Comment: Accepted in the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE 2024)
Published: 2024

10. HoneyComb: A Flexible LLM-Based Agent System for Materials Science

Author: Zhang, Huan, Song, Yu, Hou, Ziyu, Miret, Santiago, and Liu, Bang
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: The emergence of specialized large language models (LLMs) has shown promise in addressing complex tasks for materials science. Many LLMs, however, often struggle with distinct complexities of material science tasks, such as materials science computational tasks, and often rely heavily on outdated implicit knowledge, leading to inaccuracies and hallucinations. To address these challenges, we introduce HoneyComb, the first LLM-based agent system specifically designed for materials science. HoneyComb leverages a novel, high-quality materials science knowledge base (MatSciKB) and a sophisticated tool hub (ToolHub) to enhance its reasoning and computational capabilities tailored to materials science. MatSciKB is a curated, structured knowledge collection based on reliable literature, while ToolHub employs an Inductive Tool Construction method to generate, decompose, and refine API tools for materials science. Additionally, HoneyComb leverages a retriever module that adaptively selects the appropriate knowledge source or tools for specific tasks, thereby ensuring accuracy and relevance. Our results demonstrate that HoneyComb significantly outperforms baseline models across various tasks in materials science, effectively bridging the gap between current LLM capabilities and the specialized needs of this domain. Furthermore, our adaptable framework can be easily extended to other scientific domains, highlighting its potential for broad applicability in advancing scientific research and applications., Comment: Under Review on EMNLP 2024
Published: 2024

11. Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration

Author: Zhang, Xu, Ma, Jiaqi, Wang, Guoli, Zhang, Qian, Zhang, Huan, and Zhang, Lefei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The limitations of task-specific and general image restoration methods for specific degradation have prompted the development of all-in-one image restoration techniques. However, the diversity of patterns among multiple degradations, along with the significant uncertainties in mapping between degraded images of different severities and their corresponding undistorted versions, poses significant challenges to all-in-one restoration tasks. To address these challenges, we propose Perceive-IR, an all-in-one image restorer designed to achieve fine-grained quality control that enables restored images to more closely resemble their undistorted counterparts, regardless of the type or severity of degradation. Specifically, Perceive-IR contains two stages: (1) prompt learning stage and (2) restoration stage. In the prompt learning stage, we leverage prompt learning to acquire a fine-grained quality perceiver capable of distinguishing three-tier quality levels by constraining the prompt-image similarity in the CLIP perception space. Subsequently, this quality perceiver and difficulty-adaptive perceptual loss are integrated as a quality-aware learning strategy to realize fine-grained quality control in the restoration stage. For the restoration stage, a semantic guidance module (SGM) and compact feature extraction (CFE) are proposed to further promote the restoration process by utilizing the robust semantic information from the pre-trained large scale vision models and distinguishing degradation-specific features. Extensive experiments demonstrate that our Perceive-IR outperforms state-of-the-art methods in all-in-one image restoration tasks and exhibits superior generalization ability when dealing with unseen tasks., Comment: 13 pages, 8 figures
Published: 2024

12. Foundation Models for Music: A Survey

Author: Ma, Yinghao, Øland, Anders, Ragni, Anton, Del Sette, Bleiz MacSen, Saitis, Charalampos, Donahue, Chris, Lin, Chenghua, Plachouras, Christos, Benetos, Emmanouil, Shatri, Elona, Morreale, Fabio, Zhang, Ge, Fazekas, György, Xia, Gus, Zhang, Huan, Manco, Ilaria, Huang, Jiawen, Guinot, Julien, Lin, Liwei, Marinelli, Luca, Lam, Max W. Y., Sharma, Megha, Kong, Qiuqiang, Dannenberg, Roger B., Yuan, Ruibin, Wu, Shangda, Wu, Shih-Lun, Dai, Shuqi, Lei, Shun, Kang, Shiyin, Dixon, Simon, Chen, Wenhu, Huang, Wenhao, Du, Xingjian, Qu, Xingwei, Tan, Xu, Li, Yizhi, Tian, Zeyue, Wu, Zhiyong, Wu, Zhizheng, Ma, Ziyang, and Wang, Ziyu
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music, spanning from representation learning, generative learning and multimodal learning. We first contextualise the significance of music in various industries and trace the evolution of AI in music. By delineating the modalities targeted by foundation models, we discover that many music representations are underexplored in FM development. Then, emphasis is placed on the lack of versatility of previous methods on diverse music applications, along with the potential of FMs in music understanding, generation and medical application. By comprehensively exploring the details of the model pre-training paradigm, architectural choices, tokenisation, finetuning methodologies and controllability, we emphasise the important topics that should have been well explored, like instruction tuning and in-context learning, scaling law and emergent ability, as well as long-sequence modelling. A dedicated section presents insights into music agents, accompanied by a thorough analysis of datasets and evaluations essential for pre-training and downstream tasks. Finally, by underscoring the vital importance of ethical considerations, we advocate that subsequent research on FMs for music should focus more on issues such as interpretability, transparency, human responsibility, and copyright. The paper offers insights into future challenges and trends on FMs for music, aiming to shape the trajectory of human-AI collaboration in the music realm.
Published: 2024

13. PREMAP: A Unifying PREiMage APproximation Framework for Neural Networks

Author: Zhang, Xiyue, Wang, Benjie, Kwiatkowska, Marta, and Zhang, Huan
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Logic in Computer Science
Abstract: Most methods for neural network verification focus on bounding the image, i.e., set of outputs for a given input set. This can be used to, for example, check the robustness of neural network predictions to bounded perturbations of an input. However, verifying properties concerning the preimage, i.e., the set of inputs satisfying an output property, requires abstractions in the input space. We present a general framework for preimage abstraction that produces under- and over-approximations of any polyhedral output set. Our framework employs cheap parameterised linear relaxations of the neural network, together with an anytime refinement procedure that iteratively partitions the input region by splitting on input features and neurons. The effectiveness of our approach relies on carefully designed heuristics and optimization objectives to achieve rapid improvements in the approximation volume. We evaluate our method on a range of tasks, demonstrating significant improvement in efficiency and scalability to high-input-dimensional image classification tasks compared to state-of-the-art techniques. Further, we showcase the application to quantitative verification and robustness analysis, presenting a sound and complete algorithm for the former and providing sound quantitative results for the latter., Comment: arXiv admin note: text overlap with arXiv:2305.03686
Published: 2024

14. Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors

Author: Syed, Usman, Light, Ethan, Guo, Xingang, Zhang, Huan, Qin, Lianhui, Ouyang, Yanfeng, and Hu, Bin
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: In this paper, we explore the capabilities of state-of-the-art large language models (LLMs) such as GPT-4, GPT-4o, Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, Llama 3, and Llama 3.1 in solving some selected undergraduate-level transportation engineering problems. We introduce TransportBench, a benchmark dataset that includes a sample of transportation engineering problems on a wide range of subjects in the context of planning, design, management, and control of transportation systems. This dataset is used by human experts to evaluate the capabilities of various commercial and open-sourced LLMs, especially their accuracy, consistency, and reasoning behaviors, in solving transportation engineering problems. Our comprehensive analysis uncovers the unique strengths and limitations of each LLM, e.g. our analysis shows the impressive accuracy and some unexpected inconsistent behaviors of Claude 3.5 Sonnet in solving TransportBench problems. Our study marks a thrilling first step toward harnessing artificial general intelligence for complex transportation challenges.
Published: 2024

15. Constructing Enhanced Mutual Information for Online Class-Incremental Learning

Author: Zhang, Huan, Lyu, Fan, Fan, Shenghua, Zheng, Yujin, and Wang, Dingwen
Subjects: Computer Science - Machine Learning
Abstract: Online Class-Incremental continual Learning (OCIL) addresses the challenge of continuously learning from a single-channel data stream, adapting to new tasks while mitigating catastrophic forgetting. Recently, Mutual Information (MI)-based methods have shown promising performance in OCIL. However, existing MI-based methods treat various knowledge components in isolation, ignoring the knowledge confusion across tasks. This narrow focus on simple MI knowledge alignment may lead to old tasks being easily forgotten with the introduction of new tasks, risking the loss of common parts between past and present knowledge. To address this, we analyze the MI relationships from the perspectives of diversity, representativeness, and separability, and propose an Enhanced Mutual Information (EMI) method based on knowledge decoupling. EMI consists of Diversity Mutual Information (DMI), Representativeness Mutual Information (RMI) and Separability Mutual Information (SMI). DMI diversifies intra-class sample features by considering the similarity relationships among inter-class sample features to enable the network to learn more general knowledge. RMI summarizes representative features for each category and aligns sample features with these representative features, making the intra-class sample distribution more compact. SMI establishes MI relationships for inter-class representative features, enhancing the stability of representative features while increasing the distinction between inter-class representative features, thus creating clear boundaries between classes. Extensive experimental results on widely used benchmark datasets demonstrate the superior performance of EMI over state-of-the-art baseline methods.
Published: 2024

16. From Audio Encoders to Piano Judges: Benchmarking Performance Understanding for Solo Piano

Author: Zhang, Huan, Liang, Jinhua, and Dixon, Simon
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Our study investigates an approach for understanding musical performances through the lens of audio encoding models, focusing on the domain of solo Western classical piano music. Compared to composition-level attribute understanding such as key or genre, we identify a knowledge gap in performance-level music understanding, and address three critical tasks: expertise ranking, difficulty estimation, and piano technique detection, introducing a comprehensive Pianism-Labelling Dataset (PLD) for this purpose. We leverage pre-trained audio encoders, specifically Jukebox, Audio-MAE, MERT, and DAC, demonstrating varied capabilities in tackling downstream tasks, to explore whether domain-specific fine-tuning enhances capability in capturing performance nuances. Our best approach achieved 93.6% accuracy in expertise ranking, 33.7% in difficulty estimation, and 46.7% in technique detection, with Audio-MAE as the overall most effective encoder. Finally, we conducted a case study on Chopin Piano Competition data using trained models for expertise ranking, which highlights the challenge of accurately assessing top-tier performances., Comment: Accepted by the 25th International Society for Music Information Retrieval (ISMIR)
Published: 2024

17. F-FOMAML: GNN-Enhanced Meta-Learning for Peak Period Demand Forecasting with Proxy Data

Author: Xu, Zexing, Zhang, Linjun, Yang, Sitan, Etesami, Rasoul, Tong, Hanghang, Zhang, Huan, and Han, Jiawei
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Graphics, Economics - Econometrics, Statistics - Methodology, 68T07, 68T05, 62M10, 62M20, 90C90, 91B84
Abstract: Demand prediction is a crucial task for e-commerce and physical retail businesses, especially during high-stake sales events. However, the limited availability of historical data from these peak periods poses a significant challenge for traditional forecasting methods. In this paper, we propose a novel approach that leverages strategically chosen proxy data reflective of potential sales patterns from similar entities during non-peak periods, enriched by features learned from a graph neural networks (GNNs)-based forecasting model, to predict demand during peak events. We formulate the demand prediction as a meta-learning problem and develop the Feature-based First-Order Model-Agnostic Meta-Learning (F-FOMAML) algorithm that leverages proxy data from non-peak periods and GNN-generated relational metadata to learn feature-specific layer parameters, thereby adapting to demand forecasts for peak events. Theoretically, we show that by considering domain similarities through task-specific metadata, our model achieves improved generalization, where the excess risk decreases as the number of training tasks increases. Empirical evaluations on large-scale industrial datasets demonstrate the superiority of our approach. Compared to existing state-of-the-art models, our method demonstrates a notable improvement in demand prediction accuracy, reducing the Mean Absolute Error by 26.24% on an internal vending machine dataset and by 1.04% on the publicly accessible JD.com dataset.
Published: 2024

18. Intensity Confusion Matters: An Intensity-Distance Guided Loss for Bronchus Segmentation

Author: Gong, Haifan, Huang, Wenhao, Zhang, Huan, Wang, Yu, Wan, Xiang, Shen, Hong, Li, Guanbin, and Li, Haofeng
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Automatic segmentation of the bronchial tree from CT imaging is important, as it provides structural information for disease diagnosis. Despite the merits of previous automatic bronchus segmentation methods, they have paid less attention to the issue we term Intensity Confusion, wherein the intensity values of certain background voxels approach those of the foreground voxels within bronchi. Conversely, the intensity values of some foreground voxels are nearly identical to those of background voxels. This proximity in intensity values introduces significant challenges to neural network methodologies. To address the issue, we introduce a novel Intensity-Distance Guided loss function, which assigns adaptive weights to different image voxels for mining hard samples that cause the intensity confusion. The proposed loss estimates the voxel-level hardness of samples, on the basis of the following intensity and distance priors. We regard a voxel as a hard sample if it is in: (1) the background and has an intensity value close to the bronchus region; (2) the bronchus region and is of higher intensity than most voxels inside the bronchus; (3) the background region and at a short distance from the bronchus. Extensive experiments not only show the superiority of our method compared with the state-of-the-art methods, but also verify that tackling the intensity confusion issue helps to significantly improve bronchus segmentation. Project page: https://github.com/lhaof/ICM., Comment: IEEE International Conference on Multimedia & Expo (ICME) 2024
Published: 2024

19. DExter: Learning and Controlling Performance Expression with Diffusion Models

Author: Zhang, Huan, Chowdhury, Shreyan, Cancino-Chacón, Carlos Eduardo, Liang, Jinhua, Dixon, Simon, and Widmer, Gerhard
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In the pursuit of developing expressive music performance models using artificial intelligence, this paper introduces DExter, a new approach leveraging diffusion probabilistic models to render Western classical piano performances. In this approach, performance parameters are represented in a continuous expression space and a diffusion model is trained to predict these continuous parameters while being conditioned on the musical score. Furthermore, DExter also enables the generation of interpretations (expressive variations of a performance) guided by perceptually meaningful features by conditioning jointly on score and perceptual feature representations. Consequently, we find that our model is useful for learning expressive performance, generating perceptually steered performances, and transferring performance styles. We assess the model through quantitative and qualitative analyses, focusing on specific performance metrics regarding dimensions like asynchrony and articulation, as well as through listening tests comparing generated performances with different human interpretations. Results show that DExter is able to capture the time-varying correlation of the expressive parameters, and compares well to existing rendering models in subjectively evaluated ratings. The perceptual-feature-conditioned generation and transferring capabilities of DExter are verified by a proxy model predicting perceptual characteristics of differently steered performances., Comment: in submission to appsci special session
Published: 2024

20. Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs

Author: Yang, Rui, Ding, Ruomeng, Lin, Yong, Zhang, Huan, and Zhang, Tong
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Reward models trained on human preference data have been proven to effectively align Large Language Models (LLMs) with human intent within the framework of reinforcement learning from human feedback (RLHF). However, current reward models have limited generalization capabilities to unseen prompts and responses, which can lead to an unexpected phenomenon known as reward over-optimization, resulting in a decline in actual performance due to excessive optimization of rewards. While previous research has advocated for constraining policy optimization, our study introduces a novel approach to enhance the reward model's generalization ability against distribution shifts by regularizing the hidden states. Specifically, we retain the base model's language model head and incorporate a suite of text-generation losses to preserve the hidden states' text-generation capabilities, while concurrently learning a reward head behind the same hidden states. Our experimental results demonstrate that the introduced regularization technique markedly improves the accuracy of learned reward models across a variety of out-of-distribution (OOD) tasks and effectively alleviates the over-optimization issue in RLHF, offering a more reliable and robust preference learning paradigm., Comment: NeurIPS 2024
Published: 2024

21. Neural Network Verification with Branch-and-Bound for General Nonlinearities

Author: Shi, Zhouxing, Jin, Qirui, Kolter, Zico, Jana, Suman, Hsieh, Cho-Jui, and Zhang, Huan
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Branch-and-bound (BaB) is among the most effective methods for neural network (NN) verification. However, existing works on BaB have mostly focused on NNs with piecewise linear activations, especially ReLU networks. In this paper, we develop a general framework, named GenBaB, to conduct BaB for general nonlinearities in general computational graphs based on linear bound propagation. To decide which neuron to branch, we design a new branching heuristic which leverages linear bounds as shortcuts to efficiently estimate the potential improvement after branching. To decide nontrivial branching points for general nonlinear functions, we propose to optimize branching points offline, which can be efficiently leveraged during verification with a lookup table. We demonstrate the effectiveness of our GenBaB on verifying a wide range of NNs, including networks with activation functions such as Sigmoid, Tanh, Sine and GeLU, as well as networks involving multi-dimensional nonlinear operations such as multiplications in LSTMs and Vision Transformers. Our framework also allows the verification of general nonlinear computation graphs and enables verification applications beyond simple neural networks, particularly for AC Optimal Power Flow (ACOPF). GenBaB is part of the latest $\alpha,\!\beta$-CROWN, the winner of the 4th International Verification of Neural Networks Competition (VNN-COMP 2023)., Comment: Preprint
Published: 2024

22. Verified Safe Reinforcement Learning for Neural Network Dynamic Models

Author: Wu, Junlin, Zhang, Huan, and Vorobeychik, Yevgeniy
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Learning reliably safe autonomous control is one of the core problems in trustworthy autonomy. However, training a controller that can be formally verified to be safe remains a major challenge. We introduce a novel approach for learning verified safe control policies in nonlinear neural dynamical systems while maximizing overall performance. Our approach aims to achieve safety in the sense of finite-horizon reachability proofs, and is comprised of three key parts. The first is a novel curriculum learning scheme that iteratively increases the verified safe horizon. The second leverages the iterative nature of gradient-based learning to leverage incremental verification, reusing information from prior verification runs. Finally, we learn multiple verified initial-state-dependent controllers, an idea that is especially valuable for more complex domains where learning a single universal verified safe controller is extremely challenging. Our experiments on five safe control problems demonstrate that our trained controllers can achieve verified safety over horizons that are as much as an order of magnitude longer than state-of-the-art baselines, while maintaining high reward, as well as a perfect safety record over entire episodes.
Published: 2024

23. Designing AI-Enabled Games to Support Social-Emotional Learning for Children with Autism Spectrum Disorders

Author: Lyu, Yue, An, Pengcheng, Zhang, Huan, Katsuragawa, Keiko, and Zhao, Jian
Subjects: Computer Science - Human-Computer Interaction
Abstract: Children with autism spectrum disorder (ASD) experience challenges in grasping social-emotional cues, which can result in difficulties in recognizing emotions and understanding and responding to social interactions. Social-emotional intervention is an effective method to improve emotional understanding and facial expression recognition among individuals with ASD. Existing work emphasizes the importance of personalizing interventions to meet individual needs and motivate engagement for optimal outcomes in daily settings. We design a social-emotional game for ASD children, which generates personalized stories by leveraging the current advancement of artificial intelligence. Via a co-design process with five domain experts, this work offers several design insights into developing future AI-enabled gamified systems for families with autistic children. We also propose a fine-tuned AI model and a dataset of social stories for different basic emotions., Comment: 2 pages, 1 table, peer-reviewed and presented at the "CHI 2024 Workshop on Child-centred AI Design, May 11, 2024, Honolulu, HI, USA"
Published: 2024

24. Using Explainable AI and Transfer Learning to understand and predict the maintenance of Atlantic blocking with limited observational data

Author: Zhang, Huan, Finkel, Justin, Abbot, Dorian S., Gerber, Edwin P., and Weare, Jonathan
Subjects: Physics - Atmospheric and Oceanic Physics, Statistics - Machine Learning
Abstract: Blocking events are an important cause of extreme weather, especially long-lasting blocking events that trap weather systems in place. The duration of blocking events is, however, underestimated in climate models. Explainable Artificial Intelligence is a class of data analysis methods that can help identify physical causes of prolonged blocking events and diagnose model deficiencies. We demonstrate this approach on an idealized quasigeostrophic model developed by Marshall and Molteni (1993). We train a convolutional neural network (CNN), and subsequently, build a sparse predictive model for the persistence of Atlantic blocking, conditioned on an initial high-pressure anomaly. Shapley Additive ExPlanation (SHAP) analysis reveals that high-pressure anomalies in the American Southeast and North Atlantic, separated by a trough over Atlantic Canada, contribute significantly to prediction of sustained blocking events in the Atlantic region. This agrees with previous work that identified precursors in the same regions via wave train analysis. When we apply the same CNN to blockings in the ERA5 atmospheric reanalysis, there is insufficient data to accurately predict persistent blocks. We partially overcome this limitation by pre-training the CNN on the plentiful data of the Marshall-Molteni model, and then using Transfer Learning to achieve better predictions than direct training. SHAP analysis before and after transfer learning allows a comparison between the predictive features in the reanalysis and the quasigeostrophic model, quantifying dynamical biases in the idealized model. This work demonstrates the potential for machine learning methods to extract meaningful precursors of extreme weather events and achieve better prediction using limited observational data., Comment: 29 pages, 10 figures
Published: 2024

25. Lyapunov-stable Neural Control for State and Output Feedback: A Novel Formulation

Author: Yang, Lujie, Dai, Hongkai, Shi, Zhouxing, Hsieh, Cho-Jui, Tedrake, Russ, and Zhang, Huan
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Robotics, Electrical Engineering and Systems Science - Systems and Control, Mathematics - Optimization and Control
Abstract: Learning-based neural network (NN) control policies have shown impressive empirical performance in a wide range of tasks in robotics and control. However, formal (Lyapunov) stability guarantees over the region-of-attraction (ROA) for NN controllers with nonlinear dynamical systems are challenging to obtain, and most existing approaches rely on expensive solvers such as sums-of-squares (SOS), mixed-integer programming (MIP), or satisfiability modulo theories (SMT). In this paper, we demonstrate a new framework for learning NN controllers together with Lyapunov certificates using fast empirical falsification and strategic regularizations. We propose a novel formulation that defines a larger verifiable region-of-attraction (ROA) than shown in the literature, and refines the conventional restrictive constraints on Lyapunov derivatives to focus only on certifiable ROAs. The Lyapunov condition is rigorously verified post-hoc using branch-and-bound with scalable linear bound propagation-based NN verification techniques. The approach is efficient and flexible, and the full training and verification procedure is accelerated on GPUs without relying on expensive solvers for SOS, MIP, or SMT. The flexibility and efficiency of our framework allow us to demonstrate Lyapunov-stable output feedback control with synthesized NN-based controllers and NN-based observers with formal stability guarantees, for the first time in the literature. Source code at https://github.com/Verified-Intelligence/Lyapunov_Stable_NN_Controllers, Comment: Paper accepted by ICML 2024
Published: 2024

26. Sequential-in-time training of nonlinear parametrizations for solving time-dependent partial differential equations

Author: Zhang, Huan, Chen, Yifan, Vanden-Eijnden, Eric, and Peherstorfer, Benjamin
Subjects: Mathematics - Numerical Analysis, Computer Science - Machine Learning
Abstract: Sequential-in-time methods solve a sequence of training problems to fit nonlinear parametrizations such as neural networks to approximate solution trajectories of partial differential equations over time. This work shows that sequential-in-time training methods can be understood broadly as either optimize-then-discretize (OtD) or discretize-then-optimize (DtO) schemes, which are well known concepts in numerical analysis. The unifying perspective leads to novel stability and a posteriori error analysis results that provide insights into theoretical and numerical aspects that are inherent to either OtD or DtO schemes such as the tangent space collapse phenomenon, which is a form of over-fitting. Additionally, the unified perspective facilitates establishing connections between variants of sequential-in-time training methods, which is demonstrated by identifying natural gradient descent methods on energy functionals as OtD schemes applied to the corresponding gradient flows.
Published: 2024

27. Phase estimation via coherent and photon-catalyzed squeezed vacuum states

Author: Zhao, Zekun, Kang, Qingqian, Zhang, Huan, Zhao, Teng, Liu, Cunjin, and Hu, Liyun
Subjects: Quantum Physics
Abstract: Research focused on enhancing measurement accuracy through the use of non-Gaussian states has garnered increasing attention. In this study, we propose a scheme to input the coherent state mixed with photon-catalyzed squeezed vacuum state into the Mach-Zehnder interferometer to enhance phase measurement accuracy. The findings demonstrate that photon catalysis, particularly multi-photon catalysis, can effectively improve the phase sensitivity of parity detection and the quantum Fisher information. Moreover, the situation of photon losses in practical measurement was studied. The results indicate that external dissipation has a greater influence on phase sensitivity than the internal dissipation. Compared to input coherent state mixed with squeezed vacuum state, the utilization of coherent state mixed with photon-catalyzed squeezed vacuum state, particularly the mixed multi-photon catalyzed squeezed vacuum state as input, can enhance the phase sensitivity and quantum Fisher information. Furthermore, the phase measurement accuracy can exceed the standard quantum limit, and even surpass the Heisenberg limit. This research is expected to significantly contribute to quantum precision measurement.
Published: 2024

28. WavCraft: Audio Editing and Generation with Large Language Models

Author: Liang, Jinhua, Zhang, Huan, Liu, Haohe, Cao, Yin, Kong, Qiuqiang, Liu, Xubo, Wang, Wenwu, Plumbley, Mark D., Phan, Huy, and Benetos, Emmanouil
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing. Specifically, WavCraft describes the content of raw audio materials in natural language and prompts the LLM conditioned on audio descriptions and user requests. WavCraft leverages the in-context learning ability of the LLM to decompose users' instructions into several tasks and tackle each task collaboratively with the particular module. Through task decomposition along with a set of task-specific models, WavCraft follows the input instruction to create or edit audio content with more details and rationales, facilitating user control. In addition, WavCraft is able to cooperate with users via dialogue interaction and even produce the audio content without explicit user commands. Experiments demonstrate that WavCraft yields better performance than existing methods, especially when adjusting the local regions of audio clips. Moreover, WavCraft can follow complex instructions to edit and create audio content on top of input recordings, facilitating audio producers in a broader range of applications. Our implementation and demos are available at https://github.com/JinhuaLiang/WavCraft.
Published: 2024

29. A Safe Screening Rule with Bi-level Optimization of $\nu$ Support Vector Machine

Author: Yang, Zhiji, Chen, Wanyi, Zhang, Huan, Xu, Yitian, Shi, Lei, and Zhao, Jianhua
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Mathematics - Optimization and Control
Abstract: Support vector machine (SVM) has achieved many successes in machine learning, especially for a small sample problem. As a famous extension of the traditional SVM, the $\nu$ support vector machine ($\nu$-SVM) has shown outstanding performance due to its great model interpretability. However, it still faces challenges in training overhead for large-scale problems. To address this issue, we propose a safe screening rule with bi-level optimization for $\nu$-SVM (SRBO-$\nu$-SVM) which can screen out inactive samples before training and reduce the computational cost without sacrificing the prediction accuracy. Our SRBO-$\nu$-SVM is strictly deduced by integrating the Karush-Kuhn-Tucker (KKT) conditions, the variational inequalities of convex problems and the $\nu$-property. Furthermore, we develop an efficient dual coordinate descent method (DCDM) to further improve computational speed. Finally, a unified framework for SRBO is proposed to accelerate many SVM-type models, and it is successfully applied to one-class SVM. Experimental results on 6 artificial data sets and 30 benchmark data sets have verified the effectiveness and safety of our proposed methods in supervised and unsupervised tasks.
Published: 2024

30. Node-RADS: a systematic review and meta-analysis of diagnostic performance, category-wise malignancy rates, and inter-observer reliability

Author: Zhong, Jingyu, Mao, Shiqi, Chen, Haoda, Wang, Yibin, Yin, Qian, Cen, Qingqing, Lu, Junjie, Yang, Jiarui, Hu, Yangfan, Xing, Yue, Liu, Xianwei, Ge, Xiang, Jiang, Run, Song, Yang, Lu, Minda, Chu, Jingshen, Zhang, Huan, Zhang, Guangcheng, Ding, Defang, and Yao, Weiwu
Published: 2024
Full Text: View/download PDF

31. Impact of invasive apple snails on shallow water ecosystems under different nutrient conditions: results from a mesocosm study

Author: He, Liang, Guo, Shiyuan, Wang, Guanghao, Ning, Zixuan, Zhang, Huan, and Ge, Gang
Published: 2024
Full Text: View/download PDF

32. Preparation and properties of magnetic superabsorbent composite based on poly (acrylic acid-acrylamide)-g-sodium alginate/Fe3O4

Author: Zheng, Yunxiang, Zhang, Huan, Shi, Yaqing, Su, Zirui, Sun, Xinran, and Wang, Xiangpeng
Published: 2024
Full Text: View/download PDF

33. Adsorption and recovery nutrient from the tail liquid of biohydrogen production by zeolite

Author: Zhou, Xiaokai, Li, Cunjie, Lu, Chaoyang, Zhang, Yang, Li, Yameng, Zhang, Huan, Zhang, Quanguo, and Jing, Yanyan
Published: 2024
Full Text: View/download PDF

34. Ultra-High-Resolution Photon-Counting Detector CT Benefits Visualization of Abdominal Arteries: A Comparison to Standard-Reconstruction

Author: Zhang, Huan, Xing, Yue, Wang, Lingyun, Hu, Yangfan, Xu, Zhihan, Chen, Haoda, Lu, Junjie, Yang, Jiarui, Ding, Bei, Hu, Weiguo, and Zhong, Jingyu
Published: 2024
Full Text: View/download PDF

35. A simplified model for unsteady airflow analysis in ultra-long tunnels based on the resistance compensation method

Author: Fan, Xianwang, Zhang, Huan, Wan, Zhihao, Liu, Zhikai, Liu, Jiali, Yang, Junbin, Liu, Sujie, Pu, Jiaxuan, Wang, Zhaoying, Jiang, Yan, Wu, Zhangxiang, You, Shijun, and Zheng, Wandong
Published: 2024
Full Text: View/download PDF

36. A modified withdrawal time estimation and risk assessment of enrofloxacin in grass carp (Ctenopharyngodon idella) after ad libitum medicated feed based on statistical approaches in natural cultured environments

Author: Xu, Ning, Zhang, Huan, Dong, Jing, Yang, Yibin, Liu, Yongtao, Zhou, Shun, Zhu, Xia, and Ai, Xiaohui
Published: 2024
Full Text: View/download PDF

37. Bacteria from the rhizosphere of a selenium hyperaccumulator plant can improve the selenium uptake of a non-hyperaccumulator plant

Author: Zhang, Huan, Yang, Dandan, Hu, Chengxiao, Du, Xiaoping, Liang, Lianming, Wang, Xu, Shi, Guangyu, Han, Chuang, Tang, Yanni, Lei, Zheng, Yi, Ceng, and Zhao, Xiaohu
Published: 2024
Full Text: View/download PDF

38. CHIM-Net: A Combined Hierarchical Information Model for Predicting Time, Space and Intensity of Mining Microseismic Events

Author: Luo, Hao, Zhang, Huan, Pan, Yishan, Dai, Lianpeng, Kong, Chao, and Bai, Mingyu
Published: 2024
Full Text: View/download PDF

39. Urolithin A Improves Motor Dysfunction Induced by Copper Exposure in SOD1G93A Transgenic Mice Via Activation of Mitophagy

Author: Zhang, Huan, Gao, Chuanyue, Yang, Deguang, Nie, Lulin, He, Kaiwu, Chen, Chongyang, Li, Shangming, Huang, Guanqin, Zhou, Li, Huang, Xinfeng, Wu, Desheng, Liu, Jianjun, Huang, Zhenlie, Wang, Jie, Li, Weihua, Zhang, Zhaohui, Yang, Xifei, and Zou, Liangyu
Published: 2024
Full Text: View/download PDF

40. Assessing the reliability of a novel cancer-specific multi-attribute utility instrument (FACT-8D) and comparing its validity to EQ-5D-5L in colorectal cancer patients

Author: Cao, Yiyin, Zhang, Huan, Luo, Nan, Li, Haofei, Cheng, Ling Jie, and Huang, Weidong
Published: 2024
Full Text: View/download PDF

41. COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability

Author: Guo, Xingang, Yu, Fangxu, Zhang, Huan, Qin, Lianhui, and Hu, Bin
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Jailbreaks on large language models (LLMs) have recently received increasing attention. For a comprehensive assessment of LLM safety, it is essential to consider jailbreaks with diverse attributes, such as contextual coherence and sentiment/stylistic variations, and hence it is beneficial to study controllable jailbreaking, i.e. how to enforce control on LLM attacks. In this paper, we formally formulate the controllable attack generation problem, and build a novel connection between this problem and controllable text generation, a well-explored topic of natural language processing. Based on this connection, we adapt the Energy-based Constrained Decoding with Langevin Dynamics (COLD), a state-of-the-art, highly efficient algorithm in controllable text generation, and introduce the COLD-Attack framework which unifies and automates the search of adversarial LLM attacks under a variety of control requirements such as fluency, stealthiness, sentiment, and left-right-coherence. The controllability enabled by COLD-Attack leads to diverse new jailbreak scenarios which not only cover the standard setting of generating fluent (suffix) attack with continuation constraint, but also allow us to address new controllable attack settings such as revising a user query adversarially with paraphrasing constraint, and inserting stealthy attacks in context with position constraint. Our extensive experiments on various LLMs (Llama-2, Mistral, Vicuna, Guanaco, GPT-3.5, and GPT-4) show COLD-Attack's broad applicability, strong controllability, high success rate, and attack transferability. Our code is available at https://github.com/Yu-Fangxu/COLD-Attack., Comment: Accepted to ICML 2024
Published: 2024

42. TrustLLM: Trustworthiness in Large Language Models

Author: Huang, Yue, Sun, Lichao, Wang, Haoran, Wu, Siyuan, Zhang, Qihui, Li, Yuan, Gao, Chujie, Huang, Yixin, Lyu, Wenhan, Zhang, Yixuan, Li, Xiner, Liu, Zhengliang, Liu, Yixin, Wang, Yijue, Zhang, Zhikun, Vidgen, Bertie, Kailkhura, Bhavya, Xiong, Caiming, Xiao, Chaowei, Li, Chunyuan, Xing, Eric, Huang, Furong, Liu, Hao, Ji, Heng, Wang, Hongyi, Zhang, Huan, Yao, Huaxiu, Kellis, Manolis, Zitnik, Marinka, Jiang, Meng, Bansal, Mohit, Zou, James, Pei, Jian, Liu, Jian, Gao, Jianfeng, Han, Jiawei, Zhao, Jieyu, Tang, Jiliang, Wang, Jindong, Vanschoren, Joaquin, Mitchell, John, Shu, Kai, Xu, Kaidi, Chang, Kai-Wei, He, Lifang, Huang, Lifu, Backes, Michael, Gong, Neil Zhenqiang, Yu, Philip S., Chen, Pin-Yu, Gu, Quanquan, Xu, Ran, Ying, Rex, Ji, Shuiwang, Jana, Suman, Chen, Tianlong, Liu, Tianming, Zhou, Tianyi, Wang, William, Li, Xiang, Zhang, Xiangliang, Wang, Xiao, Xie, Xing, Chen, Xun, Wang, Xuyu, Liu, Yan, Ye, Yanfang, Cao, Yinzhi, Chen, Yong, and Zhao, Yue
Subjects: Computer Science - Computation and Language
Abstract: Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings firstly show that in general trustworthiness and utility (i.e., functional effectiveness) are positively related. Secondly, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones. Thirdly, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. Knowing the specific trustworthy technologies that have been employed is crucial for analyzing their effectiveness., Comment: This work is still under work and we welcome your contribution
Published: 2024

43. Fine morphology of eggs, nymphs, wax-secreting structures and sensory pits of the planthopper Euricania clara Kato (Hemiptera: Fulgoromorpha: Ricaniidae), with comparative notes on related species

Author: Zhang, Huan, Cai, Jia-Hang, and Qin, Dao-Zheng
Published: 2024
Full Text: View/download PDF

44. Decoding the genetic landscape of juvenile dermatomyositis: insights from phosphorylation-associated single nucleotide polymorphisms

Author: Zhang, Huan, Zhang, Zhentao, Fan, Kedi, Chen, Hongru, Guo, Yufan, and Mo, Xingbo
Published: 2024
Full Text: View/download PDF

45. Circular RNA PIP5K1A Promotes Glucose and Lipid Metabolism Disorders and Inflammation in Type 2 Diabetes Mellitus

Author: Song, Ge, Zhang, YiQian, Jiang, YiHua, Zhang, Huan, Gu, Wen, Xu, Xiu, Yao, Jing, and Chen, ZhengFang
Published: 2024
Full Text: View/download PDF

46. Stability of Extensions in Incomplete Argumentation Frameworks

Author: Xiong, Anshu, Zhang, Huan, Zhang, Songmao, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Destercke, Sébastien, editor, Martinez, Maria Vanina, editor, and Sanfilippo, Giuseppe, editor
Published: 2025
Full Text: View/download PDF

47. Deep Learning-Based Liver Vessel Separation with Plug-and-Play Modules: Skeleton Tracking and Graph Attention

Author: Pei, Chenhao, Wang, Wei, Zhang, Huan, Yin, Siyuan, Tang, Wen, Meng, Ming, Xiao, Weinan, Shen, Hong, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Chen, Chao, editor, Singh, Yash, editor, and Hu, Xiaoling, editor
Published: 2025
Full Text: View/download PDF

48. Enhancing hydrolysis of lignocellulosic biomass through molecular modification of lytic polysaccharide monooxygenase from Aspergillus niger

Author: Chen, Ru, Yu, Shuo, Chen, Feifan, Cui, Xinyu, Wang, Shuang, Zhang, Huan, Zhang, Cuiying, Du, Liping, and Ma, Lijuan
Published: 2024
Full Text: View/download PDF

49. Composition and evolutionary characterization of the gut microbiota in pigs

Author: Zhang, Shuhong, Zhang, Huan, Zhang, Cheng, Wang, Guan, Shi, Chuanxing, Li, Zhiqiang, Gao, Fengyi, Cui, Yanyan, Li, Ming, and Yang, Guangli
Published: 2024
Full Text: View/download PDF

50. Prediction of tumor regression grade in far-advanced gastric cancer after preoperative immuno-chemotherapy using dual-energy CT-derived extracellular volume fraction

Author: Chen, Yong, Jiang, Jinling, Yan, Chao, Jiang, Jiang, Shi, Bowen, Xu, Zhihan, Yuan, Fei, Zhang, Huan, and Zhang, Jun
Published: 2024
Full Text: View/download PDF
