Author: "Shen, Yilin" - Searchworks@Jio Institute Digital Library Search Results

Your search for "Shen, Yilin" returned 368 results

1. FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing

Author: Smith, James Seale, Lin, Chi-Heng, Tuli, Shikhar, Jeelani, Haris, Gao, Shangqian, Shen, Yilin, Jin, Hongxia, and Hsu, Yen-Chang
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: The rapid proliferation of large language models (LLMs) in natural language processing (NLP) has created a critical need for techniques that enable efficient deployment on memory-constrained devices without compromising performance. We present a method to prune LLMs that selectively prunes model blocks based on an importance score and replaces them with a low-parameter replacement strategy. Specifically, we propose a principled metric to replace each pruned block using a weight-sharing mechanism that leverages unpruned counterparts from the model and block-specific low-rank adapters. Furthermore, we facilitate the learning of these replacement blocks with output feature normalization and an adapter initialization scheme built on low-rank SVD reconstructions. Empirical evaluations demonstrate substantial performance gains over existing methods, achieving state-of-the-art performance on 5/6 benchmarks for a compression rate of 30% and 6/6 benchmarks for a compression rate of 40%. We also demonstrate that our approach can extend smaller models, boosting performance on 6/6 benchmarks using only ~0.3% tokens of extended training with minimal additional parameter costs.
Comment: Accepted to NAACL 2025 - Main Conference
Published: 2025

2. DISP-LLM: Dimension-Independent Structural Pruning for Large Language Models

Author: Gao, Shangqian, Lin, Chi-Heng, Hua, Ting, Zheng, Tang, Shen, Yilin, Jin, Hongxia, and Hsu, Yen-Chang
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Large Language Models (LLMs) have achieved remarkable success in various natural language processing tasks, including language modeling, understanding, and generation. However, the increased memory and computational costs associated with these models pose significant challenges for deployment on resource-limited devices. Structural pruning has emerged as a promising solution to reduce the costs of LLMs without requiring post-processing steps. Prior structural pruning methods either follow the dependence of structures at the cost of limiting flexibility, or introduce non-trivial additional parameters by incorporating different projection matrices. In this work, we propose a novel approach that relaxes the constraint imposed by regular structural pruning methods and eliminates the structural dependence along the embedding dimension. Our dimension-independent structural pruning method offers several benefits. Firstly, our method enables different blocks to utilize different subsets of the feature maps. Secondly, by removing structural dependence, we facilitate each block to possess varying widths along its input and output dimensions, thereby significantly enhancing the flexibility of structural pruning. We evaluate our method on various LLMs, including OPT, LLaMA, LLaMA-2, Phi-1.5, and Phi-2. Experimental results demonstrate that our approach outperforms other state-of-the-art methods, showing for the first time that structural pruning can achieve an accuracy similar to semi-structural pruning.
Comment: Accepted by NeurIPS 2024
Published: 2024

3. MoDeGPT: Modular Decomposition for Large Language Model Compression

Author: Lin, Chi-Heng, Gao, Shangqian, Smith, James Seale, Patel, Abhishek, Tuli, Shikhar, Shen, Yilin, Jin, Hongxia, and Hsu, Yen-Chang
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Statistics - Machine Learning, 15A23 (Primary), I.2.7
Abstract: Large Language Models (LLMs) have reshaped the landscape of artificial intelligence by demonstrating exceptional performance across various tasks. However, substantial computational requirements make their deployment challenging on devices with limited resources. Recently, compression methods using low-rank matrix techniques have shown promise, yet these often lead to degraded accuracy or introduce significant overhead in parameters and inference latency. This paper introduces Modular Decomposition (MoDeGPT), a novel structured compression framework that does not need recovery fine-tuning while resolving the above drawbacks. MoDeGPT partitions the Transformer block into modules comprised of matrix pairs and reduces the hidden dimensions via reconstructing the module-level outputs. MoDeGPT is developed based on a theoretical framework that utilizes three well-established matrix decomposition algorithms (Nyström approximation, CR decomposition, and SVD) and applies them to our redefined transformer modules. Our comprehensive experiments show MoDeGPT, without backward propagation, matches or surpasses previous structured compression methods that rely on gradient information, and saves 98% of compute costs on compressing a 13B model. On Llama-2/3 and OPT models, MoDeGPT maintains 90-95% zero-shot performance with 25-30% compression rates. Moreover, the compression can be done on a single GPU within a few hours and increases the inference throughput by up to 46%.
Comment: 31 pages, 9 figures
Published: 2024

4. DynaMo: Accelerating Language Model Inference with Dynamic Multi-Token Sampling

Author: Tuli, Shikhar, Lin, Chi-Heng, Hsu, Yen-Chang, Jha, Niraj K., Shen, Yilin, and Jin, Hongxia
Subjects: Computer Science - Computation and Language
Abstract: Traditional language models operate autoregressively, i.e., they predict one token at a time. Rapid explosion in model sizes has resulted in high inference times. In this work, we propose DynaMo, a suite of multi-token prediction language models that reduce net inference times. Our models dynamically predict multiple tokens based on their confidence in the predicted joint probability distribution. We propose a lightweight technique to train these models, leveraging the weights of traditional autoregressive counterparts. Moreover, we propose novel ways to enhance the estimated joint probability to improve text generation quality, namely co-occurrence weighted masking and adaptive thresholding. We also propose systematic qualitative and quantitative methods to rigorously test the quality of generated text for non-autoregressive generation. One of the models in our suite, DynaMo-7.3B-T3, achieves same-quality generated text as the baseline (Pythia-6.9B) while achieving 2.57× speed-up with only 5.87% and 2.67% parameter and training time overheads, respectively.
Comment: Accepted at NAACL 2024
Published: 2024

5. An index-free sparse neural network using two-dimensional semiconductor ferroelectric field-effect transistors

Author: Ning, Hongkai, Wen, Hengdi, Meng, Yuan, Yu, Zhihao, Fu, Yuxiang, Zou, Xilu, Shen, Yilin, Luo, Xiai, Zhao, Qiyue, Zhang, Tao, Liu, Lei, Zhu, Shitong, Li, Taotao, Li, Weisheng, Li, Li, Gao, Li, Shi, Yi, and Wang, Xinran
Published: 2025

6. Compositional Generalization in Spoken Language Understanding

Author: Ray, Avik, Shen, Yilin, and Jin, Hongxia
Subjects: Computer Science - Computation and Language
Abstract: State-of-the-art spoken language understanding (SLU) models have shown tremendous success in benchmark SLU datasets, yet they still fail in many practical scenarios due to the lack of model compositionality when trained on limited training data. In this paper, we study two types of compositionality: (a) novel slot combination, and (b) length generalization. We first conduct in-depth analysis, and find that state-of-the-art SLU models often learn spurious slot correlations during training, which leads to poor performance in both compositional cases. To mitigate these limitations, we create the first compositional splits of benchmark SLU datasets and we propose the first compositional SLU model, including compositional loss and paired training that tackle each compositional case respectively. On both benchmark and compositional splits in ATIS and SNIPS, we show that our compositional SLU model significantly outperforms (up to 5% F1 score) the state-of-the-art BERT SLU model.
Comment: Published in INTERSPEECH 2023
Published: 2023

7. Prompt Tuning for Zero-shot Compositional Learning

Author: Zhang, Lingyu, Hua, Ting, Shen, Yilin, and Jin, Hongxia
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Open World Compositional Zero-Shot Learning (OW-CZSL) is known to be an extremely challenging task, which aims to recognize unseen compositions formed from seen attributes and objects without any prior assumption of the output space. In order to achieve this goal, a model has to be "smart" and "knowledgeable". To be smart, a model should be good at reasoning about the interactions between attributes and objects from the seen compositions, while "knowledgeable" means the model has "common sense" about the open world that can "foresee" some features of the unseen compositions. Most previous work focuses on the "smart" part, while few of them provided an effective solution to achieve the "knowledgeable" goal. In this paper, we propose a framework named Multi-Modal Prompt Tuning (MMPT) to inherit the "knowledgeable" property from the large pre-trained vision-language model. Extensive experiments show that our proposed MMPT obtains new state-of-the-art results on the OW-CZSL task. On the UT-Zappos dataset, MMPT pushes the AUC score to 29.8, while the previous best score is 26.5. On the more challenging MIT-States dataset, the AUC score of MMPT is 1.5 times better than the current state-of-the-art.
Published: 2023

8. Token Fusion: Bridging the Gap between Token Pruning and Token Merging

Author: Kim, Minchul, Gao, Shangqian, Hsu, Yen-Chang, Shen, Yilin, and Jin, Hongxia
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Vision Transformers (ViTs) have emerged as powerful backbones in computer vision, outperforming many traditional CNNs. However, their computational overhead, largely attributed to the self-attention mechanism, makes deployment on resource-constrained edge devices challenging. Multiple solutions rely on token pruning or token merging. In this paper, we introduce "Token Fusion" (ToFu), a method that amalgamates the benefits of both token pruning and token merging. Token pruning proves advantageous when the model exhibits sensitivity to input interpolations, while token merging is effective when the model manifests close to linear responses to inputs. We combine this to propose a new scheme called Token Fusion. Moreover, we tackle the limitations of average merging, which doesn't preserve the intrinsic feature norm, resulting in distributional shifts. To mitigate this, we introduce MLERP merging, a variant of the SLERP technique, tailored to merge multiple tokens while maintaining the norm distribution. ToFu is versatile, applicable to ViTs with or without additional training. Our empirical evaluations indicate that ToFu establishes new benchmarks in both classification and image generation tasks concerning computational efficiency and model accuracy.
Comment: To appear in WACV 2024
Published: 2023

9. Continual Diffusion with STAMINA: STack-And-Mask INcremental Adapters

Author: Smith, James Seale, Hsu, Yen-Chang, Kira, Zsolt, Shen, Yilin, and Jin, Hongxia
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Recent work has demonstrated a remarkable ability to customize text-to-image diffusion models to multiple, fine-grained concepts in a sequential (i.e., continual) manner while only providing a few example images for each concept. This setting is known as continual diffusion. Here, we ask the question: Can we scale these methods to longer concept sequences without forgetting? Although prior work mitigates the forgetting of previously learned concepts, we show that its capacity to learn new tasks reaches saturation over longer sequences. We address this challenge by introducing a novel method, STack-And-Mask INcremental Adapters (STAMINA), which is composed of low-ranked attention-masked adapters and customized MLP tokens. STAMINA is designed to enhance the robust fine-tuning properties of LoRA for sequential concept learning via learnable hard-attention masks parameterized with low rank MLPs, enabling precise, scalable learning via sparse adaptation. Notably, all introduced trainable parameters can be folded back into the model after training, inducing no additional inference parameter costs. We show that STAMINA outperforms the prior SOTA for the setting of text-to-image continual customization on a 50-concept benchmark composed of landmarks and human faces, with no stored replay data. Additionally, we extended our method to the setting of continual learning for image classification, demonstrating that our gains also translate to state-of-the-art performance in this standard benchmark.
Comment: CVPR-W 2024
Published: 2023

10. Surgical intervention of Lemierre’s syndrome: a case report and review of the literature

Author: Pan, Yiqi, Shi, Zhihong, Ye, Bin, Da, Qian, Wang, Chaofu, Shen, Yilin, and Xiang, Mingliang
Published: 2024

11. Academic grit scale for Chinese middle- and upper-grade primary school students: testing its factor structure and measurement invariance

Author: Lin, Rongmao, Chen, Yanping, Shen, Yilin, Hu, Ting, Huang, Ying, Yang, Yishan, Yu, Xueting, and Ding, Jinliang
Published: 2024

12. Prepare Ansatz for VQE with Diffusion Model

Author: Shen, Yilin
Subjects: Quantum Physics
Abstract: The Variational Quantum Eigensolver (VQE) is a quantum algorithm used to find the ground state energy of a given Hamiltonian. The key component of VQE is the ansatz, which is a trial wavefunction that the algorithm uses to approximate the ground state. Designing a good ansatz can significantly improve the performance of the VQE algorithm. Typical ansatz structures include the Unitary Coupled Cluster (UCC) ansatz and the Hardware-Efficient Ansatz (HEA). The primary distinction between these two structures lies in their dependence on the problem and hardware. The UCC ansatz is tailored to the target Hamiltonian, whereas the HEA is determined by the hardware topology. We believe that an intermediate approach could combine the benefits of the UCC ansatz while introducing additional parameters to increase its expressiveness and capability. In this paper, we propose utilizing a diffusion model to facilitate the generation of ansatz. We create a sequence of UCC ansatzes as training data and input this data into the diffusion model. The model then generates quantum circuits that have a similar structure to the input data. These quantum circuits are subsequently tested using a VQE task to evaluate their performance. This approach provides a systematic method for generating ansatzes that maintain a similar structure while incorporating additional parameters, enhancing their expressiveness and capability. We validate on small molecules that the diffusion model can help prepare ansatz circuits for VQE.
Published: 2023

13. CWCL: Cross-Modal Transfer with Continuously Weighted Contrastive Loss

Author: Srinivasa, Rakshith Sharma, Cho, Jaejin, Yang, Chouchang, Saidutta, Yashas Malur, Lee, Ching-Hua, Shen, Yilin, and Jin, Hongxia
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper considers contrastive training for cross-modal 0-shot transfer wherein a pre-trained model in one modality is used for representation learning in another domain using pairwise data. The learnt models in the latter domain can then be used for a diverse set of tasks in a zero-shot way, similar to "Contrastive Language-Image Pre-training (CLIP)" and "Locked-image Tuning (LiT)" that have recently gained considerable attention. Most existing works for cross-modal representation alignment (including CLIP and LiT) use the standard contrastive training objective, which employs sets of positive and negative examples to align similar and repel dissimilar training data samples. However, similarity amongst training examples has a more continuous nature, thus calling for a more 'non-binary' treatment. To address this, we propose a novel loss function called Continuously Weighted Contrastive Loss (CWCL) that employs a continuous measure of similarity. With CWCL, we seek to align the embedding space of one modality with another. Owing to the continuous nature of similarity in the proposed loss function, these models outperform existing methods for 0-shot transfer across multiple models, datasets and modalities. Particularly, we consider the modality pairs of image-text and speech-text and our models achieve 5-8% (absolute) improvement over previous state-of-the-art methods in 0-shot image classification and 20-30% (absolute) improvement in 0-shot speech-to-intent classification and keyword classification.
Comment: Accepted to Neural Information Processing Systems (NeurIPS) 2023 conference
Published: 2023

14. Exploiting viral vectors to deliver genome editing reagents in plants

Author: Shen, Yilin, Ye, Tao, Li, Zihan, Kimutai, Torotwa Herman, Song, Hao, Dong, Xiaoou, and Wan, Jianmin
Published: 2024

15. TrojLLM: A Black-box Trojan Prompt Attack on Large Language Models

Author: Xue, Jiaqi, Zheng, Mengxin, Hua, Ting, Shen, Yilin, Liu, Yepeng, Boloni, Ladislau, and Lou, Qian
Subjects: Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Large Language Models (LLMs) are progressively being utilized as machine learning services and interface tools for various applications. However, the security implications of LLMs, particularly in relation to adversarial and Trojan attacks, remain insufficiently examined. In this paper, we propose TrojLLM, an automatic and black-box framework to effectively generate universal and stealthy triggers. When these triggers are incorporated into the input data, the LLMs' outputs can be maliciously manipulated. Moreover, the framework also supports embedding Trojans within discrete prompts, enhancing the overall effectiveness and precision of the triggers' attacks. Specifically, we propose a trigger discovery algorithm for generating universal triggers for various inputs by querying victim LLM-based APIs using few-shot data samples. Furthermore, we introduce a novel progressive Trojan poisoning algorithm designed to generate poisoned prompts that retain efficacy and transferability across a diverse range of models. Our experiments and results demonstrate TrojLLM's capacity to effectively insert Trojans into text prompts in real-world black-box LLM APIs including GPT-3.5 and GPT-4, while maintaining exceptional performance on clean test sets. Our work sheds light on the potential security risks in current models and offers a potential defensive approach. The source code of TrojLLM is available at https://github.com/UCF-ML-Research/TrojLLM.
Comment: Accepted by NeurIPS'23
Published: 2023

16. Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA

Author: Smith, James Seale, Hsu, Yen-Chang, Zhang, Lingyu, Hua, Ting, Kira, Zsolt, Shen, Yilin, and Jin, Hongxia
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Recent works demonstrate a remarkable ability to customize text-to-image diffusion models while only providing a few example images. What happens if you try to customize such models using multiple, fine-grained concepts in a sequential (i.e., continual) manner? In our work, we show that recent state-of-the-art customization of text-to-image models suffers from catastrophic forgetting when new concepts arrive sequentially. Specifically, when adding a new concept, the ability to generate high quality images of past, similar concepts degrades. To circumvent this forgetting, we propose a new method, C-LoRA, composed of a continually self-regularized low-rank adaptation in cross attention layers of the popular Stable Diffusion model. Furthermore, we use customization prompts which do not include the word of the customized object (i.e., "person" for a human face dataset) and are initialized as completely random embeddings. Importantly, our method induces only marginal additional parameter costs and requires no storage of user data for replay. We show that C-LoRA not only outperforms several baselines for our proposed setting of text-to-image continual customization, which we refer to as Continual Diffusion, but that we achieve a new state-of-the-art in the well-established rehearsal-free continual learning setting for image classification. The high achieving performance of C-LoRA in two separate domains positions it as a compelling solution for a wide range of applications, and we believe it has significant potential for practical impact. Project page: https://jamessealesmith.github.io/continual-diffusion/
Comment: Transactions on Machine Learning Research (TMLR) 2024
Published: 2023

17. To Wake-up or Not to Wake-up: Reducing Keyword False Alarm by Successive Refinement

Author: Saidutta, Yashas Malur, Srinivasa, Rakshith Sharma, Lee, Ching-Hua, Yang, Chouchang, Shen, Yilin, and Jin, Hongxia
Subjects: Electrical Engineering and Systems Science - Signal Processing, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Keyword spotting systems continuously process audio streams to detect keywords. One of the most challenging tasks in designing such systems is to reduce False Alarm (FA) which happens when the system falsely registers a keyword despite the keyword not being uttered. In this paper, we propose a simple yet elegant solution to this problem that follows from the law of total probability. We show that existing deep keyword spotting mechanisms can be improved by Successive Refinement, where the system first classifies whether the input audio is speech or not, followed by whether the input is keyword-like or not, and finally classifies which keyword was uttered. We show across multiple models with size ranging from 13K parameters to 2.41M parameters, the successive refinement technique reduces FA by up to a factor of 8 on in-domain held-out FA data, and up to a factor of 7 on out-of-domain (OOD) FA data. Further, our proposed approach is "plug-and-play" and can be applied to any deep keyword spotting model.
Comment: Accepted for publication in ICASSP 2023
Published: 2023

18. ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation

Author: Zhou, Kaiwen, Zheng, Kaizhi, Pryor, Connor, Shen, Yilin, Jin, Hongxia, Getoor, Lise, and Wang, Xin Eric
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Robotics
Abstract: The ability to accurately locate and navigate to a specific object is a crucial capability for embodied agents that operate in the real world and interact with objects to complete tasks. Such object navigation tasks usually require large-scale training in visual environments with labeled objects, which generalizes poorly to novel objects in unknown environments. In this work, we present a novel zero-shot object navigation method, Exploration with Soft Commonsense constraints (ESC), that transfers commonsense knowledge in pre-trained models to open-world object navigation without any navigation experience or any other training on the visual environments. First, ESC leverages a pre-trained vision and language model for open-world prompt-based grounding and a pre-trained commonsense language model for room and object reasoning. Then ESC converts commonsense knowledge into navigation actions by modeling it as soft logic predicates for efficient exploration. Extensive experiments on MP3D, HM3D, and RoboTHOR benchmarks show that our ESC method improves significantly over baselines, and achieves new state-of-the-art results for zero-shot object navigation (e.g., 288% relative Success Rate improvement over CoW on MP3D).
Published: 2023

19. GOHSP: A Unified Framework of Graph and Optimization-based Heterogeneous Structured Pruning for Vision Transformer

Author: Yin, Miao, Uzkent, Burak, Shen, Yilin, Jin, Hongxia, and Yuan, Bo
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: The recently proposed Vision transformers (ViTs) have shown very impressive empirical performance in various computer vision tasks, and they are viewed as an important type of foundation model. However, ViTs are typically constructed with large-scale sizes, which then severely hinder their potential deployment in many practical resource-constrained applications. To mitigate this challenging problem, structured pruning is a promising solution to compress model size and enable practical efficiency. However, unlike its current popularity for CNNs and RNNs, structured pruning for ViT models is little explored. In this paper, we propose GOHSP, a unified framework of Graph and Optimization-based Structured Pruning for ViT models. We first develop a graph-based ranking for measuring the importance of attention heads, and the extracted importance information is further integrated into an optimization-based procedure to impose the heterogeneous structured sparsity patterns on the ViT models. Experimental results show that our proposed GOHSP demonstrates excellent compression performance. On the CIFAR-10 dataset, our approach can bring 40% parameter reduction with no accuracy loss for the ViT-Small model. On the ImageNet dataset, with 30% and 35% sparsity ratios for DeiT-Tiny and DeiT-Small models, our approach achieves 1.65% and 0.76% accuracy increases over the existing structured pruning methods, respectively.
Comment: This manuscript was accepted to AAAI 2023 Main Track
Published: 2023

20. Numerical Optimizations for Weighted Low-rank Estimation on Language Model

Author: Hua, Ting, Hsu, Yen-Chang, Wang, Felicity, Lou, Qian, Shen, Yilin, and Jin, Hongxia
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Singular value decomposition (SVD) is one of the most popular compression methods that approximate a target matrix with smaller matrices. However, standard SVD treats the parameters within the matrix with equal importance, which is a simple but unrealistic assumption. The parameters of a trained neural network model may affect task performance unevenly, which suggests non-equal importance among the parameters. Compared to SVD, the decomposition method aware of parameter importance is the more practical choice in real cases. Unlike standard SVD, weighted value decomposition is a non-convex optimization problem that lacks a closed-form solution. We systematically investigated multiple optimization strategies to tackle the problem and examined our method by compressing Transformer-based language models. Further, we designed a metric to predict when the SVD may introduce a significant performance drop, for which our method can be a rescue strategy. The extensive evaluations demonstrate that our method can perform better than current SOTA methods in compressing Transformer-based language models.
Comment: long paper EMNLP 2022
Published: 2022

21. Language model compression with weighted low-rank factorization

Author: Hsu, Yen-Chang, Hua, Ting, Chang, Sungen, Lou, Qian, Shen, Yilin, and Jin, Hongxia
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Factorizing a large matrix into small matrices is a popular strategy for model compression. Singular value decomposition (SVD) plays a vital role in this compression strategy, approximating a learned matrix with fewer parameters. However, SVD minimizes the squared error toward reconstructing the original matrix without gauging the importance of the parameters, potentially giving a larger reconstruction error for those that affect the task accuracy more. In other words, the optimization objective of SVD is not aligned with the trained model's task accuracy. We analyze this previously unexplored problem, make observations, and address it by introducing Fisher information to weigh the importance of parameters affecting the model prediction. This idea leads to our method: Fisher-Weighted SVD (FWSVD). Although the factorized matrices from our approach do not result in smaller reconstruction errors, we find that our resulting task accuracy is much closer to the original model's performance. We perform analysis with the transformer-based language models, showing our weighted SVD largely alleviates the mismatched optimization objectives and can maintain model performance with a higher compression rate. Our method can directly compress a task-specific model while achieving better performance than other compact model strategies requiring expensive model pre-training. Moreover, the evaluation of compressing an already compact model shows our method can further reduce 9% to 30% parameters with an insignificant impact on task accuracy.
Comment: ICLR 2022
Published: 2022

22. Systematical analysis and application of distributed activation energy model (DAEM) with Weibull distribution for pyrolysis kinetics of lignocellulosic biomass

Author: Yang, Yantao, Jiang, Mingshen, Song, Lei, Shen, Yilin, Lei, Tingzhou, and Cai, Junmeng
Published: 2024

23. Mutual regulation between histone methyltransferase Suv39h1 and the Wnt/β-catenin signaling pathway promoted cell proliferation and inhibited apoptosis in bone marrow mesenchymal stem cells exposed to hydroquinone

Author: Xu, Tao, Shen, Yilin, Guo, Runmin, Luo, Chiheng, Niu, Yibo, Luo, Zhilong, Zhu, Zhongxin, Wu, Zehui, Zhao, Xinyu, Luo, Hao, and Gao, Yuting
Published: 2024

24. A Closer Look at Knowledge Distillation with Features, Logits, and Gradients

Author: Hsu, Yen-Chang, Smith, James, Shen, Yilin, Kira, Zsolt, and Jin, Hongxia
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Knowledge distillation (KD) is a substantial strategy for transferring learned knowledge from one neural network model to another. A vast number of methods have been developed for this strategy. While most methods design a more efficient way to facilitate knowledge transfer, less attention has been paid to comparing the effect of knowledge sources such as features, logits, and gradients. This work provides a new perspective to motivate a set of knowledge distillation strategies by approximating the classical KL-divergence criteria with different knowledge sources, making a systematic comparison possible in model compression and incremental learning. Our analysis indicates that logits are generally a more efficient knowledge source and suggests that having sufficient feature dimensions is crucial for the model design, providing a practical guideline for effective KD-based transfer learning.
Published: 2022

25. MGA-VQA: Multi-Granularity Alignment for Visual Question Answering

Author: Xiong, Peixi, Shen, Yilin, and Jin, Hongxia
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Learning to answer visual questions is a challenging task since the multi-modal inputs are within two feature spaces. Moreover, reasoning in visual question answering requires the model to understand both image and question, and align them in the same space, rather than simply memorize statistics about the question-answer pairs. Thus, it is essential to find component connections between different modalities and within each modality to achieve better attention. Previous works learned attention weights directly on the features. However, the improvement is limited since these two modality features are in two domains: image features are highly diverse, lacking structure and grammatical rules as language, and natural language features have a higher probability of missing detailed information. To better learn the attention between visual and text, we focus on how to construct input stratification and embed structural information to improve the alignment between different level components. We propose Multi-Granularity Alignment architecture for Visual Question Answering task (MGA-VQA), which learns intra- and inter-modality correlations by multi-granularity alignment, and outputs the final result by the decision fusion module. In contrast to previous works, our model splits alignment into different levels to achieve learning better correlations without needing additional data and annotations. The experiments on the VQA-v2 and GQA datasets demonstrate that our model significantly outperforms non-pretrained state-of-the-art methods on both datasets without extra pretraining data and annotations. Moreover, it even achieves better results over the pre-trained methods on GQA.
Published: 2022

26. Hyperparameter-free Continuous Learning for Domain Classification in Natural Language Understanding

Author: Hua, Ting, Shen, Yilin, Zhao, Changsheng, Hsu, Yen-Chang, and Jin, Hongxia
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Domain classification is the fundamental task in natural language understanding (NLU), which often requires fast accommodation to new emerging domains. This constraint makes it impossible to retrain all previous domains, even if they are accessible to the new model. Most existing continual learning approaches suffer from low accuracy and performance fluctuation, especially when the distributions of old and new data are significantly different. In fact, the key real-world problem is not the absence of old data, but the inefficiency of retraining the model with the whole old dataset. Is it possible to utilize some old data to yield high accuracy and maintain stable performance, while at the same time not introducing extra hyperparameters? In this paper, we propose a hyperparameter-free continual learning model for text data that can stably produce high performance under various environments. Specifically, we utilize Fisher information to select exemplars that can "record" key information of the original model. Also, a novel scheme called dynamical weight consolidation is proposed to enable hyperparameter-free learning during the retrain process. Extensive experiments demonstrate that baselines suffer from fluctuating performance and are therefore useless in practice. On the contrary, our proposed model CCFI significantly and consistently outperforms the best state-of-the-art method by up to 20% in average accuracy, and each component of CCFI contributes effectively to overall performance.
Published: 2022
Full Text: View/download PDF

27. Automatic Mixed-Precision Quantization Search of BERT

Author: Zhao, Changsheng, Hua, Ting, Shen, Yilin, Lou, Qian, and Jin, Hongxia
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Pre-trained language models such as BERT have shown remarkable effectiveness in various natural language processing tasks. However, these models usually contain millions of parameters, which prevents them from practical deployment on resource-constrained devices. Knowledge distillation, Weight pruning, and Quantization are known to be the main directions in model compression. However, compact models obtained through knowledge distillation may suffer from significant accuracy drop even for a relatively small compression ratio. On the other hand, there are only a few quantization attempts that are specifically designed for natural language processing tasks. They suffer from a small compression ratio or a large error rate since manual setting on hyper-parameters is required and fine-grained subgroup-wise quantization is not supported. In this paper, we proposed an automatic mixed-precision quantization framework designed for BERT that can simultaneously conduct quantization and pruning in a subgroup-wise level. Specifically, our proposed method leverages Differentiable Neural Architecture Search to assign scale and precision for parameters in each sub-group automatically, and at the same time pruning out redundant groups of parameters. Extensive evaluations on BERT downstream tasks reveal that our proposed method outperforms baselines by providing the same performance with much smaller model size. We also show the feasibility of obtaining the extremely light-weight model by combining our solution with orthogonal methods such as DistilBERT.
Published: 2021
Full Text: View/download PDF

28. Exploring Covariate and Concept Shift for Detection and Calibration of Out-of-Distribution Data

Author: Tian, Junjiao, Hsu, Yen-Chang, Shen, Yilin, Jin, Hongxia, and Kira, Zsolt
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Moving beyond testing on in-distribution data, works on Out-of-Distribution (OOD) detection have recently increased in popularity. A recent attempt to categorize OOD data introduces the concept of near and far OOD detection. Specifically, prior works define characteristics of OOD data in terms of detection difficulty. We propose to characterize the spectrum of OOD data using two types of distribution shifts: covariate shift and concept shift, where covariate shift corresponds to change in style, e.g., noise, and concept shift indicates a change in semantics. This characterization reveals that sensitivity to each type of shift is important to the detection and confidence calibration of OOD data. Consequently, we investigate score functions that capture sensitivity to each type of dataset shift and methods that improve them. To this end, we theoretically derive two score functions for OOD detection, the covariate shift score and concept shift score, based on the decomposition of KL-divergence for both scores, and propose a geometrically-inspired method (Geometric ODIN) to improve OOD detection under both shifts with only in-distribution data. Additionally, the proposed method naturally leads to an expressive post-hoc calibration function which yields state-of-the-art calibration performance on both in-distribution and out-of-distribution data. We are the first to propose a method that works well across both OOD detection and calibration and under different types of shifts. View project page at https://sites.google.com/view/geometric-decomposition., Comment: A short version of the paper is accepted to NeurIPS DistShift Workshop 2021
Published: 2021

29. Enhancing the Generalization for Intent Classification and Out-of-Domain Detection in SLU

Author: Shen, Yilin, Hsu, Yen-Chang, Ray, Avik, and Jin, Hongxia
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Intent classification is a major task in spoken language understanding (SLU). Since most models are built with pre-collected in-domain (IND) training utterances, their ability to detect unsupported out-of-domain (OOD) utterances has a critical effect in practical use. Recent works have shown that using extra data and labels can improve the OOD detection performance, yet it could be costly to collect such data. This paper proposes to train a model with only IND data while supporting both IND intent classification and OOD detection. Our method designs a novel domain-regularized module (DRM) to reduce the overconfident phenomenon of a vanilla classifier, achieving a better generalization in both cases. Besides, DRM can be used as a drop-in replacement for the last layer in any neural network-based intent classifier, providing a low-cost strategy for a significant improvement. The evaluation on four datasets shows that our method built on BERT and RoBERTa models achieves state-of-the-art performance against existing approaches and the strong baselines we created for the comparisons.
Published: 2021

30. Always Be Dreaming: A New Approach for Data-Free Class-Incremental Learning

Author: Smith, James, Hsu, Yen-Chang, Balloch, Jonathan, Shen, Yilin, Jin, Hongxia, and Kira, Zsolt
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Modern computer vision applications suffer from catastrophic forgetting when incrementally learning new concepts over time. The most successful approaches to alleviate this forgetting require extensive replay of previously seen data, which is problematic when memory constraints or data legality concerns exist. In this work, we consider the high-impact problem of Data-Free Class-Incremental Learning (DFCIL), where an incremental learning agent must learn new concepts over time without storing generators or training data from past tasks. One approach for DFCIL is to replay synthetic images produced by inverting a frozen copy of the learner's classification model, but we show this approach fails for common class-incremental benchmarks when using standard distillation strategies. We diagnose the cause of this failure and propose a novel incremental distillation strategy for DFCIL, contributing a modified cross-entropy training and importance-weighted feature distillation, and show that our method results in up to a 25.1% increase in final task accuracy (absolute difference) compared to SOTA DFCIL methods for common class-incremental benchmarks. Our method even outperforms several standard replay based methods which store a coreset of images., Comment: Accepted by the 2021 International Conference on Computer Vision (ICCV 2021)
Published: 2021

31. An Adversarial Learning based Multi-Step Spoken Language Understanding System through Human-Computer Interaction

Author: Wang, Yu, Shen, Yilin, and Jin, Hongxia
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Most of the existing spoken language understanding systems can perform only semantic frame parsing based on a single-round user query. They cannot take users' feedback to update/add/remove slot values through multiround interactions with users. In this paper, we introduce a novel multi-step spoken language understanding system based on adversarial learning that can leverage the multiround user's feedback to update slot values. We perform two experiments on the benchmark ATIS dataset and demonstrate that the new system can improve parsing performance by at least 2.5% in terms of F1, with only one round of feedback. The improvement becomes even larger when the number of feedback rounds increases. Furthermore, we also compare the new system with state-of-the-art dialogue state tracking systems and demonstrate that the new interactive system can perform better on multiround spoken language understanding tasks in terms of slot- and sentence-level accuracy., Comment: 5 Pages, original work published at ICASSP 2021
Published: 2021

32. Laminated composite fabricated using high-performance polyamine thermoset: Ultra heat resistance and excellent mechanical property

Author: Shen, Yilin, Wang, Shengtao, Du, Guanben, Qin, Tao, Jiang, Shuyang, Liu, Shouqing, Duan, Zhigang, Niu, Hui, and Li, Taohong
Published: 2024
Full Text: View/download PDF

33. Modeling Token-level Uncertainty to Learn Unknown Concepts in SLU via Calibrated Dirichlet Prior RNN

Author: Shen, Yilin, Chen, Wenhu, and Jin, Hongxia
Subjects: Computer Science - Artificial Intelligence
Abstract: One major task of spoken language understanding (SLU) in modern personal assistants is to extract semantic concepts from an utterance, called slot filling. Although existing slot filling models attempted to improve extracting new concepts that are not seen in training data, the performance in practice is still not satisfactory. Recent research collected question and answer annotated data to learn what is unknown and should be asked, yet this is not practically scalable due to the heavy data collection effort. In this paper, we incorporate softmax-based slot filling neural architectures to model the sequence uncertainty without question supervision. We design a Dirichlet Prior RNN to model high-order uncertainty by degenerating as softmax layer for RNN model training. To further enhance the uncertainty modeling robustness, we propose a novel multi-task training to calibrate the Dirichlet concentration parameters. We collect unseen concepts to create two test datasets from SLU benchmark datasets Snips and ATIS. On these two and another existing Concept Learning benchmark datasets, we show that our approach significantly outperforms state-of-the-art approaches by up to 8.18%. Our method is generic and can be applied to any RNN or Transformer based slot filling models with a softmax layer.
Published: 2020

34. Generating Dialogue Responses from a Semantic Latent Space

Author: Ko, Wei-Jen, Ray, Avik, Shen, Yilin, and Jin, Hongxia
Subjects: Computer Science - Computation and Language
Abstract: Existing open-domain dialogue generation models are usually trained to mimic the gold response in the training set using cross-entropy loss on the vocabulary. However, a good response does not need to resemble the gold response, since there are multiple possible responses to a given prompt. In this work, we hypothesize that the current models are unable to integrate information from multiple semantically similar valid responses of a prompt, resulting in the generation of generic and uninformative responses. To address this issue, we propose an alternative to the end-to-end classification on vocabulary. We learn the pair relationship between the prompts and responses as a regression task on a latent space instead. In our novel dialog generation model, the representations of semantically related sentences are close to each other on the latent space. Human evaluation showed that learning the task on a continuous space can generate responses that are both relevant and informative., Comment: EMNLP 2020
Published: 2020

35. Reward Constrained Interactive Recommendation with Natural Language Feedback

Author: Zhang, Ruiyi, Yu, Tong, Shen, Yilin, Jin, Hongxia, Chen, Changyou, and Carin, Lawrence
Subjects: Computer Science - Computation and Language, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Text-based interactive recommendation provides richer user feedback and has demonstrated advantages over traditional interactive recommender systems. However, recommendations can easily violate preferences of users from their past natural-language feedback, since the recommender needs to explore new items for further improvement. To alleviate this issue, we propose a novel constraint-augmented reinforcement learning (RL) framework to efficiently incorporate user preferences over time. Specifically, we leverage a discriminator to detect recommendations violating user historical preference, which is incorporated into the standard RL objective of maximizing expected cumulative future rewards. Our proposed framework is general and is further extended to the task of constrained text generation. Empirical results show that the proposed method yields consistent improvement relative to standard RL methods., Comment: Appeared in NeurIPS 2019; Updated version
Published: 2020

36. PGLP: Customizable and Rigorous Location Privacy through Policy Graph

Author: Cao, Yang, Xiao, Yonghui, Takagi, Shun, Xiong, Li, Yoshikawa, Masatoshi, Shen, Yilin, Liu, Jinfei, Jin, Hongxia, and Xu, Xiaofeng
Subjects: Computer Science - Cryptography and Security, Computer Science - Computers and Society
Abstract: Location privacy has been extensively studied in the literature. However, existing location privacy models are either not rigorous or not customizable, which limits the trade-off between privacy and utility in many real-world applications. To address this issue, we propose a new location privacy notion called PGLP, i.e., Policy Graph based Location Privacy, providing a rich interface to release private locations with customizable and rigorous privacy guarantee. First, we design the privacy metrics of PGLP by extending differential privacy. Specifically, we formalize a user's location privacy requirements using a location policy graph, which is expressive and customizable. Second, we investigate how to satisfy an arbitrarily given location policy graph under adversarial knowledge. We find that a location policy graph may not always be viable and may suffer location exposure when the attacker knows the user's mobility pattern. We propose efficient methods to detect location exposure and repair the policy graph with optimal utility. Third, we design a private location trace release framework that pipelines the detection of location exposure, policy graph repair, and private trajectory release with customizable and rigorous location privacy. Finally, we conduct experiments on real-world datasets to verify the effectiveness of the privacy-utility trade-off and the efficiency of the proposed algorithms., Comment: accepted in the 25th European Symposium on Research in Computer Security (ESORICS) 2020
Published: 2020

37. Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data

Author: Hsu, Yen-Chang, Shen, Yilin, Jin, Hongxia, and Kira, Zsolt
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Deep neural networks have attained remarkable performance when applied to data that comes from the same distribution as that of the training set, but can significantly degrade otherwise. Therefore, detecting whether an example is out-of-distribution (OoD) is crucial to enable a system that can reject such samples or alert users. Recent works have made significant progress on OoD benchmarks consisting of small image datasets. However, many recent methods based on neural networks rely on training or tuning with both in-distribution and out-of-distribution data. The latter is generally hard to define a-priori, and its selection can easily bias the learning. We base our work on a popular method ODIN, proposing two strategies for freeing it from the needs of tuning with OoD data, while improving its OoD detection performance. We specifically propose to decompose confidence scoring as well as a modified input pre-processing method. We show that both of these significantly help in detection performance. Our further analysis on a larger scale image dataset shows that the two types of distribution shifts, specifically semantic shift and non-semantic shift, present a significant difference in the difficulty of the problem, providing an analysis of when ODIN-like strategies do or do not work., Comment: CVPR 2020
Published: 2020

38. Applicability of the cognitive model of generalized anxiety disorder to adolescents’ sleep quality: A cross-sectional and longitudinal analysis

Author: Xiao, Huiwen, Shen, Yilin, Zhang, Weizhong, and Lin, Rongmao
Published: 2023
Full Text: View/download PDF

39. Iterative Delexicalization for Improved Spoken Language Understanding

Author: Ray, Avik, Shen, Yilin, and Jin, Hongxia
Subjects: Computer Science - Computation and Language
Abstract: Recurrent neural network (RNN) based joint intent classification and slot tagging models have achieved tremendous success in recent years for building spoken language understanding and dialog systems. However, these models suffer from poor performance for slots which often encounter large semantic variability in slot values after deployment (e.g. message texts, partial movie/artist names). While greedy delexicalization of slots in the input utterance via substring matching can partly improve performance, it often produces incorrect input. Moreover, such techniques cannot delexicalize slots with out-of-vocabulary slot values not seen at training. In this paper, we propose a novel iterative delexicalization algorithm, which can accurately delexicalize the input, even with out-of-vocabulary slot values. Based on model confidence of the current delexicalized input, our algorithm improves delexicalization in every iteration to converge to the best input having the highest confidence. We show on benchmark and in-house datasets that our algorithm can greatly improve parsing performance for RNN based models, especially for out-of-distribution slot values., Comment: Published at INTERSPEECH 2019, Graz, Austria
Published: 2019

40. How Large a Vocabulary Does Text Classification Need? A Variational Approach to Vocabulary Selection

Author: Chen, Wenhu, Su, Yu, Shen, Yilin, Chen, Zhiyu, Yan, Xifeng, and Wang, William
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: With the rapid development in deep learning, deep neural networks have been widely adopted in many real-life natural language applications. Under deep neural networks, a pre-defined vocabulary is required to vectorize text inputs. The canonical approach to select pre-defined vocabulary is based on the word frequency, where a threshold is selected to cut off the long tail distribution. However, we observed that such a simple approach could easily lead to under-sized vocabulary or over-sized vocabulary issues. Therefore, we are interested in understanding how the end-task classification accuracy is related to the vocabulary size and what is the minimum required vocabulary size to achieve a specific performance. In this paper, we provide a more sophisticated variational vocabulary dropout (VVD) based on variational dropout to perform vocabulary selection, which can intelligently select the subset of the vocabulary to achieve the required performance. To evaluate different algorithms on the newly proposed vocabulary selection problem, we propose two new metrics: Area Under Accuracy-Vocab Curve and Vocab Size under X% Accuracy Drop. Through extensive experiments on various NLP classification tasks, our variational framework is shown to significantly outperform the frequency-based and other selection baselines on these metrics., Comment: Accepted to NAACL 2019, 11 pages, 7 figures, 3 tables
Published: 2019

41. Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded

Author: Selvaraju, Ramprasaath R., Lee, Stefan, Shen, Yilin, Jin, Hongxia, Ghosh, Shalini, Heck, Larry, Batra, Dhruv, and Parikh, Devi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Many vision and language models suffer from poor visual grounding - often falling back on easy-to-learn language priors rather than basing their decisions on visual concepts in the image. In this work, we propose a generic approach called Human Importance-aware Network Tuning (HINT) that effectively leverages human demonstrations to improve visual grounding. HINT encourages deep networks to be sensitive to the same input regions as humans. Our approach optimizes the alignment between human attention maps and gradient-based network importances - ensuring that models learn not just to look at but rather rely on visual concepts that humans found relevant for a task when making predictions. We apply HINT to Visual Question Answering and Image Captioning tasks, outperforming top approaches on splits that penalize over-reliance on language priors (VQA-CP and robust captioning) using human attention demonstrations for just 6% of the training data., Comment: Published at ICCV'2019
Published: 2019

42. Novel melamine-based engineering thermosets: Facile synthesis, extraordinary thermostability, high strength and toughness

Author: Wang, Shengtao, Shen, Yilin, Du, Guanben, Jiang, Shuyang, Liu, Shouqing, Niu, Hui, Li, Le, Qin, Tao, Duan, Zhigang, and Li, Taohong
Published: 2023
Full Text: View/download PDF

43. Novel and high-performance tannin-polyamine adhesive: New insight into phenol-amine chemistry

Author: Jiang, Shuyang, Niu, Hui, Wang, Shengtao, Qian, Zhang, Du, Guanben, Zhou, Xiaojian, Shen, Yilin, Yang, Zhaojin, and Li, Taohong
Published: 2023
Full Text: View/download PDF

44. A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling

Author: Wang, Yu, Shen, Yilin, and Jin, Hongxia
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Intent detection and slot filling are two main tasks for building a spoken language understanding (SLU) system. Multiple deep learning based models have demonstrated good results on these tasks. The most effective algorithms are based on the structures of sequence to sequence models (or "encoder-decoder" models), and generate the intents and semantic tags either using separate models or a joint model. Most of the previous studies, however, either treat the intent detection and slot filling as two separate parallel tasks, or use a sequence to sequence model to generate both semantic tags and intent. Most of these approaches use one (joint) NN based model (including encoder-decoder structure) to model two tasks, hence may not fully take advantage of the cross-impact between them. In this paper, new Bi-model based RNN semantic frame parsing network structures are designed to perform the intent detection and slot filling tasks jointly, by considering their cross-impact to each other using two correlated bidirectional LSTMs (BLSTM). Our Bi-model structure with a decoder achieves state-of-the-art result on the benchmark ATIS data, with about 0.5% intent accuracy improvement and 0.9% slot filling improvement., Comment: 5 pages, published at 2018 NAACL
Published: 2018

45. A Variational Dirichlet Framework for Out-of-Distribution Detection

Author: Chen, Wenhu, Shen, Yilin, Jin, Hongxia, and Wang, William
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
Abstract: With the recent rapid development in deep learning, deep neural networks have been widely adopted in many real-life applications. However, deep neural networks are also known to have very little control over their uncertainty for unseen examples, which potentially causes very harmful and annoying consequences in practical scenarios. In this paper, we are particularly interested in designing a higher-order uncertainty metric for deep neural networks and investigate its effectiveness under the out-of-distribution detection task proposed by \cite{hendrycks2016baseline}. Our method first assumes there exists an underlying higher-order distribution $\mathbb{P}(z)$, which controls label-wise categorical distribution $\mathbb{P}(y)$ over classes on the K-dimension simplex, and then approximates such higher-order distribution via parameterized posterior function $p_{\theta}(z|x)$ under variational inference framework. Finally, we use the entropy of learned posterior distribution $p_{\theta}(z|x)$ as uncertainty measure to detect out-of-distribution examples. Further, we propose an auxiliary objective function to discriminate against synthesized adversarial examples to further increase the robustness of the proposed uncertainty measure. Through comprehensive experiments on various datasets, our proposed framework is demonstrated to consistently outperform competing algorithms., Comment: Tech Report
Published: 2018

46. User Information Augmented Semantic Frame Parsing using Coarse-to-Fine Neural Networks

Author: Shen, Yilin, Zeng, Xiangyu, Wang, Yu, and Jin, Hongxia
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Semantic frame parsing is a crucial component in spoken language understanding (SLU) to build spoken dialog systems. It has two main tasks: intent detection and slot filling. Although state-of-the-art approaches showed good results, they require large annotated training data and long training time. In this paper, we aim to alleviate these drawbacks for semantic frame parsing by utilizing the ubiquitous user information. We design a novel coarse-to-fine deep neural network model to incorporate prior knowledge of user information intermediately to better and quickly train a semantic frame parser. Due to the lack of benchmark dataset with real user information, we synthesize the simplest type of user information (location and time) on ATIS benchmark data. The results show that our approach leverages such simple user information to outperform state-of-the-art approaches by 0.25% for intent detection and 0.31% for slot filling using standard training data. When using smaller training data, the performance improvement on intent detection and slot filling reaches up to 1.35% and 1.20% respectively. We also show that our approach can achieve similar performance as state-of-the-art approaches by using less than 80% annotated training data. Moreover, the training time to achieve the similar performance is also reduced by over 60%.
Published: 2018

47. Robust Spoken Language Understanding via Paraphrasing

Author: Ray, Avik, Shen, Yilin, and Jin, Hongxia
Subjects: Computer Science - Computation and Language
Abstract: Learning intents and slot labels from user utterances is a fundamental step in all spoken language understanding (SLU) and dialog systems. State-of-the-art neural network based methods, after deployment, often suffer from performance degradation on encountering paraphrased utterances, and out-of-vocabulary words, rarely observed in their training set. We address this challenging problem by introducing a novel paraphrasing based SLU model which can be integrated with any existing SLU model in order to improve their overall performance. We propose two new paraphrase generators using RNN and sequence-to-sequence based neural networks, which are suitable for our application. Our experiments on existing benchmark and in house datasets demonstrate the robustness of our models to rare and complex paraphrased utterances, even under adversarial test distributions., Comment: Published in Proceedings of INTERSPEECH 2018
Published: 2018

48. Human-Interactive Subgoal Supervision for Efficient Inverse Reinforcement Learning

Author: Pan, Xinlei, Ohn-Bar, Eshed, Rhinehart, Nicholas, Xu, Yan, Shen, Yilin, and Kitani, Kris M.
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Artificial Intelligence
Abstract: Humans are able to understand and perform complex tasks by strategically structuring the tasks into incremental steps or subgoals. For a robot attempting to learn to perform a sequential task with critical subgoal states, such states can provide a natural opportunity for interaction with a human expert. This paper analyzes the benefit of incorporating a notion of subgoals into Inverse Reinforcement Learning (IRL) with a Human-In-The-Loop (HITL) framework. The learning process is interactive, with a human expert first providing input in the form of full demonstrations along with some subgoal states. These subgoal states define a set of subtasks for the learning agent to complete in order to achieve the final goal. The learning agent queries for partial demonstrations corresponding to each subtask as needed when the agent struggles with the subtask. The proposed Human Interactive IRL (HI-IRL) framework is evaluated on several discrete path-planning tasks. We demonstrate that subgoal-based interactive structuring of the learning task results in significantly more efficient learning, requiring only a fraction of the demonstration data needed for learning the underlying reward function with the baseline IRL model.
Published: 2018

49. Biodegradation and potential effect of ranitidine during aerobic composting of human feces

Author: Zhu, Ping, Pan, Xusheng, Shen, Yilin, Huang, Xiang, Yu, Fang, Wu, Deli, Feng, Qingge, Zhou, John, and Li, Xiaowei
Published: 2022

50. Cu2O‑Based Core–Shell Nanostructures with Glutathione Depletion and O2 Generation Properties for Photodynamic/Chemodynamic Tumor Therapy.

Author: Zhu, Yuchao, Zhang, Yelei, Shen, Yilin, Hong, Shujing, Wang, Kang, Jia, Xiao, and Liu, Wenge
Abstract: Hypoxia and high glutathione (GSH) contents in the tumor microenvironment (TME) diminish the efficacy of single-modality photodynamic therapy (PDT) and chemodynamic therapy (CDT). Cu2O can effectively release Cu+ within the TME, facilitating CDT, and concurrently generate O2 to mitigate hypoxia. Co-delivery of Cu2O with photosensitizers (PSs) to tumors could potentially enhance PDT/CDT combination therapy. However, challenges such as instability, susceptibility to in vivo substances affecting catalytic activity, and toxicity to normal tissues limit the utility of Cu2O. To overcome these limitations, we propose the utilization of Cu2O as the core within GSH-responsive metal–organic framework (MOF) shells, uniformly grown on its surface to form Cu2O@MOFs heterostructures. This design effectively prevents leakage and deactivation of the Cu2O core. We synthesized Cu2+-doped ZIF-67 multifunctional MOF shells in situ on Cu2O and subsequently incorporated the photosensitizer ZnPc into the MOFs to create Cu2O@Cu2+/ZIF-67@ZnPc (CCZZ). Upon entry into tumor cells, this nanodrug degrades in the presence of slight acidity and an optimal GSH concentration, releasing Cu2O and ZnPc while depleting GSH. The released Cu2O and degraded ZIF-67, respectively, generate Cu+ and Co2+ for CDT and O2, while the ZnPc consumes O2 for PDT under laser irradiation, achieving synergistic PDT/CDT combination therapy. To the best of our knowledge, this is the first study based on the in situ synthesis of Cu2O@Cu-MOFs that achieves dual-enhanced PDT/CDT therapy by simultaneously depleting GSH and generating oxygen.
Published: 2025
