Author: "Pin Yu" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Pin Yu"' showing total 3,059 results

201. Model Reprogramming: Resource-Efficient Cross-Domain Machine Learning

Author: Chen, Pin-Yu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: In data-rich domains such as vision, language, and speech, deep learning prevails to deliver high-performance task-specific models and can even learn general task-agnostic representations for efficient finetuning to downstream tasks. However, deep learning in resource-limited domains still faces multiple challenges including (i) limited data, (ii) constrained model development cost, and (iii) lack of adequate pre-trained models for effective finetuning. This paper provides an overview of model reprogramming to bridge this gap. Model reprogramming enables resource-efficient cross-domain machine learning by repurposing and reusing a well-developed pre-trained model from a source domain to solve tasks in a target domain without model finetuning, where the source and target domains can be vastly different. In many applications, model reprogramming outperforms transfer learning and training from scratch. This paper elucidates the methodology of model reprogramming, summarizes existing use cases, provides a theoretical explanation of the success of model reprogramming, and concludes with a discussion on open-ended research questions and opportunities. A list of model reprogramming studies is actively maintained and updated at https://github.com/IBM/model-reprogramming., Comment: Published at AAAI 2024 (Senior Member Presentation Track); Survey paper on model reprogramming; Project repository: https://github.com/IBM/model-reprogramming
Published: 2022

202. When BERT Meets Quantum Temporal Convolution Learning for Text Classification in Heterogeneous Computing

Author: Yang, Chao-Han Huck, Qi, Jun, Chen, Samuel Yen-Chi, Tsao, Yu, and Chen, Pin-Yu
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Neural and Evolutionary Computing, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: The rapid development of quantum computing has demonstrated many unique characteristics of quantum advantages, such as richer feature representation and more secured protection on model parameters. This work proposes a vertical federated learning architecture based on variational quantum circuits to demonstrate the competitive performance of a quantum-enhanced pre-trained BERT model for text classification. In particular, our proposed hybrid classical-quantum model consists of a novel random quantum temporal convolution (QTC) learning framework replacing some layers in the BERT-based decoder. Our experiments on intent classification show that our proposed BERT-QTC model attains competitive experimental results in the Snips and ATIS spoken language datasets. Particularly, the BERT-QTC boosts the performance of the existing quantum circuit-based language model in two text classification datasets by 1.57% and 1.52% relative improvements. Furthermore, BERT-QTC can be feasibly deployed on both existing commercial-accessible quantum computation hardware and CPU-based interface for ensuring data isolation., Comment: Accepted to ICASSP 2022
Published: 2022

203. Holistic Adversarial Robustness of Deep Learning Models

Author: Chen, Pin-Yu and Liu, Sijia
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security
Abstract: Adversarial robustness studies the worst-case performance of a machine learning model to ensure safety and reliability. With the proliferation of deep-learning-based technology, the potential risks associated with model development and deployment can be amplified and become dreadful vulnerabilities. This paper provides a comprehensive overview of research topics and foundational principles of research methods for adversarial robustness of deep learning models, including attacks, defenses, verification, and novel applications., Comment: survey paper on holistic adversarial robustness for deep learning; published at AAAI 2023 Senior Member Presentation Track
Published: 2022

204. Towards Compositional Adversarial Robustness: Generalizing Adversarial Training to Composite Semantic Perturbations

Author: Hsiung, Lei, Tsai, Yun-Yun, Chen, Pin-Yu, and Ho, Tsung-Yi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Model robustness against adversarial examples of single perturbation type such as the $\ell_{p}$-norm has been widely studied, yet its generalization to more realistic scenarios involving multiple semantic perturbations and their composition remains largely unexplored. In this paper, we first propose a novel method for generating composite adversarial examples. Our method can find the optimal attack composition by utilizing component-wise projected gradient descent and automatic attack-order scheduling. We then propose generalized adversarial training (GAT) to extend model robustness from $\ell_{p}$-ball to composite semantic perturbations, such as the combination of Hue, Saturation, Brightness, Contrast, and Rotation. Results obtained using ImageNet and CIFAR-10 datasets indicate that GAT can be robust not only to all the tested types of a single attack, but also to any combination of such attacks. GAT also outperforms baseline $\ell_{\infty}$-norm bounded adversarial training approaches by a significant margin., Comment: CVPR 2023. The research demo is at https://hsiung.cc/CARBEN/
Published: 2022

205. Auto-Transfer: Learning to Route Transferrable Representations

Author: Murugesan, Keerthiram, Sadashivaiah, Vijay, Luss, Ronny, Shanmugam, Karthikeyan, Chen, Pin-Yu, and Dhurandhar, Amit
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Knowledge transfer between heterogeneous source and target networks and tasks has received a lot of attention in recent times as large amounts of quality labeled data can be difficult to obtain in many applications. Existing approaches typically constrain the target deep neural network (DNN) feature representations to be close to the source DNN's feature representations, which can be limiting. In this paper, we propose a novel adversarial multi-armed bandit approach that automatically learns to route source representations to appropriate target representations, following which they are combined in meaningful ways to produce accurate target models. We see upwards of 5% accuracy improvements compared with the state-of-the-art knowledge transfer methods on four benchmark (target) image datasets CUB200, Stanford Dogs, MIT67, and Stanford40 where the source dataset is ImageNet. We qualitatively analyze the goodness of our transfer scheme by showing individual examples of the important features focused on by our target network at different layers compared with the (closest) competitors. We also observe that our improvement over other methods is higher for smaller target datasets, making it an effective tool for small data applications that may benefit from transfer learning., Comment: Camera ready ICLR 2022
Published: 2022

206. Improving Across-Dataset Brain Tissue Segmentation Using Transformer

Author: Rao, Vishwanatha M., Wan, Zihan, Arabshahi, Soroush, Ma, David J., Lee, Pin-Yu, Tian, Ye, Zhang, Xuzhe, Laine, Andrew F., and Guo, Jia
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, I.4.6
Abstract: Brain tissue segmentation has demonstrated great utility in quantifying MRI data through Voxel-Based Morphometry and highlighting subtle structural changes associated with various conditions within the brain. However, manual segmentation is highly labor-intensive, and automated approaches have struggled due to properties inherent to MRI acquisition, leaving a great need for an effective segmentation tool. Despite the recent success of deep convolutional neural networks (CNNs) for brain tissue segmentation, many such solutions do not generalize well to new datasets, which is critical for a reliable solution. Transformers have demonstrated success in natural image segmentation and have recently been applied to 3D medical image segmentation tasks due to their ability to capture long-distance relationships in the input where the local receptive fields of CNNs struggle. This study introduces a novel CNN-Transformer hybrid architecture designed for brain tissue segmentation. We validate our model's performance across four multi-site T1w MRI datasets, covering different vendors, field strengths, scan parameters, time points, and neuropsychiatric conditions. In all situations, our model achieved the greatest generality and reliability. Our method is inherently robust and can serve as a valuable tool for brain-related T1w MRI studies. The code for the TABS network is available at: https://github.com/raovish6/TABS.
Published: 2022

207. How does unlabeled data improve generalization in self-training? A one-hidden-layer theoretical analysis

Author: Zhang, Shuai, Wang, Meng, Liu, Sijia, Chen, Pin-Yu, and Xiong, Jinjun
Subjects: Computer Science - Machine Learning, Electrical Engineering and Systems Science - Signal Processing
Abstract: Self-training, a semi-supervised learning algorithm, leverages a large amount of unlabeled data to improve learning when the labeled data are limited. Despite empirical successes, its theoretical characterization remains elusive. To the best of our knowledge, this work establishes the first theoretical analysis for the known iterative self-training paradigm and proves the benefits of unlabeled data in both training convergence and generalization ability. To make our theoretical analysis feasible, we focus on the case of one-hidden-layer neural networks. However, theoretical understanding of iterative self-training is non-trivial even for a shallow neural network. One of the key challenges is that existing neural network landscape analysis built upon supervised learning no longer holds in the (semi-supervised) self-training paradigm. We address this challenge and prove that iterative self-training converges linearly with both convergence rate and generalization accuracy improved in the order of $1/\sqrt{M}$, where $M$ is the number of unlabeled samples. Experiments from shallow neural networks to deep neural networks are also provided to justify the correctness of our established theoretical insights on self-training., Comment: 36 pages
Published: 2022

208. Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics

Author: Jiang, Chunheng, Pedapati, Tejaswini, Chen, Pin-Yu, Sun, Yizhou, and Gao, Jianxi
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Efficient model selection for identifying a suitable pre-trained neural network to a downstream task is a fundamental yet challenging task in deep learning. Current practice requires expensive computational costs in model training for performance prediction. In this paper, we propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training. Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections. Therefore, a converged neural network is associated with an equilibrium state of a networked system composed of those edges. To this end, we construct a network mapping $\phi$, converting a neural network $G_A$ to a directed line graph $G_B$ that is defined on those edges in $G_A$. Next, we derive a neural capacitance metric $\beta_{\rm eff}$ as a predictive measure universally capturing the generalization capability of $G_A$ on the downstream task using only a handful of early training results. We carried out extensive experiments using 17 popular pre-trained ImageNet models and five benchmark datasets, including CIFAR10, CIFAR100, SVHN, Fashion MNIST and Birds, to evaluate the fine-tuning performance of our framework. Our neural capacitance metric is shown to be a powerful indicator for model selection based only on early training results and is more efficient than state-of-the-art methods., Comment: 19 pages, 7 figures, neural architecture search, mean-field
Published: 2022

209. How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?

Author: Hongkang Li, Meng Wang 0003, Songtao Lu, Xiaodong Cui, and Pin-Yu Chen
Published: 2024

210. Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark.

Author: Yihua Zhang, Pingzhi Li, Junyuan Hong, Jiaxiang Li, Yimeng Zhang, Wenqing Zheng, Pin-Yu Chen, Jason D. Lee, Wotao Yin, Mingyi Hong 0001, Zhangyang Wang, Sijia Liu 0001, and Tianlong Chen
Published: 2024

211. What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding.

Author: Hongkang Li, Meng Wang 0003, Tengfei Ma, Sijia Liu 0001, Zaixi Zhang, and Pin-Yu Chen
Published: 2024

212. Larimar: Large Language Models with Episodic Memory Control.

Author: Payel Das, Subhajit Chaudhury, Elliot Nelson, Igor Melnyk, Sarathkrishna Swaminathan, Sihui Dai, Aurélie C. Lozano, Georgios Kollias, Vijil Chenthamarakshan, Jirí Navrátil 0001, Soham Dan, and Pin-Yu Chen
Published: 2024

213. Position: TrustLLM: Trustworthiness in Large Language Models.

Author: Yue Huang, Lichao Sun 0001, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Hanchi Sun, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric P. Xing, Furong Huang, Hao Liu, Heng Ji, Hongyi Wang 0001, Huan Zhang 0001, Huaxiu Yao, Manolis Kellis, Marinka Zitnik, Meng Jiang 0001, Mohit Bansal, James Zou 0001, Jian Pei, Jian Liu, Jianfeng Gao 0001, Jiawei Han 0001, Jieyu Zhao, Jiliang Tang, Jindong Wang 0001, Joaquin Vanschoren, John C. Mitchell, Kai Shu, Kaidi Xu, Kai-Wei Chang, Lifang He 0001, Lifu Huang, Michael Backes 0001, Neil Zhenqiang Gong, Philip S. Yu, Pin-Yu Chen, Quanquan Gu, Ran Xu 0001, Rex Ying, Shuiwang Ji, Suman Jana, Tianlong Chen, Tianming Liu 0001, Tianyi Zhou 0001, William Wang 0001, Xiang Li 0001, Xiangliang Zhang 0001, Xiao Wang, Xing Xie 0001, Xun Chen, Xuyu Wang, Yan Liu 0002, Yanfang Ye 0001, Yinzhi Cao, Yong Chen, and Yue Zhao 0016
Published: 2024

214. SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning.

Author: Shuai Zhang 0015, Heshan Devaka Fernando, Miao Liu, Keerthiram Murugesan, Songtao Lu, Pin-Yu Chen, Tianyi Chen, and Meng Wang 0003
Published: 2024

215. Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts.

Author: Zhi-Yi Chin, Chieh-Ming Jiang, Ching-Chun Huang, Pin-Yu Chen, and Wei-Chen Chiu
Published: 2024

216. A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts.

Author: Mohammed Nowaz Rabbani Chowdhury, Meng Wang 0003, Kaoutar El Maghraoui, Naigang Wang, Pin-Yu Chen, and Christopher D. Carothers
Published: 2024

217. What Would Gauss Say About Representations? Probing Pretrained Image Models using Synthetic Gaussian Benchmarks.

Author: Ching-Yun Ko, Pin-Yu Chen, Payel Das, Jeet Mohapatra, and Luca Daniel
Published: 2024

218. Be Your Own Neighborhood: Detecting Adversarial Examples by the Neighborhood Relations Built on Self-Supervised Learning.

Author: Zhiyuan He, Yijun Yang, Pin-Yu Chen, Qiang Xu 0001, and Tsung-Yi Ho
Published: 2024

219. Learning Optimal Projection for Forecast Reconciliation of Hierarchical Time Series.

Author: Asterios Tsiourvas, Wei Sun 0031, Georgia Perakis, Pin-Yu Chen, and Yada Zhu
Published: 2024

220. Ring-A-Bell! How Reliable are Concept Removal Methods For Diffusion Models?

Author: Yu-Lin Tsai, Chia-Yi Hsu, Chulin Xie, Chih-Hsun Lin, Jia-You Chen, Bo Li 0026, Pin-Yu Chen, Chia-Mu Yu, and Chun-Ying Huang
Published: 2024

221. AutoVP: An Automated Visual Prompting Framework and Benchmark.

Author: Hsi-Ai Tsao, Lei Hsiung, Pin-Yu Chen, Si Liu 0001, and Tsung-Yi Ho
Published: 2024

222. It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition.

Author: Chen Chen 0075, Ruizhe Li 0001, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Engsiong Chng, and Chao-Han Huck Yang
Published: 2024

223. The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Language Models.

Author: Yan Liu 0002, Yu Liu, Xiaokang Chen, Pin-Yu Chen, Daoguang Zan, Min-Yen Kan, and Tsung-Yi Ho
Published: 2024

224. Rethinking Backdoor Attacks on Dataset Distillation: A Kernel Method Perspective.

Author: Ming-Yu Chung, Sheng-Yen Chou, Chia-Mu Yu, Pin-Yu Chen, Sy-Yen Kuo, and Tsung-Yi Ho
Published: 2024

225. Time-LLM: Time Series Forecasting by Reprogramming Large Language Models.

Author: Ming Jin 0005, Shiyu Wang 0001, Lintao Ma, Zhixuan Chu, James Y. Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan, and Qingsong Wen
Published: 2024

226. Large Language Models are Efficient Learners of Noise-Robust Speech Recognition.

Author: Yuchen Hu, Chen Chen 0075, Chao-Han Huck Yang, Ruizhe Li 0001, Chao Zhang 0031, Pin-Yu Chen, and Engsiong Chng
Published: 2024

227. Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

Author: Xiangyu Qi, Yi Zeng 0005, Tinghao Xie, Pin-Yu Chen, Ruoxi Jia 0001, Prateek Mittal, and Peter Henderson 0002
Published: 2024

228. Zeroth-order Optimization for Composite Problems with Functional Constraints

Author: Li, Zichong, Chen, Pin-Yu, Liu, Sijia, Lu, Songtao, and Xu, Yangyang
Subjects: Mathematics - Optimization and Control, 90C26, 90C30, 90C25, 90C60, 90C56, 90C06
Abstract: In many real-world problems, first-order (FO) derivative evaluations are too expensive or even inaccessible. For solving these problems, zeroth-order (ZO) methods that only need function evaluations are often more efficient than FO methods or sometimes the only options. In this paper, we propose a novel zeroth-order inexact augmented Lagrangian method (ZO-iALM) to solve black-box optimization problems, which involve a composite (i.e., smooth+nonsmooth) objective and functional constraints. Under a certain regularity condition (also assumed by several existing works on FO methods), the query complexity of our ZO-iALM is $\tilde{O}(d\varepsilon^{-3})$ to find an $\varepsilon$-KKT point for problems with a nonconvex objective and nonconvex constraints, and $\tilde{O}(d\varepsilon^{-2.5})$ for nonconvex problems with convex constraints, where $d$ is the variable dimension. This appears to be the first work that develops an iALM-based ZO method for functional constrained optimization and meanwhile achieves query complexity results matching the best-known FO complexity results up to a factor of $d$. With an extensive experimental study, we show the effectiveness of our method. The applications of our method span from classical optimization problems to practical machine learning examples such as resource allocation in sensor networks and adversarial example generation., Comment: AAAI 2022
Published: 2021

229. Network Graph Based Neural Architecture Search

Author: Huang, Zhenhan, Jiang, Chunheng, Chen, Pin-Yu, and Gao, Jianxi
Subjects: Computer Science - Machine Learning
Abstract: Neural architecture search enables automation of architecture design. Despite its success, it is computationally costly and does not provide insight into how to design a desirable architecture. Here we propose a new way of searching neural networks, in which we search for a neural architecture by rewiring the corresponding graph and predict the architecture's performance from graph properties. Because we do not perform machine learning over the entire graph space and use predicted architecture performance to guide the search, the searching process is remarkably efficient. We find that graph-based search can give a reasonably good prediction of a desirable architecture. In addition, we find graph properties that are effective for predicting architecture performance. Our work proposes a new way of searching neural architectures and provides insights on neural architecture design., Comment: 12 pages
Published: 2021

230. Revisiting Contrastive Learning through the Lens of Neighborhood Component Analysis: an Integrated Framework

Author: Ko, Ching-Yun, Mohapatra, Jeet, Liu, Sijia, Chen, Pin-Yu, Daniel, Luca, and Weng, Lily
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: As a seminal tool in self-supervised representation learning, contrastive learning has gained unprecedented attention in recent years. In essence, contrastive learning aims to leverage pairs of positive and negative samples for representation learning, which relates to exploiting neighborhood information in a feature space. By investigating the connection between contrastive learning and neighborhood component analysis (NCA), we provide a novel stochastic nearest neighbor viewpoint of contrastive learning and subsequently propose a series of contrastive losses that outperform the existing ones. Under our proposed framework, we show a new methodology to design integrated contrastive losses that could simultaneously achieve good accuracy and robustness on downstream tasks. With the integrated framework, we achieve up to 6% improvement on the standard accuracy and 17% improvement on the robust accuracy.
Published: 2021

231. Certified Adversarial Defenses Meet Out-of-Distribution Corruptions: Benchmarking Robustness and Simple Baselines

Author: Sun, Jiachen, Mehra, Akshay, Kailkhura, Bhavya, Chen, Pin-Yu, Hendrycks, Dan, Hamm, Jihun, and Mao, Z. Morley
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security
Abstract: Certified robustness guarantee gauges a model's robustness to test-time attacks and can assess the model's readiness for deployment in the real world. In this work, we critically examine how the adversarial robustness guarantees from randomized smoothing-based certification methods change when state-of-the-art certifiably robust models encounter out-of-distribution (OOD) data. Our analysis demonstrates a previously unknown vulnerability of these models to low-frequency OOD data such as weather-related corruptions, rendering these models unfit for deployment in the wild. To alleviate this issue, we propose a novel data augmentation scheme, FourierMix, that produces augmentations to improve the spectral coverage of the training data. Furthermore, we propose a new regularizer that encourages consistent predictions on noise perturbations of the augmented data to improve the quality of the smoothed models. We find that FourierMix augmentations help eliminate the spectral bias of certifiably robust models enabling them to achieve significantly better robustness guarantees on a range of OOD benchmarks. Our evaluation also uncovers the inability of current OOD benchmarks at highlighting the spectral biases of the models. To this end, we propose a comprehensive benchmarking suite that contains corruptions from different regions in the spectral domain. Evaluation of models trained with popular augmentation methods on the proposed suite highlights their spectral biases and establishes the superiority of FourierMix trained models at achieving better-certified robustness guarantees under OOD shifts over the entire frequency spectrum., Comment: 21 pages, 15 figures, and 9 tables
Published: 2021

232. Pessimistic Model Selection for Offline Deep Reinforcement Learning

Author: Yang, Chao-Han Huck, Qi, Zhengling, Cui, Yifan, and Chen, Pin-Yu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computational Engineering, Finance, and Science, Computer Science - Neural and Evolutionary Computing, Electrical Engineering and Systems Science - Systems and Control
Abstract: Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision-making problems in many applications. Despite its promising performance, practical gaps exist when deploying DRL in real-world scenarios. One main barrier is the over-fitting issue that leads to poor generalizability of the policy learned by DRL. In particular, for offline DRL with observational data, model selection is a challenging task as there is no ground truth available for performance demonstration, in contrast with the online setting with simulated environments. In this work, we propose a pessimistic model selection (PMS) approach for offline DRL with a theoretical guarantee, which features a provably effective framework for finding the best policy among a set of candidate models. Two refined approaches are also proposed to address the potential bias of the DRL model in identifying the optimal policy. Numerical studies demonstrated the superior performance of our approach over existing methods., Comment: Preprint. A non-archival and preliminary venue was presented at NeurIPS 2021 Offline Reinforcement Learning Workshop
Published: 2021

233. Make an Omelette with Breaking Eggs: Zero-Shot Learning for Novel Attribute Synthesis

Author: Li, Yu-Hsuan, Chao, Tzu-Yin, Huang, Ching-Chun, Chen, Pin-Yu, and Chiu, Wei-Chen
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Most of the existing algorithms for zero-shot classification problems typically rely on the attribute-based semantic relations among categories to realize the classification of novel categories without observing any of their instances. However, training the zero-shot classification models still requires attribute labeling for each class (or even instance) in the training dataset, which is also expensive. To this end, in this paper, we bring up a new problem scenario: "Can we derive zero-shot learning for novel attribute detectors/classifiers and use them to automatically annotate the dataset for labeling efficiency?". Basically, given only a small set of detectors that are learned to recognize some manually annotated attributes (i.e., the seen attributes), we aim to synthesize the detectors of novel attributes in a zero-shot learning manner. Our proposed method, Zero-Shot Learning for Attributes (ZSLA), which is the first of its kind to the best of our knowledge, tackles this new research problem by applying the set operations to first decompose the seen attributes into their basic attributes and then recombine these basic attributes into the novel ones. Extensive experiments are conducted to verify the capacity of our synthesized detectors for accurately capturing the semantics of the novel attributes and show their superior performance in terms of detection and localization compared to other baseline approaches. Moreover, we demonstrate the application of automatic annotation using our synthesized detectors on Caltech-UCSD Birds-200-2011 dataset. Various generalized zero-shot classification algorithms trained upon the dataset re-annotated by ZSLA show comparable performance with those trained with the manual ground-truth annotations. Please refer to our project page for source code: https://yuhsuanli.github.io/ZSLA/, Comment: Accepted by the 36th Conference on Neural Information Processing Systems (NeurIPS 2022). (* Yu-Hsuan Li and Tzu-Yin Chao contributed equally to this work.)
Published: 2021

234. Meta Adversarial Perturbations

Author: Yuan, Chia-Hung, Chen, Pin-Yu, and Yu, Chia-Mu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security, Computer Science - Computer Vision and Pattern Recognition
Abstract: A plethora of attack methods have been proposed to generate adversarial examples, among which the iterative methods have demonstrated the ability to find a strong attack. However, the computation of an adversarial perturbation for a new data point requires solving a time-consuming optimization problem from scratch. To generate a stronger attack, it normally requires updating a data point with more iterations. In this paper, we show the existence of a meta adversarial perturbation (MAP), a better initialization that causes natural images to be misclassified with high probability after being updated through only a one-step gradient ascent update, and propose an algorithm for computing such perturbations. We conduct extensive experiments, and the empirical results demonstrate that state-of-the-art deep neural networks are vulnerable to meta perturbations. We further show that these perturbations are not only image-agnostic, but also model-agnostic, as a single perturbation generalizes well across unseen data points and different neural network architectures., Comment: Published in AAAI 2022 Workshop
Published: 2021

235. Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination

Author: Mukherjee, Arpan, Tajer, Ali, Chen, Pin-Yu, and Das, Payel
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: This paper investigates the problem of best arm identification in $\textit{contaminated}$ stochastic multi-arm bandits. In this setting, the rewards obtained from any arm are replaced by samples from an adversarial model with probability $\varepsilon$. A fixed confidence (infinite-horizon) setting is considered, where the goal of the learner is to identify the arm with the largest mean. Owing to the adversarial contamination of the rewards, each arm's mean is only partially identifiable. This paper proposes two algorithms, a gap-based algorithm and one based on the successive elimination, for best arm identification in sub-Gaussian bandits. These algorithms involve mean estimates that achieve the optimal error guarantee on the deviation of the true mean from the estimate asymptotically. Furthermore, these algorithms asymptotically achieve the optimal sample complexity. Specifically, for the gap-based algorithm, the sample complexity is asymptotically optimal up to constant factors, while for the successive elimination-based algorithm, it is optimal up to logarithmic factors. Finally, numerical experiments are provided to illustrate the gains of the algorithms compared to the existing baselines.
Published: 2021

236. When Does Contrastive Learning Preserve Adversarial Robustness from Pretraining to Finetuning?

Author: Fan, Lijie, Liu, Sijia, Chen, Pin-Yu, Zhang, Gaoyuan, and Gan, Chuang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Contrastive learning (CL) can learn generalizable feature representations and achieve the state-of-the-art performance of downstream tasks by finetuning a linear classifier on top of it. However, as adversarial robustness becomes vital in image classification, it remains unclear whether or not CL is able to preserve robustness to downstream tasks. The main challenge is that in the self-supervised pretraining + supervised finetuning paradigm, adversarial robustness is easily forgotten due to a learning task mismatch from pretraining to finetuning. We call such a challenge 'cross-task robustness transferability'. To address the above problem, in this paper we revisit and advance CL principles through the lens of robustness enhancement. We show that (1) the design of contrastive views matters: High-frequency components of images are beneficial to improving model robustness; (2) Augmenting CL with pseudo-supervision stimulus (e.g., resorting to feature clustering) helps preserve robustness without forgetting. Equipped with our new designs, we propose AdvCL, a novel adversarial contrastive pretraining framework. We show that AdvCL is able to enhance cross-task robustness transferability without loss of model accuracy and finetuning efficiency. With a thorough experimental study, we demonstrate that AdvCL outperforms the state-of-the-art self-supervised robust learning methods across multiple datasets (CIFAR-10, CIFAR-100, and STL-10) and finetuning schemes (linear evaluation and full model finetuning)., Comment: NeurIPS 2021. Code is available at https://github.com/LijieFan/AdvCL
Published: 2021

237. CAFE: Catastrophic Data Leakage in Vertical Federated Learning

Author: Jin, Xiao, Chen, Pin-Yu, Hsu, Chia-Yi, Yu, Chia-Mu, and Chen, Tianyi
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Recent studies show that private training data can be leaked through the gradient-sharing mechanism deployed in distributed machine learning systems, such as federated learning (FL). Increasing batch size to complicate data recovery is often viewed as a promising defense strategy against data leakage. In this paper, we revisit this defense premise and propose an advanced data leakage attack with theoretical justification to efficiently recover batch data from the shared aggregated gradients. We name our proposed method catastrophic data leakage in vertical federated learning (CAFE). Compared to existing data leakage attacks, our extensive experimental results on vertical FL settings demonstrate the effectiveness of CAFE in performing large-batch data leakage attacks with improved data recovery quality. We also propose a practical countermeasure to mitigate CAFE. Our results suggest that private data participating in standard FL, especially the vertical case, have a high risk of being leaked from the training gradients. Our analysis implies unprecedented and practical data leakage risks in those learning settings. The code of our work is available at https://github.com/DeRafael/CAFE.
Published: 2021

238. How and When Adversarial Robustness Transfers in Knowledge Distillation?

Author: Shao, Rulin, Yi, Jinfeng, Chen, Pin-Yu, and Hsieh, Cho-Jui
Subjects: Computer Science - Machine Learning
Abstract: Knowledge distillation (KD) has been widely used in teacher-student training, with applications to model compression in resource-constrained deep learning. Current works mainly focus on preserving the accuracy of the teacher model. However, other important model properties, such as adversarial robustness, can be lost during distillation. This paper studies how and when the adversarial robustness can be transferred from a teacher model to a student model in KD. We show that standard KD training fails to preserve adversarial robustness, and we propose KD with input gradient alignment (KDIGA) for remedy. Under certain assumptions, we prove that the student model using our proposed KDIGA can achieve at least the same certified robustness as the teacher model. Our experiments of KD contain a diverse set of teacher and student models with varying network architectures and sizes evaluated on ImageNet and CIFAR-10 datasets, including residual neural networks (ResNets) and vision transformers (ViTs). Our comprehensive analysis shows several novel insights that (1) With KDIGA, students can preserve or even exceed the adversarial robustness of the teacher model, even when their models have fundamentally different architectures; (2) KDIGA enables robustness to transfer to pre-trained students, such as KD from an adversarially trained ResNet to a pre-trained ViT, without loss of clean accuracy; and (3) Our derived local linearity bounds for characterizing adversarial robustness in KD are consistent with the empirical results.
Published: 2021

239. Robust Event Classification Using Imperfect Real-world PMU Data

Author: Liu, Yunchuan, Yang, Lei, Ghasemkhani, Amir, Livani, Hanif, Centeno, Virgilio A., Chen, Pin-Yu, and Zhang, Junshan
Subjects: Computer Science - Machine Learning
Abstract: This paper studies robust event classification using imperfect real-world phasor measurement unit (PMU) data. By analyzing the real-world PMU data, we find it is challenging to directly use this dataset for event classifiers due to the low data quality observed in PMU measurements and event logs. To address these challenges, we develop a novel machine learning framework for training robust event classifiers, which consists of three main steps: data preprocessing, fine-grained event data extraction, and feature engineering. Specifically, the data preprocessing step addresses the data quality issues of PMU measurements (e.g., bad data and missing data); in the fine-grained event data extraction step, a model-free event detection method is developed to accurately localize the events from the inaccurate event timestamps in the event logs; and the feature engineering step constructs the event features based on the patterns of different event types, in order to improve the performance and the interpretability of the event classifiers. Based on the proposed framework, we develop a workflow for event classification using the real-world PMU data streaming into the system in real time. Using the proposed framework, robust event classifiers can be efficiently trained based on many off-the-shelf lightweight machine learning models. Numerical experiments using the real-world dataset from the Western Interconnection of the U.S. power transmission grid show that the event classifiers trained under the proposed framework can achieve high classification accuracy while being robust against low-quality data.
Published: 2021

240. Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks

Author: Zhang, Shuai, Wang, Meng, Liu, Sijia, Chen, Pin-Yu, and Xiong, Jinjun
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: The lottery ticket hypothesis (LTH) states that learning on a properly pruned network (the winning ticket) improves test accuracy over the original unpruned network. Although LTH has been justified empirically in a broad range of deep neural network (DNN) involved applications like computer vision and natural language processing, the theoretical validation of the improved generalization of a winning ticket remains elusive. To the best of our knowledge, our work, for the first time, characterizes the performance of training a pruned neural network by analyzing the geometric structure of the objective function and the sample complexity to achieve zero generalization error. We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned, indicating the structural importance of a winning ticket. Moreover, when the algorithm for training a pruned neural network is specified as an (accelerated) stochastic gradient descent algorithm, we theoretically show that the number of samples required for achieving zero generalization error is proportional to the number of the non-pruned weights in the hidden layer. With a fixed number of samples, training a pruned neural network enjoys a faster convergence rate to the desired model than training the original unpruned one, providing a formal justification of the improved generalization of the winning ticket. Our theoretical results are acquired from learning a pruned neural network of one hidden layer, while experimental results are further provided to justify the implications in pruning multi-layer neural networks.
Published: 2021

241. Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition

Author: Yen, Hao, Ku, Pin-Jui, Yang, Chao-Han Huck, Hu, Hu, Siniscalchi, Sabato Marco, Chen, Pin-Yu, and Tsao, Yu
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Computer Science - Sound
Abstract: In this study, we propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR), and build an AR-SCR system. The AR procedure aims to modify the acoustic signals (from the target domain) to repurpose a pretrained SCR model (from the source domain). To solve the label mismatches between source and target domains, and further improve the stability of AR, we propose a novel similarity-based label mapping technique to align classes. In addition, the transfer learning (TL) technique is combined with the original AR process to improve the model adaptation capability. We evaluate the proposed AR-SCR system on three low-resource SCR datasets, including Arabic, Lithuanian, and dysarthric Mandarin speech. Experimental results show that with a pretrained AM trained on a large-scale English dataset, the proposed AR-SCR system outperforms the current state-of-the-art results on Arabic and Lithuanian speech commands datasets, with only a limited amount of training data., Comment: Accepted to Interspeech 2023. Code is available at: https://github.com/dodohow1011/SpeechAdvReprogram. Selected as Best Student Paper Candidate
Published: 2021
Full Text: View/download PDF

242. QTN-VQC: An End-to-End Learning framework for Quantum Neural Networks

Author: Qi, Jun, Yang, Chao-Han Huck, and Chen, Pin-Yu
Subjects: Quantum Physics, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing
Abstract: The advent of noisy intermediate-scale quantum (NISQ) computers raises a crucial challenge to design quantum neural networks for fully quantum learning tasks. To bridge the gap, this work proposes an end-to-end learning framework named QTN-VQC, by introducing a trainable quantum tensor network (QTN) for quantum embedding on a variational quantum circuit (VQC). The architecture of QTN is composed of a parametric tensor-train network for feature extraction and a tensor product encoding for quantum embedding. We highlight the QTN for quantum embedding in terms of two perspectives: (1) we theoretically characterize QTN by analyzing its representation power of input features; (2) QTN enables an end-to-end parametric model pipeline, namely QTN-VQC, from the generation of quantum embedding to the output measurement. Our experiments on the MNIST dataset demonstrate the advantages of QTN for quantum embedding over other quantum embedding approaches., Comment: Preprint. A Non-archival and preliminary venue was presented in NeurIPS 2021, Quantum Tensor Networks in Machine Learning Workshop
Published: 2021

243. AI Explainability 360: Impact and Design

Author: Arya, Vijay, Bellamy, Rachel K. E., Chen, Pin-Yu, Dhurandhar, Amit, Hind, Michael, Hoffman, Samuel C., Houde, Stephanie, Liao, Q. Vera, Luss, Ronny, Mojsilovic, Aleksandra, Mourad, Sami, Pedemonte, Pablo, Raghavendra, Ramya, Richards, John, Sattigeri, Prasanna, Shanmugam, Karthikeyan, Singh, Moninder, Varshney, Kush R., Wei, Dennis, and Zhang, Yunfeng
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: As artificial intelligence and machine learning algorithms become increasingly prevalent in society, multiple stakeholders are calling for these algorithms to provide explanations. At the same time, these stakeholders, whether they be affected citizens, government regulators, domain experts, or system developers, have different explanation needs. To address these needs, in 2019, we created AI Explainability 360 (Arya et al. 2020), an open source software toolkit featuring ten diverse and state-of-the-art explainability methods and two evaluation metrics. This paper examines the impact of the toolkit with several case studies, statistics, and community feedback. The different ways in which users have experienced AI Explainability 360 have resulted in multiple types of impact and improvements in multiple metrics, highlighted by the adoption of the toolkit by the independent LF AI & Data Foundation. The paper also describes the flexible design of the toolkit, examples of its use, and the significant educational material and documentation available to its users., Comment: arXiv admin note: text overlap with arXiv:1909.03012
Published: 2021

244. Real-World Adversarial Examples involving Makeup Application

Author: Lin, Chang-Sheng, Hsu, Chia-Yi, Chen, Pin-Yu, and Yu, Chia-Mu
Subjects: Computer Science - Cryptography and Security, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Deep neural networks have developed rapidly and have achieved outstanding performance in several tasks, such as image classification and natural language processing. However, recent studies have indicated that both digital and physical adversarial examples can fool neural networks. Face-recognition systems are used in various applications that involve security threats from physical adversarial examples. Herein, we propose a physical adversarial attack with the use of full-face makeup. The presence of makeup on the human face is a reasonable possibility, which possibly increases the imperceptibility of attacks. In our attack framework, we combine the cycle-adversarial generative network (cycle-GAN) and a victimized classifier. The Cycle-GAN is used to generate adversarial makeup, and the architecture of the victimized classifier is VGG 16. Our experimental results show that our attack can effectively overcome manual errors in makeup application, such as color and position-related errors. We also demonstrate that the approaches used to train the models can influence physical attacks; the adversarial perturbations crafted from the pre-trained model are affected by the corresponding training data.
Published: 2021

245. Understanding the Limits of Unsupervised Domain Adaptation via Data Poisoning

Author: Mehra, Akshay, Kailkhura, Bhavya, Chen, Pin-Yu, and Hamm, Jihun
Subjects: Computer Science - Machine Learning
Abstract: Unsupervised domain adaptation (UDA) enables cross-domain learning without target domain labels by transferring knowledge from a labeled source domain whose distribution differs from that of the target. However, UDA is not always successful and several accounts of 'negative transfer' have been reported in the literature. In this work, we prove a simple lower bound on the target domain error that complements the existing upper bound. Our bound shows the insufficiency of minimizing source domain error and marginal distribution mismatch for a guaranteed reduction in the target domain error, due to the possible increase of induced labeling function mismatch. This insufficiency is further illustrated through simple distributions for which the same UDA approach succeeds, fails, and may succeed or fail with an equal chance. Motivated by this, we propose novel data poisoning attacks to fool UDA methods into learning representations that produce large target domain errors. We evaluate the effect of these attacks on popular UDA methods using benchmark datasets where they have been previously shown to be successful. Our results show that poisoning can significantly decrease the target domain accuracy, dropping it to almost 0% in some cases, with the addition of only 10% poisoned data in the source domain. The failure of these UDA methods demonstrates their limitations at guaranteeing cross-domain generalization consistent with our lower bound. Thus, evaluating UDA methods in adversarial settings such as data poisoning provides a better sense of their robustness to data distributions unfavorable for UDA., Comment: NeurIPS 2021
Published: 2021

246. Dual-task multicomponent exercise–cognitive intervention improved cognitive function and functional fitness in older adults

Author: Chen, Yi-Ling, Tseng, Chien-Hsing, Lin, Hsin-Tzu, Wu, Pin-Yu, and Chao, Hsueh-Chin
Published: 2023
Full Text: View/download PDF

247. Necrosis in lymph nodes and their differential diagnoses: application of reticulin staining

Author: Yu, Shan-Chi, Chen, Han-Ho, and Lin, Pin-Yu
Published: 2023
Full Text: View/download PDF

248. CURE: A deep learning framework pre-trained on large-scale patient data for treatment effect estimation

Author: Liu, Ruoqi, Chen, Pin-Yu, and Zhang, Ping
Published: 2024
Full Text: View/download PDF

249. Aromatic and arginine content drives multiphasic condensation of protein-RNA mixtures

Author: Chew, Pin Yu, Joseph, Jerelle A., Collepardo-Guevara, Rosana, and Reinhardt, Aleks
Published: 2024
Full Text: View/download PDF

250. MAML is a Noisy Contrastive Learner in Classification

Author: Kao, Chia-Hsiang, Chiu, Wei-Chen, and Chen, Pin-Yu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Model-agnostic meta-learning (MAML) is one of the most popular and widely adopted meta-learning algorithms, achieving remarkable success in various learning problems. Yet, with the unique design of nested inner-loop and outer-loop updates, which govern the task-specific and meta-model-centric learning, respectively, the underlying learning objective of MAML remains implicit and thus impedes a more straightforward understanding of it. In this paper, we provide a new perspective of the working mechanism of MAML. We discover that MAML is analogous to a meta-learner using a supervised contrastive objective. The query features are pulled towards the support features of the same class and against those of different classes. Such contrastiveness is experimentally verified via an analysis based on the cosine similarity. Moreover, we reveal that vanilla MAML has an undesirable interference term originating from the random initialization and the cross-task interaction. We thus propose a simple but effective technique, zeroing trick, to alleviate the interference. Extensive experiments are conducted on both mini-ImageNet and Omniglot datasets to demonstrate the consistent improvement brought by our proposed method, validating its effectiveness., Comment: 22 pages, 17 figures. Accepted by ICLR 2022
Published: 2021
