Stochastically Controlled Compositional Gradient for Composition Problems
- Author
- Liu Liu, Cho-Jui Hsieh, Dacheng Tao, and Ji Liu
- Subjects
- Mathematical optimization, Computer Networks and Communications, Computer science, Computation, Numerical analysis, Stochastic gradient descent, Gradient descent, Convex function, Artificial Intelligence, Computer Science Applications, Software
- Abstract
- We consider composition problems of the form $\min_x f(x) = \frac{1}{n}\sum_{i=1}^{n} F_i\big(\frac{1}{m}\sum_{j=1}^{m} G_j(x)\big)$, which are important for machine learning. Although gradient descent and stochastic gradient descent are straightforward solutions, the essential computation of the inner function $G(x) = \frac{1}{m}\sum_{j=1}^{m} G_j(x)$ in every single iteration is expensive, especially for large $m$. In this article, we devise a stochastically controlled compositional gradient algorithm. Specifically, we introduce two variants of the stochastically controlled technique to estimate the inner function $G(x)$ and the gradient of the objective function, respectively, which largely reduces the computational cost. However, the algorithm's reliance on two stochastic subsets $\mathcal{D}_1$ and $\mathcal{D}_2$ is a direct barrier to guaranteeing convergence, and in particular to proving it theoretically. To this end, we present a general convergence analysis that bounds the estimation errors of both the inner-function estimate and the gradient estimate, through which the proposed method significantly improves over existing composition algorithms under low target accuracy (i.e., $1/\epsilon \ll m$ or $n$) in both strongly convex and nonconvex settings. Comprehensive experiments demonstrate the superiority of the proposed method over existing methods.
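- Note

The gradient of a composition objective follows from the chain rule,

$$\nabla f(x) = \big(\partial G(x)\big)^{\top} \, \frac{1}{n}\sum_{i=1}^{n} \nabla F_i\big(G(x)\big), \qquad G(x) = \frac{1}{m}\sum_{j=1}^{m} G_j(x),$$

which is why a plain stochastic gradient needs the full inner average $G(x)$ each step. To make the two-subset estimator structure concrete, below is a minimal NumPy sketch on a toy least-squares composition. It is a sketch of the general idea only, not the authors' exact SCCG algorithm: the problem instance, the name `sccg_step`, and the parameters `lr`, `b1`, `b2` are illustrative assumptions, and the Jacobian is reused from the $\mathcal{D}_1$ sample for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance of f(x) = (1/n) sum_i F_i( (1/m) sum_j G_j(x) ):
# inner maps G_j(x) = A[j] @ x, outer losses F_i(y) = 0.5*||y - b[i]||^2,
# so grad F_i(y) = y - b[i] and the Jacobian of G(x) is (1/m) sum_j A[j].
m, n, d, p = 200, 150, 10, 5
A = rng.normal(size=(m, p, d)) / np.sqrt(d)   # G_j(x) = A[j] @ x
b = rng.normal(size=(n, p))                   # F_i(y) = 0.5*||y - b[i]||^2

def sccg_step(x, x_anchor, G_anchor, lr=0.1, b1=8, b2=8):
    """One hypothetical stochastically controlled step (names illustrative).

    Two independent subsets play the roles of D1 and D2 in the abstract:
    D1 refreshes a control-variate estimate of the inner function G(x),
    D2 estimates the gradient of the objective through the chain rule.
    """
    # D1: estimate G(x) as the anchor value plus a sampled correction,
    # exact at x == x_anchor and cheap (b1 << m inner maps) elsewhere.
    D1 = rng.choice(m, size=b1, replace=False)
    G_hat = G_anchor + np.mean(A[D1] @ x - A[D1] @ x_anchor, axis=0)

    # D2: plug G_hat into a minibatch of outer gradients (chain rule).
    D2 = rng.choice(n, size=b2, replace=False)
    J_hat = np.mean(A[D1], axis=0)                  # sampled Jacobian of G
    grad = J_hat.T @ np.mean(G_hat - b[D2], axis=0)

    return x - lr * grad, G_hat

# Usage: pay the full O(m) pass only at occasional anchor refreshes,
# not at every iteration as plain (stochastic) gradient descent would.
x = rng.normal(size=d)
for epoch in range(5):
    x_anchor, G_anchor = x.copy(), np.mean(A @ x, axis=0)  # O(m) refresh
    for _ in range(50):
        x, _ = sccg_step(x, x_anchor, G_anchor)
```

The sketch shows where the two error sources named in the abstract enter: the bias of $\hat{G}$ relative to $G(x)$ (controlled by $\mathcal{D}_1$ and the anchor) and the variance of the chain-rule gradient (controlled by $\mathcal{D}_2$), which is exactly what a convergence proof must bound jointly.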
- Published
- 2023