Author: "Zhao, Yiren" / Topic: computer science - computation and language - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Zhao, Yiren"' showing total 8 results

Start Over Author "Zhao, Yiren" Topic computer science - computation and language

8 results on '"Zhao, Yiren"'

1. Scaling Laws for Mixed quantization in Large Language Models

Author: Cao, Zeyu, Zhang, Cheng, Gimenes, Pedro, Lu, Jianqiao, Cheng, Jianyi, and Zhao, Yiren
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Post-training quantization of Large Language Models (LLMs) has proven effective in reducing the computational requirements for running inference on these models. In this study, we focus on a straightforward question: When aiming for a specific accuracy or perplexity target for low-precision quantization, how many high-precision numbers or calculations are required to preserve as we scale LLMs to larger sizes? We first introduce a critical metric named the quantization ratio, which compares the number of parameters quantized to low-precision arithmetic against the total parameter count. Through extensive and carefully controlled experiments across different model families, arithmetic types, and quantization granularities (e.g. layer-wise, matmul-wise), we identify two central phenomenons. 1) The larger the models, the better they can preserve performance with an increased quantization ratio, as measured by perplexity in pre-training tasks or accuracy in downstream tasks. 2) The finer the granularity of mixed-precision quantization (e.g., matmul-wise), the more the model can increase the quantization ratio. We believe these observed phenomena offer valuable insights for future AI hardware design and the development of advanced Efficient AI algorithms.
Published: 2024

2. Unlocking the Global Synergies in Low-Rank Adapters

Author: Zhang, Zixi, Zhang, Cheng, Gao, Xitong, Mullins, Robert D., Constantinides, George A., and Zhao, Yiren
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Low-rank Adaption (LoRA) has been the de-facto parameter-efficient fine-tuning technique for large language models. We present HeteroLoRA, a light-weight search algorithm that leverages zero-cost proxies to allocate the limited LoRA trainable parameters across the model for better fine-tuned performance. In addition to the allocation for the standard LoRA-adapted models, we also demonstrate the efficacy of HeteroLoRA by performing the allocation in a more challenging search space that includes LoRA modules and LoRA-adapted shortcut connections. Experiments show that HeteroLoRA enables improvements in model performance given the same parameter budge. For example, on MRPC, we see an improvement of 1.6% in accuracy with similar training parameter budget. We will open-source our algorithm once the paper is accepted., Comment: Accepted at ICML2024 ES-FoMo-II Workshop
Published: 2024

3. LQER: Low-Rank Quantization Error Reconstruction for LLMs

Author: Zhang, Cheng, Cheng, Jianyi, Constantinides, George A., and Zhao, Yiren
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Post-training quantization of Large Language Models (LLMs) is challenging. In this work, we introduce Low-rank Quantization Error Reduction (LQER), which combines quantization and low-rank approximation to recover the model capability. LQER leverages an activation-induced scale matrix to drive the singular value distribution of quantization error towards a desirable distribution, which enables nearly-lossless W4A8 quantization on various LLMs and downstream tasks without the need for knowledge distillation, grid search, or gradient-base iterative optimization. Unlike existing methods, the computation pattern of LQER eliminates the need for specialized Scatter and Gather processes to collect high-precision weights from irregular memory locations. Our W4A8 LLMs achieve near-lossless performance on six popular downstream tasks, while using 1.36$\times$ fewer hardware resources than the leading state-of-the-art method. We open-source our framework at https://github.com/ChengZhang-98/lqer, Comment: Accepted at ICML2024
Published: 2024

4. The Curse of Recursion: Training on Generated Data Makes Models Forget

Author: Shumailov, Ilia, Shumaylov, Zakhar, Zhao, Yiren, Gal, Yarin, Papernot, Nicolas, and Anderson, Ross
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Cryptography and Security, Computer Science - Computer Vision and Pattern Recognition
Abstract: Stable Diffusion revolutionised image creation from descriptive text. GPT-2, GPT-3(.5) and GPT-4 demonstrated astonishing performance across a variety of language tasks. ChatGPT introduced such language models to the general public. It is now clear that large language models (LLMs) are here to stay, and will bring about drastic change in the whole ecosystem of online text and images. In this paper we consider what the future might hold. What will happen to GPT-{n} once LLMs contribute much of the language found online? We find that use of model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear. We refer to this effect as Model Collapse and show that it can occur in Variational Autoencoders, Gaussian Mixture Models and LLMs. We build theoretical intuition behind the phenomenon and portray its ubiquity amongst all learned generative models. We demonstrate that it has to be taken seriously if we are to sustain the benefits of training from large-scale data scraped from the web. Indeed, the value of data collected about genuine human interactions with systems will be increasingly valuable in the presence of content generated by LLMs in data crawled from the Internet., Comment: Fixed typos in eqn 4,5
Published: 2023

5. Revisiting Automated Prompting: Are We Actually Doing Better?

Author: Zhou, Yulin, Zhao, Yiren, Shumailov, Ilia, Mullins, Robert, and Gal, Yarin
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Current literature demonstrates that Large Language Models (LLMs) are great few-shot learners, and prompting significantly increases their performance on a range of downstream tasks in a few-shot learning setting. An attempt to automate human-led prompting followed, with some progress achieved. In particular, subsequent work demonstrates automation can outperform fine-tuning in certain K-shot learning scenarios. In this paper, we revisit techniques for automated prompting on six different downstream tasks and a larger range of K-shot learning settings. We find that automated prompting does not consistently outperform simple manual prompts. Our work suggests that, in addition to fine-tuning, manual prompts should be used as a baseline in this line of research.
Published: 2023

6. Dynamic Stashing Quantization for Efficient Transformer Training

Author: Yang, Guo, Lo, Daniel, Mullins, Robert, and Zhao, Yiren
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Performance
Abstract: Large Language Models (LLMs) have demonstrated impressive performance on a range of Natural Language Processing (NLP) tasks. Unfortunately, the immense amount of computations and memory accesses required for LLM training makes them prohibitively expensive in terms of hardware cost, and thus challenging to deploy in use cases such as on-device learning. In this paper, motivated by the observation that LLM training is memory-bound, we propose a novel dynamic quantization strategy, termed Dynamic Stashing Quantization (DSQ), that puts a special focus on reducing the memory operations, but also enjoys the other benefits of low precision training, such as the reduced arithmetic cost. We conduct a thorough study on two translation tasks (trained-from-scratch) and three classification tasks (fine-tuning). DSQ reduces the amount of arithmetic operations by $20.95\times$ and the number of DRAM operations by $2.55\times$ on IWSLT17 compared to the standard 16-bit fixed-point, which is widely used in on-device learning.
Published: 2023

7. Revisiting Structured Dropout

Author: Zhao, Yiren, Dada, Oluwatomisin, Gao, Xitong, and Mullins, Robert D
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Large neural networks are often overparameterised and prone to overfitting, Dropout is a widely used regularization technique to combat overfitting and improve model generalization. However, unstructured Dropout is not always effective for specific network architectures and this has led to the formation of multiple structured Dropout approaches to improve model performance and, sometimes, reduce the computational resources required for inference. In this work, we revisit structured Dropout comparing different Dropout approaches to natural language processing and computer vision tasks for multiple state-of-the-art networks. Additionally, we devise an approach to structured Dropout we call \textbf{\emph{ProbDropBlock}} which drops contiguous blocks from feature maps with a probability given by the normalized feature salience values. We find that with a simple scheduling strategy the proposed approach to structured Dropout consistently improved model performance compared to baselines and other Dropout approaches on a diverse range of tasks and models. In particular, we show \textbf{\emph{ProbDropBlock}} improves RoBERTa finetuning on MNLI by $0.22\%$, and training of ResNet50 on ImageNet by $0.28\%$.
Published: 2022

8. Sponge Examples: Energy-Latency Attacks on Neural Networks

Author: Shumailov, Ilia, Zhao, Yiren, Bates, Daniel, Papernot, Nicolas, Mullins, Robert, and Anderson, Ross
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Cryptography and Security, Statistics - Machine Learning
Abstract: The high energy costs of neural network training and inference led to the use of acceleration hardware such as GPUs and TPUs. While this enabled us to train large-scale neural networks in datacenters and deploy them on edge devices, the focus so far is on average-case performance. In this work, we introduce a novel threat vector against neural networks whose energy consumption or decision latency are critical. We show how adversaries can exploit carefully crafted $\boldsymbol{sponge}~\boldsymbol{examples}$, which are inputs designed to maximise energy consumption and latency. We mount two variants of this attack on established vision and language models, increasing energy consumption by a factor of 10 to 200. Our attacks can also be used to delay decisions where a network has critical real-time performance, such as in perception for autonomous vehicles. We demonstrate the portability of our malicious inputs across CPUs and a variety of hardware accelerator chips including GPUs, and an ASIC simulator. We conclude by proposing a defense strategy which mitigates our attack by shifting the analysis of energy consumption in hardware from an average-case to a worst-case perspective., Comment: Accepted at 6th IEEE European Symposium on Security and Privacy (EuroS&P)
Published: 2020

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

8 results on '"Zhao, Yiren"'

1. Scaling Laws for Mixed quantization in Large Language Models

2. Unlocking the Global Synergies in Low-Rank Adapters

3. LQER: Low-Rank Quantization Error Reconstruction for LLMs

4. The Curse of Recursion: Training on Generated Data Makes Models Forget

5. Revisiting Automated Prompting: Are We Actually Doing Better?

6. Dynamic Stashing Quantization for Efficient Transformer Training

7. Revisiting Structured Dropout

8. Sponge Examples: Energy-Latency Attacks on Neural Networks

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Database

8 results on '"Zhao, Yiren"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources