Author: "Zhao, Yiren" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Zhao, Yiren"' showing total 160 results

1. Hardware and Software Platform Inference

Author: Zhang, Cheng, Foerster, Hanna, Mullins, Robert D., Zhao, Yiren, and Shumailov, Ilia
Subjects: Computer Science - Machine Learning
Abstract: It is now a common business practice to buy access to large language model (LLM) inference rather than self-host, because of significant upfront hardware infrastructure and energy costs. However, as a buyer, there is no mechanism to verify the authenticity of the advertised service including the serving hardware platform, e.g. that it is actually being served using an NVIDIA H100. Furthermore, there are reports suggesting that model providers may deliver models that differ slightly from the advertised ones, often to make them run on less expensive hardware. That way, a client pays a premium for access to a capable model on more expensive hardware, yet ends up being served by a (potentially less capable) cheaper model on cheaper hardware. In this paper we introduce hardware and software platform inference (HSPI) -- a method for identifying the underlying GPU architecture and software stack of a (black-box) machine learning model solely based on its input-output behavior. Our method leverages the inherent differences of various GPU architectures and compilers to distinguish between different GPU types and software stacks. By analyzing the numerical patterns in the model's outputs, we propose a classification framework capable of accurately identifying the GPU used for model inference as well as the underlying software configuration. Our findings demonstrate the feasibility of inferring GPU type from black-box models. We evaluate HSPI against models served on different real hardware and find that in a white-box setting we can distinguish between different GPUs with between $83.9\%$ and $100\%$ accuracy. Even in a black-box setting we are able to achieve results that are up to three times higher than random guess accuracy.
Published: 2024

2. Absorb & Escape: Overcoming Single Model Limitations in Generating Genomic Sequences

Author: Li, Zehui, Ni, Yuhao, Xia, Guoxuan, Beardall, William, Das, Akashaditya, Stan, Guy-Bart, and Zhao, Yiren
Subjects: Quantitative Biology - Genomics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Recent advances in immunology and synthetic biology have accelerated the development of deep generative methods for DNA sequence design. Two dominant approaches in this field are AutoRegressive (AR) models and Diffusion Models (DMs). However, genomic sequences are functionally heterogeneous, consisting of multiple connected regions (e.g., Promoter Regions, Exons, and Introns) where elements within each region come from the same probability distribution, but the overall sequence is non-homogeneous. This heterogeneous nature presents challenges for a single model to accurately generate genomic sequences. In this paper, we analyze the properties of AR models and DMs in heterogeneous genomic sequence generation, pointing out crucial limitations in both methods: (i) AR models capture the underlying distribution of data by factorizing and learning the transition probability but fail to capture the global property of DNA sequences. (ii) DMs learn to recover the global distribution but tend to produce errors at the base pair level. To overcome the limitations of both approaches, we propose a post-training sampling method, termed Absorb & Escape (A&E) to perform compositional generation from AR models and DMs. This approach starts with samples generated by DMs and refines the sample quality using an AR model through the alternation of the Absorb and Escape steps. To assess the quality of generated sequences, we conduct extensive experiments on 15 species for conditional and unconditional DNA generation. The experiment results from motif distribution, diversity checks, and genome integration tests unequivocally show that A&E outperforms state-of-the-art AR models and DMs in genomic sequence generation., Comment: Accepted at NeurIPS 2024
Published: 2024

3. Scaling Laws for Mixed quantization in Large Language Models

Author: Cao, Zeyu, Zhang, Cheng, Gimenes, Pedro, Lu, Jianqiao, Cheng, Jianyi, and Zhao, Yiren
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Post-training quantization of Large Language Models (LLMs) has proven effective in reducing the computational requirements for running inference on these models. In this study, we focus on a straightforward question: When aiming for a specific accuracy or perplexity target for low-precision quantization, how many high-precision numbers or calculations must be preserved as we scale LLMs to larger sizes? We first introduce a critical metric named the quantization ratio, which compares the number of parameters quantized to low-precision arithmetic against the total parameter count. Through extensive and carefully controlled experiments across different model families, arithmetic types, and quantization granularities (e.g. layer-wise, matmul-wise), we identify two central phenomena. 1) The larger the models, the better they can preserve performance with an increased quantization ratio, as measured by perplexity in pre-training tasks or accuracy in downstream tasks. 2) The finer the granularity of mixed-precision quantization (e.g., matmul-wise), the more the model can increase the quantization ratio. We believe these observed phenomena offer valuable insights for future AI hardware design and the development of advanced Efficient AI algorithms.
Published: 2024

4. QERA: an Analytical Framework for Quantization Error Reconstruction

Author: Zhang, Cheng, Wong, Jeffrey T. H., Xiao, Can, Constantinides, George A., and Zhao, Yiren
Subjects: Computer Science - Machine Learning
Abstract: The growing number of parameters and computational demands of large language models (LLMs) present significant challenges for their efficient deployment. Recently, there has been increasing interest in quantizing weights to extremely low precision while offsetting the resulting error with low-rank, high-precision error reconstruction terms. The combination of quantization and low-rank approximation is now popular in both adapter-based, parameter-efficient fine-tuning methods such as LoftQ and low-precision inference techniques including ZeroQuant-V2. Usually, the low-rank terms are calculated via the singular value decomposition (SVD) of the weight quantization error, minimizing the Frobenius and spectral norms of the weight approximation error. Recent methods like LQ-LoRA and LQER introduced hand-crafted heuristics to minimize errors in layer outputs (activations) rather than weights, resulting in improved quantization results. However, these heuristic methods lack an analytical solution to guide the design of quantization error reconstruction terms. In this paper, we revisit this problem and formulate an analytical framework, named Quantization Error Reconstruction Analysis (QERA), and offer a closed-form solution to the problem. We show QERA benefits both existing low-precision fine-tuning and inference methods -- QERA achieves a fine-tuned accuracy gain of $\Delta_{\text{acc}}$ = 6.05% of 2-bit RoBERTa-base on GLUE compared to LoftQ; and obtains $\Delta_{\text{acc}}$ = 2.97% higher post-training quantization accuracy of 4-bit Llama-3.1-70B on average than ZeroQuant-V2 and $\Delta_{\text{ppl}}$ = -0.28 lower perplexity on WikiText2 than LQER.
Published: 2024

5. GV-Rep: A Large-Scale Dataset for Genetic Variant Representation Learning

Author: Li, Zehui, Subasri, Vallijah, Stan, Guy-Bart, Zhao, Yiren, and Wang, Bo
Subjects: Computer Science - Machine Learning, Quantitative Biology - Genomics
Abstract: Genetic variants (GVs) are defined as differences in the DNA sequences among individuals and play a crucial role in diagnosing and treating genetic diseases. The rapid decrease in next generation sequencing cost has led to an exponential increase in patient-level GV data. This growth poses a challenge for clinicians who must efficiently prioritize patient-specific GVs and integrate them with existing genomic databases to inform patient management. To address the interpretation of GVs, genomic foundation models (GFMs) have emerged. However, these models lack standardized performance assessments, leading to considerable variability in model evaluations. This poses the question: How effectively do deep learning methods classify unknown GVs and align them with clinically-verified GVs? We argue that representation learning, which transforms raw data into meaningful feature spaces, is an effective approach for addressing both indexing and classification challenges. We introduce a large-scale Genetic Variant dataset, named GV-Rep, featuring variable-length contexts and detailed annotations, designed for deep learning models to learn GV representations across various traits, diseases, tissue types, and experimental contexts. Our contributions are three-fold: (i) Construction of a comprehensive dataset with 7 million records, each labeled with characteristics of the corresponding variants, alongside additional data from 17,548 gene knockout tests across 1,107 cell types, 1,808 variant combinations, and 156 unique clinically verified GVs from real-world patients. (ii) Analysis of the structure and properties of the dataset. (iii) Experimentation of the dataset with pre-trained GFMs. The results show a significant gap between GFMs' current capabilities and accurate GV representation. We hope this dataset will help advance genomic deep learning to bridge this gap., Comment: Preprint
Published: 2024

6. Unlocking the Global Synergies in Low-Rank Adapters

Author: Zhang, Zixi, Zhang, Cheng, Gao, Xitong, Mullins, Robert D., Constantinides, George A., and Zhao, Yiren
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Low-rank Adaptation (LoRA) has been the de-facto parameter-efficient fine-tuning technique for large language models. We present HeteroLoRA, a light-weight search algorithm that leverages zero-cost proxies to allocate the limited LoRA trainable parameters across the model for better fine-tuned performance. In addition to the allocation for the standard LoRA-adapted models, we also demonstrate the efficacy of HeteroLoRA by performing the allocation in a more challenging search space that includes LoRA modules and LoRA-adapted shortcut connections. Experiments show that HeteroLoRA enables improvements in model performance given the same parameter budget. For example, on MRPC, we see an improvement of 1.6% in accuracy with a similar training parameter budget. We will open-source our algorithm once the paper is accepted., Comment: Accepted at ICML2024 ES-FoMo-II Workshop
Published: 2024

7. Optimised Grouped-Query Attention Mechanism for Transformers

Author: Chen, Yuang, Zhang, Cheng, Gao, Xitong, Mullins, Robert D., Constantinides, George A., and Zhao, Yiren
Subjects: Computer Science - Machine Learning
Abstract: Grouped-query attention (GQA) has been widely adopted in LLMs to mitigate the complexity of multi-head attention (MHA). To transform an MHA to a GQA, neighbour queries in MHA are evenly split into groups where each group shares the value and key layers. In this work, we propose AsymGQA, an activation-informed approach to asymmetrically grouping an MHA to a GQA for better model performance. Our AsymGQA outperforms the GQA within the same model size budget. For example, AsymGQA LLaMA-2-7B has an accuracy increase of 7.5% on MMLU compared to neighbour grouping. Our approach addresses the GQA's trade-off problem between model performance and hardware efficiency., Comment: Accepted at ICML2024 ES-FoMo-II Workshop
Published: 2024

8. HASS: Hardware-Aware Sparsity Search for Dataflow DNN Accelerator

Author: Yu, Zhewen, Sreeram, Sudarshan, Agrawal, Krish, Wu, Junyi, Montgomerie-Corcoran, Alexander, Zhang, Cheng, Cheng, Jianyi, Bouganis, Christos-Savvas, and Zhao, Yiren
Subjects: Computer Science - Hardware Architecture, Computer Science - Machine Learning
Abstract: Deep Neural Networks (DNNs) excel in learning hierarchical representations from raw data, such as images, audio, and text. To compute these DNN models with high performance and energy efficiency, these models are usually deployed onto customized hardware accelerators. Among various accelerator designs, dataflow architecture has shown promising performance due to its layer-pipelined structure and its scalability in data parallelism. Exploiting weights and activations sparsity can further enhance memory storage and computation efficiency. However, existing approaches focus on exploiting sparsity in non-dataflow accelerators, which cannot be applied onto dataflow accelerators because of the large hardware design space introduced. As such, this could miss opportunities to find an optimal combination of sparsity features and hardware designs. In this paper, we propose a novel approach to exploit unstructured weights and activations sparsity for dataflow accelerators, using software and hardware co-optimization. We propose a Hardware-Aware Sparsity Search (HASS) to systematically determine an efficient sparsity solution for dataflow accelerators. Over a set of models, we achieve an efficiency improvement ranging from 1.3$\times$ to 4.2$\times$ compared to existing sparse designs, which are either non-dataflow or non-hardware-aware. Particularly, the throughput of MobileNetV3 can be optimized to 4895 images per second. HASS is open-source: https://github.com/Yu-Zhewen/HASS, Comment: accepted to FPL2024
Published: 2024

9. $\Delta$-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers

Author: Chen, Pengtao, Shen, Mingzhu, Ye, Peng, Cao, Jianjian, Tu, Chongjun, Bouganis, Christos-Savvas, Zhao, Yiren, and Chen, Tao
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Diffusion models are widely recognized for generating high-quality and diverse images, but their poor real-time performance has led to numerous acceleration works, primarily focusing on UNet-based structures. With the more successful results achieved by diffusion transformers (DiT), there is still a lack of exploration regarding the impact of DiT structure on generation, as well as the absence of an acceleration framework tailored to the DiT architecture. To tackle these challenges, we conduct an investigation into the correlation between DiT blocks and image generation. Our findings reveal that the front blocks of DiT are associated with the outline of the generated images, while the rear blocks are linked to the details. Based on this insight, we propose an overall training-free inference acceleration framework $\Delta$-DiT: using a designed cache mechanism to accelerate the rear DiT blocks in the early sampling stages and the front DiT blocks in the later stages. Specifically, a DiT-specific cache mechanism called $\Delta$-Cache is proposed, which considers the inputs of the previous sampling image and reduces the bias in the inference. Extensive experiments on PIXART-$\alpha$ and DiT-XL demonstrate that the $\Delta$-DiT can achieve a $1.6\times$ speedup on the 20-step generation and even improves performance in most cases. In the scenario of 4-step consistent model generation and the more challenging $1.12\times$ acceleration, our method significantly outperforms existing methods. Our code will be publicly available., Comment: 12 pages, 6 figures, 6 tables
Published: 2024

10. Locking Machine Learning Models into Hardware

Author: Clifford, Eleanor, Saravanan, Adhithya, Langford, Harry, Zhang, Cheng, Zhao, Yiren, Mullins, Robert, Shumailov, Ilia, and Hayes, Jamie
Subjects: Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Modern Machine Learning models are expensive IP and business competitiveness often depends on keeping this IP confidential. This in turn restricts how these models are deployed -- for example it is unclear how to deploy a model on-device without inevitably leaking the underlying model. At the same time, confidential computing technologies such as Multi-Party Computation or Homomorphic encryption remain impractical for wide adoption. In this paper we take a different approach and investigate the feasibility of ML-specific mechanisms that deter unauthorized model use by restricting the model to only be usable on specific hardware, making adoption on unauthorized hardware inconvenient. That way, even if IP is compromised, it cannot be trivially used without specialised hardware or major model adjustment. In a sense, we seek to enable cheap locking of machine learning models into specific hardware. We demonstrate that locking mechanisms are feasible by either targeting efficiency of model representations, such as making models incompatible with quantisation, or tying the model's operation to specific characteristics of hardware, such as number of cycles for arithmetic operations. We demonstrate that locking comes with negligible work and latency overheads, while significantly restricting usability of the resultant model on unauthorized hardware., Comment: 10 pages, 2 figures of main text; 14 pages, 16 figures of appendices
Published: 2024

11. Enhancing Node Representations for Real-World Complex Networks with Topological Augmentation

Author: Zhao, Xiangyu, Li, Zehui, Shen, Mingzhu, Stan, Guy-Bart, Liò, Pietro, and Zhao, Yiren
Subjects: Computer Science - Machine Learning, Computer Science - Information Retrieval, Computer Science - Social and Information Networks
Abstract: Graph augmentation methods play a crucial role in improving the performance and enhancing generalisation capabilities in Graph Neural Networks (GNNs). Existing graph augmentation methods mainly perturb the graph structures, and are usually limited to pairwise node relations. These methods cannot fully address the complexities of real-world large-scale networks, which often involve higher-order node relations beyond only being pairwise. Meanwhile, real-world graph datasets are predominantly modelled as simple graphs, due to the scarcity of data that can be used to form higher-order edges. Therefore, reconfiguring the higher-order edges as an integration into graph augmentation strategies lights up a promising research path to address the aforementioned issues. In this paper, we present Topological Augmentation (TopoAug), a novel graph augmentation method that builds a combinatorial complex from the original graph by constructing virtual hyperedges directly from the raw data. TopoAug then produces auxiliary node features by extracting information from the combinatorial complex, which are used for enhancing GNN performances on downstream tasks. We design three diverse virtual hyperedge construction strategies to accompany the construction of combinatorial complexes: (1) via graph statistics, (2) from multiple data perspectives, and (3) utilising multi-modality. Furthermore, to facilitate TopoAug evaluation, we provide 23 novel real-world graph datasets across various domains including social media, biology, and e-commerce. Our empirical study shows that TopoAug consistently and significantly outperforms GNN baselines and other graph augmentation methods, across a variety of application contexts, which clearly indicates that it can effectively incorporate higher-order node relations into the graph augmentation for real-world complex networks., Comment: In 27th European Conference on Artificial Intelligence (ECAI 2024). 13 pages, 2 figures, 13 tables
Published: 2024

12. Architectural Neural Backdoors from First Principles

Author: Langford, Harry, Shumailov, Ilia, Zhao, Yiren, Mullins, Robert, and Papernot, Nicolas
Subjects: Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: While previous research backdoored neural networks by changing their parameters, recent work uncovered a more insidious threat: backdoors embedded within the definition of the network's architecture. This involves injecting common architectural components, such as activation functions and pooling layers, to subtly introduce a backdoor behavior that persists even after (full re-)training. However, the full scope and implications of architectural backdoors have remained largely unexplored. Bober-Irizar et al. [2023] introduced the first architectural backdoor; they showed how to create a backdoor for a checkerboard pattern, but never explained how to target an arbitrary trigger pattern of choice. In this work we construct an arbitrary trigger detector which can be used to backdoor an architecture with no human supervision. This leads us to revisit the concept of architecture backdoors and taxonomise them, describing 12 distinct types. To gauge the difficulty of detecting such backdoors, we conducted a user study, revealing that ML developers can only identify suspicious components in common model definitions as backdoors in 37% of cases, while they surprisingly preferred backdoored models in 33% of cases. To contextualize these results, we find that language models outperform humans at the detection of backdoors. Finally, we discuss defenses against architectural backdoors, emphasizing the need for robust and comprehensive strategies to safeguard the integrity of ML systems.
Published: 2024

13. DiscDiff: Latent Diffusion Model for DNA Sequence Generation

Author: Li, Zehui, Ni, Yuhao, Beardall, William A V, Xia, Guoxuan, Das, Akashaditya, Stan, Guy-Bart, and Zhao, Yiren
Subjects: Quantitative Biology - Genomics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: This paper introduces a novel framework for DNA sequence generation, comprising two key components: DiscDiff, a Latent Diffusion Model (LDM) tailored for generating discrete DNA sequences, and Absorb-Escape, a post-training algorithm designed to refine these sequences. Absorb-Escape enhances the realism of the generated sequences by correcting 'round errors' inherent in the conversion process between latent and input spaces. Our approach not only sets new standards in DNA sequence generation but also demonstrates superior performance over existing diffusion models, in generating both short and long DNA sequences. Additionally, we introduce EPD-GenDNA, the first comprehensive, multi-species dataset for DNA generation, encompassing 160,000 unique sequences from 15 species. We hope this study will advance the generative modelling of DNA, with potential implications for gene therapy and protein production., Comment: Different from the prior work "Latent Diffusion Model for DNA Sequence Generation" (arXiv:2310.06150), we updated the evaluation framework and compared the DiscDiff with other methods comprehensively. In addition, a post-training framework is proposed to increase the quality of generated sequences
Published: 2024

14. LQER: Low-Rank Quantization Error Reconstruction for LLMs

Author: Zhang, Cheng, Cheng, Jianyi, Constantinides, George A., and Zhao, Yiren
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Post-training quantization of Large Language Models (LLMs) is challenging. In this work, we introduce Low-rank Quantization Error Reduction (LQER), which combines quantization and low-rank approximation to recover the model capability. LQER leverages an activation-induced scale matrix to drive the singular value distribution of quantization error towards a desirable distribution, which enables nearly-lossless W4A8 quantization on various LLMs and downstream tasks without the need for knowledge distillation, grid search, or gradient-based iterative optimization. Unlike existing methods, the computation pattern of LQER eliminates the need for specialized Scatter and Gather processes to collect high-precision weights from irregular memory locations. Our W4A8 LLMs achieve near-lossless performance on six popular downstream tasks, while using 1.36$\times$ fewer hardware resources than the leading state-of-the-art method. We open-source our framework at https://github.com/ChengZhang-98/lqer, Comment: Accepted at ICML2024
Published: 2024

15. AI models collapse when trained on recursively generated data

Author: Shumailov, Ilia, Shumaylov, Zakhar, Zhao, Yiren, Papernot, Nicolas, Anderson, Ross, and Gal, Yarin
Published: 2024

16. Latent Diffusion Model for DNA Sequence Generation

Author: Li, Zehui, Ni, Yuhao, Huygelen, Tim August B., Das, Akashaditya, Xia, Guoxuan, Stan, Guy-Bart, and Zhao, Yiren
Subjects: Computer Science - Machine Learning
Abstract: The harnessing of machine learning, especially deep generative models, has opened up promising avenues in the field of synthetic DNA sequence generation. Whilst Generative Adversarial Networks (GANs) have gained traction for this application, they often face issues such as limited sample diversity and mode collapse. On the other hand, Diffusion Models are a promising new class of generative models that are not burdened with these problems, enabling them to reach the state-of-the-art in domains such as image generation. In light of this, we propose a novel latent diffusion model, DiscDiff, tailored for discrete DNA sequence generation. By simply embedding discrete DNA sequences into a continuous latent space using an autoencoder, we are able to leverage the powerful generative abilities of continuous diffusion models for the generation of discrete data. Additionally, we introduce Fr\'echet Reconstruction Distance (FReD) as a new metric to measure the sample quality of DNA sequence generations. Our DiscDiff model demonstrates an ability to generate synthetic DNA sequences that align closely with real DNA in terms of Motif Distribution, Latent Embedding Distribution (FReD), and Chromatin Profiles. Additionally, we contribute a comprehensive cross-species dataset of 150K unique promoter-gene sequences from 15 species, enriching resources for future generative modelling in genomics. We will make our code public upon publication., Comment: 2023 Conference on Neural Information Processing Systems (NeurIPS 2023) AI for Science Workshop
Published: 2023

17. Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?

Author: Zhang, Cheng, Cheng, Jianyi, Shumailov, Ilia, Constantinides, George A., and Zhao, Yiren
Subjects: Computer Science - Machine Learning
Abstract: The inference of Large Language Models (LLMs) requires immense computation and memory resources. To curtail these costs, quantisation has emerged as a promising solution, but existing LLM quantisation mainly focuses on 8-bit. In this work, we explore the statistical and learning properties of the LLM layer and attribute the bottleneck of LLM quantisation to numerical scaling offsets. To address this, we adapt block quantisations for LLMs, a family of methods that share scaling factors across packed numbers. Block quantisations efficiently reduce the numerical scaling offsets solely from an arithmetic perspective, without additional treatments in the computational path. Our nearly-lossless quantised 6-bit LLMs achieve a $19\times$ higher arithmetic density and $5\times$ memory density than the float32 baseline, surpassing the prior art 8-bit quantisation by $2.5\times$ in arithmetic density and $1.2\times$ in memory density, without requiring any data calibration or re-training. We also share our insights into sub-8-bit LLM quantisation, including the mismatch between activation and weight distributions, optimal fine-tuning strategies, and a lower quantisation granularity inherent in the statistical properties of LLMs. The latter two tricks enable nearly-lossless 4-bit LLMs on downstream tasks. Our code is open-sourced., Comment: Accepted by EMNLP2023
Published: 2023

18. LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation

Author: Zhang, Zixi, Chadwick, Greg, McNally, Hugo, Zhao, Yiren, and Mullins, Robert
Subjects: Computer Science - Machine Learning, Computer Science - Hardware Architecture
Abstract: Test stimuli generation has been a crucial but labor-intensive task in hardware design verification. In this paper, we revolutionize this process by harnessing the power of large language models (LLMs) and present a novel benchmarking framework, LLM4DV. This framework introduces a prompt template for interactively eliciting test stimuli from the LLM, along with four innovative prompting improvements to support the pipeline execution and further enhance its performance. We compare LLM4DV to traditional constrained-random testing (CRT), using three self-designed design-under-test (DUT) modules. Experiments demonstrate that LLM4DV excels in efficiently handling straightforward DUT scenarios, leveraging its ability to employ basic mathematical reasoning and pre-trained knowledge. While it exhibits reduced efficiency in complex task settings, it still outperforms CRT in relative terms. The proposed framework and the DUT modules used in our experiments will be open-sourced upon publication.
Published: 2023

19. MiliPoint: A Point Cloud Dataset for mmWave Radar

Author: Cui, Han, Zhong, Shu, Wu, Jiacheng, Shen, Zichao, Dahnoun, Naim, and Zhao, Yiren
Subjects: Computer Science - Machine Learning
Abstract: Millimetre-wave (mmWave) radar has emerged as an attractive and cost-effective alternative for human activity sensing compared to traditional camera-based systems. mmWave radars are also non-intrusive, providing better protection for user privacy. However, as a Radio Frequency (RF) based technology, mmWave radars rely on capturing reflected signals from objects, making them more prone to noise compared to cameras. This raises an intriguing question for the deep learning community: Can we develop more effective point set-based deep learning methods for such attractive sensors? To answer this question, our work, termed MiliPoint, delves into this idea by providing a large-scale, open dataset for the community to explore how mmWave radars can be utilised for human activity recognition. Moreover, MiliPoint stands out as it is larger in size than existing datasets, has more diverse human actions represented, and encompasses all three key tasks in human activity recognition. We have also established a range of point-based deep neural networks such as DGCNN, PointNet++ and PointTransformer, on MiliPoint, which can serve to set the ground baseline for further development., Comment: Accepted at NeurIPS 2023 Datasets & Benchmarks
Published: 2023

20. Will More Expressive Graph Neural Networks do Better on Generative Tasks?

Author: Zou, Xiandong, Zhao, Xiangyu, Liò, Pietro, and Zhao, Yiren
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Quantitative Biology - Biomolecules, Statistics - Machine Learning
Abstract: Graph generation poses a significant challenge as it involves predicting a complete graph with multiple nodes and edges based on simply a given label. This task also carries fundamental importance to numerous real-world applications, including de-novo drug and molecular design. In recent years, several successful methods have emerged in the field of graph generation. However, these approaches suffer from two significant shortcomings: (1) the underlying Graph Neural Network (GNN) architectures used in these methods are often underexplored; and (2) these methods are often evaluated on only a limited number of metrics. To fill this gap, we investigate the expressiveness of GNNs under the context of the molecular graph generation task, by replacing the underlying GNNs of graph generative models with more expressive GNNs. Specifically, we analyse the performance of six GNNs in two different generative frameworks -- autoregressive generation models, such as GCPN and GraphAF, and one-shot generation models, such as GraphEBM -- on six different molecular generative objectives on the ZINC-250k dataset. Through our extensive experiments, we demonstrate that advanced GNNs can indeed improve the performance of GCPN, GraphAF, and GraphEBM on molecular generation tasks, but GNN expressiveness is not a necessary condition for a good GNN-based generative model. Moreover, we show that GCPN and GraphAF with advanced GNNs can achieve state-of-the-art results across 17 other non-GNN-based graph generative approaches, such as variational autoencoders and Bayesian optimisation models, on the proposed molecular generative objectives (DRD2, Median1, Median2), which are important metrics for de-novo molecular design., Comment: 2nd Learning on Graphs Conference (LoG 2023). 26 pages, 5 figures, 11 tables
Published: 2023

21. A Dataflow Compiler for Efficient LLM Inference using Custom Microscaling Formats

Author: Cheng, Jianyi, Zhang, Cheng, Yu, Zhewen, Bouganis, Christos-Savvas, Constantinides, George A., and Zhao, Yiren
Subjects: Computer Science - Hardware Architecture
Abstract: Model quantization represents both parameters (weights) and intermediate values (activations) in a more compact format, thereby directly reducing both computational and memory cost in hardware. The quantization of recent large language models (LLMs) faces challenges to achieve competitive memory density compared to other models such as convolutional neural networks, since values in LLMs require larger dynamic ranges. Current hardware can expedite computation for LLMs using compact numerical formats such as low-bitwidth integers or floating-point numbers. Each has advantages: integer operations simplify circuit design, whereas floating-point calculations can enhance accuracy when a wider dynamic range is required. In this work, we seek an efficient data format that combines the best of both worlds: Microscaling (MX) formats. MX formats are efficient data formats that achieve both large dynamic ranges and high memory density. In this paper, we propose a compiler named MASE for exploring mixed-precision MX formats on dataflow hardware accelerators for LLM inference. Our main contributions are twofold. First, we propose a novel orchestration abstraction to explore both software and hardware optimizations with new data formats. Second, MASE achieves LLM inference at an average precision of 4-bits, with minimal to no accuracy degradation. To our knowledge, MASE represents the first effort to harness fine-grain multi-precision MX formats in the design of LLM hardware accelerators. Over a range of LLMs and datasets, MASE achieves an average improvement of 24% in $\Delta$ accuracy with an overhead of only 3% in energy efficiency compared to designs using 8-bit fixed-point numbers.
Published: 2023

22. Genomic Interpreter: A Hierarchical Genomic Deep Neural Network with 1D Shifted Window Transformer

Author: Li, Zehui, Das, Akashaditya, Beardall, William A V, Zhao, Yiren, and Stan, Guy-Bart
Subjects: Computer Science - Machine Learning, Quantitative Biology - Genomics
Abstract: Given the increasing volume and quality of genomics data, extracting new insights requires interpretable machine-learning models. This work presents Genomic Interpreter: a novel architecture for genomic assay prediction. This model outperforms the state-of-the-art models for genomic assay prediction tasks. Our model can identify hierarchical dependencies in genomic sites. This is achieved through the integration of 1D-Swin, a novel Transformer-based block designed by us for modelling long-range hierarchical data. Evaluated on a dataset containing 38,171 DNA segments of 17K base pairs, Genomic Interpreter demonstrates superior performance in chromatin accessibility and gene expression prediction and unmasks the underlying `syntax' of gene regulation., Comment: 40th International Conference on Machine Learning (ICML 2023) Workshop on Computational Biology (WCB)
Published: 2023

23. Hybrid Graph: A Unified Graph Representation with Datasets and Benchmarks for Complex Graphs

Author: Li, Zehui, Zhao, Xiangyu, Shen, Mingzhu, Stan, Guy-Bart, Liò, Pietro, and Zhao, Yiren
Subjects: Computer Science - Machine Learning, Computer Science - Social and Information Networks
Abstract: Graphs are widely used to encapsulate a variety of data formats, but real-world networks often involve complex node relations beyond only being pairwise. While hypergraphs and hierarchical graphs have been developed and employed to account for the complex node relations, they cannot fully represent these complexities in practice. Additionally, though many Graph Neural Networks (GNNs) have been proposed for representation learning on higher-order graphs, they are usually only evaluated on simple graph datasets. Therefore, there is a need for a unified modelling of higher-order graphs, and a collection of comprehensive datasets with an accessible evaluation framework to fully understand the performance of these algorithms on complex graphs. In this paper, we introduce the concept of hybrid graphs, a unified definition for higher-order graphs, and present the Hybrid Graph Benchmark (HGB). HGB contains 23 real-world hybrid graph datasets across various domains such as biology, social media, and e-commerce. Furthermore, we provide an extensible evaluation framework and a supporting codebase to facilitate the training and evaluation of GNNs on HGB. Our empirical study of existing GNNs on HGB reveals various research opportunities and gaps, including (1) evaluating the actual performance improvement of hypergraph GNNs over simple graph GNNs; (2) comparing the impact of different sampling strategies on hybrid graph learning methods; and (3) exploring ways to integrate simple graph and hypergraph information. We make our source code and full datasets publicly available at https://zehui127.github.io/hybrid-graph-benchmark/., Comment: 16 pages, 5 figures, 11 tables
Published: 2023

24. The Curse of Recursion: Training on Generated Data Makes Models Forget

Author: Shumailov, Ilia, Shumaylov, Zakhar, Zhao, Yiren, Gal, Yarin, Papernot, Nicolas, and Anderson, Ross
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Cryptography and Security, Computer Science - Computer Vision and Pattern Recognition
Abstract: Stable Diffusion revolutionised image creation from descriptive text. GPT-2, GPT-3(.5) and GPT-4 demonstrated astonishing performance across a variety of language tasks. ChatGPT introduced such language models to the general public. It is now clear that large language models (LLMs) are here to stay, and will bring about drastic change in the whole ecosystem of online text and images. In this paper we consider what the future might hold. What will happen to GPT-{n} once LLMs contribute much of the language found online? We find that use of model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear. We refer to this effect as Model Collapse and show that it can occur in Variational Autoencoders, Gaussian Mixture Models and LLMs. We build theoretical intuition behind the phenomenon and portray its ubiquity amongst all learned generative models. We demonstrate that it has to be taken seriously if we are to sustain the benefits of training from large-scale data scraped from the web. Indeed, the value of data collected about genuine human interactions with systems will be increasingly valuable in the presence of content generated by LLMs in data crawled from the Internet., Comment: Fixed typos in eqn 4,5
Published: 2023

25. Revisiting Automated Prompting: Are We Actually Doing Better?

Author: Zhou, Yulin, Zhao, Yiren, Shumailov, Ilia, Mullins, Robert, and Gal, Yarin
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Current literature demonstrates that Large Language Models (LLMs) are great few-shot learners, and prompting significantly increases their performance on a range of downstream tasks in a few-shot learning setting. An attempt to automate human-led prompting followed, with some progress achieved. In particular, subsequent work demonstrates automation can outperform fine-tuning in certain K-shot learning scenarios. In this paper, we revisit techniques for automated prompting on six different downstream tasks and a larger range of K-shot learning settings. We find that automated prompting does not consistently outperform simple manual prompts. Our work suggests that, in addition to fine-tuning, manual prompts should be used as a baseline in this line of research.
Published: 2023

26. Dynamic Stashing Quantization for Efficient Transformer Training

Author: Yang, Guo, Lo, Daniel, Mullins, Robert, and Zhao, Yiren
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Performance
Abstract: Large Language Models (LLMs) have demonstrated impressive performance on a range of Natural Language Processing (NLP) tasks. Unfortunately, the immense amount of computations and memory accesses required for LLM training makes them prohibitively expensive in terms of hardware cost, and thus challenging to deploy in use cases such as on-device learning. In this paper, motivated by the observation that LLM training is memory-bound, we propose a novel dynamic quantization strategy, termed Dynamic Stashing Quantization (DSQ), that puts a special focus on reducing the memory operations, but also enjoys the other benefits of low precision training, such as the reduced arithmetic cost. We conduct a thorough study on two translation tasks (trained-from-scratch) and three classification tasks (fine-tuning). DSQ reduces the amount of arithmetic operations by $20.95\times$ and the number of DRAM operations by $2.55\times$ on IWSLT17 compared to the standard 16-bit fixed-point, which is widely used in on-device learning.
Published: 2023

27. Task-Agnostic Graph Neural Network Evaluation via Adversarial Collaboration

Author: Zhao, Xiangyu, Stärk, Hannes, Beaini, Dominique, Zhao, Yiren, and Liò, Pietro
Subjects: Computer Science - Machine Learning
Abstract: There is increasing demand for reliable methods to evaluate the progress of Graph Neural Network (GNN) research for molecular representation learning. Existing GNN benchmarking methods for molecular representation learning focus on comparing the GNNs' performances on some node/graph classification/regression tasks on certain datasets. However, there is no principled, task-agnostic method to directly compare two GNNs. Additionally, most existing self-supervised learning works incorporate handcrafted augmentations to the data, which are difficult to apply to graphs due to their unique characteristics. To address these issues, we propose GraphAC (Graph Adversarial Collaboration) -- a conceptually novel, principled, task-agnostic, and stable framework for evaluating GNNs through contrastive self-supervision. We introduce a novel objective function, the Competitive Barlow Twins, that allows two GNNs to jointly update themselves through direct competition against each other. GraphAC succeeds in distinguishing GNNs of different expressiveness across various aspects, and is shown to be a principled and reliable GNN evaluation method that does not require any augmentations., Comment: 11th International Conference on Learning Representations (ICLR 2023) Machine Learning for Drug Discovery (MLDD) Workshop. 17 pages, 6 figures, 4 tables
Published: 2023

28. Flareon: Stealthy any2any Backdoor Injection via Poisoned Augmentation

Author: Qin, Tianrui, He, Xianghuan, Gao, Xitong, Zhao, Yiren, Ye, Kejiang, and Xu, Cheng-Zhong
Subjects: Computer Science - Cryptography and Security, Computer Science - Computer Vision and Pattern Recognition
Abstract: Open software supply chain attacks, once successful, can exact heavy costs in mission-critical applications. As open-source ecosystems for deep learning flourish and become increasingly universal, they present attackers previously unexplored avenues to code-inject malicious backdoors in deep neural network models. This paper proposes Flareon, a small, stealthy, seemingly harmless code modification that specifically targets the data augmentation pipeline with motion-based triggers. Flareon neither alters ground-truth labels, nor modifies the training loss objective, nor does it assume prior knowledge of the victim model architecture, training data, and training hyperparameters. Yet, it has a surprisingly large ramification on training -- models trained under Flareon learn powerful target-conditional (or "any2any") backdoors. The resulting models can exhibit high attack success rates for any target choices and better clean accuracies than backdoor attacks that not only seize greater control, but also assume more restrictive attack capabilities. We also demonstrate the effectiveness of Flareon against recent defenses. Flareon is fully open-source and available online to the deep learning community: https://github.com/lafeat/flareon.
Published: 2022

29. Revisiting Structured Dropout

Author: Zhao, Yiren, Dada, Oluwatomisin, Gao, Xitong, and Mullins, Robert D
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Large neural networks are often overparameterised and prone to overfitting. Dropout is a widely used regularization technique to combat overfitting and improve model generalization. However, unstructured Dropout is not always effective for specific network architectures, and this has led to the development of multiple structured Dropout approaches to improve model performance and, sometimes, reduce the computational resources required for inference. In this work, we revisit structured Dropout, comparing different Dropout approaches on natural language processing and computer vision tasks for multiple state-of-the-art networks. Additionally, we devise an approach to structured Dropout we call ProbDropBlock, which drops contiguous blocks from feature maps with a probability given by the normalized feature salience values. We find that with a simple scheduling strategy the proposed approach to structured Dropout consistently improved model performance compared to baselines and other Dropout approaches on a diverse range of tasks and models. In particular, we show ProbDropBlock improves RoBERTa finetuning on MNLI by $0.22\%$, and training of ResNet50 on ImageNet by $0.28\%$.
Published: 2022

30. DARTFormer: Finding The Best Type Of Attention

Author: Brown, Jason Ross, Zhao, Yiren, Shumailov, Ilia, and Mullins, Robert D
Subjects: Computer Science - Machine Learning, I.2.7, I.2.6
Abstract: Given the wide and ever-growing range of different efficient Transformer attention mechanisms, it is important to identify which attention is most effective when given a task. In this work, we are also interested in combining different attention types to build heterogeneous Transformers. We first propose a DARTS-like Neural Architecture Search (NAS) method to find the best attention for a given task; in this setup, all heads use the same attention (homogeneous models). Our results suggest that NAS is highly effective on this task, and it identifies the best attention mechanisms for IMDb byte level text classification and Listops. We then extend our framework to search for and build Transformers with multiple different attention types, and call them heterogeneous Transformers. We show that whilst these heterogeneous Transformers are better than the average homogeneous models, they cannot outperform the best. We explore the reasons why heterogeneous attention makes sense, and why it ultimately fails.
Published: 2022

31. Wide Attention Is The Way Forward For Transformers?

Author: Brown, Jason Ross, Zhao, Yiren, Shumailov, Ilia, and Mullins, Robert D
Subjects: Computer Science - Machine Learning, I.2.7
Abstract: The Transformer is an extremely powerful and prominent deep learning architecture. In this work, we challenge the commonly held belief in deep learning that going deeper is better, and show an alternative design approach: building wider attention Transformers. We demonstrate that wide single layer Transformer models can compete with or outperform deeper ones in a variety of Natural Language Processing (NLP) tasks when both are trained from scratch. The impact of changing the model aspect ratio on Transformers is then studied systematically. This ratio balances the number of layers and the number of attention heads per layer while keeping the total number of attention heads and all other hyperparameters constant. On average, across 4 NLP tasks and 10 attention types, single layer wide models perform 0.3% better than their deep counterparts. We show an in-depth evaluation and demonstrate how wide models require a far smaller memory footprint and can run faster on commodity hardware. In addition, these wider models are also more interpretable. For example, a single layer Transformer on the IMDb byte level text classification has 3.1x faster inference latency on a CPU than its equally accurate deeper counterpart, and is half the size. We therefore put forward wider and shallower models as a viable and desirable alternative for small models on NLP tasks, and as an important area of research for domains beyond this.
Published: 2022

32. ImpNet: Imperceptible and blackbox-undetectable backdoors in compiled neural networks

Author: Clifford, Eleanor, Shumailov, Ilia, Zhao, Yiren, Anderson, Ross, and Mullins, Robert
Subjects: Computer Science - Machine Learning, Computer Science - Cryptography and Security
Abstract: Early backdoor attacks against machine learning set off an arms race in attack and defence development. Defences have since appeared demonstrating some ability to detect backdoors in models or even remove them. These defences work by inspecting the training data, the model, or the integrity of the training procedure. In this work, we show that backdoors can be added during compilation, circumventing any safeguards in the data preparation and model training stages. The attacker can not only insert existing weight-based backdoors during compilation, but also a new class of weight-independent backdoors, such as ImpNet. These backdoors are impossible to detect during the training or data preparation processes, because they are not yet present. Next, we demonstrate that some backdoors, including ImpNet, can only be reliably detected at the stage where they are inserted and removing them anywhere else presents a significant challenge. We conclude that ML model security requires assurance of provenance along the entire technical pipeline, including the data, model architecture, compiler, and hardware specification., Comment: 10 pages, 7 figures, to be published in IEEE Secure and Trustworthy Machine Learning 2024. For website see https://ml.backdoors.uk . For source code, see https://sr.ht/~ecc/ImpNet
Published: 2022

33. Augmentation Backdoors

Author: Rance, Joseph, Zhao, Yiren, Shumailov, Ilia, and Mullins, Robert
Subjects: Computer Science - Machine Learning, Computer Science - Cryptography and Security
Abstract: Data augmentation is used extensively to improve model generalisation. However, reliance on external libraries to implement augmentation methods introduces a vulnerability into the machine learning pipeline. It is well known that backdoors can be inserted into machine learning models through serving a modified dataset to train on. Augmentation therefore presents a perfect opportunity to perform this modification without requiring an initially backdoored dataset. In this paper we present three backdoor attacks that can be covertly inserted into data augmentation. Our attacks each insert a backdoor using a different type of computer vision augmentation transform, covering simple image transforms, GAN-based augmentation, and composition-based augmentation. By inserting the backdoor using these augmentation transforms, we make our backdoors difficult to detect, while still supporting arbitrary backdoor functionality. We evaluate our attacks on a range of computer vision benchmarks and demonstrate that an attacker is able to introduce backdoors through just a malicious augmentation routine., Comment: 12 pages, 8 figures
Published: 2022

34. Efficient Adversarial Training With Data Pruning

Author: Kaufmann, Maximilian, Zhao, Yiren, Shumailov, Ilia, Mullins, Robert, and Papernot, Nicolas
Subjects: Computer Science - Machine Learning
Abstract: Neural networks are susceptible to adversarial examples -- small input perturbations that cause models to fail. Adversarial training is one of the solutions that stops adversarial examples; models are exposed to attacks during training and learn to be resilient to them. Yet, such a procedure is currently expensive -- it takes a long time to produce and train models with adversarial samples, and, what is worse, it occasionally fails. In this paper we demonstrate data pruning -- a method for increasing adversarial training efficiency through data sub-sampling. We empirically show that data pruning leads to improvements in convergence and reliability of adversarial training, albeit with different levels of utility degradation. For example, we observe that using random sub-sampling of CIFAR10 to drop 40% of data, we lose 8% adversarial accuracy against the strongest attackers, while by using only 20% of data we lose 14% adversarial accuracy and reduce runtime by a factor of 3. Interestingly, we discover that in some settings data pruning brings benefits from both worlds -- it both improves adversarial accuracy and training time.
Published: 2022

35. Architectural Backdoors in Neural Networks

Author: Bober-Irizar, Mikel, Shumailov, Ilia, Zhao, Yiren, Mullins, Robert, and Papernot, Nicolas
Subjects: Computer Science - Machine Learning, Computer Science - Cryptography and Security
Abstract: Machine learning is vulnerable to adversarial manipulation. Previous literature has demonstrated that at the training stage attackers can manipulate data and data sampling procedures to control model behaviour. A common attack goal is to plant backdoors i.e. force the victim model to learn to recognise a trigger known only by the adversary. In this paper, we introduce a new class of backdoor attacks that hide inside model architectures i.e. in the inductive bias of the functions used to train. These backdoors are simple to implement, for instance by publishing open-source code for a backdoored model architecture that others will reuse unknowingly. We demonstrate that model architectural backdoors represent a real threat and, unlike other approaches, can survive a complete re-training from scratch. We formalise the main construction principles behind architectural backdoors, such as a link between the input and the output, and describe some possible protections against them. We evaluate our attacks on computer vision benchmarks of different scales and demonstrate the underlying vulnerability is pervasive in a variety of training settings.
Published: 2022

36. Model Architecture Adaption for Bayesian Neural Networks

Author: Wang, Duo, Zhao, Yiren, Shumailov, Ilia, and Mullins, Robert
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Bayesian Neural Networks (BNNs) offer a mathematically grounded framework to quantify the uncertainty of model predictions but come with a prohibitive computation cost for both training and inference. In this work, we show a novel network architecture search (NAS) that optimizes BNNs for both accuracy and uncertainty while having a reduced inference latency. Different from canonical NAS that optimizes solely for in-distribution likelihood, the proposed scheme searches for the uncertainty performance using both in- and out-of-distribution data. Our method is able to search for the correct placement of Bayesian layer(s) in a network. In our experiments, the searched models show comparable uncertainty quantification ability and accuracy compared to the state-of-the-art (deep ensemble). In addition, the searched models use only a fraction of the runtime compared to many popular BNN baselines, reducing the inference runtime cost by $2.98 \times$ and $2.92 \times$ respectively on the CIFAR10 dataset when compared to MCDropout and deep ensemble.
Published: 2022

37. DAdaQuant: Doubly-adaptive quantization for communication-efficient Federated Learning

Author: Hönig, Robert, Zhao, Yiren, and Mullins, Robert
Subjects: Computer Science - Machine Learning
Abstract: Federated Learning (FL) is a powerful technique for training a model on a server with data from several clients in a privacy-preserving manner. In FL, a server sends the model to every client, who then train the model locally and send it back to the server. The server aggregates the updated models and repeats the process for several rounds. FL incurs significant communication costs, in particular when transmitting the updated local models from the clients back to the server. Recently proposed algorithms quantize the model parameters to efficiently compress FL communication. These algorithms typically have a quantization level that controls the compression factor. We find that dynamic adaptations of the quantization level can boost compression without sacrificing model quality. First, we introduce a time-adaptive quantization algorithm that increases the quantization level as training progresses. Second, we introduce a client-adaptive quantization algorithm that assigns each individual client the optimal quantization level at every round. Finally, we combine both algorithms into DAdaQuant, the doubly-adaptive quantization algorithm. Our experiments show that DAdaQuant consistently improves client$\rightarrow$server compression, outperforming the strongest non-adaptive baselines by up to $2.8\times$., Comment: 10 pages, 5 figures, submitted to ICLR 2022
Published: 2021

38. Rapid Model Architecture Adaption for Meta-Learning

Author: Zhao, Yiren, Gao, Xitong, Shumailov, Ilia, Fusi, Nicolo, and Mullins, Robert
Subjects: Computer Science - Machine Learning
Abstract: Network Architecture Search (NAS) methods have recently gathered much attention. They design networks with better performance and use a much shorter search time compared to traditional manual tuning. Despite their efficiency in model deployments, most NAS algorithms target a single task on a fixed hardware system. However, real-life few-shot learning environments often cover a great number of tasks (T) and deployments on a wide variety of hardware platforms (H). The combinatorial search complexity T times H creates a fundamental search efficiency challenge if one naively applies existing NAS methods to these scenarios. To overcome this issue, we show, for the first time, how to rapidly adapt model architectures to new tasks in a many-task many-hardware few-shot learning setup by integrating Model Agnostic Meta Learning (MAML) into the NAS flow. The proposed NAS method (H-Meta-NAS) is hardware-aware and performs optimisation in the MAML framework. H-Meta-NAS shows a Pareto dominance compared to a variety of NAS and manual baselines in popular few-shot learning benchmarks with various hardware platforms and constraints. In particular, on the 5-way 1-shot Mini-ImageNet classification task, the proposed method outperforms the best manual baseline by a large margin (5.21% in accuracy) using 60% less computation.
Published: 2021

39. Markpainting: Adversarial Machine Learning meets Inpainting

Author: Khachaturov, David, Shumailov, Ilia, Zhao, Yiren, Papernot, Nicolas, and Anderson, Ross
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computers and Society
Abstract: Inpainting is a learned interpolation technique that is based on generative modeling and used to populate masked or missing pieces in an image; it has wide applications in picture editing and retouching. Recently, inpainting started being used for watermark removal, raising concerns. In this paper we study how to manipulate it using our markpainting technique. First, we show how an image owner with access to an inpainting model can augment their image in such a way that any attempt to edit it using that model will add arbitrary visible information. We find that we can target multiple different models simultaneously with our technique. This can be designed to reconstitute a watermark if the editor had been trying to remove it. Second, we show that our markpainting technique is transferable to models that have different architectures or were trained on different datasets, so watermarks created using it are difficult for adversaries to remove. Markpainting is novel and can be used as a manipulation alarm that becomes visible in the event of inpainting., Comment: Proceedings of the 38th International Conference on Machine Learning (ICML 2021)
Published: 2021

40. Manipulating SGD with Data Ordering Attacks

Author: Shumailov, Ilia, Shumaylov, Zakhar, Kazhdan, Dmitry, Zhao, Yiren, Papernot, Nicolas, Erdogdu, Murat A., and Anderson, Ross
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security, Computer Science - Computer Vision and Pattern Recognition
Abstract: Machine learning is vulnerable to a wide variety of attacks. It is now well understood that by changing the underlying data distribution, an adversary can poison the model trained with it or introduce backdoors. In this paper we present a novel class of training-time attacks that require no changes to the underlying dataset or model architecture, but instead only change the order in which data are supplied to the model. In particular, we find that the attacker can either prevent the model from learning, or poison it to learn behaviours specified by the attacker. Furthermore, we find that even a single adversarially-ordered epoch can be enough to slow down model learning, or even to reset all of the learning progress. Indeed, the attacks presented here are not specific to the model or dataset, but rather target the stochastic nature of modern learning procedures. We extensively evaluate our attacks on computer vision and natural language benchmarks to find that the adversary can disrupt model training and even introduce backdoors.
Published: 2021

41. Nudge Attacks on Point-Cloud DNNs

Author: Zhao, Yiren, Shumailov, Ilia, Mullins, Robert, and Anderson, Ross
Subjects: Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: The wide adoption of 3D point-cloud data in safety-critical applications such as autonomous driving makes adversarial samples a real threat. Existing adversarial attacks on point clouds achieve high success rates but modify a large number of points, which is usually difficult to do in real-life scenarios. In this paper, we explore a family of attacks that only perturb a few points of an input point cloud, and name them nudge attacks. We demonstrate that nudge attacks can successfully flip the results of modern point-cloud DNNs. We present two variants, gradient-based and decision-based, showing their effectiveness in white-box and grey-box scenarios. Our extensive experiments show nudge attacks are effective at generating both targeted and untargeted adversarial point clouds, by changing a few points or even a single point from the entire point-cloud input. We find that with a single point we can reliably thwart predictions in 12--80% of cases, whereas 10 points allow us to further increase this to 37--95%. Finally, we discuss the possible defenses against such attacks, and explore their limitations.
Published: 2020

42. Learned Low Precision Graph Neural Networks

Author: Zhao, Yiren, Wang, Duo, Bates, Daniel, Mullins, Robert, Jamnik, Mateja, and Lio, Pietro
Subjects: Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing
Abstract: Deep Graph Neural Networks (GNNs) show promising performance on a range of graph tasks, yet at present are costly to run and lack many of the optimisations applied to DNNs. We show, for the first time, how to systematically quantise GNNs with minimal or no loss in performance using Network Architecture Search (NAS). We define the possible quantisation search space of GNNs. The proposed novel NAS mechanism, named Low Precision Graph NAS (LPGNAS), constrains both architecture and quantisation choices to be differentiable. LPGNAS learns the optimal architecture coupled with the best quantisation strategy for different components in the GNN automatically using back-propagation in a single search round. On eight different datasets, solving the task of classifying unseen nodes in a graph, LPGNAS generates quantised models with significant reductions in both model and buffer sizes but with similar accuracy to manually designed networks and other NAS results. In particular, on the Pubmed dataset, LPGNAS shows a better size-accuracy Pareto frontier compared to seven other manual and searched baselines, offering a 2.3 times reduction in model size but a 0.4% increase in accuracy when compared to the best NAS competitor. Finally, from our collected quantisation statistics on a wide range of datasets, we suggest a W4A8 (4-bit weights, 8-bit activations) quantisation strategy might be the bottleneck for naive GNN quantisations.
Published: 2020

43. Sponge Examples: Energy-Latency Attacks on Neural Networks

Author: Shumailov, Ilia, Zhao, Yiren, Bates, Daniel, Papernot, Nicolas, Mullins, Robert, and Anderson, Ross
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Cryptography and Security, Statistics - Machine Learning
Abstract: The high energy costs of neural network training and inference led to the use of acceleration hardware such as GPUs and TPUs. While this enabled us to train large-scale neural networks in datacenters and deploy them on edge devices, the focus so far is on average-case performance. In this work, we introduce a novel threat vector against neural networks whose energy consumption or decision latency are critical. We show how adversaries can exploit carefully crafted sponge examples, which are inputs designed to maximise energy consumption and latency. We mount two variants of this attack on established vision and language models, increasing energy consumption by a factor of 10 to 200. Our attacks can also be used to delay decisions where a network has critical real-time performance, such as in perception for autonomous vehicles. We demonstrate the portability of our malicious inputs across CPUs and a variety of hardware accelerator chips including GPUs, and an ASIC simulator. We conclude by proposing a defense strategy which mitigates our attack by shifting the analysis of energy consumption in hardware from an average-case to a worst-case perspective. Comment: Accepted at 6th IEEE European Symposium on Security and Privacy (EuroS&P)
Published: 2020

44. Probabilistic Dual Network Architecture Search on Graphs

Author: Zhao, Yiren, Wang, Duo, Gao, Xitong, Mullins, Robert, Lio, Pietro, and Jamnik, Mateja
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We present the first differentiable Network Architecture Search (NAS) for Graph Neural Networks (GNNs). GNNs show promising performance on a wide range of tasks, but require a large amount of architecture engineering. First, graphs are inherently a non-Euclidean and sophisticated data structure, leading to poor adaptivity of GNN architectures across different datasets. Second, a typical graph block contains numerous different components, such as aggregation and attention, generating a large combinatorial search space. To counter these problems, we propose a Probabilistic Dual Network Architecture Search (PDNAS) framework for GNNs. PDNAS not only optimises the operations within a single graph block (micro-architecture), but also considers how these blocks should be connected to each other (macro-architecture). The dual architecture (micro- and macro-architectures) optimisation allows PDNAS to find deeper GNNs on diverse datasets with better performance compared to other graph NAS methods. Moreover, we use a fully gradient-based search approach to update architectural parameters, making it the first differentiable graph NAS method. PDNAS outperforms existing hand-designed GNNs and NAS results; for example, on the PPI dataset, PDNAS beats its best competitors by 1.67 and 0.17 in F1 scores.
Published: 2020

45. Towards Certifiable Adversarial Sample Detection

Author: Shumailov, Ilia, Zhao, Yiren, Mullins, Robert, and Anderson, Ross
Subjects: Computer Science - Machine Learning, Computer Science - Cryptography and Security, Statistics - Machine Learning
Abstract: Convolutional Neural Networks (CNNs) are deployed in more and more classification systems, but adversarial samples can be maliciously crafted to trick them, and are becoming a real threat. There have been various proposals to improve CNNs' adversarial robustness but these all suffer performance penalties or other limitations. In this paper, we provide a new approach in the form of a certifiable adversarial detection scheme, the Certifiable Taboo Trap (CTT). The system can provide certifiable guarantees of detection of adversarial inputs for certain $l_{\infty}$ sizes on a reasonable assumption, namely that the training data have the same distribution as the test data. We develop and evaluate several versions of CTT with a range of defense capabilities, training overheads and certifiability on adversarial samples. Against adversaries with various $l_p$ norms, CTT outperforms existing defense methods that focus purely on improving network robustness. We show that CTT has small false positive rates on clean test data, minimal compute overheads when deployed, and can support complex security policies.
Published: 2020

46. Automatic Generation of Multi-precision Multi-arithmetic CNN Accelerators for FPGAs

Author: Zhao, Yiren, Gao, Xitong, Guo, Xuan, Liu, Junyi, Wang, Erwei, Mullins, Robert, Cheung, Peter Y. K., Constantinides, George, and Xu, Cheng-Zhong
Subjects: Electrical Engineering and Systems Science - Signal Processing, Computer Science - Hardware Architecture, Computer Science - Machine Learning
Abstract: Modern deep Convolutional Neural Networks (CNNs) are computationally demanding, yet real applications often require high throughput and low latency. To help tackle these problems, we propose Tomato, a framework designed to automate the process of generating efficient CNN accelerators. The generated design is pipelined and each convolution layer uses different arithmetics at various precisions. Using Tomato, we showcase state-of-the-art multi-precision multi-arithmetic networks, including MobileNet-V1, running on FPGAs. To our knowledge, this is the first multi-precision multi-arithmetic auto-generation framework for CNNs. In software, Tomato fine-tunes pretrained networks to use a mixture of short powers-of-2 and fixed-point weights with a minimal loss in classification accuracy. The fine-tuned parameters are combined with the templated hardware designs to automatically produce efficient inference circuits in FPGAs. We demonstrate how our approach significantly reduces model sizes and computation complexities, and permits us to pack a complete ImageNet network onto a single FPGA without accessing off-chip memories for the first time. Furthermore, we show how Tomato produces implementations of networks with various sizes running on single or multiple FPGAs. To the best of our knowledge, our automatically generated accelerators outperform closest FPGA-based competitors by at least 2-4x for latency and throughput; the generated accelerator runs ImageNet classification at a rate of more than 3000 frames per second. Comment: To be published in International Conference on Field Programmable Technology 2019
Published: 2019

47. Blackbox Attacks on Reinforcement Learning Agents Using Approximated Temporal Information

Author: Zhao, Yiren, Shumailov, Ilia, Cui, Han, Gao, Xitong, Mullins, Robert, and Anderson, Ross
Subjects: Computer Science - Machine Learning, Computer Science - Cryptography and Security, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
Abstract: Recent research on reinforcement learning (RL) has suggested that trained agents are vulnerable to maliciously crafted adversarial samples. In this work, we show how such samples can be generalised from White-box and Grey-box attacks to a strong Black-box case, where the attacker has no knowledge of the agents, their training parameters and their training methods. We use sequence-to-sequence models to predict a single action or a sequence of future actions that a trained agent will make. First, we show our approximation model, based on time-series information from the agent, consistently predicts RL agents' future actions with high accuracy in a Black-box setup on a wide range of games and RL algorithms. Second, we find that although adversarial samples are transferable from the target model to our RL agents, they often outperform random Gaussian noise only marginally. This highlights a serious methodological deficiency in previous work on such agents; random jamming should have been taken as the baseline for evaluation. Third, we propose a novel use for adversarial samples in Black-box attacks of RL agents: they can be used to trigger a trained agent to misbehave after a specific time delay. This appears to be a genuinely new type of attack. It potentially enables an attacker to use devices controlled by RL agents as time bombs.
Published: 2019

48. Focused Quantization for Sparse CNNs

Author: Zhao, Yiren, Gao, Xitong, Bates, Daniel, Mullins, Robert, and Xu, Cheng-Zhong
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Deep convolutional neural networks (CNNs) are powerful tools for a wide range of vision tasks, but the enormous amount of memory and compute resources required by CNNs poses a challenge in deploying them on constrained devices. Existing compression techniques, while excelling at reducing model sizes, struggle to be computationally friendly. In this paper, we attend to the statistical properties of sparse CNNs and present focused quantization, a novel quantization strategy based on power-of-two values, which exploits the weight distributions after fine-grained pruning. The proposed method dynamically discovers the most effective numerical representation for weights in layers with varying sparsities, significantly reducing model sizes. Multiplications in quantized CNNs are replaced with much cheaper bit-shift operations for efficient inference. Coupled with lossless encoding, we built a compression pipeline that provides CNNs with high compression ratios (CR), low computation cost and minimal loss in accuracy. In ResNet-50, we achieved an 18.08x CR with only 0.24% loss in top-5 accuracy, outperforming existing compression methods. We fully compressed a ResNet-18 and found that it is not only higher in CR and top-5 accuracy, but also more hardware efficient as it requires fewer logic gates to implement when compared to other state-of-the-art quantization methods assuming the same throughput. Comment: To appear in NeurIPS 2019, this is the same paper adapted for viewing on arXiv. TL;DR: Better size/accuracy trade-off of compressed sparse models with focused quantization. 11 pages, 5 figures, 4 tables
Published: 2019

49. Sitatapatra: Blocking the Transfer of Adversarial Samples

Author: Shumailov, Ilia, Gao, Xitong, Zhao, Yiren, Mullins, Robert, Anderson, Ross, and Xu, Cheng-Zhong
Subjects: Computer Science - Machine Learning, Computer Science - Cryptography and Security, Statistics - Machine Learning
Abstract: Convolutional Neural Networks (CNNs) are widely used to solve classification tasks in computer vision. However, they can be tricked into misclassifying specially crafted 'adversarial' samples -- and samples built to trick one model often work alarmingly well against other models trained on the same task. In this paper we introduce Sitatapatra, a system designed to block the transfer of adversarial samples. It diversifies neural networks using a key, as in cryptography, and provides a mechanism for detecting attacks. What's more, when adversarial samples are detected they can typically be traced back to the individual device that was used to develop them. The run-time overheads are minimal, permitting the use of Sitatapatra on constrained systems.
Published: 2019

50. The Taboo Trap: Behavioural Detection of Adversarial Samples

Author: Shumailov, Ilia, Zhao, Yiren, Mullins, Robert, and Anderson, Ross
Subjects: Computer Science - Machine Learning, Computer Science - Cryptography and Security, Statistics - Machine Learning
Abstract: Deep Neural Networks (DNNs) have become a powerful tool for a wide range of problems. Yet recent work has found an increasing variety of adversarial samples that can fool them. Most existing detection mechanisms against adversarial attacks impose significant costs, either by using additional classifiers to spot adversarial samples, or by requiring the DNN to be restructured. In this paper, we introduce a novel defence. We train our DNN so that, as long as it is working as intended on the kind of inputs we expect, its behavior is constrained, in that some set of behaviors are taboo. If it is exposed to adversarial samples, they will often cause a taboo behavior, which we can detect. Taboos can be both subtle and diverse, so their choice can encode and hide information. It is a well-established design principle that the security of a system should not depend on the obscurity of its design, but on some variable (the key) which can differ between implementations and be changed as necessary. We discuss how taboos can be used to equip a classifier with just such a key, and how to tune the keying mechanism to adversaries of various capabilities. We evaluate the performance of a prototype against a wide range of attacks and show how our simple defense can defend against cheap attacks at scale with zero run-time computation overhead, making it a suitable defense method for IoT devices.
Published: 2018
