Author: "Venkataramani, Swagath" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Venkataramani, Swagath"' showing total 169 results

1. SmartQuant: CXL-based AI Model Store in Support of Runtime Configurable Weight Quantization

Author: Xie, Rui, Haq, Asad Ul, Ma, Linsen, Sun, Krystal, Sen, Sanchari, Venkataramani, Swagath, Liu, Liu, and Zhang, Tong
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Hardware Architecture
Abstract: Recent studies have revealed that, during the inference on generative AI models such as transformers, the importance of different weights exhibits substantial context-dependent variations. This naturally manifests a promising potential of adaptively configuring weight quantization to improve the generative AI inference efficiency. Although configurable weight quantization can readily leverage the hardware support of variable-precision arithmetic in modern GPUs and AI accelerators, little prior research has studied how one could exploit variable weight quantization to proportionally improve the AI model memory access speed and energy efficiency. Motivated by the rapidly maturing CXL ecosystem, this work develops a CXL-based design solution to fill this gap. The key is to allow CXL memory controllers to play an active role in supporting and exploiting runtime configurable weight quantization. Using the transformer as a representative generative AI model, we carried out experiments that demonstrate the effectiveness of the proposed design solution.
Published: 2024

2. Enhance DNN Adversarial Robustness and Efficiency via Injecting Noise to Non-Essential Neurons

Author: Liu, Zhenyu, Gagnon, Garrett, Venkataramani, Swagath, and Liu, Liu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security
Abstract: Deep Neural Networks (DNNs) have revolutionized a wide range of industries, from healthcare and finance to automotive, by offering unparalleled capabilities in data analysis and decision-making. Despite their transformative impact, DNNs face two critical challenges: the vulnerability to adversarial attacks and the increasing computational costs associated with more complex and larger models. In this paper, we introduce an effective method designed to simultaneously enhance adversarial robustness and execution efficiency. Unlike prior studies that enhance robustness via uniformly injecting noise, we introduce a non-uniform noise injection algorithm, strategically applied at each DNN layer to disrupt adversarial perturbations introduced in attacks. By employing approximation techniques, our approach identifies and protects essential neurons while strategically introducing noise into non-essential neurons. Our experimental results demonstrate that our method successfully enhances both robustness and efficiency across several attack scenarios, model architectures, and datasets.
Published: 2024

3. Approximate Computing and the Efficient Machine Learning Expedition

Author: Henkel, Jörg, Li, Hai, Raghunathan, Anand, Tahoori, Mehdi B., Venkataramani, Swagath, Yang, Xiaoxuan, and Zervakis, Georgios
Subjects: Computer Science - Hardware Architecture, Computer Science - Machine Learning
Abstract: Approximate computing (AxC) has long been accepted as a design alternative for efficient system implementation at the cost of relaxed accuracy requirements. Despite the AxC research activities in various application domains, AxC thrived over the past decade when it was applied to Machine Learning (ML). The inherently approximate nature of ML models, together with the increased computational overheads associated with ML applications (which were effectively mitigated by corresponding approximations), led to a perfect match and a fruitful synergy. AxC for AI/ML has transcended academic prototypes. In this work, we illuminate the synergistic nature of AxC and ML and elucidate the impact of AxC in designing efficient ML systems. To that end, we present an overview and taxonomy of AxC for ML and use two descriptive application scenarios to demonstrate how AxC boosts the efficiency of ML systems.
Comment: Accepted for publication at the International Conference on Computer-Aided Design (ICCAD) 2022
Published: 2022
Full Text: View/download PDF

4. Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization

Author: Fasoli, Andrea, Chen, Chia-Yu, Serrano, Mauricio, Venkataramani, Swagath, Saon, George, Cui, Xiaodong, Kingsbury, Brian, and Gopalakrishnan, Kailash
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, I.2.6
Abstract: We report on aggressive quantization strategies that greatly accelerate inference of Recurrent Neural Network Transducers (RNN-T). We use a 4-bit integer representation for both weights and activations and apply Quantization Aware Training (QAT) to retrain the full model (acoustic encoder and language model) and achieve near-iso-accuracy. We show that customized quantization schemes that are tailored to the local properties of the network are essential to achieve good performance while limiting the computational overhead of QAT. Density ratio Language Model fusion has shown remarkable accuracy gains on RNN-T workloads but it severely increases the computational cost of inference. We show that our quantization strategies enable using large beam widths for hypothesis search while achieving streaming-compatible runtimes and a full model compression ratio of 7.6× compared to the full precision model. Via hardware simulations, we estimate a 3.4× acceleration from FP16 to INT4 for the end-to-end quantized RNN-T inclusive of LM fusion, resulting in a Real Time Factor (RTF) of 0.06. On the NIST Hub5 2000, Hub5 2001, and RT-03 test sets, we retain most of the gains associated with LM fusion, improving the average WER by >1.5%.
Comment: 5 pages, 2 figures, 1 table. Paper accepted to Interspeech 2022
Published: 2022

5. 4-bit Quantization of LSTM-based Speech Recognition Models

Author: Fasoli, Andrea, Chen, Chia-Yu, Serrano, Mauricio, Sun, Xiao, Wang, Naigang, Venkataramani, Swagath, Saon, George, Cui, Xiaodong, Kingsbury, Brian, Zhang, Wei, Tüske, Zoltán, and Gopalakrishnan, Kailash
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, I.2.6
Abstract: We investigate the impact of aggressive low-precision representations of weights and activations in two families of large LSTM-based architectures for Automatic Speech Recognition (ASR): hybrid Deep Bidirectional LSTM - Hidden Markov Models (DBLSTM-HMMs) and Recurrent Neural Network - Transducers (RNN-Ts). Using a 4-bit integer representation, a naïve quantization approach applied to the LSTM portion of these models results in significant Word Error Rate (WER) degradation. On the other hand, we show that minimal accuracy loss is achievable with an appropriate choice of quantizers and initializations. In particular, we customize quantization schemes depending on the local properties of the network, improving recognition performance while limiting computational time. We demonstrate our solution on the Switchboard (SWB) and CallHome (CH) test sets of the NIST Hub5-2000 evaluation. DBLSTM-HMMs trained with 300 or 2000 hours of SWB data achieve <0.5% and <1% average WER degradation, respectively. On the more challenging RNN-T models, our quantization strategy limits degradation in 4-bit inference to 1.3%.
Comment: 5 pages, 3 figures, Andrea Fasoli and Chia-Yu Chen equally contributed to this work. Paper accepted to Interspeech 2021
Published: 2021

6. ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training

Author: Chen, Chia-Yu, Ni, Jiamin, Lu, Songtao, Cui, Xiaodong, Chen, Pin-Yu, Sun, Xiao, Wang, Naigang, Venkataramani, Swagath, Srinivasan, Vijayalakshmi, Zhang, Wei, and Gopalakrishnan, Kailash
Subjects: Computer Science - Machine Learning
Abstract: Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained. To overcome this limitation, numerous gradient compression techniques have been proposed and have demonstrated high compression ratios. However, most existing methods do not scale well to large-scale distributed systems (due to gradient build-up) and/or fail to evaluate model fidelity (test accuracy) on large datasets. To mitigate these issues, we propose a new compression technique, Scalable Sparsified Gradient Compression (ScaleCom), that leverages similarity in the gradient distribution amongst learners to provide significantly improved scalability. Using theoretical analysis, we show that ScaleCom provides favorable convergence guarantees and is compatible with gradient all-reduce techniques. Furthermore, we experimentally demonstrate that ScaleCom has small overheads, directly reduces gradient traffic and provides high compression rates (65-400X) and excellent scalability (up to 64 learners and 8-12X larger batch sizes over standard training) across a wide range of applications (image, language, and speech) without significant accuracy loss.
Comment: NeurIPS2020 accepted https://proceedings.neurips.cc/paper/2020/hash/9d58963592071dbf38a0fa114269959c-Abstract.html
Published: 2021

7. Workload-aware Automatic Parallelization for Multi-GPU DNN Training

Author: Shin, Sungho, Jo, Youngmin, Choi, Jungwook, Venkataramani, Swagath, Srinivasan, Vijayalakshmi, and Sung, Wonyong
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Deep neural networks (DNNs) have emerged as successful solutions for a variety of artificial intelligence applications, but their very large and deep models impose high computational requirements during training. Multi-GPU parallelization is a popular option to accelerate demanding computations in DNN training, but most state-of-the-art multi-GPU deep learning frameworks not only require users to have an in-depth understanding of the implementation of the frameworks themselves, but also apply parallelization in a straightforward way without optimizing GPU utilization. In this work, we propose a workload-aware auto-parallelization framework (WAP) for DNN training, where the work is automatically distributed to multiple GPUs based on the workload characteristics. We evaluate WAP using TensorFlow with popular DNN benchmarks (AlexNet and VGG-16), and show competitive training throughput compared with the state-of-the-art frameworks, and also demonstrate that WAP automatically optimizes GPU assignment based on the workload's compute requirements, thereby improving energy efficiency.
Comment: This paper is accepted in ICASSP2019
Published: 2018

8. Bridging the Accuracy Gap for 2-bit Quantized Neural Networks (QNN)

Author: Choi, Jungwook, Chuang, Pierce I-Jen, Wang, Zhuo, Venkataramani, Swagath, Srinivasan, Vijayalakshmi, and Gopalakrishnan, Kailash
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. In order to reduce this cost, several quantization schemes have gained attention recently with some focusing on weight quantization, and others focusing on quantizing activations. This paper proposes novel techniques that target weight and activation quantizations separately resulting in an overall quantized neural network (QNN). The activation quantization technique, PArameterized Clipping acTivation (PACT), uses an activation clipping parameter α that is optimized during training to find the right quantization scale. The weight quantization scheme, statistics-aware weight binning (SAWB), finds the optimal scaling factor that minimizes the quantization error based on the statistical characteristics of the distribution of weights without the need for an exhaustive search. The combination of PACT and SAWB results in a 2-bit QNN that achieves state-of-the-art classification accuracy (comparable to full precision networks) across a range of popular models and datasets.
Comment: arXiv admin note: substantial text overlap with arXiv:1805.06085
Published: 2018

9. PACT: Parameterized Clipping Activation for Quantized Neural Networks

Author: Choi, Jungwook, Wang, Zhuo, Venkataramani, Swagath, Chuang, Pierce I-Jen, Srinivasan, Vijayalakshmi, and Gopalakrishnan, Kailash
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. To address this cost, a number of quantization schemes have been proposed - but most of these techniques focused on quantizing weights, which are relatively smaller in size compared to activations. This paper proposes a novel quantization scheme for activations during training - that enables neural networks to work well with ultra low precision weights and activations without any significant accuracy degradation. This technique, PArameterized Clipping acTivation (PACT), uses an activation clipping parameter α that is optimized during training to find the right quantization scale. PACT allows quantizing activations to arbitrary bit precisions, while achieving much better accuracy relative to published state-of-the-art quantization schemes. We show, for the first time, that both weights and activations can be quantized to 4 bits of precision while still achieving accuracy comparable to full precision networks across a range of popular models and datasets. We also show that exploiting these reduced-precision computational units in hardware can enable a super-linear improvement in inferencing performance due to a significant reduction in the area of accelerator compute engines coupled with the ability to retain the quantized model and activation data in on-chip memories.
Published: 2018
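
Illustration for entry 9: the clipping-then-quantizing step described in the PACT abstract can be sketched as a forward pass that clips an activation to [0, α] and then rounds it onto a uniform k-bit grid. The function name `pact_quantize` and the sample values are assumptions made for this sketch. In the method itself α is learned during training, which is not shown here.

```python
def pact_quantize(x, alpha, k):
    """Clip activation x to [0, alpha], then quantize uniformly to k bits.

    Hypothetical helper for illustration only: alpha is a fixed number
    here, whereas PACT optimizes it during training.
    """
    levels = 2 ** k - 1                  # number of quantization steps
    y = min(max(x, 0.0), alpha)          # clipped activation in [0, alpha]
    return round(y * levels / alpha) * alpha / levels


if __name__ == "__main__":
    # Sample activations chosen arbitrarily for the demonstration.
    for x in (-1.0, 0.4, 2.0, 7.5):
        print(x, pact_quantize(x, alpha=6.0, k=4))
```

Negative inputs map to 0 and inputs above α saturate at α, so α directly sets the quantization scale: a smaller α gives finer steps but clips more activations.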

10. SparCE: Sparsity aware General Purpose Core Extensions to Accelerate Deep Neural Networks

Author: Sen, Sanchari, Jain, Shubham, Venkataramani, Swagath, and Raghunathan, Anand
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Hardware Architecture, Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep Neural Networks (DNNs) have emerged as the method of choice for solving a wide range of machine learning tasks. The enormous computational demands posed by DNNs have most commonly been addressed through the design of custom accelerators. However, these accelerators are prohibitive in many design scenarios (e.g., wearable devices and IoT sensors), due to stringent area/cost constraints. Accelerating DNNs on these low-power systems, comprising mainly general-purpose processor (GPP) cores, requires new approaches. We improve the performance of DNNs on GPPs by exploiting a key attribute of DNNs, i.e., sparsity. We propose Sparsity aware Core Extensions (SparCE), a set of micro-architectural and ISA extensions that leverage sparsity and are minimally intrusive and low-overhead. We dynamically detect zero operands and skip a set of future instructions that use them. Our design ensures that the instructions to be skipped are prevented from even being fetched, as squashing instructions comes with a penalty. SparCE consists of 2 key micro-architectural enhancements: a Sparsity Register File (SpRF) that tracks zero registers and a Sparsity aware Skip Address (SASA) table that indicates instructions to be skipped. When an instruction is fetched, SparCE dynamically pre-identifies whether the following instruction(s) can be skipped and appropriately modifies the program counter, thereby skipping the redundant instructions and improving performance. We model SparCE using the gem5 architectural simulator, and evaluate our approach on 6 image-recognition DNNs in the context of both training and inference using the Caffe framework. On a scalar microprocessor, SparCE achieves 19%-31% reduction in application-level execution time. We also evaluate SparCE on a 4-way SIMD ARMv8 processor using the OpenBLAS library, and demonstrate that SparCE achieves 8%-15% reduction in the application-level execution time.
Published: 2017

11. DyVEDeep: Dynamic Variable Effort Deep Neural Networks

Author: Ganapathy, Sanjay, Venkataramani, Swagath, Ravindran, Balaraman, and Raghunathan, Anand
Subjects: Computer Science - Neural and Evolutionary Computing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Learning
Abstract: Deep Neural Networks (DNNs) have advanced the state-of-the-art in a variety of machine learning tasks and are deployed in increasing numbers of products and services. However, the computational requirements of training and evaluating large-scale DNNs are growing at a much faster pace than the capabilities of the underlying hardware platforms that they are executed upon. In this work, we propose Dynamic Variable Effort Deep Neural Networks (DyVEDeep) to reduce the computational requirements of DNNs during inference. Previous efforts propose specialized hardware implementations for DNNs, statically prune the network, or compress the weights. Complementary to these approaches, DyVEDeep is a dynamic approach that exploits the heterogeneity in the inputs to DNNs to improve their compute efficiency with comparable classification accuracy. DyVEDeep equips DNNs with dynamic effort mechanisms that, in the course of processing an input, identify how critical a group of computations is to classify the input. DyVEDeep dynamically focuses its compute effort only on the critical computations, while skipping or approximating the rest. We propose 3 effort knobs that operate at different levels of granularity viz. neuron, feature and layer levels. We build DyVEDeep versions for 5 popular image recognition benchmarks - one for CIFAR-10 and four for ImageNet (AlexNet, OverFeat and VGG-16, weight-compressed AlexNet). Across all benchmarks, DyVEDeep achieves 2.1x-2.6x reduction in the number of scalar operations, which translates to 1.8x-2.3x performance improvement over a Caffe-based implementation, with < 0.5% loss in accuracy.
Published: 2017

12. MixTrain: accelerating DNN training via input mixing.

Author: Krithivasan, Sarada, Sen, Sanchari, Venkataramani, Swagath, and Raghunathan, Anand
Published: 2024
Full Text: View/download PDF

13. Multiplier-less Artificial Neurons Exploiting Error Resiliency for Energy-Efficient Neural Computing

Author: Sarwar, Syed Shakib, Venkataramani, Swagath, Raghunathan, Anand, and Roy, Kaushik
Subjects: Computer Science - Neural and Evolutionary Computing
Abstract: Large-scale artificial neural networks have shown significant promise in addressing a wide range of classification and recognition applications. However, their large computational requirements stretch the capabilities of computing platforms. The fundamental components of these neural networks are the neurons and their synapses. The core of a digital hardware neuron consists of a multiplier, an accumulator, and an activation function. Multipliers consume most of the processing energy in the digital neurons, and thereby in the hardware implementations of artificial neural networks. We propose an approximate multiplier that utilizes the notion of computation sharing and exploits error resilience of neural network applications to achieve improved energy consumption. We also propose Multiplier-less Artificial Neuron (MAN) for even larger improvement in energy consumption and adapt the training process to ensure minimal degradation in accuracy. We evaluated the proposed design on 5 recognition applications. The results show 35% and 60% reduction in energy consumption, for neuron sizes of 8 bits and 12 bits, respectively, with a maximum of ~2.83% loss in network accuracy, compared to a conventional neuron implementation. We also achieve 37% and 62% reduction in area for a neuron size of 8 bits and 12 bits, respectively, under iso-speed conditions.
Comment: Accepted in Design, Automation and Test in Europe 2016 conference (DATE-2016)
Published: 2016

14. Energy-Efficient Object Detection using Semantic Decomposition

Author: Panda, Priyadarshini, Venkataramani, Swagath, Sengupta, Abhronil, Raghunathan, Anand, and Roy, Kaushik
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Machine-learning algorithms offer immense possibilities in the development of several cognitive applications. In fact, large scale machine-learning classifiers now represent the state-of-the-art in a wide range of object detection/classification problems. However, the network complexities of large-scale classifiers make them one of the most challenging and energy intensive workloads across the computing spectrum. In this paper, we present a new approach to optimize energy efficiency of object detection tasks using semantic decomposition to build a hierarchical classification framework. We observe that certain semantic information, such as color and texture, is common across various images in real-world datasets for object detection applications. We exploit these common semantic features to distinguish the objects of interest from the remaining inputs (non-objects of interest) in a dataset at a lower computational effort. We propose a 2-stage hierarchical classification framework, with increasing levels of complexity, wherein the first stage is trained to recognize the broad representative semantic features relevant to the object of interest. The first stage rejects the input instances that do not have the representative features and passes only the relevant instances to the second stage. Our methodology thus allows us to reject certain information at lower complexity and utilize the full computational effort of a network only on a smaller fraction of inputs to perform detection. We use color and texture as distinctive traits to carry out several experiments for object detection. Our experiments on the Caltech101/CIFAR10 datasets show that the proposed method yields 1.93x/1.46x improvement in average energy, respectively, over the traditional single classifier model.
Comment: 10 pages, 13 figures, 3 algorithms, Submitted to IEEE TVLSI (Under Review)
Published: 2015
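
Illustration for entry 14: the energy argument behind a two-stage cascade can be worked through with simple arithmetic. Every input pays for the cheap first stage, and only the inputs it does not reject also pay for the second stage. The function names and all cost figures below are hypothetical and are not taken from the paper.

```python
def cascade_cost(cost_stage1, cost_stage2, pass_fraction):
    """Average per-input cost of a two-stage cascade.

    Every input is processed by stage 1; only the fraction that
    stage 1 passes on is also processed by stage 2.
    """
    if not 0.0 <= pass_fraction <= 1.0:
        raise ValueError("pass_fraction must be in [0, 1]")
    return cost_stage1 + pass_fraction * cost_stage2


def improvement(cost_single, cost_stage1, cost_stage2, pass_fraction):
    """Ratio of single-classifier cost to average cascade cost."""
    return cost_single / cascade_cost(cost_stage1, cost_stage2, pass_fraction)


if __name__ == "__main__":
    # Hypothetical units: the single classifier and the second stage cost 100,
    # the first stage costs 10, and 40% of inputs reach the second stage.
    print(improvement(100.0, 10.0, 100.0, 0.4))
```

Under these assumed numbers the cascade averages 50 units per input. The benefit shrinks as the first stage becomes more expensive or rejects fewer inputs, and it disappears when nearly everything is passed through.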

15. Exploring Spin-Transfer-Torque Devices for Logic Applications

Author: Pajouhi, Zoha, Venkataramani, Swagath, Yogendra, Karthik, Raghunathan, Anand, and Roy, Kaushik
Subjects: Condensed Matter - Mesoscale and Nanoscale Physics
Abstract: As CMOS nears the end of the projected scaling roadmap, significant effort has been devoted to the search for new materials and devices that can realize memory and logic. Spintronics is one of the promising directions for the Post-CMOS era. While the potential of spintronic memories is relatively well known, realizing logic remains an open and critical challenge. All Spin Logic (ASL) is a recently proposed logic style that realizes Boolean logic using spin-transfer-torque (STT) devices based on the principle of non-local spin torque. ASL has advantages such as density, non-volatility, and low operating voltage. However, it also suffers from drawbacks such as low speed and static power dissipation. Recent work has shown that, in the context of simple arithmetic circuits (adders, multipliers), the efficiency of ASL can be greatly improved using techniques that utilize its unique characteristics. An evaluation of ASL across a broad range of circuits, considering the known optimization techniques, is an important next step in determining its viability. In this work, we propose a systematic methodology for the synthesis of ASL circuits. Our methodology performs various optimizations that benefit ASL, such as intra-cycle power gating, stacking of ASL nanomagnets, and fine-grained logic pipelining. We utilize the proposed methodology to evaluate the suitability of ASL implementations for a wide range of benchmarks viz. random combinational and sequential logic, digital signal processing circuits, and the Leon SPARC3 general-purpose processor. Based on our evaluation, we identify (i) the large current requirement of nanomagnets at fast switching speeds, (ii) the static power dissipation in the all-metallic devices, and (iii) the short spin flip length in interconnects as key bottlenecks that limit the competitiveness of ASL.
Comment: This work has been accepted by the IEEE Transactions on Computer Aided Design
Published: 2014
Full Text: View/download PDF

16. Automatic Synthesis Techniques for Approximate Circuits

Author: Ranjan, Ashish, Venkataramani, Swagath, Jain, Shubham, Kim, Younghoon, Ramasubramanian, Shankar Ganesh, Raha, Arnab, Roy, Kaushik, Raghunathan, Anand, Reda, Sherief, editor, and Shafique, Muhammad, editor
Published: 2019
Full Text: View/download PDF

17. Approximate Computing Techniques for Deep Neural Networks

Author: Choi, Jungwook, Venkataramani, Swagath, Reda, Sherief, editor, and Shafique, Muhammad, editor
Published: 2019
Full Text: View/download PDF

18. 14.1 A Software-Assisted Peak Current Regulation Scheme to Improve Power-Limited Inference Performance in a 5nm AI SoC

Author: Kar, Monodeep, primary, Silberman, Joel, additional, Venkataramani, Swagath, additional, Srinivasan, Viji, additional, Fleischer, Bruce, additional, Rubin, Joshua, additional, Lancaster, JohnDavid, additional, Lee, Saekyu, additional, Cohen, Matthew, additional, Ziegler, Matthew, additional, Cao, Nianzheng, additional, Woodward, Sandra, additional, Agrawal, Ankur, additional, Zhou, Ching, additional, Chatarasi, Prasanth, additional, Gooding, Thomas, additional, Guillorn, Michael, additional, Hekmatshoartabari, Bahman, additional, Jacob, Philip, additional, Jain, Radhika, additional, Jain, Shubham, additional, Jung, Jinwook, additional, Kim, Kyu-Hyoun, additional, Koswatta, Siyu, additional, Lutz, Martin, additional, Mannari, Alberto, additional, Mathew, Abey, additional, Nair, Indira, additional, Ranjan, Ashish, additional, Ren, Zhibin, additional, Rider, Scot, additional, Roewer, Thomas, additional, Satterfield, David, additional, Schaal, Marcel, additional, Sen, Sanchari, additional, Tellez, Gustavo, additional, Tran, Hung, additional, Wang, Wei, additional, Zalani, Vidhi, additional, Zhang, Jintao, additional, Zhang, Xin, additional, Shah, Vinay, additional, Senger, Robert, additional, Kumar, Arvind, additional, Lu, Pong-Fei, additional, and Chang, Leland, additional
Published: 2024
Full Text: View/download PDF

19. Automatic Synthesis Techniques for Approximate Circuits

Author: Ranjan, Ashish, primary, Venkataramani, Swagath, additional, Jain, Shubham, additional, Kim, Younghoon, additional, Ramasubramanian, Shankar Ganesh, additional, Raha, Arnab, additional, Roy, Kaushik, additional, and Raghunathan, Anand, additional
Published: 2018
Full Text: View/download PDF

20. Approximate Computing Techniques for Deep Neural Networks

Author: Choi, Jungwook, primary and Venkataramani, Swagath, additional
Published: 2018
Full Text: View/download PDF

21. Exploiting On-Device Image Classification for Energy Efficiency in Ambient-Aware Systems

Author: Shoaib, Mohammed, Venkataramani, Swagath, Hua, Xian-Sheng, Liu, Jie, Li, Jin, Hua, Gang, editor, and Hua, Xian-Sheng, editor
Published: 2015
Full Text: View/download PDF

22. Approximate Computing and the Efficient Machine Learning Expedition

Author: Henkel, Jörg, primary, Li, Hai, additional, Raghunathan, Anand, additional, Tahoori, Mehdi B., additional, Venkataramani, Swagath, additional, Yang, Xiaoxuan, additional, and Zervakis, Georgios, additional
Published: 2022
Full Text: View/download PDF

23. OnSRAM: Efficient Inter-Node On-Chip Scratchpad Management in Deep Learning Accelerators

Author: Pal, Subhankar, primary, Venkataramani, Swagath, additional, Srinivasan, Viji, additional, and Gopalakrishnan, Kailash, additional
Published: 2022
Full Text: View/download PDF

24. Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization

Author: Fasoli, Andrea, primary, Chen, Chia-Yu, additional, Serrano, Mauricio, additional, Venkataramani, Swagath, additional, Saon, George, additional, Cui, Xiaodong, additional, Kingsbury, Brian, additional, and Gopalakrishnan, Kailash, additional
Published: 2022
Full Text: View/download PDF

25. Accelerating DNN Training Through Selective Localized Learning

Author: Krithivasan, Sarada, Sen, Sanchari, Venkataramani, Swagath, and Raghunathan, Anand
Subjects: localized learning, stochastic gradient descent algorithm, General Neuroscience, graphics processing unit (GPU), Neurosciences. Biological psychiatry. Neuropsychiatry, runtime efficiency, Deep Neural Networks (DNNs), Neuroscience, Original Research, RC321-571
Abstract: Training Deep Neural Networks (DNNs) places immense compute requirements on the underlying hardware platforms, expending large amounts of time and energy. We propose LoCal+SGD, a new algorithmic approach to accelerate DNN training by selectively combining localized or Hebbian learning within a Stochastic Gradient Descent (SGD) based training framework. Back-propagation is a computationally expensive process that requires 2 Generalized Matrix Multiply (GEMM) operations to compute the error and weight gradients for each layer. We alleviate this by selectively updating some layers' weights using localized learning rules that require only 1 GEMM operation per layer. Further, since localized weight updates are performed during the forward pass itself, the layer activations for such layers do not need to be stored until the backward pass, resulting in a reduced memory footprint. Localized updates can substantially boost training speed, but need to be used judiciously in order to preserve accuracy and convergence. We address this challenge through a Learning Mode Selection Algorithm, which gradually selects and moves layers to localized learning as training progresses. Specifically, for each epoch, the algorithm identifies a Localized→SGD transition layer that delineates the network into two regions. Layers before the transition layer use localized updates, while the transition layer and later layers use gradient-based updates. We propose both static and dynamic approaches to the design of the learning mode selection algorithm. The static algorithm utilizes a pre-defined scheduler function to identify the position of the transition layer, while the dynamic algorithm analyzes the dynamics of the weight updates made to the transition layer to determine how the boundary between SGD and localized updates is shifted in future epochs. 
We also propose a low-cost weak supervision mechanism that controls the learning rate of localized updates based on the overall training loss. We applied LoCal+SGD to 8 image recognition CNNs (including ResNet50 and MobileNetV2) across 3 datasets (Cifar10, Cifar100, and ImageNet). Our measurements on an Nvidia GTX 1080Ti GPU demonstrate up to 1.5× improvement in end-to-end training time with ~0.5% loss in Top-1 classification accuracy.
Published: 2022
Full Text: View/download PDF
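
Illustration for entry 25: taking the per-layer counts in the abstract at face value (2 GEMMs per layer for gradient-based updates, 1 GEMM per layer for localized updates), the effect of moving the transition layer can be tallied as below. The function name `update_gemms` and the layer counts are assumptions for this sketch. It counts only the update-related GEMMs mentioned in the abstract and does not model the selection algorithm itself.

```python
def update_gemms(num_layers, transition_layer):
    """Count update-related GEMMs for one training step.

    Layers with index < transition_layer use localized updates (1 GEMM each).
    The transition layer and all later layers use SGD updates (2 GEMMs each).
    Illustrative tally only, not the authors' implementation.
    """
    if not 0 <= transition_layer <= num_layers:
        raise ValueError("transition_layer must be in [0, num_layers]")
    localized_layers = transition_layer
    sgd_layers = num_layers - transition_layer
    return localized_layers * 1 + sgd_layers * 2


if __name__ == "__main__":
    layers = 10  # hypothetical network depth
    for t in (0, 4, 8):
        print(t, update_gemms(layers, t))
```

With a transition index of 0 every layer uses SGD, which gives the baseline count. Raising the index moves more of the early layers to the cheaper localized rule, which is the direction the abstract describes as training progresses.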

26. A 7-nm Four-Core Mixed-Precision AI Chip With 26.2-TFLOPS Hybrid-FP8 Training, 104.9-TOPS INT4 Inference, and Workload-Aware Throttling

Author: Lee, Sae Kyu, primary, Agrawal, Ankur, additional, Silberman, Joel, additional, Ziegler, Matthew, additional, Kang, Mingu, additional, Venkataramani, Swagath, additional, Cao, Nianzheng, additional, Fleischer, Bruce, additional, Guillorn, Michael, additional, Cohen, Matthew, additional, Mueller, Silvia M., additional, Oh, Jinwook, additional, Lutz, Martin, additional, Jung, Jinwook, additional, Koswatta, Siyu, additional, Zhou, Ching, additional, Zalani, Vidhi, additional, Kar, Monodeep, additional, Bonanno, James, additional, Casatuta, Robert, additional, Chen, Chia-Yu, additional, Choi, Jungwook, additional, Haynie, Howard, additional, Herbert, Alyssa, additional, Jain, Radhika, additional, Kim, Kyu-Hyoun, additional, Li, Yulong, additional, Ren, Zhibin, additional, Rider, Scot, additional, Schaal, Marcel, additional, Schelm, Kerstin, additional, Scheuermann, Michael R., additional, Sun, Xiao, additional, Tran, Hung, additional, Wang, Naigang, additional, Wang, Wei, additional, Zhang, Xin, additional, Shah, Vinay, additional, Curran, Brian, additional, Srinivasan, Vijayalakshmi, additional, Lu, Pong-Fei, additional, Shukla, Sunil, additional, Gopalakrishnan, Kailash, additional, and Chang, Leland, additional
Published: 2022
Full Text: View/download PDF

27. 4-Bit Quantization of LSTM-Based Speech Recognition Models

Author: Fasoli, Andrea, primary, Chen, Chia-Yu, additional, Serrano, Mauricio, additional, Sun, Xiao, additional, Wang, Naigang, additional, Venkataramani, Swagath, additional, Saon, George, additional, Cui, Xiaodong, additional, Kingsbury, Brian, additional, Zhang, Wei, additional, Tüske, Zoltán, additional, and Gopalakrishnan, Kailash, additional
Published: 2021
Full Text: View/download PDF

28. Efficacy of Pruning in Ultra-Low Precision DNNs

Author: Sen, Sanchari, primary, Venkataramani, Swagath, additional, and Raghunathan, Anand, additional
Published: 2021
Full Text: View/download PDF

29. RaPiD: AI Accelerator for Ultra-low Precision Training and Inference

Author: Venkataramani, Swagath, primary, Srinivasan, Vijayalakshmi, additional, Wang, Wei, additional, Sen, Sanchari, additional, Zhang, Jintao, additional, Agrawal, Ankur, additional, Kar, Monodeep, additional, Jain, Shubham, additional, Mannari, Alberto, additional, Tran, Hoang, additional, Li, Yulong, additional, Ogawa, Eri, additional, Ishizaki, Kazuaki, additional, Inoue, Hiroshi, additional, Schaal, Marcel, additional, Serrano, Mauricio, additional, Choi, Jungwook, additional, Sun, Xiao, additional, Wang, Naigang, additional, Chen, Chia-Yu, additional, Allain, Allison, additional, Bonano, James, additional, Cao, Nianzheng, additional, Casatuta, Robert, additional, Cohen, Matthew, additional, Fleischer, Bruce, additional, Guillorn, Michael, additional, Haynie, Howard, additional, Jung, Jinwook, additional, Kang, Mingu, additional, Kim, Kyu-hyoun, additional, Koswatta, Siyu, additional, Lee, Saekyu, additional, Lutz, Martin, additional, Mueller, Silvia, additional, Oh, Jinwook, additional, Ranjan, Ashish, additional, Ren, Zhibin, additional, Rider, Scot, additional, Schelm, Kerstin, additional, Scheuermann, Michael, additional, Silberman, Joel, additional, Yang, Jie, additional, Zalani, Vidhi, additional, Zhang, Xin, additional, Zhou, Ching, additional, Ziegler, Matt, additional, Shah, Vinay, additional, Ohara, Moriyoshi, additional, Lu, Pong-Fei, additional, Curran, Brian, additional, Shukla, Sunil, additional, Chang, Leland, additional, and Gopalakrishnan, Kailash, additional
Published: 2021
Full Text: View/download PDF

30. Efficient Management of Scratch-Pad Memories in Deep Learning Accelerators

Author: Pal, Subhankar, primary, Venkataramani, Swagath, additional, Srinivasan, Viji, additional, and Gopalakrishnan, Kailash, additional
Published: 2021
Full Text: View/download PDF

31. 9.1 A 7nm 4-Core AI Chip with 25.6TFLOPS Hybrid FP8 Training, 102.4TOPS INT4 Inference and Workload-Aware Throttling

Author: Agrawal, Ankur, primary, Lee, Sae Kyu, additional, Silberman, Joel, additional, Ziegler, Matthew, additional, Kang, Mingu, additional, Venkataramani, Swagath, additional, Cao, Nianzheng, additional, Fleischer, Bruce, additional, Guillorn, Michael, additional, Cohen, Matthew, additional, Mueller, Silvia, additional, Oh, Jinwook, additional, Lutz, Martin, additional, Jung, Jinwook, additional, Koswatta, Siyu, additional, Zhou, Ching, additional, Zalani, Vidhi, additional, Bonanno, James, additional, Casatuta, Robert, additional, Chen, Chia-Yu, additional, Choi, Jungwook, additional, Haynie, Howard, additional, Herbert, Alyssa, additional, Jain, Radhika, additional, Kar, Monodeep, additional, Kim, Kyu-Hyoun, additional, Li, Yulong, additional, Ren, Zhibin, additional, Rider, Scot, additional, Schaal, Marcel, additional, Schelm, Kerstin, additional, Scheuermann, Michael, additional, Sun, Xiao, additional, Tran, Hung, additional, Wang, Naigang, additional, Wang, Wei, additional, Zhang, Xin, additional, Shah, Vinay, additional, Curran, Brian, additional, Srinivasan, Vijayalakshmi, additional, Lu, Pong-Fei, additional, Shukla, Sunil, additional, Chang, Leland, additional, and Gopalakrishnan, Kailash, additional
Published: 2021
Full Text: View/download PDF

32. Value Similarity Extensions for Approximate Computing in General-Purpose Processors

Author: Kim, Younghoon, primary, Venkataramani, Swagath, additional, Sen, Sanchari, additional, and Raghunathan, Anand, additional
Published: 2021
Full Text: View/download PDF

33. Efficient AI System Design With Cross-Layer Approximate Computing

Author: Venkataramani, Swagath, primary, Sun, Xiao, additional, Wang, Naigang, additional, Chen, Chia-Yu, additional, Choi, Jungwook, additional, Kang, Mingu, additional, Agarwal, Ankur, additional, Oh, Jinwook, additional, Jain, Shubham, additional, Babinsky, Tina, additional, Cao, Nianzheng, additional, Fox, Thomas, additional, Fleischer, Bruce, additional, Gristede, George, additional, Guillorn, Michael, additional, Haynie, Howard, additional, Inoue, Hiroshi, additional, Ishizaki, Kazuaki, additional, Klaiber, Michael, additional, Lo, Shih-Hsien, additional, Maier, Gary, additional, Mueller, Silvia, additional, Scheuermann, Michael, additional, Ogawa, Eri, additional, Schaal, Marcel, additional, Serrano, Mauricio, additional, Silberman, Joel, additional, Vezyrtzis, Christos, additional, Wang, Wei, additional, Yee, Fanchieh, additional, Zhang, Jintao, additional, Ziegler, Matthew, additional, Zhou, Ching, additional, Ohara, Moriyoshi, additional, Lu, Pong-Fei, additional, Curran, Brian, additional, Shukla, Sunil, additional, Srinivasan, Vijayalakshmi, additional, Chang, Leland, additional, and Gopalakrishnan, Kailash, additional
Published: 2020
Full Text: View/download PDF

34. Logic Synthesis of Approximate Circuits

Author: Venkataramani, Swagath, primary, Kozhikkottu, Vivek J., additional, Sabne, Amit, additional, Roy, Kaushik, additional, and Raghunathan, Anand, additional
Published: 2020
Full Text: View/download PDF

35. A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and Inference

Author: Oh, Jinwook, primary, Lee, Sae Kyu, additional, Kang, Mingu, additional, Ziegler, Matthew, additional, Silberman, Joel, additional, Agrawal, Ankur, additional, Venkataramani, Swagath, additional, Fleischer, Bruce, additional, Guillorn, Michael, additional, Choi, Jungwook, additional, Wang, Wei, additional, Mueller, Silvia, additional, Ben-Yehuda, Shimon, additional, Bonanno, James, additional, Cao, Nianzheng, additional, Casatuta, Robert, additional, Chen, Chia-Yu, additional, Cohen, Matt, additional, Erez, Ophir, additional, Fox, Thomas, additional, Gristede, George, additional, Haynie, Howard, additional, Ivanov, Vicktoria, additional, Koswatta, Siyu, additional, Lo, Shih-Hsien, additional, Lutz, Martin, additional, Maier, Gary, additional, Mesh, Alex, additional, Nustov, Yevgeny, additional, Rider, Scot, additional, Schaal, Marcel, additional, Scheuermann, Michael, additional, Sun, Xiao, additional, Wang, Naigang, additional, Yee, Fanchieh, additional, Zhou, Ching, additional, Shah, Vinay, additional, Curran, Brian, additional, Srinivasan, Vijayalakshmi, additional, Lu, Pong-Fei, additional, Shukla, Sunil, additional, Gopalakrishnan, Kailash, additional, and Chang, Leland, additional
Published: 2020
Full Text: View/download PDF

36. DyVEDeep

Author: Ganapathy, Sanjay, primary, Venkataramani, Swagath, additional, Sriraman, Giridhur, additional, Ravindran, Balaraman, additional, and Raghunathan, Anand, additional
Published: 2020
Full Text: View/download PDF

37. Memory and Interconnect Optimizations for Peta-Scale Deep Learning Systems

Author: Venkataramani, Swagath, primary, Srinivasan, Vijayalakshmi, additional, Choi, Jungwook, additional, Heidelberger, Philip, additional, Chang, Leland, additional, and Gopalakrishnan, Kailash, additional
Published: 2019
Full Text: View/download PDF

38. Performance-driven Programming of Multi-TFLOP Deep Learning Accelerators*

Author: Venkataramani, Swagath, primary, Choi, Jungwook, additional, Srinivasan, Vijayalakshmi, additional, Gopalakrishnan, Kailash, additional, and Chang, Leland, additional
Published: 2019
Full Text: View/download PDF

39. DeepTools: Compiler and Execution Runtime Extensions for RaPiD AI Accelerator

Author: Venkataramani, Swagath, primary, Choi, Jungwook, additional, Srinivasan, Vijayalakshmi, additional, Wang, Wei, additional, Zhang, Jintao, additional, Schaal, Marcel, additional, Serrano, Mauricio J., additional, Ishizaki, Kazuaki, additional, Inoue, Hiroshi, additional, Ogawa, Eri, additional, Ohara, Moriyoshi, additional, Chang, Leland, additional, and Gopalakrishnan, Kailash, additional
Published: 2019
Full Text: View/download PDF

40. Dynamic Spike Bundling for Energy-Efficient Spiking Neural Networks

Author: Krithivasan, Sarada, primary, Sen, Sanchari, additional, Venkataramani, Swagath, additional, and Raghunathan, Anand, additional
Published: 2019
Full Text: View/download PDF

41. BiScaled-DNN

Author: Jain, Shubham, primary, Venkataramani, Swagath, additional, Srinivasan, Vijayalakshmi, additional, Choi, Jungwook, additional, Gopalakrishnan, Kailash, additional, and Chang, Leland, additional
Published: 2019
Full Text: View/download PDF

42. SparCE: Sparsity Aware General-Purpose Core Extensions to Accelerate Deep Neural Networks

Author: Sen, Sanchari, primary, Jain, Shubham, additional, Venkataramani, Swagath, additional, and Raghunathan, Anand, additional
Published: 2019
Full Text: View/download PDF

43. Workload-aware Automatic Parallelization for Multi-GPU DNN Training

Author: Shin, Sungho, primary, Jo, Youngmin, additional, Choi, Jungwook, additional, Venkataramani, Swagath, additional, Srinivasan, Vijayalakshmi, additional, and Sung, Wonyong, additional
Published: 2019
Full Text: View/download PDF

44. A Compiler for Deep Neural Network Accelerators to Generate Optimized Code for a Wide Range of Data Parameters from a Hand-crafted Computation Kernel

Author: Ogawa, Eri, primary, Ishizaki, Kazuaki, additional, Inoue, Hiroshi, additional, Venkataramani, Swagath, additional, Choi, Jungwook, additional, Wang, Wei, additional, Srinivasan, Vijayalakshmi, additional, Ohara, Moriyoshi, additional, and Gopalakrishnan, Kailash, additional
Published: 2019
Full Text: View/download PDF

45. Data Subsetting: A Data-Centric Approach to Approximate Computing

Author: Kim, Younghoon, primary, Venkataramani, Swagath, additional, Chandrachoodan, Nitin, additional, and Raghunathan, Anand, additional
Published: 2019
Full Text: View/download PDF

46. A Scalable Multi-TeraOPS Core for AI Training and Inference

Author: Shukla, Sunil, primary, Fleischer, Bruce, additional, Ziegler, Matthew, additional, Silberman, Joel, additional, Oh, Jinwook, additional, Srinivasan, Vijayalakshmi, additional, Choi, Jungwook, additional, Mueller, Silvia, additional, Agrawal, Ankur, additional, Babinsky, Tina, additional, Cao, Nianzheng, additional, Chen, Chia-Yu, additional, Chuang, Pierce, additional, Fox, Thomas, additional, Gristede, George, additional, Guillorn, Michael, additional, Haynie, Howard, additional, Klaiber, Michael, additional, Lee, Dongsoo, additional, Lo, Shih-Hsien, additional, Maier, Gary, additional, Scheuermann, Michael, additional, Venkataramani, Swagath, additional, Vezyrtzis, Christos, additional, Wang, Naigang, additional, Yee, Fanchieh, additional, Zhou, Ching, additional, Lu, Pong-Fei, additional, Curran, Brian, additional, Chang, Leland, additional, and Gopalakrishnan, Kailash, additional
Published: 2018
Full Text: View/download PDF

47. Approximate computing: An integrated cross-layer framework

Author: Venkataramani, Swagath
Subjects: Energy efficiency, Applied sciences, Computer Engineering, Electrical and Computer Engineering, Approximate computing, Hardware design
Abstract: A new design approach, called approximate computing (AxC), leverages the flexibility provided by intrinsic application resilience to realize hardware or software implementations that are more efficient in energy or performance. Approximate computing techniques forsake exact (numerical or Boolean) equivalence in the execution of some of the application’s computations, while ensuring that the output quality is acceptable. While early efforts in approximate computing have demonstrated great potential, they consist of ad hoc techniques applied to a very narrow set of applications, leaving in question the applicability of approximate computing in a broader context. The primary objective of this thesis is to develop an integrated cross-layer approach to approximate computing, and to thereby establish its applicability to a broader range of applications. The proposed framework comprises three key components: (i) At the circuit level, systematic approaches to design approximate circuits, or circuits that realize a slightly modified function with improved efficiency, (ii) At the architecture level, utilize approximate circuits to build programmable approximate processors, and (iii) At the software level, methods to apply approximate computing to machine learning classifiers, which represent an important class of applications that are being utilized across the computing spectrum. Towards this end, the thesis extends the state-of-the-art in approximate computing in the following important directions. Synthesis of Approximate Circuits: First, the thesis proposes a rigorous framework for the automatic synthesis of approximate circuits, which are the hardware building blocks of approximate computing platforms. Designing approximate circuits involves making judicious changes to the function implemented by the circuit such that its hardware complexity is lowered without violating the specified quality constraint. 
Inspired by classical approaches to Boolean optimization in logic synthesis, the thesis proposes two synthesis tools called SALSA and SASIMI that are general, i.e., applicable to any given circuit and quality specification. The framework is further extended to automatically design quality configurable circuits, which are approximate circuits with the capability to reconfigure their quality at runtime. Over a wide range of arithmetic circuits, complex modules and complete datapaths, the circuits synthesized using the proposed framework demonstrate significant benefits in area and energy. Programmable AxC Processors: Next, the thesis extends approximate computing to the realm of programmable processors by introducing the concept of quality programmable processors (QPPs). A key principle of QPPs is that the notion of quality is explicitly codified in their HW/SW interface, i.e., the instruction set. Instructions in the ISA are extended with quality fields, enabling software to specify the accuracy level that must be met during their execution. The micro-architecture is designed with hardware mechanisms to understand these quality specifications and translate them into energy savings. As a first embodiment of QPPs, the thesis presents a quality programmable 1D/2D vector processor QP-Vec, which contains a 3-tiered hierarchy of processing elements. Based on an implementation of QP-Vec with 289 processing elements, energy benefits up to 2.5X are demonstrated across a wide range of applications. Software and Algorithms for AxC: Finally, the thesis addresses the problem of applying approximate computing to an important class of applications viz. machine learning classifiers such as deep learning networks. To this end, the thesis proposes two approaches—AxNN and scalable effort classifiers. Both approaches leverage domain-specific insights to transform a given application to an energy-efficient approximate version that meets a specified application output quality. 
In the context of deep learning networks, AxNN adapts backpropagation to identify neurons that contribute less significantly to the network’s accuracy, approximating these neurons (e.g., by using lower precision), and incrementally re-training the network to mitigate the impact of approximations on output quality. On the other hand, scalable effort classifiers leverage the heterogeneity in the inherent classification difficulty of inputs to dynamically modulate the effort expended by machine learning classifiers. This is achieved by building a chain of classifiers of progressively growing complexity (and accuracy) such that the number of stages used for classification scales with input difficulty. Scalable effort classifiers yield substantial energy benefits as a majority of the inputs require very low effort in real-world datasets. In summary, the concepts and techniques presented in this thesis broaden the applicability of approximate computing, thus taking a significant step towards bringing approximate computing to the mainstream. (Abstract shortened by ProQuest.)
Published: 2016

48. DyVEDeep: Dynamic Variable Effort Deep Neural Networks.

Author: GANAPATHY, SANJAY, VENKATARAMANI, SWAGATH, SRIRAMAN, GIRIDHUR, RAVINDRAN, BALARAMAN, and RAGHUNATHAN, ANAND
Subjects: RASPBERRY Pi, PRUNING, IMAGE recognition (Computer vision), MACHINE learning
Abstract: Deep Neural Networks (DNNs) have advanced the state-of-the-art in a variety of machine learning tasks and are deployed in increasing numbers of products and services. However, the computational requirements of training and evaluating large-scale DNNs are growing at a much faster pace than the capabilities of the underlying hardware platforms that they are executed upon. To address this challenge, one promising approach is to exploit the error resilient nature of DNNs by skipping or approximating computations that have negligible impact on classification accuracy. Almost all prior efforts in this direction propose static DNN approximations by either pruning network connections, implementing computations at lower precision, or compressing weights. In this work, we propose Dynamic Variable Effort Deep Neural Networks (DyVEDeep) to reduce the computational requirements of DNNs during inference. Complementary to the aforementioned static approaches, DyVEDeep is a dynamic approach that exploits heterogeneity in the DNN inputs to improve their compute efficiency with comparable classification accuracy and without requiring any re-training. DyVEDeep equips DNNs with dynamic effort mechanisms that identify computations critical to classifying a given input and focus computational effort only on the critical computations, while skipping or approximating the rest. We propose three dynamic effort mechanisms that operate at different levels of granularity viz. neuron, feature, and layer levels. We build DyVEDeep versions of six popular image recognition benchmarks (CIFAR-10, AlexNet, OverFeat, VGG-16, SqueezeNet, and Deep-Compressed-AlexNet) within the Caffe deep-learning framework. We evaluate DyVEDeep on two platforms--a high-performance server with a 2.7 GHz Intel Xeon E5-2680 processor and 128 GB memory, and a low-power Raspberry Pi board with an ARM Cortex A53 processor and 1 GB memory. 
Across all benchmarks, DyVEDeep achieves 2.47×-5.15× reduction in the number of scalar operations, which translates to 1.94×-2.23× and 1.46×-3.46× performance improvement over well-optimized baselines on the Xeon server and the Raspberry Pi, respectively, with comparable classification accuracy. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

49. Across the Stack Opportunities for Deep Learning Acceleration

Author: Srinivasan, Vijayalakshmi, primary, Fleischer, Bruce, additional, Shukla, Sunil, additional, Ziegler, Matthew, additional, Silberman, Joel, additional, Oh, Jinwook, additional, Choi, Jungwook, additional, Mueller, Silvia, additional, Agrawal, Ankur, additional, Babinsky, Tina, additional, Cao, Nianzheng, additional, Chen, Chia-Yu, additional, Chuang, Pierce, additional, Fox, Thomas, additional, Gristede, George, additional, Guillorn, Michael, additional, Haynie, Howard, additional, Klaiber, Michael, additional, Lee, Dongsoo, additional, Lo, Shih-Hsien, additional, Maier, Gary, additional, Scheuermann, Michael, additional, Venkataramani, Swagath, additional, Vezyrtzis, Christos, additional, Wang, Naigang, additional, Yee, Fanchieh, additional, Zhou, Ching, additional, Lu, Pong-Fei, additional, Curran, Brian, additional, Chang, Leland, additional, and Gopalakrishnan, Kailash, additional
Published: 2018
Full Text: View/download PDF

50. Taming the beast

Author: Venkataramani, Swagath, primary, Srinivasan, Vijayalakshmi, additional, Choi, Jungwook, additional, Gopalakrishnan, Kailash, additional, and Chang, Leland, additional
Published: 2018
Full Text: View/download PDF
