15 results for "Akshay Krishna Ramanathan"
Search Results
2. CiM3D: Comparator-in-Memory Designs Using Monolithic 3-D Technology for Accelerating Data-Intensive Applications
- Author
-
Wen-Kuan Yeh, Je-Min Hung, Cheng-Xin Xue, Hariram Thirucherai Govindarajan, Akshay Krishna Ramanathan, Srivatsa Srinivasa Rangachar, Sheng-Po Huang, Chun-Ying Lee, Meng-Fan Chang, Chang-Hong Shen, Fu-Kuo Hsueh, Jia-Min Shieh, Vijaykrishnan Narayanan, John Sampson, and Mon-Shu Ho
- Subjects
Computer engineering, Computer hardware, Comparator, Computer science, 3-D-SRAM, Sorting, monolithic (sequential) 3-D integrated circuit (M3D-IC), Parallel computing, Electronic, Optical and Magnetic Materials, Hardware and Architecture, sparse matrix multiplication, Multiplication, computing-in-memory, Static random-access memory, Electrical and Electronic Engineering, Macro, Massively parallel, Energy (signal processing), Sparse matrix
- Abstract
The compare operation is widely used in many applications, from fundamental sorting to primitive operations in database and AI systems. We present SRAM-based 3-D-CAM circuit designs using a monolithic 3-D (M3D) integration process for realizing beyond-Boolean in-memory compare operations without any area overhead. We also fabricated a processing-in-memory (PiM) macro with the same 3-D-CAM circuit using M3D for performing massively parallel compare operations used in database, machine learning, and scientific applications. We show various system designs with the 3-D-CAM supporting operations such as data filtering, sorting, and sparse matrix-matrix multiplication (SpGEMM). Our systems exhibit up to 272x, 200x, and 226x speedups and 151x, 37x, and 156x energy savings compared to systems using near-memory compute for the data filtering, sorting, and SpGEMM applications, respectively.
- Published
- 2021
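To make the massively parallel compare primitive concrete, here is a minimal Python sketch of the data-filtering use case described in the abstract above. The `cam_compare` and `filter_greater` helpers are hypothetical illustrations of the behavior a row-parallel 3-D-CAM compare provides, not the fabricated macro's circuit or interface.

```python
# Behavioral model of a parallel compare primitive of the kind a 3-D-CAM
# macro evaluates across all stored rows in one memory access.
# Illustrative sketch only; names and semantics are assumptions.

def cam_compare(rows, key):
    """Return one match bit per stored row, as if every row were
    compared against the search key simultaneously."""
    return [row == key for row in rows]

def filter_greater(rows, threshold):
    """Database-style selection: keep values above a threshold. In
    hardware every comparison happens in parallel; here we loop."""
    return [row for row in rows if row > threshold]

if __name__ == "__main__":
    data = [7, 42, 3, 42, 19]
    print(cam_compare(data, 42))     # [False, True, False, True, False]
    print(filter_greater(data, 10))  # [42, 42, 19]
```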
3. Achieving Crash Consistency by Employing Persistent L1 Cache
- Author
-
Akshay Krishna Ramanathan, Sara Mahdizadeh Shahri, Yi Xiao, and Vijaykrishnan Narayanan
- Published
- 2022
4. FARM: A Flexible Accelerator for Recurrent and Memory Augmented Neural Networks
- Author
-
Nagadastagiri Challapalle, Akshay Krishna Ramanathan, Sahithi Rampalli, Vijaykrishnan Narayanan, Nicholas Jao, and John Sampson
- Subjects
Speedup, Artificial neural network, Computer science, Sorting, Parallel computing, Theoretical Computer Science, Reduction (complexity), Turing machine, Recurrent neural network, Hardware and Architecture, Control and Systems Engineering, Modeling and Simulation, Signal Processing, Hardware acceleration, Auxiliary memory, Information Systems
- Abstract
Recently, Memory Augmented Neural Networks (MANNs), a class of Deep Neural Networks (DNNs), have become prominent owing to their ability to effectively capture long-term dependencies in several Natural Language Processing (NLP) tasks. These networks augment conventional DNNs by incorporating memory and attention mechanisms external to the network to capture relevant information. Several MANN architectures have shown particular benefits in NLP tasks by augmenting an underlying Recurrent Neural Network (RNN) with external memory accessed through attention mechanisms. Unlike conventional DNNs, whose computational time is dominated by MAC operations, MANNs have more diverse behavior. In addition to MACs, the attention mechanisms of MANNs also involve operations such as similarity measurement, sorting, weighted memory access, and pair-wise arithmetic. Due to this greater diversity of operations, MANNs are not trivially accelerated by the techniques used in existing DNN accelerators. In this work, we present an end-to-end hardware accelerator architecture, FARM, for the inference of RNNs and several variants of MANNs, such as the Differentiable Neural Computer (DNC), the Neural Turing Machine (NTM), and a meta-learning model. FARM achieves average speedups of 30x-190x and 80x-100x over CPU and GPU implementations, respectively. To address remaining memory bottlenecks in FARM, we then propose the FARM-PIM architecture, which augments FARM with in-memory compute support for MAC and content-similarity operations in order to reduce data traversal costs. FARM-PIM offers an additional speedup of 1.5x compared to FARM. Additionally, we consider an efficiency-oriented version of the PIM implementation, FARM-PIM-LP, which trades a 20% performance reduction relative to FARM for a 4x average reduction in power consumption.
- Published
- 2020
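For context on the attention operations the FARM abstract lists (similarity measurement, weighted memory access), the following sketch shows NTM/DNC-style content-based addressing with NumPy. It is a behavioral illustration of that operation class, not FARM's datapath; the function name and parameters are our own.

```python
import numpy as np

def content_addressing(memory, key, beta):
    """Content-based attention as used in NTM/DNC-style MANNs: cosine
    similarity of a key against every memory row, sharpened by the
    strength parameter beta and normalized with a softmax."""
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key)
    sims = memory @ key / (norms + 1e-8)   # similarity measure
    weights = np.exp(beta * sims)
    return weights / weights.sum()         # attention weights

if __name__ == "__main__":
    M = np.random.randn(128, 20)           # 128 memory slots of width 20
    k = np.random.randn(20)
    w = content_addressing(M, k, beta=5.0)
    read_vector = w @ M                    # weighted memory access
    print(w.shape, read_vector.shape)
```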
5. Recent Advances in Compute-in-Memory Support for SRAM Using Monolithic 3-D Integration
- Author
-
Zhixiao Zhang, Meng-Fan Chang, Srivatsa Srinivasa, Akshay Krishna Ramanathan, and Xin Si
- Subjects
Random access memory, Moore's law, Computer science, Memory management, Computer architecture, Hardware and Architecture, Logic gate, MOSFET, Memory architecture, Static random-access memory, Electrical and Electronic Engineering, Software, Von Neumann architecture
- Abstract
Computing-in-memory (CiM) is a popular design alternative to overcome the von Neumann bottleneck and improve the performance of artificial intelligence computing applications. Monolithic three-dimensional (M3D) technology is a promising solution to extend Moore's law through the development of CiM for data-intensive applications. In this article, we first discuss the motivation and challenges associated with two-dimensional CiM designs, and then examine the possibilities presented by emerging M3D technologies. Finally, we review recent advances and trends in the implementation of CiM using M3D technology.
- Published
- 2019
6. ROBIN: Monolithic-3D SRAM for Enhanced Robustness with In-Memory Computation Support
- Author
-
Xueqing Li, Akshay Krishna Ramanathan, Wei-Hao Chen, Jack Sampson, Vijaykrishnan Narayanan, Swaroop Ghosh, Sumeet Kumar Gupta, Meng-Fan Chang, and Srivatsa Srinivasa
- Subjects
Computer science, Transistor, NAND gate, Addressability, Data access, XNOR gate, Hardware and Architecture, Robustness (computer science), Static random-access memory, Electrical and Electronic Engineering, Computer hardware, Efficient energy use
- Abstract
We present novel 3D-SRAM cell designs using a monolithic 3D integration technology that realize both cell robustness and in-memory Boolean logic computing capability. The proposed two-layer cell designs make use of additional transistors over the SRAM layer to enable assist techniques as well as provide logic functions (such as AND/NAND, OR/NOR, and XNOR/XOR) or enable content addressability, all without degrading cell density. Through analysis, we provide insights into the benefits of three memory assist and two logic modes, and we evaluate the energy efficiency of our proposed design. We show that the assist techniques improve SRAM read stability by 2.2x and increase the write margin by 17.6% while staying within the SRAM footprint. By virtue of the increased robustness, the cell enables seamless operation at lower supply voltages and thereby ensures energy efficiency. The energy-delay product is reduced by 1.6x over standard 6T SRAM, with faster data access. When computing bulk in-memory operations, a 6.5x energy saving is achieved compared to computing outside the memory system.
- Published
- 2019
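A behavioral sketch of the bulk Boolean modes named in the ROBIN abstract (AND/NAND, OR/NOR, XNOR/XOR): activating two stored words together and sensing a combined result yields a bitwise function of both in a single access. The word width and dispatch below are illustrative assumptions, not the cell's interface.

```python
# Software model of bulk in-memory Boolean operations over two stored
# words. The hardware computes these during readout; here we emulate.

def bulk_boolean(word_a, word_b, op, width=32):
    mask = (1 << width) - 1  # keep results within the word width
    ops = {
        "AND":  word_a & word_b,
        "NAND": ~(word_a & word_b) & mask,
        "OR":   word_a | word_b,
        "NOR":  ~(word_a | word_b) & mask,
        "XOR":  word_a ^ word_b,
        "XNOR": ~(word_a ^ word_b) & mask,
    }
    return ops[op]

if __name__ == "__main__":
    a, b = 0b1100, 0b1010
    for op in ("AND", "NAND", "OR", "NOR", "XOR", "XNOR"):
        print(op, format(bulk_boolean(a, b, op, width=4), "04b"))
```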
7. Trends and Opportunities for SRAM Based In-Memory and Near-Memory Computation
- Author
-
Tanay Karnik, Vijaykrishnan Narayanan, Dileep J. Kurian, Ravi Iyer, Akshay Krishna Ramanathan, Srinivasan Gopal, Nilesh Jain, Anuradha Srinivasan, Srivatsa Srinivasa, and Jainaveen Sundaram
- Subjects
Process (engineering), Computer science, Computation, Workload, Power budget, Computer engineering, Quality (business), Static random-access memory, Sparse matrix, Efficient energy use
- Abstract
Changes in application trends, along with the increasing number of connected devices, have led to an explosion in the amount of data generated every single day. Computing systems need to efficiently process these huge amounts of data to generate results, classify objects, stream high-quality videos, and so on. In-memory computing and near-memory computing have emerged as popular design choices to address the challenges of executing such tasks. Through in-memory computing, SRAM banks can be repurposed as compute engines that perform bulk Boolean operations. Near-memory techniques have shown promise in improving the performance of machine learning tasks. Through a careful understanding of the design space, we describe opportunities for amalgamating both techniques to obtain further performance gains and a lower power budget when executing fundamental machine learning primitives. In this work, we take sparse matrix multiplication as an example and design an I-NMC accelerator that speeds up index handling by 10x-60x and improves energy efficiency by 10x-70x, depending on the workload dimensions, compared with a non-I-NMC solution, while occupying just 1% of the overall hardware area.
- Published
- 2021
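The "index handling" the abstract above refers to is the gather and match work on sparse indices. A minimal CSR (compressed sparse row) matrix-vector multiply shows where that cost arises; this is the generic textbook kernel, not the I-NMC accelerator's design.

```python
# Minimal CSR sparse matrix-vector multiply. The indexed gather through
# col_idx is the "index handling" that dominates sparse kernels and is
# the step a near-memory engine can accelerate.

def csr_spmv(row_ptr, col_idx, values, x):
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(row_ptr) - 1):
        for j in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[j] * x[col_idx[j]]  # indexed gather from x
    return y

if __name__ == "__main__":
    # The 3x3 matrix [[1,0,2],[0,3,0],[4,0,5]] in CSR form
    row_ptr = [0, 2, 3, 5]
    col_idx = [0, 2, 1, 0, 2]
    values = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(csr_spmv(row_ptr, col_idx, values, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```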
8. Monolithic 3D+-IC Based Massively Parallel Compute-in-Memory Macro for Accelerating Database and Machine Learning Primitives
- Author
-
Srivatsa Srinivasa Rangachar, Meng-Fan Chang, Sheng-Po Huang, Chang-Hong Shen, Jia-Min Shieh, Vijaykrishnan Narayanan, John Sampson, Mon-Shu Ho, Wen-Kuan Yeh, Cheng-Xin Xue, Hariram Thirucherai Govindarajan, Chun-Ying Lee, Akshay Krishna Ramanathan, Je-Min Hung, and Fu-Kuo Hsueh
- Subjects
Speedup, Database, Computer science, Sorting, Three-dimensional integrated circuit, Machine learning, Application-specific integrated circuit, Multiplication, Artificial intelligence, Macro, Massively parallel, Sparse matrix
- Abstract
This paper demonstrates the first Monolithic 3D+-IC based Compute-in-Memory (CiM) Macro performing massively parallel beyond-Boolean operations targeting database and machine learning (ML) applications. The proposed CiM technique supports data filtering, sorting, and sparse matrix-matrix multiplication (SpGEMM) operations. Our system exhibits up to 272x speedup and 151x energy savings compared to the ASIC baseline.
- Published
- 2020
9. Look-Up Table based Energy Efficient Processing in Cache Support for Neural Network Acceleration
- Author
-
Gurpreet S. Kalsi, Akshay Krishna Ramanathan, Vijaykrishnan Narayanan, Kamlesh R. Pillai, Tarun Makesh Chandran, Srivatsa Srinivasa, Sreenivas Subramoney, and Om J. Omer
- Subjects
Speedup, Artificial neural network, Computer science, Parallel computing, Data flow diagram, Lookup table, Overhead (computing), Static random-access memory, Cache, Efficient energy use
- Abstract
This paper presents a Look-Up Table (LUT) based Processing-In-Memory (PIM) technique with the potential for running Neural Network inference tasks. We implement a bitline-computing-free technique that avoids frequent bitline accesses to the cache sub-arrays, thereby considerably reducing the memory access energy overhead. The LUT, in conjunction with the compute engines, enables sub-array-level parallelism by executing complex operations through data lookup that would otherwise require multiple cycles. Sub-array-level parallelism and a systolic input data flow ensure that data movement is confined to the SRAM slice. Our proposed LUT-based PIM methodology exploits substantial parallelism using look-up tables without altering the memory structure or organization; that is, it preserves the bit-cells and peripherals of the existing monolithic SRAM arrays. Our solution achieves 1.72x higher performance and 3.14x lower energy compared to a state-of-the-art processing-in-cache solution. Sub-array-level design modifications to incorporate the LUT along with the compute engines increase the overall cache area by 5.6%. We achieve a 3.97x speedup over a neural network systolic accelerator of similar area. The reconfigurable nature of the compute engines enables various neural network operations, thereby supporting sequential networks (RNNs) and transformer models. Our quantitative analysis demonstrates 101x and 3x faster execution, and 91x and 11x better energy efficiency, than CPU and GPU respectively while running the transformer model BERT-Base.
- Published
- 2020
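To illustrate the core idea of the abstract above, replacing a multi-cycle arithmetic operation with a single table lookup, here is a minimal sketch assuming 4-bit operands; the macro's actual table organization, precision, and compute-engine mapping are not modeled.

```python
# Sketch of LUT-based arithmetic: precompute all products of two 4-bit
# operands once, then every "multiply" is a single lookup rather than
# a multi-cycle operation. Operand width is an assumption.

LUT = [[a * b for b in range(16)] for a in range(16)]  # 16x16 products

def lut_multiply(a, b):
    return LUT[a][b]  # one table lookup instead of a multiply

def dot_product(xs, ws):
    """MAC loop with multiplies served from the table; the additions
    would map onto the accompanying compute engines."""
    return sum(lut_multiply(x, w) for x, w in zip(xs, ws))

if __name__ == "__main__":
    print(lut_multiply(7, 9))                 # 63
    print(dot_product([1, 2, 3], [4, 5, 6]))  # 32
```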
10. IMC-Sort: In-Memory Parallel Sorting Architecture using Hybrid Memory Cube
- Author
-
Nagadastagiri Challapalle, Vijaykrishnan Narayanan, Zheyu Li, and Akshay Krishna Ramanathan
- Subjects
Sorting algorithm, Speedup, Bitonic sorter, Hybrid Memory Cube, Computer science, Parallel computing, Sorting network, Sort, State (computer science), Crossbar switch
- Abstract
Processing-in-memory (PIM) architectures have gained significant importance as an alternative to the von Neumann paradigm for alleviating the memory wall and technology scaling problems. PIM architectures have achieved significant latency and energy consumption improvements for various emerging and widely used workloads such as deep neural networks, graph analytics, databases, and computational genomics. In this work, we propose a PIM-based accelerator architecture, IMC-Sort, for the sort algorithm. Sort is one of the fundamental and most widely used algorithms in applications such as databases, networking, and data analytics. The IMC-Sort architecture augments the Hybrid Memory Cube (HMC) memory system by incorporating a custom sorting network in the logic layer of each HMC vault. IMC-Sort uses an optimized folded bitonic sort-and-merge network to sort input sequences of arbitrary length at each vault, and an optimized address mapping mechanism to distribute the input data across HMC vaults. Merging of the sorted results across individual vaults is also performed using each vault's sorting network, communicating with other vaults through the HMC's crossbar network. Overall, IMC-Sort achieves 16.8x and 1.1x speedups, and 375.5x and 13.6x savings in energy consumption, compared to a widely used CPU implementation and a state-of-the-art near-memory custom sort accelerator, respectively.
- Published
- 2020
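The sorting-network family that IMC-Sort maps onto each vault's logic layer is the bitonic sort. Below is a compact recursive reference implementation for power-of-two input lengths; the paper's folded network, address mapping, and cross-vault merging are not modeled here.

```python
# Reference bitonic sort: build a bitonic sequence (ascending half
# followed by descending half), then merge with compare-exchange stages.
# Assumes the input length is a power of two.

def bitonic_sort(seq, ascending=True):
    if len(seq) <= 1:
        return list(seq)
    half = len(seq) // 2
    first = bitonic_sort(seq[:half], True)     # ascending half
    second = bitonic_sort(seq[half:], False)   # descending half
    return bitonic_merge(first + second, ascending)

def bitonic_merge(seq, ascending):
    if len(seq) <= 1:
        return list(seq)
    half = len(seq) // 2
    seq = list(seq)
    for i in range(half):                      # one compare-exchange stage
        if (seq[i] > seq[i + half]) == ascending:
            seq[i], seq[i + half] = seq[i + half], seq[i]
    return (bitonic_merge(seq[:half], ascending)
            + bitonic_merge(seq[half:], ascending))

if __name__ == "__main__":
    print(bitonic_sort([19, 3, 42, 7, 23, 1, 8, 5]))  # [1, 3, 5, 7, 8, 19, 23, 42]
```

Bitonic networks suit hardware because their compare-exchange pattern is fixed and independent of the data, which is one reason they are a common choice for in-memory and near-memory sorters.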
11. Technology-Assisted Computing-In-Memory Design for Matrix Multiplication Workloads
- Author
-
Srivatsa Srinivasa, Nicholas Jao, Akshay Krishna Ramanathan, Minhwan Kim, John Sampson, and Vijaykrishnan Narayanan
- Subjects
Speedup, Artificial neural network, Computer science, Parallel computing, Matrix multiplication, Memory architecture, Multiplication, Static random-access memory, Sparse matrix, Von Neumann architecture
- Abstract
Recent advances in emerging technologies such as monolithic 3D integration (M3D-IC) and emerging non-volatile memory (eNVM) have made it possible to embed logic operations in memory. This alleviates the "memory wall" challenges stemming from the time and power expended on migrating data in conventional von Neumann computing paradigms. We propose an M3D SRAM dot-product engine providing in-SRAM compute support for applications such as matrix multiplication and artificial neural networks. In addition, we propose a novel compute-in-RRAM memory architecture to efficiently address the computational intensity of sparse dot products, specifically the index assessment in sparse matrix-vector multiplication as used in support vector machines (SVMs). At maximum throughput, our proposed RRAM architecture achieves an 11.3x speedup when compared against a near-memory accelerator.
- Published
- 2019
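The "index assessment" step of a sparse dot product can be seen in a two-pointer software sketch: the comparisons over the index arrays dominate the work, and that matching is what the proposed RRAM architecture parallelizes. This is a behavioral illustration, not the memory design itself.

```python
# Two-pointer sparse dot product over sorted (index, value) pairs.
# Only matching indices contribute; the index comparisons are the
# "index assessment" work described in the abstract above.

def sparse_dot(idx_a, val_a, idx_b, val_b):
    i = j = 0
    acc = 0.0
    while i < len(idx_a) and j < len(idx_b):
        if idx_a[i] == idx_b[j]:
            acc += val_a[i] * val_b[j]
            i += 1
            j += 1
        elif idx_a[i] < idx_b[j]:
            i += 1
        else:
            j += 1
    return acc

if __name__ == "__main__":
    print(sparse_dot([0, 3, 7], [1.0, 2.0, 3.0],
                     [3, 7, 9], [4.0, 5.0, 6.0]))  # 2*4 + 3*5 = 23.0
```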
12. Monolithic 3D+ -IC based Reconfigurable Compute-in-Memory SRAM Macro
- Author
-
Wen-Kuan Yeh, Xin Si, John Sampson, Chang-Hong Shen, Fu-Kuo Hsueh, Vijaykrishnan Narayanan, Mon-Shu Ho, Akshay Krishna Ramanathan, Jia-Min Shieh, Cheng-Xin Xue, Chun-Ying Lee, Meng-Fan Chang, Srivatsa Srinivasa, and Yung-Ning Tu
- Subjects
Computer science, Computation, Three-dimensional integrated circuit, Static random-access memory, Parallel computing, Boolean operations in computer-aided design, Macro, Layer (object-oriented design), Latency (engineering)
- Abstract
This paper presents the first monolithic 3D two-layer reconfigurable SRAM macro capable of executing multiple Compute-in-Memory (CiM) tasks as part of data readout. Fabricated using a low-cost FinFET-based 3D+-IC process, the SRAM offers concurrent data read from both layers and write from layer 2 at a minimum supply voltage (Vdd,min) of 0.4V. A 12.8x improvement in computation latency is achieved compared to near-memory computation of successive Boolean operations.
- Published
- 2019
13. Programmable Non-Volatile Memory Design Featuring Reconfigurable In-Memory Operations
- Author
-
Akshay Krishna Ramanathan, Vijaykrishnan Narayanan, Abhronil Sengupta, Nicholas Jao, and Jack Sampson
- Subjects
Computer science, Reconfigurability, Bottleneck, Non-volatile memory, Gate array, Embedded system, Computer data storage, Field-programmable gate array, Throughput (business)
- Abstract
With data volumes growing exponentially, modern computing systems are increasingly bottlenecked and consistently burdened by the costs of data movement. Driven by the development of emerging non-volatile memory (NVM) technologies and by the increasing demand for high throughput in big data applications, considerable research effort has gone into embedding computation in memory and exploiting parallelism in data-intensive workloads to address the "memory wall" bottleneck. In this work, we propose a non-volatile memory design that leverages run-time reconfigurability of peripheral circuits to perform various in-memory computations, much like a field-programmable gate array (FPGA). Our architecture allows this intelligent storage system to operate as both a main memory and an accelerator for memory-intensive applications such as matrix multiplication, database queries, and artificial neural networks.
- Published
- 2019
14. A Monolithic-3D SRAM Design with Enhanced Robustness and In-Memory Computation Support
- Author
-
Chang-Hong Shen, Akshay Krishna Ramanathan, Vijaykrishnan Narayanan, Jack Sampson, Fu-Kuo Hsueh, Jia-Min Shieh, Chih-Chao Yang, Srivatsa Srinivasa, Swaroop Ghosh, Sumeet Kumar Gupta, Wei-Hao Chen, Meng-Fan Marvin Chang, and Xueqing Li
- Subjects
Computer science, Transistor, NAND gate, Data access, XNOR gate, Robustness (computer science), Static random-access memory, Bitwise operation, Computer hardware, Efficient energy use
- Abstract
We present a novel 3D-SRAM cell using a monolithic 3D integration (M3D-IC) technology that realizes both robustness and in-memory Boolean logic compute support. The proposed two-layer design makes use of additional transistors over the SRAM layer to enable assist techniques as well as provide logic functions (such as AND/NAND, OR/NOR, XNOR/XOR) without degrading cell density. Through analysis, we provide insights into the benefits of three memory assist and two logic modes and evaluate the energy efficiency of our proposed design. The assist techniques improve SRAM read stability by 2.2x and increase the write margin by 17.6%, while staying within the SRAM footprint. By virtue of the increased robustness, the cell enables seamless operation at lower supply voltages and thereby ensures energy efficiency. The Energy Delay Product (EDP) is reduced by 1.6x over standard 6T SRAM, with faster data access. The transistor placement and biasing technique in layer 2 enables in-memory bitwise Boolean computation. When computing bulk in-memory operations, a 6.5x energy saving is achieved compared to computing outside the memory system.
- Published
- 2018
15. Harnessing Emerging Technology for Compute-in-Memory Support
- Author
-
Srivatsa Srinivasa, Vijaykrishnan Narayanan, Jack Sampson, Sumitha George, Akshay Krishna Ramanathan, and Nicholas Jao
- Subjects
Focus (computing), Adder, Computer science, Technological change, Emerging technologies, Non-volatile memory, Computer architecture, Embedding, Throughput (business), Design space
- Abstract
Compute-in-Memory (CiM) techniques focus on reducing data movement by integrating compute elements within or near the memory primitives. While there have been decades of research on various aspects of such logic and memory integration, the confluence of new technology changes and emerging workloads makes it worth revisiting this design space. This work focuses on new functionality that can be embedded into SRAMs using emerging monolithic 3D integration; the properties of the new technology transform the costs of embedding such functionality compared to prior efforts. This work also explores how compute functionality can be embedded into cross-point-style non-volatile memory systems.
- Published
- 2018