94 results on '"Csaba Andras Moritz"'
Search Results
2. Technology Development and Modeling of Switching Lattices Using Square and H Shaped Four-Terminal Switches
- Author
-
Ismail Cevik, Csaba Andras Moritz, Levent Aksoy, Nihat Akkan, Herman Sedef, Mustafa Altun, and Serzat Safaltin
- Subjects
H shaped; Computer science; Construct (python library); Technology development; Square (algebra); Computer Science Applications; Human-Computer Interaction; CMOS; Terminal (electronics); Computer Science (miscellaneous); Electronic engineering; Implementation; Dynamic logic (digital electronics); Information Systems
- Abstract
Switching lattices formed by four-terminal switches are introduced as dense rectangular structures to implement Boolean logic functions. It has been shown in the literature that switching lattices offer a significant area advantage in terms of the number of switches over conventional CMOS implementations. Although the computing potential of switching lattices has been well justified, the same cannot be said for their physical implementation. There have been conceptual ideas for the technology development, but no concrete and directly applicable technology has been proposed yet. In this study, we show that switching lattices can be implemented using CMOS technology. For this purpose, we propose two different four-terminal switch structures with square and H shaped gates. We construct these structures in a three-dimensional technology computer-aided design (TCAD) environment satisfying the design rules of the TSMC 65nm CMOS process and perform simulations. We develop Level 3 DC and AC models of the switches in the LTspice environment using the TCAD simulation data. As an experiment, we realize logic functions with the developed models using static and dynamic logic solutions. Experimental results show that switching lattices occupy much less layout area and have competitive delay and power consumption values when compared to conventional CMOS implementations.
- Published
- 2022
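As a rough illustration of how a switching lattice in entry 2 can compute a Boolean function, the sketch below assumes the commonly used model in which each four-terminal switch is controlled by a literal and the lattice evaluates to 1 when ON switches form a connected path (through horizontally or vertically adjacent cells) from the top row to the bottom row. The string encoding of literals ("a", "!a", "0", "1") and the function names are illustrative assumptions, not taken from the paper.

```python
from collections import deque


def literal_value(literal, assignment):
    """Evaluate a literal such as "a", "!a", "0" or "1" (hypothetical encoding)."""
    if literal == "1":
        return True
    if literal == "0":
        return False
    if literal.startswith("!"):
        return not assignment[literal[1:]]
    return assignment[literal]


def lattice_output(lattice, assignment):
    """Return True if ON switches connect the top row to the bottom row."""
    rows, cols = len(lattice), len(lattice[0])
    # A switch is ON when its controlling literal evaluates to True.
    on = [[literal_value(cell, assignment) for cell in row] for row in lattice]

    # Breadth-first search starting from every ON switch in the top row.
    queue = deque((0, c) for c in range(cols) if on[0][c])
    seen = set(queue)
    while queue:
        r, c = queue.popleft()
        if r == rows - 1:
            return True
        # Four-terminal switches connect only to their four direct neighbours.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and on[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False


# A 2x2 lattice whose top-to-bottom paths realize a XOR b under this model.
xor_lattice = [["a", "b"],
               ["!b", "!a"]]
result = lattice_output(xor_lattice, {"a": True, "b": False})  # True
```

Under this model a 2x2 lattice needs four switches for XOR; whether that is an area win over a given CMOS gate depends on the switch layout, which is what the TCAD work in the abstract addresses.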
3. Machine Learning Based Thermal Evaluation for Vertically-Composed Fine-Grained 3D CMOS
- Author
-
Csaba Andras Moritz, Sachin Bhat, Sourabh Kulkarni, and Mingyu Li
- Abstract
Thermal management in 3D integrated circuits is a critical challenge due to their high computational density. Heat dissipation paths from top circuit layers through bottom layers to the substrate heavily constrain heat extraction. Various thermal management frameworks have been proposed to address thermal issues at different granularities. All these frameworks require a thermal evaluation stage that characterizes the thermal profile of large designs with fast runtime. In this work, we present a machine learning based thermal evaluation method that predicts all standard cell temperatures based on features extracted from circuit CAD files. We have built thermal resistance networks for 10 benchmark circuits. We performed simulations to obtain the thermal data, and trained the thermal model with the data. The model is highly accurate and can identify all over-heated cells that need to be thermally optimized. Runtime overhead is minimal. For a 435k-cell SPARC T2 core, the runtime for predicting all cell temperatures is as small as 3.12s, which is negligible compared to the runtime of other physical design stages.
- Published
- 2022
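To make the idea of a thermal resistance network in entry 3 concrete, here is a minimal sketch of the simplest possible case: a one-dimensional stack in which all heat flows downward through the layers to the substrate. This is an assumed toy model for illustration only; the paper's networks and its machine learning predictor are not reproduced here, and the function name, units, and numbers are hypothetical.

```python
def stack_temperatures(powers_w, resistances_k_per_w, ambient_c=25.0):
    """Steady-state layer temperatures for a 1-D stack (index 0 = bottom layer).

    resistances_k_per_w[i] is the thermal resistance between layer i and the
    layer below it (or the substrate at ambient temperature for layer 0).
    Assumes every watt dissipated at or above a layer must cross that layer's
    resistance on its way down to the substrate.
    """
    if len(powers_w) != len(resistances_k_per_w):
        raise ValueError("need one resistance per layer")

    temperatures = []
    heat_crossing = sum(powers_w)  # heat flowing through the current resistance
    temperature = ambient_c
    for power, resistance in zip(powers_w, resistances_k_per_w):
        # Temperature rise across a resistance is R * heat flow, analogous to V = I * R.
        temperature += resistance * heat_crossing
        temperatures.append(temperature)
        heat_crossing -= power  # this layer's own heat does not cross the next resistance
    return temperatures


# Three layers dissipating 1 W each; the bottom interface is the most resistive.
temps = stack_temperatures([1.0, 1.0, 1.0], [2.0, 1.0, 1.0])  # [31.0, 33.0, 34.0]
```

Even in this toy case the top layer runs hottest because its heat must cross every resistance below it, which is the constraint on heat extraction that the abstract describes.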
4. Architecting for Artificial Intelligence with Emerging Nanotechnology
- Author
-
Sachin Bhat, Sourabh Kulkarni, and Csaba Andras Moritz
- Subjects
Artificial neural network; Neuromorphic engineering; Hardware and Architecture; business.industry; Emerging technologies; Computer science; Bayesian network; Graphical model; Artificial intelligence; Electrical and Electronic Engineering; business; Software; Domain (software engineering)
- Abstract
Artificial Intelligence is becoming ubiquitous in products and services that we use daily. Although the domain of AI has seen substantial improvements over recent years, its effectiveness is limited by the capabilities of current computing technology. Recently, there have been several architectural innovations for AI using emerging nanotechnology. These architectures implement mathematical computations of AI with circuits that utilize physical behavior of nanodevices purpose-built for such computations. This approach leads to a much greater efficiency vs. software algorithms running on von Neumann processors or CMOS architectures, which emulate the operations with transistor circuits. In this article, we provide a comprehensive survey of these architectural directions and categorize them based on their contributions. Furthermore, we discuss the potential offered by these directions with real-world examples. We also discuss major challenges and opportunities in this field.
- Published
- 2021
5. Circuit Design Steps for Nano-Crossbar Arrays: Area-Delay-Power Optimization With Fault Tolerance
- Author
-
Valentina Ciriani, Elena Ioana Vatajelu, Dan Alexandrescu, Luca Frontini, Mircea R. Stan, Muhammed Ceylan Morgul, Onur Tunali, Mustafa Altun, Csaba Andras Moritz, and Lorena Anghel
- Affiliations
-
Istanbul Technical University (ITÜ), Università degli Studi di Milano = University of Milan (UNIMI), SPINtronique et TEchnologie des Composants (SPINTEC), Centre National de la Recherche Scientifique (CNRS)-Institut de Recherche Interdisciplinaire de Grenoble (IRIG), Direction de Recherche Fondamentale (CEA) (DRF (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Direction de Recherche Fondamentale (CEA) (DRF (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Grenoble Alpes (UGA), Architectures and Methods for Resilient Systems (TIMA-AMfoRS ), Techniques de l'Informatique et de la Microélectronique pour l'Architecture des systèmes intégrés (TIMA), Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA), University of Massachusetts [Amherst] (UMass Amherst), University of Massachusetts System (UMASS), University of Virginia, iROc Technologies (IROC TECHNOLOGIES), Cadence Connection-EDA Consortium-FSA-Cubic Micro, Università degli Studi di Milano [Milano] (UNIMI), Architectures and Methods for Resilient Systems (AMfoRS ), and University of Virginia [Charlottesville]
- Subjects
Computer science; Circuit design; Fault tolerance; Hardware_PERFORMANCEANDRELIABILITY; 02 engineering and technology; Memristor; 021001 nanoscience & nanotechnology; Fault (power engineering); Computer Science Applications; Power optimization; law.invention; PACS 8542; Logic synthesis; CMOS; law; Electronic engineering; [SPI.NANO]Engineering Sciences [physics]/Micro and nanotechnologies/Microelectronics; Electrical and Electronic Engineering; Crossbar switch; 0210 nano-technology; ComputingMilieux_MISCELLANEOUS; Hardware_LOGICDESIGN
- Abstract
Nano-crossbar arrays have emerged to achieve high performance computing beyond the limits of current CMOS with the drawback of higher fault rates. They offer area and power efficiency in terms of their easy-to-fabricate and dense physical structures. They consist of regularly placed crosspoints as computing elements, which behave as diode, memristor, field effect transistor, or novel four-terminal switching devices. In this study, we establish a complete design framework for crossbar circuits explaining and analyzing every step of the process. We comparatively elaborate on these technologies in the sense of their capabilities for computation regarding area including a new logic synthesis technique for memristors, fault tolerance including a novel paradigm for four-terminal devices, delay, and power consumption. As a result, this study introduces a synthesis methodology that considers basic technology preference for switching crosspoints and fault rates of the given crossbar as well as their effects on performance metrics including power, delay, and area.
- Published
- 2021
6. SkyBridge-3D-CMOS 2.0: IC Technology for Stacked-Transistor 3D ICs beyond FinFETs
- Author
-
Sachin Bhat, Shaun Ghosh, Mingyu Li, Sourabh Kulkarni, and Csaba Andras Moritz
- Subjects
Emulation; Computer science; Transistor; Spice; Semiconductor device modeling; Hardware_PERFORMANCEANDRELIABILITY; Technology assessment; law.invention; CMOS; law; Hardware_INTEGRATEDCIRCUITS; Miniaturization; Electronic engineering; Field-effect transistor
- Abstract
For sub-5nm technology nodes, gate-all-around (GAA) FETs are positioned to replace FinFETs to enable the continued miniaturization of ICs in the future. In this paper, we introduce SkyBridge-3D-CMOS 2.0, a 3D-IC technology featuring integration of stacked vertical GAAFETs and 3D interconnects. It aims to provide an integrated solution to critical technology aspects, especially when scaling to sub-5nm nodes. We address important aspects such as 3D fabric components, CAD tool flow, compact model for the GAAFETs and a scalable manufacturing process. The fabric features junctionless accumulation-mode field effect transistors (JAMFETs) including various configurations with multiple threshold voltages and multiple nanowires per transistor, to meet performance and stand-by power constraints of modern SoCs. Furthermore, we develop BSIM-CMG-based compact models for these device configurations to enable technology assessment using SPICE simulations. To enable scalable manufacturing, we create virtual process decks incorporating etch and deposition models using Process Explorer, an industry standard process emulation tool. Technology assessment using ring oscillators shows that SkyBridge-3D-CMOS 2.0 at the chosen design point, using 16nm gate length and 10-nm nanowires, achieves ~18% performance and 31% energy efficiency improvement versus 7nm FinFET CMOS. Area analysis of standard cells shows up to 6x benefit versus aggressively scaled 2D-5T cells.
- Published
- 2021
7. Architecting 3D integrated circuit fabrics at nanoscale
- Author
-
Csaba Andras Moritz
- Subjects
Materials science; 3d integrated circuit; Nanotechnology; Nanoscopic scale
- Published
- 2019
8. A Wafer-scale Manufacturing Pathway for Fine-grained Vertical 3D-IC Technology
- Author
-
Sachin Bhat, Csaba Andras Moritz, Sounak Shaun Ghosh, Mingyu Li, and Sourabh Kulkarni
- Subjects
010302 applied physics; Interconnection; Semiconductor device modeling; Nanowire; Three-dimensional integrated circuit; Hardware_PERFORMANCEANDRELIABILITY; 02 engineering and technology; Integrated circuit; Work in process; 021001 nanoscience & nanotechnology; 01 natural sciences; law.invention; CMOS; law; 0103 physical sciences; Hardware_INTEGRATEDCIRCUITS; Electronic engineering; 0210 nano-technology; Scaling
- Abstract
Three-dimensional integrated circuits (3D-ICs) provide a feasible path for scaling CMOS technology in the foreseeable future. IMEC and IRDS roadmaps project that 3D integration is a key avenue for the IC industry beyond 2024. They project that some form of 3D-IC technology based on nanosheets/nanowires is likely to become mainstream soon. SkyBridge-3D-CMOS (S3DC) is among the first vertical nanowire-based fine-grained 3D-IC directions and offers a paradigm shift in technology scaling as well as design. Rather than die-die and layer-layer stacking, S3DC’s core aspects, from device to circuit style to interconnect, are co-architected in a 3D fabric-centered manner building on a uniform 3D nanowire template. Nanowire-based 3D-IC technologies such as S3DC solve most of the traditional scaling issues of 2D-CMOS but present new manufacturing challenges because of their complex 3D geometry. Therefore, for these directions to become mainstream, a robust wafer-scale manufacturing pathway that addresses these challenges is vital. In this paper, we propose a wafer-scale manufacturing pathway aimed at developing and optimizing the manufacturing process flows of S3DC. Using physics-driven virtual process integration functionalized with design and process parameters, we obtained realistic 3D structures for all the underlying IC elements and finally combined them to build 3D standard cells in S3DC. Electrical characterization of the resultant structures using process and device simulations was performed while considering the material properties and nanoscale physics effects. Circuit-level simulations accounting for device behavior using a SPICE-compatible compact model and circuit interconnect parasitics were carried out to study the impact of variations in process steps such as patterning, lithography, etch, and deposition on device and interconnect performance. Our bottom-up simulation results indicate that the proposed pathway is robust enough to be adopted for large-scale production, thus paving the way for widespread adoption of vertical fine-grained 3D-IC technologies.
- Published
- 2021
9. Accelerating Simulation-based Inference with Emerging AI Hardware
- Author
-
Mario Michael Krell, Alexander Tsyplikhin, Csaba Andras Moritz, and Sourabh Kulkarni
- Subjects
business.industry; Computer science; Key (cryptography); Probabilistic logic; Statistical inference; Inference; Hardware acceleration; Leverage (statistics); Approximate Bayesian computation; business; Supercomputer; Computer hardware
- Abstract
Developing models of natural phenomena by capturing their underlying complex interactions is a core tenet of various scientific disciplines. These models are useful as simulators and can help in understanding the natural processes being studied. One key challenge in this pursuit has been to enable statistical inference over these models, which would allow these simulation-based models to learn from real-world observations. Recent efforts, such as Approximate Bayesian Computation (ABC), show promise in performing a new kind of inference to leverage these models. While the scope of applicability of these inference algorithms is limited by the capabilities of contemporary computational hardware, they have the potential to be heavily parallelized. In this work, we explore hardware accelerated simulation-based inference over probabilistic models, by combining a massively parallelized ABC inference algorithm with cutting-edge AI chip solutions that are uniquely suited for this purpose. As a proof-of-concept, we demonstrate inference over a probabilistic epidemiology model used to predict the spread of COVID-19. Two hardware acceleration platforms are compared: the Tesla V100 GPU and the Graphcore Mark1 (Mk1) IPU. Our results show that while both of these platforms outperform multi-core CPUs, the Mk1 IPUs are 7.5x faster than the Tesla V100 GPUs for this workload.
- Published
- 2020
10. Nano-Crossbar Based Computing: Lessons Learned And Future Directions
- Author
-
Mustafa Altun, Ahmet Erten, Ismail Cevik, Mircea R. Stan, Osman Eksik, and Csaba Andras Moritz
- Subjects
Computer science; Fault tolerance; Memristor; Hardware_PERFORMANCEANDRELIABILITY; law.invention; Logic synthesis; Computer architecture; CMOS; law; Miniaturization; Hardware_INTEGRATEDCIRCUITS; Point (geometry); Crossbar switch; Diode; Hardware_LOGICDESIGN
- Abstract
In this paper, we first summarize the research activities carried out in our European Union Horizon-2020 project between 2015 and 2019. The goal of the project is to develop synthesis and performance optimization techniques for nano-crossbar arrays. For this purpose, different computing models, including diode, memristor, FET, and four-terminal switch based models, have been investigated within different technologies, including carbon nanotubes, nanowires, and memristors as well as CMOS technology. Their capabilities to realize logic functions and to tolerate faults have been analyzed in depth. From these experiences, we think that instead of replacing CMOS with a completely new crossbar based technology, developing CMOS compatible crossbar technologies and computing models is a more viable solution to overcome challenges in CMOS miniaturization. At this point, four-terminal switch based arrays, called switching lattices, come forward with their CMOS compatibility as well as their area efficient device and circuit realizations. Through experiments in a 65nm CMOS process, we have shown that switching lattices can be efficiently implemented in a standard CMOS process to realize logic functions. Further in this paper, we introduce the realization of memory arrays, including ROMs and RAMs, with switching lattices. We also discuss challenges and promises in realizing switching lattices for sub-30nm CMOS technologies, including FinFET technologies.
- Published
- 2020
11. Hardware-accelerated Simulation-based Inference of Stochastic Epidemiology Models for COVID-19
- Author
-
Sourabh Kulkarni, Mario Michael Krell, Seth Nabarro, and Csaba Andras Moritz
- Subjects
FOS: Computer and information sciences; Artificial Intelligence (cs.AI); Computer Science - Distributed, Parallel, and Cluster Computing; 68T09; Hardware and Architecture; Computer Science - Artificial Intelligence; Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC); Electrical and Electronic Engineering; Computer Science - Hardware Architecture; Software; I.2.1
- Abstract
Epidemiology models are central to understanding and controlling large-scale pandemics. Several epidemiology models require simulation-based inference such as Approximate Bayesian Computation (ABC) to fit their parameters to observations. ABC inference is highly amenable to efficient hardware acceleration. In this work, we develop parallel ABC inference of a stochastic epidemiology model for COVID-19. The statistical inference framework is implemented and compared on Intel’s Xeon CPU, NVIDIA’s Tesla V100 GPU, Google’s V2 Tensor Processing Unit (TPU), and Graphcore’s Mk1 Intelligence Processing Unit (IPU), and the results are discussed in the context of their computational architectures. Results show that TPUs are 3×, GPUs are 4×, and IPUs are 30× faster than Xeon CPUs. Extensive performance analysis indicates that the difference between IPU and GPU can be attributed to higher communication bandwidth, closeness of memory to compute, and higher compute power in the IPU. The proposed framework scales across 16 IPUs, with scaling overhead not exceeding 8% for the experiments performed. We present an example of our framework in practice, performing inference on the epidemiology model across three countries and giving a brief overview of the results.
- Published
- 2020
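The ABC inference mentioned in entries 9 and 11 can be illustrated with a minimal rejection-sampling sketch: draw a parameter from a prior, run the simulator, and keep the parameter only if the simulated data lands close enough to the observations. Everything specific below is an assumption made for illustration: the toy chain-binomial epidemic simulator, the uniform prior, the Euclidean distance, and all names and numbers are hypothetical and are not the model or framework used in the papers.

```python
import random


def simulate_infections(infection_rate, days, population, initial_infected, rng):
    """Toy stochastic epidemic (no recovery): returns new infections per day."""
    susceptible = population - initial_infected
    infected = initial_infected
    new_cases = []
    for _ in range(days):
        # Probability that one susceptible person is infected today.
        p = 1.0 - (1.0 - infection_rate / population) ** infected
        cases = sum(1 for _ in range(susceptible) if rng.random() < p)
        susceptible -= cases
        infected += cases
        new_cases.append(cases)
    return new_cases


def distance(simulated, observed):
    """Euclidean distance between two daily case series."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed)) ** 0.5


def abc_rejection(observed, prior_low, prior_high, tolerance, num_samples, rng, **sim_kwargs):
    """Keep prior draws whose simulated series is within `tolerance` of the data."""
    accepted = []
    for _ in range(num_samples):
        theta = rng.uniform(prior_low, prior_high)  # draw from a uniform prior
        simulated = simulate_infections(theta, rng=rng, **sim_kwargs)
        if distance(simulated, observed) <= tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(0)
settings = dict(days=10, population=200, initial_infected=2)
observed = simulate_infections(0.5, rng=rng, **settings)  # synthetic "observations"
posterior_samples = abc_rejection(observed, 0.0, 1.0, tolerance=30.0,
                                  num_samples=500, rng=rng, **settings)
```

Each iteration of the sampling loop is independent of the others, which is the property that makes this style of inference a natural fit for the parallel accelerators compared in the abstracts.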
12. Skybridge-3D-CMOS: A Fine-Grained 3D CMOS Integrated Circuit Technology
- Author
-
Mostafizur Rahman, Jiajun Shi, Mingyu Li, Santosh Khasanvis, Sachin Bhat, and Csaba Andras Moritz
- Subjects
010302 applied physics; Engineering; business.industry; Nanowire; Hardware_PERFORMANCEANDRELIABILITY; 02 engineering and technology; Integrated circuit; 01 natural sciences; 020202 computer hardware & architecture; Computer Science Applications; law.invention; Integrated injection logic; CMOS; law; 0103 physical sciences; Vertical direction; Hardware_INTEGRATEDCIRCUITS; 0202 electrical engineering, electronic engineering, information engineering; Electronic engineering; Electrical and Electronic Engineering; Layer (object-oriented design); Routing (electronic design automation); business; Electronic circuit
- Abstract
Parallel and monolithic three-dimensional (3-D) integration directions realize 3-D integrated circuits (ICs) by utilizing layer-by-layer implementations, with each functional layer being composed in 2-D. In contrast, vertically composed 3-D CMOS has eluded us likely due to the seemingly insurmountable requirement of highly customized complex routing and regional 3-D doping to form and connect CMOS pull-up and pull-down networks in 3-D. In the current layer-by-layer directions, routing can be worse than 2D CMOS because of the limited pin access. In this paper, we propose Skybridge-3D-CMOS (S3DC), an IC fabric that shows for the first time a pathway to achieve fine-grained static CMOS circuit implementations using the vertical direction while also solving 3-D routability. It employs a new fabric assembly scheme based on predoped vertical nanowire bundles. It implements circuits in and across nanowires. It utilizes unique connectivity features to achieve CMOS connectivity in 3-D with excellent routability. As compared to the usually severely congested monolithic 3-D implementations, S3DC eliminates the routing congestions in all benchmarks studied. Further results, for the implemented benchmarks, show 56–77% reductions in power consumption, 4X–90X increases in density, and 20% loss to 9% benefit in best operating frequencies compared with the transistor-level monolithic 3-D technology.
- Published
- 2017
13. NP-Dynamic Skybridge: A Fine-Grained 3D IC Technology with NP-Dynamic Logic
- Author
-
Csaba Andras Moritz, Mostafizur Rahman, Santosh Khasanvis, Jiajun Shi, and Mingyu Li
- Subjects
010302 applied physics; Computer science; Three-dimensional integrated circuit; 02 engineering and technology; Integrated circuit; 021001 nanoscience & nanotechnology; 01 natural sciences; Computer Science Applications; law.invention; Human-Computer Interaction; CMOS; Computer architecture; law; Logic gate; 0103 physical sciences; Computer Science (miscellaneous); Node (circuits); 0210 nano-technology; Throughput (business); Dynamic logic (digital electronics); Information Systems; Electronic circuit
- Abstract
A new 3D IC fabric named NP-Dynamic Skybridge is proposed that provides fine-grained vertical 3D integration for future technology scaling. Relying on a template of vertical nanowires, it expands our prior work to incorporate and utilize both n- and p-type transistors in a novel NP-Dynamic circuit-style compatible with true 3D integration. This enables a wide range of elementary logics leading to more compact circuits, simple clocking schemes for cascading logic stages and low buffer requirement. We detail new design concepts for larger-scale circuits, and evaluate our approach using a 4-bit nanoprocessor implemented in 16 nm technology node. A new pipelining scheme specifically designed for our 3D NP-Dynamic circuits is employed in the nanoprocessor. We compare our approach with 2D CMOS as well as state-of-the-art transistor-level monolithic 3D IC (T-MI) approach. Benchmarking results for the 4-bit nanoprocessor show benefits of up to 56.7x density, 3.8x power and 1.7x throughput over 2D CMOS. Compared with T-MI, our new 3D fabric showed 31x density, 3x power and 1.4x throughput improvement. Additional evaluation of 4-bit and 8-bit CLA designs shows that significantly improved gains can be achieved for our 3D approach over 2D CMOS with increasing circuit bit-width, indicating potential for future scalability.
- Published
- 2017
14. Reconfigurable Probabilistic AI Architecture for Personalized Cancer Treatment
- Author
-
Sourabh Kulkarni, Csaba Andras Moritz, and Sachin Bhat
- Subjects
Workstation; Computer science; business.industry; Probabilistic logic; Bayesian network; Inference; Machine learning; computer.software_genre; Reconfigurable computing; law.invention; law; Scalability; Artificial intelligence; Personalized medicine; business; computer; Interpretability
- Abstract
The machinery of life operates on the complex interactions between genes and proteins. Attempts to capture these interactions have culminated in the study of Genetic Networks. Genetic defects lead to erroneous interactions, which in turn lead to diseases. For personalized treatment of these diseases, a careful analysis of Genetic Networks and a patient's genetic data is required. In this work, we co-design a novel probabilistic AI model along with a reconfigurable architecture to enable personalized treatment for cancer patients. This approach enables a cost-effective and scalable solution for widespread use of personalized medicine. Our model offers interpretability and realistic confidences in its predictions, which is essential for medical applications. The resulting personalized inference on a dataset of 3k patients agrees with doctors' treatment choices in 80% of the cases. The remaining cases diverge from the universal guideline, enabling individualized treatment options based on genetic data. Our architecture is validated on a hybrid SoC-FPGA platform which performs 25× faster than software, implemented on a 16-core Xeon workstation, while consuming 25× less power.
- Published
- 2019
15. Realization of Four-Terminal Switching Lattices: Technology Development and Circuit Modeling
- Author
-
Mustafa Altun, M. Ceylan Morgul, Csaba Andras Moritz, Serzat Safaltin, Sebahattin Gürmen, Oguz Gencer, and Levent Aksoy
- Subjects
010302 applied physics; business.industry; Computer science; 02 engineering and technology; 01 natural sciences; 020202 computer hardware & architecture; Semiconductor; CMOS; Logic gate; 0103 physical sciences; Hardware_INTEGRATEDCIRCUITS; 0202 electrical engineering, electronic engineering, information engineering; Electronic engineering; Field-effect transistor; business; Realization (systems); Diode; Network analysis
- Abstract
Our European Union's Horizon-2020 project aims to develop a complete synthesis and performance optimization methodology for switching nano-crossbar arrays that leads to the design and construction of an emerging nanocomputer. Within the project, we investigate different computing models based on either two-terminal switches, realized with field effect transistors, resistive and diode devices, or four-terminal switches. Although a four-terminal switch based model offers a significant area advantage, its realization at the technology level needs further justifications and raises a number of questions about its feasibility. In this study, we answer these questions. First, by using three dimensional technology computer-aided design (TCAD) simulations, we show that four-terminal switches can be directly implemented with the CMOS technology. For this purpose, we try different semiconductor gate materials in different formations of geometric shapes. Then, by fitting the TCAD simulation data to the standard CMOS current-voltage equations, we develop a Spice model of a four-terminal switch. Finally, we successfully perform Spice circuit simulations on four-terminal switches with different sizes. As a follow-up work within the project, we will proceed to the fabrication step.
- Published
- 2019
16. Integrated Synthesis Methodology for Crossbar Arrays
- Author
-
E. Ioana Vatajelu, Lorena Anghel, Valentina Ciriani, Luca Frontini, Dan Alexandrescu, Csaba Andras Moritz, Mircea R. Stan, M. Ceylan Morgul, Onur Tunali, and Mustafa Altun
- Subjects
Computer science; Process (computing); Fault tolerance; 02 engineering and technology; Memristor; 021001 nanoscience & nanotechnology; Fault (power engineering); Supercomputer; 020202 computer hardware & architecture; law.invention; Logic synthesis; CMOS; law; 0202 electrical engineering, electronic engineering, information engineering; Electronic engineering; Crossbar switch; 0210 nano-technology
- Abstract
Nano-crossbar arrays have emerged as area and power efficient structures with the aim of achieving high performance computing beyond the limits of current CMOS. Due to the stochastic nature of nano-fabrication, nano arrays show different properties at both the structural and physical device levels compared to conventional technologies. These factors introduce random characteristics that need to be carefully considered by the synthesis process. For instance, a competent synthesis methodology must consider the basic technology preference for switching elements, the defect or fault rates of the given nano switching array, and the variation values, as well as their effects on performance metrics including power, delay, and area. The synthesis methodology presented in this study comprehensively covers all the specified factors and provides optimization algorithms for each step of the process.
- Published
- 2018
17. Architecting for Causal Intelligence at Nanoscale
- Author
-
Mingyu Li, Ayan K. Biswas, Csaba Andras Moritz, Mostafizur Rahman, Supriyo Bandyopadhyay, Mohammad Salehi-Fashami, Santosh Khasanvis, and Jayasimha Atulasimha
- Subjects
symbols.namesake; Theoretical computer science; General Computer Science; Emerging technologies; Computer science; business.industry; Probabilistic logic; Intelligent decision support system; symbols; Cognition; Software engineering; business; Von Neumann architecture
- Abstract
Conventional Von Neumann microprocessors are inefficient for supporting machine intelligence due to layers of abstraction, limiting the feasibility of machine-learning frameworks in critical applications. A new approach for architecting intelligent systems, using physical equivalence and leveraging emerging nanotechnology, can pave the way to machine intelligence everywhere.
- Published
- 2015
18. Low-Power Heterogeneous Graphene Nanoribbon-CMOS Multistate Volatile Memory Circuit
- Author
-
Mostafizur Rahman, Roger K. Lake, Santosh Khasanvis, K. M. Masum Habib, and Csaba Andras Moritz
- Subjects
Graphene; Computer science; law.invention; CMOS; Nanoelectronics; Hardware and Architecture; law; Hardware_INTEGRATEDCIRCUITS; Electronic engineering; Static random-access memory; Electrical and Electronic Engineering; Crossbar switch; Software; Dram; Graphene nanoribbons; Volatile memory
- Abstract
Graphene is an emerging nanomaterial believed to be a potential candidate for post-Si nanoelectronics due to its exotic properties. Recently, a new graphene nanoribbon crossbar (xGNR) device was proposed which exhibits negative differential resistance (NDR). In this article, a multistate memory design enabled by this xGNR device, called graphene nanoribbon tunneling random access memory (GNTRAM), is presented that can store multiple bits in a single cell. Increasing the number of bits per cell is explored as an alternative to physical scaling to overcome CMOS SRAM limitations. A comprehensive design for quaternary GNTRAM is presented as a baseline, implemented with a heterogeneous integration between graphene and CMOS. Sources of leakage and approaches to mitigate them are investigated. This design is extensively benchmarked against 16nm CMOS SRAMs and 3T DRAM. The proposed quaternary cell shows up to 2.27× density benefit versus 16nm CMOS SRAMs and 1.8× versus 3T DRAM. It has comparable read performance and is power efficient up to 1.32× during active period and 818× during standby against high-performance SRAMs. Multistate GNTRAM has the potential to realize high-density low-power nanoscale embedded memories. Further improvements may be possible by using graphene more extensively, as graphene transistors become available in the future.
- Published
- 2015
19. Wave Interference Functions for Neuromorphic Computing
- Author
-
Csaba Andras Moritz, Santosh Khasanvis, Mostafizur Rahman, and Jiajun Shi
- Subjects
Neuromorphic engineering; CMOS; Computer science; Scalability; Artificial neuron; Electronic engineering; Overhead (computing); Electrical and Electronic Engineering; Interference (wave propagation); Multiplexing; Bottleneck; Computer Science Applications
- Abstract
Neuromorphic computing mimicking the functionalities of the mammalian brain holds the promise of cognitive capabilities enabling new intelligent applications. However, research efforts so far have mainly focused on using analog and digital CMOS technologies to emulate neural activities, and are yet to achieve the expected benefits. They suffer from limited scalability, density overhead, interconnection bottleneck and power consumption related constraints. In this paper, we present a transformative approach for neuromorphic computing with Wave Interference Functions (WIF). This is a framework using emerging nonequilibrium wave phenomena such as spin waves. WIF leverages inherent wave attributes for multidimensional, multivalued data representation and communication, resulting in reduced connectivity requirements and efficient neural function implementations. It also yields a compact implementation of an artificial neuron. Moreover, since WIF computation and communication are in the spin domain, extremely low-power operation is possible. Our evaluations indicate up to 57× higher density, 775× lower power and 2× better performance when compared to an equivalent 8-bit 45-nm CMOS neuron. Our scalability study using arithmetic circuits for higher bit-width neuron implementations indicates up to 63× density, 884× power and 3× performance benefits in comparison to a 32-bit CMOS equivalent design at 45 nm.
- Published
- 2015
20. Magneto-Electric Approximate Computational Circuits for Bayesian Inference
- Author
-
Sachin Bhat, Sourabh Kulkarni, Csaba Andras Moritz, and Santosh Khasanvis
- Subjects
Computer science ,Probabilistic logic ,Linear scale ,Bayesian network ,Graphical model ,Bayesian inference ,Algorithm ,Rotation formalisms in three dimensions ,Electronic circuit ,Abstraction layer - Abstract
Probabilistic graphical models like Bayesian Networks (BNs) are powerful cognitive-computing formalisms, with many similarities to human cognition. These models have a multitude of real-world applications. New emerging-technology based circuit paradigms leveraging physical equivalence e.g., operating directly on probabilities vs. introducing layers of abstraction, have shown promise in raising the performance and overall efficiency of BNs, enabling networks with millions of random variables. While previous BNs of up to 100s of nodes have been shown to require single-digit precision without affecting application outcomes, the significantly larger number of variables requires the computational precision to be scaled to correctly support BN operations. We introduce a new computational circuit fabric based on mixed-signal magneto-electric computations operating with physical equivalence and supporting probabilistic computations with a new approximate circuit style. Precision scaling impacts area at a logarithmic vs. linear scale offering a much lower power and performance cost than in prior directions. Results show 30x area reduction for a 0.001 precision vs. prior direction, while maintaining three orders of magnitude benefits vs. 100-core processor implementations.
- Published
- 2017
21. Structure Discovery for Gene Expression Networks with Emerging Stochastic Hardware
- Author
-
Sachin Bhat, Csaba Andras Moritz, and Sourabh Kulkarni
- Subjects
Computer science ,business.industry ,Probabilistic logic ,Bayesian network ,Inference ,Machine learning ,computer.software_genre ,Bayesian inference ,Approximate inference ,Search algorithm ,Hardware acceleration ,Artificial intelligence ,Graphical model ,business ,computer - Abstract
Gene Expression Networks (GENs) attempt to model how genetic information stored in the DNA (Genotype) results in the synthesis of proteins, and consequently, the physical traits of an organism (Phenotype). Deciphering GENs plays an important role in a wide range of applications from genetic studies of the origins of life to personalized healthcare. Probabilistic graphical models such as Bayesian Networks (BNs) are used to perform learning and inference of GENs from genetic data. Current techniques of generating BNs of GENs from data, which are mostly approximate in nature, involve searching and scoring of multiple probabilistic graphical structures. However, while search algorithms can be efficiently implemented in software, the same is not true for scoring. Scoring of probabilistic models with inherent parallelism is inefficient when performed sequentially over conventional architectures comprising deterministic devices. In this paper, we introduce a new nanoscale hardware acceleration framework, enabling fast and efficient Bayesian inference operations, significantly accelerating the scoring aspect of the BN learning of GENs using a combination of emerging stochastic devices and CMOS technology. The stochasticity of the devices is utilized to efficiently perform approximate inference on probabilistic networks, and the circuit framework constituting these devices is designed to exploit the inherent parallelism in these models. We demonstrate approximate inference operation over a small BN. We estimate performance benefits of five orders of magnitude in performing inference operations using this architecture over software-only approaches.
- Published
- 2017
22. Vertically-composed fine-grained 3D CMOS
- Author
-
Jiajun Shi, Mostafizur Rahman, Mingyu Li, Csaba Andras Moritz, Santosh Khasanvis, and Sachin Bhat
- Subjects
Scheme (programming language) ,Computer science ,Nanowire ,Hardware_PERFORMANCEANDRELIABILITY ,Integrated circuit ,law.invention ,CMOS ,law ,Power consumption ,Vertical direction ,Hardware_INTEGRATEDCIRCUITS ,Electronic engineering ,computer ,Hardware_LOGICDESIGN ,computer.programming_language ,Electronic circuit - Abstract
Parallel and monolithic 3D integration directions realize 3D integrated circuits (ICs) by utilizing layer-by-layer implementations. In contrast, vertically composed 3D CMOS has eluded us, likely due to the seemingly insurmountable CMOS circuit style connectivity requirement in 3D. In this paper, we describe Skybridge-3D-CMOS (S3DC), an IC fabric that shows for the first time a pathway to achieve fine-grained static CMOS circuit implementations leveraging the vertical direction. It employs a new fabric assembly scheme based on pre-doped vertical nanowire bundles and implements CMOS circuits in and across nanowires. It utilizes innovative connectivity features to realize CMOS connectivity in 3D. Evaluation results, for the implemented benchmarks, show 72%–77% reductions in power consumption, 13X–16X increases in density, and 2% loss to 9% benefit in best operating frequencies compared with the state-of-the-art transistor-level monolithic 3D technology.
- Published
- 2017
23. SkyNet: Memristor-based 3D IC for artificial neural networks
- Author
-
Sourabh Kulkarni, Mingyu Li, Sachin Bhat, Csaba Andras Moritz, and Jiajun Shi
- Subjects
Engineering ,Artificial neural network ,business.industry ,020208 electrical & electronic engineering ,Three-dimensional integrated circuit ,02 engineering and technology ,Memristor ,021001 nanoscience & nanotechnology ,law.invention ,Phase-change memory ,Memistor ,CMOS ,law ,Hardware_INTEGRATEDCIRCUITS ,0202 electrical engineering, electronic engineering, information engineering ,Electronic engineering ,Routing (electronic design automation) ,Crossbar switch ,0210 nano-technology ,business - Abstract
Hardware implementations of artificial neural networks (ANNs) have become feasible due to the advent of persistent 2-terminal devices such as memristors, phase change memory, MTJs, etc. Hybrid memristor crossbar/CMOS systems have been studied extensively and demonstrated experimentally. In these circuits, memristors located at each cross point in a crossbar are, however, stacked on top of CMOS circuits using back end of line (BEOL) processing, limiting scaling. Each neuron's functionality is spread across layers of CMOS and memristor crossbar and thus cannot support the required connectivity to implement large-scale multi-layered ANNs. This paper introduces a new fine-grained 3D integrated circuit technology for ANNs that is one of the first IC technologies for this purpose. Synaptic weights implemented with devices are incorporated in a uniform vertical nanowire template co-locating the memory and computation requirements of ANNs within each neuron. Novel 3D routing features are used for interconnections in all three dimensions between the devices enabling high connectivity without the need for special pins or metal vias. To demonstrate the proof of concept of this fabric, classification of binary images using a perceptron-based feed forward neural network is shown. Bottom-up evaluations for the proposed fabric considering 3D implementation of fabric components reveal up to 21x density, 1.8x power benefits and a 2.6x improvement in delay when compared to 16nm hybrid memristor/CMOS technology.
- Published
- 2017
24. Fine-grained 3D reconfigurable computing fabric with RRAM
- Author
-
Csaba Andras Moritz, Mingyu Li, Sachin Bhat, and Jiajun Shi
- Subjects
business.industry ,Computer science ,Transistor ,Electrical engineering ,Hardware_PERFORMANCEANDRELIABILITY ,Reconfigurable computing ,law.invention ,Resistive random-access memory ,CMOS ,law ,Hardware_INTEGRATEDCIRCUITS ,Electronic engineering ,Granularity ,Routing (electronic design automation) ,business ,Field-programmable gate array ,Hardware_LOGICDESIGN ,Electronic circuit - Abstract
Non-volatile 3D FPGA research to date utilizes layer-by-layer stacking of 2D CMOS / RRAM circuits. On the other hand, vertically-composed 3D FPGA that integrates CMOS and RRAM circuits has eluded us, owing to the difficult requirement of highly customized regional doping and material insertion in 3D to build and route complementary p- and n-type transistors as well as resistive switches. In the layer-by-layer non-volatile 3D FPGA, the connectivity between the monolithically stacked RRAMs and underlying CMOS circuits is likely to be limited and lead to large parasitic RCs. In this paper, we propose a fine-grained 3D reconfigurable computing fabric concept. It implements CMOS / RRAM hybrid circuits within the pre-doped vertical nanowire template. Transistors and resistive switches can be integrated with fine granularity, which reduces the routing overhead between RRAM and CMOS circuits and increases the density. We estimate the density benefit of the proposed fabric to be 27X relative to the monolithic 3D FPGA with stacked RRAMs. Estimated Elmore delays are improved by 5.4X and 2.2X for configuration and normal operation, respectively.
- Published
- 2017
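The abstract above reports estimated Elmore delays. As a reminder of what that metric computes, here is a minimal sketch of the first-order Elmore delay at the far end of a simple RC ladder. The function name and the segment values are illustrative assumptions, not figures or a model taken from the paper.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay (seconds) at the last node of an RC ladder.

    resistances[i]  : series resistance of segment i, in ohms
    capacitances[i] : capacitance to ground at the node after segment i, in farads
    Each capacitor is charged through all resistance between it and the source.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance segment")
    delay = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r            # total resistance from the source to this node
        delay += upstream_r * c    # contribution of this node's capacitance
    return delay


# Hypothetical example: three identical 100-ohm / 1 fF segments.
print(elmore_delay([100.0, 100.0, 100.0], [1e-15, 1e-15, 1e-15]))
```

Shortening a route removes the late segments, which carry the largest upstream resistance, so the delay falls faster than linearly with route length in this model.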
25. Power-delivery network in 3D ICs: Monolithic 3D vs. Skybridge 3D CMOS
- Author
-
Csaba Andras Moritz, Mingyu Li, and Jiajun Shi
- Subjects
Engineering ,CMOS ,business.industry ,Vertical direction ,Electronic engineering ,Three-dimensional integrated circuit ,Routing (electronic design automation) ,business ,Power network design ,Electrical efficiency ,3d design ,Power (physics) - Abstract
Design of the power-delivery network (PDN) is one of the major challenges in 3D IC technology. In the typical layer-by-layer stacked monolithic 3D (M3D) approaches, the PDN has limited accessibility to the device layer away from the power/ground source due to limited routability and routing resources in the vertical direction. This results in an incomplete and low-density PDN design and a severe IR-drop issue. Some improved M3D approaches try to enlarge the design area to create additional vertical routing resources for robust and high-density PDN design. However, this leads to degradation of design density and in turn diminishes 3D design benefits. Skybridge 3D CMOS (S3DC) is a recently proposed fine-grained 3D IC fabric relying on vertical nanowires that presents a paradigm shift for scaling, while addressing critical challenges in 3D IC technology. Skybridge's core fabric components provide a greater degree of routing capability in both horizontal and vertical directions compared to other 3D approaches, which can fully maintain the 3D design density while enabling a robust PDN design. In this paper, we present the PDN design and evaluate the IR drop in S3DC vs. the state-of-the-art transistor-level monolithic 3D IC (TR-L M3D). The typical TR-L M3D approach that can only use low-density PDN shows a severe IR-drop which is out of the standard IR-drop budget. The improved TR-L M3D version that can use high-density PDN meets the requirement of standard IR-drop budget (
- Published
- 2017
26. Heterogeneous graphene–CMOS ternary content addressable memory
- Author
-
Santosh Khasanvis, Mostafizur Rahman, and Csaba Andras Moritz
- Subjects
Electron mobility ,Computer Networks and Communications ,Graphene ,Computer science ,Transistor ,Nanotechnology ,Theoretical Computer Science ,law.invention ,Nanoelectronics ,CMOS ,Artificial Intelligence ,Hardware and Architecture ,law ,Hardware_INTEGRATEDCIRCUITS ,Crossbar switch ,Ternary operation ,Nanoscopic scale ,Software ,Graphene nanoribbons ,Hardware_LOGICDESIGN - Abstract
Leveraging nanotechnology for computing opens up exciting new avenues for breakthroughs. For example, graphene is an emerging nanoscale material and is believed to be a potential candidate for post-Si nanoelectronics due to high carrier mobility and extreme scalability. Recently, a new graphene nanoribbon crossbar (xGNR) device was proposed which exhibits negative differential resistance (NDR). In this paper we propose a novel graphene nanoribbon tunneling ternary content addressable memory (GNTCAM) enabled by the xGNR device, featuring heterogeneous integration with CMOS transistors and routing. Benchmarking with respect to 16nm CMOS TCAM (which uses two binary SRAMs to store ternary information) shows that GNTCAM is up to 1.82× denser, up to 9.42× more power-efficient during stand-by, and has up to 1.6× faster performance during match operation. Thus, GNTCAM has the potential to realize low-power high-density nanoscale TCAMs. Further improvements may be possible by using graphene more extensively, as graphene transistors become available in the future.
- Published
- 2014
27. On the Design of Ultra-High Density 14nm Finfet Based Transistor-Level Monolithic 3D ICs
- Author
-
Csaba Andras Moritz, Motoi Ichihashi, Deepak Nayak, Jiajun Shi, and Srinivasa Banna
- Subjects
Standard cell ,Engineering ,business.industry ,020208 electrical & electronic engineering ,Transistor ,Electrical engineering ,Three-dimensional integrated circuit ,Process design ,02 engineering and technology ,Integrated circuit ,020202 computer hardware & architecture ,law.invention ,CMOS ,law ,Hardware_INTEGRATEDCIRCUITS ,0202 electrical engineering, electronic engineering, information engineering ,Electronic engineering ,Node (circuits) ,Routing (electronic design automation) ,business - Abstract
Conventional 2D CMOS faces severe challenges at sub-22nm nodes. The monolithic 3D (M3D) IC technology enables ultra-high density vertical connections and provides a good path for technology node scaling. Transistor-level (TR-L) monolithic 3D IC is the most advanced and fine-grained M3D IC technology. In this paper, for the first time, the detailed design as well as the benefits and challenges of a silicon validated 14nm Finfet process design kit (PDK) based TR-L M3D IC technology are explored. TR-L M3D standard cell layout is achieved based on 14nm Finfet design rules and feature sizes. A semi-customized RC extraction methodology is used for accurate 3D cell RC extraction. After extensive simulation, TR-L M3D cell power, delay and area are evaluated and compared with equivalent 2D cells in the same technology node. System-level benchmarking with several circuits shows up to 55% reduced footprint, 25% shorter wire length, and 18% lower power with TR-L M3D vs. 2D CMOS.
- Published
- 2016
28. Unconventional Nanocomputing with Physical Wave Interference Functions
- Author
-
Csaba Andras Moritz, Mostafizur Rahman, Santosh Khasanvis, and Prasad Shabadi
- Subjects
Materials science ,Quantum mechanics
- Published
- 2016
29. Skybridge-3D-CMOS: A Vertically-Composed Fine-Grained 3D CMOS Integrated Circuit Technology
- Author
-
Csaba Andras Moritz, Mostafizur Rahman, Jiajun Shi, Sachin Bhat, Mingyu Li, and Santosh Khasanvis
- Subjects
010302 applied physics ,Flexibility (engineering) ,FOS: Computer and information sciences ,Interconnection ,Computer science ,Transistor ,Computer Science - Emerging Technologies ,02 engineering and technology ,Integrated circuit ,021001 nanoscience & nanotechnology ,01 natural sciences ,law.invention ,Emerging Technologies (cs.ET) ,CMOS ,law ,0103 physical sciences ,Electronic engineering ,Hardware_INTEGRATEDCIRCUITS ,Wafer ,Routing (electronic design automation) ,0210 nano-technology ,Electronic circuit - Abstract
Parallel and monolithic 3D integration directions offer pathways to realize 3D integrated circuits (ICs) but still lead to layer-by-layer implementations, each functional layer being composed in 2D first. This mindset causes challenging connectivity, routing and layer alignment between layers when connected in 3D, with a routing access that can be even worse than 2D CMOS, which fundamentally limits their potential. To fully exploit the opportunities in the third dimension, we propose Skybridge-3D-CMOS (S3DC), a fine-grained 3D integration approach that is directly composed in 3D, utilizing the vertical dimension vs. using a layer-by-layer assembly mindset. S3DC uses a novel wafer fabric creation with direct 3D design and connectivity in the vertical dimension. It builds on a uniform vertical nanowire template that is processed as a single wafer; it incorporates specifically architected structures for realizing devices, circuits, and heat management directly in 3D. Novel 3D interconnect concepts, including within the silicon layers, enable significantly improved routing flexibility in all three dimensions and a high-density 3D design paradigm overall. Intrinsic components for fabric-level 3D heat management are introduced. Extensive bottom-up simulations and experiments have been presented to validate the key fabric-enabling concepts. Evaluation results indicate up to 40x density and 10x performance-per-watt benefits against conventional 16-nm CMOS for the circuits studied; benefits are also at least an order of magnitude beyond what was shown to be possible with other 3D directions.
- Published
- 2016
30. Integrated Device–Fabric Explorations and Noise Mitigation in Nanoscale Fabrics
- Author
-
Csaba Andras Moritz, Pavan Panchapakeshan, Pritish Narayanan, Jorge Kina, and Chi On Chui
- Subjects
Nanofabrics ,Computer science ,Transistor ,Physical layer ,Computer Science Applications ,law.invention ,Noise ,Application-specific integrated circuit ,law ,Logic gate ,Noise control ,Electronic engineering ,Electrical and Electronic Engineering ,AND gate - Abstract
An integrated device-fabric methodology for evaluating and validating nanoscale computing fabrics is presented. The methodology integrates physical layer assumptions for materials and device structures with accurate 3-D simulations of device electrostatics and operations and circuit-level noise and cascading validations. Electrical characteristics of six different crossed nanowire field-effect transistors (xnwFETs) are simulated and current and capacitance data are obtained. Behavioral models incorporating device data are generated and used in fabric level simulations to evaluate noise implications of devices and sequencing schemes. Device characteristics are found to have different implications for logic “1” and logic “0” noise with faster devices being more (less) resilient to logic “1” (logic “0”) noise. A new noise resilient dynamic sequencing scheme is presented which isolates logic “0” noise events and prevents them from propagating to cascaded circuit stages, thereby enabling faster devices. Performance implications and optimizations for fabrics incorporating the new noise resilient scheme are discussed. The scheme is also analyzed and validated against an external noise source (power supply drooping). These results show that noise resilient nanofabrics can be designed through a combination of device engineering and fabric-level optimizations of the sequencing scheme. Performance optimizations and implications of device and physical layer assumptions on manufacturing are discussed.
- Published
- 2012
31. FastTrack: Toward Nanoscale Fault Masking With High Performance
- Author
-
Csaba Andras Moritz, Pavan Panchapakeshan, Prachi Joshi, M. M. U. Khan, and Pritish Narayanan
- Subjects
Majority rule ,Engineering ,business.industry ,Fault tolerance ,Computer Science Applications ,Application-specific integrated circuit ,CMOS ,Computer engineering ,Logic gate ,Redundancy (engineering) ,Electronic engineering ,Electrical and Electronic Engineering ,Fault model ,business ,AND gate - Abstract
High defect rates are associated with novel nanodevice-based systems owing to unconventional and self-assembly-based manufacturing processes. Furthermore, in emerging nanosystems, fault mechanisms and distributions may be very different from CMOS due to unique physical layer aspects, and emerging circuits and logic styles. Development of analytical fault models for nanosystems is necessary to explore the design of novel fault tolerance schemes that could be more effective than conventional schemes. In this paper, we first develop a detailed analytical fault model for the nanoscale application specific integrated circuits (NASIC) computing fabric and show that the probability of 0-to-1 faults is much higher than that of 1-to-0 faults. We then show that in fabrics with unequal fault probabilities, using biased voting schemes, as opposed to conventional majority voting, could provide better yield. However, due to the high defect rates, voting will need to be combined with more fine-grained structural redundancy for acceptable yield. This entails degradation in performance (operating frequency) due to an increase in circuit fan-in and fan-out. We, therefore, introduce a new class of redundancy schemes called FastTrack that combine nonuniform structural redundancy with uniquely biased nanoscale voters to achieve greater yield without a commensurate loss in performance. A variety of such techniques are employed on a wire streaming processor (WISP-0) implemented on the NASIC fabric. We show that FastTrack schemes can provide 23% higher effective yield than conventional redundancy schemes even at 10% defect rates along with 79% less performance degradation.
- Published
- 2012
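The abstract's point that biased voting can beat majority voting when 0-to-1 faults dominate can be checked with a small probability calculation. The sketch below is an illustration only: it models a generic threshold voter over independent replicas, and the function name, fault probabilities, and equal prior on 0 and 1 are all assumptions rather than the paper's fault model or voter circuits.

```python
from math import comb


def voter_error(n, threshold, p01, p10, prior_one=0.5):
    """Probability that a threshold voter over n replicas outputs the wrong bit.

    The voter outputs 1 when at least `threshold` replicas read 1.
    p01: probability that a replica holding 0 is read as 1 (0-to-1 fault)
    p10: probability that a replica holding 1 is read as 0 (1-to-0 fault)
    Replicas are assumed to fail independently.
    """
    # True value 0: the output is wrong if `threshold` or more replicas flip to 1.
    err_when_zero = sum(
        comb(n, k) * p01**k * (1 - p01) ** (n - k) for k in range(threshold, n + 1)
    )
    # True value 1: the output is wrong if fewer than `threshold` replicas still read 1.
    err_when_one = sum(
        comb(n, k) * (1 - p10) ** k * p10 ** (n - k) for k in range(threshold)
    )
    return (1 - prior_one) * err_when_zero + prior_one * err_when_one


# Hypothetical fault rates with 0-to-1 faults 100x more likely than 1-to-0 faults.
majority = voter_error(n=3, threshold=2, p01=0.1, p10=0.001)
biased = voter_error(n=3, threshold=3, p01=0.1, p10=0.001)
print(f"majority voter error: {majority:.6f}")
print(f"biased voter error:   {biased:.6f}")
```

With these assumed rates, requiring all three replicas to agree before outputting 1 gives a lower overall error than a 2-of-3 majority. With symmetric fault rates the advantage disappears, which is why the bias only pays off when the fault probabilities are unequal.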
32. Energy-Efficient Hardware Data Prefetching
- Author
-
Saurabh Chheda, Csaba Andras Moritz, M. Bennaser, Yao Guo, and Pritish Narayanan
- Subjects
Instruction prefetch ,Random access memory ,Hardware_MEMORYSTRUCTURES ,Computer science ,business.industry ,Energy consumption ,CAS latency ,Memory management ,Hardware and Architecture ,Embedded system ,Overhead (computing) ,Electrical and Electronic Engineering ,business ,Software ,Energy (signal processing) ,Computer hardware ,Efficient energy use - Abstract
Extensive research has been done on prefetching techniques that hide memory latency in microprocessors, leading to performance improvements. However, the energy aspect of prefetching is relatively unknown. While aggressive prefetching techniques often help to improve performance, they increase energy consumption by as much as 30% in the memory system. This paper provides a detailed evaluation of the energy impact of hardware data prefetching and then presents a set of new energy-aware techniques to overcome the prefetching energy overhead of such schemes. These include compiler-assisted and hardware-based energy-aware techniques and a new power-aware prefetch engine that can reduce hardware prefetching related energy consumption by 7-11×. Combined with the effect of leakage energy reduction due to performance improvement, the total energy consumption for the memory system after the application of these techniques can be up to 12% less than the baseline with no prefetching.
- Published
- 2011
33. Programmable cellular architectures at the nanoscale
- Author
-
Pritish Narayanan, Csaba Andras Moritz, and Teng Wang
- Subjects
Nanofabrics ,SIMPLE (military communications protocol) ,Computer Networks and Communications ,Computer science ,business.industry ,Applied Mathematics ,Nanowire ,Context (language use) ,CMOS ,Cellular neural network ,Hardware_INTEGRATEDCIRCUITS ,Key (cryptography) ,Electronic engineering ,Electrical and Electronic Engineering ,business ,Nanodevice ,Computer hardware - Abstract
This paper presents the first fully programmable digital cellular design for nanodevice-based computational fabrics. The system has a fully regular structure and consists of a large number of simple functional units called cells. It is programmable, based on a small number of global signals routed from supporting CMOS and associated nanoscale circuitry. The architecture may be adapted to suit a multitude of information-processing paradigms. One example is shown on a two-dimensional (2D) semiconductor nanowire fabric including corresponding circuit-level aspects. Key metrics such as the density and performance are evaluated. It is seen that this digital cellular design may be up to 22 times denser than an equivalent projected 16 nm CMOS version for image-processing applications. High performance is achieved, with megapixel-size images estimated to require only a few microseconds for processing. Possible manufacturing routes and defect tolerance aspects in the context of image-processing applications are also discussed.
- Published
- 2010
34. Data Memory Subsystem Resilient to Process Variations
- Author
-
Csaba Andras Moritz, Yao Guo, and M. Bennaser
- Subjects
Computer science ,CPU cache ,Cache coloring ,Cache-only memory architecture ,Pipeline burst cache ,Parallel computing ,Cache pollution ,Hardware and Architecture ,Cache invalidation ,Bus sniffing ,Memory architecture ,Cache ,Electrical and Electronic Engineering ,Cache algorithms ,Software - Abstract
As technology scales, more sophisticated fabrication processes cause variations in many different parameters in the device. These variations could severely affect the performance of processors by making the latency of circuits less predictable and thus requiring conservative design approaches. In this paper, we use Monte Carlo simulations in addition to worst-case circuit analysis to establish the overall delay due to process variations in a data cache sub-system under both typical and worst-case conditions. The distribution of the cache critical-path-delay in the typical scenario was determined by performing Monte Carlo simulations at different supply voltages, threshold voltages, and transistor lengths on a complete cache design. In addition to establishing the delay variation, we present an adaptive variable-cycle-latency cache architecture that mitigates the impact of process variations on access latency by closely following the typical latency behavior rather than assuming a conservative worst-case design-point. Simulation results show that our adaptive data cache can achieve a 9% to 31% performance improvement in a superscalar processor, on the SPEC2000 applications studied, compared to a conventional design. The area overhead for the additional circuits of the adaptive technique is less than 1% of the total cache area. Additional performance improvement potential exists in processors where the data cache access is on the critical path, by allowing a more aggressive clock rate.
- Published
- 2008
35. Synchronization coherence: A transparent hardware mechanism for cache coherence and fine-grained synchronization
- Author
-
Raksit Ashok, Csaba Andras Moritz, Yao Guo, Richard Weiss, and Vladimir Vlassov
- Subjects
Computer Networks and Communications ,CPU cache ,business.industry ,Computer science ,Distributed computing ,Multiprocessing ,Bottleneck ,Synchronization ,Theoretical Computer Science ,Shared memory ,Artificial Intelligence ,Hardware and Architecture ,Embedded system ,Synchronization (computer science) ,Data synchronization ,business ,Software ,Cache coherence - Abstract
The quest to improve performance forces designers to explore finer-grained multiprocessor machines. Ever increasing chip densities based on CMOS improvements fuel research in highly parallel chip multiprocessors with 100s of processing elements. With such increasing levels of parallelism, synchronization is set to become a major performance bottleneck and efficient support for synchronization an important design criterion. Previous research has shown that integrating support for fine-grained synchronization can have significant performance benefits compared to traditional coarse-grained synchronization. Not much progress has been made in supporting fine-grained synchronization transparently to processor nodes: a key reason perhaps why wide adoption has not followed. In this paper, we propose a novel approach called synchronization coherence that can provide transparent fine-grained synchronization and caching in a multiprocessor machine and single-chip multiprocessor. Our approach merges fine-grained synchronization mechanisms with traditional cache coherence protocols. It reduces network utilization as well as synchronization related processing overheads while adding minimal hardware complexity as compared to cache coherence mechanisms or previously reported fine-grained synchronization techniques. In addition to its benefit of making synchronization transparent to processor nodes, for the applications studied, it provides up to 23% improvement in performance and up to 24% improvement in energy efficiency with no L2 caches compared to previous fine-grained synchronization techniques. The performance improvement increases up to 38% when simulating with an ideal L2 cache system.
- Published
- 2008
36. Fault-Tolerant Nanoscale Processors on Semiconductor Nanowire Grids
- Author
-
Csaba Andras Moritz, Pritish Narayanan, Yao Guo, Michael Leuchtenburg, Catherine Dezan, Teng Wang, M. Bennaser, Lab-STICC_UBO_CACS_MOCS, Laboratoire des sciences et techniques de l'information, de la communication et de la connaissance (Lab-STICC), École Nationale d'Ingénieurs de Brest (ENIB)-Université de Bretagne Sud (UBS)-Université de Brest (UBO)-Télécom Bretagne-Institut Brestois du Numérique et des Mathématiques (IBNM), and Université de Brest (UBO)-Université européenne de Bretagne - European University of Brittany (UEB)-École Nationale Supérieure de Techniques Avancées Bretagne (ENSTA Bretagne)-Institut Mines-Télécom [Paris] (IMT)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
010302 applied physics ,Nanofabrics ,Computer science ,Fault tolerance ,Hardware_PERFORMANCEANDRELIABILITY ,02 engineering and technology ,Integrated circuit design ,021001 nanoscience & nanotechnology ,01 natural sciences ,CMOS ,Application-specific integrated circuit ,0103 physical sciences ,Hardware_INTEGRATEDCIRCUITS ,Electronic engineering ,Redundancy (engineering) ,Granularity ,Electrical and Electronic Engineering ,0210 nano-technology ,ComputingMilieux_MISCELLANEOUS ,Electronic circuit - Abstract
Nanoscale processor designs pose new challenges not encountered in the world of conventional CMOS designs and manufacturing. Nanoscale devices based on crossed semiconductor nanowires (NWs) have promising characteristics in addition to providing great density advantage over conventional CMOS devices. This density advantage could, however, be easily lost when assembled into nanoscale systems and especially after techniques dealing with high defect rates and manufacturing related layout/doping constraints are incorporated. Most conventional defect/fault-tolerance techniques are not suitable in nanoscale designs because they are designed for very small defect rates and assume arbitrary layouts for required circuits. Reconfigurable approaches face fundamental challenges including a complex interface between the micro and nano components required for programming. In this paper, we present our work on adding fault-tolerance to all components of a processor implemented on a 2-D semiconductor NW fabric called nanoscale application specific integrated circuits (NASICs). We combine and explore structural redundancy, built-in nanoscale error correcting circuitry, and system-level redundancy techniques and adapt the techniques to the NASIC fabric. Faulty signals caused by defects and other error sources are masked on-the-fly at various levels of granularity. Faults can be masked at up to 15% rates, while maintaining a 7× density advantage compared to an equivalent CMOS processor at projected 18-nm technology. Detailed analysis of yield, density, and area tradeoffs is provided for different error sources and fault distributions.
- Published
- 2007
- Full Text
- View/download PDF
37. Fine-grained 3-D integrated circuit fabric using vertical nanowires
- Author
-
Csaba Andras Moritz, Mostafizur Rahman, Jiajun Shi, Santosh Khasanvis, and Mingyu Li
- Subjects
Interconnection ,Engineering ,business.industry ,Electrical engineering ,Nanowire ,Integrated circuit ,Bottleneck ,law.invention ,Power (physics) ,CMOS ,law ,Hardware_INTEGRATEDCIRCUITS ,Electronic engineering ,Parasitic extraction ,business ,Scaling - Abstract
Continuous scaling of CMOS to sub-20nm technologies is proving to be challenging as MOSFETs are reaching fundamental limits and the interconnection bottleneck is dominating IC power and performance. Migrating to fine-grained 3-D, to advance scaling, has been elusive due to the incompatibility of CMOS with 3-D. We propose a new 3-D IC fabric, called Skybridge, that addresses device, circuit, connectivity, heat management and manufacturing requirements in an integrated 3-D compatible manner. Our bottom-up evaluations, accounting for material structures, manufacturing process, device, and circuit parasitics, reveal 60.5x density and 16.5x performance/Watt benefits compared to CMOS for a 16-bit CLA. Experimental demonstration of the core device concept and key manufacturing steps mitigates technology risks.
- Published
- 2015
- Full Text
- View/download PDF
38. Architecting connectivity for fine-grained 3-D vertically integrated circuits
- Author
-
Csaba Andras Moritz, Jiajun Shi, Mostafizur Rahman, Santosh Khasanvis, and Mingyu Li
- Subjects
Repeater ,Engineering ,business.industry ,Interconnect bottleneck ,Integrated circuit ,Bottleneck ,law.invention ,CMOS ,law ,Logic gate ,Hardware_INTEGRATEDCIRCUITS ,Electronic engineering ,Node (circuits) ,business ,Repeater insertion - Abstract
Conventional CMOS technology is reaching fundamental scaling limits, and interconnection bottleneck is dominating IC power and performance. Migrating to 3-D integrated circuits, though promising, has eluded us due to inherent customization and manufacturing requirements in CMOS that are incompatible with 3-D organization. Skybridge, a fine-grained 3-D IC fabric technology was recently proposed towards this aim, which offers a paradigm shift in technology scaling and design. In this paper we present specifically architected core Skybridge structures to enable fine-grained connectivity in 3-D intrinsically. We develop predictive models for interconnect length distribution for Skybridge, and use them to quantify the benefits in terms of expected reduction in interconnect lengths and repeater counts when compared to 2-D CMOS in 16nm node. Our estimation indicates up to 10x reduction in longest global interconnect length vs. 16nm 2-D CMOS, and up to 2 orders of magnitude reduction in the number of repeaters for a design consisting of 10 million logic gates. These results show great promise in alleviating interconnect bottleneck due to a higher degree of connectivity in 3-D, leading to shorter global interconnects and reduced power and area overhead due to repeater insertion.
- Published
- 2015
- Full Text
- View/download PDF
39. Architecting 3-D integrated circuit fabric with intrinsic thermal management features
- Author
-
Mingyu Li, Santosh Khasanvis, Csaba Andras Moritz, Mostafizur Rahman, and Jiajun Shi
- Subjects
Computer science ,business.industry ,Circuit design ,Stacking ,Electrical engineering ,Hardware_PERFORMANCEANDRELIABILITY ,Integrated circuit ,Die (integrated circuit) ,law.invention ,CMOS ,law ,Thermal ,Hardware_INTEGRATEDCIRCUITS ,Electronic engineering ,Key (cryptography) ,Layer (object-oriented design) ,business - Abstract
Migration to 3-D provides a possible pathway for future Integrated Circuits (ICs) beyond 2-D CMOS, which is at the brink of its own fundamental limits. Partial attempts so far for 3-D integration using die to die and layer to layer stacking do not represent true progression, and suffer from their own challenges with lack of intrinsic thermal management being among the major ones. Our proposal for 3-D IC, Skybridge, is a truly fine-grained vertical nanowire based fabric that solves technology scaling challenges, and at the same time achieves orders of magnitude benefits over 2-D CMOS. Key to Skybridge's 3-D integration is the fabric centric mindset, where device, circuit, connectivity, thermal management and manufacturing issues are co-addressed in a 3-D compatible manner. In this paper we present architected fine-grained 3-D thermal management features that are intrinsic components of the fabric and part of circuit design; a key difference with respect to die-die and layer-layer stacking approaches where thermal management considerations are coarse-grained at system level. Our bottom-up evaluation methodology, with simulations at both device and circuit level, shows that in the best case Skybridge's thermal extraction features are very effective in thermal management, reducing temperature of a heated region by up to 92%.
- Published
- 2015
- Full Text
- View/download PDF
40. Manufacturing pathway and experimental demonstration for nanoscale fine-grained 3-D integrated circuit fabric
- Author
-
Csaba Andras Moritz, Jiajun Shi, Santosh Khasanvis, Mingyu Li, and Mostafizur Rahman
- Subjects
FOS: Computer and information sciences ,Interconnection ,Process (engineering) ,Computer science ,Other Computer Science (cs.OH) ,Overhead (engineering) ,Nanowire ,Integrated circuit ,law.invention ,CMOS ,Computer Science - Other Computer Science ,law ,Hardware_INTEGRATEDCIRCUITS ,Electronic engineering ,Scaling ,Lithography - Abstract
At sub-20nm technologies, CMOS scaling faces severe challenges, primarily due to fundamental device scaling limitations, interconnection overhead and complex manufacturing. Migration to 3D has long been sought as a possible pathway to continue scaling; however, intrinsic requirements of CMOS are not compatible with fine-grained 3D integration. We proposed a truly fine-grained 3D integrated circuit fabric called Skybridge that solves nanoscale challenges and achieves orders of magnitude benefits over CMOS. In Skybridge, device, circuit, connectivity, thermal management and manufacturing issues are addressed in an integrated 3D compatible manner. At the core of Skybridge assembly are uniform vertical nanowires, which are functionalized with architected features for fabric integration. All active components are created primarily using sequential material deposition steps on these nanowires. Lithography and doping are performed prior to any functionalization and their precision requirements are significantly reduced. This paper introduces the Skybridge manufacturing pathway, which is developed based on extensive process and device simulations and experimental metrology, and uses established processes. Experimental demonstrations of key process steps are also shown.
- Published
- 2015
- Full Text
- View/download PDF
41. Physically equivalent magneto-electric nanoarchitecture for probabilistic reasoning
- Author
-
Mohammad Salehi-Fashami, Santosh Khasanvis, Mostafizur Rahman, Supriyo Bandyopadhyay, Ayan K. Biswas, Jayasimha Atulasimha, Mingyu Li, and Csaba Andras Moritz
- Subjects
Speedup ,Theoretical computer science ,Computer engineering ,Computer science ,Causal inference ,Computation ,Probabilistic logic ,Bayesian network ,Inference ,Mixed-signal integrated circuit ,Abstraction layer - Abstract
Probabilistic machine intelligence paradigms such as Bayesian Networks (BNs) are widely used in critical real-world applications. However they cannot be employed efficiently for large problems on conventional computing systems due to inefficiencies resulting from layers of abstraction and separation of logic and memory. We present an unconventional nanoscale magneto-electric machine paradigm, architected with the principle of physical equivalence to efficiently implement causal inference in BNs. It leverages emerging straintronic magneto-tunneling junctions in a novel mixed-signal circuit framework for direct computations on probabilities, while blurring the boundary between memory and computation. Initial evaluations, based on extensive bottom-up simulations, indicate up to four orders of magnitude inference runtime speedup vs. best-case performance of 100-core microprocessors, for BNs with a million random variables. These could be the target applications for emerging magneto-electric devices to enable capabilities for leapfrogging beyond present day computing.
- Published
- 2015
- Full Text
- View/download PDF
42. Architecting NP-Dynamic Skybridge
- Author
-
Mingyu Li, Santosh Khasanvis, Jiajun Shi, Csaba Andras Moritz, and Mostafizur Rahman
- Subjects
Engineering ,Interconnection ,business.industry ,Three-dimensional integrated circuit ,Vertical integration ,Design for manufacturability ,law.invention ,Microprocessor ,Logic style ,CMOS ,law ,Embedded system ,Hardware_INTEGRATEDCIRCUITS ,Multiplier (economics) ,business - Abstract
This paper introduces a new fine-grained 3D IC fabric technology called NP-Dynamic Skybridge. Skybridge is a family of 3D IC technologies that provides fine-grained vertical integration. In comparison to the original 3D Skybridge, the NP-Dynamic approach enables a more comprehensive logic style for improved efficiency. It addresses device, circuit, connectivity and manufacturability requirements with an integrated 3D mindset. The NP-Dynamic 3D circuit style enables a wide range of logic expressions and a simple clocking scheme, and reduces buffer requirements. The architected interconnect framework in 3D provides a high degree of connectivity. Bottom-up evaluations for 16-nm NP-Dynamic Skybridge, considering material properties, nanoscale transport, 3D circuit style, 3D placement and layout, reveal up to 50x density and 25x power benefits for a 4-bit CLA in comparison to 16-nm CMOS at comparable performance. For a 4-bit multiplier, NP-Dynamic Skybridge shows up to 90x density benefit and 8x lower power vs. CMOS.
- Published
- 2015
- Full Text
- View/download PDF
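Note on entry 42: the abstract benchmarks a 4-bit carry-lookahead adder (CLA). For readers unfamiliar with the circuit, the following is a minimal Python sketch of the standard textbook CLA equations (generate `g = a AND b`, propagate `p = a XOR b`, with every carry expanded directly from the input carry). It is a bit-level illustration only; the function name `cla4` is hypothetical, and nothing here models the NP-Dynamic Skybridge circuit style or its reported figures.

```python
def cla4(a: int, b: int, c0: int = 0) -> tuple[int, int]:
    """Add two 4-bit values with textbook carry-lookahead logic.

    Returns (4-bit sum, carry-out). Illustrative only; not the paper's circuit.
    """
    if not (0 <= a < 16 and 0 <= b < 16 and c0 in (0, 1)):
        raise ValueError("a and b must be 4-bit values and c0 must be 0 or 1")

    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]

    # Per-bit generate and propagate signals.
    g = [x & y for x, y in zip(a_bits, b_bits)]
    p = [x ^ y for x, y in zip(a_bits, b_bits)]

    # Each carry is written as a flat expression of g, p, and c0,
    # so no carry has to wait for the previous one to ripple through.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))
    return total, c4
```

For any 4-bit inputs, `total + 16 * carry_out` should equal `a + b + c0`, which is an easy way to check the lookahead expressions against ordinary integer addition.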
43. Self-similar Magneto-electric Nanocircuit Technology for Probabilistic Inference Engines
- Author
-
Jayasimha Atulasimha, Ayan K. Biswas, Csaba Andras Moritz, Mingyu Li, Mohammad Salehi-Fashami, Santosh Khasanvis, Mostafizur Rahman, and Supriyo Bandyopadhyay
- Subjects
FOS: Computer and information sciences ,Computer science ,Bayesian probability ,Cognitive computing ,Probabilistic logic ,Computer Science - Emerging Technologies ,AC power ,Rotation formalisms in three dimensions ,Computer Science Applications ,Abstraction layer ,Emerging Technologies (cs.ET) ,CMOS ,Computer engineering ,Graphical model ,Electrical and Electronic Engineering - Abstract
Probabilistic graphical models are powerful mathematical formalisms for machine learning and reasoning under uncertainty that are widely used for cognitive computing. However, they cannot be employed efficiently for large problems (with variables in the order of 100K or larger) on conventional systems, due to inefficiencies resulting from layers of abstraction and separation of logic and memory in CMOS implementations. In this paper, we present a magnetoelectric probabilistic technology framework for implementing probabilistic reasoning functions. The technology leverages straintronic magneto-tunneling junction (S-MTJ) devices in a novel mixed-signal circuit framework for direct computations on probabilities while enabling in-memory computations with persistence. Initial evaluations of the Bayesian likelihood estimation operation occurring during Bayesian Network inference indicate up to 127× lower area, 214× lower active power, and 70× lower latency compared to an equivalent 45-nm CMOS Boolean implementation.
- Published
- 2015
44. Coupling compiler-enabled and conventional memory accessing for energy efficiency
- Author
-
Csaba Andras Moritz, Raksit Ashok, and Saurabh Chheda
- Subjects
General Computer Science ,Cache coloring ,CPU cache ,business.industry ,Computer science ,Cache-only memory architecture ,Uniform memory access ,Parallel computing ,Cache pollution ,Memory map ,Non-uniform memory access ,Embedded system ,Page cache ,business - Abstract
This article presents Cool-Mem, a family of memory system architectures that integrate conventional memory system mechanisms, energy-aware address translation, and compiler-enabled cache disambiguation techniques to reduce energy consumption in general-purpose architectures. The solutions provided in this article leverage interlayer tradeoffs between the architecture, compiler, and operating system layers. Cool-Mem achieves power reduction by statically matching memory operations with energy-efficient cache and virtual memory access mechanisms. It combines statically speculative cache access modes, a dynamic content addressable memory-based (CAM-based) Tag-Cache used as backup for statically mispredicted accesses, different conventional multilevel associative cache organizations, embedded protection checking along all cache access mechanisms, as well as architectural organizations to reduce the power consumed by address translation in virtual memory. Because it is based on speculative static information, a superset of the predictable program information available at compile-time, our approach removes the burden of provable correctness in compiler analysis passes that extract static information. This makes Cool-Mem highly practical and applicable to large and complex applications, without any limitations due to complexity issues in our compiler passes or the presence of precompiled static libraries. Based on extensive evaluation, for both SPEC2000 and Mediabench applications, we obtain from 6% to 19% total energy savings in the processor, with performance ranging from 1.5% degradation to 6% improvement, for the applications studied. We have also compared Cool-Mem to several prior approaches and have found Cool-Mem to perform better in almost all cases.
- Published
- 2004
- Full Text
- View/download PDF
45. Cool-Cache
- Author
-
C. Mani Krishna, Csaba Andras Moritz, Osman Unsal, Raksit Ashok, and Israel Koren
- Subjects
Multimedia ,Computer science ,Cache pollution ,computer.software_genre ,Smart Cache ,Hardware and Architecture ,Cache invalidation ,Leverage (statistics) ,Cache ,Compiler ,computer ,Cache algorithms ,Software ,Efficient energy use - Abstract
The unique characteristics of multimedia/embedded applications dictate media-sensitive architectural and compiler approaches to reduce the power consumption of the data cache. Our goal is exploring energy savings for embedded/multimedia workloads without sacrificing performance. Here, we present two complementary media-sensitive energy-saving techniques that leverage static information. While our first technique is applicable to existing architectures, in our second technique we adopt a more radical approach and propose a new tagless caching architecture by reevaluating the architecture-compiler interface. Our experiments show that substantial energy savings are possible in the data cache. Across a wide range of cache and architectural configurations, we obtain up to 77% energy savings, while the performance varies from 14% improvement to 4% degradation depending on the application.
- Published
- 2003
- Full Text
- View/download PDF
46. Wave-based multi-valued computation framework
- Author
-
Csaba Andras Moritz, Sankara Narayanan Rajapandian, Mostafizur Rahman, and Santosh Khasanvis
- Subjects
Computer Science::Hardware Architecture ,Superposition principle ,Adder ,Computer science ,law ,Computation ,Circuit design ,Electronic engineering ,Arithmetic function ,Integrated circuit ,External Data Representation ,Interference (wave propagation) ,law.invention - Abstract
We present a novel multi-valued computation framework called Wave Interference Functions (WIF), based on emerging non-equilibrium wave phenomenon such as spin waves. WIF offers new features for data representation and computation, which can be game changing for post-CMOS integrated circuits (ICs). Information encoding wave attributes inherently leads to multi-dimensional multi-valued data representation and communication. Multi-valued computation is natively supported with wave interactions, such as wave superposition or interference. We introduce the concept of a multi-valued Interference Function that is more sophisticated than conventional Boolean and Majority functions, leading to compact circuits for logic. We present WIF implementation of multi-valued operators to realize any desired logic/arithmetic function using the Interference Function. We evaluate 2-digit to 16-digit quaternary (radix-4) full adder designs with WIF operators in terms of power, performance and area. Estimates indicate up to 63× higher density, 884× lower power and 3× better performance when compared to equivalent 45nm CMOS adders. WIF features completely change conventional assumptions on circuit design, opening new avenues to implement future nanoscale ICs for general purpose processing and other applications inherently suited to multi-valued computation.
- Published
- 2014
- Full Text
- View/download PDF
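Note on entry 46: the abstract evaluates 2-digit to 16-digit quaternary (radix-4) full adders. As a purely arithmetic illustration of what a radix-4 full adder computes, here is a minimal Python sketch. All function names are hypothetical, digits are stored least-significant first, and the code says nothing about how Wave Interference Functions realize the operation physically.

```python
def to_radix4(n: int, width: int) -> list[int]:
    """Convert a non-negative integer to `width` radix-4 digits, least significant first."""
    if n < 0 or n >= 4 ** width:
        raise ValueError("value does not fit in the requested number of digits")
    digits = []
    for _ in range(width):
        digits.append(n % 4)
        n //= 4
    return digits


def from_radix4(digits: list[int]) -> int:
    """Convert least-significant-first radix-4 digits back to an integer."""
    return sum(d * 4 ** i for i, d in enumerate(digits))


def quaternary_full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One radix-4 full adder stage: returns (sum digit in 0..3, carry-out in 0..1)."""
    total = a + b + carry_in  # at most 3 + 3 + 1 = 7
    return total % 4, total // 4


def add_radix4(x: list[int], y: list[int]) -> tuple[list[int], int]:
    """Chain full adder stages over two equal-length digit lists."""
    if len(x) != len(y):
        raise ValueError("operands must have the same number of digits")
    carry = 0
    out = []
    for a, b in zip(x, y):
        s, carry = quaternary_full_adder(a, b, carry)
        out.append(s)
    return out, carry
```

Each quaternary digit carries two bits of information, so a 16-digit quaternary adder covers the same range as a 32-bit binary adder.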
47. Nanowire Volatile RAM as an Alternative to SRAM
- Author
-
Csaba Andras Moritz, Mostafizur Rahman, and Santosh Khasanvis
- Subjects
FOS: Computer and information sciences ,Engineering ,Hardware_MEMORYSTRUCTURES ,business.industry ,Computer Science - Emerging Technologies ,Benchmarking ,Hardware_PERFORMANCEANDRELIABILITY ,AC power ,Power (physics) ,Design for manufacturability ,Reduction (complexity) ,Emerging Technologies (cs.ET) ,Hardware and Architecture ,Electronic engineering ,Node (circuits) ,Static random-access memory ,Electrical and Electronic Engineering ,business ,Software ,Volatile memory - Abstract
Maintaining benefits of CMOS technology scaling is becoming challenging, primarily due to increased manufacturing complexities and unwanted passive power dissipations. This is particularly challenging in SRAM, where manufacturing precision and leakage power control are critical issues. To alleviate these challenges, we proposed a novel volatile memory alternative to SRAM called nanowire volatile RAM (NWRAM). Due to NWRAM's regular grid-based layout and innovative circuit style, manufacturing complexities are reduced and, at the same time, considerable benefits are attained in terms of performance and leakage power reduction. In this article we elaborate NWRAM's circuit aspects and manufacturability, and quantify benefits at 16nm technology node through simulation against state-of-the-art 6T-SRAM and gridded 8T-SRAM designs. Our results show that when lower bounds in design rules are considered, 10T-NWRAM's read and write time are 1.38x and 2x faster, and the leakage power is 14x better in comparison to high-performance 6T-SRAM. Similarly the 10T-NWRAM achieves 1.3x and 1.9x read and write performance, and 35x leakage power improvements compared to high-performance 8T-SRAM. 10T-NWRAM's density is comparable to 6T-SRAM and 8T-SRAM for lower bounds, but exhibits higher active power in similar comparisons. This article details all benchmarking results and provides thorough analysis of NWRAM's evaluations.
- Published
- 2014
48. LoGPC: Modeling network contention in message-passing programs
- Author
-
Matthew I. Frank and Csaba Andras Moritz
- Subjects
Computer science ,business.industry ,Distributed computing ,Locality ,Message passing ,Multiprocessing ,Network interface ,Execution time ,Computational Theory and Mathematics ,Hardware and Architecture ,Asynchronous communication ,Factor (programming language) ,Signal Processing ,business ,computer ,Computer network ,computer.programming_language - Abstract
In many real applications, for example, those with frequent and irregular communication patterns or those using large messages, network contention and contention for message processing resources can be a significant part of the total execution time. This paper presents a new cost model, called LoGPC, that extends the LogP and LogGP models to account for the impact of network contention and network interface DMA behavior on the performance of message passing programs. We validate LoGPC by analyzing three applications implemented with Active Messages on the MIT Alewife multiprocessor. Our analysis shows that network contention accounts for up to 50 percent of the total execution time. In addition, we show that the impact of communication locality on the communication costs is at most a factor of two on Alewife. Finally, we use the model to identify trade-offs between synchronous and asynchronous message passing styles.
- Published
- 2001
- Full Text
- View/download PDF
49. Performance Modeling and Evaluation of MPI
- Author
-
Csaba Andras Moritz and Khalid Al-Tawil
- Subjects
ComputerSystemsOrganization_COMPUTERSYSTEMIMPLEMENTATION ,SIMPLE (military communications protocol) ,Workstation ,Computer Networks and Communications ,Computer science ,Distributed computing ,Message passing ,Message Passing Interface ,Parallel computing ,Theoretical Computer Science ,law.invention ,Parallel processing (DSP implementation) ,Artificial Intelligence ,Hardware and Architecture ,law ,Software - Abstract
Users of parallel machines need to have a good grasp of how different communication patterns and styles affect the performance of message-passing applications. LogGP is a simple performance model that reflects the most important parameters required to estimate the communication performance of parallel computers. The message passing interface (MPI) standard provides new opportunities for developing high performance parallel and distributed applications. In this paper, we use LogGP as a conceptual framework for evaluating the performance of MPI communications on three platforms: Cray Research T3D, Convex Exemplar 1600SP, and a network of workstations (NOW). We develop a simple set of communication benchmarks to extract the LogGP parameters. Our objective is to compare the performance of MPI communication on several platforms and to identify a performance model suitable for MPI performance characterization. In particular, two problems are addressed: how LogGP quantifies MPI performance and what extra features are required for modeling MPI, and how MPI performance compares on the three computing platforms: Cray Research T3D, Convex Exemplar 1600SP, and workstation clusters.
- Published
- 2001
- Full Text
- View/download PDF
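Note on entries 48 and 49: both abstracts build on the LogGP cost model. As a rough illustration of how the model is applied, the sketch below computes the end-to-end time of a single k-byte message using the usual LogGP expression, `o + (k - 1) * G + L + o`. The class name, function name, and parameter values are hypothetical and are not measurements from either paper; the contention terms added by LoGPC are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogGPParams:
    """LogGP parameters, all in the same time unit (for example microseconds)."""
    L: float  # network latency
    o: float  # CPU overhead per send or receive
    g: float  # minimum gap between consecutive messages (unused for a single message)
    G: float  # gap per byte for long messages
    P: int    # number of processors (unused for a single message)


def message_time(params: LogGPParams, k: int) -> float:
    """End-to-end time for one k-byte message: o + (k - 1) * G + L + o."""
    if k < 1:
        raise ValueError("message size must be at least one byte")
    return params.o + (k - 1) * params.G + params.L + params.o


# Hypothetical parameter values, chosen only to make the arithmetic easy to follow.
example = LogGPParams(L=5.0, o=2.0, g=4.0, G=0.01, P=8)
one_byte = message_time(example, 1)      # 2 + 0 + 5 + 2
long_msg = message_time(example, 1001)   # 2 + 1000 * 0.01 + 5 + 2
```

With a one-byte message the per-byte term vanishes and the cost is just latency plus the two overheads; for long messages the `(k - 1) * G` term dominates, which is the behavior a benchmark suite for extracting LogGP parameters would try to isolate.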
50. Introduction to Special Section on Cognitive and Natural Computing With Nanotechnology
- Author
-
Kang L. Wang and Csaba Andras Moritz
- Subjects
Nanofabrics ,Engineering ,Natural computing ,business.industry ,Cognitive computing ,Control reconfiguration ,Nanotechnology ,Cognition ,Computer Science Applications ,Automaton ,symbols.namesake ,Software ,symbols ,Electrical and Electronic Engineering ,business ,Von Neumann architecture - Abstract
A key opportunity of the 21st Century is to merge progress across various science and engineering fields to materialize on the promise of artificial cognition. These interdisciplinary research directions towards cognitive systems may be influenced by the models of the brain and/or rely on a new kind of physical implementation of a mathematical model of cognitive systems. This particular special issue on "Cognitive and Natural Computing with Nanotechnology" was directed to the broad nanomaterials, nanodevice, nanofabrics, nanocircuit, and nanoarchitecture research communities working specifically on novel nanotechnology-enabled directions. The special issue selected papers on innovative ideas for solutions to the principal challenge of realizing architectures that can enable decision-making, intelligent information and sensorial processing, and autonomous learning and adaptation. These systems may employ a variety of fundamental principles and do not necessarily need to emulate the biological or natural automata. Rather, their key distinguishing aspect is that their plasticity, reconfiguration, and functional underpinnings are achieved without involving software. In particular, such systems could 1) introduce new architectural concepts enabled by nanoscale capabilities, resembling the neocortex and natural systems; 2) leverage new materials and nanodevices and their interactions to achieve core cognitive functions; 3) design or build on novel nanofabrics enabling efficient implementation of cognitive computational approaches including achieving high degree of connectivity and collective functions. The ten papers ultimately selected articulate a vision for a cognitive computing direction beyond von Neumann microprocessors and/or present a technology component that contributes to this vision. Nine of these papers appear in this issue and one paper was included in an earlier IEEE Transactions on Nanotechnology (TNANO) issue this year. 
The nine papers are briefly summarized.
- Published
- 2015
- Full Text
- View/download PDF