134 results for "Francky Catthoor"
Search Results
2. CFET SRAM DTCO, Interconnect Guideline, and Benchmark for CMOS Scaling
- Author
-
Hsiao-Hsuan Liu, Shairfe M. Salahuddin, Boon Teik Chan, Pieter Schuddinck, Yang Xiang, Geert Hellings, Pieter Weckx, Julien Ryckaert, and Francky Catthoor
- Subjects
Electrical and Electronic Engineering ,Electronic, Optical and Magnetic Materials - Published
- 2023
3. Power, Performance, Area, and Cost Analysis of Face-to-Face-Bonded 3-D ICs
- Author
-
Anthony Agnesina, Moritz Brunion, Jinwoo Kim, Alberto Garcia-Ortiz, Dragomir Milojevic, Francky Catthoor, Gioele Mirabelli, Manu Komalan, and Sung Kyu Lim
- Subjects
Electrical and Electronic Engineering ,Industrial and Manufacturing Engineering ,Electronic, Optical and Magnetic Materials - Published
- 2023
4. Graphene-Based Interconnect Exploration for Large SRAM Caches for Ultrascaled Technology Nodes
- Author
-
Zhenlin Pei, Mahta Mayahinia, Hsiao-Hsuan Liu, Mehdi Tahoori, Francky Catthoor, Zsolt Tokei, and Chenyun Pan
- Subjects
Electrical and Electronic Engineering ,Electronic, Optical and Magnetic Materials - Published
- 2023
5. 3D SRAM Macro Design in 3D Nanofabric Process Technology
- Author
-
Dawit Burusie Abdi, Shairfe M. Salahuddin, Juergen Boemmels, Edouard Giacomin, Pieter Weckx, Julien Ryckaert, Geert Hellings, and Francky Catthoor
- Subjects
Hardware and Architecture ,Electrical and Electronic Engineering - Published
- 2023
6. Time-Dependent Electromigration Modeling for Workload-Aware Design-Space Exploration in STT-MRAM
- Author
-
Mahta Mayahinia, Mehdi Tahoori, Manu Perumkunnil Komalan, Houman Zahedmanesh, Kristof Croes, Tommaso Marinelli, Jose Ignacio Gomez Perez, Timon Evenblij, Gouri Sankar Kar, and Francky Catthoor
- Subjects
Electrical and Electronic Engineering ,Computer Graphics and Computer-Aided Design ,Software - Published
- 2022
7. Thermal Performance Analysis of Mempool RISC-V Multicore SoC
- Author
-
Sankatali Venkateswarlu, Subrat Mishra, Herman Oprins, Bjorn Vermeersch, Moritz Brunion, Jun-Han Han, Mircea R. Stan, Pieter Weckx, and Francky Catthoor
- Subjects
Hardware and Architecture ,Electrical and Electronic Engineering ,Software - Published
- 2022
8. Neuromorphic Near-Sensor Computing: From Event-Based Sensing to Edge Learning
- Author
-
Ali Safa, Jonah Van Assche, Mark Daniel Alea, Francky Catthoor, and Georges G.E. Gielen
- Subjects
Hardware and Architecture ,Electrical and Electronic Engineering ,Software - Published
- 2022
9. Efficient Backside Power Delivery for High-Performance Computing Systems
- Author
-
Hesheng Lin, Geert van der Plas, Xiao Sun, Dimitrios Velenis, Francky Catthoor, Rudy Lauwereins, and Eric Beyne
- Subjects
Technology ,Science & Technology ,system integration ,Engineering, Electrical & Electronic ,Air-core inductor ,system optimization ,backside power delivery network (PDN) ,Engineering ,Hardware and Architecture ,INTEGRATED VOLTAGE REGULATOR ,Computer Science ,CORE ,Electrical and Electronic Engineering ,Computer Science, Hardware & Architecture ,buck converter ,Software - Abstract
Published in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 30, no. 11, pp. 1748-1756.
- Published
- 2022
10. Multitimescale Mitigation for Performance Variability Improvement in Time-Critical Systems
- Author
-
Ji-Yung Lin, Pieter Weckx, Subrat Mishra, Alessio Spessot, and Francky Catthoor
- Subjects
Hardware and Architecture ,Electrical and Electronic Engineering ,Software - Published
- 2022
11. A Co-Simulation Methodology for the Design of Integrated Silicon Spin Qubits With Their Control/Readout Cryo-CMOS Electronics
- Author
-
Benjamin Gys, Rohith Acharya, Steven Van Winckel, Kristiaan De Greve, Georges Gielen, and Francky Catthoor
- Subjects
Electrical and Electronic Engineering - Published
- 2022
12. Plasmonic MIM and MSM Waveguide Couplers for Plasmonic Integrated Computing System
- Author
-
Samantha Lubaba Noor, Pol Van Dorpe, Dennis Lin, Francky Catthoor, and Azad Naeemi
- Subjects
Plasmons ,Technology ,DEVICES ,FABRICATION ,FINITE ,Physics, Applied ,Claddings ,MIM waveguide ,Engineering ,Energy per bit ,DESIGN ,power transmission ,waveguide coupling ,Electrical and Electronic Engineering ,SILICON ,Couplers ,plasmon detector ,Science & Technology ,Physics ,Engineering, Electrical & Electronic ,Optics ,Detectors ,TRANSPORT ,Atomic and Molecular Physics, and Optics ,Photonics ,Physical Sciences ,Couplings ,Performance evaluation ,MODES - Abstract
Published in IEEE Photonics Journal, vol. 14, no. 4.
- Published
- 2022
13. Dynamic Quantization Range Control for Analog-in-Memory Neural Networks Acceleration
- Author
-
Nathan Laubeuf, Jonas Doevenspeck, Ioannis A. Papistas, Michele Caselli, Stefan Cosemans, Peter Vrancx, Debjyoti Bhattacharjee, Arindam Mallik, Peter Debacker, Diederik Verkest, Francky Catthoor, and Rudy Lauwereins
- Subjects
Technology ,Science & Technology ,Computer Science ,quantization ,Electrical and Electronic Engineering ,Computer Science, Hardware & Architecture ,Computer Science, Software Engineering ,in-memory-computing ,Computer Graphics and Computer-Aided Design ,Neural networks ,COMPUTING SRAM MACRO ,Computer Science Applications - Abstract
Analog in Memory Computing (AiMC) based neural network acceleration is a promising solution to increase the energy efficiency of deep neural network deployment. However, the quantization requirements of these analog systems are not compatible with state-of-the-art neural network quantization techniques. Indeed, while modern deep neural network quantization techniques consider the quantization of weights and activations, AiMC accelerators also impose the quantization of each Matrix Vector Multiplication (MVM) result. In most demonstrated AiMC implementations, the quantization range of MVM results is treated as a fixed parameter of the accelerator. This work demonstrates that dynamic control over this quantization range is not only possible but also desirable for analog neural network acceleration. An AiMC-compatible quantization flow coupled with a hardware-aware quantization range driving technique is introduced to fully exploit these dynamic ranges. Using CIFAR-10 and ImageNet as benchmarks, the proposed solution results in networks that are both more accurate and more robust to the inherent vulnerability of analog circuits than fixed-quantization-range approaches.
- Published
- 2022
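The abstract of entry 13 above argues that the quantization range applied to each MVM result should be driven dynamically rather than fixed at design time. The sketch below is not the authors' flow; it only illustrates, with assumed bit-widths and a made-up helper name, how a per-layer range choice changes the quantization error of an MVM output compared with a fixed worst-case range.

```python
import numpy as np

def quantize_mvm(y, n_bits, clip_range):
    """Uniform ADC-style quantizer for an analog MVM result vector,
    clipped to +/- clip_range and mapped onto 2**n_bits levels."""
    levels = 2 ** n_bits
    step = 2.0 * clip_range / (levels - 1)
    y_clipped = np.clip(y, -clip_range, clip_range)
    return np.round(y_clipped / step) * step

rng = np.random.default_rng(0)
w = rng.integers(-3, 4, size=(128, 256))     # toy low-precision weight matrix
x = rng.integers(0, 16, size=256)            # toy 4-bit activations
y = w @ x                                    # ideal (unquantized) MVM result

fixed = quantize_mvm(y, n_bits=6, clip_range=4096)               # fixed worst-case range
dynamic = quantize_mvm(y, n_bits=6, clip_range=np.abs(y).max())  # range driven per layer

print("RMS error, fixed range  :", np.sqrt(np.mean((y - fixed) ** 2)))
print("RMS error, dynamic range:", np.sqrt(np.mean((y - dynamic) ** 2)))
```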
14. AERO: Design Space Exploration Framework for Resource-Constrained CNN Mapping on Tile-Based Accelerators
- Author
-
Simei Yang, Debjyoti Bhattacharjee, Vinay B. Y. Kumar, Saikat Chatterjee, Sayandip De, Peter Debacker, Diederik Verkest, Arindam Mallik, and Francky Catthoor
- Subjects
Electrical and Electronic Engineering - Published
- 2022
15. 84%-Efficiency Fully Integrated Voltage Regulator for Computing Systems Enabled by 2.5-D High-Density MIM Capacitor
- Author
-
Hesheng Lin, Dimitrios Velenis, Philip Nolmans, Xiao Sun, Francky Catthoor, Rudy Lauwereins, Geert Van der Plas, and Eric Beyne
- Subjects
Hardware and Architecture ,Electrical and Electronic Engineering ,Software - Published
- 2022
16. On the Use of Spiking Neural Networks for Ultralow-Power Radar Gesture Recognition
- Author
-
Georges Gielen, Andre Bourdoux, Ilja Ocket, Francky Catthoor, and Ali Safa
- Subjects
Spiking neural network ,law ,Gesture recognition ,Computer science ,Speech recognition ,Electrical and Electronic Engineering ,Radar ,Condensed Matter Physics ,law.invention ,Power (physics) - Published
- 2022
17. Beyond-Cu Intermediate-Length Interconnect Exploration for SRAM Application
- Author
-
Zhenlin Pei, Francky Catthoor, Zsolt Tokei, and Chenyun Pan
- Subjects
Integrated circuit interconnections ,GRAPHENE ,Technology ,energy-delay-area product ,delay ,Materials Science ,Resistance ,design ,Materials Science, Multidisciplinary ,Physics, Applied ,Engineering ,technology co-optimization ,MULTIGATE ,Nanoscience & Nanotechnology ,Electrical and Electronic Engineering ,Science & Technology ,mean free path ,Physics ,Engineering, Electrical & Electronic ,Random access memory ,Wires ,SRAM ,Computer Science Applications ,Quantum capacitance ,Physical Sciences ,Interconnect ,energy-delay product ,Performance evaluation ,Science & Technology - Other Topics - Abstract
Published in IEEE Transactions on Nanotechnology, vol. 21, pp. 367-373.
- Published
- 2022
18. A Survey on Memory-centric Computer Architectures
- Author
-
Anteneh Gebregiorgis, Hoang Anh Du Nguyen, Jintao Yu, Rajendra Bishnoi, Mottaqiallah Taouil, Francky Catthoor, and Said Hamdioui
- Subjects
Technology ,Science & Technology ,RANDOM-ACCESS MEMORY ,CHIP ,Engineering, Electrical & Electronic ,BENCHMARK ,Computation-in-memory ,OPERATIONS ,Engineering ,classification ,DESIGN ,Hardware and Architecture ,MEMRISTOR ,Computer Science ,resistive computing ,computer architectures ,LOGIC ,IMPLEMENTATION ,Science & Technology - Other Topics ,Electrical and Electronic Engineering ,Nanoscience & Nanotechnology ,Computer Science, Hardware & Architecture ,Software ,EMBEDDED DRAM DEVELOPMENT - Abstract
The demand for faster and cheaper computers has constantly pushed for technological and architectural improvements. However, current technology is suffering from three technology walls: the leakage wall, reliability wall, and cost wall. Meanwhile, existing architecture performance is also saturating due to three well-known architecture walls: the memory wall, power wall, and instruction-level parallelism (ILP) wall. Hence, many novel technologies and architectures have been introduced and are being developed intensively. Our previous work presented a comprehensive classification and broad overview of memory-centric computer architectures. In this article, we aim to discuss the most important classes of memory-centric architectures thoroughly and evaluate their advantages and disadvantages. Moreover, for each class, the article provides a comprehensive survey of the memory-centric architectures available in the literature.
- Published
- 2022
19. Breathing pattern estimation using wearable bioimpedance for assessing COPD severity
- Author
-
Dolores Blanco-Almazan, Willemijn Groenendaal, Lien Lijnen, Rana Onder, Christophe Smeets, David Ruttens, Francky Catthoor, and Raimon Jane
- Subjects
Technology ,Respiration - Measurement ,Health Informatics ,EXERCISE ,Bioengineering ,Impedància (Electricitat) ,PRESSURE ,PULMONARY ,chronic obstructive pulmonary disease ,ACTIVATION ,Health Information Management ,SIGNALS ,Proves funcionals respiratòries ,MUSCLES ,6MWT ,Bioenginyeria ,Impedance (Electricity) ,Chronic obstructive pulmonary diseases ,Electrical and Electronic Engineering ,Malalties pulmonars obstructives cròniques ,VOLUMES ,Breathing pattern ,Lungs -- Diseases ,Science & Technology ,Computer Science, Information Systems ,IMPEDANCE PNEUMOGRAPHY ,Bioimpedance ,Wearables ,Chronic obstructive pulmonary disease ,Enginyeria biomèdica [Àrees temàtiques de la UPC] ,Pulmons -- Malalties ,Respiratory function tests ,Computer Science Applications ,VARIABILITY ,wearables ,Respiració -- Mesurament ,Computer Science ,Computer Science, Interdisciplinary Applications ,Mathematical & Computational Biology ,breathing pattern ,BURDEN ,Life Sciences & Biomedicine ,Medical Informatics - Abstract
Breathing pattern has been shown to be different in chronic obstructive pulmonary disease (COPD) patients compared to healthy controls during rest and walking. In this study, we evaluated respiratory parameters and the breathing variability of COPD patients as a function of their severity. Thoracic bioimpedance was acquired on 66 COPD patients during the performance of the six-minute walk test (6MWT), as well as 5 minutes before and after the test while the patients were seated, i.e., the resting and recovery phases. The patients were classified by their level of airflow limitation into moderate and severe groups. We characterized the breathing patterns by evaluating common respiratory parameters using only wearable bioimpedance. Specifically, we computed the median and the coefficient of variation of the parameters during the three phases of the protocol, and evaluated the statistical differences between the two COPD severity groups. We observed significant differences between the COPD severity groups only during the sitting phases, whereas the behavior during the 6MWT was similar. In particular, we observed an inverse relationship between breathing pattern variability and COPD severity, which may indicate that the most severely diseased patients had more restricted breathing compared to the moderate patients. This work was supported in part by the Universities and Research Secretariat from the Ministry of Business and Knowledge/Generalitat de Catalunya under Grant GRC 2017 SGR 01770, in part by the Agencia Estatal de Investigacion from the Spanish Ministry of Science and Innovation and the European Regional Development Fund under Grants RTI2018 098472-B-I00 and PID2021-126455OB-I00, and in part by the CERCA Programme/Generalitat de Catalunya.
- Published
- 2022
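Entry 19 above characterizes breathing patterns through the median and the coefficient of variation of respiratory parameters in each protocol phase. A minimal sketch of that per-phase computation is shown below on invented toy data; the function name and the numbers are illustrative only, not from the study.

```python
import numpy as np

def breathing_pattern_stats(values):
    """Median and coefficient of variation (CV) of a breath-by-breath
    respiratory parameter (e.g., breath duration in seconds) for one phase."""
    values = np.asarray(values, dtype=float)
    return np.median(values), np.std(values) / np.mean(values)

# Toy example: similar median breath duration, different variability.
phases = {
    "moderate, resting": [3.1, 3.6, 2.8, 4.0, 3.3, 2.9, 3.8],
    "severe, resting":   [3.3, 3.4, 3.2, 3.3, 3.5, 3.2, 3.4],
}
for label, breaths in phases.items():
    median, cv = breathing_pattern_stats(breaths)
    print(f"{label}: median = {median:.2f} s, CV = {cv:.2f}")
```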
20. Workload-Aware Electromigration Analysis in Emerging Spintronic Memory Arrays
- Author
-
Timon Evenblij, Kristof Croes, Tommaso Marinelli, Sarath Mohanachandran Nair, Houman Zahedmanesh, Gouri Sankar Kar, Kevin Garello, Francky Catthoor, M. Perumkunnil, Mehdi B. Tahoori, and Mahta Mayahinia
- Subjects
010302 applied physics ,Hardware_MEMORYSTRUCTURES ,Spintronics ,Computer science ,Spin-transfer torque ,ComputerApplications_COMPUTERSINOTHERSYSTEMS ,01 natural sciences ,Electromigration ,Electronic, Optical and Magnetic Materials ,Power (physics) ,Reliability (semiconductor) ,Electric power transmission ,0103 physical sciences ,Electronic engineering ,Static random-access memory ,Electrical and Electronic Engineering ,Safety, Risk, Reliability and Quality ,Current density - Abstract
Electromigration (EM) has emerged as a major reliability concern for interconnects in advanced technology nodes. Most of the existing EM analysis works focus on the power lines; only a limited amount of work analyzes EM failures in the signal lines. However, various emerging spintronic memory technologies, such as Spin Transfer Torque Magnetic Random Access Memory (STT-MRAM) and Spin Orbit Torque Magnetic Random Access Memory (SOT-MRAM), have high current densities compared to conventional Static Random Access Memory (SRAM). These high current densities can lead to EM failures in signal lines such as the bit-line (BL) of these memories. Furthermore, these signal lines experience workload-dependent stress, as opposed to the conventional DC stress of power distribution networks. In this work, we model the EM failures in the BL of a typical STT memory array with realistic workloads. The analysis is based on a physics-based EM model, calibrated with industrial measurement data. The results show that the current densities in STT arrays can be large enough to cause EM failures in the signal lines when running realistic workloads, and that these failures are highly workload-dependent.
- Published
- 2021
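Entry 20 above models EM in STT-MRAM bit-lines with a calibrated, physics-based model; that model is not reproduced here. As a much simpler illustration of how workload enters such an analysis, the sketch below time-averages the bit-line current density over a write duty cycle and feeds it to Black's equation; all constants are placeholders, not calibrated values.

```python
import numpy as np

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def effective_current_density(j_write, duty_cycle):
    """Time-averaged current density seen by a bit-line for a given write
    duty cycle (real EM models also capture recovery and pulsed stress)."""
    return j_write * duty_cycle

def black_mttf(j_eff, a=1.0e11, n=2.0, ea_ev=0.9, temp_k=358.0):
    """Black's equation MTTF = A * J**(-n) * exp(Ea / kT), placeholder constants."""
    return a * j_eff ** (-n) * np.exp(ea_ev / (K_BOLTZMANN_EV * temp_k))

j_write = 5.0e6  # A/cm^2, rough order of magnitude of an STT-MRAM write current density
for duty in (0.01, 0.1, 0.5):   # fraction of time the bit-line carries write current
    j_eff = effective_current_density(j_write, duty)
    print(f"duty {duty:4.2f}: J_eff = {j_eff:.2e} A/cm^2, relative MTTF = {black_mttf(j_eff):.2e}")
```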
21. A Multiply-and-Accumulate Array for Machine Learning Applications Based on a 3D Nanofabric Flow
- Author
-
Edouard Giacomin, Sumanth Gudaparthi, Juergen Boemmels, Rajeev Balasubramonian, Francky Catthoor, and Pierre-Emmanuel Gaillardon
- Subjects
Electrical and Electronic Engineering ,Computer Science Applications - Published
- 2021
22. Extended Methodology to Determine SRAM Write Margin in Resistance-Dominated Technology Node
- Author
-
Hsiao-Hsuan Liu, Shairfe M. Salahuddin, Dawit Abdi, Rongmei Chen, Pieter Weckx, Philippe Matagne, and Francky Catthoor
- Subjects
Electrical and Electronic Engineering ,Electronic, Optical and Magnetic Materials - Abstract
Published in IEEE Transactions on Electron Devices, vol. 69, no. 6, pp. 3113-3117.
- Published
- 2022
23. Jsw of 5.5 MA/cm2 and RA of 5.2 Ω·μm2 STT-MRAM Technology for LLC Application
- Author
-
Siddharth Rao, Sebastien Couet, M. Perumkunnil, Francky Catthoor, Gouri Sankar Kar, Arnaud Furnemont, Sushil Sakhare, D. Crotti, and Simon Van Beek
- Subjects
010302 applied physics ,Physics ,Magnetoresistive random-access memory ,Hardware_MEMORYSTRUCTURES ,business.industry ,Spice ,01 natural sciences ,Electronic, Optical and Magnetic Materials ,CMOS ,0103 physical sciences ,Optoelectronics ,Breakdown voltage ,Node (circuits) ,Cache ,Static random-access memory ,Electrical and Electronic Engineering ,business ,Energy (signal processing) - Abstract
Due to the complexity of device processing, the trade-off between yield and area has resulted in a diminishing rate of scaling for the high-density static random access memory (SRAM) cell at advanced CMOS nodes. The introduction of extreme ultraviolet (EUV) lithography and multipatterning has added further cost to the technology needed to realize 3-D device structures and ultrascaled metal routing. In this era, spin-transfer torque (STT)-MRAM technology can provide an alternative to high-density SRAM for last level cache (LLC) applications. In this article, we discuss the memory design and technology trade-offs needed to make STT-MRAM a viable option. We have realized the technology on 300-mm wafers, measuring 1 million samples to build a SPICE model for circuit simulation. An MRAM macro occupying up to 83.3% of the area of an equivalent SRAM macro has been designed and simulated for the scaled 5-nm CMOS node. The simulated MRAM macro shows best-case read and write access times of 3.1 and 6.2 ns, respectively. A magnetic tunneling junction (MTJ) pillar of 38-nm diameter is realized at 90-nm pitch, measuring a resistance-area product (RA) of 5.2 Ω·μm2, $J_{sw}$ of 5.5 MA/cm2 with an improved $\Delta_{avg}$ of 70, and a breakdown voltage of 0.99 V. The energy comparison shows increasing gains versus SRAM for increasing cache sizes, crossing over at 0.3 and 4 MB for single-cycle read and write operations, respectively.
- Published
- 2020
24. Capacitive Memory Window with Non-Destructive Read in Ferroelectric Capacitors
- Author
-
Shankha Mukherjee, Jasper Bizindavyi, Sergiu Clima, Mihaela I. Popovici, Xiaoyu Piao, Kostantine Katcko, Francky Catthoor, Shimeng Yu, Valeri V. Afanas’ev, and Jan Van Houdt
- Subjects
Electrical and Electronic Engineering ,Electronic, Optical and Magnetic Materials - Published
- 2023
25. Bitwidth-Optimized Energy-Efficient FFT Design via Scaling Information Propagation
- Author
-
Xinzhe Liu, Fupeng Chen, Yajun Ha, David Blinder, Peter Schelkens, Francky Catthoor, Dessislava Nikolova, and Raees Kizhakkumkara Muhamad
- Subjects
Very-large-scale integration ,business.industry ,Orthogonal frequency-division multiplexing ,Computer science ,Fast Fourier transform ,FFT design ,Bitwidth-optimized ,energy-efficient ,Computer Science Applications ,Computer engineering ,Control and Systems Engineering ,Modelling and Simulation ,Frequency domain ,Bit error rate ,Time domain ,Electrical and Electronic Engineering ,business ,scaling information propagation ,Digital signal processing ,Data compression - Abstract
The Fast Fourier Transform (FFT) is an efficient algorithm widely used in digital signal processing to transform between the time domain and the frequency domain. For fixed-point VLSI implementations, dynamic range growth inevitably occurs at each stage of the FFT operation. However, current methods either waste bitwidth or consume excessive resources when dealing with the dynamic range growth issue. To address this issue, we propose an efficient scaling method called Scaling Information Propagation (SIP) to alleviate the problem of dynamic range growth, which makes full use of the bitwidth with much less extra area consumed than the state-of-the-art solutions. In two consecutive transform operations, the SIP method extracts scaling information and makes scaling decisions in the former transform, then executes those in the latter one. We implement the FFT's VLSI architecture in the orthogonal frequency division multiplexing (OFDM) and the holographic video compression (HVC) systems to verify the SIP method. Compared to the state-of-the-art, experimental results after VLSI synthesis show that our method achieves 9.38% energy reduction and 8.36% area savings when requiring a 1.02 × 10⁻⁷ bit error ratio (BER) in the OFDM system, and 33.47% energy reduction and 30.98% area savings when requiring a 20 dB signal-to-noise ratio (SNR) in the HVC system, respectively.
- Published
- 2021
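Entry 25 above addresses per-stage dynamic range growth in fixed-point FFTs. The SIP method itself propagates scaling decisions from one transform to the next; the sketch below only shows the conventional baseline it improves on, a radix-2 FFT with per-stage block scaling whose shift decisions are recorded and could be reused. The function name and the 16-bit-style limit are assumptions.

```python
import numpy as np

def fft_with_stage_scaling(x, frac_bits=15):
    """Radix-2 DIT FFT with per-stage block scaling: before each stage the
    whole block is divided by 2 if a butterfly could overflow the fixed-point
    range, and the shift is recorded.  SIP-style schemes reuse such recorded
    decisions in the next transform instead of re-deriving them on the fly."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    stages = int(np.log2(n))
    # bit-reversal permutation of the input
    rev = np.array([int(format(i, f"0{stages}b")[::-1], 2) for i in range(n)])
    x = x[rev]
    limit = 2 ** frac_bits
    shifts = []
    for s in range(stages):
        # conservative overflow check: a butterfly can double magnitudes
        if max(np.max(np.abs(x.real)), np.max(np.abs(x.imag))) >= limit / 2:
            x = x / 2
            shifts.append(1)
        else:
            shifts.append(0)
        m = 2 ** (s + 1)
        w = np.exp(-2j * np.pi * np.arange(m // 2) / m)
        for k in range(0, n, m):
            top = x[k:k + m // 2].copy()
            bot = x[k + m // 2:k + m] * w
            x[k:k + m // 2] = top + bot
            x[k + m // 2:k + m] = top - bot
    return x, shifts

x = np.random.default_rng(2).integers(-2**14, 2**14, 64).astype(complex)
X, shifts = fft_with_stage_scaling(x)
print("per-stage shifts:", shifts)
print("max error vs numpy fft:", np.max(np.abs(X * 2 ** sum(shifts) - np.fft.fft(x))))
```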
26. High-Performance Logic-on-Memory Monolithic 3-D IC Designs for Arm Cortex-A Processors
- Author
-
Xiaoqing Xu, Lennart Bamberg, Kyungwook Chang, Manu Komalan, Francky Catthoor, Alberto Garcia-Ortiz, Dragomir Milojevic, Sung Kyu Lim, Sai Pentapati, Brian Cline, Lingjun Zhu, and Saurabh Sinha
- Subjects
Technology ,Computer science ,Integrated circuit ,law.invention ,Engineering ,law ,Electrical and Electronic Engineering ,Physical design ,Computer Science, Hardware & Architecture ,Register-transfer level ,Science & Technology ,business.industry ,physical design ,Power integrity ,Engineering, Electrical & Electronic ,Monolithic 3-D (M3-D) ,ARM architecture ,power delivery network (PDN) ,Hardware and Architecture ,power integrity ,Logic gate ,Embedded system ,Computer Science ,Cache ,Routing (electronic design automation) ,business ,METHODOLOGY ,Software ,thermal analysis - Abstract
Monolithic 3-D IC (M3-D) is a promising solution to improve the performance and energy-efficiency of modern processors. But, designers are faced with challenges in design tools and methodologies, especially for power and thermal verifications. We developed a new physical design flow that optimally places and routes cache modules in one tier and logic gates in the other. Our tool also builds high-quality clock and power delivery networks targeting logic-on-memory M3-D designs. Finally, we developed a sign-off analysis tool flow to evaluate power, performance, area (PPA), thermal, and voltage-drop quality for given M3-D designs. Using our complete register transfer level (RTL)-to-Graphic Design System (GDS) tool flow, we designed commercial quality 2-D and M3-D implementation of Arm Cortex-A7 and Cortex-A53 processors in a commercial 28-nm technology. Experimental results show that our 3-D processors offer 20% (A7) and 21% (A53) performance gain, compared with their 2-D commercial counterparts. The voltage-drop degradation of our 3-D Cortex-A7 and Cortex-A53 processors is less than 3% of the supply voltage, while temperature increase is 10.71 °C and 13.04 °C, respectively.
- Published
- 2021
27. Failure probability of a FinFET-based SRAM cell utilizing the most probable failure point
- Author
-
Dimitrios Soudris, Michail Noltsis, Francky Catthoor, Eleni Maragkoudaki, and Dimitrios Rodopoulos
- Subjects
Technology ,Correctness ,Scale (ratio) ,Computer science ,Reliability (computer networking) ,Monte Carlo method ,02 engineering and technology ,01 natural sciences ,law.invention ,Most probable failure point (MPFP) ,Engineering ,law ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Point (geometry) ,Electrical and Electronic Engineering ,Computer Science, Hardware & Architecture ,Monte Carlo ,010302 applied physics ,Science & Technology ,Transistor ,Process (computing) ,Engineering, Electrical & Electronic ,Reliability ,SRAM ,020202 computer hardware & architecture ,Reliability engineering ,Hardware and Architecture ,Computer Science ,Software ,Downscaling - Abstract
© 2018 Elsevier B.V. Application requirements, along with the unceasing demand for an ever-higher scale of device integration, have driven technology towards an aggressive downscaling of transistor dimensions. This development is confronted with variability challenges, mainly the growing susceptibility to time-zero and time-dependent variations. To model such threats and estimate their impact on a system's operation, the reliability community has focused largely on Monte Carlo-based simulations and methodologies. When assessing yield and failure probability metrics, an essential part of the process is to accurately capture the lower tail of a distribution. Nevertheless, widely-used Monte Carlo techniques have been shown to be incapable of achieving this, and recently, state-of-the-art methodologies focusing on a Most Probable Failure Point (MPFP) approach have been presented. However, to strictly prove the correctness of such approaches and utilize them at large scale, an examination of the concavity of the space under study is essential. To this end, we develop an MPFP methodology to estimate the failure probability of a FinFET-based SRAM cell, studying the concavity of the Static Noise Margin (SNM) while comparing the results against a Monte Carlo methodology. Published in Integration, the VLSI Journal, vol. 69, pp. 111-119.
- Published
- 2019
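Entry 27 above estimates SRAM failure probability from the Most Probable Failure Point rather than from brute-force Monte Carlo sampling of the distribution tail. The sketch below illustrates the underlying first-order idea on a toy, linear limit-state in a standardized parameter space; it is not the paper's SNM-based methodology, and the function names and numbers are assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def failure_probability_mpfp(limit_state, n_dims):
    """First-order MPFP estimate: find the point on the failure boundary
    g(u) = 0 closest to the origin of standard-normal space u; the failure
    probability is then approximately Phi(-beta), with beta that distance."""
    res = minimize(lambda u: float(np.dot(u, u)),        # squared distance to origin
                   x0=np.ones(n_dims),
                   constraints={"type": "eq", "fun": limit_state},
                   method="SLSQP")
    beta = np.sqrt(res.fun)
    return norm.cdf(-beta), res.x

# Toy limit-state: the cell "fails" (noise margin <= 0) when a linear
# combination of two normalized variation parameters exceeds a threshold.
g = lambda u: 4.0 - (1.5 * u[0] + 1.0 * u[1])   # g(u) <= 0 means failure
p_fail, mpfp = failure_probability_mpfp(g, n_dims=2)
print("MPFP:", mpfp, " P_fail ~", p_fail)

# Brute-force Monte Carlo cross-check (impractically expensive for real tails).
u = np.random.default_rng(1).standard_normal((2_000_000, 2))
print("Monte Carlo P_fail ~", np.mean(1.5 * u[:, 0] + 1.0 * u[:, 1] >= 4.0))
```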
28. Experimental Validation of Process-Induced Variability Aware SPICE Simulation Platform for Sub-20 nm FinFET Technologies
- Author
-
F. M. Bufler, Neha Sharan, Doyoung Jang, Bertrand Parvais, Francky Catthoor, Udayan Ganguly, Amita Rawat, and Thomas Chiarella
- Subjects
010302 applied physics ,Discrete mathematics ,Physics ,Spice ,Autocorrelation ,Sigma ,Lambda ,01 natural sciences ,Subthreshold slope ,Square (algebra) ,Electronic, Optical and Magnetic Materials ,Distribution (mathematics) ,Logic gate ,0103 physical sciences ,Electrical and Electronic Engineering - Abstract
We propose an experimentally validated, physics-based, process-induced variability (PIV) aware SPICE simulation framework, enabling the estimation of performance variation due to line-edge roughness (LER), metal-gate granularity (MGG), random dopant fluctuation (RDF), and oxide thickness variation (OTV) in sub-20 nm technology node devices. The framework takes as inputs the LER, RDF, OTV, and MGG defining parameters, such as the fin-edge correlation coefficient ($\rho$), autocorrelation length ($\Lambda$), grain size (GS), and $\sigma[\mathrm{EOT}]$, and produces an $I_d$–$V_g$ distribution with an ensemble size of 250 as output. We have validated the framework against 14 nm FinFET experimental data for the $I_d$–$V_g$ trends as well as for the threshold-voltage ($V_T$), ON-current ($I_{ON}$), and subthreshold slope (SS) distributions for a range of device dimensions, with a reasonably good match. The worst- and best-case $R^2$ errors for the validation are 0.64 and 0.98, respectively. The nature of the proposed framework allows designers to use it for a vast range of process technologies. Such models are of dual importance: they enable a PIV-aware prediction of circuit-level performance and provide a platform to estimate PIV parameters efficiently, on par with sophisticated structural characterization tools.
- Published
- 2021
29. Dynamic Reliability Management in Neuromorphic Computing
- Author
-
Adarsha Balaji, Jeffrey L. Krichmar, Anup Das, Nagarajan Kandasamy, Shihao Song, Nikil Dutt, Francky Catthoor, and Jui Hanamshet
- Subjects
Spiking neural network ,Dynamic reliability management ,FOS: Computer and information sciences ,Computer science ,020208 electrical & electronic engineering ,Computer Science - Neural and Evolutionary Computing ,02 engineering and technology ,020202 computer hardware & architecture ,Neuromorphic engineering ,Computer architecture ,Hardware and Architecture ,Hardware Architecture (cs.AR) ,0202 electrical engineering, electronic engineering, information engineering ,Neural and Evolutionary Computing (cs.NE) ,Electrical and Electronic Engineering ,Computer Science - Hardware Architecture ,Software - Abstract
Neuromorphic computing systems use non-volatile memory (NVM) to implement high-density and low-energy synaptic storage. The elevated voltages and currents needed to operate NVMs cause aging of the CMOS-based transistors in each neuron and synapse circuit in the hardware, drifting the transistors' parameters away from their nominal values. Aggressive device scaling increases power density and temperature, which accelerates the aging, challenging the reliable operation of neuromorphic systems. Existing reliability-oriented techniques periodically de-stress all neuron and synapse circuits in the hardware at fixed intervals, assuming worst-case operating conditions, without actually tracking their aging at run time. To de-stress these circuits, normal operation must be interrupted, which introduces latency in spike generation and propagation, impacting the inter-spike interval and hence performance, e.g., accuracy. We propose a new architectural technique to mitigate the aging-related reliability problems in neuromorphic systems by designing an intelligent run-time manager (NCRTM), which dynamically de-stresses neuron and synapse circuits in response to the short-term aging in their CMOS transistors during the execution of machine learning workloads, with the objective of meeting a reliability target. NCRTM de-stresses these circuits only when it is absolutely necessary to do so, otherwise reducing the performance impact by scheduling de-stress operations off the critical path. We evaluate NCRTM with state-of-the-art machine learning workloads on neuromorphic hardware. Our results demonstrate that NCRTM significantly improves the reliability of neuromorphic hardware, with marginal impact on performance.
- Published
- 2021
30. A Classification of Memory-Centric Computing
- Author
-
Jintao Yu, Muath Abu Lebdeh, Said Hamdioui, Francky Catthoor, Mottaqiallah Taouil, and Hoang Anh Du Nguyen
- Subjects
Technology ,RANDOM-ACCESS MEMORY ,PROCESSOR ,Computer science ,Emerging technologies ,Computation ,Reliability (computer networking) ,Computation-in-memory ,Field (computer science) ,GeneralLiterature_MISCELLANEOUS ,memory-centric computer architectures ,Terminology ,Engineering ,DESIGN ,MEMRISTOR ,LOGIC ,Electrical and Electronic Engineering ,Architecture ,Nanoscience & Nanotechnology ,Computer Science, Hardware & Architecture ,GeneralLiterature_REFERENCE(e.g.,dictionaries,encyclopedias,glossaries) ,EMBEDDED DRAM DEVELOPMENT ,ARCHITECTURE ,Science & Technology ,CHALLENGES ,CHIP ,Engineering, Electrical & Electronic ,CMOS ,Computer architecture ,Hardware and Architecture ,Computer Science ,resistive computing ,Parallelism (grammar) ,Science & Technology - Other Topics ,Software - Abstract
Technological and architectural improvements have been constantly required to sustain the demand for faster and cheaper computers. However, CMOS down-scaling is suffering from three technology walls: the leakage wall, reliability wall, and cost wall. On top of that, the performance increase due to architectural improvements is also gradually saturating due to three well-known architecture walls: the memory wall, power wall, and instruction-level parallelism (ILP) wall. Hence, a lot of research is focusing on proposing and developing new technologies and architectures. In this article, we present a comprehensive classification of memory-centric computing architectures; it is based on three metrics: computation location, level of parallelism, and used memory technology. The classification not only provides an overview of existing architectures with their pros and cons but also unifies the terminology that uniquely identifies these architectures, and it highlights potential future architectures that can be further explored. Hence, it sets up a direction for future research in the field.
- Published
- 2020
31. Accurate Determination of Interlayer Resistivity of 2-D Layered Systems: Graphene Case Study
- Author
-
Azad Naeemi, Inge Asselberghs, Xiangyu Wu, Chenyun Pan, Zsolt Tokei, Francky Catthoor, and Ramy Nashed
- Subjects
Technology ,Materials science ,delay ,twisted graphene ,01 natural sciences ,law.invention ,Physics, Applied ,Engineering ,RAMAN-SPECTROSCOPY ,law ,Electrical resistivity and conductivity ,0103 physical sciences ,AB-stacked graphene ,Graphite ,Electrical and Electronic Engineering ,Twist ,FIELD ,010302 applied physics ,Science & Technology ,interconnect ,Phonon scattering ,Condensed matter physics ,Graphene ,Physics ,graphene ,Engineering, Electrical & Electronic ,Thermal conduction ,Electronic, Optical and Magnetic Materials ,interlayer resistivity ,Physical Sciences ,energy-delay product ,Bilayer graphene ,Order of magnitude - Abstract
In this article, we provide an accurate method to determine the interlayer resistivity of 2-D layered systems by directly measuring the resistance at a mono- to bi-layer step and feeding the measurement to a distributed resistance model. We take CVD-grown few-layer graphene (up to four layers) with different twist angles ranging from AB-stacked to totally decoupled graphene as an example. Our results show that the interlayer resistivity of AB-stacked CVD-grown bilayer graphene (BLG) is in the range of 50–140 Ω·m, which is two to five orders of magnitude greater than the previously reported values for AB-stacked graphite. On the other hand, twisted BLG shows an interlayer resistivity as low as 6 Ω·m, and it decreases monotonically with increasing the twist angle, suggesting that interlayer conduction is not limited by phonon scattering, as previously reported. Furthermore, the total resistance of twisted BLG was found to be about one order of magnitude lower than its AB-stacked counterpart, which might lead to lower delay and energy-delay product in twisted graphene interconnects. In addition to that, the universality of our approach allows for accurate determination of the interlayer resistivity of other 2-D layered systems such as metal dichalcogenides.
- Published
- 2020
32. Understanding Energy Efficiency Benefits of Carbon Nanotube Field-Effect Transistors for Digital VLSI
- Author
-
P. Schuddinck, Max M. Shulaker, Romain Ritzenthaler, Alessio Spessot, Dimitrios Rodopoulos, Chi-Shuen Lee, Praveen Raghavan, Aaron Thean, Peter Debacker, Luca Mattii, Francky Catthoor, Syed Muhammed Yasser Sherazi, Marie Garcia Bardon, D. Yakimets, Rogier Baert, Gage Hills, Subhasish Mitra, H.-S. Philip Wong, Doyoung Jang, Gerben Doornbos, and Iuliana Radu
- Subjects
Technology ,Materials science ,Materials Science ,Nanowire ,Materials Science, Multidisciplinary ,Hardware_PERFORMANCEANDRELIABILITY ,02 engineering and technology ,01 natural sciences ,Physics, Applied ,law.invention ,VIRTUAL-SOURCE MODEL ,Engineering ,law ,0103 physical sciences ,carbon nanotube field-effect transistor (CNFET) ,LENGTH ,Hardware_INTEGRATEDCIRCUITS ,CONTACTS ,Parasitic extraction ,Nanoscience & Nanotechnology ,Electrical and Electronic Engineering ,010302 applied physics ,Very-large-scale integration ,Science & Technology ,Physics ,Carbon nanotube (CNT) ,Transistor ,Engineering, Electrical & Electronic ,021001 nanoscience & nanotechnology ,Chip ,Engineering physics ,Computer Science Applications ,energy-efficient digital very-large-scale integrated (VLSI) circuits ,CAPACITANCE ,FETS ,Logic gate ,Physical Sciences ,Science & Technology - Other Topics ,Field-effect transistor ,0210 nano-technology ,Efficient energy use - Abstract
© 2018 IEEE. Carbon Nanotube Field-Effect Transistors (CNFETs) are highly promising to improve the energy efficiency of digital logic circuits. Here, we quantify the Very-Large-Scale Integrated (VLSI) circuit-level energy efficiency of CNFETs versus advanced technology options (ATOs) currently under consideration [e.g., silicon-germanium (SiGe) channels and progressing from today's FinFETs to gate-all-around nanowires/nanosheets]. We use industry-practice physical designs of digital VLSI processor cores in future technology nodes with millions of transistors (including effects from parasitics and interconnect wires) and technology parameters extracted from experimental data. Our analysis shows that CNFETs are projected to offer 9× energy-delay product (EDP) benefit (∼3× faster while simultaneously consuming ∼3× less energy) compared to Si/SiGe FinFET. The ATOs provide
- Published
- 2018
33. Impact and mitigation of SRAM read path aging
- Author
-
Mottaqiallah Taouil, Wim Dehaene, Francky Catthoor, Stefan Cosemans, Daniel Kraak, Said Hamdioui, Innocent Agbo, and Pieter Weckx
- Subjects
Bit-line swing ,Technology ,Computer science ,02 engineering and technology ,01 natural sciences ,Physics, Applied ,Engineering ,Memory cell ,0103 physical sciences ,SRAM sense amplifier ,Hardware_INTEGRATEDCIRCUITS ,0202 electrical engineering, electronic engineering, information engineering ,Electronic engineering ,Static random-access memory ,Nanoscience & Nanotechnology ,Electrical and Electronic Engineering ,Safety, Risk, Reliability and Quality ,SD ,010302 applied physics ,Science & Technology ,Voltage swing ,Sense amplifier ,Physics ,Engineering, Electrical & Electronic ,BTI ,Swing ,Condensed Matter Physics ,Atomic and Molecular Physics, and Optics ,020202 computer hardware & architecture ,Surfaces, Coatings and Films ,Electronic, Optical and Magnetic Materials ,Physical Sciences ,RELIABILITY ,SIMULATION ,Path (graph theory) ,Science & Technology - Other Topics ,Voltage ,Degradation (telecommunications) - Abstract
© 2018 Elsevier Ltd. This paper proposes an appropriate method to estimate and mitigate the impact of aging on the read path of a high performance SRAM design; it analyzes the impact on the memory cell and the sense amplifier (SA), and their interaction. The method considers different workloads and technology nodes, and inspects both the bit-line swing (BLS) (which reflects the degradation of the cell) and the sensing delay (SD) (which reflects the degradation of the sense amplifier); the voltage swing on the bit lines has a direct impact on the proper functionality of the sense amplifier. The results with respect to the quantification of the aging show, for the considered SRAM read-path design, that the cell degradation is marginal compared to that of the sense amplifier, while the SD degradation strongly depends on the workload, supply voltage, temperature, and technology node (up to 41% degradation). The mitigation schemes, one targeting the cell and one the sense amplifier, confirm the same and show that sense amplifier mitigation (up to 15.2% improvement) is more effective for the SRAM read path than cell mitigation (up to 11.4% improvement). Published in Microelectronics Reliability, vol. 87, pp. 158-167.
- Published
- 2018
34. Runtime Slack Creation for Processor Performance Variability using System Scenarios
- Author
-
Michail Noltsis, Nikolaos Zompakis, Francky Catthoor, Dimitrios Rodopoulos, and Dimitrios Soudris
- Subjects
010302 applied physics ,Serviceability (computer) ,Computer science ,Reliability (computer networking) ,Linux kernel ,02 engineering and technology ,01 natural sciences ,Computer Graphics and Computer-Aided Design ,020202 computer hardware & architecture ,Computer Science Applications ,Reliability engineering ,Control theory ,Component (UML) ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Dependability ,Electrical and Electronic Engineering ,Frequency scaling ,Efficient energy use - Abstract
Modern microprocessors contain a variety of mechanisms used to mitigate errors in the logic and memory, referred to as Reliability, Availability, and Serviceability (RAS) techniques. Many of these techniques, such as component disabling, come at a performance cost. With the aggressive downscaling of device dimensions, it is reasonable to expect that chip-wide error rates will intensify in the future and perhaps vary throughout the system lifetime. As a result, it is important to reclaim the temporal RAS overheads in a systematic way and enable dependable performance. The current article presents a closed-loop control scheme that actuates the processor's frequency based on detected timing interference to ensure performance dependability. The concepts of slack and deadline vulnerability factor are introduced to support the formulation of a discrete-time control problem. Default application timing is derived using the system scenario methodology, the applicability of which is demonstrated through simulations. Additionally, the proposed concept is demonstrated on a real platform and application: a Proportional-Integral-Differential controller, implemented within the application, actuates the Dynamic Voltage and Frequency Scaling (DVFS) framework of the Linux kernel to effectively reclaim temporal overheads injected at runtime. The current article discusses the responsiveness and energy efficiency of the proposed performance dependability scheme. Finally, additional formulation is introduced to predict the upper bound of timing interference that can be absorbed by actuating the DVFS of any processor, and this is also validated on a representative reduction to practice.
- Published
- 2017
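Entry 34 above closes a control loop around observed timing slack and actuates the Linux DVFS framework through a PID controller. The snippet below is a generic sketch of such a loop, not the paper's controller: the class name, gains, and frequency limits are all invented, and the output would normally be written to a cpufreq interface rather than printed.

```python
class SlackPidController:
    """Toy PID loop turning observed timing slack (seconds) into a new CPU
    frequency request, in the spirit of a userspace DVFS governor."""

    def __init__(self, kp, ki, kd, f_min_hz, f_max_hz):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.f_min, self.f_max = f_min_hz, f_max_hz
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, target_slack_s, measured_slack_s, f_current_hz, dt_s):
        # Positive error = less slack than required, so raise the frequency.
        error = target_slack_s - measured_slack_s
        self.integral += error * dt_s
        derivative = (error - self.prev_error) / dt_s
        self.prev_error = error
        delta_hz = self.kp * error + self.ki * self.integral + self.kd * derivative
        return min(self.f_max, max(self.f_min, f_current_hz + delta_hz))

# One control step: the time-critical task finished with 2 ms of slack
# instead of the 5 ms target, so a higher frequency is requested.
pid = SlackPidController(kp=2e11, ki=1e10, kd=0.0, f_min_hz=8e8, f_max_hz=2.4e9)
print(pid.update(target_slack_s=5e-3, measured_slack_s=2e-3, f_current_hz=1.2e9, dt_s=1e-2))
```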
35. Impact and Mitigation of Sense Amplifier Aging Degradation Using Realistic Workloads
- Author
-
Francky Catthoor, Daniel Kraak, Said Hamdioui, Innocent Agbo, Stefan Cosemans, Mottaqiallah Taouil, and Pieter Weckx
- Subjects
010302 applied physics ,Engineering ,Input offset voltage ,Sense amplifier ,business.industry ,Workload ,02 engineering and technology ,01 natural sciences ,020202 computer hardware & architecture ,Process variation ,Hardware and Architecture ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Benchmark (computing) ,Static random-access memory ,Electrical and Electronic Engineering ,business ,Software ,Simulation ,Degradation (telecommunications) ,Voltage - Abstract
Designers typically add design margins to compensate for time-zero variability (due to process variation) and time-dependent variability (due to, e.g., bias temperature instability). These variabilities become worse with scaling, which leads to larger design margin requirements. As an alternative, mitigation schemes can be applied to counteract the variability. This paper investigates the impact of aging on the offset voltage of the memory's sense amplifier (SA). For the analysis, the degradation of the SAs in the L1 data and instruction caches of an ARM processor is quantified while using realistic workloads extracted from the SPEC CPU2006 benchmark suite. Furthermore, the effect of our mitigation scheme, i.e., an online control circuit that balances the SA workload, is analyzed. The simulation results show that the mitigation scheme reduces the offset voltage degradation due to aging by up to 40% for the benchmarks, depending on the stress conditions (temperature, voltage, and workload).
- Published
- 2017
36. Will Chips of the Future Learn How to Feel Pain and Cure Themselves?
- Author
-
Guido Groeseneken and Francky Catthoor
- Subjects
010302 applied physics ,Engineering ,business.industry ,Design systems ,02 engineering and technology ,Transistor scaling ,Computer security ,computer.software_genre ,01 natural sciences ,020202 computer hardware & architecture ,Hardware and Architecture ,Embedded system ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Redundancy (engineering) ,Electrical and Electronic Engineering ,business ,computer ,Know-how ,Software - Abstract
Extended transistor scaling has brought us a lot of benefits, but also a myriad of problems, including severe reliability issues [1]. To extend the scaling path as far as possible, system architects and technologists have to work together. They have to find solutions—e.g., at system level—to realize self-healing chips, chips that can detect or "feel" where errors occur and that know how to deal with them or in a way "cure" them. Only then will it be feasible to design systems in technologies with transistors scaled down to 5 nm dimensions.
- Published
- 2017
37. Integral Impact of BTI, PVT Variation, and Workload on SRAM Sense Amplifier
- Author
-
Praveen Raghavan, Innocent Agbo, Said Hamdioui, Pieter Weckx, Francky Catthoor, Halil Kukner, Mottaqiallah Taouil, and Daniel Kraak
- Subjects
010302 applied physics ,Engineering ,Negative-bias temperature instability ,business.industry ,Sense amplifier ,02 engineering and technology ,01 natural sciences ,020202 computer hardware & architecture ,Reliability (semiconductor) ,Hardware and Architecture ,0103 physical sciences ,MOSFET ,0202 electrical engineering, electronic engineering, information engineering ,Electronic engineering ,Node (circuits) ,Static random-access memory ,Electrical and Electronic Engineering ,business ,Software ,Degradation (telecommunications) ,Voltage - Abstract
Over the past decades, CMOS technology scaling has faced severe variability and reliability challenges. One of the major reliability challenges is bias temperature instability (BTI). This paper analyzes the impact of BTI on the sensing delay of the standard latch-type sense amplifier (SA), which is one of the critical components of high performance memories; the analysis is done by incorporating the impact of process, voltage, and temperature variations (in order to investigate the severity of the integral impact) and by considering different workloads and four technology nodes (i.e., 45, 32, 22, and 16 nm). The results show the importance of taking the SA degradation into consideration for robust memory design; the SA degradation depends on the application and technology node, and the sensing delay can increase by 184.58% under worst-case conditions at 16 nm. The results also show that the BTI impact for nominal conditions at 16 nm reaches a 12.10% delay increment. On top of that, when extrinsic conditions are considered, the degradation can reach up to 168.45% at 398 K for 16 nm.
- Published
- 2017
38. Parameterized dataflow scenarios
- Author
-
Sverre Hendseth, Francky Catthoor, Mladen Skelin, and Marc Geilen
- Subjects
010302 applied physics ,worst-case performance ,Theoretical computer science ,Dataflow ,Computer science ,Parameterized complexity ,02 engineering and technology ,Parallel computing ,max-plus algebra ,parameterized dataflow scenarios ,01 natural sciences ,Computer Graphics and Computer-Aided Design ,parameterized dataflow ,020202 computer hardware & architecture ,Automaton ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Finite-state machine (FSM) ,Concurrent computing ,Electrical and Electronic Engineering ,Software ,Dataflow architecture ,synchronous dataflow (SDF) - Abstract
A number of modeling approaches combining dataflow and finite-state machines (FSMs) have been proposed to capture applications that combine streaming data with finite control. FSM-based scenario-aware dataflow (FSM-SADF) is such an FSM/dataflow hybrid that occupies a sweet spot in the tradeoff between analyzability and expressiveness. However, the model suffers from compactness issues when the number of scenarios increases. This hampers its use in the analysis of applications exposing high levels of data-dependent dynamics. In this paper, we address this problem by combining parameterized dataflow with the finite control of FSM-SADF. We refer to the generalization as FSM-based parameterized SADF (FSM-$\pi$SADF). We introduce the formal semantics of the model in terms of max-plus algebra and in particular max-plus automata. Thereafter, by leveraging the existing results of FSM-SADF, we propose a worst-case performance analysis framework for FSM-$\pi$SADF. We show that by using FSM-$\pi$SADF and its analysis framework, one can, unlike with FSM-SADF, compactly capture streaming applications exhibiting high levels of data-dependent dynamics in the presence of finite control. Furthermore, we show that for practical models our analysis typically yields tighter bounds on worst-case performance indicators such as throughput and latency than the existing techniques based on conservative FSM-SADF modeling (if such modeling can be applied at all). We evaluate our approach on a realistic case study from the multimedia domain.
- Published
- 2017
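Entry 38 above gives the model's semantics in max-plus algebra, where actor firings compose by taking maxima of completion times and adding delays. The sketch below shows the basic max-plus matrix-vector step for a two-token toy graph with two invented scenario matrices; it is only an illustration of the algebra, not the FSM-$\pi$SADF analysis itself.

```python
import numpy as np

NEG_INF = -np.inf  # max-plus "zero": no dependency along this edge

def maxplus_matvec(a, x):
    """Max-plus product (A ⊗ x)_i = max_j (A_ij + x_j): A_ij is the delay
    from the availability of token j to the production of token i in one
    iteration, and x holds the current token availability times."""
    return np.max(a + x[None, :], axis=1)

# Invented scenario matrices for a two-token graph (one per execution mode).
scenario_matrix = {
    "lo": np.array([[2.0, NEG_INF],
                    [3.0, 1.0]]),
    "hi": np.array([[5.0, 2.0],
                    [6.0, 4.0]]),
}

x = np.zeros(2)  # initial token timestamps
for scenario in ["lo", "lo", "hi", "lo"]:   # a word accepted by the scenario FSM
    x = maxplus_matvec(scenario_matrix[scenario], x)
print("token availability times after 4 iterations:", x)
```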
39. A MATLAB Vectorizing Compiler Targeting Application-Specific Instruction Set Processors
- Author
-
Christakis Lezos, Konstantinos Masselos, Hans Cappelle, Karthick Parashar, Ioannis Latifis, Francky Catthoor, and Grigoris Dimitroulakos
- Subjects
Programming language ,Computer science ,Inline expansion ,020207 software engineering ,02 engineering and technology ,Parallel computing ,computer.software_genre ,Computer Graphics and Computer-Aided Design ,Dead code elimination ,020202 computer hardware & architecture ,Computer Science Applications ,Functional compiler ,Threaded code ,Compiler construction ,0202 electrical engineering, electronic engineering, information engineering ,Interprocedural optimization ,Compiler ,Hardware_CONTROLSTRUCTURESANDMICROPROGRAMMING ,Electrical and Electronic Engineering ,computer ,Compiler correctness - Abstract
This article discusses a MATLAB-to-C vectorizing compiler that exploits custom instructions, for example, instructions for Single Instruction Multiple Data (SIMD) processing and for complex arithmetic present in Application-Specific Instruction Set Processors (ASIPs). Custom instructions are represented via specialized intrinsic functions in the generated code, and the generated code can be used as input to any C/C++ compiler supporting the target processor. Furthermore, the specialized instruction set of the target processor is described in a parameterized way using a target-processor-independent architecture description approach, thus allowing the support of any processor. The compiler has been used for the generation of application code for two different ASIPs for several benchmarks. The code generated by the compiler achieves a speedup between 2× and 74× and between 2× and 97× compared to the code generated by the MathWorks MATLAB-to-C compiler. Experimental results also prove that the compiler efficiently exploits SIMD custom instructions, achieving a 3.3× speedup compared to cases where no SIMD processing is used. Thus the compiler can be employed to reduce the development time/effort/cost and time to market by raising the abstraction of application design in an embedded systems/system-on-chip development context.
- Published
- 2017
40. Mapping Spiking Neural Networks to Neuromorphic Hardware
- Author
-
Nikil Dutt, Siebren Schaafsma, Giacomo Indiveri, Khanh Huynh, Yuefeng Wu, Anup Das, Francky Catthoor, Francesco Dell'Anna, Jeffrey L. Krichmar, and Adarsha Balaji
- Subjects
FOS: Computer and information sciences ,Technology ,Computer Science - Machine Learning ,Operating Systems (cs.OS) ,Computer science ,Computer Science - Emerging Technologies ,02 engineering and technology ,Parallel computing ,Machine Learning (cs.LG) ,Synapse ,Computer Science - Operating Systems ,Engineering ,Hardware ,0202 electrical engineering, electronic engineering, information engineering ,Interspike interval (ISI) ,PLASTICITY ,Electrical and Electronic Engineering ,Computer Science, Hardware & Architecture ,Cluster analysis ,Neuromorphics ,Metaheuristic ,10194 Institute of Neuroinformatics ,Neurons ,Spiking neural network ,Science & Technology ,1708 Hardware and Architecture ,2208 Electrical and Electronic Engineering ,Engineering, Electrical & Electronic ,Distortion ,Energy consumption ,neuromorphic computing ,020202 computer hardware & architecture ,1712 Software ,Emerging Technologies (cs.ET) ,STATES ,Hardware and Architecture ,Design methodology ,Computer Science ,Synapses ,570 Life sciences ,biology ,spiking neural network (SNN) ,Software - Abstract
Neuromorphic hardware platforms implement biological neurons and synapses to execute spiking neural networks (SNNs) in an energy-efficient manner. We present SpiNeMap, a design methodology to map SNNs to crossbar-based neuromorphic hardware, minimizing spike latency and energy consumption. SpiNeMap operates in two steps: SpiNeCluster and SpiNePlacer. SpiNeCluster is a heuristic-based clustering technique to partition SNNs into clusters of synapses, where intra-cluster local synapses are mapped within crossbars of the hardware and inter-cluster global synapses are mapped to the shared interconnect. SpiNeCluster minimizes the number of spikes on global synapses, which reduces spike congestion on the shared interconnect, improving application performance. SpiNePlacer then finds the best placement of local and global synapses on the hardware using a meta-heuristic-based approach to minimize energy consumption and spike latency. We evaluate SpiNeMap using synthetic and realistic SNNs on the DynapSE neuromorphic hardware. We show that SpiNeMap reduces average energy consumption by 45% and average spike latency by 21%, compared to state-of-the-art techniques.
- Published
- 2019
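Entry 40 above partitions an SNN so that heavily communicating neurons share a crossbar and the number of spikes crossing the shared interconnect is minimized. The sketch below is a deliberately simplistic greedy stand-in for that idea, not SpiNeCluster: the traffic matrix, capacity, and function names are all invented.

```python
import numpy as np

def greedy_spike_clustering(spikes, capacity):
    """Greedily place each neuron (busiest first) into the non-full cluster
    with which it already exchanges the most spikes, so that high-traffic
    synapses stay local to one crossbar."""
    order = np.argsort(-spikes.sum(axis=1)).tolist()
    clusters = []
    for neuron in order:
        candidates = [c for c in clusters if len(c) < capacity]
        gains = [sum(spikes[neuron, m] + spikes[m, neuron] for m in c) for c in candidates]
        if candidates and max(gains) > 0:
            candidates[int(np.argmax(gains))].append(neuron)
        else:
            clusters.append([neuron])
    return clusters

def intercluster_spikes(spikes, clusters):
    """Spikes that must be routed over the shared interconnect."""
    label = {m: k for k, c in enumerate(clusters) for m in c}
    n = spikes.shape[0]
    return sum(spikes[i, j] for i in range(n) for j in range(n) if label[i] != label[j])

rng = np.random.default_rng(3)
traffic = rng.integers(0, 50, size=(12, 12))
np.fill_diagonal(traffic, 0)
clusters = greedy_spike_clustering(traffic, capacity=4)
print("clusters:", clusters)
print("inter-cluster spikes:", intercluster_spikes(traffic, clusters))
```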
41. Sense amplifier offset voltage analysis for both time-zero and time-dependent variability
- Author
-
Innocent Agbo, Wim Dehaene, Daniel Kraak, Francky Catthoor, Said Hamdioui, Mottaqiallah Taouil, Stefan Cosemans, Praveen Raghavan, and Pieter Weckx
- Subjects
010302 applied physics ,Time zero ,Input offset voltage ,Computer science ,Sense amplifier ,020208 electrical & electronic engineering ,Workload ,02 engineering and technology ,Hardware_PERFORMANCEANDRELIABILITY ,Condensed Matter Physics ,Local variation ,01 natural sciences ,Process corners ,Atomic and Molecular Physics, and Optics ,Surfaces, Coatings and Films ,Electronic, Optical and Magnetic Materials ,Quality (physics) ,Control theory ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Hardware_INTEGRATEDCIRCUITS ,Electrical and Electronic Engineering ,Safety, Risk, Reliability and Quality ,Voltage - Abstract
© 2019 This paper presents an accurate technique to extensively analyze the impact of time-zero (i.e., global and local variation) and time-dependent (i.e., voltage, temperature, workload, and aging) variation on the offset voltage specification of a memory sense amplifier design using 45 nm predictive technology model (PTM) high performance library. The results show that increasing the supply voltage both for time-zero and time-dependent reduces the offset voltage specification marginally, irrespective of the process corners. In contrast, the offset voltage specification is very sensitive to the temperature and the workload, i.e., the applied voltage patterns. The results also show that a balanced workload results in a significantly lower offset voltage specification. The above results can be used to estimate the required offset voltage accurately for a given lifetime, and operational conditions such as workload, temperature, and voltage; hence, enable the designer to take appropriate measures for a high quality, robust, optimal and reliable design. ispartof: MICROELECTRONICS RELIABILITY vol:99 pages:52-61 status: published
- Published
- 2019
42. Power-Accuracy Trade-Offs for Heartbeat Classification on Neural Networks Hardware
- Author
-
Adarsha Balaji, Federico Corradi, Anup Das, Sandeep Pande, Francky Catthoor, and Siebren Schaafsma
- Subjects
Technology ,Spiking Neural Network (SNN) ,SPIKING NEURONS ,Science & Technology ,Artificial neural network ,Heartbeat ,Computer science ,business.industry ,Trade offs ,020206 networking & telecommunications ,Engineering, Electrical & Electronic ,02 engineering and technology ,TRANSFORM ,Heartbeat Classification ,020202 computer hardware & architecture ,Power (physics) ,Engineering ,Convolution Neural Network (CNN) ,0202 electrical engineering, electronic engineering, information engineering ,Electrical and Electronic Engineering ,business ,Computer hardware ,SYSTEM - Abstract
© 2018 American Scientific Publishers. All rights reserved. Heartbeat classification using electrocardiogram (ECG) data is an essential feature of modern-day wearable devices. State-of-the-art machine learning-based heartbeat classifiers are designed using convolutional neural networks (CNNs). Despite their high classification accuracy, CNNs require significant computational resources and power. This makes the mapping of CNNs on resource- and power-constrained wearable devices challenging. In this paper, we propose heartbeat classification using spiking neural networks (SNNs), an alternative approach based on biologically inspired, event-driven neural networks. SNNs compute and transfer information using discrete spikes that require fewer operations and less complex hardware resources, making them energy-efficient compared to CNNs. However, due to the complex error backpropagation involving spikes, supervised learning of deep SNNs remains challenging. We propose an alternative approach to SNN-based heartbeat classification. We start with an optimized CNN implementation of the heartbeat classification task and then convert the CNN operations, such as multiply-accumulate, pooling, and softmax, into spiking equivalents with a minimal loss of accuracy. We evaluate the SNN-based heartbeat classification using the publicly available ECG database of the Massachusetts Institute of Technology and Beth Israel Hospital (MIT/BIH), and demonstrate a minimal loss in accuracy when compared to the 85.92% accuracy of a CNN-based heartbeat classification. We demonstrate that, for every operation, the activation of SNN neurons in each layer is sparse when compared to CNN neurons in the same layer. We also show that this sparsity increases with an increase in the number of layers of the neural network. In addition, we detail the power-accuracy trade-off of the SNN and show a 87.76% and 96.82% reduction in SNN neuron and synapse activity, respectively, for an accuracy loss ranging between 0.6% and 1.00%, when compared to a CNN-only implementation. Published in Journal of Low Power Electronics, vol. 14, no. 4, pp. 508-519.
- Published
- 2018
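The CNN-to-SNN conversion described in the record above can be illustrated with a minimal rate-coding sketch: a trained ReLU layer is reused as-is, and its activations are approximated by the firing rates of non-leaky integrate-and-fire neurons driven by Poisson-like input spikes. This is a generic illustration of the conversion principle; the weights, input rates, threshold, and timestep count are placeholders, and the authors' actual conversion flow is not reproduced.

```python
import numpy as np

# Hedged sketch of rate-based CNN-to-SNN conversion for one fully connected layer.
# A ReLU unit is approximated by a non-leaky integrate-and-fire (IF) neuron whose
# firing rate over T timesteps tracks the (suitably scaled) ReLU output.
# Weights, input rates, threshold, and T are illustrative placeholders.

rng = np.random.default_rng(0)

def run_if_layer(weights, input_rates, timesteps=500, threshold=1.0):
    """Return output firing rates of an IF layer driven by Poisson-like inputs."""
    n_out, n_in = weights.shape
    membrane = np.zeros(n_out)
    spike_counts = np.zeros(n_out)
    for _ in range(timesteps):
        in_spikes = (rng.random(n_in) < input_rates).astype(float)  # rate coding
        membrane += weights @ in_spikes          # reuse trained CNN weights as-is
        fired = membrane >= threshold
        spike_counts += fired
        membrane[fired] -= threshold             # reset by subtraction
    return spike_counts / timesteps

w = rng.normal(0.0, 0.3, size=(4, 8))            # stand-in for trained weights
x = rng.random(8) * 0.5                          # stand-in for input firing rates
print("ReLU output (per timestep):", np.maximum(w @ x, 0.0))
print("IF firing rates           :", run_if_layer(w, x))
```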
43. The defect-centric perspective of device and circuit reliability—From gate oxide defects to circuits
- Author
-
Guido Groeseneken, Francky Catthoor, Michael Waltl, Marko Simicic, Pieter Weckx, Jacopo Franco, Gerhard Rzepa, Dimitri Linten, Erik Bury, Tibor Grasser, Bertrand Parvais, Moon Ju Cho, V. Putcha, Ben Kaczer, Robin Degraeve, Peter Debacker, Wolfgang Goes, Praveen Raghavan, and Philippe Roussel
- Subjects
010302 applied physics ,Computer science ,Circuit design ,Perspective (graphical) ,Hardware_PERFORMANCEANDRELIABILITY ,02 engineering and technology ,Condensed Matter Physics ,Circuit reliability ,01 natural sciences ,020202 computer hardware & architecture ,Electronic, Optical and Magnetic Materials ,Reliability (semiconductor) ,Gate oxide ,0103 physical sciences ,Hardware_INTEGRATEDCIRCUITS ,0202 electrical engineering, electronic engineering, information engineering ,Materials Chemistry ,Electronic engineering ,Electrical and Electronic Engineering ,Electronic circuit - Abstract
As-fabricated (time-zero) variability and mean device aging are nowadays routinely considered in circuit simulation and design. Time-dependent (reliability-related) variability is an emerging concern that needs to be considered in circuit design as well. In deeply scaled devices this phenomenon is best understood within the so-called defect-centric picture, in terms of an ensemble of individual defects. The properties of gate oxide defects are discussed, and it is shown how their electrical properties in particular can be used to construct time-dependent variability distributions and how these can be propagated up to transistor-level circuits. (A small numerical sketch of the defect-centric distribution follows this record.)
- Published
- 2016
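The defect-centric picture mentioned above has a compact numerical form: each device carries a Poisson-distributed number of active oxide defects, and each defect contributes an approximately exponentially distributed threshold-voltage step. The sketch below draws device-level ΔVth samples from that compound distribution; the mean defect count and mean step height are illustrative placeholders, not extracted data.

```python
import numpy as np

# Hedged sketch of the defect-centric dVth distribution: each device has a Poisson
# number of active oxide defects, and each defect adds an exponentially distributed
# threshold-voltage step. mean_defects and eta_mV are illustrative placeholders.

rng = np.random.default_rng(42)

def delta_vth_samples(n_devices, mean_defects, eta_mV):
    """Per-device threshold-voltage shifts (mV) in the defect-centric model."""
    counts = rng.poisson(mean_defects, n_devices)
    return np.array([rng.exponential(eta_mV, k).sum() for k in counts])

shifts = delta_vth_samples(50_000, mean_defects=4.0, eta_mV=3.0)
print("mean dVth   :", shifts.mean(), "mV (expected ~ mean_defects * eta)")
print("variance    :", shifts.var(), "mV^2 (expected ~ 2 * mean_defects * eta^2)")
print("99.9th pctl :", np.quantile(shifts, 0.999), "mV")
```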
44. A brief overview of gate oxide defect properties and their relation to MOSFET instabilities and device and circuit time-dependent variability
- Author
-
V. Putcha, Ben Kaczer, Philippe Roussel, Dimitri Linten, Tibor Grasser, Gerhard Rzepa, Erik Bury, Marco Simicic, Bertrand Parvais, Pieter Weckx, Jacopo Franco, Michael Waltl, Francky Catthoor, Adrian Chasin, Electricity, and Electronics and Informatics
- Subjects
Materials science ,Gate dielectric ,Random Telegraph Noise (RTN) ,02 engineering and technology ,01 natural sciences ,Trap (computing) ,symbols.namesake ,Computer Science::Hardware Architecture ,Gate oxide ,Bias Temperature Instability (BTI) ,0103 physical sciences ,MOSFET ,Physics::Atomic Physics ,Electrical and Electronic Engineering ,Safety, Risk, Reliability and Quality ,Quantum tunnelling ,Electronic circuit ,010302 applied physics ,Condensed Matter::Quantum Gases ,Negative-bias temperature instability ,business.industry ,variability ,Fermi level ,021001 nanoscience & nanotechnology ,Condensed Matter Physics ,gate oxide defects ,Atomic and Molecular Physics, and Optics ,Electronic, Optical and Magnetic Materials ,Surfaces, Coatings and Films ,circuit simulations ,symbols ,Optoelectronics ,0210 nano-technology ,business - Abstract
A paradigm for MOSFET instabilities is outlined based on gate oxide traps and the detailed understanding of their properties. A model with trap energy levels in the gate dielectric and their misalignment with the channel Fermi level is described, offering the most successful strategy to reduce both Positive and Negative Bias Temperature Instability (PBTI and NBTI) in a range of gate stacks. Trap temporal properties are determined by tunneling between the carrier reservoir and the trap itself, as well as by thermal barriers related to atomic reconfiguration. A trap's electrostatic impact depends on the gate voltage and on the trap's spatial position, and is further randomized by variations in the channel potential. All internal properties of traps are distributed, resulting in distributions of the externally observable trap parameters and, in turn, in time-dependent variability in devices and circuits. (A brief sketch of bias- and temperature-dependent trap time constants follows this record.)
- Published
- 2018
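The trap temporal properties described above (tunneling to the carrier reservoir plus thermal barriers for atomic reconfiguration) are commonly summarized as bias- and temperature-dependent capture and emission time constants. The sketch below shows an Arrhenius-type dependence with a simple exponential bias acceleration; the attempt time, barrier energies, and acceleration factors are assumptions for illustration only.

```python
import numpy as np

# Hedged sketch: capture/emission time constants of a single oxide trap with an
# Arrhenius (thermal-barrier) term and a simple exponential gate-bias dependence.
# tau0, the barrier energies, and the bias factors are illustrative assumptions.

K_B = 8.617e-5  # Boltzmann constant, eV/K

def tau_capture(vg, temp_K, tau0=1e-9, ea_eV=0.7, gamma=2.0):
    """Capture time: thermally activated, accelerated by higher gate bias."""
    return tau0 * np.exp(ea_eV / (K_B * temp_K)) * np.exp(-gamma * vg)

def tau_emission(vg, temp_K, tau0=1e-9, ea_eV=0.9, gamma=1.0):
    """Emission time: thermally activated, slowed down by higher gate bias."""
    return tau0 * np.exp(ea_eV / (K_B * temp_K)) * np.exp(+gamma * vg)

for T in (300.0, 350.0, 400.0):
    tc, te = tau_capture(0.8, T), tau_emission(0.8, T)
    print(f"T = {T:.0f} K: tau_c = {tc:.2e} s, tau_e = {te:.2e} s")
```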
45. Technology/System Codesign and Benchmarking for Lateral and Vertical GAA Nanowire FETs at 5-nm Technology Node
- Author
-
Aaron Thean, Praveen Raghavan, D. Yakimets, Diederik Verkest, Francky Catthoor, Zsolt Tokei, Nadine Collaert, Azad Naeemi, Chenyun Pan, and Peter Debacker
- Subjects
Engineering ,Multi-core processor ,business.industry ,Electrical engineering ,Nanowire ,Capacitance ,Electronic, Optical and Magnetic Materials ,Logic gate ,Technology system ,Field-effect transistor ,Electrical and Electronic Engineering ,business ,Scaling ,Leakage (electronics) - Abstract
For sub-7-nm technology nodes, the gate-all-around (GAA) nanowire-based device structure is a strong candidate to sustain scaling according to Moore’s Law. For the first time, the performance of two GAA device options—lateral FET (LFET) and vertical FET (VFET)—is benchmarked and analyzed at the system level using an ARM core processor, based on realistic compact device models at the 5-nm technology node. Tradeoffs among energy, frequency, leakage, and area are evaluated by a multi- $V_{\rm th}$ optimization flow. A variety of relevant device configurations, including various numbers of fins, nanowires, and nanowire stacks, are explored. The results demonstrate that an LFET GAA core has a larger maximum frequency than its VFET counterpart because the channel stress that can be created in the LFETs results in a larger ON current. For fast timing targets, the LFET cores are therefore superior. However, for slow timing targets (e.g., 5 ns), the VFET cores with three nanowires offer a 7% area reduction and a 20% energy saving compared with the LFET cores with 2fin/2stack at the same leakage power. (A toy multi- $V_{\rm th}$ assignment sketch follows this record.)
- Published
- 2015
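The multi-$V_{\rm th}$ optimization flow used for the benchmarking above trades delay against leakage by choosing, per cell, between a fast low-$V_{\rm th}$ variant and a slower, low-leakage high-$V_{\rm th}$ variant. The sketch below shows a toy greedy version of such an assignment on a single timing path of four hypothetical cells; the delay and leakage numbers are invented, and the real flow is considerably more involved.

```python
# Hedged toy sketch of multi-Vth assignment on a single timing path: start with all
# cells on the fast, leaky low-Vth option and greedily move cells to the slower,
# low-leakage high-Vth option while the timing target is still met.
# The delay/leakage numbers are invented placeholders.

cells = [  # (name, delay_lvt_ps, delay_hvt_ps, leak_lvt_nW, leak_hvt_nW)
    ("u0", 10, 14, 50, 10),
    ("u1", 12, 17, 60, 12),
    ("u2",  8, 11, 40,  8),
    ("u3", 15, 21, 80, 16),
]

def path_delay_and_leakage(assignment):
    delay = sum(c[1] if vt == "LVT" else c[2] for c, vt in zip(cells, assignment))
    leak = sum(c[3] if vt == "LVT" else c[4] for c, vt in zip(cells, assignment))
    return delay, leak

def greedy_multi_vth(timing_target_ps):
    assignment = ["LVT"] * len(cells)
    while True:
        best = None
        for i, vt in enumerate(assignment):
            if vt != "LVT":
                continue
            trial = assignment[:i] + ["HVT"] + assignment[i + 1:]
            delay, leak = path_delay_and_leakage(trial)
            if delay <= timing_target_ps and (best is None or leak < best[1]):
                best = (trial, leak)
        if best is None:
            return assignment, path_delay_and_leakage(assignment)
        assignment = best[0]

print(greedy_multi_vth(timing_target_ps=60))
```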
46. A Scalable MIMO Detector Processor With Near-ASIC Energy Efficiency
- Author
-
Robert Fasthuber, Praveen Raghavan, Francky Catthoor, and Liesbet Van der Perre
- Subjects
Engineering ,business.industry ,Detector ,MIMO ,Integrated circuit ,Multi-user MIMO ,law.invention ,Application-specific integrated circuit ,Hardware and Architecture ,law ,Scalability ,Electronic engineering ,Algorithm design ,Electrical and Electronic Engineering ,business ,Software ,Efficient energy use - Abstract
Emerging 4G wireless communication systems need to deliver much higher data rates, more flexibility, and significantly higher energy efficiency than current systems. To cope with this steep increase in requirements, new design approaches are a necessity. This paper focuses on the design of an advanced multiple-input–multiple-output (MIMO) detector, which is typically a bottleneck in the wireless receiver. The proposed template-based design approach combines innovative architecture concepts, such as very wide registers and a distributed loop buffer, with algorithm-architecture co-optimizations. The resulting MIMO detector processor, which is scalable to eight and more antennas, achieves a high area efficiency of 571 GOPS/ $\mathrm{mm}^{2}$ and a high energy efficiency of 3.3 GOPS/mW in the Taiwan Semiconductor Manufacturing Company (TSMC) 40-nm technology. By exploiting the dynamically varying requirements, the proposal has the potential to achieve a higher average energy efficiency than an equivalent application-specific integrated circuit (ASIC), albeit at a penalty in total area. The proposed architecture style thus offers an interesting and very promising tradeoff between traditional ASICs and other programmable processor solutions.
- Published
- 2015
47. Array Interleaving—An Energy-Efficient Data Layout Transformation
- Author
-
Tom Vander Aa, Namita Sharma, Francky Catthoor, Praveen Raghavan, and Preeti Ranjan Panda
- Subjects
Interleaving ,Computer science ,business.industry ,Parallel computing ,Computer Graphics and Computer-Aided Design ,Computer Science Applications ,Dope vector ,Reduction (complexity) ,Set (abstract data type) ,Transformation (function) ,Computer data storage ,Electrical and Electronic Engineering ,business ,Energy (signal processing) ,Efficient energy use - Abstract
Optimizations related to memory accesses and data storage make a significant difference to the performance and energy of a wide range of data-intensive applications, and these techniques need to evolve with modern architectures supporting wide memory accesses. We investigate array interleaving, a data layout transformation technique that achieves energy efficiency by combining the storage of data elements from multiple arrays in contiguous locations, in an attempt to exploit spatial locality. The transformation reduces the number of memory accesses by loading the right set of data into vector registers, thereby minimizing redundant memory fetches. We perform a global analysis of array accesses and account for possibly different array behavior in different loop nests, which may ultimately lead to different data layout decisions for the same array across program regions. Our technique relies on detailed estimates of the savings due to interleaving, as well as the cost of performing the actual data layout modifications. We also account for the vector register widths and the possibility of choosing the appropriate granularity for interleaving. Experiments on several benchmarks show a 6-34% reduction in memory energy due to the strategy. (A small layout-transformation sketch follows this record.)
- Published
- 2015
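The array-interleaving transformation itself is easy to illustrate: when two arrays are always accessed with the same index in a loop, storing their elements contiguously turns two strided memory streams into one unit-stride stream, so each wide access fetches useful data from both arrays. The sketch below shows only the layout change on NumPy arrays; it does not reproduce the paper's global access analysis or cost model.

```python
import numpy as np

# Hedged sketch of array interleaving as a data-layout transformation.
# Original layout: two separate arrays a[i] and b[i] used together in one loop.
# Interleaved layout: a single array ab with ab[2*i] = a[i] and ab[2*i+1] = b[i],
# so a wide (vector-register) load fetches both operands from contiguous memory.

def kernel_separate(a, b):
    # Two memory streams, one per array region.
    return a * b

def interleave(a, b):
    ab = np.empty(2 * len(a), dtype=a.dtype)
    ab[0::2] = a
    ab[1::2] = b
    return ab

def kernel_interleaved(ab):
    # One unit-stride stream: consecutive element pairs hold a[i] and b[i].
    return ab[0::2] * ab[1::2]

a = np.arange(8, dtype=np.float32)
b = np.arange(8, dtype=np.float32) + 100.0
assert np.allclose(kernel_separate(a, b), kernel_interleaved(interleave(a, b)))
print(kernel_interleaved(interleave(a, b)))
```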
48. Technology/Circuit/System Co-Optimization and Benchmarking for Multilayer Graphene Interconnects at Sub-10-nm Technology Node
- Author
-
Chenyun Pan, Ahmet Ceyhan, Zsolt Tokei, Azad Naeemi, Francky Catthoor, and Praveen Raghavan
- Subjects
Adder ,Materials science ,Graphene ,Contact resistance ,Clock rate ,Capacitance ,Electronic, Optical and Magnetic Materials ,law.invention ,law ,Hardware_INTEGRATEDCIRCUITS ,Electronic engineering ,Node (circuits) ,Electrical and Electronic Engineering ,Routing (electronic design automation) ,Voltage - Abstract
Based on realistic circuit- and system-level simulations, graphene interconnects are analyzed in terms of multiple material properties, such as the mean free path (MFP), the contact resistance, and the edge roughness. The benchmarking results indicate that graphene interconnects are advantageous only under certain circumstances. The device-level parameters, including the supply and threshold voltages, and the circuit-level parameters, including the wire length and width, have large impacts on both the delay and the energy-delay product (EDP). At the circuit level, one representative circuit, a 32-bit adder, is investigated, for which up to 40% and 70% improvements in delay and EDP, respectively, are observed. At the system level, an ARM Cortex-M0 processor is synthesized, and placement and routing are performed. After replacing copper interconnects with multilayer graphene interconnects, up to 15% and 22% improvements in clock frequency and EDP, respectively, are observed. It is also demonstrated that the benefits of using graphene for the ARM core processor depend strongly on the quality of the graphene, such as the MFP and the edge roughness. (A first-order resistance sketch for a multilayer graphene wire follows this record.)
- Published
- 2015
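A first-order intuition for why the MFP, contact resistance, and layer count matter can be captured in a few lines: each conducting channel of a graphene layer contributes roughly the quantum resistance, diffusive scattering scales it by (1 + L/λ), and a lumped contact resistance is added at the ends. The channel count per layer, effective MFP, and contact resistance below are assumed placeholder values, not calibrated numbers from the paper.

```python
# Hedged first-order sketch of multilayer graphene interconnect resistance:
# all conducting channels act in parallel, each contributing the quantum
# resistance h/(2e^2); diffusive scattering scales this by (1 + L/lambda_eff);
# a lumped contact resistance is added. channels_per_layer, mfp_um, and
# r_contact_ohm are assumed placeholder values, not calibrated data.

R_QUANTUM = 12.9e3  # h/(2e^2) in ohms

def graphene_wire_resistance(length_um, n_layers, channels_per_layer=2,
                             mfp_um=1.0, r_contact_ohm=2e3):
    """Approximate end-to-end wire resistance in ohms."""
    n_channels = n_layers * channels_per_layer
    r_ballistic = R_QUANTUM / n_channels
    return r_ballistic * (1.0 + length_um / mfp_um) + r_contact_ohm

for layers in (1, 4, 8):
    r = graphene_wire_resistance(length_um=10.0, n_layers=layers)
    print(f"{layers} layer(s): {r/1e3:.1f} kohm")
```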
49. Demonstrating HW–SW Transient Error Mitigation on the Single-Chip Cloud Computer Data Plane
- Author
-
Dimitrios Soudris, Dimitrios Rodopoulos, Francky Catthoor, and Antonis Papanikolaou
- Subjects
business.industry ,Computer science ,media_common.quotation_subject ,Dynamic frequency scaling ,Real-time computing ,Fidelity ,Cloud computing ,Chip ,Single-chip Cloud Computer ,Hardware and Architecture ,Cache ,Electrical and Electronic Engineering ,Error detection and correction ,business ,Software ,Computer hardware ,Decoding methods ,media_common - Abstract
Transient errors are a major concern for the correct operation of low-level cache memories. Aggressive integration requires effective mitigation of such errors, without extreme overheads in power, timing, or silicon area. We demonstrate a hybrid (hardware–software) scheme that mitigates bit flips in data residing in low-level caches. The methodology is shown to be applicable to streaming applications, and we illustrate this with a video decoding case study on a state-of-the-art many-core chip, the Single-Chip Cloud Computer, an experimental processor created by Intel Labs. Dedicated on-chip memories are utilized to keep safe copies of key application data, thus allowing rollbacks upon error detection. The experimental results illustrate the tradeoff between application delay, consumed energy, and output fidelity as the injected errors are corrected. When output fidelity is considered a hard constraint, the application slack used for mitigation can be reclaimed with dynamic frequency scaling. Output fidelity is guaranteed regardless of the error injection intensity, and the application's timing constraints are respected up to a certain upper bound of error injection. (A generic checkpoint/rollback sketch follows this record.)
- Published
- 2015
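The rollback mechanism described above can be illustrated generically: key application data are duplicated into a dedicated safe buffer at checkpoints, and when an error detector flags corruption, the safe copy is restored and the affected work is replayed within the available slack. The sketch below uses hypothetical names (CheckpointedBuffer, detect_error) and a trivial software detector; it does not reproduce the SCC-specific hardware support.

```python
import copy

# Hedged, generic sketch of checkpoint/rollback mitigation of transient errors.
# A safe copy of key application data is kept in a dedicated buffer; if the error
# detector flags corruption after processing, the safe copy is restored and the
# work item is replayed. CheckpointedBuffer and detect_error are hypothetical
# names; a real system would use hardware parity/ECC or application-level checks.

class CheckpointedBuffer:
    def __init__(self, data):
        self.data = list(data)
        self.safe_copy = copy.deepcopy(self.data)   # stand-in for protected on-chip memory

    def checkpoint(self):
        self.safe_copy = copy.deepcopy(self.data)

    def rollback(self):
        self.data = copy.deepcopy(self.safe_copy)

def detect_error(data):
    # Placeholder detector: flag obviously corrupted (negative) values.
    return any(x < 0 for x in data)

def process(buffer, work):
    buffer.checkpoint()
    work(buffer.data)                 # may be hit by a bit flip
    if detect_error(buffer.data):
        buffer.rollback()             # restore the safe copy ...
        work(buffer.data)             # ... and replay within the slack budget

buf = CheckpointedBuffer([1, 2, 3, 4])
process(buf, lambda d: d.__setitem__(0, d[0] + 1))   # normal processing step
print(buf.data)
```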
50. Atomistic Pseudo-Transient BTI Simulation With Inherent Workload Memory
- Author
-
Michail Noltsis, Francky Catthoor, Dimitrios Soudris, Pieter Weckx, and Dimitrios Rodopoulos
- Subjects
Engineering ,Dependency (UML) ,business.industry ,Spice ,Electronic, Optical and Magnetic Materials ,Differentiator ,Reliability (semiconductor) ,Logic gate ,Electronic engineering ,Transient (oscillation) ,Electrical and Electronic Engineering ,Safety, Risk, Reliability and Quality ,Representation (mathematics) ,business ,Simulation ,Voltage - Abstract
Bias Temperature Instability (BTI) is a major concern for the reliability of decananometer devices. Older modeling approaches either fail to capture time-dependent device variability or maintain only a crude view of the device's stress. Previously, a two-state atomistic model based on gate stack defect kinetics has been introduced, but its complexity has been preventing seamless integration in simulations of large device inventories over typical system lifetimes. In this paper, we present an approach that alleviates this complexity. We introduce a novel signal representation for the gate stress; using this format, atomistic BTI simulations require fewer model iterations while exhibiting minimal accuracy degradation. We also enable full temperature and supply-voltage dependency, since these attributes are far from constant in modern integrated systems. The proposed simulation methodology retains both the atomistic property and the workload memory that remain the major differentiators of defect-based BTI simulation compared to state-of-the-art approaches. (A compact two-state occupancy sketch follows this record.)
- Published
- 2014
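The two-state defect kinetics underlying this kind of atomistic BTI simulation have a closed-form update per constant-stress interval: the occupancy probability relaxes exponentially toward its steady-state value with an equivalent time constant set by the capture and emission times. The sketch below applies that update over alternating stress and relaxation phases; the time constants and per-defect ΔVth step are placeholders, and the paper's compressed gate-stress signal representation is not reproduced.

```python
import numpy as np

# Hedged sketch of pseudo-transient two-state defect simulation for BTI. Over a
# constant-stress interval dt, the defect occupancy probability p relaxes
# exponentially toward its steady-state value p_inf with equivalent time constant
# tau_eq = tau_c*tau_e/(tau_c + tau_e) (standard two-state Markov solution).
# The time constants and the per-defect dVth step are illustrative placeholders.

def update_occupancy(p, dt, tau_c, tau_e):
    """Advance the occupancy probability across one constant-stress interval."""
    tau_eq = tau_c * tau_e / (tau_c + tau_e)
    p_inf = tau_e / (tau_c + tau_e)          # steady-state occupancy
    return p_inf + (p - p_inf) * np.exp(-dt / tau_eq)

# Workload memory: occupancy carries over between alternating stress/relax phases.
p = 0.0
dvth_step_mV = 2.0                            # assumed per-defect Vth step
for phase in range(6):
    stressed = phase % 2 == 0
    tau_c, tau_e = (1e-3, 1.0) if stressed else (1.0, 1e-3)
    p = update_occupancy(p, dt=0.1, tau_c=tau_c, tau_e=tau_e)
    print(f"phase {phase} ({'stress' if stressed else 'relax '}): dVth = {p * dvth_step_mV:.3f} mV")
```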