101 results for "Fabien Clermidy"
Search Results
2. Energy-Efficient Near-Threshold Parallel Computing: The PULPv2 Cluster
- Author
-
Ivan Miro-Panades, Fabien Clermidy, Philippe Flatresse, Jeremy Constantin, Frank K. Gurkaynak, Andreas Burg, Michael Gautschi, Antonio Pullini, Edith Beigne, Adam Teman, Luca Benini, Davide Rossi, and Igor Loi
- Subjects
Power management ,parallel processing ,Reduced instruction set computing ,Computer science ,business.industry ,020208 electrical & electronic engineering ,UTBB FD-SOI ,02 engineering and technology ,Parallel computing ,020202 computer hardware & architecture ,Memory management ,Software ,body biasing ,Parallel processing (DSP implementation) ,Hardware and Architecture ,Hardware_INTEGRATEDCIRCUITS ,0202 electrical engineering, electronic engineering, information engineering ,System on a chip ,power management ,Electrical and Electronic Engineering ,business ,energy efficiency ,Computer hardware ,Efficient energy use
- Abstract
This article presents an ultra-low-power parallel computing platform and its system-on-chip (SoC) embodiment, targeting a wide range of emerging near-sensor processing tasks for Internet of Things (IoT) applications. The proposed SoC achieves 193 million operations per second (MOPS) per mW at 162 MOPS (32 bits), improving the first-generation Parallel Ultra-Low-Power (PULP) architecture by 6.4 and 3.2 times in performance and energy efficiency, respectively.
- Published
- 2017
3. 2.3 A 220GOPS 96-Core Processor with 6 Chiplets 3D-Stacked on an Active Interposer Offering 0.6ns/mm Latency, 3Tb/s/mm2 Inter-Chiplet Interconnects and 156mW/mm2@ 82%-Peak-Efficiency DC-DC Converters
- Author
-
Lucile Arnaud, David Coriat, Cesar Fuguet, Perceval Coudrain, Julian Pontes, Ivan Miro-Panades, Sebastien Thuries, J. Durupt, Didier Varreau, D. Lattard, Alexis Farcy, Alexandre Arriordaz, Eric Guthmuller, Alain Greiner, Christian Bernard, Severine Cheramy, Gael Pillonnet, Guillaume Moritz, Alain Gueugnot, Yvain Thonnart, Quentin L. Meunier, Frédéric Berger, Jean Charbonnier, Pascal Vivet, Fabien Clermidy, Michel Harrand, Arnaud Garnier, and Denis Dutoit
- Subjects
Power management ,Multi-core processor ,Through-silicon via ,Silicon ,Computer science ,business.industry ,020208 electrical & electronic engineering ,chemistry.chemical_element ,020206 networking & telecommunications ,02 engineering and technology ,Switched capacitor ,Network on a chip ,CMOS ,chemistry ,0202 electrical engineering, electronic engineering, information engineering ,Interposer ,business ,Computer hardware
- Abstract
In the context of high-performance computing and big-data applications, the quest for performance requires modular, scalable, energy-efficient, low-cost manycore systems. Partitioning the system into multiple chiplets 3D-stacked onto large-scale interposers - organic substrate [1], 2.5D passive interposer [2] or silicon bridge [3] - leads to large modular architectures and cost reductions in advanced technologies by the Known Good Die (KGD) strategy and yield management. However, these approaches lack flexible efficient long-distance communications, smooth integration of heterogeneous chiplets, and easy integration of less-scalable analog functions, such as power management [4] and system IOs. To tackle these issues, this paper presents an active interposer integrating: i) a Switched Capacitor Voltage Regulator (SCVR) for on-chip power management; ii) flexible system interconnect topologies between all chiplets for scalable cache coherency support; iii) energy-efficient 3D-plugs for dense inter-layer communication; iv) a memory-IO controller and PHY for socket communication. The chip (Fig. 2.3.7) integrates 96 cores in 6 chiplets in 28nm FDSOI CMOS, 3D-stacked in a face-to-face configuration using 20µm-pitch micro-bumps (µ-bumps) onto a 200 mm2 active interposer with 40µm-pitch Through Silicon Via (TSV) middle in a 65nm technology node. Even though complex functions are integrated, active-interposer yield is high thanks to the mature 65nm node and a reduced complexity (0.08 transistors/µm2), with 30% of interposer area devoted to an SCVR variability-tolerant capacitor scheme.
- Published
- 2020
4. Advanced 3D Technologies and Architectures for 3D Smart Image Sensors
- Author
-
Perrine Batude, Olivier Bichler, Laurent Millet, Sebastien Thuries, Alexandre Valentian, Karim Ben Chehida, Maria Lepecq, Monte Alegre, Thomas Dombek, Luis Cubero, Fabien Clermidy, Maxence Bouvier, Severine Cheramy, Pascal Vivet, Gilles Sicard, Stéphane Chevobbe, and Didier Lattard
- Subjects
010302 applied physics ,Pixel ,business.industry ,Computer science ,Process (computing) ,Context (language use) ,Image processing ,02 engineering and technology ,01 natural sciences ,Automation ,020202 computer hardware & architecture ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Key (cryptography) ,Image sensor ,business ,Computer hardware
- Abstract
Image sensors will become more and more pervasive in their environment. In the context of Automotive and IoT, low-cost image sensors with high-quality pixels will embed more and more smart functions, such as regular low-level image processing but also object recognition, movement detection, light detection, etc. 3D technology is a key enabler for integrating into a single device the pixel layer and associated acquisition layer, but also the smart computing features and the amount of memory required to process all the acquired data. More computing and memory within 3D Smart Image Sensors will bring new features and reduce the overall system power consumption. Advanced 3D technology with ultra-fine-pitch vertical interconnect density will pave the way towards new architectures for 3D Smart Image Sensors, allowing local vertical communication between pixels and the associated computing and memory structures. The presentation will give an overview of recent 3D technology solutions, such as Hybrid Bonding technology and the Monolithic 3D CoolCube™ technology, with respective 3D interconnect pitches on the order of 1 μm and 100 nm. Recent 3D Image Sensors will be presented, showing the capability of 3D technology to implement fine-grain pixel acquisition and processing with ultra-high-speed image acquisition and tile-based processing. As a further perspective, multi-layer 3D image sensors based on events and spiking will reduce power consumption with new detection and learning processing capabilities.
- Published
- 2019
5. Asynchronous Circuit Designs for the Internet of Everything: A Methodology for Ultralow-Power Circuits with GALS Architecture
- Author
-
Yvain Thonnart, Pascal Vivet, Edith Beigne, Jean-Frederic Christmann, and Fabien Clermidy
- Subjects
Asynchronous system ,Clock signal ,business.industry ,Computer science ,020208 electrical & electronic engineering ,02 engineering and technology ,020202 computer hardware & architecture ,Computer architecture ,Synchronizer ,Asynchronous communication ,Robustness (computer science) ,0202 electrical engineering, electronic engineering, information engineering ,Electronic engineering ,The Internet ,Electrical and Electronic Engineering ,business ,Hardware_LOGICDESIGN ,Electronic circuit ,Asynchronous circuit
- Abstract
Asynchronous circuits have characteristics that differ significantly from those of synchronous circuits in terms of their power and robustness to variations. In this article, we show how it is possible to exploit these characteristics to design robust ultralow-power circuits within the scope of the Internet of Everything (IoE) and with globally asynchronous and locally synchronous (GALS) architectures. More specifically, our aim is to describe the fundamentals of asynchronous circuit design; to detail specific methodologies with practical examples of low-power, asynchronous circuits; and to offer clear guidelines that differentiate the usefulness of an asynchronous circuit compared to a synchronous one according to different application needs.
- Published
- 2016
6. Guidance to reliability improvement in CBRAM using advanced KMC modelling
- Author
-
Mathieu Bernard, L. Perniola, G. Molas, Fabien Clermidy, B. De Salvo, Alain Toffoli, C. Carabasse, C. Cagli, J. Guy, and A. Roule
- Subjects
010302 applied physics ,Engineering ,Programmable metallization cell ,business.industry ,Intrinsic resistance ,Integrated circuit ,01 natural sciences ,On resistance ,Uncorrelated ,law.invention ,Reliability (semiconductor) ,law ,0103 physical sciences ,Kinetic Monte Carlo ,business ,Reset (computing) ,Simulation
- Abstract
In this paper, we use Kinetic Monte Carlo (KMC) simulations to investigate CBRAM variability. A fully consistent model able to simulate SET, RESET, retention and endurance characteristics is proposed for the first time, making it possible to describe experimental data obtained on Al2O3/CuTex based CBRAM. The role of oxygen vacancy generation during programming is described and its impact on reliability (retention and endurance) is elucidated. The origin of the resistance spread is discussed and linked to the conductive filament shape and operating conditions. The cycle-to-cycle contribution to resistance variability is uncorrelated with the intrinsic resistance distribution limit. Finally, guidelines are given in order to optimize the memory distribution, reduce tail bits and improve CBRAM reliability.
- Published
- 2017
7. Impact of Sb doping on power consumption and retention reliability of GeS2 based conductive bridge random access memory
- Author
-
F. Aussenac, E. Vianello, P. Francois, G. Molas, Mathieu Bernard, Fabien Clermidy, Vincent Delaye, P. Blaise, B. De Salvo, C. Carabasse, E. Souchier, and J. Guy
- Subjects
Materials science ,business.industry ,Programmable metallization cell ,Doping ,Metals and Alloys ,Nanotechnology ,Surfaces and Interfaces ,Electrolyte ,Surfaces, Coatings and Films ,Electronic, Optical and Magnetic Materials ,Resistive random-access memory ,Reliability (semiconductor) ,Gate array ,Materials Chemistry ,Optoelectronics ,Data retention ,business ,Electrical conductor
- Abstract
In this paper we present the impact of Sb doping of the GeS2 electrolyte in W/GeS2/Ag based conductive bridge random access memory (CBRAM) on the memory performance. In particular, the CBRAM resistance window, RON and ROFF values versus programming current, power consumption and reliability are analyzed in depth. We demonstrated that the Sb concentration governs the optimal operating conditions. In particular, high Sb doping allows low programming current operation (suitable for low power applications), while low Sb content improves the ROFF/RON ratio (needed in particular for nonvolatile field-programmable gate array applications). Finally, we observed that the high temperature retention could be improved by increasing the Sb doping. This result was interpreted by means of ab initio calculations, indicating that Sb reduces the dissolution rate of the Ag-based conductive filament in the electrolyte.
- Published
- 2014
8. A Novel Programming Technique to Boost Low-Resistance State Performance in Ge-Rich GST Phase Change Memory
- Author
-
Athanasios Kiouseloglou, Sylvain Maitrejean, Fabien Clermidy, Luca Perniola, Gilles Reimbold, Guido Torelli, Alessandro Cabrini, Barbara De Salvo, A. Persico, Gabriele Navarro, A. Roule, and Veronique Sousa
- Subjects
Scheme (programming language) ,Materials science ,Electronic, Optical and Magnetic Materials ,Reduction (complexity) ,Set (abstract data type) ,Phase-change memory ,Phase change ,Memory cell ,Electronic engineering ,State (computer science) ,Electrical and Electronic Engineering ,Low resistance ,computer ,computer.programming_language
- Abstract
In this paper, we examine the problem of the drift of the low-resistance state (LRS) in phase change memories based on C or N doped and undoped Ge-rich Ge2Sb2Te5. A novel procedure, named R-SET technique, is proposed to boost the SET speed of these innovative phase change materials by overcoming the decrease of crystallization speed caused by Ge enrichment. The R-SET technique allows, at the same time, an optimized SET programming of the memory cell and the reduction of the LRS drift with respect to standard SET procedures. A circuit that generates the desired R-SET pulse based on a time reference scheme is proposed and discussed.
- Published
- 2014
9. Designing digital circuits with nano-scale devices: Challenges and opportunities
- Author
-
Alexandre Valentian, Marc Belleville, Olivier P. Thomas, and Fabien Clermidy
- Subjects
Digital electronics ,Engineering ,Operating point ,business.industry ,Electrical engineering ,Condensed Matter Physics ,Track (rail transport) ,Electronic, Optical and Magnetic Materials ,Power consumption ,Materials Chemistry ,Digital integrated circuits ,Electronic engineering ,Electrical and Electronic Engineering ,business ,Nanoscopic scale ,Energy (signal processing)
- Abstract
This paper presents an overview of the challenges and opportunities when designing digital integrated circuits in nano-scale technologies. Major application requirements and nano-technology design limitations are introduced. Design solutions currently under development, such as adaptive techniques that cope with variations and track an optimal energy operating point, are presented.
- Published
- 2013
10. Design and Architectural Assessment of 3-D Resistive Memory Technologies in FPGAs
- Author
-
Ian O'Connor, Fabien Clermidy, Pierre-Emmanuel Gaillardon, L. Perniola, M. Haykel Ben Jamaa, G. De Micheli, Davide Sacchetto, and Giovanni Betti Beneventi
- Subjects
phase-change memory ,oxide memory ,nonvolatile memory ,3-D integration ,business.industry ,Computer science ,Reading (computer) ,Uniform memory access ,Semiconductor memory ,programmable logic arrays ,RRAM ,Computer Science Applications ,Non-volatile memory ,Nano-RAM ,Electronic engineering ,Computing with Memory ,Non-volatile random-access memory ,Electrical and Electronic Engineering ,business ,Computer hardware ,Random access
- Abstract
Emerging nonvolatile memories (eNVMs) such as phase-change random access memories (PCRAMs) or oxide-based resistive random access memories (OxRRAMs) are promising candidates to replace Flash and Static Random Access Memories in many applications. This paper introduces a novel set of building blocks for field-programmable gate arrays (FPGAs) using eNVMs. We propose an eNVM-based configuration point, a look-up table structure with reduced programming complexity and a high-performance switchbox arrangement. We show that these blocks yield an improvement in area and write time of up to 3× and 33×, respectively, versus a regular Flash implementation. By integrating the designed blocks in an FPGA, we demonstrate an area and delay reduction of up to 28% and 34%, respectively, on a set of benchmark circuits. These reductions are due to the eNVM 3-D integration and to their low on-resistance state value. Finally, we survey many flavors of the technologies and we show that the best results in terms of area and delay are obtained with Pt/TiO2/Pt stack, while the lowest leakage power is achieved by InGeTe stack.
- Published
- 2013
11. Cost model for monolithic 3D integrated circuits
- Author
-
Daniel Gitlin, Maud Vinet, and Fabien Clermidy
- Subjects
010302 applied physics ,Scheme (programming language) ,Computer science ,Semiconductor device modeling ,02 engineering and technology ,Integrated circuit ,Solid modeling ,01 natural sciences ,Die (integrated circuit) ,020202 computer hardware & architecture ,law.invention ,Power (physics) ,Range (mathematics) ,law ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Electronic engineering ,Cost benefit ,computer ,computer.programming_language
- Abstract
A cost model for monolithic 3D-ICs is presented that takes into account increased process complexity and the associated yield impact as well as area reduction. The model enables a more accurate understanding of PPC (Power, Performance and Cost) and of the range of applicability of monolithic 3D-IC technology. The model shows that, depending on the die area and partitioning scheme, the cost benefit can be 50% or higher.
- Published
- 2016
12. Opportunities brought by sequential 3D CoolCube™ integration
- Author
-
Cristiano Santos, Daniel Gitlin, M. Brocard, G. Berhault, Jessy Micout, Sebastien Thuries, Perrine Batude, Laurent Brunet, Fabien Clermidy, C.-M. V. Lu, Paul Besombes, Francois Andrieu, O. Billoint, Maud Vinet, G. Cibrario, F. Deprat, Claire Fenouillet-Beranger, Vincent Mazzochi, O. Faynot, N. Rambal, and Bernard Previtali
- Subjects
010302 applied physics ,Very-large-scale integration ,Engineering ,business.industry ,Transistor ,Power saving ,Electrical engineering ,02 engineering and technology ,Transistor scaling ,01 natural sciences ,020202 computer hardware & architecture ,law.invention ,Reduction (complexity) ,CMOS ,law ,0103 physical sciences ,Hardware_INTEGRATEDCIRCUITS ,0202 electrical engineering, electronic engineering, information engineering ,Electronic engineering ,business
- Abstract
3D VLSI with a CoolCube™ monolithic integration flow allows vertically stacking several layers of devices with a unique connecting via density above tens of millions/mm2. This results in increased device density and gains in power and performance thanks to wire-length reduction, without the extra cost associated with transistor scaling. In addition to power saving, this true 3D integration opens perspectives in terms of heterogeneous integration. We will review the opportunities brought by CoolCube™ and will present the most advanced technological demonstration of 3D CMOS over CMOS CoolCube™ integration.
- Published
- 2016
13. Distributed Dynamic Rate Adaptation on a Network on Chip with Traffic Distortion
- Author
-
Christian Bernard, Yves Durand, and Fabien Clermidy
- Subjects
Router ,Network on a chip ,Computer science ,Distributed algorithm ,Distortion ,Distributed computing ,Throughput (business) ,Queue ,Data transmission ,Communication channel
- Abstract
A NoC-based system subject to real-time constraints requires hard bounds on end-to-end data transfer latencies. Regulating the channel injection rates solves the problem by suppressing link congestion and router queue saturation, provided that the channel rates guarantee fairness in the data distribution. We propose a distributed algorithm for the computation of a channel rate vector solution, suitable for runtime execution on the system. The algorithm takes into account the capacity constraints on every link, but also fulfills the distortion constraints. It leads to a near-optimal solution in a few iterations, with less than 10% of vector distance from the optimal solution. The algorithm is valid for software implementation on manycore systems and applicable to any Network-On-Chip based system. Our hardware implementation is distributed into the network infrastructure and converges in around 500 clock cycles on a 4x4 network configuration.
- Published
- 2016
14. Impact of intermediate BEOL technology on standard cell performances of 3D VLSI
- Author
-
M. Brocard, Fabien Clermidy, G. Berhault, Perrine Batude, G. Cibrario, Claire Fenouillet-Beranger, F. Deprat, Olivier Rozeau, Laurent Brunet, Francois Andrieu, Sebastien Thuries, O. Billoint, and Joris Lacord
- Subjects
Very-large-scale integration ,Permittivity ,Standard cell ,Engineering ,business.industry ,Transistor ,Process (computing) ,Line (electrical engineering) ,law.invention ,law ,Electronic engineering ,Sensitivity (control systems) ,business ,Electrical conductor
- Abstract
While the 3D sequential process is still under development, the electrical influence of the specific process for the bottom tier needs to be studied. As another MOS transistor layer is fabricated on top of the bottom one, contamination risk and thermal stability issues appear, thus requiring adaptation of conductors/dielectrics for intermediate Back-End Of Line (iBEOL) processing. As materials differ from the usual copper/low-k, it is necessary to study how the electrical characteristics of standard cells will be affected. We modeled different descriptions of iBEOL in a 14nm FDSOI process and simulated standard cell characteristics. The average power consumption is almost the same, while timing degradation for large cells with high drive can reach 20% in the worst case. This sensitivity analysis allowed us to identify which parameters (permittivity, resistivity) have the greatest impact depending on standard cell type and to provide technology and design guidelines. Our goal here was to limit the performance degradation to around 5% maximum for the bottom tier standard cells.
- Published
- 2016
15. Matrix Nanodevice-Based Logic Architectures and Associated Functional Mapping Method
- Author
-
Ian O'Connor, J. Liu, Gabriela Nicolescu, Fabien Clermidy, Pierre-Emmanuel Gaillardon, and M. Amadou
- Subjects
Interconnection ,Engineering ,business.industry ,Fault tolerance ,Parallel computing ,Atomic packing factor ,Network topology ,Matrix (mathematics) ,Hardware and Architecture ,Scalability ,Overhead (computing) ,Electrical and Electronic Engineering ,business ,Field-programmable gate array ,Software
- Abstract
This article describes a novel computing architecture organization based on nanoscale logic cells. We propose the use of a cluster of matrix arrangements of cells. In order to interconnect such fine-grained logic cells within a matrix, conventional techniques are not suitable due to a large interconnect overhead. Therefore, we propose the use of static and incomplete interconnect topologies to create matrices of cells. We also propose a method to map functions onto such architectures. We then explore the main parameters of the structure (size of matrices and interconnect topologies) and their impact on the main performance metrics (packing efficiency, speed, and fault tolerance). A cluster packing method also allows the evaluation of the number of matrices used by complex functions and the fill factor for various matrix sizes. The analyses show that this approach is particularly suited for matrices of 16 cells interconnected by modified omega networks. We can conclude that this architecture could improve the scalability of traditional FPGAs by a factor of 8.5.
- Published
- 2011
16. An Asynchronous Power Aware and Adaptive NoC Based Circuit
- Author
-
Yvain Thonnart, X. Popon, Didier Varreau, Helene Lhermet, Fabien Clermidy, Alexandre Valentian, Edith Beigne, Hugo Lebreton, P. Vivet, Sylvain Miermont, and Xuan-Tu Tran
- Subjects
Engineering ,business.industry ,Globally asynchronous locally synchronous ,Hardware_PERFORMANCEANDRELIABILITY ,Network on a chip ,Asynchronous communication ,Low-power electronics ,Embedded system ,Dynamic demand ,Hardware_INTEGRATEDCIRCUITS ,System on a chip ,Electrical and Electronic Engineering ,business ,Frequency scaling ,Power control
- Abstract
In complex embedded applications, optimisation and adaptation of both dynamic and leakage power have become an issue at SoC grain. A fully power-aware globally-asynchronous locally-synchronous network-on-chip (NoC) circuit is presented in this paper. Network-on-chip architecture combined with a globally-asynchronous locally-synchronous paradigm is a natural enabler for DVFS mechanisms. The circuit is arranged around an asynchronous network-on-chip providing scalable communication and a 17 Gb/s throughput while automatically reducing its power consumption by activity detection. Both dynamic and static power consumption are globally reduced using adaptive design techniques applied locally for each synchronous NoC unit. No fine control software is required during voltage and frequency scaling. Power control is localized and a minimal latency cost is observed.
- Published
- 2009
17. A Reconfigurable Baseband Platform Based on an Asynchronous Network-on-Chip
- Author
-
Pascal Vivet, Didier Lattard, Fabien Clermidy, F. Berens, Romain Lemaire, Edith Beigne, and Yves Durand
- Subjects
Ethernet ,Network architecture ,Engineering ,Network on a chip ,Asynchronous communication ,business.industry ,Embedded system ,Globally asynchronous locally synchronous ,Baseband ,System on a chip ,Electrical and Electronic Engineering ,Chip ,business
- Abstract
In order to face the inherent complexity of new radio access technologies and to address the development of multi-standard devices, an innovative reconfigurable baseband architecture based on a distributed control and communication framework is proposed. This architecture is tailored to the possibilities and limitations of next-generation CMOS nanotechnologies in terms of leakage and timing closure. A combination of technology features, message passing control model, network-on-chip, asynchronous implementation, clocking and power reduction policies is used. The 79.5 mm2 chip was manufactured in a 130 nm CMOS technology and is integrated in a prototyping platform to perform real-time experimentation of advanced MIMO OFDM based telecom techniques. It is composed of 23 functional units, such as computing intensive IPs, channel coding blocks, programmable DMA engines, an ARM946ES core, and an Ethernet interface. These elements are interconnected via an asynchronous layered network-on-chip using an interface that controls the communication and configuration parameters during application scheduling.
- Published
- 2008
18. 8.1 A 4x4x2 Homogeneous Scalable 3D Network-on-Chip Circuit with 326MFlit/s 0.66pJ/b Robust and Fault-Tolerant Asynchronous 3D Links
- Author
-
Jean Michailos, Didier Lattard, Cristiano Santos, Yvain Thonnart, Fabien Clermidy, Romain Lemaire, Frédéric Pétrot, Severine Cheramy, Ivan Miro-Panades, Florian Darve, Christian Bernard, Pascal Vivet, Eric Flamand, and Edith Beigne
- Subjects
Engineering ,business.industry ,020208 electrical & electronic engineering ,MIMO ,Fault tolerance ,02 engineering and technology ,020202 computer hardware & architecture ,CMOS ,Asynchronous communication ,Robustness (computer science) ,Embedded system ,Scalability ,Hardware_INTEGRATEDCIRCUITS ,0202 electrical engineering, electronic engineering, information engineering ,Redundancy (engineering) ,Baseband ,business ,Computer hardware
- Abstract
By shortening communication distance across dies, 3D technologies are a key to continued improvements in computing density. For 4G telecom baseband processing, specific computing units arranged in a regular network-on-chip (NoC) array provide power-efficient computation [1]. However, for advanced MIMO processing, more computing power is required when the number of antennas increases. This paper presents a homogeneous 3D circuit composed of regular tiles assembled using a 4×4×2 network-on-chip, using robust and fault tolerant asynchronous 3D links, providing 326MFlit/s @ 0.66pJ/b, fabricated in CMOS 65nm technology using 1980 TSVs in a Face2Back configuration.
- Published
- 2016
19. MCAPI-compliant Hardware Buffer Manager Mechanism to Support Communication in Multi-Core Architectures
- Author
-
Thomas Mesquida, Fabien Clermidy, Romain Lemaire, and Thiago Raupp da Rosa
- Subjects
Hardware architecture ,Multi-core processor ,business.industry ,Computer science ,MCAPI ,Interface (computing) ,Programming complexity ,02 engineering and technology ,020202 computer hardware & architecture ,Software ,Embedded system ,Synchronization (computer science) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,business ,Computer hardware
- Abstract
High performance and high power efficiency are two mandatory constraints for multi-core systems in order to successfully handle the most recent applications in several fields, e.g. image processing and communication standards. Nowadays, hardware accelerators are often used along with several processing cores to achieve the desired performance while keeping high power efficiency. However, such systems impose an increased programming complexity due to the lack of software standards that support heterogeneity, frequently leading to custom solutions. On the other hand, implementing a standard software solution for embedded systems might induce significant overheads. This work presents a hardware mechanism co-designed with a standard programming interface (API) for embedded systems, aiming to decrease the overheads imposed by software implementation while increasing programmability and communication performance. The results show gains of up to 97% in latency and an increase of 40 times in throughput for synthetic traffic, and an average decrease of 95% in communication time for an image processing application.
- Published
- 2016
20. Technology scaling: The CoolCube™ paradigm
- Author
-
Hossam Sarhan, O. Billoint, Fabien Clermidy, and Sebastien Thuries
- Subjects
Engineering ,business.industry ,Emerging technologies ,Complex system ,Technology scaling ,Electronic engineering ,Leverage (statistics) ,Routing (electronic design automation) ,business ,Design methods ,Maturity (finance) ,Industrial engineering ,Scaling
- Abstract
The scaling race towards aggressive nodes is getting more and more difficult as dimensions approach the atomic scale. New solutions have to be investigated to find new ways of scaling while keeping the Moore's law benefits in terms of area, power, performance and cost. These solutions should leverage existing technologies to reduce their development cost by proposing new usages. Also, as the cost of designing complex systems explodes, these new technologies should come with design techniques close to existing ones without increasing their complexity. Finally, current technological issues such as the increased complexity of Back-End-Of-Line (BEOL) with the successive delays of Extreme-Ultra-Violet lithography should be solved. This multi-dimensional problem makes it difficult for candidates to emerge. The sequential 3D CoolCube™ technology is one of them: bringing very fine pitch 3D interconnects and leveraging existing technologies, it opens the door to real 3D-VLSI with the hope of reducing the current pressure on BEOL while providing real 3D routing possibilities. However, many roadblocks in terms of design still exist. In this paper, we show three aspects of design methodology showing its growing maturity and a PPA analysis based on this methodology showing that gains can meet the classical scaling requirements.
- Published
- 2015
21. Intermediate BEOL process influence on power and performance for 3DVLSI
- Author
-
Perrine Batude, Claire Fenouillet-Beranger, Fabien Clermidy, O. Billoint, Hossam Sarhan, Sebastien Thuries, Alexandre Ayres De Sousa, and F. Deprat
- Subjects
Very-large-scale integration ,Materials science ,business.industry ,Electrical engineering ,Three-dimensional integrated circuit ,chemistry.chemical_element ,Dielectric ,Tungsten ,Capacitance ,Copper ,Back end of line ,chemistry ,Electrical resistivity and conductivity ,Optoelectronics ,business
- Abstract
3D VLSI technology based on the CoolCube™ process offers ultra-high integration density, with up to 10⁸ 3D vias (3D-V) per mm², offering gate-level 3D integration capability. For process stability and compliance over a wide temperature range, the Intermediate Back End of Line (IBEOL) is targeted to be made with tungsten lines in a SiO2 (k=3.9) dielectric, increasing equivalent resistivity by a factor of 6 and capacitance by a factor of 1.6 compared to a standard Back End of Line (BEOL) (copper lines in low-k dielectrics). In this study we evaluate the impact on Performance, Power and Area (PPA) of using a W/SiO2 IBEOL compared to a Cu/low-k IBEOL. Results show an area gain of up to 60.9% and a performance gain of up to 21.7% for 3D cases compared to 2D using 28 nm FDSOI technology. Using W/SiO2 shows limited impact on performance, with a maximum 1.93% degradation compared to Cu/low-k IBEOL.
- Published
- 2015
- Full Text
- View/download PDF
22. Interconnect Challenges for 3D Multi-cores: From 3D Network-on-Chip to Cache Interconnects
- Author
-
Ivan Miro-Panades, Y. Thonnart, Pascal Vivet, Eric Guthmuller, Fabien Clermidy, and Christian Bernard
- Subjects
Smart system ,Network on a chip ,business.industry ,Asynchronous communication ,Computer science ,Embedded system ,Bandwidth (computing) ,Cloud computing ,Electronics ,Cache ,business ,Chip - Abstract
With the era of massive multi-core architectures targeting cloud computing for high-end performance or advanced consumer electronics with tighter power consumption constraints, 3D integration technology will allow the design of large-scale multi-cores. Thanks to advanced available 3D technology, it will be possible to maintain the overall power consumption budget, increase chip-to-chip bandwidth, and preserve overall system cost by smart system partitioning. One of the main challenges of such multi-cores is clearly the interconnect infrastructure. Designing such 3D multi-cores requires addressing two primary concerns: the 3D physical link itself, and advanced interconnects scaled to 3D. The paper presents an overview of 3D interconnects with 3D asynchronous Network-on-Chip architectures, with a focus on 3D asynchronous links and advanced interconnect structures for memory caches in 3D.
- Published
- 2015
- Full Text
- View/download PDF
23. An Unbalanced Area Ratio Study for High Performance Monolithic 3D Integrated Circuits
- Author
-
Hossam Sarhan, Fabien Clermidy, Sebastien Thuries, and O. Billoint
- Subjects
Engineering ,Fine grain ,law ,business.industry ,Stacking ,Electronic engineering ,Area ratio ,Integrated circuit ,Performance improvement ,business ,3d design ,law.invention ,Power (physics) - Abstract
Monolithic 3D (M3D) integration technology offers fine-grain gate-level stacking capability, compared to 3D Through Silicon Vias (3D-TSV), which are well adapted for coarse-grain applications. As a result, design partitioning, i.e. which cell goes on which tier, highly affects the 3D design performance. Previous partitioning methodologies focus on minimizing the number of 3D interconnects for an equal area ratio between the stacked partitions. This paper demonstrates that unbalancing the tier-to-tier area ratio of the M3D design brings better performance than the classical balanced 3D design approach. Our study highlights that neither a balanced area ratio nor the number of 3D interconnections remains a mandatory criterion for M3D. We show that our technique can achieve up to 24% performance improvement compared to 2D and 15% better performance than the state-of-the-art technique without extra power penalty.
- Published
- 2015
- Full Text
- View/download PDF
24. Emerging resistive memories for low power embedded applications and neuromorphic systems
- Author
-
Olivier Bichler, B. DeSalvo, Christian Gamrat, Olivier P. Thomas, Elisa Vianello, Fabien Clermidy, Luca Perniola, Commissariat à l'énergie atomique et aux énergies alternatives - Laboratoire d'Electronique et de Technologie de l'Information (CEA-LETI), Direction de Recherche Technologique (CEA) (DRT (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA), Département d'Architectures, Conception et Logiciels Embarqués-LIST (DACLE-LIST), Laboratoire d'Intégration des Systèmes et des Technologies (LIST (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Direction de Recherche Technologique (CEA) (DRT (CEA)), and Laboratoire d'Intégration des Systèmes et des Technologies (LIST)
- Subjects
ReRAM ,Computer science ,Embedded systems ,Field programmable gate arrays (FPGA) ,Complex networks ,Integrated circuit design ,Non-volatile flip-flops ,artificial synapses ,Synaptic plasticity ,Synapse ,[SPI]Engineering Sciences [physics] ,Hardware_INTEGRATEDCIRCUITS ,Electronic engineering ,Spiking neural network ,Stochastic systems ,Feedforward neural networks ,Spiking neural networks ,Artificial neural network ,Neuromorphic systems ,Logic level ,Resistive random-access memory ,Non-volatile memory ,Flip flop circuits ,Energy efficiency ,CMOS ,Neuromorphic engineering ,Neuromorphic circuits ,Logic gate ,Embedded application ,Low power electronics ,Neural networks ,Random access storage ,Logic circuits ,Hardware_LOGICDESIGN - Abstract
Conference of IEEE International Symposium on Circuits and Systems, ISCAS 2015; Conference Date: 24 May 2015 Through 27 May 2015; Conference Code:115760; International audience; In this work, we will focus on the role that new nonvolatile resistive memory technologies can play in emerging fields of application, such as non-volatile logic circuits or neuromorphic circuits, to save energy and increase performance. Concerning the introduction of non-volatile functionalities at the logic level, we will demonstrate hybrid CMOS logic plus ReRAM (specifically CBRAM and OXRAM) circuits for ultra low power FPGA and fixed-logic IC design, such as Non Volatile Flip-Flops. Concerning neuromorphic circuits, we will focus on the emulation of synaptic plasticity effects with resistive memory synapses. We will present large-scale energy efficient neuromorphic systems based on ReRAM as stochastic-binary synapses. Prototype applications such as complex visual- and auditory-pattern extraction will also be discussed using feedforward spiking neural networks.
- Published
- 2015
- Full Text
- View/download PDF
25. Fine-grain DVFS and AVFS techniques for complex SoC design: An overview of architectural solutions through technology nodes
- Author
-
Yvain Thonnart, Ivan Miro-Panades, D. Lattard, Fabien Clermidy, P. Vivet, and Edith Beigne
- Subjects
Engineering ,business.industry ,Hardware_PERFORMANCEANDRELIABILITY ,Reduction (complexity) ,Asynchronous communication ,Frequency domain ,Embedded system ,Electronic engineering ,Overhead (computing) ,Node (circuits) ,System on a chip ,business ,Frequency scaling ,Voltage - Abstract
In this paper we give an overview of the fine-grain design techniques we have demonstrated in recent years in our lab for power reduction in complex SoCs. These works are based on Globally Asynchronous and Locally Synchronous (GALS) systems in which each IP is an independent voltage and frequency domain. After having proposed some simple DFS architectures based on GALS architectures in 130 nm technology, we extended our work to fine-grain Dynamic Voltage and Frequency Scaling architectures to reduce dynamic and static power at the 65 nm node. Furthermore, considering 32 nm deep submicron technologies, we demonstrated an Adaptive Voltage and Frequency architecture to compensate for in-die PVT variations. Area overhead and power reduction results are discussed throughout the paper.
- Published
- 2015
- Full Text
- View/download PDF
26. From 2D to Monolithic 3D
- Author
-
Perrine Batude, Claire Fenouillet-Beranger, F. Deprat, Ogun Turkyilmaz, Iyad Rayane, Olivier Rozeau, Maud Vinet, Hossam Sarhan, Sebastien Thuries, O. Billoint, G. Cibrario, and Fabien Clermidy
- Subjects
Standard cell ,Through-silicon via ,Computer science ,Process (engineering) ,law ,Distributed computing ,Time to market ,Hardware_INTEGRATEDCIRCUITS ,Multiple patterning ,Node (circuits) ,Place and route ,Integrated circuit ,law.invention - Abstract
Design of conventional 2D integrated circuits is becoming more and more challenging as we strive to keep on following Moore's law. Cost, thermal behavior, multiple patterning, increasing number of design rules, transistor characteristics, variability and back end properties coupled with a constant need for a higher integration of functions / peripherals are creating an increasingly complex equation to solve for designers. Moving to the next node and taking advantage of the technology are now far from being straightforward as time to market has never been so short for industry. In order to overcome or at least postpone the time when we'll have to face the "next node migration constraints", a possible solution could be to stay at the same node and go 3D, with possible benefits such as wire length reduction, power savings and increased operating frequency. For more than ten years now, interconnect technologies like Through Silicon Via (TSV), High Density (HD)-TSV and Copper to Copper (Cu-Cu) have arisen to take advantage of this possible 3-dimensional physical implementation with proofs of concept [1] or more recently industrial products [2]. The main drawback of these technologies is that they are not shrinking at the same speed as transistors are, making them somewhat power hungry; moreover, the more they shrink, the more precision will be needed for chip-to-chip alignment. To reach the highest possible standard cell and tier-to-tier interconnect densities required for cost-effective chips, a 3D sequential integration process [3][4][5] (also known as Monolithic 3D or CoolCube™) is currently being developed, its main features being sequential fabrication of MOS layers and correlation of tier-to-tier interconnect size with process node, allowing fine-grain 3D partitioning of designs. These particularities make it a durable opportunity to slow down next node design migration while still improving integration.
To fully benefit from CoolCube™ technology, a whole new way of designing circuits, from synthesis to place and route, will be required as some new challenges will arise. The point of this presentation is to show the possible use and limitations of the aforementioned technologies with a focus on Monolithic 3D and to give some insights about market expectations, challenges and available design techniques.
- Published
- 2015
- Full Text
- View/download PDF
27. A Co-design Approach for Hardware Optimizations in Multicore Architectures Using MCAPI
- Author
-
Romain Lemaire, Fabien Clermidy, and Thiago Raupp da Rosa
- Subjects
Reduction (complexity) ,Multi-core processor ,Software ,Computer architecture ,Point (typography) ,Application programming interface ,business.industry ,Computer science ,MCAPI ,Synchronization (computer science) ,business ,Electrical efficiency ,Computer hardware - Abstract
Current SoC platforms targeting high performance with high power efficiency rely on replicating several processing cores while adding dedicated hardware units for specific tasks. However, programming such architectures demands a high effort when compared to homogeneous multiprocessors since there is no widely used standard for heterogeneous embedded systems. The use of standard application programming interfaces (APIs) increases programmability but also incurs performance and memory usage overheads. Providing mechanisms at the software level that leverage dedicated hardware resources can help reduce that impact. To address this point, this work presents a co-design approach for improving programming based on a standard API deployed through a mix of hardware and software support for task synchronization. Results show a reduction of up to 88% in network traffic and processor active times during synchronization phases when compared to a pure software implementation.
- Published
- 2015
- Full Text
- View/download PDF
28. Resistive Memories for Ultra-Low-Power embedded computing design
- Author
-
T. Benoist, Daniele Garbin, G. Molas, M. Reyboz, Fabien Clermidy, E. Vianello, Olivier P. Thomas, N. Jovanovic, O. Turkyilmaz, J. Coignus, Alain Toffoli, L. Perniola, C. Nguyen, Bastien Giraud, C. Charpin, Giorgio Palma, and M. Alayan
- Subjects
ComputingMilieux_GENERAL ,Ultra low power ,Resistive touchscreen ,Engineering ,business.industry ,Hardware_INTEGRATEDCIRCUITS ,Electronic engineering ,Hardware_PERFORMANCEANDRELIABILITY ,Integrated circuit design ,Hardware_ARITHMETICANDLOGICSTRUCTURES ,Field-programmable gate array ,business ,Hardware_LOGICDESIGN - Abstract
This paper addresses two technologies as an example of optimized devices for FPGA and fixed-logic IC design (as non volatile Flip-Flops).
- Published
- 2014
- Full Text
- View/download PDF
29. Monolithic 3D integration: A powerful alternative to classical 2D scaling
- Author
-
O. Faynot, Ogun Turkyilmaz, F. Deprat, F. Ponthenier, M.-P. Samson, Hossam Sarhan, G. Cibrario, L. Pasini, V. Lu, Claude Tabone, J-E. Michallet, M. Vinet, Perrine Batude, N. Rambal, Fabien Clermidy, O. Billoint, JM Hartmann, Claire Fenouillet-Beranger, O. Rozeau, Benoit Sklenard, Laurent Brunet, Sebastien Thuries, and Bernard Previtali
- Subjects
Engineering ,business.industry ,law ,Scale (chemistry) ,MOSFET ,Transistor ,Hardware_INTEGRATEDCIRCUITS ,Electrical engineering ,Electronic engineering ,business ,3d ic design ,Scaling ,law.invention - Abstract
Monolithic or sequential 3D Integration is a powerful technological enabler for actual 3D IC design as the stacked layers can be connected at the transistor scale. This paper reviews the opportunities brought by M3DI and highlights the applications benefiting from this small 3D contact pitch. It also presents the technological challenges of this concept and offers a general overview of the potential solutions to obtain a high performance low temperature top transistor while keeping bottom MOSFET integrity.
- Published
- 2014
- Full Text
- View/download PDF
30. 3D technologies for reconfigurable architectures
- Author
-
Pierre-Emmanuel Gaillardon, Fabien Clermidy, O. Turkyilmaz, and O. Billoint
- Subjects
Computer science ,business.industry ,Embedded system ,Path (graph theory) ,Lower cost ,business ,Field-programmable gate array ,Reconfigurable computing - Abstract
FPGAs have always benefited from the most advanced technology nodes to offer better performance than CPUs and better time-to-market than ASSPs. However, with the slow-down of technology scaling and its exponentially increasing cost, the FPGA race towards better integration is now compromised. One alternative path to scaling is to go 3D. This promising solution can offer scaling at a lower cost while solving some FPGA issues such as yield or I/O management. However, 3D solutions come with some drawbacks, namely heterogeneous performance of 3D/2D links and limited 3D interconnections. In this paper, we show some recent advances in the usage of 3D technologies for enhancing FPGA capacities.
- Published
- 2014
- Full Text
- View/download PDF
31. 3D sequential integration opportunities and technology optimization
- Author
-
Claude Tabone, Ogun Turkyilmaz, Fabien Clermidy, Perrine Batude, Hossam Sarhan, J-E. Michallet, Claire Fenouillet-Beranger, M. Vinet, Laurent Brunet, G. Cibrario, Olivier Rozeau, Benoit Sklenard, Sebastien Thuries, O. Billoint, F. Deprat, and Bernard Previtali
- Subjects
Materials science ,business.industry ,Transistor ,Electrical engineering ,Hardware_PERFORMANCEANDRELIABILITY ,law.invention ,Planar ,CMOS ,law ,MOSFET ,Hardware_INTEGRATEDCIRCUITS ,Electronic engineering ,Field-effect transistor ,business ,Scaling - Abstract
Compared with TSV-based 3D ICs, monolithic or sequential 3D ICs present “true” benefits of going to the vertical dimension as the stacked layers can be connected at the transistor scale. The high versatility of this technology is evidenced via several examples requiring small 3D contact pitch. Monolithic 3D is shown to enable substantial gain in area and performance as compared to planar technology without scaling the transistor technology node. This paper summarizes the technological challenges of this concept: it offers a general overview of the potential solutions to obtain a high performance low temperature top transistor while keeping bottom MOSFET integrity.
- Published
- 2014
- Full Text
- View/download PDF
32. A 460MHz at 397mV, 2.6GHz at 1.3V, 32b VLIW DSP, embedding FMAX tracking
- Author
-
Fabien Clermidy, Thomas Benoist, Robin Wilson, Bastien Giraud, Alexandre Valentian, Bertrand Pelloux-Prayer, Olivier P. Thomas, Sylvain Clerc, Julien Le Coz, Ivan Miro Panades, Jean-Philippe Noel, Christian Bernard, David Turgis, O. Billoint, Fady Abouzeid, Anuj Grover, Edith Beigne, Philippe Flatresse, Philippe Magarshack, Yvain Thonnart, Philippe Roche, and Sebastien Bernard
- Subjects
Signal processing ,Computer science ,Very long instruction word ,business.industry ,Datapath ,Clock rate ,Register file ,Electronic engineering ,Serial port ,business ,Digital signal processing - Abstract
Wide-voltage-range-operation DSPs bring more versatility to achieve high energy efficiency in mobile applications to increase signal processing complexity and handle a large range of performance specifications. This paper describes a 32b DSP fabricated in 28nm UTBB FDSOI technology [1]. Body-bias-voltage (VBB) scaling from 0V up to ±2V (Pwell/Nwell) decreases the DSP core VDDMIN to 397mV and increases clock frequency by +400% at 500mV and +114% at 1.3V. In addition to technology gains, dedicated design features are included to increase frequency over the full VDD range, considering parameter variations. As depicted in Fig. 27.1.1, the 32b datapath VLIW DSP is organized around a MAC dedicated to complex arithmetic and two dedicated operators: a cordic/divider and a compare/select. Data enters the circuit through a serial interface and code is run from a 64×32b register file. It has been shown in [1] that a given operating frequency can be achieved at a lower VDD in UTBB FDSOI compared to bulk by applying a forward-body bias. An additional design step is achieved in this work by (1) increasing the frequency at low VDD thanks to a specific selection and design of standard cells with respect to power vs. performance and (2) dynamically tracking the maximum frequency to cope with variations.
- Published
- 2014
- Full Text
- View/download PDF
33. 3DCoB: A new design approach for Monolithic 3D Integrated circuits
- Author
-
Fabien Clermidy, Hossam Sarhan, Sebastien Thuries, and O. Billoint
- Subjects
Engineering ,business.industry ,Circuit design ,Mixed-signal integrated circuit ,Integrated circuit design ,Integrated circuit ,law.invention ,law ,Benchmark (computing) ,Electronic engineering ,Parasitic extraction ,Physical design ,Full custom ,business - Abstract
3D Monolithic Integration (3DMI) technology provides very high-density vertical interconnects with low parasitics. Previous 3DMI design approaches provide either cell-on-cell or transistor-on-transistor integration. In this paper we present 3D Cell-on-Buffer (3DCoB) as a novel design approach for 3DMI. Our approach provides a sign-off physical implementation flow that is fully compatible with conventional 2D tools. We implement our approach on a set of benchmark circuits using 28nm-FDSOI technology. The sign-off performance results show a 35% improvement compared to the same 2D design.
- Published
- 2014
- Full Text
- View/download PDF
34. Advanced technologies for brain-inspired computing
- Author
-
Christian Gamrat, Olivier Bichler, Marc Duranton, Bilel Blehadj, Rodolphe Heliot, Alexandre Valentian, Fabien Clermidy, Olivier Temam, Commissariat à l'énergie atomique et aux énergies alternatives - Laboratoire d'Electronique et de Technologie de l'Information (CEA-LETI), Direction de Recherche Technologique (CEA) (DRT (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA), Département d'Architectures, Conception et Logiciels Embarqués-LIST (DACLE-LIST), Laboratoire d'Intégration des Systèmes et des Technologies (LIST (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Direction de Recherche Technologique (CEA) (DRT (CEA)), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria), and Laboratoire d'Intégration des Systèmes et des Technologies (LIST)
- Subjects
Engineering ,Through silicon vias ,Emerging technologies ,Monolithic integration ,Memristor ,Capacitance ,law.invention ,Advanced technology ,[SPI]Engineering Sciences [physics] ,Phase change ,Dimension (vector space) ,law ,Electronic engineering ,[INFO]Computer Science [cs] ,Resistive touchscreen ,Computer aided design ,Artificial neural network ,business.industry ,Resistive memory ,High capacitance ,Brain-inspired computing ,Analog neurons ,Monolithic integrated circuits ,business ,Neural networks ,Random access storage - Abstract
Conference of 2014 19th Asia and South Pacific Design Automation Conference, ASP-DAC 2014 ; Conference Date: 20 January 2014 Through 23 January 2014; Conference Code:103285; International audience; This paper aims at presenting how new technologies can overcome classical implementation issues of Neural Networks. Resistive memories such as Phase Change Memories and Conductive-Bridge RAM can be used for obtaining low-area synapses thanks to programmable resistance also called Memristors. Similarly, the high capacitance of Through Silicon Vias can be used to greatly improve analog neurons and reduce their area. The very same devices can also be used for improving connectivity of Neural Networks as demonstrated by an application. Finally, some perspectives are given on the usage of 3D monolithic integration for better exploiting the third dimension and thus obtaining systems closer to the brain.
- Published
- 2014
- Full Text
- View/download PDF
35. 3D FPGA using high-density interconnect Monolithic Integration
- Author
-
Ogun Turkyilmaz, Gerald Cibrario, Olivier Rozeau, Perrine Batude, and Fabien Clermidy
- Published
- 2014
- Full Text
- View/download PDF
36. Reconfigurable architectures and emerging technologies
- Author
-
Fabien Clermidy
- Subjects
Engineering ,Computer architecture ,business.industry ,Emerging technologies ,Electronic engineering ,business ,Field-programmable gate array - Abstract
Reconfigurable logic architectures such as Field Programmable Gate Arrays (FPGA) are known to be generic and highly versatile. This makes them an excellent compromise between cost, development time and performance. Suited for a wide range of applications, they offer an intrinsic regularity compatible with the most advanced technological processes.
- Published
- 2013
- Full Text
- View/download PDF
37. A CBRAM-based compact interconnect switch for non-volatile reconfigurable logic circuits
- Author
-
Marina Reyboz, Santhosh Onkaraiah, Fabien Clermidy, Elisa Vianello, Jean-Michel Portal, Christophe Muller, and Marc Belleville
- Subjects
Interconnection ,Bridging (networking) ,Programmable metallization cell ,Computer science ,Logic gate ,Electronic engineering ,Static random-access memory ,Field-programmable gate array ,Random access ,Electronic circuit - Abstract
This paper presents a 2-to-2 interconnect switch based on Conductive Bridging Random Access Memories (CBRAMs), which can be used to form a switch box in reconfigurable logic circuits like FPGAs. Interconnect switching as well as configuration storage are achieved by the same resistive switching devices. The solution is stable, without read disturb or false programming, and brings an area saving of more than a factor of two compared to current SRAM-based circuits. It is a promising breakthrough for including permanent retention mechanisms in embedded systems at low cost.
- Published
- 2013
- Full Text
- View/download PDF
38. 3D stacking for multi-core architectures: From WIDEIO to distributed caches
- Author
-
D. Dutoit, Ivan Miro-Panades, P. Vivet, Fabien Clermidy, and Eric Guthmuller
- Subjects
Multi-core processor ,Hardware_MEMORYSTRUCTURES ,Computer science ,business.industry ,Registered memory ,Uniform memory access ,Semiconductor memory ,Memory controller ,Distributed cache ,CAS latency ,Computer architecture ,Embedded system ,Interleaved memory ,Static random-access memory ,business ,Auxiliary memory ,Dram - Abstract
3D stacking has been viewed as a breakthrough solution for increasing performance in multi-core architectures. The hope is to solve some of the main issues in current multi-core architectures: external memory pressure and latency; I/O bottleneck; communication power consumption. In this paper, some advances in this field of research are shown, starting with a WIDEIO experience on a real chip for solving the DRAM access issue. The integration of a 512-bit-wide bus is demonstrated in a Network-on-Chip (NoC) multi-core framework, along with the resulting performance based on a 65nm prototype with 10μm diameter Through Silicon Vias (TSV). The potential of 3D scaling thanks to a 3D asynchronous Network-on-Chip implementation is then shown. Finally, an innovative 3D stacked distributed cache strategy aimed at lowering memory latency and external memory bandwidth requirements is presented. This new memory partitioning demonstrates the efficiency of 3D stacking to rethink architectures for addressing multi-core scaling challenges.
- Published
- 2013
- Full Text
- View/download PDF
39. A hybrid CBRAM/CMOS Look-Up-Table structure for improving performance efficiency of Field-Programmable-Gate-Array
- Author
-
Elisa Vianello, Marina Reyboz, Santhosh Onkaraiah, Fabien Clermidy, Ogun Turkyilmaz, Jean-Michel Portal, and Christophe Muller
- Subjects
Power gain ,Logic synthesis ,CMOS ,Computer science ,Programmable metallization cell ,Electronic engineering ,Static random-access memory ,Integrated circuit design ,Field-programmable gate array ,Electrical efficiency ,Programmable logic array - Abstract
At the most advanced technology nodes, Field Programmable Gate Arrays (FPGA) present great advantages compared to more conventional processor architectures; their natural regularity, modularity and inherent reliability due to duplicated identical tiles provide a solution to cope with new technologies with increasing variability. However, the FPGA market is still limited by a power efficiency issue, due to two combined factors: interconnection-dominated design and large usage of memories, computation being performed by Look-Up-Tables (LUT). In this paper, we propose a solution to improve the performance and reduce the power consumption of LUTs in FPGAs using CBRAM-based structures. Our proposed design shows significant improvement compared to the traditional SRAM-based FPGA: critical delay is reduced by ~23% thanks to the compact (1T-2R) structure, and static power consumption is reduced by ~18%.
- Published
- 2013
- Full Text
- View/download PDF
40. Self-checking ripple-carry adder with Ambipolar Silicon NanoWire FET
- Author
-
Luca Amaru, Ogun Turkyilmaz, Fabien Clermidy, Pierre-Emmanuel Gaillardon, and Giovanni De Micheli
- Subjects
Adder ,Engineering ,business.industry ,Ambipolar diffusion ,Bipolar junction transistor ,Transistor ,Electrical engineering ,Hardware_PERFORMANCEANDRELIABILITY ,Fault detection and isolation ,law.invention ,CMOS ,Transmission gate ,law ,Logic gate ,Electronic engineering ,business - Abstract
For the rapid adoption of new and aggressive technologies such as ambipolar Silicon NanoWire (SiNW), addressing fault-tolerance is necessary. Traditionally, transient fault detection implies large hardware overhead or performance decrease compared to permanent fault detection. In this paper, we focus on on-line testing and its application to ambipolar SiNW. We demonstrate on a self-checking ripple-carry adder how the ambipolar design style can help reduce the hardware overhead. When compared with an equivalent CMOS process, the ambipolar SiNW design shows a reduction in area of at least 56% (28%) with a decreased delay of 62% (6%) for the Static (Transmission Gate) design style.
- Published
- 2013
- Full Text
- View/download PDF
41. Network-on-chip traffic modeling for data flow applications
- Author
-
Romain Prolonge and Fabien Clermidy
- Subjects
Flexibility (engineering) ,Data flow diagram ,Network on a chip ,Computer science ,Distributed computing ,Locality ,Real-time computing ,Traffic model - Abstract
Network traffic modeling is widely used for investigating Network-on-Chip characteristics and performance. While traffic patterns based on real applications are accurate but slow to obtain and limited in terms of flexibility, synthetic traffic is flexible but does not offer the required accuracy. In this paper, we present a new traffic model with more accuracy than synthetic patterns and more flexibility than those based on real applications. It introduces concepts used in data-flow applications such as data dependencies and task locality. In a 3GPP-LTE application test case, we demonstrated that our model is more accurate than synthetic traffic. Indeed, the difference between our model and the application can be less than 1%, while the difference between synthetic traffic and the application is up to 54%. Moreover, our proposal can be 33% faster than realistic traffic.
- Published
- 2013
- Full Text
- View/download PDF
42. A dynamic stream link for efficient data flow control in NoC based heterogeneous MPSoC
- Author
-
D. Fuin, C. Pilkington, Pierre Paulin, Fabien Clermidy, S. Basset, Pascal Vivet, M. Langevin, C. Helmstetter, and R. Lemaire
- Subjects
Data flow diagram ,Flexibility (engineering) ,Network on a chip ,Forcing (recursion theory) ,Exploit ,Dataflow ,Computer science ,business.industry ,Embedded system ,MPSoC ,Communications protocol ,business - Abstract
As System-on-Chip sizes increase, communication costs become critical and Networks-on-Chip (NoC) bring innovative solutions. Efficient stream-based protocols over NoC have been widely studied to address dataflow communications. They are usually controlled by a set of static parameters. However, new applications, such as high-resolution video decoders, present more data-dependent behaviors, forcing communication protocols to support higher dynamicity. For this purpose, we present in this paper dynamic stream links for stream-based end-to-end NoC communications by introducing two link protocols, both independent of the transfer size, which improve hardware/software control flexibility. The proposed protocols have been modeled in an MPSoC virtual platform and the hardware cost evaluated. Based on simulations, we provide guidelines to exploit these protocols according to application needs.
- Published
- 2013
- Full Text
- View/download PDF
43. Design challenges for nano-scale devices
- Author
-
Marc Belleville, Alexandre Valentian, Olivier Thomas, and Fabien Clermidy
- Subjects
Engineering ,Operating point ,Nanoelectronics ,business.industry ,Electronic engineering ,Integrated circuit design ,business ,Track (rail transport) ,Nanoscopic scale ,Energy (signal processing) - Abstract
This paper presents an overview of the design challenges and solutions under development for nano-scale technologies. Major application requirements and nanotechnology design limitations are introduced. Adaptive techniques aiming to cope with variations and to track an optimal energy operating point are presented.
- Published
- 2012
- Full Text
- View/download PDF
44. Platform 2012, a many-core computing accelerator for embedded SoCs
- Author
-
Germain Haugou, Bruno Jego, Fabien Clermidy, Thierry Lepley, Diego Melpignano, Eric Flamand, Denis Dutoit, Luca Benini, Melpignano D., Benini L., Flamand E., Jego B., Lepley T., Haugou G., Clermidy F., and Dutoit D.
- Subjects
Power management ,Visual analytics ,Computer science ,business.industry ,CMOS ,power consumption ,P2012 ,Synchronization ,Asynchronous communication ,Embedded system ,Programming paradigm ,System on a chip ,power management ,business - Abstract
P2012 is an area- and power-efficient many-core computing accelerator based on multiple globally asynchronous, locally synchronous processor clusters. Each cluster features up to 16 processors with independent instruction streams sharing a multibanked one-cycle access L1 data memory, a multi-channel DMA engine and specialized hardware for synchronization and aggressive power management. P2012 is 3D stacking ready and can be customized to achieve extreme area and energy efficiency by adding domain-specific HW IPs to the cluster. The first P2012 SoC prototype in 28 nm CMOS will sample in Q3, featuring four 16-processor clusters, a 1 MB L2 memory and delivering 80 GOPS (with 32-bit single-precision floating-point support) in 18 mm² with 2 W power consumption (worst-case). P2012 can run standard OpenCL™ and proprietary Native Programming Model SW components to achieve the highest level of control on application-to-resource mapping. A dedicated version of the OpenCV vision library is provided in the P2012 SW Development Kit to enable visual analytics acceleration. This paper will discuss preliminary performance measurements of common feature extraction and tracking algorithms, parallelized on P2012, versus sequential execution on ARM CPUs.
- Published
- 2012
45. Session details: NoCs next top model: from system-level to prototype
- Author
-
Fabien Clermidy
- Subjects
Computer architecture ,Computer science ,System level ,Session (computer science) - Published
- 2012
46. Introduction
- Author
-
Pierre-Emmanuel Gaillardon, Ian O’Connor, and Fabien Clermidy
- Published
- 2012
47. Innovative Structures for Routing and Configuration
- Author
-
Ian O'Connor, Fabien Clermidy, and Pierre-Emmanuel Gaillardon
- Subjects
Transport engineering ,Link-state routing protocol ,Computer architecture ,Emerging technologies ,Computer science ,Context (language use) ,Routing (electronic design automation) ,Field-programmable gate array ,Bottleneck ,Hierarchical routing ,Resistive random-access memory - Abstract
The goal of this chapter is to illustrate how emerging technologies can help to improve performance metrics of conventional Field-Programmable Gate Array structures. It is widely recognized that in traditional FPGAs, both the memory and the routing circuitry (each accounting for 43% of the area) represent the principal bottleneck to scaling and performance increase. In this context, we investigated 3D integration techniques for passive and active devices. The technologies surveyed will be a resistive memory technology, monolithic 3D integration and a vertical 1D transistor technology.
- Published
- 2012
48. Towards Autonomous Scalable Integrated Systems
- Author
-
Olivier Brousse, Marc Belleville, Nadine Azemard, Gilles Sassatelli, Bettina Rebaud, Diego Puschini, Philippe Maurine, Pascal Benoit, Fabien Clermidy, Gabriel Marchesan Almeida, Michel Robert, and Lionel Torres; Conception et Test de Systèmes MICroélectroniques (SysMIC), Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM); Commissariat à l'énergie atomique et aux énergies alternatives - Laboratoire d'Electronique et de Technologie de l'Information (CEA-LETI), Direction de Recherche Technologique (CEA) (DRT (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)
- Subjects
Distribute Genetic Algorithm ,Ambient intelligence ,Task Migration ,business.industry ,Computer science ,Distributed computing ,020208 electrical & electronic engineering ,Cloud computing ,Globally Asynchronous Locally Synchronous ,02 engineering and technology ,MPSoC ,020202 computer hardware & architecture ,System model ,Variability Compensation ,Paradigm shift ,Complexity management ,Scalability ,0202 electrical engineering, electronic engineering, information engineering ,MPSoC Design ,[SPI.NANO]Engineering Sciences [physics]/Micro and nanotechnologies/Microelectronics ,business ,Block (data storage) - Abstract
The evolution of silicon integration capabilities has, for several decades, been pushing the limits of complexity management. In the nanotechnology era, billions of transistors can be assembled to form high-performance systems with varied functionalities. Today we could say that we connect processors for MPSoC design, just as we once assembled transistors for SoC. The building block has evolved and a concomitant paradigm shift is taking place. Indeed, the intrinsic capacity of a processor goes far beyond simply acting as a switch. A processor is indeed capable of controlling a system and performing complex calculations. The assembly of hundreds of elements offers new prospects, such as ambient intelligence or chip cloud computing. In this chapter, we analyze scalability in terms of functionality, technology and structure, and propose a general autonomous integrated system model. We then develop our approach to encompass distributed MPSoC systems. Self-adaptability, at the core of the evolution towards autonomous computing, is illustrated through several contributions aimed at compensating for technology, applicative and environmental variability phenomena. Finally, a macroscopic example of a multi-agent bio-inspired approach shows what we believe to be the future of integrated systems.
- Published
- 2012
49. Disruptive Logic Blocks
- Author
-
Ian O'Connor, Fabien Clermidy, and Pierre-Emmanuel Gaillardon
- Subjects
Computer science ,Logic block ,Logic gate ,Hardware_INTEGRATEDCIRCUITS ,Electronic engineering ,Hardware_PERFORMANCEANDRELIABILITY ,Electronics ,Crossbar switch ,Field-programmable gate array ,Dynamic logic (digital electronics) ,Hardware_LOGICDESIGN ,Design for manufacturability ,Carbon nanotube field-effect transistor - Abstract
In this chapter, emerging technologies will be used to create disruptive elements for Field Programmable Gate Arrays. We focus mainly on the combinational function blocks, in order to improve the computing performance of future reconfigurable systems. We propose to study the use of an ambipolar carbon electronics process and two different silicon nanowire crossbar processes. Carbon electronics, and especially the Carbon Nanotube Field Effect Transistor, exhibits the property of ambipolarity, which means that n- and p-type behaviors are achievable within the same device. It thus becomes possible to obtain a device with tunable polarity, thanks to the addition of a second (polarity) gate to the device. This novel programmability of CNFETs is leveraged in a compact in-field reconfigurable logic gate and in a new approach to designing compact dynamic logic gates. We then propose the use of a sublithographic silicon nanowire crossbar process. It is worth noting that using the crossbar organization helps to compact the dimensions (up to 6×) required by the logic circuits. Nevertheless, a technological process built around a sublithographic arrangement of nanowires is highly unreliable, and its feasibility remains uncertain when considering all the access contacts. In order to correct the lack of manufacturability of the sublithographic crossbar process, we propose a variant on this crossbar process. This is realized on a modified Fully Depleted Silicon-On-Insulator process, and enables the construction of circuits in a crossbar scheme with lithographic dimensions.
- Published
- 2012
50. Disruptive Architectural Proposals and Performance Analysis
- Author
-
Ian O'Connor, Fabien Clermidy, and Pierre-Emmanuel Gaillardon
- Subjects
Computer architecture ,Computer science ,Multitier architecture ,Topology (electrical circuits) ,Benchmarking ,Routing (electronic design automation) ,Architecture ,Layer (object-oriented design) ,Field-programmable gate array ,Network topology - Abstract
In this chapter, we explore disruptive architecture proposals. In the previous chapter, we showed that it is possible to obtain very compact reconfigurable in-field computation cells. Since these cells require architectural modifications, we proposed an architecture for this compact logic, characterized by the association of a logic layer, to adapt the granularity and the use of fixed interconnection topologies to reduce the routing impact. To compare this approach with conventional FPGAs in an objective way, it was necessary to develop a specific toolflow suited to our requirements, able to describe the designed architecture. Based on the VTR toolflow, the tool integrates fixed topology routing and the specific organization of the layered architecture. Benchmarking simulations were performed. In a first approach, a local exploration of the proposed layer is done, in order to study the impact of the fixed interconnect topologies. We showed that the Modified Omega topology gives the best mapping rates on the structure, with about 90% mapping success for 6-node graphs. In a second approach, complete architectural benchmarking was conducted and we showed that the proposed architecture leads to an area saving of 46% on average with respect to CMOS. We also discovered that the routing delay is less distributed and tends to be more controllable than in the traditional approach.
- Published
- 2012