287 results for "Poncino, M."
Search Results
252. Analysis of energy dissipation in the memory hierarchy of embedded systems: a case study.
- Author
-
Benini, L., Macii, A., Macii, E., and Poncino, M.
- Published
- 2000
- Full Text
- View/download PDF
253. FPGA synthesis using Look-Up Table and Multiplexor Based architectures.
- Author
-
Macii, E. and Poncino, M.
- Published
- 1994
- Full Text
- View/download PDF
254. Computation of exact random pattern detection probability.
- Author
-
Farhat, H., Lioy, A., and Poncino, M.
- Published
- 1993
- Full Text
- View/download PDF
255. On the resetability of synchronous sequential circuits.
- Author
-
Lioy, A. and Poncino, M.
- Published
- 1993
- Full Text
- View/download PDF
256. Connectivity and spectral analysis of finite state machines.
- Author
-
Macii, E. and Poncino, M.
- Published
- 1994
- Full Text
- View/download PDF
257. Synthesis of fully testable combinational circuits.
- Author
-
Evans, A.H., Macii, E., and Poncino, M.
- Published
- 1994
- Full Text
- View/download PDF
258. Implicit evaluation of encoding rotations for large FSMs.
- Author
-
Poncino, M.
- Published
- 1996
- Full Text
- View/download PDF
259. Property verification of communication protocols based on probabilistic reachability analysis.
- Author
-
Baldi, M., Macii, E., and Poncino, M.
- Published
- 1996
- Full Text
- View/download PDF
260. Comparing different Boolean unification algorithms.
- Author
-
Macii, E., Odasso, G., and Poncino, M.
- Published
- 1998
- Full Text
- View/download PDF
261. F-gate: a device for glitch power minimization.
- Author
-
Benini, L., Macii, A., Macii, E., Poncino, M., and Scarsi, R.
- Published
- 1998
- Full Text
- View/download PDF
262. Reducing peak power consumption of combinational test sets.
- Author
-
Macii, A., Macii, E., and Poncino, M.
- Published
- 1998
- Full Text
- View/download PDF
263. A comparative study of complexity-based capacitance macro-models.
- Author
-
Macii, E., Poncino, M., and Scarsi, R.
- Published
- 1998
- Full Text
- View/download PDF
264. Hardware simulation: a flexible approach to verification and performance evaluation of communication protocols.
- Author
-
Baldi, M., Macii, E., and Poncino, M.
- Published
- 1995
- Full Text
- View/download PDF
265. The design of easily scalable bus arbiters with different dynamic priority assignment schemes.
- Author
-
Macii, E. and Poncino, M.
- Published
- 1995
- Full Text
- View/download PDF
266. Glitch power minimization by gate freezing.
- Author
-
Benini, L., De Micheli, G., Macii, A., Macii, E., Poncino, M., and Scarsi, R.
- Published
- 1999
- Full Text
- View/download PDF
267. A study of the resetability of synchronous sequential circuits
- Author
-
Lioy, A. and Poncino, M.
- Published
- 1993
- Full Text
- View/download PDF
268. RTL power estimation in an HDL-based design flow.
- Author
-
Bruno, M., Macii, A., and Poncino, M.
- Subjects
- ELECTRONIC circuit design, ELECTRONICS, SEMICONDUCTORS, STATISTICS, TECHNOLOGICAL innovations, COMPUTER science
- Abstract
Power estimation at the register-transfer level (RTL) is usually narrowed down to the problem of building accurate power models for the modules corresponding to RTL operators. It is shown that, when RTL power estimation is integrated into a realistic design flow based on an HDL description, other types of primitives need to be accurately modelled. In particular, a significant part of the RTL functionality is realised by sparse logic elements. The proposed estimation strategy replaces the low-effort synthesis that is typically used for this type of fine-grain primitives with an empirical power model based on parameters that can be extracted either from the internal representation of the design or from RTL simulation data. The model can be made scalable with respect to technology, and provides very good accuracy (13% average error, measured on a set of industrial benchmarks). Using a similar statistical paradigm, accurate (about 20% average error) models for the power consumption of internal wires are also presented. (An illustrative sketch follows this record.)
- Published
- 2005
- Full Text
- View/download PDF
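The record above describes fitting an empirical, parameter-based power model for fine-grain RTL primitives. As a rough illustration of that idea (not the authors' actual model), the sketch below fits a least-squares macro-model of block power against a few hypothetical parameters; the feature names and all numbers are invented placeholders.

```python
import numpy as np

# Hypothetical characterization data: one row per fine-grain logic block.
# Columns (invented for illustration): cell count, avg. switching activity, avg. fanout.
X = np.array([
    [120, 0.18, 2.1],
    [340, 0.25, 2.8],
    [ 75, 0.10, 1.9],
    [510, 0.31, 3.2],
    [230, 0.22, 2.5],
], dtype=float)
# Reference power values in microwatts (e.g. from gate-level simulation); also invented.
p_ref = np.array([14.2, 48.5, 6.8, 81.3, 30.1])

# Least-squares fit of  p ~ c0 + c1*cells + c2*activity + c3*fanout.
A = np.hstack([np.ones((X.shape[0], 1)), X])
coeffs, *_ = np.linalg.lstsq(A, p_ref, rcond=None)

def estimate_power_uW(cells: float, activity: float, fanout: float) -> float:
    """Evaluate the fitted macro-model for a new block (microwatts)."""
    return float(coeffs @ np.array([1.0, cells, activity, fanout]))

print("fitted coefficients:", np.round(coeffs, 4))
print("estimate for a 200-cell block:", round(estimate_power_uW(200, 0.20, 2.4), 2), "uW")
```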
269. Predicting Hard Disk Failures in Data Centers Using Temporal Convolutional Neural Networks
- Author
-
Massimo Poncino, Daniele Jahier Pagliari, Andrea Bartolini, Alessio Burrello, Luca Benini, Enrico Macii, Burrello A., Pagliari D.J., Bartolini A., Benini L., Macii E., and Poncino M.
- Subjects
Network architecture ,IoT ,Artificial neural network ,Computer science ,business.industry ,Deep learning ,Predictive maintenance ,Sequence analysis ,Temporal Convolutional Networks ,Machine learning ,computer.software_genre ,Convolutional neural network ,Article ,Constant false alarm rate ,Random forest ,Recurrent neural network ,Artificial intelligence ,business ,computer ,Sequence analysi - Abstract
In modern data centers, storage system failures are major contributors to downtimes and maintenance costs. Predicting these failures by collecting measurements from disks and analyzing them with machine learning techniques can effectively reduce their impact, enabling timely maintenance. While there is a vast literature on this subject, most approaches attempt to predict hard disk failures using either classic machine learning solutions, such as Random Forests (RFs), or deep Recurrent Neural Networks (RNNs). In this work, we address hard disk failure prediction using Temporal Convolutional Networks (TCNs), a novel type of deep neural network for time series analysis. Using a real-world dataset, we show that TCNs outperform both RFs and RNNs. Specifically, we can improve the Fault Detection Rate (FDR) by ≈7.5% (FDR = 89.1%) compared to the state-of-the-art, while simultaneously reducing the False Alarm Rate (FAR = 0.052%). Moreover, we explore the network architecture design space, showing that TCNs are consistently superior to RNNs for a given model size and complexity and that even relatively small TCNs can reach satisfactory performance. All the code to reproduce the results presented in this paper is available at https://github.com/ABurrello/tcn-hard-disk-failure-prediction. (An illustrative sketch follows this record.)
- Published
- 2021
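The abstract above relies on Temporal Convolutional Networks, whose core operation is a causal, dilated 1D convolution. The minimal NumPy sketch below only illustrates what "causal" and "dilated" mean for a single channel; it is unrelated to the code released at the linked repository.

```python
import numpy as np

def causal_dilated_conv1d(x: np.ndarray, w: np.ndarray, dilation: int = 1) -> np.ndarray:
    """Single-channel causal dilated convolution.

    Output sample t depends only on x[t], x[t-d], x[t-2d], ... (never on future
    samples), which is the core operation of a TCN layer.
    """
    k = len(w)
    pad = (k - 1) * dilation                 # left padding keeps the output causal
    xp = np.concatenate([np.zeros(pad), x])  # and the same length as the input
    y = np.zeros(len(x))
    for t in range(len(x)):
        taps = xp[t : t + pad + 1 : dilation]  # samples spaced by the dilation factor
        y[t] = np.dot(w[::-1], taps)
    return y

signal = np.sin(np.linspace(0, 6 * np.pi, 64))           # toy time series
kernel = np.array([0.25, 0.5, 0.25])                     # toy 3-tap filter
out = causal_dilated_conv1d(signal, kernel, dilation=4)  # receptive field = 1 + (3-1)*4 = 9
print(out.shape, np.round(out[:5], 3))
```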
270. Adaptive Random Forests for Energy-Efficient Inference on Microcontrollers
- Author
-
Enrico Macii, Francesco Daghero, Massimo Poncino, Daniele Jahier Pagliari, Alessio Burrello, Andrea Calimera, Luca Benini, Chen Xie, Daghero F., Burrello A., Xie C., Benini L., Calimera A., Macii E., Poncino M., and Pagliari D.J.
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Computer science ,Decision tree ,Inference ,Energy consumption ,Machine Learning (cs.LG) ,Random forest ,Machine Learning ,Microcontroller ,Computer engineering ,Embedded System ,Embedded Systems ,Latency (engineering) ,Energy (signal processing) ,Efficient energy use - Abstract
Random Forests (RFs) are widely used Machine Learning models in low-power embedded devices, due to their hardware-friendly operation and high accuracy on practically relevant tasks. The accuracy of a RF often increases with the number of internal weak learners (decision trees), but at the cost of a proportional increase in inference latency and energy consumption. Such costs can be mitigated considering that, in most applications, inputs are not all equally difficult to classify. Therefore, a large RF is often necessary only for (few) hard inputs, and wasteful for easier ones. In this work, we propose an early-stopping mechanism for RFs, which terminates the inference as soon as a high-enough classification confidence is reached, reducing the number of weak learners executed for easy inputs. The early-stopping confidence threshold can be controlled at runtime, in order to favor either energy saving or accuracy. We apply our method to three different embedded classification tasks, on a single-core RISC-V microcontroller, achieving an energy reduction from 38% to more than 90% with a drop of less than 0.5% in accuracy. We also show that our approach outperforms previous adaptive ML methods for RFs. (Published in: 2021 IFIP/IEEE 29th International Conference on Very Large Scale Integration (VLSI-SoC).) (An illustrative sketch follows this record.)
- Published
- 2022
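As an illustration of the confidence-based early stopping described above, the sketch below evaluates the trees of a scikit-learn random forest one at a time and stops once the running class-probability estimate exceeds a threshold. The stopping policy and the threshold value are plausible placeholders, not the authors' exact mechanism.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=64, random_state=0).fit(X_tr, y_tr)

def predict_early_stop(rf, x, threshold=0.9):
    """Evaluate trees one at a time; stop as soon as the running class-probability
    estimate is confident enough (illustrative policy, tunable at runtime)."""
    probs = np.zeros(rf.n_classes_)
    n_used = 0
    for tree in rf.estimators_:
        probs += tree.predict_proba(x.reshape(1, -1))[0]
        n_used += 1
        if (probs / n_used).max() >= threshold:   # confident: skip the remaining trees
            break
    return rf.classes_[np.argmax(probs)], n_used

results = [predict_early_stop(rf, x) for x in X_te]
acc = np.mean([pred == true for (pred, _), true in zip(results, y_te)])
avg_trees = np.mean([n for _, n in results])
print(f"accuracy={acc:.3f}, average trees executed={avg_trees:.1f} of {len(rf.estimators_)}")
```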
271. Robust and Energy-efficient PPG-based Heart-Rate Monitoring
- Author
-
Daniele Jahier Pagliari, Matteo Risso, Massimo Poncino, Luca Benini, Enrico Macii, Alessio Burrello, Simone Benatti, Risso M., Burrello A., Pagliari D.J., Benatti S., Macii E., Benini L., and Poncino M.
- Subjects
Signal Processing (eess.SP) ,FOS: Computer and information sciences ,Computer Science - Machine Learning ,Heart rate,Measurement,Medical services,Inference algorithms,Integrated circuit modeling,Monitoring,Motion artifacts ,Computer science ,Real-time computing ,Latency (audio) ,Inference ,Motion artifacts ,Machine Learning (cs.LG) ,Set (abstract data type) ,Microcontroller ,Heart rate Measurement ,Deep Learning ,Medical services ,FOS: Electrical engineering, electronic engineering, information engineering ,Leverage (statistics) ,Inference algorithms ,Enhanced Data Rates for GSM Evolution ,Microcontrollers ,Electrical Engineering and Systems Science - Signal Processing ,Efficient energy use - Abstract
A wrist-worn PPG sensor coupled with a lightweight algorithm can run on an MCU to enable non-invasive and comfortable monitoring, but ensuring robust PPG-based heart-rate monitoring in the presence of motion artifacts is still an open challenge. Recent state-of-the-art algorithms combine PPG and inertial signals to mitigate the effect of motion artifacts. However, these approaches suffer from limited generality. Moreover, their deployment on MCU-based edge nodes has not been investigated. In this work, we tackle both of the aforementioned problems by proposing the use of hardware-friendly Temporal Convolutional Networks (TCNs) for PPG-based heart-rate estimation. Starting from a single "seed" TCN, we leverage an automatic Neural Architecture Search (NAS) approach to derive a rich family of models. Among them, we obtain a TCN that outperforms the previous state-of-the-art on the largest PPG dataset available (PPGDalia), achieving a Mean Absolute Error (MAE) of just 3.84 Beats Per Minute (BPM). Furthermore, we also tested a set of smaller yet still accurate (MAE of 5.64-6.29 BPM) networks that can be deployed on a commercial MCU (STM32L4), requiring as few as 5k parameters and reaching a latency of 17.1 ms while consuming just 0.21 mJ per inference.
- Published
- 2022
- Full Text
- View/download PDF
272. Pruning In Time (PIT): A Lightweight Network Architecture Optimizer for Temporal Convolutional Networks
- Author
-
Luca Benini, Lorenzo Lamberti, Matteo Risso, Daniele Jahier Pagliari, Massimo Poncino, Alessio Burrello, Enrico Macii, Francesco Conti, Risso M., Burrello A., Pagliari D.J., Conti F., Lamberti L., Macii E., Benini L., and Poncino M.
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Network architecture ,Artificial neural network ,Computer science ,business.industry ,Deep learning ,Neural Architecture Search ,Deep Learning ,Edge Computing ,Temporal Convolutional Networks ,Machine Learning (cs.LG) ,Convolution ,Set (abstract data type) ,Dilation (metric space) ,Feature (computer vision) ,Pruning (decision trees) ,Artificial intelligence ,business ,Algorithm - Abstract
Temporal Convolutional Networks (TCNs) are promising Deep Learning models for time-series processing tasks. One key feature of TCNs is time-dilated convolution, whose optimization requires extensive experimentation. We propose an automatic dilation optimizer, which tackles the problem as a weight pruning on the time-axis, and learns dilation factors together with weights, in a single training. Our method reduces the model size and inference latency on a real SoC hardware target by up to 7.4x and 3x, respectively with no accuracy drop compared to a network without dilation. It also yields a rich set of Pareto-optimal TCNs starting from a single model, outperforming hand-designed solutions in both size and accuracy.
- Published
- 2021
- Full Text
- View/download PDF
273. TCN Mapping Optimization for Ultra-Low Power Time-Series Edge Inference
- Author
-
Marcello Zanghieri, Francesco Conti, Massimo Poncino, Enrico Macii, Alessio Burrello, Alberto Dequino, Daniele Jahier Pagliari, Luca Benini, Burrello A., Dequino A., Pagliari D.J., Conti F., Zanghieri M., Macii E., Benini L., and Poncino M.
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Deep-Learning ,Computer Science - Artificial Intelligence ,business.industry ,Computer science ,Deep learning ,Temporal Convolutional Network ,Edge-Computing ,Internet-of-Things ,Network topology ,Machine Learning (cs.LG) ,Computational science ,Microcontroller ,Internet-of-Thing ,Artificial Intelligence (cs.AI) ,Low-power electronics ,Benchmark (computing) ,Artificial intelligence ,Enhanced Data Rates for GSM Evolution ,Latency (engineering) ,business ,Energy (signal processing) - Abstract
Temporal Convolutional Networks (TCNs) are emerging lightweight Deep Learning models for Time Series analysis. We introduce an automated exploration approach and a library of optimized kernels to map TCNs on Parallel Ultra-Low Power (PULP) microcontrollers. Our approach minimizes latency and energy by exploiting a layer tiling optimizer to jointly find the tiling dimensions and select among alternative implementations of the causal and dilated 1D-convolution operations at the core of TCNs. We benchmark our approach on a commercial PULP device, achieving up to 103× lower latency and 20.3× lower energy than the Cube-AI toolkit executed on the STM32L4, and from 2.9× to 26.6× lower energy compared to commercial closed-source and academic open-source approaches on the same hardware target.
- Published
- 2021
- Full Text
- View/download PDF
274. Manufacturing as a Data-Driven Practice: Methodologies, Technologies, and Tools
- Author
-
Edoardo Patti, Andrea Calimera, Daniele Jahier Pagliari, Andrea Acquaviva, Tania Cerquitelli, Lorenzo Bottaccioli, Massimo Poncino, Cerquitelli T., Pagliari D.J., Calimera A., Bottaccioli L., Patti E., Acquaviva A., and Poncino M.
- Subjects
Process (engineering) ,Computer science ,Internet of Things ,Data modeling ,Tools ,Busine ,Industrie ,Protocol ,Orchestration (computing) ,Electrical and Electronic Engineering ,data analytics ,Data mining ,Data-centric architectures, data management, data analytics, Industry 4.0, Internet of Things, technologies ,technologies ,Data-centric architectures ,business.industry ,Service robot ,Information technology ,Business value ,Industry 4.0 ,Data science ,Software quality ,Internet of Things (IoT) ,Manufacturing ,data-centric architecture ,Software deployment ,Data analytic ,data management ,Software architecture ,business - Abstract
In recent years, the introduction and exploitation of innovative information technologies in industrial contexts have led to the continuous growth of digital shop floor environments. The new Industry 4.0 model allows smart factories to become very advanced IT industries, generating an ever-increasing amount of valuable data. As a consequence, the necessity of powerful and reliable software architectures is becoming prominent along with data-driven methodologies to extract useful and hidden knowledge supporting the decision-making process. This article discusses the latest software technologies needed to collect, manage, and elaborate all data generated through innovative Internet-of-Things (IoT) architectures deployed over the production line, with the aim of extracting useful knowledge for the orchestration of high-level control services that can generate added business value. This survey covers the entire data life cycle in manufacturing environments, discussing key functional and methodological aspects along with a rich and properly classified set of technologies and tools, useful to add intelligence to data-driven services. Therefore, it serves both as a first guided step toward the rich landscape of the literature for readers approaching this field and as a global yet detailed overview of the current state of the art in the Industry 4.0 domain for experts. As a case study, we discuss, in detail, the deployment of the proposed solutions for two research project demonstrators, showing their ability to mitigate manufacturing line interruptions and reduce the corresponding impacts and costs.
- Published
- 2021
275. Dual-Vt assignment policies in ITD-aware synthesis
- Author
-
Calimera, A., Bahar, R.I., Macii, E., and Poncino, M.
- Subjects
- TEMPERATURE effect, COMPLEMENTARY metal oxide semiconductors, ELECTRIC potential, ELECTRONIC circuit design, MICROELECTRONICS, LOGIC circuits
- Abstract
Abstract: Traditionally, the effects of temperature on the delay of CMOS devices have been evaluated using the highest operating temperature as a worst-case corner. This conservative approach was based on the fact that, in older technologies, CMOS devices systematically degraded their performance as temperature increased. With the progressive scaling of technology, however, there has been a continuous reduction of the gap between supply and threshold voltages of devices, mostly due to low-power constraints. The latter have accelerated this trend by using libraries containing multiple instances of a cell with different ranges of threshold voltages; in particular, the use of high-Vt cells to control sub-threshold leakage currents has made this gap smaller and smaller. The consequence of this trend is the occurrence of the so-called inverted temperature dependence (ITD), under which cells get faster as temperature increases. This new thermal dependence has made the old worst-case design approach obsolete, posing new EDA challenges. Besides complicating timing analysis, in particular, ITD has important and unforeseeable consequences for power-aware design, especially in dual-Vt logic synthesis. Due to a contrasting temperature dependence between low-Vt cells (which exhibit the classical, direct temperature dependence) and high-Vt cells (for which an inverted temperature dependence holds), a single-temperature worst-case design approach fails to generate netlists that are compliant with timing constraints for the entire temperature range. In this work, we first validate the relevance of ITD on an industrial 65nm CMOS multi-Vt library. Then, we describe an ITD-aware, dual-Vt assignment algorithm that guarantees temperature-insensitive operation of the circuits, together with a significant reduction of both leakage and total power consumption. The algorithm has been tested over standard benchmarks using three different replacement policies. Experimental results show average leakage power savings of 50% w.r.t. circuits synthesized with a standard, commercial flow that does not take ITD into account and thus, to ensure that no temperature-induced timing faults occur, needs to resort to over-design (i.e., over-constraining the timing bound so as to make sure that temperature fluctuations never make the circuits violate the specified required time for all paths). (An illustrative sketch follows this record.)
- Published
- 2010
- Full Text
- View/download PDF
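To make the ITD argument above concrete, the toy sketch below uses invented linear delay models in which low-Vt gates slow down with temperature while high-Vt gates speed up, and greedily swaps gates on a single path to high-Vt only if timing is met at both temperature corners. It is a schematic of the idea, not the paper's assignment algorithm.

```python
# Toy gate delay models (all numbers invented): low-Vt cells slow down as temperature
# rises (direct dependence), high-Vt cells speed up (inverted temperature dependence).
def gate_delay(vt: str, temp_c: float) -> float:
    if vt == "low":
        return 1.0 * (1 + 0.002 * (temp_c - 25))   # normalized delay, +0.2%/degC
    return 1.6 * (1 - 0.003 * (temp_c - 25))       # slower at 25 degC, faster when hot

CORNERS = (0.0, 125.0)  # under ITD, *both* temperature extremes must be checked

def path_delay(assignment, temp_c):
    return sum(gate_delay(vt, temp_c) for vt in assignment)

def greedy_dual_vt(n_gates: int, timing_bound: float):
    """Greedily move gates to high-Vt (saving leakage) only while the path still
    meets timing at every corner -- a single worst-case corner is not sufficient."""
    assignment = ["low"] * n_gates
    for i in range(n_gates):
        trial = assignment.copy()
        trial[i] = "high"
        if all(path_delay(trial, t) <= timing_bound for t in CORNERS):
            assignment = trial
    return assignment

result = greedy_dual_vt(n_gates=10, timing_bound=13.0)
print(result, {t: round(path_delay(result, t), 2) for t in CORNERS})
```

With these numbers, the hot corner alone would allow a fifth high-Vt swap, but the cold corner rejects it, illustrating why a single worst-case temperature is no longer sufficient.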
276. Implementation of a thermal management unit for canceling temperature-dependent clock skew variations
- Author
-
Chakraborty, A., Duraisami, K., Sathanur, A., Sithambaram, P., Macii, A., Macii, E., and Poncino, M.
- Subjects
- TEMPERATURE control of electronics, INTEGRATED circuit interconnections, LINE drivers (Integrated circuits), NANOSTRUCTURED materials, TEMPERATURE lapse rate
- Abstract
Thermal gradients across the die are becoming increasingly prominent as we scale further down into the sub-nanometer regime. While temperature was never a primary concern, its non-negligible impact on delay and reliability is getting significant attention lately. One of the principal factors affecting designs today is timing criticality, which, in today's technologies, is mostly determined by wire delays. Clocks, which are the backbone of the interconnect network, are extremely prone to temperature-dependent delay variations and need to be designed with extreme care so as to meet accurate timing constraints. Their skew has to be minimized in order to guarantee functionality, even in the presence of these process variations. Temperature, on the other hand, is dynamic in nature, and its effects hence need run-time monitoring and management. One of the most efficient ways to manage temperature-dependent skew is through the use of buffers with dynamically tunable delays. The use of such buffers in the clock distribution network allows modulating the delay on selected branches of the clock network based on a thermal profile, so as to keep the skew within acceptable bounds. A runtime scheme obviously requires an on-line management unit. Our work predominantly focuses on the implementation of one such unit, while studying its impact on design parameters such as area, wire-length, and power. Results show a negligible impact (0.67% in area, 0.62% in wire-length, 0.33% in power, and 0.37% in via count) on the design.
- Published
- 2008
- Full Text
- View/download PDF
277. Energy-efficient adaptive machine learning on IoT end-nodes with class-dependent confidence
- Author
-
Francesco Daghero, Daniele Jahier Pagliari, Enrico Macii, Luca Benini, Alessio Burrello, Massimo Poncino, Daghero F., Burrello A., Pagliari D.J., Benini L., Macii E., and Poncino M.
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Current (mathematics) ,Edge device ,Computer science ,02 engineering and technology ,Machine learning ,computer.software_genre ,01 natural sciences ,Machine Learning (cs.LG) ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Latency (engineering) ,Set (psychology) ,Computer Science::Databases ,010302 applied physics ,Class (computer programming) ,business.industry ,Work (physics) ,Energy consumption ,020202 computer hardware & architecture ,Artificial intelligence ,business ,Energy-efficient machine, IoT applications ,computer ,Efficient energy use - Abstract
Energy-efficient machine learning models that can run directly on edge devices are of great interest in IoT applications, as they can reduce network pressure and response latency, and improve privacy. An effective way to obtain energy efficiency with small accuracy drops is to sequentially execute a set of increasingly complex models, early-stopping the procedure for "easy" inputs that can be confidently classified by the smallest models. As a stopping criterion, current methods employ a single threshold on the output probabilities produced by each model. In this work, we show that such a criterion is sub-optimal for datasets that include classes of different complexity, and we demonstrate a more general approach based on per-class thresholds. With experiments on a low-power end-node, we show that our method can significantly reduce the energy consumption compared to the single-threshold approach. (Published in: 2020 27th IEEE International Conference on Electronics, Circuits and Systems (ICECS).) (An illustrative sketch follows this record.)
- Published
- 2020
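The per-class stopping criterion described above can be pictured as follows: a cheap model answers directly when its confidence exceeds the threshold of its predicted class, otherwise a larger model is invoked. The two scikit-learn models and the threshold values below are arbitrary stand-ins, not the paper's configuration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

small = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)                       # cheap model
big = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)  # costly model

# One confidence threshold per class instead of a single global one.
# Values are placeholders; in practice they would be tuned on held-out data.
per_class_thr = {c: 0.85 for c in small.classes_}
per_class_thr[1] = 0.95   # e.g. a harder class gets a stricter threshold

def cascade_predict(x):
    """Answer with the small model when it is confident for its predicted class,
    otherwise fall back to the big model."""
    p = small.predict_proba(x.reshape(1, -1))[0]
    cls = small.classes_[np.argmax(p)]
    if p.max() >= per_class_thr[cls]:
        return cls, "small"
    return big.predict(x.reshape(1, -1))[0], "big"

preds, stages = zip(*(cascade_predict(x) for x in X_te))
print("accuracy:", round(float(np.mean(np.array(preds) == y_te)), 3),
      "| big-model invocations:", stages.count("big"), "of", len(stages))
```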
278. A Comparison Analysis of BLE-Based Algorithms for Localization in Industrial Environments
- Author
-
Enrico Macii, Massimo Poncino, Marina Zafiri, Davide Cannizzaro, Edoardo Patti, Daniele Jahier Pagliari, Andrea Acquaviva, Cannizzaro D., Zafiri M., Pagliari D.J., Patti E., Macii E., Poncino M., and Acquaviva A.
- Subjects
Computer Networks and Communications ,computer.internet_protocol ,Computer science ,lcsh:TK7800-8360 ,trilateration ,fingerprint ,02 engineering and technology ,smart industry ,industry 4.0 ,bluetooth low energy ,indoor location ,0202 electrical engineering, electronic engineering, information engineering ,Electrical and Electronic Engineering ,Bluetooth Low Energy ,lcsh:Electronics ,020206 networking & telecommunications ,Beacon ,Hardware and Architecture ,Control and Systems Engineering ,Received signal strength indication ,Signal Processing ,020201 artificial intelligence & image processing ,computer ,Algorithm ,Trilateration - Abstract
Proximity beacons are small, low-power devices capable of transmitting information at a limited distance via the Bluetooth Low Energy protocol. These beacons are typically used to broadcast small amounts of location-dependent data (e.g., advertisements) or to detect nearby objects. However, researchers have shown that beacons can also be used for indoor localization by converting the received signal strength indication (RSSI) to distance information. In this work, we study the effectiveness of proximity beacons for accurately locating objects within a manufacturing plant by performing extensive experiments in a real industrial environment. To this purpose, we compare localization algorithms based either on trilateration or on environment fingerprinting combined with a machine-learning-based regressor (k-nearest neighbors, support-vector machines, or multi-layer perceptron). Each algorithm is analyzed in two different types of industrial environments. For each environment, various configurations are explored, where a configuration is characterized by the number of beacons per square meter and the density of fingerprint points. In addition, since the fingerprinting approach is based on a preliminary site characterization, it may lead to location errors in the presence of environment variations (e.g., movements of large objects). For this reason, the robustness of fingerprinting algorithms against such variations is also assessed. Our results show that fingerprint solutions outperform trilateration, while also showing good resilience to environmental variations. Given the similar error obtained by all three fingerprint approaches, we conclude that k-NN is the preferable algorithm due to its simple deployment and low number of hyper-parameters. (An illustrative sketch follows this record.)
- Published
- 2019
- Full Text
- View/download PDF
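As a toy version of the fingerprinting approach that the study above finds preferable, the sketch below builds synthetic RSSI fingerprints with a log-distance path-loss model and maps a new RSSI vector to coordinates with a k-NN regressor. Beacon positions, path-loss parameters, and noise levels are all invented for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Hypothetical setup: 4 BLE beacons at known positions in a 20 m x 10 m hall.
beacons = np.array([[0, 0], [20, 0], [0, 10], [20, 10]], dtype=float)

def rssi(pos, tx_power=-59.0, n=2.0, noise_db=2.0):
    """Synthetic RSSI vector from a log-distance path-loss model
    (tx_power = RSSI at 1 m, n = path-loss exponent); purely illustrative."""
    d = np.maximum(np.linalg.norm(beacons - pos, axis=1), 0.1)
    return tx_power - 10 * n * np.log10(d) + rng.normal(0, noise_db, len(beacons))

# Offline phase: collect fingerprints on a grid of known positions.
grid = np.array([[x, y] for x in range(0, 21, 2) for y in range(0, 11, 2)], dtype=float)
fingerprints = np.array([rssi(p) for p in grid])

# Online phase: a k-NN regressor maps a new RSSI vector to (x, y) coordinates.
knn = KNeighborsRegressor(n_neighbors=3).fit(fingerprints, grid)

true_pos = np.array([7.3, 4.6])
estimate = knn.predict(rssi(true_pos).reshape(1, -1))[0]
print("true:", true_pos, "| estimated:", np.round(estimate, 2),
      "| error [m]:", round(float(np.linalg.norm(estimate - true_pos)), 2))
```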
279. Fast Computation of Discharge Current Upper Bounds for Clustered Power Gating
- Author
-
Massimo Poncino, Ashoka Sathanur, E. Macii, Alberto Macii, Luca Benini, Sathanur A., Benini L., Macii A., Macii E., and Poncino M.
- Subjects
Engineering ,Power gating ,business.industry ,Computation ,Static timing analysis ,Upper and lower bounds ,Leakage power optimization ,Reduction (complexity) ,Hardware and Architecture ,Low-power electronics ,Power electronics ,low power design ,maximum current estimation ,Signal integrity ,Electrical and Electronic Engineering ,business ,Algorithm ,Software - Abstract
The capability of accurately estimating an upper bound of the maximum current drawn by a digital macroblock from the ground or power supply line constitutes a major asset of automatic power-gating flows. In fact, the maximum current information is essential to properly size the sleep transistor in such a way that speed degradation and signal integrity violations are avoided. Loose upper bounds can be determined with a reasonable computational cost, but they lead to oversized sleep transistors. On the other hand, exact computation of the maximum drawn current is an NP-hard problem, even when conservative simplifying assumptions are made on gate-level current profiles. In this paper, we present a scalable algorithm for tightening upper bound computation, with a controlled and tunable computational cost. The algorithm exploits state-of-the-art commercial timing analysis engines, and it is tightly integrated into an industrial power-gating flow for leakage power reduction. The results we have obtained on large circuits demonstrate the scalability and effectiveness of our estimation approach.
- Published
- 2011
280. Row-Based Power-Gating: A Novel Sleep Transistor Insertion Methodology for Leakage Power Optimization in Nanometer CMOS Circuits
- Author
-
Massimo Poncino, E. Macii, Alberto Macii, Ashoka Sathanur, Luca Benini, Sathanur A., Benini L., Macii A., Macii E., and Poncino M.
- Subjects
power optimization ,Engineering ,Power gating ,Hardware_PERFORMANCEANDRELIABILITY ,law.invention ,Hardware_GENERAL ,law ,Low-power electronics ,Hardware_INTEGRATEDCIRCUITS ,Electronic engineering ,Electrical and Electronic Engineering ,Leakage (electronics) ,Electronic circuit ,business.industry ,Transistor ,Electrical engineering ,Power optimization ,Leakage power ,CMOS ,Nanoelectronics ,Hardware and Architecture ,power gating ,logic synthesi ,low-power design ,business ,Software ,Hardware_LOGICDESIGN - Abstract
Leakage power has become a serious concern in nanometer CMOS technologies, and power-gating has been shown to offer a viable solution to the problem with a small penalty in performance. This paper focuses on leakage power reduction through automatic insertion of sleep transistors for power-gating. In particular, we propose a novel, layout-aware methodology that facilitates sleep transistor insertion and virtual-ground routing on row-based layouts. We also introduce a clustering algorithm that is able to handle timing and area constraints simultaneously, and we extend it to the case of multi-Vt sleep transistors to increase leakage savings. The results we have obtained on a set of benchmark circuits show that the leakage savings we can achieve are, by far, superior to those obtained using existing power-gating solutions, and with much tighter timing and area constraints.
- Published
- 2011
281. Design of a Flexible Reactivation Cell for Safe Power-Mode Transition in Power-Gated Circuits
- Author
-
Massimo Poncino, Luca Benini, Alberto Macii, E. Macii, Andrea Calimera, Calimera A., Benini L., Macii A., Macii E., and Poncino M.
- Subjects
Engineering ,business.industry ,Transistor ,Electrical engineering ,Hardware_PERFORMANCEANDRELIABILITY ,Integrated circuit design ,law.invention ,Dynamic voltage scaling ,CMOS ,law ,Hardware_INTEGRATEDCIRCUITS ,Electronic engineering ,Ground bounce ,Electrical and Electronic Engineering ,Power MOSFET ,Power network design ,business ,Electronic circuit - Abstract
Power-gating is one of the most promising and widely adopted solutions for controlling sub-threshold leakage power in nanometer circuits. Although single-cycle power-mode transition reduces wake-up latency, it develops large discharge current spikes, thereby causing IR-drop and inductive ground bounce for the neighboring circuit blocks, which can suffer from power plane integrity degradation. We propose a new reactivation solution that helps in controlling power supply fluctuations and achieving minimum reactivation times. Our structure limits the turn-on current below a given threshold through a sequential activation of the sleep transistors (STs), which are connected in parallel and sized using a novel optimal sizing algorithm. We also introduce a distributed physical implementation, which allows minimum layout disruption after ST insertion and minimizes routing congestion.
- Published
- 2009
282. Exploiting Temporal Discharge Current Information to Improve the Efficiency of Clustered Power-Gating
- Author
-
Luca Benini, Massimo Poncino, A. Sathanur, Enrico Macii, Alberto Macii, Sathanur A., Benini L., Macii A., Macii E., and Poncino M.
- Subjects
Engineering ,Power gating ,business.industry ,Transistor ,Electrical engineering ,Hardware_PERFORMANCEANDRELIABILITY ,law.invention ,CMOS ,law ,Low-power electronics ,Power electronics ,Hardware_INTEGRATEDCIRCUITS ,Electronic engineering ,Electrical and Electronic Engineering ,business ,Dimensioning ,Hardware_LOGICDESIGN ,Electronic circuit ,Leakage (electronics) - Abstract
The use of sleep transistors as power-gating devices to cut off sub-threshold stand-by leakage currents has become a very popular solution to tackle the rise of leakage consumption in nanometer CMOS circuits. Clustered power-gating is now the de-facto standard for application of this leakage-saving technique in industry. Cell clustering, sleep transistor sizing, and peak current estimation are among the key steps of state-of-the-art clustered power-gating methodologies. In this work, we propose to exploit the information on the temporal variations of the discharge currents of the gates in a circuit to improve the quality of the solutions generated by an existing cell clustering algorithm. This translates to power-gated circuits with lower leakage consumption compared to implementations based on clusters formed assuming a time-invariant, worst-case behavior of the currents drawn by the cells. The achieved leakage savings can be as high as 17%.
- Published
- 2009
283. Reducing the Energy Consumption of sEMG-Based Gesture Recognition at the Edge Using Transformers and Dynamic Inference.
- Author
-
Xie C, Burrello A, Daghero F, Benini L, Calimera A, Macii E, Poncino M, and Jahier Pagliari D
- Subjects
- Humans, Physical Phenomena, Databases, Factual, Fatigue, Gestures, Electric Power Supplies
- Abstract
Hand gesture recognition applications based on surface electromyographic (sEMG) signals can benefit from on-device execution to achieve faster and more predictable response times and higher energy efficiency. However, deploying state-of-the-art deep learning (DL) models for this task on memory-constrained and battery-operated edge devices, such as wearables, requires a careful optimization process, both at design time, with an appropriate tuning of the DL models' architectures, and at execution time, where the execution of large and computationally complex models should be avoided unless strictly needed. In this work, we pursue both optimization targets, proposing a novel gesture recognition system that improves upon the state-of-the-art models both in terms of accuracy and efficiency. At the level of DL model architecture, we apply for the first time tiny transformer models (which we call bioformers) to sEMG-based gesture recognition. Through an extensive architecture exploration, we show that our most accurate bioformer achieves a higher classification accuracy on the popular Non-Invasive Adaptive hand Prosthetics Database 6 (Ninapro DB6) dataset compared to the state-of-the-art convolutional neural network (CNN) TEMPONet (+3.1%). When deployed on the RISC-V-based low-power system-on-chip (SoC) GAP8, bioformers that outperform TEMPONet in accuracy consume 7.8×-44.5× less energy per inference. At runtime, we propose a three-level dynamic inference approach that combines a shallow classifier, i.e., a random forest (RF) implementing a simple "rest detector", with two bioformers of different accuracy and complexity, which are sequentially applied to each new input, stopping the classification early for "easy" data. With this mechanism, we obtain a flexible inference system, capable of working at many different operating points in terms of accuracy and average energy consumption. On GAP8, we obtain a further 1.03×-1.35× energy reduction compared to static bioformers at iso-accuracy.
- Published
- 2023
- Full Text
- View/download PDF
284. Q-PPG: Energy-Efficient PPG-Based Heart Rate Monitoring on Wearable Devices.
- Author
-
Burrello A, Pagliari DJ, Risso M, Benatti S, Macii E, Benini L, and Poncino M
- Subjects
- Algorithms, Artifacts, Heart Rate physiology, Signal Processing, Computer-Assisted, Photoplethysmography, Wearable Electronic Devices
- Abstract
Heart Rate (HR) monitoring is increasingly performed in wrist-worn devices using low-cost photoplethysmography (PPG) sensors. However, Motion Artifacts (MAs) caused by movements of the subject's arm affect the performance of PPG-based HR tracking. This is typically addressed by coupling the PPG signal with acceleration measurements from an inertial sensor. Unfortunately, most standard approaches of this kind rely on hand-tuned parameters, which impair their generalization capabilities and their applicability to real data in the field. In contrast, methods based on deep learning, despite their better generalization, are considered to be too complex to deploy on wearable devices. In this work, we tackle these limitations, proposing a design space exploration methodology to automatically generate a rich family of deep Temporal Convolutional Networks (TCNs) for HR monitoring, all derived from a single "seed" model. Our flow involves a cascade of two Neural Architecture Search (NAS) tools and a hardware-friendly quantizer, whose combination yields both highly accurate and extremely lightweight models. When tested on the PPG-Dalia dataset, our most accurate model sets a new state-of-the-art in Mean Absolute Error. Furthermore, we deploy our TCNs on an embedded platform featuring an STM32WB55 microcontroller, demonstrating their suitability for real-time execution. Our most accurate quantized network achieves 4.41 Beats Per Minute (BPM) of Mean Absolute Error (MAE), with an energy consumption of 47.65 mJ and a memory footprint of 412 kB. At the same time, the smallest network that obtains a MAE below 8 BPM, among those generated by our flow, has a memory footprint of 1.9 kB and consumes just 1.79 mJ per inference.
- Published
- 2021
- Full Text
- View/download PDF
285. On the impact of smart sensor approximations on the accuracy of machine learning tasks.
- Author
-
Jahier Pagliari D and Poncino M
- Abstract
Smart sensors present in ubiquitous Internet of Things (IoT) devices often obtain high energy efficiency by carefully tuning how the sensing, the analog-to-digital (A/D) conversion, and the digital serial transmission are implemented. Such tuning involves approximations, i.e., alterations of the sensed signals that can positively affect energy consumption in various ways. However, for many IoT applications, approximations may have an impact on the quality of the produced output, for example on the classification accuracy of a Machine Learning (ML) model. While the impact of approximations on ML algorithms is widely studied, previous works have focused mostly on processing approximations. In this work, in contrast, we analyze how the signal alterations imposed by smart sensors impact the accuracy of ML classifiers. We focus in particular on data alterations introduced in the serial transmission from a smart sensor to a processor, although our considerations can also be extended to other sources of approximation, such as A/D conversion. Results on several types of models and on two different datasets show that ML algorithms are quite resilient to the alterations produced by smart sensors, and that the serial transmission energy can be reduced by up to 70% without a significant impact on classification accuracy. Moreover, we also show that, contrary to expectations, the two generic approximation families identified in our work yield similar accuracy losses. (An illustrative sketch follows this record.)
- Published
- 2020
- Full Text
- View/download PDF
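In the spirit of the record above, the sketch below emulates one kind of smart-sensor approximation, dropping least-significant bits of the transmitted samples, and measures how a classifier's accuracy degrades. The dataset, model, and bit widths are arbitrary choices for illustration, not those used in the paper.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# digits pixels are integers in [0, 16], i.e. roughly 5-bit "sensor" samples.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

def truncate_lsbs(samples: np.ndarray, dropped_bits: int) -> np.ndarray:
    """Zero the lowest bits of each sample, mimicking a shorter serial word."""
    return (samples.astype(int) >> dropped_bits) << dropped_bits

for bits in range(5):
    acc = clf.score(truncate_lsbs(X_te, bits), y_te)
    print(f"dropped LSBs: {bits} -> accuracy: {acc:.3f}")
```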
286. LAPSE: Low-Overhead Adaptive Power Saving and Contrast Enhancement for OLEDs.
- Author
-
Pagliari DJ, Macii E, and Poncino M
- Abstract
Organic Light Emitting Diode (OLED) display panels are becoming increasingly popular, especially in mobile devices; one of the key characteristics of these panels is that their power consumption strongly depends on the displayed image. In this paper we propose LAPSE, a new methodology to concurrently reduce the energy consumed by an OLED display and enhance the contrast of the displayed image, which relies on image-specific pixel-by-pixel transformations. Unlike previous approaches, LAPSE focuses specifically on reducing the overheads required to implement the transformation at runtime. To this end, we propose a transformation that can be executed in real time, either in software, with low time overhead, or in a hardware accelerator with a small area and low energy budget. Despite the significant reduction in complexity, we obtain results comparable to those achieved with more complex approaches in terms of power saving and image quality. Moreover, our method makes it easy to explore the full quality-versus-power tradeoff by acting on a few basic parameters; thus, it enables the runtime selection among multiple display quality settings, according to the status of the system. (An illustrative sketch follows this record.)
- Published
- 2018
- Full Text
- View/download PDF
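OLED panel power is, to first order, image-dependent (roughly a weighted sum of per-channel pixel values). The sketch below illustrates that dependence and the quality-versus-power knob using a trivial global dimming transform; the power coefficients are invented placeholders and this is not LAPSE's actual pixel-by-pixel transformation.

```python
import numpy as np

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(720, 1280, 3), dtype=np.uint8)  # stand-in RGB frame

# First-order, image-dependent OLED power model: panel power grows with the
# per-channel pixel values. The per-channel coefficients are invented placeholders.
COEFF_NW = np.array([1.0, 0.8, 1.4])   # nW per pixel-value unit for R, G, B

def panel_power_w(img: np.ndarray) -> float:
    return float((img.astype(float) * COEFF_NW).sum() * 1e-9)

def dim(img: np.ndarray, factor: float) -> np.ndarray:
    """A trivial quality-vs-power knob: global dimming (not LAPSE's transformation)."""
    return np.clip(img.astype(float) * factor, 0, 255).astype(np.uint8)

base = panel_power_w(frame)
for f in (1.0, 0.9, 0.8):
    p = panel_power_w(dim(frame, f))
    print(f"scale {f:.1f}: {p:.3f} W ({100 * (1 - p / base):.1f}% saved)")
```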
287. The Human Brain Project and neuromorphic computing.
- Author
-
Calimera A, Macii E, and Poncino M
- Subjects
- Humans, Brain physiology, Computer Simulation, Neural Networks, Computer
- Abstract
Understanding how the brain manages billions of processing units connected via kilometers of fibers and trillions of synapses, while consuming a few tens of watts, could provide the key to a completely new category of hardware (neuromorphic computing systems). In order to achieve this, a paradigm shift for computing as a whole is needed, which will see it moving away from current "bit-precise" computing models and towards new techniques that exploit the stochastic behavior of simple, reliable, very fast, low-power computing devices embedded in intensely recursive architectures. In this paper we summarize how these objectives will be pursued in the Human Brain Project.
- Published
- 2013