11 results on '"Eggimann, Manuel"'
Search Results
2. Siracusa: A 16 nm Heterogenous RISC-V SoC for Extended Reality with At-MRAM Neural Engine
- Author
-
Prasad, Arpan Suravi, Scherer, Moritz, Conti, Francesco, Rossi, Davide, Di Mauro, Alfio, Eggimann, Manuel, Gómez, Jorge Tómas, Li, Ziyun, Sarwar, Syed Shakib, Wang, Zhao, De Salvo, Barbara, and Benini, Luca
- Subjects
Computer Science - Hardware Architecture - Abstract
Extended reality (XR) applications are Machine Learning (ML)-intensive, featuring deep neural networks (DNNs) with millions of weights, tightly latency-bound (10-20 ms end-to-end), and power-constrained (low tens of mW average power). While ML performance and efficiency can be achieved by introducing neural engines within low-power systems-on-chip (SoCs), system-level power for nontrivial DNNs depends strongly on the energy of non-volatile memory (NVM) access for network weights. This work introduces Siracusa, a near-sensor heterogeneous SoC for next-generation XR devices manufactured in 16 nm CMOS. Siracusa couples an octa-core cluster of RISC-V digital signal processing cores with a novel tightly-coupled "At-Memory" integration between a state-of-the-art digital neural engine called N-EUREKA and an on-chip NVM based on magnetoresistive memory(MRAM), achieving 1.7x higher throughput and 3x better energy efficiency than XR SoCs using NVM as background memory. The fabricated SoC prototype achieves an area efficiency of 65.2 GOp/s/mm2 and a peak energy efficiency of 8.84 TOp/J for DNN inference while supporting complex heterogeneous application workloads, which combine ML with conventional signal processing and control., Comment: Final accepted manuscript pre-print submitted to the IEEE Journal of Solid-State Circuits
- Published
- 2023
3. PELS: A Lightweight and Flexible Peripheral Event Linking System for Ultra-Low Power IoT Processors
- Author
-
Ottaviano, Alessandro, Balas, Robert, Sauter, Philippe, Eggimann, Manuel, and Benini, Luca
- Subjects
Computer Science - Hardware Architecture - Abstract
A key challenge for ultra-low-power (ULP) devices is handling peripheral linking, where the main central processing unit (CPU) periodically mediates the interaction among multiple peripherals following wake-up events. Current solutions address this problem by either integrating event interconnects that route single-wire event lines among peripherals or by general-purpose I/O processors, with a strong trade-off between the latency, efficiency of the former, and the flexibility of the latter. In this paper, we present an open-source, peripheral-agnostic, lightweight, and flexible Peripheral Event Linking System (PELS) that combines dedicated event routing with a tiny I/O processor. With the proposed approach, the power consumption of a linking event is reduced by 2.5 times compared to a baseline relying on the main core for the event-linking process, at a low area of just 7 kGE in its minimal configuration, when integrated into a ULP RISC-V IoT processor., Comment: 6 pages, accepted at DATE24 conference, camera-ready version
- Published
- 2023
4. Marsellus: A Heterogeneous RISC-V AI-IoT End-Node SoC with 2-to-8b DNN Acceleration and 30%-Boost Adaptive Body Biasing
- Author
-
Conti, Francesco, Paulin, Gianna, Rossi, Davide, Di Mauro, Alfio, Rutishauser, Georg, Ottavi, Gianmarco, Eggimann, Manuel, Okuhara, Hayate, and Benini, Luca
- Subjects
Computer Science - Hardware Architecture ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Emerging Artificial Intelligence-enabled Internet-of-Things (AI-IoT) System-on-a-Chip (SoC) for augmented reality, personalized healthcare, and nano-robotics need to run many diverse tasks within a power envelope of a few tens of mW over a wide range of operating conditions: compute-intensive but strongly quantized Deep Neural Network (DNN) inference, as well as signal processing and control requiring high-precision floating-point. We present Marsellus, an all-digital heterogeneous SoC for AI-IoT end-nodes fabricated in GlobalFoundries 22nm FDX that combines 1) a general-purpose cluster of 16 RISC-V Digital Signal Processing (DSP) cores attuned for the execution of a diverse range of workloads exploiting 4-bit and 2-bit arithmetic extensions (XpulpNN), combined with fused MAC&LOAD operations and floating-point support; 2) a 2-8bit Reconfigurable Binary Engine (RBE) to accelerate 3x3 and 1x1 (pointwise) convolutions in DNNs; 3) a set of On-Chip Monitoring (OCM) blocks connected to an Adaptive Body Biasing (ABB) generator and a hardware control loop, enabling on-the-fly adaptation of transistor threshold voltages. Marsellus achieves up to 180 Gop/s or 3.32 Top/s/W on 2-bit precision arithmetic in software, and up to 637 Gop/s or 12.4 Top/s/W on hardware-accelerated DNN layers., Comment: Post-print accepted by IEEE Journal of Solid-State Circuits
- Published
- 2023
5. Non-invasive urinary bladder volume estimation with artefact-suppressed bio-impedance measurements
- Author
-
Dheman, Kanika, Walser, Stefan, Mayer, Philipp, Eggimann, Manuel, Kozomara, Marko, Franke, Denise, Hermanns, Thomas, Sax, Hugo, Schürle, Simone, and Magno, Michele
- Subjects
Electrical Engineering and Systems Science - Signal Processing - Abstract
Urine output is a vital parameter to gauge kidney health. Current monitoring methods include manually written records, invasive urinary catheterization or ultrasound measurements performed by highly skilled personnel. Catheterization bears high risks of infection while intermittent ultrasound measures and manual recording are time consuming and might miss early signs of kidney malfunction. Bioimpedance (BI) measurements may serve as a non-invasive alternative for measuring urine volume in vivo. However, limited robustness have prevented its clinical translation. Here, a deep learning-based algorithm is presented that processes the local BI of the lower abdomen and suppresses artefacts to measure the bladder volume quantitatively, non-invasively and without the continuous need for additional personnel. A tetrapolar BI wearable system called ANUVIS was used to collect continuous bladder volume data from three healthy subjects to demonstrate feasibility of operation, while clinical gold standards of urodynamic (n=6) and uroflowmetry tests (n=8) provided the ground truth. Optimized location for electrode placement and a model for the change in BI with changing bladder volume is deduced. The average error for full bladder volume estimation and for residual volume estimation was -29 +/-87.6 ml, thus, comparable to commercial portable ultrasound devices (Bland Altman analysis showed a bias of -5.2 ml with LoA between 119.7 ml to -130.1 ml), while providing the additional benefit of hands-free, non-invasive, and continuous bladder volume estimation. The combination of the wearable BI sensor node and the presented algorithm provides an attractive alternative to current standard of care with potential benefits in providing insights into kidney function.
- Published
- 2023
6. Vega: A 10-Core SoC for IoT End-Nodes with DNN Acceleration and Cognitive Wake-Up From MRAM-Based State-Retentive Sleep Mode
- Author
-
Rossi, Davide, Conti, Francesco, Eggimann, Manuel, Di Mauro, Alfio, Tagliavini, Giuseppe, Mach, Stefan, Guermandi, Marco, Pullini, Antonio, Loi, Igor, Chen, Jie, Flamand, Eric, and Benini, Luca
- Subjects
Computer Science - Hardware Architecture ,Computer Science - Artificial Intelligence - Abstract
The Internet-of-Things requires end-nodes with ultra-low-power always-on capability for a long battery lifetime, as well as high performance, energy efficiency, and extreme flexibility to deal with complex and fast-evolving near-sensor analytics algorithms (NSAAs). We present Vega, an IoT end-node SoC capable of scaling from a 1.7 $\mathrm{\mu}$W fully retentive cognitive sleep mode up to 32.2 GOPS (@ 49.4 mW) peak performance on NSAAs, including mobile DNN inference, exploiting 1.6 MB of state-retentive SRAM, and 4 MB of non-volatile MRAM. To meet the performance and flexibility requirements of NSAAs, the SoC features 10 RISC-V cores: one core for SoC and IO management and a 9-cores cluster supporting multi-precision SIMD integer and floating-point computation. Vega achieves SoA-leading efficiency of 615 GOPS/W on 8-bit INT computation (boosted to 1.3TOPS/W for 8-bit DNN inference with hardware acceleration). On floating-point (FP) compuation, it achieves SoA-leading efficiency of 79 and 129 GFLOPS/W on 32- and 16-bit FP, respectively. Two programmable machine-learning (ML) accelerators boost energy efficiency in cognitive sleep and active states, respectively., Comment: 13 pages, 11 figures, 8 tables, journal paper
- Published
- 2021
- Full Text
- View/download PDF
7. A 5 \mu W Standard Cell Memory-based Configurable Hyperdimensional Computing Accelerator for Always-on Smart Sensing
- Author
-
Eggimann, Manuel, Rahimi, Abbas, and Benini, Luca
- Subjects
Electrical Engineering and Systems Science - Signal Processing ,Computer Science - Artificial Intelligence ,Electrical Engineering and Systems Science - Systems and Control - Abstract
Hyperdimensional computing (HDC) is a brain-inspired computing paradigm based on high-dimensional holistic representations of vectors. It recently gained attention for embedded smart sensing due to its inherent error-resiliency and suitability to highly parallel hardware implementations. In this work, we propose a programmable all-digital CMOS implementation of a fully autonomous HDC accelerator for always-on classification in energy-constrained sensor nodes. By using energy-efficient standard cell memory (SCM), the design is easily cross-technology mappable. It achieves extremely low power, 5 $\mu W$ in typical applications, and an energy-efficiency improvement over the state-of-the-art (SoA) digital architectures of up to 3$\times$ in post-layout simulations for always-on wearable tasks such as EMG gesture recognition. As part of the accelerator's architecture, we introduce novel hardware-friendly embodiments of common HDC-algorithmic primitives, which results in 3.3$\times$ technology scaled area reduction over the SoA, achieving the same accuracy levels in all examined targets. The proposed architecture also has a fully configurable datapath using microcode optimized for HDC stored on an integrated SCM based configuration memory, making the design "general-purpose" in terms of HDC algorithm flexibility. This flexibility allows usage of the accelerator across novel HDC tasks, for instance, a newly designed HDC applied to the task of ball bearing fault detection.
- Published
- 2021
8. TinyRadarNN: Combining Spatial and Temporal Convolutional Neural Networks for Embedded Gesture Recognition with Short Range Radars
- Author
-
Scherer, Moritz, Magno, Michele, Erb, Jonas, Mayer, Philipp, Eggimann, Manuel, and Benini, Luca
- Subjects
Electrical Engineering and Systems Science - Signal Processing ,Computer Science - Machine Learning - Abstract
This work proposes a low-power high-accuracy embedded hand-gesture recognition algorithm targeting battery-operated wearable devices using low power short-range RADAR sensors. A 2D Convolutional Neural Network (CNN) using range frequency Doppler features is combined with a Temporal Convolutional Neural Network (TCN) for time sequence prediction. The final algorithm has a model size of only 46 thousand parameters, yielding a memory footprint of only 92 KB. Two datasets containing 11 challenging hand gestures performed by 26 different people have been recorded containing a total of 20,210 gesture instances. On the 11 hand gesture dataset, accuracies of 86.6% (26 users) and 92.4% (single user) have been achieved, which are comparable to the state-of-the-art, which achieves 87% (10 users) and 94% (single user), while using a TCN-based network that is 7500x smaller than the state-of-the-art. Furthermore, the gesture recognition classifier has been implemented on a Parallel Ultra-Low Power Processor, demonstrating that real-time prediction is feasible with only 21 mW of power consumption for the full TCN sequence prediction network, while a system-level power consumption of less than 100 mW is achieved. We provide open-source access to all the code and data collected and used in this work on tinyradar.ethz.ch.
- Published
- 2020
- Full Text
- View/download PDF
9. InfiniWolf: Energy Efficient Smart Bracelet for Edge Computing with Dual Source Energy Harvesting
- Author
-
Magno, Michele, Wang, Xiaying, Eggimann, Manuel, Cavigelli, Lukas, and Benini, Luca
- Subjects
Electrical Engineering and Systems Science - Signal Processing - Abstract
This work presents InfiniWolf, a novel multi-sensor smartwatch that can achieve self-sustainability exploiting thermal and solar energy harvesting, performing computationally high demanding tasks. The smartwatch embeds both a System-on-Chip (SoC) with an ARM Cortex-M processor and Bluetooth Low Energy (BLE) and Mr. Wolf, an open-hardware RISC-V based parallel ultra-low-power processor that boosts the processing capabilities on board by more than one order of magnitude, while also increasing energy efficiency. We demonstrate its functionality based on a sample application scenario performing stress detection with multi-layer artificial neural networks on a wearable multi-sensor bracelet. Experimental results show the benefits in terms of energy efficiency and latency of Mr. Wolf over an ARM Cortex-M4F micro-controllers and the possibility, under specific assumptions, to be self-sustainable using thermal and solar energy harvesting while performing up to 24 stress classifications per minute in indoor conditions.
- Published
- 2020
- Full Text
- View/download PDF
10. HR-SAR-Net: A Deep Neural Network for Urban Scene Segmentation from High-Resolution SAR Data
- Author
-
Wang, Xiaying, Cavigelli, Lukas, Eggimann, Manuel, Magno, Michele, and Benini, Luca
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
Synthetic aperture radar (SAR) data is becoming increasingly available to a wide range of users through commercial service providers with resolutions reaching 0.5m/px. Segmenting SAR data still requires skilled personnel, limiting the potential for large-scale use. We show that it is possible to automatically and reliably perform urban scene segmentation from next-gen resolution SAR data (0.15m/px) using deep neural networks (DNNs), achieving a pixel accuracy of 95.19% and a mean IoU of 74.67% with data collected over a region of merely 2.2km${}^2$. The presented DNN is not only effective, but is very small with only 63k parameters and computationally simple enough to achieve a throughput of around 500Mpx/s using a single GPU. We further identify that additional SAR receive antennas and data from multiple flights massively improve the segmentation accuracy. We describe a procedure for generating a high-quality segmentation ground truth from multiple inaccurate building and road annotations, which has been crucial to achieving these segmentation results.
- Published
- 2019
- Full Text
- View/download PDF
11. Hydra: An Accelerator for Real-Time Edge-Aware Permeability Filtering in 65nm CMOS
- Author
-
Eggimann, Manuel, Gloor, Christelle, Scheidegger, Florian, Cavigelli, Lukas, Schaffner, Michael, Smolic, Aljosa, and Benini, Luca
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Hardware Architecture ,Computer Science - Graphics ,Electrical Engineering and Systems Science - Signal Processing - Abstract
Many modern video processing pipelines rely on edge-aware (EA) filtering methods. However, recent high-quality methods are challenging to run in real-time on embedded hardware due to their computational load. To this end, we propose an area-efficient and real-time capable hardware implementation of a high quality EA method. In particular, we focus on the recently proposed permeability filter (PF) that delivers promising quality and performance in the domains of HDR tone mapping, disparity and optical flow estimation. We present an efficient hardware accelerator that implements a tiled variant of the PF with low on-chip memory requirements and a significantly reduced external memory bandwidth (6.4x w.r.t. the non-tiled PF). The design has been taped out in 65 nm CMOS technology, is able to filter 720p grayscale video at 24.8 Hz and achieves a high compute density of 6.7 GFLOPS/mm2 (12x higher than embedded GPUs when scaled to the same technology node). The low area and bandwidth requirements make the accelerator highly suitable for integration into SoCs where silicon area budget is constrained and external memory is typically a heavily contended resource.
- Published
- 2017
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.