Author: "Benini P" / Search Limiters: Full Text - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Benini P"' showing total 1,099 results

Start Over Author "Benini P" Search Limiters Full Text

1,099 results on '"Benini P"'

1. IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models

Author: Guo, Hang, Li, Yawei, Dai, Tao, Xia, Shu-Tao, and Benini, Luca
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Fine-tuning large-scale text-to-image diffusion models for various downstream tasks has yielded impressive results. However, the heavy computational burdens of tuning large models prevent personal customization. Recent advances have attempted to employ parameter-efficient fine-tuning (PEFT) techniques to adapt the floating-point (FP) or quantized pre-trained weights. Nonetheless, the adaptation parameters in existing works are still restricted to FP arithmetic, hindering hardware-friendly acceleration. In this work, we propose IntLoRA, to further push the efficiency limits by using integer type (INT) low-rank parameters to adapt the quantized diffusion models. By working in the integer arithmetic, our IntLoRA offers three key advantages: (i) for fine-tuning, the pre-trained weights are quantized, reducing memory usage; (ii) for storage, both pre-trained and low-rank weights are in INT which consumes less disk space; (iii) for inference, IntLoRA weights can be naturally merged into quantized pre-trained weights through efficient integer multiplication or bit-shifting, eliminating additional post-training quantization. Extensive experiments demonstrate that IntLoRA can achieve performance on par with or even superior to the vanilla LoRA, accompanied by significant efficiency improvements. Code is available at \url{https://github.com/csguoh/IntLoRA}., Comment: Technical Report
Published: 2024

2. PEtra: A Flexible and Open-Source PE Loop Tracer for Polymer Thin-Film Transducers

Author: Wessner, Marc-Andre, Villani, Federico, Papa, Sofia, Keller, Kirill, Ferrari, Laura, Greco, Francesco, Benini, Luca, and Leitner, Christoph
Subjects: Electrical Engineering and Systems Science - Systems and Control, Physics - Instrumentation and Detectors
Abstract: Accurate characterization of ferroelectric properties in polymer piezoelectrics is critical for optimizing the performance of flexible and wearable ultrasound transducers, such as screen-printed PVDF devices. Standard charge measurement techniques, like the Sawyer-Tower circuit, often fall short when applied to ferroelectric polymers due to low-frequency leakage. In this work, we present PEtra, an open-source and versatile piezoelectric loop tracer. PEtra employs a transimpedance amplifier (LMP7721, TI) to convert picoampere-level currents into measurable voltages, covering a frequency range of 0.1 Hz to 5 Hz for a gain setting of 10^7 V/A, and 0.1 Hz to 200 Hz for gain settings between 10^3 V/A to 10^6 V/A (10-fold increments). We demonstrate through simulations and experimental validations that PEtra achieves a sensitivity down to 2 pA, effectively addressing the limitations of traditional charge measurement methods. Compared to the Sawyer-Tower circuit, PEtra directly amplifies currents without the need for a reference capacitor. As a result, it is less susceptible to leakage and can operate at lower frequencies, improving measurement accuracy and reliability. PEtra's design is fully open source, offering researchers and engineers a versatile tool to drive advancements in flexible PVDF transducer technology.
Published: 2024

3. PuLsE: Accurate and Robust Ultrasound-based Continuous Heart-Rate Monitoring on a Wrist-Worn IoT Device

Author: Giordano, Marco, Leitner, Christoph, Vogt, Christian, Benini, Luca, and Magno, Michele
Subjects: Electrical Engineering and Systems Science - Signal Processing
Abstract: This work explores the feasibility of employing ultrasound (US) US technology in a wrist-worn IoT device for low-power, high-fidelity heart-rate (HR) extraction. US offers deep tissue penetration and can monitor pulsatile arterial blood flow in large vessels and the surrounding tissue, potentially improving robustness and accuracy compared to PPG. We present an IoT wearable system prototype utilizing a commercial microcontroller MCU employing the onboard ADC to capture high frequency US signals and an innovative low-power US pulser. An envelope filter lowers the bandwidth of the US signal by a factor of >5x, reducing the system's acquisition requirements without compromising accuracy (correlation coefficient between HR extracted from enveloped and raw signals, r(92)=0.99, p<0.001). The full signal processing pipeline is ported to fixed point arithmetic for increased energy efficiency and runs entirely onboard. The system has an average power consumption of 5.8mW, competitive with PPG based systems, and the HR extraction algorithm requires only 68kB of RAM and 71ms of processing time on an ARM Cortex-M4 MCU. The system is estimated to run continuously for more than 7 days on a smartwatch battery. To accurately evaluate the proposed circuit and algorithm and identify the anatomical location on the wrist with the highest accuracy for HR extraction, we collected a dataset from 10 healthy adults at three different wrist positions. The dataset comprises roughly 5 hours of HR data with an average of 80.6+-16.3 bpm. During recording, we synchronized the established ECG gold standard with our US-based method. The comparisons yields a Pearson correlation coefficient of r(92)=0.99, p<0.001 and a mean error of 0.69+-1.99 bpm in the lateral wrist position near the radial artery. The dataset and code have been open-sourced at https://github.com/mgiordy/Ultrasound-Heart-Rate
Published: 2024

4. ControlPULPlet: A Flexible Real-time Multi-core RISC-V Controller for 2.5D Systems-in-package

Author: Ottaviano, Alessandro, Balas, Robert, Fischer, Tim, Benz, Thomas, Bartolini, Andrea, and Benini, Luca
Subjects: Computer Science - Hardware Architecture
Abstract: The increasing complexity of real-time control algorithms and the trend toward 2.5D technology necessitate the development of scalable controllers for managing the complex, integrated operation of chiplets within 2.5D systems-in-package. These controllers must provide real-time computing capabilities and have chiplet-compatible IO interfaces for communication with the controlled components. This work introduces ControlPULPlet, a chiplet-compatible, real-time multi-core RISC-V controller, which is available as an open-source release. It includes a 32-bit CV32RT core for efficient interrupt handling and a specialized direct memory access (DMA) engine to automate periodic sensor readouts. A tightly-coupled programmable multi-core accelerator is integrated via a dedicated AXI4 port. A flexible AXI4-compatible die-to-die (D2D) link supports inter-chiplet communication in 2.5D systems and enables high-bandwidth transfers in traditional 2D monolithic setups. We designed and fabricated ControlPULPlet as a silicon prototype called Kairos using TSMC's 65nm CMOS technology. Kairos executes predictive control algorithms at up to 290 MHz while consuming just 30 mW of power. The D2D link requires only 16.5 kGE in physical area per channel, adding just 2.9% to the total system area. It supports off-die access with an energy efficiency of 1.3 pJ/b and achieves a peak duplex transfer rate of 51 Gb/s per second at 200 MHz., Comment: 4.5 pages, 11 figures, submitted to Transactions on Circuits and Systems Part II - Express Briefs (TCAS-II)
Published: 2024

5. MATCH: Model-Aware TVM-based Compilation for Heterogeneous Edge Devices

Author: Hamdi, Mohamed Amine, Daghero, Francesco, Sarda, Giuseppe Maria, Van Delm, Josse, Symons, Arne, Benini, Luca, Verhelst, Marian, Pagliari, Daniele Jahier, and Burrello, Alessio
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Artificial Intelligence, I.2.2, D.1.3
Abstract: Streamlining the deployment of Deep Neural Networks (DNNs) on heterogeneous edge platforms, coupling within the same micro-controller unit (MCU) instruction processors and hardware accelerators for tensor computations, is becoming one of the crucial challenges of the TinyML field. The best-performing DNN compilation toolchains are usually deeply customized for a single MCU family, and porting to a different heterogeneous MCU family implies labor-intensive re-development of almost the entire compiler. On the opposite side, retargetable toolchains, such as TVM, fail to exploit the capabilities of custom accelerators, resulting in the generation of general but unoptimized code. To overcome this duality, we introduce MATCH, a novel TVM-based DNN deployment framework designed for easy agile retargeting across different MCU processors and accelerators, thanks to a customizable model-based hardware abstraction. We show that a general and retargetable mapping framework enhanced with hardware cost models can compete with and even outperform custom toolchains on diverse targets while only needing the definition of an abstract hardware model and a SoC-specific API. We tested MATCH on two state-of-the-art heterogeneous MCUs, GAP9 and DIANA. On the four DNN models of the MLPerf Tiny suite MATCH reduces inference latency by up to 60.88 times on DIANA, compared to using the plain TVM, thanks to the exploitation of the on-board HW accelerator. Compared to HTVM, a fully customized toolchain for DIANA, we still reduce the latency by 16.94%. On GAP9, using the same benchmarks, we improve the latency by 2.15 times compared to the dedicated DORY compiler, thanks to our heterogeneous DNN mapping approach that synergically exploits the DNN accelerator and the eight-cores cluster available on board., Comment: 13 pages, 11 figures, 4 tables
Published: 2024

6. vCLIC: Towards Fast Interrupt Handling in Virtualized RISC-V Mixed-criticality Systems

Author: Zelioli, Enrico, Ottaviano, Alessandro, Balas, Robert, Wistoff, Nils, Garofalo, Angelo, and Benini, Luca
Subjects: Computer Science - Hardware Architecture
Abstract: The widespread diffusion of compute-intensive edge-AI workloads and the stringent demands of modern autonomous systems require advanced heterogeneous embedded architectures. Such architectures must support high-performance and reliable execution of parallel tasks with different levels of criticality. Hardware-assisted virtualization is crucial for isolating applications concurrently executing these tasks under real-time constraints, but interrupt virtualization poses challenges in ensuring transparency to virtual guests while maintaining real-time system features, such as interrupt vectoring, nesting, and tail-chaining. Despite its rapid advancement to address virtualization needs for mixed-criticality systems, the RISC-V ecosystem still lacks interrupt controllers with integrated virtualization and real-time features, currently relying on non-deterministic, bus-mediated message-signaled interrupts (MSIs) for virtualization. To overcome this limitation, we present the design, implementation, and in-system assessment of vCLIC, a virtualization extension to the RISC-V CLIC fast interrupt controller. Our approach achieves 20x interrupt latency speed-up over the software emulation required for handling non-virtualization-aware systems, reduces response latency by 15% compared to existing MSI-based approaches, and is free from interference from the system bus, at an area cost of just 8kGE when synthesized in an advanced 16nm FinFet technology., Comment: 4 pages, 4 figures, accepted for presentation at the 42nd IEEE International Conference on Computer Design (ICCD 2024)
Published: 2024

7. Bifurcation for families of Ahlfors island maps

Author: Astorg, Matthieu, Benini, Anna Miriam, and Fagella, Nuria
Subjects: Mathematics - Dynamical Systems, Mathematics - Complex Variables, 37F46, 30D05, 37F10, 30D30, 37F44
Abstract: We extend Ma\~n\'e-Sad-Sullivan and Lyubich's equivalent characterization of stability to the setting of Ahlfors island maps, which include notably all meromorphic maps. As a consequence we also obtain the density of $J$-stability for finite type maps in the sense of Epstein.
Published: 2024

8. When SAM2 Meets Video Camouflaged Object Segmentation: A Comprehensive Evaluation and Adaptation

Author: Zhou, Yuli, Sun, Guolei, Li, Yawei, Benini, Luca, and Konukoglu, Ender
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: This study investigates the application and performance of the Segment Anything Model 2 (SAM2) in the challenging task of video camouflaged object segmentation (VCOS). VCOS involves detecting objects that blend seamlessly in the surroundings for videos, due to similar colors and textures, poor light conditions, etc. Compared to the objects in normal scenes, camouflaged objects are much more difficult to detect. SAM2, a video foundation model, has shown potential in various tasks. But its effectiveness in dynamic camouflaged scenarios remains under-explored. This study presents a comprehensive study on SAM2's ability in VCOS. First, we assess SAM2's performance on camouflaged video datasets using different models and prompts (click, box, and mask). Second, we explore the integration of SAM2 with existing multimodal large language models (MLLMs) and VCOS methods. Third, we specifically adapt SAM2 by fine-tuning it on the video camouflaged dataset. Our comprehensive experiments demonstrate that SAM2 has excellent zero-shot ability of detecting camouflaged objects in videos. We also show that this ability could be further improved by specifically adjusting SAM2's parameters for VCOS. The code will be available at https://github.com/zhoustan/SAM2-VCOS, Comment: Technical report
Published: 2024

9. Circuits and Systems for Embodied AI: Exploring uJ Multi-Modal Perception for Nano-UAVs on the Kraken Shield

Author: Potocnik, Viviane, Di Mauro, Alfio, Lamberti, Lorenzo, Kartsch, Victor, Scherer, Moritz, Conti, Francesco, and Benini, Luca
Subjects: Computer Science - Hardware Architecture
Abstract: Embodied artificial intelligence (AI) requires pushing complex multi-modal models to the extreme edge for time-constrained tasks such as autonomous navigation of robots and vehicles. On small form-factor devices, e.g., nano-sized unmanned aerial vehicles (UAVs), such challenges are exacerbated by stringent constraints on energy efficiency and weight. In this paper, we explore embodied multi-modal AI-based perception for Nano-UAVs with the Kraken shield, a 7g multi-sensor (frame-based and event-based imagers) board based on Kraken, a 22 nm SoC featuring multiple acceleration engines for multi-modal event and frame-based inference based on spiking (SNN) and ternary (TNN) neural networks, respectively. Kraken can execute SNN real-time inference for depth estimation at 1.02k inf/s, 18 {\mu}J/inf, TNN real-time inference for object classification at 10k inf/s, 6 {\mu}J/inf, and real-time inference for obstacle avoidance at 221 frame/s, 750 {\mu}J/inf., Comment: 5 pages, 5 figures
Published: 2024

10. FlooNoC: A 645 Gbps/link 0.15 pJ/B/hop Open-Source NoC with Wide Physical Links and End-to-End AXI4 Parallel Multi-Stream Support

Author: Fischer, Tim, Rogenmoser, Michael, Benz, Thomas, Gürkaynak, Frank K., and Benini, Luca
Subjects: Computer Science - Hardware Architecture
Abstract: The new generation of domain-specific AI accelerators is characterized by rapidly increasing demands for bulk data transfers, as opposed to small, latency-critical cache line transfers typical of traditional cache-coherent systems. In this paper, we address this critical need by introducing the FlooNoC Network-on-Chip (NoC), featuring very wide, fully Advanced eXtensible Interface (AXI4) compliant links designed to meet the massive bandwidth needs at high energy efficiency. At the transport level, non-blocking transactions are supported for latency tolerance. Additionally, a novel end-to-end ordering approach for AXI4, enabled by a multi-stream capable Direct Memory Access (DMA) engine simplifies network interfaces and eliminates inter-stream dependencies. Furthermore, dedicated physical links are instantiated for short, latency-critical messages. A complete end-to-end reference implementation in 12nm FinFET technology demonstrates the physical feasibility and power performance area (PPA) benefits of our approach. Utilizing wide links on high levels of metal, we achieve a bandwidth of 645 Gbps per link and a total aggregate bandwidth of 103 Tbps for an 8x4 mesh of processors cluster tiles, with a total of 288 RISC-V cores. The NoC imposes a minimal area overhead of only 3.5% per compute tile and achieves a leading-edge energy efficiency of 0.15 pJ/B/hop at 0.8 V. Compared to state-of-the-art NoCs, our system offers three times the energy efficiency and more than double the link bandwidth. Furthermore, compared to a traditional AXI4-based multi-layer interconnect, our NoC achieves a 30% reduction in area, corresponding to a 47% increase in GFLOPSDP within the same floorplan.
Published: 2024

11. The Denjoy-Wolff Theorem in simply connected domains

Author: Benini, Anna Miriam and Bracci, Filippo
Subjects: Mathematics - Complex Variables
Abstract: We characterize the simply connected domains $\Omega\subsetneq\mathbb{C}$ that exhibit the Denjoy-Wolff Property, meaning that every holomorphic self-map of $\Omega$ without fixed points has a Denjoy-Wolff point. We demonstrate that this property holds if and only if every automorphism of $\Omega$ without fixed points in $\Omega$ has a Denjoy-Wolff point. Furthermore, we establish that the Denjoy-Wolff Property is equivalent to the existence of what we term an ``$H$-limit'' at each boundary point for a Riemann map associated with the domain. The $H$-limit condition is stronger than the existence of non-tangential limits but weaker than unrestricted limits. As an additional result of our work, we prove that there exist bounded simply connected domains where the Denjoy-Wolff Property holds but which are not visible in the sense of Bharali and Zimmer. Since visibility is a sufficient condition for the Denjoy-Wolff Property, this proves that in general it is not necessary., Comment: 23 pages, 2 figures
Published: 2024

12. An Ultra-Low Power Wearable BMI System with Continual Learning Capabilities

Author: Mei, Lan, Ingolfsson, Thorir Mar, Cioflan, Cristian, Kartsch, Victor, Cossettini, Andrea, Wang, Xiaying, and Benini, Luca
Subjects: Electrical Engineering and Systems Science - Signal Processing, Electrical Engineering and Systems Science - Systems and Control
Abstract: Driven by the progress in efficient embedded processing, there is an accelerating trend toward running machine learning models directly on wearable Brain-Machine Interfaces (BMIs) to improve portability and privacy and maximize battery life. However, achieving low latency and high classification performance remains challenging due to the inherent variability of electroencephalographic (EEG) signals across sessions and the limited onboard resources. This work proposes a comprehensive BMI workflow based on a CNN-based Continual Learning (CL) framework, allowing the system to adapt to inter-session changes. The workflow is deployed on a wearable, parallel ultra-low power BMI platform (BioGAP). Our results based on two in-house datasets, Dataset A and Dataset B, show that the CL workflow improves average accuracy by up to 30.36% and 10.17%, respectively. Furthermore, when implementing the continual learning on a Parallel Ultra-Low Power (PULP) microcontroller (GAP9), it achieves an energy consumption as low as 0.45mJ per inference and an adaptation time of only 21.5ms, yielding around 25h of battery life with a small 100mAh, 3.7V battery on BioGAP. Our setup, coupled with the compact CNN model and on-device CL capabilities, meets users' needs for improved privacy, reduced latency, and enhanced inter-session performance, offering good promise for smart embedded real-world BMIs., Comment: 12 pages, 8 figures, to be published in IEEE Transactions on Biomedical Circuits and Systems (TBioCAS)
Published: 2024
Full Text: View/download PDF

13. Train-On-Request: An On-Device Continual Learning Workflow for Adaptive Real-World Brain Machine Interfaces

Author: Mei, Lan, Cioflan, Cristian, Ingolfsson, Thorir Mar, Kartsch, Victor, Cossettini, Andrea, Wang, Xiaying, and Benini, Luca
Subjects: Electrical Engineering and Systems Science - Signal Processing, Electrical Engineering and Systems Science - Systems and Control
Abstract: Brain-machine interfaces (BMIs) are expanding beyond clinical settings thanks to advances in hardware and algorithms. However, they still face challenges in user-friendliness and signal variability. Classification models need periodic adaptation for real-life use, making an optimal re-training strategy essential to maximize user acceptance and maintain high performance. We propose TOR, a train-on-request workflow that enables user-specific model adaptation to novel conditions, addressing signal variability over time. Using continual learning, TOR preserves knowledge across sessions and mitigates inter-session variability. With TOR, users can refine, on demand, the model through on-device learning (ODL) to enhance accuracy adapting to changing conditions. We evaluate the proposed methodology on a motor-movement dataset recorded with a non-stigmatizing wearable BMI headband, achieving up to 92% accuracy and a re-calibration time as low as 1.6 minutes, a 46% reduction compared to a naive transfer learning workflow. We additionally demonstrate that TOR is suitable for ODL in extreme edge settings by deploying the training procedure on a RISC-V ultra-low-power SoC (GAP9), resulting in 21.6 ms of latency and 1 mJ of energy consumption per training step. To the best of our knowledge, this work is the first demonstration of an online, energy-efficient, dynamic adaptation of a BMI model to the intrinsic variability of EEG signals in real-time settings., Comment: 5 pages, 6 figures, to be published in 2024 IEEE Biomedical Circuits and Systems Conference (BioCAS)
Published: 2024

14. fence.t.s: Closing Timing Channels in High-Performance Out-of-Order Cores through ISA-Supported Temporal Partitioning

Author: Wistoff, Nils, Heiser, Gernot, and Benini, Luca
Subjects: Computer Science - Cryptography and Security
Abstract: Microarchitectural timing channels exploit information leakage between security domains that should be isolated, bypassing the operating system's security boundaries. These channels result from contention for shared microarchitectural state. In the RISC-V instruction set, the temporal fence instruction (fence.t) was proposed to close timing channels by providing an operating system with the means to temporally partition microarchitectural state inexpensively in simple in-order cores. This work explores challenges with fence.t in superscalar out-of-order cores featuring large and pervasive microarchitectural state. To overcome these challenges, we propose a novel SW-supported temporal fence (fence.t.s), which reuses existing mechanisms and supports advanced microarchitectural features, enabling full timing channel protection of an exemplary out-of-order core (OpenC910) at negligible hardware costs and a minimal performance impact of 1.0 %., Comment: 8 pages, 3 figures, 1 algorithm, 1 listing. Accepted at the 2024 International Conference on Applications in Electronics Pervading Industry, Environment and Society (APPLEPIES 2024)
Published: 2024

15. Derived algebraic geometry of 2d lattice Yang-Mills theory

Author: Benini, Marco, Fernández, Tomás, and Schenkel, Alexander
Subjects: Mathematical Physics, High Energy Physics - Theory, Mathematics - Algebraic Geometry, 14A30, 70S15, 81T25
Abstract: A derived algebraic geometric study of classical $\mathrm{GL}_n$-Yang-Mills theory on the $2$-dimensional square lattice $\mathbb{Z}^2$ is presented. The derived critical locus of the Wilson action is described and its local data supported in rectangular subsets $V =[a,b]\times [c,d]\subseteq \mathbb{Z}^2$ with both sides of length $\geq 2$ is extracted. A locally constant dg-category-valued prefactorization algebra on $\mathbb{Z}^2$ is constructed from the dg-categories of perfect complexes on the derived stacks of local data., Comment: 25 pages
Published: 2024

16. Optimization and Deployment of Deep Neural Networks for PPG-based Blood Pressure Estimation Targeting Low-power Wearables

Author: Burrello, Alessio, Carlucci, Francesco, Pollo, Giovanni, Wang, Xiaying, Poncino, Massimo, Macii, Enrico, Benini, Luca, and Pagliari, Daniele Jahier
Subjects: Electrical Engineering and Systems Science - Signal Processing, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: PPG-based Blood Pressure (BP) estimation is a challenging biosignal processing task for low-power devices such as wearables. State-of-the-art Deep Neural Networks (DNNs) trained for this task implement either a PPG-to-BP signal-to-signal reconstruction or a scalar BP value regression and have been shown to outperform classic methods on the largest and most complex public datasets. However, these models often require excessive parameter storage or computational effort for wearable deployment, exceeding the available memory or incurring too high latency and energy consumption. In this work, we describe a fully-automated DNN design pipeline, encompassing HW-aware Neural Architecture Search (NAS) and Quantization, thanks to which we derive accurate yet lightweight models, that can be deployed on an ultra-low-power multicore System-on-Chip (SoC), GAP8. Starting from both regression and signal-to-signal state-of-the-art models on four public datasets, we obtain optimized versions that achieve up to 4.99% lower error or 73.36% lower size at iso-error. Noteworthy, while the most accurate SoA network on the largest dataset can not fit the GAP8 memory, all our optimized models can; our most accurate DNN consumes as little as 0.37 mJ while reaching the lowest MAE of 8.08 on Diastolic BP estimation.
Published: 2024

17. Accelerating Image-based Pest Detection on a Heterogeneous Multi-core Microcontroller

Author: Bompani, Luca, Crupi, Luca, Palossi, Daniele, Baldoni, Olmo, Brunelli, Davide, Conti, Francesco, Rusci, Manuele, and Benini, Luca
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, J.3, I.4.8
Abstract: The codling moth pest poses a significant threat to global crop production, with potential losses of up to 80% in apple orchards. Special camera-based sensor nodes are deployed in the field to record and transmit images of trapped insects to monitor the presence of the pest. This paper investigates the embedding of computer vision algorithms in the sensor node using a novel State-of-the-Art Microcontroller Unit (MCU), the GreenWaves Technologies' GAP9 System-on-Chip, which combines 10 RISC-V general purposes cores with a convolution hardware accelerator. We compare the performance of a lightweight Viola-Jones detector algorithm with a Convolutional Neural Network (CNN), MobileNetV3-SSDLite, trained for the pest detection task. On two datasets that differentiate for the distance between the camera sensor and the pest targets, the CNN generalizes better than the other method and achieves a detection accuracy between 83% and 72%. Thanks to the GAP9's CNN accelerator, the CNN inference task takes only 147 ms to process a 320$\times$240 image. Compared to the GAP8 MCU, which only relies on general-purpose cores for processing, we achieved 9.5$\times$ faster inference speed. When running on a 1000 mAh battery at 3.7 V, the estimated lifetime is approximately 199 days, processing an image every 30 seconds. Our study demonstrates that the novel heterogeneous MCU can perform end-to-end CNN inference with an energy consumption of just 4.85 mJ, matching the efficiency of the simpler Viola-Jones algorithm and offering power consumption up to 15$\times$ lower than previous methods. Code at: https://github.com/Bomps4/TAFE_Pest_Detection, Comment: 11 pages, 7 figures, 4 tables
Published: 2024

18. Deeploy: Enabling Energy-Efficient Deployment of Small Language Models On Heterogeneous Microcontrollers

Author: Scherer, Moritz, Macan, Luka, Jung, Victor, Wiese, Philip, Bompani, Luca, Burrello, Alessio, Conti, Francesco, and Benini, Luca
Subjects: Computer Science - Machine Learning, Computer Science - Hardware Architecture
Abstract: With the rise of Embodied Foundation Models (EFMs), most notably Small Language Models (SLMs), adapting Transformers for edge applications has become a very active field of research. However, achieving end-to-end deployment of SLMs on microcontroller (MCU)-class chips without high-bandwidth off-chip main memory access is still an open challenge. In this paper, we demonstrate high-efficiency end-to-end SLM deployment on a multicore RISC-V (RV32) MCU augmented with ML instruction extensions and a hardware neural processing unit (NPU). To automate the exploration of the constrained, multi-dimensional memory vs. computation tradeoffs involved in aggressive SLM deployment on heterogeneous (multicore+NPU) resources, we introduce Deeploy, a novel Deep Neural Network (DNN) compiler, which generates highly-optimized C code requiring minimal runtime support. We demonstrate that Deeploy generates end-to-end code for executing SLMs, fully exploiting the RV32 cores' instruction extensions and the NPU: We achieve leading-edge energy and throughput of \SI{490}{\micro\joule \per Token}, at \SI{340}{Token \per \second} for an SLM trained on the TinyStories dataset, running for the first time on an MCU-class device without external memory., Comment: Accepted for publication at ESWEEK - CASES 2024
Published: 2024

19. Toward Attention-based TinyML: A Heterogeneous Accelerated Architecture and Automated Deployment Flow

Author: Wiese, Philip, İslamoğlu, Gamze, Scherer, Moritz, Macan, Luka, Jung, Victor J. B., Burrello, Alessio, Conti, Francesco, and Benini, Luca
Subjects: Computer Science - Hardware Architecture, Computer Science - Machine Learning
Abstract: One of the challenges for Tiny Machine Learning (tinyML) is keeping up with the evolution of Machine Learning models from Convolutional Neural Networks to Transformers. We address this by leveraging a heterogeneous architectural template coupling RISC-V processors with hardwired accelerators supported by an automated deployment flow. We demonstrate an Attention-based model in a tinyML power envelope with an octa-core cluster coupled with an accelerator for quantized Attention. Our deployment flow enables an end-to-end 8-bit MobileBERT, achieving leading-edge energy efficiency and throughput of 2960 GOp/J and 154 GOp/s at 32.5 Inf/s consuming 52.0 mW (0.65 V, 22 nm FD-SOI technology)., Comment: Pre-print manuscript submitted for review to the IEEE Design and Test Special Issue on tinyML
Published: 2024

20. A 1024 RV-Cores Shared-L1 Cluster with High Bandwidth Memory Link for Low-Latency 6G-SDR

Author: Zhang, Yichao, Bertuletti, Marco, Zhang, Chi, Riedel, Samuel, Vanelli-Coralli, Alessandro, and Benini, Luca
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: We introduce an open-source architecture for next-generation Radio-Access Network baseband processing: 1024 latency-tolerant 32-bit RISC-V cores share 4 MiB of L1 memory via an ultra-low latency interconnect (7-11 cycles), a modular Direct Memory Access engine provides an efficient link to a high bandwidth memory, such as HBM2E (98% peak bandwidth at 910GBps). The system achieves leading-edge energy efficiency at sub-ms latency in key 6G baseband processing kernels: Fast Fourier Transform (93 GOPS/W), Beamforming (125 GOPS/W), Channel Estimation (96 GOPS/W), and Linear System Inversion (61 GOPS/W), with only 9% data movement overhead.
Published: 2024

21. Holographic duals of symmetry broken phases

Author: Antinucci, Andrea, Benini, Francesco, and Rizi, Giovanni
Subjects: High Energy Physics - Theory
Abstract: We explore a novel interpretation of Symmetry Topological Field Theories (SymTFTs) as theories of gravity, proposing a holographic duality where the bulk SymTFT (with the gauging of a suitable Lagrangian algebra) is dual to the universal effective field theory (EFT) that describes spontaneous symmetry breaking on the boundary. We test this conjecture in various dimensions and with many examples involving different continuous symmetry structures, including non-Abelian and non-invertible symmetries, as well as higher groups. For instance, we find that many Abelian SymTFTs are dual to free theories of Goldstone bosons or generalized Maxwell fields, while non-Abelian SymTFTs relate to non-linear sigma models with target spaces defined by the symmetry groups. We also extend our analysis to include the non-invertible $\mathbb{Q}/\mathbb{Z}$ axial symmetry, finding it to be dual to axion-Maxwell theory, and a non-Abelian 2-group structure in four dimensions, deriving a new parity-violating interaction that has implications for the low-energy dynamics of U(N) QCD., Comment: 35 pages + appendices; v2: typos corrected and refs added
Published: 2024
Full Text: View/download PDF

22. H-Watch: An Open, Connected Platform for AI-Enhanced COVID19 Infection Symptoms Monitoring and Contact Tracing

Author: Polonelli, Tommaso, Schulthess, Lukas, Mayer, Philipp, Magno, Michele, and Benini, Luca
Subjects: Electrical Engineering and Systems Science - Systems and Control
Abstract: The novel COVID-19 disease has been declared a pandemic event. Early detection of infection symptoms and contact tracing are playing a vital role in containing COVID-19 spread. As demonstrated by recent literature, multi-sensor and connected wearable devices might enable symptom detection and help tracing contacts, while also acquiring useful epidemiological information. This paper presents the design and implementation of a fully open-source wearable platform called H-Watch. It has been designed to include several sensors for COVID-19 early detection, multi-radio for wireless transmission and tracking, a microcontroller for processing data on-board, and finally, an energy harvester to extend the battery lifetime. Experimental results demonstrated only 5.9 mW of average power consumption, leading to a lifetime of 9 days on a small watch battery. Finally, all the hardware and the software, including a machine learning on MCU toolkit, are provided open-source, allowing the research community to build and use the H-Watch.
Published: 2024

23. Culsans: An Efficient Snoop-based Coherency Unit for the CVA6 Open Source RISC-V application processor

Author: Tedeschi, Riccardo, Valente, Luca, Ottavi, Gianmarco, Zelioli, Enrico, Wistoff, Nils, Giacometti, Massimiliano, Sajjad, Abdul Basit, Benini, Luca, and Rossi, Davide
Subjects: Electrical Engineering and Systems Science - Systems and Control
Abstract: Symmetric Multi-Processing (SMP) based on cache coherency is crucial for high-end embedded systems like automotive applications. RISC-V is gaining traction, and open-source hardware (OSH) platforms offer solutions to issues such as IP costs and vendor dependency. Existing multi-core cache-coherent RISC-V platforms are complex and not efficient for small embedded core clusters. We propose an open-source SystemVerilog implementation of a lightweight snoop-based cache-coherent cluster of Linux-capable CVA6 cores. Our design uses the MOESI protocol via the Arm's AMBA ACE protocol. Evaluated with Splash-3 benchmarks, our solution shows up to 32.87% faster performance in a dual-core setup and an average improvement of 15.8% over OpenPiton. Synthesized using GF 22nm FDSOI technology, the Cache Coherency Unit occupies only 1.6% of the system area., Comment: 4 pages, 4 figures, DSD2024 and SEAA2024 Works in Progress Session AUG 2024; Updated the acknowledgments
Published: 2024

24. Distilling Tiny and Ultra-fast Deep Neural Networks for Autonomous Navigation on Nano-UAVs

Author: Lamberti, Lorenzo, Bellone, Lorenzo, Macan, Luka, Natalizio, Enrico, Conti, Francesco, Palossi, Daniele, and Benini, Luca
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Electrical Engineering and Systems Science - Systems and Control
Abstract: Nano-sized unmanned aerial vehicles (UAVs) are ideal candidates for flying Internet-of-Things smart sensors to collect information in narrow spaces. This requires ultra-fast navigation under very tight memory/computation constraints. The PULP-Dronet convolutional neural network (CNN) enables autonomous navigation running aboard a nano-UAV at 19 frame/s, at the cost of a large memory footprint of 320 kB -- and with drone control in complex scenarios hindered by the disjoint training of collision avoidance and steering capabilities. In this work, we distill a novel family of CNNs with better capabilities than PULP-Dronet, but memory footprint reduced by up to 168x (down to 2.9 kB), achieving an inference rate of up to 139 frame/s; we collect a new open-source unified collision/steering 66 k images dataset for more robust navigation; and we perform a thorough in-field analysis of both PULP-Dronet and our tiny CNNs running on a commercially available nano-UAV. Our tiniest CNN, called Tiny-PULP-Dronet v3, navigates with a 100% success rate a challenging and never-seen-before path, composed of a narrow obstacle-populated corridor and a 180{\deg} turn, at a maximum target speed of 0.5 m/s. In the same scenario, the SoA PULP-Dronet consistently fails despite having 168x more parameters., Comment: 13 pages, 6 figures, 7 tables, accepted for publication at IEEE Internet of Things Journal, July 2024
Published: 2024

25. Entanglement asymmetry in conformal field theory and holography

Author: Benini, Francesco, Godet, Victor, and Singh, Amartya Harsh
Subjects: High Energy Physics - Theory, Condensed Matter - Statistical Mechanics, Condensed Matter - Strongly Correlated Electrons, Quantum Physics
Abstract: Entanglement asymmetry is a measure of symmetry breaking in quantum subsystems, inspired by quantum information theory, particularly suited to study out-of-equilibrium states. We study the entanglement asymmetry of a class of excited "coherent states" in conformal quantum field theories with a U(1) symmetry, employing Euclidean path-integral methods with topological symmetry defects and the replica formalism. We compute, at leading order in perturbation theory, the asymmetry for a variety of subsystems, including finite spherical subregions in flat space, in finite volume, and at positive temperature. We also study its Lorentzian time evolution, showcasing the dynamical restoration of the symmetry due to thermalization, as well as the presence of a quantum Mpemba effect. Our results are universal, and apply in any number of dimensions. We also show that the perturbative entanglement asymmetry is related to the Fisher information metric, which has a known holographic dual called Hollands-Wald canonical energy, and that it is captured by the AdS bulk charge contained in the entanglement wedge., Comment: 29 pages plus appendices, 11 figures; v2: many improvements, figures redone, new app D, refs added
Published: 2024

26. Design and Experimental Investigation of Trikarenos: A Fault-Tolerant 28nm RISC-V-based SoC

Author: Rogenmoser, Michael, Wiese, Philip, Forlin, Bruno Endres, Gürkaynak, Frank K., Rech, Paolo, Menicucci, Alessandra, Ottavi, Marco, and Benini, Luca
Subjects: Physics - Instrumentation and Detectors, Computer Science - Hardware Architecture, High Energy Physics - Experiment
Abstract: We present a fault-tolerant by-design RISC-V SoC and experimentally assess it under atmospheric neutrons and 200 MeV protons. The dedicated ECC and Triple-Core Lockstep countermeasures correct most errors, guaranteeing a device cross-section lower than $5.36 \times 10^{-12}$ cm$^2$., Comment: 4 pages (excluding title page), accepted at RADECS 2024
Published: 2024

27. Spatzformer: An Efficient Reconfigurable Dual-Core RISC-V V Cluster for Mixed Scalar-Vector Workloads

Author: Perotti, Matteo, Raeber, Michele, Sinigaglia, Mattia, Cavalcante, Matheus, Rossi, Davide, and Benini, Luca
Subjects: Computer Science - Hardware Architecture
Abstract: Multi-core vector processor architectures excel in handling computationally intensive vectorizable tasks but struggle to achieve optimal resource utilization when facing sequential and control tasks that cannot be vectorized. This work presents Spatzformer, the first reconfigurable RISC-V V (RVV) architecture developed from a baseline open-source dual-core cluster based on Snitch scalar cores augmented with compact Spatz vector units. Spatzformer operates in two distinct modes: split mode, working as a dual-core vector architecture to handle vectorizable tasks concurrently, and merge mode, where two vector units are driven by a single scalar core, allowing the remaining scalar core to handle non-vectorizable control tasks. We implement Spatzformer in a 12-nm technology node and characterize the cost of the added architectural reconfigurability. We show that merge mode accelerates mixed scalar-vector kernels by up to 1.8x compared to split mode. Moreover, it accelerates the vector kernels that require fine-grained synchronization (such as FFT) by up to 20% with respect to the baseline. The reconfigurability features do not degrade the architecture's maximum frequency (1.2GHz, TT, 0.8V, 25C) and have a negligible area impact (+1.4%), with a worst-case energy efficiency drop of only 7% with respect to the non-reconfigurable baseline., Comment: To be published in the 2024 IEEE 35th International Conference on Application Specific Systems (ASAP), Architectures and Processors
Published: 2024

28. Ultra-Lightweight Collaborative Mapping for Robot Swarms

Author: Niculescu, Vlad, Polonelli, Tommaso, Magno, Michele, and Benini, Luca
Subjects: Computer Science - Robotics
Abstract: A key requirement in robotics is the ability to simultaneously self-localize and map a previously unknown environment, relying primarily on onboard sensing and computation. Achieving fully onboard accurate simultaneous localization and mapping (SLAM) is feasible for high-end robotic platforms, whereas small and inexpensive robots face challenges due to constrained hardware, therefore frequently resorting to external infrastructure for sensing and computation. The challenge is further exacerbated in swarms of robots, where coordination, scalability, and latency are crucial concerns. This work introduces a decentralized and lightweight collaborative SLAM approach that enables mapping on virtually any robot, even those equipped with low-cost hardware and only 1.5 MB of memory, including miniaturized insect-size devices. Moreover, the proposed solution supports large swarm formations with the capability to coordinate hundreds of agents. To substantiate our claims, we have successfully implemented collaborative SLAM on centimeter-size drones weighing 46 g. Remarkably, we achieve a mapping accuracy below 30 cm, a result comparable to high-end state-of-the-art solutions while reducing the cost, memory, and computation requirements by two orders of magnitude. Our approach is innovative in three main aspects. First, it enables onboard infrastructure-less collaborative mapping with a lightweight and cost-effective (\$20) solution in terms of sensing and computation. Second, we optimize the data traffic within the swarm to support hundreds of cooperative agents using standard wireless protocols such as ultra-wideband (UWB), Bluetooth, or WiFi. Last, we implement a distributed swarm coordination policy to decrease mapping latency and enhance accuracy., Comment: 14 pages, 13 figures
Published: 2024

29. Tiny-PULP-Dronets: Squeezing Neural Networks for Faster and Lighter Inference on Multi-Tasking Autonomous Nano-Drones

Author: Lamberti, Lorenzo, Niculescu, Vlad, Barcis, Michał, Bellone, Lorenzo, Natalizio, Enrico, Benini, Luca, and Palossi, Daniele
Subjects: Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Pocket-sized autonomous nano-drones can revolutionize many robotic use cases, such as visual inspection in narrow, constrained spaces, and ensure safer human-robot interaction due to their tiny form factor and weight -- i.e., tens of grams. This compelling vision is challenged by the high level of intelligence needed aboard, which clashes against the limited computational and storage resources available on PULP (parallel-ultra-low-power) MCU class navigation and mission controllers that can be hosted aboard. This work moves from PULP-Dronet, a State-of-the-Art convolutional neural network for autonomous navigation on nano-drones. We introduce Tiny-PULP-Dronet: a novel methodology to squeeze by more than one order of magnitude model size (50x fewer parameters), and number of operations (27x less multiply-and-accumulate) required to run inference with similar flight performance as PULP-Dronet. This massive reduction paves the way towards affordable multi-tasking on nano-drones, a fundamental requirement for achieving high-level intelligence., Comment: 3 Figures, 1 table. Accepted for publication at IEEE Artificial Intelligence Circuits and Systems (AICAS), 2022
Published: 2024

30. GAP9Shield: A 150GOPS AI-capable Ultra-low Power Module for Vision and Ranging Applications on Nano-drones

Author: Müller, Hanna, Kartsch, Victor, and Benini, Luca
Subjects: Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Signal Processing
Abstract: The evolution of AI and digital signal processing technologies, combined with affordable energy-efficient processors, has propelled the development of both hardware and software for drone applications. Nano-drones, which fit into the palm of the hand, are suitable for indoor environments and safe for human interaction; however, they often fail to deliver the required performance for complex tasks due to the lack of hardware providing sufficient sensing and computing performance. Addressing this gap, we present the GAP9Shield, a nano-drone-compatible module powered by the GAP9, a 150GOPS-capable SoC. The system also includes a 5MP OV5647 camera for high-definition imaging, a WiFi-BLE NINA module, and a 5D VL53L1-based ranging subsystem, which enhances obstacle avoidance capabilities. In comparison with similarly targeted state-of-the-art systems, GAP9Shield provides a 20% higher sample rate (RGB images) while offering a 20% weight reduction. In this paper, we also highlight the energy efficiency and processing power capabilities of GAP9 for object detection (YOLO), localization, and mapping, which can run within a power envelope of below 100 mW and at low latency (as 17 ms for object detection), highlighting the transformative potential of GAP9 for the new generation of nano-drone applications., Comment: This work has been accepted for publication at the European Robotics Forum 2024
Published: 2024

31. BISeizuRe: BERT-Inspired Seizure Data Representation to Improve Epilepsy Monitoring

Author: Benfenati, Luca, Ingolfsson, Thorir Mar, Cossettini, Andrea, Pagliari, Daniele Jahier, Burrello, Alessio, and Benini, Luca
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: This study presents a novel approach for EEG-based seizure detection leveraging a BERT-based model. The model, BENDR, undergoes a two-phase training process. Initially, it is pre-trained on the extensive Temple University Hospital EEG Corpus (TUEG), a 1.5 TB dataset comprising over 10,000 subjects, to extract common EEG data patterns. Subsequently, the model is fine-tuned on the CHB-MIT Scalp EEG Database, consisting of 664 EEG recordings from 24 pediatric patients, of which 198 contain seizure events. Key contributions include optimizing fine-tuning on the CHB-MIT dataset, where the impact of model architecture, pre-processing, and post-processing techniques are thoroughly examined to enhance sensitivity and reduce false positives per hour (FP/h). We also explored custom training strategies to ascertain the most effective setup. The model undergoes a novel second pre-training phase before subject-specific fine-tuning, enhancing its generalization capabilities. The optimized model demonstrates substantial performance enhancements, achieving as low as 0.23 FP/h, 2.5$\times$ lower than the baseline model, with a lower but still acceptable sensitivity rate, showcasing the effectiveness of applying a BERT-based approach on EEG-based seizure detection., Comment: 4 pages, 2 tables, 2 figures
Published: 2024

32. Basilisk: An End-to-End Open-Source Linux-Capable RISC-V SoC in 130nm CMOS

Author: Scheffler, Paul, Sauter, Philippe, Benz, Thomas, Gürkaynak, Frank K., and Benini, Luca
Subjects: Computer Science - Hardware Architecture
Abstract: Open-source hardware (OSHW) is rapidly gaining traction in academia and industry. The availability of open RTL descriptions, EDA tools, and even PDKs enables a fully auditable supply chain for end-to-end (RTL to layout) open-source silicon, significantly strengthening security and transparency. Despite promising developments, existing OSHW efforts have so far fallen short of producing end-to-end open-source SoCs at the complexity and performance level needed to run a general-purpose OS. We present Basilisk, the first end-to-end open-source, Linux-capable RISC-V SoC taped out in IHP's open 130 nm technology. Basilisk features a 64-bit RISC-V core, a fully digital HyperRAM DRAM controller, and a rich set of IO peripherals including USB 1.1 and VGA. To tape out Basilisk, we create a reusable tool pipeline to convert its industry-grade SystemVerilog description to Verilog. We optimized logic synthesis in the open source Yosys synthesis tool, obtaining an increase in Basilisk's peak clock speed by 2.3x to 77 MHz and reducing its cell area by 1.6x to 1.1 MGE while also reducing synthesis runtime and RAM usage. We further optimized place and route in OpenROAD, enabling convergence to zero DRC violations while increasing core area utilization by 10% and reducing die area by 12%., Comment: 3 pages, 4 figures. Accepted at SSH-SoC 2024 workshop
Published: 2024

33. Occamy: A 432-Core 28.1 DP-GFLOP/s/W 83% FPU Utilization Dual-Chiplet, Dual-HBM2E RISC-V-based Accelerator for Stencil and Sparse Linear Algebra Computations with 8-to-64-bit Floating-Point Support in 12nm FinFET

Author: Paulin, Gianna, Scheffler, Paul, Benz, Thomas, Cavalcante, Matheus, Fischer, Tim, Eggimann, Manuel, Zhang, Yichao, Wistoff, Nils, Bertaccini, Luca, Colagrande, Luca, Ottavi, Gianmarco, Gürkaynak, Frank K., Rossi, Davide, and Benini, Luca
Subjects: Computer Science - Hardware Architecture
Abstract: We present Occamy, a 432-core RISC-V dual-chiplet 2.5D system for efficient sparse linear algebra and stencil computations on FP64 and narrow (32-, 16-, 8-bit) SIMD FP data. Occamy features 48 clusters of RISC-V cores with custom extensions, two 64-bit host cores, and a latency-tolerant multi-chiplet interconnect and memory system with 32 GiB of HBM2E. It achieves leading-edge utilization on stencils (83 %), sparse-dense (42 %), and sparse-sparse (49 %) matrix multiply., Comment: 2 pages, 7 figures. Accepted at the 2024 IEEE Symposium on VLSI Technology & Circuits
Published: 2024

34. Low Latency Visual Inertial Odometry with On-Sensor Accelerated Optical Flow for Resource-Constrained UAVs

Author: Kühne, Jonas, Magno, Michele, and Benini, Luca
Subjects: Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Visual Inertial Odometry (VIO) is the task of estimating the movement trajectory of an agent from an onboard camera stream fused with additional Inertial Measurement Unit (IMU) measurements. A crucial subtask within VIO is the tracking of features, which can be achieved through Optical Flow (OF). As the calculation of OF is a resource-demanding task in terms of computational load and memory footprint, which needs to be executed at low latency, especially in robotic applications, OF estimation is today performed on powerful CPUs or GPUs. This restricts its use in a broad spectrum of applications where the deployment of such powerful, power-hungry processors is unfeasible due to constraints related to cost, size, and power consumption. On-sensor hardware acceleration is a promising approach to enable low latency VIO even on resource-constrained devices such as nano drones. This paper assesses the speed-up in a VIO sensor system exploiting a compact OF sensor consisting of a global shutter camera and an Application Specific Integrated Circuit (ASIC). By replacing the feature tracking logic of the VINS-Mono pipeline with data from this OF camera, we demonstrate a 49.4% reduction in latency and a 53.7% reduction of compute load of the VIO pipeline over the original VINS-Mono implementation, allowing VINS-Mono operation up to 50 FPS instead of 20 FPS on the quad-core ARM Cortex-A72 processor of a Raspberry Pi Compute Module 4., Comment: This article has been accepted for publication in the IEEE Sensors Journal (JSEN)
Published: 2024
Full Text: View/download PDF

35. GAPses: Versatile smart glasses for comfortable and fully-dry acquisition and parallel ultra-low-power processing of EEG and EOG

Author: Frey, Sebastian, Lucchini, Mattia Alberto, Kartsch, Victor, Ingolfsson, Thorir Mar, Bernardi, Andrea Helga, Segessenmann, Michael, Osieleniec, Jakub, Benatti, Simone, Benini, Luca, and Cossettini, Andrea
Subjects: Electrical Engineering and Systems Science - Signal Processing, Electrical Engineering and Systems Science - Systems and Control
Abstract: Recent advancements in head-mounted wearable technology are revolutionizing the field of biopotential measurement, but the integration of these technologies into practical, user-friendly devices remains challenging due to issues with design intrusiveness, comfort, and data privacy. To address these challenges, this paper presents GAPSES, a novel smart glasses platform designed for unobtrusive, comfortable, and secure acquisition and processing of electroencephalography (EEG) and electrooculography (EOG) signals. We introduce a direct electrode-electronics interface with custom fully dry soft electrodes to enhance comfort for long wear. An integrated parallel ultra-low-power RISC-V processor (GAP9, Greenwaves Technologies) processes data at the edge, thereby eliminating the need for continuous data streaming through a wireless link, enhancing privacy, and increasing system reliability in adverse channel conditions. We demonstrate the broad applicability of the designed prototype through validation in a number of EEG-based interaction tasks, including alpha waves, steady-state visual evoked potential analysis, and motor movement classification. Furthermore, we demonstrate an EEG-based biometric subject recognition task, where we reach a sensitivity and specificity of 98.87% and 99.86% respectively, with only 8 EEG channels and an energy consumption per inference on the edge as low as 121 uJ. Moreover, in an EOG-based eye movement classification task, we reach an accuracy of 96.68% on 11 classes, resulting in an information transfer rate of 94.78 bit/min, which can be further increased to 161.43 bit/min by reducing the accuracy to 81.43%. The deployed implementation has an energy consumption of 24 uJ per inference and a total system power of only 16.28 mW, allowing for continuous operation of more than 12 h with a small 75 mAh battery., Comment: 10 pages, 5 figures, 5 tables. This paper has been submitted to IEEE Transactions on Biomedical Circuits and Systems
Published: 2024

36. HTVM: Efficient Neural Network Deployment On Heterogeneous TinyML Platforms

Author: Van Delm, Josse, Vandersteegen, Maarten, Burrello, Alessio, Sarda, Giuseppe Maria, Conti, Francesco, Pagliari, Daniele Jahier, Benini, Luca, and Verhelst, Marian
Subjects: Computer Science - Programming Languages, Computer Science - Distributed, Parallel, and Cluster Computing, D.3.4
Abstract: Optimal deployment of deep neural networks (DNNs) on state-of-the-art Systems-on-Chips (SoCs) is crucial for tiny machine learning (TinyML) at the edge. The complexity of these SoCs makes deployment non-trivial, as they typically contain multiple heterogeneous compute cores with limited, programmer-managed memory to optimize latency and energy efficiency. We propose HTVM - a compiler that merges TVM with DORY to maximize the utilization of heterogeneous accelerators and minimize data movements. HTVM allows deploying the MLPerf(TM) Tiny suite on DIANA, an SoC with a RISC-V CPU, and digital and analog compute-in-memory AI accelerators, at 120x improved performance over plain TVM deployment., Comment: Presented at DAC2023. Open-source code is available at https://github.com/KULeuven-MICAS/htvm
Published: 2024
Full Text: View/download PDF

37. A Gigabit, DMA-enhanced Open-Source Ethernet Controller for Mixed-Criticality Systems

Author: Liang, Chaoqun, Ottaviano, Alessandro, Benz, Thomas, Sinigaglia, Mattia, Benini, Luca, Garofalo, Angelo, and Rossi, Davide
Subjects: Computer Science - Hardware Architecture
Abstract: The ongoing revolution in application domains targeting autonomous navigation, first and foremost automotive "zonalization", has increased the importance of certain off-chip communication interfaces, particularly Ethernet. The latter will play an essential role in next-generation vehicle architectures as the backbone connecting simultaneously and instantaneously the zonal/domain controllers. There is thereby an incumbent need to introduce a performant Ethernet controller in the open-source HW community, to be used as a proxy for architectural explorations and prototyping of mixed-criticality systems (MCSs). Driven by this trend, in this work, we propose a fully open-source, DMA-enhanced, technology-agnostic Gigabit Ethernet architecture that overcomes the limitations of existing open-source architectures, such as Lowrisc's Ethernet, often tied to FPGA implementation, performance-bound by sub-optimal design choices such as large memory buffers, and in general not mature enough to bridge the gap between academia and industry. Besides the area advantage, the proposed design increases packet transmission speed up to almost 3x compared to Lowrisc's and is validated through implementation and FPGA prototyping into two open-source, heterogeneous MCSs., Comment: 4 pages,4 figures, 21st ACM International Conference on Computing Frontiers Workshops and Special Sessions
Published: 2024
Full Text: View/download PDF

38. Optimizing Foundation Model Inference on a Many-tiny-core Open-source RISC-V Platform

Author: Potocnik, Viviane, Colagrande, Luca, Fischer, Tim, Bertaccini, Luca, Pagliari, Daniele Jahier, Burrello, Alessio, and Benini, Luca
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Artificial Intelligence, Computer Science - Hardware Architecture, C.4, C.3, I.2
Abstract: Transformer-based foundation models have become crucial for various domains, most notably natural language processing (NLP) or computer vision (CV). These models are predominantly deployed on high-performance GPUs or hardwired accelerators with highly customized, proprietary instruction sets. Until now, limited attention has been given to RISC-V-based general-purpose platforms. In our work, we present the first end-to-end inference results of transformer models on an open-source many-tiny-core RISC-V platform implementing distributed Softmax primitives and leveraging ISA extensions for SIMD floating-point operand streaming and instruction repetition, as well as specialized DMA engines to minimize costly main memory accesses and to tolerate their latency. We focus on two foundational transformer topologies, encoder-only and decoder-only models. For encoder-only models, we demonstrate a speedup of up to 12.8x between the most optimized implementation and the baseline version. We reach over 79% FPU utilization and 294 GFLOPS/W, outperforming State-of-the-Art (SoA) accelerators by more than 2x utilizing the HW platform while achieving comparable throughput per computational unit. For decoder-only topologies, we achieve 16.1x speedup in the Non-Autoregressive (NAR) mode and up to 35.6x speedup in the Autoregressive (AR) mode compared to the baseline implementation. Compared to the best SoA dedicated accelerator, we achieve 2.04x higher FPU utilization., Comment: 14 pages, 10 figures, 4 tables, IEEE Transactions on Circuits and Systems for Artificial Intelligence
Published: 2024

39. xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems

Author: Rutishauser, Georg, Mihali, Joan, Scherer, Moritz, and Benini, Luca
Subjects: Computer Science - Hardware Architecture, Computer Science - Machine Learning
Abstract: Ternary neural networks (TNNs) offer a superior accuracy-energy trade-off compared to binary neural networks. However, until now, they have required specialized accelerators to realize their efficiency potential, which has hindered widespread adoption. To address this, we present xTern, a lightweight extension of the RISC-V instruction set architecture (ISA) targeted at accelerating TNN inference on general-purpose cores. To complement the ISA extension, we developed a set of optimized kernels leveraging xTern, achieving 67% higher throughput than their 2-bit equivalents. Power consumption is only marginally increased by 5.2%, resulting in an energy efficiency improvement by 57.1%. We demonstrate that the proposed xTern extension, integrated into an octa-core compute cluster, incurs a minimal silicon area overhead of 0.9% with no impact on timing. In end-to-end benchmarks, we demonstrate that xTern enables the deployment of TNNs achieving up to 1.6 percentage points higher CIFAR-10 classification accuracy than 2-bit networks at equal inference latency. Our results show that xTern enables RISC-V-based ultra-low-power edge AI platforms to benefit from the efficiency potential of TNNs., Comment: Accepted for publication at IEEE ASAP 2024
Published: 2024

40. Modeling and Controlling Many-Core HPC Processors: an Alternative to PID and Moving Average Algorithms

Author: Bambini, Giovanni, Ottaviano, Alessandro, Conficoni, Christian, Tilli, Andrea, Benini, Luca, and Bartolini, Andrea
Subjects: Electrical Engineering and Systems Science - Systems and Control, Computer Science - Performance
Abstract: The race towards performance increase and computing power has led to chips with heterogeneous and complex designs, integrating an ever-growing number of cores on the same monolithic chip or chiplet silicon die. Higher integration density, compounded with the slowdown of technology-driven power reduction, implies that power and thermal management become increasingly relevant. Unfortunately, existing research lacks a detailed analysis and modeling of thermal, power, and electrical coupling effects and how they have to be jointly considered to perform dynamic control of complex and heterogeneous Multi-Processor System on Chips (MPSoCs). To close the gap, in this work, we first provide a detailed thermal and power model targeting a modern High Performance Computing (HPC) MPSoC. We consider real-world coupling effects such as actuators' non-idealities and the exponential relation between the dissipated power, the temperature state, and the voltage level in a single processing element. We analyze how these factors affect the control algorithm behavior and the type of challenges that they pose. Based on the analysis, we propose a thermal capping strategy inspired by Fuzzy control theory to replace the state-of-the-art PID controller, as well as a root-finding iterative method to optimally choose the shared voltage value among cores grouped in the same voltage domain. We evaluate the proposed controller with model-in-the-loop and hardware-in-the-loop co-simulations. We show an improvement over state-of-the-art methods of up to 5x the maximum exceeded temperature while providing an average of 3.56% faster application execution runtime across all the evaluation scenarios., Comment: Paper in Review
Published: 2024

41. A Passive and Asynchronous Wake-up Receiver for Acoustic Underwater Communication

Author: Schulthess, Lukas, Mayer, Philipp, Benini, Luca, and Magno, Michele
Subjects: Electrical Engineering and Systems Science - Systems and Control
Abstract: Establishing reliable data exchange in an underwater domain using energy and power-efficient communication methods is crucial and challenging. Radio frequencies are absorbed by the salty and mineral-rich water and optical signals are obstructed and scattered after short distances. In contrast, acoustic communication benefits from low absorption and enables communication over long distances. Underwater communication must match low power and energy requirements as underwater sensor systems must have a long battery lifetime and need to work reliably due to their deployment and maintenance cost. For long-term deployments, the sensors' overall power consumption is determined by the power consumption during idle state. It can be reduced by integrating asynchronous always-on wake-up circuits with nano-watt power consumption. However, this approach does reduce but not eliminate idle power consumption, leaving a margin for improvement. This paper presents a passive and asynchronous wake-up receiver for acoustic underwater communication enabling zero-power always-on listening. Zero-power listening is achieved by combining energy and information transmission using a low-power wake-up receiver that extracts energy out of the acoustic signal and eliminates radio frontend idle consumption. In-field evaluations demonstrate that the wake-up circuit requires only 63 uW to detect and compare an 8-bit UUID at a data rate of 200 bps up to a distance of 5 m and that the needed energy can directly be extracted from the acoustic signal.
Published: 2024

42. SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models

Author: Huang, Wei, Qin, Haotong, Liu, Yangdong, Li, Yawei, Liu, Xianglong, Benini, Luca, Magno, Michele, and Qi, Xiaojuan
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Large language models (LLMs) achieve remarkable performance in natural language understanding but require substantial computation and memory resources. Post-training quantization (PTQ) is a powerful compression technique extensively investigated in LLMs. However, existing PTQ methods are still not ideal in terms of accuracy and efficiency, especially with below 4 bit-widths. Standard PTQ methods using group-wise quantization suffer difficulties in quantizing LLMs accurately to such low-bit, but advanced methods remaining high-precision weights element-wisely are hard to realize their theoretical hardware efficiency. This paper presents a Salience-Driven Mixed-Precision Quantization scheme for LLMs, namely SliM-LLM. The scheme exploits the salience distribution of weights to determine optimal bit-width and quantizers for accurate LLM quantization, while aligning bit-width partition to groups for compact memory usage and fast integer inference. Specifically, the proposed SliM-LLM mainly relies on two novel techniques: (1) Salience-Determined Bit Allocation utilizes the clustering characteristics of salience distribution to allocate the bit-widths of each group, increasing the accuracy of quantized LLMs and maintaining the inference efficiency; (2) Salience-Weighted Quantizer Calibration optimizes the parameters of the quantizer by considering the element-wise salience within the group, balancing the maintenance of salient information and minimization of errors. Comprehensive experiments show that SliM-LLM significantly improves the accuracy of LLMs at ultra-low bits, e.g., 2-bit LLaMA-7B achieves a 5.5-times memory-saving than original model on NVIDIA A800 GPUs, and 48% decrease of perplexity compared to the state-of-the-art gradient-free PTQ method. Moreover, SliM-LLM+, which is integrated from the extension of SliM-LLM with gradient-based quantizers, further reduces perplexity by 35.1%., Comment: 22 pages
Published: 2024

43. Shrinking targets and recurrent behaviour for forward compositions of inner functions

Author: Benini, Anna Miriam, Evdoridou, Vasiliki, Fagella, Núria, Rippon, Philip J., and Stallard, Gwyneth M.
Subjects: Mathematics - Dynamical Systems, 37D05, 37A25, 30D05, 37F10, 28D05, 37F99
Abstract: We prove sharp results about recurrent behaviour of orbits of forward compositions of inner functions, inspired by fundamental results about iterates of inner functions, and give examples to illustrate behaviours that cannot occur in the simpler case of iteration. A result of Fern\'andez, Meli\'an and Pestana gives a precise version of the classical Poincar\'e recurrence theorem for iterates of the boundary extension of an inner function that fixes~0. We generalise this to forward composition sequences $F_n=f_n\circ \dots\circ f_1,$ $n\in \mathbb{N},$ where $f_n$ are inner functions that fix~0, giving conditions on the contraction of $(F_n)$ so that the radial boundary extension $F_n$ hits any shrinking target of arcs $(I_n)$ of a given size. Next, Aaronson, and also Doering and Ma\~n\'e, gave a remarkable dichotomy for iterates of any inner function, showing that the behaviour of the boundary extension is of two entirely different types, depending on the size of the sequence $(|f^n(0)|)$. In earlier work, we showed that one part of this dichotomy holds in the non-autonomous setting of forward compositions. It turns out that this dichotomy is closely related to the result of Fern\'andez, Meli\'an and Pestana, and here we show that a version of the second part of the dichotomy holds in the non-autonomous setting provided we impose a condition on the contraction of $(F_n)$ in relation to the size of the sequence $(|F_n(0)|)$. The techniques we use include a strong version of the second Borel--Cantelli lemma and strong mixing results of Pommerenke for contracting sequences of inner functions. We give examples to show that the contraction conditions that we need to impose in the non-autonomous setting are best possible., Comment: 21 pages
Published: 2024

44. SentryCore: A RISC-V Co-Processor System for Safe, Real-Time Control Applications

Author: Rogenmoser, Michael, Ottaviano, Alessandro, Benz, Thomas, Balas, Robert, Perotti, Matteo, Garofalo, Angelo, and Benini, Luca
Subjects: Computer Science - Hardware Architecture
Abstract: In the last decade, we have witnessed exponential growth in the complexity of control systems for safety-critical applications (automotive, robots, industrial automation) and their transition to heterogeneous mixed-criticality systems (MCSs). The growth of the RISC-V ecosystem is creating a major opportunity to develop open-source, vendor-neutral reference platforms for safety-critical computing. We present SentryCore, a reliable, real-time, self-contained, open-source mega-IP for advanced control functions that can be seamlessly integrated into Systems-on-Chip, e.g., for automotive applications, through industry-standard Advanced eXtensible Interface 4 (AXI4). SentryCore features three embedded RISC-V processor cores in lockstep with error-correcting code (ECC) protected data memory for reliable execution of any safety-critical application. Context switching is accelerated to under 110 clock cycles via a RISC-V core-local interrupt controller (CLIC) and dedicated hardware extensions, while a timer-based direct memory access (DMA) engine streamlines sensor data readout during periodic control loops. SentryCore was implemented in Intel's 16nm process node and tested with FreeRTOS, ThreadX, and RTIC software support., Comment: 2 pages, accepted at the RISC-V Summit Europe 2024
Published: 2024

45. Compressed Latent Replays for Lightweight Continual Learning on Spiking Neural Networks

Author: Dequino, Alberto, Carpegna, Alessio, Nadalini, Davide, Savino, Alessandro, Benini, Luca, Di Carlo, Stefano, and Conti, Francesco
Subjects: Computer Science - Neural and Evolutionary Computing, Computer Science - Artificial Intelligence, Computer Science - Emerging Technologies, Computer Science - Machine Learning
Abstract: Rehearsal-based Continual Learning (CL) has been intensely investigated in Deep Neural Networks (DNNs). However, its application in Spiking Neural Networks (SNNs) has not been explored in depth. In this paper we introduce the first memory-efficient implementation of Latent Replay (LR)-based CL for SNNs, designed to seamlessly integrate with resource-constrained devices. LRs combine new samples with latent representations of previously learned data, to mitigate forgetting. Experiments on the Heidelberg SHD dataset with Sample and Class-Incremental tasks reach a Top-1 accuracy of 92.5% and 92%, respectively, without forgetting the previously learned information. Furthermore, we minimize the LRs' requirements by applying a time-domain compression, reducing by two orders of magnitude their memory requirement, with respect to a naive rehearsal setup, with a maximum accuracy drop of 4%. On a Multi-Class-Incremental task, our SNN learns 10 new classes from an initial set of 10, reaching a Top-1 accuracy of 78.4% on the full test set.
Published: 2024

46. TeraPool-SDR: An 1.89TOPS 1024 RV-Cores 4MiB Shared-L1 Cluster for Next-Generation Open-Source Software-Defined Radios

Author: Zhang, Yichao, Bertuletti, Marco, Riedel, Samuel, Cavalcante, Matheus, Vanelli-Coralli, Alessandro, and Benini, Luca
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Hardware Architecture
Abstract: Radio Access Networks (RAN) workloads are rapidly scaling up in data processing intensity and throughput as the 5G (and beyond) standards grow in number of antennas and sub-carriers. Offering flexible Processing Elements (PEs), efficient memory access, and a productive parallel programming model, many-core clusters are a well-matched architecture for next-generation software-defined RANs, but staggering performance requirements demand a high number of PEs coupled with extreme Power, Performance and Area (PPA) efficiency. We present the architecture, design, and full physical implementation of Terapool-SDR, a cluster for Software Defined Radio (SDR) with 1024 latency-tolerant, compact RV32 PEs, sharing a global view of a 4MiB, 4096-banked, L1 memory. We report various feasible configurations of TeraPool-SDR featuring an ultra-high bandwidth PE-to-L1-memory interconnect, clocked at 730MHz, 880MHz, and 924MHz (TT/0.80 V/25 {\deg}C) in 12nm FinFET technology. The TeraPool-SDR cluster achieves high energy efficiency on all SDR key kernels for 5G RANs: Fast Fourier Transform (93GOPS/W), Matrix-Multiplication (125GOPS/W), Channel Estimation (96GOPS/W), and Linear System Inversion (61GOPS/W). For all the kernels, it consumes less than 10W, in compliance with industry standards., Comment: 6 pages, 6 figures and 3 tables
Published: 2024
Full Text: View/download PDF

47. Insights from Basilisk: Are Open-Source EDA Tools Ready for a Multi-Million-Gate, Linux-Booting RV64 SoC Design?

Author: Sauter, Philippe, Benz, Thomas, Scheffler, Paul, Gürkaynak, Frank K., and Benini, Luca
Subjects: Computer Science - Hardware Architecture
Abstract: Designing complex, multi-million-gate application-specific integrated circuits requires robust and mature electronic design automation (EDA) tools. We describe our efforts in enhancing the open-source Yosys+Openroad EDA flow to implement Basilisk, a fully open-source, Linux-booting RV64GC system-on-chip (SoC) design. We analyze the quality-of-results impact of our enhancements to synthesis tools, interfaces between EDA tools, logic optimization scripts, and a newly open-sourced library of optimized arithmetic macro-operators. We also introduce a streamlined physical design flow with an improved power grid and cell placement integration. Our Basilisk SoC design was taped out in IHP's open 130 nm technology. It achieves an operating frequency of 77 MHz (51 logic levels) under typical conditions, a 2.3x improvement compared to the baseline open-source EDA flow, while also reducing logic area by 1.6x. Furthermore, tool runtime was reduced by 2.5x, and peak RAM usage decreased by 2.9x. Through collaboration with EDA tool developers and domain experts, Basilisk establishes solid "proof of existence" for a fully open-source EDA flow used in designing a competitive multi-million-gate digital SoC., Comment: 8 pages, 6 figures, submitted at IWLS 2024
Published: 2024

48. Basilisk: Achieving Competitive Performance with Open EDA Tools on an Open-Source Linux-Capable RISC-V SoC

Author: Sauter, Phillippe, Benz, Thomas, Scheffler, Paul, Jiang, Zerun, Muheim, Beat, Gürkaynak, Frank K., and Benini, Luca
Subjects: Computer Science - Hardware Architecture
Abstract: We introduce Basilisk, an optimized application-specific integrated circuit (ASIC) implementation and design flow building on the end-to-end open-source Iguana system-on-chip (SoC). We present enhancements to synthesis tools and logic optimization scripts improving quality of results (QoR), as well as an optimized physical design with an improved power grid and cell placement integration enabling a higher core utilization. The tapeout-ready version of Basilisk implemented in IHP's open 130 nm technology achieves an operation frequency of 77 MHz (51 logic levels) under typical conditions, a 2.3x improvement compared to the baseline open-source EDA design flow presented in Iguana, and a higher 55 % core utilization compared to 50 % in the baseline design. Through collaboration with EDA tool developers and domain experts, Basilisk exemplifies a synergistic effort towards competitive open-source electronic design automation (EDA) tools for research and industry applications., Comment: 2 pages, 1 figure, accepted as a poster at the RISC-V Summit Europe 2024
Published: 2024

49. A Spiking Neural Network Decoder for Implantable Brain Machine Interfaces and its Sparsity-aware Deployment on RISC-V Microcontrollers

Author: Liao, Jiawei, Toomey, Oscar, Wang, Xiaying, Widmer, Lars, Chestek, Cynthia A., Benini, Luca, and Jang, Taekwang
Subjects: Electrical Engineering and Systems Science - Signal Processing
Abstract: Implantable Brain-machine interfaces (BMIs) are promising for motor rehabilitation and mobility augmentation, and they demand accurate and energy-efficient algorithms. In this paper, we propose a novel spiking neural network (SNN) decoder for regression tasks for implantable BMIs. The SNN is trained with enhanced spatio-temporal backpropagation to fully leverage its capability to handle temporal problems. The proposed SNN decoder outperforms the state-of-the-art Kalman filter and artificial neural network (ANN) decoders in offline finger velocity decoding tasks. The decoder is deployed on a RISC-V-based hardware platform and optimized to exploit sparsity. The proposed implementation has an average power consumption of 0.50 mW in a duty-cycled mode. When conducting continuous inference without duty-cycling, it achieves an energy efficiency of 1.88 uJ per inference, which is 5.5X less than the baseline ANN. Additionally, the average decoding latency is 0.12 ms for each inference, which is 5.7X faster than the ANN implementation.
Published: 2024

50. Haag-Kastler stacks

Author: Benini, Marco, Grant-Stuart, Alastair, and Schenkel, Alexander
Subjects: Mathematical Physics, High Energy Physics - Theory, Mathematics - Category Theory, Mathematics - Quantum Algebra, 81Txx, 53C50, 18F20, 18N10
Abstract: This paper provides an alternative implementation of the principle of general local covariance for algebraic quantum field theories (AQFTs) which is more flexible and powerful than the original one by Brunetti, Fredenhagen and Verch. This is realized by considering the $2$-functor $\mathsf{HK} : \mathbf{Loc}^\mathrm{op} \to \mathbf{CAT}$ which assigns to each Lorentzian manifold $M$ the category $\mathsf{HK}(M)$ of Haag-Kastler-style AQFTs over $M$ and to each embedding $f:M\to N$ a pullback functor $f^\ast = \mathsf{HK}(f) : \mathsf{HK}(N) \to \mathsf{HK}(M)$ restricting theories from $N$ to $M$. Locally covariant AQFTs are recovered as the points of the $2$-functor $\mathsf{HK}$. The main advantages of this new perspective are: 1.) It leads to technical simplifications, in particular with regard to the time-slice axiom, since global problems on $\mathbf{Loc}$ become families of simpler local problems on individual Lorentzian manifolds. 2.) Some aspects of the Haag-Kastler framework which previously got lost in locally covariant AQFT, such as a relative compactness condition on the open subsets in a Lorentzian manifold $M$, are reintroduced. 3.) It provides a successful and radically new perspective on descent conditions in AQFT, i.e. local-to-global conditions which allow one to recover a global AQFT on a Lorentzian manifold $M$ from its local data in an open cover $\{U_i \subseteq M\}$., Comment: 65 pages
Published: 2024

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Region

Database

Publisher

1,099 results on '"Benini P"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources