183 results on '"HW/SW co-design"'
Search Results
2. Speed vs. efficiency: A framework for high-frequency trading algorithms on FPGA using Zynq SoC platform
- Author
-
Abbas Ali, Abdullah Shah, Azaz Hassan Khan, Malik Umar Sharif, Zaka Ullah Zahid, Rabia Shahid, Tariqullah Jan, and Mohammad Haseeb Zafar
- Subjects
System-on-Chip, High-frequency trading, Xilinx Zynq-7000, HW/SW co-design, Technical indicators, Cryptocurrencies, Engineering (General). Civil engineering (General), TA1-2040
- Abstract
Software-based technical indicators have been widely used for the stock market forecasting, aiming to predict market direction. Even though many algorithms for the software based technical indicators are presented, there are almost no hardware implementations reported in the literature. In this paper, the hardware implementation is presented for three commonly used technical indicators: Moving Average Convergence/Divergence (MACD), Relative Strength Index (RSI), and Aroon. Latency evaluation is conducted for Bitcoin and Ethereum within a single-day timeframe, utilizing the Xilinx Zynq-7000 programmable SoC XC7Z020-CLG484-1 platform. Additionally, various hardware/software (HW/SW) partitioning strategies are explored to leverage the flexibility of software alongside the performance advantages of hardware via the Zynq SoC platform. The results show that the best performing technical indicator is MACD with a speedup of 30 times over its software only counterpart. Furthermore, a hybrid design integrating multiple technical indicators is proposed, pairing MACD with RSI due to their competitive throughput values, differing by only 0.38 microseconds. This hybrid approach capitalizes on the parallel processing capabilities of hardware, enabling multiple systems to operate simultaneously.
- Published
- 2024
- Full Text
- View/download PDF
3. Speed vs. efficiency: A framework for high-frequency trading algorithms on FPGA using Zynq SoC platform.
- Author
-
Ali, Abbas, Shah, Abdullah, Khan, Azaz Hassan, Sharif, Malik Umar, Zahid, Zaka Ullah, Shahid, Rabia, Jan, Tariqullah, and Zafar, Mohammad Haseeb
- Subjects
PROCESS capability, PARALLEL processing, ALGORITHMS, MARKETING forecasting, MOVING average process, SYSTEMS on a chip
- Abstract
Software-based technical indicators have been widely used for the stock market forecasting, aiming to predict market direction. Even though many algorithms for the software based technical indicators are presented, there are almost no hardware implementations reported in the literature. In this paper, the hardware implementation is presented for three commonly used technical indicators: Moving Average Convergence/Divergence (MACD), Relative Strength Index (RSI), and Aroon. Latency evaluation is conducted for Bitcoin and Ethereum within a single-day timeframe, utilizing the Xilinx Zynq-7000 programmable SoC XC7Z020-CLG484-1 platform. Additionally, various hardware/software (HW/SW) partitioning strategies are explored to leverage the flexibility of software alongside the performance advantages of hardware via the Zynq SoC platform. The results show that the best performing technical indicator is MACD with a speedup of 30 times over its software only counterpart. Furthermore, a hybrid design integrating multiple technical indicators is proposed, pairing MACD with RSI due to their competitive throughput values, differing by only 0.38 microseconds. This hybrid approach capitalizes on the parallel processing capabilities of hardware, enabling multiple systems to operate simultaneously. • Proposed SoC framework boosts HFT system efficiency. • HW/SW partitioning on Zynq-7000 SoC enhances flexibility. • MACD, RSI, Aroon indicators implemented for forecasting. • MACD outperforms others on ZYNQ SoC, 30x faster than software. • MACD and RSI combination optimal for hybrid designs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. Einleitung
- Author
-
Hassan, Muhammad, Große, Daniel, and Drechsler, Rolf
- Published
- 2024
- Full Text
- View/download PDF
5. AMS Metamorphic Testing Umgebung
- Author
-
Hassan, Muhammad, Große, Daniel, and Drechsler, Rolf
- Published
- 2024
- Full Text
- View/download PDF
6. AMS verbesserte Code-Coverage-Verifizierungsumgebung
- Author
-
Hassan, Muhammad, Große, Daniel, and Drechsler, Rolf
- Published
- 2024
- Full Text
- View/download PDF
7. Vorarbeiten
- Author
-
Hassan, Muhammad, Große, Daniel, and Drechsler, Rolf
- Published
- 2024
- Full Text
- View/download PDF
8. Hardware and Environment Modeling
- Author
-
Pieper, Pascal, and Drechsler, Rolf
- Published
- 2024
- Full Text
- View/download PDF
9. Introduction
- Author
-
Pieper, Pascal, and Drechsler, Rolf
- Published
- 2024
- Full Text
- View/download PDF
10. Novel hardware/software co-design approach for Connect6 game-solver
- Author
-
Avijeet Kumar Trivedi, Shubham Garg, and Neeta Pandey
- Subjects
Connect6, HW/SW Co-design, High-speed, NegaScout, Node ordering, Memoization, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
This paper introduces an enhanced Hardware-Software Co-Design implementation of a Connect6 game solver, compared to a prevalent implementation, by using novel data communications and enhanced tree-searching algorithms. Hardware-software communication is improved by transferring the current board position from the Processing System to the Programmable Logic through a 21-bit change board instead of the complete 722-bit board. Tree search, to find the next best move, is enhanced by using the NegaScout algorithm, Iterative Deepening, Node Ordering and Memoization. The proposed design demonstrates a 50% improvement in the overall performance and 92% in communication overhead compared to the prevalent implementation. Further, it is around 4 times faster than its software implementation on the same platform. The analysis was carried out on a Xilinx ZedBoard SoC using the GNU GCC Compiler with the Maximum Optimisation Flag (O3) activated.
- Published
- 2024
- Full Text
- View/download PDF
11. Comparative Study of Keccak SHA-3 Implementations.
- Author
-
Dolmeta, Alessandra, Martina, Maurizio, and Masera, Guido
- Subjects
*COMPARATIVE studies, *RESEARCH personnel, *CRYPTOGRAPHY, *SCALABILITY, *DECISION making
- Abstract
This paper conducts an extensive comparative study of state-of-the-art solutions for implementing the SHA-3 hash function. SHA-3, a pivotal component in modern cryptography, has spawned numerous implementations across diverse platforms and technologies. This research aims to provide valuable insights into selecting and optimizing Keccak SHA-3 implementations. Our study encompasses an in-depth analysis of hardware, software, and software–hardware (hybrid) solutions. We assess the strengths, weaknesses, and performance metrics of each approach. Critical factors, including computational efficiency, scalability, and flexibility, are evaluated across different use cases. We investigate how each implementation performs in terms of speed and resource utilization. This research aims to improve the knowledge of cryptographic systems, aiding in the informed design and deployment of efficient cryptographic solutions. By providing a comprehensive overview of SHA-3 implementations, this study offers a clear understanding of the available options and equips professionals and researchers with the necessary insights to make informed decisions in their cryptographic endeavors. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
12. Hardware Software Co-design Approch for ECG Signal Analysis
- Author
-
Bendahane, Bouchra, Jenkal, Wissam, Laaboubi, Mostafa, Latif, Rachid, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Aboutabit, Noureddine, editor, Lazaar, Mohamed, editor, and Hafidi, Imad, editor
- Published
- 2023
- Full Text
- View/download PDF
13. Introduction
- Author
-
Hassan, Muhammad, Große, Daniel, and Drechsler, Rolf
- Published
- 2023
- Full Text
- View/download PDF
14. AMS Metamorphic Testing Environment
- Author
-
Hassan, Muhammad, Große, Daniel, and Drechsler, Rolf
- Published
- 2023
- Full Text
- View/download PDF
15. Preliminaries
- Author
-
Hassan, Muhammad, Große, Daniel, and Drechsler, Rolf
- Published
- 2023
- Full Text
- View/download PDF
16. AMS Enhanced Code Coverage Verification Environment
- Author
-
Hassan, Muhammad, Große, Daniel, and Drechsler, Rolf
- Published
- 2023
- Full Text
- View/download PDF
17. An FPGA-Based Hardware Accelerator for the k-Nearest Neighbor Algorithm Implementation in Wearable Embedded Systems
- Author
-
Borelli, Antonio, Spagnolo, Fanny, Gravina, Raffaele, Frustaci, Fabio, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Mahmud, Mufti, editor, Ieracitano, Cosimo, editor, Kaiser, M. Shamim, editor, Mammone, Nadia, editor, and Morabito, Francesco Carlo, editor
- Published
- 2022
- Full Text
- View/download PDF
18. Designing Low-Power and High-Speed FPGA-Based Binary Decision Tree Hardware Accelerators
- Author
-
Huzyuk, Roman, Spagnolo, Fanny, Frustaci, Fabio, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Mahmud, Mufti, editor, Ieracitano, Cosimo, editor, Kaiser, M. Shamim, editor, Mammone, Nadia, editor, and Morabito, Francesco Carlo, editor
- Published
- 2022
- Full Text
- View/download PDF
19. Einführung
- Author
-
Herdt, Vladimir, Große, Daniel, and Drechsler, Rolf
- Published
- 2022
- Full Text
- View/download PDF
20. Schlussfolgerung
- Author
-
Herdt, Vladimir, Große, Daniel, and Drechsler, Rolf
- Published
- 2022
- Full Text
- View/download PDF
21. Introduction
- Author
-
Herdt, Vladimir, Große, Daniel, and Drechsler, Rolf
- Published
- 2021
- Full Text
- View/download PDF
22. Hardware/Software Co-Design of a Circle Detection System Based on Evolutionary Computing.
- Author
-
Rojas-Muñoz, Luis Felipe, Rostro-González, Horacio, García-Capulín, Carlos Hugo, and Sánchez-Solano, Santiago
- Subjects
HOUGH transforms, LINUX operating systems, PROGRAMMING languages, EVOLUTIONARY computation, PARTICIPATORY design, GENETIC algorithms
- Abstract
In recent years, the strategy of co-designing Hardware/Software (HW/SW) systems has been widely adopted to exploit the synergy between both approaches thanks to technological advances that have led to more powerful devices providing an increasingly better cost–benefit trade-off. This paper presents an HW/SW system for the detection of multiple circles in digital images based on a genetic algorithm. It is implemented on an Ultra96-v2 development board, which contains a Xilinx Zynq UltraScale+ MPSoC device and supports a Linux operating system that facilitates application development. The design is powered by developing an interactive computing environment by means of the Jupyter Notebook platform, in which different programming languages coexist. The specific advantages of each of these languages have been used to describe the hardware component that accelerates the evolutionary computation for circle detection (VHDL), to execute SW-HW interaction functions, as well as the pre- and post-processing of the images (ANSI-C) and to code, evaluate, and document the system execution process (Python). As a result, a computationally efficient application was obtained, with high accuracy in the detection of circles in synthetic and real images, and with a high degree of reconfigurability that provides the user with the necessary tools to incorporate it in a specific area of interest. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
23. Conclusion
- Author
-
Herdt, Vladimir, Große, Daniel, and Drechsler, Rolf
- Published
- 2021
- Full Text
- View/download PDF
24. Algorithm/Accelerator Co-Design and Co-Search for Edge AI.
- Author
-
Zhang, Xiaofan, Li, Yuhong, Pan, Junhao, and Chen, Deming
- Abstract
The world has seen the great success of deep neural networks (DNNs) in a massive number of artificial intelligence (AI) applications. However, developing high-quality AI services to satisfy diverse real-life edge scenarios still encounters many difficulties. As DNNs become more compute- and memory-intensive, it is challenging for edge devices to accommodate them with limited computation/memory resources, tight power budgets, and small form-factors. Challenges also come from the demanding requirements of edge AI, which calls for real-time responses, high-throughput performance, and reliable inference accuracy. To address these challenges, we propose a series of efficient design methods to perform algorithm/accelerator co-design and co-search for optimized edge AI solutions. We demonstrate our proposed methods on popular edge AI applications (object detection and image classification) and achieve significant improvements over prior designs. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
25. Exploring HW/SW Co-Design for Video Analysis on CPU-FPGA Heterogeneous Systems.
- Author
-
Zhang, Xiaofan, Ma, Yuan, Xiong, Jinjun, Hwu, Wen-Mei W., Kindratenko, Volodymyr, and Chen, Deming
- Subjects
*PYTHON programming language, *PARTICIPATORY design, *STREAMING video & television, *FIELD programmable gate arrays
- Abstract
Deep neural network (DNN)-based video analysis has become one of the most essential and challenging tasks to capture implicit information from video streams. Although DNNs significantly improve the analysis quality, they introduce intensive compute and memory demands and require dedicated hardware for efficient processing. The customized heterogeneous system is one of the promising solutions with general-purpose processors (CPUs) and specialized processors (DNN Accelerators). Among various heterogeneous systems, the combination of CPU and FPGA has been intensively studied for DNN inference with improved latency and energy consumption compared to CPU + GPU schemes and with increased flexibility and reduced time-to-market cost compared to CPU + ASIC designs. However, deploying DNN-based video analysis on CPU + FPGA systems still presents challenges from the tedious RTL programming, the intricate design verification, and the time-consuming design space exploration. To address these challenges, we present a novel framework, called EcoSys, to explore co-design and optimization opportunities on CPU-FPGA heterogeneous systems for accelerating video analysis. Novel technologies include 1) a coherent memory space shared by the host and the customized accelerator to enable efficient task partitioning and online DNN model refinement with reduced data transfer latency; 2) an end-to-end design flow that supports high-level design abstraction and allows rapid development of customized hardware accelerators from Python-based DNN descriptions; 3) a design space exploration (DSE) engine that determines the design space and explores the optimized solutions by considering the targeted heterogeneous system and user-specific constraints; and 4) a complete set of co-optimization solutions, including a layer-based pipeline, a feature map partition scheme, and an efficient memory hierarchical design for the accelerator and multithreading programming for the CPU. 
In this article, we demonstrate our design framework to accelerate the long-term recurrent convolution network (LRCN), which analyzes the input video and output one semantic caption for each frame. EcoSys can deliver 314.7 and 58.1 frames/s by targeting the LRCN model with AlexNet and VGG-16 backbone, respectively. Compared to the multithreaded CPU and pure FPGA design, EcoSys achieves $20.6\times $ and $5.3\times $ higher throughput performance. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
26. HW/SW Co-Design for Security Systems and the Investigation of Deep Learning-based Side-channel Analysis
- Author
-
Li, H. (author)
- Abstract
Electronic devices have permeated into all aspects of our lives, from basic smart cards to sophisticated hybrid automobile systems. These devices comprise a range of products like sensors, wearable gadgets, mobile phones, personal computers, and others, playing vital roles in many applications and enabling the Internet of Things (IoT). However, with this interconnectedness comes the associated security risks since attackers can exploit vulnerabilities in the system. Securing electronic devices requires the use of cryptographic algorithms and trusted execution environments (TEEs). Cryptographic algorithms ensure data confidentiality and integrity through encryption/decryption, hashing, and digital signatures. TEEs provide secure enclaves within the system for critical operations that prevent unauthorized modifications and access by imposing stringent access restrictions. These two measures have become robust mechanisms for enhancing the security of critical operations and data access control. Despite the above security measures, electronic systems are susceptible to various attacks, including side-channel analysis (SCA), in which attackers exploit information leakage from physical devices while executing instructions or cryptographic algorithms. Power consumption and electromagnetic radiation (EM) are common indicators of this leakage. Countermeasures such as masking and hiding techniques are commonly employed to enhance resistance against SCA. However, the advent of deep learning in SCA has brought forth new challenges, rendering previously efficient countermeasures ineffective. Moreover, deep learning-based SCA has the potential to eliminate preprocessing and alignment requirements inherent in earlier methods. Therefore, this thesis focuses on two main objectives. The first objective is the implementation of cryptographic algorithms and the incorporation of TEEs for secure-sensitive applications. HW/SW co-design approach will be utili, Cyber Security
- Published
- 2024
27. R-Blocks: an Energy-Efficient, Flexible, and Programmable CGRA
- Author
-
de Bruin, E. (Barry), Vadivel, Kanishkan, Wijtvliet, Mark, Jääskeläinen, Pekka, and Corporaal, Henk
- Abstract
Emerging data-driven applications in the embedded, e-Health, and internet of things (IoT) domain require complex on-device signal analysis and data reduction to maximize energy efficiency on these energy-constrained devices. Coarse-grained reconfigurable architectures (CGRAs) have been proposed as a good compromise between flexibility and energy efficiency for ultra-low power (ULP) signal processing. Existing CGRAs are often specialized and domain-specific or can only accelerate simple kernels, which makes accelerating complete applications on a CGRA while maintaining high energy efficiency an open issue. Moreover, the lack of instruction set architecture (ISA) standardization across CGRAs makes code generation using current compiler technology a major challenge. This work introduces R-Blocks; a ULP CGRA with HW/SW co-design tool-flow based on the OpenASIP toolset. This CGRA is extremely flexible due to its well-established VLIW-SIMD execution model and support for flexible SIMD-processing, while maintaining an extremely high energy efficiency using software bypassing, optimized instruction delivery, and local scratchpad memories. R-Blocks is synthesized in a commercial 22-nm FD-SOI technology and achieves a full-system energy efficiency of 115 MOPS/mW on a common FFT benchmark, 1.45x higher than a highly tuned embedded RISC-V processor. Comparable energy efficiency is obtained on multiple complex workloads, making R-Blocks a promising acceleration target for general-purpose computing.
- Published
- 2024
28. HW/SW Co-Design for Dates Classification on Xilinx Zynq SoC
- Author
-
Ahmed Chiheb Ammari, Lazhar Khriji, and Medhat Awadalla
- Subjects
artificial neural network, color and shape-size features, zynq soc, hw/sw co-design, Telecommunication, TK5101-6720
- Abstract
This paper proposes a HW/SW Co-design of an automatic classification system for the Khalas, Khunaizi, Fardh, Qash, Naghal, and Maan date fruit varieties in Oman. The system implements pre-processing, segmentation of the colored input images, and color and shape-size feature extraction, followed by ANN-tansig classification. The performance of the proposed system is evaluated experimentally, and a highest classification accuracy of 97.26% is achieved. The proposed system is prototyped using a selected Zynq 7020 SoC platform featuring, on the same chip, a dual-core ARM Cortex A9 Processing System (PS) interconnected with FPGA logic (PL) through high-throughput communication channels. The original classification algorithm is profiled and then a HW/SW Co-design is developed, achieving 10.9 fps real-time classification performance. This performance is acceptable and represents an almost 14 times speedup compared to the original program implementation.
- Published
- 2020
- Full Text
- View/download PDF
29. Efficient Hardware/Software Co-design for NTRU
- Author
-
Fritzmann, Tim, Schamberger, Thomas, Frisch, Christoph, Braun, Konstantin, Maringer, Georg, Sepúlveda, Johanna, Rannenberg, Kai, Editor-in-Chief, Sakarovitch, Jacques, Editorial Board Member, Goedicke, Michael, Editorial Board Member, Tatnall, Arthur, Editorial Board Member, Neuhold, Erich J., Editorial Board Member, Pras, Aiko, Editorial Board Member, Tröltzsch, Fredi, Editorial Board Member, Pries-Heje, Jan, Editorial Board Member, Kreps, David, Editorial Board Member, Reis, Ricardo, Editorial Board Member, Furnell, Steven, Editorial Board Member, Furbach, Ulrich, Editorial Board Member, Winckler, Marco, Editorial Board Member, Malaka, Rainer, Editorial Board Member, Bombieri, Nicola, editor, Pravadelli, Graziano, editor, Fujita, Masahiro, editor, and Austin, Todd, editor
- Published
- 2019
- Full Text
- View/download PDF
30. Design and implementation of filterbank for MPEG-2/4 AAC system.
- Author
-
Tsai, Tsung-Han and Liu, Hsing-Chuang
- Subjects
*DESIGN techniques, *ALGORITHMS, *PARTICIPATORY design, *INTELLECTUAL property, *BINARY sequences, *VIDEO coding, *LOGIC circuits
- Abstract
In this paper, the design and implementation of low power and high-efficiency filterbank for MPEG-2/4 AAC system is presented. Since filterbank represents the most computation-intensive kernel of AAC codec, we design it with algorithm and architecture aspects. We supply the dedicated algorithm for filterbank. The derived algorithm and the hardware shared engine (HSE) we proposed for AAC can reduce the computation power and implement as a codec used in encoder and decoder. We also optimize the performance, hardware resources, and power consumption thoroughly. It is designed as an intellectual property (IP) to construct the overall decoder in an embedded system. The hardware cost is with 16.1 k logic gates, 2 k-word local memory, and 1 K-word coefficient ROM. The proposed design has a real-time operation at only 1.25 MHz with a sampling rate of 48 kHz. It can achieve 0.70 mW power consumption in TSMC 0.18 μm CMOS technology. Furthermore, we use a programmable chip (SOPC) platform which includes the software and hardware engine. Throughout the computation analysis, the bitstream parser and lower complexity part are performed by a software solution, and the higher complexity part is computed by a hardware solution. Several design techniques are needed including the wrapper design, embedded CPU, and IP to construct the system. Based on this co-design approach, a whole AAC audio decoder is also established. • This paper presents the algorithm and hardware design for MPEG-2/4 AAC. • Real demonstration is constructed in the embedded system. • VLSI design of the proposed chip is performed with various power-saving techniques. • The proposed design outperforms other designs on gate-count usage and power consumption. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
31. Hw/Sw Co-Design technique for 2D fast fourier transform algorithm on Zynq SoC.
- Author
-
Kortli, Yassin, Gabsi, Souhir, Jridi, Maher, Alfalou, Ayman, and Atri, Mohamed
- Subjects
*FAST Fourier transforms, *ALGORITHMS, *PARTICIPATORY design, *ARM microprocessors, *COMPUTER performance
- Abstract
The Two-Dimensional Fast Fourier Transform (2D-FFT) algorithm is used for the study of many modern systems applied for security and biometrics. The adoption of this algorithm, which is a compute intensive task, is limited due to its hardware design complexity. The first objective of this paper is to underline the effect of the hardware/software co-design (Hw/Sw co-design) for the reduction of the processing time and power consumption. Secondly, we propose an innovative architecture for the 2D-FFT algorithm tested on Zynq SoC, which requires less processing time and memory compared to the traditional algorithm. Three implementations (one software and two Hw/Sw co-designs) of the 2D-FFT algorithm using the Zynq SoC are presented in this paper. The first is based on ARM processor. A speedup of 29x is obtained compared to the original implementation thanks to many optimizations. The second is a Hw/Sw co-design solution of the traditional 2D-FFT algorithm introduced on a hybrid platform combining an ARM Cortex-A9 processor with an FPGA. The third is also a Hw/Sw co-design solution using our optimized 2D-FFT algorithm to reach the real-time constraints for high-resolution images (1920 × 1080). It provides a speedup of 1.13x, 3.31x and 96.21x over the Hw/Sw co-design implementation of the traditional RC algorithm, the pure software implementations with and without optimizations, respectively. • Underline the effect of the Hw/Sw co-design of the 2D-FFT algorithm for the reduction of the processing time and power consumption. • Innovative architecture for the 2D-FFT algorithm tested on Zynq SoC respecting real-time constraints. • A comparative study between four existing architectures are used to implement the FFT IP to select the best one. • Study the impact of directives on resource usage and performance. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
32. Co‐design implementation of High Efficiency Video Coding standard encoder on Zynq MPSoC.
- Author
-
Touzani, Hajar, Mansouri, Anass, Errahimi, Fatima, and Ahaitouf, Ali
- Subjects
*BIT rate, *GATE array circuits, *SIGNAL-to-noise ratio, *INTERPOLATION, *VIDEO coding, *STATISTICS
- Abstract
Summary: Statistical analysis of High Efficiency Video Coding (HEVC) encoder reveals that in the motion compensation block, the interpolation filter consumes more than 30% of the encoder time in comparison with other blocks. In this paper, we start with an optimized hardware implementation of the interpolation filter on field‐programmable gate array (FPGA) based on Xilinx setup environment. In a second step, a Hardware/Software (HW/SW) co‐design implementation of HM16.7 encoder is performed on Zynq MPSoC platform to evaluate the proposed interpolation filter IP in terms of total encoder run‐time, taking advantages of both processing units (quad‐core ARM Cortex™‐A53 processor and Programmable Logic FPGA component) available on the Zynq MPSoC. The proposed architecture of luma and chroma filters was simulated and synthesized on Xilinx XCZU7EV‐2FFVC1156 FPGA at 250‐MHz clock frequency. The synthesis results present an optimized power consumption of 3.308 mW for higher resolutions (2560 × 1600 and 1920 × 1080) at 50 fps with the use of just 1% of the FPGA resources. The experimental results of the co‐design implementation of HEVC encoder present a speedup of 2 times (41% in PeopleOnStreet sequence) in terms of processing time compared to the software‐only implementation, with an increase of 0.51% of bit rate and a very small degradation of peak signal‐to‐noise ratio (PSNR) (0.01%). [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
33. LDPC Binary Vectors Coding Enhances Transmissions and Memories Reliability
- Author
-
Knot, Tomas, Vlcek, Karel, Kacprzyk, Janusz, Series editor, Pal, Nikhil R., Advisory editor, Bello Perez, Rafael, Advisory editor, Corchado, Emilio S., Advisory editor, Hagras, Hani, Advisory editor, Kóczy, László T., Advisory editor, Kreinovich, Vladik, Advisory editor, Lin, Chin-Teng, Advisory editor, Lu, Jie, Advisory editor, Melin, Patricia, Advisory editor, Nedjah, Nadia, Advisory editor, Nguyen, Ngoc Thanh, Advisory editor, Wang, Jun, Advisory editor, Silhavy, Radek, editor, Senkerik, Roman, editor, Kominkova Oplatkova, Zuzana, editor, Prokopova, Zdenka, editor, and Silhavy, Petr, editor
- Published
- 2017
- Full Text
- View/download PDF
34. On Evolutionary Approximation of Sigmoid Function for HW/SW Embedded Systems
- Author
-
Minarik, Milos, Sekanina, Lukas, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, McDermott, James, editor, Castelli, Mauro, editor, Sekanina, Lukas, editor, Haasdijk, Evert, editor, and García-Sánchez, Pablo, editor
- Published
- 2017
- Full Text
- View/download PDF
35. IoT Components for Secure Smart Building Environments
- Author
-
Koulamas, Christos, Giannoulis, Spilios, Fournaris, Apostolos, Keramidas, Georgios, editor, Voros, Nikolaos, editor, and Hübner, Michael, editor
- Published
- 2017
- Full Text
- View/download PDF
36. FPGA based real-time epileptic seizure prediction system.
- Author
-
Coşgun, Ercan and Çelebi, Anıl
- Subjects
EPILEPSY ,PROGRAMMABLE controllers ,ELECTROENCEPHALOGRAPHY ,SYSTEMS on a chip ,FIELD programmable gate arrays - Abstract
The development of systems that can predict epileptic seizures in real-time offers great hope for epilepsy patients. These systems aim to prevent accidents that patients may experience caused by the loss of consciousness during seizures. Therefore, patients must use real-time epileptic seizure prediction systems that do not interfere with their daily activities. In this study, using the unipolar EEG data from a surface electrode, a patient-specific estimation system is implemented in real-time on a system on chip (SoC) that contains an embedded processor and programmable logic blocks. The European epilepsy database EPILEPSIAE is used in the scope of this work. In the proposed system, pre-processing is applied to the EEG data. Then, the features of the data in the frequency domain are extracted. The classifier model is trained with the RusBoosted Tree cluster classifier, which is a machine learning algorithm. Testing is carried out using the proposed classification model. Threshold values are determined, and then false alarms and erroneous classifications are prevented by post-processing. At the end of the tests, prediction success, sensitivity (SEN), Specificity (SPE), False Prediction Rate (FPR), and prediction times are obtained as 77.30%, 95.94%, 0.041 h⁻¹, and 33.23 min, respectively. The proposed system outperforms other studies in the literature in the number of electrodes, real-time operation, hardware/software architecture, and FPR performance. A wearable seizure prediction system seems to be commercialized according to the results achieved in this study. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
37. Standing on the Shoulders of Giants: Hardware and Neural Architecture Co-Search With Hot Start.
- Author
-
Jiang, Weiwen, Yang, Lei, Dasgupta, Sakyasingha, Hu, Jingtong, and Shi, Yiyu
- Subjects
*SPACE (Architecture) , *FIELD programmable gate arrays - Abstract
Hardware and neural architecture co-search that automatically generates artificial intelligence (AI) solutions from a given dataset are promising to promote AI democratization; however, the amount of time that is required by current co-search frameworks is in the order of hundreds of GPU hours for one target hardware. This inhibits the use of such frameworks on commodity hardware. The root cause of the low efficiency in existing co-search frameworks is the fact that they start from a “cold” state (i.e., search from scratch). In this article, we propose a novel framework, namely, HotNAS, that starts from a “hot” state based on a set of existing pretrained models (also known as model zoo) to avoid lengthy training time. As such, the search time can be reduced from 200 GPU hours to less than 3 GPU hours. In HotNAS, in addition to hardware design space and neural architecture search space, we further integrate a compression space to conduct model compressing during the co-search, which creates new opportunities to reduce latency, but also brings challenges. One of the key challenges is that all of the above search spaces are coupled with each other, e.g., compression may not work without hardware design support. To tackle this issue, HotNAS builds a chain of tools to design hardware to support compression, based on which a global optimizer is developed to automatically co-search all the involved search spaces. Experiments on ImageNet dataset and Xilinx FPGA show that, within the timing constraint of 5 ms, neural architectures generated by HotNAS can achieve up to 5.79% Top-1 and 3.97% Top-5 accuracy gain, compared with the existing ones. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
38. TOWARD AN EMBEDDED SYSTEM FOR GESTURE RECOGNITION BASED ON ARTIFICIAL NEURAL NETWORK USING RECONFIGURABLE TARGET (CASE STUDY AND REVIEW).
- Author
-
Reda, Ahmad, Alshoufi, Tareq, Bouzid, Ahmed, and Vásárhelyi, József
- Subjects
EMBEDDED computer systems ,ARTIFICIAL neural networks ,ARTIFICIAL intelligence ,ACCELEROMETERS ,NEURAL computers ,DATA acquisition systems - Abstract
With a view to creating an intelligent remote control for robot movements, this article presents a case study of dataset creation using an RSG (Reference Signal Generator). Using artificial intelligence, the device recognizes the gestures of an operator. Indeed, a neural network can classify time series data coming from accelerometers, and as a starting point 4 gestures are taken into consideration. The most challenging work is to build the reference dataset that is necessary for the learning process. To train the neural network, a huge amount of reference data should be created (hundreds of thousands of time-series vectors per gesture per sensor), which cannot be done manually by an operator. To overcome this issue, an RSG is created. This article also describes how a 1-DoF arm has been designed to emulate the behavior of the human arm performing gestures, as well as the data acquisition system. The system is based on a software/hardware co-design implemented on a Programmable System on Chip (PSoC). [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
39. High-Performance Vision-Based Navigation on SoC FPGA for Spacecraft Proximity Operations.
- Author
-
Lentaris, George, Stratakos, Ioannis, Stamoulias, Ioannis, Soudris, Dimitrios, Lourakis, Manolis, and Zabulis, Xenophon
- Subjects
*COMPUTER vision , *DIGITAL electronics , *FIELD programmable gate arrays , *ARTIFICIAL satellite tracking , *SPACE vehicles - Abstract
Future autonomous spacecraft rendezvous with uncooperative or unprepared objects will be enabled by vision-based navigation, which imposes great computational challenges. Targeting short duration missions in low Earth orbit, this paper develops high-performance avionics supporting custom computer vision algorithms of increased complexity for satellite pose tracking. At algorithmic level, we track 6D pose by rendering a depth image from an object mesh model and robustly matching edges detected in the depth and intensity images. At system level, we devise an architecture to exploit the structure of commercial system-on-chip FPGAs, i.e., Zynq7000, and the benefits of tightly coupling VHDL accelerators with CPU-based functions. At implementation level, we employ our custom HW/SW co-design methodology and an elaborate combination of digital circuit design techniques to optimize and map efficiently all functions to a compact embedded device. Providing significant performance per watt improvement, the resulting VBN system achieves a throughput of 10–14 FPS for 1 Mpixel images, with only 4.3 watts mean power and 1U size, while tracking ENVISAT in real-time with only 0.5% mean positional error. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
40. Hardware-software implementation of HEVC decoder on Zynq.
- Author
-
Ayadi, Lella Aicha, Loukil, Hassen, Ayed, Mohamed Ali Ben, and Masmoudi, Nouri
- Subjects
SYSTEMS on a chip ,LINUX operating systems ,VIDEO coding ,LINEAR orderings - Abstract
This paper presents an efficient implementation of the High Efficiency Video Coding (HEVC) decoder using a Hardware/Software (HW/SW) co-design approach on the Zynq System on Chip (SoC) platform. The reference software decoder HM 10.0 has been implemented under an embedded Linux Operating System (OS). For real-time decoding, we provide hardware acceleration for the most computationally intensive parts of the HEVC decoder, which are the interpolation filters. The proposed design improves the processing throughput, targeting a resolution of 3840 × 2160 at a frame rate of 60 fps. HW/SW validation is achieved and examined in terms of resource utilization, throughput and power consumption. In order to improve the total decoding time, we propose to enable the Direct Memory Access (DMA) mode that can help speed page access and minimize the transfer time between the processor and hardware accelerators. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
41. A Hardware/Software Co-Design Vision for Deep Learning at the Edge
- Author
-
Flavio Ponzina, Simone Machetti, Marco Rios, Benoit Walter Denkinger, Alexandre Levisse, Giovanni Ansaloni, Miguel Peon-Quiros, and David Atienza
- Subjects
HW/SW co-design ,Hardware and Architecture ,Low-power ,EdgeAI ,Electrical and Electronic Engineering ,Software - Abstract
The growing popularity of edgeAI requires novel solutions to support the deployment of compute-intense algorithms in embedded devices. In this article, we advocate for a holistic approach, where application-level transformations are jointly conceived with dedicated hardware platforms. We embody such a stance in a strategy that employs ensemble-based algorithmic transformations to increase robustness and accuracy in Convolutional Neural Networks (CNNs), enabling the aggressive quantization of weights and activations. Opportunities offered by algorithmic optimizations are then harnessed in domain-specific hardware solutions, such as the use of multiple ultra-low-power processing cores, the provision of shared acceleration resources, the presence of independently power-managed memory banks, and voltage scaling to ultra-low levels, greatly reducing (up to 60% in our experiments) energy requirements. Furthermore, we show that aggressive quantization schemes can be leveraged to perform efficient computations directly in memory banks, adopting in-memory computing solutions. We showcase that the combination of parallel in-memory execution and aggressive quantization leads to more than 70% energy and latency gains compared to baseline implementations.
- Published
- 2022
- Full Text
- View/download PDF
42. SPARTAN/SEXTANT/COMPASS: Advancing Space Rover Vision via Reconfigurable Platforms
- Author
-
Lentaris, George, Stamoulias, Ioannis, Diamantopoulos, Dionysios, Maragos, Konstantinos, Siozios, Kostas, Soudris, Dimitrios, Rodrigalvarez, Marcos Aviles, Lourakis, Manolis, Zabulis, Xenophon, Kostavelis, Ioannis, Nalpantidis, Lazaros, Boukas, Evangelos, Gasteratos, Antonios, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Sano, Kentaro, editor, Soudris, Dimitrios, editor, Hübner, Michael, editor, and Diniz, Pedro C., editor
- Published
- 2015
- Full Text
- View/download PDF
43. Hybrid FPGA/ARM Co-design for Near Real Time of Remote Sensing Imagery
- Author
-
Góngora-Martín, C., Castillo-Atoche, A., Estrada-López, J., Vázquez-Castillo, J., Ortegón-Aguilar, J., Carrasco-Álvarez, R., Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Bayro-Corrochano, Eduardo, editor, and Hancock, Edwin, editor
- Published
- 2014
- Full Text
- View/download PDF
44. Towards a Mobile Implementation of Waaves for Certified Medical Image Compression in E-Health Applications
- Author
-
Mhedhbi, Imen, Hachicha, Khalil, Garda, Patrick, Bai, Yuhui, Granado, Bertrand, Topin, Sébastien, Hochberg, Sylvain, Akan, Ozgur, Series editor, Bellavista, Paolo, Series editor, Cao, Jiannong, Series editor, Dressler, Falko, Series editor, Ferrari, Domenico, Series editor, Gerla, Mario, Series editor, Kobayashi, Hisashi, Series editor, Palazzo, Sergio, Series editor, Sahni, Sartaj, Series editor, Shen, Xuemin (Sherman), Series editor, Stan, Mircea, Series editor, Xiaohua, Jia, Series editor, Zomaya, Albert, Series editor, Coulson, Geoffrey, Series editor, Godara, Balwant, editor, and Nikita, Konstantina S., editor
- Published
- 2013
- Full Text
- View/download PDF
45. A Generic and Non-intrusive Profiling Methodology for SystemC Multi-core Platform Simulation Models
- Author
-
Brandenburg, Jens, Stabernack, Benno, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Herkersdorf, Andreas, editor, Römer, Kay, editor, and Brinkschulte, Uwe, editor
- Published
- 2012
- Full Text
- View/download PDF
46. Cost and Performance Evaluation of a Noise Filter for Partitioning in Co-design Methodologies
- Author
-
Rodellar, Victoria, de Icaya, Elvira Martínez, Díaz, Francisco, Peinado, Virginia, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Sirisuk, Phaophak, editor, Morgan, Fearghal, editor, El-Ghazawi, Tarek, editor, and Amano, Hideharu, editor
- Published
- 2010
- Full Text
- View/download PDF
47. SystemC-based Co-Simulation/Analysis for System-Level Hardware/Software Co-Design.
- Author
-
Muttillo, Vittoriano, Pomante, Luigi, Santic, Marco, and Valente, Giacomo
- Subjects
*PARTICIPATORY design , *SCIENTIFIC community , *PARALLEL processing , *SYSTEMS design , *ENERGY industries - Abstract
• Electronic system-level HW/SW co-design of heterogeneous parallel embedded systems.
• SystemC-based electronic system-level functional and timing HW/SW co-simulation.
• System-level multi model of computation co-analysis (communication and concurrency).
• System-level multi model of computation co-estimation (load and bandwidth).
• Support to automatic electronic system-level design space exploration.
Heterogeneous parallel devices are becoming increasingly common in the embedded systems field. This is primarily due to their ability to improve timing performance, while simultaneously reducing costs and energy. In this context, this study addresses the role of a hardware/software (HW/SW) co-simulation and analysis tool for embedded systems designed on heterogeneous parallel architectures. In particular, it presents an extended SystemC-based tool for functional and timing HW/SW co-simulation/analysis within a reference Electronic System-Level HW/SW co-design flow. The description of the main features of the tool, and the main design and integration issues represent the core of the paper. Furthermore, the paper presents two case studies that demonstrate the enhanced effectiveness and efficiency of the extended tool. This is achieved through reduced simulation. Thanks to all this, the paper contributes to fully motivate the industrial and research communities to adopt and further investigate system-level approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
48. Cross-Layer Design for IEEE 802.16-2005 System Using Platform-Based Methodologies
- Author
-
Tseng, Li-chuan, Chen, Kuan-yin, Huang, ChingYao, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Denko, Mieso K., editor, Shih, Chi-sheng, editor, Li, Kuan-Ching, editor, Tsao, Shiao-Li, editor, Zeng, Qing-An, editor, Park, Soo Hyun, editor, Ko, Young-Bae, editor, Hung, Shih-Hao, editor, and Park, Jong Hyuk, editor
- Published
- 2007
- Full Text
- View/download PDF
49. Hardware/Software Co-Design of a Circle Detection System Based on Evolutionary Computing
- Author
-
Horacio Rostro Gonzalez, Luis Felipe Rojas Muñoz, Santiago Sánchez-Solano, and Carlos Hugo Garcia Capulin
- Subjects
Computer Networks and Communications ,Hardware and Architecture ,Control and Systems Engineering ,HW/SW co-design ,systems-on-chip ,genetic algorithm ,circle detection ,interactive computing platform ,Signal Processing ,Electrical and Electronic Engineering - Abstract
In recent years, the strategy of co-designing Hardware/Software (HW/SW) systems has been widely adopted to exploit the synergy between both approaches thanks to technological advances that have led to more powerful devices providing an increasingly better cost–benefit trade-off. This paper presents an HW/SW system for the detection of multiple circles in digital images based on a genetic algorithm. It is implemented on an Ultra96-v2 development board, which contains a Xilinx Zynq UltraScale+ MPSoC device and supports a Linux operating system that facilitates application development. The design is powered by developing an interactive computing environment by means of the Jupyter Notebook platform, in which different programming languages coexist. The specific advantages of each of these languages have been used to describe the hardware component that accelerates the evolutionary computation for circle detection (VHDL), to execute SW-HW interaction functions, as well as the pre- and post-processing of the images (ANSI-C) and to code, evaluate, and document the system execution process (Python). As a result, a computationally efficient application was obtained, with high accuracy in the detection of circles in synthetic and real images, and with a high degree of reconfigurability that provides the user with the necessary tools to incorporate it in a specific area of interest.
- Published
- 2022
- Full Text
- View/download PDF
50. Streaming in Consumer Products : Beyond processing data
- Author
-
van Doren, Giel, Engel, Bas, Toolenaar, Frank, editor, and van der Stok, Peter, editor
- Published
- 2005
- Full Text
- View/download PDF