9,251 results on '"hardware architecture"'
Search Results
2. FPGA-Based Design of a Ready-to-Use and Configurable Soft IP Core for Frame Blocking Time-Sampled Digital Speech Signals.
- Author
-
Srinivas, Nettimi Satya Sai, Sugan, Nagarajan, Kumar, Lakshmi Sutha, Nath, Malaya Kumar, and Kanhe, Aniruddha
- Abstract
'Frame blocking' or 'Framing' is a technique that divides a time-sampled speech or audio signal into consecutive and equi-sized short-time frames, either overlapped or non-overlapped, for analysis. The framing hardware architectures (FHA) in the literature support framing speech or audio samples of specific word size with specific frame size and frame overlap size. However, speech and audio applications often require framing signal samples of varied word sizes with varied frame sizes and frame overlap sizes. Therefore, the existing FHAs must be redesigned appropriately to keep up with the variability in word size, frame size and frame overlap size, as demanded across multiple applications. Redesigning the existing FHAs for each specific application is laborious, prompting the need for a configurable intellectual property (IP) core. The existing FHAs are inappropriate for creating configurable IP cores as they lack adaptability to accommodate variability in frame size and frame overlap size. Therefore, to address these issues, a novel FHA, adaptable to accommodate the desired variability, is proposed. Furthermore, the proposed FHA is transformed into a field-programmable gate array-based soft, ready-to-use and configurable frame blocking IP core using the Xilinx® Vivado™ tool. The resulting IP core is versatile, offering configurability for framing in numerous applications incorporating real-time digital speech and audio systems. This research article discusses the proposed FHA and frame blocking IP core in detail. [ABSTRACT FROM AUTHOR]
- Published
- 2024
3. Design and on-orbit verification of the world's first dedicated moving target detection satellite.
- Author
-
杜鑫, 钟若飞, 杨灿坤, 李清扬, 周春平, 李小娟, and 宫辉力
- Abstract
With the rapid development of satellite technology, the demand on remote sensing applications has shifted from traditional static observation to dynamic monitoring. Current remote sensing satellites supporting moving target detection face challenges, such as the difficulties in extracting slow-moving targets, the limited observation coverage, and the difficulties of transmission and processing of large volumes of data. This paper proposes innovative approaches in camera imaging modes, data processing methods, and hardware architecture. (1) The design of a remote sensing camera with the dual-line array push-broom imaging mode is proposed, which allows for the acquisition of same-spectral dual-strip data with controllable time differences in a single imaging session; this breakthrough overcomes the challenge of observing large-scale moving targets under non-agile satellite conditions and provides a means to obtain "instantaneous change" information in the context of dynamic remote sensing. (2) The supporting onboard intelligent processing unit is developed independently, which is equipped with efficient onboard processing algorithms; through the design of high-performance parallel computing hardware-accelerated architecture, the hardware carrier of real-time remote sensing service in the dynamic remote sensing system is formed. The prototype based on the technology has been successfully launched aboard the MN200Sar-1 optical remote sensing satellite, positioning it as the first dedicated moving target detection satellite in the world. The on-orbit verification results show that the satellite is capable of detecting a wide range of moving targets within its sweeping field of view. It exhibits excellent detection performance for high-speed trains, vehicles, ships and other objects in motion. And the on-board processing unit meets the requirements for on-board processing applications in terms of time efficiency, energy utilization, and processing effectiveness. The related technologies and achievements hold significant theoretical and practical implications for various application domains, including intelligent transportation, disaster prevention and mitigation, and national security. [ABSTRACT FROM AUTHOR]
- Published
- 2024
4. Weighted edge-based low-cost artifacts-free high-quality VLSI implementation for demosaicking.
- Author
-
Siva, Midde Venkata and Jayakumar, E. P.
- Abstract
Bayer color filter array (CFA) is the most used color filter pattern which is employed in digital cameras to capture images, necessitating a demosaicking process for generating complete RGB images. Furthermore, the cost associated with demosaicking must not exceed the savings realized using a CFA. This paper presents a low-cost hardware architecture that produces good image quality with reduced color artifacts in the reconstructed images. In the proposed method, Green interpolation is developed by computing weighted color differences and weighted edges in the horizontal and vertical directions of 5 × 7 mask size. Red and Blue channel information is obtained by computing color difference values from already computed Green values and corresponding pixel values from the Bayer image. Further, two extra line buffers are used to store the previously computed intermediate Green values, which eliminates the memory requirement to store those values. Compared with the existing methods, our method produces good-quality reconstructed images without color artifacts. The proposed method utilizes less hardware than the previous methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
5. A high‐throughput flexible lossless compression and decompression architecture for color images.
- Author
-
Xu, Tongqing, Yao, Tan, Li, Ning, Li, JunMing, Min, Xinlong, and Xiao, Hao
- Subjects
-
*COMPUTATIONAL complexity, *ALGORITHMS, *IMAGE compression, *PIXELS, *HARDWARE, *COMPUTER software
- Abstract
Lossless image compression techniques shrink the image size to improve transmission efficiency and reduce the occupied storage space while ensuring that no image quality is lost. Among them, the LOCO‐I/JPEG‐LS algorithm offers a high lossless compression ratio and low computational complexity and thus is widely used for various real‐time applications. However, due to the context dependency in LOCO‐I, the parallelism in the algorithm is greatly constrained, which significantly limits the throughput and the real‐time performance of hardware implementations. Existing designs achieve more parallelism either at a high hardware cost or through straightforward chunking that sacrifices compression ratio. In order to trade off the parallelism and the compression ratio, this paper proposes a chunk‐oriented error modeling scheme for LOCO‐I, which enables parallelism in both compression and decompression and achieves a better compression ratio in chunks. Based on the optimized algorithm, a high‐throughput flexible lossless compression and decompression architecture (HFCD) is proposed, which achieves higher pixel per clock (PPC) with less hardware cost. Additionally, HFCD introduces a parameter sharing mechanism to enable random access of image chunks to improve the flexibility for decompression. Experimental results show that, compared with state‐of‐the‐art works, HFCD achieves 3.02–13.50 times improvement for the PPC of compression. For decompression, benefiting from our optimizations, HFCD achieves 22.4 times speedup compared to the software solution. [ABSTRACT FROM AUTHOR]
- Published
- 2024
6. Fcd-cnn: FPGA-based CU depth decision for HEVC intra encoder using CNN.
- Author
-
Dehnavi, Hossein, Dehnavi, Mohammad, and Klidbary, Sajad Haghzad
- Abstract
Video compression for storage and transmission has always been a focal point for researchers in the field of image processing. Their efforts aim to reduce the data volume required for video representation while maintaining its quality. HEVC is one of the efficient standards for video compression, receiving special attention due to the increasing demand for high-resolution videos. The main step in video compression involves dividing the coding unit (CU) blocks into smaller blocks that have a uniform texture. In traditional methods, the Discrete Cosine Transform (DCT) is applied, followed by the use of RDO for decision-making on partitioning. This paper presents a novel convolutional neural network (CNN) and its hardware implementation as an alternative to DCT, aimed at speeding up partitioning and reducing the hardware resources required. The proposed hardware utilizes an efficient and lightweight CNN to partition CUs with low hardware resources in real-time applications. This CNN is trained for different Quantization Parameters (QPs) and block sizes to prevent overfitting. Furthermore, the system's input size is fixed at 16 × 16, and other input sizes are scaled to this dimension. Loop unrolling, data reuse, and resource sharing are applied in hardware implementation to save resources. The hardware architecture is fixed for all block sizes and QPs, and only the coefficients of the CNN are changed. In terms of compression quality, the proposed hardware achieves a 4.42% BD-BR and -0.19 BD-PSNR compared to HM16.5. The proposed system can process 64 × 64 CU at 150 MHz and in 4914 clock cycles. The hardware resources utilized by the proposed system include 13,141 LUTs, 15,885 Flip-flops, 51 BRAMs, and 74 DSPs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
7. Hardware implementation of digital pseudo-random number generators for real-time applications.
- Author
-
Gafsi, Mohamed, Hafsa, Amal, and Machout, Mohsen
- Abstract
This paper introduces the hardware implementation of Digital Pseudo-Random Number Generators (DPRNG) based on chaotic systems. First, high-performance pipeline hardware architectures, respectively, for the 3D Lorenz, 3D Chua, 3D Rossler, and 3D Chen chaotic systems are designed to increase the operating frequency. The study also includes an examination of the hardware architectures with 32-bit fixed-point and 32-bit single floating-point data precision. Second, hardware architectures of DPRNG based on a single chaotic system and the congruential generator are put forward. Third, a robust DPRNG, that mixes the 3D Lorenz, 3D Chua, 3D Rossler, and 3D Chen chaotic systems is proposed where all designed chaotic systems operate in parallel. This architecture increases pseudo-random numbers space up to 2^480. The FPGA implementation of the proposed pipeline hardware architecture of the complex DPRNG can achieve a maximal operating frequency of 192.446 MHz with a high throughput of 73,899.264 Mbps. The NIST 800-22 test suite result indicates that the DPRNG produces high-quality pseudo-random bits. Consequently, the proposed DPRNG is deemed suitable for use in high-speed applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
8. Hardware architecture optimization for high-frequency zeroing and LFNST in H.266/VVC based on FPGA.
- Author
-
Zhang, Junxiang, Sheng, Qinghua, Pan, Rui, Wang, Jiawei, Qin, Kuan, Huang, Xiaofang, and Niu, Xiaoyan
- Abstract
To reduce the hardware implementation resource consumption of the two-dimensional transform component in H.266 VVC, a unified hardware structure is proposed that supports full-size Discrete Cosine Transform (DCT), Discrete Sine Transform (DST), and full-size Low-Frequency Non-Separable Transform (LFNST). This paper presents an area-efficient hardware architecture for two-dimensional transforms based on a general Regular Multiplier (RM) and a high-throughput hardware design for LFNST in the context of H.266/VVC. The first approach utilizes the high-frequency zeroing characteristics of VVC and the symmetric properties of the DCT-II matrix, allowing the RM-based architecture to use only 256 general multipliers in a fully pipelined structure with a parallelism of 16. The second approach optimizes the transpose operation of the input matrix for LFNST in a parallelism of 16 architecture, aiming to save storage and logic resources. [ABSTRACT FROM AUTHOR]
- Published
- 2024
9. Hardware Design of Desktop EDM Machine Tool Numerical Control System.
- Author
-
ZHANG Chenxin, JIANG Yi, NIE Zilong, and JIANG Chensong
- Subjects
NUMERICAL control of machine tools, ELECTRIC metal-cutting
- Abstract
Aiming at the compatibility and discharge requirements of desktop EDM machine tool numerical control system, a hardware platform of EDM machine tool numerical control system based on ARM and FPGA was designed. The task distribution of the platform was uniform and reasonable, with ARM as the core to build the upper computer, and the transplantation of Linux system to carry out multi-task processing, man-machine interface and interpolation operation. The lower computer system based on FPGA was responsible for motion control, discharge pulse control, auxiliary module control and other tasks. The hardware platform was built in the form of core board plus base board and the communication was carried out by Ethernet port. The experimental results show that the CNC system built by the hardware platform can accomplish the EDM small hole machining. This research provides a hardware platform design of EDM machine tool numerical control system with good portability, low cost of iterative upgrade and adaptability to various pulse power sources. [ABSTRACT FROM AUTHOR]
- Published
- 2024
10. FPGA-Based DNN Implementation for the Autonomous Car System
- Author
-
Lam, Duc Khai, Vy Vo, Dang Nhat, Anh Pham, Xuan Tuan, Thinh Ngo, Ha Quang, Chlamtac, Imrich, Series Editor, Hai, Nguyen Thanh, editor, Huy, Nguyen Xuan, editor, Amine, Khalil, editor, and Lam, Tran Dai, editor
- Published
- 2024
11. Scratchy: A Class of Adaptable Architectures with Software-Managed Communication for Edge Streaming Applications
- Author
-
Faye, Joseph W., Haggui, Naouel, Kermarrec, Florent, Martin, Kevin J. M., Bhattacharyya, Shuvra, Nezan, Jean-François, Pelcat, Maxime, Goos, Gerhard, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Dias, Tiago, editor, and Busia, Paola, editor
- Published
- 2024
12. Intelligent Building Automation System: A Layered Hardware and Software Architecture Approach
- Author
-
Chavan, Pramod U., Chavan, Pratibha P., Ghanekar, Vivek D., Jadhav, Sharad T., Kale, Shilpa J., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Senjyu, Tomonobu, editor, So–In, Chakchai, editor, and Joshi, Amit, editor
- Published
- 2024
13. Design of a Pipeline Computing Module as Part of a Specialized VLSI
- Author
-
Tarasov, Ilya, Lyulyava, Daniil, Duksin, Nikita, Duksina, Ilona, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Jordan, Vladimir, editor, Tarasov, Ilya, editor, Shurina, Ella, editor, Filimonov, Nikolay, editor, and Faerman, Vladimir A., editor
- Published
- 2024
14. Multiplier Design for the Modulo Set and Its Application in DCT for HEVC
- Author
-
Kopperundevi, P., Prakash, M. Surya, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Zhang, Junjie James, Series Editor, Tan, Kay Chen, Series Editor, Kalya, Shubhakar, editor, Kulkarni, Muralidhar, editor, and Bhat, Subramanya, editor
- Published
- 2024
15. An FPGA-based SVPWM hardware architecture and its computation speed optimization.
- Author
-
刘德平, 辛云川, and 刘子旭
- Abstract
Copyright of Journal of Zhengzhou University: Engineering Science is the property of Editorial Office of Journal of Zhengzhou University: Engineering Science and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
16. Real-time hardware architecture of an ECG compression algorithm for IoT health care systems and its VLSI implementation.
- Author
-
Ez-ziymy, Siham, Hatim, Anas, and Hammia, Slama
- Abstract
The Internet of Things (IoT) in the medical and biomedical field proposes new and efficient hardware for healthcare services. Thanks to machine-machine interaction and real-time solutions, the problems of accessibility and reliability are resolved. In addition, increased patient engagement in decision-making will drive health service compliance. Vital signals like the electrocardiogram (ECG) are some of the most critical biomedical information to process; it is the subject of several studies. The data flow of those signals is enormous, making real-time transmission a tough job, hence the need to compress these vital signals. Designing efficient hardware compression engines is a promising challenge for efficient real-time transmission. This article introduces a new VLSI (Very-Large-Scale Integration) architecture for an ECG compression engine based on the algorithm presented in the same work. The efficiency of our processor was verified using the MIT BIH databases. We have also implemented it using an FPGA, which reaches a frequency of 170 MHz, and in 65 nm TSMC CMOS. The proposed processor uses 1.85 Kgates and consumes 25 nW with a compression ratio of 3.42. [ABSTRACT FROM AUTHOR]
- Published
- 2024
17. Efficient FPGA Binary Neural Network Architecture for Image Super-Resolution.
- Author
-
Su, Yuanxin, Seng, Kah Phooi, Smith, Jeremy, and Ang, Li Minn
- Subjects
IMAGE reconstruction algorithms, HIGH resolution imaging, IMAGE quality in imaging systems, CONVOLUTIONAL neural networks, FIELD programmable gate arrays, DEEP learning
- Abstract
Super-resolution systems refer to computer-based systems designed to enhance the quality of images or video by producing high-resolution renditions from low-resolution counterparts using computational algorithms and technologies. Various methods and techniques have been used in development of super-resolution systems. The development of Convolution Neural Networks (CNNs) and the Deep Learning (DL) methods have outperformed traditional methods. However, as models become increasingly deeper with wider receptive fields, the number of parameters significantly increases. While this often results in better performance, it renders these models impractical for real-life scenarios such as smartphones or other mobile systems. Currently, most proposed methods with higher perceptual quality demand a substantial amount of time to process a single image, even on powerful hardware like NVIDIA GPUs. Such computationally expensive models are not cost-effective for real-world application scenarios. Optimization is needed to reduce the computational costs and memory requirements to enhance their suitability for less powerful hardware configurations. In this work, we propose an efficient binary neural network architecture, ResBinESPCN, designed for image super-resolution. In our design, we improved the energy efficiency of the architecture through algorithmic and hardware-level optimizations. These optimizations not only enhance computational efficiency and reduce memory consumption but also achieve effective image super-resolution in resource-constrained environments. Our experimental validation highlights the effectiveness of this network structure and includes ablation studies on models with varying data bit widths. Hardware analysis substantiates the efficiency and real-time capabilities of this model. Additionally, deploying the model on FPGA using FINN demonstrates its low hardware resource usage and low power consumption. [ABSTRACT FROM AUTHOR]
- Published
- 2024
18. A Survey on Swarm Robotics for Area Coverage Problem.
- Author
-
Muhsen, Dena Kadhim, Sadiq, Ahmed T., and Raheem, Firas Abdulrazzaq
- Subjects
-
*AGGREGATION (Robotics)
- Abstract
The area coverage problem solution is one of the vital research areas which can benefit from swarm robotics. The greatest challenge to the swarm robotics system is to complete the task of covering an area effectively. Many domains where area coverage is essential include exploration, surveillance, mapping, foraging, and several other applications. This paper introduces a survey of swarm robotics in area coverage research papers from 2015 to 2022 regarding the algorithms and methods used, hardware, and applications in this domain. Different types of algorithms and hardware were dealt with and analysed; according to the analysis, the characteristics and advantages of each of them were identified, and we determined their suitability for different applications in covering the area for many goals. This study demonstrates that naturally inspired algorithms have the most significant role in swarm robotics for area coverage compared to other techniques. In addition, modern hardware has more capabilities suitable for supporting swarm robotics to cover an area, even if the environment is complex and contains static or dynamic obstacles. [ABSTRACT FROM AUTHOR]
- Published
- 2024
19. Research on signal processing task scheduling for heterogeneous multi-platform systems.
- Author
-
李宇东, 马金全, 谢宗甫, and 沈小龙
- Abstract
Simple parallel computing or a single heterogeneous platform can no longer meet the requirements of signal processing and task scheduling with large computation and high complexity, so heterogeneous multi-platform systems have become the development trend of signal processing and task scheduling. To improve platform throughput, processor utilization, and task perception, this study investigates the signal processing model of heterogeneous multi-platform systems, and models the scheduling tasks and the hardware and software resources with directed acyclic graphs. Based on the proposed scheduling algorithms, the task scheduling is summarized and compared. It is found that the hybrid scheduling algorithm based on task perception can meet the platform scheduling requirements well. Using hybrid scheduling algorithms based on task perception to solve task scheduling in signal processing is a trend of future research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
20. Optimizing Hardware Resource Utilization for Accelerating the NTRU-KEM Algorithm.
- Author
-
Lee, Yongseok, Youn, Jonghee, Nam, Kevin, Oh, Hyunyoung, and Paek, Yunheung
- Subjects
ALGORITHMS, PARALLEL processing, RESEARCH personnel, HARDWARE, CRYPTOGRAPHY, ITERATIVE learning control
- Abstract
This paper focuses on enhancing the performance of the Nth-degree truncated-polynomial ring units key encapsulation mechanism (NTRU-KEM) algorithm, which ensures post-quantum resistance in the field of key establishment cryptography. The NTRU-KEM, while robust, suffers from increased storage and computational demands compared to classical cryptography, leading to significant memory and performance overheads. In environments with limited resources, the negative impacts of these overheads are more noticeable, leading researchers to investigate ways to speed up processes while also ensuring they are efficient in terms of area utilization. To address this, our research carefully examines the detailed functions of the NTRU-KEM algorithm, adopting a software/hardware co-design approach. This approach allows for customized computation, adapting to the varying requirements of operational timings and iterations. The key contribution is the development of a novel hardware acceleration technique focused on optimizing bus utilization. This technique enables parallel processing of multiple sub-functions, enhancing the overall efficiency of the system. Furthermore, we introduce a unique integrated register array that significantly reduces the spatial footprint of the design by merging multiple registers within the accelerator. In experiments conducted, the results of our work were found to be remarkable, with a time-area efficiency achieved that surpasses previous work by an average of 25.37 times. This achievement underscores the effectiveness of our optimization in accelerating the NTRU-KEM algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2023
21. DAEBI: A Tool for Data Flow and Architecture Explorations of Binary Neural Network Accelerators
- Author
-
Yayla, Mikail, Latotzke, Cecilia, Huber, Robert, Iskif, Somar, Gemmeke, Tobias, Chen, Jian-Jia, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Silvano, Cristina, editor, Pilato, Christian, editor, and Reichenbach, Marc, editor
- Published
- 2023
22. Flexible Systolic Hardware Architecture for Computing a Custom Lightweight CNN in CT Images Processing for Automated COVID-19 Diagnosis
- Author
-
Aguirre-Alvarez, Paulo Aarón, Diaz-Carmona, Javier, Arredondo-Velázquez, Moisés, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Mahmud, Mufti, editor, Mendoza-Barrera, Claudia, editor, Kaiser, M. Shamim, editor, Bandyopadhyay, Anirban, editor, Ray, Kanad, editor, and Lugo, Eduardo, editor
- Published
- 2023
23. On VEI, AGI Pyramid, and Energy: Can AGI Society Prevent the Singularity?
- Author
-
Alidoust, Mohammadreza, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Hammer, Patrick, editor, Alirezaie, Marjan, editor, and Strannegård, Claes, editor
- Published
- 2023
24. FPGA implementation of Proximal Policy Optimization algorithm for Edge devices with application to Agriculture Technology.
- Author
-
Waseem, Shaik Mohammed and Roy, Subir Kumar
- Abstract
Reinforcement Learning (RL) is a technique where an agent learns to accomplish an assigned task on the basis of reward phenomenon. An RL algorithm, when implemented with embedded Field Programmable Gate Array (FPGA) hardware, is capable of influencing future applications and automation to a much greater extent than other implementation approaches. This work discusses an important RL algorithm called the Proximal Policy Optimization (PPO) applied to the example of Cart-Pole, a well-known benchmark from the control theory domain. It presents a novel hardware architecture designed, implemented and verified for the benchmark using the PPO based RL algorithm. The hardware implementation uses the Xilinx Avnet Ultra96v2 platform consisting of the Xilinx Zynq UltraScale+ MPSoC (ZU3EG). The synthesis platform uses the Xilinx Vivado HLS 2019.2v and Xilinx Vivado 2019.2v along with Xilinx's PYNQ framework. The results from Matlab/Simulink are used as a golden reference to verify the results from the hardware implementation. It also enables a better understanding of the dynamics of the Cart-Pole benchmark problem. A comparative analysis of the proposed hardware architecture with the state-of-the-art implementations in the literature is done. This along with the illustration of application framework enables us to establish the novelty of the proposed approach and its usefulness for applications from the domain of agriculture intended to be executed on edge devices. [ABSTRACT FROM AUTHOR]
- Published
- 2023
25. Design and implementation of hardware-efficient architecture for saturation-based image dehazing algorithm.
- Author
-
George, Anuja and Jayakumar, E. P.
- Abstract
For real-time single-image dehazing, this paper suggests a straightforward and efficient saturation-based transmission map estimation method. For the suggested image dehazing algorithm, the design of a hardware-efficient very large scale integration (VLSI) architecture is also provided. By removing the computationally demanding sorting operations, the algorithm computes the dark channel, increases the robustness of atmospheric light estimation using a hardware-friendly local atmospheric light estimation module based on the pixel saturation values, and reduces the effects of halo artifacts using an edge-preserving filter to estimate the saturation-based transmission map. Compared to previous sophisticated dehazing approaches, this study exhibits competitive performance in the quality of the dehazed images. The best of the existing dehazing architecture as well as the proposed architecture are described in Verilog hardware description language (HDL), functionally verified using Vivado 2019.1 simulator, and synthesized using Cadence genus compiler. The results of the implementation show that the suggested design is hardware-efficient and offers higher throughput. The suggested dehazing architecture achieves better results in terms of area and delay than the most recent methods and is appropriate for applications with hardware restrictions. [ABSTRACT FROM AUTHOR]
- Published
- 2023
26. Configurable Encryption and Decryption Architectures for CKKS-Based Homomorphic Encryption.
- Author
-
Lee, Jaehyeok, Duong, Phap Ngoc, and Lee, Hanho
- Subjects
-
*DATA privacy, *TIME complexity, *CLOUD storage, *DATA security
- Abstract
With the increasing number of edge devices connecting to the cloud for storage and analysis, concerns about security and data privacy have become more prominent. Homomorphic encryption (HE) provides a promising solution by not only preserving data privacy but also enabling meaningful computations on encrypted data. While considerable efforts have been devoted to accelerating expensive homomorphic evaluation in the cloud, little attention has been paid to optimizing encryption and decryption (ENC-DEC) operations on the edge. In this paper, we propose efficient hardware architectures for CKKS-based ENC-DEC accelerators to facilitate computations on the client side. The proposed architectures are configurable to support a wide range of polynomial sizes with multiplicative depths (up to 30 levels) at a 128-bit security guarantee. We evaluate the hardware designs on the Xilinx XCU250 FPGA platform and achieve an average encryption time 23.7× faster than that of the well-known SEAL HE library. By reducing time complexity and improving the hardware utilization of cryptographic algorithms, our configurable CKKS-supported ENC-DEC hardware designs have the potential to greatly accelerate cryptographic processes on the client side in the post-quantum era. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
27. QuantLaneNet: A 640-FPS and 34-GOPS/W FPGA-Based CNN Accelerator for Lane Detection.
- Author
-
Lam, Duc Khai, Du, Cam Vinh, and Pham, Hoai Luan
- Subjects
- *
DEEP learning , *CONVOLUTIONAL neural networks , *COMPUTER performance , *ENERGY consumption - Abstract
Lane detection is one of the most fundamental problems in the rapidly developing field of autonomous vehicles. With the dramatic growth of deep learning in recent years, many models have achieved a high accuracy for this task. However, most existing deep-learning methods for lane detection face two main problems. First, most early studies usually follow a segmentation approach, which requires much post-processing to extract the necessary geometric information about the lane lines. Second, many models fail to reach real-time speed due to the high complexity of model architecture. To offer a solution to these problems, this paper proposes a lightweight convolutional neural network that requires only two small arrays for minimum post-processing, instead of segmentation maps for the task of lane detection. This proposed network utilizes a simple lane representation format for its output. The proposed model can achieve 93.53% accuracy on the TuSimple dataset. A hardware accelerator is proposed and implemented on the Virtex-7 VC707 FPGA platform to optimize processing time and power consumption. Several techniques, including data quantization to reduce data width down to 8-bit, exploring various loop-unrolling strategies for different convolution layers, and pipelined computation across layers, are optimized in the proposed hardware accelerator architecture. This implementation can process at 640 FPS while consuming only 10.309 W, equating to a computation throughput of 345.6 GOPS and energy efficiency of 33.52 GOPS/W. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
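The figures quoted in the QuantLaneNet abstract can be cross-checked with simple arithmetic. The sketch below recomputes energy efficiency from the stated throughput and power, and derives the implied work per frame; the helper names are made up for this example.

```python
def energy_efficiency(gops, watts):
    """Throughput per watt in GOPS/W."""
    return gops / watts


def giga_ops_per_frame(gops, fps):
    """Giga-operations spent on one frame at the given frame rate."""
    return gops / fps


# Values taken from the abstract: 345.6 GOPS, 10.309 W, 640 FPS
efficiency = energy_efficiency(345.6, 10.309)
per_frame = giga_ops_per_frame(345.6, 640)
print(round(efficiency, 2), round(per_frame, 2))
```

The result agrees with the 33.52 GOPS/W reported in the abstract and implies about 0.54 giga-operations per frame.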
28. Design of a Mixed-Input DCT Transformer for HEVC Video Encoders.
- Author
-
兰尔铭, 施隆照, 宋佳柔, and 杨小玲
- Abstract
Copyright of Journal of Fuzhou University is the property of Journal of Fuzhou University, Editorial Department and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2023
- Full Text
- View/download PDF
29. A Streaming Data Processing Architecture Based on Lookup Tables.
- Author
-
Yuemaier, Aximu, Chen, Xiaogang, Qian, Xingyu, Dai, Weibang, Li, Shunfen, and Song, Zhitang
- Subjects
ELECTRONIC data processing ,PHASE change memory ,FAST Fourier transforms ,INTERNET of things - Abstract
Processing in memory (PIM) is a new computing paradigm that stores the function values of some input modes in a lookup table (LUT) and retrieves their values when similar input modes are encountered (instead of performing online calculations), which is an effective way to save energy. In the era of the Internet of Things, the processing of massive data generated by the front-end requires low-power and real-time processing. This paper investigates an energy-efficient processing architecture based on table lookup in phase-change memory (PCM). This architecture replaces logical-based calculations with LUT lookups to minimize power consumption and operation latency. In order to improve the efficiency of table lookup, the RISC-V instruction set has included extended lookup and data stream transmission instructions. Finally, the system architecture is validated by hardware simulation, and the performance of computing the fast Fourier transform (FFT) application is evaluated. The proposed architecture effectively improves the execution efficiency and reduces the power consumption of data flow operations. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
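The idea of replacing online calculation with table lookups, as described in the abstract above, can be illustrated in software. The sketch below precomputes a sine table and uses it to approximate FFT twiddle factors. The table size, the function names and the quantisation scheme are assumptions for this example and are not taken from the paper.

```python
import math

TABLE_BITS = 8                      # assumed table resolution
TABLE_SIZE = 1 << TABLE_BITS
SINE_LUT = [math.sin(2 * math.pi * k / TABLE_SIZE) for k in range(TABLE_SIZE)]


def lut_sin(phase):
    """Sine of `phase` (in turns) read from the table instead of computed."""
    index = int(phase * TABLE_SIZE) % TABLE_SIZE
    return SINE_LUT[index]


def twiddle(k, n):
    """Approximate exp(-2*pi*i*k/n) with two table reads (cosine = sine shifted by 1/4 turn)."""
    phase = (k / n) % 1.0
    return complex(lut_sin(phase + 0.25), -lut_sin(phase))


w8_1 = twiddle(1, 8)   # close to (1 - 1j) / sqrt(2)
```

With 256 entries and no interpolation the worst-case error is bounded by the phase step, 2π/256 ≈ 0.025; a larger table or interpolation trades memory for accuracy.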
30. Cell-Based Refinement Processor Utilizing Disparity Characteristics of Road Environment for SGM-Based Stereo Vision Systems
- Author
-
Cheol-Ho Choi, Hyun Woo Oh, Joonhwan Han, and Jungho Shin
- Subjects
Stereo vision ,disparity refinement ,hardware architecture ,FPGA ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Embedded stereo vision systems based on traditional approaches often require a disparity refinement process to enhance image quality. Weighted median filter (WMF)-based processors are commonly employed for their excellent refinement performance. However, when implemented on a field-programmable gate array (FPGA), WMF-based processors face a trade-off between hardware resource utilization and refinement performance. To address this trade-off, we previously proposed a new disparity refinement processor based on the hybrid max-median filter (HMMF). However, our earlier work did not guarantee flawless operation in large occluded and texture-less regions, particularly in areas with numerous holes. In order to overcome this limitation of conventional processors, we proposed a cell-based disparity refinement processor. This processor extends our previous HMMF-based disparity refinement processor. To evaluate its refinement performance, we conducted experiments using four types of publicly available stereo datasets. When comparing refinement performance, our proposed processor outperforms conventional processors when using the KITTI 2012 and 2015 stereo benchmark datasets. Additionally, the results demonstrate that our proposed processor exhibits superior refinement performance when applied to the Cityscapes and StereoDriving datasets in comparison to conventional processors. Furthermore, when considering hardware resource utilization, our proposed processor demonstrates lower resource requirements than conventional processors when implemented on an FPGA. Therefore, our proposed disparity refinement processor is well-suited for the disparity refinement process in stereo vision systems that require cost-effectiveness and high performance.
- Published
- 2023
- Full Text
- View/download PDF
31. Hardware Architecture for Reducing Worst-Case Latency in Fast SCF Polar Decoders
- Author
-
Useok Lee, Jeahack Lee, and Myung Hoon Sunwoo
- Subjects
Polar codes ,5G ,decoding history ,worst-case latency ,successive cancellation flip (SCF) ,hardware architecture ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
The Successive Cancellation Flip (SCF)-based decoding for polar codes requires significant latency at low SNRs. This paper proposes a low-latency SCF-based decoding and decoder architecture based on decoding history. In particular, a history memory structure for the Fast-SCF (FSCF) decoder has been proposed. The proposed history memory can store the intermediate decoding result of the first decoding and reduce the latency by shortening the additional decoding distance of SCF-based decoding. Furthermore, codeword segmentation is used to compensate for the area increase due to the history memory. The proposed decoder was synthesized using the Samsung 28 nm standard cell library and compared with state-of-the-art polar decoders. The proposed History-based FSCF (HFSCF) decoder improved the worst-case throughput, and the result was approximately doubled compared to FSCF decoders that share the same decoder architecture. In addition, the normalized worst-case area efficiency was 78% higher than the FSCF decoder with the same flipping trial and 22% higher than the latest Belief Propagation Flip (BPF) decoder.
- Published
- 2023
- Full Text
- View/download PDF
32. Lightweight Hardware Architecture of EKF-SLAM and Its FPGA Implementation
- Author
-
Hammia, Slama, Hatim, Anas, Bouaaddi, Abella, Haijoub, Abdelilah, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Motahhir, Saad, editor, and Bossoufi, Badre, editor
- Published
- 2022
- Full Text
- View/download PDF
33. A Simplified Vowel-Like Speech Detection Method and Its FPGA Implementation
- Author
-
Garnaik, Sarmila, Rout, Shasanka Sekhar, Sethi, Kabiraj, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Abraham, Ajith, editor, Engelbrecht, Andries, editor, Scotti, Fabio, editor, Gandhi, Niketa, editor, Manghirmalani Mishra, Pooja, editor, Fortino, Giancarlo, editor, Sakalauskas, Virgilijus, editor, and Pllana, Sabri, editor
- Published
- 2022
- Full Text
- View/download PDF
34. Integrating Business Intelligence with Cloud Computing: State of the Art and Fundamental Concepts
- Author
-
El Ghalbzouri, Hind, El Bouhdidi, Jaber, Howlett, Robert J., Series Editor, Jain, Lakhmi C., Series Editor, Ben Ahmed, Mohamed, editor, Teodorescu, Horia-Nicolai L., editor, Mazri, Tomader, editor, Subashini, Parthasarathy, editor, and Boudhir, Anouar Abdelhakim, editor
- Published
- 2022
- Full Text
- View/download PDF
35. High-throughput and area-efficient architectures for image encryption using PRINCE cipher.
- Author
-
Kumar, Abhiram, Singh, Pulkit, Patro, K Abhimanyu Kumar, and Acharya, Bibhudendra
- Subjects
- *
IMAGE encryption , *BLOCK ciphers , *UBIQUITOUS computing , *CIPHERS , *PRINCES , *IMAGE analysis - Abstract
Internet of Things (IoT) has gained popularity in recent years and has engulfed nearly every industry. The widespread use of numerous ubiquitous computing devices in the low-resource domain has resulted in a new set of privacy and security concerns. To address the problem of security in resource-constrained devices, many lightweight algorithms have been developed. This paper proposes optimized hardware implementations of the lightweight PRINCE block cipher, with the aim of providing adequate security while maximizing resource efficiency. The proposed architecture uses fewer resources and provides a reasonable trade-off between area footprint and efficiency. In the proposed unrolled pipelined architecture, the encryption round is divided into three sub-stages, with registers inserted in between. Using this design approach, the operating frequency is greatly improved. As a result, this architecture adapts itself effectively to high-performance applications. This paper also proposes serial-based and round-based architectures for resource-constrained devices. The proposed unrolled sub pipeline PRINCE block cipher is implemented on the Virtex-6-FF784 and Virtex-4-FF668 FPGA device families and achieved substantial improvements in throughput of 13.057% and 113%, respectively, as well as efficiency of 8.109% and 113.734% respectively. The proposed architecture is evaluated on a variety of grayscale images, and the security analysis is performed using MATLAB software. Aside from that, the proposed architecture uses the CBC-mode of operation. The security analysis and encryption outputs show that the proposed architecture is an effective choice for image encryption and provides sufficient security to the cipher images.
• Propose a serial architecture which reduces the number of utilized resources that suggests reduction in area for resource constrained devices.
• Propose an unrolled pipeline architecture which achieves high throughput, and low latency implementation.
• Propose a round-based architecture and provides a comparison of all proposed designs with existing implementations.
• All the implementations are analyzed by using different security analyses for image encryption applications that secure their immunity against various statistical and differential attacks. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
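The abstract above mentions the CBC mode of operation. The sketch below shows how CBC chaining works on 64-bit blocks, the block size PRINCE uses. The block function here is a deliberately trivial placeholder (XOR with a 64-bit key followed by a rotation); it is not PRINCE, is not secure, and exists only so the chaining logic can be run. All names are assumptions for this example.

```python
BLOCK_BITS = 64
MASK = (1 << BLOCK_BITS) - 1
ROT = 13  # arbitrary rotation amount for the placeholder permutation


def toy_encrypt_block(block, key):
    """Placeholder invertible 64-bit mapping (NOT PRINCE, NOT secure)."""
    x = (block ^ key) & MASK
    return ((x << ROT) | (x >> (BLOCK_BITS - ROT))) & MASK


def toy_decrypt_block(block, key):
    """Inverse of toy_encrypt_block."""
    x = ((block >> ROT) | (block << (BLOCK_BITS - ROT))) & MASK
    return x ^ key


def cbc_encrypt(blocks, key, iv):
    """Each plaintext block is XORed with the previous ciphertext block before encryption."""
    out, prev = [], iv
    for block in blocks:
        prev = toy_encrypt_block(block ^ prev, key)
        out.append(prev)
    return out


def cbc_decrypt(blocks, key, iv):
    """Decrypt each block, then XOR with the previous ciphertext block."""
    out, prev = [], iv
    for cipher in blocks:
        out.append(toy_decrypt_block(cipher, key) ^ prev)
        prev = cipher
    return out


# Three identical plaintext blocks, as in a flat region of a grayscale image
ciphertext = cbc_encrypt([1, 1, 1], key=0, iv=0)
```

Because every block depends on the previous ciphertext block, identical plaintext blocks produce different ciphertext blocks, which matters for images with large uniform regions.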
36. Combining Gradient-Based and Thresholding Methods for Improved Signal Reconstruction Performance.
- Author
-
Žarić, Maja Lakičević, Draganić, Anđela, Orović, Irena, Beko, Marko, and Stanković, Srđan
- Abstract
Analysis of sparse signals has been attracting the attention of the research community in recent years. Several approaches for sparse signal recovery have been developed to provide accurate recovery from a small portion of available data. This paper proposes an improved combined approach for both accurate and computationally efficient signal recovery. Particularly, the proposed approach uses the benefits of the gradient-based steepest descent method (that belongs to the convex optimization group of algorithms) in combination with a specially designed thresholding method. This approach includes solutions for several commonly used sparse bases – the discrete Fourier, discrete cosine transform, and discrete Hermite transform, but can be adapted for other transformations as well. The presented theory is experimentally evaluated and supported by empirical data. Various analytic and real-world signals are used to assess the performance of the proposed algorithm. The analyses are performed for different percentages of available samples. The complexity of the presented algorithm can be seen through the analog hardware implementation presented in this paper. Additionally, the user-friendly graphical interface is developed with a belonging signal database to ease usage and testing. The interface allows users to choose various parameters and to examine the performance of the proposed tool in different scenarios and transformation bases. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
37. Design and Xilinx Virtex-field-programmable gate array for hardware in the loop of sensorless second-order sliding mode control and model reference adaptive system–sliding mode observer for direct torque control of induction motor drive.
- Author
-
Krim, Saber and Mimouni, Mohamed Faouzi
- Abstract
This article aims first to propose an improved space vector modulation–direct torque control in order to enhance the induction motor performance using: (1) a super-twisting controller and (2) a novel model reference adaptive system based on a sliding mode observer. Second, due to the complexity of the suggested control algorithm, this article also deals with a hardware implementation on a field-programmable gate array board. Indeed, the field-programmable gate array is mainly chosen to reduce the execution time, thanks to its parallel processing, which significantly improves the quality of the control system by reducing the sampling period and consequently the delays in the control loop. Besides, the super-twisting controller, a second-order sliding mode control technique that uses a continuous control law to prevent the chattering phenomenon induced by first-order sliding mode control, is proposed to enhance the speed regulation loop. Moreover, high-performance control requires information about the rotor speed, which can be obtained by a sensor. Generally, the use of a speed sensor increases the system cost and size, and reduces its reliability. Therefore, a combination of a model reference adaptive system observer and a sliding mode observer is suggested for speed estimation and for overcoming the sensitivity of the classical model reference adaptive system observer to uncertainties and stator resistance variations. The performance of the proposed space vector modulation–direct torque control–super-twisting controller with the model reference adaptive system–sliding mode observer algorithm of an induction motor is verified through simulation, hardware co-simulation and experimental validation utilizing a field-programmable gate array Virtex-5-ML507 board. The performance of the suggested sensorless control method is also compared with other recently published schemes. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
38. A Hardware-Efficient Perturbation Method to the Digital Tent Map.
- Author
-
Nardo, Lucas, Nepomuceno, Erivelton, Muñoz, Daniel, Butusov, Denis, and Arias-Garcia, Janier
- Subjects
DIGITAL maps ,DIGITAL mapping ,PARTICLE swarm optimization ,DIGITAL communications ,DIGITAL electronics ,GATE array circuits - Abstract
Digital chaotic systems used in various applications such as signal processing, artificial intelligence, and communications often suffer from the issue of dynamical degradation. This paper proposes a solution to address this problem in the digital tent map. Our proposed method includes a simple and optimized hardware architecture, along with a hardware-efficient perturbation method, to create a high-performance computing system that retains its chaotic properties. We implemented our proposed architecture using an FPGA (Field-Programmable Gate Array) and the 1's complement fixed-point format. Our results demonstrate that the implemented digital circuit reduces logical resource consumption compared to state-of-the-art references and exhibits pseudo-random nature, as confirmed by various statistical tests. We validated our proposed pseudo-random number generator in a hardware architecture for particle swarm optimization, demonstrating its effectiveness. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
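The dynamical degradation mentioned in the abstract above can be reproduced with a few lines of integer arithmetic. The sketch below implements a tent map on an unsigned fixed-point word, using the one's complement as a stand-in for 1 − x, and shows that the orbit falls into a short cycle. The perturbation shown (XORing the least significant bit with one bit of a 16-bit LFSR) is a hypothetical illustration and is not the perturbation method proposed in the paper; all names and the seed are assumptions.

```python
def tent_step(x, bits):
    """One iteration of a digital tent map on a `bits`-wide unsigned fraction."""
    mask = (1 << bits) - 1
    if x >> (bits - 1):          # MSB set means x >= 0.5
        x = ~x & mask            # one's complement stands in for 1 - x
    return (x << 1) & mask       # multiply by 2


def lfsr16_step(state):
    """16-bit Fibonacci LFSR with taps 16, 14, 13, 11 (a maximal-length choice)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def orbit(x0, bits, steps, perturb=False, seed=0xACE1):
    """Visited states; with perturb=True the LSB is XORed with an LFSR bit each step."""
    x, lfsr, states = x0, seed, []
    for _ in range(steps):
        x = tent_step(x, bits)
        if perturb:
            lfsr = lfsr16_step(lfsr)
            x ^= lfsr & 1
        states.append(x)
    return states


plain_states = set(orbit(128, 8, 1000))
perturbed_states = set(orbit(128, 8, 1000, perturb=True))
```

On an 8-bit word the unperturbed orbit starting at 128 only ever visits 8 distinct states, whereas the perturbed orbit visits more, which is the kind of effect a perturbation scheme aims for.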
39. Efficient FPGA architecture to implement non-separable fast Fourier transform for image and video applications.
- Author
-
Sarkar, Sayantam and Bhairannawar, Satish S.
- Subjects
- *
FAST Fourier transforms , *VIDEO processing , *FIELD programmable gate arrays , *LOGIC circuits , *DATA conversion - Abstract
Fast Fourier Transform (FFT) is widely used in image and video processing applications to convert image or video frames into the transform domain, which helps to extract accurate features of a frame for various real-time applications. In this paper, an efficient non-separable 8-point FFT architecture (DIT-FFT) is proposed and implemented on a Spartan-6 (xc6slx45-3csg324) FPGA. The proposed architecture consists of Data Format Conversion, Addition, Subtraction, Multiplier Equivalent and D-FF blocks. The non-separable equations of the 8-point DIT-FFT are derived from the corresponding Butterfly Diagram and then implemented using basic logic gates, which optimises hardware utilisation with the help of the Complex Conjugate property. The constant multiplications present in the non-separable DIT-FFT equations are implemented through Adders and Shifters in the Multiplier Equivalent block, which further optimises the overall hardware utilisation. Moreover, the Q-format is used to increase the data accuracy of the architecture. The comparison results show that the proposed architecture outperforms existing designs in several respects. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
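An 8-point FFT needs multiplication by 1/√2 for its non-trivial twiddle factors, and the abstract above says such constant multiplications are built from adders and shifters. A minimal sketch of that idea follows. The particular shift set (1/2 + 1/8 + 1/16 + 1/64 + 1/256 = 0.70703125) is an assumption chosen for this example, not the paper's Multiplier Equivalent block.

```python
def mul_inv_sqrt2(x):
    """Approximate x / sqrt(2) for a non-negative integer x using shifts and adds only."""
    return (x >> 1) + (x >> 3) + (x >> 4) + (x >> 6) + (x >> 8)


approx = mul_inv_sqrt2(1024)   # exact value is about 724.08
```

Five shifted terms give a relative error of roughly 0.01 percent before truncation; fewer terms save adders at the cost of accuracy.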
40. Optimizing Hardware Resource Utilization for Accelerating the NTRU-KEM Algorithm
- Author
-
Yongseok Lee, Jonghee Youn, Kevin Nam, Hyunyoung Oh, and Yunheung Paek
- Subjects
post-quantum security ,NTRU ,key encapsulation mechanism ,hardware architecture ,ASIC ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
This paper focuses on enhancing the performance of the Nth-degree truncated-polynomial ring units key encapsulation mechanism (NTRU-KEM) algorithm, which ensures post-quantum resistance in the field of key establishment cryptography. The NTRU-KEM, while robust, suffers from increased storage and computational demands compared to classical cryptography, leading to significant memory and performance overheads. In environments with limited resources, the negative impacts of these overheads are more noticeable, leading researchers to investigate ways to speed up processes while also ensuring they are efficient in terms of area utilization. To address this, our research carefully examines the detailed functions of the NTRU-KEM algorithm, adopting a software/hardware co-design approach. This approach allows for customized computation, adapting to the varying requirements of operational timings and iterations. The key contribution is the development of a novel hardware acceleration technique focused on optimizing bus utilization. This technique enables parallel processing of multiple sub-functions, enhancing the overall efficiency of the system. Furthermore, we introduce a unique integrated register array that significantly reduces the spatial footprint of the design by merging multiple registers within the accelerator. In experiments conducted, the results of our work were found to be remarkable, with a time-area efficiency achieved that surpasses previous work by an average of 25.37 times. This achievement underscores the effectiveness of our optimization in accelerating the NTRU-KEM algorithm.
- Published
- 2023
- Full Text
- View/download PDF
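The dominant arithmetic in NTRU-style schemes is polynomial multiplication in a truncated ring. As a reference point for the abstract above, the sketch below multiplies two polynomials in Z_q[x]/(x^N − 1) by cyclic convolution. It is a plain software model with toy parameters and assumed names, not the accelerator design described in the paper.

```python
def ring_multiply(a, b, q):
    """Multiply coefficient lists a and b (lowest degree first) modulo x^N - 1 and q."""
    n = len(a)
    assert len(b) == n, "both operands must have N coefficients"
    c = [0] * n
    for i, ai in enumerate(a):
        if ai == 0:
            continue                       # sparse operands skip work
        for j, bj in enumerate(b):
            k = (i + j) % n                # x^N wraps around to 1
            c[k] = (c[k] + ai * bj) % q
    return c


# (1 + x) * x^2 = x^2 + x^3 = 1 + x^2 in Z_7[x]/(x^3 - 1)
product = ring_multiply([1, 1, 0], [0, 0, 1], 7)
```

Each output coefficient is an independent sum of products, which is why this operation parallelises well in hardware.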
41. A Survey on Swarm Robotics for Area Coverage Problem
- Author
-
Dena Kadhim Muhsen, Ahmed T. Sadiq, and Firas Abdulrazzaq Raheem
- Subjects
swarm robotics ,area coverage ,hardware architecture ,swarm robotics algorithms ,Industrial engineering. Management engineering ,T55.4-60.8 ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
The area coverage problem solution is one of the vital research areas which can benefit from swarm robotics. The greatest challenge to the swarm robotics system is to complete the task of covering an area effectively. Many domains where area coverage is essential include exploration, surveillance, mapping, foraging, and several other applications. This paper introduces a survey of swarm robotics in area coverage research papers from 2015 to 2022 regarding the algorithms and methods used, hardware, and applications in this domain. Different types of algorithms and hardware were dealt with and analysed; according to the analysis, the characteristics and advantages of each of them were identified, and we determined their suitability for different applications in covering the area for many goals. This study demonstrates that naturally inspired algorithms have the most significant role in swarm robotics for area coverage compared to other techniques. In addition, modern hardware has more capabilities suitable for supporting swarm robotics to cover an area, even if the environment is complex and contains static or dynamic obstacles.
- Published
- 2023
- Full Text
- View/download PDF
42. Hardware-Assisted Cross-Generation Prediction of GPUs Under Design
- Author
-
O’Neal, Kenneth, Brisk, Philip, Shriver, Emily, and Kishinevsky, Michael
- Subjects
Bioengineering ,Graphics processors ,hardware architecture ,machine learning ,modeling techniques ,performance of systems ,Electrical and Electronic Engineering ,Computer Hardware ,Computer Hardware & Architecture - Abstract
This paper introduces a predictive modeling framework for GPU performance. The key innovation underlying this approach is that performance statistics collected from representative workloads running on current generation GPUs can effectively predict the performance of next-generation GPUs. This is useful when simulators are available for the next-generation device, but simulation times are exorbitant, rendering early design space exploration of microarchitectural parameters and other features infeasible. When predicting performance across three Intel GPU generations (Haswell, Broadwell, Skylake), our models achieved impressively low out-of-sample-errors ranging from 7.45% to 8.91%, while running 29 481 to 44 214 times faster than cycle-accurate simulations. A detailed ranking of the most impactful features selected for these models provides an insight as to which microarchitectural subsystems have the greatest impact on performance from one generation to the next.
- Published
- 2019
43. Hardware-Assisted Cross-Generation Prediction of GPUs under Design
- Author
-
O'Neal, K, Brisk, P, Shriver, E, and Kishinevsky, M
- Subjects
Graphics processors ,hardware architecture ,machine learning ,modeling techniques ,performance of systems ,Bioengineering ,Computer Hardware & Architecture ,Electrical and Electronic Engineering ,Computer Hardware - Abstract
This paper introduces a predictive modeling framework for GPU performance. The key innovation underlying this approach is that performance statistics collected from representative workloads running on current generation GPUs can effectively predict the performance of next-generation GPUs. This is useful when simulators are available for the next-generation device, but simulation times are exorbitant, rendering early design space exploration of microarchitectural parameters and other features infeasible. When predicting performance across three Intel GPU generations (Haswell, Broadwell, Skylake), our models achieved impressively low out-of-sample-errors ranging from 7.45% to 8.91%, while running 29 481 to 44 214 times faster than cycle-accurate simulations. A detailed ranking of the most impactful features selected for these models provides an insight as to which microarchitectural subsystems have the greatest impact on performance from one generation to the next.
- Published
- 2019
44. Design, integration and implementation of crypto cores in an SoC environment
- Author
-
Pandey, Jai Gopal, Gupta, Sanskriti, and Karmakar, Abhijit
- Published
- 2022
- Full Text
- View/download PDF
45. Development and Validation of Embedded System Architecture for Shallow-Water Based H-AUV.
- Author
-
Shaik, Shakeera, Vandavasi, Bala Naga Jyothi, Narayanaswamy, Vedachalam, and Venkataraman, Hrishikesh
- Subjects
GRAPHICAL user interfaces ,AUTONOMOUS underwater vehicles ,TECHNOLOGICAL innovations ,KALMAN filtering ,MOVING average process ,AUTONOMOUS vehicles ,NAVIGATION - Abstract
Autonomous underwater vehicles (AUVs) have gained enormous popularity over the years and are employed extensively in various industries, including bio-research, subsea industries, and military applications. Most of the available commercial AUVs are very expensive and complex. This makes it unsuitable to be used for civilian applications. On the other hand, recent technological advancements have made it possible to have highly capable sensors at acceptable prices as an alternative to expensive commercial vehicles. In this paper, an embedded system-based multi-sensor hardware architecture for an H-configured AUV, called H-AUV, is designed and developed with low-cost, power-efficient, real-time controllers with a small footprint for data acquisition and sensors. These parameters play an important role for designing and developing energy-efficient autonomous vehicles. Significantly, auto navigation is a very important mechanism for AUVs, which includes auto heading control and depth keeping control techniques developed, deployed, and tested in the H-AUV. Additionally, the denoise filters such as moving average, exponential, and dynamic linear Kalman filter (KF) have been exercised and validated for heading and depth control of an H-AUV. It was found that the dynamic KF is very efficient and performs with 88% accuracy in the heading and depth control mechanisms. The KF is also found to perform with 98% accuracy for surface navigation of the H-AUV. Finally, an indigenous graphical user interface has been developed for data telemetry, command, and logging in autonomous and manual modes through wired and wireless communication. The proposed development and validation of an efficient and low-cost H-AUV shall support academic researchers for subsea applications. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
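The three denoising filters named in the abstract above (moving average, exponential, and Kalman) can be compared on a scalar signal such as heading or depth. The sketch below is a minimal one-dimensional version with a constant-state model. The noise variances `q` and `r`, the initial values and the function names are assumptions for this example; the paper's dynamic linear KF is not specified in the abstract.

```python
def moving_average(samples, window):
    """Mean of the most recent `window` samples (fewer at the start)."""
    out = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def exponential_filter(samples, alpha):
    """First-order smoothing of a non-empty sequence: est = alpha*z + (1 - alpha)*est."""
    out, est = [], samples[0]
    for z in samples:
        est = alpha * z + (1 - alpha) * est
        out.append(est)
    return out


def kalman_1d(samples, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a constant-state model.

    q: assumed process noise variance, r: assumed measurement noise variance.
    """
    x, p, out = x0, p0, []
    for z in samples:
        p = p + q                 # predict: state unchanged, uncertainty grows
        k = p / (p + r)           # Kalman gain
        x = x + k * (z - x)       # update with the measurement residual
        p = (1 - k) * p
        out.append(x)
    return out


depth_estimates = kalman_1d([5.0] * 50, q=1e-4, r=1.0)
```

Unlike the two fixed-coefficient filters, the Kalman gain adapts over time: it starts large while the estimate is uncertain and shrinks as confidence grows.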
46. Algorithm and Hardware Co-Design of Energy-Efficient LSTM Networks for Video Recognition With Hierarchical Tucker Tensor Decomposition.
- Author
-
Gong, Yu, Yin, Miao, Huang, Lingyi, Deng, Chunhua, and Yuan, Bo
- Subjects
- *
PARTICIPATORY design , *ENERGY consumption , *ALGORITHMS , *HARDWARE - Abstract
Long short-term memory (LSTM) is a type of powerful deep neural network that has been widely used in many sequence analysis and modeling applications. However, the large model size of LSTM networks makes their practical deployment very challenging, especially for video recognition tasks that require high-dimensional input data. Aiming to overcome this limitation and fully unlock the potential of LSTM models, in this paper we propose to perform algorithm and hardware co-design towards high-performance energy-efficient LSTM networks. At the algorithm level, we propose to develop fully decomposed hierarchical Tucker (FDHT) structure-based LSTM, namely FDHT-LSTM, which enjoys ultra-low model complexity while still achieving high accuracy. In order to fully reap such attractive algorithmic benefit, we further develop the corresponding customized hardware architecture to support the efficient execution of the proposed FDHT-LSTM model. With the delicate design of the memory access scheme, the complicated matrix transformation can be efficiently supported by the underlying hardware without any access conflict in an on-the-fly way. Our evaluation results show that both the proposed ultra-compact FDHT-LSTM models and the corresponding hardware accelerator achieve very high performance. Compared with the state-of-the-art compressed LSTM models, FDHT-LSTM enjoys both order-of-magnitude reduction (more than 1000×) in model size and significant accuracy improvement (0.6% to 12.7%) across different video recognition datasets. Meanwhile, compared with the state-of-the-art tensor decomposed model-oriented hardware TIE, our proposed FDHT-LSTM architecture achieves 2.5×, 1.46× and 2.41× increase in throughput, area efficiency and energy efficiency, respectively, on the LSTM-Youtube workload. For the LSTM-UCF workload, our proposed design also outperforms TIE with 1.9× higher throughput, 1.83× higher energy efficiency and comparable area efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
47. EGCN: An Efficient GCN Accelerator for Minimizing Off-Chip Memory Access.
- Author
-
Han, Yunki, Park, Kangkyu, Jung, Youngbeom, and Kim, Lee-Sup
- Subjects
- *
DYNAMIC random access memory , *REPRESENTATIONS of graphs , *MATRIX multiplications , *MEMORY , *RANDOM access memory , *ENERGY consumption - Abstract
As Graph Convolutional Networks (GCNs) have emerged as a promising solution for graph representation learning, designing specialized GCN accelerators has become an important challenge. An analysis of GCN workloads shows that the main bottleneck of GCN processing is not computation but the memory latency of intensive off-chip data transfer. Therefore, minimizing off-chip data transfer is the primary challenge for designing an efficient GCN accelerator. To address this challenge, optimization is initialized by considering GCNs as tiled matrix multiplication. In this paper, we optimize off-chip memory access from both the in- and out-of-tile perspectives. From the out-of-tile perspective, we find optimal tile configurations of given datasets and on-chip buffer capacity, then observe the dataflow across phases and layers. Inter-layer phase fusion dataflow with optimal tile configuration reduces data transfer of intermediate outputs. From the in-tile perspective, due to the sparsity of tiles, tiles have redundant data which does not participate in computation. Redundant data load is eliminated with hardware support. Finally, we introduce an efficient GCN inference accelerator, EGCN, specialized for minimizing off-chip memory access. EGCN achieves 41.9% off-chip DRAM access reduction, 1.49× speedup, and 1.95× energy efficiency improvement on average over the state-of-the-art accelerators. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
48. Parallel field programmable gate array implementation of the sum of absolute differences algorithm used in the stereoscopic system.
- Author
-
Sejai, Mohamed, Mansouri, Anass, Dosse, Saad Bennani, and Ruichek, Yassine
- Subjects
-
*FIELD programmable gate arrays, *STEREO vision (Computer science), *ALGORITHMS, *PARALLEL processing, *IMAGING systems
- Abstract
Stereo vision is a popular method for artificial vision-based environment perception, used in various applications such as intelligent transportation. With two cameras, the disparity map is calculated to find the distance and depth of objects in front of a moving vehicle. The key element of the stereoscopic system is the sum of absolute differences (SAD) algorithm, which is the most frequently repeated operation in the stereo matching subsystem. However, this algorithm is very time-consuming: statistical analysis shows that the SAD block can consume more than 80% of the overall processing time of the algorithm. In this paper we propose a highly efficient hardware architecture for the SAD algorithm for real-time stereo matching. The proposed architecture is built as a hierarchical parallel architecture of the SAD block, verified by simulation, and successfully implemented on a Cyclone IV field programmable gate array (FPGA). It provides a significant reduction in processing time, and the stereo imaging system is able to achieve 30 frames per second on 640×480 resolution color images. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
49. Omega-Test: A Predictive Early-Z Culling to Improve the Graphics Pipeline Energy-Efficiency.
- Author
-
Corbalan-Navarro, David, Aragon, Juan L., Anglada, Marti, de Lucas, Enrique, Parcerisa, Joan-Manuel, and Gonzalez, Antonio
- Subjects
RENDERING (Computer graphics), GRAPHICS processing units, ENERGY consumption, PROBLEM solving
- Abstract
The most common task of GPUs is to render images in real time. When rendering a 3D scene, a key step is to determine which parts of every object are visible in the final image. There are different approaches to solving the visibility problem, the Z-Test being the most common. A main factor that significantly penalizes the energy efficiency of a GPU, especially in the mobile arena, is the so-called overdraw, which happens when a portion of an object is shaded and rendered but finally occluded by another object. This useless work results in a waste of energy; however, a conventional Z-Test only avoids a fraction of it. In this article we present a novel microarchitectural technique, the Omega-Test, to drastically reduce the overdraw on a Tile-Based Rendering (TBR) architecture. Graphics applications have a great degree of inter-frame coherence, which makes the output of a frame very similar to the previous one. The proposed approach leverages this frame-to-frame coherence by using the information resulting from the Z-Test for a tile (a buffer containing all the calculated pixel depths for the tile), which is discarded by current GPUs, to predict the visibility of the same tile in the next frame. As a result, the Omega-Test identifies occluded parts of the scene early and avoids rendering non-visible surfaces, eliminating costly computations and off-chip memory accesses. Our experimental evaluation shows average EDP savings in the overall GPU/Memory system of 26.4 percent and an average speedup of 16.3 percent for the evaluated benchmarks. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
50. Monitoring, Recording and Diagnosis Designed Hardware Structure for a Data Acquisition Module Running in Electro-energetic Environment
- Author
-
ALBIŢA Anca, SELIŞTEANU Dan, and MĂMULEANU Mădălin
- Subjects
data acquisition, synchronous sampling, single board computer, hardware architecture, serial communication interfaces, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Applications for monitoring, recording and diagnosis in electro-energetics aim to increase operational safety, for installations and equipment alike, through fast failure detection and prevention of critical situations. This application area calls for the design and implementation of a data acquisition module for industrial environments with several basic technical features: acquisition of at least 8 analog electrical signals with settable parameters, synchronous sampling for all inputs, 14- or 16-bit resolution for analog-to-digital conversion, and temporary data storage with on-request transfer to a computer for saving in a database. The data acquisition unit is managed by firmware that launches automatically at power-on. The module provides the acquired data needed for further processing and for the analysis of complex operating regimes and deliberately generated phenomena specific to electro-energetic installations. The presented data acquisition module has an open structure, in terms of both hardware architecture and the storage resources for local processing and communication, and is thus regarded as an optimal solution.
- Published
- 2022