792 results on '"Reconfigurable hardware"'
Search Results
2. NLU-V: A Family of Instruction Set Extensions for Efficient Symmetric Cryptography on RISC-V.
- Author
Uzuner, Hakan and Kavun, Elif Bilge
- Subjects
*BLOCK ciphers, *CRYPTOGRAPHY
- Abstract
Cryptographic primitives nowadays are not only implemented in high-performance systems but also in small-scale systems, which are increasingly powered by open-source processors, such as RISC-V. In this work, we leverage RISC-V's modular base instruction set and architecture to propose a generic instruction set extension (ISE) for symmetric cryptography. We adapt the work from Engels et al. in ARITH'13, the non-linear/linear instruction set extension (NLU), which presents a generic hardware/software co-design solution for efficient symmetric crypto implementations through a hardware unit extending the 8-bit AVR instruction set. These new instructions realize non-linear and linear layers, which are widely used to implement the block ciphers in symmetric cryptography. Our proposal modifies and extends the NLU instructions to a 32-bit RISC-V architecture; hence, we call the proposed ISE 'NLU-V'. The proposed architecture is integrated into the open-source RISC-V implementation 'Icicle' and synthesized on a Xilinx Kintex-7 XC7K160T FPGA. The area overhead for the proposed NLU-V ISE is 1088 slice registers and 4520 LUTs. As case studies, the PRESENT and AES block ciphers are implemented using the new ISE on RISC-V in assembly. Our evaluation metric to showcase the performance gain, Z 'time-area-product (TAP)' (the execution time in clock cycles times code memory consumption), reflects the impact of the proposed family of instructions on the performance of the cipher implementations. The simulations show that the NLU-V achieves 89% gain for PRESENT and 68% gain for AES. Further, the NLU-V requires 44% less lines of code for the PRESENT and 23% less for the AES implementation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
3. Yin-Yang: Programming Abstractions for Cross-Domain Multi-Acceleration.
- Author
Yatham, Brahmendra, Wang, Shu-Ting, Kim, Dohee, Sarikhani, Parisa, Mahmoudi, Babak, Mahajan, Divya, Park, Jongse, Esmaeilzadeh, Hadi, Kim, Joon, Ahn, Byung, Kinzer, Sean, Ghodrati, Soroush, and Mahapatra, Rohan
- Subjects
Compilers, Hardware/Software Interfaces, Heterogeneous (Hybrid) Systems, Reconfigurable Hardware, Runtime Environments
- Abstract
FPGA accelerators offer performance and efficiency gains by narrowing the scope of acceleration to one algorithmic domain. However, real-life applications are often not limited to a single domain, which naturally makes Cross-Domain Multi-Acceleration a crucial next step. The challenge is, existing FPGA accelerators are built upon their specific vertically-specialized stacks, which prevents utilizing multiple accelerators from different domains. To that end, we propose a pair of dual abstractions, called Yin-Yang, which work in tandem and enable programmers to develop cross-domain applications using multiple accelerators on a FPGA. The Yin abstraction enables cross-domain algorithmic specification, while the Yang abstraction captures the accelerator capabilities. We also develop a dataflow virtual machine, dubbed XLVM, that transparently maps domain functions (Yin) to best-fit accelerator capabilities (Yang). With six real-world cross-domain applications, our evaluations show that Yin-Yang unlocks 29.4× speedup, while the best single-domain acceleration achieves 12.0×.
- Published
- 2022
4. FPGA-based ML adaptive accelerator: A partial reconfiguration approach for optimized ML accelerator utilization
- Author
Achraf El Bouazzaoui, Abdelkader Hadjoudja, Omar Mouhib, and Nazha Cherkaoui
- Subjects
Field programmable gate array, Partial reconfiguration, Reconfigurable hardware, Machine learning, Dynamic classifier selection, Computer engineering. Computer hardware, TK7885-7895, Electronic computers. Computer science, QA75.5-76.95
- Abstract
The relentless increase in data volume and complexity necessitates advancements in machine learning methodologies that are more adaptable. In response to this challenge, we present a novel architecture enabling dynamic classifier selection on FPGA platforms. This unique architecture combines hardware accelerators of three distinct classifiers (Support Vector Machines, K-Nearest Neighbors, and Deep Neural Networks) without requiring the combined area footprint of those implementations. It further introduces a hardware-based Accelerator Selector that dynamically selects the most fitting classifier for incoming data based on the K-Nearest Centroid approach. When tested on four different datasets, our architecture demonstrated improved classification performance, with an accuracy enhancement of up to 8% compared to the software implementations. Besides this enhanced accuracy, it achieved a significant reduction in resource usage, with a decrease of up to 45% compared to a static implementation, making it highly efficient in terms of resource utilization and energy consumption on FPGA platforms and paving the way for scalable ML applications. To the best of our knowledge, this work is the first to harness FPGA platforms for dynamic classifier selection.
- Published
- 2024
5. Embedded device for digitalizing monitor and error signals in Mössbauer Spectroscopy.
- Author
Oliva, Matías J., Pasquevich, Gustavo A., and Veiga, Alejandro L.
- Subjects
*MOSSBAUER spectroscopy, *DIGITAL technology, *ELECTRONIC equipment, *SOFTWARE development tools, *VELOCITY
- Abstract
In a previous work, we proposed a method to optimize the channel velocity relationship of the spectra and improve laboratory times in the presence of calibration changes. In addition to the usual method, which involves measuring a reference spectrum, we proposed incorporating the recording of the Monitor (velocity reference) and Error signals available in the drive unit into the calibration process. We have verified that this technique substantially improves laboratory times and more efficiently accounts for nonlinearities in the channel velocity relationship. In that work, we showed that it is possible to use a digital 8-bit oscilloscope to record the signals. However, given the precision with which the measurement must be performed, this implies a laborious and time-consuming procedure. To simplify the operation in the laboratory, we present the specification, design, and implementation of an electronic device dedicated to this task, based on open hardware and open software tools. Its incorporation does not require additional lab time, while the growth in data volume can be handled with current networking technologies. Full details for a straightforward implementation are provided, and the quality of the signal recordings is compared with that obtained in our previous work. The result is an efficient and easy-to-use method, which uses standard, available, and low-cost devices with an open hardware and software philosophy. [ABSTRACT FROM AUTHOR]
- Published
- 2023
6. Effects of Runtime Reconfiguration on PUFs Implemented as FPGA-Based Accelerators.
- Author
Nassar, Hassan, Bauer, Lars, and Henkel, Jorg
- Abstract
Physical unclonable functions (PUFs) are a handy security primitive for resource-constrained devices. They offer an alternative to the resource-intensive classical hash algorithms. Using the IC differences resulting from the fabrication process, PUFs give device-specific outputs (responses) when given the same inputs (challenges). Hence, without using a device-specific key, PUFs can generate device-specific responses. FPGAs are one of the platforms that are heavily studied as a candidate for PUF implementation. The idea is that a PUF that is designed as an HDL code can be used as part of the static design or as a dynamic accelerator. Previous works studied PUF implementation as part of the static design. In contrast to the state-of-the-art, this letter studies PUFs when used as runtime reconfigurable accelerators. In this letter, we find that not all regions of an FPGA are equally suitable for implementing different PUF types. Regions, where clock routing resources exist, are the worst suited for PUF implementation. Moreover, we find out that for certain PUF types, the property of dynamic partial reconfiguration can lead to performance degradation if not applied carefully. When static routing passing through the region increases, the PUF performance degrades significantly. [ABSTRACT FROM AUTHOR]
- Published
- 2023
7. Hardware Trojan Insertion
- Author
Tehranipoor, Mark, Anandakumar, N. Nalla, and Farahmandi, Farimah
- Published
- 2023
8. A Case for Low-Cost Personal Electronic Laboratory Equipment Using FPGAs.
- Author
Adegbite, Timothy Olanrewaju and Akinwale, Olawale Babatunde
- Subjects
ELECTRONIC equipment, FIELD programmable gate arrays, LABORATORY equipment & supplies, ELECTRIC potential measurement, MEASUREMENT errors
- Abstract
The field of reconfigurable computing is gaining a large following, and several use cases have been developed for it. At the centre of reconfigurable computing is the field programmable gate array (FPGA), due to its computational speed and versatility. The goal of the work reported here was to show that a single FPGA board paired with a computer monitor can be used as the sole laboratory equipment in a cash-strapped educational institution or by an individual. A Terasic DE1-SoC board was programmed as an oscilloscope and a digital multimeter. In keeping with the low-cost theme of this work, no external signal conditioning circuit was used and the on-board LTC2308 ADC was used for signal acquisition. At frequencies below 15 kHz, the voltage measurements of the developed FPGA lab instrument had a mean error of 58 mV. The voltage measurement errors, however, increased with frequency and were significant when the signal frequencies exceeded 100 kHz. In terms of the use of the FPGA to replace multiple lab instruments, 13% of the DSPs and 80% of the adaptive logic modules on the FPGA were used for the implementation. We therefore demonstrate that with $300, multiple pieces of laboratory equipment can be replaced by a single FPGA board and a monitor. [ABSTRACT FROM AUTHOR]
- Published
- 2023
9. NLU-V: A Family of Instruction Set Extensions for Efficient Symmetric Cryptography on RISC-V
- Author
Hakan Uzuner and Elif Bilge Kavun
- Subjects
symmetric cryptography, block ciphers, instruction set extension, RISC-V, reconfigurable hardware, FPGA, Technology
- Abstract
Cryptographic primitives nowadays are not only implemented in high-performance systems but also in small-scale systems, which are increasingly powered by open-source processors, such as RISC-V. In this work, we leverage RISC-V’s modular base instruction set and architecture to propose a generic instruction set extension (ISE) for symmetric cryptography. We adapt the work from Engels et al. in ARITH’13, the non-linear/linear instruction set extension (NLU), which presents a generic hardware/software co-design solution for efficient symmetric crypto implementations through a hardware unit extending the 8-bit AVR instruction set. These new instructions realize non-linear and linear layers, which are widely used to implement the block ciphers in symmetric cryptography. Our proposal modifies and extends the NLU instructions to a 32-bit RISC-V architecture; hence, we call the proposed ISE ‘NLU-V’. The proposed architecture is integrated into the open-source RISC-V implementation ‘Icicle’ and synthesized on a Xilinx Kintex-7 XC7K160T FPGA. The area overhead for the proposed NLU-V ISE is 1088 slice registers and 4520 LUTs. As case studies, the PRESENT and AES block ciphers are implemented using the new ISE on RISC-V in assembly. Our evaluation metric to showcase the performance gain, Z ‘time-area-product (TAP)’ (the execution time in clock cycles times code memory consumption), reflects the impact of the proposed family of instructions on the performance of the cipher implementations. The simulations show that the NLU-V achieves 89% gain for PRESENT and 68% gain for AES. Further, the NLU-V requires 44% less lines of code for the PRESENT and 23% less for the AES implementation.
- Published
- 2024
10. Pre-processing Block Hardware Architecture in Image Processing Using Reconfigurable Platform
- Author
Chiranjeevi, G. N., Kulkarni, Subhash, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Chen, Joy Iong-Zong, editor, Tavares, João Manuel R. S., editor, Iliyasu, Abdullah M., editor, and Du, Ke-Lin, editor
- Published
- 2022
11. FPGA-based Learning Acceleration for LSTM Neural Network.
- Author
Dec, Grzegorz Rafał
- Subjects
*ALGORITHMS, *PYTHON programming language, *SPEED, *GRAPHICS processing units
- Abstract
This paper presents and discusses the implementation of a learning accelerator for an LSTM neural network that utilizes an FPGA. The accelerator consists of a backpropagation through time algorithm for an LSTM. The presented net performs a binary classification task and consists of an LSTM and a dense layer. The performance is then compared to both a hard-coded Python implementation and an implementation using Keras library and the GPU. The implementation is executed using the DSP blocks, available via the Vivado Design Suite, which is in compliance with the IEEE754 standard. The results of the simulation show that the FPGA implementation remains accurate and achieves higher speed than the other solutions. [ABSTRACT FROM AUTHOR]
- Published
- 2023
12. A hardware accelerated system for high throughput cellular image analysis
- Author
Lee, Dajung, Mehta, Nirja, Shearer, Alexandria, and Kastner, Ryan
- Subjects
Networking and Information Technology R&D (NITRD), Stem Cell Research - Nonembryonic - Non-Human, Biotechnology, Substance Misuse, Drug Abuse (NIDA only), Bioengineering, Stem Cell Research, Hardware acceleration, Biomedical image analysis, Cytometry, Reconfigurable hardware, FPGA, High throughput, Computer Software, Distributed Computing
- Abstract
Imaging flow cytometry and high speed microscopy have shown immense promise for clinical diagnostics, biological research, and drug discovery. They enable high throughput screening and sorting using biological, chemical, or mechanical properties of cells. These techniques can separate mature cells from immature ones, determine the presence of cancerous cells, classify stem cells during differentiation, and screen drugs based upon how they affect cellular architecture. The process works by imaging cells at a high rate, extracting features of the cell (e.g., size, location, circularity, deformation), and using those features to classify the cell. Modern systems have a target throughput of thousands of cells per second, which requires imaging at rates of more than 60,000 frames per second. The cellular features must be calculated in less than a millisecond to enable real-time sorting. This creates challenging computing performance constraints in terms of both throughput and latency. In this paper, we present a hardware accelerated system for high throughput cellular image analysis. We carefully developed algorithms and their corresponding hardware implementations to meet the strict computational demands. Our algorithm analyzes and extracts cellular morphological features from low resolution microscopic images. Our hardware accelerated system operates at over 60,000 frames per second with 0.068 ms latency. This is almost 1400× faster in throughput than similar software based analysis and 335× better in terms of latency.
- Published
- 2018
13. Reconfigurability for Static Camouflaging
- Author
Rangarajan, Nikhil, Patnaik, Satwik, Knechtel, Johann, Rakheja, Shaloo, and Sinanoglu, Ozgur
- Published
- 2021
14. Temporal Accelerators: Unleashing the Potential of Embedded FPGAs
- Author
Christopher Cichiwskyj and Gregor Schiele
- Subjects
IoT, Embedded, FPGA, Reconfigurable Hardware, Electronic computers. Computer science, QA75.5-76.95
- Abstract
When the complexity of a problem rises, its solution requires more hardware resources. A usual way to solve this is to use larger processors and add more memory. When using Field Programmable Gate-Arrays (FPGAs), which can instantiate arbitrary circuit designs, a larger, more costly and power hungry chip is used. In this paper we propose a different approach, namely to split the problem into a graph of interdependent smaller tasks and to reconfigure a small FPGA during runtime to execute each of these tasks efficiently sequentially. This can result in cheaper and more energy efficient systems that can execute very complex problems locally. We present a basic analytical model, evaluate its accuracy and discuss initial insight from it.
- Published
- 2021
15. Bypassing Multicore Memory Bugs With Coarse-Grained Reconfigurable Logic.
- Author
Lee, Doowon and Bertacco, Valeria
- Subjects
*CACHE memory, *FINITE state machines, *ARM microprocessors, *MEMORY, *SYSTEMS design, *LOGIC
- Abstract
Multicore systems deploy sophisticated memory hierarchies to improve memory operations’ throughput and latency by exploiting multiple levels of cache hierarchy and several complex memory-access instructions. As a result, the functional verification of the memory subsystem is one of the most challenging tasks in the overall system design effort, leading to many bugs in the released product. In this work, we propose MemPatch, a novel reconfigurable hardware solution to bypass such escaped bugs. To design MemPatch, we first analyzed publicly available errata documents and classified memory-related bugs by root cause and symptoms. We then leveraged that learning to design a specialized, reconfigurable detection fabric, implementing finite state machines that can model the bug-triggering events at the microarchitectural level. Finally, we complemented this detection logic with hardware offering multiple bug-bypassing options. Our evaluation of MemPatch mapped a multicore RISC-V out-of-order processor, augmented with our logic, to a Xilinx ZCU102 FPGA board. When configured to detect up to 32 distinct bugs, MemPatch entails 7.6% area and 7.3% power overheads. An estimate on a commercial ARM Cortex-A57 processor target indicates that the area overhead would be much lower, 1.0%. The performance impact was found to be no more than 1% in all cases. [ABSTRACT FROM AUTHOR]
- Published
- 2022
16. Yin-Yang: Programming Abstractions for Cross-Domain Multi-Acceleration.
- Author
Kim, Joon Kyung, Ahn, Byung Hoon, Kinzer, Sean, Ghodrati, Soroush, Mahapatra, Rohan, Yatham, Brahmendra, Wang, Shu-Ting, Kim, Dohee, Sarikhani, Parisa, Mahmoudi, Babak, Mahajan, Divya, Park, Jongse, and Esmaeilzadeh, Hadi
- Subjects
*FIELD programmable gate arrays, *GATE array circuits, *DEEP brain stimulation
- Abstract
Field-programmable gate array (FPGA) accelerators offer performance and efficiency gains by narrowing the scope of acceleration to one algorithmic domain. However, real-life applications are often not limited to a single domain, which naturally makes Cross-Domain Multi-Acceleration a crucial next step. The challenge is, existing FPGA accelerators are built upon their specific vertically specialized stacks, which prevents utilizing multiple accelerators from different domains. To that end, we propose a pair of dual abstractions, called Yin-Yang, which work in tandem and enable programmers to develop cross-domain applications using multiple accelerators on a FPGA. The Yin abstraction enables cross-domain algorithmic specification, while the Yang abstraction captures the accelerator capabilities. We also developed a dataflow virtual machine, dubbed Accelerator-Level Virtual Machine (XLVM), which transparently maps domain functions (Yin) to best-fit accelerator capabilities (Yang). With six real-world cross-domain applications, our evaluations show that Yin-Yang unlocks 29.4× speedup, while the best single-domain acceleration achieves 12.0×. [ABSTRACT FROM AUTHOR]
- Published
- 2022
17. Design of RISC Processor with IEEE754 Standard Floating-Point Instruction Set in FPGA using VHDL for Digital Signal Processing Applications.
- Author
ÖZKILBAÇ, Bahadır and KARACALI, Tevhit
- Subjects
DIGITAL signal processing, PARALLEL processing, FLOATING-point arithmetic, ARTIFICIAL neural networks, ENERGY consumption
- Abstract
Copyright of Erzincan University Journal of Science & Technology is the property of Erzincan Binali Yildirim Universitesi and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2022
18. An Efficient Reconfigurable Encoder for the IEEE 1901 Standard.
- Author
Chen, Yuxing, Cui, Hangxuan, and Wang, Zhongfeng
- Subjects
FORWARD error correction, CARRIER transmission on electric lines, VIDEO coding, INTERNET of things
- Abstract
The IEEE 1901 standard for power line communication (PLC) enables simple connection among Internet of Things devices. The forward error correction (FEC) codes specified in the IEEE 1901 standard include low-density parity-check convolutional codes (LDPC-CCs) and Reed-Solomon convolutional concatenated (RSCC) codes. This work introduces an efficient reconfigurable encoder in full compliance with the IEEE 1901 standard. First, we propose a reconfigurable LDPC-CC encoder to fulfill the multirate requirement and improve the architecture by fine-tuned parallelization, which takes full advantage of the characteristics of the codeword structure. Then, for area reduction, the optimization regarding the RSCC encoder is extensively exploited. Moreover, the commonality between the encoders is discovered, and some circuitries are shared to reduce the hardware complexity. Equipped with these techniques, an efficient reconfigurable encoder for the IEEE 1901 standard is developed and implemented with 28-nm technology. Implementation results demonstrate that the proposed encoder can meet the throughput requirement of the IEEE 1901 standard and is both power- and area-efficient. [ABSTRACT FROM AUTHOR]
- Published
- 2022
19. Circuit-Variant Moving Target Defense for Side-Channel Attacks.
- Author
Mullins, Tristen, Baggett, Brandon, Andel, Todd R., and McDonald, J. Todd
- Abstract
The security of cryptosystems involves preventing an attacker's ability to obtain information about plaintext. Traditionally, this has been done by prioritizing secrecy of the key through complex key selection and secure key exchange. With the emergence of side-channel analysis (SCA) attacks, bits of a secret key may be derived by correlating key values with physical properties of cryptographic process execution. Information such as power consumption and electromagnetic (EM) radiation side-channel properties can be observed during encryption or decryption. These signals reflect data-dependent system behaviours that may reveal secret key information. Power and EM SCA attacks require several measurements of the target process to amplify the signal of interest, filter out noise, and derive the secret key through statistical analysis methods. Differential power and EM analysis attacks rely on correlating actual side-channel measurements to hypothetical models. The goal of this research is to increase the complexity of both power and EM SCA by introducing structural and spatial randomization of the target hardware. We propose a System-on-a-Chip (SOC) countermeasure that will periodically reconfigure an AES scheme using randomly located S-box circuit variants. We hypothesize that changing the location of the target modules between encryption runs will result in a nonconstant EM signal strength for any given point on the chip, increasing the number of traces needed to perform a localized EM SCA attack. Further, each of the S-box circuit variants will consist of functionally equivalent, structurally diverse hardware. By diversifying the implementations at the gate-level, we aim to vary the power behaviour observed by the attacker and disrupt the correlation between the hypothetical and actual power consumption, increasing the complexity of power SCA. 
This moving target defense aims to disrupt side-channel collection and correlation needed to successfully implement an attack. [ABSTRACT FROM AUTHOR]
- Published
- 2022
20. An IEC 61131-3-Based PLC Timers Module Implemented on FPGA Platform
- Author
Patel, Dhruv M., Shah, Ankit K., Shukla, Yagnesh B., Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Hirche, Sandra, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Möller, Sebastian, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zhang, Junjie James, Series Editor, Bindhu, V., editor, Chen, Joy, editor, and Tavares, João Manuel R. S., editor
- Published
- 2020
21. 'Software Reconfigurable Hardware' in IoT Student Training
- Author
Ursutiu, Doru, Samoila, Cornel, Neagu, Andrei, Florea, Aurelia, Chiricioiu, Adriana, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Auer, Michael E., editor, and Tsiatsos, Thrasyvoulos, editor
- Published
- 2020
22. RASHT: A Partially Reconfigurable Architecture for Efficient Implementation of CNNs.
- Author
Darbani, Paria, Rohbani, Nezam, Beitollahi, Hakem, and Lotfi-Kamran, Pejman
- Subjects
CONVOLUTIONAL neural networks, VERY large scale circuit integration, IMAGE processing, COMPUTER vision, ARRAY processing
- Abstract
Convolutional neural networks (CNNs) are widely used in machine learning (ML) applications such as image processing. CNNs require heavy computation to provide significant accuracy for many ML tasks. Therefore, implementing CNNs efficiently, so as to improve performance with limited resources and without reducing accuracy, is a challenge for ML systems. One of the architectures for the efficient execution of CNNs is the array-based accelerator, which consists of an array of similar processing elements (PEs). Array accelerators are popular as a high-performance architecture that exploits parallel computing and data reuse. These accelerators are optimized for a set of CNN layers, not for individual layers. Using the same accelerator dimensions to compute all CNN layers, which vary in shape and size, leads to resource underutilization. We propose a flexible and scalable architecture for array-based accelerators that increases resource utilization by resizing PEs to better match the different shapes of CNN layers. The low-cost partial reconfiguration improves resource utilization and performance, resulting in a 23.2% reduction in the computation time of GoogLeNet compared to state-of-the-art accelerators. The proposed architecture decreases the on-chip memory access rate by 26.5% with no accuracy loss. [ABSTRACT FROM AUTHOR]
- Published
- 2022
23. Realization of Deep Learning Based Embedded Soft Sensor for Bioprocess Application.
- Author
Krishna, V. V. S. Vijaya, Pappa, N., and Vasantharani, S. P. Joy
- Subjects
DEEP learning, SYSTEMS on a chip, DETECTORS, SIMULATION software, NONLINEAR systems
- Abstract
Industries use soft sensors to estimate output parameters that are difficult to measure on-line. These parameters can be determined by laboratory analysis, which is an offline task. Nowadays, designing soft sensors for complex nonlinear systems using deep learning training techniques has become popular because of their accuracy and robustness. There is a need to find suitable hardware for realizing soft sensors so that they are portable and can be used in place of a general-purpose PC. This paper proposes a new strategy for realizing a soft sensor using deep neural networks (DNNs) on appropriate hardware, referred to as an embedded soft sensor (ESS). The work focuses on developing an ESS for estimating lactose concentration in a simulated and experimental bioreactor using a DNN and realizing it on a Zynq-based System on Chip (SoC). A deep neural network with a certain number of hidden layers is developed for the process. The model parameters of the process are represented at the input layer, and the lactose concentration is considered at the output layer. The performance of the ESS has been observed for different numbers of hidden layers and different activation functions. The optimized neural network is then chosen for realization on hardware. The values obtained from hardware realization, software simulation, and laboratory analysis are compared. Output analysis shows that the values obtained through hardware realization are closer to the values obtained through laboratory analysis. From the results, it can be concluded that deep learning provides a better alternative to traditional techniques for realizing an ESS on hardware. The proposed work shows that if a sensor is unavailable for measuring a parameter, this ESS can be used to measure its values. Since this ESS is realized on reconfigurable hardware such as an SoC, it is portable and flexible for measuring values. [ABSTRACT FROM AUTHOR]
- Published
- 2022
24. The HERA Methodology: Reconfigurable Logic in General-Purpose Computing
- Author
Philipp Holzinger and Marc Reichenbach
- Subjects
Automatic synthesis, hardware/software interfaces, heterogeneous systems, reconfigurable hardware, virtual memory, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Due to the ongoing slowdown of Dennard scaling, heterogeneous hardware architectures are inevitable to meet the increasing demand for energy efficient systems. However, one of the most important aspects that shape today’s computing landscape is the wide availability of software that can run on any system. Current applications that use accelerators, in contrast, are often especially tailored to a specific hardware setup and therefore not universally deployable. This is particularly true for reconfigurable logic as their internal structure requires the circuits and their integration to be designed as well. This makes them inherently difficult to use and therefore less accessible for a general audience. Nevertheless, their balance of flexibility and efficiency puts reconfigurable accelerators in a unique position between CPUs, GPUs, and ASICs. Therefore, one of the main challenges of future heterogeneous systems is to foster collaborative computing between these vastly different components while still being simple to use. Previous approaches mostly focused on subproblems instead of a holistic view of hardware and software in the context of commonplace usability. This paper analyzes the general demands on a reconfigurable platform and derives their requirements regarding accessibility and security. Hereby, we investigate several key features like hardware virtualization, system shared virtual memory, and the use of wide-spread programming paradigms. Then, we systematically build up such a platform based on the established ROCm GPU framework and its internal HSA standard. This new common HERA methodology is finally also demonstrated as a prototype.
- Published
- 2021
- Full Text
- View/download PDF
25. A Survey of Network-Based Hardware Accelerators.
- Author
-
Skliarova, Iouliia
- Subjects
CENTRAL processing units ,GRAPHICS processing units ,ELECTRONIC data processing ,GATE array circuits ,HARDWARE - Abstract
Many practical data-processing algorithms fail to execute efficiently on general-purpose CPUs (Central Processing Units) due to the sequential nature of their operations and memory bandwidth limitations. To achieve desired performance levels, reconfigurable (FPGA (Field-Programmable Gate Array)-based) hardware accelerators are frequently explored that permit the processing units' architectures to be better adapted to the specific problem/algorithm requirements. In particular, network-based data-processing algorithms are very well suited to implementation in reconfigurable hardware because several data-independent operations can easily and naturally be executed in parallel over as many processing blocks as actually required and technically possible. GPUs (Graphics Processing Units) have also demonstrated good results in this area but they tend to use significantly more power than FPGAs, which could be a limiting factor in embedded applications. Moreover, GPUs employ a Single Instruction, Multiple Threads (SIMT) execution model and are therefore optimized for SIMD (Single Instruction, Multiple Data) operations, while in FPGAs fully custom datapaths can be built, eliminating much of the control overhead. This review paper aims to analyze, compare, and discuss different approaches to implementing network-based hardware accelerators in FPGA and programmable SoC (Systems-on-Chip). The performed analysis and the derived recommendations would be useful to hardware designers of future network-based hardware accelerators. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
26. Embedded wavelet image reconstruction in parallel computation hardware
- Author
-
Guevara Escobedo, Jorge, Ozanyan, Krikor, and Yin, Hujun
- Subjects
621.36 ,Wavelets ,Multiresolution Tomography ,Fast Tomography Reconstruction ,Radon Transform ,Filtered Backprojection ,Computed Tomography ,Local Tomography ,Parallel Tomography Reconstruction ,Embedded Tomography ,Reconfigurable Hardware ,FPGAs - Abstract
In this thesis an algorithm is demonstrated for the reconstruction of hard-field Tomography images through localized block areas, obtained in parallel and from a multiresolution framework. Block areas are subsequently tiled to put together the full size image. Given its properties to preserve its compact support after being ramp filtered, the wavelet transform has received to date much attention as a promising solution in radiation dose reduction in medical imaging, through the reconstruction of essentially localised regions. In this work, this characteristic is exploited with the aim of reducing the time and complexity of the standard reconstruction algorithm. Independently reconstructing block images, with a geometry that completely covers the reconstructed frame as a single output image, allows the individual blocks to be reconstructed in parallel and their performance to be evaluated in a multiprocessor hardware reconfigurable system (i.e. FPGA). Projection data from simulated Radon Transform (RT) was obtained at 180 evenly spaced angles. In order to define every relevant block area within the sinogram, forward RT was performed over template phantoms representing block frames. Reconstruction was then performed in a domain beyond the block frame limits, to allow calibration overlaps when fitting adjacent block images. The 256 by 256 Shepp-Logan phantom was used to test the methodology of both parallel multiresolution and parallel block reconstruction generalisations. It is shown that the reconstruction of a single block image in a 3-scale multiresolution framework performs around 48 times faster than the standard methodology. By assuming a parallel implementation, it can be implied that the reconstruction time of a single tile should be very closely related to the reconstruction time of the full size and resolution image.
- Published
- 2016
27. FPGA-based Neural Net for Failures Prediction in the Cold Forging Process.
- Subjects
- *
ARTIFICIAL neural networks , *PERSONAL computers , *METALWORK , *FORECASTING - Abstract
This paper presents and discusses the implementation of a deep neural network for the purpose of failure prediction in the cold forging process. The implementation consists of an LSTM and a dense layer implemented on an FPGA. The network was trained beforehand on a desktop computer using the Keras library for Python, and the weights and the biases were embedded into the implementation. The implementation is executed using the DSP blocks, available via Vivado Design Suite, which are in compliance with the IEEE 754 standard. The simulation of the network achieves 100% classification accuracy on the test data and high calculation speed. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
28. High-Level and Compact Design of Cross-Channel LTE DownLink Channel Encoder
- Author
-
Xu, Jieming, Leeser, Miriam, Akan, Ozgur, Series Editor, Bellavista, Paolo, Series Editor, Cao, Jiannong, Series Editor, Coulson, Geoffrey, Series Editor, Dressler, Falko, Series Editor, Ferrari, Domenico, Series Editor, Gerla, Mario, Series Editor, Kobayashi, Hisashi, Series Editor, Palazzo, Sergio, Series Editor, Sahni, Sartaj, Series Editor, Shen, Xuemin (Sherman), Series Editor, Stan, Mircea, Series Editor, Xiaohua, Jia, Series Editor, Zomaya, Albert Y., Series Editor, Moerman, Ingrid, editor, Marquez-Barja, Johann, editor, Shahid, Adnan, editor, Liu, Wei, editor, Giannoulis, Spilios, editor, and Jiao, Xianjun, editor
- Published
- 2019
- Full Text
- View/download PDF
29. Self-repairing Functional Unit Design in an Embedded Out-of-Order Processor Core
- Author
-
Sriraman, Harini, Pattabiraman, Venkatasubbu, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Bhatia, Sanjiv K., editor, Tiwari, Shailesh, editor, Mishra, Krishn K., editor, and Trivedi, Munesh C., editor
- Published
- 2019
- Full Text
- View/download PDF
30. A Reconfigurable Architecture for Implementing Locally Connected Neural Arrays
- Author
-
G-H-Cater, J. E., Clarke, C. T., Metcalfe, B. W., Wilson, P. R., Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Arai, Kohei, editor, Kapoor, Supriya, editor, and Bhatia, Rahul, editor
- Published
- 2019
- Full Text
- View/download PDF
31. Background
- Author
-
Tajik, Shahin, Möller, Sebastian, Series Editor, Küpper, Axel, Series Editor, Raake, Alexander, Series Editor, and Tajik, Shahin
- Published
- 2019
- Full Text
- View/download PDF
32. Fast Resource and Timing Aware Design Optimisation for High-Level Synthesis.
- Author
-
Perina, Andre B., Silitonga, Arthur, Becker, Jurgen, and Bonato, Vanderlei
- Subjects
- *
COMPILERS (Computer programs) , *GATE array circuits , *FIELD programmable gate arrays , *SPACE exploration - Abstract
Field-Programmable Gate Arrays (FPGAs) are often present in energy-efficient systems, although their non-trivial development flow is an obstacle to massive adoption. High-Level Synthesis (HLS) approaches attempt to mitigate the gap by targeting FPGAs from software languages; however, manual tuning is still essential to meet performance demands. We present a high-level design space exploration framework with timing and resource awareness that uses an estimator named Lina to evaluate each design point. Lina is a profiling-based approach that avoids the costly static analyses performed by HLS compilers, allowing a significantly faster exploration of optimisations. Estimations are improved by supporting a continuous range of operating frequencies and by considering resource usage for both floating-point and integer datapaths. For a given set of C kernels, the estimated solutions are among the best 1% for execution time and resource footprint. The exploration of each kernel using Lina was performed on average two orders of magnitude faster than using early HLS compiler reports, and four orders of magnitude faster than fully compiling each design point. By considering the design spaces traversed, our solutions reached 70% of the maximum speed-up achievable. This represents an average speed-up of 14-16× compared to the baseline designs with no optimisations enabled. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
33. OmpSs@FPGA Framework for High Performance FPGA Computing.
- Author
-
de Haro, Juan Miguel, Bosch, Jaume, Filgueras, Antonio, Vidal, Miquel, Jimenez-Gonzalez, Daniel, Alvarez, Carlos, Martorell, Xavier, Ayguade, Eduard, and Labarta, Jesus
- Subjects
- *
HIGH performance computing , *COMPILERS (Computer programs) , *FIELD programmable gate arrays - Abstract
This article presents the new features of the OmpSs@FPGA framework. OmpSs is a data-flow programming model that supports task nesting and dependencies to target asynchronous parallelism and heterogeneity. OmpSs@FPGA is the extension of the programming model addressed specifically to FPGAs. The OmpSs environment is built on top of the Mercurium source-to-source compiler and the Nanos++ runtime system. To address FPGA specifics, the Mercurium compiler implements several FPGA-related features, such as local variable caching, wide memory accesses, and accelerator replication. In addition, part of the Nanos++ runtime has been ported to hardware. Driven by the compiler, this new hardware runtime adds new features to FPGA codes, such as task creation and dependence management, providing both performance increases and ease of programming. To demonstrate these new capabilities, different high performance benchmarks have been evaluated over different FPGA platforms using the OmpSs programming model. The results demonstrate that programs that use the OmpSs programming model achieve very competitive performance with low to moderate porting effort compared to other FPGA implementations. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
34. Temporal Accelerators: Unleashing the Potential of Embedded FPGAs.
- Author
-
Cichiwskyj, Christopher and Schiele, Gregor
- Abstract
When the complexity of a problem rises, its solution requires more hardware resources. A usual way to solve this is to use larger processors and add more memory. When using Field Programmable Gate-Arrays (FPGAs), which can instantiate arbitrary circuit designs, a larger, more costly and power hungry chip is used. In this paper we propose a different approach, namely to split the problem into a graph of interdependent smaller tasks and to reconfigure a small FPGA during runtime to execute each of these tasks efficiently sequentially. This can result in cheaper and more energy efficient systems that can execute very complex problems locally. We present a basic analytical model, evaluate its accuracy and discuss initial insight from it. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
35. FHAST: FPGA-Based Acceleration of Bowtie in Hardware.
- Author
-
Fernandez, Edward B, Villarreal, Jason, Lonardi, Stefano, and Najjar, Walid A
- Subjects
DNA ,Equipment Design ,Equipment Failure Analysis ,Chromosome Mapping ,Sequence Analysis ,DNA ,Signal Processing ,Computer-Assisted ,Software ,High-Throughput Nucleotide Sequencing ,Short-read mapping ,genome re-sequencing ,FPGAs ,reconfigurable hardware ,Sequence Analysis ,Signal Processing ,Computer-Assisted ,Information and Computing Sciences ,Biological Sciences ,Mathematical Sciences ,Bioinformatics - Abstract
While the sequencing capability of modern instruments continues to increase exponentially, the computational problem of mapping short sequenced reads to a reference genome still constitutes a bottleneck in the analysis pipeline. A variety of mapping tools (e.g., Bowtie, BWA) is available for general-purpose computer architectures. These tools can take many hours or even days to deliver mapping results, depending on the number of input reads, the size of the reference genome and the number of allowed mismatches or insertion/deletions, making the mapping problem an ideal candidate for hardware acceleration. In this paper, we present FHAST (FPGA hardware accelerated sequence-matching tool), a drop-in replacement for Bowtie that uses a hardware design based on field programmable gate arrays (FPGA). Our architecture masks memory latency by executing multiple concurrent hardware threads accessing memory simultaneously. FHAST is composed of multiple parallel engines to exploit the parallelism available to us on an FPGA. We have implemented and tested FHAST on the Convey HC-1 and later ported it to the Convey HC-2ex, taking advantage of the large memory bandwidth available to these systems and the shared memory image between hardware and software. A preliminary version of FHAST running on the Convey HC-1 achieved up to 70x speedup compared to Bowtie (single-threaded). An improved version of FHAST running on the Convey HC-2ex FPGAs achieved up to a 12-fold speed gain compared to Bowtie running eight threads on an eight-core conventional architecture, while maintaining almost identical mapping accuracy. FHAST is a drop-in replacement for Bowtie, so it can be incorporated in any analysis pipeline that uses Bowtie (e.g., TopHat).
- Published
- 2015
36. Resubstitution method for big size Boolean logic design targeting look‐up‐table implementation.
- Author
-
Lemberski, Igor and Suponenkovs, Artjoms
- Subjects
- *
LOGIC design , *FIELD programmable gate arrays , *LIBRARY technical services , *SOFTWARE refactoring - Abstract
A scalable design method to perform multilevel network minimization targeting k‐input look‐up‐tables (k‐LUT) is proposed. It contributes toward the big size logic design theory and application. The method is based on the resubstitution which is formulated and solved as a covering task: A node function, which depends on an input selected for the resubstitution, is split into a set of dichotomies. The selected input is removed, and the minimal set of inputs to cover the dichotomies are sought. The resubstitution procedure runs on top of the k‐LUT network produced by existing synthesis tools (SIS, ABC). Scalability is achieved by the extraction of windows, which satisfy given constraints (number of inputs, nodes, etc.). The window logic is described using the proposed extended programmable logic array (PLA) table, which contains information about don't cares. Experiments show that the best networks obtained using SIS and ABC can be further improved by applying our method. Also, big benchmarks from the EPFL library are processed, and for almost half of them, improvements are achieved. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
37. Towards Dynamic and Partial Reconfigurable Hardware Architectures for Cryptographic Algorithms on Embedded Devices
- Author
-
Arkan Alkamil and Darshika G. Perera
- Subjects
FPGAs ,reconfigurable hardware ,dynamic and partial reconfiguration ,embedded systems ,embedded hardware ,cryptographic algorithms ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
In the era of IoT, embedded systems are becoming the cornerstone of many IoT related applications, such as smart cars and wearable devices. However, embedded devices have numerous constraints and requirements, including stringent area and power, reduced cost and time-to-market, and increased speedup. Furthermore, these applications are becoming increasingly compute/data-intensive, requiring more processing power. Also, especially for IoT related applications, security is another major issue in resource-constrained embedded devices. Although cryptographic algorithms are widely used to ensure the security of these applications, commonly used ones, such as AES, are unsuitable for highly constrained embedded devices, due to their sheer complexity. Hence, several lightweight cryptographic algorithms were proposed in the literature that might be better suited for embedded devices. From these, SPECK and SIMON, introduced by NSA, are the two most popular ones. Another important challenge is how to incorporate the cryptographic algorithms into embedded devices, efficiently and effectively, without compromising the integrity of the compute/data-intensive applications running on these small-footprint devices. Our previous analysis demonstrated that FPGAs are currently the best avenue to support compute/data-intensive applications running on resource-constrained embedded devices, due to FPGAs' many attractive traits, including post-fabrication reprogrammability, dynamic and partial reconfiguration capabilities, and reduced time-to-market. Also, FPGAs can be utilized to provide several advantages/features required for the embedded device's security, such as cryptographic algorithm agility, algorithm upload, algorithm modification, and resource efficiency.
In this research work, we introduce novel, unique, and efficient dynamic and partial reconfigurable hardware architectures for the most popular SPECK and SIMON algorithms on embedded devices, considering the constraints associated with these devices and the requirements of the applications running on embedded devices. We also introduce unique system-level architectures for our proposed designs. To the best of our knowledge, no similar work exists in the literature that provides dynamic and partial reconfigurable hardware for SPECK and SIMON, and also provides system-level architecture. Our dynamic and partial reconfigurable hardware designs achieve 28% space saving compared to its static reconfigurable hardware, and 59 times speedup compared to its software counterpart.
- Published
- 2020
- Full Text
- View/download PDF
38. Low-power adaptive control scheme using switching activity measurement method for reconfigurable analog-to-digital converters
- Author
-
Ab Razak, Mohd Zulhakimi, Arslan, Tughrul, and Hamilton, Alister
- Subjects
621.3 ,Reconfigurable hardware ,Low power ,Switching activity ,Analog-to-digital converters - Abstract
Power consumption is a critical issue for portable devices. The ever-increasing demand for multimode wireless applications and the growing concerns towards power-aware green technology make dynamically reconfigurable hardware an attractive solution for overcoming the power issue. This is due to its advantages of flexibility, reusability, and adaptability. During the last decade, reconfigurable analog-to-digital converters (ReADCs) have been used to support multimode wireless applications. With the ability to adaptively scale the power consumption according to different operation modes, reconfigurable devices utilise the power supply efficiently. This can prolong battery life and reduce unnecessary heat emission to the environment. However, current adaptive mechanisms for ReADCs rely upon external control signals generated using digital signal processors (DSPs) in the baseband. This thesis aims to provide a single-chip solution for real-time and low-power ReADC implementations that can adaptively change the converter resolution according to signal variations without the need of the baseband processing. Specifically, the thesis focuses on the analysis, design and implementation of a low-power digital controller unit for ReADCs. In this study, the following two important reconfigurability issues are investigated: i) the detection mechanism for an adaptive implementation, and ii) the measure of power and area overheads that are introduced by the adaptive control modules. This thesis outlines four main achievements to address these issues. The first achievement is the development of the switching activity measurement (SWAM) method to detect different signal components based upon the observation of the output of an ADC. The second achievement is a proposed adaptive algorithm for ReADCs to dynamically adjust the resolution depending upon the variations in the input signal. The third achievement is an ASIC implementation of the adaptive control module for ReADCs. 
The module achieves low reconfiguration overheads in terms of area and power compared with the main analog part of a ReADC. The fourth achievement is the development of a low-power noise detection module using a conventional ADC for signal improvement. Taken together, the findings from this study demonstrate the potential use of switching activity information of an ADC to adaptively control the circuits, and simultaneously expanding the functionality of the ADC in electronic systems.
- Published
- 2014
39. High throughput resource efficient reconfigurable interleaver for MIMO WLAN application
- Author
-
Bijoy Kumar Upadhyaya, Pijush Kanti Dutta Pramanik, and Salil Kumar Sanyal
- Subjects
MIMO ,Reconfigurable hardware ,FPGA ,WLAN ,VHDL ,Digital hardware ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
Demand for high-speed wireless broadband internet service is ever increasing. Multiple-input-multiple-output (MIMO) Wireless LAN (WLAN) is becoming a promising solution for such high-speed internet service requirements. This paper proposes a novel algorithm to efficiently model the address generation circuitry of the MIMO WLAN interleaver. The interleaver used in the MIMO WLAN transceiver has three permutation steps involving the floor function, whose hardware implementation is the most challenging task due to the absence of corresponding digital hardware. In this work, we propose an algorithm with a mathematical background for the address generator, eliminating the need for the floor function. The algorithm is converted into digital hardware for implementation on the reconfigurable FPGA platform. Hardware structure for the complete interleaver, including the read address generator and memory module, is designed and modeled in VHDL using Xilinx Integrated Software Environment (ISE) utilizing embedded memory and DSP blocks of Spartan 6 FPGA. The functionality of the proposed algorithm is verified through exhaustive software simulation using ModelSim software. Hardware testing is carried out on Zynq 7000 FPGA using Virtual Input Output (VIO) and Integrated Logic Analyzer (ILA) core. Comparisons with a few recent similar works, including the conventional Look-Up Table (LUT) based technique, show the superiority of our proposed design in terms of maximum improvement in operating frequency by 196.83%, maximum reduction in power consumption by 74.27%, and reduction of memory occupancy by 88.9%. In the case of throughput, our design can deliver 8.35 times more than the IEEE 802.11n requirement.
- Published
- 2021
- Full Text
- View/download PDF
40. HLS Algorithmic Explorations for HPC Execution on Reconfigurable Hardware - ECOSCALE
- Author
-
Malakonakis, Pavlos, Georgopoulos, Konstantinos, Ioannou, Aggelos, Lavagno, Luciano, Papaefstathiou, Ioannis, Mavroidis, Iakovos, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Voros, Nikolaos, editor, Huebner, Michael, editor, Keramidas, Georgios, editor, Goehringer, Diana, editor, Antonopoulos, Christos, editor, and Diniz, Pedro C., editor
- Published
- 2018
- Full Text
- View/download PDF
41. Comparing C and SystemC Based HLS Methods for Reconfigurable Systems Design
- Author
-
Georgopoulos, Konstantinos, Malakonakis, Pavlos, Tampouratzis, Nikolaos, Nikitakis, Antonis, Chrysos, Grigorios, Dollas, Apostolos, Pnevmatikatos, Dionysios, Papaefstathiou, Ioannis, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Voros, Nikolaos, editor, Huebner, Michael, editor, Keramidas, Georgios, editor, Goehringer, Diana, editor, Antonopoulos, Christos, editor, and Diniz, Pedro C., editor
- Published
- 2018
- Full Text
- View/download PDF
42. High-flexible hardware and instruction of composite Galois field multiplication targeted at symmetric crypto processor.
- Author
-
Su, Yang, Yang, Bai-Long, Yang, Chen, and He, Jing-Yuan
- Abstract
Composite Galois field multiplication is one of the most important and complex nonlinear arithmetic units in symmetric cipher algorithms. However, current hardware implementations struggle to maintain both high performance and flexibility. Based on reconfigurable technology, we propose a flexible architecture of composite Galois field multiplication (RCGFM) and dedicated instructions of composite Galois field multiplication (ICGFM) over $GF((2^n)^m)$, where $n = 8$, $m = 1, 2, 3, 4$. The RCGFM adopts a serial–parallel mixed structure, which can achieve different Galois field multiplications with good parallelism and scalability. By extending the $x^k B$ multiplications of the serial chain, where $k = 1, 2, 3$, the RCGFM can concurrently support composite Galois field multiplications with higher orders, such as $GF((2^8)^m)$, where $m \geq 5$, $m \in \mathbb{Z}^+$. Moreover, in order to reduce the instruction overhead of the target symmetric crypto processor, the ICGFM is specially designed; it is composed of operation and configuration instructions for $x^k B$ and $A \times B$ over $GF((2^n)^m)$. The ICGFM can be applied to the RCGFM structure efficiently and flexibly by configuring the corresponding parameters. The experimental results show that under 0.18 µm CMOS technology, the maximum clock frequency is 625 MHz, while the area of the circuit is 11.2 kilo gates. Compared with existing work, the RCGFM structure improves the throughput rate by a factor of 1.36x–9.19x; when normalized to the same technology and per kilo gates, the technology-scaled throughput rate increases by a factor of 1.25x–4.4x, while the area overhead does not increase significantly. In addition, the ICGFM can reduce the number of instructions by 1–2 orders of magnitude compared with other works. Finally, the reconfigurable architecture we propose supports different composite Galois field multiplications over $GF((2^n)^m)$ with more flexibility and efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
43. LSTM Cell Implementation on FPGAs.
- Author
-
Dec, Grzegorz Rafał
- Subjects
- *
FLOATING-point arithmetic , *ALGORITHMS , *INTEGERS - Abstract
This paper presents and discusses the implementation of an LSTM cell on an FPGA with an activation function inspired by the CORDIC algorithm. The realization is performed using both IEEE 754 standard and 32-bit integer numbers. The case with floating-point arithmetic is analyzed with and without DSP blocks provided by the Xilinx design suite. The alternative implementation including the integer arithmetic was optimized for a minimal number of clock cycles. The presented implementation uses xc6slx150t-2fgg900 and achieves high calculation accuracy for both cases. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
44. Conclusion and Future Work
- Author
-
Tajik, Shahin, Möller, Sebastian, Series Editor, Küpper, Axel, Series Editor, Raake, Alexander, Series Editor, and Tajik, Shahin
- Published
- 2019
- Full Text
- View/download PDF
45. A Dynamically Reconfigurable System for Closed-Loop Measurements of Network Traffic
- Author
-
Khan, Faisal, Ghiasi, Soheil, and Chuah, Chen-Nee
- Subjects
Distributed Computing and Systems Software ,Information and Computing Sciences ,Engineering ,Reconfigurable hardware ,network monitoring ,parallel circuits ,Computer Software ,Distributed Computing ,Computer Hardware ,Computer Hardware & Architecture ,Electronics ,sensors and digital hardware ,Distributed computing and systems software - Abstract
Streaming network traffic measurement and analysis is critical for detecting and preventing any real-time anomalies in the network. The high speeds and complexity of today's networks, coupled with ever evolving threats, necessitate closing of the loop between measurements and their analysis in real time. The ensuing system demands high levels of programmability and processing where streaming measurements adapt to the changing network behavior in a goal-oriented manner. In this work, we exploit the features and requirements of the problem and develop an application-specific FPGA-based closed-loop measurement (CLM) system. We make novel use of fine-grained partial dynamic reconfiguration (PDR) as underlying reprogramming paradigm, performing low-latency just-in-time compiled logic changes in FPGA fabric corresponding to the dynamic measurement requirements. Our innovative dynamically reconfigurable socket offers 3× logic savings over conventional static solutions, while offering much reduced reconfiguration latencies over conventional PDR mechanisms. We integrate multiple sockets in a highly parallel CLM framework and demonstrate its effectiveness in identifying heavy flows in streaming network traffic. The results using an FPGA prototype offer 100 percent detection accuracy while sustaining increasing link speeds. © 1968-2012 IEEE.
- Published
- 2014
46. In-Circuit Assertions and Exceptions for Reconfigurable Hardware Design
- Author
-
Todman, Tim, Luk, Wayne, Hinchey, Mike, Series editor, Bowen, Jonathan P., editor, and Olderog, Ernst-Rüdiger, editor
- Published
- 2017
- Full Text
- View/download PDF
47. The Future of 'Hardware – Software Reconfigurable' : LabVIEW Compiler to Raspberry PI
- Author
-
Ursutiu, D., Samoila, C., Jinga, V., Altoe, F., Kacprzyk, Janusz, Series editor, Pal, Nikhil R., Advisory editor, Bello Perez, Rafael, Advisory editor, Corchado, Emilio S., Advisory editor, Hagras, Hani, Advisory editor, Kóczy, László T., Advisory editor, Kreinovich, Vladik, Advisory editor, Lin, Chin-Teng, Advisory editor, Lu, Jie, Advisory editor, Melin, Patricia, Advisory editor, Nedjah, Nadia, Advisory editor, Nguyen, Ngoc Thanh, Advisory editor, Wang, Jun, Advisory editor, Auer, Michael E., editor, Guralnick, David, editor, and Uhomoibhi, James, editor
- Published
- 2017
- Full Text
- View/download PDF
48. Reconfigurable Multiprocessor Systems-on-Chip
- Author
-
Goehringer, Diana, Hussain, Waqar, editor, Nurmi, Jari, editor, Isoaho, Jouni, editor, and Garzia, Fabio, editor
- Published
- 2017
- Full Text
- View/download PDF
49. Towards Accelerated Genome Informatics on Parallel HPC Platforms: The ReneGENE-GI Perspective.
- Author
-
Natarajan, Santhi, N., Krishna Kumar, Pal, Debnath, and Nandy, S. K.
- Abstract
Genome Informatics (GI) involves accurate computational investigations of strongly correlated subsystems that demand inter-disciplinary approaches for problem solving. With the volume of genomic sequencing data growing at an alarming rate, High Performance Computing (HPC) solutions offer the right platform to address the computational needs. GI requires algorithm-architecture co-design of parallel and accelerated biocomputing involving reconfigurable hardware like FPGAs and graphics accelerators or GPUs, to bridge the gap between growing data volumes and compute capabilities. Such platforms offer high degrees of parallelism and scalability, while accelerating the multi-stage GI computational pipeline. Amidst such high computing power, it is the choice of algorithms and implementations in the entirety of the GI pipeline that decides the precision of bio-computing in revealing biologically relevant information. Through this paper, we present ReneGENE-GI, an innovatively engineered GI pipeline. This paper details the performance analysis of ReneGENE-GI's Comparative Genomics Module (CGM), the compute intensive stage of the pipeline. This module comes in two flavours, designed to run on GPUs and FPGAs respectively, hosted on HPC platforms. The pipeline uses a very efficient reference indexing algorithm based on the dynamic Monotonic Minimal Perfect Hashing Function (MMPH), allowing an absolute indexing for the reference genome, thus avoiding heuristics. Alignment time for our FPGA version is about one-tenth the time taken by our single GPU implementation, which itself is 2.62x faster than CUSHAW2-GPU (the GPU CUDA implementation of CUSHAW). With the single-GPU implementation demonstrating a speed up of 150+ x over standard heuristic aligners in the market like BFAST, the FPGA version of our CGM is several orders faster than the competitors, offering precision over heuristics. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
50. An experimental synthesis methodology of fractional-order chaotic attractors.
- Author
-
Sánchez-López, C.
- Abstract
A novel synthesis methodology of fractional-order chaotic systems, from the level of nonlinear systems to their experimental verification using microcontrollers, is presented. Firstly, the integer-order behavioral model of the Lorenz's, Rossler's, Chen's, Liu's, Saturated Nonlinear Function Series at one- and two-direction systems is briefly reviewed. Secondly, a first-order transfer function that approximates the behavior of a fractional-order integrator, based on the continued fraction expansion method, is substituted for the integer-order integrator inside the previously reviewed chaotic systems. Thirdly, the minimum phase Al-Alaoui's transformation method is used for synthesizing all fractional-order chaotic systems in the discrete domain, which are programmed in MATLAB and on the Arduino DUE development board. For a fair comparison, the minimum phase Al-Alaoui's algorithm is also used for solving the integer-order chaotic systems and programmed in MATLAB and on the Arduino DUE board, and the same initial conditions are used for both integer- and fractional-order chaotic systems. Finally, experimental results of the integer- and fractional-order chaotic oscillators are shown. The results obtained not only allow a simple synthesis methodology of fractional-order chaotic attractors and their experimental evidence on reconfigurable hardware, but also demonstrate the viability of fractional attractors to be used in various applications such as secure communications, robot control, cryptography and so on. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF