Harnessing reconfigurable hardware to design heterogeneous systems
- Author
- Iordanou, Konstantinos and Kotselidis, Christos-Efthymios
- Abstract
A typical Machine Learning (ML) development cycle for edge computing seeks to maximise performance during model training and then minimise the memory/area footprint of the trained model for deployment on edge devices, targeting CPUs, GPUs, microcontrollers, or custom hardware accelerators. A reasonable question to pose is: could we develop a supervised learning technique that takes data as input and generates a circuit representation for classification that behaves like an ML model? This thesis proposes a methodology for automatically generating predictor circuits for the classification of tabular data. In contrast to image and text data, tabular data can combine numerical and categorical fields. The proposed approach provides prediction performance comparable to conventional ML techniques, whilst using substantially fewer hardware resources and less power. The methodology uses an evolutionary algorithm to search over the space of logic gates and automatically generates a classifier circuit that maximises training prediction accuracy. These classifier circuits are called "Tiny Classifiers", since they consist of no more than 400 logic gates, and can be efficiently implemented as ASIC blocks or FPGA accelerators. The Auto Tiny Classifiers methodology (AutoTiC) is evaluated on a wide range of tabular datasets and compared against conventional ML techniques, such as Amazon's AutoGluon, Google's TabNet, and a neural architecture search over Multi-Layer Perceptrons. When implemented as ASICs, Tiny Classifiers use 10-75x less area/power and can be clocked 2-3x faster than the corresponding ML baselines; when implemented on an FPGA, they use 3-11x fewer resources.

The slowing of Moore's law and the breakdown of Dennard scaling have pushed computing systems towards increased specialisation, and novel architectures are required to provide greater performance scaling than traditional approaches.
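The gate-level search described above can be illustrated with a toy sketch. The circuit encoding, the four-gate vocabulary, and the simple (1+1) mutation loop below are illustrative assumptions for exposition, not the thesis's actual AutoTiC implementation:

```python
import random

# Toy gate vocabulary (illustrative; the real search space may differ).
GATES = {
    "AND":  lambda a, b: a & b,
    "OR":   lambda a, b: a | b,
    "XOR":  lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
}

def random_circuit(n_inputs, n_gates, rng):
    """A circuit is a list of (gate_name, src1, src2); sources index
    input bits or outputs of earlier gates, so the netlist is acyclic."""
    circuit = []
    for g in range(n_gates):
        fan_in = n_inputs + g  # may only read earlier wires
        circuit.append((rng.choice(list(GATES)),
                        rng.randrange(fan_in), rng.randrange(fan_in)))
    return circuit

def evaluate(circuit, bits):
    """Simulate the netlist on one binary input vector; the last gate's
    output is taken as the predicted class."""
    wires = list(bits)
    for name, s1, s2 in circuit:
        wires.append(GATES[name](wires[s1], wires[s2]))
    return wires[-1]

def accuracy(circuit, X, y):
    return sum(evaluate(circuit, x) == t for x, t in zip(X, y)) / len(y)

def evolve(X, y, n_gates=8, generations=300, seed=0):
    """(1+1) evolutionary loop: mutate one gate, keep the child if its
    training accuracy does not drop (neutral drift allowed)."""
    rng = random.Random(seed)
    n_inputs = len(X[0])
    best = random_circuit(n_inputs, n_gates, rng)
    best_acc = accuracy(best, X, y)
    for _ in range(generations):
        child = list(best)
        g = rng.randrange(n_gates)
        fan_in = n_inputs + g
        child[g] = (rng.choice(list(GATES)),
                    rng.randrange(fan_in), rng.randrange(fan_in))
        child_acc = accuracy(child, X, y)
        if child_acc >= best_acc:
            best, best_acc = child, child_acc
    return best, best_acc

# Toy task: evolve a circuit for XOR of two binarised features.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]
circuit, acc = evolve(X, y)
```

In this toy form the fitness is plain training accuracy over binarised inputs; a circuit of a few hundred such gates maps directly to an ASIC block or FPGA netlist, which is what keeps the area and power footprint small.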
Heterogeneity was introduced as an alternative to sidestep the performance wall of multi-core processors. In contrast to homogeneous systems, heterogeneous systems use a mixture of dedicated cores specialised for specific tasks, and cloud providers are trying to gain a competitive advantage with heterogeneous systems that combine powerful CPUs, GPUs, FPGAs, and TPUs. An FPGA-oriented SoC includes application-specific custom hardware kernels on the die. Improving the performance of SoCs by including specialised hardware requires a deep understanding of the computationally significant kernels of the applications under consideration. However, the design of SoCs does not only require the deployment of custom hardware kernels within the system; the study of system-level bottlenecks is also an important part of the development process for high-performance specialised software/hardware systems. For these systems, a key research problem needs to be addressed: how does the interaction of custom compute kernels with processors affect the overall performance of a system, and what is the optimal integration of a hardware kernel within the cache memory hierarchy of an SoC to extract better performance? This thesis describes a methodology for microarchitectural simulation of SoCs that offers the flexibility to identify and alleviate system-level bottlenecks by studying the effect of custom compute kernels on the cache memory hierarchy of an SoC. Hardware designers can perform a timing simulation of SoCs while tuning the microarchitecture of the computing cores and custom hardware kernels. The proposed methodology offers the novel capability to place Register-Transfer Level (RTL) compute kernels in a simulation environment and perform a timing analysis of their interaction with the cache memory hierarchy.
Application binaries are instrumented dynamically to generate processor load/store and program-counter events, as well as any memory access generated by the hardware kernels; these events are sent to hardware-based timing models of the processors and memory hierarchies. Key features of the proposed simulation methodology include the ability to operate exclusively at the user level, the dynamic discovery and use of the available hardware models at execution time, and the transparent testing and optimisation of custom compute kernels against the cache memory hierarchy of a heterogeneous system. The final part of this work focuses on the deployment of Tiny Classifier circuits as custom compute kernels in an SoC under test. In addition, custom hardware RTL kernels from a wide range of benchmark suites are explored as parts of an SoC. Different scenarios for integrating the hardware kernels into the cache memory hierarchy of an SoC are analysed using the proposed simulation framework.
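The event flow described above, instrumentation events driving a timing model, can be sketched as follows. The event format, the direct-mapped geometry, and the latency numbers are illustrative assumptions rather than the framework's actual models:

```python
from collections import namedtuple

# One memory event as dynamic instrumentation might emit it
# (field names are illustrative placeholders).
MemEvent = namedtuple("MemEvent", "pc addr is_write")

class DirectMappedCache:
    """Minimal timing model: a hit costs HIT_CYCLES, a miss adds
    MISS_PENALTY. Latencies are illustrative, not measured values."""
    HIT_CYCLES = 1
    MISS_PENALTY = 20

    def __init__(self, n_lines=64, line_bytes=64):
        self.n_lines = n_lines
        self.line_bytes = line_bytes
        self.tags = [None] * n_lines   # one tag per set (direct-mapped)
        self.hits = self.misses = self.cycles = 0

    def access(self, event):
        # Write policy is ignored in this sketch; loads and stores
        # are costed identically.
        line = event.addr // self.line_bytes
        idx = line % self.n_lines
        tag = line // self.n_lines
        if self.tags[idx] == tag:
            self.hits += 1
            self.cycles += self.HIT_CYCLES
        else:
            self.misses += 1
            self.cycles += self.HIT_CYCLES + self.MISS_PENALTY
            self.tags[idx] = tag  # allocate the line on a miss

# A CPU (or RTL kernel) streaming sequentially through a small buffer:
# 16 loads of 8 bytes each, spanning two 64-byte cache lines.
cache = DirectMappedCache()
trace = [MemEvent(pc=0x400000 + 4 * i, addr=0x1000 + 8 * i, is_write=False)
         for i in range(16)]
for ev in trace:
    cache.access(ev)
```

Because CPU-side and kernel-side events can be fed through the same shared model, contention between them shows up directly in the hit/miss and cycle counts, which is the kind of system-level bottleneck the methodology is designed to expose.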
- Published
- 2023