830 results on '"Hardware design"'
Search Results
102. Highly parallelized memristive binary neural network.
- Author
-
Chen, Jiadong, Wen, Shiping, Shi, Kaibo, and Yang, Yin
- Subjects
-
*DEEP learning, *ELECTRIC circuit networks, *NEURAL circuitry, *CONVOLUTIONAL neural networks
- Abstract
In current hardware design work for deep learning, the memristor, a non-volatile memory with computing capability, has become a research hotspot. The weights in a deep neural network are floating-point numbers. Writing a floating-point value into a memristor results in a loss of accuracy, and the writing process takes more time. The binarized neural network (BNN) binarizes the weights and activation values that were originally floating-point numbers to +1 and -1. This greatly reduces the storage space and time needed to program the resistance value of the memristor. Furthermore, it helps to simplify the programming of memristors in deep neural network circuits and speeds up the inference process. This paper provides a complete solution for implementing a memristive BNN. In addition, we improve the design of the memristor crossbar by converting the input feature map and kernel before performing the convolution operation, which keeps the sign of the input voltage of each port constant. Therefore, we do not need to determine in advance the sign of the input voltage required by each port, which simplifies the process of feeding the feature map elements to each port of the crossbar as voltages. At the same time, to ensure that the output of the current convolution layer can be used directly as the input of the next layer, we have added a corresponding processing circuit that integrates batch-normalization and binarization operations. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
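Illustrative note for entry 102: the abstract describes binarizing floating-point weights and activations to +1 and -1. The sketch below shows that step in software only, not the memristor crossbar circuit. The function names `binarize` and `binary_dot` are hypothetical, and mapping zero to +1 is an assumed convention that the abstract does not specify.

```python
def binarize(values):
    """Map each real value to +1 or -1 (assumption: zero maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def binary_dot(a, b):
    """Dot product of two +1/-1 vectors by counting agreeing positions.

    For vectors of length n with `agree` matching positions, the dot
    product equals 2 * agree - n, so no multiplications are needed.
    """
    agree = sum(1 for x, y in zip(a, b) if x == y)
    return 2 * agree - len(a)


weights = binarize([0.3, -1.2, 0.0, -0.1])
activations = binarize([0.7, 0.2, -0.5, -0.9])
result = binary_dot(weights, activations)
```

Because every operand is +1 or -1, the multiply-accumulate reduces to counting agreements, which is one reason binarized networks are attractive for simple hardware.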
103. Design and implementation of a symbolic simulator for agile hardware design.
- Author
-
邹鸿基, 李敝, 罗丹, and 方雨德
- Abstract
In the agile hardware design methodology, a domain-specific hardware description language is often used for RTL modeling. This situation brings new challenges for design verification. To support design verification techniques such as (bounded) model checking and equivalence checking, a symbolic simulator is designed and implemented for PyRTL and its intermediate representations. This paper introduces the design principles, conversion rules, and other key technologies of our symbolic simulator. The experimental results show the correctness of the implemented symbolic simulator. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
104. Real-time FPGA-based implementation of the AKAZE algorithm with nonlinear scale space generation using image partitioning.
- Author
-
Soleimani, Parastoo, Capson, David W., and Li, Kin Fun
- Abstract
The first step in a scale invariant image matching system is scale space generation. Nonlinear scale space generation algorithms such as AKAZE, reduce noise and distortion in different scales while retaining the borders and key-points of the image. An FPGA-based hardware architecture for AKAZE nonlinear scale space generation is proposed to speed up this algorithm for real-time applications. The three contributions of this work are (1) mapping the two passes of the AKAZE algorithm onto a hardware architecture that realizes parallel processing of multiple sections, (2) multi-scale line buffers which can be used for different scales, and (3) a time-sharing mechanism in the memory management unit to process multiple sections of the image in parallel. We propose a time-sharing mechanism for memory management to prevent artifacts as a result of separating the process of image partitioning. We also use approximations in the algorithm to make hardware implementation more efficient while maintaining the repeatability of the detection. A frame rate of 304 frames per second for a 1280 × 768 image resolution is achieved which is favorably faster in comparison with other work. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
105. Estimating Task Efforts in Hardware Development Projects in a Scrum Context.
- Author
-
Briatore, Simone and Golkar, Alessandro
- Abstract
Hardware developers started experimenting with Scrum to accelerate their product development. However, it is not possible to implement Scrum in the same way as it was done for software systems, where the approach is already well established. One of the processes required in Scrum is the estimate of task efforts when creating a backlog for an Agile Sprint. This article presents a pilot validation experiment of a novel Agile framework for the development of hardware systems, including a parametric tool to estimate task effort in a more rigorous way than traditional confidence votes. This article presents the validation of electronic hardware design task estimation and overall project performance. The validation is performed through experimental work with teams of junior engineering students. The validation experiment showed an improvement from a minimum of eight to a maximum of eighteen percent when employing the presented tool during planning phases of the development. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
106. Efficient Hardware Arithmetic for Inverted Binary Ring-LWE Based Post-Quantum Cryptography.
- Author
-
Imana, Jose L., He, Pengzhou, Bao, Tianyou, Tu, Yazheng, and Xie, Jiafeng
- Subjects
-
*ARITHMETIC, *POLYNOMIAL rings, *ELLIPTIC curve cryptography, *SHIFT registers, *COMPUTATIONAL complexity, *QUANTUM cryptography, *CRYPTOGRAPHY
- Abstract
Ring learning-with-errors (RLWE)-based encryption scheme is a lattice-based cryptographic algorithm that constitutes one of the most promising candidates for Post-Quantum Cryptography (PQC) standardization due to its efficient implementation and low computational complexity. Binary Ring-LWE (BRLWE) is a new optimized variant of RLWE, which achieves smaller computational complexity and higher efficient hardware implementations. In this paper, two efficient architectures based on Linear-Feedback Shift Register (LFSR) for the arithmetic used in Inverted Binary Ring-LWE (InvBRLWE)-based encryption scheme are presented, namely the operation of $A\cdot B+C$ over the polynomial ring $\mathbb {Z}_{q}/(x^{n}+1)$. The first architecture optimizes the resource usage for major computation and has a novel input processing setup to speed up the overall processing latency with minimized input loading cycles. The second architecture deploys an innovative serial-in serial-out processing format to reduce the involved area usage further yet maintains a regular input loading time-complexity. Experimental results show that the architectures presented here improve the complexities obtained by competing schemes found in the literature, e.g., involving 71.23% less area-delay product than recent designs. Both architectures are highly efficient in terms of area-time complexities and can be extended for deploying in different lightweight application environments. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
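Illustrative note for entry 106: the abstract centers on the operation $A\cdot B+C$ in a polynomial ring reduced modulo $x^{n}+1$ with coefficients modulo $q$. The sketch below is a plain schoolbook software reference for that operation, not the LFSR-based architectures the paper presents. The function name `poly_mul_add` and the coefficient-list representation (lowest degree first) are assumptions made for illustration.

```python
def poly_mul_add(a, b, c, q):
    """Compute A*B + C modulo (x^n + 1) with coefficients modulo q.

    a, b, c are coefficient lists of equal length n, lowest degree first.
    Since x^n = -1 in this ring, any product term of degree n + k wraps
    around to degree k with its sign flipped.
    """
    n = len(a)
    result = list(c)
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                result[k] += a[i] * b[j]
            else:
                result[k - n] -= a[i] * b[j]
    return [r % q for r in result]


# (1 + 2x)(3 + 4x) + (5 + 6x) modulo (x^2 + 1), coefficients modulo 7
example = poly_mul_add([1, 2], [3, 4], [5, 6], 7)
```

In the small example, the product is 3 + 10x + 8x^2; replacing x^2 with -1 gives -5 + 10x, and adding 5 + 6x yields 0 + 16x, which is 0 + 2x modulo 7.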
107. Low-Latency Low-Complexity Method and Architecture for Computing Arbitrary Nth Root of Complex Numbers.
- Author
-
Wu, Ruiqi, Chen, Hui, He, Guoqiang, Fu, Yuxiang, and Li, Li
- Subjects
-
*COMPLEX numbers, *COMPUTER architecture, *5G networks
- Abstract
This paper presents a new architecture, based on CORDIC and parabolic synthesis methodology, for computing Nth root of a complex number. The proposed architecture uses the pretreatment for normalization and parabolic synthesis method to calculate the Nth root of modulus of the input complex number and performs the conversion between the plane coordinate form and the polar coordinate form of the complex number by CORDIC, which not only ensures the accuracy but also has an ultra-low computation latency. MATLAB simulation result indicates that our proposed method can calculate the Nth root of the complex numbers in the form of fixed-point number with an error of $2.16 \times 10^{-6}$. Under TSMC 40 nm CMOS technology, the report shows that the area consumption is $27390.72\ \mu\mathrm{m}^{2}$ at the frequency of 1 GHz and the power consumption is 2.3549 mW. More importantly, the computation latency of the proposed architecture is only 60.18% of the latest architecture in the same calculation accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
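Illustrative note for entry 107: the abstract describes converting a complex number to polar form, taking the Nth root of the modulus, and converting back. The sketch below follows that decomposition in floating point, using the standard `math` functions in place of CORDIC and parabolic synthesis, so it shows only the mathematical structure and not the paper's hardware method. The function name `complex_nth_root` is hypothetical, and it returns only the principal root.

```python
import math


def complex_nth_root(z, n):
    """Return the principal Nth root of z via polar form.

    Step 1: rectangular -> polar (modulus and angle).
    Step 2: take the real Nth root of the modulus and divide the angle by n.
    Step 3: polar -> rectangular.
    """
    modulus = math.hypot(z.real, z.imag)
    angle = math.atan2(z.imag, z.real)
    root_modulus = modulus ** (1.0 / n)
    root_angle = angle / n
    return complex(root_modulus * math.cos(root_angle),
                   root_modulus * math.sin(root_angle))


root = complex_nth_root(complex(0.0, 8.0), 3)
# Cubing the result should recover the input up to rounding error.
residual = abs(root ** 3 - complex(0.0, 8.0))
```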
108. Energy Efficient 0.5V 4.8pJ/SOP 0.93μW Leakage/Core Neuromorphic Processor Design.
- Author
-
Nambiar, Vishnu P., Pu, Junran, Lee, Yun Kwan, Mani, Aarthy, Koh, Eng Kiat, Wong, Ming Ming, Li, Fei, Goh, Wang Ling, and Do, Anh Tuan
- Abstract
This brief presents a neuromorphic processor with asynchronous routers and configurable LIF neuron models. The neurocore microarchitecture revolves around a high-Vth SRAM to reduce leakage, alongside reconfigurable neuron compute logic circuits and async routers to maximize energy efficiency. The neuron compute module achieves low power via an area efficient ALU implementation by using only adder and bitshifter circuits. We describe this LIF neuron model ALU design, and also include key neurocore verification scenarios (i.e., router deadlocks and functional coverage), CPU-neurocore control flow, and asynchronous router performance analysis. Our 16-core fabricated chip in 40 nm CMOS process works down to 0.5V. The measured leakage and average energy efficiency are 0.93 μW/core and 4.8 pJ/SOP respectively (at 0.5V), which is 20% better than state of the art. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
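Illustrative note for entry 108: the abstract states that the LIF neuron ALU uses only adder and bit-shifter circuits. One common way to get a leaky integrate-and-fire update under that constraint is to approximate the leak with a right shift. The sketch below is an assumption about how such an update could look, not the authors' design; the function name `lif_step`, the reset-to-zero behavior, and the non-negative integer membrane potential are all hypothetical choices.

```python
def lif_step(potential, input_current, leak_shift, threshold):
    """One leaky integrate-and-fire update using only add, subtract, shift.

    The leak removes potential / 2**leak_shift, computed with a right shift.
    Returns (new_potential, spiked). Assumes non-negative integer inputs and
    resets the potential to zero after a spike (an assumed convention).
    """
    leaked = potential - (potential >> leak_shift)
    updated = leaked + input_current
    if updated >= threshold:
        return 0, True
    return updated, False


# Drive one neuron with a constant input and record when it fires.
potential = 0
spikes = []
for _ in range(4):
    potential, fired = lif_step(potential, 8, leak_shift=2, threshold=20)
    spikes.append(fired)
```

With these example parameters the potential climbs through 8, 14, and 19 before crossing the threshold on the fourth step.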
109. Hardware and coding efficiency assessment of 3D-HEVC DIS tool using alternative similarity criteria.
- Author
-
Borges, Vinicius A., Perleberg, Murilo R., Afonso, Vladimir, Porto, Marcelo S., and Agostini, Luciano V.
- Subjects
ENCODING, HARDWARE, DEFINITIONS
- Abstract
3D-HEVC is the state-of-the-art standard to compress three-dimensional videos. One of the novel 3D-HEVC tools is the DIS tool, which is used to efficiently compress smooth and homogeneous areas of depth maps by using four different prediction modes. In the original DIS definition, the decision of which DIS mode to use is made through the SVDC similarity criterion. This article proposes replacing the complex SVDC criterion with simpler and more hardware-friendly criteria such as SATD, SSE, and SAD. These alternative criteria were evaluated in terms of encoding efficiency and hardware impact in comparison with the SVDC. Dedicated DIS hardware designs were developed for each criterion, described in VHDL, and synthesized for TSMC 40 nm. The best results were found with the SAD criterion, with losses of only 0.2% in coding efficiency and with significant gains of more than 50 times in power and more than 35 times in area, when compared with SVDC. The results show that a simpler similarity criterion is an important alternative for the DIS tool, mainly if an efficient hardware design is required. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
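Illustrative note for entry 109: the abstract compares SAD and SSE (along with SATD) as simpler similarity criteria. The sketch below gives the standard definitions of SAD (sum of absolute differences) and SSE (sum of squared errors) for two equally sized blocks. It is a software illustration only, unrelated to the paper's VHDL designs; the function names and the list-of-rows block format are assumptions, and SATD is omitted because it additionally needs a transform step.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(x - y)
               for row_a, row_b in zip(block_a, block_b)
               for x, y in zip(row_a, row_b))


def sse(block_a, block_b):
    """Sum of squared errors between two equally sized 2-D blocks."""
    return sum((x - y) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for x, y in zip(row_a, row_b))


original = [[1, 2], [3, 4]]
predicted = [[1, 4], [0, 4]]
sad_value = sad(original, predicted)   # differences are 0, -2, 3, 0
sse_value = sse(original, predicted)
```

SAD needs only subtractions, absolute values, and additions, while SSE adds a multiplier per sample, which is consistent with the abstract's point that simpler criteria are more hardware friendly.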
110. Bespoke Reflections: Creating a One-Handed Braille Keyboard.
- Author
-
Ellis, Kirsten, de Vent, Ross, Kirkham, Reuben, and Olivier, Patrick
- Subjects
ASSISTIVE technology, BRAILLE, KEYBOARDING, PEOPLE with disabilities, CURRICULUM
- Abstract
A plethora of assistive technologies are designed to cater to relatively common types of disabilities. However, some people have disabilities or circumstances that fall outside these pluralities, requiring a bespoke assistive technology to be developed and custom built to meet their unique requirements. To explore the opportunities and challenges of such an endeavor, we document the process undertaken to build a braille keyboard for a one-handed blind person over the course of 18-months. This process involved iterative prototyping within an intensive co-creation process, due to the unique needs arising from having two intersecting impairments and the challenges of effectively developing an entirely new format of AAT. Through a structured reflection on this process, we provide an account of the practical, pragmatic and ethical considerations that apply when developing a bespoke assistive technology, whilst illustrating the wider value of bespoke assistive technology development for a more general community of people with disabilities. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
111. Hardware Design of Concatenated Zigzag Hadamard Encoder/Decoder System With High Throughput
- Author
-
Sheng Jiang, Francis C. M. Lau, and Chiu-Wing Sham
- Subjects
Concatenated zigzag Hadamard code, hardware design, high throughput, turbo Hadamard code, zigzag Hadamard code, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Both turbo Hadamard codes and concatenated zigzag Hadamard codes are ultimate-Shannon-limit-approaching channel codes. The former one requires the use of Bahl-Cocke-Jelinek-Raviv (BCJR) in the iterative decoding process, making the decoder structure more complex and limiting its throughput. The latter one, however, does not involve BCJR decoding. Hence its decoder structure can be much simpler and can potentially operate at a much higher throughput. In this paper, we investigate the hardware design of a concatenated zigzag Hadamard encoder/decoder system and implement it onto an FPGA board. We design a decoder capable of decoding multiple codewords at the same time, and the proposed system can operate with a throughput of 1.44 Gbps - an increase of 50% compared with the turbo Hadamard encoder/decoder system. As for the error performance, the encoder/decoder system with a 6-bit quantization achieves a bit error rate of 2 × 10⁻⁵ at Eb/N0 = -0.2 dB.
- Published
- 2020
- Full Text
- View/download PDF
112. Flexible and Scalable FPGA-Oriented Design of Multipliers for Large Binary Polynomials
- Author
-
Davide Zoni, Andrea Galimberti, and William Fornaciari
- Subjects
Computer arithmetic, FPGA, hardware design, multiplication, GF2, applied cryptography, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
With the recent advances in quantum computing, code-based cryptography is foreseen to be one of the few mathematical solutions to design quantum resistant public-key cryptosystems. The binary polynomial multiplication dominates the computational time of the primitives in such cryptosystems, thus the design of efficient multipliers is crucial to optimize the performance of post-quantum public-key cryptographic solutions. This manuscript presents a flexible template architecture for the hardware implementation of large binary polynomial multipliers. The architecture combines the iterative application of the Karatsuba algorithm, to minimize the number of required partial products, with the Comba algorithm, used to optimize the schedule of their computations. In particular, the proposed multiplier architecture supports operands in the order of dozens of thousands of bits, and it offers a wide range of performance-resources trade-offs that is made independent from the size of the input operands. To demonstrate the effectiveness of our solution, we employed the nine configurations of the LEDAcrypt public-key cryptosystem as representative use cases for large-degree binary polynomial multiplications. For each configuration we showed that our template architecture can deliver a performance-optimized multiplier implementation for each FPGA of the Xilinx Artix-7 mid-range family. The experimental validation performed by implementing our multiplier for all the LEDAcrypt configurations on the Artix-7 12 and 200 FPGAs, i.e., the smallest and the largest devices of the Artix-7 family, demonstrated an average performance gain of 3.6x and 33.3x with respect to an optimized software implementation employing the gf2x C library.
- Published
- 2020
- Full Text
- View/download PDF
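Illustrative note for entry 112: the abstract describes applying the Karatsuba algorithm to reduce the number of partial products in large binary polynomial multiplication. The sketch below shows Karatsuba over GF(2) in software, with polynomials stored as Python integers (bit i holds the coefficient of x^i). It does not model the Comba scheduling or the FPGA template architecture from the paper. The names `clmul` and `karatsuba_gf2` and the `threshold` parameter are hypothetical.

```python
def clmul(a, b):
    """Schoolbook carry-less (GF(2)) multiplication of two bit-vector polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def karatsuba_gf2(a, b, bits, threshold=8):
    """Karatsuba multiplication over GF(2) for operands of at most `bits` bits.

    Splits each operand into low and high halves and uses three recursive
    products instead of four. Over GF(2), addition and subtraction are XOR.
    Assumes threshold >= 1 so the recursion terminates.
    """
    if bits <= threshold:
        return clmul(a, b)
    half = bits // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    lo = karatsuba_gf2(a_lo, b_lo, half, threshold)
    hi = karatsuba_gf2(a_hi, b_hi, bits - half, threshold)
    cross = karatsuba_gf2(a_lo ^ a_hi, b_lo ^ b_hi, bits - half, threshold)
    mid = cross ^ lo ^ hi
    return lo ^ (mid << half) ^ (hi << (2 * half))


a = 0xDEADBEEFCAFEBABE
b = 0x0123456789ABCDEF
product = karatsuba_gf2(a, b, 64)
```

The result can be checked against the schoolbook routine: `karatsuba_gf2(a, b, 64)` and `clmul(a, b)` should agree for any pair of 64-bit operands.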
113. Suction pad unit using a bellows pneumatic actuator as a support mechanism for an end effector of depalletizing robots
- Author
-
Junya Tanaka, Akihito Ogawa, Hideichi Nakamoto, Takafumi Sonoura, and Haruna Eto
- Subjects
Depalletizing robot, Logistics site, Transfer method, Hardware design, Pneumatic actuator, Technology, Mechanical engineering and machinery, TJ1-1570, Control engineering systems. Automatic machinery (General), TJ212-225, Machine design and drawing, TJ227-240, Technology (General), T1-995, Industrial engineering. Management engineering, T55.4-60.8, Automation, T59.5, Information technology, T58.5-58.64
- Abstract
This paper describes a vacuum suction-type end effector for depalletizing robots in distribution centers. The developed end effector has multiple suction pad units to which a bellows pneumatic actuator is applied as the support mechanism. Load-bearing capacity is improved due to a high-strength wire provided inside the bellows, and the contraction force is improved due to ring members placed inside of the ridges of the rubber bellows. The developed end effector is attached to the arm of a linear motion-type depalletizing robot, and its real-world performance is verified. Verification results confirm that the suction pad units tolerate cardboard box inclination and differences in box height by a simple lowering motion of the arm, and multiple cardboard boxes can be simultaneously unloaded. Moreover, as compared with conventional end effectors, the developed end effector achieves large expansion and contraction in a thin structure. The developed end effector is expected to broaden applications for depalletizing robots.
- Published
- 2020
- Full Text
- View/download PDF
114. Analysis and Comparison of FPGA-Based Histogram of Oriented Gradients Implementations
- Author
-
Sina Ghaffari, Parastoo Soleimani, Kin Fun Li, and David W. Capson
- Subjects
Histogram of oriented gradients, field programmable gate arrays, hardware acceleration, hardware design, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
One of the commonly-used feature extraction algorithms in computer vision is the histogram of oriented gradients. Extracting the features from an image using this algorithm requires a large amount of computations. One way to boost the speed is to implement this algorithm on field programmable gate arrays, to benefit from flexible designs such as parallel computing. In this paper, we first, provide a summary of the steps of the histogram of oriented gradients algorithm. We then survey the implementation techniques of the histogram of oriented gradients on field-programmable gate arrays in the past decade. We group the different techniques into four main categories and analyze various enhancement methods in each category. The first group is the optimization of the algorithm computation which involves the steps of input selection, magnitude calculation, orientation and bin assignment, and normalization. The second category is data manipulation techniques which include numerical representation, data flow modification, and memory optimization. The third group contains modified features based on the histogram of oriented gradients and their hardware implementation, and the fourth one is the implementations in hardware-software co-design of the algorithm. We compare the different implementations using a speed metric called pixels per clock cycle, and resource utilization. Finally, we provide design summary tables for efficient implementation with respect to the speed metric, accuracy, and resource utilization.
- Published
- 2020
- Full Text
- View/download PDF
115. A Low Power and High Performance Hardware Design for Automatic Epilepsy Seizure Detection
- Author
-
S. Syed Rafiammal, D. Najumnissa, G. Anuradha, S. Kaja Mohideen, P.K. Jawahar, and Syed Abdul Mutalib
- Subjects
epilepsy detection, system on chip implementation, quadrature linear discriminant analysis, hardware design, seizure detection, Electrical engineering. Electronics. Nuclear engineering, TK1-9971, Telecommunication, TK5101-6720
- Abstract
An application specific integrated design using Quadrature Linear Discriminant Analysis is proposed for automatic detection of normal and epilepsy seizure signals from EEG recordings in epilepsy patients. Five statistical parameters are extracted to form the feature vector for training of the classifier. The statistical parameters are Standardised Moment, Co-efficient of Variance, Range, Root Mean Square Value and Energy. The Intellectual Property Core performs the process of filtering, segmentation, extraction of statistical features and classification of epilepsy seizure and normal signals. The design is implemented in Zynq 7000 Zc706 SoC with average accuracy of 99%, Specificity of 100%, F1 score of 0.99, Sensitivity of 98% and Precision of 100% with error rate of 0.0013/hr, which is approximately zero false detection.
- Published
- 2019
- Full Text
- View/download PDF
116. Design and Analysis of Kite for Producing Power up to 2.6 Watts
- Author
-
Urooj, Shabana, Sharma, Urvashi, Tripathi, Pragati, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Bhateja, Vikrant, editor, Coello Coello, Carlos A., editor, Satapathy, Suresh Chandra, editor, and Pattnaik, Prasant Kumar, editor
- Published
- 2018
- Full Text
- View/download PDF
117. Managing the Human Factor During the Working-Out of New Technologies and Hardware: The Reindustrialization Conditions
- Author
-
Kolbachev, Evgeny, Kacprzyk, Janusz, Series editor, Pal, Nikhil R., Advisory editor, Bello Perez, Rafael, Advisory editor, Corchado, Emilio S., Advisory editor, Hagras, Hani, Advisory editor, Kóczy, László T., Advisory editor, Kreinovich, Vladik, Advisory editor, Lin, Chin-Teng, Advisory editor, Lu, Jie, Advisory editor, Melin, Patricia, Advisory editor, Nedjah, Nadia, Advisory editor, Nguyen, Ngoc Thanh, Advisory editor, Wang, Jun, Advisory editor, and Trzcielinski, Stefan, editor
- Published
- 2018
- Full Text
- View/download PDF
118. DEVELOPMENT OF A PARALLEL GRIPPER WITH AN EXTENSION NAIL MECHANISM USING A METAL BELT.
- Author
-
TANAKA, JUNYA and MATSUHIRA, NOBUTO
- Subjects
METALS, ROBOT hands
- Abstract
Aiming to expand the range of applications for parallel grippers, we propose an extension nail mechanism that can be mounted on a parallel gripper. We also propose an extension nail mechanism comprising a stainless steel belt, two transport belts, a triangular nail, and a drive unit. The triangular nail is connected to one end of the stainless steel belt, and the drive unit is connected near the other end. We achieve smooth sliding of the nails underneath objects by arranging the transport belts on either side of the stainless steel belt. By elastically winding one end of the stainless steel belt and each of the transport belts, the nail mechanism can be miniaturized while achieving large expansion and contraction. We achieve stable grasping operations by using the extension nail mechanism of the parallel gripper in accordance with the flexibility of the object. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
119. Proximity coherence for chip-multiprocessors
- Author
-
Barrow-Williams, Nick and Moore, Simon
- Subjects
621.39, Computer science, Hardware design, Proximity Coherence, CMP, Cache design, Network-on-chip, Physical locality
- Abstract
Many-core architectures provide an efficient way of harnessing the growing numbers of transistors available in modern fabrication processes; however, the parallel programs run on these platforms are increasingly limited by the energy and latency costs of communication. Existing designs provide a functional communication layer but do not necessarily implement the most efficient solution for chip-multiprocessors, placing limits on the performance of these complex systems. In an era of increasingly power limited silicon design, efficiency is now a primary concern that motivates designers to look again at the challenge of cache coherence. The first step in the design process is to analyse the communication behaviour of parallel benchmark suites such as Parsec and SPLASH-2. This thesis presents work detailing the sharing patterns observed when running the full benchmarks on a simulated 32-core x86 machine. The results reveal considerable locality of shared data accesses between threads with consecutive operating system assigned thread IDs. This pattern, although of little consequence in a multi-node system, corresponds to strong physical locality of shared data between adjacent cores on a chip-multiprocessor platform. Traditional cache coherence protocols, although often used in chip-multiprocessor designs, have been developed in the context of older multi-node systems. By redesigning coherence protocols to exploit new patterns such as the physical locality of shared data, improving the efficiency of communication, specifically in chip-multiprocessors, is possible. This thesis explores such a design - Proximity Coherence - a novel scheme in which L1 load misses are optimistically forwarded to nearby caches via new dedicated links rather than always being indirected via a directory structure.
- Published
- 2011
- Full Text
- View/download PDF
120. MAREX: A general purpose hardware architecture for membrane computing.
- Author
-
Cascado-Caballero, Daniel, Diaz-del-Rio, Fernando, Cagigas-Muñiz, Daniel, Rios-Navarro, Antonio, Guisado-Lizar, Jose-Luis, Pérez-Hurtado, Ignacio, and Riscos-Núñez, Agustín
- Subjects
-
*LOGIC circuits, *PARALLEL computers, *PARALLEL processing, *HARDWARE
- Abstract
Membrane computing is an unconventional computing paradigm that has gained much attention in recent decades because of its massively parallel character and its usefulness to build models of complex systems. However, until now, there was no generic hardware implementation of P systems. Computational frameworks to execute P systems up to this day rely on the simulation of the parallel working mechanisms of P systems by inherently sequential algorithms. Such algorithms can then be implemented as is or can be parallelized, up to a certain point, to run on parallel computers. However, this is not as efficient as a dedicated parallel hardware implementation. There have been ad hoc implementations of particular P systems for parallel hardware, but they lack to be problem-generic or they are not scalable enough to implement large P systems. In this paper, a first intrinsically parallel hardware architecture to implement generic P system models is introduced. It is designed to be straightforwardly implemented in programmable logic circuits like FPGAs. The feasibility and correct execution of our architecture has been verified by means of a simulator, and several simulation results for different P system examples have been analysed to foresee the pros and cons of this design. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
121. Characteristic analysis of SAR imaging applications and discussion of the hardware design space.
- Author
-
孔玺畅, 文梅, and 蓝强
- Abstract
Synthetic aperture radar (SAR) is an active earth observation system. In recent years, SAR has gradually expanded to multiple platforms and has appeared on small mobile platforms such as unmanned aerial vehicles and probe vehicles. SAR imaging is the imaging program that runs on a SAR system. Because of these new operating environments, it faces stricter requirements for low energy consumption and high computing power. Providing high-performance, low-power application support for a specific platform has become a central concern. This paper analyzes the computation and memory-access characteristics of SAR imaging, optimizes the program, and tests its performance on the x86 platform to obtain a reliable performance reference. On this basis, targeting a DSP + FFT accelerator hardware structure, a mathematical model of the computing power ratio is constructed to guide hardware design. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
122. Walker - An Autonomous, Interactive Walking Aid.
- Author
-
Hackbarth, Johannes and Jacob, Caspar
- Subjects
MULTIMODAL user interfaces, HEARING impaired, HUMAN-robot interaction
- Abstract
In this paper, we describe ongoing work about a robotic walker-frame that was designed to aid patients in an orthopaedic rehabilitation clinic. The so-called Walker is able to autonomously drive to patients and then changes into a more traditional walking-frame, i.e. one that has to be pushed by the patient, but it can still help by giving navigation instructions. Walker was designed with a multi-modal user interface in such a way that it can also be used by visually, hearing or speaking impaired people. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
123. Real-Time Simulation of Hybrid Modular Multilevel Converters Using Shifted Phasor Models
- Author
-
Yingdong Wei, Dewu Shu, Xiaorong Xie, Venkata Dinavahi, and Zheng Yan
- Subjects
DC systems, electromagnetic transient (EMT), field-programmable gate array (FPGA), hardware design, modular multilevel converter (MMC), Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
The real-time simulation of modular multilevel converter (MMC) is essential to the evaluation and validation of its control and protection systems. Moreover, the dynamics at both sub-module level and system level are expected from the real-time simulations of MMCs. To achieve this objective, this paper proposes the shifted phasor modeling (SPM) of the MMC by representing each sub-module with a Thevenin equivalent circuit that is derived using shifted phasors with improved accuracy. The efficiency of the SPM is guaranteed by modeling each arm as a switch-dependent Thevenin equivalent circuit. The computational burden remains almost unchanged even when the number of sub-modules increases considerably. The proposed model is materialized using field programmable gate array. And, thus the real-time simulation of MMC-based DC grids can be realized to capture the dynamics at the system level as well as the sub-module level. The effectiveness of this paper has been validated in terms of both accuracy and efficiency on a two-terminal MMC-based low-voltage direct current system.
- Published
- 2019
- Full Text
- View/download PDF
124. Very Low Power Neural Network FPGA Accelerators for Tag-Less Remote Person Identification Using Capacitive Sensors
- Author
-
Marwen Roukhami, Mihai Teodor Lazarescu, Francesco Gregoretti, Younes Lahbib, and Abdelkader Mami
- Subjects
Indoor person identification, capacitive sensing, neural networks, hardware design, ultra-low power FPGAs, hardware acceleration, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Human detection, identification, and monitoring are essential for many applications aiming to make smarter the indoor environments, where most people spend much of their time (like home, office, transportation, or public spaces). The capacitive sensors can meet stringent privacy, power, cost, and unobtrusiveness requirements, they do not rely on wearables or specific human interactions, but they may need significant on-board data processing to increase their performance. We comparatively analyze in terms of overall processing time and energy several data processing implementations of multilayer perceptron neural networks (NNs) on board capacitive sensors. The NN architecture, optimized using augmented experimental data, consists of six 17-bit inputs, two hidden layers with eight neurons each, and one four-bit output. For the software (SW) NN implementation, we use two STMicroelectronics STM32 low-power ARM microcontrollers (MCUs): one MCU optimized for power and one for performance. For hardware (HW) implementations, we use four ultralow-power field-programmable gate arrays (FPGAs), with different sizes, dedicated computation blocks, and data communication interfaces (one FPGA from the Lattice iCE40 family and three FPGAs from the Microsemi IGLOO family). Our shortest SW implementation latency is 54.4 μs and the lowest energy per inference is 990 nJ, while the shortest HW implementation latency is 1.99 μs and the lowest energy is 39 nJ (including the data transfer between MCU and FPGA). The FPGAs active power ranges between 6.24 and 34.7 mW, while their static power is between 79 and 277 μW. They compare very favorably with the static power consumption of Xilinx and Altera low-power device families, which is around 40 mW. The experimental results show that NN inferences offloaded to external FPGAs have lower latency and energy than SW ones (even when using HW multipliers), and the FPGAs with dedicated computational blocks (multiply-accumulate) perform best.
- Published
- 2019
- Full Text
- View/download PDF
125. Digital control system design for bearingless permanent magnet synchronous motors
- Author
-
L. Chen, X. Sun, S. Wang, Z. Shi, Z. Yang, B. Su, and K. Li
- Subjects
bpmsms ,digital control system ,double-closed speed regulating system ,software design ,hardware design ,Technology ,Technology (General) ,T1-995 - Published
- 2018
- Full Text
- View/download PDF
126. In-Circuit Assertions and Exceptions for Reconfigurable Hardware Design
- Author
-
Todman, Tim, Luk, Wayne, Hinchey, Mike, Series editor, Bowen, Jonathan P., editor, and Olderog, Ernst-Rüdiger, editor
- Published
- 2017
- Full Text
- View/download PDF
127. How to Break Secure Boot on FPGA SoCs Through Malicious Hardware
- Author
-
Jacob, Nisha, Heyszl, Johann, Zankl, Andreas, Rolfes, Carsten, Sigl, Georg, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Fischer, Wieland, editor, and Homma, Naofumi, editor
- Published
- 2017
- Full Text
- View/download PDF
128. Development of a cooperative controller for a dual-motor independent-drive electric tractor.
- Author
-
胡晨明, 刘孟楠, 魏垂泉, and 王通
- Abstract
Copyright of Journal of Henan University of Science & Technology, Natural Science is the property of Editorial Office of Journal of Henan University of Science & Technology and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2021
- Full Text
- View/download PDF
129. High-Level Synthesis of Number-Theoretic Transform: A Case Study for Future Cryptosystems.
- Author
-
Ozcan, Erdem and Aysu, Aydin
- Abstract
Compared to traditional hardware development methodologies, high-level synthesis (HLS) offers a faster time-to-market and lower design-cost at the expense of implementation efficiency. Although HLS tools are becoming popular in some applications, such as digital signal processing and neural network classification, their usability on cryptographic applications is largely unexplored. This feasibility is critical especially for cryptosystems that are under development, such as the next-generation public-key cryptosystems needed for quantum-resistance. This letter provides a thorough investigation of HLS on number theoretic transform (NTT)—the core arithmetic function of lattice-based quantum-resistant cryptosystems. We demonstrate a fast yet extensive design space exploration of NTT through the Vivado HLS tool, analyze the shortcomings/challenges of optimized configurations, and quantitatively compare the results to software-based and hand-coded hardware designs. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
130. Disposable Robotic Finger Driven Pneumatically by Flat Tubes and a Hollow Link Mechanism.
- Author
-
Tanaka, Junya and Matsuhira, Nobuto
- Subjects
- *
ROBOTICS , *ACTUATORS , *WEIGHT-bearing (Orthopedics) , *ROBOTIC exoskeletons , *PNEUMATICS - Abstract
We propose a robotic finger with an exoskeleton-type structure that bends and extends by the deformation force of flat tubes. Our objective is to realize a disposable robot hand for gripping unsanitary objects. To reduce the cost of disposing of the robotic finger, a commercially available cable carrier chain was used for the exoskeleton component, and the flat tubes used in the pneumatic actuator were prepared by thermal processing of a commercially available tube. The driving joint of the robotic finger consists of a hollow link mechanism and two flat tubes, which are respectively arranged inside the hollow link mechanism and at the joint boundary. The proposed joint structure achieves both smooth drivability and good load-bearing capacity. The developed robotic finger weighs approximately 85 g and generates a fingertip force of approximately 4 N when a pressure of 0.25 MPa is applied. Because the developed robotic finger is pneumatically driven, it conforms to the object shape and is compliant to external force. Verification of the mechanism demonstrated that the developed robotic finger is useful because it was able to grasp six types of assumed objects. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
131. Nimbro-OP2X: Affordable Adult-Sized 3D-Printed Open-Source Humanoid Robot for Research.
- Author
-
Ficht, Grzegorz, Farazi, Hafez, Rodriguez, Diego, Pavlichenko, Dmytro, Allgeuer, Philipp, Brandenburger, André, and Behnke, Sven
- Subjects
HUMANOID robots ,VISUAL perception ,HUMAN-robot interaction ,INDUSTRIAL costs ,HUMAN ecology ,ROBOTS - Abstract
For several years, high development and production costs of humanoid robots restricted researchers interested in working in the field. To overcome this problem, several research groups have opted to work with simulated or smaller robots, whose acquisition costs are significantly lower. However, due to scale differences and imperfect simulation replicability, results may not be directly reproducible on real, adult-sized robots. In this paper, we present the NimbRo-OP2X, a capable and affordable adult-sized humanoid platform aiming to significantly lower the entry barrier for humanoid robot research. With a height of 135 cm and weight of only 19 kg, the robot can interact in an unmodified, human environment without special safety equipment. Modularity in hardware and software allows this platform enough flexibility to operate in different scenarios and applications with minimal effort. The robot is equipped with an on-board computer with GPU, which enables the implementation of state-of-the-art approaches for object detection and human perception demanded by areas such as manipulation and human–robot interaction. Finally, the capabilities of the NimbRo-OP2X, especially in terms of locomotion stability and visual perception, are evaluated. This includes the performance at RoboCup 2018, where NimbRo-OP2X won all possible awards in the AdultSize class. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
132. DSP-based bidirectional DC conversion system for more-electric aircraft.
- Author
-
卢建华, 郝凯敏, 李 飞, and 顾鸿赟
- Abstract
Copyright of Ordnance Industry Automation is the property of Editorial Board for Ordnance Industry Automation and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2020
- Full Text
- View/download PDF
133. Hardware Design and Fault-Tolerant Synthesis for Digital Acoustofluidic Biochips.
- Author
-
Zhong, Zhanwei, Zhu, Haodong, Zhang, Peiran, Morizio, James, Huang, Tony Jun, and Chakrabarty, Krishnendu
- Abstract
A digital microfluidic biochip (DMB) is an attractive platform for automating laboratory procedures in microbiology. To overcome the problem of cross-contamination due to fouling of the electrode surface in traditional DMBs, a contactless liquid-handling biochip technology, referred to as acoustofluidics, has recently been proposed. A major challenge in operating this platform is the need for a control signal of frequency 24 MHz and voltage range ±10/±20 V to activate the IDT units in the biochip. In this paper, we present a hardware design that can efficiently activate/deactivate each IDT and can fully automate a bio-protocol. We also present a fault-tolerant synthesis technique that allows us to automatically map biomolecular protocols to acoustofluidic biochips. We develop and experimentally validate a velocity model, and use it to guide co-optimization for operation scheduling, module placement, and droplet routing in the presence of IDT faults. Simulation results demonstrate the effectiveness of the proposed synthesis method. Our results are expected to open new research directions on design automation of digital acoustofluidic biochips. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
134. VLSI implementation of anti‐notch lattice structure for identification of exon regions in Eukaryotic genes.
- Author
-
Pathak, Vikas, Nanda, Satyasai Jagannath, Joshi, Amit Mahesh, and Sahu, Sitanshu Sekhar
- Abstract
In a Eukaryotic gene, identification of exon regions is crucial for protein formation. The periodic‐3 property of exon regions has been used for their identification. An anti‐notch infinite impulse response (IIR) filter is mostly employed to recognise this periodic‐3 property. The lattice structure realisation of the anti‐notch IIR filter requires less hardware than direct form‐II structures. In this study, a hardware implementation of the IIR anti‐notch filter lattice structure is carried out on a Zynq‐series (Zybo board) field programmable gate array (FPGA). The performance of the hardware design has been improved using techniques like retiming, pipelining and unfolding and finally assessed on various Eukaryotic genes. The hardware implementation reduces the time frame to analyse the DNA sequence of Eukaryotic genes for protein formation, which plays a significant role in detecting individual diseases from genetic reports. Here, the performance evaluation is carried out in the MATLAB simulation environment and the results are found similar. Application‐specific integrated circuit (ASIC) implementation of the anti‐notch filter lattice structure is also carried out on CADENCE‐RTL compiler. It is observed that the FPGA implementation is 31 to 34 times faster and the ASIC implementation is 58 to 64 times faster compared to the results generated by the MATLAB platform with similar prediction accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
135. 6WR: A Hardware Friendly 3D-HEVC DMM-1 Algorithm and its Energy-Aware and High-Throughput Design.
- Author
-
Perleberg, Murilo, Borges, Vinicius, Afonso, Vladimir, Palomino, Daniel, Agostini, Luciano, and Porto, Marcelo
- Abstract
This brief presents the Six Wedgelets and six Refinements (6WR), a hardware friendly algorithm targeting the Depth Modeling Mode 1 (DMM-1) encoding tool of the 3D-High Efficiency Video Coding (3D-HEVC) standard. This brief also presents the high-throughput and energy-aware hardware design for the 6WR. The 6WR algorithm reduces 98.5% of the evaluated wedgelets by exploring the edge gradients, with average coding efficiency losses between 1.2% and 2.8%. The hardware design implements the Bresenham algorithm to avoid the use of memory. The synthesis results show that the 6WR architecture can process up to nine views in 3D full HD 1080p videos at 30 frames per second, with a power dissipation of 263.7 mW. When compared with related works, the 6WR architecture reached the highest throughput and the best results of coding efficiency and energy efficiency when the same target throughput is considered. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
136. Design and analysis of SIC: a provably timing-predictable pipelined processor core.
- Author
-
Hahn, Sebastian and Reineke, Jan
- Abstract
We introduce the strictly in-order core (SIC), a timing-predictable pipelined processor core. SIC is provably timing compositional and free of timing anomalies. This enables precise and efficient worst-case execution time (WCET) and multi-core timing analysis. SIC's key underlying property is the monotonicity of its transition relation w.r.t. a natural partial order on its microarchitectural states. This monotonicity is achieved by carefully eliminating some of the dependencies between consecutive instructions from a standard in-order pipeline design. We present a formal proof framework based on satisfiability modulo theories that is able to automatically verify SIC's timing predictability. SIC preserves most of the benefits of pipelining: it is only about 6–7% slower than a conventional non-strict in-order pipelined processor. Its timing predictability enables orders-of-magnitude faster WCET and multi-core timing analysis than conventional designs. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
137. An efficient hardware design for CABAC entropy coding.
- Author
-
傅晨, 郑明魁, 陈志峰, 施隆照, and 王炎
- Abstract
Copyright of Journal of Fuzhou University is the property of Journal of Fuzhou University, Editorial Department and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2020
- Full Text
- View/download PDF
138. Platform Studies: Frequently Questioned Answers
- Author
-
Bogost, Ian and Montfort, Nick
- Subjects
Platform studies ,platforms ,hardware design ,technological determinism ,social constructivism ,pedagogy ,new media - Abstract
We describe six common misconceptions about platform studies, a family of approaches to digital media focused on the underlying computer systems that support creative work. We respond to these and clarify the platform studies concept.
- Published
- 2009
139. A Flexible NTT-Based Multiplier for Post-Quantum Cryptography
- Author
-
Kristjane Koleci, Paolo Mazzetti, Maurizio Martina, and Guido Masera
- Subjects
Number theoretic transform ,accelerator ,General Computer Science ,ASIC ,General Engineering ,QC-MDPC codes ,polynomial product ,convolution ,applied cryptography ,post-quantum cryptography ,hardware design ,FPGA ,General Materials Science ,Electrical and Electronic Engineering - Published
- 2023
- Full Text
- View/download PDF
140. An Analysis of the Impact of Gating Techniques on the Optimization of the Energy Dissipated in Real-Time Systems
- Author
-
Ernest Antolak and Andrzej Pułka
- Subjects
real time ,multitask ,energy-efficient ,time-predictable ,safety systems ,hardware design ,Technology ,Engineering (General). Civil engineering (General) ,TA1-2040 ,Biology (General) ,QH301-705.5 ,Physics ,QC1-999 ,Chemistry ,QD1-999 - Abstract
The paper concerns research on electronics-embedded safety systems. The authors focus on the optimization of the energy consumed by multitasking real-time systems. A new flexible and reconfigurable multi-core architecture based on pipeline processing is proposed. The presented solution uses thread-interleaving mechanisms that allow avoiding hazards and minimizing unpredictability. The proposed architecture is compared with the classical solutions consisting of many processors and based on the scheme using one processor per single task. Energy-efficient task mapping is analyzed and a design methodology, based on minimizing the number of active and utilized resources, is proposed. New techniques for energy optimization are proposed, mainly clock gating and switching-resources blocking. The authors investigate two main factors of the system: setting the processing frequency, and gating techniques; the latter are used under the assumption that the system meets the requirements of time predictability. The energy consumed by the system is reduced. Theoretical considerations are verified by many experiments of the system’s implementation in an FPGA structure. The set of tasks tested consists of programs that implement Mälardalen WCET benchmark algorithms. The tested scenarios are divided into periodic and non-periodic execution schemes. The obtained results show that it is possible to reduce the dynamic energy consumed by real-time applications while still meeting their other requirements.
- Published
- 2022
- Full Text
- View/download PDF
141. Algorithmic Multi-Ported Memories Enabled Power-Efficient Pre-Distorter Design in ASIC
- Author
-
Shen, Xuying
- Abstract
The transition from the 5G to the 6G era is a pivotal juncture in contemporary wireless communication. Under such a circumstance, Digital Pre-Distortion (DPD) technology has established its significance as an effective method to linearize Power Amplifiers. However, DPD is facing a series of challenges, notably the increased bandwidth which necessitates more complex modeling techniques. This thesis focuses on the fact that the DPD requires multi-ported memories for the Look-Up-Tables to store correction coefficients, where two research questions are identified. Firstly, this thesis analyses the power, area, and delay-performance trade-offs with an increase in the number of read and write ports of Flip-Flop (FF)-based memories. Secondly, this thesis evaluates and compares the performance of the conventional FF-based multi-ported memories and algorithmic FF-based multi-ported memories. As a Master’s thesis project, this research utilizes the knowledge and practice skills expected of a Master’s student specializing in Embedded Systems. In this thesis, conventional and algorithmic multi-ported memories are implemented and evaluated after studying related works. Subsequently, an industrial Application-Specific Integrated Circuit (ASIC) design flow is executed, undergoing iterative refinements. And in the end, the conclusions are drawn based on an analysis of the software reports. The results underscore that area and power consumption exhibit linear growth alongside increased port numbers within conventional multi-ported memories. Also, the algorithmic multi-ported memory presents a promising alternative, engendering improvements across all three dimensions of delay, area, and power consumption. The implemented memories can be integrated into DPD forward path with customized port numbers in the future, offering adaptability in terms of port configuration and better performance in terms of timing, area and power. 
Additionally, these implemented memories stand as a valuable reference point for engineers engaged in the development of FF-based multi-port…
- Published
- 2023
142. Design, implementation and evaluation of an out-of-order instruction queue based on a parameterizable model
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Moretó Planas, Miquel, Hernández Calderón, César, Mendoza Escobar, Jonnatan, and Iznardo Ruiz, Alejandro
- Published
- 2023
143. Computation-in-memory from application-specific to programmable designs based on memristor devices
- Author
-
Zahedi, M.Z. (author)
- Abstract
Computation-in-Memory (CIM) is a promising alternative to traditional computing systems where the storage is conceptually separated from the computing units. Instead, the CIM paradigm aims to perform the computation where the data resides, alleviating the memory bottleneck and ultimately leading to higher energy efficiency and performance. From the memory technology perspective, memristors, emerging non-volatile memory devices, demonstrate various beneficial characteristics. Although the concept of CIM, in combination with these emerging memory technologies, is in its infancy, it shows great potential for the future of computing systems. To further understand and quantify the potential of CIM, more development is required at each abstraction level. In this thesis, we first explore the main potentials for memristor-based computation-in-memory. Then, we study different applications from the CIM perspective to understand different behaviors and patterns of applications and use this knowledge to develop architectural solutions for CIM. Based on that, we study the realization of CIM as a generic and flexible platform at a micro-architecture level. Computer Engineering
- Published
- 2023
144. Fault Management Impacts on the Networking Systems Hardware Design
- Author
-
Vitucci, Carlo, Sundmark, Daniel, Jagemar, M., Danielsson, J., Larsson, A., and Nolte, Thomas
- Abstract
Processing capacity distribution has become widespread in the fog computing era. End-user services have multiplied, from consumer products to Industry 5.0. In this scenario, the services must have a very high reliability level. But in a system with such displacement of hardware, the reliability of the service necessarily passes through the hardware design. Devices shall have a high quality, but they shall also efficiently support fault management. Hardware design must take into account all fault management functions and participate in creating a fault management policy to ensure that the ultimate goal of fault management is fulfilled, namely to increase a system's reliability efficiently and sustainably, in terms of both the system's performance and the product's cost. This paper analyzes the hardware design techniques that efficiently contribute to the realization of fault management and, consequently, guarantee a high level of reliability and availability for the services offered to the end customer. We describe hardware requirements and how they affect the choice of devices in the hardware design of networking systems.
- Published
- 2023
- Full Text
- View/download PDF
145. Research on design and key technology of wideband radar intermediate frequency direct acquisition module based on Virtex-7 series FPGA
- Author
-
Jian Zhang, Yue Zhang, Zengping Chen, and Qianqiang Lin
- Subjects
clocks ,field programmable gate arrays ,protocols ,data acquisition ,radar equipment ,buffer circuits ,integrated circuit design ,analogue-digital conversion ,wideband radar intermediate frequency direct acquisition module ,Virtex-7 series FPGA ,radar system ,hardware design ,structure design ,data flow design ,FPGA top-level design ,DDR3 buffer design ,high-speed ADC card ,JESD204B protocol configuration ,clock optimisation ,Engineering (General). Civil engineering (General) ,TA1-2040 - Abstract
The wideband radar intermediate frequency direct acquisition module is used to acquire, process and transmit data, and is an important part of a radar system. Based on the current advanced Virtex-7 FPGA board and high-speed ADC card, this article designs the module from the two aspects of hardware design and hardware development, including structure design, data flow design, and FPGA top-level design. The article then investigates some of the key technologies involved, including JESD204B protocol configuration, data flow transmission, DDR3 buffer design and clock optimisation. Practice has proved that the design presented in this article can accomplish the task well in practical engineering.
- Published
- 2019
- Full Text
- View/download PDF
146. Efficient spatial and temporal safety for microcontrollers and application-class processors
- Author
-
Rugg, Peter
- Subjects
Memory safety ,RISC-V ,Hardware design ,CHERI - Abstract
This thesis discusses the implementation of Capability Hardware Enhanced RISC Instructions (CHERI) secure capabilities for RISC-V microarchitectures. This includes implementations for three different scales of core, including microcontrollers and the first open application of CHERI to a superscalar processor. Tradeoffs in developing the architecture and performant microarchitecture are investigated. The processors are then used as a platform to conduct research in reducing the overheads for achieving temporal safety with CHERI. CHERI offers a contemporary cross-architecture description of capabilities. The initial design was previously carried out in a single MIPS processor. Based on its success in this context, this thesis investigates the microarchitectural implications across a wider range of processors. To improve adoption, this work is performed on the more contemporary RISC-V architecture. The thesis also explores the microarchitectural implications of architectural decisions arising from the adaptation of CHERI to this new context. The first implementations are to the Piccolo and Flute microcontrollers. They present new tradeoffs, for example being the first CHERI implementations supporting a merged register file and capability mode bit. The area and frequency implications are evaluated on FPGA, and the performance and power overheads are investigated across a range of benchmarks. To validate correctness, the processors are integrated into a new TestRIG infrastructure. This thesis also develops the first open instantiation of CHERI for a superscalar out-of-order application-class core: RiscyOO. This presents new questions due to the very different design of the more sophisticated microarchitecture, and highlights more architectural tradeoffs. Again, the processor is evaluated on FPGA, investigating area, frequency, power, and performance. This allows the first analysis of how the overheads scale differently across different sizes of core. 
Finally, the augmented processors are used as a platform to refine the use of CHERI for temporal safety. Significant improvements are made to the architecture-neutral model used for revocation sweeps. In addition, processor-specific acceleration of revocation is performed, including new approaches for caching capability tags.
- Published
- 2023
- Full Text
- View/download PDF
147. A Symmetric and Multilayer Reconfigurable Architecture for Hash Algorithm
- Author
-
Wang, Wang Fan, Qinrang Liu, Xinyi Zhang, Yanzhao Gao, Xiaofeng Qi, and Xuan
- Subjects
hash algorithm ,reconfigurable computing ,hardware design ,parallelism - Abstract
As an essential protection mechanism of information security, hash algorithms are extensively used in various security mechanisms. The diverse application scenarios make the implementation of hash algorithms more challenging regarding flexibility, performance, and resources. Since the existing studies have such issues as wasted resources and few algorithms are supported when implementing hash algorithms, we proposed a new reconfigurable hardware architecture for common hash algorithms in this paper. First, we used the characteristics of symmetry of SM3 (Shang Mi 3) and SHA2 (Secure Hash Algorithm 2) to design an architecture that also supports MD5 (Message Digest 5) and SHA1 (Secure Hash Algorithm 1) on both sides. Then we split this architecture into two layers and eliminated the resource wastes introduced by different word widths through exploiting greater parallelism. Last, we further divided the architecture into four operators and designed an array. The experimental results showed that our architecture can support four types of hash algorithms successfully, and supports 32-bit and 64-bit word widths without wasting resources. Compared with existing designs, our design has a throughput rate improvement of about 56.87–226% and a throughput rate per resource improvement of up to 5.5 times. Furthermore, the resource utilization rose to 80% or above when executing algorithms.
- Published
- 2023
- Full Text
- View/download PDF
148. From the Standards to Silicon: Formally Proved Memory Controllers
- Author
-
Felipe, Mihail, and Florian
- Subjects
DRAM ,Coq ,Hardware Design ,Code Generation - Abstract
Recent research in both academia and industry has successfully used deductive verification to design hardware and prove its correctness. While tools and languages to write formally proved hardware have been proposed, applications and use cases are often overlooked. In this work, we focus on Dynamic Random Access Memories (DRAM) controllers and the DRAM itself – which has its expected temporal and functional behaviours described in the standards written by the Joint Electron Device Engineering Council (JEDEC). Concretely, we associate an existing Coq DRAM controller framework – which can be used to write DRAM scheduling algorithms that comply with a variety of correctness criteria – to a back-end system that generates proved logically equivalent hardware. This makes it possible to simultaneously enjoy the trustworthiness provided by the Coq framework and use the generated synthesizable hardware in real systems. We validate the approach by using the generated code as a plug-in replacement in an existing DDR4 controller implementation, which includes a host interface (AXI), a physical layer (PHY) from Xilinx, and a model of a memory part Micron MT40A1G8WE-075E:D. We simulate and synthesise the full system.
- Published
- 2023
149. Designing New Memory Systems for Next-Generation Data Centers
- Author
-
Mao, Howard Zhehao
- Subjects
Computer engineering ,Computer Architecture ,Distributed memory ,DRAM caching ,Hardware Design ,Warehouse-scale computing - Abstract
In recent years, there has been a trend towards greater use of DRAM in data center applications. In-memory key-value stores are being used to cache or replace disk-based databases, and memory-based big data frameworks are supplanting earlier disk-based frameworks. However, process technology improvements for DRAM have not kept pace and instead have stagnated in terms of cost and density. With next-generation memory technologies like STT-MRAM, PCRAM, and RRAM still far from commercial viability, improving memory utilization is the most potentially fruitful path towards reducing the cost of data center memory in the near term. A typical way to improve utilization of a resource in a data center is to disaggregate said resource, allowing it to be shared across multiple nodes. Disaggregating memory has generally been quite difficult because the latency of a round-trip across a typical data center network is much greater than the latency of a DRAM access. However, recent work on photonic interconnects promises to deliver data center networks with much lower latencies, making the concept of data center remote memory more feasible. In this work, we present the design of a DRAM caching remote memory system which divides a data center rack into compute-specialized and memory-specialized nodes. Each memory blade contains a fixed-function hardware controller that serves data from its large pool of DRAM to the compute blades through the rack network. Each compute blade contains a small local DRAM that is used as a cache for remote memory. This local DRAM is managed by a hardware controller which automatically refills the cache on misses by sending requests to remote memory. This system provides a global pool of memory that can be dynamically allocated among the compute blades and is transparent to software. We evaluated our system using microbenchmarks and realistic data center applications in cloud FPGA-based RTL simulations. 
Through these evaluations, we found that our DRAM caching system can serve data at lower latencies than earlier virtual memory-based remote memory systems and, with the aid of prefetching, can achieve performance comparable to local DRAM.
- Published
- 2020
150. Efficient High-Level Coding in a PLC to FPGA Translation and Implementation Flow
- Author
-
Economakos, Christoforos, Economakos, George, Sobh, Tarek, editor, and Elleithy, Khaled, editor
- Published
- 2015
- Full Text
- View/download PDF