2,176 results on '"Datapath"'
Search Results
2. Gilat Satellite awarded over $5M in support of U.S. Department of Defense
- Subjects
United States. Department of Defense; Datapath; Gilat Satellite Networks Ltd.; Satellite communications services industry -- Military aspects; Business; News, opinion and commentary
- Abstract
Gilat Satellite Networks announced that its wholly owned US-based subsidiary, DataPath, received contracts in support of the US Department of Defense for core terminal related services, technology insertion and upgrades [...]
- Published
- 2024
3. Datapath
- Author
Jantsch, Axel
- Published
- 2023
4. Q2 2024 Gilat Satellite Networks Ltd Earnings Call - Final
- Subjects
Datapath; Gilat Satellite Networks Ltd.; Satellite communications services industry; Business
- Abstract
Presentation OPERATOR: Ladies and gentlemen, thank you for standing by. Welcome to Gilat's second-quarter 2024 results conference call. (Operator Instructions) As a reminder, this conference is being recorded, August 7, [...]
- Published
- 2024
5. DataPath Names Robinson as President
- Subjects
Datapath; General Dynamics Corp.; Gilat Satellite Networks Ltd.; Satellite communications services industry; Aircraft industry; Business; Computers and office automation industries; Telecommunications industry
- Abstract
INTERNET BUSINESS NEWS-(C)1995-2024 M2 COMMUNICATIONS Israel-based satellite networking technology, solutions, and services company Gilat Satellite Networks Ltd. (NASDAQ: GILT) (TASE: GILT) has appointed Nicole Robinson as president of DataPath, the [...]
- Published
- 2024
6. Gilat Received Over $4 Million Order from the US Department of Defense
- Subjects
United States. Department of Defense; Datapath; Gilat Satellite Networks Ltd.; Satellite communications services industry; Defense industry; Satellite communications; Business; Business, international
- Abstract
M2 PRESSWIRE-October 28, 2024-: Gilat Received Over $4 Million Order from the US Department of Defense (C)1994-2024 M2 COMMUNICATIONS RDATE:28102024 * Gilat to provide DKET 3421 terminals for easy-to-deploy transportable [...]
- Published
- 2024
7. Implementation of 32-bit ISA Five-Stage Pipeline RISC-V Processor Core
- Author
Kalmath, Manjunath, Kulkarni, Akshay, Siddamal, Saroja V., Mallidu, Jayashree, Bansal, Jagdish Chand, Series Editor, Deep, Kusum, Series Editor, Nagar, Atulya K., Series Editor, Pandit, Manjaree, editor, Gaur, M. K., editor, Rana, Prashant Singh, editor, and Tiwari, Akhilesh, editor
- Published
- 2022
8. Shader Datapath Design of OpenGL-ES Lite
- Author
Lei, Yuhan, Li, Ruichao, Xing, Lidong, Xhafa, Fatos, Series Editor, Xie, Quan, editor, Zhao, Liang, editor, Li, Kenli, editor, Yadav, Anupam, editor, and Wang, Lipo, editor
- Published
- 2022
9. Gilat Satellite awarded $5M order from U.S. Department of Defense
- Subjects
United States. Department of Defense; Datapath; Gilat Satellite Networks Ltd.; Satellite communications services industry; Defense industry; Business; News, opinion and commentary
- Abstract
Gilat Satellite Networks announced that the US Department of Defense awarded another $5 million order to one of the company's US-based subsidiaries, DataPath. This additional order is for DKET 3421 [...]
- Published
- 2024
10. Gilat Awarded Over $5 Million Order from the US Department of Defense
- Subjects
United States. Department of Defense; Datapath; Gilat Satellite Networks Ltd.; Satellite communications services industry; Military paraphernalia; Mechanization, Military; Satellite communications; Banking, finance and accounting industries; Business
- Abstract
PETAH TIKVA, Israel, April 03, 2024 (GLOBE NEWSWIRE) -- Gilat Satellite Networks Ltd. (NASDAQ, TASE: GILT), a worldwide leader in satellite networking technology, solutions, and services, announced today that the [...]
- Published
- 2024
11. Gilat Satellite Networks «Gilat Awarded $5 Million Order from the US Department of Defense
- Subjects
United States. Department of Defense; Datapath; Gilat Satellite Networks Ltd.; Satellite communications services industry; Satellite communications; Business; Business, international
- Abstract
M2 PRESSWIRE-April 1, 2024-: Gilat Satellite Networks «Gilat Awarded $5 Million Order from the US Department of Defense (C)1994-2024 M2 COMMUNICATIONS RDATE:29032024 * Gilat to provide DKET 3421 terminals for [...]
- Published
- 2024
12. A 16nm All-Digital Hardware Monitor for Evaluating Electromigration Effects in Signal Interconnects Through Bit-Error-Rate Tracking.
- Author
Pande, Nakul, Zhou, Chen, Lin, Ming-Hsien, Fung, Rita, Wong, Richard, Wen, Shi-Jie, and Kim, Chris H.
- Abstract
The impact of Electromigration (EM) on the Bit-Error-Rate (BER) of signal interconnect paths was experimentally examined. An array-based test-vehicle for tracking Bit-Error-Rate (BER) degradation of signal interconnects subject to Direct-Current (DC) EM stress was implemented in a 16nm FinFET process. A unit interconnect path comprises five identical interconnect stages where each wire is driven by inverter based buffers. Accelerated EM stress testing is achieved entirely on-chip using metal heaters located directly above the devices-under-test (DUTs) and separate stress circuits driving both ends of the wire. The proposed test structure features a ring-based Voltage-Controlled-Oscillator (VCO), a bit-pattern generator and local BER sampling monitors which enable bitwise tracking of ‘0’ and ‘1’ errors separately, further simplifying the overall test-setup and allowing for high precision characterization of EM induced resistance shifts using only digital circuits. Measurement data collected from the 16nm prototype reveals unique insights into EM induced signal path degradation that was not available prior to this work. Our experimental studies suggest that monitoring the BER of an interconnect path could be used as a new metric for capturing EM induced resistance shifts in a real system, in lieu of the conventional approach which focuses on monitoring standalone wire resistances. Supplemental simulations showcasing the projected degradation in the interconnect path operating frequency as a function of stress time constructed from resistance traces sampled from identical wires implemented in the same process reaffirm the measurement trends. [ABSTRACT FROM AUTHOR]
- Published
- 2022
13. Lightweight Datapath Implementation of ANU Cipher for Resource-Constrained Environments
- Author
Dahiphale, Vijay, Bansod, Gaurav, Zambare, Ankur, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Arai, Kohei, editor, Bhatia, Rahul, editor, and Kapoor, Supriya, editor
- Published
- 2019
14. Gilat awarded $5m order from US Department of Defense
- Subjects
United States. Department of Defense; Datapath; Gilat Satellite Networks Ltd.; Satellite communications services industry; Defense industry; Satellite communications; General interest; News, opinion and commentary
- Abstract
PETAH TIKVA: Gilat Satellite Networks Ltd. (Nasdaq: GILT, TASE: GILT), a worldwide leader in satellite networking technology, solutions, and services, announced today that the US Department of Defense awarded another [...]
- Published
- 2024
15. Gilat Awarded $5 Million Order from the US Department of Defense
- Subjects
United States. Department of Defense; Datapath; Gilat Satellite Networks Ltd.; Satellite communications services industry; Defense industry; Satellite communications; Banking, finance and accounting industries; Business
- Abstract
PETAH TIKVA, Israel, March 28, 2024 (GLOBE NEWSWIRE) -- Gilat Satellite Networks Ltd. (Nasdaq: GILT, TASE: GILT), a worldwide leader in satellite networking technology, solutions, and services, announced today that [...]
- Published
- 2024
16. Gilat Ltd., Awarded $10 Million Follow-On Order from the US Department of Defense
- Subjects
United States. Department of Defense; Datapath; Gilat Satellite Networks Ltd.; Satellite communications services industry; Defense industry; General interest; News, opinion and commentary
- Abstract
PETAH TIKVA: Gilat Satellite Networks Ltd. announces that the US Department of Defense awarded a $10 million follow-on order to one of the company's US-based subsidiaries, DataPath. This additional order [...]
- Published
- 2024
17. Gilat Awarded $10 Million Follow-On Order from the US Department of Defense
- Subjects
United States. Department of Defense; Datapath; Gilat Satellite Networks Ltd.; Satellite communications services industry; Defense industry; Satellite communications; Business; Business, international
- Abstract
M2 PRESSWIRE-February 8, 2024-: Gilat Awarded $10 Million Follow-On Order from the US Department of Defense (C)1994-2024 M2 COMMUNICATIONS RDATE:08022024 Gilat to provide additional DKET 3421 terminals for easy-to-deploy transportable [...]
- Published
- 2024
18. Gilat Awarded $10 Million Follow-On Order from the US Department of Defense
- Subjects
United States. Department of Defense; Datapath; Gilat Satellite Networks Ltd.; Satellite communications services industry; Defense industry; Satellite communications; Banking, finance and accounting industries; Business
- Abstract
PETAH TIKVA, Israel, Feb. 08, 2024 (GLOBE NEWSWIRE) -- Gilat Satellite Networks Ltd. (Nasdaq: GILT, TASE: GILT), a worldwide leader in satellite networking technology, solutions, and services, announced today that [...]
- Published
- 2024
19. DataPath Names Robinson as President
- Subjects
Datapath; Gilat Satellite Networks Ltd.; General Dynamics Corp.; Satellite communications services industry; Aircraft industry; Business
- Abstract
M2 EQUITYBITES-May 21, 2024-DataPath Names Robinson as President (C)2024 M2 COMMUNICATIONS http://www.m2.co.uk Israel-based satellite networking technology, solutions, and services company Gilat Satellite Networks Ltd. (NASDAQ: GILT) (TASE: GILT) has appointed [...]
- Published
- 2024
20. DataPath Names Robinson as President
- Subjects
Datapath; General Dynamics Corp.; Gilat Satellite Networks Ltd.; Satellite communications services industry; Aircraft industry; Business; Telecommunications industry
- Abstract
TELECOMWORLDWIRE-May 21, 2024-DataPath Names Robinson as President (C)1994-2024 M2 COMMUNICATIONS http://www.m2.co.uk Israel-based satellite networking technology, solutions, and services company Gilat Satellite Networks Ltd. (NASDAQ: GILT) (TASE: GILT) has appointed Nicole [...]
- Published
- 2024
21. Enabling Near-Data Accelerators Adoption by Through Investigation of Datapath Solutions.
- Author
Santos, Paulo C., de Lima, João P. C., de Moura, Rafael F., Alves, Marco A. Z., Beck, Antonio C. S., and Carro, Luigi
- Subjects
*CACHE memory; *INFORMATION sharing
- Abstract
Processing-in-Memory (PIM) or Near-Data Accelerator (NDA) has been recently revisited to mitigate the issues of memory and power wall, mainly supported by the maturity of 3D-stacking manufacturing technology, and the increasing demand for bandwidth and parallel data access in emerging processing-hungry applications. However, as these designs are naturally decoupled from main processors, at least three open issues must be tackled to allow the adoption of PIM: how to offload instructions from the host to NDAs, since many can be placed along memory; how to keep cache coherence between host and NDAs, and how to deal with the internal communication between different NDA units considering that NDAs can communicate to each other to better exploit their adoptions. In this work, we present an efficient design to solve these challenges. Based on the hybrid Host-Accelerator code, to provide fine-grain control, our design allows transparent offloading of NDA instructions directly from a host processor. Moreover, our design proposes a data coherence protocol, which includes an inclusion-policy agnostic cache coherence mechanism to share data between the host processor and the NDA units, transparently, and a protocol to allow communication between different NDA units. The proposed mechanism allows full exploitation of the experimented state-of-the-art design, achieving a speedup of up to 14.6× compared to an AVX architecture on PolyBench Suite, using, on average, 82% of the total time for processing and only 18% for the cache coherence and communication protocols. [ABSTRACT FROM AUTHOR]
- Published
- 2021
22. Datapath Design and Power Optimization in High Speed Core.
- Author
Anandi, V., Ramesh, M., and Ramesh, Suraj
- Subjects
VERY large scale circuit integration; LOGIC design; MATHEMATICAL optimization; MULTICASTING (Computer networks); SPEED; WORK design
- Abstract
In today's high performance advanced VLSI circuit design including most semicustom designs adapt efficient utilization of circuit resources with better productivity offering a good layout density and high performances with optimum resources in submicron designs. The major focus of this work is to optimize the Functional Unit Block which is part of a processor in terms of timing, power and area. Datapath design methodology along with the enhancement of the existing design technology and optimization methods, is used to deliver a quality design and this work also proposes various logical optimization, RTL changes, placement, routing optimization techniques to meet the specifications in terms of power, timing. Datapath designing is a method where the logic synthesis of the design is done manually without any tool based, approach, to give more control over the design. The major focus of the work is to understand the functionality of the functional unit block, and implement the best suitable power optimization methodologies that include sizing sequential cells, Latch conversions, Redundant buffer removal, Clock gating and buffer merging to optimize the FUB in terms of timing and power to meet the specifications and also the quality. A power gain of 0.4% is claimed in this paper. [ABSTRACT FROM AUTHOR]
- Published
- 2021
23. A Framework for High-Level Synthesis of VLSI Circuits Using a Modified Moth-Flame Optimization Algorithm
- Author
M.R. Esmaeili, S.H. Zahiri, and S.M. Razavi
- Subjects
digital vlsi circuits; datapath; digital filters; high-level synthesis; moth-flame algorithm; Computer engineering. Computer hardware; TK7885-7895; Science
- Abstract
Background and Objectives: High-level synthesis (HLS) is one of the substantial steps in designing VLSI digital circuits. The primary purpose of HLS is to minimize the digital units used in the system to improve their power, delay, and area. Methods: In the modified MFO algorithm presented in this paper, a hyperbolic spiral is chosen as the update mechanism of moths. Also, by presenting a new approach, a paramount issue involved in applying meta-heuristic methods for solving HLS problems of VLSI circuits has been disentangled. Results: By comparing the performance of the proposed method with Genetic algorithm (GA)-based method and particle swarm optimization (PSO)-based method for the synthesis of the digital filters, it is concluded that the proposed method has the higher ability in the HLS of data path in digital filters. The best improvement is 2.78% for the delay (latency), 6.51% for the occupied area of the chip and 6.93% in power consumption. Another feature of the proposed method is its high-speed in finding optimal solutions, in a manner which, more than 21.6% and 12.9% faster than the GA-based and PSO-based methods, respectively on average. Conclusion: The most important very large scale integration (VLSI) circuits are digital filters and transformers, which are widely used in audio and video processing, medical signal processing, and telecommunication systems. The complex, expansive, and discrete nature of design space in high-level synthesis problems has made them one of the most difficult problems in VLSI circuit design.
- Published
- 2018
24. A novel method for high-level synthesis of datapaths in digital filters using a moth-flame optimization algorithm.
- Author
Esmaeili, Mohammad Reza, Zahiri, Seyed Hamid, and Razavi, Seyed Mohammad
- Abstract
High-level synthesis (HLS) is one of the most important processes in digital VLSI circuit design. Owing to complexity and enormity of the design space in HLS problems, employing meta-heuristic methods and swarm intelligence has been considered as a highly favorable option when solving such problems. This research work proposes a moth-flame optimization (MFO) algorithm-based method for HLS of datapaths in digital filters, where scheduling, allocating, and binding steps were performed simultaneously. It was observed that the efficiency of the proposed method enjoyed an improved efficiency thanks to the mentioned simultaneous steps while being combined with the MFO algorithm. By comparing the performance of the proposed method with Genetic algorithm based method and particle swarm optimization based method for HLS of digital filters benchmarks, it can be inferred that the proposed method outperforms the other two methods in HLS of digital filters. This is evidently approved by a maximum improvement observed in the rates of the delay, the occupied area of the chip, and the power consumption for 2.99%, 6.58%, and 6.48%, respectively. In addition to the mentioned improvement, another striking characteristic of the proposed method is its fast runtime in reaching a response. This could significantly lower the costs while increasing the design speed of circuits having large dimensions. As well, an averagely 20% rise was also discerned in the algorithm runtime compared to the other two methods. [ABSTRACT FROM AUTHOR]
- Published
- 2020
25. Output Domain Downscaler
- Author
Büyükmıhçı, Mert, Levent, Vecdi Emre, Guzel, Aydin Emre, Ates, Ozgur, Tosun, Mustafa, Akgün, Toygar, Erbas, Cengiz, Gören, Sezer, Ugurdag, Hasan Fatih, Diniz Junqueira Barbosa, Simone, Series editor, Chen, Phoebe, Series editor, Du, Xiaoyong, Series editor, Filipe, Joaquim, Series editor, Kara, Orhun, Series editor, Kotenko, Igor, Series editor, Liu, Ting, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Czachórski, Tadeusz, editor, Gelenbe, Erol, editor, Grochla, Krzysztof, editor, and Lent, Ricardo, editor
- Published
- 2016
26. Novel Shannon-Based Low-Power Full-Adder Architecture for Neural Network Applications
- Author
Lalithamma, G. A., Puttaswamy, P. S., Sridhar, V, editor, Sheshadri, Holalu Seenappa, editor, and Padma, M C, editor
- Published
- 2014
27. Design and Implementation of novel datapath designs of lightweight cipher RECTANGLE for resource constrained environment.
- Author
Dahiphale, Vijay, Raut, Hrishikesh, and Bansod, Gaurav
- Subjects
CIPHERS; BLOCK ciphers; SMART cards; ENCRYPTION protocols; INTELLIGENT sensors; RECTANGLES
- Abstract
The advancements in IoT and manufacturing techniques has given rise to the use of small embedded devices such as RFIDs, sensor nodes and smart cards. Due to hardware and software constraints, the standard encryption algorithm like AES cannot be used for encryption of such devices. Thus lightweight block ciphers like PRESENT, LED, MIDORI, and RECTANGLE were proposed. RECTANGLE cipher uses Bit-slice technology to make it suitable for the extremely constrained environment. For achieving high efficiency and good performance, along with the selection of the cipher, it is important to implement it with the right datapath. In this paper, we have focused on the hardware implementation of block cipher RECTANGLE with different novel datapaths. In this paper, we have proposed, implemented and evaluated 5 most efficient datapaths of different data bus size of RECTANGLE cipher. All these datapaths are implemented on different FPGA platforms with the same implementation conditions and the results are compared on every performance metric. Based on the device and desired performance metrics, one can choose the best-suited architecture for their application. This paper presents a novel architectures of cipher RECTANGLE suitable for the constrained environment. We have also compared the metrics of the proposed architectures with that of other standard lightweight block ciphers to obtain a better insight on the resourcefulness of the proposed datapaths. [ABSTRACT FROM AUTHOR]
- Published
- 2019
28. A Deep Neural Network Training Architecture With Inference-Aware Heterogeneous Data-Type
- Author
Jaekang Shin, Lee-Sup Kim, and Seungkyu Choi
- Subjects
Speedup; Artificial neural network; Edge device; Computer science; business.industry; Deep learning; Inference; Data type; Theoretical Computer Science; Computational Theory and Mathematics; Computer engineering; Hardware and Architecture; Datapath; Artificial intelligence; business; Throughput (business); Software
- Abstract
As deep learning applications often encounter accuracy degradation due to the distorted inputs from a variety of environmental conditions, training with personal data has become essential for the edge devices. Hence, training on edge by supporting a trainable deep learning accelerator has been actively studied. Nevertheless, previous research does not consider the fundamental datapath for training and the importance of retaining the high performance for inference tasks. In this work, we propose NeuroFlix, a deep neural network training accelerator supporting heterogeneous data-type of floating- and fixed-point for input operands. From two perspectives: 1) separate precision decision for each input data, 2) maintenance of high performance on inference, we configure the data with low-bit fixed-point of activation/weight and floating-point based error gradient securing up to half-precision. A novel MAC architecture is designed to compute low/high-precision modes for the different input combinations. By substituting a high-cost floating-point based addition to brick-level separate accumulations, we realize both area-efficient architecture and high throughput for low-precision computation. Consequently, NeuroFlix outperforms the previous architectures of state-of-the-art configurations proving its high efficiency in both training and inference. By also comparing with the off-the-shelf bfloat16-based accelerator, it achieves 1.2/2.0 of speedup/energy-efficiency at training and further enhancement of 3.6/4.5 at inference.
- Published
- 2022
29. Design of High Performance MIPS Cryptography Processor
- Author
Singh, Kirat Pal, Parmar, Shivani, Kumar, Dilip, Akan, Ozgur, Series editor, Bellavista, Paolo, Series editor, Cao, Jiannong, Series editor, Dressler, Falko, Series editor, Ferrari, Domenico, Series editor, Gerla, Mario, Series editor, Kobayashi, Hisashi, Series editor, Palazzo, Sergio, Series editor, Sahni, Sartaj, Series editor, Shen, Xuemin (Sherman), Series editor, Stan, Mircea, Series editor, Xiaohua, Jia, Series editor, Zomaya, Albert, Series editor, Coulson, Geoffrey, Series editor, Singh, Karan, editor, and Awasthi, Amit K., editor
- Published
- 2013
30. Case Study 2: DSIP Architecture Instances for FIR Filtering
- Author
Fasthuber, Robert, Catthoor, Francky, Raghavan, Praveen, and Naessens, Frederik
- Published
- 2013
31. Microprogrammed Architectures
- Author
Schaumont, Patrick R.
- Published
- 2013
32. Gilat Satellite Networks to Acquire DataPath to Boost its Presence in the US Defense Market
- Subjects
Datapath; Gilat Satellite Networks Ltd.; Satellite communications services industry; Telecommunications industry
- Abstract
Gilat Satellite Networks has signed a definitive agreement to acquire DataPath to boost its presence in the U.S. defense market, the satellite ground tech company announced March 9. Gilat said [...]
- Published
- 2023
33. A power–performance partitioning approach for low‐power DA‐based FIR filter design with emphasis on datapath and controller
- Author
Seyede Fatemeh Ghamkhari and Mohammad Bagher Ghaznavi-Ghoushchi
- Subjects
Distributed arithmetic; Finite impulse response; Control theory; Computer science; Fir filter design; Applied Mathematics; Emphasis (telecommunications); Datapath; Power performance; Electrical and Electronic Engineering; Computer Science Applications; Electronic, Optical and Magnetic Materials; Power (physics)
- Published
- 2021
34. Extending Performance-Energy Trade-offs Via Dynamic Core Scaling
- Author
John Lach, Hang Zhang, and Wei Zhang
- Subjects
Power management; business.industry; Computer science; Energy consumption; Optimal control; Theoretical Computer Science; Power (physics); Computational Theory and Mathematics; Hardware and Architecture; Control theory; Embedded system; Datapath; business; Frequency scaling; Software; Energy (signal processing)
- Abstract
Modern processors often need to switch among different power states based on usage scenarios, energy availability, and thermal conditions. Dynamic voltage and frequency scaling (DVFS) is a commonly used power management strategy to trade off performance and energy. As transistor scaling is reaching its limit, the viable supply voltage range where DVFS can operate is shrinking, which limits its effectiveness. To extend the performance-energy trade-off capabilities in modern processors, this article proposes dynamic core scaling (DCS) that does not rely on voltage scaling. DCS dynamically adjusts the active superscalar datapath resources so that programs run at a given percentage of their maximum speed while minimizing energy consumption at the same time. Since DCS does not need voltage scaling, it can be combined with DVFS to achieve greater energy savings. To effectively manage performance-energy trade-offs using a combination of DCS and DVFS, this article proposes an oracle controller that demonstrates the optimal control strategy, and two practical controllers that are applicable in real implementations. Evaluations using an 8-way superscalar processor implemented in a 45-nm circuit show that DCS is more effective in performance-energy trade-offs than DVFS at the high performance end for a number of SPEC CPU2000 benchmarks. When used together with DVFS, DCS saves an additional 20 percent of a full-size core’s energy on average. At the minimum operating voltage, DVFS hits its limit, while DCS is still able to achieve an average of 46 percent further energy reduction.
- Published
- 2021
35. Fully Synthesizable Unified True Random Number Generator and Cryptographic Core
- Author
Massimo Alioto and Sachin Taneja
- Subjects
Key generation; Computer engineering; business.industry; Clock signal; Computer science; Random number generation; Datapath; Cryptography; Confusion and diffusion; Electrical and Electronic Engineering; Encryption; business; Randomness
- Abstract
This paper introduces a novel class of architectures that unify true random number generation and private-key cryptography by reusing the cryptographic core for both tasks. The unified architecture is well suited for low-cost constrained secure integrated systems, in view of the inherent area efficiency and the low design effort entailed by conventional automated design flows. Clock pulse over-stretching in pulsed latch clocking generates randomness by inducing metastability and jittered oscillations. Shannon confusion and diffusion in the cryptographic datapath enforce high entropy and robustness against variations. Conventional cryptographic operation is alternatively performed at moderate clock pulsewidths. A 40-nm CMOS testchip demonstrates the proposed unified architecture with a compact area of 0.43·10⁶ F² (F = minimum feature size), based on a SIMON cryptographic core. The true random number generator (TRNG) output shows cryptographic-grade quality without any calibration across dice, process (across two manufacturing lots), voltage, and temperature variations. Energy per encryption down to 0.25 pJ/bit is demonstrated. Unification of TRNG and the cryptographic core results in inherent data locality and obfuscation of key generation within logic, improving the resilience to physical attacks.
- Published
- 2021
36. General Galois processor for transmitters in 5G/6G base stations
- Author
Yong Bai, Qingbo Zhai, and Dake Liu
- Subjects
Parallel processing (DSP implementation); Computer Networks and Communications; Computer science; Cyclic redundancy check; Clock rate; Application-specific instruction-set processor; Datapath; SIMD; Electrical and Electronic Engineering; Arithmetic; Shift register; Scrambling
- Abstract
This paper proposes a flexible eight-mode high parallel Galois SIMD ASIP (Application Specific Instruction Set Processor). It supports parallel executions of Gold, Scrambling, CRC, CC, Turbo, RM, PSS, SSS encoding LFSR (linear feedback shift registers) algorithms with high performance and flexibility. It can perform also general bit processing and m-sequence. Our design is based on proposed table conversion and a datapath for unified eight-mode encoding. Based on 28 nm digital CMOS technology, the total area is 0.177 mm² and the clock frequency can be up to 1 GHz. The throughputs of Gold, Scrambling, CRC32, CRC24, CRC16, CRC8, CC, Turbo are 64 Gb/s, 64 Gb/s, 128 Gb/s, 168 Gb/s, 256 Gb/s, 512 Gb/s, 3×80 Gb/s, and 72 Gb/s, respectively.
- Published
- 2021
37. A 5 μW Standard Cell Memory-Based Configurable Hyperdimensional Computing Accelerator for Always-on Smart Sensing
- Author
Luca Benini, Abbas Rahimi, and Manuel Eggimann
- Subjects
Standard cell; Computer science; business.industry; Wearable computer; Fault detection and isolation; VLSI; machine learning; edge computing; Gesture recognition; Encoding (memory); Microcode; Datapath; Hyperdimensional computing; standard cell memory; Electrical and Electronic Engineering; business; hardware accelerator; always-on; Computer hardware; Efficient energy use
- Abstract
Hyperdimensional computing (HDC) is a brain-inspired computing paradigm based on high-dimensional holistic representations of vectors. It recently gained attention for embedded smart sensing due to its inherent error-resiliency and suitability to highly parallel hardware implementations. In this work, we propose a programmable all-digital CMOS implementation of a fully autonomous HDC accelerator for always-on classification in energy-constrained sensor nodes. By using energy-efficient standard cell memory (SCM), the design is easily cross-technology mappable. It achieves extremely low power, 5 μW in typical applications, and an energy efficiency improvement over the state-of-the-art (SoA) digital architectures of up to 3× in post-layout simulations for always-on wearable tasks such as Electromyography (EMG) hand gesture recognition. As part of the accelerator's architecture, we introduce novel hardware-friendly embodiments of common HDC-algorithmic primitives, which results in 3.3× technology scaled area reduction over the SoA, achieving the same accuracy levels in all examined targets. The proposed architecture also has a fully configurable datapath using microcode optimized for HDC stored on an integrated SCM-based configuration memory, making the design "general-purpose" in terms of HDC algorithm flexibility. This flexibility allows usage of the accelerator across novel HDC tasks, for instance, a newly designed HDC-algorithm for the task of ball bearing fault detection.
- Published
- 2021
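To make the HDC primitives mentioned above concrete, here is a minimal Python sketch of a generic binary HDC classifier: XOR binding, per-bit majority bundling, and nearest-prototype lookup by Hamming distance. It is not the accelerator's microcode or its hardware-friendly primitives. The dimensionality `D`, the channel/level record encoding, and the names `HDClassifier`, `encode`, `fit`, and `predict` are illustrative assumptions.

```python
import random

# Illustrative sketch: D and the record encoding are assumptions, not the paper's design.
D = 1024  # hypervector width in bits (HDC designs often use 1k-10k)

def random_hv(rng):
    """Dense random binary hypervector, stored as a D-bit Python int."""
    return rng.getrandbits(D)

def bind(a, b):
    """Binding: bitwise XOR associates two hypervectors (and is its own inverse)."""
    return a ^ b

def bundle(hvs):
    """Bundling: per-bit majority vote over the inputs (ties resolve to 0)."""
    counts = [0] * D
    for hv in hvs:
        for i in range(D):
            counts[i] += (hv >> i) & 1
    out = 0
    for i, c in enumerate(counts):
        if 2 * c > len(hvs):
            out |= 1 << i
    return out

def hamming(a, b):
    """Number of differing bits; smaller means more similar."""
    return bin(a ^ b).count("1")

class HDClassifier:
    """Hypothetical channel/level encoder plus an associative memory of class prototypes."""

    def __init__(self, n_channels, n_levels, seed=0):
        rng = random.Random(seed)
        self.channel_hvs = [random_hv(rng) for _ in range(n_channels)]  # item memory
        self.level_hvs = [random_hv(rng) for _ in range(n_levels)]      # quantized values
        self.prototypes = {}

    def encode(self, levels):
        """Bind each channel ID to its quantized level, then bundle all pairs."""
        return bundle([bind(ch, self.level_hvs[q])
                       for ch, q in zip(self.channel_hvs, levels)])

    def fit(self, samples, labels):
        """Training: bundle the encodings of each class into one prototype."""
        by_class = {}
        for x, y in zip(samples, labels):
            by_class.setdefault(y, []).append(self.encode(x))
        self.prototypes = {y: bundle(hvs) for y, hvs in by_class.items()}

    def predict(self, levels):
        """Inference: return the class whose prototype is nearest in Hamming distance."""
        q = self.encode(levels)
        return min(self.prototypes, key=lambda y: hamming(q, self.prototypes[y]))
```

In hardware, the same operations map to wide XOR arrays, majority/popcount logic, and a Hamming-distance search, which is one reason HDC suits highly parallel implementations.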
38. Graph-Based Logic Bit Slicing for Datapath-Aware Placement
- Author
-
Chau-Chin Huang, Bo-Qiao Lin, Hsin-Ying Lee, Yao-Wen Chang, Kuo-Sheng Wu, and Jun-Zhi Yang
- Subjects
STATISTICAL correlation ,GRAPHIC methods in statistics ,LEAST squares ,BIPARTITE graphs ,GRAPH theory
- Abstract
Extracting similar datapath bit slices, which handle highly parallel bit operations, can help a modern placer obtain better solutions for datapath-oriented designs. A current state-of-the-art datapath bit-slicing method achieves the best extraction results using a network-flow-based algorithm. However, that method has two major drawbacks: (1) it extracts only a limited number of bit slices for datapaths with different I/O widths, which are common in real designs, and (2) it does not consider bit-slice similarity, an important feature for datapath-aware placement. To remedy these drawbacks, we present (1) a balanced bipartite edge-cover algorithm to fully slice a datapath with different I/O widths, and (2) a simulated annealing scheme to further improve bit-slice similarity while maintaining fully sliced structures. Compared with the state-of-the-art work, experimental results show that our slicing algorithm extracts more bit slices with similar structures and helps a leading academic placer achieve 5% smaller routed wirelength on average. The results also validate the high correlation between datapaths and structural regularity/similarity. [ABSTRACT FROM AUTHOR]
- Published
- 2017
39. The Dream Digital Signal Processor: Architecture, Programming Model and Application Mapping
- Author
-
Mucci, Claudio, Rossi, Davide, Campi, Fabio, Ciccarelli, Luca, Pizzotti, Matteo, Perugini, Luca, Vanzolini, Luca, De Marco, Tommaso, Innocenti, Massimiliano, Voros, Nikolaos S., editor, Rosti, Alberto, editor, and Hübner, Michael, editor
- Published
- 2009
40. Virtual Queues for P4: A Poor Man’s Programmable Traffic Manager
- Author
-
Chrysa Papagianni, Koen De Schepper, Hasanin Harkous, Michael Jarschel, Rastin Pries, Marinos Dimolianis, and Multiscale Networked Systems (IvI, FNWI)
- Subjects
Bandwidth management ,business.product_category ,Computer Networks and Communications ,business.industry ,Computer science ,Quality of service ,Packet processing ,020206 networking & telecommunications ,02 engineering and technology ,Active queue management ,Pipeline (software) ,ddc ,Virtual queue ,Datapath ,0202 electrical engineering, electronic engineering, information engineering ,Network switch ,Electrical and Electronic Engineering ,business ,Computer network
- Abstract
The advent of programmable network switch ASICs and recent developments in other programmable data planes (NPUs, FPGAs) have renewed interest in network data plane programmability. The P4 language has emerged as a strong candidate for describing a protocol-independent datapath pipeline. With its supported architectures, P4 provides an excellent way to define packet processing and forwarding behavior, while leaving other networking components, such as the traffic management engine, to non-programmable fixed-function elements, in line with the capabilities of most programmable devices. However, network flexibility is essential to meet the Quality of Service (QoS) requirements of traffic flows, so enabling programmable control of fixed-function elements like traffic management is crucial. Toward that end, we propose the use of virtual queues in the P4 pipeline, investigate the application of virtual-queue-based traffic management, and examine the portability of the approach across different P4-programmable targets. Specifically, we focus on virtual-queue-based Active Queue Management (AQM) for congestion policing and for meeting the latency targets of distinct network slices. The solution is compared with P4's built-in bandwidth management using meters, showing that the additional dimensions of control are achieved without increasing the processing complexity of the solution.
- Published
- 2021
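The virtual-queue idea can be illustrated with a small Python model: a byte counter that drains at a fraction of the real link rate and flags packets once its backlog crosses a threshold, so congestion is signalled before the real queue fills. This is the generic virtual-queue technique, not the authors' P4 program; the class name, the parameters `rate_fraction` and `mark_threshold_bytes`, and the mark-on-threshold policy are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class VirtualQueue:
    """Hypothetical virtual queue: holds no packets, only a simulated backlog.

    The backlog drains at rate_fraction * link rate (< 1), so it grows before
    the real queue does and can trigger marking (e.g. ECN) or dropping early.
    """
    link_rate_bps: float       # real egress capacity in bits/s
    rate_fraction: float       # assumed virtual drain rate as a fraction of link rate
    mark_threshold_bytes: int  # assumed marking threshold on the virtual backlog
    backlog_bytes: float = 0.0
    last_time_s: float = 0.0

    def on_arrival(self, now_s: float, size_bytes: int) -> bool:
        """Drain for the elapsed time, enqueue the packet, report whether to mark it."""
        drain = self.rate_fraction * self.link_rate_bps / 8 * (now_s - self.last_time_s)
        self.backlog_bytes = max(0.0, self.backlog_bytes - drain)
        self.last_time_s = now_s
        self.backlog_bytes += size_bytes
        return self.backlog_bytes > self.mark_threshold_bytes
```

For example, 1500-byte packets arriving at the full 10 Mb/s line rate grow a 90% virtual queue by about 150 bytes per packet until marking starts, while the same packets at half the line rate never trigger it. On a P4 target, the backlog and last-update timestamp would presumably live in per-queue registers updated once per packet; the details depend on the target architecture.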
41. A Sub-μW Reversed-Body-Bias 8-bit Processor on 65-nm Silicon-on-Thin-Box (SOTB) for IoT Applications
- Author
-
Ckristian Duran, Trong-Thuc Hoang, Cong-Kha Pham, Koichiro Ishibashi, Khai-Duy Nguyen, Ronaldo Serrano, Van-Phuc Hoang, Marco Sarmiento, and Xuan-Tu Tran
- Subjects
Universal asynchronous receiver/transmitter ,business.industry ,Computer science ,Clock rate ,8-bit ,Instruction set ,Microcontroller ,Gate array ,Datapath ,Hardware_INTEGRATEDCIRCUITS ,Electrical and Electronic Engineering ,business ,Field-programmable gate array ,Computer hardware
- Abstract
For most Internet-of-Things (IoT) applications, embedded processors typically execute lightweight tasks such as sensing and communication. A typical IoT program senses some information and sends it over a channel, usually a wireless channel with an RF circuit. These IoT nodes often require a system with networking capabilities and a low-power harvester implementation. This brief presents a sub-μW 8-bit processor suitable for such IoT applications. The processor implements the Open8 Instruction Set Architecture (ISA) with an 8-bit datapath and 16-bit bus addressing. The chip contains the processor and 4 KB of Static Random-Access Memory (SRAM), and is fabricated in the 65-nm Silicon-On-Thin-Box (SOTB) process. SOTB is a Fully-Depleted Silicon-On-Insulator (FD-SOI) technology, so the ability to control body-biasing voltages is one of its key advantages for achieving low power. The experimental results show that, with reverse body bias, the power consumption reaches as low as 50 nW at a 0.5-V supply voltage and a 32-kHz clock frequency. The complete microcontroller consists of the Open8 processor, 32 KB of Read-Only Memory (ROM), 4 KB of SRAM, a Serial Peripheral Interface (SPI), an SPI programmer, a debug module, General-Purpose Inputs/Outputs (GPIOs), and a UART. The system was tested on a Xilinx XC7A100T Field-Programmable Gate Array (FPGA), where it used 1.8% of the total FPGA resources.
- Published
- 2021
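As a quick check on the operating point above, the reported figures imply the following supply current and energy per clock cycle. This assumes the 50 nW figure is the average total power at the 32 kHz clock; the derived values are simple arithmetic on the abstract's numbers, not results reported by the authors.

```latex
I = \frac{P}{V_{DD}} = \frac{50\,\text{nW}}{0.5\,\text{V}} = 100\,\text{nA},
\qquad
E_{\text{cycle}} = \frac{P}{f_{\text{clk}}} = \frac{50 \times 10^{-9}\,\text{W}}{32 \times 10^{3}\,\text{Hz}} \approx 1.56\,\text{pJ}
```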
42. 64-GHz Datapath Demonstration for Bit-Parallel SFQ Microprocessors Based on a Gate-Level-Pipeline Structure
- Author
-
Ryota Kashima, Ikki Nagaoka, Masamitsu Tanaka, Taro Yamashita, and Akira Fujimaki
- Subjects
business.industry ,Computer science ,Pipeline (computing) ,Clock rate ,Register file ,Process (computing) ,Condensed Matter Physics ,01 natural sciences ,Electronic, Optical and Magnetic Materials ,Pipeline transport ,Arithmetic logic unit ,Logic gate ,0103 physical sciences ,Datapath ,Hardware_ARITHMETICANDLOGICSTRUCTURES ,Electrical and Electronic Engineering ,010306 general physics ,business ,Hardware_REGISTER-TRANSFER-LEVELIMPLEMENTATION ,Computer hardware
- Abstract
We successfully demonstrated an 8-bit-wide, bit-parallel datapath composed of an arithmetic logic unit and register files for high-throughput-oriented SFQ microprocessors based on a gate-level-pipeline structure. Achieving high-speed operation in a bit-parallel datapath is difficult because of feedback paths. We combined concurrent-flow and counter-flow clocking to solve the timing problem at the feedback path in the datapath, and we optimized the number of Josephson junctions (JJs) and pipeline stages in the register file to resolve the timing issue. We designed the datapath with the cell library for the AIST 10 kA/cm² Advanced Process. The designed datapath has 52 pipeline stages, 18,448 Josephson junctions, and a circuit area of 3.81 mm × 4.05 mm. We obtained a relatively wide bias margin at the target clock frequency of 50 GHz, and the datapath operated at up to 64 GHz in on-chip high-speed testing.
- Published
- 2021
43. Area–Energy–Error Optimized Faithful Multiplier for Digital Signal Processing
- Author
-
Kousalya Manoharan, Ramya Ramasamy, Sriram Kumar, Kalaiselvi Sundaram, Nagarajan Shanmugam, and Vijeyakumar Krishnasamy Natarajan
- Subjects
Signal processing ,Adder ,Computer science ,business.industry ,Applied Mathematics ,Carry (arithmetic) ,Image processing ,Signal Processing ,Datapath ,Multiplier (economics) ,Hardware_ARITHMETICANDLOGICSTRUCTURES ,business ,Algorithm ,Digital signal processing ,Energy (signal processing)
- Abstract
Approximate computing is a compelling approach to designing area-efficient, low-power datapath units for error-resilient applications. This brief presents the design of a novel 4:2 approximate compressor that generates no error in the carry signal. The proposed compressor is employed for partial product (PP) compression in two variants of the Dadda multiplier to evaluate its effectiveness in error-resilient image and signal processing applications. In the targeted multipliers, the approximate 4:2 compressor is used in the n least significant PP columns, while the exact counterpart is used in the remaining most significant columns; hence, the maximum error is precisely bounded within 2n. PP compression is performed in stages using the Wallace approach, and the final two rows of sum and carry signals are added using a ripple-carry adder in the basic design. In proposed multiplier design 2, sum bits are not generated in the approximate part; however, the proposed error-tolerant compressor is used in appropriate columns to propagate carries to the least significant column of the exact part. Performance evaluations using Cadence Encounter with 90-nm application-specific integrated circuit (ASIC) technology revealed that the proposed full-width (P-FW) and proposed truncated (P-Trun) approximate multipliers achieve 22.7% and 32.4% power-delay product reductions, respectively, compared with the standard multiplier. Implementations of the proposed multipliers in signal and image processing applications showed superior accuracy compared with prior similar approximate designs.
- Published
- 2021
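For reference, the exact 4:2 compressor that approximate designs like the one above are measured against can be built from two cascaded full adders. The Python sketch below also includes a small harness that counts how many of the 32 input patterns a candidate compressor gets wrong. It does not implement the proposed approximate compressor, whose logic the abstract does not give; the function names are hypothetical.

```python
from itertools import product

def full_adder(a, b, c):
    """1-bit full adder: returns (sum, carry) with a + b + c == sum + 2 * carry."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry

def exact_compressor_4_2(x1, x2, x3, x4, cin):
    """Exact 4:2 compressor from two cascaded full adders.

    Returns (sum, carry, cout) such that
    x1 + x2 + x3 + x4 + cin == sum + 2 * (carry + cout).
    cout depends only on x1..x3, so feeding it to the next column's cin
    does not create a ripple chain across columns.
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout

def error_profile(compressor):
    """Hypothetical harness: compare a candidate with the exact arithmetic value.

    Returns (number of erroneous input patterns, list of signed error distances).
    """
    errors = []
    for bits in product((0, 1), repeat=5):
        s, carry, cout = compressor(*bits)
        errors.append(s + 2 * (carry + cout) - sum(bits))
    return sum(e != 0 for e in errors), errors
```

An approximate compressor trades some nonzero entries in this error profile for fewer gates; placing it only in the low-order PP columns limits how much those errors can affect the final product.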
44. Mixed-radix, virtually scaling-free CORDIC algorithm based rotator for DSP applications
- Author
-
Mazad Zaveri, Deepak Verma, and Ankur Changela
- Subjects
Angle of rotation ,Computer science ,business.industry ,020208 electrical & electronic engineering ,Fast Fourier transform ,02 engineering and technology ,Scale factor ,020202 computer hardware & architecture ,Computer Science::Hardware Architecture ,Hardware and Architecture ,Datapath ,0202 electrical engineering, electronic engineering, information engineering ,Hardware_ARITHMETICANDLOGICSTRUCTURES ,Electrical and Electronic Engineering ,CORDIC ,business ,Algorithm ,Mixed radix ,Rotation (mathematics) ,Software ,Digital signal processing
- Abstract
In this work, we propose a novel Coordinate Rotation DIgital Computer (CORDIC) rotator algorithm that converges faster by performing radix-2, radix-4, and radix-16 CORDIC iterations while keeping the scale factor implicitly constant. The mixed-radix scheme speeds up convergence and thereby reduces the computational latency of the CORDIC algorithm. The main concern with higher-radix CORDIC algorithms is compensating for a variable scale factor. To solve this problem, a Taylor series approximation of sine and cosine is proposed for the higher-radix CORDIC algorithm to achieve scaling-free rotation of a two-dimensional vector. The scaling-free rotation of the proposed CORDIC algorithm removes the read-only memory (ROM) needed to store the scale factor of higher-radix CORDIC algorithms. Further, the proposed CORDIC algorithm is designed in rotation mode and optimized by removing the Z datapath for digital signal processing (DSP) applications in which the angle of rotation is known in advance. Finally, a multipath delay commutator (MDC) fast Fourier transform (FFT) is implemented on an FPGA with the proposed CORDIC-based rotator, and the design is compared with existing designs. Compared with a radix-16 CORDIC rotator-based FFT implementation, the proposed implementation uses 17% fewer resources.
- Published
- 2021
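As background for the abstract above, a conventional radix-2 rotation-mode CORDIC (not the proposed mixed-radix, scaling-free variant) fits in a few lines of Python. The sketch separates the Z datapath, which chooses the rotation directions, from the X/Y shift-and-add datapath, and compensates the constant gain `K` with one final multiplication. When the angle is known in advance, the directions can be computed offline, which is the situation in which the Z datapath can be dropped from hardware. The iteration count `N_ITER` and the function names are assumptions; floating-point multiplication by `2.0 ** -i` stands in for hardware shifts.

```python
import math

N_ITER = 24  # assumed number of micro-rotations (about 24 bits of angle precision)

# Elementary angles atan(2^-i) used by the radix-2 iterations.
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]

# Constant gain of all radix-2 micro-rotations, compensated once at the end.
K = 1.0
for i in range(N_ITER):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * i))

def directions_for_angle(theta):
    """Z datapath: choose directions that drive the residual angle toward 0.

    Converges for |theta| <= sum(ANGLES), about 1.74 rad. If theta is known
    in advance, this can run offline so hardware needs only the X/Y datapath.
    """
    z = theta
    dirs = []
    for a in ANGLES:
        d = 1 if z >= 0 else -1
        dirs.append(d)
        z -= d * a
    return dirs

def rotate(x, y, dirs):
    """X/Y datapath: shift-and-add micro-rotations, then gain compensation by K."""
    for i, d in enumerate(dirs):
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
    return x * K, y * K
```

Rotating the unit vector `(1.0, 0.0)` with `directions_for_angle(theta)` yields approximately `(cos(theta), sin(theta))`. The gain `K` is the constant that the higher-radix variants make variable, which is the problem the proposed scaling-free design addresses.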
45. Design and analysis of high performance and low power FFT for DSP datapath using Vedic Multipliers
- Author
-
Pradeep Kumar, Nidhi Gaur, and Anu Mehra
- Subjects
Computer Networks and Communications ,Computer science ,business.industry ,Computation ,Fast Fourier transform ,Datapath ,Electrical and Electronic Engineering ,business ,Instrumentation ,Digital signal processing ,Electronic, Optical and Magnetic Materials ,Computational science ,Power (physics)
- Abstract
The Fast Fourier Transform is one of the most efficient methods of performing computation in digital signal processing blocks. These computations are basically performed by the inherent floating-point ...
- Published
- 2021
46. Efficient Incorporation of the RNS Datapath in Reverse Converter
- Author
-
MohammadReza Taheri, Keivan Navi, and Amir Sabbagh Molahosseini
- Subjects
Discrete mathematics ,Digital signal processor ,Adder ,020208 electrical & electronic engineering ,Degree of parallelism ,02 engineering and technology ,Residue number system ,CMOS ,Arithmetic Unit ,Hardware and Architecture ,Datapath ,0202 electrical engineering, electronic engineering, information engineering ,Overhead (computing) ,020201 artificial intelligence & image processing ,Hardware_ARITHMETICANDLOGICSTRUCTURES ,Electrical and Electronic Engineering ,Residue Number System ,Realization (systems) ,Mathematics
- Abstract
The class of moduli sets of the form $\{2^{k}, 2^{n}-1, 2^{n}+1, m_{4}\}$ with $m_{4}\in \{2^{r}+1, 2^{r}-1\}$ has become popular for implementing Residue Number System (RNS)-based computational systems, mainly thanks to its efficient arithmetic unit and high degree of parallelism. However, its complicated inter-modulo computation leads to a high overhead associated with a complex reverse converter. This overhead is the main barrier to energy-efficient implementation of RNS-based devices, particularly for edge computing applications. This brief presents a new approach that embeds the reverse converter into the arithmetic unit of the RNS processor for this well-known class of moduli sets. Effective hardware reuse in the proposed approach leads to an area- and energy-efficient RNS realization for this class of moduli sets. Experimental results based on 65-nm CMOS technology indicate the superiority of RNS realizations that employ the proposed design methodology: for a given RNS, the proposed architecture provides a substantial 17.4% area saving and 13.32% lower power consumption on average compared with the traditional design approach, with a negligible delay penalty.
- Published
- 2021
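The moduli-set class above can be exercised with a short Python sketch: forward conversion to residues, carry-free channel-wise addition and multiplication, and textbook Chinese Remainder Theorem (CRT) reverse conversion. This is not the embedded reverse converter proposed in the brief. The example parameters (k = 5, n = 4, r = 5 with m4 = 2^r - 1, giving {32, 15, 17, 31}) and the helper names are assumptions chosen so that the moduli are pairwise coprime.

```python
from math import gcd, prod

def make_moduli(k, n, r, m4_plus=True):
    """Hypothetical helper: build {2^k, 2^n - 1, 2^n + 1, 2^r +/- 1} and check coprimality."""
    m4 = 2 ** r + 1 if m4_plus else 2 ** r - 1
    moduli = [2 ** k, 2 ** n - 1, 2 ** n + 1, m4]
    for i in range(len(moduli)):
        for j in range(i + 1, len(moduli)):
            if gcd(moduli[i], moduli[j]) != 1:
                raise ValueError(f"moduli {moduli[i]} and {moduli[j]} share a factor")
    return moduli

def to_rns(x, moduli):
    """Forward conversion: one residue per channel."""
    return [x % m for m in moduli]

def rns_add(a, b, moduli):
    """Channel-wise addition; no carries cross channel boundaries."""
    return [(ai + bi) % m for ai, bi, m in zip(a, b, moduli)]

def rns_mul(a, b, moduli):
    """Channel-wise multiplication; all channels can run in parallel."""
    return [(ai * bi) % m for ai, bi, m in zip(a, b, moduli)]

def from_rns(residues, moduli):
    """Reverse conversion with the textbook CRT (result lies in [0, M))."""
    M = prod(moduli)
    x = 0
    for r_i, m in zip(residues, moduli):
        Mi = M // m
        x += r_i * Mi * pow(Mi, -1, m)  # modular inverse; needs Python 3.8+
    return x % M
```

The forward conversion and channel arithmetic are cheap; the CRT step with its large weights `Mi * pow(Mi, -1, m)` is the inter-modulo computation that makes reverse converters costly. Note that 2^n - 1 and 2^n + 1 are always coprime with each other and with 2^k, but m4 must share no factor with them: `make_moduli(4, 4, 5)` gives {16, 15, 17, 33} and is rejected because 15 and 33 share the factor 3.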
47. Toward Functional Safety of Systolic Array-Based Deep Learning Hardware Accelerators
- Author
-
Suriyaprakash Natarajan, Kanad Basu, Suvadeep Banerjee, Shamik Kundu, and Arnab Raha
- Subjects
business.industry ,Computer science ,Deep learning ,Systolic array ,Image processing ,02 engineering and technology ,020202 computer hardware & architecture ,Set (abstract data type) ,Computer engineering ,Hardware and Architecture ,Fault coverage ,Datapath ,0202 electrical engineering, electronic engineering, information engineering ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Software ,Test data
- Abstract
High accuracy and ever-increasing computing power have made deep neural networks (DNNs) the algorithm of choice for various machine learning, computer vision, and image processing applications across the computing spectrum. Google, for example, developed the tensor processing unit (TPU) to accelerate the computationally intensive matrix multiplications of DNNs on its systolic array architecture. Faults manifested in the datapath of such a systolic array, due to latent manufacturing defects or single-event effects, may lead to functional safety (FuSa) violations. Although DNNs are known to resist minor perturbations thanks to their inherent fault-tolerant characteristics, we show that the classification accuracy of the model plummets from 97.4% to 7.75% with a fault rate of only 0.0003% in the accelerator, implying catastrophic consequences when such accelerators are deployed in mission-critical systems. Hence, to ensure the FuSa of such accelerators, this article provides an extensive FuSa assessment of an accelerator exposed to datapath faults, varying the network parameters and the position and characteristics of the induced error across multiple exhaustive data sets. Furthermore, we propose two novel strategies for obtaining a compact set of functional test patterns to detect FuSa violations in a DNN accelerator. Our experimental results demonstrate that the obtained test sets achieve an average fault coverage of 92.63% (up to 100% in some cases) with a cardinality as low as 0.1% of the entire test data set.
- Published
- 2021
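To illustrate how a single datapath fault propagates, here is a simplified Python model of an output-stationary systolic array in which processing element (PE) (i, j) accumulates C[i][j] as skewed operands arrive at cycle t = i + j + k. The optional `fault` argument forces one accumulator bit of one PE to a stuck-at value after each update. The dataflow, the stuck-at fault model, and the function name are illustrative assumptions, not the TPU's design or the article's fault-injection setup.

```python
def systolic_matmul(A, B, fault=None):
    """Hypothetical output-stationary systolic array model computing A @ B.

    PE (i, j) owns C[i][j]; with skewed inputs it receives A[i][k] and B[k][j]
    at cycle t = i + j + k. `fault` = (i, j, bit, value) is an assumed stuck-at
    fault on one accumulator bit of PE (i, j), applied after every update.
    """
    n, kdim, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]
    # The last PE (n-1, m-1) performs its final MAC at cycle (n-1)+(m-1)+(kdim-1).
    for t in range(n + m + kdim - 2):
        for i in range(n):
            for j in range(m):
                k = t - i - j  # operand index reaching PE (i, j) at cycle t
                if 0 <= k < kdim:
                    acc[i][j] += A[i][k] * B[k][j]
                    if fault is not None and fault[:2] == (i, j):
                        _, _, bit, value = fault
                        if value:
                            acc[i][j] |= 1 << bit      # stuck-at-1
                        else:
                            acc[i][j] &= ~(1 << bit)   # stuck-at-0
    return acc
```

For `A = [[1, 2], [3, 4]]` and `B = [[5, 6], [7, 8]]`, the fault-free result is `[[19, 22], [43, 50]]`, while a stuck-at-0 on bit 1 of PE (1, 1) turns its output into 48 and leaves the other outputs intact. Comparing a faulty output with a fault-free golden result is the basic idea behind functional test patterns: an input pattern detects a fault if the two outputs differ.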
48. Software Physical/Virtual Rx Queue Mapping Toward High-Performance Containerized Networking
- Author
-
Ryota Kawashima
- Subjects
Computer Networks and Communications ,business.industry ,Computer science ,Network packet ,Packet forwarding ,020206 networking & telecommunications ,Throughput ,02 engineering and technology ,Virtualization ,computer.software_genre ,Datapath ,0202 electrical engineering, electronic engineering, information engineering ,Overhead (computing) ,Electrical and Electronic Engineering ,business ,Virtual network ,computer ,Host (network) ,Computer network
- Abstract
Softwarization of Network Functions (NFs) accelerates the automated deployment and management of services on next-generation networks. Combining flexibility and high performance is a vital requirement for Network Functions Virtualisation (NFV); however, many studies have demonstrated that containerization or virtualization of NFs severely degrades the fundamental efficiency of packet forwarding. Virtual network I/O, the mechanism for transferring packets between a guest and the host, has been seen as the performance bottleneck in the PVP (Physical-Virtual-Physical) datapath, and one of the main causes of this deterioration is packet copying between them. Various techniques, such as zero-copy, pass-through, and hardware offloading, have been examined to alleviate the performance overhead. However, existing designs and implementations incur pragmatic issues, such as compatibility, manageability, and insufficient performance. We propose yet another design and implementation of zero-copy/pass-through acceleration (named IOVTee) that resolves these real-world problems and enhances forwarding efficiency. IOVTee takes advantage of the pre-processing performed by virtual switches to achieve zero-copy on the receive (Rx) path. Because IOVTee plugs into vhost-user (the de facto virtual network I/O), our approach is transparent to both containers/VMs and virtual switches. In this article, we explain the heart of IOVTee, a fully software-based Rx queue mapping mechanism (between physical and virtual queues) that enables a concept of Virtual DMA Write-through (to the NF). Our evaluation results showed that applying IOVTee to vhost-user drastically increased packet-forwarding efficiency in the PVP datapath (by 45% and 98% for traffic of 64-byte and 1514-byte packets, respectively).
- Published
- 2021
49. Enabling Near-Data Accelerators Adoption by Through Investigation of Datapath Solutions
- Author
-
Paulo C. Santos, João Paulo Cardoso de Lima, Antonio Carlos Schneider Beck, Luigi Carro, Marco A. Z. Alves, and Rafael Fao de Moura
- Subjects
010302 applied physics ,Speedup ,Computer science ,business.industry ,Bandwidth (signal processing) ,02 engineering and technology ,01 natural sciences ,020202 computer hardware & architecture ,Theoretical Computer Science ,Data access ,Embedded system ,0103 physical sciences ,Datapath ,0202 electrical engineering, electronic engineering, information engineering ,business ,Communications protocol ,Host (network) ,Protocol (object-oriented programming) ,Software ,Cache coherence ,Information Systems
- Abstract
Processing-in-Memory (PIM), or Near-Data Accelerator (NDA) design, has recently been revisited to mitigate the memory and power walls, mainly supported by the maturity of 3D-stacking manufacturing technology and the increasing demand for bandwidth and parallel data access in emerging processing-hungry applications. However, because these designs are naturally decoupled from the main processors, at least three open issues must be tackled to allow the adoption of PIM: how to offload instructions from the host to NDAs, since many NDAs can be placed along the memory; how to keep caches coherent between the host and NDAs; and how to handle internal communication between different NDA units, considering that NDAs can communicate with each other to better exploit their adoption. In this work, we present an efficient design that solves these challenges. Based on hybrid Host-Accelerator code, to provide fine-grained control, our design allows transparent offloading of NDA instructions directly from a host processor. Moreover, our design proposes a data coherence protocol, which includes an inclusion-policy-agnostic cache coherence mechanism to share data transparently between the host processor and the NDA units, and a protocol that allows communication between different NDA units. The proposed mechanism allows full exploitation of the state-of-the-art design used in the experiments, achieving a speedup of up to 14.6× compared with an AVX architecture on the PolyBench Suite, while spending on average 82% of the total time on processing and only 18% on the cache coherence and communication protocols.
- Published
- 2021
50. A Hardware/Software Co-Design Methodology for Adaptive Approximate Computing in clustering and ANN Learning
- Author
-
Fabrizio Lombardi, Fei Qiao, Pengfei Huang, Weiqiang Liu, and Chenghua Wang
- Subjects
semi-supervised learning ,Artificial neural network ,Computer science ,Approximation algorithm ,QA75.5-76.95 ,Information technology ,Semi-supervised learning ,Approximate computing ,k-means clustering ,T58.5-58.64 ,Operand ,approximate multiplier ,Computer engineering ,Electronic computers. Computer science ,Datapath ,Unsupervised learning ,Multiplication ,Cluster analysis
- Abstract
As one of the most promising energy-efficient emerging paradigms for designing digital systems, approximate computing has attracted significant attention in recent years. Applications using approximate computing (AxC) can tolerate some loss of quality in the computed results in exchange for high performance. Approximate arithmetic circuits have been extensively studied; however, their application at the system level has not been widely pursued. Furthermore, when approximate arithmetic circuits are applied at the system level, error accumulation and convergence problems may occur. Multiple approximate components can interact in a typical datapath and thus benefit from each other, and many applications require datapaths more complex than a single multiplication. In this paper, a hardware/software co-design methodology for adaptive approximate computing is proposed. It uses feature constraints to guide approximate computation at various accuracy levels in each iteration of the learning process in Artificial Neural Networks (ANNs). The proposed adaptive methodology also considers the input operand distribution and hybrid approximation. Compared with a baseline design, the proposed method significantly reduces the power-delay product while incurring only a small loss of accuracy. Simulation and a case study of image segmentation validate the effectiveness of the proposed methodology.
- Published
- 2021
Discovery Service for Jio Institute Digital Library