120 results on '"approximate multiplier"'
Search Results
2. Balancing precision and efficiency: an approximate multiplier with built-in error compensation for error-resilient applications.
- Author
- Sayadi, Ladan, Amirany, Abdolah, Moaiyeri, Mohammad Hossein, and Timarchi, Somayeh
- Abstract
In the pursuit of high-performance designs for error-resilient applications, approximate computing emerges as a key strategy. This paper introduces an innovative approximate multiplier, leveraging two highly efficient compressors. These compressors operate in tandem across two stages, strategically compensating for errors and culminating in a multiplier that maintains accuracy and significantly reduces delay in the final stage. The proposed method is specifically tailored for applications reliant on multiplication, such as image processing and neural networks. HSPICE simulations were conducted using 7 nm FinFET technology to gauge its efficacy. Results indicate a remarkable 82% reduction in power-delay product (PDP) compared to traditional multipliers. Moreover, system-level simulations underscore the practicality of the proposed multiplier in real-world applications like image processing and artificial intelligence, revealing minimal compromise in accuracy. This work contributes a nuanced perspective to approximate computing, presenting a multiplier poised to elevate efficiency without sacrificing precision in critical domains. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
3. Design and Evaluation of High-Speed Approximate Multipliers Based on Improved Error Distance 4:2 Compressors for Error Resilient Image Applications.
- Author
- Zuhair, Zahraa A. and Al-Sabawi, Emad A.
- Subjects
- LOGIC circuit design, LOGIC design, HIGH performance computing, IMAGE processing, COMPRESSORS
- Abstract
Approximate Computing (AC) has been widely adopted in designing large-scale logic circuits. In particular, approximate adder and multiplier circuits have been considerably targeted in the realm of image processing due to their high energy savings while preserving a proper level of computing accuracy. Nevertheless, intensive research and development continue in pursuit of more mature designs that effectively prioritize design overheads over error resilience. In this paper, low error-distance approximate 4:2 compressor circuits are proposed to construct high-speed approximate adders. The developed compressors realize high logic computing and incur competitive area and power consumption, and therefore, they were leveraged to configure approximate 8×8 multiplier designs. To achieve a favorable trade-off between computational accuracy and hardware resource usage, we develop a simulation framework that evaluates the accuracy of the proposed multiplier designs at the gate level (measuring error distance) and at the application level (evaluating SNR and SSIM) of an image. The framework truncates specific propagated carry bits, i.e., least significant bits (LSBs), to realize profitable area and power savings. Furthermore, two main high-speed multiplier designs are proposed herein, namely High Computing Performance Approximate Multiplier (HCP-AMUL) and HCP Low Error Approximate Multiplier (HCPLE-AMUL). Matlab R2022b along with VS Code are used for running simulations and accuracy evaluation, while Vivado 2018.2 is utilized for HDL reconfigurable logic design and implementation and evaluation of area, power, and speed, configured on an FPGA Xilinx Nexys 4 Artix-7 (XC7A100T1CSG324) trainer board. The experimental results demonstrate the efficacy of the developed multipliers, as the developed HCPLE-AMUL delivers 54.26%, 11.72%, and 449.85 of speedup, power saving, and Power-Delay-Area-Error-Product (PDAEP) improvement, respectively.
On the other hand, the presented HCP-AMUL realizes an improved saving of area and power at the expense of an acceptable lowering of computation accuracy. It achieves 9.66%, 505.40, and 53.73% of power saving, PDAEP, and speedup, respectively. Thus, the proposed compressor and multiplier circuits can potentially be promising approximate computing modules for image processing applications, providing an improved trade-off between computation accuracy and logic utilization complexity. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
4. FPGA‐Based Resource‐Optimal Approximate Multiplier for Error‐Resilient Applications.
- Author
- Khurshid, Burhan
- Subjects
- *BOOLEAN networks, *PARETO analysis, *ARITHMETIC, *LOGIC, *ALGORITHMS
- Abstract
Arithmetic units inspired by approximate computations have seen a significant development in error‐resilient applications, wherein accuracy can be traded off for enhanced performance. Most of the existing literature pertaining to approximate computations targets ASIC platforms. In this paper, we focus on exploiting the features of approximate computation to design efficient digital hardware for FPGA platforms. Specifically, we propose an FPGA implementation of an approximate multiplier unit based on the CORDIC algorithm. Contemporary FPGA‐based approximate multiplier implementations report a lot of compromise in accuracy and a relatively higher implementation cost in terms of utilized resources, timing, and energy. We conduct a detailed Pareto analysis to determine the number of optimal computing stages for the proposed CORDIC‐based approximate multiplier that justifies the accuracy‐performance trade‐offs. More importantly, we focus on the optimal logic distribution of the proposed multiplier circuit by restructuring the top‐level Boolean network and translating it into a circuit netlist that can be efficiently mapped onto the inherent FPGA fabric of LUTs and Carry4 primitives. Our CORDIC‐based implementations significantly improve the accuracy metrics while maintaining a suitable performance trade‐off. The efficacy of our proposed multiplier is tested using two image‐processing applications, namely, image blending and image smoothening. The obtained results show a substantial improvement over the existing state‐of‐the‐art approximate multipliers. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. APPAs: fast and efficient approximate parallel prefix adders and multipliers.
- Author
- Rashidi, Bahram
- Subjects
- *TIME complexity, *IMAGE processing, *PARALLEL processing, *ERROR rates, *SUFFIXES & prefixes (Grammar)
- Abstract
In this paper, the approximate parallel prefix adders with minimizing hardware and timing complexities are proposed. Moreover, an approximate multiplier based on these adders is designed. The approximate structures include two approximate Sklansky adders, one approximate Ladner-Fischer adder, and one approximate Kogge-Stone adder. The proposed adders are free from carry rippling. The main strategy for approximate design is primarily based on rearranging and deleting sub-blocks and secondary reducing the critical path delay and area in the adders. In this case, we have a trade-off between accuracy, delay, and area. The proposed approximate multiplier has a serial structure that is designed based on using one approximate parallel prefix adder. The proposed approximate adders and multiplier are compared from hardware and accuracy point of view such as gate count, delay, area delay product, error rate, mean error distance, mean relative error distance, and normalized error distance. The efficacy of proposed structures in image processing applications such as image smoothing (low-pass filter) and image multiplication is performed using MATLAB. The results show the proposed approximate structures are comparable in terms of area, delay, PSNR, and mean structural similarity index metric parameters with other works. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
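Note on the accuracy metrics named in the abstract above: error rate (ER), mean error distance (MED), mean relative error distance (MRED), and normalized error distance (NMED) are commonly computed by sweeping all operand pairs. The Python sketch below assumes the usual definitions from the approximate-arithmetic literature. The function `truncated_multiply` is a hypothetical toy multiplier used only to produce errors; it is not the design from this record.

```python
from itertools import product


def truncated_multiply(a: int, b: int, k: int = 2) -> int:
    """Hypothetical toy approximate multiplier: clear the k low bits of each operand."""
    mask = ~((1 << k) - 1)
    return (a & mask) * (b & mask)


def error_metrics(approx_mul, n_bits: int = 8) -> dict:
    """Exhaustively compare approx_mul with exact unsigned n_bits x n_bits multiplication."""
    max_product = ((1 << n_bits) - 1) ** 2  # largest exact output, used to normalize MED
    total = 0          # number of operand pairs visited
    erroneous = 0      # pairs whose result differs from the exact product
    sum_ed = 0         # accumulated error distance |approx - exact|
    sum_red = 0.0      # accumulated relative error distance (exact != 0 only)
    nonzero = 0        # pairs with a non-zero exact product
    for a, b in product(range(1 << n_bits), repeat=2):
        exact = a * b
        ed = abs(approx_mul(a, b) - exact)
        total += 1
        erroneous += ed != 0
        sum_ed += ed
        if exact:
            sum_red += ed / exact
            nonzero += 1
    med = sum_ed / total
    return {
        "ER": erroneous / total,
        "MED": med,
        "MRED": sum_red / nonzero,
        "NMED": med / max_product,
    }


if __name__ == "__main__":
    for name, value in error_metrics(truncated_multiply).items():
        print(f"{name}: {value:.6f}")
```

Passing an exact multiplier (for example `lambda a, b: a * b`) should give zero for every metric, which is a quick sanity check on the harness.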
6. Design and Analysis of Optimized Approximate Multiplier Using Novel Higher-Order Compressor.
- Author
- Thakur, Garima and Jain, Shruti
- Subjects
- *IMAGE processing, *COMPRESSORS, *MULTIPLICATION
- Abstract
The energy-efficient error-tolerant circuits have paved the way for a whole new area in low-power consumption applications with approximate computing. The approximate computing fulfills the trade-off requirement of exact computation and provides efficient performance. In this paper, a novel energy-efficient multiplier has been proposed for image processing applications. In the multiplication process, compressors are used as an important component for the reduction of partial products. Higher-order approximate 5:2 and 6:2 compressors are also designed and simulated in VIVADO using Verilog coding. The proposed higher-order compressors result in less area and low-power consumption in comparison with the existing state-of-the-art technique. These high-performance compressors are used at the multipliers’ reduction stage, resulting in an energy-efficient circuit for error-tolerant applications. All the simulations were carried out in VIVADO considering 8-bit inputs. Multiplication performance shows 37.77 % (8-bit) improvement in terms of power consumption in comparison to the conventional multiplier. The multiplication process has been done on the original, negative, and sharpened images using their masks. The proposed multiplier shows 51.36% (original image), 6.04% (negative image), and 22.44% (sharpened image) PSNR improvement in comparison to state-of-the-art work. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. Energy-Efficient Approximate Multiplier Design With Lesser Error Rate Using the Probability-Based Approximate 4:2 Compressor.
- Author
- Krishna, L. Hemanth, Sk, Ayesha, Rao, J. Bhaskara, Veeramachaneni, Sreehari, and Sk, Noor Mahammad
- Abstract
This letter proposes novel approximate 4:2 compressors developed using input reordering circuits and input combination probabilities. The input reordering circuit is used to reduce the hardware complexity of the proposed designs. Two designs of approximate 4:2 compressors are proposed, and these compressors are used in designing approximate multipliers. The proposed multiplier designs use less energy than previously published ones because they accept inexact outputs, which makes them well suited to image processing applications. The proposed multiplier designs MUL1, MUL2, MUL3, and MUL4 save 22.75%, 21.95%, 11.57%, and 8.95% energy, respectively, compared with the best existing design (Kong and Li, 2021). [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
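Note on 4:2 compressors, which several records in this list build on: an exact 4:2 compressor takes four partial-product bits plus a carry-in and produces a sum, a carry, and a carry-out such that the inputs add up to sum + 2 × (carry + cout). The Python sketch below models the exact cell as two chained full adders and contrasts it with a hypothetical saturating approximation that drops the carry chain. The approximate cell is an illustrative assumption and is not one of the designs proposed in these papers.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple:
    """Return (sum, carry) of three input bits."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def exact_compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple:
    """Exact 4:2 compressor as two chained full adders: returns (sum, carry, cout)."""
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def approx_compressor_4_2(x1: int, x2: int, x3: int, x4: int) -> tuple:
    """Hypothetical approximation: no cin/cout, output count saturates at 3.

    Returns (sum, carry) encoding min(x1 + x2 + x3 + x4, 3).
    """
    total = min(x1 + x2 + x3 + x4, 3)
    return total & 1, total >> 1


if __name__ == "__main__":
    # List every input pattern where the approximate cell is wrong.
    for bits in product((0, 1), repeat=4):
        s, carry = approx_compressor_4_2(*bits)
        if s + 2 * carry != sum(bits):
            print("error on inputs", bits)
```

With this particular approximation, the only erroneous input pattern is the one where all four inputs are 1, which is why such cells can be attractive when that pattern is rare.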
8. Power–Area-Optimized Approximate Multiplier Design for Image Fusion.
- Author
- Thakur, Garima, Sohal, Harsh, and Jain, Shruti
- Subjects
- *IMAGE fusion, *COMPRESSORS
- Abstract
In this paper, three approximate multiplier architectures are proposed: area-optimized approximate multiplier (AOM), power-optimized approximate multiplier (POM), and power- and area-optimized approximate multiplier (PAOM). These designs are implemented using speculative Han–Carlson adder and compressor-based multiplier blocks. Han–Carlson adder is used as the basic adder block in the final addition stage of all the three approximate multiplier designs. Different types of compressors (3:2, 4:2, 5:2, 6:2, 7:2, 8:2) are used for the implementation of the energy-efficient approximate multiplier blocks. All the simulations are performed on VIVADO design tool. Also, the designed multipliers are validated for image blending (an error-tolerant) application. The proposed power optimization approximate multiplier shows 0.86%, 10.54% PSNR improvement in comparison with area optimization approximate multiplier and power and area optimization approximate multiplier, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Design and implementation of hybrid (radix-8 Booth and TRAM) approximate multiplier using 15-4 approximate compressors for image processing application.
- Author
- Immareddy, Srikanth, Sundaramoorthy, Arunmetha, and Alagarsamy, Aravindhan
- Abstract
This manuscript proposes a low-power and high-speed hybrid approximate multiplier using 15-4 approximate compressors in partial product stage for image processing application. Initially, the most significant bits (MSB) of approximate multiplier is encoded by approximate radix-8 Booth’s (R-8B) encoding, and also least significant bits (LSB) is encoded by approximate truncated-round approximate multiplier (TRAM) encoding both are used to rounding the LSB to the adjacent power of two. Then, approximate 15-4 compressors are subjugated in partial product lessening stage to produce MSB result. Then, the hybrid approximate multiplier under 15-4 approximate compressors is carried out in the application of image processing. The proposed approach is done in MATLAB and Vivado Design Suite 2018.1 simulator, then observes that the power consumption of proposed design attains 31.814%, 23.562% lower than existing models. Similarly, the velocity attains 42.63%, 6.263% higher than the existing models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Energy efficient enhanced all pass transformation fostered variable digital filter design based on approximate adder and approximate multiplier for eradicating sensor nodes noise.
- Author
- Raja, M. Ramkumar, Naveen, R., Durai, C. Anand Deva, Usman, Mohammed, Shukla, Neeraj Kumar, and Muqeet, Mohammed Abdul
- Subjects
- ADAPTIVE filters, FINITE impulse response filters, IMPULSE response, SIGNAL processing, KALMAN filtering, NOISE, DETECTORS
- Abstract
Variable digital filter (VDF) plays a significant role in communication and signal processing field. Any prototype filter's preferred frequency response is attained by creating All Pass Transformation (APT) based filter to maintain complete control over the cut-off frequency. However, the speed, power, and area usage of the digital filter are constrained by its performance. Therefore, in this manuscript, All Pass Transformation based Variable digital filters (APT-VDF) using Error Reduced Carry Prediction Approximate Adder (ERCPAA) and Sandpiper Optimization fostered Approximate Multiplier (SO-AM) is proposed. The proposed APT-VDF-ERCPAA-SOAM filter design is utilized for enhancing the filter efficiency by reducing noise in the sensor nodes. The proposed ERCPAA design is incorporated with carry prediction and constant truncation for diminishing the path delay and area utilization. Moreover, the proposed SO-AM is used for minimizing the design complexity and power utilization. The simulation of the proposed method is carried out in Verilog and the design is synthesized for FPGA using Xilinx ISE 14.5. The proposed APT-VDF-ERCPAA-SOAM filter design has attained 35.6%, 21.75%, 28.69% lower power and 46.58%, 12.3%, 38.07% lower delay than the existing approaches, namely Very Large-Scale Integration design of All Pass Transformation based Variable digital filters using a new variable block sized ternary adder (VBSTA) and ternary multiplier (APTVDF-VBSTA-TM), Finite Impulse Response (FIR) adaptive filter design by hybridizing canonical signed digit (CSD) and approximate booth recode (ABR) algorithm in DA architecture (FIR-CSDABR-DA), and digital FIR filter design using Carry Save Adder (CSA) and Structured Tree Multiplier (FIR-CSA-STM), respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Efficient approximate multipliers with adjustable accuracy.
- Author
- Pourkhatoon, Mohammadreza, Emrani Zarandi, Azadeh Alsadat, and Mohammadi, Majid
- Subjects
- *SIGNAL-to-noise ratio, *IMAGE processing, *COMPUTER systems, *ELECTRICITY pricing, *COMPUTER arithmetic, *HIGH performance computing, *SMART devices
- Abstract
The number of smart devices grows rapidly, and the main leakage of many of these devices is their limited batteries, in addition to the need for a fast and low-power computing system. Approximate calculations are an excellent method for such systems to achieve higher speed and less area and power consumption at the cost of lower accuracy. Furthermore, in many applications with inherent tolerance for insignificant inaccuracies such as image processing, approximate computing can be used to achieve acceptable performance with higher speed, lower power or area consumption. In this work, approximate multiplier structures are proposed based on working on the partial product trees with the aim of reducing area consumption and increasing accuracy. Comprehensive experimental analysis is performed to evaluate the performance of the proposed multiplier in terms of area, delay, power consumption, and accuracy. The results show that our suggested design improves at most 29% of power consumption, 11% of delay, and 55% of the area rather than an exact multiplier. Moreover, the performance of our multipliers is investigated based on the peak signal noise ratio (PSNR) for processing two images, which lead to 56.65 and 23.6 dB. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
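Note on the PSNR figures quoted in this and other records: peak signal-to-noise ratio compares an image produced with an approximate multiplier against the image produced with exact arithmetic. The Python sketch below uses the standard definition, PSNR = 10 · log10(MAX² / MSE), over flat lists of pixel values. The function name `psnr` and the 8-bit default peak of 255 are assumptions for illustration; the records do not say how each paper computed its values.

```python
import math


def psnr(reference, test, max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(test):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error between the exact and approximate results.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    exact_pixels = [10, 20, 30, 40]
    approx_pixels = [11, 19, 30, 42]
    print(f"PSNR: {psnr(exact_pixels, approx_pixels):.2f} dB")
```

Higher values mean the approximate output is closer to the exact one; identical images give an infinite PSNR.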
12. A-DSCNN: Depthwise Separable Convolutional Neural Network Inference Chip Design Using an Approximate Multiplier
- Author
- Jin-Jia Shang, Nicholas Phipps, I-Chyn Wey, and Tee Hui Teo
- Subjects
- application-specific integrated circuits, approximate multiplier, CMOS, convolutional neural network, depthwise separable convolution, processing element, Electronic computers. Computer science, QA75.5-76.95, Electric apparatus and materials. Electric circuits. Electric networks, TK452-454.4
- Abstract
For Convolutional Neural Networks (CNNs), Depthwise Separable CNN (DSCNN) is the preferred architecture for Application Specific Integrated Circuit (ASIC) implementation on edge devices. It benefits from a multi-mode approximate multiplier proposed in this work. The proposed approximate multiplier uses two 4-bit multiplication operations to implement a 12-bit multiplication operation by reusing the same multiplier array. With this approximate multiplier, sequential multiplication operations are pipelined in a modified DSCNN to fully utilize the Processing Element (PE) array in the convolutional layer. Two versions of Approximate-DSCNN (A-DSCNN) accelerators were implemented on TSMC 40 nm CMOS process with a supply voltage of 0.9 V. At a clock frequency of 200 MHz, the designs achieve 4.78 GOPs/mW and 4.89 GOPs/mW power efficiency while occupying 1.16 mm² and 0.398 mm² area, respectively.
- Published
- 2023
- Full Text
- View/download PDF
13. Design of an Approximate Multiplier with Time and Power Efficient Approximation Methods.
- Author
- Liu, Ruyi, Duan, Wei, Luo, Xiaodie, Ren, Qian, Li, Yifan, and Song, Min
- Subjects
- *SIGNAL-to-noise ratio, *IMAGE processing
- Abstract
Approximate multipliers have gradually become a focus of research due to the emergence of fault-tolerant applications. This paper deals with the approximation methods for an approximation multiplier with truncation, probability transformation and a majority gate-based compressor chain. With the help of probability analysis, the proposed approximation methods are utilized in an approximate 8 × 8 unsigned multiplier to achieve low accuracy loss, high efficiency for time and power. Compared with the precise and approximate multipliers, the proposed design brings 55.0%, 39.0% reduction in delay and 73.8%, 22.6% power saving. The proposed multiplier achieves better peak signal-to-noise ratio (PSNR) values when evaluated with an image processing application. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
14. Energy-Efficient Hardware Implementation of Fully Connected Artificial Neural Networks Using Approximate Arithmetic Blocks.
- Author
- Esmali Nojehdeh, Mohammadreza and Altun, Mustafa
- Subjects
- *ARTIFICIAL neural networks, *FEEDFORWARD neural networks, *PARALLEL processing, *ARITHMETIC
- Abstract
In this paper, we explore efficient hardware implementation of feedforward artificial neural networks (ANNs) using approximate adders and multipliers. Due to a large area requirement in a parallel architecture, the ANNs are implemented under the time-multiplexed architecture where computing resources are re-used in the multiply accumulate (MAC) blocks. The efficient hardware implementation of ANNs is realized by replacing the exact adders and multipliers in the MAC blocks by the approximate ones taking into account the hardware accuracy. Additionally, an algorithm to determine the approximate level of multipliers and adders due to the expected accuracy is proposed. As an application, the MNIST and SVHN databases are considered. To examine the efficiency of the proposed method, various architectures and structures of ANNs are realized. Experimental results show that the ANNs designed using the proposed approximate multiplier have a smaller area and consume less energy than those designed using previously proposed prominent approximate multipliers. It is also observed that the use of both approximate adders and multipliers yields, respectively, up to 50% and 10% reduction in energy consumption and area of the ANN design with a small deviation or better hardware accuracy when compared to the exact adders and multipliers. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
15. A-DSCNN: Depthwise Separable Convolutional Neural Network Inference Chip Design Using an Approximate Multiplier.
- Author
- Shang, Jin-Jia, Phipps, Nicholas, Wey, I-Chyn, and Teo, Tee Hui
- Subjects
- CONVOLUTIONAL neural networks, INTEGRATED circuits, VOLTAGE, ANALOG multipliers, ARCHITECTURE
- Abstract
For Convolutional Neural Networks (CNNs), Depthwise Separable CNN (DSCNN) is the preferred architecture for Application Specific Integrated Circuit (ASIC) implementation on edge devices. It benefits from a multi-mode approximate multiplier proposed in this work. The proposed approximate multiplier uses two 4-bit multiplication operations to implement a 12-bit multiplication operation by reusing the same multiplier array. With this approximate multiplier, sequential multiplication operations are pipelined in a modified DSCNN to fully utilize the Processing Element (PE) array in the convolutional layer. Two versions of Approximate-DSCNN (A-DSCNN) accelerators were implemented on TSMC 40 nm CMOS process with a supply voltage of 0.9 V. At a clock frequency of 200 MHz, the designs achieve 4.78 GOPs/mW and 4.89 GOPs/mW power efficiency while occupying 1.16 mm² and 0.398 mm² area, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
16. Design and evaluation of ultra‐fast 8‐bit approximate multipliers using novel multicolumn inexact compressors.
- Author
- Karimi, Fereshteh, Faghih Mirzaee, Reza, Fakeri‐Tabrizi, Ali, and Roohi, Arman
- Subjects
- *COMPRESSORS, *IMAGE processing, *ERROR rates, *COMPUTER arithmetic, *MICROCONTROLLERS
- Abstract
Summary: A multiplier, as a key component in many different applications, is a time‐consuming, energy‐intensive computation block. Approximate computing is a practical design paradigm that attempts to improve hardware efficacy while keeping computation quality satisfactory. A novel multicolumn 3,3:2 inexact compressor is presented in this paper. It takes three partial products from two adjacent columns each for rapid partial product reduction. The proposed inexact compressor and its derivatives enable us to design a high‐speed approximate multiplier. Then, another ultra‐fast, high‐efficient approximate multiplier is achieved by utilizing a systematic truncation strategy. The proposed multipliers accumulate partial products in only two stages, one fewer stage than other approximate multipliers in the literature. Implementation results by the Synopsys Design Compiler and 45 nm technology node demonstrate nearly 11.11% higher speed for the second proposed design over the fastest existing approximate multiplier. Furthermore, the new approximate multipliers are applied to the image processing application of image sharpening, and their performance in this application is highly satisfactory. It is shown in this paper that the error pattern of an approximate multiplier, in addition to the mean error distance and error rate, has a direct effect on the outcomes of the image processing application. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
17. Design and Analysis of Low Power Approximate Multiplier Using Novel Compressor
- Author
- Thakur, Garima, Sohal, Harsh, and Jain, Shruti
- Published
- 2024
- Full Text
- View/download PDF
18. An Optimized Deep-Learning-Based Low Power Approximate Multiplier Design.
- Author
- Usharani, M., Sakthivel, B., Priya, S. Gayathri, Nagalakshmi, T., and Shirisha, J.
- Subjects
- DEEP learning, IMAGE processing, DATA mining, ERROR rates, MULTIPLIERS (Mathematical analysis)
- Abstract
Approximate computing is a popular field for low power consumption that is used in several applications like image processing, video processing, multimedia and data mining. Approximate computing is mainly performed with arithmetic circuits, particularly multipliers. The multiplier is the most essential element used for approximate computing, where the power consumption is largely based on its performance. Several researchers have worked on approximate multipliers for power reduction for a few decades, but the design of a low-power approximate multiplier is not easy. It remains a major challenge for the digital industry to design an approximate multiplier with low power, a minimum error rate, and higher accuracy. To overcome these issues, the digital circuits are applied to the Deep Learning (DL) approaches for higher accuracy. In recent times, DL is the method that is used for higher learning and prediction accuracy in several fields. Therefore, Long Short-Term Memory (LSTM), a popular time-series DL method, is used in this work for approximate computing. To provide an optimal solution, the LSTM is combined with a meta-heuristic Jellyfish search optimisation technique to design an input-aware deep learning-based approximate multiplier (DLAM). In this work, the jelly optimised LSTM model is used to enhance the error metrics performance of the approximate multiplier. The optimal hyperparameters of the LSTM model are identified by jelly search optimisation. This fine-tuning is used to obtain an optimal solution to perform an LSTM with higher accuracy. The proposed pre-trained LSTM model is used to generate approximate design libraries for the different truncation levels as a function of area, delay, power and error metrics.
The experimental results on an 8-bit multiplier with an image processing application show that the proposed approximate computing multiplier achieved a superior area and power reduction with very good results on error rates. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
19. Design of Generalized Enhanced Static Segment Multiplier with Minimum Mean Square Error for Uniform and Nonuniform Input Distributions.
- Author
- Di Meo, Gennaro, Saggese, Gerardo, Strollo, Antonio G. M., and De Caro, Davide
- Subjects
- MEAN square algorithms, APPROXIMATION error
- Abstract
In this paper, we analyze the performances of an Enhanced Static Segment Multiplier (ESSM) when the inputs have both uniform and non-uniform distribution. The enhanced segmentation divides the multiplicands into a lower, a middle, and an upper segment. While the middle segment is placed at the center of the inputs in other implementations, we seek the optimal position able to minimize the approximation error. To this aim, two design parameters are exploited: m, defining the size and the accuracy of the multiplier, and q, defining the position of the middle segment for further accuracy tuning. A hardware implementation is proposed for our generalized ESSM (gESSM), and an analytical model is described, able to find m and q which minimize the mean square approximation error. With uniform inputs, the error slightly improves by increasing q, whereas a large error decrease is observed by properly choosing q when the inputs are half-normal (with a NoEB up to 18.5 bits for a 16-bit multiplier). Implementation results in 28 nm CMOS technology are also satisfactory, with area and power reductions up to 71% and 83%. We report image and audio processing applications, showing that gESSM is a suitable candidate in applications with non-uniform inputs. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
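Note on static segmentation as described in the abstract above: each operand is reduced to one m-bit segment (lower, middle, or upper), the two segments are multiplied on a small m × m multiplier, and the result is shifted back into place. The Python sketch below is a rough behavioral model under stated assumptions: the segment-selection rule (pick the highest segment that covers the leading one) and the default values n = 16, m = 8, q = 4 are illustrative guesses consistent with the abstract, and the paper's own correction and rounding logic is not modeled.

```python
def select_segment(x: int, n: int = 16, m: int = 8, q: int = 4) -> tuple:
    """Return (segment, shift) for an n-bit unsigned operand.

    Assumed rule: use the upper segment if any bit above the middle
    segment is set, else the middle segment (starting at bit q) if any
    bit above the lower segment is set, else the lower segment.
    Requires 0 < q < n - m.
    """
    if x >> (q + m):
        shift = n - m      # upper segment: bits [n-1 : n-m]
    elif x >> m:
        shift = q          # middle segment: bits [q+m-1 : q]
    else:
        shift = 0          # lower segment: bits [m-1 : 0]
    return (x >> shift) & ((1 << m) - 1), shift


def segment_multiply(a: int, b: int, n: int = 16, m: int = 8, q: int = 4) -> int:
    """Approximate a * b using one m x m multiplication on selected segments."""
    seg_a, shift_a = select_segment(a, n, m, q)
    seg_b, shift_b = select_segment(b, n, m, q)
    return (seg_a * seg_b) << (shift_a + shift_b)


if __name__ == "__main__":
    for a, b in [(200, 100), (0x0FFF, 3), (0xFFFF, 0xFFFF)]:
        print(a, b, segment_multiply(a, b), a * b)
```

Operands that fit entirely in the lower segment are multiplied exactly; larger operands lose only the bits below the selected segment, so in this model the result never exceeds the exact product.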
20. A Low-Power and High-Accuracy Approximate Multiplier With Reconfigurable Truncation
- Author
- Fang-Yi Gu, Ing-Chao Lin, and Jia-Wei Lin
- Subjects
- Approximate computing, approximate multiplier, CNN accelerator, deep learning, high precision, reconfigurable approximate design, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Multipliers are among the most critical arithmetic functional units in many applications, and those applications commonly require many multiplications which result in significant power consumption. For applications that have error tolerance, employing an approximate multiplier is an emerging method to reduce critical path delay and power consumption. An approximate multiplier can trade off accuracy for lower energy and higher performance. In this paper, we not only propose an approximate 4-2 compressor with high accuracy, but also an adjustable approximate multiplier that can dynamically truncate partial products to achieve variable accuracy requirements. In addition, we also propose a simple error compensation circuit to reduce error distance. The proposed approximate multiplier can adjust the accuracy and power required for multiplications at run-time based on the users’ requirement. Experimental results show that the delay and the average power consumption of the proposed adjustable approximate multiplier can be reduced by 27% and 40.33% (up to 72%) when compared to the Wallace tree multiplier. Moreover, we demonstrate the suitability and reconfigurability of our proposed multiplier in convolutional neural networks (CNNs) to meet different requirements at each layer.
- Published
- 2022
- Full Text
- View/download PDF
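Note on partial-product truncation, the mechanism the abstract above makes adjustable at run time: an unsigned array multiplier sums the bit products a_i · b_j weighted by 2^(i+j), and a truncating design simply skips the low-weight columns. The Python sketch below models that idea with a hypothetical parameter `t` (the number of least-significant columns dropped). It assumes plain column truncation and does not model the paper's 4-2 compressor or its error compensation circuit.

```python
def truncated_pp_multiply(a: int, b: int, n_bits: int = 8, t: int = 0) -> int:
    """Sum only the partial-product bits whose column index i + j is at least t."""
    result = 0
    for i in range(n_bits):          # bit position in operand a
        for j in range(n_bits):      # bit position in operand b
            if i + j >= t and (a >> i) & 1 and (b >> j) & 1:
                result += 1 << (i + j)
    return result


if __name__ == "__main__":
    a, b = 173, 94
    for t in (0, 4, 8):
        approx = truncated_pp_multiply(a, b, t=t)
        print(f"t={t}: approx={approx}, exact={a * b}, error={a * b - approx}")
```

Setting `t = 0` reproduces the exact product; raising `t` removes more partial products, which in hardware saves switching activity at the cost of a growing, always non-negative error.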
21. High-performance, energy-efficient, and memory-efficient FIR filter architecture utilizing 8x8 approximate multipliers for wireless sensor network in the Internet of Things
- Author
- Charles Rajesh Kumar J., D. Vinod Kumar, and M.A. Majid
- Subjects
- WSN, IoT, Approximate multiplier, Approximate adders, FIR filter, Wallace tree, Electric apparatus and materials. Electric circuits. Electric networks, TK452-454.4, Computer engineering. Computer hardware, TK7885-7895
- Abstract
IoT uses wireless sensor networks (WSN) to deploy many sensors to track environmental and physical parameters. The WSN measurements are frequently contaminated and altered by noise. The noise in the signal increases the sensor node’s computation and energy utilization, resulting in less longevity of the sensor node. The Finite Impulse Response (FIR) filter is commonly employed in WSN to pre-process sensed signals to remove noise from the sensed signals using delay elements, multipliers, and adders. Traditional multiplier-based FIR filter designs result in hardware-intensive multipliers that consume a lot of energy and area and have low computation speed. These drawbacks make them unsuitable for IoT-based WSN systems with stringent power efficiency necessities. Approximate computing enhances the energy efficiency of an FIR filter. Arithmetic circuits utilizing approximate computing improve the hardware performance, with some loss of accuracy to save energy utilization and boost speed. A novel approximate multiplier architecture employing a fast and straightforward approximation adder is proposed in this study. Approximate multiplier M1 using OR gate and approximate multiplier M2 using proposed approximate adders are compared. The proposed approximate adder is suited for building an adder tree to accumulate partial product (PP) because it is less complicated than traditional adders. Compared to a one-bit-full adder, the critical path delay (CPD) is reduced significantly in the proposed methods. The accuracy of M1, M2, and the Wallace tree multiplier is compared using the normalized mean error distance (NMED), the mean relative error distance (MRED), the maximum error (ME), and the error rate (ER) against the number of bits utilized for reducing error. For the area (delay) optimized circuit, when the bit used is 4, the delay is 0.4 ns for M1, 0.43 ns for M2, and 1.08 ns for the Wallace tree multiplier.
For the delay (area) optimized circuit, when the bit used is 4, the delay is 0.16 ns for M1, 0.16 ns for M2, and 0.40 ns for the Wallace tree multiplier. To more accurately evaluate performance at the circuit level, the PDP and ADP are computed. The NMED, MRED, ME, and ER versus PDP and ADP are computed. The proposed multipliers M1 and M2 are compared with existing approximate multipliers. When an equivalent MRED, NMED, or ER is taken into account, M1 has the smallest ADP and PDP among other multipliers. The very low likelihood of a significant ED occurring is indicated by the small values of NMED and MRED in M1 and M2. The proposed solutions effectively reduce delay, area, and power while maintaining increased accuracy and performance.
- Published
- 2022
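The accuracy metrics named in this abstract (ER, ME, MRED, NMED) are usually obtained by comparing an approximate multiplier against the exact product over all operand pairs. The following is a minimal sketch under stated assumptions: operands are unsigned 8-bit integers, and `loa_add` / `loa_mul` are hypothetical stand-ins that OR the `k` low bits of each accumulation step instead of adding them. They are not the paper's M1 or M2 circuits. Pairs whose exact product is zero contribute nothing to MRED here, which is one common convention.

```python
def loa_add(x, y, k):
    """Add x and y, but OR (rather than add) their k least significant bits."""
    mask = (1 << k) - 1
    low = (x | y) & mask                 # carry-free approximation of the low part
    high = ((x >> k) + (y >> k)) << k    # exact addition of the high part
    return high | low


def loa_mul(a, b, k=4, n=8):
    """Shift-and-add multiply of two n-bit unsigned ints, accumulating with loa_add."""
    acc = 0
    for i in range(n):
        if (b >> i) & 1:
            acc = loa_add(acc, a << i, k)
    return acc


def error_metrics(mul, n=8):
    """Exhaustively compare mul(a, b) with a * b over all n-bit operand pairs."""
    size = 1 << n
    max_product = (size - 1) ** 2
    wrong, sum_ed, max_ed, sum_red = 0, 0, 0, 0.0
    for a in range(size):
        for b in range(size):
            exact = a * b
            ed = abs(mul(a, b) - exact)      # error distance for this pair
            if ed:
                wrong += 1
                sum_ed += ed
                max_ed = max(max_ed, ed)
                if exact:                    # relative error undefined for exact == 0
                    sum_red += ed / exact
    pairs = size * size
    return {
        "ER": wrong / pairs,                    # error rate
        "ME": max_ed,                           # maximum error
        "MRED": sum_red / pairs,                # mean relative error distance
        "NMED": sum_ed / pairs / max_product,   # normalized mean error distance
    }


if __name__ == "__main__":
    print(error_metrics(loa_mul))
```

Setting `k=0` turns `loa_add` into exact addition, which gives a quick sanity check that all four metrics drop to zero.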
22. Truncation Based Approximate Multiplier For Error Resilient Applications.
- Author
-
Parekh, Prashil, Mehta, Samidh, and Mane, Pravin
- Subjects
- *
INTEGRATED circuit design , *ELECTRONIC design automation - Abstract
Approximate computing is a promising approach for low power IC design and has recently received considerable research attention. To accommodate dynamic levels of approximation, a few accuracy configurable multiplier designs have been developed in the past. However, these designs consumed considerable area and power. Accuracy, as well as latency, power and area design metrics are used to evaluate our approximate multiplier designs of different bit widths, i.e. 16 × 16, 32 × 32 and 64 × 64. Simulation and synthesis results showed a considerable gain over previous designs since we can change the components required according to the error tolerance. Moreover, we have also proposed a technique where the system takes charge of the design and makes a call depending on the magnitude of the numbers provided. When compared with the exact multiplier designs, for 16 bit, 32 bit and 64 bit, we achieved area reductions of 24.5%, 71.5% and 85.1%, power reductions of 37.7%, 89.4% and 88.2%, and mean relative error distances of 0.5393%, 0.5428% and 0.2878%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2022
23. A New Approximate 4-2 Compressor using Merged Sum and Carry.
- Author
-
Jyothi, Chinthalgiri, Saranya, K., Jammu, Bhaskara Rao, Veeramachaneni, Sreehari, and Mahammad, SK Noor
- Subjects
- *
COMPRESSORS , *IMAGE processing , *IMAGING systems , *ENERGY consumption , *ERROR rates , *MULTIPLIERS (Mathematical analysis) - Abstract
Multiplication is the fundamental process in many image processing systems and consumes substantial computational resources. As many DSP and image applications are tolerant of inaccurate results, approximate multiplication is preferred for energy efficiency. In this paper, two types of approximate compressors are proposed by exploring the relationship between the sum and carry in the truth table, and they are used to design energy-saving multipliers. The proposed compressor circuits are synthesized using a 45 nm library. The proposed circuits produce better Energy and Energy Delay Product (EDP) percentages when compared with the previously presented approximate compressors. Using the proposed approximate multiplier designs, the application to image processing is also presented in this paper. Image quality parameters like error rate, Normalized Relative Error Distance (NRED), and Average Relative Error Distance (ARED) are evaluated. A new parameter, Power and Exactness Product (PEP), is introduced, and it explicitly shows that the proposed designs are 35% and 47% efficient in terms of structural and quality aspects. [ABSTRACT FROM AUTHOR]
- Published
- 2022
24. CMOS Implementation and Performance Analysis of Known Approximate 4:2 Compressors.
- Author
-
Anguraj, Parthibaraj, Krishnan, Thiruvenkadam, and Subramanian, Saravanan
- Subjects
- *
COMPRESSOR performance , *GATE array circuits , *INTEGRATED circuits , *IMAGE processing , *ERROR rates , *COMPRESSORS - Abstract
Approximate computing is one of the emerging concepts in multimedia applications such as image processing, and it is receiving growing attention from researchers. By sacrificing a small amount of design accuracy, it reduces circuit parameters like area complexity, delay, and power. The purpose of this work is to survey the Field-Programmable Gate Array (FPGA) and Application-Specific Integrated Circuit (ASIC) implementation of the modified Dadda multiplier architecture using various approximate 4:2 compressor designs presented over the last few decades. Based on implementation outcomes, this survey examines the approximate modified Dadda multiplier design performance for its closeness to the exact computation. In addition, the comparison is carried out based on the performance of the approximate 4:2 compressors, the error rate of each design, the accuracy analysis metrics of the approximate multiplier, and its area utilization, power consumption, and delay. [ABSTRACT FROM AUTHOR]
- Published
- 2022
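To make the notion of a compressor's error rate concrete, the sketch below models an exact 4:2 compressor as two chained full adders and compares it with one hypothetical approximate variant. The assumptions are illustrative only: `approx_compressor` drops the carry-in and carry-out and saturates the two-bit output at 3, so it errs only when all four inputs are 1. It is not any specific design from the surveyed literature.

```python
from itertools import product


def exact_compressor(x1, x2, x3, x4, cin):
    """Exact 4:2 compressor built from two chained full adders.

    Satisfies x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout).
    """
    s1 = x1 ^ x2 ^ x3
    cout = (x1 & x2) | (x2 & x3) | (x1 & x3)
    s = s1 ^ x4 ^ cin
    carry = (s1 & x4) | (x4 & cin) | (s1 & cin)
    return s, carry, cout


def approx_compressor(x1, x2, x3, x4):
    """Hypothetical approximate compressor: no cin/cout, output saturates at 3."""
    total = min(x1 + x2 + x3 + x4, 3)
    return total & 1, total >> 1          # (sum, carry)


def compressor_errors():
    """Map each erroneous input pattern to its signed error (approx - exact)."""
    errors = {}
    for bits in product((0, 1), repeat=4):
        s, carry = approx_compressor(*bits)
        diff = (s + 2 * carry) - sum(bits)
        if diff:
            errors[bits] = diff
    return errors


if __name__ == "__main__":
    errs = compressor_errors()
    print("erroneous inputs:", errs)
    print("error rate:", len(errs) / 16)
```

Because a sum/carry pair can encode at most 3, any approximate compressor without a carry-out must be wrong for at least the all-ones input. Published designs differ mainly in which additional input patterns they allow to err in exchange for simpler logic.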
25. Hybrid Radix-16 booth encoding and rounding-based approximate Karatsuba multiplier for fast Fourier transform computation in biomedical signal processing application.
- Author
-
Jayaraman Rajanediran, Dinesh Kumar, Babu C, Ganesh, K, Priyadharsini, and Ramkumar, M.
- Subjects
- *
BIOMEDICAL signal processing , *DIGITAL signal processing , *ENERGY consumption , *ERROR rates , *MULTIPLICATION , *MULTIPLIERS (Mathematical analysis) - Abstract
Multiplication is an essential biomedical signal processing function implemented in the Digital Signal Processing (DSP) cores. To enhance the speed, area and energy efficiency of DSP cores, approximate multiplication is used. Also, low power multiplier unit design is one of the requirements of a DSP processor to meet the increasing demands. To balance both the design and error metrics of a multiplier design, an efficient Hybrid Radix-16 Booth Encoding and rounding-based approximate Karatsuba Multiplier (RBEKM-16) is proposed. This research introduces an Approximate Karatsuba multiplier based on rounding, utilizing rounding approximation to compute the least significant part of the product. Simple operators, like adders and multiplexers, replace complex and costly conventional Floating-Point (FP) multipliers in this process. Radix-4 logarithms are incorporated to further minimize hardware complexity and calculate the product's most significant part. Subsequently, an approximate 4-2 compressor is applied in the partial product reduction stage to generate the most significant bit result. In the experimental scenario, the efficiency of the multiplier is evaluated in terms of energy efficiency, area utilization and error rate using the Xilinx ISE 8.1i tool. The results from the experiments indicate that the suggested multiplier demonstrates improved energy efficiency, utilizes space more effectively, and performs well in applications related to biomedical signal processing. Further, the accomplished area utilization of the proposed 16-bit multiplier is 1068 μm², delay is 3.01 ns, power consumption is 0.021 mW and power delay product is 119 fJ.
• The proposed multiplier uses rounding approximation to generate the product's least significant part.
• The radix-16 Booth encoding is used to calculate the product's most significant part.
• Integrating the approximate multiplier with Booth encoding produces an excellent final outcome.
• The introduced AKM adds more flexibility, which maximizes the multiplier unit's performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
26. Low-Power Compressor-Based Approximate Multipliers With Error Correcting Module.
- Author
-
Kumar, U. Anil, Chatterjee, Sumit K., and Ahmed, Syed Ershad
- Abstract
This letter proposes an unsigned approximate multiplier architecture segmented into three portions: the least significant portion that contributes least to the partial product (PP) is replaced with a new constant compensation term to improve hardware savings without sacrificing accuracy. The PPs in the middle portion are simplified using a new 4:2 approximate compressor, and the error due to approximation is compensated using a simple yet efficient error correction module. The most significant portion of the multiplier is implemented using exact logic, as approximating it would result in a large error. Experimental results for an 8-bit multiplier show that the power and power-delay product are reduced by up to 47.7% and 55.2%, respectively, in comparison with the exact design and by 36.9% and 39.5%, respectively, in comparison with the existing designs without significant compromise on accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2022
27. Efficient Approximate Multiplier Based on a New 1-Gate Approximate Compressor.
- Author
-
Ejtahed, Seyed Amir Hossein and Timarchi, Somayeh
- Subjects
- *
COMPUTER arithmetic , *COMPRESSORS , *PRODUCT improvement - Abstract
The multiplier is one of the most important arithmetic blocks in computer arithmetic units, and it affects the performance of the whole system. Improving efficiency and reducing power consumption can be achieved at the cost of reducing the computation accuracy. One approach to designing approximate multipliers is to use an approximate compressor. This paper proposes an approximate compressor to be exploited in a multiplier circuit. The proposed compressor consists of only one gate. According to the simulation results with 28-nm standard cell-based technology, the proposed approximate compressor improves by 62% compared to the fastest available work. Also, at equal delays, its power consumption and area improve by 52% and 61%, respectively, compared with the best existing design. Moreover, the results indicate that the proposed approximate compressor may provide up to 53%, 86%, and 57% improvements in power–delay product, energy–delay product, and area–delay product, respectively, compared to the most efficient design. Finally, the efficiency of the proposed multiplier is investigated in image applications. The results show that the efficiency of the proposed multiplier exceeds that of the existing approximate and accurate counterparts. [ABSTRACT FROM AUTHOR]
- Published
- 2022
28. Variable-Precision Approximate Floating-Point Multiplier for Efficient Deep Learning Computation.
- Author
-
Zhang, Hao and Ko, Seok-Bum
- Abstract
In this brief, a variable-precision approximate floating-point multiplier is proposed for energy-efficient deep learning computation. The proposed architecture supports approximate multiplication with the BFloat16 format. As the input and output activations of deep learning models usually follow a normal distribution, inspired by the posit format, different precisions can be applied to represent numbers with different values. In the proposed architecture, posit encoding is used to change the level of approximation, and the precision of the computation is controlled by the value of the product exponent. For a large exponent, lower-precision multiplication is applied to the mantissa, and for a small exponent, higher-precision computation is applied. Truncation is used as the approximation method in the proposed design, while the number of bit positions to be truncated is controlled by the value of the product exponent. The proposed design can achieve 19% area reduction and 42% power reduction compared to the normal BFloat16 multiplier. When applying the proposed multiplier in deep learning computation, almost the same accuracy as that of the normal BFloat16 multiplier can be achieved. [ABSTRACT FROM AUTHOR]
- Published
- 2022
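The idea of letting the product exponent decide how many mantissa bits are multiplied can be illustrated in software. This is a rough sketch under several assumptions, none of which come from the paper: the two-level rule in `kept_bits` (its `threshold`, `full` and `low` values) is hypothetical, operands are ordinary Python floats rather than BFloat16 words, the posit-style encoding is not modeled, and the result is not rounded back to BFloat16.

```python
import math


def truncate_mantissa(m, bits):
    """Keep `bits` fractional bits of m, truncating toward zero."""
    scale = 1 << bits
    return math.trunc(m * scale) / scale


def kept_bits(product_exp, full=8, low=4, threshold=0):
    """Hypothetical rule: fewer mantissa bits when the product exponent is large."""
    return low if product_exp > threshold else full


def approx_fp_mul(a, b):
    """Multiply a and b with exponent-dependent mantissa truncation."""
    if a == 0.0 or b == 0.0:
        return 0.0
    ma, ea = math.frexp(a)        # a == ma * 2**ea with 0.5 <= |ma| < 1
    mb, eb = math.frexp(b)
    product_exp = ea + eb
    bits = kept_bits(product_exp)
    mant = truncate_mantissa(ma, bits) * truncate_mantissa(mb, bits)
    return math.ldexp(mant, product_exp)


if __name__ == "__main__":
    for a, b in [(0.3, 0.2), (3.3, 5.7)]:
        approx = approx_fp_mul(a, b)
        print(a, b, approx, abs(approx - a * b) / abs(a * b))
```

With `math.frexp`, the leading significand bit is the first fractional bit, so `full=8` corresponds to BFloat16's hidden bit plus its 7 stored mantissa bits. Truncation toward zero means the approximate product never exceeds the exact one in magnitude.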
29. A Hardware/Software Co-Design Methodology for Adaptive Approximate Computing in clustering and ANN Learning
- Author
-
Pengfei Huang, Chenghua Wang, Weiqiang Liu, Fei Qiao, and Fabrizio Lombardi
- Subjects
Approximate computing ,approximate multiplier ,k-means clustering ,semi-supervised learning ,Electronic computers. Computer science ,QA75.5-76.95 ,Information technology ,T58.5-58.64 - Abstract
As one of the most promising energy-efficient emerging paradigms for designing digital systems, approximate computing has attracted significant attention in recent years. Applications utilizing approximate computing (AxC) can tolerate some loss of quality in the computed results for attaining high performance. Approximate arithmetic circuits have been extensively studied; however, their application at system level has not been extensively pursued. Furthermore, when approximate arithmetic circuits are applied at system level, error-accumulation effects and a convergence problem may occur in computation. Multiple approximate components can interact in a typical datapath, hence benefiting from each other. Many applications require more complex datapaths than a single multiplication. In this paper, a hardware/software co-design methodology for adaptive approximate computing is proposed. It makes use of feature constraints to guide the approximate computation at various accuracy levels in each iteration of the learning process in Artificial Neural Networks (ANNs). The proposed adaptive methodology also considers the input operand distribution and the hybrid approximation. Compared with a baseline design, the proposed method significantly reduces the power-delay product while incurring only a small loss of accuracy. Simulation and a case study of image segmentation validate the effectiveness of the proposed methodology.
- Published
- 2021
30. A Cost-Efficient Approximate Dynamic Ranged Multiplication and Approximation-Aware Training on Convolutional Neural Networks
- Author
-
Hyunjin Kim and Alberto A. Del Barrio
- Subjects
Approximate computing ,approximate multiplier ,approximation-aware training ,convolutional neural network ,probabilistic multiplier ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
This paper proposes a low-cost approximate dynamic ranged multiplier and describes its use during the training process on convolutional neural networks (CNNs). It has been noted that the approximate multiplier can be used in the convolution of CNN’s forward path. However, in CNN inference on a post-training quantization with a pre-trained model, erroneous convolution output from highly approximate multipliers significantly degrades performance. On the other hand, with the CNN model based on an approximate multiplier, the approximation-aware training process can optimize its learnable parameters, producing better classification results considering the approximate hardware. We analyze the error distribution of the approximate dynamic ranged multiplication and characterize it in order to find the most suitable approximate multiplier design. Considering the effects of normalizing the biased convolution outputs, a low standard deviation of relative errors with respect to the multiplication outputs leads to a negligible accuracy drop. Based on these facts, the hardware costs of the proposed multiplier can be further reduced by adopting the partial products’ inaccurate compression, truncated input fraction, and reduced-width multiplication output. When the proposed approximate multiplier is applied to the residual convolutional neural networks for the CIFAR-100 and Tiny-ImageNet datasets, the accuracy drops of the approximation-aware training results are negligible compared with those using 32-bit floating-point CNNs.
- Published
- 2021
31. CNN Inference Using a Preprocessing Precision Controller and Approximate Multipliers With Various Precisions
- Author
-
Issam Hammad, Ling Li, Kamal El-Sankary, and W. Martin Snelgrove
- Subjects
Approximate computing ,approximate multiplier ,CNN accelerator ,deep learning ,reconfigurable approximate multiplier ,precision prediction ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
This article proposes boosting the multiplication performance for convolutional neural network (CNN) inference using a precision prediction preprocessor which controls various precision approximate multipliers. Previously, utilizing approximate multipliers for CNN inference was proposed to enhance the power, speed, and area at a cost of a tolerable drop in the accuracy. Low precision approximate multipliers can achieve massive performance gains; however, utilizing them is not feasible due to the large accuracy loss they cause. To maximize the multiplication performance gains while minimizing the accuracy loss, this article proposes using a tiny two-class precision controller to utilize low and high precision approximate multipliers hybridly. The performance benefits for the proposed concept are presented for multi-core multi-precision architectures and single-core reconfigurable architectures. Additionally, a design for a merged reconfigurable approximate multiplier with two precisions is proposed for utilization in single-core architectures. For performance comparison, several segments-based approximate multipliers with different precisions were synthesized using CMOS 15nm technology. For accuracy evaluation, the concept was simulated on VGG19, Xception, and DenseNet201 using the ImageNetV2 dataset. This article will demonstrate that the proposed concept can achieve significant performance gains with a minimal accuracy loss when compared to designs that utilize exact multipliers or single-precision approximate multipliers.
- Published
- 2021
32. Computation unit architecture for satellite image processing systems.
- Author
-
Pazhani, A. Azhagu Jaisudhan
- Subjects
- *
IMAGE processing , *IMAGING systems , *REMOTE-sensing images , *EDGE detection (Image processing) , *TELECOMMUNICATION satellites - Abstract
The computation unit plays a vital role in satellite image processing systems. Division is the least commonly used of the four basic arithmetic operations because it is too difficult to utilise. The primitive use of division is an iterative subtraction. Approximate computing is a new trend in digital design that forgoes the need for accurate computation in favour of increased speed and power performance. For error-tolerant applications, approximate computing can reduce design complexity while increasing performance and power efficiency. This work provides new approximation compressors as well as an approach for using them to create efficient approximate multipliers. We have summed up approximate multipliers for different operand lengths using the proposed method. Detailed simulation results show that the modified architectures achieve significant improvements in accuracy and efficiency as well as reduced area, power and latency compared to existing multiplier designs. To improve the compressor efficiency, recommendations are based on the approximate multiplier system. The proposed system is suitable for satellite image processing and radar image processing systems with high accuracy. The proposed system is implemented in the Canny edge detection algorithm to measure its performance. [ABSTRACT FROM AUTHOR]
- Published
- 2022
33. DeBAM: Decoder-Based Approximate Multiplier for Low Power Applications.
- Author
-
Nambi, Suresh, Kumar, U. Anil, Radhakrishnan, Kavya, Venkatesan, Mythreye, and Ahmed, Syed Ershad
- Abstract
Approximate computing is a promising method for designing power-efficient computing systems. Many image and compression algorithms are inherently error-tolerant and can allow errors up to a specific limit. In such algorithms, savings in power can be achieved by approximating the data path units, such as a multiplier. This letter presents a novel decoder logic-based multiplier design with the intent to reduce the partial products generated, thus leading to a reduction in the hardware complexity and power consumption while maintaining a low error rate. Our proposed design in an 8-bit format achieves 40.96% and 22.30% power reduction compared to the accurate and approximate multipliers, respectively. Comprehensive simulations are carried out on image sharpening and compression algorithms to prove that the proposed design obtains a better quality-effort tradeoff than the existing multipliers. [ABSTRACT FROM AUTHOR]
- Published
- 2021
34. Quantization aware approximate multiplier and hardware accelerator for edge computing of deep learning applications.
- Author
-
Manikantta Reddy, K., Vasantha, M.H., Nithin Kumar, Y.B., Keshava Gopal, Ch., and Dwivedi, Devesh
- Subjects
- *
EDGE computing , *DEEP learning , *DATA transmission systems , *PARALLEL processing , *HARDWARE , *MATRIX multiplications , *MOBILE apps - Abstract
Approximate computing has emerged as an efficient design methodology for improving the performance and power-efficiency of digital systems by allowing a negligible loss in the output accuracy. Dedicated hardware accelerators built using approximate circuits can solve power-performance trade-off in the computationally complex applications like deep learning. This paper proposes an approximate radix-4 Booth multiplier and hardware accelerator for deploying deep learning applications on power-restricted mobile/edge computing devices. The proposed accelerator uses approximate multiplier based parallel processing elements to accelerate the workloads. The proposed accelerator is tested with matrix–vector multiplication (MVM) and matrix–matrix multiplication (MMM) workloads on Zynq ZCU102 evaluation board. The experimental results show that the average power consumption of the proposed accelerator reduces by 34% and 40% for MVM and MMM respectively, as compared to the conventional multiply-accumulate unit that was used in the literature to implement similar workloads. Moreover, the proposed accelerator achieved an average performance of 5 GOP/s and 42.5 GOP/s for MVM and MMM respectively at 275 MHz, which are 14× and 5× respective improvements over the conventional design.
• Low power and high speed approximate radix-4 Booth multiplier.
• Approximate hardware accelerator for edge computing applications.
• Simultaneous transfer of data from the on-chip memory to off-chip memory using multiple interconnects.
• Packing of data for improving data communication bandwidth. [ABSTRACT FROM AUTHOR]
- Published
- 2021
35. Gutter oil detection for food safety based on multi-feature machine learning and implementation on FPGA with approximate multipliers
- Author
-
Wei Jiang, Yuhanxiao Ma, and Ruiqi Chen
- Subjects
Gutter oil detection ,Machine learning ,FPGA ,K-NN ,Approximate multiplier ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
Since consuming gutter oil does great harm to people’s health, the Food Safety Administration has long sought more effective and timely supervision. As laboratory tests consume much time, and existing field tests have excessive limitations, a more comprehensive method is greatly needed. This is the first study to propose machine learning algorithms for real-time gutter oil detection under multiple feature dimensions. Moreover, it is deployed on FPGA to be low-power and portable for actual use. Firstly, a variety of oil samples are generated by simulating the real detection environment. Next, based on previous studies, sensors are used to collect significant features that help distinguish gutter oil. Then, the acquired features are filtered and compared using a variety of classifiers. The best classification result is obtained by k-NN with an accuracy of 97.18%, and the algorithm is deployed to FPGA with no significant loss of accuracy. Power consumption is further reduced with the approximate multiplier we designed. Finally, the experimental results show that compared with all other platforms, the whole FPGA-based classification process consumes 4.77 µs and the power consumption is 65.62 mW. The dataset, source code and the 3D modeling file are all open-sourced.
- Published
- 2021
36. Approximate Multiplier based on Low power and reduced latency with Modified LSB design
- Author
-
Senthil Kumar K.K., Vignesh R., Vivek V.R., Ahirwar Jagdish Prasad, Makhzuna Khamdamova, and kumar R. Ram
- Subjects
approximate multiplier ,lsb ,partial product ,convolution ,critical path ,Environmental sciences ,GE1-350 - Abstract
The devised approximation multiplier can adapt the precision and processing power needed for multiplications at run-time based on the needs of the user. To decrease error distance, we also suggest a straightforward error compensation circuit. There are two types of approximate multipliers. Dynamic voltage scaling can be used for the first kind, which controls the timing route of the multiplier. If the voltage is lower, the critical path will take longer to complete. As a result, when the time path is violated, errors occur and approximated results are produced. The second type involves redesigning precise multiplier circuits like the Wallace Tree Multiplier and Dadda Tree Multiplier in order to change the functional behaviors of multipliers. Most of the earlier research on rebuilding multipliers suggested erroneous m-n compressors, which have m inputs and produce n outputs. It dynamically reduces the area covered under the multiplier LSB, which enables the MSB in an accurate manner and the LSB in an approximate manner. This convolutional system approach is regarded to sequential cover up more than 32 bit multiplier. Since the accompanied circuit reduces the entire area by 10 times lesser than the original multiplier, this conventional unit is regarded as abled circuit in the segment. Since the process of compressing partial products absorbed the majority of the multiplier energy and resulted in a considerable route delay, these incorrect compressors were utilized to compress the partial products within multiplication. These functionality are overcome through our experimental setup.
- Published
- 2023
37. Audiogram matching in hearing aid using approximate arithmetic.
- Author
-
Ramya, R. and Moorthi, S.
- Abstract
Filter banks are the major signal processing blocks that dissipate a large amount of power in a portable digital hearing aid device. The power consumption can be reduced by replacing the power-hungry multipliers of the filter with power-efficient approximate multipliers. This paper illustrates the application of an approximate multiplier for an error-tolerant hearing aid application. A frequency response masking approach is used for the development of a 10-band non-uniform approximate FIR filter bank with a minimum stop band attenuation of greater than 50 dB. Audiogram matching is done with audiograms of different types of moderate hearing loss and the matching error is computed. Simulation results show that the audiogram matching error falls within the ±5 dB range. [ABSTRACT FROM AUTHOR]
- Published
- 2021
38. Improving Power of DSP and CNN Hardware Accelerators Using Approximate Floating-point Multipliers.
- Author
-
LEON, VASILEIOS, PAPAROUNI, THEODORA, PETRONGONAS, EVANGELOS, SOUDRIS, DIMITRIOS, and PEKMESTZI, KIAMAL
- Subjects
CONVOLUTIONAL neural networks ,IMAGE processing ,HARDWARE ,PARETO analysis ,MULTIPLICATION ,LOGIC ,QUALITY of service ,MULTIPLIERS (Mathematical analysis) - Abstract
Approximate computing has emerged as a promising design alternative for delivering power-efficient systems and circuits by exploiting the inherent error resiliency of numerous applications. The current article aims to tackle the increased hardware cost of floating-point multiplication units, which prohibits their usage in embedded computing. We introduce AFMU (Approximate Floating-point MUltiplier), an area/power-efficient family of multipliers, which apply two approximation techniques in the resource-hungry mantissa multiplication and can be seamlessly extended to support dynamic configuration of the approximation levels via gating signals. AFMU offers large accuracy configuration margins, provides negligible logic overhead for dynamic configuration, and detects unexpected results that may arise due to the approximations. Our evaluation shows that AFMU delivers energy gains in the range 3.6%-53.5% for half-precision and 37.2%-82.4% for single-precision, in exchange for mean relative error around 0.05%-3.33% and 0.01%-2.20%, respectively. In comparison with state-of-the-art multipliers, AFMU exhibits up to 4-6× smaller error on average while delivering more energy-efficient computing. The evaluation in image processing shows that AFMU provides sufficient quality of service, i.e., more than 50 dB PSNR and near 1 SSIM values, and up to 57.4% power reduction. When used in floating-point CNNs, the accuracy loss is small (or zero), i.e., up to 5.4% for MNIST and CIFAR-10, in exchange for up to 63.8% power gain. [ABSTRACT FROM AUTHOR]
- Published
- 2021
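Image-level quality figures such as the PSNR quoted in this abstract follow the standard definition PSNR = 10·log10(peak² / MSE). The sketch below shows that calculation for two equally sized grayscale images stored as nested lists. The function name `psnr` and the default `peak=255` (8-bit pixels) are assumptions made for illustration and say nothing about the paper's actual evaluation setup.

```python
import math


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D pixel lists."""
    squared_error = 0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            count += 1
    if squared_error == 0:
        return math.inf                     # identical images
    mse = squared_error / count
    return 10 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    exact_img = [[10, 20], [30, 40]]
    approx_img = [[11, 21], [31, 41]]       # every pixel off by one
    print(psnr(exact_img, approx_img))
```

In an approximate-multiplier study, `reference` would typically hold the output of a kernel computed with exact multiplication and `test` the output of the same kernel computed with the approximate unit.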
39. An efficient approximate multiplier: Design, error analysis and application.
- Author
-
Zakian, Pegah and Niaraki Asli, Rahebeh
- Abstract
Approximate circuits are extensively considered in error-resilient applications to reduce the design complexity. A multiplier is known as a key arithmetic unit of a computational block, and thus improving the performance of the multiplier is significantly important to achieve an effective processor. In this paper, we develop two new inaccurate multipliers using an approximate 4:2 compressor. Applying an error correction block in the structure of these 8-bit multipliers remarkably increases the accuracy of the design. The design performance and error metrics of the multipliers are analyzed. Furthermore, as an image processing application, we consider several benchmark images, and compare the quality metrics of image multiplication of our proposed multipliers to those obtained from different under-test multipliers of the previous studies. The design of multipliers is performed using HSPICE with 14 nm FinFET technology. The results demonstrate that the proposed multipliers provide suitable efficiency in the error analysis when compared to the exact multiplier in image processing application. [ABSTRACT FROM AUTHOR]
- Published
- 2024
40. Machine-Learning-Based Self-Tunable Design of Approximate Computing.
- Author
-
Masadeh, Mahmoud, Hasan, Osman, and Tahar, Sofiene
- Subjects
IMAGE processing ,DESIGN techniques ,ENERGY consumption - Abstract
Approximate computing (AC) is an emerging computing paradigm suitable for intrinsically error-tolerant applications to reduce energy consumption and execution time. Different approximate techniques and designs, at both hardware and software levels, have been proposed and have demonstrated the effectiveness of relaxing the average output quality constraint. However, the output quality of AC is highly input-dependent, i.e., for some input data, the output errors may reach unacceptable levels. Therefore, there is a dire need for an input-dependent tunable approximate design. With this motivation, in this article, we propose a lightweight and efficient machine-learning-based approach to build an input-aware design selector, i.e., quality controller, to adapt the approximate design in order to meet the target output quality (TOQ). For illustration purposes, we use a library of 8-bit and 16-bit energy-efficient approximate array multipliers with 20 different settings, which are commonly used in image and audio processing applications. The simulation results, based on two sets of images, including the 8 Scene Categories Dataset, which is a benchmark image data set, demonstrate the effectiveness of the lightweight selector, where the proposed tunable design achieves a significant reduction in quality loss with relatively low overhead. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
41. FPGA Implementation of Error Reduction in Energy-Efficient Truncation and Rounding-Based Scalable Approximate Multiplier.
- Author
-
Vijayan, Sreelakshmi and Rashida, K.
- Abstract
An approximation approach for a scalable approximate multiplier using a truncation- and rounding-based technique is presented to reduce the number of partial products based on the leading-one bit position. The multiplier is built from an arithmetic unit, a truncation unit, an absolute unit, and a shift unit for shift-and-add accumulation. The operation of TOSAM(3,7) contains more absolute error. This design methodology modifies all the arithmetic operations of the shift-and-add unit to reduce the absolute error. The design shows improvements in area and energy consumption. Finally, the work is designed in Verilog HDL and simulated and synthesized in Xilinx ISE. [ABSTRACT FROM AUTHOR]
- Published
- 2021
42. Truncated SIMD Multiplier Architecture for Approximate Computing in Low-Power Programmable Processors
- Author
-
Roberto R. Osorio and Gabriel Rodriguez
- Subjects
Digital arithmetic, fixed-point arithmetic, approximate computing, approximate multiplier, low power, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Approximate computing has been exploited for many years in application-specific architectures. Recently, it has also been proposed for low-power programmable processors. However, this poses some challenges as, in a microprocessor, the energy consumed by fetching and decoding an instruction may be significantly higher than that of the execution itself. Therefore, approximate computing would be advisable only for those instructions in which the execution stage is significantly expensive in terms of energy consumption. In this paper, we present new architectures for truncated SIMD multipliers able to calculate signed and unsigned products from 8 × 8 to 64 × 64 bits. Next, we analyze the precision loss incurred by truncation for all product sizes. We implement accurate and truncated architectures for both scalar and SIMD products and find that truncation allows area savings of up to 27%. The proposed design is experimentally evaluated in different scenarios, showing potential energy savings ranging from 29% to 42%. Finally, this paper analyzes the overall convenience of introducing truncated SIMD architectures with respect to accurate SIMD and scalar architectures.
- Published
- 2019
- Full Text
- View/download PDF
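The truncation idea described in entry 42 can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the authors' SIMD architecture: operands are unsigned, and the function name `truncated_multiply` and the parameter `dropped_columns` (the number of least-significant partial-product columns discarded) are hypothetical.

```python
def truncated_multiply(a: int, b: int, width: int = 8, dropped_columns: int = 4) -> int:
    """Unsigned width x width product with the lowest partial-product columns discarded.

    Hypothetical model: every partial-product bit whose column index (i + j)
    is below `dropped_columns` is simply left out of the sum.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    total = 0
    for i in range(width):              # bit i of b selects the row a << i
        if not (b >> i) & 1:
            continue
        for j in range(width):          # bit j of a lands in column i + j
            if (a >> j) & 1 and i + j >= dropped_columns:
                total += 1 << (i + j)
    return total
```

With `dropped_columns=0` the model reduces to the exact product. For any other setting the result never exceeds the exact product, because bits are only removed, which is why truncated designs usually need a precision-loss analysis of the kind the abstract mentions.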
43. MACISH: Designing Approximate MAC Accelerators With Internal-Self-Healing
- Author
-
G. A. Gillani, M. A. Hanif, B. Verstoep, S. H. Gerez, M. Shafique, and A. B. J. Kokkeler
- Subjects
Approximate computing, approximate accelerators, approximate multiply-accumulate, approximate multiplier, internal-self-healing methodology, radio astronomy processing, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Approximate computing studies the quality-efficiency trade-off to attain a best-efficiency (e.g., area, latency, and power) design for a given quality constraint and vice versa. Recently, self-healing methodologies for approximate computing have emerged that showed an effective quality-efficiency trade-off as compared to the conventional error-restricted approximate computing methodologies. However, the state-of-the-art self-healing methodologies are constrained to highly parallel implementations with similar modules (or parts of a datapath) in multiples of two and for square-accumulate functions through the pairing of mirror versions to achieve error cancellation. In this paper, we propose a novel methodology for an internal-self-healing (ISH) that allows exploiting self-healing within a computing element internally without requiring a paired, parallel module, which extends the applicability to irregular/asymmetric datapaths while relieving the restriction of multiples of two for modules in a given datapath, as well as going beyond square functions. We employ our ISH methodology to design an approximate multiply-accumulate (xMAC), wherein the multiplier is regarded as an approximation stage and the accumulator as a healing stage. We propose to approximate a recursive multiplier in such a way that a near-to-zero average error is achieved for a given input distribution to cancel out the error at an accurate accumulation stage. To increase the efficacy of such a multiplier, we propose a novel 2 × 2 approximate multiplier design that alleviates the overflow problem within an n × n approximate recursive multiplier. The proposed ISH methodology shows a more effective quality-efficiency trade-off for an xMAC as compared with the conventional error-restricted methodologies for random inputs and for radio-astronomy calibration processing (up to 55% better quality output for equivalent-efficiency designs).
- Published
- 2019
- Full Text
- View/download PDF
44. Input-Conscious Approximate Multiply-Accumulate (MAC) Unit for Energy-Efficiency
- Author
-
Mahmoud Masadeh, Osman Hasan, and Sofiene Tahar
- Subjects
Approximate computing, approximate multiplier, approximate multiple-accumulate unit (AxMAC), input-aware approximation, image processing, FPGA, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
The Multiply-Accumulate Unit (MAC) is an integral computational component of all digital signal processing (DSP) architectures and thus has a significant impact on their speed and power dissipation. Due to an extraordinary explosion in the number of battery-powered “Internet of Things” (IoT) devices, the need for reducing the power consumption of DSP architectures has tremendously increased. Approximate computing (AxC) has been proposed as a potential solution for this problem targeting error-resilient applications. In this paper, we present a novel FPGA implementation for input-aware energy-efficient 8-bit approximate MAC (AxMAC) unit that reduces its power consumption by: performing multiplication operation approximately, or approximating the input operands then replacing multiplication by a simple shift operation. We propose an input-aware conditional block to bypass operands multiplication by (1) zero forwarding for zero-value operands, (2) judiciously approximating 43.8% of inputs into power-of-2 values, and (3) replacing the multiplication of power-of-2 operands by a simple shift operation. Experimental results show that these simplification techniques reduce delay, power and energy consumption with an acceptable quality degradation. We evaluate the effectiveness of the proposed AxMAC units on two image processing applications, i.e., image blending and filtering, and a logistic regression classification application. These applications demonstrate a negligible quality loss, with 66.6% energy reduction and 5% area overhead.
- Published
- 2019
- Full Text
- View/download PDF
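The three bypass steps listed in entry 44 (zero forwarding, approximating an operand to a power of two, and replacing the multiplication with a shift) can be sketched in software as follows. This is an illustrative model only: the abstract does not state how operands are selected for approximation, so the `tolerance` threshold and the function names are assumptions, and operands are taken to be unsigned integers.

```python
def nearest_power_of_two(x: int) -> int:
    """Return the power of two closest to x (x > 0); ties round up."""
    lower = 1 << (x.bit_length() - 1)
    upper = lower << 1
    return lower if (x - lower) < (upper - x) else upper


def input_aware_mac(acc: int, a: int, b: int, tolerance: float = 0.1) -> int:
    """Accumulate a * b, bypassing the multiplier when an operand allows it.

    `tolerance` is a hypothetical relative-distance threshold, not a value
    taken from the paper.
    """
    if a == 0 or b == 0:                     # (1) zero forwarding
        return acc
    for x, y in ((a, b), (b, a)):
        p = nearest_power_of_two(x)
        if abs(x - p) <= tolerance * x:      # (2) x is close enough to 2**k
            shift = p.bit_length() - 1       # k such that p == 2**k
            return acc + (y << shift)        # (3) shift replaces the multiply
    return acc + a * b                       # otherwise use a full multiplication
```

For example, `input_aware_mac(0, 31, 5)` treats 31 as 32 and returns `5 << 5`, trading a small error against the exact value of 155 for a cheaper operation.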
45. LPQ-SAM: A Low-Power Quality Scalable Approximate Multiplier.
- Author
-
Iqbal, Sumbal, Hasan, Osman, Hafiz, Rehan, and Khan, Zeshan Aslam
- Subjects
IMAGE processing
- Abstract
Approximate computing allows compromising accuracy to attain energy- and performance-efficient designs. However, the accuracy requirements of many applications change at runtime, and it has often been observed that traditional approximate hardware tends either to provide unacceptable results or to lead to unnecessary computational effort. Quality scalable configurations can overcome these limitations. With the same motivation, we propose a low-power quality scalable approximate multiplier (LPQ-SAM) in this paper. This low-power multiplier has various accuracy reconfigurable modes, including an accurate one, and thus it can be used for both error-resilient and exact applications. LPQ-SAM is exhaustively tested for different error metrics, and it has been observed that in the approximate mode, it provides up to 19% and 55% power reduction compared to the exact Booth and Wallace multipliers, respectively. For illustration purposes, we demonstrated the effectiveness of LPQ-SAM on a real-time application, i.e., image masking. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
46. SIBAM—Sign Inclusive Broken Array Multiplier Design for Error Tolerant Applications.
- Author
-
Sinha Roy, Avishek and Dhar, Anindya Sundar
- Abstract
Approximate computing has emerged as a promising technique to develop energy-efficient design solutions for error-tolerant applications. Many research efforts have been directed towards proposing approximations in power-hungry multiplier circuits. In this brief, we introduce two variants of a broken array approximate Booth multiplier design called SIBAM with partial error correction through discarded sign bit addition. Experimental results show that up to 63% energy savings can be achieved compared to accurate multipliers with the Mean Relative Error Distance (MRED) constrained at 1.5%. Moreover, extensive analysis shows that the proposed design outperforms state-of-the-art multipliers, with energy savings of up to 24% at a 0.3% MRED constraint. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
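Several entries in this list, including entry 46, report accuracy as Mean Relative Error Distance (MRED). As a hedged illustration of how that metric is commonly computed, the sketch below averages |approximate - exact| / exact over all operand pairs of a given width. The function name and the choice to skip zero operands (where the exact product is zero and the ratio is undefined) are assumptions of this sketch, not details from the paper.

```python
from typing import Callable


def mean_relative_error_distance(
    approx_mul: Callable[[int, int], int], width: int = 8
) -> float:
    """Exhaustive MRED of an unsigned multiplier model over nonzero operands."""
    total = 0.0
    count = 0
    for a in range(1, 1 << width):
        for b in range(1, 1 << width):
            exact = a * b
            total += abs(approx_mul(a, b) - exact) / exact
            count += 1
    return total / count


# Usage: an exact multiplier has zero MRED; a toy model that clears the four
# low bits of the product (an assumption for illustration) has a positive MRED.
exact_mred = mean_relative_error_distance(lambda a, b: a * b)
toy_mred = mean_relative_error_distance(lambda a, b: (a * b) & ~0xF)
```

A constraint such as "MRED at 1.5%" then corresponds to requiring the returned value to be at most 0.015 for the design under test.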
47. Design and evaluation of low power and area efficient approximate Booth multipliers for error tolerant applications.
- Author
-
Gundavarapu, Vishal, Gowtham, P., Anita Angeline, A., and Sasipriya, P.
- Subjects
MULTIPLICATION, DESIGN
- Abstract
Approximate computing is an innovative design methodology to reduce the design complexity with an improvement in power efficiency, performance and area by compromising on the requirement of accuracy. In this paper, 8-bit approximate Booth multipliers are proposed based on the approximate Radix-4 modified Booth encoding algorithm and approximate compressors for partial product accumulation to produce the final products. Two approximate Probability Based Booth Encoders (PBBE-1 and PBBE-2) have been proposed and used in the Booth multipliers. Error parameters have been measured and compared with the existing approximate Booth multipliers. An exact Booth multiplier of a novel design existing in the literature has also been implemented for comparison. The proposed approximate multipliers are then used in applications like image multiplication and IIR bi-quad filtering to prove their performance. Simulation results show that the proposed Booth multipliers outperform the existing approximate Booth multipliers in terms of power and area with better accuracy. Synthesis results show that the proposed Multiplier 6 is the most efficient, with a 56% power consumption improvement and a 47% area improvement when compared to the exact multiplier. All the simulations are carried out using Cadence® Genus with 180 nm CMOS process technology. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
48. Energy efficient approximate multipliers compatible with error-tolerant application.
- Author
-
Minaeifar, Atefeh, Abiri, Ebrahim, Hassanli, Kourosh, Karamimanesh, Mehrzad, and Ahmadi, Farshid
- Abstract
Multipliers are one of the most commonly used parts in a system, responsible for performing computations, while significantly contributing to power consumption. In this article, by removing the least significant bits, a new architecture is presented to implement 3 multipliers (Mul-1, Mul-2 and Mul-3), in order to reduce complexity and power consumption. Compared to the previous works, Mul-1 has the most accuracy in addition to its low energy consumption, therefore, it has been able to deliver a good trade-off between accuracy and energy consumption. All proposed designs and existing multipliers have been simulated and compared in 7 nm FinFET technology using Hspice tool. Moreover, the accuracy and quality of the proposed approximate multipliers are also evaluated using MATLAB. The results show that Mul-1 and Mul-3 are very efficient in image processing applications. According to the results, Mul-1 outperforms its counterpart by 10%, 50% and 50% in PDP, NMED and MRED, respectively. Furthermore, Mul-3 has satisfactory MSSIM in DSP applications and is better than its counterpart by 23% and 16% in PDP and MRED. Meanwhile, Mul-2 improves PDP by nearly 53% compared to Mul-1 and has the lowest power consumption.
• To reduce power consumption, some least significant bits are set to zero.
• To ensure accuracy in multipliers, compressors have been employed as exact.
• The error compensation circuit is presented to enhance precision in the second stage.
• The use of multipliers in real applications that have high computational volume. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
49. SquASH: Approximate Square-Accumulate With Self-Healing
- Author
-
G. A. Gillani, Muhammad Abdullah Hanif, M. Krone, S. H. Gerez, Muhammad Shafique, and A. B. J. Kokkeler
- Subjects
Approximate computing, approximate multiplier, approximate squarer, multiply-accumulate, radio astronomy, self-healing, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Approximate computing strives to achieve the highest performance-, area-, and power-efficiency for a given quality constraint and vice versa. Conventional approximate design methodology restricts the introduction of errors to avoid a high loss in quality. However, this limits the computing efficiency and the number of pareto-optimal design alternatives for a quality-efficiency tradeoff. This paper presents a novel self-healing (SH) methodology for an approximate square-accumulate (SAC) architecture. SAC refers to a hardware architecture that computes the inner product of a vector with itself. SH exploits the algorithmic error resilience of the SAC structure to ensure an effective quality-efficiency tradeoff, wherein the squarer is regarded as an approximation stage, and the accumulator as a healing stage. We propose to deploy an approximate squarer mirror pair, such that the error introduced by one approximate squarer mirrors the error introduced by the other, i.e., the errors generated by the approximate squarers are approximately the additive inverse of each other. This helps the healing stage (accumulator) to automatically average out the error originated in the approximation stage, and thereby to minimize the quality loss. For random input vectors, SH demonstrates up to 25% and 18.6% better area and power efficiency, respectively, with a better quality output than the conventional approximate computing methodology. As a case study, SH is applied to one of the computationally expensive components (SAC) of the radio astronomy calibration application, where it shows up to 46.7% better quality for equivalent computing efficiency as that of conventional methodology.
- Published
- 2018
- Full Text
- View/download PDF
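The mirror-pair idea in entry 49, where one approximate squarer's error is roughly the additive inverse of the other's so that the accumulator averages the error out, can be demonstrated with a behavioral toy model. Everything below is an assumption made for illustration: `squarer_low` simply clears the low bits of the exact square, and `squarer_high` is defined by reflecting that error around the exact value. Neither is the hardware design from the paper.

```python
def squarer_low(x: int, k: int = 4) -> int:
    """Toy approximate squarer that underestimates: clears the k low bits of x*x."""
    return (x * x) >> k << k


def squarer_high(x: int, k: int = 4) -> int:
    """Toy mirror squarer: overestimates by exactly what squarer_low underestimates."""
    exact = x * x
    return exact + (exact - squarer_low(x, k))


def square_accumulate(values, mirrored: bool = True) -> int:
    """Sum of approximate squares; with mirrored=True, odd positions use the mirror."""
    total = 0
    for index, x in enumerate(values):
        use_high = mirrored and index % 2 == 1
        total += squarer_high(x) if use_high else squarer_low(x)
    return total
```

In this model, `square_accumulate([3, 3])` returns 18, the exact value, because the two errors cancel in the accumulator, whereas `square_accumulate([3, 3], mirrored=False)` returns 0. With real data the cancellation is only partial, which matches the abstract's wording that the errors are "approximately" additive inverses.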
50. Impact of Approximate Multipliers on VGG Deep Learning Network
- Author
-
Issam Hammad and Kamal El-Sankary
- Subjects
AI accelerator, approximate computing, approximate multiplier, CNN, deep convolutional network, deep learning, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
This paper presents a study on the applicability of using approximate multipliers to enhance the performance of the VGGNet deep learning network. Approximate multipliers are known to have reduced power, area, and delay at the cost of an inaccuracy in output. Improving the performance of the VGGNet in terms of power, area, and speed can be achieved by replacing exact multipliers with approximate multipliers as demonstrated in this paper. The simulation results show that approximate multiplication has very little impact on the accuracy of VGGNet. However, using approximate multipliers can achieve significant performance gains. The simulation was completed using different generated error matrices that mimic the inaccuracy that approximate multipliers introduce to the data. The impact of various ranges of the mean relative error and the standard deviation was tested. The well-known data sets CIFAR-10 and CIFAR-100 were used for testing the network's classification accuracy. The impact on the accuracy was assessed by simulating approximate multiplication in all the layers in the first set of tests, and in selective layers in the second set of tests. Using approximate multipliers in all the layers leads to very little impact on the network's accuracy. In addition, an alternative approach is to use a hybrid of exact and approximate multipliers. In the hybrid approach, 39.14% of the deeper layer's multiplications can be approximate while having a negligible impact on the network's accuracy.
- Published
- 2018
- Full Text
- View/download PDF
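Entry 50 simulates approximate multipliers by applying generated error matrices characterized by a mean relative error and a standard deviation. A minimal sketch of that style of error injection is shown below. It assumes a Gaussian distribution for the signed relative error, which the abstract does not specify, and the function names and default values are hypothetical.

```python
import random


def noisy_product(a: float, b: float, mean_rel_err: float,
                  std_rel_err: float, rng: random.Random) -> float:
    """Exact product scaled by (1 + e), with e drawn from an assumed Gaussian."""
    rel_err = rng.gauss(mean_rel_err, std_rel_err)
    return a * b * (1.0 + rel_err)


def approx_dot(weights, activations, mean_rel_err: float = 0.0,
               std_rel_err: float = 0.02, seed: int = 0) -> float:
    """Dot product in which every multiplication carries an injected relative error."""
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    return sum(
        noisy_product(w, x, mean_rel_err, std_rel_err, rng)
        for w, x in zip(weights, activations)
    )
```

Sweeping `mean_rel_err` and `std_rel_err` over a grid and re-evaluating classification accuracy at each point would mirror, in a simplified form, the kind of study the abstract describes.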