Search Results
49 results for "IEEE 754"
2. Floating-Point Systems
- Author
-
LaMeres, Brock J.
- Published
- 2024
- Full Text
- View/download PDF
3. Efficient ASIC Implementation of Artificial Neural Network with Posit Representation of Floating-Point Numbers
- Author
-
Gupta, Abheek, Gupta, Anu, and Gupta, Rajiv (in a volume edited by Bansal, Hari Om, Ajmera, Pawan K., Joshi, Sandeep, Bansal, Ramesh C., and Shekhar, Chandra)
- Published
- 2023
- Full Text
- View/download PDF
4. FPGA Based Efficient IEEE 754 Floating Point Multiplier for Filter Operations
- Author
-
Selvi, C. Thirumarai, Amudha, J., and Sankarasubramanian, R. S. (in a volume edited by Arunachalam, V. and Sivasankaran, K.)
- Published
- 2021
- Full Text
- View/download PDF
5. Validation of a Formal Floating-Point Model for the Interactive Proof Assistant Isabelle/HOL
- Author
-
Lindström, Olof
- Abstract
This thesis aims to validate the formal floating-point model implemented in the Higher-Order Logic (HOL) proof assistant Isabelle, according to the IEEE 754 Standard. By integrating a testing environment with the proof assistant, the generation and processing of a large quantity of test vectors is made possible, and the resulting empirical data can be collected and analyzed. As a result of previous research, a substantial amount of work has already been put into the construction of a testing framework tailored specifically for Isabelle’s formal floating-point model. Therefore, the contribution of this thesis is mainly to utilize the framework for conducting the testing; however, certain additions and modifications to its components are also made. This includes adding support for testing comparison operations, as well as making the two floating-point formats half-precision (16-bit) and quadruple-precision (128-bit) available for testing. Furthermore, the framework is extended to allow for infinite deterministic testing of all combinations of formats, operations, and rounding modes that are implemented. A total of 116 combinations are tested simultaneously, and the results can be monitored in real time through a command line tool. The evaluation finds that all the properties of the formal model subject to testing can be considered validated. This conclusion is based on the empirical evidence pertaining to approximately 850 million processed test vectors, among which not a single one failed.
- Published
- 2024
6. Analysis of Posit and Bfloat Arithmetic of Real Numbers for Machine Learning
- Author
-
Aleksandr Yu. Romanov, Alexander L. Stempkovsky, Ilia V. Lariushkin, Georgy E. Novoselov, Roman A. Solovyev, Vladimir A. Starykh, Irina I. Romanova, Dmitry V. Telpukhov, and Ilya A. Mkrtchan
- Subjects
Machine learning, floating point, posit, IEEE 754, benchmark, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Modern computational tasks must not only guarantee a predefined accuracy but also produce results quickly. Optimizing calculations that use floating point numbers, as opposed to integers, is a non-trivial task, so there is a need to explore new ways to improve such operations. This paper presents an analysis and comparison of various floating point formats: float, posit, and bfloat. Neural networks are among the areas where the choice of number format is most acute, which is why we pay special attention to algorithms of linear algebra and artificial intelligence when assessing the efficiency of the new data types in this area. The results show that software implementations of posit16 and posit32 have high accuracy but are not particularly fast; on the other hand, bfloat16 differs little from float32 in accuracy but significantly surpasses it in performance for large amounts of data and complex machine learning algorithms. Thus, posit16 can be used in systems with less stringent performance requirements, in conditions of limited computer memory, and in cases where bfloat16 cannot provide the required accuracy. As for bfloat16, it can speed up systems based on the IEEE 754 standard, but it cannot solve all the problems of conventional floating point arithmetic. Although posits and bfloats are not a full-fledged replacement for float, they provide, under certain conditions, advantages that can be useful for implementing machine learning algorithms.
- Published
- 2021
- Full Text
- View/download PDF
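The float32/bfloat16 trade-off discussed in this record is easy to see in code. The Python sketch below (an illustration, not the authors' benchmark code) converts a float32 bit pattern to bfloat16 by truncation; real converters typically round to nearest even instead:

```python
import struct

def float32_to_bfloat16_trunc(x: float) -> float:
    """Truncate a float32 to bfloat16 by zeroing the low 16 bits.

    bfloat16 keeps float32's 8 exponent bits but only 7 of its
    23 significand bits, so truncation is a single mask.
    (Hardware usually rounds to nearest even rather than truncating.)
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

print(float32_to_bfloat16_trunc(3.14159))   # 3.140625: only ~3 decimal digits survive
```

The identical exponent range is what makes bfloat16 a drop-in for float32 in training loops: overflow behavior is unchanged, only precision is reduced.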
7. Stochastic rounding: implementation, error analysis and applications
- Author
-
Matteo Croci, Massimiliano Fasi, Nicholas J. Higham, Theo Mary, and Mantas Mikaitis
- Subjects
floating-point arithmetic, rounding error analysis, IEEE 754, binary16, bfloat16, machine learning, Science
- Abstract
Stochastic rounding (SR) randomly maps a real number x to one of the two nearest values in a finite precision number system. The probability of choosing either of these two numbers is 1 minus their relative distance to x. This rounding mode was first proposed for use in computer arithmetic in the 1950s and it is currently experiencing a resurgence of interest. If used to compute the inner product of two vectors of length n in floating-point arithmetic, it yields an error bound with constant [Formula: see text] with high probability, where u is the unit round-off. This is not necessarily the case for round to nearest (RN), for which the worst-case error bound has constant nu. A particular attraction of SR is that, unlike RN, it is immune to the phenomenon of stagnation, whereby a sequence of tiny updates to a relatively large quantity is lost. We survey SR by discussing its mathematical properties and probabilistic error analysis, its implementation, and its use in applications, with a focus on machine learning and the numerical solution of differential equations.
- Published
- 2022
- Full Text
- View/download PDF
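The rounding rule described in this abstract (round to a neighbour with probability one minus its distance) can be sketched on the integer grid in a few lines of Python; this illustrates the idea only, not the survey's floating-point implementation:

```python
import random

def stochastic_round(x: float) -> int:
    """Round x to floor(x) or ceil(x); the probability of each
    neighbour is one minus its distance to x, so E[result] = x."""
    lo = int(x // 1)           # floor of x
    frac = x - lo              # distance to the lower neighbour
    return lo + (1 if random.random() < frac else 0)

random.seed(0)
samples = [stochastic_round(2.3) for _ in range(100_000)]
print(sum(samples) / len(samples))   # close to 2.3: unbiased on average
```

The unbiasedness shown here is exactly why stochastic rounding avoids the stagnation phenomenon the abstract mentions: tiny updates survive in expectation instead of always rounding away.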
8. Algorithm 1014: An Improved Algorithm for hypot(x,y).
- Author
-
Borges, Carlos F.
- Subjects
- ALGORITHMS, LIBRARIES
- Abstract
We develop fast and accurate algorithms for evaluating √(x² + y²) for two floating-point numbers x and y. Library functions that perform this computation are generally named hypot(x,y). We compare five approaches that we will develop in this article to the current resident library function that is delivered with Julia 1.1 and to the code that has been distributed with the C math library for decades. We will investigate the accuracy of our algorithms by simulation. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
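The pitfall that makes hypot(x,y) nontrivial is that the textbook formula overflows long before the true result does. The classic scaling workaround below is a hedged Python sketch of that baseline problem and fix; it is not Borges's Algorithm 1014:

```python
import math

def naive_hypot(x: float, y: float) -> float:
    return math.sqrt(x * x + y * y)   # x*x overflows for large |x|

def scaled_hypot(x: float, y: float) -> float:
    """Classic scaling guard: factor out the larger magnitude so the
    squares stay in range. (Not Algorithm 1014, which goes further
    to improve the accuracy of the final rounding.)"""
    a, b = abs(x), abs(y)
    if a < b:
        a, b = b, a
    if a == 0.0:
        return 0.0
    return a * math.sqrt(1.0 + (b / a) ** 2)

big = 1e200
print(naive_hypot(big, big))    # inf: x*x overflowed
print(scaled_hypot(big, big))   # about 1.414e200, the correct magnitude
```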
9. Optimal Architecture of Floating-Point Arithmetic for Neural Network Training Processors
- Author
-
Muhammad Junaid, Saad Arslan, TaeGeon Lee, and HyungWon Kim
- Subjects
floating-points, IEEE 754, convolutional neural network (CNN), MNIST dataset, Chemical technology, TP1-1185
- Abstract
The convergence of artificial intelligence (AI) is one of the critical technologies of the recent fourth industrial revolution. The AIoT (Artificial Intelligence Internet of Things) is expected to be a solution that aids rapid and secure data processing. While the success of AIoT demands low-power neural network processors, most recent research has focused on accelerator designs only for inference. The growing interest in self-supervised and semi-supervised learning now calls for processors that offload the training process in addition to inference. Incorporating training with high accuracy goals requires the use of floating-point operators. Higher-precision floating-point arithmetic architectures in neural networks tend to consume a large area and much energy, so an energy-efficient, compact accelerator is required. The proposed architecture incorporates training in 32-bit, 24-bit, 16-bit, and mixed precisions to find the optimal floating-point format for low-power, small edge devices. The proposed accelerator engines have been verified on an FPGA for both inference and training on the MNIST image dataset. The combination of a 24-bit custom FP format with 16-bit Brain FP achieved an accuracy of more than 93%. ASIC implementation of this optimized mixed-precision accelerator using TSMC 65 nm reveals an active area of 1.036 × 1.036 mm² and energy consumption of 4.445 µJ per training of one image. Compared with the 32-bit architecture, the size and the energy are reduced by 4.7 and 3.91 times, respectively. Therefore, a CNN structure using floating-point numbers with an optimized data path will contribute significantly to developing the AIoT field, which requires a small area, low energy, and high accuracy.
- Published
- 2022
- Full Text
- View/download PDF
10. Towards a correctly-rounded and fast power function in binary64 arithmetic
- Author
-
Hubrecht, Tom, Jeannerod, Claude-Pierre, and Zimmermann, Paul (DI-ENS, École normale supérieure - Paris; ARIC, LIP, ENS de Lyon; CARAMBA, Inria Nancy - Grand Est / LORIA)
- Subjects
efficiency, IEEE 754, double precision, power function, Computer Science [cs], correct rounding, binary64 format
- Abstract
This is the extended version of an article published in the proceedings of ARITH 2023. We design algorithms for the correct rounding of the power function x^y in the binary64 IEEE 754 format, for all rounding modes, modulo the knowledge of hardest-to-round cases. Our implementation of these algorithms largely outperforms previous correctly-rounded implementations and is not far from the efficiency of current mathematical libraries, which are not correctly rounded. Still, we expect our algorithms can be further improved for speed. The proofs of correctness are fully detailed, with the goal of enabling a formal proof of these algorithms. We hope this work will motivate the next IEEE 754 revision committee to require correct rounding for mathematical functions.
- Published
- 2023
11. Improving Performance of Floating Point Division on GPU and MIC
- Author
-
Huang, Kun and Chen, Yifeng (in a volume edited by Wang, Guojun, Zomaya, Albert, Martinez, Gregorio, and Li, Kenli)
- Published
- 2015
- Full Text
- View/download PDF
12. RadixInsert, a much faster stable algorithm for sorting floating-point numbers.
- Author
-
Maus, Arne
- Subjects
COMPUTER science, ALGORITHMS, SMART cards, COMPUTER operating systems, INTEGERS
- Abstract
The problem addressed in this paper is that we want to sort an array a[] of n floating point numbers conforming to the IEEE 754 standard, both in the 64-bit double precision and the 32-bit single precision formats, on a multi-core computer with p real cores and shared memory (an ordinary PC). We do this by introducing a new stable sorting algorithm, RadixInsert, both in a sequential version and with two parallel implementations. RadixInsert is tested on two different machines, a 2-core laptop and a 4-core desktop, outperforming the non-stable Quicksort-based algorithms from the Java library -- both the sequential Arrays.sort() and the merge-based parallel Arrays.parallelSort() -- for 500
1.5). RadixInsert is in practice O(n), but as with Quicksort it might be possible to construct numbers for which RadixInsert degenerates to an O(n²) algorithm. However, this worst case was not found when sorting the seven quite different distributions reported in this paper. Finally, the extra memory used by RadixInsert, both in its sequential and parallel versions, is n plus some minor arrays, whereas the sequential Quicksort in the Java library needs essentially no extra memory. The merge-based Arrays.parallelSort() in the Java library, however, needs the same n extra memory as RadixInsert. [ABSTRACT FROM AUTHOR]
- Published
- 2019
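Radix-sorting IEEE 754 numbers relies on a well-known order-preserving mapping from float bit patterns to unsigned integers. The Python sketch below shows that mapping; it is an assumption of this summary that RadixInsert uses some variant of it, and the paper's own implementation is in Java:

```python
import struct

def float_sort_key(x: float) -> int:
    """Map a float64 to an unsigned integer whose natural order matches
    the float order. Negative floats (sign bit set): flip all 64 bits;
    non-negative floats: flip only the sign bit. After this transform a
    plain radix sort on the integer keys sorts the floats correctly."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    if bits & (1 << 63):
        return bits ^ 0xFFFFFFFFFFFFFFFF
    return bits | (1 << 63)

data = [3.5, -2.0, 0.0, -0.5, 7.25]
print(sorted(data, key=float_sort_key))   # [-2.0, -0.5, 0.0, 3.5, 7.25]
```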
13. IEEE 754 floating-point addition for neuromorphic architecture.
- Author
-
George, Arun M., Sharma, Rahul, and Rao, Shrisha
- Subjects
- BUILDING additions, FLOATING-point arithmetic
- Abstract
• IEEE-754 compliant floating-point addition system for neuromorphic architectures.
• Stage-wise computation for floating-point addition of two numbers.
• Encoding scheme proposed to reduce inter-ensemble error.
• Experiments performed to determine the most suitable value of radius.
• Estimated total number of neurons required to implement such a system.
Neuromorphic computing is regarded as one of the promising alternatives to the traditional von Neumann architecture. In this paper, we consider the problem of doing arithmetic on neuromorphic systems and propose an architecture for IEEE 754 compliant addition on a neuromorphic system. A novel encoding scheme is also proposed for reducing the inter-neural-ensemble error. The complex task of floating point addition is divided into sub-tasks such as exponent alignment, mantissa addition, and overflow-underflow handling. We use a cascaded approach to add the two mantissas of the given floating-point numbers and then apply our encoding scheme to reduce the error produced in this approach. Overflow and underflow are handled by approximating on XOR logic. Implementations of sub-components such as the right shifter and multiplexer are also specified. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
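The sub-tasks named in this abstract (exponent alignment, mantissa addition, renormalization) follow the textbook floating-point addition recipe, which can be sketched in Python with frexp/ldexp. This shows the conventional algorithm being decomposed, not the paper's neuromorphic encoding:

```python
import math

def fp_add_steps(x: float, y: float) -> float:
    """Toy decomposition of floating-point addition into the sub-tasks
    the paper names: decode, align exponents, add significands,
    renormalise. (A sketch of the textbook algorithm only.)"""
    mx, ex = math.frexp(x)            # x = mx * 2**ex, 0.5 <= |mx| < 1
    my, ey = math.frexp(y)
    if ex < ey:                       # make x the larger-exponent operand
        mx, ex, my, ey = my, ey, mx, ex
    my = math.ldexp(my, ey - ex)      # exponent alignment: shift smaller operand
    m = mx + my                       # significand addition
    return math.ldexp(m, ex)          # renormalisation, handled here by ldexp

print(fp_add_steps(1.5, 0.25))   # 1.75
```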
14. Le calcul sur ordinateur
- Author
-
Goualard, Frédéric and Jermann, Christophe (Laboratoire des Sciences du Numérique de Nantes (LS2N), Nantes Université)
- Subjects
floating point, IEEE 754, Computer Science [cs]/Computer Arithmetic, arithmetic, integers
- Abstract
Booklet accompanying Christophe Jermann's talk on computer-based calculation, presented at the 2023 academic day of the IREM des Pays de la Loire.
- Published
- 2023
15. Investigation of posits and IEEE-754 floating points : In hardware implementations of addition and multiplication operations
- Author
-
Kylväjä, Juho (Faculty of Information Technology and Communication Sciences, Tampere University)
- Subjects
UNUM, IEEE 754, Master's Programme in Electrical Engineering, posit, floating-point, arithmetic
- Abstract
This thesis investigates a relatively new alternative representation for floating-point arithmetic, the type-3 UNUM (posit), as a replacement for the widely used IEEE 754 floating-point standard. The main focus is on the arithmetic operations of addition and multiplication. First, a literature review of the posit and IEEE 754 floating-point formats, their special cases, overflow and underflow behavior, and rounding methods is conducted. Then the hardware implementation steps of posit and IEEE 754 addition and multiplication are shown. In addition, the tools used to analyze the chosen designs and the testbench flow designed for behavioral verification are described. Finally, the results are examined, followed by the conclusion. The thesis concludes that posits could replace the currently dominant IEEE 754 standard due to their better accuracy around one and better dynamic range with 8-, 16-, and 32-bit numbers. However, the synthesis results show that the FPU achieves better area, delay, and power scores than the posit designs chosen in this thesis. Furthermore, implementing compatible processors for posits would require substantial work and time. Overall, posits have great potential to replace the IEEE 754 standard, and it will be interesting to see how future studies on posits affect the future of floating-point arithmetic in hardware.
- Published
- 2023
16. Reliable Computing with GNU MPFR
- Author
-
Zimmermann, Paul (in a volume edited by Fukuda, Komei, Hoeven, Joris van der, Joswig, Michael, and Takayama, Nobuki)
- Published
- 2010
- Full Text
- View/download PDF
17. Software Implementation of the IEEE 754R Decimal Floating-Point Arithmetic
- Author
-
Cornea, Marius, Anderson, Cristina, and Tsen, Charles (in a volume edited by Filipe, Joaquim, Shishkov, Boris, and Helfert, Markus)
- Published
- 2008
- Full Text
- View/download PDF
18. The CORE-MATH Project
- Author
-
Alexei Sibidanov, Paul Zimmermann, and Stéphane Glondu (University of Victoria, Canada; CARAMBA, Inria Nancy - Grand Est / LORIA)
- Subjects
IEEE 754, efficiency, Computer Science [cs]/Data Structures and Algorithms [cs.DS], correct rounding
- Abstract
The CORE-MATH project aims at providing open-source mathematical functions with correct rounding that can be integrated into current mathematical libraries. This article demonstrates the CORE-MATH methodology on two functions: the binary32 power function (powf) and the binary64 cube root function (cbrt). CORE-MATH already provides a full set of correctly rounded C99 functions for single precision (binary32). These functions provide similar or, in some cases, up to threefold speedups with respect to the GNU libc mathematical library, which is not correctly rounded. This work offers a prospect of a mandatory requirement of correct rounding for mathematical functions in the next revision of the IEEE 754 standard.
- Published
- 2022
- Full Text
- View/download PDF
19. Approximate Computing for Low Power and Security in the Internet of Things.
- Author
-
Gao, Mingze, Wang, Qian, Arafin, Md Tanvir, Lyu, Yongqiang, and Qu, Gang
- Subjects
- INTERNET of things, COMPUTER networks, COMPUTER systems, MATHEMATICAL ability, DIGITAL watermarking
- Abstract
To save resources for Internet of Things (IoT) devices, a proposed approach segments operands and corresponding basic arithmetic operations that can be carried out by approximate function units for almost all applications. The approach also increases the security of IoT devices by hiding information for IP watermarking, digital fingerprinting, and lightweight encryption. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
21. Generating Random Floating-Point Numbers by Dividing Integers: A Case Study
- Author
-
Frédéric Goualard (Laboratoire des Sciences du Numérique de Nantes (LS2N), CNRS / IMT Atlantique / Université de Nantes)
- Subjects
Discrete mathematics, Floating point, Uniform distribution (continuous), Computer science, Computer Arithmetic, Floating-point number, Binary number, Division (mathematics), IEEE floating point, Article, Integer, IEEE 754, Error analysis, Point (geometry), Random number
- Abstract
A method widely used to obtain IEEE 754 binary floating-point numbers with a standard uniform distribution involves drawing an integer uniformly at random and dividing it by another, larger integer. We survey the various instances of this algorithm used in actual software and point out their properties and drawbacks, particularly from the standpoint of numerical software testing and data anonymization.
- Published
- 2020
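The divide-an-integer scheme this article surveys can be sketched as follows. Drawing 53 random bits and dividing by 2^53 is one common variant (an illustration of the general scheme, not one of the specific generators audited in the paper):

```python
import random

def uniform01() -> float:
    """One instance of the divide-an-integer scheme: draw an integer in
    [0, 2**53) and divide by 2**53. Every result is a multiple of 2**-53,
    so most representable floats in [0, 1) can never be produced -- one
    of the drawbacks this line of work points out."""
    return random.getrandbits(53) / (1 << 53)

random.seed(42)
x = uniform01()
print(x)   # a uniform draw on the 2**53-point grid in [0, 1)
```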
22. Area Efficient Floating Point Addition Unit With Error Detection Logic.
- Author
-
Aswani, T.S. and Premanand, B.
- Abstract
Applications that involve a large dynamic range make use of floating point operations. Addition is one of the most complex operations in a floating point unit. This paper proposes an area-efficient floating-point addition unit with error detection logic. Existing leading zero anticipators (LZA) and error detection logic help to reduce the delay of a general floating point unit, but are not area efficient. Here a single-precision, area-efficient floating point addition unit is designed using an efficient carry select adder together with error detection logic. The efficient carry select adder is developed using a Binary to Excess-1 Converter instead of a Ripple Carry Adder for cin=‘1’. The proposed design is simulated using ModelSim and tested on Xilinx. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
23. On the definition of unit roundoff.
- Author
-
Rump, Siegfried and Lange, Marko
- Subjects
- FLOATING-point arithmetic, ALGORITHMS, ARITHMETIC mean
- Abstract
The result of a floating-point operation is usually defined to be the floating-point number nearest to the exact real result, together with a tie-breaking rule. This is called the first standard model of floating-point arithmetic, and the analysis of numerical algorithms is often based solely on it. In addition, a second standard model is used, specifying the maximum relative error with respect to the computed result. In this note we take a more general perspective. For an arbitrary finite set of real numbers we identify the rounding that minimizes the relative error in the first or the second standard model. The optimal 'switching points' are the arithmetic or the harmonic means of adjacent floating-point numbers. Moreover, the maximum relative error of both models is minimized by taking the geometric mean. If the maximum relative error in one model is α, then α/(1−α) is the maximum relative error in the other model. These maximal errors, that is, the unit roundoff, are characteristic constants of a given finite set of reals: the floating-point model to be optimized identifies the rounding and the unit roundoff. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
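The three candidate 'switching points' named in this abstract (arithmetic, harmonic, and geometric means of adjacent floats) can be checked numerically. Exact rational arithmetic is used below so the demonstration itself is free of rounding; this is a sketch of the ordering of the means, not the paper's analysis:

```python
import math
from fractions import Fraction

a = 1.0
b = math.nextafter(a, 2.0)         # the adjacent float64 just above 1.0
A, B = Fraction(a), Fraction(b)    # exact rational values of the two floats
arith = (A + B) / 2                # optimal switching point, first model
harm = 2 * A * B / (A + B)         # optimal switching point, second model
# For distinct positive reals, harmonic < geometric < arithmetic mean,
# so the geometric mean (which minimizes the max error of both models)
# sits strictly between the two models' switching points:
print(harm ** 2 < A * B < arith ** 2)   # True
```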
24. Algorithms for Stochastically Rounded Elementary Arithmetic Operations in IEEE 754 Floating-Point Arithmetic
- Author
-
Fasi, Massimiliano and Mikaitis, Mantas
- Abstract
We present algorithms for performing the five elementary arithmetic operations (+, −, ×, ÷, and √) in floating-point arithmetic with stochastic rounding, and demonstrate the value of these algorithms by discussing various applications where stochastic rounding is beneficial. The algorithms require that the hardware be compliant with the IEEE 754 floating-point standard and that a floating-point pseudorandom number generator be available. The goal of these techniques is to emulate stochastic rounding when the underlying hardware does not support this rounding mode, as is the case for most existing CPUs and GPUs. By simulating stochastic rounding in software, one has the possibility to explore the behavior of this rounding mode and develop new algorithms even without having access to hardware implementing stochastic rounding -- once such hardware becomes available, it suffices to replace the proposed algorithms by calls to the corresponding hardware routines. When stochastically rounding double precision operations, the algorithms we propose are between 7.3 and 19 times faster than implementations that use the GNU MPFR library to simulate extended precision. We test our algorithms on various tasks, including summation algorithms and solvers for ordinary differential equations, where stochastic rounding is expected to bring advantages. Funding agencies: Royal Society of London; Istituto Nazionale di Alta Matematica, INdAM-GNCS Project 2020; UK Research & Innovation (UKRI), Engineering & Physical Sciences Research Council (EPSRC), EP/P020720/1.
- Published
- 2021
- Full Text
- View/download PDF
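One ingredient of the software emulation this abstract describes is recovering the exact rounding error of an operation with an error-free transformation. Below is the classic two-sum (due to Knuth), sketched in Python; it is a building block of this kind of algorithm, not the paper's full stochastic-rounding routine:

```python
def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's error-free transformation: returns s = fl(a + b) and the
    exact error e such that a + b = s + e in real arithmetic (barring
    overflow). Stochastic-rounding emulations use e to decide, at random,
    whether to nudge s to the neighbouring float."""
    s = a + b
    t = s - a
    e = (a - (s - t)) + (b - t)
    return s, e

s, e = two_sum(1.0, 2.0 ** -60)
print(s)   # 1.0: the rounded sum
print(e)   # 2**-60: the part that rounding discarded
```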
25. Algorithms for Stochastically Rounded Elementary Arithmetic Operations in IEEE 754 Floating-Point Arithmetic
- Author
-
Massimiliano Fasi and Mantas Mikaitis
- Subjects
Floating point ,numerical analysis ,Computer science ,Elementary arithmetic ,Double-precision floating-point format ,010103 numerical & computational mathematics ,02 engineering and technology ,01 natural sciences ,0202 electrical engineering, electronic engineering, information engineering ,Computer Science (miscellaneous) ,stochastic rounding ,0101 mathematics ,Arithmetic ,Pseudorandom number generator ,Rounding ,Numerical analysis ,Volume (computing) ,Floating-point arithmetic ,Extended precision ,error-free transformation ,IEEE floating point ,020202 computer hardware & architecture ,Computer Science Applications ,Human-Computer Interaction ,IEEE 754 ,Computer Science::Mathematical Software ,numerical algorithm ,Algorithm ,Information Systems - Abstract
We present algorithms for performing the five elementary arithmetic operations (+, −, ×, ÷, and √) in floating point arithmetic with stochastic rounding, and demonstrate the value of these algorithms by discussing various applications where stochastic rounding is beneficial. The algorithms require that the hardware be compliant with the IEEE 754 floating-point standard and that a floating-point pseudorandom number generator be available. The goal of these techniques is to emulate stochastic rounding when the underlying hardware does not support this rounding mode, as is the case for most existing CPUs and GPUs. By simulating stochastic rounding in software, one has the possibility to explore the behavior of this rounding mode and develop new algorithms even without having access to hardware implementing stochastic rounding; once such hardware becomes available, it suffices to replace the proposed algorithms by calls to the corresponding hardware routines. When stochastically rounding double precision operations, the algorithms we propose are between 7.3 and 19 times faster than the implementations that use the GNU MPFR library to simulate extended precision. We test our algorithms on various tasks, including summation algorithms and solvers for ordinary differential equations, where stochastic rounding is expected to bring advantages.
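As a rough illustration of the idea (not the paper's algorithms, which operate entirely within the target precision and use error-free transformations), stochastic rounding from binary64 down to binary32 can be sketched in software; the function name and structure below are mine:

```python
import numpy as np

def stochastic_round_to_f32(x: float, rng: np.random.Generator) -> np.float32:
    """Round a binary64 value to binary32 stochastically: round up with
    probability proportional to the distance from the lower neighbour."""
    down = np.float32(x)                     # round-to-nearest candidate
    if float(down) == x:
        return down                          # x is exactly representable
    # The other binary32 neighbour lies on the far side of x from `down`.
    if float(down) < x:
        up = np.nextafter(down, np.float32(np.inf))
    else:
        up = np.nextafter(down, np.float32(-np.inf))
    lo, hi = (down, up) if float(down) < float(up) else (up, down)
    p_up = (x - float(lo)) / (float(hi) - float(lo))
    return hi if rng.random() < p_up else lo
```

Averaged over many roundings, the result is unbiased: the expected value equals the exact input, which is the property that makes stochastic rounding attractive for summation and ODE solvers.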
- Published
- 2021
- Full Text
- View/download PDF
26. Simultaneous Floating-Point Sine and Cosine for VLIW Integer Processors.
- Author
-
Jeannerod, Claude-Pierre and Jourdan-Lu, Jingyan
- Abstract
Graphics and signal processing applications often require that sines and cosines be evaluated at the same floating-point argument, and in such cases a very fast computation of the pair of values is desirable. This paper studies how 32-bit VLIW integer architectures can be exploited in order to perform this task accurately for IEEE single precision (including subnormals). We describe software implementations for sinf, cosf, and sincosf over [-pi/4, pi/4] that have a proven 1-ulp accuracy and whose latency on STMicroelectronics' ST231 VLIW integer processor is 19, 18, and 19 cycles, respectively. Such performances are obtained by introducing a novel algorithm for simultaneous sine and cosine that combines univariate and bivariate polynomial evaluation schemes. [ABSTRACT FROM PUBLISHER]
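The basic shape of a simultaneous sine/cosine evaluation on [-pi/4, pi/4] can be sketched with polynomials in x²; note the paper uses carefully optimized (and partly bivariate) polynomial schemes with proven 1-ulp bounds, whereas the plain Taylor coefficients below are only a stand-in reaching roughly 1e-8 absolute error on this interval:

```python
# Truncated Taylor coefficients; a real implementation would use minimax
# polynomials tuned for the target precision.
SIN_C = [1.0, -1.0/6, 1.0/120, -1.0/5040, 1.0/362880]   # odd powers of x
COS_C = [1.0, -0.5, 1.0/24, -1.0/720, 1.0/40320]        # even powers of x

def sincos(x):
    """Evaluate sin(x) and cos(x) together for x in [-pi/4, pi/4].
    Both polynomials are evaluated in x*x (Horner), so the squaring and
    much of the work is shared between the two results."""
    x2 = x * x
    s = SIN_C[-1]
    for c in reversed(SIN_C[:-1]):
        s = s * x2 + c
    c = COS_C[-1]
    for k in reversed(COS_C[:-1]):
        c = c * x2 + k
    return s * x, c
```

Sharing the x² computation and evaluating the two Horner chains in parallel is exactly the kind of instruction-level parallelism a VLIW integer processor can exploit.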
- Published
- 2012
- Full Text
- View/download PDF
27. Interval arithmetic over finitely many endpoints.
- Author
-
Rump, Siegfried
- Subjects
- *
INTERVAL analysis , *ARITHMETIC , *PROOF theory , *MATHEMATICAL analysis , *TRANSCENDENTAL numbers , *FLOATING-point arithmetic - Abstract
To my knowledge all definitions of interval arithmetic start with real endpoints and prove properties. Then, for practical use, the definition is specialized to finitely many endpoints, where many of the mathematical properties are no longer valid. There seems to be no treatment of how to choose this finite set of endpoints so as to preserve as many mathematical properties as possible. Here we define interval endpoints directly using a finite set which, for example, may be based on the IEEE 754 floating-point standard. The corresponding interval operations emerge naturally from the corresponding power set operations. We present necessary and sufficient conditions on this finite set to ensure desirable mathematical properties, many of which are not satisfied by other definitions. For example, an interval product contains zero if and only if one of the factors does. The key feature of the theoretical foundation is that 'endpoints' of intervals are not points but non-overlapping closed, half-open or open intervals, each of which can be regarded as an atomic object. By using non-closed intervals among its 'endpoints', intervals containing 'arbitrarily large' and 'arbitrarily close to but not equal to' a real number can be handled. The latter may be zero defining 'tiny' numbers, but also any other quantity including transcendental numbers. Our scheme can be implemented straightforwardly using the IEEE 754 floating-point standard. [ABSTRACT FROM AUTHOR]
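The zero-containment property mentioned above can be checked on a toy closed-interval product; this sketch ignores the paper's atomic endpoints and the outward (directed) rounding a real implementation needs, so it only illustrates the property, not the construction:

```python
def interval_mul(a, b):
    """Naive product of two closed intervals (lo, hi) over floats.
    A faithful implementation would round the lower bound toward -inf
    and the upper bound toward +inf."""
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))

def contains_zero(iv):
    return iv[0] <= 0.0 <= iv[1]
```

For these exact (rounding-free) cases, the product interval contains zero precisely when one of the factors does.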
- Published
- 2012
- Full Text
- View/download PDF
28. An Improved Algorithm for hypot(A,B)
- Author
-
Borges, Carlos and Applied Mathematics
- Subjects
fused multiply-add ,IEEE 754 ,floating point ,hypot() - Abstract
We develop a fast and accurate algorithm for evaluating √(a² + b²) for two floating point numbers a and b. Library functions that perform this computation are generally named hypot(a,b). We will compare four approaches that we develop in this paper to the current resident library function that is delivered with Julia 1.1 and to the code that has been distributed with the C math library for decades. We will demonstrate the performance of our algorithms by simulation.
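The reason hypot() is a library function at all, rather than a one-liner, is that the textbook formula overflows and underflows spuriously. A minimal sketch of the classic scaling fix (not the paper's improved algorithms, which go further using fused multiply-add):

```python
import math

def naive_hypot(a: float, b: float) -> float:
    # Overflows to inf whenever a*a or b*b exceeds the double range,
    # even though the true result is representable.
    return math.sqrt(a * a + b * b)

def scaled_hypot(a: float, b: float) -> float:
    # Factor out the larger magnitude so the squared ratio is at most 1.
    a, b = abs(a), abs(b)
    if a < b:
        a, b = b, a
    if a == 0.0:
        return 0.0
    r = b / a
    return a * math.sqrt(1.0 + r * r)
```

For example, naive_hypot(1e200, 1e200) returns inf, while scaled_hypot returns a finite value near 1.414e200.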
- Published
- 2019
29. Optimal Architecture of Floating-Point Arithmetic for Neural Network Training Processors.
- Author
-
Junaid, Muhammad, Arslan, Saad, Lee, TaeGeon, and Kim, HyungWon
- Subjects
FLOATING-point arithmetic ,ARTIFICIAL intelligence ,INDUSTRY 4.0 ,ELECTRONIC data processing ,SUPERVISED learning ,INTERNET of things - Abstract
The convergence of artificial intelligence (AI) is one of the critical technologies in the recent fourth industrial revolution. The AIoT (Artificial Intelligence Internet of Things) is expected to be a solution that aids rapid and secure data processing. While the success of AIoT demanded low-power neural network processors, most of the recent research has been focused on accelerator designs only for inference. The growing interest in self-supervised and semi-supervised learning now calls for processors offloading the training process in addition to the inference process. Incorporating training with high accuracy goals requires the use of floating-point operators. The higher precision floating-point arithmetic architectures in neural networks tend to consume a large area and energy. Consequently, an energy-efficient/compact accelerator is required. The proposed architecture incorporates training in 32 bits, 24 bits, 16 bits, and mixed precisions to find the optimal floating-point format for low power and smaller-sized edge device. The proposed accelerator engines have been verified on FPGA for both inference and training of the MNIST image dataset. The combination of 24-bit custom FP format with 16-bit Brain FP has achieved an accuracy of more than 93%. ASIC implementation of this optimized mixed-precision accelerator using TSMC 65nm reveals an active area of 1.036 × 1.036 mm² and energy consumption of 4.445 µJ per training of one image. Compared with 32-bit architecture, the size and the energy are reduced by 4.7 and 3.91 times, respectively. Therefore, the CNN structure using floating-point numbers with an optimized data path will significantly contribute to developing the AIoT field that requires a small area, low energy, and high accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
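Reduced-precision formats like the 16-bit Brain FP used above can be emulated in software by manipulating the binary32 bit pattern; the sketch below truncates (round-toward-zero) for simplicity, whereas hardware typically rounds to nearest:

```python
import numpy as np

def to_bfloat16(x: np.ndarray) -> np.ndarray:
    """Emulate bfloat16 ("Brain FP") by zeroing the low 16 bits of each
    binary32 value: same 8-bit exponent, mantissa cut to 7 explicit bits."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)
```

Because bfloat16 keeps the full binary32 exponent, the emulation never changes the representable range, only the precision (relative error below 2⁻⁷ per truncation).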
30. Processor Design Using 32 Bit Single Precision Floating Point Unit
- Author
-
Mr. Anand S. Burud and Dr. Pradip C. Bhaskar
- Subjects
Floating point unit ,IEEE 754 ,Electronics & Communication Engineering ,Hardware_ARITHMETICANDLOGICSTRUCTURES - Abstract
Floating-point operations have found intensive application in various fields that require high-precision computation, owing to their great dynamic range, high accuracy, and simple operation rules. High accuracy is needed in the design and research of floating-point processing units. With the growing need for floating-point operations in high-speed digital signal processing and logical operations, the requirements on fast hardware floating-point arithmetic units have become increasingly demanding. The ALU is one of the most essential components in a processor, and is ordinarily the part of the processor that is designed first. In this paper, a fast IEEE 754 compliant 32-bit floating-point arithmetic unit designed using VHDL code is presented; all addition operations were tested on Xilinx and verified successfully. Mr. Anand S. Burud | Dr. Pradip C. Bhaskar "Processor Design Using 32 Bit Single Precision Floating Point Unit" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2 | Issue-4, June 2018, URL: https://www.ijtsrd.com/papers/ijtsrd12912.pdf
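The binary32 layout such a unit manipulates (1 sign bit, 8 exponent bits, 23 mantissa bits) can be inspected in a few lines; this is the standard IEEE 754 field split, shown here as a Python sketch rather than anything from the paper's VHDL:

```python
import struct

def decode_binary32(x: float):
    """Split a binary32 value into its (sign, biased exponent, mantissa)
    fields. The exponent bias is 127; normal values carry a hidden
    leading 1 bit not stored in the mantissa field."""
    (bits,) = struct.unpack('<I', struct.pack('<f', x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return sign, exponent, mantissa
```

For instance, 1.0 decodes to sign 0, exponent field 127 (unbiased 0), mantissa 0.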
- Published
- 2018
31. Area Efficient Floating Point Addition Unit With Error Detection Logic
- Author
-
B. Premanand and T.S. Aswani
- Subjects
Adder ,Floating point ,Computer science ,business.industry ,Binary number ,Floating-point unit ,Double-precision floating-point format ,02 engineering and technology ,LZA ,Single-precision floating-point format ,020202 computer hardware & architecture ,ModelSim ,IEEE 754 ,0202 electrical engineering, electronic engineering, information engineering ,General Earth and Planetary Sciences ,Carry-select adder ,VHDL ,Hardware_ARITHMETICANDLOGICSTRUCTURES ,business ,Xilinx ,FPGA ,Computer hardware ,General Environmental Science - Abstract
Applications that involve a large dynamic range make use of floating point operations. Addition is one of the complex operations in a floating point unit. This paper proposes an area efficient floating-point addition unit with error detection logic. Existing Leading Zero Anticipators (LZA) and error detection logic help to reduce the delay of a general floating point unit, but are not area efficient. Here a single precision area efficient floating point addition unit is designed using an efficient Carry Select Adder together with the error detection logic. The efficient Carry Select Adder is developed using a Binary to Excess-1 Converter instead of a Ripple Carry Adder for cin=‘1’. The proposed design is simulated using ModelSim and tested on Xilinx.
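The Binary-to-Excess-1 trick can be modeled in software: a conventional carry-select block computes each sub-sum twice (once per possible incoming carry) with two ripple adders, while the BEC variant derives the cin=1 result from the cin=0 result by simply adding one. A behavioural sketch (my own modeling, not the paper's RTL):

```python
def ripple_add(a: int, b: int, cin: int, width: int):
    """Behavioural model of a ripple-carry adder block: (sum, carry-out)."""
    s = a + b + cin
    return s & ((1 << width) - 1), (s >> width) & 1

def bec(x: int, width: int):
    """Binary-to-Excess-1 Converter: x + 1 with carry-out. Replaces the
    second ripple adder of a conventional carry-select block."""
    s = x + 1
    return s & ((1 << width) - 1), (s >> width) & 1

def carry_select_add(a: int, b: int, width: int = 16, block: int = 4):
    """Carry-select adder built from BEC blocks: (sum, final carry)."""
    result, carry = 0, 0
    mask = (1 << block) - 1
    for i in range(0, width, block):
        ab, bb = (a >> i) & mask, (b >> i) & mask
        s0, c0 = ripple_add(ab, bb, 0, block)  # sum assuming cin = 0
        s1, c1 = bec(s0, block)                # sum assuming cin = 1
        c1 |= c0                               # carry-out when cin = 1
        if carry:
            result |= s1 << i
            carry = c1
        else:
            result |= s0 << i
            carry = c0
    return result, carry
```

The area saving comes from the BEC needing far fewer gates than the full adder it replaces, at the cost of one extra increment in the selected path.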
- Published
- 2016
- Full Text
- View/download PDF
32. Enabling High Performance Posit Arithmetic Applications Using Hardware Acceleration
- Author
-
van Dam, Laurens (author) and van Dam, Laurens (author)
- Abstract
The demand for higher precision arithmetic is increasing due to the rapid development of new computing paradigms. The novel posit number representation system, as introduced by John L. Gustafson, claims to be able to provide more accurate answers to mathematical problems with an equal or smaller number of bits compared to the well-established IEEE 754 floating point standard. In this work, the performance of the posit number format in terms of decimal accuracy is analyzed and compared to alternative number representations. A framework for performing high-precision posit arithmetic in reconfigurable logic is presented. The supported arithmetic operations can be performed without rounding off intermediate results, minimizing the loss of decimal accuracy. The proposed posit arithmetic units achieve approximately 250 MPOPS for addition, 160 MPOPS for multiplication and 180 MPOPS for accumulation operations. A hardware accelerator for performing Level 1 BLAS operations on (sparse) posit column vectors is presented. For the calculation of the vector dot product for an input vector length of 10^6 elements, a speedup of approximately 15000x compared to software is achieved. The decimal accuracy is improved by one decimal of accuracy on average compared to posit emulation in software, and two additional decimals of accuracy are achieved compared to calculation using the IEEE 754 floating point format. A study of the application of posit arithmetic in the field of bioinformatics is performed. The effect on decimal accuracy of the pair-HMM forward algorithm by replacing traditional floating point arithmetic with posit arithmetic is analyzed. It is shown that the maximum achievable decimal accuracy using posit arithmetic is higher compared to the IEEE floating point format for the same number of required bits.
The design of a hardware accelerator for the pair-HMM forward algorithm using posit arithmetic is proposed for two different interfaces: a streaming-based accelerator and an ac, ISBN 978-94-6186-957-9, Electrical Engineer | Embedded Systems
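Unlike IEEE 754, a posit has a run-length-encoded "regime" field before its exponent and fraction, which is what gives it tapered accuracy. A toy decoder for an 8-bit posit illustrates the field layout (es=0 is used here for simplicity; the 2022 posit standard fixes es=2, and NaR/zero are the only specials handled):

```python
def decode_posit8(p: int, es: int = 0) -> float:
    """Decode an 8-bit posit with es exponent bits into a Python float."""
    if p == 0x00:
        return 0.0
    if p == 0x80:
        return float('nan')                  # NaR ("Not a Real")
    sign = -1.0 if p & 0x80 else 1.0
    if p & 0x80:
        p = (-p) & 0xFF                      # two's complement for negatives
    s = format(p & 0x7F, '07b')              # the 7 bits after the sign
    run = len(s) - len(s.lstrip(s[0]))       # regime: leading run of equal bits
    k = run - 1 if s[0] == '1' else -run
    rest = s[run + 1:]                       # skip the regime terminator bit
    e = int(rest[:es], 2) if es and rest[:es] else 0
    frac = rest[es:]
    f = 1.0 + (int(frac, 2) / (1 << len(frac)) if frac else 0.0)
    useed = 2.0 ** (1 << es)
    return sign * useed ** k * 2.0 ** e * f
```

Values near 1 get long fraction fields (high accuracy), while extreme values spend their bits on the regime instead, which is the "tapered" trade-off the thesis evaluates.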
- Published
- 2018
33. Interval arithmetic with fixed rounding mode
- Subjects
IEEE 754 ,successor ,predecessor ,rounding mode ,interval arithmetic ,chop rounding - Abstract
We discuss several methods to simulate interval arithmetic operations using floating-point operations with fixed rounding mode. In particular we present formulas using only rounding to nearest and using only chop rounding (towards zero). The latter was the default and only rounding on GPU (Graphics Processing Unit) and cell processors, which in turn are very fast and therefore attractive in scientific computations.
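The crudest round-to-nearest-only formula is to widen the rounded result by one ulp on each side; the paper's contribution is tighter formulas (and chop-rounding variants), so the sketch below is only the baseline idea:

```python
import math

def add_enclosure(a: float, b: float):
    """A safe (not tightest) enclosure of the exact sum a + b using only
    round-to-nearest: the correctly rounded sum is within one ulp of the
    exact result, so stepping one ulp each way brackets it."""
    s = a + b
    return math.nextafter(s, -math.inf), math.nextafter(s, math.inf)
```

The enclosure is two ulps wide even when the sum is exact; the interest of the paper's formulas is precisely to avoid that overestimation.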
- Published
- 2016
34. Preservation of Lyapunov-Theoretic Proofs: From Real to Floating-Point Numbers
- Author
-
Maisonneuve, Vivien, Centre de Recherche en Informatique (CRI), MINES ParisTech - École nationale supérieure des mines de Paris, and Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)
- Subjects
proof preservation ,[INFO.INFO-SC]Computer Science [cs]/Symbolic Computation [cs.SC] ,[INFO.INFO-PF]Computer Science [cs]/Performance [cs.PF] ,ellipse ,IEEE 754 ,[INFO.INFO-AU]Computer Science [cs]/Automatic Control Engineering ,Lyapunov stability ,rounding errors ,[INFO.INFO-ES]Computer Science [cs]/Embedded Systems ,floating-point ,control system ,[INFO.INFO-MO]Computer Science [cs]/Modeling and Simulation ,[MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA] - Abstract
Feron presents how Lyapunov-theoretic proofs of stability can be migrated toward computer-readable and verifiable certificates of control software behavior by relying on Floyd's and Hoare's proof systems. We address the issue of errors resulting from the use of floating-point arithmetic: we present an approach to translate Feron's proof invariants on real arithmetic into similar invariants on floating-point numbers, and show how our methodology applies to prove stability, thus allowing one to verify whether the stability invariant still holds when the controller is implemented. We study in detail the open-loop system of Feron's paper. We also use the same approach for Feron's closed-loop system, but the constraints are too tight to show stability in this second case: more leeway should be introduced in the proof on real numbers, otherwise the resulting system might be unstable.
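The flavour of such a translated invariant can be sketched as a runtime check: the real-arithmetic invariant is an ellipsoid V(x) = xᵀPx ≤ 1, and the floating-point version must tolerate a rounding slack. The matrices and the slack value below are illustrative only, not the paper's derived bounds:

```python
def quad_form(P, x):
    """x^T P x for a 2x2 matrix P and 2-vector x, in plain Python."""
    return (P[0][0] * x[0] * x[0] + (P[0][1] + P[1][0]) * x[0] * x[1]
            + P[1][1] * x[1] * x[1])

def invariant_preserved(A, P, x, slack=1e-9):
    """One step of x <- A x: if x satisfies the ellipsoid invariant
    x^T P x <= 1, check it still holds afterwards up to a slack term
    standing in for a formally derived rounding-error bound."""
    if quad_form(P, x) > 1.0:
        return True                 # precondition not met; nothing to check
    y = [A[0][0] * x[0] + A[0][1] * x[1],
         A[1][0] * x[0] + A[1][1] * x[1]]
    return quad_form(P, y) <= 1.0 + slack
```

Whether a given slack is sound is exactly what the paper's translation of the proof establishes; picking it by hand, as here, proves nothing.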
- Published
- 2013
35. Low Cost Floating-Point Extensions to a Fixed-Point SIMD Datapath
- Author
-
Kolumban, Gaspar
- Subjects
IEEE 754 ,ePUMA ,VPE ,floating-point ,fixed-point datapath ,SIMD - Abstract
The ePUMA architecture is a novel master-multi-SIMD DSP platform aimed at low-power computing, for example in embedded or hand-held devices. It is both a configurable and scalable platform, designed for multimedia and communications. Numbers with both integer and fractional parts are often used in computers because many important algorithms make use of them, for example in signal and image processing. A good way of representing these types of numbers is with a floating-point representation. The ePUMA platform currently supports a fixed-point representation, so the goal of this thesis is to implement twelve basic floating-point arithmetic operations and two conversion operations on an already existing datapath, conforming as closely as possible to the IEEE 754-2008 standard for floating-point representation. The implementation should be done at a low cost in hardware and power consumption. The target frequency is 500MHz. The implementation is compared with dedicated DesignWare components and with floating-point done in software on ePUMA. This thesis presents a solution that on average increases the VPE datapath hardware cost by 15%, while power consumption also increases by 15% on average. The highest clock frequency with the solution is 473MHz. The target clock frequency of 500MHz is thus not achieved, but considering the lack of register retiming in the synthesis step, 500MHz can most likely be reached with this design.
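The two conversion operations bridging the worlds can be sketched for a common fixed-point format; Q1.15 is chosen here for illustration, as the thesis does not pin the format down in this abstract:

```python
def q15_to_float(q: int) -> float:
    """Interpret a 16-bit two's-complement Q1.15 fixed-point word as a
    float in [-1, 1)."""
    if q & 0x8000:
        q -= 1 << 16               # undo two's-complement wrapping
    return q / float(1 << 15)

def float_to_q15(x: float) -> int:
    """Saturating float-to-Q1.15 conversion, rounding to nearest."""
    q = int(round(x * (1 << 15)))
    q = max(-(1 << 15), min((1 << 15) - 1, q))
    return q & 0xFFFF
```

Saturation on overflow (rather than wrap-around) is the usual DSP convention, since clipping distorts a signal far less than wrapping does.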
- Published
- 2013
36. A Pseudo-Random Bit Generator Using Three Chaotic Logistic Maps
- Author
-
François, Michael, Defour, David, Laboratoire d'Informatique Fondamentale d'Orléans (LIFO), Université d'Orléans (UO)-Institut National des Sciences Appliquées - Centre Val de Loire (INSA CVL), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA), Digits, Architectures et Logiciels Informatiques (DALI), Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Université de Perpignan Via Domitia (UPVD), and LIRMM (UM, CNRS)
- Subjects
[INFO.INFO-AR]Computer Science [cs]/Hardware Architecture [cs.AR] ,IEEE 754 ,Logistic map ,Chaotic map ,PRBG ,Cryptography ,Pseudo-random - Abstract
A novel pseudo-random bit generator (PRBG) using three chaotic logistic maps is proposed. At each iteration, the algorithm generates sequences of 32-bit blocks by starting from randomly chosen initial seeds. The impact of relying on the IEEE 754-2008 floating-point representation format for the generator is also taken into account. The performance of the generator is evaluated through various statistical analyses. The results show that the produced sequences possess high statistical randomness and a good level of security, which makes the generator suitable for cryptographic applications.
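The general construction can be sketched as follows; the combination step (XOR of the three maps' fractional bits) and the parameter value are my stand-ins, since the paper's exact mixing function is not given in this abstract:

```python
def logistic_prbg_bits(seeds, r=3.9999, n_blocks=4):
    """Iterate three logistic maps x <- r*x*(1-x) from seeds in (0, 1)
    and emit 32-bit blocks built from their fractional parts.
    Illustrative sketch only; the paper's combination differs."""
    x, y, z = seeds
    out = []
    for _ in range(n_blocks):
        x = r * x * (1.0 - x)
        y = r * y * (1.0 - y)
        z = r * z * (1.0 - z)
        block = (int(x * 2**32) ^ int(y * 2**32) ^ int(z * 2**32)) & 0xFFFFFFFF
        out.append(block)
    return out
```

Because every consumer evaluates the same IEEE 754 double-precision operations, the stream is bit-reproducible across platforms, which is why the representation format matters to the design.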
- Published
- 2013
37. Simultaneous floating-point sine and cosine for VLIW integer processors
- Author
-
Claude-Pierre Jeannerod, Jingyan Jourdan-Lu, Arithmetic and Computing (ARIC), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), Compilation Expertise Center, STMicroelectronics [Grenoble] (ST-GRENOBLE), École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), and Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL)
- Subjects
[INFO.INFO-AR]Computer Science [cs]/Hardware Architecture [cs.AR] ,Floating point ,floating-point arithmetic ,Computer science ,02 engineering and technology ,Parallel computing ,trigonometric function ,Single-precision floating-point format ,C software implementation ,0202 electrical engineering, electronic engineering, information engineering ,Sine ,Arithmetic ,ACM: C.: Computer Systems Organization/C.1: PROCESSOR ARCHITECTURES/C.1.1: Single Data Stream Architectures/C.1.1.2: RISC/CISC, VLIW architectures ,unit in the last place ,Signal processing ,[INFO.INFO-AO]Computer Science [cs]/Computer Arithmetic ,ACM: B.: Hardware/B.2: ARITHMETIC AND LOGIC STRUCTURES/B.2.4: High-Speed Arithmetic ,020206 networking & telecommunications ,instruction level parallelism ,IEEE floating point ,020202 computer hardware & architecture ,VLIW integer processor ,Very long instruction word ,IEEE 754 ,Unit in the last place ,Integer (computer science) - Abstract
Accepted for publication in the proceedings of the 23rd IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2012).; International audience; Graphics and signal processing applications often require that sines and cosines be evaluated at the same floating-point argument, and in such cases a very fast computation of the pair of values is desirable. This paper studies how 32-bit VLIW integer architectures can be exploited in order to perform this task accurately for IEEE single precision. We describe software implementations for sinf, cosf, and sincosf over [-pi/4, pi/4] that have a proven 1-ulp accuracy and whose latency on STMicroelectronics' ST231 VLIW integer processor is 19, 18, and 19 cycles, respectively. Such performances are obtained by introducing a novel algorithm for simultaneous sine and cosine that combines univariate and bivariate polynomial evaluation schemes.
- Published
- 2012
38. Interval arithmetic over finitely many endpoints
- Subjects
IEEE 754 ,Mathematical properties ,Interval arithmetic ,Finitely many endpoints - Abstract
To my knowledge all definitions of interval arithmetic start with real endpoints and prove properties. Then, for practical use, the definition is specialized to finitely many endpoints, where many of the mathematical properties are no longer valid. There seems to be no treatment of how to choose this finite set of endpoints so as to preserve as many mathematical properties as possible. Here we define interval endpoints directly using a finite set which, for example, may be based on the IEEE 754 floating-point standard. The corresponding interval operations emerge naturally from the corresponding power set operations. We present necessary and sufficient conditions on this finite set to ensure desirable mathematical properties, many of which are not satisfied by other definitions. For example, an interval product contains zero if and only if one of the factors does. The key feature of the theoretical foundation is that "endpoints" of intervals are not points but non-overlapping closed, half-open or open intervals, each of which can be regarded as an atomic object. By using non-closed intervals among its "endpoints", intervals containing "arbitrarily large" and "arbitrarily close to but not equal to" a real number can be handled. The latter may be zero defining "tiny" numbers, but also any other quantity including transcendental numbers. Our scheme can be implemented straightforwardly using the IEEE 754 floating-point standard. © 2012 Springer Science + Business Media B.V.
- Published
- 2012
39. How to Square Floats Accurately and Efficiently on the ST231 Integer Processor
- Author
-
Guillaume Revy, Jingyan Jourdan-Lu, Christophe Monat, Claude-Pierre Jeannerod, Computer arithmetic (ARENAIRE), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), Laboratoire de l'Informatique du Parallélisme (LIP), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), ARENAIRE - Arithmétique des ordinateurs, STMicroelectronics [Grenoble] (ST-GRENOBLE), Digits, Architectures et Logiciels Informatiques (DALI), Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Perpignan Via Domitia (UPVD), Arithmetic and Computing (ARIC), École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Compilation Expertise Center, Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Université 
de Perpignan Via Domitia (UPVD), Centre National de la Recherche Scientifique (CNRS)-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École normale supérieure - Lyon (ENS Lyon)-Centre National de la Recherche Scientifique (CNRS)-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), and Université de Lyon-École normale supérieure - Lyon (ENS Lyon)
- Subjects
Floating point ,Exploit ,Computer science ,binary floating-point arithmetic ,Rounding ,[INFO.INFO-AO]Computer Science [cs]/Computer Arithmetic ,Parameterized complexity ,squaring ,020206 networking & telecommunications ,010103 numerical & computational mathematics ,02 engineering and technology ,Parallel computing ,correct rounding ,instruction level parallelism ,01 natural sciences ,IEEE floating point ,VLIW integer processor ,Very long instruction word ,IEEE 754 ,C software implementation ,0202 electrical engineering, electronic engineering, information engineering ,0101 mathematics ,Latency (engineering) ,Instruction-level parallelism - Abstract
International audience; We consider the problem of computing IEEE floating-point squares by means of integer arithmetic. We show how to exploit the specific properties of squaring in order to design and implement algorithms that have much lower latency than those for general multiplication, while still guaranteeing correct rounding. Our algorithms are parameterized by the floating-point format, aim at high instruction-level parallelism (ILP) exposure, and cover all rounding modes. We show further that their C implementation for the binary32 format yields efficient codes for targets like the ST231 VLIW integer processor from ST Microelectronics, with a latency at least 1.75x smaller than that of general multiplication in the same context.
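Two of the properties that make squaring cheaper than general multiplication are visible even in a behavioural model: the sign vanishes, and the single significand product needs roughly half the partial products. The sketch below models an integer-only binary32 square with round-to-nearest-even for normal inputs whose square is also normal; it is a Python illustration of the idea, not the paper's ST231 code:

```python
import struct

def _bits(x):
    return struct.unpack('<I', struct.pack('<f', x))[0]

def _from_bits(b):
    return struct.unpack('<f', struct.pack('<I', b))[0]

def square_binary32(x: float) -> float:
    """Correctly rounded square of a normal binary32 value (normal result
    assumed; subnormals and overflow are not handled in this sketch)."""
    b = _bits(x)
    e = ((b >> 23) & 0xFF) - 127            # unbiased exponent; sign ignored
    m = (b & 0x7FFFFF) | (1 << 23)          # significand in [2^23, 2^24)
    p = m * m                               # exact 47- or 48-bit product
    if p >= 1 << 47:                        # significand^2 in [2, 4)
        shift, e2 = 24, 2 * e + 1
    else:                                   # significand^2 in [1, 2)
        shift, e2 = 23, 2 * e
    q, r = p >> shift, p & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if r > half or (r == half and (q & 1)): # round to nearest, ties to even
        q += 1
        if q == 1 << 24:                    # rounding carried out
            q >>= 1
            e2 += 1
    return _from_bits(((e2 + 127) << 23) | (q & 0x7FFFFF))
```

Note the renormalization needs only a one-position choice (shift by 23 or 24), since a squared significand always lies in [1, 4); general multiplication must handle the same range but cannot drop the sign logic or share partial products.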
- Published
- 2011
- Full Text
- View/download PDF
40. Techniques and tools for implementing IEEE 754 floating-point arithmetic on VLIW integer processors
- Author
-
Christian Bertin, Jean-Michel Muller, Hervé Knochel, Christophe Mouilleron, Guillaume Revy, Jingyan Jourdan-Lu, Christophe Monat, Claude-Pierre Jeannerod, STMicroelectronics [Grenoble] (ST-GRENOBLE), Computer arithmetic (ARENAIRE), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), Electronique, Informatique, Automatique et Systèmes (ELIAUS), Université de Perpignan Via Domitia (UPVD), École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), and Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL)
- Subjects
Computer science ,Optimizing compiler ,correct rounding ,instruction-level parallelism ,code generation ,02 engineering and technology ,Parallel computing ,Single-precision floating-point format ,Software ,polynomial evaluation ,C software implementation ,0202 electrical engineering, electronic engineering, information engineering ,Code generation ,[INFO.INFO-SC]Computer Science [cs]/Symbolic Computation [cs.SC] ,binary floating-point arithmetic ,business.industry ,[INFO.INFO-AO]Computer Science [cs]/Computer Arithmetic ,020206 networking & telecommunications ,IEEE floating point ,020202 computer hardware & architecture ,VLIW integer processor ,IEEE 754 ,Very long instruction word ,[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,Instruction-level parallelism ,business ,Integer (computer science) - Abstract
International audience; Recently, some high-performance IEEE 754 single precision floating-point software has been designed, which aims at best exploiting some features (integer arithmetic, parallelism) of the STMicroelectronics ST200 Very Long Instruction Word (VLIW) processor. We review here the techniques and software tools used or developed for this design and its implementation, and how they allowed very high instruction-level parallelism (ILP) exposure. Those key points include a hierarchical description of function evaluation algorithms, the exploitation of the standard encoding of floating-point data, the automatic generation of fast and accurate polynomial evaluation schemes, and some compiler optimizations.
- Published
- 2010
- Full Text
- View/download PDF
41. Software Aspects of IEEE Floating-Point Computations for Numerical Applications in High Energy Physics
- Author
-
Arnold, Jeffrey
- Published
- 2010
42. Bringing fast floating-point arithmetic into embedded integer processors
- Author
-
Bertin, Christian, Jeannerod, Claude-Pierre, Monat, Christophe, STMicroelectronics [Grenoble] (ST-GRENOBLE), Computer arithmetic (ARENAIRE), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), and Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL)
- Subjects
binary floating-point arithmetic ,polynomial evaluation ,VLIW integer processor ,IEEE 754 ,C software implementation ,[INFO.INFO-AO]Computer Science [cs]/Computer Arithmetic ,[INFO.INFO-ES]Computer Science [cs]/Embedded Systems ,instruction-level parallelism ,code generation ,[INFO.INFO-MS]Computer Science [cs]/Mathematical Software [cs.MS] - Published
- 2010
43. Mata Matters: Overflow, underflow and the IEEE floating-point format
- Author
-
Linhart, Jean Marie
- Subjects
missing values ,subnormal number ,normalized number ,IEEE 754 ,double precision ,MathematicsofComputing_NUMERICALANALYSIS ,underflow ,format ,binary ,overflow ,hexadecimal ,Research Methods/ Statistical Methods ,denormalized number - Abstract
Mata is Stata’s matrix language. The Mata Matters column shows how Mata can be used interactively to solve problems and as a programming language to add new features to Stata. In this quarter’s column, we investigate underflow and overflow and then delve into the details of how floating-point numbers are stored in the IEEE 754 floating-point standard. We show how to test for overflow and underflow. We demonstrate how to use the %21x format to see underflow and the %16H, %16L, %8H, and %8L formats for displaying the byte content of doubles and floats.
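The same inspection can be done outside Stata: Python's float.hex() plays roughly the role of Mata's %21x format (exact significand bits plus binary exponent), and classification against the smallest normal double identifies gradual underflow. A small sketch, with names of my choosing:

```python
import math
import sys

def classify(x: float) -> str:
    """Classify a double as overflowed, subnormal, or normal/zero."""
    if math.isnan(x):
        return 'nan'
    if math.isinf(x):
        return 'overflow (inf)'
    # Nonzero magnitudes below the smallest normal double (about
    # 2.2250738585072014e-308) are subnormal: gradual underflow.
    if x != 0.0 and abs(x) < sys.float_info.min:
        return 'subnormal (gradual underflow)'
    return 'normal or zero'

# float.hex() exposes the exact stored bits, like Stata's %21x:
# (1.0).hex() == '0x1.0000000000000p+0'
```

Multiplying 1e308 by 10 overflows to inf, while dividing 1e-308 by 1e10 lands in the subnormal range rather than snapping to zero, which is the behaviour the column demonstrates in Mata.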
- Published
- 2008
- Full Text
- View/download PDF
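The overflow and underflow behaviour and the bit-level view of doubles described in the abstract above can be reproduced outside Stata. The following is a minimal Python sketch (standard library only) of the same experiments the column performs with the %21x and byte-display formats; `double_bits` is an illustrative helper, not part of Mata:

```python
import struct
import sys

def double_bits(x: float) -> str:
    """Hex string of the 64 raw bits of a double (sign, 11-bit biased
    exponent, 52-bit fraction), similar in spirit to Stata's %16H display."""
    return struct.pack(">d", x).hex()

# Overflow: doubling the largest finite double yields +infinity.
assert sys.float_info.max * 2 == float("inf")

# Underflow is gradual: below the smallest normalized double come the
# subnormal (denormalized) numbers, and only then exactly zero.
tiny = sys.float_info.min            # smallest normalized double
subnormal = tiny / 2                 # subnormal: smaller than tiny, not zero
assert 0.0 < subnormal < tiny
assert subnormal / 2**60 == 0.0      # finally underflows to exact zero

print(double_bits(1.0))              # 3ff0000000000000
print(double_bits(-2.0))             # c000000000000000
```

The hex strings make the field boundaries visible: `3ff` is the biased exponent 1023 (representing 2^0) with sign bit clear, and the 52 fraction bits of 1.0 are all zero.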
44. Design of single precision float adder (32-bit numbers) according to IEEE 754 standard using VHDL
- Author
-
Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica, Slovenská technická univerzita v Bratislave, Stopjaková, Viera, Zálusky, Roman, Barrabés Castillo, Arturo, Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica, Slovenská technická univerzita v Bratislave, Stopjaková, Viera, Zálusky, Roman, and Barrabés Castillo, Arturo
- Abstract
Project carried out within a mobility programme with the Slovenská Technická Univerzita v Bratislave, Fakulta Elecktrotechniky a Informatiky. Floating-point arithmetic is by far the most widely used way of approximating real-number arithmetic for numerical calculations on modern computers. For a long time, each computer had a different arithmetic: bases, significand and exponent sizes, formats, etc. Each manufacturer implemented its own model, which hindered portability between machines until the IEEE 754 standard appeared, defining a single, universal format. The aim of this project is to implement a 32-bit binary floating-point adder/subtractor according to the IEEE 754 standard, using the hardware description language VHDL. (The record also carries Spanish and Catalan versions of this same abstract, omitted here as duplicates.)
- Published
- 2012
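As background to the adder design above, the binary32 field split the thesis starts from (1 sign bit, 8 exponent bits, 23 fraction bits) can be sketched in Python rather than VHDL; `binary32_fields` is an illustrative helper, not code from the thesis:

```python
import struct

def binary32_fields(x: float) -> tuple:
    """Split a float, stored as IEEE 754 binary32, into the three fields a
    hardware adder operates on: sign (1 bit), biased exponent (8 bits),
    and fraction (23 bits)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

# 1.0  -> sign 0, biased exponent 127 (i.e. 2^0), fraction 0
assert binary32_fields(1.0) == (0, 127, 0)
# -0.5 -> sign 1, biased exponent 126 (i.e. 2^-1), fraction 0
assert binary32_fields(-0.5) == (1, 126, 0)
# 1.5  -> implicit leading 1 plus fraction 0.5: top fraction bit set
assert binary32_fields(1.5) == (0, 127, 1 << 22)
```

An adder must align the operands' significands by the difference of the two exponent fields before adding, which is why this decomposition is the natural starting point.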
45. Worst Cases of a Periodic Function for Large Arguments
- Author
-
D. Stehle, Vincent Lefèvre, Guillaume Hanrot, Paul Zimmermann, Curves, Algebra, Computer Arithmetic, and so On (CACAO), INRIA Lorraine, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS), Computer arithmetic (ARENAIRE), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), Peter Kornerup and Jean-Michel Muller, École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), and Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL)
- Subjects
Polynomial ,Floating point ,Computational complexity theory ,floating-point arithmetic ,Heuristic (computer science) ,[INFO.INFO-AO]Computer Science [cs]/Computer Arithmetic ,Double-precision floating-point format ,010103 numerical & computational mathematics ,Function (mathematics) ,correct rounding ,periodic function ,01 natural sciences ,010101 applied mathematics ,Periodic function ,IEEE 754 ,Trigonometric functions ,0101 mathematics ,Algorithm ,Mathematics ,worst case - Abstract
We consider the problem of finding hard-to-round cases of a periodic function for large floating-point inputs, more precisely when the function cannot be efficiently approximated by a polynomial. This is one of the last few issues that prevent guaranteeing an efficient computation of correctly rounded transcendentals for the whole IEEE-754 double-precision format. We present the first non-naive algorithm for this problem, with a heuristic complexity of $O(2^{0.676 p})$ for a precision of $p$ bits. The efficiency of the algorithm is demonstrated on the largest IEEE-754 double-precision binade for the sine function, and some corresponding bad cases are given. We can hope that all the worst cases of the trigonometric functions over their whole domain will be found within a few years, a task that was considered out of reach until now.
- Published
- 2007
- Full Text
- View/download PDF
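To see why large arguments are the hard case for correctly rounded periodic functions, one can contrast a library sine with a naive reduction modulo the double nearest to 2π. This small Python illustration is not the paper's algorithm, only a motivation for it:

```python
import math

# sin(x) for huge x requires reducing x modulo the *true* 2*pi.
# fmod below reduces exactly, but modulo the double nearest to 2*pi;
# that representation error, multiplied by the ~1.6e21 periods that fit
# in 1e22, makes the naively reduced angle essentially unrelated to the
# correctly reduced one.
x = 1e22
naive = math.fmod(x, 2 * math.pi)

print(math.sin(x))       # a good libm reduces the argument exactly
print(math.sin(naive))   # sine of a different, naively reduced angle
assert -1.0 <= math.sin(x) <= 1.0
assert 0.0 <= naive < 2 * math.pi
```

Both printed values are legitimate sines of *some* angle, which is exactly why the worst-case search in this paper is needed: correctness for large arguments cannot be judged from the output alone.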
46. Error Bounds on Complex Floating-Point Multiplication
- Author
-
Paul Zimmermann, Richard P. Brent, Colin Percival, Curves, Algebra, Computer Arithmetic, and so On (CACAO), INRIA Lorraine, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Institut National de Recherche en Informatique et en Automatique (Inria)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Arithmetic underflow ,Floating point ,[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS] ,Double-precision floating-point format ,010103 numerical & computational mathematics ,01 natural sciences ,roundoff error ,0101 mathematics ,Arithmetic ,error analysis ,Mathematics ,Discrete mathematics ,Algebra and Number Theory ,Applied Mathematics ,Complex multiplication ,[MATH.MATH-CV]Mathematics [math]/Complex Variables [math.CV] ,IEEE floating point ,010101 applied mathematics ,Computational Mathematics ,complex multiplication ,IEEE 754 ,Product (mathematics) ,floating-point number ,Multiplication ,Round-off error ,[MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA] - Abstract
Given floating-point arithmetic with $t$-digit base-$\beta$ significands in which all arithmetic operations are performed as if calculated to infinite precision and rounded to a nearest representable value, we prove that the product of complex values $z_0$ and $z_1$ can be computed with maximum absolute error $|z_0|\,|z_1|\,\frac{1}{2}\beta^{1-t}\sqrt{5}$. In particular, this provides relative error bounds of $2^{-24}\sqrt{5}$ and $2^{-53}\sqrt{5}$ for IEEE 754 single and double precision arithmetic respectively, provided that overflow, underflow, and denormals do not occur. We also provide the numerical worst cases for IEEE 754 single and double precision arithmetic.
- Published
- 2007
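Since every finite double is an exact rational, the paper's binary64 bound $|z_0||z_1| \cdot 2^{-53}\sqrt{5}$ can be spot-checked with exact arithmetic. A Python sketch, assuming the platform multiplies complex numbers with the naive four-multiplication formula (as CPython does); `exact_product` and `within_bound` are illustrative names:

```python
from fractions import Fraction

def exact_product(z0: complex, z1: complex):
    """Exact real and imaginary parts of z0*z1: every finite double is an
    exact rational, so Fraction arithmetic gives the true product."""
    a, b = Fraction(z0.real), Fraction(z0.imag)
    c, d = Fraction(z1.real), Fraction(z1.imag)
    return a * c - b * d, a * d + b * c

def within_bound(z0: complex, z1: complex) -> bool:
    """Does the rounded product respect |error| <= |z0||z1| * 2^-53 * sqrt(5)?
    Compared via squared magnitudes so everything stays rational."""
    w = z0 * z1                                  # rounded: 4 multiplies, 2 adds
    re, im = exact_product(z0, z1)
    err_sq = (Fraction(w.real) - re) ** 2 + (Fraction(w.imag) - im) ** 2
    bound_sq = (abs(z0) * abs(z1) * 2.0 ** -53) ** 2 * 5
    return float(err_sq) <= bound_sq * (1 + 1e-9)  # slack: bound itself is a float

assert within_bound(1.1 + 2.2j, 3.3 - 4.4j)
assert within_bound(complex(1 / 3, 1 / 7), complex(1 / 9, -1 / 11))
```

A check like this cannot prove the bound, of course; the paper's contribution is the proof and the worst cases that show the $\sqrt{5}$ factor is attained.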
47. Codificación binaria de int y float
- Author
-
Universitat Politècnica de València. Escuela Técnica Superior de Ingenieros de Telecomunicación - Escola Tècnica Superior d'Enginyers de Telecomunicació, González Téllez, Alberto, Universitat Politècnica de València. Escuela Técnica Superior de Ingenieros de Telecomunicación - Escola Tècnica Superior d'Enginyers de Telecomunicació, and González Téllez, Alberto
- Abstract
An example is used to describe how the C data types int and float are encoded in binary.
- Published
- 2008
48. Computing Floating-Point Square Roots via Bivariate Polynomial Evaluation.
- Author
-
Jeannerod, Claude-Pierre, Knochel, Herve, Monat, Christophe, and Revy, Guillaume
- Subjects
- *FLOATING-point arithmetic , *SQUARE root , *COMPUTER systems , *POLYNOMIALS , *FIXED point theory , *DATA analysis , *COMPUTER software - Abstract
In this paper, we show how to reduce the computation of correctly rounded square roots of binary floating-point data to the fixed-point evaluation of some particular integer polynomials in two variables. By designing parallel and accurate evaluation schemes for such bivariate polynomials, we show further that this approach allows for high instruction-level parallelism (ILP) exposure, and thus, potentially low-latency implementations. Then, as an illustration, we detail a C implementation of our method in the case of IEEE 754-2008 binary32 floating-point data (formerly called single precision in the 1985 version of the IEEE 754 standard). This software implementation, which assumes 32-bit unsigned integer arithmetic only, is almost complete in the sense that it supports special operands, subnormal numbers, and all rounding-direction attributes, but not exception handling (that is, status flags are not set). Finally, we have carried out experiments with this implementation on the ST231, an integer processor from the STMicroelectronics' ST200 family, using the ST200 family VLIW compiler. The results obtained demonstrate the practical interest of our approach in that context: for all rounding-direction attributes, the generated assembly code is optimally scheduled and has indeed low latency (23 cycles). [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
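The paper above computes correctly rounded square roots using integer arithmetic only. In the same spirit, correct rounding of a given sqrt result can at least be *verified* with exact rational arithmetic. A Python sketch with a hypothetical helper name; this is a checking procedure, not the paper's bivariate polynomial evaluation method:

```python
from fractions import Fraction
import math

def is_correctly_rounded_sqrt(x: float) -> bool:
    """Verify with exact rational arithmetic that math.sqrt(x) is a closest
    double to the true square root of x. The half-ulp window is slightly
    generous just below a power of two, so a correctly rounded result
    always passes the check."""
    r = math.sqrt(x)
    half_ulp = Fraction(math.ulp(r)) / 2
    lo, hi = Fraction(r) - half_ulp, Fraction(r) + half_ulp
    # sqrt is monotone, so: true sqrt in [lo, hi]  <=>  lo^2 <= x <= hi^2
    return lo * lo <= Fraction(x) <= hi * hi

# IEEE 754 requires sqrt to be correctly rounded, so these all hold:
assert is_correctly_rounded_sqrt(2.0)
assert is_correctly_rounded_sqrt(0.1)
assert all(is_correctly_rounded_sqrt(float(n)) for n in range(1, 100))
```

Note the squaring trick: it sidesteps computing the irrational square root entirely, which is the same idea that lets the paper stay within integer arithmetic on the ST231.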
49. Disseny d'un sumador de punt flotant de precisió simple (32 bits) basat en l'estàndard IEEE 754 utilitzant VHDL
- Author
-
Barrabés Castillo, Arturo, Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica, Slovenská technická univerzita v Bratislave, Stopjaková, Viera, and Zálusky, Roman
- Subjects
Lògica programable ,Anàlisi numèrica ,Electrònica digital ,VHDL (Llenguatge de descripció de maquinari) ,IEEE 754 ,Enginyeria electrònica::Circuits electrònics [Àrees temàtiques de la UPC] ,VHDL ,Floating point arithmetic ,Aritmética de punto flotante ,VHDL (Computer hardware description language) ,Numerical analysis - Abstract
Project carried out within a mobility programme with the Slovenská Technická Univerzita v Bratislave, Fakulta Elecktrotechniky a Informatiky. Floating-point arithmetic is by far the most widely used way of approximating real-number arithmetic for numerical calculations on modern computers. For a long time, each computer had a different arithmetic: bases, significand and exponent sizes, formats, etc. Each manufacturer implemented its own model, which hindered portability between machines until the IEEE 754 standard appeared, defining a single, universal format. The aim of this project is to implement a 32-bit binary floating-point adder/subtractor according to the IEEE 754 standard, using the hardware description language VHDL. (The record also carries Spanish and Catalan versions of this same abstract, omitted here as duplicates.)