Back to Search Start Over

Compile-Time and Instruction-Set Methods for Improving Floating- to Fixed-Point Conversion Accuracy.

Authors :
Aamodt, Tor M.
Chow, Paul
Source :
ACM Transactions on Embedded Computing Systems; Apr2008, Vol. 7 Issue 3, p26-26:27, 27p, 4 Diagrams, 7 Charts, 8 Graphs
Publication Year :
2008

Abstract

This paper proposes and evaluates compile time and instruction-set techniques for improving the accuracy of signal-processing algorithms run on fixed-point embedded processors. These techniques are proposed in the context of a profile guided floating- to fixed-point compiler-based conversion process. A novel fixed-point scaling algorithm (IRP) is introduced that exploits correlations between values in a program by applying fixed-point scaling, retaining as much precision as possible without causing overflow. This approach is extended into a more aggressive scaling algorithm (IRP-SA) by leveraging the modulo nature of 2's complement addition and subtraction to discard most significant bits that may not be redundant sign-extension bits. A complementary scaling technique (IDS) is then proposed that enables the fixed-point scaling of a variable to be parameterized, depending upon the context of its definitions and uses. Finally, a novel instruction-set enhancement--fractional multiplication with internal left shift (FMLS)--is proposed to further leverage interoperand correlations uncovered by the IRP-SA scaling algorithm. FMLS preserves a different subset of the full product's bits than traditional fractional fixed-point or integer multiplication. On average, FMLS combined with IRP-SA improves accuracy on processors with uniform bitwidth register architectures by the equivalent of 0.61 bits of additional precision for a set of signal-processing benchmarks (up to 2 bits). Even without employing FMLS, the IRP-SA scaling algorithm achieves additional accuracy over two previous fixed-point scaling algorithms by averages of 1.71 and 0.49 bits. Furthermore, as FMLS combines multiplication with a scaling shift, it reduces execution time by an average of 9.8%. An implementation of IDS, specialized to single-nested loops, is found to improve accuracy of a lattice filter benchmark by the equivalent of more than 16-bits of precision. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
15399087
Volume :
7
Issue :
3
Database :
Complementary Index
Journal :
ACM Transactions on Embedded Computing Systems
Publication Type :
Academic Journal
Accession number :
32203919
Full Text :
https://doi.org/10.1145/1347375.1347379