Database: Complementary Index / Journal: ieee transactions on computers / Publisher: ieee / Search Limiters: Academic (Peer-Reviewed) Journals / Topic: algorithm design and analysis and algorithms - Searchworks@Jio Institute Digital Library Search Results

1. New Block Recombination for Subquadratic Space Complexity Polynomial Multiplication Based on Overlap-Free Approach.

Author: Park, Sun-Mi, Chang, Ku-Young, Hong, Dowon, and Seo, Changho
Subjects: POLYNOMIAL time algorithms, CODING theory, CRYPTOGRAPHY, POLYNOMIAL approximation, ALGORITHMS
Abstract: In this paper, we present new parallel polynomial multiplication formulas which result in subquadratic space complexity. The schemes are based on a recently proposed block recombination of polynomial multiplication formula. The proposed two-way, three-way, and four-way split polynomial multiplication formulas achieve the smallest space complexities. Moreover, by providing area-time tradeoff method, the proposed formulas enable one to choose a parallel formula for polynomial multiplication which is suited for a design environment. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

2. A Fast Pessimistic Diagnosis Algorithm for Hypercube-Like Networks under the Comparison Model.

Author: Ye, Liang-Cheng, Liang, Jia-Rong, and Lin, Hai-Xiang
Subjects: MULTIPROCESSORS, HYPERCUBE networks (Computer networks), FAULT-tolerant computing, ALGORITHMS, MATHEMATICAL models
Abstract: Diagnosis by comparison is a realistic approach to detect faults of multiprocessor systems. This paper considers a pessimistic diagnostic strategy for hypercube-like multiprocessor systems under the comparison model. The pessimistic strategy is a diagnostic process whereby all faulty nodes can be correctly identified and at most one fault-free node may be misjudged as a faulty node. We propose a pessimistic diagnosis algorithm based on the largest component in the faulty system. For a system with N = 2n nodes and n ≥ 5 , when the number of faulty nodes is bounded by 2n − 2 , the algorithm can correctly identify all nodes except at most one node left undiagnosed. The time complexity of the algorithm is O(Nlog2N). [ABSTRACT FROM PUBLISHER]
Published: 2016
Full Text: View/download PDF

3. An Improved Approximation Ratio to the Partial-Terminal Steiner Tree Problem.

Author: Lee, Chia-Wei, Huang, Chao-Wen, Pi, Wen-Hao, and Hsieh, Sun-Yuan
Subjects: APPROXIMATION theory, PROBLEM solving, COST functions, SET theory, ALGORITHMS
Abstract: We consider a generalization of both the classic Steiner tree problem and the terminal Steiner tree problem. Given a complete graph G = (V,E) with a metric cost function c:E \rightarrow \BBQ_ \geq and two proper subsets R \subset V and R^\prime \subseteq R, a partial-terminal Steiner tree is a Steiner tree which contains all vertices in \it R such that all vertices in R^\prime must be leaves. The partial-terminal Steiner tree problem is to find a partial-terminal Steiner tree of the minimum cost in G. The previously best-known approximation ratio of the problem is 2\rho, where \bf \rho is the approximation ratio of the Steiner tree problem. In this paper, we improve the ratio from 2\rho to 2\rho - \rho \over 3\rho - 2 - f, where f is a non-negative function whose value is between 0 and \rho - \rho \over 3\rho - 2. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

4. Square Root Computation over Even Extension Fields.

Author: Adj, Gora and Rodriguez-Henriquez, Francisco
Subjects: SQUARE root, FIELD extensions (Mathematics), ALGORITHMS, MATHEMATICAL models, ROUGH sets, COMPUTATIONAL complexity
Abstract: This paper presents a comprehensive study of the computation of square roots over finite extension fields. We propose two novel algorithms for computing square roots over even field extensions of the form \BBF{q^2}, with q = p^n, p an odd prime and n \geq 1. Both algorithms have an associate computational cost roughly equivalent to one exponentiation in \BBF{q^2}. The first algorithm is devoted to the case when q \equiv 1\, mod\, 4, whereas the second one handles the case when q \equiv 3\, mod\,4. Numerical comparisons show that the two algorithms presented in this paper are competitive and in some cases more efficient than the square root methods previously known. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF

5. An Improvement of the Elliptic Net Algorithm.

Author: Chen, Binglong and Zhao, Chang-An
Subjects: ELLIPTIC curve cryptography, ALGORITHMS, ITERATIVE methods (Mathematics), HAMMING weight, MATHEMATICAL models
Abstract: In this paper we propose a modified Elliptic Net algorithm to compute pairings. By reducing the number of the intermediate variables which should be updated in the iteration loop of the Elliptic Net algorithm, we speed up the computation of pairings. Experimental results show that the proposed method is more efficient than the original Elliptic Net algorithm on Barreto-Naehrig (BN) curves which are the best choice for implementing pairings at 128-bit security level. [ABSTRACT FROM PUBLISHER]
Published: 2016
Full Text: View/download PDF

6. Approximate Radix-8 Booth Multipliers for Low-Power and High-Performance Operation.

Author: Jiang, Honglan, Han, Jie, Qiao, Fei, and Lombardi, Fabrizio
Subjects: ANALOG multipliers, APPROXIMATION theory, HIGH performance computing, ALGORITHMS, COMPUTATIONAL complexity, CRITICAL path analysis
Abstract: The Booth multiplier has been widely used for high performance signed multiplication by encoding and thereby reducing the number of partial products. A multiplier using the radix-4 (or modified Booth) algorithm is very efficient due to the ease of partial product generation, whereas the radix-8 Booth multiplier is slow due to the complexity of generating the odd multiples of the multiplicand. In this paper, this issue is alleviated by the application of approximate designs. An approximate 2-bit adder is deliberately designed for calculating the sum of 1× and 2× of a binary number. This adder requires a small area, a low power and a short critical path delay. Subsequently, the 2-bit adder is employed to implement the less significant section of a recoding adder for generating the triple multiplicand with no carry propagation. In the pursuit of a trade-off between accuracy and power consumption, two signed 16 × 16 bit approximate radix-8 Booth multipliers are designed using the approximate recoding adder with and without the truncation of a number of less significant bits in the partial products. The proposed approximate multipliers are faster and more power efficient than the accurate Booth multiplier. The multiplier with 15-bit truncation achieves the best overall performance in terms of hardware and accuracy when compared to other approximate Booth multiplier designs. Finally, the approximate multipliers are applied to the design of a low-pass FIR filter and they show better performance than other approximate Booth multipliers. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF

7. Bit-Stuffing Algorithms for Crosstalk Avoidance in High-Speed Switching.

Author: Chang, Cheng-Shang, Cheng, Jay, Huang, Tien-Ke, Huang, Xuan-Chao, Lee, Duan-Shin, and Chen, Chao-Yi
Subjects: ALGORITHMS, SWITCHING circuits, CROSSTALK, COMPUTER simulation, MARKOV processes, ENCODING
Abstract: The crosstalk effect is one of the main problems in deep sub-micron designs of high-speed buses. To mitigate the crosstalk effect, there are several types of crosstalk avoidance codes proposed in the literature. In this paper, we are particularly interested in generating forbidden transition codes that do not have opposite transitions on any two adjacent wires. For this, we propose a sequential bit-stuffing algorithm and a parallel bit-stuffing algorithm. For the sequential bit-stuffing algorithm, we perform a worst-case analysis and a probabilistic analysis. We show by both theoretic analysis and simulations that the coding rate of the sequential bit-stuffing encoding scheme is quite close to the Shannon capacity. In particular, for a bus with $n=10$ parallel wires, the difference is only 2.2 percent. Using a Markov chain analysis, we show that the coding rate of the parallel bit-stuffing algorithm is only slightly lower than that of the sequential bit-stuffing algorithm. The implementation complexity of the parallel bit-stuffing algorithm is linear with $n$ . In comparison with the existing forbidden transition codes that use the Fibonacci representation in the literature, our bit-stuffing algorithms not only achieve higher coding rates but also have much lower implementation complexity. [ABSTRACT FROM PUBLISHER]
Published: 2015
Full Text: View/download PDF

8. Algorithms for Generating Probabilities with Multivalued Stochastic Relay Circuits.

Author: Lee, David T. and Bruck, Jehoshua
Subjects: SWITCHING circuits, ALGORITHMS, PROBABILITY theory, STOCHASTIC processes, RANDOM numbers, PROBLEM solving
Abstract: The problem of random number generation dates back to Von Neumann’s work in 1951. Since then, many algorithms have been developed for generating unbiased bits from complex correlated sources as well as for generating arbitrary distributions from unbiased bits. An equally interesting, but less studied aspect is the structural component of random number generation. That is, given a set of primitive sources of randomness, and given composition rules induced by a device or nature, how can we build networks that generate arbitrary probability distributions? In this paper, we study the generation of arbitrary probability distributions in multivalued relay circuits, a generalization in which relays can take on any of $N$ states and the logical ‘and’ and ‘or’ are replaced with ‘min’ and ‘max’ respectively. These circuits can be thought of as modeling the timing of events which depend on other event occurences. We describe a duality property and give algorithms that synthesize arbitrary rational probability distributions. We prove that these networks are robust to errors and design a universal probability generator which takes input bits and outputs any desired binary probability distribution. [ABSTRACT FROM PUBLISHER]
Published: 2015
Full Text: View/download PDF

9. Radix-2 Division Algorithms with an Over-Redundant Digit Set.

Author: Ebergen, Jo and Jamadagni, Navaneeth
Subjects: ALGORITHMS, REDUNDANT number systems, SET theory, BINARY number system, APPROXIMATION theory
Abstract: This paper presents a derivation of four radix-2 division algorithms by digit recurrence. Each division algorithm selects a quotient digit from the over-redundant digit set {−2, −1, 0, 1, 2}, and the selection of each quotient digit depends only on the two most-significant digits of the partial remainder in a redundant representation. Two algorithms use a two’s complement representation for the partial remainder and carry-save additions, and the other two algorithms use a binary signed-digit representation for the partial remainder and carry-free additions. Three algorithms are novel. The fourth algorithm has been presented before. Results from the synthesized netlists show that two of our fastest algorithms achieve an improvement of 10 percent in latency per iteration over a standard radix-2 SRT algorithm at the cost of 36 percent more power and 50 percent more area. [ABSTRACT FROM PUBLISHER]
Published: 2015
Full Text: View/download PDF

10. Efficient All-to-All Broadcast in Gaussian On-Chip Networks.

Author: Zhang, Zhemin, Guo, Zhiyang, and Yang, Yuanyuan
Subjects: ON-chip transformers, COMPUTER networks, SYSTEMS on a chip, ELECTRIC network topology, ALGORITHMS, ROUTING (Computer network management)
Abstract: With the development of multiprocessor system on chips (MPSoCs), it is expected that hundreds of computing cores will be operating on a single chip in the near future. This will require high-performance on-chip networks with very low latency to provide a communication substrate for the increasing number of cores. In this paper, we consider Gaussian on-chip networks that are of significant topological advantages over traditional mesh and torus networks in terms of diameter and average hop distance. Many applications on MPSoCs need global data movement and global control to exchange data and synchronize the execution among cores, which require all-to-all broadcast communication. In this paper, we propose an all-to-all broadcast algorithm suitable for on-chip implementation on the Gaussian network topology. The algorithm utilizes controlled message flooding based on a broadcast pattern, which can be described in a formal, generic way for each node in terms of a few simple operations and can be easily built into router hardware. Furthermore, the generic broadcast pattern also ensures a balanced traffic load in all dimensions in the network so that minimum total latency for all-to-all broadcast can be achieved. The algorithm overlaps message switching time with transmission time in a pipelined fashion to further reduce the total communication latency of all-to-all broadcast. Comparison results demonstrate the topological merits of Gaussian networks and ultralow latency of the proposed all-to-all broadcast algorithm. [ABSTRACT FROM AUTHOR]
Published: 2013
Full Text: View/download PDF

11. 2 n R NS Scalpers for Extended 4 -Moduli Sets.

Author: Sousa, Leonel
Subjects: VERY large scale circuit integration, NUMBER systems, SET theory, MODULI theory, ALGORITHMS, FIELD programmable gate arrays
Abstract: Scaling is a key important arithmetic operation and is difficult to perform in Residue Number Systems (RNS). This paper proposes a comprehensive approach for designing efficient and accurate 2^n RNS scalers for important classes of moduli sets that have large dynamic ranges. These classes include the traditional 3-moduli set, but the exponent of the power of two modulo is augmented by a variable value x ( \lbrace 2^n-1, 2^n\underline+x, 2^n+1 \rbrace ), and any extended set with an additional modulo m_4 ( \lbrace 2^n-1, 2^n\underline+x, 2^n+1[,\underlinem_4]\rbrace ). The proposed approach embeds scaling into the formulation of the Chinese remainder theorem and the mixed radix system, and it exploits the properties of the target moduli sets to perform scaling explicitly in the RNS domain. This is accomplished by operating hierarchically on each channel without requiring reverse and forward conversions. Simple memoryless VLSI architectures are proposed based on the obtained formulations. The relative assessment indicates that not only are these architectures comprehensive and suitable for configurable systems, but they are also more efficient than the related state of the art in terms of both performance and energy. The experimental results obtained for a 90 nm CMOS ASIC technology show improvements in the area-delay product, normalized with respect to the dynamic range, of up to $57$ and $146$ percent with the proposed scalers for the augmented 3-moduli set (dynamic range of $4n-1$ bits) and an extended 4-moduli set (dynamic range of $6n$ bits), respectively. These improvements increase to $64.9$ and $263$ percent when the energy required per scaling is measured. The proposed scalers are not only flexible and cost-effective, but they are also suitable for designing and implementing energy-constrained devices, particularly mobile systems. [ABSTRACT FROM PUBLISHER]
Published: 2015
Full Text: View/download PDF

12. Multi-User Computation Partitioning for Latency Sensitive Mobile Cloud Applications.

Author: Yang, Lei, Cao, Jiannong, Cheng, Hui, and Ji, Yusheng
Subjects: MULTIUSER computer systems, CLOUD computing, CELL phones, PRODUCTION scheduling, MOBILE communication systems, ALGORITHMS
Abstract: Elastic partitioning of computations between mobile devices and cloud is an important and challenging research topic for mobile cloud computing. Existing works focus on the single-user computation partitioning, which aims to optimize the application completion time for one particular single user. These works assume that the cloud always has enough resources to execute the computations immediately when they are offloaded to the cloud. However, this assumption does not hold for large scale mobile cloud applications. In these applications, due to the competition for cloud resources among a large number of users, the offloaded computations may be executed with certain scheduling delay on the cloud. Single user partitioning that does not take into account the scheduling delay on the cloud may yield significant performance degradation. In this paper, we study, for the first time, multi-user computation partitioning problem (MCPP), which considers the partitioning of multiple users’ computations together with the scheduling of offloaded computations on the cloud resources. Instead of pursuing the minimum application completion time for every single user, we aim to achieve minimum average completion time for all the users, based on the number of provisioned resources on the cloud. We show that MCPP is different from and more difficult than the classical job scheduling problems. We design an offline heuristic algorithm, namely SearchAdjust, to solve MCPP. We demonstrate through benchmarks that SearchAdjust outperforms both the single user partitioning approaches and classical job scheduling approaches by 10 percent on average in terms of application delay. Based on SearchAdjust, we also design an online algorithm for MCPP that can be easily deployed in practical systems. We validate the effectiveness of our online algorithm using real world load traces. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

13. Algorithm, Architecture, and Floating-Point Unit Codesign of a Matrix Factorization Accelerator.

Author: Pedram, Ardavan, Gerstlauer, Andreas, and Geijn, Robert A. van de
Subjects: ALGORITHMS, FLOATING-point arithmetic, MATRICES (Mathematics), FACTORIZATION, LINEAR algebra, LEAST squares, PROBLEM solving
Abstract: This paper examines the mapping of algorithms encountered when solving dense linear systems and linear least-squares problems to a custom Linear Algebra Processor. Specifically, the focus is on Cholesky, LU (with partial pivoting), and QR factorizations and their blocked algorithms. As part of the study, we expose the benefits of redesigning floating point units and their surrounding data-paths to support these complicated operations. We show how adding moderate complexity to the architecture greatly alleviates complexities in the algorithm. We study design tradeoffs and the effectiveness of architectural modifications to demonstrate that we can improve power and performance efficiency to a level that can otherwise only be expected of full-custom ASIC designs. A feasibility study of inner kernels is extended to blocked level and shows that, at block level, the Linear Algebra Core (LAC) can achieve high efficiencies with up to 45 GFLOPS/W for both Cholesky and LU factorization, and over 35 GFLOPS/W for QR factorization. While maintaining such efficiencies, our extensions to the MAC units can achieve up to 10, 12, and 20 percent speedup for the blocked algorithms of Cholesky, LU, and QR factorization, respectively. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF

14. Period Selection for Minimal Hyperperiod in Periodic Task Systems.

Author: Ripoll, Ismael and Ballester-Ripoll, Rafael
Subjects: SCHEDULING, AUTOMATIC control systems, ALGORITHMS, REAL-time control, QUANTITATIVE research, COMPUTER science
Abstract: Task period selection is often used to adjust the workload to the available computational resources. In this paper, we propose a model where each selected period is not restricted to be a natural number, but can be any rational number within a range. Under this generalization, we contribute a period selection algorithm that yields a much smaller hyperperiod than that of previous works: with respect to the largest period, the hyperperiod with integer constraints is exponentially bounded; with rational periods the worst case is only quadratic. By means of an integer approximation at each task activation, we show how our rational period approach can work under system clock granularity; it is thus compatible with scheduling analysis practice and implementation. Our finding has practical applications in several fields of real-time scheduling: lowering complexity in table driven schedulers, reducing search space in model checking analysis, generating synthetic workload for statistical analysis of real-time scheduling algorithms, etc. [ABSTRACT FROM AUTHOR]
Published: 2013
Full Text: View/download PDF

15. Computation and Formal Verification of SRT Quotient and Square Root Digit Selection Tables.

Author: Russinoff, David M.
Subjects: STEREOTACTIC radiotherapy, SQUARE root, ALGORITHMS, MICROPROCESSORS, COMPUTER architecture, MATHEMATICAL models
Abstract: We present a comprehensive, self-contained, and mechanically verified proof of correctness of a maximally redundant SRT design for floating-point division and square root extraction, supported by verified procedures that 1) test the admissibility of a proposed digit selection table, 2) determine the minimal dimensions of an admissible table for a given arbitrary radix, and 3) generate these tables. For square root extraction, we also provide a verified procedure for generating an initial approximation that meets the accuracy requirement of the algorithm and ensures that the digit selection index derived from successive partial roots remains static throughout the computation. A radix-8 instantiation of these algorithms has been implemented in the floating-point unit of the AMD processor code-named Steamroller. To ensure their correctness, all of our results and procedures have been formalized and mechanically checked by the ACL2 prover. We present evidence of the value of this approach by comparing it to that of a more conventional published paper that reports similar results, which are shown to be fatally flawed. [ABSTRACT FROM AUTHOR]
Published: 2013
Full Text: View/download PDF

16. An Efficient Formulation of the Real-Time Feasibility Region for Design Optimization.

Author: Zeng, Haibo and Di Natale, Marco
Subjects: REAL-time computing, MATHEMATICAL optimization, COMPUTER scheduling, LINEAR programming, COMPUTATIONAL complexity, DEADLINES, ALGORITHMS, FEASIBILITY studies
Abstract: In the design of time-critical applications, schedulability analysis is used to define the feasibility region of tasks with deadlines, so that optimization techniques can find the best design solution within the timing constraints. The formulation of the feasibility region based on the response time calculation requires many integer variables and is too complex for solvers. Approximation techniques have been used to define a convex subset of the feasibility region, used in conjunction with a branch and bound approach to compute suboptimal solutions for optimal task period selection, priority assignment, or placement of tasks onto CPUs. In this paper, we provide an improved and simpler real-time schedulability test that allows an exact and efficient definition of the feasibility region in Mixed Integer Linear Programming (MILP) optimization. Our method requires a significantly smaller number of binary variables and is viable for the treatment of industrial-size problem, as shown by the experiments. [ABSTRACT FROM PUBLISHER]
Published: 2013
Full Text: View/download PDF

17. A General Framework of Side-Channel Atomicity for Elliptic Curve Scalar Multiplication.

Author: Lu, Chia-Yu, Jen, Shang-Ming, and Laih, Chi-Sung
Subjects: ELLIPTIC curves, MULTIPLICATION, ALGORITHMS, CODING theory, GENERALIZATION, APPLICATION software
Abstract: Simple power attack (SPA) is a type of side-channel attack (SCA). In the literature, many SPA-resistant scalar multiplication algorithms have been proposed, but most are inefficient and not interoperable with other coding methods. To prevent SPA, Chevallier-Mames et al. proposed a technique called side-channel atomicity for pure binary number systems. Using their method, extra costs for preventing SPA can be limited. Even though many researchers have extended this technique to other number systems, their algorithms are for specific cases and few provide implementation results. In this paper, we generalize the atomicity technique to protect nearly all existing fast coding methods/number systems. Our general framework provides security and flexibility while its efficiency is coupled to that of the coding methods. Moreover, we utilize our framework to protect the known fastest scalar multiplications by exploring application on the GLV method for GLS curves. Proof of concept programs are written in the C language along with assembly for fast field operations and run on AMD Athlon X2 245-based hardware. [ABSTRACT FROM AUTHOR]
Published: 2013
Full Text: View/download PDF

18. Comparison between Binary and Decimal Floating-Point Numbers.

Author: Brisebarre, Nicolas, Lauter, Christoph, Mezzarobba, Marc, and Muller, Jean-Michel
Subjects: FLOATING-point arithmetic, ALGORITHMS, COMPUTER science
Abstract: We introduce an algorithm to compare a binary floating-point (FP) number and a decimal FP number, assuming the “binary encoding” of the decimal formats is used, and with a special emphasis on the basic interchange formats specified by the IEEE 754-2008 standard for FP arithmetic. It is a two-step algorithm: a first pass, based on the exponents only, quickly eliminates most cases, then, when the first pass does not suffice, a more accurate second pass is performed. We provide an implementation of several variants of our algorithm, and compare them. [ABSTRACT FROM PUBLISHER]
Published: 2016
Full Text: View/download PDF

19. Efficient Rule Engine for Smart Building Systems.

Author: Sun, Yan, Wu, Tin-Yu, Zhao, Guotao, and Guizani, Mohsen
Subjects: INTELLIGENT buildings, ACTUATORS, DATA mining, ALGORITHMS, WIRELESS sensor networks, INFORMATION filtering systems
Abstract: In smart building systems, the automatic control of devices relies on matching the sensed environment information to customized rules. With the development of wireless sensor and actuator networks (WSANs), low-cost and self-organized wireless sensors and actuators can enhance smart building systems, but produce abundant sensing data. Therefore, a rule engine with ability of efficient rule matching is the foundation of WSANs based smart building systems. However, traditional rule engines mainly focus on the complex processing mechanism and omit the amount of sensing data, which are not suitable for large scale WSANs based smart building systems. To address these issues, we build an efficient rule engine. Specifically, we design an atomic event extraction module for extracting atomic event from data messages, and then build a $\beta$ -network to acquire the atomic conditions for parsing the atomic trigger events. Taking the atomic trigger events as the key set of MPHF, we construct the minimal perfect hash table which can filter the majority of the unused atomic event with O (1) time overhead. Moreover, a rule engine adaption scheme is proposed to minimize the rule matching overhead. We implement the proposed rule engine in a practical smart building system. The experimental results show that the rule engine can perform efficiently and flexibly with high data throughput and large rule set. [ABSTRACT FROM PUBLISHER]
Published: 2015
Full Text: View/download PDF

20. Robot Coordination for Energy-Balanced Matching and Sequence Dispatch of Robots to Events.

Author: Lukic, Milan, Barnawi, Ahmed, and Stojmenovic, Ivan
Subjects: ROBOT kinematics, MATCHING theory, PROBLEM solving, ALGORITHMS, WIRELESS sensor networks, COMPUTER simulation
Abstract: Given a set of events and a set of robots, the dispatch problem is to allocate one robot for each event to visit it. In a single round, each robot may be allowed to visit only one event (matching dispatch), or several events in a sequence (sequence dispatch). In a distributed setting, each event is discovered by a sensor and reported to a robot. Here, we present novel algorithms aimed at overcoming the shortcomings of several existing solutions. We propose pairwise distance based matching algorithm (PDM) to eliminate long edges by pairwise exchanges between matching pairs. Our sequence dispatch algorithm (SQD) iteratively finds the closest event-robot pair, includes the event in dispatch schedule of the selected robot and updates its position accordingly. When event-robot distances are multiplied by robot resistance (inverse of the remaining energy), the corresponding energy-balanced variants are obtained. We also present generalizations which handle multiple visits and timing constraints. Our localized algorithm MAD is based on information mesh infrastructure and local auctions within the robot network for obtaining the optimal dispatch schedule for each robot. The simulations conducted confirm the advantages of our algorithms over other existing solutions in terms of average robot-event distance and lifetime. [ABSTRACT FROM PUBLISHER]
Published: 2015
Full Text: View/download PDF

21. R2: Incremental Reprogramming Using Relocatable Code in Networked Embedded Systems.

Author: Dong, Wei, Liu, Yunhao, Chen, Chun, Bu, Jiajun, Huang, Chao, and Zhao, Zhiwei
Subjects: GROUND penetrating radar, COMPUTER networks, ALGORITHMS, COST, DETECTORS, COMPUTER operating systems
Abstract: We present R2, an incremental reprogramming approach using relocatable code, to improve program similarity for efficient incremental reprogramming in networked embedded systems. R2 achieves a higher degree of similarity than existing approaches by mitigating effects of both function shifts and data shifts. R2 adopts a content-aware differencing algorithm to generate small delta files for efficient dissemination. Besides, it makes efficient use of memory and does not degrade program quality. We implement R2 based on TinyOS 2.1 and demonstrate its advantages through detailed analysis of TinyOS examples. We also present case studies on the software programs of a large-scale sensor system GreenOrbs. Results show that R2 reduces the dissemination cost by approximately 65 percent compared to state-of-the-art network reprogramming approach Deluge, and reduces the dissemination cost by approximately 20 percent compared to Zephyr and Hermes the latest works on incremental reprogramming. [ABSTRACT FROM AUTHOR]
Published: 2013
Full Text: View/download PDF

22. Comments on “On the Polynomial Multiplication in Chebyshev Form”.

Author: Park, Sun-Mi, Chang, Ku-Young, Hong, Dowon, and Seo, Changho
Subjects: MULTIPLICATION, CHEBYSHEV polynomials, ALGORITHMS, ARITHMETIC, MATHEMATICAL analysis
Abstract: In the above paper, Akleylek proposed an efficient multiplication algorithm for polynomials in Chebyshev form. In this comment, we show that a recombination of the above proposed algorithm induces more efficient algorithm for the multiplications of polynomials in Chebyshev form. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF

23. Parallel Cryptographic Arithmetic Using a Redundant Montgomery Representation.

Author: Page, Daniel and Smart, Nigel P.
Subjects: CRYPTOGRAPHY, COMPUTER security, PENTIUM (Microprocessor), DATA encryption, COMPUTER network security, ALGORITHMS
Abstract: We describe how using a redundant Montgomery representation allows for high-performance SIMD-based implementations of LISA and elliptic curve cryptography. This is in addition to the known benefits of immunity from timing attacks afforded by the use of such a representation. We present some preliminary implementation timings using the SSE2 instruction set on a Pentium 4 processor and show that an SIMD parallel implementation of RSA can be around twice as fast as traditional sequential code. This is especially useful given the larger 2,048 bit RSA keys which are now being proposed for standard security levels. Finally, we remark on other application areas that improve the security of our work in the context of side-channel analysis while maintaining high performance. [ABSTRACT FROM AUTHOR]
Published: 2004
Full Text: View/download PDF

24. Hardware Designs for Binary Integer Decimal-Based Rounding.

Author: Tsen, Samuel, Gonzalez-Navarro, Sonia, Schulte, Michael J., and Compton, Katherine
Subjects: COMPUTER architecture, COMPUTER input-output equipment, BINARY control systems, INTEGER programming, ROUNDING (Numerical analysis), ALGORITHMS, APPROXIMATION theory
Abstract: Decimal floating-point (DFP) arithmetic is becoming increasingly important and specifications for it are included in the revised IEEE 754 standard for floating-point arithmetic (IEEE 754-2008). The binary encoding of DFP numbers specified in IEEE 754-2008 is commonly referred to as Binary-Integer Decimal (BID). BID uses a binary integer to encode the significant, which allows it to leverage existing high-speed binary circuits. However, performing decimal rounding on these binary significant is challenging. In this paper, we propose and evaluate several approaches to perform decimal rounding in hardware for DFP numbers that use the BID encoding. We summarize several rounding techniques, present the theory and design of each proposed rounding unit, and use synthesis results to evaluate the critical path delay, latency, and area of rounding units for 64-bit BID numbers. Our results indicate that the bulk of each rounder design is occupied by a binary fixed-point multiplier that can be shared with other integer and floating-point operations. This is the first paper to present and compare a variety of techniques for BID-based rounding hardware. These techniques are valuable to designers of BID-based DFP solutions. [ABSTRACT FROM AUTHOR]
Published: 2011
Full Text: View/download PDF

25. Toward Advocacy-Free Evaluation of Packet Classification Algorithms.

Author: Song, Haoyu and Turner, Jonathan
Subjects: CLASSIFICATION, DATA packeting, ALGORITHMS, PERFORMANCE evaluation, QUALITY of service, BENCHMARKING (Management)
Abstract: Understanding the real performance of a proposed algorithm is a basic requirement for both algorithm designers and implementers. However, this is sometimes difficult to achieve. Each new algorithm published is evaluated from different perspectives and based on different assumptions. Without a common ground, it is almost impossible to compare different algorithms directly. Choosing an incompetent algorithm for an application can incur significant cost. This is especially true for packet classification in network routers, since packet classification is intrinsically a hard problem and all existing algorithms are based on some heuristics and filter set characteristics. The performance of the packet classification subsystem is critical to the overall performance of the network routers. Although numerous algorithms have been proposed so far, a benchmark that can give them consistent evaluation and reveal their comparable performance is still missing. This paper summarizes our efforts toward improving this situation. First, we conduct a high-level survey on the existing algorithms and extract some insights on the general design ideas. Second, we describe an open-source platform dedicated for advocacy-free evaluation of packet classification algorithms. Many representative algorithms are actually implemented under a set of uniform conditions and assumptions. The freely available implementations allow other researchers to easily test them under different scenarios. We also enforce some consistent and fundamental criteria for the algorithm evaluation, so that their performance and potentials are directly comparable, regardless of the actual implementation platforms. This project serves dual purpose: It helps the researchers to accelerate the innovation in the area of packet classification algorithm development by relieving them from the labor of replicating the previous work and by enabling them to quickly compare and evaluate algorithms. Meanwhile, it also helps the system implementers to easily choose the capable algorithm for their particular applications. Aiming to build an open-source library, we encourage external contributions of new algorithm implementations and evaluations under the same framework. We believe the practice will benefit the research and design community as a whole. [ABSTRACT FROM AUTHOR]
Published: 2011
Full Text: View/download PDF

26. Complexity of Data Collection, Aggregation, and Selection for Wireless Sensor Networks.

Author: Li, Xiang-Yang, Wang, Yajun, and Wang, Yu
Subjects: COMPUTATIONAL complexity, ACQUISITION of data, AGGREGATION operators, RANKING (Statistics), WIRELESS sensor networks, INFORMATION processing, CALORIC expenditure, ALGORITHMS
Abstract: Processing the gathered information efficiently is a key functionality for wireless sensor networks. In this paper, we study the time complexity, message complexity (number of messages used by all nodes), and energy cost complexity (total energy used by all nodes for transmitting messages) of some tasks, such as data collection (collecting raw data of all nodes to a sink), data aggregation (computing the aggregated value of data of all nodes), and queries for a multihop wireless sensor network of n nodes. We first present a lower bound on the complexity for the optimal methods, and then, for most of the tasks studied in this paper, we provide an (asymptotically matching) upper bound on the complexity by presenting efficient distributed algorithms to solve these problems. [ABSTRACT FROM AUTHOR]
Published: 2011
Full Text: View/download PDF

27. Novel Convolutions Using First-Order Moments.

Author: Liu, Jianguo, Pan, Chao, and Liu, Zhenbing
Subjects: MATHEMATICAL convolutions, FIRST-order logic, ALGORITHMS, READ-only memory, VERY large scale circuit integration, SCALABILITY, COMPUTER networks
Abstract: This paper presents a novel fast algorithm for digital convolutions. It is able to compute arbitrary-length convolutions more efficiently via transforming the convolution into a first-order moment. Although many additions are required, the proposed algorithm has some advantages such as the avoidance of multiplications, simple computation structure, and only integer additions. These advantages contribute to this algorithm being so easy that it can compute convolutions rapidly. Based on the proposed algorithm a very simple and scalable systolic array without multipliers and ROM has been developed leading to more efficient VLSI implementation of convolutions. [ABSTRACT FROM PUBLISHER]
Published: 2012
Full Text: View/download PDF

28. Efficient 2-Approximation Algorithms for Computing 2-Connected Steiner Minimal Networks.

Author: Shen, Hong and Guo, Longkun
Subjects: APPROXIMATION theory, ALGORITHMS, COMPUTER networks, COMPARATIVE studies, SPANNING trees, POLYNOMIALS
Abstract: For an undirected and weighted graph G=(V,E) and a terminal set S\subseteq V, the 2-connected Steiner minimal network (SMN) problem requires to compute a minimum-weight subgraph of G in which all terminals are 2-connected to each other. This problem has important applications in design of survivable networks and fault-tolerant communication, and is known MAXSNP-hard , a harder subclass of NP-hard problems for which no polynomial-time approximation scheme (PTAS) is known. This paper presents an efficient algorithm of O(\vert V\vert^2\vert S\vert^3) time for computing a 2-vertex connected Steiner network (2VSN) whose weight is bounded by two times of the optimal solution 2-vertex connected SMN (2VSMN). It compares favorably with the currently known 2--approximation solution to the 2VSMN problem based on that to the survivable network design problem], with a time complexity reduction of O(\vert V\vert^5\vert E\vert^7) for strongly polynomial time and O(\vert V\vert^5\gamma ) for weakly polynomial time where \gamma is determined by the sizes of input. Our algorithm applies a novel greedy approach to generate a 2VSN through progressive improvement on a set of vertex-disjoint shortest path pairs incident with each terminal of S. The algorithm can be directly deployed to solve the 2-edge connected SMN problem at the same approximation ratio within time O(\vert V\vert^2\vert S\vert^2). To the best of our knowledge, this result presents currently the most efficient 2-approximation algorithm for the 2-connected Steiner minimal network problem. [ABSTRACT FROM PUBLISHER]
Published: 2012
Full Text: View/download PDF

29. On the Computation of Correctly Rounded Sums.

Author: Kornerup, Peter, Lefevre, Vincent, Louvet, Nicolas, and Muller, Jean-Michel
Subjects: FLOATING-point arithmetic, ALGORITHMS, ROUNDING errors, COMPUTER systems, SUBTRACTION (Mathematics), ADDITION (Mathematics)
Abstract: This paper presents a study of some basic blocks needed in the design of floating-point summation algorithms. In particular, in radix-2 floating-point arithmetic, we show that among the set of the algorithms with no comparisons performing only floating-point additions/subtractions, the 2Sum algorithm introduced by Knuth is minimal, both in terms of number of operations and depth of the dependency graph. We investigate the possible use of another algorithm, Dekker's Fast2Sum algorithm, in radix-10 arithmetic. We give methods for computing, in radix 10, the floating-point number nearest the average value of two floating-point numbers. We also prove that under reasonable conditions, an algorithm performing only round-to-nearest additions/subtractions cannot compute the round-to-nearest sum of at least three floating-point numbers. Starting from an algorithm due to Boldo and Melquiond, we also present new results about the computation of the correctly-rounded sum of three floating-point numbers. For a few of our algorithms, we assume new operations defined by the recent IEEE 754-2008 Standard are available. [ABSTRACT FROM AUTHOR]
Published: 2012
Full Text: View/download PDF

30. Two Efficient Algorithms for Linear Time Suffix Array Construction.

Author: Nong, Ge, Zhang, Sen, and Chan, Wai Hong
Subjects: ALGORITHMS, LINEAR systems, COMPUTATIONAL complexity, SORTING (Electronic computers), LEAST squares, APPROXIMATION theory
Abstract: We present, in this paper, two efficient algorithms for linear time suffix array construction. These two algorithms achieve their linear time complexities, using the techniques of divide-and-conquer, and recursion. What distinguish the proposed algorithms from other linear time suffix array construction algorithms (SACAs) are the variable-length leftmost S-type (LMS) substrings and the fixed-length d-critical substrings sampled for problem reduction, and the simple algorithms for sorting these sampled substrings: the induced sorting algorithm for the variable-length LMS substrings and the radix sorting algorithm for the fixed-length d-critical substrings. The very simple sorting mechanisms render our algorithms an elegant design framework, and, in turn, the surprisingly succinct implementations. The fully functional sample implementations of our proposed algorithms require only around 100 lines of C code for each, which is only 1/10 of the implementation of the KA [CHECK END OF SENTENCE] algorithm and comparable to that of the KS [CHECK END OF SENTENCE] algorithm. The experimental results demonstrate that these two newly proposed algorithms yield the best time and space efficiencies among all the existing linear time SACAs. [ABSTRACT FROM AUTHOR]
Published: 2011
Full Text: View/download PDF

31. Self-Routing Quantum Sparse Crossbar Packet Concentrators.

Author: Ratan, Rahul and Oruc, A. Yavuz
Subjects: SELF-routing (Computer network management), QUANTUM communication, SPARSE matrices, DATA packeting, QUANTUM electronics, ELECTRONIC circuit design, ALGORITHMS
Abstract: Quantum switching networks are derived from conventional switching networks by replacing the classical switches by quantum switches. We give the quantum circuit design and routing of an n \times m network, called a quantum concentrator that can direct quantum bit packets in arbitrary quantum states from any of its k inputs to some of its k outputs, where 1 \le k \le m \le n. Our designs are based on sparse crossbars which are rectangular grids of 2 \times 2 crosspoints. Sparse crossbar concentrator structures with theoretically minimum crosspoint complexities for any values of n and m are well known but no self-routing algorithms have been reported for such concentrators. We transform two such families of optimal concentrators, called fat-slim and banded sparse crossbars into quantum networks and provide self-routing algorithms for these families of concentrators. In this process, we extend the notion of packet concentration to a quantum network and design self-routing quantum crosspoints from quantum gates. We address issues critical to quantum operation like reversibility and localized self-routing and give a rigorous proof that quantum fat-slim and banded sparse crossbar concentrators are self-routable. The self-routing algorithms described in the paper can be used for both quantum and classical sparse crossbar concentrators by the linearity property of all quantum systems. [ABSTRACT FROM AUTHOR]
Published: 2011
Full Text: View/download PDF

32. Efficient, Decentralized Computation of the Topology of Spatial Regions.

Author: Duckham, Matt, Nussbaum, Doron, Sack, Jorg-Rudiger, and Santoro, Nicola
Subjects: WIRELESS sensor networks, REASONING, ALGORITHMS, SCALAR field theory, ROBUST control, SCALABILITY, EMBEDDINGS (Mathematics), TRIANGULATION
Abstract: The capability to query the topology of spatial regions is fundamental to today's centralized spatial computing systems, like spatial databases and GIS. By contrast, this paper explores decentralized algorithms for computing the topology of spatial regions in wireless sensor networks. The approach generates global topological information about regions, using only the local knowledge of nodes and their immediate network neighbors aggregated up through spatial boundary structures. Using three basic boundary structures (boundary nodes, boundary cycles, and boundary orientation), a family of decentralized algorithms is defined that can respond efficiently to snapshot queries about the topology of spatial regions, including containment and adjacency queries. The communication complexity of the algorithm is O(n) for realistic inputs. Empirical investigation of the performance of the approach, using simulation, also confirms the efficiency, scalability, and robustness of this approach. [ABSTRACT FROM AUTHOR]
Published: 2011
Full Text: View/download PDF

33. Efficient and Strategyproof Spectrum Allocations in Multichannel Wireless Networks.

Author: Xu, Ping, Li, Xiang-Yang, and Tang, Shaojie
Subjects: SPECTRUM allocation, MULTICHANNEL communication, WIRELESS communications, APPROXIMATION theory, ALGORITHMS, GAME theory, COMPUTER scheduling, GRAPH theory
Abstract: In this paper, we study the spectrum assignment problem for wireless access networks. We assume that each secondary user will bid a certain value for exclusive usage of some spectrum channels for a certain time period or for a certain time duration. A secondary user may also require the exclusive usage of a subset of channels, or require the exclusive usage of a certain number of channels. Thus, several versions of problems are formulated under various different assumptions. For the majority of problems, we design PTAS or efficient constant-approximation algorithms such that overall profit is maximized. Here, the profit is defined as the total bids of all satisfied secondary users. As a side product of our algorithms, we are able to show that a previously studied Scheduling Split Interval Problem (SSIP) [CHECK END OF SENTENCE], in which each job is composed of t intervals, cannot be approximated within O(t^1-\epsilon ) for any small \epsilon >0 unless NP=ZPP. Opportunistic spectrum usage, although a promising technology, could suffer from the selfish behavior of secondary users. In order to improve opportunistic spectrum usage, we then propose to combine the game theory with wireless modeling. We show how to design a truthful mechanism based on all of these algorithms such that the best strategy of each secondary user to maximize its own profit is to truthfully report its actual bid. [ABSTRACT FROM AUTHOR]
Published: 2011
Full Text: View/download PDF

34. EAD and PEBD: Two Energy-Aware Duplication Scheduling Algorithms for Parallel Tasks on Homogeneous Clusters.

Author: Zong, Ziliang, Manzanares, Adam, Ruan, Xiaojun, and Qin, Xiao
Subjects: COMPUTER scheduling, ALGORITHMS, CLUSTER analysis (Statistics), ENERGY consumption, COMPUTER engineering, ALGEBRA
Abstract: High-performance clusters have been widely deployed to solve challenging and rigorous scientific and engineering tasks. On one hand, high performance is certainly an important consideration in designing clusters to run parallel applications. On the other hand, the ever increasing energy cost requires us to effectively conserve energy in clusters. To achieve the goal of optimizing both performance and energy efficiency in clusters, in this paper, we propose two energy-efficient duplication-based scheduling algorithms—Energy-Aware Duplication (EAD) scheduling and Performance-Energy Balanced Duplication (PEBD) scheduling. Existing duplication-based scheduling algorithms replicate all possible tasks to shorten schedule length without reducing energy consumption caused by duplication. Our algorithms, in contrast, strive to balance schedule lengths and energy savings by judiciously replicating predecessors of a task if the duplication can aid in performance without degrading energy efficiency. To illustrate the effectiveness of EAD and PEBD, we compare them with a nonduplication algorithm, a traditional duplication-based algorithm, and the dynamic voltage scaling (DVS) algorithm. Extensive experimental results using both synthetic benchmarks and real-world applications demonstrate that our algorithms can effectively save energy with marginal performance degradation. [ABSTRACT FROM AUTHOR]
Published: 2011
Full Text: View/download PDF

35. Area-Efficient Multipliers Based on Multiple-Radix Representations.

Author: Dimitrov, Vassil S., Jarvinen, Kimmo U., and Adikari, Jithra
Subjects: MULTIPLIERS (Mathematical analysis), COMPLEMENTARY metal oxide semiconductors, ALGORITHMS, MULTIPLICATION, COMPUTER input-output equipment, CRYPTOGRAPHY, SET theory
Abstract: In this paper, we shall introduce several new algorithms for integer multiplication that are based on specific multiple-radix representation of one of the multiplicands. We provide extensive theoretical analysis and experimental results for multipliers based on the new representations on 0.18 \mu m CMOS technology. They provide a clear picture about the advantages of the new method in 64-bit hardware implementations compared to array-based classical multiplier and radix-8-based multiplier. The proposed multipliers have better area and power consumption compared to reference multipliers. [ABSTRACT FROM AUTHOR]
Published: 2011
Full Text: View/download PDF

36. Hybrid Binary-Ternary Number System for Elliptic Curve Cryptosystems.

Author: Adikari, Jithra, Dimitrov, Vassil S., and Imbert, Laurent
Subjects: BINARY number system, ELLIPTIC curves, CRYPTOGRAPHY, SCALAR field theory, ALGORITHMS, COMPUTER software, EXPERIMENTS
Abstract: Single and double scalar multiplications are the most computational intensive operations in elliptic curve based cryptosystems. Improving the performance of these operations is generally achieved by means of integer recoding techniques, which aim at minimizing the scalars' density of nonzero digits. The hybrid binary-ternary number system provides both short representations and small density. In this paper, we present three novel algorithms for both single and double scalar multiplication. We present a detailed theoretical analysis, together with timings and fair comparisons over both tripling-oriented Doche-Ichart-Kohel curves and generic Weierstrass curves. Our experiments show that our algorithms are almost always faster than their widely used counterparts. [ABSTRACT FROM AUTHOR]
Published: 2011
Full Text: View/download PDF

37. A Radix-16 Combined Complex Division/Square Root Unit with Operand Prescaling.

Author: Wang, Dong and Ercegovac, Miloš D.
Subjects: RADIXIN, SQUARE root, ZIRCONIUM, COMPUTER input-output equipment, FIELD programmable gate arrays, DIGITAL signal processing, GATE array circuits, ALGORITHMS
Abstract: We present a novel design of a radix-16 combined unit for complex division and square root in fixed-point format. A new digit-recurrence algorithm with two-step operand prescaling is developed for complex square root to avoid postscaling of the result. A combined recurrence algorithm is generalized and a scalable hardware architecture is proposed. Designs with different operand precision are implemented in Altera Stratix-II FPGA and cost and performance are evaluated and compared with reference designs of complex division or square root implemented combined or separately. The results show advantages of the proposed combined design in cost and performance. [ABSTRACT FROM AUTHOR]
Published: 2012
Full Text: View/download PDF

38. A Parallel Implementation of Montgomery Multiplication on Multicore Systems: Algorithm, Analysis, and Prototype.

Author: Chen, Zhimin and Schaumont, Patrick
Subjects: MULTICORE processors, ALGORITHMS, PROTOTYPES, PUBLIC key cryptography, PARALLEL programming, FIELD programmable gate arrays
Abstract: The Montgomery Multiplication is one of the cornerstones of public-key cryptography, with important applications in the RSA algorithm, in Elliptic-Curve Cryptography, and in the Digital Signature Standard. The efficient implementation of this long-word-length modular multiplication is crucial for the performance of public-key cryptography. Along with the strong momentum of shifting from single-core to multicore systems, we present a parallel-software implementation of the Montgomery multiplication for multicore systems. Our comprehensive analysis shows that the proposed scheme, pSHS, partitions the task in a balanced way so that each core has the same amount of job to do. In addition, we also comprehensively analyze the impact of intercore communication overhead on the performance of pSHS. The analysis reveals that pSHS is high performance, scalable over different number of cores, and stable when the communication latency changes. The analysis also tells us how to set different parameters to achieve the optimal performance. We implemented pSHS on a prototype multicore architecture configured in a Field Programmable Gate Array (FPGA). Compared with the sequential implementation, pSHS accelerates 2,048-bit Montgomery multiplication by 1.97, 3.68, and 6.13 times on, respectively, two-core, four-core, and eight-core architectures with communication latency equal to 100 clock cycles. [ABSTRACT FROM AUTHOR]
Published: 2011
Full Text: View/download PDF

39. Optimal Storage Placement for Tree-Structured Networks with Heterogeneous Channel Costs.

Author: Chiu, Ge-Ming, Yen, Li-Hsing, and Chin, Tai-Lin
Subjects: TREE graphs, HETEROGENEOUS computing, COST analysis, QUERYING (Computer science), ALGORITHMS, SIMULATION methods & models, PEER-to-peer architecture (Computer networks), WIRELESS sensor networks, DATA transmission systems
Abstract: This work considers data query applications in tree-structured networks, where a given set of source nodes generate (or collect) data and forward the data to some halfway storage nodes for satisfying queries that call for data generated by all source nodes. The goal is to determine an optimal set of storage nodes that minimizes overall communication cost. Prior work toward this problem assumed homogeneous channel cost, which may not be the case in many network environments. We generalize the optimal storage problem for a tree-structured network by considering heterogeneous channel costs. The necessary and sufficient conditions for the optimal solution are identified, and an algorithm that incurs a linear time cost is proposed. We have also conducted extensive simulations to validate the algorithm and to evaluate its performance. [ABSTRACT FROM AUTHOR]
Published: 2011
Full Text: View/download PDF

40. Simulation-Based Verification of Floating-Point Division.

Author: Guralnik, Elena, Aharoni, Merav, Birnbaum, Ariel J., and Koyfman, Anatoly
Subjects: SIMULATION methods & models, FLOATING-point arithmetic, COMPUTER input-output equipment design & construction, ALGORITHMS, COMPUTER viruses, DIVISION, COMPUTER arithmetic
Abstract: Floating-point division is known to exhibit an exceptionally wide array of corner cases, making its verification a difficult challenge. Despite the remarkable advances in formal methods, the intricacies of this operation and its implementation often render these inapplicable. Simulation-based methods remain the primary means for verification of division. FPgen is a test generation framework targeted at the floating point datapath. It has been successfully used in the simulation-based verification of a variety of hardware designs. FPgen comprises a comprehensive test plan and a powerful test generator. A proper response to the difficulties posed by division constitutes a major part of FPgen's capabilities. We present an overview of the relevant verification tasks supplied with FPgen and the underlying algorithms used to target them. [ABSTRACT FROM AUTHOR]
Published: 2011
Full Text: View/download PDF

41. A Rounding Method to Reduce the Required Multiplier Precision for Goldschmidt Division.

Author: Kong, Inwook and Swartzlander, Jr., Earl E.
Abstract: A new rounding method to reduce the required precision of the multiplier for Goldschmidt division is presented. It applies special truncation methods at the final iteration step. This requires a minor modification to the rounding constants of the multiplier. It allows twice the error tolerance of conventional methods and inclusive error bounds. The proposed method further reduces the required precision of the multiplier by considering the asymmetric error bounds of Goldschmidt dividers where the factors are computed using a one's complement operation. As a result, the proposed rounding method allows the multiplier of a three-iteration Goldschmidt divider to be implemented using only three extra bits. The proposed method has been verified using a SystemC hardware model of the divider supporting variable precision. The validity of the error analysis is also checked via simulation. The final rounding results are checked with both 10^{10} random double precision floating-point significands and an exhaustive suite of 17-bit test vectors. [ABSTRACT FROM PUBLISHER]
Published: 2010
Full Text: View/download PDF

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Database

41 results

Search Results

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources