721 results
Search Results
2. Selected Papers from SLA++P 07 and 08: Model-Driven High-Level Programming of Embedded Systems
- Author
-
Maraninchi, Florence, primary, Mendler, Michael, additional, Pouzet, Marc, additional, Girault, Alain, additional, and Rutten, Eric, additional
- Published
- 2008
- Full Text
- View/download PDF
3. Multi-criteria resource allocation in modal hard real-time systems.
- Author
-
Dziurzanski, Piotr, Singh, Amit Kumar, and Indrusiak, Leandro Soares
- Subjects
RESOURCE allocation, ENERGY dissipation
- Abstract
In this paper, a novel resource allocation approach dedicated to hard real-time systems with distinct operational modes is proposed. The aim of this approach is to reduce the energy dissipation of the computing cores by either powering them off or switching them into energy-saving states while still guaranteeing that all timing constraints are met. The approach is illustrated with two industrial applications: an engine control management system and an engine control unit. Moreover, the amount of data to be migrated during the mode change is minimised. Since the number of processing cores and their energy dissipation are often negatively correlated with the amount of data to be migrated during the mode change, there is a trade-off between these values, which is also analysed in this paper. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
4. Modeling of smartphones' power using neural networks.
- Author
-
Alawnah, Sameer and Sagahyroon, Assim
- Subjects
SMARTPHONE design & construction, ARTIFICIAL neural networks, ENERGY consumption, ACQUISITION of data, PREDICTION theory
- Abstract
In the work presented in this paper, we use data collected from mobile users over several weeks to develop a neural network-based prediction model for the power consumed by a smartphone. Battery life is critical to the designers of smartphones, and being able to assess scenarios of power consumption, and hence energy usage, is of great value. The models developed attempt to correlate power consumption with users' behavior by using power-related data collected from smartphones with the help of a specially designed logging tool or application. Experiences gained while developing the model regarding the selection of input parameters, the identification of the most suitable NN (neural network) structure, and the training methodology applied are all described in this paper. To the best of our knowledge, this is the first attempt where an NN is used as a vehicle to model smartphones' power, and the results obtained demonstrate that NN models can provide reasonably accurate estimates; therefore, further investigation of their use in this modeling problem is justified. [ABSTRACT FROM AUTHOR] (An illustrative sketch of a small neural-network power regressor follows this entry.)
- Published
- 2017
- Full Text
- View/download PDF
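The modeling idea summarized in the preceding entry can be pictured with a small feed-forward regressor that maps usage features to power draw. The features, layer sizes, and synthetic data below are assumptions for illustration only; they are not the authors' inputs, network structure, or measurements.

```python
# Hypothetical sketch: a small feed-forward NN regressing smartphone power (mW)
# from usage features. Feature set, layer sizes, and data are illustrative only.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 500
# Assumed features: screen brightness [0..1], CPU load [0..1], radio active {0,1}
X = np.column_stack([rng.uniform(0, 1, n), rng.uniform(0, 1, n), rng.integers(0, 2, n)])
# Synthetic "measured" power: base + contributions + noise (stand-in for logged data)
y = 300 + 900 * X[:, 0] + 1200 * X[:, 1] + 600 * X[:, 2] + rng.normal(0, 50, n)

model = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=5000, random_state=0)
model.fit(X[:400], y[:400])
print("mean abs. error on held-out samples (mW):",
      np.abs(model.predict(X[400:]) - y[400:]).mean())
```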
5. Low-level trace correlation on heterogeneous embedded systems.
- Author
-
Bertauld, Thomas and Dagenais, Michel
- Subjects
DEBUGGING, EMBEDDED computer systems, LINUX operating systems, SYSTEMS on a chip, HETEROGENEOUS computing, STATISTICAL correlation
- Abstract
Tracing is a common method used to debug, analyze, and monitor various systems. Even though standard tools and tracing methodologies exist for standard and distributed environments, this is not the case for heterogeneous embedded systems. This paper proposes to fill this gap and discusses how efficient tracing can be achieved without having common system tools, such as the Linux Trace Toolkit (LTTng), at hand on every core. We propose a generic solution to trace embedded heterogeneous systems and overcome the challenges brought by their peculiar architectures (little available memory, bare-metal CPUs, or exotic components, for instance). The solution described in this paper focuses on a generic way of correlating traces among different kinds of processors through trace synchronization, to analyze the global state of the system as a whole. The proposed solution was first tested on the Adapteva Parallella board. It was then improved and thoroughly validated on TI's Keystone 2 System-on-Chip (SoC). [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
6. Design and evaluation of medical ultrasonic adaptive beamforming algorithm implementation on heterogeneous embedded computing platform.
- Author
-
Chen, Junying, Chen, Jinhui, and Min, Huaqing
- Subjects
MEDICAL ultrasonics, EMBEDDED computer systems, BEAMFORMING, ADAPTIVE computing systems, GRAPHICS processing units, ALGORITHMS
- Abstract
Medical ultrasonic imaging has been utilized in a variety of clinical diagnoses for many years. Recently, because of the need for portable and mobile medical ultrasonic diagnosis, the development of real-time medical ultrasonic imaging algorithms on embedded computing platforms has become a rising research direction. Typically, the delay-and-sum beamforming algorithm is implemented on embedded medical ultrasonic scanners. Such an algorithm is the easiest to implement at real-time frame rates, but its image quality is not high enough for complicated diagnostic cases. As a result, the minimum-variance adaptive beamforming algorithm for medical ultrasonic imaging is considered in this paper, which shows much higher image quality than delay-and-sum beamforming. However, minimum-variance adaptive beamforming is a complicated algorithm with O(n) computational complexity. Consequently, it is not easy to implement such an algorithm on an embedded computing platform at real-time frame rates. On the other hand, the GPU is a well-known parallel computing platform for image processing. Therefore, an embedded GPU computing platform is considered as a potential real-time implementation platform for the minimum-variance beamforming algorithm in this paper. By applying the described effective implementation strategies, the GPU implementation of the minimum-variance beamforming algorithm performed more than 100 times faster than the ARM implementation on the same heterogeneous embedded platform. Furthermore, the platform power consumption, computation energy efficiency, and platform cost efficiency of the experimental heterogeneous embedded platforms were also evaluated, which demonstrated that the investigated heterogeneous embedded computing platforms are suitable for building real-time portable or mobile high-quality medical ultrasonic imaging devices. [ABSTRACT FROM AUTHOR] (A brief sketch of the minimum-variance weight computation follows this entry.)
- Published
- 2017
- Full Text
- View/download PDF
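As background for the preceding entry, the core of minimum-variance (Capon) beamforming is the per-pixel weight computation w = R⁻¹a / (aᴴR⁻¹a). The sketch below shows only that arithmetic on synthetic real-valued channel data; the channel count, the diagonal loading, and the absence of subarray averaging are simplifying assumptions, and this is not the authors' GPU implementation.

```python
# Minimal sketch of the minimum-variance (Capon) weight computation that the
# entry above accelerates on GPU. Synthetic channel data; no subarray averaging
# or tuned diagonal loading as a real ultrasound pipeline would use.
import numpy as np

rng = np.random.default_rng(1)
n_ch = 32                                # receive channels after delay alignment
x = rng.normal(size=(n_ch, 64))          # delayed channel samples for one pixel window
R = x @ x.T / x.shape[1]                 # spatial covariance estimate
R += 1e-2 * np.trace(R) / n_ch * np.eye(n_ch)   # diagonal loading for stability
a = np.ones(n_ch)                        # steering vector (already delay-compensated)

w = np.linalg.solve(R, a)
w /= a @ w                               # w = R^-1 a / (a^H R^-1 a)
pixel_value = w @ x[:, 0]                # adaptive weighted sum for this sample
print(pixel_value)
```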
7. Design and Architectures for Signal and Image Processing.
- Author
-
Rupp, Markus, Erdogan, Ahmet T., and Granado, Bertrand
- Subjects
IMAGE processing, RAPID prototyping
- Abstract
An introduction to the journal is presented in which the author discusses the articles "Multicore software defined radio architecture for GNSS receiver signal processing", "An open framework for rapid prototyping of signal processing applications" and "Run-time HW/SW scheduling of data flow applications on reconfigurable architectures" published within the issue.
- Published
- 2009
- Full Text
- View/download PDF
8. Real-time semi-partitioned scheduling of fork-join tasks using work-stealing.
- Author
-
Maia, Cláudio, Yomsi, Patrick, Nogueira, Luís, and Pinho, Luis
- Subjects
EMBEDDED computer systems, REAL-time computing
- Abstract
This paper extends the work presented in Maia et al. (Semi-partitioned scheduling of fork-join tasks using work-stealing, 2015), where we address the semi-partitioned scheduling of real-time fork-join tasks on multicore platforms. The proposed approach consists of two phases: an offline phase, where we adopt a multi-frame task model to perform the task-to-core mapping so as to improve the schedulability and the performance of the system, and an online phase, where we use the work-stealing algorithm to exploit tasks' parallelism among cores with the aim of improving the system responsiveness. The objective of this work is twofold: (1) to provide an alternative scheduling technique that takes advantage of the semi-partitioned properties to accommodate fork-join tasks that cannot be scheduled in any pure partitioned environment and (2) to reduce the migration overheads, which have been shown to be a traditional major source of non-determinism for global scheduling approaches. In this paper, we consider different allocation heuristics and we evaluate the behavior of two of them when they are integrated within our approach. The simulation results show an improvement of up to 15% for the proposed heuristic over the state of the art in terms of the average response time per task set. [ABSTRACT FROM AUTHOR] (A toy work-stealing sketch follows this entry.)
- Published
- 2017
- Full Text
- View/download PDF
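The online phase described in the preceding entry relies on work-stealing. The toy sketch below shows the basic deque discipline (owners pop from one end, idle cores steal from the other); the worker count, task names, and random victim selection are illustrative assumptions, and none of the paper's real-time mapping or schedulability analysis is modeled.

```python
# Toy illustration of work-stealing: each core keeps a deque of ready subtasks,
# pops from the tail itself, and an idle core steals from the head of a victim.
from collections import deque
import random

class Worker:
    def __init__(self, wid):
        self.wid = wid
        self.tasks = deque()

    def push(self, task):
        self.tasks.append(task)          # owner pushes/pops at the tail

    def pop(self):
        return self.tasks.pop() if self.tasks else None

    def steal(self):
        return self.tasks.popleft() if self.tasks else None   # thieves take the head

workers = [Worker(i) for i in range(4)]
for j in range(20):                      # fork 20 subtasks onto core 0
    workers[0].push(f"subtask-{j}")

done = []
while any(w.tasks for w in workers):
    for w in workers:
        task = w.pop()
        if task is None:                 # idle: steal from a random victim
            victim = random.choice([v for v in workers if v is not w])
            task = victim.steal()
        if task is not None:
            done.append((w.wid, task))
print(len(done), "subtasks executed")
```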
9. Multiple Word-Length High-Level Synthesis.
- Author
-
Coussy, Philippe, Lhairech-Lebreton, Ghizlane, and Heller, Dominique
- Subjects
DIGITAL signal processing, EMBEDDED computer systems, DIGITAL electronics, APPLICATION-specific integrated circuits, COMPUTER simulation, AUTOMATION
- Abstract
Digital signal processing (DSP) applications are nowadays widely used and their complexity is ever growing. The design of dedicated hardware accelerators is thus still needed in system-on-chip and embedded systems. A realistic hardware implementation requires first converting the floating-point data of the initial specification into arbitrary-length (finite-precision) data while keeping an acceptable computation accuracy. Next, an optimized hardware architecture has to be designed. Considering a uniform bit-width specification allows the use of a traditional automated design flow. However, it leads to oversized designs. On the other hand, considering a non-uniform bit-width specification yields a smaller circuit but requires complex design tasks. In this paper, we propose an approach that inputs a C/C++ specification. The design flow, based on high-level synthesis (HLS) techniques, automatically generates a potentially pipelined RTL architecture described in VHDL. Both bit-accurate integer and fixed-point data types can be used in the input specification. The generated architecture uses components (operator, register, etc.) that have different widths. The design constraints are the clock period and the throughput of the application. The proposed approach considers data word-length information in all the synthesis steps by using dedicated algorithms. We show in this paper the effectiveness of the proposed approach through several design experiments in the DSP domain. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
10. On Definition of a Formal Model for IEC 61499 Function Blocks.
- Author
-
Dubinin, Victor and Vyatkin, Valeriy
- Subjects
EMBEDDED computer systems, COMPUTERS, DISTRIBUTED operating systems (Computers), AUTOMATIC control systems, CONTROL theory (Engineering), UBIQUITOUS computing
- Abstract
A formal model of the IEC 61499 syntax and its unambiguous execution semantics are important for the adoption of this international standard in industry. This paper proposes some elements of such a model. Elements of the IEC 61499 architecture are defined in a formal way following set-theory notation. Based on this description, the formal semantics of IEC 61499 can be defined. An example is shown in this paper for the execution of basic function blocks. The paper also provides a solution for flattening hierarchical function block networks. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
11. pn: A Tool for Improved Derivation of Process Networks.
- Author
-
Verdoolaege, Sven, Nikolov, Hristo, and Stefanov, Todor
- Subjects
COMPUTERS, EMBEDDED computer systems, MULTIPROCESSORS, CARTOGRAPHY software, TELECOMMUNICATION
- Abstract
Current emerging embedded System-on-Chip platforms are increasingly becoming multiprocessor architectures. System designers experience significant difficulties in programming these platforms. The applications are typically specified as sequential programs that do not reveal the available parallelism in an application, thereby hindering the efficient mapping of an application onto a parallel multiprocessor platform. In this paper, we present our compiler techniques for facilitating the migration from a sequential application specification to a parallel application specification using the process network model of computation. Our work is inspired by a previous research project called Compaan. With our techniques we address optimization issues such as the generation of process networks with simplified topology and communication without sacrificing the process networks' performance. Moreover, we describe a technique for compile-time memory requirement estimation which we consider as an important contribution of this paper. We demonstrate the usefulness of our techniques on several examples. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
12. Automatic Generation of Spatial and Temporal Memory Architectures for Embedded Video Processing Systems.
- Author
-
Norell, Håkan, Lawal, Najeem, and O'Nils, Mattias
- Subjects
FIELD programmable gate arrays, COMPUTER systems, COMPUTER storage devices, COMPUTER architecture, COMPUTERS
- Abstract
This paper presents a tool for automatic generation of the memory management implementation for spatial and temporal real-time video processing systems targeting field programmable gate arrays (FPGAs). The generator creates all the necessary memory and control functionality for a functional spatio-temporal video processing system. The required memory architecture is automatically optimized and mapped to the FPGAs' memory resources, thus producing an efficient implementation in terms of internal resource usage. The results in this paper show that the tool is able to efficiently and automatically generate all required memory management modules for both spatial and temporal real-time video processing systems. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
13. Characterization of a Reconfigurable Free-Space Optical Channel for Embedded Computer Applications with Experimental Validation Using Rapid Prototyping Technology.
- Author
-
Gil-Otero, Rafael, Lim, Theodore, and Snowdon, John F.
- Subjects
FREE space optical interconnects, OPTICAL interconnects, RAPID prototyping, ADAPTIVE computing systems, CROSSTALK, COMPUTER systems
- Abstract
Free-space optical interconnects (FSOIs) are widely seen as a potential solution to current and future bandwidth bottlenecks for parallel processors. In this paper, an FSOI system called the optical highway (OH) is proposed. The OH uses polarizing beam splitter-liquid crystal plate (PBS/LC) assemblies to perform reconfigurable beam combination functions. The properties of the OH make it suitable for embedding complex network topologies such as a completely connected mesh or a hypercube. This paper proposes the use of rapid prototyping technology for implementing an optomechanical system suitable for studying the reconfigurable characteristics of a free-space optical channel. Additionally, it reports how the limited contrast ratio of the optical components can affect the attenuation of the optical signal and the crosstalk caused by misdirected signals. Different techniques are also proposed in order to increase the optical modulation amplitude (OMA) of the system. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
14. Linear precoding design for massive MIMO based on the minimum mean square error algorithm.
- Author
-
Ge, Zhou and Haiyan, Wu
- Subjects
LINEAR systems, MIMO systems, MEAN square algorithms, ERROR analysis in mathematics, FEEDBACK control systems
- Abstract
Compared with traditional multiple-input multiple-output (MIMO) systems, the large number of transmit antennas in massive MIMO makes it more dependent on limited feedback in practical systems. In this paper, we study the problem of precoding design for a massive MIMO system with limited feedback via minimizing the mean square error (MSE). The feedback from mobile users to the base station (BS) is first considered; the BS can obtain quantized information regarding the direction of the channels. Then, the precoding is designed by considering the effect of both the noise term and the quantization error under a transmit power constraint. Simulation results show that the proposed scheme is robust to the channel uncertainties caused by quantization errors. [ABSTRACT FROM AUTHOR] (A sketch of a standard MMSE linear precoder follows this entry.)
- Published
- 2017
- Full Text
- View/download PDF
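For the precoding problem in the preceding entry, a standard MMSE (regularized zero-forcing) linear precoder can be written as W ∝ Hᴴ(HHᴴ + (Kσ²/P)I)⁻¹. The sketch below computes that precoder from a known channel matrix; the antenna counts, power constraint, and the absence of quantized limited feedback are assumptions, so it illustrates the baseline operation rather than the paper's limited-feedback design.

```python
# Sketch of a standard MMSE (regularized zero-forcing) linear precoder of the kind
# studied in the entry above. Quantized limited feedback is not modeled here: the
# precoder is computed from the (assumed known) channel H.
import numpy as np

rng = np.random.default_rng(2)
n_tx, n_users, P, sigma2 = 64, 8, 1.0, 0.1
H = (rng.normal(size=(n_users, n_tx)) + 1j * rng.normal(size=(n_users, n_tx))) / np.sqrt(2)

# W = H^H (H H^H + (K*sigma2/P) I)^-1, then scaled to meet the power constraint
W = H.conj().T @ np.linalg.inv(H @ H.conj().T + (n_users * sigma2 / P) * np.eye(n_users))
W *= np.sqrt(P / np.trace(W @ W.conj().T).real)

s = (rng.integers(0, 2, n_users) * 2 - 1).astype(complex)   # BPSK symbols for illustration
x = W @ s                                                    # transmitted antenna vector
print("per-user effective channel magnitudes:", np.abs(np.diag(H @ W)).round(2))
```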
15. A low cost and fast controller architecture for multimedia data storage and retrieval to flash-based storage device.
- Author
-
Banerjee, Samiran and Mukhopadhyay, Sumitra
- Subjects
MULTIMEDIA systems, INFORMATION storage & retrieval systems, COMPUTER storage devices, ELECTRONIC systems, SYSTEMS on a chip
- Abstract
Real-time multimedia data access plays an important role in electronic systems: as data processing speed decreases and communication, storage, and retrieval times increase, the overall response time of real-time applications grows. Therefore, in this paper, a novel real-time, fast, low-cost, system-on-chip (SoC) controller is proposed and implemented, with which large volumes of data can be efficiently stored in and retrieved from flash memory cards. It is implemented using only a hardware description language (HDL) on a field programmable gate array (FPGA) chip, without using any other on-board or external hardware resources or high-level languages. The entire controller architecture, in a single chip, contains five different modules and is designed using a finite state machine (FSM)-based approach. The modules are the card initialization module (CINM), idle module (IM), card read module (CRM), card write module (CWM), and decision module (DM). The architecture is completely synthesized for a Spartan 3E xc3s500e-4-fg320 FPGA with only 5% of the total logic utilization. Experimental results for microSD, SD, and SDHC cards of different sizes show that the architecture uses less hardware and fewer clock cycles for card initialization and single/multi-block read/write procedures. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
16. Vehicle networking data-upload strategy based on mobile cloud services.
- Author
-
Yang, Jie, Wang, Jin, Wan, Li, and Liu, Xiaobing
- Subjects
DATA transmission systems, SOCIAL networks, SOCIAL services, DECISION making, INFORMATION processing
- Abstract
Because the traditional vehicle network communication architecture is based on dedicated short-range communication, it is difficult to meet the quality-of-service demands of vehicle networking data transmission. The relevant data can instead be uploaded to a server through a mobile gateway and then transmitted to the target vehicle by decision of the server, which extends the data broadcast domain and greatly reduces the data transmission delay. Combining this with the idea of mobile cloud services, a new network architecture and data transmission method is proposed in this paper. We first describe the specific process by which a gateway service registers its cloud service information. Secondly, we propose a method to select the cloud service gateway; the method combines historical cloud data and real-time data and dynamically determines the gateway service provider and its service scope. Service consumers that receive the gateway's broadcast messages consider the communication load, link stability, channel quality, and other performance parameters to select the best gateway service provider, and then transmit their data to that provider, which uploads it to the cloud. Finally, the transmission performance of the proposed method is evaluated for different traffic scenarios. The results show that the proposed method obtains a shorter transmission delay and ensures a higher transmission success rate, and the theoretical analysis herein proves the validity of the method. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
17. Feedback recurrent neural network-based embedded vector and its application in topic model.
- Author
-
Li, Lian-sheng, Gan, Sheng-jiang, and Yin, Xiang-dong
- Subjects
VOCABULARY, ARTIFICIAL neural networks
- Abstract
While mining topics in a document collection, in order to capture the relationships between words and further improve the effectiveness of the discovered topics, this paper proposes a feedback recurrent neural network-based topic model. We represent each word as a one-hot vector and embed each document into a low-dimensional vector space. During document embedding, we apply the long short-term memory method to capture the backward relationships between words and propose a feedback recurrent neural network to capture the forward relationships between words. In the topic model, we use the original and muted document pairs as positive samples and the original and random document pairs as negative samples to train the model. The experiments show that the proposed model not only consumes less running time and memory but also achieves better effectiveness during topic analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
18. A novel method for the approximation of multiplierless constant matrix vector multiplication.
- Author
-
Aksoy, Levent, Flores, Paulo, and Monteiro, José
- Subjects
DIGITAL signal processing, LINEAR programming, MATRICES (Mathematics), COSINE transforms, DIGITAL image processing
- Abstract
Since human beings have limited perceptual abilities, in many digital signal processing (DSP) applications, e.g., image and video processing, the outputs do not need to be computed accurately. Instead, their computation can be approximated so that the area, delay, and/or power dissipation of the design can be reduced. This paper presents an approximation algorithm, called aura, for the multiplierless design of the constant matrix vector multiplication (CMVM) which is a ubiquitous operation in DSP systems. aura aims to tune the constants such that the resulting matrix leads to a CMVM design which requires the fewest adders/subtractors, satisfying the given error constraints. This paper also introduces its modified version, called aura-dc, which can reduce the delay of the CMVM operation with a small increase in the number of adders/subtractors. Experimental results show that the proposed algorithms yield significant reductions in the number of adders/subtractors with respect to the original realizations without violating the error constraints, and consequently lead to CMVM designs with less area, delay, and power dissipation. Moreover, they can generate alternative CMVM designs under different error constraints, enabling a designer to choose the one that fits best in an application. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
19. Adaptive Aloha anti-collision algorithms for RFID systems.
- Author
-
Zheng, Feng and Kaiser, Thomas
- Subjects
ALGORITHM research, RADIO frequency identification systems
- Abstract
In this paper, we propose two adaptive frame size Aloha algorithms, namely adaptive frame size Aloha 1 (AFSA1) and adaptive frame size Aloha 2 (AFSA2), for solving the radio frequency identification (RFID) multiple-tag anti-collision problem. In AFSA1 and AFSA2, the frame size in the next frame is adaptively changed according to the real-time collision rate measured in the current frame. It is shown that AFSA1 and AFSA2 can significantly improve the transmission efficiency of RFID systems compared to the static Aloha, and AFSA2 produces transmission efficiency similar to that of the electronic product code (EPC) Q-selection algorithm (Variant II), while the mean identification delay of AFSA2 is much smaller than that of the EPC Q-selection algorithm (Variant II). It is also shown that the transmission efficiency of AFSA2 and EPC Variant II is very close to its upper bound, which is obtained by assuming that the reader knows the number of unidentified tags. It is worth noting that when the threshold of the collision rate is chosen to be 0.5 or 0.6, AFSA2 can maintain the transmission efficiency well above 0.65 for the case of a typical EPC code length of 96 bits and for the investigated range of tag population, i.e., from 2 to 1000, while keeping the mean identification delay below ten transmit contentions. Only a very light computational burden is needed at the reader: it needs only to measure the collision rate in the current frame and then double or halve the frame size accordingly. No additional computational burden is required at the tag side. [ABSTRACT FROM AUTHOR] (A small simulation sketch of this double-or-halve rule follows this entry.)
- Published
- 2016
- Full Text
- View/download PDF
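The frame-size adaptation summarized in the preceding entry can be simulated in a few lines: after each frame the reader measures the collision rate and doubles or halves the frame size around a threshold. The tag population, frame-size limits, and threshold below are assumptions, and the exact AFSA1/AFSA2 rules from the paper are not reproduced.

```python
# Simulation sketch of the double-or-halve frame-size rule described in the entry
# above. This follows the abstract's description, not the exact AFSA1/AFSA2 specs.
import random

def run_inventory(n_tags=200, frame=16, threshold=0.5, seed=0):
    rng = random.Random(seed)
    slots_used = 0
    while n_tags > 0:
        counts = [0] * frame
        for _ in range(n_tags):
            counts[rng.randrange(frame)] += 1           # each tag picks a slot
        singles = sum(1 for c in counts if c == 1)      # tags identified this frame
        collided = sum(1 for c in counts if c > 1)
        used = sum(1 for c in counts if c > 0)
        slots_used += frame
        n_tags -= singles
        collision_rate = collided / used if used else 0.0
        if collision_rate > threshold:
            frame = min(frame * 2, 1024)                # too many collisions: grow
        elif collision_rate < threshold:
            frame = max(frame // 2, 1)                  # mostly singles/empties: shrink
    return slots_used

print("total slots to identify 200 tags:", run_inventory())
```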
20. Symbolic execution and timed automata model checking for timing analysis of Java real-time systems.
- Author
-
Luckow, Kasper, Păsăreanu, Corina, and Thomsen, Bent
- Subjects
REAL-time computing, JAVA programming language, COMPUTER software execution
- Abstract
This paper presents SymRT, a tool based on a combination of symbolic execution and real-time model checking for timing analysis of Java systems. Symbolic execution is used for the generation of a safe and tight timing model of the analyzed system capturing the feasible execution paths. The model is combined with suitable execution environment models capturing the timing behavior of the target host platform including the Java virtual machine and complex hardware features such as caching. The complete timing model is a network of timed automata which directly facilitates safe estimates of worst and best case execution time to be determined using the Uppaal model checker. Furthermore, the integration of the proposed techniques into the TetaSARTS tool facilitates reasoning about additional timing properties such as the schedulability of periodically and sporadically released Java real-time tasks (under specific scheduling policies), worst case response time, and more. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
21. Design and Architectures for Signal and Image Processing.
- Author
-
Rupp, Markus, Milojevic, Dragomir, and Gogniat, Guy
- Subjects
EMBEDDED computer systems, SYSTEMS design
- Abstract
The article discusses various reports published within the issue, including one about a technique for enforcing a flexible block size, disparity range, and frame rate for hardware-based embedded adaptive stereo-vision systems, one on dynamic partial reconfiguration (DPR) in professional electronics applications, and another on fixed-point system design.
- Published
- 2008
- Full Text
- View/download PDF
22. A new test set compression scheme for circular scan.
- Author
-
Zhang, Ling, Mei, Junjin, and Yan, Bowu
- Subjects
SET theory, DATA compression, HARDWARE, SELF-testing (Computer science), GENETIC algorithms, PROBLEM solving
- Abstract
A new test data compression scheme for circular scan is proposed in this paper. For circular scan, the response of the previous test vector is used as the next test vector’s template, and only the conflicting bits between the previous response and the next vector need to be updated. To reduce the test data volume and test application time, the problem addressed here is minimizing the number of conflicting bits by optimally reordering the test vectors. Each vector represents a city, and the number of conflicting bits between two test vectors is regarded as the distance between them. Thus, the problem corresponds to the travelling salesman problem (TSP), which is NP-complete. A genetic algorithm is used to solve this problem. The experimental results show that the proposed scheme can reduce the test data volume efficiently without any additional hardware cost. [ABSTRACT FROM AUTHOR] (A sketch of the reordering step, using a greedy heuristic in place of the genetic algorithm, follows this entry.)
- Published
- 2018
- Full Text
- View/download PDF
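The reordering step at the heart of the preceding entry treats each test vector as a city and the number of conflicting bits as the distance. Lacking a circuit model to produce responses, the sketch below uses plain Hamming distance between the vectors themselves and a greedy nearest-neighbour tour instead of the paper's response-based cost and genetic algorithm; the vector count and width are arbitrary assumptions.

```python
# Sketch of the vector-reordering idea behind the entry above. Hamming distance
# between vectors and a greedy nearest-neighbour tour stand in for the paper's
# response-based conflicting-bit cost and its genetic-algorithm search.
import random

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(3)
vectors = [[rng.randint(0, 1) for _ in range(32)] for _ in range(50)]

order = [0]
remaining = set(range(1, len(vectors)))
while remaining:                                  # greedy TSP-style tour
    last = order[-1]
    nxt = min(remaining, key=lambda j: hamming(vectors[last], vectors[j]))
    order.append(nxt)
    remaining.remove(nxt)

def total_cost(seq):
    return sum(hamming(vectors[seq[i]], vectors[seq[i + 1]]) for i in range(len(seq) - 1))

print("bits to update, original order:", total_cost(list(range(len(vectors)))))
print("bits to update, reordered     :", total_cost(order))
```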
23. Dynamically Reconfigurable Architectures.
- Author
-
Bergmann, Neil, Platzner, Marco, and Teich, Jürgen
- Subjects
FIELD programmable gate arrays, INTERNET protocols
- Abstract
The article discusses various reports published within the issue, including one on the prerouting of field programmable gate array (FPGA) cores in a dynamic reconfigurable system and another on the efficient integration of pipelined Internet protocol (IP) blocks.
- Published
- 2007
- Full Text
- View/download PDF
24. Adaptive security monitoring for next-generation routers
- Author
-
Mansour, Christopher and Chasaki, Danai
- Published
- 2019
- Full Text
- View/download PDF
25. Low-Power Distributed Kalman Filter for Wireless Sensor Networks.
- Author
-
Abdelgawad, A. and Bayoumi, M.
- Subjects
DISTRIBUTED algorithms, WIRELESS sensor networks, KALMAN filtering, DISTRIBUTED computing, ENERGY consumption
- Abstract
Distributed estimation algorithms have attracted a lot of attention in the past few years, particularly in the framework of Wireless Sensor Networks (WSNs). The Distributed Kalman Filter (DKF) is one of the most fundamental distributed estimation algorithms for scalable wireless sensor fusion. Most DKF methods proposed in the literature rely on consensus filter algorithms. The convergence rate of such distributed consensus algorithms typically depends on the network topology. This paper proposes a low-power DKF. The proposed DKF is based on a fast polynomial filter. The idea is to apply a polynomial filter to the network matrix that shapes its spectrum in order to increase the convergence rate by minimizing its second-largest eigenvalue. Fast convergence can contribute to significant energy savings. In order to implement the DKF in a WSN, further power saving is needed. Since multiplication is the atomic operation of the Kalman filter, saving power at the multiplication level can significantly impact the energy consumption of the DKF. This paper also proposes a novel lightweight and low-power multiplication algorithm. The proposed algorithm aims to decrease the number of instruction cycles, save power, and reduce memory storage without increasing the code complexity or sacrificing accuracy. [ABSTRACT FROM AUTHOR] (A small numerical sketch of the spectrum-shaping idea follows this entry.)
- Published
- 2011
- Full Text
- View/download PDF
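The spectrum-shaping idea in the preceding entry, applying a polynomial to the consensus weight matrix so that its second-largest eigenvalue magnitude (and hence the convergence time) shrinks while the consensus eigenvalue 1 is preserved, can be demonstrated numerically. The ring topology, the symmetric weights, and the degree-2 polynomial searched below are illustrative assumptions; the paper's filter design is more elaborate.

```python
# Numerical sketch of spectrum shaping for consensus: replace W by p(W) with
# p(1) = 1 (average consensus preserved) and a smaller second-largest eigenvalue
# magnitude, which governs the convergence rate.
import numpy as np

n = 20
W = np.zeros((n, n))
for i in range(n):                                # symmetric weights on a ring
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1 / 3
    W[i, i] = 1 / 3

def second_largest(mat):
    lam = np.sort(np.abs(np.linalg.eigvals(mat)))
    return lam[-2]

best = (second_largest(W), 0.0)                   # c2 = 0 is plain W
for c2 in np.linspace(-1.0, 2.0, 301):            # p(W) = (1 - c2) W + c2 W^2, p(1) = 1
    best = min(best, (second_largest((1 - c2) * W + c2 * (W @ W)), float(c2)))

print("second-largest |eigenvalue| of W     : %.4f" % second_largest(W))
print("best degree-2 polynomial (c2 = %+.2f): %.4f" % (best[1], best[0]))
```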
26. A Systematic Development Methodology for Mixed-Mode Behavioral Models of In-Vehicle Embedded Electronic Systems.
- Author
-
Muller, Candice, Valle, Maurizio, Prodanov, William, and Buzas, Roman
- Subjects
EMBEDDED computer systems, RELIABILITY in engineering, SAFETY factor in engineering, SIMULATION methods & models, COMPUTER networks
- Abstract
The rising demands for safety, power-weight reduction, and comfort make the in-vehicle network of embedded electronic systems very complex. In particular, system reliability is essential, especially because of the safety requirements. Test and verification of the entire in-vehicle network by means of behavioral simulations are increasingly widely adopted. To this aim, behavioral models that faithfully represent the behavior of mixed-mode embedded systems are essential for achieving reliable simulation results. This paper presents a systematic development methodology for mixed-mode behavioral models of in-vehicle embedded systems. The methodology allows achieving accurate models, which provide reliable system simulations. The model development methodology is described and the results of the methodology applied to two case studies are presented: (1) the mixed-mode behavioral model of a generic FlexRay physical layer transceiver and (2) the mixed-mode behavioral model of a CAN bus transceiver integrated circuit. The simulation results show that behavioral simulations are much faster than transistor-level simulations. Moreover, behavioral simulations are flexible, which allows quickly changing and verifying the communication network topology compared with hardware prototypes. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
27. A Platform-Based Methodology for System-Level Mixed-Signal Design.
- Author
-
Nuzzo, Pierluigi, Xuening Sun, Chang-Ching Wu, De Bernardinis, Fernando, and Sangiovanni-Vincentelli, Alberto
- Subjects
ELECTRONIC systems, SYSTEMS design, ELECTRONIC circuits, ROBUST control, COMPUTER networks
- Abstract
The complexity of today's embedded electronic systems as well as their demanding performance and reliability requirements are such that their design can no longer be tackled with ad hoc techniques while still meeting tight time-to-market constraints. In this paper, we present a system-level design approach for electronic circuits, utilizing the platform-based design (PBD) paradigm as the natural framework for mixed-domain design formalization. In PBD, a meet-in-the-middle approach allows systematic exploration of the design space through a series of top-down mappings of system constraints onto component feasibility models in a platform library, which is based on bottom-up characterizations. In this framework, new designs can be assembled from the precharacterized library components, giving the highest priority to design reuse, correct assembly, and efficient design flow from specifications to implementation. We apply concepts from design centering to enforce robustness to modeling errors as well as process, voltage, and temperature variations, which are currently plaguing embedded system design in deep-submicron technologies. The effectiveness of our methodology is finally shown on the design of a pipeline A/D converter and two receiver front-ends for UMTS and UWB communications. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
28. High Speed 3D Tomography on CPU, GPU, and FPGA.
- Author
-
GAC, Nicolas, Mancini, Stéphane, Desvignes, Michel, and Houzet, Dominique
- Subjects
POSITRON emission tomography, CACHE memory, TOMOGRAPHY, SYSTEMS on a chip, IMAGE reconstruction
- Abstract
Back-projection (BP) is a costly computational step in tomography image reconstruction such as positron emission tomography (PET). To reduce the computation time, this paper presents a pipelined, prefetch, and parallelized architecture for PET BP (3PA-PET). The key feature of this architecture is its original memory access strategy, masking the high latency of the external memory. Indeed, the pattern of the memory references to the acquired data hinders the processing unit. The memory access bottleneck is overcome by an efficient use of the intrinsic temporal and spatial locality of the BP algorithm. A loop reordering allows an efficient use of general-purpose processors' caches, for software implementations, as well as of the 3D predictive and adaptive cache (3D-AP cache) when considering hardware implementations. Parallel hardware pipelines are also efficient thanks to a hierarchical 3D-AP cache: each pipeline performs a memory reference in about one clock cycle to reach a computational throughput close to 100%. The 3PA-PET architecture is prototyped on a system on programmable chip (SoPC) to validate the system and to measure its expected performance. Time performance is compared with that of a desktop PC, a workstation, and a graphics processing unit (GPU). [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
29. Stream Execution on Embedded Wide-Issue Clustered VLIW Architectures.
- Author
-
Yan, Shan and Lin, Bill
- Subjects
PARALLEL processing, CARTOGRAPHY software, MULTIMEDIA communications, SYSTEMS on a chip, SCALABILITY, DIGITAL communications, ALGORITHMS
- Abstract
Very long instruction word- (VLIW-) based processors have become widely adopted as a basic building block in modern System-on-Chip designs. Advances in clustered VLIW architectures have extended the scalability of the VLIW architecture paradigm to a large number of functional units and very-wide-issue widths. A central challenge with wide-issue clustered VLIW architecture is the availability of programming and automated compiler methods that can fully utilize the available computational resources. Existing compilation approaches for clustered-VLIW architectures are based on extensions of previously developed scheduling algorithms that primarily focus on the maximization of instruction-level parallelism (ILP). However, many applications do not have sufficient ILP to fully utilize a large number of functional units. On the other hand, many applications in digital communications and multimedia processing exhibit enormous amounts of data-level parallelism (DLP). For these applications, the streaming programming paradigm has been developed to explicitly expose coarse-grained data-level parallelism as well as the locality of communication between coarse-grained computation kernels. In this paper, we investigate the mapping of stream programs to wide-issue clustered VLIW processors. Our work enables designers to leverage their existing investments in VLIW-based architecture platforms to harness the advantages of the stream programming paradigm. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
30. An Evaluation of Dynamic Partial Reconfiguration for Signal and Image Processing in Professional Electronics Applications.
- Author
-
Manet, Philippe, Maufroid, Daniel, Tosi, Leonardo, Gailliard, Gregory, Mulertt, Olivier, Di Ciano, Marco, Legat, Jean-Didier, Aulagnier, Denis, Gamrat, Christian, Liberati, Raffaele, La Barba, Vincenzo, Cuvelier, Pol, Rousseau, Bertrand, and Gelineau, Paul
- Subjects
IMAGE processing, SIGNAL processing, SOFTWARE radio, PROFESSIONAL employees, APPLICATION software, DIGITAL image processing
- Abstract
Signal and image processing applications require a lot of computing resources. For low-volume applications, as in professional electronics, FPGAs are used in combination with DSPs and GPPs in order to reach the performance required by the product roadmaps. Nevertheless, FPGA designs are static, which raises a flexibility issue with new complex or software-defined applications like software-defined radio (SDR). In this scope, dynamic partial reconfiguration (DPR) is used to bring a virtualization layer upon the static hardware of FPGAs. During the last decade, DPR has been widely studied in academia. Nevertheless, there are very few real applications using it, and therefore there is a lack of feedback on the relevant issues to address in order to improve its applicability. This paper evaluates the interest and limitations of using DPR in professional electronics applications and provides guidelines to improve its applicability. It makes a fair evaluation based on experiments conducted on a set of signal and image processing applications. It identifies the missing elements of the design flow for using DPR in professional electronics applications. Finally, it introduces a fast reconfiguration manager providing an 84-fold improvement compared to the vendor solution. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
31. An Efficient Segmental Bus-Invert Coding Method for Instruction Memory Data Bus Switching Reduction.
- Author
-
Ji Gu and Hui Guo
- Subjects
EMBEDDED computer systems, SYSTEMS development, VIRTUAL storage (Computer science), COMPUTER science, DECODERS & decoding
- Abstract
This paper presents a bus coding methodology for instruction memory data bus switching reduction. Compared to the existing state-of-the-art multiway partial bus-invert (MPBI) coding, which relies on data bit correlation, our approach is very effective in reducing the switching activity of instruction data buses, since little bit correlation can be observed in instruction data. Our experiments demonstrate that the proposed encoding can reduce switching activity by up to 42%, with an average reduction of 30%, while MPBI achieves just a 17.6% reduction in switching activity. [ABSTRACT FROM AUTHOR] (A generic segmental bus-invert encoder sketch follows this entry.)
- Published
- 2009
- Full Text
- View/download PDF
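The generic building block behind segmental bus-invert schemes such as the one in the preceding entry is shown below: each bus segment is driven either as-is or inverted (with an extra invert line), whichever toggles fewer wires relative to the previous cycle. The 32-bit bus, 8-bit segments, and random word stream are assumptions; the paper's instruction-stream-specific coding is not reproduced, and correlated instruction traffic would show larger savings than random data.

```python
# Generic segmental bus-invert encoder: each segment is sent as-is or inverted
# (asserting a per-segment invert line), whichever causes fewer wire toggles.
import random

def popcount(x):
    return bin(x).count("1")

def encode_stream(words, width=32, seg_bits=8):
    segs = width // seg_bits
    mask = (1 << seg_bits) - 1
    prev_raw = [0] * segs        # previous value on an uncoded reference bus
    prev = [0] * segs            # previous value actually driven on the coded bus
    inv = [0] * segs             # previous state of each segment's invert line
    toggles_plain = toggles_coded = 0
    for w in words:
        for s in range(segs):
            val = (w >> (s * seg_bits)) & mask
            toggles_plain += popcount(val ^ prev_raw[s])
            prev_raw[s] = val
            cost_keep = popcount(val ^ prev[s]) + (inv[s] ^ 0)       # data + invert-line toggles
            cost_flip = popcount((val ^ mask) ^ prev[s]) + (inv[s] ^ 1)
            if cost_flip < cost_keep:
                toggles_coded += cost_flip
                prev[s], inv[s] = val ^ mask, 1
            else:
                toggles_coded += cost_keep
                prev[s], inv[s] = val, 0
    return toggles_plain, toggles_coded

rng = random.Random(4)
plain, coded = encode_stream([rng.getrandbits(32) for _ in range(10000)])
print("switching reduction on random data: %.1f%%" % (100 * (plain - coded) / plain))
```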
32. Data Cache-Energy and Throughput Models: Design Exploration for Embedded Processors.
- Author
-
Qadri, Muhammad Yasir and McDonald-Maier, Klaus D.
- Subjects
EMBEDDED computer systems, CACHE memory, MATHEMATICAL models, COMPUTER architecture, SYSTEMS design
- Abstract
Most modern 16-bit and 32-bit embedded processors contain cache memories to further increase instruction throughput of the device. Embedded processors that contain cache memories open an opportunity for the low-power research community to model the impact of cache energy consumption and throughput gains. For optimal cache memory configuration mathematical models have been proposed in the past. Most of these models are complex enough to be adapted for modern applications like run-time cache reconfiguration. This paper improves and validates previously proposed energy and throughput models for a data cache, which could be used for overhead analysis for various cache types with relatively small amount of inputs. These models analyze the energy and throughput of a data cache on an application basis, thus providing the hardware and software designer with the feedback vital to tune the cache or application for a given energy budget. The models are suitable for use at design time in the cache optimization process for embedded processors considering time and energy overhead or could be employed at runtime for reconfigurable architectures. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
33. Evaluation and Design Space Exploration of a Time-Division Multiplexed NoC on FPGA for Image Analysis Applications.
- Author
-
Linlin Zhang, Fresse, Virginie, Khalid, Mohammed, Houzet, Dominique, and Legrand, Anne-Claire
- Subjects
DATA transmission systems, IMAGE processing, NETWORKS on a chip, FIELD programmable gate arrays, ELECTRONIC circuits, EMBEDDED computer systems
- Abstract
The aim of this paper is to present an adaptable Fat Tree NoC architecture for Field Programmable Gate Arrays (FPGAs) designed for image analysis applications. A traditional Network on Chip (NoC) is not optimal for dataflow applications with large amounts of data. By contrast, point-to-point communications are designed from the algorithm requirements, but they are expensive in terms of resources and wiring. We propose a dedicated communication architecture for image analysis algorithms. This communication mechanism is a generic NoC infrastructure dedicated to dataflow image processing applications, mixing circuit-switching and packet-switching communications. The complete architecture integrates two dedicated communication architectures and reusable IP blocks. Communications are based on the NoC concept to support the high bandwidth required for a large number and variety of data. For data communication inside the architecture, an efficient time-division multiplexed (TDM) architecture is proposed. This NoC uses a Fat Tree (FT) topology with Virtual Channels (VCs) and flit packet-switching with fixed routes. Two versions of the NoC are presented in this paper. The results of their implementations and their Design Space Exploration (DSE) on an Altera Stratix II are analyzed, compared with a point-to-point communication scheme, and illustrated with a multispectral image application. Results show that a point-to-point communication scheme is not efficient for large amounts of multispectral image data communication. An NoC architecture uses only 10% of the memory blocks required for a point-to-point architecture but seven times more logic elements. This resource allocation is better adapted to image analysis algorithms, as memory elements are a critical point in embedded architectures. An FT NoC-based communication scheme for data transfers provides a more appropriate solution for resource allocation. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
34. SmartCell: An Energy Efficient Coarse-Grained Reconfigurable Architecture for Stream-Based Applications.
- Author
-
Cao Liang and Xinming Huang
- Subjects
SYSTEMS on a chip, DATA transmission systems, STREAMING technology, SYSTOLIC array circuits, WAFER-scale integration of circuits
- Abstract
This paper presents SmartCell, a novel coarse-grained reconfigurable architecture, which tiles a large number of processing elements with reconfigurable interconnection fabrics on a single chip. SmartCell is able to provide high-performance and energy-efficient processing for stream-based applications. It can be configured to operate in various modes, such as SIMD, MIMD, and systolic array. This paper describes the SmartCell architecture design, including the processing element, reconfigurable interconnection fabrics, instruction and control process, and configuration scheme. The SmartCell prototype with 64 PEs is implemented using 0.13 μm CMOS standard cell technology. The core area is about 8.5 mm², and the power consumption is about 1.6 mW/MHz. The performance is evaluated through a set of benchmark applications and then compared with FPGA, ASIC, and two well-known reconfigurable architectures, RaPiD and Montium. The results show that SmartCell can bridge the performance and flexibility gap between ASIC and FPGA. It is also about 8% and 69% more energy efficient than the Montium and RaPiD systems for the evaluated benchmarks. Meanwhile, SmartCell achieves 4× and 2× higher throughput compared with Montium and RaPiD, respectively. It is concluded that the SmartCell system is a promising reconfigurable and energy-efficient architecture for stream processing. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
35. An FPGA Implementation of a Parallelized MT19937 Uniform Random Number Generator.
- Author
-
Sriram, Vinay and Kearney, David
- Subjects
COMPUTERS, RANDOM number generators, FIELD programmable gate arrays, SYSTEMS engineering, EMBEDDED computer systems
- Abstract
Recent times have witnessed an increase in the use of high-performance reconfigurable computing for accelerating large-scale simulations. A characteristic of such simulations, like infrared (IR) scene simulation, is the use of large quantities of uncorrelated random numbers. It is therefore of interest to have a fast uniform random number generator implemented in reconfigurable hardware. While there have been previous attempts to accelerate the MT19937 pseudouniform random number generator using FPGAs, we believe that we can substantially improve on the previous implementations to develop a higher-throughput and more area-time-efficient design. Due to the potential for parallel implementation of random number generators, designs that have both a small area footprint and high throughput are to be preferred to ones that have high throughput but significant extra area requirements. In this paper, we first present a single-port design and then present an enhanced 624-port hardware implementation of the MT19937 algorithm. The 624-port hardware implementation, when implemented on a Xilinx XC2VP70-6 FPGA chip, has a throughput of 119.6 × 10⁹ 32-bit random numbers per second, which is more than 17× that of the previously best published uniform random number generator. Furthermore, it has the lowest area-time metric of all currently published FPGA-based pseudouniform random number generators. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
36. Efficient Processing of a Rainfall Simulation Watershed on an FPGA-Based Architecture with Fast Access to Neighbourhood Pixels.
- Author
-
Lee Seng Yeong, Christopher Wing Hong Ngau, Li-Minn Ang, and Kah Phooi Seng
- Subjects
FIELD programmable gate arrays, WATERSHEDS, ARCHITECTURE, RAINFALL
- Abstract
This paper describes a hardware architecture to implement the watershed algorithm using rainfall simulation. The speed of the architecture is increased by utilizing a multiple memory bank approach to allow parallel access to the neighbourhood pixel values. In a single read cycle, the architecture is able to obtain all five values of the centre and four neighbours for a 4-connectivity watershed transform. The storage requirement of the multiple-bank implementation is the same as that of a single-bank implementation, thanks to a graph-based memory bank addressing scheme. The proposed rainfall watershed architecture consists of two parts. The first part performs the arrowing operation and the second part assigns each pixel to its associated catchment basin. The paper describes the architecture datapath and control logic in detail and concludes with an implementation on a Xilinx Spartan-3 FPGA. [ABSTRACT FROM AUTHOR] (A sketch of a conflict-free bank mapping with the same single-cycle access property follows this entry.)
- Published
- 2009
- Full Text
- View/download PDF
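The single-cycle neighbourhood access in the preceding entry requires that the centre pixel and its four neighbours never share a memory bank. One classical way to obtain that property is the modular mapping bank(x, y) = (x + 2y) mod 5, verified below; this mapping is an assumed stand-in for illustration, not the paper's graph-based bank addressing scheme.

```python
# Illustration of conflict-free banking for a 4-connectivity neighbourhood: with
# bank(x, y) = (x + 2*y) % 5, the centre and its four neighbours always fall into
# five distinct banks, so all five pixels can be read in one cycle.
def bank(x, y):
    return (x + 2 * y) % 5

for y in range(1, 63):                   # interior of an assumed 64x64 image
    for x in range(1, 63):
        banks = {bank(x, y), bank(x - 1, y), bank(x + 1, y),
                 bank(x, y - 1), bank(x, y + 1)}
        assert len(banks) == 5, (x, y)
print("centre + 4 neighbours always map to 5 distinct banks")
```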
37. Performance Analysis of Bit-Width Reduced Floating-Point Arithmetic Units in FPGAs: A Case Study of Neural Network-Based Face Detector.
- Author
-
Yongsoon Lee, Younhee Choi, Seok-Bum Ko, and Moon Ho Lee
- Subjects
COMPUTER arithmetic & logic units, ARTIFICIAL neural networks, FLOATING-point arithmetic, COST control, ERRORS
- Abstract
This paper implements a field programmable gate array- (FPGA-) based face detector using a neural network (NN) and a bit-width reduced floating-point arithmetic unit (FPU). The analytical error model, using the maximum relative representation error (MRRE) and the average relative representation error (ARRE), is developed to obtain the maximum and average output errors for the bit-width reduced FPUs. After the development of the analytical error model, the bit-width reduced FPUs and an NN are designed using MATLAB and VHDL. Finally, the analytical (MATLAB) results, along with the experimental (VHDL) results, are compared. The analytical results and the experimental results show agreement in shape. We demonstrate that incremental reductions in the number of bits used can produce significant cost reductions in area, speed, and power. [ABSTRACT FROM AUTHOR] (A sketch measuring the relative representation error of a reduced significand follows this entry.)
- Published
- 2009
- Full Text
- View/download PDF
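The error quantities behind the preceding entry's MRRE/ARRE model can be observed by simply truncating the significand of standard floats. The sketch below keeps only the top m mantissa bits of a float64 and reports the resulting relative error, whose maximum stays below 2⁻ᵐ; the sampled value range and the use of truncation rather than rounding (with an untouched exponent) are simplifying assumptions.

```python
# Sketch of the representation error introduced by reducing a floating-point
# significand to m bits: a float64 mantissa (52 bits) is truncated and the
# maximum and average relative errors are measured empirically.
import struct
import random

def truncate_mantissa(x, m):
    """Keep only the top m of the 52 mantissa bits of a float64."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    bits &= ~((1 << (52 - m)) - 1)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

rng = random.Random(5)
for m in (23, 16, 10, 6):
    rel = [abs(truncate_mantissa(x, m) - x) / abs(x)
           for x in (rng.uniform(0.1, 1000.0) for _ in range(20000))]
    print("m = %2d bits: max rel. err = %.2e (bound 2^-%d), avg rel. err = %.2e"
          % (m, max(rel), m, sum(rel) / len(rel)))
```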
38. Trade-Off Exploration for Target Tracking Application in a Customized Multiprocessor Architecture.
- Author
-
Khan, Jehangir, Niar, Smail, Saghir, Mazen A. R., El-Hillali, Yassin, and Rivenq-Menhaj, Atika
- Subjects
APPLICATION-specific integrated circuits, ELECTRONIC systems, INTEGRATED circuits, MULTIPROCESSORS, INFORMATION technology
- Abstract
This paper presents the design of an FPGA-based multiprocessor-system-on-chip (MPSoC) architecture optimized for Multiple Target Tracking (MTT) in automotive applications. An MTT system uses an automotive radar to track the speed and relative position of all the vehicles (targets) within its field of view. As the number of targets increases, the computational needs of the MTT system also increase, making it difficult for a single processor to handle it alone. Our implementation distributes the computational load among multiple soft processor cores optimized for executing specific computational tasks. The paper explains how we designed and profiled the MTT application to partition it among different processors. It also explains how we applied different optimizations to customize the individual processor cores to their assigned tasks and to assess their impact on performance and FPGA resource utilization. The result is a complete MTT application running on an optimized MPSoC architecture that fits in a contemporary medium-sized FPGA and that meets the application's real-time constraints. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
39. Enhanced Montgomery Multiplication on DSP Architectures for Embedded Public-Key Cryptosystems.
- Author
-
Gastaldo, P., Parodi, G., and Zunino, R.
- Subjects
ALGORITHMS, DIGITAL signal processing, MULTIPLICATION, EMBEDDED computer systems, INTEGRATED circuits, COMPUTER systems
- Abstract
Montgomery's algorithm is a popular technique to speed up modular multiplications in public-key cryptosystems. This paper tackles the efficient support of modular exponentiation on inexpensive circuitry for embedded security services and proposes a variant of the finely integrated product scanning (FIPS) algorithm that is targeted to digital signal processors. The general approach improves on the basic FIPS formulation by removing potential inefficiencies and boosts the exploitation of computing resources. The reformulation of the basic FIPS structure results in a general approach that balances computational efficiency and flexibility. Experimental results on commercial DSP platforms confirm both the method's validity and its effectiveness. [ABSTRACT FROM AUTHOR] (A reference sketch of Montgomery multiplication follows this entry.)
- Published
- 2008
- Full Text
- View/download PDF
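For reference alongside the preceding entry, the sketch below shows textbook Montgomery multiplication in its single-number form, checked against ordinary modular multiplication. The modulus and operands are arbitrary illustrative values, and the paper's word-level finely integrated product scanning (FIPS) schedule is not reproduced. (It assumes Python 3.8+ for the modular inverse via pow.)

```python
# Reference sketch of textbook Montgomery multiplication, the operation that the
# entry above maps onto DSPs. Requires Python 3.8+ for pow(n, -1, r).
def montgomery_setup(n, r_bits):
    r = 1 << r_bits
    n_prime = (-pow(n, -1, r)) % r       # n' such that n * n' == -1 (mod R)
    return r, n_prime

def mont_mul(a_bar, b_bar, n, r_bits, r, n_prime):
    """Return a_bar * b_bar * R^-1 mod n for operands in Montgomery form."""
    t = a_bar * b_bar
    m = (t * n_prime) & (r - 1)          # m = t * n' mod R
    u = (t + m * n) >> r_bits            # (t + m*n) / R, exact by construction
    return u - n if u >= n else u

n = 0xE3F1_9F8B_02C3_0CE5_62BD_52B4_A4F1_2F05   # odd modulus (illustrative value)
r_bits = n.bit_length()
r, n_prime = montgomery_setup(n, r_bits)

a, b = 0x1234_5678_9ABC_DEF0, 0x0FED_CBA9_8765_4321
a_bar, b_bar = (a * r) % n, (b * r) % n          # convert operands to Montgomery domain
c_bar = mont_mul(a_bar, b_bar, n, r_bits, r, n_prime)
c = mont_mul(c_bar, 1, n, r_bits, r, n_prime)    # convert back (multiply by 1)
assert c == (a * b) % n                           # matches plain modular multiplication
print(hex(c))
```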
40. A SystemC-Based Design Methodology for Digital Signal Processing Systems.
- Author
-
Haubelt, Christian, Falk, Joachim, Keinert, Joachim, Schlichter, Thomas, Streubühr, Martin, Deyhle, Andreas, Hadert, Andreas, and Teich, Jürgen
- Subjects
DIGITAL signal processing, EMBEDDED computer systems, COMPUTER software, SPACE exploration, DECODERS (Electronics)
- Abstract
Digital signal processing algorithms are of great importance in many embedded systems. Due to complexity reasons and the restrictions imposed on the implementations, new design methodologies are needed. In this paper, we present a SystemC-based solution supporting automatic design space exploration, automatic performance evaluation, as well as automatic system generation for mixed hardware/software solutions mapped onto FPGA-based platforms. Our proposed hardware/software codesign approach is based on a SystemC-based library called SysteMoC that permits the expression of different models of computation well known in the domain of digital signal processing. It combines the advantages of executability and analyzability of many important models of computation that can be expressed in SysteMoC. We use the example of an MPEG-4 decoder throughout this paper to introduce our novel methodology. Results from a five-dimensional design space exploration and from automatically mapping parts of the MPEG-4 decoder onto a Xilinx FPGA platform demonstrate the effectiveness of our approach. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
41. Design Considerations for Scalable High-Performance Vision Systems Embedded in Industrial Print Inspection Machines.
- Author
-
Fürtler, Johannes, Rössler, Peter, Brodersen, Jörg, Nachtnebel, Herbert, Mayer, Konrad J., Cadek, Gerhard, and Eckel, Christian
- Subjects
EMBEDDED computer systems, OPTICAL quality control, FIELD programmable gate arrays, SIGNAL processing, ALGORITHMS
- Abstract
This paper describes the design of a scalable high-performance vision system which is used in the application area of optical print inspection. The system is able to process hundreds of megabytes of image data per second coming from several high-speed/high-resolution cameras. Due to performance requirements, some functionality has been implemented on dedicated hardware based on a field programmable gate array (FPGA), which is coupled to a high-end digital signal processor (DSP). The paper discusses design considerations like partitioning of image processing algorithms between hardware and software. The main chapters focus on functionality implemented on the FPGA, including low-level image processing algorithms (flat-field correction, image pyramid generation, neighborhood operations) and advanced processing units (programmable arithmetic unit, geometry unit). Verification issues for the complex system are also addressed. The paper concludes with a summary of the FPGA resource usage and some performance results. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
42. Hardware Architecture of Reinforcement Learning Scheme for Dynamic Power Management in Embedded Systems.
- Author
-
Prabha, Viswanathan Lakshmi and ChandraMonie, Elwin
- Subjects
ELECTRONIC systems, POWER Computing computers, REINFORCEMENT learning, SIMULATION methods & models, EMBEDDED computer systems
- Abstract
Dynamic power management (DPM) is a technique to reduce the power consumption of electronic systems by selectively shutting down idle components. In this paper, a novel and nontrivial enhancement of conventional reinforcement learning (RL) is adopted to choose the optimal policy out of the existing DPM policies. A hardware architecture evolved from the VHDL model of the Temporal Difference RL algorithm is proposed in this paper, which can suggest the winning policy to be adopted for any given workload to achieve power savings. The effectiveness of this approach is also demonstrated by an event-driven simulator, which is designed using Java for power-manageable embedded devices. The results show that RL applied to DPM can lead to power savings of up to 28%. [ABSTRACT FROM AUTHOR] (A toy Q-learning sketch of policy selection follows this entry.)
- Published
- 2007
- Full Text
- View/download PDF
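The policy-selection idea in the preceding entry, learning which existing DPM policy wins for the observed workload, can be caricatured with tabular Q-learning. The workload states, candidate timeout policies, and synthetic reward below are invented assumptions; the paper's temporal-difference formulation and its VHDL hardware architecture are not reproduced.

```python
# Toy tabular Q-learning sketch: learn which of several existing DPM policies
# (here, simple shutdown timeouts) to apply for the current workload class.
# States, rewards, and policies are invented for illustration only.
import random

policies = {"aggressive": 1, "moderate": 5, "lazy": 20}      # idle timeout in ms
states = ["bursty", "steady", "idle-heavy"]
Q = {(s, p): 0.0 for s in states for p in policies}
alpha, gamma, eps = 0.2, 0.9, 0.1
rng = random.Random(6)

def reward(state, policy):
    """Synthetic reward: time spent powered down minus a wake-up latency penalty."""
    timeout = policies[policy]
    wakeups = {"bursty": 30, "steady": 10, "idle-heavy": 2}[state]
    idle_ms = {"bursty": 50, "steady": 200, "idle-heavy": 800}[state]
    saved = max(idle_ms - wakeups * timeout, 0)              # time actually powered down
    penalty = wakeups * 3 if timeout < 5 else 0              # frequent deep wake-ups hurt
    return saved - penalty + rng.gauss(0, 5)

state = rng.choice(states)
for _ in range(5000):
    act = (rng.choice(list(policies)) if rng.random() < eps
           else max(policies, key=lambda p: Q[(state, p)]))
    r = reward(state, act)
    nxt = rng.choice(states)                                 # workload drifts randomly
    Q[(state, act)] += alpha * (r + gamma * max(Q[(nxt, p)] for p in policies)
                                - Q[(state, act)])
    state = nxt

for s in states:
    print(s, "-> learned policy:", max(policies, key=lambda p: Q[(s, p)]))
```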
43. ARM-FPGA-based platform for reconfigurable wireless communication systems using partial reconfiguration.
- Author
-
Rihani, Mohamad-Al-Fadl, Mroue, Mohamad, Prévotet, Jean-Christophe, Nouvel, Fabienne, and Mohanna, Yasser
- Subjects
FIELD programmable gate arrays, WIRELESS communications, SYSTEMS on a chip
- Abstract
Today, wireless devices generally feature multiple radio access technologies (LTE, WIFI, WIMAX, ...) to handle a rich variety of standards. These devices should be intelligent and autonomous enough to either reach a given level of performance or automatically select the best available wireless technology according to the availability of standards. On the hardware side, system-on-chip (SoC) devices integrate processors and field-programmable gate array (FPGA) logic fabrics on the same chip with fast interconnection. This allows designing software/hardware systems and implementing new techniques and methodologies that greatly improve the performance of communication systems. In these devices, dynamic partial reconfiguration (DPR) constitutes a well-known technique for reconfiguring only a specific area within the FPGA while other parts continue to operate independently. To evaluate when it is advantageous to perform DPR, adaptive techniques have been proposed. They consist in reconfiguring parts of the system automatically according to specific parameters. In this paper, an intelligent wireless communication system aiming at implementing an adaptive OFDM-based transmitter and performing a vertical handover in heterogeneous networks is presented. A unified physical layer for WIFI-WIMAX networks is also proposed. The system was implemented and tested on a ZedBoard, which features a Xilinx Zynq-7000 SoC. The performance of the system is described, and simulation results are presented in order to validate the proposed architecture. [ABSTRACT FROM AUTHOR] (see the sketch after this entry)
- Published
- 2017
- Full Text
- View/download PDF
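The abstract above raises the question of when dynamic partial reconfiguration pays off. A minimal sketch of such a break-even check follows, assuming an invented reconfiguration time, an invented per-millisecond gain, and a safety margin; it only illustrates the trade-off and is not the adaptive technique implemented on the Zynq-7000.

```python
# Hypothetical break-even check for triggering dynamic partial reconfiguration (DPR):
# reconfigure only if the expected gain over the remaining service time outweighs
# the reconfiguration overhead. All numbers are illustrative, not board measurements.

def worth_reconfiguring(t_reconfig_ms, gain_per_ms, expected_runtime_ms, margin=1.2):
    """Return True if the projected gain exceeds the overhead by a safety margin."""
    projected_gain = gain_per_ms * expected_runtime_ms
    return projected_gain > margin * t_reconfig_ms

# Example: switching the OFDM chain to another standard's configuration.
print(worth_reconfiguring(t_reconfig_ms=12.0, gain_per_ms=0.05, expected_runtime_ms=500.0))
```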
44. Efficient quantization and fixed-point representation for MIMO turbo-detection and turbo-demapping.
- Author
-
Rizk, Mostafa, Baghdadi, Amer, Jézéquel, Michel, Mohanna, Yasser, and Atat, Youssef
- Subjects
SIGNAL quantization ,DIGITAL communications ,WIRELESS communications - Abstract
In the domain of wireless digital communication, floating-point arithmetic is generally used to conduct performance evaluation studies of algorithms. This is typically limited to theoretical performance evaluation in terms of communication quality and error rates. From a practical implementation perspective, using fixed-point arithmetic instead of floating-point significantly reduces implementation costs in terms of area occupation and energy consumption. However, this implies a complex conversion process, particularly if the considered algorithm includes complex arithmetic operations with high accuracy requirements and if the target system has many configuration parameters. In this context, the purpose of this paper is to present an efficient quantization and fixed-point representation for turbo-detection and turbo-demapping. The impact of floating-to-fixed-point conversion is illustrated on the error-rate performance of the receiver for different system configurations. Only a slight degradation in the error-rate performance of the receiver is observed when the detector and demapper modules use the devised quantization and fixed-point arithmetic rather than floating-point arithmetic. [ABSTRACT FROM AUTHOR] (see the sketch after this entry)
- Published
- 2017
- Full Text
- View/download PDF
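As an illustration of the floating-to-fixed-point conversion discussed above, the Python sketch below quantizes a value to a signed fixed-point code with saturation. The word lengths (4 integer and 11 fractional bits) are arbitrary placeholders, not the quantization actually devised in the paper.

```python
# Hypothetical sketch of a simple Q(m.f) fixed-point quantizer with saturation,
# of the kind used when converting a floating-point detector/demapper model.
def to_fixed(x, int_bits=4, frac_bits=11):
    """Quantize x to a signed fixed-point integer code with saturation."""
    scale = 1 << frac_bits
    max_code = (1 << (int_bits + frac_bits)) - 1   # largest signed code
    min_code = -(1 << (int_bits + frac_bits))      # smallest signed code
    code = int(round(x * scale))
    return max(min_code, min(max_code, code))

def to_float(code, frac_bits=11):
    return code / (1 << frac_bits)

llr = 3.14159
code = to_fixed(llr)
print(code, to_float(code), "quantization error:", llr - to_float(code))
```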
45. A Pareto-based scheduler for exploring cost-performance trade-offs for MapReduce workloads.
- Author
-
Zacheilas, Nikos and Kalogeraki, Vana
- Subjects
COMPUTER systems ,RESOURCE allocation ,CYBER physical systems ,INDUSTRIAL clusters - Abstract
In recent years, we have observed an increasing demand for processing large amounts of data. The MapReduce programming model has been utilized by major computing companies and has been integrated into novel cyber-physical systems (CPS) in order to perform large-scale data processing. However, the problem of efficiently scheduling MapReduce workloads in cluster environments, like Amazon's EC2, can be challenging due to the observed trade-off between the need for performance and the corresponding monetary cost. The problem is exacerbated by the fact that cloud providers tend to charge users based on their I/O operations, dramatically increasing the spending budget. In this paper, we describe our approach for scheduling MapReduce workloads in cluster environments taking into consideration the performance/budget trade-off. Our approach makes the following contributions: (i) we propose a novel Pareto-based scheduler for identifying near-optimal resource allocations for user workloads with respect to performance and monetary cost, and (ii) we develop an automatic configuration of basic task parameters that allows us to further minimize the user's spending budget and the jobs' execution times. Our detailed experimental evaluation using both real and synthetic datasets illustrates that our approach improves the performance of the workloads by as much as 50% compared to its competitors. [ABSTRACT FROM AUTHOR] (see the sketch after this entry)
- Published
- 2017
- Full Text
- View/download PDF
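The core of a Pareto-based scheduler is identifying non-dominated configurations in the cost/performance plane. The Python sketch below computes such a Pareto front over a handful of invented cluster configurations; the candidate set, prices, and runtimes are illustrative only and unrelated to the paper's experiments.

```python
# Hypothetical sketch: extract the Pareto front of candidate configurations,
# trading monetary cost against execution time (all values are made up).
candidates = [
    {"conf": "4 small nodes", "cost": 1.2, "time": 95.0},
    {"conf": "8 small nodes", "cost": 2.3, "time": 55.0},
    {"conf": "4 large nodes", "cost": 3.0, "time": 50.0},
    {"conf": "8 large nodes", "cost": 5.8, "time": 32.0},
    {"conf": "2 large nodes", "cost": 1.6, "time": 120.0},
]

def dominates(a, b):
    """a dominates b if it is no worse in both objectives and strictly better in one."""
    return (a["cost"] <= b["cost"] and a["time"] <= b["time"]
            and (a["cost"] < b["cost"] or a["time"] < b["time"]))

pareto = [c for c in candidates
          if not any(dominates(other, c) for other in candidates if other is not c)]
for c in sorted(pareto, key=lambda c: c["cost"]):
    print(c["conf"], c["cost"], c["time"])
```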
46. On the assessment of probabilistic WCET estimates reliability for arbitrary programs.
- Author
-
Milutinovic, Suzana, Abella, Jaume, and Cazorla, Francisco
- Subjects
WORST-case circuit analysis ,ARBITRARY constants ,COMPUTER software execution ,COMPUTER performance ,REAL-time computing - Abstract
Measurement-Based Probabilistic Timing Analysis (MBPTA) has been shown to be an industrially viable method to estimate the Worst-Case Execution Time (WCET) of real-time programs running on processors that include several high-performance features. MBPTA requires hardware/software support so that program execution time, and hence its WCET, has a probabilistic behaviour and can be modelled with probabilistic and statistical methods. MBPTA also requires that those events with a high impact on execution time are properly captured in the R runs made at analysis time. Thus, a representativeness argument is needed to provide evidence that those events have been captured. This paper addresses the MBPTA representativeness problems caused by set-associative caches and presents a novel representativeness validation method (ReVS) for cache placement. Building on cache simulation, ReVS explores the probability and impact (miss count) of those cache placements that can occur during operation. ReVS determines the number of runs R′, which can be higher than R, such that those cache placements with the highest impact are effectively observed in the analysis runs, and hence MBPTA can be reliably applied to estimate the WCET. [ABSTRACT FROM AUTHOR] (see the sketch after this entry)
- Published
- 2017
- Full Text
- View/download PDF
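To give a feel for the cache-placement exploration described above, the following heavily simplified Python sketch replays a fixed address trace under many random address-to-set placements and records the miss count of each, so that rare, high-impact placements become visible. The cache geometry, trace, and number of simulated placements are invented; this is not the ReVS method itself.

```python
# Hypothetical, heavily simplified exploration of random cache placements: each "run"
# draws a random mapping of addresses to sets, replays a fixed trace with per-set LRU,
# and records the miss count. Parameters and the trace are purely illustrative.
import random
from collections import OrderedDict

N_SETS, WAYS = 4, 2
TRACE = [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5] * 8    # addresses of accessed cache lines

def misses_for_placement(seed):
    rng = random.Random(seed)
    placement = {addr: rng.randrange(N_SETS) for addr in set(TRACE)}
    sets = [OrderedDict() for _ in range(N_SETS)]    # per-set LRU state
    misses = 0
    for addr in TRACE:
        s = sets[placement[addr]]
        if addr in s:
            s.move_to_end(addr)                      # hit: refresh LRU position
        else:
            misses += 1
            if len(s) >= WAYS:
                s.popitem(last=False)                # evict least recently used
            s[addr] = True
    return misses

counts = sorted(misses_for_placement(seed) for seed in range(1000))
print("median misses:", counts[500], "worst observed:", counts[-1])
```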
47. Heart rate spectrum analysis for sleep quality detection.
- Author
-
Scherz, Wilhelm, Fritz, Daniel, Velicu, Oana, Seepold, Ralf, and Madrid, Natividad
- Subjects
HEART rate monitoring ,ELECTROCARDIOGRAPHY ,POLYSOMNOGRAPHY ,HEALTH ,SLEEP ,FOURIER transforms - Abstract
To evaluate the quality of sleep, it is important to determine how much time was spent in each sleep stage during the night. The gold standard in this domain is an overnight polysomnography (PSG), but the recording of the necessary electrophysiological signals is extensive and complex, and the environment of the sleep laboratory, which is unfamiliar to the patient, might lead to distorted results. In this paper, a sleep stage detection algorithm is proposed that uses only the heart rate signal, derived from the electrocardiogram (ECG), as a discriminator. This would make it possible for sleep analysis to be performed at home, saving considerable effort and money. From the heart rate, three parameters were calculated using the fast Fourier transform (FFT) in order to distinguish between the different sleep stages. ECG data, along with a hypnogram scored by professionals, was taken from the PhysioNet database, making it easy to compare the results. With an agreement rate of 41.3%, this approach is a good foundation for future research. [ABSTRACT FROM AUTHOR] (see the sketch after this entry)
- Published
- 2017
- Full Text
- View/download PDF
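As an example of the kind of spectral parameters that can be derived from the heart rate, the Python sketch below computes low- and high-frequency band powers of a synthetic, evenly resampled heart-rate signal with an FFT and forms their ratio. The band limits follow common heart-rate-variability conventions and the signal is artificial; the three parameters actually used in the paper are not reproduced here.

```python
# Hypothetical sketch: FFT band powers of a synthetic, resampled heart-rate series.
import numpy as np

fs = 4.0                                   # resampling rate of the heart-rate signal, Hz
t = np.arange(0, 300, 1 / fs)              # 5 minutes of samples
hr = 60 + 2 * np.sin(2 * np.pi * 0.1 * t) + 1.5 * np.sin(2 * np.pi * 0.25 * t)

spectrum = np.abs(np.fft.rfft(hr - hr.mean())) ** 2
freqs = np.fft.rfftfreq(len(hr), d=1 / fs)

def band_power(lo, hi):
    return spectrum[(freqs >= lo) & (freqs < hi)].sum()

lf = band_power(0.04, 0.15)                # low-frequency band
hf = band_power(0.15, 0.40)                # high-frequency band
print("LF/HF ratio:", lf / hf)             # one candidate discriminator between stages
```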
48. Prototypic implementation and evaluation of an artificial DNA for self-descripting and self-building embedded systems.
- Author
-
Brinkschulte, Uwe
- Subjects
CONSTRUCTION materials ,STRUCTURAL design ,BUILDING logistics - Abstract
Embedded systems are growing more and more complex because of the increasing chip integration density, the larger number of chips in distributed applications, and demanding application fields (e.g., in cars and in households). Bio-inspired techniques like self-organization are a key feature for handling this complexity. However, self-organization needs a guideline for setting up and managing the system. In biology, the structure and organization of a system are coded in its DNA. In this paper we present an approach that uses an artificial DNA for that purpose. Since many embedded systems can be composed from a limited number of basic elements, the structure and parameters of such systems can be stored in a compact way, representing an artificial DNA deposited in each processor core. This leads to a self-describing system. Based on the DNA, the self-organization mechanisms can build the system autonomously, providing a self-building system. System repair and optimization at runtime are also possible, leading to higher robustness, dependability, and flexibility. We present a prototypic implementation and conduct a real-time evaluation using a flexible robot vehicle. Depending on the DNA, this vehicle acts as a self-balancing vehicle, an autonomous guided vehicle, a follower, or a combination of these. [ABSTRACT FROM AUTHOR] (see the sketch after this entry)
- Published
- 2017
- Full Text
- View/download PDF
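To illustrate how a compact, self-describing system description might look, here is a hypothetical Python sketch of a DNA-like list of basic elements with parameters and links, from which each node builds its part of the application. The element identifiers, fields, and factory registry are invented and are not the encoding used in the paper.

```python
# Hypothetical sketch of a compact "artificial DNA"-style description: a list of basic
# elements with parameters and links, instantiated in order on each node.
from dataclasses import dataclass

@dataclass
class Gene:
    elem_id: int        # which basic element to instantiate (e.g. sensor, PID, actuator)
    params: tuple       # element parameters
    inputs: tuple       # indices of earlier genes whose outputs feed this element

DNA = [
    Gene(elem_id=1, params=(100,),          inputs=()),     # sensor sampled at 100 Hz
    Gene(elem_id=7, params=(1.2, 0.4, 0.0), inputs=(0,)),   # PID controller
    Gene(elem_id=3, params=(),              inputs=(1,)),   # motor actuator
]

def build_system(dna, registry):
    """Instantiate elements in order; each core could run this on its local DNA copy."""
    instances = []
    for gene in dna:
        factory = registry[gene.elem_id]
        instances.append(factory(gene.params, [instances[i] for i in gene.inputs]))
    return instances

# Toy factories that just record what they were built from.
registry = {eid: (lambda p, ins, eid=eid: {"elem": eid, "params": p, "in": ins})
            for eid in (1, 3, 7)}
print(build_system(DNA, registry))
```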
49. A novel power model for future heterogeneous 3D chip-multiprocessors in the dark silicon age
- Author
-
Asad, Arghavan, Dorostkar, Aniseh, and Mohammadi, Farah
- Published
- 2018
- Full Text
- View/download PDF
50. Efficient embedded architectures for fast-charge model predictive controller for battery cell management in electric vehicles
- Author
-
Madsen, Anne K. and Perera, Darshika G.
- Published
- 2018
- Full Text
- View/download PDF