231 results for "Theo Ungerer"
Search Results
2. PIMP My Many-Core: Pipeline-Integrated Message Passing
- Author
-
Jörg Mische, Martin Frieb, Alexander Stegmeier, and Theo Ungerer
- Subjects
Electrical, electronic and information engineering, Software engineering, Computer hardware & architecture, ddc:004, Software, Information Systems, Theoretical Computer Science
- Abstract
To improve scalability, several many-core architectures use message passing instead of shared-memory accesses for communication. Unfortunately, message passing is usually emulated with Direct Memory Access (DMA) transfers in a shared address space, which entails a lot of overhead and thwarts the advantages of message passing. Recently proposed register-level message-passing alternatives use special instructions to send the contents of a single register to another core. The reduced communication overhead and architectural simplicity lead to good many-core scalability. After investigating several other approaches in terms of hardware complexity and throughput overhead, we recommend a small instruction-set extension that enables register-level message passing at minimal hardware cost and describe its integration into a classical five-stage RISC-V pipeline.
- Published
- 2020
- Full Text
- View/download PDF
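The register-level message passing described in this abstract can be pictured as each core pushing a single register directly into another core's receive FIFO, with no DMA engine or shared address space in between. The following is a minimal Python sketch of that idea; the class names and FIFO semantics are illustrative, not the paper's actual ISA extension:

```python
from collections import deque

class Core:
    """Toy model of a core with a pipeline-integrated message unit:
    'send' pushes one register-sized word straight into the destination
    core's receive FIFO, 'recv' pops the oldest word."""
    def __init__(self, cid, fabric):
        self.cid = cid
        self.fabric = fabric
        self.rx = deque()              # hardware receive FIFO

    def send(self, dest, value):
        # one 'send' instruction: register content travels to another core
        self.fabric[dest].rx.append((self.cid, value))

    def recv(self):
        # one 'recv' instruction: returns (sender id, value)
        return self.rx.popleft()

fabric = {}
for i in range(4):
    fabric[i] = Core(i, fabric)

fabric[0].send(3, 42)                  # core 0 sends a register to core 3
sender, value = fabric[3].recv()
print(sender, value)                   # -> 0 42
```

Blocking on an empty FIFO is omitted for brevity; the point is that a message costs one instruction on each side rather than a DMA setup in shared memory.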
3. Trustworthy self-optimization for organic computing environments using multiple simultaneous requests
- Author
-
Theo Ungerer and Nizar Msadek
- Subjects
Self-organization, Service (systems architecture), Computer science, Distributed computing, Networking & telecommunications, Workload, Organic computing, Self-optimization, Autonomic computing, Trustworthiness, Hardware and Architecture, Node (computer science), Software, Computer network
- Abstract
Open distributed systems are rapidly becoming more complex. It is therefore essential that such systems adapt autonomously to changes in their environment; they should exhibit so-called self-x properties such as self-configuration, self-optimization and self-healing. The autonomous optimization of nodes at runtime in open distributed environments is a crucial part of developing self-optimizing systems. In this paper, we present a self-optimization approach that considers not only pure load balancing but also trust, in order to improve the assignment of important services to trustworthy nodes. Our approach uses different optimization strategies to determine whether a service should be transferred to another node. The evaluation results show that the proposed approach balances the workload between nodes nearly optimally. Moreover, it significantly improves the availability of important services: the achieved availability was no lower than 85% of the maximum theoretical value.
- Published
- 2017
- Full Text
- View/download PDF
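The combination of load balancing and trust described in this abstract can be illustrated with a simple greedy placement: services are placed in order of decreasing importance, and each goes to the node with the best trust-minus-load score, so important services gravitate to trustworthy, lightly loaded nodes. This is a hypothetical sketch of that spirit, not the paper's actual optimization strategies:

```python
def assign_services(services, trust, alpha=1.0):
    """services: (name, importance, cost); trust: node -> trust score.
    Greedy trust-aware load balancing: higher trust attracts services,
    accumulated load (weighted by alpha) pushes them away."""
    load = {n: 0.0 for n in trust}
    placement = {}
    for name, importance, cost in sorted(services, key=lambda s: -s[1]):
        best = max(trust, key=lambda n: trust[n] - alpha * load[n])
        placement[name] = best
        load[best] += cost
    return placement, load

# hypothetical nodes and services
trust = {"A": 0.9, "B": 0.6, "C": 0.3}
services = [("db", 3, 0.5), ("log", 1, 0.5), ("auth", 2, 0.5)]
placement, load = assign_services(services, trust)
print(placement)   # most important service lands on most trusted node A
```

Because the most important service is placed first, it claims the most trustworthy node before load pressure diverts later services elsewhere.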
4. Investigating Transactional Memory for High Performance Embedded Systems
- Author
-
Sebastian Weis, Christian Piatka, Theo Ungerer, Florian Haas, Sebastian Altmeyer, and Rico Amslinger
- Subjects
Transaction management, Abort, Computer science, Embedded system, Transactional memory, Workload
- Abstract
We present a Transaction Management Unit (TMU) for Hardware Transactional Memories (HTMs). Our TMU enables three different contention management strategies, which can be applied according to the workload. Additionally, the TMU enables unbounded transactions in terms of size. Our approach tackles two challenges of traditional HTMs: (1) potentially high abort rates, (2) missing support for unbounded transactions. By enhancing a simulator with a transactional memory and our TMU, we demonstrate that our TMU achieves speedups of up to 4.2 and reduces abort rates by a factor of up to 11.6 for some of the STAMP benchmarks.
- Published
- 2020
- Full Text
- View/download PDF
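A contention manager like the TMU's decides, on each conflict, which transaction aborts. The policies below are illustrative stand-ins for selectable strategies (the paper defines its own three); the sketch only shows the shape of such a decision function:

```python
def resolve(policy, requester, holder):
    """Return the transaction that must abort when 'requester' conflicts
    with 'holder'. Policies are illustrative, not the TMU's actual ones."""
    if policy == "requester_loses":
        return requester                  # conservative: retry the newcomer
    if policy == "holder_loses":
        return holder                     # aggressive: evict the owner
    if policy == "timestamp":             # older transaction wins
        return requester if requester["start"] > holder["start"] else holder
    raise ValueError(policy)

t1 = {"id": 1, "start": 10}
t2 = {"id": 2, "start": 20}
print(resolve("timestamp", t2, t1)["id"])   # younger t2 aborts -> 2
```

Switching the policy per workload is exactly what makes abort rates tunable: timestamp ordering avoids livelock, while the simpler policies trade fairness for cheap hardware.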
5. Hardware multiversioning for fail-operational multithreaded applications
- Author
-
Florian Haas, Sebastian Weis, Theo Ungerer, Christian Piatka, Rico Amslinger, and Sebastian Altmeyer
- Subjects
Triple modular redundancy, Multi-core processor, Computer science, Transactional memory, Networking & telecommunications, Fault tolerance, Lockstep, Thread (computing), Redundancy (engineering), ddc:004, Field-programmable gate array, Computer hardware
- Abstract
Modern safety-critical embedded applications like autonomous driving need to be fail-operational. At the same time, high performance and low power consumption are demanded. A common way to achieve this is the use of heterogeneous multi-cores. When applied to such systems, prevalent fault tolerance mechanisms suffer from some disadvantages: Some (e.g. triple modular redundancy) require a substantial amount of duplication, resulting in high hardware costs and power consumption. Others (e.g. lockstep) require supplementary checkpointing mechanisms to recover from errors. Further approaches (e.g. software-based process-level redundancy) cannot handle the indeterminism introduced by multithreaded execution. This paper presents a novel approach for fail-operational systems using hardware transactional memory, which can also be used for embedded systems running heterogeneous multi-cores. Each thread is automatically split into transactions, which then execute redundantly. The hardware transactional memory is extended to support multiple versions, which allows the reproduction of atomic operations and recovery in case of an error. In our FPGA-based evaluation, we executed the PARSEC benchmark suite with fault tolerance on 12 cores.
- Published
- 2020
6. Support for the logical execution time model on a time-predictable multicore processor
- Author
-
Florian Kluge, Martin Schoeberl, and Theo Ungerer
- Subjects
Multi-core processor, Computer science, Compositionality, Message passing, Parallel computing, Logical execution time, Execution time, Bottleneck, Shared memory, Computer Science (miscellaneous), Engineering (miscellaneous)
- Abstract
The logical execution time (LET) model increases the compositionality of real-time task sets. Removal or addition of tasks does not influence the communication behavior of other tasks. In this work, we extend a multicore operating system running on a time-predictable multicore processor to support the LET model. For communication between tasks we use message passing on a time-predictable network-on-chip to avoid the bottleneck of shared memory. We report our experiences and present results on the costs in terms of memory and execution time.
- Published
- 2016
- Full Text
- View/download PDF
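The key property of the LET model is that a task's inputs are read at its release instant and its outputs become visible exactly at the end of its LET interval, independent of when the task actually executed in between; that is why adding or removing tasks does not change the communication behavior of the others. A small Python sketch of the resulting deterministic read/write instants (task parameters are made up):

```python
def let_schedule(tasks, horizon):
    """tasks: (name, period, let_length). Inputs are read at each release;
    outputs are published exactly at release + LET length, regardless of
    the task's actual execution time inside that window."""
    events = []
    for name, period, let in tasks:
        release = 0
        while release < horizon:
            events.append((release, "read", name))
            events.append((release + let, "write", name))
            release += period
    return sorted(events)

# hypothetical tasks: (name, period, LET length), all in time units
for ev in let_schedule([("sense", 10, 4), ("ctrl", 20, 8)], 20):
    print(ev)
```

Every "write" instant is fixed by the task parameters alone, which is what makes communication between tasks time-predictable and compositional.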
7. WCTT bounds for MPI primitives in the PaterNoster NoC
- Author
-
Jörg Mische, Theo Ungerer, Martin Frieb, and Alexander Stegmeier
- Subjects
Distributed computing, Schedule, Computer science, Parallel computing, Multiplexing, Message size, Computer Science (miscellaneous), Engineering (miscellaneous)
- Abstract
This paper applies several variants of application-independent time-division multiplexing to MPI primitives and investigates their applicability for different scopes of communication. The scopes are characterized by the size of the network-on-chip, the number of participating nodes, and the message size sent to each receiver or received from each sender, respectively. The evaluation shows that none of the observed variants features the lowest worst-case traversal time in all situations. Instead, there are multiple schedule variants that each perform best in a different scope of communication parameters.
- Published
- 2016
- Full Text
- View/download PDF
8. A functional programming model for embedded dataflow applications
- Author
-
Florian Haas, Theo Ungerer, Christoph Kuhbacher, and Christian Mellwig
- Subjects
Functional programming, Dataflow, Computer science, Fault tolerance, Parallel computing, Shared memory, Apache Spark, Computer cluster, x86, Execution model
- Abstract
In this paper, we present a functional programming model and a dataflow execution model similar to those of distributed computing frameworks like Apache Spark and Flink. Our programming and execution model is suitable for any platform, although its main target is safety-critical embedded systems. We therefore emphasize low overhead, timing analyzability, and potential support for fault tolerance. We implemented our design for the x86 shared-memory platform and show that its performance is comparable to that of OpenMP.
- Published
- 2019
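The Spark/Flink-like style of API referred to in this abstract chains lazy transformations into a dataflow graph and only executes when an action is called. A tiny Python sketch of that style (purely illustrative; it mirrors the kind of interface such frameworks expose, not the authors' implementation):

```python
from functools import reduce as _reduce

class Dataset:
    """Tiny Spark-like lazy dataset: transformations (map, filter) only
    record operations; actions (collect, reduce) execute the pipeline."""
    def __init__(self, source, ops=()):
        self.source = source
        self.ops = ops

    def map(self, f):
        return Dataset(self.source, self.ops + (("map", f),))

    def filter(self, p):
        return Dataset(self.source, self.ops + (("filter", p),))

    def collect(self):
        out = list(self.source)
        for kind, f in self.ops:
            out = [f(x) for x in out] if kind == "map" else [x for x in out if f(x)]
        return out

    def reduce(self, f):
        return _reduce(f, self.collect())

# sum of squares of even numbers below 10
squares = Dataset(range(10)).filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(squares.reduce(lambda a, b: a + b))   # 0+4+16+36+64 -> 120
```

Because the graph is explicit before execution, such a model lends itself to the static timing analysis the abstract emphasizes: the operators and their data dependencies are known ahead of time.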
9. PIMP My Many-Core: Pipeline-Integrated Message Passing
- Author
-
Theo Ungerer, Jörg Mische, Alexander Stegmeier, and Martin Frieb
- Subjects
Address space, Computer science, Message passing, Software engineering, Pipeline (software), Shared memory, Theory of computation, Scalability, Overhead (computing), Direct memory access, Computer network
- Abstract
To improve scalability, several many-core architectures use message passing instead of shared-memory accesses for communication. Unfortunately, message passing is usually emulated with Direct Memory Access (DMA) transfers in a shared address space, which entails a lot of overhead and thwarts the advantages of message passing.
- Published
- 2019
- Full Text
- View/download PDF
10. Memristoren für zukünftige Rechnersysteme
- Author
-
Theo Ungerer, Wolfgang Karl, and Dietmar Fey
- Subjects
Physics, Computer Science Applications, Information Systems
- Abstract
A memristor is a class of two-terminal electronic devices whose current/voltage curve exhibits a pinched hysteresis loop, usually passing through the origin ("if it's pinched it's a memristor"). Memristors offer attractive properties such as high storage densities, low electrical power for writing and reading, multi-bit storage capability, and CMOS-compatible manufacturing processes. Moreover, memristors can be used not only to store but also to process data. Memristors will change future computing systems at several levels and shift them towards memory-centric architectures. This begins with still largely conventional approaches such as storage-class memories, a new level of the memory hierarchy between main memory and background storage, and extends to unconventional near- and in-memory architectures, in which data is stored and processed without spatially long and therefore comparatively energy-intensive data movement between processors and memory. Furthermore, by directly emulating synapses as adjustable electrical resistances, closely following the biological model of a neural network, memristors offer manifold possibilities for realizing neuromorphic and biologically inspired circuits and architectures.
- Published
- 2020
- Full Text
- View/download PDF
11. A trustworthy, fault-tolerant and scalable self-configuration algorithm for Organic Computing systems
- Author
-
Rolf Kiefhaber, Nizar Msadek, and Theo Ungerer
- Subjects
Self-organization, Computer science, Distributed computing, Fault tolerance, Organic computing, Load balancing (computing), Contract Net Protocol, Trustworthiness, Hardware and Architecture, Scalability, Algorithm, Software, Computer network
- Abstract
The growing complexity of today's computing systems requires a large amount of administration, which makes manual administration a serious challenge. Therefore, new ways have to be found to manage these systems autonomously. They should be characterized by so-called self-x properties such as self-configuration, self-optimization, self-healing and self-protection. The autonomous assignment of services to nodes in a distributed way is a crucial part of developing self-configuring systems. In this paper, we introduce a self-configuration algorithm for Organic Computing systems that aims, on the one hand, to distribute the load of services equally across nodes, as in a typical load-balancing scenario, and, on the other hand, to assign services with different importance levels to nodes so that more important services are placed on more trustworthy nodes. Furthermore, the proposed algorithm includes a fault-handling mechanism that enables the system to continue hosting services even in the presence of faults. The evaluation indicates that the proposed approach is suitable for large-scale distributed systems.
- Published
- 2015
- Full Text
- View/download PDF
12. Redundant Execution on Heterogeneous Multi-cores Utilizing Transactional Memory
- Author
-
Theo Ungerer, Rico Amslinger, Florian Haas, Christian Piatka, and Sebastian Weis
- Subjects
Multi-core processor, Computer science, Transactional memory, Fault tolerance, Lockstep, Embedded system, Cache
- Abstract
Cycle-by-cycle lockstep execution as implemented by current embedded processors is unsuitable for energy-efficient heterogeneous multi-cores, because the different cores are not cycle synchronous. Furthermore, current and future safety-critical applications demand fail-operational execution, which requires mechanisms for error recovery.
- Published
- 2018
- Full Text
- View/download PDF
13. Analysing real-time behaviour of collective communication patterns in MPI
- Author
-
Jörg Mische, Theo Ungerer, Alexander Stegmeier, and Martin Frieb
- Subjects
Computer science, Computation, Distributed computing, Collective communication, Shared memory, Single-core, Massively parallel
- Abstract
Worst-case execution time (WCET) analysis is crucial for designing real-time systems. While the WCET of tasks in a single-core system can be upper-bounded in isolation, tasks in a many-core system are subject to shared-memory interference, which leads to highly overestimated WCET bounds. However, manycore-based massively parallel applications will enter the area of real-time systems in the years ahead. Explicit message passing and a clear separation of computation and communication facilitate WCET analysis for such programs; the separation is ensured especially when collective communication is applied. We propose a process for analysing state-of-the-art communication patterns with respect to worst-case timing. As MPI is the standard for collective communication, we use it to show how to evaluate the timing behaviour in detail. We compare different communication patterns and show the tremendous impact of choosing an appropriate one.
- Published
- 2018
14. Lightweight Hardware Synchronization for Avoiding Buffer Overflows in Network-on-Chips
- Author
-
Martin Frieb, Jörg Mische, Theo Ungerer, and Alexander Stegmeier
- Subjects
Computer science, Network-on-chip, Execution time, Software development process, Software, Synchronization (computer science), Interrupt, Computer hardware, Buffer overflow
- Abstract
Buffer overflows are a serious problem when running message-passing programs on network-on-chip-based many-core processors. A simple synchronization mechanism ensures that data is transferred only when nodes need it, thereby avoiding full buffers and interruptions at other times. However, software synchronization cannot fully achieve these objectives, because its flits may still interrupt nodes or fill buffers. Therefore, we propose lightweight hardware synchronization. It requires only small architectural changes, comprises only very small components, and scales well. To control the hardware-supported synchronization, we add two new assembler instructions. Furthermore, we show the difference in the software development process and evaluate the impact on the execution time of global communication operations and on the required receive buffer slots.
- Published
- 2018
- Full Text
- View/download PDF
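One common way to guarantee that a receive buffer can never overflow, in the spirit of the synchronization described above, is credit-based flow control: a sender may only inject a flit when the receiver has granted it a buffer slot. The following sketch assumes credit semantics for illustration; the paper's own mechanism (controlled by two assembler instructions) may differ in detail:

```python
from collections import deque

class Link:
    """Credit-style synchronization sketch: each outstanding credit
    corresponds to a free receive-buffer slot, so sending without a
    credit is simply impossible and no overflow interrupt is needed."""
    def __init__(self, slots):
        self.credits = slots
        self.buf = deque()

    def try_send(self, flit):
        if self.credits == 0:
            return False            # would overflow: sender must hold the flit
        self.credits -= 1
        self.buf.append(flit)
        return True

    def receive(self):
        flit = self.buf.popleft()
        self.credits += 1           # hardware returns the freed slot as a credit
        return flit

link = Link(slots=2)
print(link.try_send("a"), link.try_send("b"), link.try_send("c"))  # True True False
print(link.receive())                                              # a
print(link.try_send("c"))                                          # True
```

The invariant `credits + len(buf) == slots` holds at every step, which is what makes the guarantee checkable in hardware with a tiny counter.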
15. A hard real-time capable multi-core SMT processor
- Author
-
Eduardo Quinones, Stefan Metzlaff, Theo Ungerer, Jörg Mische, Mike Gerdes, Marco Paolieri, Sascha Uhrig, and Francisco J. Cazorla
- Subjects
Multi-core processor, Computer science, Parallel computing, Execution time, Task (computing), Worst-case execution time, Hardware and Architecture, Multithreading, Predictability, Software
- Abstract
Hard real-time applications in safety-critical domains require high performance and timing analyzability. Multi-core processors are an answer to these demands; however, inter-task interference makes multi-cores harder to analyze from a worst-case execution time point of view than single-core processors. We propose a multi-core SMT processor that ensures a bounded maximum delay a task can suffer due to inter-task interference. Multiple hard real-time tasks can be executed on different cores together with additional non-real-time tasks. Our evaluation shows that the proposed MERASA multi-core provides predictability for hard real-time tasks and also high performance for non-hard-real-time tasks.
- Published
- 2013
- Full Text
- View/download PDF
16. Fault-Tolerant Execution on COTS Multi-core Processors with Hardware Transactional Memory Support
- Author
-
Sebastian Weis, Theo Ungerer, Florian Haas, Gilles Pokam, and Youfeng Wu
- Subjects
Multi-core processor, Xeon, Computer science, Transactional memory, Fault tolerance, Lockstep, Parallel computing, Software, Embedded system, x86, Instrumentation (computer programming)
- Abstract
The demand for fault-tolerant execution on high-performance computer systems increases due to higher fault rates resulting from smaller structure sizes. As an alternative to hardware-based lockstep solutions, software-based fault-tolerance mechanisms can increase the reliability of multi-core commercial-off-the-shelf (COTS) CPUs while being cheaper and more flexible. This paper proposes a software/hardware hybrid approach, which targets Intel's current x86 multi-core platforms of the Core and Xeon family. We leverage hardware transactional memory (Intel TSX) to support implicit checkpoint creation and fast rollback. Redundant execution of processes and signature-based comparison of their computations provide error detection, and transactional wrapping enables error recovery. Existing applications are enhanced for fault-tolerant redundant execution by post-link binary instrumentation. Hardware enhancements that further increase the applicability of the approach are proposed and evaluated with SPEC CPU 2006 benchmarks. The resulting performance overhead is 47% on average, assuming the existence of the proposed hardware support.
- Published
- 2017
- Full Text
- View/download PDF
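The scheme described above combines three ingredients: redundant execution, signature comparison for error detection, and transactional rollback for recovery. The toy sketch below puts them together in Python; the fault-injection hook, the use of CRC32 as the signature, and the retry policy are illustrative assumptions, not the paper's implementation:

```python
import zlib

def run_redundant(step, state, max_retries=3, faulty_runs=()):
    """Run 'step(state)' twice per 'transaction', compare result
    signatures, and roll back and retry on mismatch. 'faulty_runs'
    injects a transient fault on the listed attempts (for testing)."""
    attempt = 0
    while True:
        leading = step(state)
        trailing = step(state)
        if attempt in faulty_runs:                  # simulated transient error
            trailing = trailing + 1
        sig_lead = zlib.crc32(repr(leading).encode())
        sig_trail = zlib.crc32(repr(trailing).encode())
        if sig_lead == sig_trail:
            return leading                          # signatures agree: commit
        attempt += 1                                # mismatch: rollback, retry
        if attempt > max_retries:
            raise RuntimeError("unrecoverable fault")

# a transient fault on the first attempt is detected and masked by retry
result = run_redundant(lambda s: s * 2 + 1, 20, faulty_runs={0})
print(result)   # -> 41
```

Because the state is only mutated on commit, rollback here is free; in the real system the hardware transaction discards the speculative memory state instead.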
17. Evaluation of fine-grained parallelism in AUTOSAR applications
- Author
-
Bert Bodekker, Sebastian Kehr, Theo Ungerer, Milos Panic, Christian Bradatsch, Alexander Stegmeier, and Dave George
- Subjects
Multi-core processor, Computer science, Serialization, Legacy system, Parallel computing, Task (computing), AUTOSAR, Parallelism, Algorithmic skeleton, Critical path method
- Abstract
Parallelization of AUTOSAR legacy software is a fundamental step towards exploiting the performance of multi-core electronic control units (ECUs). However, communication between runnables causes serialization, so intra-task parallelization can introduce large idle intervals if a task contains a long critical path. Distributing the instructions of a runnable over cores (fine-grained parallelism) can shorten this serialization, but requires an efficient and timing-analyzable implementation. This paper investigates the efficiency of fine-grained parallelism for reducing the worst-case execution time of automotive applications. A pattern-supported parallelization approach is applied to extract parallelism from runnables in a structured way. Algorithmic skeletons are used to implement fine-grained parallelism dynamically (assignment at runtime) and statically (a priori assignment). The performance evaluation showed that the static assignment is as efficient as a state-of-the-art barrier. Thereby, parallelism is explicitly expressed in a model and implemented in a timing-analyzable way.
- Published
- 2017
18. Adapting TDMA arbitration for measurement-based probabilistic timing analysis
- Author
-
Carles Hernandez, Eduardo Quinones, Francisco J. Cazorla, Theo Ungerer, Jaume Abella, Milos Panic, and Barcelona Supercomputing Center
- Subjects
Probabilistic analysis, Computer Networks and Communications, Computer science, Time division multiple access, Software, Worst-case execution time, Artificial Intelligence, Arbitration policy, Probabilistic analysis of algorithms, Processor design, Electronic engineering, Static timing analysis, Time randomization, Real-time programming, Task (computing), Hardware and Architecture, Embedded system, High-performance processors
- Abstract
Critical Real-Time Embedded Systems require functional and timing validation to prove that they will perform their functionalities correctly and in time. For timing validation, a bound on the Worst-Case Execution Time (WCET) of each task is derived and passed as an input to the scheduling algorithm to ensure that tasks execute timely. Bounds on the WCET can be derived with deterministic timing analysis (DTA) and probabilistic timing analysis (PTA), each of which relies on certain predictability properties of the underlying hardware/software platform. In particular, specific hardware designs are needed for both DTA and PTA, which challenges their adoption by hardware vendors. This paper makes a step towards reconciling the hardware needs of DTA and PTA timing analyses to increase the likelihood of those hardware designs being adopted by hardware vendors. In particular, we show how Time Division Multiple Access (TDMA), which has been regarded as one of the main DTA-compliant arbitration policies, can be used in the context of PTA and, in particular, of the industrially friendly Measurement-Based PTA (MBPTA). We show how the execution time measurements taken as input for MBPTA need to be padded to obtain reliable and tight WCET estimates on top of TDMA-arbitrated hardware resources with no further hardware support. Our results show that TDMA delivers tighter WCET estimates than MBPTA-friendly arbitration policies, whereas MBPTA-friendly policies provide higher average performance. Thus, the best policy to choose depends on the particular needs of the end user.
The research leading to these results has been funded by the EU FP7 under grant agreements no. 611085 (PROXIMA) and 287519 (parMERASA). This work has also been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant TIN2015-65316-P and the HiPEAC Network of Excellence. Miloš Panić is funded by the Spanish Ministry of Education under FPU grant FPU12/05966. Jaume Abella has been partially supported by the MINECO under Ramón y Cajal postdoctoral fellowship number RYC-2013-14717.
- Published
- 2017
- Full Text
- View/download PDF
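The padding idea described above can be illustrated with a deliberately simplified TDMA model: in the worst case, every bus access issued by the task just misses its own slot and has to wait nearly a full TDMA round before being served. The formula below is this sketch's assumption, not the paper's exact padding rule:

```python
def pad_measurement(measured, accesses, slot_len, cores):
    """Pad a measured execution time (in cycles) for a TDMA-arbitrated
    resource. Simplified worst case per access: the request arrives one
    cycle after its slot started, so it waits cores * slot_len - 1 cycles
    for the next slot of its core."""
    worst_wait_per_access = cores * slot_len - 1
    return measured + accesses * worst_wait_per_access

# hypothetical numbers: 100-cycle measurement, 5 bus accesses,
# 4 cores sharing a TDMA bus with 10-cycle slots
print(pad_measurement(100, 5, 10, 4))   # 100 + 5 * 39 -> 295
```

Padding every measurement by the worst TDMA alignment is what lets MBPTA treat the deterministic TDMA arbiter as if its latency were a constant upper bound, with no extra hardware support.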
19. Minimally buffered deflection routing with in-order delivery in a torus
- Author
-
Martin Frieb, Theo Ungerer, Christian Mellwig, Jörg Mische, and Alexander Stegmeier
- Subjects
Computer science, Torus, Deflection routing, Packet switching, Wormhole switching, Computer network
- Abstract
Bufferless deflection routing is a serious alternative to wormhole flow control and packet switching. It is based on the principle of deflecting a flit onto a non-optimal route instead of buffering it when two flits compete for the same link. The major weakness of deflection is the exploding number of misrouted flits at high network load, which increases the time flits spend in the network and requires reassembling the flits at the receiver. These deflections can be reduced significantly by adding a small side buffer instead of always deflecting flits. In the presented approach, the side buffer is complemented by a restricted deflection policy that preserves flit order: x-y routing in a unidirectional 2D torus ensures that collisions are impossible as long as a flit travels in the same direction. Only at the transition from the x- to the y-direction may collisions happen, and they are avoided by controlled in-order deflection in the x-direction. In-order delivery not only simplifies the arbitration logic, but also avoids costly mechanisms for livelock prevention and flit reassembly at the receiver.
- Published
- 2017
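The restricted deflection policy above can be captured in a few lines: a flit rides the x-ring to its target column, attempts the turn into y, and if the y-port is taken it is sent once more around the x-ring, which keeps flits of a packet in order. The routing function below is a sketch under assumed hop-count bookkeeping, not the paper's router logic:

```python
RING_X = 4   # assumed (hypothetical) torus width in the x-direction

def route(flit, y_port_free):
    """flit = (hops_left_x, hops_left_y) in a unidirectional 2D torus.
    Returns (output_port, updated_flit)."""
    dx, dy = flit
    if dx > 0:
        return "x", (dx - 1, dy)           # keep riding the x-ring
    if dy > 0 and y_port_free:
        return "y", (dx, dy - 1)           # uncontended x-to-y turn
    if dy > 0:
        return "x", (RING_X - 1, dy)       # deflect: one extra x-loop, order kept
    return "deliver", flit                 # arrived at the destination

print(route((0, 2), y_port_free=False))    # -> ('x', (3, 2))
```

Because a deflected flit re-enters the same x-ring behind no younger flit of its packet, ordering is preserved without reassembly buffers at the receiver.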
20. An Efficient Replication Approach based on Trust for Distributed Self-healing Systems
- Author
-
Theo Ungerer and Nizar Msadek
- Subjects
Computer science, Self-healing, Replication (computing)
- Published
- 2017
- Full Text
- View/download PDF
21. Reduced Complexity Many-Core: Timing Predictability Due to Message-Passing
- Author
-
Theo Ungerer, Jörg Mische, Alexander Stegmeier, and Martin Frieb
- Subjects
Computer science, Message passing, Message Passing Interface, Static timing analysis, Interference, Shared memory, Predictability, Architecture, Cache coherence, Computer network
- Abstract
The Reduced Complexity Many-Core architecture (RC/MC) aims to simplify timing analysis by increasing the predictability of all components. Since shared-memory interference is a major source of pessimism in many-core systems, fine-grained message passing between small cores with private memories is used instead of a global shared memory.
- Published
- 2017
- Full Text
- View/download PDF
22. Finding near-perfect parameters for hardware and code optimizations with automatic multi-objective design space explorations
- Author
-
Lucian Vintan, Theo Ungerer, Horia Calborean, and Ralf Jahr
- Subjects
Speedup, Computer Networks and Communications, Design space exploration, Computer science, Parallel computing, Program optimization, Multi-objective optimization, Computer Science Applications, Theoretical Computer Science, Microarchitecture, Computational Theory and Mathematics, Scalability, Engineering design process, Software, Computer hardware
- Abstract
In the design process of computer systems or processor architectures, typically many different parameters are exposed to configure, tune, and optimize every component of a system. For evaluations and before production, it is desirable to know the best setting for all parameters. Processing speed is no longer the only objective that needs to be optimized; power consumption, chip area, and so on have become very important. Thus, the best configurations have to be found with respect to multiple objectives. In this article, we use a multi-objective design space exploration tool called Framework for Automatic Design Space Exploration (FADSE) to automatically find near-optimal configurations in the vast design space of a processor architecture together with a tool for code optimizations, and hence evaluate both automatically. As an example, we use the Grid ALU Processor (GAP) and its post-link optimizer GAPtimize, which can apply feedback-directed and platform-specific code optimizations. Our results show that FADSE is able to cope with both design spaces. Less than 25% of the maximal reasonable hardware effort for the scalable elements of the GAP is enough to reach the processor's performance maximum. With a performance-reduction tolerance of 10%, the necessary hardware complexity can be reduced by a further two-thirds. The high-quality configurations found are analyzed, exhibiting strong relationships between the parameters of the GAP, the distribution of complexity, and the total performance. These performance numbers can be improved by applying code optimizations concurrently with optimizing the hardware parameters. FADSE can find near-optimal configurations by effectively combining and selecting parameters for hardware and code optimizations in a short time. The maximum observed speedup is 15%. With the use of code optimizations, the maximum possible reduction of the hardware resources, while sustaining the same performance level, is 50%.
- Published
- 2012
- Full Text
- View/download PDF
23. MANJAC — Ein Many-Core-Emulator auf Multi-FPGA-Basis
- Author
-
Theo Ungerer, Sascha Uhrig, Sebastian Schlingmann, and Christian Bradatsch
- Subjects
Many core ,Basis (linear algebra) ,business.industry ,Computer science ,Embedded system ,business ,Field-programmable gate array - Published
- 2011
- Full Text
- View/download PDF
24. The Multi-Core Challenge
- Author
-
Theo Ungerer
- Subjects
Multi-core processor ,Many core ,General Computer Science ,Computer science ,business.industry ,Embedded system ,business ,Die (integrated circuit) - Abstract
Multi-cores are the contemporary solution for reaching high performance without increasing the clock frequency. Multi-cores integrate two or more cores, i.e., processors, on a single die. Future multi-cores may comprise hundreds or even thousands of cores and raise challenges for processor design, system architecture, programming languages, and application programs.
- Published
- 2010
- Full Text
- View/download PDF
25. POSTER: fault-tolerant execution on COTS multi-core processors with hardware transactional memory support
- Author
-
Youfeng Wu, Florian Haas, Sebastian Weis, Gilles Pokam, and Theo Ungerer
- Subjects
010302 applied physics ,Hardware architecture ,Multi-core processor ,Computer science ,business.industry ,Transactional memory ,Fault tolerance ,02 engineering and technology ,Lockstep ,computer.software_genre ,01 natural sciences ,020202 computer hardware & architecture ,Software ,Software fault tolerance ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Operating system ,Hardware compatibility list ,business ,computer - Abstract
Software-based fault-tolerance mechanisms can increase the reliability of multi-core CPUs while being cheaper and more flexible than hardware solutions like lockstep architectures. However, checkpoint creation, error detection and correction entail high performance overhead if implemented in software. We propose a software/hardware hybrid approach, which leverages Intel's hardware transactional memory (TSX) to support implicit checkpoint creation and fast rollback. Hardware enhancements are proposed and evaluated, leading to a resulting performance overhead of 19% on average.
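The checkpoint/rollback scheme can be illustrated with a purely software model (this sketch is not the paper's TSX-based implementation; function names and the injected fault are hypothetical): execute each work unit twice, commit only on agreement, and roll back to the checkpoint on mismatch.

```python
import copy

def run_redundant(state, work, max_retries=3):
    """Execute `work` twice on copies of `state`; commit only if both
    runs agree, otherwise roll back to the checkpoint and retry.
    Models the implicit checkpointing/rollback of transactional execution."""
    for _ in range(max_retries):
        checkpoint = copy.deepcopy(state)    # checkpoint creation
        a = work(copy.deepcopy(checkpoint))  # leading execution
        b = work(copy.deepcopy(checkpoint))  # trailing (redundant) execution
        if a == b:                           # error detection by comparison
            return a                         # commit
        state = checkpoint                   # rollback and retry
    raise RuntimeError("uncorrectable error: retries exhausted")

faults = iter([True, False, False])  # inject one transient fault
def faulty_increment(s):
    s["x"] += 1
    if next(faults, False):          # transient corruption in first run
        s["x"] += 100
    return s

print(run_redundant({"x": 0}, faulty_increment)["x"])  # 1, after one rollback
```

With hardware transactional memory, the explicit deep copies disappear: the transaction's write set is the checkpoint, and an abort is the rollback.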
- Published
- 2016
26. Trust as important factor for building robust self-x systems
- Author
-
Theo Ungerer and Nizar Msadek
- Subjects
Knowledge management ,business.industry ,Computer science ,Scale (chemistry) ,Control (management) ,Organic computing ,Social constructionism ,Autonomic computing ,Risk analysis (engineering) ,Robustness (computer science) ,Factor (programming language) ,business ,Baseline (configuration management) ,computer ,computer.programming_language - Abstract
Open self-x systems of a very large scale – interconnecting several thousand autonomous and heterogeneous entities – become increasingly complex in their organisational structures. This is because such systems are typically restricted to a local view, in the sense that they have no global instance that could control or manage the whole system. Therefore, new ways have to be found to develop and manage them. An essential aspect that has recently gained much attention in this kind of system is the social concept of trust. Using appropriate trust mechanisms, entities in the system can judge which entities to cooperate with. This is very important for improving the robustness of self-x systems, which depends on the cooperation of autonomous entities. The contributions of this chapter are trustworthy concepts and generic self-x algorithms with the ability to self-configure, self-optimise, and self-heal that work in a distributed manner and with no central control to ensure robustness. Some experimental results of our algorithms are reported to show the improvement that can be obtained compared with the baseline measurements.
- Published
- 2016
27. Architectural support for fault tolerance in a teradevice dataflow system
- Author
-
Avi Mendelson, Sebastian Weis, Arne Garbade, Theo Ungerer, Roberto Giorgi, and Bernhard Fechner
- Subjects
010302 applied physics ,Coarse-grained dataflow ,Fault tolerance ,Fault detection ,Recovery ,Reliability ,Dataflow ,Computer science ,Distributed computing ,Process (computing) ,02 engineering and technology ,01 natural sciences ,Fault detection and isolation ,020202 computer hardware & architecture ,Theoretical Computer Science ,0103 physical sciences ,Synchronization (computer science) ,0202 electrical engineering, electronic engineering, information engineering ,Overhead (computing) ,Execution model ,Software ,Dataflow architecture ,Information Systems - Abstract
The high parallelism of future Teradevices, which are going to contain more than 1,000 complex cores on a single die, requires new execution paradigms. Coarse-grained dataflow execution models are able to exploit such parallelism, since they combine side-effect-free execution and reduced synchronization overhead. However, the terascale transistor integration of such future chips makes them orders of magnitude more vulnerable to voltage fluctuation, radiation, and process variations. This means dynamic fault-tolerance mechanisms have to be an essential part of such future systems. In this paper, we present a fault-tolerant architecture for a coarse-grained dataflow system, leveraging the inherent features of the dataflow execution model. In detail, we provide methods to dynamically detect and manage permanent, intermittent, and transient faults during runtime. Furthermore, we exploit the dataflow execution model for a thread-level recovery scheme. Our results showed that redundant execution of dataflow threads can efficiently make use of underutilized resources in a multi-core, while the overhead in a fully utilized system stays reasonable. Moreover, thread-level recovery incurred only moderate overhead, even in the case of high fault rates.
- Published
- 2016
28. A parallelization approach for hard real-time systems and its application on two industrial programs: strategy and two case studies for the parallelization of hard real-time systems
- Author
-
Andreas Hugl, Theo Ungerer, Martin Frieb, Haluk Ozaktas, Ralf Jahr, and Hans Regler
- Subjects
010302 applied physics ,Source code ,Computer science ,media_common.quotation_subject ,Real-time computing ,Legacy system ,Process (computing) ,02 engineering and technology ,Parallel computing ,Avionics ,01 natural sciences ,020202 computer hardware & architecture ,Theoretical Computer Science ,Automatic parallelization ,0103 physical sciences ,Theory of computation ,0202 electrical engineering, electronic engineering, information engineering ,Algorithmic skeleton ,Legacy code ,Software ,Information Systems ,media_common - Abstract
Applications in industry have often grown and improved over many years. Since their performance demands increase, they also need to benefit from the availability of multi-core processors. However, a reimplementation from scratch and even a restructuring of these industrial applications is very expensive, often due to high certification efforts. Therefore, a strategy for the systematic parallelization of legacy code is needed. We present a parallelization approach for hard real-time systems, which ensures a high degree of legacy code reuse and preserves timing analysability. To show its applicability, we apply it to the core algorithm of an avionics application as well as to the control program of a large construction machine. We create models of the legacy programs showing the potential for parallelism, optimize them, and change the source codes accordingly. The parallelized applications are placed on a predictable multi-core processor with up to 18 cores. For evaluation, we compare the worst-case execution times and their speedups. Furthermore, we analyse limitations that arise during the parallelization process.
- Published
- 2016
29. The social concept of trust as enabler for robustness in open self-organising systems
- Author
-
Jörg Hähner, Elisabeth André, Theo Ungerer, Hella Seebach, Wolfgang Reif, Gerrit Anders, Jan-Philipp Steghöfer, and Christian Müller-Schloer
- Subjects
Knowledge management ,business.industry ,Autonomous agent ,Context (language use) ,02 engineering and technology ,Task (project management) ,Incentive ,Risk analysis (engineering) ,020204 information systems ,Enabling ,0202 electrical engineering, electronic engineering, information engineering ,Resource allocation ,020201 artificial intelligence & image processing ,Computational trust ,Robustness (economics) ,business - Abstract
The participants in open self-organising systems, including users and autonomous agents, operate in a highly uncertain environment in which the agents’ benevolence cannot be assumed. One way to address this challenge is to use computational trust. By extending the notion of trust as a qualifier of relationships between agents and incorporating trust into the agents’ decisions, they can cope with uncertainties stemming from unintentional as well as intentional misbehaviour. As a consequence, the system’s robustness and efficiency increases. In this context, we show how an extended notion of trust can be used in the formation of system structures, algorithmically to mitigate uncertainties in task and resource allocation, and as a sanctioning and incentive mechanism. Beyond that, we outline how the users’ trust in a self-organising system can be increased, which is decisive for the acceptance of these systems.
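A common way to qualify relationships between agents is an experience-based trust value that is updated after every interaction. The following is a generic sketch of such an update rule (an exponentially weighted average; it is not the specific trust model of this chapter, and the weight and outcomes are hypothetical):

```python
def update_trust(trust, outcome, weight=0.1):
    """Exponentially weighted trust update, kept in [0, 1].

    outcome: 1.0 for observed cooperative behaviour, 0.0 for defection.
    A small weight makes trust change slowly, so single incidents
    (unintentional misbehaviour) do not dominate the rating.
    """
    return (1 - weight) * trust + weight * outcome

t = 0.5  # neutral prior before any interaction
for outcome in [1, 1, 1, 0, 1]:  # observed interaction history
    t = update_trust(t, outcome)
print(round(t, 3))
```

Agents can then prefer partners with high trust in task and resource allocation, which implements the sanctioning and incentive effect: defection lowers trust and thus future cooperation opportunities.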
- Published
- 2016
30. Parallelizing industrial hard real-time applications for the parMERASA multicore
- Author
-
Jörg Mische, Hugues Cassé, Florian Kluge, Sebastian Kehr, Sascha Uhrig, Francisco J. Cazorla, Armelle Bonenfant, Bert Böddeker, Lucie Matusova, Jaume Abella, Christian Bradatsch, Milos Panic, Zai Jian Jia Li, Mike Gerdes, Theo Ungerer, Carles Hernandez, Christine Rochange, Martin Frieb, Eduardo Quinones, David George, Zlatko Petrov, Ian Broster, Pavel Zaykov, Ralf Jahr, Hans Regler, Pascal Sainrat, Arthur Pyka, Haluk Ozaktas, Andreas Hugl, Alexander Stegmeier, Nick Lay, Mathias Rohde, Institute of Computer Science - University of Augsburg (ICS), Universität Augsburg [Augsburg], University of Augsburg [Augsburg], Honeywell Technology Solutions international development centre, Brno (HTS), Honeywell International S.r.o. [Prague], DENSO (JAPAN), Bauer Group (GERMANY), Groupe de Recherche en Architecture et Compilation pour les systèmes embarqués (IRIT-TRACES), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, Université Toulouse III - Paul Sabatier (UT3), Rapita Systems Ltd [York], Barcelona Supercomputing Center - Centro Nacional de Supercomputacion (BSC - CNS), Technische Universität Dortmund [Dortmund] (TU), project partners : Honeywell International s.r.o., Czech Republic, DENSO AUTOMOTIVE Deutschland GmbH,Germany, BAUER Maschinen GmbH, Germany, Rapita Systems Ltd, UK, Barcelona Supercomputing Center, Spain, Université Paul Sabatier, Toulouse, France, Technical University of Dortmund, Germany, and and University of Augsburg, Germany
- Subjects
010302 applied physics ,Multi-core processor ,Control algorithm ,business.industry ,Computer science ,Parallel design ,Real-time computing ,Automotive industry ,Program transformation ,02 engineering and technology ,Parallel computing ,01 natural sciences ,020202 computer hardware & architecture ,Automatic parallelization ,Hardware and Architecture ,Embedded system ,0103 physical sciences ,Management system ,0202 electrical engineering, electronic engineering, information engineering ,[INFO]Computer Science [cs] ,Motion planning ,business ,Software - Abstract
The EC project parMERASA (Multicore Execution of Parallelized Hard Real-Time Applications Supporting Analyzability) investigated timing-analyzable parallel hard real-time applications running on a predictable multicore processor. A pattern-supported parallelization approach was developed to ease sequential to parallel program transformation based on parallel design patterns that are timing analyzable. The parallelization approach was applied to parallelize the following industrial hard real-time programs: 3D path planning and stereo navigation algorithms (Honeywell International s.r.o.), control algorithm for a dynamic compaction machine (BAUER Maschinen GmbH), and a diesel engine management system (DENSO AUTOMOTIVE Deutschland GmbH). This article focuses on the parallelization approach, experiences during parallelization with the applications, and quantitative results reached by simulation, by static WCET analysis with the OTAWA tool, and by measurement-based WCET analysis with the RapiTime tool.
- Published
- 2016
- Full Text
- View/download PDF
31. Data Age Diminution in the Logical Execution Time Model
- Author
-
Florian Kluge, Theo Ungerer, and Christian Bradatsch
- Subjects
010302 applied physics ,Computer science ,Real-time computing ,02 engineering and technology ,Logical execution time ,01 natural sciences ,020202 computer hardware & architecture ,Task (project management) ,Set (abstract data type) ,Control theory ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Benchmark (computing) ,Predictability ,Arithmetic ,Jitter - Abstract
The logical execution time (LET) model separates logical from physical execution times. Furthermore, tasks' input and output of data occur at predictable times, namely the tasks' arrival times and deadlines, respectively. The output of data is delayed until the period end, meaning that output times have no jitter. The delayed output affects the freshness of data, i.e., the data age, between interacting tasks. Recently, criticism has arisen from control theory that the LET approach provides outdated data. We analyze the data age of communicating tasks and propose an approach that reduces it. To this end, we reduce the LET of tasks such that output data is provided earlier than at a task's deadline, while still preserving the predictability of output times. To confirm the improvement in data age, we simulate 100 randomly generated task sets. Moreover, we also simulate a task set of a real-world automotive benchmark and show an improvement of the average data age of approximately 33% with our approach compared to the LET model.
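The effect can be shown with a small numeric sketch (hypothetical periods and offsets, not the paper's task model or benchmark): a producer samples its input at the period start and publishes at a fixed offset; shrinking that offset from the deadline to an earlier point makes fresher data visible to a consumer while publish times stay predictable.

```python
def data_age(producer_period, output_offset, consumer_read_time):
    """Age of the newest producer value visible at the consumer's read
    instant, when outputs are published at k*period + output_offset and
    the published value was sampled at the period start k*period."""
    k = (consumer_read_time - output_offset) // producer_period
    sample_time = k * producer_period  # LET: input read at period start
    return consumer_read_time - sample_time

# Producer period 10 ms. Classic LET publishes at the deadline
# (offset = period); the reduced-LET variant publishes earlier (offset 6).
print(data_age(10, 10, 27))  # 17: classic LET, value sampled at t = 10
print(data_age(10, 6, 27))   # 7: reduced LET, value sampled at t = 20
```

Note that the reduction only helps consumers that read between the earlier and the original publish instant; at other read instants both variants deliver the same value.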
- Published
- 2016
- Full Text
- View/download PDF
32. Trustworthy open self-organising systems
- Author
-
Gerrit Anders, Hella Seebach, Wolfgang Reif, Jan-Philipp Steghöfer, Jörg Hähner, Theo Ungerer, Christian Müller-Schloer, and Elisabeth André
- Subjects
Class (computer programming) ,Focal point ,Engineering ,Knowledge management ,business.industry ,Scale (chemistry) ,Realisation ,Control (management) ,computer.software_genre ,Data science ,Trustworthiness ,Smart grid ,Grid computing ,business ,computer - Abstract
This book treats the computational use of social concepts as the focal point for the realisation of a novel class of socio-technical systems, comprising smart grids, public display environments, and grid computing. These systems are composed of technical and human constituents that interact with each other in an open environment. Heterogeneity, large scale, and uncertainty in the behaviour of the constituents and the environment are the rule rather than the exception. Ensuring the trustworthiness of such systems allows their technical constituents to interact with each other in a reliable, secure, and predictable way while their human users are able to understand and control them. "Trustworthy Open Self-Organising Systems" contains a wealth of knowledge, from trustworthy self-organisation mechanisms, to trust models, methods to measure a user's trust in a system, a discussion of social concepts beyond trust, and insights into the impact open self-organising systems will have on society.
- Published
- 2016
33. Smart doorplate
- Author
-
Theo Ungerer, Faruk Bagci, Jan Petzold, and Wolfgang Trumler
- Subjects
Hardware and Architecture ,Computer science ,Middleware ,Middleware (distributed applications) ,Context awareness ,Management Science and Operations Research ,Situational ethics ,Computer security ,computer.software_genre ,computer ,Computer Science Applications - Abstract
This paper introduces the vision of smart doorplates within an office building. The doorplates are able to display current situational information about the office owner, to act instead of the office owner in case of absence, and to direct visitors to the current location of the office owner based on a location-tracking system. Different scenarios are proposed and a prototype implementation is presented.
- Published
- 2003
- Full Text
- View/download PDF
34. Trustworthy Self-optimization in Organic Computing Environments
- Author
-
Theo Ungerer, Nizar Msadek, and Rolf Kiefhaber
- Subjects
Service (business) ,Trustworthiness ,Computer science ,Distributed computing ,Node (networking) ,Value (economics) ,Workload ,Organic computing ,Self-optimization ,Autonomic computing - Abstract
In this paper, we present a self-optimization approach that does not only consider pure load-balancing but also takes trust into account to improve the assignment of important services to trustworthy nodes. Our approach uses different optimization strategies to determine whether a service should be transferred to another node or not. The evaluation results showed that the proposed approach is able to balance the workload between nodes nearly optimally. Moreover, it significantly improves the availability of important services, i.e., the achieved availability was no lower than 85% of the maximum theoretical availability value.
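One simple strategy in this spirit (a hypothetical greedy sketch, not the paper's algorithm; service and node data are invented) is to place services in order of importance, each on the most trustworthy node that still has capacity, so that load-balancing falls out of the capacity bookkeeping:

```python
def assign_services(services, nodes):
    """Greedy sketch: most important services first, each to the most
    trustworthy node with spare capacity.

    services: list of (name, importance, load)
    nodes:    {name: {"trust": float, "capacity": int}} (mutated)
    """
    assignment = {}
    for name, _importance, load in sorted(services, key=lambda s: -s[1]):
        candidates = [n for n, v in nodes.items() if v["capacity"] >= load]
        if not candidates:
            raise RuntimeError(f"no node can host {name}")
        best = max(candidates, key=lambda n: nodes[n]["trust"])
        nodes[best]["capacity"] -= load  # implicit load-balancing
        assignment[name] = best
    return assignment

nodes = {"n1": {"trust": 0.9, "capacity": 2}, "n2": {"trust": 0.6, "capacity": 3}}
services = [("logging", 1, 1), ("billing", 3, 2), ("mail", 2, 1)]
print(assign_services(services, nodes))  # billing lands on the trusted n1
```

In a real organic computing system this decision would run distributed and be repeated at runtime, transferring services when trust or load values change.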
- Published
- 2015
- Full Text
- View/download PDF
35. Utility-Based Scheduling of (m,k)-Firm Real-Time Task Sets
- Author
-
Theo Ungerer, Florian Kluge, and Markus Neuerburg
- Subjects
Mathematical optimization ,Computer science ,Control system ,Robust control system ,Robust control ,Scheduling (computing) - Abstract
The concept of a firm real-time task implies the notion of a firm deadline that should not be missed by the jobs of this task. If a deadline miss occurs, the concerned job yields no value to the system. It turns out that for some application domains, this restrictive notion can be relaxed. For example, robust control systems can tolerate that single executions of a control loop miss their deadlines and still yield acceptable behaviour. Thus, systems can be developed under more optimistic assumptions, e.g. by allowing overloads. However, care must be taken that deadline misses do not accumulate. This restriction can be expressed by the model of (m,k)-firm real-time tasks, which requires that within any k successive jobs at least m jobs are executed successfully. This paper presents the heuristic utility-based algorithm MKU for scheduling sets of (m,k)-firm real-time tasks. To this end, MKU uses history-cognisant utility functions. Simulations show that for moderate overloads, MKU achieves a higher schedulability ratio than other schedulers developed for (m,k)-firm real-time tasks.
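The (m,k)-firm condition itself is easy to state in code. The following is a generic sliding-window check of a job history (an illustration of the task model only, not the MKU scheduler):

```python
from collections import deque

def mk_firm_ok(history, m, k):
    """True iff every window of k consecutive jobs contains at least m
    successes. history: iterable of booleans (True = deadline met)."""
    window = deque(maxlen=k)  # keeps only the last k job outcomes
    for met in history:
        window.append(met)
        if len(window) == k and sum(window) < m:
            return False  # some window of k jobs has fewer than m successes
    return True

# (2,3)-firm: any 3 consecutive jobs need at least 2 met deadlines
print(mk_firm_ok([True, True, False, True, True, False], 2, 3))  # True
print(mk_firm_ok([True, False, False, True], 2, 3))              # False
```

A scheduler for this model can consult such a window at dispatch time: a job whose loss would violate the window must be prioritised, while a job whose window still has slack may be dropped under overload.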
- Published
- 2015
- Full Text
- View/download PDF
36. EMSBench: Benchmark und Testumgebung für reaktive Systeme
- Author
-
Florian Kluge and Theo Ungerer
- Abstract
Benchmark suites for embedded real-time systems usually cover only the computations that are typical for such systems. This allows evaluating pure computational performance, but leaves other aspects out. Reactive behaviour and the interaction between many software modules, as found in today's complex embedded real-time systems, are not represented. With regard to the use of multi-core processors in embedded real-time systems, however, this is of considerable importance. Research depends on suitable example applications to assess the practicability of new techniques. This work takes a first step towards closing this gap. We present the software package EMSBench, which consists of two components: (1) an open-source engine control software for combustion engines, adapted so that it can serve as a benchmark program for complex, reactive embedded real-time systems; (2) an emulation of the crankshaft behaviour that generates the input signals which substantially influence the internal behaviour of the benchmark program.
- Published
- 2015
- Full Text
- View/download PDF
37. Enabling TDMA Arbitration in the Context of MBPTA
- Author
-
Theo Ungerer, Carles Hernandez, Eduardo Quinones, Francisco J. Cazorla, Milo Panic, and Jaume Abella
- Subjects
Set (abstract data type) ,Multi-core processor ,Software ,business.industry ,Computer science ,Embedded system ,Probabilistic logic ,Time division multiple access ,Static timing analysis ,Context (language use) ,business ,Padding - Abstract
Current timing analysis techniques can be broadly classified into two families: deterministic timing analysis (DTA) and probabilistic timing analysis (PTA). Each family defines a set of properties to be provided (enforced) by the hardware and software platform so that valid Worst-Case Execution Time (WCET) estimates can be derived for programs running on that platform. However, the fact that each family relies on its own set of hardware designs limits their applicability and reduces the chances of those designs being adopted by hardware vendors. In this paper we show that Time Division Multiple Access (TDMA), one of the main DTA-compliant arbitration policies, can be made PTA-compliant. To that end, we analyze TDMA in the context of measurement-based PTA (MBPTA) and show that conveniently padding execution time observations leads to trustworthy and tight WCET estimates with MBPTA without introducing any hardware change. In fact, TDMA outperforms round-robin and time-randomized policies in terms of WCET in the context of MBPTA.
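The padding idea can be sketched numerically. In this deliberately simplified illustration (the actual method pads per bus access and feeds the results into extreme-value statistics; the figures below are hypothetical), each measured execution time is inflated by the worst-case wait a request can experience under TDMA, so the padded observations upper-bound any possible slot alignment:

```python
def pad_observations(times, n_cores, slot_len):
    """Pad each measured execution time with the worst-case TDMA wait.

    Under a TDMA bus with n_cores slots of slot_len cycles each, a
    request issued just after its own slot may wait up to
    (n_cores - 1) * slot_len cycles before being served.
    """
    worst_wait = (n_cores - 1) * slot_len
    return [t + worst_wait for t in times]

# Hypothetical measurements (cycles) on a 4-core TDMA bus, 10-cycle slots
print(pad_observations([100, 105, 98], n_cores=4, slot_len=10))
```

Because the padding is a fixed, analytically derived bound, the measurement-based analysis stays trustworthy regardless of which slot alignments happened to occur during the observation runs.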
- Published
- 2015
38. A Trust- and Load-Based Self-Optimization Algorithm for Organic Computing Systems
- Author
-
Theo Ungerer, Nizar Msadek, and Rolf Kiefhaber
- Subjects
Task (computing) ,Load management ,Trustworthiness ,Computer science ,Distributed computing ,Workload ,Algorithm design ,Organic computing ,Algorithm ,Self-optimization ,Autonomic computing - Abstract
In this paper, a new design of self-optimization for organic computing systems is investigated. Its main task, besides load-balancing, is to assign services with different importance levels to nodes such that the more important services are assigned to more trustworthy nodes. The evaluation results showed that the proposed algorithm is able to balance the workload between nodes nearly optimally. Moreover, it significantly improves the availability of important services.
- Published
- 2014
- Full Text
- View/download PDF
39. Effects of structured parallelism by parallel design patterns on embedded hard real-time systems
- Author
-
Theo Ungerer, Pavel Zaykov, Ralf Jahr, Haluk Ozaktas, Mike Gerdes, Christine Rochange, University of Augsburg [Augsburg], Groupe de Recherche en Architecture et Compilation pour les systèmes embarqués (IRIT-TRACES), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, and Honeywell International S.r.o. [Prague]
- Subjects
Pipelines ,business.industry ,Computer science ,Data parallelism ,Parallel design ,Message systems ,Real-time computing ,Task parallelism ,Static timing analysis ,02 engineering and technology ,Parallel computing ,Synchronization ,Structured parallelism ,020202 computer hardware & architecture ,Software ,Parallel processing (DSP implementation) ,Synchronization (computer science) ,Parallel processing ,0202 electrical engineering, electronic engineering, information engineering ,[INFO]Computer Science [cs] ,020201 artificial intelligence & image processing ,business ,Real-time systems - Abstract
Parallel multi-threaded applications are needed to take advantage of multi- and many-core processors. Such processors are increasingly considered for embedded hard real-time systems with defined timing guarantees, too. Static timing analysis, which is one way to calculate the worst-case execution time (WCET) of parallel applications, is complex and time-consuming due to the difficulty of analyzing thread interferences and the high annotation effort required to resolve them.
- Published
- 2014
- Full Text
- View/download PDF
40. Comparison of service call implementations in an AUTOSAR multi-core OS
- Author
-
Theo Ungerer, Florian Kluge, and Christian Bradatsch
- Subjects
Service (systems architecture) ,Multi-core processor ,Record locking ,Exploit ,business.industry ,Computer science ,computer.software_genre ,Domain (software engineering) ,AUTOSAR ,Software ,Embedded system ,Operating system ,business ,computer ,Implementation - Abstract
Multi-core processors are gaining a foothold in the domain of embedded automotive systems. The AUTOSAR Release 4.1 establishes a common standard for the use of multi-core processors in automotive systems. While interfaces and functionalities are well defined in the specification, the actual implementation is left open to the software manufacturers. We exploit this latitude left by the specification in the implementation of cross-core service calls. In this paper, we compare two opposite implementation approaches that can be used on shared-memory multi-core processors. The actual execution of a service call takes place either on the affected core or on the invoking core. Our performance evaluations indicate an advantage for a lock-based approach with execution on the invoking core.
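The lock-based variant can be sketched in miniature (a hypothetical shared-memory model, not AUTOSAR OS code; class and function names are invented): the invoking core executes the service itself, mutating the target core's state under that core's lock, instead of sending the request to the target core for execution there.

```python
import threading

class CoreState:
    """Per-core OS state, protected by a lock so that a remote
    (invoking) core may execute a service call on it directly."""
    def __init__(self):
        self.lock = threading.Lock()
        self.activations = 0

def activate_task(target: CoreState):
    # Lock-based cross-core service call: runs on the invoking core,
    # mutating the target core's state under that core's lock.
    with target.lock:
        target.activations += 1

core1 = CoreState()
callers = [threading.Thread(target=activate_task, args=(core1,)) for _ in range(8)]
for t in callers:
    t.start()
for t in callers:
    t.join()
print(core1.activations)  # 8: every cross-core activation took effect
```

The alternative discussed in the paper executes the service on the affected core itself, which avoids the lock but requires a cross-core notification (e.g. an inter-processor interrupt) and waiting for the remote core to process the request.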
- Published
- 2014
- Full Text
- View/download PDF
41. An Operating System for Safety-Critical Applications on Manycore Processors
- Author
-
Theo Ungerer, Mike Gerdes, and Florian Kluge
- Subjects
Many core ,Life-critical system ,Predictable behaviour ,business.industry ,Computer science ,Embedded system ,Parallelism (grammar) ,Operating system ,USable ,business ,computer.software_genre ,computer ,Domain (software engineering) - Abstract
Processor technology is advancing from bus-based multicores to network-on-chip-based manycores, posing new challenges for operating system design. In this paper, we discuss why future safety-critical systems can profit from such new architectures. To make the potential of manycore processors usable in safety-critical systems, we devise the operating system MOSSCA, which is adapted to the special requirements prevailing in this domain. MOSSCA introduces abstractions that support application developers in writing safety-critical applications. Internally, MOSSCA runs in a distributed manner to achieve high parallelism while still guaranteeing predictable behaviour.
- Published
- 2014
- Full Text
- View/download PDF
42. Mikroprozessoren
- Author
-
Theo Ungerer
- Subjects
Very long instruction word ,Computer science ,Operating system ,EPIC ,computer.software_genre ,computer ,Computer Science Applications ,Information Systems - Published
- 2001
- Full Text
- View/download PDF
43. Performance of simultaneous multithreaded multimedia-enhanced processors for MPEG-2 video decompression
- Author
-
Theo Ungerer, Heiko Oehring, and Ulrich Sigmund
- Subjects
Instructions per cycle ,Multimedia ,Computer science ,business.industry ,Pipeline burst cache ,computer.file_format ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,computer.software_genre ,Simultaneous multithreading ,Microarchitecture ,law.invention ,Microprocessor ,Hardware and Architecture ,law ,MPEG-2 ,Media processor ,Multithreading ,Embedded system ,Operating system ,business ,computer ,Software - Abstract
This paper explores microarchitecture models for a simultaneous multithreaded (SMT) processor with multimedia enhancements. We start with a wide-issue superscalar processor and enhance it with the SMT technique, multimedia units, and an additional on-chip RAM storage. Our workload is a multithreaded MPEG-2 video decompression algorithm that extensively uses the multimedia units. The simulations show that a single-threaded, 8-issue maximum processor (assuming an abundance of resources) reaches an instructions per cycle (IPC) count of only 1.60, while an 8-threaded 8-issue processor is able to reach an IPC of 6.07. A more realistic processor model reaches an IPC of 1.27 in the single-threaded 8-issue mode versus 3.03 in the 4-threaded 4-issue and 3.21 in the 8-threaded 8-issue modes. Our conclusion for next-generation microprocessors is that a 2- or 4-threaded 4-issue processor with a small on-chip RAM accessed by a local load/store unit will be superior to a wide-issue (single-threaded) superscalar processor, at least for MPEG-2-style video decompression algorithms.
- Published
- 2000
- Full Text
- View/download PDF
44. A survey of new research directions in microprocessors
- Author
-
Borut Robič, Jurij Šilc, and Theo Ungerer
- Subjects
Computer Networks and Communications ,Dataflow ,Computer science ,Fetch ,Pipeline burst cache ,Parallel computing ,computer.software_genre ,Microarchitecture ,law.invention ,Microprocessor ,Artificial Intelligence ,Hardware and Architecture ,Very long instruction word ,law ,Superscalar ,Uniprocessor system ,Instruction pipeline ,Cache ,Compiler ,FR-V ,computer ,Software ,TRACE (psycholinguistics) - Abstract
Current microprocessors exploit instruction-level parallelism through a deep processor pipeline and the superscalar instruction issue technique. VLSI technology offers several solutions for aggressive exploitation of instruction-level parallelism in future generations of microprocessors. Technological advances will replace the gate delay by on-chip wire delay as the main obstacle to increasing chip complexity and cycle rate. The implication for the microarchitecture is that functionally partitioned designs with strict nearest-neighbour connections must be developed. Among the major problems facing microprocessor designers is the application of an even higher degree of speculation in combination with functional partitioning of the processor, which prepares the way for exceeding the classical dataflow limit imposed by data dependences. In this paper we survey the current approaches to solving this problem; in particular, we analyse several new research directions whose solutions are based on complex uniprocessor architectures. Such a uniprocessor chip features a very aggressive superscalar design combined with a trace cache and superspeculative techniques. Superspeculative techniques exceed the classical dataflow limit, where even with unlimited machine resources a program cannot execute any faster than the longest dependence chain introduced by the program's data dependences. Superspeculative processors also speculate about control dependences. The trace cache stores dynamic instruction traces contiguously, and instructions are fetched from the trace cache rather than from the instruction cache. Since a dynamic trace of instructions may contain multiple taken branches, there is no need to fetch from multiple targets, as would be necessary when predicting multiple branches and fetching 16 or 32 instructions from the instruction cache.
Multiscalar and trace processors define several processing cores that speculatively execute different parts of a sequential program in parallel. Multiscalar processors use a compiler to partition the program into segments, whereas a trace processor uses a trace cache to dynamically generate trace segments for the processing cores. A datascalar processor runs the same sequential program redundantly on several processing elements, where each processing element holds a different data set. This paper discusses and compares the performance potential of these complex uniprocessors.
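The "classical dataflow limit" mentioned in this abstract can be made concrete with a small sketch (not from the paper; instruction names and dependences are illustrative): even with unlimited resources, a non-speculative machine can finish no faster than the longest dependence chain in the program.

```python
# Sketch: compute the dataflow limit of a program as the length of its
# longest dependence chain (critical path), measured in instructions.

def dataflow_limit(deps):
    """deps maps each instruction to the instructions it depends on.
    Returns the length of the longest dependence chain."""
    memo = {}
    def depth(i):
        if i not in memo:
            memo[i] = 1 + max((depth(p) for p in deps[i]), default=0)
        return memo[i]
    return max(depth(i) for i in deps)

# i3 depends on i1 and i2; i4 depends on i3 -> chain i1 -> i3 -> i4, length 3.
example = {"i1": [], "i2": [], "i3": ["i1", "i2"], "i4": ["i3"]}
print(dataflow_limit(example))  # -> 3
```

Superspeculation, as surveyed in the paper, breaks this bound by predicting data values so that dependent instructions need not wait for their producers.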
- Published
- 2000
- Full Text
- View/download PDF
45. Exploiting Intel TSX for fault-tolerant execution in safety-critical systems
- Author
-
Theo Ungerer, Stefan Metzlaff, Florian Haas, and Sebastian Weis
- Subjects
business.industry ,Computer science ,Fault tolerance ,computer.software_genre ,Microarchitecture ,Software ,Life-critical system ,Software fault tolerance ,Embedded system ,Operating system ,Overhead (computing) ,Transactional Synchronization Extensions ,Instrumentation (computer programming) ,business ,computer - Abstract
Safety-critical systems demand increasing computational power, which calls for high-performance embedded systems. While commercial off-the-shelf (COTS) processors offer high computational performance at a low price, they do not provide hardware support for fault-tolerant execution. However, pure software-based fault-tolerance methods entail high design complexity and runtime overhead. In this paper, we present an efficient software/hardware-based redundant execution scheme for a COTS x86 processor, which exploits the Transactional Synchronization Extensions (TSX) introduced with the Intel Haswell microarchitecture. Our approach extends a static binary instrumentation tool to insert fault-tolerant transactions and fault-detection instructions at function granularity. TSX hardware support is used for error containment and recovery. The average runtime overhead for selected SPEC2006 benchmarks was only 49% compared to a non-fault-tolerant execution.
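The core idea of the scheme — execute a function redundantly inside a transaction, compare results, and roll back on mismatch — can be sketched in a pure-software analogue (real TSX is a hardware mechanism with `_xbegin`/`_xend`; the helper name and retry policy here are hypothetical, not the paper's implementation):

```python
# Software analogue of redundant execution with rollback at function
# granularity: run the function twice, detect a fault by comparing the
# results, and re-execute (the "transaction abort") on mismatch.

def redundant_call(fn, *args, retries=3):
    """Execute fn twice; if the results differ (a detected fault),
    discard both and retry, mimicking a transaction rollback."""
    for _ in range(retries):
        a = fn(*args)   # leading execution
        b = fn(*args)   # trailing (redundant) execution
        if a == b:      # fault detection by result comparison
            return a
    raise RuntimeError("uncorrectable fault: results never matched")

print(redundant_call(lambda x: x * x, 7))  # -> 49
```

In the paper's scheme the comparison and re-execution are driven by instrumented fault-detection instructions, and TSX provides the containment: an aborted transaction leaves no faulty state visible in memory.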
- Published
- 2014
46. Paving the way for multi-cores in industrial hard real-time control applications
- Author
-
Mike Gerdes, Andreas Hugl, Martin Frieb, Ralf Jahr, Hans Regler, and Theo Ungerer
- Subjects
Multi-core processor ,Software ,business.industry ,Real-time Control System ,Least slack time scheduling ,Computer science ,Embedded system ,Multithreading ,Static timing analysis ,Parallel computing ,Reuse ,business ,Scheduling (computing) - Abstract
The rise of multi-core processors for industrial embedded control applications forces companies to face the challenge of replacing legacy single-core applications with multi-threaded programs. We present a systematic, tool-supported approach that starts with existing single-core code and transforms it into multi-threaded code such that timing analysis is preserved and eased. The approach is based on (a) scheduling periodic tasks onto multiple dedicated cores and (b) executing other code parts, after a model-based parallelization that introduces structured parallelism only, on the remaining cores. The main advantage of our approach compared to a re-implementation is a strongly reduced effort for implementation and testing, because existing code is reused. The approach is demonstrated and evaluated on the control code of a foundation crane; slack time is introduced as a measure of its effectiveness.
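Step (a) and the slack-time metric can be illustrated with a minimal sketch (task names, WCETs, and periods are made up; this is not the paper's tool): each periodic task is pinned to a dedicated core, and the utilisation slack left on each core measures how much headroom the schedule retains.

```python
# Sketch: periodic tasks pinned to cores; slack = 1 - sum(wcet/period)
# per core, where wcet is the worst-case execution time per period.

def per_core_slack(tasks):
    """tasks: {name: (core, wcet, period)}. Returns utilisation slack per core."""
    util = {}
    for _, (core, wcet, period) in tasks.items():
        util[core] = util.get(core, 0.0) + wcet / period
    return {core: 1.0 - u for core, u in util.items()}

# Hypothetical control tasks: core 0 runs two loops, core 1 runs logging.
tasks = {"ctrl_loop": (0, 2, 10), "sensor_io": (0, 1, 5), "logging": (1, 3, 20)}
print(per_core_slack(tasks))  # core 0: 1 - (0.2 + 0.2) = 0.6, core 1: 0.85
```

A negative slack would indicate an over-utilised core, i.e. an infeasible assignment.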
- Published
- 2014
47. The boot process in real-time manycore processors
- Author
-
Florian Kluge, Theo Ungerer, and Mike Gerdes
- Subjects
Manycore processor ,business.industry ,Bootstrapping ,Computer science ,Embedded system ,Process (computing) ,Code (cryptography) ,Point (geometry) ,Parallel computing ,Duration (project management) ,business ,Fault (power engineering) ,Bottleneck - Abstract
Bootstrapping of embedded computers is often neglected in real-time research. Nevertheless, certain real-time applications require predictable startup times, e.g. when the system restarts during operation due to fault conditions. The complexity of bootstrapping will increase in upcoming manycore processors that utilise distributed memories and networks-on-chip (NoCs), as all cores must be provided with their code and data. We propose a new approach for bootstrapping a real-time manycore processor and compare it with two state-of-the-art approaches. Moreover, we present the mwsim tool, which finds an upper timing bound through an abstract simulation of the boot process, and apply it to analyse the worst-case duration of the three approaches. The results show an advantage of up to 23% for our newly proposed approach. They also show that the performance of a real-time NoC poses a central bottleneck to the overall performance of the boot process. We discuss these results and point out possible solutions to circumvent the bottleneck.
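The NoC-bottleneck observation can be made concrete with a toy bound (numbers and the serialisation assumption are illustrative, not mwsim's model): if all per-core image transfers are serialised on one link, worst-case boot time grows with the sum of the transfer times rather than their maximum.

```python
# Sketch: upper bound on boot duration when every core's code/data image
# must cross one serialised NoC link from the boot core.

def wc_boot_time(image_sizes, bandwidth, init_time):
    """image_sizes in bytes, bandwidth in bytes per time unit.
    Returns an upper bound: all transfers back-to-back, then the last
    core still needs init_time to initialise."""
    transfer = sum(size / bandwidth for size in image_sizes)
    return transfer + init_time

# Three cores with hypothetical image sizes; link moves 1024 bytes/unit.
print(wc_boot_time([4096, 4096, 8192], bandwidth=1024, init_time=5))  # -> 21.0
```

A boot scheme that distributes images in parallel over disjoint NoC paths would instead be bounded by the slowest single transfer, which is why NoC performance dominates the overall boot time.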
- Published
- 2014
48. parMERASA -- Multi-core Execution of Parallelised Hard Real-Time Applications Supporting Analysability
- Author
-
Arthur Pyka, Haluk Ozaktas, Dave George, João Carlos Lopes Fernandes, Hugues Cassé, Florian Kluge, Milos Panic, Pavel Zaykov, Theo Ungerer, Armelle Bonenfant, Ralf Jahr, Bert Böddeker, Zlatko Petrov, Hans Regler, Mike Gerdes, Andreas Hugl, Christian Bradatsch, Sascha Uhrig, Jaume Abella, Mathias Rohde, Sebastian Kehr, Francisco J. Cazorla, Ian Broster, Nick Lay, Christine Rochange, Pascal Sainrat, Eduardo Quinones, Jörg Mische, University of Augsburg [Augsburg], Honeywell International S.r.o. [Prague], DENSO (JAPAN), Bauer Group (GERMANY), Groupe de Recherche en Architecture et Compilation pour les systèmes embarqués (IRIT-TRACES), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, Université Toulouse III - Paul Sabatier (UT3), Rapita Systems Ltd [York], Barcelona Supercomputing Center - Centro Nacional de Supercomputacion (BSC - CNS), Universitat Politècnica de Catalunya [Barcelona] (UPC), Consejo Superior de Investigaciones Científicas [Madrid] (CSIC), Technische Universität Dortmund [Dortmund] (TU), Barcelona Supercomputing Center – Centro Nacional de Supercomputación - BSC-CNS (SPAIN), Centre National de la Recherche Scientifique - CNRS (FRANCE), Consejo Superior de Investigaciones Científicas - CSIC (SPAIN), Institut National Polytechnique de Toulouse - Toulouse INP (FRANCE), Université Toulouse III - Paul Sabatier - UT3 (FRANCE), Université Toulouse - Jean Jaurès - UT2J (FRANCE), Université Toulouse 1 Capitole - UT1 (FRANCE), Universitat Politècnica de Catalunya - UPC 
(SPAIN), Honeywell (USA), Rapita System (USA), Technische Universität Dortmund - TU Dortmund (GERMANY), University of Augsburg (GERMANY), Institut de Recherche en Informatique de Toulouse - IRIT (Toulouse, France), Technical University of Catalonia – Barcelona Tech (Girona, Espagne), and Institut National Polytechnique de Toulouse - INPT (FRANCE)
- Subjects
[INFO.INFO-AR]Computer Science [cs]/Hardware Architecture [cs.AR] ,Computer science ,Embedded systems ,Parallel programming ,Real-time computing ,Système d'exploitation ,Automotive industry ,Réseaux et télécommunications ,02 engineering and technology ,[INFO.INFO-NI]Computer Science [cs]/Networking and Internet Architecture [cs.NI] ,Many core ,Architectures Matérielles ,0202 electrical engineering, electronic engineering, information engineering ,Mixed criticality ,Multi-core processor ,Control algorithm ,Multiprocessing systems ,business.industry ,Avionics ,Systèmes embarqués ,020202 computer hardware & architecture ,Embedded system ,Scalability ,[INFO.INFO-ES]Computer Science [cs]/Embedded Systems ,020201 artificial intelligence & image processing ,[INFO.INFO-OS]Computer Science [cs]/Operating Systems [cs.OS] ,business ,System software - Abstract
Engineers who design hard real-time embedded systems express a need for several times the performance available today while keeping safety as the major criterion. A breakthrough in performance is expected from parallelizing hard real-time applications and running them on an embedded multi-core processor, which enables combining the requirement for high performance with timing-predictable execution. parMERASA will provide a timing-analysable system of parallel hard real-time applications running on a scalable multi-core processor. parMERASA goes one step beyond mixed-criticality demands: it targets future complex control algorithms by parallelizing hard real-time programs to run on predictable multi-/many-core processors. We aim to achieve a breakthrough in techniques for the parallelization of industrial hard real-time programs, to provide hard real-time support in system software, WCET analysis and verification tools for multi-cores, and techniques for predictable multi-core designs with up to 64 cores.
- Published
- 2013
- Full Text
- View/download PDF
49. A pattern-supported parallelization approach
- Author
-
Ralf Jahr, Theo Ungerer, and Mike Gerdes
- Subjects
020203 distributed computing ,Parallelism (rhetoric) ,Data parallelism ,Computer science ,Degree of parallelism ,Task parallelism ,02 engineering and technology ,Parallel computing ,020202 computer hardware & architecture ,Automatic parallelization ,Software design pattern ,Parallel programming model ,0202 electrical engineering, electronic engineering, information engineering ,Algorithmic skeleton - Abstract
In the embedded systems domain a trend towards multi- and many-core processors is evident. Parallel software is inevitable for the exploitation of these additional processing elements. The pattern-supported parallelization approach introduced here eases the transition from sequential to parallel software. It is a novel model-based approach with a clear methodology that uses parallel design patterns as known building blocks. First, the Activity and Pattern Diagram is created, revealing the maximum degree of parallelism expressed by parallel design patterns. Second, the degree of parallelism is reduced to the level providing the best performance by agglomerating activities and patterns. This respects trade-offs caused by the target platform, e.g. the computation-communication ratio. A library of algorithmic skeletons can be used as the implementation of the parallel design patterns. This reduces development effort and effectively simplifies the transition from sequential to parallel code.
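An algorithmic skeleton of the kind the abstract refers to can be sketched in a few lines (this is a generic illustration, not the library used in the paper): a "farm" skeleton encapsulates a data-parallel pattern so that application code never touches the threading details.

```python
# Sketch of a farm (data-parallel map) skeleton: the pattern is a reusable
# building block; the worker function carries the application logic.

from concurrent.futures import ThreadPoolExecutor

def farm(worker, items, workers=4):
    """Farm skeleton: apply worker to each item on a pool of workers,
    preserving input order in the result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(worker, items))

print(farm(lambda x: x * x, range(5)))  # -> [0, 1, 4, 9, 16]
```

Agglomeration, in this picture, corresponds to tuning parameters such as `workers` or merging adjacent patterns so that the parallelism matches the target platform's computation-communication ratio.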
- Published
- 2013
- Full Text
- View/download PDF
50. A Comparison of Multi-objective Algorithms for the Automatic Design Space Exploration of a Superscalar System
- Author
-
Ralf Jahr, Horia Calborean, Theo Ungerer, and Lucian Vintan
- Subjects
Heuristic (computer science) ,Design space exploration ,Computer science ,Superscalar ,Particle swarm optimization ,Algorithm - Abstract
In today’s computer architectures the design spaces are huge, making it very difficult to find optimal configurations. One way to cope with this problem is to use Automatic Design Space Exploration (ADSE) techniques. We developed the Framework for Automatic Design Space Exploration (FADSE), which focuses on microarchitectural optimizations. This framework includes several state-of-the-art heuristic algorithms.
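The multi-objective algorithms compared in such frameworks all search for Pareto-optimal configurations. The core notion can be sketched as follows (the candidate configurations and objectives are hypothetical, not FADSE data):

```python
# Sketch: filter a set of candidate designs down to the Pareto front,
# assuming every objective is to be minimised.

def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly
    better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# (energy, latency) for four hypothetical superscalar configurations:
configs = [(3, 9), (5, 4), (4, 10), (6, 6)]
print(pareto_front(configs))  # -> [(3, 9), (5, 4)]
```

Heuristics such as genetic algorithms or particle swarm optimization explore the design space without enumerating it, approximating this front with far fewer simulations.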
- Published
- 2013
- Full Text
- View/download PDF