147 results for "Theo Ungerer"
Search Results
2. PIMP My Many-Core: Pipeline-Integrated Message Passing
- Author
-
Jörg Mische, Martin Frieb, Alexander Stegmeier, and Theo Ungerer
- Subjects
Software engineering, Computer hardware & architecture, Software, Information Systems, Theoretical Computer Science - Abstract
To improve scalability, several many-core architectures use message passing instead of shared memory accesses for communication. Unfortunately, message passing is usually emulated by Direct Memory Access (DMA) transfers in a shared address space, which entails considerable overhead and thwarts the advantages of message passing. Recently proposed register-level message passing alternatives use special instructions to send the contents of a single register to another core. The reduced communication overhead and architectural simplicity lead to good many-core scalability. After investigating several other approaches in terms of hardware complexity and throughput overhead, we recommend a small instruction set extension that enables register-level message passing at minimal hardware cost and describe its integration into a classical five-stage RISC-V pipeline.
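The behaviour of such a register-level send/receive extension can be sketched in a few lines of Python. This is a hypothetical software model for illustration only, not the paper's actual ISA extension; the class `Core` and the FIFO depth are assumptions:

```python
from collections import deque

class Core:
    """Toy model of a core with register-level message passing.

    send/recv stand in for the hypothetical ISA instructions: a send
    pushes one register-sized word directly into the receiver's small
    hardware FIFO; a recv pops the oldest word. In hardware both would
    stall; here a stall is modelled as an error."""

    def __init__(self, core_id, fifo_depth=4):
        self.core_id = core_id
        self.fifo = deque()
        self.fifo_depth = fifo_depth

    def send(self, dest, value):
        # One word travels straight to the destination FIFO --
        # no DMA setup, no shared-memory message buffer.
        if len(dest.fifo) >= dest.fifo_depth:
            raise RuntimeError("receive FIFO full: sender would stall")
        dest.fifo.append((self.core_id, value))

    def recv(self):
        if not self.fifo:
            raise RuntimeError("receive FIFO empty: receiver would stall")
        return self.fifo.popleft()

c0, c1 = Core(0), Core(1)
c0.send(c1, 42)
print(c1.recv())  # (0, 42): sender id and the transferred word
```

The point of the model is the absence of any shared buffer: a message is a single word plus the sender's id, which is why the hardware cost can stay minimal.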
- Published
- 2020
- Full Text
- View/download PDF
3. Trustworthy self-optimization for organic computing environments using multiple simultaneous requests
- Author
-
Theo Ungerer and Nizar Msadek
- Subjects
Self-organization, Service (systems architecture), Computer science, Distributed computing, Networking & telecommunications, Workload, Organic computing, Self-optimization, Autonomic computing, Trustworthiness, Hardware and Architecture, Node (computer science), Software, Computer network - Abstract
Open distributed systems are rapidly becoming more complex. It is therefore essential that such systems be able to adapt autonomously to changes in their environment. They should be characterized by so-called self-x properties such as self-configuration, self-optimization and self-healing. The autonomous optimization of nodes at runtime in open distributed environments is a crucial part of developing self-optimizing systems. In this paper, we present a self-optimization approach that not only considers pure load balancing but also takes trust into account to improve the assignment of important services to trustworthy nodes. Our approach uses different optimization strategies to determine whether a service should be transferred to another node. The evaluation results show that the proposed approach balances the workload between nodes nearly optimally. Moreover, it significantly improves the availability of important services: the achieved availability was no lower than 85% of the maximum theoretical availability value.
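The core idea of combining load balancing with trust can be illustrated by a minimal greedy sketch. This is an assumption-laden toy, not the paper's algorithm: services are assigned in order of importance, always to the least-loaded node, and among equally loaded nodes the most trustworthy one wins (all names are hypothetical):

```python
def assign_services(services, nodes):
    """Greedy trust-aware placement (illustrative sketch only).

    services: list of (name, importance) pairs
    nodes:    dict mapping node name -> trust value in [0, 1]
    Returns a dict service name -> node name."""
    load = {n: 0 for n in nodes}
    assignment = {}
    # Most important services are placed first, so they get the
    # most trustworthy among the least-loaded nodes.
    for name, importance in sorted(services, key=lambda s: -s[1]):
        least = min(load.values())
        candidates = [n for n in load if load[n] == least]
        target = max(candidates, key=lambda n: nodes[n])
        assignment[name] = target
        load[target] += 1
    return assignment

nodes = {"A": 0.9, "B": 0.5, "C": 0.7}
services = [("logging", 1), ("payment", 5), ("search", 3)]
print(assign_services(services, nodes))
# {'payment': 'A', 'search': 'C', 'logging': 'B'}
```

The load ends up perfectly balanced (one service per node), while the most important service lands on the most trusted node, which is the trade-off the abstract describes.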
- Published
- 2017
- Full Text
- View/download PDF
4. Investigating Transactional Memory for High Performance Embedded Systems
- Author
-
Sebastian Weis, Christian Piatka, Theo Ungerer, Florian Haas, Sebastian Altmeyer, and Rico Amslinger
- Subjects
Transaction management, Computer science, Embedded systems, Transactional memory, Workload - Abstract
We present a Transaction Management Unit (TMU) for Hardware Transactional Memories (HTMs). Our TMU supports three different contention management strategies, which can be chosen according to the workload, and additionally enables transactions that are unbounded in size. Our approach tackles two challenges of traditional HTMs: (1) potentially high abort rates and (2) missing support for unbounded transactions. By enhancing a simulator with a transactional memory and our TMU, we demonstrate that the TMU achieves speedups of up to 4.2 and reduces abort rates by a factor of up to 11.6 for some of the STAMP benchmarks.
- Published
- 2020
- Full Text
- View/download PDF
5. Support for the logical execution time model on a time-predictable multicore processor
- Author
-
Florian Kluge, Martin Schoeberl, and Theo Ungerer
- Subjects
Multi-core processor, Computer science, Compositionality, Message passing, Parallel computing, Logical execution time, Execution time, Bottleneck, Shared memory, Computer Science (miscellaneous), Engineering (miscellaneous) - Abstract
The logical execution time (LET) model increases the compositionality of real-time task sets. Removal or addition of tasks does not influence the communication behavior of other tasks. In this work, we extend a multicore operating system running on a time-predictable multicore processor to support the LET model. For communication between tasks we use message passing on a time-predictable network-on-chip to avoid the bottleneck of shared memory. We report our experiences and present results on the costs in terms of memory and execution time.
- Published
- 2016
- Full Text
- View/download PDF
6. WCTT bounds for MPI primitives in the PaterNoster NoC
- Author
-
Jörg Mische, Theo Ungerer, Martin Frieb, and Alexander Stegmeier
- Subjects
Schedule, Computer science, Distributed computing, Parallel computing, Multiplexing, Tree traversal, Message size, Scope (computer science), Computer Science (miscellaneous), Engineering (miscellaneous) - Abstract
This paper applies several variants of application-independent time-division multiplexing to MPI primitives and investigates their applicability for different scopes of communication. The scopes are characterized by the size of the network-on-chip, the number of participating nodes, and the message size sent to each receiver or received from each sender, respectively. The evaluation shows that none of the observed variants features the lowest worst-case traversal time in all situations; instead, several schedule variants each perform best in a different scope of communication parameters.
- Published
- 2016
- Full Text
- View/download PDF
7. PIMP My Many-Core: Pipeline-Integrated Message Passing
- Author
-
Theo Ungerer, Jörg Mische, Alexander Stegmeier, and Martin Frieb
- Subjects
Address space, Computer science, Message passing, Software engineering, Pipeline (software), Shared memory, Theory of computation, Scalability, Overhead (computing), Direct memory access, Computer network - Abstract
To improve scalability, several many-core architectures use message passing instead of shared memory accesses for communication. Unfortunately, message passing is usually emulated by Direct Memory Access (DMA) transfers in a shared address space, which entails considerable overhead and thwarts the advantages of message passing.
- Published
- 2019
- Full Text
- View/download PDF
8. Memristoren für zukünftige Rechnersysteme
- Author
-
Theo Ungerer, Wolfgang Karl, and Dietmar Fey Fey
- Subjects
Computer Science Applications, Information Systems - Abstract
A memristor is one of a class of two-terminal electronic devices whose current/voltage characteristic exhibits a pinched hysteresis loop, usually passing through the origin ("if it's pinched it's a memristor"). Memristors offer attractive properties such as high storage densities, low electrical power for reading and writing, multi-bit storage capability, and CMOS-compatible manufacturing processes. Moreover, memristors can be used not only to store but also to process data. Memristors will change future computing systems at various levels and shift them towards memory-centric architectures. This begins with still largely conventional approaches such as so-called storage-class memories, a new level of the memory hierarchy between main memory and background storage, and extends to unconventional near- and in-memory architectures, in which data is stored and processed without spatially large and thus comparatively energy-intensive data movement between processors and memory. Furthermore, by directly mimicking synapses as flexible electrical resistances, closely following the biological model of a neural network, memristors offer manifold possibilities for realizing neuromorphic and biologically inspired circuits and architectures.
- Published
- 2020
- Full Text
- View/download PDF
9. A trustworthy, fault-tolerant and scalable self-configuration algorithm for Organic Computing systems
- Author
-
Rolf Kiefhaber, Nizar Msadek, and Theo Ungerer
- Subjects
Self-organization, Computer science, Distributed computing, Fault tolerance, Organic computing, Load balancing (computing), Contract Net Protocol, Trustworthiness, Hardware and Architecture, Scalability, Software, Computer network - Abstract
The growing complexity of today's computing systems requires a large amount of administration, which makes manual administration a serious challenge. Therefore, new ways have to be found to manage these systems autonomously. They should be characterized by so-called self-x properties such as self-configuration, self-optimization, self-healing and self-protection. The autonomous assignment of services to nodes in a distributed way is a crucial part of developing self-configuring systems. In this paper, we introduce a self-configuration algorithm for Organic Computing systems, which aims, on the one hand, to distribute the load of services equally across nodes, as in a typical load-balancing scenario, and, on the other hand, to assign services with different importance levels to nodes so that the more important services run on the more trustworthy nodes. Furthermore, the proposed algorithm includes a fault-handling mechanism that enables the system to continue hosting services even in the presence of faults. The evaluation indicates that the proposed approach is suitable for large-scale distributed systems.
- Published
- 2015
- Full Text
- View/download PDF
10. Redundant Execution on Heterogeneous Multi-cores Utilizing Transactional Memory
- Author
-
Theo Ungerer, Rico Amslinger, Florian Haas, Christian Piatka, and Sebastian Weis
- Subjects
Multi-core processor, Computer science, Transactional memory, Fault tolerance, Lockstep, Embedded systems, Cache - Abstract
Cycle-by-cycle lockstep execution as implemented by current embedded processors is unsuitable for energy-efficient heterogeneous multi-cores, because the different cores are not cycle synchronous. Furthermore, current and future safety-critical applications demand fail-operational execution, which requires mechanisms for error recovery.
- Published
- 2018
- Full Text
- View/download PDF
11. Lightweight Hardware Synchronization for Avoiding Buffer Overflows in Network-on-Chips
- Author
-
Martin Frieb, Jörg Mische, Theo Ungerer, and Alexander Stegmeier
- Subjects
Computer science, Network-on-chip, Execution time, Software development process, Software, Synchronization (computer science), Interrupt, Buffer overflow, Computer hardware - Abstract
Buffer overflows are a serious problem when running message-passing programs on network-on-chip-based many-core processors. A simple synchronization mechanism ensures that data is transferred only when nodes need it, thereby avoiding full buffers and interruptions at other times. However, software synchronization cannot fully achieve these objectives, because its flits may still interrupt nodes or fill buffers. Therefore, we propose a lightweight hardware synchronization. It requires only small architectural changes, as it comprises only very small components, and it scales well. To control the hardware-supported synchronization, we add two new assembler instructions. Furthermore, we show the difference in the software development process and evaluate the impact on the execution time of global communication operations and on the number of required receive buffer slots.
- Published
- 2018
- Full Text
- View/download PDF
12. A hard real-time capable multi-core SMT processor
- Author
-
Eduardo Quinones, Stefan Metzlaff, Theo Ungerer, Jörg Mische, Mike Gerdes, Marco Paolieri, Sascha Uhrig, and Francisco J. Cazorla
- Subjects
Multi-core processor, Computer science, Parallel computing, Execution time, Task (computing), Worst-case execution time, Hardware and Architecture, Multithreading, Predictability, Software - Abstract
Hard real-time applications in safety-critical domains require high performance and time analyzability. Multi-core processors are an answer to these demands; however, inter-task interferences make multi-cores more difficult to analyze from a worst-case execution time point of view than single-core processors. We propose a multi-core SMT processor that bounds the maximum delay a task can suffer due to inter-task interferences. Multiple hard real-time tasks can be executed on different cores together with additional non-real-time tasks. Our evaluation shows that the proposed MERASA multi-core provides predictability for hard real-time tasks and high performance for non-hard-real-time tasks.
- Published
- 2013
- Full Text
- View/download PDF
13. Fault-Tolerant Execution on COTS Multi-core Processors with Hardware Transactional Memory Support
- Author
-
Sebastian Weis, Theo Ungerer, Florian Haas, Gilles Pokam, and Youfeng Wu
- Subjects
Multi-core processor, Xeon, Computer science, Transactional memory, Fault tolerance, Lockstep, Parallel computing, Software, Embedded systems, x86, Instrumentation (computer programming) - Abstract
The demand for fault-tolerant execution on high-performance computer systems increases due to the higher fault rates that result from smaller structure sizes. As an alternative to hardware-based lockstep solutions, software-based fault-tolerance mechanisms can increase the reliability of multi-core commercial off-the-shelf (COTS) CPUs while being cheaper and more flexible. This paper proposes a software/hardware hybrid approach targeting Intel's current x86 multi-core platforms of the Core and Xeon families. We leverage hardware transactional memory (Intel TSX) to support implicit checkpoint creation and fast rollback. Redundant execution of processes and signature-based comparison of their computations provide error detection, and transactional wrapping enables error recovery. Existing applications are enhanced towards fault-tolerant redundant execution by post-link binary instrumentation. Hardware enhancements to further increase the applicability of the approach are proposed and evaluated with the SPEC CPU 2006 benchmarks. The resulting performance overhead is 47% on average, assuming the existence of the proposed hardware support.
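The interplay of checkpointing, redundant execution, signature comparison, and rollback can be simulated in miniature. The following Python sketch is a hypothetical simplification for illustration, not the paper's TSX-based implementation; the names (`signature`, `run_redundant`) and the fault-injection scheme are assumptions:

```python
import copy
import hashlib

def signature(state):
    # Compact signature of a computation's effects (here: a state dict),
    # standing in for the paper's signature-based comparison.
    return hashlib.sha256(repr(sorted(state.items())).encode()).hexdigest()

def run_redundant(transactions, state):
    """Run each transaction twice; commit only when both runs agree."""
    for tx in transactions:
        checkpoint = copy.deepcopy(state)     # implicit checkpoint (cf. XBEGIN)
        while True:
            leading = copy.deepcopy(checkpoint)
            trailing = copy.deepcopy(checkpoint)
            tx(leading)                        # redundant executions of the
            tx(trailing)                       # same transactional section
            if signature(leading) == signature(trailing):
                state.clear()
                state.update(leading)          # commit (cf. XEND)
                break
            # signature mismatch: discard both runs, roll back, retry

# A transient fault that strikes only the first execution:
fault = {"armed": True}
def tx_increment(s):
    s["x"] = s.get("x", 0) + 1
    if fault["armed"]:
        fault["armed"] = False
        s["x"] += 100  # injected transient bit flip

state = {}
run_redundant([tx_increment], state)
print(state)  # {'x': 1} -- the corrupted run was detected and rolled back
```

The retry loop is where the "fast rollback" pays off: a transient fault costs one re-execution of the affected transaction rather than a restart of the whole program.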
- Published
- 2017
- Full Text
- View/download PDF
14. An Efficient Replication Approach based on Trust for Distributed Self-healing Systems
- Author
-
Theo Ungerer and Nizar Msadek
- Subjects
Computer science, Self-healing, Replication (computing) - Published
- 2017
- Full Text
- View/download PDF
15. Reduced Complexity Many-Core: Timing Predictability Due to Message-Passing
- Author
-
Theo Ungerer, Jörg Mische, Alexander Stegmeier, and Martin Frieb
- Subjects
Computer science, Message passing, Message Passing Interface, Static timing analysis, Interference, Shared memory, Predictability, Architecture, Cache coherence, Computer network - Abstract
The Reduced Complexity Many-Core architecture (RC/MC) targets to simplify timing analysis by increasing the predictability of all components. Since shared memory interference is a major source of pessimism in many-core systems, fine-grained message passing between small cores with private memories is used instead of a global shared memory.
- Published
- 2017
- Full Text
- View/download PDF
16. Finding near-perfect parameters for hardware and code optimizations with automatic multi-objective design space explorations
- Author
-
Lucian Vintan, Theo Ungerer, Horia Calborean, and Ralf Jahr
- Subjects
Speedup, Computer Networks and Communications, Design space exploration, Computer science, Parallel computing, Program optimization, Multi-objective optimization, Microarchitecture, Scalability, Engineering design process, Computer Science Applications, Theoretical Computer Science, Computational Theory and Mathematics, Software, Computer hardware - Abstract
In the design process of computer systems or processor architectures, typically many different parameters are exposed to configure, tune, and optimize every component of a system. For evaluations and before production, it is desirable to know the best setting for all parameters. Processing speed is no longer the only objective that needs to be optimized; power consumption, chip area, and other metrics have become very important. Thus, the best configurations have to be found with respect to multiple objectives. In this article, we use a multi-objective design space exploration tool called Framework for Automatic Design Space Exploration (FADSE) to automatically find near-optimal configurations in the vast design space of a processor architecture together with a tool for code optimizations, and hence evaluate both automatically. As an example, we use the Grid ALU Processor (GAP) and its post-link optimizer GAPtimize, which can apply feedback-directed and platform-specific code optimizations. Our results show that FADSE is able to cope with both design spaces. Less than 25% of the maximal reasonable hardware effort for the scalable elements of the GAP is enough to reach the processor's performance maximum. With a performance-reduction tolerance of 10%, the necessary hardware complexity can be reduced further by about two-thirds. The high-quality configurations found are analyzed, exhibiting strong relationships between the parameters of the GAP, the distribution of complexity, and the total performance. These performance numbers can be improved by applying code optimizations concurrently with optimizing the hardware parameters. FADSE can find near-optimal configurations by effectively combining and selecting parameters for hardware and code optimizations in a short time. The maximum observed speedup is 15%. With the use of code optimizations, the maximum possible reduction of the hardware resources, while sustaining the same performance level, is 50%.
- Published
- 2012
- Full Text
- View/download PDF
17. MANJAC — Ein Many-Core-Emulator auf Multi-FPGA-Basis
- Author
-
Theo Ungerer, Sascha Uhrig, Sebastian Schlingmann, and Christian Bradatsch
- Subjects
Many-core, Computer science, Embedded systems, Field-programmable gate array - Published
- 2011
- Full Text
- View/download PDF
18. The Multi-Core Challenge
- Author
-
Theo Ungerer
- Subjects
Multi-core processor, Many-core, General Computer Science, Computer science, Embedded systems, Die (integrated circuit) - Abstract
Multi-cores are the contemporary solution for reaching high performance without increasing the clock frequency. Multi-cores integrate two or more cores, i.e., processors, on a single die. Future multi-cores may comprise hundreds or even thousands of cores and pose challenges for processor design, system architecture, programming languages, and application programs.
- Published
- 2010
- Full Text
- View/download PDF
19. Data Age Diminution in the Logical Execution Time Model
- Author
-
Florian Kluge, Theo Ungerer, and Christian Bradatsch
- Subjects
Computer science, Real-time computing, Logical execution time, Task (computing), Control theory, Benchmark (computing), Predictability, Jitter - Abstract
The logical execution time (LET) model separates logical from physical execution times. Tasks' input and output of data occur at predictable times, namely the tasks' arrival times and deadlines, respectively. The output of data is delayed until the end of the period, meaning that output times have no jitter. The delayed output affects the freshness of data, i.e., the data age, between interacting tasks. Recently, criticism has arisen from control theory that the LET approach provides outdated data. We analyze the data age of communicating tasks and propose an approach that reduces it: we shorten the LET of tasks so that output data is provided earlier than at a task's deadline, while still preserving the predictability of output times. To confirm the improvement in data age, we simulate 100 randomly generated task sets. Moreover, we simulate a task set from a real-world automotive benchmark and show an improvement in the average data age of approximately 33% with our approach compared to the LET model.
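The effect of shortening the LET on data age can be worked through with a small Python sketch. This is a hypothetical model, not the paper's analysis: a producer task is released at every multiple of `period`, samples its inputs at release, and publishes its output, jitter-free, exactly `let` time units later; data age is measured back to the sampling instant:

```python
def data_age(read_time, period, let):
    """Age (at read_time) of the freshest data a consumer can see,
    under the LET model sketched above. The job released at k*period
    publishes its output exactly at k*period + let."""
    if read_time < let:
        raise ValueError("no job has published an output yet")
    # Latest job whose output is already visible at read_time:
    k = (read_time - let) // period
    # Age counts from the instant that job sampled its inputs (its release).
    return read_time - k * period

# Classic LET (LET == period) versus a reduced LET, read at t = 17:
print(data_age(17, period=10, let=10))  # 17
print(data_age(17, period=10, let=6))   # 7
```

With the classic LET, a reader at t = 17 still sees the output of the job released at t = 0; with the LET cut to 6, the job released at t = 10 has already published, so the data is 10 time units fresher while output times remain perfectly predictable.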
- Published
- 2016
- Full Text
- View/download PDF
20. Smart doorplate
- Author
-
Theo Ungerer, Faruk Bagci, Jan Petzold, and Wolfgang Trumler
- Subjects
Hardware and Architecture, Computer science, Middleware (distributed applications), Context awareness, Management Science and Operations Research, Computer security, Computer Science Applications - Abstract
This paper introduces the vision of smart doorplates within an office building. The doorplates are able to display current situational information about the office owner, to act instead of the office owner in case of absence, and to direct visitors to the current location of the office owner based on a location-tracking system. Different scenarios are proposed and a prototype implementation is presented.
- Published
- 2003
- Full Text
- View/download PDF
21. Trustworthy Self-optimization in Organic Computing Environments
- Author
-
Theo Ungerer, Nizar Msadek, and Rolf Kiefhaber
- Subjects
Trustworthiness, Computer science, Distributed computing, Node (networking), Workload, Organic computing, Self-optimization, Autonomic computing - Abstract
In this paper, we present a self-optimization approach that not only considers pure load balancing but also takes trust into account to improve the assignment of important services to trustworthy nodes. Our approach uses different optimization strategies to determine whether a service should be transferred to another node. The evaluation results show that the proposed approach balances the workload between nodes nearly optimally. Moreover, it significantly improves the availability of important services: the achieved availability was no lower than 85% of the maximum theoretical availability value.
- Published
- 2015
- Full Text
- View/download PDF
22. Utility-Based Scheduling of (m,k)-Firm Real-Time Task Sets
- Author
-
Theo Ungerer, Florian Kluge, and Markus Neuerburg
- Subjects
Mathematical optimization, Computer science, Robust control, Scheduling (computing) - Abstract
The concept of a firm real-time task implies the notion of a firm deadline that should not be missed by the jobs of this task. If a deadline miss occurs, the concerned job yields no value to the system. It turns out that for some application domains, this restrictive notion can be relaxed. For example, robust control systems can tolerate that single executions of a control loop miss their deadlines and still yield acceptable behaviour. Thus, systems can be developed under more optimistic assumptions, e.g. by allowing overloads. However, care must be taken that deadline misses do not accumulate. This restriction can be expressed by the model of (m,k)-firm real-time tasks, which requires that within any k successive jobs of a task at least m jobs are executed successfully. This paper presents the heuristic utility-based algorithm MKU for scheduling sets of (m,k)-firm real-time tasks. To this end, MKU uses history-cognisant utility functions. Simulations show that for moderate overloads, MKU achieves a higher schedulability ratio than other schedulers developed for (m,k)-firm real-time tasks.
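The (m,k)-firm constraint itself is easy to state in code. The sketch below is only the constraint check over a job history (True = deadline met), not the MKU scheduler; the function name is hypothetical:

```python
from collections import deque

def mk_firm_ok(history, m, k):
    """Check the (m,k)-firm constraint: within every window of k
    consecutive jobs, at least m must have met their deadline.

    history: iterable of booleans, one per job, in execution order."""
    window = deque(maxlen=k)   # sliding window of the last k outcomes
    for met in history:
        window.append(met)
        if len(window) == k and sum(window) < m:
            return False       # some k-window has fewer than m successes
    return True

# (2,3)-firm: at most one miss in any three consecutive jobs.
print(mk_firm_ok([True, False, True, True, False, True], m=2, k=3))  # True
print(mk_firm_ok([True, False, False, True], m=2, k=3))              # False
```

The second call fails because the window [True, False, False] contains only one met deadline, which is exactly the "misses must not accumulate" condition the abstract describes.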
- Published
- 2015
- Full Text
- View/download PDF
23. EMSBench: Benchmark und Testumgebung für reaktive Systeme
- Author
-
Florian Kluge and Theo Ungerer
- Abstract
Benchmark suites for embedded real-time systems mostly cover only the computations that are typical for such systems. This allows the evaluation of pure computing performance, but leaves other aspects aside. Reactive behaviour and the interaction between many software modules, as found in today's complex embedded real-time systems, are not represented. With regard to the use of multi-core processors in embedded real-time systems, however, this is of considerable importance. Research depends on suitable example applications to assess the practicality of new techniques. This work takes a first step towards closing this gap. We present the software package EMSBench, which consists of two components: (1) an open-source engine control software for combustion engines, adapted so that it can serve as a benchmark program for complex, reactive embedded real-time systems, and (2) an emulation of the crankshaft behaviour that generates the input signals which largely determine the internal behaviour of the benchmark program.
- Published
- 2015
- Full Text
- View/download PDF
24. A Trust- and Load-Based Self-Optimization Algorithm for Organic Computing Systems
- Author
-
Theo Ungerer, Nizar Msadek, and Rolf Kiefhaber
- Subjects
Task (computing) ,Load management ,Trustworthiness ,Computer science ,Distributed computing ,Workload ,Algorithm design ,Organic computing ,Algorithm ,Self-optimization ,Autonomic computing - Abstract
In this paper, a new self-optimization design for Organic Computing systems is investigated. Its main task, besides load balancing, is to assign services with different importance levels to nodes so that the more important services are assigned to more trustworthy nodes. The evaluation results show that the proposed algorithm balances the workload between nodes nearly optimally. Moreover, it significantly improves the availability of important services.
- Published
- 2014
- Full Text
- View/download PDF
25. Effects of structured parallelism by parallel design patterns on embedded hard real-time systems
- Author
-
Theo Ungerer, Pavel Zaykov, Ralf Jahr, Haluk Ozaktas, Mike Gerdes, and Christine Rochange
- Subjects
Pipelines, Computer science, Data parallelism, Parallel design patterns, Message systems, Real-time computing, Task parallelism, Static timing analysis, Parallel computing, Synchronization, Structured parallelism, Software, Parallel processing, Real-time systems - Abstract
Parallel multi-threaded applications are needed to gain an advantage from multi- and many-core processors. Such processors are increasingly considered for embedded hard real-time systems with defined timing guarantees, too. Static timing analysis, which is one way to calculate the worst-case execution time (WCET) of parallel applications, is complex and time-consuming due to the difficulty of analyzing the interferences between threads and the high annotation effort required to resolve them.
- Published
- 2014
- Full Text
- View/download PDF
26. Comparison of service call implementations in an AUTOSAR multi-core OS
- Author
-
Theo Ungerer, Florian Kluge, and Christian Bradatsch
- Subjects
Service (systems architecture), Multi-core processor, Record locking, Computer science, AUTOSAR, Software, Operating systems, Implementation - Abstract
Multi-core processors are gaining a foothold in the domain of embedded automotive systems. AUTOSAR Release 4.1 establishes a common standard for the use of multi-core processors in automotive systems. While interfaces and functionalities are well defined in the specification, the actual implementation is left open to the software manufacturers. We exploit this latitude for the implementation of cross-core service calls. In this paper, we compare two opposite implementation approaches that can be used in shared-memory multi-core processors: the actual execution of a service call takes place either on the affected core or on the invoking core. Our performance evaluations indicate an advantage for a lock-based approach with execution on the invoking core.
- Published
- 2014
- Full Text
- View/download PDF
27. An Operating System for Safety-Critical Applications on Manycore Processors
- Author
-
Theo Ungerer, Mike Gerdes, and Florian Kluge
- Subjects
Many core ,Life-critical system ,Predictable behaviour ,business.industry ,Computer science ,Embedded system ,Parallelism (grammar) ,Operating system ,USable ,business ,computer.software_genre ,computer ,Domain (software engineering) - Abstract
Processor technology is advancing from bus-based multicores to network-on-chip-based manycores, posing new challenges for operating system design. In this paper, we discuss why future safety-critical systems can profit from such new architectures. To make the potential of manycore processors usable in safety-critical systems, we devise the operating system MOSSCA, which is adapted to the special requirements prevailing in this domain. MOSSCA introduces abstractions that support application developers in writing safety-critical applications. Internally, MOSSCA runs in a distributed manner to achieve high parallelism while still guaranteeing predictable behaviour.
- Published
- 2014
- Full Text
- View/download PDF
28. Mikroprozessoren
- Author
-
Theo Ungerer
- Subjects
Very long instruction word ,Computer science ,Operating system ,EPIC ,computer.software_genre ,computer ,Computer Science Applications ,Information Systems - Published
- 2001
- Full Text
- View/download PDF
29. Performance of simultaneous multithreaded multimedia-enhanced processors for MPEG-2 video decompression
- Author
-
Theo Ungerer, Heiko Oehring, and Ulrich Sigmund
- Subjects
Instructions per cycle ,Multimedia ,Computer science ,business.industry ,Pipeline burst cache ,computer.file_format ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,computer.software_genre ,Simultaneous multithreading ,Microarchitecture ,law.invention ,Microprocessor ,Hardware and Architecture ,law ,MPEG-2 ,Media processor ,Multithreading ,Embedded system ,Operating system ,business ,computer ,Software - Abstract
This paper explores microarchitecture models for a simultaneous multithreaded (SMT) processor with multimedia enhancements. We start with a wide-issue superscalar processor and enhance it by the SMT technique, by multimedia units, and by an additional on-chip RAM storage. Our workload is a multithreaded MPEG-2 video decompression algorithm that extensively uses the multimedia units. The simulations show that a single-threaded, 8-issue maximum processor (assuming an abundance of resources) reaches an instructions per cycle (IPC) count of only 1.60, while an 8-threaded 8-issue processor is able to reach an IPC of 6.07. A more realistic processor model reaches an IPC of 1.27 in the single-threaded 8-issue mode, versus 3.03 in the 4-threaded 4-issue and 3.21 in the 8-threaded 8-issue modes. Our conclusion for next-generation microprocessors is that a 2- or 4-threaded 4-issue processor with a small on-chip RAM accessed by a local load/store unit will be superior to a wide-issue (single-threaded) superscalar processor, at least for MPEG-2-style video decompression algorithms.
- Published
- 2000
- Full Text
- View/download PDF
30. A survey of new research directions in microprocessors
- Author
-
Borut Robič, Jurij Šilc, and Theo Ungerer
- Subjects
Computer Networks and Communications ,Dataflow ,Computer science ,Fetch ,Pipeline burst cache ,Parallel computing ,computer.software_genre ,Microarchitecture ,law.invention ,Microprocessor ,Artificial Intelligence ,Hardware and Architecture ,Very long instruction word ,law ,Superscalar ,Uniprocessor system ,Instruction pipeline ,Cache ,Compiler ,FR-V ,computer ,Software ,TRACE (psycholinguistics) - Abstract
Current microprocessors exploit instruction-level parallelism through a deep processor pipeline and the superscalar instruction issue technique. VLSI technology offers several solutions for aggressive exploitation of instruction-level parallelism in future generations of microprocessors. Technological advances will replace gate delay with on-chip wire delay as the main obstacle to increasing chip complexity and cycle rate. The implication for the microarchitecture is that functionally partitioned designs with strict nearest-neighbour connections must be developed. Among the major problems facing microprocessor designers is the application of an even higher degree of speculation in combination with functional partitioning of the processor, which prepares the way for exceeding the classical dataflow limit imposed by data dependences. In this paper we survey the current approaches to solving this problem; in particular, we analyse several new research directions whose solutions are based on a complex uniprocessor architecture. Such a uniprocessor chip features a very aggressive superscalar design combined with a trace cache and superspeculative techniques. Superspeculative techniques exceed the classical dataflow limit, where even with unlimited machine resources a program cannot execute any faster than the execution of the longest dependence chain introduced by the program's data dependences. Superspeculative processors also speculate about control dependences. The trace cache stores dynamic instruction traces contiguously, and instructions are fetched from the trace cache rather than from the instruction cache. Since a dynamic trace of instructions may contain multiple taken branches, there is no need to fetch from multiple targets, as would be necessary when predicting multiple branches and fetching 16 or 32 instructions from the instruction cache.
Multiscalar and trace processors define several processing cores that speculatively execute different parts of a sequential program in parallel. Multiscalar processors use a compiler to partition the program segments, whereas a trace processor uses a trace cache to dynamically generate trace segments for the processing cores. A datascalar processor runs the same sequential program redundantly on several processing elements, where each processing element operates on a different data set. This paper discusses and compares the performance potential of these complex uniprocessors.
- Published
- 2000
- Full Text
- View/download PDF
31. parMERASA -- Multi-core Execution of Parallelised Hard Real-Time Applications Supporting Analysability
- Author
-
Arthur Pyka, Haluk Ozaktas, Dave George, João Carlos Lopes Fernandes, Hugues Cassé, Florian Kluge, Milos Panic, Pavel Zaykov, Theo Ungerer, Armelle Bonenfant, Ralf Jahr, Bert Böddeker, Zlatko Petrov, Hans Regler, Mike Gerdes, Andreas Hugl, Christian Bradatsch, Sascha Uhrig, Jaume Abella, Mathias Rohde, Sebastian Kehr, Francisco J. Cazorla, Ian Broster, Nick Lay, Christine Rochange, Pascal Sainrat, Eduardo Quinones, Jörg Mische, University of Augsburg [Augsburg], Honeywell International S.r.o. [Prague], DENSO (JAPAN), Bauer Group (GERMANY), Groupe de Recherche en Architecture et Compilation pour les systèmes embarqués (IRIT-TRACES), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, Université Toulouse III - Paul Sabatier (UT3), Rapita Systems Ltd [York], Barcelona Supercomputing Center - Centro Nacional de Supercomputacion (BSC - CNS), Universitat Politècnica de Catalunya [Barcelona] (UPC), Consejo Superior de Investigaciones Científicas [Madrid] (CSIC), Technische Universität Dortmund [Dortmund] (TU), Barcelona Supercomputing Center – Centro Nacional de Supercomputación - BSC-CNS (SPAIN), Centre National de la Recherche Scientifique - CNRS (FRANCE), Consejo Superior de Investigaciones Científicas - CSIC (SPAIN), Institut National Polytechnique de Toulouse - Toulouse INP (FRANCE), Université Toulouse III - Paul Sabatier - UT3 (FRANCE), Université Toulouse - Jean Jaurès - UT2J (FRANCE), Université Toulouse 1 Capitole - UT1 (FRANCE), Universitat Politècnica de Catalunya - UPC 
(SPAIN), Honeywell (USA), Rapita System (USA), Technische Universität Dortmund - TU Dortmund (GERMANY), University of Augsburg (GERMANY), Institut de Recherche en Informatique de Toulouse - IRIT (Toulouse, France), Technical University of Catalonia – Barcelona Tech (Girona, Espagne), and Institut National Polytechnique de Toulouse - INPT (FRANCE)
- Subjects
[INFO.INFO-AR]Computer Science [cs]/Hardware Architecture [cs.AR] ,Computer science ,Embedded systems ,Parallel programming ,Real-time computing ,Système d'exploitation ,Automotive industry ,Réseaux et télécommunications ,02 engineering and technology ,[INFO.INFO-NI]Computer Science [cs]/Networking and Internet Architecture [cs.NI] ,Many core ,Architectures Matérielles ,0202 electrical engineering, electronic engineering, information engineering ,Mixed criticality ,Multi-core processor ,Control algorithm ,Multiprocessing systems ,business.industry ,Avionics ,Systèmes embarqués ,020202 computer hardware & architecture ,Embedded system ,Scalability ,[INFO.INFO-ES]Computer Science [cs]/Embedded Systems ,020201 artificial intelligence & image processing ,[INFO.INFO-OS]Computer Science [cs]/Operating Systems [cs.OS] ,business ,System software - Abstract
International audience; Engineers who design hard real-time embedded systems express a need for several times the performance available today while keeping safety as the major criterion. A breakthrough in performance is expected from parallelizing hard real-time applications and running them on an embedded multi-core processor, which enables combining the requirements for high performance with timing-predictable execution. parMERASA will provide a timing-analyzable system of parallel hard real-time applications running on a scalable multi-core processor. parMERASA goes one step beyond mixed-criticality demands: it targets future complex control algorithms by parallelizing hard real-time programs to run on predictable multi-/many-core processors. We aim to achieve a breakthrough in techniques for the parallelization of industrial hard real-time programs, provide hard real-time support in system software, WCET analysis and verification tools for multi-cores, and techniques for predictable multi-core designs with up to 64 cores.
- Published
- 2013
- Full Text
- View/download PDF
32. A pattern-supported parallelization approach
- Author
-
Ralf Jahr, Theo Ungerer, and Mike Gerdes
- Subjects
020203 distributed computing ,Parallelism (rhetoric) ,Data parallelism ,Computer science ,Degree of parallelism ,Task parallelism ,02 engineering and technology ,Parallel computing ,020202 computer hardware & architecture ,Automatic parallelization ,Software design pattern ,Parallel programming model ,0202 electrical engineering, electronic engineering, information engineering ,Algorithmic skeleton - Abstract
In the embedded systems domain a trend towards multi- and many-core processors is evident. For the exploitation of these additional processing elements, parallel software is inevitable. The pattern-supported parallelization approach introduced here eases the transition from sequential to parallel software. It is a novel model-based approach with a clear methodology that uses parallel design patterns as known building blocks. First, the Activity and Pattern Diagram is created, revealing the maximum degree of parallelism expressed by parallel design patterns. Second, the degree of parallelism is reduced to the level providing the best performance by agglomerating activities and patterns. In this step, trade-offs caused by the target platform, e.g. the computation-communication ratio, are respected. As an implementation of the parallel design patterns, a library with algorithmic skeletons can be used. This reduces development effort and simplifies the transition from sequential to parallel code.
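Two of the parallel design patterns this abstract refers to, task farm and pipeline, can be expressed as minimal algorithmic skeletons (an illustrative sketch, not the library used in the paper):

```python
from concurrent.futures import ThreadPoolExecutor

def farm(worker, items, workers=4):
    """'Task farm' skeleton: the pattern fixes the coordination structure;
    only the sequential worker function is user-supplied."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(worker, items))

def pipeline(stages, items):
    """'Pipeline' skeleton: each stage's output feeds the next stage."""
    for stage in stages:
        items = [stage(x) for x in items]
    return items

print(farm(lambda x: x * x, range(5)))                          # [0, 1, 4, 9, 16]
print(pipeline([lambda x: x + 1, lambda x: 2 * x], [1, 2, 3]))  # [4, 6, 8]
```

Because the skeleton hides all thread management, the agglomeration step described in the abstract reduces to choosing the worker count and stage granularity.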
- Published
- 2013
- Full Text
- View/download PDF
33. A Comparison of Multi-objective Algorithms for the Automatic Design Space Exploration of a Superscalar System
- Author
-
Ralf Jahr, Horia Calborean, Theo Ungerer, and Lucian Vintan
- Subjects
Heuristic (computer science) ,Design space exploration ,Computer science ,Superscalar ,Particle swarm optimization ,Algorithm - Abstract
In today’s computer architectures the design spaces are huge, making it very difficult to find optimal configurations. One way to cope with this problem is to use Automatic Design Space Exploration (ADSE) techniques. We developed the Framework for Automatic Design Space Exploration (FADSE), which is focused on microarchitectural optimizations. This framework includes several state-of-the-art heuristic algorithms.
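The core of any multi-objective DSE such as this one is Pareto dominance over the objective values of candidate configurations; a minimal sketch (illustrative, not FADSE code; the objective values below are made up):

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly
    better in at least one; both objectives are minimised here."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only configurations not dominated by any other."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# hypothetical (cycle_count, energy) values of evaluated configurations
configs = [(120, 40), (150, 20), (110, 45), (150, 25), (180, 15)]
print(sorted(pareto_front(configs)))  # [(110, 45), (120, 40), (150, 20), (180, 15)]
```

Heuristics such as NSGA-II differ in how they generate new candidate configurations, but they all rank candidates with this dominance relation.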
- Published
- 2013
- Full Text
- View/download PDF
34. Impact of message based fault detectors on applications messages in a network on chip
- Author
-
Theo Ungerer, Sebastian Weis, Sebastian Schlingmann, Arne Garbade, and Bernhard Fechner
- Subjects
010302 applied physics ,Interconnection ,business.industry ,Computer science ,Quality of service ,Packet injection ,Hardware_PERFORMANCEANDRELIABILITY ,02 engineering and technology ,Fault (power engineering) ,01 natural sciences ,Fault detection and isolation ,020202 computer hardware & architecture ,Network on a chip ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Overhead (computing) ,Routing (electronic design automation) ,business ,Computer network - Abstract
Future many-cores will accommodate a high number of cores, but tera-scale transistor counts increase the failure rates in the cores and interconnection networks of such chips. Message-based fault detection techniques have been developed to mitigate the influence of faults on the system. In this paper, we investigate the message overhead of fault detection monitoring with decentralized Fault Detection Units in a unified 2D mesh and assess the resulting delays of application messages. We investigate routing algorithms for different message types and demonstrate a 19% reduction of the impact of fault detection messages on application messages. We also show the limitations of prioritized fault detection messages for different application message packet injection rates.
- Published
- 2013
- Full Text
- View/download PDF
35. Ranking of direct trust, confidence, and reputation in an abstract system with unreliable components
- Author
-
Ralf Jahr, Rolf Kiefhaber, Theo Ungerer, and Nizar Msadek
- Subjects
Computer science ,business.industry ,media_common.quotation_subject ,Multi-agent system ,Reliability (computer networking) ,02 engineering and technology ,Variance (accounting) ,Organic computing ,Trusted Computing ,Machine learning ,computer.software_genre ,Variety (cybernetics) ,Ranking ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Reputation ,media_common - Abstract
Trust is an important aspect of human societies. It enables cooperation and provides means to estimate potential cooperation partners. Several works have addressed how the concept of trust can be transferred to computer systems. In this paper, we present an approach to calculate trust, including direct trust, confidence, and reputation, in a network consisting of agents with changing behavior. Our metrics are highly configurable for adaptation to a wide variety of systems and situations; especially Organic Computing systems can benefit from trust by integrating it into the algorithms implementing their self-organizational behavior. We evaluate the effect of direct trust and confidence together with reputation (DTCR) in comparison with using only direct trust (DT) or direct trust with confidence (DTC). Because these metrics can be configured with many parameters, leading to an immense number of possible configurations, we apply a heuristic optimization algorithm to find very good setups showing the highest benefits. For this evaluation, an abstract scenario is developed and applied; it consists of unreliable components from different classes of defined mean behavior. This general scenario could model many possible industrial settings, a few of which are introduced as well. Our evaluations show that reputation and direct trust are best used together, with a fluent transition between them defined by the confidence. In all cases, reputation works as a corrective when direct trust information is not optimal and potentially misleading. This leads to very good results with very limited variance; in particular, we show that a small number of interactions is sufficient to obtain the best results.
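The fluent transition between direct trust and reputation, controlled by confidence, can be sketched as a confidence-weighted blend (an illustrative formula; the saturation constant is an assumption, not the paper's parametrisation):

```python
def confidence(n_interactions, saturation=10):
    """Confidence in one's own experience grows with the number of direct
    interactions, saturating at 1.0 (saturation=10 is an arbitrary choice)."""
    return min(1.0, n_interactions / saturation)

def combined_trust(direct, reputation, n_interactions):
    """With little own experience, rely on reputation; with much experience,
    rely on direct trust (the DTCR idea)."""
    c = confidence(n_interactions)
    return c * direct + (1.0 - c) * reputation

print(combined_trust(0.9, 0.4, n_interactions=0))   # 0.4 (pure reputation)
print(combined_trust(0.9, 0.4, n_interactions=20))  # 0.9 (pure direct trust)
```

Reputation acting "as a corrective" falls out of the formula: while few interactions have occurred, the reputation term dominates the result.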
- Published
- 2013
- Full Text
- View/download PDF
36. Information Dissemination in Distributed Organic Computing Systems with Distributed Hash Tables
- Author
-
Theo Ungerer, Michael Roth, Julia Schmitt, and Florian Kluge
- Subjects
business.industry ,Computer science ,Distributed computing ,Node (networking) ,Network layer ,Application layer ,Hash table ,Atomic broadcast ,Distributed data store ,Broadcast communication network ,Unicast ,business ,Communications protocol ,Computer network - Abstract
Decision making in a self-managing distributed system requires information about the system's state. Accurate and timely information enables the overall system to respond better to state changes. Distributed systems can use different network protocols to connect their nodes. Since there is no guarantee that all protocols are able to send broadcasts, or that broadcasts can be sent across different protocols, we use distributed hash tables to enable an application-layer broadcast that sends only unicast messages in the network layer to spread node status information in a distributed system. Our research shows that we can spread information without sending unnecessary messages. By choosing the node IDs systematically, instead of generating them randomly, we can influence the network usage in badly connected network segments.
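An application-layer broadcast built from unicasts can be sketched as recursive range splitting over the sorted node IDs, so that every node receives the message exactly once (an illustrative scheme, not the paper's exact protocol):

```python
def broadcast(nodes, lo=0, hi=None, delivered=None, unicasts=None):
    """The node at index `lo` delivers the message locally, then forwards it
    once into each half of its remaining ID range; recursion covers the mesh
    of nodes with n-1 unicast messages and no duplicates."""
    if hi is None:
        hi = len(nodes)
    delivered = [] if delivered is None else delivered
    unicasts = [] if unicasts is None else unicasts
    if lo >= hi:
        return delivered, unicasts
    delivered.append(nodes[lo])            # this node now has the message
    mid = (lo + 1 + hi) // 2
    if lo + 1 < mid:                       # forward into the first half
        unicasts.append((nodes[lo], nodes[lo + 1]))
        broadcast(nodes, lo + 1, mid, delivered, unicasts)
    if mid < hi:                           # forward into the second half
        unicasts.append((nodes[lo], nodes[mid]))
        broadcast(nodes, mid, hi, delivered, unicasts)
    return delivered, unicasts

ids = [3, 7, 12, 20, 31, 44, 58]
delivered, unicasts = broadcast(ids)
print(sorted(delivered) == ids)       # True: every node reached
print(len(unicasts) == len(ids) - 1)  # True: n-1 unicasts, no redundancy
```

Choosing node IDs systematically, as the abstract suggests, would here amount to controlling which physical nodes end up adjacent in the sorted ID range.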
- Published
- 2012
- Full Text
- View/download PDF
37. The Split-Phase Synchronisation Technique: Reducing the Pessimism in the WCET Analysis of Parallelised Hard Real-Time Programs
- Author
-
Mike Gerdes, Florian Kluge, Theo Ungerer, and Christine Rochange
- Subjects
010302 applied physics ,Multi-core processor ,Atomicity ,Computer science ,business.industry ,Split-phase electric power ,02 engineering and technology ,Parallel computing ,01 natural sciences ,Memory controller ,Synchronization ,020202 computer hardware & architecture ,Instruction set ,Consistency (database systems) ,Software ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,ComputerSystemsOrganization_SPECIAL-PURPOSEANDAPPLICATION-BASEDSYSTEMS ,business - Abstract
In this paper we present the split-phase synchronisation technique, which reduces the pessimism in the WCET analysis of parallelised hard real-time (HRT) programs on embedded multi-core processors. We implemented the split-phase synchronisation technique in the memory controller of the HRT-capable MERASA multi-core processor. The technique allows reordering memory requests and splitting atomic RMW operations while preserving atomicity, consistency, and timing predictability. We determine the improvement of worst-case guarantees, that is, the estimated upper bounds, for two parallelised HRT programs. We achieve a WCET improvement of up to 1.26 with the split-phase synchronisation technique, and an overall WCET improvement of up to 2.9 for parallel HRT programs with different software synchronisations.
- Published
- 2012
- Full Text
- View/download PDF
38. Impact of Instruction Cache and Different Instruction Scratchpads on the WCET Estimate
- Author
-
Stefan Metzlaff and Theo Ungerer
- Subjects
Instruction set ,Hardware_MEMORYSTRUCTURES ,Memory management ,Computer science ,ComputerSystemsOrganization_SPECIAL-PURPOSEANDAPPLICATION-BASEDSYSTEMS ,Algorithm design ,Parallel computing ,Cache ,Function (mathematics) - Abstract
Hard real-time systems demand high performance, but also tight WCET estimates. The tightness of the WCET estimates strictly depends on the WCET analysis of the memory system. In this paper we quantify the impact of different instruction memories on the WCET estimates. A function-based dynamic scratchpad, a cache, and static scratchpads are compared. Furthermore, we inspect the pessimism introduced by memory access interferences at the shared off-chip memory level. It is shown that the function-based dynamic instruction scratchpad provides lower WCET estimates because it eliminates these interferences by design. Thus the function-based dynamic scratchpad eases the analysis while also providing tight WCET estimates.
- Published
- 2012
- Full Text
- View/download PDF
39. Fine-grained timing and control flow error checking for hard real-time task execution
- Author
-
Bernhard Fechner, Theo Ungerer, Julian Wolf, and Sascha Uhrig
- Subjects
Multi-core processor ,Correctness ,Control flow ,Robustness (computer science) ,Computer science ,business.industry ,Software fault tolerance ,Embedded system ,Real-time computing ,Chip ,business ,Fault detection and isolation ,Data-flow analysis - Abstract
Robustness and reliability are essential requirements of today's embedded systems. Especially errors in the control flow of a program, e.g. caused by transient errors, may lead to faulty system behavior, potentially with catastrophic consequences. Several methods for control flow checking have been proposed during the last decades. However, these techniques mostly focus on a correct sequence of application parts, not on the correct timing behavior of the control flow, which is essential for hard real-time systems. In this paper, we present a new approach that introduces fine-grained on-line timing checks for hard real-time systems combined with a lightweight control flow monitoring technique. The proposed approach is a hybrid hardware-software technique: we instrument the application code at compile time by adding checkpoints, which contain temporal and logical information about the control flow. During run time, a small hardware check unit connected to the core reads the instrumented data in order to verify the correctness of the application's control flow and timing behavior. The fine-grained functionality of our mechanism allows the detection of many transient errors with very low detection latency. It is no longer necessary to redundantly execute code in order to monitor anomalies. The hardware overhead is limited to a small check unit (only 0.5% of chip space compared to the processor core); according to experimental results, the execution time overhead is only 10.6% in the average case, while the memory overhead is 12.3%.
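The check unit's behaviour can be modelled in software: each checkpoint carries the set of legal successors (the logical information) and a latest allowed arrival time (the temporal information). A minimal sketch, with hypothetical checkpoint tables:

```python
class CheckUnit:
    """Illustrative software model of the hardware check unit: it verifies
    both the order of checkpoints and their arrival times."""
    def __init__(self, successors, deadlines):
        self.successors = successors   # checkpoint id -> set of legal next ids
        self.deadlines = deadlines     # checkpoint id -> latest arrival time
        self.current = None

    def checkpoint(self, cp_id, now):
        if now > self.deadlines[cp_id]:
            return "timing error"
        if self.current is not None and cp_id not in self.successors[self.current]:
            return "control flow error"
        self.current = cp_id
        return "ok"

cu = CheckUnit(successors={0: {1}, 1: {2}, 2: set()},
               deadlines={0: 10, 1: 25, 2: 40})
print(cu.checkpoint(0, now=5))   # ok
print(cu.checkpoint(2, now=20))  # control flow error (checkpoint 1 was skipped)
```

Because every checkpoint is validated on arrival, a corrupted jump or an overrun is flagged immediately rather than after redundant re-execution.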
- Published
- 2012
- Full Text
- View/download PDF
40. Fault coverage of a timing and control flow checker for hard real-time systems
- Author
-
Theo Ungerer, Bernhard Fechner, and Julian Wolf
- Subjects
business.industry ,Computer science ,Real-time computing ,Control reconfiguration ,Fault tolerance ,Fault detection and isolation ,Reliability engineering ,Stuck-at fault ,Embedded system ,Software fault tolerance ,Fault coverage ,Dependability ,Fault model ,business - Abstract
Dependability is a crucial requirement of today's embedded systems. To achieve a higher level of fault tolerance, it is necessary to develop and integrate mechanisms for reliable fault detection. In the context of hard real-time computing, such a mechanism should also guarantee correct timing behavior, an essential requirement of these systems. In this paper, we present results on the fault coverage of a lightweight timing and control flow checker for hard real-time systems. An experimental evaluation shows that more than 30% of injected faults can be detected by our technique, while the number of errors leading to an endless loop is reduced by around 80%. The check mechanism causes only very low overhead in terms of additional memory usage (15.0% on average) and execution time (12.2% on average).
- Published
- 2012
- Full Text
- View/download PDF
41. Boosting Design Space Explorations with Existing or Automatically Learned Knowledge
- Author
-
Horia Calborean, Lucian Vintan, Theo Ungerer, and Ralf Jahr
- Subjects
Boosting (machine learning) ,Fuzzy rule ,Fuzzy Control Language ,Computer science ,Decision tree ,Benchmarking ,Data mining ,Energy consumption ,Grid ,computer.software_genre ,Multi-objective optimization ,computer ,computer.programming_language - Abstract
During development, processor architectures can be tuned and configured by many different parameters. For benchmarking, automatic design space explorations (DSEs) with heuristic algorithms are a helpful approach to find the best settings for these parameters according to multiple objectives, e.g. performance, energy consumption, or real-time constraints. But if the setup is slightly changed and a new DSE has to be performed, it starts from scratch, resulting in very long evaluation times. To reduce the evaluation times, we extend the NSGA-II algorithm in this article such that automatic DSEs can be supported by a set of transformation rules defined in a highly readable format, the Fuzzy Control Language (FCL). Rules can be specified by an engineer, thereby representing existing knowledge. Beyond this, a decision tree classifying high-quality configurations can be constructed automatically and translated into transformation rules. These can also be seen as a very valuable result of a DSE, because they allow drawing conclusions about the influence of parameters and describe regions of the design space with a high density of good configurations. Our evaluations show that automatically generated decision trees can classify near-optimal configurations for the hardware parameters of the Grid ALU Processor (GAP) and M-Sim 2. Further evaluations show that automatically constructed transformation rules can reduce by 43% the number of evaluations required to reach the same quality of results as without rules, leading to a significant time saving of about 25%. In the demonstrated example, using rules also leads to better results.
- Published
- 2012
- Full Text
- View/download PDF
42. Realizing self-x properties by an automated planner
- Author
-
Theo Ungerer, Florian Kluge, Michael Roth, Julia Schmitt, and Rolf Kiefhaber
- Subjects
Computer science ,business.industry ,Feature (computer vision) ,Embedded system ,Distributed computing ,Middleware (distributed applications) ,Organic computing ,business ,Planner ,computer.software_genre ,computer ,computer.programming_language - Abstract
Organic Computing systems feature self-organization techniques to manage complex distributed systems. This paper proposes an implementation of self-x techniques in an organic middleware. We extend a middleware by an Organic Manager that is based on an automated planner. The Organic Manager unites self-x features that were formerly implemented separately.
- Published
- 2011
- Full Text
- View/download PDF
43. Large drilling machine control code — Parallelisation and WCET speedup
- Author
-
Guillem Bernat, Theo Ungerer, Irakli Guliashvili, Mike Gerdes, Stefan Schnitzler, Julian Wolf, Michael Houston, and Hans Regler
- Subjects
010302 applied physics ,Multi-core processor ,Speedup ,business.industry ,Computer science ,Automotive industry ,ComputerApplications_COMPUTERSINOTHERSYSTEMS ,02 engineering and technology ,Parallel computing ,Avionics ,01 natural sciences ,020202 computer hardware & architecture ,Software ,Worst-case execution time ,Embedded system ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Code (cryptography) ,ComputerSystemsOrganization_SPECIAL-PURPOSEANDAPPLICATION-BASEDSYSTEMS ,Hardware_ARITHMETICANDLOGICSTRUCTURES ,business ,Machine control - Abstract
Hard real-time applications in safety-critical domains (namely avionics, automotive, and machinery) require high performance and timing analysability. We present research results on the parallelisation and WCET analysis of an industrial hard real-time application, the control code of a large drilling machine from BAUER Maschinen GmbH. We reached a quad-core speedup of 2.62 for the maximum observed execution time (MOET) and 1.93 for the WCET compared to the sequential version. For the WCET analysis we used the measurement-based WCET analysis tool RapiTime.
- Published
- 2011
- Full Text
- View/download PDF
44. Static Speculation as Post-link Optimization for the Grid Alu Processor
- Author
-
Basher Shehan, Theo Ungerer, Ralf Jahr, and Sascha Uhrig
- Subjects
Speedup ,Computer science ,Factor (programming language) ,Basic block ,Code (cryptography) ,Speculative execution ,Parallel computing ,Hardware_CONTROLSTRUCTURESANDMICROPROGRAMMING ,Instruction-level parallelism ,Grid ,computer ,computer.programming_language ,Block (data storage) - Abstract
In this paper we propose and evaluate a post-link optimization that increases instruction-level parallelism by moving instructions from one basic block to the preceding blocks. The Grid ALU Processor used for the evaluations comprises plenty of functional units that are not completely allocated by the original instruction stream. The proposed technique speculatively performs operations in advance by using unallocated functional units. The algorithm moves instructions to multiple predecessors of a source block. If necessary, it adds compensation code to allow the shifted instructions to work on unused registers, whose values are copied into the original target registers at the time the speculation is resolved. Evaluations of the algorithm show a maximum speedup factor of 2.08 on the Grid ALU Processor compared to the unoptimized version of the same program, due to a better exploitation of the ILP and an optimized mapping of loops.
- Published
- 2011
- Full Text
- View/download PDF
45. Organic Computing Middleware for Ubiquitous Environments
- Author
-
Rolf Kiefhaber, Florian Kluge, Theo Ungerer, Julia Schmitt, and Michael Roth
- Subjects
Context-aware pervasive systems ,Computer science ,Distributed computing ,Organic computing ,Planner ,computer ,Controller architecture ,computer.programming_language - Abstract
The complexity of computer systems has been increasing in recent years. To control this complexity, organic computing introduces the self-x features. The Organic Computing Middleware for Ubiquitous Environments eases the management of distributed computing systems by using self-configuration, self-optimisation, self-healing, and self-protection. To provide these self-x features, the latest version of our middleware uses an Observer/Controller architecture with an automated planner. Since planning is time-consuming, we additionally introduced reflexes for faster reactions. The reflexes are learned from previous plans and can be distributed to resource-restricted nodes.
- Published
- 2011
- Full Text
- View/download PDF
46. Concept of a reflex manager to enhance the planner component of an autonomic / organic system
- Author
-
Florian Kluge, Michael Roth, Rolf Kiefhaber, Julia Schmitt, and Theo Ungerer
- Subjects
Speedup ,Computer science ,business.industry ,Distributed computing ,0102 computer and information sciences ,02 engineering and technology ,Planner ,01 natural sciences ,Autonomic computing ,010201 computation theory & mathematics ,Component (UML) ,Middleware ,Embedded system ,0202 electrical engineering, electronic engineering, information engineering ,Reflex ,020201 artificial intelligence & image processing ,State (computer science) ,business ,computer ,computer.programming_language - Abstract
The administration of complex distributed systems is time-consuming. It becomes crucial to develop techniques to speed up reaction times and to support embedded nodes. Higher mammals use reflexes to ensure fast reactions in critical situations. This paper transfers this behavior to the organic middleware OCµ. Within this middleware, an Automated Planner is used to administrate a distributed system. A new component called the Reflex Manager is responsible for storing the planner's solutions. If the system reaches a state that is similar to an already known one, the Reflex Manager uses its knowledge to quickly provide a solution. Conflicts between plans from the planner and the Reflex Manager are resolved by comparing and switching plans. Finally, we discuss possible generalizations of our ideas.
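The reflex idea lends itself to a short sketch. The block below is an illustrative toy, not the OCµ implementation: system states are flat dictionaries, the similarity metric and the threshold are invented, and the "plan" is just a string. It shows the core mechanism: remember planner solutions and replay one when the current state is close enough to a known state.

```python
# Illustrative reflex cache (hypothetical metric and threshold, not OCmu code).

def similarity(a, b):
    """Fraction of matching state variables between two state dicts."""
    keys = set(a) | set(b)
    return sum(a.get(k) == b.get(k) for k in keys) / len(keys)

class ReflexManager:
    def __init__(self, threshold=0.6):
        self.threshold = threshold
        self.memory = []  # (state, plan) pairs learned from the planner

    def remember(self, state, plan):
        self.memory.append((dict(state), plan))

    def react(self, state):
        """Return a stored plan for a similar state, or None to fall back
        to the (slow) automated planner."""
        best = max(self.memory, key=lambda sp: similarity(sp[0], state),
                   default=None)
        if best and similarity(best[0], state) >= self.threshold:
            return best[1]
        return None

rm = ReflexManager()
rm.remember({"node1": "down", "node2": "up", "load": "high"},
            plan="restart node1")
# Similar state (2 of 3 variables match) -> the reflex fires, no replanning.
fast = rm.react({"node1": "down", "node2": "up", "load": "low"})
```

Returning `None` on a miss keeps the planner as the authoritative fallback, matching the abstract's point that reflexes only short-circuit the slow path for already-known situations.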
- Published
- 2011
- Full Text
- View/download PDF
47. Dynamic Classification for Embedded Real-Time Systems
- Author
-
Jörg Mische, Theo Ungerer, and Florian Kluge
- Subjects
ComputingMilieux_THECOMPUTINGPROFESSION ,Computer science ,InformationSystems_INFORMATIONSYSTEMSAPPLICATIONS ,ComputerApplications_GENERAL ,Real-time computing ,Organic computing ,Current (fluid) - Abstract
This article summarises the current status of the CAR-SoC project and gives an outlook on further research.
- Published
- 2011
- Full Text
- View/download PDF
48. Connectivity-sensitive algorithm for task placement on a many-core considering faulty regions
- Author
-
Sebastian Schlingmann, Sebastian Weis, Theo Ungerer, and Arne Garbade
- Subjects
Very-large-scale integration ,Computer science ,Fault tolerance ,02 engineering and technology ,Parallel computing ,Chip ,Network topology ,020202 computer hardware & architecture ,Load management ,Network on a chip ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Algorithm design ,Routing (electronic design automation) ,Algorithm - Abstract
Future many-core chips are envisioned to feature up to a thousand cores on a chip. With an increasing number of cores on a chip, the problem of distributing load becomes more prevalent. Even if a piece of software is designed to exploit parallelism, it is not easy to place parallel tasks on the cores to achieve maximum performance. This paper proposes a connectivity-sensitive algorithm for static task placement onto a 2D mesh of interconnected cores. The decreased feature sizes of future VLSI chips will increase the number of permanent and transient faults. To accommodate partially faulty hardware, the algorithm is designed to allow placement on irregular core structures, in particular meshes with faulty nodes and links. The quality of the placement is measured by comparing the results to two baseline algorithms in terms of communication efficiency.
- Published
- 2011
- Full Text
- View/download PDF
49. Organic Computing — A Paradigm Shift for Complex Systems
- Author
-
Hartmut Schmeck, Theo Ungerer, and Christian Müller-Schloer
- Subjects
Engineering ,Architectural pattern ,business.industry ,Paradigm shift ,Complex system ,Intelligent decision support system ,Reconfigurability ,Context (language use) ,Organic computing ,business ,Data science ,Simulation ,Variety (cybernetics) - Abstract
Organic Computing has emerged as a challenging vision for future information processing systems. Its basis is the insight that we will increasingly be surrounded by and depend on large collections of autonomous systems, which are equipped with sensors and actuators, aware of their environment, communicating freely, and organising themselves in order to perform actions and services required by the users. These networks of intelligent systems surrounding us open fascinating application areas and at the same time bear the problem of their controllability. Hence, we have to construct such systems to be as robust, safe, flexible, and trustworthy as possible. In particular, a strong orientation towards human needs, as opposed to a pure implementation of the technologically possible, seems absolutely central. The technical systems which can achieve these goals will have to exhibit life-like or "organic" properties. "Organic Computing Systems" adapt dynamically to their current environmental conditions. In order to cope with unexpected or undesired events they are self-organising, self-configuring, self-optimising, self-healing, self-protecting, self-explaining, and context-aware, while offering complementary interfaces for higher-level directives with respect to the desired behaviour. First steps towards adaptive and self-organising computer systems are being undertaken. Adaptivity, reconfigurability, emergence of new properties, and self-organisation are hot topics in a variety of research groups worldwide. This book summarises the results of a 6-year priority research program (SPP) of the German Research Foundation (DFG) addressing these fundamental challenges in the design of Organic Computing systems. It presents and discusses the theoretical foundations of Organic Computing, basic methods and tools, learning techniques used in this context, architectural patterns and many applications.
The final outlook shows that in the meantime Organic Computing ideas have spawned a variety of promising new projects.
- Published
- 2011
- Full Text
- View/download PDF
50. A dynamic instruction scratchpad memory for embedded processors managed by hardware
- Author
-
Theo Ungerer, Sascha Uhrig, Irakli Guliashvili, and Stefan Metzlaff
- Subjects
010302 applied physics ,Hardware_MEMORYSTRUCTURES ,business.industry ,Computer science ,Interference theory ,Static timing analysis ,02 engineering and technology ,01 natural sciences ,020202 computer hardware & architecture ,Instruction memory ,Computer architecture ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Granularity ,business ,Field-programmable gate array ,Computer hardware ,Scratchpad memory ,Content management - Abstract
This paper proposes a hardware-managed instruction scratchpad at the granularity of functions, designed for real-time systems. It guarantees that every instruction is fetched from the local, fast and timing-predictable scratchpad memory. Thus, a predictable behavior is reached that eases a precise timing analysis of the system. We estimate the hardware resources required to implement the dynamic instruction scratchpad on an FPGA. An evaluation quantifies the impact of our scratchpad on average-case performance. It shows that the dynamic instruction scratchpad has reasonable performance compared to standard instruction memories, while providing predictable behavior and easing timing analysis.
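The function-granular management can be modelled in a few lines of software. The block below is an assumption-laden thought experiment, not the paper's hardware: byte accounting and FIFO eviction are invented for illustration. It captures the invariant that once a function is made resident on call, every fetch hits the scratchpad.

```python
# Software model of a function-granular instruction scratchpad (illustrative
# assumptions only: FIFO eviction, sizes in bytes).
from collections import OrderedDict

class FunctionScratchpad:
    def __init__(self, capacity):
        self.capacity = capacity       # scratchpad size in bytes
        self.resident = OrderedDict()  # function name -> code size
        self.loads = 0                 # fills from main memory (misses)

    def call(self, name, size):
        """Invoked on every function call; ensures the callee is resident,
        so subsequent instruction fetches always hit the scratchpad."""
        if name in self.resident:
            return "hit"
        while sum(self.resident.values()) + size > self.capacity:
            self.resident.popitem(last=False)  # evict the oldest function
        self.resident[name] = size
        self.loads += 1
        return "miss"

sp = FunctionScratchpad(capacity=1024)
sp.call("main", 400)   # miss: "main" is loaded
sp.call("f", 500)      # miss: 900 of 1024 bytes now in use
sp.call("main", 400)   # hit: still resident
sp.call("g", 300)      # miss: "main" (oldest) is evicted to make room
```

Because residency is established at call boundaries, worst-case timing analysis only has to reason about loads at calls, not about individual fetch misses; that is the predictability argument of the abstract in miniature.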
- Published
- 2011
- Full Text
- View/download PDF