1. Improving concurrency and memory usage in distributed operating systems for lightweight manycores via cooperative time-sharing lightweight tasks.
- Author
-
Souto, João Vicente and Castro, Márcio
- Subjects
-
*FLOWGRAPHS, *MEMORY, *ENERGY consumption, *SYSTEMS design
- Abstract
-
Lightweight manycore processors arise to reconcile performance, energy efficiency, and scalability requirements on a single chip. Operating Systems (OSes) for these processors feature a distributed design, where isolated OS instances cooperate to mitigate programmability and portability issues coming from their architectural intricacies. Currently, OS services often resort to traditional execution flow abstractions (processes or threads) to implement small, periodic, or asynchronous functionalities. Although these abstractions considerably simplify the system design, they have a non-negotiable impact on the limited on-chip memories. Due to the memory restrictions, we argue that OS-level abstractions can be reshaped to reduce the OS memory footprint without introducing considerable overhead. In this context, we propose a complementary OS-level execution engine that supports cooperative time-sharing lightweight tasks that share a unique execution stack and features task synchronization via control flow and dependency graphs. This solution is orthogonal to the underlying execution support and provides numerous OS-level execution flows with reduced memory consumption. We implemented our engine in a distributed OS and executed experiments on a lightweight manycore. Our results show that it has the following advantages when compared to the classical thread abstraction: (i) it provides 63.2× more execution flows per MB of memory; (ii) it features less overhead to manage execution flows and system calls; (iii) it improves core utilization; and (iv) it exhibits competitive results on real-world applications.
• The task-based engine reduces the OS memory footprint, providing 63× more execution flows per MB of memory.
• The task-based engine improves core utilization and features less overhead to manage execution flows and system calls.
• The OS latency of page invalidations is reduced with the task-based approach.
• The performance of MPI applications is not significantly affected when communications are handled by the task-based engine.
[ABSTRACT FROM AUTHOR]
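The idea of cooperative tasks that share one execution stack and synchronize through a dependency graph can be illustrated with a minimal sketch. This is not the authors' engine: the names `Task`, `connect`, `run`, and `make_step` are hypothetical, and the sketch assumes each task is a plain function that returns True when finished or False to yield and be dispatched again later. Because every task runs to its return point on the dispatcher's own call stack, no task needs a private stack.

```python
from collections import deque

class Task:
    """Hypothetical lightweight task: a function plus dependency bookkeeping."""
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn            # returns True when done, False to yield
        self.children = []      # tasks that depend on this one
        self.pending = 0        # number of unfinished parents

def connect(parent, child):
    """Add a dependency edge: child may run only after parent finishes."""
    parent.children.append(child)
    child.pending += 1

def run(tasks):
    """Dispatch ready tasks one at a time from a single loop (single stack)."""
    ready = deque(t for t in tasks if t.pending == 0)
    order = []
    while ready:
        task = ready.popleft()
        if not task.fn():       # task yielded cooperatively: requeue it
            ready.append(task)
            continue
        order.append(task.name)
        for child in task.children:
            child.pending -= 1
            if child.pending == 0:
                ready.append(child)
    return order

def make_step(remaining):
    """Build a task body that needs `remaining` dispatches to finish."""
    state = {"left": remaining}
    def fn():
        state["left"] -= 1
        return state["left"] <= 0
    return fn

a = Task("read", make_step(2))      # yields once before finishing
b = Task("checksum", make_step(1))
c = Task("reply", make_step(1))     # waits for both parents
connect(a, c)
connect(b, c)
order = run([a, b, c])
print(order)  # ['checksum', 'read', 'reply']
```

In this sketch the per-task state is a few fields rather than a full stack, which is the kind of saving the abstract attributes to the task-based approach; the measured 63.2× figure comes from the authors' implementation, not from this example.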
- Published
- 2023