Author: "Marazakis, Manolis" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Marazakis, Manolis"' showing total 131 results

1. Running Cloud-native Workloads on HPC with High-Performance Kubernetes

Author: Chazapis, Antony, Maliaroudakis, Evangelos, Nikolaidis, Fotis, Marazakis, Manolis, and Bilas, Angelos
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: The escalating complexity of applications and services encourages a shift towards higher-level data processing pipelines that integrate both Cloud-native and HPC steps into the same workflow. Cloud providers and HPC centers typically provide both execution platforms on separate resources. In this paper we explore a more practical design that enables running unmodified Cloud-native workloads directly on the main HPC cluster, avoiding resource partitioning and retaining the HPC center's existing job management and accounting policies.
Published: 2024

2. The ExaNeSt Prototype: Evaluation of Efficient HPC Communication Hardware in an ARM-based Multi-FPGA Rack

Author: Ploumidis, Manolis, Chaix, Fabien, Chrysos, Nikolaos, Assiminakis, Marios, Flouris, Vassilis, Kallimanis, Nikolaos, Kossifidis, Nikolaos, Nikoloudakis, Michael, Petrakis, Polydoros, Dimou, Nikolaos, Gianioudis, Michael, Ieronymakis, George, Ioannou, Aggelos, Kalokerinos, George, Xirouchakis, Pantelis, Ailamakis, George, Damianakis, Astrinos, Ligerakis, Michael, Makris, Ioannis, Vavouris, Theocharis, Katevenis, Manolis, Papaefstathiou, Vassilis, Marazakis, Manolis, and Mavroidis, Iakovos
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: We present and evaluate the ExaNeSt Prototype, a liquid-cooled rack prototype consisting of 256 Xilinx ZU9EG MPSoCs, 4 TBytes of DRAM, 16 TBytes of SSD, and configurable interconnection 10-Gbps hardware. We developed this testbed in 2016-2019 to validate the flexibility of FPGAs for experimenting with efficient hardware support for HPC communication among tens of thousands of processors and accelerators in the quest towards Exascale systems and beyond. We present our key design choices regarding overall system architecture, PCBs and runtime software, and summarize insights resulting from measurement and analysis. Of particular note, our custom interconnect includes a low-cost low-latency network interface, offering user-level zero-copy RDMA, which we have tightly coupled with the ARMv8 processors in the MPSoCs. We have developed a system software runtime on top of these features, and have been able to run MPI. We have evaluated our testbed through MPI microbenchmarks, mini, and full MPI applications. Single-hop, one-way latency is 1.3 μs; approximately 0.47 μs of this is attributed to the network interface and the user-space library that exposes its functionality to the runtime. Latency over longer paths increases as expected, reaching 2.55 μs for a five-hop path. Bandwidth tests show that, for a single hop, link utilization reaches 82% of the theoretical capacity. Microbenchmarks based on MPI collectives reveal that broadcast latency scales as expected when the number of participating ranks increases. We also implemented a custom Allreduce accelerator in the network interface, which reduces the latency of such collectives by up to 88%. We assess performance scaling through weak and strong scaling tests for HPCG, LAMMPS, and the miniFE mini application; for all these tests, parallelization efficiency is at least 69%.
Comment: 45 pages, 23 figures
Published: 2023

3. Co-design and Software Architecture

Author: Gheller, Claudio, Marazakis, Manolis, Suarez, Estela, Taffoni, Giuliano, Burton, W.B., Series Editor, Shore, Steven N., Series Editor, Vardoulaki, Eleni, editor, Dembska, Marta, editor, Drabent, Alexander, editor, and Hoeft, Matthias, editor
Published: 2024
Full Text: View/download PDF

4. Computing Infrastructure

Author: Russo, Stefano Alberto, Suarez, Estela, Chazapis, Antony, Marazakis, Manolis, Taffoni, Giuliano, Burton, W.B., Series Editor, Shore, Steven N., Series Editor, Vardoulaki, Eleni, editor, Dembska, Marta, editor, Drabent, Alexander, editor, and Hoeft, Matthias, editor
Published: 2024
Full Text: View/download PDF

5. Case Studies on the Impact and Challenges of Heterogeneous NUMA Architectures for HPC

Author: Zaourar, Lilia, Benazouz, Mohamed, Mouhagir, Ayoub, Falquez, Carlos, Portero, Antoni, Ho, Nam, Suarez, Estela, Petrakis, Polydoros, Marazakis, Manolis, Sgherzi, Francesco, Fernandez, Ivan, Dolbeau, Romain, Pleiter, Dirk, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Fey, Dietmar, editor, Stabernack, Benno, editor, Lankes, Stefan, editor, Pacher, Mathias, editor, and Pionteck, Thilo, editor
Published: 2024
Full Text: View/download PDF

6. Frisbee: automated testing of Cloud-native applications in Kubernetes

Author: Nikolaidis, Fotis, Chazapis, Antony, Marazakis, Manolis, and Bilas, Angelos
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: As more and more companies are migrating (or planning to migrate) from on-premise to Cloud, their focus is to find anomalies and deficits as early as possible in the development life cycle. We propose Frisbee, a declarative language and associated runtime components for testing cloud-native applications on top of Kubernetes. Given a template describing the system under test and a workflow describing the experiment, Frisbee automatically interfaces with Kubernetes to deploy the necessary software in containers, launch needed sidecars, execute the workflow steps, and perform automated checks for deviation from expected behavior. We evaluate Frisbee through a series of tests to demonstrate its role in designing and evaluating cloud-native applications; Frisbee helps in testing uncertainties at the level of application (e.g., dynamically changing request patterns), infrastructure (e.g., crashes, network partitions), and deployment (e.g., saturation points). Our findings have strong implications for the design, deployment, and evaluation of cloud applications. The most prominent are that erroneous benchmark outputs can cause an apparent performance improvement, that automated failover mechanisms may require interoperability with clients, and that a proper placement policy should also account for the clock frequency, not only the number of cores.
Published: 2021

7. Improving the Performance and Resilience of MPI Parallel Jobs with Topology and Fault-Aware Process Placement

Author: Vardas, Ioannis, Ploumidis, Manolis, and Marazakis, Manolis
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, C.4
Abstract: HPC systems keep growing in size to meet the ever-increasing demand for performance and computational resources. Apart from increased performance, large scale systems face two challenges that hinder further growth: energy efficiency and resiliency. At the same time, applications seeking increased performance rely on advanced parallelism for exploiting system resources, which leads to increased pressure on system interconnects. At large system scales, increased communication locality can be beneficial both in terms of application performance and energy consumption. Towards this direction, several studies focus on deriving a mapping of an application's processes to system nodes in a way that communication cost is reduced. A common approach is to express both the application's communication patterns and the system architecture as graphs and then solve the corresponding mapping problem. Apart from communication cost, the completion time of a job can also be affected by node failures. Node failures may result in job abortions, requiring job restarts. In this paper, we address the problem of assigning processes to system resources with the goal of reducing communication cost while also taking into account node failures. The proposed approach is integrated into the Slurm resource manager. Evaluation results show that, in scenarios where few nodes have a low outage probability, the proposed process placement approach achieves a notable decrease in the completion time of batches of MPI jobs. Compared to the default process placement approach in Slurm, the reduction is 18.9% and 31%, respectively, for two different MPI applications.
Comment: 21 pages, 8 figures, added Acknowledgements section
Published: 2020

8. COMPESCE: A Co-design Approach for Memory Subsystem Performance Analysis in HPC Many-Cores

Author: Portero, Antoni, Falquez, Carlos, Ho, Nam, Petrakis, Polydoros, Nassyr, Stepan, Marazakis, Manolis, Dolbeau, Romain, Cifuentes, Jorge Alejandro Nocua, Alvarez, Luis Bertran, Pleiter, Dirk, Suarez, Estela, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Goumas, Georgios, editor, Tomforde, Sven, editor, Brehm, Jürgen, editor, Wildermann, Stefan, editor, and Pionteck, Thilo, editor
Published: 2023
Full Text: View/download PDF

9. Running Kubernetes Workloads on HPC

Author: Chazapis, Antony, Nikolaidis, Fotis, Marazakis, Manolis, Bilas, Angelos, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Bienz, Amanda, editor, Weiland, Michèle, editor, Baboulin, Marc, editor, and Kruse, Carola, editor
Published: 2023
Full Text: View/download PDF

10. Event-Driven Chaos Testing for Containerized Applications

Author: Nikolaidis, Fotis, Chazapis, Antony, Marazakis, Manolis, Bilas, Angelos, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Bienz, Amanda, editor, Weiland, Michèle, editor, Baboulin, Marc, editor, and Kruse, Carola, editor
Published: 2023
Full Text: View/download PDF

11. Power and Performance Analysis of Persistent Key-Value Stores

Author: Mikrou, Stella, Papagiannis, Anastasios, Saloustros, Giorgos, Marazakis, Manolis, and Bilas, Angelos
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Performance
Abstract: With the current rate of data growth, processing needs are becoming difficult to fulfill due to CPU power and energy limitations. Data serving systems and especially persistent key-value stores have become a substantial part of data processing stacks in the data center, providing access to massive amounts of data for applications and services. Key-value stores exhibit high CPU and I/O overheads because of their constant need to reorganize data on the devices. In this paper, we examine the efficiency of two key-value stores on four servers of different generations and with different CPU architectures. We use RocksDB, a key-value store that is deployed widely, e.g., at Facebook, and Kreon, a research key-value store that has been designed to reduce CPU overhead. We evaluate their behavior and overheads on an ARM-based microserver and three different generations of x86 servers. Our findings show that microservers have better power efficiency in the range of 0.68-3.6x with a comparable tail latency.
Published: 2020

12. Shall numerical astrophysics step into the era of Exascale computing?

Author: Taffoni, Giuliano, Murante, Giuseppe, Tornatore, Luca, Goz, David, Borgani, Stefano, Katevenis, Manolis, Chrysos, Nikolaos, and Marazakis, Manolis
Subjects: Astrophysics - Instrumentation and Methods for Astrophysics, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: High performance computing numerical simulations are today one of the more effective instruments to implement and study new theoretical models, and they are mandatory during the preparatory phase and operational phase of any scientific experiment. New challenges in Cosmology and Astrophysics will require a large number of new extremely computationally intensive simulations to investigate physical processes at different scales. Moreover, the size and complexity of the new generation of observational facilities also imply a new generation of high performance data reduction and analysis tools pushing toward the use of Exascale computing capabilities. Exascale supercomputers cannot be produced today. We discuss the major technological challenges in the design, development and use of such computing capabilities and report on the progress that has been made in recent years in Europe, in particular in the framework of the ExaNeSt European funded project. We also discuss the impact of these new computing resources on the numerical codes in Astronomy and Astrophysics.
Comment: 3 figures, invited talk for proceedings of ADASS XXVI, accepted by ASP Conference Series
Published: 2019

13. Interactive, Cloud-Native Workflows on HPC Using KNoC

Author: Maliaroudakis, Evangelos, Chazapis, Antony, Kanterakis, Alexandros, Marazakis, Manolis, Bilas, Angelos, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Anzt, Hartwig, editor, Bienz, Amanda, editor, Luszczek, Piotr, editor, and Baboulin, Marc, editor
Published: 2022
Full Text: View/download PDF

14. Exploring the Impact of Node Failures on the Resource Allocation for Parallel Jobs

Author: Vardas, Ioannis, Ploumidis, Manolis, Marazakis, Manolis, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Chaves, Ricardo, editor, B. Heras, Dora, editor, Ilic, Aleksandar, editor, Unat, Didem, editor, Badia, Rosa M., editor, Bracciali, Andrea, editor, Diehl, Patrick, editor, Dubey, Anshu, editor, Sangyoon, Oh, editor, L. Scott, Stephen, editor, and Ricci, Laura, editor
Published: 2022
Full Text: View/download PDF

15. Trace-Based Workload Generation and Execution

Author: Sfakianakis, Yannis, Kanellou, Eleni, Marazakis, Manolis, Bilas, Angelos, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Sousa, Leonel, editor, Roma, Nuno, editor, and Tomás, Pedro, editor
Published: 2021
Full Text: View/download PDF

16. HugeMap: Optimizing Memory-Mapped I/O with Huge Pages for Fast Storage

Author: Malliotakis, Ioannis, Papagiannis, Anastasios, Marazakis, Manolis, Bilas, Angelos, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Balis, Bartosz, editor, B. Heras, Dora, editor, Antonelli, Laura, editor, Bracciali, Andrea, editor, Gruber, Thomas, editor, Hyun-Wook, Jin, editor, Kuhn, Michael, editor, Scott, Stephen L., editor, Unat, Didem, editor, and Wyrzykowski, Roman, editor
Published: 2021
Full Text: View/download PDF

17. HugeMap: Optimizing Memory-Mapped I/O with Huge Pages for Fast Storage

Author: Malliotakis, Ioannis, primary, Papagiannis, Anastasios, additional, Marazakis, Manolis, additional, and Bilas, Angelos, additional
Published: 2021
Full Text: View/download PDF

18. Trace-Based Workload Generation and Execution

Author: Sfakianakis, Yannis, primary, Kanellou, Eleni, additional, Marazakis, Manolis, additional, and Bilas, Angelos, additional
Published: 2021
Full Text: View/download PDF

19. Redesign of astrophysical codes for exascale computing: the SPACE experience

Author: Ibsen, Jorge, Chiozzi, Gianluca, Taffoni, Giuliano, Mignone, Andrea, Tornatore, Luca, Sciacca, Eva, Guarrasi, Massimiliano, Lapenta, Giovanni, Riha, Lubomir, Vavrik, Radim, Vysocky, Ondrej, Kadlubiak, Kristian, Strakos, Petr, Jaros, Milan, Dolag, Klaus, Commercon, Benoit, Rezzolla, Luciano, Pierre, Khalil, Doulis, Georgios, Shen, Sijing, Marazakis, Manolis, Gregori, Daniele, Boella, Elisabetta, Perna, Gino, Zanotti, Marisa, Raffin, Erwan, Polsterer, Kai, Trujillo Gomez, Sebastian, and Marin, Guillermo
Published: 2024
Full Text: View/download PDF

20. Next generation of Exascale-class systems: ExaNeSt project and the status of its interconnect and storage development

Author: Katevenis, Manolis, Ammendola, Roberto, Biagioni, Andrea, Cretaro, Paolo, Frezza, Ottorino, Lo Cicero, Francesca, Lonardo, Alessandro, Martinelli, Michele, Paolucci, Pier Stanislao, Pastorelli, Elena, Simula, Francesco, Vicini, Piero, Taffoni, Giuliano, Pascual, Jose A., Navaridas, Javier, Luján, Mikel, Goodacre, John, Lietzow, Bernd, Mouzakitis, Angelos, Chrysos, Nikolaos, Marazakis, Manolis, Gorlani, Paolo, Cozzini, Stefano, Brandino, Giuseppe Piero, Koutsourakis, Panagiotis, Ruth, Joeri van, Zhang, Ying, and Kersten, Martin
Published: 2018
Full Text: View/download PDF

21. Impact of Cache Coherence on the Performance of Shared-Memory based MPI Primitives: A Case Study for Broadcast on Intel Xeon Scalable Processors

Author: Katevenis, George, primary, Ploumidis, Manolis, additional, and Marazakis, Manolis, additional
Published: 2023
Full Text: View/download PDF

22. User-Space I/O for μs-level Storage Devices

Author: Papagiannis, Anastasios, Saloustros, Giorgos, Marazakis, Manolis, Bilas, Angelos, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Taufer, Michela, editor, Mohr, Bernd, editor, and Kunkel, Julian M., editor
Published: 2016
Full Text: View/download PDF

23. Impact of Cache Coherence on the Performance of Shared-Memory based MPI Primitives: A Case Study for Broadcast on Intel Xeon Scalable Processors - Computational Artifacts

Author: Katevenis, George, Ploumidis, Manolis, and Marazakis, Manolis
Subjects: shared-memory, multi-core, cache coherency, HPC, MPI, collectives, Intel Xeon Scalable, broadcast, intra-node
Abstract: Collection of computational artifacts (source code, scripts, datasets, instructions) for reproducibility of experiments featured in the associated paper: "Impact of Cache Coherence on the Performance of Shared-Memory based MPI Primitives: A Case Study for Broadcast on Intel Xeon Scalable Processors", George Katevenis, Manolis Ploumidis, and Manolis Marazakis, ICPP 2023, Salt Lake City, Utah, USA.
Published: 2023
Full Text: View/download PDF

24. eProcessor

Author: Alvarez, Lluc, primary, Ruiz, Abraham, additional, Bigas-Soldevilla, Arnau, additional, Kuroedov, Pavel, additional, Gonzalez, Alberto, additional, Mahale, Hamsika, additional, Bustamante, Noe, additional, Aguilera, Albert, additional, Minervini, Francesco, additional, Salamero, Javier, additional, Palomar, Oscar, additional, Papaefstathiou, Vassilis, additional, Psathakis, Antonis, additional, Dimou, Nikolaos, additional, Giaourtas, Michalis, additional, Mastorakis, Iasonas, additional, Ieronymakis, Georgios, additional, Matzouranis, Georgios-Michail, additional, Flouris, Vasilis, additional, Kossifidis, Nick, additional, Marazakis, Manolis, additional, Goel, Bhavishya, additional, Manivannan, Madhavan, additional, Ejaz, Ahsen, additional, Strikos, Panagiotis, additional, Vázquez, Mateo, additional, Sourdis, Ioannis, additional, Trancoso, Pedro, additional, Stenström, Per, additional, Hagemeyer, Jens, additional, Tigges, Lennart, additional, Kucza, Nils, additional, Philippe, Jean-Marc, additional, and Papaefstathiou, Ioannis, additional
Published: 2023
Full Text: View/download PDF

25. RISER: The first All-European RISC-V Cloud Server Infrastructure

Author: Marazakis, Manolis and Louloudakis, Stelios
Abstract: Public announcement of the RISER ('RISC-V for Cloud Services') project, in ERCIM News, issue Nr. 133 (April 2023).
Published: 2023
Full Text: View/download PDF

26. RISER: Raising RISC-V to the cloud

Author: Marazakis, Manolis and Louloudakis, Stelios
Abstract: First public announcement of the RISER ('RISC-V for Cloud Services') project, in the HiPEACInfo magazine (issue Nr. 68, January 2023).
Published: 2023
Full Text: View/download PDF

27. ETP4HPC's SRA 5 - Strategic Research Agenda for High-Performance Computing in Europe - 2022

Author: Malms, Michael, Cargemel, Laurent, Suarez, Estela, Mittenzwey, Nico, Duranton, Marc, Sezer, Sakir, Prunty, Craig, Rossé-Laurent, Pascale, Pérez-Harnandez, Maria, Marazakis, Manolis, Lonsdale, Guy, Carpenter, Paul, Antoniu, Gabriel, Narasimharmurthy, Sai, Brinkman, André, Pleiter, Dirk, Haus, Utz-Uwe, Krueger, Jens, Hoppe, Hans-Christian, Laure, Erwin, Wierse, Andreas, Bartsch, Valeria, Michielsen, Kristel, Allouche, Cyril, Becker, Tobias, and Haas, Robert
Abstract: This document feeds research and development priorities developed by the European HPC ecosystem into EuroHPC’s Research and Innovation Advisory Group with an aim to define the HPC Technology research Work Programme and the calls for proposals included in it and to be launched from 2023 to 2026. This SRA also describes the major trends in the deployment of HPC and HPDA methods and systems, driven by economic and societal needs in Europe, taking into account the changes expected in the technologies and architectures of the expanding underlying IT infrastructure. The goal is to draw a complete picture of the state of the art and the challenges for the next three to four years rather than to focus on specific technologies, implementations or solutions.
Published: 2022
Full Text: View/download PDF

28. A framework for hierarchical single-copy MPI collectives on multicore nodes

Author: Katevenis, George, primary, Ploumidis, Manolis, additional, and Marazakis, Manolis, additional
Published: 2022
Full Text: View/download PDF

29. Using gLite to Implement a Secure ICGrid

Author: Luna, Jesus, Dikaiakos, Marios D, Gjermundrod, Harald, Flouris, Michail, Marazakis, Manolis, and Bilas, Angelos
Published: 2009
Full Text: View/download PDF

30. A Data-Centric Security Analysis Of ICGrid

Author: Luna, Jesus, Flouris, Michail, Marazakis, Manolis, Bilas, Angelos, Dikaiakos, Marios D., Gjermundrod, Harald, Kyprianou, Theodoros, Gorlatch, Sergei, editor, Fragopoulou, Paraskevi, editor, and Priol, Thierry, editor
Published: 2008
Full Text: View/download PDF

31. An Analysis of Security Services in Grid Storage Systems

Author: Luna, Jesus, Flouris, Michail D., Marazakis, Manolis, Bilas, Angelos, Stagni, Federico, Forti, Alberto, Ghiselli, Antonia, Magnoni, Luca, Zappi, Riccardo, Talia, Domenico, Yahyapour, Ramin, and Ziegler, Wolfgang
Published: 2008
Full Text: View/download PDF

32. LatEst: Vertical elasticity for millisecond serverless execution

Author: Sfakianakis, Yannis, primary, Marazakis, Manolis, additional, Kozanitis, Christos, additional, and Bilas, Angelos, additional
Published: 2022
Full Text: View/download PDF

33. User-Space I/O for μs-level Storage Devices

Author: Papagiannis, Anastasios, primary, Saloustros, Giorgos, additional, Marazakis, Manolis, additional, and Bilas, Angelos, additional
Published: 2016
Full Text: View/download PDF

34. HPC for Urgent Decision-Making

Author: Marazakis, Manolis, Duranton, Marc, Pleiter, Dirk, Taffoni, Giuliano, and Hoppe, Hans-Christian
Abstract: Emerging use cases from incident response planning and broad-scope European initiatives (e.g. Destination Earth [1,2], European Green Deal and Digital Package [21]) are expected to require federated, distributed infrastructures combining computing and data platforms. These will provide elasticity enabling users to build applications and integrate data for thematic specialisation and decision support, within ever-shortening response time windows. For prompt and, in particular, for urgent decision support, the conventional usage modes of HPC centres are not adequate: these rely on relatively long-term arrangements for time-scheduled exclusive use of HPC resources, and enforce well-established yet time-consuming policies for granting access. In urgent decision support scenarios, managers or members of incident response teams must initiate processing and control the resources required based on their real-time judgement on how a complex situation evolves over time. This circle of clients is distinct from the regular users of HPC centres, and they must interact with HPC workflows on-demand and in real-time, while engaging significant HPC and data processing resources in or across HPC centres. This white paper considers the technical implications of supporting urgent decisions through establishing flexible usage modes for computing, analytics and AI/ML-based applications using HPC and large, dynamic assets. The target decision support use cases will involve ensembles of jobs, data-staging to support workflows, and interactions with services/facilities external to HPC systems/centres. Our analysis identifies the need for flexible and interactive access to HPC resources, particularly in the context of dynamic workflows processing large datasets.
This poses several technical and organisational challenges: short-notice secure access to HPC and data resources, dynamic resource allocation and scheduling, coordination of resource managers, support for data-intensive workflow (including data staging on node-local storage), preemption of already running workloads and interactive steering of simulations. Federation of services and resources across multiple sites will help to increase availability, provide elasticity for time-varying resource needs and enable leverage of data locality.
The authors would like to thank Maria S. Perez (Professor at Universidad Politécnica de Madrid, Spain) for her insightful critique on earlier drafts of this whitepaper.
References:
[1] Destination Earth (DestinE) initiative. https://ec.europa.eu/digital-single-market/en/destination-earth-destine
[2] Destination Earth: Use Cases Analysis, JRC Technical Report JRC122456, 2020. https://publications.jrc.ec.europa.eu/repository/handle/JRC122456
[3] Wilkinson MD, Dumontier M, Aalbersberg IJ, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016 Mar 15;3:160018. doi: 10.1038/sdata.2016.18. Erratum in: Sci Data. 2019 Mar 19;6(1):6. PMID: 26978244; PMCID: PMC4792175.
[4] N. Brown, R. Nash, G. Gibb, B. Prodan, M. Kontak, V. Olshevsky, and W. Der Chien, "The role of interactive supercomputing in using HPC for urgent decision making", in Proceedings of the International Conference on High Performance Computing. Springer, 2019, pp. 528–540.
[5] G. Gibb, R. Nash, N. Brown and B. Prodan, "The Technologies Required for Fusing HPC and Real-Time Data to Support Urgent Computing", in Proceedings of the 2019 IEEE/ACM Workshop on HPC for Urgent Decision Making (UrgentHPC), 2019, pp. 24-34.
[6] Earth System Modeling Framework: https://earthsystemmodeling.org/
[7] T. C. Schulthess, P. Bauer, N. Wedi, O. Fuhrer, T. Hoefler and C. Schär, "Reflecting on the Goal and Baseline for Exascale Computing: A Roadmap Based on Weather and Climate Simulations," in Computing in Science & Engineering, vol. 21, no. 1, pp. 30-41, 1 Jan.-Feb. 2019, doi: 10.1109/MCSE.2018.2888788.
[8] Baker, D.N., Erickson, P.J., Fennell, J.F. et al. Space Weather Effects in the Earth's Radiation Belts. Space Sci Rev 214, 17 (2018). https://doi.org/10.1007/s11214-017-0452-10
[9] R. Kube et al., "Near real-time analysis of big fusion data on HPC systems," 2020 IEEE/ACM HPC for Urgent Decision Making (UrgentHPC), 2020, pp. 55-63, doi: 10.1109/UrgentHPC51945.2020.00012.
[10] A. Kremin, S. Bailey, J. Guy, T. Kisner and K. Zhang, "Rapid Processing of Astronomical Data for the Dark Energy Spectroscopic Instrument," 2020 IEEE/ACM HPC for Urgent Decision Making (UrgentHPC), 2020, pp. 1-9, doi: 10.1109/UrgentHPC51945.2020.00006.
[11] Jiang, M., Bu, C., Zeng, J. et al. Applications and challenges of high performance computing in genomics. CCF Trans. HPC (2021). https://doi.org/10.1007/s42514-021-00081-w
[12] CISCO 2020, Global Network Trends Report, Tech. rep., CISCO. URL https://www.cisco.com/c/dam/m/en_us/solutions/enterprise-networks/ networking-report/files/GLBL-ENG_NB-06_0_NA_RPT_PDF_ MOFU-no-NetworkingTrendsReport-NB_rpten018612_5.pdf
[13] Asch M, Moore T, Badia R, et al. Big data and extreme-scale computing: Pathways to Convergence-Toward a shaping strategy for a future software and data ecosystem for scientific inquiry. The International Journal of High Performance Computing Applications. 2018;32(4):435-479. doi:10.1177/1094342018778123
[14] E.Yamasaki, 2012, What We Can Learn From Japan's Early Earthquake Warning System, Momentum: Volume 1: Issue 1, Article 2.
[15] F. Løvholt, S. Lorito, J. Macias, M. Volpe, J. Selva and S. Gibbons, "Urgent Tsunami Computing," 2019 IEEE/ACM HPC for Urgent Decision Making (UrgentHPC), 2019, pp. 45-50, doi: 10.1109/UrgentHPC49580.2019.00011.
[16] Siew Hoon Leong, Dieter Kranzlmüller, "Towards a General Definition of Urgent Computing," Procedia Computer Science, Volume 51, 2015, https://doi.org/10.1016/j.procs.2015.05.402.
[17] Tzachor, A., Whittlestone, J., Sundaram, L. et al. Artificial intelligence in a crisis needs ethics with urgency. Nat Mach Intell 2, 365–366 (2020). https://doi.org/10.1038/s42256-020-0195-0
[18] Chen, N., Liu, W., Bai, R. et al. Application of computational intelligence technologies in emergency management: a literature review. Artif Intell Rev 52, 2131–2168 (2019). https://doi.org/10.1007/s10462-017-9589-8
[19] D. Elia, S. Fiore and G. Aloisio, "Towards HPC and Big Data Analytics Convergence: Design and Experimental Evaluation of a HPDA Framework for eScience at Scale," in IEEE Access, vol. 9, pp. 73307-73326, 2021. https://doi.org/10.1109/ACCESS.2021.3079139
[20] European High Performance Computing Joint Undertaking (EuroHPC JU). https://eurohpc-ju.europa.eu
[21] A European Green Deal. https://ec.europa.eu/info/strategy/priorities-2019-2024/european-green-deal_en
[22] R. Roscher, B. Bohn, M. F. Duarte and J. Garcke, "Explainable Machine Learning for Scientific Insights and Discoveries," in IEEE Access, vol. 8, pp. 42200-42216, 2020, doi: 10.1109/ACCESS.2020.2976199.
[23] Strategic Research and Innovation Agenda of the European Open Science Cloud (EOSC), Feb. 2021. https://www.eosc.eu/sites/default/files/EOSC-SRIA-V1.0_15Feb2021.pdf
Published: 2022
Full Text: View/download PDF

35. Aurora: An architecture for dynamic and adaptive work sessions in open environments

Author: Marazakis, Manolis, Papadakis, Dimitris, Nikolaou, Christos, Goos, Gerhard, editor, Hartmanis, Juris, editor, van Leeuwen, Jan, editor, Quirchmayr, Gerald, editor, Schweighofer, Erich, editor, and Bench-Capon, Trevor J.M., editor
Published: 1998
Full Text: View/download PDF

36. A Framework for the Encapsulation of Value-Added Services in Digital Objects

Author: Marazakis, Manolis, Papadakis, Dimitris, Papadakis, Stavros A., Goos, Gerhard, Series editor, Hartmanis, Juris, Series editor, van Leeuwen, Jan, Series editor, Nikolaou, Christos, editor, and Stephanidis, Constantine, editor
Published: 1998
Full Text: View/download PDF

37. System Infrastructure for Digital Libraries: A Survey and Outlook

Author: Nikolaou, Christos, Marazakis, Manolis, Goos, Gerhard, editor, Hartmanis, Juris, editor, van Leeuwen, Jan, editor, and Rovan, Branislav, editor
Published: 1998
Full Text: View/download PDF

38. Towards a common infrastructure for large-scale distributed applications

Author: Nikolaou, Christos, Marazakis, Manolis, Papadakis, Dimitris, Yeorgiannakis, Yiorgos, Sairamesh, Jakka, Goos, Gerhard, editor, Hartmanis, Juris, editor, van Leeuwen, Jan, editor, Peters, Carol, editor, and Thanos, Costantino, editor
Published: 1997
Full Text: View/download PDF

39. Multilevel simulation-based co-design of next generation HPC microprocessors

Author: Zaourar, Lilia, primary, Benazouz, Mohamed, additional, Mouhagir, Ayoub, additional, Jebali, Fatma, additional, Sassolas, Tanguy, additional, Weill, Jean-Christophe, additional, Falquez, Carlos, additional, Ho, Nam, additional, Pleiter, Dirk, additional, Portero, Antoni, additional, Suarez, Estela, additional, Petrakis, Polydoros, additional, Papaefstathiou, Vassilis, additional, Marazakis, Manolis, additional, Radulovic, Milan, additional, Martinez, Francesc, additional, Armejach, Adria, additional, Casas, Marc, additional, Nocua, Alejandro, additional, and Dolbeau, Romain, additional
Published: 2021
Full Text: View/download PDF

40. MARVEL: Multimodal Extreme Scale Data Analytics for Smart Cities Environments

Author: Bajovic, Dragana, primary, Bakhtiarnia, Arian, additional, Bravos, George, additional, Brutti, Alessio, additional, Burkhardt, Felix, additional, Cauchi, Daniel, additional, Chazapis, Antony, additional, Cianco, Claire, additional, Dall'Asen, Nicola, additional, Delic, Vlado, additional, Dimou, Christos, additional, Djokic, Djordje, additional, Escobar-Molero, Antonio, additional, Esterle, Lukas, additional, Eyben, Florian, additional, Farella, Elisabetta, additional, Festi, Thomas, additional, Geromitsos, Artemios, additional, Giakoumakis, Giannis, additional, Hatzivasilis, George, additional, Ioannidis, Sotiris, additional, Iosifidis, Alexandros, additional, Kallipolitou, Theodora, additional, Kalogiannis, Grigorios, additional, Kiousi, Akrivi, additional, Kopanaki, Despina, additional, Marazakis, Manolis, additional, Markopoulou, Stella, additional, Muscat, Adrian, additional, Paissan, Francesco, additional, Lobo, Tomas Pariente, additional, Pavlovic, Dusan, additional, Raptis, Theofanis P., additional, Ricci, Elisa, additional, Saez, Borja, additional, Sahito, Farhan, additional, Scerri, Kenneth, additional, Schuller, Bjorn, additional, Simic, Nikola, additional, Spanoudakis, George, additional, Tomasi, Alex, additional, Triantafyllopoulos, Andreas, additional, Valerio, Lorenzo, additional, Villazan, Javier, additional, Wang, Yiming, additional, Xuereb, Andre, additional, and Zammit, Johan, additional
Published: 2021
Full Text: View/download PDF

41. Skynet: Performance-driven Resource Management for Dynamic Workloads

Author: Sfakianakis, Yannis, primary, Marazakis, Manolis, additional, and Bilas, Angelos, additional
Published: 2021
Full Text: View/download PDF

42. IOTier: A Virtual Testbed to evaluate systems for IoT environments

Author: Nikolaidis, Fotis, primary, Marazakis, Manolis, additional, and Bilas, Angelos, additional
Published: 2021
Full Text: View/download PDF

43. Frisbee

Author: Nikolaidis, Fotis, primary, Chazapis, Antony, additional, Marazakis, Manolis, additional, and Bilas, Angelos, additional
Published: 2021
Full Text: View/download PDF

44. Memory-mapped I/O on steroids

Author: Papagiannis, Anastasios, primary, Marazakis, Manolis, additional, and Bilas, Angelos, additional
Published: 2021
Full Text: View/download PDF

45. Towards Resilient EU HPC Systems: A Blueprint

Author: Radojkovic, Petar, Marazakis, Manolis, Carpenter, Paul, Jeyapaul, Reiley, Gizopoulos, Dimitris, Schulz, Martin, Armejach, Adria, Ayguade, Eduard, Bodin, François, Canal, Ramon, Cappello, Franck, Chaix, Fabien, Colin de Verdiere, Guillaume, Derradji, Said, Di Carlo, Stefano, Engelmann, Christian, Laguna, Ignacio, Moreto, Miquel, Mutlu, Onur, Papadopoulos, Lazaros, Perks, Olly, Ploumidis, Manolis, Salami, Bezhad, Sazeides, Yanos, Soudris, Dimitrios, Sourdis, Yiannis, Stenstrom, Per, Thibault, Samuel, Toms, Will, Unsal, Osman, Barcelona Supercomputing Center - Centro Nacional de Supercomputacion (BSC - CNS), Foundation for Research and Technology - Hellas (FORTH), ARM Ltd [Cambridge] (ARM), National and Kapodistrian University of Athens (NKUA), Technische Universität Munchen - Université Technique de Munich [Munich, Allemagne] (TUM), Leibniz Supercomputing Centre (LRZ), Logic and applications (LOGICA), LANGAGE ET GÉNIE LOGICIEL (IRISA-D4), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Bretagne Sud (UBS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Rennes (ENS Rennes)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-CentraleSupélec-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Bretagne Sud (UBS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT), 
Universitat Politècnica de Catalunya [Barcelona] (UPC), Argonne National Laboratory [Lemont] (ANL), Commissariat à l'énergie atomique et aux énergies alternatives (CEA), Bull atos technologies, Politecnico di Torino = Polytechnic of Turin (Polito), Oak Ridge National Laboratory [Oak Ridge] (ORNL), UT-Battelle, LLC, Lawrence Livermore National Laboratory (LLNL), Eidgenössische Technische Hochschule - Swiss Federal Institute of Technology [Zürich] (ETH Zürich), National Technical University of Athens [Athens] (NTUA), University of Cyprus (UCY), Chalmers University of Technology [Göteborg], Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB), STatic Optimizations, Runtime Methods (STORM), Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), University of Manchester [Manchester], European HPC resilience initiative, European Project: 801015,H2020,EXA2PRO(2018), European Project: 611404,EC:FP7:ICT,FP7-ICT-2013-10,CLERECO(2013), European Project, European Project: 671553,H2020,H2020-FETHPC-2014,ExaNeSt(2015), European Project: 671578,H2020,H2020-FETHPC-2014,ExaNoDe(2015), European Project: 671558,H2020,H2020-FETHPC-2014,EXDCI(2015), European Project: 780681,LEGaTO, European Project: 671632,H2020,H2020-FETHPC-2014,ECOSCALE(2015), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA 
Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), University of Cyprus [Nicosia] (UCY), Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS), and Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Published: 2020

46. ETP4HPC's Strategic Research Agenda for High-Performance Computing in Europe 4

Author: Malms, Michael, Ostasz, Marcin, Gilliot, Maike, Bernier-Bruna, Pascale, Cargemel, Laurent, Suarez, Estela, Cornelius, Herbert, Duranton, Marc, Koren, Benny, Rosse-Laurent, Pascale, Pérez-Hernández, María S., Marazakis, Manolis, Lonsdale, Guy, Carpenter, Paul, Antoniu, Gabriel, Narasimhamurthy, Sai, Brinkman, André, Pleiter, Dirk, Tate, Adrian, Krueger, Jens, Hoppe, Hans-Christian, Laure, Erwin, Wierse, Andreas, European Technology Platform (ETP) for High-Performance Computing (HPC) (ETP4HPC), IBM Research Lab. - Zürich, TERATEC [Bruyères-le-Chatel], Atos, Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich GmbH | Centre de recherche de Juliers, Helmholtz-Gemeinschaft = Helmholtz Association-Helmholtz-Gemeinschaft = Helmholtz Association, Megware Computer Vertrieb und Service GmbH (Megware), Laboratoire d'Intégration des Systèmes et des Technologies (LIST (CEA)), Direction de Recherche Technologique (CEA) (DRT (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA), Mellanox Technologies [Sunnyvale], Universidad Politécnica de Madrid (UPM), Foundation for Research and Technology - Hellas (FORTH), Scapos AG (SCAPOS), Departament d'Arquitectura de Computadors - Universitat Politècnica de Catalunya (DAC), Universitat Politècnica de Catalunya [Barcelona] (UPC), Scalable Storage for Clouds and Beyond (KerData), Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-SYSTÈMES LARGE ÉCHELLE (IRISA-D1), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut 
National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT), Seagate Technology, Johannes Gutenberg - Universität Mainz = Johannes Gutenberg University (JGU), Helmholtz-Gemeinschaft = Helmholtz Association, Numerical Algorithms Group [New Mexico] (NAG), Fraunhofer (Fraunhofer-Gesellschaft), Intel Corporation [Santa Clara], Intel Corporation [USA], Royal Institute of Technology [Stockholm] (KTH ), SICOS BW [Stuttgart] (SICOS), ETP4HPC: European Technology Platform for High Performance Computing, with the support of the EXDCI-2 project, IBM Systems Development, TERATEC, Philips France Semiconducteurs, C and C Research Laboratories NEC Eur. Ltd., NEC Europe Ltd. 
[Middlesex], Université de Bretagne Sud (UBS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Rennes (ENS Rennes)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-CentraleSupélec-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Bretagne Sud (UBS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-École normale supérieure - Rennes (ENS Rennes)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes 1 (UR1), Johannes Gutenberg - Universität Mainz (JGU), Intel Corporation, Santa Clara CA, SICOS BW (SICOS), and European Technology Platform for High-Performance Computing (ETP4HPC)
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: This Strategic Research Agenda (SRA) is the fourth High-Performance Computing (HPC) technology roadmap developed and maintained by ETP4HPC, the European High-Performance Computing Platform, with the support of the EXDCI-2 project. It continues the tradition of a structured approach to the identification of key research objectives. The main objective of this SRA is to identify the European technology research priorities in the area of High-Performance Computing (HPC) and High-Performance Data Analytics (HPDA), which should be used by EuroHPC to build its 2021-2024 Work Programme. Over eighty HPC experts associated with member organisations of ETP4HPC created this document in collaboration with external technical leaders representing those areas of technology that, together with HPC, form what we have come to call “The Digital Continuum”. This new concept reflects the main trend of this SRA: it is not only about developing HPC technology in order to build competitive European HPC systems, but also about making our HPC solutions work together with other related technologies. The material included in this SRA is also a result of our interactions with Big Data, Internet of Things (IoT), Artificial Intelligence (AI), and Cyber-Physical Systems (CPS).
Published: 2020

47. Towards resilient EU HPC systems: A blueprint

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Universitat Politècnica de Catalunya. VIRTUOS - Virtualisation and Operating Systems, Radojković, Petar, Marazakis, Manolis, Carpenter, Paul Matthew, Jeyapaul, Reiley, Gizopoulos, Dimitris, Schulz, Martin, Armejach Sanosa, Adrià, Ayguadé Parra, Eduard, Canal Corretger, Ramon, Moretó Planas, Miquel, Salami, Behzad, and Unsal, Osman Sabri
Abstract: This document aims to spearhead a Europe-wide discussion on HPC system resilience and to help the European HPC community define best practices for resilience. We analyse a wide range of state-of-the-art resilience mechanisms and recommend the most effective approaches to employ in large-scale HPC systems. Our guidelines will be useful in the allocation of available resources, as well as guiding researchers and research funding towards the enhancement of resilience approaches with the highest priority and utility. Although our work is focused on the needs of next generation HPC systems in Europe, the principles and evaluations are applicable globally. This work has received funding from the European Union’s Horizon 2020 research and innovation programme under the projects ECOSCALE (grant agreement No 671632), EPI (grant agreement No 826647), EuroEXA (grant agreement No 754337), Eurolab4HPC (grant agreement No 800962), EVOLVE (grant agreement No 825061), EXA2PRO (grant agreement No 801015), ExaNeSt (grant agreement No 671553), ExaNoDe (grant agreement No 671578), EXDCI-2 (grant agreement No 800957), LEGaTO (grant agreement No 780681), MB2020 (grant agreement No 779877), RECIPE (grant agreement No 801137) and SDK4ED (grant agreement No 780572). The work was also supported by the European Commission’s Seventh Framework Programme under the project CLERECO (grant agreement No 611404), the NCSA-Inria-ANL-BSC-JSC-Riken-UTK Joint-Laboratory for Extreme Scale Computing – JLESC (https://jlesc.github.io/), the OMPI-X project (No ECP-2.3.1.17) and the Spanish Government through the Severo Ochoa programme (SEV-2015-0493). This work was sponsored in part by the U.S. Department of Energy's Office of Advanced Scientific Computing Research, program managers Robinson Pino and Lucy Nowell. This manuscript has been authored by UT-Battelle, LLC under Contract No DE-AC05-00OR22725 with the U.S. Department of Energy. Preprint.
Published: 2020

48. DyRAC: Cost-aware Resource Assignment and Provider Selection for Dynamic Cloud Workloads

Author: Sfakianakis, Yannis, primary, Marazakis, Manolis, additional, and Bilas, Angelos, additional
Published: 2020
Full Text: View/download PDF

49. Using gLite to Implement a Secure ICGrid

Author: Luna, Jesus, primary, Dikaiakos, Marios D, additional, Gjermundrod, Harald, additional, Flouris, Michail, additional, Marazakis, Manolis, additional, and Bilas, Angelos, additional
Published: 2008
Full Text: View/download PDF

50. Towards Communication Profile, Topology and Node Failure Aware Process Placement

Author: Vardas, Ioannis, primary, Ploumidis, Manolis, additional, and Marazakis, Manolis, additional
Published: 2020
Full Text: View/download PDF
