777 results for "scientific workflows"
Search Results
2. Reproducible brain PET data analysis: easier said than done.
- Author
Naseri, Maryam, Ramakrishnapillai, Sreekrishna, and Carmichael, Owen T.
- Subjects
POSITRON emission tomography, MAGNETIC resonance imaging, DATA libraries, SCIENTIFIC community, FUNCTIONAL magnetic resonance imaging
- Abstract
While a great deal of recent effort has focused on addressing a perceived reproducibility crisis within brain structural magnetic resonance imaging (MRI) and functional MRI research communities, this article argues that brain positron emission tomography (PET) research stands on even more fragile ground, lagging behind efforts to address MRI reproducibility. We begin by examining the current landscape of factors that contribute to reproducible neuroimaging data analysis, including scientific standards, analytic plan pre-registration, data and code sharing, containerized workflows, and standardized processing pipelines. We then focus on disparities in the current status of these factors between brain MRI and brain PET. To demonstrate the positive impact that further developing such reproducibility factors would have on brain PET research, we present a case study that illustrates the many challenges faced by one laboratory that attempted to reproduce a community-standard brain PET processing pipeline. We identified key areas in which the brain PET community could enhance reproducibility, including stricter reporting policies among PET-dedicated journals, data repositories, containerized analysis tools, and standardized processing pipelines. Other solutions are also discussed, including mandatory pre-registration, data sharing, and code availability as conditions of grant funding, as well as online forums and standardized reporting templates. Bolstering these reproducibility factors within the brain PET research community has the potential to unlock the full potential of brain PET research, propelling it toward a higher-impact future. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. MAESTRO: a lightweight ontology-based framework for composing and analyzing script-based scientific experiments.
- Author
Dias, Luiz Gustavo, Lopes, Bruno, and de Oliveira, Daniel
- Subjects
SCRIPTING languages (Computer science), DATA structures, SCRIPTS, WORKFLOW, BIOINFORMATICS
- Abstract
Over the last decades, there has been a rapid growth in the number of scientific experiments implemented as computational simulations. These experiments typically consist of multiple steps, where different programs, in-house scripts, or services may be used at each step. Workflows have served as an abstraction to model such experiments, and such workflows can be implemented in various ways, with many users choosing scripting languages like Python. Although scripts offer users the flexibility to compose workflows with complex constructs and data structures, they typically represent isolated workflows rather than encompassing the entire experiment. Within the same experiment, users may explore different configurations to confirm or refute their hypotheses, leading to the execution of different (but associated) workflows. Composing and analyzing scientific experiments associated with multiple workflows implemented as scripts is an open, yet important, task. Poor choices during composition can lead to inconsistencies, such as format incompatibility and problems in script dependencies. Moreover, even with a well-specified and properly executed script, analyzing the data produced from an isolated workflow without knowledge of the experiment's structure, domain terms, and specifications can be challenging. In this article, we introduce MAESTRO, a lightweight framework based on the use of ontologies and provenance to assist in the composition and analysis of experiments implemented using scripts. MAESTRO integrates the concept of Experiment Lines to represent the workflow at an abstract level and employs reasoners to derive a script-based workflow based on the abstract experiment representation and to support analytical queries. The feasibility of MAESTRO was evaluated through a study in the bioinformatics domain, receiving positive feedback from experts in e-science. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
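Entry 3 above rests on a simple core: describe workflow steps and their dependencies in an ontology-backed graph, then answer analytical queries over that graph. A minimal sketch of the idea with rdflib follows; the namespace, step names, and properties are invented for illustration and are not MAESTRO's actual vocabulary or API.

```python
# Illustrative only: an ontology-style description of two script-based
# workflow steps and a dependency query, loosely in the spirit of entry 3.
# Requires: pip install rdflib
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/experiment#")  # hypothetical namespace

g = Graph()
g.bind("ex", EX)
g.add((EX.AlignReads, RDF.type, EX.WorkflowStep))
g.add((EX.AlignReads, EX.implementedBy, Literal("align.py")))       # invented script name
g.add((EX.CallVariants, RDF.type, EX.WorkflowStep))
g.add((EX.CallVariants, EX.implementedBy, Literal("variants.py")))  # invented script name
g.add((EX.CallVariants, EX.dependsOn, EX.AlignReads))

# Analytical query: which steps must run before CallVariants?
q = "SELECT ?dep WHERE { ex:CallVariants ex:dependsOn ?dep . }"
for row in g.query(q, initNs={"ex": EX}):
    print(row.dep)
```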
4. Enhancing workflow efficiency with a modified Firefly Algorithm for hybrid cloud edge environments
- Author
Deafallah Alsadie and Musleh Alsulami
- Subjects
Scientific workflows, Cloud computing, Scheduling algorithms, Firefly Optimization Algorithm, Resource utilization, Medicine, Science
- Abstract
Efficient scheduling of scientific workflows in hybrid cloud-edge environments is crucial for optimizing resource utilization and minimizing completion time. In this study, we evaluate various scheduling algorithms, emphasizing the Modified Firefly Optimization Algorithm (ModFOA) and comparing it with established methods such as Ant Colony Optimization (ACO), Genetic Algorithm (GA), and Particle Swarm Optimization (PSO). We investigate key performance metrics, including makespan, resource utilization, and energy consumption, across both cloud and edge configurations. Scientific workflows often involve complex tasks with dependencies, which can challenge traditional scheduling algorithms. While existing methods show promise, they may not fully address the unique demands of hybrid cloud-edge environments, potentially leading to suboptimal outcomes. Our proposed ModFOA integrates cloud and edge computing resources, offering an effective solution for scheduling workflows in these hybrid environments. Through comparative analysis, ModFOA demonstrates improved performance in reducing makespan and completion times, while maintaining competitive resource utilization and energy efficiency. This study highlights the importance of incorporating cloud-edge integration in scheduling algorithms and showcases ModFOA’s potential to enhance workflow efficiency and resource management across hybrid environments. Future research should focus on refining ModFOA’s parameters and validating its effectiveness in practical hybrid cloud-edge scenarios.
- Published
- 2024
- Full Text
- View/download PDF
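ModFOA's modifications are not detailed in the abstract above, but the firefly mechanism it builds on is standard: dimmer solutions move toward brighter ones with an attractiveness that decays with distance. A generic, continuous-form firefly step is sketched below (minimization, e.g., of makespan); an actual workflow scheduler would additionally need a discrete task-to-resource encoding.

```python
# Generic firefly step, not ModFOA itself; all parameters are typical defaults.
import numpy as np

rng = np.random.default_rng(0)

def firefly_step(pop, fitness, beta0=1.0, gamma=1.0, alpha=0.1):
    """One iteration: dimmer fireflies move toward brighter ones
    (lower fitness = brighter, for minimization problems)."""
    new = pop.copy()
    for i in range(len(pop)):
        for j in range(len(pop)):
            if fitness[j] < fitness[i]:                  # j is brighter
                r2 = np.sum((pop[i] - pop[j]) ** 2)
                beta = beta0 * np.exp(-gamma * r2)       # attractiveness decays with distance
                new[i] += beta * (pop[j] - pop[i]) + alpha * (rng.random(pop.shape[1]) - 0.5)
    return new

# Example: 10 candidate schedules encoded as 5-dimensional real vectors.
pop = rng.random((10, 5))
fit = np.array([np.sum(p) for p in pop])   # stand-in objective (e.g., makespan)
pop = firefly_step(pop, fit)
```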
5. Optimizing execution time and cost while scheduling scientific workflow in edge data center with fault tolerance awareness
- Author
Kadum Muhanad Mohammed and Deng Xiaoheng
- Subjects
edge computing, fault tolerance, scheduling, scientific workflows, metaheuristic, particle swarm optimization, Engineering (General). Civil engineering (General), TA1-2040
- Abstract
Scheduling scientific workflows is essential for edge data center operations. Fault tolerance is a crucial focus in workflow scheduling (WS) research. This study proposes fault-tolerant WS in edge data centers using Task Prioritization Adaptive Particle Swarm Optimization (TPAPSO). The aim is to minimize makespan and execution costs while overcoming failures at all workflow processing stages, including when virtual machines are insufficient or tasks fail. The approach comprises three components: an initial heuristic list, task scheduling with TPAPSO, and performance monitoring with fault tolerance (PMWFT). TPAPSO-PMWFT is simulated using CloudSim 4.0. The experiments indicate that the suggested approach delivers superior results compared to existing methods.
- Published
- 2024
- Full Text
- View/download PDF
6. Energy and Scientific Workflows: Smart Scheduling and Execution.
- Author
WARADE, MEHUL, LEE, KEVIN, RANAWEERA, CHATHURIKA, and SCHNEIDER, JEAN-GUY
- Subjects
HIGH performance computing, PARALLEL programming, COMPUTER workstation clusters, ENERGY consumption, SCIENTIFIC computing, WORKFLOW management systems
- Abstract
Energy-efficient computation is an increasingly important target in modern-day computing. Scientific computation is conducted using scientific workflows that are executed on highly scalable compute clusters. The execution of these workflows is generally geared towards optimizing run-time performance, with the energy footprint of the execution being ignored. Evidently, minimizing execution time and minimizing energy consumption do not have to be mutually exclusive. The aim of the research presented in this paper is to highlight the benefits of energy-aware scientific workflow execution. In this paper, a set of requirements for an energy-aware scheduler are outlined and a conceptual architecture for the scheduler is presented. The evaluation of the conceptual architecture was performed by developing a proof-of-concept scheduler, which was able to achieve around a 49.97% reduction in the energy consumption of the computation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
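The trade-off entry 6 targets is easy to demonstrate numerically: energy is power multiplied by time, so the fastest configuration is not automatically the cheapest in joules. The toy model below, with entirely made-up power and scaling numbers, picks the node count that minimizes energy subject to a deadline.

```python
# Made-up numbers throughout; the point is the shape of the trade-off.
NODE_POWER_W = 200.0       # assumed draw per active node
BASE_RUNTIME_S = 3600.0    # assumed single-node runtime

def runtime(nodes: int) -> float:
    return BASE_RUNTIME_S / nodes ** 0.9   # assumed sublinear speedup

def energy_j(nodes: int) -> float:
    return nodes * NODE_POWER_W * runtime(nodes)

deadline = 1200.0  # seconds
feasible = [n for n in range(1, 65) if runtime(n) <= deadline]
best = min(feasible, key=energy_j)
# Under this model energy grows with node count, so the energy-aware choice
# is the smallest node count that still meets the deadline.
print(best, round(runtime(best)), round(energy_j(best) / 3.6e6, 3), "kWh")
```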
7. Version 1.0 - SAMbA-RaP is music to scientists’ ears: Adding provenance support to Spark-based scientific workflows
- Author
Thaylon Guedes, Marta Mattoso, Marcos Bedo, and Daniel de Oliveira
- Subjects
Provenance, Scientific workflows, DISC systems, Domain data, Computer software, QA76.75-76.765
- Abstract
While researchers benefit from Apache Spark for executing scientific workflows at scale, they often lack provenance support due to the framework’s design limitations. This paper presents SAMbA-RaP, a provenance extension for Apache Spark. It focuses on: (i) Executing external, black-box applications with intensive I/O operations within the workflow while leveraging Spark’s in-memory data structures, (ii) Extracting domain-specific data from in-memory data structures and (iii) Implementing data versioning and capturing the provenance graph in a workflow execution. SAMbA-RaP also provides real-time reports via a web interface, enabling scientists to explore dataflow transformations and content evolution as they run workflows.
- Published
- 2024
- Full Text
- View/download PDF
8. A Framework for Automated Parallel Execution of Scientific Multi-workflow Applications in the Cloud with Work Stealing
- Author
Silva, Helena S. I. L., Castro, Maria C. S., Silva, Fabricio A. B., Melo, Alba C. M. A., Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Carretero, Jesus, editor, Shende, Sameer, editor, Garcia-Blas, Javier, editor, Brandic, Ivona, editor, Olcoz, Katzalin, editor, and Schreiber, Martin, editor
- Published
- 2024
- Full Text
- View/download PDF
9. In Silico Evaluation and Prediction of Pesticide Supported by Reproducible Evolutionary Workflows
- Author
Oliveira, Anderson, Firmino, Fabricio, Cruz, Pedro Vieira, de Oliveira Sampaio, Jonice, da Cruz, Sérgio Manuel Serra, Albornoz, Víctor M., editor, Mac Cawley, Alejandro, editor, and Plà-Aragonés, Lluis M., editor
- Published
- 2024
- Full Text
- View/download PDF
10. Scientific Workflows Management with Blockchain: A Survey
- Author
Henry, Tiphaine, Tucci-Piergiovanni, Sara, El Madhoun, Nour, editor, Dionysiou, Ioanna, editor, and Bertin, Emmanuel, editor
- Published
- 2024
- Full Text
- View/download PDF
11. Toward the Edge-Cloud Continuum Through the Serverless Workflows
- Author
Sicari, Christian, Catalfamo, Alessio, Carnevale, Lorenzo, Galletta, Antonino, Celesti, Antonio, Fazio, Maria, Villari, Massimo, Fortino, Giancarlo, Series Editor, Liotta, Antonio, Series Editor, Savaglio, Claudio, editor, Zhou, MengChu, editor, and Ma, Jianhua, editor
- Published
- 2024
- Full Text
- View/download PDF
12. Budget-based resource provisioning and scheduling algorithm for scientific workflows on IaaS cloud.
- Author
P, Rajasekar and P, Santhiya
- Subjects
WORKFLOW, CLOUD dynamics, SCHEDULING, PRODUCTION scheduling, ALGORITHMS, CLOUD computing
- Abstract
Cloud computing, specifically Infrastructure as a Service (IaaS), has become a topic of interest in recent years for executing compute-intensive scientific workflows. These platforms deliver on-demand access to the infrastructure needed for workflow execution, allowing customers to pay only for the services they use. As a result, schedulers must trade off two main QoS criteria: cost and time. Most research in this area has focused on scheduling algorithms that reduce infrastructure costs while meeting a user-specified deadline. Few algorithms, on the other hand, have considered the problem of reducing workflow execution time while staying within a budget. This work considers the latter scenario. We offer a Budget-based resource Provisioning and Scheduling (BPS) algorithm for scientific workflows on IaaS platforms. The proposal was developed to address challenges specific to clouds, such as resource performance variation, resource heterogeneity, practically unlimited on-demand capacity, and pay-as-you-go (i.e., per-minute) pricing. It responds to cloud dynamics and is effective at producing solutions that satisfy a user-specified budget and reduce makespan. Finally, the experimental results confirm that it executes workflows efficiently, meeting the budget in 94% of cases and reducing makespan by 29% compared to state-of-the-art budget-aware algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
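The core decision entry 12 formalizes, stripped to a toy rule (this is not the BPS algorithm itself): given per-minute VM pricing, pick the fastest machine the remaining budget can still pay for. A production algorithm would also distribute the budget across tasks and react to performance variation.

```python
# Toy greedy rule only, not the paper's BPS; catalog values are hypothetical.
from math import ceil

# Hypothetical VM catalog: (name, speedup vs. "small", $ per minute).
VM_TYPES = [("small", 1.0, 0.02), ("medium", 2.0, 0.05), ("large", 4.0, 0.12)]

def pick_vm(task_minutes_on_small, remaining_budget):
    for name, speed, price in sorted(VM_TYPES, key=lambda v: -v[1]):
        minutes = ceil(task_minutes_on_small / speed)   # billed per minute
        cost = minutes * price
        if cost <= remaining_budget:
            return name, minutes, cost
    return None                                         # infeasible within budget

budget = 1.00
for task in [30.0, 45.0, 20.0]:                         # minutes on "small"
    choice = pick_vm(task, budget)
    if choice is None:
        print(f"{task} min task: budget exhausted")
        continue
    name, minutes, cost = choice
    budget -= cost
    print(f"{task} min task -> {name}, {minutes} min, ${cost:.2f}")
```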
13. Reproducible brain PET data analysis: easier said than done
- Author
Maryam Naseri, Sreekrishna Ramakrishnapillai, and Owen T. Carmichael
- Subjects
reproducible science, reproducibility crisis, brain PET, positron emission tomography, scientific workflows, pre-registration, Neurosciences. Biological psychiatry. Neuropsychiatry, RC321-571
- Abstract
While a great deal of recent effort has focused on addressing a perceived reproducibility crisis within brain structural magnetic resonance imaging (MRI) and functional MRI research communities, this article argues that brain positron emission tomography (PET) research stands on even more fragile ground, lagging behind efforts to address MRI reproducibility. We begin by examining the current landscape of factors that contribute to reproducible neuroimaging data analysis, including scientific standards, analytic plan pre-registration, data and code sharing, containerized workflows, and standardized processing pipelines. We then focus on disparities in the current status of these factors between brain MRI and brain PET. To demonstrate the positive impact that further developing such reproducibility factors would have on brain PET research, we present a case study that illustrates the many challenges faced by one laboratory that attempted to reproduce a community-standard brain PET processing pipeline. We identified key areas in which the brain PET community could enhance reproducibility, including stricter reporting policies among PET-dedicated journals, data repositories, containerized analysis tools, and standardized processing pipelines. Other solutions are also discussed, including mandatory pre-registration, data sharing, and code availability as conditions of grant funding, as well as online forums and standardized reporting templates. Bolstering these reproducibility factors within the brain PET research community has the potential to unlock the full potential of brain PET research, propelling it toward a higher-impact future.
- Published
- 2024
- Full Text
- View/download PDF
14. Corrigendum: Using open-science workflow tools to produce SCEC CyberShake physics-based probabilistic seismic hazard models
- Author
Scott Callaghan, Philip J. Maechling, Fabio Silva, Mei-Hui Su, Kevin R. Milner, Robert W. Graves, Kim B. Olsen, Yifeng Cui, Karan Vahi, Albert Kottke, Christine A. Goulet, Ewa Deelman, Thomas H. Jordan, and Yehuda Ben-Zion
- Subjects
scientific workflows, probabilistic seismic hazard analysis, high performance computing, seismic simulations, distributed computing, computational modeling, Computer software, QA76.75-76.765
- Published
- 2024
- Full Text
- View/download PDF
15. Addressing GPU memory limitations for Graph Neural Networks in High-Energy Physics applications
- Author
Claire Songhyun Lee, V. Hewes, Giuseppe Cerati, Kewei Wang, Adam Aurisano, Ankit Agrawal, Alok Choudhary, and Wei-Keng Liao
- Subjects
high-performance computing, scientific workflows, graph neural networks, supercomputing, graphic processing units, deep learning, Computer software, QA76.75-76.765
- Abstract
Introduction: Reconstructing low-level particle tracks in neutrino physics can address some of the most fundamental questions about the universe. However, processing petabytes of raw data using deep learning techniques poses a challenging problem in the field of High Energy Physics (HEP). In the Exa.TrkX Project, an illustrative HEP application, preprocessed simulation data is fed into a state-of-the-art Graph Neural Network (GNN) model, accelerated by GPUs. However, limited GPU memory often leads to Out-of-Memory (OOM) exceptions during training, due to the large size of models and datasets. This problem is exacerbated when deploying models on High-Performance Computing (HPC) systems designed for large-scale applications. Methods: We observe a high workload imbalance issue during GNN model training caused by the irregular sizes of input graph samples in HEP datasets, contributing to OOM exceptions. We aim to scale GNNs on HPC systems by prioritizing workload balance in graph inputs while maintaining model accuracy. Our paper introduces diverse balancing strategies aimed at decreasing the maximum GPU memory footprint and avoiding the OOM exception, across various datasets. Results: Our experiments showcase memory reduction of up to 32.14% compared to the baseline. We also demonstrate that the proposed strategies can avoid OOM in application. Additionally, we create a distributed multi-GPU implementation using these samplers to demonstrate the scalability of these techniques on the HEP dataset. Discussion: By assessing the performance of these strategies as data loading samplers across multiple datasets, we can gauge their effectiveness in both single-GPU and distributed environments. Our experiments, conducted on datasets of varying sizes and across multiple GPUs, broaden the applicability of our work to various GNN applications that handle input datasets with irregular graph sizes.
- Published
- 2024
- Full Text
- View/download PDF
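One balancing strategy consistent with entry 15's description (though not necessarily the authors' exact samplers) is to cap the total node count per batch so that no batch exceeds the GPU memory budget:

```python
# Illustrative greedy packing of irregular graph samples under a node cap.
def node_balanced_batches(graph_sizes, max_nodes_per_batch):
    """Pack graph indices so each batch stays under the node cap.
    Placing large graphs first reduces imbalance across batches."""
    order = sorted(range(len(graph_sizes)), key=lambda i: -graph_sizes[i])
    batches, loads = [], []
    for i in order:
        for b, load in enumerate(loads):
            if load + graph_sizes[i] <= max_nodes_per_batch:
                batches[b].append(i)
                loads[b] += graph_sizes[i]
                break
        else:                                   # no batch had room: open a new one
            batches.append([i])
            loads.append(graph_sizes[i])
    return batches

# Irregular, HEP-like graph sizes; cap each batch at 3,000 nodes.
sizes = [2900, 400, 1200, 800, 2500, 150, 700]
print(node_balanced_batches(sizes, 3000))
```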
16. Enhancing workflow efficiency with a modified Firefly Algorithm for hybrid cloud edge environments
- Author
Alsadie, Deafallah and Alsulami, Musleh
- Published
- 2024
- Full Text
- View/download PDF
17. DE-GWO: A Multi-objective Workflow Scheduling Algorithm for Heterogeneous Fog-Cloud Environment.
- Author
Shukla, Prashant and Pandey, Sudhakar
- Subjects
- *OPTIMIZATION algorithms, *WORKFLOW management systems, *ALGORITHMS, *WORKFLOW, *SCHEDULING, *ENERGY consumption, *PRODUCTION scheduling
- Abstract
The demand for a quick response from cloud services is rapidly increasing day by day. Fog computing is a trending solution to fulfil these demands. When integrated with the cloud, this technology can tremendously improve performance. Like any other technology, fog also has the shortcoming of limited resources. The difficulty of efficiently scheduling tasks among limited resources to minimize makespan and energy consumption, while still guaranteeing appropriate execution cost, continues to be a significant research issue. Hence, this study introduces a Differential Evolution-Grey Wolf Optimization (DE-GWO) technique to enhance the scheduling of scientific workflows in cloud-fog settings. The objective of the proposed DE-GWO algorithm is to mitigate the slow convergence and low accuracy often seen in the classical GWO algorithm. The DE method is chosen as the evolutionary pattern of the wolves to speed up convergence and enhance GWO's accuracy. This study further formulates a weighted-sum objective function which incorporates three criteria, namely makespan, cost, and energy consumption. The DE-GWO technique is evaluated and compared with many conventional and hybrid optimization algorithms. The simulations use five scientific workflow datasets: Montage, CyberShake, Epigenomics, LIGO, and SIPHT. The DE-GWO algorithm demonstrates superior performance compared to all conventional algorithms across several scientific workflows and performance criteria. The methodology is commendably competitive with other methods, since DE incorporates evolution and elimination mechanisms into GWO, and GWO retains a good balance between exploration and exploitation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
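The weighted-sum objective that entry 17 formulates has a simple generic shape; the weights and normalization bounds below are placeholders rather than the paper's values.

```python
# Generic weighted-sum fitness over makespan, cost, and energy (lower is better).
def weighted_fitness(makespan, cost, energy, bounds, w=(0.4, 0.3, 0.3)):
    """Normalize each criterion to [0, 1] against known min/max bounds,
    then combine into a single scalar to minimize."""
    norm = lambda v, lo, hi: (v - lo) / (hi - lo) if hi > lo else 0.0
    m = norm(makespan, *bounds["makespan"])
    c = norm(cost, *bounds["cost"])
    e = norm(energy, *bounds["energy"])
    return w[0] * m + w[1] * c + w[2] * e

# Placeholder bounds, e.g., from the best/worst schedules seen so far.
bounds = {"makespan": (100, 900), "cost": (5, 50), "energy": (1e3, 9e3)}
print(weighted_fitness(400, 20, 4e3, bounds))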
18. Improving prediction of computational job execution times with machine learning.
- Author
Balis, Bartosz, Lelek, Tomasz, Bodera, Jakub, Grabowski, Michal, and Grigoras, Costin
- Subjects
MACHINE learning, FEATURE selection, WORKFLOW management, WORKFLOW, ELECTRONIC data processing, FORECASTING, PRODUCTION scheduling, SYMBOLIC computation
- Abstract
Summary: Predicting resource consumption and run time of computational workloads is crucial for efficient resource allocation, or cost and energy optimization. In this paper, we evaluate various machine learning techniques to predict the execution time of computational jobs. For experiments we use datasets from two application areas: scientific workflow management and data processing in the ALICE experiment at CERN. We apply a two‐stage prediction method and evaluate its performance. Other evaluated aspects include: (1) comparing performance of global (per‐workflow) versus specialized (per‐job) models; (2) impact of prediction granularity in the first stage of the two‐stage method; (3) using various feature sets, feature selection, and feature importance analysis; (4) applying symbolic regression in addition to classical regressors. Our results provide new valuable insights on using machine learning techniques to predict the runtime behavior of computational jobs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
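One plausible reading of the two-stage method in entry 18 (the paper's exact stages may differ): a first-stage classifier assigns a job to a coarse runtime class, and a second-stage regressor specialized to that class predicts the run time.

```python
# Sketch on synthetic data; feature meanings and thresholds are invented.
# Requires: pip install scikit-learn numpy
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((500, 6))                                  # per-job features
y = 60 * X[:, 0] + 600 * (X[:, 1] > 0.7) + rng.normal(0, 5, 500)  # runtime (s)

classes = (y > 300).astype(int)                           # stage 1: short vs. long
clf = RandomForestClassifier(random_state=0).fit(X, classes)

regs = {c: RandomForestRegressor(random_state=0).fit(X[classes == c], y[classes == c])
        for c in (0, 1)}                                  # stage 2: per-class regressors

X_new = rng.random((3, 6))
pred_class = clf.predict(X_new)
pred_time = [regs[c].predict(x.reshape(1, -1))[0] for c, x in zip(pred_class, X_new)]
print(pred_time)
```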
19. Using open-science workflow tools to produce SCEC CyberShake physics-based probabilistic seismic hazard models
- Author
Scott Callaghan, Philip J. Maechling, Fabio Silva, Mei-Hui Su, Kevin R. Milner, Robert W. Graves, Kim B. Olsen, Yifeng Cui, Karan Vahi, Albert Kottke, Christine A. Goulet, Ewa Deelman, Thomas H. Jordan, and Yehuda Ben-Zion
- Subjects
scientific workflows, probabilistic seismic hazard analysis, high performance computing, seismic simulations, distributed computing, computational modeling, Computer software, QA76.75-76.765
- Abstract
The Statewide (formerly Southern) California Earthquake Center (SCEC) conducts multidisciplinary earthquake system science research that aims to develop predictive models of earthquake processes, and to produce accurate seismic hazard information that can improve societal preparedness and resiliency to earthquake hazards. As part of this program, SCEC has developed the CyberShake platform, which calculates physics-based probabilistic seismic hazard analysis (PSHA) models for regions with high-quality seismic velocity and fault models. The CyberShake platform implements a sophisticated computational workflow that includes over 15 individual codes written by 6 developers. These codes are heterogeneous, ranging from short-running high-throughput serial CPU codes to large, long-running, parallel GPU codes. Additionally, CyberShake simulation campaigns are computationally extensive, typically producing tens of terabytes of meaningful scientific data and metadata over several months of around-the-clock execution on leadership-class supercomputers. To meet the needs of the CyberShake platform, we have developed an extreme-scale workflow stack, including the Pegasus Workflow Management System, HTCondor, Globus, and custom tools. We present this workflow software stack and identify how the CyberShake platform and supporting tools enable us to meet a variety of challenges that come with large-scale simulations, such as automated remote job submission, data management, and verification and validation. This platform enabled us to perform our most recent simulation campaign, CyberShake Study 22.12, from December 2022 to April 2023. During this time, our workflow tools executed approximately 32,000 jobs, and used up to 73% of the Summit system at Oak Ridge Leadership Computing Facility. Our workflow tools managed about 2.5 PB of total temporary and output data, and automatically staged 19 million output files totaling 74 TB back to archival storage on the University of Southern California's Center for Advanced Research Computing systems, including file-based relational data and large binary files to efficiently store millions of simulated seismograms. CyberShake extreme-scale workflows have generated simulation-based probabilistic seismic hazard models that are being used by seismological, engineering, and governmental communities.
- Published
- 2024
- Full Text
- View/download PDF
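Entry 19's workflow stack (Pegasus, HTCondor, Globus) automates, at extreme scale, a pattern whose core is small: run each job as soon as its dependencies finish, in parallel where the DAG allows. A minimal in-process sketch, with job names merely evocative of CyberShake-like stages:

```python
# Not Pegasus/HTCondor; just the core pattern such systems automate.
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_dag(jobs, deps, worker, max_workers=4):
    """jobs: iterable of names; deps: {job: set of prerequisite jobs};
    worker: callable invoked once per ready job."""
    pending = {j: set(deps.get(j, ())) for j in jobs}
    done, running = set(), {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending or running:
            for j in [j for j, d in pending.items() if d <= done]:
                running[pool.submit(worker, j)] = j   # dependencies satisfied
                del pending[j]
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in finished:
                fut.result()                          # re-raise job failures
                done.add(running.pop(fut))

# Three-stage toy chain (illustrative names, not the real CyberShake codes).
run_dag(["sgt", "seismograms", "hazard_curves"],
        {"seismograms": {"sgt"}, "hazard_curves": {"seismograms"}},
        worker=lambda j: print("running", j))
```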
20. F4ESS – a framework for interdisciplinary data-driven earth system science
- Author
Doris Dransch, Daniel Eggert, and Mike Sips
- Subjects
earth system science, interdisciplinary research, scientific workflows, integration, Mathematical geography. Cartography, GA1-1776
- Abstract
Earth system science is an interdisciplinary effort to understand the fundamentals and interactions of environmental processes. Interdisciplinary research is challenging since it demands the integration of scientific schemes and practices from different research fields into a collaborative work environment. This paper introduces the framework F4ESS that supports this integration. F4ESS provides methods and technologies that facilitate the development of integrative work environments for Earth system science. F4ESS enables scientists a) to outline structured and summarized descriptions of scientific procedures to facilitate communication and synthesis, b) to combine a large variety of distributed data analysis software into seamless data analysis chains and workflows, c) to visually combine and interactively explore the manifold spatiotemporal data and results to support understanding and knowledge creation. The F4ESS methods and technologies are generic and can be applied in various scientific fields. We discuss F4ESS in the context of the interdisciplinary investigation of flood events.
- Published
- 2023
- Full Text
- View/download PDF
21. Evaluation of Machine Learning Techniques for Predicting Run Times of Scientific Workflow Jobs
- Author
Balis, Bartosz, Grabowski, Michal, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Wyrzykowski, Roman, editor, Dongarra, Jack, editor, Deelman, Ewa, editor, and Karczewski, Konrad, editor
- Published
- 2023
- Full Text
- View/download PDF
22. An Evaluation of the Correlation Between Task Characteristics and Input Data Size in Scientific Workflows
- Author
Sugimura, Taichi, Koita, Takahiro, Akan, Ozgur, Editorial Board Member, Bellavista, Paolo, Editorial Board Member, Cao, Jiannong, Editorial Board Member, Coulson, Geoffrey, Editorial Board Member, Dressler, Falko, Editorial Board Member, Ferrari, Domenico, Editorial Board Member, Gerla, Mario, Editorial Board Member, Kobayashi, Hisashi, Editorial Board Member, Palazzo, Sergio, Editorial Board Member, Sahni, Sartaj, Editorial Board Member, Shen, Xuemin, Editorial Board Member, Stan, Mircea, Editorial Board Member, Jia, Xiaohua, Editorial Board Member, Zomaya, Albert Y., Editorial Board Member, Miraz, Mahdi H., editor, Southall, Garfield, editor, Ali, Maaruf, editor, and Ware, Andrew, editor
- Published
- 2023
- Full Text
- View/download PDF
23. Programming Abstractions for Managing Workflows on Tiered Storage Systems
- Author
Ghoshal, Devarshi and Ramakrishnan, Lavanya
- Subjects
Data management, scientific workflows, multi-tiered storage, burst buffer, Data Format, Networking & Telecommunications
- Abstract
Scientific workflows in High Performance Computing (HPC) environments process large amounts of data. The storage hierarchy on HPC systems is getting deeper, driven by new technologies (NVRAMs, SSDs, etc.). There is a need for new programming abstractions that allow users to seamlessly manage data at the workflow level on multi-tiered storage systems, and provide optimal workflow performance and use of storage resources. In previous work, we introduced a software architecture, Managing Data on Tiered Storage for Scientific Workflows (MaDaTS), that used a Virtual Data Space (VDS) abstraction to hide the complexities of the underlying storage system while allowing users to control data management strategies. In this article, we detail the data-centric programming abstractions that allow users to manage a workflow around its data on the storage layer. The programming abstractions simplify data management for scientific workflows on multi-tiered storage systems, without affecting workflow performance or storage capacity. We measure the overheads introduced by the programming abstractions of MaDaTS and their effectiveness. Our results show that these abstractions can optimally use the storage capacity of lower-capacity storage tiers, and simplify data management without adding any performance overheads.
- Published
- 2021
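Loosely inspired by the Virtual Data Space idea in entry 23, the sketch below shows what "programming against logical data objects while a policy moves them between tiers" can look like. This is not the MaDaTS API; the tier names and mount points are invented.

```python
# NOT the MaDaTS API; an illustration of the abstraction style it argues for.
import shutil
from pathlib import Path

TIERS = {"burst_buffer": Path("/tmp/bb"), "scratch": Path("/tmp/scratch")}  # invented mounts

class VirtualDataSpace:
    def __init__(self):
        self._where = {}                      # logical name -> concrete path

    def put(self, name, src, tier="scratch"):
        dst = TIERS[tier] / name
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)                # ingest into the chosen tier
        self._where[name] = dst

    def stage(self, name, tier):
        """Move an object between tiers; callers keep using the logical
        name and never see the move (the data-management strategy hook)."""
        dst = TIERS[tier] / name
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(self._where[name]), str(dst))
        self._where[name] = dst

    def path(self, name):
        return self._where[name]              # resolve for a workflow task
```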
24. Emerging Frameworks for Advancing Scientific Workflows Research, Development, and Education
- Author
Casanova, Henri, Deelman, Ewa, Gesing, Sandra, Hildreth, Michael, Hudson, Stephen, Koch, William, Larson, Jeffrey, McDowell, Mary Ann, Meyers, Natalie, Navarro, John-Luke, Papadimitriou, George, Tanaka, Ryan, Taylor, Ian, Thain, Douglas, Wild, Stefan M, Filgueira, Rosa, and da Silva, Rafael Ferreira
- Subjects
Information and Computing Sciences, Human-Centred Computing, Scientific workflows, training and education, ensembles, python, concurrent computing, numerical optimization, simulation, Nvidia, GPU, monitoring
- Abstract
Lightning talks of the Workflows in Support of Large-Scale Science (WORKS) workshop are a venue where the workflow community (researchers, developers, and users) can discuss work in progress, emerging technologies and frameworks, and training and education materials. This paper summarizes the WORKS 2021 lightning talks, which cover four broad topics: (i) libEnsemble, a Python library to coordinate the concurrent evaluation of dynamic ensembles of calculations; (ii) Edu WRENCH, a set of online pedagogic modules that provides simulation-driven hands-on activity in the browser; (iii) VisDict, an envisioned visual dictionary framework that will translate terms, jargon, and concepts between research domains and workflow providers; and (iv) Pegasus Kickstart, a lightweight tool for capturing workflow tasks' performance, including performance metrics from Nvidia GPUs.
- Published
- 2021
25. An Approach to Implementing High-Performance Computing for Problem Solving in Workflow-Based Energy Infrastructure Resilience Studies.
- Author
Feoktistov, Alexander, Edelev, Alexei, Tchernykh, Andrei, Gorsky, Sergey, Basharina, Olga, and Fereferov, Evgeniy
- Subjects
ENERGY infrastructure, WORKFLOW management systems, PROBLEM solving
- Abstract
Implementing high-performance computing (HPC) to solve problems in energy infrastructure resilience research in a heterogeneous environment based on an in-memory data grid (IMDG) presents a challenge to workflow management systems. Large-scale energy infrastructure research needs multi-variant planning and tools to allocate and dispatch distributed computing resources that pool together to let applications share data, taking into account the subject domain specificity, resource characteristics, and quotas for resource use. To that end, we propose an approach to implement HPC-based resilience analysis using our Orlando Tools (OT) framework. To dynamically scale computing resources, we provide their integration with the relevant software, identifying key application parameters that can have a significant impact on the amount of data processed and the amount of resources required. We automate the startup of the IMDG cluster to execute workflows. To demonstrate the advantage of our solution, we apply it to evaluate the resilience of the existing energy infrastructure model. Compared to similar approaches, our solution allows us to investigate large infrastructures by modeling multiple simultaneous failures of different types of elements down to the number of network elements. In terms of task and resource utilization efficiency, we achieve almost linear speedup as the number of nodes of each resource increases. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
26. PPTS-PSO: a new hybrid scheduling algorithm for scientific workflow in cloud environment.
- Author
Talha, Adnane and Malki, Mohammed Ouçamah Cherkaoui
- Subjects
WORKFLOW management systems, VIRTUAL machine systems, ALGORITHMS, RESOURCE allocation, HEURISTIC algorithms, WORKFLOW
- Abstract
The use of complex scientific workflows in cloud computing environments, taking into account different interdependency criteria, is becoming a key objective for cloud service providers and for customers. This gives the task scheduling operation a higher priority in order to improve the quality of services. In this work, we introduce a novel hybrid PPTS-PSO algorithm based on two efficient algorithms, with the goal of improving the scheduling of the set of interdependent tasks that make up scientific workflows on the cloud-computing platform with the best execution time and cost while staying within the deadline and budget constraints. An intelligent variant of the PSO algorithm named neighborhood PSO and the heuristic PPTS algorithm are used. The suggested method can assign tasks in scientific workflows to the most appropriate cloud virtual machine; therefore, our strategy takes resource allocation into account too. The experimental results show that our solution outperforms various algorithms from the literature with a minimum of iterations. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
27. Graph neural networks for detecting anomalies in scientific workflows.
- Author
Jin, Hongwei, Raghavan, Krishnan, Papadimitriou, George, Wang, Cong, Mandal, Anirban, Kiran, Mariam, Deelman, Ewa, and Balaprakash, Prasanna
- Subjects
- *WORKFLOW, *DIRECTED acyclic graphs
- Abstract
Identifying and addressing anomalies in complex, distributed systems can be challenging for reliable execution of scientific workflows. We model these workflows as directed acyclic graphs (DAGs), where the nodes and edges of the DAGs represent jobs and their dependencies, respectively. We develop graph neural networks (GNNs) to learn patterns in the DAGs and to detect anomalies at the node (job) and graph (workflow) levels. We investigate workflow-specific GNN models that are trained on a particular workflow and workflow-agnostic GNN models that are trained across workflows. Our GNN models, which incorporate both individual job features and topological information from the workflow, show improved accuracy and efficiency compared to conventional learning methods for detecting anomalies. When jointly trained on multiple scientific workflows, our GNN models reached accuracies of more than 80% for workflow-level and 75% for job-level anomalies. In addition, we illustrate the importance of the hyperparameter tuning method used in our study, which can significantly improve the evaluation metrics of the GNN models. Finally, we integrate explainable GNN methods to provide insights on job features in the workflow that cause an anomaly. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
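A minimal node-level (job-level) anomaly classifier over a workflow DAG, in the spirit of entry 27 but not the authors' architecture, using PyTorch Geometric with toy data:

```python
# Toy data and architecture; requires: pip install torch torch-geometric
import torch
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

# Toy workflow: 4 jobs, edges = dependencies; 3 features per job
# (e.g., runtime, bytes read, bytes written); label 1 = anomalous.
edge_index = torch.tensor([[0, 0, 1, 2], [1, 2, 3, 3]], dtype=torch.long)
x = torch.randn(4, 3)
y = torch.tensor([0, 0, 1, 0])

class JobGNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(3, 16)
        self.conv2 = GCNConv(16, 2)     # 2 classes: normal / anomalous

    def forward(self, data):
        h = torch.relu(self.conv1(data.x, data.edge_index))
        return self.conv2(h, data.edge_index)

model, data = JobGNN(), Data(x=x, edge_index=edge_index, y=y)
opt = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(50):                     # tiny training loop
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(data), data.y)
    loss.backward()
    opt.step()
print(model(data).argmax(dim=1))        # per-job predictions
```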
28. Developing and reusing bioinformatics data analysis pipelines using scientific workflow systems
- Author
Marine Djaffardjy, George Marchment, Clémence Sebe, Raphaël Blanchet, Khalid Belhajjame, Alban Gaignard, Frédéric Lemoine, and Sarah Cohen-Boulakia
- Subjects
Scientific workflows, Bioinformatics, Reuse, Reproducibility, Biotechnology, TP248.13-248.65
- Abstract
Data analysis pipelines are now established as an effective means for specifying and executing bioinformatics data analysis and experiments. While scripting languages, particularly Python, R and notebooks, are popular and sufficient for developing small-scale pipelines that are often intended for a single user, it is now widely recognized that they are by no means enough to support the development of large-scale, shareable, maintainable and reusable pipelines capable of handling large volumes of data and running on high performance computing clusters. This review outlines the key requirements for building large-scale data pipelines and provides a mapping of existing solutions that fulfill them. We then highlight the benefits of using scientific workflow systems to get modular, reproducible and reusable bioinformatics data analysis pipelines. We finally discuss current workflow reuse practices based on an empirical study we performed on a large collection of workflows.
- Published
- 2023
- Full Text
- View/download PDF
29. An empirical study on the extension of bounds on completion time and resources for scientific workflows
- Author
Sirisha, D. and Prasad, S. Sambhu
- Published
- 2024
- Full Text
- View/download PDF
30. Towards a Software Development Framework for Interconnected Science Ecosystems
- Author
Thakur, Addi Malviya, Hitefield, Seth, McDonnell, Marshall, Wolf, Matthew, Archibald, Richard, Drane, Lance, Roccapriore, Kevin, Ziatdinov, Maxim, McGaha, Jesse, Smith, Robert, Hetrick, John, Abraham, Mark, Yakubov, Sergey, Watson, Greg, Chance, Ben, Nguyen, Clara, Baker, Matthew, Michael, Robert, Arenholz, Elke, Mintz, Ben, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Doug, Kothe, editor, Al, Geist, editor, Pophale, Swaroop, editor, Liu, Hong, editor, and Parete-Koon, Suzanne, editor
- Published
- 2022
- Full Text
- View/download PDF
31. ParslRNA-Seq: An Efficient and Scalable RNAseq Analysis Workflow for Studies of Differentiated Gene Expression
- Author
Ocaña, Kary, Cruz, Lucas, Coelho, Micaella, Terra, Rafael, Galheigo, Marcelo, Carneiro, Andre, Carvalho, Diego, Gadelha, Luiz, Boito, Francieli, Navaux, Philippe, Osthoff, Carla, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Navaux, Philippe, editor, Barrios H., Carlos J., editor, Osthoff, Carla, editor, and Guerrero, Ginés, editor
- Published
- 2022
- Full Text
- View/download PDF
32. An Energy-Efficient Load Balancing Approach for Fog Environment Using Scientific Workflow Applications
- Author
Kaur, Mandeep, Aron, Rajni, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Hirche, Sandra, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Möller, Sebastian, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Zhang, Junjie James, Series Editor, Majhi, Sudhan, editor, Prado, Rocío Pérez de, editor, and Dasanapura Nanjundaiah, Chandrappa, editor
- Published
- 2022
- Full Text
- View/download PDF
33. Auto-scaling of Scientific Workflows in Kubernetes
- Author
Baliś, Bartosz, Broński, Andrzej, Szarek, Mateusz, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Groen, Derek, editor, de Mulatier, Clélia, editor, Paszynski, Maciej, editor, Krzhizhanovskaya, Valeria V., editor, Dongarra, Jack J., editor, and Sloot, Peter M. A., editor
- Published
- 2022
- Full Text
- View/download PDF
34. IoS: A Needed Platform for Scientific Workflow Management
- Author
Takan, Savas, Gültekin, Visam, Allmer, Jens, Chen, Ming, editor, and Hofestädt, Ralf, editor
- Published
- 2022
- Full Text
- View/download PDF
35. A Deep Reinforcement Learning-Based Approach to the Scheduling of Multiple Workflows on Non-dedicated Edge Servers
- Author
Gao, Yongqiang, Feng, Ke, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Shen, Hong, editor, Sang, Yingpeng, editor, Zhang, Yong, editor, Xiao, Nong, editor, Arabnia, Hamid R., editor, Fox, Geoffrey, editor, Gupta, Ajay, editor, and Malek, Manu, editor
- Published
- 2022
- Full Text
- View/download PDF
36. Fog Clustering-based Architecture for Load Balancing in Scientific Workflows
- Author
Kaur, Mandeep, Aron, Rajni, Xhafa, Fatos, Series Editor, Chaki, Nabendu, editor, Devarakonda, Nagaraju, editor, Cortesi, Agostino, editor, and Seetha, Hari, editor
- Published
- 2022
- Full Text
- View/download PDF
37. A survey of provenance in scientific workflow.
- Author
Lin, Songhai, Xiao, Hong, Jiang, Wenchao, Li, Dafeng, Liang, Jiaben, and Li, Zelin
- Subjects
- *TECHNOLOGICAL innovations, *WORKFLOW, *SCIENTIFIC models, *BLOCKCHAINS, *DATA analysis, *AUTOMATION
- Abstract
The automation of data analysis in the form of scientific workflows has become a widely adopted practice in many fields of research. Data-intensive experiments using workflows enable automation and provenance support, which contribute to alleviating the reproducibility crisis. This paper investigates existing provenance models as well as scientific workflow applications. We not only summarize the models at different levels, but also compare the applications, particularly blockchain as applied to provenance in scientific workflows. After that, a new design for a secure provenance system is proposed. Provenance that would be enabled by emerging technology is also discussed at the end. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
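As a concrete anchor for the provenance models entry 37 surveys, the W3C PROV data model can be exercised directly with the Python prov package; the entities, activity, and agent below are invented.

```python
# Minimal W3C PROV record of one workflow step; requires: pip install prov
from prov.model import ProvDocument

doc = ProvDocument()
doc.add_namespace("ex", "http://example.org/wf#")   # hypothetical namespace

raw = doc.entity("ex:raw_data")
clean = doc.entity("ex:cleaned_data")
step = doc.activity("ex:cleaning_step")
user = doc.agent("ex:alice")

doc.used(step, raw)                      # the step consumed the raw data
doc.wasGeneratedBy(clean, step)          # ...and produced the cleaned set
doc.wasAssociatedWith(step, user)        # ...under alice's control
doc.wasDerivedFrom(clean, raw)

print(doc.get_provn())                   # serialize as PROV-N text
```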
38. Failure Prediction for Scientific Workflows Using Nature-Inspired Machine Learning Approach.
- Author
Sridevi, S. and Katiravan, Jeevaa
- Subjects
RELIABILITY in engineering, BIOLOGICALLY inspired computing, PREDICTION models, FORECASTING, MACHINE learning, WORKFLOW, SCALABILITY
- Abstract
Scientific workflows have gained growing attention in sophisticated, large-scale scientific problem-solving environments. The cloud's pay-per-use model, scalability, and dynamic deployment make it well suited for executing scientific workflow applications. Since the cloud is not a utopian environment, failures are inevitable and may result in fluctuations in the delivered performance. Although only a single task may fail in a workflow-based application, the reliability of the overall system can be affected drastically because of task dependencies. Hence, rather than reactive fault-tolerant approaches, proactive measures are vital in scientific workflows. This work combines a nature-inspired metaheuristic, the Intelligent Water Drops Algorithm (IWDA), with an efficient machine learning approach, Support Vector Regression (SVR), for task failure prognostication, which facilitates proactive fault tolerance in the scheduling of scientific workflow applications. The failure prediction models in this study are implemented through SVR-based machine learning approaches, the prediction accuracy is optimized by IWDA, and several performance metrics are evaluated on various benchmark workflows. The experimental results prove that the proposed proactive fault-tolerant approach performs better compared with the other existing techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
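The SVR half of entry 38's approach is sketched below on synthetic data; a plain grid search stands in for the IWD metaheuristic that the paper uses to tune the predictor, and the features are invented.

```python
# Synthetic, illustrative failure-risk regression with SVR.
# Requires: pip install scikit-learn numpy
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.random((300, 4))    # invented task metrics: load, memory, retries, queue time
y = X @ np.array([0.5, 1.2, 2.0, 0.3]) + rng.normal(0, 0.1, 300)  # failure-risk score

svr = make_pipeline(StandardScaler(), SVR())
# Grid search here only stands in for the paper's IWDA-based tuning.
search = GridSearchCV(svr, {"svr__C": [0.1, 1, 10], "svr__gamma": ["scale", 0.1]}, cv=3)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```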
39. Neural simulation pipeline: Enabling container-based simulations on-premise and in public clouds.
- Author
Chlasta, Karol, Sochaczewski, Paweł, Wójcik, Grzegorz M., and Krejtz, Izabela
- Subjects
HIGH performance computing, RECOGNITION (Psychology), LARGE-scale brain networks, COMPUTATIONAL neuroscience, WEB services
- Abstract
In this study, we explore the simulation setup in computational neuroscience. We use GENESIS, a general purpose simulation engine for sub-cellular components and biochemical reactions, realistic neuron models, large neural networks, and system-level models. GENESIS supports developing and running computer simulations but leaves a gap for setting up today's larger and more complex models. The field of realistic models of brain networks has outgrown the simplicity of the earliest models. The challenges include managing the complexity of software dependencies and various models, setting up model parameter values, storing the input parameters alongside the results, and providing execution statistics. Moreover, in the high performance computing (HPC) context, public cloud resources are becoming an alternative to the expensive on-premises clusters. We present the Neural Simulation Pipeline (NSP), which facilitates large-scale computer simulations and their deployment to multiple computing infrastructures using the infrastructure as code (IaC) containerization approach. The authors demonstrate the effectiveness of NSP in a pattern recognition task programmed with GENESIS, through a custom-built visual system, called RetNet(8×5,1), that uses biologically plausible Hodgkin-Huxley spiking neurons. We evaluate the pipeline by performing 54 simulations executed on-premise, at the Hasso Plattner Institute's (HPI) Future Service-Oriented Computing (SOC) Lab, and through Amazon Web Services (AWS), the biggest public cloud service provider in the world. We report on the non-containerized and containerized execution with Docker, as well as present the cost per simulation in AWS. The results show that our neural simulation pipeline can reduce entry barriers to neural simulations, making them more practical and cost-effective. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
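The containerized execution pattern entries 39 and 48 describe reduces to launching one container per parameter set and keeping inputs, outputs, and timing together. The image name and parameter flags below are placeholders, not the actual NSP or GENESIS invocation.

```python
# Placeholder image and parameters; not the real NSP or GENESIS commands.
import subprocess
import time

def run_simulation(image, params):
    cmd = ["docker", "run", "--rm", image] + [f"--{k}={v}" for k, v in params.items()]
    start = time.perf_counter()
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "params": params,                        # keep inputs next to results
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "seconds": time.perf_counter() - start,  # basic execution statistics
    }

# Hypothetical parameter sweep, one container per run.
results = [run_simulation("example/genesis-sim:latest", {"neurons": n})
           for n in (100, 200)]
print([r["returncode"] for r in results])
```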
40. EFFECTIVE STRATEGY TO ENHANCE INTERMEDIATE DATA CONFIDENTIALITY IN SCIENTIFIC WORKFLOWS ON DISTRIBUTED MULTI-TENANT CLOUD INFRASTRUCTURE.
- Author
BABU, S. BHARATH and JOTHI, K. R.
- Subjects
- *COMMUNICATION infrastructure, *WORKFLOW, *VIRTUAL machine systems, *CONFIDENTIAL communications, *DATA security failures, *DATA warehousing, *BLOCKCHAINS
- Abstract
A scientific workflow consists of specific data transitions and evaluations, integrated as a standard process based on the data correlations in the corresponding repositories. To complete a scientific computing task, a massive amount of transitional (intermediate) data and sub-processes must be constructed, each of which may be produced by separate Virtual Machines (VMs) in the cloud. Complex scientific computations can be automated by using scientific workflows. Confidentiality demands that even if an intruder amasses the intermediate data, its secret content cannot be accessed. Thus, this research proposes a novel method for storing and preserving the Intermediate Data (ID), based on a multi-tenant architecture, that ensures data confidentiality and overall operational effectiveness. The core work is focused on ensuring confidentiality through the incorporation of blockchain techniques. The suggested work features mechanisms for data secrecy, authentication (verification), data preservation, and information sharing in managing confidential data. We use smart contracts and the principles of blockchain to provide safe data distribution and storage, preventing data breaches in which data is shared without the owner's consent. A Decentralized Confidentiality Scheme (DCS) is proposed to secure intermediate data on distributed multi-tenant cloud infrastructure. In DCS, the smart contract is duplicated across several nodes in the blockchain network, providing integrity, confidentiality, and longevity. The Infection Monkey simulation platform is used for the experiments. The simulated findings show that the suggested procedure is helpful in enhancing intermediate data confidentiality in scientific workflows, ensuring above 98.13% protection against potential breaches. [ABSTRACT FROM AUTHOR]
- Published
- 2023
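A simplified take on the confidentiality idea in entry 40, not the paper's DCS protocol: encrypt the intermediate data itself and expose only a digest for integrity anchoring (e.g., on a ledger).

```python
# Simplified illustration only; requires: pip install cryptography
import hashlib
from cryptography.fernet import Fernet

key = Fernet.generate_key()                       # held only by the data owner
f = Fernet(key)

intermediate = b"stage-2 output of the workflow"  # invented payload
ciphertext = f.encrypt(intermediate)              # what the shared store holds
digest = hashlib.sha256(ciphertext).hexdigest()   # what a ledger would record

# Later: any party can verify integrity; only key holders can read content.
assert hashlib.sha256(ciphertext).hexdigest() == digest
print(f.decrypt(ciphertext))
```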
41. F4ESS – a framework for interdisciplinary data-driven earth system science.
- Author
Dransch, Doris, Eggert, Daniel, and Sips, Mike
- Subjects
- *EARTH system science, *DATA analysis, *INTERDISCIPLINARY research, *FACILITATED communication
- Abstract
Earth system science is an interdisciplinary effort to understand the fundamentals and interactions of environmental processes. Interdisciplinary research is challenging since it demands the integration of scientific schemes and practices from different research fields into a collaborative work environment. This paper introduces the framework F4ESS that supports this integration. F4ESS provides methods and technologies that facilitate the development of integrative work environments for Earth system science. F4ESS enables scientists a) to outline structured and summarized descriptions of scientific procedures to facilitate communication and synthesis, b) to combine a large variety of distributed data analysis software into seamless data analysis chains and workflows, c) to visually combine and interactively explore the manifold spatiotemporal data and results to support understanding and knowledge creation. The F4ESS methods and technologies are generic and can be applied in various scientific fields. We discuss F4ESS in the context of the interdisciplinary investigation of flood events. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
42. An Approach to Implementing High-Performance Computing for Problem Solving in Workflow-Based Energy Infrastructure Resilience Studies
- Author
Alexander Feoktistov, Alexei Edelev, Andrei Tchernykh, Sergey Gorsky, Olga Basharina, and Evgeniy Fereferov
- Subjects
energy systems, resilience, vulnerability, HPC, IMDG, scientific workflows, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Implementing high-performance computing (HPC) to solve problems in energy infrastructure resilience research in a heterogeneous environment based on an in-memory data grid (IMDG) presents a challenge to workflow management systems. Large-scale energy infrastructure research needs multi-variant planning and tools to allocate and dispatch distributed computing resources that pool together to let applications share data, taking into account the subject domain specificity, resource characteristics, and quotas for resource use. To that end, we propose an approach to implement HPC-based resilience analysis using our Orlando Tools (OT) framework. To dynamically scale computing resources, we provide their integration with the relevant software, identifying key application parameters that can have a significant impact on the amount of data processed and the amount of resources required. We automate the startup of the IMDG cluster to execute workflows. To demonstrate the advantage of our solution, we apply it to evaluate the resilience of the existing energy infrastructure model. Compared to similar approaches, our solution allows us to investigate large infrastructures by modeling multiple simultaneous failures of different types of elements down to the number of network elements. In terms of task and resource utilization efficiency, we achieve almost linear speedup as the number of nodes of each resource increases.
- Published
- 2023
- Full Text
- View/download PDF
43. Evaluating Energy-Aware Scheduling Algorithms for I/O-Intensive Scientific Workflows
- Author
Coleman, Tainã, Casanova, Henri, Gwartney, Ty, da Silva, Rafael Ferreira, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Paszynski, Maciej, editor, Kranzlmüller, Dieter, editor, Krzhizhanovskaya, Valeria V., editor, Dongarra, Jack J., editor, and Sloot, Peter M. A., editor
- Published
- 2021
- Full Text
- View/download PDF
44. Improving Existing WMS for Reduced Makespan of Workflows with Lambda
- Author
Al-Haboobi, Ali, Kecskemeti, Gabor, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Balis, Bartosz, editor, B. Heras, Dora, editor, Antonelli, Laura, editor, Bracciali, Andrea, editor, Gruber, Thomas, editor, Hyun-Wook, Jin, editor, Kuhn, Michael, editor, Scott, Stephen L., editor, Unat, Didem, editor, and Wyrzykowski, Roman, editor
- Published
- 2021
- Full Text
- View/download PDF
45. Multi-Swarm PSO Algorithm for Static Workflow Scheduling in Cloud-Fog Environments
- Author
Dineshan Subramoney and Clement N. Nyirenda
- Subjects
Scientific workflows, cloud computing, fog computing, particle swarm optimization, evolutionary algorithms, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Scientific workflow scheduling involves the allocation of workflow tasks to particular computational resources. The generation of optimal solutions to reduce run-time, cost, and energy consumption, as well as ensuring proper load balancing, remains a major challenge. Therefore, this work presents a Multi-Swarm Particle Swarm Optimization (MS-PSO) algorithm to improve the scheduling of scientific workflows in cloud-fog environments. MS-PSO seeks to address the canonical PSO’s problem of premature convergence, which leads it to suboptimal solutions. In MS-PSO, particles are divided into several swarms, with each swarm having its own cognitive and social learning coefficients. This work also develops a weighted-sum objective function for the workflow scheduling problem, based on four objectives: makespan, cost, energy, and load balancing for cloud and fog tiers. The FogWorkflowSim Toolkit is used in the evaluation process, with the objectives serving as performance metrics. The MS-PSO approach is compared with the canonical PSO, Genetic Algorithm (GA), Differential Evolution (DE), and GA-PSO. The following scientific workflows are used in the simulations: Montage, CyberShake, Epigenomics, LIGO, and SIPHT. MS-PSO outperforms the canonical PSO on all scientific workflows and under all performance metrics. It competes fairly well against the other approaches and is more stable and reliable; it only ranks second to PSO in terms of execution time. In future work, multiple species, incorporating population update mechanisms from several algorithmic frameworks (MS-PSO, DE, GA), will be used for scientific workflow scheduling. Hybridization of the realized algorithm with dynamic approaches will also be investigated.
- Published
- 2022
- Full Text
- View/download PDF
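A bare-bones multi-swarm PSO echoing entry 45's central idea, several swarms with their own cognitive and social coefficients, each tracking a swarm-local best (the paper's exact update rules and parameters may differ):

```python
# Minimal multi-swarm PSO sketch; coefficients and inertia are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def ms_pso(objective, dim=5, swarms=3, per_swarm=8, iters=100):
    coeffs = [(1.5, 2.5), (2.0, 2.0), (2.5, 1.5)]    # (cognitive, social) per swarm
    x = rng.random((swarms, per_swarm, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_f = np.apply_along_axis(objective, 2, x)
    for _ in range(iters):
        for s in range(swarms):
            c1, c2 = coeffs[s % len(coeffs)]
            gbest = pbest[s][np.argmin(pbest_f[s])]  # swarm-local best
            r1, r2 = rng.random((2, per_swarm, dim))
            v[s] = 0.7 * v[s] + c1 * r1 * (pbest[s] - x[s]) + c2 * r2 * (gbest - x[s])
            x[s] += v[s]
            f = np.apply_along_axis(objective, 1, x[s])
            improved = f < pbest_f[s]
            pbest[s][improved], pbest_f[s][improved] = x[s][improved], f[improved]
    best = np.unravel_index(np.argmin(pbest_f), pbest_f.shape)
    return pbest[best], pbest_f[best]

# Stand-in objective (e.g., a weighted sum of makespan/cost/energy/load).
sol, val = ms_pso(lambda p: np.sum(p ** 2))
print(val)
```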
46. Fault Tolerant and Data Oriented Scientific Workflows Management and Scheduling System in Cloud Computing
- Author
Zulfiqar Ahmad, Ali Imran Jehangiri, Nader Mohamed, Mohamed Othman, and Arif Iqbal Umar
- Subjects
Cloud computing, scientific workflows, scheduling, load management, CyberShake, Montage, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Cloud computing is a virtualized, scalable, ubiquitous, and distributed computing paradigm that provides resources and services dynamically in a subscription-based environment. Cloud computing provides services through Cloud Service Providers (CSPs). Cloud computing is mainly used for delivering solutions to a large number of business and scientific applications. Large-scale scientific applications are evaluated through cloud computing in the form of scientific workflows. Scientific workflows are data-intensive applications, and a single scientific workflow may be comprised of thousands of tasks. Deadline constraints, task failures, budget constraints, and improper organization and management of tasks can cause inconvenience in executing scientific workflows. Therefore, we propose a fault-tolerant and data-oriented scientific workflow management and scheduling system (FD-SWMS) in cloud computing. The proposed strategy applies a multi-criteria-based approach to schedule and manage the tasks of scientific workflows. The proposed strategy considers the special characteristics of tasks in scientific workflows, i.e., the tasks may be executed simultaneously in parallel, pipelined, aggregated to form a single task, or distributed to create multiple tasks. The proposed strategy schedules the tasks based on data-intensiveness, provides fault tolerance through a cluster-based approach, and achieves energy efficiency through a load-sharing mechanism. To assess the effectiveness of the proposed strategy, simulations are carried out in WorkflowSim for Montage and CyberShake workflows. The proposed FD-SWMS strategy performs better than the existing state-of-the-art strategies. The proposed strategy on average reduced execution time by 25%, 17%, 22%, and 16%, minimized the execution cost by 24%, 17%, 21%, and 16%, and decreased the energy consumption by 21%, 17%, 20%, and 16%, compared with the existing QFWMS, EDS-DC, CFD, and BDCWS strategies, respectively, for the Montage scientific workflow. Similarly, the proposed strategy on average reduced execution time by 48%, 17%, 25%, and 42%, minimized the execution cost by 45%, 11%, 16%, and 38%, and decreased the energy consumption by 27%, 25%, 32%, and 20%, compared with the existing QFWMS, EDS-DC, CFD, and BDCWS strategies, respectively, for the CyberShake scientific workflow.
- Published
- 2022
- Full Text
- View/download PDF
47. A Unified Mechanism for Cloud Scheduling of Scientific Workflows
- Author
-
Ali Kamran, Umar Farooq, Ihsan Rabbi, Kashif Zia, Muhammad Assam, Hadeel Alsolai, and Fahd N. Al-Wesabi
- Subjects
Cloud computing ,scheduling ,scientific workflows ,WorkFlowSim ,CloudSim ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Scheduling plays a vital role in the efficient utilization of available resources in clouds. This paper investigates the capabilities of the current scheduling algorithms of the WorkFlowSim framework for processing scientific workflows, using four workload sizes for each of five well-known workflows. The investigation revealed that no single existing algorithm can efficiently execute all four workload sizes across the complete set of workflows; instead, different algorithms performed best on different workloads of a particular workflow. This observation motivated an improved unified mechanism that, for a given workload, selects an existing algorithm that has performed well on it in the past. Evaluation results showed that the proposed mechanism improved over the existing algorithms in terms of simulation time for 4 out of 5 workflows (Epigenomics, Inspiral, CyberShake, and Montage) when tested against an aggregated load of all sizes; for the SIPHT workflow, it performed exactly the same as the Max-Min algorithm. The minimum and maximum improvements in percent, against the existing best and worst algorithms, were 16–63 for Epigenomics, 30–68 for Inspiral, 0–69 for SIPHT, 30–68 for CyberShake, and 9–71 for Montage. The approach carries an additional overhead: a dedicated module to measure and store algorithmic performance. This profiling is required only once, however, so the increase in execution time should be marginal. Future work will examine the impact of compute time on optimization parameters such as makespan, pricing, and deadlines.
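The unified mechanism boils down to a performance memory plus a dispatcher. The sketch below records results per (workflow, workload size) and routes future loads to the historical winner; the JSON store, file name, and function names are illustrative assumptions, not the paper's implementation.

```python
import json

PERF_DB = "algo_performance.json"   # hypothetical performance store

def record_run(workflow, size, algorithm, sim_time, db_path=PERF_DB):
    # Keep only the best-performing algorithm seen so far for this key.
    try:
        with open(db_path) as f:
            db = json.load(f)
    except FileNotFoundError:
        db = {}
    key = f"{workflow}:{size}"
    best = db.get(key)
    if best is None or sim_time < best["sim_time"]:
        db[key] = {"algorithm": algorithm, "sim_time": sim_time}
    with open(db_path, "w") as f:
        json.dump(db, f, indent=2)

def pick_algorithm(workflow, size, default="Max-Min", db_path=PERF_DB):
    # Dispatch to the historical winner; fall back to a default when no
    # history exists for this (workflow, size) pair.
    try:
        with open(db_path) as f:
            db = json.load(f)
    except FileNotFoundError:
        return default
    entry = db.get(f"{workflow}:{size}")
    return entry["algorithm"] if entry else default

# Example: after benchmarking, route a CyberShake run of 100 tasks to
# whichever algorithm won that configuration in the past.
record_run("CyberShake", 100, "HEFT", sim_time=42.0)
print(pick_algorithm("CyberShake", 100))   # -> "HEFT"
```

The one-off profiling cost the abstract mentions corresponds to populating this store once per (workflow, size) pair before the dispatcher takes over.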
- Published
- 2022
- Full Text
- View/download PDF
48. Neural simulation pipeline: Enabling container-based simulations on-premise and in public clouds
- Author
-
Karol Chlasta, Paweł Sochaczewski, Grzegorz M. Wójcik, and Izabela Krejtz
- Subjects
GENESIS ,scientific workflows ,Docker ,computer simulations ,liquid state machine (LSM) ,Neurosciences. Biological psychiatry. Neuropsychiatry ,RC321-571 - Abstract
In this study, we explore the simulation setup in computational neuroscience. We use GENESIS, a general-purpose simulation engine for sub-cellular components and biochemical reactions, realistic neuron models, large neural networks, and system-level models. GENESIS supports developing and running computer simulations but leaves a gap when it comes to setting up today's larger and more complex models; the field of realistic brain network modeling has outgrown the simplicity of the earliest models. The challenges include managing the complexity of software dependencies and the variety of models, setting up model parameter values, storing the input parameters alongside the results, and providing execution statistics. Moreover, in the high-performance computing (HPC) context, public cloud resources are becoming an alternative to expensive on-premises clusters. We present the Neural Simulation Pipeline (NSP), which facilitates large-scale computer simulations and their deployment to multiple computing infrastructures using an infrastructure-as-code (IaC) containerization approach. We demonstrate the effectiveness of NSP in a pattern recognition task programmed with GENESIS, through a custom-built visual system, called RetNet(8 × 5,1), that uses biologically plausible Hodgkin–Huxley spiking neurons. We evaluate the pipeline by performing 54 simulations executed on-premises, at the Hasso Plattner Institute's (HPI) Future Service-Oriented Computing (SOC) Lab, and through Amazon Web Services (AWS), the biggest public cloud service provider in the world. We report on non-containerized and containerized execution with Docker, and present the cost per simulation in AWS. The results show that our neural simulation pipeline can reduce entry barriers to neural simulations, making them more practical and cost-effective.
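A minimal sketch of the containerized execution step might look like the following: it launches one simulation via `docker run` and archives the input parameters alongside the results, addressing the provenance gap the abstract describes. The image name, mount layout, and GENESIS entry point are hypothetical; only standard Docker CLI flags are assumed.

```python
import datetime
import json
import pathlib
import subprocess

def run_simulation(params, image="genesis-sim:latest", out_root="results"):
    # One timestamped directory per run keeps inputs, outputs, and logs together.
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    run_dir = pathlib.Path(out_root) / stamp
    run_dir.mkdir(parents=True, exist_ok=True)

    # Store the exact input parameters next to the results for provenance.
    (run_dir / "params.json").write_text(json.dumps(params, indent=2))

    cmd = [
        "docker", "run", "--rm",
        "-v", f"{run_dir.resolve()}:/data",   # mount the run directory
        image,
        "genesis", "-nox", "/data/model.g",   # hypothetical entry point
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)

    # Keep execution logs with the run for later inspection and statistics.
    (run_dir / "stdout.log").write_text(result.stdout)
    (run_dir / "stderr.log").write_text(result.stderr)
    return run_dir, result.returncode
```

Because the same image runs unchanged on a local Docker host or a cloud VM, this is the property that lets a pipeline like NSP target both on-premises labs and AWS.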
- Published
- 2023
- Full Text
- View/download PDF
49. K-span: Open and reproducible spatial analytics using scientific workflows
- Author
-
Abdur Forkan, Alan Both, Chris Bellman, Matt Duckham, Hamish Anderson, and Nenad Radosevic
- Subjects
scientific workflows ,KNIME ,reproducibility ,open source ,geospatial analysis ,Digital Elevation Model ,Science - Abstract
This paper describes the design, development, and testing of a general-purpose scientific-workflows tool for spatial analytics. Spatial analytics processes are frequently complex, both conceptually and computationally. Adaptation, documentation, and reproduction of bespoke spatial analytics procedures represent a growing challenge, particularly in this era of big spatial data. Scientific workflow systems hold the promise of increased openness and transparency along with improved automation of spatial analytics processes. In this work, we built a KNIME spatial analytics ("K-span") software tool, an extension to the general-purpose open-source KNIME scientific workflow platform. The tool augments KNIME with new spatial analytics nodes by linking to and integrating a range of existing open-source spatial software and libraries. The implementation of the K-span system is demonstrated and evaluated with a case study associated with the original process of constructing the Australian national DEM (Digital Elevation Model) in the Greater Brisbane area of Queensland, Australia, by Geoscience Australia (GA). The outcome of translating an example spatial analytics process into an open, transparent, documented, automated, and reproducible scientific workflow highlights the benefits of our system and our general approach. These benefits may help increase users' assurance and confidence in spatial data products and their understanding of the provenance of foundational spatial data sets across diverse uses and user groups.
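In the same spirit as a K-span node, an existing open-source spatial tool can be wrapped as a small, documented, reproducible workflow step. The sketch below wraps GDAL's `gdal_translate` CLI; the step interface and the example coordinates are assumptions for illustration, not the K-span API.

```python
import subprocess

def clip_raster(src, dst, bbox):
    """Clip a raster to a bounding box given as (ulx, uly, lrx, lry)."""
    ulx, uly, lrx, lry = bbox
    cmd = ["gdal_translate", "-projwin",
           str(ulx), str(uly), str(lrx), str(lry), src, dst]
    # check=True fails loudly, so a broken step cannot silently corrupt
    # a downstream product, which keeps the workflow reproducible.
    subprocess.run(cmd, check=True)
    return dst

# Example: clip a DEM tile to a Greater Brisbane extent
# (coordinates illustrative only).
# clip_raster("dem.tif", "dem_brisbane.tif", (152.6, -27.0, 153.5, -27.9))
```

Wrapping each tool invocation in a named, parameterized step is what turns an ad hoc spatial procedure into something a workflow platform can document, rerun, and trace.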
- Published
- 2023
- Full Text
- View/download PDF
50. A Provenance-based Execution Strategy for Variant GPU-accelerated Scientific Workflows in Clouds.
- Author
-
Stockinger, Murilo B., Guerine, Marcos A., de Paula, Ubiratam, Santiago, Filipe, Frota, Yuri, Rosseti, Isabel, Plastino, Alexandre, and de Oliveira, Daniel
- Abstract
Several complex scientific simulations process large amounts of distributed and heterogeneous data. These simulations are commonly modeled as scientific workflows and require High Performance Computing (HPC) environments to produce results in a timely manner. Although scientists already benefit from clusters and clouds, new hardware, such as General-Purpose Graphics Processing Units (GPGPUs), can be used to speed up workflow execution. Clouds also provide virtual machines (VMs) with GPU capabilities, yielding hybrid clouds, so many workflows can be modeled with programs that execute on GPUs, CPUs, or both. A problem that arises is how to schedule workflows with variant activities (those that can execute on CPU, GPU, or both) in such a hybrid environment. Although existing workflow management systems (WfMSs) can execute on GPGPUs and in clouds independently, they do not provide mechanisms for scheduling workflows with variant activities in this hybrid setting, and reducing both the makespan and the financial cost of variant workflows in hybrid clouds is a difficult task. In this article, we present a scheduling strategy for variant GPU-accelerated workflows in clouds, named PROFOUND, which schedules activations (atomic tasks) to a set of CPU and GPU/CPU VMs based on provenance data (historical data). PROFOUND combines a mathematical formulation with a heuristic, and aims at minimizing not only the makespan but also the financial cost of the execution. To evaluate PROFOUND, we used a set of benchmark instances based on synthetic and real scenarios gathered from different workflow traces. The experiments show that PROFOUND is able to solve the referred scheduling problem. [ABSTRACT FROM AUTHOR]
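PROFOUND itself combines a mathematical formulation with a heuristic; the following simplified greedy sketch only illustrates the provenance-driven idea, picking a CPU or GPU VM per activation from historical runtimes while trading makespan against cost. All runtimes, prices, and the `alpha` weight are hypothetical.

```python
# Provenance: observed runtime in seconds per (activation, VM type).
# Note that not every activation gains from the GPU, which is exactly
# why variant scheduling is non-trivial.
HIST_RUNTIME = {
    ("align", "cpu"): 120.0, ("align", "gpu"): 30.0,
    ("stats", "cpu"): 40.0,  ("stats", "gpu"): 45.0,
}
PRICE_PER_SEC = {"cpu": 0.0001, "gpu": 0.0009}  # hypothetical VM prices

def place(tasks, alpha=0.5):
    """Greedily pick CPU or GPU per activation, trading makespan vs. cost.

    alpha weights the makespan term against the monetary term; it is an
    illustrative knob, not a parameter from the paper.
    """
    finish = {"cpu": 0.0, "gpu": 0.0}   # accumulated busy time per VM
    plan = {}
    for task in tasks:
        best_vm, best_score = None, float("inf")
        for vm in ("cpu", "gpu"):
            rt = HIST_RUNTIME[(task, vm)]
            # Estimated completion time on this VM plus weighted cost.
            score = alpha * (finish[vm] + rt) + (1 - alpha) * rt * PRICE_PER_SEC[vm]
            if score < best_score:
                best_vm, best_score = vm, score
        plan[task] = best_vm
        finish[best_vm] += HIST_RUNTIME[(task, best_vm)]
    return plan

print(place(["align", "stats"]))   # -> {'align': 'gpu', 'stats': 'cpu'}
```

The historical runtimes play the role of the provenance data: without them, a scheduler cannot know in advance that "stats" gains nothing from the GPU while "align" does.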
- Published
- 2022
- Full Text
- View/download PDF