56 results on "Sandro, Fiore"
Search Results
2. An EOSC-enabled Data Space environment for the climate community
- Author
-
Fabrizio Antonio, Donatello Elia, Guillaume Levavasseur, Atef Ben Nasser, Paola Nassisi, Alessandro D'Anca, Alessandra Nuzzo, Sandro Fiore, Sylvie Joussaume, and Giovanni Aloisio
- Abstract
The exponential increase in data volumes and complexities is causing a radical change in the scientific discovery process in several domains, including climate science. This affects the different stages of the data lifecycle, thus posing significant data management challenges in terms of data archiving, access, analysis, visualization, and sharing. The data space concept can support scientists' workflows and simplify the process towards a more FAIR use of data. In the context of the European Open Science Cloud (EOSC) initiative launched by the European Commission, the ENES Data Space (EDS) represents a domain-specific implementation of the data space concept. The service, developed in the frame of the EGI-ACE project, aims to provide an open, scalable, cloud-enabled data science environment for climate data analysis on top of the EOSC Compute Platform. It is accessible in EOSC through the EOSC Catalogue and Marketplace (https://marketplace.eosc-portal.eu/services/enes-data-space), and it also provides a web portal (https://enesdataspace.vm.fedcloud.eu) including information, tutorials, and training materials on how to get started with its main features. The EDS integrates into a single environment ready-to-use climate datasets, compute resources, and tools, all made available through the Jupyter interface, with the aim of supporting the overall scientific data processing workflow. Specifically, the data store linked to the ENES Data Space provides access to a multi-terabyte set of variable-centric collections from large-scale global climate experiments. The data pool consists of a mirrored subset of CMIP (Coupled Model Intercomparison Project) datasets from the ESGF (Earth System Grid Federation) federated data archive, collected and kept synchronized with the remote copies by using the Synda tool developed within the scope of the IS-ENES3 H2020 project.
Community-based, open source frameworks (e.g., Ophidia) and libraries from the Python ecosystem provide the capabilities for data access, analysis, and visualization. Results and experiment definitions (i.e., Jupyter Notebooks) can be easily shared among users, promoting data sharing and application re-use towards a more Open Science approach. An overview of the data space capabilities, along with the key aspects in terms of data management, will be presented in this work.
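A minimal sketch of the kind of notebook analysis the EDS environment supports, using a small synthetic stand-in for a CMIP variable-centric collection (the variable name `tas`, grid, and values are illustrative assumptions, not EDS data):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a CMIP "tas" (near-surface air temperature) collection:
# 24 monthly steps on a coarse 10x20 lat-lon grid.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
lat = np.linspace(-85, 85, 10)
lon = np.linspace(0, 350, 20)
rng = np.random.default_rng(0)
tas = xr.DataArray(
    280 + 10 * rng.standard_normal((24, 10, 20)),
    coords={"time": time, "lat": lat, "lon": lon},
    name="tas",
    attrs={"units": "K"},
)

# Typical first steps in a data-space notebook: a monthly climatology
# and an (unweighted) global-mean time series.
climatology = tas.groupby("time.month").mean("time")
global_mean = tas.mean(["lat", "lon"])
print(climatology.sizes)
print(global_mean.sizes)
```

On the actual service, `tas` would instead come from `xr.open_dataset(...)` on a file in the mirrored CMIP data pool; the analysis lines are unchanged.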
- Published
- 2023
3. Tracking and reporting peta-scale data exploitation within the Earth System Grid Federation through the ESGF Data Statistics service
- Author
-
Alessandra Nuzzo, Fabrizio Antonio, Maria Mirto, Paola Nassisi, Sandro Fiore, and Giovanni Aloisio
- Abstract
The Earth System Grid Federation (ESGF) is an international collaboration powering most global climate change research and managing the first-ever decentralized repository for handling climate science data, with multiple petabytes of data at dozens of federated sites worldwide. It is recognized as the leading infrastructure for the management of and access to large distributed data volumes for climate change research, and it supports the Coupled Model Intercomparison Project (CMIP) and the Coordinated Regional Climate Downscaling Experiment (CORDEX), whose protocols enable the periodic assessments carried out by the Intergovernmental Panel on Climate Change (IPCC). As a trusted international repository, ESGF hosts and replicates data from a broad range of domains and communities in the Earth sciences, thus strongly supporting standards for connecting data and the application of the FAIR data principles to ensure free and open access and interoperability with other similar systems in the Earth sciences. ESGF includes a specific software component, funded by the H2020 projects IS-ENES2 and IS-ENES3, named ESGF Data Statistics, which collects, analyzes, and visualizes data usage metrics and data archive information across the federation. It provides a distributed and scalable software infrastructure responsible for capturing a set of metrics at both the single-site and federation levels. It collects and stores a high volume of heterogeneous metrics, covering coarse- and fine-grained measures such as download and client statistics, as well as aggregated cross-project and project-specific download statistics, thus offering a more user-oriented perspective on the scientific experiments. This provides strong feedback on how much, how frequently, and how intensively the whole federation is exploited by end-users, as well as on the most downloaded data, which captures the level of community interest in specific data.
It also gives feedback on the less accessed data, which on the one hand can help in designing larger-scale experiments in the future and on the other can offer insights into the long tail of research. On top of this, a view of the total amount of data published and available through ESGF offers users the possibility to monitor the status of the data archive of the entire federation. This contribution presents an overview of the Data Statistics capabilities as well as the main results in terms of data analysis and visualization.
- Published
- 2023
4. An EOSC-enabled Data Space Environment for Climate Science
- Author
-
Donatello Elia, Sandro Fiore, Fabrizio Antonio, Guillaume Levavasseur, Paola Nassisi, Alessandro D'Anca, Sylvie Joussaume, and Giovanni Aloisio
- Subjects
Open Science, Data Space, Climate Science - Abstract
In the context of the European Open Science Cloud, the ENES Data Space represents a domain-specific implementation of the data space concept: a digital ecosystem supporting scientific communities towards a more sustainable, effective, and FAIR use of data. This ecosystem has recently been opened to climate users, bringing datasets, tools, and services together into a single environment with ready-to-use data and programmatic capabilities for the development of data science applications. Presently, the data store of the ENES Data Space provides access to model output from large-scale global experiments for climate model intercomparison. The storage and computational resources for the execution of the data space are provided by the EGI Federated Cloud e-Infrastructure. From a science gateway perspective, the ENES Data Space provides an interactive web-based environment based on the Jupyter project and available through the EOSC Marketplace.
- Published
- 2023
- Full Text
- View/download PDF
5. A multi-model architecture based on Long Short-Term Memory neural networks for multi-step sea level forecasting
- Author
-
Giovanni Aloisio, Gabriele Accarino, Ivan Federico, Sandro Fiore, Giovanni Coppini, Marco Chiarelli, and Salvatore Causio
- Subjects
Meteorology, Artificial neural network, Computer Networks and Communications, Computer science, Climate change, Storm surge, Mediterranean Sea, Hardware and Architecture, Climate change scenario, Architecture, Coastal flood, Software, Sea level - Abstract
The intensification of extreme events, storm surges, and coastal flooding in a climate change scenario increasingly influences human processes, especially in coastal areas where sea-based activities are concentrated. Predicting sea level near the coasts, with high accuracy and in a reasonable amount of time, becomes a strategic task. Despite the development of complex numerical codes for high-resolution ocean modeling, making forecasts in areas at the intersection between land and sea remains challenging. In this respect, the use of machine learning techniques can represent an interesting alternative to be investigated and evaluated by numerical modelers. This article presents the application of Long Short-Term Memory (LSTM) neural networks to the problem of short-term sea level forecasting in the Southern Adriatic Northern Ionian (SANI) domain in the Mediterranean Sea. The proposed multi-model architecture based on LSTM networks has been trained to predict mean sea levels three days ahead for different coastal locations. Predictions were compared with the observation data collected through tide-gauge devices as well as with the forecasts produced by the Southern Adriatic Northern Ionian Forecasting System (SANIFS) developed at the Euro-Mediterranean Center on Climate Change (CMCC), which provides short-term, daily updated forecasts in the Mediterranean basin. Experimental results demonstrate that the multi-model architecture is able to capture long-range temporal dependencies and to produce predictions with a much higher accuracy than the SANIFS forecasts.
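The multi-step setup described above can be sketched in plain NumPy: a (synthetic) tide-gauge series is sliced into input windows and multi-step-ahead targets, which is the shape an LSTM like the one in the paper would be trained on. The window lengths and the synthetic series are assumptions for illustration, not the paper's configuration:

```python
import numpy as np

def make_windows(series, n_in, n_out):
    """Slice a 1-D series into (inputs, targets) pairs:
    each sample uses n_in past values to predict the next n_out values."""
    X, y = [], []
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i : i + n_in])
        y.append(series[i + n_in : i + n_in + n_out])
    return np.array(X), np.array(y)

# Synthetic sea-level anomaly series: a tide-like oscillation plus noise.
t = np.arange(500)
sea_level = (0.3 * np.sin(2 * np.pi * t / 12.42)
             + 0.05 * np.random.default_rng(1).standard_normal(500))

# 72 past steps in, 3 steps ahead out (the multi-step forecasting shape).
X, y = make_windows(sea_level, n_in=72, n_out=3)
print(X.shape, y.shape)  # (426, 72) (426, 3)
```

Feeding `X` (reshaped to `(samples, timesteps, features)`) and `y` to any recurrent model reproduces the supervised framing; the paper's contribution is the multi-model LSTM architecture trained per coastal location on top of this framing.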
- Published
- 2021
6. ENES Data Space: an open, cloud-enabled data science environment for climate analysis
- Author
-
Fabrizio Antonio, Donatello Elia, Andrea Giannotta, Alessandra Nuzzo, Guillaume Levavasseur, Atef Ben Nasser, Paola Nassisi, Alessandro D'Anca, Sandro Fiore, Sylvie Joussaume, and Giovanni Aloisio
- Abstract
The scientific discovery process has been deeply influenced by the data deluge that started at the beginning of this century. This has caused a profound transformation in several scientific domains, which are now moving towards much more collaborative processes. In the climate sciences domain, the ENES Data Space aims to provide an open, scalable, cloud-enabled data science environment for climate data analysis. It represents a collaborative research environment, deployed on top of the EGI federated cloud infrastructure, specifically designed to address the needs of the ENES community. The service, developed in the context of the EGI-ACE project, provides ready-to-use compute resources and datasets, as well as a rich ecosystem of open source Python modules and community-based tools (e.g., CDO, Ophidia, Xarray, Cartopy), all made available through the user-friendly Jupyter interface. In particular, the ENES Data Space provides access to a multi-terabyte set of specific variable-centric collections from large community experiments to support researchers in climate model data analysis experiments. The data pool of the ENES Data Space consists of a mirrored subset of CMIP datasets from the ESGF federated data archive, collected by using the Synda community tool in order to provide the most up-to-date datasets in a single location. Results and output products, as well as experiment definitions (in the form of Jupyter Notebooks), can be easily shared among users through data sharing services, such as EGI DataHub, which are also being integrated in the infrastructure. The service was opened in the second half of 2021 and is now accessible in the European Open Science Cloud (EOSC) through the EOSC Portal Marketplace (https://marketplace.eosc-portal.eu/services/enes-data-space). This contribution will present an overview of the ENES Data Space service and its main features.
- Published
- 2022
- Full Text
- View/download PDF
7. Skip high-volume data transfer and access free computing resources for your CMIP6 multi-model analyses
- Author
-
Sophie Morellon, Marco Kulüke, Charlotte L Pascoe, Stephan Kindermann, Guillaume Levavasseur, Fabian Wachsmann, Regina Kwee-Hinzmann, Maria Moreno de Castro, Sandro Fiore, Paola Nassisi, Sylvie Joussaume, and Martin Juckes
- Subjects
Computer science, Volume (compression), Data transmission, Computational science - Abstract
Tired of downloading tons of model results? Is your internet connection flaky? Are you about to overload your computer's memory with the constant increase of data volume, and do you need more computing resources? You can request computing time free of charge at one of the supercomputers of the InfraStructure for the European Network for Earth System modelling (IS-ENES) [1], the European part of the Earth System Grid Federation (ESGF) [2], which also hosts and maintains more than 6 Petabytes of CMIP6 and CORDEX data. Thanks to this new EU Commission funded service, you can run your own scripts in your favorite programming language and pre- and post-process model data straightforwardly. There is no need for heavy data transfer: just load, with one line of code, the data slice you need, because your script will directly access the data pool. Calculations that used to last days can therefore be done in seconds. You can test the service; we provide pre-access activities very easily. In this session we will run Jupyter notebooks directly on the German Climate Computing Center (DKRZ) [3], one of the ENES high performance computers and an ESGF data center, showing how to load, filter, concatenate, take means, and plot several CMIP6 models to compare their results, use some CMIP6 models to calculate climate indexes for any location and period, and evaluate model skill against observational data. We will use Climate Data Operators (cdo) [4] and Python packages for Big Data manipulation, such as Intake [5], to easily extract the data from the huge catalog, and Xarray [6], to easily read NetCDF files and scale to parallel computing.
We are continuously creating more use cases for multi-model evaluation, mechanisms of variability, and impact analysis. Visit the demos, find more information, and apply here: https://portal.enes.org/data/data-metadata-service/analysis-platforms.
[1] https://is.enes.org/
[2] https://esgf.llnl.gov/
[3] https://www.dkrz.de/
[4] https://code.mpimet.mpg.de/projects/cdo/
[5] https://intake.readthedocs.io/en/latest/
[6] http://xarray.pydata.org/en/stable/
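The "load only the slice you need" pattern the abstract describes boils down to label-based selection in Xarray. On the analysis platforms the same `.sel` line would run against a dataset opened from the hosted CMIP6 pool; here a synthetic dataset stands in, and the variable name `tas`, grid, and region are illustrative assumptions:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic multi-year field standing in for a CMIP6 variable on the data pool:
# 10 years of monthly data on a 10-degree global grid.
ds = xr.Dataset(
    {"tas": (("time", "lat", "lon"),
             288 + np.random.default_rng(2).standard_normal((120, 18, 36)))},
    coords={
        "time": pd.date_range("2010-01-01", periods=120, freq="MS"),
        "lat": np.linspace(-85, 85, 18),
        "lon": np.linspace(0, 350, 36),
    },
)

# One line selects just the slice of interest: one year over a European box.
europe_2015 = ds["tas"].sel(
    time=slice("2015-01-01", "2015-12-31"),
    lat=slice(35, 70),
    lon=slice(0, 40),
)
print(europe_2015.sizes)
```

Because selection is lazy when the dataset is opened from disk, only the requested slice is actually read, which is what makes server-side access to a multi-petabyte pool practical.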
- Published
- 2021
8. Meridional distribution of moisture transport associated to Tropical Cyclones
- Author
-
Sandro Fiore, Enrico Scoccimarro, Malcolm J. Roberts, Daniele Peano, Alessandro D'Anca, Fabrizio Antonio, Annalisa Cherchi, Silvio Gualdi, and Alessio Bellucci
- Subjects
Moisture, Distribution (number theory), Environmental science, Zonal and meridional, Tropical cyclone, Atmospheric sciences - Abstract
Tropical cyclones (TCs) transport energy and moisture along their pathways, interacting with the climate system, and TC activity is expected to extend further poleward during the 21st century. For this reason, it is important to assess the ability of state-of-the-art climate models to reproduce an accurate meridional distribution of TCs as well as a reasonable meridional portrait of the moisture transport associated with TCs. Since high resolutions are required to reconstruct observed TC activity, the present work is based on the simulations performed as part of HighResMIP in the framework of the community CMIP6 effort. To inspect this feature, two horizontal resolutions for each climate model are considered. Besides, the impact of boundary conditions, i.e. the observed ocean surface state, is examined by considering both coupled and atmosphere-only configurations. In the present work, the North Atlantic region is analyzed as a sample region, while the same approach is applied on a multi-basin basis. In the sample area, climate models show a good ability to reproduce the TC distribution, with a general underestimation at lower latitudes and a slight overestimation at high latitudes compared to observed TC tracks (e.g. IBTrACS). The meridional distribution of moisture transport associated with TCs is evaluated by considering the radial average of the integrated water vapor transport along the TC tracks. When compared to observations (IBTrACS and the JRA-55 reanalysis), the simulated moisture transport associated with TCs displays reasonably good performance in the atmosphere-only, high-resolution model configurations.
The interannual variability of water vapor associated with TCs, instead, is poorly represented in climate models. Climate models in high-resolution configuration can then be used to estimate the future TC meridional distribution and changes in the meridional moisture transport associated with TCs. This effort is part of HighResMIP and is developed in the framework of the EU-funded PRIMAVERA project.
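The quantity averaged radially along the TC tracks, the vertically integrated water vapor transport (IVT), is a standard diagnostic: IVT = (1/g) ∫ q·V dp over the atmospheric column. A minimal single-column sketch with made-up profile values (the profiles are illustrative numbers, not data from the study):

```python
import numpy as np

g = 9.81  # gravitational acceleration, m s^-2

def column_integral(f, p):
    """Trapezoidal integration of f over pressure p (p given top-down
    as decreasing-upward levels, in Pa)."""
    return np.sum(0.5 * (f[:-1] + f[1:]) * (p[:-1] - p[1:]))

# Pressure levels from 1000 hPa up to 300 hPa, converted to Pa.
p = np.array([1000, 925, 850, 700, 500, 300], dtype=float) * 100.0

# Illustrative specific humidity (kg/kg) and wind components (m/s) per level.
q = np.array([0.016, 0.013, 0.010, 0.006, 0.002, 0.0005])
u = np.array([5.0, 8.0, 10.0, 12.0, 15.0, 20.0])
v = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 8.0])

# IVT components and magnitude, in kg m^-1 s^-1.
ivt_u = column_integral(q * u, p) / g
ivt_v = column_integral(q * v, p) / g
ivt = np.hypot(ivt_u, ivt_v)
print(round(float(ivt), 1))
```

In the study this scalar would be computed on every grid column, then radially averaged around each TC center along the track; the column integral itself is the building block shown here.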
- Published
- 2020
9. Boosting climate change research with direct access to high performance computers
- Author
-
Ag Stephens, Martin Juckes, Maria Moreno de Castro, Sophie Morellon, Sandro Fiore, Sylvie Joussaume, Guillaume Levavasseur, Karsten Peters, Stephan Kindermann, and Paola Nassisi
- Subjects
Boosting (machine learning), Computer science, Climate change, Environmental economics - Abstract
Earth System observational and model data volumes are constantly increasing, and it can be challenging to discover, download, and analyze data if scientists do not have the required computing and storage resources at hand. This is especially the case for detection and attribution studies in the field of climate change research, since we need to perform multi-source and cross-disciplinary comparisons for datasets of high spatial and large temporal coverage. Researchers and end-users are therefore looking for access to cloud solutions and high performance computing facilities. The Earth System Grid Federation (ESGF, https://esgf.llnl.gov/) maintains a global system of federated data centers that allow access to the largest archive of model climate data worldwide. ESGF portals provide free access to the output of the data contributing to the next assessment report of the Intergovernmental Panel on Climate Change through the Coupled Model Intercomparison Project. In order to support users in directly accessing high performance computing facilities to perform analyses such as the detection and attribution of climate change and its impacts, the EU Commission funded a new service within the infrastructure of the European Network for Earth System Modelling (ENES, https://portal.enes.org/data/data-metadata-service/analysis-platforms). This new service is designed to reduce data transfer issues, speed up computational analysis, provide storage, and ensure access to and maintenance of the resources. Furthermore, the service is free of charge and requires only a lightweight application. We will present a demo of how flexibly climate indices can be calculated from different ESGF datasets covering a wide range of temporal and spatial scales, using cdo (Climate Data Operators, https://code.mpimet.mpg.de/projects/cdo/) and Jupyter notebooks running directly on the ENES partners' high performance computing centers: DKRZ (Germany), JASMIN (UK), CMCC (Italy), and IPSL (France).
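One of the simplest climate indices of the kind demonstrated in such sessions, the ETCCDI "summer days" count (days with daily maximum temperature above 25 °C), takes a few lines once the data slice is in memory. The daily series below is synthetic; on the service the same computation would run against ESGF data via cdo or Xarray:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic daily maximum temperature (degC) for one grid point, one year:
# a seasonal cycle around 12 degC plus day-to-day noise.
days = np.arange(365)
tmax = (12 + 14 * np.sin(2 * np.pi * (days - 80) / 365)
        + 3 * rng.standard_normal(365))

# ETCCDI-style "summer days" index: count of days with tmax > 25 degC.
summer_days = int(np.sum(tmax > 25.0))
print(summer_days)
```

Applied per year and per grid cell over an ESGF dataset, this one-line count becomes a map of the index for any location and period, which is exactly the workflow the demo walks through.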
- Published
- 2020
10. Python-based Multidimensional and Parallel Climate Model Data Analysis in ECAS
- Author
-
Regina Kwee, Tobias Weigel, Hannes Thiemann, Karsten Peters, Sandro Fiore, and Donatello Elia
- Abstract
This contribution highlights the Python xarray technique in the context of a climate-specific application (typical formats are NetCDF, GRIB, and HDF). We will see how to use in-file metadata and why they are so powerful for data analysis, in particular by looking at community-specific problems; e.g., one can select purely on coordinate variable names. ECAS, the ENES Climate Analytics Service available at the Deutsches Klimarechenzentrum (DKRZ), will help by enabling faster access to the high-volume simulation data output from climate modeling experiments. In this respect, we can also make use of dask, which was developed for parallel computing and works smoothly with xarray. This is extremely useful when we want to fully exploit the advantages of our supercomputer. Our fully integrated service offers an interface via Jupyter notebooks (ecaslab.dkrz.de). We provide an analysis environment without the need for costly transfers, accessing CF-standardized data files, all accessible via the ESGF portal on our nodes (esgf-data.dkrz.de). We can analyse the data of, e.g., CMIP5, CMIP6, the Grand Ensemble, and observation data. ECAS was developed in the frame of the European Open Science Cloud (EOSC) hub project.
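The point about in-file metadata can be shown in a few lines: because coordinates and variables are named in the file, selections are written against names and labels rather than positional indices, and the CF attributes travel with the result. The dataset below is a synthetic CF-style stand-in; the variable name and attribute values are illustrative:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic CF-style dataset: monthly precipitation flux on a few latitudes.
ds = xr.Dataset(
    {"pr": (("time", "lat"),
            np.random.default_rng(4).random((36, 5)),
            {"standard_name": "precipitation_flux", "units": "kg m-2 s-1"})},
    coords={"time": pd.date_range("2000-01-01", periods=36, freq="MS"),
            "lat": np.array([-60.0, -30.0, 0.0, 30.0, 60.0])},
)

# Selection purely by coordinate name and label -- no index arithmetic needed.
equator_2001 = ds["pr"].sel(time="2001", lat=0.0)

# The CF metadata documents what the numbers are.
print(equator_2001.attrs["standard_name"], equator_2001.sizes)
```

The same `.sel` call works unchanged on a dask-chunked dataset, which is how the xarray/dask combination described above scales the pattern to supercomputer-sized archives.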
- Published
- 2020
11. A Python-oriented environment for climate experiments at scale in the frame of the European Open Science Cloud
- Author
-
Donatello Elia, Fabrizio Antonio, Cosimo Palazzo, Paola Nassisi, Sofiane Bendoukha, Regina Kwee-Hinzmann, Sandro Fiore, Tobias Weigel, Hannes Thiemann, and Giovanni Aloisio
- Subjects
Climate action - Abstract
Scientific data analysis experiments and applications require software capable of handling domain-specific and data-intensive workflows. The increasing volume of scientific data is further exacerbating these data management and analytics challenges, pushing the community towards the definition of novel programming environments for dealing efficiently with complex experiments, while abstracting from the underlying computing infrastructure. ECASLab provides a user-friendly data analytics environment to support scientists in their daily research activities, in particular in the climate change domain, by integrating analysis tools with scientific datasets (e.g., from the ESGF data archive) and computing resources (i.e., Cloud and HPC-based). It combines the features of the ENES Climate Analytics Service (ECAS) and the JupyterHub service, with a wide set of scientific libraries from the Python landscape for data manipulation, analysis and visualization. ECASLab is being set up in the frame of the European Open Science Cloud (EOSC) platform - in the EU H2020 EOSC-Hub project - by CMCC (https://ecaslab.cmcc.it/) and DKRZ (https://ecaslab.dkrz.de/), which host two major instances of the environment. ECAS, which lies at the heart of ECASLab, enables scientists to perform data analysis experiments on large volumes of multi-dimensional data by providing a workflow-oriented, PID-supported, server-side and distributed computing approach. ECAS consists of multiple components, centered around the Ophidia High Performance Data Analytics framework, which has been integrated with data access and sharing services (e.g., EUDAT B2DROP/B2SHARE, Onedata), along with the EGI federated cloud infrastructure. The integration with JupyterHub provides a convenient interface for scientists to access the ECAS features for the development and execution of experiments, as well as for sharing results (and the experiment/workflow definition itself). 
ECAS parallel data analytics capabilities can be easily exploited in Jupyter Notebooks (by means of PyOphidia, the Ophidia Python bindings) together with well-known Python modules for processing and for plotting the results on charts and maps (e.g., Dask, Xarray, NumPy, Matplotlib, etc.). ECAS is also one of the compute services made available to climate scientists by the EU H2020 IS-ENES3 project. Hence, this integrated environment represents a complete software stack for the design and run of interactive experiments as well as complex and data-intensive workflows. One class of such large-scale workflows, efficiently implemented through the environment resources, refers to multi-model data analysis in the context of both CMIP5 and CMIP6 (i.e., precipitation trend analysis orchestrated in parallel over multiple CMIP-based datasets).
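The multi-model precipitation trend analysis mentioned above reduces, per model, to a linear fit over an annual-mean series; a NumPy sketch over a few synthetic "models" shows the embarrassingly parallel structure that ECAS/Ophidia workflows fan out server-side (the model names, values, and imposed trends are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
years = np.arange(1980, 2015)

# Synthetic annual-mean precipitation (mm/day) for three mock CMIP models,
# each with a small imposed trend plus interannual noise.
models = {
    "MODEL-A": 3.0 + 0.004 * (years - 1980) + 0.05 * rng.standard_normal(years.size),
    "MODEL-B": 2.8 - 0.002 * (years - 1980) + 0.05 * rng.standard_normal(years.size),
    "MODEL-C": 3.1 + 0.001 * (years - 1980) + 0.05 * rng.standard_normal(years.size),
}

# Independent per-model linear fits: the part a workflow engine runs in parallel.
trends = {name: np.polyfit(years, series, 1)[0] for name, series in models.items()}
for name, slope in trends.items():
    print(f"{name}: {slope * 10:+.3f} mm/day per decade")
```

In the real workflow each "series" is itself the result of a server-side spatial aggregation over a CMIP dataset; the per-model independence is what makes the orchestration across many datasets straightforward.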
- Published
- 2020
12. On the road to exascale: Advances in High Performance Computing and Simulations—An overview and editorial
- Author
-
Waleed W. Smari, Sandro Fiore, and Mohamed Bakhouya
- Subjects
Computer Networks and Communications, Computer science, Distributed computing, Systems modeling, Supercomputer, Exascale computing, Software, Hardware and Architecture, Scalability, Path (graph theory) - Abstract
In recent decades, the complexity of scientific and engineering problems has increased considerably. New applications and domains that use high performance computing systems have been introduced. These trends are projected to continue for the foreseeable future (Reed and Dongarra, 2015) [1]. In many areas of engineering and science, High-Performance Computing (HPC) and Simulations have become determinants of industrial competitiveness and advanced research. In fact, advances in HPC architectures, storage, networking, and software capabilities are leading to a new era in HPC and simulations, along with new challenges both in computing and in systems modeling (Geist and Lucas, 2009) [2]. These developments are especially critical considering that HPC systems continue to scale up in terms of nodes, cores, and accelerators, as well as software, infrastructure, and tools, which in turn are expediting the move along the path toward Exascale (Reed and Dongarra, 2015; Geist and Lucas, 2009; Dongarra and Beckman, 2011; Dosanjh et al., 2014; Engelmann, 2014) [1-5]. Scalability and availability represent two of the main requirements that need to be considered before conceiving of these large-scale systems (ASCAC Subcommittee on Exascale Computing, 2010). Scalability allows the system to grow proportionately when service demand increases, whereas availability means the system continues to provide its services despite hardware and software failures (Theodoropoulos et al., 2014; Tang et al., 2014) [7,8]. The goal in large-scale HPC is to accommodate both availability and scalability while staying under strict constraints on performance (e.g., processing time) and cost metrics (e.g., power consumption). This special issue is envisioned to provide examples of research work on topics related to recent advances in High Performance Computing and Simulations.
It briefly addresses and explores challenges toward Exascale computing, current state-of-the-art in HPC and simulation, and the path forward in the domains of large-scale HPC systems.
- Published
- 2018
13. Enabling Server-Based Computing and FAIR Data Sharing with the ENES Climate Analytics Service
- Author
-
Sandro Fiore, D. Elia, Sofiane Bendoukha, and Tobias Weigel
- Subjects
Data sharing, Workflow, Data access, Computer science, Analytics, Data management, e-Science, Cloud computing, Data science, Virtual research environment - Abstract
The European Network for Earth System Modelling (ENES) Climate Analytics Service (ECAS) is a new service from the EOSC-hub project. It offers a Virtual Research Environment (VRE) to scientific users, combining a Python (Jupyter) work environment with support services for data access, computing, and data sharing. ECAS is motivated by the goals of providing users with remote access to computing and storage resources beyond what they may have access to locally, reducing the need for costly data transfers, and helping to realize the vision of FAIR data management. ECAS aims at providing a paradigm shift for the ENES community and beyond, with a strong focus on data-intensive analysis, provenance management, and server-side approaches, as opposed to current approaches that are mostly client-based, sequential, and with limited or missing end-to-end analytics workflow and provenance capabilities. Furthermore, the integrated data analytics service enables basic data provenance tracking by establishing a graph of persistent identifiers (PIDs) through the whole chain, thereby improving reusability, traceability, and reproducibility. ECAS targets multiple user groups, including researchers lacking local computing and storage resources, researchers interested in the high-volume climate data pools, and use within education and training scenarios.
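The PID-graph idea can be illustrated with a toy structure: each derived product records the identifiers of its inputs, so any result can be traced back through the whole chain. The identifier strings and graph layout here are invented for illustration, not the ECAS/Handle scheme:

```python
# Toy provenance graph: each node maps a PID to the PIDs it was derived from.
provenance = {
    "hdl:21.T0000/raw-cmip-tas": [],
    "hdl:21.T0000/regridded-tas": ["hdl:21.T0000/raw-cmip-tas"],
    "hdl:21.T0000/anomaly-map": ["hdl:21.T0000/regridded-tas"],
}

def lineage(pid, graph):
    """Walk the graph backwards and return every PID the result depends on."""
    seen, stack = [], [pid]
    while stack:
        current = stack.pop()
        for parent in graph.get(current, []):
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen

print(lineage("hdl:21.T0000/anomaly-map", provenance))
# ['hdl:21.T0000/regridded-tas', 'hdl:21.T0000/raw-cmip-tas']
```

Because every intermediate product carries a persistent identifier, reproducing a result is a matter of re-running the chain returned by such a lineage query, which is the reusability and traceability benefit the abstract describes.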
- Published
- 2019
14. BIGSEA: A Big Data analytics platform for public transportation information
- Author
-
Dorgival Guedes, Sandro Fiore, Rosa M. Badia, Nazareno Andrade, Nádia P. Kozievitch, Walter Abrahão dos Santos, Tarciso Braz, Giovanni Aloisio, Paulo Silva, Marco Vieira, Danilo Ardagna, Fábio Morais, Nuno Antunes, Jussara M. Almeida, Daniele Lezzi, Demetrio Gomes Mestre, Andy S. Alic, Wagner Meira, Tânia Basso, Carlos Eduardo Santos Pires, Ignacio Blanquer, Matheus Maciel, Regina Moraes, Donatello Elia, Andrey Brito, and Marco Lattuada
- Subjects
Cloud computing, Computer Networks and Communications, Computer science, Performance, Deployment, Big data, Transport, Transportation, Workflows, Sustainability, Hardware and Architecture, Software deployment, Public transport, Software - Abstract
Analysis of public transportation data in large cities is a challenging problem. Managing data ingestion, data storage, data quality enhancement, modelling and analysis requires intensive computing and a non-trivial amount of resources. In EUBra-BIGSEA (Europe–Brazil Collaboration of Big Data Scientific Research Through Cloud-Centric Applications) we address such problems in a comprehensive and integrated way. EUBra-BIGSEA provides a platform for building up data analytics workflows on top of elastic cloud services without requiring skills related to either programming or cloud services. The approach combines cloud orchestration, Quality of Service and automatic parallelisation on a platform that includes a toolbox for implementing privacy guarantees and data quality enhancement as well as advanced services for sentiment analysis, traffic jam estimation and trip recommendation based on estimated crowdedness. The work shown in this article has been funded jointly by the European Commission under the Cooperation Programme, Horizon 2020 grant agreement No 690116 (EUBra-BIGSEA), and the Ministério da Ciência, Tecnologia e Inovação (MCTI) from Brazil.
- Published
- 2019
15. AMGCC 2018 Foreword
- Author
-
Hyeonsang Eom, Myungho Lee, Kento Aida, Taiga Nakamura, Yoonhee Kim, Ananta Tiwari, Ilkyeun Ra, Young Choon Lee, Kyungyong Lee, Robert Quick, Jose Luis Vazquez-Poletti, Steven Timm, Ewa Deelman, E. M. Heien, Beomseok Nam, Sangmi Lee Pallickara, Jaehwan Lee, Raffaele Montella, Sungyong Park, Youngjae Kim, Taro Tezuka, David Sarramia, Seung-Jong Park, Young-ri Choi, Jens Jensen, Justin M. Wozniak, Heon-Young Yeom, Ricardo Graciani Diaz, Sandro Fiore, Yoshio Tanaka, Jaewook Lee, Jik-Soo Kim, and Jae-Young Choi
- Subjects
Computer science, Distributed computing, Cloud computing, Grid - Published
- 2018
16. Towards an Open (Data) Science Analytics-Hub for Reproducible Multi-Model Climate Analysis at Scale
- Author
-
Dean N. Williams, Giovanni Aloisio, Donatello Elia, Sandro Fiore, Alessandro D'Anca, Ian Foster, Fabrizio Antonio, Cosimo Palazzo, Fiore, S., Elia, D., Palazzo, C., D'Anca, A., Antonio, F., Williams, D. N., Foster, I., and Aloisio, G.
- Subjects
Analytics-hub ,analytics-hub ,Open science ,010504 meteorology & atmospheric sciences ,Computer science ,Big data ,provenance ,Climate change ,02 engineering and technology ,01 natural sciences ,Open Science ,11. Sustainability ,0202 electrical engineering, electronic engineering, information engineering ,reproducibility ,0105 earth and related environmental sciences ,020203 distributed computing ,Coupled model intercomparison project ,business.industry ,Data science ,Reproducibility ,Knowledge sharing ,Open data ,13. Climate action ,Analytics ,Provenance ,Data analytics ,Scientific method ,data analytic ,Data analysis ,Earth System Grid ,business - Abstract
Open Science is key to future scientific research and promotes a deep transformation in the whole scientific research process, encouraging the adoption of transparent and collaborative scientific approaches aimed at knowledge sharing. Open Science is increasingly gaining attention in the current and future research agenda worldwide. To effectively address Open Science goals, besides Open Access to results and data, it is also paramount to provide tools or environments to support the whole research process, in particular the design, execution and sharing of transparent and reproducible experiments, including data provenance (or lineage) tracking. This work introduces the Climate Analytics-Hub, a new component on top of the Earth System Grid Federation (ESGF), which combines big data approaches and parallel computing paradigms to provide an Open Science environment for reproducible multi-model climate change data analytics experiments at scale. An operational implementation has been set up at the SuperComputing Centre of the Euro-Mediterranean Center on Climate Change, with the main goal of becoming a reference Open Science hub in the climate community regarding the multi-model analysis based on the Coupled Model Intercomparison Project (CMIP). This paper reports on ESiWACE WP3 activities described in deliverable D3.10, "ESiWACE Scheduler development and support activities".
- Published
- 2018
- Full Text
- View/download PDF
17. Recent developments in high-performance computing and simulation: distributed systems, architectures, algorithms, and applications
- Author
-
Waleed W. Smari, Sandro Fiore, and Carsten Trinitis
- Subjects
Computational Theory and Mathematics ,Computer Networks and Communications ,Computer science ,Distributed computing ,Supercomputer ,Software ,Computer Science Applications ,Theoretical Computer Science - Published
- 2015
18. High performance computing and simulation: architectures, systems, algorithms, technologies, services, and applications
- Author
-
David R.C. Hill, Sandro Fiore, and Waleed W. Smari
- Subjects
Computational Theory and Mathematics ,Computer architecture ,Computer Networks and Communications ,Computer science ,0202 electrical engineering, electronic engineering, information engineering ,020206 networking & telecommunications ,020201 artificial intelligence & image processing ,02 engineering and technology ,Supercomputer ,Software ,Computer Science Applications ,Theoretical Computer Science - Published
- 2013
19. EUBrazilCC Federated Cloud
- Author
-
Jose Luis Vivas, Abmar Barros, Francisco Brasileiro, Giovanni Farias da Silva, Daniele Lezzi, Jacek Cala, Cristina D. Ururahy, Erik Torres, Ignacio Blanquer, Maria Julia de Lima, Rosa M. Badia, Sandro Fiore, Marcos Nobrega, Antônio Tadeu A. Gomes, Francisco Germano de Araújo Neto, and Giovanni Aloisio
- Subjects
Computer science ,business.industry ,Cloud computing ,Computer security ,computer.software_genre ,business ,computer - Abstract
Many e-science initiatives are currently investigating the use of cloud computing to support all kinds of scientific activities. The objective of this chapter is to describe the architecture and the deployment of the EUBrazilCC federated e-infrastructure, developed within a Research & Development project that aims at providing a user-centric test bench enabling European and Brazilian research communities to test the deployment and execution of scientific applications on a federated intercontinental e-infrastructure. This e-infrastructure exploits existing resources that consist of virtualized data centers, supercomputers, and even opportunistically exploited desktops spread over a transatlantic geographic area. These heterogeneous resources are federated with the aid of appropriate middleware that provides the necessary features to achieve the established challenging goals. In order to elicit the requirements and validate the resulting infrastructure, three complex scientific applications have been implemented, which are also presented here.
- Published
- 2016
20. The OFIDIA Fire Danger Rating System
- Author
-
A. Raolil, Giovanni Aloisio, Marco Mancini, Michele Salis, Valentina Bacciu, Sandro Fiore, Costantino Sirca, Andrea Mariello, Alessandra Nuzzo, O. Marra, Maria Mirto, Donatella Spano, Mirto, Maria, Mariello, Andrea, Nuzzo, Alessandra, Mancini, Marco, Raolil, Alessandro, Marra, Osvaldo, Fiore, Sandro, Sirca, Costantino, Salis, Michele, Bacciu, Valentina, Spano, Donatella, and Aloisio, Giovanni
- Subjects
Meteorology ,business.industry ,Wireless sensors network ,Weather forecasting ,computer.software_genre ,Wind speed ,Primary station ,Data visualization ,Geography ,Data acquisition ,Data analytic ,Fire danger index ,Natural hazard ,Data analysis ,Fire behaviour ,business ,computer ,Wireless sensor network - Abstract
Prevention is one of the most important stages in wildfire and other natural hazard management. Fire Danger Rating Systems (FDRSs) have been adopted by many countries to enhance wildfire prevention and suppression planning. With the aim of providing real-time fire danger forecasts and finer-scale fire behaviour analysis, an operational fire danger prevention platform has been developed within the OFIDIA project (Operational FIre Danger preventIon plAtform). The OFIDIA Fire Danger Rating System platform consists of (1) a data archive for managing weather forecasting and wireless sensor data, (2) a data analytics platform for post-processing weather data and for computing fire danger indices, and (3) a web application system for the visualization of weather and fire index maps and related time series. The OFIDIA platform is also connected to a Wireless Sensor Network (WSN) that gathers data from several sites in the Apulia (Italy) and Epirus (Greece) regions. The WSN consists of a primary station and several wireless sensors deployed in wooded areas; the data acquisition process covers variables such as air temperature, relative humidity, wind speed and direction, precipitation, solar radiation, and fuel moisture.
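The fire danger indices mentioned above can be illustrated with one of the simplest published formulas, the Angström index, computed from air temperature and relative humidity. This is a generic illustration only; the abstract does not specify which indices OFIDIA actually computes, and the station readings below are invented.

```python
def angstrom_index(temp_c: float, rel_humidity_pct: float) -> float:
    """Angstrom fire-danger index: lower values mean higher danger
    (values below roughly 2.5 indicate conditions favourable to ignition)."""
    return rel_humidity_pct / 20.0 + (27.0 - temp_c) / 10.0

# Two hypothetical station readings: (label, air temperature degC, relative humidity %)
readings = [("humid morning", 18.0, 80.0), ("hot dry afternoon", 35.0, 20.0)]
for name, t, rh in readings:
    i = angstrom_index(t, rh)
    print(f"{name}: index = {i:.2f} -> {'DANGER' if i < 2.5 else 'low risk'}")
```

In an operational platform such an index would be evaluated per grid point of the weather forecast and per WSN station, then rendered as the fire index maps the abstract describes.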
- Published
- 2015
21. Data issues at the Euro-Mediterranean Centre for Climate Change
- Author
-
Alessandro Negro, Sandro Fiore, Salvatore Vadacca, and Giovanni Aloisio
- Subjects
Metadata ,World Wide Web ,Data collection ,Data grid ,Computer science ,Metadata management ,Dashboard (business) ,Earth and Planetary Sciences(all) ,General Earth and Planetary Sciences ,Petabyte ,Climate change ,Client-side ,Data science - Abstract
Climate change research is increasingly becoming a data-intensive and data-oriented scientific activity. Petabytes of climate data, organized in large collections of datasets, are continuously produced, delivered, accessed and processed by scientists and researchers at multiple sites at an international level. This work presents the Euro-Mediterranean Centre for Climate Change (CMCC) initiative, discussing data and metadata issues and dealing with both architectural and infrastructural aspects of the adopted grid-enabled solution. A complete overview of the grid services deployed at the Centre is presented, as well as the client-side support (the CMCC data portal and monitoring dashboard).
- Published
- 2009
22. Near real-time parallel processing and advanced data management of SAR images in grid environments
- Author
-
Massimo Cafaro, Italo Epicoco, Daniele Lezzi, Silvia Mocavero, Sandro Fiore, Giovanni Aloisio, Cafaro, Massimo, Epicoco, Italo, S., Fiore, D., Lezzi, S., Mocavero, and Aloisio, Giovanni
- Subjects
Synthetic aperture radar ,Speedup ,Data grid ,business.industry ,Computer science ,Data management ,Real-time computing ,Grid ,SAR processing, Parallel computing, Data grids ,Software ,Parallel processing (DSP implementation) ,business ,Software architecture ,Information Systems - Abstract
In this paper, we describe the process of parallelizing an existing, production-level, sequential Synthetic Aperture Radar (SAR) processor based on the Range-Doppler algorithmic approach. We show how, taking into account the constraints imposed by the software architecture and related software engineering costs, it is still possible with a moderate programming effort to parallelize the software, and present a message-passing interface (MPI) implementation whose speedup is about 8 on 9 processors, achieving near real-time processing of raw SAR data even on a moderately aged parallel platform. Moreover, we discuss a hybrid two-level parallelization approach that involves the use of both MPI and OpenMP. We also present GridStore, a novel data grid service to manage raw, focused and post-processed SAR data in a grid environment. Indeed, another aim of this work is to show how the processed data can be made available in a grid environment to a wide scientific community, through the adoption of a data grid service providing both metadata and data management functionalities. In this way, along with near real-time processing of SAR images, we provide a data grid-oriented system for data storage, publishing and management.
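The reported speedup of about 8 on 9 processors can be put in perspective with Amdahl's law. The sketch below (an illustration, not part of the paper's code) estimates the serial fraction of the processor that is consistent with that measurement:

```python
def amdahl_speedup(serial_fraction: float, p: int) -> float:
    """Speedup predicted by Amdahl's law on p processors."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

def serial_fraction_from_speedup(speedup: float, p: int) -> float:
    """Invert Amdahl's law: the serial fraction implied by an
    observed speedup on p processors."""
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)

# Figures reported in the abstract: speedup ~8 on 9 processors
f = serial_fraction_from_speedup(8.0, 9)
print(f"estimated serial fraction: {f:.4f}")   # about 1.6% of the run stays sequential
print(f"parallel efficiency: {8.0 / 9:.2f}")
```

An implied serial fraction of under 2% is consistent with the paper's claim that a moderate programming effort sufficed: only a small sequential portion of the Range-Doppler pipeline was left unparallelized.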
- Published
- 2009
23. A Grid-Enabled Protein Secondary Structure Predictor
- Author
-
Sandro Fiore, Maria Mirto, Daniele Tartarini, Giovanni Aloisio, Massimo Cafaro, M., Mirto, Cafaro, Massimo, S., Fiore, Daniele, Tartarini, and Aloisio, Giovanni
- Subjects
Models, Molecular ,neural network ,Computer science ,Biomedical Engineering ,Pharmaceutical Science ,Medicine (miscellaneous) ,Bioengineering ,Machine learning ,computer.software_genre ,Protein Structure, Secondary ,Set (abstract data type) ,User-Computer Interface ,Artificial Intelligence ,Sequence Analysis, Protein ,Computer Simulation ,Electrical and Electronic Engineering ,Web services ,Internet ,Multiple sequence alignment ,Artificial neural network ,business.industry ,Proteins ,Protein structure prediction ,Grid ,Backpropagation ,Computer Science Applications ,protein structure prediction ,Models, Chemical ,Grid computing ,Multilayer perceptron ,Artificial intelligence ,business ,computer ,Algorithms ,Software ,Biotechnology - Abstract
We present an integrated Grid system for the prediction of protein secondary structures, based on the frequent automatic update of proteins in the training set. The predictor model is based on a feed-forward multilayer perceptron (MLP) neural network which is trained with the back-propagation algorithm; the design reuses existing legacy software and exploits novel grid components. The predictor takes into account the evolutionary information found in multiple sequence alignment (MSA); the information is obtained running an optimized parallel version of the PSI-BLAST tool, based on the MPI Master–Worker paradigm. The training set contains proteins of known structure. Using Grid technologies and efficient mechanisms for running the tools and extracting the data, the time needed to train the neural network is dramatically reduced, whereas the results are comparable to a set of well-known predictor tools.
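The training procedure described above, a feed-forward multilayer perceptron trained with back-propagation, can be sketched in a few lines. This toy stands in for the real predictor (which trains on PSI-BLAST profiles of proteins of known structure); here the network just fits XOR to show the forward/backward loop:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset standing in for encoded sequence profiles
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of 8 units, as a minimal MLP
W1 = rng.normal(0.0, 1.0, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 1.0, (8, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses, lr = [], 0.5
for _ in range(5000):
    hidden = sigmoid(X @ W1 + b1)            # forward pass
    out = sigmoid(hidden @ W2 + b2)
    losses.append(float(np.mean((out - y) ** 2)))
    d_out = (out - y) * out * (1.0 - out)    # backpropagated MSE gradient
    d_hid = (d_out @ W2.T) * hidden * (1.0 - hidden)
    W2 -= lr * hidden.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_hid;      b1 -= lr * d_hid.sum(axis=0)

print(f"MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The grid contribution of the paper is orthogonal to this loop: it parallelizes the PSI-BLAST profile generation and automates training-set refresh, which shrinks wall-clock training time without changing the learning algorithm.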
- Published
- 2007
24. The Grid Resource Broker portal
- Author
-
Italo Epicoco, Sandro Fiore, Giovanni Aloisio, Daniele Lezzi, Maria Mirto, Massimo Cafaro, Gabriele Carteni, Silvia Mocavero, Aloisio, Giovanni, Cafaro, Massimo, G., Carteni, Epicoco, Italo, S., Fiore, D., Lezzi, M., Mirto, and S., Mocavero
- Subjects
Grid Portal ,Data grid ,Database ,Grid Computing ,Computer Networks and Communications ,Computer science ,Storage Resource Broker ,PROCESSORS ,INDEPENDENT TASKS ,computer.software_genre ,Computer Science Applications ,Theoretical Computer Science ,World Wide Web ,Semantic grid ,Computational Theory and Mathematics ,Grid computing ,Grid resources ,computer ,Software - Abstract
This paper describes the Grid Resource Broker (GRB), a Grid portal built leveraging a set of high-level, Globus-Toolkit-based Grid libraries called GRB libraries. The portal leverages the Liferay framework to provide users with an intuitive, highly customizable Web GUI. The underlying GRB middleware allows trusted users seamless access to their computational Grid environments.
- Published
- 2007
25. SeaConditions: Present and future sea conditions for safer navigation (www.sea-conditions.com)
- Author
-
Giuseppe Turrisi, Davide Rollo, Alessandro D'Anca, Sergio Creti, Gianandrea Mannarini, Paola Agostini, Tony Monacizzo, Sandro Fiore, Leopoldo Fazioli, Antonio Bonaduce, Giovanni Aloisio, Stefania Angela Ciliberti, Luca Tedesco, Andrea Cucco, Ivan Federico, Marina Tonani, Yogesh Kumkar, Cosimo Palazzo, Rita Lecci, Sara Martinelli, Roberto Sorgente, Marco Spagnulo, Mario Scalas, Massimiliano Drudi, Arturo Cavallo, Antonio Olita, Giovanni Coppini, Roberto Bonarelli, Nadia Pinardi, Palmalisa Marra, and Antonio Tumolo
- Subjects
User Friendly ,Service (systems architecture) ,Meteorology ,Situation awareness ,Computer science ,business.industry ,Weather forecasting ,computer.software_genre ,Data science ,Environmental data ,Bathymetry ,Mobile telephony ,business ,computer ,Dissemination - Abstract
Sea Situational Awareness (SSA) is strategically important for management purposes of Italian Seas and coastal areas. The lack of adequate dissemination of marine environmental data, and the consequent poor knowledge available for operations at sea, reduce the response capacity, leading to loss of lives and potential socio-economic damages. The SSA topic is being addressed by "TESSA", an industrial research project funded under the PON "Ricerca & Competitività 2007–2013" program of the Ministero Italiano dell'Istruzione, dell'Università e della Ricerca. TESSA is a joint effort of research groups in operational oceanography and scientific computing, and aims to strengthen and consolidate the operational oceanography service and to integrate it with advanced technological platforms in order to disseminate information for the SSA. The first product of TESSA is "SeaConditions", a public service providing ocean and weather forecasts for the Mediterranean Sea on the web and through mobile applications. Every day, forecasts are produced by operational services, such as the Mediterranean Monitoring and Forecasting Center (www.myocean.eu) for the ocean variables and ECMWF for the atmospheric variables. The service delivers detailed information with high spatial and temporal resolution. The main variables displayed on Google Maps are: bathymetry, weather and oceanographic forecasts, and satellite ocean colour data. Ocean forecasts are given at different resolutions, since nested limited-area models for Mediterranean sub-regions are also displayed. SeaConditions provides a user-friendly interface with the zoom and drag features of Google Maps, allowing users to display data at different levels of detail. SeaConditions' main strength is to provide a single point of access to meteo-marine forecasts, which are based on advanced oceanographic models, remote sensing products and bathymetry, and to deliver high-quality information. The SeaConditions products are available through web and mobile channels.
The web portal www.sea-conditions.com is compatible with all modern web-browsers on all operating systems. For the mobile users, APPs were also developed to consider the different kind of screens and gesture/interactions. The APPs are available on AppleStore and Google Play.
- Published
- 2015
26. Ophidia: A full software stack for scientific data analytics
- Author
-
Ian Foster, Cosimo Palazzo, Alessandro D'Anca, Sandro Fiore, Dean N. Williams, Giovanni Aloisio, and Donatello Elia
- Subjects
Database ,business.industry ,Computer science ,Big data ,computer.software_genre ,Data cube ,Software analytics ,Workflow ,Software ,Analytics ,Data analysis ,Web service ,business ,computer - Abstract
The Ophidia project aims to provide a big data analytics platform solution that addresses scientific use cases related to large volumes of multidimensional data. In this work, the Ophidia software infrastructure is discussed in detail, presenting the entire software stack from level-0 (the Ophidia data store) to level-3 (the Ophidia web service front end). In particular, this paper presents the big data cube primitives provided by the Ophidia framework, discussing in detail the most relevant and available data cube manipulation operators. These primitives represent the proper foundations to build more complex data cube operators, such as the apex operator presented in this paper. A massive data reduction experiment on a 1 TB climate dataset is also presented to demonstrate the apex workflow in the context of the proposed framework.
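The data cube primitives discussed above can be pictured as reductions over a multidimensional array. The miniature below is a hypothetical illustration only: the real Ophidia framework exposes its operators over its own array-based storage model, while here a plain NumPy array stands in for a (time, lat, lon) climate cube.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical monthly near-surface temperature cube, degC: (time, lat, lon)
cube = 15.0 + 10.0 * rng.random((12, 180, 360))

def reduce_cube(cube: np.ndarray, dim: int, op: str) -> np.ndarray:
    """Collapse one dimension of the cube with a reduction operator,
    mimicking a data-cube 'reduce' primitive."""
    ops = {"avg": np.mean, "max": np.max, "min": np.min, "sum": np.sum}
    return ops[op](cube, axis=dim)

annual_mean = reduce_cube(cube, dim=0, op="avg")   # time collapsed -> (lat, lon) map
print(annual_mean.shape)
```

Composing such primitives (reduce, subset, roll-up, and so on) is what allows higher-level operators like the paper's apex workflow to be expressed on top of them.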
- Published
- 2014
27. Topic 5: Parallel and Distributed Data Management
- Author
-
Sandro Fiore, Stergios V. Anastasiadis, André Brinkmann, Kostas Magoutis, María S. Pérez-Hernández, and Adrien Lebre
- Subjects
Distributed design patterns ,business.industry ,Distributed algorithm ,Computer science ,Scale (chemistry) ,Data management ,Big data ,Enhanced Data Rates for GSM Evolution ,business ,Data science - Abstract
Nowadays we are facing an exponential growth of new data that is overwhelming the capability of companies, institutions and society in general to manage and use it properly. Ever-increasing investments in Big Data, cutting-edge technologies and the latest advances in both application development and underlying storage systems can help deal with data of such magnitude. Parallel and distributed approaches in particular will enable new data management solutions that operate effectively at large scale.
- Published
- 2013
28. The Earth System Grid Federation: An open infrastructure for access to distributed geospatial data
- Author
-
Estanislao Gonzalez, Sebastian Denvil, Mark Morgan, Dean N. Williams, Chris A. Mattmann, Luca Cinquini, Zed Pobre, Neill Miller, Daniel J. Crichton, Sandro Fiore, Stephen Pascoe, Rachana Ananthakrishnan, Philip Kershaw, Gavin M. Bell, Bob Drach, Feiyi Wang, Galen M. Shipman, John Harney, and Roland Schweitzer
- Subjects
World Wide Web ,Geospatial analysis ,Grid computing ,Application programming interface ,Computer science ,Node (computer science) ,Interoperability ,Data system ,Earth System Grid ,OpenID ,computer.software_genre ,computer - Abstract
The Earth System Grid Federation (ESGF) is a multi-agency, international collaboration that aims at developing the software infrastructure needed to facilitate and empower the study of climate change on a global scale. The ESGF's architecture employs a system of geographically distributed peer nodes, which are independently administered yet united by the adoption of common federation protocols and application programming interfaces (APIs). The cornerstones of its interoperability are the peer-to-peer messaging that is continuously exchanged among all nodes in the federation; a shared architecture and API for search and discovery; and a security infrastructure based on industry standards (OpenID, SSL, GSI and SAML). The ESGF software is developed collaboratively across institutional boundaries and made available to the community as open source. It has now been adopted by multiple Earth science projects and allows access to petabytes of geophysical data, including the entire model output used for the next international assessment report on climate change (IPCC-AR5) and a suite of satellite observations (obs4MIPs) and reanalysis data sets (ANA4MIPs).
- Published
- 2012
29. The GRelC Project: From 2001 to 2011, 10 Years Working on Grid-DBMSs
- Author
-
Sandro Fiore, Alessandro Negro, and Giovanni Aloisio
- Subjects
Security framework ,Computer science ,Command-line interface ,business.industry ,Interoperability ,Grid ,Software engineering ,business ,Database research ,Domain (software engineering) - Abstract
This chapter provides a complete overview of the Grid Relational Catalog (GRelC) Project, a grid database research effort started in 2001 at the University of Salento. The project’s main features, its interoperability with gLite-based production grids, and a relevant showcase in the environmental domain are presented.
- Published
- 2011
30. Grid and Cloud Database Management
- Author
-
Sandro Fiore and Giovanni Aloisio
- Subjects
Cloud computing security ,business.industry ,Computer science ,Cloud computing ,Provisioning ,Virtualization ,computer.software_genre ,World Wide Web ,Utility computing ,Grid computing ,Scalability ,Cloud database ,business ,computer - Abstract
Since the 1990s Grid Computing has emerged as a paradigm for accessing and managing distributed, heterogeneous and geographically spread resources, promising that we will be able to access computer power as easily as we can access the electric power grid. Later on, Cloud Computing brought the promise of providing easy and inexpensive access to remote hardware and storage resources. Exploiting pay-per-use models and virtualization for resource provisioning, cloud computing has been rapidly accepted and used by researchers, scientists and industries. In this volume, contributions from internationally recognized experts describe the latest findings on challenging topics related to grid and cloud database management. By exploring current and future developments, they provide a thorough understanding of the principles and techniques involved in these fields. The presented topics are well balanced and complementary, and they range from well-known research projects and real case studies to standards and specifications, and non-functional aspects such as security, performance and scalability. Following an initial introduction by the editors, the contributions are organized into four sections: Open Standards and Specifications, Research Efforts in Grid Database Management, Cloud Data Management, and Scientific Case Studies. With this presentation, the book serves mostly researchers and graduate students, both as an introduction to and as a technical reference for grid and cloud database management. The detailed descriptions of research prototypes dealing with spatiotemporal or genomic data will also be useful for application engineers in these fields.
- Published
- 2011
31. Data virtualization in grid environments through the GRelC Data Access and Integration Service
- Author
-
Sandro Fiore, Alessandro Negro, and Giovanni Aloisio
- Subjects
Service (systems architecture) ,Database ,Data grid ,Computer science ,business.industry ,Data management ,Enterprise information integration ,computer.software_genre ,Grid ,Data science ,Data access ,Grid computing ,business ,computer ,Data virtualization - Abstract
Grids promote the publication, sharing and integration of scientific data, distributed across Virtual Organizations. The complexity of data management in a grid environment comes from the distribution, heterogeneity, dynamicity and number of data sources. Data virtualization is a fundamental issue: managing, in a unified and virtualized manner (from the structure, location, access service and performance points of view), data stored in multiple, geographically spread data sources. It represents a key point for distributed data management services and must be addressed to build high-quality production/enterprise-oriented services. In this work we describe the convergence process (among three main data grid services developed in the context of the GRelC Project) that has led to the unified GRelC Data Access and Integration Service.
- Published
- 2009
32. Advances in the GRelC Data Access Service
- Author
-
Sandro Fiore, Giovanni Aloisio, Salvatore Vadacca, R. Barbera, Emidio Giorgio, Massimo Cafaro, Alessandro Negro, S., Fiore, A., Negro, S., Vadacca, Cafaro, Massimo, Aloisio, Giovanni, R., Barbera, and E., Giorgio
- Subjects
Service (systems architecture) ,Data grid ,business.industry ,Computer science ,Data management ,Interoperability ,computer.software_genre ,Metadata ,World Wide Web ,Data access ,Grid computing ,Metadata management ,business ,computer - Abstract
In a growing number of scientific disciplines, large data collections are emerging as important community resources. Data and metadata management exploiting the data grid paradigm is becoming more and more important as the number of involved data sources is continuously increasing and decentralizing. Efficient grid data access services are perceived as mandatory components for data management. In the grid data management area the GRelC Project has been addressing efficiency, transparency, interoperability and security issues, providing grid-enabled solutions and proposing a set of data access and integration/federation services. In this paper we present the advances related to the GRelC Data Access Service, highlighting differences and innovations with respect to previous work. Basic foundations of the grid-enabled queries provided by the GRelC DAS, and experimental results related to an international bioinformatics testbed on the GILDA t-Infrastructure, are also reported and discussed.
- Published
- 2008
33. A Grid-Based Bioinformatics Wrapper for Biological Databases
- Author
-
Sandro Fiore, Marco Passante, Maria Mirto, Massimo Cafaro, Giovanni Aloisio, M., Mirto, S., Fiore, Cafaro, Massimo, M., Passante, and Aloisio, Giovanni
- Subjects
Biological data ,Information retrieval ,Database ,Data grid ,Computer science ,Flat file database ,Relational database ,computer.software_genre ,Bioinformatics ,Data warehouse ,Data independence ,Data redundancy ,computer ,Data integration - Abstract
With a growing trend towards grid-based data repositories and data analysis services, scientific data analysis often involves accessing multiple data sources and analyzing the data using a variety of analysis programs. A strictly related critical challenge is the fact that data sources often hold the same type of data in a number of different formats; moreover, the formats expected and generated by various data analysis services are often distinct. In bioinformatics the data are often stored in flat files, so accessing them to retrieve a subset of records determined by constraints is slower than with other approaches such as relational DBMSs. We have developed a data grid system, built on top of specific biological data sources in flat-file format, which carries out their ingestion into a relational DBMS for data integration, reducing the data redundancy present in the biological flat files. In this work, we describe the prototype for the ingestion of the SWISS-2DPAGE flat file into a relational DBMS.
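The flat-file-to-relational ingestion described above can be sketched with an in-memory SQLite database. The record layout below is invented for illustration and is not the actual SWISS-2DPAGE format handled by the system; it only shows why constraint-based retrieval becomes a plain SQL query instead of a file scan.

```python
import sqlite3

# Hypothetical two-letter-tag flat file, loosely inspired by common
# bioinformatics formats; "//" terminates a record.
FLAT_FILE = """\
ID   P12345
DE   Serum albumin
OS   Homo sapiens
//
ID   Q67890
DE   Hemoglobin subunit beta
OS   Homo sapiens
//
"""

def parse_records(text):
    """Yield one dict per flat-file record."""
    rec = {}
    for line in text.splitlines():
        if line.startswith("//"):          # record terminator
            yield rec
            rec = {}
        elif line[:2] in ("ID", "DE", "OS"):
            rec[line[:2]] = line[5:].strip()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE protein (ac TEXT PRIMARY KEY, descr TEXT, organism TEXT)")
db.executemany("INSERT INTO protein VALUES (?, ?, ?)",
               [(r["ID"], r["DE"], r["OS"]) for r in parse_records(FLAT_FILE)])

rows = db.execute(
    "SELECT ac FROM protein WHERE organism = 'Homo sapiens' ORDER BY ac").fetchall()
print(rows)
```

Once the records sit in relational tables, deduplication and cross-source joins (the data integration step the abstract mentions) come essentially for free from the DBMS.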
- Published
- 2008
34. The GRelC Portal: A Ubiquitous and Seamless Way to Manage Grid Databases
- Author
-
E. Verdesca, Sandro Fiore, A. Leone, Salvatore Vadacca, Alessandro Negro, and Giovanni Aloisio
- Subjects
Ubiquitous computing ,Database ,Data grid ,business.industry ,Computer science ,computer.software_genre ,Grid ,World Wide Web ,Metadata ,Data access ,Grid computing ,Web page ,Web application ,business ,computer - Abstract
Grid portals are web gateways aiming at providing a pervasive and ubiquitous access in grid to computational resources, tools, instruments, datasets and metadata via standard Web protocols. Moreover, they provide enhanced problem solving capabilities to deal with modern, large scale scientific and engineering problems. Data grid management systems are becoming increasingly important in the context of the recently adopted service oriented paradigm. The grid relational catalog (GRelC) project is working towards ubiquitous, integrated, seamless and comprehensive grid database management solutions. This paper describes the GRelC Portal, a web based grid-enabled solution for grid-database access, management and integration built on top of the GRelC Data Access Service.
- Published
- 2008
35. iGRelC: A Dashboard Implementation for Grid Environments
- Author
-
Alessandro Negro, Giovanni Aloisio, Sandro Fiore, Salvatore Vadacca, Fiore, Sandro Luigi, Negro, Alessandro, S., Vadacca, and Aloisio, Giovanni
- Subjects
Database ,Distributed database ,Computer science ,business.industry ,Process (engineering) ,Data management ,Dashboard (business) ,computer.software_genre ,Grid ,Grid computing ,Accounting information system ,TeraGrid ,business ,computer - Abstract
Nowadays production grids such as EGEE, TeraGrid and DEISA adopt several tools in order to monitor jobs, check the status of the grid, manage accounting information, etc. However, from the end-user perspective, monitoring the global status of the grid, taking into account machines, networks, services, databases, jobs, etc., is neither straightforward nor uniform. In this paper we present the iGRelC dashboard, an integrated approach able to retrieve, process and display information coming from different data sources (both relational and non-relational) published in the grid by heterogeneous systems and services.
- Published
- 2008
36. Design and Implementation of a Grid Computing Environment for Remote Sensing
- Author
-
Giovanni Aloisio, Italo Epicoco, Massimo Cafaro, Sandro Fiore, Gianvito Quarta, A. PLAZA AND C. CHANG EDS, Cafaro, Massimo, Epicoco, Italo, Quarta, G., Fiore, Sandro Luigi, and Aloisio, Giovanni
- Subjects
Remote Sensing ,Grid computing ,Grid Computing ,Computer science ,Remote sensing (archaeology) ,Real-time computing ,computer.software_genre ,computer - Abstract
This chapter presents an overview of a Grid Computing Environment designed for remote sensing. Combining recent grid computing technologies, concepts related to problem solving environments, and high performance computing, we show how a dynamic Earth Observation system can be designed and implemented, with the goal of managing huge quantities of data coming from space missions and providing their on-demand processing and delivery to final users.
- Published
- 2007
37. High Throughput Protein Similarity Searches in the LIBI Grid Problem Solving Environment
- Author
-
Rita Casadio, Giovanni Aloisio, Ivan Rossi, Maria Mirto, Sandro Fiore, Piero Fariselli, Italo Epicoco, Mirto M., Rossi I., Epicoco I., Fiore S., Fariselli P., Casadio R., Aloisio G., P. Thulasiraman, X. He, T. Li Xu, M. K. Denko, R. K. Thulasiram, L. T. Yang, Mirto, M, Rossi, I, Epicoco, Italo, Fiore, S, Fariselli, P, Casadio, R, and Aloisio, Giovanni
- Subjects
Bioinformatics, Protein Similarity Searches ,Bioinformatics requirements, Complex applications, High computing power, Problem Solving Environment (PSE) ,Computer science ,Scale (chemistry) ,Distributed computing ,Integration platform ,Problem solving environment ,Biological database ,Grid ,Supercomputer ,Throughput (business) - Abstract
Bioinformatics applications are naturally distributed, due to the distribution of the involved data sets, experimental data and biological databases. They require high computing power, owing to the large size of data sets and the complexity of basic computations; may access heterogeneous data, where heterogeneity lies in data format, access policy, distribution, etc.; and require a secure infrastructure, because they may access private data owned by different organizations. The Problem Solving Environment (PSE) is an approach and a technology that can fulfil such bioinformatics requirements. The PSE can be used for the definition and composition of complex applications, hiding programming and configuration details from the user, who can concentrate only on the specific problem. Moreover, Grids can be used for building geographically distributed collaborative problem solving environments, and Grid-aware PSEs can search and use dispersed high performance computing, networking, and data resources. In this work, the PSE solution has been chosen as the integration platform for bioinformatics tools and data sources. In particular, a large-scale multiple sequence alignment experiment, supported by the LIBI PSE, is presented.
- Published
- 2007
38. GRelC Data Storage: A Lightweight Disk Storage Management Solution for Bioinformatics 'in silico' Experiments
- Author
-
Maria Mirto, Sandro Fiore, Massimo Cafaro, Giovanni Aloisio, Fiore, Sandro, Mirto, Maria, Cafaro, Massimo, Aloisio, Giovanni, S., Fiore, and M., Mirto
- Subjects
Bioinformatic, Middleware, Database, Computer science, business.industry, Security of data, Societies and institution, Lightweight disk storage, Information repository, computer.software_genre, Grid, Bioinformatics, Standard, Virtual reality, Shared resource, Data storage equipment, Bioinformatics data, Converged storage, Computer data storage, Publication, Grid energy storage, Disk storage, business, computer - Abstract
Data grids are middleware systems that offer secure, shared storage of massive scientific datasets over wide area networks. In this paper we describe the GRelC Data Storage, a novel grid storage service developed within the Grid Relational Catalog (GRelC) Project. The aim of this service is to manage collections of bioinformatics data from "in silico" experiments on the grid efficiently, securely and transparently, promoting flexible, secure and coordinated storage resource sharing and publication across virtual organizations while taking into account current grid standards and specifications.
- Published
- 2007
39. A services oriented system for bioinformatics applications on the grid
- Author
-
Giovanni, Aloisio, Massimo, Cafaro, Italo, Epicoco, Sandro, Fiore, and Maria, Mirto
- Subjects
Access to Information, Proteomics, Internet, Italy, Computational Biology, Medical Informatics, Problem Solving - Abstract
This paper describes the evolution of the main services of the ProGenGrid (Proteomics and Genomics Grid) system, a distributed and ubiquitous grid environment ("virtual laboratory") based on workflows and supporting the design, execution and monitoring of "in silico" experiments in bioinformatics. ProGenGrid is a Grid-based Problem Solving Environment that allows the composition of data sources and bioinformatics programs wrapped as Web Services (WS). The use of WS provides ease of use and fosters re-use. The resulting workflow of WS is then scheduled on the Grid, leveraging Grid middleware services. In particular, ProGenGrid offers a modular set of services and is currently focused on two important bioinformatics problems: prediction of the secondary structure of proteins, and sequence alignment of proteins. Both services are based on an enhanced data access service.
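The workflow idea sketched in the abstract (services composed into a dependency graph and executed in order) can be illustrated in a few lines of Python; the two step functions below are hypothetical stand-ins for wrapped Web Services, not part of ProGenGrid.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def fetch_sequence(ctx):
    """Stand-in for a data-access service returning a protein sequence."""
    ctx["seq"] = "MKTAYIAK"

def predict_secondary(ctx):
    """Stand-in for a secondary-structure prediction service (all-helix toy)."""
    ctx["ss"] = "H" * len(ctx["seq"])

def run_workflow(services, deps):
    """Execute each service after all of its predecessors, sharing a context."""
    ctx = {}
    for name in TopologicalSorter(deps).static_order():
        services[name](ctx)
    return ctx
```

A real engine would additionally schedule each step on a grid node and monitor its status, as the paper describes.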
- Published
- 2007
40. A Split & Merge Data Management Architecture for a Grid Environment
- Author
-
Massimo Cafaro, Sandro Fiore, Giovanni Aloisio, and Maria Mirto
- Subjects
Data grid, Database, Computer science, business.industry, Distributed computing, Data management, Interoperability, computer.software_genre, Grid, Supercomputer, Grid computing, Web service, business, Space-based architecture, computer - Abstract
Several applications currently produce huge amounts of data and make them available for post-processing in order to infer new knowledge. The main issues for these applications are the need for efficient mechanisms to access the data and for high performance computing to obtain results in an acceptable time. Wrapping the applications as Web services allows interoperability with other tools and, in particular, with grid computing environments, exploiting a large set of resources through a standard interface to support the requirements of so-called "data intensive" applications that handle large amounts of data. This paper presents the architecture of a complex data management system leveraging the grid computing paradigm and exploiting existing middleware developed at the University of Lecce within the ProGenGrid, GRelC and GRB projects to support high throughput applications. The architecture has been specialized for the bioinformatics domain, and a case study based on a biological application is also described.
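As a rough, hypothetical sketch of the split & merge pattern named in the title (the paper's actual architecture is service-based and grid-aware), the two phases can be reduced to:

```python
def split(records, chunk_size):
    """Split phase: cut the input into fixed-size chunks, one per node."""
    return [records[i:i + chunk_size]
            for i in range(0, len(records), chunk_size)]

def merge(partials):
    """Merge phase: concatenate per-chunk results, dropping duplicates
    while preserving first-seen order."""
    seen, out = set(), []
    for part in partials:
        for r in part:
            if r not in seen:
                seen.add(r)
                out.append(r)
    return out
```

In the grid setting each chunk would be shipped to a different node for processing before the merge step collects the partial results.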
- Published
- 2006
41. A web service-based Grid portal for Edgebreaker compression
- Author
-
Giovanni Aloisio, Maria Cristina Barba, Sandro Fiore, Massimo Cafaro, Euro Blasi, Maria Mirto, Aloisio, Giovanni, Barba, Mc, Blasi, E, Cafaro, Massimo, Fiore, S, and Mirto, M.
- Subjects
Databases, Factual; Medical Records Systems, Computerized; Teleradiology; Computer science; Interoperability; Information Storage and Retrieval; Health Informatics; computer.software_genre; Edgebreaker algorithm; Imaging, Three-Dimensional; Health Information Management; Humans; Program Development; Protocol (object-oriented programming); Advanced and Specialized Nursing; Distributed Computing Environment; Internet; Database; business.industry; Grid portal; Object (computer science); Grid; Computational Grid; Visualization; Systems Integration; Radiology Information Systems; Computer architecture; Globus toolkit; Italy; Database Management Systems; The Internet; Web service; business; computer; Web Service; Algorithms - Abstract
Background: In health applications, and elsewhere, 3D data sets are increasingly accessed through the Internet. To reduce the transfer time while keeping the 3D model unaltered, adequate compression and decompression techniques are needed. Recently, Grid technologies have been integrated with Web Services technologies to provide a framework for interoperable application-to-application interaction. Objectives: The paper describes an implementation of the Edgebreaker compression technique exploiting Web services technology and presents a novel approach for using such services in a Grid portal. The Grid portal, developed at the CACT/ISUFI of the University of Lecce, allows the processing and delivery of biomedical images (CT, computerized tomography, and MRI, magnetic resonance images) in a distributed environment, using the power and security of computational Grids. Methods: The Edgebreaker Compression Web Service has been deployed on a Grid portal and allows compressing and decompressing 3D data sets using the Globus Toolkit GSI (Grid Security Infrastructure) protocol. Moreover, the classical algorithm has been modified, extending the compression to files containing more than one object. Results and Conclusions: An implementation of the Edgebreaker compression technique and related experimental results are presented. A novel approach for using the compression Web service in a Grid portal, allowing the storage and preprocessing of huge 3D data sets and the subsequent efficient transmission of results for remote visualization, is also described.
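Edgebreaker encodes the connectivity of a triangle mesh as a string over the alphabet C, L, E, R, S, one symbol per triangle. The full mesh traversal is beyond the scope of this summary, but the prefix-free code commonly paired with the scheme illustrates why it is compact: C, which labels roughly half of the triangles, gets the single-bit codeword, keeping the average cost near 2 bits per triangle. The codeword table below is the textbook assignment, not necessarily the one used in this portal.

```python
# Prefix-free codewords for the five CLERS symbols.
CODE = {"C": "0", "L": "110", "E": "111", "R": "101", "S": "100"}
DECODE = {v: k for k, v in CODE.items()}

def encode_clers(clers):
    """Map a CLERS string to its bit representation."""
    return "".join(CODE[s] for s in clers)

def decode_clers(bits):
    """Invert encode_clers by reading prefix-free codewords."""
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in DECODE:
            out.append(DECODE[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)
```

The code is prefix-free ("0" vs. three-bit words all starting with "1"), so decoding needs no separators.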
- Published
- 2005
42. ProGenGrid: a grid-enabled platform for bioinformatics
- Author
-
Giovanni, Aloisio, Massimo, Cafaro, Sandro, Fiore, and Maria, Mirto
- Subjects
Proteomics, Internet, Italy, Computer Systems, Drug Design, Computational Biology, Humans, Genomics, Information Systems - Abstract
In this paper we describe the ProGenGrid (Proteomics and Genomics Grid) system, developed at the CACT/ISUFI of the University of Lecce, which aims at providing a virtual laboratory where e-scientists can simulate biological experiments: composing existing analysis and visualization tools, monitoring their execution, storing the intermediate and final output and, finally, if needed, saving the model of the experiment for later updating or reproduction. The tools we consider are software components wrapped as Web Services and composed through a workflow. Since bioinformatics applications need high performance machines, or a large number of workstations, to reduce computational time, we exploit a Grid infrastructure to interconnect widespread tools and hardware resources. As an example, we consider some algorithms and tools needed for drug design, providing them as services through easy-to-use Web and Web service interfaces built with the open source gSOAP toolkit; as Grid middleware we use the Globus Toolkit 3.2, exploiting protocols such as GSI and GridFTP.
- Published
- 2005
43. A grid-based architecture for earth observation data access
- Author
-
Sandro Fiore, Massimo Cafaro, Gianvito Quarta, Giovanni Aloisio, Aloisio, Giovanni, Cafaro, Massimo, S., Fiore, and G., Quarta
- Subjects
Service (systems architecture), Earth observation, Data access, Geospatial analysis, Grid computing, Database, Computer science, Architecture, computer.software_genre, Grid, computer - Abstract
A huge quantity of Earth Observation (EO) and geospatial data is produced daily by several organizations. These heterogeneous data are very useful in many scientific, civil, military and industrial applications. Securely and transparently storing, managing and accessing this huge quantity of data, spread over distributed systems, is a challenging problem. Grid computing today offers a way to achieve secure access to geographically dispersed storage and computational resources. In this paper we present the Distributed Earth Observation System Information Service (DEOSIS), a distributed information service developed by CACT/ISUFI at the University of Lecce, which aims at managing and accessing heterogeneous EO and geospatial data sources in a grid environment.
- Published
- 2005
44. iGrid, a Novel Grid Information Service
- Author
-
Giovanni Aloisio, Massimo Cafaro, Silvia Mocavero, Sandro Fiore, Italo Epicoco, Maria Mirto, Daniele Lezzi, Sloot P.M.A.,Hoekstra A.G.,Priol T.,Reinefeld A.,Bubak M., Aloisio, Giovanni, Cafaro, Massimo, Epicoco, Italo, S., Fiore, D., Lezzi, M., Mirto, and S., Mocavero
- Subjects
Service (systems architecture), Database, Grid Computing, Computer science, Dynamic data, Distributed computing, Mutual authentication, computer.software_genre, Grid, Information Service, Scalability, Relational model, Web service, computer - Abstract
In this paper we describe iGrid, a novel Grid Information Service based on the relational model. iGrid is developed within the European GridLab project by the ISUFI Center for Advanced Computational Technologies (CACT) of the University of Lecce, Italy. Among the iGrid requirements are security, decentralized control, support for dynamic data, the ability to handle user- and/or application-supplied information, performance, and scalability. The iGrid Information Service has been specified and carefully designed to meet these requirements.
- Published
- 2005
45. Resource and Service Discovery in the iGrid Information Service
- Author
-
Silvia Mocavero, Giovanni Aloisio, Italo Epicoco, Daniele Lezzi, Massimo Cafaro, Maria Mirto, Sandro Fiore, Gervasi, O, Gavrilova, ML, Kumar, V, Lagana, A, Lee, HP, Mun, Y, Taniar, D, Tan, CJK, Aloisio, Giovanni, Cafaro, Massimo, Epicoco, Italo, S., Fiore, D., Lezzi, M., Mirto, and S., Mocavero
- Subjects
World Wide Web, Service (systems architecture), Computer science, Testbed, Scalability, Service discovery, Information system, Resource management, Web service, computer.software_genre, Grid, computer, Data modeling - Abstract
In this paper we describe the resource and service discovery mechanisms available in iGrid, a novel Grid Information Service based on the relational model. iGrid is developed within the GridLab project by the ISUFI Center for Advanced Computational Technologies (CACT) at the University of Lecce, Italy, and is deployed on the European GridLab testbed. The GridLab Information Service provides fast and secure access to both static and dynamic information through a GSI-enabled web service. Besides publishing system information, iGrid also allows publication of user- or service-supplied information. The adoption of the relational model provides a flexible model for data, while the hierarchical distributed architecture provides scalability and fault tolerance.
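The relational approach to discovery can be illustrated with an in-memory SQLite table. The schema, example rows and `discover` helper below are hypothetical and far simpler than the real iGrid schema, but the query shape (plain SQL over resource tuples) is the point.

```python
import sqlite3

# Miniature stand-in for an information-service resource table.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE resources (
    name TEXT, kind TEXT, host TEXT, free_cpus INTEGER)""")
conn.executemany(
    "INSERT INTO resources VALUES (?, ?, ?, ?)",
    [("blast-svc", "service", "node1.example.org", 0),
     ("cluster-a", "compute", "node2.example.org", 16),
     ("cluster-b", "compute", "node3.example.org", 4)])

def discover(kind, min_cpus=0):
    """Discovery query: resources of a kind with enough free CPUs,
    best endowed first."""
    cur = conn.execute(
        "SELECT name, host FROM resources "
        "WHERE kind = ? AND free_cpus >= ? ORDER BY free_cpus DESC",
        (kind, min_cpus))
    return cur.fetchall()
```

A relational backend makes such filtered, ordered discovery queries trivial to express, which is one motivation the paper gives for the relational model.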
- Published
- 2005
46. Web services for a biomedical imaging portal
- Author
-
Giovanni Aloisio, Daniele Lezzi, Massimo Cafaro, Euro Blasi, Sandro Fiore, Maria Mirto, Aloisio, Giovanni, Cafaro, Massimo, E., Blasi, M., Mirto, S., Fiore, and D., Lezzi
- Subjects
Web standards, medicine.medical_specialty, Web development, business.industry, Computer science, WS-I Basic Profile, computer.software_genre, Web application security, World Wide Web, medicine, Web mapping, Web service, business, computer, Web modeling, Data Web - Abstract
TACWeb (TAC images on the Web) is a Web-based Grid portal, developed at the CACT/ISUFI laboratory of the University of Lecce, for the management of biomedical images in a distributed environment. TACWeb, built on top of the Globus Toolkit, is an interactive environment that deals with complex user requests regarding the acquisition of biomedical data and the processing and delivery of biomedical images, using the power and security of computational grids. Recently, Grid technologies have been integrated with Web services technologies to provide a framework for interoperable application-to-application interaction. In this paper we present an evolution of the TACWeb architecture that is compliant with the Web services approach, together with its main functionalities. In such a system, the basic capabilities are encapsulated and exposed as Web services, allowing the development of new health applications as a composition of such services.
- Published
- 2004
47. The GRelC library: a basic pillar in the grid relational catalog architecture
- Author
-
Massimo Cafaro, Giovanni Aloisio, Sandro Fiore, Maria Mirto, Aloisio, Giovanni, Cafaro, Massimo, S., Fiore, and M., Mirto
- Subjects
SQL, Database, Data grid, Relational database, computer.internet_protocol, Computer science, computer.software_genre, Grid, Technology management, World Wide Web, Grid computing, Middleware (distributed applications), computer, XML, computer.programming_language - Abstract
Today many data grid applications need to manage and process very large amounts of data distributed across multiple grid nodes and stored in relational databases. The Grid Relational Catalog Project (GRelC), developed at the CACT/ISUFI of the University of Lecce, represents an attempt to design and deploy a grid-DBMS for the Globus community. In this paper, after defining the grid-DBMS concept, we describe the GRelC library, which is layered on top of the Globus Toolkit. Users can build client applications on top of it that easily access and interact with data resources.
- Published
- 2004
48. A grid environment for diesel engine chamber optimization
- Author
-
Sandro Fiore, Silvia Mocavero, Euro Blasi, Italo Epicoco, Massimo Cafaro, Giovanni Aloisio, Aloisio, Giovanni, E., Blasi, Cafaro, Massimo, Epicoco, Italo, S., Fiore, and S., Mocavero
- Subjects
Distributed Computing Environment, Computer engineering, Grid computing, Computer science, Real-time computing, Process (computing), Fuel efficiency, Performance improvement, Grid, Diesel engine, computer.software_genre, Global optimization, computer - Abstract
The goal of this paper is to show that computer modelling techniques can be used to solve the real-life problem of improving Diesel engine performance. The purpose of our work is to achieve the lowest emission levels and improved fuel efficiency with respect to the European emission norms. In particular, we are interested in reducing NO + HC and soot emissions and in maximizing the PMI, a pressure proportional to engine power. These parameters also depend on the combustion chamber geometry, so we propose to optimize it using the micro Genetic Algorithm (micro-GA) technique. The idea consists in the automated random generation of many meshes, each representing a different chamber geometry subject to some common geometric constraints, and then in the use of the micro-GA to refine, at each iteration, the results obtained in the previous steps. The innovative feature of our work is the multiobjective nature of the optimization process; this is the main reason for choosing micro-GAs rather than simple Genetic Algorithms. Emission levels and fuel efficiency are evaluated using a modified version of the KIVA3 code that outputs three values, each related to one of the three fitness functions to be maximized. The optimization process involves the execution of many KIVA3 simulations to calculate the fitness values of the chamber geometries considered during the optimization steps. We propose the use of Grid Computing technologies to increase the performance of the KIVA-micro-GA, showing how a distributed environment reduces the computational time needed by the optimization process by taking advantage of the intrinsic parallelism of micro-GAs: their structure allows KIVA3 simulations to be executed simultaneously over the random meshes and over the geometries that populate the micro-population at each iteration. The services offered by the system are the definition of the micro-GA parameters, the submission of the optimization process and the monitoring of the process status. A trusted user can access the implemented services through a grid portal called DESGrid (Grid for Diesel Engine Simulation). The analysis of the results, obtained by executing the KIVA-micro-GA on three ES40 Compaq nodes, each equipped with four processors, shows a good reduction in both emissions and fuel consumption. In the paper we present the numerical values and the related geometry representations obtained after the first steps of the global optimization process.
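The micro-GA loop the abstract describes (a tiny population with elitism, restarted from the elite whenever it converges, with fitness evaluations farmed out in parallel) can be sketched as follows. This is a hedged illustration: `fitness` is a one-bit-counting toy standing in for a full KIVA3 combustion run, and the single objective stands in for the paper's three fitness functions.

```python
import random

POP, GENES = 5, 16  # micro-GA signature: very small population

def fitness(genome):
    """Stand-in for a KIVA3 evaluation (toy: count the 1-bits)."""
    return sum(genome)

def random_genome():
    return [random.randint(0, 1) for _ in range(GENES)]

def tournament(pop, scores):
    """Binary tournament selection."""
    a, b = random.sample(range(len(pop)), 2)
    return pop[a] if scores[a] >= scores[b] else pop[b]

def crossover(p1, p2):
    """Uniform crossover; micro-GAs typically use no mutation."""
    return [random.choice(pair) for pair in zip(p1, p2)]

def converged(pop):
    return all(g == pop[0] for g in pop[1:])

def micro_ga(generations=100):
    pop = [random_genome() for _ in range(POP)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scores = [fitness(g) for g in pop]  # evaluated in parallel on the grid
        elite = max(pop, key=fitness)
        best = max(best, elite, key=fitness)
        pop = [elite] + [crossover(tournament(pop, scores),
                                   tournament(pop, scores))
                         for _ in range(POP - 1)]
        if converged(pop):  # restart: keep the elite, re-seed the rest
            pop = [elite] + [random_genome() for _ in range(POP - 1)]
    return best
```

Because each generation's evaluations are independent, they can be run simultaneously on different nodes, which is exactly the parallelism the paper exploits.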
- Published
- 2004
49. Special Issue on Advances in High Performance Computing and Simulation
- Author
-
Waleed W. Smari, Sandro Fiore, and Mads Nygaard
- Subjects
Computational Theory and Mathematics, Computer Networks and Communications, Computer science, Supercomputer, Software, Computer Science Applications, Theoretical Computer Science - Published
- 2012
50. A semantic grid-based data access and integration service for bioinformatics
- Author
-
Italo Epicoco, Massimo Cafaro, Giovanni Aloisio, Sandro Fiore, Maria Mirto, Aloisio, Giovanni, Cafaro, Massimo, Epicoco, Italo, Sandro, Fiore, and Maria, Mirto
- Subjects
Biological data, Semantic grid, Data access, Workflow, Grid computing, Computer science, Web service, Semantic data model, computer.software_genre, Bioinformatics, computer, Data integration - Abstract
Given the heterogeneous nature of biological data and their intensive use in many tools, in this paper we propose a semantic data access and integration (DAI) service, based on the grid paradigm, for the bioinformatics domain. The service uses ontologies to correlate different data sets. The DAI proposed in this work is a fundamental component of the ProGenGrid system, a grid-enabled platform which aims at the design and implementation of a virtual laboratory where e-scientists can simulate complex "in silico" experiments by composing popular analysis and visualization tools (e.g. BLAST and RasMol), available as Web services, into a workflow. The main goal of the DAI is to provide bioinformatics tools with advanced functionalities and data integration services for heterogeneous biological data banks, such as PDB and Swiss-Prot. A case study of our specialized data access service for locating similar protein sequences is presented.