237 results on '"TeraGrid"'
Search Results
102. Cyberinfrastructure Usage Modalities on the TeraGrid
- Author
-
John-Paul Navarro, Chris Jordan, David Hart, Warren Smith, Von Welch, Nancy Wilkins-Diehr, John Towns, Amit Majumdar, and Daniel S. Katz
- Subjects
Modalities ,Cyberinfrastructure ,ComputerSystemsOrganization_COMPUTERSYSTEMIMPLEMENTATION ,Grid computing ,Computer science ,Component (UML) ,TeraGrid ,Tera ,computer.software_genre ,Grid ,Data science ,computer - Abstract
This paper explains how the TeraGrid would like to measure "usage modalities." We would like to (and are beginning to) measure these modalities to understand what objectives our users are pursuing, how they go about achieving them, and why, so that we can make changes in the TeraGrid to better support them.
- Published
- 2011
103. A simulation framework for reconfigurable processors in large-scale distributed systems
- Author
-
S. Arash Ostadzadeh, Koen Bertels, Stephan Wong, Muhammad Nadeem, and M. Faisal Nadeem
- Subjects
Computer science ,Distributed computing ,Control reconfiguration ,computer.software_genre ,Reconfigurable computing ,Scheduling (computing) ,Grid computing ,Distributed algorithm ,Resource management ,TeraGrid ,computer - Abstract
The inclusion of reconfigurable processors in distributed grid systems promises to offer increased performance without compromising flexibility. Consequently, large-scale distributed grid systems (such as TeraGrid) are utilizing reconfigurable computing resources next to general-purpose processors (GPPs) in their computing nodes. The near-optimal utilization of resources in such distributed systems depends considerably on resource management and application task scheduling. Many state-of-the-art simulators for application scheduling in distributed computing systems have been proposed. However, there is no dedicated simulation framework for studying the behavior of reconfigurable nodes in grids. Incorporating reconfigurable nodes in these systems requires taking into account reconfigurable hardware characteristics, such as area utilization, performance increase, reconfiguration time, and the time to transfer configuration bit streams, execution code, and data. Many of these characteristics are not accounted for by traditional simulators. In this paper, we present a simulation framework for reconfigurable processors in large-scale distributed systems. It is capable of modeling reconfigurable nodes, processor configurations, and tasks in a distributed system. Furthermore, as part of the verification of the framework, we implemented a dynamic task scheduling algorithm with support for scheduling tasks on reconfigurable nodes. A number of experiments with various simulation parameters were conducted. The results show the expected trends, and we present a thorough discussion of them.
- Published
- 2011
- Full Text
- View/download PDF
104. NSF Launches TeraGrid for Academic Research
- Author
-
Jeffrey Mervis
- Subjects
Engineering ,Engineering management ,Multidisciplinary ,business.industry ,TeraGrid ,business - Abstract
ADVANCED COMPUTING: The National Science Foundation (NSF) last week launched what will be the nation's most powerful network for scientific computing. NSF has pledged $53 million to four U.S. research institutions and their commercial partners to build and operate a system expected to be up and running by 2003: the Distributed Terascale Facility, a name taken from its targeted capacity to perform trillions of floating-point operations per second and store hundreds of terabytes of data.
- Published
- 2001
105. Global Dimension of CI: Compete or Collaborate
- Author
-
Arden L. Bement, Jr.
- Subjects
CI Days ,Other Physical Sciences and Mathematics ,Statistics and Probability ,Databases and Information Systems ,purdue university ,Computer Sciences ,Physics ,cyberinfrastructure ,Software Engineering ,collaborative research ,Dr. Arden L. Bement ,research collaboration ,Arden Bement ,TeraGrid ,global policy institute ,GPRI ,national science foundation ,NSF ,Mathematics - Published
- 2010
106. Grid Appliance — On the design of self-organizing, decentralized grids
- Author
-
Arjun Prakash, Renato Figueiredo, and David Isaac Wolinsky
- Subjects
Semantic grid ,Research groups ,Grid computing ,Computer science ,Software deployment ,Distributed computing ,Middleware ,Resource management ,TeraGrid ,computer.software_genre ,Grid ,computer ,Scheduling (computing) - Abstract
“Give a man a fish, feed him for a day. Teach a man to fish, feed him for a lifetime” — Lao Tzu. Grid computing projects such as TeraGrid [1], Grid'5000 [2], and Open Science Grid [3] provide researchers access to vast amounts of compute resources, but in doing so require researchers to adapt their workloads to the environments these systems provide. Researchers do not have many alternatives, as creating systems of this type involves the coordination of distributed systems and expertise in networking, operating systems, security, and grid middleware. As a result, many research groups create small, in-house compute clusters where scheduling is often ad hoc, limiting effective resource utilization. To address these challenges we present the “Grid Appliance,” which enables researchers to seamlessly deploy, extend, and share their systems both locally and across network domains, for both small and large scale computing grids. This paper details the design of the Grid Appliance and reports on experiences and lessons learned over four years of development and deployment involving wide-area grids.
- Published
- 2010
107. Scheduling a 100,000 Core Supercomputer for Maximum Utilization and Capability
- Author
-
Victor Hazlewood, Troy Baer, Patricia Kovatch, and Phil Andrews
- Subjects
Computer science ,Operating system ,Processor scheduling ,TeraGrid ,Parallel computing ,computer.software_genre ,Supercomputer ,computer ,Cray XT5 ,Scheduling (computing) - Abstract
In late 2009, the National Institute for Computational Sciences placed in production the world’s fastest academic supercomputer (third overall), a Cray XT5 named Kraken, with almost 100,000 compute cores and a peak speed in excess of one Petaflop. Delivering over 50% of the total cycles available to National Science Foundation users via the TeraGrid, Kraken has two missions that have historically proven difficult to reconcile simultaneously: providing the maximum number of total cycles to the community, while enabling full-machine runs for “hero” users. Historically, this has been attempted by allowing schedulers to choose the correct time for the beginning of large jobs, with a concomitant reduction in utilization. At NICS, we used the results of a previous theoretical investigation to adopt a different approach, in which a “clearing out” of the system is forced on a weekly basis, followed by consecutive full-machine runs. As our previous simulation results suggested, this led to a significant improvement in utilization, to over 90%. The difference in utilization between the traditional and adopted scheduling policies was the equivalent of a 300+ Teraflop supercomputer, or several million dollars of compute time per year.
- Published
- 2010
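The "300+ Teraflop equivalent" claim in entry 107 can be checked with simple arithmetic. The sketch below uses the abstract's ~1 Petaflop peak; the 60% baseline utilization is an assumption chosen to make the arithmetic concrete, not a figure from the paper.

```python
# Back-of-envelope check of the utilization claim in entry 107.
# PEAK_TFLOPS comes from the abstract ("in excess of one Petaflop");
# baseline_util is an assumed figure for the traditional draining policy.
PEAK_TFLOPS = 1030          # Kraken peak, in Teraflop/s
baseline_util = 0.60        # assumed utilization under traditional draining
adopted_util = 0.90         # "over 90%" with forced weekly clear-outs

# Extra delivered capacity, expressed as an equivalent dedicated machine.
extra_tflops = (adopted_util - baseline_util) * PEAK_TFLOPS
print(f"equivalent extra machine: {extra_tflops:.0f} Tflop/s")
```

A 30-point utilization gain on a ~1 Pflop machine is indeed on the order of a 300 Tflop system delivered continuously.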
108. Cyberaide onServe: Software as a Service on Production Grids
- Author
-
Wolfgang Karl, David Kramer, Gregor von Laszewski, Tobias Kurze, Jie Tao, Lizhe Wang, and Marcel Kunze
- Subjects
Virtual appliance ,Computer science ,business.industry ,Software as a service ,Cloud computing ,computer.software_genre ,Grid ,Grid computing ,Middleware ,Middleware (distributed applications) ,Operating system ,TeraGrid ,Web service ,business ,computer - Abstract
The Software as a Service (SaaS) methodology is a key paradigm of Cloud computing. In this paper, we focus on implementing a Cloud computing capability, the SaaS model, on existing production Grid infrastructures. In general, production Grids employ a Job-Submission-Execution (JSE) model with rigid access interfaces. We develop Cyberaide onServe, a lightweight middleware packaged as a virtual appliance, which implements the SaaS methodology on production Grids by translating the SaaS model into the JSE model. The Cyberaide onServe virtual appliance is deployed on demand, hosts applications as Web services, accepts Web service invocations, and executes them on production Grids. We have deployed Cyberaide onServe on the TeraGrid infrastructure, and test results show that it can provide SaaS functionality with good performance.
- Published
- 2010
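The SaaS-to-JSE translation in entry 108 amounts to rewriting a web-service invocation as a batch job description. A minimal sketch, assuming hypothetical names throughout (`Invocation`, `JobSpec`, `translate`, and the service table are illustrative, not Cyberaide APIs):

```python
# Sketch of translating a SaaS-style call into a Job-Submission-Execution
# (JSE) job record, as entry 108 describes. All names here are invented
# for illustration; they are not part of Cyberaide onServe.
from dataclasses import dataclass

@dataclass
class Invocation:            # what a Web-service client sends
    service: str
    args: dict

@dataclass
class JobSpec:               # what a production Grid's JSE model accepts
    executable: str
    arguments: list
    stdout: str

# Hypothetical registry mapping hosted services to Grid executables.
SERVICE_TABLE = {"blast": "/apps/blast/bin/blastall"}

def translate(inv: Invocation) -> JobSpec:
    """Rewrite a SaaS invocation as a batch job description."""
    exe = SERVICE_TABLE[inv.service]
    argv = [f"--{k}={v}" for k, v in sorted(inv.args.items())]
    return JobSpec(executable=exe, arguments=argv, stdout=f"{inv.service}.out")

job = translate(Invocation("blast", {"db": "nr", "evalue": "1e-5"}))
print(job.executable, job.arguments)
```

The real middleware would additionally stage data, submit the `JobSpec` through the Grid's job manager, and return results through the Web-service response.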
109. nanoHUB.org Serving Over 120,000 Users Worldwide: Its First Cyber-Environment Assessment
- Author
-
George B. Adams, Swaroop Shivarajapura, Diane L. Beaudoin, Krishna Madhavan, and Gerhard Klimeck
- Subjects
Open science ,Computer science ,Virtual organization ,assessment ,computer.software_genre ,Grid ,Nanoscience and Nanotechnology ,World Wide Web ,nanoHUB.org ,Grid computing ,Leverage (negotiation) ,cyber-environment ,TeraGrid ,Computer aided instruction ,computer - Abstract
nanoHUB.org is a major engineering cyber-environment that annually supports over 120,000 users with online simulation and more. Over 8,500 nanoscale engineering and science researchers, educators, and learners run over 340,000 simulations with over 170 simulation tools annually. These tools allow them to transparently and interactively leverage computational resources ranging from small jobs to massive simulations that execute on the TeraGrid or the Open Science Grid (OSG). In this paper, we provide some background on the working of nanoHUB as a virtual organization and a cyber-environment and describe its growth pattern, focusing on the mechanisms that allow a community to form around it.
- Published
- 2010
110. Open grid computing environments
- Author
-
Raminder Singh, Marlon Pierce, Suresh Marru, Archit Kulshrestha, and Karthik Narayna Muthuraman
- Subjects
Flexibility (engineering) ,business.industry ,Computer science ,Gateway (computer program) ,computer.software_genre ,Job management ,World Wide Web ,Workflow ,Software ,Grid computing ,TeraGrid ,Software engineering ,business ,computer - Abstract
We describe three case studies for providing advanced support for TeraGrid Science Gateways as part of our participation in the Advanced User Support (AUS) team. These case studies include providing workflow support, robust job management, and mass job submission to existing gateways targeting computational chemistry, biophysics, and bioinformatics, respectively. Selected tools from the Open Grid Computing Environments and other projects were used, demonstrating the need for flexibility when integrating tools from multiple software providers into specific gateways' software stacks.
- Published
- 2010
111. Accelerating data-intensive science with Gordon and Dash
- Author
-
Michael L. Norman and Allan Snavely
- Subjects
Software ,Shared virtual memory ,Resource (project management) ,ComputerSystemsOrganization_COMPUTERSYSTEMIMPLEMENTATION ,Computer science ,business.industry ,Dash ,Operating system ,TeraGrid ,Architecture ,business ,computer.software_genre ,computer - Abstract
In 2011 SDSC will deploy Gordon, an HPC architecture specifically designed for data-intensive applications. We describe the Gordon architecture and the thinking behind the design choices by considering the needs of two targeted application classes: massive database/data mining and data-intensive predictive science simulations. Gordon employs two technologies that have not been incorporated into HPC systems heretofore: flash SSD memory, and virtual shared memory software. We report on application speedups obtained with a working prototype of Gordon in production at SDSC called Dash, currently available as a TeraGrid resource.
- Published
- 2010
112. TeraGrid Science Gateway AAAA Model
- Author
-
Von Welch, Jim Basney, and Nancy Wilkins-Diehr
- Subjects
World Wide Web ,Computer science ,Public key infrastructure ,TeraGrid ,Model implementation ,Science gateway - Abstract
In this paper, we present our experience implementing on the TeraGrid the "Science Gateway AAAA Model" we proposed in our 2005 paper. We describe how we have modified the model based on our experiences, the details of our implementation, an update on the open issues we identified in our paper, and our lessons learned.
- Published
- 2010
113. A compelling case for a centralized filesystem on the TeraGrid
- Author
-
Scott Michael, W. B. Breckenridge, Matthew R. Link, Stephen C. Simms, and Roger Smith
- Subjects
Software_OPERATINGSYSTEMS ,ComputerSystemsOrganization_COMPUTERSYSTEMIMPLEMENTATION ,Database ,Computer science ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,computer.software_genre ,Workflow ,Wide area network ,Data_FILES ,Operating system ,Statistical analysis ,Lustre (file system) ,TeraGrid ,computer - Abstract
In this article we explore the utility of a centralized filesystem provided by the TeraGrid to both TeraGrid and non-TeraGrid sites. We highlight several common cases in which such a filesystem would be useful in obtaining scientific insight. We present results from a test case using Indiana University's Data Capacitor over the wide area network as a central filesystem for simulation data generated at multiple TeraGrid sites and analyzed at Mississippi State University. A statistical analysis of the I/O patterns and rates, via detailed trace records generated with VampirTrace, is provided for both the Data Capacitor and a local Lustre filesystem. The benefits of a centralized filesystem and potential hurdles in adopting such a system for both TeraGrid and non-TeraGrid sites are discussed.
- Published
- 2010
114. Enabling Lustre WAN for production use on the TeraGrid
- Author
-
Joshua Walgenbach, Justin P. Miller, Stephen C. Simms, and Kit Westneat
- Subjects
Software_OPERATINGSYSTEMS ,business.industry ,Computer science ,Permission ,computer.software_genre ,Visualization ,Software ,Wide area ,Operating system ,Lustre (file system) ,TeraGrid ,business ,computer ,Research data - Abstract
The Indiana University Data Capacitor wide area Lustre file system provides over 350 TB of short- to mid-term storage of large research data sets. It spans multiple geographically distributed compute, storage, and visualization resources. In order to effectively harness the power of these resources from various institutions, it has been necessary to develop software to keep ownership and permission data consistent across many client mounts. This paper describes the Data Capacitor's Lustre WAN service and the history, development, and implementation of IU's UID mapping scheme that enables Lustre WAN on the TeraGrid.
- Published
- 2010
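The consistency problem entry 114 addresses is that the same person holds different numeric UIDs at different sites, so a WAN filesystem must translate client UIDs into one canonical identity. A minimal sketch, assuming invented tables and a helper (`SITE_MAPS`, `CANONICAL_UID`, `map_uid` are illustrative; this is not IU's actual mapping scheme):

```python
# Sketch of cross-site UID mapping for a WAN-mounted filesystem, in the
# spirit of entry 114. Tables and names are invented for illustration.

# (site, local uid) -> canonical username known to the file server
SITE_MAPS = {
    "iu":  {5001: "asmith", 5002: "bjones"},
    "psc": {811: "asmith"},          # same person, different local UID
}
CANONICAL_UID = {"asmith": 70001, "bjones": 70002}

def map_uid(site: str, local_uid: int) -> int:
    """Translate a client-side UID into the server's canonical UID."""
    user = SITE_MAPS[site][local_uid]
    return CANONICAL_UID[user]

# The same user resolves to one canonical UID from either site, so file
# ownership recorded on the server stays consistent across all mounts.
assert map_uid("iu", 5001) == map_uid("psc", 811) == 70001
```

In production such a map must be kept synchronized as accounts are created and retired, which is the software problem the abstract alludes to.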
115. Kerberized Lustre 2.0 over the WAN
- Author
-
Josephine Palencia, Robert Budden, and Kevin Sullivan
- Subjects
World Wide Web ,computer.internet_protocol ,Computer science ,Operating system ,Lustre (file system) ,TeraGrid ,Kerberos ,Grid ,computer.software_genre ,computer ,Naval research - Abstract
In this paper, we describe our current implementation of Kerberized Lustre 2.0 over the WAN with partners from the TeraGrid (SDSC), the Naval Research Lab, and the Open Science Grid (University of Florida). After establishing several single Kerberos realms, we enable the distributed OSTs over the WAN, create local OST pools, and perform Kerberized data transfers between local and remote sites. To expand access to the Lustre filesystem, we also describe our efforts toward cross-realm authentication and the integration of Lustre 2.0 with Kerberos-enabled NFSv4.
- Published
- 2010
116. DASH-IO
- Author
-
Jeffrey Bennett, Jiahua He, and Allan Snavely
- Subjects
Distributed shared memory ,Hardware_MEMORYSTRUCTURES ,Software ,Empirical research ,business.industry ,Computer science ,Embedded system ,Dash ,Solid-state ,TeraGrid ,Latency (engineering) ,Architecture ,business - Abstract
HPC applications are becoming more and more data-intensive as a function of ever-growing simulation sizes and burgeoning data acquisition. Unfortunately, the storage hierarchy of existing HPC architectures has a 5-order-of-magnitude latency gap between main memory and spinning disks and cannot respond well to the new data challenge. Flash-based SSDs (Solid State Disks) promise to fill the gap with their 2-order-of-magnitude lower latency. However, since all existing hardware and software were designed without flash in mind, the question is how to integrate the new technology into existing architectures. DASH is a new TeraGrid resource that aggressively leverages flash technology (and also distributed shared memory technology) to fill the latency gap. To explore the potential and issues of integrating flash into today's HPC systems, we swept a large parameter space with fast and reliable measurements to investigate varying design options. We provide some lessons learned and suggestions for future architecture design. Our results show that performance can be improved by 9x with appropriate existing technologies, and probably further improved by future ones.
- Published
- 2010
117. TeraGrid resource selection tools
- Author
-
Kenneth Yoshimoto and Subhashini Sivagnanam
- Subjects
Resource (project management) ,Database ,Computer science ,TeraGrid ,computer.software_genre ,Grid ,computer ,Selection (genetic algorithm) ,Test (assessment) - Abstract
On a grid of computers, users often must decide between individual machines for job submission. Usually, the goal is to minimize time-to-completion. Several tools are available on TeraGrid to help users make this decision. In this paper, we use these tools to perform actual job submissions on TeraGrid machines. We evaluate the time-to-job-start effectiveness of these tools.
- Published
- 2010
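The decision entry 117 evaluates, choosing the machine that minimizes time-to-completion, reduces to comparing per-machine estimates of queue wait plus runtime. A minimal sketch with made-up numbers (the machine names and estimates are illustrative, not results from the paper):

```python
# Sketch of grid resource selection by estimated time-to-completion,
# in the spirit of entry 117. All estimates are invented.
estimates = {
    # machine: (estimated queue wait, estimated runtime), in hours
    "abe":    (6.0, 4.0),
    "ranger": (1.5, 6.0),
    "kraken": (3.0, 3.5),
}

def best_machine(est):
    """Pick the machine with the smallest wait + runtime sum."""
    return min(est, key=lambda m: sum(est[m]))

print(best_machine(estimates))   # kraken: 6.5 h total, the minimum
```

The hard part in practice, and what the paper's tools address, is producing the wait-time estimates, since queue behavior varies widely across TeraGrid machines.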
118. Bringing high performance climate modeling into the classroom
- Author
-
Carol Song, Matthew Huber, Lan Zhao, Wonjun Lee, and Aaron Goldner
- Subjects
Class (computer programming) ,Multimedia ,Computer science ,Climate change ,Supercomputer ,computer.software_genre ,Engineering management ,ComputingMilieux_COMPUTERSANDEDUCATION ,Community Climate System Model ,Climate model ,TeraGrid ,Architecture ,User interface ,computer - Abstract
Climate science educators face great challenges in combining theory with hands-on practice when teaching climate modeling. Typical model runs require large computation and storage resources that may not be available on a campus. Additionally, the training and support required to bring novices up to speed would consume significant class time. The same challenges exist across many other science and engineering disciplines. The TeraGrid science gateway program is leading the way to a new paradigm for addressing such challenges. As part of the TeraGrid science gateway initiative, the Purdue CCSM portal aims to assist both research and education users in running Community Climate System Model (CCSM) simulations using TeraGrid high performance computing resources. It provides a one-stop shop for creating, configuring, and running CCSM simulations, as well as managing jobs and processing output data. The CCSM portal was used in a Purdue graduate class to give students hands-on experience running world-class climate simulations and using the results to study the impact of climate change on political policy. The CCSM portal is based on a service-oriented architecture with multiple interfaces to facilitate training. This paper describes the design of the CCSM portal with the goal of supporting classroom users, the challenges of utilizing the portal in a classroom setting, and the solutions implemented. We present two student projects from the fall 2009 class that successfully used the CCSM portal.
- Published
- 2010
119. Quantifying performance benefits of overlap using MPI-2 in a seismic modeling application
- Author
-
Karen Tomko, Amitava Majumdar, Ping Lai, Sayantan Sur, Dhabhaleswar K. Panda, Karl Schulz, Mahidhar Tatineni, William L. Barth, Y. Cui, and Sreeram Potluri
- Subjects
Communication design ,Remote direct memory access ,Semantics (computer science) ,Computer science ,Computation ,Process (computing) ,Code (cryptography) ,InfiniBand ,TeraGrid ,Parallel computing - Abstract
AWM-Olsen is a widely used ground motion simulation code based on a parallel finite difference solution of the 3-D velocity-stress wave equation. This application runs on tens of thousands of cores and consumes several million CPU hours on the TeraGrid Clusters every year. A significant portion of its run-time (37% in a 4,096 process run), is spent in MPI communication routines. Hence, it demands an optimized communication design coupled with a low-latency, high-bandwidth network and an efficient communication subsystem for good performance. In this paper, we analyze the performance bottlenecks of the application with regard to the time spent in MPI communication calls. We find that much of this time can be overlapped with computation using MPI non-blocking calls. We use both two-sided and MPI-2 one-sided communication semantics to re-design the communication in AWM-Olsen. We find that with our new design, using MPI-2 one-sided communication semantics, the entire application can be sped up by 12% at 4K processes and by 10% at 8K processes on a state-of-the-art InfiniBand cluster, Ranger at the Texas Advanced Computing Center (TACC).
- Published
- 2010
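The restructuring entry 119 describes follows a standard pattern: initiate the halo exchange, update interior points (which need no halo data) while it proceeds, and block only before the boundary update. The sketch below stands in for MPI_Isend/MPI_Irecv/MPI_Wait with a thread-backed future; it illustrates the pattern on a toy 1-D stencil and is not the AWM-Olsen code.

```python
# Sketch of communication/computation overlap (entry 119): the
# non-blocking order produces the same answer as the blocking order,
# but the wait happens only where the halo data is actually needed.
from concurrent.futures import ThreadPoolExecutor

def fetch_halo():
    # stands in for a non-blocking receive of neighbour boundary data
    return [0.0, 0.0]

def update(interior, halo):
    """One Jacobi-style averaging sweep over a padded 1-D array."""
    left, right = halo
    padded = [left] + interior + [right]
    return [(padded[i - 1] + padded[i + 1]) / 2
            for i in range(1, len(padded) - 1)]

interior = [1.0, 2.0, 3.0, 4.0]

with ThreadPoolExecutor(max_workers=1) as pool:
    req = pool.submit(fetch_halo)          # "MPI_Irecv": start the exchange
    # ...interior-only computation would overlap with the transfer here...
    halo = req.result()                    # "MPI_Wait": block only at the end

overlapped = update(interior, halo)
blocking = update(interior, fetch_halo())  # the original blocking order
assert overlapped == blocking              # same result, latency hidden
```

In the real code the overlapped region is the bulk of the stencil update, which is why hiding 37% communication time yields the reported 10-12% overall speedup.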
120. Federated login to TeraGrid
- Author
-
Von Welch, Jim Basney, and Terry Fleury
- Subjects
Software_OPERATINGSYSTEMS ,ComputerSystemsOrganization_COMPUTERSYSTEMIMPLEMENTATION ,Computer science ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Public key infrastructure ,Login ,computer.software_genre ,Computer security ,Shibboleth ,Data resources ,World Wide Web ,ComputingMilieux_MANAGEMENTOFCOMPUTINGANDINFORMATIONSYSTEMS ,Cyberinfrastructure ,Grid computing ,ComputingMilieux_COMPUTERSANDSOCIETY ,Systems design ,TeraGrid ,computer - Abstract
We present a new federated login capability for the TeraGrid, currently the world's largest and most comprehensive distributed cyberinfrastructure for open scientific research. Federated login enables TeraGrid users to authenticate using their home organization credentials for secure access to TeraGrid high performance computers, data resources, and high-end experimental facilities. Our novel system design links TeraGrid identities with campus identities and bridges from SAML to PKI credentials to meet the requirements of the TeraGrid environment.
- Published
- 2010
121. High-fidelity real-time simulation on deployed platforms
- Author
-
D. B. P. Huynh, David J. Knezevic, Anthony T. Patera, John W. Peterson, Massachusetts Institute of Technology. Department of Mechanical Engineering, Huynh, Dinh Bao Phuong, Knezevic, David, and Patera, Anthony T.
- Subjects
Product design specification ,High fidelity ,General Computer Science ,Real-time simulation ,Computer science ,Adaptive system ,General Engineering ,TeraGrid ,Android (operating system) ,Supercomputer ,Algorithm ,Visualization ,Computational science - Abstract
We present a certified reduced basis method for high-fidelity real-time solution of parametrized partial differential equations on deployed platforms. Applications include in situ parameter estimation, adaptive design and control, interactive synthesis and visualization, and individuated product specification. We emphasize a new hierarchical architecture particularly well suited to the reduced basis computational paradigm: the expensive Offline stage is conducted pre-deployment on a parallel supercomputer (in our examples, the TeraGrid machine Ranger); the inexpensive Online stage is conducted “in the field” on ubiquitous thin/inexpensive platforms such as laptops, tablets, smartphones (in our examples, the Nexus One Android-based phone), or embedded chips. We illustrate our approach with three examples: a two-dimensional Helmholtz acoustics “horn” problem; a three-dimensional transient heat conduction “Swiss Cheese” problem; and a three-dimensional unsteady incompressible Navier–Stokes low-Reynolds-number “eddy-promoter” problem. Funding: United States Air Force Office of Scientific Research (Grant FA9550-07-1-0425; OSD Grant FA9550-09-1-0613); National Science Foundation (University of Texas at Austin, Texas Advanced Computing Center Grant TG-ASC100016).
- Published
- 2010
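The offline/online split in entry 121 can be sketched in the standard reduced-basis form; the notation below is generic (assumed, not taken from the paper). Offline, a supercomputer computes the basis functions and parameter-independent operators; online, a phone only evaluates scalar coefficients and solves a tiny system.

```latex
% Generic reduced-basis sketch (standard notation, not from the paper).
% Offline: compute snapshots, the basis {\zeta_i}, and the
%   parameter-independent matrices A_q and load vector f_N.
% Online: for each new parameter \mu, assemble and solve an N x N
%   system whose cost is independent of the fine-grid dimension.
u_N(\mu) \;=\; \sum_{i=1}^{N} c_i(\mu)\,\zeta_i,
\qquad
\Big(\sum_{q=1}^{Q} \Theta_q(\mu)\, A_q\Big)\, c(\mu) \;=\; f_N .
```

Because N is typically tens rather than millions, the online solve fits comfortably on the thin deployed platforms the abstract names.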
122. Benchmarking Parallel I/O Performance for a Large Scale Scientific Application on the TeraGrid
- Author
-
Frank Löffler, Erik Schnetter, Gabrielle Allen, and Jian Tao
- Subjects
Scale (ratio) ,Computer science ,Scalability ,Bandwidth (computing) ,computer.file_format ,TeraGrid ,Benchmarking ,Parallel computing ,Hierarchical Data Format ,computer ,Parallel I/O - Abstract
This paper reports on experiences benchmarking I/O performance with a large scale scientific application on leading computational facilities of the NSF TeraGrid network. Instead of focusing only on the raw file I/O bandwidth provided by different machine architectures, we test the I/O performance and scalability of the computational tools and libraries used in current production simulations as a whole, with a focus mostly on bulk transfers. The I/O performance of our production code scales very well, but at some point is limited by the I/O system itself. This limit is reached at a low percentage of the computational size of the machines, which shows that, at least for the application used in this paper, the I/O system can be an important limiting factor in scaling up to the full size of the machine.
- Published
- 2010
123. Performance evaluation of Gfarm and GPFS-WAN in data grid environment
- Author
-
Hui-Shan Chen, Chia-Yen Liu, and Kuo-Yang Cheng
- Subjects
File system ,Data grid ,Computer science ,business.industry ,Grid file ,computer.software_genre ,Distributed data store ,Computer data storage ,Operating system ,Lustre (file system) ,TeraGrid ,business ,computer ,Data transmission - Abstract
Network technologies and storage devices are developing quickly, and the hardware is inexpensive, so users have access to large amounts of storage space; a Data Grid can collect distributed storage devices and share them among users. In this paper, we discuss several file systems: NFS, SRB, iRODS, Lustre, Gfarm, and GPFS-WAN. In a Data Grid, the file system chosen to administer the storage devices must offer scalability, security, and stability. We select two file systems, Gfarm and GPFS-WAN, as our evaluation targets; they are used to build the large-scale dataset storage systems of PRAGMA and TeraGrid, respectively. We use each to create a Data Grid environment and evaluate its data transmission performance. Our results show that GPFS-WAN's data transmission performance is better than Gfarm's. Data transmission, however, is only one factor in a Data Grid environment; other properties, such as replication support, open-source availability, and ease of installation and maintenance, are also important for a good Data Grid environment and are discussed in this paper.
- Published
- 2010
124. Science on the TeraGrid
- Author
-
Daniel S. Katz, Scott Callaghan, Robert Harkness, Shantenu Jha, Krzysztof Kurowski, Steven Manos, Sudhakar Pamidighantam, Marlon Pierce, Beth Plale, Carol Song, and John Towns
- Subjects
Grid computing ,Computer science ,e-Science ,General Medicine ,TeraGrid ,computer.software_genre ,Supercomputer ,computer ,Data science ,Computational science - Published
- 2010
- Full Text
- View/download PDF
125. AMP: A Science-driven Web-based Application for the TeraGrid
- Author
-
Matthew Woitaszek, Travis S. Metcalfe, and Ian Shorrock
- Subjects
FOS: Computer and information sciences ,Web development ,Computer science ,FOS: Physical sciences ,Web application ,Orchestration (computing) ,Instrumentation and Methods for Astrophysics (astro-ph.IM) ,Solar and Stellar Astrophysics (astro-ph.SR) ,computer.programming_language ,business.industry ,Python (programming language) ,Astrophysics - Solar and Stellar Astrophysics ,Computer Science - Distributed, Parallel, and Cluster Computing ,Software deployment ,TeraGrid ,Distributed, Parallel, and Cluster Computing (cs.DC) ,User interface ,business ,Software engineering ,Astrophysics - Instrumentation and Methods for Astrophysics ,computer - Abstract
The Asteroseismic Modeling Portal (AMP) provides a web-based interface for astronomers to run and view simulations that derive the properties of Sun-like stars from observations of their pulsation frequencies. In this paper, we describe the architecture and implementation of AMP, highlighting the lightweight design principles and tools used to produce a functional, fully custom web-based science application in less than a year. Targeted as a TeraGrid science gateway, AMP's architecture and implementation are intended to simplify its orchestration of TeraGrid computational resources. AMP's web-based interface was developed as a traditional standalone database-backed web application using the Python-based Django web development framework, allowing us to leverage the Django framework's capabilities while cleanly separating the user interface development from the grid interface development. We have found this combination of tools flexible and effective for rapid gateway development and deployment. (7 pages, 2 figures; in Proceedings of the 5th Grid Computing Environments Workshop.)
- Published
- 2010
- Full Text
- View/download PDF
126. Workflow-Based High Performance Data Transfer and Ingestion to Support Petascale Simulations on TeraGrid
- Author
-
Clark C. Guest, Yifeng Cui, Philip Maechling, Jun Zhou, and Sashka Davis
- Subjects
Petascale computing ,Workflow ,Grid computing ,Computer science ,Operating system ,TeraGrid ,computer.software_genre ,Supercomputer ,computer ,Pipeline (software) ,Computational science ,Data modeling ,Electronic data interchange - Abstract
We report on a high performance data transfer and ingestion design, carried out in a scientific workflow project to support Southern California Earthquake Center (SCEC) petascale simulations on TeraGrid (TG), that uses grid resources to pipeline data pre- and post-processing within the workflow. We develop an enhanced prototype framework that brings together the Globus Toolkit and advanced MPI batch jobs for reliable and efficient data transfer between heterogeneous supercomputer clusters on TG. The framework automates the whole transfer process without human intervention and can recover automatically from any failures during the transfers. We also examine optimization approaches for ingesting simulation data into the iRODS (Integrated Rule-Oriented Data System) digital library. The average transfer rate from TACC Ranger to iRODS reaches 133 MB/s, five times faster than conventional methods. Experiments performed on TG clusters demonstrate that these concurrent data transfer and ingestion mechanisms can shorten the processing time of the scientific workflow and significantly reduce the load as well.
- Published
- 2010
127. Efficient Runtime Environment for Coupled Multi-physics Simulations: Dynamic Resource Allocation and Load-Balancing
- Author
-
Abhinav Thota, Nayong Kim, Soon-Heum Ko, Shantenu Jha, and Joohyun Kim
- Subjects
Grid computing ,Computer science ,Distributed computing ,Load modeling ,Dynamic priority scheduling ,TeraGrid ,Load balancing (computing) ,computer.software_genre ,computer ,Submission requirement ,Dynamic resource ,Scheduling (computing) - Abstract
Coupled multi-physics simulations, such as hybrid CFD-MD simulations, represent an increasingly important class of scientific applications. Often the physical problems of interest demand the use of high-end computers, such as TeraGrid resources, which are often accessible only via batch queues. Batch-queue systems were not designed to natively support the coordinated scheduling of jobs, which in turn is required for the concurrent execution that coupled multi-physics simulations demand. In this paper we develop and demonstrate a novel approach to overcome this lack of native support for the coordinated job submission required by coupled runs. We establish the performance advantages arising from our solution, which is a generalization of the Pilot-Job concept; the concept itself is not new, but it is being applied to coupled simulations for the first time. Our solution not only overcomes the initial co-scheduling problem, but also provides a dynamic resource allocation mechanism. Support for such dynamic resources is critical for a load-balancing mechanism, which we develop and demonstrate to be effective at reducing the total time-to-solution of the problem. We establish that the performance advantage of using Big Jobs is invariant with the size of the machine as well as the size of the physical model under investigation. The Pilot-Job abstraction is developed using SAGA, which provides an infrastructure-agnostic implementation that can seamlessly execute on and utilize distributed resources.
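To make the co-scheduling point concrete: a Pilot-Job submits one container job to the batch queue and then places the coupled components inside that allocation itself, so both components are guaranteed to start together, and cores can later be shifted between them for load balancing. The following is a toy illustration of that idea (not the authors' implementation; class and task names are invented):

```python
# Minimal Pilot-Job sketch for coupled runs: one batch-queue entry, with
# internal placement and rebalancing of the coupled components.

class PilotJob:
    def __init__(self, total_cores):
        self.total_cores = total_cores
        self.free_cores = total_cores
        self.tasks = []

    def place(self, name, cores):
        """Place a component inside the already-acquired allocation."""
        if cores > self.free_cores:
            raise RuntimeError("pilot too small for task " + name)
        self.free_cores -= cores
        self.tasks.append((name, cores))

    def rebalance(self, name, new_cores):
        """Dynamic resource allocation: resize one component's share."""
        for i, (n, c) in enumerate(self.tasks):
            if n == name:
                if new_cores - c > self.free_cores:
                    raise RuntimeError("not enough free cores")
                self.free_cores -= new_cores - c
                self.tasks[i] = (n, new_cores)
                return
        raise KeyError(name)

pilot = PilotJob(total_cores=256)   # a single entry in the batch queue
pilot.place("CFD", 192)             # coupled components start together,
pilot.place("MD", 32)               # no coordinated co-scheduling needed
pilot.rebalance("MD", 64)           # load balancing: shift cores to MD
```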
- Published
- 2010
128. Simplifying Complex Software Assembly: The Component Retrieval Language and Implementation
- Author
-
Erik Schnetter, Steven R. Brandt, Gabrielle Allen, Frank Löffler, and Eric L. Seidel
- Subjects
FOS: Computer and information sciences ,Source code ,Computer science ,media_common.quotation_subject ,02 engineering and technology ,computer.software_genre ,01 natural sciences ,Field (computer science) ,Computer Science - Software Engineering ,Software ,0103 physical sciences ,Web page ,0202 electrical engineering, electronic engineering, information engineering ,D.2.7 ,D.3.2 ,media_common ,Computer Science - Programming Languages ,010308 nuclear & particles physics ,business.industry ,Simulation software ,Software Engineering (cs.SE) ,Component-based software engineering ,020201 artificial intelligence & image processing ,TeraGrid ,Software engineering ,business ,computer ,Software versioning ,Programming Languages (cs.PL) - Abstract
Assembling simulation software along with the associated tools and utilities is a challenging endeavor, particularly when the components are distributed across multiple source code versioning systems. It is problematic for researchers compiling and running the software across many different supercomputers, as well as for novices in a field who are often presented with a bewildering list of software to collect and install. In this paper, we describe a language (CRL) for specifying software components with the details needed to obtain them from source code repositories. The language supports public and private access. We describe a tool called GetComponents which implements CRL and can be used to assemble software. We demonstrate the tool for application scenarios with the Cactus Framework on the NSF TeraGrid resources. The tool itself is distributed with an open source license and freely available from our web page. (Comment: 8 pages, 5 figures, TeraGrid 2010)
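The abstract does not reproduce CRL itself. Purely to illustrate the kind of specification such a language expresses, a component entry might pair a checkout target with its repository type and location, roughly like the fragment below (directive names and the URL are invented for illustration and are not taken from the CRL specification):

```
# Hypothetical component list, illustrative only
!DEFINE ROOT = MySimulation
!TARGET   = $ROOT
!TYPE     = svn
!URL      = https://example.org/repo/trunk
!CHECKOUT = core utilities
```

A tool like GetComponents would then walk such entries and fetch each component from its versioning system into the target tree.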
- Published
- 2010
- Full Text
- View/download PDF
129. SAGA BigJob: An Extensible and Interoperable Pilot-Job Abstraction for Distributed Applications and Systems
- Author
-
Lukasz Lacinski, Shantenu Jha, and Andre Luckow
- Subjects
Computer science ,business.industry ,Distributed computing ,Cloud computing ,computer.software_genre ,Application software ,Grid computing ,Scalability ,Pilot job ,TeraGrid ,business ,Execution model ,Implementation ,computer - Abstract
The uptake of distributed infrastructures by scientific applications has been limited by the availability of extensible, pervasive and simple-to-use abstractions, which are required at multiple stages of scientific applications: development, deployment and execution. The Pilot-Job abstraction has been shown to be effective at addressing many requirements of scientific applications. Specifically, Pilot-Jobs support the decoupling of workload submission from resource assignment; this results in a flexible execution model, which in turn enables the distributed scale-out of applications on multiple and possibly heterogeneous resources. Most Pilot-Job implementations, however, are tied to a specific infrastructure. In this paper, we describe the design and implementation of a SAGA-based Pilot-Job, which supports a wide range of application types and is usable over a broad range of infrastructures, i.e., it is general-purpose and extensible, and, as we will argue, is also interoperable with Clouds. We discuss how the SAGA-based Pilot-Job is used for different application types and supports concurrent usage across multiple heterogeneous distributed infrastructures, including concurrent usage across Clouds and traditional Grids/Clusters. Further, we show how Pilot-Jobs can help to support dynamic execution models and thus introduce new opportunities for distributed applications. We also demonstrate, for the first time that we are aware of, the use of multiple Pilot-Job implementations to solve the same problem: specifically, we use the SAGA-based Pilot-Job on high-end resources such as the TeraGrid and the native Condor Pilot-Job (Glide-in) on Condor resources. Importantly, both are invoked via the same interface, without changes at the development or deployment level, but only an execution (run-time) decision.
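The interoperability claim, one interface, with the backend chosen only at run time, is essentially a strategy pattern. The sketch below shows the shape of that design in plain Python (a hypothetical API for illustration, not SAGA's actual interface):

```python
# Illustrative sketch: application code targets one Pilot-Job interface;
# which backend provisions the pilot is a run-time decision.

class PilotBackend:
    def start_pilot(self, cores):
        raise NotImplementedError

class BatchQueueBackend(PilotBackend):      # e.g. a TeraGrid cluster
    def start_pilot(self, cores):
        return {"backend": "batch", "cores": cores}

class CondorGlideInBackend(PilotBackend):   # e.g. a Condor glide-in
    def start_pilot(self, cores):
        return {"backend": "condor", "cores": cores}

def run_workload(tasks, backend, cores=128):
    """Application-side code: identical regardless of backend."""
    pilot = backend.start_pilot(cores)
    return [(task, pilot["backend"]) for task in tasks]
```

Swapping `BatchQueueBackend()` for `CondorGlideInBackend()` changes nothing in `run_workload`, which is the development-level invariance the abstract describes.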
- Published
- 2010
130. A grid meta scheduler for a distributed interoperable workflow management system
- Author
-
Marco Passante, Maria Mirto, Giovanni Aloisio, Mirto, Maria, Passante, Marco, and Aloisio, Giovanni
- Subjects
Computer science ,Distributed computing ,TeraGrid ,GRID middleware ,Grid workflow ,computer.software_genre ,Grid ,Workflow Management Systems ,Large scale experiment ,DRMAA ,Heterogeneous resource ,Semantic grid ,Workflow ,Effective management ,Grid computing ,Grid infrastructure ,Middleware (distributed applications) ,End user ,Meta-Scheduler ,computer ,Workflow management system ,Global reach ,Grid environment ,Work-flow - Abstract
The resources needed to execute workflows in a Grid environment are commonly highly distributed, heterogeneous, and managed by different organizations. One of the main challenges in the development of Grid infrastructure services is the effective management of those resources in such a way that much of the heterogeneity is hidden from the end user. This requires the ability to orchestrate the use of various resources of different types. Grid production infrastructures such as EGEE, DEISA and TeraGrid allow sharing of heterogeneous resources in order to support large-scale experiments. The interoperability of these infrastructures, and of the middleware stacks enabling applications to migrate across and/or aggregate the combined resources of these infrastructures, is of particular importance in facilitating a grid with global reach. The relevance of current and emerging standards for such interoperability has been the focus of much research in recent years. Starting from a first grid workflow prototype, we have improved the current version by introducing several features such as fault tolerance, security, and improved job management. Our grid meta-scheduler, named GMS (Grid Meta Scheduler), has therefore been redesigned to be interoperable across grid middleware stacks (gLite, UNICORE and Globus) when executing workflows. It allows the composition of batch, parameter sweep and MPI-based jobs.
- Published
- 2010
131. Computer Simulation of Multibody Dynamical Systems in a TeraGrid Environment
- Author
-
Abdul Muqtadir Mohammed and Shanzhong Shawn Duan
- Subjects
Parallelizable manifold ,Dynamical systems theory ,Computer science ,Control engineering ,TeraGrid ,Multibody system ,Computing systems ,Field (computer science) ,Computational science ,Virtual prototyping - Abstract
Despite the great growth in the capability of computer hardware, the system size, complexity of structures, and time scales present in virtual prototyping of multibody dynamical systems will continue to challenge the field of computational multibody dynamics for the foreseeable future. In this paper, the scientific problems in virtual prototyping of multibody dynamical systems are articulated. Implementation of an efficient parallelizable algorithm on TeraGrid computing systems is further discussed. Various simulation cases and computing results are presented to demonstrate the impact of the TeraGrid on the performance of the algorithm. Copyright © 2010 by ASME
- Published
- 2010
132. An Autonomic Approach to Integrated HPC Grid and Cloud Usage
- Author
-
Manish Parashar, Shantenu Jha, Hyunjoo Kim, and Yaakoub El-Khamra
- Subjects
Computer science ,business.industry ,Distributed computing ,Cloud computing ,computer.software_genre ,Grid ,Scheduling (computing) ,Autonomic computing ,Workflow ,Grid computing ,High-throughput computing ,TeraGrid ,business ,computer - Abstract
Clouds are rapidly joining high-performance Grids as viable computational platforms for scientific exploration and discovery, and it is clear that production computational infrastructures will integrate both these paradigms in the near future. As a result, understanding usage modes that are meaningful in such a hybrid infrastructure is critical. For example, there are interesting application workflows that can benefit from such hybrid usage modes to, perhaps, reduce times to solutions, reduce costs (in terms of currency or resource allocation), or handle unexpected runtime situations (e.g., unexpected delays in scheduling queues or unexpected failures). The primary goal of this paper is to experimentally investigate, from an applications perspective, how autonomics can enable interesting usage modes and scenarios for integrating HPC Grids and Clouds. Specifically, we used a reservoir characterization application workflow, based on Ensemble Kalman Filters (EnKF) for history matching, and the CometCloud autonomic Cloud engine on a hybrid platform consisting of the TeraGrid and Amazon EC2, to investigate three usage modes (or autonomic objectives): acceleration, conservation and resilience.
- Published
- 2009
133. TeraGrid's integrated information service
- Author
-
Laura Pearlman, Maytal Dahan, John McGee, Chris Jordan, Jason Brechin, Diana Diehl, Warren Smith, David Hart, Ian Foster, Jason Reilly, John-Paul Navarro, Eric Blau, Rob Light, Tom Scavo, Charlie Catlett, Michael Dwyer, Lee Liming, Michael Shapiro, Rion Dooley, Kate Ericson, Stuart Martin, Nancy Wilkins-Diehr, Ed Hanna, and Shava Smallen
- Subjects
World Wide Web ,Computer science ,business.industry ,Information architecture ,Directory service ,Information system ,Systems architecture ,ComputingMilieux_LEGALASPECTSOFCOMPUTING ,Use case ,TeraGrid ,business - Abstract
The NSF TeraGrid project has designed and constructed a federated integrated information service (IIS) to serve its capability publishing and discovery needs. This service has also proven helpful in automating TeraGrid's operational activities. We describe the requirements that motivated this work; IIS's system architecture, information architecture, and information content; processes that IIS currently supports; and how various layers of the system architecture are being used. We also review motivating use cases that have not yet been satisfied by IIS and outline approaches for future work.
- Published
- 2009
134. Evolving interfaces to impacting technology
- Author
-
Rion Dooley, Stephen Mock, Patrick Hurley, Praveen Nuthulapati, and Maytal Dahan
- Subjects
Ajax ,Multimedia ,Web 2.0 ,Computer science ,business.industry ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,computer.software_genre ,World Wide Web ,Web application ,TeraGrid ,Architecture ,business ,computer ,Mobile device ,computer.programming_language - Abstract
The TeraGrid User Portal (TGUP) [1] is a web portal that aggregates and simplifies access to TeraGrid information and services for active TeraGrid users. The purpose of the TGUP is to make using the large number of diverse resources and services of the TeraGrid easier for the national open science community, thus increasing their productivity and the impact of the TeraGrid project. As the portal's capabilities have expanded and improved, TGUP usage has surpassed 300,000 hits a month. To continue increasing the impact and visibility of the TeraGrid project and the TeraGrid User Portal, the team developed TGUP Mobile. TGUP Mobile is a lightweight, responsive web application providing a subset of TGUP capabilities via a mobile device. This paper describes the architecture, design, and development of the TGUP Mobile application, and examines the community acceptance and synergy created through the development of both the traditional portal and TGUP Mobile.
- Published
- 2009
135. Delivering real-time satellite data to a broader audience
- Author
-
Carol Song, Larry Biehl, Lan Zhao, Shuang Wu, and Rakesh Veeramacheneni
- Subjects
World Wide Web ,Cyberinfrastructure ,Test data generation ,Computer science ,Gadget ,TeraGrid ,Data as a service ,User interface ,Resource Provider ,Cyberspace - Abstract
This paper presents our work on enhancing the existing cyberinfrastructure using Web 2.0 technologies to deliver real-time satellite data to a broader audience. As a resource provider on the TeraGrid, Purdue University hosts several data collections for earth and environmental research, including a large number of satellite data products that are received and generated in real time at the Purdue Terrestrial Observatory (PTO). A science gateway portal, named PRESTIGE, has been developed as the access point to services related to the satellite data generated at PTO. It remains a major challenge to help potential users easily discover the data products that are available and access them based on their needs. In an effort to address this challenge, we developed different interfaces for different user groups. We use Google gadgets as a new way to deliver satellite data and information to a broad user community. Three satellite data viewer gadgets have been developed and deployed. They connect to a data generation and processing system at the backend. The gadgets embrace recent Web 2.0 technologies in bringing the rich access and sharing capabilities available in the general cyberspace to scientific user communities.
- Published
- 2009
136. From campus resources to federated international grids
- Author
-
Stefan J. Zasada and Peter V. Coveney
- Subjects
Semantic grid ,Grid computing ,Data grid ,Hosting environment ,Computer science ,Distributed computing ,TeraGrid ,User interface ,computer.software_genre ,Grid ,computer ,National Grid - Abstract
Key to broadening participation in grid computing is the provision of easy to use access mechanisms and user interfaces, to allow a wide range of users with different skill sets to access the computational and data resources on offer. The Application Hosting Environment is one such middleware tool, hiding much of the complexity of dealing with grid resources from the user, and allowing them to interact with applications rather than machines. The nature of AHE means that it can be used as a single interface to a wide variety of resources, ranging from those provided at a departmental or institutional level to international federated grids of supercomputers. The number, range and size of resources made available by federating grids makes possible scientific investigations that would previously not have been feasible. In this paper we describe how we have deployed AHE to offer access to federated local and grid resources provided by the TeraGrid, UK National Grid Service and EU DEISA grid. We also present two case studies where AHE has been used to facilitate production level scientific simulation across these resources.
- Published
- 2009
137. Using dynamic accounts to enable access to advanced resources through science gateways
- Author
-
Michael E. Papka, Joseph A. Insley, and Ti Leggett
- Subjects
World Wide Web ,Service (systems architecture) ,Resource (project management) ,Computer science ,TeraGrid ,Gateway (computer program) ,Visualization - Abstract
Science Gateways have emerged as a valuable solution for providing large numbers of users with access to advanced computing resources. Additionally, they can hide many of the complexities often associated with using such resources effectively. Many gateways use a community account, which is shared by all gateway users on the backend compute resource. In some cases this can lead to problems when it comes to segregation of user data. To address this issue, we have investigated the use of dynamic accounts, where each gateway user is dynamically allocated their own account on the backend resource. We describe some of the features of the Dynamic Account service and explain how it has been integrated into the TeraGrid Visualization Gateway. We also discuss problems encountered, identify remaining open issues, and describe directions for future work.
- Published
- 2009
138. SimpleGrid 2.0
- Author
-
Yan Liu, Shaowen Wang, and Nancy Wilkins-Diehr
- Subjects
Service (systems architecture) ,medicine.medical_specialty ,Web 2.0 ,Computer science ,Gateway (computer program) ,computer.software_genre ,World Wide Web ,Cyberinfrastructure ,medicine ,TeraGrid ,Web resource ,Web service ,computer ,Web modeling - Abstract
The science gateway approach has been widely employed to bridge cyberinfrastructure and domain science communities for advancing scientific and engineering problem solving. Numerous efforts have been made toward developing Web-based gateway systems to enable efficient and integrated access to cyberinfrastructure resources and services. As the number of science gateways grows rapidly, however, it still remains challenging for a science gateway to be widely adopted and used by its targeted science community users mainly because of the difficulties in efficiently providing transparent access to cyberinfrastructure, creating user-friendly gateway user environments, and developing and integrating domain-specific gateway applications. This paper develops a usability-oriented Web approach to address these difficulties, and extends the SimpleGrid Toolkit to enable the learning and development of highly usable gateway software components. The paper illustrates the use of cutting edge Web 2.0 technologies in building highly interactive and collaborative gateway user environments. Based on service-oriented architecture, a gateway Web service framework is designed and implemented to efficiently convert domain scientific programs into REST Web services. This framework allows flexible service deployment and integration by representing a service as a Web resource that can be accessed using standard Web protocols. SimpleGrid is packaged and documented for efficient learning, as illustrated by the experiences of using the SimpleGrid Toolkit in TeraGrid science gateway tutorials and targeted gateway support.
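The core mechanism in the abstract above is representing a gateway service as a Web resource accessed with standard HTTP verbs. A tiny dispatcher in that REST style might look like the following (an illustration of the pattern, not the SimpleGrid framework's actual code; paths and fields are invented):

```python
# Illustrative REST dispatcher: POST /jobs submits a domain program run,
# GET /jobs/<id> queries its state. Returns (status_code, body) pairs.

JOBS = {}

def handle(method, path, body=None):
    parts = [p for p in path.split("/") if p]
    if method == "POST" and parts == ["jobs"]:
        job_id = str(len(JOBS) + 1)
        JOBS[job_id] = {"program": body["program"], "state": "QUEUED"}
        return 201, {"id": job_id}
    if method == "GET" and len(parts) == 2 and parts[0] == "jobs":
        job = JOBS.get(parts[1])
        if job is None:
            return 404, {"error": "no such job"}
        return 200, {"id": parts[1], "state": job["state"]}
    return 405, {"error": "unsupported"}
```

Because each job is addressed by a plain URL, any client that speaks HTTP can integrate the service, which is the deployment flexibility the framework aims for.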
- Published
- 2009
139. A simulation toolkit to investigate the effects of grid characteristics on workflow completion time
- Author
-
Michael O. McCracken and Allan Snavely
- Subjects
Workflow ,Ranking ,Computer science ,Distributed computing ,TeraGrid ,Grid ,Workflow engine ,Queue ,Workflow management system ,Workflow technology - Abstract
Advances in technology and the increasing number and scale of compute resources have enabled larger computational science experiments and given researchers many choices of where and how to store data and perform computation. Analyzing the time to completion of their experiments is important for scientists to make the best use of both human and computational resources, but it is difficult to do in a comprehensive fashion because it involves experiment, system and user variables and their interactions with each configuration of systems. We present a simulation toolkit for analysis of computational science experiments and estimation of their time to completion. Our approach uses a minimal description of the experiment's workflow, and separate information about the systems being evaluated. We evaluate our approach using synthetic experiments that reflect actual workflow patterns, executed on systems from the NSF TeraGrid. Our evaluation focuses on ranking the available systems in order of expected experiment completion time. We show that with sufficient system information, the model can help investigate alternative systems and evaluate workflow bottlenecks. We also discuss the challenges posed by volatile queue wait time behavior, and suggest some methods to improve the accuracy of simulation for near-term workflow executions. We evaluate the impact of advance notice of predictable spikes in queue wait time due to down-time and reservations. We show that given advance notice, the probability of a correct ranking for a sample of synthetic workflows could increase from 59% to 74% or even 79%.
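The ranking task described above can be sketched with a deliberately simplified cost model (an assumption for illustration, not the toolkit's actual model): estimate each system's time-to-completion as queue wait plus compute time for every workflow stage, then sort systems by the estimate.

```python
# Toy completion-time model: per-stage queue wait + runtime, used to rank
# candidate systems. Real toolkits model far more variables.

def estimate_completion(stages, system):
    """stages: list of stage sizes in CPU-hours;
    system: dict with relative 'speed' and expected 'queue_wait' (hours)."""
    total = 0.0
    for cpu_hours in stages:
        total += system["queue_wait"] + cpu_hours / system["speed"]
    return total

def rank_systems(stages, systems):
    """Return system names ordered by expected completion time."""
    return sorted(systems, key=lambda name: estimate_completion(stages, systems[name]))
```

Even this toy model captures the paper's key observation: a fast machine with a long queue can lose to a slower idle one, and volatile queue waits are what make the ranking hard in practice.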
- Published
- 2009
140. VGrADS
- Author
-
T. Mark Huang, Dmitrii Zagorodnov, Yang-Suk Kee, Kiran Thyagaraja, Daniel Nurmi, Dennis Gannon, Charles Koelbel, Lavanya Ramakrishnan, Rich Wolski, Graziano Obertelli, Asim YarKhan, and Anirban Mandal
- Subjects
business.industry ,Computer science ,Distributed computing ,Cloud computing ,Fault tolerance ,computer.software_genre ,Grid ,Job queue ,Scheduling (computing) ,Workflow ,e-Science ,Operating system ,TeraGrid ,business ,computer ,Workflow management system - Abstract
Today's scientific workflows use distributed heterogeneous resources through diverse grid and cloud interfaces that are often hard to program. In addition, especially for time-sensitive critical applications, predictable quality of service is necessary across these distributed resources. VGrADS' virtual grid execution system (vgES) provides a uniform qualitative resource abstraction over grid and cloud systems. We apply vgES to scheduling a set of deadline-sensitive weather forecasting workflows. Specifically, this paper reports on our experiences with (1) virtualized reservations for batch-queue systems, (2) coordinated usage of TeraGrid (batch queue), Amazon EC2 (cloud), our own clusters (batch queue) and Eucalyptus (cloud) resources, and (3) fault tolerance through automated task replication. The combined effect of these techniques was to enable a new workflow planning method to balance performance, reliability and cost considerations. The results point toward improved resource selection and execution management support for a variety of e-Science applications over grids and cloud systems.
- Published
- 2009
141. Seamless integration of data services between spatial information Grid and TeraGrid based on broker-based data management model
- Author
-
Wenyang Yu, Guoqing Li, ZhenChun Huang, Yi Zeng, Carol Song, and Dingsheng Liu
- Subjects
Geospatial analysis ,Database ,Data grid ,Storage Resource Broker ,business.industry ,Data management ,computer.software_genre ,Grid ,Geography ,Broker Pattern ,TeraGrid ,business ,computer ,Data integration - Abstract
Most space agencies have built Grid systems to manage large volumes of spatial data archives and products. However, the heterogeneous data structures, the distributed storage locations, and the gradual progress of building data service systems have turned such spatial grid systems into grid islands. A broker-based management model can hide the complexity and heterogeneity of spatial data sources, so research on broker-based data services is valuable for promoting inter-Grid collaboration in earth observation applications. This paper discusses the special problems of spatial information integration and some features of the broker-based data management model. We demonstrate a prototype that uses the broker-based model to integrate heterogeneous data grids. This work provides secure querying and management of geospatial data and services, and transparent access to the related sources in Grid and Web Service environments. The paper also describes our experiences from a case study on seamless integration with Purdue TeraGrid data using the Storage Resource Broker, based on the extensible data service interfaces of the China Spatial Information Grid.
- Published
- 2009
142. Developing autonomic distributed scientific applications
- Author
-
Shantenu Jha and Yaakoub El-Khamra
- Subjects
Large class ,Theoretical computer science ,Computer science ,Proof of concept ,Distributed computing ,A priori and a posteriori ,Ensemble Kalman filter ,TeraGrid ,Architecture ,History matching ,Rendering (computer graphics) - Abstract
The development of simple, effective distributed applications that can utilize multiple distributed resources remains challenging. Not surprisingly, therefore, it is difficult to implement advanced application characteristics, such as autonomic behaviour, for distributed applications. Notwithstanding, there exists a large class of applications that could benefit immensely from support for autonomic properties and behaviour. For example, many applications have irregular and highly variable resource requirements that are very difficult to predict in advance. As a consequence of irregular execution characteristics, dynamic resource requirements are difficult to predict a priori, rendering static resource mapping techniques such as workflows ineffective; in general, the resource utilization problem can be addressed more efficiently using autonomic approaches. This paper discusses the design and development of a prototype framework that can support many of the requirements of autonomic applications that need to use Computational Grids. We provide an initial description of the features and the architecture of the Lazarus framework developed using SAGA, integrate it with an Ensemble Kalman Filter application, and demonstrate the advantages of the framework: improved performance and lower development cost. As proof of concept we deploy Lazarus on several different machines on the TeraGrid, and show the effective utilization of several heterogeneous resources and the distinct performance enhancements that autonomics provides. Careful analysis provides insight into the primary reason underlying the performance improvements, namely late binding and an optimal choice of the configuration of resources selected.
- Published
- 2009
143. 2009 fault tolerance for extreme-scale computing workshop, Albuquerque, NM - March 19-20, 2009
- Author
-
D. S. Katz, J. Daly, N. DeBardeleben, M. Elnozahy, B. Kramer, S. Lathrop, N. Nystrom, K. Milfeld, S. Sanielevici, S. Scott, L. Votta, null LANL, null IBM, null Shodor Foundation, null ORNL, and null Sun Microsystems
- Subjects
Petascale computing ,Message logging ,Computer science ,Distributed computing ,Blue Waters ,Redundancy (engineering) ,Extreme scale computing ,Fault tolerance ,Crash ,TeraGrid ,Computer security ,computer.software_genre ,computer - Abstract
This is a report on the third in a series of petascale workshops co-sponsored by Blue Waters and TeraGrid to address challenges and opportunities for making effective use of emerging extreme-scale computing. This workshop was held to discuss fault tolerance on large systems for running large, possibly long-running applications. The main point of the workshop was to have systems people, middleware people (including fault-tolerance experts), and applications people talk about the issues and figure out what needs to be done, mostly at the middleware and application levels, to run such applications on the emerging petascale systems without having faults cause large numbers of application failures. The workshop found that there is considerable interest in fault tolerance, resilience, and reliability of high-performance computing (HPC) systems in general, at all levels of HPC. The only way to recover from faults is through the use of some redundancy, either in space or in time. Redundancy in time, in the form of writing checkpoints to disk and restarting at the most recent checkpoint after a fault causes an application to crash/halt, is the most common tool used in applications today, but there are questions about how long this can continue to be a good solution as systems and memories grow faster than I/O bandwidth to disk. There is interest both in modifications to this approach, such as checkpoints to memory, partial checkpoints, and message logging, and in alternative ideas, such as in-memory recovery using residues. We believe that systematic exploration of these ideas holds the most promise for the scientific applications community. Fault tolerance has been an issue of discussion in the HPC community for at least the past 10 years; but much like other issues, the community has managed to put off addressing it during this period. There is a growing recognition that as systems continue to grow to petascale and beyond, the field is approaching the point where we don't have any choice but to address this through R&D efforts.
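The checkpoint/restart scheme the report calls "redundancy in time" can be sketched in a few lines: periodically persist the state to disk, and on restart resume from the most recent checkpoint instead of from scratch. The following is a minimal illustration (a toy loop, not any production checkpointing library):

```python
# Toy checkpoint/restart: state is saved every `interval` steps; a rerun
# after a crash resumes from the last checkpoint rather than step 0.
import json
import os
import tempfile

def run(steps, ckpt_path, interval=10, fail_at=None):
    state = {"step": 0, "acc": 0}
    if os.path.exists(ckpt_path):                 # restart path
        with open(ckpt_path) as f:
            state = json.load(f)
    while state["step"] < steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["acc"] += state["step"]             # the "computation"
        state["step"] += 1
        if state["step"] % interval == 0:         # redundancy in time
            with open(ckpt_path, "w") as f:
                json.dump(state, f)
    return state["acc"]
```

The report's scaling concern shows up directly here: as the state grows while disk bandwidth lags, the checkpoint write inside the loop dominates, which motivates in-memory and partial checkpoints.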
- Published
- 2009
144. Dynamic Provision of Computing Resources from Grid Infrastructures and Cloud Providers
- Author
-
Eduardo Huedo, Ignacio M. Llorente, Constantino Vázquez Blanco, and Rubén S. Montero
- Subjects
Service (systems architecture) ,Internet ,Computer science ,business.industry ,Distributed computing ,Computer networks ,020206 networking & telecommunications ,Cloud computing ,Provisioning ,02 engineering and technology ,computer.software_genre ,Grid ,7. Clean energy ,Semantic grid ,Hardware ,Utility computing ,Grid computing ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,TeraGrid ,business ,computer - Abstract
Grid computing involves the ability to harness together the power of computing resources. In this paper we push this philosophy forward and show technologies enabling the federation of grid infrastructures regardless of their interface. The aim is to provide the ability to build arbitrarily complex grid infrastructures able to sustain the demand required by any given service. Along the same lines, this paper also addresses mechanisms that can potentially be used to meet a given quality of service or satisfy the peak demands this service may have. These mechanisms imply the elastic growth of the grid infrastructure making use of cloud providers, regardless of whether they are commercial, like Amazon EC2 and GoGrid, or scientific, like Globus Nimbus. Both these technologies of federation and dynamic provisioning are demonstrated in two experiments. The first is designed to show the feasibility of the federation solution by harnessing resources of the TeraGrid, EGEE and Open Science Grid infrastructures through a single point of entry. The second experiment aims to show the overheads incurred in the process of offloading jobs to resources created in the cloud.
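The elastic-growth mechanism described above amounts to a simple policy: when queued demand exceeds what the fixed grid capacity can absorb, provision cloud workers for the overflow, and release them when demand falls. A sketch of such a policy (the thresholds, names, and sizing model are assumptions for illustration, not the paper's implementation):

```python
# Toy elasticity policy: size the cloud worker pool to cover queue overflow
# beyond the fixed grid capacity.

def required_cloud_workers(queued_jobs, grid_workers, jobs_per_worker=4):
    """How many cloud workers are needed beyond the fixed grid capacity."""
    capacity = grid_workers * jobs_per_worker
    overflow = max(0, queued_jobs - capacity)
    return -(-overflow // jobs_per_worker)   # ceiling division

def rescale(cloud_pool, queued_jobs, grid_workers):
    """Grow or shrink the cloud pool to match the target size."""
    target = required_cloud_workers(queued_jobs, grid_workers)
    while len(cloud_pool) < target:
        cloud_pool.append("cloud-vm-%d" % len(cloud_pool))   # provision
    del cloud_pool[target:]                                  # release
    return cloud_pool
```

The overheads the second experiment measures would appear here as the latency of the provision step (VM boot, job staging), which a real policy must weigh against the queue wait it avoids.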
- Published
- 2009
145. Experiment and Workflow Management Using Cyberaide Shell
- Author
-
Gregor von Laszewski, Lizhe Wang, Kumar Mahinthakumar, Andrew J. Younge, and Xi He
- Subjects
Workflow ,Cyberinfrastructure ,Grid computing ,Command-line interface ,Computer science ,Distributed computing ,Middleware (distributed applications) ,Shell (computing) ,TeraGrid ,computer.software_genre ,computer ,Application lifecycle management - Abstract
In recent years the power of Grid computing has grown exponentially through the development of advanced middleware systems. While usage has increased, the penetration of Grid computing in the scientific community has been less than expected by some. This is due to a steep learning curve and high entry barrier that limit the use of Grid computing and advanced cyberinfrastructure. In order for scientists to focus on actual scientific tasks, specialized tools and services need to be developed to ease the integration of complex middleware. Our solution is Cyberaide Shell, an advanced but simple-to-use system shell which provides access to the powerful cyberinfrastructure available today. Cyberaide Shell provides a dynamic interface that allows access to complex cyberinfrastructure in an easy and intuitive fashion on an ad-hoc basis. This is accomplished by abstracting the complexities of resource, task, and application management through a scriptable command line interface. Through a service integration mechanism, the shell’s functionality is exposed to a wide variety of frameworks and programming languages. Cyberaide Shell includes specialized experiment management and workflow commands that, with the scriptable nature of a shell, provide a set of services which were previously unavailable. The usability of Cyberaide Shell is demonstrated using a Water Threat Management application deployed on the TeraGrid.
- Published
- 2009
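The scriptable command-line-interface idea in the abstract above can be sketched with Python's standard `cmd` module: commands are registered as methods, and the same shell works interactively or driven by a script. The commands shown (`task`, `run`) are illustrative, not the actual Cyberaide command set.

```python
# Minimal sketch of a scriptable shell in the spirit of Cyberaide Shell,
# using only Python's standard cmd module. Commands are hypothetical.

import cmd
import io

class MiniShell(cmd.Cmd):
    prompt = "cyberaide> "

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.tasks = []

    def do_task(self, arg):
        """task NAME -- register a task for later execution."""
        self.tasks.append(arg)

    def do_run(self, arg):
        """run -- 'execute' all registered tasks in order."""
        for t in self.tasks:
            self.stdout.write(f"running {t}\n")

    def do_EOF(self, arg):
        return True  # end of input: terminate the command loop

# Scriptability: feed a command script through stdin instead of typing.
script = io.StringIO("task stage-data\ntask simulate\nrun\n")
out = io.StringIO()
shell = MiniShell(stdin=script, stdout=out)
shell.use_rawinput = False   # read commands from stdin, not the terminal
shell.cmdloop(intro=None)
```

Because the shell is just an object reading a stream, the same command set can be exposed to other frameworks and languages, which is the service-integration point the abstract makes.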
146. The TeraShake Computational Platform for Large-Scale Earthquake Simulations
- Author
-
Amit Chourasia, Thomas H. Jordan, Philip Maechling, Y. Cui, Reagan Moore, and Kim B. Olsen
- Subjects
Earthquake simulation ,Storage Resource Broker ,Computer science ,Component (UML) ,Scalability ,Message Passing Interface ,Initialization ,TeraGrid ,Seismology ,Visualization ,Computational science - Abstract
Geoscientific and computer science researchers with the Southern California Earthquake Center (SCEC) are conducting a large-scale, physics-based, computationally demanding earthquake system science research program with the goal of developing predictive models of earthquake processes. The computational demands of this program continue to increase rapidly as these researchers seek to perform physics-based numerical simulations of earthquake processes for ever larger problems. To meet the needs of this research program, a multiple-institution team coordinated by SCEC has integrated several scientific codes into a numerical modeling-based research tool we call the TeraShake computational platform (TSCP). A central component in the TSCP is a highly scalable earthquake wave propagation simulation program called the TeraShake anelastic wave propagation (TS-AWP) code. In this chapter, we describe how we extended an existing, stand-alone, well-validated, finite-difference, anelastic wave propagation modeling code into the highly scalable and widely used TS-AWP and then integrated this code into the TeraShake computational platform that provides end-to-end (initialization to analysis) research capabilities. We also describe the techniques used to enhance the TS-AWP parallel performance on TeraGrid supercomputers, as well as the TeraShake simulation phases, including input preparation, run time, data archive management, and visualization. As a result of our efforts to improve its parallel efficiency, the TS-AWP has now shown highly efficient strong scaling on over 40K processors on IBM’s BlueGene/L Watson computer. In addition, the TSCP has developed into a computational system that is useful to many members of the SCEC community for performing large-scale earthquake simulations.
- Published
- 2009
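At the core of finite-difference wave-propagation codes like the TS-AWP described above is an explicit stencil update applied at every grid point each time step. The toy sketch below shows that update for the 1D scalar wave equation; the real code is 3D, anelastic, and MPI-parallel, none of which this sketch attempts.

```python
# Toy 1D finite-difference (leapfrog) update for the wave equation
# u_tt = c^2 * u_xx, illustrating the stencil pattern that dominates
# codes such as TS-AWP. Purely illustrative, not the TS-AWP scheme.

def step(u_prev, u_curr, c=1.0, dt=0.01, dx=0.1):
    """Advance the wave field one time step; boundaries held fixed."""
    r2 = (c * dt / dx) ** 2          # squared Courant number (stability: <= 1)
    u_next = list(u_curr)
    for i in range(1, len(u_curr) - 1):
        u_next[i] = (2 * u_curr[i] - u_prev[i]
                     + r2 * (u_curr[i + 1] - 2 * u_curr[i] + u_curr[i - 1]))
    return u_next

# A single bump spreads to its neighbours after one step.
u = step([0, 0, 0, 0, 0], [0, 0, 1, 0, 0])
```

Because each point needs only its immediate neighbours, the domain can be split across processors with thin halo exchanges, which is what makes the strong scaling to 40K processors reported above achievable.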
147. The Grid Enablement and Sustainable Simulation of Multiscale Physics Applications
- Author
-
Satoshi Sekiguchi, Yoshio Tanaka, Aiichiro Nakano, Yingwen Song, Hiroshi Takemiya, and Shuji Ogata
- Subjects
Grid computing ,Computer science ,Server ,Distributed computing ,Middleware ,Process control ,Fault tolerance ,TeraGrid ,Diffusion (business) ,computer.software_genre ,Grid ,computer - Abstract
Understanding H diffusion in materials is pivotal to designing suitable processes. Although our group has developed a nudged elastic band (NEB) + molecular dynamics (MD)/quantum mechanics (QM) algorithm to simulate H diffusion in materials, it is often not computationally feasible for large-scale models on a conventional single system. We therefore gridify the NEB+MD/QM algorithm on top of an integrated framework developed by our group. A two-day simulation of H diffusion in alumina has been successfully carried out over a trans-Pacific Grid infrastructure consisting of supercomputers provided by TeraGrid and AIST. In this paper, we describe the NEB+MD/QM algorithm, briefly introduce the framework middleware, present the grid-enablement work, and report the techniques used to achieve fault tolerance and load balancing for sustainable simulation. We believe our experience will benefit both middleware developers and application users.
- Published
- 2009
148. The Problem Solving Environments of TeraGrid, Science Gateways, and the Intersection of the Two
- Author
-
John-Paul Navarro, Nancy Wilkins-Diehr, Jim Basney, Marlon Pierce, Wenjun Wu, Stuart Martin, Thomas D. Uram, L. Strand, Choonhan Youn, and T. Scavo
- Subjects
Computer science ,business.industry ,Management science ,Intersection (set theory) ,Scientific discovery ,Authorization ,computer.software_genre ,Multidisciplinary team ,Data science ,Software ,Grid computing ,Problem solving environment ,TeraGrid ,business ,computer - Abstract
Problem solving environments (PSEs) are increasingly important for scientific discovery. Today's most challenging problems often require multidisciplinary teams, the ability to analyze very large amounts of data, and reliance on infrastructure built by others rather than reinventing solutions for each science team. The TeraGrid Science Gateways program recognizes these challenges and works with science teams to harness high-end resources that significantly extend a PSE's functionality.
- Published
- 2008
149. Network and Physical Upgrade of EV Wilkins Computing Center to Support a 600 Node Grid Used with Remote Sensing of SAR Polar DATA
- Author
-
L. Hayden, S. Walton, and K. Hayden
- Subjects
Operations research ,business.industry ,Computer science ,Node (networking) ,Information technology ,computer.software_genre ,Grid ,Upgrade ,Grid computing ,General partnership ,Instrumentation (computer programming) ,TeraGrid ,business ,computer ,Computer network - Abstract
Polar Grid is a National Science Foundation (NSF) Major Research Instrumentation (MRI) funded partnership of Indiana University and Elizabeth City State University. The partnership's goal is to acquire and deploy the computing infrastructure needed to investigate urgent problems in glacial melting. This poster will detail the work involved in preparing the E.V. Wilkins computer center to support a 100-node grid, server, and lab, with future expansion to 600 nodes. This grid system will be connected to the TeraGrid.
- Published
- 2008
150. Data Visualization and Analysis of CIC Graduate Student TeraGrid Resource Usage
- Author
-
Elaine Wah and E. Johnson
- Subjects
Association rule learning ,Computer science ,business.industry ,computer.software_genre ,Data science ,Visualization ,Resource (project management) ,Data visualization ,Grid computing ,ComputingMilieux_COMPUTERSANDEDUCATION ,Resource allocation ,TeraGrid ,Cluster analysis ,business ,computer - Abstract
The computing resources of the TeraGrid are a powerful tool for research by graduate students throughout the world. In this paper we analyze the usage of the TeraGrid by a sample of 876 graduate students from the institutions that form the Committee on Institutional Cooperation. We investigate demographic information and usage of TeraGrid resources and present visualizations. We also analyze the sample using data mining algorithms, specifically association rule learning and hierarchical clustering, and create interactive visualizations. This work reveals interesting patterns about the research done by graduate students on the TeraGrid and the resources they use.
- Published
- 2008
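The association-rule mining mentioned in the abstract above starts from a support-counting step: how often do combinations of resources co-occur in student usage records? A minimal pure-Python sketch of that step is below; the records and resource names are hypothetical, and a real analysis would use a dedicated mining library.

```python
# Sketch of the support-counting step behind association-rule learning,
# applied to hypothetical (student -> resources used) records. The three
# resource names below are illustrative stand-ins.

from itertools import combinations
from collections import Counter

def itemset_support(transactions, size=2):
    """Count how often each resource combination of the given size co-occurs."""
    counts = Counter()
    for used in transactions:
        for combo in combinations(sorted(used), size):
            counts[combo] += 1
    return counts

usage = [
    {"Abe", "Cobalt"},            # each set: resources one student used
    {"Abe", "Cobalt", "Ranger"},
    {"Abe", "Ranger"},
]
support = itemset_support(usage)
```

Pairs with high support relative to the single-item counts become candidate rules ("students who use X also tend to use Y"), which is what the interactive visualizations in the paper surface.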