59 results for '"Daniel A Katz"'
Search Results
2. Extended Abstract
- Author
-
Daniel S. Katz, Mihael Hategan, Yadu Babuji, Anna Woodard, Zhuozhao Li, Ian Foster, Ben Clifford, Kyle Chard, and Michael Wilde
- Subjects
FOS: Computer and information sciences, Computer science, Parallel computing, Python (programming language), Runtime system, Dependency graph, Distributed, Parallel, and Cluster Computing (cs.DC) - Abstract
Parsl is a parallel programming library for Python that aims to make it easy to specify parallelism in programs and to realize that parallelism on arbitrary parallel and distributed computing systems. Parsl relies on developers annotating Python functions (wrapping either Python or external applications) to indicate that these functions may be executed concurrently. Developers can then link together functions via the exchange of data. Parsl establishes a dynamic dependency graph and sends tasks for execution on connected resources when dependencies are resolved. Parsl's runtime system enables different compute resources to be used, from laptops to supercomputers, without modification to the Parsl program.
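To make the annotation model above concrete, here is a minimal sketch using Parsl's documented python_app decorator and its bundled local-threads configuration; treat exact names as version-dependent.

```python
# Minimal Parsl sketch: annotated functions return futures, and passing a
# future into another app expresses a dependency edge in the dynamic graph.
import parsl
from parsl import python_app
from parsl.configs.local_threads import config  # swap for a cluster config to scale out

parsl.load(config)

@python_app
def double(x):
    return 2 * x

@python_app
def add(a, b):
    return a + b

futures = [double(i) for i in range(4)]          # four independent tasks
total = add(add(futures[0], futures[1]),
            add(futures[2], futures[3]))         # runs once its inputs resolve
print(total.result())                            # blocks for the whole graph; prints 12
```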
- Published
- 2021
3. LexNLP: Natural language processing and information extraction for legal and regulatory texts
- Author
-
Eric M. Detterman, Michael James Bommarito, and Daniel Martin Katz
- Subjects
Information extraction, Word embedding, Computer science, Artificial intelligence, Natural language processing - Abstract
LexNLP is an open source Python package focused on natural language processing and machine learning for legal and regulatory text. The package includes functionality to (i) segment documents, (ii) identify key text such as titles and section headings, (iii) extract over eighteen types of structured information like distances and dates, (iv) extract named entities such as companies and geopolitical entities, (v) transform text into features for model training, and (vi) build unsupervised and supervised models such as word embedding or tagging models. LexNLP includes pre-trained models based on thousands of unit tests drawn from real documents available from the SEC EDGAR database as well as various judicial and regulatory proceedings. LexNLP is designed for use in both academic research and industrial applications, and is distributed at this https URL.
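As a hedged illustration of the package's extraction API, the sketch below follows the module layout in LexNLP's documentation; exact paths and return types vary by version.

```python
# Illustrative LexNLP usage; extractors are generators yielding structured values.
from lexnlp.extract.en.dates import get_dates
from lexnlp.extract.en.money import get_money

text = ("This Agreement is made as of June 1, 2017, and the Borrower shall "
        "repay $2,500,000 no later than December 31, 2020.")

print(list(get_dates(text)))   # e.g., datetime.date objects for both dates
print(list(get_money(text)))   # e.g., (amount, currency) tuples
```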
- Published
- 2021
4. Machine Learning and Law
- Author
-
Daniel Martin Katz and John J. Nay
- Subjects
Computer science, Artificial intelligence, Machine learning - Published
- 2021
5. Extreme Scale Survey Simulation with Python Workflows
- Author
-
Thomas D. Uram, Katrin Heitmann, Kyle Chard, Daniel S. Katz, Yadu Babuji, and Antonio Villarreal
- Subjects
FOS: Computer and information sciences, FOS: Physical sciences, Computer science, Distributed computing, Python (programming language), Pipeline (software), Data set, Software portability, Software, Workflow, Scalability, Distributed, Parallel, and Cluster Computing (cs.DC), Instrumentation and Methods for Astrophysics (astro-ph.IM) - Abstract
The Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) will soon carry out an unprecedented wide, fast, and deep survey of the sky in multiple optical bands. The data from LSST will open up a new discovery space in astronomy and cosmology, providing clues toward addressing burning issues of the day, such as the origin of dark energy and the nature of dark matter, while at the same time yielding data that will, in turn, pose fresh new questions. To prepare for the imminent arrival of this remarkable data set, it is crucial that the associated scientific communities be able to develop the software needed to analyze it. Computational power now available allows us to generate synthetic data sets that can be used as a realistic training ground for such an effort. This effort raises its own challenges -- the need to generate very large simulations of the night sky, scaling up simulation campaigns to large numbers of compute nodes across multiple computing centers with different architectures, and optimizing the complex workload around memory requirements and widely varying wall clock times. We describe here a large-scale workflow that melds together Python code to steer the workflow, Parsl to manage the large-scale distributed execution of workflow components, and containers to carry out the image simulation campaign across multiple sites. Taking advantage of these tools, we developed an extreme-scale computational framework and used it to simulate five years of observations for 300 square degrees of sky area. We describe our experiences and lessons learned in developing this workflow capability, and highlight how the scalability and portability of our approach enabled us to efficiently execute it on up to 4000 compute nodes on two supercomputers.
- Published
- 2021
6. LexGLUE: A Benchmark Dataset for Legal Language Understanding in English
- Author
-
Ion Androutsopoulos, Dirk Hartung, Nikolaos Aletras, Michael James Bommarito, Ilias Chalkidis, Daniel Martin Katz, and Abhik Jana
- Subjects
Language understanding, Computer science, Natural language understanding, Legal domain, Data science, Artificial intelligence and law, Benchmark, Legal practice - Abstract
Law, interpretations of law, legal arguments, agreements, etc. are typically expressed in writing, leading to the production of vast corpora of legal text. Their analysis, which is at the center of legal practice, becomes increasingly elaborate as these collections grow in size. Natural language understanding (NLU) technologies can be a valuable tool to support legal practitioners in these endeavors. Their usefulness, however, largely depends on whether current state-of-the-art models can generalize across various tasks in the legal domain. To answer this currently open question, we introduce the Legal General Language Understanding Evaluation (LexGLUE) benchmark, a collection of datasets for evaluating model performance across a diverse set of legal NLU tasks in a standardized way. We also provide an evaluation and analysis of several generic and legal-oriented models demonstrating that the latter consistently offer performance improvements across multiple tasks.
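For readers who want to try the benchmark, one plausible entry point is its Hugging Face release; the dataset name ("lex_glue") and task name ("scotus") below are assumptions based on that release.

```python
# Load one LexGLUE task via the datasets library and inspect an example.
from datasets import load_dataset

scotus = load_dataset("lex_glue", "scotus")
example = scotus["train"][0]
print(example["text"][:200])  # beginning of a court opinion
print(example["label"])       # issue-area label for classification
```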
- Published
- 2021
7. Law Smells - Defining and Detecting Problematic Patterns in Legal Drafting
- Author
-
Dirk Hartung, Daniel Martin Katz, Corinna Coupette, Maximilian Böther, and Janis Beckedorf
- Subjects
Computer science, Syntax, Maintainability, Code smell, Code refactoring, Law, Taxonomy, Natural language - Abstract
Building on the computer science concept of code smells, we initiate the study of law smells, i.e., patterns in legal texts that pose threats to the comprehensibility and maintainability of the law. With five intuitive law smells as running examples — namely, duplicated phrase, long element, large reference tree, ambiguous syntax, and natural language obsession — we develop a comprehensive law smell taxonomy. This taxonomy classifies law smells by when they can be detected, which aspects of law they relate to, and how they can be discovered. We introduce text-based and graph-based methods to identify instances of law smells, confirming their utility in practice using the United States Code as a test case. Our work demonstrates how ideas from software engineering can be leveraged to assess and improve the quality of legal code, thus drawing attention to an understudied area in the intersection of law and computer science and highlighting the potential of computational legal drafting.
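A toy text-based detector for one of these smells, duplicated phrase, can be written as repeated word n-gram counting; this is our own illustration, not the paper's reference implementation.

```python
# Count repeated word n-grams across provisions; repeats are duplication candidates.
from collections import Counter

def ngram_counts(sections, n=8):
    counts = Counter()
    for text in sections:
        words = text.lower().split()
        counts.update(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts

sections = [
    "the secretary shall prescribe such regulations as may be necessary to carry out this section",
    "the administrator shall prescribe such regulations as may be necessary to carry out this section",
]
duplicates = {g: c for g, c in ngram_counts(sections).items() if c > 1}
print(duplicates)  # shared 8-grams flag the near-identical boilerplate
```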
- Published
- 2021
8. Scalable Parallel Programming in Python with Parsl
- Author
-
Daniel S. Katz, Anna Woodard, Kyle Chard, Zhuozhao Li, Michael Wilde, Ian Foster, Yadu Babuji, and Ben Clifford
- Subjects
Programming language, Computer science, Scalable parallelism, Python (programming language), Supercomputer, Scripting language, Scalability, Use case - Abstract
Python is increasingly the lingua franca of scientific computing. It is used as a higher level language to wrap lower-level libraries and to compose scripts from various independent components. However, scaling and moving Python programs from laptops to supercomputers remains a challenge. Here we present Parsl, a parallel scripting library for Python. Parsl makes it straightforward for developers to implement parallelism in Python by annotating functions that can be executed asynchronously and in parallel, and to scale analyses from a laptop to thousands of nodes on a supercomputer or distributed system. We examine how Parsl is implemented, focusing on syntax and usage. We describe two scientific use cases in which Parsl's intuitive and scalable parallelism is used.
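The "wrapping external applications" half of the model looks like the sketch below, which uses Parsl's documented bash_app decorator; the simulate command itself is hypothetical.

```python
# Parsl bash_app sketch: the decorated function returns the shell command to run.
import parsl
from parsl import bash_app
from parsl.configs.local_threads import config
from parsl.data_provider.files import File

parsl.load(config)

@bash_app
def simulate(params, outputs=()):
    return f"echo simulating {params} > {outputs[0]}"  # hypothetical external app

futures = [simulate(p, outputs=[File(f"run_{p}.out")]) for p in ("a", "b", "c")]
for f in futures:
    f.result()  # wait for each external invocation to finish
```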
- Published
- 2019
9. Parsl: Pervasive Parallel Programming in Python
- Author
-
Kyle Chard, Daniel S. Katz, Anna Woodard, Yadu Babuji, Zhuozhao Li, Justin M. Wozniak, Ben Clifford, Ryan Chard, Rohan Kumar, Michael Wilde, Lukasz Lacinski, and Ian Foster
- Subjects
FOS: Computer and information sciences, Computer science, Big data, Parallel computing, Python (programming language), Supercomputer, Dependency graph, Scripting language, Scalability, Blue Waters, Use case, Distributed, Parallel, and Cluster Computing (cs.DC), Programming Languages (cs.PL) - Abstract
High-level programming languages such as Python are increasingly used to provide intuitive interfaces to libraries written in lower-level languages and for assembling applications from various components. This migration towards orchestration rather than implementation, coupled with the growing need for parallel computing (e.g., due to big data and the end of Moore's law), necessitates rethinking how parallelism is expressed in programs. Here, we present Parsl, a parallel scripting library that augments Python with simple, scalable, and flexible constructs for encoding parallelism. These constructs allow Parsl to construct a dynamic dependency graph of components that it can then execute efficiently on one or many processors. Parsl is designed for scalability, with an extensible set of executors tailored to different use cases, such as low-latency, high-throughput, or extreme-scale execution. We show, via experiments on the Blue Waters supercomputer, that Parsl executors can allow Python scripts to execute components with as little as 5 ms of overhead, scale to more than 250 000 workers across more than 8000 nodes, and process upward of 1200 tasks per second. Other Parsl features simplify the construction and execution of composite programs by supporting elastic provisioning and scaling of infrastructure, fault-tolerant execution, and integrated wide-area data management. We show that these capabilities satisfy the needs of many-task, interactive, online, and machine learning applications in fields such as biology, cosmology, and materials science.
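Scaling beyond local threads is a configuration change rather than a program change; here is a hedged sketch of such a configuration using the HighThroughputExecutor named above, with placeholder sizing (a scheduler-specific provider such as Slurm would replace LocalProvider on a real cluster).

```python
# Sketch of a Parsl Config for larger runs; numbers are illustrative placeholders.
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider

config = Config(
    executors=[
        HighThroughputExecutor(
            label="htex",
            max_workers=8,                  # workers per node
            provider=LocalProvider(
                init_blocks=1,              # resource blocks to start with
                max_blocks=4,               # elastic scaling limit
            ),
        )
    ]
)
# parsl.load(config) would start the runtime against this executor.
```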
- Published
- 2019
10. Managing genomic variant calling workflows with Swift/T
- Author
-
Yan W. Asmann, Liudmila Sergeevna Mainzer, Yingxue Ren, Daniel S. Katz, Matthew Kendzior, Justin M. Wozniak, Jennie Zermeno, Matthew Weber, Jacob R Heldenbrand, Tiffany Wenting Li, Faisal M. Fadlelmola, Elliott Rodriguez, Azza Ahmed, and Katherine I Kendig
- Subjects
Swift, Computer science, Data management, Big data, Cloud computing, Workflow, Computer cluster, Genomics, Bioinformatics, Computational Biology, Genetics, Scalability, Executable, Software portability, Debugging, Scripting language, Workflow management system, Software engineering, Software - Abstract
Genomic variant discovery is frequently performed using the GATK Best Practices variant calling pipeline, a complex workflow with multiple steps, fans/merges, and conditionals. This complexity makes management of the workflow difficult on a computer cluster, especially when running in parallel on large batches of data: hundreds or thousands of samples at a time. Here we describe a wrapper for the GATK-based variant calling workflow using the Swift/T parallel scripting language. Standard built-in features include the flexibility to split by chromosome before variant calling, optionally permitting the analysis to continue when faulty samples are detected, and allowing users to analyze multiple samples in parallel within each cluster node. The use of Swift/T conveys two key advantages: (1) Thanks to the embedded ability of Swift/T to transparently operate in multiple cluster scheduling environments (PBS Torque, SLURM, Cray aprun environment, etc.), a single workflow is trivially portable across numerous clusters; (2) The leaf functions of Swift/T permit developers to easily swap executables in and out of the workflow, conditional on the analyst's choice, which makes the workflow easy to maintain. This modular design permits separation of the workflow into multiple stages and the request of resources optimal for each stage of the pipeline. While Swift/T's implicit data-level parallelism eliminates the need for the developer to code parallel analysis of multiple samples, it does make debugging of the workflow a bit more difficult, as is the case with any implicitly parallel code. With the above features, users have a powerful and portable way to scale up their variant calling analysis to run in many traditional computer cluster architectures.
https://github.com/ncsa/Swift-T-Variant-Calling
http://swift-t-variant-calling.readthedocs.io/en/latest/
- Published
- 2019
11. Leading-edge research in cluster, cloud, and grid computing: Best papers from the IEEE/ACM CCGrid 2015 conference
- Author
-
Xiaobo Zhou and Daniel S. Katz
- Subjects
Computer Networks and Communications, Computer science, Distributed computing, Cloud computing, Grid computing, Hardware and Architecture, Middleware (distributed applications), Software - Abstract
Architectures, networks, and systems and middleware technologies have been advancing. These advances lead to new concepts and platforms for computing, ranging from clusters and grids to clouds and datacenters. The CCGrid 2015 Conference discussed research and results on topics related to these concepts and platforms, and their applications. This special section presents five high-quality papers, extended from CCGrid 2015 papers.
- Published
- 2017
12. OpenEDGAR: Open Source Software for SEC EDGAR Analysis
- Author
-
Eric M. Detterman, Michael James Bommarito, and Daniel Martin Katz
- Subjects
Metadata, Parsing, Open source, Database, Computer science, Server, Electronic data, Open source software, MIT License, Python (programming language) - Abstract
OpenEDGAR is an open source Python framework designed to rapidly construct research databases based on the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system operated by the US Securities and Exchange Commission (SEC). OpenEDGAR is built on the Django application framework, supports distributed compute across one or more servers, and includes functionality to (i) retrieve and parse index and filing data from EDGAR, (ii) build tables for key metadata like form type and filer, (iii) retrieve, parse, and update CIK to ticker and industry mappings, (iv) extract content and metadata from filing documents, and (v) search filing document contents. OpenEDGAR is designed for use in both academic research and industrial applications, and is distributed under MIT License.
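The kind of index retrieval OpenEDGAR automates can be sketched with plain requests against EDGAR's public full-index files; this illustrates the data source, not OpenEDGAR's own API, and the User-Agent value is a placeholder (the SEC asks for a descriptive one).

```python
# Fetch one quarterly form index from EDGAR and print the first few entries.
import requests

INDEX_URL = "https://www.sec.gov/Archives/edgar/full-index/2018/QTR1/form.idx"
resp = requests.get(INDEX_URL, headers={"User-Agent": "research contact@example.org"})
resp.raise_for_status()

# The .idx file is fixed-width text; data rows follow a dashed separator line.
lines = resp.text.splitlines()
start = next(i for i, line in enumerate(lines) if line.startswith("---")) + 1
for line in lines[start:start + 5]:
    print(line)  # form type, company name, CIK, date filed, file name
```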
- Published
- 2018
13. Understanding Software in Research: Initial Results from Examining Nature and a Call for Collaboration
- Author
-
Udit Nangia and Daniel S. Katz
- Subjects
FOS: Computer and information sciences, Computer science, Data science, Data set, Software, Software Engineering (cs.SE) - Abstract
This lightning talk paper discusses an initial data set that has been gathered to understand the use of software in research, and is intended to spark wider interest in gathering more data. The initial data set covers three months of articles in the journal Nature, analyzed for software mentions. The wider activity that we seek is a community effort to analyze a wider set of articles, including both a longer timespan of Nature articles as well as articles in other journals. Such a collection of data could be used to understand how the role of software has changed over time and how it varies across fields.
- Published
- 2017
14. Report on the first workshop on negative and null results in eScience
- Author
-
Justin M. Wozniak, Douglas Thain, Silvia D. Olabarriaga, Daniel S. Katz, Ketan Maheshwari, APH - Methodology, and Epidemiology and Data Science
- Subjects
Computer Networks and Communications, Computer science, Computer Science Applications, Theoretical Computer Science, Computational Theory and Mathematics, Data mining, Software - Published
- 2017
15. JETS: Language and System Support for Many-Parallel-Task Workflows
- Author
-
Justin M. Wozniak, Michael Wilde, and Daniel S. Katz
- Subjects
Computer Networks and Communications, Computer science, Distributed computing, Multiprocessing, Task (computing), Workflow, Coupling (computer programming), Hardware and Architecture, Middleware (distributed applications), Software, Information Systems - Abstract
Many-task computing is a well-established paradigm for implementing loosely coupled applications (tasks) on large-scale computing systems. However, few of the model's existing implementations provide efficient, low-latency support for executing tasks that are tightly coupled multiprocessing applications. Thus, a vast array of parallel applications cannot readily be used effectively within many-task workloads. In this work, we present JETS, a middleware component that provides high-performance support for many-parallel-task computing (MPTC). JETS is based on a highly concurrent approach to parallel task dispatch and on new capabilities now available in the MPICH2 MPI implementation and the ZeptoOS Linux operating system. JETS represents an advance over the few known examples of multilevel many-parallel-task scheduling systems: it more efficiently schedules and launches many short-duration parallel application invocations; it overcomes the challenges of coupling the user processes of each multiprocessing application invocation via the messaging fabric; and it concurrently manages many application executions in various stages. We report here on the JETS architecture and its performance on both synthetic benchmarks and an MPTC application in molecular dynamics.
- Published
- 2013
16. Swift: A language for distributed parallel scripting
- Author
-
Ben Clifford, Daniel S. Katz, Justin M. Wozniak, Mihael Hategan, Michael Wilde, and Ian Foster
- Subjects
File system, Many-task computing, Dataflow, Computer science, Programming language, Programming complexity, Scripting language, Parallel programming model, Programming paradigm, Data-intensive computing, Software, Language construct - Abstract
Scientists, engineers, and statisticians must execute domain-specific application programs many times on large collections of file-based data. This activity requires complex orchestration and data management as data is passed to, from, and among application invocations. Distributed and parallel computing resources can accelerate such processing, but their use further increases programming complexity. The Swift parallel scripting language reduces these complexities by making file system structures accessible via language constructs and by allowing ordinary application programs to be composed into powerful parallel scripts that can efficiently utilize parallel and distributed resources. We present Swift's implicitly parallel and deterministic programming model, which applies external applications to file collections using a functional style that abstracts and simplifies distributed parallel execution.
- Published
- 2011
17. Distance measures for dynamic citation networks
- Author
-
Michael James Bommarito, Daniel Martin Katz, James H. Fowler, and Jon Zelner
- Subjects
Statistics and Probability, FOS: Physical sciences, Physics and Society (physics.soc-ph), Dynamic network analysis, Digraph, Preferential attachment, Distance measures, Hierarchical clustering, Cluster analysis, Data mining, Mathematics - Abstract
Acyclic digraphs arise in many natural and artificial processes. Among the broader set, dynamic citation networks represent a substantively important form of acyclic digraphs. For example, the study of such networks includes the spread of ideas through academic citations, the spread of innovation through patent citations, and the development of precedent in common law systems. The specific dynamics that produce such acyclic digraphs not only differentiate them from other classes of graphs, but also provide guidance for the development of meaningful distance measures. In this article, we develop and apply our sink distance measure, together with the single-linkage hierarchical clustering algorithm, to both a two-dimensional directed preferential attachment model and empirical data drawn from the first quarter century of decisions of the United States Supreme Court. Despite applying the simplest combination of distance measures and clustering algorithms, our analysis reveals that this scheme produces more accurate and more interpretable clusterings.
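As a simplified stand-in for the paper's sink distance (an assumption for illustration: comparing vertices by the sets of sinks they can reach), the sketch below pairs such a distance with single-linkage clustering.

```python
# Toy sink-based distance on an acyclic digraph, plus single-linkage clustering.
import networkx as nx
from scipy.cluster.hierarchy import linkage, fcluster

G = nx.DiGraph([(1, 3), (2, 3), (3, 5), (4, 5), (4, 6)])  # edges point to cited nodes
sinks = {n for n in G if G.out_degree(n) == 0}

def reachable_sinks(n):
    return ({n} | nx.descendants(G, n)) & sinks

def dist(a, b):  # Jaccard distance between reachable-sink sets
    sa, sb = reachable_sinks(a), reachable_sinks(b)
    return 1 - len(sa & sb) / len(sa | sb)

nodes = sorted(G)
condensed = [dist(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]
labels = fcluster(linkage(condensed, method="single"), t=0.25, criterion="distance")
print(dict(zip(nodes, labels)))  # vertices sharing sinks cluster together
```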
- Published
- 2010
18. A mathematical approach to the study of the United States Code
- Author
-
Daniel Martin Katz and Michael James Bommarito
- Subjects
FOS: Computer and information sciences, FOS: Physical sciences, Statistics and Probability, Computer science, Physics and Society (physics.soc-ph), Computers and Society (cs.CY), Digital Libraries (cs.DL), Information Retrieval (cs.IR) - Abstract
The United States Code (Code) is a document containing over 22 million words that represents a large and important source of Federal statutory law. Scholars and policy advocates often discuss the direction and magnitude of changes in various aspects of the Code. However, few have mathematically formalized the notions behind these discussions or directly measured the resulting representations. This paper addresses the current state of the literature in two ways. First, we formalize a representation of the United States Code as the union of a hierarchical network and a citation network over vertices containing the language of the Code. This representation reflects the fact that the Code is a hierarchically organized document containing language and explicit citations between provisions. Second, we use this formalization to measure aspects of the Code as codified in October 2008, November 2009, and March 2010. These measurements allow for a characterization of the actual changes in the Code over time. Our findings indicate that in the recent past, the Code has grown in its amount of structure, interdependence, and language.
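The representation itself is easy to mirror in graph code: a hierarchy network and a citation network over the same vertices, composed into one object. The section names below are invented for illustration.

```python
# Union of a hierarchy tree and a citation network, as in the paper's formalization.
import networkx as nx

hierarchy = nx.DiGraph([
    ("Title 26", "Subtitle A"),
    ("Subtitle A", "Sec. 1"),
    ("Subtitle A", "Sec. 61"),
])
citations = nx.DiGraph([("Sec. 61", "Sec. 1")])  # explicit cross-reference

code = nx.compose(hierarchy, citations)  # union over the shared vertex set
print(code.number_of_nodes(), "vertices;",
      hierarchy.number_of_edges(), "hierarchy edges;",
      citations.number_of_edges(), "citation edges")
```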
- Published
- 2010
19. Optimizing Workflow Data Footprint
- Author
-
Karan Vahi, Kent Blackburn, Arun Ramakrishnan, John C. Good, Ewa Deelman, Rizos Sakellariou, Duncan A. Brown, Gurmeet Singh, Gaurang Mehta, Henan Zhao, Daniel S. Katz, Stephen Fairhurst, G. Bruce Berriman, and David Meyers
- Subjects
Database, Computer science, Data management, Dynamic data, Workflow, Workflow engine, Computer data storage, Data file, Software, Workflow management system - Abstract
In this paper we examine the issue of optimizing disk usage and scheduling large-scale scientific workflows onto distributed resources where the workflows are data-intensive, requiring large amounts of data storage, and the resources have limited storage resources. Our approach is two-fold: we minimize the amount of space a workflow requires during execution by removing data files at runtime when they are no longer needed, and we demonstrate that workflows may have to be restructured to reduce the overall data footprint of the workflow. We show the results of our data management and workflow restructuring solutions using a Laser Interferometer Gravitational-Wave Observatory (LIGO) application and an astronomy application, Montage, running on a large-scale production grid, the Open Science Grid. We show that although reducing the data footprint of Montage by 48% can be achieved with dynamic data cleanup techniques, LIGO Scientific Collaboration workflows require additional restructuring to achieve a 56% reduction in data space usage. We also examine the cost of the workflow restructuring in terms of the application's runtime.
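One simple way to realize the "remove files when no longer needed" idea is reference counting over the task-to-input map; the sketch below is our own illustration with invented task names, not the paper's planner.

```python
# Decrement a file's reader count as tasks finish; delete at zero readers.
from collections import Counter

tasks = {                      # task -> input files it consumes
    "t1": ["raw.dat"],
    "t2": ["raw.dat"],
    "t3": ["t1.out", "t2.out"],
}
readers = Counter(f for inputs in tasks.values() for f in inputs)

def task_finished(name):
    for f in tasks[name]:
        readers[f] -= 1
        if readers[f] == 0:
            print(f"cleanup: removing {f}")  # a real engine would os.remove(f)

task_finished("t1")
task_finished("t2")  # raw.dat reaches zero readers here and is cleaned up
```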
- Published
- 2007
20. Interlanguage parallel scripting for distributed-memory scientific computing
- Author
-
Timothy G. Armstrong, Daniel S. Katz, Michael Wilde, Ketan Maheshwari, Justin M. Wozniak, and Ian Foster
- Subjects
File system, Programming language, Computer science, Fortran, Interoperability, Python (programming language), Scripting language, Blue Waters, Operating system, Distributed memory, Machine code - Abstract
Scripting languages such as Python and R have been widely adopted as tools for the development of scientific software because of the expressiveness of the languages and their available libraries. However, deploying scripted applications on large-scale parallel computer systems such as the IBM Blue Gene/Q or Cray XE6 is a challenge because of issues including operating system limitations, interoperability challenges, and parallel filesystem overheads due to the small file system accesses common in scripted approaches. We present a new approach to these problems in which the Swift scripting system is used to integrate high-level scripts written in Python, R, and Tcl with native code developed in C, C++, and Fortran, by linking Swift to the library interfaces to the script interpreters. We present a technique to efficiently launch scripted applications on supercomputers, and we demonstrate high performance, such as invoking 14M Python interpreters per second on Blue Waters.
- Published
- 2015
21. Porting Ordinary Applications to Blue Gene/Q Supercomputers
- Author
-
Ketan Maheshwari, Olle Heinonen, T. Andrew Binkowski, Dmitry Karpeyev, Xiaoliang Zhong, Michael Wilde, Timothy G. Armstrong, Daniel S. Katz, and Justin M. Wozniak
- Subjects
Swift, Computer science, Parallel computing, Supercomputer, Porting, Workflow, Scripting language, IBM - Abstract
Efficiently porting ordinary applications to Blue Gene/Q supercomputers is a significant challenge. Codes are often originally developed without considering advanced architectures and related tool chains. Science needs frequently lead users to want to run large numbers of relatively small jobs (often called many-task computing, an ensemble, or a workflow), which can conflict with supercomputer configurations. In this paper, we discuss techniques developed to execute ordinary applications over leadership-class supercomputers. We use the high-performance Swift parallel scripting framework and build two workflow execution techniques: sub-jobs and main-wrap. The sub-jobs technique, built on top of the IBM Blue Gene/Q resource manager Cobalt's sub-block jobs, lets users submit multiple, independent, repeated smaller jobs within a single larger resource block. The main-wrap technique is a scheme that enables C/C++ programs to be defined as functions that are wrapped by a high-performance Swift wrapper and that are invoked as a Swift script. We discuss the needs, benefits, technicalities, and current limitations of these techniques. We further discuss the real-world science enabled by these techniques and the results obtained.
- Published
- 2015
22. Session details: Session 1
- Author
-
Daniel S. Katz
- Subjects
Session (computer science) - Published
- 2015
23. Exploring Automatic, Online Failure Recovery for Scientific Applications at Extreme Scales
- Author
-
Daniel S. Katz, Scott Klasky, Marc Gamell, Hemanth Kolla, Manish Parashar, and Jacqueline H. Chen
- Subjects
Titan (supercomputer), Computer science, Distributed computing, Data recovery - Abstract
Application resilience is a key challenge that must be addressed in order to realize the exascale vision. Process/node failures, an important class of failures, are typically handled today by terminating the job and restarting it from the last stored checkpoint. This approach is not expected to scale to exascale. In this paper we present Fenix, a framework for enabling recovery from process/node/blade/cabinet failures for MPI-based parallel applications in an online (i.e., without disrupting the job) and transparent manner. Fenix provides mechanisms for transparently capturing failures, re-spawning new processes, fixing failed communicators, restoring application state, and returning execution control back to the application. To enable automatic data recovery, Fenix relies on application-driven, diskless, implicitly coordinated checkpointing. Using the S3D combustion simulation running on the Titan Cray-XK7 production system at ORNL, we experimentally demonstrate Fenix's ability to tolerate high failure rates (e.g., more than one per minute) with low overhead while sustaining performance.
- Published
- 2014
24. Evaluating storage systems for scientific data in the cloud
- Author
-
Justin M. Wozniak, Matei Ripeanu, Daniel S. Katz, Victor M. Zavala, Hao Yang, Michael Wilde, and Ketan Maheshwari
- Subjects
Computer science, Distributed computing, Cloud computing, Data access, Scripting language, Virtual machine, Distributed data store, Workflow application - Abstract
Infrastructure-as-a-Service (IaaS) clouds are an appealing resource for scientific computing. However, the bare-bones presentation of raw Linux virtual machines leaves much to the application developer. For many cloud applications, effective data handling is critical to efficient application execution. This paper investigates the capabilities of a variety of POSIX-accessible distributed storage systems to manage data access patterns resulting from workflow application executions in the cloud. We leverage the expressivity of the Swift parallel scripting framework to benchmark the performance of a number of storage systems using synthetic workloads and three real-world applications. We characterize two representative commercial storage systems (Amazon S3 and HDFS, respectively) and two emerging research-based storage systems (Chirp/Parrot and MosaStore). We find the use of aggregated node-local resources effective and economical compared with remotely located S3 storage. Our experiments show that applications run at scale with MosaStore show up to 30% improvement in makespan time compared with those run with S3. We also find that storage-system driven application deployments in the cloud result in better runtime performance compared with an on-demand data-staging driven approach.
- Published
- 2014
25. The Case for Workflow-Aware Storage:An Opportunity Study
- Author
-
Abmar Barros, Gilles Fedak, Ketan Maheshwari, Daniel S. Katz, Michael Wilde, Hao Yang, Matei Ripeanu, Emalayan Vairavanathan, Samer Al-Kiswany, and Lauro Beltrao Costa
- Subjects
Database, Computer Networks and Communications, Computer science, Information repository, Workflow, Workflow engine, Hardware and Architecture, Computer data storage, Large-scale storage system, Workflow-aware storage, Workflow runtime engine, Distributed, Parallel, and Cluster Computing [cs.DC], Software, Workflow management system, Information Systems - Abstract
This article evaluates the potential gains a workflow-aware storage system can bring. Two observations make us believe such a storage system is crucial to efficiently support workflow-based applications: First, workflows generate irregular and application-dependent data access patterns. These patterns render existing generic storage systems unable to harness all optimization opportunities, as this often requires enabling conflicting optimizations or even conflicting design decisions at the storage system level. Second, most workflow runtime engines make suboptimal scheduling decisions as they lack the detailed data location information that is generally hidden by the storage system. This paper presents a limit study that evaluates the potential gains from building a workflow-aware storage system that supports per-file access optimizations and exposes data location. Our evaluation using synthetic benchmarks and real applications shows that a workflow-aware storage system can bring significant performance gains: up to 3x compared to a vanilla distributed storage system deployed on the same resources yet unaware of the possible file-level optimizations.
- Published
- 2014
26. An assessment of a Beowulf system for a wide class of analysis and design software
- Author
-
Tom Cwik, P. Wang, John Z. Lou, Thomas Sterling, Daniel S. Katz, Paul L. Springer, and B. H. Kwan
- Subjects
Ethernet, Fast Ethernet, Computer science, Pentium, Microprocessor, Backplane, Node (computer science), Operating system, Software design, Software, DRAM - Abstract
A typical Beowulf system, such as the machine at the Jet Propulsion Laboratory (JPL), may comprise 16 nodes interconnected by 100BaseT Fast Ethernet. Each node may include a single Intel Pentium Pro 200 MHz microprocessor, 128 MBytes of DRAM, 2.5 GBytes of IDE disk, a PCI bus backplane, and an assortment of other devices.
- Published
- 1998
27. Scalable, finite element analysis of electromagnetic scattering and radiation
- Author
-
Tom Cwik, Daniel S. Katz, and John Z. Lou
- Subjects
Electromagnetic field, Fortran, Scattering, Adaptive mesh refinement, Computer science, Radiation, Finite element method, Computational science, Mesh generation, Scalability, Software - Abstract
In this paper a method for simulating electromagnetic fields scattered from complex objects is reviewed; namely, an unstructured finite element code that does not use traditional mesh partitioning algorithms.
- Published
- 1998
28. Parallelizing the execution of sequential scripts
- Author
-
Justin M. Wozniak, Zhao Zhang, Daniel S. Katz, Timothy G. Armstrong, and Ian Foster
- Subjects
Runtime system, Computer science, Programming language, Scripting language, Programming paradigm - Abstract
Scripting is often used in science to create applications via the composition of existing programs. Parallel scripting systems allow the creation of such applications, but each system introduces the need to adopt a somewhat specialized programming model. We present an alternative scripting approach, AMFS Shell, that lets programmers express parallel scripting applications via minor extensions to existing sequential scripting languages, such as Bash, and then execute them in-memory on large-scale computers. We define a small set of commands between the scripts and a parallel scripting runtime system, so that programmers can compose their scripts in a familiar scripting language. The underlying AMFS implements both collective (fast file movement) and functional (transformation based on content) file management. Tasks are handled by AMFS's built-in execution engine. AMFS Shell is expressive enough for a wide range of applications, and the framework can run such applications efficiently on large-scale computers.
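AMFS Shell itself extends Bash, but the underlying pattern, fanning sequential shell commands out in parallel and collecting their results, can be illustrated in Python; this is an analogy, not AMFS code.

```python
# Run independent shell commands concurrently and gather their output.
import subprocess
from concurrent.futures import ThreadPoolExecutor

commands = [f"echo processing sample {i}" for i in range(4)]

def run(cmd):
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

with ThreadPoolExecutor(max_workers=4) as pool:
    for out in pool.map(run, commands):
        print(out, end="")
```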
- Published
- 2013
29. MTC envelope
- Author
-
Michael Wilde, Ian Foster, Zhao Zhang, Daniel S. Katz, and Justin M. Wozniak
- Subjects
File system, Profiling (computer programming), Computer science, Computation, Concurrency, Scripting language, Software - Abstract
Many scientific applications can be efficiently expressed with the parallel scripting (many-task computing, MTC) paradigm. These applications are typically composed of several stages of computation, with tasks in different stages coupled by a shared file system abstraction. However, we often see poor performance when running these applications on large-scale computers due to the applications' frequency and volume of filesystem I/O and the absence of appropriate optimizations in the context of parallel scripting applications. In this paper, we show the capability of existing large-scale computers to run parallel scripting applications by first defining the MTC envelope and then evaluating the envelope by benchmarking a suite of shared filesystem performance metrics. We also seek to determine the origin of the performance bottleneck by profiling the parallel scripting applications' I/O behavior and mapping the I/O operations to the MTC envelope. We show an example shared filesystem envelope and present a method to predict the I/O performance given the applications' level of I/O concurrency and I/O amount. This work is instrumental in guiding the development of parallel scripting applications to make efficient use of existing large-scale computers, and to evaluate performance improvements in the hardware/software stack that will better facilitate parallel scripting applications.
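A hedged sketch of envelope-style prediction: estimate a stage's I/O time as the larger of its bandwidth-bound and operation-bound terms against measured filesystem ceilings. The formula and all numbers below are illustrative assumptions, not the paper's calibrated model.

```python
# Estimate I/O time for a stage of n_tasks against filesystem ceilings.
def io_time(n_tasks, bytes_per_task, ops_per_task,
            peak_bandwidth=10e9, peak_ops=30000, concurrency=1024):
    active = min(n_tasks, concurrency)
    bw_bound = active * bytes_per_task / peak_bandwidth  # seconds per wave, bandwidth-limited
    op_bound = active * ops_per_task / peak_ops          # seconds per wave, metadata-limited
    waves = -(-n_tasks // concurrency)                   # ceiling division
    return waves * max(bw_bound, op_bound)

print(f"{io_time(16384, 64e6, 5):.1f} s estimated for one I/O stage")
```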
- Published
- 2013
30. Swift/T: Large-scale Application Composition via Distributed-memory Dataflow Processing
- Author
-
Daniel S. Katz, Ian Foster, Michael Wilde, Timothy G. Armstrong, Justin M. Wozniak, and Ewing Lusk
- Subjects
Swift, Dataflow, Programming language, Fortran, Computer science, Optimizing compiler, Parallel computing, Debugging, Scalability, Concurrent computing, Distributed memory - Abstract
Many scientific applications are conceptually built up from independent component tasks as a parameter study, optimization, or other search. Large batches of these tasks may be executed on high-end computing systems; however, the coordination of the independent processes, their data, and their data dependencies is a significant scalability challenge. Many problems must be addressed, including load balancing, data distribution, notifications, concurrent programming, and linking to existing codes. In this work, we present Swift/T, a programming language and runtime that enables the rapid development of highly concurrent, task-parallel applications. Swift/T is composed of several enabling technologies to address scalability challenges, offers a high-level optimizing compiler for user programming and debugging, and provides tools for binding user code in C/C++/Fortran into a logical script. In this work, we describe the Swift/T solution and present scaling results from the IBM Blue Gene/P and Blue Gene/Q.
- Published
- 2013
31. Job and data clustering for aggregate use of multiple production cyberinfrastructures
- Author
-
Ian Foster, Ketan Maheshwari, Philip Maechling, Michael Wilde, Daniel S. Katz, Zhao Zhang, Allan Espinosa, and S. Callaghan
- Subjects
Workflow, Earthquake simulation, Database, Scripting language, Computer science, Distributed computing, Cluster analysis, Throughput, Bottleneck - Abstract
In this paper, we address the challenges of reducing the time-to-solution of the data intensive earthquake simulation workflow "CyberShake" by supplementing the high-performance parallel computing (HPC) resources on which it typically runs with distributed, heterogeneous resources that can be obtained opportunistically from grids and clouds. We seek to minimize time to solution by maximizing the amount of work that can be efficiently done on the distributed resources. We identify data movement as the main bottleneck in effectively utilizing the combined local and distributed resources. We address this by analyzing the I/O characteristics of the application, processor acquisition rate (from a pilot-job service), and the data movement throughput of the infrastructure. With these factors in mind, we explore a combination of strategies including partitioning of computation (over HPC and distributed resources) and job clustering. We validate our approach with a theoretical study and with preliminary measurements on the Ranger HPC system and distributed Open Science Grid resources. More complete performance results will be presented in the final submission of this paper.
- Published
- 2012
32. A Workflow-Aware Storage System: An Opportunity Study
- Author
-
Michael Wilde, Matei Ripeanu, Samer Al-Kiswany, Daniel S. Katz, Zhao Zhang, Lauro Beltrao Costa, and Emalayan Vairavanathan
- Subjects
Object storage, Workflow, Database, Computer science, Distributed computing, Workflow engine, Workflow management system - Abstract
This paper evaluates the potential gains a workflow-aware storage system can bring. Two observations make us believe such a storage system is crucial to efficiently support workflow-based applications: First, workflows generate irregular and application-dependent data access patterns. These patterns render existing storage systems unable to harness all optimization opportunities, as this often requires conflicting optimization options or even conflicting design decisions at the level of the storage system. Second, when scheduling, workflow runtime engines make suboptimal decisions as they lack detailed data location information. This paper discusses the feasibility of, and evaluates the potential performance benefits brought by, building a workflow-aware storage system that supports per-file access optimizations and exposes data location. To this end, this paper presents approaches to determine the application-specific data access patterns, and evaluates experimentally the performance gains of a workflow-aware storage approach. Our evaluation using synthetic benchmarks shows that a workflow-aware storage system can bring significant performance gains: up to 7x compared to the MosaStore distributed storage system and up to 16x compared to a central, well-provisioned NFS server.
- Published
- 2012
33. Panel
- Author
-
Ioan Raicu, Jack Dongarra, Daniel S. Katz, David Abramson, and Daniel A. Reed
- Subjects
Many-task computing, Grid computing, Utility computing, Computer science, Distributed computing, Data-intensive computing, Cloud computing - Published
- 2011
34. Cyberinfrastructure Usage Modalities on the TeraGrid
- Author
-
John-Paul Navarro, Chris Jordan, David Hart, Warren Smith, Von Welch, Nancy Wilkins-Diehr, John Towns, Amit Majumdar, and Daniel S. Katz
- Subjects
Modalities, Cyberinfrastructure, Grid computing, Computer science, TeraGrid, Grid, Data science - Abstract
This paper is intended to explain how the TeraGrid would like to be able to measure "usage modalities." We would like to (and are beginning to) measure these modalities to understand what objectives our users are pursuing, how they go about achieving them, and why, so that we can make changes in the TeraGrid to better support them.
- Published
- 2011
35. Understanding Scientific Applications for Cloud Environments
- Author
-
Daniel S. Katz, Andre Merzky, Shantenu Jha, Andre Luckow, and Katerina Stamou
- Subjects
Computer science, Server, Bandwidth (computing), Provisioning, Cloud computing, Service provider, Virtualization, Data science - Abstract
Distributed systems and their specific incarnations have evolved significantly over the years. Most often, these evolutionary steps have been a consequence of external technology trends, such as the significant increase in network/bandwidth capabilities that have occurred. It can be argued that the single most important driver for cloud computing environments is the advance in virtualization technology that has taken place. But what implications does this advance, leading to today's cloud environments, have for scientific applications? The aim of this chapter is to explore how clouds can support scientific applications. Before we can address this important issue, it is imperative to (a) provide a working model and definition of clouds and (b) understand how they differ from other computational platforms such as grids and clusters. At a high level, cloud computing is defined by Mell and Grance [1] as a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. We view clouds not as a monolithic isolated platform but as part of a large distributed ecosystem. But are clouds a natural evolution of distributed systems, or are they a fundamentally new paradigm? Prima facie, cloud concepts are derived from other systems, such as the implicit model of clusters as static
- Published
- 2011
36. Science on the TeraGrid
- Author
-
Daniel S. Katz, Scott Callaghan, Robert Harkness, Shantenu Jha, Krzysztof Kurowski, Steven Manos, Sudhakar Pamidighantam, Marlon Pierce, Beth Plale, Carol Song, and John Towns
- Subjects
Grid computing, Computer science, e-Science, TeraGrid, Supercomputer, Data science, Computational science - Published
- 2010
37. Critical perspectives on large-scale distributed applications and production Grids
- Author
-
Daniel S. Katz, Manish Parashar, Omer Rana, Jon Weissman, and Shantenu Jha
- Subjects
Computer science, Distributed computing, Grid computing, Grid, Data visualization, Cyberinfrastructure - Abstract
It is generally accepted that the ability to develop large-scale distributed applications that are extensible and independent of infrastructure details has lagged seriously behind other developments in cyberinfrastructure. As the sophistication and scale of distributed infrastructure increases, the complexity of successfully developing and deploying distributed applications increases both quantitatively and in qualitatively newer ways. In this paper we trace the evolution of a representative set of "state-of-the-art" distributed applications and production infrastructure; in doing so we aim to provide insight into the evolving sophistication of distributed applications, from simple generalizations of legacy static high-performance applications to applications composed of multiple loosely-coupled and dynamic components. The ultimate aim of this work is to highlight that even accounting for the fact that developing applications for distributed infrastructure is a difficult undertaking, there are suspiciously few novel and interesting distributed applications that utilize production Grid infrastructure. Along the way, we aim to provide an appreciation for the fact that developing distributed applications and the theory and practice of production Grid infrastructure have often not progressed in phase. Progress in the next phase and generation of distributed applications will require stronger coupling between the design and implementation of production infrastructure and the theory of distributed applications, including but not limited to explicit support for distributed application usage modes and advances that enable distributed applications to scale out.
- Published
- 2009
38. An innovative application execution toolkit for multicluster grids
- Author
-
Tevfik Kosar, Zhou Lei, Gabrielle Allen, Daniel S. Katz, Shantenu Jha, J. Ramanujam, and Zhifeng Yun
- Subjects
Computer science, Distributed computing, Interoperability, Turnaround time, Scheduling (computing), Grid computing, Resource management, Execution model - Abstract
Multicluster grids provide one promising solution to the growing computational demands of compute-intensive applications by federating various networked clusters. However, it is challenging to seamlessly integrate all participating clusters in different domains into a virtual computation platform. To take full advantage of multicluster grids, computer scientists need practical and efficient ways to coordinate the participating autonomous systems to execute Grid-enabled applications. We address grid resource management with a toolkit called Pelecanus that improves the overall performance of application execution in multicluster grid environments. Pelecanus takes advantage of the DA-TC (Dynamic Assignment with Task Containers) execution model to improve resource interoperability and enhance application execution and monitoring. Experiments show that it can significantly reduce turnaround time and increase resource utilization for certain applications with large numbers of sequential jobs.
- Published
- 2009
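A minimal sketch of the DA-TC idea described above, with hypothetical names throughout: long-lived task containers are placed on allocated slots once and then pull sequential jobs from a shared queue, so faster resources naturally take more work. This illustrates the general pattern, not the Pelecanus API.

```python
# Dynamic assignment with task containers (DA-TC), in miniature:
# containers are started once and keep pulling jobs until the queue
# drains, avoiding per-job scheduling overhead.
import queue
import threading

def task_container(container_id: int, tasks: "queue.Queue[str]") -> None:
    """A long-lived worker that dynamically pulls tasks until none remain."""
    while True:
        try:
            job = tasks.get_nowait()
        except queue.Empty:
            return
        print(f"container {container_id} running {job}")
        tasks.task_done()

tasks: "queue.Queue[str]" = queue.Queue()
for i in range(100):
    tasks.put(f"sequential-job-{i}")

# One container per allocated node/slot; faster slots naturally
# consume more jobs (dynamic rather than static assignment).
containers = [threading.Thread(target=task_container, args=(c, tasks))
              for c in range(4)]
for t in containers:
    t.start()
for t in containers:
    t.join()
```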
39. GADA 2008 PC Co-chairs’ Message
- Author
-
Daniel S. Katz, Dennis Gannon, Pilar Herrero, and Maria Pérez
- Subjects
Grid computing ,business.industry ,Computer science ,Library science ,Software engineering ,business ,computer.software_genre ,computer - Abstract
This volume contains the papers presented at GADA 2008, the International Symposium on Grid Computing, High-Performance and Distributed Applications. The purpose of the GADA series of conferences, held within the framework of the OnTheMove Federated Conferences (OTM), is to bring together researchers, developers, professionals and students in order to advance research and development in the areas of grid computing and distributed systems and applications. This year's conference was held in Monterrey, Mexico, November 13-14, 2008.
- Published
- 2008
40. GADA 2007 PC Co-chairs’ Message
- Author
-
Daniel S. Katz, Maria Pérez, Pilar Herrero, and Domenico Talia
- Subjects
World Wide Web ,Grid computing ,Distributed algorithm ,Computer science ,Distributed computing ,computer.software_genre ,computer - Abstract
This volume contains the papers presented at GADA 2007, the International Symposium on Grid Computing, High-Performance and Distributed Applications. The purpose of the GADA series of conferences, held in the framework of the OnTheMove Federated Conferences (OTM), is to bring together researchers, developers, professionals and students in order to advance research and development in the areas of grid computing and distributed systems and applications. This year's conference was held in Vilamoura, Algarve, Portugal, November 29-30, 2007.
- Published
- 2007
41. Session details: Grid performance
- Author
-
Daniel S. Katz
- Subjects
Multimedia ,Computer science ,Session (computer science) ,Grid ,computer.software_genre ,computer - Published
- 2007
42. EnLIGHTened Computing: An architecture for co-allocating network, compute, and other grid resources for high-end applications
- Author
-
Harry G. Perros, Andrei Hutanu, Yufeng Xin, Savera Tanwir, Jon MacLaren, Daniel S. Katz, Lina Battestilli, S.R. Thorpe, Joe Mambretti, S. Sundar, Seung-Jong Park, John H. Moore, and Gigi Karmous-Edwards
- Subjects
Data grid ,business.industry ,computer.internet_protocol ,Computer science ,Distributed computing ,Testbed ,Multiprotocol Label Switching ,computer.software_genre ,Grid ,Network management ,Semantic grid ,Grid computing ,Middleware ,business ,computer ,Computer network - Abstract
Many emerging high-performance applications require distributed infrastructure that is significantly more powerful and flexible than traditional grids. Such applications require the optimization, close integration, and control of all grid resources, including networks. The EnLIGHTened (ENL) computing project has designed an architectural framework that allows grid applications to dynamically request (in advance or on demand) any type of grid resource: computers, storage, instruments, and deterministic, high-bandwidth network paths, including lightpaths. Based on application requirements, the ENL middleware communicates with grid resource managers and, when availability is verified, co-allocates all the necessary resources (a sketch of such all-or-nothing co-allocation follows the entry). ENL's domain network manager controls all network resource allocations, dynamically setting up and deleting dedicated circuits using generalized multiprotocol label switching (GMPLS) control-plane signaling. In order to make optimal brokering decisions, the ENL middleware uses near-real-time performance information about grid resources. A prototype of this architectural framework has been used on a national-scale testbed implementation to demonstrate a small number of applications. Based on this, a set of changes to the middleware has been laid out and is being implemented.
- Published
- 2007
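A minimal sketch of the all-or-nothing co-allocation described above, with hypothetical resource managers standing in for the ENL middleware's compute and network brokers: reserve every resource for the requested window, and roll back all partial reservations if any single one fails.

```python
# All-or-nothing co-allocation of heterogeneous grid resources.
from dataclasses import dataclass, field

@dataclass(eq=False)
class ResourceManager:
    name: str
    capacity: int
    holds: list = field(default_factory=list)

    def reserve(self, amount: int, window: tuple) -> str:
        if amount > self.capacity:
            raise RuntimeError(f"{self.name}: insufficient capacity")
        self.capacity -= amount
        ticket = f"{self.name}-ticket-{len(self.holds)}"
        self.holds.append((ticket, amount, window))
        return ticket

    def release(self, ticket: str) -> None:
        for held in list(self.holds):
            if held[0] == ticket:
                self.capacity += held[1]
                self.holds.remove(held)
                return

def co_allocate(requests: list, window: tuple) -> list:
    """Reserve every requested (manager, amount) pair, or none at all."""
    granted = []
    try:
        for manager, amount in requests:
            granted.append((manager, manager.reserve(amount, window)))
        return granted
    except RuntimeError:
        for manager, ticket in granted:
            manager.release(ticket)  # roll back the partial allocation
        raise

cpu = ResourceManager("cluster-A", capacity=128)       # CPU slots
net = ResourceManager("lightpath-A-B", capacity=10)    # e.g. 10 Gb/s
tickets = co_allocate([(cpu, 64), (net, 10)],
                      window=("2007-11-01T00:00", "2h"))
print([t for _, t in tickets])
```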
43. Data-Oriented Distributed Computing for Science: Reality and Possibilities
- Author
-
Gabrielle Allen, Yi Chao, Daniel S. Katz, Joseph C. Jacob, and Peggy Li
- Subjects
World Wide Web ,Computer science ,business.industry ,Distributed computing ,Parallel algorithm ,The Internet ,Web service ,business ,Grid ,computer.software_genre ,computer - Abstract
As is becoming commonly known, there is an explosion happening in the amount of scientific data that is publicly available. One challenge is how to make productive use of this data. This talk will discuss some parallel and distributed computing projects, centered around virtual astronomy but also including other scientific data-oriented realms. It will look at some specific projects from the past, including Montage, Grist, OurOcean, and SCOOP, and will discuss the distributed computing, Grid, and Web-service technologies that have successfully been used in these projects.
- Published
- 2006
44. Message from the Chairs
- Author
-
David Abramson, Vassil Alexandrov, Daniel S. Katz, Chung-Ta King, Ken A. Hawick, Rajkumar Buyya, John J. Morrison, Ewa Deelman, Francis C. M. Lau, Hong Ong, Jian Yang, Liang Jie Zhang, Putchong Uthayopas, Omer Rana, Dana Petcu, Mike Ashworth, Domenico Laforenza, David De Roure, Roy Williams, Marcin Paprzycki, Paul Coddington, Cho-Li Wang, Reagan Moore, Thomas P. Yunck, Savas Parastatidis, and Mark Baker
- Subjects
Multimedia ,Computer science ,computer.software_genre ,computer - Published
- 2005
45. A Comparison of Two Methods for Building Astronomical Image Mosaics on a Grid
- Author
-
Ewa Deelman, Gurmeet Singh, A.C. Laity, Daniel S. Katz, G. B. Berriman, J. Good, M. H. Su, Joseph C. Jacob, Carl Kesselman, and Thomas A. Prince
- Subjects
Set (abstract data type) ,Workflow ,Grid computing ,Computer science ,Computer graphics (images) ,Parallel computing ,Directed graph ,computer.software_genre ,Grid ,Application software ,Directed acyclic graph ,computer - Abstract
This paper compares two methods for running an application composed of a set of modules on a grid. The set of modules (collectively called Montage) generates large astronomical image mosaics by composing multiple small images. The workflow that describes a particular run of Montage can be expressed as a directed acyclic graph (DAG), or as a short sequence of parallel (MPI) and sequential programs. In the first case, Pegasus can be used to run the workflow; in the second, a short shell script that calls each program can be run (a sketch contrasting the two forms follows the entry). In this paper, we discuss the Montage modules, the workflow run for a sample job, and the two methods of actually running the workflow. We examine the run time for each method and compare the portions that differ between the two methods.
- Published
- 2005
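A minimal sketch contrasting the two execution styles the entry compares, using Python's standard graphlib as a stand-in scheduler; the stage names echo Montage modules, but the bodies are placeholders.

```python
# (a) DAG form vs. (b) fixed-script form of the same tiny pipeline.
from graphlib import TopologicalSorter

def run(stage: str) -> None:
    print(f"running {stage}")

# (a) DAG form: node -> set of dependencies. This is what a planner
# like Pegasus consumes; independent nodes may execute in parallel.
dag = {
    "mProject-1": set(),
    "mProject-2": set(),
    "mDiff": {"mProject-1", "mProject-2"},
    "mBackground": {"mDiff"},
    "mAdd": {"mBackground"},
}
for stage in TopologicalSorter(dag).static_order():
    run(stage)

# (b) script form: a fixed order chosen by hand, as in the MPI/shell
# variant; correct, but any parallelism must be managed explicitly.
for stage in ["mProject-1", "mProject-2", "mDiff", "mBackground", "mAdd"]:
    run(stage)
```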
46. The Pegasus portal
- Author
-
Gurmeet Singh, John C. Good, Kent Blackburn, Ewa Deelman, G. Bruce Berriman, Albert Lazzarini, Joseph C. Jacob, Karan Vahi, Mei-Hui Su, Gaurang Mehta, Scott Koranda, and Daniel S. Katz
- Subjects
Database ,business.industry ,Computer science ,Metadata description ,computer.software_genre ,Grid ,Scheduling (computing) ,Semantic grid ,Workflow ,Grid computing ,Resource allocation ,Web application ,business ,computer - Abstract
Pegasus is a planning framework for mapping abstract workflows for execution on the Grid. This paper presents the implementation of a web-based portal for submitting workflows to the Grid using Pegasus. The portal also includes components for generating abstract workflows based on a metadata description of the desired data products and application-specific services (a sketch of this expansion follows the entry). We describe our experiences in using this portal for two Grid applications. A major contribution of our work is the introduction of several components that can be useful for Grid portals and hence should be included in Grid portal development toolkits.
- Published
- 2005
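A minimal sketch of deriving an abstract (site-independent) workflow from a metadata description of a desired data product, as the portal does; the metadata fields and transformation names here are hypothetical.

```python
# Expand a requested data product into abstract jobs: one per-input
# transformation plus a final combining step. A planner would later
# bind these abstract jobs to concrete sites and executables.
def abstract_workflow(product: dict) -> list:
    jobs = [{"transform": "reproject", "args": [raw], "out": f"{raw}.proj"}
            for raw in product["inputs"]]
    jobs.append({
        "transform": "coadd",
        "args": [f"{raw}.proj" for raw in product["inputs"]],
        "out": product["name"],
    })
    return jobs

request = {"name": "m31_mosaic.fits", "inputs": ["tile1.fits", "tile2.fits"]}
for job in abstract_workflow(request):
    print(job)
```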
47. Swift/T
- Author
-
Ian Foster, Justin M. Wozniak, Ewing Lusk, Daniel S. Katz, Timothy G. Armstrong, and Michael Wilde
- Subjects
Swift ,Dataflow ,Computer science ,Data flow programming ,Astrophysics::High Energy Astrophysical Phenomena ,Concurrency ,Parallel computing ,computer.software_genre ,Programming language implementation ,Computer Graphics and Computer-Aided Design ,Task (project management) ,Data flow diagram ,Scalability ,Computer Science::Programming Languages ,computer ,Software ,computer.programming_language - Abstract
This paper presents Swift/T, a novel programming-language implementation for highly scalable dataflow programs, in which a task becomes eligible to run as soon as the data it depends on is available (a sketch of this model follows the entry).
- Published
- 2013
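A minimal sketch of the dataflow model Swift/T implements, using Python futures as a stand-in: a task runs as soon as the values it consumes are resolved, so independent tasks are implicitly concurrent. Swift/T itself compiles such programs for distributed execution; this shows only the semantics in miniature.

```python
# Dataflow in miniature: futures carry not-yet-computed values, and a
# dependent task blocks only on the futures it actually consumes.
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor() as pool:
    # Independent tasks: eligible to run concurrently.
    a = pool.submit(lambda: 2 + 3)
    b = pool.submit(lambda: 4 * 5)
    # Dependent task: implicitly waits on a and b via .result().
    c = pool.submit(lambda: a.result() + b.result())
    print(c.result())  # 25
```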
48. Montage: a grid-enabled engine for delivering custom science-grade mosaics on demand
- Author
-
Daniel S. Katz, John C. Good, Mei-Hu Su, Ewa Deelman, Joseph C. Jacob, G. Bruce Berriman, Gurmeet Singh, Carl Kesselman, Anastasia C. Laity, Thomas A. Prince, Quinn, Peter J., and Bridger, Alan
- Subjects
Service (systems architecture) ,business.industry ,Computer science ,ComputerApplications_COMPUTERSINOTHERSYSTEMS ,computer.file_format ,Grid ,computer.software_genre ,Workflow ,Grid computing ,Computer graphics (images) ,TeraGrid ,Executable ,Software engineering ,business ,computer ,Workflow management system - Abstract
This paper describes the design of a grid-enabled version of Montage, an astronomical image mosaic service, suitable for large-scale processing of the sky. All the re-projection jobs can be added to a pool of tasks and performed by as many processors as are available, exploiting the parallelization inherent in the Montage architecture (a sketch of this pool-of-tasks pattern follows the entry). We show how we can describe the Montage application in terms of an abstract workflow so that a planning tool such as Pegasus can derive an executable workflow that can be run in the Grid environment. The execution of the workflow is performed by the workflow manager DAGMan and the associated Condor-G. The grid processing will support tiling of images to a manageable size when the input images can no longer be held in memory. Montage will ultimately run operationally on the TeraGrid. We describe science applications of Montage, including its application to science product generation by Spitzer Legacy Program teams and large-scale, all-sky image processing projects.
- Published
- 2004
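A minimal sketch of the pool-of-tasks pattern described above: each re-projection job is independent, so any number of available processors can drain the pool. The file names and job body are placeholders, not Montage's actual interfaces.

```python
# Independent re-projection jobs drained by a worker pool.
from multiprocessing import Pool

def reproject(tile: str) -> str:
    # Placeholder for one re-projection job on a single input tile.
    return f"{tile} -> {tile}.projected"

tiles = [f"tile_{i:03d}.fits" for i in range(16)]

if __name__ == "__main__":
    with Pool(processes=4) as pool:  # as many processors as are available
        for result in pool.imap_unordered(reproject, tiles):
            print(result)
```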
49. Architecture for access to a compute-intensive image mosaic service in the NVO
- Author
-
John C. Good, Reagan Moore, Serge Monkewitz, Joseph C. Jacob, D. W. Curkendall, Daniel S. Katz, M. Kong, Thomas A. Prince, G. Bruce Berriman, and Roy Williams
- Subjects
Upload ,Service (systems architecture) ,Data access ,Database ,Computer science ,Management system ,Information system ,National Virtual Observatory ,computer.software_genre ,Grid ,computer ,Failover - Abstract
The National Virtual Observatory (NVO) will provide on-demand access to data collections, data fusion services, and compute-intensive applications. This paper describes the development of a framework that will support two key aspects of these objectives: a compute engine that will deliver custom image mosaics, and a "request management system," based on an e-business application server, for job processing, including monitoring, failover, and status reporting (a sketch of this behavior follows the entry). We will develop this request management system to support a diverse range of astronomical requests, including services scaled to operate on the emerging computational grid infrastructure. Data requests will be made through existing portals to demonstrate the system: the NASA/IPAC Extragalactic Database (NED), the On-Line Archive Science Information Services (OASIS) at the NASA/IPAC Infrared Science Archive (IRSA), the Virtual Sky service at Caltech's Center for Advanced Computing Research (CACR), and the yourSky mosaic server at the Jet Propulsion Laboratory (JPL).
- Published
- 2002
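A minimal sketch of the request-management behavior described above, with hypothetical service names: a request is tracked through queued/running/done states and fails over to an alternate service when an attempt fails.

```python
# Job processing with status tracking, retry, and failover.
import random

def submit(request: str, services: list, retries_per_service: int = 2) -> str:
    status = {"request": request, "state": "queued", "attempts": []}
    for service in services:                      # failover across services
        for attempt in range(retries_per_service):
            status["state"] = f"running on {service}"
            ok = random.random() > 0.3            # placeholder for a real outcome
            status["attempts"].append((service, attempt, "ok" if ok else "failed"))
            if ok:
                status["state"] = "done"
                return f"{request}: completed on {service} ({status['attempts']})"
    status["state"] = "failed"
    return f"{request}: all services failed ({status['attempts']})"

print(submit("mosaic(M31, 1deg)", ["yourSky@JPL", "mosaic@CACR"]))
```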
50. Integrated design and simulation for millimeter-wave antenna systems
- Author
-
Tom Cwik, F. Villegas, and Daniel S. Katz
- Subjects
Optimal design ,Engineering ,Integrated design ,Iterative design ,business.industry ,Design tool ,computer.software_genre ,Radiation pattern ,Electronic engineering ,Computer Aided Design ,Software design ,Physical design ,business ,computer - Abstract
Several instruments operating in the microwave and millimeter-wave bands are to be developed over the next several years at JPL or in conjunction with various other companies and laboratories. The design and development of these instruments requires an environment that can produce a microwave or millimeter-wave optics design and can assess the sensitivity of key design criteria (beamwidth, gain, sidelobe levels, etc.) to thermal and mechanical operating environments. An integrated design tool has been developed to carry out the design and analysis using software building blocks from the computer-aided design, thermal, structural, and electromagnetic analysis fields. The capability to simultaneously assess the effects of design-parameter variation resulting from thermal and structural loads can reduce design and validation cost and generally leads to more optimal designs, hence higher-performing instruments (a toy sketch of such a coupled evaluation follows the entry). This paper discusses the development and application of MODTool (Millimeter-wave Optics Design), a design tool that efficiently integrates existing millimeter-wave optics design software with a solid-body modeler and with thermal and structural analysis packages. The design tool is also directly useful over other portions of the spectrum, though thermal or dynamic loads may have less influence on antenna patterns at longer wavelengths. Under a common interface, interactions between the various components of a design can be efficiently evaluated and optimized. One key component is the use of physical-optics analysis software for antenna pattern analysis.
- Published
- 2002
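A toy sketch of the coupled evaluation described above, under strong simplifying assumptions: a thermal load produces a surface distortion (a made-up linear model), and the resulting gain loss is estimated with Ruze's equation, a standard result relating RMS surface error to gain. None of this reflects MODTool's actual models.

```python
# Chain a toy thermal model into an antenna-performance estimate to
# expose the sensitivity of gain to the operating environment.
import math

def thermal_distortion(thermal_load_K: float) -> float:
    """Toy model: surface RMS error (in wavelengths) grows with thermal load."""
    return 1e-4 * thermal_load_K

def gain_loss_dB(surface_rms_wavelengths: float) -> float:
    """Ruze's equation: efficiency = exp(-(4*pi*rms/lambda)^2), in dB."""
    return 10 * math.log10(math.exp(-(4 * math.pi * surface_rms_wavelengths) ** 2))

for load in [0.0, 20.0, 40.0, 80.0]:  # candidate thermal environments (K)
    rms = thermal_distortion(load)
    print(f"load {load:5.1f} K -> RMS {rms:.4f} lambda -> "
          f"gain loss {gain_loss_dB(rms):6.3f} dB")
```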