Author: "Shawn Bowers" / Publisher: ieee - Searchworks@Jio Institute Digital Library Search Results

Your search keyword "Shawn Bowers" returned 9 results in total.

1. Approaches for Implementing Persistent Queues within Data-Intensive Scientific Workflows

Author: Shawn Bowers and Michael Agun
Subjects: Workflow, External storage, Relational database, Computer science, Dataflow, Distributed computing, Model of computation, Workflow engine, Workflow management system, Workflow technology
Abstract: Many scientific workflow systems are built on dataflow-based models of computation in which data drives the execution of workflow components. An advantage of dataflow models is their straightforward semantics (which include support for branching, merging, and looping) and their ability to execute workflow steps concurrently. However, for many data-intensive workflows the dataflow model often requires data buffering. Current systems largely perform buffering through in-memory queues, which can lead to buffer overflow and performance degradation as queues reach capacity (e.g., because of paging). We describe an alternative framework that leverages external storage to implement buffers (which we refer to as persistent queues) within data-intensive scientific workflows. Our framework can easily be used with different underlying storage technologies, and we consider and evaluate three distinct approaches: a traditional relational database implementation, a non-relational implementation designed for fast reads and writes, and a specialized approach that can further reduce external buffering overhead. In addition, persistent queues can provide detailed provenance information "for free" by capturing the inputs and outputs of each workflow component during workflow execution. Although many systems provide such provenance information, we show how this information can be both captured efficiently and used to improve overall workflow performance through persistent queues.
Published: 2011
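A minimal Python sketch of the persistent-queue idea, assuming SQLite as the external store. The paper evaluates relational, non-relational, and specialized back ends; this is not its implementation, and the class name `PersistentQueue` and the `queue` table layout are hypothetical. Tokens passing from one component to the next are appended to a table instead of being held in process memory, and consumed tokens are flagged rather than deleted, so the same table can be read back as a simple input/output record for that channel.

```python
import sqlite3


class PersistentQueue:
    """Hypothetical FIFO buffer between two workflow components, stored in SQLite.

    Consumed rows are flagged instead of deleted, so the table also serves as a
    record of every token that crossed this channel (a simple provenance log).
    """

    def __init__(self, path=":memory:", channel="default"):
        self.conn = sqlite3.connect(path)
        self.channel = channel
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS queue (
                   seq INTEGER PRIMARY KEY AUTOINCREMENT,
                   channel TEXT NOT NULL,
                   payload TEXT NOT NULL,
                   consumed INTEGER NOT NULL DEFAULT 0)"""
        )
        self.conn.commit()

    def put(self, payload):
        # Append a token to external storage rather than an in-memory list.
        self.conn.execute(
            "INSERT INTO queue (channel, payload) VALUES (?, ?)",
            (self.channel, payload),
        )
        self.conn.commit()

    def get(self):
        # Return the oldest unconsumed token, or None if the buffer is empty.
        row = self.conn.execute(
            "SELECT seq, payload FROM queue WHERE channel = ? AND consumed = 0 "
            "ORDER BY seq LIMIT 1",
            (self.channel,),
        ).fetchone()
        if row is None:
            return None
        seq, payload = row
        self.conn.execute("UPDATE queue SET consumed = 1 WHERE seq = ?", (seq,))
        self.conn.commit()
        return payload

    def history(self):
        # Every token that ever entered this channel, in arrival order.
        rows = self.conn.execute(
            "SELECT payload FROM queue WHERE channel = ? ORDER BY seq",
            (self.channel,),
        )
        return [payload for (payload,) in rows]


if __name__ == "__main__":
    q = PersistentQueue(channel="A->B")
    for item in ["x1", "x2", "x3"]:
        q.put(item)
    print(q.get(), q.get())  # x1 x2
    print(q.history())       # ['x1', 'x2', 'x3']
```

Flagging rows instead of deleting them is what lets the buffer double as a provenance record; a longer-running version would also need to prune or archive old rows.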

2. ObsDB: A System for Uniformly Storing and Querying Heterogeneous Observational Data

Author: Shawn Bowers, Huiping Cao, Jay Kudo, and Mark Schildhauer
Subjects: Exploratory data analysis, Information retrieval, Computer science, Relational database, Observational Model, Ontology (information science), External Data Representation, Data structure, Semantic heterogeneity, Data modeling
Abstract: Earth and environmental scientists collect and use a wide range of observational data. This data often exhibits high structural and semantic heterogeneity due to the variety of data collected and the ways in which observational datasets are structured in practice. However, to address questions at broad temporal, geographic, and biological scales, researchers often need to access and combine data from many observational datasets. This paper presents a system called ObsDB that helps to address these challenges by providing an integrated environment for storing, querying, and analyzing heterogeneous data based on a semantic observational model. The model allows for ontology-based descriptions of observational datasets and provides a common representation for storing observational data. The ObsDB system is built on top of standard relational database technology and provides a declarative query language for accessing observations. Integrated support is also provided for exploratory data analysis, allowing users to run analytical scripts written in R over stored observational data.
Published: 2010

3. Linking multiple workflow provenance traces for interoperable collaborative science

Author: Manish Kumar Anand, Carole Goble, Shawn Bowers, Paolo Missier, Anandarup Sarkar, Biva Shrestha, Ilkay Altintas, Bertram Ludäscher, and Saumen Dey
Subjects: Data sharing, World Wide Web, Metadata, Workflow, Computer science, Windows Workflow Foundation, Interoperability, Data science, Workflow engine, Workflow management system, Workflow technology
Abstract: Scientific collaboration increasingly involves data sharing between separate groups. We consider a scenario where data products of scientific workflows are published and then used by other researchers as inputs to their workflows. For proper interpretation, shared data must be complemented by descriptive metadata. We focus on provenance traces, a prime example of such metadata, which describe the genesis and processing history of data products in terms of the computational workflow steps. Through the reuse of published data, virtual, implicitly collaborative experiments emerge, making it desirable to compose the independently generated traces into global ones that describe the combined executions as single, seamless experiments. We present a model for provenance sharing that realizes this holistic view by overcoming the various interoperability problems that emerge from the heterogeneity of workflow systems, data formats, and provenance models. At its heart lie (i) an abstract workflow and provenance model in which (ii) data sharing itself becomes part of the combined workflow. We then describe an implementation of our model that we developed in the context of the Data Observation Network for Earth (DataONE) project and that can "stitch together" traces from different Kepler and Taverna workflow runs. It provides a prototypical framework for seamless cross-system, collaborative provenance management and can be easily extended to include other systems. Our approach also opens the door to new forms of workflow interoperability, not only through often elusive workflow standards but also through shared provenance information from public repositories.
Published: 2010

4. Towards best-effort merge of taxonomically organized data

Author: Shawn Bowers, David Thau, and Bertram Ludäscher
Subjects: Instruction set, Evolutionary biology, Computer science, Data exchange, Data mining, Merge (version control), Electronic data interchange
Abstract: We consider the task of merging datasets that have been organized using different, but aligned taxonomies. We assume such a merge is intended to create a single dataset that unambiguously describes the information in the source datasets using the alignment. We also assume that the merged result should reflect the observations of the datasets as specifically as possible. Typically, there will be no single merge result that is both unambiguous and maximally specific. In this case, a user may be provided with a set of possible merged datasets. If the user requires a single dataset, that dataset loses specificity. Here we examine whether the data exchange setting can provide a way to derive a “best-effort” merge. We find that the data exchange setting might be a good candidate for providing the merge, but further research is needed.
Published: 2010

5. Provenance browser: Displaying and querying scientific workflow provenance graphs

Author: Shawn Bowers, Bertram Ludäscher, and Manish Kumar Anand
Subjects: Information retrieval, Computer science, Query language, Data structure, Visualization, World Wide Web, Data dependency, Data visualization, Workflow, RDF, XML
Abstract: This demonstration presents an interactive provenance browser for visualizing and querying data dependency (lineage) graphs produced by scientific workflow runs. The browser allows users to explore different views of provenance as well as to express complex and recursive graph queries through a high-level query language (QLP). Answers to QLP queries are lineage preserving in that queries return sets of lineage dependencies (denoting provenance graphs), which can be further queried and visually displayed (as graphs) in the browser. By combining provenance visualization, navigation, and query, the provenance browser can enable scientists to more easily access and explore scientific workflow provenance information.
Published: 2010
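The idea of lineage-preserving answers can be illustrated with a small Python sketch. This is not QLP; the function `lineage` and the `(derived, source)` edge encoding are hypothetical. The point it shows is that the query returns a set of dependency edges rather than a set of data items, so each answer is itself a provenance graph that can be queried again.

```python
from collections import deque


def lineage(deps, target):
    """Return the dependency edges (derived, source) upstream of `target`.

    deps: iterable of (derived, source) pairs, meaning `derived` was computed
    from `source`. The result is a set of edges, i.e. a provenance subgraph,
    so it can be passed back into lineage() as a new input graph.
    """
    sources_of = {}
    for derived, source in deps:
        sources_of.setdefault(derived, []).append(source)

    result = set()
    seen = {target}
    frontier = deque([target])
    while frontier:  # breadth-first walk backwards along dependencies
        node = frontier.popleft()
        for src in sources_of.get(node, []):
            result.add((node, src))
            if src not in seen:
                seen.add(src)
                frontier.append(src)
    return result


if __name__ == "__main__":
    run = [("d3", "d2"), ("d2", "d1"), ("d4", "d1"), ("d5", "d3"), ("d5", "d4")]
    upstream = lineage(run, "d3")
    print(sorted(upstream))                 # [('d2', 'd1'), ('d3', 'd2')]
    # Because the answer is a graph, it can be queried again:
    print(sorted(lineage(upstream, "d2")))  # [('d2', 'd1')]
```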

6. XML-based computation for scientific workflows

Author: Daniel Zinn, Shawn Bowers, and Bertram Ludäscher
Subjects: Workflow, Semantics (computer science), Computer science, Computation, Distributed computing, Data transformation, Routing (electronic design automation), XML, Data modeling
Abstract: Scientific workflows are increasingly used to rapidly integrate existing algorithms to create larger and more complex programs. However, designing workflows using purely dataflow-oriented computation models introduces a number of challenges, including the need to use low-level components to mediate and transform data (so-called shims) and large numbers of additional “wires” for routing data to components within a workflow. To address these problems, we employ Virtual Data Assembly Lines (VDAL), a modeling paradigm that can eliminate most shims and reduce wiring complexity. We show how a VDAL design can be implemented using existing XML technologies and how static analysis can provide significant help to scientists during workflow design and evolution, e.g., by displaying actor dependencies or by detecting so-called unproductive actors.
Published: 2010

7. X-CSR: Dataflow Optimization for Distributed XML Process Pipelines

Author: Bertram Ludäscher, Timothy M. McPhillips, Shawn Bowers, and Daniel Zinn
Subjects: Distributed database, Programming language, Dataflow, Computer science, Efficient XML Interchange, XML validation, Pipeline (software), XML framework, Streaming XML, XML schema, XML
Abstract: XML process networks are a simple yet powerful programming paradigm for loosely coupled, coarse-grained dataflow applications such as data-centric scientific workflows. We describe a framework called Delta-XML that is well suited for applications in which pipelines of data processors modify parts ("deltas") of XML data collections while keeping the overall collection structure intact. We show how to optimize the execution of Delta-XML process networks by minimizing the data shipping cost in distributed settings. This X-CSR optimization (X-CSR: XML Cut, Ship, and Reassemble; pronounced "X-scissor") employs static type inference based on XML Schema to determine the XML stream fragments that are relevant to a processor, allowing irrelevant fragments to be bypassed ("shipped") to downstream pipeline steps. Finally, we present evaluation results for a real-world scientific workflow, which show the practical feasibility of X-CSR. A long version of this paper is available as a technical report (http://www.cs.ucdavis.edu/research/tech-reports/2008/CSE-2008-15.pdf).
Published: 2009

8. Improving Data Discovery for Metadata Repositories through Semantic Search

Author: Mark Schildhauer, Matthew B. Jones, Shawn Bowers, Joshua S. Madin, and Chad Berkley
Subjects: Metadata, World Wide Web, Information retrieval, Computer science, Semantic search, Data discovery, Semantic technology, Ontology (information science), Semantic Web, Metadata repository, Data mapping
Abstract: The amount of ecological data available electronically is increasing at a rapid rate, e.g., over 15,000 data sets are available today in the Knowledge Network for Biocomplexity (KNB) alone. Using the existing search capabilities of these online data repositories, however, scientists struggle to quickly locate data that are relevant to their needs or that will integrate with their current data sets. Semantic technologies aim at addressing many of these problems and hold the promise of enabling more powerful "smart" searches of online data archives. We describe new semantic search features within the Metacat metadata system, which is used by many ecological research sites around the world for archiving their data using a standardized metadata format. Our semantic search system adds to Metacat the ability to store OWL-DL ontologies in addition to semantic annotations that link data set attributes to ontology terms. Our approach also extends Metacat to improve metadata search in multiple ways: (i) by expanding standard keyword searches with ontology term hierarchies; (ii) by allowing keyword searches to be applied to annotations in addition to traditional metadata; and (iii) by allowing more structured searches over annotations via ontology terms. We describe our implementation of these extensions, and compare and contrast these different types of search for a corpus of annotated documents. As data repositories continue to grow, these tools will be instrumental in helping scientists precisely locate and then interpret data for their research needs.
Published: 2009
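A minimal Python sketch of extensions (i) and (ii) above: expanding a keyword with narrower terms from an ontology hierarchy, and matching against annotations as well as metadata text. It assumes a plain `{child: parent}` dictionary in place of an OWL-DL ontology; the function names `descendants` and `semantic_search` and the sample data are hypothetical and not part of Metacat.

```python
def descendants(hierarchy, term):
    """Return every term below `term` in a subclass hierarchy given as {child: parent}."""
    children = {}
    for child, parent in hierarchy.items():
        children.setdefault(parent, []).append(child)
    found, stack = set(), [term]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def semantic_search(docs, hierarchy, keyword):
    """Return ids of docs whose text or annotations mention keyword or a narrower term.

    docs: {doc_id: (metadata_text, [annotation_terms])}. Matching is
    case-insensitive on whitespace-separated words (a deliberate simplification).
    """
    terms = {keyword.lower()} | {t.lower() for t in descendants(hierarchy, keyword)}
    hits = []
    for doc_id, (text, annotations) in docs.items():
        words = set(text.lower().split())
        notes = {a.lower() for a in annotations}
        if terms & (words | notes):
            hits.append(doc_id)
    return sorted(hits)


if __name__ == "__main__":
    hierarchy = {"Biomass": "Measurement", "Abundance": "Measurement", "Density": "Abundance"}
    docs = {
        "ds1": ("plant biomass survey", []),
        "ds2": ("site records", ["Density"]),
        "ds3": ("weather station log", []),
    }
    # A plain keyword search for "measurement" would match none of these.
    print(semantic_search(docs, hierarchy, "Measurement"))  # ['ds1', 'ds2']
```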

9. Enabling Scientific Workflow Reuse through Structured Composition of Dataflow and Control-Flow

Author: Terence Critchlow, Bertram Ludäscher, Anne H. H. Ngu, and Shawn Bowers
Subjects: Dataflow, Computer science, Distributed computing, Reuse, Modular design, Data structure, Workflow, Control flow, Information engineering, Component (UML), Layer (object-oriented design), Dataflow architecture
Abstract: Data-centric scientific workflows are often modeled as dataflow process networks. The simplicity of the dataflow framework facilitates workflow design, analysis, and optimization. However, modeling "control-flow intensive" tasks using dataflow constructs often leads to overly complicated workflows that are hard to comprehend, reuse, and maintain. We describe a generic framework, based on scientific workflow templates and frames, for embedding control-flow intensive subtasks within dataflow process networks. This approach can seamlessly handle complex control-flow without sacrificing the benefits of dataflow. We illustrate our approach with a real-world scientific workflow from the astrophysics domain, requiring remote execution and file transfer in a semi-reliable environment. For such workflows, we also describe a three-layer architecture based on frames and templates, where the top layer consists of an overall dataflow process network, the second layer consists of a transducer template for modeling the desired control-flow behavior, and the bottom layer consists of frames inside the template that are specialized by embedding the desired component implementation. Our approach can enable scientific workflows that are more robust (fault-tolerance strategies can be defined by control-flow driven transducer templates) and at the same time more reusable, since the embedding of frames and templates yields more structured and modular workflow designs.
Published: 2006
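The template-and-frame idea can be sketched in Python, assuming a frame is simply a callable plugged into a template and the fault-tolerance strategy is a bounded retry. The names `retry_template` and `flaky_transfer` are hypothetical; the paper's framework lives inside a scientific workflow system, not in Python closures. The surrounding dataflow network sees one step that maps an input token to an output token, while the retry control-flow stays inside the template.

```python
import time


def retry_template(frame, attempts=3, delay_s=0.0):
    """Hypothetical control-flow template: wrap a pluggable frame with retries.

    `frame` is any callable taking one token; failures are signalled by OSError
    (e.g. an interrupted remote job or file transfer).
    """
    def transducer(token):
        last_error = None
        for attempt in range(1, attempts + 1):
            try:
                return frame(token)
            except OSError as err:
                last_error = err
                if attempt < attempts:
                    time.sleep(delay_s)  # back off before the next try
        raise RuntimeError(f"frame failed after {attempts} attempts") from last_error
    return transducer


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_transfer(path):
        # Stand-in for an unreliable remote step: fails twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise OSError("transfer interrupted")
        return f"copied {path}"

    step = retry_template(flaky_transfer, attempts=3)
    print(step("run42/output.fits"))  # copied run42/output.fits
```

Swapping in a different frame (a different transfer tool, say) changes the component without touching the retry logic, which is the reuse argument the abstract makes.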