115 results for "Frank Dehne"
Search Results
2. In Silico Engineering of Synthetic Binding Proteins from Random Amino Acid Sequences
- Author
-
Daniel Burnside, Andrew Schoenrock, Houman Moteshareie, Mohsen Hooshyar, Prabh Basra, Maryam Hajikarimlou, Kevin Dick, Brad Barnes, Tom Kazmirchuk, Matthew Jessulat, Sylvain Pitre, Bahram Samanfar, Mohan Babu, James R. Green, Alex Wong, Frank Dehne, Kyle K. Biggar, and Ashkan Golshani
- Subjects
Science - Abstract
Summary: Synthetic proteins with high affinity and selectivity for a protein target can be used as research tools, biomarkers, and pharmacological agents, but few methods exist to design such proteins de novo. To this end, the In-Silico Protein Synthesizer (InSiPS) was developed to design synthetic binding proteins (SBPs) that bind pre-determined targets while minimizing off-target interactions. InSiPS is a genetic algorithm that refines a pool of random sequences over hundreds of generations of mutation and selection to produce SBPs with pre-specified binding characteristics. As a proof of concept, we design SBPs against three yeast proteins and demonstrate binding and functional inhibition of two of three targets in vivo. Peptide SPOT arrays confirm binding sites, and a permutation array demonstrates target specificity. Our foundational approach will support the field of de novo design of small binding polypeptide motifs and has robust applicability while offering potential advantages over the limited number of techniques currently available. Subject Areas: Biological Sciences, Bioinformatics, Protein Family Determination
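The genetic-algorithm loop the abstract describes (a random pool refined over many generations of mutation and selection) can be sketched as follows. The fitness function here is a user-supplied stand-in, since the real InSiPS fitness models predicted binding to the target while penalizing off-target interactions:

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def evolve(fitness, length=20, pop_size=50, generations=200,
           mutation_rate=0.05, seed=0):
    """Refine a pool of random peptide sequences by mutation and selection.

    `fitness` is any callable scoring a sequence (higher is better); InSiPS
    itself scores predicted binding to the target while penalizing
    off-target interactions.
    """
    rng = random.Random(seed)
    pool = ["".join(rng.choice(AMINO_ACIDS) for _ in range(length))
            for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the better-scoring half of the pool.
        pool.sort(key=fitness, reverse=True)
        survivors = pool[: pop_size // 2]
        # Mutation: refill the pool with point-mutated copies of survivors.
        children = []
        for parent in survivors:
            child = [rng.choice(AMINO_ACIDS) if rng.random() < mutation_rate
                     else aa for aa in parent]
            children.append("".join(child))
        pool = survivors + children
    return max(pool, key=fitness)
```

Because survivors carry over unchanged, the best score in the pool never decreases from one generation to the next.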
- Published
- 2019
- Full Text
- View/download PDF
3. Human–Soybean Allergies: Elucidation of the Seed Proteome and Comprehensive Protein–Protein Interaction Prediction
- Author
-
Bradley Barnes, Julia Hooker, Michael Sadowski, Bahram Samanfar, Hiroyuki Aoki, Ashkan Golshani, Kevin Dick, Arezo Pattang, Mohan Babu, Frank Dehne, Elroy R. Cober, Daniel Burnside, Sadhna Phanse, Nour Nissan, James R. Green, and Le Hoa Tan
- Subjects
Proteome, fungi, food and beverages, General Chemistry, Computational biology, Biology, Biochemistry, Interactome, Protein–protein interaction, Homo sapiens, Seeds, Hypersensitivity, Soybean Proteins, Human proteome project, Humans, Protein–protein interaction prediction, Soybeans, Soybean crop - Abstract
The soybean crop, Glycine max (L.) Merr., is consumed by humans, Homo sapiens, worldwide. While the respective bodies of literature and -omics data for each of these organisms are extensive, comparatively few studies investigate the molecular biological processes occurring between the two. We are interested in elucidating the network of protein-protein interactions (PPIs) involved in human-soybean allergies. To this end, we leverage state-of-the-art sequence-based PPI predictors amenable to predicting the enormous comprehensive interactome between human and soybean. A network-based analytical approach is proposed, leveraging similar interaction profiles to identify candidate allergens and proteins involved in the allergy response. Interestingly, the predicted interactome can be explored from two complementary perspectives: which soybean proteins are predicted to interact with specific human proteins and which human proteins are predicted to interact with specific soybean proteins. A total of eight proteins (six specific to the human proteome and two to the soy proteome) have been identified and supported by the literature to be involved in human health, specifically related to immunological and neurological pathways. This study, beyond generating the most comprehensive human-soybean interactome to date, elucidated a soybean seed interactome and identified several proteins putatively consequential to human health.
- Published
- 2021
- Full Text
- View/download PDF
4. Evolution of protein-protein interaction networks in yeast.
- Author
-
Andrew Schoenrock, Daniel Burnside, Houman Moteshareie, Sylvain Pitre, Mohsen Hooshyar, James R Green, Ashkan Golshani, Frank Dehne, and Alex Wong
- Subjects
Medicine, Science - Abstract
Interest in the evolution of protein-protein and genetic interaction networks has been rising in recent years, but the lack of large-scale high quality comparative datasets has acted as a barrier. Here, we carried out a comparative analysis of computationally predicted protein-protein interaction (PPI) networks from five closely related yeast species. We used the Protein-protein Interaction Prediction Engine (PIPE), which uses a database of known interactions to make sequence-based PPI predictions, to generate high quality predicted interactomes. Simulated proteomes and corresponding PPI networks were used to provide null expectations for the extent and nature of PPI network evolution. We found strong evidence for conservation of PPIs, with lower than expected levels of change in PPIs for about a quarter of the proteome. Furthermore, we found that changes in predicted PPI networks are poorly predicted by sequence divergence. Our analyses identified a number of functional classes experiencing fewer PPI changes than expected, suggestive of purifying selection on PPIs. Our results demonstrate the added benefit of considering predicted PPI networks when studying the evolution of closely related organisms.
- Published
- 2017
- Full Text
- View/download PDF
5. A computational approach to rapidly design peptides that detect SARS-CoV-2 surface protein S
- Author
-
Maryam Hajikarimlou, Mohsen Hooshyar, Mohamed Taha Moutaoufik, Khaled A Aly, Taha Azad, Sarah Takallou, Sasi Jagadeesan, Sadhna Phanse, Kamaledin B Said, Bahram Samanfar, John C Bell, Frank Dehne, Mohan Babu, and Ashkan Golshani
- Subjects
Structural Biology, Applied Mathematics, Genetics, Molecular Biology, Computer Science Applications - Abstract
Coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), prompted the development of diagnostic and therapeutic frameworks for timely containment of the pandemic. Here, we utilized our non-conventional computational algorithm, InSiPS, to rapidly design and experimentally validate peptides that bind to the SARS-CoV-2 spike (S) surface protein. We previously showed that this method can be used to develop peptides against yeast proteins; however, the applicability of this method to design peptides against other proteins had not been investigated. In the current study, we demonstrate that two sets of peptides developed using the InSiPS method can detect purified SARS-CoV-2 S protein via ELISA and surface plasmon resonance (SPR) approaches, suggesting the utility of our strategy in real-time COVID-19 diagnostics. Mass spectrometry-based salivary peptidomics shortlisted the top SARS-CoV-2 peptides detected in COVID-19 patients’ saliva, rendering them attractive SARS-CoV-2 diagnostic targets that, when subjected to our computational platform, can streamline the development of potent peptide diagnostics for SARS-CoV-2 variants of concern. Our approach can be rapidly applied to diagnosing other communicable diseases of immediate threat.
- Published
- 2022
6. SPR Distance Computation for Unrooted Trees
- Author
-
Christian Blouin, Andrew Rau-Chaplin, Frank Dehne, and Glenn Hickey
- Subjects
unrooted trees, SPR distance, lateral gene transfer, phylogenetic tree metrics, Evolution, QH359-425 - Abstract
The subtree prune and regraft distance (dSPR) between phylogenetic trees is important both as a general means of comparing phylogenetic tree topologies and as a measure of lateral gene transfer (LGT). Although there has been extensive study on the computation of dSPR and similar metrics between rooted trees, much less is known about SPR distances for unrooted trees, which often arise in practice when the root is unresolved. We show that unrooted SPR distance computation is NP-Hard and verify which techniques from related work can and cannot be applied. We then present an efficient heuristic algorithm for this problem and benchmark it on a variety of synthetic datasets. Our algorithm computes the exact SPR distance between unrooted trees; the heuristic element concerns only the algorithm’s computation time. Our method is a heuristic version of a fixed-parameter tractability (FPT) approach, and our experiments indicate that the running time behaves similarly to that of FPT algorithms. For real data sets, our algorithm was able to quickly compute dSPR for the majority of trees that were part of a study of LGT in 144 prokaryotic genomes. Our analysis of its performance, especially with respect to searching and reduction rules, is applicable to computing many related distance measures.
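For readers unfamiliar with the operation behind dSPR, a single subtree-prune-and-regraft move on an unrooted binary tree can be sketched as below. This is an illustrative sketch, not the paper's distance algorithm; it assumes integer node labels and a graft edge that still exists after pruning:

```python
def spr_move(adj, prune_edge, graft_edge):
    """One subtree prune-and-regraft (SPR) move on an unrooted binary tree.

    `adj` maps integer node labels to sets of neighbours.  The subtree on
    the `s` side of prune_edge = (p, s) is cut off, the leftover degree-2
    node p is suppressed, and the subtree is reattached to the midpoint of
    graft_edge = (a, b) via a fresh node.  dSPR is the minimum number of
    such moves needed to transform one tree into another.
    """
    p, s = prune_edge
    a, b = graft_edge
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    # Prune: remove the edge p-s, detaching the subtree containing s.
    adj[p].remove(s)
    adj[s].remove(p)
    # Suppress p if it is left with degree 2, keeping the tree binary.
    if len(adj[p]) == 2:
        x, y = adj.pop(p)
        adj[x].discard(p)
        adj[y].discard(p)
        adj[x].add(y)
        adj[y].add(x)
    # Regraft: subdivide edge a-b with a fresh node and hang s from it.
    new = max(adj) + 1
    adj[a].remove(b)
    adj[b].remove(a)
    adj[new] = {a, b, s}
    for v in (a, b, s):
        adj[v].add(new)
    return adj
```

On the quartet tree with leaves 1–4, pruning leaf 1 and regrafting it onto the edge (3, 6) changes the topology from 12|34 to 13|24, one of the tree's SPR neighbours.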
- Published
- 2008
7. In Silico Engineering of Synthetic Binding Proteins from Random Amino Acid Sequences
- Author
-
Bahram Samanfar, Kyle K. Biggar, Ashkan Golshani, Andrew Schoenrock, Mohsen Hooshyar, Kevin Dick, James R. Green, Matthew Jessulat, Mohan Babu, Tom Kazmirchuk, Prabh Basra, Frank Dehne, Alex Wong, Daniel Burnside, Maryam Hajikarimlou, Sylvain Pitre, Brad Barnes, and Houman Moteshareie
- Subjects
Bioinformatics, In silico, Peptide, Computational biology, DNA-binding protein, Article, Protein Family Determination, Binding site, Mutation, Multidisciplinary, Yeast Proteins, Biological Sciences, Amino acid - Abstract
Summary: Synthetic proteins with high affinity and selectivity for a protein target can be used as research tools, biomarkers, and pharmacological agents, but few methods exist to design such proteins de novo. To this end, the In-Silico Protein Synthesizer (InSiPS) was developed to design synthetic binding proteins (SBPs) that bind pre-determined targets while minimizing off-target interactions. InSiPS is a genetic algorithm that refines a pool of random sequences over hundreds of generations of mutation and selection to produce SBPs with pre-specified binding characteristics. As a proof of concept, we design SBPs against three yeast proteins and demonstrate binding and functional inhibition of two of three targets in vivo. Peptide SPOT arrays confirm binding sites, and a permutation array demonstrates target specificity. Our foundational approach will support the field of de novo design of small binding polypeptide motifs and has robust applicability while offering potential advantages over the limited number of techniques currently available.
Highlights:
• InSiPS engineers synthetic binding proteins (SBPs) using primary protein sequence
• SBPs are designed to bind a target protein and avoid “off-target” interactions
• Binding and functional inhibition of two of three target proteins in yeast is demonstrated
• Our new approach offers advantages over alternative tools that rely on 3D models
Subject Areas: Biological Sciences; Bioinformatics; Protein Family Determination
- Published
- 2019
8. SPR Distance Computation for Trees
- Author
-
Glenn Hickey, Frank Dehne, Andrew Rau-Chaplin, and Christian Blouin
- Subjects
Evolution, QH359-425 - Abstract
The subtree prune and regraft distance (dSPR) between phylogenetic trees is important both as a general means of comparing phylogenetic tree topologies and as a measure of lateral gene transfer (LGT). Although there has been extensive study on the computation of dSPR and similar metrics between rooted trees, much less is known about SPR distances for unrooted trees, which often arise in practice when the root is unresolved. We show that unrooted SPR distance computation is NP-Hard and verify which techniques from related work can and cannot be applied. We then present an efficient heuristic algorithm for this problem and benchmark it on a variety of synthetic datasets. Our algorithm computes the exact SPR distance between unrooted trees; the heuristic element concerns only the algorithm's computation time. Our method is a heuristic version of a fixed-parameter tractability (FPT) approach, and our experiments indicate that the running time behaves similarly to that of FPT algorithms. For real data sets, our algorithm was able to quickly compute dSPR for the majority of trees that were part of a study of LGT in 144 prokaryotic genomes. Our analysis of its performance, especially with respect to searching and reduction rules, is applicable to computing many related distance measures.
- Published
- 2008
- Full Text
- View/download PDF
9. Compressing Data Cube in Parallel OLAP Systems
- Author
-
Frank Dehne, Todd Eavis, and Boyong Liang
- Subjects
OLAP, Data cube, Compressing, Parallel processing, Science (General), Q1-390 - Abstract
This paper proposes an efficient algorithm to compress data cubes during parallel data cube generation. This low-overhead compression mechanism provides block-by-block and record-by-record compression using tuple difference coding techniques, thereby maximizing the compression ratio and minimizing the decompression penalty at run-time. The experimental results demonstrate that the typical compression ratio is about 30:1 without sacrificing running time. This paper also demonstrates that the compression method is well suited to the Hilbert space-filling curve, a mechanism widely used in multi-dimensional indexing.
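The tuple-difference-coding idea can be illustrated with a minimal sketch. The paper's scheme additionally packs the deltas into minimal byte widths per block; only the record-by-record difference step over sorted integer tuples is shown here:

```python
def diff_encode(records):
    """Difference-code a block of sorted integer tuples: the first record
    is stored verbatim, each later one as component-wise deltas from its
    predecessor.  Sorted cube output yields many small deltas, which a
    byte-level coder can then store compactly."""
    if not records:
        return []
    out = [tuple(records[0])]
    for prev, cur in zip(records, records[1:]):
        out.append(tuple(c - p for p, c in zip(prev, cur)))
    return out

def diff_decode(encoded):
    """Invert diff_encode by accumulating the deltas."""
    if not encoded:
        return []
    out = [tuple(encoded[0])]
    for delta in encoded[1:]:
        out.append(tuple(p + d for p, d in zip(out[-1], delta)))
    return out
```

Because decoding is a single running sum per column, the decompression penalty at query time stays small.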
- Published
- 2007
- Full Text
- View/download PDF
10. VOLAP: A Scalable Distributed Real-Time OLAP System for High-Velocity Data
- Author
-
Andrew Rau-Chaplin, Frank Dehne, Neil Burke, and David Robillard
- Subjects
Distributed database, Computer science, Online analytical processing, Serialization, Distributed computing, Aggregate (data warehouse), Cloud computing, Computational Theory and Mathematics, Hardware and Architecture, SAP HANA, Signal Processing, Scalability, Queries per second - Abstract
This paper presents VelocityOLAP (VOLAP), a distributed real-time OLAP system for high-velocity data. VOLAP makes use of dimension hierarchies, is highly scalable, exploits both multi-core and multi-processor parallelism, and can guarantee serializable execution of insert and query operations. In contrast to other high performance OLAP systems such as SAP HANA or IBM Netezza that rely on vertical scaling or special purpose hardware, VOLAP supports cost-efficient horizontal scaling on commodity hardware or modest cloud instances. Experiments on 20 Amazon EC2 nodes with TPC-DS data show that VOLAP is capable of bulk ingesting data at over 600 thousand items per second, and processing streams of interspersed insertions and aggregate queries at a rate of approximately 50 thousand insertions and 20 thousand aggregate queries per second with a database of 1 billion items. VOLAP is designed to support applications that perform large aggregate queries, and provides similar high performance for aggregations ranging from a few items to nearly the entire database.
- Published
- 2018
- Full Text
- View/download PDF
11. PIPE4: Fast PPI Predictor for Comprehensive Inter- and Cross-Species Interactomes
- Author
-
Bahram Samanfar, Stephen J. Molnar, Kevin Dick, Bradley Barnes, Elroy R. Cober, James R. Green, Le Hoa Tan, Kyle K. Biggar, Ashkan Golshani, Frank Dehne, and Benjamin Mimee
- Subjects
Computer science, Arabidopsis, Computational biology, Saccharomyces cerevisiae, Protein function predictions, Interactome, Models, Biological, Article, Mice, Rhabditida, Protein Interaction Mapping, Animals, Humans, Metabolomics, Protein Interaction Maps, Multidisciplinary, Computational biology and bioinformatics, Drosophila melanogaster, Glycine, Host-Pathogen Interactions, HIV-1, Soybeans - Abstract
The need for larger-scale and increasingly complex protein-protein interaction (PPI) prediction tasks demands that state-of-the-art predictors be highly efficient and adapted to inter- and cross-species predictions. Furthermore, the ability to generate comprehensive interactomes has enabled the appraisal of each PPI in the context of all predictions, leading to further improvements in classification performance in the face of extreme class imbalance using the Reciprocal Perspective (RP) framework. Here we describe the PIPE4 algorithm. Adaptation of the PIPE3/MP-PIPE sequence preprocessing step led to upwards of a 50x speedup, and the new Similarity Weighted Score appropriately normalizes for window frequency when applied to any inter- and cross-species prediction schema. Comprehensive interactomes for three prediction schemas are generated: (1) cross-species predictions, where Arabidopsis thaliana is used as a proxy to predict the comprehensive Glycine max interactome, (2) inter-species predictions between Homo sapiens and HIV-1, and (3) a combined schema involving both cross- and inter-species predictions, where both Arabidopsis thaliana and Caenorhabditis elegans are used as proxy species to predict the interactome between Glycine max (the soybean legume) and Heterodera glycines (the soybean cyst nematode). Comparing PIPE4 with the state of the art showed improved performance, indicating that it should be the method of choice for complex PPI prediction schemas.
- Published
- 2019
12. New BSP/CGM algorithms for spanning trees
- Author
-
Henrique Mongelli, Jayme Luiz Szwarcfiter, Jucele Franca de Alencar Vasconcellos, Siang Wun Song, Edson Norberto Cáceres, and Frank Dehne
- Subjects
Discrete mathematics, Spanning tree, Computer science, Parallel algorithm, Computer networks, Graph theory, Minimum spanning tree, Theoretical Computer Science, Hardware and Architecture, Graph (abstract data type), Software - Abstract
Computing a spanning tree (ST) and a minimum spanning tree (MST) of a graph are fundamental problems in graph theory and arise as subproblems in many applications. In this article, we propose parallel algorithms for these problems. One of the steps of previous parallel MST algorithms relies on heavy use of parallel list ranking which, though efficient in theory, is very time-consuming in practice. Using a different approach based on graph decomposition, we devised new parallel algorithms that do not use the list ranking procedure. We proved that our algorithms are correct, and for a graph [Formula: see text], [Formula: see text], and [Formula: see text], the algorithms can be executed on a Bulk Synchronous Parallel/Coarse Grained Multicomputer (BSP/CGM) model using [Formula: see text] communication rounds with [Formula: see text] computation time for each round. To show that our algorithms have good performance on real parallel machines, we have implemented them on a graphics processing unit. The obtained speedups are competitive and show that the BSP/CGM model is suitable for designing general-purpose parallel algorithms.
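The abstract does not reproduce the decomposition itself, but the component-merging pattern that BSP/CGM-style MST algorithms parallelize is essentially Borůvka's: in each round, every component independently attaches its lightest outgoing edge, so no list ranking is needed. A sequential sketch of that pattern (not the paper's algorithm; distinct edge weights are assumed for simplicity):

```python
def boruvka_mst(n, edges):
    """Borůvka's MST: repeatedly merge each component along its lightest
    outgoing edge.  `edges` is a list of (weight, u, v) over nodes 0..n-1
    with distinct weights; components are tracked with a union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst = []
    while True:
        # Each component picks its cheapest edge leaving the component.
        cheapest = {}
        for w, u, v in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                if r not in cheapest or w < cheapest[r][0]:
                    cheapest[r] = (w, u, v)
        if not cheapest:
            break  # one component left: MST complete
        # Merge along the selected edges (re-check to skip duplicates).
        for w, u, v in cheapest.values():
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                mst.append((w, u, v))
    return mst
```

Each while-iteration corresponds to one communication round: the per-component edge selection is embarrassingly parallel, and the number of components at least halves per round.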
- Published
- 2019
13. SpeeDB: fast structural protein searches
- Author
-
David Robillard, Frank Dehne, Scott Hazelhurst, and Phelelani T. Mpangase
- Subjects
Models, Molecular, Statistics and Probability, Proteases, Protein Conformation, Computer science, Protein Data Bank (RCSB PDB), Biochemistry, Protein structure, Humans, Amino Acids, Databases, Protein, Molecular Biology, Information retrieval, Structural protein, Computational Biology, Active site, Hydrogen Bonding, Protein Data Bank, Computer Science Applications, Computational Mathematics, Identification (information), Computational Theory and Mathematics, Structural Homology, Protein, Algorithms, Software, Peptide Hydrolases - Abstract
Motivation: Interactions between amino acids are important determinants of the structure, stability and function of proteins. Several tools have been developed for the identification and analysis of such interactions in proteins, based on the extensive studies carried out on high-resolution structures from the Protein Data Bank (PDB). Although these tools allow users to identify and analyze interactions, analysis can only be performed on one structure at a time. This makes it difficult and time consuming to study the significance of these interactions on a large scale.
Results: SpeeDB is a web-based tool for the identification of protein structures based on structural properties. SpeeDB queries are executed on all structures in the PDB at once, quickly enough for interactive use. SpeeDB includes standard queries based on published criteria for identifying various structures: disulphide bonds, catalytic triads and aromatic–aromatic, sulphur–aromatic, cation–π and ionic interactions. Users can also construct custom queries in the user interface without any programming. Results can be downloaded in Comma Separated Value (CSV) format for further analysis with other tools. Case studies presented in this article demonstrate how SpeeDB can be used to answer various biological questions. Analysis of human proteases revealed that disulphide bonds are the predominant type of interaction and are located close to the active site, where they promote substrate specificity. For the two homologous G protein-coupled receptors and the two protein kinase paralogs analyzed, the differences in the types of interactions responsible for stability account for the differences in specificity and functionality of the structures.
Availability and implementation: SpeeDB is available at http://www.parallelcomputing.ca as a web service.
Contact: d@drobilla.net
Supplementary information: Supplementary data are available at Bioinformatics online.
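A SpeeDB-style structural query, such as the disulphide-bond search mentioned above, reduces to a geometric filter over atom records. A hedged sketch follows; the 2.3 Å sulphur–sulphur cutoff is a commonly used published criterion, not necessarily the exact threshold SpeeDB ships with:

```python
from math import dist

def disulphide_candidates(atoms, cutoff=2.3):
    """Scan a structure for cysteine SG sulphur pairs within bonding
    distance.  `atoms` is a list of (residue_id, residue_name, atom_name,
    (x, y, z)) records; coordinates are in angstroms."""
    sg = [(rid, xyz) for rid, rname, aname, xyz in atoms
          if rname == "CYS" and aname == "SG"]
    pairs = []
    for i in range(len(sg)):
        for j in range(i + 1, len(sg)):
            if dist(sg[i][1], sg[j][1]) <= cutoff:
                pairs.append((sg[i][0], sg[j][0]))
    return pairs
```

Running such a filter across every PDB structure at once, rather than one file at a time, is the interactive-scale capability the abstract describes.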
- Published
- 2015
- Full Text
- View/download PDF
14. Scalable real-time OLAP on cloud architectures
- Author
-
R. Zhou, Andrew Rau-Chaplin, Frank Dehne, Q. Kong, and Hamidreza Zaboli
- Subjects
Computer Networks and Communications, Computer science, Cloud computing, Theoretical Computer Science, Artificial Intelligence, Server, Database, Transaction processing, Online analytical processing, Data warehouse, Tree (data structure), Hardware and Architecture, Scalability, Online transaction processing, Tuple, Software - Abstract
In contrast to queries for on-line transaction processing (OLTP) systems, which typically access only a small portion of a database, OLAP queries may need to aggregate large portions of a database, which often leads to performance issues. In this paper we introduce CR-OLAP, a scalable Cloud-based Real-time OLAP system based on a new distributed index structure for OLAP, the distributed PDCR tree. CR-OLAP utilizes a scalable cloud infrastructure consisting of multiple commodity servers (processors). That is, with increasing database size, CR-OLAP dynamically increases the number of processors to maintain performance. Our distributed PDCR tree data structure supports multiple dimension hierarchies and efficient query processing on the elaborate dimension hierarchies which are so central to OLAP systems. It is particularly efficient for complex OLAP queries that need to aggregate large portions of the data warehouse, such as "report the total sales in all stores located in California and New York during the months February-May of all years". We evaluated CR-OLAP on the Amazon EC2 cloud, using the TPC-DS benchmark data set. The tests demonstrate that CR-OLAP scales well with an increasing number of processors, even for complex queries. For example, for an Amazon EC2 cloud instance with 16 processors, a data warehouse with 160 million tuples, and a TPC-DS OLAP query stream where each query aggregates between 60% and 95% of the database, CR-OLAP achieved a query latency of below 0.3 s, which can be considered a real-time response.
Highlights:
• Collaboration with IBM on alleviating performance bottlenecks for OLAP queries
• OLAP queries may aggregate large portions of the database, creating bottlenecks
• We study the use of parallel computing on scalable clouds to accelerate queries
• Our system, CR-OLAP, is based on a new scalable distributed index structure
• CR-OLAP uses dynamic cloud elasticity to improve performance
- Published
- 2015
- Full Text
- View/download PDF
15. Designing Anti-Zika Virus Peptides Derived from Predicted Human-Zika Virus Protein-Protein Interactions
- Author
-
Maryam Hajikarimlou, Andrew Low, Edana Cassol, Ashkan Golshani, Bahram Samanfar, Kevin Dick, Brad Barnes, Clara Lettl, Tom Kazmirchuk, Andrew Schoenrock, Frank Dehne, Mohsen Hooshyar, Katayoun Omidi, James R. Green, Duale Ahmed, Sylvain Pitre, Alex Wong, Houman Moteshareie, Daniel Burnside, and Mohan Babu
- Subjects
In silico, Microbial Sensitivity Tests, Computational biology, Biochemistry, Virus, Zika virus, Protein–protein interaction, Viral Proteins, Structural Biology, ZIKV Infection, Humans, Mechanism (biology), Organic Chemistry, Virology, Computational Mathematics, Drug Design, Protein–protein interaction prediction, Peptides, Protein Binding - Abstract
The production of anti-Zika virus (ZIKV) therapeutics has become increasingly important as the propagation of the devastating virus continues largely unchecked. Notably, a causal relationship between ZIKV infection and neurodevelopmental abnormalities has been widely reported, yet a specific mechanism underlying impaired neurological development has not been identified. Here, we report on the design of several synthetic competitive inhibitory peptides against key pathogenic ZIKV proteins through the prediction of protein-protein interactions (PPIs). Often, PPIs between host and viral proteins are crucial for infection and pathogenesis, making them attractive targets for therapeutics. Using two complementary sequence-based PPI prediction tools, we first produced a comprehensive map of predicted human-ZIKV PPIs (involving 209 human protein candidates). We then designed several peptides intended to disrupt the corresponding host-pathogen interactions thereby acting as anti-ZIKV therapeutics. The data generated in this study constitute a foundational resource to aid in the multi-disciplinary effort to combat ZIKV infection, including the design of additional synthetic proteins.
- Published
- 2017
- Full Text
- View/download PDF
16. Computational approaches toward the design of pools for the in vitro selection of complex aptamers
- Author
-
Xuemei Luo, Michel Dumontier, Sylvain Pitre, Ashkan Golshani, Maria C. DeRosa, James R. Green, Frank Dehne, and Maureen McKeague
- Subjects
Aptamer, Molecular Sequence Data, Computational biology, aptamer pool design, Biology, Article, DNA sequencing, Genetic algorithm, Computer Simulation, RNA/DNA secondary structure, Molecular Biology, Selection (genetic algorithm), Genetics, Sequence, Base Sequence, ATP aptamer, RNA, Aptamers, Nucleotide, in vitro selection, Nucleic Acid Conformation, random pool, Systematic evolution of ligands by exponential enrichment, DNA - Abstract
It is well known that using random RNA/DNA sequences for SELEX experiments will generally yield low-complexity structures. Early experimental results suggest that having a structurally diverse library, which, for instance, includes high-order junctions, may prove useful in finding new functional motifs. Here, we develop two computational methods to generate sequences that exhibit higher structural complexity and can be used to increase the overall structural diversity of initial pools for in vitro selection experiments. Random Filtering selectively increases the number of five-way junctions in RNA/DNA pools, and Genetic Filtering designs RNA/DNA pools to a specified structure distribution, whether uniform or otherwise. We show that using our computationally designed DNA pool greatly improves access to highly complex sequence structures for SELEX experiments (without losing our ability to select for common one-way and two-way junction sequences).
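The Random Filtering method described above can be sketched as rejection sampling against a structure predictor. Here `has_desired_structure` is a stand-in callable; the paper scores predicted RNA/DNA secondary structure (e.g. counting five-way junctions), which requires a folding engine not reproduced in this sketch:

```python
import random

def random_filtering(n_keep, length, has_desired_structure, seed=0):
    """Draw random nucleotide sequences and keep only those whose predicted
    structure meets the target criterion, enriching the initial pool for
    structurally complex candidates before in vitro selection."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n_keep:
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        if has_desired_structure(seq):
            kept.append(seq)
    return kept
```

Genetic Filtering replaces the blind resampling loop with mutation and crossover of the best candidates so far, steering the pool toward a specified distribution of structures rather than merely filtering for one.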
- Published
- 2010
- Full Text
- View/download PDF
17. Global investigation of protein-protein interactions in yeast Saccharomyces cerevisiae using re-occurring short polypeptide sequences
- Author
-
C. North, Michel Dumontier, Sylvain Pitre, X. Luo, M. Alamgir, Frank Dehne, Matthew Jessulat, Ashkan Golshani, Albert Chan, and James R. Green
- Subjects
Saccharomyces cerevisiae Proteins, Saccharomyces cerevisiae, Membrane protein interactions, Computational Biology, Biology, Genome, Yeast, Protein–protein interaction, Novel gene, Membrane protein, Biochemistry, Sequence Analysis, Protein, Protein Interaction Mapping, Genetics, Genome, Fungal, Peptides, Gene - Abstract
Protein-protein interaction (PPI) maps provide insight into cellular biology and have received considerable attention in the post-genomic era. While large-scale experimental approaches have generated large collections of experimentally determined PPIs, technical limitations preclude certain PPIs from detection. Recently, we demonstrated that yeast PPIs can be computationally predicted using re-occurring short polypeptide sequences between known interacting protein pairs. However, the computational requirements and low specificity made this method unsuitable for large-scale investigations. Here, we report an improved approach, which exhibits a specificity of approximately 99.95% and executes 16,000 times faster. Importantly, we report the first all-to-all sequence-based computational screen of PPIs in yeast, Saccharomyces cerevisiae, in which we identify 29,589 high-confidence interactions out of approximately 2 × 10^7 possible pairs. Of these, 14,438 PPIs have not been previously reported and may represent novel interactions. In particular, these results reveal a richer set of membrane protein interactions, not readily amenable to experimental investigations. From the novel PPIs, a novel putative protein complex comprised largely of membrane proteins was revealed. In addition, two novel gene functions were predicted and experimentally confirmed to affect the efficiency of non-homologous end-joining, providing further support for the usefulness of the identified PPIs in biological investigations.
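The core idea of scoring a candidate pair by re-occurring short polypeptide windows can be sketched as below. This is an illustration of the principle only; the real engine uses longer windows (around 20 residues), considers both orientations, and normalizes carefully, and the window length `w=3` here is chosen purely for the toy example:

```python
def window_pairs(a, b, w=3):
    """All pairs of length-w windows, one window from each sequence."""
    wa = {a[i:i + w] for i in range(len(a) - w + 1)}
    wb = {b[i:i + w] for i in range(len(b) - w + 1)}
    return {(x, y) for x in wa for y in wb}

def ppi_score(a, b, known_pairs, w=3):
    """Score a candidate protein pair (a, b) by how many of its short-window
    pairs re-occur among known interacting pairs -- the re-occurring
    polypeptide signal behind PIPE-style sequence-based predictors."""
    seen = set()
    for p, q in known_pairs:
        seen |= window_pairs(p, q, w)
    return len(window_pairs(a, b, w) & seen)
```

A candidate pair whose window pairs never co-occur in the training interactions scores zero; high scores suggest the two sequences carry fragments that have interacted before.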
- Published
- 2008
- Full Text
- View/download PDF
18. RCUBE
- Author
-
Andrew Rau-Chaplin, Todd Eavis, and Frank Dehne
- Subjects
Theoretical computer science, Computer science, Online analytical processing, Search engine indexing, Volume (computing), Parallel computing, ROLAP, Supercomputer, Data warehouse, Hardware and Architecture, Scalability, Software, MOLAP - Abstract
This article addresses the query performance issue for Relational OLAP (ROLAP) datacubes. We present RCUBE, a distributed multidimensional ROLAP indexing scheme which is practical to implement, requires only a small communication volume, and is fully adapted to distributed disks. Our solution is efficient for spatial searches in high dimensions and scalable in terms of data sizes, dimensions, and number of processors. Our method is also incrementally maintainable. Using “surrogate” group-bys, it allows for the efficient processing of arbitrary OLAP queries on partial cubes, where not all of the group-bys have been materialized. Our experiments with RCUBE show that the ROLAP advantage of better scalability, in comparison to MOLAP, can be maintained while providing a fast and flexible index for OLAP queries.
- Published
- 2008
- Full Text
- View/download PDF
19. PnP: sequential, external memory, and parallel iceberg cube computation
- Author
-
Andrew Rau-Chaplin, Todd Eavis, Frank Dehne, and Ying Chen
- Subjects
Information Systems and Management ,Speedup ,Computer science ,Computation ,Parallel computing ,Data structure ,Data cube ,Hardware and Architecture ,Scalability ,Pruning (decision trees) ,Cube ,Software ,Auxiliary memory ,Information Systems - Abstract
We present "Pipe 'n Prune" (PnP), a new hybrid method for iceberg-cube query computation. The novelty of our method is that it achieves a tight integration of top-down piping for data aggregation with bottom-up a priori data pruning. A particular strength of PnP is that it is efficient for all of the following scenarios: (1) Sequential iceberg-cube queries, (2) External memory iceberg-cube queries, and (3) Parallel iceberg-cube queries on shared-nothing PC clusters with multiple disks. We performed an extensive performance analysis of PnP for the above scenarios with the following main results: In the first scenario PnP performs very well for both dense and sparse data sets, providing an interesting alternative to BUC and Star-Cubing. In the second scenario PnP shows a surprisingly efficient handling of disk I/O, with an external memory running time that is less than twice the running time for full in-memory computation of the same iceberg-cube query. In the third scenario PnP scales very well, providing near linear speedup for a larger number of processors and thereby solving the scalability problem observed for the parallel iceberg-cubes proposed by Ng et al.
- Published
- 2008
- Full Text
- View/download PDF
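The bottom-up a priori pruning that PnP integrates can be illustrated with a minimal BUC-style iceberg computation: once a partition falls below the support threshold, none of its refinements can reach it, so the whole subtree is skipped. A sketch under invented data (it covers only prefix group-bys of one dimension ordering and omits the top-down piping):

```python
def iceberg_cube(rows, dims, minsup):
    """Bottom-up iceberg cube sketch: emit (group-by key, count) for every
    dimension-prefix group whose count meets minsup."""
    out = {}

    def recurse(rows, depth, key):
        if depth == len(dims):
            return
        groups = {}
        for r in rows:
            groups.setdefault(r[depth], []).append(r)
        for val, part in groups.items():
            if len(part) < minsup:   # a-priori pruning: no refinement of
                continue             # this partition can reach minsup
            k = key + (val,)
            out[k] = len(part)
            recurse(part, depth + 1, k)

    recurse(rows, 0, ())
    return out

rows = [("CA", "web"), ("CA", "web"), ("NY", "web"), ("CA", "store")]
print(iceberg_cube(rows, ["region", "channel"], minsup=2))
# {('CA',): 3, ('CA', 'web'): 2} -- NY and ('CA', 'store') pruned
```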
20. SPR Distance Computation for Unrooted Trees
- Author
-
Frank Dehne, Andrew Rau-Chaplin, Glenn Hickey, and Christian Blouin
- Subjects
SPR distance ,Reduction (recursion theory) ,Computer science ,Heuristic (computer science) ,Generalization ,Computation ,lcsh:Evolution ,0102 computer and information sciences ,computer.software_genre ,unrooted trees ,01 natural sciences ,Distance measures ,Combinatorics ,03 medical and health sciences ,Genetics ,lcsh:QH359-425 ,Ecology, Evolution, Behavior and Systematics ,Original Research ,030304 developmental biology ,0303 health sciences ,Phylogenetic tree ,phylogenetic tree metrics ,lateral gene transfer ,Computer Science Applications ,Nondeterministic algorithm ,Tree (data structure) ,010201 computation theory & mathematics ,Data mining ,computer - Abstract
The subtree prune and regraft distance (d_SPR) between phylogenetic trees is important both as a general means of comparing phylogenetic tree topologies and as a measure of lateral gene transfer (LGT). Although there has been extensive study on the computation of d_SPR and similar metrics between rooted trees, much less is known about SPR distances for unrooted trees, which often arise in practice when the root is unresolved. We show that unrooted SPR distance computation is NP-Hard and verify which techniques from related work can and cannot be applied. We then present an efficient heuristic algorithm for this problem and benchmark it on a variety of synthetic datasets. Our algorithm computes the exact SPR distance between unrooted trees; the heuristic element concerns only the algorithm's computation time. Our method is a heuristic version of a fixed parameter tractability (FPT) approach, and our experiments indicate that the running time behaves similarly to FPT algorithms. For real data sets, our algorithm was able to quickly compute d_SPR for the majority of trees that were part of a study of LGT in 144 prokaryotic genomes. Our analysis of its performance, especially with respect to searching and reduction rules, is applicable to computing many related distance measures. 1. Introduction. Phylogenetic trees are used to describe evolutionary relationships. DNA or protein sequences are associated with the leaves of the tree and the internal nodes correspond to speciation or gene duplication events. In order to model ancestor-descendant relationships on the tree, a direction must be associated with its edges by assigning a root. Often, insufficient information exists to determine the root and the tree is left unrooted. Unrooted trees still provide a notion of evolutionary relationship between organisms even if the direction of descent remains unknown.
The phylogenetic tree representation has recently come under scrutiny, with critics claiming that it is too simple to properly model microbial evolution, particularly in the presence of lateral gene transfer (LGT) events (Doolittle, 1999). A LGT is the transfer of genetic material between species by means other than inheritance and thus cannot be represented in a tree, as it would create a cycle. The prevalence of LGT events in microbial evolution can, however, still be studied using phylogenetic trees. Given a pair of trees describing the same sets of species, each constructed using different sets of genes, a LGT event corresponds to a displacement of a common subtree, referred to as a SPR operation. The SPR distance is the minimum number of SPR operations, denoted by d_SPR, that explain the topological differences between a pair of trees. It is equivalent to the number of transfers in the most parsimonious LGT scenario (Beiko and Hamilton, 2006). In general, d_SPR can be used as a measure of the topological difference between two trees, e.g. for comparing the outputs of different tree construction algorithms. Tree bisection and reconnection (TBR) is a generalization of SPR that allows the pruned subtree to be rerooted before being regrafted. Computation of the TBR distance (d_TBR) was shown to be NP-hard (nondeterministic polynomial-time hard) by Allen and Steel (2001), who also provided two rules that reduce two input trees to a size that is a linear function of d_TBR without altering their distance. These rules, which reduce common chains and subtrees, also form the basis of algorithms that compute the SPR distance between rooted trees (d_rSPR) (Bordewich and Semple, 2004) as well as the hybridization number (h) (Bordewich et al. 2007); see Section 3.3. Such algorithms proceed as follows. First the distance problem is shown to be equivalent to counting components of a maximum agreement forest
- Published
- 2008
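The common-subtree reduction rule mentioned above (Allen and Steel, 2001) can be sketched for the rooted case: maximal subtrees shared by both input trees are replaced by single fresh leaves without changing the distance. The tuple representation and helper names below are invented for illustration; the paper itself works with unrooted trees:

```python
def canon(t):
    """Canonical form of a rooted tree given as nested tuples of leaf labels."""
    if isinstance(t, str):
        return t
    return tuple(sorted((canon(c) for c in t), key=repr))

def subtrees(t, acc):
    """Canonical forms of all internal subtrees of t."""
    if not isinstance(t, str):
        acc.add(canon(t))
        for c in t:
            subtrees(c, acc)
    return acc

def reduce_common(t1, t2):
    """Subtree reduction sketch: collapse every maximal subtree shared
    by both trees into a single fresh leaf (X0, X1, ...)."""
    shared = subtrees(t1, set()) & subtrees(t2, set())
    names = {}

    def collapse(t):
        if isinstance(t, str):
            return t
        c = canon(t)
        if c in shared:  # topmost shared node => maximal shared subtree
            return names.setdefault(c, f"X{len(names)}")
        return tuple(collapse(ch) for ch in t)

    return collapse(t1), collapse(t2)

a = ((("a", "b"), "c"), ("d", "e"))
b = (("a", "b"), (("d", "e"), "c"))
print(reduce_common(a, b))  # shared ('a','b') -> X0, ('d','e') -> X1
```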
21. Improved Data Partitioning for Building Large ROLAP Data Cubes in Parallel
- Author
-
Andrew Rau-Chaplin, Todd Eavis, Ying Chen, and Frank Dehne
- Subjects
Clustering high-dimensional data ,Data cube ,Speedup ,Hardware and Architecture ,Computer science ,Online analytical processing ,Scalability ,Multiprocessing ,Parallel computing ,ROLAP ,Software ,Data warehouse - Abstract
This paper presents an improved parallel method for generating ROLAP data cubes on a shared-nothing multiprocessor based on a novel optimized data partitioning technique. Since no shared disk is required, our method can be used for highly scalable processor clusters consisting of standard PCs with local disks only, connected via a data switch. Experiments show that our improved parallel method provides optimal, linear, speedup for at least 32 processors. The approach taken, which uses a ROLAP representation of the data cube, is well suited for large data warehouses and high dimensional data, and supports the generation of both fully materialized and partially materialized data cubes.
- Published
- 2006
- Full Text
- View/download PDF
22. The cgmCUBE project: Optimizing parallel data cube generation for ROLAP
- Author
-
Frank Dehne, Andrew Rau-Chaplin, and Todd Eavis
- Subjects
Information Systems and Management ,Computer science ,Distributed computing ,Online analytical processing ,InformationSystems_DATABASEMANAGEMENT ,ROLAP ,computer.software_genre ,Data structure ,Data warehouse ,Very large database ,Data cube ,Knowledge extraction ,Hardware and Architecture ,Relational model ,Data mining ,computer ,Software ,Information Systems - Abstract
On-line Analytical Processing (OLAP) has become one of the most powerful and prominent technologies for knowledge discovery in VLDB (Very Large Database) environments. Central to the OLAP paradigm is the data cube, a multi-dimensional hierarchy of aggregate values that provides a rich analytical model for decision support. Various sequential algorithms for the efficient generation of the data cube have appeared in the literature. However, given the size of contemporary data warehousing repositories, multi-processor solutions are crucial for the massive computational demands of current and future OLAP systems. In this paper we discuss the cgmCUBE Project, a multi-year effort to design and implement a multi-processor platform for data cube generation that targets the relational database model (ROLAP). More specifically, we discuss new algorithmic and system optimizations relating to (1) a thorough optimization of the underlying sequential cube construction method and (2) a detailed and carefully engineered cost model for improved parallel load balancing and faster sequential cube construction. These optimizations were key in allowing us to build a prototype that is able to produce data cube output at a rate of over one TeraByte per hour.
- Published
- 2006
- Full Text
- View/download PDF
23. MAXIMIZING A VORONOI REGION: THE CONVEX CASE
- Author
-
Frank Dehne, Rolf Klein, and Raimund Seidel
- Subjects
Applied Mathematics ,Regular polygon ,Convex position ,Lloyd's algorithm ,Weighted Voronoi diagram ,Theoretical Computer Science ,Bowyer–Watson algorithm ,Combinatorics ,Computational Mathematics ,Computational Theory and Mathematics ,Power diagram ,Geometry and Topology ,Voronoi diagram ,Centroidal Voronoi tessellation ,Mathematics - Abstract
Given a set S of s points in the plane, where do we place a new point, p, in order to maximize the area of its region in the Voronoi diagram of S and p? We study the case where the Voronoi neighbors of p are in convex position, and prove that there is at most one local maximum.
- Published
- 2005
- Full Text
- View/download PDF
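The quantity being maximized above, the area of the new point p's region in the Voronoi diagram of S and p, can be estimated numerically. The Monte Carlo sketch below clips the (possibly unbounded) region to a square window; the sites, window, and sample count are invented for illustration:

```python
import random

def voronoi_area(p, sites, box=(-1.0, 1.0), samples=20000, seed=1):
    """Monte Carlo estimate, clipped to a square window, of the area of
    the set of points strictly closer to p than to every site in S."""
    random.seed(seed)
    lo, hi = box
    inside = 0
    for _ in range(samples):
        x = random.uniform(lo, hi)
        y = random.uniform(lo, hi)
        d2p = (x - p[0]) ** 2 + (y - p[1]) ** 2
        if all(d2p < (x - s[0]) ** 2 + (y - s[1]) ** 2 for s in sites):
            inside += 1
    return inside / samples * (hi - lo) ** 2

# p's region between sites at x = -0.5 and x = 0.5 is the vertical strip
# -0.25 < x < 0.25, so its clipped area is exactly 1.0.
sites = [(-0.5, 0.0), (0.5, 0.0)]
print(voronoi_area((0.0, 0.0), sites))  # ≈ 1.0
```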
24. CGMGRAPH/CGMLIB: Implementing and Testing CGM Graph Algorithms on PC Clusters and Shared Memory Machines
- Author
-
Frank Dehne, Ryan Taylor, and Albert Chan
- Subjects
Connected component ,020203 distributed computing ,Speedup ,Computer science ,List ranking ,Sorting ,Eulerian path ,010103 numerical & computational mathematics ,02 engineering and technology ,Parallel computing ,01 natural sciences ,Graph ,Theoretical Computer Science ,symbols.namesake ,Shared memory ,Hardware and Architecture ,0202 electrical engineering, electronic engineering, information engineering ,symbols ,Bipartite graph ,Graph (abstract data type) ,Prefix sum ,0101 mathematics ,Software - Abstract
In this paper, we present CGMgraph, the first integrated library of parallel graph methods for PC clusters based on Coarse Grained Multicomputer (CGM) algorithms. CGMgraph implements parallel methods for various graph problems. Our implementations of deterministic list ranking, Euler tour, connected components, spanning forest, and bipartite graph detection are, to our knowledge, the first efficient implementations for PC clusters. Our library also includes CGMlib, a library of basic CGM tools such as sorting, prefix sum, one-to-all broadcast, all-to-one gather, h-Relation, all-to-all broadcast, array balancing, and CGM partitioning. Both libraries are available for download at http://www.scs.carleton.ca/~cgm. In the experimental part of this paper, we demonstrate the performance of our methods on four different architectures: a gigabit connected high performance PC cluster, a smaller PC cluster connected via fast ethernet, a network of workstations, and a shared memory machine. Our experiments show that our library provides good parallel speedup and scalability on all four platforms. The communication overhead is, in most cases, small and does not grow significantly with an increasing number of processors. This is a very important feature of CGM algorithms which makes them very efficient in practice.
- Published
- 2005
- Full Text
- View/download PDF
25. Parallel ROLAP Data Cube Construction on Shared-Nothing Multiprocessors
- Author
-
Frank Dehne, Ying Chen, Andrew Rau-Chaplin, and Todd Eavis
- Subjects
Information Systems and Management ,Speedup ,Computer science ,Relational database ,Online analytical processing ,Parallel computing ,ROLAP ,Data structure ,Data warehouse ,Data cube ,Hardware and Architecture ,Cube ,Row ,Software ,Information Systems - Abstract
The pre-computation of data cubes is critical to improving the response time of on-line analytical processing (OLAP) systems and can be instrumental in accelerating data mining tasks in large data warehouses. In order to meet the need for improved performance created by growing data sizes, parallel solutions for generating the data cube are becoming increasingly important. The paper presents a parallel method for generating data cubes on a shared-nothing multiprocessor. Since no (expensive) shared disk is required, our method can be used on low cost Beowulf style clusters consisting of standard PCs with local disks connected via a data switch. Our approach uses a ROLAP representation of the data cube where views are stored as relational tables. This allows for tight integration with current relational database technology. We have implemented our parallel shared-nothing data cube generation method and evaluated it on a PC cluster, exploring relative speedup, local vs. global schedule trees, data skew, cardinality of dimensions, data dimensionality, and balance tradeoffs. For an input data set of 2,000,000 rows (72 Megabytes), our parallel data cube generation method achieves close to optimal speedup; generating a full data cube of ≈227 million rows (5.6 Gigabytes) on a 16-processor cluster in under 6 minutes. For an input data set of 10,000,000 rows (360 Megabytes), our parallel method, running on a 16-processor PC cluster, created a data cube consisting of ≈846 million rows (21.7 Gigabytes) in under 47 minutes.
- Published
- 2004
- Full Text
- View/download PDF
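A ROLAP data cube, as the abstract describes, is the collection of all 2^d group-by views of a fact table, each stored as a relation. A minimal sequential sketch with invented data follows; the paper's contribution is the parallel, shared-nothing generation of these views, which this toy does not attempt:

```python
from itertools import combinations
from collections import defaultdict

def data_cube(rows, dims, measure):
    """All 2^d group-by views of a fact table. Each view maps a
    dimension-value tuple to an aggregated measure (here, a sum)."""
    views = {}
    for k in range(len(dims) + 1):
        for combo in combinations(range(len(dims)), k):
            agg = defaultdict(int)
            for r in rows:
                key = tuple(r[i] for i in combo)
                agg[key] += r[measure]
            views[tuple(dims[i] for i in combo)] = dict(agg)
    return views

rows = [("CA", "web", 10), ("CA", "store", 5), ("NY", "web", 7)]
cube = data_cube(rows, ["region", "channel"], measure=2)
print(cube[("region",)])  # {('CA',): 15, ('NY',): 7}
print(cube[()])           # {(): 22} -- the grand total
```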
26. Solving large FPT problems on coarse-grained parallel machines
- Author
-
James J. Cheetham, Andrew Rau-Chaplin, Frank Dehne, Peter J. Taillon, and Ulrike Stege
- Subjects
Sequence ,Computer Networks and Communications ,Computer science ,Applied Mathematics ,0102 computer and information sciences ,02 engineering and technology ,01 natural sciences ,Theoretical Computer Science ,Computational Theory and Mathematics ,Cover (topology) ,010201 computation theory & mathematics ,0202 electrical engineering, electronic engineering, information engineering ,Parallelism (grammar) ,Code (cryptography) ,020201 artificial intelligence & image processing ,Algorithm ,Computational biochemistry - Abstract
Fixed-parameter tractability (FPT) techniques have recently been successful in solving NP-complete problem instances of practical importance which were too large to be solved with previous methods. In this paper, we show how to enhance this approach through the addition of parallelism, thereby allowing even larger problem instances to be solved in practice. More precisely, we demonstrate the potential of parallelism when applied to the bounded-tree search phase of FPT algorithms. We apply our methodology to the k-Vertex Cover problem which has important applications in, for example, the analysis of multiple sequence alignments for computational biochemistry. We have implemented our parallel FPT method for the k-Vertex Cover problem using C and the MPI communication library, and tested it on a 32-node Beowulf cluster. This is the first experimental examination of parallel FPT techniques. As part of our experiments, we solved larger instances of k-Vertex Cover than in any previously reported implementations. For example, our code can solve problem instances with k ⩾ 400 in less than 1.5 hours.
- Published
- 2003
- Full Text
- View/download PDF
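The bounded-tree search phase that the paper parallelizes can be sketched for k-Vertex Cover: pick any uncovered edge (u, v) and branch on including u or v in the cover, so the search tree has depth at most k and at most 2^k leaves. A minimal sequential sketch (graph and k invented for the example):

```python
def vertex_cover(edges, k):
    """Bounded search tree for k-Vertex Cover. Returns a cover of size
    <= k as a set, or None if none exists. Every edge must have an
    endpoint in the cover, so branching on the two endpoints of any
    remaining edge is exhaustive."""
    if not edges:
        return set()
    if k == 0:
        return None
    u, v = edges[0]
    for w in (u, v):
        rest = [e for e in edges if w not in e]  # edges w leaves uncovered
        sub = vertex_cover(rest, k - 1)
        if sub is not None:
            return sub | {w}
    return None

# Triangle plus a pendant edge: needs a cover of size 2.
g = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(vertex_cover(g, 1))  # None
print(vertex_cover(g, 2))  # {1, 3}
```

The paper's parallelization distributes independent subtrees of this search tree across processors; the kernelized instances it solves are far beyond what this toy recursion could handle.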
27. [Untitled]
- Author
-
Andrew Rau-Chaplin, Todd Eavis, Frank Dehne, and Susanne E. Hambrusch
- Subjects
Data cube ,Algebra and Number Theory ,Speedup ,Computer science ,Message passing ,Disk array ,sort ,Parallel computing ,Load balancing (computing) ,Data structure ,Auxiliary memory - Abstract
This paper presents a general methodology for the efficient parallelization of existing data cube construction algorithms. We describe two different partitioning strategies, one for top-down and one for bottom-up cube algorithms. Both partitioning strategies assign subcubes to individual processors in such a way that the loads assigned to the processors are balanced. Our methods reduce inter processor communication overhead by partitioning the load in advance instead of computing each individual group-by in parallel. Our partitioning strategies create a small number of coarse tasks. This allows for sharing of prefixes and sort orders between different group-by computations. Our methods enable code reuse by permitting the use of existing sequential (external memory) data cube algorithms for the subcube computations on each processor. This supports the transfer of optimized sequential data cube code to a parallel setting. The bottom-up partitioning strategy balances the number of single attribute external memory sorts made by each processor. The top-down strategy partitions a weighted tree in which weights reflect algorithm specific cost measures like estimated group-by sizes. Both partitioning approaches can be implemented on any shared disk type parallel machine composed of p processors connected via an interconnection fabric and with access to a shared parallel disk array. We have implemented our parallel top-down data cube construction method in C++ with the MPI message passing library for communication and the LEDA library for the required graph algorithms. We tested our code on an eight processor cluster, using a variety of different data sets with a range of sizes, dimensions, density, and skew. Comparison tests were performed on a SunFire 6800. The tests show that our partitioning strategies generate a close to optimal load balance between processors. The actual run times observed show an optimal speedup of p.
- Published
- 2003
- Full Text
- View/download PDF
28. Efficient Parallel Graph Algorithms for Coarse-Grained Multicomputers and BSP
- Author
-
Siang Wun Song, Edson Norberto Cáceres, Frank Dehne, Afonso Ferreira, and Alessandro Roncato
- Subjects
Discrete mathematics ,Connected component ,General Computer Science ,Applied Mathematics ,List ranking ,Parallel algorithm ,Ear decomposition ,Tree (graph theory) ,Computer Science Applications ,Combinatorics ,Bulk synchronous parallel ,ALGORITMOS ,Lowest common ancestor ,Connectivity ,Mathematics - Abstract
In this paper we present deterministic parallel algorithms for the coarse-grained multicomputer (CGM) and bulk synchronous parallel (BSP) models for solving the following well-known graph problems: (1) list ranking, (2) Euler tour construction in a tree, (3) computing the connected components and spanning forest, (4) lowest common ancestor preprocessing, (5) tree contraction and expression tree evaluation, (6) computing an ear decomposition or open ear decomposition, and (7) 2-edge connectivity and biconnectivity (testing and component computation). The algorithms require O(log p) communication rounds with linear sequential work per round (p = no. processors, N = total input size). Each processor creates, during the entire algorithm, messages of total size O(log(p) (N/p)). The algorithms assume that the local memory per processor (i.e., N/p) is larger than p^ε, for some fixed ε > 0. Our results imply BSP algorithms with O(log p) supersteps, O(g log(p) (N/p)) communication time, and O(log(p) (N/p)) local computation time. It is important to observe that the number of communication rounds/supersteps obtained in this paper is independent of the problem size, and grows only logarithmically with respect to p. With growing problem size, only the sizes of the messages grow but the total number of messages remains unchanged. Due to the considerable protocol overhead associated with each message transmission, this is an important property. The result for Problem (1) is a considerable improvement over those previously reported. The algorithms for Problems (2)-(7) are the first practically relevant parallel algorithms for these standard graph problems.
- Published
- 2002
- Full Text
- View/download PDF
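List ranking (Problem (1) above) is classically solved by pointer jumping; the sketch below simulates it sequentially, where each jump round would correspond to one communication round in a parallel setting. This illustrates the problem itself, not the paper's O(log p)-round CGM algorithm:

```python
def list_rank(succ):
    """Sequential simulation of pointer-jumping list ranking.
    succ[i] is the successor of node i (the tail points to itself);
    rank[i] ends up as the distance from i to the tail."""
    n = len(succ)
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    nxt = list(succ)
    for _ in range(n.bit_length()):  # ceil(log2 n) jump rounds suffice
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank

# The list 0 -> 1 -> 2 -> 3 -> 4 (node 4 is the tail).
print(list_rank([1, 2, 3, 4, 4]))  # [4, 3, 2, 1, 0]
```

Pointer jumping needs O(log n) rounds; the point of the CGM/BSP algorithms above is to cut this to O(log p) communication rounds, independent of n.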
29. Efficient prediction of human protein-protein interactions at a global scale
- Author
-
Ke Jin, Fredrik Barrenäs, Ashkan Golshani, Frank Dehne, Hui Wang, Sylvain Pitre, Mohan Babu, Andrew Schoenrock, Mikael Benson, Alex Wong, Bahram Samanfar, Mohsen Hooshyar, Michael A. Langston, Alamgir, Charles A. Phillips, James R. Green, Sadhna Phanse, Katayoun Omidi, and Yuan Gui
- Subjects
Interactome ,Proteome ,Protein-protein interactions ,Computational biology ,Biology ,Biochemistry ,Protein–protein interaction ,03 medical and health sciences ,0302 clinical medicine ,Structural Biology ,Protein Interaction Mapping ,Human proteome project ,Humans ,Molecular Biology ,030304 developmental biology ,Massively parallel computing ,0303 health sciences ,Genome, Human ,business.industry ,Applied Mathematics ,Scale (chemistry) ,Human proteome ,Computational Biology ,Proteins ,Personalized medicine ,3. Good health ,Computer Science Applications ,030220 oncology & carcinogenesis ,Network analysis ,Human genome ,Computational prediction ,DNA microarray ,business ,Software ,Research Article - Abstract
Background: Our knowledge of global protein-protein interaction (PPI) networks in complex organisms such as humans is hindered by technical limitations of current methods. Results: On the basis of short co-occurring polypeptide regions, we developed a tool called MP-PIPE capable of predicting a global human PPI network within 3 months. With a recall of 23% at a precision of 82.1%, we predicted 172,132 putative PPIs. We demonstrate the usefulness of these predictions through a range of experiments. Conclusions: The speed and accuracy associated with MP-PIPE can make this a potential tool to study individual human PPI networks (from genomic sequences alone) for personalized medicine. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-014-0383-1) contains supplementary material, which is available to authorized users.
- Published
- 2014
- Full Text
- View/download PDF
30. WordNet++: A lexicon for the Color-X-method
- Author
-
Reind P. van de Riet, Frank Dehne, and Ans A. G. Steuten
- Subjects
Discrete mathematics ,Information Systems and Management ,Computer science ,business.industry ,WordNet ,computer.software_genre ,Lexicon ,eXtended WordNet ,Domain (ring theory) ,Order (group theory) ,Object-orientation ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
In this article we discuss what kind of information can be obtained from WordNet and what kind of information should be added to WordNet in order to make it better suited for the support of the Color-X method. We will present an extension of WordNet (called WordNet++), which contains a number of special types of relationships that are not available in WordNet. Additionally, WordNet++ is instantiated with knowledge about some particular domain. Also, we will show some example scenarios for how WordNet++ can be used for verification of Color-X models and we will show how this lexicon can be used in a special CASE tool for supporting Color-X.
- Published
- 2001
- Full Text
- View/download PDF
31. Coarse-Grained Parallel Geometric Search
- Author
-
Frank Dehne, Albert Chan, and Andrew Rau-Chaplin
- Subjects
Computer Networks and Communications ,Computer science ,Computation ,Segment tree ,Parallel algorithm ,Triangulation (social science) ,Computational geometry ,Polygon triangulation ,Theoretical Computer Science ,Combinatorics ,Artificial Intelligence ,Hardware and Architecture ,Polygon ,Point location ,Search problem ,Time complexity ,Algorithm ,Software - Abstract
We present a parallel algorithm for solving the next element search problem on a set of line segments, using a BSP-like model referred to as the coarse grained multicomputer (CGM). The algorithm requires O(1) communication rounds (h-relations with h = O(n/p)), O((n/p) log n) local computation, and O((n/p) log p) memory per processor, assuming n/p ≥ p. Our result implies solutions to the point location, trapezoidal decomposition, and polygon triangulation problems. A simplified version for axis-parallel segments requires only O(n/p) memory per processor, and we discuss an implementation of this version. As in a previous paper by Devillers and Fabri (Int. J. Comput. Geom. Appl. 6 (1996), 487-506), our algorithm is based on a distributed implementation of segment trees which are of size O(n log n). This paper improves on op. cit. in several ways: (1) It studies the more general next element search problem which also solves, e.g., planar point location. (2) The algorithms require only O((n/p) log n) local computation instead of O(log p · (n/p) log n). (3) The algorithms require only O((n/p) log p) local memory instead of O((n/p) log n).
- Published
- 1999
- Full Text
- View/download PDF
32. Randomized parallel list ranking for distributed memory multiprocessors
- Author
-
Frank Dehne and Siang Wun Song
- Subjects
Discrete mathematics ,Conjecture ,Computational complexity theory ,Computer science ,List ranking ,Parallel algorithm ,Value (computer science) ,Theoretical Computer Science ,Bounded function ,Theory of computation ,Distributed memory ,Algorithm ,Software ,Information Systems - Abstract
We present a randomized parallel list ranking algorithm for distributed memory multiprocessors, using a BSP type model. We first describe a simple version which requires, with high probability, log(3p) + log ln(n) = O(log p + log log n) communication rounds (h-relations with h = O(n/p)) and O(n/p) local computation. We then outline an improved version that requires, with high probability, only r ⩽ (4k+6) log(2/3 p) + 8 = O(k log p) communication rounds, where k = min{i ⩾ 0 | ln^(i+1) n ⩽ (2/3 p)^(2i+1)}. Note k
- Published
- 1997
- Full Text
- View/download PDF
33. A distributed tree data structure for real-time OLAP on cloud architectures
- Author
-
Frank Dehne, Andrew Rau-Chaplin, Q. Kong, Hamidreza Zaboli, and R. Zhou
- Subjects
Database ,Transaction processing ,business.industry ,Computer science ,Online analytical processing ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,InformationSystems_DATABASEMANAGEMENT ,Cloud computing ,computer.software_genre ,Data warehouse ,Tree (data structure) ,Online transaction processing ,Tuple ,business ,Software architecture ,computer - Abstract
In contrast to queries for on-line transaction processing (OLTP) systems that typically access only a small portion of a database, OLAP queries may need to aggregate large portions of a database, which often leads to performance issues. In this paper we introduce CR-OLAP, a Cloud based Real-time OLAP system based on a new distributed index structure for OLAP, the distributed PDCR tree, that utilizes a cloud infrastructure consisting of (m + 1) multi-core processors. With increasing database size, CR-OLAP dynamically increases m to maintain performance. Our distributed PDCR tree data structure supports multiple dimension hierarchies and efficient query processing on the elaborate dimension hierarchies which are so central to OLAP systems. It is particularly efficient for complex OLAP queries that need to aggregate large portions of the data warehouse, such as “report the total sales in all stores located in California and New York during the months February-May of all years”. We evaluated CR-OLAP on the Amazon EC2 cloud, using the TPC-DS benchmark data set. The tests demonstrate that CR-OLAP scales well with an increasing number of processors, even for complex queries. For example, on an Amazon EC2 cloud instance with eight processors, for a TPC-DS OLAP query stream on a data warehouse with 80 million tuples where every OLAP query aggregates more than 50% of the database, CR-OLAP achieved a query latency of 0.3 seconds, which can be considered a real-time response.
- Published
- 2013
- Full Text
- View/download PDF
34. A Bayesian Approach to Model Matching with Geometric Hashing
- Author
-
Andrew Rau-Chaplin, Afonso Ferreira, and Frank Dehne
- Subjects
Search engine indexing ,Feature extraction ,Hash function ,Parallel algorithm ,Hash table ,Feature (computer vision) ,Signal Processing ,Quadtree ,Computer Vision and Pattern Recognition ,Geometric hashing ,Algorithm ,Computer Science::Databases ,Software ,Mathematics - Abstract
Geometric hashing methods provide an efficient approach to indexing from image features into a database of models. The hash functions that have typically been used involve quantization of the values, which can result in nongraceful degradation of the performance of the system in the presence of noise. Intuitively, it is desirable to replace the quantization of hash values and the resulting binning of hash entries by a method that gives increasingly less weight to a hash table entry as a hashed feature becomes more distant from the hash entry position. In this paper, we show how these intuitive notions can be translated into a well-founded Bayesian approach to object recognition and give precise formulas for the optimal weight functions that should be used in hash space. These extensions allow the geometric hashing method to be viewed as a Bayesian maximum-likelihood framework. We demonstrate the validity of the approach by performing similarity-invariant object recognition using models obtained from drawings of military aircraft and automobiles and test images from real-world grayscale images of the same aircraft and automobile types. Our experimental results represent a complete object recognition system, since the feature extraction process is automated. Our system is scalable and works rapidly and very efficiently on an 8K-processor CM-2, and the quality of results using similarity-invariant model matching is excellent.
- Published
- 1995
- Full Text
- View/download PDF
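The replacement of quantized hash bins by smoothly decaying weights can be sketched as follows, with a Gaussian kernel standing in for the paper's optimal Bayesian weight functions; the model names, hash coordinates, and sigma are invented for illustration:

```python
import math

def gaussian_vote(query_pts, model_entries, sigma=0.5):
    """Weighted voting for geometric hashing: instead of quantizing hash
    coordinates into bins, each model hash entry receives a weight that
    decays smoothly with its distance from the hashed query feature."""
    votes = {}
    for qx, qy in query_pts:
        for model, (mx, my) in model_entries:
            d2 = (qx - mx) ** 2 + (qy - my) ** 2
            w = math.exp(-d2 / (2 * sigma ** 2))
            votes[model] = votes.get(model, 0.0) + w
    best = max(votes, key=votes.get)
    return best, votes

# Two model entries in hash space; the query feature lands near "plane".
entries = [("plane", (0.0, 0.0)), ("car", (3.0, 3.0))]
best, _ = gaussian_vote([(0.1, -0.1)], entries)
print(best)  # plane
```

Unlike hard binning, a query feature just outside a bin boundary still contributes most of its vote to the nearby entry, which is the graceful degradation the abstract argues for.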
35. Hypercube Algorithms for Parallel Processing of Pointer-Based Quadtrees
- Author
-
Frank Dehne, Andrew Rau-Chaplin, and Afonso G. Ferreira
- Subjects
Signal Processing ,Computer Vision and Pattern Recognition ,Software - Published
- 1995
- Full Text
- View/download PDF
36. Short Co-occurring Polypeptide Regions Can Predict Global Protein Interaction Maps
- Author
-
Bahram Samanfar, Frank Dehne, Mohsen Hooshyar, James R. Green, Sylvain Pitre, Ashkan Golshani, Matthew Jessulat, and Andrew Schoenrock
- Subjects
Genetics ,0303 health sciences ,03 medical and health sciences ,0302 clinical medicine ,Multidisciplinary ,Co occurring ,030220 oncology & carcinogenesis ,Computational biology ,Biology ,Article ,Protein Interaction Map ,030304 developmental biology - Abstract
A goal of the post-genomics era has been to elucidate a detailed global map of protein-protein interactions (PPIs) within a cell. Here, we show that the presence of co-occurring short polypeptide sequences between interacting protein partners appears to be conserved across different organisms. We present an algorithm to automatically generate PPI prediction method parameters for various organisms and illustrate that global PPIs can be predicted from previously reported PPIs within the same or a different organism using protein primary sequences. The PPI prediction code is further accelerated through the use of parallel multi-core programming, which improves its usability for large scale or proteome-wide PPI prediction. We predict and analyze hundreds of novel human PPIs, experimentally confirm protein functions and importantly predict the first genome-wide PPI maps for S. pombe (∼9,000 PPIs) and C. elegans (∼37,500 PPIs).
- Published
- 2012
- Full Text
- View/download PDF
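The co-occurrence principle above can be illustrated with a toy scorer: a candidate pair (a, b) is supported by every known interacting pair in which a short window of `a` and a short window of `b` reappear together, one in each partner. This is a simplified sketch under stated assumptions (exact-match windows, a hypothetical window length of 3); the actual engine uses more tolerant window matching and the multi-core parallelization described above.

```python
def windows(seq, w=3):
    # All length-w substrings of a protein sequence.
    return {seq[i:i + w] for i in range(len(seq) - w + 1)}

def cooccurrence_score(a, b, known_ppis, w=3):
    # Count known interacting pairs (x, y) in which some window of `a`
    # occurs in one partner and some window of `b` occurs in the other,
    # in either orientation. A higher count suggests (a, b) interact.
    wa, wb = windows(a, w), windows(b, w)
    score = 0
    for x, y in known_ppis:
        wx, wy = windows(x, w), windows(y, w)
        if (wa & wx and wb & wy) or (wa & wy and wb & wx):
            score += 1
    return score
```

A real predictor would normalize this count and threshold it; the point here is only that the evidence is a set intersection over short polypeptide windows, which is why the computation parallelizes so naturally across candidate pairs.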
37. Construction of d-Dimensional Hyperoctrees on a Hypercube Multiprocessor
- Author
-
M. Nassar, Andrew Rau-Chaplin, Andreas Fabri, Frank Dehne, and Radhakrishna Valiveti
- Subjects
Input/output, Computer Networks and Communications, Computer science, Parallel algorithm, 02 engineering and technology, Parallel computing, Object (computer science), 020202 computer hardware & architecture, Theoretical Computer Science, Set (abstract data type), Combinatorics, Artificial Intelligence, Hardware and Architecture, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, SIMD, Hypercube, Hypercube multiprocessor, Time complexity, Software
- Abstract
We present a parallel algorithm for the construction of the hyperoctree representing a d-dimensional object from a set of n (d - 1)-dimensional hyperoctrees, representing adjacent cross sections of this object. On a p-processor SIMD hypercube the time complexity of our algorithm is O((m/p) log p log n), where m is the maximum of input and output size.
- Published
- 1994
- Full Text
- View/download PDF
38. Multisearch Techniques: Parallel Data Structures on Mesh-Connected Computers
- Author
-
Russ Miller, Jyh-Jong Tsay, Frank Dehne, Mikhail J. Atallah, and Andrew Rau-Chaplin
- Subjects
Discrete mathematics, Convex hull, Web search query, Computer Sciences, Computer Networks and Communications, Computer science, Multisearch, Regular polygon, Data structure, Graph, Theoretical Computer Science, Combinatorics, Polyhedron, Tree traversal, Artificial Intelligence, Hardware and Architecture, Tangent space, Software
- Abstract
The multisearch problem is defined as follows. Given a data structure D modeled as a graph with n constant-degree nodes, perform O(n) searches on D. Let r be the length of the longest search path associated with a search process, and assume that the paths are determined "on-line." That is, the search paths may overlap arbitrarily. In this paper, we solve the multisearch problem for certain classes of graphs in O(√n + r(√n/log n)) time on a √n × √n mesh-connected computer. For many data structures, the search path traversed when answering one search query has length r = O(log n). For these cases, our algorithm processes O(n) such queries in asymptotically optimal Θ(√n) time. The classes of graphs we consider contain many of the important data structures that arise in practice, ranging from simple trees to Kirkpatrick hierarchical search DAGs. Multisearch is a useful abstraction that can be used to implement parallel versions of standard sequential data structures on a mesh. As example applications, we consider a variety of parallel on-line tree traversals, as well as hierarchical representations of polyhedra and their myriad applications (line-polyhedron intersection queries, multiple tangent plane determination, intersecting convex polyhedra, and three-dimensional convex hull).
- Published
- 1994
- Full Text
- View/download PDF
39. Bridging the gap between systems biology and medicine
- Author
-
Peter Raasch, David M. Rocke, Bruce J. Aronow, Paolo Provero, Mikael Benson, Jesper Tegnér, Frank Dehne, Michael A. Langston, Yves Moreau, Daniel Dalevi, Charles Auffray, Dana Marshall, Devdatt Dubhashi, and Gilles Clermont
- Subjects
Computational model, Medicine and Health Sciences, Computer science, Systems biology, Bioinformatics, USable, Data science, Medical and Health Sciences, Bridging (programming), Systems medicine, Correspondence, Genetics, Molecular Medicine, Genetics(clinical), Molecular Biology, Genetics (clinical)
- Abstract
Systems biology has matured considerably as a discipline over the last decade, yet some of the key challenges separating current research efforts in systems biology and clinically useful results are only now becoming apparent. As these gaps are better defined, the new discipline of systems medicine is emerging as a translational extension of systems biology. How is systems medicine defined? What are relevant ontologies for systems medicine? What are the key theoretic and methodologic challenges facing computational disease modeling? How are inaccurate and incomplete data, and uncertain biologic knowledge best synthesized in useful computational models? Does network analysis provide clinically useful insight? We discuss the outstanding difficulties in translating a rapidly growing body of data into knowledge usable at the bedside. Although core-specific challenges are best met by specialized groups, it appears fundamental that such efforts should be guided by a roadmap for systems medicine drafted by a coalition of scientists from the clinical, experimental, computational, and theoretic domains.
- Published
- 2009
- Full Text
- View/download PDF
40. Optical clustering on a mesh-connected computer
- Author
-
Russ Miller, Andrew Rau-Chaplin, and Frank Dehne
- Subjects
Connected component, Computer science, Correlation clustering, Parallel algorithm, Graph theory, Interval (mathematics), Cluster analysis, Computational geometry, Algorithm, Software, k-medians clustering, Information Systems, Theoretical Computer Science
- Abstract
In this paper, we present optimal parallel algorithms for optical clustering on a mesh-connected computer. Optical clustering is a clustering technique based on the principle of optical resolution, and is of particular interest in picture analysis. The algorithms we present are based on the application of parallel algorithms in computational geometry and graph theory. In particular, we show that given a set S of N points in the Euclidean plane, the following problems can be solved in optimal O(√N) time on a mesh-connected computer of size N: (1) determine the optical clusters of S with respect to a given separation parameter; (2) given an interval [a, b] representing the number of optical clusters desired in the clustering of S, determine the range of the separation parameter that will result in such an optical clustering.
- Published
- 1991
- Full Text
- View/download PDF
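The clustering criterion itself is simple to state in code: points belong to the same optical cluster when they are linked by a chain of pairwise distances below the separation parameter. A sequential union-find sketch follows; the paper's contribution, achieving this in optimal O(√N) time on a mesh, is not attempted here.

```python
import math

def optical_clusters(points, sep):
    # Single-linkage clustering with a distance threshold: i and j fall in
    # the same cluster iff a chain of points connects them with every hop
    # shorter than the separation parameter `sep`.
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:              # path-halving union-find
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (xi, yi) in enumerate(points):
        for j in range(i + 1, len(points)):
            xj, yj = points[j]
            if math.hypot(xi - xj, yi - yj) < sep:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

The second problem in the abstract (finding the range of separation parameters that yields a desired number of clusters) corresponds to sweeping `sep` across the sorted inter-cluster merge distances.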
41. Yeast Features: Identifying Significant Features Shared Among Yeast Proteins for Functional Genomics
- Author
-
Michel Dumontier, James R. Green, Nadereh Mir-Rashed, Myron L. Smith, Frank Dehne, Ashkan Golshani, Veronika Eroukova, James J. Cheetham, and Alamgir
- Subjects
Bioinformatics, Computational biology, Biology, Phenotype, Yeast, Annotation, Molecular Cell Biology, General Materials Science, Identification (biology), Set (psychology), Functional genomics, Gene, Function (biology)
- Abstract
Background High throughput yeast functional genomics experiments are revealing associations among tens to hundreds of genes using numerous experimental conditions. To fully understand how the identified genes might be involved in the observed system, it is essential to consider the widest range of biological annotation possible. Biologists often start their search by collating the annotation provided for each protein within databases such as the Saccharomyces Genome Database, manually comparing them for similar features, and empirically assessing their significance. Such tasks can be automated, and more precise calculations of the significance can be determined using established probability measures. Results We developed Yeast Features, an intuitive online tool to help establish the significance of finding a diverse set of shared features among a collection of yeast proteins. A total of 18,786 features from the Saccharomyces Genome Database are considered, including annotation based on the Gene Ontology’s molecular function, biological process and cellular compartment, as well as conserved domains, protein-protein and genetic interactions, complexes, metabolic pathways, phenotypes and publications. The significance of shared features is estimated using a hypergeometric probability, but novel options exist to improve the significance by adding background knowledge of the experimental system. For instance, increased statistical significance is achieved in gene deletion experiments because interactions with essential genes will never be observed. We further demonstrate the utility by suggesting the functional roles of the indirect targets of an aminoglycoside with a known mechanism of action, and also the targets of an herbal extract with a previously unknown mode of action. The identification of shared functional features may also be used to propose novel roles for proteins of unknown function, including a role in protein synthesis for YKL075C.
Conclusions Yeast Features (YF) is an easy-to-use web-based application (http://software.dumontierlab.com/yeastfeatures/) which can identify and prioritize features that are shared among a set of yeast proteins. This approach is shown to be valuable in the analysis of complex data sets, in which the extracted associations revealed significant functional relationships among the gene products.
- Published
- 2008
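The significance calculation described in the abstract above is compact enough to state directly: if K of all N proteins carry a feature, the chance that at least k of n selected proteins carry it is the upper tail of a hypergeometric distribution. A minimal sketch in plain Python, without the background-knowledge corrections the tool offers:

```python
from math import comb

def shared_feature_pvalue(N, K, n, k):
    # P(X >= k) for hypergeometric X: the probability that at least k of
    # n proteins sampled without replacement from N carry a feature
    # annotated to K of them. Small values flag surprising enrichment.
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total
```

For example, observing a feature in all 5 of 5 selected proteins when only 5 of 10 proteins carry it gives p = 1/252, whereas k = 0 gives p = 1 (at least zero carriers is certain).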
42. Parallel Simulated Annealing for Materialized View Selection in Data Warehousing Environments
- Author
-
Frank Dehne, Othmar Korn, Bela Stantic, and Roozbeh Derakhshan
- Subjects
Set (abstract data type), Heuristic, Computer science, View, Simulated annealing, Materialized view, Plan (drawing), Data mining, computer.software_genre, computer, Selection (genetic algorithm), Data warehouse
- Abstract
In order to facilitate efficient query processing, the information contained in data warehouses is typically stored as a set of materialized views. Deciding which views to materialize, so as to minimize view maintenance and query processing costs, represents a challenge. Some existing approaches are applicable only to small problems that are far from realistic. In this paper we introduce a new approach for materialized view selection using Parallel Simulated Annealing (PSA) that selects views from an input Multiple View Processing Plan (MVPP). With PSA, we are able to perform view selection on MVPPs having hundreds of queries and thousands of views. Also, in our experimental study we show that our method provides a significant improvement in the quality of the obtained set of materialized views over existing heuristic and sequential simulated annealing algorithms.
- Published
- 2008
- Full Text
- View/download PDF
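The annealing loop at the core of such a method is easy to sketch. The version below anneals over subsets of candidate views encoded as a bit vector; the cost function, cooling schedule, and parameters are illustrative assumptions, not the paper's MVPP-based model, and the parallelization across annealing chains is omitted.

```python
import math
import random

def anneal_view_selection(n_views, cost, t0=10.0, cooling=0.995,
                          steps=5000, seed=1):
    # Simulated annealing over view subsets. `cost(selection)` is a
    # caller-supplied total of query-processing and view-maintenance cost;
    # worse neighbors are accepted with probability exp(-delta / t).
    rng = random.Random(seed)
    sel = [rng.random() < 0.5 for _ in range(n_views)]
    cur_cost = cost(sel)
    best, best_cost, t = sel[:], cur_cost, t0
    for _ in range(steps):
        i = rng.randrange(n_views)
        sel[i] = not sel[i]                 # flip one view in/out
        new_cost = cost(sel)
        if (new_cost <= cur_cost
                or rng.random() < math.exp((cur_cost - new_cost) / t)):
            cur_cost = new_cost
            if cur_cost < best_cost:
                best, best_cost = sel[:], cur_cost
        else:
            sel[i] = not sel[i]             # reject: undo the flip
        t *= cooling                        # geometric cooling schedule
    return best, best_cost
```

With a separable toy cost (each materialized view pays a fixed maintenance charge but makes one query cheap), the loop reliably converges to the optimum; the hard part the paper addresses is scaling this search to MVPPs with thousands of interdependent views.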
43. Optimal visibility algorithms for binary images on the hypercube
- Author
-
Ivan Stojmenovic, Frank Dehne, and Quoc T. Pham
- Subjects
Pixel, Computer science, Binary image, Visibility (geometry), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Parallel algorithm, Image processing, Theoretical Computer Science, Image (mathematics), Computer Science::Computer Vision and Pattern Recognition, Hypercube, SIMD, Algorithm, Software, ComputingMethodologies_COMPUTERGRAPHICS, Information Systems
- Abstract
Consider an n × n binary image. Given a direction D, the parallel visibility problem consists of determining for each pixel of the image the portion that is visible (i.e., not obstructed by any other black pixel of the image) in direction D from infinity. A related problem, referred to as point visibility, is to compute for each pixel the portion that is visible from a given point p. In this paper, we derive O(log n) time SIMD algorithms for each of these two problems on the hypercube, where one processor is assigned to every pixel of the image. Since the worst case communication distance of two processors in an n²-processor hypercube is 2 log n, it follows that both of the above algorithms are asymptotically optimal.
- Published
- 1990
- Full Text
- View/download PDF
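For intuition, the visibility criterion can be shown for one axis-aligned direction: a black pixel is visible from above exactly when no black pixel lies above it in its column. The sequential sketch below illustrates only that special case; the paper handles arbitrary directions, fractional visible portions, and the per-pixel hypercube mapping.

```python
def visible_from_top(image):
    # image: list of rows of 0/1 pixels. Mark each black pixel (1) as
    # visible iff no black pixel occupies the same column in any row
    # above it (visibility from infinity, direction "down").
    rows, cols = len(image), len(image[0])
    visible = [[False] * cols for _ in range(rows)]
    for c in range(cols):
        for r in range(rows):
            if image[r][c] == 1:
                visible[r][c] = True   # first black pixel in the column
                break                  # everything below it is occluded
    return visible
```

On a parallel machine this per-column scan becomes a prefix (scan) operation, which is what makes the O(log n) hypercube bound plausible.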
44. An Efficient Computational Geometry Method for Detecting Dotted Lines in Noisy Images
- Author
-
L. Ficocelli and Frank Dehne
- Subjects
General Computer Science, Computer science, business.industry, Image processing, Computer vision, Artificial intelligence, Computational geometry, business, Algorithm
- Published
- 1990
- Full Text
- View/download PDF
45. Computing the largest empty rectangle on One- and two-dimensional processor arrays
- Author
-
Frank Dehne
- Subjects
Computer Networks and Communications, Parallel algorithm, Computational geometry, Theoretical Computer Science, Vector processor, Combinatorics, Two-dimensional space, Artificial Intelligence, Hardware and Architecture, Largest empty rectangle, Point (geometry), Rectangle, Time complexity, Software, Mathematics
- Abstract
Given a rectangle R (with its edges parallel to the coordinate axes) containing a set S = {s_1, …, s_n} of n points in the Euclidean plane, consider the problem of finding the largest area subrectangle r in R with sides parallel to the coordinate axes that contains no point of S. We present optimal parallel algorithms for solving this problem on one- and two-dimensional arrays of processors.
- Published
- 1990
- Full Text
- View/download PDF
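The problem statement above admits a short brute-force baseline: each side of a maximal empty rectangle is supported by an input point or by an edge of R, so it suffices to try all candidate coordinates. The sketch below is a sequential O(n^5) check for small instances, nothing like the optimal processor-array algorithms of the paper.

```python
from itertools import combinations

def largest_empty_rectangle(R, points):
    # R is given by its bottom-left and top-right corners. Candidate
    # rectangle sides come from the points' coordinates and R's edges;
    # a candidate counts if no point of S lies strictly inside it.
    (x0, y0), (x1, y1) = R
    xs = sorted({x0, x1, *[p[0] for p in points]})
    ys = sorted({y0, y1, *[p[1] for p in points]})
    best = 0
    for l, r in combinations(xs, 2):
        for b, t in combinations(ys, 2):
            if any(l < px < r and b < py < t for px, py in points):
                continue                    # blocked by a point of S
            best = max(best, (r - l) * (t - b))
    return best
```

For instance, in a 4 × 3 rectangle with a single point at (1, 1), the best empty subrectangle is the 3 × 3 region to the point's right.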
46. Implementing OLAP Query Fragment Aggregation and Recombination for the OLAP Enabled Grid
- Author
-
Andrew Rau-Chaplin, Frank Dehne, and Michael Lawrence
- Subjects
Data grid, Grid computing, Computer science, Online analytical processing, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, InformationSystems_DATABASEMANAGEMENT, Cache, Data mining, computer.software_genre, Grid, computer, Data warehouse, Scheduling (computing)
- Abstract
In this paper we propose a new query processing method for the OLAP enabled grid, which blends sophisticated cache extraction techniques and data grid scheduling to efficiently satisfy OLAP queries in a distributed fashion. The heart of our approach is our query fragment aggregation and recombination (FAR) strategy, which partitions OLAP queries into subqueries that can be effectively answered by retrieving and aggregating multiple fragments of cached data from nearby grid sources or, as a last resort, more remote backend data warehouses. We have implemented and experimentally evaluated our query processing method and found that our strategy reduces query time by between 50% and 60% for practical user cache sizes and network parameters.
- Published
- 2007
- Full Text
- View/download PDF
47. A Coarse Grained Parallel Algorithm for Hausdorff Voronoi Diagrams
- Author
-
Anil Maheshwari, Frank Dehne, and Ryan Taylor
- Subjects
Computational complexity theory, Computer science, Open problem, Computer cluster, Parallel algorithm, Hausdorff space, Parallel computing, Computational geometry, Voronoi diagram, Algorithm
- Abstract
We present the first parallel algorithm for building a Hausdorff Voronoi diagram (HVD). Our algorithm is targeted towards cluster computing architectures and computes the Hausdorff Voronoi diagram for non-crossing objects in time O((n log^4 n)/p) for input size n and p processors. In addition, our parallel algorithm also implies a new sequential HVD algorithm that constructs HVDs for non-crossing objects in time O(n log^4 n). This improves on previous sequential results and solves an open problem posed by Papadopoulou and Lee (2004).
- Published
- 2006
- Full Text
- View/download PDF
48. A Pre-Coalition Protocol for Minimizing Message Passing and State Information Updates in the A3pviGrid System
- Author
-
Frank Dehne, Avinash Shankaranarayanan, and Andrew Lewis
- Subjects
business.industry, Computer science, Distributed computing, Multi-agent system, media_common.quotation_subject, Message passing, Peer-to-peer, computer.software_genre, Grid, Negotiation, Grid computing, Computer cluster, business, computer, Protocol (object-oriented programming), media_common, Computer network
- Abstract
The primary problem affecting the use of efficient grid or cluster computing systems is the discovery and usage of parallel, high-performance grid computing applications and related services. The primary objective of this paper is to build on a protocol that minimizes the number of messages exchanged between the various peers by applying new grid-based pre-coalition concepts to the peer-to-peer grid services model A3pviGrid. Lists of agents are maintained in the form of categorized listings that help in the formation of potential coalitions and negotiations among the agents. By developing a new pre-coalition protocol, we are able to simulate the number of messages passed inside the A3pviGrid system, which uses predetermined coalition formations to appoint coalition leaders/agents to communicate on behalf of a clan or society of agents or peers.
- Published
- 2006
- Full Text
- View/download PDF
49. Parallel querying of ROLAP cubes in the presence of hierarchies
- Author
-
Andrew Rau-Chaplin, Todd Eavis, and Frank Dehne
- Subjects
Data cube, Hierarchy, Theoretical computer science, Computer science, Online analytical processing, Search engine indexing, Overhead (computing), Granularity, Data mining, Cube, ROLAP, computer.software_genre, computer
- Abstract
Online Analytical Processing is a powerful framework for the analysis of organizational data. OLAP is often supported by a logical structure known as a data cube, a multidimensional data model that offers an intuitive array-based perspective of the underlying data. Supporting efficient indexing facilities for multi-dimensional cube queries is an issue of some complexity. In practice, the difficulty of the indexing problem is exacerbated by the existence of attribute hierarchies that sub-divide attributes into aggregation layers of varying granularity. In this paper, we present a hierarchy and caching framework that supports the efficient and transparent manipulation of attribute hierarchies within a parallel ROLAP environment. Experimental results verify that, when compared to the non-hierarchical case, very little overhead is required to handle streams of arbitrary hierarchical queries.
- Published
- 2005
- Full Text
- View/download PDF
50. A Coarse-Grained Parallel Algorithm for Spanning Tree and Connected Components
- Author
-
Henrique Mongelli, Frank Dehne, Edson Norberto Cáceres, Siang Wun Song, and Jayme Luiz Szwarcfiter
- Subjects
Connected component, Spanning tree, Computer science, List ranking, Sorting, Parallel algorithm, Euler tour technique, Eulerian path, Minimum spanning tree, Graph, symbols.namesake, Integer sorting, Bipartite graph, symbols, Algorithm, Connectivity, MathematicsofComputing_DISCRETEMATHEMATICS
- Abstract
Dehne et al. present a BSP/CGM algorithm for computing a spanning tree and the connected components of a graph that requires O(log p) communication rounds, where p is the number of processors. It requires the solution of the Euler tour problem, which in turn is based on the solution of the list ranking problem. We present a new approach that does not need to solve the Euler tour or the list ranking problem. It is based on the integer sorting algorithm, which can be implemented efficiently on the BSP/CGM model [1].
- Published
- 2004
- Full Text
- View/download PDF