Author: "Asa Ben-Hur" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Asa Ben-Hur"' showing total 164 results

51. PHENOstruct: Prediction of human phenotype ontology terms using heterogeneous data sources [version 1; referees: 2 approved]

Author: Indika Kahanda, Christopher Funk, Karin Verspoor, and Asa Ben-Hur
Subjects: Research Article, Articles, Bioinformatics, Genomics, human phenotype ontology, structured SVM
Abstract: The human phenotype ontology (HPO) was recently developed as a standardized vocabulary for describing the phenotype abnormalities associated with human diseases. At present, only a small fraction of human protein-coding genes have HPO annotations. However, researchers believe that a large portion of currently unannotated genes are related to disease phenotypes. Therefore, it is important to predict gene-HPO term associations using accurate computational methods. In this work, we demonstrate the performance advantage of the structured SVM approach, which was shown to be highly effective for Gene Ontology term prediction, in comparison to several baseline methods. Furthermore, we highlight a collection of informative data sources suitable for the problem of predicting gene-HPO associations, including large-scale literature mining data.
Published: 2015
Full Text: View/download PDF

52. A high resolution single molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis

Author: Robbie Waugh, Artur Jarmolowski, Qingshun Quinn Li, Sarah Elizabeth Harvey, Cristiane Paula Gomes Calixto, Sascha Laubinger, Dorothee Staiger, Xiao-Ning Zhang, Yamile Marquez, Lianfeng Gu, Anireddy S. N. Reddy, Wenbin Guo, Runxuan Zhang, Gao Yubang, Martin Crespi, Andrea Barta, Motoaki Seki, Asa Ben-Hur, Theresa Wiebner-Kroh, John W. S. Brown, Liming Xiong, Juan Carlos Entizne, Maho Tanaka, Akihiro Matsui, Max Coulter, Michael F. Jantsch, Richard Kuo, Zofia Szweykowska-Kulinska, Stefan Riegler, Linda Milne, Katherine J. Denby, Enamul Huq, Ramanjulu Sunkar, Shih-Long Tu, Maria Kalyna, Tino Koester, and Andreas Wachter
Subjects: Gene expression profiling, Transcriptome, Polyadenylation, Arabidopsis, Alternative splicing, Arabidopsis thaliana, splice, Computational biology, Biology, Gene
Abstract: Background: Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and differential gene and transcript expression analysis. Single molecule long read sequencing technologies provide improved integrity of transcript structures including alternative splicing, and transcription start and polyadenylation sites. However, accuracy is significantly affected by sequencing errors, mRNA degradation or incomplete cDNA synthesis. Results: We present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 160k transcripts - twice that of the best current Arabidopsis transcriptome and including over 1,500 novel genes. 79% of transcripts are from Iso-seq with accurately defined splice junctions and transcription start and end sites. We developed novel methods to determine splice junctions and transcription start and end sites accurately. Mismatch profiles around splice junctions provided a powerful feature to distinguish correct splice junctions and remove false splice junctions. Stratified approaches identified high confidence transcription start/end sites and removed fragmentary transcripts due to degradation. AtRTD3 is a major improvement over existing transcriptomes as demonstrated by analysis of an Arabidopsis cold response RNA-seq time-series. AtRTD3 provided higher resolution of transcript expression profiling and identified cold- and light-induced differential transcription start and polyadenylation site usage. Conclusions: AtRTD3 is the most comprehensive Arabidopsis transcriptome currently available. It improves the precision of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage from RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single molecule sequencing analysis from any species.
Published: 2021

53. CREME: Cis-Regulatory Module Explorer for the human genome.

Author: Roded Sharan, Asa Ben-Hur, Gabriela G. Loots, and Ivan Ovcharenko
Published: 2004
Full Text: View/download PDF

54. On probabilistic analog automata.

Author: Asa Ben-Hur, Alexander Roitershtein, and Hava T. Siegelmann
Published: 2004
Full Text: View/download PDF

55. Probabilistic analysis of a differential equation for linear programming.

Author: Asa Ben-Hur, Joshua Feinberg, Shmuel Fishman, and Hava T. Siegelmann
Published: 2003
Full Text: View/download PDF

56. A Theory of Complexity for Continuous Time Systems.

Author: Asa Ben-Hur, Hava T. Siegelmann, and Shmuel Fishman
Published: 2002
Full Text: View/download PDF

57. Support Vector Clustering.

Author: Asa Ben-Hur, David Horn, Hava T. Siegelmann, and Vladimir Vapnik
Published: 2001

58. Extended Archaeal Histone-Based Chromatin Structure Regulates Global Gene Expression in Thermococcus kodakarensis

Author: Thomas J. Santangelo, Travis J. Sanders, Sudili Fernando, Brett W. Burkhart, Robert L Vickerman, Alexandra M. Gehring, Asa Ben-Hur, Fahad Ullah, and Andrew F. Gardner
Subjects: Microbiology (medical), Regulation of gene expression, Archaea, Histone, Biology, Microbiology, QR1-502, Chromatin, Thermococcus kodakarensis, Cell biology, Thermococcus, Gene expression, Epigenetics, RNA-seq, Transcriptome, DNA
Abstract: Histone proteins compact and organize DNA resulting in a dynamic chromatin architecture impacting DNA accessibility and ultimately gene expression. Eukaryotic chromatin landscapes are structured through histone protein variants, epigenetic marks, the activities of chromatin-remodeling complexes, and post-translational modification of histone proteins. In most Archaea, histone-based chromatin structure is dominated by the helical polymerization of histone proteins wrapping DNA into a repetitive and closely gyred configuration. The formation of the archaeal-histone chromatin-superhelix is a regulatory force of adaptive gene expression and is likely critical for regulation of gene expression in all histone-encoding Archaea. Single amino acid substitutions in archaeal histones that block formation of tightly packed chromatin structures have profound effects on cellular fitness, but the underlying gene expression changes resultant from an altered chromatin landscape have not been resolved. Using the model organism Thermococcus kodakarensis, we genetically alter the chromatin landscape and quantify the resultant changes in gene expression, including unanticipated and significant impacts on provirus transcription. Global transcriptome changes resultant from varying chromatin landscapes reveal the regulatory importance of higher-order histone-based chromatin architectures in regulating archaeal gene expression.
Published: 2021

59. Decoding co-/post-transcriptional complexities of plant transcriptomes and epitranscriptome using next-generation sequencing technologies

Author: Anireddy S. N. Reddy, Lianfeng Gu, Jie Huang, Naeem H. Syed, Asa Ben-Hur, and Suomeng Dong
Subjects: Biological sciences, Polyadenylation, RNA Splicing, Green Fluorescent Proteins, Arabidopsis, RNA polymerase II, Computational biology, Genes, Plant, Natural sciences, Biochemistry, DNA sequencing, Medical and health sciences, Protein Isoforms, RNA-Seq, RNA Processing, Post-Transcriptional, Developmental biology, Plant Proteins, Regulation of gene expression, Health sciences, Base Sequence, Sequence Analysis, RNA, Gene Expression Profiling, Alternative splicing, RNA, High-Throughput Nucleotide Sequencing, Chromatin, Nanopore sequencing, Transcriptome, Plant biology & botany
Abstract: Next-generation sequencing (NGS) technologies - Illumina RNA-seq, Pacific Biosciences isoform sequencing (PacBio Iso-seq), and Oxford Nanopore direct RNA sequencing (DRS) - have revealed the complexity of plant transcriptomes and their regulation at the co-/post-transcriptional level. Global analysis of mature mRNAs, transcripts from nuclear run-on assays, and nascent chromatin-bound mRNAs using short as well as full-length and single-molecule DRS reads have uncovered potential roles of different forms of RNA polymerase II during the transcription process, and the extent of co-transcriptional pre-mRNA splicing and polyadenylation. These tools have also allowed mapping of transcriptome-wide start sites in cap-containing RNAs, poly(A) site choice, poly(A) tail length, and RNA base modifications. The emerging theme from recent studies is that reprogramming of gene expression in response to developmental cues and stresses at the co-/post-transcriptional level likely plays a crucial role in eliciting appropriate responses for optimal growth and plant survival under adverse conditions. Although the mechanisms by which developmental cues and different stresses regulate co-/post-transcriptional splicing are largely unknown, a few recent studies indicate that the external cues target spliceosomal and splicing regulatory proteins to modulate alternative splicing. In this review, we provide an overview of recent discoveries on the dynamics and complexities of plant transcriptomes, mechanistic insights into splicing regulation, and discuss critical gaps in co-/post-transcriptional research that need to be addressed using diverse genomic and biochemical approaches.
Published: 2020

60. Improving COVID-19 Testing Efficiency using Guided Agglomerative Sampling

Author: Fayyaz ul Amir Afsar Minhas, Dimitris K. Grammatopoulos, Imran Amin, Nasir M. Rajpoot, Asa Ben-Hur, David Snead, Lawrence Young, and Neil Anderson
Subjects: Coronavirus disease 2019 (COVID-19), Computer science, Statistics, Population, Large population, Sampling (statistics), Hierarchical clustering
Abstract: One of the challenges in the current COVID-19 crisis is the time and cost of performing tests, especially for large-scale population surveillance. Since the probability of testing positive in large population studies is expected to be small (https://github.com/foxtrotmike/AS.
Published: 2020
Full Text: View/download PDF

61. Evaluating a variety of text-mined features for automatic protein function prediction with GOstruct.

Author: Christopher S. Funk, Indika Kahanda, Asa Ben-Hur, and Karin M. Verspoor
Published: 2015
Full Text: View/download PDF

62. SpliceGrapherXT: From Splice Graphs to Transcripts Using RNA-Seq.

Author: Mark F. Rogers, Christina Boucher, and Asa Ben-Hur
Published: 2013
Full Text: View/download PDF

63. Choosing negative examples for the prediction of protein-protein interactions.

Author: Asa Ben-Hur and William Stafford Noble
Published: 2006
Full Text: View/download PDF

64. On probabilistic analog automata

Author: Asa Ben-Hur, Alexander Roitershtein, and Hava T. Siegelmann
Published: 2003

65. Macroscopical Molecular Computation with Gene Networks.

Author: Hava T. Siegelmann and Asa Ben-Hur
Published: 2000

66. Probabilistic analysis of a differential equation for linear programming

Author: Asa Ben-Hur, Joshua Feinberg, Shmuel Fishman, and Hava T. Siegelmann
Published: 2001

67. A self-attention model for inferring cooperativity between regulatory features

Author: Asa Ben-Hur and Fahad Ullah
Subjects: Computer science, Open problem, Arabidopsis, Value (computer science), Cooperativity, Computational biology, Biology, Machine learning, Cell Line, Medical and health sciences, Deep learning, Clinical medicine, Gene expression, Genetics, Feature (machine learning), Humans, Regulatory Elements, Transcriptional, Nucleotide Motifs, Promoter Regions, Genetic, Transcription factor, Developmental biology, Health sciences, Sequence, Mechanism (biology), Self attention, Promoter, Genomics, Cell culture, Chromatin Immunoprecipitation Sequencing, Methods Online, Artificial intelligence, Neurology & neurosurgery, Transcription Factors
Abstract: Motivation: Deep learning has demonstrated its predictive power in modeling complex biological phenomena such as gene expression. The value of these models hinges not only on their accuracy, but also on the ability to extract biologically relevant information from the trained models. While there has been much recent work on developing feature attribution methods that discover the most important features for a given sequence, inferring cooperativity between regulatory elements, which is the hallmark of phenomena such as gene expression, remains an open problem. Results: We present SATORI, a Self-ATtentiOn based model to predict Regulatory element Interactions. Our approach combines convolutional and recurrent layers with a self-attention mechanism that helps us capture a global view of the landscape of interactions between regulatory elements in a sequence. We evaluate our method on simulated data and three complex datasets: human TAL1-GATA1 transcription factor ChIP-Seq, DNase I Hypersensitive Sites (DHSs) in human promoters across 164 cell lines, and genome-wide DNase I-Seq and ATAC-Seq peaks across 36 Arabidopsis samples. In each of the three experiments, SATORI identified numerous statistically significant TF-TF interactions, many of which have been previously reported. Our method is able to detect higher numbers of these experimentally verified TF-TF interactions than the existing Feature Interaction Score, and also has the advantage of not requiring a computationally expensive post-processing step. Finally, SATORI can be used for detection of any type of feature interaction in models that use a similar attention mechanism, and is not limited to the detection of TF-TF interactions.
Published: 2021

68. Development of the Automated Primer Design Workflow Uniqprimer and Diagnostic Primers for the Broad-Host-Range Plant Pathogen

Author: Shaista Karim, R Ryan McNally, Afnan S Nasaruddin, Alexis DeReeper, Ramil P Mauleon, Amy O Charkowski, Jan E Leach, Asa Ben-Hur, and Lindsay R Triplett
Subjects: Bacteriological Techniques, Enterobacteriaceae, North America, Agriculture, DNA Primers, Plant Diseases, Solanum tuberosum
Abstract: Uniqprimer, a software pipeline developed in Python, was deployed as a user-friendly internet tool in Rice Galaxy for comparative genome analyses to design primer sets for PCR assays capable of detecting target bacterial taxa. The pipeline was trialed with
Published: 2019

69. The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens

Author: Renzhi Cao, Alice C. McHardy, Cen Wan, Jonathan G. Lees, Vedrana Vidulin, Alex Warwick Vesztrocy, Huy N Nguyen, Devon Johnson, Ian Sillitoe, Alessandro Petrini, Richard Bonneau, Hans Moen, Peter L. Freddolino, Rui Fa, Alfredo Benso, Jianlin Cheng, Indika Kahanda, Qizhong Mao, Zihan Zhang, Chenguang Zhao, Rebecca L. Hurto, Predrag Radivojac, Stefano Di Carlo, Sayoni Das, Suwisa Kaewphan, Sabeur Aridhi, Alan Medlar, Casey S. Greene, Constance J. Jeffery, Christophe Dessimoz, Jose Manuel Rodriguez, Gianfranco Politano, Michele Berselli, Jia-Ming Chang, Deborah A. Hogan, Julian Gough, Tunca Doğan, David T. Jones, Claire O'Donovan, Volkan Atalay, Paolo Fontana, Feng Zhang, Shuwei Yao, Robert Hoehndorf, Olivier Lichtarge, Alex W. Crocker, Ahmet Sureyya Rifaioglu, Rabie Saidi, Farrokh Mehryary, Neven Sumonja, Yang Zhang, Florian Boecker, Jie Hou, Christine A. Orengo, Matteo Re, Natalie Thurlby, Chengxin Zhang, Stefano Pascarelli, Alberto Paccanaro, Hafeez Ur Rehman, Yuxiang Jiang, Mohammad R. K. Mofrad, Naihui Zhou, Asa Ben-Hur, Steven E. Brenner, Martti Tolvanen, Filip Ginter, Mark N. Wass, Patricia C. Babbitt, David W. Ritchie, George Georghiou, Stefano Toppo, Caleb Chandler, Larry Davis, Da Chen Emily Koo, Itamar Borukhov, Petri Törönen, Rengul Cetin-Atalay, Fabio Fabris, Haixuan Yang, Kai Hakala, Silvio C. E. Tosatto, Domenico Cozzetto, Slobodan Vucetic, Balint Z. Kacsoh, Luke W Sagers, Alex A. Freitas, Tapio Salakoski, Fran Supek, Alfonso E. Romero, Angela D. Wilkins, Elaine Zosa, Shanshan Zhang, Yotam Frank, Jonathan B. Dayton, Jeffrey M. Yunes, Pier Luigi Martelli, Dallas J. Larsen, Giuliano Grossi, Alexandra J. Lee, Marco Mesiti, Yi-Wei Liu, Jonas Reeb, Damiano Piovesan, Sean D. Mooney, Magdalena Antczak, Erica Suh, Marco Falda, Marie-Dominique Devignes, Castrense Savojardo, Zheng Wang, Danielle A Brackenridge, Peter W. Rose, Enrico Lavezzo, Dane Jo, Ronghui You, Tomislav Šmuc, Liam J. McGuffin, Michael L. Tress, Ilya Novikov, Adrian M. 
Altenhoff, Burkhard Rost, Miguel Amezola, Mateo Torres, Prajwal Bhat, Wen-Hung Liao, Meet Barot, Marco Notaro, Suyang Dai, Giorgio Valentini, Jari Björne, Nevena Veljkovic, Wei-Cheng Tseng, Po-Han Chi, Alperen Dalkiran, Maxat Kulmanov, Nafiz Hamid, Aashish Jain, Branislava Gemovic, Alexandre Renaux, Ashton Omdahl, Daniel B. Roche, Vladimir Perovic, Iddo Friedberg, Daisuke Kihara, Giovanni Bosco, Gage S. Black, Saso Dzeroski, Liisa Holm, Marco Frasca, Michal Linial, Ehsaneddin Asgari, Tatyana Goldberg, Maria Jesus Martin, Vladimir Gligorijević, Marco Carraro, Shanfeng Zhu, Radoslav Davidovic, Timothy Bergquist, Hai Fang, José M. Fernández, Giuseppe Profiti, Weidong Tian, Imane Boudellioua, Kimberley A. Lewis, Seyed Ziaeddin Alborzi, and Rita Casadio
Subjects: Health sciences, Protein function, Computer science, Biochemistry & molecular biology, Pseudomonas, Computational biology, Biological process, Genome, 3. Good health, Medical and health sciences, Molecular function, Cellular component, Mutation screening, Critical assessment, Protein function prediction, Gene, Function (biology), Developmental biology
Abstract: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Here we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility (P. aeruginosa only). We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. We conclude that, while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. We finally report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bioontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.
Published: 2019
Full Text: View/download PDF

70. The CAFA Challenge Reports Improved Protein Function Prediction And New Functional Annotations For Hundreds Of Genes Through Experimental Screens

Author: Heiko Schoof, Ahmet Sureyya Rifaioglu, Ian Sillitoe, Shanfeng Zhu, Marco Carraro, Naihui Zhou, Asa Ben-Hur, Rui Fa, Alice C. McHardy, David W. Ritchie, George Georghiou, Filip Ginter, Haixuan Yang, Alex A. Freitas, Constance J. Jeffery, Tapio Salakoski, Radoslav Davidovic, Huy N Nguyen, Devon Johnson, Yotam Frank, Alexandra J. Lee, Sean D. Mooney, Marco Falda, Marie-Dominique Devignes, Gianfranco Politano, David T. Jones, Silvio C. E. Tosatto, Renzhi Cao, Zihan Zhang, Sabeur Aridhi, Stefano Pascarelli, Vedrana Vidulin, Qizhong Mao, Balint Z. Kacsoh, Patricia C. Babbitt, Giovanni Bosco, Farrokh Mehryary, Florian Boecker, Alfonso E. Romero, Angela D. Wilkins, Saso Dzeroski, Richard Bonneau, Hans Moen, Chengxin Zhang, Prajwal Bhat, Giuliano Grossi, Martti Tolvanen, Matteo Re, Meet Barot, Mohammad R. K. Mofrad, Predrag Radivojac, Stefano Di Carlo, Tatyana Goldberg, Branislava Gemovic, Suyang Dai, Pier Luigi Martelli, Giorgio Valentini, Maxat Kulmanov, Maria Jesus Martin, Claire O'Donovan, Dallas J. Larsen, Alexandre Renaux, Alan Medlar, Jeffrey M. Yunes, Erica Suh, Volkan Atalay, Vladimir Gligorijević, Fran Supek, Elaine Zosa, Wei-Cheng Tseng, Nafiz Hamid, Marco Mesiti, Tunca Doğan, Petri Törönen, Hafeez Ur Rehman, Jose Manuel Rodriguez, Alessandro Petrini, Sayoni Das, Burkhard Rost, Miguel Amezola, Mateo Torres, Jianlin Cheng, Daisuke Kihara, Liisa Holm, Marco Frasca, Steven E. Brenner, Stefano Toppo, Adrian M. Altenhoff, Chenguang Zhao, Daniel B. Roche, Alperen Dalkiran, Alex W. Crocker, Marco Notaro, Iddo Friedberg, Michal Linial, Julian Gough, Damiano Piovesan, Slobodan Vucetic, Natalie Thurlby, Olivier Lichtarge, Jari Björne, Jonas Reeb, Rabie Saidi, Yuxiang Jiang, Christophe Dessimoz, Jie Hou, Ronghui You, Tomislav Šmuc, Paolo Fontana, Michele Berselli, Jia-Ming Chang, Deborah A. Hogan, Larry Davis, Ehsaneddin Asgari, Shuwei Yao, Zheng Wang, Fabio Fabris, Michael L. Tress, Caleb Chandler, Christine A. 
Orengo, Rengul Cetin Atalay, Castrense Savojardo, Danielle A Brackenridge, Peter W. Rose, Yang Zhang, Dane Jo, Gage S. Black, Shanshan Zhang, Aashish Jain, Liam J. McGuffin, Timothy Bergquist, Peter L. Freddolino, Robert Hoehndorf, Rita Casadio, Da Chen Emily Koo, Mark N. Wass, Hai Fang, Casey S. Greene, Suwisa Kaewphan, Magdalena Antczak, Wen-Hung Liao, Enrico Lavezzo, Neven Sumonja, Ashton Omdahl, José M. Fernández, Ilya Novikov, Jonathan B. Dayton, Feng Zhang, Vladimir Perovic, Cen Wan, Jonathan G. Lees, Kai Hakala, Weidong Tian, Alex Warwick Vesztrocy, Domenico Cozzetto, Nevena Veljkovic, Yi-Wei Liu, Imane Boudellioua, Po-Han Chi, Kimberley A. Lewis, Seyed Ziaeddin Alborzi, Giuseppe Profiti, Alberto Paccanaro, Itamar Borukhov, Alfredo Benso, Indika Kahanda, and Rebecca L. Hurto
Subjects: Library, Male, Identification, Candida-albicans, Protein function prediction, Long-term memory, Biofilm, Critical assessment, Community challenge, Procedures, Genome, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], 0302 clinical medicine, Candida albicans, Molecular genetics, lcsh:QH301-705.5, ComputingMilieux_MISCELLANEOUS, Biological ontology, Settore BIO/11 - BIOLOGIA MOLECOLARE, 0303 health sciences, 318 Medical biotechnology, Biotechnology & applied microbiology, Ontology, Expectation, Genetics & heredity, Plant leaf, ddc, 3. Good health, Drosophila melanogaster, Human experiment, Fungal genome, Pseudomonas aeruginosa, Female, [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], Genome, Fungal, BIOINFORMATICS, Long-Term memory, Locomotion, Human, Adult, Memory, Long-Term, lcsh:QH426-470, Bioinformatics, Long term memory, Generation, Bacterial genome, Computational biology, Biology, Article, 03 medical and health sciences, Annotation, Big data, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Pseudomonas, Genetics, Animals, Humans, Gene, Ecology, Evolution, Behavior and Systematics, 030304 developmental biology, [INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB], Animal, Research, Experimental data, Molecular Sequence Annotation, Cell Biology, Nonhuman, Human genetics, lcsh:Genetics, lcsh:Biology (General), Biofilms, Proteins | Genes | Protein functions, [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM], 030217 neurology & neurosurgery, Function (biology), Genome, Bacterial
Abstract: Tosatto, Silvio/0000-0003-4525-7793; Zhang, Feng/0000-0003-3447-897X; Gonzalez, Jose Maria Fernandez/0000-0002-4806-5140; Devignes, Marie-Dominique/0000-0002-0399-8713; Wass, Mark/0000-0001-5428-6479; Falda, Marco/0000-0003-2642-519X; Thurlby, Natalie/0000-0002-1007-0286; Zosa, Elaine/0000-0003-2482-0663; Dessimoz, Christophe/0000-0002-2170-853X; Yunes, Jeffrey/0000-0003-1869-3231; Hamid, Md Nafiz/0000-0001-8681-6526; Hoehndorf, Robert/0000-0001-8149-5890; Dogan, Tunca/0000-0002-1298-9763; NOTARO, MARCO/0000-0003-4309-2200; Cozzetto, Domenico/0000-0001-6752-5432; Lewis, Kimberley/0000-0003-3010-8453; Roche, Daniel/0000-0002-9204-1840; Martin, Maria-Jesus/0000-0001-5454-2815; Tress, Michael/0000-0001-9046-6370; Tolvanen, Martti/0000-0003-3434-7646; Cheng, Jianlin/0000-0003-0305-2853; Rose, Peter/0000-0001-9981-9750; Renaux, Alexandre/0000-0002-4339-2791; Kacsoh, Balint/0000-0001-9171-0611; O'Donovan, Claire/0000-0001-8051-7429; Kulmanov, Maxat/0000-0003-1710-1820; Friedberg, Iddo/0000-0002-1789-8000; Zhou, Naihui/0000-0001-6268-6149, WOS: 000498615000001, PubMed ID: 31744546, Background The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility.
We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. Conclusion We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens., National Science FoundationNational Science Foundation (NSF) [DBI1564756, DBI-1458359, DBI-1458390, DMS1614777, CMMI1825941, NSF 1458390]; Gordon and Betty Moore FoundationGordon and Betty Moore Foundation [GBMF 4552]; National Institutes of Health NIGMSUnited States Department of Health & Human ServicesNational Institutes of Health (NIH) - USANIH National Institute of General Medical Sciences (NIGMS) [P20 GM113132]; Cystic Fibrosis Foundation [CFRDP STANTO19R0]; BBSRCBiotechnology and Biological Sciences Research Council (BBSRC) [BB/K004131/1, BB/F00964X/1, BB/M025047/1, BB/M015009/1]; Consejo Nacional de Ciencia y Tecnologia Paraguay (CONACyT)Consejo Nacional de Ciencia y Tecnologia (CONACyT) [14-INV-088, PINV15-315]; NSFNational Science Foundation (NSF) [1660648, DBI 1759934, IIS1763246, DBI-1458477, 0965768, DMR-1420073, DBI-1458443]; NIHUnited States Department of Health & Human ServicesNational Institutes of Health (NIH) - USA [R01GM093123, DP1MH110234, UL1 TR002319, U24 TR002306]; Deutsche Forschungsgemeinschaft (DFG, 
German Research Foundation) under Germany's Excellence Strategy-EXC 2155 "RESIST"German Research Foundation (DFG) [39087428]; National Institutes of HealthUnited States Department of Health & Human ServicesNational Institutes of Health (NIH) - USA [R01GM123055, R01GM60595, R15GM120650, GM083107, GM116960, AI134678, NIH R35-GM128637, R00-GM097033]; ERCEuropean Research Council (ERC) [StG 757700]; Spanish Ministry of Science, Innovation and Universities [BFU2017-89833-P]; Severo Ochoa award; Centre of Excellence project "BioProspecting of Adriatic Sea"; Croatian Government; European Regional Development FundEuropean Union (EU) [KK.01.1.1.01.0002]; ATT Tieto kayttoon grant; Academy of FinlandAcademy of Finland; University of Turku; CSC-IT Center for Science Ltd.; University of Miami; National Cancer Institute of the National Institutes of HealthUnited States Department of Health & Human ServicesNational Institutes of Health (NIH) - USANIH National Cancer Institute (NCI) [U01CA198942]; Helsinki Institute for Life Sciences; Academy of FinlandAcademy of Finland [292589]; National Natural Science Foundation of ChinaNational Natural Science Foundation of China [31671367, 31471245, 91631301, 61872094, 61572139]; National Key Research and Development Program of China [2016YFC1000505, 2017YFC0908402]; Italian Ministry of Education, University and Research (MIUR) PRIN 2017 projectMinistry of Education, Universities and Research (MIUR) [2017483NH8]; Shanghai Municipal Science and Technology Major Project [2017SHZDZX01, 2018SHZDZX01]; UK Biotechnology and Biological Sciences Research CouncilBiotechnology and Biological Sciences Research Council (BBSRC) [BB/N019431/1, BB/L020505/1, BB/L002817/1]; Elsevier; Extreme Science and Engineering Discovery Environment (XSEDE) award [MCB160101, MCB160124]; Ministry of Education, Science and Technological Development of the Republic of Serbia [173001]; Taiwan Ministry of Science and Technology [106-2221-E-004-011-MY2]; Montana State 
University; Bavarian Ministry for Education; Simons Foundation; NIH NINDSUnited States Department of Health & Human ServicesNational Institutes of Health (NIH) - USANIH National Institute of Neurological Disorders & Stroke (NINDS) [1R21NS103831-01]; University of Illinois at Chicago (UIC) Cancer Center award; UIC College of Liberal Arts and Sciences Faculty Award; UIC International Development Award; Yad Hanadiv [9660/2019]; National Institute of General Medical Science of the National Institute of Health [GM066099, GM079656]; Research Supporting Plan (PSR) of University of Milan [PSR2018-DIP-010-MFRAS]; Swiss National Science FoundationSwiss National Science Foundation (SNSF) [150654]; EMBL-European Bioinformatics Institute core funds; CAFA BBSRC [BB/N004876/1]; European Union's Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grantEuropean Union (EU) [778247]; COST ActionEuropean Cooperation in Science and Technology (COST) [BM1405]; NIH/NIGMSUnited States Department of Health & Human ServicesNational Institutes of Health (NIH) - USANIH National Institute of General Medical Sciences (NIGMS) [R01 GM071749]; National Human Genome Research Institute of the National of Health [U41 HG007234]; INB Grant (ISCIII-SGEFI/ERDF) [PT17/0009/0001]; TUBITAKTurkiye Bilimsel ve Teknolojik Arastirma Kurumu (TUBITAK) [EEEAG-116E930]; KanSil [2016K121540]; Universita degli Studi di Milano; 111 ProjectMinistry of Education, China - 111 Project [B18015]; key project of Shanghai Science Technology [16JC1420402]; ZJLab; project Ribes Network POR-FESR 3S4H [TOPP-ALFREVE18-01]; PRID/SID of University of Padova [TOPP-SID19-01]; NIGMSUnited States Department of Health & Human ServicesNational Institutes of Health (NIH) - USANIH National Institute of General Medical Sciences (NIGMS) [R15GM120650]; King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) [URF/1/3454-01-01, URF/1/3790-01-01]; "the Human Project from Mind, Brain 
and Learning" of the NCCU Higher Education Sprout Project by the Taiwan Ministry of Education; National Center for High-performance ComputingIstanbul Technical University, The work of IF was funded, in part, by the National Science Foundation award DBI-1458359. The work of CSG and AJL was funded, in part, by the National Science Foundation award DBI-1458390 and GBMF 4552 from the Gordon and Betty Moore Foundation. The work of DAH and KAL was funded, in part, by the National Science Foundation award DBI-1458390, National Institutes of Health NIGMS P20 GM113132, and the Cystic Fibrosis Foundation CFRDP STANTO19R0. The work of AP, HY, AR, and MT was funded by BBSRC grants BB/K004131/1, BB/F00964X/1 and BB/M025047/1, Consejo Nacional de Ciencia y Tecnologia Paraguay (CONACyT) grants 14-INV-088 and PINV15-315, and NSF Advances in BioInformatics grant 1660648. The work of JC was partially supported by an NIH grant (R01GM093123) and two NSF grants (DBI 1759934 and IIS1763246). ACM acknowledges the support by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy -EXC 2155 "RESIST" - Project ID 39087428. DK acknowledges the support from the National Institutes of Health (R01GM123055) and the National Science Foundation (DMS1614777, CMMI1825941). PB acknowledges the support from the National Institutes of Health (R01GM60595). GB and BZK acknowledge the support from the National Science Foundation (NSF 1458390) and NIH DP1MH110234. FS was funded by the ERC StG 757700 "HYPER-INSIGHT" and by the Spanish Ministry of Science, Innovation and Universities grant BFU2017-89833-P. FS further acknowledges the funding from the Severo Ochoa award to the IRB Barcelona. TS was funded by the Centre of Excellence project "BioProspecting of Adriatic Sea", co-financed by the Croatian Government and the European Regional Development Fund (KK.01.1.1.01.0002). The work of SK was funded by ATT Tieto kayttoon grant and Academy of Finland. 
JB and HM acknowledge the support of the University of Turku, the Academy of Finland and CSC-IT Center for Science Ltd. TB and SM were funded by the NIH awards UL1 TR002319 and U24 TR002306. The work of CZ and ZW was funded by the National Institutes of Health R15GM120650 to ZW and start-up funding from the University of Miami to ZW. The work of PWR was supported by the National Cancer Institute of the National Institutes of Health under Award Number U01CA198942. PR acknowledges NSF grant DBI-1458477. PT acknowledges the support from Helsinki Institute for Life Sciences. The work of AJM was funded by the Academy of Finland (No. 292589). The work of FZ and WT was funded by the National Natural Science Foundation of China (31671367, 31471245, 91631301) and the National Key Research and Development Program of China (2016YFC1000505, 2017YFC0908402). CS acknowledges the support by the Italian Ministry of Education, University and Research (MIUR) PRIN 2017 project 2017483NH8. SZ is supported by the National Natural Science Foundation of China (No. 61872094 and No. 61572139) and Shanghai Municipal Science and Technology Major Project (No. 2017SHZDZX01). PLF and RLH were supported by the National Institutes of Health NIH R35-GM128637 and R00-GM097033. JG, DTJ, CW, DC, and RF were supported by the UK Biotechnology and Biological Sciences Research Council (BB/N019431/1, BB/L020505/1, and BB/L002817/1) and Elsevier. The work of YZ and CZ was funded in part by the National Institutes of Health award GM083107, GM116960, and AI134678; the National Science Foundation award DBI1564756; and the Extreme Science and Engineering Discovery Environment (XSEDE) award MCB160101 and MCB160124. The work of BG, VP, RD, NS, and NV was funded by the Ministry of Education, Science and Technological Development of the Republic of Serbia, Project No. 173001. The work of YWL, WHL, and JMC was funded by the Taiwan Ministry of Science and Technology (106-2221-E-004-011-MY2). 
YWL, WHL, and JMC further acknowledge the support from "the Human Project from Mind, Brain and Learning" of the NCCU Higher Education Sprout Project by the Taiwan Ministry of Education and the National Center for High-performance Computing for computer time and facilities. The work of IK and AB was funded by Montana State University and NSF Advances in Biological Informatics program through grant number 0965768. BR, TG, and JR are supported by the Bavarian Ministry for Education through funding to the TUM. The work of RB, VG, MB, and DCEK was supported by the Simons Foundation, NIH NINDS grant number 1R21NS103831-01 and NSF award number DMR-1420073. CJJ acknowledges the funding from a University of Illinois at Chicago (UIC) Cancer Center award, a UIC College of Liberal Arts and Sciences Faculty Award, and a UIC International Development Award. The work of ML was funded by Yad Hanadiv (grant number 9660/2019). The work of OL and IN was funded by the National Institute of General Medical Sciences of the National Institutes of Health through GM066099 and GM079656. Research Supporting Plan (PSR) of University of Milan number PSR2018-DIP-010-MFRAS. AWV acknowledges the funding from the BBSRC (CASE studentship BB/M015009/1). CD acknowledges the support from the Swiss National Science Foundation (150654). CO and MJM are supported by the EMBL-European Bioinformatics Institute core funds and the CAFA BBSRC BB/N004876/1. GG is supported by CAFA BBSRC BB/N004876/1. SCET acknowledges funding from the European Union's Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No 778247 (IDPfun) and from COST Action BM1405 (NGP-net). SEB was supported by NIH/NIGMS grant R01 GM071749. The work of MLT, JMR, and JMF was supported by the National Human Genome Research Institute of the National Institutes of Health, grant number U41 HG007234. The work of JMF and JMR was also supported by INB Grant (PT17/0009/0001 - ISCIII-SGEFI/ERDF). 
VA acknowledges the funding from TUBITAK EEEAG-116E930. RCA acknowledges the funding from KanSil 2016K121540. GV acknowledges the funding from Universita degli Studi di Milano - Project "Discovering Patterns in Multi-Dimensional Data" and Project "Machine Learning and Big Data Analysis for Bioinformatics". RY and SY are supported by the 111 Project (No. B18015), the key project of Shanghai Science & Technology (No. 16JC1420402), Shanghai Municipal Science and Technology Major Project (No. 2018SHZDZX01), and ZJLab. ST was supported by project Ribes Network POR-FESR 3S4H (No. TOPP-ALFREVE18-01) and PRID/SID of University of Padova (No. TOPP-SID19-01). CZ and ZW were supported by the NIGMS grant R15GM120650 to ZW and start-up funding from the University of Miami to ZW. The work of MK and RH was supported by the funding from King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. URF/1/3454-01-01 and URF/1/3790-01-01. The work of SDM is funded, in part, by NSF award DBI-1458443.
Published: 2019

71. Development of the automated primer design workflow uniqprimer and diagnostic primers for the broad-host-range plant pathogen Dickeya dianthicola

Author: R. Ryan McNally, Amy O. Charkowski, Ramil Mauleon, Alexis Dereeper, Jan E. Leach, Shaista Karim, Asa Ben-Hur, Lindsay R. Triplett, and Afnan Shazwan Nasaruddin
Subjects: 0106 biological sciences, 0301 basic medicine, Blackleg, Dickeya dianthicola, Dickeya, Plant Science, Computational biology, Biology, biology.organism_classification, 01 natural sciences, Genome, 03 medical and health sciences, 030104 developmental biology, Primer (molecular biology), Agronomy and Crop Science, Pathogen, 010606 plant biology & botany
Abstract: Uniqprimer, a software pipeline developed in Python, was deployed as a user-friendly internet tool in Rice Galaxy for comparative genome analyses to design primer sets for PCR assays capable of detecting target bacterial taxa. The pipeline was trialed with Dickeya dianthicola, a destructive broad-host-range bacterial pathogen found in most potato-growing regions. Dickeya is a highly variable genus, and some primers available to detect this genus and species exhibit common diagnostic failures. Upon uploading a selection of target and nontarget genomes, six primer sets were rapidly identified with Uniqprimer, of which two were specific and sensitive when tested with D. dianthicola. The remaining four amplified a minority of the nontarget strains tested. The two promising candidate primer sets were trialed with DNA isolated from 116 field samples from across the United States that were previously submitted for testing. D. dianthicola was detected in 41 samples, demonstrating the applicability of our detection primers and suggesting widespread occurrence of D. dianthicola in North America.
Published: 2019

72. BLRM: A Basic Linear Ranking Model for Protein Interface Prediction

Author: Asa Ben-Hur, Basir Shariat, and Don Neumann
Subjects: 0301 basic medicine, Protein interface, Focus (computing), Source code, Computer science, Interface (Java), media_common.quotation_subject, 0206 medical engineering, 02 engineering and technology, computer.software_genre, 03 medical and health sciences, 030104 developmental biology, Ranking, Simple linear model, Learning to rank, Data mining, computer, 020602 bioinformatics, media_common
Abstract: We consider the problem of prediction of the interfaces of protein-protein interactions, a challenging problem with important applications in drug discovery and design. The standard machine learning approach is to attempt to predict the interface in its entirety. Because of the difficulty of the problem, we propose to treat the problem as a ranking problem and focus on getting at least a few correctly predicted interface residues in the top ranked predictions. Our results demonstrate that a simple linear model out-performs more complicated models that try to solve the corresponding classification problem. The source code is available at https://bitbucket.org/afrasiab/blrm.
Published: 2018

73. Learning protein binding affinity using privileged information

Author: Amina Asif, Wajid Arshad Abbasi, Fayyaz ul Amir Afsar Minhas, and Asa Ben-Hur
Subjects: 0301 basic medicine, Support Vector Machine, Computer science, Protein-protein interactions, Protein binding affinity prediction, Plasma protein binding, Ligands, Machine learning, computer.software_genre, lcsh:Computer applications to medicine. Medical informatics, Biochemistry, Protein–protein interaction, QA76, 03 medical and health sciences, Protein sequencing, Protein structure, Structural Biology, Amino Acid Sequence, Molecular Biology, lcsh:QH301-705.5, Structure (mathematical logic), Sequence, 030102 biochemistry & molecular biology, business.industry, Methodology Article, Applied Mathematics, QH, Mutagenesis, Computational Biology, Proteins, Reproducibility of Results, Protein engineering, Ligand (biochemistry), QP, Computer Science Applications, 030104 developmental biology, ROC Curve, lcsh:Biology (General), Test set, lcsh:R858-859.7, Artificial intelligence, DNA microarray, business, computer, Privileged information, Algorithms, Protein Binding
Abstract: Background Determining protein-protein interactions and their binding affinity are important in understanding cellular biological processes, discovery and design of novel therapeutics, protein engineering, and mutagenesis studies. Due to the time and effort required in wet lab experiments, computational prediction of binding affinity from sequence or structure is an important area of research. Structure-based methods, though more accurate than sequence-based techniques, are limited in their applicability due to limited availability of protein structure data. Results In this study, we propose a novel machine learning method for predicting binding affinity that uses protein 3D structure as privileged information at training time while expecting only protein sequence information during testing. Using the method, which is based on the framework of learning using privileged information (LUPI), we have achieved improved performance over corresponding sequence-based binding affinity prediction methods that do not have access to privileged information during training. Our experiments show that with the proposed framework which uses structure only during training, it is possible to achieve classification performance comparable to that which is obtained using structure-based features. Evaluation on an independent test set shows improved performance over the PPA-Pred2 method as well. Conclusions The proposed method outperforms several baseline learners and a state-of-the-art binding affinity predictor not only in cross-validation, but also on an additional validation dataset, demonstrating the utility of the LUPI framework for problems that would benefit from classification using structure-based features. The implementation of LUPI developed for this work is expected to be useful in other areas of bioinformatics as well. 
Electronic supplementary material The online version of this article (10.1186/s12859-018-2448-z) contains supplementary material, which is available to authorized users.
Published: 2018

74. Combining heterogeneous data sources for accurate functional annotation of proteins.

Author: Artem Sokolov, Christopher S. Funk, Kiley Graim, Karin Verspoor, and Asa Ben-Hur
Published: 2013
Full Text: View/download PDF

75. Transcriptome-Wide Identification of RNA Targets of Arabidopsis SERINE/ARGININE-RICH45 Uncovers the Unexpected Roles of This RNA Binding Protein in RNA Processing

Author: Asa Ben-Hur, Michael Hamilton, Yajun Wang, Anireddy S. N. Reddy, and Denghui Xing
Subjects: Genetics, Messenger RNA, RNA silencing, fungi, RNA splicing, Intron, RNA, RNA-binding protein, Cell Biology, Plant Science, Biology, Non-coding RNA, Gene
Abstract: Plant SR45 and its metazoan ortholog RNPS1 are serine/arginine-rich (SR)-like RNA binding proteins that function in splicing/postsplicing events and regulate diverse processes in eukaryotes. Interactions of SR45 with both RNAs and proteins are crucial for regulating RNA processing. However, in vivo RNA targets of SR45 are currently unclear. Using RNA immunoprecipitation followed by high-throughput sequencing, we identified over 4000 Arabidopsis thaliana RNAs that directly or indirectly associate with SR45, designated as SR45-associated RNAs (SARs). Comprehensive analyses of these SARs revealed several roles for SR45. First, SR45 associates with and regulates the expression of 30% of abscisic acid (ABA) signaling genes at the postsplicing level. Second, although most SARs are derived from intron-containing genes, surprisingly, 340 SARs are derived from intronless genes. Expression analysis of the SARs suggests that SR45 differentially regulates intronless and intron-containing SARs. Finally, we identified four overrepresented RNA motifs in SARs that likely mediate SR45’s recognition of its targets. Therefore, SR45 plays an unexpected role in mRNA processing of intronless genes, and numerous ABA signaling genes are targeted for regulation at the posttranscriptional level. The diverse molecular functions of SR45 uncovered in this study are likely applicable to other species in view of its conservation across eukaryotes.
Published: 2015

76. Predicting metamorphic relations for testing scientific software: a machine learning approach using graph kernels

Author: Asa Ben-Hur, James M. Bieman, and Upulee Kanewala
Subjects: Graph kernel, Theoretical computer science, Computer science, business.industry, Simple Features, 020207 software engineering, 02 engineering and technology, Machine learning, computer.software_genre, Oracle, Support vector machine, Data dependency, Control flow, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Metamorphic testing, Artificial intelligence, Safety, Risk, Reliability and Quality, Programmer, business, computer, Software
Abstract: Comprehensive, automated software testing requires an oracle to check whether the output produced by a test case matches the expected behaviour of the programme. But the challenges in creating suitable oracles limit the ability to perform automated testing in some programmes, and especially in scientific software. Metamorphic testing is a method for automating the testing process for programmes without test oracles. This technique operates by checking whether the programme behaves according to properties called metamorphic relations. A metamorphic relation describes the change in output when the input is changed in a prescribed way. Unfortunately, finding the metamorphic relations satisfied by a programme or function remains a labour-intensive task, which is generally performed by a domain expert or a programmer. In this work, we propose a machine learning approach for predicting metamorphic relations that uses a graph-based representation of a programme to represent control flow and data dependency information. In earlier work, we found that simple features derived from such graphs provide good performance. An analysis of the features used in this earlier work led us to explore the effectiveness of several representations of those graphs using the machine learning framework of graph kernels, which provide various ways of measuring similarity between graphs. Our results show that a graph kernel that evaluates the contribution of all paths in the graph has the best accuracy and that control flow information is more useful than data dependency information. The data used in this study are available for download at http://www.cs.colostate.edu/saxs/MRpred/functions.tar.gz to help researchers in further development of metamorphic relation prediction methods. Copyright © 2015 John Wiley & Sons, Ltd.
Published: 2015

77. Transcriptome Analysis of Drought-Resistant and Drought-Sensitive Sorghum (Sorghum bicolor) Genotypes in Response to PEG-Induced Drought Stress

Author: Asa Ben-Hur, Fahad Ullah, Anireddy S. N. Reddy, and Salah E. Abdel-Ghany
Subjects: 0106 biological sciences, 0301 basic medicine, polyethylene glycol (PEG), Genotype, Transcription, Genetic, drought resistance, Drought tolerance, drought, Biology, 01 natural sciences, Article, Catalysis, Polyethylene Glycols, lcsh:Chemistry, Inorganic Chemistry, Crop, Transcriptome, 03 medical and health sciences, Gene Expression Regulation, Plant, parasitic diseases, Gene expression, Physical and Theoretical Chemistry, lcsh:QH301-705.5, Molecular Biology, Gene, Spectroscopy, Dehydration, Gene Expression Profiling, fungi, Organic Chemistry, food and beverages, General Medicine, Sorghum, biology.organism_classification, polyethylene glycol (PEG), drought, Computer Science Applications, 030104 developmental biology, lcsh:Biology (General), lcsh:QD1-999, Agronomy, Seedling, gene expression, sorghum, 010606 plant biology & botany
Abstract: Drought is a major limiting factor of crop yields. In response to drought, plants reprogram their gene expression, which ultimately regulates a multitude of biochemical and physiological processes. The timing of this reprogramming and the nature of the drought-regulated genes in different genotypes are thought to confer differential tolerance to drought stress. Sorghum is a highly drought-tolerant crop and has been increasingly used as a model cereal to identify genes that confer tolerance. Also, there is considerable natural variation in resistance to drought in different sorghum genotypes. Here, we evaluated drought resistance in four genotypes to polyethylene glycol (PEG)-induced drought stress at the seedling stage and performed transcriptome analysis in seedlings of sorghum genotypes that are either drought-resistant or drought-sensitive to identify drought-regulated changes in gene expression that are unique to drought-resistant genotypes of sorghum. Our analysis revealed that about 180 genes are differentially regulated in response to drought stress only in drought-resistant genotypes and most of these (over 70%) are up-regulated in response to drought. Among these, about 70 genes are novel with no known function and the remaining are transcription factors, signaling and stress-related proteins implicated in drought tolerance in other crops. This study revealed a set of drought-regulated genes, including many genes encoding uncharacterized proteins that are associated with drought tolerance at the seedling stage.
Published: 2020

78. Support vector clustering.

Author: Asa Ben-Hur
Published: 2008
Full Text: View/download PDF

79. GOstruct 2.0

Author: Asa Ben-Hur and Indika Kahanda
Subjects: 0301 basic medicine, business.industry, Computer science, media_common.quotation_subject, A protein, Machine learning, computer.software_genre, Task (project management), 03 medical and health sciences, ComputingMethodologies_PATTERNRECOGNITION, 030104 developmental biology, Protein function prediction, Artificial intelligence, Data mining, business, Function (engineering), computer, media_common
Abstract: Automated Protein Function Prediction is the task of automatically predicting functional annotations for a protein based on gold-standard annotations derived from experimental assays. These experiment-based annotations accumulate over time: proteins without annotations get annotated, and new functions of already annotated proteins are discovered. Therefore, function prediction can be considered a combination of two sub-tasks: making predictions on annotated proteins and making predictions on previously unannotated proteins. In previous work, we analyzed the performance of several protein function prediction methods in these two scenarios. Our results showed that GOstruct, which is based on the structured output framework, had lower accuracy in the task of predicting annotations for proteins with existing annotations, while its performance on un-annotated proteins was similar to the performance in cross-validation. In this work, we present GOstruct 2.0 which includes improvements that allow the model to make use of information of a protein's current annotations to better handle the task of predicting novel annotations for previously annotated proteins. This is highly important for model organisms where most proteins have some level of annotations. Experimental results on human data show that GOstruct 2.0 outperforms the original GOstruct in this task, demonstrating the effectiveness of the proposed improvements. This is the first study that focuses on adapting the structured output framework for applications in which labels are incomplete by nature.
Published: 2017

80. RAMClust: A Novel Feature Clustering Method Enables Spectral-Matching-Based Annotation for Metabolomics Data

Author: Steffen Neumann, Jessica E. Prenni, Asa Ben-Hur, Corey D. Broeckling, and F. A. Afsar
Subjects: Signal processing, Databases, Factual, Chemistry, business.industry, Analytical chemistry, Signal Processing, Computer-Assisted, Pattern recognition, Tandem mass spectrometry, Mass spectrometry, Mass Spectrometry, Analytical Chemistry, Identification (information), Metabolomics, Tandem Mass Spectrometry, Feature (computer vision), Animals, Cluster Analysis, Horses, Artificial intelligence, business, Cluster analysis, Software, Cerebrospinal Fluid, Feature detection (computer vision)
Abstract: Metabolomic data are frequently acquired using chromatographically coupled mass spectrometry (MS) platforms. For such datasets, the first step in data analysis relies on feature detection, where a feature is defined by a mass and retention time. While a feature typically is derived from a single compound, a spectrum of mass signals is a more accurate representation of the mass spectrometric signal for a given metabolite. Here, we report a novel feature grouping method that operates in an unsupervised manner to group signals from MS data into spectra without relying on predictability of the in-source phenomenon. We additionally address a fundamental bottleneck in metabolomics, annotation of MS level signals, by incorporating indiscriminant MS/MS (idMS/MS) data implicitly: feature detection is performed on both MS and idMS/MS data, and feature-feature relationships are determined simultaneously from the MS and idMS/MS data. This approach facilitates identification of metabolites using in-source MS and/or idMS/MS spectra from a single experiment, reduces quantitative analytical variation compared to single-feature measures, and decreases false positive annotations of unpredictable phenomena as novel compounds. This tool is released as a freely available R package, called RAMClustR, and is sufficiently versatile to group features from any chromatographic-spectrometric platform or feature-finding software.
Published: 2014

81. PAIRpred: Partner-specific prediction of interacting residues from sequence and structure

Author: Asa Ben-Hur, Fayyaz ul Amir Afsar Minhas, and Brian J. Geiss
Subjects: business.industry, Protein design, Computational biology, Protein structure prediction, Biology, Machine learning, computer.software_genre, Biochemistry, Support vector machine, Protein structure, Structural Biology, Docking (molecular), Protein function prediction, Pairwise comparison, Artificial intelligence, Homology modeling, business, Molecular Biology, computer
Abstract: We present a novel partner-specific protein-protein interaction site prediction method called PAIRpred. Unlike most existing machine learning binding site prediction methods, PAIRpred uses information from both proteins in a protein complex to predict pairs of interacting residues from the two proteins. PAIRpred captures sequence and structure information about residue pairs through pairwise kernels that are used for training a support vector machine classifier. As a result, PAIRpred presents a more detailed model of protein binding, and offers state-of-the-art accuracy in predicting binding sites at the protein level as well as inter-protein residue contacts at the complex level. We demonstrate PAIRpred's performance on Docking Benchmark 4.0 and recent CAPRI targets. We present a detailed performance analysis outlining the contribution of different sequence and structure features, together with a comparison to a variety of existing interface prediction techniques. We have also studied the impact of binding-associated conformational change on prediction accuracy and found PAIRpred to be more robust to such structural changes than existing schemes. As an illustration of the potential applications of PAIRpred, we provide a case study in which PAIRpred is used to analyze the nature and specificity of the interface in the interaction of human ISG15 protein with NS1 protein from influenza A virus. Python code for PAIRpred is available at http://combi.cs.colostate.edu/supplements/pairpred/.
Published: 2013
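
Illustrative note on the pairwise kernels mentioned in the PAIRpred abstract: the sketch below shows one common way to build a kernel over pairs of objects from a kernel over single objects (a symmetric tensor-product form). It is only a sketch under stated assumptions. The function names, the RBF base kernel, the `gamma` value, and the toy feature vectors are hypothetical and are not taken from the PAIRpred code, whose actual kernels and features may differ.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two feature vectors (assumed base kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def pairwise_kernel(pair1, pair2, base=rbf_kernel):
    """Symmetric tensor-product kernel between two residue pairs.

    K((a, b), (c, d)) = k(a, c) * k(b, d) + k(a, d) * k(b, c)

    The second term makes the value independent of the order of the
    two residues inside each pair.
    """
    a, b = pair1
    c, d = pair2
    return base(a, c) * base(b, d) + base(a, d) * base(b, c)


# Toy residue feature vectors (hypothetical values, for illustration only).
r1, r2 = [0.0, 1.0], [1.0, 0.0]
r3, r4 = [0.5, 0.5], [1.0, 1.0]

similarity = pairwise_kernel((r1, r2), (r3, r4))
```

A matrix of such values over all training pairs could then be passed to any SVM implementation that accepts a precomputed kernel.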

82. A survey of the sorghum transcriptome using single-molecule long reads

Author: Nicholas P. Devitt, Jennifer L. Jacobi, Faye D. Schilkey, Anireddy S. N. Reddy, Michael Hamilton, Peter B. Ngam, Salah E. Abdel-Ghany, and Asa Ben-Hur
Subjects: 0301 basic medicine, Gene isoform, RNA, Untranslated, Sequence analysis, Science, RNA Splicing, education, General Physics and Astronomy, Computational biology, Biology, Bioinformatics, Genome, Polymerase Chain Reaction, General Biochemistry, Genetics and Molecular Biology, Article, Transcriptome, 03 medical and health sciences, Gene Expression Regulation, Plant, mental disorders, Protein Isoforms, Gene, Sorghum, Plant Proteins, Regulation of gene expression, Multidisciplinary, Sequence Analysis, RNA, Gene Expression Profiling, Alternative splicing, High-Throughput Nucleotide Sequencing, General Chemistry, Gene expression profiling, 030104 developmental biology, RNA, Plant, psychological phenomena and processes
Abstract: Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novel splice isoforms. Additionally, we uncover APA of ∼11,000 expressed genes and more than 2,100 novel genes. These results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism. Alternative splicing and alternative polyadenylation (APA) contribute to mRNA diversity but are difficult to assess using short read RNA-seq data. Here, the authors use single molecule long-read isoform sequencing and develop a computational pipeline to identify full-length splice isoforms and APA sites in sorghum.
Published: 2016

83. Mendel: A Distributed Storage Framework for Similarity Searching over Sequencing Data

Author: Asa Ben-Hur, Sangmi Lee Pallickara, and Cameron Tolooee
Subjects: 0301 basic medicine, Sequence database, Distributed database, Computer science, Nearest neighbor search, Genomic sequencing, Genomics, Sequence alignment, computer.software_genre, Distributed hash table, 03 medical and health sciences, chemistry.chemical_compound, 030104 developmental biology, chemistry, Distributed data store, Scalability, Protein homology, Data mining, computer, DNA, Alignment-free sequence analysis
Abstract: Rapid advances in genomic sequencing technology have resulted in a data deluge in biology and bioinformatics. This increase in data volumes has introduced computational challenges for frequently performed sequence analytics routines such as DNA and protein homology searches, which should also preferably be done in real time. In this paper, we propose a scalable and similarity-aware distributed storage framework, Mendel, that enables retrieval of biologically significant DNA and protein alignments against a voluminous genomic sequence database. Mendel fragments the sequence data and generates an inverted index, which is then dispersed over a distributed collection of machines using a locality-aware distributed hash table. A novel distributed nearest neighbor search algorithm identifies sequence segments with high similarity and extends them to find an alignment. This paper includes an empirical evaluation of the performance, sensitivity, and scalability of the proposed system on the National Center for Biotechnology Information's non-redundant protein dataset. Mendel demonstrates higher sensitivity and faster query evaluations when compared to other modern frameworks.
Published: 2016
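
Illustrative note on the "fragment and index" step described in the Mendel abstract: the sketch below builds a single-machine inverted index from fixed-length fragments (k-mers) to their positions, then counts shared fragments to shortlist candidate sequences. It is a minimal sketch under stated assumptions. The function names, the fragment length `k`, and the toy sequences are hypothetical, exact k-mer matching stands in for Mendel's similarity-aware search, and the distributed hash table and alignment extension are not reproduced.

```python
from collections import defaultdict


def build_inverted_index(sequences, k=3):
    """Map every length-k fragment to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in sequences.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def candidate_hits(index, query, k=3):
    """Count, per stored sequence, how many indexed fragment occurrences match the query's fragments."""
    hits = defaultdict(int)
    for pos in range(len(query) - k + 1):
        for seq_id, _offset in index.get(query[pos:pos + k], []):
            hits[seq_id] += 1
    return dict(hits)


# Toy database (hypothetical sequences, for illustration only).
database = {"s1": "ACGTAC", "s2": "TTTT"}
index = build_inverted_index(database, k=3)
hits = candidate_hits(index, "CGTA", k=3)
```

In a real system the shortlisted candidates would then be extended into full alignments; here the counts only indicate which sequences share fragments with the query.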

84. aPPRove: An HMM-Based Method for Accurate Prediction of RNA-Pentatricopeptide Repeat Protein Binding Events

Author: Asa Ben-Hur, Daniel B. Sloan, Thomas Harrison, Jaime Ruiz, and Christina Boucher
Subjects: 0106 biological sciences, 0301 basic medicine, Protein Structure Comparison, lcsh:Medicine, RNA-binding protein, RNA-binding proteins, Plasma protein binding, Protein Sequencing, 01 natural sciences, Biochemistry, Database and Informatics Methods, Chloroplast Proteins, Protein sequencing, RNA structure, lcsh:Science, Genetics, Multidisciplinary, RNA alignment, Markov Chains, Nucleic acids, Sequence Analysis, Algorithms, Research Article, Protein Binding, Protein Structure, Protein domain, Sequence Databases, Sequence alignment, Computational biology, Biology, Research and Analysis Methods, Mitochondrial Proteins, 03 medical and health sciences, Protein Domains, Sequence Motif Analysis, Amino Acid Sequence, Binding site, Molecular Biology Techniques, Sequencing Techniques, Molecular Biology, Internet, Binding Sites, Sequence Homology, Amino Acid, lcsh:R, RNA, Biology and Life Sciences, Proteins, Computational Biology, Macromolecular structure analysis, 030104 developmental biology, Biological Databases, Pentatricopeptide repeat, lcsh:Q, Sequence Alignment, 010606 plant biology & botany
Abstract: Pentatricopeptide repeat containing proteins (PPRs) bind to RNA transcripts originating from mitochondria and plastids. There are two classes of PPR proteins. The [Formula: see text] class contains tandem [Formula: see text]-type motif sequences, and the [Formula: see text] class contains alternating [Formula: see text], [Formula: see text] and [Formula: see text] type sequences. In this paper, we describe a novel tool that predicts PPR-RNA interaction; specifically, our method, which we call aPPRove, determines where and how a [Formula: see text]-class PPR protein will bind to RNA when given a PPR and one or more RNA transcripts by using a combinatorial binding code for site specificity proposed by Barkan et al. Our results demonstrate that aPPRove successfully locates how and where a PPR protein belonging to the [Formula: see text] class can bind to RNA. For each binding event it outputs the binding site, the amino-acid-nucleotide interaction, and its statistical significance. Furthermore, we show that our method can be used to predict binding events for [Formula: see text]-class proteins using a known edit site and the statistical significance of aligning the PPR protein to that site. In particular, we use our method to make a conjecture regarding an interaction between CLB19 and the second intronic region of ycf3. The aPPRove web server can be found at www.cs.colostate.edu/~approve.
Published: 2016

85. An expanded evaluation of protein function prediction methods shows an improvement in accuracy

Author: Yuxiang Jiang, Tal Ronnen Oron, Wyatt T. Clark, Asma R. Bankapur, Daniel D’Andrea, Rosalba Lepore, Christopher S. Funk, Indika Kahanda, Karin M. Verspoor, Asa Ben-Hur, Da Chen Emily Koo, Duncan Penfold-Brown, Dennis Shasha, Noah Youngs, Richard Bonneau, Alexandra Lin, Sayed M. E. Sahraeian, Pier Luigi Martelli, Giuseppe Profiti, Rita Casadio, Renzhi Cao, Zhaolong Zhong, Jianlin Cheng, Adrian Altenhoff, Nives Skunca, Christophe Dessimoz, Tunca Dogan, Kai Hakala, Suwisa Kaewphan, Farrokh Mehryary, Tapio Salakoski, Filip Ginter, Hai Fang, Ben Smithers, Matt Oates, Julian Gough, Petri Törönen, Patrik Koskinen, Liisa Holm, Ching-Tai Chen, Wen-Lian Hsu, Kevin Bryson, Domenico Cozzetto, Federico Minneci, David T. Jones, Samuel Chapman, Dukka BKC, Ishita K. Khan, Daisuke Kihara, Dan Ofer, Nadav Rappoport, Amos Stern, Elena Cibrian-Uhalte, Paul Denny, Rebecca E. Foulger, Reija Hieta, Duncan Legge, Ruth C. Lovering, Michele Magrane, Anna N. Melidoni, Prudence Mutowo-Meullenet, Klemens Pichler, Aleksandra Shypitsyna, Biao Li, Pooya Zakeri, Sarah ElShal, Léon-Charles Tranchevent, Sayoni Das, Natalie L. Dawson, David Lee, Jonathan G. Lees, Ian Sillitoe, Prajwal Bhat, Tamás Nepusz, Alfonso E. Romero, Rajkumar Sasidharan, Haixuan Yang, Alberto Paccanaro, Jesse Gillis, Adriana E. Sedeño-Cortés, Paul Pavlidis, Shou Feng, Juan M. Cejuela, Tatyana Goldberg, Tobias Hamp, Lothar Richter, Asaf Salamov, Toni Gabaldon, Marina Marcet-Houben, Fran Supek, Qingtian Gong, Wei Ning, Yuanpeng Zhou, Weidong Tian, Marco Falda, Paolo Fontana, Enrico Lavezzo, Stefano Toppo, Carlo Ferrari, Manuel Giollo, Damiano Piovesan, Silvio C.E. Tosatto, Angela del Pozo, José M. Fernández, Paolo Maietta, Alfonso Valencia, Michael L. Tress, Alfredo Benso, Stefano Di Carlo, Gianfranco Politano, Alessandro Savino, Hafeez Ur Rehman, Matteo Re, Marco Mesiti, Giorgio Valentini, Joachim W. Bargsten, Aalt D. J. van Dijk, Branislava Gemovic, Sanja Glisic, Vladmir Perovic, Veljko Veljkovic, Nevena Veljkovic, Danillo C. Almeida-e-Silva, Ricardo Z. N. Vencio, Malvika Sharan, Jörg Vogel, Lakesh Kansakar, Shanshan Zhang, Slobodan Vucetic, Zheng Wang, Michael J. E. Sternberg, Mark N. Wass, Rachael P. Huntley, Maria J. Martin, Claire O’Donovan, Peter N. Robinson, Yves Moreau, Anna Tramontano, Patricia C. Babbitt, Steven E. Brenner, Michal Linial, Christine A. Orengo, Burkhard Rost, Casey S. Greene, Sean D. Mooney, Iddo Friedberg, and Predrag Radivojac
Subjects: 0301 basic medicine, Computer science, Disease gene prioritization, Protein function prediction, Ecology, Evolution, Behavior and Systematics, Genetics, Cell Biology, 05 Environmental Sciences, 600 Technik, Medizin, angewandte Wissenschaften::610 Medizin und Gesundheit, computer.software_genre, Quantitative Biology - Quantitative Methods, Wiskundige en Statistische Methoden - Biometris, Field (computer science), Laboratorium voor Plantenveredeling, Function (engineering), Databases, Protein, 1183 Plant biology, microbiology, virology, Quantitative Methods (q-bio.QM), media_common, Genetics & Heredity, Settore BIO/11 - BIOLOGIA MOLECOLARE, Ecology, SISTA, 1184 Genetics, developmental biology, physiology, Life Sciences & Biomedicine, Algorithms, Bioinformatics, Evolution, media_common.quotation_subject, BIOINFORMÁTICA, Machine learning, Bottleneck, Set (abstract data type), BIOS Applied Bioinformatics, 03 medical and health sciences, Annotation, Structure-Activity Relationship, Behavior and Systematics, Human Phenotype Ontology, Humans, ddc:610, DISINTEGRIN, Mathematical and Statistical Methods - Biometris, BIOINFORMATICS, 08 Information And Computing Sciences, Science & Technology, business.industry, Research, ADAM, Proteins, Computational Biology, Molecular Sequence Annotation, 06 Biological Sciences, Data set, ONTOLOGY, Plant Breeding, 030104 developmental biology, Gene Ontology, Biotechnology & Applied Microbiology, FOS: Biological sciences, Artificial intelligence, business, computer, Software
Abstract: BACKGROUND: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. RESULTS: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. CONCLUSIONS: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent. We acknowledge the contributions of Maximilian Hecht, Alexander Grün, Julia Krumhoff, My Nguyen Ly, Jonathan Boidol, Rene Schoeffel, Yann Spöri, Jessika Binder, Christoph Hamm and Karolina Worf.
This work was partially supported by the following grants: National Science Foundation grants DBI-1458477 (PR), DBI-1458443 (SDM), DBI-1458390 (CSG), DBI-1458359 (IF), IIS-1319551 (DK), DBI-1262189 (DK), and DBI-1149224 (JC); National Institutes of Health grants R01GM093123 (JC), R01GM097528 (DK), R01GM076990 (PP), R01GM071749 (SEB), R01LM009722 (SDM), and UL1TR000423 (SDM); the National Natural Science Foundation of China grants 3147124 (WT) and 91231116 (WT); the National Basic Research Program of China grant 2012CB316505 (WT); NSERC grant RGPIN 371348-11 (PP); FP7 infrastructure project TransPLANT Award 283496 (ADJvD); Microsoft Research/FAPESP grant 2009/53161-6 and FAPESP fellowship 2010/50491-1 (DCAeS); Biotechnology and Biological Sciences Research Council grants BB/L020505/1 (DTJ), BB/F020481/1 (MJES), BB/K004131/1 (AP), BB/F00964X/1 (AP), and BB/L018241/1 (CD); the Spanish Ministry of Economics and Competitiveness grant BIO2012-40205 (MT); KU Leuven CoE PFV/10/016 SymBioSys (YM); the Newton International Fellowship Scheme of the Royal Society grant NF080750 (TN). CSG was supported in part by the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative grant GBMF4552. Computational resources were provided by CSC – IT Center for Science Ltd., Espoo, Finland (TS). This work was supported by the Academy of Finland (TS). RCL and ANM were supported by British Heart Foundation grant RG/13/5/30112. PD, RCL, and REF were supported by Parkinson’s UK grant G-1307, the Alexander von Humboldt Foundation through the German Federal Ministry for Education and Research, Ernst Ludwig Ehrlich Studienwerk, and the Ministry of Education, Science and Technological Development of the Republic of Serbia grant 173001. 
This work was a Technology Development effort for ENIGMA – Ecosystems and Networks Integrated with Genes and Molecular Assemblies (http://enigma.lbl.gov), a Scientific Focus Area Program at Lawrence Berkeley National Laboratory, which is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Biological & Environmental Research grant DE-AC02-05CH11231. ENIGMA only covers the application of this work to microbial proteins. NSF DBI-0965616 and Australian Research Council grant DP150101550 (KMV). NSF DBI-0965768 (ABH). NIH T15 LM00945102 (training grant for CSF). FP7 FET grant MAESTRA ICT-2013-612944 and FP7 REGPOT grant InnoMol (FS). NIH R01 GM60595 (PCB). University of Padova grants CPDA138081/13 (ST) and GRIC13AAI9 (EL). Swiss National Science Foundation grant 150654 and UK BBSRC grant BB/M015009/1 (COD). PRB2 IPT13/0001 - ISCIII-SGEFI / FEDER (JMF). This is the final version of the article. It first appeared from BioMed Central at http://dx.doi.org/10.1186/s13059-016-1037-6.
Published: 2016

86. Identification of an intronic splicing regulatory element involved in auto-regulation of alternative splicing of SCL33 pre-mRNA

Author: Gul Shad Ali, Salah E. Abdel-Ghany, Julie Thomas, K.V.S.K. Prasad, Giridara-Kumar Surabhi, Asa Ben-Hur, Saiprasad G. Palusa, and Anireddy S. N. Reddy
Subjects: Genetics, SR protein, RNA splicing, Alternative splicing, Intron, Exonic splicing enhancer, Cell Biology, Plant Science, Group II intron, Biology, Splicing regulatory element, Minigene
Abstract: In Arabidopsis, pre-mRNAs of serine/arginine-rich (SR) proteins undergo extensive alternative splicing (AS). However, little is known about the cis-elements and trans-acting proteins involved in regulating AS. Using a splicing reporter (GFP‐intron‐GFP), consisting of the GFP coding sequence interrupted by an alternatively spliced intron of SCL33, we investigated whether cis-elements within this intron are sufficient for AS, and which SR proteins are necessary for regulated AS. Expression of the splicing reporter in protoplasts faithfully produced all splice variants from the intron, suggesting that cis-elements required for AS reside within the intron. To determine which SR proteins are responsible for AS, the splicing pattern of the GFP‐intron‐GFP reporter was investigated in protoplasts of three single and three double mutants of SR genes. These analyses revealed that SCL33 and a closely related paralog, SCL30a, are functionally redundant in generating specific splice variants from this intron. Furthermore, SCL33 protein bound to a conserved sequence in this intron, indicating auto-regulation of AS. Mutations in four GAAG repeats within the conserved region impaired generation of the same splice variants that are affected in the scl33 scl30a double mutant. In conclusion, we have identified the first intronic cis-element involved in AS of a plant SR gene, and elucidated a mechanism for auto-regulation of AS of this intron.
Published: 2012

87. Multiple instance learning of Calmodulin binding sites

Author: Asa Ben-Hur and Fayyaz ul Amir Afsar Minhas
Subjects: Statistics and Probability, Support Vector Machine, Calmodulin, CAM binding, Matlab code, Computational biology, computer.software_genre, Biochemistry, Set (abstract data type), Protein Interaction Domains and Motifs, Binding site, Molecular Biology, Binding Sites, biology, Arabidopsis Proteins, Original Papers, Calmodulin-binding proteins, Computer Science Applications, Support vector machine, Computational Mathematics, ComputingMethodologies_PATTERNRECOGNITION, Computational Theory and Mathematics, Macromolecular Structure, Dynamics and Function, biology.protein, Calmodulin-Binding Proteins, Data mining, Target database, computer
Abstract: Motivation: Calmodulin (CaM) is a ubiquitously conserved protein that acts as a calcium sensor, and interacts with a large number of proteins. Detection of CaM binding proteins and their interaction sites experimentally requires a significant effort, so accurate methods for their prediction are important. Results: We present a novel algorithm (MI-1 SVM) for binding site prediction and evaluate its performance on a set of CaM-binding proteins extracted from the Calmodulin Target Database. Our approach directly models the problem of binding site prediction as a large-margin classification problem, and is able to take into account uncertainty in binding site location. We show that the proposed algorithm performs better than the standard SVM formulation, and illustrate its ability to recover known CaM binding motifs. A highly accurate cascaded classification approach using the proposed binding site prediction method to predict CaM binding proteins in Arabidopsis thaliana is also presented. Availability: Matlab code for training MI-1 SVM and the cascaded classification approach is available on request. Contact: fayyazafsar@gmail.com or asa@cs.colostate.edu
Published: 2012

88. Hierarchical classification of Gene Ontology terms using the GOstruct method

Author: Asa Ben-Hur and Artem Sokolov
Subjects: Computer science, Machine learning, computer.software_genre, Biochemistry, Mice, Annotation, Artificial Intelligence, Animals, Protein function prediction, Databases, Protein, Molecular Biology, Sequence, Models, Statistical, Models, Genetic, Hierarchy (mathematics), business.industry, Computational Biology, Proteins, Function (mathematics), Computer Science Applications, ComputingMethodologies_PATTERNRECOGNITION, Kernel method, Binary classification, Benchmark (computing), Neural Networks, Computer, Artificial intelligence, Data mining, business, Sequence Alignment, computer, Algorithms
Abstract: Protein function prediction is an active area of research in bioinformatics. Yet, the transfer of annotation on the basis of sequence or structural similarity remains widely used as an annotation method. Most of today's machine learning approaches reduce the problem to a collection of binary classification problems: whether a protein performs a particular function, sometimes with a post-processing step to combine the binary outputs. We propose a method that directly predicts a full functional annotation of a protein by modeling the structure of the Gene Ontology hierarchy in the framework of kernel methods for structured-output spaces. Our empirical results show improved performance over a BLAST nearest-neighbor method, and over algorithms that employ a collection of binary classifiers as measured on the Mousefunc benchmark dataset.
Published: 2010

89. A close look at protein function prediction evaluation protocols

Author: Asa Ben-Hur, Fahad Ullah, Karin Verspoor, Christopher S. Funk, and Indika Kahanda
Subjects: Support vector machines, Structured support vector machine, Computer science, Research, media_common.quotation_subject, Proteins, Automated function prediction, Contrast (statistics), Health Informatics, computer.software_genre, Computer Science Applications, Task (project management), Support vector machine, Gene Ontology, Ranking, Machine learning, Protein function prediction, Data mining, Databases, Protein, Critical Assessment of Function Annotation, Function (engineering), computer, media_common
Abstract: Background: The recently held Critical Assessment of Function Annotation challenge (CAFA2) required its participants to submit predictions for a large number of target proteins regardless of whether they have previous annotations or not. This is in contrast to the original CAFA challenge in which participants were asked to submit predictions for proteins with no existing annotations. The CAFA2 task is more realistic, in that it more closely mimics the accumulation of annotations over time. In this study we compare these tasks in terms of their difficulty, and determine whether cross-validation provides a good estimate of performance. Results: The CAFA2 task is a combination of two subtasks: making predictions on annotated proteins and making predictions on previously unannotated proteins. In this study we analyze the performance of several function prediction methods in these two scenarios. Our results show that several methods (structured support vector machine, binary support vector machines and guilt-by-association methods) do not usually achieve the same level of accuracy on these two tasks as that achieved by cross-validation, and that predicting novel annotations for previously annotated proteins is a harder problem than predicting annotations for uncharacterized proteins. We also find that different methods have different performance characteristics in these tasks, and that cross-validation is not adequate at estimating performance and ranking methods. Conclusions: These results have implications for the design of computational experiments in the area of automated function prediction and can provide useful insight for the understanding and design of future CAFA competitions. Electronic supplementary material: The online version of this article (doi:10.1186/s13742-015-0082-5) contains supplementary material, which is available to authorized users.
Published: 2015

90. Genotypic predictors of human immunodeficiency virus type 1 drug resistance

Author: Robert W. Shafer, Soo-Yon Rhee, Douglas L. Brutlag, Jonathan Taylor, Gauhar Wadhera, and Asa Ben-Hur
Subjects: Genetics, Multidisciplinary, Genotype, Least-angle regression, Drug resistance, Biology, Regression, Reverse transcriptase, Phenotype, Drug class, Physical Sciences, Drug Resistance, Viral, Mutation, Linear regression, Mutation (genetic algorithm), HIV-1, Reverse Transcriptase Inhibitors
Abstract: Understanding the genetic basis of HIV-1 drug resistance is essential to developing new antiretroviral drugs and optimizing the use of existing drugs. This understanding, however, is hampered by the large numbers of mutation patterns associated with cross-resistance within each antiretroviral drug class. We used five statistical learning methods (decision trees, neural networks, support vector regression, least-squares regression, and least angle regression) to relate HIV-1 protease and reverse transcriptase mutations to in vitro susceptibility to 16 antiretroviral drugs. Learning methods were trained and tested on a public data set of genotype–phenotype correlations by 5-fold cross-validation. For each learning method, four mutation sets were used as input features: a complete set of all mutations in ≥2 sequences in the data set, the 30 most common data set mutations, an expert panel mutation set, and a set of nonpolymorphic treatment-selected mutations from a public database linking protease and reverse transcriptase sequences to antiretroviral drug exposure. The nonpolymorphic treatment-selected mutations led to the best predictions: 80.1% accuracy at classifying sequences as susceptible, low/intermediate resistant, or highly resistant. Least angle regression predicted susceptibility significantly better than other methods when using the complete set of mutations. The three regression methods provided consistent estimates of the quantitative effect of mutations on drug susceptibility, identifying nearly all previously reported genotype–phenotype associations and providing strong statistical support for many new associations. Mutation regression coefficients showed that, within a drug class, cross-resistance patterns differ for different mutation subsets and that cross-resistance has been underestimated.
Published: 2006

91. Large-scale identification of yeast integral membrane protein interactions

Author: John P. Miller, Asa Ben-Hur, William Stafford Noble, Igor Stagljar, Stanley Fields, Russell S. Lo, and Cynthia Desmarais
Subjects: Proteomics, Genetics, Saccharomyces cerevisiae Proteins, Multidisciplinary, biology, Ubiquitin, Low Confidence, Saccharomyces cerevisiae, Membrane Proteins, Computational biology, Biological Sciences, biology.organism_classification, Yeast, Support vector machine, Membrane protein, Protein Interaction Mapping, Identification (biology), Integral membrane protein, Algorithms
Abstract: We carried out a large-scale screen to identify interactions between integral membrane proteins of Saccharomyces cerevisiae by using a modified split-ubiquitin technique. Among 705 proteins annotated as integral membrane, we identified 1,985 putative interactions involving 536 proteins. To ascribe confidence levels to the interactions, we used a support vector machine algorithm to classify interactions based on the assay results and protein data derived from the literature. Previously identified and computationally supported interactions were used to train the support vector machine, which identified 131 interactions of highest confidence, 209 of the next highest confidence, 468 of the next highest, and the remaining 1,085 of low confidence. This study provides numerous putative interactions among a class of proteins that have been difficult to analyze on a high-throughput basis by other approaches. The results identify potential previously undescribed components of established biological processes and roles for integral membrane proteins of ascribed functions.
Published: 2005

92. CREME: Cis-Regulatory Module Explorer for the human genome

Author: Asa Ben-Hur, Ivan Ovcharenko, Gabriela G. Loots, and Roded Sharan
Subjects: Genetics, Internet, Binding Sites, Genome, Human, Computational Biology, Promoter, Articles, Biology, Genome, DNA binding site, User-Computer Interface, Gene Expression Regulation, Transcription (biology), Humans, Human genome, Promoter Regions, Genetic, Gene, Transcription factor, Software, Transcription Factors, Cis-regulatory module
Abstract: The binding of transcription factors to specific regulatory sequence elements is a primary mechanism for controlling gene transcription. Eukaryotic genes are often regulated by several transcription factors whose binding sites are tightly clustered and form cis-regulatory modules. In this paper, we present a web server, CREME, for identifying and visualizing cis-regulatory modules in the promoter regions of a given set of potentially co-regulated genes. CREME relies on a database of putative transcription factor binding sites that have been annotated across the human genome using a library of position weight matrices and evolutionary conservation with the mouse and rat genomes. A search algorithm is applied to this data set to identify combinations of transcription factors whose binding sites tend to co-occur in close proximity in the promoter regions of the input gene set. The identified cis-regulatory modules are statistically scored and significant combinations are reported and graphically visualized. Our web server is available at http://creme.dcode.org.
Published: 2004

93. Computation in gene networks

Author: Hava T. Siegelmann and Asa Ben-Hur
Subjects: Computer science, Computation, Analog computer, Chaotic, Information Storage and Retrieval, General Physics and Astronomy, law.invention, Piecewise linear function, Computers, Molecular, Turing machine, symbols.namesake, law, Control theory, Robustness (computer science), Mathematical Physics, Models, Genetic, Applied Mathematics, Statistical and Nonlinear Physics, Nonlinear system, Metabolism, Gene Expression Regulation, Nonlinear Dynamics, Bounded function, symbols, Algorithm, Algorithms
Abstract: Genetic regulatory networks have the complex task of controlling all aspects of life. Using a model of gene expression by piecewise linear differential equations, we show that this process can be considered as a process of computation. This is demonstrated by showing that this model can simulate memory-bounded Turing machines. The simulation is robust with respect to perturbations of the system, an important property for both analog computers and biological systems. Robustness is achieved using a condition that ensures that the model equations, which are generally chaotic, follow predictable dynamics.
Published: 2004

94. Random matrix theory for the analysis of the performance of an analog computer: a scaling theory

Author: Asa Ben-Hur, Shmuel Fishman, Hava T. Siegelmann, and Joshua Feinberg
Subjects: Physics, Dynamical systems theory, Computation, Cumulative distribution function, Quantum mechanics, Phase space, General Physics and Astronomy, Applied mathematics, Function (mathematics), Fixed point, Dynamical system, Random matrix
Abstract: The phase space flow of a dynamical system, leading to the solution of linear programming (LP) problems, is explored as an example of complexity analysis in an analog computation framework. In this framework, computation by physical devices and natural systems, evolving in continuous phase space and time (in contrast to the digital computer where these are discrete), is explored. A Gaussian ensemble of LP problems is studied. The convergence time of a flow to the fixed point representing the optimal solution, is computed. The cumulative distribution function of the convergence time is calculated in the framework of random matrix theory (RMT) in the asymptotic limit of large problem size. It is found to be a scaling function, of the form obtained in the theories of critical phenomena and Anderson localization. It demonstrates a correspondence between problems of computer science and physics.
Published: 2004

95. Probabilistic analysis of a differential equation for linear programming

Author: Joshua Feinberg, Asa Ben-Hur, Shmuel Fishman, and Hava T. Siegelmann
Subjects: FOS: Computer and information sciences, Statistics and Probability, Control and Optimization, Linear programming, Differential equation, General Mathematics, F.1.3, F.2, FOS: Physical sciences, Fixed point, Computational Complexity (cs.CC), Scaling, Random Matrix Theory, Dynamical systems, FOS: Mathematics, Probabilistic analysis of algorithms, Limit (mathematics), Mathematics - Optimization and Control, Condensed Matter - Statistical Mechanics, Mathematical Physics, Mathematics, Numerical Analysis, Algebra and Number Theory, Statistical Mechanics (cond-mat.stat-mech), Applied Mathematics, Mathematical analysis, Function (mathematics), Mathematical Physics (math-ph), Theory of Analog Computation, Computer Science - Computational Complexity, Rate of convergence, Optimization and Control (math.OC), Random matrix
Abstract: In this paper we address the complexity of solving linear programming problems with a set of differential equations that converge to a fixed point that represents the optimal solution. Assuming a probabilistic model, where the inputs are i.i.d. Gaussian variables, we compute the distribution of the convergence rate to the attracting fixed point. Using the framework of Random Matrix Theory, we derive a simple expression for this distribution in the asymptotic limit of large problem size. In this limit, we find that the distribution of the convergence rate is a scaling function, namely it is a function of one variable that is a combination of three parameters: the number of variables, the number of constraints and the convergence rate, rather than a function of these parameters separately. We also estimate numerically the distribution of computation times, namely the time required to reach a vicinity of the attracting fixed point, and find that it is also a scaling function. Using the problem size dependence of the distribution functions, we derive high probability bounds on the convergence rates and on the computation times.
Comment: 1+37 pages, LaTeX, 5 EPS figures. Version accepted for publication in the Journal of Complexity. Changes made: presentation reorganized for clarity, expanded discussion of measure of complexity in the non-asymptotic regime (added a new section).
Published: 2003
Full Text: View/download PDF

96. A Theory of Complexity for Continuous Time Systems

Author: Shmuel Fishman, Hava T. Siegelmann, and Asa Ben-Hur
Subjects: Statistics and Probability, Discrete mathematics, Numerical Analysis, Polynomial, Control and Optimization, Algebra and Number Theory, Dynamical systems theory, Applied Mathematics, General Mathematics, Model of computation, MathematicsofComputing_NUMERICALANALYSIS, Ode, dynamical systems, Structural complexity theory, Discrete time and continuous time, Flow (mathematics), ComputingMethodologies_SYMBOLICANDALGEBRAICMANIPULATION, Applied mathematics, Time complexity, theory of analog computation, Mathematics
Abstract: We present a model of computation with ordinary differential equations (ODEs) which converge to attractors that are interpreted as the output of a computation. We introduce a measure of complexity for exponentially convergent ODEs, enabling an algorithmic analysis of continuous time flows and their comparison with discrete algorithms. We define polynomial and logarithmic continuous time complexity classes and show that an ODE which solves the maximum network flow problem has polynomial time complexity. We also analyze a simple flow that solves the Maximum problem in logarithmic time. We conjecture that a subclass of the continuous P is equivalent to the classical P.
Published: 2002

97. Amino acid composition predicts prion activity

Author: Asa Ben-Hur, Eric D. Ross, and Fayyaz ul Amir Afsar Minhas
Subjects: 0301 basic medicine, Proteomes, Glutamine, Yeast and Fungal Models, Biochemistry, Prion Diseases, Machine Learning, Database and Informatics Methods, Yeasts, Zoonoses, Medicine and Health Sciences, QD, Asparagine, Computational analysis, lcsh:QH301-705.5, Peptide sequence, Mathematics, Ecology, Infectious Diseases, Experimental Organism Systems, Computational Theory and Mathematics, Amino acid composition, Modeling and Simulation, Saccharomyces Cerevisiae, Primary sequence, Sequence Analysis, Research Article, Amyloid, Computer and Information Sciences, Prions, Bioinformatics, Computational biology, Research and Analysis Methods, QA76, Saccharomyces, 03 medical and health sciences, Cellular and Molecular Neuroscience, Model Organisms, Protein Domains, Amino Acid Sequence Analysis, Artificial Intelligence, Support Vector Machines, Genetics, Code (cryptography), Humans, Amino Acid Sequence, Molecular Biology, Ecology, Evolution, Behavior and Systematics, Sequence (medicine), Organisms, Fungi, Computational Biology, Biology and Life Sciences, Proteins, Composition (combinatorics), QP, Yeast, 030104 developmental biology, lcsh:Biology (General)
Abstract: Many prion-forming proteins contain glutamine/asparagine (Q/N) rich domains, and there are conflicting opinions as to the role of primary sequence in their conversion to the prion form: is this phenomenon driven primarily by amino acid composition, or, as a recent computational analysis suggested, dependent on the presence of short sequence elements with high amyloid-forming potential? The argument for the importance of short sequence elements hinged on the relatively high accuracy obtained using a method that utilizes a collection of length-six sequence elements with known amyloid-forming potential. We weigh in on this question and demonstrate that when those sequence elements are permuted, even higher accuracy is obtained; we also propose a novel multiple-instance machine learning method that uses sequence composition alone, and achieves better accuracy than all existing prion prediction approaches. While we expect there to be elements of primary sequence that affect the process, our experiments suggest that sequence composition alone is sufficient for predicting protein sequences that are likely to form prions. A web-server for the proposed method is available at http://faculty.pieas.edu.pk/fayyaz/prank.html, and the code for reproducing our experiments is available at http://doi.org/10.5281/zenodo.167136.
Author summary: The determinants of prion formation in proteins that are rich in glutamine and asparagine are still under debate: is the process driven by primary sequence or by amino acid composition? In 2015 Sabate et al. published a paper suggesting that the process is triggered by short amyloid-prone sequences. Their argument was based on the success of their pWALTZ classifier, which uses a database of short peptides with known amyloid-forming propensities. To explore the validity of their argument we compared their original scoring matrices with shuffled scoring matrices, and found no decrease in accuracy, suggesting that the success of pWALTZ is the result of the ability of the scoring matrices to capture amino acid composition. Furthermore, we propose a novel machine learning approach with accuracy that is superior to all published prion prediction methods that are currently available, and uses sequence composition alone.
Published: 2017

98. Computational Complexity for Continuous Time Dynamics

Author: Shmuel Fishman, Asa Ben-Hur, and Hava T. Siegelmann
Subjects: Polynomial, Turing machine, symbols.namesake, Discrete time and continuous time, Computational complexity theory, Neuromorphic engineering, Dynamical systems theory, Computer science, Computation, symbols, General Physics and Astronomy, Algorithm, Quantum computer
Abstract: Dissipative flows model a large variety of physical systems. In this Letter the evolution of such systems is interpreted as a process of computation; the attractor of the dynamics represents the output. A framework for an algorithmic analysis of dissipative flows is presented, enabling the comparison of the performance of discrete and continuous time analog computation models. A simple algorithm for finding the maximum of n numbers is analyzed, and shown to be highly efficient. The notion of tractable (polynomial) computation in the Turing model is conjectured to correspond to computation with tractable (analytically solvable) dynamical systems having polynomial complexity. The computation of a digital computer, and its mathematical abstraction, the Turing machine is described by a map on a discrete configuration space. In recent years scientists have developed new approaches to computation, some of them based on continuous time analog systems. The most promising are neuromorphic systems [1], models of human memory [2], and experimentally realizable quantum computers [3]. Although continuous time systems are widespread in experimental realizations, no theory exists for their algorithmic analysis. The standard theory of computation and computational complexity [4] deals with computation in discrete time and in a discrete configuration space, and is inadequate for the description of such systems. This Letter describes an attempt to fill this gap. Our model of a computer is based on dissipa
Published: 1999

99. SpliceGrapherXT

Author: Mark F. Rogers, Asa Ben-Hur, and Christina Boucher
Subjects: Computer science, Simulated data, Web page, Leverage (statistics), splice, RNA-Seq, Data mining, computer.software_genre, Precision and recall, computer, Graph
Abstract: Predicting the structure of genes from RNA-Seq data remains a significant challenge in bioinformatics. Although the amount of data available for analysis is growing at an accelerating rate, the capability to leverage these data to construct complete gene models remains elusive. In addition, the tools that predict novel transcripts exhibit poor accuracy. We present a novel approach to predicting splice graphs from RNA-Seq data that uses patterns of acceptor and donor sites to recognize when novel exons can be predicted unequivocally. This simple approach achieves much higher precision and higher recall than methods like Cufflinks or IsoLasso when predicting novel exons from real and simulated data. The ambiguities that arise from RNA-Seq data can preclude making decisive predictions, so we use a realignment procedure that can predict additional novel exons while maintaining high precision. We show that these accurate splice graph predictions provide a suitable basis for making accurate transcript predictions using tools such as IsoLasso and PSGInfer. Using both real and simulated data, we show that this integrated method predicts transcripts with higher recall and precision than using these other tools alone, and in comparison to Cufflinks. SpliceGrapherXT is available from the SpliceGrapher web page at http://SpliceGrapher.sf.net.
Published: 2013

100. PAIRpred: partner-specific prediction of interacting residues from sequence and structure

Author: Fayyaz ul Amir Afsar Minhas, Brian J. Geiss, and Asa Ben-Hur
Subjects: Models, Molecular, Binding Sites, Support Vector Machine, Protein Conformation, Sequence Analysis, Protein, Computational Biology, Humans, Proteins, Software, Article, Protein Binding
Abstract: We present a novel partner-specific protein-protein interaction site prediction method called PAIRpred. Unlike most existing machine learning binding site prediction methods, PAIRpred uses information from both proteins in a protein complex to predict pairs of interacting residues from the two proteins. PAIRpred captures sequence and structure information about residue pairs through pairwise kernels that are used for training a support vector machine classifier. As a result, PAIRpred presents a more detailed model of protein binding, and offers state-of-the-art accuracy in predicting binding sites at the protein level as well as inter-protein residue contacts at the complex level. We demonstrate PAIRpred's performance on Docking Benchmark 4.0 and recent CAPRI targets. We present a detailed performance analysis outlining the contribution of different sequence and structure features, together with a comparison to a variety of existing interface prediction techniques. We have also studied the impact of binding-associated conformational change on prediction accuracy and found PAIRpred to be more robust to such structural changes than existing schemes. As an illustration of potential applications of PAIRpred, we provide a case study in which PAIRpred is used to analyze the nature and specificity of the interface in the interaction of human ISG15 protein with NS1 protein from influenza A virus. Python code for PAIRpred is available at: http://combi.cs.colostate.edu/supplements/pairpred/.
Published: 2013
