10 results on '"Duncan Penfold-Brown"'
Search Results
2. Parametric Bayesian priors and better choice of negative examples improve protein function prediction
- Author
Kevin Drew, Noah Youngs, Richard Bonneau, Dennis Shasha, and Duncan Penfold-Brown
- Subjects
Statistics and Probability; Proteome; Computer science; Gene regulatory network; Machine learning; computer.software_genre; Biochemistry; Genome; Mice; Artificial Intelligence; Yeasts; Protein Interaction Mapping; Animals; Gene Regulatory Networks; Protein function prediction; Molecular Biology; Parametric statistics; Protein function; business.industry; Proteins; Bayes Theorem; Molecular Sequence Annotation; Function (mathematics); Original Papers; Yeast; Computer Science Applications; Computational Mathematics; ComputingMethodologies_PATTERNRECOGNITION; Computational Theory and Mathematics; Key (cryptography); Data mining; Artificial intelligence; Heuristics; business; computer; Algorithms
- Abstract
Motivation: Computational biologists have demonstrated the utility of using machine learning methods to predict protein function from an integration of multiple genome-wide data types. Yet, even the best performing function prediction algorithms rely on heuristics for important components of the algorithm, such as choosing negative examples (proteins without a given function) or determining key parameters. The improper choice of negative examples, in particular, can hamper the accuracy of protein function prediction. Results: We present a novel approach for choosing negative examples, using a parameterizable Bayesian prior computed from all observed annotation data, which also generates priors used during function prediction. We incorporate this new method into the GeneMANIA function prediction algorithm and demonstrate improved accuracy of our algorithm over current top-performing function prediction methods on the yeast and mouse proteomes across all metrics tested. Availability: Code and Data are available at: http://bonneaulab.bio.nyu.edu/funcprop.html Contact: shasha@courant.nyu.edu or bonneau@cs.nyu.edu Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2013
3. The mRNA-Bound Proteome and Its Global Occupancy Profile on Protein-Coding Transcripts
- Author
Kevin Drew, Noah Youngs, Markus Schueler, Markus Landthaler, Alexandra Vasile, Matthias Selbach, Mathias Munschauer, Emanuel Wyler, Alexander G. Baltz, Björn Schwanhäusser, Christoph Dieterich, Duncan Penfold-Brown, Miha Milek, Richard Bonneau, and Yasuhiro Murakawa
- Subjects
Genetics; Proteomics; Messenger RNA; Binding Sites; Sequence analysis; Sequence Analysis, RNA; Quantitative proteomics; RNA; RNA-Binding Proteins; Computational biology; Cell Biology; Biology; Mass Spectrometry; Cell Line; RNA splicing; Proteome; Humans; RNA, Messenger; Binding site; ICLIP; Molecular Biology
- Abstract
Protein-RNA interactions are fundamental to core biological processes, such as mRNA splicing, localization, degradation, and translation. We developed a photoreactive nucleotide-enhanced UV crosslinking and oligo(dT) purification approach to identify the mRNA-bound proteome using quantitative proteomics and to display the protein occupancy on mRNA transcripts by next-generation sequencing. Application to a human embryonic kidney cell line identified close to 800 proteins. To our knowledge, nearly one-third were not previously annotated as RNA binding, and about 15% were not predictable by computational methods to interact with RNA. Protein occupancy profiling provides a transcriptome-wide catalog of potential cis-regulatory regions on mammalian mRNAs and showed that large stretches in 3' UTRs can be contacted by the mRNA-bound proteome, with numerous putative binding sites in regions harboring disease-associated nucleotide polymorphisms. Our observations indicate the presence of a large number of mRNA binders with diverse molecular functions participating in combinatorial posttranscriptional gene-expression networks.
- Published
- 2012
4. Big Data, Social Media, and Protest: Foundations for a Research Agenda
- Author
Richard Bonneau, Joshua A. Tucker, Megan MacDuffee Metzger, Pablo Barberá, Duncan Penfold-Brown, and Jonathan Nagler
- Subjects
Social psychology (sociology); business.industry; 05 social sciences; Big data; 050801 communication & media studies; Public relations; 0506 political science; Slacktivism; Geolocation; 0508 media and communications; Political science; 050602 political science & public administration; Survey data collection; Social media; business
- Published
- 2016
5. TEXT CLASSIFICATION FOR AUTOMATIC DETECTION OF E-CIGARETTE USE AND USE FOR SMOKING CESSATION FROM TWITTER: A FEASIBILITY PILOT
- Author
Richard Bonneau, Yin Aphinyanaphongs, Paul Krebs, Armine Lulejian, and Duncan Penfold Brown
- Subjects
Support Vector Machine; 020205 medical informatics; Computer science; medicine.medical_treatment; MEDLINE; Pilot Projects; 02 engineering and technology; Cigarette use; Electronic Nicotine Delivery Systems; Usage data; Article; 03 medical and health sciences; Bayes' theorem; 0302 clinical medicine; 0202 electrical engineering, electronic engineering, information engineering; medicine; Humans; Social media; 030212 general & internal medicine; Computational Biology; Bayes Theorem; Data science; Support vector machine; Logistic Models; Smoking cessation; Feasibility Studies; Smoking Cessation; Classifier (UML); Social Media; Algorithms
- Abstract
Rapid increases in e-cigarette use and potential exposure to harmful byproducts have shifted public health focus to e-cigarettes as a possible drug of abuse. Effective surveillance of use and prevalence would allow appropriate regulatory responses. An ideal surveillance system would collect usage data in real time, focus on populations of interest, include populations unable to take the survey, allow a breadth of questions to answer, and enable geo-location analysis. Social media streams may provide this ideal system. To realize this use case, a foundational question is whether we can detect e-cigarette use at all. This work reports two pilot tasks using text classification to identify automatically Tweets that indicate e-cigarette use and/or e-cigarette use for smoking cessation. We build and define both datasets and compare performance of 4 state of the art classifiers and a keyword search for each task. Our results demonstrate excellent classifier performance of up to 0.90 and 0.94 area under the curve in each category. These promising initial results form the foundation for further studies to realize the ideal surveillance solution.
- Published
- 2016
6. An expanded evaluation of protein function prediction methods shows an improvement in accuracy
- Author
Yuxiang Jiang, Tal Ronnen Oron, Wyatt T. Clark, Asma R. Bankapur, Daniel D’Andrea, Rosalba Lepore, Christopher S. Funk, Indika Kahanda, Karin M. Verspoor, Asa Ben-Hur, Da Chen Emily Koo, Duncan Penfold-Brown, Dennis Shasha, Noah Youngs, Richard Bonneau, Alexandra Lin, Sayed M. E. Sahraeian, Pier Luigi Martelli, Giuseppe Profiti, Rita Casadio, Renzhi Cao, Zhaolong Zhong, Jianlin Cheng, Adrian Altenhoff, Nives Skunca, Christophe Dessimoz, Tunca Dogan, Kai Hakala, Suwisa Kaewphan, Farrokh Mehryary, Tapio Salakoski, Filip Ginter, Hai Fang, Ben Smithers, Matt Oates, Julian Gough, Petri Törönen, Patrik Koskinen, Liisa Holm, Ching-Tai Chen, Wen-Lian Hsu, Kevin Bryson, Domenico Cozzetto, Federico Minneci, David T. Jones, Samuel Chapman, Dukka BKC, Ishita K. Khan, Daisuke Kihara, Dan Ofer, Nadav Rappoport, Amos Stern, Elena Cibrian-Uhalte, Paul Denny, Rebecca E. Foulger, Reija Hieta, Duncan Legge, Ruth C. Lovering, Michele Magrane, Anna N. Melidoni, Prudence Mutowo-Meullenet, Klemens Pichler, Aleksandra Shypitsyna, Biao Li, Pooya Zakeri, Sarah ElShal, Léon-Charles Tranchevent, Sayoni Das, Natalie L. Dawson, David Lee, Jonathan G. Lees, Ian Sillitoe, Prajwal Bhat, Tamás Nepusz, Alfonso E. Romero, Rajkumar Sasidharan, Haixuan Yang, Alberto Paccanaro, Jesse Gillis, Adriana E. Sedeño-Cortés, Paul Pavlidis, Shou Feng, Juan M. Cejuela, Tatyana Goldberg, Tobias Hamp, Lothar Richter, Asaf Salamov, Toni Gabaldon, Marina Marcet-Houben, Fran Supek, Qingtian Gong, Wei Ning, Yuanpeng Zhou, Weidong Tian, Marco Falda, Paolo Fontana, Enrico Lavezzo, Stefano Toppo, Carlo Ferrari, Manuel Giollo, Damiano Piovesan, Silvio C.E. Tosatto, Angela del Pozo, José M. Fernández, Paolo Maietta, Alfonso Valencia, Michael L. Tress, Alfredo Benso, Stefano Di Carlo, Gianfranco Politano, Alessandro Savino, Hafeez Ur Rehman, Matteo Re, Marco Mesiti, Giorgio Valentini, Joachim W. Bargsten, Aalt D. J. van Dijk, Branislava Gemovic, Sanja Glisic, Vladmir Perovic, Veljko Veljkovic, Nevena Veljkovic, Danillo C. Almeida-e-Silva, Ricardo Z. N. Vencio, Malvika Sharan, Jörg Vogel, Lakesh Kansakar, Shanshan Zhang, Slobodan Vucetic, Zheng Wang, Michael J. E. Sternberg, Mark N. Wass, Rachael P. Huntley, Maria J. Martin, Claire O’Donovan, Peter N. Robinson, Yves Moreau, Anna Tramontano, Patricia C. Babbitt, Steven E. Brenner, Michal Linial, Christine A. Orengo, Burkhard Rost, Casey S. Greene, Sean D. Mooney, Iddo Friedberg, and Predrag Radivojac
- Subjects
0301 basic medicine; Computer science; Disease gene prioritization; Protein function prediction; Ecology, Evolution, Behavior and Systematics; Genetics; Cell Biology; 05 Environmental Sciences; 600 Technik, Medizin, angewandte Wissenschaften::610 Medizin und Gesundheit; computer.software_genre; Quantitative Biology - Quantitative Methods; Wiskundige en Statistische Methoden - Biometris; Field (computer science); Laboratorium voor Plantenveredeling; Function (engineering); Databases, Protein; 1183 Plant biology, microbiology, virology; Quantitative Methods (q-bio.QM); media_common; Genetics & Heredity; Settore BIO/11 - BIOLOGIA MOLECOLARE; Ecology; SISTA; 1184 Genetics, developmental biology, physiology; Life Sciences & Biomedicine; Algorithms; Bioinformatics; Evolution; media_common.quotation_subject; BIOINFORMÁTICA; Machine learning; Bottleneck; Set (abstract data type); BIOS Applied Bioinformatics; 03 medical and health sciences; Annotation; Structure-Activity Relationship; Behavior and Systematics; Human Phenotype Ontology; Humans; ddc:610; DISINTEGRIN; Mathematical and Statistical Methods - Biometris; BIOINFORMATICS; 08 Information And Computing Sciences; Science & Technology; business.industry; Research; ADAM; Proteins; Computational Biology; Molecular Sequence Annotation; 06 Biological Sciences; Data set; ONTOLOGY; Plant Breeding; 030104 developmental biology; Gene Ontology; Biotechnology & Applied Microbiology; FOS: Biological sciences; Artificial intelligence; business; computer; Software
- Abstract
BACKGROUND: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. RESULTS: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. CONCLUSIONS: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.
We acknowledge the contributions of Maximilian Hecht, Alexander Grün, Julia Krumhoff, My Nguyen Ly, Jonathan Boidol, Rene Schoeffel, Yann Spöri, Jessika Binder, Christoph Hamm and Karolina Worf.
This work was partially supported by the following grants: National Science Foundation grants DBI-1458477 (PR), DBI-1458443 (SDM), DBI-1458390 (CSG), DBI-1458359 (IF), IIS-1319551 (DK), DBI-1262189 (DK), and DBI-1149224 (JC); National Institutes of Health grants R01GM093123 (JC), R01GM097528 (DK), R01GM076990 (PP), R01GM071749 (SEB), R01LM009722 (SDM), and UL1TR000423 (SDM); the National Natural Science Foundation of China grants 3147124 (WT) and 91231116 (WT); the National Basic Research Program of China grant 2012CB316505 (WT); NSERC grant RGPIN 371348-11 (PP); FP7 infrastructure project TransPLANT Award 283496 (ADJvD); Microsoft Research/FAPESP grant 2009/53161-6 and FAPESP fellowship 2010/50491-1 (DCAeS); Biotechnology and Biological Sciences Research Council grants BB/L020505/1 (DTJ), BB/F020481/1 (MJES), BB/K004131/1 (AP), BB/F00964X/1 (AP), and BB/L018241/1 (CD); the Spanish Ministry of Economics and Competitiveness grant BIO2012-40205 (MT); KU Leuven CoE PFV/10/016 SymBioSys (YM); the Newton International Fellowship Scheme of the Royal Society grant NF080750 (TN). CSG was supported in part by the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative grant GBMF4552. Computational resources were provided by CSC – IT Center for Science Ltd., Espoo, Finland (TS). This work was supported by the Academy of Finland (TS). RCL and ANM were supported by British Heart Foundation grant RG/13/5/30112. PD, RCL, and REF were supported by Parkinson’s UK grant G-1307, the Alexander von Humboldt Foundation through the German Federal Ministry for Education and Research, Ernst Ludwig Ehrlich Studienwerk, and the Ministry of Education, Science and Technological Development of the Republic of Serbia grant 173001. 
This work was a Technology Development effort for ENIGMA – Ecosystems and Networks Integrated with Genes and Molecular Assemblies (http://enigma.lbl.gov), a Scientific Focus Area Program at Lawrence Berkeley National Laboratory, which is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Biological & Environmental Research grant DE-AC02-05CH11231. ENIGMA only covers the application of this work to microbial proteins. NSF DBI-0965616 and Australian Research Council grant DP150101550 (KMV). NSF DBI-0965768 (ABH). NIH T15 LM00945102 (training grant for CSF). FP7 FET grant MAESTRA ICT-2013-612944 and FP7 REGPOT grant InnoMol (FS). NIH R01 GM60595 (PCB). University of Padova grants CPDA138081/13 (ST) and GRIC13AAI9 (EL). Swiss National Science Foundation grant 150654 and UK BBSRC grant BB/M015009/1 (COD). PRB2 IPT13/0001 - ISCIII-SGEFI / FEDER (JMF).
This is the final version of the article. It first appeared from BioMed Central at http://dx.doi.org/10.1186/s13059-016-1037-6.
- Published
- 2016
7. Negative example selection for protein function prediction: the NoGO database
- Author
Richard Bonneau, Noah Youngs, Duncan Penfold-Brown, and Dennis Shasha
- Subjects
Topic model; Computer and Information Sciences; Saccharomyces cerevisiae Proteins; Proteome; Computer science; computer.software_genre; Cellular and Molecular Neuroscience; Annotation; Mice; Artificial Intelligence; Databases, Genetic; Genetics; Animals; Humans; Protein function prediction; Molecular Biology; lcsh:QH301-705.5; Ecology, Evolution, Behavior and Systematics; Genome; Ecology; Database; Arabidopsis Proteins; Applied Mathematics; Conditional probability; Computational Biology; Proteins; Biology and Life Sciences; Molecular Sequence Annotation; Gene Annotation; Gene Ontology; Computational Theory and Mathematics; lcsh:Biology (General); Modeling and Simulation; Physical Sciences; Data mining; Heuristics; Open-world assumption; computer; Algorithms; Mathematics; Research Article
- Abstract
Negative examples – genes that are known not to carry out a given protein function – are rarely recorded in genome and proteome annotation databases, such as the Gene Ontology database. Negative examples are required, however, for several of the most powerful machine learning methods for integrative protein function prediction. Most protein function prediction efforts have relied on a variety of heuristics for the choice of negative examples. Determining the accuracy of methods for negative example prediction is itself a non-trivial task, given that the Open World Assumption as applied to gene annotations rules out many traditional validation metrics. We present a rigorous comparison of these heuristics, utilizing a temporal holdout, and a novel evaluation strategy for negative examples. We add to this comparison several algorithms adapted from Positive-Unlabeled learning scenarios in text-classification, which are the current state of the art methods for generating negative examples in low-density annotation contexts. Lastly, we present two novel algorithms of our own construction, one based on empirical conditional probability, and the other using topic modeling applied to genes and annotations. We demonstrate that our algorithms achieve significantly fewer incorrect negative example predictions than the current state of the art, using multiple benchmarks covering multiple organisms. Our methods may be applied to generate negative examples for any type of method that deals with protein function, and to this end we provide a database of negative examples in several well-studied organisms, for general use (The NoGO database, available at: bonneaulab.bio.nyu.edu/nogo.html).
Author Summary: Many machine learning methods have been applied to the task of predicting the biological function of proteins based on a variety of available data.
The majority of these methods require negative examples: proteins that are known not to perform a function, in order to achieve meaningful predictions, but negative examples are often not available. In addition, past heuristic methods for negative example selection suffer from a high error rate. Here, we rigorously compare two novel algorithms against past heuristics, as well as some algorithms adapted from a similar task in text-classification. Through this comparison, performed on several different benchmarks, we demonstrate that our algorithms make significantly fewer mistakes when predicting negative examples. We also provide a database of negative examples for general use in machine learning for protein function prediction (The NoGO database, available at: bonneaulab.bio.nyu.edu/nogo.html).
- Published
- 2014
8. The plant proteome folding project: structure and positive selection in plant protein families
- Author
Kevin Drew, Apurva Narechania, Michael D. Purugganan, Patrick Winters, Rob DeSalle, Duncan Penfold-Brown, Richard Bonneau, and Melissa M. Pentony
- Subjects
Protein Folding; Protein family; Proteome; Protein domain; plant evolution; adaptation; Biology; Evolution, Molecular; 03 medical and health sciences; 0302 clinical medicine; Protein structure; Genetics; Protein function prediction; fold prediction; Selection, Genetic; protein structure; Ecology, Evolution, Behavior and Systematics; Research Articles; 030304 developmental biology; Plant Proteins; 2. Zero hunger; 0303 health sciences; Structural Classification of Proteins database; Protein structure prediction; Plant protein; Threading (protein sequence); 030217 neurology & neurosurgery
- Abstract
Despite its importance, relatively little is known about the relationship between the structure, function, and evolution of proteins, particularly in land plant species. We have developed a database with predicted protein domains for five plant proteomes (http://pfp.bio.nyu.edu) and used both protein structural fold recognition and de novo Rosetta-based protein structure prediction to predict protein structure for Arabidopsis and rice proteins. Based on sequence similarity, we have identified ~15,000 orthologous/paralogous protein family clusters among these species and used codon-based models to predict positive selection in protein evolution within 175 of these sequence clusters. Our results show that codons that display positive selection appear to be less frequent in helical and strand regions and are overrepresented in amino acid residues that are associated with a change in protein secondary structure. Like in other organisms, disordered protein regions also appear to have more selected sites. Structural information provides new functional insights into specific plant proteins and allows us to map positively selected amino acid sites onto protein structures and view these sites in a structural and functional context.
- Published
- 2012
9. CANFAR: the Canadian Advanced Network for Astronomical Research
- Author
E. L. Chapin, Sharon Goliath, Brian Major, Michael Peddle, Nicholas M. Ball, Adrian Damian, John Ouellette, Chris Pritchet, Jeff Burke, Norman Hill, P Armstrong, Brian Chapel, Alinga Yeung, Ian Gable, Michael Paterson, Yuehai Zhang, Duncan Penfold-Brown, Pat Dowler, David Woods, Séverin Gaudet, Isabella Ghiurea, Dustin Jenkins, Stephen Gwyn, Sebastien Fabbro, D. Schade, J. J. Kavelaars, and Randall Sobie
- Subjects
Computer science; business.industry; Provisioning; Cloud computing; Virtual observatory; computer.software_genre; Virtualization; World Wide Web; Cyberinfrastructure; Data center; Web service; business; computer; Cloud storage
- Abstract
The Canadian Advanced Network For Astronomical Research (CANFAR) is a 2 1/2-year project that is delivering a network-enabled platform for the accessing, processing, storage, analysis, and distribution of very large astronomical datasets. The CANFAR infrastructure is being implemented as an International Virtual Observatory Alliance (IVOA) compliant web service infrastructure. A challenging feature of the project is to channel all survey data through Canadian research cyberinfrastructure. Sitting behind the portal service, the internal architecture makes use of high-speed networking, cloud computing, cloud storage, meta-scheduling, provisioning and virtualisation. This paper describes the high-level architecture and the current state of the project.
SPIE 7740, Software and Cyberinfrastructure for Astronomy, June 27th 2010, San Diego CA, USA. Series: Proceedings of SPIE; no. 7740
- Published
- 2010
10. Research computing in a distributed cloud environment
- Author
Chris Pritchet, P Armstrong, S Goliath, S. Gaudet, K Fransham, Duncan Penfold-Brown, Randall Sobie, W Podaima, Ron Desmarais, Michael Paterson, Roger Impey, Colin Leavett-Brown, A Agarwal, Andre Charbonneau, A. Bishop, N. Hill, D. Schade, Ian Gable, and J. Ouellete
- Subjects
History; Cloud resources; Computer science; business.industry; Distributed computing; Cloud computing; computer.software_genre; Computer Science Applications; Education; Early results; Software deployment; Virtual machine; Order (business); Cloud testing; Batch processing; business; computer
- Abstract
The recent increase in availability of Infrastructure-as-a-Service (IaaS) computing clouds provides a new way for researchers to run complex scientific applications. However, using cloud resources for a large number of research jobs requires significant effort and expertise. Furthermore, running jobs on many different clouds presents even more difficulty. In order to make it easy for researchers to deploy scientific applications across many cloud resources, we have developed a virtual machine resource manager (Cloud Scheduler) for distributed compute clouds. In response to a user's job submission to a batch system, the Cloud Scheduler manages the distribution and deployment of user-customized virtual machines across multiple clouds. We describe the motivation for and implementation of a distributed cloud using the Cloud Scheduler that is spread across both commercial and dedicated private sites, and present some early results of scientific data analysis using the system.
- Published
- 2010