13 results on '"Evan M. Cofer"'
Search Results
2. A systematic machine learning and data type comparison yields metagenomic predictors of infant age, sex, breastfeeding, antibiotic usage, country of origin, and delivery type.
- Author
Alan Le Goallec, Braden T Tierney, Jacob M Luber, Evan M Cofer, Aleksandar D Kostic, and Chirag J Patel
- Subjects
Biology (General); QH301-705.5
- Abstract
The microbiome is a new frontier for building predictors of human phenotypes. However, machine learning in the microbiome is fraught with issues of reproducibility, driven in large part by the wide range of analytic models and metagenomic data types available. We aimed to build robust metagenomic predictors of host phenotype by comparing prediction performances and biological interpretation across 8 machine learning methods and 4 different types of metagenomic data. Using 1,570 samples from 300 infants, we fit 7,865 models for 6 host phenotypes. We demonstrate the dependence of accuracy on algorithm choice and feature definition in microbiome data and propose a framework for building microbiome-derived indicators of host phenotype. We additionally identify biological features predictive of age, sex, breastfeeding status, historical antibiotic usage, country of origin, and delivery type. Our complete results can be viewed at http://apps.chiragjpgroup.org/ubiome_predictions/.
- Published
- 2020
3. Saturating the eQTL map in Drosophila melanogaster: genome-wide patterns of cis and trans regulation of transcriptional variation in outbred populations
- Author
Luisa F. Pallares, Diogo Melo, Scott Wolf, Evan M. Cofer, Varada Abhyankar, Julie Peng, and Julien F. Ayroles
- Abstract
Decades of genome-wide mapping have shown that most genetic polymorphisms associated with complex traits are found in non-coding regions of the genome. Characterizing the effect of such genetic variation presents a formidable challenge, and eQTL mapping has been a key approach to understand the non-coding genome. However, comprehensive eQTL maps are available only for a few species like yeast and humans. With the aim of understanding the genetic landscape that regulates transcriptional variation in Drosophila melanogaster, we developed an outbred mapping panel in this species, the Drosophila Outbred Synthetic Panel (Dros-OSP). Using this community resource, we collected transcriptomic and genomic data for 1800 individual flies and were able to map cis and trans eQTLs for 98% of the genes expressed in D. melanogaster, increasing by thousands the number of genes for which regulatory loci are known in this species. We described, for the first time in the context of an outbred population, the properties of local and distal regulation of gene expression in terms of genetic diversity, heritability, connectivity, and pleiotropy. We uncovered that, contrary to long-standing assumptions, a significant part of gene co-expression networks is organized in a non-modular fashion. These results bring the fruit fly to the level of understanding that was only available for a few other organisms, and offer a new mapping resource that will expand the possibilities currently available to the Drosophila community. This data is available at Drosophilaeqtl.org.
- Published
- 2023
4. Selene: a PyTorch-based deep learning library for sequence data
- Author
Olga G. Troyanskaya, Kathleen M. Chen, Jian Zhou, and Evan M. Cofer
- Subjects
Normal Distribution; Machine learning; computer.software_genre; Biochemistry; Article; 03 medical and health sciences; Deep Learning; Data sequences; Software; Alzheimer Disease; Area under curve; Humans; Architecture; Molecular Biology; Gene Library; 030304 developmental biology; 0303 health sciences; Models, Statistical; Artificial neural network; business.industry; Extramural; Deep learning; Computational Biology; Genomics; Sequence Analysis, DNA; Cell Biology; Mutagenesis; Area Under Curve; Mutation; Programming Languages; Neural Networks, Computer; Artificial intelligence; business; computer; Algorithms; Biotechnology
- Abstract
To enable the application of deep learning in biology, we present Selene (https://selene.flatironinstitute.org/), a PyTorch-based deep learning library for fast and easy development, training, and application of deep learning model architectures for any biological sequences. We demonstrate how Selene allows researchers to easily train a published architecture on new data, develop and evaluate a new architecture, and use a trained model to answer biological questions of interest. Reporting Summary: Further information on research design is available in the Life Sciences Reporting Summary linked to this article.
- Published
- 2019
5. Ten Quick Tips for Deep Learning in Biology
- Author
Benjamin D. Lee, Anthony Gitter, Casey S. Greene, Sebastian Raschka, Finlay Maguire, Alexander J. Titus, Michael D. Kessler, Alexandra J. Lee, Marc G. Chevrette, Paul Allen Stewart, Thiago Britto-Borges, Evan M. Cofer, Kun-Hsing Yu, Juan Jose Carmona, Elana J. Fertig, Alexandr A. Kalinin, Brandon Signal, Benjamin J. Lengerich, Timothy J. Triche, and Simina M. Boca
- Subjects
FOS: Computer and information sciences; Computer Science - Machine Learning; Ecology; Computational Biology; Other Quantitative Biology (q-bio.OT); Quantitative Biology - Other Quantitative Biology; Machine Learning (cs.LG); Cellular and Molecular Neuroscience; Deep Learning; Computational Theory and Mathematics; FOS: Biological sciences; Modeling and Simulation; Genetics; Molecular Biology; Ecology, Evolution, Behavior and Systematics
- Abstract
Machine learning is a modern approach to problem-solving and task automation. In particular, machine learning is concerned with the development and applications of algorithms that can recognize patterns in data and use them for predictive modeling. Artificial neural networks are a particular class of machine learning algorithms and models that evolved into what is now described as deep learning. Given the computational advances made in the last decade, deep learning can now be applied to massive data sets and in innumerable contexts. Therefore, deep learning has become its own subfield of machine learning. In the context of biological research, it has been increasingly used to derive novel insights from high-dimensional biological data. To make the biological applications of deep learning more accessible to scientists who have some experience with machine learning, we solicited input from a community of researchers with varied biological and deep learning interests. These individuals collaboratively contributed to this manuscript's writing using the GitHub version control platform and the Manubot manuscript generation toolset. The goal was to articulate a practical, accessible, and concise set of guidelines and suggestions to follow when using deep learning. In the course of our discussions, several themes became clear: the importance of understanding and applying machine learning fundamentals as a baseline for utilizing deep learning, the necessity for extensive model comparisons with careful evaluation, and the need for critical thought in interpreting results generated by deep learning, among others. 23 pages, 2 figures
- Published
- 2021
6. AMBIENT: Accelerated Convolutional Neural Network Architecture Search for Regulatory Genomics
- Author
Zijun Zhang, Evan M. Cofer, and Olga G. Troyanskaya
- Subjects
Computer science; business.industry; Deep learning; Genomics; Artificial intelligence; Architecture; business; Machine learning; computer.software_genre; Convolutional neural network; computer
- Abstract
Convolutional neural networks (CNN) have become a standard approach for modeling genomic sequences. CNNs can be effectively built by Neural Architecture Search (NAS) by trading computing power for accurate neural architectures. Yet, the consumption of immense computing power is a major practical, financial, and environmental issue for deep learning. Here, we present a novel NAS framework, AMBIENT, that generates highly accurate CNN architectures for biological sequences of diverse functions, while substantially reducing the computing cost of conventional NAS.
- Published
- 2021
7. Translating genetic risk variants in disease‐associated enhancers into novel mouse models of Alzheimer’s disease
- Author
Gregory A. Cary, Gregory W. Carter, Olga G. Troyanskaya, Lara M. Mangravite, Xi Chen, Ben Logsdon, Chandra L. Theesfeld, Kevin P. Kotredes, Michael Sasner, Evan M. Cofer, Dylan Garceau, Ravi S. Pandey, Christoph Preuss, Asli Uyar, Kathleen M. Chen, and Gareth R. Howell
- Subjects
Psychiatry and Mental health; Cellular and Molecular Neuroscience; Developmental Neuroscience; Epidemiology; Health Policy; Neurology (clinical); Computational biology; Disease; Geriatrics and Gerontology; Biology; Genetic risk; Enhancer; Analysis method
- Published
- 2020
8. Modeling transcriptional regulation of model species with deep learning
- Author
Alicja Tadych, Evan M. Cofer, Yuji Yamazaki, Michael Levine, João Raimundo, Aaron K. Wong, Chandra L. Theesfeld, and Olga G. Troyanskaya
- Subjects
Resource; Danio; Computational biology; 03 medical and health sciences; Mice; 0302 clinical medicine; Deep Learning; Genetics; Transcriptional regulation; Animals; Caenorhabditis elegans; Transcription factor; Genetics (clinical); Zebrafish; 030304 developmental biology; 0303 health sciences; biology; business.industry; Deep learning; biology.organism_classification; Chromatin; Histone; Drosophila melanogaster; Gene Expression Regulation; biology.protein; Artificial intelligence; business; 030217 neurology & neurosurgery
- Abstract
To enable large-scale analyses of transcription regulation in model species, we developed DeepArk, a set of deep learning models of the cis-regulatory activities for four widely studied species: Caenorhabditis elegans, Danio rerio, Drosophila melanogaster, and Mus musculus. DeepArk accurately predicts the presence of thousands of different context-specific regulatory features, including chromatin states, histone marks, and transcription factors. In vivo studies show that DeepArk can predict the regulatory impact of any genomic variant (including rare or not previously observed) and enables the regulatory annotation of understudied model species.
- Published
- 2020
9. A systematic machine learning and data type comparison yields metagenomic predictors of infant age, sex, breastfeeding, antibiotic usage, country of origin, and delivery type
- Author
Evan M. Cofer, Braden T. Tierney, Chirag J. Patel, Aleksandar Kostic, Jacob M. Luber, and Alan Le Goallec
- Subjects
0301 basic medicine; Male; Maternal Health; Breastfeeding; computer.software_genre; Pediatrics; Machine Learning; 0302 clinical medicine; Mathematical and Statistical Techniques; Antibiotics; Feature (machine learning); Medicine and Health Sciences; Biology (General); Data Management; Ecology; Geography; Antimicrobials; Applied Mathematics; Simulation and Modeling; Statistics; Drugs; Genome project; Genomics; Anti-Bacterial Agents; Breast Feeding; Computational Theory and Mathematics; Medical Microbiology; Modeling and Simulation; Physical Sciences; Female; Algorithms; Research Article; Computer and Information Sciences; QH301-705.5; Microbial Genomics; Machine learning; Research and Analysis Methods; Data type; Microbiology; 03 medical and health sciences; Cellular and Molecular Neuroscience; Machine Learning Algorithms; Artificial Intelligence; Microbial Control; Genetics; Humans; Microbiome; Statistical Methods; Molecular Biology; Ecology, Evolution, Behavior and Systematics; Taxonomy; Pharmacology; business.industry; Biology and Life Sciences; Infant; Models, Theoretical; Country of origin; 030104 developmental biology; Metagenomics; Women's Health; Artificial intelligence; Neonatology; business; computer; Breast feeding; 030217 neurology & neurosurgery; Mathematics; Forecasting
- Abstract
The microbiome is a new frontier for building predictors of human phenotypes. However, machine learning in the microbiome is fraught with issues of reproducibility, driven in large part by the wide range of analytic models and metagenomic data types available. We aimed to build robust metagenomic predictors of host phenotype by comparing prediction performances and biological interpretation across 8 machine learning methods and 4 different types of metagenomic data. Using 1,570 samples from 300 infants, we fit 7,865 models for 6 host phenotypes. We demonstrate the dependence of accuracy on algorithm choice and feature definition in microbiome data and propose a framework for building microbiome-derived indicators of host phenotype. We additionally identify biological features predictive of age, sex, breastfeeding status, historical antibiotic usage, country of origin, and delivery type. Our complete results can be viewed at http://apps.chiragjpgroup.org/ubiome_predictions/.
Author summary: The human microbiome is hypothesized to influence human phenotype. However, many published host-microbe associations may not be reproducible. A number of reasons could be behind irreproducible results, including a wide array of methods for measuring the microbiome through genetic sequence, annotation pipelines, and analytical models/prediction approaches. Therefore, there is a need to compare different modeling strategies and microbiome data types (i.e. species abundance versus metabolic pathway abundance) to determine how to build robust and reproducible host-microbiome predictions. In this work, we executed a broad comparison of different predictive methods as a function of microbiome data types to effectively predict host characteristics. Our pipeline was able to uncover robust microbial associations with phenotype. We additionally recommended considerations for reproducible microbiome-host association pipeline development. We claim our work is a necessary stepping stone in increasing the utility of emerging cohort data and enabling the next generation of efficient microbiome association studies in human health.
- Published
- 2020
10. DeepArk: modeling cis-regulatory codes of model species with deep learning
- Author
Michael Levine, Chandra L. Theesfeld, Aaron K. Wong, João Raimundo, Olga G. Troyanskaya, Yuji Yamazaki, Evan M. Cofer, and Alicja Tadych
- Subjects
0303 health sciences; biology; business.industry; Deep learning; fungi; Danio; Computational biology; biology.organism_classification; Chromatin; 03 medical and health sciences; Annotation; 0302 clinical medicine; Histone; biology.protein; Artificial intelligence; Drosophila melanogaster; business; Transcription factor; 030217 neurology & neurosurgery; Caenorhabditis elegans; 030304 developmental biology
- Abstract
To enable large-scale analyses of regulatory logic in model species, we developed DeepArk (https://DeepArk.princeton.edu), a set of deep learning models of the cis-regulatory codes of four widely-studied species: Caenorhabditis elegans, Danio rerio, Drosophila melanogaster, and Mus musculus. DeepArk accurately predicts the presence of thousands of different context-specific regulatory features, including chromatin states, histone marks, and transcription factors. In vivo studies show that DeepArk can predict the regulatory impact of any genomic variant (including rare or not previously observed), and enables the regulatory annotation of understudied model species.
- Published
- 2020
11. Aether: leveraging linear programming for optimal cloud computing in genomics
- Author
Jacob M. Luber, Aleksandar Kostic, Chirag J. Patel, Evan M. Cofer, and Braden T. Tierney
- Subjects
0301 basic medicine; Statistics and Probability; Mathematical optimization; Source code; Linear programming; Computer science; Distributed computing; media_common.quotation_subject; Cloud computing; 02 engineering and technology; computer.software_genre; Biochemistry; 03 medical and health sciences; Software; Documentation; Aether; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Production (economics); GeneralLiterature_REFERENCE(e.g.,dictionaries,encyclopedias,glossaries); Molecular Biology; 030304 developmental biology; media_common; Supplementary data; 0303 health sciences; Database; business.industry; Scale (chemistry); Genomics; Programming, Linear; Cloud Computing; Genome Analysis; Applications Notes; Computer Science Applications; Task (computing); Computational Mathematics; 030104 developmental biology; Computational Theory and Mathematics; Scalability; business; computer
- Abstract
Motivation: Across biology, we are seeing rapid developments in scale of data production without a corresponding increase in data analysis capabilities. Results: Here, we present Aether (http://aether.kosticlab.org), an intuitive, easy-to-use, cost-effective and scalable framework that uses linear programming to optimally bid on and deploy combinations of underutilized cloud computing resources. Our approach simultaneously minimizes the cost of data analysis and provides an easy transition from users’ existing HPC pipelines. Availability and implementation: Data utilized are available at https://pubs.broadinstitute.org/diabimmune and with EBI SRA accession ERP005989. Source code is available at (https://github.com/kosticlab/aether). Examples, documentation and a tutorial are available at http://aether.kosticlab.org. Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2017
12. Selene: a PyTorch-based deep learning library for biological sequence-level data
- Author
Olga G. Troyanskaya, Evan M. Cofer, Jian Zhou, and Kathleen M. Chen
- Subjects
0303 health sciences; business.industry; Deep learning; Level data; Machine learning; computer.software_genre; 03 medical and health sciences; 0302 clinical medicine; Artificial intelligence; Architecture; business; computer; 030217 neurology & neurosurgery; 030304 developmental biology; Sequence (medicine)
- Abstract
To enable the application of deep learning in biology, we present Selene (https://selene.flatironinstitute.org/), a PyTorch-based deep learning library for fast and easy development, training, and application of deep learning model architectures for any biological sequences. We demonstrate how Selene allows researchers to easily train a published architecture on new data, develop and evaluate a new architecture, and use a trained model to answer biological questions of interest.
- Published
- 2018
13. Opportunities and obstacles for deep learning in biology and medicine
- Author
Travers Ching, S. Joshua Swamidass, Anne E. Carpenter, Michael M. Hoffman, Gregory P. Way, Dave DeCaprio, Benjamin J. Lengerich, Avanti Shrikumar, Johnny Israeli, Zhiyong Lu, Austin Huang, Amr Alexandari, Christopher A. Lavender, Anthony Gitter, Enrico Ferrero, Jack Lanchantin, Evan M. Cofer, Brett K. Beaulieu-Jones, Wei Xie, Michael Zietz, Laura K. Wiley, Anshul Kundaje, Paul-Michael Agapow, Yifan Peng, Stephen Woloszynek, Jinbo Xu, Srinivas C. Turaga, Yanjun Qi, Brian T. Do, Marwin H. S. Segler, Alexandr A. Kalinin, Gail L. Rosen, Simina M. Boca, David J. Harris, Daniel Himmelstein, and Casey S. Greene
- Subjects
0301 basic medicine; Biomedical Research; Computer science; Big data; Health records; Biochemistry; Field (computer science); 0302 clinical medicine; Electronic Health Records; Disease; Review Articles; Interpretability; 0303 health sciences; Class (computer programming); Artificial neural network; Management science; 3. Good health; Variety (cybernetics); machine learning; Algorithms; Biotechnology; precision medicine; Decision Making; Biomedical Technology; Biomedical Engineering; Biophysics; Bioengineering; Biomaterials; 03 medical and health sciences; Terminology as Topic; genomics; Humans; Baseline (configuration management); Biomedicine; 030304 developmental biology; business.industry; Deep learning; deep learning; Precision medicine; Data science; 030104 developmental biology; Drug Design; Labeled data; Artificial intelligence; business; Delivery of Health Care; 030217 neurology & neurosurgery; Headline Review
- Abstract
Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems of these fields. We examine applications of deep learning to a variety of biomedical problems—patient classification, fundamental biological processes and treatment of patients—and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.
- Published
- 2018