1,172 results on '"Bonneau, Richard"'
Search Results
2. LLMs are Highly-Constrained Biophysical Sequence Optimizers
- Author
Chen, Angelica, Stanton, Samuel D., Alberstein, Robert G., Watkins, Andrew M., Bonneau, Richard, Gligorijević, Vladimir, Cho, Kyunghyun, and Frey, Nathan C.
- Subjects
Computer Science - Machine Learning, Quantitative Biology - Quantitative Methods
- Abstract
Large language models (LLMs) have recently shown significant potential in various biological tasks such as protein engineering and molecule design. These tasks typically involve black-box discrete sequence optimization, where the challenge lies in generating sequences that are not only biologically feasible but also adhere to hard fine-grained constraints. However, LLMs often struggle with such constraints, especially in biological contexts where verifying candidate solutions is costly and time-consuming. In this study, we explore the possibility of employing LLMs as highly-constrained bilevel optimizers through a methodology we refer to as Language Model Optimization with Margin Expectation (LLOME). This approach combines both offline and online optimization, utilizing limited oracle evaluations to iteratively enhance the sequences generated by the LLM. We additionally propose a novel training objective -- Margin-Aligned Expectation (MargE) -- that trains the LLM to smoothly interpolate between the reward and reference distributions. Lastly, we introduce a synthetic test suite that bears strong geometric similarity to real biophysical problems and enables rapid evaluation of LLM optimizers without time-consuming lab validation. Our findings reveal that, in comparison to genetic algorithm baselines, LLMs achieve significantly lower regret solutions while requiring fewer test function evaluations. However, we also observe that LLMs exhibit moderate miscalibration, are susceptible to generator collapse, and have difficulty finding the optimal solution when no explicit ground truth rewards are available.
- Comment
Supersedes arXiv:2407.00236v1
- Published
- 2024
3. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization
- Author
Ahdritz, Gustaf, Bouatta, Nazim, Floristean, Christina, Kadyan, Sachin, Xia, Qinghui, Gerecke, William, O’Donnell, Timothy J., Berenberg, Daniel, Fisk, Ian, Zanichelli, Niccolò, Zhang, Bo, Nowaczynski, Arkadiusz, Wang, Bei, Stepniewska-Dziubinska, Marta M., Zhang, Shang, Ojewole, Adegoke, Guney, Murat Efe, Biderman, Stella, Watkins, Andrew M., Ra, Stephen, Lorenzo, Pablo Ribalta, Nivon, Lucas, Weitzner, Brian, Ban, Yih-En Andrew, Chen, Shiyang, Zhang, Minjia, Li, Conglong, Song, Shuaiwen Leon, He, Yuxiong, Sorger, Peter K., Mostaque, Emad, Zhang, Zhao, Bonneau, Richard, and AlQuraishi, Mohammed
- Published
- 2024
4. PMF-GRN: a variational inference approach to single-cell gene regulatory network inference using probabilistic matrix factorization
- Author
Skok Gibbs, Claudia, Mahmood, Omar, Bonneau, Richard, and Cho, Kyunghyun
- Published
- 2024
5. Structure-primed embedding on the transcription factor manifold enables transparent model architectures for gene regulatory network and latent activity inference
- Author
Tjärnberg, Andreas, Beheler-Amass, Maggie, Jackson, Christopher A., Christiaen, Lionel A., Gresham, David, and Bonneau, Richard
- Published
- 2024
6. OpenProteinSet: Training data for structural biology at scale
- Author
Ahdritz, Gustaf, Bouatta, Nazim, Kadyan, Sachin, Jarosch, Lukas, Berenberg, Daniel, Fisk, Ian, Watkins, Andrew M., Ra, Stephen, Bonneau, Richard, and AlQuraishi, Mohammed
- Subjects
Quantitative Biology - Biomolecules, Computer Science - Machine Learning
- Abstract
Multiple sequence alignments (MSAs) of proteins encode rich biological information and have been workhorses in bioinformatic methods for tasks like protein design and protein structure prediction for decades. Recent breakthroughs like AlphaFold2 that use transformers to attend directly over large quantities of raw MSAs have reaffirmed their importance. Generation of MSAs is highly computationally intensive, however, and no datasets comparable to those used to train AlphaFold2 have been made available to the research community, hindering progress in machine learning for proteins. To remedy this problem, we introduce OpenProteinSet, an open-source corpus of more than 16 million MSAs, associated structural homologs from the Protein Data Bank, and AlphaFold2 protein structure predictions. We have previously demonstrated the utility of OpenProteinSet by successfully retraining AlphaFold2 on it. We expect OpenProteinSet to be broadly useful as training and validation data for 1) diverse tasks focused on protein structure, function, and design and 2) large-scale multimodal machine learning research.
- Published
- 2023
7. Protein remote homology detection and structural alignment using deep learning
- Author
Hamamsy, Tymor, Morton, James T., Blackwell, Robert, Berenberg, Daniel, Carriero, Nicholas, Gligorijevic, Vladimir, Strauss, Charlie E. M., Leman, Julia Koehler, Cho, Kyunghyun, and Bonneau, Richard
- Published
- 2024
8. AbDiffuser: Full-Atom Generation of in vitro Functioning Antibodies
- Author
Martinkus, Karolis, Ludwiczak, Jan, Cho, Kyunghyun, Liang, Wei-Ching, Lafrance-Vanasse, Julien, Hotzel, Isidro, Rajpal, Arvind, Wu, Yan, Bonneau, Richard, Gligorijevic, Vladimir, and Loukas, Andreas
- Subjects
Quantitative Biology - Biomolecules, Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
We introduce AbDiffuser, an equivariant and physics-informed diffusion model for the joint generation of antibody 3D structures and sequences. AbDiffuser is built on top of a new representation of protein structure, relies on a novel architecture for aligned proteins, and utilizes strong diffusion priors to improve the denoising process. Our approach improves protein diffusion by taking advantage of domain knowledge and physics-based constraints; handles sequence-length changes; and reduces memory complexity by an order of magnitude, enabling backbone and side chain generation. We validate AbDiffuser in silico and in vitro. Numerical experiments showcase the ability of AbDiffuser to generate antibodies that closely track the sequence and structural properties of a reference set. Laboratory experiments confirm that all 16 HER2 antibodies discovered were expressed at high levels and that 57.1% of the selected designs were tight binders.
- Comment
NeurIPS 2023
- Published
- 2023
9. Generalization within in silico screening
- Author
Loukas, Andreas, Kessel, Pan, Gligorijevic, Vladimir, and Bonneau, Richard
- Subjects
Statistics - Machine Learning, Computer Science - Machine Learning
- Abstract
In silico screening uses predictive models to select a batch of compounds with favorable properties from a library for experimental validation. Unlike conventional learning paradigms, success in this context is measured by the performance of the predictive model on the selected subset of compounds rather than the entire set of predictions. By extending learning theory, we show that the selectivity of the selection policy can significantly impact generalization, with a higher risk of errors occurring when exclusively selecting predicted positives and when targeting rare properties. Our analysis suggests a way to mitigate these challenges. We show that generalization can be markedly enhanced when considering a model's ability to predict the fraction of desired outcomes in a batch. This is promising, as the primary aim of screening is not necessarily to pinpoint the label of each compound individually, but rather to assemble a batch enriched for desirable compounds. Our theoretical insights are empirically validated across diverse tasks, architectures, and screening scenarios, underscoring their applicability.
- Comment
9 pages, 3 figures
- Published
- 2023
10. Protein Discovery with Discrete Walk-Jump Sampling
- Author
Frey, Nathan C., Berenberg, Daniel, Zadorozhny, Karina, Kleinhenz, Joseph, Lafrance-Vanasse, Julien, Hotzel, Isidro, Wu, Yan, Ra, Stephen, Bonneau, Richard, Cho, Kyunghyun, Loukas, Andreas, Gligorijevic, Vladimir, and Saremi, Saeed
- Subjects
Quantitative Biology - Biomolecules, Computer Science - Machine Learning
- Abstract
We resolve difficulties in training and sampling from a discrete generative model by learning a smoothed energy function, sampling from the smoothed data manifold with Langevin Markov chain Monte Carlo (MCMC), and projecting back to the true data manifold with one-step denoising. Our Discrete Walk-Jump Sampling formalism combines the contrastive divergence training of an energy-based model and improved sample quality of a score-based model, while simplifying training and sampling by requiring only a single noise level. We evaluate the robustness of our approach on generative modeling of antibody proteins and introduce the distributional conformity score to benchmark protein generative models. By optimizing and sampling from our models for the proposed distributional conformity score, 97-100% of generated samples are successfully expressed and purified and 70% of functional designs show equal or improved binding affinity compared to known functional antibodies on the first attempt in a single round of laboratory experiments. We also report the first demonstration of long-run fast-mixing MCMC chains where diverse antibody protein classes are visited in a single MCMC chain.
- Comment
ICLR 2024 oral presentation, top 1.2% of submissions; {ICLR 2023 Physics for Machine Learning, NeurIPS 2023 GenBio, MLCB 2023} Spotlight
- Published
- 2023
11. Multi-level analysis of the gut-brain axis shows autism spectrum disorder-associated molecular and microbial profiles.
- Author
Morton, James T, Jin, Dong-Min, Mills, Robert H, Shao, Yan, Rahman, Gibraan, McDonald, Daniel, Zhu, Qiyun, Balaban, Metin, Jiang, Yueyu, Cantrell, Kalen, Gonzalez, Antonio, Carmel, Julie, Frankiensztajn, Linoy Mia, Martin-Brevet, Sandra, Berding, Kirsten, Needham, Brittany D, Zurita, María Fernanda, David, Maude, Averina, Olga V, Kovtun, Alexey S, Noto, Antonio, Mussap, Michele, Wang, Mingbang, Frank, Daniel N, Li, Ellen, Zhou, Wenhao, Fanos, Vassilios, Danilenko, Valery N, Wall, Dennis P, Cárdenas, Paúl, Baldeón, Manuel E, Jacquemont, Sébastien, Koren, Omry, Elliott, Evan, Xavier, Ramnik J, Mazmanian, Sarkis K, Knight, Rob, Gilbert, Jack A, Donovan, Sharon M, Lawley, Trevor D, Carpenter, Bob, Bonneau, Richard, and Taroncher-Oldenburg, Gaspar
- Subjects
Humans, Cytokines, Bayes Theorem, Cross-Sectional Studies, Reproducibility of Results, Autism Spectrum Disorder, Gastrointestinal Microbiome, Brain-Gut Axis, Mental Health, Behavioral and Social Science, Brain Disorders, Pediatric, Intellectual and Developmental Disabilities (IDD), Genetics, Pediatric Research Initiative, Neurosciences, Autism, 2.3 Psychological, social and economic factors, 2.1 Biological and endogenous factors, Aetiology, Mental health, Psychology, Cognitive Sciences, Neurology & Neurosurgery
- Abstract
Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by heterogeneous cognitive, behavioral and communication impairments. Disruption of the gut-brain axis (GBA) has been implicated in ASD although with limited reproducibility across studies. In this study, we developed a Bayesian differential ranking algorithm to identify ASD-associated molecular and taxa profiles across 10 cross-sectional microbiome datasets and 15 other datasets, including dietary patterns, metabolomics, cytokine profiles and human brain gene expression profiles. We found a functional architecture along the GBA that correlates with heterogeneity of ASD phenotypes, and it is characterized by ASD-associated amino acid, carbohydrate and lipid profiles predominantly encoded by microbial species in the genera Prevotella, Bifidobacterium, Desulfovibrio and Bacteroides and correlates with brain gene expression changes, restrictive dietary patterns and pro-inflammatory cytokine profiles. The functional architecture revealed in age-matched and sex-matched cohorts is not present in sibling-matched cohorts. We also show a strong association between temporal changes in microbiome composition and ASD phenotypes. In summary, we propose a framework to leverage multi-omic datasets from well-defined cohorts and investigate how the GBA influences ASD.
- Published
- 2023
12. Dictionary-Assisted Supervised Contrastive Learning
- Author
Wu, Patrick Y., Bonneau, Richard, Tucker, Joshua A., and Nagler, Jonathan
- Subjects
Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
Text analysis in the social sciences often involves using specialized dictionaries to reason with abstract concepts, such as perceptions about the economy or abuse on social media. These dictionaries allow researchers to impart domain knowledge and note subtle usages of words relating to a concept(s) of interest. We introduce the dictionary-assisted supervised contrastive learning (DASCL) objective, allowing researchers to leverage specialized dictionaries when fine-tuning pretrained language models. The text is first keyword simplified: a common, fixed token replaces any word in the corpus that appears in the dictionary(ies) relevant to the concept of interest. During fine-tuning, a supervised contrastive objective draws closer the embeddings of the original and keyword-simplified texts of the same class while pushing further apart the embeddings of different classes. The keyword-simplified texts of the same class are more textually similar than their original text counterparts, which additionally draws the embeddings of the same class closer together. Combining DASCL and cross-entropy improves classification performance metrics in few-shot learning settings and social science applications compared to using cross-entropy alone and alternative contrastive and data augmentation methods.
- Comment
6 pages, 5 figures, EMNLP 2022
- Published
- 2022
13. A Pareto-optimal compositional energy-based model for sampling and optimization of protein sequences
- Author
Tagasovska, Nataša, Frey, Nathan C., Loukas, Andreas, Hötzel, Isidro, Lafrance-Vanasse, Julien, Kelly, Ryan Lewis, Wu, Yan, Rajpal, Arvind, Bonneau, Richard, Cho, Kyunghyun, Ra, Stephen, and Gligorijević, Vladimir
- Subjects
Computer Science - Machine Learning, Quantitative Biology - Quantitative Methods
- Abstract
Deep generative models have emerged as a popular machine learning-based approach for inverse design problems in the life sciences. However, these problems often require sampling new designs that satisfy multiple properties of interest in addition to learning the data distribution. This multi-objective optimization becomes more challenging when properties are independent or orthogonal to each other. In this work, we propose a Pareto-compositional energy-based model (pcEBM), a framework that uses multiple gradient descent for sampling new designs that adhere to various constraints in optimizing distinct properties. We demonstrate its ability to learn non-convex Pareto fronts and generate sequences that simultaneously satisfy multiple desired properties across a series of real-world antibody design tasks.
- Published
- 2022
14. PropertyDAG: Multi-objective Bayesian optimization of partially ordered, mixed-variable properties for biological sequence design
- Author
Park, Ji Won, Stanton, Samuel, Saremi, Saeed, Watkins, Andrew, Dwyer, Henri, Gligorijevic, Vladimir, Bonneau, Richard, Ra, Stephen, and Cho, Kyunghyun
- Subjects
Computer Science - Machine Learning, Quantitative Biology - Quantitative Methods
- Abstract
Bayesian optimization offers a sample-efficient framework for navigating the exploration-exploitation trade-off in the vast design space of biological sequences. Whereas it is possible to optimize the various properties of interest jointly using a multi-objective acquisition function, such as the expected hypervolume improvement (EHVI), this approach does not account for objectives with a hierarchical dependency structure. We consider a common use case where some regions of the Pareto frontier are prioritized over others according to a specified $\textit{partial ordering}$ in the objectives. For instance, when designing antibodies, we would like to maximize the binding affinity to a target antigen only if it can be expressed in live cell culture -- modeling the experimental dependency in which affinity can only be measured for antibodies that can be expressed and thus produced in viable quantities. In general, we may want to confer a partial ordering to the properties such that each property is optimized conditioned on its parent properties satisfying some feasibility condition. To this end, we present PropertyDAG, a framework that operates on top of the traditional multi-objective BO to impose this desired ordering on the objectives, e.g. expression $\rightarrow$ affinity. We demonstrate its performance over multiple simulated active learning iterations on a penicillin production task, toy numerical problem, and a real-world antibody design task.
- Comment
9 pages, 7 figures. Submitted to NeurIPS 2022 AI4Science Workshop
- Published
- 2022
15. Multi-segment preserving sampling for deep manifold sampler
- Author
Berenberg, Daniel, Lee, Jae Hyeon, Kelow, Simon, Park, Ji Won, Watkins, Andrew, Gligorijević, Vladimir, Bonneau, Richard, Ra, Stephen, and Cho, Kyunghyun
- Subjects
Computer Science - Machine Learning, Quantitative Biology - Biomolecules
- Abstract
Deep generative modeling for biological sequences presents a unique challenge in reconciling the bias-variance trade-off between explicit biological insight and model flexibility. The deep manifold sampler was recently proposed as a means to iteratively sample variable-length protein sequences by exploiting the gradients from a function predictor. We introduce an alternative approach to this guided sampling procedure, multi-segment preserving sampling, that enables the direct inclusion of domain-specific knowledge by designating preserved and non-preserved segments along the input sequence, thereby restricting variation to only select regions. We present its effectiveness in the context of antibody design by training two models: a deep manifold sampler and a GPT-2 language model on nearly six million heavy chain sequences annotated with the IGHV1-18 gene. During sampling, we restrict variation to only the complementarity-determining region 3 (CDR3) of the input. We obtain log probability scores from a GPT-2 model for each sampled CDR3 and demonstrate that multi-segment preserving sampling generates reasonable designs while maintaining the desired, preserved regions.
- Published
- 2022
16. Sequence-structure-function relationships in the microbial protein universe
- Author
Koehler Leman, Julia, Szczerbiak, Pawel, Renfrew, P Douglas, Gligorijevic, Vladimir, Berenberg, Daniel, Vatanen, Tommi, Taylor, Bryn C, Chandler, Chris, Janssen, Stefan, Pataki, Andras, Carriero, Nick, Fisk, Ian, Xavier, Ramnik J, Knight, Rob, Bonneau, Richard, and Kosciolek, Tomasz
- Subjects
Biochemistry and Cell Biology, Bioinformatics and Computational Biology, Biological Sciences, Biotechnology, 1.1 Normal biological development and functioning, Underpinning research, Infection, Generic health relevance, Proteins, Amino Acid Sequence, Structure-Activity Relationship, Databases, Protein, Protein Folding
- Abstract
For the past half-century, structural biologists relied on the notion that similar protein sequences give rise to similar structures and functions. While this assumption has driven research to explore certain parts of the protein universe, it disregards spaces that don't rely on this assumption. Here we explore areas of the protein universe where similar protein functions can be achieved by different sequences and different structures. We predict ~200,000 structures for diverse protein sequences from 1,003 representative genomes across the microbial tree of life and annotate them functionally on a per-residue basis. Structure prediction is accomplished using the World Community Grid, a large-scale citizen science initiative. The resulting database of structural models is complementary to the AlphaFold database, with regards to domains of life as well as sequence diversity and sequence length. We identify 148 novel folds and describe examples where we map specific functions to structural motifs. We also show that the structural space is continuous and largely saturated, highlighting the need for a shift in focus across all branches of biology, from obtaining structures to putting them into context and from sequence-based to sequence-structure-function based meta-omics analyses.
- Published
- 2023
17. Analytical challenges in omics research on asthma and allergy: A National Institute of Allergy and Infectious Diseases workshop
- Author
Bunyavanich, Supinda, Becker, Patrice M., Altman, Matthew C., Lasky-Su, Jessica, Ober, Carole, Zengler, Karsten, Berdyshev, Evgeny, Bonneau, Richard, Chatila, Talal, Chatterjee, Nilanjan, Chung, Kian Fan, Cutcliffe, Colleen, Davidson, Wendy, Dong, Gang, Fang, Gang, Fulkerson, Patricia, Himes, Blanca E., Liang, Liming, Mathias, Rasika A., Ogino, Shuji, Petrosino, Joseph, Price, Nathan D., Schadt, Eric, Schofield, James, Seibold, Max A., Steen, Hanno, Wheatley, Lisa, Zhang, Hongmei, Togias, Alkis, and Hasegawa, Kohei
- Published
- 2024
18. YouTube Recommendations and Effects on Sharing Across Online Social Platforms
- Author
Buntain, Cody, Bonneau, Richard, Nagler, Jonathan, and Tucker, Joshua A.
- Subjects
Computer Science - Social and Information Networks, Computer Science - Human-Computer Interaction
- Abstract
In January 2019, YouTube announced it would exclude potentially harmful content from video recommendations but allow such videos to remain on the platform. While this step intends to reduce YouTube's role in propagating such content, continued availability of these videos in other online spaces makes it unclear whether this compromise actually reduces their spread. To assess this impact, we apply interrupted time series models to measure whether different types of YouTube sharing in Twitter and Reddit changed significantly in the eight months around YouTube's announcement. We evaluate video sharing across three curated sets of potentially harmful, anti-social content: a set of conspiracy videos that have been shown to experience reduced recommendations in YouTube, a larger set of videos posted by conspiracy-oriented channels, and a set of videos posted by alternative influence network (AIN) channels. As a control, we also evaluate effects on video sharing in a dataset of videos from mainstream news channels. Results show conspiracy-labeled and AIN videos that have evidence of YouTube's de-recommendation experience a significant decreasing trend in sharing on both Twitter and Reddit. For videos from conspiracy-oriented channels, however, we see no significant effect in Twitter but find a significant increase in the level of conspiracy-channel sharing in Reddit. For mainstream news sharing, we actually see an increase in trend on both platforms, suggesting YouTube's suppressing particular content types has a targeted effect. This work finds evidence that reducing exposure to anti-social videos within YouTube, without deletion, has potential pro-social, cross-platform effects. At the same time, increases in the level of conspiracy-channel sharing raise concerns about content producers' responses to these changes, and platform transparency is needed to evaluate these effects further.
- Published
- 2020
19. Exposure to the Russian Internet Research Agency foreign influence campaign on Twitter in the 2016 US election and its relationship to attitudes and voting behavior
- Author
Eady, Gregory, Paskhalis, Tom, Zilinsky, Jan, Bonneau, Richard, Nagler, Jonathan, and Tucker, Joshua A.
- Published
- 2023
20. Maternal cecal microbiota transfer rescues early-life antibiotic-induced enhancement of type 1 diabetes in mice
- Author
Zhang, Xue-Song, Yin, Yue Sandra, Wang, Jincheng, Battaglia, Thomas, Krautkramer, Kimberly, Li, Wei Vivian, Li, Jackie, Brown, Mark, Zhang, Meifan, Badri, Michelle H, Armstrong, Abigail JS, Strauch, Christopher M, Wang, Zeneng, Nemet, Ina, Altomare, Nicole, Devlin, Joseph C, He, Linchen, Morton, Jamie T, Chalk, John Alex, Needles, Kelly, Liao, Viviane, Mount, Julia, Li, Huilin, Ruggles, Kelly V, Bonneau, Richard A, Dominguez-Bello, Maria Gloria, Bäckhed, Fredrik, Hazen, Stanley L, and Blaser, Martin J
- Subjects
Diabetes, Autoimmune Disease, Pediatric, Aetiology, Development of treatments and therapeutic interventions, 5.1 Pharmaceuticals, 2.1 Biological and endogenous factors, Metabolic and endocrine, Good Health and Well Being, Animals, Anti-Bacterial Agents, Autoimmune Diseases, Bacteria, Cecum, Diabetes Mellitus, Type 1, Disease Models, Animal, Female, Gastrointestinal Microbiome, Gene Expression, Histone Code, Intestines, Male, Metabolic Networks and Pathways, Metagenome, Mice, Mice, Inbred NOD, MicroRNAs, NOD mice, animal models, autoimmune, cecal material transfer, gene expression, histone modification, innate immune, microRNA, microbiome, type 1 diabetes, Microbiology, Medical Microbiology, Immunology
- Abstract
Early-life antibiotic exposure perturbs the intestinal microbiota and accelerates type 1 diabetes (T1D) development in the NOD mouse model. Here, we found that maternal cecal microbiota transfer (CMT) to NOD mice after early-life antibiotic perturbation largely rescued the induced T1D enhancement. Restoration of the intestinal microbiome was significant and persistent, remediating the antibiotic-depleted diversity, relative abundance of particular taxa, and metabolic pathways. CMT also protected against perturbed metabolites and normalized innate and adaptive immune effectors. CMT restored major patterns of ileal microRNA and histone regulation of gene expression. Further experiments suggest a gut-microbiota-regulated T1D protection mechanism centered on Reg3γ, in an innate intestinal immune network involving CD44, TLR2, and Reg3γ. This regulation affects downstream immunological tone, which may lead to protection against tissue-specific T1D injury.
- Published
- 2021
21. Experiences and lessons learned from two virtual, hands-on microbiome bioinformatics workshops.
- Author
Dillon, Matthew R, Bolyen, Evan, Adamov, Anja, Belk, Aeriel, Borsom, Emily, Burcham, Zachary, Debelius, Justine W, Deel, Heather, Emmons, Alex, Estaki, Mehrbod, Herman, Chloe, Keefe, Christopher R, Morton, Jamie T, Oliveira, Renato RM, Sanchez, Andrew, Simard, Anthony, Vázquez-Baeza, Yoshiki, Ziemski, Michal, Miwa, Hazuki E, Kerere, Terry A, Coote, Carline, Bonneau, Richard, Knight, Rob, Oliveira, Guilherme, Gopalasingam, Piraveen, Kaehler, Benjamin D, Cope, Emily K, Metcalf, Jessica L, Robeson Ii, Michael S, Bokulich, Nicholas A, and Caporaso, J Gregory
- Subjects
Humans, Computational Biology, Feedback, Microbiota, COVID-19, SARS-CoV-2, Bioinformatics, Mathematical Sciences, Biological Sciences, Information and Computing Sciences
- Abstract
In October of 2020, in response to the Coronavirus Disease 2019 (COVID-19) pandemic, our team hosted our first fully online workshop teaching the QIIME 2 microbiome bioinformatics platform. We had 75 enrolled participants who joined from at least 25 different countries on 6 continents, and we had 22 instructors on 4 continents. In the 5-day workshop, participants worked hands-on with a cloud-based shared compute cluster that we deployed for this course. The event was well received, and participants provided feedback and suggestions in a postworkshop questionnaire. In January of 2021, we followed this workshop with a second fully online workshop, incorporating lessons from the first. Here, we present details on the technology and protocols that we used to run these workshops, focusing on the first workshop and then introducing changes made for the second workshop. We discuss what worked well, what didn't work well, and what we plan to do differently in future workshops.
- Published
- 2021
22. Structure-based protein function prediction using graph convolutional networks.
- Author
Gligorijević, Vladimir, Renfrew, P Douglas, Kosciolek, Tomasz, Leman, Julia Koehler, Berenberg, Daniel, Vatanen, Tommi, Chandler, Chris, Taylor, Bryn C, Fisk, Ian M, Vlamakis, Hera, Xavier, Ramnik J, Knight, Rob, Cho, Kyunghyun, and Bonneau, Richard
- Subjects
Proteins, Computational Biology, Amino Acid Sequence, Protein Structure, Tertiary, Structure-Activity Relationship, Models, Biological, Models, Molecular, Databases, Protein, Datasets as Topic, Deep Learning
- Abstract
The rapid increase in the number of proteins in sequence databases and the diversity of their functions challenge computational approaches for automated function prediction. Here, we introduce DeepFRI, a Graph Convolutional Network for predicting protein functions by leveraging sequence features extracted from a protein language model and protein structures. It outperforms current leading methods and sequence-based Convolutional Neural Networks and scales to the size of current sequence repositories. Augmenting the training set of experimental structures with homology models allows us to significantly expand the number of predictable functions. DeepFRI has significant de-noising capability, with only a minor drop in performance when experimental structures are replaced by protein models. Class activation mapping allows function predictions at an unprecedented resolution, allowing site-specific annotations at the residue-level in an automated manner. We show the utility and high performance of our method by annotating structures from the PDB and SWISS-MODEL, making several new confident function predictions. DeepFRI is available as a webserver at https://beta.deepfri.flatironinstitute.org/ .
- Published
- 2021
23. Reply to: Examining microbe-metabolite correlations by linear methods.
- Author
Morton, James T, McDonald, Daniel, Aksenov, Alexander A, Nothias, Louis Felix, Foulds, James R, Quinn, Robert A, Badri, Michelle H, Swenson, Tami L, Van Goethem, Marc W, Northen, Trent R, Vazquez-Baeza, Yoshiki, Wang, Mingxun, Bokulich, Nicholas A, Watters, Aaron, Song, Se Jin, Bonneau, Richard, Dorrestein, Pieter C, and Knight, Rob
- Subjects
Microbial Interactions, Developmental Biology, Biological Sciences, Technology, Medical and Health Sciences
- Published
- 2021
24. Ensuring scientific reproducibility in bio-macromolecular modeling via extensive, automated benchmarks
- Author
Koehler Leman, Julia, Lyskov, Sergey, Lewis, Steven M, Adolf-Bryfogle, Jared, Alford, Rebecca F, Barlow, Kyle, Ben-Aharon, Ziv, Farrell, Daniel, Fell, Jason, Hansen, William A, Harmalkar, Ameya, Jeliazkov, Jeliazko, Kuenze, Georg, Krys, Justyna D, Ljubetič, Ajasja, Loshbaugh, Amanda L, Maguire, Jack, Moretti, Rocco, Mulligan, Vikram Khipple, Nance, Morgan L, Nguyen, Phuong T, Ó Conchúir, Shane, Roy Burman, Shourya S, Samanta, Rituparna, Smith, Shannon T, Teets, Frank, Tiemann, Johanna KS, Watkins, Andrew, Woods, Hope, Yachnin, Brahm J, Bahl, Christopher D, Bailey-Kellogg, Chris, Baker, David, Das, Rhiju, DiMaio, Frank, Khare, Sagar D, Kortemme, Tanja, Labonte, Jason W, Lindorff-Larsen, Kresten, Meiler, Jens, Schief, William, Schueler-Furman, Ora, Siegel, Justin B, Stein, Amelie, Yarov-Yarovoy, Vladimir, Kuhlman, Brian, Leaver-Fay, Andrew, Gront, Dominik, Gray, Jeffrey J, and Bonneau, Richard
- Subjects
Information and Computing Sciences, Software Engineering, Networking and Information Technology R&D (NITRD), Bioengineering, Benchmarking, Binding Sites, Humans, Ligands, Macromolecular Substances, Molecular Docking Simulation, Protein Binding, Proteins, Reproducibility of Results, Software
- Abstract
Each year vast international resources are wasted on irreproducible research. The scientific community has been slow to adopt standard software engineering practices, despite the increases in high-dimensional data, complexities of workflows, and computational environments. Here we show how scientific software applications can be created in a reproducible manner when simple design goals for reproducibility are met. We describe the implementation of a test server framework and 40 scientific benchmarks, covering numerous applications in Rosetta bio-macromolecular modeling. High performance computing cluster integration allows these benchmarks to run continuously and automatically. Detailed protocol captures are useful for developers and users of Rosetta and other macromolecular modeling tools. The framework and design concepts presented here are valuable for developers and users of any type of scientific software and for the scientific community to create reproducible methods. Specific examples highlight the utility of this framework, and the comprehensive documentation illustrates the ease of adding new tests in a matter of hours.
- Published
- 2021
25. Interleukin‐17 Inhibition in Spondyloarthritis Is Associated With Subclinical Gut Microbiome Perturbations and a Distinctive Interleukin‐25–Driven Intestinal Inflammation
- Author
-
Manasson, Julia, Wallach, David S, Guggino, Giuliana, Stapylton, Matthew, Badri, Michelle H, Solomon, Gary, Reddy, Soumya M, Coras, Roxana, Aksenov, Alexander A, Jones, Drew R, Girija, Parvathy V, Neimann, Andrea L, Heguy, Adriana, Segal, Leopoldo N, Dorrestein, Pieter C, Bonneau, Richard, Guma, Monica, Ciccia, Francesco, Ubeda, Carles, Clemente, Jose C, and Scher, Jose U
- Subjects
Biomedical and Clinical Sciences ,Clinical Sciences ,Digestive Diseases ,Autoimmune Disease ,Clinical Research ,2.1 Biological and endogenous factors ,Aetiology ,Inflammatory and immune system ,Antibodies ,Monoclonal ,Humanized ,Arthritis ,Psoriatic ,Female ,Gastrointestinal Microbiome ,Humans ,Inflammation ,Interleukin-17 ,Intestinal Mucosa ,Intestines ,Male ,Middle Aged ,Spondylarthritis ,Tumor Necrosis Factor Inhibitors ,Immunology ,Public Health and Health Services ,Arthritis & Rheumatology ,Clinical sciences - Abstract
Objective: To characterize the ecological effects of biologic therapies on the gut bacterial and fungal microbiome in psoriatic arthritis (PsA)/spondyloarthritis (SpA) patients. Methods: Fecal samples from PsA/SpA patients pre- and posttreatment with tumor necrosis factor inhibitors (TNFi; n = 15) or an anti-interleukin-17A monoclonal antibody inhibitor (IL-17i; n = 14) underwent sequencing (16S ribosomal RNA, internal transcribed spacer and shotgun metagenomics) and computational microbiome analysis. Fecal levels of fatty acid metabolites and cytokines/proteins implicated in PsA/SpA pathogenesis or intestinal inflammation were correlated with sequence data. Additionally, ileal biopsies obtained from SpA patients who developed clinically overt Crohn's disease (CD) after treatment with IL-17i (n = 5) were analyzed for expression of IL-23/Th17-related cytokines, IL-25/IL-17E-producing cells, and type 2 innate lymphoid cells (ILC2s). Results: There were significant shifts in abundance of specific taxa after treatment with IL-17i compared to TNFi, particularly Clostridiales (P = 0.016) and Candida albicans (P = 0.041). These subclinical alterations correlated with changes in bacterial community co-occurrence, metabolic pathways, IL-23/Th17-related cytokines, and various fatty acids. Ileal biopsies showed that clinically overt CD was associated with expansion of IL-25/IL-17E-producing tuft cells and ILC2s (P < 0.05), compared to pre-IL-17i treatment levels. Conclusion: In a subgroup of SpA patients, the initiation of IL-17A blockade correlated with features of subclinical gut inflammation and intestinal dysbiosis of certain bacterial and fungal taxa, most notably C albicans. Further, IL-17i-related CD was associated with overexpression of IL-25/IL-17E-producing tuft cells and ILC2s. These results may help to explain the potential link between inhibition of a specific IL-17 pathway and the (sub)clinical gut inflammation observed in SpA.
- Published
- 2020
26. Specificities of Modeling of Membrane Proteins Using Multi-Template Homology Modeling
- Author
-
Koehler Leman, Julia, primary and Bonneau, Richard, additional
- Published
- 2023
- Full Text
- View/download PDF
27. The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens
- Author
-
Zhou, Naihui, Jiang, Yuxiang, Bergquist, Timothy R, Lee, Alexandra J, Kacsoh, Balint Z, Crocker, Alex W, Lewis, Kimberley A, Georghiou, George, Nguyen, Huy N, Hamid, Md Nafiz, Davis, Larry, Dogan, Tunca, Atalay, Volkan, Rifaioglu, Ahmet S, Dalkıran, Alperen, Cetin Atalay, Rengul, Zhang, Chengxin, Hurto, Rebecca L, Freddolino, Peter L, Zhang, Yang, Bhat, Prajwal, Supek, Fran, Fernández, José M, Gemovic, Branislava, Perovic, Vladimir R, Davidović, Radoslav S, Sumonja, Neven, Veljkovic, Nevena, Asgari, Ehsaneddin, Mofrad, Mohammad RK, Profiti, Giuseppe, Savojardo, Castrense, Martelli, Pier Luigi, Casadio, Rita, Boecker, Florian, Schoof, Heiko, Kahanda, Indika, Thurlby, Natalie, McHardy, Alice C, Renaux, Alexandre, Saidi, Rabie, Gough, Julian, Freitas, Alex A, Antczak, Magdalena, Fabris, Fabio, Wass, Mark N, Hou, Jie, Cheng, Jianlin, Wang, Zheng, Romero, Alfonso E, Paccanaro, Alberto, Yang, Haixuan, Goldberg, Tatyana, Zhao, Chenguang, Holm, Liisa, Törönen, Petri, Medlar, Alan J, Zosa, Elaine, Borukhov, Itamar, Novikov, Ilya, Wilkins, Angela, Lichtarge, Olivier, Chi, Po-Han, Tseng, Wei-Cheng, Linial, Michal, Rose, Peter W, Dessimoz, Christophe, Vidulin, Vedrana, Dzeroski, Saso, Sillitoe, Ian, Das, Sayoni, Lees, Jonathan Gill, Jones, David T, Wan, Cen, Cozzetto, Domenico, Fa, Rui, Torres, Mateo, Warwick Vesztrocy, Alex, Rodriguez, Jose Manuel, Tress, Michael L, Frasca, Marco, Notaro, Marco, Grossi, Giuliano, Petrini, Alessandro, Re, Matteo, Valentini, Giorgio, Mesiti, Marco, Roche, Daniel B, Reeb, Jonas, Ritchie, David W, Aridhi, Sabeur, Alborzi, Seyed Ziaeddin, Devignes, Marie-Dominique, Koo, Da Chen Emily, Bonneau, Richard, Gligorijević, Vladimir, Barot, Meet, Fang, Hai, Toppo, Stefano, and Lavezzo, Enrico
- Subjects
Human Genome ,Networking and Information Technology R&D (NITRD) ,Genetics ,Generic health relevance ,Animals ,Biofilms ,Candida albicans ,Drosophila melanogaster ,Genome ,Bacterial ,Genome ,Fungal ,Humans ,Locomotion ,Memory ,Long-Term ,Molecular Sequence Annotation ,Pseudomonas aeruginosa ,Protein function prediction ,Long-term memory ,Biofilm ,Critical assessment ,Community challenge ,Environmental Sciences ,Biological Sciences ,Information and Computing Sciences ,Bioinformatics - Abstract
Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. Conclusion: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.
- Published
- 2019
28. Learning representations of microbe–metabolite interactions
- Author
-
Morton, James T, Aksenov, Alexander A, Nothias, Louis Felix, Foulds, James R, Quinn, Robert A, Badri, Michelle H, Swenson, Tami L, Van Goethem, Marc W, Northen, Trent R, Vazquez-Baeza, Yoshiki, Wang, Mingxun, Bokulich, Nicholas A, Watters, Aaron, Song, Se Jin, Bonneau, Richard, Dorrestein, Pieter C, and Knight, Rob
- Subjects
Biological Sciences ,Lung ,Microbiome ,2.1 Biological and endogenous factors ,Animals ,Bacteria ,Benchmarking ,Cyanobacteria ,Cystic Fibrosis ,Inflammatory Bowel Diseases ,Mice ,Microbiota ,Neural Networks ,Computer ,Pseudomonas aeruginosa ,Technology ,Medical and Health Sciences ,Developmental Biology ,Biological sciences - Abstract
Integrating multiomics datasets is critical for microbiome research; however, inferring interactions across omics datasets has multiple statistical challenges. We solve this problem by using neural networks (https://github.com/biocore/mmvec) to estimate the conditional probability that each molecule is present given the presence of a specific microorganism. We show with known environmental (desert soil biocrust wetting) and clinical (cystic fibrosis lung) examples, our ability to recover microbe-metabolite relationships, and demonstrate how the method can discover relationships between microbially produced metabolites and inflammatory bowel disease.
- Published
- 2019
29. Engineered multivalent self-assembled binder protein against SARS-CoV-2 RBD
- Author
-
Britton, Dustin, Punia, Kamia, Mahmoudinobar, Farbod, Tada, Takuya, Jiang, Xunqing, Renfrew, P. Douglas, Bonneau, Richard, Landau, Nathaniel R., Kong, Xiang-Peng, and Montclare, Jin Kim
- Published
- 2022
- Full Text
- View/download PDF
30. Testing the effects of Facebook usage in an ethnically polarized setting
- Author
-
Asimovic, Nejla, Nagler, Jonathan, Bonneau, Richard, and Tucker, Joshua A.
- Published
- 2021
31. Computationally designed peptide macrocycle inhibitors of New Delhi metallo-β-lactamase 1
- Author
-
Mulligan, Vikram Khipple, Workman, Sean, Sun, Tianjun, Rettie, Stephen, Li, Xinting, Worrall, Liam J., Craven, Timothy W., King, Dustin T., Hosseinzadeh, Parisa, Watkins, Andrew M., Renfrew, P. Douglas, Guffy, Sharon, Labonte, Jason W., Moretti, Rocco, Bonneau, Richard, Strynadka, Natalie C. J., and Baker, David
- Published
- 2021
32. Fungi stabilize connectivity in the lung and skin microbial ecosystems
- Author
-
Tipton, Laura, Müller, Christian L, Kurtz, Zachary D, Huang, Laurence, Kleerup, Eric, Morris, Alison, Bonneau, Richard, and Ghedin, Elodie
- Subjects
Lung ,Aetiology ,2.1 Biological and endogenous factors ,Adult ,Bacteria ,Computational Biology ,Female ,Fungi ,Humans ,Male ,Microbial Consortia ,Middle Aged ,RNA ,Ribosomal ,16S ,Sequence Analysis ,DNA ,Skin ,Ecology ,Microbiology ,Medical Microbiology - Abstract
BACKGROUND: No microbe exists in isolation, and few live in environments with only members of their own kingdom or domain. As microbiome studies become increasingly more interested in the interactions between microbes than in cataloging which microbes are present, the variety of microbes in the community should be considered. However, the majority of ecological interaction networks for microbiomes built to date have included only bacteria. Joint association inference across multiple domains of life, e.g., fungal communities (the mycobiome) and bacterial communities, has remained largely elusive. RESULTS: Here, we present a novel extension of the SParse InversE Covariance estimation for Ecological ASsociation Inference (SPIEC-EASI) framework that allows statistical inference of cross-domain associations from targeted amplicon sequencing data. For human lung and skin micro- and mycobiomes, we show that cross-domain networks exhibit higher connectivity, increased network stability, and similar topological re-organization patterns compared to single-domain networks. We also validate in vitro a small number of cross-domain interactions predicted by the skin association network. CONCLUSIONS: For the human lung and skin micro- and mycobiomes, our findings suggest that fungi play a stabilizing role in ecological network organization. Our study suggests that computational efforts to infer association networks that include all forms of microbial life, paired with large-scale culture-based association validation experiments, will help formulate concrete hypotheses about the underlying biological mechanisms of species interactions and, ultimately, help understand microbial communities as a whole.
- Published
- 2018
33. SARS-CoV-2 RNA concentrations in wastewater foreshadow dynamics and clinical presentation of new COVID-19 cases
- Author
-
Wu, Fuqing, Xiao, Amy, Zhang, Jianbo, Moniz, Katya, Endo, Noriko, Armas, Federica, Bonneau, Richard, Brown, Megan A., Bushman, Mary, Chai, Peter R., Duvallet, Claire, Erickson, Timothy B., Foppe, Katelyn, Ghaeli, Newsha, Gu, Xiaoqiong, Hanage, William P., Huang, Katherine H., Lee, Wei Lin, Matus, Mariana, McElroy, Kyle A., Nagler, Jonathan, Rhode, Steven F., Santillana, Mauricio, Tucker, Joshua A., Wuertz, Stefan, Zhao, Shijie, Thompson, Janelle, and Alm, Eric J.
- Published
- 2022
- Full Text
- View/download PDF
34. Generalized Stability Approach for Regularized Graphical Models
- Author
-
Müller, Christian L., Bonneau, Richard, and Kurtz, Zachary
- Subjects
Statistics - Methodology ,Quantitative Biology - Molecular Networks ,Statistics - Applications ,Statistics - Computation - Abstract
Selecting regularization parameters in penalized high-dimensional graphical models in a principled, data-driven, and computationally efficient manner continues to be one of the key challenges in high-dimensional statistics. We present substantial computational gains and conceptual generalizations of the Stability Approach to Regularization Selection (StARS), a state-of-the-art graphical model selection scheme. Using properties of the Poisson-Binomial distribution and convex non-asymptotic distributional modeling we propose lower and upper bounds on the StARS graph regularization path which results in greatly reduced computational cost without compromising regularization selection. We also generalize the StARS criterion from single edge to induced subgraph (graphlet) stability. We show that simultaneously requiring edge and graphlet stability leads to superior graph recovery performance independent of graph topology. These novel insights render Gaussian graphical model selection a routine task on standard multi-core computers.
- Published
- 2016
35. GABA-receptive microglia selectively sculpt developing inhibitory circuits
- Author
-
Favuzzi, Emilia, Huang, Shuhan, Saldi, Giuseppe A., Binan, Loïc, Ibrahim, Leena A., Fernández-Otero, Marian, Cao, Yuqing, Zeine, Ayman, Sefah, Adwoa, Zheng, Karen, Xu, Qing, Khlestova, Elizaveta, Farhi, Samouil L., Bonneau, Richard, Datta, Sandeep Robert, Stevens, Beth, and Fishell, Gord
- Published
- 2021
- Full Text
- View/download PDF
36. Genetic and epigenetic coordination of cortical interneuron development
- Author
-
Allaway, Kathryn C., Gabitto, Mariano I., Wapinski, Orly, Saldi, Giuseppe, Wang, Chen-Yu, Bandler, Rachel C., Wu, Sherry Jingjing, Bonneau, Richard, and Fishell, Gord
- Published
- 2021
- Full Text
- View/download PDF
37. An expanded evaluation of protein function prediction methods shows an improvement in accuracy
- Author
-
Jiang, Yuxiang, Oron, Tal Ronnen, Clark, Wyatt T, Bankapur, Asma R, D'Andrea, Daniel, Lepore, Rosalba, Funk, Christopher S, Kahanda, Indika, Verspoor, Karin M, Ben-Hur, Asa, Koo, Emily, Penfold-Brown, Duncan, Shasha, Dennis, Youngs, Noah, Bonneau, Richard, Lin, Alexandra, Sahraeian, Sayed ME, Martelli, Pier Luigi, Profiti, Giuseppe, Casadio, Rita, Cao, Renzhi, Zhong, Zhaolong, Cheng, Jianlin, Altenhoff, Adrian, Skunca, Nives, Dessimoz, Christophe, Dogan, Tunca, Hakala, Kai, Kaewphan, Suwisa, Mehryary, Farrokh, Salakoski, Tapio, Ginter, Filip, Fang, Hai, Smithers, Ben, Oates, Matt, Gough, Julian, Törönen, Petri, Koskinen, Patrik, Holm, Liisa, Chen, Ching-Tai, Hsu, Wen-Lian, Bryson, Kevin, Cozzetto, Domenico, Minneci, Federico, Jones, David T, Chapman, Samuel, C., Dukka B K., Khan, Ishita K, Kihara, Daisuke, Ofer, Dan, Rappoport, Nadav, Stern, Amos, Cibrian-Uhalte, Elena, Denny, Paul, Foulger, Rebecca E, Hieta, Reija, Legge, Duncan, Lovering, Ruth C, Magrane, Michele, Melidoni, Anna N, Mutowo-Meullenet, Prudence, Pichler, Klemens, Shypitsyna, Aleksandra, Li, Biao, Zakeri, Pooya, ElShal, Sarah, Tranchevent, Léon-Charles, Das, Sayoni, Dawson, Natalie L, Lee, David, Lees, Jonathan G, Sillitoe, Ian, Bhat, Prajwal, Nepusz, Tamás, Romero, Alfonso E, Sasidharan, Rajkumar, Yang, Haixuan, Paccanaro, Alberto, Gillis, Jesse, Sedeño-Cortés, Adriana E, Pavlidis, Paul, Feng, Shou, Cejuela, Juan M, Goldberg, Tatyana, Hamp, Tobias, Richter, Lothar, Salamov, Asaf, Gabaldon, Toni, Marcet-Houben, Marina, Supek, Fran, Gong, Qingtian, Ning, Wei, Zhou, Yuanpeng, Tian, Weidong, Falda, Marco, Fontana, Paolo, Lavezzo, Enrico, Toppo, Stefano, Ferrari, Carlo, Giollo, Manuel, Piovesan, Damiano, Tosatto, Silvio, del Pozo, Angela, Fernández, José M, Maietta, Paolo, Valencia, Alfonso, Tress, Michael L, Benso, Alfredo, Di Carlo, Stefano, Politano, Gianfranco, Savino, Alessandro, Rehman, Hafeez Ur, Re, Matteo, Mesiti, Marco, Valentini, Giorgio, Bargsten, Joachim W, van Dijk, Aalt DJ, Gemovic, Branislava, Glisic, Sanja, Perovic, Vladmir, Veljkovic, Veljko, Veljkovic, Nevena, Almeida-e-Silva, Danillo C, Vencio, Ricardo ZN, Sharan, Malvika, Vogel, Jörg, Kansakar, Lakesh, Zhang, Shanshan, Vucetic, Slobodan, Wang, Zheng, Sternberg, Michael JE, Wass, Mark N, Huntley, Rachael P, Martin, Maria J, O'Donovan, Claire, Robinson, Peter N, Moreau, Yves, Tramontano, Anna, Babbitt, Patricia C, Brenner, Steven E, Linial, Michal, Orengo, Christine A, Rost, Burkhard, Greene, Casey S, Mooney, Sean D, Friedberg, Iddo, and Radivojac, Predrag
- Subjects
Quantitative Biology - Quantitative Methods - Abstract
Background: The increasing volume and variety of genotypic and phenotypic data is a major defining characteristic of modern biomedical sciences. At the same time, the limitations in technology for generating data and the inherently stochastic nature of biomolecular events have led to the discrepancy between the volume of data and the amount of knowledge gleaned from it. A major bottleneck in our ability to understand the molecular underpinnings of life is the assignment of function to biological macromolecules, especially proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, accurately assessing methods for protein function prediction and tracking progress in the field remain challenging. Methodology: We have conducted the second Critical Assessment of Functional Annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. One hundred twenty-six methods from 56 research groups were evaluated for their ability to predict biological functions using the Gene Ontology and gene-disease associations using the Human Phenotype Ontology on a set of 3,681 proteins from 18 species. CAFA2 featured significantly expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis also compared the best methods participating in CAFA1 to those of CAFA2. Conclusions: The top performing methods in CAFA2 outperformed the best methods from CAFA1, demonstrating that computational function prediction is improving. This increased accuracy can be attributed to the combined effect of the growing number of experimental annotations and improved methods for function prediction., Comment: Submitted to Genome Biology
- Published
- 2016
- Full Text
- View/download PDF
38. Tissue and cellular spatiotemporal dynamics in colon aging
- Author
-
Daly, Aidan C, primary, Cambuli, Francesco, additional, Aijo, Tarmo, additional, Lotstedt, Britta, additional, Marjanovic, Nemanja, additional, Kuksenko, Olena, additional, Smith-Erb, Matthew, additional, Fernandez, Sara, additional, Domovic, Daniel, additional, Van Wittenberghe, Nicholas, additional, Drokhlyansky, Eugene, additional, Griffin, Gabriel K, additional, Phatnani, Hemali, additional, Bonneau, Richard, additional, Regev, Aviv, additional, and Vickovic, Sanja, additional
- Published
- 2024
- Full Text
- View/download PDF
39. Correction to “Computational Prediction of Coiled–Coil Protein Gelation Dynamics and Structure”
- Author
-
Britton, Dustin, primary, Christians, Luc F., additional, Liu, Chengliang, additional, Legocki, Jakub, additional, Xiao, Yingxin, additional, Meleties, Michael, additional, Yang, Lin, additional, Cammer, Michael, additional, Jia, Sihan, additional, Zhang, Zihan, additional, Mahmoudinobar, Farbod, additional, Kowalski, Zuzanna, additional, Renfrew, P. Douglas, additional, Bonneau, Richard, additional, Pochan, Darrin J., additional, Pak, Alexander J., additional, and Montclare, Jin Kim, additional
- Published
- 2024
- Full Text
- View/download PDF
40. Computational Design of Phosphotriesterase Improves V‐Agent Degradation Efficiency
- Author
-
Kronenberg, Jacob, primary, Chu, Stanley, additional, Olsen, Andrew, additional, Britton, Dustin, additional, Halvorsen, Leif, additional, Guo, Shengbo, additional, Lakshmi, Ashwitha, additional, Chen, Jason, additional, Kulapurathazhe, Maria Jinu, additional, Baker, Cetara A., additional, Wadsworth, Benjamin C., additional, Van Acker, Cynthia J., additional, Lehman, John G., additional, Otto, Tamara C., additional, Renfrew, P. Douglas, additional, Bonneau, Richard, additional, and Montclare, Jin Kim, additional
- Published
- 2024
- Full Text
- View/download PDF
41. Meta-analysis of the human gut microbiome uncovers shared and distinct microbial signatures between diseases
- Author
-
Jin, Dong-Min, primary, Morton, James T., additional, and Bonneau, Richard, additional
- Published
- 2024
- Full Text
- View/download PDF
42. Estimating the Ideology of Political YouTube Videos
- Author
-
Lai, Angela, primary, Brown, Megan A., additional, Bisbee, James, additional, Tucker, Joshua A., additional, Nagler, Jonathan, additional, and Bonneau, Richard, additional
- Published
- 2024
- Full Text
- View/download PDF
43. Identification of multi-loci hubs from 4C-seq demonstrates the functional importance of simultaneous interactions
- Author
-
Jiang, Tingting, Raviram, Ramya, Snetkova, Valentina, Rocha, Pedro P, Proudhon, Charlotte, Badri, Sana, Bonneau, Richard, Skok, Jane A, and Kluger, Yuval
- Subjects
Genetics ,Biotechnology ,Human Genome ,Underpinning research ,1.1 Normal biological development and functioning ,Chromatin ,Chromosomes ,Enhancer Elements ,Genetic ,Estrogen Receptor beta ,Genetic Loci ,Genome ,Nucleic Acid Conformation ,Receptors ,Antigen ,T-Cell ,alpha-beta ,Sequence Analysis ,DNA ,Environmental Sciences ,Biological Sciences ,Information and Computing Sciences ,Developmental Biology - Abstract
Use of low resolution single cell DNA FISH and population based high resolution chromosome conformation capture techniques have highlighted the importance of pairwise chromatin interactions in gene regulation. However, it is unlikely that associations involving regulatory elements act in isolation of other interacting partners that also influence their impact. Indeed, the influence of multi-loci interactions remains something of an enigma as beyond low-resolution DNA FISH we do not have the appropriate tools to analyze these. Here we present a method that uses standard 4C-seq data to identify multi-loci interactions from the same cell. We demonstrate the feasibility of our method using 4C-seq data sets that identify known pairwise and novel tri-loci interactions involving the Tcrb and Igk antigen receptor enhancers. We further show that the three Igk enhancers, MiEκ, 3'Eκ and Edκ, interact simultaneously in this super-enhancer cluster, which add to our previous findings showing that loss of one element decreases interactions between all three elements as well as reducing their transcriptional output. These findings underscore the functional importance of simultaneous interactions and provide new insight into the relationship between enhancer elements. Our method opens the door for studying multi-loci interactions and their impact on gene regulation in other biological settings.
- Published
- 2016
44. Active and Inactive Enhancers Cooperate to Exert Localized and Long-Range Control of Gene Regulation.
- Author
-
Proudhon, Charlotte, Snetkova, Valentina, Raviram, Ramya, Lobry, Camille, Badri, Sana, Jiang, Tingting, Hao, Bingtao, Trimarchi, Thomas, Kluger, Yuval, Aifantis, Iannis, Bonneau, Richard, and Skok, Jane A
- Subjects
T-Lymphocytes ,Animals ,Mice ,Inbred C57BL ,Receptors ,Antigen ,T-Cell ,alpha-beta ,RNA ,Messenger ,Gene Rearrangement ,beta-Chain T-Cell Antigen Receptor ,Gene Expression Regulation ,Protein Binding ,Enhancer Elements ,Genetic ,Igk ,Tcrb ,enhancer-sharing ,gene regulation ,localized and long-range contacts ,nuclear architecture ,super-enhancer ,transcription factor binding ,transcriptional output ,Mice ,Inbred C57BL ,Receptors ,Antigen ,T-Cell ,alpha-beta ,RNA ,Messenger ,Gene Rearrangement ,beta-Chain T-Cell Antigen Receptor ,Enhancer Elements ,Genetic ,Biochemistry and Cell Biology ,Medical Physiology - Abstract
V(D)J recombination relies on the presence of proximal enhancers that activate the antigen receptor (AgR) loci in a lineage- and stage-specific manner. Unexpectedly, we find that both active and inactive AgR enhancers cooperate to disseminate their effects in a localized and long-range manner. Here, we demonstrate the importance of short-range contacts between active enhancers that constitute an Igk super-enhancer in B cells. Deletion of one element reduces the interaction frequency between other enhancers in the hub, which compromises the transcriptional output of each component. Furthermore, we establish that, in T cells, long-range contact and cooperation between the inactive Igk enhancer MiEκ and the active Tcrb enhancer Eβ alters enrichment of CBFβ binding in a manner that impacts Tcrb recombination. These findings underline the complexities of enhancer regulation and point to a role for localized and long-range enhancer-sharing between active and inactive elements in lineage- and stage-specific control.
- Published
- 2016
45. 4C-ker: A Method to Reproducibly Identify Genome-Wide Interactions Captured by 4C-Seq Experiments.
- Author
-
Raviram, Ramya, Rocha, Pedro P, Müller, Christian L, Miraldi, Emily R, Badri, Sana, Fu, Yi, Swanzey, Emily, Proudhon, Charlotte, Snetkova, Valentina, Bonneau, Richard, and Skok, Jane A
- Subjects
DNA ,Catalytic ,Restriction Mapping ,Sequence Analysis ,DNA ,Binding Sites ,Base Sequence ,Protein Binding ,Genome ,Algorithms ,Software ,Molecular Sequence Data ,Bioinformatics ,Biological Sciences ,Information and Computing Sciences ,Mathematical Sciences - Abstract
4C-Seq has proven to be a powerful technique to identify genome-wide interactions with a single locus of interest (or "bait") that can be important for gene regulation. However, analysis of 4C-Seq data is complicated by the many biases inherent to the technique. An important consideration when dealing with 4C-Seq data is the differences in resolution of signal across the genome that result from differences in 3D distance separation from the bait. This leads to the highest signal in the region immediately surrounding the bait and increasingly lower signals in far-cis and trans. Another important aspect of 4C-Seq experiments is the resolution, which is greatly influenced by the choice of restriction enzyme and the frequency at which it can cut the genome. Thus, it is important that a 4C-Seq analysis method is flexible enough to analyze data generated using different enzymes and to identify interactions across the entire genome. Current methods for 4C-Seq analysis only identify interactions in regions near the bait or in regions located in far-cis and trans, but no method comprehensively analyzes 4C signals of different length scales. In addition, some methods also fail in experiments where chromatin fragments are generated using frequent cutter restriction enzymes. Here, we describe 4C-ker, a Hidden-Markov Model based pipeline that identifies regions throughout the genome that interact with the 4C bait locus. In addition, we incorporate methods for the identification of differential interactions in multiple 4C-seq datasets collected from different genotypes or experimental conditions. Adaptive window sizes are used to correct for differences in signal coverage in near-bait regions, far-cis and trans chromosomes. Using several datasets, we demonstrate that 4C-ker outperforms all existing 4C-Seq pipelines in its ability to reproducibly identify interaction domains at all genomic ranges with different resolution enzymes.
- Published
- 2016
46. Sparse and compositionally robust inference of microbial ecological networks
- Author
-
Kurtz, Zachary D., Mueller, Christian L., Miraldi, Emily R., Littman, Dan R., Blaser, Martin J., and Bonneau, Richard A.
- Subjects
Statistics - Applications ,Quantitative Biology - Genomics ,Statistics - Computation - Abstract
16S-ribosomal sequencing and other metagenomic techniques provide snapshots of microbial communities, revealing phylogeny and the abundances of microbial populations across diverse ecosystems. While changes in microbial community structure are demonstrably associated with certain environmental conditions, identification of underlying mechanisms requires new statistical tools, as these datasets present several technical challenges. First, the abundances of microbial operational taxonomic units (OTUs) from 16S datasets are compositional, and thus, microbial abundances are not independent. Secondly, microbial sequencing-based studies typically measure hundreds of OTUs on only tens to hundreds of samples; thus, inference of OTU-OTU interaction networks is severely under-powered, and additional assumptions are required for accurate inference. Here, we present SPIEC-EASI (SParse InversE Covariance Estimation for Ecological Association Inference), a statistical method for the inference of microbial ecological interactions from metagenomic datasets that addresses both of these issues. SPIEC-EASI combines data transformations developed for compositional data analysis with a graphical model inference framework that assumes the underlying ecological interaction network is sparse. To reconstruct the interaction network, SPIEC-EASI relies on algorithms for sparse neighborhood and inverse covariance selection. Because no large-scale microbial ecological networks have been experimentally validated, SPIEC-EASI comprises computational tools to generate realistic OTU count data from a set of diverse underlying network topologies. SPIEC-EASI outperforms state-of-the-art methods in terms of edge recovery and network properties on realistic synthetic data under a variety of scenarios. SPIEC-EASI also reproducibly predicts previously unknown microbial interactions using data from the American Gut project.
- Published
- 2014
- Full Text
- View/download PDF
47. Pre-detection history of extensively drug-resistant tuberculosis in KwaZulu-Natal, South Africa
- Author
-
Brown, Tyler S., Challagundla, Lavanya, Baugh, Evan H., Omar, Shaheed Vally, Mustaev, Arkady, Auld, Sara C., Shah, N. Sarita, Kreiswirth, Barry N., Brust, James C. M., Nelson, Kristin N., Narechania, Apurva, Kurepina, Natalia, Mlisana, Koleka, Bonneau, Richard, Eldholm, Vegard, Ismail, Nazir, Kolokotronis, Sergios-Orestis, Robinson, D. Ashley, Gandhi, Neel R., and Mathema, Barun
- Published
- 2019
48. Who Leads? Who Follows? Measuring Issue Attention and Agenda Setting by Legislators and the Mass Public Using Social Media Data
- Author
-
Barberá, Pablo, Casas, Andreu, Nagler, Jonathan, Egan, Patrick J., Bonneau, Richard, Jost, John T., and Tucker, Joshua A.
- Published
- 2019
49. Spatiotemporal dynamics of molecular pathology in amyotrophic lateral sclerosis
- Author
-
Maniatis, Silas, Äijö, Tarmo, Vickovic, Sanja, Braine, Catherine, Kang, Kristy, Mollbrink, Annelie, Fagegaltier, Delphine, Andrusivová, Žaneta, Saarenpää, Sami, Saiz-Castro, Gonzalo, Cuevas, Miguel, Watters, Aaron, Lundeberg, Joakim, Bonneau, Richard, and Phatnani, Hemali
- Published
- 2019
50. A Comprehensive Map of the Monocyte-Derived Dendritic Cell Transcriptional Network Engaged upon Innate Sensing of HIV
- Author
-
Johnson, Jarrod S., De Veaux, Nicholas, Rives, Alexander W., Lahaye, Xavier, Lucas, Sasha Y., Perot, Brieuc P., Luka, Marine, Garcia-Paredes, Victor, Amon, Lynn M., Watters, Aaron, Abdessalem, Ghaith, Aderem, Alan, Manel, Nicolas, Littman, Dan R., Bonneau, Richard, and Ménager, Mickaël M.
- Published
- 2020
- Full Text
- View/download PDF